KR20220059409A - Memory apparatus for artificial neural network

Info

Publication number: KR20220059409A
Application number: KR1020210142773A
Authority: KR
Inventors: 김녹원
Original assignee: 주식회사 딥엑스
Priority date: 2020-11-02
Filing date: 2021-10-25
Publication date: 2022-05-10
Also published as: WO2022092988A1

Abstract

According to one disclosure of the present specification, a memory device for an artificial neural network is provided. The memory device may comprise: at least one memory cell array having N columns and M rows; and a memory controller configured to operate data read or write operations of the at least one memory cell array in a sequential burst mode based on preset sequential access information. Accordingly, the data supply delay can be reduced or substantially eliminated.

Description

MEMORY APPARATUS FOR ARTIFICIAL NEURAL NETWORK

The present disclosure relates to a memory device, and more particularly, to a memory device for an artificial neural network.

As the inference capability of artificial intelligence advances, various inference services that use artificial intelligence, such as sound recognition, voice recognition, image recognition, object detection, driver drowsiness detection, dangerous moment detection, and gesture detection, are being installed in various electronic devices, including artificial intelligence speakers, smartphones, smart refrigerators, VR devices, AR devices, artificial intelligence CCTV, artificial intelligence robot vacuum cleaners, tablets, laptop computers, autonomous vehicles, bipedal walking robots, quadrupedal walking robots, and industrial robots.

With the recent development of deep learning technology, the performance of artificial neural network inference services based on big-data training is improving. Such training and inference services repeatedly train an artificial neural network on a vast amount of training data and infer various complex data through the trained artificial neural network model. Accordingly, various services using artificial neural network technology are being provided to the electronic devices described above.

However, the functions and accuracy required of inference services using artificial neural networks are steadily increasing. Accordingly, the size of artificial neural network models, the amount of computation, and the size of the training data are increasing exponentially. The performance required of processors and memory capable of handling the inference operations of such models is gradually rising, and artificial neural network inference services are actively provided on cloud computing-based servers that can readily process big data.

Meanwhile, edge computing using artificial neural network model technology is being actively studied. Edge computing refers to the edge, or periphery, where computing takes place: a terminal that directly produces data, or various electronic devices located close to such a terminal. Edge computing may be referred to as an edge device. Edge devices may be used to perform necessary tasks immediately and reliably, as with autonomous drones, autonomous robots, or autonomous vehicles, which must process vast amounts of data within 1/100 of a second. Accordingly, the fields to which edge devices can be applied are rapidly increasing.

The inventor of the present disclosure recognized that computation of conventional artificial neural network models suffers from problems such as high power consumption, heat generation, processor bottlenecks caused by relatively low memory bandwidth, and memory latency. The inventor therefore recognized that various difficulties exist in improving the computational processing performance of artificial neural network models, and that an artificial neural network memory system capable of alleviating these problems needs to be developed.

Accordingly, the inventor of the present disclosure studied an artificial neural network memory system applicable to server systems and/or edge computing. Furthermore, the inventor also studied a neural processing unit (NPU), a processor of the artificial neural network memory system optimized for processing artificial neural network models.

First, the inventor of the present disclosure recognized that effectively controlling memory during computation of an artificial neural network model is key to improving processing speed. The inventor recognized that if memory is not properly controlled when training an artificial neural network model or performing inference, the required data cannot be prepared in advance, so that reduced effective memory bandwidth and/or memory data supply delays may occur frequently. The inventor also recognized that in such cases the processor enters a starvation or idle state in which it is not supplied with data to process and cannot perform actual computation, which degrades computational performance.

Second, the inventor of the present disclosure recognized the limits of conventional algorithm-level approaches to processing artificial neural network models. For example, a conventional prefetch algorithm interprets the artificial neural network model in units of conceptual layers, and the processor reads data from memory layer by layer. However, a prefetch algorithm cannot recognize artificial neural network data locality in word units or memory access request units, which exists at the processor-memory level, that is, the hardware level. The inventor recognized that the prefetch technique alone cannot optimize data transfer operations at the processor-memory level.

Third, the inventor of the present disclosure recognized “artificial neural network data locality,” a characteristic unique to artificial neural network models. The inventor recognized that artificial neural network data locality exists at the processor-memory level in word units or memory access request units, and that exploiting it can maximize effective memory bandwidth and minimize the delay in supplying data to the processor, thereby improving the processor's artificial neural network training/inference performance.

Specifically, the “artificial neural network data locality” of an artificial neural network model, as recognized by the inventor of the present disclosure, may mean word-unit order information of the data that a processor needs in order to compute that artificial neural network, following the structure and computation algorithm of the model when the processor processes it. Furthermore, the inventor recognized that the computation order of such a model has the characteristic that artificial neural network data locality is maintained across the repeated training and/or inference operations given to the processor. The inventor therefore recognized that when artificial neural network data locality is maintained, the processing order of the data required for the artificial neural network operations is maintained in word units, and that this information can be received or analyzed and used for artificial neural network computation. To elaborate, the word unit of a processor may mean the element unit, which is the basic unit the processor can process. For example, when a neural processing unit multiplies N-bit input data by an M-bit kernel weight, the word unit of the input data may be N bits and the word unit of the weight data may be M bits. The inventor also recognized that the word unit of the processor may be set differently depending on the layer, feature map, kernel, activation function, and so on of the artificial neural network model. The inventor therefore also recognized that sophisticated memory control technology is required for the operation of each word unit.

The inventor of the present disclosure noted that artificial neural network data locality is constructed when a compiler compiles an artificial neural network model to run on a specific processor, and recognized that it may be constructed according to the compiler, the algorithms applied to the model, and the operating characteristics of the processor. To elaborate, the inventor recognized that even for the same artificial neural network model, the artificial neural network data locality may be constructed differently depending on: the way the processor computes the model, for example feature map tiling or the stationary technique of the processing elements; the number of processing elements in the processor; the capacity of the cache memory for feature maps, weights, and the like in the processor; the memory hierarchy within the processor; and the algorithmic characteristics of the compiler that determines the order of the processor's computation operations for processing the model. This is because, owing to each of these factors, the processor may determine a different order for the data it needs at each moment, clock by clock, even when computing the same model. That is, the inventor recognized that, conceptually, the order of the data required for computing an artificial neural network model is the computation order of the layers, unit convolutions, and/or matrix multiplications of the network. Furthermore, the inventor recognized that, for the order of data required for physical computation, the artificial neural network data locality of the model is constructed in word units at the processor-memory level, that is, the hardware level. The inventor also recognized that artificial neural network data locality depends on the processor and on the compiler used for that processor.

Fourth, the inventor of the present disclosure recognized that providing an artificial neural network memory system configured to receive and use artificial neural network data locality information can maximize the processing performance of the artificial neural network model at the processor-memory level.

The inventor of the present disclosure recognized that if the artificial neural network memory system can precisely identify the artificial neural network data locality of the model down to the word unit, it can also know the computation order at the word unit, the minimum unit in which the processor processes the model. That is, the inventor recognized that an artificial neural network memory system able to use artificial neural network data locality can predict in advance, with word-unit precision, whether specific data will be read from memory at a specific timing and provided to the processor, or whether specific data computed by the processor will be stored in memory at a specific timing. The inventor thus recognized that such a system can prepare in advance, in word units, the data the processor will request.

To elaborate, the inventor of the present disclosure recognized that if the artificial neural network memory system knows the artificial neural network data locality, it can also know, in word units, the order in which a convolution is processed as the kernel moves in a specific direction when the processor computes the convolution of specific input data and a specific kernel using a technique such as feature map tiling.

That is, the inventor recognized that by using artificial neural network data locality to predict which data the processor will need, the artificial neural network memory system can predict the memory read/write operations the processor will request and prepare the data the processor will process in advance, thereby increasing effective memory bandwidth and/or minimizing or eliminating memory data supply delays. The inventor also recognized that if the system can supply the data the processor will process at the required timing, the processor's starvation or idle states can be minimized. Accordingly, the inventor recognized that the artificial neural network memory system can improve computational performance and reduce power consumption.

Fifth, the inventor of the present disclosure recognized that even if the artificial neural network memory controller is not provided with artificial neural network data locality information, the controller can be placed in the communication channel between the memory and the processor that is processing the model, and can then analyze the data access requests the processor issues to the memory while computing a specific model, so as to infer the artificial neural network data locality of that model in units of processor-memory data access requests. That is, the inventor recognized that because each artificial neural network model has its own artificial neural network data locality, the processor generates data access requests in a specific order at the processor-memory level according to that locality. The inventor also recognized that, because artificial neural network data locality is maintained while the processor repeatedly performs training/inference operations on the model, the access order of the data stored in memory for processor-memory data requests is maintained as well.

Accordingly, the inventor of the present disclosure placed the artificial neural network memory controller in the communication channel between the memory and the processor computing the model. The inventor also recognized that by observing the processor-memory data access requests for the first, or first several, training and inference operations, the controller can infer artificial neural network data locality in units of data access requests. The inventor therefore recognized that artificial neural network data locality can be inferred by the controller even when artificial neural network data locality information is not provided.
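
The inference of a locality pattern from observed requests can be illustrated with a minimal sketch. It assumes, purely for illustration, that each data access request is logged as a (mode, address) tuple and that the log covers at least two complete passes of the same model; the function names `find_repeating_pattern` and `predict_next` are hypothetical and do not come from the specification.

```python
from typing import Hashable, List, Optional, Sequence


def find_repeating_pattern(requests: Sequence[Hashable]) -> Optional[List[Hashable]]:
    """Return the shortest prefix that, when repeated, reproduces the observed log.

    The prefix must occur at least twice in full; otherwise None is returned.
    """
    n = len(requests)
    for period in range(1, n // 2 + 1):
        if all(requests[i] == requests[i % period] for i in range(n)):
            return list(requests[:period])
    return None


def predict_next(pattern: Sequence[Hashable], observed_count: int) -> Hashable:
    """Predict the next request from the number of requests observed so far."""
    return pattern[observed_count % len(pattern)]


# Hypothetical log: two inference passes, each issuing the same four requests.
log = [("read", 0x00), ("read", 0x40), ("write", 0x80), ("read", 0x80)] * 2
pattern = find_repeating_pattern(log)
next_request = predict_next(pattern, len(log)) if pattern else None
```

Once a pattern has been found, a controller of this kind could prepare the data for `next_request` before the processor issues the request. A real controller would also need to handle logs that deviate from the pattern, which this sketch does not attempt.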

Accordingly, the inventor of the present disclosure recognized that, based on the artificial neural network data locality reconstructed in units of data access requests, the memory read/write operations the processor will request can be predicted and the data the processor will process can be prepared in advance, thereby increasing effective memory bandwidth and/or minimizing or substantially eliminating memory data supply delays. The inventor also recognized that if the artificial neural network memory system can supply the data the processor will process at the required timing, the rate at which the processor enters a starvation or idle state can be minimized.

Accordingly, an object of the present disclosure is to provide an artificial neural network memory system capable of optimizing a processor's artificial neural network operations by using the artificial neural network data locality of an artificial neural network model operating at the processor-memory level.

Accordingly, another object of the present disclosure is to provide an artificial neural network memory system including an artificial neural network memory controller that detects the data access requests generated by a processor, generates the data locality pattern of the artificial neural network model being processed by the processor, and prepares in advance the data access requests the processor will issue, thereby alleviating the memory latency problem. However, the present disclosure is not limited thereto, and other objects will be clearly understood by those skilled in the art from the following description.

According to one disclosure of the present specification, a memory device for an artificial neural network is provided. The memory device may include: at least one memory cell array having N columns and M rows; and a memory controller configured to operate data read or write operations of the at least one memory cell array in a sequential burst mode based on preset sequential access information.

In the memory device, the at least one memory cell array may further include a plurality of dynamic memory cells having a leakage current characteristic.

The at least one memory cell array may include: a column decoder that controls access to the N columns; a plurality of bit lines connected to the column decoder; a row decoder that controls access to the M rows; a plurality of word lines connected to the row decoder; and a sense amplifier connected to one end of the plurality of bit lines.

The memory controller may be configured to control, based on the sequential access information, data communication between a processor that processes artificial neural network operations and the at least one memory cell array in which the data required for those operations is stored.

The sequential access information may be generated based on the "artificial neural network data locality" of the artificial neural network that the processor is to process.

The memory controller may be configured to directly control the addresses of the N columns and the M rows of the at least one memory cell array so that the at least one memory cell array operates in the sequential burst mode based on the sequential access information.

The memory controller may be configured to set, based on the sequential access information, the memory addresses of the data of each operation step to be stored in the at least one memory cell array.

The memory controller may be configured to store artificial neural network data by sequentially allocating address values corresponding to the N columns and the M rows of the at least one memory cell array.
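
Sequential allocation over an array of M rows and N columns can be sketched as follows. This is an illustrative assumption rather than the disclosed circuit: it treats the array as row-major, so that consecutive words fill one row before moving to the next, and the function name `sequential_addresses` is hypothetical.

```python
from typing import Iterator, Tuple


def sequential_addresses(
    num_rows: int, num_cols: int, start: int, count: int
) -> Iterator[Tuple[int, int]]:
    """Yield (row, column) pairs for `count` consecutive words from linear index `start`."""
    if start < 0 or count < 0 or start + count > num_rows * num_cols:
        raise ValueError("requested range does not fit in the array")
    for linear in range(start, start + count):
        # Row-major assumption: the column index advances fastest.
        yield divmod(linear, num_cols)


# Hypothetical array with M = 4 rows and N = 8 columns.
addresses = list(sequential_addresses(num_rows=4, num_cols=8, start=6, count=4))
```

With this layout, data consumed in a fixed order occupies consecutive columns of the same row wherever possible, which is the access shape a burst read or write favors.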

According to one disclosure of the present specification, a memory device for an artificial neural network is provided. The memory device may include: at least one memory cell array; and a memory controller configured to directly control a read or write operation of the at least one memory cell array based on artificial neural network data locality.

The artificial neural network data locality may include preset operation order information of the artificial neural network to be processed by the processor.

The artificial neural network data locality may include data size information for each step of the preset operation order.

The artificial neural network memory controller may be configured to store a memory map generated in a sequential manner based on the preset operation order information and the data size of each corresponding operation step.
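
A memory map of this kind can be sketched by packing the data of each operation step back to back in operation order. The step names, sizes, and the function `build_memory_map` below are hypothetical and chosen only for illustration; the specification does not prescribe this representation.

```python
from typing import Dict, List, Tuple


def build_memory_map(steps: List[Tuple[str, int]], base: int = 0) -> Dict[str, Tuple[int, int]]:
    """Map each step name to (start_address, size), laid out in operation order."""
    memory_map: Dict[str, Tuple[int, int]] = {}
    address = base
    for name, size in steps:
        if size <= 0:
            raise ValueError(f"step {name!r} must have a positive size")
        if name in memory_map:
            raise ValueError(f"duplicate step name {name!r}")
        memory_map[name] = (address, size)
        address += size  # the next step starts where this one ends
    return memory_map


# Hypothetical operation order with data sizes in words.
steps = [("layer1_weights", 256), ("layer1_input", 1024), ("layer1_output", 512)]
memory_map = build_memory_map(steps)
```

Because the steps are stored in the order they are consumed, walking the map from the first entry to the last visits addresses in strictly increasing order.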

The artificial neural network data locality may further include a signal identifying weights, input feature maps, and output feature maps. The operation order pattern of the weight data, input feature map data, and output feature map data may be determined at compile time based on the characteristics of the processor.

The artificial neural network data locality may be determined based on at least one of the characteristics of the artificial neural network model, the characteristics of the processor, the size of the cache memory, and/or the operation algorithm policy.

According to one disclosure of the present specification, a memory device for an artificial neural network is provided. The memory device may include: at least one dynamic memory cell array; and a memory controller configured to store artificial neural network data in the at least one memory cell array in an order based on artificial neural network data locality.

The order based on the artificial neural network data locality may include at least a repeating pattern in the order of input feature map, kernel, and output feature map.

The order based on the artificial neural network data locality may include at least a repeating pattern in the order of kernel, input feature map, and output feature map.

The artificial neural network data locality may be configured in units of the data access requests that a processor issues to the memory controller. The artificial neural network data locality may include order information of all data access requests required for a single inference of the artificial neural network processed by the processor.

The memory controller may be configured to divide the area of the at least one dynamic memory cell array into a kernel area and a feature map area based on identification information configured to distinguish the kernel, the input feature map, and the output feature map.

The at least one dynamic memory cell array may further include a plurality of banks configured to enable an interleaving operation. The memory controller may be configured to divide the artificial neural network data among the banks and store it so that the plurality of banks operate in a burst mode corresponding to the interleaving operation.
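
Dividing data among banks for interleaved access can be sketched with a simple round-robin split. This is an assumption made for illustration: the specification does not state the granularity or the assignment rule, and the function names `interleave_across_banks` and `read_interleaved` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def interleave_across_banks(words: Sequence[T], num_banks: int) -> List[List[T]]:
    """Assign word i to bank i % num_banks, preserving order within each bank."""
    if num_banks <= 0:
        raise ValueError("num_banks must be positive")
    banks: List[List[T]] = [[] for _ in range(num_banks)]
    for index, word in enumerate(words):
        banks[index % num_banks].append(word)
    return banks


def read_interleaved(banks: Sequence[Sequence[T]], count: int) -> List[T]:
    """Read `count` words back in the original order by cycling through the banks."""
    num_banks = len(banks)
    return [banks[i % num_banks][i // num_banks] for i in range(count)]


# Hypothetical stream of ten words spread over four banks.
words = list(range(10))
banks = interleave_across_banks(words, num_banks=4)
restored = read_interleaved(banks, len(words))
```

With this split, consecutive words always sit in different banks, so one bank can be accessed while another is still being prepared for its next access.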

The memory device may further include a processor configured to provide the artificial neural network data locality information to the memory controller.

The memory device may further include a processor configured to provide at least input feature map, kernel, and output feature map identification information to the memory controller.

According to embodiments of the present disclosure, in a system that processes an artificial neural network, the delay in supplying data from the memory to the processor can be substantially eliminated or reduced by means of the artificial neural network data locality.

According to embodiments of the present disclosure, the artificial neural network memory controller can prepare the data of the artificial neural network model processed at the processor-memory level in advance, before the processor requests it.

According to embodiments of the present disclosure, the time required for the training and inference operations of the artificial neural network model processed by the processor is shortened, which improves the operation processing performance of the processor and can improve the power efficiency of operation processing at the system level.

The effects according to the present disclosure are not limited to those exemplified above, and further various effects are included in the present specification.

FIG. 1A is a schematic block diagram illustrating a processor and an artificial neural network memory controller of an artificial neural network memory system based on artificial neural network data locality according to an example of the present disclosure.
FIG. 1B is a schematic diagram illustrating an example of a neural processing unit for explaining the reconstruction of an artificial neural network data locality pattern that may be applied to various examples of the present disclosure.
FIG. 2 is a schematic diagram illustrating an artificial neural network data locality pattern according to an example of the present disclosure.
FIG. 3 is a schematic diagram illustrating an exemplary artificial neural network model for explaining an artificial neural network data locality pattern that may be applied to various examples of the present disclosure.
FIG. 4 is a schematic diagram illustrating an artificial neural network data locality pattern generated by an artificial neural network memory controller analyzing the artificial neural network model of FIG. 3A according to an example of the present disclosure.
FIG. 5 is a schematic diagram illustrating a token and identification information corresponding to the artificial neural network data locality pattern of FIG. 4.
FIG. 6 is a schematic diagram illustrating a predicted data access request, generated by an artificial neural network memory controller based on an artificial neural network data locality pattern, and an actual data access request according to an example of the present disclosure.
FIG. 7 is a flowchart schematically illustrating an operation of an artificial neural network memory controller according to an example of the present disclosure.
FIG. 8 is a schematic block diagram illustrating an artificial neural network memory system according to another example of the present disclosure.
FIG. 9 is a schematic diagram illustrating an operation of a memory system according to a comparative example of the present disclosure.
FIG. 10 is a schematic diagram illustrating a memory system according to another example of the present disclosure.
FIG. 11 is a schematic block diagram illustrating an artificial neural network memory system according to still another example of the present disclosure.
FIG. 12 is a schematic diagram illustrating exemplary identification information of a data access request.
FIG. 13 is a schematic diagram illustrating energy consumption per unit operation of an artificial neural network memory system.
FIG. 14 is a schematic diagram illustrating an artificial neural network memory system according to various examples of the present disclosure.
FIG. 15 is an exemplary diagram illustrating a substrate on which a memory is mounted and a channel.
FIG. 16 is an exemplary diagram illustrating a process of reading data from a memory having a multi-bank structure.
FIG. 17 is an exemplary diagram illustrating latency occurring in a conventional DRAM.
FIG. 18 is an exemplary diagram illustrating a basic concept of a Sequential Access Memory (SAM) according to the disclosure of the present specification.
FIG. 19 is a table exemplarily showing the amount of computation and data size for 16 layers.
FIG. 20 is a table exemplarily showing the amount of computation and data size for 28 layers.
FIG. 21 is a table showing a first example of accessing a memory according to order information in artificial neural network data locality (ANN DL) information.
FIG. 22 is an exemplary table that simplifies the table shown in FIG. 21.
FIG. 23 shows an example in which the SAM sets a memory address map according to the table shown in FIG. 22.
FIG. 24 is a table showing a second example of accessing a memory according to order information in artificial neural network data locality (ANN DL) information.
FIG. 25 shows an example in which the SAM sets a memory address map according to the table shown in FIG. 24.
FIG. 26 is a table showing a third example of accessing a memory according to order information in artificial neural network data locality (ANN DL) information.
FIGS. 27A and 27B show examples of setting a memory address map according to artificial neural network data locality information.
FIG. 28 is a conceptual diagram illustrating a control signal of a SAM controller.
FIG. 29 is an exemplary diagram illustrating an example of setting a memory address map according to the sideband signal shown in FIG. 28.
FIG. 30A shows another example of setting a memory address map according to a sideband signal, and FIG. 30B shows an example of a memory address map in which only kernels are sequentially recorded.
FIG. 31A is an exemplary diagram illustrating a 'READ_DISCARD' command transmitted through a sideband signal according to an example presented in the present specification, and FIG. 31B illustrates an example of a READ command.
FIG. 32 shows a part of a circuit diagram of an exemplary SAM implemented in the form of DRAM memory cells according to an example presented in the present specification.
FIG. 33 is an exemplary diagram for explaining a precharge operation in the SAM circuit diagram shown in FIG. 32.
FIG. 34 is an exemplary diagram for explaining a memory cell access operation in the SAM circuit diagram shown in FIG. 32.
FIG. 35 is an exemplary diagram for explaining a data sense (DATA SENSE) operation in the SAM circuit diagram shown in FIG. 32.
FIG. 36 is an exemplary diagram for explaining a READ-DISCARD operation in the SAM circuit diagram shown in FIG. 32.
FIG. 37 is an exemplary diagram for explaining a READ operation in the SAM circuit diagram shown in FIG. 32.
FIG. 38A is an exemplary waveform diagram of a READ-DISCARD operation, and FIG. 38B is an exemplary waveform diagram of a READ operation.
FIG. 39 is a table excerpting a part of the table shown in FIG. 21 in order to explain the REFRESH operation.
FIG. 40 shows examples in which a SAM memory is implemented in various forms according to an example presented in the present specification.
FIG. 41 is an exemplary diagram illustrating an example of a method of mapping addresses of a main memory based on ANN data locality information.
FIG. 42 is an exemplary diagram illustrating another example of a method of mapping addresses of a main memory based on ANN data locality information.
FIG. 43 is a table showing an example of accessing a memory according to order information in artificial neural network data locality (ANN DL) information.
FIG. 44 is an exemplary diagram illustrating an example of a memory in which a SAM controller is embedded.
FIG. 45 is an exemplary diagram illustrating an architecture including a compiler.
FIG. 46 shows an architecture according to a first example.
FIG. 47 shows an architecture according to a second example.
FIG. 48 shows an architecture according to a third example.
FIG. 49 shows an architecture according to a fourth example.
FIG. 50 shows an architecture according to a fifth example.
FIG. 51 shows an architecture according to a sixth example.
FIG. 52 is an exemplary diagram illustrating an operation according to the sixth example shown in FIG. 51.
FIGS. 53A and 53B are exemplary diagrams illustrating examples of convolution.
FIG. 54 illustrates another example of caching data of the main memory in a cache memory and then performing an operation based on a tiling technique.
FIG. 55 is a schematic diagram illustrating an artificial neural network memory system according to various examples of the present disclosure.
FIG. 56 shows the detailed operational configuration of the SFU shown in FIG. 55.
FIG. 57 shows a graph in which the bandwidth of the data bus between the buffer memory (cache) and the main memory is measured.

The advantages and features of the present disclosure, and the methods of achieving them, will become apparent with reference to the various examples described below in detail together with the accompanying drawings. However, the present disclosure is not limited to the examples described below and may be implemented in various different forms. The examples are provided only so that those of ordinary skill in the art to which the present disclosure belongs are fully informed of the scope of the invention, and the present invention is defined only by the scope of the claims.

The detailed description of the present disclosure may refer to the drawings, for convenience of description, to illustrate specific examples in which the present disclosure may be practiced. Even if the components of the various examples differ from one another, the manufacturing methods, operating methods, algorithms, shapes, processes, structures, and characteristics described in a specific example may be combined with or included in other examples. In addition, the location or arrangement of individual components within each disclosed example may be changed without departing from the spirit and scope of the present disclosure. The features of the various examples may be partially or wholly combined with one another and, as those skilled in the art will fully understand, may interoperate and operate in technically various ways. Each example may be implemented independently of the others or together with them in an associated relationship.

Since the shapes, sizes, ratios, angles, numbers, and the like shown in the drawings for describing the examples of the present disclosure are illustrative, the present disclosure refers to the drawings but is not limited to them. Like reference numerals may refer to like elements throughout the specification. In describing the present disclosure, a detailed description of related known technology may be omitted when it is determined that it would unnecessarily obscure the subject matter of the present disclosure. Where 'includes', 'has', 'consists of', and the like are used in this specification, other components may be added unless 'only' is used. A component expressed in the singular includes the plural unless explicitly stated otherwise. In interpreting a component, it is interpreted as including a margin of error even without a separate explicit statement. When a positional relationship between two components is described with terms such as 'on', 'above', 'below', 'next to', or 'adjacent to', another component may be located between the two components unless 'immediately' or 'directly' is used. A reference to an element or layer being "on" another element or layer includes both the case where it is directly on the other element and the case where another layer or element is interposed between them.

FIG. 1A is a schematic block diagram illustrating a processor and an artificial neural network memory controller of an artificial neural network memory system based on artificial neural network data locality according to an example of the present disclosure.

Referring to FIG. 1A, the artificial neural network memory system 100 may be configured to include at least one processor 110 and at least one artificial neural network memory controller 120. That is, there is at least one processor 110 according to examples of the present disclosure, and a plurality of processors may be used. Likewise, there is at least one artificial neural network memory controller 120 according to examples of the present disclosure, and a plurality of artificial neural network memory controllers may be used.

For convenience of description below, when the at least one processor 110 is a single processor, it may be referred to as the processor 110.

For convenience of description below, when the at least one artificial neural network memory controller 120 is a single artificial neural network memory controller, it may be referred to as the artificial neural network memory controller 120.

The processor 110 is configured to process an artificial neural network model. For example, the processor 110 may process inference of an artificial neural network model trained to perform a specific inference function and provide the inference result of the model for the input data. As another example, the processor 110 may process training of an artificial neural network model for performing a specific inference function and provide the trained model. The specific inference function may include various inference functions that an artificial neural network can perform, such as object recognition, voice recognition, and image processing.

The processor 110 may be configured to include at least one of a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), a digital signal processor (DSP), an arithmetic logic unit (ALU), and an artificial neural network processor (NPU). However, the processor 110 of the present disclosure is not limited to the above-described processors.

The processor 110 may be configured to communicate with the artificial neural network memory controller 120. The processor 110 may be configured to generate a data access request, and the data access request may be transmitted to the artificial neural network memory controller 120. Here, a data access request may mean a request to access the data that the processor 110 needs when processing inference or training of the artificial neural network model.

The processor 110 may transmit a data access request to the artificial neural network memory controller 120 to receive, from the artificial neural network memory controller 120, the data required for inference or training of the artificial neural network model, or to provide the artificial neural network memory controller 120 with the inference or training result of the artificial neural network processed by the processor 110.

The processor 110 may provide an inference result or a training result obtained by processing a specific artificial neural network model. In this case, the processor 110 may be configured to process the operations of the artificial neural network for inference or training in a specific order.

The processor 110 must process the artificial neural network operations in a specific order because each artificial neural network model is configured to have its own unique artificial neural network structure. That is, each artificial neural network model has a unique artificial neural network data locality that follows from its unique structure. Furthermore, the operation order of the artificial neural network model processed by the processor 110 is determined by this unique artificial neural network data locality.

In more detail, the artificial neural network data locality may be formed when the artificial neural network model is compiled by a compiler to run on a specific processor. The artificial neural network data locality may be formed according to the compiler, the algorithms applied to the artificial neural network model, and the operating characteristics of the processor.

The artificial neural network model to be processed by the processor 110 may be compiled by a compiler that can take into account the characteristics of the processor 110 and the algorithmic characteristics of the artificial neural network model. That is, given the structure and algorithm information of the artificial neural network model and the operating characteristics of the processor 110, the compiler may be configured to provide the artificial neural network data locality information to the artificial neural network memory controller 120 in word-unit order.

For example, at the conventional algorithm level, the weight values of a specific layer of a specific artificial neural network model may be computed layer by layer. At the processor-memory level according to the examples of the present disclosure, however, the weight values of a specific layer of a specific artificial neural network model may be computed in units of the words that the processor 110 is scheduled to process.

For example, when the cache memory of the processor 110 is smaller than the data size of the weight values of a specific layer of the artificial neural network model to be processed, the model may be compiled so that the processor 110 does not process the weight values of that layer all at once.

That is, when the processor 110 computes the weight values and node values of a specific layer, the weight values may be so large that the cache memory lacks space to store the result values. In this case, the data access request generated by the processor 110 may be split into a plurality of data access requests, and the processor 110 may be configured to process these additional data access requests in a specific order. As a result, the operation order at the algorithm level and the operation order according to the artificial neural network data locality at the processor-memory level may differ from each other.
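The growth from one data access request into several can be sketched as follows. This is a minimal illustration assuming that a request is an (address, size) pair and that the cache capacity is the only constraint; the function name `split_request` and the numbers used are hypothetical.

```python
def split_request(start_addr, total_bytes, cache_bytes):
    """Split one layer-sized request into cache-sized requests, in order."""
    requests = []
    offset = 0
    while offset < total_bytes:
        size = min(cache_bytes, total_bytes - offset)
        requests.append((start_addr + offset, size))
        offset += size
    return requests


# A 10-byte weight block with a 4-byte cache becomes three ordered requests.
parts = split_request(0x1000, total_bytes=10, cache_bytes=4)
```

The resulting list is longer than the single layer-level request, and its order is what the processor-memory level actually sees.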

That is, the artificial neural network operation order at the algorithm level may be reconstructed by the artificial neural network data locality at the processor-memory level, taking into account the hardware characteristics of the processor and the memory that will process the artificial neural network model.

The artificial neural network data locality of an artificial neural network model at the processor-memory level may be defined as information that makes it possible to predict, based on the order of the data access requests that the processor 110 issues to the memory, the operation order of the artificial neural network model processed by the processor 110 at the processor-memory level.
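A controller that holds such order information can guess the next request before it arrives. The following is a minimal sketch under stated assumptions: the class name `LocalityPredictor`, the string request labels, and the policy of holding position on a mismatch are all hypothetical and are not taken from the specification.

```python
class LocalityPredictor:
    """Predict the next data access request from a recorded repeating pattern."""

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.pos = 0  # index of the request expected next

    def predict(self):
        """Return the request expected next, so it can be prepared early."""
        return self.pattern[self.pos]

    def observe(self, actual):
        """Compare the actual request with the prediction.

        On a match the position advances and wraps at the end of the
        pattern. On a mismatch the position is kept (an assumed policy).
        """
        hit = actual == self.pattern[self.pos]
        if hit:
            self.pos = (self.pos + 1) % len(self.pattern)
        return hit


predictor = LocalityPredictor(["W0", "I0", "P0"])
first_guess = predictor.predict()
first_hit = predictor.observe("W0")
second_guess = predictor.predict()
```

When the prediction matches, the data can already be on its way by the time the processor asks for it, which is the source of the reduced supply delay.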

In more detail, even for the same artificial neural network model, the artificial neural network data locality may be formed differently depending on the computing functions of the processor 110 (for example, a feature map tiling technique or a stationary technique of the processing elements), the number of processing elements of the processor 110, the capacity of the cache memory in the processor 110 for feature maps, weights, and the like, the memory hierarchy in the processor 110, and the algorithmic characteristics of the compiler that determines the order of the computing operations of the processor 110 for processing the artificial neural network model.

For example, feature map tiling is an artificial neural network technique that partitions a convolution. As the convolution region is partitioned, the feature map is partitioned and computed accordingly. Therefore, even for the same artificial neural network model, the artificial neural network data locality may differ depending on the tiled convolution.
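The effect of tiling on the request sequence can be sketched by enumerating the tiles of a feature map. This is a simplified illustration: the function name `tile_requests` is hypothetical, and the overlap (halo) rows and columns that a real tiled convolution needs at tile borders are deliberately ignored.

```python
def tile_requests(height, width, tile_h, tile_w):
    """List the (top, left, rows, cols) tiles of a feature map in scan order.

    Edge tiles are clipped to the feature map. Halo overlap between
    neighboring tiles is not modeled in this sketch.
    """
    tiles = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            rows = min(tile_h, height - top)
            cols = min(tile_w, width - left)
            tiles.append((top, left, rows, cols))
    return tiles


untiled = tile_requests(4, 6, tile_h=4, tile_w=6)  # one request for the whole map
tiled = tile_requests(4, 6, tile_h=2, tile_w=4)    # four smaller requests
```

The same feature map therefore produces a different number and order of memory requests depending only on the tile size chosen at compile time.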

For example, the stationary technique controls how the processing elements (PEs) in a neural processing unit are driven. According to the stationary technique, one type of processed data, for example one of the input feature map, the weights, and the output feature map, may be fixed in the processing elements and reused. Accordingly, the type and order of the data that the processor 110 requests from the memory may vary.

That is, even for the same artificial neural network model, the artificial neural network data locality may be reconstructed according to various algorithms and/or techniques. Accordingly, the artificial neural network data locality may be reconstructed in whole or in part depending on various conditions such as the processor, the compiler, and the memory.

FIG. 1B is a schematic diagram illustrating an example of a neural processing unit for explaining the reconstruction of an artificial neural network data locality pattern that may be applied to various examples of the present disclosure.

Referring to FIG. 1B, exemplary stationary techniques that may be applied when the processor 110 is a neural processing unit (NPU) are illustrated.

The processing elements (PEs) may be arranged in an array, and each processing element may be configured to include a multiplier (x) and an adder (+). The processing elements may be connected to a buffer memory or a cache memory, for example a global buffer. The processing elements may fix one type of data among the input feature map pixels (Ifmap pixel; I), the filter weights (Filter weight; W), and the partial sums (Psum; P) in the registers of the processing elements. The remaining data may then be provided as input data to the processing elements. When the accumulation of a partial sum (P) is complete, it may become an output feature map pixel. However, the plurality of processing elements may also be implemented to operate individually rather than as an array.

(a) of FIG. 1B illustrates the weight-stationary (WS) technique. According to the WS technique, the filter weights (W0 to W7) are fixed in the register file of each processing element (PE), and the operation may be executed while the input feature map pixel (I) that is input to the processing elements in parallel moves from the 0th input feature map pixel (I0) to the 8th input feature map pixel (I8). The partial sums (P0 to P8) may be accumulated across the serially connected processing elements and may move sequentially to the next processing element. All multiply-and-accumulate (MAC) operations that use the fixed filter weights (W0 to W7) must be mapped to the same processing elements for serial processing.

According to the above configuration, reuse of the filter weights (W) held in the register file is maximized during the convolution operation, which may minimize the energy consumed in accessing the filter weights (W).
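The weight reuse described for the weight-stationary technique can be illustrated with a small software model. This is only a sketch under stated assumptions, not the disclosed hardware: it assumes a one-dimensional convolution, one weight per processing element, and a hypothetical function name `weight_stationary_conv1d`.

```python
def weight_stationary_conv1d(inputs, weights):
    """Software model of a weight-stationary 1D convolution.

    Each weight is loaded once into the register of one (hypothetical) PE and
    stays there; inputs stream past and the partial sum travels PE to PE.
    Returns the outputs and the number of weight loads that were needed.
    """
    pe_regs = list(weights)          # one weight pinned per PE
    weight_loads = len(pe_regs)      # every weight is fetched exactly once
    num_out = len(inputs) - len(pe_regs) + 1
    outputs = []
    for o in range(num_out):
        psum = 0                     # partial sum enters the PE chain
        for pe, w in enumerate(pe_regs):
            psum += w * inputs[o + pe]   # MAC in this PE, then pass psum on
        outputs.append(psum)         # completed partial sum = output pixel
    return outputs, weight_loads


outputs, loads = weight_stationary_conv1d([1, 2, 3, 4], [1, 0, -1])
print(outputs, loads)
```

In this model the weights are fetched only once however many outputs are produced, which is the reuse property the paragraph above attributes to the technique.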

It should be noted that, as the weight-stationary (WS) technique is applied to the artificial neural network model at the compilation stage, the artificial neural network data locality of the model is reconstructed at the processor-memory level to be optimized for the weight-stationary (WS) technique. For example, in the weight-stationary (WS) technique, the filter weights (W0 to W7) may be stored in the processing elements (PE) first for computational efficiency. Accordingly, the artificial neural network data locality may be reconstructed in the order of filter weight (W), input feature map pixel (I), and partial sum (P), and the order of the data access requests generated by the processor 110 may be determined according to the reconstructed artificial neural network data locality.

FIG. 1B (b) illustrates the output-stationary (OS) technique. According to the output-stationary (OS) technique, the partial sums (P0 to P7) are fixed and accumulated in the register file of each of the processing elements (PE), and the operation may be performed while the filter weight (W), which is input to the processing elements (PE) in parallel, is shifted from the 0th filter weight (W0) to the 7th filter weight (W7). The input feature map pixels (I0 to I7) may be moved through the serially connected processing elements (PE). Each of the partial sums (P0 to P7) must be mapped so that it stays fixed in its own processing element (PE) while the multiply-and-accumulate (MAC) operation is processed.

According to the above configuration, during the convolution operation with the filter weights (W), the partial sum (P) is fixed in the register file of the processing elements (PE), which may maximize reuse of the partial sum (P) and minimize the energy consumed in moving it. When accumulation of the fixed partial sum (P) is complete, it may become an output feature map.
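The output-stationary technique can be modeled in the same way. The sketch below is an illustration under assumptions (one-dimensional convolution, one partial sum per processing element, hypothetical function name `output_stationary_conv1d`), not the disclosed circuit.

```python
def output_stationary_conv1d(inputs, weights):
    """Software model of an output-stationary 1D convolution.

    Each (hypothetical) PE keeps one partial sum in its register. At every
    step one weight is broadcast to all PEs, and each PE multiplies it with
    the input pixel currently in front of it and accumulates locally.
    """
    num_out = len(inputs) - len(weights) + 1
    psum_regs = [0] * num_out                 # one stationary psum per PE
    for k, w in enumerate(weights):           # weight W_k broadcast this step
        for pe in range(num_out):
            psum_regs[pe] += w * inputs[pe + k]   # psum never leaves its PE
    return psum_regs                          # finished psums = output pixels


print(output_stationary_conv1d([1, 2, 3, 4], [1, 0, -1]))
```

Here the partial sums never move between processing elements; only the weights and input pixels are streamed, so the order in which data is requested differs from the weight-stationary case even though the result is the same.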

It should be noted that, as the processor 110 applies the output-stationary (OS) technique, the artificial neural network data locality of the model is reconstructed at the processor-memory level to be optimized for the output-stationary (OS) technique. For example, in the output-stationary (OS) technique, the partial sums (P0 to P7) may be stored in the processing elements (PE) first for computational efficiency. Accordingly, the artificial neural network data locality may be reconstructed in the order of partial sum (P), filter weight (W), and input feature map pixel (I), and the order of the data access requests generated by the processor 110 may be determined according to the reconstructed artificial neural network data locality.

The artificial neural network model compiler may receive hardware characteristic information of the processor 110 and the memory and convert the artificial neural network model into code that can operate at the processor-memory level. Because the model is converted into code executed by the processor, it may be converted into low-level code.

That is, depending on the factors described above, the order of the data that the processor 110 needs at each clock cycle may change even when the same artificial neural network model is processed. Therefore, the artificial neural network data locality of the model may be configured differently at the hardware level.

However, once the artificial neural network data locality has been configured, the operation order of the processor 110 and the order in which the data required for those operations is processed may be repeated exactly for every training or inference operation of the artificial neural network model.

The artificial neural network memory system 100 according to an example of the present disclosure described above may be configured to predict in advance the next data that the processor 110 will request, based on the exact operation order provided by the artificial neural network data locality, thereby mitigating memory latency and memory bandwidth problems, improving artificial neural network processing performance, and reducing power consumption.

The artificial neural network memory controller 120 according to an example of the present disclosure is configured either to receive the artificial neural network data locality information of the artificial neural network model to be processed by the processor 110, or to analyze the artificial neural network data locality of the artificial neural network model being processed by the processor 110.

The artificial neural network memory controller 120 may be configured to receive the data access requests generated by the processor 110.

The artificial neural network memory controller 120 may be configured to monitor or record the data access requests received from the processor 110. By observing the data access requests output by the processor 110 while it processes the artificial neural network model, the artificial neural network memory controller 120 can accurately predict the order of the data accesses that will be requested afterward. One data access request may be configured to include at least one word of data.

The artificial neural network memory controller 120 may be configured to sequentially record or monitor the data access requests received from the processor 110.

The data access requests recorded by the artificial neural network memory controller 120 may be stored in various forms, such as a log file, a table, or a list. However, the artificial neural network memory controller 120 according to an example of the present disclosure is not limited to any particular form or format for recording the data access requests.

The data access requests monitored by the artificial neural network memory controller 120 may be stored in any memory within the artificial neural network memory controller 120. However, the artificial neural network memory controller 120 according to an example of the present disclosure is not limited to any particular method of monitoring the data access requests.

The artificial neural network memory controller 120 may be configured to further include a memory for recording or monitoring the data access requests. However, the artificial neural network memory controller 120 according to an example of the present disclosure is not limited thereto and may be configured to communicate with an external memory.

The artificial neural network memory controller 120 may be configured to analyze the data access requests by monitoring or recording the data access requests received from the processor 110.

That is, the artificial neural network memory controller 120 may be configured to analyze the received data access requests in order to analyze the artificial neural network data locality of the artificial neural network model being processed by the processor 110.

That is, the artificial neural network memory controller 120 may be configured to analyze the artificial neural network data locality of an artificial neural network model that has been compiled to operate at the processor-memory level.

That is, based on the data locality of the artificial neural network at the processor-memory level, the artificial neural network memory controller 120 may be configured to analyze the operation processing order of the artificial neural network in units of the memory access requests generated by the processor, and thereby analyze the artificial neural network data locality of the artificial neural network model.

According to the above configuration, the artificial neural network memory controller 120 can analyze the artificial neural network data locality as reconstructed at the processor-memory level.

In some examples, the compiler may be configured to analyze the artificial neural network data locality of the artificial neural network model down to the word unit.

In some examples, at least one artificial neural network memory controller may be configured to receive, in word units, the artificial neural network data locality analyzed by the compiler. Here, the word unit may be 8 bits, 16 bits, 32 bits, 64 bits, or the like, depending on the word unit of the processor 110. The word unit may also be set to different sizes, such as 2 bits, 3 bits, or 5 bits, depending on the quantization algorithm applied to the kernels, feature maps, and so on of the compiled artificial neural network model.

The artificial neural network memory controller 120 may be configured to include a special function register. The special function register may be configured to store the artificial neural network data locality information.

The artificial neural network memory controller 120 may be configured to operate in different modes depending on whether the artificial neural network data locality information is stored.

If the artificial neural network memory controller 120 stores the artificial neural network data locality information, it can predict in advance, in word-unit order, the data processing order of the artificial neural network model to be processed by the processor 110, and may therefore be configured not to record the data access requests separately. However, the present disclosure is not limited thereto, and the artificial neural network memory controller 120 may be configured to compare the stored artificial neural network data locality information with the data access requests generated by the processor in order to verify whether the stored artificial neural network data locality contains an error.

If the artificial neural network memory controller 120 has not been provided with the artificial neural network data locality information, it may be configured to operate in a mode in which it observes the data access requests generated by the processor 110 and predicts the artificial neural network data locality of the artificial neural network model being processed by the processor 110.
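The choice between the two operating modes, and the optional verification of stored locality information against observed requests, could be sketched as follows. All names here (`Request`, `select_mode`, `first_mismatch`) and the tuple layout of a request are assumptions made for illustration; the disclosure does not prescribe a data format.

```python
from typing import List, Optional, Tuple

# Hypothetical request record: (start address, end address, operation mode).
Request = Tuple[int, int, str]


def select_mode(locality_info: Optional[List[Request]]) -> str:
    """Pick the operating mode from whether locality information is stored."""
    return "use_provided_locality" if locality_info else "predict_by_observation"


def first_mismatch(stored: List[Request], observed: List[Request]) -> Optional[int]:
    """Index of the first observed request that deviates from the stored
    locality information (treated as cyclic), or None if all of them match."""
    if not stored:
        raise ValueError("stored locality information must not be empty")
    for i, req in enumerate(observed):
        if req != stored[i % len(stored)]:
            return i
    return None
```

A non-`None` result from `first_mismatch` would correspond to the case in which the stored locality information no longer agrees with what the processor actually requests.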

In some examples, the artificial neural network memory system may include a processor, a memory, and a cache memory, and may be configured to generate, based on the artificial neural network data locality information, a predicted data access request that includes the data the processor will request. The predicted data access request may also be referred to as a data access request predicted based on the ANN DL, or as a data access request to be requested. For convenience of description, the predicted data access request may hereinafter be referred to as an advance data access request. The artificial neural network memory system may be configured to store the data corresponding to the predicted data access request from the memory into the cache memory before the processor requests it. In this case, the artificial neural network memory system may be configured to operate in one of two modes: a first mode in which it operates on artificial neural network data locality information that it has been provided with, or a second mode in which it observes the data access requests generated by the processor and predicts the artificial neural network data locality information. According to the above configuration, when the artificial neural network data locality information is provided, the artificial neural network memory system can predict and prepare in advance, in word units, the data that the processor will request. Even when the artificial neural network data locality information is not provided, the system can predict, in units of data access requests, the artificial neural network data locality being processed by the processor by monitoring the data access requests generated by the processor for a certain period. Furthermore, even when the artificial neural network data locality information is provided, the artificial neural network memory system may monitor the data access requests itself, reconstruct the artificial neural network data locality, and use the result to verify the provided artificial neural network data locality. This makes it possible to detect a change in the artificial neural network model or the occurrence of an error.

In some examples, at least one artificial neural network memory controller and at least one processor may be configured to communicate directly. According to this configuration, the artificial neural network memory controller can receive data access requests directly from the processor, which may eliminate the latency that a system bus between the processor and the artificial neural network memory controller could introduce. More specifically, a dedicated bus or a dedicated communication channel may be further included for direct communication between the processor and the artificial neural network memory controller. However, the present disclosure is not limited thereto.

In some examples, the artificial neural network data locality information may be configured to be selectively stored in the processor 110 and/or the artificial neural network memory controller 120. The artificial neural network data locality information may be configured to be stored in a special function register included in the processor 110 and/or the artificial neural network memory controller 120. However, the present disclosure is not limited thereto, and the artificial neural network data locality information may be stored in any memory, register, or the like that can communicate with the artificial neural network memory system.

FIG. 2 is a schematic diagram illustrating an artificial neural network data locality pattern according to an example of the present disclosure. The artificial neural network data locality and the artificial neural network data locality pattern of an artificial neural network model are described below with reference to FIG. 2.

The artificial neural network memory controller 120 is configured to record or monitor, in order, the data access requests received from the processor 110.

The artificial neural network memory controller 120 is configured to generate an artificial neural network data locality pattern that includes the data locality of the artificial neural network model being processed by the processor 110. That is, the artificial neural network memory controller 120 may be configured to analyze the data access requests associated with the artificial neural network model that the processor 110 generates, and to generate a specific repeating pattern. In other words, when the data access requests are observed, the artificial neural network data locality information may be stored as an artificial neural network data locality pattern.

Referring to FIG. 2, eighteen data access requests are, by way of example, recorded sequentially in the artificial neural network memory controller 120. Each data access request is configured to include identification information.

The identification information included in a data access request may be configured to include various kinds of information.

For example, the identification information is configured to include at least a memory address value and an operation mode value.

For example, the memory address value may be configured to include the memory address values corresponding to the requested data. However, the present disclosure is not limited thereto.

For example, the memory address value may be configured to include a start value and an end value of the memory address corresponding to the requested data. In this configuration, the data is regarded as being stored sequentially between the start value and the end value of the memory address, which may reduce the capacity needed to store memory address values. That is, the memory may also operate in burst mode when the trigger value is activated.

For example, the memory address value may be configured to include a start value of the memory address corresponding to the requested data and a continuous data read trigger value. In this configuration, data can be read continuously from the start value of the memory address until the continuous read trigger value changes. Because the data can be read continuously, the effective memory bandwidth may be increased.

For example, the memory address value may be configured to include a start value of the memory address corresponding to the requested data and information on the number of data items. The unit of the number of data items may be determined based on the unit of memory capacity. The unit may be, for example, one byte (8 bits), one word (4 bytes), or one block (1024 bytes). However, the present disclosure is not limited thereto. In this configuration, data can be read continuously from the start value of the memory address for the specified number of items of the set unit size. Because the data can be read continuously, the effective memory bandwidth may be increased.
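As a worked illustration of the start-value-plus-count form, the sketch below expands such a request into the byte addresses it covers. The unit sizes follow the examples given in the text; the names `UNIT_BYTES` and `burst_span` are hypothetical.

```python
# Unit sizes in bytes, following the examples in the text:
# 1 byte = 8 bits, 1 word = 4 bytes, 1 block = 1024 bytes.
UNIT_BYTES = {"byte": 1, "word": 4, "block": 1024}


def burst_span(start: int, count: int, unit: str) -> range:
    """Byte addresses covered by `count` consecutive items of size `unit`,
    beginning at memory address `start`."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return range(start, start + count * UNIT_BYTES[unit])


span = burst_span(0x1000, 4, "word")
print(hex(span.start), hex(span.stop), len(span))
```

Only the start address and the count need to be stored per request, yet the full contiguous address range can be recovered, which is what allows the read to proceed continuously.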

For example, when the memory is a non-volatile memory, the memory address value may further include a physical-to-logical address mapping table or flash translation layer information. However, the present disclosure is not limited thereto.

For example, the operation mode may be configured to include a read mode and a write mode. The read and write modes may further include a burst mode.

For example, the operation mode may be configured to further include an overwrite mode. However, the present disclosure is not limited thereto.

The artificial neural network memory controller 120 may be configured to determine whether the identification information of the respective data access requests is identical.

For example, the artificial neural network memory controller 120 may be configured to determine whether the memory address and operation mode of the respective data access requests are identical. In other words, the artificial neural network memory controller 120 may be configured to detect data access requests that have the same memory address value and the same operation mode.

For example, when the memory address value and operation mode of the first data access request are identical to the memory address value and operation mode of the tenth data access request, the artificial neural network memory controller 120 is configured to generate an artificial neural network data locality pattern corresponding to that memory address value and operation mode.

The artificial neural network data locality pattern may be configured to include data in which the memory addresses of the data access requests are recorded sequentially.

That is, the artificial neural network memory controller 120 may be configured to detect the repetition period of data access requests having the same memory address value and operation mode, and to generate an artificial neural network data locality pattern composed of the data access requests whose memory address values and operation modes repeat.

That is, the artificial neural network memory controller 120 may be configured to generate the artificial neural network data locality pattern by detecting a repeating pattern in the memory addresses included in the data access requests.

Referring to FIG. 2, when the artificial neural network memory controller 120 confirms that the memory address value and operation mode of the first data access request are identical to those of the tenth data access request, it may be configured to generate, as one artificial neural network data locality pattern, the requests from the first of the identical data access requests up to the request immediately preceding the repeated one. In this case, the artificial neural network memory controller 120 may be configured to generate an artificial neural network data locality pattern that includes the first to ninth data access requests.

That is, the artificial neural network data locality pattern described in the example of FIG. 2 may be configured to include the memory address values and operation mode values of the first, second, third, fourth, fifth, sixth, seventh, eighth, and ninth data access requests, in that order.
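The pattern generation described with reference to FIG. 2 can be sketched as a search for the first repetition of the first recorded request. This is a simplified illustration, not the disclosed implementation: the names `Request` and `extract_pattern` are hypothetical, and in the sample log only the first two requests use the address values that this description gives for FIG. 2, while requests 3 to 9 are made up.

```python
from typing import List, Optional, Tuple

# Hypothetical request record: (start address, end address, operation mode).
Request = Tuple[int, int, str]


def extract_pattern(log: List[Request]) -> Optional[List[Request]]:
    """Return the requests from the first one up to (not including) its first
    repetition, or None if the first request has not been repeated yet."""
    if not log:
        return None
    for i in range(1, len(log)):
        if log[i] == log[0]:   # same address value and same operation mode
            return log[:i]
    return None


# Requests 1 and 2 use the values quoted for FIG. 2; 3 to 9 are invented.
one_period: List[Request] = [
    (0x0, 0x1000000, "read"),
    (0x1100000, 0x1110000, "write"),
] + [
    (0x2000000 + n * 0x10000, 0x2000000 + (n + 1) * 0x10000, "read")
    for n in range(7)
]
log = one_period * 2               # 18 sequentially recorded requests
pattern = extract_pattern(log)
print(len(pattern))
```

A more defensive version might also check that the requests after the repetition continue to follow the candidate pattern before accepting it; that check is omitted here to keep the sketch close to the description.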

The artificial neural network data locality pattern generated by the artificial neural network memory controller 120 may be stored in various forms, such as a log file, a table, or a list, and the artificial neural network memory controller 120 according to an example of the present disclosure is not limited to any particular form or format for recording the artificial neural network data locality pattern.

The artificial neural network data locality pattern generated by the artificial neural network memory controller 120 may be stored in any memory of the artificial neural network memory controller 120, and the artificial neural network memory controller 120 according to an example of the present disclosure is not limited to any particular structure or type of memory for storing the artificial neural network data locality pattern.

The artificial neural network memory controller 120 may be configured to further include a memory for storing the artificial neural network data locality pattern. However, the artificial neural network memory controller 120 according to an example of the present disclosure is not limited thereto and may be configured to communicate with an external memory.

That is, the artificial neural network memory system 100 according to an example of the present disclosure may be configured to include at least one processor 110 configured to generate data access requests corresponding to an artificial neural network operation, and an artificial neural network memory controller 120 configured to sequentially record the data access requests and generate an artificial neural network data locality pattern.

Once the artificial neural network memory controller 120 has generated an artificial neural network data locality pattern, it may be configured to determine whether the memory address value and operation mode value of each data access request received from the processor 110 match any of the memory address values and operation mode values included in the previously generated artificial neural network data locality pattern.

Referring to FIG. 2, when the artificial neural network memory controller 120 receives the tenth data access request from the processor 110, the artificial neural network memory controller 120 may be configured to determine whether the received data access request has the same memory address value as a memory address value included in the artificial neural network data locality pattern.

Referring to the example of FIG. 2, when the artificial neural network memory controller 120 receives the tenth data access request, it may be configured to detect that the memory address value of the tenth data access request, namely the start value [0] and the end value [0x1000000], is identical to the memory address value of the first data access request, namely the start value [0] and the end value [0x1000000], and to detect that the read mode value of the operation mode of the tenth data access request is identical to the read mode value of the operation mode of the first data access request, thereby determining that the tenth data access request is identical to the first data access request and that the tenth data access request is an artificial neural network operation.

When the artificial neural network memory controller 120 receives the eleventh data access request, it may be configured to detect that the memory address value of the eleventh data access request, namely the start value [0x1100000] and the end value [0x1110000], is identical to the memory address value of the second data access request, namely the start value [0x1100000] and the end value [0x1110000], and to detect that the write mode value of the operation mode of the eleventh data access request is identical to the write mode value of the operation mode of the second data access request, thereby determining that the eleventh data access request is identical to the second data access request and that the eleventh data access request is an artificial neural network operation.
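The comparison described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the disclosed implementation: the names `DataAccessRequest` and `find_match` are hypothetical, and only the two requests whose address values are given in the text for FIG. 2 are included in the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAccessRequest:
    """Hypothetical record of one data access request."""
    start: int  # start value of the memory address
    end: int    # end value of the memory address
    mode: str   # operation mode: "read" or "write"


# First and second data access requests as given for the FIG. 2 example.
pattern = [
    DataAccessRequest(0x0, 0x1000000, "read"),
    DataAccessRequest(0x1100000, 0x1110000, "write"),
]


def find_match(request, recorded_pattern):
    """Return the 0-based position of an identical recorded request, or None."""
    for position, recorded in enumerate(recorded_pattern):
        # Dataclass equality compares start, end, and mode together.
        if recorded == request:
            return position
    return None


tenth = DataAccessRequest(0x0, 0x1000000, "read")
eleventh = DataAccessRequest(0x1100000, 0x1110000, "write")
# Same address range as the first request but a different operation mode.
unrelated = DataAccessRequest(0x0, 0x1000000, "write")
```

In this sketch a request counts as matching only when both the address range and the operation mode agree, which is why `unrelated` is not treated as part of the pattern.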

That is, the artificial neural network memory controller 120 can distinguish the start and the end of the artificial neural network data locality pattern. In addition, if no special command follows the end of the artificial neural network data locality pattern, the artificial neural network memory controller 120 can prepare in advance for the start of the pattern. Therefore, when the same operation is repeated, the start of the next inference can be predicted from the end of the current inference, so that data can be prepared before the next inference begins. Accordingly, when the same artificial neural network data locality pattern is repeated, latency at the start and the end of the pattern can be prevented or reduced.

Referring again to FIG. 2, an example is shown in which the artificial neural network memory controller 120 has not generated an artificial neural network data locality pattern from the first data access request through the ninth data access request. This may be the case when the artificial neural network memory controller 120 has been initialized or when the processor 110 has not yet performed an artificial neural network operation. Therefore, the artificial neural network memory controller 120 does not detect any pattern match up to the ninth data access request. At the tenth data access request, the artificial neural network memory controller 120 may determine that the request is identical to the first data access request, generate the artificial neural network data locality pattern, and record whether the pattern matches. Because the tenth through eighteenth data access requests are identical to the first through ninth data access requests, the artificial neural network memory controller 120 may determine that the pattern of the tenth through eighteenth data access requests matches the artificial neural network data locality pattern.
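The recording behavior described for FIG. 2 can be sketched as follows. This is a simplified illustration, not the disclosed mechanism: the class name `PatternRecorder` is hypothetical, the integers 1 to 9 stand in for nine distinct requests, and the sketch assumes that the reappearance of the very first request marks the end of one pattern period.

```python
class PatternRecorder:
    """Records requests in order and detects when the sequence repeats."""

    def __init__(self):
        self.history = []    # requests recorded before a pattern exists
        self.pattern = None  # list of requests once a repetition is seen
        self.cursor = 0      # position of the next expected request

    def observe(self, request):
        """Record one request; return True if it matches the pattern."""
        if self.pattern is None:
            if self.history and request == self.history[0]:
                # Assumption: seeing the first request again closes the period.
                self.pattern = list(self.history)
                self.cursor = 1 % len(self.pattern)
                return True
            self.history.append(request)
            return False
        matched = request == self.pattern[self.cursor]
        if matched:
            self.cursor = (self.cursor + 1) % len(self.pattern)
        return matched


recorder = PatternRecorder()
one_inference = list(range(1, 10))  # stand-ins for requests 1 through 9
results = [recorder.observe(r) for r in one_inference * 2]
```

With eighteen requests, the first nine produce no match because no pattern exists yet, and the tenth through eighteenth all match. A pattern in which the first request legitimately recurs mid-sequence would need a more careful period detector than this one.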

That is, the artificial neural network memory controller 120 according to an example of the present disclosure may be configured to determine, using the artificial neural network data locality pattern, whether the operation being processed by the processor 110 is an artificial neural network operation. With this configuration, the artificial neural network memory controller 120 can determine that the processor 110 is processing an artificial neural network operation even though it receives only data access requests containing the memory address value and the operation mode value generated by the processor 110. Accordingly, even without separate additional identification information, the artificial neural network memory controller 120 can determine, based on the artificial neural network data locality pattern, whether the processor 110 is currently performing an artificial neural network operation.

To elaborate with reference to FIG. 2, each data access request may be configured to be stored as a token. For example, each data access request of the artificial neural network may be tokenized and stored. For example, each data access request may be tokenized based on identification information. For example, each data access request may be tokenized based on its memory address value. However, examples of the present disclosure are not limited thereto, and a token may also be referred to as a code or an identifier (ID). For example, a token may be a word unit, a data access request unit, or an artificial neural network data locality (ANN DL) unit.

For example, the first data access request may be stored as token [1]. The fourth data access request may be stored as token [4]. The seventh data access request may be stored as token [7]. For example, the artificial neural network data locality pattern may be stored as tokens [1-2-3-4-5-6-7-8-9]. For example, the tenth data access request has the same memory address value and the same operation mode value as token [1], and may therefore be stored as token [1]. The thirteenth data access request has the same memory address value and operation mode value as token [4], and may therefore be stored as token [4]. Accordingly, the artificial neural network memory controller 120 may be configured to determine that a data access request is an artificial neural network operation when it detects a token identical to a token of the artificial neural network data locality pattern.
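The tokenization above can be sketched in Python as a lookup keyed on the memory address value and the operation mode value. The `Tokenizer` class and the address values below are hypothetical placeholders chosen for illustration; the source gives concrete addresses only for the first two requests of FIG. 2.

```python
class Tokenizer:
    """Assigns one integer token per distinct (address range, mode) pair."""

    def __init__(self):
        self._tokens = {}  # (start, end, mode) -> token number

    def token_for(self, start, end, mode):
        key = (start, end, mode)
        if key not in self._tokens:
            # New combination: issue the next unused token, starting at 1.
            self._tokens[key] = len(self._tokens) + 1
        return self._tokens[key]


# Nine hypothetical requests forming one inference, then the same nine again.
one_inference = [
    (i * 0x100000, i * 0x100000 + 0x1000, "read" if i % 2 else "write")
    for i in range(1, 10)
]
tokenizer = Tokenizer()
tokens = [tokenizer.token_for(*request) for request in one_inference * 2]
```

Here the tenth request receives token 1 and the thirteenth receives token 4, because each carries the same address and mode as an earlier request. Comparing small integers is then enough to recognize a repeated request.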

With this configuration, the artificial neural network memory controller 120 can recognize and distinguish data access requests easily and quickly by using the tokenized artificial neural network data locality pattern. Furthermore, even when additional identification information and/or data is added to a data access request, the same token can still be used, so that data access requests can be recognized and distinguished easily and quickly even as the additional information of the data access request grows.

In some examples, the artificial neural network data locality pattern stored in the artificial neural network memory controller may be deleted or initialized. For example, when an artificial neural network data locality pattern is not used for longer than a preset time, for example when no data access request matching the pattern is generated for a specific time, the artificial neural network memory controller may determine that the pattern is used infrequently and may delete or initialize it.

With this configuration, the utilization of the storage space of the memory that stores the artificial neural network data locality patterns can be improved.
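One way to picture the deletion of unused patterns is a store that remembers when each pattern last matched. The following Python sketch is an assumption-laden illustration: `PatternStore`, its method names, the abstract time units, and the threshold of 100 are all hypothetical, since the source specifies only that a preset time is exceeded.

```python
class PatternStore:
    """Keeps locality patterns and drops those idle longer than max_idle."""

    def __init__(self, max_idle):
        self.max_idle = max_idle
        self._entries = {}  # pattern name -> [tokens, time of last match]

    def add(self, name, tokens, now):
        self._entries[name] = [list(tokens), now]

    def mark_matched(self, name, now):
        # Called whenever a data access request matches this pattern.
        self._entries[name][1] = now

    def evict_idle(self, now):
        """Delete patterns unused for more than max_idle; return their names."""
        stale = [name for name, (_, last) in self._entries.items()
                 if now - last > self.max_idle]
        for name in stale:
            del self._entries[name]
        return stale

    def names(self):
        return sorted(self._entries)


store = PatternStore(max_idle=100)
store.add("first", [1, 2, 3, 4, 5, 6, 7, 8, 9], now=0)
store.add("second", [11, 12, 13, 14, 15, 16], now=0)
store.mark_matched("first", now=90)   # the first pattern is still in use
evicted = store.evict_idle(now=150)   # the second has been idle for 150 units
```

Timestamps are passed in explicitly so that the behavior is deterministic; a hardware controller would more plausibly use a counter or timer.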

In some examples, the artificial neural network memory controller may be configured to store both the updated pattern and the previous pattern of the artificial neural network data locality pattern in order to determine whether the artificial neural network model has changed. That is, when there are a plurality of artificial neural network models, the artificial neural network memory controller may be configured to further generate artificial neural network data locality patterns corresponding to the number of artificial neural network models.

For example, when the first artificial neural network data locality pattern is tokens [1-2-3-4-5-6-7-8-9] and the second artificial neural network data locality pattern is tokens [11-12-13-14-15-16], the artificial neural network memory controller may be configured to select the first artificial neural network data locality pattern when the processor generates a data access request corresponding to token [1]. Alternatively, when the processor generates a data access request corresponding to token [11], the artificial neural network memory controller may be configured to select the second artificial neural network data locality pattern.

With this configuration, the artificial neural network memory controller can store a plurality of artificial neural network data locality patterns, and when the artificial neural network model processed by the processor changes to another artificial neural network model, the previously stored artificial neural network data locality pattern can be applied quickly.

In some examples, the artificial neural network memory controller may be configured to determine whether the data access requests are requests of a single artificial neural network model or a mixture of requests of a plurality of artificial neural network models. In addition, the artificial neural network memory controller may be configured to predict, for each of the plurality of artificial neural network models, the data access request corresponding to that model's artificial neural network data locality.

For example, the processor may process a plurality of artificial neural network models simultaneously, in which case the data access requests generated by the processor may be a mixture of data access requests corresponding to the plurality of artificial neural network models.

For example, when the first artificial neural network data locality pattern is tokens [1-2-3-4-5-6-7-8-9] and the second artificial neural network data locality pattern is tokens [11-12-13-14-15-16], the processor 110 may generate tokens corresponding to data access requests in the order [1-11-2-3-12-13-14-4-5-6-15-16-7-8-9].

Because the artificial neural network memory controller knows each artificial neural network data locality pattern, even if token [11] is generated after token [1], the artificial neural network memory controller can predict that token [2] will be generated next. Accordingly, the artificial neural network memory controller can generate an advance data access corresponding to token [2]. Likewise, even if token [2] is generated after token [11], the artificial neural network memory controller can predict that token [12] will be generated next. Accordingly, the artificial neural network memory controller can generate an advance data access corresponding to token [12].
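The per-model prediction on an interleaved token stream can be sketched as follows. This is an illustrative sketch, not the disclosed implementation: the function `track` and the pattern names are hypothetical, and it assumes that no token value is shared between the two patterns, as is the case in the example above.

```python
PATTERNS = {
    "first": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "second": [11, 12, 13, 14, 15, 16],
}
# Interleaved token order taken from the example in the text.
STREAM = [1, 11, 2, 3, 12, 13, 14, 4, 5, 6, 15, 16, 7, 8, 9]


def track(stream, patterns):
    """Return (next expected token per model, number of correct predictions)."""
    owner = {}  # token -> (pattern name, position inside that pattern)
    for name, tokens in patterns.items():
        for position, token in enumerate(tokens):
            owner[token] = (name, position)

    expected = {}  # pattern name -> token predicted to come next
    hits = 0
    for token in stream:
        name, position = owner[token]
        if expected.get(name) == token:
            hits += 1  # an advance data access for this token would have helped
        tokens = patterns[name]
        # Predict the following token; wrap to the start after the pattern ends.
        expected[name] = tokens[(position + 1) % len(tokens)]
    return expected, hits


after_two, _ = track(STREAM[:2], PATTERNS)
final_expected, total_hits = track(STREAM, PATTERNS)
```

After tokens [1] and [11], the sketch expects token [2] for the first pattern and token [12] for the second, independently of how the two streams interleave. The wrap-around at the end of each pattern corresponds to preparing for the start of the next inference.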

With this configuration, the artificial neural network memory controller 120 can predict, separately for each artificial neural network model, the data access requests to be generated by the processor 110 that processes a plurality of artificial neural network models, and can thereby anticipate and prepare the data that the processor 110 will request.

In some examples, the artificial neural network memory controller may be configured to store a plurality of artificial neural network data locality patterns.

For example, when the processor processes two artificial neural network models, the artificial neural network memory controller may be configured to store the artificial neural network data locality pattern of each artificial neural network model.

With this configuration, when the operation of each artificial neural network model is processed, the actual data access request corresponding to each model can be predicted, so examples of the present invention can improve the processing speed of artificial neural network operations.

In some examples, the artificial neural network memory controller may be configured to further include an artificial neural network model configured to machine-learn artificial neural network data locality patterns.

With this configuration, the artificial neural network model of the artificial neural network memory controller may be configured to perform reinforcement learning, in real time, on the data access requests generated by the processor. In addition, the artificial neural network model of the artificial neural network memory controller may be a model trained using, as training data, the artificial neural network data locality patterns of well-known artificial neural network models. Accordingly, the artificial neural network memory controller can extract artificial neural network data locality patterns from a variety of artificial neural network models. This approach may be particularly effective when various artificial neural network models are processed at the request of many users, as in a server.

To elaborate with reference to FIG. 2, the artificial neural network memory controller 120 may be configured to monitor, dynamically or in real time, the artificial neural network model being processed by the processor 110 and to determine whether the artificial neural network model has changed.

For example, the artificial neural network memory controller 120 may be configured to determine the reliability of an artificial neural network data locality pattern by statistically using the pattern matching frequency of that pattern. The reliability of the artificial neural network data locality pattern may be configured to increase as the pattern matching frequency increases, and to decrease as the pattern matching frequency decreases.
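The source does not specify which statistic is used for the reliability. As one possible reading, the following Python sketch takes the reliability to be the fraction of matching requests within a sliding window of recent requests; the class `PatternReliability` and the window size of 8 are assumptions made for illustration.

```python
from collections import deque


class PatternReliability:
    """Reliability as the match ratio over the most recent observations."""

    def __init__(self, window=8):
        # deque with maxlen silently discards the oldest outcome when full.
        self._outcomes = deque(maxlen=window)

    def record(self, matched):
        self._outcomes.append(bool(matched))

    @property
    def value(self):
        if not self._outcomes:
            return 0.0
        return sum(self._outcomes) / len(self._outcomes)


reliability = PatternReliability(window=8)
for _ in range(4):
    reliability.record(True)      # four consecutive pattern matches
after_matches = reliability.value

for _ in range(4):
    reliability.record(False)     # followed by four mismatches
after_mixed = reliability.value

for _ in range(4):
    reliability.record(False)     # the matches fall out of the window
after_mismatches = reliability.value
```

The value rises while requests keep matching the pattern and falls when they stop, which is the qualitative behavior described above. A falling value could, for instance, feed the decision that the artificial neural network model has changed.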

With this configuration, when the processor 110 repeatedly processes a specific artificial neural network model, the reliability with which the artificial neural network memory controller 120 predicts the artificial neural network data locality of that model can be improved.

FIG. 3 is a schematic diagram illustrating an exemplary artificial neural network model for explaining an artificial neural network data locality pattern applicable to various examples of the present disclosure.

The exemplary artificial neural network model 1300 being processed by the processor 110, as shown in FIG. 3, may be any artificial neural network model trained to perform a specific inference function. For convenience of description only, a fully-connected artificial neural network model in which every node is connected is illustrated, but the present disclosure is not limited thereto.

Although not shown in FIG. 3, the artificial neural network model applicable to the present disclosure may be a convolutional neural network (CNN), which is a type of deep neural network (DNN). The exemplary artificial neural network model may be a model such as VGG, VGG16, DenseNet, a fully convolutional network (FCN) having an encoder-decoder structure, a deep neural network (DNN) such as SegNet, DeconvNet, DeepLAB V3+, or U-net, SqueezeNet, Alexnet, ResNet18, MobileNet-v2, GoogLeNet, Resnet-v2, Resnet50, Resnet101, or Inception-v3, or may be an ensemble model based on at least two different models. However, the artificial neural network model of the present disclosure is not limited thereto.

The exemplary artificial neural network models described above may be configured to have artificial neural network data locality.

Referring again to FIG. 3, the artificial neural network data locality of the artificial neural network model processed by the processor 110 will be described in detail.

The exemplary artificial neural network model 1300 includes an input layer 1310, a first connection network 1320, a first hidden layer 1330, a second connection network 1340, a second hidden layer 1350, a third connection network 1360, and an output layer 1370.

Each connection network of the artificial neural network has corresponding weight values. The weight values of a connection network are multiplied by the input node values, and the accumulated sum of the products is stored in the corresponding node of the output layer.

To elaborate, the connection networks of the artificial neural network model 1300 are shown as lines, and the weights are shown as ⓧ.

To elaborate, the model may be configured to additionally provide various activation functions for imparting nonlinearity to the accumulated values. The activation function may be, for example, a sigmoid function, a hyperbolic tangent function, an ELU function, a Hard-Sigmoid function, a Swish function, a Hard-Swish function, a SELU function, a CELU function, a GELU function, a TANHSHRINK function, a SOFTPLUS function, a MISH function, a Piecewise Interpolation Approximation for Non-linear function, or a ReLU function. However, the present disclosure is not limited thereto.

The input layer 1310 of the exemplary artificial neural network model 1300 includes input nodes x1 and x2.

The first connection network 1320 of the exemplary artificial neural network model 1300 includes connections having six weight values that connect each node of the input layer 1310 to the nodes of the first hidden layer 1330.

The first hidden layer 1330 of the exemplary artificial neural network model 1300 includes nodes a1, a2, and a3. The weight values of the first connection network 1320 are multiplied by the corresponding node values of the input layer 1310, and the accumulated sums of the products are stored in the first hidden layer 1330.

The second connection network 1340 of the exemplary artificial neural network model 1300 includes connections having nine weight values that connect the nodes of the first hidden layer 1330 to the nodes of the second hidden layer 1350.

The second hidden layer 1350 of the exemplary artificial neural network model 1300 includes nodes b1, b2, and b3. The weight values of the second connection network 1340 are multiplied by the corresponding node values of the first hidden layer 1330, and the accumulated sums of the products are stored in the second hidden layer 1350.

The third connection network 1360 of the exemplary artificial neural network model 1300 includes connections having six weight values that connect each node of the second hidden layer 1350 to each node of the output layer 1370.

The output layer 1370 of the exemplary artificial neural network model 1300 includes nodes y1 and y2. The weight values of the third connection network 1360 are multiplied by the corresponding node values of the second hidden layer 1350, and the accumulated sums of the products are stored in the output layer 1370.
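The multiply-and-accumulate structure of the model 1300 can be illustrated with a short Python sketch. The layer sizes (2, 3, 3, 2) and the weight counts (6, 9, 6) follow the description above; the weight and input values are arbitrary placeholders, and activation functions are omitted to keep the arithmetic easy to follow.

```python
def connect(inputs, weights):
    """Multiply each weight by its input node value and accumulate per output node.

    weights[j][i] is the weight on the connection from input node i to output node j.
    """
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Placeholder weights; only their shapes come from the description.
W1 = [[1, 0], [0, 1], [1, 1]]              # first connection network: 3 x 2 = 6 weights
W2 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]     # second connection network: 3 x 3 = 9 weights
W3 = [[1, 1, 1], [1, 0, -1]]               # third connection network: 2 x 3 = 6 weights

x = [1, 2]            # input layer nodes x1, x2
a = connect(x, W1)    # first hidden layer nodes a1, a2, a3
b = connect(a, W2)    # second hidden layer nodes b1, b2, b3
y = connect(b, W3)    # output layer nodes y1, y2

weight_counts = [sum(len(row) for row in w) for w in (W1, W2, W3)]
```

Each layer can only be computed after the previous layer's node values exist, so `a` must precede `b`, and `b` must precede `y`. This fixed ordering is what gives rise to a predictable sequence of memory accesses.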

From the structure of the artificial neural network model 1300 described above, it can be recognized that the operations of the layers must be performed sequentially. That is, once the structure of an artificial neural network model is fixed, the order of operations for each layer must be determined, and if the operations are performed in a different order, the inference result may become inaccurate. The order of operations, or the order of data flow, that follows from the structure of the artificial neural network model can be defined as the artificial neural network data locality.

To elaborate, although FIG. 2 has been described on a layer-by-layer basis for convenience of description only, examples of the present disclosure are not limited to layer units. Because the processor 110 according to examples of the present disclosure processes data based on the artificial neural network data locality, it may operate in word units or data access request units rather than layer units. Here, the data size of a data access request may be less than or equal to the data size of the corresponding layer.

Referring again to FIG. 3, for example, the processor 110 may generate a data access request on a layer-by-layer basis for the multiplication of the weight values of the first connection network 1320 by the node values of the input layer 1310.

However, depending on the feature map division convolution of the processor 110, the stationary technique of the processing elements, the number of processing elements of the processor, the cache memory capacity of the processor 110, the memory hierarchy of the processor 110, and/or the compiler algorithm of the processor 110, the layer operation on the weight values of the first connection network 1320 and the node values of the input layer 1310 may not be processed as a single data access request, and may instead be processed as a plurality of divided sequential data access requests.

When a data access request to be issued by the processor 110 is divided into a plurality of requests, the order in which the divided data access requests are issued may be determined by the artificial neural network data locality. In this case, the artificial neural network memory controller 120 may be configured to be provided with the artificial neural network data locality and to prepare to supply the data corresponding to the actual data access request that the processor 110 will issue next, that is, the next data access request. Hereinafter, the actual data access request may also be referred to as the next data access request. Alternatively, the artificial neural network memory controller 120 may be configured to predict the artificial neural network data locality and to prepare to supply the data corresponding to the actual data access request that the processor 110 will issue.
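The division of one layer-sized data access request into several sequential requests can be sketched as below. The source does not state how the division is performed, so this Python sketch simply assumes a fixed maximum request size; the function `split_request`, the half-open address range, and the size values are hypothetical.

```python
def split_request(start, end, mode, max_size):
    """Split the address range [start, end) into sequential requests.

    Each resulting request covers at most max_size addresses and keeps
    the operation mode of the original request.
    """
    requests = []
    cursor = start
    while cursor < end:
        chunk_end = min(cursor + max_size, end)
        requests.append((cursor, chunk_end, mode))
        cursor = chunk_end
    return requests


# A hypothetical layer-sized read split into four equal requests.
even_split = split_request(0x0, 0x1000, "read", 0x400)
# A range that does not divide evenly leaves a shorter final request.
uneven_split = split_request(0, 10, "read", 4)
```

Because the divided requests are issued in a fixed order, each one can be treated as its own entry in the locality pattern, and the request following any of them remains predictable.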

도 3에 도시된 인공신경망모델(1300)의 인공신경망 연산 시 프로세서(110)가 생성하는 데이터 접근 요청들과 인공신경망 데이터 지역성에 대해여 설명한다. The data access requests and artificial neural network data locality generated by the processor 110 during the artificial neural network operation of the artificial neural network model 1300 shown in FIG. 3 will be described.

프로세서(110)는 인공신경망모델(1300)의 입력 레이어(1310)는 입력 노드 값들을 읽기 위한 제1 데이터 접근 요청을 생성한다. 제1 데이터 접근 요청은 제1 메모리 주소 값 및 읽기 모드 값을 포함한다. 제1 데이터 접근 요청은 토큰[1]로 저장될 수 있다.The processor 110 generates a first data access request for reading the input node values of the input layer 1310 of the artificial neural network model 1300 . The first data access request includes a first memory address value and a read mode value. The first data access request may be stored as a token [1].

Next, the processor 110 generates a second data access request for reading the weight values of the first connection network 1320 of the artificial neural network model 1300. The second data access request includes a second memory address value and a read mode value. The second data access request may be stored as token [2].

Next, the processor 110 generates a third data access request for storing the node values of the first hidden layer 1330, which are obtained by multiplying the weight values of the first connection network 1320 of the artificial neural network model 1300 by the node values of the input layer 1310 and accumulating the products. The third data access request includes a third memory address value and a write mode value. The third data access request may be stored as token [3].

Next, the processor 110 generates a fourth data access request for reading the node values stored for the first hidden layer 1330 of the artificial neural network model 1300. The fourth data access request includes the third memory address value and a read mode value. The fourth data access request may be stored as token [4].

Next, the processor 110 generates a fifth data access request for reading the weight values of the second connection network 1340 of the artificial neural network model 1300. The fifth data access request includes a fifth memory address value and a read mode value. The fifth data access request may be stored as token [5].

Next, the processor 110 generates a sixth data access request for storing the node values of the second hidden layer 1350, which are obtained by multiplying the weight values of the second connection network 1340 of the artificial neural network model 1300 by the node values of the first hidden layer 1330 and accumulating the products. The sixth data access request includes a sixth memory address value and a write mode value. The sixth data access request may be stored as token [6].

Next, the processor 110 generates a seventh data access request for reading the node values stored for the second hidden layer 1350 of the artificial neural network model 1300. The seventh data access request includes the sixth memory address value and a read mode value. The seventh data access request may be stored as token [7].

Next, the processor 110 generates an eighth data access request for reading the weight values of the third connection network 1360 of the artificial neural network model 1300. The eighth data access request includes an eighth memory address value and a read mode value. The eighth data access request may be stored as token [8].

Next, the processor 110 generates a ninth data access request for storing the node values of the output layer 1370, which are obtained by multiplying the weight values of the third connection network 1360 of the artificial neural network model 1300 by the node values of the second hidden layer 1350 and accumulating the products. The ninth data access request includes a ninth memory address value and a write mode value. The ninth data access request may be stored as token [9]. The node values may be a feature map, an activation map, or the like, but are not limited thereto. The weight values may be a kernel window, but are not limited thereto.
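As a non-limiting illustration, the nine requests described above can be enumerated with a short Python sketch. The `Request` type, the function `requests_for_mlp`, and the use of small integers as symbolic addresses are assumptions made for this sketch and do not appear in the disclosure. The sketch also assumes that every weight read uses the read mode.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    address: int  # symbolic address index, not a real memory address
    mode: str     # "read" or "write"


def requests_for_mlp(num_connections: int) -> List[Request]:
    """List the data access requests of one inference, in order."""
    requests = [Request(1, "read")]  # read the input-layer node values
    for _ in range(num_connections):
        weight_addr = len(requests) + 1
        requests.append(Request(weight_addr, "read"))   # read the weights
        output_addr = len(requests) + 1
        requests.append(Request(output_addr, "write"))  # store the computed node values
        requests.append(Request(output_addr, "read"))   # read them back as the next input
    return requests[:-1]  # the output layer is not read back in this example


# Three connection networks (1320, 1340, 1360) give tokens [1] to [9].
for token, request in enumerate(requests_for_mlp(3), start=1):
    print(f"token [{token}]: address {request.address}, {request.mode}")
```

In this sketch the fourth request reuses the third address and the seventh request reuses the sixth address, matching the description above.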

That is, the processor 110 must generate the first to ninth data access requests for inference of the exemplary artificial neural network model 1300. If the order of the data access requests generated by the processor 110 is shuffled, the artificial neural network data locality of the artificial neural network model 1300 is broken, and the inference result of the artificial neural network model 1300 may contain errors or lose accuracy. This occurs, for example, when the processor 110 computes the second layer before the first layer. Accordingly, the processor 110 may be configured to generate the data access requests sequentially based on the artificial neural network data locality. The artificial neural network memory controller 120 may therefore assume that the processor 110 generates the data access requests sequentially, based on the artificial neural network data locality, during an artificial neural network operation.

However, as described above, each data access request may be reinterpreted at the processor-memory level depending on the hardware characteristics of the processor. The example above assumes that the available capacity of the cache memory of the processor is sufficient, and that the data size of the node values and the data size of the weight values are each smaller than the available capacity of the cache memory. Accordingly, each layer can be described as being processed in units of a single data access request. If the data size of the weight values, feature map, kernel, activation map, or the like of the artificial neural network model is larger than the available capacity of the cache memory of the processor, the corresponding data access request may be divided into a plurality of requests, in which case the artificial neural network data locality of the artificial neural network model may be reconstructed.
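As a non-limiting illustration of such a division, the following Python sketch splits one request into pieces that each fit an available capacity. The function name `split_request`, the byte-based sizes, and the example address are hypothetical. The disclosure does not specify how the division is performed.

```python
from typing import List, Tuple


def split_request(address: int, size: int, capacity: int) -> List[Tuple[int, int]]:
    """Split one request into (address, length) pieces of at most `capacity` bytes."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return [
        (address + offset, min(capacity, size - offset))
        for offset in range(0, size, capacity)
    ]


# A 2500-byte weight block with 1024 bytes of available cache becomes three requests.
print(split_request(address=0x1000, size=2500, capacity=1024))
```

Each piece would then appear as its own data access request, so the sequence observed at the processor-memory level becomes longer than nine entries.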

Because the artificial neural network memory controller 120 according to an example of the present disclosure can generate an artificial neural network data locality pattern, it can actively operate in correspondence with the artificial neural network data locality of the artificial neural network model being processed by the processor.

That is, even if the artificial neural network memory controller 120 does not know the actual artificial neural network data locality of the artificial neural network model being processed by the processor 110, it can substantially analyze the artificial neural network data locality by analyzing the recorded data access requests.

That is, even if structural information of the artificial neural network model being processed by the processor 110 is not provided, the artificial neural network memory controller 120 can substantially analyze the artificial neural network data locality by analyzing the recorded data access requests.

In some examples, the artificial neural network memory controller may be configured to be provided with an artificial neural network data locality pattern generated in advance at the processor-memory level.

FIG. 4 is a schematic diagram illustrating an artificial neural network data locality pattern generated by the artificial neural network memory controller according to an example of the present disclosure by analyzing the artificial neural network model of FIG. 3. FIG. 5 is a schematic diagram illustrating the tokens and identification information corresponding to the artificial neural network data locality pattern of FIG. 4.

The artificial neural network data locality pattern 1400 shown in FIG. 4 is depicted as tokens merely for convenience of description. Referring to FIGS. 1A to 4, the artificial neural network data locality pattern 1400 of the artificial neural network model 1300 is stored as tokens [1-2-3-4-5-6-7-8-9]. FIG. 5 shows the tokens corresponding to the artificial neural network data locality pattern 1400 and the identification information corresponding to each token.

Each data access request may be configured to include identification information. Each data access request may be represented as a token. However, this is merely for convenience of description, and the present disclosure is not limited to tokens.

According to the artificial neural network data locality pattern 1400, the artificial neural network memory controller 120 can sequentially predict the order of the tokens that will be generated after the current token.

For example, the artificial neural network data locality pattern 1400 may be configured as a loop-type pattern in which the sequence continues from the last token to the start token. However, the present disclosure is not limited thereto.

For example, the artificial neural network data locality pattern 1400 may be composed of memory addresses having a repeating loop characteristic. However, the present disclosure is not limited thereto.

For example, the artificial neural network data locality pattern 1400 may be configured to further include identification information that identifies the start and the end of the operation of the artificial neural network model. However, the present disclosure is not limited thereto.

For example, the start and the end of the artificial neural network data locality pattern 1400 may be distinguished by the start token and the last token of the pattern. However, the present disclosure is not limited thereto.

According to the configuration described above, when the processor 110 repeatedly performs inference with a specific artificial neural network model, the start of the next inference can be predicted even after the current inference of that model has ended, because the artificial neural network data locality pattern 1400 is a loop-type pattern.

For example, in the case of an artificial neural network model that recognizes objects in images from the front camera of an autonomous vehicle at 30 IPS (inferences per second), the same inference is repeated continuously at a specific period. Therefore, using the loop-type artificial neural network data locality pattern described above makes it possible to predict the repeated data access requests.

To elaborate on the identification information with an example, token [3] and token [4] of the artificial neural network data locality pattern 1400 have the same memory address value but different operation modes. Accordingly, the artificial neural network memory controller 120 may be configured to classify the third data access request and the fourth data access request as different tokens, because their operation modes differ even though their memory address values are the same. However, the identification information of the examples of the present disclosure is not limited to the operation mode, and the artificial neural network data locality pattern may be predicted using only the memory address value.
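As a non-limiting illustration, the following Python sketch assigns tokens from identification information. The function `tokenize`, its `use_mode` flag, and the address `0x300` are assumptions made for this sketch. Token numbers are assigned in order of first appearance, which is one possible convention among many.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Request = Tuple[int, str]  # (memory address value, operation mode)


def tokenize(requests: Iterable[Request], use_mode: bool = True) -> List[int]:
    """Assign token numbers in order of first appearance of each identifier."""
    table: Dict[Hashable, int] = {}
    tokens: List[int] = []
    for address, mode in requests:
        # The identifier is either (address, mode) or the address alone.
        key = (address, mode) if use_mode else address
        if key not in table:
            table[key] = len(table) + 1
        tokens.append(table[key])
    return tokens


same_address = [(0x300, "write"), (0x300, "read")]
print(tokenize(same_address, use_mode=True))   # mode distinguishes the two requests
print(tokenize(same_address, use_mode=False))  # address only: both share one token
```

When only the memory address value is used, a write and a later read of the same address map to one token, so the resulting pattern is shorter than the nine-token pattern described above.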

The artificial neural network memory controller 120 may be configured to generate a corresponding predicted data access request (that is, an advance data access request) based on the artificial neural network data locality pattern 1400.

The artificial neural network memory controller 120 may be configured to further generate predicted data access requests sequentially based on the artificial neural network data locality pattern 1400.

According to the configuration described above, when the processor 110 generates a specific data access request included in the artificial neural network data locality pattern 1400, the artificial neural network memory controller 120 can sequentially predict at least one of the data access requests that follow it. For example, when the processor 110 generates token [1], the artificial neural network memory controller 120 can predict that the data access request corresponding to token [2] will be generated next. For example, when the processor 110 generates token [3], the artificial neural network memory controller 120 can predict that the data access request corresponding to token [4] will be generated next. For example, when the processor 110 generates token [1], the artificial neural network memory controller 120 can predict that the corresponding data access requests will be generated in the order of tokens [2-3-4-5-6-7-8-9].

To elaborate, when the processor 110 processes a plurality of artificial neural network models, an unpredicted data locality pattern may be interposed between the tokens of the artificial neural network data locality pattern 1400. For example, a new token [41] may intrude after token [2]. Even in this case, the artificial neural network memory controller 120 can predict that the processor 110 will generate token [3] after token [2], and can prepare for it.

For example, when the processor 110 generates token [9], the artificial neural network memory controller 120 can predict that the processor 110 will generate token [1].
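As a non-limiting illustration of the predictions described above, the following Python sketch tracks the position within a loop-type pattern. The class `LoopTracker` and its method `observe` are hypothetical names. The sketch assumes that each token appears only once in the pattern and that an unknown token simply leaves the tracked position unchanged.

```python
from typing import List, Optional


class LoopTracker:
    """Tracks the position in a looping token pattern (hypothetical helper)."""

    def __init__(self, pattern: List[int]) -> None:
        self.pattern = pattern
        self.position: Optional[int] = None  # index of the last matched token

    def observe(self, token: int) -> Optional[int]:
        """Record an actual token and return the predicted next token."""
        if token in self.pattern:
            self.position = self.pattern.index(token)
        # A token outside the pattern, such as [41], leaves the position unchanged.
        if self.position is None:
            return None
        # The modulo wraps the last token back to the start token.
        return self.pattern[(self.position + 1) % len(self.pattern)]


tracker = LoopTracker([1, 2, 3, 4, 5, 6, 7, 8, 9])
for actual in [1, 2, 41, 3, 9]:
    print(f"actual [{actual}] -> predicted [{tracker.observe(actual)}]")
```

After the intruding token [41], the tracker still predicts token [3], and after token [9] it predicts token [1].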

FIG. 6 is a schematic diagram illustrating a predicted data access request generated by the artificial neural network memory controller according to an example of the present disclosure based on the artificial neural network data locality pattern, together with the actual data access request.

The artificial neural network memory controller 120 according to an example of the present disclosure may be configured to use the artificial neural network data locality pattern to predict the actual data access request that the processor 110 will issue next, and to generate a predicted data access request accordingly.

Referring to FIG. 6, the data access request token is the token corresponding to a data access request that the artificial neural network memory controller 120 has received from the processor 110. The predicted data access request token is the token corresponding to the data access request that the artificial neural network memory controller 120 predicts in advance, based on the artificial neural network data locality pattern, as the request the processor 110 will issue next. The actual data access request token is the data access request token that the processor 110 actually generates after the predicted data access request token has been generated. However, the tokens of the present disclosure are merely examples for convenience of description, and the present disclosure is not limited to tokens.

A data access request and a predicted data access request may each correspond to a data access request token. In this case, a data access request and a predicted data access request that match a specific data access request token may be configured to have the same memory address. That is, the data access request and the predicted data access request may be configured to include the same memory address.

For example, when the data access request token is [3] and the predicted data access request token is [3], the memory address values of the two tokens may be the same. Likewise, the data access request and the predicted data access request may be configured to include the same operation mode value. For example, when the data access request token is [3] and the predicted data access request token is [3], the operation mode values of the two tokens may be the same.

Referring to FIG. 6, when the processor 110 generates the data access request corresponding to token [1], the artificial neural network memory controller 120 generates the predicted data access request corresponding to token [2]. After the predicted data access request has been generated, the processor 110 generates the actual data access request corresponding to token [2]. The artificial neural network memory controller 120 is configured to determine whether the predicted data access request correctly predicted the actual data access request. Because the tokens corresponding to the predicted data access request and the actual data access request are the same, the artificial neural network memory controller 120 may determine that the pattern matches.

As a further example, when the processor 110 generates the data access request corresponding to token [2], the artificial neural network memory controller 120 generates the predicted data access request corresponding to token [3]. After the predicted data access request has been generated, the processor 110 generates the actual data access request corresponding to token [3]. The artificial neural network memory controller 120 is configured to determine whether the predicted data access request correctly predicted the actual data access request. Because the tokens corresponding to the predicted data access request and the actual data access request are the same, the artificial neural network memory controller 120 may determine that the pattern matches.

As yet another example, when the processor 110 generates the data access request corresponding to token [9], the artificial neural network memory controller 120 generates the predicted data access request corresponding to token [1]. After the predicted data access request has been generated, the processor 110 generates the actual data access request corresponding to token [1]. The artificial neural network memory controller 120 is configured to check whether the predicted data access request correctly predicted the actual data access request generated afterwards. Because the tokens corresponding to the predicted data access request and the actual data access request are the same, the artificial neural network memory controller 120 may determine that the pattern matches.

When the processor 110 generates an actual data access request after the artificial neural network memory controller 120 has generated a predicted data access request, the artificial neural network memory controller 120 may be configured to determine whether the predicted data access request and the actual data access request are the same request.

According to the configuration described above, the artificial neural network memory system 100 can detect a change in the artificial neural network data locality of the artificial neural network model being processed by the processor 110. Accordingly, even if the artificial neural network model changes, the artificial neural network memory controller 120 can analyze the changed artificial neural network data locality.

When the artificial neural network memory controller 120 determines that the predicted data access request and the actual data access request are the same, the artificial neural network memory controller 120 may be configured to maintain the artificial neural network data locality pattern.

According to the configuration described above, the artificial neural network memory system 100 can detect that the artificial neural network model processed by the processor 110 is being used repeatedly, and can therefore prepare or provide the data requested by the processor 110 more quickly.

When the artificial neural network memory controller 120 determines that the predicted data access request and the actual data access request are different, the artificial neural network memory controller 120 may be configured to update the artificial neural network data locality pattern or to further generate a new artificial neural network data locality pattern.

According to the configuration described above, the artificial neural network memory system 100 can detect that the artificial neural network model processed by the processor 110 has changed, and can generate predicted data access requests corresponding to the changed artificial neural network model.
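As a non-limiting illustration of the comparison between a predicted and an actual data access request, the following Python sketch keeps the pattern on a match and records the actual token on a mismatch. The class `PredictionChecker` and its `unmatched` list are hypothetical. How the recorded tokens are turned into an updated or new pattern is not specified here and is left open in the sketch.

```python
from typing import List


class PredictionChecker:
    """Compares predicted tokens with actual tokens (all names hypothetical)."""

    def __init__(self, pattern: List[int]) -> None:
        self.pattern = list(pattern)    # current data locality pattern
        self.unmatched: List[int] = []  # actual tokens that broke a prediction

    def check(self, predicted: int, actual: int) -> bool:
        """Return True on a match; on a mismatch, record the actual token."""
        if predicted == actual:
            return True                 # the pattern is kept as it is
        self.unmatched.append(actual)   # material for an updated or new pattern
        return False


checker = PredictionChecker([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(checker.check(predicted=2, actual=2))   # same token: pattern maintained
print(checker.check(predicted=3, actual=41))  # different token: mismatch recorded
print(checker.unmatched)
```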

In some examples, the artificial neural network memory controller may be configured to generate a series of consecutive predicted data access requests.

For example, when the data access request token is [2], the predicted data access request generated by the artificial neural network memory controller may be the data access request corresponding to token [3]. However, the present disclosure is not limited thereto. For example, the predicted data access requests generated by the artificial neural network memory controller may be a plurality of data access requests corresponding to tokens [3-4], or a plurality of data access requests corresponding to tokens [3-4-5-6].

According to the configuration described above, the artificial neural network memory controller can generate, based on the artificial neural network data locality pattern, predicted data access requests that predict the entire order of the continuously repeated data access requests.

According to the configuration described above, the artificial neural network memory controller can generate, based on the artificial neural network data locality pattern, predicted data access requests that predict in advance the order of at least some of the data access requests.
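As a non-limiting illustration, a series of consecutive predicted tokens can be produced by stepping through the loop-type pattern. The function `predict_ahead` and its `depth` parameter are hypothetical names, and the sketch assumes that each token appears only once in the pattern.

```python
from typing import List


def predict_ahead(pattern: List[int], current: int, depth: int) -> List[int]:
    """Return the next `depth` tokens after `current`, wrapping around the loop."""
    start = pattern.index(current) + 1
    return [pattern[(start + i) % len(pattern)] for i in range(depth)]


pattern = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(predict_ahead(pattern, current=2, depth=1))  # a single predicted request
print(predict_ahead(pattern, current=2, depth=4))  # several consecutive predicted requests
print(predict_ahead(pattern, current=8, depth=3))  # the prediction continues past the last token
```

A larger `depth` prepares more data in advance, while a smaller one predicts only part of the sequence. The disclosure does not fix a particular depth.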

FIG. 7 is a flowchart schematically illustrating the operation of the artificial neural network memory controller according to an example of the present disclosure.

Referring to FIG. 7, for artificial neural network operation processing, the processor 110 may be configured to generate data access requests corresponding to the artificial neural network model based on the artificial neural network data locality.

The artificial neural network memory controller 120 sequentially records the data access requests generated by the processor 110 to generate an artificial neural network data locality pattern (S710).
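As a non-limiting illustration, one conceivable way to derive a loop-type pattern from sequentially recorded tokens is to search for the shortest repeating prefix. This search is an assumption made for this sketch and is not stated in the disclosure. The function name `find_loop` is hypothetical.

```python
from typing import List, Optional


def find_loop(log: List[int]) -> Optional[List[int]]:
    """Return the shortest prefix that, when repeated, reproduces the whole log."""
    # Only periods seen at least twice in the log are considered.
    for period in range(1, len(log) // 2 + 1):
        if all(log[i] == log[i % period] for i in range(len(log))):
            return log[:period]
    return None  # no repetition observed yet


recorded = [1, 2, 3, 4, 5, 6, 7, 8, 9] * 2 + [1, 2, 3]
print(find_loop(recorded))   # the nine-token pattern
print(find_loop([1, 2, 3]))  # too short to show a repetition
```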

The artificial neural network memory controller 120 may be configured to compare the generated artificial neural network data locality pattern with the data access request generated by the processor 110, and to generate a predicted data access request that predicts the actual data access request that the processor 110 will generate.

The artificial neural network memory system 100 according to an example of the present disclosure includes at least one processor 110 configured to generate data access requests corresponding to an artificial neural network operation, and sequentially records the data access requests to generate the artificial neural network data locality pattern of the artificial neural network operation (S720). The artificial neural network memory system 100 may be configured to include at least one artificial neural network memory controller 120 configured to generate, based on the artificial neural network data locality pattern, a predicted data access request that predicts the actual data access request following the data access request generated by the at least one processor 110.

That is, the at least one artificial neural network memory controller 120 generates the predicted data access request before the actual data access request is generated (S730).

That is, the at least one processor 110 may be configured to transmit a data access request to the at least one artificial neural network memory controller 120, and the at least one artificial neural network memory controller 120 may be configured to output a predicted data access request in response to the data access request.

The artificial neural network memory system 100 according to an example of the present disclosure may be configured to include at least one processor 110 configured to generate data access requests corresponding to an artificial neural network operation, and at least one artificial neural network memory controller 120. The at least one artificial neural network memory controller 120 is configured to sequentially record the data access requests generated by the at least one processor 110 to generate an artificial neural network data locality pattern of the artificial neural network operation, and to generate, based on the artificial neural network data locality pattern, a predicted data access request that predicts the actual data access request to be generated by the at least one processor 110.

According to the above configuration, the artificial neural network memory controller 120 can predict in advance, based on the artificial neural network data locality pattern, the actual data access request that will be generated by the artificial neural network model being processed by the processor 110. It therefore has the advantage of being able to prepare the corresponding data before the processor 110 requests it.

The artificial neural network memory controller 120 may be configured to compare the predicted data access request with the actual data access request that the processor 110 generates after the predicted data access request has been generated, and thereby determine whether the artificial neural network data locality pattern matches (S740).

According to the above configuration, the artificial neural network memory controller 120 can generate the predicted data access request before the actual data access request is generated, and can thus prepare to provide the data in advance. Accordingly, the artificial neural network memory controller 120 can substantially eliminate or reduce the latency that may occur when providing data to the processor 110.
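The flow of steps S720 to S740 can be sketched in Python as follows. This is only a minimal illustration and not the disclosed implementation. The class name `LocalityPredictor`, the representation of a request as an `(address, mode)` tuple, and the rule that a pattern is considered found when the first recorded request reappears are all assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Assumed representation of a request: (memory address, operation mode).
Request = Tuple[int, str]


class LocalityPredictor:
    """Toy controller model: records requests and predicts the next one."""

    def __init__(self) -> None:
        self.history: List[Request] = []  # sequentially recorded requests
        self.pattern: List[Request] = []  # locality pattern, empty until found
        self.position = 0                 # pattern index of the latest request

    def observe(self, request: Request) -> Optional[Request]:
        """Record an actual request; return a predicted next request, if any."""
        if not self.pattern:
            # S720: record requests. Assumption: the sequence repeats once
            # the first recorded request is seen again.
            if self.history and request == self.history[0]:
                self.pattern = list(self.history)
                self.position = 0
            else:
                self.history.append(request)
                return None
        else:
            # S740: compare the actual request with the expected entry.
            expected = self.pattern[(self.position + 1) % len(self.pattern)]
            if request != expected:
                # Mismatch: drop the pattern and start recording again.
                self.pattern, self.history, self.position = [], [request], 0
                return None
            self.position = (self.position + 1) % len(self.pattern)
        # S730: predict the request that follows the current one.
        return self.pattern[(self.position + 1) % len(self.pattern)]
```

In this sketch no prediction is returned while the first pass over the sequence is being recorded. From the second pass onward, every observed request yields the request expected to follow it.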

FIG. 8 is a schematic block diagram illustrating an artificial neural network memory system according to another example of the present disclosure.

Referring to FIG. 8, the artificial neural network memory system 200 may be configured to include a processor 210, an artificial neural network memory controller 220, and a memory 230.

The artificial neural network memory system 200 according to another example of the present disclosure is substantially the same as the artificial neural network memory system 100 according to an example of the present disclosure, except that the artificial neural network memory system 200 further includes the memory 230. Redundant descriptions may therefore be omitted below merely for convenience of description.

The artificial neural network memory system 200 according to another example of the present disclosure includes the memory 230 configured to communicate with the artificial neural network memory controller 220, and the memory 230 may be configured to operate in response to memory access requests output from the artificial neural network memory controller 220.

The processor 210 may be configured to communicate with the artificial neural network memory controller 220. The processor 210 may be configured to generate data access requests to be transmitted to the artificial neural network memory controller 220. The data access requests may be generated based on the artificial neural network data locality of the artificial neural network model being processed. The processor 210 is configured to receive the data corresponding to a data access request from the artificial neural network memory controller 220.

The artificial neural network memory controller 220 may be configured to receive the data access requests generated by the processor 210. The artificial neural network memory controller 220 may be configured to generate an artificial neural network data locality pattern by analyzing the artificial neural network data locality of the artificial neural network model being processed by the processor 210.

The artificial neural network memory controller 220 may be configured to control the memory 230 by generating memory access requests. The artificial neural network memory controller 220 may be configured to generate a memory access request corresponding to a data access request, that is, to the data access request generated by the processor 210. For example, when the artificial neural network memory controller 220 has not generated an artificial neural network data locality pattern, it may be configured to generate the memory access request based on the data access request generated by the processor 210. In this case, the memory access request may be configured to include the memory address value and the operation mode value from among the identification information included in the data access request.

The artificial neural network memory controller 220 may be configured to generate a memory access request corresponding to a predicted data access request. That is, the artificial neural network memory controller 220 may be configured to generate the memory access request based on the predicted data access request generated from the artificial neural network data locality pattern. For example, when the artificial neural network memory controller 220 has generated an artificial neural network data locality pattern, it may be configured to generate the memory access request based on the predicted data access request.

According to the above configuration, the artificial neural network memory controller 220 can exchange data with the memory 230 through memory access requests. When a memory access request is generated based on a predicted data access request, the artificial neural network memory system 200 can provide data to the processor 210 more quickly.

The artificial neural network memory controller 220 may be configured to generate a memory access request based on one of the data access request generated by the processor 210 and the predicted data access request generated by the artificial neural network memory controller 220. That is, the memory access request generated by the artificial neural network memory controller 220 may be selectively generated based on either the data access request or the predicted data access request.

The artificial neural network memory controller 220 may be configured to generate a memory access request that includes at least part of the identification information included in the data access request and the predicted data access request. For example, the data access request generated by the processor 210 may include a memory address value and an operation mode value. In this case, the memory access request generated by the artificial neural network memory controller 220 may be configured to include the memory address value and the operation mode value of the corresponding data access request.

That is, each of the data access request, the predicted data access request, and the memory access request may be configured to include a corresponding memory address value and operation mode value. The operation mode may be configured to include a read mode and a write mode. For example, the memory access request generated by the artificial neural network memory controller 220 may have the same data structure as the data access request or the predicted data access request. Therefore, the memory 230 can perform the operation specified by the memory access request as instructed by the artificial neural network memory controller 220, without distinguishing between data access requests and predicted data access requests.

According to the above configuration, the memory 230 can operate regardless of whether the memory access request generated by the artificial neural network memory controller 220 is based on a data access request or on a predicted data access request. Therefore, even though the artificial neural network memory controller 220 operates based on the artificial neural network data locality, it can operate compatibly with various types of memory.

The artificial neural network memory controller 220 transmits the memory access request to the memory 230, and the memory 230 is configured to perform the memory operation corresponding to the memory access request.

The memory according to examples of the present disclosure may be implemented in various forms. The memory may be implemented as volatile memory or non-volatile memory.

The volatile memory may include dynamic RAM (DRAM), static RAM (SRAM), and the like. The non-volatile memory may include programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), flash memory, ferroelectric RAM (FRAM), magnetic RAM (MRAM), phase change RAM, and the like. However, the present disclosure is not limited thereto.

The memory 230 may be configured to store at least one of inference data, weight data, and feature map data of the artificial neural network model being processed by the processor 210. The inference data may be an input signal of the artificial neural network model.

The memory 230 may be configured to receive memory access requests from the artificial neural network memory controller 220. The memory 230 may be configured to perform the memory operation corresponding to a received memory access request. The operation mode that controls the memory operation may include a read mode or a write mode.

For example, when the operation mode of the received memory access request is the write mode, the memory 230 may store the data received from the artificial neural network memory controller 220 at the corresponding memory address.

For example, when the operation mode of the received memory access request is the read mode, the memory 230 may transmit the data stored at the corresponding memory address to the artificial neural network memory controller 220. The artificial neural network memory controller 220 may be configured to forward the received data to the processor 210.
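The shared request format and the read and write handling described above can be sketched as follows. This is a hypothetical illustration only. The names `AccessRequest`, `make_memory_request`, and `ToyMemory` do not appear in the disclosure, and the two-field layout is an assumed minimum (a real request may carry further identification information).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AccessRequest:
    """Layout assumed to be shared by data, predicted, and memory access requests."""
    address: int  # memory address value
    mode: str     # operation mode: "read" or "write"


def make_memory_request(actual: AccessRequest,
                        predicted: Optional[AccessRequest]) -> AccessRequest:
    """Build the memory access request from one of the two possible sources.

    If a predicted request exists (a locality pattern has been generated),
    it is used. Otherwise the processor's request is forwarded as it is.
    """
    source = predicted if predicted is not None else actual
    return AccessRequest(source.address, source.mode)


class ToyMemory:
    """Memory that serves requests without knowing how they were derived."""

    def __init__(self) -> None:
        self.cells: Dict[int, Optional[int]] = {}

    def handle(self, request: AccessRequest,
               data: Optional[int] = None) -> Optional[int]:
        if request.mode == "write":
            # Write mode: store the supplied data at the address.
            self.cells[request.address] = data
            return None
        if request.mode == "read":
            # Read mode: return the stored data (None if never written).
            return self.cells.get(request.address)
        raise ValueError(f"unknown operation mode: {request.mode}")
```

Because `ToyMemory.handle` inspects only the address and the mode, it behaves identically for requests derived from actual and from predicted data access requests, which mirrors the compatibility argument made above.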

The memory 230 may have latency. The latency of the memory 230 may refer to the delay that occurs when a memory access request from the artificial neural network memory controller 220 is processed. That is, when the memory 230 receives a memory access request from the artificial neural network memory controller 220, the requested data is actually output from the memory 230 only after a latency of a certain number of clock cycles.

To process a memory access request, the memory 230 accesses the memory address value included in the memory access request. Accessing the memory address takes time, and this time may be defined as the memory latency. For example, the CAS latency of a DDR4 SDRAM memory is about 10 ns. If no data is supplied to the processor 210 during the latency, the processor 210 may enter an idle (IDLE) state and be unable to perform actual operations.

More specifically, in the case of DRAM, which is one type of the memory 230, several clocks are required to activate the word line and the bit line according to the row address of the memory 230, several clocks to activate the column line, and several clocks for the data to pass through the path that transfers it out of the memory 230. In the case of NAND flash, the unit activated at one time is large, so several additional clocks may be required to locate the data of the required address within that unit.

The memory 230 may have a bandwidth. The data transfer rate of the memory 230 may be defined as the memory bandwidth. For example, the bandwidth of a DDR4 SDRAM memory is about 4 GBytes/sec. The higher the memory bandwidth, the faster the memory 230 can transfer data to the processor 210.

That is, the processing speed of the artificial neural network memory system 200 is affected relatively more by the latency incurred in supplying the data to be processed by the processor 210 and by the bandwidth performance of the memory 230 than by the processing performance of the processor 210.

To elaborate, memory bandwidth is gradually increasing, but memory latency is improving relatively slowly compared with bandwidth. In particular, because the latency of the memory 230 is incurred every time a memory access request is generated, frequent memory access requests can be a major cause of reduced artificial neural network processing speed.

That is, even if the operation processing speed of the processor 210 is high, a delay in fetching the data required for an operation may leave the processor 210 in a standby state in which it performs no operation. In this case, the effective operation processing speed of the processor 210 may be reduced.

Accordingly, the artificial neural network memory system according to examples of the present disclosure may be configured to improve the bandwidth and/or latency of the memory 230.

FIG. 9 is a schematic diagram illustrating an operation of a memory system according to a comparative example of the present disclosure.

Referring to FIG. 9, the processor generates a data access request, and a conventional memory system may transmit a memory access request corresponding to the data access request to the memory. Because the memory has latency, the processor receives the requested data from the memory only after waiting for the latency.

For example, the conventional memory system receives the data access request [1] generated by the processor and transmits the memory access request [1'] corresponding to the data access request [1] to the memory. The memory may deliver the data [1''] to the memory system after the latency. Thus, for every data access request, the processing time of the processor may be delayed by the latency of the memory, and the artificial neural network inference operation may be slowed by the same amount. In particular, the more data access requests the processor generates, the longer the artificial neural network inference operation of the conventional memory system may be delayed.

FIG. 10 is a schematic diagram illustrating a memory system according to another example of the present disclosure.

Referring to FIG. 10, the processor 210 generates a data access request [1], and the artificial neural network memory controller 220 may transmit to the memory 230 a memory access request corresponding to the predicted data access request generated based on the artificial neural network data locality pattern. Even though the memory 230 has latency, the memory access request corresponding to the predicted data access request has already been generated by the artificial neural network memory controller 220. Therefore, when the processor 210 generates the actual data access request, the artificial neural network memory controller 220 can provide the requested data to the processor 210 immediately.

For example, the artificial neural network memory controller 220 receives the data access request [1] generated by the processor 210, generates the predicted data access request [2], and transmits the memory access request [2'] corresponding to the predicted data access request [2] to the memory 230. The memory 230 may deliver the data [2''] to the artificial neural network memory controller 220 after the latency. However, the data [2''] provided by the memory 230 corresponds to the memory access request [2'], which is based on the predicted data access request [2]. Therefore, when the processor 210 generates the actual data access request [2], the artificial neural network memory controller 220 can provide the data [2''] to the processor 210 immediately.

If the time between the memory access request based on the predicted data access request and the actual data access request is equal to or greater than the latency of the memory 230, the artificial neural network memory controller 220 can provide the data to the processor 210 as soon as it receives the actual data access request from the processor 210. In this case, the artificial neural network memory controller 220 can substantially eliminate the latency of the memory 230.

In other words, when the memory access request based on the predicted data access request is transmitted to the memory 230, the latency of the memory 230 may be less than or equal to the time from the generation of the predicted data access request to the generation of the actual data access request. In this case, the artificial neural network memory controller 220 can provide the data without latency as soon as the processor 210 generates the actual data access request.

Even if the time between the memory access request based on the predicted data access request and the actual data access request is less than the latency of the memory 230, the latency of the memory 230 can still be substantially reduced by the time between the memory access request and the actual data access request.
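The three cases above reduce to a single relation: the latency still visible to the processor is the memory latency minus the lead time of the prefetch, floored at zero. The following Python sketch states this relation under simplifying assumptions (a fixed memory latency and no other queuing effects). The function names and the sample values are illustrative only.

```python
from typing import Iterable


def visible_latency(memory_latency_ns: float, lead_time_ns: float) -> float:
    """Latency still seen by the processor after prefetching.

    lead_time_ns is the time between issuing the memory access request for
    the predicted request and the arrival of the actual data access request.
    A lead time of 0 models a conventional system without prediction.
    """
    if memory_latency_ns < 0 or lead_time_ns < 0:
        raise ValueError("times must be non-negative")
    return max(0.0, memory_latency_ns - lead_time_ns)


def total_stall_ns(memory_latency_ns: float,
                   lead_times_ns: Iterable[float]) -> float:
    """Sum of the visible latency over a sequence of requests."""
    return sum(visible_latency(memory_latency_ns, t) for t in lead_times_ns)
```

For instance, with an assumed latency of 10 ns, four requests with no prediction stall for 40 ns in total, whereas lead times of 0, 4, 10, and 25 ns leave 10 + 6 + 0 + 0 = 16 ns of visible stall.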

According to the above configuration, the artificial neural network memory controller 220 can substantially eliminate or reduce the latency of the data to be provided to the processor 210.

In some examples, the artificial neural network memory controller of the artificial neural network memory system may be configured to measure the latency of the memory or to receive the latency value of the memory from the memory.

According to the above configuration, the artificial neural network memory controller may be configured to determine, based on the latency of the memory, when to generate the memory access request that is based on the predicted data access request. Accordingly, the artificial neural network memory controller can generate the memory access request based on the predicted data access request at a time that substantially minimizes the latency of the memory.
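One way to picture the timing decision above is the sketch below. It is an assumption-laden illustration rather than the disclosed method. The class `LatencyTracker` is hypothetical, latency is measured as the time from issuing a memory access request to the data being output, and the worst observed value is used as a deliberately conservative estimate.

```python
from typing import List


class LatencyTracker:
    """Keeps measured memory latencies and derives a prefetch issue time."""

    def __init__(self) -> None:
        self.samples_ns: List[float] = []

    def record(self, issued_ns: float, data_ready_ns: float) -> None:
        """Store one measurement: time from memory access request to data output."""
        if data_ready_ns < issued_ns:
            raise ValueError("data cannot be ready before the request is issued")
        self.samples_ns.append(data_ready_ns - issued_ns)

    def estimate_ns(self) -> float:
        """Conservative estimate: the worst latency observed so far (an assumption)."""
        if not self.samples_ns:
            raise ValueError("no latency measurements recorded")
        return max(self.samples_ns)

    def issue_time_ns(self, expected_request_ns: float, now_ns: float) -> float:
        """Latest issue time that still hides the estimated latency, never in the past."""
        return max(now_ns, expected_request_ns - self.estimate_ns())
```

If the actual request is expected sooner than one latency from now, the sketch returns the current time, which corresponds to the partial reduction case rather than full elimination.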

In some examples, the memory of the artificial neural network memory system may be a memory configured to include a refresh function capable of refreshing the voltage of the memory cells (i.e., the memory cell array). The artificial neural network memory controller may be configured to selectively control the refresh of the memory address region of the memory that corresponds to the memory access request corresponding to the predicted data access request. For example, the memory may be a SAM or DRAM including a refresh function.

In DRAM, if the voltage of a memory cell is not refreshed, the memory cell gradually discharges and the stored data may be lost. The voltage of the memory cells must therefore be refreshed at specific intervals. If the refresh timing overlaps with a memory access request from the artificial neural network memory controller, the artificial neural network memory system may be configured to advance or delay the timing of refreshing the voltage of the memory cells.

The artificial neural network memory system can predict or calculate the generation timing of memory access requests based on the artificial neural network data locality pattern. Accordingly, the artificial neural network memory system may be configured to restrict the voltage refresh of the memory cells during a memory access request operation.

To elaborate, because the inference operation of an artificial neural network operates on the concept of accuracy, even if the voltage refresh of the memory cells is delayed and some of the stored data is lost, the resulting degradation of inference accuracy may be substantially negligible.

According to the above configuration, the artificial neural network memory system can receive the data for a memory access request from the memory by adjusting the voltage refresh period of the memory cells. Accordingly, the artificial neural network memory system can mitigate the reduction in artificial neural network operation speed caused by the voltage refresh of the memory cells without substantially degrading inference accuracy.
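The idea of advancing or delaying a refresh so that it does not collide with predicted accesses can be sketched as a small scheduling function. This is a hypothetical model, not a real DRAM controller interface. The function `schedule_refresh`, the representation of predicted accesses as `(start, end)` windows, and the `max_shift` bound on how far a refresh may be moved are all assumptions introduced for the example.

```python
from typing import List, Tuple

# Assumed representation: (start, end) of a predicted memory access, in ns.
Window = Tuple[float, float]


def overlaps(start: float, end: float, window: Window) -> bool:
    """True if the half-open interval [start, end) intersects the window."""
    return start < window[1] and window[0] < end


def schedule_refresh(nominal_start: float, duration: float,
                     accesses: List[Window], max_shift: float) -> float:
    """Return a refresh start time that avoids the predicted accesses.

    The nominal time is kept if it is free. Otherwise the closest slot that
    ends right before an access (advance) or starts right after one (delay)
    is chosen, provided it lies within max_shift of the nominal time.
    If no such slot exists, the nominal time is returned unchanged.
    """
    def is_free(start: float) -> bool:
        return not any(overlaps(start, start + duration, w) for w in accesses)

    if is_free(nominal_start):
        return nominal_start
    candidates = [w[0] - duration for w in accesses] + [w[1] for w in accesses]
    valid = [c for c in candidates
             if c >= 0 and abs(c - nominal_start) <= max_shift and is_free(c)]
    if not valid:
        return nominal_start
    return min(valid, key=lambda c: abs(c - nominal_start))
```

With a predicted access from 100 ns to 150 ns and a 20 ns refresh, a nominal start of 120 ns is delayed to 150 ns, while a nominal start of 105 ns is advanced to 80 ns. A small `max_shift` models the constraint that a refresh cannot be postponed indefinitely.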

In some examples, the memory of the artificial neural network memory system may be configured to further include a precharge function capable of charging the global bit line of the memory to a specific voltage. In this case, the artificial neural network memory controller may be configured to selectively provide a precharge to the memory address region of the memory that corresponds to the memory access request corresponding to the predicted data access request.

In some examples, the artificial neural network memory controller may be configured to precharge, or to delay the precharge of, the bit line of the memory on which the memory operation corresponding to the data access request predicted based on the artificial neural network data locality pattern will be performed.

In general, a memory performs a precharge operation when it receives a memory access request and performs a read operation or a write operation. When one memory operation is completed, signals remain on the bit line and on each data input/output line on which the data read or write operation was performed, and these lines must be precharged to a preset level before the next memory operation can be performed smoothly. However, because the time required for precharging is considerable, when the generation of a memory access request overlaps with the precharge timing, the memory operation may be delayed by the precharge time. Accordingly, the processing of the data access request issued by the processor may be delayed.

Based on the artificial neural network data locality pattern, the artificial neural network memory controller can predict that a memory operation will be performed on a specific bit line of a specific memory in a specific order. Accordingly, the artificial neural network memory controller can advance or delay the precharge timing so that it does not overlap with the time at which the memory operation is performed on the specific bit line.

To elaborate, because the inference operation of an artificial neural network model operates on the notion of accuracy, even if some of the stored data is lost because the precharge is delayed, the resulting decrease in inference accuracy may be substantially negligible.

To elaborate further, an artificial neural network is a mathematical model that mimics the neural network of a biological brain. Human nerve cells, called neurons, exchange information through junctions between nerve cells called synapses. The exchange of information between individual nerve cells is very simple, but a large number of nerve cells together produce intelligence. In such a structure, even if a few nerve cells transmit incorrect information, the overall information is not greatly affected, so the structure is highly robust to small errors. That is, owing to this characteristic, even if the precharge and refresh functions of the memory that stores the data of the artificial neural network model are selectively restricted, the accuracy of the artificial neural network model may not be substantially affected, and the memory latency caused by precharge or refresh can be reduced.

According to the above-described configuration, the artificial neural network memory system can mitigate the decrease in artificial neural network operation speed caused by precharge without substantially lowering the inference accuracy.

In some examples, the artificial neural network memory controller may be configured to control the refresh function and the precharge function of the memory individually, based on the artificial neural network data locality pattern.

FIG. 11 is a schematic block diagram illustrating an artificial neural network memory system according to yet another example of the present disclosure.

Referring to FIG. 11, the artificial neural network memory system 300 may be configured to include a processor 310, an artificial neural network memory controller 320 including a cache memory 322, and a memory 330.

The memory 330 that may be included in the various examples of the present disclosure may be a memory specialized for artificial neural network operations and may be referred to as a sequential access memory (SAM). However, the present disclosure is not limited thereto, and the memory of the various examples of the present disclosure may refer to any memory specialized for artificial neural networks that can be controlled based on artificial neural network data locality.

The artificial neural network memory system 300 according to yet another example of the present disclosure is substantially the same as the artificial neural network memory system 200 according to another example of the present disclosure, except that the artificial neural network memory system 300 further includes the cache memory 322. Redundant description is therefore omitted below merely for convenience of description.

The artificial neural network memory system 300 according to yet another example of the present disclosure may be configured to include the artificial neural network memory controller 320, which includes the cache memory 322 configured to store data transmitted by the memory 330 in response to a memory access request that is based on a predicted data access request.

According to the above-described configuration, the artificial neural network memory controller 320 may read from the memory 330 the data responsive to the memory access request that is based on the predicted data access request, and may store that data in the cache memory 322. Accordingly, when the processor 310 generates the actual data access request, the artificial neural network memory controller 320 can provide the data stored in the cache memory 322 to the processor 310 immediately.

The latency of the cache memory 322 is much shorter than the latency of the memory 330. The bandwidth of the cache memory 322 is higher than the bandwidth of the memory 330.

Accordingly, the artificial neural network model processing performance of the artificial neural network memory system 300 including the cache memory 322 according to yet another example of the present disclosure may be better than that of the artificial neural network memory system 200 according to another example of the present disclosure.

The artificial neural network memory system 300 according to yet another example of the present disclosure is described with reference again to the artificial neural network model 1300 of FIG. 3.

The artificial neural network model 1300 may be compiled by a specific compiler and executed on the processor 310. The compiler may also be configured to provide the artificial neural network data locality pattern to the artificial neural network memory controller 320.

To perform inference with the artificial neural network model 1300, the processor 310 is configured to generate data access requests in an order based on the artificial neural network data locality. Accordingly, the artificial neural network memory controller 320 may monitor the data access requests and generate the artificial neural network data locality pattern 1400. Alternatively, the artificial neural network memory controller 320 may store a previously generated artificial neural network data locality pattern 1400.

Hereinafter, the case in which the artificial neural network data locality pattern 1400 has not been generated is described.

First, the processor 310 may generate a data access request of token [1], corresponding to reading the node values of the input layer 1310. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [1] and transfer the node values of the input layer 1310 received from the memory 330 to the processor 310.

Next, the processor 310 may generate a data access request of token [2], corresponding to reading the weight values of the first connection network 1320. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [2] and transfer the weight values of the first connection network 1320 received from the memory 330 to the processor 310.

Next, the processor 310 may receive the node values of the input layer 1310 and the weight values of the first connection network 1320 and compute the node values of the first hidden layer 1330. That is, the processor 310 may generate a data access request of token [3], corresponding to writing the node values of the first hidden layer 1330. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [3] and store the node values of the first hidden layer 1330 in the memory 330.

Next, the processor 310 may generate a data access request of token [4], corresponding to reading the node values of the first hidden layer 1330. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [4] and transfer the node values of the first hidden layer 1330 received from the memory 330 to the processor 310.

Next, the processor 310 may generate a data access request of token [5], corresponding to reading the weight values of the second connection network 1340. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [5] and transfer the weight values of the second connection network 1340 received from the memory 330 to the processor 310.

Next, the processor 310 may receive the node values of the first hidden layer 1330 and the weight values of the second connection network 1340 and compute the node values of the second hidden layer 1350. That is, the processor 310 may generate a data access request of token [6], corresponding to writing the node values of the second hidden layer 1350. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [6] and store the node values of the second hidden layer 1350 in the memory 330.

Next, the processor 310 may generate a data access request of token [7], corresponding to reading the node values of the second hidden layer 1350. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [7] and transfer the node values of the second hidden layer 1350 received from the memory 330 to the processor 310.

Next, the processor 310 may generate a data access request of token [8], corresponding to reading the weight values of the third connection network 1360. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [8] and transfer the weight values of the third connection network 1360 received from the memory 330 to the processor 310.

Next, the processor 310 may receive the node values of the second hidden layer 1350 and the weight values of the third connection network 1360 and compute the node values of the output layer 1370. That is, the processor 310 may generate a data access request of token [9], corresponding to writing the node values of the output layer 1370. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [9] and store the node values of the output layer 1370 in the memory 330.

Accordingly, the artificial neural network memory system 300 may store the inference result of the artificial neural network model 1300 in the output layer 1370.

The example above is a case in which the artificial neural network data locality pattern 1400 has not been generated in the artificial neural network memory controller 320. In this case, no predicted data access request can be generated. Because the artificial neural network memory controller 320 cannot provide data in advance, the latency of the memory 330 may be incurred for every memory access request.

However, because the artificial neural network memory controller 320 has recorded the data access requests, it can generate the artificial neural network data locality pattern 1400 when the processor 310 again generates the data access request of token [1], corresponding to reading the node values of the input layer 1310.

Hereinafter, the case in which the artificial neural network data locality pattern 1400 has been generated is described with reference again to FIG. 4.

The following example may be a case in which the artificial neural network data locality pattern 1400 has been generated and the processor 310 is repeatedly performing inference with the artificial neural network model 1300. However, the present disclosure is not limited thereto.

The artificial neural network memory controller 320 may detect the repeated data access request of token [1] and generate the artificial neural network data locality pattern 1400. To elaborate, because the artificial neural network memory controller 320 has sequentially stored token [1] through token [9], it can determine the artificial neural network data locality when it detects token [1] again.
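The recording step described above can be sketched as follows. The class name `LocalityRecorder` and its method `observe` are hypothetical names introduced only for illustration, and integer tokens stand in for whatever identification information the requests actually carry. The sketch stores each observed token in order and fixes the pattern as soon as the first recorded token is seen again.

```python
class LocalityRecorder:
    """Records data access request tokens until the first one repeats."""

    def __init__(self):
        self.history = []    # tokens observed so far, in order
        self.pattern = None  # filled in once a repetition is detected

    def observe(self, token):
        """Record one token; return the pattern once it is known, else None."""
        if self.pattern is None:
            if self.history and token == self.history[0]:
                # The first token came back: one full inference pass was seen.
                self.pattern = list(self.history)
            else:
                self.history.append(token)
        return self.pattern


recorder = LocalityRecorder()
for token in [1, 2, 3, 4, 5, 6, 7, 8, 9, 1]:
    pattern = recorder.observe(token)
print(pattern)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

This simple rule assumes that the first request of a pass does not recur within the same pass; a model that reuses the same data within one inference would need a more careful repetition check.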

However, as described above, the artificial neural network memory controller according to the examples of the present disclosure is not limited to tokens. Tokens are used merely for convenience of description, and the examples of the present disclosure may be implemented using the identification information included in the data access requests and the memory access requests.

For example, when the processor 310 generates the data access request of token [9], the artificial neural network memory controller 320 generates a predicted data access request of token [1]. Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [1] and store the node values of the input layer 1310 in the cache memory 322 in advance.

That is, if the data access request of token [9] is the last step of the artificial neural network model 1300, the artificial neural network memory controller 320 can predict that the data access request of token [1], which is the first step of the artificial neural network model 1300, will be generated next.

Next, when the processor 310 generates the data access request of token [1], the artificial neural network memory controller 320 determines whether the predicted data access request of token [1] and the actual data access request of token [1] are the same. If they are determined to be the same, the node values of the input layer 1310 stored in the cache memory 322 can be provided to the processor 310 immediately.

At this time, the artificial neural network memory controller 320 generates a predicted data access request of token [2].

Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [2] and store the weight values of the first connection network 1320 in the cache memory 322 in advance.

Next, when the processor 310 generates the data access request of token [2], the artificial neural network memory controller 320 determines whether the predicted data access request of token [2] and the actual data access request of token [2] are the same. If they are determined to be the same, the weight values of the first connection network 1320 stored in the cache memory 322 can be provided to the processor 310 immediately.

At this time, the artificial neural network memory controller 320 generates a predicted data access request of token [3].

Next, the processor 310 may receive the node values of the input layer 1310 and the weight values of the first connection network 1320 and compute the node values of the first hidden layer 1330. When the processor 310 generates the data access request of token [3], the artificial neural network memory controller 320 determines whether the predicted data access request of token [3] and the actual data access request of token [3] are the same. If they are determined to be the same, the computed node values of the first hidden layer 1330 may be stored in the memory 330 and/or the cache memory 322.

To elaborate on the cache memory 322: without the cache memory 322, if the same data is stored in the memory 330 by the memory access request of token [3] and is then read back from the memory 330 by the memory access request of token [4], the latency of the memory 330 may be doubled.

In such a case, the artificial neural network memory controller 320 may be configured to determine, on the basis that consecutive tokens have the same memory address value, that the operation mode of the preceding token is a write mode, and that the operation mode of the following token is a read mode, that the computed node values of a layer are being stored and then used as the input values of the next layer.

That is, when the data of token [3] is stored in the cache memory 322, the data access requests corresponding to token [3] and token [4] can be processed in the cache memory 322. Accordingly, the artificial neural network memory controller 320 may be configured not to generate the memory access requests corresponding to the data access request of token [3] and the data access request of token [4]. According to this configuration, the latency of the memory 330 that would otherwise be caused by the memory access requests of token [3] and token [4] can be eliminated. In particular, this operating policy of the cache memory 322 may be executed based on the artificial neural network data locality pattern 1400.
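The write-then-read rule described above can be sketched as a scan over the recorded pattern. The `Request` tuple, the function `cache_resident_tokens`, and the address values are hypothetical and chosen only for illustration; the disclosure states the rule (same memory address, write followed by read) but not a data layout. The sketch returns the tokens whose requests could be served by the cache memory without generating memory access requests.

```python
from typing import NamedTuple


class Request(NamedTuple):
    token: int
    mode: str     # "read" or "write"
    address: int  # hypothetical start address of the data


def cache_resident_tokens(pattern):
    """Find consecutive write/read pairs that target the same address."""
    resident = set()
    for prev, nxt in zip(pattern, pattern[1:]):
        same_address = prev.address == nxt.address
        if same_address and prev.mode == "write" and nxt.mode == "read":
            resident.update((prev.token, nxt.token))
    return resident


# Hypothetical addresses for the nine requests of the model of FIG. 3.
pattern = [
    Request(1, "read", 0x000), Request(2, "read", 0x100),
    Request(3, "write", 0x200), Request(4, "read", 0x200),
    Request(5, "read", 0x300), Request(6, "write", 0x400),
    Request(7, "read", 0x400), Request(8, "read", 0x500),
    Request(9, "write", 0x600),
]
print(sorted(cache_resident_tokens(pattern)))  # [3, 4, 6, 7]
```

With these assumed addresses, the node values of both hidden layers stay in the cache, so tokens [3], [4], [6], and [7] need no access to the memory 330.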

At this time, the artificial neural network memory controller 320 generates a predicted data access request of token [4].

Next, when the processor 310 generates the data access request of token [4], the artificial neural network memory controller 320 determines whether the predicted data access request of token [4] and the actual data access request of token [4] are the same. If they are determined to be the same, the node values of the first hidden layer 1330 stored in the cache memory 322 can be provided to the processor 310 immediately.

At this time, the artificial neural network memory controller 320 generates a predicted data access request of token [5].

Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [5] and store the weight values of the second connection network 1340 in the cache memory 322 in advance.

Next, when the processor 310 generates the data access request of token [5], the artificial neural network memory controller 320 determines whether the predicted data access request of token [5] and the actual data access request of token [5] are the same. If they are determined to be the same, the weight values of the second connection network 1340 stored in the cache memory 322 can be provided to the processor 310 immediately.

At this time, the artificial neural network memory controller 320 generates a predicted data access request of token [6].

Next, the processor 310 may receive the node values of the first hidden layer 1330 and the weight values of the second connection network 1340 and compute the node values of the second hidden layer 1350. When the processor 310 generates the data access request of token [6], the artificial neural network memory controller 320 determines whether the predicted data access request of token [6] and the actual data access request of token [6] are the same. If they are determined to be the same, the computed node values of the second hidden layer 1350 may be stored in the memory 330 and/or the cache memory 322.

At this time, the artificial neural network memory controller 320 generates a predicted data access request of token [7].

Next, when the processor 310 generates the data access request of token [7], the artificial neural network memory controller 320 determines whether the predicted data access request of token [7] and the actual data access request of token [7] are the same. If they are determined to be the same, the node values of the second hidden layer 1350 stored in the cache memory 322 can be provided to the processor 310 immediately.

At this time, the artificial neural network memory controller 320 generates a predicted data access request of token [8].

Accordingly, the artificial neural network memory controller 320 may generate a memory access request of token [8] and store the weight values of the third connection network 1360 in the cache memory 322 in advance.

Next, when the processor 310 generates the data access request of token [8], the artificial neural network memory controller 320 determines whether the predicted data access request of token [8] and the actual data access request of token [8] are the same. If they are determined to be the same, the weight values of the third connection network 1360 stored in the cache memory 322 can be provided to the processor 310 immediately.

At this time, the artificial neural network memory controller 320 generates a predicted data access request of token [9].

Next, the processor 310 may receive the node values of the second hidden layer 1350 and the weight values of the third connection network 1360 and compute the node values of the output layer 1370. When the processor 310 generates the data access request of token [9], the artificial neural network memory controller 320 determines whether the predicted data access request of token [9] and the actual data access request of token [9] are the same. If they are determined to be the same, the computed node values of the output layer 1370 may be stored in the memory 330 and/or the cache memory 322.

Accordingly, the artificial neural network memory system 300 may store the inference result of the artificial neural network model 1300 in the output layer 1370.

Owing to the artificial neural network data locality pattern 1400, the artificial neural network memory system 300 can be ready to start the next inference immediately even after an inference of the artificial neural network model 1300 has finished.

That is, the artificial neural network memory system 300 according to yet another example of the present disclosure may be configured to generate a predicted data access request based on the artificial neural network data locality, determine whether the predicted data access request and the actual data access request are the same, and, if they are the same, further generate the predicted data access request that is next in order. According to the above-described configuration, the artificial neural network memory controller 320 can eliminate or reduce the latency of the memory 330 when processing each data access request.
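The predict, compare, and predict-again loop summarized above can be sketched as a small simulation. The class `PrefetchingController` and its fields are hypothetical, integer tokens stand in for the requests, and reads and writes are treated uniformly, which is a simplification of the walk-through above. A request counts as a hit when it matches the prediction and its data (or buffer) was prepared in the cache beforehand.

```python
class PrefetchingController:
    """Toy model of a controller that predicts the next data access request."""

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.predicted = None  # token of the predicted next request
        self.cache = set()     # tokens already prepared in the cache
        self.hits = 0
        self.misses = 0

    def on_data_access_request(self, token):
        """Handle one actual request and return the next predicted token."""
        if token == self.predicted and token in self.cache:
            self.hits += 1     # served without waiting for the memory
            self.cache.discard(token)
        else:
            self.misses += 1   # memory latency is exposed to the processor
        index = self.pattern.index(token)
        nxt = self.pattern[(index + 1) % len(self.pattern)]
        self.predicted = nxt
        self.cache.add(nxt)    # prepare the predicted request in advance
        return nxt


ctrl = PrefetchingController(range(1, 10))
for _ in range(2):             # two consecutive inference passes
    for token in range(1, 10):
        ctrl.on_data_access_request(token)
print(ctrl.hits, ctrl.misses)  # 17 1
```

Only the very first request misses in this sketch; after token [9] the controller wraps around and prepares token [1], so the second pass starts without exposed memory latency.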

In some examples, the artificial neural network memory controller may be configured to generate at least one predicted data access request so as to minimize the free space of the cache memory.

That is, the artificial neural network memory controller may compare the free space of the cache memory with the size of the data to be stored and, when the cache memory has free space, generate at least one predicted data access request so as to minimize the free space of the cache memory.

That is, the artificial neural network memory controller may be configured to generate a plurality of predicted data access requests according to the capacity of the cache memory.

That is, the artificial neural network memory controller may be configured to sequentially generate at least one memory access request based on the remaining capacity of the cache memory, so that the remaining capacity of the cache memory is minimized.

An example is described with reference to FIGS. 2 to 6. When the processor generates the data access request of token [1], the artificial neural network memory controller may generate a predicted data access request of token [2] and store the weight values of the first connection network 1320 in the cache memory in advance. Next, the artificial neural network memory controller may allocate in advance, in the cache memory, space for storing and reading the result of computing the node values of the first hidden layer 1330, corresponding to token [3] and token [4]. Next, the artificial neural network memory controller may store the weight values of the second connection network 1340, corresponding to token [5], in the cache memory in advance. Here, when the cache memory has spare capacity, the artificial neural network memory controller may be configured to sequentially generate further predicted data access requests based on the artificial neural network data locality pattern. That is, when the cache memory has spare capacity, the artificial neural network memory controller may be configured, based on the artificial neural network data locality pattern, to store weight values in the cache memory in advance or to reserve in advance an area for storing artificial neural network operation results.
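The capacity-driven behavior in this example can be sketched as follows. The function `plan_prefetch` and all sizes are hypothetical; in particular, token [4] is given size 0 on the assumption that it reuses the buffer reserved for token [3]. The sketch walks the upcoming requests in pattern order and keeps generating predicted requests until the next one no longer fits in the free space of the cache memory.

```python
def plan_prefetch(upcoming, free_bytes):
    """Choose the predicted requests that fit in the cache, in pattern order.

    upcoming: list of (token, size_in_bytes) pairs following the current request.
    Returns the planned tokens and the free space left afterwards.
    """
    planned = []
    for token, size in upcoming:
        if size > free_bytes:
            break              # stop at the first request that does not fit
        planned.append(token)
        free_bytes -= size
    return planned, free_bytes


# After token [1]: weights of 1320, buffer for 1330 (shared by tokens 3 and 4),
# weights of 1340, then the buffer for 1350. Sizes are made up for illustration.
upcoming = [(2, 400), (3, 100), (4, 0), (5, 400), (6, 200)]
print(plan_prefetch(upcoming, 1000))  # ([2, 3, 4, 5], 100)
```

Stopping at the first request that does not fit preserves the sequential order of the pattern; skipping ahead to smaller items would fill the cache more tightly but is not something the text describes.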

If the capacity of the cache memory is sufficient, the weight values of all connection networks of the artificial neural network model 1300 may be stored in the cache memory. In particular, the weight values of an artificial neural network model whose training has been completed are fixed. Therefore, when the weight values reside in the cache memory, the memory latency caused by memory access requests for reading the weight values can be eliminated.

According to the above configuration, the data required by the cache memory is stored based on the artificial neural network data locality, which can optimize the operating efficiency of the cache memory and improve the processing speed of the artificial neural network memory system 300.

According to the above configuration, predicted data access requests are generated sequentially in consideration of both the artificial neural network data locality pattern and the capacity of the cache memory, so the processing speed of the artificial neural network memory system can be improved.

According to the above configuration, when the processor generates a specific data access request included in the artificial neural network data locality pattern 1400, the artificial neural network memory controller can sequentially predict at least one data access request that follows the specific data access request. For example, when the processor generates the data access request of token [1], the artificial neural network memory controller can predict that the corresponding data access requests will be generated in the order of tokens [2-3-4-5-6-7-8-9].

According to the above configuration, the artificial neural network memory controller 320 may keep specific weight values resident in the cache memory for a specific period. For example, when the processor performs inference with the artificial neural network model at a rate of 30 times per second, the weight values of a specific layer may be kept resident in the cache memory. In this case, the artificial neural network memory controller can reuse the weight values stored in the cache memory for each inference. Accordingly, the corresponding memory access requests can be selectively eliminated, and the latency caused by those memory access requests can be removed.

In some examples, the cache memory may be configured as a plurality of hierarchical cache memories. For example, it may include a cache memory configured to store weight values, a cache memory configured to store feature maps, and the like.

In some examples, when the artificial neural network data locality pattern 1400 has been generated, the artificial neural network memory controller may be configured to predict weight values and node values based on the identification information included in the data access requests. Accordingly, the artificial neural network memory controller may be configured to identify the data access requests that correspond to weight values. Specifically, assuming that training is complete and the weight values of the connection networks are fixed, the weight values in the artificial neural network data locality pattern 1400 may be configured to operate only in the read mode. Therefore, the artificial neural network memory controller can determine that token [2], token [5], and token [8] are weight values. Further, token [1] can be determined to be the input node values because it is the first stage of inference. Token [9] can be determined to be the output node values because it is the last stage of inference. Tokens [3] and [4] can be determined to be the node values of a hidden layer because they are a write mode followed by a read mode of the same memory address value. However, this may vary depending on the artificial neural network data locality of the artificial neural network model.
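The classification rules above can be sketched as follows. This is only an illustration under stated assumptions: the addresses are hypothetical, the function name `classify_tokens` is not from the disclosure, and treating tokens [6] and [7] as hidden layer node values follows from applying the same write-then-read rule, not from an explicit statement in the text.

```python
def classify_tokens(pattern):
    """Label each token of a locality pattern.

    pattern: list of (token, address, mode) tuples in access order.
    Returns a dict mapping token to "input", "output", "hidden", or "weight".
    """
    # Addresses that are written somewhere in the pattern hold node values.
    written = {addr for _, addr, mode in pattern if mode == "write"}
    last_index = len(pattern) - 1
    labels = {}
    for i, (token, addr, mode) in enumerate(pattern):
        if i == 0:
            labels[token] = "input"    # first stage of inference
        elif i == last_index:
            labels[token] = "output"   # last stage of inference
        elif addr in written:
            labels[token] = "hidden"   # written and read back at the same address
        else:
            labels[token] = "weight"   # read-only for a trained model
    return labels


# Hypothetical addresses for tokens [1] to [9].
pattern_1400 = [
    (1, 0x000, "read"),
    (2, 0x100, "read"),
    (3, 0x200, "write"),
    (4, 0x200, "read"),
    (5, 0x300, "read"),
    (6, 0x400, "write"),
    (7, 0x400, "read"),
    (8, 0x500, "read"),
    (9, 0x600, "write"),
]
labels = classify_tokens(pattern_1400)
print(labels)
```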

The artificial neural network memory controller may be configured to analyze the artificial neural network data locality pattern to determine whether each data access request corresponds to a weight value, a kernel window value, a node value, an activation map value, or the like of the artificial neural network model.

In some examples, the artificial neural network memory system includes: a processor configured to generate data access requests corresponding to an artificial neural network operation; an artificial neural network memory controller configured to store an artificial neural network data locality pattern generated by a compiler and to generate, based on the artificial neural network data locality pattern, a predicted data access request that predicts an actual data access request generated by the processor; and a memory configured to communicate with the artificial neural network memory controller. The memory may be configured to operate in response to the memory access requests output from the artificial neural network memory controller.

According to the above configuration, the artificial neural network memory controller may be configured to receive the artificial neural network data locality pattern generated by the compiler. In this case, based on the compiler-generated pattern, the artificial neural network memory controller can prepare in the cache memory, in advance, the data for the data access requests of the artificial neural network model being processed by the processor. In particular, an artificial neural network data locality pattern generated by the compiler can be more accurate than one generated by monitoring the artificial neural network data locality.

Further, the artificial neural network memory controller may be configured to store separately the artificial neural network data locality pattern generated by the compiler and the artificial neural network data locality pattern that it generates itself by monitoring data access requests.

FIG. 12 is a schematic diagram illustrating exemplary identification information of a data access request.

The data access request generated by the processor according to examples of the present disclosure may be configured to further include at least one piece of additional identification information. The additional identification information may also be referred to as a sideband signal or sideband information.

The data access request generated by the processor may be an interface signal having a specific structure. That is, the data access request may be an interface signal for communication between the processor and the artificial neural network memory controller. The data access request may be configured to include additional bits in the interface signal in order to provide the identification information needed for artificial neural network operations. However, the present disclosure is not limited thereto, and the additional identification information may be provided in various ways.

In some examples, the data access request of the artificial neural network memory system may be configured to further include identification information that identifies whether the request belongs to an artificial neural network operation. However, examples of the present disclosure are not limited thereto.

For example, the artificial neural network memory system may be configured to add a 1-bit identification code to the data access request so that the artificial neural network memory controller can identify whether a received data access request is related to an artificial neural network operation. However, the number of bits of the identification code according to examples of the present disclosure is not limited and may be adjusted according to the number of cases to be distinguished.

For example, when the identification code is [0], the artificial neural network memory controller may be configured to determine that the data access request is related to an artificial neural network operation.

For example, when the identification code is [1], the artificial neural network memory controller may be configured to determine that the data access request is not related to an artificial neural network operation.

In this case, the artificial neural network memory controller may be configured to generate the artificial neural network data locality pattern by recording only the data access requests related to artificial neural network operations, based on the identification information included in each data access request. According to this configuration, the artificial neural network memory controller does not need to record data access requests that are unrelated to artificial neural network operations. Therefore, the accuracy of the artificial neural network data locality pattern generated by recording data access requests can be improved. However, examples of the present disclosure are not limited thereto.
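The filtering step above can be sketched as follows, assuming each request is modeled as a hypothetical `(ann_flag, address, mode)` tuple. The code value [0] for ANN-related requests follows the example in the text; the tuple layout and the function name are illustrative only.

```python
ANN_RELATED = 0  # per the example above: [0] means related to an ANN operation


def record_pattern(requests):
    """Keep only ANN-related requests, in arrival order."""
    return [
        (address, mode)
        for ann_flag, address, mode in requests
        if ann_flag == ANN_RELATED
    ]


# Hypothetical mixed request stream: the second request is not ANN-related.
requests = [
    (0, 0x10, "read"),
    (1, 0x99, "write"),
    (0, 0x20, "read"),
]
recorded = record_pattern(requests)
print(recorded)
```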

In some examples, the data access request of the artificial neural network memory system may be configured to further include identification information that identifies whether the artificial neural network operation is an operation for training or an operation for inference. However, examples of the present disclosure are not limited thereto.

For example, the artificial neural network memory system may be configured to add a 1-bit identification code to the data access request so that the artificial neural network memory controller can identify whether the operation type of the artificial neural network model for a received data access request is training or inference. However, the number of bits of the identification code according to examples of the present disclosure is not limited and may be adjusted according to the number of cases to be distinguished.

For example, when the identification code is [0], the artificial neural network memory controller may be configured to determine that the data access request belongs to a training operation.

For example, when the identification code is [1], the artificial neural network memory controller may be configured to determine that the data access request belongs to an inference operation.

In this case, the artificial neural network memory controller may be configured to generate artificial neural network data locality patterns by recording the data access requests of training operations and the data access requests of inference operations separately. For example, in the training mode, the weight values of each layer and/or kernel window of the artificial neural network model may be updated, and an evaluation step that determines the inference accuracy of the trained artificial neural network model may also be included. Therefore, even when the structure of the artificial neural network model is the same, the artificial neural network data locality processed by the processor may differ between a training operation and an inference operation.

According to the above configuration, the artificial neural network memory controller may be configured to generate separately the artificial neural network data locality pattern of the training mode and the artificial neural network data locality pattern of the inference mode for a specific artificial neural network model. Therefore, the accuracy of the artificial neural network data locality pattern that the artificial neural network memory controller generates by recording data access requests can be improved. However, examples of the present disclosure are not limited thereto.
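Keeping one pattern per operation type can be sketched as follows. The code values ([0] for training, [1] for inference) follow the example in the text; the `(op_type, address, mode)` tuple layout and the function name are hypothetical.

```python
from collections import defaultdict

TRAINING = 0   # per the example above
INFERENCE = 1  # per the example above


def record_by_operation_type(requests):
    """Group requests into one locality pattern per operation type."""
    patterns = defaultdict(list)
    for op_type, address, mode in requests:
        patterns[op_type].append((address, mode))
    return dict(patterns)


# Hypothetical stream: training overwrites weights, inference only reads them.
requests_by_type = [
    (TRAINING, 0x100, "read"),
    (TRAINING, 0x100, "overwrite"),
    (INFERENCE, 0x100, "read"),
]
patterns = record_by_operation_type(requests_by_type)
print(patterns)
```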

In some examples, the data access request of the artificial neural network memory system may include an operation mode, that is, identification information that distinguishes a memory read operation from a memory write operation. However, the present disclosure is not limited thereto, and the operation mode may further include identification information that identifies an overwrite operation and/or a protection operation.

For example, a 1-bit identification code may be added to the data access request of the artificial neural network memory system to distinguish a read operation from a write operation. Alternatively, a 2-bit identification code may be added to the data access request to identify a read operation, a write operation, an overwrite operation, and a protection operation. However, the number of bits of the identification code according to examples of the present disclosure is not limited and may be adjusted according to the number of cases to be distinguished.

Further, for the artificial neural network memory system to operate, a data access request must include at least a memory address value and identification information that distinguishes a read operation from a write operation. The artificial neural network memory controller may be configured to receive a data access request, generate the corresponding memory access request, and perform the memory operation.

For example, when the identification code is [000], the artificial neural network memory controller may be configured to determine that the data access request is a read operation.

For example, when the identification code is [001], the artificial neural network memory controller may be configured to determine that the data access request is a write operation.

For example, when the identification code is [010], the artificial neural network memory controller may be configured to determine that the data access request is an overwrite operation.

For example, when the identification code is [011], the artificial neural network memory controller may be configured to determine that the data access request is a protection operation.

For example, when the identification code is [100], the artificial neural network memory controller may be configured to determine that the data access request is a read-burst operation.

For example, when the identification code is [001], the artificial neural network memory controller may be configured to determine that the data access request is a write-burst operation.

However, examples of the present disclosure are not limited thereto.
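The operation mode codes listed above can be sketched as an enumeration. Note one assumption made loudly here: the text lists [001] for both the write operation and the write-burst operation, which cannot be decoded unambiguously, so this sketch assigns the otherwise unused code [101] to write-burst. That value is an assumption for illustration, not a value stated in the disclosure, and the names `OperationMode` and `decode_operation_mode` are likewise hypothetical.

```python
from enum import IntEnum


class OperationMode(IntEnum):
    READ = 0b000
    WRITE = 0b001
    OVERWRITE = 0b010
    PROTECT = 0b011
    READ_BURST = 0b100
    WRITE_BURST = 0b101  # ASSUMPTION: the text lists [001], which collides with WRITE


def decode_operation_mode(code):
    """Map a 3-bit identification code to an operation mode."""
    try:
        return OperationMode(code)
    except ValueError:
        raise ValueError(f"unassigned operation mode code: {code:03b}") from None


print(decode_operation_mode(0b100).name)
```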

According to the above configuration, the artificial neural network memory controller can control the memory according to the read mode or the write mode, so that the various data of the artificial neural network model can be received from the memory or stored in the memory.

According to the above configuration, the artificial neural network memory controller can update the weight values of a specific layer using the overwrite mode during a training operation of the artificial neural network. In particular, because the updated weight values are stored at the same memory address value, no new memory address needs to be allocated. Therefore, the overwrite mode can be more efficient than the write mode during a training operation.

According to the above configuration, the artificial neural network memory controller can protect the data stored at a specific memory address using the protection mode. In particular, in an environment accessed by many users, such as a server, this can prevent the data of the artificial neural network model from being deleted arbitrarily. It is also possible to use the protection mode to protect the weight values of an artificial neural network model whose training has been completed.

In some examples, the data access request of the artificial neural network memory system may be configured to further include identification information that identifies whether the data is inference data, weights, a feature map, a training data set, an evaluation data set, or other data. However, examples of the present disclosure are not limited thereto.

For example, the artificial neural network memory system may be configured to add a 3-bit identification code to the data access request so that the artificial neural network memory controller can identify the domain of the data to be accessed. However, the number of bits of the identification code according to examples of the present disclosure is not limited and may be adjusted according to the number of cases to be distinguished.

For example, when the identification code is [000], the artificial neural network memory controller may be configured to determine that the data is unrelated to the artificial neural network model.

For example, when the identification code is [001], the artificial neural network memory controller may be configured to determine that the data is inference data of the artificial neural network model.

For example, when the identification code is [010], the artificial neural network memory controller may be configured to determine that the data is a feature map of the artificial neural network model.

For example, when the identification code is [011], the artificial neural network memory controller may be configured to determine that the data is a weight of the artificial neural network model.

For example, when the identification code is [100], the artificial neural network memory controller may be configured to determine that the data is a training data set of the artificial neural network model.

For example, when the identification code is [101], the artificial neural network memory controller may be configured to determine that the data is an inference data set of the artificial neural network model.

According to the above configuration, the artificial neural network memory controller may be configured to identify the domains of the data of the artificial neural network model and to allocate the memory addresses at which the data of each domain is stored. For example, the artificial neural network memory controller may set the start address and the end address of the memory area allocated to each domain. According to this configuration, the data allocated to each domain can be stored so as to correspond to the order of the artificial neural network data locality pattern.
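Setting a start address and an end address for each domain can be sketched as follows. This is a minimal illustration assuming contiguous, back-to-back regions; the domain names, the sizes, the base address, and the function name `allocate_regions` are all hypothetical.

```python
def allocate_regions(domain_sizes, base=0):
    """Assign each domain a contiguous region.

    domain_sizes: dict mapping domain name to size in bytes, in allocation order.
    Returns a dict mapping domain name to (start_address, end_address), end inclusive.
    """
    regions = {}
    start = base
    for domain, size in domain_sizes.items():
        regions[domain] = (start, start + size - 1)
        start += size  # the next region begins right after this one
    return regions


# Hypothetical sizes for three of the domains named above.
regions = allocate_regions({"weight": 1024, "feature_map": 512, "inference_data": 256})
for domain, (start, end) in regions.items():
    print(f"{domain}: 0x{start:04x}-0x{end:04x}")
```

Because every domain occupies one contiguous range, data of the same domain can be laid out in the order of the data locality pattern inside its own region.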

For example, the data of each domain of the artificial neural network model may be stored sequentially in the memory area allocated to that domain. In this case, the memory may be a memory that supports a read-burst function. According to this configuration, when the artificial neural network memory controller reads the data of a specific domain from the memory, the read can be optimized for the read-burst function because that data was stored according to the artificial neural network data locality pattern. That is, the artificial neural network memory controller may be configured to set the storage areas of the memory in consideration of the read-burst function.

In some examples, the memory may further include a read-burst function, and the at least one artificial neural network memory controller may be configured to write to the storage area of the at least one memory in consideration of the read-burst function.

In some examples, the data access request of the artificial neural network memory system may be configured to further include identification information that identifies the quantization of the artificial neural network model. However, examples of the present disclosure are not limited thereto.

For example, when a data access request includes at least a memory address value, a domain, and quantization identification information, the artificial neural network memory system may be configured to identify the quantization information of the data of that domain.

For example, when the identification code is [00001], the artificial neural network memory controller may be configured to determine that the data is quantized to 1 bit.

For example, when the identification code is [11111], the artificial neural network memory controller may be configured to determine that the data is quantized to 32 bits.

In some examples, various kinds of identification information may be selectively included in the data access request.

According to the above configuration, the artificial neural network memory controller can analyze the identification codes of the data access requests to generate a more accurate artificial neural network data locality pattern. In addition, by recognizing each piece of identification information, the controller can selectively control the storage policy of the memory.

For example, when training and inference can be distinguished, a separate artificial neural network data locality pattern can be generated for each.

For example, when the domain of the data can be identified, a policy of storing the data of the artificial neural network data locality pattern in a specific memory area can be established, which can improve the efficiency of memory operations.

In some examples, when the artificial neural network memory system is configured to process a plurality of artificial neural network models, the artificial neural network memory controller may be configured to further generate additional identification information that identifies the artificial neural network model, for example, a first artificial neural network model, a second artificial neural network model, and so on. In this case, the artificial neural network memory controller may be configured to distinguish the artificial neural network models based on the artificial neural network data locality of each model. However, the present disclosure is not limited thereto.

The sideband signals and the ANN (artificial neural network) data locality information shown in FIG. 12 may be selectively integrated or separated.

Artificial neural network operation: The SAM MEMORY CONTROLLER can determine whether the data belongs to an ANN operation.

Operation type: The SAM MEMORY CONTROLLER can determine whether the data is for training or for inference. (Weight value update schedule during inference)

Operation mode: The SAM MEMORY CONTROLLER can control the operation of the RAM. (For a kernel, a refresh may be performed based on the domain; for a feature map, a read-discard may be performed.)

DOMAIN: This may be the information the SAM MEMORY CONTROLLER needs to set up the MEMORY MAP. (According to the ANN data locality information, data of the same DOMAIN may be allocated to a specific area.)

Quantization: The SAM MEMORY CONTROLLER can provide the quantization information of the data.

ANN MODEL # : SAM MEMORY CONTROLLER는 각각의 모델을 ANN 데이터 지역성 정보에 따라서 MEMORY MAP에 각각 할당할 수 있다. 최소 ANN의 전체 DATA 크기는 확보할 수 있다.ANN MODEL #: SAM MEMORY CONTROLLER can allocate each model to MEMORY MAP according to ANN data locality information. The minimum ANN's total DATA size can be secured.

MULTI-THREAD : SAM MEMORY CONTROLLER는 각각의 ANN MODEL의 THREAD 개수에 따라서, 커널은 공유하고, 특징 맵은 각각 할당할 수 있다.MULTI-THREAD: The kernel shares the SAM MEMORY CONTROLLER according to the number of THREADs of each ANN MODEL, and each feature map can be allocated.

ANN 데이터 지역성(DATA LOCALITY) : ANN의 데이터 지역성 정보의 현재 처리 단계를 의미하는 정보. ANN data locality (DATA LOCALITY): Information indicating the current processing stage of data locality information of ANN.

한편, 모든 사이드밴드 시그널은 PACKET으로 구현될 수도 있다. Meanwhile, all sideband signals may be implemented as PACKET.
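As a rough illustration of how the sideband fields listed above might be carried as a packet, the following sketch packs them into a single integer word. The field widths, the enum values, and the names `SidebandSignal`, `pack`, and `unpack` are assumptions made for this example only; the disclosure does not specify a packet layout.

```python
from dataclasses import dataclass
from enum import IntEnum

# Hypothetical field layout (name, bit width), least significant field first.
# The widths are assumptions for illustration, not taken from the disclosure.
_FIELDS = (
    ("ann_operation", 1),
    ("operation_type", 1),
    ("operation_mode", 1),
    ("domain", 1),
    ("quantization_bits", 6),
    ("model_id", 4),
    ("thread_count", 4),
    ("locality_step", 16),
)


class OperationType(IntEnum):
    TRAINING = 0
    INFERENCE = 1


class OperationMode(IntEnum):
    REFRESH = 0       # e.g. kernel data kept alive
    READ_DISCARD = 1  # e.g. feature map data dropped after reading


class Domain(IntEnum):
    KERNEL = 0
    FEATURE_MAP = 1


@dataclass(frozen=True)
class SidebandSignal:
    ann_operation: bool
    operation_type: OperationType
    operation_mode: OperationMode
    domain: Domain
    quantization_bits: int
    model_id: int
    thread_count: int
    locality_step: int

    def pack(self) -> int:
        """Pack all fields into one integer word, checking each field's range."""
        word, shift = 0, 0
        for name, width in _FIELDS:
            value = int(getattr(self, name))
            if not 0 <= value < (1 << width):
                raise ValueError(f"{name}={value} does not fit in {width} bits")
            word |= value << shift
            shift += width
        return word

    @classmethod
    def unpack(cls, word: int) -> "SidebandSignal":
        """Recover the fields from a word produced by pack()."""
        raw = {}
        for name, width in _FIELDS:
            raw[name] = word & ((1 << width) - 1)
            word >>= width
        return cls(
            ann_operation=bool(raw["ann_operation"]),
            operation_type=OperationType(raw["operation_type"]),
            operation_mode=OperationMode(raw["operation_mode"]),
            domain=Domain(raw["domain"]),
            quantization_bits=raw["quantization_bits"],
            model_id=raw["model_id"],
            thread_count=raw["thread_count"],
            locality_step=raw["locality_step"],
        )


signal = SidebandSignal(
    ann_operation=True,
    operation_type=OperationType.INFERENCE,
    operation_mode=OperationMode.READ_DISCARD,
    domain=Domain.FEATURE_MAP,
    quantization_bits=8,
    model_id=1,
    thread_count=2,
    locality_step=37,
)
packet = signal.pack()
```

A fixed-width word keeps the example short; a real packet format could equally use variable-length fields or separate sideband wires, as the text allows.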

FIG. 13 is a schematic diagram illustrating energy consumption per unit operation of an artificial neural network memory system.

FIG. 13 is a table schematically showing the energy consumed per unit operation of the artificial neural network memory system 300. Energy consumption can be described separately for memory access, addition operations, and multiplication operations.

“8b Add” refers to an 8-bit integer addition operation of an adder. An 8-bit integer addition operation may consume 0.03 pJ of energy.

“16b Add” refers to a 16-bit integer addition operation of an adder. A 16-bit integer addition operation may consume 0.05 pJ of energy.

“32b Add” refers to a 32-bit integer addition operation of an adder. A 32-bit integer addition operation may consume 0.1 pJ of energy.

“16b FP Add” refers to a 16-bit floating-point addition operation of an adder. A 16-bit floating-point addition operation may consume 0.4 pJ of energy.

“32b FP Add” refers to a 32-bit floating-point addition operation of an adder. A 32-bit floating-point addition operation may consume 0.9 pJ of energy.

“8b Mult” refers to an 8-bit integer multiplication operation of a multiplier. An 8-bit integer multiplication operation may consume 0.2 pJ of energy.

“32b Mult” refers to a 32-bit integer multiplication operation of a multiplier. A 32-bit integer multiplication operation may consume 3.1 pJ of energy.

“16b FP Mult” refers to a 16-bit floating-point multiplication operation of a multiplier. A 16-bit floating-point multiplication operation may consume 1.1 pJ of energy.

“32b FP Mult” refers to a 32-bit floating-point multiplication operation of a multiplier. A 32-bit floating-point multiplication operation may consume 3.7 pJ of energy.

“32b SRAM Read” refers to a 32-bit data read access when the cache memory 322 of the artificial neural network memory system 300 is a static random access memory (SRAM). Reading 32-bit data from the cache memory 322 to the processor 310 may consume 5 pJ of energy.

“32b DRAM Read” refers to a 32-bit data read access when the memory 330 of the artificial neural network memory system 300 is a DRAM. Reading 32-bit data from the memory 330 to the processor 310 may consume 640 pJ of energy. The energy unit is the picojoule (pJ).

Comparing the case where the artificial neural network memory system 300 performs 32-bit floating-point multiplication with the case where it performs 8-bit integer multiplication, the energy consumed per unit operation differs by a factor of approximately 18.5 (3.7 pJ versus 0.2 pJ). Comparing a 32-bit data read from the memory 330 composed of DRAM with a 32-bit data read from the cache memory 322 composed of SRAM, the energy consumed per unit operation differs by a factor of approximately 128 (640 pJ versus 5 pJ).
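The two ratios quoted above follow directly from the per-operation energy values described for FIG. 13. The short sketch below reproduces the arithmetic; the dictionary name `ENERGY_PJ` and the helper `energy_ratio` are names chosen for this example.

```python
# Energy per unit operation in picojoules, as described for FIG. 13.
ENERGY_PJ = {
    "8b Add": 0.03,
    "16b Add": 0.05,
    "32b Add": 0.1,
    "16b FP Add": 0.4,
    "32b FP Add": 0.9,
    "8b Mult": 0.2,
    "32b Mult": 3.1,
    "16b FP Mult": 1.1,
    "32b FP Mult": 3.7,
    "32b SRAM Read": 5.0,
    "32b DRAM Read": 640.0,
}


def energy_ratio(costly_op: str, cheap_op: str) -> float:
    """Return how many times more energy costly_op uses than cheap_op."""
    return ENERGY_PJ[costly_op] / ENERGY_PJ[cheap_op]


mult_ratio = energy_ratio("32b FP Mult", "8b Mult")        # about 18.5
read_ratio = energy_ratio("32b DRAM Read", "32b SRAM Read")  # 128
print(f"FP32 vs INT8 multiply: {mult_ratio:.1f}x")
print(f"DRAM vs SRAM 32-bit read: {read_ratio:.0f}x")
```

Note that a single 32-bit DRAM read costs far more than any arithmetic operation in the table, which is why the following paragraphs focus on avoiding DRAM reads.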

즉, 소비전력 관점에서, 데이터의 비트 크기가 증가할수록 소비전력이 증가한다. 또한 부동 소수점 연산을 사용하면 정수 연산보다 소비전력이 증가한다. 또한 DRAM에서 데이터를 읽어올 경우 소비전력이 급격히 증가한다. That is, in terms of power consumption, as the bit size of data increases, power consumption increases. In addition, using floating-point arithmetic increases power consumption compared to integer arithmetic. Also, when data is read from DRAM, power consumption increases rapidly.

이에 본 개시의 또 다른 예시에 따른 인공신경망 메모리 시스템(300)은 캐쉬 메모리(322)의 용량을 인공신경망모델(1300)의 데이터 값을 모두 저장할 수 있는 정도의 용량으로 구성될 수 있다. Accordingly, the neural network memory system 300 according to another example of the present disclosure may be configured such that the capacity of the cache memory 322 is sufficient to store all the data values of the neural network model 1300 .

The cache memory according to examples of the present disclosure is not limited to SRAM. Static memories capable of high-speed operation comparable to SRAM include SRAM, MRAM, STT-MRAM, eMRAM, and OST-MRAM. Furthermore, MRAM, STT-MRAM, eMRAM, and OST-MRAM are static memories that are also non-volatile. Therefore, when the artificial neural network memory system 300 is rebooted after its power has been cut off, the artificial neural network model 1300 may not need to be provided again from the memory 330. However, examples according to the present disclosure are not limited thereto.

According to the above-described configuration, the artificial neural network memory system 300 can, based on the artificial neural network data locality pattern 1400, significantly reduce the power consumed by read operations of the memory 330 during inference operations of the artificial neural network model 1300.

FIG. 14 is a schematic diagram illustrating an artificial neural network memory system according to various examples of the present disclosure.

Hereinafter, various examples according to the present disclosure will be described with reference to FIG. 14. FIG. 14 may illustrate the various cases in which the examples according to the present disclosure may be implemented.

According to various examples of the present disclosure, the artificial neural network memory system 400 may be configured to include at least one processor, at least one memory, and at least one artificial neural network memory controller (ANN Memory Controller: AMC) configured to receive a data access request from the at least one processor and provide a memory access request to the at least one memory. The at least one artificial neural network memory controller (AMC) may be configured substantially the same as the exemplary artificial neural network memory controllers 120, 220, and 320. However, the present disclosure is not limited thereto, and one artificial neural network memory controller of the artificial neural network memory system 400 may be configured differently from another. Hereinafter, descriptions of the artificial neural network memory controllers 411, 412, 413, 414, 415, 416, and 417 that overlap with those of the above-described artificial neural network memory controllers 120, 220, and 320 may be omitted merely for convenience of description.

The at least one artificial neural network memory controller is configured to connect the at least one processor and the at least one memory. In this case, a corresponding artificial neural network data locality may exist on the data movement path between the at least one processor and the at least one memory. Accordingly, an artificial neural network memory controller located on that data movement path may be configured to extract the corresponding artificial neural network data locality pattern.

Each artificial neural network memory controller (AMC) may be configured to monitor its respective data access requests and generate its own artificial neural network data locality pattern. The artificial neural network memory system 400 may be configured to include at least one processor. The at least one processor may be configured to process artificial neural network operations alone or in cooperation with other processors.

The artificial neural network memory system 400 may be configured to include at least one internal memory. The artificial neural network memory system 400 may be configured to be connected to at least one external memory. The internal memory or external memory may include DRAM (Dynamic RAM), HBM (High Bandwidth Memory), SRAM (Static RAM), PROM (Programmable ROM), EPROM (Erasable PROM), EEPROM (Electrically EPROM), flash memory, ferroelectric RAM (FRAM), magnetic RAM (MRAM), a hard disk, phase change RAM, and the like. However, the present disclosure is not limited thereto.

The artificial neural network memory system 400 may include an external memory interface connected to an external memory (External MEM). The external memory interface may transmit a memory access request to at least one external memory of the artificial neural network memory system 400 and receive data responding to the memory access request from the at least one external memory. The configurations and functions disclosed for the exemplary artificial neural network memory controllers 120, 220, and 320 may be distributed among the plurality of artificial neural network memory controllers 411, 412, 413, 414, 415, 416, and 417 and placed at specific locations in the artificial neural network memory system 400. In some examples, the processor may be configured to include an artificial neural network memory controller.

몇몇 예시에서는, 메모리는 DRAM일 수 있으며, 이때 인공신경망 메모리 제어부는 DRAM 내부에 포함되도록 구성될 수 있다.In some examples, the memory may be DRAM, and in this case, the artificial neural network memory controller may be configured to be included in the DRAM.

예를 들면, 인공신경망 메모리 제어부들(411, 412, 413, 414, 415, 416, 417) 중 적어도 하나는 캐쉬 메모리를 내장하도록 구성될 수 있다. 또한, 캐쉬 메모리는 프로세서, 내부 메모리, 및/또는 외부 메모리에 포함되도록 구성될 수 있다.For example, at least one of the artificial neural network memory controllers 411 , 412 , 413 , 414 , 415 , 416 , and 417 may be configured to have a cache memory embedded therein. In addition, the cache memory may be configured to be included in the processor, internal memory, and/or external memory.

예를 들면, 인공신경망 메모리 제어부들(411, 412, 413, 414, 415, 416, 417) 중 적어도 하나는 메모리와 프로세서 사이의 데이터의 전송 경로에 분산되어 배치되도록 구성될 수 있다.For example, at least one of the artificial neural network memory controllers 411 , 412 , 413 , 414 , 415 , 416 , and 417 may be configured to be distributed and disposed in a data transmission path between the memory and the processor.

For example, an artificial neural network memory controller that can be implemented in the artificial neural network memory system 400 may be configured as one of: an artificial neural network memory controller 411 configured in a standalone form; an artificial neural network memory controller 412 included in the system bus; an artificial neural network memory controller 413 configured as an interface of a processor; an artificial neural network memory controller 414 included in a Wrapper Block between the memory interface of an internal memory and the system bus; an artificial neural network memory controller included in the memory interface of an internal memory; an artificial neural network memory controller 415 included in an internal memory; an artificial neural network memory controller included in the memory interface corresponding to an external memory; an artificial neural network memory controller 416 included in a Wrapper Block between the memory interface of an external memory and the system bus; and/or an artificial neural network memory controller 417 included in an external memory. However, the artificial neural network memory controller according to examples of the present disclosure is not limited thereto.

예를 들면, 제1 인공신경망 메모리 제어부(411)와 제2 인공신경망 메모리 제어부(412)가 생성하는 각각의 인공신경망 데이터 지역성 패턴들은 서로 같거나 또는 서로 상이할 수 있다. For example, the artificial neural network data locality patterns generated by the first artificial neural network memory controller 411 and the second artificial neural network memory controller 412 may be the same or different from each other.

To elaborate, the first artificial neural network memory controller 411 may be configured to connect the first processor (processor 1) and the first internal memory (internal MEM 1) through the system bus. In this case, a corresponding first artificial neural network data locality may exist on the data movement path between the first processor (processor 1) and the first internal memory (internal MEM 1).

Although the third artificial neural network memory controller 413 is shown on this path, this is merely an example, and the third artificial neural network memory controller 413 may be omitted. That is, as long as at least one artificial neural network memory controller is disposed between a processor and a memory, the artificial neural network data locality pattern of the artificial neural network model processed by that processor can be generated.

To elaborate further, the second artificial neural network memory controller 412 may be configured to connect the second processor (processor 2) and the first external memory (external MEM 1). In this case, a corresponding second artificial neural network data locality may exist on the data movement path between the second processor (processor 2) and the first external memory (external MEM 1).

예를 들면, 제1 프로세서(processor 1)가 처리하는 제1 인공신경망모델은 객체인식모델일 수 있으며, 제2 프로세서(processor 2)가 처리하는 제2 인공신경망모델은 음성인식모델일 수 있다. 따라서 각각의 인공신경망모델을 서로 상이하고, 대응되는 인공신경망 데이터 지역성 패턴들도 서로 상이할 수 있다.For example, the first artificial neural network model processed by the first processor 1 may be an object recognition model, and the second artificial neural network model processed by the second processor 2 may be a voice recognition model. Accordingly, each artificial neural network model may be different from each other, and corresponding artificial neural network data locality patterns may also be different from each other.

즉, 인공신경망 메모리 제어부들(411, 412, 413, 414, 415, 416, 417) 각각이 생성하는 인공신경망 데이터 지역성 패턴은 대응되는 프로세서가 생성하는 데이터 접근 요청의 패턴 특징에 따라서 결정될 수 있다. That is, the artificial neural network data locality pattern generated by each of the neural network memory controllers 411, 412, 413, 414, 415, 416, and 417 may be determined according to the pattern characteristic of the data access request generated by the corresponding processor.

즉, 인공신경망 메모리 시스템(400)의 인공신경망 메모리 제어부는 임의의 프로세서와 임의의 메모리 사이에 배치되더라도, 해당 위치의 인공신경망 데이터 지역성 패턴을 생성할 수 있는 적응력을 제공할 수 있는 효과가 있다. That is, even if the artificial neural network memory control unit of the artificial neural network memory system 400 is disposed between an arbitrary processor and an arbitrary memory, there is an effect that can provide the adaptability to generate the artificial neural network data locality pattern of the corresponding location.

부연 설명하면, 하나의 인공신경망모델을 두 개의 프로세서가 협력해서 병렬로 처리 할 경우, 해당 인공신경망모델의 인공신경망 데이터 지역성 패턴은 각각의 프로세서에게 분할되어 할당될 수 있다. 예를 들면, 제1 레이어의 컨벌루션 연산은 제1 프로세서가 처리하고 제2 레이어의 컨벌루션 연산은 제2 프로세서가 처리하여 인공신경망모델의 연산을 분산시킬 수 있다. 이러한 경우, 인공신경망모델이 동일하더라도, 각각의 프로세서가 처리하는 인공신경망모델의 인공신경망 데이터 지역성은 데이터 접근 요청 단위로 재구성될 수 있다. 이러한 경우, 각각의 인공신경망 메모리 제어부는 각각의 인공신경망 메모리 제어부가 처리하는 프로세서의 데이터 접근 요청에 대응되는 인공신경망 데이터 지역성 패턴을 각각 생성하도록 구성될 수 있는 적응력을 제공할 수 있는 효과가 있다.In other words, when two processors cooperate to process one artificial neural network model in parallel, the neural network data locality pattern of the corresponding artificial neural network model can be divided and assigned to each processor. For example, the convolution operation of the first layer may be processed by the first processor and the convolution operation of the second layer may be processed by the second processor to distribute the computation of the artificial neural network model. In this case, even if the artificial neural network model is the same, the locality of the artificial neural network data of the artificial neural network model processed by each processor can be reconfigured in units of data access requests. In this case, each artificial neural network memory control unit has the effect of providing adaptability that can be configured to respectively generate artificial neural network data locality patterns corresponding to the data access request of the processor processed by each artificial neural network memory controller.
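The layer-wise split described above can be sketched as follows: a single model-level sequence of data access requests is divided according to which processor handles each layer, so that each controller only ever observes its own subsequence. The names `AccessRequest`, `split_locality_pattern`, and the request kinds are hypothetical and chosen for this example; the disclosure does not define such an interface.

```python
from collections import defaultdict
from typing import NamedTuple


class AccessRequest(NamedTuple):
    layer: int
    kind: str  # e.g. "read_kernel", "read_input", "write_output"


def split_locality_pattern(pattern, layer_to_processor):
    """Group a model-level request sequence by the processor that owns each layer.

    The relative order of requests is preserved within each processor's list.
    """
    per_processor = defaultdict(list)
    for request in pattern:
        per_processor[layer_to_processor[request.layer]].append(request)
    return dict(per_processor)


# One inference pass of a two-layer model, in issue order.
model_pattern = [
    AccessRequest(1, "read_kernel"),
    AccessRequest(1, "read_input"),
    AccessRequest(1, "write_output"),
    AccessRequest(2, "read_kernel"),
    AccessRequest(2, "read_input"),
    AccessRequest(2, "write_output"),
]

# Layer 1 runs on the first processor, layer 2 on the second.
assignment = {1: "processor 1", 2: "processor 2"}
patterns = split_locality_pattern(model_pattern, assignment)
```

Here each controller would see a three-request pattern rather than the six-request pattern of the whole model, which is the sense in which the locality is reconstructed per data access request stream.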

데이터 접근 요청 단위는 적어도 하나의 워드 단위로 구성될 수 있다. 인공신경망 데이터 지역성 (ANN DL) 단위는 적어도 하나의 데이터 접근 요청 단위로 구성될 수 있다.The data access request unit may be composed of at least one word unit. The artificial neural network data locality (ANN DL) unit may consist of at least one data access request unit.

According to the above configuration, even when a plurality of artificial neural network memory controllers are distributed between a plurality of processors and a plurality of memories, the performance of the artificial neural network memory system 400 can be optimized by the artificial neural network data locality patterns generated for each situation. That is, because each artificial neural network memory controller can analyze the artificial neural network data locality at its own location, it can be optimized for artificial neural network operations that vary and are processed in real time.

몇몇 예시에서는, 인공신경망 메모리 제어부들(411, 412, 413, 414, 415, 416, 417) 중 적어도 하나는 메모리 개수, 메모리 종류, 메모리의 실효 대역폭, 메모리의 지연시간, 메모리 크기 중 적어도 하나의 정보를 확인하도록 구성될 수 있다. In some examples, at least one of the neural network memory controllers 411, 412, 413, 414, 415, 416, 417 may include at least one of the number of memories, memory types, effective bandwidth of memory, delay time of memory, and memory size. may be configured to verify information.

In some examples, at least one of the artificial neural network memory controllers 411, 412, 413, 414, 415, 416, and 417 may be configured to measure the effective bandwidth of a memory that responds to memory access requests. There may be one or more memories, and each artificial neural network memory controller may measure the effective bandwidth of the channel over which it communicates with each memory. The effective bandwidth may be calculated by measuring the time from when the artificial neural network memory controller generates a memory access request until that memory access request completes, together with the data transfer bit rate.
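One plausible reading of this measurement is that effective bandwidth is the amount of data returned divided by the time between issuing the memory access request and its completion. The sketch below expresses that reading; the function name and the example figures are assumptions for illustration, not values from the disclosure.

```python
def effective_bandwidth_bps(num_bytes: int, issue_time_s: float, completion_time_s: float) -> float:
    """Return effective bandwidth in bits per second for one completed request.

    num_bytes is the payload returned by the memory, and the two timestamps
    mark when the request was generated and when it finished.
    """
    elapsed = completion_time_s - issue_time_s
    if elapsed <= 0:
        raise ValueError("completion time must be later than issue time")
    return num_bytes * 8 / elapsed


# Example: a 4096-byte burst that completes 2 microseconds after it was issued.
bandwidth = effective_bandwidth_bps(4096, issue_time_s=0.0, completion_time_s=2e-6)
print(f"{bandwidth / 1e9:.3f} Gbit/s")
```

Because the elapsed time includes queueing and access latency, this figure is generally lower than the peak transfer rate of the channel, which is why it is measured per memory rather than taken from a datasheet.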

In some examples, at least one of the artificial neural network memory controllers 411, 412, 413, 414, 415, 416, and 417 may be configured to receive information on the required bandwidth of at least one memory that responds to memory access requests.

몇몇 예시에서는, 인공신경망 메모리 시스템(400)은 복수의 메모리를 포함하고, 적어도 하나의 인공신경망 메모리 제어부는 복수의 메모리의 실효 대역폭을 각각 측정하도록 구성될 수 있다.In some examples, the neural network memory system 400 may include a plurality of memories, and at least one artificial neural network memory controller may be configured to measure an effective bandwidth of the plurality of memories, respectively.

몇몇 예시에서는, 인공신경망 메모리 시스템(400)은 복수의 메모리를 포함하고, 적어도 하나의 인공신경망 메모리 제어부는, 복수의 메모리의 지연시간을 각각 측정하도록 구성될 수 있다.In some examples, the artificial neural network memory system 400 may include a plurality of memories, and at least one artificial neural network memory controller may be configured to measure delay times of the plurality of memories, respectively.

That is, the at least one artificial neural network memory controller may be configured to auto-calibrate each of the memories connected to it. Auto-calibration may be configured to run when the artificial neural network memory system starts or at specific intervals. Through auto-calibration, the at least one artificial neural network memory controller may be configured to collect information such as the number of memories connected to it, the type of each memory, the effective bandwidth of each memory, the latency of each memory, and the size of each memory.

상술한 구성에 따르면, 인공신경망 메모리 시스템(400)은 인공신경망 메모리 제어부에 대응되는 메모리의 지연시간 및 실효 대역폭을 알 수 있다. According to the above configuration, the artificial neural network memory system 400 can know the delay time and effective bandwidth of the memory corresponding to the artificial neural network memory controller.

According to the above configuration, even when a standalone artificial neural network memory controller is connected to the system bus, it can generate the artificial neural network data locality of the artificial neural network model being processed by the processor and control the memory accordingly.

In some examples, at least one artificial neural network memory controller of the artificial neural network memory system 400 may be configured to calculate the effective bandwidth required by the artificial neural network operation from the time and data size needed for one repetition of the artificial neural network data locality pattern. Specifically, once all the data access requests included in the artificial neural network data locality pattern have been processed, it can be determined that the processor has completed one inference of the artificial neural network model. The artificial neural network memory system 400 may be configured to measure the time taken for one inference based on the artificial neural network data locality pattern and to calculate the number of inferences per second (IPS). In addition, the artificial neural network memory system 400 may receive target IPS information from the processor. For example, a specific application may require a specific artificial neural network model to run at 30 IPS. If the measured IPS is lower than the target IPS, the artificial neural network memory controller of the artificial neural network memory system 400 may be configured to operate so as to improve the processor's artificial neural network model processing speed.
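The calculation described above can be sketched with three small helpers: IPS is the reciprocal of the time for one repetition of the data locality pattern, and the bandwidth the operation requires is the data moved in one repetition divided by that time. The function names and the figures (40 ms and 20 MB per repetition) are hypothetical values chosen for this example; only the 30 IPS target appears in the text.

```python
def inferences_per_second(seconds_per_repetition: float) -> float:
    """One repetition of the data locality pattern corresponds to one inference."""
    return 1.0 / seconds_per_repetition


def required_bandwidth_bytes_per_s(bytes_per_repetition: int, seconds_per_repetition: float) -> float:
    """Bandwidth needed to move one repetition's data in the measured time."""
    return bytes_per_repetition / seconds_per_repetition


def needs_speedup(measured_ips: float, target_ips: float) -> bool:
    """True when the measured rate falls short of the application's target."""
    return measured_ips < target_ips


seconds_per_repetition = 0.040      # hypothetical measurement: 40 ms per inference
bytes_per_repetition = 20_000_000   # hypothetical data volume: 20 MB per inference
target_ips = 30.0                   # target from the example in the text

measured_ips = inferences_per_second(seconds_per_repetition)
current_bandwidth = required_bandwidth_bytes_per_s(bytes_per_repetition, seconds_per_repetition)
should_accelerate = needs_speedup(measured_ips, target_ips)
```

With these assumed figures the measured rate is about 25 IPS, below the 30 IPS target, so the controller would be expected to act to speed up processing.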

몇몇 예시에서는, 인공신경망 메모리 시스템(400)은 인공신경망 메모리 제어부, 프로세서, 및 메모리의 통신을 제어하도록 구성된 시스템버스를 포함하도록 구성될 수 있다. 또한, 적어도 하나의 인공신경망 메모리 제어부는 시스템버스의 마스터 권한을 가지도록 구성될 수 있다.In some examples, the neural network memory system 400 may be configured to include an artificial neural network memory controller, a processor, and a system bus configured to control communication of the memory. In addition, at least one artificial neural network memory control unit may be configured to have a master authority of the system bus.

부연 설명하면, 인공신경망 메모리 시스템(400)은 인공신경망 연산을 위한 전용 장치가 아닐 수 있다. 이러한 경우, 인공신경망 메모리 시스템(400)의 시스템버스에는 와이파이, 디스플레이, 카메라, 마이크 등 다양한 주변 장치들이 연결될 수 있다. 이러한 경우, 인공신경망 메모리 시스템(400)은 안정적인 인공신경망 연산을 위해서 시스템버스의 대역폭을 제어하도록 구성될 수 있다.In other words, the artificial neural network memory system 400 may not be a dedicated device for artificial neural network computation. In this case, various peripheral devices such as Wi-Fi, a display, a camera, and a microphone may be connected to the system bus of the artificial neural network memory system 400 . In this case, the artificial neural network memory system 400 may be configured to control the bandwidth of the system bus for stable artificial neural network operation.

몇몇 예시에서는, 적어도 하나의 인공신경망 메모리 제어부는, 메모리 접근 요청의 처리 시간동안 인공신경망 연산을 우선 처리하도록 동작하고, 이외의 시간 동안 인공신경망 연산 이외의 연산을 처리하도록 구성될 수 있다.In some examples, the at least one artificial neural network memory control unit may be configured to operate to preferentially process an artificial neural network operation during a processing time of a memory access request, and to process operations other than the artificial neural network operation during other times.

몇몇 예시에서는, 적어도 하나의 인공신경망 메모리 제어부는 적어도 하나의 메모리가 메모리 접근 요청을 완료할 때까지, 시스템버스의 실효 대역폭을 확보하도록 구성될 수 있다. In some examples, the at least one artificial neural network memory controller may be configured to secure an effective bandwidth of the system bus until the at least one memory completes the memory access request.

몇몇 예시에서는, 적어도 하나의 인공신경망 메모리 제어부는 시스템버스 내부에 배치되고, 시스템버스는 시스템버스 내에서 생성된 인공신경망 데이터 지역성 패턴에 기초하여 시스템버스의 대역폭을 동적으로 가변 하도록 구성될 수 있다.In some examples, the at least one artificial neural network memory controller may be disposed in the system bus, and the system bus may be configured to dynamically vary the bandwidth of the system bus based on the artificial neural network data locality pattern generated in the system bus.

몇몇 예시에서는, 적어도 하나의 인공신경망 메모리 제어부는 시스템버스 내에 배치되고, 적어도 하나의 인공신경망 메모리 제어부는 적어도 하나의 메모리가 메모리 접근 요청에 대한 응답을 완료할 때까지, 시스템버스의 제어 권한을 메모리 접근 요청이 없을 때보다 상대적으로 더 높게 증가시키도록 구성될 수 있다.In some examples, the at least one artificial neural network memory controller is disposed in the system bus, and the at least one artificial neural network memory controller stores the control authority of the system bus until the at least one memory completes a response to the memory access request. It can be configured to increase relatively higher than when there is no access request.

몇몇 예시에서는, 적어도 하나의 인공신경망 메모리 제어부는, 복수의 프로세서 중 인공신경망 연산을 처리하는 프로세서의 데이터 접근 요청의 우선 순위를 인공신경망 연산 이외의 연산을 처리하는 프로세서보다 더 높게 설정하도록 구성될 수 있다.In some examples, the at least one artificial neural network memory control unit may be configured to set the priority of a data access request of a processor processing an artificial neural network operation among a plurality of processors higher than a processor processing an operation other than the artificial neural network operation. there is.

몇몇 예시에서는, 인공신경망 메모리 제어부가 메모리를 직접 제어하도록 구성될 수 있다. In some examples, the artificial neural network memory controller may be configured to directly control the memory.

In some examples, the artificial neural network memory controller may be included in the memory, and the artificial neural network memory controller may be configured to generate at least one access queue. The artificial neural network memory controller may be configured to separately generate an access queue dedicated to artificial neural network operations.

In some examples, at least one of the plurality of memories may be a SAM or a DRAM. In this case, the at least one artificial neural network memory controller may be configured to readjust the order of the memory access requests in the access queue. This readjustment may be an access queue re-order.

In some examples, the artificial neural network memory controller may be configured to include access queues for a plurality of memory access requests. In this case, a first access queue may be dedicated to artificial neural network operations, and a second access queue may hold requests for operations other than artificial neural network operations. The artificial neural network memory controller may be configured to select among the access queues according to a priority setting and provide data accordingly.
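The two-queue arrangement can be illustrated with a minimal Python sketch. The class name `AccessQueues`, its methods, and the strict-priority policy are assumptions made for illustration; the disclosure only states that an access queue is selected according to a priority setting.

```python
from collections import deque


class AccessQueues:
    """Hypothetical model of a controller that keeps two access queues."""

    def __init__(self, ann_first=True):
        self.ann_queue = deque()    # first access queue: ANN operations only
        self.other_queue = deque()  # second access queue: all other operations
        self.ann_first = ann_first  # priority setting (assumed to be a simple flag)

    def push(self, request, is_ann):
        """Append a request to the queue that matches its origin."""
        (self.ann_queue if is_ann else self.other_queue).append(request)

    def pop(self):
        """Return the next request according to the priority setting, or None."""
        if self.ann_first:
            first, second = self.ann_queue, self.other_queue
        else:
            first, second = self.other_queue, self.ann_queue
        if first:
            return first.popleft()
        if second:
            return second.popleft()
        return None


queues = AccessQueues(ann_first=True)
queues.push("camera_dma", is_ann=False)
queues.push("weights_layer1", is_ann=True)
queues.push("feature_map_layer1", is_ann=True)
order = [queues.pop() for _ in range(3)]
```

With `ann_first=True`, both artificial neural network requests are served before the peripheral request even though the peripheral request arrived first.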

In some examples, the at least one artificial neural network memory controller is configured to calculate, based on the artificial neural network data locality pattern, the specific bandwidth that the system bus requires to process a specific memory access request, and may be configured to control the effective bandwidth of the system bus based on that specific bandwidth.

According to the configurations described above, the artificial neural network memory system 400 may be configured to lower the priority of memory access requests from various peripheral devices, or to raise the priority of predicted data access requests that are based on the artificial neural network data locality pattern.

According to the configurations described above, the artificial neural network memory controller reorders the processing of data access requests on the system bus so that the bandwidth of the system bus is used as fully as possible while an artificial neural network operation is being processed, and may yield the bandwidth to the data of other peripheral devices when no artificial neural network operation is in progress.

According to the configurations described above, the artificial neural network memory controller may readjust the priority of data access requests based on the artificial neural network data locality pattern. It may also readjust the priority based on identification information included in a data access request. That is, from the viewpoint of the artificial neural network operation, the effective bandwidth of the system bus is varied dynamically, so the effective bandwidth can be improved. Accordingly, the operating efficiency of the system bus can be improved, and the effective bandwidth of the system bus can be improved from the standpoint of the artificial neural network memory controller.

In some examples, the at least one artificial neural network memory controller may be configured to machine-learn data access requests. That is, the at least one artificial neural network memory controller may further include an artificial neural network model configured to machine-learn the artificial neural network data locality pattern. Because the artificial neural network data locality pattern is machine-learned, the controller may be configured to also learn and predict irregular patterns in which other data access requests interrupt the processing of data access requests that follow the actual artificial neural network data locality.

The artificial neural network model embedded in the artificial neural network memory controller may be machine-learned so that, when a predicted data access request is generated, the control authority of the system bus is raised above its level when no predicted data access request is generated.

In some examples, the at least one artificial neural network memory controller further includes a plurality of hierarchical cache memories and may be configured to machine-learn the data access requests between the layers of the plurality of hierarchical cache memories.

In some examples, the at least one artificial neural network memory controller may be configured to further receive at least one of the effective bandwidth, power consumption, and latency information of each layer of the plurality of hierarchical cache memories.

According to the configuration described above, the artificial neural network memory controller may be configured to generate the artificial neural network data locality pattern through machine learning. When various data access requests unrelated to the artificial neural network operation are generated with a specific pattern, the machine-learned artificial neural network data locality pattern can improve the probability of predicting the occurrence of such patterns. In addition, reinforcement learning can be used to predict the characteristics of the various artificial neural network models and other operations processed by the processor, which can improve the efficiency of the artificial neural network operation.

In some examples, the at least one artificial neural network memory controller may be configured to divide the data to be stored in a plurality of memories and store it based on the effective bandwidth and latency of each of the plurality of memories.

For example, the data consists of a bit group of L bits, and the plurality of memories further includes a first memory and a second memory. The first memory is configured to store M bits of the L-bit group, divided out based on a first effective bandwidth or a first latency, and the second memory is configured to store N bits of the L-bit group, divided out based on a second effective bandwidth or a second latency. The sum of M and N may be configured to be less than or equal to L. In addition, the plurality of memories may further include a third memory configured to store O bits of the L-bit group based on a third effective bandwidth or a third latency, and the sum of M, N, and O may be configured to equal L.

For example, the data consists of P data bundles, and the plurality of memories includes a first memory and a second memory. The first memory is configured to store R of the P data bundles based on a first effective bandwidth or a first latency, and the second memory is configured to store S of the P data bundles based on a second effective bandwidth or a second latency. The sum of R and S may be configured to be less than or equal to P. In addition, the plurality of memories may further include a third memory configured to store T of the P data bundles based on a third effective bandwidth or a third latency, and the sum of R, S, and T may be configured to equal P.
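One way to choose R, S, and T is to share the P data bundles in proportion to each memory's effective bandwidth. The following Python sketch does this with a largest-remainder rule. The function name `split_bundles`, the proportional policy, and the integer bandwidth units are assumptions; the disclosure only requires that the split be based on effective bandwidth or latency.

```python
def split_bundles(p, bandwidths):
    """Share p data bundles across memories in proportion to bandwidth.

    bandwidths: positive integers (for example MB/s), one per memory.
    Returns a list of bundle counts whose sum is exactly p.
    """
    total = sum(bandwidths)
    # Integer share for each memory, rounded down.
    shares = [p * bw // total for bw in bandwidths]
    # Hand out the bundles lost to rounding, largest remainder first.
    remainders = [p * bw % total for bw in bandwidths]
    leftover = p - sum(shares)
    by_remainder = sorted(range(len(bandwidths)),
                          key=lambda i: remainders[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


r, s, t = split_bundles(10, [3, 2, 1])  # three memories, P = 10
```

Here the counts always sum to P, which corresponds to the three-memory case in which R, S, and T together equal P.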

According to the configuration described above, when the bandwidth of a single memory is low, the artificial neural network memory controller can distribute data across a plurality of memories for storing or reading, which can improve the effective bandwidth of the memory. For example, the artificial neural network memory controller may be configured to store or read an 8-bit quantized weight value by dividing it into 4 bits in the first memory and 4 bits in the second memory. Therefore, the effective bandwidth of the memory can be improved from the standpoint of the artificial neural network memory controller.
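The 8-bit example can be sketched in Python as follows. Which half of the weight goes to which memory is not specified in the text, so placing the upper 4 bits in the first memory is an assumption, and the names `split_weight` and `merge_weight` are hypothetical.

```python
def split_weight(weight):
    """Split an 8-bit value into (upper 4 bits, lower 4 bits)."""
    if not 0 <= weight <= 0xFF:
        raise ValueError("weight must fit in 8 bits")
    return weight >> 4, weight & 0x0F


def merge_weight(upper, lower):
    """Rebuild the 8-bit value from its two 4-bit halves."""
    return (upper << 4) | lower


first_memory, second_memory = [], []  # stand-ins for the two memories
weights = [0x3C, 0xA5, 0xFF, 0x00]
for w in weights:
    upper, lower = split_weight(w)
    first_memory.append(upper)   # assumed: upper nibble to the first memory
    second_memory.append(lower)  # assumed: lower nibble to the second memory

merged = [merge_weight(u, l) for u, l in zip(first_memory, second_memory)]
```

Reading both halves in parallel and merging them recovers the original weights, so each memory only has to deliver half of the bits.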

The artificial neural network memory controller may be configured to further include a cache memory configured to merge and store data that has been divided and stored across a plurality of memories. That is, the at least one artificial neural network memory controller further includes a cache memory and may be configured to merge the data distributed across the plurality of memories and store it in the cache memory. The processor can therefore be provided with the merged data.

To merge the divided data, the at least one artificial neural network memory controller may be configured to store division information for the data divided and stored across the plurality of memories.

Various examples of the present disclosure may be described as follows.

According to examples of the present disclosure, an artificial neural network memory system may be configured to include at least one processor configured to generate data access requests corresponding to an artificial neural network operation, and at least one artificial neural network memory controller configured to generate an artificial neural network data locality pattern of the artificial neural network operation by sequentially recording the data access requests, and to generate, based on the artificial neural network data locality pattern, a predicted data access request that predicts the actual data access request following a data access request generated by the at least one processor. Here, the artificial neural network data locality may be artificial neural network data locality reconstructed at the processor-memory level.

According to examples of the present disclosure, an artificial neural network memory system may be configured to include at least one processor configured to process an artificial neural network model, and at least one artificial neural network memory controller configured to store artificial neural network data locality information of the artificial neural network model and to generate a predicted data access request by predicting, based on the artificial neural network data locality information, the data that the at least one processor will request.

The artificial neural network memory system may be configured to further include at least one memory and a system bus configured to control communication among the artificial neural network memory controller, the at least one processor, and the at least one memory.

According to examples of the present disclosure, an artificial neural network memory system includes a processor, a memory, and a cache memory, and may be configured to generate, based on artificial neural network data locality information, a predicted data access request including the data that the processor will request, and to store the data corresponding to the predicted data access request from the memory into the cache memory before the processor requests it.

According to examples of the present disclosure, the artificial neural network memory system may be configured to operate in one of a first mode, in which it operates on artificial neural network data locality information that is provided to it, and a second mode, in which it operates by observing the data access requests generated by the processor and predicting the artificial neural network data locality information.

The at least one artificial neural network memory controller may be configured to further generate predicted data access requests sequentially based on the artificial neural network data locality pattern.

The at least one artificial neural network memory controller may be configured to generate the predicted data access request before the actual data access request is generated.

The at least one processor may be configured to transmit the data access request to the at least one artificial neural network memory controller.

The at least one artificial neural network memory controller may be configured to output the predicted data access request in response to the data access request.

The data access request may be configured to further include a memory address.

The data access request may be configured to further include a start address and an end address of the memory.

The at least one artificial neural network memory controller may be configured to generate a memory access request based on one of the data access request generated by the at least one processor and the predicted data access request generated by the artificial neural network memory controller.

The data access request may be configured to further include a start address of the memory and a continuous-read trigger for the data that follows it.

The data access request may be configured to further include a start address of the memory and information on the number of consecutive data items.

The data access request and the advance (that is, predicted) data access request may be configured to further include matching data access request tokens for the same memory address.

The data access request may be configured to further include identification information indicating whether the request is a memory read command or a memory write command.

The data access request may be configured to further include identification information indicating whether the request is an overwrite command.

The data access request may be configured to further include identification information indicating whether the data is inference data, weight data, or feature map data.

The data access request may be configured to further include identification information indicating whether the data is training data or evaluation data.

The data access request may be configured to further include identification information indicating whether the artificial neural network operation is an operation for training or an operation for inference.
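The optional fields listed above can be gathered into a single record. The Python sketch below is only one possible layout: the class and field names, the enum values, and the rule that one address holds one data item are all assumptions, since the disclosure does not define an encoding.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Command(Enum):
    READ = "read"
    WRITE = "write"
    OVERWRITE = "overwrite"


class DataKind(Enum):
    INFERENCE = "inference"
    WEIGHT = "weight"
    FEATURE_MAP = "feature_map"
    TRAINING = "training"
    EVALUATION = "evaluation"


class Phase(Enum):
    TRAINING = "training"
    INFERENCE = "inference"


@dataclass(frozen=True)
class DataAccessRequest:
    """Hypothetical container for the fields a data access request may carry."""
    start_address: int
    count: int = 1                    # number of consecutive data items
    command: Command = Command.READ   # read, write, or overwrite
    kind: Optional[DataKind] = None   # what the data is
    phase: Optional[Phase] = None     # training or inference operation
    token: Optional[int] = None       # pairs a request with its prediction

    @property
    def end_address(self) -> int:
        """Last address covered, assuming one address per data item."""
        return self.start_address + self.count - 1


actual = DataAccessRequest(0x1000, count=16, kind=DataKind.WEIGHT,
                           phase=Phase.INFERENCE, token=7)
predicted = DataAccessRequest(0x1000, count=16, kind=DataKind.WEIGHT,
                              phase=Phase.INFERENCE, token=7)
```

Because the dataclass is frozen and compares field by field, `actual == predicted` gives a direct check of whether two requests are the same request.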

When the at least one processor generates an actual data access request, the at least one artificial neural network memory controller may be configured to determine whether the predicted data access request and the actual data access request are the same request.

The at least one artificial neural network memory controller may be configured to maintain the artificial neural network data locality pattern when the predicted data access request and the actual data access request are the same.

The at least one artificial neural network memory controller may be configured to update the artificial neural network data locality pattern when the predicted data access request and the actual data access request differ.
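The compare, maintain, and update behavior can be sketched in Python as follows. The class `LocalityPredictor` is hypothetical, and the update rule (overwrite the mismatching entry with the observed address) is an assumption chosen for brevity; the disclosure does not specify how the pattern is updated.

```python
class LocalityPredictor:
    """Hypothetical predictor that walks a recorded address pattern."""

    def __init__(self, pattern):
        self.pattern = list(pattern)  # recorded sequence of memory addresses
        self.position = 0             # index of the next expected request
        self.updates = 0              # how many times the pattern was changed

    def predict(self):
        """Address expected for the next actual data access request."""
        return self.pattern[self.position]

    def observe(self, actual_address):
        """Compare prediction and actual request, then advance one step."""
        hit = self.predict() == actual_address
        if not hit:
            # Assumed update rule: replace the stale entry in place.
            self.pattern[self.position] = actual_address
            self.updates += 1
        self.position = (self.position + 1) % len(self.pattern)
        return hit


predictor = LocalityPredictor([0x100, 0x200, 0x300])
results = [predictor.observe(a) for a in (0x100, 0x200, 0x340, 0x100)]
```

The third request differs from the prediction, so the pattern is updated once; the other requests match and leave the pattern unchanged.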

The artificial neural network data locality pattern may be configured to further include data in which the memory addresses of the data access requests are recorded sequentially.

The at least one artificial neural network memory controller may be configured to generate the artificial neural network data locality pattern by detecting a repeating pattern in the memory addresses included in the data access requests.

The artificial neural network data locality pattern may consist of memory addresses that have a repeating loop characteristic.
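Detecting a repeating loop in a recorded address log can be sketched in Python as follows. The function name `find_loop` and the exact-match criterion are assumptions; a practical controller would likely need to tolerate interleaved requests from other sources.

```python
def find_loop(addresses, min_repeats=2):
    """Return the shortest address loop that reproduces the whole log.

    The log must contain at least min_repeats full passes of the loop.
    Returns None when no such loop exists.
    """
    n = len(addresses)
    for period in range(1, n // min_repeats + 1):
        # A period fits if every entry equals the entry one loop earlier.
        if all(addresses[i] == addresses[i % period] for i in range(n)):
            return addresses[:period]
    return None


log = [0x100, 0x200, 0x300, 0x100, 0x200, 0x300, 0x100]
loop = find_loop(log)
```

The returned loop plays the role of the data locality pattern: once it is known, the address after `0x300` can be predicted to be `0x100` again.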

The artificial neural network data locality pattern may be configured to further include identification information that identifies the start and end of the operation of the artificial neural network model.

The at least one processor may be configured to receive the data corresponding to the data access request from the artificial neural network memory controller.

The at least one artificial neural network memory controller may be configured to further include an artificial neural network model configured to machine-learn the artificial neural network data locality pattern.

The at least one artificial neural network memory controller may be configured to store the updated pattern and the previous pattern of the artificial neural network data locality pattern in order to determine whether the artificial neural network model has changed.

The at least one artificial neural network memory controller may be configured to determine whether the data access requests are requests from a single artificial neural network model or a mixture of requests from a plurality of artificial neural network models.

When there are a plurality of artificial neural network models, the at least one artificial neural network memory controller may be configured to further generate artificial neural network data locality patterns corresponding to the number of artificial neural network models.

The at least one artificial neural network memory controller may be configured to generate the corresponding predicted data access requests based on the respective artificial neural network data locality patterns.

The at least one artificial neural network memory controller may be configured to further generate a memory access request corresponding to the data access request.

The at least one artificial neural network memory controller may be configured to further generate a memory access request corresponding to the predicted data access request.

The data access request, the predicted data access request, and the memory access request may each be configured to include a corresponding memory address value and operation mode.

The at least one artificial neural network memory controller may be configured to further generate a memory access request that includes at least part of the information included in the data access request and the predicted data access request.

The artificial neural network memory system may further include at least one memory configured to communicate with the at least one artificial neural network memory controller, and the at least one memory may be configured to operate in response to the memory access request output from the at least one artificial neural network memory controller.

The at least one memory may be configured to store at least one of inference data, weight data, and feature map data.

The at least one artificial neural network memory controller may be configured to further include a cache memory configured to store the data transmitted by the at least one memory in response to the memory access request.

When the at least one processor outputs an actual data access request, the at least one artificial neural network memory controller determines whether the predicted data access request and the actual data access request are the same request. If they are the same, the at least one artificial neural network memory controller is configured to provide the data stored in the cache memory to the at least one processor. If they are not the same, the at least one artificial neural network memory controller may be configured to generate a new memory access request based on the actual data access request.
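The hit and miss paths can be sketched in Python as follows. The class `PrefetchingController`, the dictionary that stands in for the memory, and the single-entry cache are simplifying assumptions for illustration.

```python
class PrefetchingController:
    """Hypothetical controller that prefetches one predicted address."""

    def __init__(self, memory, pattern):
        self.memory = memory          # dict: address -> data (stand-in memory)
        self.pattern = list(pattern)  # data locality pattern (address loop)
        self.position = 0
        self.cache = {}               # single-entry cache for brevity
        self.memory_accesses = 0
        self._prefetch()

    def _read_memory(self, address):
        self.memory_accesses += 1
        return self.memory[address]

    def _prefetch(self):
        """Issue the memory access for the predicted request ahead of time."""
        predicted = self.pattern[self.position]
        self.cache = {predicted: self._read_memory(predicted)}

    def request(self, address):
        """Serve an actual data access request; returns (data, hit)."""
        if address in self.cache:
            data, hit = self.cache[address], True         # prediction matched
        else:
            data, hit = self._read_memory(address), False  # new memory access
        self.position = (self.position + 1) % len(self.pattern)
        self._prefetch()
        return data, hit


memory = {0x10: "w0", 0x20: "w1", 0x30: "x"}
controller = PrefetchingController(memory, pattern=[0x10, 0x20])
first = controller.request(0x10)   # matches the prediction
second = controller.request(0x30)  # differs from the prediction
```

On a match the processor is served from the cache without waiting on the memory; on a mismatch the controller falls back to a fresh memory access for the actual request.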

The at least one artificial neural network memory controller may be configured to sequentially generate one or more memory access requests based on the remaining capacity of the cache memory so that the remaining capacity of the cache memory is minimized.

The at least one artificial neural network memory controller may be configured to measure the effective bandwidth of the at least one memory responding to the memory access request.

The at least one artificial neural network memory controller may be configured to receive information on the required bandwidth of the at least one memory responding to the memory access request.

The at least one artificial neural network memory controller may be configured to measure the inferences per second (IPS) of the artificial neural network operation by counting the number of repetitions of the artificial neural network data locality pattern during a specific period of time.

The at least one artificial neural network memory controller may be configured to calculate the effective bandwidth required by the artificial neural network operation by calculating the time and the data size required for one repetition of the artificial neural network data locality pattern.
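Both measurements reduce to simple ratios, shown here as a short Python sketch. The function names and the sample figures are illustrative assumptions, not values from the disclosure.

```python
def inferences_per_second(repetitions, window_seconds):
    """IPS: pattern repetitions observed divided by the observation window."""
    return repetitions / window_seconds


def required_bandwidth(bytes_per_repetition, seconds_per_repetition):
    """Bytes per second needed to sustain one pattern repetition."""
    return bytes_per_repetition / seconds_per_repetition


# Illustrative numbers only: 120 repetitions seen in a 4 s window,
# and 50 MB transferred per repetition that takes 0.25 s.
ips = inferences_per_second(repetitions=120, window_seconds=4.0)
bandwidth = required_bandwidth(bytes_per_repetition=50_000_000,
                               seconds_per_repetition=0.25)
```

With these figures the pattern repeats 30 times per second, and one repetition requires 200 MB/s of effective bandwidth.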

The at least one memory further includes a DRAM that has a refresh function capable of renewing the voltage of the memory cells, and the at least one artificial neural network memory controller may be configured to selectively control the refresh of the memory address region of the at least one memory that corresponds to the memory access request for the predicted data access request.

The at least one memory further includes a precharge function capable of charging a global bit line of the memory to a specific voltage, and the at least one artificial neural network memory controller may be configured to selectively provide precharge to the memory address region of the at least one memory that corresponds to the memory access request for the predicted data access request.

The at least one memory further includes a plurality of memories, and the at least one artificial neural network memory controller may be configured to measure the effective bandwidth of each of the plurality of memories.

The at least one memory further includes a plurality of memories, and the at least one artificial neural network memory controller may be configured to measure the latency of each of the plurality of memories.

적어도 하나의 메모리는 복수의 메모리를 더 포함하고, 적어도 하나의 인공신경망 메모리 제어부는 복수의 메모리 각각의 실효 대역폭 및 지연시간에 기초하여 복수의 메모리에 저장되는 데이터를 분할하여 저장하도록 구성될 수 있다.The at least one memory further includes a plurality of memories, and the at least one artificial neural network memory control unit may be configured to divide and store data stored in the plurality of memories based on the effective bandwidth and delay time of each of the plurality of memories. .

데이터는 L 비트의 비트 그룹으로 구성되고, 복수의 메모리는 제1 메모리 및 제2 메모리를 더 포함하고, 제1 메모리는 제1 실효 대역폭 또는 제1 지연시간에 기초하여 L 비트의 비트 그룹 중 M 비트의 데이터를 분할하여 저장하도록 구성되고, 제2 메모리는 제2 실효 대역폭 또는 제2 지연시간에 기초하여 L 비트의 비트 그룹 중 N 비트의 데이터를 분할하여 저장하도록 구성되고, M 비트와 N 비트의 합은 L 비트와 같거나 또는 작도록 구성될 수 있다The data is composed of a bit group of L bits, and the plurality of memories further include a first memory and a second memory, wherein the first memory includes M among the bit group of L bits based on the first effective bandwidth or the first delay time. The second memory is configured to divide and store data of bits, and the second memory is configured to divide and store N bits of data among the L bits bit group based on the second effective bandwidth or the second delay time, and the M bits and the N bits The sum of can be configured to be less than or equal to L bits.

복수의 메모리는 제3 메모리를 더 포함하고, 제3 메모리는 제3 실효 대역폭 또는 제3 지연시간에 기초하여 L 비트의 비트 그룹 중 O 비트의 데이터를 저장하도록 구성되고, M 비트, N 비트 및 O 비트의 합은 L 비트와 같도록 구성될 수 있다.The plurality of memories further includes a third memory, wherein the third memory is configured to store data of O bits of the bit group of L bits based on the third effective bandwidth or the third delay time, the M bits, N bits and The sum of O bits may be configured to be equal to L bits.

적어도 하나의 인공신경망 메모리 제어부는, 복수의 메모리에 분할되어 저장된 데이터를 병합하여 저장하도록 구성된 캐쉬 메모리를 더 포함하도록 구성될 수 있다.The at least one artificial neural network memory controller may be configured to further include a cache memory configured to merge and store data divided and stored in a plurality of memories.

데이터는 P개의 데이터 묶음으로 구성되고, 복수의 메모리는 제1 메모리 및 제2 메모리를 더 포함하고, 제1 메모리는 제1 실효 대역폭 또는 제1 지연시간에 기초하여 P개의 데이터 묶음 중 R개의 데이터 묶음을 저장하도록 구성되고, 제2 메모리는 제2 실효 대역폭 또는 제2 지연시간에 기초하여 상기 P개의 데이터 묶음 중 S개의 데이터 묶음을 저장하도록 구성되고, R개와 상기 S개의 합은 상기 P개와 같거나 또는 작도록 구성될 수 있다.The data is composed of P data bundles, and the plurality of memories further include a first memory and a second memory, wherein the first memory includes R data of the P data bundles based on the first effective bandwidth or the first delay time. and the second memory is configured to store S data bundles among the P data bundles based on a second effective bandwidth or a second delay time, wherein R and the S sum are equal to the P or may be configured to be small.

복수의 메모리는 제3 메모리를 더 포함하고, 제3 메모리는 제3 실효 대역폭 또는 제3 지연시간에 기초하여 P개의 데이터 묶음 중 T개의 데이터 묶음을 저장하도록 구성되고, R개, 상기 S개 및 상기 T개의 합은 상기 P개와 같도록 구성될 수 있다.The plurality of memories further includes a third memory, wherein the third memory is configured to store T data bundles of the P data bundles based on a third effective bandwidth or a third delay time, R, the S and The sum of the T numbers may be configured to be equal to the P numbers.

적어도 하나의 메모리는 복수의 메모리를 더 포함하고, 적어도 하나의 인공신경망 메모리 제어부는, 캐쉬 메모리를 더 포함하고, 적어도 하나의 인공신경망 메모리 제어부는 복수의 메모리에 분배되어 저장된 데이터를 병합하여 캐쉬 메모리에 저장하도록 구성될 수 있다.The at least one memory further includes a plurality of memories, the at least one artificial neural network memory controller further includes a cache memory, and the at least one artificial neural network memory controller merges data stored in the plurality of memories distributed to the cache memory. It can be configured to store in .

적어도 하나의 메모리는 복수의 메모리를 더 포함하고, 적어도 하나의 인공신경망 메모리 제어부는 복수의 메모리에 분할되어 저장된 데이터의 분할 정보를 저장하도록 구성될 수 있다.The at least one memory may further include a plurality of memories, and the at least one artificial neural network memory controller may be configured to store division information of data divided and stored in the plurality of memories.

The at least one artificial neural network memory controller may be configured to store in the cache memory, based on the predicted data access request and the latency value of the at least one memory, a portion of the data corresponding to the latency.

적어도 하나의 인공신경망 메모리 제어부는 예측된 데이터 접근 요청 및 적어도 하나의 메모리의 데이터 대역폭 요구량에 기초하여 캐쉬 메모리에 상기 데이터의 일부를 저장하도록 구성될 수 있다.The at least one artificial neural network memory control unit may be configured to store a portion of the data in the cache memory based on the predicted data access request and the data bandwidth requirement of the at least one memory.

When the at least one processor generates an actual data access request, the at least one artificial neural network memory controller may be configured to first provide the data stored in the cache memory while controlling the at least one memory in a read-burst mode to read the remainder of the data, thereby reducing the latency of the at least one memory.

Based on the predicted data access request and the latency value of the at least one memory, the at least one artificial neural network memory controller may be configured to start the read-burst mode of the at least one memory earlier, by the latency value, than the time at which the at least one processor generates the actual data access request, thereby reducing the latency of the at least one memory.
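The two latency-hiding options above can be sketched numerically. This is a rough illustration under stated assumptions: the function names, the nanosecond units, and the idea that the processor consumes data at a constant rate are all hypothetical and are not taken from the disclosure.

```python
import math


def bytes_to_prefetch(latency_ns, consume_bytes_per_ns):
    """Amount of data to keep in the cache so the processor stays busy
    for one memory latency period (assumes a constant consumption rate)."""
    return math.ceil(latency_ns * consume_bytes_per_ns)


def burst_start_time(predicted_request_ns, latency_ns):
    """Time at which to start the read burst so that data arrives when the
    actual request is expected. Clamped at zero for requests that are
    predicted to arrive sooner than one latency period from now."""
    return max(0, predicted_request_ns - latency_ns)


# Illustrative numbers only: 50 ns memory latency, 8 bytes consumed per ns.
cache_bytes = bytes_to_prefetch(50, 8)
start_ns = burst_start_time(1000, 50)
```

In the first option the cache covers the latency window; in the second the burst is simply issued one latency period ahead of the predicted request.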

The artificial neural network memory system may further include a system bus configured to control communication among the artificial neural network memory controller, the at least one processor, and the at least one memory.

적어도 하나의 인공신경망 메모리 제어부는 시스템 버스의 마스터 권한을 가지도록 구성될 수 있다.At least one artificial neural network memory control unit may be configured to have a master authority of the system bus.

The at least one artificial neural network memory controller may further include an artificial neural network model, and the artificial neural network model may be machine-trained to raise the control authority over the system bus when a predicted data access request is generated, relative to when no predicted data access request is generated.

적어도 하나의 인공신경망 메모리 제어부는 적어도 하나의 메모리가 상기 메모리 접근 요청을 완료할 때까지, 시스템 버스의 실효 대역폭을 확보하도록 구성될 수 있다.The at least one artificial neural network memory control unit may be configured to secure an effective bandwidth of the system bus until the at least one memory completes the memory access request.

The at least one artificial neural network memory controller may calculate, based on the artificial neural network data locality pattern, the specific bandwidth required of the system bus to process a specific memory access request, and may be configured to control the effective bandwidth of the system bus based on that specific bandwidth.

적어도 하나의 인공신경망 메모리 제어부는 시스템 버스 내부에 배치되고, 시스템 버스는 시스템 버스 내에서 생성된 인공신경망 데이터 지역성 패턴에 기초하여 시스템 버스의 대역폭을 동적으로 가변 하도록 구성될 수 있다.The at least one artificial neural network memory controller may be disposed inside the system bus, and the system bus may be configured to dynamically vary the bandwidth of the system bus based on the artificial neural network data locality pattern generated in the system bus.

적어도 하나의 인공신경망 메모리 제어부는 메모리 접근 요청의 처리 시간동안 인공신경망 연산을 우선 처리하도록 동작하고, 이외의 시간 동안 인공신경망 연산 이외의 연산을 처리하도록 구성될 수 있다.The at least one artificial neural network memory control unit may be configured to operate to preferentially process an artificial neural network operation during a processing time of a memory access request, and to process an operation other than an artificial neural network operation for other times.

적어도 하나의 인공신경망 메모리 제어부와 적어도 하나의 프로세서는 직접 통신하도록 구성될 수 있다.The at least one artificial neural network memory controller and the at least one processor may be configured to communicate directly.

인공신경망 메모리 제어부는 인공신경망 연산 전용 접근 순서인 제1 접근 순서 및 인공신경망 연산 이외의 접근 순서인 제2 접근 순서를 더 포함하고, 인공신경망 메모리 제어부는 우선순위 설정에 따라서 각각의 접근 순서를 선택하여 데이터를 제공하도록 구성될 수 있다.The artificial neural network memory control unit further includes a first access order that is an access order dedicated to artificial neural network operation and a second access order that is an access order other than artificial neural network operation, and the artificial neural network memory control unit selects each access order according to a priority setting to provide data.

The at least one artificial neural network memory controller may further include a plurality of layered cache memories and an artificial neural network model configured to machine-learn the data access requests between the layers of the plurality of layered cache memories.

적어도 하나의 인공신경망 메모리 제어부는 계층화 된 복수의 캐쉬 메모리 각각의 계층의 실효 대역폭, 소비 전력, 및 레이턴시 정보 중 적어도 하나를 더 제공 받도록 구성될 수 있다.The at least one artificial neural network memory controller may be configured to further receive at least one of effective bandwidth, power consumption, and latency information of each layer of a plurality of layered cache memories.

An artificial neural network memory system may include: at least one processor configured to generate a data access request corresponding to an artificial neural network operation; at least one artificial neural network memory controller configured to store an artificial neural network data locality pattern of the artificial neural network operation generated by a compiler, and to generate, based on the artificial neural network data locality pattern, a predicted data access request that predicts the actual data access request to be generated by the at least one processor; and at least one memory configured to communicate with the at least one artificial neural network memory controller, wherein the at least one memory is configured to operate in response to a memory access request output from the at least one artificial neural network memory controller.

적어도 하나의 인공신경망 메모리 시스템은 적어도 하나의 메모리 및 인공신경망 메모리 제어부, 적어도 하나의 프로세서, 및 적어도 하나의 메모리의 통신을 제어하도록 구성된 시스템 버스를 더 포함하도록 구성될 수 있다.The at least one neural network memory system may be configured to further include at least one memory and a system bus configured to control communication of the neural network memory controller, the at least one processor, and the at least one memory.

The at least one artificial neural network memory controller may be disposed in the system bus, and may be configured to raise the control authority over the system bus, until the at least one memory completes its response to the memory access request, relative to when there is no memory access request.

적어도 하나의 인공신경망 메모리 제어부의 적어도 일부는 DRAM에 포함되도록 구성될 수 있다.At least a portion of the at least one artificial neural network memory controller may be configured to be included in the DRAM.

적어도 하나의 인공신경망 메모리 제어부의 적어도 일부는 적어도 하나의 프로세서에 포함되도록 구성될 수 있다.At least a portion of the at least one artificial neural network memory controller may be configured to be included in the at least one processor.

The system may further include a DRAM, or the at least one memory may be a DRAM, and the at least one artificial neural network memory controller may be configured to readjust the access queue of memory access requests. That is, it may be configured to control the reorder queue of the memory controller of the DRAM.

The memory access request related to an artificial neural network operation, which the artificial neural network memory controller provides to the memory controller of the memory, may be configured to further include priority information that the memory controller of the memory can interpret.

According to the above configuration, the memory controller of the memory may be configured to re-order the memory access sequence inside the memory controller based on the priority information included in the memory access request generated by the artificial neural network memory controller, regardless of whether that request is related to an artificial neural network operation. Therefore, memory access requests for artificial neural network operation processing can be processed ahead of other types of memory access requests. Accordingly, the artificial neural network memory controller can increase the effective bandwidth of the corresponding memory.

DRAM의 메모리 컨트롤러가 결정한 메모리 접근 요청 처리 순서를 인공신경망 메모리 제어부가 제공하는 우선순위 정보에 의해서 재조정하도록 구성될 수 있다. It may be configured to readjust the memory access request processing order determined by the memory controller of the DRAM according to the priority information provided by the artificial neural network memory controller.

예를 들면, 인공신경망 메모리 제어부가 생성한 메모리 접근 요청의 우선순위를 긴급으로 설정하면, DRAM의 메모리 컨트롤러는 해당 메모리 접근 요청의 처리 순서를 제1 순위로 변경할 수도 있다. For example, if the priority of the memory access request generated by the artificial neural network memory controller is set to urgent, the memory controller of the DRAM may change the processing order of the memory access request to the first priority.
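The reordering behavior in this example can be sketched as a small priority queue. This is only an illustration: the class name `ReorderQueue`, the two priority levels, and the request labels are hypothetical, and a real DRAM memory controller reorders commands under many additional timing constraints.

```python
import heapq
import itertools

# Hypothetical priority levels; a lower number is served first.
URGENT, NORMAL = 0, 1


class ReorderQueue:
    """Toy reorder queue: urgent requests jump ahead of normal ones, and
    requests with equal priority keep their arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker preserving FIFO order

    def push(self, request, priority=NORMAL):
        heapq.heappush(self._heap, (priority, next(self._arrival), request))

    def pop(self):
        return heapq.heappop(self._heap)[2]


queue = ReorderQueue()
queue.push("cpu_read_0")
queue.push("cpu_read_1")
queue.push("ann_kernel_read", priority=URGENT)  # tagged urgent by the ANN memory controller
first = queue.pop()  # the urgent request is served first
```

The arrival counter keeps ordinary requests in first-in, first-out order, so only requests that carry a higher priority are moved forward.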

인공신경망 메모리 제어부는 적어도 하나의 접근 순서를 생성하도록 구성될 수 있다.The artificial neural network memory controller may be configured to generate at least one access sequence.

적어도 하나의 메모리에 인공신경망 메모리 제어부가 포함되고, 인공신경망 메모리 제어부는 인공신경망 연산 전용 접근 순서를 별도로 생성하도록 구성될 수 있다.The at least one memory may include an artificial neural network memory controller, and the artificial neural network memory controller may be configured to separately generate an access sequence dedicated to artificial neural network computation.

적어도 하나의 인공신경망 메모리 제어부는 메모리 접근 요청의 접근 순서를 재조정하도록 구성될 수 있다.The at least one artificial neural network memory control unit may be configured to readjust the access order of the memory access request.

적어도 하나의 메모리는 읽기-버스트 기능을 더 포함하고, 적어도 하나의 인공신경망 메모리 제어부는 적어도 하나의 메모리의 저장 영역을 읽기-버스트 기능을 고려하여 설정하도록 구성될 수 있다.The at least one memory may further include a read-burst function, and the at least one artificial neural network memory controller may be configured to set a storage area of the at least one memory in consideration of the read-burst function.

적어도 하나의 메모리는 읽기-버스트 기능을 더 포함하고, 적어도 하나의 인공신경망 메모리 제어부는 적어도 하나의 메모리의 저장 영역을 읽기-버스트 기능을 고려하여 쓰기 동작을 처리 하도록 구성될 수 있다.The at least one memory may further include a read-burst function, and the at least one artificial neural network memory controller may be configured to process a write operation in the storage area of the at least one memory in consideration of the read-burst function.

The at least one processor may further include a plurality of processors, and the at least one artificial neural network memory controller may be configured to set the priority of data access requests from a processor that processes artificial neural network operations higher than that of a processor that processes operations other than artificial neural network operations.

Each of the at least one AMC may be configured to operate independently based on the ANN DL stored therein. The ANN DLs may be the same as or different from one another. Each AMC may be configured to have a different ANN DL depending on where it is placed. To elaborate, each AMC is configured to analyze the ANN DL of the ANN model processed on the communication bus where that AMC is located and to prepare the predicted data in advance. Accordingly, the first ANN DL recognized by a first AMC at a first placement position may differ from the second ANN DL recognized by a second AMC at a second placement position. Nevertheless, each AMC has the advantage of being able to operate independently at its own position based on its own ANN DL.

FIG. 15 is an exemplary diagram illustrating a substrate on which a memory is mounted and a channel.

As shown, a plurality of pins for communication with a memory bus are formed on the substrate, that is, the board, on which the memory is mounted.

상기 메모리의 메모리 버스는 주소 버스(예컨대, 17비트), 명령 및 제어 버스(예컨대, 6비트)와 그리고 데이터 버스(예컨대, 64비트)를 포함할 수 있다. 부연 설명하면, 상기 메모리 버스는 도 12에서 예시된 사이드 밴드 시그널을 선택적으로 더 포함하는 것도 가능하다. The memory bus of the memory may include an address bus (eg, 17 bits), a command and control bus (eg, 6 bits), and a data bus (eg, 64 bits). To elaborate, it is also possible that the memory bus optionally further includes the side band signal illustrated in FIG. 12 .

That is, according to the added sideband signal, the SAM controller according to various examples of the present disclosure may be configured to selectively partition and control the memory cell regions of the memory. However, the present disclosure is not limited thereto, and an IP header packet scheme may be used in place of the sideband signal.

본 명세서는 인공신경망(artificial neural network: ANN) 데이터 지역성(data locality: DL)에 따라 동작하도록 설정된 메모리, 예컨대 SAM(Sequential Access Memory) 및 SAM 제어부를 제시한다. SAM은 인공신경망 전용 메모리를 의미할 수 있다. SAM 제어부는 SAM을 제어하는 메모리 제어부를 의미할 수 있다.The present specification provides a memory configured to operate according to an artificial neural network (ANN) data locality (DL), for example, a sequential access memory (SAM) and a SAM controller. SAM may mean an artificial neural network dedicated memory. The SAM control unit may mean a memory control unit that controls the SAM.

That is, the SAM according to an example of the present disclosure may mean a memory specialized for artificial neural network processing that excludes the random access characteristic of DRAM and is set to operate sequentially according to artificial neural network data locality (ANN DL) information. However, the memory cell structure of the SAM is not limited to DRAM, and the SAM may also be applied to a memory having memory cells with a structure similar to that of DRAM. That is, sequential access information that allows memory addresses to be accessed sequentially can be derived from the ANN DL information.

The SAM may basically be configured to process read/write commands in BURST MODE. In this case, read/write commands may be set to operate in units of artificial neural network data locality (ANN DL) information. That is, the SAM controller may be configured to request memory operations from the SAM in ANN DL units. The memory map of the SAM may be set so that memory operations in ANN DL units effectively run in BURST MODE even without a separate BURST MODE command. Here, an ANN DL unit may mean the smallest unit of data access request that the processor issues to the memory or the AMC based on the ANN DL information.

Because the ANN DL is provided, the SAM can substantially eliminate the random access characteristic of the memory. Since the SAM effectively operates in BURST MODE based on ANN DL information, the frequency with which CAS latency and RAS latency occur can be minimized.

부연 설명하면, 종래의 메모리의 랜덤 액세스 동작은 프로세서의 메모리 오퍼레이션 순서를 예측하지 못하는 상황에서만 효율적이다. In other words, the conventional random access operation of the memory is effective only in a situation in which the memory operation order of the processor cannot be predicted.

이와 반대로 SAM은 ANN DL을 기초로 프로세서가 요청할 메모리 오퍼레이션 요청 순서를 미리 알 수 있다. 따라서, SAM은 인공신경망 데이터 지역성 정보를 기초로 소비전력 및 레이턴시가 최소화된 메모리 동작을 제공 할 수 있다. Conversely, the SAM may know in advance the order of memory operation requests to be requested by the processor based on the ANN DL. Therefore, the SAM can provide a memory operation with minimized power consumption and latency based on locality information of artificial neural network data.
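The idea that the request order is known in advance can be illustrated with a minimal sketch. It assumes, for illustration only, that the ANN DL can be represented as an ordered list of request labels and that the list repeats once it is exhausted; the class name `LocalityPredictor` and the labels are hypothetical.

```python
class LocalityPredictor:
    """Toy predictor that steps through a known, ordered access pattern."""

    def __init__(self, pattern):
        self.pattern = list(pattern)
        self.position = 0

    def predict(self):
        """Return the request expected next according to the pattern."""
        return self.pattern[self.position]

    def observe(self, actual):
        """Compare an actual request with the prediction. On a match,
        advance to the next entry (wrapping around is an assumption)."""
        hit = actual == self.pattern[self.position]
        if hit:
            self.position = (self.position + 1) % len(self.pattern)
        return hit


predictor = LocalityPredictor(["read kernel_0", "read fmap_0", "write fmap_1"])
next_request = predictor.predict()  # data for this request can be prepared early
```

A controller built along these lines could start fetching the predicted data before the processor issues the actual request, which is where the latency saving would come from.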

SAM과 SAM 제어부 사이의 메모리 버스에는 적어도 하나의 사이드 밴드 시그널이 더 포함될 수 있다. The memory bus between the SAM and the SAM controller may further include at least one side band signal.

SAM 제어부와 프로세서 사이의 시스템 버스에는 적어도 하나의 사이드 밴드 시그널이 더 포함될 수 있다. 메모리 버스와 시스템 버스 각각의 사이드 밴드 시그널의 개수는 서로 동일하거나 또는 상이할 수 있다.The system bus between the SAM controller and the processor may further include at least one side band signal. The number of side band signals of each of the memory bus and the system bus may be the same or different from each other.

단, 본 개시는 이에 제한되지 않으며, 사이드 밴드 시그널에 대응되는 정보를 포함하는 패킷 형태로 구현될 수 있다.However, the present disclosure is not limited thereto, and may be implemented in the form of a packet including information corresponding to a sideband signal.

SAM은 메모리의 랜덤 액세스 특성을 제거하고, 인공신경망 데이터 지역성(ANN DL) 정보에 의해서 동작하므로, SAM의 메모리 셀의 정밀한 리프레쉬 타이밍(Refresh timing) 제어가 가능하도록 구성될 수 있다. 동적 메모리 셀은 주기적으로 리프레쉬가 필요하며, SAM으로 구현된 동적 메모리는 ANN DL을 기초로 리프레시가 제어되도록 구성될 수 있다. Since the SAM removes the random access characteristic of the memory and operates based on artificial neural network data locality (ANN DL) information, it may be configured to enable precise refresh timing control of the memory cells of the SAM. A dynamic memory cell needs periodic refresh, and a dynamic memory implemented with SAM may be configured such that refresh is controlled based on ANN DL.

SAM은 메모리의 랜덤 액세스 특성을 제거하고, 인공신경망 데이터 지역성(ANN DL) 정보에 의해서 동작하므로, SAM의 메모리 셀의 정밀한 프리차지 타이밍(Pre-charge timing) 제어가 가능하도록 구성될 수 있다. 동적 메모리 셀은 센스 앰프 동작을 위해서 프리차지가 필요하며, SAM으로 구현된 동적 메모리는 ANN DL을 기초로 프리차지가 제어되도록 구성될 수 있다.Since the SAM removes the random access characteristic of the memory and operates based on artificial neural network data locality (ANN DL) information, it can be configured to enable precise pre-charge timing control of the memory cells of the SAM. The dynamic memory cell requires precharge for the sense amplifier operation, and the dynamic memory implemented as SAM may be configured to control the precharge based on the ANN DL.

SAM은 인공신경망 데이터 지역성(ANN DL) 정보 별 또는 도메인 별로 할당되는 메모리의 영역을 확정하도록 구성될 수 있다.The SAM may be configured to determine an area of memory allocated for each artificial neural network data locality (ANN DL) information or for each domain.

FIG. 16 is an exemplary diagram illustrating a process of reading data from a SAM having a multi-bank structure.

도 16에 도시된 SAM은 종래의 DRAM의 메모리셀 동작의 일부를 활용할 수 있다. SAM은 적어도 하나의 뱅크를 포함할 수 있다. The SAM shown in FIG. 16 may utilize a part of the memory cell operation of a conventional DRAM. The SAM may include at least one bank.

Referring to FIG. 16, the memory cells of the SAM may be arranged in a matrix form and configured to have row and column addresses. One bank may be formed by grouping a plurality of memory cells.

SAM의 대역폭(BANDWIDTH) 향상을 위해서 SAM의 각 뱅크 내의 메모리 셀들을 인터레이싱(Interlacing)하도록 구성될 수 있다. In order to improve the bandwidth (BANDWIDTH) of the SAM, it may be configured to interlace memory cells in each bank of the SAM.

SAM의 대역폭 향상을 위해서 SAM의 뱅크 단위로 인터리빙(Interleaving)하도록 구성될 수 있다.In order to improve the bandwidth of the SAM, it may be configured to perform interleaving in units of banks of the SAM.
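Bank-level interleaving can be illustrated with a simple address mapping. This sketch assumes a plain round-robin (modulo) scheme over burst-sized words, which is one common way to interleave banks; the disclosure does not specify the mapping, and the function name and parameters are hypothetical.

```python
def interleave(address, num_banks, row_size):
    """Map a linear word address to (bank, row, column).

    Assumption: consecutive addresses rotate across banks, so a sequential
    stream keeps every bank busy in turn.
    """
    bank = address % num_banks
    offset_in_bank = address // num_banks
    row, column = divmod(offset_in_bank, row_size)
    return bank, row, column


# Four banks, eight words per row: addresses 0..3 land in banks 0..3.
mapping = [interleave(a, num_banks=4, row_size=8) for a in range(4)]
```

Because neighboring addresses fall in different banks, one bank can be activating a row while another is transferring data, which is how interleaving tends to raise bandwidth for sequential access.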

인공신경망 데이터 지역성(ANN DL) 정보에 따라서 SAM의 메모리 셀의 RAS(Row Address Strobe) 신호 및/또는 CAS(Column Address Strobe) 신호가 직접 제어될 수 있다. 따라서 SAM 제어부는 ANN DL에 따라 순차적으로 데이터를 읽어 내거나, 데이터를 쓰도록 SAM을 제어할 수 있다.A row address strobe (RAS) signal and/or a column address strobe (CAS) signal of a memory cell of the SAM may be directly controlled according to the artificial neural network data locality (ANN DL) information. Accordingly, the SAM control unit may control the SAM to sequentially read data or write data according to the ANN DL.

Referring to FIG. 12, FIG. 15, or FIG. 16, the SAM according to an example of the present disclosure may include a plurality of banks. In this case, the SAM may be configured to allocate a specific bank and/or a specific memory cell region for a specific purpose based on at least one sideband signal.

예를 들면, 도메인에 따라서 SAM의 제1 뱅크는 특징 맵 전용으로 할당될 수 있다.For example, depending on the domain, the first bank of the SAM may be allocated exclusively for the feature map.

예를 들면, 도메인에 따라서 SAM의 제2 뱅크는 커널 전용으로 할당될 수 있다.For example, depending on the domain, the second bank of the SAM may be allocated exclusively for the kernel.

본 개시의 일 예시에 따른 SAM은 적어도 하나의 뱅크를 포함할 수 있다. 이러한 경우, SAM은 적어도 하나의 사이드 밴드 시그널에 기초하여 적어도 하나의 뱅크의 특정 행(rows)들을 특정 용도에 맞게 할당하도록 구성될 수 있다.A SAM according to an example of the present disclosure may include at least one bank. In this case, the SAM may be configured to allocate specific rows of at least one bank for a specific purpose based on at least one side band signal.

예를 들면, SAM은 도메인에 따라서 제1 뱅크의 제1 영역의 행들을 특징 맵 전용으로 할당할 수 있다.For example, the SAM may allocate the rows of the first area of the first bank exclusively to the feature map according to the domain.

예를 들면, SAM은 도메인에 따라서 제1 뱅크의 제2 영역의 행들을 가중치 전용으로 할당할 수 있다.For example, the SAM may allocate the rows of the second area of the first bank exclusively for weights according to domains.
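The domain-based allocation in these examples can be sketched as a lookup table. The row ranges, the `Region` type, and the function `row_for` are hypothetical choices made for illustration; the disclosure only states that rows of a bank may be dedicated to a feature map or to weights according to the domain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    bank: int
    first_row: int
    last_row: int  # inclusive


# Hypothetical layout: bank 0 is split into two row regions by domain.
ALLOCATION = {
    "feature_map": Region(bank=0, first_row=0, last_row=511),
    "weight": Region(bank=0, first_row=512, last_row=1023),
}


def row_for(domain, index):
    """Return (bank, row) of the index-th row inside the domain's region."""
    region = ALLOCATION[domain]
    row = region.first_row + index
    if row > region.last_row:
        raise ValueError(f"index {index} is outside the {domain} region")
    return region.bank, row


weight_location = row_for("weight", 0)  # first row of the weight region
```

Keeping each domain in its own contiguous row range means that a sequential read of one domain stays within adjacent rows of the same region.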

Referring again to FIG. 12 or FIG. 15, the SAM according to an example of the present disclosure may be configured to allocate specific rows of a specific bank for a specific purpose based on the ANN MODEL# signal. The SAM according to an example of the present disclosure may be configured to allocate specific rows of a specific bank for a specific purpose based on at least the ANN DATA LOCALITY. That is, the SAM may be configured to allocate the memory cells of a specific bank or of specific rows for a specific purpose based on at least one sideband signal.

However, the present disclosure is not limited thereto, and even without a separate sideband signal, the SAM may be implemented by having the SAM controller directly control the memory addresses of the SAM based on the artificial neural network data locality (ANN DL) information.

FIG. 17 is an exemplary diagram illustrating latency occurring in a conventional DRAM.

도 17을 참고하면, CPU, 종래의 메모리 컨트롤러 그리고 종래의 DRAM 사이의 레이턴시가 개략적으로 도시되어 있다. Referring to FIG. 17 , the latency between a CPU, a conventional memory controller, and a conventional DRAM is schematically illustrated.

A conventional CPU uses virtual memory with a translation lookaside buffer (TLB) to process various operations. Consequently, artificial neural network data stored in a conventional DRAM is fragmented when stored in the DRAM.

상기 CPU가 상기 DRAM으로부터 데이터를 읽어내는 동작은 A 과정부터 F 과정을 포함할 수 있다. 각 과정에서는 레이턴시가 발생할 수 있다.The operation of the CPU reading data from the DRAM may include steps A through F. Each process may incur latency.

In process A, the CPU generates a transaction request. The transaction request may wait temporarily in a queue of the CPU, which may cause latency. In process B, the CPU may transmit the transaction request to the memory controller. In process C, the memory controller may convert the transaction request into command sequences. In process D, the memory controller may transfer the command sequences to the DRAM. In process E, the DRAM may use a single CAS signal, a combination of a RAS signal and a CAS signal, or a combination of a pre-charge (PRE) signal, a RAS signal, and a CAS signal to process the command sequences. In process F, the data for the transaction is transferred to the CPU.

상기 A 과정부터 F 과정 까지의 레이턴시는 A+B+C+D+E+F를 포함할 수 있다.The latency from process A to process F may include A+B+C+D+E+F.

Process E1 may occur when all of the data corresponding to a data operation requested of the conventional DRAM is latched in the sense amplifier shown in FIG. 31A.

Process E2 may occur when part of the data corresponding to a data operation requested of the conventional DRAM is fragmented across memory cells in a plurality of rows.

Process E3 may occur when part of the data corresponding to a data operation requested of the conventional DRAM is fragmented across memory cells in a plurality of rows and the memory cells are precharged for various reasons.

여기서, RAS는 Row Address Strobe (RAS) 신호를 의미하고, CAS는 Column Address Strobe (CAS) 신호를 의미하고, PRE는 Pre-charge 신호를 의미한다. 각각의 신호는 대응되는 각각의 지연시간을 포함한다.Here, RAS means a Row Address Strobe (RAS) signal, CAS means a Column Address Strobe (CAS) signal, and PRE means a pre-charge signal. Each signal includes a corresponding respective delay time.
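The relationship between the three cases and the signal delays can be written out as a short sketch. It assumes a reading in which E1 costs only a CAS delay, E2 a RAS delay plus a CAS delay, and E3 a PRE delay plus RAS and CAS delays; this mapping is inferred from the description of process E and is not stated explicitly. The function names and the numeric delays are illustrative only.

```python
def access_latency(case, t_cas, t_ras, t_pre):
    """Delay of process E for each case, under the assumed mapping."""
    if case == "E1":  # all data already latched in the sense amplifier
        return t_cas
    if case == "E2":  # data spread over several rows: a new row must be opened
        return t_ras + t_cas
    if case == "E3":  # as E2, but a precharge is needed first
        return t_pre + t_ras + t_cas
    raise ValueError(f"unknown case: {case}")


def total_latency(a, b, c, d, e, f):
    """End-to-end read latency as the sum of processes A through F."""
    return a + b + c + d + e + f


# Illustrative delays in nanoseconds, not taken from any datasheet.
e1 = access_latency("E1", t_cas=15, t_ras=15, t_pre=15)
e3 = access_latency("E3", t_cas=15, t_ras=15, t_pre=15)
```

With equal illustrative delays, case E3 costs three times as much as case E1, which is why avoiding fragmented rows matters for sequential ANN data.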

When a conventional DRAM and a conventional memory controller process artificial neural network data, they do not consider ANN DL information. The artificial neural network data is therefore fragmented and handled through virtual memory. As a result, cases E2 and E3 occur frequently in the conventional approach, rather than case E1, and the conventional DRAM can become a bottleneck for artificial neural network processing.

In contrast, because the SAM according to an example of the present disclosure is based on the ANN DL, it can eliminate or minimize the occurrence of E2 and E3 and thereby maximize the occurrence of E1. The delay time associated with the CAS, RAS, and PRE signals may thus be reduced, which can improve the processing speed of artificial neural network operations.

FIG. 18 is an exemplary diagram illustrating the basic concept of a Sequential Access Memory (SAM) according to the disclosure of the present specification.

FIG. 18 shows a SAM serving as the main memory, a SAM controller, and a processor. The SAM controller is disposed between the processor and the SAM and controls the SAM.

상기 SAM 컨트롤러는 상기 메인 메모리인 SAM과 일체화 될 수도 있고 또는 상기 SAM과 물리적으로 분리되어 구현될 수도 있다. 대안적으로, 상기 SAM 컨트롤러는 상기 프로세서에 내장될 수 있다. 그 밖에, 상기 SAM 컨트롤러는 다양한 형태로 구현될 수도 있다.The SAM controller may be integrated with the SAM, which is the main memory, or may be implemented physically separated from the SAM. Alternatively, the SAM controller may be embedded in the processor. In addition, the SAM controller may be implemented in various forms.

The SAM controller may receive artificial neural network data locality (ANN DL) information from a processor that will process the artificial neural network (ANN), for example an NPU, or from a compiler.

The ANN DL information may be included in a register map for controlling the NPU, or may be provided as a separate register map or table.

The ANN DL information may likewise be included in a register map for controlling the SAM controller, or may be provided as a separate register map or table.

The ANN DL information may also be provided to both the processor (i.e., the NPU) and the SAM controller. In that case, the ANN DL information provided to the NPU and to the SAM controller may be identical, or at least partially identical.

The SAM controller may issue read/write commands to the SAM, which is the main memory, according to the order information in the ANN DL information, and may serve to supply the data requested by the processor.

The SAM, which is the main memory, may determine the size of the data to be requested according to the order information in the ANN DL information. The ANN DL information may vary depending on the number of PEs in the processor (i.e., the NPU), the cache memory size of the processor, the kernel to be used for the corresponding layer, the feature map size, and the like.

For example, when the data size exceeds the cache memory size, the processor may use a tiling algorithm, and the SAM controller may be configured to operate in a manner corresponding to the way the processor processes the data.

For example, the ANN DL may be determined so as to hide the latency of the main memory. That is, for latency hiding, the ANN DL may be set so that data of a size corresponding to a minimum number of clocks is cached first.

When the processing scheme of the processor changes, for example among weight stationary, input stationary, and output stationary schemes, the ANN DL information may change accordingly.

Unless there are special circumstances, the SAM according to examples of the present disclosure may be configured to minimize how often the E2 or E3 latency described above with reference to FIG. 17 occurs. That is, unless there are special circumstances, memory operations of the SAM may access the rows of memory cells of a bank sequentially, so that the frequency of the E2 or E3 latency of FIG. 17 is minimized.

In other words, the SAM may be configured to operate with sequential addressing of memory cells for each memory operation of an ANN DL unit. The memory cells of all columns belonging to one row of a memory cell array of the SAM may be latched in the corresponding sense amplifiers. All data latched in the sense amplifiers can be read without additional RAS latency. Accordingly, the memory cells of the columns belonging to one row can be read sequentially.

However, the present disclosure is not limited thereto. The order in which the data latched in the sense amplifiers is read may be changed, and even then no separate RAS latency may occur.

To elaborate, the sequential addressing technique of the SAM may mean that the row and column addresses of the memory cells change incrementally while a memory operation of an ANN DL unit is processed.
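The effect of sequential addressing on row activations can be illustrated with a minimal sketch. It assumes a single memory cell array with a hypothetical size of 8 columns by 4 rows and a simplified model in which reading any column of the currently open row needs no new row activation; the function names and the array size are illustrative and do not appear in the disclosure.

```python
import random


def to_row_col(addr, n_cols):
    """Map a linear cell address to (row, column) for an array with n_cols columns."""
    return divmod(addr, n_cols)


def count_row_activations(addresses, n_cols):
    """Count how often the open row changes; each change needs a new row activation."""
    activations = 0
    open_row = None
    for addr in addresses:
        row, _ = to_row_col(addr, n_cols)
        if row != open_row:
            activations += 1
            open_row = row
    return activations


N_COLS, M_ROWS = 8, 4  # hypothetical array size, for illustration only

# Sequential addressing: column address increments, then the row address increments.
sequential = list(range(N_COLS * M_ROWS))

# The same cells visited in a shuffled order, standing in for non-sequential access.
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)

seq_activations = count_row_activations(sequential, N_COLS)
rand_activations = count_row_activations(shuffled, N_COLS)
print(seq_activations, rand_activations)
```

With sequential addressing each row is opened exactly once, so the activation count equals the number of rows; any other visiting order needs at least as many activations.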

The SAM controller may be configured to directly control, in ANN DL units, the storage addresses of the SAM, which is the main memory. Accordingly, the SAM controller may be configured to directly control the RAS signal and the CAS signal used to access the memory cells of the SAM.

FIG. 19 is a table showing, by way of example, the amount of computation and the data size for 16 layers.

The example of FIG. 19 describes the structure information of the input feature map, the output feature map, and the kernel for each of the 16 layers when the artificial neural network model is VGG16. Various examples of the present disclosure may be configured to generate at least one piece of ANN DL information based on at least one artificial neural network model.

In the table shown in FIG. 19, layers 1 to 13 are convolutional layers, and layers 14 to 16 are fully connected layers.

In general, an artificial neural network model should be computed in the order of its layers; however, for various reasons, the number of operation steps in the artificial neural network model processed by the processor may increase or decrease.

In theory, one layer of an artificial neural network model can structurally be processed with a single convolution operation. Depending on various conditions, however, the convolution may be divided and performed several times. That is, the number of convolution operations may increase by the number of tilings.

For example, the ANN DL information may change according to the layer structure of the artificial neural network, the PE array structure of the processor (i.e., the NPU), and the size of the internal memory of the processor.

For example, if the internal memory for the kernel is 256 KByte and the kernel of layer 1 is 3.2 MByte, the number of tiles needed to fit the kernel into that internal memory may be 13, since 3.2 MByte is more than 12 times but less than 13 times 256 KByte.

The order in which the processor processes these 13 tiles may also be determined.

That is, the number of ANN DLs may change according to the internal memory size of the processor, and may therefore increase. On the other hand, if the internal memory for the input feature map is 256 KByte and the input feature map of layer 1 is 1.7 KByte, tiling may be unnecessary. Tiling may likewise be unnecessary for the output feature map of layer 1.
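The tiling counts in this example follow from a ceiling division of the data size by the internal memory size. The sketch below reproduces that arithmetic; the function name `tiles_needed` and the use of binary units (1 KByte = 1024 bytes) are assumptions made for illustration.

```python
def tiles_needed(data_bytes, buffer_bytes):
    """Number of tiles so that every tile fits in the buffer (ceiling division)."""
    if buffer_bytes <= 0:
        raise ValueError("buffer size must be positive")
    return max(1, -(-data_bytes // buffer_bytes))


KIB = 1024  # assumption: binary units

kernel_buffer = 256 * KIB               # internal memory for the kernel
ifmap_buffer = 256 * KIB                # internal memory for the input feature map
layer1_kernel = int(3.2 * 1024 * KIB)   # 3.2 MByte kernel from the example
layer1_ifmap = int(1.7 * KIB)           # 1.7 KByte input feature map from the example

kernel_tiles = tiles_needed(layer1_kernel, kernel_buffer)
ifmap_tiles = tiles_needed(layer1_ifmap, ifmap_buffer)
print(kernel_tiles, ifmap_tiles)  # prints "13 1"
```

A result of 1 means the data fits in the internal memory as a whole, so no tiling is required for that domain.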

That is, when the operation order of the artificial neural network model processed by the processor changes, the ANN DL information of that model changes as well.

Accordingly, the ANN DL information of the artificial neural network model may be configured to include order information that has been changed according to the tiling.

FIG. 20 is a table showing, by way of example, the amount of computation and the data size for 28 layers.

The example of FIG. 20 describes the structure information of the input feature map, the output feature map, and the kernel for each of the 28 layers when the artificial neural network model is Mobilenet V1.0. Various examples of the present disclosure may be configured to generate at least one piece of ANN DL information based on at least one artificial neural network model.

In the table shown in FIG. 20, layers 1 to 28 include convolutional layers, depth-wise convolution layers, and point-wise convolution layers.

In general, an artificial neural network model should be computed in the order of its layers; however, for various reasons, the operation order may be changed. If the operation order changes, the ANN DL information of the artificial neural network model changes as well.

For example, when one processor processes two artificial neural network models, the ANN DL information processed by that processor may be the ANN DL information of the respective models shown in FIG. 19 and FIG. 20, integrated with each other in a specific order.

For example, when two processors process one artificial neural network model, the ANN DL information processed by the two processors may be the ANN DL information of the model shown in FIG. 19, divided so that each processor handles its own part.
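The two cases above can be pictured with a small sketch. The disclosure does not specify how the orders are integrated or divided, so the concatenation used for the one-processor case and the layer-range split used for the two-processor case are assumptions chosen only for illustration, as are the names `layer_steps` and `split_point`.

```python
from itertools import chain


def layer_steps(model, n_layers):
    """One placeholder entry per layer; a real ANN DL would hold finer-grained steps."""
    return [(model, layer) for layer in range(1, n_layers + 1)]


vgg16 = layer_steps("VGG16", 16)
mobilenet = layer_steps("Mobilenet V1.0", 28)

# One processor, two models: one possible integration is to run one model after the other.
merged = list(chain(vgg16, mobilenet))

# Two processors, one model: one possible division is by layer range.
split_point = 8  # hypothetical boundary
proc0_order, proc1_order = vgg16[:split_point], vgg16[split_point:]

print(len(merged), len(proc0_order), len(proc1_order))
```

Other policies, such as interleaving the steps of the two models, would produce a different integrated order and therefore different ANN DL information.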

FIG. 21 is a table showing a first example of accessing the memory according to the order information in the ANN DL information.

In the first example shown in FIG. 21, when the artificial neural network model is Mobilenet V1.0, the SAM controller may be configured to hold ANN DL information comprising 84 steps in order to perform the operations for the 28 layers. That is, sequential access information, expressed in terms of the row and column addresses of the SAM, may be determined based on the order of the steps.

The SAM, which is the main memory, may be configured to operate based on the ANN DL information held in the SAM controller.

Here, the ANN DL information refers to the order in which the processor processes the data of the artificial neural network model, generated by the compiler or the SAM controller in consideration of at least some of the following conditions.

a. Model structure of the ANN (e.g., VGG16 or Mobilenet V1.0)

b. Structure of the processor (e.g., depending on the architecture of the CPU, GPU, or NPU)

For example, in the case of an NPU, the number of PEs, the stationary scheme (input, output, or weight), and the like.

c. Size of the cache memory (e.g., a tiling algorithm needs to be applied when the cache memory is smaller than the data)

d. Data size for each domain and each layer

For example, the domains include the input feature map (IFMAP), the output feature map (OFMAP), and the kernel.

e. Processing policy

f. Data reuse rate

For example, the request order for data of a specific domain may be determined, such as requesting to read the input feature map (IFMAP) first or the kernel first.

This policy may vary depending on the structure of the processor or the compiler algorithm.

The SAM controller according to examples of the present disclosure may set the row and column addresses of the memory cells of the SAM to be sequential based on the ANN DL, for example in ANN DL units.

FIG. 22 is an exemplary table showing the table of FIG. 21 in simplified form.

In FIG. 22, data sizes and memory addresses are denoted by symbols for convenience of explanation.

As can be seen from FIG. 22, the SAM controller may establish a policy for address allocation in the SAM according to the order information in the ANN DL information. To elaborate, the SAM controller may be configured to directly control the row and column addresses of the memory cells of the SAM.

According to the order information in the ANN DL information, at least some or all of the data may be stored in the memory in ANN DL units. In this case, all of that data may be stored in a manner optimized for burst mode.

The ANN DL information may include, for example, pattern information describing the order of i) reading the input feature map, ii) reading the corresponding kernel, and iii) writing the output feature map. However, the present disclosure is not limited to this pattern and contemplates various patterns. The pattern may also be set differently for each layer.
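As a rough illustration of how such a pattern expands into order information, the sketch below repeats a three-operation pattern for every layer. The function name `build_ann_dl`, the operation labels, and the dictionary layout are hypothetical; only the pattern itself and the resulting count of 84 steps for 28 layers come from the description.

```python
def build_ann_dl(n_layers, pattern=("read_ifmap", "read_kernel", "write_ofmap")):
    """Expand a per-layer access pattern into a numbered list of steps."""
    steps = []
    for layer in range(1, n_layers + 1):
        for op in pattern:
            steps.append({"step": len(steps) + 1, "layer": layer, "op": op})
    return steps


# Mobilenet V1.0 has 28 layers; three operations per layer give 84 steps.
ann_dl = build_ann_dl(28)
print(len(ann_dl))
print(ann_dl[0], ann_dl[3])
```

Passing a different `pattern` tuple, for example one that reads the kernel first, yields a different order over the same layers, which is one way to picture a per-layer or per-model change of pattern.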

In this case, the SAM, which is the main memory, may control the CAS signal or the RAS signal according to the ANN DL information so as to operate in burst mode. The examples of FIG. 41 or FIG. 42 show the row address decoder and the column multiplexer/demultiplexer being controlled directly by controlling the CAS signal, the RAS signal, and the address signal.

In this case, the SAM controller may predict, based on the order information in the ANN DL information, that the processor will request data in a specific order.

The SAM controller may analyze the ANN DL information of the compiled artificial neural network model and directly control the CAS signal and/or the RAS signal of the SAM so that the data to be requested by the processor is arranged contiguously in the memory cells of the SAM, or so that fragmented data is rearranged contiguously. Accordingly, the SAM can provide data to the SAM controller sequentially.

That is, the SAM, which is the main memory, may be configured to operate in burst mode from the start address to the end address of each ANN DL unit.

Alternatively, the SAM, which is the main memory, may analyze the compiled ANN DL information, arrange the data to be requested by the NPU at consecutive addresses, and then provide the data sequentially.

Accordingly, the SAM controller according to examples of the present disclosure may set the row and column addresses of the memory cells of the SAM to be sequential based on the ANN DL.

Each ANN DL unit may have a corresponding data size. For example, a first ANN DL unit may hold data of size A and have a start address and an end address corresponding to size A. Accordingly, the SAM may be configured so that its operation mode is, by default, effectively the burst mode of a DRAM, and the SAM may operate in burst mode by default when the SAM controller issues a read command.

In addition, even when the command from the processor is a plain read rather than a burst read, the SAM may effectively operate in burst mode based on the ANN DL.

In addition, all data may be handled in burst mode based on the ANN DL, but the present disclosure is not limited thereto; it is also possible that most, but not all, of the data is set to burst mode. That is, at least some of the data may not be in burst mode.

FIG. 23 shows an example in which the SAM sets a memory address map according to the table shown in FIG. 22.

The SAM controller may control the CAS signal and/or the RAS signal of the SAM based on the compiled ANN DL information so that the data to be requested by the processor is arranged contiguously in the memory map.

Because the SAM already knows the order in which the processor will issue read or write commands for data of a specific size at specific memory addresses, it can arrange the data in that order.

According to the example shown in FIG. 23, each of data A to data K is stored using sequential addresses. Because the data is stored contiguously based on the ANN DL in this way, burst-mode operation becomes possible at least for each ANN DL unit. In addition, according to examples of the present disclosure, adjacent ANN DLs may also have sequential addresses, so burst-mode operation spanning a plurality of ANN DL units is also possible. The data bits within each of data A to data K may also be stored sequentially, so the SAM can be operated in burst mode. Here, sequential addresses may mean that the row and column addresses of the memory cell array increase sequentially.
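One way to picture how such a memory address map could be derived is to place each piece of data at the next free address the first time it appears in the ANN DL order. The sketch below does this for the access sequence described for ANN DL #1 through #7 (read A, read B, write C, read C, read D, write E, read E). The sizes are hypothetical placeholders, since FIG. 22 denotes them only by symbols, and `build_address_map` is an illustrative name rather than a function of the disclosed device.

```python
def build_address_map(access_order, sizes):
    """Assign contiguous (start, end) addresses to data in order of first use."""
    address_map = {}
    next_addr = 0
    for name in access_order:
        if name in address_map:
            continue  # already placed; a later access reuses the same addresses
        start = next_addr
        end = start + sizes[name] - 1
        address_map[name] = (start, end)
        next_addr = end + 1
    return address_map


# Data touched by ANN DL #1 to #7, in order.
access_order = ["A", "B", "C", "C", "D", "E", "E"]

# Hypothetical sizes in addressable units.
sizes = {"A": 4, "B": 2, "C": 8, "D": 3, "E": 8}

address_map = build_address_map(access_order, sizes)
for name, (start, end) in address_map.items():
    print(name, start, end)
```

Because every start address immediately follows the previous end address, each piece of data can be accessed as one contiguous range, and neighbouring pieces form one longer contiguous range.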

As a result, each piece of data can be read in burst mode, and pieces of data stored at consecutive addresses can also be read together in burst mode.

Preferably, ANN DL #1 through #15 may all operate in burst mode; however, the present disclosure is not limited thereto, and it is also possible that the data of only some of the ANN DL units is configured to operate in burst mode.

The procedure after the memory address map has been set based on the ANN DL information is described below.

ANN DL #1: The processor or the SAM controller requests the SAM to read data A in read-burst mode.

For ANN DL #1, because data A is stored sequentially, the SAM can operate in read-burst mode until all of data A has been read.

ANN DL #2: The processor or the SAM controller requests the SAM to read data B in read-burst mode.

For ANN DL #2, because data B is stored sequentially, the SAM can operate in read-burst mode until all of data B has been read.

Because data A and data B are stored sequentially in the memory map, data A and B, that is, the data of consecutive ANN DL units, can be handled in read-burst mode.

ANN DL #3: The processor or the SAM controller requests the SAM to write data C, which is an output feature map (OFMAP), in write-burst mode.

Because data C has a memory address that follows data B, it can be written into the SAM in write-burst mode.

ANN DL #4: The processor or the SAM controller requests the SAM to read data C again, now as an input feature map (IFMAP), in read-burst mode.

To elaborate on ANN DL #3 and ANN DL #4, data C, which is the output feature map of the first layer, is reused as the input feature map of the second layer.

Because the artificial neural network model may thus request the same data to be read immediately after it is written, a refresh of the memory cells that might otherwise occur between the write and the read can be omitted based on the ANN DL. The time and power consumed by refreshing data C can therefore be saved.

In addition, the feature maps of certain artificial neural network models may no longer be reused once their place in the ANN DL order has passed. Because such a feature map no longer needs to be retained, no error may occur in the artificial neural network operation even if the data is corrupted through lack of refresh.

ANN DL #5: The processor and/or the SAM controller requests the SAM to read data D in read-burst mode.

Because data D has an address that follows data C, it can be read continuously in read-burst mode.

ANN DL #6: The processor and/or the SAM controller requests the SAM to write data E in write-burst mode.

Because data E has an address that follows data D, it can be written into the SAM continuously in write-burst mode.

ANN DL #7: The processor and/or the SAM controller requests the SAM to read data E again in read-burst mode.

To elaborate on ANN DL #6 and ANN DL #7, data E, which is the output feature map (OFMAP) of the second layer, is reused as the input feature map (IFMAP) of the third layer. Because the artificial neural network model may thus request the same data to be read immediately after it is written, a refresh of the memory cells that might otherwise be performed between the write and the read can be omitted based on the ANN DL. The time and power consumed by refreshing data E can therefore be saved.

In addition, owing to the nature of the artificial neural network model, the feature maps of certain models may no longer be reused once their place in the ANN DL order has passed. Because such a feature map no longer needs to be retained, no error may occur in the artificial neural network operation even if the memory cells are not refreshed.
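The observation that a feature map need not be retained after its last use can be expressed as simple bookkeeping over the ANN DL order. The sketch below records the last step at which each piece of data is accessed in ANN DL #1 through #7 and reports whether the data still has to be kept at a given step. It does not model DRAM refresh timing or retention time; the function names and the retention rule are assumptions for illustration only.

```python
def last_use(steps):
    """Return, for each piece of data, the 1-based index of its final access."""
    last = {}
    for index, (_op, name) in enumerate(steps, start=1):
        last[name] = index
    return last


def needs_retention(name, current_step, last):
    """Data must be kept only while a later access to it is still scheduled."""
    return current_step < last[name]


# ANN DL #1 to #7 as described above.
steps = [
    ("read", "A"), ("read", "B"), ("write", "C"), ("read", "C"),
    ("read", "D"), ("write", "E"), ("read", "E"),
]

last = last_use(steps)
print(last)
print(needs_retention("C", 3, last))  # after the write in #3, C is still needed
print(needs_retention("C", 4, last))  # after the read in #4, C is no longer needed
```

A controller with this information could, in principle, skip refresh for regions whose data is past its last use, or whose write is followed by a read soon enough that the cells still hold valid data.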

FIG. 24 is a table showing a second example of accessing the memory according to the order information in the ANN DL information.

In the second example shown in FIG. 24, when the artificial neural network model is Mobilenet V1.0, the ANN DL may be configured so that the kernel is read before the input feature map. To this end, the order information in the ANN DL information may include an order in which the kernel is read before the input feature map.

This second example may be more effective the larger the data size of the feature map.

Once the kernel has been read from the SAM, convolution can be performed on the input feature map (IFMAP) in the order in which it arrives.

Specifically, the ANN DL information according to the second example may include information on an order pattern of i) reading the kernel first, ii) reading the corresponding input feature map, and iii) writing the output feature map to the memory.

The SAM, which is the main memory, may control the CAS signal and/or the RAS signal according to the ANN DL information so as to operate in burst mode.

The order in which the processor requests data operations is based on the preset ANN DL order.

The SAM controller may control the CAS signal and/or the RAS signal of the SAM based on the compiled ANN DL information to arrange the data to be requested by the processor or NPU contiguously, and may then operate sequentially in burst mode.

By arranging the data to be requested by the processor or NPU contiguously based on the compiled ANN DL information, the SAM controller may optimize the SAM to operate in burst mode.

Compared with the first example of FIG. 23, the memory address map of the second example may differ from that of the first example even though the same artificial neural network model is processed.
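Why the two memory address maps can differ for the same model may be seen by listing the data in order of first access under each pattern. The sketch below assumes that data is placed in first-access order and that each layer's output feature map becomes the next layer's input feature map; the names `first_touch_order`, `fmapN`, and `kernelN` are illustrative.

```python
def first_touch_order(n_layers, kernel_first):
    """List data items in the order they are first accessed."""
    order = []

    def touch(name):
        if name not in order:
            order.append(name)

    for layer in range(1, n_layers + 1):
        ifmap = f"fmap{layer - 1}"   # output of the previous layer (fmap0 is the network input)
        kernel = f"kernel{layer}"
        ofmap = f"fmap{layer}"
        reads = (kernel, ifmap) if kernel_first else (ifmap, kernel)
        for name in reads:
            touch(name)
        touch(ofmap)
    return order


first_example = first_touch_order(2, kernel_first=False)
second_example = first_touch_order(2, kernel_first=True)
print(first_example)
print(second_example)
```

For two layers the orders differ in their first two entries, so contiguous placement in first-access order gives the input feature map and the kernel of the first layer different addresses in the two examples.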

도 25는 도 24에 도시된 테이블에 따라서, SAM이 메모리 주소 맵을 설정한 예를 나타낸다.25 shows an example in which the SAM sets a memory address map according to the table shown in FIG. 24 .

메인 메모리인 SAM은 컴파일 된 ANN DL 정보를 기초로 CAS 신호 및/또는 RAS 신호를 제어하여, 프로세서가 요청할 데이터들을 메모리 주소 맵에서 연속되도록, 정렬시킬 수 있다.The SAM, which is the main memory, may control the CAS signal and/or the RAS signal based on the compiled ANN DL information, so that data requested by the processor are arranged so that they are contiguous in the memory address map.

SAM 제어부는 컴파일 된 ANN DL 정보를 기초로, SAM의 CAS 신호 및/또는 RAS 신호를 제어하여 프로세서가 요청할 데이터를 메모리 주소 맵에 연속되게 정렬시킬 수 있다.The SAM control unit may control the CAS signal and/or the RAS signal of the SAM based on the compiled ANN DL information to continuously align data requested by the processor in the memory address map.

SAM 제어부는 프로세서가 메모리의 특정 주소에 특정 크기의 데이터의 읽기 명령 또는 쓰기 명령을 전송할 것인지를 알기 때문에, 어떤 순서로 데이터가 처리될 것인지 알 수 있다.Since the SAM control unit knows whether the processor will transmit a read command or a write command of data of a specific size to a specific address in the memory, it can know in what order the data will be processed.

도 25를 참조하면, A 데이터 내지 K 데이터 각각은 순차적인 메모리 주소에 따라 저장된다. 이와 같이 데이터가 연속되기 때문에, SAM은 적어도 ANN DL 단위로 버스트 모드로 동작될 수 있다. 또한, 본 개시의 예시들에 따르면, 인접한 ANN DL들도 순차적인 주소를 가질 수 있기 때문에, 복수의 ANN DL 단위의 버스트 모드 동작 또한 가능하다.Referring to FIG. 25 , each of data A to K is stored according to sequential memory addresses. Since data is continuous in this way, the SAM can be operated in burst mode at least in units of ANN DLs. In addition, according to examples of the present disclosure, since adjacent ANN DLs may also have sequential addresses, a burst mode operation in units of a plurality of ANN DLs is also possible.

A 데이터 내지 K 데이터내의 데이터 비트들도 순차적으로 저장되기 때문에, 메모리는 버스트 모드로 동작될 수 있다. Since the data bits in the A data to K data are also stored sequentially, the memory can be operated in burst mode.

즉, 각각의 데이터를 이루는 비트들도 버스트 모드로 읽혀지거나 쓰여질 수 있고, 각 데이터들은 서로 연속되기 때문에, 버스트 모드로 읽혀지거나 쓰여질 수 있다.That is, bits constituting each data may be read or written in the burst mode, and since each data is continuous with each other, it may be read or written in the burst mode.
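The contiguous placement described above can be sketched as follows. This is only a minimal illustration, assuming that each data item has a known size and that an address map is simply a list of inclusive address ranges; the names `Region`, `build_address_map`, and `burst_runs`, and the sizes used, are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    start: int  # first address (inclusive)
    end: int    # last address (inclusive)


def build_address_map(sizes, base=0):
    """Place each data item directly after the previous one."""
    regions, addr = [], base
    for name, size in sizes:
        regions.append(Region(name, addr, addr + size - 1))
        addr += size
    return regions


def burst_runs(regions):
    """Group regions whose addresses follow one another without a gap."""
    runs = []
    for region in regions:
        if runs and runs[-1][-1].end + 1 == region.start:
            runs[-1].append(region)
        else:
            runs.append([region])
    return runs


# Hypothetical sizes (in address units) for data A, B and C.
amap = build_address_map([("A", 100), ("B", 50), ("C", 200)])
runs = burst_runs(amap)
```

With these assumed sizes all three items fall into a single run, which corresponds to the case in which adjacent ANN DLs can be served by one continuous burst. The sketch groups by address contiguity only and does not model the difference between read bursts and write bursts.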

The inference steps that follow once the memory map has been set based on the ANN DL are partly described below.

i) The processor and/or the SAM controller requests the SAM to read the A data according to the read-burst mode.

데이터가 순차적으로 저장되어 있기 때문에, A 데이터를 다 읽어올 동안 읽기-버스트 모드가 수행될 수 있다.Since data is sequentially stored, the read-burst mode may be performed while data A is read.

ii) 프로세서 및/또는 SAM 컨트롤러는 SAM에게 B 데이터를 읽기-버스트 모드에 따라 읽기 요청한다. ii) The processor and/or the SAM controller requests the SAM to read B data according to the read-burst mode.

데이터가 순차적으로 저장되어 있기 때문에, B 데이터를 다 읽어올 동안 읽기-버스트 모드가 수행될 수 있다.Since the data are sequentially stored, the read-burst mode may be performed while the B data is read.

Since the A and B data are stored sequentially, both can be read in read-burst mode. That is, the data of consecutive ANN DLs can be read in one continuous read-burst operation.

iii) 프로세서 및/또는 SAM 컨트롤러는 SAM에게 출력 특징맵인 C 데이터를 쓰기-버스트 모드에 따라 쓰기 요청한다. iii) The processor and/or the SAM controller requests the SAM to write the C data, which is the output feature map, according to the write-burst mode.

C 데이터는 B 데이터에 뒤따르는 메모리 주소를 갖기 때문에, 쓰기-버스트 모드에 따라 메모리 내에 쓰기 될 수 있다. Since C data has a memory address following that of B data, it can be written into memory according to the write-burst mode.

iv) 프로세서 및/또는 SAM 컨트롤러는 SAM에게 D 데이터를 읽기-버스트 모드에 따라 읽기를 요청한다.iv) The processor and/or the SAM controller requests the SAM to read D data according to the read-burst mode.

v) 프로세서 및/또는 SAM 컨트롤러는 SAM에게 C 데이터를 읽기-버스트 모드에 따라 다시 읽기 요청할 수 있다. 즉, 이전 레이어의 출력 특징맵(OFMAP)은 다음 레이어에서 입력 특징맵(IFMAP)으로 사용될 수 있다.v) The processor and/or the SAM controller may request the SAM to read the C data again according to the read-burst mode. That is, the output feature map (OFMAP) of the previous layer may be used as the input feature map (IFMAP) in the next layer.

Since the processor and/or the SAM controller knows in advance that it will request the SAM to read the C data after the D data, it can selectively schedule operations such as pre-charge and/or refresh for the corresponding memory cells in advance.

vi) 프로세서 및/또는 SAM 컨트롤러는 SAM에게 E 데이터를 쓰기-버스트 모드에 따라 쓰기 요청할 수 있다.vi) The processor and/or the SAM controller may request the SAM to write E data according to the write-burst mode.

Since the processor and/or the SAM controller knows in advance that it will request the SAM to write the E data after the C data, it can selectively schedule operations such as pre-charge and/or refresh for the corresponding memory cells in advance.

vii) 프로세서 및/또는 SAM 컨트롤러는 SAM에게 F 데이터를 읽기-버스트 모드에 따라 읽기 요청할 수 있다.vii) The processor and/or the SAM controller may request the SAM to read the F data according to the read-burst mode.

F 데이터는 E 데이터에 뒤따르는 주소를 갖기 때문에, 연속해서 읽기-버스트 모드로 동작이 가능하다.Since F data has an address following that of E data, it is possible to continuously operate in read-burst mode.

The processor and/or the SAM controller can predict that the same data written to the SAM in ANN DL #6 will be requested for reading again in ANN DL #8. Based on the ANN DL, it is therefore possible to predict or calculate when the data access requests corresponding to ANN DL #6 and ANN DL #8 will be performed. Information such as the clock speed of the processor, the size of the E data corresponding to ANN DL #6 and ANN DL #8, and the bandwidth of the memory bus may be used for this prediction or calculation. Accordingly, the SAM control unit or the SAM may omit operations such as pre-charge and/or refresh for the corresponding memory cells, or schedule them at an optimal timing.
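The timing prediction described in the last paragraph can be sketched roughly as below. The sketch assumes that transfers run back to back and that each step lasts only as long as its data takes to cross the memory bus; processor compute time, which the text also mentions as an input, is deliberately left out. The function names, data sizes, and bandwidth are hypothetical.

```python
def schedule(steps, bytes_per_second):
    """Return (step number, start time, end time) for back-to-back transfers."""
    now, timeline = 0.0, []
    for number, size_bytes in steps:
        duration = size_bytes / bytes_per_second
        timeline.append((number, now, now + duration))
        now += duration
    return timeline


def gap_between(timeline, first, second):
    """Time from the end of step `first` to the start of step `second`."""
    end_first = next(end for number, _, end in timeline if number == first)
    start_second = next(start for number, start, _ in timeline if number == second)
    return start_second - end_first


# Hypothetical sizes: step #6 writes E, step #7 reads F, step #8 reads E again.
timeline = schedule([(6, 4000), (7, 1000), (8, 4000)], bytes_per_second=1000)
idle_time_for_e = gap_between(timeline, 6, 8)
```

The value `idle_time_for_e` is the window during which the rows holding the E data are not accessed, which is the kind of interval a controller could use when deciding whether to schedule or skip pre-charge and refresh.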

FIG. 26 is a table showing a third example of accessing a memory according to the order information in the artificial neural network data locality (ANN DL) information.

도 26에 도시된 제3 예시에서는, 메모리 내의 특정 영역이 입력 특징맵과 출력 특징맵을 위한 공동 영역으로 설정될 수 있다. 즉, SAM 및/또는 SAM 제어부는 특정 도메인을 기초로 SAM의 영역을 구분하도록 구성될 수 있다. In the third example illustrated in FIG. 26 , a specific region in the memory may be set as a common region for the input feature map and the output feature map. That is, the SAM and/or the SAM control unit may be configured to classify a region of the SAM based on a specific domain.

입력 특징맵 및/또는 출력 특징맵은 한번 사용되면 재사용이 안되는 데이터일 수 있기 때문에, 동일한 영역 내에서 번갈아 가면서 기록될 수 있다.Since the input feature map and/or the output feature map may be data that cannot be reused once used, they may be alternately recorded in the same area.

M_FMAP in the illustrated table denotes the size of the largest feature map among the plurality of input feature maps and output feature maps. Because feature map sizes differ from layer to layer, sizing the region to the largest feature map of the artificial neural network model prevents problems such as overflow.

특징 맵을 읽거나 혹은 기록하기 위한 시작 주소는 모두 동일할 수 있고, 종료(End) 주소는 해당 특징 맵의 실제 크기에 따라서 가변 될 수 있다.All of the start addresses for reading or writing the feature map may be the same, and the end address may be changed according to the actual size of the corresponding feature map.

제3 예시에서는, 메모리의 특정 영역이 공용으로 사용되기 때문에, 하기의 조건을 만족해야 한다.In the third example, since a specific area of the memory is used in common, the following condition must be satisfied.

M_FMAP >= C, E, G, I, K (some entries are omitted from the figure; the example ANN DL information for Mobilenet V1.0 includes 84 steps, and M_FMAP is the maximum over all of its feature maps)

메모리 내에서 커널들은 순차적으로 저장될 수 있다.In memory, kernels may be stored sequentially.
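The shared feature map region of the third example can be sketched as follows. This is a hedged illustration only: the feature map sizes, the kernel names `K0` and `K1`, and the choice to place the kernels immediately after the shared region are assumptions made for the example and are not stated in the disclosure.

```python
def shared_fmap_region(fmap_sizes, base=0):
    """Size the shared region to the largest feature map (M_FMAP).

    Every feature map starts at the same address; only the end address
    depends on the actual size of that feature map.
    """
    m_fmap = max(fmap_sizes.values())
    ranges = {name: (base, base + size - 1) for name, size in fmap_sizes.items()}
    return m_fmap, ranges


def place_kernels(kernel_sizes, start):
    """Store kernels one after another, starting at `start`."""
    placed, addr = {}, start
    for name, size in kernel_sizes:
        placed[name] = (addr, addr + size - 1)
        addr += size
    return placed


# Hypothetical sizes for feature maps C, E and G and for two kernels.
FMAP_SIZES = {"C": 300, "E": 500, "G": 200}
m_fmap, fmap_ranges = shared_fmap_region(FMAP_SIZES)
kernels = place_kernels([("K0", 100), ("K1", 50)], start=m_fmap)
```

Because `m_fmap` is the maximum of all feature map sizes, the condition M_FMAP >= C, E, G holds by construction and no feature map can overrun the shared region.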

FIGS. 27A and 27B show examples of setting a memory address map according to artificial neural network data locality information.

SAM 제어부는 컴파일 된 ANN DL 정보를 기초로, SAM의 CAS 신호 및/또는 RAS 신호를 제어하여 프로세서(예컨대, NPU)가 요청할 데이터를 메모리 주소 맵에서 연속되게 정렬시킬 수 있다.The SAM control unit may control the CAS signal and/or the RAS signal of the SAM based on the compiled ANN DL information to sequentially align data requested by the processor (eg, NPU) in the memory address map.

As shown in FIGS. 27A and 27B, since the SAM knows that the processor (e.g., an NPU) will issue a read or write command for data of a specific size at a specific memory address, it can know the order in which the data will be processed.

특징맵은 메모리 내의 공용 영역에서 덮어쓰기 형식으로 기록될 수 있고, 커널들은 순차적인 순서를 갖는 메모리 주소를 이용하여 저장된다. 따라서 연속된 데이터는 버스트 모드에 따라 읽혀지거나, 쓰여질 수 있다.The feature map may be recorded in an overwrite format in a common area in the memory, and kernels are stored using sequentially ordered memory addresses. Accordingly, continuous data can be read or written according to the burst mode.

지금까지 설명한 제1 예시 내지 제3 예시에 대해서 요약하여 설명하면 다음과 같다. The first to third examples described so far will be briefly described as follows.

제1 예시 내지 제3 예시를 참고하면, ANN DL 정보에 따라 메모리 주소 맵이 설정될 수 있다. 상기 메모리 주소 맵은 상술한 다양한 조건, 성능 알고리즘, ANN 모델의 구조 등에 따라 설정된다. 더 나아가서, ANN DL 정보에 의해서 데이터가 버스트-모드로 읽혀지거나 쓰여질 수 있도록, SAM은 ANN DL 정보에 기초하여, 메모리 주소 맵을 설정할 수 있다.Referring to the first to third examples, a memory address map may be set according to ANN DL information. The memory address map is set according to the above-described various conditions, performance algorithms, structures of the ANN model, and the like. Furthermore, the SAM may set a memory address map based on the ANN DL information so that data can be read or written in burst-mode according to the ANN DL information.

According to the first to third examples, access to the kernels benefits from the sequential nature of the SAM, which improves performance.

Meanwhile, each feature map is accessed repeatedly in the order of a write operation followed by a read operation.

When the main memory uses a DRAM memory cell structure, reading data from a memory cell discharges the capacitor in that cell and the data is lost, which is an inherent property of DRAM. A restore operation that recharges the cell must therefore be performed.

The first example gives the most sequential pattern, but because the input feature map (IFMAP) is read from the memory first, the convolution operation can be performed after the kernel (KERNEL) has been read.

제2 예시에서는, 커널을 메모리로부터 먼저 읽어 내기 때문에, 입력 특징맵(IFMAP)을 읽어낸 이후 합성곱 연산을 시작할 수 있다. 합성곱 수행 관점에서는 제2 예시가 우수하다고 할 수 있다.In the second example, since the kernel is first read from the memory, the convolution operation can be started after the input feature map (IFMAP) is read. From the viewpoint of performing convolution, the second example can be said to be excellent.

The third example may be effective when the capacity of the main memory is small. It may also be effective when the feature maps and the kernels are separated, as when the main memory has two channels.

Since a DRAM bus is generally single-channel, data can be exchanged in the manner of the first to third examples. In other examples, however, the SAM may be implemented with a plurality of memories or a plurality of channels so that the weights and the feature maps are handled separately.

단, 본 개시의 예시들은 이에 제한되지 않으며, 각각의 레이어의 특징맵과 커널의 크기에 따라서 도 22 내지 도 27에 설명된 예시들은 인공신경망모델의 레이어 별로 각각 다르게 설정될 수 있다.However, the examples of the present disclosure are not limited thereto, and the examples described in FIGS. 22 to 27 may be set differently for each layer of the artificial neural network model according to the size of the feature map and the kernel of each layer.

FIG. 28 is a conceptual diagram illustrating control signals of the SAM controller.

FIG. 28 shows the memory, the SAM controller, and the processor. The signals transmitted from the processor need not each travel over an individual physical wire; they may be logical signals carried over one or more wires. However, the present disclosure is not limited thereto.

SAM 컨트롤러는 ANN DL 정보를 저장하는 내부 메모리를 포함할 수 있다.The SAM controller may include an internal memory that stores ANN DL information.

ANN DL 정보는 프로세서(예컨대, NPU)에 적합하게 컴파일된 정보를 포함할 수 있다. The ANN DL information may include information compiled to suit a processor (eg, an NPU).

읽기/쓰기 명령 : ANN DL 정보 내의 순서 정보에 따라 전달되는 읽기 명령 신호 또는 쓰기 명령 신호를 의미한다. 각 명령 신호에 대응되는 메모리 주소는 메모리의 시작 주소와 끝 주소 또는 카운트(count) 정보와 함께 전달될 수 있다.Read/write command: Refers to a read command signal or a write command signal delivered according to the sequence information in the ANN DL information. A memory address corresponding to each command signal may be transmitted together with a start address and an end address of the memory or count information.

사이드밴드 시그널(SIDEBAND SIGNAL) : ANN DL 정보에 따른 처리 효율을 올리기 위한 다양한 제어신호들을 필요에 따라 선택적으로 포함할 수 있다.Sideband signal (SIDEBAND SIGNAL): Various control signals for increasing processing efficiency according to ANN DL information may be selectively included as needed.

RESET 신호: ANN 모델이 변경될 때, 메모리 주소 맵을 초기화(RESET)하기 위해서 사용될 수 있다.RESET signal: When the ANN model is changed, it can be used to initialize (RESET) the memory address map.

ENABLE 신호: ENABLE 신호가 ON일 때, 프로세서에 데이터를 전달할 수 있다. ENABLE signal: When the ENABLE signal is ON, data can be transferred to the processor.

ANN DL 정보와 SIDEBAND SIGNAL은 일부 중복된 신호가 있을 수 있다. 그러나, ANN DL 정보는 인공신경망 구조에 따른 정적(STATIC)인 정보이고, 사이드 밴드 신호는 ANN 연산을 위한 동적(DYNAMIC)인 제어 신호일 수 있다.ANN DL information and SIDEBAND SIGNAL may have some overlapping signals. However, the ANN DL information may be static information according to the artificial neural network structure, and the sideband signal may be a dynamic control signal for ANN operation.
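The read/write command described above, whose address may be given either as a start and end address or as a start address with a count, could be represented along these lines. The `Command` class and its field names are hypothetical and serve only to make the two addressing forms concrete; they are not the signal format of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Command:
    op: str                      # "READ" or "WRITE"
    start: int                   # start address
    end: Optional[int] = None    # end address (inclusive), if given
    count: Optional[int] = None  # number of addresses, if given instead

    def address_range(self) -> Tuple[int, int]:
        """Return (start, end) whichever way the command was specified."""
        if self.end is not None:
            return self.start, self.end
        if self.count is not None:
            return self.start, self.start + self.count - 1
        raise ValueError("either end or count is required")


read_cmd = Command("READ", start=100, end=149)
write_cmd = Command("WRITE", start=200, count=50)
```

Both forms resolve to the same kind of inclusive range, so a controller model can treat them identically once the command has been decoded.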

FIG. 29 is an exemplary diagram illustrating an example in which a memory address map is set according to the sideband signal shown in FIG. 28.

As shown in FIG. 29, a memory address map may be set so that a plurality of ANN models can be processed.

When the processor (e.g., an NPU) performs time-division operation in the order of the ANN MODEL numbers, the memory address map may be set sequentially in that order, so that the memory can keep operating in burst mode when the ANN model changes. The memory address map for each ANN model may be set according to the first to third examples described above.

FIG. 30A shows another example of setting a memory address map according to a sideband signal, and FIG. 30B shows an example of a memory address map in which only the kernels are recorded sequentially.

As shown in FIG. 30A, multiple threads (THREAD) may be set for a specific ANN model (e.g., ANN MODEL #1). That is, the memory address map may be set so that the ANN model can be processed with multiple threads when multiple users are connected.

다중 쓰레드를 이용하면, 하나의 ANN 모델의 커널을 다중 사용자가 공동으로 사용할 수 있게 된다. 각각의 쓰레드는 입력 특징맵 및/또는 출력 특징맵을 저장하기 위한 메모리 주소를 할당 받을 수 있다. By using multiple threads, multiple users can jointly use the kernel of one ANN model. Each thread may be allocated a memory address for storing an input feature map and/or an output feature map.

또는, 다중 쓰레드를 이용할 때, 도 30b에서와 같이 커널만 순차적으로 메모리 주소 맵 내에 매핑할 수 있다. M_FMAP은 쓰레드의 개수 만큼 추가로 생성될 수 있다.Alternatively, when using multiple threads, only the kernel may be sequentially mapped into the memory address map as shown in FIG. 30B . M_FMAP may be additionally created as many as the number of threads.

즉, 커널은 사용자 수와 상관없이 공동으로 사용 가능하고, 특징맵은 사용자 수에 비례하여 증가하도록 구성될 수 있다. That is, the kernel can be used jointly regardless of the number of users, and the feature map may be configured to increase in proportion to the number of users.
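The multi-thread layout can be sketched as below, assuming one shared kernel region followed by one M_FMAP-sized feature map region per thread. The function name, the region labels, the ordering of the regions, and the sizes are illustrative assumptions.

```python
def thread_address_map(kernel_total, m_fmap, num_threads, base=0):
    """Map one shared kernel region plus one feature map region per thread."""
    amap = {"KERNEL": (base, base + kernel_total - 1)}
    addr = base + kernel_total
    for thread in range(num_threads):
        amap[f"M_FMAP[{thread}]"] = (addr, addr + m_fmap - 1)
        addr += m_fmap
    return amap, addr - base


# Hypothetical sizes: kernels take 1000 address units, M_FMAP takes 500.
amap_one, total_one = thread_address_map(1000, 500, num_threads=1)
amap_three, total_three = thread_address_map(1000, 500, num_threads=3)
```

Going from one thread to three adds two further M_FMAP regions and nothing else, which reflects the statement that the kernel is shared regardless of the number of users while feature map storage grows in proportion to it.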

FIG. 31A is an exemplary diagram illustrating a 'READ_DISCARD' command transmitted through a sideband signal according to an example presented in this specification, and FIG. 31B illustrates an example of a READ command.

The t_RAS shown in FIG. 31B is the sum of the Data sense (t_RCD) time and the 'Data restored to DRAM cells' time.

The Data sense (t_RCD) time is the time needed to latch data into the sense amplifier. The latching operation may require precharge, access, and sense operations, for which FIGS. 32, 33, and 34 may be referred to.

또한, 본 예시의 설명을 위해서 도 17을 같이 참조할 수 있다.Also, reference may be made to FIG. 17 for the description of this example.

According to an example presented herein, the 'READ_DISCARD' command may be performed only during t_RCD, and the 'data restored to DRAM cells' operation may not be performed.

Accordingly, the latency and power required for the 'data restored to DRAM cells' operation can be saved. For example, after the C data is written to the memory according to ANN DL #3 in the ANN DL information of FIG. 25 and then read back and used in ANN DL #5, the C data will not be used again according to the ANN DL, so there is no need to perform the 'data restored to DRAM cells' operation. The order information and/or the domain information may be used to make this determination.

예를 들면, READ_DISCARD 명령을 특정 ANN DL #에 설정할 수 있다. For example, the READ_DISCARD command can be set to a specific ANN DL #.

예를 들면, ANN DL #3인 출력 특징맵(OFMAP)은 ANN DL #5인 다음 레이어의 입력 특징맵(IFMAP)으로 재사용 될 수 있다. 즉, 입력 특징맵(IFMAP)은 커널(KERNEL)과 합성곱 이후에 재사용 되지 않는 사실을 활용할 수 있다. For example, the output feature map (OFMAP) of ANN DL #3 can be reused as the input feature map (IFMAP) of the next layer, which is ANN DL #5. That is, the input feature map (IFMAP) can utilize the fact that it is not reused after convolution with the kernel (KERNEL).

즉, 입력 특징맵(IFMAP)을 읽을 때 READ_DISCARD 명령을 대응되는 ANN DL #에 설정할 수 있다. That is, when reading the input feature map (IFMAP), the READ_DISCARD command can be set to the corresponding ANN DL #.

예를 들면, 제1레이어의 출력 특징맵(OFMAP)가 메모리에 기록되면, 해당 데이터는 제2 레이어의 입력 특징맵(IFMAP)으로 사용되어 상기 메모리로부터 읽혀진다. 그러나, 상기 입력 특징맵(IFMAP)은 다시 사용되지 않기 때문에, data restored to DRAM cells를 수행하지 않아서 해당 데이터가 손실되더라도, ANN 연산에 영향을 주지 않는다. 따라서, 본 개시에서 제시되는 일 예시에 따르면, SAM 컨트롤러는 READ-DISCARD 명령을 상기 메모리에 지시하도록 구성될 수 있다.For example, if the output feature map OFMAP of the first layer is written to the memory, the corresponding data is used as the input feature map IFMAP of the second layer and read from the memory. However, since the input feature map (IFMAP) is not used again, even if the data is lost because data restored to DRAM cells is not performed, the ANN operation is not affected. Accordingly, according to an example presented in the present disclosure, the SAM controller may be configured to instruct the READ-DISCARD command to the memory.
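One way to decide which reads can become READ_DISCARD is to look for the last read of each input feature map within one pass over the ANN DL order. The sketch below is a hedged illustration: the step list loosely follows steps i) to vi) described for FIG. 25, but the domain labels attached to A, B, and D are assumptions, and `mark_read_discard` is a hypothetical helper.

```python
def mark_read_discard(steps):
    """Replace the final read of each input feature map with READ_DISCARD.

    steps: list of (step number, operation, data name, domain).
    Kernel reads are never changed, since kernels are read again in the
    next inference.
    """
    last_read = {}
    for index, (_, op, data, _) in enumerate(steps):
        if op == "READ":
            last_read[data] = index
    commands = []
    for index, (number, op, data, domain) in enumerate(steps):
        discard = op == "READ" and domain == "IFMAP" and last_read[data] == index
        commands.append((number, "READ_DISCARD" if discard else op, data))
    return commands


# Domain labels below are assumed for illustration only.
STEPS = [
    (1, "READ", "A", "KERNEL"),
    (2, "READ", "B", "IFMAP"),
    (3, "WRITE", "C", "OFMAP"),
    (4, "READ", "D", "KERNEL"),
    (5, "READ", "C", "IFMAP"),
    (6, "WRITE", "E", "OFMAP"),
]
commands = mark_read_discard(STEPS)
```

Under these assumptions the read of the C data in step #5 is marked READ_DISCARD, matching the example in which the output feature map written in ANN DL #3 is consumed once as the input feature map in ANN DL #5.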

This principle is shown in FIG. 31A. The Data sense (t_RCD) shown in FIG. 31A is the time during which the sense AMP reads the values stored in the memory cells of a specific row.

부연 설명하면, 'READ_DISCARD' 명령은 메모리의 행(row) 단위로 수행될 수 있다. In more detail, the 'READ_DISCARD' command may be executed in units of rows of memory.

The 'Data restored to DRAM cells' operation refers to writing the data latched in the sense amplifier ('sense AMP') back into the memory cells, because the read operation performed through the sense amplifier destroys the data stored in those cells.

FIG. 32 shows part of a circuit diagram of an exemplary SAM implemented with DRAM memory cells according to an example presented herein.

도 32에 도시된 SAM의 회로도는 Sense AMP를 포함한다. SAM의 SENSE AMP는 비트 라인에 공급되는 기준 전압(Vref)과 비트 라인 상의 전압차이를 증폭시켜 0 또는 1의 디지털 신호를 생성한다. The circuit diagram of the SAM shown in FIG. 32 includes the Sense AMP. The SENSE AMP of SAM amplifies the difference between the reference voltage (Vref) supplied to the bit line and the voltage on the bit line to generate a digital signal of 0 or 1.

SAM의 SENSE AMP는 비트 라인을 통해 방전된 메모리 셀에 전하를 선택적으로 저장할 수 있다. READ 명령이 수행될 경우, RESTORE도 함께 수행된다. READ-DISCARD 명령일 경우에는, RESTORE는 수행되지 않을 수 있다.The SAM's SENSE AMP can selectively store charge in the discharged memory cell through the bit line. When the READ command is executed, RESTORE is also executed. In the case of a READ-DISCARD command, RESTORE may not be performed.

SAM의 SENSE AMP는 감지된 전압을 래치하는 버퍼 메모리 기능을 제공한다. SAM's SENSE AMP provides a buffer memory function that latches the sensed voltage.

여기서 메모리 셀의 커패시터는 누설 전류 특성을 가질 수 있다.Here, the capacitor of the memory cell may have a leakage current characteristic.
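The difference between READ and READ-DISCARD at the level of one row can be captured in a small behavioural model. This is a deliberately simplified sketch and not the circuit of FIG. 32: the class `DramRow` is hypothetical, it ignores leakage and timing, and it models a destroyed row simply as `None`.

```python
class DramRow:
    """Behavioural model of one DRAM row and its sense amplifier latch."""

    def __init__(self, bits):
        self.cells = list(bits)  # values held by the cell capacitors
        self.latch = None        # values held by the sense amplifiers

    def read(self, restore=True):
        """Sense the row; write the latch back only when restore is True."""
        if self.cells is None:
            raise RuntimeError("row contents were discarded by an earlier read")
        self.latch = list(self.cells)
        self.cells = None        # sensing discharges the cell capacitors
        if restore:
            self.cells = list(self.latch)
        return list(self.latch)


row = DramRow([1, 0, 1])
normal = row.read(restore=True)       # READ: data returned and restored
kept_after_read = row.cells
discarded = row.read(restore=False)   # READ-DISCARD: data returned only
kept_after_discard = row.cells
```

After the second call the row no longer holds valid data, which is acceptable only for data that the ANN DL shows will not be read again.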

도 33은 도 32에 도시된 SAM 회로도에서 프리차지(precharge) 동작을 설명하기 위한 예시도이다.FIG. 33 is an exemplary diagram for explaining a precharge operation in the SAM circuit diagram shown in FIG. 32 .

In the precharge period, an EQ (equalizing) signal is supplied to activate the VOLTAGE EQ. CIRCUIT. When the EQ signal is supplied, the voltage Vref = Vcc/2 is applied to the bitline and the complementary bitline through the respective transistors (Tr), so both lines are at the Vref voltage. The VOLTAGE EQ. CIRCUIT also shorts the bitline and the complementary bitline together so that both lines have the same voltage.

도 34는 도 32에 도시된 SAM 회로도에서 메모리 셀 액세스 동작을 설명하기 위한 예시도이다.FIG. 34 is an exemplary diagram for explaining a memory cell access operation in the SAM circuit diagram shown in FIG. 32 .

메모리 셀 액세스 구간에서는 하기의 순서로 비트 라인이 충전될 수 있다.In the memory cell access period, the bit line may be charged in the following order.

i) 도시된 굵은 워드라인과 같이, 액세스하고자 하는 프리차지(Precharged)된 비트라인(Bitline)에 대응되는 워드라인(Wordline)을 Vcc + Vt 전압으로 오버드라이브(over-drive)한다. i) Over-drive a wordline corresponding to a precharged bitline to be accessed with a voltage of Vcc + Vt as shown in the illustrated thick wordline.

워드라인에 Vcc + Vt 전압이 공급되어, 액세스하고자 메모리 셀의 트랜지스터(Tr)을 온(on) 시킨다. CSL Tr이 오프(off) 되어 데이터 출력이 차단된다. A voltage of Vcc + Vt is supplied to the word line to turn on the transistor Tr of the memory cell to be accessed. CSL Tr is turned off and data output is blocked.

ii) As shown by the thick bitline in the figure, when the value stored in the capacitor (Cap) of the memory cell is 1, the capacitor discharges and the bitline voltage rises from Vref to Vref+. The capacitor voltage drops in the process, and the stored data is lost.
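The small rise from Vref to Vref+ can be illustrated with the usual charge-sharing relation between the cell capacitor and the bitline capacitance. The capacitance values below are assumptions chosen for the example, and Vcc = 3 V is taken from the 3 V bitline level mentioned for the sense operation; none of these numbers are device parameters given in the disclosure.

```python
def bitline_after_access(v_cell, v_ref, c_cell, c_bitline):
    """Bitline voltage after the cell capacitor shares its charge with it."""
    return (c_cell * v_cell + c_bitline * v_ref) / (c_cell + c_bitline)


VCC = 3.0            # volts (assumed supply level)
VREF = VCC / 2       # precharge level of the bitline
C_CELL = 30e-15      # farads (assumed cell capacitance)
C_BITLINE = 270e-15  # farads (assumed bitline capacitance)

# Cell storing 1 (charged to Vcc) and cell storing 0 (discharged).
v_one = bitline_after_access(VCC, VREF, C_CELL, C_BITLINE)
v_zero = bitline_after_access(0.0, VREF, C_CELL, C_BITLINE)
```

With these assumed values the bitline moves only about 0.15 V away from Vref in either direction, which is why a sense amplifier is needed to drive it to a full logic level.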

도 35는 도 32에 도시된 SAM 회로도에서 데이터 검출(DATA SENSE) 동작을 설명하기 위한 예시도이다.FIG. 35 is an exemplary diagram for explaining a data detection (DATA SENSE) operation in the SAM circuit diagram shown in FIG. 32 .

검출(SENSE) 구간에서, 하기의 순서로 검출 회로(sensing circuit)가 비트 라인의 전압을 충전시킨다. In the detection (SENSE) section, the sensing circuit charges the voltage of the bit line in the following order.

i) 도시된 굵은 실선과 같이, 비트라인이 Vref+ 전압이 되면 검출 회로의 좌측 하단 트랜지스터(tr)이 온(on)된다.i) As shown by the thick solid line shown, when the bit line becomes Vref+ voltage, the lower left transistor tr of the detection circuit is turned on.

Accordingly, SAN, which is at the ground (GND) voltage, is applied to the complementary bitline.

The complementary bitline therefore goes to the GND voltage, which turns on the upper-right Tr.

ii) As shown by the thick dotted line, when the upper-right Tr of the detection circuit turns on, SAP (PFet sense amplifier), which is at the Vcc voltage, is applied to the bitline. The voltage applied to the lower-left Tr thus rises from Vref+ to Vcc. As a result, Bitline = 3 V and the complementary bitline = 0 V.

iii) 이제 DRAM의 데이터를 읽을 준비가 되었다. CSL 신호를 공급하면, SENSE AMP의 출력이 생성될 수 있다.iii) Now we are ready to read data from DRAM. When the CSL signal is supplied, the output of the SENSE AMP can be generated.

도 36은 도 32에 도시된 SAM 회로도에서 READ-DISCARD 동작을 설명하기 위한 예시도이다.FIG. 36 is an exemplary diagram for explaining a READ-DISCARD operation in the SAM circuit diagram shown in FIG. 32 .

READ-DISCARD 구간에서는 CSL(Column Select Line) 신호가 공급되고, 그에 따라 SENSE AMP가 데이터를 출력한다.In the READ-DISCARD section, the CSL (Column Select Line) signal is supplied, and the SENSE AMP outputs data accordingly.

According to the example presented herein, after the data stored in a memory cell has been read, the restore that would recharge the cell is not performed, so power consumption and restore time can be reduced. For example, this applies when an output feature map is stored and then used as the input feature map of the next layer.

Comparing FIG. 31A with FIG. 31B, the memory latency can be reduced because the restore portion of t_RAS is not needed.

한편, 커패시터(Cap)에 전하가 충전되는 것을 방지하기 위해서 오버 드라이브된(VCC + Vt) 워드라인을 오프(off)시킬 수 있다. 따라서 파워를 저감할 수 있다.Meanwhile, in order to prevent charge from being charged in the capacitor Cap, the overdrive (VCC + Vt) word line may be turned off. Therefore, power can be reduced.

FIG. 37 is an exemplary diagram for explaining a READ operation in the SAM circuit diagram shown in FIG. 32.

READ 구간에서는 CSL 신호가 공급되어 SENSE AMP가 데이터를 출력한다.In the READ section, the CSL signal is supplied and the SENSE AMP outputs data.

본 명세서에서 제시되는 예시에 따르면, 메모리 셀에 저장된 데이터가 읽어내진 이후에, 다시 해당 메모리 셀에 전하를 재충전시키는 복원(restore)이 수행될 수 있다.According to the example presented in this specification, after data stored in the memory cell is read, restoration of recharging the electric charge in the corresponding memory cell may be performed.

Referring to FIG. 31B, t_RAS includes the time required for the restore.

워드라인의 활성화를 유지하면 완전히 구동된 비트라인 전압이 엑세스 트랜지스터를 통해 커패시터의 전하를 복원시킬 수 있다. 따라서 복원에 따른 소비 전력이 필요하게 된다. Keeping the wordline active allows the fully driven bitline voltage to restore the charge on the capacitor through the access transistor. Therefore, power consumption according to the restoration is required.

FIG. 38A is an exemplary waveform diagram of a READ-DISCARD operation, and FIG. 38B is an exemplary waveform diagram of a READ operation.

Comparing FIG. 38A with FIG. 38B, the t_RAS time can be shortened in FIG. 38A because there is no RESTORE process. FIGS. 31A and 31B may be referred to for a better understanding of this operation.
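The saving can be put in numbers with a simple model in which a normal READ occupies the row for the sense time plus the restore time, while a READ-DISCARD occupies it for the sense time only. The timing values and the row count below are hypothetical and are not taken from the disclosure or from any DRAM datasheet.

```python
def row_access_ns(t_rcd_ns, t_restore_ns, discard):
    """Row occupancy: sense time, plus restore time unless discarded."""
    return t_rcd_ns if discard else t_rcd_ns + t_restore_ns


T_RCD_NS = 15      # assumed data sense time
T_RESTORE_NS = 20  # assumed 'data restored to DRAM cells' time
ROWS = 100         # assumed number of rows holding one feature map

read_ns = row_access_ns(T_RCD_NS, T_RESTORE_NS, discard=False)
discard_ns = row_access_ns(T_RCD_NS, T_RESTORE_NS, discard=True)
saved_ns = (read_ns - discard_ns) * ROWS
```

In this model the saving per row equals the restore time, so the total benefit scales with the number of rows read with READ-DISCARD.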

FIG. 39 is a table excerpted from part of the table shown in FIG. 21 in order to explain the REFRESH operation.

The table shown in FIG. 39 is intended to explain conceptually the time taken to perform one inference using the ANN model.

각각의 ANN DL # 단위 별 소요 시간은 프로세서의 처리속도, 데이터 버스의 대역폭, 메모리의 동작 속도를 기초로 측정, 계산 또는 예측할 수 있다. The time required for each ANN DL # unit can be measured, calculated or predicted based on the processing speed of the processor, the bandwidth of the data bus, and the operating speed of the memory.

SAM 컨트롤러는 ANN DL 정보를 기초로 메모리의 특정 영역(DOMAIN)의 데이터에 대해서 REFRESH 여부를 제어할 수 있다. SAM 컨트롤러는 ANN DL 정보에 기초하여 IT(inference time)을 측정할 수 있다. 예를 들어, ANN DL 정보 내의 동일 ANN DL #가 반복되는 시간을 측정할 수 있다. 즉, ANN DL #1를 위한 동작을 수행한 이후 #1 동작이 다시 돌아오는 시간을 측정할 수 있다. 다른 예를 들면, ANN DL 정보 내의 시작 #와 종료 #까지 수행하는데 걸리는 시간을 측정할 수 있다. 즉, ANN DL 정보 내의 #1부터 #84까지, 동작을 수행하는데 걸리는 시간이 측정될 수 있다. 또 다른 예를 들면, ANN DL 정보 내의 특정 기간을 설정하여 처리 시간을 측정할 수 있다. The SAM controller may control whether to REFRESH data in a specific area (DOMAIN) of the memory based on the ANN DL information. The SAM controller may measure inference time (IT) based on the ANN DL information. For example, the repetition time of the same ANN DL # in the ANN DL information may be measured. That is, after performing the operation for ANN DL #1, it is possible to measure the time at which operation #1 returns again. As another example, it is possible to measure the time it takes to perform until the start # and the end # in the ANN DL information. That is, from #1 to #84 in the ANN DL information, the time taken to perform the operation may be measured. As another example, the processing time may be measured by setting a specific period in the ANN DL information.

SAM 컨트롤러는 임계 시간 이내에 1회 추론을 완료했다고 판단하면, 메모리 리프레쉬를 DISABLE 할 수 있다. 예를 들어, ANN DL 정보에 따라 임계 시간내 1회 추론을 완료하면, 커널이 저장된 메모리 영역의 리프레쉬를 DISABLE할 수 있다. If the SAM controller determines that one inference has been completed within the threshold time, memory refresh may be disabled. For example, if the inference is completed once within the threshold time according to the ANN DL information, the refresh of the memory area in which the kernel is stored may be disabled.

The reason is as follows. When one inference is completed according to the ANN DL information, every kernel of the ANN model has been read from or written to the memory once. In a DRAM memory cell structure, a read is substantially equivalent to a REFRESH, so the data can be preserved without performing a redundant REFRESH. However, if the inference is interrupted partway or the threshold time is exceeded, the SAM controller may refresh only the memory cells (e.g., rows) in which the kernels are stored.

As another example, if one inference is completed within the threshold time according to the ANN DL information, the SAM controller may disable REFRESH only for the memory area in which the feature maps are stored. The reason is that a feature map is not reused and is therefore insensitive to data loss, so disabling REFRESH for it does no harm.

As yet another example, when a READ-DISCARD operation is performed, the data has already been lost, so disabling REFRESH may be effective.

To elaborate, because a KERNEL may be a fixed value, it may be refreshed periodically. As described above, however, REFRESH may be disabled for a KERNEL as well if it can be predicted from the ANN DL that it will be read again within the threshold time.

단 본 개시의 예시들은 이에 제한되지 않으며, ANN DL을 기초로, 데이터의 특성, 처리 시간, 재사용 여부 등을 따져서 SAM의 READ, WRITE, READ-DISCARD 명령을 적절히 선택할 수 있다. However, the examples of the present disclosure are not limited thereto, and the READ, WRITE, and READ-DISCARD commands of the SAM may be appropriately selected based on the ANN DL, considering the characteristics of data, processing time, reuse or the like.

Meanwhile, the aforementioned threshold time may be set, for example, as a refresh threshold time RT_th = 32 ms to 64 ms. The recommended time for preventing data loss may vary depending on the capacitance of the memory cell capacitor and the amount of leakage current.

In addition, when IT < RT_th is satisfied, the memory cells in which the feature map is stored may not receive a refresh command.

Different data REFRESH policies may be set in the ANN DL information.

For example, the REFRESH policy may be set to raise the data protection level for the kernel and to lower the data protection level for the feature map.

By disabling REFRESH based on the ANN DL information, memory operation delay and power consumption can be reduced.
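The refresh decision described above can be sketched as follows. This is a minimal illustration, not the disclosed controller logic: the `Region` type, the domain labels "KERNEL" and "FMAP", the function name, and the 64 ms default are all assumptions introduced here. It follows the stated rule that an on-time inference lets refresh be skipped, while an interrupted or late inference refreshes only the rows that hold kernels.

```python
from dataclasses import dataclass

RT_TH_MS = 64.0  # assumed default; the text gives 32 ms to 64 ms as an example range


@dataclass(frozen=True)
class Region:
    domain: str  # "KERNEL" or "FMAP" (hypothetical labels)
    rows: range  # DRAM rows occupied by this region


def rows_to_refresh(regions, inference_time_ms, completed, rt_th_ms=RT_TH_MS):
    """Return the rows that still need an explicit REFRESH command."""
    on_time = completed and inference_time_ms < rt_th_ms
    if on_time:
        # Every kernel row was read within the threshold, and a DRAM read
        # restores the cell charge; feature maps are not reused.
        return []
    # Interrupted or late inference: refresh only the rows that hold kernels.
    rows = []
    for region in regions:
        if region.domain == "KERNEL":
            rows.extend(region.rows)
    return rows


regions = [Region("KERNEL", range(0, 4)), Region("FMAP", range(4, 6))]
print(rows_to_refresh(regions, inference_time_ms=20.0, completed=True))  # []
print(rows_to_refresh(regions, inference_time_ms=80.0, completed=True))  # [0, 1, 2, 3]
```

A real controller would also have to track how long each row has gone without a read, which this sketch leaves out.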

When the artificial neural network model is distributed and stored across a plurality of banks in the memory, the precharge timing of each bank may be controlled individually based on the ANN DL information.

FIG. 40 shows examples in which the SAM memory is implemented in various forms according to an example presented in the present specification.

The SAM memory may be implemented in various forms depending on the application field.

The memory bus, which is the data path to the caches in the processor, is a single channel by default but may also be implemented as a dual channel. Adding channels increases power consumption, but has the advantage that bandwidth can be improved by managing the kernel and/or the feature map on separate channels. Using two channels doubles the bandwidth compared to one channel, so more data can be delivered to the caches in the processor. These operations may be controlled based on the ANN DL.

A plurality of SAM memories may be grouped and driven as a "rank".

Each SAM memory includes "banks", each of which is a set of independently operating memory arrays. For example, one bank may include eight memory arrays. Interleaving across multiple memory banks makes it possible to implement a high-bandwidth memory bus using low-bandwidth devices. Each memory array may include a row decoder, a column decoder, a sense amplifier, and an input/output buffer. "Row" refers to a row of a memory array. "Column" refers to a column of a memory array.

FIG. 41 is an exemplary diagram showing an example of a method of mapping addresses of the main memory based on ANN data locality information.

Referring to FIG. 41, the basic structure of the SAM is shown. The SAM includes a plurality of memory cells in a matrix structure having row and column addresses. The SAM may be implemented as, for example, DRAM. However, examples of the present disclosure are not limited thereto.

A sense amplifier is disposed below the plurality of memory cells of the matrix structure. The row address decoder selects a specific row. RAS latency is incurred to perform this operation. The data of the memory cells in the selected row is latched in the sense amplifier. The column address decoder selects the required data from the data latched in the sense amplifier and transfers it to the data buffer. CAS latency is incurred to perform this operation. This structure may be referred to as a bank of the DRAM. The DRAM may include a plurality of banks.

When the DRAM operates in burst mode, data is read or written while the memory cell address is incremented sequentially. Therefore, RAS latency and CAS latency are minimized compared to reading data at fragmented addresses.

To elaborate, even if the AMC or the NPU instructs the main memory to use burst mode, if the data stored in the DRAM is in fact fragmented, RAS latency and CAS latency are incurred in proportion to the fragmentation. Therefore, simply issuing a burst mode command is unlikely to reduce RAS latency and CAS latency in practice.
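The effect of fragmentation can be illustrated with a deliberately simplified single-bank, open-row model. The function name and the counting rules below are assumptions made for illustration, not a timing-accurate DRAM model: a row activation (RAS) is counted whenever an access lands in a row other than the open one, and a new column command (CAS) is counted whenever an access does not continue the current sequential burst.

```python
def count_latency_events(addresses, cols_per_row):
    """Count row activations (RAS) and new column commands (CAS) for a trace."""
    ras = 0
    cas = 0
    open_row = None
    prev = None
    for addr in addresses:
        row = addr // cols_per_row
        new_row = row != open_row
        if new_row:
            ras += 1  # a different row must be activated and latched
            open_row = row
        if new_row or addr != prev + 1:
            cas += 1  # the sequential burst was broken, so a new column command is needed
        prev = addr
    return ras, cas


sequential = [0, 1, 2, 3, 8, 9, 10, 11]   # data laid out in request order
fragmented = [0, 8, 1, 9, 2, 10, 3, 11]   # same cells, requested out of order
print(count_latency_events(sequential, cols_per_row=8))  # (2, 2)
print(count_latency_events(fragmented, cols_per_row=8))  # (8, 8)
```

Both traces touch the same eight cells, yet the fragmented order pays a row activation on every access, which is why a burst command alone does not help when the layout is fragmented.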

In contrast, in the case of SRAM, data fragmentation does not substantially cause latency. Therefore, for a buffer memory or internal memory composed of SRAM, latency caused by data fragmentation may not be critical.

Referring to FIG. 41, a memory map may be set in the memory cells of the DRAM based on the ANN data locality information (ANN DL), in consideration of the order and size of the data that the NPU will request. The memory map may be set with a start address and an end address determined by each data size. Therefore, if the SAM performs memory operations in the order of the ANN data locality information (ANN DL), all memory operations can be performed in burst mode.

Accordingly, the main memory shown in FIG. 41 may be controlled based on the memory addresses and operation modes shown in Table 1.

Table 1
Layer | Start address | End address | Operation mode | Domain | ANN DL | Size (Byte)
1 | 0 | A=A' | Read-Burst | IFMAP | 1 | A
1 | A'+1 | A+1+B=B' | Read-Burst | Kernel | 2 | B
1 | B'+1 | B'+1+C=C' | Write-Burst | OFMAP | 3 | C
2 | B'+1 | B'+1+C=C' | Read-Burst | IFMAP | 4 | C
2 | C'+1 | C'+1+D=D' | Read-Burst | Kernel | 5 | D
2 | D'+1 | D'+1+E=E' | Write-Burst | OFMAP | 6 | E
3 | D'+1 | D'+1+E=E' | Read-Burst | IFMAP | 7 | E
3 | E'+1 | E'+1+F=F' | Read-Burst | Kernel | 8 | F
3 | F'+1 | F'+1+G=G' | Write-Burst | OFMAP | 9 | G
4 | F'+1 | F'+1+G=G' | Read-Burst | IFMAP | 10 | G
4 | G'+1 | G'+1+H=H' | Read-Burst | Kernel | 11 | H
4 | H'+1 | H'+1+I=I' | Write-Burst | OFMAP | 12 | I
5 | H'+1 | H'+1+I=I' | Read-Burst | IFMAP | 13 | I
5 | I'+1 | I'+1+J=J' | Read-Burst | Kernel | 14 | J
5 | J'+1 | J'+1+K=K' | Write-Burst | OFMAP | 15 | K

To elaborate, the domain in Table 1 may use the domain information described with reference to FIG. 12. Likewise, the operation mode in Table 1 may use the operation mode information described with reference to FIG. 12.

Since the data is mapped to sequential addresses according to the ANN data locality information (ANN DL), the data can be processed with burst mode commands.
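A memory map in the style of Table 1 could be generated as in the sketch below. The function name, the tuple layout, and the numeric sizes are illustrative assumptions. The address arithmetic copies the convention used in Table 1 (end = start + size, next start = end + 1), and the output feature map of one layer is reused in place as the input feature map of the next.

```python
def build_memory_map(input_size, layers):
    """Build (ANN DL, layer, start, end, mode, domain) entries.

    layers is a list of (kernel_size, ofmap_size) tuples, one per layer.
    """
    entries = []
    token = 1
    ifmap = (0, input_size)  # (start, end) of the first input feature map
    next_free = ifmap[1] + 1
    for layer, (kernel_size, ofmap_size) in enumerate(layers, start=1):
        kernel = (next_free, next_free + kernel_size)
        ofmap = (kernel[1] + 1, kernel[1] + 1 + ofmap_size)
        for mode, domain, (start, end) in (
            ("Read-Burst", "IFMAP", ifmap),
            ("Read-Burst", "Kernel", kernel),
            ("Write-Burst", "OFMAP", ofmap),
        ):
            entries.append((token, layer, start, end, mode, domain))
            token += 1
        ifmap = ofmap  # this layer's output is the next layer's input
        next_free = ofmap[1] + 1
    return entries


for entry in build_memory_map(100, [(10, 50), (20, 30)]):
    print(entry)
```

Because the entries are emitted in request order and each region follows the previous one, walking the list in order only ever touches increasing addresses within each region, which is the property the burst mode relies on.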

That is, based on the ANN data locality information (ANN DL), the AMC can cache the required data before the NPU requests it and can know the order of all requests. Therefore, the cache hit rate of the buffer memory of the AMC can theoretically reach 100%.

In addition, since the memory map of the main memory is set based on the ANN data locality information (ANN DL), all memory operations can also be performed in burst mode.

Although a single memory bank is shown as an example in FIG. 29, address mapping may also be performed in a bank interleaving manner depending on the bank, rank, and channel configuration of the memory.

Without the ANN data locality information (ANN DL), it is practically impossible to store the data that the NPU will request sequentially in the DRAM. That is, even if conventional artificial neural network model information is available, without the ANN data locality information (ANN DL) described in the various examples, the complete order of the data operations that the NPU will request from the main memory cannot be known.

If the AMC does not have the ANN data locality information (ANN DL), it is difficult for the AMC to know whether the NPU will first request the kernel of the first layer of the artificial neural network model or the input feature map. Accordingly, it becomes practically difficult to set a memory map in the main memory that takes burst mode into account.

FIG. 42 is an exemplary diagram showing another example of a method of mapping addresses of the main memory based on ANN data locality information.

Since the structure of the main memory shown in FIG. 42 is substantially the same as that of the main memory shown in FIG. 41, redundant description is omitted.

Referring to FIG. 42, a memory map may be set in the memory cells of the DRAM based on the ANN data locality information (ANN DL), in consideration of the order and size of the data that the NPU will request. The memory map may be set with a start address and an end address determined by each data size. Therefore, if the DRAM performs memory operations in the order of the ANN data locality information (ANN DL), all memory operations can be performed in burst mode.

Accordingly, the main memory shown in FIG. 42 may be controlled based on the memory addresses and operation modes shown in Table 2.

The ANN data locality information (ANN DL) corresponding to FIG. 42 and Table 2 is an example of a case in which the NPU is set to use a shared region for the input feature map and the output feature map.

Table 2
Layer name | Start address | End address | Operation mode | Domain | ANN DL | Size (Byte)
1 | 0 | M_FMAP=A' | Read-Burst | IFMAP | 1 | M_FMAP
1 | A'+1 | A'+1+B=B' | Read-Burst | Kernel | 2 | B
1 | 0 | C | Write-Burst | OFMAP | 3 | C
2 | 0 | C | Read-Burst | IFMAP | 4 | C
2 | B'+1 | B'+1+D=D' | Read-Burst | Kernel | 5 | D
2 | 0 | E | Write-Burst | OFMAP | 6 | E
3 | 0 | E | Read-Burst | IFMAP | 7 | E
3 | D'+1 | D'+1+F=F' | Read-Burst | Kernel | 8 | F
3 | 0 | G | Write-Burst | OFMAP | 9 | G
4 | 0 | G | Read-Burst | IFMAP | 10 | G
4 | F'+1 | F'+1+H=H' | Read-Burst | Kernel | 11 | H
4 | 0 | I | Write-Burst | OFMAP | 12 | I
5 | 0 | I | Read-Burst | IFMAP | 13 | I
5 | H'+1 | H'+1+J=J' | Read-Burst | Kernel | 14 | J
5 | 0 | K | Write-Burst | OFMAP | 15 | K

Once training of the artificial neural network model is completed, the values of the kernel are fixed. Therefore, the kernel has the characteristic of being a fixed value. In contrast, the input feature map and the output feature map are based on inputs such as image data, a camera, a microphone, radar, and lidar, so once used they may not be reused.

Referring to FIG. 20 as an example, the sizes of the input feature map and the output feature map of the artificial neural network model are defined. Therefore, the largest data size (M_FMAP) among the input feature maps and output feature maps of the artificial neural network model can be selected. For the artificial neural network model of FIG. 20, the maximum-size feature map (M_FMAP) is 802,816 bytes. Accordingly, the input feature maps and output feature maps of each layer of the artificial neural network model in Table 2 are set to have the same start address. That is, the input feature map and the output feature map may be overwritten at the same memory address. As described above, due to the characteristics of the artificial neural network model, convolving the input feature map with the kernel generates the output feature map, and that output feature map becomes the input feature map of the next layer. Therefore, the feature map of the previous layer is not reused and may safely be deleted.

According to the above configuration, by setting the memory area sized for the maximum feature map as a region shared by the input feature map and the output feature map, the size of the memory map of the main memory can be reduced.
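The shared-region layout of Table 2 can be sketched in the same spirit. The function name and the tuple layout are assumptions for illustration: every feature map starts at address 0 inside a region sized for the largest feature map (M_FMAP), and kernels are packed after that region using the address convention of Table 2.

```python
def build_shared_map(fmap_sizes, kernel_sizes):
    """Lay out feature maps in one shared region and kernels after it.

    fmap_sizes holds the input feature map size followed by each layer's
    output size, so it has one more element than kernel_sizes.
    """
    m_fmap = max(fmap_sizes)   # shared region sized for the largest feature map
    entries = []
    kernel_start = m_fmap + 1  # kernels are packed right after the shared region
    for layer, kernel_size in enumerate(kernel_sizes, start=1):
        kernel_end = kernel_start + kernel_size
        entries.append((layer, "IFMAP", "Read-Burst", 0, fmap_sizes[layer - 1]))
        entries.append((layer, "Kernel", "Read-Burst", kernel_start, kernel_end))
        entries.append((layer, "OFMAP", "Write-Burst", 0, fmap_sizes[layer]))
        kernel_start = kernel_end + 1
    return m_fmap, entries


m_fmap, entries = build_shared_map([100, 50, 30], [10, 20])
print(m_fmap)            # 100
print(entries[-2])       # (2, 'Kernel', 'Read-Burst', 112, 132)
```

With these example sizes the highest address used is 132, because only one feature-map region is ever reserved regardless of the number of layers.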

FIG. 43 is a table showing an example of accessing the memory according to the order information in the artificial neural network data locality (ANN DL) information.

The SAM controller may control the burst length (BURST LENGTH) based on the ANN DL information. The SAM controller can efficiently control the BURST-MODE according to the ANN DL information by using the BURST-TERMINATE command. The burst length may be selected according to the AXI interface.

1) ANN DL #1 has a start address of 0 and an end address of A'. Therefore, the READ-BURST termination command issued by the SAM controller may correspond to data size A.

2) ANN DL #2 has a start address of A'+1 and an end address of B'. Therefore, the READ-BURST termination command issued by the SAM controller may correspond to data size B.

3) ANN DL #3 has a start address of B'+1 and an end address of C'. Therefore, the WRITE-BURST termination command issued by the SAM controller may correspond to data size C.

4) ANN DL #4 has a start address of B'+1 and an end address of C'. Therefore, the READ-BURST termination command issued by the SAM controller may correspond to data size C.

5) ANN DL #5 has a start address of C'+1 and an end address of D'. Therefore, the READ-BURST termination command issued by the SAM controller may correspond to data size D.

Examples of the present disclosure are not limited to the above description, and the burst length may be programmed by the following methods.

a. Use a short fixed burst length.

b. Explicitly identify the burst length in the read or write command.

c. Program the burst length using fuses of the DRAM (laser programmable fuses, electrically programmable fuses).

d. Use a long fixed burst length together with a burst terminate command.

e. Use a BEDO-style protocol in which each CAS pulse toggles one data column (burst-mode extended data out; BEDO DRAM).
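Option d above, a fixed burst length combined with a burst terminate command, can be sketched as follows. The function name and the tuple layout are hypothetical, and the unit of `size` and `burst_len` (words or beats) is left abstract because the text does not fix it.

```python
def plan_bursts(start, size, burst_len):
    """Split one ANN DL transfer into fixed-length bursts.

    Returns (address, length, terminate) tuples; terminate is True when the
    burst must be cut short because the transfer ends before the fixed length.
    """
    commands = []
    addr = start
    remaining = size
    while remaining > 0:
        used = min(burst_len, remaining)
        commands.append((addr, used, used < burst_len))
        addr += used
        remaining -= used
    return commands


print(plan_bursts(start=0, size=20, burst_len=8))
# [(0, 8, False), (8, 8, False), (16, 4, True)]
```

Since the ANN DL gives the size of each transfer in advance, the point at which the terminate command is needed is known before the first burst is issued.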

FIG. 44 is an exemplary diagram showing an example of a memory with an embedded SAM controller.

The illustrated memory is a dedicated memory improved for artificial neural networks, and, as shown, a SAM controller may be embedded in the memory. That is, DSAM may refer to a SAM implemented based on DRAM.

FIG. 45 is an exemplary diagram showing an architecture including a compiler.

The compiler converts the artificial neural network model into machine code that can run on the NPU.

The compiler may include a frontend and a backend. An intermediate representation (IR) may exist between the frontend and the backend. The IR is an abstraction of the program and is used for program optimization. The artificial neural network model may be converted into IRs of various levels.

The high-level IR may exist on the frontend side of the compiler. The frontend of the compiler receives information on the artificial neural network model. For example, the information on the artificial neural network model may be the information illustrated in FIG. 23. The frontend of the compiler may perform hardware-independent transformation and optimization.

The high-level IR is at the graph level and can optimize computation and control flow. The low-level IR may be located at the backend of the compiler.

The backend of the compiler may convert the high-level IR into the low-level IR. The backend of the compiler performs NPU optimization, code generation, and compilation.

The backend of the compiler may perform optimization tasks such as hardware-intrinsic mapping and memory allocation.

The ANN data locality information may be generated or defined in the low-level IR.

The ANN data locality information may include the order information of all memory operations that the NPU will request from the main memory. Therefore, the AMC can know the order of all memory operations that the NPU will request. As described above, the ANN data locality information may be generated by the compiler, or it may be generated by the AMC analyzing the repetition pattern of the memory operations that the NPU requests from the main memory.
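One simple way the repetition pattern mentioned above might be recovered from an observed request trace is to search for the shortest period. This is only a sketch under assumptions: the text does not specify the analysis method, and `find_period` and the request tuples are hypothetical. The function accepts a period only if at least two full repetitions have been observed.

```python
def find_period(requests):
    """Return the shortest period p with requests[i] == requests[i - p], or None."""
    n = len(requests)
    for p in range(1, n // 2 + 1):
        if all(requests[i] == requests[i - p] for i in range(p, n)):
            return p
    return None


# One inference issues three requests; the trace covers three inferences.
one_inference = [("READ", 0), ("READ", 101), ("WRITE", 112)]
trace = one_inference * 3
p = find_period(trace)
print(p)          # 3
print(trace[:p])  # the recovered request sequence for one inference
```

The recovered prefix `trace[:p]` plays the role of the request order that the compiler would otherwise supply directly.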

The ANN data locality information may be generated in the form of a register map or a lookup table.

After analyzing or being provided with the ANN data locality information (ANN DL), the compiler may generate a caching schedule for the AMC and/or the NPU based on the ANN DL. The caching schedule may include a caching schedule for the on-chip memory of the NPU and/or a caching schedule for the buffer memory of the AMC.

Meanwhile, the compiler may compile an artificial neural network model to which optimization algorithms (e.g., quantization, pruning, retraining, layer fusion, model compression, transfer learning, AI-based model optimization, and other model optimization) have been applied.

In addition, the compiler may generate the ANN data locality information of the artificial neural network model optimized for the NPU. The ANN data locality information may be provided separately to the AMC, and the NPU and the AMC may each be provided with the same ANN data locality information. Also, as described above with reference to FIG. 14, there may be one or more AMCs.

The ANN data locality information may include an operation sequence organized in units of memory operation requests of the NPU, a data domain, a data size, and a memory address map configured for sequential addressing.

The scheduler in the illustrated NPU may receive machine code in binary form from the compiler and perform artificial neural network operations.

The compiler provides sequentially ordered memory address map information of the main memory to the DMA, which serves as the artificial neural network memory controller (AMC), and the AMC may place or rearrange the artificial neural network model data in the main memory based on the sequential memory address map. The AMC may perform the data rearrangement of the main memory during initialization of the NPU or at runtime.

In performing the placement or rearrangement, the AMC may optimize it for read-burst operation. The placement or rearrangement may be performed when the NPU operation is initialized. The placement or rearrangement may also be performed when a change in the ANN DL is detected. This function may be performed independently by the AMC during NPU operation, regardless of the compiler.

The AMC and the NPU may provide ANN data locality information to, or receive it from, each other. That is, the compiler may provide the ANN data locality information to the AMC and the NPU. The AMC may be provided in real time with information on which step of the ANN data locality information the NPU is currently processing. In addition, the AMC may synchronize the ANN data locality information with the NPU.

If the NPU is currently processing the data corresponding to ANN data locality information token #N, the AMC predicts that the data corresponding to ANN data locality information token #(N+1) will be requested by the NPU and, taking the latency of the main memory into account, requests the data corresponding to token #(N+1) from the main memory. The AMC may perform this operation on its own, before the NPU issues the memory operation request.
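The token-ahead prefetch described above can be sketched as a small simulation. The function name, the dictionary standing in for the main memory, and the one-token buffer behavior are assumptions for illustration. While the NPU consumes token #N, the AMC loads token #(N+1) into its buffer, so each request is served from the buffer.

```python
def run_with_prefetch(ann_dl, main_memory):
    """Serve NPU requests in ANN DL order; return (buffer_hits, delivered data)."""
    buffer = {}
    hits = 0
    delivered = []
    if ann_dl:
        # The first token is known in advance, so it is loaded before any request.
        buffer[ann_dl[0]] = main_memory[ann_dl[0]]
    for n, token in enumerate(ann_dl):
        if token in buffer:
            hits += 1
            data = buffer.pop(token)
        else:
            data = main_memory[token]  # miss: pay the main memory latency
        delivered.append(data)
        if n + 1 < len(ann_dl):
            nxt = ann_dl[n + 1]
            buffer[nxt] = main_memory[nxt]  # prefetch token #(N+1) ahead of the request
    return hits, delivered


ann_dl = [1, 2, 3, 4]
main_memory = {t: f"data{t}" for t in ann_dl}
hits, delivered = run_with_prefetch(ann_dl, main_memory)
print(hits, len(ann_dl))  # 4 4
```

The sketch ignores timing, so it only shows that a known request order makes every request a buffer hit; whether the prefetch completes in time depends on the main memory latency.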

The compiler may generate a caching policy so that the data required for the prediction operation based on ANN data locality is stored in the buffer memory in the AMC. According to the buffer size of the DMA, the compiler caches as much data as possible in advance, before the NPU requests it.

For example, the compiler provides the AMC with a caching policy to cache up to ANN data locality information token #(N+M). Here, M may be an integer for which the combined data size of ANN data locality information tokens #(N+1) through #(N+M) is equal to or smaller than the cache capacity of the AMC.
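The choice of M can be sketched as a running sum over the token sizes. The function name and the list representation (`sizes[k]` is the data size of token #k) are assumptions for illustration; the sketch returns the largest M for which tokens #(N+1) through #(N+M) fit within the cache capacity.

```python
def prefetch_depth(sizes, n, cache_capacity):
    """Return the largest M such that sizes[n+1 .. n+M] sum to at most cache_capacity."""
    total = 0
    m = 0
    for size in sizes[n + 1:]:
        if total + size > cache_capacity:
            break
        total += size
        m += 1
    return m


sizes = [4, 3, 5, 2, 6]  # hypothetical data sizes of tokens #0 to #4
print(prefetch_depth(sizes, n=0, cache_capacity=10))  # 3
```

Here tokens #1 to #3 occupy 3 + 5 + 2 = 10 units, which exactly fills the assumed capacity, and token #4 would not fit.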

When the remaining cache memory capacity of the AMC is larger than the data size of ANN data locality information token #(N+M+1), the compiler may store the data of ANN data locality information token #(N+M+1) in the area in which the data corresponding to ANN data locality information token #N is stored.

To elaborate, the caching may be performed independently by the AMC, without a command from the NPU, based on the ANN DL stored in the ANN data locality information management unit of the AMC.

The compiler may provide a model lightweighting function. The compiler may further optimize and lighten the deep learning model to suit the corresponding NPU architecture.

FIG. 46 shows an architecture according to a first example.

Referring to FIG. 46, an NPU, an AMC (artificial neural network memory controller), and a main memory, which is an external memory, are shown. In some cases, the main memory may be referred to as an external memory.

Hereinafter, for convenience of description, the artificial neural network memory controller of the various examples of the present disclosure may be referred to as the AMC.

The NPU may include an NPU scheduler, an internal memory, and a PE array.

The PE array of the various examples of the present disclosure includes a plurality of processing elements. The plurality of processing elements may be driven individually and independently, or driven as a group. The PE array may be referred to as a plurality of processing elements.

The NPU may further include a special function unit (SFU).

The PE array may perform operations for the artificial neural network. For example, when input data is received, the PE array may perform an operation of deriving an inference result through the artificial neural network.

The NPU scheduler is configured to control the operation of the PE array for the inference operation of the NPU and the read and write order of the NPU internal memory. To elaborate, the NPU scheduler may be configured to control the PE array and the NPU internal memory based on the ANN (artificial neural network) data locality information.

NPU 스케줄러는 PE 어레이에서 작동할 인공신경망모델의 구조를 분석하거나 또는 분석된 정보를 제공받을 수 있다. 예를 들면, 상기 NPU의 컴파일러는 인공신경망 데이터 지역성을 분석하도록 구성될 수 있다. 인공신경망모델이 포함할 수 있는 데이터는 적어도 인공신경망 데이터 지역성에 따른 각각의 레이어의 입력 특징맵, 커널 데이터, 및 출력 특징맵 등이 있다. 각각의 레이어는 레이어의 크기 및 내부 메모리의 크기에 따라서 선택적으로 타일링(tiling) 될 수 있다. The NPU scheduler may analyze the structure of the artificial neural network model to be operated in the PE array or may be provided with the analyzed information. For example, the compiler of the NPU may be configured to analyze artificial neural network data locality. The data that the artificial neural network model may include includes at least an input feature map of each layer according to the locality of the artificial neural network data, kernel data, and an output feature map. Each layer may be selectively tiled according to the size of the layer and the size of the internal memory.

The ANN data locality information may be stored in a memory provided inside the NPU scheduler or in the NPU internal memory. The NPU scheduler may access the main memory to read or write the necessary data. In addition, the NPU scheduler may use the ANN data locality information or structural information based on data such as the per-layer feature maps and kernel data of the artificial neural network model. A kernel may also be referred to as a weight, and a feature map may also be referred to as node data. For example, the ANN data locality may be generated when the artificial neural network model is designed, when training is completed, or when the model is compiled. The NPU scheduler may store the ANN data locality information in the form of a register map, but is not limited thereto.

The NPU scheduler may schedule the operation order of the artificial neural network model based on the ANN data locality information.

The NPU scheduler may obtain, based on the ANN data locality information, the memory address values at which the feature map and kernel data of each layer of the artificial neural network model are stored. For example, the NPU scheduler may obtain the memory address values at which the feature map and kernel data of a layer of the model stored in memory are located. Accordingly, the NPU scheduler may fetch at least part of the feature map and kernel data of the layer to be run from the main memory in advance and then provide it to the NPU internal memory in a timely manner. The feature map of each layer may have its own corresponding memory address value, and each kernel data may likewise have its own corresponding memory address value.

The NPU scheduler may schedule the operation order of the PE array based on the ANN data locality information, for example, information on the arrangement or structure of the layers of the artificial neural network model.

Because the NPU scheduler schedules operations based on the ANN data locality information, it may operate differently from the scheduling concept of a general CPU. The scheduling of a general CPU aims at the best efficiency in consideration of fairness, efficiency, stability, response time, and the like. That is, it schedules tasks so that the most processing is performed within a given time, in consideration of priority, operation time, and the like.

A conventional CPU uses an algorithm that schedules tasks in consideration of data such as the priority of each process and its operation processing time.

That is, because the workload of a general CPU is random and difficult to predict, its scheduling is determined based on statistics, probability, and priority. In contrast, the operation order of an artificial neural network is not random but predictable, so more efficient scheduling is possible. In particular, because artificial neural network operations involve a very large amount of data, efficient scheduling can significantly improve the processing speed of the artificial neural network.

The NPU scheduler may determine the operation order based on the ANN data locality information.

Furthermore, the NPU scheduler may determine the operation order based on the ANN data locality information and/or the data locality information or structural information of the NPU to be used.

According to the structure of an artificial neural network model, the operations of the layers are performed sequentially. That is, once the structure of the model is fixed, the operation order of the layers can be determined. This order of operations, or order of data flow, that follows from the structure of the model can be defined as the data locality of the artificial neural network model at the algorithm level.
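
As a rough illustration of how a fixed layer structure implies a fixed data access order, the following minimal Python sketch derives such an order from a list of layers. The `Layer` type, the access labels, and the choice to read the kernel before the input feature map are assumptions made for illustration only; the disclosure does not prescribe this representation.

```python
from typing import List, NamedTuple


class Layer(NamedTuple):
    name: str  # hypothetical layer label, e.g. "conv1"


def data_locality(layers: List[Layer]) -> List[str]:
    """Return the data access order implied by running the layers in sequence."""
    order = []
    for i, layer in enumerate(layers):
        # The first layer reads the model input; every later layer reads the
        # output feature map written by the layer before it.
        ifmap = "input" if i == 0 else f"{layers[i - 1].name}.ofmap"
        order.append(f"read {layer.name}.kernel")  # assumed: kernel is read first
        order.append(f"read {ifmap}")
        order.append(f"write {layer.name}.ofmap")
    return order


if __name__ == "__main__":
    for step in data_locality([Layer("conv1"), Layer("conv2")]):
        print(step)
```

Because the list depends only on the model structure, it can be computed once, ahead of inference.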

The PE array (that is, the plurality of processing elements) refers to an arrangement of a plurality of PEs configured to perform operations on the feature maps and kernel data of an artificial neural network. Each PE may include a multiply-and-accumulate (MAC) operator and/or an arithmetic logic unit (ALU) operator. However, examples according to the present disclosure are not limited thereto.

Meanwhile, the internal memory in the NPU may be a static memory. For example, the internal memory may be an SRAM or a register. The internal memory may process a read operation and a write operation simultaneously. To this end, the AMC and the NPU may be connected through a dual-port communication interface. Alternatively, when the AMC and the NPU are connected through a one-port communication interface, the read operation and the write operation may be performed in turn by time-division multiplexing (TDM).

The AMC may include an ANN data locality information management unit and a buffer memory.

The AMC may monitor the operation order information of the NPU through the ANN data locality information management unit.

The ANN data locality information management unit may order and manage the data to be provided to the PEs according to the operation order of the NPU. The buffer memory may temporarily store data read from the main memory before providing it to the NPU. The buffer memory may also temporarily store an output feature map provided by the NPU before transferring it to the main memory.

Based on the ANN data locality information, the AMC reads the data that the NPU will request from the main memory before the NPU requests it, and stores the data in the buffer memory. When the NPU actually requests the data, the AMC immediately provides the data stored in the buffer memory. Accordingly, by monitoring the operation order of the artificial neural network model processed by the NPU, the AMC can substantially eliminate the RAS latency and CAS latency that may be caused by the main memory.
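
The read-ahead behavior described above can be sketched as follows. This is a minimal model under stated assumptions, not the disclosed hardware: main memory is a Python dict, the locality information is a list of addresses in predicted request order, and the class name `PrefetchingController` and the `depth` parameter are hypothetical.

```python
from collections import deque


class PrefetchingController:
    """Toy model of a controller that reads ahead in a known request order."""

    def __init__(self, main_memory, locality, depth=2):
        self.main_memory = main_memory      # address -> data (stands in for DRAM)
        self.pending = deque(locality)      # addresses in predicted request order
        self.buffer = {}                    # prefetched entries: address -> data
        self.depth = depth                  # hypothetical buffer capacity in entries
        self.hits = 0
        self.misses = 0
        self._fill()

    def _fill(self):
        # Read ahead until the buffer holds `depth` entries or nothing is pending.
        while self.pending and len(self.buffer) < self.depth:
            addr = self.pending.popleft()
            self.buffer[addr] = self.main_memory[addr]

    def request(self, addr):
        if addr in self.buffer:
            # The data was fetched before it was requested: serve it directly.
            self.hits += 1
            data = self.buffer.pop(addr)
        else:
            # Unpredicted request: fall back to a direct main-memory read.
            self.misses += 1
            if addr in self.pending:
                self.pending.remove(addr)
            data = self.main_memory[addr]
        self._fill()
        return data


if __name__ == "__main__":
    memory = {0: "kernel", 1: "ifmap-a", 2: "ifmap-b"}
    amc = PrefetchingController(memory, locality=[0, 1, 2])
    print([amc.request(a) for a in (0, 1, 2)], amc.hits, amc.misses)
```

When the requests follow the predicted order, every request is served from the buffer, which is the case in which the main-memory access latency is hidden from the requester.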

The main memory may be a dynamic memory. For example, the main memory may be a SAM or a DRAM. The main memory, when it is a DRAM, and the AMC may be connected through a system bus (for example, an AXI interface). The system bus may be implemented as a one-port bus. In this case, the DRAM may be unable to process a read operation and a write operation simultaneously.

Meanwhile, the AMC may rearrange the data in the main memory based on the ANN data locality information so that read operations become burst operations.

Accordingly, when the DRAM serving as the main memory supplies data to the buffer memory in a burst operation, the buffer memory may stream the data to the NPU.

The buffer memory may be implemented as a first-in, first-out (FIFO) memory. When the buffer memory is full, the AMC switches to a standby state. When the buffer memory transfers data to the NPU, the AMC reads further data from the main memory based on the ANN data locality information and stores it in the buffer memory.
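
The fill, standby, and drain cycle of a FIFO buffer can be sketched as follows. The cycle model is an assumption made for illustration: in each cycle the AMC side fills the FIFO until it is full, waits if data remains, and the NPU side then consumes one entry. The function name `stream` and the returned standby count are hypothetical.

```python
from collections import deque


def stream(items, capacity):
    """Deliver `items` through a bounded FIFO; return (delivered, standby_cycles)."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    source = deque(items)   # data in main memory, already in locality order
    fifo = deque()          # buffer memory inside the AMC
    delivered = []          # what the NPU receives, in order
    standby_cycles = 0
    while source or fifo:
        # AMC side: read ahead until the FIFO is full.
        while source and len(fifo) < capacity:
            fifo.append(source.popleft())
        if source:
            # The FIFO is full and data remains, so the AMC waits this cycle.
            standby_cycles += 1
        # NPU side: consume one entry, which frees a slot for the next read.
        delivered.append(fifo.popleft())
    return delivered, standby_cycles


if __name__ == "__main__":
    print(stream([1, 2, 3, 4, 5], capacity=2))
```

The FIFO preserves the order of the locality information, so the consumer never has to name an address; it only takes the next entry.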

If the buffer memory is small (for example, 1 KB), the buffer memory may serve only as a cache that reduces the latency between the main memory and the NPU. In this case, a large amount of data may be transferred at once between the main memory and the NPU by a burst operation. When burst operation works well in this way, the bandwidth of the main memory can be substantially maximized.

As a modification of FIG. 46, the AMC may be embedded in the NPU, in the main memory, or in the system bus.

FIG. 47 shows an architecture according to a second example.

Referring to FIG. 47, an NPU, an AMC, and a main memory are shown. In the second example, descriptions that duplicate those of other examples may be omitted for convenience. Configurations of the other examples may be selectively applied to this example.

The NPU may include an NPU scheduler, a plurality of internal memories, and a PE array.

Unlike FIG. 46, the plurality of internal memories in the NPU shown in FIG. 47 may include a first internal memory for kernel data, a second internal memory for the input feature map, and a third internal memory for the output feature map. The first to third internal memories may be a plurality of regions allocated within one physical memory. Each internal memory may be provided with its own port for communicating with the PE array. When each internal memory has its own port, the bandwidth of each internal memory can be guaranteed.

The size of each internal memory may be variably adjusted. For example, the total size of the internal memories may be 1 MByte, divided among them in a ratio of A:B:C, for example 1:2:3. The ratio may be adjusted at each operation step of the artificial neural network model according to the size of the input feature map, the size of the output feature map, and the size of the kernel data.
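
The ratio-based division can be worked through with a short Python sketch. It assumes that 1 MByte means 1,048,576 bytes and that any rounding remainder goes to the last region; neither detail is specified in the disclosure, and the function name `partition` is hypothetical. Which region receives which share of the ratio is likewise left open here.

```python
def partition(total_bytes, ratio):
    """Split `total_bytes` into regions proportional to `ratio`."""
    weight = sum(ratio)
    parts = [total_bytes * r // weight for r in ratio]
    # Integer division can leave a few bytes over; assign them to the last region.
    parts[-1] += total_bytes - sum(parts)
    return parts


if __name__ == "__main__":
    first, second, third = partition(1024 * 1024, (1, 2, 3))
    print(first, second, third)  # 174762 349525 524289
```

Calling the function again with a different ratio for each operation step models the variable adjustment described above.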

Unlike FIG. 46, the AMC shown in FIG. 47 may include a direct memory access (DMA) controller.

The external main memory may be a SAM or a DRAM.

While the PE array in the NPU is performing an operation for inference, the DMA controller may, even without a command from the NPU, independently read data from the main memory based on the ANN data locality information and store it in the buffer memory.

Based on the ANN data locality information, the DMA controller reads the data that the NPU will request from the main memory before the NPU requests it, and stores the data in the buffer memory. When the NPU actually requests the data, the DMA controller immediately provides the data stored in the buffer memory. Accordingly, providing the DMA controller can substantially eliminate the RAS latency and CAS latency that may be caused by the main memory.

FIG. 48 shows an architecture according to a third example.

Referring to FIG. 48, an NPU, an AMC, and a main memory are shown. In the third example, descriptions that duplicate those of other examples may be omitted for convenience. Configurations of the other examples may be selectively applied to this example.

Unlike FIG. 46, the plurality of internal memories in the NPU shown in FIG. 48 may include a first internal memory for kernel data, a second internal memory for the input feature map, and a third internal memory for the output feature map. The first to third internal memories may be a plurality of regions allocated within one physical memory.

Unlike FIG. 46, the AMC shown in FIG. 48 may include an ANN data locality information management unit, a swap memory, and a buffer memory.

The external main memory may be a SAM or a DRAM.

The swap memory in the AMC may be used to rearrange the data in the main memory.

Data in the main memory may be fragmented and stored at random addresses. When data is stored randomly in this way, reading it from the main memory requires non-sequential memory addresses. In that case, column address strobe (CAS) latency and row address strobe (RAS) latency may be incurred frequently.
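
To make the cost of non-sequential addresses concrete, the following sketch counts how often a new row must be opened for a given address sequence. It is a deliberately simplified model and not a description of any particular DRAM: it assumes a single bank, an open-row policy, and a hypothetical row size of 8 addresses.

```python
def count_row_activations(addresses, row_size=8):
    """Count row openings for an access sequence in a single-bank, open-row model."""
    activations = 0
    open_row = None
    for addr in addresses:
        row = addr // row_size
        if row != open_row:
            # A different row must be opened, which is where row latency is paid.
            activations += 1
            open_row = row
    return activations


if __name__ == "__main__":
    sequential = list(range(16))
    fragmented = [0, 8, 1, 9, 2, 10]
    print(count_row_activations(sequential))  # 2
    print(count_row_activations(fragmented))  # 6
```

In this model, sixteen sequential reads open only two rows, while six reads that alternate between two rows open a row on every access.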

To solve this problem, the AMC may rearrange the data in the main memory based on the ANN data locality information. Specifically, the AMC temporarily stores at least part of the data fragmented in the main memory in the swap memory. The AMC may then rearrange the data stored in the main memory, based on the ANN data locality information, so that burst operation becomes possible.
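
The rearrangement step can be sketched as follows, under simplifying assumptions: main memory is a Python list, the locality information is the list of current addresses in the order in which the data will be needed, and that list covers every address in the region being rewritten, so no unrelated data is overwritten. The function name `rearrange` and the returned old-to-new address map are hypothetical.

```python
def rearrange(memory, locality_order, base=0):
    """Rewrite `memory` so data sits contiguously in the order it will be read.

    Returns a dict mapping each old address to its new address.
    """
    # Stage the fragmented data in a temporary area (stands in for swap memory).
    swap = [memory[addr] for addr in locality_order]
    remap = {}
    for offset, (old_addr, value) in enumerate(zip(locality_order, swap)):
        # Write the data back at consecutive addresses starting at `base`.
        memory[base + offset] = value
        remap[old_addr] = base + offset
    return remap


if __name__ == "__main__":
    memory = ["c", "a", "d", "b"]          # needed in the order a, b, c, d
    mapping = rearrange(memory, [1, 3, 0, 2])
    print(memory)    # ['a', 'b', 'c', 'd']
    print(mapping)   # {1: 0, 3: 1, 0: 2, 2: 3}
```

After the rewrite, reading the data in locality order touches consecutive addresses, which is the precondition for burst operation.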

The data rearrangement operation may be performed only once, at initial operation, but is not limited thereto. If the ANN data locality information changes, the rearrangement operation may be performed again based on the changed ANN data locality information.

Meanwhile, as a modification, the AMC may perform the data rearrangement without using the swap memory, by allocating a swap area within the main memory.

FIG. 49 shows an architecture according to a fourth example.

Referring to FIG. 49, an NPU, an AMC, and a main memory are shown. In the fourth example, descriptions that duplicate those of other examples may be omitted for convenience. Configurations of the other examples may be selectively applied to this example.

Unlike FIG. 46, the plurality of internal memories in the NPU shown in FIG. 49 may include a first internal memory for kernel data, a second internal memory for the input feature map, and a third internal memory for the output feature map.

The AMC may include an ANN data locality information management unit and a plurality of buffer memories.

Unlike FIG. 46, the plurality of buffer memories shown in FIG. 49 may include a first buffer memory for kernel data, a second buffer memory for the input feature map, and a third buffer memory for the output feature map. The first to third buffer memories may be a plurality of regions allocated within one physical memory.

Each internal memory in the NPU may be connected to a corresponding buffer memory in the AMC. For example, the first internal memory may be directly connected to the first buffer memory, the second internal memory may be directly connected to the second buffer memory, and the third internal memory may be connected to the third buffer memory.

Each buffer memory may be provided with its own port for communicating with the corresponding internal memory of the NPU.

The size of each buffer memory may be variably adjusted. For example, the total size of the buffer memories may be 1 MByte, divided among them in a ratio of A:B:C, for example 1:2:3. The ratio may be adjusted at each operation step of the artificial neural network model according to the size of the input feature map, the size of the output feature map, and the size of the kernel data.

The AMC may store the data for the operations of the NPU separately in each buffer memory based on the ANN data locality information.

Meanwhile, as can be seen with reference to FIG. 23, when the artificial neural network model is based on MobileNet V1.0, the size of the kernels (that is, the weights) for depth-wise convolution and/or point-wise convolution may vary considerably.

Accordingly, the size of each internal memory may be adjusted based on the ANN data locality information. The size of each buffer memory may be adjusted in the same way.

FIG. 50 shows an architecture according to a fifth example.

Referring to FIG. 50, an NPU, an AMC, and a main memory are shown. In the fifth example, descriptions that duplicate those of other examples may be omitted for convenience. Configurations of the other examples may be selectively applied to this example.

Unlike FIG. 46, the plurality of internal memories in the NPU shown in FIG. 50 may include a first internal memory for kernel data, a second internal memory for the input feature map, and a third internal memory for the output feature map.

The AMC may include an ANN data locality information management unit and a buffer memory.

As mentioned in the other examples, data in the main memory may be randomly fragmented. When data is stored randomly in this way, reading it from the main memory requires non-sequential memory addresses, so column address strobe (CAS) latency and row address strobe (RAS) latency may be incurred.

To solve this problem, the AMC may rearrange the data in the main memory based on the ANN data locality information. Specifically, the AMC temporarily stores at least part of the data fragmented in the main memory in the buffer memory. The AMC may then rearrange the data stored in the main memory, based on the ANN data locality information, so that burst operation becomes possible.

Meanwhile, when data is rearranged, its memory addresses may change. Accordingly, the ANN data locality information management unit in the AMC and the NPU scheduler may communicate with each other. Specifically, the ANN data locality information management unit stores the memory addresses updated by the data rearrangement. The ANN data locality information management unit may then update the existing memory addresses stored in the NPU scheduler.

FIG. 51 shows an architecture according to a sixth example.

Referring to FIG. 51, an NPU, an AMC, and a main memory are shown. In the sixth example, descriptions that duplicate those of other examples may be omitted for convenience. Configurations of the other examples may be selectively applied to this example.

Unlike FIG. 46, the plurality of internal memories in the NPU shown in FIG. 51 may include a first internal memory for weights, a second internal memory for the input feature map, and a third internal memory for the output feature map. The first to third internal memories may be a plurality of regions allocated within one physical memory.

The AMC may include an ANN data locality information management unit, a translation lookaside buffer (TLB), and a buffer memory.

Data may be stored randomly in the main memory. When data is stored randomly in this way, reading it from the main memory requires non-sequential memory addresses, so column address strobe (CAS) latency and row address strobe (RAS) latency may be incurred.

To solve this problem, the AMC may rearrange the data in the main memory based on the ANN data locality information. Specifically, the AMC may temporarily store the data held in the main memory in the buffer memory and then rearrange the data stored in the main memory, based on the ANN data locality information, so that burst operation becomes possible.

Meanwhile, when data is rearranged, its memory addresses may change. Accordingly, the TLB in the AMC may store, in table form, the old memory addresses from before the rearrangement and the new memory addresses from after it.

When the scheduler in the NPU requests data using an old memory address, the TLB in the AMC may translate the old memory address into the new memory address, so that the data is read from the main memory and stored in the buffer memory. Accordingly, unlike FIG. 21, the TLB allows the main memory to operate in burst mode without the memory addresses stored in the NPU scheduler having to be updated.
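
The old-to-new address translation can be sketched as a simple lookup table. This is a minimal illustration under assumptions: the table is a Python dict, addresses that were never moved translate to themselves, and the names `AddressTable` and `read` are hypothetical rather than taken from the disclosure.

```python
class AddressTable:
    """Toy translation table from pre-rearrangement to post-rearrangement addresses."""

    def __init__(self, remap):
        self.table = dict(remap)  # old address -> new address

    def translate(self, old_addr):
        # Assumed behavior: an address that was not moved is returned unchanged.
        return self.table.get(old_addr, old_addr)


def read(memory, table, old_addr):
    """Serve a request that still uses the old address."""
    return memory[table.translate(old_addr)]


if __name__ == "__main__":
    # Memory after rearrangement; before it, "a" lived at address 1 and "b" at 3.
    memory = ["a", "b", "c", "d"]
    table = AddressTable({1: 0, 3: 1, 0: 2, 2: 3})
    print(read(memory, table, 1))  # a
    print(read(memory, table, 3))  # b
```

The requester keeps issuing the addresses it already knows, and only the table changes when the data is moved.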

In the various examples described above, the AMC and the NPU are shown as separate components, but the AMC may also be configured to be included in the NPU.

FIG. 52 is an exemplary diagram illustrating an operation according to the sixth example shown in FIG. 51.

As can be seen with reference to FIG. 52, the memory address map may be set based on a table according to the ANN data locality (ANN DL). Data is cached in advance in the buffer memory of the AMC, sequentially, in the order given by the ANN DL information. So that the buffer memory does not fill up, the oldest data may be deleted first, based on the size of the buffer memory.

FIGS. 53A and 53B are exemplary diagrams illustrating examples of convolution.

Referring to FIG. 53A, a first layer for performing a convolution operation is shown. The input feature map may be 9x9x1 in size, the kernel containing the weights may be 3x3x1, the stride is 1, and the output feature map is shown as 7x7x1.

Reading the first input feature map from the main memory in the direction of the arrow allows the convolution operation to start sooner. The first input feature map may be read by first reading vertically over the height of the kernel and then proceeding in the horizontal direction.
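
The benefit of this read direction can be checked with a small sketch that counts how many elements must arrive before the first kernel window is complete. The sketch covers only the first band of rows and compares it with a plain row-by-row order; the function names are hypothetical, and how the remaining rows are read is not modeled.

```python
def band_order(width, kernel_h):
    """(row, col) order: read down `kernel_h` rows, then step one column right."""
    return [(r, c) for c in range(width) for r in range(kernel_h)]


def row_major_order(width, height):
    """(row, col) order: read each row fully before moving to the next row."""
    return [(r, c) for r in range(height) for c in range(width)]


def reads_until_first_window(order, kernel_h, kernel_w):
    """Number of reads after which the top-left kernel window is fully available."""
    needed = {(r, c) for r in range(kernel_h) for c in range(kernel_w)}
    for count, pos in enumerate(order, start=1):
        needed.discard(pos)
        if not needed:
            return count
    raise ValueError("order never covers the first window")


if __name__ == "__main__":
    print(reads_until_first_window(band_order(9, 3), 3, 3))       # 9
    print(reads_until_first_window(row_major_order(9, 9), 3, 3))  # 21
```

For a 9x9 input and a 3x3 kernel, the first window is complete after 9 reads in the vertical-then-horizontal order, compared with 21 reads when whole rows are read one after another.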

Referring to FIG. 53B, a second layer for performing a convolution operation is shown by way of example. The input feature map may be 7x7x1 in size, the kernel containing the weights may be 3x3x1, the stride is 1, and the output feature map is shown as 5x5x1.
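
The feature map sizes in FIGS. 53A and 53B are consistent with the usual output-size relation for a convolution without padding, output = (input - kernel) / stride + 1. The following sketch applies that relation; the absence of padding is an assumption, since the text does not mention padding, and the function name is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output width (or height) of a convolution with no padding."""
    if kernel_size > input_size:
        raise ValueError("kernel is larger than the input")
    return (input_size - kernel_size) // stride + 1


if __name__ == "__main__":
    print(conv_output_size(9, 3))  # 7, as in the first layer
    print(conv_output_size(7, 3))  # 5, as in the second layer
```

The output of the first layer (7x7x1) matches the input of the second layer, which is why the data flow between the two layers is fixed by the model structure.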

Reading the second input feature map from the main memory in the direction of the arrow allows the convolution operation to start sooner. The second input feature map may be read by first reading vertically over the height of the kernel and then proceeding in the horizontal direction.

FIG. 54 shows another example in which data in the main memory is cached in the buffer memory and an operation is then performed based on a tiling technique.

Referring to FIG. 54, the main memory and the buffer memory (cache memory) in the AMC are shown. The main memory and the buffer memory may be connected through a system bus. FIG. 54 is an example to which the tiling concept is applied. The tiling example is described below.

At least one of the kernel, the input feature map, and the output feature map stored in the main memory may be tiled. The memory map of the main memory may be tiled.

At least one of the kernel, the input feature map, and the output feature map stored in the buffer memory may be tiled. The memory map of the buffer memory may be tiled.

As illustrated, the input feature map for the first layer (Conv1) is assumed to be 18x18x1 in size, purely for convenience of description. This input feature map may be tiled into four input feature maps of size 9x9x1.

That is, the first input feature map for the first layer (Conv1) may be tiled into a first input feature map tile (IFMAP_1-1), a second input feature map tile (IFMAP_1-2), a third input feature map tile (IFMAP_1-3), and a fourth input feature map tile (IFMAP_1-4). The four input feature map tiles may be combined to form the first input feature map.

이때, 제1 레이어(Conv1)의 제1 커널(Kernel_1)은 재사용 될 수 있다. 따라서 각각의 타일의 합성곱에는 동일한 커널이 사용될 수 있다. 이러한 경우, 상기 제1 커널(Kernel_1)은 상기 4개의 타일링이 완료될 때 까지, NPU 내부 메모리에서 재사용 될 수 있다.In this case, the first kernel Kernel_1 of the first layer Conv1 may be reused. Therefore, the same kernel may be used for the convolution of each tile. In this case, the first kernel Kernel_1 may be reused in the NPU internal memory until the four tiling is completed.

즉, 제1 커널(Kernel_1)과 제1 입력 특징맵 타일(IFMAP_1-1)을 합성곱 하면 제1 출력 특징맵 타일(OFMAP_1-1)이 생성 된다. 제1 커널(Kernel_1)과 제2 입력 특징맵 타일(IFMAP_1-2)을 합성곱 하면 제2 출력 특징맵 타일(OFMAP_1-2)이 생성 된다. 제1 커널(Kernel_1)과 제3 입력 특징맵 타일(IFMAP_1-3)을 합성곱 하면 제3 출력 특징맵 타일(OFMAP_1-3)이 생성 된다. 제1 커널(Kernel_1)과 제4 입력 특징맵 타일(IFMAP_1-4)을 합성곱 하면 제4 출력 특징맵 타일(OFMAP_1-4)이 생성 된다. 상기 4개의 출력 특징맵 타일은 조합되어 제1 출력 특징맵이 될 수 있다.That is, when the first kernel Kernel_1 and the first input feature map tile IFMAP_1-1 are convolved, a first output feature map tile OFMAP_1-1 is generated. A second output feature map tile OFMAP_1-2 is generated by convolution of the first kernel Kernel_1 and the second input feature map tile IFMAP_1-2. When the first kernel Kernel_1 and the third input feature map tile IFMAP_1-3 are convolved, a third output feature map tile OFMAP_1-3 is generated. A fourth output feature map tile OFMAP_1-4 is generated by convolution of the first kernel Kernel_1 and the fourth input feature map tile IFMAP_1-4. The four output feature map tiles may be combined to form a first output feature map.
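
The tiling described above can be illustrated with a minimal sketch. The 18x18x1 input and the four 9x9x1 tiles come from the description; the 3x3 kernel values, the stride of 3, and the helper names `conv2d`, `split_into_tiles`, and `merge_tiles` are assumptions introduced only for illustration. The stride is chosen so that no tile needs border pixels from a neighbouring tile, which lets the combined tile outputs match the untiled result exactly.

```python
# Hypothetical sketch: kernel size 3 and stride 3 are assumptions chosen so
# that no tile needs border (halo) pixels from a neighbouring tile.

def conv2d(ifmap, kernel, stride):
    """Valid 2-D convolution (cross-correlation) on square lists of lists."""
    k = len(kernel)
    out_size = (len(ifmap) - k) // stride + 1
    return [
        [
            sum(
                ifmap[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def split_into_tiles(fmap, tile):
    """Split a square map into tile x tile blocks in row-major order."""
    n = len(fmap)
    return [
        [row[c:c + tile] for row in fmap[r:r + tile]]
        for r in range(0, n, tile)
        for c in range(0, n, tile)
    ]


def merge_tiles(tiles, per_row):
    """Inverse of split_into_tiles for equally sized square tiles."""
    merged = []
    for start in range(0, len(tiles), per_row):
        group = tiles[start:start + per_row]
        for rows in zip(*group):
            merged.append([v for row in rows for v in row])
    return merged


# 18x18x1 input feature map with arbitrary illustrative values.
ifmap_1 = [[(r * 18 + c) % 7 for c in range(18)] for r in range(18)]
# Kernel_1 is loaded once and reused for every tile.
kernel_1 = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

ifmap_tiles = split_into_tiles(ifmap_1, 9)                    # IFMAP_1-1 .. IFMAP_1-4
ofmap_tiles = [conv2d(t, kernel_1, 3) for t in ifmap_tiles]   # OFMAP_1-1 .. OFMAP_1-4
ofmap_1 = merge_tiles(ofmap_tiles, 2)                         # combined output feature map
```

With a stride smaller than the kernel size, each tile would additionally need overlapping border rows and columns; that case is outside this sketch.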

In this case, the memory map of the main memory may be set so that it can operate in a burst mode based on the tiled artificial neural network data locality information. That is, the artificial neural network data locality information may change depending on the tiling scheme. The tiling rule may be modified in various ways.

That is, the ANN data locality information includes the order of the data that the NPU will request from the main memory, including the order that results from tiling.

For example, the ANN data locality information may include the order of the first input feature map tile (IFMAP_1-1), the second input feature map tile (IFMAP_1-2), the third input feature map tile (IFMAP_1-3), and the fourth input feature map tile (IFMAP_1-4).

As another example, the ANN data locality information may include the order of the fourth input feature map tile (IFMAP_1-4), the third input feature map tile (IFMAP_1-3), the second input feature map tile (IFMAP_1-2), and the first input feature map tile (IFMAP_1-1).

That is, the buffer memory of the AMC may receive or generate the ANN data locality information, predict the order in which the NPU will request data, and sequentially cache the data corresponding to that order.
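
The caching behavior described above can be sketched roughly as follows. This is a toy model rather than the disclosed hardware: the class `PrefetchBuffer`, the dictionary standing in for the main memory, and the two-slot buffer capacity are all hypothetical.

```python
from collections import deque


class PrefetchBuffer:
    """Toy model of an AMC buffer memory driven by ANN data locality."""

    def __init__(self, main_memory, locality_order, capacity=2):
        self.main_memory = main_memory        # name -> data; stands in for DRAM
        self.pending = deque(locality_order)  # order the NPU is expected to request
        self.capacity = capacity              # assumed number of buffer slots
        self.cache = {}
        self.hits = 0
        self.misses = 0
        self._fill()

    def _fill(self):
        # Cache upcoming items, in locality order, until the buffer is full.
        while self.pending and len(self.cache) < self.capacity:
            name = self.pending.popleft()
            self.cache[name] = self.main_memory[name]

    def request(self, name):
        """Serve one NPU request, then prefetch the next expected item."""
        if name in self.cache:
            self.hits += 1
            data = self.cache.pop(name)
        else:
            self.misses += 1
            if name in self.pending:
                self.pending.remove(name)
            data = self.main_memory[name]
        self._fill()
        return data


main_memory = {f"IFMAP_1-{i}": f"tile {i} data" for i in range(1, 5)}
order = ["IFMAP_1-1", "IFMAP_1-2", "IFMAP_1-3", "IFMAP_1-4"]

buffer = PrefetchBuffer(main_memory, order)
for name in order:
    buffer.request(name)
```

When the NPU requests tiles in the same order as the locality information, every request in this model is served from the buffer; the reversed tile order works equally well as long as the locality information is reversed to match.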

FIG. 55 is a schematic diagram illustrating an artificial neural network memory system according to various examples of the present disclosure.

Referring to FIG. 55, the NPU and one or more internal memories are implemented in the form of a System on Chip (SoC). The internal memory may be SRAM. Accordingly, the NPU and the internal memory may be connected through an SRAM interface.

An AMC may be disposed between the SoC and the main memory. The AMC may be configured to predict, based on the ANN data locality (ANN DL) information, the data that the NPU will request, and to cache that data from the main memory in advance, before the NPU requests it, on the path between the main memory and the internal memory.

The internal memory may include a first internal memory for storing weights, a second internal memory for storing an input feature map, and a third internal memory for storing an output feature map. The three internal memories may be a plurality of logical regions allocated within one physical memory. For example, the size of the second internal memory may be 128 KB, and the size of the third internal memory may be 196 KB.

The NPU may include a PE array including PEs and a special function unit (SFU). The NPU reads the weights from the first internal memory, reads the input feature map from the second internal memory, performs a convolution operation on the input feature map and the weights, and outputs an output feature map. The SFU of the NPU then stores the output feature map in the third internal memory.

In addition, one or more external main memories exist outside the SoC and are electrically connected to the SoC. The external main memory may be a SAM or a DRAM. Accordingly, the one or more external main memories and the SoC may be connected through a DRAM interface.

The external main memory may include a first external main memory for storing weights and a second external main memory for storing feature maps. The two external main memories may be a plurality of regions allocated within one physical memory.

The SoC reads the weights in the first external main memory and the feature map in the second external main memory through read commands, and stores them in the first internal memory and the second internal memory, respectively. In addition, the SoC may store the output feature map held in the third internal memory in the second external main memory through a write command.

FIG. 56 shows the detailed operation configuration of the SFU shown in FIG. 55.

Each operation configuration of the SFU shown in FIG. 56 may be summarized in the following table.

               | Description | Operation
Zero point add | Offset addition per filter or tensor (dequantize offset operation) | Int add
Int2float      | Type casting |
Scale          | Scale multiplication per filter or tensor (dequantize offset operation) | Float mul
Bias add       | Bias value addition per filter | Float add
Batch          | Multiplication/addition with complex point values per filter | Float mul, Float add
Skip add       | Element-wise addition with the output preceding the block (skip connection add) | Float add
Activation     | Activation function |
SE mul         | Channel-wise multiplication of the previous output by the SE block output (multiplication with the SE module output) | Float mul
Avgpool        | Division by the feature dimension after accumulation | Float add, Float mul
Quantize       | Zero point addition and scale multiplication | Float add, Float mul
Float2int      | Type casting |
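
As a rough illustration of how several of the table's stages could chain together for a single accumulator value, consider the sketch below. It is not the disclosed SFU: the function name `sfu_pipeline`, its parameters, the use of ReLU as the activation function, and the add-then-multiply order inside the Quantize stage are all assumptions, and the Batch, Skip add, SE mul, and Avgpool stages are omitted for brevity.

```python
def sfu_pipeline(acc, zero_point, scale, bias, out_zero_point, out_scale):
    """Hypothetical per-value post-processing following the table's stage order."""
    x = acc + zero_point                    # Zero point add (Int add)
    x = float(x)                            # Int2float (type casting)
    x = x * scale                           # Scale (Float mul)
    x = x + bias                            # Bias add (Float add)
    x = max(x, 0.0)                         # Activation (ReLU is an assumption)
    x = (x + out_zero_point) * out_scale    # Quantize (Float add, then Float mul; order assumed)
    return int(round(x))                    # Float2int (type casting)


positive = sfu_pipeline(10, -2, 0.5, 1.0, 3.0, 2.0)
negative = sfu_pipeline(-10, -2, 0.5, 1.0, 3.0, 2.0)
```

The integer accumulator stays in the integer domain only for the first stage; every later stage operates on floating-point values until the final cast.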

FIG. 57 is a graph showing the measured bandwidth of the data bus between the buffer memory (cache) and the main memory.

The graph in FIG. 57 shows the result of measuring the bandwidth when the buffer memory (cache) and the main memory are connected through an AXI4 interface.

The bandwidth was measured while reading 2 Mbyte of data from the DRAM, which is the main memory, into the SRAM, which is the buffer memory, and the measurement was performed 10 times for each AXI burst length (1 to 16).

The graph shown in FIG. 57 is summarized in the following table.

Burst length                       | 1 burst   | 2 bursts  | 4 bursts | 8 bursts | 16 bursts
Linear address, Time (ns)          | 2,310,440 | 1,198,699 | 654,484  | 378,766  | 242,023
Linear address, Bandwidth (Gb/sec) | 6.93      | 13.35     | 24.45    | 42.24    | 66.11
Random address, Time (ns)          | 6,108,015 | 1,738,665 | 983,017  | 617,457  | 363,018
Random address, Bandwidth (Gb/sec) | 2.62      | 9.20      | 16.28    | 25.91    | 44.07

Regardless of the burst length, the transmission bandwidth, that is, the transfer rate, is higher when the addresses are linear.

For the same burst length, using linear addresses may give a faster transfer rate. It may be advantageous to allocate the addresses of the DRAM, which is the main memory, efficiently so that reads are performed as bursts.

The burst length refers to the length of data read at one time in a burst. In the linear case, even if the burst length is short, the DRAM addresses are contiguous, so the RAS delay and/or the CAS delay can be reduced.

That is, if the memory map of the main memory is set to be linear based on the ANN data locality information, the bandwidth increases compared with the random case. Accordingly, the effective bandwidth between the main memory and the buffer memory can be increased.
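
The bandwidth figures in the measurement table can be re-derived from the transfer times. The short check below assumes that "2 Mbyte" means 2,000,000 bytes, which is the interpretation that reproduces the tabulated values; the names `DATA_BITS`, `TIME_NS`, `bandwidth`, and `speedup` are illustrative only.

```python
# Assumption: "2 Mbyte" is taken as 2,000,000 bytes (16,000,000 bits).
DATA_BITS = 2_000_000 * 8

# Measured transfer times in nanoseconds, keyed by burst length.
TIME_NS = {
    "linear": {1: 2_310_440, 2: 1_198_699, 4: 654_484, 8: 378_766, 16: 242_023},
    "random": {1: 6_108_015, 2: 1_738_665, 4: 983_017, 8: 617_457, 16: 363_018},
}


def bandwidth_gbps(time_ns):
    """Bits per nanosecond is numerically equal to gigabits per second."""
    return DATA_BITS / time_ns


bandwidth = {
    mode: {burst: round(bandwidth_gbps(t), 2) for burst, t in times.items()}
    for mode, times in TIME_NS.items()
}

# Ratio of random-address time to linear-address time for each burst length.
speedup = {
    burst: TIME_NS["random"][burst] / TIME_NS["linear"][burst]
    for burst in TIME_NS["linear"]
}
```

Every entry of `speedup` is greater than 1, which matches the statement that linear addressing is faster at every measured burst length.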

<Brief summary of the disclosures of the present specification>

According to one disclosure of the present specification, a memory device for an artificial neural network is provided. The memory device may include: at least one memory cell array having N columns and M rows; and a memory controller configured to operate a data read or write operation of the at least one memory cell array in a sequential burst mode based on preset sequential access information.

The artificial neural network memory controller may be configured to store a memory map generated in a sequential manner based on preset operation order information and the data size corresponding to each step of the operation order.

The artificial neural network data locality may further include a signal identifying a weight, an input feature map, and an output feature map. The operation order pattern of the weight data, the input feature map data, and the output feature map data may be determined at compile time based on characteristics of the processor.
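
A memory map generated in a sequential manner from the operation order and the data size of each step, as summarized above, might look like the following sketch. The function `build_memory_map`, the record layout, and the byte sizes in the example are hypothetical and serve only to show contiguous address assignment.

```python
def build_memory_map(operation_order, base_address=0):
    """Assign contiguous address ranges in operation order.

    operation_order: list of (name, kind, size_in_bytes) tuples, where kind
    identifies the data as a weight, input feature map, or output feature map.
    """
    memory_map = []
    address = base_address
    for name, kind, size in operation_order:
        memory_map.append(
            {"name": name, "kind": kind, "start": address, "end": address + size - 1}
        )
        address += size  # the next item begins right after this one
    return memory_map


# Hypothetical order and sizes for one layer.
order = [
    ("Kernel_1", "weight", 9),
    ("IFMAP_1", "input feature map", 324),
    ("OFMAP_1", "output feature map", 256),
]
memory_map = build_memory_map(order)
```

Because each range starts where the previous one ends, reading the items in operation order walks the address space linearly, which is the condition under which burst reads are most effective.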

The features, structures, effects, and the like described in the above examples are included in at least one example of the present disclosure and are not necessarily limited to a single example. Furthermore, the features, structures, effects, and the like illustrated in each example may be combined or modified for other examples by those of ordinary skill in the art to which the examples belong. Accordingly, content related to such combinations and modifications should be construed as falling within the scope of the present disclosure.

In addition, although the description above has focused on examples, these are merely illustrative and do not limit the present invention. Those of ordinary skill in the art to which the present disclosure pertains will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the examples. For example, each component specifically shown in the examples may be modified and implemented. Differences related to such modifications and applications should be construed as falling within the scope of the present invention as defined in the appended claims.

Claims

A memory device comprising:
at least one memory cell array having N columns and M rows; and
a memory controller configured to operate a data read or write operation of the at least one memory cell array in a sequential burst mode based on preset sequential access information.

The memory device of claim 1,
wherein the at least one memory cell array further comprises a plurality of dynamic memory cells having a leakage current characteristic.

The memory device of claim 1, wherein the at least one memory cell array comprises:
a column decoder for controlling access to the N columns;
a plurality of bit lines connected to the column decoder;
a row decoder for controlling access to the M rows;
a plurality of word lines connected to the row decoder; and
a sense amplifier connected to one end of the plurality of bit lines.

The memory device of claim 1,
wherein the memory controller is configured to control, based on the sequential access information, data communication between a processor configured to process an artificial neural network operation and the at least one memory cell array in which data required for the artificial neural network operation is stored.

The memory device of claim 1,
wherein the sequential access information is generated based on the artificial neural network data locality of the artificial neural network to be processed by the processor.

The memory device of claim 1,
wherein the memory controller is configured to directly control the addresses of the N columns and the M rows of the at least one memory cell array so that the at least one memory cell array operates in the sequential burst mode based on the sequential access information.

The memory device of claim 1,
wherein the memory controller is configured to set, based on the sequential access information, the memory addresses of the data for each operation step to be stored in the at least one memory cell array.

The memory device of claim 1,
wherein the memory controller is configured to store artificial neural network data by sequentially allocating address values corresponding to the N columns and the M rows of the at least one memory cell array.

A memory device comprising:
at least one memory cell array; and
a memory controller configured to directly control a read or write operation of the at least one memory cell array based on artificial neural network data locality.

The memory device of claim 8,
wherein the artificial neural network data locality includes predetermined operation order information of the artificial neural network to be processed by the processor.

The memory device of claim 8,
wherein the artificial neural network data locality includes data size information for each step of the predetermined operation order.

The memory device of claim 8,
wherein the artificial neural network memory controller is configured to store a memory map generated in a sequential manner based on preset operation order information and the data size corresponding to each step of the operation order.

The memory device of claim 8,
wherein the artificial neural network data locality further includes a signal identifying a weight, an input feature map, and an output feature map, and
wherein an operation order pattern of the weight data, the input feature map data, and the output feature map data is determined at compile time based on a characteristic of a processor.

The memory device of claim 8,
wherein the artificial neural network data locality is determined based on at least one of a characteristic of an artificial neural network model, a characteristic of a processor, a size of a cache memory, and an operation algorithm policy.

A memory device comprising:
at least one dynamic memory cell array; and
a memory controller configured to store artificial neural network data in the at least one dynamic memory cell array in an order based on artificial neural network data locality.

The memory device of claim 15,
wherein the order based on the artificial neural network data locality includes at least a repeating pattern in the order of an input feature map, a kernel, and an output feature map.

The memory device of claim 15,
wherein the order based on the artificial neural network data locality includes at least a repeating pattern in the order of a kernel, an input feature map, and an output feature map.

The memory device of claim 15,
wherein the artificial neural network data locality is configured in units of the data access requests that the processor issues to the memory controller, and
wherein the artificial neural network data locality includes order information of all data access requests required for a single inference of the artificial neural network processed by the processor.

The memory device of claim 15,
wherein the memory controller is configured to divide the region of the at least one dynamic memory cell array into a kernel region and a feature map region based on identification information configured to distinguish the kernel, the input feature map, and the output feature map.

The memory device of claim 15,
wherein the at least one dynamic memory cell array further includes a plurality of banks configured to enable an interleaving operation, and
wherein the memory controller is configured to divide and store the artificial neural network data across the banks so as to operate in a burst mode corresponding to the interleaving operation in the plurality of banks.

The memory device of claim 15,
further comprising a processor configured to provide the artificial neural network data locality information to the memory controller.

The memory device of claim 15,
further comprising a processor configured to provide at least identification information of the input feature map, the kernel, and the output feature map to the memory controller.