KR20230135748A

KR20230135748A - Data storage device for determining a write address using a neural network

Info

Publication number: KR20230135748A
Application number: KR1020220033095A
Authority: KR
Inventors: 장준혁; 강승관; 오동석; 정명수
Original assignee: SK hynix Inc.; Korea Advanced Institute of Science and Technology (KAIST)
Priority date: 2022-03-17
Filing date: 2022-03-17
Publication date: 2023-09-26
Also published as: US11921628B2; US20230297500A1

Abstract

A data storage device according to the present technology includes one or more non-volatile memory devices including a plurality of unit storage spaces, and a recommendation circuit that recommends one of the plurality of unit storage spaces for processing a write request. The recommendation circuit recommends the unit storage space by applying, to a neural network circuit, feature data generated from request information, a target address corresponding to the write request, and the addresses of data stored in the plurality of unit storage spaces.

Description

A data storage device that determines the write address using a neural network {DATA STORAGE DEVICE FOR DETERMINING A WRITE ADDRESS USING A NEURAL NETWORK}

This technology relates to a data storage device that determines a write address using a neural network.

Solid state drives (SSDs) store data in semiconductor memory such as NAND flash memory.

On a single plane, NAND flash memory provides an input/output bandwidth of up to 320 MB/s for reads and up to 42 MB/s for writes.

In comparison, the NVMe interface adopted in SSDs provides a bandwidth of up to 7.9 GB/s.

To overcome this speed gap between the interface and the NAND flash memory devices, SSDs add a DRAM buffer or process data in parallel by distributing it across different channels, chips, dies, and planes.
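As a rough illustration of the gap, the sketch below computes how many planes would have to work fully in parallel for NAND bandwidth to match the interface, using only the figures quoted above. It assumes decimal units (7.9 GB/s = 7,900 MB/s) and ideal linear scaling, and it ignores channel-bus limits, controller overhead, and DRAM buffering; the function name is hypothetical.

```python
import math

# Figures quoted in the text; decimal units (1 GB/s = 1000 MB/s) are assumed.
PLANE_READ_MBPS = 320    # max single-plane read bandwidth
PLANE_WRITE_MBPS = 42    # max single-plane write bandwidth
NVME_MBPS = 7900         # max NVMe interface bandwidth (7.9 GB/s)


def planes_to_saturate(interface_mbps: float, plane_mbps: float) -> int:
    """Fewest planes that, under ideal parallel scaling, reach the interface bandwidth."""
    return math.ceil(interface_mbps / plane_mbps)


read_planes = planes_to_saturate(NVME_MBPS, PLANE_READ_MBPS)    # 7900 / 320 = 24.7 -> 25
write_planes = planes_to_saturate(NVME_MBPS, PLANE_WRITE_MBPS)  # 7900 / 42 = 188.1 -> 189
print(read_planes, write_planes)  # 25 189
```

Under these idealized assumptions, about 25 planes must serve reads concurrently to keep the interface busy, which is why read throughput depends so heavily on how data is spread across dies and planes.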

For example, during a write operation, the write latency can be hidden by storing the data in the DRAM buffer and later flushing the buffered data to the NAND flash memory.

During a read operation, however, the read latency cannot be hidden, because the data must first be read out of the NAND flash memory before it can be output externally through the DRAM buffer.

Read requests issued to data storage devices such as SSDs generally have random address patterns, which makes it difficult to improve read performance by prefetching data. Parallel read operations are therefore virtually the only way to improve read performance.

Conventional SSDs can perform parallel processing operations such as channel striping, which distributes input/output requests across multiple channels, and way pipelining, which processes newly arriving requests in one package while another package is still processing its input/output requests.

In addition, a single NAND flash memory package includes multiple dies that can operate independently, and each die includes multiple planes. A die is also referred to as a way.

Accordingly, parallel processing is also performed through die interleaving, which sends requests to multiple dies in turn, and multi-plane operation, which operates two or more planes simultaneously.

Conventionally, write addresses can be determined so as to exploit channel-, package-, die-, and plane-level parallelism, but because this does not consider future read request patterns, the parallelism of the data storage device is not fully utilized.

As a result, when multiple read requests target data stored in the same channel or the same die, they cannot be processed in parallel, and read performance is difficult to improve.

(Patent Document 1) KR 10-2021-0019576 A
(Patent Document 2) US 2020/0110536 A1

(Non-Patent Document 1) N. Elyasi, C. Choi, A. Sivasubramaniam, J. Yang and V. Balakrishnan, "Trimming the Tail for Deterministic Read Performance in SSDs," 2019 IEEE International Symposium on Workload Characterization (IISWC), 2019, pp. 49-58, doi: 10.1109/IISWC47752.2019.9042073.
(Non-Patent Document 2) N. Elyasi, M. Arjomand, A. Sivasubramaniam, M. T. Kandemir and C. R. Das, "Content Popularity-Based Selective Replication for Read Redirection in SSDs," 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2018, pp. 1-15, doi: 10.1109/MASCOTS.2018.00009.
(Non-Patent Document 3) J. Liang, Y. Xu, D. Sun and S. Wu, "Improving Read Performance of SSDs via Balanced Redirected Read," 2016 IEEE International Conference on Networking, Architecture and Storage (NAS), 2016, pp. 1-10, doi: 10.1109/NAS.2016.7549406.
(Non-Patent Document 4) U. Gupta et al., "The architectural implications of facebook's dnn-based personalized recommendation," in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020, pp. 488-501.
(Non-Patent Document 5) https://www.tensorflow.org/text/guide/word_embeddings

This technology relates to a data storage device that uses neural network technology to determine write addresses so that read requests issued in the future can be processed in parallel.

A data storage device according to an embodiment of the present invention includes one or more non-volatile memory devices including a plurality of unit storage spaces, and a recommendation circuit that recommends one of the plurality of unit storage spaces for processing a write request. The recommendation circuit recommends the unit storage space by applying, to a neural network circuit, feature data generated from request information, a target address corresponding to the write request, and the addresses of data stored in the plurality of unit storage spaces.

By using a neural network to determine write addresses in consideration of future read patterns, this technology makes maximal use of the parallelism inside the data storage device and thereby improves read performance.

Figure 1 is a block diagram showing a data storage device according to an embodiment of the present invention.
Figure 2 is a block diagram showing a recommendation circuit according to an embodiment of the present invention.
Figure 3 is a diagram showing an embedding table.
Figure 4 is an explanatory diagram showing a comparison vector update operation.
Figure 5 is an explanatory diagram showing a method of generating training data for a training operation.
Figure 6 is a graph showing the effect of the present invention.

Hereinafter, embodiments of the present invention are disclosed with reference to the attached drawings.

Figure 1 is a block diagram showing a data storage device 1000 according to an embodiment of the present invention.

Hereinafter, the present invention is described using, as an example of the data storage device 1000, a solid state drive (SSD) including NAND flash memory.

Accordingly, the data storage device 1000 may also be referred to as an SSD or SSD device.

The data storage device 1000 includes a host interface 10, an FTL 20, a DRAM 30, a transaction queue 40, channels 50, and flash memories 60.

The data storage device 1000 further includes a recommendation circuit 100.

The host interface 10 receives requests from a host according to, for example, the NVMe standard, and transmits the results of processing those requests to the host. Interface technology supporting the NVMe standard is itself conventional, so a detailed description is omitted.

The host interface 10 generates page-level write requests. If the write data spans more than one page, the host interface 10 may generate multiple page-level write requests and provide them individually to the FTL 20 and the recommendation circuit 100.

The multiple page-level write requests may be provided sequentially. Accordingly, the FTL 20 and the recommendation circuit 100 can operate in units of one logical page.
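A minimal sketch of this splitting step, assuming a page-aligned host write and a hypothetical 16 KiB logical page (the text does not give a page size). `PageWriteRequest` and `split_into_page_writes` are illustrative names, not part of the described device.

```python
from dataclasses import dataclass
from typing import List

PAGE_SIZE = 16 * 1024  # hypothetical logical page size in bytes; not specified in the text


@dataclass
class PageWriteRequest:
    lpn: int     # logical page number handled by the FTL and recommendation circuit
    offset: int  # byte offset of this page's data within the host write


def split_into_page_writes(start_lpn: int, length: int,
                           page_size: int = PAGE_SIZE) -> List[PageWriteRequest]:
    """Split one page-aligned host write of `length` bytes into sequential page-level requests."""
    num_pages = (length + page_size - 1) // page_size  # round up to whole pages
    return [PageWriteRequest(lpn=start_lpn + i, offset=i * page_size) for i in range(num_pages)]


reqs = split_into_page_writes(start_lpn=100, length=40 * 1024)
print([r.lpn for r in reqs])  # [100, 101, 102]
```

Each resulting request is then handled on its own, which is what lets the downstream logic reason about one logical page at a time.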

The FTL (Flash Translation Layer) 20 is a component commonly used in SSDs that controls operations such as address mapping, garbage collection, and wear leveling; a detailed description is omitted.

In this embodiment, the FTL 20 further includes a page allocation circuit 21.

In response to a write request, the page allocation circuit 21 allocates a page included in the recommended address provided by the recommendation circuit 100.

In this embodiment, the recommended address provided by the recommendation circuit 100 corresponds to a die address. Hereinafter, the die address may also be referred to as a die number or die ID. Because a die in flash memory may be referred to as a way, the die address may also be referred to as a way address.

That is, the page allocation circuit 21 provides a physical page included in the recommended die as the actual write address. Accordingly, the allocated page may be referred to as a write address, and the page allocation circuit 21 may be referred to as a write address allocation circuit 21.
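The sketch below illustrates the division of labour: the recommendation circuit picks only the die, and the allocator chooses the physical page inside it. It is a toy model under loud assumptions: each die is treated as a flat array of pages filled in order with a per-die write pointer, and blocks, garbage collection, and fallback when a die is full are not modelled. `PageAllocator` is a hypothetical name.

```python
from typing import Dict, Tuple


class PageAllocator:
    """Toy page allocation circuit: returns the next free physical page in the recommended die."""

    def __init__(self, num_dies: int, pages_per_die: int):
        self.pages_per_die = pages_per_die
        self.next_free = [0] * num_dies                # per-die write pointer (assumption)
        self.mapping: Dict[int, Tuple[int, int]] = {}  # LPN -> (die, physical page)

    def allocate(self, lpn: int, recommended_die: int) -> Tuple[int, int]:
        page = self.next_free[recommended_die]
        if page >= self.pages_per_die:
            raise RuntimeError(f"die {recommended_die} has no free pages")
        self.next_free[recommended_die] += 1
        self.mapping[lpn] = (recommended_die, page)    # address mapping update
        return recommended_die, page


alloc = PageAllocator(num_dies=8, pages_per_die=1024)
print(alloc.allocate(lpn=100, recommended_die=3))  # (3, 0)
print(alloc.allocate(lpn=101, recommended_die=3))  # (3, 1)
```

Swapping the die index for a channel or plane index gives the alternative granularities described next, with the same allocator shape.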

The recommended address provided in the present invention is not limited to a die address; in other embodiments, the address of another unit capable of parallel processing may be provided as the recommended address.

For example, when channel-level parallelism is targeted, a channel address can be provided as the recommended address, and the page allocation circuit 21 allocates a page address within the recommended channel.

In another example, parallelism can be targeted at the level of the plane, a sub-unit of the die. In this case, a plane address can be provided as the recommended address, and the page allocation circuit 21 can allocate a page address within the recommended plane.

Various other design changes are possible, and they will be readily apparent to those skilled in the art from the disclosure of the present invention.

The DRAM 30 is connected between the host and the transaction queue 40 and stores data for read or write requests.

The host interface 10 further includes a DMA control circuit 11, through which data stored in the host can be read into the DRAM 30, or data stored in the DRAM 30 can be provided to the host, using DMA.

Transferring data between a host and an SSD using DMA is itself conventional, so a detailed description is omitted.

The FTL 20 can perform an address mapping operation according to the allocated page.

The transaction queue 40 queues commands and data corresponding to read and write requests for the flash memories 60.

The operation of the transaction queue 40 is itself conventional, so a detailed description is omitted.

Each flash memory 60 is connected to a channel 50, through which it transmits and receives commands and data.

Although four channels 50 are shown in Figure 1, the number of channels 50 may vary depending on the embodiment.

In Figure 1, one flash memory 60 is connected to each channel 50, but depending on the embodiment, multiple flash memories 60 may be connected to one channel 50.

Each flash memory 60 internally includes a plurality of dies 70 that can operate in parallel.

Although Figure 1 shows two dies 70 in each flash memory 60, a flash memory 60 may include more dies 70 depending on the embodiment.

When the recommendation circuit 100 receives a page-level write request through the host interface 10, it determines a corresponding recommended address and provides it to the page allocation circuit 21. As described above, in this embodiment the recommended address corresponds to a die address.

In this embodiment, the recommendation circuit 100 uses neural network technology to determine the recommended address.

The recommendation circuit 100 may be implemented in hardware, software, or a combination of the two, and can perform a training operation for a neural network as well as an inference operation using the trained neural network.

The recommendation circuit 100 determines the recommended address so that internal parallelism can be maximally utilized when read requests are processed in the future.

This technology uses information about the target page to be allocated and information about the pages stored in each die to predict, for each die, the probability that a request will be issued to that die at around the same time as a request to the target page, and it allocates the target page in the die with the lowest probability.

The recommendation circuit 100 uses neural network technology to recommend the address of the die with the lowest such probability.

This embodiment applies recommender system technology, a type of neural network technology. A conventional recommender system compares an embedding vector representing user information with an embedding vector representing item information and outputs the probability that the user will click on the item.

The configuration and operation of conventional recommender systems are described in detail in Non-Patent Document 4, so a detailed description is omitted.
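For orientation, one of the simplest forms of the user/item comparison described above scores a pair by the dot product of the two embeddings passed through a sigmoid. Production recommenders such as those analysed in Non-Patent Document 4 add MLPs and richer feature interactions, so this sketch shows only the core idea; `click_probability` is a hypothetical name. In this document's setting, the target page loosely plays the role of the user and each die that of an item (our reading, not a statement from the text).

```python
import math
from typing import Sequence


def click_probability(user_vec: Sequence[float], item_vec: Sequence[float]) -> float:
    """Dot-product similarity of two embeddings, squashed into (0, 1) with a sigmoid."""
    logit = sum(u * v for u, v in zip(user_vec, item_vec))
    return 1.0 / (1.0 + math.exp(-logit))


p = click_probability([0.5, -1.0, 2.0], [1.0, 0.0, 0.5])  # dot product = 1.5
print(round(p, 3))
```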

Figure 2 is a block diagram showing the recommendation circuit 100.

The recommendation circuit 100 includes an embedding table 110, a comparison vector generation circuit 120, a concatenation circuit 130, a multilayer neural network circuit 140, and a decision circuit 150.

The decision circuit 150 determines the recommended address by referring to the per-die scores output from the multilayer neural network circuit 140.

For example, the score for each die may correspond to the probability that a request will be issued to that die at around the same time as a request to the target page. In this case, the address of the die with the lowest score can be determined as the recommended address.
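Under the interpretation above (score = probability of a concurrent request), the decision circuit reduces to an argmin over the per-die scores. A minimal sketch; `recommend_die` is a hypothetical name, and breaking ties toward the lower die index is an assumption the text does not address.

```python
from typing import Sequence


def recommend_die(die_scores: Sequence[float]) -> int:
    """Index of the die with the lowest predicted probability of a concurrent request.

    min() returns the first minimal element, so ties go to the lower die index.
    """
    return min(range(len(die_scores)), key=lambda d: die_scores[d])


print(recommend_die([0.62, 0.15, 0.40, 0.15]))  # 1
```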

The multilayer neural network circuit 140 receives a feature vector and outputs a score for each die. The multilayer neural network circuit 140 may take the form of a fully connected neural network.
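A forward-pass sketch of such a fully connected network in plain Python, producing one score in (0, 1) per die. The layer sizes, the ReLU hidden activation, and the sigmoid output are assumptions chosen for illustration; the weights are random and untrained, so the scores themselves are meaningless here.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[n_in][n_out], bias[n_out])


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer: y[j] = sum_i x[i] * w[i][j] + b[j]."""
    w, b = layer
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return w, [0.0] * n_out


def mlp_die_scores(feature: Sequence[float], layers: List[Layer]) -> List[float]:
    """ReLU on hidden layers, sigmoid on the output layer (one unit per die)."""
    h = list(feature)
    for layer in layers[:-1]:
        h = [max(0.0, t) for t in dense(h, layer)]
    return [1.0 / (1.0 + math.exp(-t)) for t in dense(h, layers[-1])]


rng = random.Random(0)
FEATURE_DIM, HIDDEN, NUM_DIES = 24, 16, 8  # hypothetical sizes
layers = [init_layer(FEATURE_DIM, HIDDEN, rng), init_layer(HIDDEN, NUM_DIES, rng)]
scores = mlp_die_scores([rng.random() for _ in range(FEATURE_DIM)], layers)
print(len(scores))  # 8
```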

The concatenation circuit 130 concatenates the request vector, the target vector, and the comparison vector to generate a single feature vector.

The request vector provided to the concatenation circuit 130 includes, as elements, whether the request is a sequential or random access and the size of the request.

It further includes the die address recommended for the previous request, so that the same die is not recommended for consecutive requests.
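A sketch of how the feature vector could be assembled. The text lists the request-vector contents but not their encodings, so the following are assumptions: the access pattern is a 0/1 flag, the size is a raw page count, the previously recommended die is one-hot encoded, and the comparison vectors of all dies are appended in die order so that one feature vector can yield one score per die. Function names are hypothetical.

```python
from typing import List, Sequence


def request_vector(is_sequential: bool, size_pages: int, prev_die: int, num_dies: int) -> List[float]:
    """Access-pattern flag, request size, and one-hot of the previously recommended die."""
    prev_one_hot = [1.0 if d == prev_die else 0.0 for d in range(num_dies)]
    return [1.0 if is_sequential else 0.0, float(size_pages)] + prev_one_hot


def feature_vector(req_vec: Sequence[float], target_vec: Sequence[float],
                   comparison_vecs: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate request vector, target vector and each die's comparison vector."""
    feature = list(req_vec) + list(target_vec)
    for cv in comparison_vecs:
        feature.extend(cv)
    return feature


req = request_vector(is_sequential=False, size_pages=4, prev_die=2, num_dies=4)
feat = feature_vector(req, target_vec=[0.1, 0.2], comparison_vecs=[[1.0, 0.0]] * 4)
print(len(req), len(feat))  # 6 16
```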

The embedding table 110 is a table with multiple rows; each row is accessed using a logical page address and stores the embedding vector corresponding to that logical page address.

Figure 3 shows the structure of the embedding table 110. For example, the embedding vector corresponding to logical page address (LPN) p is {e0, e1, ..., e8, e9}.
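A minimal embedding-table sketch with one row per LPN and ten elements per row, matching the {e0, ..., e9} example of Figure 3. The random initialisation is a placeholder for values that training would set, `EmbeddingTable` is a hypothetical name, and treating the target vector as the row of the target LPN is our assumption.

```python
import random
from typing import List


class EmbeddingTable:
    """One learnable embedding vector per logical page number (LPN)."""

    def __init__(self, num_lpns: int, dim: int = 10, seed: int = 0):
        rng = random.Random(seed)
        # Placeholder initial values; in the described device they are set by training.
        self.rows: List[List[float]] = [
            [rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(num_lpns)
        ]

    def lookup(self, lpn: int) -> List[float]:
        """Row access by logical page address."""
        return self.rows[lpn]


table = EmbeddingTable(num_lpns=1024)
target_vec = table.lookup(100)  # e.g. the vector for a write to LPN 100 (assumption)
print(len(target_vec))  # 10
```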

Embedding techniques that convert input data into vector form are well known, as in Non-Patent Document 5, so a detailed description of the embedding technique itself is omitted.

The contents of the embedding table 110 are determined through a neural network training operation, which is described in detail below.

In this embodiment, the target page information corresponds to the logical page address (LPN) requested to be written.

The comparison vector generation circuit 120 generates a plurality of comparison vectors, one for each of the plurality of dies.

Because there are multiple dies in this embodiment, a corresponding number of comparison vectors is generated.

For example, to generate the comparison vector for die 1, the comparison vector generation circuit 120 applies each logical page stored in die 1 to the embedding table 110 to obtain a plurality of embedding vectors, and then adds these embedding vectors element-wise.

In this way, a comparison vector is generated for each of the dies.
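The per-die computation just described is an element-wise sum of embedding rows. A small sketch with toy two-element embeddings and a hypothetical die-to-LPN map; `comparison_vector` is an illustrative name.

```python
from typing import Dict, Iterable, List, Sequence


def comparison_vector(lpns_in_die: Iterable[int], embeddings: Dict[int, Sequence[float]],
                      dim: int) -> List[float]:
    """Element-wise sum of the embedding vectors of every logical page stored in one die."""
    acc = [0.0] * dim
    for lpn in lpns_in_die:
        for k, value in enumerate(embeddings[lpn]):
            acc[k] += value
    return acc


emb = {7: [1.0, 2.0], 9: [0.5, -1.0], 12: [3.0, 0.0]}  # toy 2-element embeddings
die_contents = {0: [7, 9], 1: [12]}                     # die -> LPNs stored there
vecs = {die: comparison_vector(lpns, emb, dim=2) for die, lpns in die_contents.items()}
print(vecs)  # {0: [1.5, 1.0], 1: [3.0, 0.0]}
```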

A target vector may be generated from the logical address of each page-level write request as the request is input.

In contrast, the comparison vector generation circuit 120 can compute the comparison vectors in advance, before a write request is performed.

The comparison vector generation circuit 120 may additionally update the comparison vectors after a write request to a die is performed.

Figure 4 is a diagram explaining the comparison vector update operation.

Figure 4 shows a case where logical page p is invalidated on die A and written to die B.

Hereinafter, the embedding vector corresponding to logical page p is denoted Vp.

The existing comparison vector VA of die A minus the embedding vector Vp is set as the new comparison vector VA', that is, VA' = VA - Vp.

Likewise, the existing comparison vector VB of die B plus the embedding vector Vp is set as the new comparison vector VB', that is, VB' = VB + Vp.

Since die C is not affected by this write operation, its comparison vector after the write, VC', is the same as the existing comparison vector VC.
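The update of Figure 4 touches only the two affected dies and costs one vector subtraction plus one vector addition, instead of re-summing every page in each die. A sketch with toy values; `move_page` is a hypothetical name, and letting `src` be None for a page written for the first time is our assumption.

```python
from typing import Dict, List, Optional, Sequence


def move_page(comp: Dict[str, List[float]], vp: Sequence[float],
              src: Optional[str], dst: str) -> None:
    """Incrementally update per-die comparison vectors when page p moves from src to dst.

    V_src' = V_src - V_p and V_dst' = V_dst + V_p; every other die is left unchanged.
    """
    if src is not None:
        comp[src] = [a - b for a, b in zip(comp[src], vp)]
    comp[dst] = [a + b for a, b in zip(comp[dst], vp)]


vp = [0.5, 1.0]
comp = {"A": [2.0, 3.0], "B": [1.0, 0.0], "C": [4.0, 4.0]}
move_page(comp, vp, src="A", dst="B")
print(comp)  # {'A': [1.5, 2.0], 'B': [1.5, 1.0], 'C': [4.0, 4.0]}
```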

In burst operation, where input/output requests arrive at short time intervals, the inference operation of the recommendation circuit 100 may degrade the input/output performance of the SSD.

The target vector must be generated anew each time a request is input.

In contrast, a comparison vector does not change unless the logical pages stored in its die change, so performance degradation can be prevented by computing the comparison vectors in advance during periods when no host requests are being served.

When a request such as a write or erase is performed on a die, the comparison vector for that die must be recomputed. Even in this case, the computation time can be reduced by adding the embedding vectors of the affected pages to, or subtracting them from, the existing comparison vector, as shown in Figure 4.

Additionally, this technology can prevent performance degradation by hiding the inference operation within the SSD's input/output processing.

When the host sends an input/output request to the SSD 1000, the host interface 10 interprets the request transmitted according to the NVMe protocol to determine the request type, logical address, data length, and host memory address.

Then, using the host memory address, the DMA control circuit 11 reads data from the host and stores it in the DRAM 30, or writes data from the DRAM 30 to the host.

In this embodiment, the recommendation circuit 100 operates while the DMA control circuit 11 is performing the DMA operation that stores data in the DRAM 30. This hides the time taken by the recommendation circuit 100 and prevents SSD performance degradation.
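A software analogy of this overlap, using two threads whose sleeps stand in for the DMA transfer and the inference. The durations are hypothetical, the modulo stands in for the real model, and the real device overlaps hardware units rather than threads; the point is only that total latency approaches the longer of the two steps rather than their sum.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DMA_SECONDS = 0.20        # hypothetical host-to-DRAM transfer time
INFERENCE_SECONDS = 0.15  # hypothetical recommendation inference time


def dma_host_to_dram(data: bytes) -> bytes:
    """Stand-in for DMA control circuit 11 copying host data into the DRAM buffer."""
    time.sleep(DMA_SECONDS)
    return data


def recommend(lpn: int, num_dies: int = 8) -> int:
    """Stand-in for recommendation circuit 100; the modulo is a placeholder, not the model."""
    time.sleep(INFERENCE_SECONDS)
    return lpn % num_dies


def handle_write(lpn: int, data: bytes):
    """Start the DMA transfer and the inference together, then wait for both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        dma_future = pool.submit(dma_host_to_dram, data)
        die_future = pool.submit(recommend, lpn)
        return dma_future.result(), die_future.result()


start = time.perf_counter()
buffered, die = handle_write(lpn=42, data=b"\x00" * 4096)
elapsed = time.perf_counter() - start
# Overlapped: elapsed is close to max(0.20, 0.15) s, not the 0.35 s sum.
print(die, round(elapsed, 2))
```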

The embedding table 110 and the multilayer neural network circuit 140 of the recommendation circuit 100 each include variables whose values are adjusted through training.

In this embodiment, the embedding table 110 and the multilayer neural network circuit 140 are trained simultaneously, but in other embodiments they may be trained sequentially.

In this embodiment, the recommendation circuit 100 is trained using supervised learning.

In supervised learning, coefficients are adjusted using the output value produced for an input value and the ground-truth value corresponding to that input. Supervised learning itself is conventional, so a detailed description is omitted.

Supervised learning requires a data set containing the input values to the neural network and the corresponding ground-truth values.

In this embodiment, the data set can be prepared using trace data of the requests provided to the SSD 1000 over a certain period of time.

Figure 5(A) shows an example of trace data.

The trace data shown includes a write request W1 followed by five read requests R1 to R5. Figure 5 assumes that the fifth read request R5 and the write request W1 are requests for the same logical address.

When creating a data set from trace data, the ground-truth die address to be assigned to each write request must be determined.

Figure 5(B) illustrates a method of determining the ground-truth die address corresponding to a write request.

First, referring to the trace data, the die addresses corresponding to the first four read requests, R1 to R4, are marked.

Figure 5(B) assumes that read request R1 is assigned to die 1 of channel B, read request R2 to die 2 of channel B, read request R3 to die 2 of channel A, and read request R4 to die 1 of channel B.

In Figure 5(B), "bus" represents the time during which the data output by a read operation occupies the channel bus.

Referring to Figure 5(B), given the time at which read request R5 is issued, its processing time may overlap with those of read requests R2, R3, and R4. To improve read performance, it is therefore preferable to select a die that does not overlap with them.

Accordingly, to improve parallelism, it is preferable that read request R5 be processed on die 1 of channel A.

The ground-truth die address for write request W1 is therefore determined to be die 1 of channel A.

In this way, training data can be prepared from trace data.
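The labelling rule of Figure 5(B) can be sketched as: count, for every die, how many traced reads overlap in time with the later read of the written page, and pick the die with the fewest. The timings below are hypothetical values chosen only to reproduce the overlaps described for R1 to R5; the tie-breaking rule (first die listed) and the names `Read` and `label_die_for_write` are assumptions.

```python
from typing import Dict, List, NamedTuple, Tuple

Die = Tuple[str, int]  # (channel, die number)


class Read(NamedTuple):
    name: str
    die: Die
    start: float  # time the read starts occupying its die and channel bus
    end: float    # time it finishes


def label_die_for_write(reads: List[Read], window_start: float, window_end: float,
                        all_dies: List[Die]) -> Die:
    """Ground-truth die for a write: the die with the fewest traced reads overlapping the
    window in which the written page is later read back. Ties go to the first die listed."""
    overlaps: Dict[Die, int] = {d: 0 for d in all_dies}
    for r in reads:
        if r.start < window_end and r.end > window_start:  # intervals intersect
            overlaps[r.die] += 1
    return min(all_dies, key=lambda d: overlaps[d])


dies = [("A", 1), ("A", 2), ("B", 1), ("B", 2)]
reads = [
    Read("R1", ("B", 1), 0, 10),   # finishes before R5 arrives
    Read("R2", ("B", 2), 8, 18),
    Read("R3", ("A", 2), 12, 22),
    Read("R4", ("B", 1), 14, 24),
]
label = label_die_for_write(reads, window_start=16, window_end=26, all_dies=dies)  # R5's window
print(label)  # ('A', 1)
```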

In Figure 2, the recommendation circuit 100 may further include a learning control circuit 160.

The learning control circuit 160 may control the supervised learning process using training data stored in advance in a designated address area of the flash memory 60.

The learning control circuit 160 can store trace data for a certain period of time in the DRAM 30 or in another designated address area of the flash memory 60, generate training data from the trace data, and store or update the training data in the designated address area of the flash memory 60.

Accordingly, the supervised learning operation may be performed only during initialization of the data storage device 1000, at predetermined time intervals while the data storage device 1000 is in use, or during idle time of the data storage device 1000.
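The three options for when supervised learning runs (only at initialization, at a fixed interval, or while the device is idle) can be expressed as a small trigger policy. The sketch below is hypothetical: the `TrainingScheduler` class, its `interval` and `idle_threshold` parameters, and the use of timestamps in seconds are assumptions, since the text does not specify how the learning control circuit 160 measures time or detects idleness.

```python
from enum import Enum
from typing import Optional

class Mode(Enum):
    INIT_ONLY = "init_only"   # train once, at initialization
    PERIODIC = "periodic"     # retrain at a fixed interval
    IDLE = "idle"             # retrain when the host has been quiet

class TrainingScheduler:
    """Hypothetical trigger policy for the learning control circuit."""

    def __init__(self, mode: Mode, interval: float = 3600.0, idle_threshold: float = 5.0):
        self.mode = mode
        self.interval = interval              # seconds between periodic runs (assumed)
        self.idle_threshold = idle_threshold  # seconds without host I/O that count as idle (assumed)
        self.last_train: Optional[float] = None
        self.last_io: Optional[float] = None

    def on_host_io(self, now: float) -> None:
        # Host requests are also the trace data used by the next training run.
        self.last_io = now

    def should_train(self, now: float) -> bool:
        if self.last_train is None:
            return True  # every mode trains once during initialization
        if self.mode is Mode.PERIODIC:
            return now - self.last_train >= self.interval
        if self.mode is Mode.IDLE:
            # Only retrain if new trace data arrived since the last run.
            new_trace = self.last_io is not None and self.last_io > self.last_train
            return new_trace and now - self.last_io >= self.idle_threshold
        return False  # INIT_ONLY

    def mark_trained(self, now: float) -> None:
        self.last_train = now

sched = TrainingScheduler(Mode.IDLE, idle_threshold=5.0)
sched.mark_trained(0.0)          # initialization run
sched.on_host_io(10.0)
print(sched.should_train(12.0))  # False: host was active 2 s ago
print(sched.should_train(16.0))  # True: idle for 6 s with new trace data
```

The "new trace" check in idle mode keeps the device from retraining repeatedly during one long idle period when nothing has changed.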

Figure 6 is a graph showing the effect of the present technology.

The graph in Figure 6 shows the read latency obtained when write operations are performed according to the present technology, expressed as a ratio to the read latency obtained when write operations are performed according to the conventional technology.

In Figure 6, the horizontal axis represents the workload type and the vertical axis represents the read latency relative to the conventional technology.

As shown, applying the present technology reduced read latency for every workload, with an average reduction of 25.4 percent.

Thus, applying the present technology improves the read performance of the data storage device 1000.

The scope of rights of the present invention is not limited to the above disclosure. It should be interpreted based on the scope literally stated in the claims and the scope of equivalents thereof.

1000: Data storage device (SSD device) 10: Host interface
11: DMA control circuit 20: FTL
21: Page allocation circuit 30: DRAM
40: Transaction queue 50: Channel
60: Flash memory 70: Die
100: Recommendation circuit
110: Embedding table 120: Comparison vector generation circuit
130: Concatenation circuit 140: Multilayer neural network circuit
150: Decision circuit 160: Learning control circuit

Claims

1. A data storage device comprising:
one or more non-volatile memory devices including a plurality of unit storage spaces; and
a recommendation circuit that recommends one unit storage space among the plurality of unit storage spaces to process a write request,
wherein the recommendation circuit recommends the one unit storage space by applying, to a neural network circuit, feature data generated from request information, a target address corresponding to the write request, and addresses of data stored in the plurality of unit storage spaces.

2. The data storage device of claim 1, further comprising a write address allocation circuit that allocates a write address within the one unit storage space recommended by the recommendation circuit.

3. The data storage device of claim 1, wherein the recommendation circuit comprises:
an embedding table that stores embedding vectors corresponding to data addresses and generates a target vector from the target address;
a comparison vector generation circuit that generates a plurality of comparison vectors using information on the plurality of unit storage spaces and information in the embedding table;
a multilayer neural network circuit that generates scores for the plurality of unit storage spaces using the feature data corresponding to a request vector corresponding to the request information, the target vector, and the plurality of comparison vectors; and
a decision circuit that determines the unit storage space by referring to the scores.

4. The data storage device of claim 3, further comprising a concatenation circuit that generates the feature data by concatenating the request vector, the target vector, and the plurality of comparison vectors.
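The data flow of claims 3 and 4 (embedding lookup, comparison vectors, concatenation, multilayer network, decision) can be sketched in plain Python. Everything concrete here is an assumption for illustration, not the claimed circuit: the embedding dimension, hashing logical addresses into the table by modulo, a single ReLU hidden layer with random untrained weights, the two-element request vector encoding, and argmax as the decision rule. The comparison vector is computed from scratch as the sum of embeddings, following the definition in claim 5.

```python
import random
from typing import List, Sequence

Vector = List[float]

def concat(*vectors: Sequence[float]) -> Vector:
    """Concatenation step: join vectors end to end."""
    out: Vector = []
    for v in vectors:
        out.extend(v)
    return out

def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer; weights[j] holds the input weights of output j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]

class Recommender:
    """Hypothetical sketch of the recommendation circuit's inference path."""

    def __init__(self, num_units: int, dim: int, req_dim: int, hidden: int,
                 table_size: int = 1024, seed: int = 0):
        rng = random.Random(seed)
        self.num_units, self.dim, self.req_dim = num_units, dim, req_dim
        # Embedding table with one row per bucket; logical addresses map by modulo (assumed).
        self.table = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(table_size)]
        n_in = req_dim + dim + num_units * dim
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(num_units)]
        self.b2 = [0.0] * num_units

    def embed(self, lba: int) -> Vector:
        return self.table[lba % len(self.table)]

    def comparison_vector(self, lbas: Sequence[int]) -> Vector:
        """Sum of the embeddings of the logical addresses stored in one unit."""
        total = [0.0] * self.dim
        for lba in lbas:
            total = [t + e for t, e in zip(total, self.embed(lba))]
        return total

    def scores(self, request_vec: Vector, target_lba: int,
               unit_contents: Sequence[Sequence[int]]) -> Vector:
        if len(request_vec) != self.req_dim or len(unit_contents) != self.num_units:
            raise ValueError("request vector or unit count does not match the model")
        comparison = [self.comparison_vector(lbas) for lbas in unit_contents]
        features = concat(request_vec, self.embed(target_lba), *comparison)
        hidden = relu(dense(features, self.w1, self.b1))
        return dense(hidden, self.w2, self.b2)  # one score per unit storage space

    def recommend(self, request_vec: Vector, target_lba: int,
                  unit_contents: Sequence[Sequence[int]]) -> int:
        """Decision step: pick the unit with the highest score (argmax, assumed)."""
        s = self.scores(request_vec, target_lba, unit_contents)
        return max(range(len(s)), key=s.__getitem__)

model = Recommender(num_units=4, dim=8, req_dim=2, hidden=16)
units = [[10, 11], [42], [], [7, 8, 9]]  # logical addresses currently stored in each die
request = [1.0, 0.25]                    # hypothetical encoding of the request information
print(model.recommend(request, target_lba=43, unit_contents=units))
```

With untrained weights the chosen die is arbitrary; the point is the shape of the pipeline, where the network's input width grows with the number of unit storage spaces because one comparison vector per unit is concatenated.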

5. The data storage device of claim 3, wherein the target address is a logical address, and the comparison vector generation circuit generates the comparison vector corresponding to any one unit storage space by adding a plurality of embedding vectors corresponding to a plurality of logical addresses of a plurality of data stored in that unit storage space.

6. The data storage device of claim 5, wherein, when a logical address in a unit storage space is invalidated, the comparison vector generation circuit updates the comparison vector by subtracting the embedding vector corresponding to the invalidated logical address from the existing comparison vector, and, when a logical address is added to a unit storage space, updates the comparison vector by adding the embedding vector corresponding to the added logical address to the existing comparison vector.
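Because claim 5 defines a comparison vector as the sum of the embedding vectors of the logical addresses stored in a unit, the update rule of claim 6 amounts to maintaining a running sum: subtract on invalidation, add on write. A minimal sketch, assuming embeddings are plain Python lists supplied by the caller; the `ComparisonVector` class and its method names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

class ComparisonVector:
    """Running sum of the embeddings of the valid logical addresses in one unit (e.g. one die)."""

    def __init__(self, dim: int):
        self.value: Vector = [0.0] * dim
        self.members: Dict[int, Vector] = {}  # logical address -> embedding that was added

    def add(self, lba: int, embedding: Vector) -> None:
        # A logical address written to this unit contributes its embedding to the sum.
        if lba in self.members:
            raise ValueError(f"logical address {lba} is already valid in this unit")
        self.members[lba] = list(embedding)
        self.value = [v + e for v, e in zip(self.value, embedding)]

    def invalidate(self, lba: int) -> None:
        # Subtract exactly what was added; raises KeyError if lba is not valid here.
        embedding = self.members.pop(lba)
        self.value = [v - e for v, e in zip(self.value, embedding)]

cv = ComparisonVector(dim=2)
cv.add(10, [1.0, 2.0])
cv.add(11, [0.5, -1.0])
print(cv.value)    # [1.5, 1.0]
cv.invalidate(10)  # e.g. logical address 10 was overwritten elsewhere
print(cv.value)    # [0.5, -1.0]
```

Storing the embedding that was added keeps each update O(dim) and guarantees the subtraction cancels the earlier addition, at the cost of memory per valid address. An alternative is to look the embedding up again at invalidation time, but then any retraining of the embedding table would require rebuilding every sum from scratch.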

7. The data storage device of claim 3, wherein the recommendation circuit further includes a learning control circuit,
the one or more non-volatile memory devices store learning data at a predetermined address, and
the learning control circuit controls learning operations for the embedding table and the multilayer neural network circuit using the learning data.

8. The data storage device of claim 7, wherein the learning data is generated from trace data, which is a set of requests provided to the data storage device,
the learning data includes pairs of write request information and a true value of the unit storage space to be recommended for that write request, and
the true value is determined by considering parallel processing of a read request that, among a plurality of read requests provided after the write request in the trace data, has the same logical address as the write request.

9. The data storage device of claim 2, wherein the non-volatile memory device is a flash memory device,
the unit storage space corresponds to a die of the flash memory device, and
the write address is a page address within the recommended die.

10. The data storage device of claim 9, further comprising:
a host interface that receives and decodes a request provided from a host;
an FTL that performs address mapping with reference to the write address determined by the write address allocation circuit; and
a volatile memory device that stores data exchanged with the host.

11. The data storage device of claim 10, wherein the host interface generates write requests in units of pages according to the length of the data requested to be written.
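Claim 11 has the host interface turn one host write into page-unit write requests according to the data length. A minimal sketch, assuming byte-addressed offsets, a hypothetical 16 KiB page size, and logical page numbers obtained by integer division; real page sizes and address units depend on the device, and partial-page handling is left to later stages.

```python
from typing import List, Tuple

PAGE_SIZE = 16 * 1024  # bytes; hypothetical page size

def split_into_pages(offset: int, length: int,
                     page_size: int = PAGE_SIZE) -> List[Tuple[int, int, int]]:
    """Return (logical_page, byte_offset_in_page, byte_count) for each page the write touches."""
    requests: List[Tuple[int, int, int]] = []
    end = offset + length
    pos = offset
    while pos < end:
        page, in_page = divmod(pos, page_size)
        count = min(page_size - in_page, end - pos)  # stop at the page boundary or the data end
        requests.append((page, in_page, count))
        pos += count
    return requests

# An 8000-byte write starting at byte 1000, with 4 KiB pages, touches three pages.
print(split_into_pages(1000, 8000, page_size=4096))
# [(0, 1000, 3096), (1, 0, 4096), (2, 0, 808)]
```

Each tuple would become one page-unit write request, and each of those requests can then be given its own die recommendation.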

12. The data storage device of claim 10, wherein the information decoded by the host interface includes a memory address of the host, and
the host interface further includes a DMA control circuit that transfers data between the memory address of the host and the volatile memory device.

13. The data storage device of claim 12, wherein, when a write request is received, the recommendation circuit determines the unit storage space in which to perform the write operation while the DMA control circuit stores host data in the volatile memory device.