KR20230114178A

KR20230114178A - Low latency multiple storage device system

Info

Publication number: KR20230114178A
Application number: KR1020220164740A
Authority: KR
Inventors: 종왕 리; 마리 메이 응우옌; 박희권; 메흐란 엘야시; 레카 피추마니
Original assignee: 삼성전자주식회사
Priority date: 2022-01-24
Filing date: 2022-11-30
Publication date: 2023-08-01

Abstract

A system is disclosed. A storage device may store data. A load module may read the data from the storage device based at least in part on an input/output (I/O) request. A scheduler may place the I/O request in a queue for delivery to the load module based at least in part on a size of the I/O request being less than a threshold.

Description

Multiple storage device system with low latency {LOW LATENCY MULTIPLE STORAGE DEVICE SYSTEM}

The present disclosure relates generally to storage devices, and more particularly to a system including storage devices that enables low read latency.

Some applications may depend on large amounts of data. If this data is not already stored in memory, it may be read from storage devices. However, accessing the data from a single storage device may create a bottleneck that slows down both the data access and the computations that follow.

A need remains to provide low latency access to data.

One object of the present disclosure is to provide a system and method that achieve high queries per second (QPS) and low latency.

Embodiments of the present disclosure may include a system. The system may include a storage device for storing data and a load module for reading the data from the storage device. A scheduler may receive an input/output (I/O) request and deliver the I/O request to the load module based on the size of the I/O request.

A system according to embodiments of the present disclosure may include a computation scheduler. The computation scheduler may place a computation request in one of several queues based on the workload of the computation request. The computation scheduler may select a processing element and send the computation request to the selected processing element. The computation scheduler may operate based on computational intensity.

A system according to embodiments of the present disclosure includes a multi-processing system. The multi-processing system may load data from an I/O process/storage pool using an I/O request, and the I/O request may be based on the data to be processed by a computation request. The data may be retrieved using load modules associated with the storage devices of the I/O process/storage pool. With this configuration, the system may achieve high QPS and low latency.

The drawings described below are examples of how embodiments of the present disclosure may be implemented, and are not intended to limit the embodiments of the present disclosure. Individual embodiments of the present disclosure may include elements not shown in particular figures and/or may omit elements shown in particular figures. The drawings are provided for illustration and may not be to scale.
FIG. 1 shows a device configured to support low latency access to storage devices in processing computations, according to embodiments of the present disclosure.
FIG. 2 shows details of the device of FIG. 1, according to embodiments of the present disclosure.
FIG. 3 shows details of the multi-processing system of FIG. 1, according to embodiments of the present disclosure.
FIG. 4 shows details of an I/O request issued to the multi-processing system of FIG. 1, according to embodiments of the present disclosure.
FIG. 5 shows details of the table of FIG. 3, according to embodiments of the present disclosure.
FIG. 6 shows details of the I/O scheduler of FIG. 3, according to embodiments of the present disclosure.
FIG. 7 shows details of the load module of FIG. 3, according to embodiments of the present disclosure.
FIG. 8 shows details of the computation system of FIG. 1, according to embodiments of the present disclosure.
FIG. 9 shows a flowchart of an example procedure for processing the I/O request of FIG. 4 using the multi-processing system of FIG. 1, according to embodiments of the present disclosure.
FIG. 10 shows an alternative flowchart of an example procedure for processing the I/O request of FIG. 4 using the multi-processing system of FIG. 1, according to embodiments of the present disclosure.
FIG. 11 shows a flowchart of an example procedure for the I/O scheduler of FIG. 3 to use priority queuing in queuing the I/O request of FIG. 4, according to embodiments of the present disclosure.
FIG. 12 shows a flowchart of an example procedure for the manager of FIG. 3 to assign the I/O request of FIG. 4 to the load module of FIG. 3, according to embodiments of the present disclosure.
FIG. 13 shows a flowchart of an example procedure for the load module of FIG. 3 to read data from the storage device of FIG. 1, according to embodiments of the present disclosure.
FIG. 14 shows a flowchart of an example procedure for the computation system of FIG. 1 to process the computation request of FIG. 8, according to embodiments of the present disclosure.
FIGS. 15A and 15B show a flowchart of an example procedure for the computation scheduler of FIG. 8 to order the processing elements of FIG. 8 to process the computation request of FIG. 8, according to embodiments of the present disclosure.

Reference will now be made in detail to embodiments of the present disclosure, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth to enable a thorough understanding of the present disclosure. It should be understood, however, that a person of ordinary skill in the art may practice the present disclosure without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the present disclosure.

Although the terms first, second, etc. may be used herein to describe various elements, it will be understood that these elements should not be limited by these terms. These terms are used only to distinguish one element from another. For example, a first module could be termed a second module, and, similarly, a second module could be termed a first module, without departing from the scope of the present disclosure.

The terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the description of the present disclosure and the appended claims, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. The terms "comprises" and/or "comprising," when used in this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The components and features of the drawings are not necessarily drawn to scale.

Some applications, such as deep learning recommendation models (DLRMs), may rely on large amounts of data. DLRMs may rely on embedding tables that may be terabytes in size. Transferring large amounts of data from a storage device into memory for processing may take time. This problem may be exacerbated when multiple applications attempt to process data stored on the storage device, because the storage device may have bandwidth limits that are insufficient to serve all of the requested data.

Embodiments of the present disclosure address these problems by introducing a system of storage devices. The storage devices may be, for example, Solid State Drives (SSDs). Data may be distributed across the storage devices, which reduces the load on each individual storage device so that all of the requested data may be provided. A scheduler may schedule I/O requests into one or more queues based on the size of the data to be retrieved. An I/O process manager may retrieve requests from the queues and may identify load modules to retrieve the data from the storage devices.

FIG. 1 shows a device configured to support low latency access to storage devices in processing computations, according to embodiments of the present disclosure. In FIG. 1, device 105, which may also be referred to as a host or a system, may include processor 110, memory 115, and storage devices 120-1 and 120-2 (which may be referred to collectively as storage devices 120). Processor 110 may be any variety of processor. (Processor 110, along with the other components discussed below, is shown outside the device for ease of illustration; embodiments of the present disclosure may include these components within the device.) While FIG. 1 shows a single processor 110, device 105 may include any number of processors, each of which may be a single-core or multi-core processor, each of which may implement a Reduced Instruction Set Computer (RISC) architecture or a Complex Instruction Set Computer (CISC) architecture (among other possible architectures), and which may be mixed in any desired combination.

Processor 110 may be coupled to memory 115. Memory 115 may be any variety of memory, such as flash memory, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Persistent Random Access Memory (PRAM), Ferroelectric Random Access Memory (FRAM), or Non-Volatile Random Access Memory (NVRAM) such as Magnetoresistive Random Access Memory (MRAM). Memory 115 may be volatile or non-volatile memory, as desired. Memory 115 may also be any desired combination of different memory types, and may be managed by memory controller 125. Memory 115 may be used to store data that may be termed "short-term": that is, data not expected to be stored for extended periods of time. Examples of short-term data may include temporary files, data used locally by applications (which may have been copied from other storage locations), and the like.

Processor 110 and memory 115 may also support an operating system under which various applications may run. These applications may issue requests (which may also be termed commands) to read data from or write data to memory 115. When storage device 120 is used to support applications that read or write data through some sort of file system, storage device 120 may be accessed using device driver 130. While FIG. 1 shows two storage devices 120, device 105 may include any number (one or more) of storage devices. Each storage device 120 may support any desired protocol or protocols, including, for example, the Non-Volatile Memory Express (NVMe) protocol. Different storage devices 120 may support different protocols and/or interfaces.

While FIG. 1 uses the generic term "storage device", embodiments of the present disclosure may include any type of storage device that may benefit from the use of computational storage units, examples of which include hard disk drives and Solid State Drives (SSDs). Any reference to "SSD" below should be understood to include such other embodiments of the present disclosure. Different types of storage devices may also be mixed. For example, storage device 120-1 may be a hard disk drive, and storage device 120-2 may be an SSD.

Device 105 may also include multi-processing system 135 and computation system 140. Multi-processing system 135 may manage reading data from storage devices 120 based on input/output (I/O) requests received from applications running on processor 110 (or on processors of remote devices not shown in FIG. 1). The I/O requests may request data from storage devices 120 for use in computation processes. That is, given a computation request, the data to be processed by the computation request may first be requested from storage devices 120 in an I/O request processed by multi-processing system 135. Multi-processing system 135 may schedule the reading of data from storage devices 120 based on the size of the I/O request, in order to reduce latency (the time needed to complete the computation processes, including the time needed to read the data and to execute the appropriate instructions on the data). Multi-processing system 135 is discussed further with reference to FIG. 3 below.

Once the data has been read by multi-processing system 135, computation system 140 may execute a computation process to process the data. Computation system 140 is discussed further with reference to FIG. 8 below.

As noted above, device 105 may include multiple storage devices 120. By including more than one storage device 120, the data requested in an I/O request may be distributed across storage devices 120. With the data distributed across storage devices 120, read requests may be processed by each storage device 120. If these read requests are processed in parallel, the requested data may be read faster than if all of the data were stored on only one storage device 120. Embodiments of the present disclosure may nevertheless include a single storage device 120 (without the potential benefit of reading data from multiple storage devices 120 in parallel).
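
The benefit of distributing data across storage devices 120 can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the dictionaries standing in for the devices, the key names, and the functions `read_from_device` and `parallel_read` are all hypothetical, and a thread pool is only one possible way to issue the per-device reads concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for storage devices 120-1 and 120-2.
DEVICES = {
    "120-1": {"row0": b"aa", "row2": b"cc"},
    "120-2": {"row1": b"bb", "row3": b"dd"},
}


def read_from_device(device_id, keys):
    """Read the requested keys from one device (stand-in for a real device read)."""
    store = DEVICES[device_id]
    return {key: store[key] for key in keys}


def parallel_read(placement):
    """Issue one read per device concurrently and merge the results.

    `placement` maps a device id to the list of keys stored on that device.
    """
    result = {}
    with ThreadPoolExecutor(max_workers=len(placement)) as pool:
        futures = [
            pool.submit(read_from_device, device_id, keys)
            for device_id, keys in placement.items()
        ]
        for future in futures:
            result.update(future.result())
    return result


data = parallel_read({"120-1": ["row0", "row2"], "120-2": ["row1"]})
```

With a single device, the same function degenerates to one sequential read, which mirrors the remark that a one-device embodiment forgoes the potential parallel-read benefit.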

While FIG. 1 shows device 105 as including multi-processing system 135 and computation system 140, embodiments of the present disclosure may locate these components elsewhere. For example, multi-processing system 135 may be included as part of device 105, whereas computation system 140 may be part of another device reached over a network. Indeed, embodiments of the present disclosure may place storage devices 120, multi-processing system 135, and computation system 140 each in separate devices 105 connected via a network or other communication path.

FIG. 2 shows details of the device of FIG. 1, according to embodiments of the present disclosure. In FIG. 2, device 105 typically includes one or more processors 110, which may include memory controllers 125 and clocks 205 that may be used to coordinate the operations of the components of device 105. Processors 110 may also be coupled to memories 115, which may include, for example, random access memory (RAM), read-only memory (ROM), or other state-preserving media. Processors 110 may also be coupled to storage devices 120 and to network connector 210, which may be, for example, an Ethernet connector or a wireless connector. Processors 110 may also be connected to buses 215, to which may be attached user interfaces 220 and input/output (I/O) interface ports that may be managed using I/O engines, among other components.

Before describing the structure and operation of multi-processing system 135 of FIG. 1, it may be helpful to consider the kinds of requests that may be processed using device 105 of FIG. 1. An example application that may run on processor 110 of FIG. 1 is a deep learning recommendation model (DLRM), which is one example of a machine learning algorithm. The DLRM application may have established a service level agreement (SLA) that governs how long a query may take to process. That is, the DLRM may expect a particular query to take a certain amount of time. If the query takes longer than that, the DLRM may wait longer than expected before it can continue processing.

To execute the query, device 105 of FIG. 1 may need to retrieve the data in question and then perform a computation process on that data. Both retrieving the data and processing the computations may take some amount of time.

If the query is relatively small (for example, involving fewer than 256 data points), retrieving the data may be relatively fast, and the overall process of executing the query may satisfy the SLA. But if the query is relatively large (for example, involving 256 or more data points), retrieving the data may take long enough that the SLA is not satisfied. Because the DLRM may have queries of varying sizes, relatively large queries may be expected to occur, and retrieving the stored data may take longer than executing the computation process. Since it is undesirable for device 105 of FIG. 1 to fail to satisfy the SLA, it is desirable to reduce the time needed to retrieve data from storage devices 120 of FIG. 1: that is, to achieve low latency for data retrieval.

FIG. 3 shows details of the multi-processing system of FIG. 1, according to embodiments of the present disclosure. In FIG. 3, multi-processing system 135 may include input/output (I/O) scheduler 305, queues 310-1, 310-2, and 310-3 (which may be referred to collectively as queues 310), and I/O process/storage pool 315.

I/O scheduler 305 may receive I/O requests from applications running on processor 110 of FIG. 1. These requests may identify data to be read from storage devices 120 for use in a computation request (to be processed by computation system 140 of FIG. 1).

I/O scheduler 305 may determine the size of each I/O request: that is, the amount of data to be read from storage devices 120 for use in executing the computation process. Using the size of the I/O request, I/O scheduler 305 may select one of queues 310 in which to place the I/O request.

In FIG. 3, three queues 310-1 through 310-3 are shown. Each queue 310 may be used to store I/O requests of a different size range. For example, queue 310-1 may be used to store I/O requests that retrieve no more than 128 vectors (rows of an embedding table). Queue 310-2 may be used to store I/O requests that retrieve more than 128 vectors but no more than 512 vectors. Queue 310-3 may be used to store I/O requests that retrieve more than 512 vectors. In this manner, I/O requests may be grouped by the approximate amount of data to be retrieved, which may be indicative of the time needed to retrieve the data.
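
The size-based queue selection described above can be sketched in Python as follows. The thresholds of 128 and 512 vectors come from the example in the text; the names `select_queue`, `schedule`, and the tuple layout of a request are assumptions made only for illustration.

```python
from collections import deque

# Example thresholds from the text, measured in vectors (embedding-table rows).
SMALL_LIMIT = 128
MEDIUM_LIMIT = 512

# One FIFO queue per size range, keyed by the reference numerals of FIG. 3.
queues = {"310-1": deque(), "310-2": deque(), "310-3": deque()}


def select_queue(num_vectors):
    """Return the name of the queue that handles a request of this size."""
    if num_vectors <= SMALL_LIMIT:
        return "310-1"
    if num_vectors <= MEDIUM_LIMIT:
        return "310-2"
    return "310-3"


def schedule(request_id, num_vectors):
    """Place a request (hypothetical id and size) in the queue matching its size."""
    name = select_queue(num_vectors)
    queues[name].append((request_id, num_vectors))
    return name


schedule("r1", 64)    # lands in queue 310-1
schedule("r2", 512)   # boundary case: not greater than 512, so queue 310-2
schedule("r3", 2048)  # lands in queue 310-3
```

Keeping the thresholds as named constants makes it straightforward to vary the number of queues or the size ranges, which the text leaves open.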

While FIG. 3 shows three queues 310, embodiments of the present disclosure may include any number (one or more) of queues 310. For example, if priority queuing is used, I/O scheduler 305 may determine a priority for the I/O request based on the size of the data to be read from storage devices 120, and may associate a priority tag with the I/O request in queue 310 so that I/O process/storage pool 315 may determine the relative priority of the I/O request. For example, if the I/O request requests no more than 128 embedding vectors (rows of an embedding table), the priority tag may indicate that the request has priority 1. If the I/O request requests more than 128 vectors but no more than 512 vectors, the priority tag may indicate that the request has priority 2. If the I/O request requests more than 512 vectors, the priority tag may indicate that the request has priority 3. As with the number of queues 310, any number of different priorities may be used; the number of priorities discussed above is merely an example.

In some embodiments of the present disclosure, queues 310 may be first in, first out (FIFO) queues. In other embodiments of the present disclosure, other types of queues 310 may be used.

Even if there is only one queue 310, the selection of the queue may still technically be based on the size of the I/O request (even though all I/O requests may be placed in that queue). In addition, if there is only one queue, that queue may not be a FIFO queue. That is, I/O requests may be removed from the queue in an order different from the order in which they were added (for example, an I/O request with priority 1 that was added to the queue later than an I/O request with priority 2 may still be removed from the queue first and processed first).
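
A single non-FIFO queue with priority tags might look like the following Python sketch. It assumes a binary heap (the standard `heapq` module) as the underlying structure, which the text does not specify; the class name `TaggedQueue`, the function `priority_for`, and the request identifiers are hypothetical. A running counter breaks ties so that requests with equal priority still leave in arrival order.

```python
import heapq
import itertools


def priority_for(num_vectors):
    """Map a request size to the example priority tags 1, 2, and 3."""
    if num_vectors <= 128:
        return 1
    if num_vectors <= 512:
        return 2
    return 3


class TaggedQueue:
    """A single queue in which a lower priority tag is removed first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival order, used as a tie-breaker

    def put(self, request_id, num_vectors):
        tag = priority_for(num_vectors)
        heapq.heappush(self._heap, (tag, next(self._counter), request_id))
        return tag

    def get(self):
        tag, _, request_id = heapq.heappop(self._heap)
        return request_id, tag


q = TaggedQueue()
q.put("medium-first", 300)  # priority 2, added first
q.put("small-later", 64)    # priority 1, added later
first = q.get()             # the priority-1 request leaves first
second = q.get()
```

Note that strict priority ordering can starve large requests under a steady stream of small ones; whether and how that is mitigated is outside what this passage describes.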

Once I/O scheduler 305 has placed an I/O request in a queue 310, I/O process/storage pool 315 may retrieve the I/O request from queues 310. By using multiple queues 310 (or by using different priorities), I/O process/storage pool 315 may select which I/O request to process next. In this manner, I/O requests may be processed by I/O process/storage pool 315 in an order different from the order in which they were sent to multi-processing system 135 by the applications running on processor 110 of FIG. 1.

I/O process/storage pool 315 may retrieve I/O requests from queues 310 using any desired technique. For example, I/O process/storage pool 315 may use round-robin access, retrieving an I/O request from queue 310-1, then an I/O request from queue 310-2, then an I/O request from queue 310-3, then returning to queue 310-1, and so on. (Of course, if a queue 310 has no I/O requests, I/O process/storage pool 315 may skip that queue 310 and move on to the next queue 310 to retrieve an I/O request.)
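
Round-robin retrieval with empty-queue skipping can be sketched as a short Python generator. The function name `round_robin` and the request labels are hypothetical; the queues are modeled as `collections.deque` objects.

```python
from collections import deque


def round_robin(queues):
    """Yield requests by visiting the queues in turn, skipping empty ones."""
    while any(queues):  # an empty deque is falsy, so this stops when all are drained
        for q in queues:
            if q:
                yield q.popleft()


# Stand-ins for queues 310-1, 310-2 (empty), and 310-3.
pending = [deque(["a1", "a2"]), deque(), deque(["c1"])]
order = list(round_robin(pending))
```

In this example the empty middle queue is skipped on each pass, so the retrieval order is `a1`, `c1`, `a2`.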

I/O processing/storage pool 315 may include manager 320, which may be responsible for retrieving I/O requests from queues 310. Manager 320 may also be responsible for determining which storage device(s) 120 store the requested data (the data may be stored on a single storage device 120 or on multiple storage devices 120). Once manager 320 has determined which storage device(s) 120 store the requested data, manager 320 may dispatch the I/O request to load module(s) 325-1, 325-2, 325-3, and 325-4 (which may be referred to collectively as load module(s) 325) to read the data from storage device(s) 120.

In some embodiments of the present disclosure, each storage device 120 may be considered distinct from the other storage devices 120. That is, there may be no predetermined relationship among storage devices 120 that governs how storage devices 120 are used. For example, each storage device 120 may be considered not only a physically separate storage device but also a logically separate one. (This may be contrasted with, for example, a redundant array of independent disks (RAID), in which management of how data is stored is left to the RAID controller.) In other embodiments of the present disclosure, however, storage devices 120 may be configured as an array, such as a RAID.

To determine which storage device(s) 120 store the data requested in the I/O request, manager 320 may access table 330. Table 330 may function similarly to the flash translation layer (FTL) of an SSD. But instead of mapping a logical address (such as the logical block address of data used by an application) to a physical address (on a storage device), table 330 may map the logical address (or some other identifier of the data) to an identifier of the storage device(s) 120 that store the requested data. Table 330 may be stored in some form of storage (for example, volatile storage such as local DRAM, or non-volatile storage such as a firmware module or flash storage). The use of table 330 is discussed further with reference to FIG. 5 below.
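
A lookup of this kind can be pictured as a plain dictionary keyed by data identifier. The sketch below is only an illustration: the identifiers, the device names, and the helper `devices_for` are hypothetical and do not describe the actual layout of table 330.

```python
# Hypothetical stand-in for table 330:
# data identifier -> identifier(s) of the storage device(s) holding the data.
table = {
    "emb_table_users": ["dev-1"],
    "emb_table_items": ["dev-2"],
    "emb_table_ads": ["dev-1", "dev-2"],  # stored on both devices
}


def devices_for(identifier):
    """Return the identifiers of the storage devices holding the data."""
    try:
        return table[identifier]
    except KeyError:
        raise LookupError(f"no storage device recorded for {identifier!r}") from None


found = devices_for("emb_table_ads")
```

Unlike an FTL entry, each value names only a device, not a physical address within it.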

Once manager 320 has dispatched the I/O request to load module(s) 325, load module(s) 325 may access the requested data from storage device(s) 120. For example, if the data is stored on storage device 120-1, manager 320 may dispatch the I/O request to load modules 325-1 and/or 325-2; if the data is stored on storage device 120-2, manager 320 may dispatch the I/O request to load modules 325-3 and/or 325-4. Although FIG. 3 shows I/O processing/storage pool 315 as including two storage devices 120 and four load modules 325, embodiments of the present disclosure may include any number (one or more) of storage devices 120 and any number (one or more) of load modules 325 (although there should be at least one load module 325 for each storage device 120).

Note that in FIG. 3 each storage device 120 has two load modules 325 that may access data from that storage device. In some embodiments of the present disclosure, storage devices 120 may support multi-threaded access: that is, storage devices 120 may support reading data to satisfy multiple requests concurrently. For example, if storage device 120 includes multiple channels, each of which may be used independently of the others, one thread may request data stored along a first channel while another thread requests data stored along a second channel, and the two threads may operate concurrently. In embodiments of the present disclosure that include multiple load modules 325 that may access storage device 120 concurrently, but in which a given piece of data may be accessed by only one load module 325, table 330 may also include an identifier of the particular load module 325 to be used to retrieve the requested data. In some embodiments of the present disclosure, there may be only one load module 325 per storage device 120. Embodiments of the present disclosure may also include combinations of these possibilities. For example, one storage device 120 may support multiple threads and may be accessed by multiple load modules 325, while another storage device 120 may not support multiple threads and may therefore be accessed by only one load module 325.

In FIG. 3, load modules 325 may be sparse length sum (SLS) load modules 325. SLS load modules 325 are discussed further with reference to FIG. 7 below.

Load modules 325 may use, for example, a user-space non-volatile memory express (UNVMe) driver to access storage devices 120. Whereas drivers used by applications to access storage devices 120 may rely on a file system, a UNVMe driver may access data directly from storage devices 120 without using a file system. Load modules 325 may also use various application programming interfaces (APIs) offered by storage devices 120 to access data.

As discussed above with reference to FIG. 1, storage devices 120 may be any desired varieties of storage devices, such as hard disk drives and SSDs. Variations on these varieties may also be used. For example, storage devices 120 may be SSDs optimized to store and retrieve data to satisfy DLRM queries. Such SSDs may have an architecture different from that of an SSD intended for general-purpose use.

In some embodiments of the present disclosure, the data requested in an I/O request may be stored on a particular storage device 120. As a result, multiple I/O requests may be sent to the same storage device 120. When accessing data from a particular storage device 120, load modules 325 may use submission queues to manage the multiple requests directed to that storage device 120. Load modules 325 may also consider the size of the request and the availability of submission queues in order to balance the load across storage devices 120.

The discussion above does not address how data comes to be stored on storage devices 120. In some embodiments of the present disclosure, storage devices 120 may be pre-loaded with data, and table 330 may be prepared in advance. In other embodiments of the present disclosure, applications may request that data be written to storage devices 120, and manager 320 may select which storage device(s) 120 to write the data to (with table 330 updated accordingly).

Ideally, data would be accessed roughly equally across all storage devices 120 in I/O processing/storage pool 315 (a property that may be described as load balancing). But depending on how data is stored on storage devices 120 and on which applications request data from storage devices 120, the loads on storage devices 120 may not be balanced. For example, storage devices 120-1 and 120-2 may store the same amount of data, yet 80% of I/O requests may request data stored on storage device 120-1 (and only 20% may request data stored on storage device 120-2). In such a situation, the unbalanced load may result in higher-than-desired latency when accessing data from storage device 120-1.

To address such situations, I/O processing/storage pool 315 may include a migration module (not shown in FIG. 3). The migration module may manage moving data between storage devices 120 to achieve a desired balance. For example, some data may be migrated from storage device 120-1 to storage device 120-2 to balance how much data is requested from each storage device.

There are other reasons why data might be moved between storage devices 120. While read load balance may be an important goal, keeping the capacities of the storage devices roughly balanced may also be important (for example, to support writing new data from other applications, which may be distributed across storage devices 120). Alternatively, some data may be accessed frequently enough, or be considered important enough, to justify storing it on multiple storage devices 120 to provide redundancy.

Regardless of why data is migrated (for example, storage capacity balancing, read load balancing, or redundancy), the migration module may update table 330 to reflect the change. That is, if data is migrated from storage device 120-1 to storage device 120-2, table 330 may be updated to reflect the migration of that data.
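
The table update that accompanies a migration can be sketched as below. The dictionary layout, the data identifiers, and the function `migrate` are assumptions made for illustration; the copying of the data itself between devices is outside this sketch.

```python
def migrate(table, identifier, source, destination):
    """Record in the table that `identifier` moved from `source` to `destination`."""
    devices = table[identifier]
    # list.index raises ValueError if the data is not recorded on `source`.
    devices[devices.index(source)] = destination


# Hypothetical table contents: both data sets start on storage device 120-1.
table = {"emb_a": ["120-1"], "emb_b": ["120-1"]}
migrate(table, "emb_a", "120-1", "120-2")
```

After the call, later lookups for `emb_a` are directed to storage device 120-2, while `emb_b` is unaffected.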

FIG. 4 shows details of an input/output (I/O) request issued to multiprocessing system 135 of FIG. 1, according to embodiments of the present disclosure. In FIG. 4, I/O request 405 is shown. I/O request 405 is shown as including identifier 410 and vectors 415-1 through 415-5 (which may be referred to collectively as vectors 415). Each vector 415 may include, for example, 64 data points, although embodiments of the present disclosure may include any number of data points per vector. Identifier 410 may be an identifier of the data requested from multiprocessing system 135 of FIG. 1. For example, identifier 410 may be a logical address used by an application running on processor 110 of FIG. 1, although embodiments of the present disclosure may use any desired identifier of the data.

Vectors 415 may identify particular vectors within the data of interest. As discussed above, DLRM queries may use data (embedding vectors) from embedding tables that are large (up to hundreds of gigabytes or terabytes in size). But a query may depend on only specific data in the table, and reading the entire table may take a long time relative to the amount of data actually needed. Instead, I/O request 405 may include vectors 415, which identify the particular vectors of interest in the embedding table; all other vectors may be ignored. Although FIG. 4 shows five vectors 415 in I/O request 405, embodiments of the present disclosure may include any number of vectors.

In addition, when I/O request 405 includes vectors 415, I/O scheduler 305 of FIG. 3 may determine the size of the data to be read. For example, I/O scheduler 305 of FIG. 3 may determine the number of bytes to be read by multiplying the number of vectors 415 in I/O request 405 by the number of data points in each vector 415 and by the size of each data point. If, for example, I/O request 405 identifies 64 vectors 415, and each vector 415 includes 128 data points that each require 4 bytes, the size of the data to be read by I/O request 405 may be determined as 64 (the number of vectors 415 in I/O request 405) × 128 (the number of data points in each vector 415) × 4 (the number of bytes per data point) = 32,768 B.
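
The size calculation above amounts to a three-factor product. The short sketch below reproduces the 32,768 B figure from the example, including the 4 bytes per data point; the function and parameter names are illustrative only.

```python
def request_size_bytes(num_vectors, points_per_vector, bytes_per_point):
    """Bytes to read = vectors x data points per vector x bytes per data point."""
    return num_vectors * points_per_vector * bytes_per_point


# 64 vectors, 128 data points per vector, 4 bytes per data point.
size = request_size_bytes(num_vectors=64, points_per_vector=128, bytes_per_point=4)
# size == 32768
```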

Although FIG. 4 shows I/O request 405 as including only identifier 410 and vectors 415, embodiments of the present disclosure may include other data or may omit some of the data shown. For example, instead of including vectors 415, I/O request 405 may include an offset from a logical address and a number of bytes to read (in which case the number of bytes to read may be used to determine the size of I/O request 405). (In FIG. 4, the logical address serves as identifier 410, but embodiments of the present disclosure may distinguish between identifier 410 and the logical address of the data, in which case the logical address may be a separate item of data included in I/O request 405.) Alternatively, I/O request 405 may include various tags not shown in FIG. 4. Embodiments of the present disclosure may include any such variations on I/O request 405.

FIG. 5 shows details of table 330 of FIG. 3, according to embodiments of the present disclosure. In FIG. 5, table 330 is shown as including three entries. One entry maps identifier 410-1 of a first item of data to identifier 505-1 of the storage device that stores that data. Another entry maps identifier 410-2 of a second item of data to identifier 505-2 of the storage device that stores that data. A third entry maps identifier 410-3 of a third item of data to identifier 505-3 of the storage device that stores that data. Although FIG. 5 shows table 330 as including three entries, embodiments of the present disclosure may include any number (zero or more) of entries.

Identifiers 505-1, 505-2, and 505-3 (which may be referred to collectively as identifiers 505) are shown as identifying particular storage devices by number. Table 330 need not store the actual physical address of the data, because that address may be maintained by the storage device itself. Identifiers 505 may be replaced with any other information that uniquely identifies a storage device: for example, an identifier assigned to storage devices 120 of FIG. 1 during discovery and/or enumeration, or a serial number of storage devices 120 of FIG. 1, among other possibilities.

Although FIG. 5 suggests that each identifier 410 is associated with a single unique identifier 505, embodiments of the present disclosure may map an identifier 410 to one or more identifiers 505. If the data associated with identifier 410-1 is stored on multiple storage devices (for example, to provide redundancy), table 330 may reflect this fact. When the data in question is stored on multiple storage devices 120 of FIG. 1, manager 320 of FIG. 3 may have options as to which load module 325 of FIG. 3 is assigned I/O request 405 of FIG. 4. Such options may be useful, for example, in balancing the loads on storage devices 120 of FIG. 1 in I/O processing/storage pool 315 of FIG. 3. For example, if storage device 120-1 of FIG. 1 has a relatively large number of I/O requests 405 of FIG. 4 awaiting processing and the data is available from both storage devices 120-1 and 120-2 of FIG. 1, manager 320 of FIG. 3 may select load module 325-3 or 325-4 of FIG. 3 so that I/O request 405 is served from storage device 120-2 of FIG. 1.
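
One simple way to use such replicas for load balancing is to send the request to the device with the fewest requests waiting. The sketch below assumes a per-device count of pending requests is available; the function `pick_device`, the counts, and the "fewest pending" policy are illustrative assumptions rather than the selection rule of the disclosure.

```python
def pick_device(replicas, pending):
    """Choose the replica whose storage device has the fewest pending requests."""
    return min(replicas, key=lambda device: pending.get(device, 0))


# Hypothetical counts of I/O requests awaiting processing on each device.
pending = {"120-1": 12, "120-2": 3}
chosen = pick_device(["120-1", "120-2"], pending)
```

Here the request would be routed to a load module attached to storage device 120-2, since storage device 120-1 has the longer backlog.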

In some embodiments of the present disclosure, the data requested in I/O request 405 of FIG. 4 may be stored on a single storage device 120 of FIG. 1. In such embodiments of the present disclosure, identifier 505 may uniquely identify where the data represented by identifier 410 is located. In other embodiments of the present disclosure, however, the data may be distributed across multiple storage devices 120 of FIG. 1. In such embodiments of the present disclosure, manager 320 of FIG. 3 may determine that the data requested in I/O request 405 of FIG. 4 is distributed across multiple storage devices 120 of FIG. 1, may split I/O request 405 of FIG. 4 into multiple separate I/O requests, and may send each of them to a different load module 325 of FIG. 3. Because table 330 may identify which storage device 120 of FIG. 1 stores which data, table 330 may include multiple entries for different portions of the data.

As an example, consider again I/O request 405 of FIG. 4, which requests data from five vectors 415 of FIG. 4. Manager 320 of FIG. 3 may determine that vectors 415-1 and 415-4 of FIG. 4 are stored on storage device 120-1 of FIG. 1, and that vectors 415-2, 415-3, and 415-5 of FIG. 4 are stored on storage device 120-2 of FIG. 1. In that case, manager 320 of FIG. 3 may send one I/O request to load module 325-1 or 325-2 of FIG. 3 to read vectors 415-1 and 415-4 of FIG. 4, and another I/O request to load module 325-3 or 325-4 of FIG. 3 to read vectors 415-2, 415-3, and 415-5 of FIG. 4.
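
The split in this example can be sketched by grouping the requested vectors by the device that holds them. The mapping `vector_location` and the function `split_by_device` are hypothetical and simply mirror the placement given in the example.

```python
from collections import defaultdict

# Hypothetical placement of the five vectors, mirroring the example above.
vector_location = {
    "415-1": "120-1", "415-4": "120-1",
    "415-2": "120-2", "415-3": "120-2", "415-5": "120-2",
}


def split_by_device(vectors, location):
    """Group the requested vectors into one sub-request per storage device."""
    sub_requests = defaultdict(list)
    for vector in vectors:
        sub_requests[location[vector]].append(vector)
    return dict(sub_requests)


parts = split_by_device(["415-1", "415-2", "415-3", "415-4", "415-5"], vector_location)
```

Each resulting group could then be dispatched to a load module attached to the corresponding storage device.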

Recall also that multiple load modules 325 of FIG. 3 may access the same storage device 120 of FIG. 1. In that case, even if the data to be read is stored only on, for example, storage device 120-1 of FIG. 1, manager 320 of FIG. 3 may send two I/O requests: one to load module 325-1 of FIG. 3 and the other to load module 325-2 of FIG. 3. In this manner, manager 320 of FIG. 3 may expedite reading the data from storage device 120-1 of FIG. 1, even though a single load module 325 of FIG. 3 could handle reading all of the data from storage device 120-1 of FIG. 1.

FIG. 6 shows details of I/O scheduler 305 of FIG. 3, according to embodiments of the present disclosure. In FIG. 6, I/O scheduler 305 may include size calculator 605, threshold 610, comparator 615, and queue selector 620. As discussed above with reference to FIGS. 3 and 4, I/O scheduler 305 may use the size of I/O request 405 of FIG. 4 to select the queue 310 of FIG. 3 in which I/O request 405 of FIG. 4 is to be placed. Size calculator 605 may determine the size of I/O request 405 of FIG. 4. Size calculator 605 may compute the size using, among other data, the number of bytes to be read according to I/O request 405 of FIG. 4, or the number of vectors 415 of FIG. 4 combined with the number of data points in each vector 415 of FIG. 4 and the number of bytes per data point.

Once size calculator 605 has determined the size of I/O request 405 of FIG. 4, comparator 615 may compare the size of I/O request 405 of FIG. 4 with threshold 610. Threshold 610 may be any desired threshold against which the size of I/O request 405 of FIG. 4 can be compared, and queue selector 620 may use the result of the comparison to select the queue 310 of FIG. 3 in which to place I/O request 405 of FIG. 4. If comparator 615 indicates that the size of I/O request 405 of FIG. 4, as determined by size calculator 605, is less than threshold 610, queue selector 620 may select one queue 310 of FIG. 3 in which to place I/O request 405 of FIG. 4. Otherwise, queue selector 620 may select another queue 310 of FIG. 3 in which to place I/O request 405 of FIG. 4.

Although FIG. 6 shows a single threshold 610, embodiments of the present disclosure may include any number of thresholds 610. Comparator 615 may compare the size of I/O request 405 of FIG. 4 with each threshold 610 until the largest threshold 610 that is smaller than the size of I/O request 405 of FIG. 4 is identified (or, alternatively, until the smallest threshold 610 that is larger than the size of I/O request 405 of FIG. 4 is identified). Queue selector 620 may then use this information to select the queue 310 of FIG. 3 in which to place I/O request 405 of FIG. 4.

For example, as discussed above with reference to FIG. 3, queue 310-1 of FIG. 3 may be used to store I/O requests to retrieve data no larger than, for example, 128 embedding vectors. Queue 310-2 of FIG. 3 may be used to store I/O requests to retrieve data larger than, for example, 128 embedding vectors but no larger than, for example, 512 embedding vectors. Queue 310-3 of FIG. 3 may be used to store I/O requests to retrieve data larger than, for example, 512 embedding vectors. In this example, two thresholds 610 may be used: one at 128 embedding vectors and the other at 512 embedding vectors. Thus, in some embodiments of the present disclosure, the number of queues 310 of FIG. 3 may be one more than the number of thresholds 610 (thresholds 610 act as the dividing lines between adjacent queues 310 of FIG. 3).
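
With the thresholds kept in sorted order, the queue for a request can be found with a binary search. The sketch below uses the 128 and 512 embedding-vector thresholds from the example; the names `THRESHOLDS`, `QUEUE_NAMES`, and `select_queue` are illustrative, and placing a request of exactly 512 vectors in the middle queue is an assumption of this sketch.

```python
from bisect import bisect_left

THRESHOLDS = [128, 512]                     # in embedding vectors, sorted ascending
QUEUE_NAMES = ["310-1", "310-2", "310-3"]   # one more queue than thresholds


def select_queue(num_vectors):
    """Return the queue for a request of the given size."""
    # bisect_left counts the thresholds strictly below the request size, so a
    # request exactly at a threshold stays in the lower queue.
    return QUEUE_NAMES[bisect_left(THRESHOLDS, num_vectors)]


choices = [select_queue(n) for n in (5, 128, 129, 512, 513)]
```

Adding a threshold only requires extending both lists; the lookup itself is unchanged.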

FIG. 7 shows details of load module 325 of FIG. 3, according to embodiments of the present disclosure. As discussed above with reference to FIG. 4, I/O request 405 of FIG. 4 may identify particular vectors 415 of FIG. 4 to be read from an embedding table. Because the number of vectors 415 of FIG. 4 in I/O request 405 of FIG. 4 may be small relative to the number of vectors in the data, most of the data may be ignored. The data on storage devices 120 of FIG. 1 may therefore be considered "sparse", in the sense that most of the values may be ignored for the purpose of satisfying I/O request 405 of FIG. 4.

When load module 325 is a sparse length sum (SLS) load module, load module 325 may read the particular vectors 415 of FIG. 4 identified in I/O request 405 of FIG. 4 and add them together to produce a single vector. The sum of the identified vectors 415 of FIG. 4 may then be returned as the data requested by I/O request 405 of FIG. 4.

To operate as SLS load module 325, load module 325 may include reader 705 and adder 710. Reader 705 may read, from storage device 120 of FIG. 1, the particular vectors 415 of FIG. 4 identified in I/O request 405 of FIG. 4. Note that in some embodiments of the present disclosure, reader 705 may access the data on storage device 120 of FIG. 1 directly. In other embodiments of the present disclosure, reader 705 may issue appropriate commands to storage device 120 of FIG. 1, and storage device 120 may return the data to load module 325. Adder 710 may then add the vectors retrieved from storage device 120 of FIG. 1 to produce the data returned in response to I/O request 405 of FIG. 4.
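
The read-then-add behavior can be sketched in a few lines. This is a toy illustration in which the embedding table is an in-memory list of rows rather than data on a storage device; the function name `sparse_length_sum` and all values are made up for the example.

```python
def sparse_length_sum(embedding_table, indices):
    """Read only the requested rows and add them element-wise into one vector."""
    rows = [embedding_table[i] for i in indices]    # the "reader" step
    return [sum(column) for column in zip(*rows)]   # the "adder" step


# Toy embedding table: 4 vectors of 3 data points each (values are made up).
embedding_table = [
    [1.0, 2.0, 3.0],
    [10.0, 20.0, 30.0],
    [100.0, 200.0, 300.0],
    [0.5, 0.5, 0.5],
]
result = sparse_length_sum(embedding_table, [0, 2])   # rows 1 and 3 are never read
```

Only the rows named in the request are touched, which is what makes the access pattern sparse; the response is one vector regardless of how many rows were requested.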

FIG. 8 shows details of computation system 140 of FIG. 1, according to embodiments of the disclosure. In FIG. 8, computation system 140 may receive computation request 805, which may be a request to process the data retrieved by multi-process system 135 of FIG. 1 in response to I/O request 405 of FIG. 4. Computation system 140 may include computation scheduler 810, queues 815-1 and 815-2 (which may be referred to collectively as queues 815), and processing elements 820-1, 820-2, and 820-3 (which may be referred to collectively as processing elements 820).

Computation scheduler 810 may place computation request 805 in one of queues 815 based on the workload of computation request 805. For example, computation request 805 may involve resources offered by only one of processing elements 820, which may determine the queue 815 in which computation request 805 should be placed. Computation scheduler 810 may also consider how busy processing elements 820 are when assigning computation request 805 to processing elements 820, as discussed below.

In some embodiments of the disclosure, queues 815 may be first in, first out (FIFO) queues. In other embodiments of the disclosure, other types of queues 815 may be used.

Processing element 820 may then remove computation request 805 from queues 815 and process the request. Processing element 820 may be any desired type of processing element: for example, a central processing unit (CPU), a graphics processing unit (GPU), a general purpose GPU (GPGPU), a neural processing unit (NPU), a tensor processing unit (TPU), an accelerator such as a field programmable gate array (FPGA), or an application-specific integrated circuit (ASIC), among other possibilities. In addition, for elements that include multiple cores (for example, a multi-core CPU), each core may be considered a separate processing element 820.

In some embodiments of the disclosure, each processing element 820 may have its own queue 815 from which it may receive computation requests to process. That is, processing element 820 may process only computation requests specifically assigned to it, and may ignore computation requests assigned to other processing elements. In other embodiments of the disclosure, two or more processing elements may share a queue. For example, as shown in FIG. 8, processing elements 820-1 and 820-2 may both receive computation requests via queue 815-1, whereas processing element 820-3 may receive computation requests via queue 815-2. When a processing element finishes processing a computation request, the processing element may then examine the appropriate queue to see whether another computation request that it can process is waiting. If the processing element finds a waiting computation request that it can process, the processing element may begin processing that computation request. Otherwise, the processing element may go idle.

In some embodiments of the disclosure, a processing element may look for computation requests in multiple queues 815. For example, processing element 820-3 may be able to process any computation request that processing elements 820-1 and 820-2 can process, and may also be able to process some additional computation requests. In such situations, processing element 820-3 may retrieve computation requests from queue 815-2 for as long as queue 815-2 has computation requests waiting to be processed. If queue 815-2 is empty, processing element 820-3 may retrieve a computation request from queue 815-1.
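The fallback behavior of processing element 820-3 may be sketched as follows. This is an illustrative assumption about one way to express the preference order, not a description of any particular implementation; the function name `next_request` and the request labels are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Sequence


def next_request(queues_in_preference_order: Sequence[Deque[str]]) -> Optional[str]:
    """Return the next request from the first non-empty queue, or None if all are empty."""
    for queue in queues_in_preference_order:
        if queue:
            return queue.popleft()  # FIFO order within a queue
    return None  # nothing waiting: the processing element may go idle


# Hypothetical queue contents.
queue_815_1 = deque(["req-a", "req-b"])
queue_815_2 = deque(["req-c"])

# Processing element 820-3 prefers its own queue 815-2 and falls back to 815-1.
served = [next_request([queue_815_2, queue_815_1]) for _ in range(4)]
print(served)
```

The fourth call returns `None` because both queues have been drained, which corresponds to the processing element going idle.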

Computation system 140 may also include ready queue 825. Processing elements 820 may use ready queue 825 to inform computation scheduler 810 when they have finished processing a computation request. In this manner, computation scheduler 810 may track how busy processing elements 820 are. For example, consider the situation where computation scheduler 810 receives computation request 805, and assume that each processing element 820 has its own queue 815. Computation scheduler 810 may determine that either of processing elements 820-1 or 820-2 can process the computation request. Without any information about how busy processing elements 820-1 and 820-2 are, computation scheduler 810 might assign computation request 805 at random to the queue associated with processing element 820-1 or 820-2. But if computation scheduler 810 has received information via ready queue 825 that processing element 820-2 has completed its most recent computation request (and is therefore currently idle), computation scheduler 810 may assign computation request 805 to processing element 820-2 without having to guess which of processing elements 820-1 and 820-2 has the lighter workload.

Similarly, processing element 820-1 or 820-2 might be the preferred processing element for computation request 805. But if processing elements 820-1 and 820-2 are both currently busy and processing element 820-3 is currently idle, computation scheduler 810 may schedule processing element 820-3 to process computation request 805, even though processing element 820-3 is less preferred for computation request 805.

Computation request 805 may include a tag that identifies the computation request. Alternatively, computation scheduler 810 may identify computation request 805 by assigning a tag to it. Processing elements 820 may use these tags in ready queue 825 to inform computation scheduler 810 which computation requests have been completed. In this manner, computation scheduler 810 may maintain an approximate idea of the pending workload of each processing element 820 (by comparing which computation requests each processing element 820 has completed with which computation requests have been scheduled for it).
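The interplay of tags, ready queue 825, and pending-workload tracking may be sketched as below. The class name, the least-pending selection rule, and the tie-breaking order are assumptions chosen for illustration; the disclosure does not prescribe a specific policy.

```python
from collections import deque
from itertools import count


class ComputationScheduler:
    """Toy scheduler that tracks pending work per processing element via tags."""

    def __init__(self, element_names):
        self.queues = {name: deque() for name in element_names}
        self.pending = {name: set() for name in element_names}
        self.ready_queue = deque()  # completion notices: (element name, tag)
        self._tags = count(1)

    def drain_ready_queue(self):
        # Forget every tag that a processing element has reported as completed.
        while self.ready_queue:
            name, tag = self.ready_queue.popleft()
            self.pending[name].discard(tag)

    def schedule(self, request, candidates):
        self.drain_ready_queue()
        # Assumed policy: fewest pending tags wins; ties go to the first candidate.
        target = min(candidates, key=lambda name: len(self.pending[name]))
        tag = next(self._tags)
        self.queues[target].append((tag, request))
        self.pending[target].add(tag)
        return target, tag


scheduler = ComputationScheduler(["820-1", "820-2", "820-3"])
first = scheduler.schedule("request A", ["820-1", "820-2"])
second = scheduler.schedule("request B", ["820-1", "820-2"])

# Processing element 820-2 takes its request, finishes it, and reports the tag.
tag_done, _ = scheduler.queues["820-2"].popleft()
scheduler.ready_queue.append(("820-2", tag_done))

# 820-2 is now known to be idle, so the next request goes there without guessing.
third = scheduler.schedule("request C", ["820-1", "820-2"])
print(first, second, third)
```

Because the scheduler only learns about completions through the ready queue, its view of pending work is approximate, which matches the rough estimate described above.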

FIG. 9 shows a flowchart of an example procedure for processing I/O request 405 of FIG. 4 using multi-process system 135 of FIG. 1, according to embodiments of the disclosure. In FIG. 9, at block 905, I/O scheduler 305 of FIG. 3 may receive I/O request 405 of FIG. 4. At block 910, I/O request 405 of FIG. 4 may be delivered from I/O scheduler 305 of FIG. 3 to load module 325 of FIG. 3. Finally, at block 915, load module 325 of FIG. 3 may read data from storage device 120 of FIG. 1 based on I/O request 405 of FIG. 4.

FIG. 10 shows an alternative flowchart of an example procedure for processing I/O request 405 of FIG. 4 using multi-process system 135 of FIG. 1, according to embodiments of the disclosure. In FIG. 10, blocks similar to blocks of FIG. 9 are given the same reference numbers. In FIG. 10, at block 905, I/O scheduler 305 of FIG. 3 may receive I/O request 405 of FIG. 4. At block 1005, size calculator 605 of FIG. 6 may determine the size of I/O request 405 of FIG. 4. At block 1010, queue selector 620 of FIG. 6 may select queue 310 of FIG. 3 for I/O request 405 of FIG. 4. At block 1015, I/O scheduler 305 of FIG. 3 may place I/O request 405 of FIG. 4 in queue 310 of FIG. 3 selected by queue selector 620 of FIG. 6. Finally, at block 915, load module 325 of FIG. 3 may read data from storage device 120 of FIG. 1 based on I/O request 405 of FIG. 4.
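Blocks 1005 through 1015 may be sketched as a size computation followed by a threshold lookup. In this hedged example, the size of a request is assumed to be the number of vectors it identifies, and the threshold values 64 and 256 are hypothetical; a request whose size is less than a threshold goes to the corresponding queue.

```python
from bisect import bisect_right
from collections import deque

# Hypothetical thresholds: sizes < 64 -> queue 0, 64..255 -> queue 1, >= 256 -> queue 2.
THRESHOLDS = [64, 256]
queues = [deque() for _ in range(len(THRESHOLDS) + 1)]


def request_size(request):
    """Size calculator (block 1005): here, the number of vectors requested."""
    return len(request["vector_ids"])


def select_queue(size):
    """Queue selector (block 1010): index of the first threshold the size is below."""
    return bisect_right(THRESHOLDS, size)


def enqueue(request):
    """Block 1015: place the request in the selected queue and report which one."""
    index = select_queue(request_size(request))
    queues[index].append(request)
    return index


small = {"data_id": "table-A", "vector_ids": list(range(8))}
large = {"data_id": "table-A", "vector_ids": list(range(300))}
print(enqueue(small), enqueue(large))
```

Keeping small and large requests in separate queues is one way a small request can avoid waiting behind a run of large ones.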

FIG. 11 shows a flowchart of an example procedure for I/O scheduler 305 of FIG. 3 to use priority queuing in queuing I/O request 405 of FIG. 4, according to embodiments of the disclosure. In FIG. 11, at block 1105, I/O scheduler 305 of FIG. 3 may determine a priority tag for I/O request 405 of FIG. 4 based on the size of I/O request 405 of FIG. 4 (which may be determined by size calculator 605 of FIG. 6). At block 1110, I/O scheduler 305 of FIG. 3 may associate the priority tag with I/O request 405 of FIG. 4 in queue 310 of FIG. 3.
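One possible reading of blocks 1105 and 1110 uses a single priority queue in which each entry carries a tag derived from the request size. The sketch below assumes that smaller requests receive the more urgent tag and that a threshold of 256 separates the two classes; both choices are illustrative assumptions rather than statements of the disclosed method.

```python
import heapq
from itertools import count

_arrival = count()  # preserves arrival order among requests with equal priority


def priority_tag(size, threshold=256):
    """Block 1105 (assumed policy): 0 = served first, 1 = served later."""
    return 0 if size < threshold else 1


def enqueue(queue, request_id, size):
    """Block 1110: store the request together with its priority tag."""
    heapq.heappush(queue, (priority_tag(size), next(_arrival), request_id))


queue_310 = []
enqueue(queue_310, "large-1", 512)
enqueue(queue_310, "small-1", 8)
enqueue(queue_310, "small-2", 16)

order = [heapq.heappop(queue_310)[2] for _ in range(3)]
print(order)
```

The arrival counter keeps requests with the same tag in first in, first out order.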

FIG. 12 shows a flowchart of an example procedure for manager 320 of FIG. 3 to assign I/O request 405 of FIG. 4 to load module 325 of FIG. 3, according to embodiments of the disclosure. In FIG. 12, at block 1205, manager 320 of FIG. 3 may retrieve I/O request 405 of FIG. 4 from queue 310 of FIG. 3. At block 1210, manager 320 of FIG. 3 may identify storage device 120 of FIG. 1 that stores the data. This may involve, for example, using table 330 of FIG. 3 to map identifier 410 of FIG. 4 of the data to the identifier of FIG. 3 of the storage device that stores the data, as shown in block 1215. At block 1220, manager 320 of FIG. 3 may identify load module 325 of FIG. 3 that can access storage device 120 of FIG. 1. Finally, at block 1225, manager 320 of FIG. 3 may send I/O request 405 of FIG. 4 to load module 325 of FIG. 3.
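The lookup chain of FIG. 12 (data identifier to storage device to load module) may be sketched with two dictionaries. The table contents, device names, and load module names below are hypothetical stand-ins; only the order of the steps follows the flowchart.

```python
from collections import deque

# Hypothetical stand-in for table 330: data identifier -> storage device identifier.
DATA_TO_DEVICE = {"table-A": "ssd-0", "table-B": "ssd-1"}
# Hypothetical mapping of each storage device to the load module that can access it.
DEVICE_TO_LOAD_MODULE = {"ssd-0": "load-module-0", "ssd-1": "load-module-1"}


def assign(queue, sent):
    """Take one request from the queue and route it to the matching load module."""
    request = queue.popleft()                    # block 1205: retrieve from queue
    device = DATA_TO_DEVICE[request["data_id"]]  # blocks 1210/1215: find the device
    module = DEVICE_TO_LOAD_MODULE[device]       # block 1220: find the load module
    sent.setdefault(module, []).append(request)  # block 1225: send the request
    return module


queue_310 = deque([{"data_id": "table-B", "vector_ids": [3, 9]}])
sent = {}
chosen = assign(queue_310, sent)
print(chosen)
```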

FIG. 13 shows a flowchart of an example procedure for load module 325 of FIG. 3 to read data from storage device 120 of FIG. 1, according to embodiments of the disclosure. In FIG. 13, at block 1305, reader 705 of FIG. 7 may read vectors 415 of FIG. 4 from storage device 120 of FIG. 1. At block 1310, adder 710 of FIG. 7 may add vectors 415 of FIG. 4. Finally, at block 1315, multi-process system 135 of FIG. 1 may send the data to computation scheduler 810 of FIG. 8 of computation system 140 of FIG. 1 for use in processing computation request 805 of FIG. 8.

FIG. 14 shows a flowchart of an example procedure for computation system 140 of FIG. 1 to process computation request 805 of FIG. 8, according to embodiments of the disclosure. In FIG. 14, at block 1405, computation scheduler 810 of FIG. 8 may schedule the processing of computation request 805 of FIG. 8 using the data read by multi-process system 135 of FIG. 1 in response to I/O request 405 of FIG. 4. At block 1410, processing element 820 of FIG. 8 may receive computation request 805 of FIG. 8 from computation scheduler 810 of FIG. 8. For example, at block 1405 computation scheduler 810 of FIG. 8 may place computation request 805 of FIG. 8 in queue 815 of FIG. 8, and at block 1410 processing element 820 of FIG. 8 may retrieve computation request 805 of FIG. 8 from queue 815 of FIG. 8.

At block 1415, processing element 820 of FIG. 8 may process computation request 805 of FIG. 8. Finally, at block 1420, processing element 820 of FIG. 8 may inform computation scheduler 810 of FIG. 8 that processing of computation request 805 of FIG. 8 is complete. For example, processing element 820 of FIG. 8 may place information in ready queue 825 of FIG. 8 to inform computation scheduler 810 of FIG. 8 that processing element 820 of FIG. 8 has finished processing computation request 805 of FIG. 8. In some embodiments of the disclosure, block 1420 may be omitted, as shown by dashed line 1425 (for example, if only one computation request 805 of FIG. 8 is scheduled for processing element 820 of FIG. 8 and computation scheduler 810 of FIG. 8 knows how long processing element 820 of FIG. 8 will take to process computation request 805 of FIG. 8).

FIGs. 15A and 15B show a flowchart of an example procedure for computation scheduler 810 of FIG. 8 to schedule processing element 820 of FIG. 8 to process computation request 805 of FIG. 8, according to embodiments of the disclosure. In FIG. 15A, at block 1505, computation scheduler 810 of FIG. 8 may select processing element 820 of FIG. 8 to process computation request 805 of FIG. 8. At block 1510, computation scheduler 810 of FIG. 8 may send computation request 805 of FIG. 8 to processing element 820 of FIG. 8. Alternatively, at block 1515, computation scheduler 810 of FIG. 8 may assign computation request 805 of FIG. 8 to queue 815 of FIG. 8.

As an alternative to block 1505, at block 1520 (FIG. 15B), computation scheduler 810 of FIG. 8 may identify a type of processing element 820 of FIG. 8 that is suited to processing computation request 805 of FIG. 8. At block 1525, computation scheduler 810 of FIG. 8 may assign computation request 805 of FIG. 8 to queue 815 of FIG. 8 that is appropriate for that type of processing element 820 of FIG. 8.

As another alternative to block 1505, at block 1530, computation scheduler 810 of FIG. 8 may determine the workload of computation request 805 of FIG. 8. At block 1535, computation scheduler 810 of FIG. 8 may assign computation request 805 of FIG. 8 to queue 815 of FIG. 8 based on the workload of computation request 805 of FIG. 8.
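The two alternatives to block 1505 may be contrasted in a short sketch: one picks a queue from the type of processing element the request needs, the other from an estimate of the request's workload. The request fields `needs` and `estimated_ops`, the type-to-queue mapping, and the threshold of one million operations are all hypothetical.

```python
# Hypothetical mapping from processing element type to the queue serving that type.
QUEUE_FOR_TYPE = {"cpu": "815-1", "gpu": "815-2"}


def queue_by_type(request):
    """Blocks 1520/1525: choose the queue that feeds a suitable element type."""
    return QUEUE_FOR_TYPE[request["needs"]]


def queue_by_workload(request, heavy_threshold=1_000_000):
    """Blocks 1530/1535: choose the queue from the request's estimated workload."""
    return "815-2" if request["estimated_ops"] >= heavy_threshold else "815-1"


light = {"needs": "cpu", "estimated_ops": 10_000}
heavy = {"needs": "gpu", "estimated_ops": 5_000_000}

print(queue_by_type(light), queue_by_type(heavy))
print(queue_by_workload(light), queue_by_workload(heavy))
```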

In FIGs. 9 through 15B, some embodiments of the disclosure are shown. A person skilled in the art will recognize, however, that other embodiments of the disclosure are also possible, by changing the order of the blocks, by omitting blocks, or by including links not shown in the drawings. All such variations of the flowcharts are considered to be embodiments of the disclosure, whether or not expressly described.

Embodiments of the disclosure include a multi-process system. The multi-process system may load data from an I/O process/storage pool using an input/output (I/O) request, and the I/O request may be based on the data to be processed using a computation request. The data may be retrieved using load modules associated with the storage devices of the I/O process/storage pool. Using multiple storage devices offers a technical advantage over storing the data on a single storage device: parallel data access across the multiple storage devices may be leveraged to achieve low latency when reading the data.

Different I/O requests may be queued in different queues. Using multiple queues offers a technical advantage in that an I/O request involving a large or small amount of data is not delayed behind many I/O requests of a different data size.

The data retrieved by the multi-process system may be provided to a computation system. The computation system may schedule computation requests that use the data. Because the computation system may use different queues based on the workloads of the computation requests, it offers the technical advantage of meeting the queries per second (QPS) promised by a service level agreement (SLA).

Deep Learning Recommendation Model (DLRM) workloads may be I/O intensive. To meet their SLA requirements, dynamic random access memory (DRAM) may be needed to store large embedding tables (up to 100 GB or more). Such large amounts of DRAM may be expensive.

For small query sizes, the SLA may be met by storing the embedding tables on a Solid State Drive (SSD) with a user-space driver. For reasonably large query sizes (256 and above), however, meeting the SLA with a single SSD may be difficult. Moreover, with a single SSD, the potential I/O bottleneck at the SSD may make it impossible to achieve high QPS by running queries in parallel.

A query scheduler that uses multiple SSDs and schedules based on computation may have low efficiency and may suffer load balancing problems (some SSDs handle a larger share of the queries, and other SSDs handle a smaller share). SSDs that handle a large number of queries may not have enough I/O capacity to process those queries (similar to the single-SSD model), and SSDs that handle a small number of queries may be underutilized.

Embodiments of the disclosure may include a multi-process, multi-SSD system with an I/O scheduler to achieve high QPS and low latency. The I/O scheduler may be used to schedule embedding table I/O requests to different I/O queues.

The I/O process/SSD pool may fetch I/O requests from the I/O queues based on the load status of the various SSDs and on the I/O requests directed to them.

Within the I/O process/SSD pool, multiple SSDs may be accessed by a multi-process user-space non-volatile memory express (UNVMe) driver/application programming interface (API), which uses multiple active threads to satisfy the I/O requests.
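The use of several active threads to read from several SSDs at once may be illustrated with the standard-library thread pool. This sketch does not use or model the UNVMe driver; each SSD is replaced by an in-memory dictionary, and the names `read_from_device`, `devices`, and `requests` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def read_from_device(device, keys):
    """Stand-in for a load module read: return the requested vectors in order."""
    return [device[key] for key in keys]


# Hypothetical in-memory stand-ins for two SSDs holding embedding vectors.
devices = {
    "ssd-0": {0: [1.0, 2.0], 1: [3.0, 4.0]},
    "ssd-1": {7: [5.0, 6.0]},
}

# Two I/O requests, each directed to the device that stores its data.
requests = [("ssd-0", [1, 0]), ("ssd-1", [7])]

# One worker thread per device, so reads on different devices can overlap.
with ThreadPoolExecutor(max_workers=len(devices)) as pool:
    futures = [
        pool.submit(read_from_device, devices[name], keys)
        for name, keys in requests
    ]
    results = [future.result() for future in futures]

print(results)
```

Collecting results in submission order keeps each result aligned with the request that produced it, regardless of which read finishes first.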

A second-level computation scheduler may be used to further optimize latency and QPS based on computation intensity.

The following discussion is intended to provide a general description of a suitable machine or machines in which certain aspects of the disclosure may be implemented. The machine or machines may be controlled, at least in part, by input from conventional input devices, such as keyboards, mice, etc., as well as by directives received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, or other input signal.

As used herein, the term “machine” is intended to broadly encompass a single machine, a virtual machine, or a system of communicatively coupled machines, virtual machines, or devices operating together. Example machines include computing devices such as personal computers, workstations, servers, portable computers, handheld devices, telephones, tablets, etc., as well as transportation devices, such as private or public transportation, for example, automobiles, trains, cabs, etc.

The machine or machines may include embedded controllers, such as programmable or non-programmable logic devices or arrays, Application Specific Integrated Circuits (ASICs), embedded computers, smart cards, and the like. The machine or machines may utilize one or more connections to one or more remote machines, such as through a network interface, modem, or other communicative coupling. Machines may be interconnected by way of a physical and/or logical network, such as an intranet, the Internet, local area networks (LANs), wide area networks (WANs), etc. A person skilled in the art will appreciate that network communication may utilize various wired and/or wireless short range or long range carriers and protocols, including radio frequency (RF), satellite, microwave, Institute of Electrical and Electronics Engineers (IEEE) 802.11, Bluetooth®, optical, infrared, cable, laser, etc.

Embodiments of the disclosure may be described by reference to or in conjunction with associated data including functions, procedures, data structures, application programs, etc., which, when accessed by a machine, result in the machine performing tasks or defining abstract data types or low-level hardware contexts. Associated data may be stored in, for example, volatile and/or non-volatile memory (for example, RAM, ROM, etc.), or in other storage devices and their associated storage media, including hard drives, floppy disks, optical storage, tapes, flash memory, memory sticks, digital video disks, biological storage, etc. Associated data may be delivered over transmission environments, including a physical and/or logical network, in the form of packets, serial data, parallel data, propagated signals, etc., and may be used in a compressed or encrypted format. Associated data may be used in a distributed environment, and stored locally and/or remotely for machine access.

Embodiments of the disclosure may include a tangible, non-transitory machine-readable medium comprising instructions executable by one or more processors, the instructions comprising instructions to perform the elements of the disclosure as described herein. The various operations of the methods described above may be performed by any suitable means capable of performing the operations, such as various hardware and/or software component(s), circuits, and/or module(s). The software may comprise an ordered listing of executable instructions for implementing logical functions, and may be embodied in any “processor-readable medium” for use by or in connection with an instruction execution system, apparatus, or device, such as a single or multiple-core processor or processor-containing system.

The blocks or steps of a method or algorithm and the functions described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a tangible, non-transitory computer-readable medium. A software module may reside in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, hard disk, a removable disk, a CD ROM, or any other form of storage medium known in the art.

Having described and illustrated the principles of the disclosure with reference to illustrated embodiments, it will be recognized that the illustrated embodiments may be modified in arrangement and detail without departing from such principles, and may be combined in any desired manner. And, although the foregoing discussion has focused on particular embodiments, other configurations are contemplated. In particular, even though expressions such as “according to an embodiment of the disclosure” or the like are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the disclosure to particular embodiment configurations. As used herein, these terms may reference the same or different embodiments that are combinable into other embodiments. The foregoing illustrative embodiments are not to be construed as limiting the disclosure thereof. Although a few embodiments have been described, those skilled in the art will readily appreciate that many modifications are possible to those embodiments without materially departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims.

본 개시의 실시예들은 제한 없이 다음의 진술들로 확장될 수 있다. Embodiments of the present disclosure may extend to the following statements without limitation.

진술 1. 본 개시의 일 실시예는, Statement 1. An embodiment of the present disclosure includes a system, comprising:

데이터를 저장하는 저장 장치;a storage device that stores data;

입출력(I/O) 요청에 적어도 부분적으로 기초하여 상기 저장 장치로부터 상기 데이터를 읽는 로드 모듈; 및 a load module that reads the data from the storage device based at least in part on input/output (I/O) requests; and

상기 입출력 요청을 수신하고, 상기 입출력 요청의 크기에 적어도 부분적으로 기초하여 상기 입출력 요청을 상기 로드 모듈로 전달하는 스케줄러를 포함하는 시스템. a scheduler that receives the I/O request and delivers the I/O request to the load module based at least in part on a size of the I/O request.

진술 2. 본 개시의 일 실시예는 진술 1에 따른 상기 시스템을 포함하고, 상기 스케줄러는 상기 입출력 요청의 크기에 적어도 부분적으로 기초하여 상기 입출력 요청을 큐에 전달하도록 구성된다. Statement 2. An embodiment of the present disclosure includes the system according to Statement 1, wherein the scheduler is configured to deliver the I/O request to a queue based at least in part on the size of the I/O request.

진술 3. 본 개시의 일 실시예는 진술 2에 따른 상기 시스템을 포함하고, 상기 큐로부터 상기 입출력 요청을 검색하고 상기 입출력 요청을 상기 로드 모듈로 할당하는 관리자를 더 포함한다. Statement 3. An embodiment of the present disclosure includes the system according to statement 2, further comprising a manager to retrieve the I/O request from the queue and assign the I/O request to the load module.

진술 4. 본 개시의 일 실시예는 진술 3에 따른 상기 시스템을 포함하고, 상기 시스템은,Statement 4. One embodiment of the present disclosure includes the system according to Statement 3, wherein the system comprises:

제2 데이터를 저장하는 제2 저장 장치; 및 a second storage device that stores second data; and

상기 제2 저장 장치로부터 상기 제2 데이터를 읽는 제2 로드 모듈을 더 포함하고, a second load module for reading the second data from the second storage device;

상기 관리자는 상기 데이터를 요청하는 상기 입출력 요청에 적어도 부분적으로 기초하여 상기 입출력 요청을 상기 로드 모듈에 할당하도록 구성된다. The manager is configured to assign the I/O request to the load module based at least in part on the I/O request requesting the data.

진술 5. 본 개시의 일 실시예는 진술 4에 따른 상기 시스템을 포함하고, Statement 5. One embodiment of the present disclosure includes the system according to Statement 4;

상기 입출력 요청은 상기 데이터의 제1 식별자를 포함하고, 그리고 상기 관리자는 상기 데이터의 상기 제1 식별자를 상기 저장 장치의 제2 식별자에 맵핑하는 테이블을 포함한다.The input/output request includes a first identifier of the data, and the manager includes a table mapping the first identifier of the data to a second identifier of the storage device.
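The table lookup described in Statements 4 and 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `IORequest`, `Manager`, `data_to_device`, and `device_to_load_module`, as well as the identifiers `table_a`, `dev0`, and so on, are hypothetical, and each storage device is assumed to be served by exactly one load module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IORequest:
    data_id: str  # "first identifier": names the requested data (hypothetical field)
    size: int     # request size in bytes (hypothetical field)


class Manager:
    """Assigns a request to the load module that serves the device holding the data."""

    def __init__(self, data_to_device, device_to_load_module):
        # Table mapping a data identifier to a storage device identifier.
        self.data_to_device = data_to_device
        # Assumed one-to-one mapping from storage device to load module.
        self.device_to_load_module = device_to_load_module

    def assign(self, request):
        device_id = self.data_to_device[request.data_id]
        return self.device_to_load_module[device_id]


manager = Manager(
    data_to_device={"table_a": "dev0", "table_b": "dev1"},
    device_to_load_module={"dev0": "load_module_0", "dev1": "load_module_1"},
)
assigned = manager.assign(IORequest(data_id="table_b", size=4096))
```

Here `assigned` names the load module attached to the device that stores `table_b`. A real manager might also have to choose among several load modules for the same device, as Statements 6 and 7 contemplate.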

진술 6. 본 개시의 일 실시예는 진술 3에 따른 상기 시스템을 포함하고, 상기 시스템은 상기 저장 장치로부터 상기 데이터를 읽는 제2 로드 모듈을 포함한다. Statement 6. An embodiment of the present disclosure includes the system according to Statement 3, wherein the system includes a second load module to read the data from the storage device.

진술 7. 본 개시의 일 실시예는 진술 6에 따른 상기 시스템을 포함하고, 상기 관리자는 상기 입출력 요청을 처리하기 위한 상기 로드 모듈을 선택하도록 구성된다. Statement 7. An embodiment of the present disclosure includes the system according to Statement 6, wherein the manager is configured to select the load module to process the I/O request.

진술 8. 본 개시의 일 실시예는 진술 3에 따른 상기 시스템을 포함하고, Statement 8. An embodiment of the present disclosure includes the system according to Statement 3;

상기 큐는 선입선출(FIFO) 큐를 포함하고, 그리고 상기 관리자는 상기 큐의 헤드(head)로부터 상기 입출력 요청을 액세스하도록 구성된다. The queue comprises a first in first out (FIFO) queue, and the manager is configured to access the input/output requests from the head of the queue.

진술 9. 본 개시의 일 실시예는 진술 3에 따른 상기 시스템을 포함하고, Statement 9. One embodiment of the present disclosure includes the system according to Statement 3;

상기 큐는 우선순위 큐를 포함하고, The queue includes a priority queue;

상기 스케줄러는 상기 입출력 요청의 크기에 적어도 부분적으로 기초하여 우선순위 태그와 상기 입출력 요청을 연관시키도록 구성되고, 그리고the scheduler is configured to associate the I/O request with a priority tag based at least in part on the size of the I/O request; and

상기 관리자는 상기 우선순위 태그에 적어도 부분적으로 기초하여 상기 큐로부터 상기 입출력 요청을 액세스하도록 구성된다. The manager is configured to access the input/output requests from the queue based at least in part on the priority tag.
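One way to picture the priority queue of Statement 9 is the sketch below. It rests on assumptions the text does not state: that smaller requests receive the more urgent tag, that there are only two tag values, and that the cutoff is 4096 bytes. The names `priority_tag` and `PriorityRequestQueue` are hypothetical.

```python
import heapq
import itertools


def priority_tag(size, cutoff=4096):
    """Return 0 (served first) for small requests and 1 for the rest (assumed policy)."""
    return 0 if size < cutoff else 1


class PriorityRequestQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: keeps arrival order within a tag

    def put(self, request_id, size):
        heapq.heappush(self._heap, (priority_tag(size), next(self._counter), request_id))

    def get(self):
        # The manager retrieves the request with the most urgent tag.
        return heapq.heappop(self._heap)[2]


queue = PriorityRequestQueue()
queue.put("large", 1 << 20)
queue.put("small_1", 512)
queue.put("small_2", 1024)
order = [queue.get() for _ in range(3)]
```

In this sketch the two small requests are retrieved before the large one even though the large one arrived first, which is the latency benefit a size-based tag is meant to provide.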

진술 10. 본 개시의 일 실시예는 진술 3에 따른 상기 시스템을 포함하고, Statement 10. An embodiment of the present disclosure includes the system according to Statement 3;

상기 스케줄러는 임계치를 포함하고, 그리고the scheduler includes a threshold, and

상기 스케줄러는 상기 임계치보다 작은 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 입출력 요청을 상기 큐에 배치하도록 구성된다. The scheduler is configured to place the I/O request into the queue based at least in part on the size of the I/O request being less than the threshold.

진술 11. 본 개시의 일 실시예는 진술 10에 따른 상기 시스템을 포함하고, Statement 11. An embodiment of the present disclosure includes the system according to Statement 10;

상기 시스템은 제2 큐를 더 포함하고, 그리고The system further comprises a second queue, and

상기 스케줄러는 상기 임계치를 초과하는 상기 제2 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 제2 입출력 요청을 상기 제2 큐에 배치하도록 구성된다. The scheduler is configured to place a second I/O request into the second queue based at least in part on the size of the second I/O request exceeding the threshold.

진술 12. 본 개시의 일 실시예는 진술 11에 따른 상기 시스템을 포함하고, 상기 관리자는 라운드 로빈 액세스(round robin access)를 사용하여 상기 큐로부터 상기 입출력 요청을 검색하고, 상기 제2 큐로부터 상기 제2 입출력 요청을 검색하도록 구성된다. Statement 12. An embodiment of the present disclosure includes the system according to Statement 11, wherein the manager is configured to retrieve the I/O request from the queue and the second I/O request from the second queue using round robin access.
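Statements 10 through 12 together describe threshold-based routing into two queues followed by round robin retrieval. The sketch below illustrates that flow under assumptions: the class name `Scheduler`, the function `round_robin`, and the 4096-byte threshold are hypothetical, and a request whose size equals the threshold is sent to the second queue, a case the statements leave open.

```python
from collections import deque


class Scheduler:
    def __init__(self, threshold):
        self.threshold = threshold
        self.small_queue = deque()  # "the queue": requests smaller than the threshold
        self.large_queue = deque()  # "the second queue": all other requests (assumption for ties)

    def submit(self, request_id, size):
        if size < self.threshold:
            self.small_queue.append(request_id)
        else:
            self.large_queue.append(request_id)


def round_robin(queues):
    """Yield requests by visiting each queue in turn, skipping queues that are empty."""
    queues = list(queues)
    while any(queues):  # an empty deque is falsy
        for q in queues:
            if q:
                yield q.popleft()


scheduler = Scheduler(threshold=4096)
scheduler.submit("s1", 512)
scheduler.submit("l1", 8192)
scheduler.submit("s2", 1024)
scheduler.submit("s3", 100)
drained = list(round_robin([scheduler.small_queue, scheduler.large_queue]))
```

Alternating between the queues keeps a burst of large requests from starving small ones, and vice versa. A weighted variant would be a natural extension, but nothing in the statements requires it.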

진술 13. 본 개시의 일 실시예는 진술 1에 따른 상기 시스템을 포함하고, 상기 로드 모듈은,Statement 13. An embodiment of the present disclosure includes the system according to Statement 1, wherein the load module comprises:

상기 저장 장치로부터 상기 데이터의 제1 벡터 및 제2 벡터를 읽는 판독기; 및a reader that reads the first vector and the second vector of data from the storage device; and

상기 제1 벡터 및 상기 제2 벡터를 더하는 가산기를 포함한다. and an adder for adding the first vector and the second vector.
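The reader and adder of Statement 13 can be sketched as below. The dictionary standing in for a storage device, the class names, and the keys `vec_a` and `vec_b` are all hypothetical. Element-wise addition is assumed because the statement does not say how the vectors are added.

```python
class Reader:
    def __init__(self, storage):
        self.storage = storage  # dict used as a stand-in for a storage device

    def read(self, key):
        return self.storage[key]


class Adder:
    @staticmethod
    def add(first, second):
        if len(first) != len(second):
            raise ValueError("vectors must have the same length")
        return [a + b for a, b in zip(first, second)]  # element-wise sum (assumed)


class LoadModule:
    def __init__(self, storage):
        self.reader = Reader(storage)
        self.adder = Adder()

    def load(self, first_key, second_key):
        first = self.reader.read(first_key)
        second = self.reader.read(second_key)
        return self.adder.add(first, second)


storage = {"vec_a": [1, 2, 3], "vec_b": [10, 20, 30]}
data = LoadModule(storage).load("vec_a", "vec_b")
```

Performing the addition inside the load module means only the combined vector, rather than both inputs, has to travel onward to the next stage.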

진술 14. 본 개시의 일 실시예는 진술 1에 따른 상기 시스템을 포함하고, 상기 로드 모듈은 상기 데이터를 제2 스케줄러에 전송하도록 구성된다. Statement 14. An embodiment of the present disclosure includes the system according to Statement 1, wherein the load module is configured to transmit the data to a second scheduler.

진술 15. 본 개시의 일 실시예는 진술 14에 따른 상기 시스템을 포함하고, 상기 시스템은,Statement 15. One embodiment of the present disclosure includes the system according to Statement 14, the system comprising:

처리 요소; 및processing element; and

상기 처리 요소에 의한 연산 요청의 처리를 스케줄링하도록 구성되는 제2 스케줄러를 더 포함하고, 상기 연산 요청은 상기 데이터를 사용한다. and a second scheduler configured to schedule processing of operation requests by the processing element, wherein the operation requests use the data.

진술 16. 본 개시의 일 실시예는 진술 14에 따른 상기 시스템을 포함하고, 상기 제2 스케줄러는 상기 연산 요청을 큐에 할당하도록 구성된다.Statement 16. An embodiment of the present disclosure includes the system according to statement 14, wherein the second scheduler is configured to assign the operation request to a queue.

진술 17. 본 개시의 일 실시예는 진술 16에 따른 상기 시스템을 포함하고, 상기 제2 스케줄러는 상기 연산 요청의 워크로드에 적어도 부분적으로 기초하여 상기 연산 요청을 상기 큐에 할당하도록 구성된다.Statement 17. An embodiment of this disclosure includes the system according to Statement 16, wherein the second scheduler is configured to assign the operation request to the queue based at least in part on a workload of the operation request.

진술 18. 본 개시의 일 실시예는 진술 16에 따른 상기 시스템을 포함하고, 상기 처리 요소는 상기 큐로부터 상기 연산 요청을 검색하도록 구성된다.Statement 18. An embodiment of the present disclosure includes the system according to Statement 16, wherein the processing element is configured to retrieve the operation request from the queue.

진술 19. 본 개시의 일 실시예는 진술 16에 따른 상기 시스템을 포함하고, Statement 19. An embodiment of the present disclosure includes the system according to Statement 16;

상기 처리 요소와 연관된 상기 큐; 및 the queue associated with the processing element; and

제2 연산 요청을 제2 처리 요소와 연관된 제2 큐에 할당하도록 구성되는 제2 스케줄러를 포함한다. and a second scheduler configured to assign the second operation request to a second queue associated with the second processing element.

진술 20. 본 개시의 일 실시예는 진술 16에 따른 상기 시스템을 포함하고, 상기 큐는 상기 처리 요소의 유형에 연관된다.Statement 20. An embodiment of the present disclosure includes the system according to Statement 16, wherein the queue is associated with the type of processing element.

진술 21. 본 개시의 일 실시예는 진술 16에 따른 상기 시스템을 포함하고, 상기 처리 요소는 완료 상태(completion status)를 상기 제2 스케줄러로 반환하도록 구성된다.Statement 21. An embodiment of the present disclosure includes the system according to Statement 16, wherein the processing element is configured to return a completion status to the second scheduler.

진술 22. 본 개시의 일 실시예는 진술 21에 따른 상기 시스템을 포함하고, 상기 처리 요소는 상기 완료 상태를 제2 큐에 배치하도록 구성된다. Statement 22. An embodiment of the present disclosure includes the system according to Statement 21, wherein the processing element is configured to place the completion status in a second queue.

진술 23. 본 개시의 일 실시예는 진술 22에 따른 상기 시스템을 포함하고, 상기 제2 스케줄러는 상기 제2 큐로부터 상기 완료 상태를 검색하도록 구성된다.Statement 23. An embodiment of the present disclosure includes the system according to statement 22, wherein the second scheduler is configured to retrieve the completion status from the second queue.

진술 24. 본 개시의 일 실시예는 진술 22에 따른 상기 시스템을 포함하고, 제2 처리 요소는 제2 완료 상태를 상기 제2 큐에 배치하도록 구성된다.Statement 24. An embodiment of the present disclosure includes the system according to Statement 22, wherein a second processing element is configured to place a second completion state into the second queue.
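The interaction among the second scheduler, the processing elements, and the completion queue in Statements 15 through 24 can be sketched in a single-threaded form. Every name here (`ComputeScheduler`, `ProcessingElement`, `run_once`, the status tuple layout) is hypothetical, and `sum` is only a placeholder for whatever operation a request actually carries.

```python
from collections import deque


class ComputeScheduler:
    """Stand-in for the second scheduler: queues operation requests and collects statuses."""

    def __init__(self):
        self.work_queue = deque()        # operation requests awaiting a processing element
        self.completion_queue = deque()  # "second queue" shared by all processing elements

    def schedule(self, request_id, data):
        self.work_queue.append((request_id, data))

    def collect(self):
        statuses = list(self.completion_queue)
        self.completion_queue.clear()
        return statuses


class ProcessingElement:
    def __init__(self, name, scheduler):
        self.name = name
        self.scheduler = scheduler

    def run_once(self):
        # Retrieve one operation request, process it, and post a completion status.
        request_id, data = self.scheduler.work_queue.popleft()
        result = sum(data)  # placeholder for the real operation
        self.scheduler.completion_queue.append((request_id, self.name, "done", result))


scheduler = ComputeScheduler()
scheduler.schedule("op-1", [1, 2, 3])
scheduler.schedule("op-2", [10, 20])
pe0 = ProcessingElement("pe0", scheduler)
pe1 = ProcessingElement("pe1", scheduler)
pe0.run_once()
pe1.run_once()
statuses = scheduler.collect()
```

Both processing elements post to the same completion queue, mirroring Statement 24. A concurrent version would need thread-safe queues, which this sketch deliberately leaves out.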

진술 25. 본 개시의 일 실시예는, Statement 25. An embodiment of the present disclosure includes a method, comprising:

스케줄러에서 입출력(I/O) 요청을 수신하는 단계; receiving an input/output (I/O) request at a scheduler;

상기 입출력 요청의 크기에 적어도 부분적으로 기초하여 상기 입출력 요청을 상기 스케줄러로부터 로드 모듈로 전달하는 단계; 및 delivering the I/O request from the scheduler to a load module based at least in part on a size of the I/O request; and

상기 로드 모듈에 의해 상기 입출력 요청에 적어도 부분적으로 기초하여 저장 장치로부터 데이터를 읽는 단계를 포함하는 방법을 포함한다. reading, by the load module, data from a storage device based at least in part on the I/O request.

진술 26. 본 개시의 일 실시예는 진술 25에 따른 상기 방법을 포함하고, Statement 26. One embodiment of the present disclosure includes the method according to Statement 25;

상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 입출력 요청을 상기 스케줄러로부터 상기 로드 모듈로 전달하는 단계는,Passing the I/O request from the scheduler to the load module based at least in part on the size of the I/O request comprises:

상기 스케줄러에 의해 상기 입출력 요청의 상기 크기를 결정하는 단계; determining the size of the input/output request by the scheduler;

상기 스케줄러에 의해 상기 입출력 요청의 크기에 적어도 부분적으로 기초하여 큐를 식별하는 단계; 및identifying, by the scheduler, a queue based at least in part on the size of the I/O request; and

상기 스케줄러에 의해 상기 입출력 요청을 상기 큐에 배치하는 단계를 포함한다.placing the I/O request into the queue by the scheduler.

진술 27. 본 개시의 일 실시예는 진술 26에 따른 상기 방법을 포함하고,Statement 27. One embodiment of the present disclosure includes the method according to Statement 26;

상기 로드 모듈에 의해 상기 저장 장치로부터 상기 데이터를 읽는 단계는,The step of reading the data from the storage device by the load module,

관리자에 의해 상기 큐로부터 상기 입출력 요청을 검색하는 단계; 및retrieving the input/output request from the queue by a manager; and

상기 관리자로부터 상기 로드 모듈로 상기 입출력 요청을 전송하는 단계를 포함한다.and transmitting the input/output request from the manager to the load module.

진술 28. 본 개시의 일 실시예는 진술 27에 따른 상기 방법을 포함하고, 상기 입출력 요청을 상기 로드 모듈로 전송하는 단계는,Statement 28. One embodiment of the present disclosure includes the method according to Statement 27, wherein sending the I/O request to the load module comprises:

상기 관리자에 의해 상기 데이터를 저장하는 상기 저장 장치를 식별하는 단계; 및 identifying, by the manager, the storage device storing the data; and

상기 관리자에 의해 상기 저장 장치에 액세스하는 상기 로드 모듈에 적어도 부분적으로 기초하여 상기 로드 모듈을 식별하는 단계를 포함한다. identifying, by the manager, the load module based at least in part on the load module accessing the storage device.

진술 29. 본 개시의 일 실시예는 진술 28에 따른 상기 방법을 포함하고, Statement 29. One embodiment of the present disclosure includes the method according to Statement 28;

상기 관리자에 의해 상기 데이터를 저장하는 상기 저장 장치를 식별하는 단계는, 상기 입출력 요청의 제1 식별자를 상기 저장 장치의 제2 식별자에 맵핑하는 단계를 포함한다.The identifying of the storage device storing the data by the manager includes mapping a first identifier of the input/output request to a second identifier of the storage device.

진술 30. 본 개시의 일 실시예는 진술 28에 따른 상기 방법을 포함하고, 상기 관리자에 의해 상기 데이터를 저장하는 상기 저장 장치를 식별하는 단계는, 상기 관리자에 의해 상기 저장 장치 및 제2 저장 장치로부터 상기 데이터를 저장하는 상기 저장 장치를 식별하는 단계를 포함한다. Statement 30. An embodiment of the present disclosure includes the method according to Statement 28, wherein identifying, by the manager, the storage device storing the data includes identifying, by the manager, the storage device storing the data from among the storage device and a second storage device.

진술 31. 본 개시의 일 실시예는 진술 27에 따른 상기 방법을 포함하고, Statement 31. One embodiment of the present disclosure includes the method according to Statement 27;

상기 큐는 선입선출(FIFO) 큐를 포함하고,The queue comprises a first-in-first-out (FIFO) queue;

상기 스케줄러에 의해 상기 큐에 상기 입출력 요청을 배치하는 단계는,Placing the input/output request to the queue by the scheduler comprises:

상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 입출력 요청을 위한 우선순위 태그를 결정하는 단계; 및 determining a priority tag for the I/O request based at least in part on the size of the I/O request; and

상기 우선순위 태그를 상기 큐의 상기 입출력 요청과 연관시키는 단계를 포함한다. associating the priority tag with the input/output request of the queue.

진술 32. 본 개시의 일 실시예는 진술 31에 따른 상기 방법을 포함하고, Statement 32. One embodiment of the present disclosure includes the method according to Statement 31;

상기 관리자에 의해 상기 큐로부터 상기 입출력 요청을 검색하는 단계는 상기 우선순위 태그에 적어도 부분적으로 기초하여 상기 관리자에 의해 상기 큐로부터 상기 입출력 요청을 검색하는 단계를 포함한다. Retrieving the I/O request from the queue by the manager includes retrieving the I/O request from the queue by the manager based at least in part on the priority tag.

진술 33. 본 개시의 일 실시예는 진술 27에 따른 상기 방법을 포함하고, Statement 33. One embodiment of the present disclosure includes the method according to Statement 27;

상기 스케줄러에 의해 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 큐를 식별하는 단계는, identifying the queue based at least in part on the size of the I/O request by the scheduler;

상기 스케줄러에 의해 임계치보다 작은 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 큐를 식별하는 단계를 포함한다.and identifying, by the scheduler, the queue based at least in part on the size of the input/output request being less than a threshold.

진술 34. 본 개시의 일 실시예는 진술 33에 따른 상기 방법을 포함하고, Statement 34. One embodiment of the present disclosure includes the method according to Statement 33;

상기 스케줄러에 의해 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 큐를 식별하는 단계는,identifying the queue based at least in part on the size of the I/O request by the scheduler;

상기 스케줄러에 의해 임계치보다 큰 제2 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 제2 큐를 식별하는 단계를 더 포함한다.and identifying, by the scheduler, a second queue based at least in part on the size of the second I/O request being greater than a threshold.

진술 35. 본 개시의 일 실시예는 진술 34에 따른 상기 방법을 포함하고,Statement 35. One embodiment of the present disclosure includes the method according to Statement 34;

상기 관리자에 의해 상기 큐로부터 상기 입출력 요청을 검색하는 단계는,The step of retrieving the input/output request from the queue by the manager,

라운드 로빈 액세스(round robin access)를 사용하여 상기 큐로부터 상기 입출력 요청을 검색하고, 상기 제2 큐로부터 상기 제2 입출력 요청을 검색하는 단계를 포함한다.and retrieving the I/O request from the queue using round robin access, and retrieving the second I/O request from the second queue.

진술 36. 본 개시의 일 실시예는 진술 25에 따른 상기 방법을 포함하고,Statement 36. One embodiment of the present disclosure includes the method according to Statement 25;

상기 저장 장치로부터 제1 벡터를 읽는 단계;reading a first vector from the storage device;

상기 저장 장치로부터 제2 벡터를 읽는 단계; 및reading a second vector from the storage device; and

상기 제1 벡터 및 상기 제2 벡터를 더하여 상기 데이터를 생성하는 단계를 포함한다.and generating the data by adding the first vector and the second vector.

진술 37. 본 개시의 일 실시예는 진술 25에 따른 상기 방법을 포함하고,Statement 37. One embodiment of the present disclosure includes the method according to Statement 25;

상기 데이터를 상기 로드 모듈로부터 제2 스케줄러로 전송하는 단계를 더 포함한다.and transmitting the data from the load module to a second scheduler.

진술 38. 본 개시의 일 실시예는 진술 37에 따른 상기 방법을 포함하고,Statement 38. One embodiment of the present disclosure includes the method according to Statement 37;

상기 데이터를 사용하여 연산 요청을 처리하기 위해 처리 요소를 스케줄링하는 단계를 더 포함한다. further comprising scheduling a processing element to process an operation request using the data.

진술 39. 본 개시의 일 실시예는 진술 38에 따른 상기 방법을 포함하고,Statement 39. One embodiment of the present disclosure includes the method according to Statement 38;

상기 데이터를 사용하여 상기 연산 요청을 처리하기 위해 상기 처리 요소를 스케줄링하는 단계는,Scheduling the processing element to process the operation request using the data comprises:

상기 연산 요청을 큐에 할당하는 단계를 포함한다.and assigning the operation request to a queue.

진술 40. 본 개시의 일 실시예는 진술 39에 따른 상기 방법을 포함하고,Statement 40. An embodiment of the present disclosure includes the method according to Statement 39;

상기 연산 요청을 상기 큐에 할당하는 단계는,The step of assigning the operation request to the queue,

상기 연산 요청의 워크로드에 적어도 부분적으로 기초하여 상기 연산 요청을 상기 큐에 할당하는 단계를 포함한다.and assigning the operation request to the queue based at least in part on the workload of the operation request.

진술 41. 본 개시의 일 실시예는 진술 40에 따른 상기 방법을 포함하고,Statement 41. One embodiment of the present disclosure includes the method according to Statement 40;

상기 처리 요소에 의해 상기 큐로부터 상기 연산 요청을 검색하는 단계; 및retrieving the operation request from the queue by the processing element; and

상기 처리 요소에 의해 상기 데이터를 사용하여 상기 연산 요청을 처리하는 단계를 더 포함한다.and processing the operation request using the data by the processing element.

진술 42. 본 개시의 일 실시예는 진술 41에 따른 상기 방법을 포함하고,Statement 42. An embodiment of the present disclosure includes the method according to Statement 41;

완료 상태(completion status)를 제2 큐의 상기 처리 요소로부터 상기 제2 스케줄러로 반환하는 단계를 더 포함한다. further comprising returning, in a second queue, a completion status from the processing element to the second scheduler.

진술 43. 본 개시의 일 실시예는 진술 39에 따른 상기 방법을 포함하고, Statement 43. One embodiment of the present disclosure includes the method according to Statement 39;

상기 제2 스케줄러에 의해 상기 제2 큐로부터 상기 완료 상태를 검색하는 단계를 더 포함한다.and retrieving the completion status from the second queue by the second scheduler.

진술 44. 본 개시의 일 실시예는 진술 43에 따른 상기 방법을 포함하고,Statement 44. One embodiment of the present disclosure includes the method according to Statement 43;

상기 제2 스케줄러에 의해 상기 제2 큐로부터 제2 완료 상태를 검색하는 단계를 더 포함하고, 상기 제2 완료 상태는 제2 처리 요소에 의해 상기 제2 큐에 배치된다.and retrieving a second completion status from the second queue by the second scheduler, the second completion status being placed in the second queue by a second processing element.

진술 45. 본 개시의 일 실시예는 진술 39에 따른 상기 방법을 포함하고, 상기 연산 요청을 상기 큐에 할당하는 단계는, 상기 연산 요청을 상기 처리 요소와 연관된 상기 큐에 할당하는 단계를 포함한다.Statement 45. An embodiment of this disclosure includes the method according to Statement 39, wherein assigning the operation request to the queue comprises assigning the operation request to the queue associated with the processing element. .

진술 46. 본 개시의 일 실시예는 진술 39에 따른 상기 방법을 포함하고, Statement 46. One embodiment of the present disclosure includes the method according to Statement 39;

상기 연산 요청에 적어도 부분적으로 기초하여 상기 처리 요소의 유형을 식별하는 단계를 더 포함하고,further comprising identifying the type of processing element based at least in part on the operation request;

상기 연산 요청을 상기 큐에 할당하는 단계는 상기 연산 요청을 상기 처리 요소의 상기 유형과 연관된 상기 큐에 할당하는 단계를 포함한다.Assigning the operation request to the queue includes assigning the operation request to the queue associated with the type of processing element.
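Statement 46 ties a queue to a type of processing element and picks the type from the operation request. The sketch below shows one way that could look. The operation names, the element types (`cpu`, `gpu`, `accelerator`), and the mapping between them are illustrative assumptions only; the statement does not specify them.

```python
from collections import deque

# Hypothetical mapping from the kind of operation to a processing-element type.
ELEMENT_TYPE_FOR_OPERATION = {
    "embedding_sum": "accelerator",
    "matrix_multiply": "gpu",
    "bookkeeping": "cpu",
}


class TypedQueues:
    """Keeps one queue per processing-element type."""

    def __init__(self, element_types):
        self.queues = {element_type: deque() for element_type in element_types}

    def assign(self, request_id, operation):
        # Identify the element type from the request, then enqueue on that type's queue.
        element_type = ELEMENT_TYPE_FOR_OPERATION[operation]
        self.queues[element_type].append(request_id)
        return element_type


queues = TypedQueues(["cpu", "gpu", "accelerator"])
chosen = [
    queues.assign("op-1", "matrix_multiply"),
    queues.assign("op-2", "bookkeeping"),
    queues.assign("op-3", "matrix_multiply"),
]
```

Any processing element of a given type can then drain that type's queue, so requests wait only behind work that needs the same kind of hardware.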

진술 47. 본 개시의 일 실시예는 비일시적 저장 매체를 포함하는 물품을 포함하고, 상기 비일시적 저장 매체는, 장치에 의해 실행되는 경우: Statement 47. An embodiment of the present disclosure includes an article comprising a non-transitory storage medium, the non-transitory storage medium storing instructions that, when executed by a device, result in:

스케줄러에서 입출력(I/O) 요청을 수신하는 것; receiving an input/output (I/O) request at a scheduler;

상기 입출력 요청의 크기에 적어도 부분적으로 기초하여 상기 입출력 요청을 상기 스케줄러로부터 로드 모듈로 전달하는 것; 및 delivering the I/O request from the scheduler to a load module based at least in part on a size of the I/O request; and

상기 입출력 요청에 적어도 부분적으로 기초하여 상기 로드 모듈에 의해 상기 저장 장치로부터 데이터를 읽는 것의 결과를 초래하는 명령어들을 저장한다. reading, by the load module, data from a storage device based at least in part on the I/O request.

진술 48. 본 개시의 일 실시예는 진술 47에 따른 상기 물품을 포함하고, 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 입출력 요청을 상기 스케줄러로부터 상기 로드 모듈로 전달하는 것은,Statement 48. An embodiment of this disclosure includes the article according to Statement 47, wherein forwarding the I/O request from the scheduler to the load module based at least in part on the size of the I/O request comprises:

상기 스케줄러에 의해 상기 입출력 요청의 상기 크기를 결정하는 것;determining the size of the I/O request by the scheduler;

상기 스케줄러에 의해 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 큐를 식별하는 것; 및 identifying a queue based at least in part on the size of the I/O request by the scheduler; and

상기 스케줄러에 의해 상기 입출력 요청을 상기 큐에 배치하는 것을 포함한다.and placing the I/O request into the queue by the scheduler.

진술 49. 본 개시의 일 실시예는 진술 48에 따른 상기 물품을 포함하고, 상기 로드 모듈에 의해 상기 저장 장치로부터 상기 데이터를 읽는 것은,Statement 49. One embodiment of this disclosure includes the article according to Statement 48, wherein reading the data from the storage device by the load module comprises:

관리자에 의해 상기 큐로부터 상기 입출력 요청을 검색하는 것; 및retrieving the I/O request from the queue by a manager; and

상기 관리자로부터 상기 로드 모듈로 상기 입출력 요청을 전송하는 것을 포함한다.and transmitting the input/output request from the manager to the load module.

진술 50. 본 개시의 일 실시예는 진술 49에 따른 상기 물품을 포함하고, 상기 입출력 요청을 상기 로드 모듈로 전송하는 것은, Statement 50. One embodiment of this disclosure includes the article according to Statement 49, wherein sending the I/O request to the load module comprises:

상기 관리자에 의해 상기 데이터를 저장하는 상기 저장 장치를 식별하는 것; 및 identifying the storage device storing the data by the manager; and

상기 관리자에 의해 상기 저장 장치에 액세스할 수 있는 상기 로드 모듈에 적어도 부분적으로 기초하여 상기 로드 모듈을 식별하는 것을 포함한다. identifying, by the manager, the load module based at least in part on the load module having access to the storage device.

진술 51. 본 개시의 일 실시예는 진술 50에 따른 상기 물품을 포함하고, 상기 관리자에 의해 상기 데이터를 저장하는 상기 저장 장치를 식별하는 것은, Statement 51. An embodiment of the present disclosure includes the article according to Statement 50, wherein identifying, by the manager, the storage device storing the data includes:

상기 입출력 요청의 제1 식별자를 상기 저장 장치의 제2 식별자에 맵핑하는 것을 포함한다. mapping a first identifier of the I/O request to a second identifier of the storage device.

진술 52. 본 개시의 일 실시예는 진술 50에 따른 상기 물품을 포함하고, 상기 관리자에 의해 상기 데이터를 저장하는 상기 저장 장치를 식별하는 것은, Statement 52. An embodiment of the present disclosure includes the article according to Statement 50, wherein identifying, by the manager, the storage device storing the data includes:

상기 관리자에 의해 상기 저장 장치 및 제2 저장 장치로부터 상기 데이터를 저장하는 상기 저장 장치를 식별하는 것을 포함한다. identifying, by the manager, the storage device storing the data from among the storage device and a second storage device.

진술 53. 본 개시의 일 실시예는 진술 49에 따른 상기 물품을 포함하고,Statement 53. An embodiment of the present disclosure includes the article according to Statement 49;

상기 큐는 선입선출(FIFO) 큐를 포함하고;the queue comprises a first-in-first-out (FIFO) queue;

상기 스케줄러에 의해 상기 큐에 상기 입출력 요청을 배치하는 것은,Placing the I/O request to the queue by the scheduler comprises:

상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 입출력 요청에 대한 우선순위 태그를 결정하는 것; 및 determining a priority tag for the I/O request based at least in part on the size of the I/O request; and

상기 우선순위 태그와 상기 큐의 상기 입출력 요청을 연관시키는 것을 포함한다. and associating the priority tag with the input/output request of the queue.

진술 54. 본 개시의 일 실시예는 진술 53에 따른 상기 물품을 포함하고, 상기 관리자에 의해 상기 큐로부터 상기 입출력 요청을 검색하는 것은,Statement 54. One embodiment of this disclosure includes the article according to Statement 53, wherein retrieving the I/O request from the queue by the manager comprises:

상기 우선순위 태그에 적어도 부분적으로 기초하여 상기 관리자에 의해 상기 큐로부터 상기 입출력 요청을 검색하는 것을 포함한다.and retrieving the I/O request from the queue by the manager based at least in part on the priority tag.

진술 55. 본 개시의 일 실시예는 진술 49에 따른 상기 물품을 포함하고, 상기 스케줄러에 의해 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 큐를 식별하는 것은, Statement 55. One embodiment of this disclosure includes the article according to statement 49, wherein identifying the queue based at least in part on the size of the I/O request by the scheduler comprises:

상기 스케줄러에 의해 임계치보다 작은 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 큐를 식별하는 것을 포함한다.and identifying the queue based at least in part on the size of the I/O request being less than a threshold by the scheduler.

진술 56. 본 개시의 일 실시예는 진술 55에 따른 상기 물품을 포함하고, 상기 스케줄러에 의해 상기 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 상기 큐를 식별하는 것은, Statement 56. One embodiment of this disclosure includes the article according to Statement 55, wherein identifying the queue based at least in part on the size of the I/O request by the scheduler comprises:

상기 스케줄러에 의해 임계치보다 큰 제2 입출력 요청의 상기 크기에 적어도 부분적으로 기초하여 제2 큐를 식별하는 것을 더 포함한다.and identifying, by the scheduler, a second queue based at least in part on the size of a second I/O request that is greater than a threshold.

진술 57. 본 개시의 일 실시예는 진술 56에 따른 상기 물품을 포함하고, 상기 관리자에 의해 상기 큐로부터 상기 입출력 요청을 검색하는 것은, Statement 57. One embodiment of this disclosure includes the article according to Statement 56, wherein retrieving the I/O request from the queue by the manager comprises:

라운드 로빈 액세스(round robin access)를 사용하여 상기 큐로부터 상기 입출력 요청을 검색하고 상기 제2 큐로부터 상기 제2 입출력 요청을 검색하는 것을 포함한다.and retrieving the I/O request from the queue and the second I/O request from the second queue using round robin access.

진술 58. 본 개시의 일 실시예는 진술 47에 따른 상기 물품을 포함하고, 상기 로드 모듈에 의해 상기 저장 장치로부터 상기 데이터를 읽는 것은,Statement 58. One embodiment of this disclosure includes the article according to Statement 47, wherein reading the data from the storage device by the load module comprises:

상기 저장 장치로부터 제1 벡터를 읽는 것;reading a first vector from the storage device;

상기 저장 장치로부터 제2 벡터를 읽는 것; 및 reading a second vector from the storage device; and

상기 제1 벡터 및 상기 제2 벡터를 더하여 상기 데이터를 생성하는 것을 포함한다.and generating the data by adding the first vector and the second vector.

진술 59. 본 개시의 일 실시예는 진술 47에 따른 상기 물품을 포함하고, 상기 비일시적 저장 매체는, 상기 장치에 의해 실행될 때, Statement 59. An embodiment of the present disclosure includes the article according to Statement 47, wherein the non-transitory storage medium stores further instructions that, when executed by the device, result in:

상기 로드 모듈로부터 제2 스케줄러로 상기 데이터를 전송하는 것의 결과를 초래하는 추가의 지시어들을 저장한다. transmitting the data from the load module to a second scheduler.

진술 60. 본 개시의 일 실시예는 진술 59에 따른 상기 물품을 포함하고, 상기 비일시적 저장 매체는, 상기 장치에 의해 실행될 때, Statement 60. An embodiment of the present disclosure includes the article according to Statement 59, wherein the non-transitory storage medium stores further instructions that, when executed by the device, result in:

상기 데이터를 사용하여 연산 요청을 처리하기 위한 처리 요소를 스케줄링하는 것의 결과를 초래하는 추가의 지시어들을 저장한다. scheduling a processing element to process an operation request using the data.

진술 61. 본 개시의 일 실시예는 진술 60에 따른 상기 물품을 포함하고, 상기 데이터를 사용하여 상기 연산 요청을 처리하기 위해 상기 처리 요소를 스케줄링하는 것은, Statement 61. An embodiment of the present disclosure includes the article according to Statement 60, wherein scheduling the processing element to process the operation request using the data includes:

상기 연산 요청을 큐에 할당하는 것을 포함한다. assigning the operation request to a queue.

진술 62. 본 개시의 일 실시예는 진술 61에 따른 상기 물품을 포함하고, 상기 연산 요청을 상기 큐에 할당하는 것은, Statement 62. An embodiment of the present disclosure includes the article according to Statement 61, wherein assigning the operation request to the queue includes:

상기 연산 요청의 워크로드에 적어도 부분적으로 기초하여 상기 연산 요청을 상기 큐에 할당하는 것을 포함한다. assigning the operation request to the queue based at least in part on a workload of the operation request.

진술 63. 본 개시의 일 실시예는 진술 62에 따른 상기 물품을 포함하고, 상기 비일시적 저장 매체는, 상기 장치에 의해 실행될 때, Statement 63. An embodiment of the present disclosure includes the article according to Statement 62, wherein the non-transitory storage medium stores further instructions that, when executed by the device, result in:

상기 처리 요소에 의해 상기 큐로부터 상기 연산 요청을 검색하는 것; 및 retrieving, by the processing element, the operation request from the queue; and

상기 처리 요소에 의해 상기 데이터를 사용하여 상기 연산 요청을 처리하는 것의 결과를 초래하는 추가의 지시어들을 저장한다. processing, by the processing element, the operation request using the data.

진술 64. 본 개시의 일 실시예는 진술 63에 따른 상기 물품을 포함하고, 상기 비일시적 저장 매체는, 상기 장치에 의해 실행될 때, Statement 64. An embodiment of the present disclosure includes the article according to Statement 63, wherein the non-transitory storage medium stores further instructions that, when executed by the device, result in:

완료 상태를 제2 큐의 상기 처리 요소로부터 상기 제2 스케줄러로 반환하는 것의 결과를 초래하는 추가의 지시어들을 저장한다. returning, in a second queue, a completion status from the processing element to the second scheduler.

진술 65. 본 개시의 일 실시예는 진술 61에 따른 상기 물품을 포함하고, 상기 비일시적 저장 매체는, 상기 장치에 의해 실행될 때, Statement 65. An embodiment of the present disclosure includes the article according to Statement 61, wherein the non-transitory storage medium stores further instructions that, when executed by the device, result in:

상기 제2 스케줄러에 의해 상기 제2 큐로부터 상기 완료 상태를 검색하는 것의 결과를 초래하는 추가의 지시어들을 저장한다. retrieving, by the second scheduler, the completion status from the second queue.

진술 66. 본 개시의 일 실시예는 진술 65에 따른 상기 물품을 포함하고, 상기 비일시적 저장 매체는, 상기 장치에 의해 실행될 때, Statement 66. An embodiment of the present disclosure includes the article according to Statement 65, wherein the non-transitory storage medium stores further instructions that, when executed by the device, result in:

상기 제2 스케줄러에 의해 상기 제2 큐로부터 제2 완료 상태를 검색하고, 상기 제2 완료 상태는 제2 처리 요소에 의해 상기 제2 큐에 배치되는 것의 결과를 초래하는 추가의 지시어들을 저장한다. retrieving, by the second scheduler, a second completion status from the second queue, the second completion status having been placed in the second queue by a second processing element.

진술 67. 본 개시의 일 실시예는 진술 61에 따른 상기 물품을 포함하고, 상기 연산 요청을 상기 큐에 할당하는 것은, Statement 67. An embodiment of the present disclosure includes the article according to Statement 61, wherein assigning the operation request to the queue includes:

상기 연산 요청을 상기 처리 요소와 연관된 상기 큐에 할당하는 것을 포함한다. assigning the operation request to the queue associated with the processing element.

진술 68. 본 개시의 일 실시예는 진술 61에 따른 상기 물품을 포함하고, Statement 68. An embodiment of the present disclosure includes the article according to Statement 61;

상기 데이터를 사용하여 상기 연산 요청을 처리하기 위해 상기 처리 요소를 스케줄링하는 것은, 상기 연산 요청에 적어도 부분적으로 기초하여 상기 처리 요소의 유형을 식별하는 것을 포함하고, scheduling the processing element to process the operation request using the data includes identifying a type of the processing element based at least in part on the operation request; and

상기 연산 요청을 상기 큐에 할당하는 것은, 상기 연산 요청을 상기 처리 요소의 유형과 연관된 상기 큐에 할당하는 것을 포함한다.Assigning the operation request to the queue includes assigning the operation request to the queue associated with the type of processing element.

결과적으로, 여기에서 기술된 실시예들에 대한 다양한 변경의 관점에서, 상세한 설명 및 수반되는 자료는 단지 예시를 위한 것일 뿐이고, 본 개시의 범위를 한정하기 위한 것으로 간주되어서는 안 된다. 따라서 본 개시로서 청구되는 것은, 다음의 청구범위 및 그와 동등한 것들의 범위 및 사상 내에서 올 수 있는 모든 그러한 수정들이다.Consequently, the detailed description and accompanying material, in view of various modifications to the embodiments described herein, are illustrative only and should not be construed as limiting the scope of the present disclosure. Accordingly, what is claimed as this disclosure are all such modifications as may come within the scope and spirit of the following claims and equivalents thereto.

110: 프로세서
115: 메모리
120-1: 저장 장치
120-2: 저장 장치
125: 메모리 컨트롤러
130: 장치 드라이버
135: 다중 처리 시스템
140: 연산 시스템
110: processor
115: memory
120-1: storage device
120-2: storage device
125: memory controller
130: device driver
135: multiprocessing system
140: operation system

Claims

1. A system, comprising:
a storage device that stores data;
a load module that reads the data from the storage device based at least in part on an input/output (I/O) request; and
a scheduler that receives the I/O request and places the I/O request into a queue for delivery to the load module based at least in part on a size of the I/O request being less than a threshold.

2. The system of claim 1, further comprising a manager that retrieves the I/O request from the queue and assigns the I/O request to the load module.

3. The system of claim 2, further comprising:
a second storage device that stores second data; and
a second load module that reads the second data from the second storage device,
wherein the manager assigns the I/O request to the load module based at least in part on the I/O request requesting the data.

4. The system of claim 3, wherein:
the I/O request includes a first identifier of the data; and
the manager includes a table mapping the first identifier of the data to a second identifier of the storage device.

5. The system of claim 2, further comprising a second load module that reads the data from the storage device.

6. The system of claim 5, wherein the manager selects the load module to process the I/O request.

7. The system of claim 2, wherein:
the system further comprises a second queue; and
the scheduler places a second I/O request into the second queue based at least in part on a size of the second I/O request exceeding the threshold.

8. The system of claim 7, wherein the manager retrieves the I/O request from the queue and the second I/O request from the second queue using round robin access.

9. The system of claim 1, wherein the load module comprises:
a reader that reads a first vector and a second vector of the data from the storage device; and
an adder that adds the first vector and the second vector.

10. The system of claim 1, wherein the load module is configured to transmit the data to a second scheduler.