TWI781509B - System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device - Google Patents


Info

Publication number
TWI781509B
Authority
TW
Taiwan
Prior art keywords
sparse
unit
dense
elements
element access
Application number
TW110100489A
Other languages
Chinese (zh)
Other versions
TW202131172A (en)
Inventor
Ravi Narayanaswami (拉非 納拉亞那斯瓦密)
Rahul Nagarajan (拉浩 納加拉簡)
Dong Hyuk Woo (禹同爀)
Christopher Daniel Leary (克里斯多福 丹尼爾 理瑞)
Original Assignee
Google LLC
Application filed by Google LLC
Publication of TW202131172A
Application granted
Publication of TWI781509B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Complex Calculations (AREA)
  • Error Detection And Correction (AREA)
  • Multi Processors (AREA)
  • Magnetic Resonance Imaging Apparatus (AREA)

Abstract

Methods, systems, and apparatus, including a system for transforming sparse elements to a dense matrix. The system is configured to receive a request for an output matrix based on sparse elements including sparse elements associated with a first dense matrix and sparse elements associated with a second dense matrix; obtain the sparse elements associated with the first dense matrix fetched by a first group of sparse element access units; obtain the sparse elements associated with the second dense matrix fetched by a second group of sparse element access units; and transform the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix to generate the output dense matrix that includes the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix.

Description

System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device

This specification generally relates to using circuitry to process a matrix.

According to one novel aspect of the subject matter described in this specification, a matrix processor can be used to perform a sparse-to-dense or a dense-to-sparse matrix transformation. In general, a high-performance computing system can use linear algebra routines to process a matrix. In some instances, the matrix may be too large to fit in one data store, and different parts of the matrix may be stored sparsely in different locations of a distributed data storage system. To load the matrix, the central processing unit of a computing system can instruct a separate circuit to access the different parts of the matrix. The circuit may include multiple memory controllers arranged according to a network topology, where sparse data may be partitioned and stored based on a predetermined set of rules. Each memory controller can aggregate sparse data based on the predetermined set of rules, perform concurrent computations on the sparse data, and generate a dense matrix; these dense matrices can be concatenated together for further processing by the central processing unit.

In general, one novel aspect of the subject matter described in this specification can be embodied in a system for transforming sparse elements into a dense matrix. The system includes a first group of sparse element access units configured to fetch sparse elements associated with a first dense matrix, and a second group of sparse element access units configured to fetch sparse elements associated with a second dense matrix that is different from the first dense matrix.
The system is configured to: receive a request for an output matrix based on sparse elements, including sparse elements associated with the first dense matrix and sparse elements associated with the second dense matrix; obtain the sparse elements associated with the first dense matrix fetched by the first group of sparse element access units; obtain the sparse elements associated with the second dense matrix fetched by the second group of sparse element access units; and transform the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix to generate the output dense matrix, which includes the sparse elements associated with both the first and the second dense matrix.

These and other implementations can each optionally include one or more of the following features. For example, the first group of sparse element access units may include a first sparse element access unit and a second sparse element access unit. The first sparse element access unit can be configured to fetch a first subset of the sparse elements associated with the first dense matrix. The second sparse element access unit can be configured to fetch a second, different subset of the sparse elements associated with the first dense matrix.

The first sparse element access unit is configured to receive a request for a plurality of sparse elements that includes the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix, and to transmit the request to the second sparse element access unit. The first sparse element access unit may be configured to determine that an identity of a particular sparse element of the plurality of sparse elements matches an identity of a sparse element in the first subset of the sparse elements associated with the first dense matrix.
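The request path in the preceding paragraphs can be sketched in a few lines of Python. This is a minimal software model under stated assumptions, not the circuit itself: each sparse element is identified by a hypothetical (dense matrix ID, element index) pair, each unit's data partition is modeled as a dictionary, and the units of one group are chained so that each unit forwards the request downstream. A unit fetches only the elements whose identities match its assignment and appends them to the partial dense result it received. The class and method names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical element identity: (dense matrix ID, element index).
ElementId = tuple[int, int]


@dataclass
class SparseElementAccessUnit:
    """Toy model of one sparse element access unit; all names are illustrative."""

    name: str
    partition: dict[ElementId, list[float]]  # data partition this unit reads from
    downstream: SparseElementAccessUnit | None = None  # next unit in the group

    def handle(self, request: list[ElementId], partial: list[float]) -> list[float]:
        # Identity matching: keep only the requested IDs this unit is assigned.
        owned = [eid for eid in request if eid in self.partition]
        # Fetch the matched elements and append them to the dense data so far.
        dense = partial + [v for eid in owned for v in self.partition[eid]]
        # Forward the request, together with the accumulated result, downstream.
        if self.downstream is not None:
            return self.downstream.handle(request, dense)
        return dense


# Two units of one group, each reading a different data partition.
second = SparseElementAccessUnit("X1,2", {(1, 3): [3.0], (1, 4): [4.0]})
first = SparseElementAccessUnit("X1,1", {(1, 1): [1.0], (1, 2): [2.0]}, downstream=second)

# (2, 7) is owned by neither unit, so it contributes nothing here.
print(first.handle([(1, 2), (1, 3), (2, 7)], []))  # [2.0, 3.0]
```

Passing the partial result along the chain, rather than having each unit reply to the requester separately, is one plausible way for one unit's dense output to be combined with the next unit's; the specification does not prescribe this exact data flow.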
In response to determining that the identity of the particular sparse element matches the identity of a sparse element in the first subset, the first sparse element access unit may be configured to fetch the first subset of the sparse elements associated with the first dense matrix, including the particular sparse element.

The first sparse element access unit can be configured to fetch the first subset of the sparse elements associated with the first dense matrix from a first data partition, and the second sparse element access unit can be configured to fetch the second, different subset of the sparse elements associated with the first dense matrix from a second, different data partition. The first sparse element access unit may be configured to transform the first subset of the sparse elements associated with the first dense matrix to produce a third dense matrix, and the second sparse element access unit can be configured to: receive the third dense matrix; transform the second subset of the sparse elements associated with the second dense matrix to produce a fourth dense matrix; and transform the third dense matrix and the fourth dense matrix to produce a fifth dense matrix that includes the first subset and the second subset of the sparse elements associated with the first dense matrix.

The first group and the second group of sparse element access units may be arranged in a two-dimensional mesh configuration. Alternatively, they may be arranged in a two-dimensional torus configuration. The sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix may be multidimensional matrices, and the output dense matrix may be a vector.
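The two arrangements differ only in whether links wrap around the edges of the grid. The sketch below lists the units directly coupled to a given unit X(row, col); it assumes 1-indexed positions in a grid of M rows and N columns with nearest-neighbor links only, and the function name is hypothetical. In the mesh, a corner unit such as X1,1 has two neighbors (X1,2 and X2,1); in the torus, it is also coupled to X1,N and XM,1.

```python
def neighbors(row: int, col: int, m: int, n: int, torus: bool = False) -> list[tuple[int, int]]:
    """Return the units directly coupled to X(row, col) in an m x n grid (1-indexed).

    In a 2-D mesh, edge units simply have fewer neighbors; in a 2-D torus,
    every row and every column also wraps around to its far end.
    """
    result: list[tuple[int, int]] = []
    for dr, dc in ((0, -1), (0, 1), (-1, 0), (1, 0)):  # left, right, up, down
        r, c = row + dr, col + dc
        if torus:
            # Wrap indices back into the ranges 1..m and 1..n.
            r = (r - 1) % m + 1
            c = (c - 1) % n + 1
        elif not (1 <= r <= m and 1 <= c <= n):
            continue  # mesh: there is no link off the edge of the grid
        # Skip self-links and duplicates that wrapping creates in tiny grids.
        if (r, c) != (row, col) and (r, c) not in result:
            result.append((r, c))
    return result


print(neighbors(1, 1, 4, 4))              # [(1, 2), (2, 1)]
print(neighbors(1, 1, 4, 4, torus=True))  # [(1, 4), (1, 2), (4, 1), (2, 1)]
```

The wrap-around links give the torus a uniform degree of four, so no unit sits at an edge where a broadcast or a returning dense matrix has fewer paths to travel.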
The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages. Connecting the memory controller units according to a network topology allows the storage of sparse data to be partitioned according to a predetermined set of rules. Offloading the sparse-to-dense data loading task from the central processing unit to a separate circuit increases the computational bandwidth of the central processing unit and reduces the processing cost of the system. Using dedicated circuitry avoids using processors dedicated to dense linear algebra to fetch sparse data. By using multiple memories simultaneously in a distributed system, the total aggregate bandwidth available in the distributed system is higher than the bandwidth of a single memory bank, which must serialize accesses and whose aggregate bandwidth is limited to that of a single memory.

Other implementations of this and other aspects include corresponding systems, apparatus, and computer programs, encoded on computer storage devices, configured to perform the actions of the methods. A system of one or more computers can be so configured by virtue of software, firmware, hardware, or a combination of them installed on the system that in operation causes the system to perform the actions. One or more computer programs can be so configured by virtue of having instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.

The details of one or more implementations of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other potential features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
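The bandwidth advantage can be stated as an idealized formula. Assume P memories that can each be accessed independently and concurrently at bandwidth B_i, with no contention elsewhere in the system; the symbols P, B_i, and B are introduced here for illustration only.

```latex
B_{\text{agg}} = \sum_{i=1}^{P} B_i ,
\qquad
B_i = B \ \text{for all } i \;\Longrightarrow\; \frac{B_{\text{agg}}}{B_{\text{single}}} = \frac{P\,B}{B} = P .
```

A single memory bank that must serialize the same accesses stays at B regardless of how many requests are pending, so under these assumptions the ratio is P; in practice, interconnect and scheduling overheads would be expected to reduce it.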

Generally, data can be represented in the form of a matrix, and a computing system can manipulate the data using linear algebra algorithms. A matrix can be a one-dimensional vector or a multidimensional matrix. A matrix can be represented by a data structure such as a database table or a variable. However, when a matrix is too large, it may not be possible to store the entire matrix in one data store. A dense matrix can be transformed into multiple sparse elements, where each sparse element can be stored in a different data store. A sparse element of a dense matrix may be a matrix in which only a small submatrix (for example, a single-valued element, a row, a column, or a submatrix) has nonzero values. When a computing system needs to access the dense matrix, the central processing unit (CPU) can start a thread that goes to each of the data stores to fetch the stored sparse elements, and then apply a sparse-to-dense transformation to restore the dense matrix.
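The sparse-element representation can be illustrated with a minimal Python sketch. It assumes the simplest split the paragraph mentions, one row per sparse element, and keeps each sparse element at the full shape of the dense matrix so that its zeros are visible; the function names and the dictionary standing in for separate data stores are illustrative. Because the nonzero parts of the sparse elements do not overlap, summing them entrywise restores the dense matrix, which is the sparse-to-dense transformation in its most basic form.

```python
Matrix = list[list[float]]


def to_sparse_elements(dense: Matrix) -> list[Matrix]:
    """Split a dense matrix into row-wise sparse elements.

    Each sparse element has the full shape of `dense`, but only one of its
    rows is nonzero; every other entry is zero.
    """
    rows, cols = len(dense), len(dense[0])
    elements = []
    for r in range(rows):
        element = [[0.0] * cols for _ in range(rows)]
        element[r] = list(dense[r])  # copy the single nonzero row
        elements.append(element)
    return elements


def to_dense(elements: list[Matrix]) -> Matrix:
    """Restore the dense matrix by summing the sparse elements entrywise."""
    rows, cols = len(elements[0]), len(elements[0][0])
    dense = [[0.0] * cols for _ in range(rows)]
    for element in elements:
        for r in range(rows):
            for c in range(cols):
                dense[r][c] += element[r][c]
    return dense


dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Hypothetical data stores: one sparse element per store.
stores = {f"store_{i}": e for i, e in enumerate(to_sparse_elements(dense))}
assert to_dense(list(stores.values())) == dense
```

A practical store would more likely keep only the nonzero submatrix together with its position rather than a full-size, zero-padded copy.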
However, fetching all of the sparse elements can take a long time, and as a result the computational bandwidth of the CPU can be underutilized. In some cases, a computing system may need to access the sparse elements of several dense matrices to form a new dense matrix, where the dense matrices, or their sparse elements, may not be of equal size. The CPU idle time associated with a thread visiting each of the data stores to fetch the sparse elements of the different dense matrices may be subject to different latencies, and may further degrade the performance of the computing device. A hardware sparse-to-dense transformation unit that is separate from the CPU can increase the computational bandwidth of the processor by collecting sparse elements and transforming them into a dense matrix independently of the CPU.

FIG. 1 shows a block diagram of an example computing system 100 for transforming sparse elements from one or more dense matrices to produce a dense matrix. The computing system 100 includes a processing unit 102, a sparse-to-dense transformation unit 104, and data partitions 106a to 106k, where k is an integer greater than 1. In general, the processing unit 102 processes an instruction for accessing a target dense matrix and sends an instruction 110 to the sparse-to-dense transformation unit 104 to generate the target dense matrix.
The sparse-to-dense transform unit 104 accesses the corresponding sparse elements 108a to 108n from one or more of the data partitions 106a to 106k, where n is an integer greater than 1. The sparse-to-dense transform unit 104 generates the target dense matrix 112 using the corresponding sparse elements 108a-108n and provides the target dense matrix 112 to the processing unit 102 for further processing. For example, the sparse elements 108a-108n may be two-dimensional matrices of different sizes, and the sparse-to-dense transform unit 104 may convert each of the sparse elements 108a-108n into a vector and concatenate the n vectors into a single vector to produce the target dense matrix 112. In certain implementations, the processing unit 102 may process an instruction to update a target dense matrix and send an updated dense matrix to the sparse-to-dense transform unit 104. The sparse-to-dense transform unit 104 may transform the updated dense matrix into corresponding sparse elements and update one or more of the sparse elements stored in the data partitions 106a-106k accordingly. The processing unit 102 is configured to process instructions for execution within the computing system 100 and may include one or more processors. In some implementations, the processing unit 102 is configured to process the target dense matrix 112 generated by the sparse-to-dense transform unit 104. In certain other implementations, the processing unit 102 may be configured to request the sparse-to-dense transform unit 104 to generate the target dense matrix 112, and another processing unit may be configured to process the target dense matrix 112. The data partitions 106a-106k store data including the sparse elements 108a-108n. In some implementations, the data partitions 106a-106k may be one or more volatile memory units. In some other implementations, the data partitions 106a-106k may be one or more non-volatile memory units.
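The flatten-and-concatenate step for two-dimensional sparse elements of different sizes, and its inverse on the update path, might look like the following sketch. Row-major flattening and the function names are assumptions; the text does not fix a flattening order. The example shapes 100×10, 20×100, and 3×3 are borrowed from the FIG. 2C example, whose concatenation yields a 1×3009 vector.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def to_dense_vector(elements: List[Matrix]) -> List[float]:
    """Flatten each 2-D element row by row and concatenate the results in order."""
    return [x for m in elements for row in m for x in row]


def split_dense_vector(vector: List[float], shapes: List[Tuple[int, int]]) -> List[Matrix]:
    """Update path: cut an (updated) dense vector back into elements of known shapes."""
    elements: List[Matrix] = []
    pos = 0
    for rows, cols in shapes:
        chunk = vector[pos:pos + rows * cols]
        elements.append([chunk[r * cols:(r + 1) * cols] for r in range(rows)])
        pos += rows * cols
    if pos != len(vector):
        raise ValueError("vector length does not match the element shapes")
    return elements


if __name__ == "__main__":
    shapes = [(100, 10), (20, 100), (3, 3)]  # sizes borrowed from the FIG. 2C example
    elements = [[[1.0] * c for _ in range(r)] for r, c in shapes]
    vector = to_dense_vector(elements)
    print(len(vector))                                     # 3009
    print(split_dense_vector(vector, shapes) == elements)  # True
```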
The data partitions 106a-106k may also be another form of computer-readable medium, such as devices in a storage area network or in other configurations. The data partitions 106a-106k may be coupled to the sparse-to-dense transform unit 104 using electrical, optical, or wireless connections. In certain implementations, the data partitions 106a-106k may be part of the sparse-to-dense transform unit 104. The sparse-to-dense transform unit 104 is configured to determine a dense matrix based on sparse elements. In some implementations, the sparse-to-dense transform unit 104 can also be configured to determine the locations of sparse elements based on a dense matrix. In certain implementations, the sparse-to-dense transform unit 104 may include a plurality of interconnected sparse element access units, as explained in more detail below with reference to FIGS. 2A-2D. FIG. 2A shows an example sparse-to-dense transform unit 200, which may correspond to the sparse-to-dense transform unit 104. The sparse-to-dense transform unit 200 includes M × N sparse element access units X1,1 to XM,N physically or logically arranged in M columns and N rows, where M and N are integers equal to or greater than 1. In certain implementations, the sparse-to-dense transform unit 200 may include additional circuitry configured to process data. In general, the sparse-to-dense transform unit 200 is configured to receive a request for a dense matrix and to determine the dense matrix based on the corresponding sparse elements accessible by the sparse element access units X1,1 through XM,N. In general, each sparse element access unit is configured to access a specified set of sparse elements, as explained in more detail below with reference to FIGS. 3A-3B. In some implementations, a sparse element access unit may be a single instruction, multiple data (SIMD) processing device.
In some embodiments, the sparse element access units X1,1 to XM,N can be physically or logically arranged in a two-dimensional mesh configuration. For example, sparse element access unit X1,1 is directly coupled to sparse element access units X1,2 and X2,1. As another example, sparse element access unit X2,2 is directly coupled to sparse element access units X2,1, X3,2, X2,3, and X1,2. The coupling between two sparse element access units may be an electrical connection, an optical connection, a wireless connection, or any other suitable connection. In some other embodiments, the sparse element access units X1,1 to XM,N may be physically or logically arranged in a two-dimensional torus configuration. For example, sparse element access unit X1,1 is directly coupled to sparse element access units X1,2, X2,1, X1,N, and XM,1. As another example, sparse element access unit XM,N is directly coupled to sparse element access units XM,N-1, XM-1,N, XM,1, and X1,N. In some implementations, the sparse-to-dense transform unit 200 may be configured to partition the sparse elements transformed from dense matrices according to a predetermined set of conditions. Each column of the sparse element access units X1,1 to XM,N can be assigned to access sparse elements transformed from particular dense matrices. For example, the sparse-to-dense transform unit 200 can be configured to access sparse elements transformed from dense matrices corresponding to 1,000 different database tables of a computer model, one or more of which may be of different sizes. The first column 202 of sparse element access units can be configured to access sparse elements transformed from database table No. 1 to database table No.
100, the second column 204 of sparse element access units can be configured to access sparse elements transformed from database table No. 101 to database table No. 300, and the M-th column 206 of sparse element access units can be configured to access sparse elements transformed from database table No. 751 to database table No. 1,000. In some implementations, the partitioning may be configured by hardware instructions before a processor uses the sparse-to-dense transform unit 200 to access sparse elements. Each row of the sparse element access units X1,1 to XM,N can further be assigned to access a subset of the sparse elements transformed from a particular dense matrix. For example, the dense matrix corresponding to database table No. 1 can be transformed into 1,000 sparse elements, which can be accessed by the first column 202, as explained above. Sparse element access unit X1,1 can be configured to access sparse elements No. 1 to No. 200 of database table No. 1, and sparse element access unit X1,2 can be configured to access sparse elements No. 201 to No. 500 of database table No. 1. As another example, the dense matrix corresponding to database table No. 2 can be transformed into 500 sparse elements, which can be accessed by the first column 202, as explained above. Sparse element access unit X1,1 can be configured to access sparse elements No. 1 to No. 50 of database table No. 2, and sparse element access unit X1,2 can be configured to access sparse elements No. 51 to No. 200 of database table No. 2. As another example, the dense matrix corresponding to database table No. 1,000 can be transformed into 10,000 sparse elements, which can be accessed by the M-th column 206, as explained above. Sparse element access unit XM,1 can be configured to access sparse elements No. 1 to No. 2,000 of database table No.
1,000, and sparse element access unit XM,N can be configured to access sparse elements No. 9,000 to No. 10,000 of database table No. 1,000. FIG. 2B shows an example of how the sparse-to-dense transform unit 200 may request sparse elements using a two-dimensional mesh of sparse element access units. As an example, a processing unit may execute an instruction requesting that the sparse-to-dense transform unit 200 produce a dense one-dimensional vector using sparse elements No. 1 to No. 50 of database table No. 1, sparse elements No. 100 to No. 200 of database table No. 2, and sparse elements No. 9,050 to No. 9,060 of database table No. 1,000. After the sparse-to-dense transform unit 200 receives the request from the processing unit, it may instruct sparse element access unit X1,1 to broadcast a request for the sparse elements to the other sparse element access units in the mesh network. Sparse element access unit X1,1 may broadcast a request 222 to sparse element access unit X1,2 and a request 224 to sparse element access unit X2,1. After receiving the request 222, sparse element access unit X1,2 may broadcast a request 226 to sparse element access unit X1,3. In some implementations, a sparse element access unit can be configured to broadcast a request to another sparse element access unit based on a routing scheme. For example, sparse element access unit X1,2 may not be configured to broadcast a request to sparse element access unit X2,2, because sparse element access unit X2,2 is configured to receive the broadcast from sparse element access unit X2,1. The routing scheme can be static or dynamically generated; for example, the routing scheme can be a lookup table. In certain implementations, a sparse element access unit can be configured to forward a request, such as the request 224, to another sparse element access unit based on the content of that request.
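The mesh and torus couplings, and a static routing scheme consistent with the FIG. 2B walk-through, can be sketched as below. Units are indexed (m, n) from 1 as in Xm,n. The routing rule (units Xm,1 forward along the first index and to Xm,2; all others forward only to Xm,n+1) is an inference from the example, in which X1,1 sends to X1,2 and X2,1 while X1,2 does not send to X2,2; the function names are hypothetical. Under this rule each unit hears the request exactly once, and reversing it gives the path along which results return to X1,1.

```python
from typing import List, Tuple

Unit = Tuple[int, int]  # (m, n), 1-indexed as in Xm,n


def neighbors(unit: Unit, M: int, N: int, torus: bool = False) -> List[Unit]:
    """Directly coupled units in a 2-D mesh, or with wrap-around links in a 2-D torus."""
    m, n = unit
    found = set()
    for dm, dn in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        mm, nn = m + dm, n + dn
        if torus:  # wrap 0 -> M (or N) and M+1 -> 1
            mm, nn = (mm - 1) % M + 1, (nn - 1) % N + 1
        if 1 <= mm <= M and 1 <= nn <= N and (mm, nn) != unit:
            found.add((mm, nn))
    return sorted(found)


def broadcast_targets(unit: Unit, M: int, N: int) -> List[Unit]:
    """Assumed static routing: Xm,1 forwards to Xm+1,1 and Xm,2;
    every other unit forwards only to Xm,n+1."""
    m, n = unit
    targets = []
    if n == 1 and m < M:
        targets.append((m + 1, 1))
    if n < N:
        targets.append((m, n + 1))
    return targets


def sender(unit: Unit) -> Unit:
    """Unit that forwarded the broadcast; results travel back to it."""
    m, n = unit
    return (m, n - 1) if n > 1 else (m - 1, 1)


def return_path(unit: Unit) -> List[Unit]:
    """Hops a result takes from `unit` back to X1,1."""
    path = [unit]
    while path[-1] != (1, 1):
        path.append(sender(path[-1]))
    return path


if __name__ == "__main__":
    print(broadcast_targets((1, 1), 3, 4))  # [(2, 1), (1, 2)]
    print(broadcast_targets((1, 2), 3, 4))  # [(1, 3)]; X2,2 is fed by X2,1 instead
    print(return_path((3, 4)))  # [(3, 4), (3, 3), (3, 2), (3, 1), (2, 1), (1, 1)]
```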
For example, the request 224 may include identifications of the requested sparse elements (e.g., database table No. 1, sparse elements No. 1 through No. 50), and based on these identifications, sparse element access unit X1,2 may determine whether to broadcast the request 224 to sparse element access unit X2,2 and/or sparse element access unit X1,3. The broadcast propagates through the mesh network until sparse element access unit XM,N receives a request 230 from sparse element access unit XM,N-1. FIG. 2C shows an example of how the sparse-to-dense transform unit 200 may generate a requested dense matrix using the two-dimensional mesh network of sparse element access units. In some implementations, after a sparse element access unit receives a broadcast request, it is configured to determine whether it is configured to access any of the requested sparse elements. For example, sparse element access unit X1,1 may determine that it is configured to access sparse elements No. 1 to No. 50 of database table No. 1, but not sparse elements No. 100 to No. 200 of database table No. 2 or sparse elements No. 9,050 to No. 9,060 of database table No. 1,000. In response to determining that it is configured to access sparse elements No. 1 to No. 50 of database table No. 1, sparse element access unit X1,1 may fetch those sparse elements from the data partition in which they are stored and generate a dense matrix 242 based on them. As another example, sparse element access unit X2,1 may determine that it is not configured to access any of sparse elements No. 1 to No. 50 of database table No. 1, sparse elements No. 100 to No. 200 of database table No. 2, or sparse elements No. 9,050 to No. 9,060 of database table No. 1,000.
In response to determining that it is not configured to access any of the requested sparse elements, sparse element access unit X2,1 may take no further action. As another example, sparse element access unit X1,2 may determine that it is configured to access sparse elements No. 100 to No. 200 of database table No. 2, but not sparse elements No. 1 to No. 50 of database table No. 1 or sparse elements No. 9,050 to No. 9,060 of database table No. 1,000. In response to determining that it is configured to access sparse elements No. 100 to No. 200 of database table No. 2, sparse element access unit X1,2 may fetch those sparse elements from the data partition in which they are stored and generate a dense matrix 244 based on them. In some implementations, after a sparse element access unit generates a dense matrix, it may be configured to forward the dense matrix to the sender of the broadcast request. Here, sparse element access unit X1,2 forwards the dense matrix 244 to sparse element access unit X1,1. As another example, sparse element access unit XM,N may determine that it is configured to access sparse elements No. 9,050 to No. 9,060 of database table No. 1,000, but not sparse elements No. 1 to No. 50 of database table No. 1 or sparse elements No. 100 to No. 200 of database table No. 2. In response to determining that it is configured to access sparse elements No. 9,050 to No. 9,060 of database table No. 1,000, sparse element access unit XM,N may fetch those sparse elements from the data partition in which they are stored and generate a dense matrix 246 based on them. As before, after generating the dense matrix, the sparse element access unit may forward it to the sender of the broadcast request.
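A lookup table of inclusive element ranges is one way a unit could decide which part of a broadcast request, if any, it should fetch. In this hypothetical sketch the grid size is assumed to be M = 4 and N = 3, so X4,1 and X4,3 stand in for XM,1 and XM,N; the ranges are the example assignments given earlier in this description, and the request is the FIG. 2C request. A unit with no overlap, such as X2,1, gets an empty list and takes no action.

```python
from typing import Dict, List, Tuple

Unit = Tuple[int, int]
Span = Tuple[int, int, int]  # (database table No., first element No., last element No.), inclusive

# Hypothetical lookup table for a grid with M = 4 and N = 3, filled in with the
# example assignments from the description (X4,1 and X4,3 play XM,1 and XM,N).
ASSIGNMENTS: Dict[Unit, List[Span]] = {
    (1, 1): [(1, 1, 200), (2, 1, 50)],
    (1, 2): [(1, 201, 500), (2, 51, 200)],
    (4, 1): [(1000, 1, 2000)],
    (4, 3): [(1000, 9000, 10000)],
}


def assigned_part(unit: Unit, request: List[Span]) -> List[Span]:
    """Portions of a broadcast request this unit should fetch; empty means no action."""
    mine = []
    for table, lo, hi in request:
        for t, first, last in ASSIGNMENTS.get(unit, []):
            start, end = max(lo, first), min(hi, last)
            if t == table and start <= end:
                mine.append((table, start, end))
    return mine


if __name__ == "__main__":
    request = [(1, 1, 50), (2, 100, 200), (1000, 9050, 9060)]
    for unit in [(1, 1), (2, 1), (1, 2), (4, 3)]:
        print(unit, assigned_part(unit, request))
```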
Here, sparse element access unit XM,N forwards the dense matrix 246 to sparse element access unit XM,N-1. In the next cycle, sparse element access unit XM,N-1 forwards the dense matrix 246 to sparse element access unit XM,N-2. This continues until sparse element access unit X2,1 has forwarded the dense matrix 246 to sparse element access unit X1,1. In certain implementations, the sparse-to-dense transform unit 200 is configured to transform the dense matrices produced by the sparse element access units into a single dense matrix for the processor unit. Here, the sparse-to-dense transform unit 200 transforms the dense matrices 242, 244, and 246 into one dense matrix for the processor unit. For example, dense matrix 242 may have dimensions of 100×10, dense matrix 244 may have dimensions of 20×100, and dense matrix 246 may have dimensions of 3×3. The sparse-to-dense transform unit 200 may transform the dense matrices 242, 244, and 246 into one vector having a size of 1×3009 (1,000 + 2,000 + 9 = 3,009 elements). Advantageously, partitioning the columns according to dense matrices (e.g., database tables) allows the sparse-to-dense transform unit 200 to obtain all of the requested sparse elements once the resulting dense matrices have propagated back to row 1. Row partitioning reduces the bandwidth bottleneck that would arise if a single sparse element access unit had to access too many sparse elements. FIG. 2D shows an example of how the sparse-to-dense transform unit 200 may update sparse elements based on a dense matrix using the two-dimensional mesh network of sparse element access units. As an example, a processing unit may execute an instruction requesting that the sparse-to-dense transform unit 200 update the stored sparse elements with a dense one-dimensional vector that was generated using sparse elements No. 1 to No. 50 of database table No. 1 and sparse elements No. 9,050 to No. 9,060 of database table No. 1,000.
After the sparse-to-dense transform unit 200 receives the request from the processing unit, it can instruct sparse element access unit X1,1 to broadcast a sparse element update request to the other sparse element access units in the mesh network, where the sparse element update request may include the dense one-dimensional vector provided by the processing unit. In some implementations, sparse element access unit X1,1 may determine whether it is assigned to access sparse elements contained in the dense one-dimensional vector. In response to determining that it is assigned to access a sparse element contained in the dense one-dimensional vector, sparse element access unit X1,1 may update that sparse element as stored in the data partition. Here, sparse element access unit X1,1 determines that it is assigned to access sparse elements No. 1 to No. 50 of database table No. 1 and executes an instruction to update those sparse elements. Sparse element access unit X1,1 may broadcast a sparse element update request 252 to sparse element access unit X1,2 and a sparse element update request 254 to sparse element access unit X2,1. After receiving the sparse element update request 252, sparse element access unit X1,2 may determine that it is not assigned to access any of the sparse elements contained in the dense one-dimensional vector, and broadcasts a request 256 to sparse element access unit X1,3. The broadcast propagates through the mesh network until sparse element access unit XM,N receives a request 260 from sparse element access unit XM,N-1. Here, sparse element access unit XM,N determines that it is assigned to access sparse elements No. 9,050 to No. 9,060 of database table No. 1,000 and executes an instruction to update those sparse elements in the data partition. FIG.
3A shows an example sparse element access unit 300. The sparse element access unit 300 may be any one of the sparse element access units X1,1 to XM,N. In general, the sparse element access unit 300 is configured to receive, from the node network 320, a request 342 to fetch sparse elements stored in one or more data partitions, and to transform the fetched sparse elements into a dense matrix. In some embodiments, a processing unit 316 sends a request for a dense matrix generated using sparse elements to a sparse element access unit in the node network 320, and that sparse element access unit may broadcast a request 342 to the sparse element access unit 300. The routing of the broadcast request 342 may be similar to that illustrated in FIG. 2B. The sparse element access unit 300 includes a request identification unit 302, a data extraction unit 304, a sparse reduction unit 306, a concatenation unit 308, a compression/decompression unit 310, and a splitting unit 312. The node network 320 can be a two-dimensional mesh network. The processing unit 316 may be similar to the processing unit 102. In general, the request identification unit 302 is configured to receive a request 342 to fetch sparse elements stored in one or more data partitions 330, and to determine whether the sparse element access unit 300 is assigned to access the sparse elements indicated by the request 342. In some implementations, the request identification unit 302 may make this determination using a lookup table. For example, if an identification of a particular requested sparse element (e.g., sparse element No. 1 of database table No. 1) is included in the lookup table, the request identification unit 302 may send a signal 344 for extracting that particular requested sparse element to the data extraction unit 304.
If an identification of a particular requested sparse element (e.g., sparse element No. 1 of database table No. 1) is not included in the lookup table, the request identification unit 302 may discard the received request. In certain implementations, the request identification unit 302 may be configured to broadcast the received request to another sparse element access unit on the node network 320. The data extraction unit 304 is configured to extract one or more requested sparse elements from the data partition 330 in response to receiving the signal 344. In some embodiments, the data extraction unit 304 includes one or more processors 322a to 322k, where k is an integer. The processors 322a-322k may be vector processing units (VPUs), array processing units, or any other suitable processing units. In some implementations, the processors 322a-322k are placed near the data partition 330 to reduce the latency between the processors 322a-322k and the data partition 330. Based on the number of requested sparse elements that the sparse element access unit 300 is assigned to fetch, the data extraction unit 304 may be configured to generate one or more requests to be distributed among the processors 322a-322k. In certain implementations, each of the processors 322a-322k can be assigned to particular sparse elements based on the identifications of the sparse elements, and the data extraction unit 304 can be configured to generate one or more requests for the processors 322a-322k based on those identifications. In some implementations, the data extraction unit 304 can determine the processor assignment using a lookup table. In some embodiments, the data extraction unit 304 may generate multiple batches for the processors 322a-322k, where each batch is a request for a subset of the requested sparse elements.
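The data extraction unit's use of a lookup table to designate processors, and its grouping of requests into per-processor batches, can be sketched as follows. The round-robin rule used to fill the table and the function names are assumptions for illustration; the text only says the assignment may come from a lookup table keyed on sparse element identifications.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

ElementId = Tuple[int, int]  # (database table No., sparse element No.)


def round_robin_lookup(assigned: List[ElementId], k: int) -> Dict[ElementId, int]:
    """Hypothetical way to fill the lookup table: spread a unit's elements over k processors."""
    return {element_id: i % k for i, element_id in enumerate(assigned)}


def make_batches(requested: List[ElementId],
                 lookup: Dict[ElementId, int]) -> Dict[int, List[ElementId]]:
    """Group requested elements into one batch (request) per designated processor."""
    batches: Dict[int, List[ElementId]] = defaultdict(list)
    for element_id in requested:
        batches[lookup[element_id]].append(element_id)
    return dict(batches)


if __name__ == "__main__":
    lookup = round_robin_lookup([(1, i) for i in range(1, 201)], k=4)
    print(make_batches([(1, 1), (1, 2), (1, 5)], lookup))
    # {0: [(1, 1), (1, 5)], 1: [(1, 2)]}
```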
The processors 322a-322k are configured to independently extract their assigned sparse elements from the data partition 330 and forward the extracted sparse elements 346 to the sparse reduction unit 306. The sparse reduction unit 306 is configured to reduce the size of the extracted sparse elements 346. For example, each of the processors 322a-322k may generate a sparse element having a size of 100×1. The sparse reduction unit 306 may receive the extracted sparse elements 346, having a combined size of 100×k, and generate sparse-reduced elements 348 by reducing their size to 100×1 through logical operations, arithmetic operations, or a combination of both. The sparse reduction unit 306 is configured to output the sparse-reduced elements 348 to the concatenation unit 308. The concatenation unit 308 is configured to reorder and concatenate the sparse-reduced elements 348 to produce concatenated elements 350. For example, sparse element access unit X1,1 can be configured to access sparse elements No. 1 to No. 200 of database table No. 1. The processor 322a may return extracted sparse element No. 10 to the sparse reduction unit 306 before the processor 322b returns extracted sparse element No. 5. The concatenation unit 308 is configured to reorder the later-received sparse element No. 5 so that it is placed before the earlier-received sparse element No. 10, and to concatenate sparse elements No. 1 to No. 200 into the concatenated elements 350. The compression/decompression unit 310 is configured to compress the concatenated elements 350 to produce a dense matrix 352 for the node network 320. For example, the compression/decompression unit 310 may be configured to compress zero values in the concatenated elements 350 to improve the effective bandwidth of the node network 320. In some implementations, the compression/decompression unit 310 may decompress a received dense matrix.
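The sparse reduction and reordering steps can be sketched in a few lines. Element-wise addition is assumed as the default reduction (the text allows logical operations, arithmetic operations, or both, so `max` or a logical OR would also fit), and each arriving element is assumed to carry its element number so the concatenation step can restore order. Function names are hypothetical.

```python
from typing import Callable, List, Tuple

Column = List[float]


def sparse_reduce(columns: List[Column],
                  op: Callable[[float, float], float] = lambda a, b: a + b) -> Column:
    """Reduce k equal-length columns (a 100 x k block) to a single column (100 x 1)."""
    result = list(columns[0])
    for column in columns[1:]:
        result = [op(a, b) for a, b in zip(result, column)]
    return result


def reorder_and_concat(arrivals: List[Tuple[int, Column]]) -> Column:
    """Concatenate (element No., values) pairs in element order, not arrival order."""
    return [x for _, values in sorted(arrivals, key=lambda item: item[0]) for x in values]


if __name__ == "__main__":
    print(sparse_reduce([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))          # [1.0, 3.0, 2.0]
    print(sparse_reduce([[1.0, 0.0], [0.0, 1.0]], op=max))            # [1.0, 1.0]
    print(reorder_and_concat([(10, [10.0]), (5, [5.0]), (1, [1.0])]))  # [1.0, 5.0, 10.0]
```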
For example, the sparse element access unit 300 may receive a dense matrix from a neighboring sparse element access unit via the node network 320. The sparse element access unit 300 may decompress the received dense matrix and concatenate the decompressed dense matrix with the concatenated elements 350 to form updated concatenated elements, which may be compressed and then output to the node network 320. FIG. 3B shows an example of how the sparse element access unit 300 may update sparse elements based on a dense matrix received from the node network 320. As an example, a processing unit may execute an instruction requesting that the sparse-to-dense transform unit update the stored sparse elements with a dense one-dimensional vector that was generated using sparse elements No. 1 to No. 50 of database table No. 1 and sparse elements No. 9,050 to No. 9,060 of database table No. 1,000. After the sparse-to-dense transform unit receives the request from the processing unit, it may send a request 362 instructing the sparse element access unit 300 to determine whether it is assigned to access the sparse elements contained in the dense one-dimensional vector. The request identification unit 302 is configured to make this determination. In response to determining that the sparse element access unit 300 is assigned to access the sparse elements contained in the dense one-dimensional vector, the request identification unit 302 may send an indication 364 to the splitting unit 312 to update the sparse elements stored in the data partition. The splitting unit 312 is configured to transform a received dense matrix into sparse elements that the data extraction unit 304 can update in the data partition 330.
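Zero compression on the node network, and the decompress-append-recompress step a unit performs on a neighbor's result, might look like this. The (length, index/value pairs) format is an assumed encoding, not one specified in the text, and so is the choice to place the neighbor's elements before the local ones.

```python
from typing import List, Tuple

# Hypothetical zero-suppressed format: (vector length, [(index, nonzero value), ...]).
Packed = Tuple[int, List[Tuple[int, float]]]


def compress(vector: List[float]) -> Packed:
    """Drop zero values, recording the length so the vector can be rebuilt."""
    return len(vector), [(i, v) for i, v in enumerate(vector) if v != 0]


def decompress(packed: Packed) -> List[float]:
    """Rebuild the full vector, filling every unlisted position with zero."""
    length, pairs = packed
    vector = [0.0] * length
    for i, v in pairs:
        vector[i] = v
    return vector


def merge_and_forward(received: Packed, local: List[float]) -> Packed:
    """Decompress a neighbor's result, append this unit's concatenated elements,
    and re-compress before sending it on (neighbor-first order is an assumption)."""
    return compress(decompress(received) + local)


if __name__ == "__main__":
    from_neighbor = compress([0.0, 0.0, 7.0, 0.0])
    print(from_neighbor)  # (4, [(2, 7.0)])
    print(decompress(merge_and_forward(from_neighbor, [0.0, 3.0])))
    # [0.0, 0.0, 7.0, 0.0, 0.0, 3.0]
```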
For example, the splitting unit 312 may be configured to transform a dense one-dimensional vector into a plurality of sparse elements and instruct the data extraction unit 304 to update those sparse elements stored in the data partition 330 that the sparse element access unit 300 is assigned to fetch. FIG. 4 is a flowchart of an example procedure 400 for generating a dense matrix. The procedure 400 may be performed by a system such as the sparse-to-dense transform unit 104 or the sparse-to-dense transform unit 200. The system may include a first sparse element access unit group and a second sparse element access unit group. For example, referring to FIG. 2A, the sparse-to-dense transform unit 200 may include M × N sparse element access units X1,1 to XM,N physically or logically arranged in M columns and N rows, where each column can be assigned to access sparse elements transformed from particular dense matrices. In some implementations, the first sparse element access unit group may include a first sparse element access unit and a second sparse element access unit. For example, the first column of the sparse-to-dense transform unit 200 may include sparse element access units X1,1 and X1,2. In some embodiments, the first and second sparse element access unit groups may be arranged in a two-dimensional mesh configuration. In some implementations, the first and second sparse element access unit groups may be arranged in a two-dimensional torus configuration. The system receives a request for an output matrix based on sparse elements, including sparse elements associated with a first dense matrix and sparse elements associated with a second dense matrix (402). For example, referring to FIG.
2B, a processing unit may execute an instruction requesting from the sparse-to-dense transform unit 200 a dense one-dimensional vector generated using sparse elements No. 1 to No. 50 of database table No. 1, sparse elements No. 100 to No. 200 of database table No. 2, and sparse elements No. 9,050 to No. 9,060 of database table No. 1,000. In some implementations, the first sparse element access unit can receive a request for a plurality of sparse elements, including sparse elements associated with the first dense matrix and sparse elements associated with the second dense matrix, and can transmit the request to the second sparse element access unit. For example, referring to FIG. 2B, after the sparse-to-dense transform unit 200 receives the request from the processing unit, it may instruct sparse element access unit X1,1 to broadcast a request for the sparse elements to the other sparse element access units in the mesh network, and sparse element access unit X1,1 may broadcast a request 222 to sparse element access unit X1,2. The system obtains the sparse elements associated with the first dense matrix, as extracted by the first sparse element access unit group (404). In some embodiments, the first sparse element access unit may determine that an identification of a particular sparse element of the plurality of sparse elements matches an identification of a sparse element in a first subset of the sparse elements associated with the first dense matrix. For example, referring to FIG. 2C, sparse element access unit X1,1 can be configured to access sparse elements No. 1 to No. 200 of database table No. 1.
Sparse element access unit X1,1 can determine that it is configured to access sparse elements No. 1 to No. 50 of database table No. 1, but not sparse elements No. 100 to No. 200 of database table No. 2 or sparse elements No. 9,050 to No. 9,060 of database table No. 1,000. In response to determining that the identification of a particular sparse element of the plurality of sparse elements matches an identification of a sparse element in the first subset of sparse elements associated with the first dense matrix, the first sparse element access unit may extract the first subset of sparse elements associated with the first dense matrix, which includes the particular sparse element. For example, in response to determining that it is configured to access sparse elements No. 1 to No. 50 of database table No. 1, sparse element access unit X1,1 may fetch those sparse elements from the data partition in which they are stored. The second sparse element access unit can extract a second, different subset of the sparse elements associated with the first dense matrix. For example, referring to FIG. 2C, sparse element access unit X1,2 can be configured to access sparse elements No. 51 to No. 200 of database table No. 2. In response to determining that it is configured to access sparse elements No. 100 to No. 200 of database table No. 2, sparse element access unit X1,2 may fetch those sparse elements from the data partition in which they are stored. The system obtains the sparse elements associated with the second dense matrix, as extracted by the second sparse element access unit group (406). For example, referring to FIG. 2C, the second sparse element access unit group may be the M-th column of the M × N sparse element access units, where sparse element access unit XM,N may be configured to access sparse elements No. 9,000 to No. 10,000 of database table No. 1,000.
In response to determining that it is configured to access sparse elements No. 9,050 to No. 9,060 of database table No. 1,000, sparse element access unit XM,N may extract those sparse elements from the data partition in which they are stored and generate a dense matrix 246 based on them. In some implementations, the first sparse element access unit can extract the first subset of sparse elements associated with the first dense matrix from a first data partition, and the second sparse element access unit can extract the second, different subset of sparse elements associated with the first dense matrix from a second, different data partition. For example, referring to FIG. 1, a first sparse element access unit may extract a first subset of the sparse elements associated with a first dense matrix from data partition 106a, and a second sparse element access unit may extract a second, distinct subset of the sparse elements associated with the first dense matrix from data partition 106b. The system transforms the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix to produce an output dense matrix that includes both (408). For example, referring to FIG. 2C, the sparse-to-dense transform unit 200 may transform the dense matrices 242, 244, and 246 into one dense matrix for a processor unit. In some implementations, the sparse elements associated with the first dense matrix and the sparse elements associated with the second dense matrix can be multidimensional matrices, and the output dense matrix can be a vector. For example, dense matrix 242 may have dimensions of 100×10, dense matrix 244 may have dimensions of 20×100, and dense matrix 246 may have dimensions of 3×3.
Sparse-dense transform unit 200 may transform dense matrices 242, 244, and 246 into one vector of size 1×3009. FIG. 5 is a flowchart illustrating an example of a procedure 500 for generating a dense matrix. The procedure 500 may be executed by a system such as the sparse-dense transform unit 104 or the sparse element access unit 300. The system receives an indication to access a particular subset of sparse elements (502). For example, referring to FIG. 3A, data extraction unit 304 may be configured to receive a signal 344 for fetching one or more requested sparse elements from data partition 330. In some implementations, a request for particular sparse elements stored in one or more data partitions may be received via a network of nodes. For example, referring to FIG. 3A, request identification unit 302 may be configured to receive, via node network 320, a request 342 to fetch sparse elements stored in data partition 330. The system may determine that a data extraction unit is assigned to handle the particular subset of sparse elements. For example, request identification unit 302 may be configured to determine whether sparse element access unit 300 is assigned to access the sparse elements indicated by request 342. In response to determining that the data extraction unit is assigned to handle the particular subset of sparse elements, the system may generate the indication to access that subset. For example, the identity of a particular requested sparse element (e.g., element No. 1 of database table No. 1) may be included in a lookup table. In that case, request identification unit 302 may send a signal 344 for fetching that element to data extraction unit 304. The system determines, based on the identities of the sparse elements in the particular subset, a processor assignment for fetching them (504). For example, referring to FIG. 3A, data extraction unit 304 includes one or more processors 322a to 322k. Each of the processors 322a-322k can be assigned to particular sparse elements based on their identities. Data extraction unit 304 can be configured to generate one or more requests for one or more of the processors 322a-322k based on the identities of the sparse elements. In some implementations, determining that the system is assigned to handle the particular subset of sparse elements includes making that determination based on a lookup table. For example, data extraction unit 304 can determine the processor assignment by using a lookup table. Based on the assignment, a first processor of the plurality of processors fetches a first sparse element of the particular subset (506). For example, referring to FIG. 3A, data extraction unit 304 may instruct processor 322a to fetch a sparse element included in signal 344. Based on the assignment, a second processor of the plurality of processors fetches a second sparse element of the particular subset (508). For example, referring to FIG. 3A, data extraction unit 304 can instruct processor 322b to fetch a different sparse element included in signal 344. In some implementations, a first matrix that includes the first sparse element and has a first size can be received from the first processor. The system may generate a second matrix that includes the first sparse element and has a second size smaller than the first size. For example, sparse reduction unit 306 may be configured to reduce the size of the fetched sparse elements 346. Each of processors 322a-322k may generate a sparse element of size 100×1.
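Steps 502 and 504 can be illustrated with a lookup-table membership test followed by an identity-based processor assignment. This is a hedged Python sketch with the following assumptions:

- `RequestIdentifier` and `assign_processors` are hypothetical names.
- The lookup table is modeled as a set of (table, element) pairs.
- Assigning by element number modulo k is an assumed policy. The patent says only that assignment is based on element identity, possibly via a lookup table.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

ElementId = Tuple[int, int]  # (database table number, element number)


class RequestIdentifier:
    """Hypothetical model of request identification unit 302: a lookup table
    of the element identities stored in this unit's data partition."""

    def __init__(self, owned: Set[ElementId]):
        self.lookup = owned

    def filter(self, requested: List[ElementId]) -> List[ElementId]:
        # Keep only the elements this access unit is assigned to handle;
        # the result plays the role of signal 344.
        return [e for e in requested if e in self.lookup]


def assign_processors(elements: List[ElementId], k: int) -> Dict[int, List[ElementId]]:
    """Map each element to one of k processors by its identity.
    Element number modulo k is an illustrative assumption."""
    plan: Dict[int, List[ElementId]] = defaultdict(list)
    for table, num in elements:
        plan[num % k].append((table, num))
    return dict(plan)


owned = {(1, n) for n in range(1, 51)}  # elements 1-50 of table 1
rid = RequestIdentifier(owned)
signal = rid.filter([(1, 1), (1, 5), (1, 10), (2, 100)])
print(signal)
print(assign_processors(signal, 2))
```

A fixed rule such as the modulo rule above lets every processor know its share without extra coordination. A real lookup table could instead balance load or follow physical memory layout.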
Sparse reduction unit 306 may receive the fetched sparse elements 346, which have a size of 100×k. It may generate sparse-reduced elements 348 by reducing them to a size of 100×1 through logical operations, arithmetic operations, or a combination of both. The system can generate the output dense matrix based on the second matrix. For example, concatenation unit 308 may be configured to rearrange and concatenate sparse-reduced elements 348 to produce concatenated elements 350. In some implementations, the first sparse element can be received at a first point in time and the second sparse element at a second, different point in time. The system can determine the order of the first and second sparse elements in the output dense matrix. For example, referring to FIG. 3A, processor 322a may return fetched sparse element No. 10 to sparse reduction unit 306 before processor 322b returns fetched sparse element No. 5. Concatenation unit 308 is configured to rearrange the later-received sparse element No. 5 so that it precedes the earlier-received sparse element No. 10. It then concatenates sparse elements No. 1 to No. 200 into concatenated elements 350. The system generates an output dense matrix based on a transformation applied to at least the first sparse element and the second sparse element (510). In some implementations, the system can compress the output dense matrix to produce a compressed output dense matrix and provide the compressed output dense matrix to a network of nodes. For example, compression/decompression unit 310 may be configured to compress concatenated elements 350 to produce dense matrix 352 for node network 320.
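The reduction and reordering described above can be sketched in Python under the following assumptions:

- The reduction is an element-wise sum. The text allows logical operations, arithmetic operations, or both.
- Each returned element is identified by its element number.
- Columns are shortened from 100 entries to 4 so the example stays readable.
- `sparse_reduce` and `reorder_and_concat` are hypothetical names.

```python
from typing import List, Tuple

Column = List[float]  # a 100 x 1 column, stored as a flat list


def sparse_reduce(columns: List[Column]) -> Column:
    """Reduce k equal-length columns (a 100 x k block) to one column by
    element-wise sum. Sum is one assumed choice of arithmetic operation."""
    return [sum(vals) for vals in zip(*columns)]


def reorder_and_concat(arrivals: List[Tuple[int, Column]]) -> Column:
    """`arrivals` holds (element number, column) pairs in completion order.
    Sort by element number so late arrivals land in their proper slot,
    then concatenate the columns."""
    out: Column = []
    for _, col in sorted(arrivals, key=lambda a: a[0]):
        out.extend(col)
    return out


# Three processors (k = 3) each return a 4-entry column.
reduced = sparse_reduce([[1.0, 0.0, 2.0, 0.0],
                         [0.0, 3.0, 0.0, 0.0],
                         [1.0, 1.0, 1.0, 1.0]])
print(reduced)

# Element No. 10 is returned before element No. 5.
arrivals = [(10, [10.0, 10.0]), (5, [5.0, 5.0])]
print(reorder_and_concat(arrivals))
```

Sorting by element number rather than by arrival time makes the concatenated output independent of which processor finishes first.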
In some implementations, the system may receive a first dense matrix sent over the network of nodes. It may then generate the output dense matrix based on the first dense matrix, the first sparse element, and the second sparse element. For example, sparse element access unit 300 may receive a dense matrix from a neighboring sparse element access unit via node network 320. Sparse element access unit 300 may decompress the received dense matrix and concatenate it with concatenated elements 350 to form updated concatenated elements, which may be compressed and then output to node network 320. In some implementations, one or more of the particular sparse elements is a multidimensional matrix, and the output dense matrix is a vector. Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs. That is, they can be implemented as one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus. Alternatively or additionally, the program instructions can be encoded on an artificially generated propagated signal (e.g., a machine-generated electrical, optical, or electromagnetic signal). Such a signal is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
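The node-network exchange described above can be sketched in Python. In it, a unit decompresses a neighbor's dense matrix, appends its own concatenated elements, and recompresses the result. The sketch rests on these assumptions:

- Values are float64.
- zlib stands in for whatever codec compression/decompression unit 310 actually uses.
- The received data is placed before the local elements.
- `compress`, `decompress`, and `forward` are hypothetical names.

```python
import zlib
from array import array
from typing import List


def compress(values: List[float]) -> bytes:
    """Pack values as float64 and deflate them (zlib is an assumed codec)."""
    return zlib.compress(array("d", values).tobytes())


def decompress(blob: bytes) -> List[float]:
    """Inflate a blob produced by `compress` back into a list of floats."""
    buf = array("d")
    buf.frombytes(zlib.decompress(blob))
    return buf.tolist()


def forward(received: bytes, local: List[float]) -> bytes:
    """Decompress a neighbor's dense matrix, append this unit's concatenated
    elements, and recompress the update for the next hop on the node network."""
    return compress(decompress(received) + local)


# Hypothetical three-unit chain: each unit appends its own elements and passes on.
blob = compress([1.0, 2.0])
blob = forward(blob, [3.0])
blob = forward(blob, [4.0, 5.0])
print(decompress(blob))
```

Compressing only at hop boundaries keeps the network payload small. The cost is a decompress/recompress step at each unit along the chain.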
The term "data processing apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including, by way of example, a programmable processor, a computer, or multiple processors or computers. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit). In addition to hardware, the apparatus can also include code that creates an execution environment for the computer program in question. Examples include code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A computer program may also be referred to or described as a program, software, a software application, a module, a software module, a script, or code. It can be written in any form of programming language, including compiled or interpreted languages and declarative or procedural languages. It can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in any of the following places:
- a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document);
- a single file dedicated to the program in question;
- multiple coordinated files (e.g., files that store one or more modules, subprograms, or portions of code).

A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network. The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output.
The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry. Examples include an FPGA (field programmable gate array), an ASIC (application specific integrated circuit), or a GPGPU (general purpose graphics processing unit). Computers suitable for the execution of a computer program can be based, by way of example, on general or special purpose microprocessors or both, or on any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include one or more mass storage devices for storing data, e.g., magnetic, magneto-optical, or optical disks, or be operatively coupled to receive data from them, transfer data to them, or both. However, a computer need not have such devices. Moreover, a computer can be embedded in another device. Examples include a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a Universal Serial Bus (USB) flash drive), to name just a few. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices. These include, by way of example:
- semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices;
- magnetic disks, e.g., internal hard disks or removable disks;
- magneto-optical disks;
- CD-ROM and DVD-ROM disks.

The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device and input devices. The display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, displays information to the user. A keyboard and a pointing device, e.g., a mouse or a trackball, let the user provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, e.g., visual, auditory, or tactile feedback, and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device used by the user. For example, it can send web pages to a web browser on a user's client device in response to requests received from the web browser. Embodiments of the subject matter described in this specification can be implemented in a computing system that includes any of the following:
- a back-end component, e.g., a data server;
- a middleware component, e.g., an application server;
- a front-end component, e.g., a client computer having a graphical user interface or a web browser through which a user can interact with an implementation of the subject matter described in this specification;
- any combination of one or more such back-end, middleware, or front-end components.

The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network ("LAN") and a wide area network ("WAN"), e.g., the Internet. The computing system can include clients and servers.
A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. This specification contains many specific implementation details. These should not be construed as limitations on the scope of any invention or of what may be claimed. Rather, they describe features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, features may be described above as acting in certain combinations and may even be initially claimed as such. Nevertheless, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination. Similarly, operations are depicted in the drawings in a particular order. This should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments. The described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Specific embodiments of the subject matter have been described. Other embodiments are also within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the procedures depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In particular implementations, multitasking and parallel processing may be advantageous.

100: computing system
102: processing unit
104: sparse-dense transform unit
106a-106k: data partitions
108a-108n: sparse elements
110: instructions
112: target dense matrix
200: sparse-dense transform unit
202: first row
204: second row
206: Mth row
222: request
224: request
226: request
230: request
242: dense matrix
244: dense matrix
246: dense matrix
252: sparse element update request
254: sparse element update request
256: request
260: request
300: sparse element access unit
302: request identification unit
304: data extraction unit
306: sparse reduction unit
308: concatenation unit
310: compression/decompression unit
312: split unit
320: node network
322a-322k: processors
330: data partition
342: request/broadcast request
344: signal
346: fetched sparse elements
348: sparse-reduced elements
350: concatenated elements
352: dense matrix
362: request
364: indication
X1,1 to XM,N: sparse element access units

FIG. 1 is a block diagram of an example computing system. FIGS. 2A to 2D illustrate an example sparse-dense transform unit. FIGS. 3A to 3B illustrate an example sparse element access unit. FIG. 4 is a flowchart illustrating an example of a procedure for generating a dense matrix. FIG. 5 is a flowchart illustrating an example of a procedure for transforming sparse elements into a dense matrix. Like reference numbers and designations in the various drawings indicate like elements.

Claims (15)

1. A system for transforming sparse elements into a dense matrix, the system comprising: a plurality of sparse element access units, each sparse element access unit of the plurality of sparse element access units being configured to: receive respective control signals; access, based on the respective control signals, a plurality of sparse elements stored in a data partition (shard) corresponding to the sparse element access unit; generate an output dense matrix based on a transformation applied to the sparse elements obtained from the data partition; and provide the output dense matrix to a node network of the system.
2. The system of claim 1, further comprising: a sparse-dense transform unit configured to receive instructions corresponding to the respective control signals, wherein the plurality of sparse element access units are located in the sparse-dense transform unit.
3. The system of claim 2, wherein the sparse-dense transform unit is a multi-dimensional sparse-dense transform unit comprising a row dimension and a column dimension.
4. The system of claim 3, wherein the plurality of sparse element access units are arranged along respective dimensions of the multi-dimensional sparse-dense transform unit.
5. The system of claim 2, wherein each of the plurality of sparse element access units includes a respective first unit configured to apply the transformation to the sparse elements to generate the output dense matrix.
6. The system of claim 5, wherein: the first unit is a concatenation unit; and the transformation is based on a concatenation operation.
7. The system of claim 5, wherein: the first unit is a compression/decompression unit; and the transformation is based on an operation that compresses the sparse elements.
8. A method for transforming sparse elements into a dense matrix using a system comprising a plurality of sparse element access units, the method comprising: receiving, by a sparse element access unit, respective control signals; accessing, by the sparse element access unit and based on the received respective control signals, a plurality of sparse elements stored in a data partition corresponding to the sparse element access unit; generating an output dense matrix based on a transformation applied to the sparse elements obtained from the data partition; and providing the output dense matrix to a node network of the system.
9. The method of claim 8, wherein: the system includes a sparse-dense transform unit configured to receive instructions; the plurality of sparse element access units are located in the sparse-dense transform unit; and the method comprises receiving, by the sparse-dense transform unit, instructions corresponding to the respective control signals of each of the plurality of sparse element access units.
10. The method of claim 9, wherein the sparse-dense transform unit is a multi-dimensional sparse-dense transform unit comprising a row dimension and a column dimension.
11. The method of claim 10, wherein the plurality of sparse element access units are arranged along respective dimensions of the multi-dimensional sparse-dense transform unit.
12. The method of claim 9, wherein: each of the plurality of sparse element access units includes a respective first unit configured to transform the sparse elements; and the method comprises applying, by the first unit, the transformation to the sparse elements to generate the output dense matrix.
13. The method of claim 12, wherein: the first unit is a concatenation unit; and applying the transformation comprises: applying a concatenation operation to the sparse elements; and concatenating the sparse elements based on the concatenation operation to generate the output dense matrix.
14. The method of claim 12, wherein: the first unit is a compression/decompression unit; and applying the transformation comprises: applying a compression operation to the sparse elements; and compressing the sparse elements based on the compression operation to generate the output dense matrix.
15. A non-transitory machine-readable storage device storing instructions for transforming sparse elements into a dense matrix by a plurality of sparse element access units of a system, the instructions being executable by a processor to cause performance of operations comprising: receiving, by a sparse element access unit, respective control signals; accessing, by the sparse element access unit and based on the respective control signals, a plurality of sparse elements stored in a data partition corresponding to the sparse element access unit; generating an output dense matrix based on a transformation applied, by the sparse element access unit, to the sparse elements obtained from the data partition; and providing the output dense matrix to a node network of the system.
TW110100489A 2016-02-05 2016-12-29 System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device TWI781509B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/016,420 2016-02-05
US15/016,420 US9805001B2 (en) 2016-02-05 2016-02-05 Matrix processing apparatus

Publications (2)

Publication Number Publication Date
TW202131172A TW202131172A (en) 2021-08-16
TWI781509B true TWI781509B (en) 2022-10-21

Family

ID=57708453

Family Applications (4)

Application Number Title Priority Date Filing Date
TW107112523A TWI670613B (en) 2016-02-05 2016-12-29 System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device
TW108126777A TWI718604B (en) 2016-02-05 2016-12-29 System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device
TW110100489A TWI781509B (en) 2016-02-05 2016-12-29 System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device
TW105143869A TWI624763B (en) 2016-02-05 2016-12-29 Matrix processing apparatus

Family Applications Before (2)

Application Number Title Priority Date Filing Date
TW107112523A TWI670613B (en) 2016-02-05 2016-12-29 System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device
TW108126777A TWI718604B (en) 2016-02-05 2016-12-29 System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device

Family Applications After (1)

Application Number Title Priority Date Filing Date
TW105143869A TWI624763B (en) 2016-02-05 2016-12-29 Matrix processing apparatus

Country Status (8)

Country Link
US (6) US9805001B2 (en)
EP (2) EP3203382A1 (en)
JP (4) JP6524052B2 (en)
KR (4) KR101980365B1 (en)
CN (2) CN107045493B (en)
BR (1) BR102016030970A8 (en)
SG (2) SG10201610977QA (en)
TW (4) TWI670613B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9805001B2 (en) 2016-02-05 2017-10-31 Google Inc. Matrix processing apparatus
US10521458B1 (en) * 2016-08-25 2019-12-31 Cyber Atomics, Inc. Efficient data clustering
US10635739B1 (en) 2016-08-25 2020-04-28 Cyber Atomics, Inc. Multidimensional connectivity graph-based tensor processing
JP6912703B2 (en) * 2017-02-24 2021-08-04 富士通株式会社 Arithmetic method, arithmetic unit, arithmetic program and arithmetic system
US10489481B1 (en) 2017-02-24 2019-11-26 Cyber Atomics, Inc. Efficient matrix property determination with pipelining and parallelism
US10936942B2 (en) 2017-11-21 2021-03-02 Google Llc Apparatus and mechanism for processing neural network tasks using a single chip package with multiple identical dies
CN108804684B (en) * 2018-06-13 2020-11-03 北京搜狗科技发展有限公司 Data processing method and device
US10719323B2 (en) 2018-09-27 2020-07-21 Intel Corporation Systems and methods for performing matrix compress and decompress instructions
CN113794709B (en) * 2021-09-07 2022-06-24 北京理工大学 Hybrid coding method for binary sparse matrix

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102141976A (en) * 2011-01-10 2011-08-03 中国科学院软件研究所 Method for storing diagonal data of sparse matrix and SpMV (Sparse Matrix Vector) realization method based on method
CN103262068A (en) * 2010-12-20 2013-08-21 萨思学会有限公司 Systems and methods for generating a cross-product matrix in a single pass through data using single pass levelization
US20130339506A1 (en) * 2012-06-13 2013-12-19 International Business Machines Corporation Performing synchronized collective operations over multiple process groups
CN103984527A (en) * 2014-04-01 2014-08-13 杭州电子科技大学 Method optimizing sparse matrix vector multiplication to improve incompressible pipe flow simulation efficiency

Family Cites Families (22)

Publication number Priority date Publication date Assignee Title
US5752067A (en) 1990-11-13 1998-05-12 International Business Machines Corporation Fully scalable parallel processing system having asynchronous SIMD processing
US5765011A (en) 1990-11-13 1998-06-09 International Business Machines Corporation Parallel processing system having a synchronous SIMD processing with processing elements emulating SIMD operation using individual instruction streams
US5625836A (en) 1990-11-13 1997-04-29 International Business Machines Corporation SIMD/MIMD processing memory element (PME)
GB2251320A (en) * 1990-12-20 1992-07-01 Motorola Ltd Parallel processor
JP2557175B2 (en) * 1992-05-22 1996-11-27 インターナショナル・ビジネス・マシーンズ・コーポレイション Computer system
US5446908A (en) 1992-10-21 1995-08-29 The United States Of America As Represented By The Secretary Of The Navy Method and apparatus for pre-processing inputs to parallel architecture computers
US5644517A (en) 1992-10-22 1997-07-01 International Business Machines Corporation Method for performing matrix transposition on a mesh multiprocessor architecture having multiple processor with concurrent execution of the multiple processors
JP3348367B2 (en) * 1995-12-06 2002-11-20 富士通株式会社 Multiple access method and multiple access cache memory device
JP3639206B2 (en) * 2000-11-24 2005-04-20 富士通株式会社 Parallel matrix processing method and recording medium in shared memory type scalar parallel computer
US7587516B2 (en) 2001-02-24 2009-09-08 International Business Machines Corporation Class network routing
ATE479147T1 (en) 2001-02-24 2010-09-15 Ibm NEW MASSIVE PARALLEL SUPERCOMPUTER
US6961888B2 (en) 2002-08-20 2005-11-01 Flarion Technologies, Inc. Methods and apparatus for encoding LDPC codes
US8380778B1 (en) * 2007-10-25 2013-02-19 Nvidia Corporation System, method, and computer program product for assigning elements of a matrix to processing threads with increased contiguousness
WO2011156247A2 (en) 2010-06-11 2011-12-15 Massachusetts Institute Of Technology Processor for large graph algorithm computations and matrix operations
US8549259B2 (en) 2010-09-15 2013-10-01 International Business Machines Corporation Performing a vector collective operation on a parallel computer having a plurality of compute nodes
US9170836B2 (en) * 2013-01-09 2015-10-27 Nvidia Corporation System and method for re-factorizing a square matrix into lower and upper triangular matrices on a parallel processor
JP6083300B2 (en) * 2013-03-29 2017-02-22 富士通株式会社 Program, parallel operation method, and information processing apparatus
US9367519B2 (en) * 2013-08-30 2016-06-14 Microsoft Technology Licensing, Llc Sparse matrix data structure
US9471377B2 (en) 2013-11-13 2016-10-18 Reservoir Labs, Inc. Systems and methods for parallelizing and optimizing sparse tensor computations
US9715481B2 (en) 2014-06-27 2017-07-25 Oracle International Corporation Approach for more efficient use of computing resources while calculating cross product or its approximation for logistic regression on big data sets
US9898441B2 (en) * 2016-02-05 2018-02-20 Google Llc Matrix processing apparatus
US9805001B2 (en) 2016-02-05 2017-10-31 Google Inc. Matrix processing apparatus

Also Published As

Publication number Publication date
KR20170093698A (en) 2017-08-16
BR102016030970A8 (en) 2018-07-31
JP7187635B2 (en) 2022-12-12
SG10201610977QA (en) 2017-09-28
TW201732645A (en) 2017-09-16
US9805001B2 (en) 2017-10-31
SG10201808521PA (en) 2018-11-29
EP4160448A1 (en) 2023-04-05
US10719575B2 (en) 2020-07-21
JP2023021171A (en) 2023-02-09
CN107045493A (en) 2017-08-15
TWI624763B (en) 2018-05-21
KR20190054052A (en) 2019-05-21
US20170228343A1 (en) 2017-08-10
US9798701B2 (en) 2017-10-24
CN112000919A (en) 2020-11-27
KR102483303B1 (en) 2022-12-29
KR20200053461A (en) 2020-05-18
US20220391472A1 (en) 2022-12-08
KR20230002254A (en) 2023-01-05
EP3203382A1 (en) 2017-08-09
US20210034697A1 (en) 2021-02-04
JP6978467B2 (en) 2021-12-08
KR102112094B1 (en) 2020-05-18
US11366877B2 (en) 2022-06-21
TW201826143A (en) 2018-07-16
TWI718604B (en) 2021-02-11
TW202011226A (en) 2020-03-16
KR101980365B1 (en) 2019-05-20
CN107045493B (en) 2020-08-18
CN112000919B (en) 2022-04-05
KR102635985B1 (en) 2024-02-08
US10417303B2 (en) 2019-09-17
BR102016030970A2 (en) 2018-07-17
US20180060276A1 (en) 2018-03-01
US20200012705A1 (en) 2020-01-09
JP6524052B2 (en) 2019-06-05
US20170228341A1 (en) 2017-08-10
TWI670613B (en) 2019-09-01
JP2022000781A (en) 2022-01-04
TW202131172A (en) 2021-08-16
JP2017138965A (en) 2017-08-10
JP2019153333A (en) 2019-09-12

Similar Documents

Publication Publication Date Title
TWI781509B (en) System and method for transforming sparse elements into a dense matrix, and non-transitory machine-readable storage device
JP7023917B2 (en) Systems and methods for converting sparse elements to dense matrices

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent