TWI719433B - Data structures with multiple read ports, processor, and method for data structures with multiple read ports - Google Patents
- Publication number
- TWI719433B (application TW108109969A)
- Authority
- TW
- Taiwan
- Prior art keywords
- data
- read
- memory
- subset
- port
- Prior art date
Abstract
Description
The present invention relates generally to the storage of data structures, and more specifically to the storage of data structures with multiple read ports.
Data structures, such as lookup tables, are used in many applications to perform a function on received input data. For example, an arithmetic logic unit (ALU) may perform an operation on a received input value by looking up the value in a lookup table and returning a corresponding output value.
In some cases, such as single instruction multiple data (SIMD) applications, it may be desirable to perform the same operation on different sets of input data in parallel. Multiple ALUs or other circuits therefore need to be able to access the data contained in the lookup table in parallel.
A memory structure with multiple read ports can be used to allow multiple ALUs or other processing devices to access a common data structure (such as a lookup table) in parallel. The memory structure can be constructed from a plurality of memory structures that have fewer read ports.
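A memory structure having 2^m read ports that allows simultaneous access to n data entries can be constructed using three memory structures (e.g., substructures) that each have 2^(m-1) read ports. The three memory structures comprise: a first structure that provides access to a first half (n/2 entries) of the n data entries; a second structure that provides access to a second half (n/2 entries) of the n data entries; and a difference structure that provides access to difference data (n/2 entries) between the first half and the second half of the n data entries. Each of the 2^m ports can be connected to a respective port of each of the 2^(m-1)-port structures, so that a port can access data from the first half of the n data entries either by accessing the first structure, or by accessing both the difference structure and the second structure to reconstruct the data stored by the first structure. Similarly, a port can access data from the second half of the n data entries either by accessing the second structure, or by accessing both the difference structure and the first structure to reconstruct the data stored by the second structure.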
Thus, a 2-port memory structure for accessing n data entries can be constructed using three 1-port memory structures that each store n/2 data entries. Similarly, a 2^m-port memory structure for accessing n data entries can be constructed using multiple 1-port memory structures that store a total of (3/2)^m * n entries.
100: Multi-port memory structure
102: Arithmetic logic unit (ALU)
104: Read port
200: 1-port memory structure
205: Read port
300: 2-port memory structure
305A: First 1-port memory structure/lower structure
305B: Second 1-port memory structure/upper structure
310: Third 1-port structure/difference structure
315: Access circuit
320: Read port
320A: Lower read port
320B: Upper read port
325: Multiplexer (MUX)
325A: Lower MUX
325B: Upper MUX
330: Difference circuit
330A: First difference circuit
330B: Second difference circuit
335: Conflict control circuit
400: 4-port memory structure
405A: Lower 2-port memory structure/first 2-port memory structure
405B: Upper 2-port memory structure/second 2-port memory structure
410: Difference 2-port memory structure/third 2-port memory structure
415A: Lower structure
415B: Upper structure
415C: Difference structure
420A: Lower structure
420B: Upper structure
420C: Difference structure
425A: Lower structure
425B: Upper structure
425C: Difference structure
430: Sub-access circuit
430A: Sub-access circuit
430B: Sub-access circuit
430C: Sub-access circuit
435: Access circuit
435A: First access circuit
435B: Second access circuit
440A: Read port
440B: Read port
440C: Read port
440D: Read port
502: First method
504: Second method
506: Third method
508: Fourth method
600: 2^m-port memory structure
605A: Lower structure/lower sub-table
605B: Upper structure/upper sub-table
610: Difference structure/difference sub-table
615: Access circuit
615-1 to 615-2^(m-1): Access circuits
625: Read port
625-1 to 625-2^(m-1): Read ports
FIG. 1 shows a block diagram of a processor having a memory structure with multiple read ports, according to some embodiments.
FIG. 2 shows a memory structure with a single read port, according to some embodiments.
FIG. 3 shows a memory structure with two read ports, according to some embodiments.
FIG. 4 shows a diagram of a 4-port structure that can be assembled using three different 2-port structures, according to some embodiments.
FIG. 5 shows a diagram of how the ports of the 4-port structure can access any data entry of the structure in parallel.
FIG. 6 shows a structure with 2^m read ports constructed from three 2^(m-1)-port structures, according to some embodiments.
The figures depict embodiments of the invention for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or the benefits, of the invention described herein.
The figures and the following description relate to preferred embodiments by way of illustration only. It should be noted that, from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
A data structure (such as a lookup table) can be used by an arithmetic logic unit (ALU) or other circuit to perform an operation on received input values. In many parallel processing applications, such as single instruction multiple data (SIMD) applications, multiple ALUs need to access the data structure in parallel. It is therefore desirable to implement the data structure on a memory structure (e.g., a random access memory (RAM) or read-only memory (ROM)) that has multiple read ports. In addition, although this disclosure refers primarily to ALUs reading data from the data structure through one or more read ports, in other embodiments any other type of circuit or consumer may read data from the data structure through one or more read ports.
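FIG. 1 shows a block diagram of a processor including a memory structure with multiple read ports, according to some embodiments. The processor may be an integrated circuit (IC) device. In some embodiments, the processor is a processor dedicated to tensor processing. The processor includes a multi-port memory structure 100 and a plurality of ALUs 102. The multi-port memory structure 100 includes dynamic random access memory (DRAM) cells or other types of memory that store a data structure (e.g., a lookup table) accessed by the plurality of ALUs 102. In some embodiments, the data structure is associated with a function and maps function input values to function output values. For example, the data structure may implement an activation function of a machine learning model, such as a rectified linear unit (ReLU) function, a binary step function, an arctangent function, or another function.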
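As a rough illustration of a lookup table that maps function input values to output values, the sketch below tabulates ReLU over signed 8-bit inputs. The 8-bit input width, the offset-based addressing, and the names `build_relu_table` and `lookup` are assumptions made for this example only; the description does not specify them.

```python
def build_relu_table(bits=8):
    """Tabulate ReLU for every signed integer of the given width (assumed width)."""
    low = -(1 << (bits - 1))
    high = 1 << (bits - 1)
    # Entry i holds ReLU(i + low), so the table address is the input minus low.
    return [max(0, x) for x in range(low, high)], low


def lookup(table, low, x):
    """Evaluate the tabulated function with a single address lookup."""
    return table[x - low]


relu_table, offset = build_relu_table()
outputs = [lookup(relu_table, offset, x) for x in (-128, -1, 0, 5, 127)]
print(outputs)
```

In a SIMD setting, each ALU would perform lookups like these on its own inputs, which is why the table needs more than one read port.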
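The ALUs 102 may be part of a SIMD or other parallel processor, in which each ALU 102 is configured to perform the same arithmetic operation on a different set of input data. For example, each ALU 102 receives a respective set of input data and performs one or more lookups on the data structure stored in the memory structure 100 to generate a respective set of output data based on the function associated with the data structure. For the ALUs 102 to operate in parallel, the plurality of ALUs 102 need to be able to access the data structure on the memory structure 100 simultaneously. For example, FIG. 1 shows the memory structure 100 with four read ports 104, each connected to one of four different ALUs 102. In one embodiment, each read port has its own dedicated address bus and its own dedicated data bus. An ALU reads data through a read port by providing an address on the address bus, and the memory structure 100 returns the data of the data structure located at that address. As used herein, "simultaneously" may refer to occurring during a common time period (e.g., a clock cycle). For example, each of the plurality of ALUs may transmit a read request to the memory structure 100 during a particular clock cycle, in which case the transmitted read requests are considered simultaneous with one another.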
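In some embodiments, a memory structure with multiple read ports (such as the memory structure 100) can be constructed using memory structures with fewer read ports. For example, the memory structure 100 may be constructed from a plurality of memory structures that each have a single read port. FIG. 2 shows a memory structure with a single read port, according to some embodiments. The memory structure 200 stores a plurality of data entries (e.g., entries [0] through [n-1], where n is an integer of 2 or greater). Because the memory structure 200 has only a single read port 205, only a single ALU can access the data contained in the memory structure 200 at a time.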
One example way of allowing multiple ALUs to access any of the data of the memory structure 200 in parallel is to replicate the data (e.g., entries [0] through [n-1]) across multiple single-read-port memory structures. For example, the data of the structure 200 can be replicated across a second single-read-port memory structure to build a combined memory structure with two read ports, each of which can be accessed independently by a different ALU and provides access to any of the data in the original structure. However, this configuration also doubles the amount of memory needed to store the data, because each of the entries [0] through [n-1] is replicated across two single-port memory structures. With this type of configuration, building a memory structure with n entries that are accessible through 2^m read ports requires storing a total of n * 2^m entries, where m is a positive integer.
2-port memory structure
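FIG. 3 shows a memory structure with two read ports, according to some embodiments. As shown in FIG. 3, a 2-port memory structure 300 that allows parallel access to any of n entries of data (e.g., entries [0] through [n-1]) is built from three 1-port memory structures. Each 1-port memory structure contains a table or other data structure that stores half the number of data entries of the original data structure (e.g., n/2 entries). Thus, compared with the original 1-port memory structure (e.g., the 1-port structure 200 shown in FIG. 2), the 2-port memory structure 300 needs to store only 50% more entries while still allowing all n entries to be accessed simultaneously through either of the two read ports.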
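The 2-port memory structure 300 comprises a first 1-port memory structure 305A that stores a table containing a first half of the data entries [0] through [n-1], and a second 1-port memory structure 305B that stores a table containing a second half of the entries [0] through [n-1]. For ease of explanation, the first half of the data entries may also be referred to as the "lower" half (e.g., entries [0] through [n/2-1]), and the second half as the "upper" half (e.g., entries [n/2] through [n-1]). Accordingly, the first structure 305A may be referred to as the "lower structure" and the second structure 305B as the "upper structure".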
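In addition to the lower structure 305A and the upper structure 305B, the 2-port structure 300 further comprises a third 1-port structure 310 (hereinafter the "difference structure") that stores n/2 entries, each indicating the difference between an entry of the lower structure and the corresponding entry of the upper structure. For example, the difference structure may store entries indicating the difference between entry [0] of the lower structure and entry [n/2] of the upper structure, the difference between entries [1] and [n/2+1], and so on. The difference can be determined using any function that allows the value of a data entry in one half to be determined using only the corresponding difference value and the data entry of the other half. For example, in some embodiments, the entries of the difference structure are generated from an exclusive-or (XOR) of the corresponding entries of the lower structure and the upper structure. Thus, the value of a particular lower-half data entry can be determined from the corresponding upper-half data entry and the XOR value, without accessing the lower structure. In other embodiments, a reversible function other than XOR may be used to compute the difference entries.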
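The lower, upper, and difference tables can be sketched in a few lines. This is a minimal illustration that assumes integer entries and the XOR variant of the difference function; the helper name `build_two_port_tables` and the sample values are hypothetical.

```python
def build_two_port_tables(entries):
    """Split a table into lower, upper, and XOR-difference halves."""
    n = len(entries)
    if n % 2:
        raise ValueError("expected an even number of entries")
    half = n // 2
    lower = list(entries[:half])        # entries [0] .. [n/2-1]
    upper = list(entries[half:])        # entries [n/2] .. [n-1]
    # diff[i] pairs lower entry [i] with upper entry [n/2 + i].
    diff = [lo ^ up for lo, up in zip(lower, upper)]
    return lower, upper, diff


table = [3, 14, 15, 92, 65, 35, 89, 79]
lower, upper, diff = build_two_port_tables(table)

# Either half can be rebuilt from the other half plus the difference table.
recovered_upper = [lo ^ d for lo, d in zip(lower, diff)]
recovered_lower = [up ^ d for up, d in zip(upper, diff)]
print(recovered_upper == upper, recovered_lower == lower)
```

The three tables together hold 3n/2 entries (12 for the 8-entry sample), which is the 50% overhead mentioned for the 2-port structure.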
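The access circuit 315 comprises circuitry that maps the read ports of the lower structure 305A, the upper structure 305B, and the difference structure 310 to two different read ports 320A and 320B (which may be referred to as the lower read port and the upper read port, respectively). Each read port 320 is configured to receive read requests specifying the read addresses of one or more entries to be read. For each read port 320, the access circuit 315 comprises a multiplexer (MUX) 325 and a difference calculation circuit 330. Each difference calculation circuit 330 is configured to receive the data of corresponding entries from the difference structure 310 and from one of the lower structure 305A or the upper structure 305B, and to compute from them the value of the corresponding entry of the remaining structure (e.g., by performing an XOR operation or other reversible function). For example, any entry in the upper structure 305B (e.g., entry [n/2]) can be determined from an XOR of the corresponding entries of the lower structure 305A and the difference structure 310 (e.g., entry [0] and entry ([0] XOR [n/2])). Thus, a particular read port can provide data corresponding to an entry of the upper structure 305B by combining data retrieved from the lower structure 305A and the difference structure 310, even when the upper structure 305B is unavailable (e.g., because it is being accessed by the other read port). Similarly, when the lower structure 305A is unavailable, the data entries of the lower structure 305A can be determined by accessing the upper structure 305B and the difference structure 310.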
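In some embodiments, the difference circuit 330 comprises a first difference circuit 330A configured to determine entry values of the upper structure 305B using the lower structure 305A and the difference structure 310, and a second difference circuit 330B configured to determine entry values of the lower structure 305A using the upper structure 305B and the difference structure 310. The first difference circuit 330A and the second difference circuit 330B may be referred to as the lower difference circuit and the upper difference circuit, respectively.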
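The MUX 325 comprises a lower MUX 325A and an upper MUX 325B, each configured to select among the lower structure 305A (when a read request addresses the lower half of the stored entries), the upper structure 305B (when a read request addresses the upper half of the stored entries), and the output of one of the difference circuits 330A or 330B, and to provide the selected output to a respective read port 320A/B. For example, the lower read port 320A receives the output of the lower MUX 325A, which is connected to the difference circuit 330A, while the upper read port 320B receives the output of the upper MUX 325B, which is connected to the difference circuit 330B.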
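In some embodiments, a conflict control circuit 335 uses a priority scheme to determine how each of the read ports 320A and 320B accesses the data entries stored by the structures 305A, 305B, and 310. The conflict control circuit 335 is configured to receive the addresses of the read requests received at the read ports, and to resolve conflicts between simultaneously received requests by controlling the MUXes 325A/B to select the structure from which each read port 320A/B should receive its data.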
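For example, as discussed above, the read ports 320 may be designated as a lower read port 320A and an upper read port 320B. The lower read port 320A has "priority" on the lower structure 305A. Accordingly, the conflict control circuit 335 configures the MUX 325A so that all requests through the lower read port 320A for entries in the lower structure 305A are read directly from the lower structure 305A. Similarly, the upper read port 320B has "priority" on the upper structure 305B, so that all requests through the upper read port 320B for entries in the upper structure 305B are read directly from the upper structure 305B. In addition, the conflict control circuit 335 may configure the MUXes 325A/B so that, whenever the two read ports 320A/B have not received simultaneous read requests for data of the same structure, a read port can read directly from the lower structure 305A or upper structure 305B on which it does not have priority. However, if the lower read port 320A and the upper read port 320B both simultaneously receive requests to read one or more entries from the upper structure 305B, the conflict control circuit 335 configures the MUX 325A so that the lower read port 320A instead reads from the output of the difference calculation circuit 330A, which uses the corresponding entries of the lower structure 305A and the difference structure 310 to determine the values of the requested entries of the upper structure 305B. Similarly, if the lower read port 320A and the upper read port 320B simultaneously receive requests to read one or more entries from the lower structure 305A, the conflict control circuit 335 configures the MUX 325B to cause the upper read port 320B to read from the output of the difference calculation circuit 330B.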
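The priority scheme can be modeled in software as follows. This is only a behavioral sketch under stated assumptions: integer entries, XOR differences, one address per port per cycle, and a hypothetical class `TwoPortTable` whose `accesses` counter stands in for the constraint that each 1-port table serves one read per cycle. It does not model the MUX or difference circuits themselves.

```python
class TwoPortTable:
    """Behavioral model of a 2-port table built from three 1-port tables."""

    def __init__(self, entries):
        self.half = len(entries) // 2
        self.lower = list(entries[:self.half])
        self.upper = list(entries[self.half:])
        self.diff = [lo ^ up for lo, up in zip(self.lower, self.upper)]
        self.accesses = {}

    def _fetch(self, name, index):
        # Count reads per 1-port table so a test can confirm none is read twice.
        self.accesses[name] = self.accesses.get(name, 0) + 1
        return getattr(self, name)[index]

    def read(self, lower_port_addr, upper_port_addr):
        """Serve one address on each port within the same simulated cycle."""
        self.accesses = {}
        a_upper = lower_port_addr >= self.half
        b_upper = upper_port_addr >= self.half
        ia = lower_port_addr % self.half
        ib = upper_port_addr % self.half
        if a_upper and b_upper:
            # Conflict on the upper table: the upper port has priority, and the
            # lower port rebuilds the value from the lower and difference tables.
            out_b = self._fetch("upper", ib)
            out_a = self._fetch("lower", ia) ^ self._fetch("diff", ia)
        elif not a_upper and not b_upper:
            # Conflict on the lower table: the lower port has priority.
            out_a = self._fetch("lower", ia)
            out_b = self._fetch("upper", ib) ^ self._fetch("diff", ib)
        else:
            # No conflict: each port reads the table it addresses directly.
            out_a = self._fetch("upper" if a_upper else "lower", ia)
            out_b = self._fetch("upper" if b_upper else "lower", ib)
        return out_a, out_b


table = [3, 14, 15, 92, 65, 35, 89, 79]
two_port = TwoPortTable(table)
print(two_port.read(6, 7))  # both ports address the upper half
```

In every branch the two ports touch disjoint tables, so two simultaneous requests never need the same 1-port table twice.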
Although FIG. 3 shows a particular access circuit configuration, it should be understood that other access circuit configurations are possible in other embodiments. For example, in some embodiments, either read port 320A or 320B can read data entries of the lower structure or the upper structure using the opposite structure and the difference structure. In some embodiments, an access circuit may be configured to map a plurality of memory structures to more than two ports.
2^m-port memory structure
The construction of a 2-port memory structure from 1-port memory structures discussed above can be extrapolated to assemble structures with a greater number of available read ports (e.g., 2^m read ports). FIG. 4 shows a diagram of a 4-port memory structure that can be assembled using three different 2-port memory structures, according to some embodiments. The 4-port memory structure 400 is constructed from three 2-port memory structures: a lower 2-port memory structure 405A, an upper 2-port memory structure 405B, and a difference 2-port memory structure 410. Each of the 2-port memory structures 405A, 405B, and 410 may be constructed in a manner similar to the 2-port memory structure 300 shown in FIG. 3 (e.g., from three 1-port memory structures).
For purposes of discussion, the data entries stored by the table implemented on the 4-port memory structure 400 are divided into data subsets "A", "B", "C", and "D", each corresponding to one quarter of all the data entries of the 4-port structure 400.
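The first 2-port memory structure 405A comprises three 1-port memory structures: a lower structure 415A storing a table containing data subset "A", an upper structure 415B storing a table containing data subset "B", and a difference structure 415C storing a table indicating the difference between data subsets "A" and "B" (e.g., "A⊕B"). Similarly, the second 2-port memory structure 405B comprises a lower structure 420A storing a table containing data subset "C", an upper structure 420B storing a table containing data subset "D", and a difference structure 420C storing a table indicating the difference between data subsets "C" and "D" (e.g., "C⊕D"). The first 2-port memory structure 405A and the second 2-port memory structure 405B thus serve as a 2-port lower structure and a 2-port upper structure of the 4-port memory structure 400. The third 2-port memory structure 410 serves as a 2-port difference structure between the first 2-port memory structure 405A and the second 2-port memory structure 405B. It comprises a lower structure 425A storing a table indicating the difference between data subsets "A" and "C" (e.g., "A⊕C"), an upper structure 425B storing a table indicating the difference between data subsets "B" and "D" (e.g., "B⊕D"), and a difference structure 425C storing a table indicating the difference across all four data subsets (e.g., "(A⊕C)⊕(B⊕D)"). As shown in FIG. 4, the differences between data subsets can be determined using an XOR operation.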
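Each of the 2-port memory structures 405A, 405B, and 410 also comprises a respective access circuit 430 (hereinafter a sub-access circuit 430, e.g., sub-access circuits 430A, 430B, and 430C), whose structure may be substantially similar to that of the access circuit 315 shown in FIG. 3.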
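Each port of each of the three sub-access circuits 430 is connected to an access circuit 435. For example, the first access circuit 435A is connected to the lower read port of each sub-access circuit 430, and the second access circuit 435B is connected to the upper read port of each sub-access circuit 430. Each access circuit 435 may have a structure substantially similar to that of the access circuit 315 of FIG. 3. Because each access circuit 435 has two read ports, the 4-port memory structure 400 has a total of four read ports 440A, 440B, 440C, and 440D, each of which can access any of the data subsets "A", "B", "C", and "D" in parallel.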
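As shown in FIG. 4, the 4-port memory structure 400 for providing access to n data entries can be constructed using nine (i.e., 3^2) 1-port memory structures that each store a table containing n/4 data entries, or equivalently as three 2-port memory structures that each store tables containing a total of 3n/4 data entries. The 4-port memory structure 400 therefore stores tables containing a total of 9n/4 data entries. In general, using the construction scheme discussed above, a memory structure configured with 2^m ports for providing parallel access to n data entries can be constructed using upper, lower, and difference substructures that store a total of (3/2)^m * n entries. By comparison, simply replicating a 1-port memory structure to provide additional ports would require storing 2^m * n entries for 2^m ports.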
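The two storage costs are easy to compare numerically. The short sketch below evaluates (3/2)^m * n against 2^m * n; the function names and the sample value n = 1024 are chosen for illustration only.

```python
from fractions import Fraction


def difference_scheme_entries(n, m):
    """Entries stored by the lower/upper/difference construction: (3/2)^m * n."""
    return Fraction(3, 2) ** m * n


def replication_entries(n, m):
    """Entries stored when a 1-port table is simply replicated once per port."""
    return 2 ** m * n


n = 1024
for m in (1, 2, 3):
    print(m, difference_scheme_entries(n, m), replication_entries(n, m))
```

For m = 2 this gives 9n/4 = 2304 entries for the difference-based structure versus 4096 for plain replication, and the gap widens as m grows.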
FIG. 5 shows a diagram of how the ports of the 4-port memory structure can access any data entry stored by the structure in parallel. As discussed above, the data entries stored by the 4-port memory structure can be divided into four data subsets (quadrants) "A", "B", "C", and "D", each of which can be stored as a table on a single 1-port memory structure. In addition, the 4-port structure contains five additional 1-port structures storing tables that indicate the differences between one or more pairs of data subsets (e.g., four structures that each store the difference between one pair of data subsets, and one structure that stores the difference across two pairs of data subsets). The 4-port memory structure is thus assembled from nine 1-port memory structures (e.g., grouped into three 2-port memory structures, as shown in FIG. 4).
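With the above construction, a read port of a 4-port memory structure can access a particular data entry (e.g., a data entry of data subset "A") using one of four different methods, which allows all four read ports of the memory structure to access the data entry in parallel. Using a first method 502, a read port can access data subset "A" through the 1-port memory structure storing the table that contains data subset "A" (e.g., the structure 415A shown in FIG. 4). The remaining methods 504, 506, and 508, on the other hand, reconstruct data subset "A" by accessing a plurality of other memory structures. For example, using the second method 504, a read port can access the memory structures storing the tables that contain data subset "B" (structure 415B) and the difference between data subsets "A" and "B" (structure 415C) to determine an entry of data subset "A". Alternatively, using the third method 506, a port can access the memory structures storing the tables that contain data subset "C" (structure 420A) and the difference between "A" and "C" (structure 425A) to determine an entry of data subset "A". Using the fourth method 508, a port accesses the memory structures storing the tables that contain data subset "D" (structure 420B), the difference between "C" and "D" (structure 420C), the difference between "B" and "D" (structure 425B), and the difference across all four data subsets (structure 425C) to determine an entry of data subset "A". Each of the remaining data subsets "B", "C", and "D" can likewise be determined using one of four different methods. Thus, any data from any of the data subsets can be accessed or determined at each of the four read ports in parallel.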
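The four access methods can be checked with a small numeric example. The sketch assumes XOR differences and integer entries; the sample values and the helper `xor` are illustrative, and the dictionary keys simply reuse the method labels 502 to 508.

```python
def xor(*tables):
    """Element-wise XOR of equally sized tables."""
    out = [0] * len(tables[0])
    for table in tables:
        out = [x ^ y for x, y in zip(out, table)]
    return out


# Four data subsets (sample values) and the five difference tables.
A, B, C, D = [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 0]
AB, CD = xor(A, B), xor(C, D)      # tables held by structures 415C and 420C
AC, BD = xor(A, C), xor(B, D)      # tables held by structures 425A and 425B
ABCD = xor(AC, BD)                 # table held by structure 425C

methods = {
    "502": A,                      # direct read
    "504": xor(B, AB),             # B and A xor B
    "506": xor(C, AC),             # C and A xor C
    "508": xor(D, CD, BD, ABCD),   # D, C xor D, B xor D, and the 4-way difference
}
for label, value in methods.items():
    print(label, value)
```

The four methods use 1, 2, 2, and 4 tables respectively, and no table appears in more than one method, so all nine 1-port tables can be read in the same cycle to serve four requests for subset "A".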
The techniques and constructions described above can be further extrapolated to construct memory structures with 2^m read ports. FIG. 6 shows a memory structure with 2^m read ports constructed from three 2^(m-1)-port structures, according to some embodiments. Each 2^m-port memory structure for providing access to n data entries is constructed using three 2^(m-1)-port memory structures: a lower structure 605A, an upper structure 605B, and a difference structure 610. Each 2^(m-1)-port memory structure provides access to n/2 data entries. For example, the lower half of the n data entries is stored in the lower structure 605A, the upper half of the n data entries is stored in the upper structure 605B, and the difference structure 610 stores the differences between corresponding lower-half and upper-half entries.
Each of the 2^(m-1) ports of each 2^(m-1)-port memory structure is mapped to an access circuit 615 (e.g., access circuits 615-1 through 615-2^(m-1)). For example, a first port of each of the three 2^(m-1)-port memory structures is mapped to a first access circuit 615-1, a second port of each of the three 2^(m-1)-port memory structures is mapped to a second access circuit 615-2, and so on, up to the access circuit 615-2^(m-1).
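Each access circuit 615 contains two read ports 625 (e.g., a lower read port and an upper read port, 625-1 and 625-2 ... 625-(2^(m-1)-1) and 625-2^(m-1)), each of which can either directly access the 2^(m-1)-port lower structure 605A and the 2^(m-1)-port upper structure 605B, or use the difference structure 610 and the opposite structure to determine the values of data entries of the lower or upper structure. For example, each access circuit may be configured so that its lower read port can always access the lower sub-table 605A directly, but uses the lower sub-table 605A and the difference sub-table 610 to determine the values of entries in the upper sub-table 605B when the upper read port needs to access the upper sub-table 605B at the same time. Similarly, the upper read port can always access the upper sub-table 605B, but uses the upper sub-table 605B and the difference sub-table 610 to determine the values of entries in the lower sub-table 605A when the lower read port accesses the lower sub-table 605A at the same time.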
Thus, as shown in FIG. 6, the 2^m-port memory structure comprises 2^(m-1) access circuits, each mapped to each of the three 2^(m-1)-port memory structures. Because each access circuit 615 contains two ports, a total of 2^m ports are available. This allows a 2^m-port structure to be constructed using (3/2)^m * n entries.
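The recursive construction can be simulated in software. The sketch below is a behavioral model only, assuming integer entries, XOR differences, a table length divisible by 2^m, and one address per port per cycle. The class name `MultiPortTable`, the pairing of ports 2i and 2i+1 into access circuit i, and the use of `None` for an idle port are assumptions made for this example.

```python
class MultiPortTable:
    """Read-only model of a 2**m-port table built from 1-port leaf tables."""

    def __init__(self, entries, m):
        self.m = m
        if m == 0:
            self.data = list(entries)            # a 1-port leaf table
            return
        self.half = len(entries) // 2
        lower, upper = entries[:self.half], entries[self.half:]
        diff = [lo ^ up for lo, up in zip(lower, upper)]
        self.subs = {
            "lower": MultiPortTable(lower, m - 1),
            "upper": MultiPortTable(upper, m - 1),
            "diff": MultiPortTable(diff, m - 1),
        }

    def leaf_count(self):
        if self.m == 0:
            return 1
        return sum(sub.leaf_count() for sub in self.subs.values())

    def read(self, addrs):
        """Serve 2**m addresses (None = idle port) in one simulated cycle."""
        if self.m == 0:
            (addr,) = addrs                      # a leaf accepts a single request
            return [None if addr is None else self.data[addr]]
        n_circuits = len(addrs) // 2
        requests = {name: [None] * n_circuits for name in self.subs}
        sources = []                             # per port: substructures to XOR
        for i in range(n_circuits):
            pair = addrs[2 * i:2 * i + 2]
            sides = [None if a is None else ("upper" if a >= self.half else "lower")
                     for a in pair]
            for port, (addr, side) in enumerate(zip(pair, sides)):
                if addr is None:
                    sources.append(())
                    continue
                owner = ("lower", "upper")[port]  # half this port has priority on
                if sides[0] == sides[1] and side != owner:
                    names = (owner, "diff")      # rebuild from other half + diff
                else:
                    names = (side,)              # direct read
                for name in names:
                    requests[name][i] = addr % self.half
                sources.append(names)
        # Each substructure is called once per cycle, with one address per port.
        results = {name: sub.read(requests[name]) for name, sub in self.subs.items()}
        out = []
        for port, names in enumerate(sources):
            if not names:
                out.append(None)
                continue
            value = 0
            for name in names:
                value ^= results[name][port // 2]
            out.append(value)
        return out


table = [7, 1, 8, 2, 8, 1, 8, 2]
four_port = MultiPortTable(table, m=2)
print(four_port.leaf_count())        # number of 1-port leaf tables
print(four_port.read([5, 5, 5, 5]))  # all four ports request the same entry
```

Because every substructure receives exactly one request list per cycle and each leaf accepts a single address, the model never reads a 1-port table twice in the same cycle, even when all ports ask for the same entry.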
Although the techniques described herein are discussed primarily in terms of constructing memory structures with multiple read ports from 1-port memory structures, it should be understood that in other embodiments, memory structures with more than one read port (e.g., 2 read ports, 3 read ports, and so on) can be used to construct memory structures with additional read ports. For example, three memory structures that each have k read ports can be used to construct a memory structure with up to 2k read ports using the configuration described above.
In addition, the constructed memory structure need not be limited to 2^m ports. For example, if a particular level of the memory structure has a number of ports that is not a power of 2, a subsequent level may also have a number of ports that is not a power of 2 (e.g., three 3-port memory structures can be used to construct a 6-port memory structure). Furthermore, fewer access circuits may be used at a given level to reduce the total number of available read ports. For example, referring to FIG. 6, fewer than 2^(m-1) access circuits may be used, so that not every port of each 2^(m-1)-port substructure is mapped to an access circuit 615, resulting in fewer than 2^m total read ports.
Although the above examples show each level of a multi-read-port memory structure constructed using two substructures that store subsets of the data (e.g., a lower half and an upper half), it should be understood that a different number of substructures may be used in other embodiments. For example, in some embodiments, the data entries may be divided among three substructures and a difference substructure. In place of a difference corresponding to an XOR operation, a different operation (such as addition mod 3) may be used. In some embodiments, an access circuit may be configured to connect a plurality of substructures to more than two ports and to control access to the plurality of substructures.
Writing data
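A multi-read-port memory structure (e.g., the 2^m-port memory structure 600 described in FIG. 6) constructed from a plurality of constituent memory structures (e.g., the 1-port memory structure 200 described in FIG. 2) can be viewed as comprising a plurality of levels. For example, a 2^m-port memory structure constructed from a plurality of 1-port memory structures may comprise m levels, each level comprising 3^k memory structures that each have 2^(m-k) read ports, where k indicates the level and is an integer from 1 to m. For example, referring to the 4-port memory structure 400 shown in FIG. 4, the memory structure 400 has a k=1 level containing three 2-port memory structures and a k=2 level containing nine 1-port memory structures.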
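The level counts follow directly from the formula. As a small sketch (the function name `levels` is hypothetical), the structures per level for a 2^m-port memory can be listed as:

```python
def levels(m):
    """Return (k, number of structures, read ports per structure) for k = 1..m."""
    return [(k, 3 ** k, 2 ** (m - k)) for k in range(1, m + 1)]


print(levels(2))
print(levels(3))
```

For m = 2 this reproduces the FIG. 4 example: three 2-port structures at level 1 and nine 1-port structures at level 2.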
When data is written to a 2^m-port memory structure, a recursive write procedure is used so that the data to be written is reflected in all levels of the structure. For example, referring to the configuration shown in FIG. 4, to write new data to a particular data subset (e.g., data subset "B"), the data needs to be written to the 2-port lower structure 405A and the difference structure 410. Within the 2-port lower structure 405A, the data is written to the upper structure 415B (which contains the table storing data subset "B"). The data of the difference structure 415C (storing A⊕B) is also recalculated. In addition, within the 2-port difference structure 410, the data in the upper structure 425B (storing B⊕D) and the difference structure 425C (storing (A⊕C)⊕(B⊕D)) also needs to be recalculated.
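A write to the 4-port arrangement can be sketched as follows. The example assumes XOR differences and integer entries, and it patches each dependent table by XORing in the changed bits rather than recomputing it from scratch. That incremental update is a choice made for this sketch, not something the description prescribes. The names `GROUPS`, `build_tables`, and `write_entry` are hypothetical.

```python
from functools import reduce

# One key per 1-port table; a key lists the data subsets XORed into that table.
GROUPS = ["A", "B", "C", "D", "AB", "CD", "AC", "BD", "ABCD"]


def build_tables(subsets):
    """Build the nine 1-port tables from the four data subsets."""
    size = len(subsets["A"])
    return {
        group: [reduce(lambda acc, s: acc ^ subsets[s][i], group, 0)
                for i in range(size)]
        for group in GROUPS
    }


def write_entry(tables, subset, index, value):
    """Write one entry and patch every table that depends on that subset."""
    delta = tables[subset][index] ^ value   # bits that change in this entry
    touched = []
    for group, table in tables.items():
        if subset in group:
            table[index] ^= delta           # the new XOR differs by exactly delta
            touched.append(group)
    return touched


subsets = {"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]}
tables = build_tables(subsets)
print(write_entry(tables, "B", 1, 99))
```

A write to subset "B" touches four tables: "B", "AB", "BD", and "ABCD", which correspond to structures 415B, 415C, 425B, and 425C in the example above.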
Additional configuration information
The foregoing description of the embodiments of the invention has been presented for purposes of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art will appreciate that many modifications and variations are possible in light of the above disclosure.
Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combination thereof.
Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor to perform any or all of the steps, operations, or processes described.
Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer-readable storage medium, or in any type of media suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing systems referred to in this specification may include a single processor or may be architectures employing multiple-processor designs for increased computing capability.
Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer-readable storage medium and may include any embodiment of a computer program product or other data combination described herein.
Finally, the language used in this specification has been selected principally for readability and instructional purposes, and it has not been selected to delineate or circumscribe the inventive subject matter. It is therefore intended that the scope of the invention be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.
Claims (15)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW108109969A TWI719433B (en) | 2019-03-22 | 2019-03-22 | Data structures with multiple read ports, processor, and method for data structures with multiple read ports |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202036274A TW202036274A (en) | 2020-10-01 |
TWI719433B true TWI719433B (en) | 2021-02-21 |
Family
ID=74091174
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW108109969A TWI719433B (en) | 2019-03-22 | 2019-03-22 | Data structures with multiple read ports, processor, and method for data structures with multiple read ports |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI719433B (en) |
-
2019
- 2019-03-22 TW TW108109969A patent/TWI719433B/en active
Also Published As
Publication number | Publication date |
---|---|
TW202036274A (en) | 2020-10-01 |