TWI719433B - Data structures with multiple read ports, processor, and method for data structures with multiple read ports - Google Patents


Publication number
TWI719433B
Authority
TW
Taiwan
Prior art keywords
data
read
memory
subset
port
Prior art date
Application number
TW108109969A
Other languages
Chinese (zh)
Other versions
TW202036274A (en)
Inventor
強納森 亞歷山德 羅斯
葛瑞格 M 索爾森
Original Assignee
美商葛如克公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 美商葛如克公司
Priority to TW108109969A
Publication of TW202036274A
Application granted
Publication of TWI719433B

Abstract

A memory structure having 2^m read ports, allowing concurrent access to n data entries, can be constructed from three memory structures each having 2^(m-1) read ports. The three memory structures comprise two structures that each provide access to one half of the n data entries, and a difference structure that provides access to difference data between the two halves. Each pair of the 2^m ports is connected to a respective port of each of the 2^(m-1)-port structures, such that each port of the pair can access data entries of a first half of the n data entries either by accessing the structure storing that half directly, or by accessing both the difference structure and the structure containing the second half to reconstruct the data entries of the first half. A pair of ports can thus concurrently access any of the stored data entries.

Description

Data structure with multiple read ports, processor, and method for data structure with multiple read ports

The present invention relates generally to the storage of data structures, and more specifically to the storage of data structures having multiple read ports.

Data structures such as lookup tables can be used in many applications to perform a function on received input data. For example, an arithmetic logic unit (ALU) can perform an operation on a received input value by looking that value up in a lookup table and returning the corresponding output value.

In some cases, such as single instruction multiple data (SIMD) applications, it is desirable to perform the same operation on different sets of input data in parallel. Multiple ALUs or other circuits therefore need to be able to access the data contained in the lookup table in parallel.

A memory structure with multiple read ports can be used to allow multiple ALUs or other processing devices to access a common data structure (such as a lookup table) in parallel. Such a memory structure can be constructed from a plurality of memory structures that each have fewer read ports.

A memory structure having 2^m read ports that allow concurrent access to n data entries can be constructed from three memory structures (e.g., sub-structures) each having 2^(m-1) read ports. The three memory structures include: a first structure providing access to a first half (n/2 entries) of the n data entries; a second structure providing access to a second half (n/2 entries) of the n data entries; and a difference structure providing access to difference data (n/2 entries) between the first half and the second half of the n data entries. Each of the 2^m ports can be connected to a respective port of each of the 2^(m-1)-port structures, so that a port can access data from the first half of the n data entries either by accessing the first structure, or by accessing both the difference structure and the second structure to reconstruct the data stored by the first structure. Similarly, a port can access data from the second half of the n data entries either by accessing the second structure, or by accessing both the difference structure and the first structure to reconstruct the data stored by the second structure.

Thus, a 2-port memory structure for accessing n data entries can be constructed from three 1-port memory structures each storing n/2 data entries. Similarly, a 2^m-port memory structure for accessing n data entries can be constructed from multiple 1-port memory structures that together store a total of (3/2)^m * n entries.

100: multi-port memory structure

102: arithmetic logic unit (ALU)

104: read port

200: 1-port memory structure

205: read port

300: 2-port memory structure

305A: first 1-port memory structure/lower structure

305B: second 1-port memory structure/upper structure

310: third 1-port structure/difference structure

315: access circuit

320: access circuit

320A: lower read port

320B: upper read port

325: multiplexer (MUX)

325A: lower MUX

325B: upper MUX

330: difference circuit

330A: first difference circuit

330B: second difference circuit

335: conflict control circuit

400: 4-port memory structure

405A: lower 2-port memory structure/first 2-port memory structure

405B: upper 2-port memory structure/second 2-port memory structure

410: difference 2-port memory structure/third 2-port memory structure

415A: lower structure

415B: upper structure

415C: difference structure

420A: lower structure

420B: upper structure

420C: difference structure

425A: lower structure

425B: upper structure

425C: difference structure

430: sub-access circuit

430A: sub-access circuit

430B: sub-access circuit

430C: sub-access circuit

435: access circuit

435A: first access circuit

435B: second access circuit

440A: read port

440B: read port

440C: read port

440D: read port

502: first method

504: second method

506: third method

508: fourth method

600: 2^m-port memory structure

605A: lower structure/lower sub-table

605B: upper structure/upper sub-table

610: difference structure/difference sub-table

615: access circuit

615-1 to 615-2^(m-1): access circuits

625: read port

625-1 to 625-2^(m-1): read ports

FIG. 1 is a block diagram of a processor having a memory structure with multiple read ports, according to some embodiments.

FIG. 2 illustrates a memory structure with a single read port, according to some embodiments.

FIG. 3 illustrates a memory structure with two read ports, according to some embodiments.

FIG. 4 is a diagram of a 4-port structure that can be assembled from three different 2-port structures, according to some embodiments.

FIG. 5 is a diagram showing how the ports of the 4-port structure can access any data entry of the structure in parallel.

FIG. 6 illustrates a structure having 2^m read ports constructed from three 2^(m-1)-port structures, according to some embodiments.

The figures depict embodiments of the invention for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles, or benefits, of the invention described herein.

The figures and the following description relate to preferred embodiments by way of illustration only. It should be noted that, from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.

A data structure, such as a lookup table, can be used by an arithmetic logic unit (ALU) or other circuit to perform an operation on a received input value. In many parallel processing applications, such as single instruction multiple data (SIMD) applications, multiple ALUs need to access the data structure in parallel. It is therefore desirable to implement the data structure on a memory structure (e.g., a random access memory (RAM) or read-only memory (ROM)) having multiple read ports. In addition, although this description refers primarily to ALUs reading data from the data structure through one or more read ports, in other embodiments any other type of circuit or consumer may read data from the data structure through one or more read ports.

FIG. 1 is a block diagram of a processor that includes a memory structure having multiple read ports, according to some embodiments. The processor may be an integrated circuit (IC) device. In some embodiments, the processor is a processor specialized for tensor processing. The processor includes a multi-port memory structure 100 and a plurality of ALUs 102. The multi-port memory structure 100 includes dynamic random access memory (DRAM) cells, or another type of memory, that store a data structure (e.g., a lookup table) accessed by the plurality of ALUs 102. In some embodiments, the data structure is associated with a function and maps function input values to function output values. For example, the data structure implements an activation function of a machine learning model, such as a rectified linear unit (ReLU) function, a binary step function, an arctangent function, or another function.
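As an illustration of a lookup table that maps function inputs to outputs, the following minimal sketch tabulates ReLU. The 8-bit two's-complement input encoding and the names `build_relu_table` and `relu_lookup` are assumptions made for this example; the source does not specify an input width or encoding.

```python
def build_relu_table(bits=8):
    """Tabulate ReLU for every `bits`-wide two's-complement input code.

    The encoding is an assumption for illustration only.
    """
    size = 1 << bits
    table = []
    for code in range(size):
        # Interpret the table index as a signed two's-complement value.
        value = code - size if code >= size // 2 else code
        table.append(max(0, value))
    return table


RELU_TABLE = build_relu_table()


def relu_lookup(x):
    """Apply ReLU to a signed 8-bit value with a single table read."""
    return RELU_TABLE[x & 0xFF]
```

Each call to `relu_lookup` corresponds to one read of the table, which is why several ALUs operating in parallel each need their own read port.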

The ALUs 102 may be part of a SIMD or other parallel processor, in which each ALU 102 is configured to perform the same arithmetic operation on a different set of input data. For example, each ALU 102 receives a respective input data set and performs one or more lookups on the data structure stored in memory structure 100 to produce a respective output data set based on the function associated with the data structure. For the ALUs 102 to operate in parallel, the plurality of ALUs 102 need to be able to access the data structure on memory structure 100 concurrently. For example, FIG. 1 shows memory structure 100 with four read ports 104, each connected to one of four different ALUs 102. In one embodiment, each read port has its own dedicated address bus and its own dedicated data bus. An ALU reads data through a read port by providing an address on the address bus, and memory structure 100 returns the data of the data structure located at that address. As used herein, "concurrently" may refer to occurring during a common time period (e.g., a clock cycle). For example, each of the plurality of ALUs may transmit a read request to memory structure 100 during a particular clock cycle, in which case the transmitted read requests are considered concurrent with one another.

In some embodiments, a memory structure with multiple read ports (such as memory structure 100) can be constructed from memory structures with fewer read ports. For example, memory structure 100 may be constructed from a plurality of memory structures each having a single read port. FIG. 2 illustrates a memory structure with a single read port, according to some embodiments. Memory structure 200 stores a plurality of data entries (e.g., entries [0] to [n-1], where n is an integer of 2 or greater). Because memory structure 200 has only a single read port 205, only a single ALU can access the data contained in memory structure 200 at a time.

One example way to allow multiple ALUs to access any of the data of memory structure 200 in parallel is to replicate the data (e.g., entries [0] to [n-1]) across multiple single-read-port memory structures. For example, the data of structure 200 may be replicated in a second single-read-port memory structure to form a combined memory structure with two read ports, each of which can be accessed independently by a different ALU and provides access to any of the data in the original structure. However, this configuration also doubles the amount of memory needed to store the data, because each of entries [0] to [n-1] is replicated across the two 1-port memory structures. With this type of configuration, building a memory structure with n entries accessible through 2^m read ports requires storing a total of n * 2^m entries, where m is a positive integer.

2-port memory structure

FIG. 3 illustrates a memory structure with two read ports, according to some embodiments. As shown in FIG. 3, a 2-port memory structure 300 that allows parallel access to any of n data entries (e.g., entries [0] to [n-1]) is built from three 1-port memory structures. Each 1-port memory structure contains a table or other data structure storing half the number of data entries of the original data structure (e.g., n/2 entries). Thus, compared with the original 1-port memory structure (e.g., the 1-port structure 200 shown in FIG. 2), the 2-port memory structure 300 needs to store only 50% more entries, while still allowing all n entries to be accessed concurrently through either of the two read ports.

The 2-port memory structure 300 includes a first 1-port memory structure 305A storing a table containing a first half of the data entries [0] to [n-1], and a second 1-port memory structure 305B storing a table containing a second half of the entries [0] to [n-1]. For ease of explanation, the first half of the data entries may also be referred to as the "lower" half (e.g., entries [0] to [n/2-1]), and the second half as the "upper" half (e.g., entries [n/2] to [n-1]). Accordingly, the first structure 305A may be referred to as the "lower structure" and the second structure 305B as the "upper structure".

In addition to lower structure 305A and upper structure 305B, the 2-port structure 300 further includes a third 1-port structure 310 (hereinafter the "difference structure") storing n/2 entries, each of which indicates the difference between a corresponding entry of the lower structure and a corresponding entry of the upper structure. For example, the difference structure may store entries indicating the difference between entry [0] of the lower structure and entry [n/2] of the upper structure, the difference between entry [1] and entry [n/2+1], and so on. The difference may be determined using any function that allows the value of a data entry in one half to be determined from only the corresponding difference value and the corresponding data entry of the other half. For example, in some embodiments, the entries of the difference structure are generated as the exclusive-or (XOR) of the corresponding entries of the lower structure and the upper structure. The value of a particular data entry in the lower half can then be determined from the corresponding upper-half data entry and the XOR value, without accessing the lower structure. In other embodiments, a reversible function other than XOR may be used to compute the difference entries.
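The XOR-based difference table described above can be sketched in a few lines. This is a minimal illustration that assumes integer-valued entries and an even entry count; the function names are hypothetical and do not come from the source.

```python
def build_tables(entries):
    """Split entries into lower half, upper half, and their XOR differences."""
    if len(entries) % 2:
        raise ValueError("this sketch assumes an even number of entries")
    half = len(entries) // 2
    lower = list(entries[:half])
    upper = list(entries[half:])
    # diff[i] pairs lower entry [i] with upper entry [n/2 + i].
    diff = [lo ^ up for lo, up in zip(lower, upper)]
    return lower, upper, diff


def recover_upper(lower, diff, i):
    """Rebuild upper-half entry i without reading the upper table."""
    return lower[i] ^ diff[i]


def recover_lower(upper, diff, i):
    """Rebuild lower-half entry i without reading the lower table."""
    return upper[i] ^ diff[i]


entries = [3, 14, 15, 92, 65, 35, 89, 79]
lower, upper, diff = build_tables(entries)
```

XOR works here because it is its own inverse: `(a ^ b) ^ a == b`. Any other reversible function would need a matching inverse in the recovery step.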

Access circuit 315 comprises circuitry that maps the read ports of lower structure 305A, upper structure 305B, and difference structure 310 to two different read ports 320A and 320B (which may be referred to as the lower read port and the upper read port, respectively). Each read port 320 is configured to receive read requests specifying a read address of one or more entries to be read. For each read port 320, access circuit 315 includes a multiplexer (MUX) 325 and a difference calculation circuit 330. Each difference calculation circuit 330 is configured to receive the data of corresponding entries from difference structure 310 and from one of lower structure 305A or upper structure 305B, and to compute from them the value of the corresponding entry of the remaining structure, upper structure 305B or lower structure 305A (e.g., by performing an XOR operation or other reversible function). For example, any entry in upper structure 305B (e.g., entry [n/2]) can be determined from an XOR of the corresponding entries of lower structure 305A and difference structure 310 (e.g., entry [0] and entry ([0] XOR [n/2])). A particular read port can therefore provide data corresponding to an entry of upper structure 305B by combining data retrieved from lower structure 305A and difference structure 310, even when upper structure 305B is unavailable (e.g., because it is being accessed by the other read port). Similarly, when lower structure 305A is unavailable, the data entries of lower structure 305A can be determined by accessing upper structure 305B and difference structure 310.

In some embodiments, difference circuit 330 includes a first difference circuit 330A configured to determine entry values of upper structure 305B using lower structure 305A and difference structure 310, and a second difference circuit 330B configured to determine entry values of lower structure 305A using upper structure 305B and difference structure 310. The first difference circuit 330A and the second difference circuit 330B may be referred to as the lower difference circuit and the upper difference circuit, respectively.

MUX 325 includes a lower MUX 325A and an upper MUX 325B, each configured to select among lower structure 305A (when a read request targets an address in the lower half of the stored entries), upper structure 305B (when a read request targets an address in the upper half of the stored entries), and the output of one of difference circuits 330A or 330B, and to provide the selected output to the respective read port 320A/B. For example, lower read port 320A receives the output of lower MUX 325A, which is connected to difference circuit 330A, and upper read port 320B receives the output of upper MUX 325B, which is connected to difference circuit 330B.

In some embodiments, a conflict control circuit 335 uses a priority scheme to determine how each of read ports 320A and 320B accesses the data entries stored by structures 305A, 305B, and 310. Conflict control circuit 335 is configured to receive the addresses of the read requests received at the read ports, and to resolve conflicts between concurrently received requests by controlling MUXes 325A/B to select the structure from which each read port 320A/B receives its data.

For example, as discussed above, the read ports 320 can be designated as a lower read port 320A and an upper read port 320B. Lower read port 320A has "priority" for lower structure 305A. Conflict control circuit 335 therefore configures MUX 325A so that all requests through lower read port 320A for entries in lower structure 305A are read directly from lower structure 305A. Similarly, upper read port 320B has "priority" for upper structure 305B, so that all requests through upper read port 320B for entries in upper structure 305B are read directly from upper structure 305B. In addition, conflict control circuit 335 can configure MUXes 325A/B so that, whenever the two read ports 320A/B do not receive concurrent read requests for data from the same structure, a read port can read directly from the lower structure 305A or upper structure 305B for which it does not have priority. However, if lower read port 320A and upper read port 320B both concurrently receive requests to read one or more entries from upper structure 305B, conflict control circuit 335 configures MUX 325A so that lower read port 320A instead reads from the output of difference calculation circuit 330A, which uses the corresponding entries of lower structure 305A and difference structure 310 to determine the values of the requested entries of upper structure 305B. Similarly, if lower read port 320A and upper read port 320B concurrently receive requests to read one or more entries from lower structure 305A, conflict control circuit 335 configures MUX 325B so that upper read port 320B reads from the output of difference calculation circuit 330B.
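The priority scheme above can be modelled behaviourally. The following sketch is a software model under stated assumptions, not the circuit itself: entries are integers, the entry count is even, each port requests a single entry per cycle, and the class and method names are hypothetical. It also counts how often each 1-port table is read, so that the "at most one read per table per cycle" property can be checked.

```python
class TwoPortTable:
    """Behavioural model of a 2-port table built from three 1-port tables."""

    def __init__(self, entries):
        if len(entries) % 2:
            raise ValueError("this sketch assumes an even number of entries")
        self.half = len(entries) // 2
        self.lower = list(entries[:self.half])
        self.upper = list(entries[self.half:])
        self.diff = [lo ^ up for lo, up in zip(self.lower, self.upper)]

    def read_cycle(self, lower_port_addr, upper_port_addr):
        """Serve one request per port in the same cycle.

        Returns (lower_port_data, upper_port_data, accesses), where
        accesses counts the reads of each 1-port table in this cycle.
        """
        accesses = {"lower": 0, "upper": 0, "diff": 0}

        def direct(addr):
            # Read the table that actually holds the requested entry.
            if addr < self.half:
                accesses["lower"] += 1
                return self.lower[addr]
            accesses["upper"] += 1
            return self.upper[addr - self.half]

        def reconstructed(addr):
            # Read the opposite half plus the difference table, then XOR.
            accesses["diff"] += 1
            if addr < self.half:
                accesses["upper"] += 1
                return self.upper[addr] ^ self.diff[addr]
            i = addr - self.half
            accesses["lower"] += 1
            return self.lower[i] ^ self.diff[i]

        lower_wants_upper = lower_port_addr >= self.half
        upper_wants_upper = upper_port_addr >= self.half

        if lower_wants_upper and upper_wants_upper:
            # Conflict on the upper table: the upper port has priority.
            return reconstructed(lower_port_addr), direct(upper_port_addr), accesses
        if not lower_wants_upper and not upper_wants_upper:
            # Conflict on the lower table: the lower port has priority.
            return direct(lower_port_addr), reconstructed(upper_port_addr), accesses
        # No conflict: both ports read their target tables directly.
        return direct(lower_port_addr), direct(upper_port_addr), accesses
```

In every case each of the three tables is read at most once per cycle, which is what allows them to be single-ported.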

Although FIG. 3 shows one particular access circuit configuration, it should be understood that other access circuit configurations are possible in other embodiments. For example, in some embodiments, either read port 320A or 320B can read data entries of the lower structure or the upper structure using the opposite structure and the difference structure. In some embodiments, an access circuit may be configured to map a plurality of memory structures to more than two ports.

2^m-port memory structure

The construction of a 2-port memory structure from 1-port memory structures discussed above can be extended to assemble structures with larger numbers of available read ports (e.g., 2^m read ports). FIG. 4 is a diagram of a 4-port memory structure that can be assembled from three different 2-port memory structures, according to some embodiments. The 4-port memory structure 400 is constructed from three 2-port memory structures: a lower 2-port memory structure 405A, an upper 2-port memory structure 405B, and a difference 2-port memory structure 410. Each of the 2-port memory structures 405A, 405B, and 410 may be constructed in a manner similar to the 2-port memory structure 300 shown in FIG. 3 (e.g., from three 1-port memory structures).

For purposes of discussion, the data entries stored by the table implemented on the 4-port memory structure 400 are divided into data subsets "A", "B", "C", and "D", each corresponding to one quarter of all the data entries of the 4-port structure 400.

The first 2-port memory structure 405A comprises three 1-port memory structures: a lower structure 415A storing a table containing data subset "A", an upper structure 415B storing a table containing data subset "B", and a difference structure 415C storing a table indicating the difference between data subsets "A" and "B" (e.g., "A⊕B"). Similarly, the second 2-port memory structure 405B comprises a lower structure 420A storing a table containing data subset "C", an upper structure 420B storing a table containing data subset "D", and a difference structure 420C storing a table indicating the difference between data subsets "C" and "D" (e.g., "C⊕D"). The first 2-port memory structure 405A and the second 2-port memory structure 405B thus serve as a 2-port lower structure and a 2-port upper structure of the 4-port memory structure 400. The third 2-port memory structure 410 serves as a 2-port difference structure between the first 2-port memory structure 405A and the second 2-port memory structure 405B. It comprises a lower structure 425A storing a table indicating the difference between data subsets "A" and "C" (e.g., "A⊕C"), an upper structure 425B storing a table indicating the difference between data subsets "B" and "D" (e.g., "B⊕D"), and a difference structure 425C storing a table indicating the difference across all four data subsets (e.g., "(A⊕C)⊕(B⊕D)"). As shown in FIG. 4, an XOR operation can be used to determine the differences between data subsets.

Each of the 2-port memory structures 405A, 405B, and 410 also includes a respective access circuit 430 (hereinafter a sub-access circuit 430, e.g., sub-access circuits 430A, 430B, and 430C), whose structure may be substantially similar to that of the access circuit 315 shown in FIG. 3.

Each port of each of the three sub-access circuits 430 is connected to an access circuit 435. For example, a first access circuit 435A is connected to the lower read port of each sub-access circuit 430, and a second access circuit 435B is connected to the upper read port of each sub-access circuit 430. Each access circuit 435 may have a structure substantially similar to that of the access circuit 315 of FIG. 3. Because each access circuit 435 has two read ports, the 4-port memory structure 400 has a total of four read ports 440A, 440B, 440C, and 440D, each of which can access any of data subsets "A", "B", "C", and "D" in parallel.

As shown in FIG. 4, the 4-port memory structure 400 providing access to n data entries can be constructed from nine (i.e., 3^2) 1-port memory structures each storing a table of n/4 data entries, or equivalently from three 2-port memory structures each storing tables containing a total of 3n/4 data entries. The 4-port memory structure 400 therefore stores tables containing a total of 9n/4 data entries. In general, using the construction scheme discussed above, a memory structure configured with 2^m ports providing parallel access to n data entries can be constructed from upper, lower, and difference sub-structures storing a total of (3/2)^m * n entries. By comparison, simply replicating a 1-port memory structure to provide the additional ports would require storing 2^m * n entries for 2^m ports.
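The storage figures above can be checked with a short sketch that applies the lower/upper/difference split recursively and counts the resulting 1-port tables. It assumes integer entries, XOR as the difference function, and an entry count divisible by 2^m; the function names are hypothetical.

```python
def difference_scheme_entries(n, m):
    """Total entries stored by the difference scheme: (3/2)^m * n."""
    return n * 3**m // 2**m


def replication_entries(n, m):
    """Total entries stored when a 1-port table is copied 2^m times."""
    return n * 2**m


def leaf_tables(entries, m):
    """Recursively split a table into the 1-port tables of a 2^m-port structure."""
    if m == 0:
        return [list(entries)]
    half = len(entries) // 2
    lower, upper = entries[:half], entries[half:]
    diff = [lo ^ up for lo, up in zip(lower, upper)]
    # Each of the three sub-structures is itself a 2^(m-1)-port structure.
    return (leaf_tables(lower, m - 1)
            + leaf_tables(upper, m - 1)
            + leaf_tables(diff, m - 1))


n, m = 16, 2
leaves = leaf_tables(list(range(n)), m)
stored = sum(len(t) for t in leaves)  # 9 tables of n/4 = 4 entries each
```

For n = 16 and m = 2 this yields nine tables holding 36 entries in total (9n/4), against 64 entries for plain replication. The gap widens as m grows, since the ratio between the two schemes is (3/4)^m.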

FIG. 5 is a diagram illustrating how the ports of the 4-port memory structure can concurrently access any data entry stored by the structure. As discussed above, the data entries stored by the 4-port memory structure can be divided into four data-subset quadrants, "A", "B", "C", and "D", each of which can be stored as a table on a single 1-port memory structure. In addition, the 4-port structure contains five additional 1-port structures storing tables that indicate differences between one or more pairs of data subsets (e.g., four structures each storing the difference between one pair of data subsets, and one structure storing the difference across two pairs of data subsets). The 4-port memory structure is thus assembled from nine 1-port memory structures (e.g., grouped into three 2-port memory structures, as illustrated in FIG. 4).

Using the above construction, a read port of the 4-port memory structure can access a particular data entry (e.g., a data entry of data subset "A") using one of four different methods, which allows all four read ports of the memory structure to access data entries concurrently. Using the first method 502, a read port accesses data subset "A" directly through the 1-port memory structure storing the table containing data subset "A" (e.g., structure 415A illustrated in FIG. 4). The remaining methods 504, 506, and 508, on the other hand, reconstruct data subset "A" by accessing a plurality of other memory structures. For example, using the second method 504, the read port accesses the memory structures storing the tables containing data subset "B" (structure 415B) and the difference between data subsets "A" and "B" (structure 415C) to determine the entries of data subset "A". Alternatively, using the third method 506, the port accesses the memory structures storing the tables containing data subset "C" (structure 420A) and the difference between "A" and "C" (structure 425A) to determine the entries of data subset "A". Using the fourth method 508, the port accesses the memory structures storing the tables containing data subset "D" (structure 420B), the difference between "C" and "D" (structure 420C), the difference between "B" and "D" (structure 425B), and the difference across all four data subsets (structure 425C) to determine the entries of data subset "A". Each of the remaining data subsets "B", "C", and "D" can likewise be determined using one of four different methods in a similar manner. Consequently, any data from any of the data subsets can be accessed or determined concurrently at each of the four read ports.
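The four access methods can be sketched in software as follows. This is an illustrative model only, assuming the difference is a bitwise XOR (as in some embodiments) and that entries are small integers; the helper names `build_nine_tables` and `read_a` are hypothetical, while the dictionary keys reuse the reference numerals of FIG. 4.

```python
def xor_tables(x, y):
    """Element-wise XOR of two equally sized tables."""
    return [p ^ q for p, q in zip(x, y)]


def build_nine_tables(a, b, c, d):
    """Return the nine 1-port tables, keyed by the reference numerals of FIG. 4."""
    tables = {
        "415A": list(a), "415B": list(b), "415C": xor_tables(a, b),
        "420A": list(c), "420B": list(d), "420C": xor_tables(c, d),
        "425A": xor_tables(a, c), "425B": xor_tables(b, d),
    }
    # 425C holds the difference across both pairs: (A xor C) xor (B xor D).
    tables["425C"] = xor_tables(tables["425A"], tables["425B"])
    return tables


# Which 1-port tables each method touches; the four sets are disjoint,
# which is what lets four ports read subset "A" in the same cycle.
METHOD_TABLES = {
    502: ("415A",),
    504: ("415B", "415C"),
    506: ("420A", "425A"),
    508: ("420B", "420C", "425B", "425C"),
}


def read_a(tables, index, method):
    """Recover entry `index` of subset "A" using the given method."""
    value = 0
    for name in METHOD_TABLES[method]:
        value ^= tables[name][index]
    return value


if __name__ == "__main__":
    t = build_nine_tables([1, 2], [3, 4], [5, 6], [7, 8])
    print([read_a(t, 0, method) for method in (502, 504, 506, 508)])
```

Because the four methods use disjoint sets of 1-port tables, all nine tables are used exactly once when four ports request subset "A" at the same time.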

The above techniques and constructions can be extrapolated further to construct memory structures having 2^m read ports. FIG. 6 illustrates a memory structure having 2^m read ports constructed from three 2^(m-1)-port structures, according to some embodiments. Each 2^m-port memory structure for providing access to n data entries is constructed using three 2^(m-1)-port memory structures: a lower structure 605A, an upper structure 605B, and a difference structure 610. Each 2^(m-1)-port memory structure provides access to n/2 data entries. For example, the lower half of the n data entries is stored in the lower structure 605A, the upper half of the n data entries is stored in the upper structure 605B, and the difference structure 610 stores the differences between corresponding lower-half and upper-half entries.

Each of the 2^(m-1) ports of the 2^(m-1)-port memory structures is mapped to an access circuit 615 (e.g., access circuits 615-1 through 615-2^(m-1)). For example, a first port of each of the three 2^(m-1)-port memory structures is mapped to a first access circuit 615-1, a second port of each of the three 2^(m-1)-port memory structures is mapped to a second access circuit 615-2, and so on, up to access circuit 615-2^(m-1).

Each access circuit 615 contains two read ports 625 (e.g., a lower read port and an upper read port, 625-1 and 625-2 through 625-(2^(m-1)-1) and 625-2^(m-1)), each of which can either directly access the 2^(m-1)-port lower structure 605A and the 2^(m-1)-port upper structure 605B, or determine the value of a data entry of the lower or upper structure using the difference structure 610 and the opposite structure. For example, each access circuit may be configured such that its lower read port can always access the lower sub-table 605A directly, but uses the lower sub-table 605A and the difference sub-table 610 to determine the values of entries in the upper sub-table 605B when its upper read port needs to access the upper sub-table 605B at the same time. Similarly, the upper read port can always access the upper sub-table 605B, but uses the upper sub-table 605B and the difference sub-table 610 to determine the values of entries in the lower sub-table 605A when the lower read port is accessing the lower sub-table 605A at the same time.
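The arbitration performed by one access circuit can be modeled for the simplest case, in which the three sub-tables are 1-port tables. The sketch below is an assumption-laden illustration rather than the disclosed circuit: the class name `TwoPortTable`, the XOR difference, and the address mapping (the first half of the addresses belongs to the lower sub-table) are all chosen for the example.

```python
class TwoPortTable:
    """Two read ports served from lower, upper and difference 1-port sub-tables."""

    def __init__(self, entries):
        if len(entries) % 2:
            raise ValueError("this sketch assumes an even number of entries")
        self.half = len(entries) // 2
        self.lower = list(entries[:self.half])
        self.upper = list(entries[self.half:])
        # Difference sub-table: XOR of corresponding lower and upper entries.
        self.diff = [lo ^ up for lo, up in zip(self.lower, self.upper)]

    def read_pair(self, addr_lower_port, addr_upper_port):
        """Serve both ports in one cycle, touching each sub-table at most once."""
        i = addr_lower_port % self.half
        j = addr_upper_port % self.half
        lower_port_wants_upper = addr_lower_port >= self.half
        upper_port_wants_upper = addr_upper_port >= self.half

        if lower_port_wants_upper and upper_port_wants_upper:
            # Conflict on the upper sub-table: the upper port reads it directly,
            # the lower port reconstructs from lower + difference.
            return self.lower[i] ^ self.diff[i], self.upper[j]
        if not lower_port_wants_upper and not upper_port_wants_upper:
            # Conflict on the lower sub-table: the lower port reads it directly,
            # the upper port reconstructs from upper + difference.
            return self.lower[i], self.upper[j] ^ self.diff[j]
        # No conflict: each port reads a different sub-table directly.
        value_lower_port = self.upper[i] if lower_port_wants_upper else self.lower[i]
        value_upper_port = self.upper[j] if upper_port_wants_upper else self.lower[j]
        return value_lower_port, value_upper_port


if __name__ == "__main__":
    table = TwoPortTable([10, 20, 30, 40, 50, 60, 70, 80])
    print(table.read_pair(5, 6))  # both requests fall in the upper half
```

In every branch each of the three sub-tables is read at most once, which is why 1-port sub-tables are enough to serve two concurrent requests.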

Thus, as illustrated in FIG. 6, a 2^m-port memory structure includes 2^(m-1) access circuits, each mapped to each of the three 2^(m-1)-port memory structures. Because each access circuit 615 contains two ports, a total of 2^m ports are available. This allows a 2^m-port structure to be constructed using (3/2)^m * n entries.

Although the techniques described herein primarily discuss using 1-port memory structures to construct memory structures having multiple read ports, it should be understood that in other embodiments, memory structures having more than one read port (e.g., 2 read ports, 3 read ports, etc.) may be used to construct memory structures having additional read ports. For example, three memory structures each having k read ports may be used to construct a memory structure having up to 2k read ports using the configuration described above.

In addition, the constructed memory structure need not be limited to 2^m ports. For example, if a particular level of the memory structure contains a number of ports that is not a power of 2, a subsequent level may also have a number of ports that is not a power of 2 (e.g., three 3-port memory structures may be used to construct a 6-port memory structure). In addition, fewer access circuits may be used at a given level to reduce the total number of available read ports. For example, referring to FIG. 6, fewer than (m-1) access circuits may be used, such that not every port of each 2^(m-1)-port sub-structure is mapped to an access circuit 615, resulting in fewer than 2^m total read ports.

Although the above examples illustrate each level of a multi-read-port memory structure being constructed using two sub-structures storing subsets of the data (e.g., a lower half and an upper half), it should be understood that in other embodiments a different number of sub-structures may be used. For example, in some embodiments the data entries may be divided among three sub-structures and a difference sub-structure. A different operation (such as addition mod 3) may be used in place of a difference corresponding to an XOR operation. In some embodiments, an access circuit may be configured to connect a plurality of sub-structures to more than two ports and to control access to the plurality of sub-structures.

Writing Data

A multi-read-port memory structure (e.g., the 2^m-port memory structure 600 described in FIG. 6) constructed from a plurality of constituent memory structures (e.g., the 1-port memory structure 200 described in FIG. 2) can be viewed as comprising a plurality of levels. For example, a 2^m-port memory structure constructed from a plurality of 1-port memory structures may comprise m levels, each level comprising 3^k memory structures each having 2^(m-k) read ports, where k indicates the level and corresponds to an integer between 1 and m. For example, referring to the 4-port memory structure 400 illustrated in FIG. 4, the memory structure 400 has a k=1 level containing three 2-port memory structures and a k=2 level containing nine 1-port memory structures.
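The level structure can be tabulated with a short helper. This is a sketch only; the function name `describe_levels` is hypothetical, and the per-structure entry count n / 2^k is an inference from the halving at each level rather than a figure stated in the text.

```python
def describe_levels(m: int, n: int):
    """Return (k, structures, ports per structure, entries per structure) per level.

    Assumes n is divisible by 2**m so that every halving is exact.
    """
    levels = []
    for k in range(1, m + 1):
        structures = 3 ** k          # each structure splits into lower, upper, difference
        ports_each = 2 ** (m - k)    # the port count halves at every level
        entries_each = n // 2 ** k   # inferred: the entry count halves at every level
        levels.append((k, structures, ports_each, entries_each))
    return levels


if __name__ == "__main__":
    for level in describe_levels(2, 8):
        print(level)
```

For m = 2 this yields three 2-port structures at level k=1 and nine 1-port structures at level k=2, matching FIG. 4.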

When data is written into a 2^m-port memory structure, a recursive write procedure is used so that the data to be written is reflected in all levels of the structure. For example, referring to the configuration illustrated in FIG. 4, to write new data into a particular data subset (e.g., data subset "B"), the data must be written to the 2-port lower structure 405A and to the difference structure 410. Within the 2-port lower structure 405A, the data is written to the upper structure 415B (which stores the table containing data subset "B"), and the data of the difference structure 415C (which stores A ⊕ B) is recalculated. Within the 2-port difference structure 410, the data in the upper structure 425B (which stores B ⊕ D) and in the difference structure 425C (which stores (A ⊕ C) ⊕ (B ⊕ D)) must also be recalculated.
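The recursive write can be sketched as below. This is a software illustration under stated assumptions, not the disclosed hardware: the class name `MultiPortTable`, the XOR difference, and the address layout (lower half first) are hypothetical, the table length is assumed to be divisible by 2^m, and `read` follows only the direct path because it serves here as a helper for `write`.

```python
class MultiPortTable:
    """Recursive model of a 2**m-port table built from lower, upper and difference parts."""

    def __init__(self, entries, m):
        self.m = m
        if m == 0:
            self.cells = list(entries)  # a plain 1-port table
            return
        self.half = len(entries) // 2
        lower, upper = entries[:self.half], entries[self.half:]
        self.lower = MultiPortTable(lower, m - 1)
        self.upper = MultiPortTable(upper, m - 1)
        self.diff = MultiPortTable([a ^ b for a, b in zip(lower, upper)], m - 1)

    def read(self, addr):
        """Direct read of the stored value (no port arbitration in this sketch)."""
        if self.m == 0:
            return self.cells[addr]
        if addr < self.half:
            return self.lower.read(addr)
        return self.upper.read(addr - self.half)

    def write(self, addr, value):
        """Write `value` and recompute every difference entry that depends on it."""
        if self.m == 0:
            self.cells[addr] = value
            return
        i = addr % self.half
        if addr < self.half:
            self.lower.write(i, value)
            partner = self.upper.read(i)
        else:
            self.upper.write(i, value)
            partner = self.lower.read(i)
        # The difference part is itself a recursive structure, so this call
        # propagates the update into its own sub-structures.
        self.diff.write(i, value ^ partner)

    def leaves(self):
        """All 1-port tables, in lower / upper / difference order."""
        if self.m == 0:
            return [self.cells]
        return self.lower.leaves() + self.upper.leaves() + self.diff.leaves()


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]  # subsets A=[1,2], B=[3,4], C=[5,6], D=[7,8]
    table = MultiPortTable(data, m=2)
    table.write(2, 99)               # first entry of subset "B"
    print(table.leaves())
```

With m = 2, writing into subset "B" changes exactly the four 1-port tables corresponding to 415B, 415C, 425B, and 425C, consistent with the description above.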

Additional Configuration Information

The foregoing description of the embodiments of the invention has been presented for the purpose of illustration; it is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Persons skilled in the relevant art can appreciate that many modifications and variations are possible in light of the above disclosure.

Some portions of this description describe the embodiments of the invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are commonly used by those skilled in the data processing arts to convey the substance of their work effectively to others skilled in the art. These operations, while described functionally, computationally, or logically, are understood to be implemented by computer programs or equivalent electrical circuits, microcode, or the like. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules, without loss of generality. The described operations and their associated modules may be embodied in software, firmware, hardware, or any combination thereof.

Any of the steps, operations, or processes described herein may be performed or implemented with one or more hardware or software modules, alone or in combination with other devices. In one embodiment, a software module is implemented with a computer program product comprising a computer-readable medium containing computer program code, which can be executed by a computer processor to perform any or all of the steps, operations, or processes described.

Embodiments of the invention may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, and/or it may comprise a general-purpose computing device selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory, tangible computer-readable storage medium, or in any type of medium suitable for storing electronic instructions, which may be coupled to a computer system bus. Furthermore, any computing system referred to in this specification may include a single processor or may be an architecture employing multiple-processor designs for increased computing capability.

Embodiments of the invention may also relate to a product that is produced by a computing process described herein. Such a product may comprise information resulting from a computing process, where the information is stored on a non-transitory, tangible computer-readable storage medium and may include any embodiment of a computer program product or other data combination described herein.

Finally, the language used in this specification has been selected principally for readability and instructional purposes, and it has not been selected to delineate or circumscribe the inventive subject matter. The scope of the invention is therefore intended to be limited not by this detailed description, but rather by any claims that issue on an application based hereon. Accordingly, the disclosure of the embodiments is intended to be illustrative, but not limiting, of the scope of the invention, which is set forth in the following claims.

100: Multi-port memory structure

102: Arithmetic logic unit (ALU)

104: Read port

Claims (15)

1. A memory structure, comprising:
a first memory sub-structure configured to store a first subset of data;
a second memory sub-structure configured to store a second subset of data;
a third memory sub-structure configured to store difference data corresponding to the first subset of data and the second subset of data;
at least a first read port and a second read port, each configured to receive read requests to read the first subset of data from the first memory sub-structure or the second subset of data from the second memory sub-structure; and
an access circuit connected to the first memory sub-structure, the second memory sub-structure, and the third memory sub-structure to control access to the first memory sub-structure, the second memory sub-structure, and the third memory sub-structure by the first read port and the second read port, wherein the access circuit is configured to:
in response to both the first read port and the second read port receiving, during a first time period, read requests to read from the second subset of data stored on the second memory sub-structure, prioritize the second read port by providing the requested data from the second memory sub-structure to the second read port to satisfy the read request from the second read port, and reconstruct a portion of the second subset of data using the first subset of data stored on the first memory sub-structure and a corresponding portion of the difference data stored on the third memory sub-structure to satisfy the read request from the first read port; and
in response to both the first read port and the second read port receiving, during a second time period, read requests to read from the first subset of data stored on the first memory sub-structure, prioritize the first read port by providing the requested data from the first memory sub-structure to the first read port to satisfy the read request from the first read port, and reconstruct a portion of the first subset of data using the second subset of data stored on the second memory sub-structure and a corresponding portion of the difference data stored on the third memory sub-structure to satisfy the read request from the second read port.

2. The memory structure of claim 1, wherein the reconstructed portion of the second subset of data is read through the first read port, and wherein the reconstructed portion of the first subset of data is read through the second read port.

3. The memory structure of claim 2, wherein the access circuit is configured to: in response to no read request to read the second subset of data being received from the second read port during a same clock cycle, allow the first read port to read requested data from the second memory sub-structure storing the second subset of data; and in response to no read request to read the first subset of data being received from the first read port during a same clock cycle, allow the second read port to read requested data from the first memory sub-structure storing the first subset of data.

4. The memory structure of claim 1, wherein the difference data comprises exclusive-or (XOR) values between corresponding portions of the first subset of data and the second subset of data.

5. The memory structure of claim 1, wherein each of the first memory sub-structure, the second memory sub-structure, and the third memory sub-structure contains 2^(m-1) read ports, where m is a positive integer, each of the read ports being able to concurrently read any of the data contained in the respective memory sub-structure.

6. The memory structure of claim 5, wherein at least a portion of the 2^(m-1) read ports of each of the first memory sub-structure, the second memory sub-structure, and the third memory sub-structure are connected to a respective access circuit to control access to the first memory sub-structure, the second memory sub-structure, and the third memory sub-structure by at least a respective first read port and second read port of the respective access circuit.

7. The memory structure of claim 1, wherein the first subset and the second subset of data are associated with a function, and wherein the first read port and the second read port are configured to concurrently receive the read requests as part of a single instruction multiple data (SIMD) application.

8. The memory structure of claim 1, wherein the first time period and the second time period correspond to respective first and second clock cycles.

9. A processor, comprising:
a plurality of arithmetic logic units (ALUs), each configured to implement a mathematical function; and
a memory structure storing data representing outputs of the mathematical function for different input values, and having a plurality of read ports in communication with the plurality of ALUs, each ALU implementing the mathematical function by sending read requests for portions of the stored data via a corresponding read port, the memory structure comprising:
a first memory for storing a first subset of the data;
a second memory for storing a second subset of the data;
a third memory for storing difference data representing differences between the first subset of data and the second subset of data; and
an access circuit connected to the first memory, the second memory, and the third memory to control access to the first memory, the second memory, and the third memory by the plurality of read ports, and configured to:
in response to both a first read port and a second read port of the plurality of read ports receiving, during a first time period, read requests to read from the second subset of data stored on the second memory, prioritize the second read port by providing the requested data from the second memory to the second read port to satisfy the read request from the second read port, and reconstruct a portion of the second subset of data using the first subset of data stored on the first memory and a corresponding portion of the difference data stored on the third memory to satisfy the read request from the first read port; and
in response to both the first read port and the second read port receiving, during a second time period, read requests to read from the first subset of data stored on the first memory, prioritize the first read port by providing the requested data from the first memory to the first read port to satisfy the read request from the first read port, and reconstruct a portion of the first subset of data using the second subset of data stored on the second memory and a corresponding portion of the difference data stored on the third memory to satisfy the read request from the second read port.

10. The processor of claim 9, wherein the difference data comprises XOR values between corresponding portions of the first subset of data and the second subset of data.

11. The processor of claim 9, wherein each of the first memory, the second memory, and the third memory contains 2^(m-1) read ports, where m is a positive integer, each of the read ports being able to concurrently read any of the data contained in the respective memory.

12. The processor of claim 11, wherein each of the 2^(m-1) read ports is connected to a respective access circuit to control access to the first memory, the second memory, and the third memory by at least a respective first read port and second read port.

13. The processor of claim 9, wherein the plurality of ALUs generate and send read requests to the memory structure in parallel as part of a single instruction multiple data (SIMD) application.

14. A method for reading data from a memory through a plurality of read ports, comprising:
receiving, at a first arithmetic logic unit (ALU) and a second ALU, input data to be processed by the first ALU and the second ALU;
generating, by the first ALU and the second ALU, based on the received input data, respective first and second read requests to retrieve data from a memory structure, the memory structure comprising at least:
a first sub-structure storing a first subset of data;
a second sub-structure storing a second subset of data; and
a third sub-structure storing difference data corresponding to the first subset of data and the second subset of data;
transmitting, during a first time period, the first read request and the second read request to an access circuit of the memory structure via respective first and second read ports; and
at the access circuit, in response to determining that the first read request and the second read request are both requesting data from the second subset of data, prioritizing the second read port by providing the data requested by the second read request, from the second subset of data stored on the second sub-structure, to the second ALU via the second read port, and providing the data requested by the first read request to the first ALU via the first read port by reconstructing a requested portion of the second subset of data using the first subset of data stored on the first sub-structure and the difference data stored on the third sub-structure.

15. The method of claim 14, wherein the difference data comprises XOR values between corresponding portions of the first subset of data and the second subset of data.
TW108109969A 2019-03-22 2019-03-22 Data structures with multiple read ports, processor, and method for data structures with multiple read ports TWI719433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW108109969A TWI719433B (en) 2019-03-22 2019-03-22 Data structures with multiple read ports, processor, and method for data structures with multiple read ports


Publications (2)

Publication Number | Publication Date
TW202036274A (en) | 2020-10-01
TWI719433B (en) | 2021-02-21

Family

ID=74091174


