CN114185513B - Data caching device and chip - Google Patents

Data caching device and chip

Info

Publication number
CN114185513B
Authority
CN
China
Prior art keywords
queue
pointer
target
request information
counting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210144832.5A
Other languages
Chinese (zh)
Other versions
CN114185513A (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Shanghai Co ltd
Original Assignee
Muxi Integrated Circuit Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Shanghai Co ltd
Priority to CN202210144832.5A
Publication of CN114185513A
Application granted
Publication of CN114185513B
Active legal-status Current
Anticipated expiration legal-status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a data caching device and a chip. The device comprises an arithmetic unit, a count queue, N pointer queues, N cache memories, and a register group, where N is an integer greater than or equal to 2; the register group comprises M registers, each register can store one piece of request information, and M is an integer greater than or equal to 2. The arithmetic unit is connected with the count queue, each pointer queue is connected with a corresponding cache memory, information interaction between the arithmetic unit and the count queue is based on a preset Credit-Debit protocol, and the maximum depth M of the count queue is set in the Credit-Debit protocol. The invention reduces the chip area and the chip power consumption in the data caching process.

Description

Data caching device and chip
Technical Field
The invention relates to the technical field of data caching, in particular to a data caching device and a chip.
Background
When multiple downstream processing units obtain request information from the same upstream arithmetic unit, a corresponding FIFO (first-in first-out) buffer usually has to be provided on the chip for each downstream processing unit. The bit width of each FIFO buffer is the same as the bit width of the request information sent by the upstream arithmetic unit, and since that bit width is large (usually tens of bits, sometimes even hundreds), multiple high-bit-width FIFO buffers have to be placed on the chip, which increases the chip area and the chip power consumption in the data caching process. How to reduce the chip area and the chip power consumption in the data caching process has therefore become a technical problem to be solved urgently.
Disclosure of Invention
The invention aims to provide a data caching device and a chip that reduce the chip area and the chip power consumption in the data caching process.
According to a first aspect of the present invention, there is provided a data caching device, comprising an arithmetic unit, a count queue, N pointer queues, N cache memories, and a register group, where N is an integer greater than or equal to 2, the register group comprises M registers, each register is capable of storing one piece of request information, and M is an integer greater than or equal to 2;
the arithmetic unit is connected with the count queue, each pointer queue is connected with a corresponding cache memory, information interaction between the arithmetic unit and the count queue is based on a preset Credit-Debit protocol, and the maximum depth M of the count queue is set in the Credit-Debit protocol;
the count queue and the pointer queues are all first-in first-out queues, and each pointer queue has the same depth and width as the count queue;
the arithmetic unit is used for writing request information into the register group according to the information of the count queue, with the count queue and the corresponding pointer queue updated accordingly;
the cache memory is used for reading request information from the register group according to the corresponding pointer queue information, with the corresponding pointer queue and the count queue updated accordingly.
Compared with the prior art, the invention has obvious advantages and beneficial effects. By means of the above technical scheme, the data caching device and the chip provided by the invention achieve considerable technical progress and practicability, have wide industrial utilization value, and offer at least the following advantage:
A register group whose bit width matches the request information sent by the arithmetic unit is arranged in the data caching device to store the request information; the pointer queue corresponding to each cache memory only needs the same bit width as the count queue, so the pointer queue bit width is reduced, all the cache memories share one register group, the chip area is reduced, and the chip power consumption in the data caching process is reduced.
The foregoing is only an overview of the technical solutions of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the content of the description, and in order to make the above and other objects, features, and advantages of the present invention more readily apparent, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic diagram of a data caching apparatus according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a data caching apparatus according to another embodiment of the present invention.
Detailed Description
To further illustrate the technical means adopted by the present invention to achieve the intended objects and their effects, specific implementations and effects of a data caching device and a chip according to the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments.
An embodiment of the present invention provides a data caching device, as shown in fig. 1, including an arithmetic unit, a count queue, N pointer queues, N cache memories (caches), and a register group, where N is an integer greater than or equal to 2; the register group includes M registers, each register is capable of storing one piece of request information, and M is an integer greater than or equal to 2.
The arithmetic unit is connected with the count queue, and each pointer queue is connected with a corresponding cache memory. Information interaction between the arithmetic unit and the count queue is based on a preset Credit-Debit protocol, in which the maximum depth value M of the count queue is set. The maximum depth value M is chosen according to the specific application scenario; the invention does not limit the specific value of M, which can be set to 16, 32, and so on. Through the Credit-Debit protocol, at most M pieces of request information can be stored in the register group at the same time, and the count queue strictly limits the request information sent by the arithmetic unit based on the storage state of the register group. It is understood that the arithmetic unit is a unit capable of generating request information, and may specifically be a client; the request information may be an instruction or data. The constituent units are connected through data access paths, and the lines between units in the figures represent these data access paths.
The count queue and the pointer queues are all first-in first-out (FIFO) queues, and each pointer queue has the same depth and width as the count queue. This ensures normal data caching even in the extreme case where M consecutive pieces of request information all correspond to one cache memory, since that cache memory's pointer queue must then hold all M pointers.
The arithmetic unit is used for writing request information into the register group according to the information of the count queue, with the count queue and the corresponding pointer queue updated accordingly.
The cache memory is used for reading request information from the register group according to the corresponding pointer queue information, with the corresponding pointer queue and the count queue updated accordingly.
The device may perform only the operation of the arithmetic unit writing request information into the register group, only the operation of a cache memory reading request information, or both operations at the same time; the register group, the count queue, and the pointer queues all change dynamically.
In the embodiment of the invention, a register group whose bit width matches the request information sent by the arithmetic unit is arranged in the data caching device to store the request information; the pointer queue corresponding to each cache memory only needs the same bit width as the count queue, so the pointer queue bit width is reduced, all the cache memories share one register group, the chip area is reduced, and the chip power consumption in the data caching process is reduced. For example, with M = 16 and 128-bit request information, each pointer occupies only log2(16) = 4 bits, so each pointer queue holds at most 16 × 4 = 64 bits of pointers instead of the 16 × 128 = 2048 bits a conventional per-cache FIFO would need.
In order to improve load balance across the N cache memories, so that the request information of the arithmetic unit is distributed to the N cache memories as evenly as possible, in one embodiment, as shown in fig. 2, the device further includes a selection unit connected to the arithmetic unit and to the N pointer queues, respectively; the selection unit is configured to determine a first target cache memory from the N cache memories according to the request information.
As an embodiment, when N = 2, the selection unit is specifically configured to obtain at least one preset detection bit from the request information to be stored, and to determine the first target cache memory from the N cache memories based on a single detection bit, or to perform an exclusive-or (XOR) operation on multiple detection bits and determine the first target cache memory according to the XOR result. Taking the first cache memory and the second cache memory as the N cache memories, for example: if the x-th bit of the request information is 0, the first cache memory is determined as the first target cache memory; if the x-th bit is 1, the second cache memory is determined as the first target cache memory. Alternatively, if the XOR of the c-th, e-th, f-th, and g-th bits of the request information is 0, the first cache memory is determined as the first target cache memory; if that XOR is 1, the second cache memory is determined as the first target cache memory. It will be appreciated that a suitable choice of detection bits can make the request information distribution more uniform.
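A minimal C sketch of this N = 2 selection rule (the function names, the 64-bit view of the request information, and the bit positions x, c, e, f, g are illustrative assumptions, not values fixed by the patent):

```c
#include <stdint.h>

/* value of one detection bit of the request information */
static inline int det_bit(uint64_t req, int pos) { return (int)((req >> pos) & 1u); }

/* single detection bit x: 0 selects the first cache memory, 1 the second */
static int select_by_bit(uint64_t req, int x) {
    return det_bit(req, x);
}

/* XOR of detection bits c, e, f, g: result 0 selects the first cache
 * memory, result 1 selects the second */
static int select_by_xor(uint64_t req, int c, int e, int f, int g) {
    return det_bit(req, c) ^ det_bit(req, e) ^ det_bit(req, f) ^ det_bit(req, g);
}
```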
As another embodiment, the selection unit is specifically configured to obtain the sequence information from the preset a-th bit to the b-th bit of the request information to be stored, perform a hash operation on the sequence information, and determine the first target cache memory from the N cache memories based on the hash operation result. It is understood that a suitable choice of the a-th to b-th bits can make the request information distribution more balanced.
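A corresponding sketch of the hash variant for general N; the bit range a..b, the 64-bit request view, and the mixing steps are assumptions, since the patent does not fix a particular hash function:

```c
#include <stdint.h>

/* extract bits a..b of the request (0 <= a <= b < 64 assumed), mix them,
 * and reduce modulo the number of cache memories */
static int select_by_hash(uint64_t req, int a, int b, int n_caches) {
    int width = b - a + 1;
    uint64_t mask = (width >= 64) ? ~0ULL : ((1ULL << width) - 1);
    uint64_t seq  = (req >> a) & mask;
    seq ^= seq >> 33;                 /* arbitrary 64-bit mixer, borrowed */
    seq *= 0xff51afd7ed558ccdULL;     /* from common hash finalizers      */
    seq ^= seq >> 33;
    return (int)(seq % (uint64_t)n_caches);
}
```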
The register group is {R1, R2, … RM}, where Ri is the i-th register and i ranges from 1 to M. In an initialization stage, the device initializes M sequentially arranged pointers {P1, P2, … PM} in the count queue, where Pi denotes the i-th pointer and points to the i-th register Ri. It can be understood that in the initialization stage the register group is empty, and since the count queue is a first-in first-out queue, the first M pieces of request information are stored into the register group in the order P1, P2, … PM. Subsequently, because each cache memory executes data reading instructions in a different order and at a different speed, the registers are not necessarily read back in the order P1, P2, … PM, so the order of pointers in the count queue may no longer be P1, P2, … PM. Both the count queue and the pointer queues, however, are always processed in first-in first-out order.
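The structures just described can be modeled concretely. The following C sketch is a minimal software model under assumed parameters (M = 16, N = 2, 128-bit request information; all names are illustrative, not from the patent): the register group, the count queue of free pointers, and one depth-M pointer queue per cache memory, with the initialization stage filling the count queue with P1..PM in order:

```c
#include <stdint.h>
#include <string.h>

#define M 16                  /* assumed count-queue max depth            */
#define N 2                   /* assumed number of cache memories         */
#define REQ_BYTES 16          /* assumed request width: 128 bits          */

/* circular FIFO of pointer indices; every queue has depth M, matching
 * the requirement that each pointer queue equal the count queue in
 * depth and width */
typedef struct { uint8_t idx[M]; int head, tail, count; } fifo_t;

typedef struct {
    uint8_t reg_group[M][REQ_BYTES];  /* register group {R1, R2, ... RM} */
    fifo_t  count_queue;              /* FIFO of free pointers P1..PM    */
    fifo_t  pointer_queue[N];         /* one pointer queue per cache     */
} cache_device_t;

/* initialization stage: register group empty, count queue holds all M
 * pointers in order P1, P2, ... PM, every pointer queue empty */
static void device_init(cache_device_t *d) {
    memset(d, 0, sizeof *d);
    for (int i = 0; i < M; i++)
        d->count_queue.idx[i] = (uint8_t)i;  /* pointer Pi -> register Ri */
    d->count_queue.count = M;                /* head == tail == 0, full   */
}
```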
As an embodiment, when the device performs the operation of writing request information into the register group:
The arithmetic unit is used for generating request information to be stored and judging whether a pointer exists in the count queue; if so, it directly acquires the pointer at the head of the count queue as a first target pointer and sends the first target pointer and the request information to be stored to the register group and the selection unit. If no pointer currently exists in the count queue, the arithmetic unit waits until a pointer exists, then acquires the pointer at the head of the count queue as the first target pointer and sends it together with the request information to be stored to the register group and the selection unit. It is understood that this process is implemented primarily based on the Credit-Debit protocol.
The count queue is used for clearing the first target pointer from the count queue, which indicates that the first target pointer is in use; the first target pointer cannot be reused until the request information in its corresponding register is cleared.
The selection unit is used for determining a first target cache memory from the N cache memories based on the request information to be stored; the pointer queue corresponding to the first target cache memory is the first target pointer queue, and the first target pointer is sent to the first target pointer queue.
The first target pointer queue is used for storing the first target pointer at the tail of the first target pointer queue, so that each pointer queue operates in first-in first-out order.
The register group is used for storing the request information to be stored into the register corresponding to the first target pointer. Once the request information is stored, the register is occupied and its pointer has been cleared from the count queue, so the register cannot be reused before the request information is read out.
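Reusing cache_device_t and fifo_t from the initialization sketch above, the write path can be modeled as follows; returning 0 is the Credit-Debit back-pressure case in which the arithmetic unit must wait for a pointer:

```c
/* Write path sketch (assumed model, not the patent's RTL): pop a free
 * pointer from the head of the count queue, occupy its register, and
 * push the pointer onto the tail of the selected cache's pointer queue. */
static int device_write(cache_device_t *d, const uint8_t req[REQ_BYTES],
                        int target_cache /* chosen by the selection unit */) {
    fifo_t *cq = &d->count_queue;
    if (cq->count == 0)
        return 0;                            /* no free pointer: wait     */
    uint8_t p = cq->idx[cq->head];           /* first target pointer      */
    cq->head = (cq->head + 1) % M;           /* clear it from the         */
    cq->count--;                             /* count queue               */
    memcpy(d->reg_group[p], req, REQ_BYTES); /* register Rp now occupied  */
    fifo_t *pq = &d->pointer_queue[target_cache];
    pq->idx[pq->tail] = p;                   /* store at the tail of the  */
    pq->tail = (pq->tail + 1) % M;           /* first target pointer      */
    pq->count++;                             /* queue (FIFO order kept)   */
    return 1;
}
```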
As an embodiment, let j take values from 1 to N; when the device performs the operation in which the j-th cache memory reads request information from the register group:
The j-th cache memory is used for sending a read request to the j-th pointer queue, reading the pointer at the head of the j-th pointer queue as a second target pointer, and sending the second target pointer to the register group and the count queue.
The j-th pointer queue is used for clearing the second target pointer from the j-th pointer queue.
The register group is used for reading the request information from the register corresponding to the second target pointer, sending it to the j-th cache memory, and clearing the request information in that register; once cleared, the register can be used to store subsequent request information.
The count queue is used for storing the second target pointer at the tail of the count queue, so that the pointer can be reused for subsequent requests.
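The matching read path for the j-th cache memory, again reusing the assumed structures above; clearing the register and returning the pointer to the tail of the count queue is what makes the slot reusable:

```c
/* Read path sketch (assumed model): pop the pointer at the head of
 * pointer queue j, copy the request out of its register, clear the
 * register, and return the pointer to the count queue. */
static int device_read(cache_device_t *d, int j, uint8_t req_out[REQ_BYTES]) {
    fifo_t *pq = &d->pointer_queue[j];
    if (pq->count == 0)
        return 0;                            /* nothing queued for cache j */
    uint8_t p = pq->idx[pq->head];           /* second target pointer      */
    pq->head = (pq->head + 1) % M;           /* clear it from the j-th     */
    pq->count--;                             /* pointer queue              */
    memcpy(req_out, d->reg_group[p], REQ_BYTES); /* read register Rp       */
    memset(d->reg_group[p], 0, REQ_BYTES);   /* clear Rp for reuse         */
    fifo_t *cq = &d->count_queue;
    cq->idx[cq->tail] = p;                   /* pointer back at the tail   */
    cq->tail = (cq->tail + 1) % M;           /* of the count queue, ready  */
    cq->count++;                             /* for subsequent requests    */
    return 1;
}
```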
The invention also provides a chip comprising the above data caching device. A register group whose bit width matches the request information sent by the arithmetic unit is arranged in the data caching device to store the request information; the pointer queue corresponding to each cache memory only needs the same bit width as the count queue, so the pointer queue bit width is reduced, all the cache memories share one register group, the chip area is reduced, and the chip power consumption in the data caching process is reduced.
An embodiment of the invention also provides an electronic device comprising the above chip.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A data caching device, characterized in that,
the device comprises an arithmetic unit, a count queue, N pointer queues, N cache memories, a selection unit, and a register group, wherein N is an integer greater than or equal to 2, the register group comprises M registers {R1, R2, … RM}, Ri is the i-th register, i ranges from 1 to M, each register can store one piece of request information, and M is an integer greater than or equal to 2;
the arithmetic unit is connected with the count queue, each pointer queue is connected with a corresponding cache memory, and the maximum depth value of the count queue is set to M;
the selection unit is connected with the arithmetic unit and the N pointer queues, respectively;
the count queue and the pointer queues are all first-in first-out queues, and each pointer queue has the same depth and width as the count queue;
in an initialization stage, M sequentially arranged pointers {P1, P2, … PM} are initialized in the count queue, wherein Pi denotes the i-th pointer and points to the i-th register Ri;
the arithmetic unit is used for writing request information into the register group according to the information of the count queue, and the count queue and the corresponding pointer queue are updated accordingly;
when the device executes the operation process of writing the request information into the register group:
the arithmetic unit is used for generating request information to be stored, judging whether a pointer exists in the count queue, and if so, directly acquiring the pointer at the head of the count queue as a first target pointer and sending the first target pointer and the request information to be stored to the register group and the selection unit;
the count queue is used for clearing the first target pointer from the count queue;
the selection unit is used for determining a first target cache memory from the N cache memories based on the request information to be stored, wherein a pointer queue corresponding to the first target cache memory is a first target pointer queue, and the first target pointer is sent to the first target pointer queue;
the first target pointer queue is used for storing the first target pointer into the tail of the first target pointer queue;
the register group is used for storing the request information to be stored into a register corresponding to the first target pointer;
the cache memory is used for reading request information from the register group according to the corresponding pointer queue information, and the corresponding pointer queue and the count queue are updated accordingly;
let j take values from 1 to N; when the device executes the operation process in which the j-th cache memory reads request information from the register group:
the j-th cache memory is used for sending a read request to the j-th pointer queue, reading the pointer at the head of the j-th pointer queue as a second target pointer, and sending the second target pointer to the register group and the count queue;
the j-th pointer queue is used for clearing the second target pointer from the j-th pointer queue;
the register group is used for reading the request information from the register corresponding to the second target pointer, sending the request information to the j-th cache memory, and clearing the request information in the register corresponding to the second target pointer;
the count queue is used for storing the second target pointer at the tail of the count queue.
2. The device of claim 1,
wherein the arithmetic unit is further configured to, if no pointer currently exists in the count queue, wait until a pointer exists in the count queue, obtain the pointer at the head of the count queue as the first target pointer, and send the first target pointer and the request information to be stored to the register group and the selection unit.
3. The device of claim 1,
wherein, when N = 2, the selection unit is specifically configured to obtain at least one preset detection bit from the request information to be stored, and to determine a first target cache memory from the N cache memories based on a single detection bit, or to perform an exclusive-or operation on a plurality of detection bits and determine the first target cache memory from the N cache memories according to the exclusive-or result.
4. The device of claim 1,
wherein the selection unit is specifically configured to obtain the sequence information from the preset a-th bit to the b-th bit of the request information to be stored, perform a hash operation on the sequence information, and determine a first target cache memory from the N cache memories based on the hash operation result.
5. The device of claim 1,
the request information is a request instruction or request data.
6. A chip comprising the data caching device of any one of claims 1 to 5.
CN202210144832.5A 2022-02-17 2022-02-17 Data caching device and chip Active CN114185513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210144832.5A CN114185513B (en) 2022-02-17 2022-02-17 Data caching device and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210144832.5A CN114185513B (en) 2022-02-17 2022-02-17 Data caching device and chip

Publications (2)

Publication Number Publication Date
CN114185513A (en) 2022-03-15
CN114185513B true CN114185513B (en) 2022-05-20

Family

ID=80546146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210144832.5A Active CN114185513B (en) 2022-02-17 2022-02-17 Data caching device and chip

Country Status (1)

Country Link
CN (1) CN114185513B (en)

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0644245B2 (en) * 1983-12-29 1994-06-08 富士通株式会社 Store buffer device
US5450555A (en) * 1990-06-29 1995-09-12 Digital Equipment Corporation Register logging in pipelined computer using register log queue of register content changes and base queue of register log queue pointers for respective instructions
US6032179A (en) * 1996-08-14 2000-02-29 Mitsubishi Electric Information Technology Center America, Inc. (Ita) Computer system with a network interface which multiplexes a set of registers among several transmit and receive queues
AU2001285384A1 (en) * 2000-07-31 2002-02-13 Conexant Systems, Inc. Enhancing performance by pre-fetching and caching data directly in a communication processor's register set
WO2004066571A1 (en) * 2003-01-20 2004-08-05 Fujitsu Limited Network switch apparatus and network switch method
US20040181638A1 (en) * 2003-03-14 2004-09-16 Paul Linehan Event queue system
CN1238788C (en) * 2003-10-08 2006-01-25 复旦大学 First-in first-out register quenue arrangement capable of processing variable-length data and its control method
CN100362839C (en) * 2003-12-29 2008-01-16 中兴通讯股份有限公司 Multiple queue sequential buffer managing circuit and method based on pipeline
CN100555216C (en) * 2007-09-12 2009-10-28 华为技术有限公司 A kind of data processing method and processor
WO2010013189A2 (en) * 2008-07-29 2010-02-04 Nxp B.V. Data processing circuit with arbitration between a plurality of queues
CN105573711B (en) * 2014-10-14 2019-07-19 深圳市中兴微电子技术有限公司 A kind of data cache method and device
US10983725B2 (en) * 2018-03-01 2021-04-20 Synopsys, Inc. Memory array architectures for memory queues
CN113312278B (en) * 2021-07-29 2021-11-05 常州楠菲微电子有限公司 Device and method for statically allocating shared multi-queue cache

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5948081A (en) * 1997-12-22 1999-09-07 Compaq Computer Corporation System for flushing queued memory write request corresponding to a queued read request and all prior write requests with counter indicating requests to be flushed
US8942248B1 (en) * 2010-04-19 2015-01-27 Altera Corporation Shared control logic for multiple queues
CN105183665A (en) * 2015-09-08 2015-12-23 福州瑞芯微电子股份有限公司 Data-caching access method and data-caching controller

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
FlexRay network monitoring platform based on three-level queue buffering; Liu Biao et al.; Computer Measurement & Control; 2017-08-25 (Issue 08); full text *

Also Published As

Publication number Publication date
CN114185513A (en) 2022-03-15

Similar Documents

Publication Publication Date Title
US11829295B2 (en) Efficient work unit processing in a multicore system
CN110741356A (en) Relay -induced memory management in multiprocessor systems
US9996490B2 (en) Technique for scaling the bandwidth of a processing element to match the bandwidth of an interconnect
US7366865B2 (en) Enqueueing entries in a packet queue referencing packets
CN108363620B (en) Memory module for providing virtual memory capacity and operation method thereof
US7327674B2 (en) Prefetching techniques for network interfaces
US10200313B2 (en) Packet descriptor storage in packet memory with cache
US10691731B2 (en) Efficient lookup in multiple bloom filters
US20190188239A1 (en) Dual phase matrix-vector multiplication system
US10218382B2 (en) Decompression using cascaded history windows
CN111949568A (en) Message processing method and device and network chip
US11048475B2 (en) Multi-cycle key compares for keys and records of variable length
KR20150077288A (en) A look-aside processor unit with internal and external access for multicore processors
JP2007034392A (en) Information processor and data processing method
US20190163443A1 (en) Hierarchical sort/merge structure using a request pipe
US10101963B2 (en) Sending and receiving data between processing units
US20220129275A1 (en) Circular queue management with split indexes
US8819305B2 (en) Directly providing data messages to a protocol layer
CN114185513B (en) Data caching device and chip
US20110055842A1 (en) Virtual multiple instance extended finite state machines with wait rooms and/or wait queues
CN116578245B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
US9055019B1 (en) Method and apparatus for message multicasting
US9137167B2 (en) Host ethernet adapter frame forwarding
Hendrantoro et al. Early result from adaptive combination of LRU, LFU and FIFO to improve cache server performance in telecommunication network
US11762773B2 (en) Memory-based synchronization of distributed operations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant