WO2017023659A1 - Method and apparatus for a processor with cache and main memory - Google Patents

Method and apparatus for a processor with cache and main memory

Info

Publication number
WO2017023659A1
WO2017023659A1 (PCT application PCT/US2016/044358)
Authority
WO
WIPO (PCT)
Prior art keywords
data items
memory
list
recent
data
Prior art date
Application number
PCT/US2016/044358
Other languages
French (fr)
Inventor
Jerry Hongming ZHENG
Original Assignee
Marvell World Trade Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/220,916 external-priority patent/US10552350B2/en
Application filed by Marvell World Trade Ltd. filed Critical Marvell World Trade Ltd.
Priority to TW105124513A priority Critical patent/TWI695268B/en
Priority to TW105124514A priority patent/TWI703437B/en
Priority to TW105124512A priority patent/TWI703445B/en
Priority to TW105124510A priority patent/TWI695262B/en
Priority to TW105124515A priority patent/TWI705327B/en
Publication of WO2017023659A1 publication Critical patent/WO2017023659A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H04L12/407 Bus networks with decentralised control
    • H04L12/413 Bus networks with decentralised control with random access, e.g. carrier-sense multiple-access with collision detection (CSMA-CD)
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/32 Handling requests for interconnection or transfer for access to input/output bus using combination of interrupt and burst mode transfer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40 Bus networks
    • H04L12/40006 Architecture of a communication node
    • H04L12/40019 Details regarding a bus master
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/30 Flow control; Congestion control in combination with information about buffer occupancy at either end or at transit nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/36 Flow control; Congestion control by determining packet size, e.g. maximum transfer unit [MTU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0038 System on Chip

Definitions

  • the data items discussed herein can be of many different types, including, but not limited to, device addresses, context IDs, storage locations or pointers, filenames, etc. It will be understood that any discussion herein of comparing an incoming data item to a list of data items refers to comparing the incoming data item to a list of data items of the same type. Thus, for example, an incoming device address would be compared to a list of device addresses, and not any other type of data items.
  • comparison methods and apparatus in accordance with embodiments of this disclosure may be used with a modular chip "hopping bus" architecture, including multiple chips.
  • Such a system may operate at a high clock rate — e.g., 1 GHz — which translates to a very short clock cycle (1 ns) which allows only a small number of comparison operations.
  • Such a system will include a processor (as defined above) to manage the intra- and inter-chip protocol.
  • Each chip will be assigned an address, and the processor on the "master" chip will perform comparisons as described herein to decode chip addresses.
  • the processor may also, e.g., perform comparisons to decode context IDs for ID-based context look-up.
  • Implementations of the subject matter of this disclosure keep track of "last hit" (i.e., most-recently used) information, to create a two-level comparison scheme.
  • the N last hit items may be referred to as LH[1:N].
  • N can be 1, 2, 4 or more, but N should be selected so that N items can be compared in a single cycle of the targeted clock rate.
  • the new incoming data item can be compared to LH[1:N] in one clock cycle. If there is no hit from those comparisons, then the new incoming data item can be compared to all items, or the remaining items, over the next several clock cycles. It is apparent, then, that if the new incoming data item matches any of the "last hit" (i.e., most-recently used) data items, the comparison takes one clock cycle, but if it does not match any of the "last hit" data items, the comparison requires multiple cycles.
  • the "last hit" i.e. , most-recent I y used
  • a ful l comparison of 32 addresses cannot be performed in one cycle for a high-speed design, such as a system with a 1 GHz clock.
  • destination address 30->decoded to target zone 30 The four-comparator logic can run at very high speed and completed in a single cycle. If there is no match, then the ful l comparison against al l 32 addresses (or the remaining 28 addresses) can be performed, which wi l l take multiple cycles.
  • a full comparison of 64 context IDs cannot be performed in one cycle for a high-speed design, such as a system with a 1 GHz clock.
  • FIG. 1 shows a processor arrangement 100 for performing comparison operations according to implementations of the subject matter of this disclosure, whether included in a modular chip "hopping bus" architecture or in another type of system.
  • Comparator 102 may be a separate hardware comparator, or may be implemented as software in processor 101. A plurality of comparators 102 may be provided (additional instances of comparator 102 are shown in phantom) to allow multiple comparisons to be performed in parallel, which increases the number of comparisons that can be performed within a single clock cycle.
  • the incoming data item to be compared is stored in a memory or register 103 on one input of comparator 102.
  • the full set of data items against which the incoming data item is to be compared is stored in a memory 104.
  • a further memory 105 stores the most recent "hits" (i.e., the most-recently used data items), against which the incoming data item is first checked.
  • a multiplexer 106 on the second input of comparator 102 selects the full set of data items in memory 104 or the most-recently used data items in memory 105 under control of processor 101.
  • Memory 104 may have provisions for each stored item to be flagged at 114 so that if a memory item is one of the most-recently used data items in memory 105, it can be flagged, if desired, to be skipped when the incoming data item in memory 103 is compared against the full set of data items in memory 104, to avoid duplicate comparisons and thereby speed up the comparison against the full set of data items.
  • FIG. 2 shows a flow diagram of an implementation of a comparison method 200 according to the subject matter of this disclosure.
  • the data items being compared are addresses, but this is only an example and the data items could be any kind of data items that need to be compared against a list of data items.
  • Method 200 starts at 201 where the complete list of addresses and the list of most-recently used addresses are initialized.
  • At 202, the system waits for an address to be input. Once an address has been input, it is determined at 203 whether there are any more uncompared most-recently used addresses in the list of most-recently used addresses. On startup, the list of most-recently used addresses will be empty, so the answer will be "No." The full list of addresses will be checked at 204 to determine whether the most-recently used addresses are flagged in the full list of addresses. Again, on startup, the answer will be "No," and so the input address will be compared at 205 to an address in the full list of addresses.
  • At 206, it will be determined whether that comparison is a hit. If not, comparisons at 205 will continue until a hit is determined at 206 (in this implementation there is no such thing as an invalid input address, so there will eventually be a hit; in implementations where an invalid address is a possibility, further checking to determine whether all addresses have been checked without a hit may be implemented).
  • the list of most-recently used addresses is updated by adding the current address as the most-recently used address; the least-recently used address is discarded.
  • If an address is already present in the list of most-recently used addresses, that address will not be added again, so that other addresses can remain in the list of most-recently used addresses.
  • the list of most-recently used addresses would have to contain an indication of how recently each address in the list was used, so that when a new address is added to the list, it is known which address already in the list is the least-recently used address and can be deleted.
  • The input address is then decoded at 208 and the system returns to 202 to await another input address.
  • Thereafter, there will be at least one most-recently used address in the list of most-recently used addresses.
  • At 211, the input address will be compared to a most-recently used address in the list of most-recently used addresses, and if there is a hit as determined at 212, meaning the input address matches the most-recently used address being compared, the input address will be decoded at 208 and the system returns to 202 to await another input address.
  • Otherwise, the system returns to 203 to see if there are any more most-recently used addresses in the list that have not yet been compared. If so, the system returns to 211 to compare the input address to the next most-recently used address in the list and flow continues as before. If at 203 there are no more uncompared most-recently used addresses, the system continues to 204 and flow continues as before.
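  • The list-maintenance behavior described in the bullets above (add a hit as the most-recent entry, never duplicate an address already present, and evict the least-recently used entry when the list is full) can be sketched as follows. This is an illustrative model only; `MRUList` and its method names are assumptions, not the patent's own terminology.

```python
from collections import OrderedDict

class MRUList:
    """Fixed-capacity list of the N most-recently used addresses.

    Mirrors the flow of FIG. 2: a hit address is recorded as most
    recent; an address already present is not added again (its recency
    is simply refreshed); when the list is full, the least-recently
    used entry is discarded.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()  # insertion order tracks recency

    def record_hit(self, address):
        if address in self._items:
            self._items.move_to_end(address)      # refresh recency only
        else:
            if len(self._items) >= self.capacity:
                self._items.popitem(last=False)   # evict least-recently used
            self._items[address] = True

    def __contains__(self, address):
        return address in self._items

    def items(self):
        return list(self._items)  # least-recently used first
```

For example, with a capacity of two, recording hits on addresses 1, 2, 3 evicts address 1; a repeat hit on 2 only moves it to the most-recent position.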

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Bus Control (AREA)

Abstract

Comparison circuitry includes a first memory that stores a list of data items, a second memory that stores a list of most-recently used ones of the data items, a first comparator that compares an input data item first to the ones of the data items in the second memory and, only in absence of a hit in the second memory, compares the input data item to the data items in the first memory. At least one additional comparator may operate in parallel with the first comparator to compare the input data item to respective data items in at least one additional second memory, and to compare the input data item to respective data items in the first memory in absence of a respective hit in the at least one additional second memory. A data communications system may include a decoder incorporating such comparison circuitry.

Description

METHOD AND APPARATUS FOR A PROCESSOR WITH CACHE AND MAIN MEMORY
Cross Reference to Related Applications
[0001] This claims the benefit of United States Provisional Patent Applications Nos. 62/200,436; 62/200,462; 62/200,444; and 62/200,452 filed August 3, 2015, and also claims the benefit of United States Provisional Patent Application No. 62/218,296 filed September 14, 2015, and the benefit of United States Non-Provisional Patent
Applications Nos. 15/220,898; 15/220,916; 15/220,546; 15/220,923; and 15/220,684 filed July 27, 2016, each of which is hereby incorporated by reference herein in its respective entirety.
Field of Use
[0002] Implementations of the subject matter of this disclosure generally pertain to apparatus and methods for performing list comparisons. In particular, implementations of the subject matter of this disclosure pertain to apparatus and methods for accelerating list comparisons by checking recent "hits" first.
Background
[0003] The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the inventors hereof, to the extent the work is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted to be prior art against the present disclosure.
[0004] In electronic systems that execute software, firmware or microcode (hereafter collectively referred to as "software" or
"instructions"), under control, e.g., of a processor, microprocessor or microcontroller (hereafter collectively referred to as a
"processor"), it is frequently necessary to compare a data item to a list of data items. The number of comparisons that can be performed during each system clock cycle is limited, and decreases as the size of the items being compared increases. If all necessary comparisons cannot be performed within a single clock cycle, system performance may suffer.
Summary
[0005] Comparison circuitry according to implementations of the subject matter of this disclosure includes a first memory that stores a list of data items, a second memory that stores a list of most-recently used ones of the data items, a first comparator that compares an input data item first to the ones of the data items in the second memory and, only in absence of a hit in the second memory, compares the input data item to the data items in the first memory.
[0006] In one implementation of such comparison circuitry, the first memory and the second memory may be separate memory devices. In another implementation of such comparison circuitry, the first memory and the second memory may be separate portions of a single memory device.
[0007] In such comparison circuitry, the first comparator may be a processor, operation of the processor is clocked, the second memory is sized to contain a number of the data items, and the number of the data items is a number of the data items on which the processor performs operations in a single clock cycle.
[0008] In one implementation of such comparison circuitry, in the absence of a hit in the second memory, the first comparator compares the input data item to all of the data items in the first memory. In another implementation of such comparison circuitry, the first comparator compares the input data item only to the data items in the first memory that are not also in the second memory. In that other implementation, the first memory includes a respective flag for each respective data item stored in the first memory, to indicate whether the respective data item also is stored in the second memory.
[0009] Such comparison circuitry may further include at least one additional comparator. Each respective one of the at least one additional comparator operates in parallel with the first comparator to compare the input data item to respective data items in at least one additional one of the second memory, and to compare the input data item to respective data items in the first memory in absence of a respective hit in the at least one additional one of the second memory.
[0010] A data communications system according to implementations of the subject matter of this disclosure includes a plurality of integrated circuit devices, and a data bus interconnecting the plurality of integrated circuit devices. At least one of the integrated circuit devices in the plurality of integrated circuit devices includes decoding circuitry, and the decoding circuitry includes a first memory that stores a list of data items, a second memory that stores a list of most-recently used data items, a first comparator that compares an input data item first to data items in the second memory and, only in absence of a hit in the second memory, compares the input data item to data items in the first memory.
[0011] In such a data communications system, the first comparator may include a processor, operation of the processor is clocked, the second memory is sized to contain a number of data items, and the number of data items is a number of data items on which the processor performs operations in a single clock cycle.
[0012] A comparison method according to implementations of the subject matter of this disclosure, for use in a data communications system, includes storing a list of data items, storing a list of most-recently used ones of the data items, comparing an input data item first to the list of most-recently used ones of the data items and, only in absence of a hit in the list of most-recently used ones of the data items, comparing the input data item to the list of the data items, for decoding the input data item for use in said data communications system.
[0013] In one implementation of such a comparison method, the storing the list of data items and the storing the list of the most-recently used ones of the data items includes storing the list of data items in a first memory device, and storing the list of the most-recently used ones of the data items in a second memory device. In another implementation of such a comparison method, the storing the list of data items and the storing the list of the most-recently used ones of the data items includes storing the list of data items and the list of the most-recently used ones of the data items in first and second portions of a single memory device.
[0014] In such a comparison method, the comparing may be clocked, and the number of data items in the list of the most-recently used ones of the data items is a number of data items on which the comparing is performed in a single clock cycle.
[0015] In one implementation of such a comparison method, the comparing the input data item to the list of the data items includes comparing the input data item to all data items in the list of the data items. In another implementation of such a comparison method, the comparing the input data item to the list of the data items includes comparing the input data item only to data items in the list of the data items that are not also in the list of the most-recently used ones of the data items. In that other implementation of such a comparison method, the storing a list of data items includes storing a respective flag for each respective data item in the list of the data items, to indicate whether the respective data item also is in the list of the most-recently used ones of the data items.
[0016] In such a comparison method, the comparing of an input data item first to the list of most-recently used ones of the data items may include comparing the input data item in parallel to respective data items in the list of most-recently used ones of the data items, and to respective data items in the list of the data items in the absence of a respective hit in the list of most-recently used ones of the data items.
Brief Description of the Drawings
[0017] Further features of the disclosure, its nature and various advantages, will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
[0018] FIG. 1 shows a processor arrangement for performing comparison operations according to implementations of the subject matter of this disclosure; and
[0019] FIG. 2 is a flow diagram illustrating an implementation of a comparison method according to the subject matter of this
disclosure.
Detailed Description
[0020] As discussed above, in electronic systems that execute instructions under control of a processor, it is frequently necessary to compare a data item to a list of data items. The number of comparisons that can be performed during each system clock cycle is limited, and decreases as the size of the items being compared increases. If all necessary comparisons cannot be performed within a single clock cycle, system performance may suffer.
[0021] For example, in a system with multiple addressable components, a table of addresses may be provided to associate each logical address with a particular physical component. To communicate with a desired component, an address is decoded by finding the address in the table, thereby identifying the associated component. To find the address in the table, the address is compared to each entry in the table until a match is found. If the time needed to compare each entry in the table, and the number of entries that need to be compared, are great enough, the comparison will not be completed within a single clock cycle. While the amount of time needed for each comparison may be a function of the particular comparison scheme used, for any given comparison scheme— even if the comparison scheme is the fastest available comparison scheme— there will be a maximum number of comparisons that can be performed in a single clock cycle.
[0022] In accordance with implementations of the subject matter of this disclosure, comparison of an incoming data item to a list of other data items can be accelerated by judiciously choosing the order of comparison. Thus, rather than compare the incoming data item to the list of other data items in the order in which the other data items appear in the list, a different order is selected. In particular, a separate list of most-recently used ones of the items in the list of other data items may be maintained, and the incoming data item is compared to most-recently used items in the separate list before being compared (if necessary) to the larger list of other data items.
[0023] The number of items kept in the separate list of most-recently used items may be selected to keep the time needed to compare the incoming data item to the separate list of most-recently used items to within one clock cycle. At least the approximate length of time needed for each comparison would be known, and the number of items in the separate list of most-recently used items can be limited accordingly to keep the comparisons of those items to within one clock cycle. That number may be a function of the size of each data item. For example, if the incoming data item is an address as in the example above, and each address is eight bits long, the number of items that can be compared within one clock cycle might be larger than if each address were thirty-two bits long.
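As one illustration of this sizing constraint, using assumed timing figures that are not taken from the disclosure, the bound on the number of most-recently used entries can be computed from the clock period and the per-comparison delay:

```python
def max_mru_entries(clock_period_ns: float, compare_delay_ns: float) -> int:
    """Largest number of sequential comparisons fitting in one clock cycle.

    This models the sequential case; with parallel comparators the limit
    is instead set by how many comparators can be instantiated.
    """
    return int(clock_period_ns // compare_delay_ns)

# Hypothetical figures: a 1 GHz clock (1 ns period) and a 0.25 ns
# per-address comparison leave room for N = 4 last-hit entries.
print(max_mru_entries(1.0, 0.25))  # -> 4
```

A wider data item would lengthen the per-comparison delay and so shrink the permissible list, consistent with the eight-bit versus thirty-two-bit example above.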
[0024] If the incoming data item is found in the list of most-recently used data items, then the comparison will occur within one clock cycle and system operations can continue without undue delay. If the incoming data item is not found in the list of most-recently used items, the system would have to compare the incoming data item to the complete list of data items. This could slow system operations, insofar as comparison with the complete list may take more than one clock cycle, and even several clock cycles, and this would be in addition to the clock cycle consumed by the unsuccessful comparison of the incoming data item to the list of most-recently used data items. One way to accelerate this longer comparison is to exclude from the comparison those items in the list of most-recently used data items, as the incoming data item would already have been compared unsuccessfully to those items.
[0025] The data items discussed herein can be of many different types, including, but not limited to, device addresses, context IDs, storage locations or pointers, filenames, etc. It will be understood that any discussion herein of comparing an incoming data item to a list of data items refers to comparing the incoming data item to a list of data items of the same type. Thus, for example, an incoming device address would be compared to a list of device addresses, and not any other type of data items.
[0026] As one example only, comparison methods and apparatus in accordance with embodiments of this disclosure may be used with a modular chip "hopping bus" architecture, including multiple integrated circuit chips that communicate as though they are a single chip, as described in U.S. Patent Application Publication 2015/0169495, which is hereby incorporated by reference herein in its entirety. Such a system may operate at a high clock rate — e.g., 1 GHz — which translates to a very short clock cycle (1 ns) which allows only a small number of comparison operations.
[0027] In such a system, at least some of the chips, which may be referred to as "masters," will include a processor (as defined above) to manage the intra- and inter-chip protocol. Each chip will be assigned an address, and the processor on the "master" chip will perform comparisons as described herein to decode chip addresses. The processor may also, e.g., perform comparisons to decode context IDs from ID-based context look-up.
[0028] Implementations of the subject matter of this disclosure keep track of "last hit" (i.e., most-recently used) information, to create a two-level comparison schema. The N last hit items may be referred to as:
LH[1], LH[2], … LH[N]
N can be 1, 2, 4 or more, but N should be selected so that N items can be compared in a single cycle of the targeted clock rate.
[0029] For any new incoming data item to be compared, the new incoming data item can be compared to LH[1:N] in one clock cycle. If there is no hit from those comparisons, then the new incoming data item can be compared to all items, or the remaining items, over the next several clock cycles. It is apparent, then, that if the new incoming data item matches any of the "last hit" (i.e., most-recently used) data items, the comparison takes one clock cycle, but if the new incoming data item does not match any of the "last hit" data items, the comparison requires multiple cycles.
[0030] In an address decoding example, there may be 32 possible addresses decoding to 32 possible target zones:
New Address==destination address 0->decoded to target zone 0
New Address==destination address 1->decoded to target zone 1
…
New Address==destination address 31->decoded to target zone 31
A full comparison of 32 addresses cannot be performed in one cycle for a high-speed design, such as a system with a 1 GHz clock.
[0031] However, one can apply an implementation of the subject matter of this disclosure, as fol lows:
[0032] One can keep track of the four most recent decoding hits (i.e., N=4):
Last hit 1==destination address 10->decoded to target zone 10
Last hit 2==destination address 5->decoded to target zone 5
Last hit 3==destination address 11->decoded to target zone 11
Last hit 4==destination address 30->decoded to target zone 30
For the next new incoming address, one can perform four comparisons:
New Address==Last hit 1, destination address 10->decoded to target zone 10
New Address==Last hit 2, destination address 5->decoded to target zone 5
New Address==Last hit 3, destination address 11->decoded to target zone 11
New Address==Last hit 4, destination address 30->decoded to target zone 30
The four-comparator logic can run at very high speed and be completed in a single cycle. If there is no match, then the full comparison against all 32 addresses (or the remaining 28 addresses) can be performed, which will take multiple cycles.
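The two-level address decode described above can be sketched in software as follows. This is a minimal, sequential model with illustrative names; in the disclosed hardware the four last-hit comparisons occur in parallel within one clock cycle:

```python
# Full decode table: destination address -> target zone (identity here,
# matching the 32-address example above).
DECODE_TABLE = {addr: addr for addr in range(32)}

# Last-hit list, newest first, e.g. addresses 10, 5, 11, 30 as above.
last_hits = [10, 5, 11, 30]
N = 4

def decode(new_address):
    """Return (target_zone, fast) where fast is True on a last-hit match."""
    # First level: compare against the N last-hit entries (one cycle
    # in hardware, since the four comparators operate in parallel).
    for addr in last_hits:
        if new_address == addr:
            return DECODE_TABLE[addr], True
    # Second level: fall back to the full table, skipping the entries
    # already checked in the last-hit list (multiple cycles in hardware).
    for addr in DECODE_TABLE:
        if addr in last_hits:
            continue
        if new_address == addr:
            # Record the hit: newest entry first, oldest entry discarded.
            last_hits.insert(0, addr)
            del last_hits[N:]
            return DECODE_TABLE[addr], False
    raise ValueError("address not in decode table")

print(decode(5))   # -> (5, True): found among the last hits
print(decode(7))   # -> (7, False): full-table comparison needed
```

After the miss on address 7, the last-hit list becomes 7, 10, 5, 11, so a repeat of address 7 would complete in the fast path.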
[0033] In an ID-based context look-up example, there may be 64 possible context IDs (using an 8-bit ID) decoding to 32 possible contexts:
New Address==ID for context 0->look up context 0
New Address==ID for context 1->look up context 1
…
New Address==ID for context 31->look up context 31
A full comparison of 64 context IDs cannot be performed in one cycle for a high-speed design, such as a system with a 1 GHz clock.
[0034] However, one can apply an implementation of the subject matter of this disclosure, as fol lows:
[0035] One can keep track of the four most recent context hits (i.e., N=4):
Last hit 1==ID for context 10->look up context 10
Last hit 2==ID for context 5->look up context 5
Last hit 3==ID for context 11->look up context 11
Last hit 4==ID for context 30->look up context 30
For the next new incoming address, one can perform four comparisons:
New Address==Last hit 1, ID for context 10->look up context 10
New Address==Last hit 2, ID for context 5->look up context 5
New Address==Last hit 3, ID for context 11->look up context 11
New Address==Last hit 4, ID for context 30->look up context 30
The four-comparator logic can run at very high speed and be completed in a single cycle. If there is no match, then the full comparison against all 64 context IDs (or the remaining 60 context IDs) can be performed, which will take multiple cycles.
[0036] FIG. 1 shows a processor arrangement 100 for performing comparison operations according to implementations of the subject matter of this disclosure, whether included in a modular chip "hopping bus" architecture as described in the above-incorporated U.S. Patent Application Publication 2015/0169495, or otherwise.
Processor 101 controls the comparison operations in comparator 102. Comparator 102 may be a separate hardware comparator, or may be implemented as software in processor 101. A plurality of comparators 102 may be provided (additional instances of comparator 102 are shown in phantom) to allow multiple comparisons to be performed in parallel, which increases the number of comparisons that can be performed within a single clock cycle.
[0037] The incoming data item to be compared is stored in a memory or register 103 on one input of comparator 102. The full set of data items against which the incoming data item is to be compared is stored in a memory 104. A further memory 105 stores the most recent "hits" (i.e., the most-recently used data items), against which the incoming data item is first checked. A multiplexer 106 on the second input of comparator 102 selects the full set of data items in memory 104 or the most-recently used data items in memory 105 under control of processor 101. Memory 104 may have provisions for each stored item to be flagged at 114 so that if a memory item is one of the most-recently used data items in memory 105, it can be flagged, if desired, to be skipped when the incoming data item in memory 103 is compared against the full set of data items in memory 104, to avoid duplicate comparisons and thereby speed up the comparison against the full set of data items.
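The flag provision at 114 can be modeled, as one illustrative sketch only and not as the hardware implementation, as a boolean stored alongside each entry of memory 104; the full-list scan then skips entries already covered by memory 105:

```python
full_list = list(range(32))   # memory 104: full set of addresses
mru_list = [10, 5, 11, 30]    # memory 105: most recent hits
# One flag per stored item (114 in FIG. 1): True when the item also
# sits in memory 105, so the full-list scan can skip it.
flags = [addr in mru_list for addr in full_list]

def full_scan(input_item):
    """Compare input_item only against unflagged entries of memory 104."""
    for addr, flagged in zip(full_list, flags):
        if flagged:
            continue          # already compared via memory 105
        if input_item == addr:
            return addr
    return None

print(full_scan(7))   # -> 7
print(full_scan(10))  # -> None: 10 is flagged, checked only via memory 105
```

Skipping the four flagged entries shortens the slow path by exactly the number of comparisons already spent, unsuccessfully, on the last-hit list.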
[0038] Although comparators 102 are shown as separate circuit elements, the comparison function denoted by each comparator 102 may actually be performed by software in processor 101. Similarly, while memories 103, 104, 105 are shown as separate memories, they may be implemented as separate portions (each of which may be contiguous or non-contiguous) within a single memory device.
[0039] FIG. 2 shows a flow diagram of an implementation of a comparison method 200 according to the subject matter of this disclosure. In method 200 as illustrated, the data items being compared are addresses, but this is only an example and the data items could be any kind of data items that need to be compared against a list of data items.
[0040] Method 200 starts at 201, where the complete list of addresses and the list of most-recently used addresses are initialized. The complete list of addresses is determined as the system is configured and all addressable system components are recognized. Initialization of the list of most-recently used addresses normally involves clearing the list.
[0041] At 202, the system waits for an address to be input. Once an address has been input, it is determined at 203 whether there are any more uncompared most-recently used addresses in the list of most-recently used addresses. On startup, the list of most-recently used addresses will be empty, so the answer will be "No." The full list of addresses will be checked at 204 to determine whether the most-recently used addresses are flagged in the full list of addresses. Again, on startup, the answer will be "No," and so the input address will be compared at 205 to an address in the full list of addresses.
[0042] At 206 it will be determined whether that comparison is a hit. If not, comparisons at 205 will continue until a hit is determined at 206 (in this implementation there is no such thing as an invalid input address, so there will eventually be a hit; in implementations where an invalid address is a possibility, further checking to determine whether all addresses have been checked without a hit may be implemented). If at 206 a comparison at 205 is determined to be a hit, at 207 the list of most-recently used addresses is updated by adding the current address as the most-recently used address; the least-recently used address is discarded. Alternatively, to prevent a single frequently used address from occupying all spaces in the list of most-recently used addresses, if an address is already present in the list of most-recently used addresses, that address will not be added again, so that other addresses can remain in the list of most-recently used addresses. In such a case, the list of most-recently used addresses would have to contain an indication of how recently each address in the list was used, so that when a new address is added to the list of most-recently used addresses, it is known which address already in the list is the least-recently used address and can be deleted. After updating of the list of most-recently used addresses, the input address is decoded at 208 and the system returns to 202 to await another input address.
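The alternative update policy described above, in which an address already present is not re-inserted but its recency is refreshed so the least-recently used entry remains identifiable, can be sketched as follows; this is an illustrative software model with assumed names, not the disclosed implementation:

```python
from collections import OrderedDict

class RecentAddresses:
    """Most-recently used address list with recency tracking.

    An OrderedDict keyed by address keeps one slot per address, so a
    frequently used address cannot occupy every slot; moving a key to
    the end on each use records recency, and the front entry is always
    the least-recently used one, ready to be discarded.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def record_hit(self, address):
        if address in self._entries:
            # Already present: do not add again, just refresh recency.
            self._entries.move_to_end(address)
        else:
            if len(self._entries) == self.capacity:
                # Discard the least-recently used address.
                self._entries.popitem(last=False)
            self._entries[address] = True

    def addresses(self):
        # Least recent first, most recent last.
        return list(self._entries)

mru = RecentAddresses(capacity=4)
for addr in (10, 5, 11, 30, 5, 7):
    mru.record_hit(addr)
print(mru.addresses())  # -> [11, 30, 5, 7]: 10 discarded, 5 refreshed
```

The repeated hit on address 5 occupies only one slot, so addresses 11 and 30 survive, while the oldest entry, 10, is the one evicted when 7 arrives.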
[0043] If at 204 it is determined that most-recently used addresses are flagged in the full list, then at 209 the input address is compared to an unflagged address in the full list of addresses. If there is a hit, as determined at 210, then at 207 the list of most-recently used addresses is updated by adding the current address as the most-recently used address, and the least-recently used address is discarded, and flow continues as above.
[0044] At times other than startup, at 203 there will be at least one most-recently used address in the list of most-recently used addresses. At 211, the input address will be compared to a most-recently used address in the list of most-recently used addresses, and if there is a hit as determined at 212, meaning the input address matches the most-recently used address in the list of most-recently used addresses that is being compared, the input address will be decoded at 208 and the system returns to 202 to await another input address.
[0045] If at 212 there is not a hit, meaning that the input address does not match the most-recently used address in the list of most-recently used addresses that is being compared, the system returns to 203 to see if there are any more most-recently used addresses in the list of most-recently used addresses that have not yet been compared. If so, the system returns to 211 to compare the input address to the next most-recently used address in the list of most-recently used addresses and flow continues as before. If at 203 there are no more most-recently used addresses in the list of most-recently used addresses, the system continues to 204 and flow continues as before.
[0046] Thus it is seen that methods and apparatus for accelerating list comparison operations have been provided.
[0047] It will be understood that the foregoing is only illustrative of the principles of the invention, and that the invention can be practiced by other than the described embodiments, which are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims which follow.

Claims

WHAT IS CLAIMED IS:
1. Comparison circuitry comprising:
a first memory that stores a list of data items; a second memory that stores a list of most-recently used ones of the data items;
a first comparator that compares an input data item first to the ones of the data items in the second memory and, only in absence of a hit in the second memory, compares the input data item to the data items in the first memory.
2. The comparison circuitry of claim 1 wherein the first memory and the second memory are separate memory devices.
3. The comparison circuitry of claim 1 wherein the first memory and the second memory are separate portions of a single memory device.
4. The comparison circuitry of claim 1 wherein:
the first comparator comprises a processor;
operation of the processor is clocked;
the second memory is sized to contain a number of the data items; and
the number of the data items is a number of the data items on which the processor performs operations in a single clock cycle.
5. The comparison circuitry of claim 1 wherein, in the absence of a hit in the second memory, the first comparator compares the input data item to all of the data items in the first memory.
6. The comparison circuitry of claim 1 wherein, in the absence of a hit in the second memory, the first comparator compares the input data item only to the data items in the first memory that are not also in the second memory.
7. The comparison circuitry of claim 6 wherein the first memory includes a respective flag for each respective data item stored in the first memory, to indicate whether the respective data item also is stored in the second memory.
8. The comparison circuitry of claim 1 further comprising at least one additional comparator; wherein:
each respective one of the at least one additional comparator operates in parallel with the first comparator to compare the input data item to respective data items in at least one additional one of the second memory, and to compare the input data item to respective data items in the first memory in absence of a respective hit in the at least one additional one of the second memory.
9. A data communications system comprising:
a plurality of integrated circuit devices; and a data bus interconnecting the plurality of integrated circuit devices, wherein:
at least one of the integrated circuit devices in the plurality of integrated circuit devices includes decoding circuitry, the decoding circuitry comprising:
a first memory that stores a list of data items; a second memory that stores a list of most-recently used data items;
a first comparator that compares an input data item first to data items in the second memory and, only in absence of a hit in the second memory, compares the input data item to data items in the first memory.
10. The data communications system of claim 9 wherein the input data item is an address to be decoded.
11. The data communications system of claim 9 wherein the input data item is a context ID to be decoded.
12. The data communications system of claim 9 wherein: the first comparator comprises a processor;
operation of the processor is clocked;
the second memory is sized to contain a number of data items; and
the number of data items is a number of data items on which the processor performs operations in a single clock cycle.
13. A comparison method for use in a data communications system, said comparison method comprising:
storing a list of data items;
storing a list of most-recently used ones of the data items;
comparing an input data item first to the list of most-recently used ones of the data items and, only in absence of a hit in the list of most-recently used ones of the data items, comparing the input data item to the list of the data items, for decoding the input data item for use in said data communications system.
14. The comparison method of claim 13 wherein the storing the list of data items and the storing the list of the most-recently used ones of the data items comprises:
storing the list of data items in a first memory device; and
storing the list of the most-recently used ones of the data items in a second memory device.
15. The comparison method of claim 13 wherein the storing the list of data items and the storing the list of the most-recently used ones of the data items comprises storing the list of data items and the list of the most-recently used ones of the data items in first and second portions of a single memory device.
16. The comparison method of claim 13 wherein:
the comparing is clocked; and
the number of data items in the list of the most-recently used ones of the data items is a number of data items on which the comparing is performed in a single clock cycle.
17. The comparison method of claim 13 wherein the comparing the input data item to the list of the data items comprises comparing the input data item to all data items in the list of the data items.
18. The comparison method of claim 13 wherein the comparing the input data item to the list of the data items comprises comparing the input data item only to data items in the list of the data items that are not also in the list of the most-recently used ones of the data items.
19. The comparison method of claim 18 wherein the storing a list of data items comprises storing a respective flag for each respective data item in the list of the data items, to indicate whether the respective data item also is in the list of the most-recently used ones of the data items.
20. The comparison method of claim 13, wherein:
the comparing an input data item first to the list of most-recently used ones of the data items comprises comparing the input data item in parallel to respective data items in the list of most-recently used ones of the data items, and to respective data items in the list of the data items in the absence of a respective hit in the list of most-recently used ones of the data items.
PCT/US2016/044358 2015-08-03 2016-07-28 Method and apparatus for a processor with cache and main memory WO2017023659A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
TW105124513A TWI695268B (en) 2015-08-03 2016-08-02 Systems and methods for transmitting interrupts between nodes
TW105124514A TWI703437B (en) 2015-08-03 2016-08-02 Systems and methods for performing unknown address discovery in a mochi space
TW105124512A TWI703445B (en) 2015-08-03 2016-08-02 Systems and methods for aggregating data packets in a mochi system
TW105124510A TWI695262B (en) 2015-08-03 2016-08-02 Systems and methods for implementing topology-based identification process in a mochi environment
TW105124515A TWI705327B (en) 2015-08-03 2016-08-02 Methods and apparatus for accelerating list comparison operations

Applications Claiming Priority (20)

Application Number Priority Date Filing Date Title
US201562200462P 2015-08-03 2015-08-03
US201562200444P 2015-08-03 2015-08-03
US201562200452P 2015-08-03 2015-08-03
US201562200436P 2015-08-03 2015-08-03
US62/200,444 2015-08-03
US62/200,452 2015-08-03
US62/200,436 2015-08-03
US62/200,462 2015-08-03
US201562218296P 2015-09-14 2015-09-14
US62/218,296 2015-09-14
US15/220,916 US10552350B2 (en) 2015-08-03 2016-07-27 Systems and methods for aggregating data packets in a mochi system
US15/220,684 US10198376B2 (en) 2015-08-03 2016-07-27 Methods and apparatus for accelerating list comparison operations
US15/220,546 2016-07-27
US15/220,684 2016-07-27
US15/220,898 US10339077B2 (en) 2015-08-03 2016-07-27 Systems and methods for implementing topology-based identification process in a MoChi environment
US15/220,916 2016-07-27
US15/220,546 US10318453B2 (en) 2015-08-03 2016-07-27 Systems and methods for transmitting interrupts between nodes
US15/220,898 2016-07-27
US15/220,923 2016-07-27
US15/220,923 US10474597B2 (en) 2015-08-03 2016-07-27 Systems and methods for performing unknown address discovery in a MoChi space

Publications (1)

Publication Number Publication Date
WO2017023659A1 true WO2017023659A1 (en) 2017-02-09

Family

ID=56616069

Family Applications (5)

Application Number Title Priority Date Filing Date
PCT/US2016/044431 WO2017023682A1 (en) 2015-08-03 2016-07-28 Systems and methods for performing unknown address discovery in a mochi space
PCT/US2016/044360 WO2017023661A1 (en) 2015-08-03 2016-07-28 Systems and methods for transmitting interrupts between nodes
PCT/US2016/044358 WO2017023659A1 (en) 2015-08-03 2016-07-28 Method and apparatus for a processor with cache and main memory
PCT/US2016/044428 WO2017023681A1 (en) 2015-08-03 2016-07-28 Systems and methods for aggregating data packets in a mochi system
PCT/US2016/044425 WO2017023678A1 (en) 2015-08-03 2016-07-28 Systems and methods for implementing topology-based identification process in a mochi environment

Family Applications Before (2)

Application Number Title Priority Date Filing Date
PCT/US2016/044431 WO2017023682A1 (en) 2015-08-03 2016-07-28 Systems and methods for performing unknown address discovery in a mochi space
PCT/US2016/044360 WO2017023661A1 (en) 2015-08-03 2016-07-28 Systems and methods for transmitting interrupts between nodes

Family Applications After (2)

Application Number Title Priority Date Filing Date
PCT/US2016/044428 WO2017023681A1 (en) 2015-08-03 2016-07-28 Systems and methods for aggregating data packets in a mochi system
PCT/US2016/044425 WO2017023678A1 (en) 2015-08-03 2016-07-28 Systems and methods for implementing topology-based identification process in a mochi environment

Country Status (1)

Country Link
WO (5) WO2017023682A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5845327A (en) * 1995-05-03 1998-12-01 Apple Computer, Inc. Cache coherency where multiple processors may access the same data over independent access paths
US20050225558A1 (en) * 2004-04-08 2005-10-13 Ati Technologies, Inc. Two level cache memory architecture
US20130132658A1 (en) * 2011-11-21 2013-05-23 International Business Machines Corporation Device For Executing Program Instructions and System For Caching Instructions
US20130191587A1 (en) * 2012-01-19 2013-07-25 Renesas Electronics Corporation Memory control device, control method, and information processing apparatus

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5179704A (en) * 1991-03-13 1993-01-12 Ncr Corporation Method and apparatus for generating disk array interrupt signals
JP2000332817A (en) * 1999-05-18 2000-11-30 Fujitsu Ltd Packet processing unit
US7295552B1 (en) * 1999-06-30 2007-11-13 Broadcom Corporation Cluster switching architecture
KR100678223B1 (en) * 2003-03-13 2007-02-01 삼성전자주식회사 Method and apparatus for packet transmitting in a communication system
US7330467B2 (en) * 2003-03-26 2008-02-12 Altera Corporation System and method for centralized, intelligent proxy driver for a switch fabric
TWI256591B (en) * 2004-08-11 2006-06-11 Benq Corp Method of reducing interrupts
US7788434B2 (en) * 2006-12-15 2010-08-31 Microchip Technology Incorporated Interrupt controller handling interrupts with and without coalescing
US8588253B2 (en) * 2008-06-26 2013-11-19 Qualcomm Incorporated Methods and apparatuses to reduce context switching during data transmission and reception in a multi-processor device
US8725919B1 (en) * 2011-06-20 2014-05-13 Netlogic Microsystems, Inc. Device configuration for multiprocessor systems
US9170971B2 (en) * 2012-12-26 2015-10-27 Iii Holdings 2, Llc Fabric discovery for a cluster of nodes
US9959237B2 (en) 2013-12-12 2018-05-01 Marvell World Trade Ltd. Method and apparatus for transferring information within and between system-on-chips via intra-chip and inter-chip hopping buses
JP6541272B2 (en) * 2013-12-12 2019-07-10 マーベル ワールド トレード リミテッド Method and apparatus for transferring information within and between system on chips via intra chip and inter chip hopping buses

Also Published As

Publication number Publication date
WO2017023678A1 (en) 2017-02-09
WO2017023661A1 (en) 2017-02-09
WO2017023681A1 (en) 2017-02-09
WO2017023682A1 (en) 2017-02-09


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16748434

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16748434

Country of ref document: EP

Kind code of ref document: A1