GB2516092A

GB2516092A - Method and system for implementing a bit array in a cache line

Info

Publication number: GB2516092A
Application number: GB1312446.6A
Authority: GB
Inventors: Burkhard Steinmacher-Burow
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2013-07-11
Filing date: 2013-07-11
Publication date: 2015-01-14
Also published as: GB201312446D0; JP2016526739A; GB2530962B; GB201601479D0; DE112014003212T5; WO2015004571A1; CN105378686B; CN105378686A; GB2530962A; JP6333371B2

Abstract

The invention concerns a method for implementing a bit array 318 in a cache line 211 of memory system 128 that includes memory storage 208 and controller 206. Each bit of the bit array may represent a lock in a set of locks shared by multiple threads whereby operations (e.g. atomic operations) on the bit array may allow a thread to become an owner of a resource by setting the value of the corresponding bit or bits. The method includes configuring in the cache line the bit array, wherein the configuring further comprises defining a value of each bit in the bit array; receiving, by the controller, a request 210 for an operation (e.g. an atomic operation) on the bit array wherein the request is indicative of a location (i.e. address) of the cache line in the memory storage and information specifying the request; identifying, by the controller, for the operation one or more actions on the bit array using the information, wherein the one or more actions are encoded in the controller; and in response to receiving the request, performing the request by executing the one or more encoded actions.

Description

DESCRIPTION

Method and system for implementing a bit array in a cache line

Field of the invention

The invention relates to computing systems, and more particularly to a method for implementing a bit array in a cache line.

Background

Multithreaded computer systems have become one of the more important technologies for companies of all sizes. They increase the computational efficiency and flexibility of a computing hardware platform. Multithreaded operations on such systems may comprise bitwise Atomic Memory Operations (AMOs) that perform logical operations on individual bits of a bit array. Such AMOs include the store-like operations storeAND, storeOR and storeXOR and the load-like operations fetchAND, fetchOR and fetchXOR.

Summary of the invention

It is an objective of embodiments of the invention to provide for an improved method, a computer system and a computer program product. Said objective is solved by the subject matter of the independent claims. Advantageous embodiments are described in the dependent claims.

An atomic memory operation (AMO) as used herein refers to a read-modify-write operation on shared data. An AMO is atomic in the sense that the read-modify-write operation of a thread is performed without interference by another thread. In other words, for threads or processors accessing (e.g. concurrently accessing) shared data, each read, write or AMO access is performed atomically, without interference from another access.

In one aspect, the invention relates to a method for implementing a bit array in a cache line of a memory system that includes a memory storage and a controller.

The method comprises: configuring the bit array in the cache line, the bit array comprising an array of bits, wherein the configuring further comprises defining a value of each bit in the bit array; receiving, by the controller, a request for an operation on the bit array, wherein the request is indicative of a location of the cache line in the memory storage and of information specifying the request; identifying, by the controller (206), for the operation one or more actions on the bit array using the information, wherein the one or more actions are encoded in the controller; and, in response to receiving the request, performing the request by executing the one or more encoded actions.

The bit array is a data structure that stores individual bits, i.e. each array element corresponds to a single bit. Each element of the bit array stores one of two values, 0 or 1, and is identified by a unique index.

Typically, the index values 0 to E-1 are used for a bit array with E elements.

The implementation of the bit array may allow access to at least part of the cache line for performing bitwise logical operations on individual bits as well as on groups of bits of the bit array. For example, each bit of the bit array may represent an instance of a resource of the computer system comprising the memory system. Operations on said bit array may allow a user, e.g. a thread, to atomically become the "owner" of one or more of said resource instances by setting the value of the corresponding bit or bits to 1.

For example, each bit of the bit array may represent a lock in a set of locks shared by multiple threads. Operations on said bit array may allow, for example, a thread to atomically acquire multiple locks with high performance and low programming effort.

For example, the value 1 or 0 of each bit of the bit array may represent the presence or absence of a particular element in a priority queue. Such a priority queue has up to E unique elements, where each element has a unique priority numbered 0 to E-1, corresponding to an index of the bit array.

These features may be advantageous as they may allow a higher rate of operations on the bit array than conventional methods, especially for concurrent threads. This may be because an operation on the bit array can be a simple operation that is faster than conventional operations on other data structures.

These features may allow the bit array to be used as a concurrent data structure by multiple threads. For example, the requested operation may be an AMO. The received request may be one of multiple concurrent requests received by the controller from multiple threads for operations on the bit array. The received concurrent requests may be performed sequentially. The controller may be adapted for performing and managing such multiple concurrent requests. The controller may comprise a receiver accepting memory access requests and a transmitter returning replies, each of which may include first-in-first-out buffers (FIFOs), to serve the memory accesses of the multiple concurrent threads.

These features may provide a single interface for a plurality of users or threads to utilize the bit array, for example with a multi-threaded application where multiple threads concurrently issue requests, such as the request, to access the bit array.

Another advantage may be that the memory system may implement new atomic memory operations to use the bit array beyond the conventional ones. For example, an AMO may be a variation on a processor store or load instruction.

Requested operations may require only the address of the cache line, not the addresses of individual elements of the bit array.

According to one embodiment, the request comprises a fetch and set a first 0-bit request, and performing the request comprises performing the one or more actions: starting at index 0, sequentially reading each bit of the bit array; if a first bit having value 0 is found, returning the index of the first found bit and setting its value to 1; otherwise returning a predefined failure value.
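A minimal sketch of these actions, modelling the bit array as a Python list of 0/1 values indexed from 0. The function name is hypothetical, and the sketch assumes the failure value is E (the array length), which is one of the choices the description allows. Atomicity is not modelled here; in the described system it comes from the controller performing each request without interference.

```python
def fetch_and_set_first_0bit(bits: list[int]) -> int:
    """Fetch and set a first 0-bit (hypothetical name).

    Scans from index 0; the first bit with value 0 is set to 1 and its
    index returned. If no 0-bit exists, returns len(bits), i.e. E, which
    is not a valid index and so serves as the failure value.
    """
    for i, b in enumerate(bits):
        if b == 0:
            bits[i] = 1
            return i
    return len(bits)
```

For example, on the array 1101 a first call returns 2 and leaves 1111; a second call then returns the failure value 4.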

The term 0-bit refers to a bit having the value 0. The term 1-bit refers to a bit having the value 1.

For a bit array with E elements, a fetch and set a first 0-bit request returns an index value within 0 to E-1. If no 0-bit exists in the bit array, the predefined failure value is a value that does not correspond to a valid index, for example E, the number of bits of the bit array.

The cache line may also be accessed by other bitwise AMOs. For example, a bitwise AMO such as storeAND() addresses and modifies an 8-byte word within a 128-byte cache line. A thread may issue a fetch and set a first 0-bit request to obtain the resource corresponding to the index of the first bit having value 0. After using the resource, the thread issues a storeAND() operation on the appropriate word in the cache line to set that bit back to the value 0, making the resource available again. Bit arrays can thus be used for the allocation of memory pages, inodes, disk sectors, etc.

According to one embodiment, the request comprises a fetch and set a last 0-bit request, wherein the bit array comprises E elements, and performing the request comprises performing the one or more actions: starting at index E-1, sequentially reading each bit of the bit array in decreasing index order; if a last bit of the bit array having value 0 is found, returning the index of that bit and setting its value to 1; otherwise returning a predefined failure value.
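The allocate/release cycle above, together with the fetch and set a last 0-bit variant, might look as follows when the 128-byte cache line is modelled as sixteen 64-bit words. The mapping of bit index i to bit i % 64 of word i // 64 (least significant bit first) and the helper names `store_and` and `release` are assumptions for illustration; a real controller may number bits differently.

```python
WORD_BITS = 64
WORDS_PER_LINE = 16                  # 128-byte cache line / 8-byte words
E = WORD_BITS * WORDS_PER_LINE       # 1024 bits in the line
WORD_MASK = (1 << WORD_BITS) - 1


def _locate(index):
    """Map a bit index to (word, bit); assumes bit 0 is the LSB of word 0."""
    return index // WORD_BITS, index % WORD_BITS


def fetch_and_set_last_0bit(words):
    """Scan from index E-1 downwards; set the last 0-bit to 1 and return
    its index, or return E as the failure value."""
    for index in range(E - 1, -1, -1):
        w, b = _locate(index)
        if not (words[w] >> b) & 1:
            words[w] |= 1 << b
            return index
    return E


def store_and(words, word_index, mask):
    """Model of a bitwise storeAND on one 8-byte word: word &= mask."""
    words[word_index] &= mask


def release(words, index):
    """Clear one bit via storeAND, making the resource available again."""
    w, b = _locate(index)
    store_and(words, w, WORD_MASK & ~(1 << b))
```

Note that release touches only one word of the line, which is why an ordinary word-sized storeAND suffices to free a resource obtained through a cache-line-wide request.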

According to one embodiment, the request comprises a count 1-bits request, and performing the request comprises performing the one or more actions: reading each bit of the bit array, counting the number of bits having the value 1, and returning the result of the count. Such a request is also known as a population count request.
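A sketch of the count 1-bits action over a cache line viewed as a list of 64-bit words (an assumed layout). `bin(w).count("1")` stands in for a per-word hardware population count; the request only reads the line and leaves it unchanged.

```python
def count_1bits(words):
    """Population count of a bit array stored as a list of 64-bit words."""
    return sum(bin(w).count("1") for w in words)
```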

According to one embodiment, the request comprises a fetch and set an isolated 0-bit request, and performing the request comprises performing the one or more actions: reading the bit array and determining zero or more contiguous sequences of bits having value 0; if no such sequence is found, returning a predefined failure value; otherwise ranking the one or more contiguous sequences by their number of bits, selecting a bit of the contiguous sequence having the lowest number of bits, returning the index of the selected bit, and setting the value of the selected bit to 1.

This may be advantageous as it may preserve the longest possible runs of 0s for use by other concurrent operations. This is particularly useful when the bit array represents contiguous instances of a resource such as memory pages or disk blocks, where preserving a run of 0s preserves contiguous available resources for a future resource acquisition. Such a future acquisition may use a fetch and set a number N of contiguous 0-bits request on the bit array.
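The ranking step can be sketched as follows. The sketch assumes that ties between equally short runs go to the earliest run and that the first bit of the chosen run is set; the description leaves both choices open. Function names are hypothetical and the failure value is assumed to be E.

```python
def zero_runs(bits):
    """Return (start, length) for each maximal run of 0-bits."""
    runs, start = [], None
    for i, b in enumerate(bits + [1]):   # sentinel 1 closes a trailing run
        if b == 0 and start is None:
            start = i
        elif b != 0 and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


def fetch_and_set_isolated_0bit(bits):
    """Set a bit from the shortest run of 0-bits; return its index, or E."""
    runs = zero_runs(bits)
    if not runs:
        return len(bits)
    # min() returns the first of several equally short runs.
    start, _length = min(runs, key=lambda run: run[1])
    bits[start] = 1
    return start
```

On 00010100 this picks the lone 0 at index 4 and leaves the run 000 at the start intact, whereas a fetch and set a first 0-bit request would have taken index 0 and shortened that run.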

According to one embodiment, the request comprises a fetch a first 1-bit request, and performing the request comprises performing the one or more actions: starting at index 0, sequentially reading each bit of the bit array; if a first bit having value 1 is found, returning the index of the first found bit; otherwise returning a predefined failure value. For example, a fetch a first 1-bit request may be used to identify the highest-priority element in a priority queue based on a bit array.

According to one embodiment, the request comprises a fetch and clear a first 1-bit request, and performing the request comprises performing the one or more actions: starting at index 0, sequentially reading each bit of the bit array; if a first bit having value 1 is found, returning the index of the first found bit and setting its value to 0; otherwise returning a predefined failure value. For example, a fetch and clear a first 1-bit request may be used to identify and remove the highest-priority element in a priority queue based on a bit array.

According to one embodiment, the request comprises a fetch and clear a last 1-bit request, wherein the bit array comprises E elements, and performing the request comprises performing the one or more actions: starting at index E-1, sequentially reading each bit of the bit array in decreasing index order; if a last bit of the bit array having value 1 is found, returning the index of that bit and setting its value to 0; otherwise returning a predefined failure value. For example, a fetch and clear a last 1-bit request may be used to identify and remove the lowest-priority element in a priority queue based on a bit array.
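The three priority-queue requests above can be sketched on a list of 0/1 values, where index 0 is the highest priority and a 1 marks a present element. Function names are hypothetical and the failure value is assumed to be E. In the described system each function would be one atomic request; here atomicity is simply assumed from sequential execution.

```python
def fetch_first_1bit(bits):
    """Peek: index of the highest-priority element present, or E."""
    for i, b in enumerate(bits):
        if b == 1:
            return i
    return len(bits)


def fetch_and_clear_first_1bit(bits):
    """Remove and return the highest-priority element, or E."""
    i = fetch_first_1bit(bits)
    if i < len(bits):
        bits[i] = 0
    return i


def fetch_and_clear_last_1bit(bits):
    """Remove and return the lowest-priority element, or E."""
    for i in range(len(bits) - 1, -1, -1):
        if bits[i] == 1:
            bits[i] = 0
            return i
    return len(bits)
```

For example, after inserting priorities 5, 2 and 7 into an 8-bit queue, a peek returns 2, removing the first returns 2, removing the last returns 7, and only priority 5 remains.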

According to one embodiment, the request comprises a fetch an Nth 1-bit request, the request providing the value for N, and performing the request comprises performing the one or more actions: starting at index 0, sequentially reading each bit of the bit array; if the Nth occurrence of a bit having value 1 is found, returning the index of the found Nth 1-bit; otherwise returning a predefined failure value. For example, if the request provides the value 1 for N, the request is equivalent to a fetch a first 1-bit request. In a scenario with 4 threads, each thread providing a different value of N from 1 to 4, the fetch an Nth 1-bit request allows each thread to identify one of the top 4 elements in a priority queue.

According to one embodiment, the request comprises a fetch and clear an Nth 1-bit request, the request providing the value for N, and performing the request comprises performing the one or more actions: starting at index 0, sequentially reading each bit of the bit array; if the Nth occurrence of a bit having value 1 is found, returning the index of the found Nth 1-bit and setting the value of the found bit to 0; otherwise returning a predefined failure value. For example, when 2 threads of different speeds process the elements of a priority queue, application performance aims may best be met if the fast and the slow thread provide the values 1 and 2 for N, respectively.
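A sketch of the Nth 1-bit requests, counting N from 1 as in the text (N = 1 behaves like a fetch a first 1-bit request). Names are hypothetical and the failure value is assumed to be E; any N below 1 also yields the failure value.

```python
def fetch_nth_1bit(bits, n):
    """Index of the n-th 1-bit (1-based) scanning from index 0, or E."""
    seen = 0
    for i, b in enumerate(bits):
        if b == 1:
            seen += 1
            if seen == n:
                return i
    return len(bits)


def fetch_and_clear_nth_1bit(bits, n):
    """As fetch_nth_1bit, but also clears the found bit."""
    i = fetch_nth_1bit(bits, n)
    if i < len(bits):
        bits[i] = 0
    return i
```

With the queue 01011001, four threads passing N = 1, 2, 3 and 4 would identify indices 1, 3, 4 and 7. If a fast thread then removes with N = 1 and a slow thread with N = 2, they take indices 1 and 4.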

According to one embodiment, the request comprises a fetch and clear an isolated 1-bit request, and performing the request comprises performing the one or more actions: reading the bit array and determining zero or more contiguous sequences of bits having value 1; if no such sequence is found, returning a predefined failure value; otherwise ranking the one or more contiguous sequences by their number of bits, selecting a bit of the contiguous sequence having the lowest number of bits, returning the index of the selected bit, and setting the value of the selected bit to 0.

According to one embodiment, the request comprises a fetch and set a number N of contiguous 0-bits request, and performing the request comprises performing the one or more actions: reading the bit array and determining zero or more contiguous sequences of N bits having value 0; if no such sequence is found, returning a predefined failure value; otherwise selecting a sequence of the one or more contiguous sequences that meets a predefined condition, returning the index of the first bit of the selected sequence, and setting the selected N bits to the value 1.

This may allow, for example, a thread to atomically acquire ownership of a sequence of N resources. A predefined failure value may be returned if N contiguous 0-bits are not present in the bit array. Such a sequence of resources may be used in the allocation of memory pages, inodes, disk sectors, etc.

According to one embodiment, the predefined condition is that the selected sequence is the first sequence of N contiguous 0-bits in the bit array. According to one embodiment, the bit array may be treated as a circular buffer, allowing a contiguous sequence of N bits to consist of a contiguous sequence at the end of the bit array followed by a contiguous sequence at the start of the bit array.
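A sketch of the fetch and set a number N of contiguous 0-bits request, using "first matching sequence" as the predefined condition and an optional circular mode in which a run may wrap from the end of the array to its start. The function name, the `circular` flag and the failure value E are assumptions; the brute-force scan favours clarity over speed.

```python
def fetch_and_set_n_contiguous_0bits(bits, n, circular=False):
    """Set the first run of n contiguous 0-bits to 1 and return its start
    index, or return E (len(bits)) if no such run exists."""
    e = len(bits)
    if n <= 0 or n > e:
        return e
    # Without wrap-around, a run must start early enough to fit in the array.
    last_start = e if circular else e - n + 1
    for start in range(last_start):
        idxs = [(start + k) % e for k in range(n)]
        if all(bits[i] == 0 for i in idxs):
            for i in idxs:
                bits[i] = 1
            return start
    return e
```

For the array 01111110, a request for 2 contiguous 0-bits fails in the plain mode but succeeds in circular mode, returning index 7 and claiming bits 7 and 0.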

According to one embodiment, the request comprises a fetch and set a number N of specified 0-bits request, and performing the request comprises performing the one or more actions: reading the bit array and locating the bits at the specified indices; if at least one of the specified bits has the value 1, returning a predefined failure value; if each of the specified bits has the value 0, setting each specified bit to the value 1 and returning a predefined success value. This may allow a thread to atomically obtain multiple locks.

For a fetch and set a number N of specified 0-bits request, one or more values in the request can efficiently specify the indices of the requested 0-bits. For example, for a bit array with up to 256 elements, the request can provide an 8-byte value where each of the 8 bytes specifies a bit index. To specify fewer than 8 bits, some of the bytes specify the same index; in other words, to specify S 0-bits, the 8 bytes specify S unique indices. Similarly, for a bit array with up to 1024 elements, the request can provide a 64-bit value whose least significant 60 bits are treated as 6 fields of 10 bits each, where each of the 6 fields specifies a bit index.
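The 8-byte encoding for a bit array of up to 256 elements, and the all-or-nothing update it drives, can be sketched as follows. The helper names, padding unused bytes with a repeat of the first index, and the use of True/False as the success and failure values are assumptions for illustration.

```python
def encode_byte_indices(indices):
    """Pack 1 to 8 bit indices (each 0..255) into one 64-bit request value."""
    idx = list(indices)
    if not 1 <= len(idx) <= 8:
        raise ValueError("an 8-byte value can name between 1 and 8 indices")
    idx += [idx[0]] * (8 - len(idx))      # repeat an index to fill unused bytes
    value = 0
    for k, i in enumerate(idx):
        value |= (i & 0xFF) << (8 * k)
    return value


def decode_byte_indices(value):
    """Recover the set of unique indices; repeated bytes collapse."""
    return {(value >> (8 * k)) & 0xFF for k in range(8)}


def fetch_and_set_specified_0bits(bits, value):
    """Set every specified bit to 1 only if all of them are currently 0."""
    indices = decode_byte_indices(value)
    if any(bits[i] for i in indices):
        return False                      # failure: at least one lock is held
    for i in indices:
        bits[i] = 1
    return True                           # success: all locks acquired
```

Because either every specified bit is set or none is, a thread never ends up holding a subset of the locks it asked for.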

The fetch and set a number N of specified 0-bits request addresses the problem of granular locking, where each process or thread must hold multiple locks from a shared set of locks, where the number of locks in the set is up to the number E of bits in the bit array and where each lock in the set may be numbered 0 to E-1. Without the invention, granular locking can create subtle lock dependencies. This subtlety can increase the chance that a programmer will unknowingly introduce a deadlock.

Thus, without the invention, locks are only composable (e.g., managing multiple concurrent locks in order to atomically delete Item X from Table A and insert X into Table B) with relatively elaborate (high-overhead) software support and perfect adherence by application programmers to rigorous conventions.

If the number of objects to be locked is greater than E, each object may be hashed to a number between 0 and E-1, allowing use of the bit array at the performance cost of a coarser lock granularity.
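A sketch of hashing objects onto a 1024-bit lock array and packing the resulting lock numbers into the 6 x 10-bit request format described earlier. Using CRC-32 as the hash (chosen because it is deterministic across runs, unlike Python's built-in string hash) and the helper names are assumptions. Two objects that hash to the same lock simply share it, which is the coarser granularity mentioned above.

```python
import zlib

E = 1024          # assumed bit array size (one 128-byte cache line)
FIELD_BITS = 10   # log2(1024) bits per index field
FIELDS = 6        # 6 x 10 = the 60 least significant bits of the request


def lock_index(obj_name: str) -> int:
    """Hash an object name to a lock number 0..E-1."""
    return zlib.crc32(obj_name.encode()) % E


def pack_10bit_indices(indices):
    """Pack 1 to 6 distinct lock numbers into one 64-bit request value."""
    idx = sorted(set(indices))
    if not 1 <= len(idx) <= FIELDS:
        raise ValueError("a request can name between 1 and 6 distinct locks")
    idx += [idx[0]] * (FIELDS - len(idx))   # repeat an index in unused fields
    value = 0
    for k, i in enumerate(idx):
        value |= i << (FIELD_BITS * k)
    return value


def unpack_10bit_indices(value):
    """Recover the set of lock numbers from the 6 fields."""
    return {(value >> (FIELD_BITS * k)) & (E - 1) for k in range(FIELDS)}
```

For the move of Item X from Table A to Table B, a thread could request the locks for `lock_index("TableA")` and `lock_index("TableB")` in a single fetch and set a number N of specified 0-bits request.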

For example, the bit array may be implemented in two or more contiguous cache lines of the memory system. A request for an operation on such a bit array may then be indicative of the two or more contiguous cache lines and of the location of the first of these cache lines in the memory system.

In another aspect, the invention relates to a computer program product comprising computer executable instructions to perform the method steps of the method of any one of the preceding embodiments.

In another aspect, the invention relates to a system for implementing a bit array in a cache line, the system comprising a memory storage and a controller, the system being configured to: configure the bit array in the cache line, the bit array comprising an array of bits, wherein the configuring further comprises defining a value of each bit in the bit array; receive, by the controller, a request for an operation on the bit array, wherein the request is indicative of a location of the cache line in the memory storage and of information specifying the request; identify for the operation one or more actions on the bit array using the information, wherein the one or more actions are encoded in the controller; and, in response to receiving the request, perform the request by executing the one or more encoded actions.

The memory system may be a level in a cache hierarchy, so that the controller performing a request may result in memory system accesses to lower level(s) in the cache hierarchy to establish the metadata and elements in the memory cache. A memory system may be split into two or more parts, where a controller in a part may operate a dynamic array data structure using memory in that part. A cache level may be replicated as two or more units, where the controller may access any part of the underlying cache or memory levels within a cache unit.

A 'computer-readable storage medium' as used herein encompasses any tangible storage medium which may store instructions which are executable by a processor of a computing device. The computer-readable storage medium may be referred to as a computer-readable non-transitory storage medium. The computer-readable storage medium may also be referred to as a tangible computer readable medium. In some embodiments, a computer-readable storage medium may also be able to store data which is able to be accessed by the processor of the computing device.

Examples of computer-readable storage media include, but are not limited to: a floppy disk, a magnetic hard disk drive, a solid state hard disk, flash memory, a USB thumb drive, Random Access Memory (RAM), Read Only Memory (ROM), an optical disk, a magneto-optical disk, and the register file of the processor.

Examples of optical disks include Compact Disks (CD) and Digital Versatile Disks (DVD), for example CD-ROM, CD-RW, CD-R, DVD-ROM, DVD-RW, or DVD-R disks. The term computer-readable storage medium also refers to various types of recording media capable of being accessed by the computer device via a network or communication link. For example, data may be retrieved over a modem, over the internet, or over a local area network.

Computer executable code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with computer executable code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

'Computer memory' or 'memory' is an example of a computer-readable storage medium. Computer memory is any memory which is directly accessible to a processor. 'Computer storage' or 'storage' is a further example of a computer-readable storage medium. Computer storage is any non-volatile computer-readable storage medium. In some embodiments computer storage may also be computer memory or vice versa.

A 'processor' as used herein encompasses an electronic component which is able to execute a program or machine executable instruction or computer executable code. References to the computing device comprising "a processor" should be interpreted as possibly containing more than one processor or processing core. The processor may for instance be a multi-core processor.

A processor may also refer to a collection of processors within a single computer system or distributed amongst multiple computer systems. The term computing device should also be interpreted to possibly refer to a collection or network of computing devices each comprising a processor or processors. The computer executable code may be executed by multiple processors that may be within the same computing device or which may even be distributed across multiple computing devices.

Computer executable code may comprise machine executable instructions or a program which causes a processor to perform an aspect of the present invention. Computer executable code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages and compiled into machine executable instructions. In some instances the computer executable code may be in the form of a high level language or in a pre-compiled form and be used in conjunction with an interpreter which generates the machine executable instructions on the fly.

The computer executable code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block or a portion of the blocks of the flowchart illustrations and/or block diagrams can be implemented by computer program instructions in the form of computer executable code when applicable. It is further understood that, when not mutually exclusive, combinations of blocks in different flowcharts, illustrations and/or block diagrams may be combined. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as an apparatus, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer executable code embodied thereon.

It is understood that one or more of the aforementioned embodiments may be combined as long as the combined embodiments are not mutually exclusive.

Brief description of the drawings

In the following, preferred embodiments of the invention will be described in greater detail, by way of example only, with reference to the drawings, in which:
Fig. 1 illustrates a system architecture operable to execute a method for implementing a bit array in a cache line in a memory;
Fig. 2 illustrates an exemplary block diagram of a memory system;
Fig. 3 shows a diagram illustrating a sequence of operations on a bit array; and
Fig. 4 is a flowchart of a method for implementing a bit array in a cache line in a memory.


Detailed description

In the following, like numbered elements in the figures either designate similar elements or designate elements that perform an equivalent function. Elements which have been discussed previously will not necessarily be discussed in later figures if the function is equivalent.

Fig. 1 shows a computer system (or server) 112 of computing system 100 in the form of a general-purpose computing device. The components of computer system 112 may include, but are not limited to, one or more processors or processing units 116, a memory system 128, and a bus 118 that couples various system components, including memory system 128, to processor 116.

Computer system 112 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system 112, and it includes both volatile and non-volatile media, removable and non-removable media.

Memory system 128 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) and/or cache memory. The memory system may include one or more active buffered memory devices. The active buffered devices may include a plurality of memory elements (e.g., chips). The active buffered memory device may include layers of memory that form a three dimensional ("3D") memory device where individual columns of chips form vaults in communication with the processing units 116. The active buffered memory device may comprise partitions that may be concurrently accessed by a plurality of processing elements, where the partitions may be any suitable memory segment, including but not limited to vaults.

The processing units 116 may issue requests to the memory system, utilizing the dynamic array data structure and associated metadata to implement an application.

Computer system 112 may also communicate with one or more external devices 114 such as a keyboard, a pointing device, a display 124, etc.; one or more devices that enable a user to interact with computer system 112; and/or any devices (e.g., network card, modem, etc.) that enable computer system 112 to communicate with one or more other computing devices. Such communication can occur via I/O interface(s) 122. Furthermore, computer system 112 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 120. As depicted, network adapter 120 communicates with the other components of computer system/server 112 via bus 118.

Fig. 2 shows an exemplary block diagram of the memory system 128 in detail. The memory system 128 comprises a controller 206 and memory storage 208. The memory storage 208 may be, for example, any suitable physical memory, such as a cache or random access memory (RAM). The controller 206 includes a receiver 214 of a request 210 and a transmitter 216 of a reply 212, configured to communicate with the bus 118, where the receiver 214 and transmitter 216 each include first-in-first-out buffers. In response to a request 210, the controller 206 performs read and write accesses to the storage 208 and may return a reply 212.

The memory storage 208 comprises one or more cache lines 211. A cache line 211 may be accessed as a bit array, which may contain bit values 0 or 1. A bit array may be implemented in the cache line 211 by, for example, configuring at least part of the cache line so that said part can be accessed using bitwise operations.

The described structure of the memory system may provide a single interface for a plurality of users or threads to utilize the bit array, for example with a multi-threaded application where multiple threads concurrently issue requests, such as request 210, to access the cache line.

The request 210 for an operation on the bit array 211 is performed by the controller 206 by executing actions that correspond to said operation request. These actions are encoded in the controller.

The request 210 from a requester or user is received from the bus 118 by the receiver 214 of the controller 206. The requester may be a thread executing an application e.g. a thread executing an AMO. The request 210 is indicative of a location of the cache line in the memory.
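The request path of Fig. 2 can be modelled in software as below: a receiver FIFO, a fixed table of encoded actions, and a transmitter FIFO, with requests served strictly one at a time. This is a toy sketch under stated assumptions, not the hardware design: the class name, the opcode strings, the dictionary standing in for memory storage 208 and the 1024-bit line size are all hypothetical.

```python
from collections import deque

LINE_BITS = 1024  # assumption: a 128-byte cache line viewed as 1024 bits


def set_first_0bit(bits, _arg):
    """Encoded action: fetch and set a first 0-bit (failure value E)."""
    for i, b in enumerate(bits):
        if b == 0:
            bits[i] = 1
            return i
    return len(bits)


def count_1bits(bits, _arg):
    """Encoded action: population count."""
    return sum(bits)


def clear_all(bits, _arg):
    """Encoded action: initialize every bit to 0."""
    bits[:] = [0] * len(bits)
    return 0


class BitArrayController:
    """Toy model of controller 206 with FIFO receiver and transmitter."""

    # Hypothetical opcodes mapped to the actions encoded in the controller.
    ACTIONS = {
        "fetch_and_set_first_0bit": set_first_0bit,
        "count_1bits": count_1bits,
        "clear_all": clear_all,
    }

    def __init__(self):
        self.storage = {}           # cache-line address -> list of bits
        self.receiver = deque()     # FIFO of pending (address, opcode, arg)
        self.transmitter = deque()  # FIFO of replies, in request order

    def receive(self, address, opcode, arg=None):
        """A request carries only the cache-line address and the operation."""
        self.receiver.append((address, opcode, arg))

    def run(self):
        """Serve requests one at a time so that none interferes with another."""
        while self.receiver:
            address, opcode, arg = self.receiver.popleft()
            line = self.storage.setdefault(address, [0] * LINE_BITS)
            self.transmitter.append(self.ACTIONS[opcode](line, arg))
```

Several threads enqueueing fetch and set a first 0-bit requests for the same address each receive a distinct index, because `run` drains the FIFO strictly in order.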

The request 210 may be any suitable request for an operation on the bit array 318, such as a request to set all elements of the bit array to 0 or to set all elements to 1, which may be used to initialize the bit array. The request may comprise a request to return the number of bits set to 1, the first 1-bit, the middle 1-bit, or the first transition bit (i.e. where leading 0s change to 1s or vice versa). The middle 1-bit refers to the bit with an equal number of 1-bits earlier and later in the array. The request may further comprise a request to return and clear the entire first blip, i.e. the first contiguous run of 1-bits. For example, for the array 00001101, position 4 and length 2 are returned and the resulting array is 00000001.
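A sketch of three of these operations on a list of 0/1 values. Returning E when no middle 1-bit exists (an even number of 1-bits, including none), returning (E, 0) for an array without a blip, and the function names are all assumptions; the description does not define these cases.

```python
def fetch_and_clear_first_blip(bits):
    """Return (position, length) of the first run of 1-bits and clear it."""
    e = len(bits)
    start = next((i for i, b in enumerate(bits) if b == 1), e)
    end = start
    while end < e and bits[end] == 1:
        bits[end] = 0
        end += 1
    return start, end - start


def fetch_middle_1bit(bits):
    """Index of the 1-bit with equal numbers of 1-bits before and after it."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    if len(ones) % 2 == 0:
        return len(bits)          # no such bit exists for an even count
    return ones[len(ones) // 2]


def fetch_first_transition_bit(bits):
    """Index of the first bit that differs from the leading bit value."""
    return next((i for i in range(1, len(bits)) if bits[i] != bits[0]),
                len(bits))
```

On the example array 00001101 the middle 1-bit is at index 5, the first transition is at index 4, and clearing the first blip returns (4, 2) and leaves 00000001, matching the example in the text.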

The controller 206 may access a bit or a group of bits of the bit array 215 in communication 218, where the access depends on the operation in request 210. For example, if the request 210 is a request to fetch the first 0-bit, the controller 206 transmits the index of the first bit read in communication 218 that has the value 0 to the user in the reply 212.

Fig. 3 shows a diagram illustrating a sequence of operations on a bit array (e.g. one implemented in cache line 211). The bit array 318.1 contains a fixed number of bits; to simplify the description, the bit array 318.1 is shown as containing only the first 24 bits of the cache line.

The bit positions are labeled from left to right, as shown in line 333.

The controller (e.g. 206) of the memory system may receive the operations as concurrent requests and may process them in sequence, one at a time, as follows. The order in which the operations are processed may be arbitrary.
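Serving concurrent requests one at a time is what lets each bit act as a lock that only one thread can obtain. The Python sketch below shows this under a clearly stated substitution: a `threading.Lock` stands in for the controller's one-at-a-time service of its receiver FIFO, and the class and method names are hypothetical. Eight threads concurrently request the first 0-bit. Because each read-modify-write happens as one indivisible step, every thread receives a different index.

```python
import threading


class SerializingController:
    """Hypothetical model: concurrent requests are served one at a time."""

    def __init__(self, nbits: int):
        self.bits = [0] * nbits
        self._busy = threading.Lock()   # stands in for one-at-a-time request service

    def fetch_and_set_first_0_bit(self) -> int:
        with self._busy:                # the whole read-modify-write is one step
            for i, b in enumerate(self.bits):
                if b == 0:
                    self.bits[i] = 1
                    return i
            return -1                   # assumed failure value: no 0-bit left


ctrl = SerializingController(24)
results = []
results_lock = threading.Lock()


def worker() -> None:
    idx = ctrl.fetch_and_set_first_0_bit()   # the thread now owns bit `idx`
    with results_lock:
        results.append(idx)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Which thread receives which index depends on the arbitrary service order, but the set of indices handed out is always the same and no index is handed out twice.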

Operation 301 of the sequence corresponds to fetchAndsetFirst0Bit(cacheLineAddress), which returns the index of the first bit in the bit array 318.1 with value 0 and then sets that bit to 1. To do so, the controller 206 may read the bit array sequentially, starting from the bit with index 00, until it finds the first bit with value 0 (here, index 00). The controller may then return the index 00 and set the value of bit 00 to 1, which results in the bit array 318.2.

Operation 303 of the sequence also corresponds to fetchAndsetFirst0Bit(cacheLineAddress), which returns the index of the first bit in the bit array 318.2 with value 0 and then sets that bit to 1. The controller 206 may read the bit array sequentially, starting from the bit with index 00, until it finds the first bit with value 0 (here, index 03). The controller may then return the index 03 and set the value of bit 03 to 1, which results in the bit array 318.3.
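Operations 301 and 303 can be reproduced with a short Python sketch of fetchAndsetFirst0Bit acting on a list of 0/1 values instead of a cache line. The text does not give the contents of bit array 318.1, so the starting array below is hypothetical and chosen only to be consistent with the returned indices (bit 00 is 0, bits 01 and 02 are 1, bit 03 is 0). The return value -1 for a full array is also an assumption.

```python
def fetch_and_set_first_0_bit(bits: list[int]) -> int:
    """Sketch of fetchAndsetFirst0Bit on an in-memory list of 0/1 values.

    Scans from index 0, returns the first index holding 0 and sets that bit to 1.
    Returns -1 if every bit is already 1 (assumed failure value).
    """
    for i, b in enumerate(bits):
        if b == 0:
            bits[i] = 1
            return i
    return -1


# Hypothetical first 8 bits, chosen only to match the indices in the text.
bits = [0, 1, 1, 0, 0, 1, 0, 0]
print(fetch_and_set_first_0_bit(bits))  # 0  (operation 301)
print(fetch_and_set_first_0_bit(bits))  # 3  (operation 303)
print(bits[:4])                         # [1, 1, 1, 1]
```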

Operation 305 of the sequence corresponds to fetchAndsetFirstNcontiguous0Bits(cacheLineAddress, 2), which returns the index of the first bit of the first run of 2 contiguous 0-bits in the bit array 318.3 and then sets those 2 bits to 1.

To do so, the controller 206 may read the bit array 318.3 sequentially, starting from the bit with index 00, until it finds the first 2 contiguous bits that both have the value 0 (here, indices 06 and 07). The controller may then return the index 06 and set the values of bits 06 and 07 to 1, which results in the bit array 318.4.

Operation 307 of the sequence corresponds to fetchAndsetFirstNcontiguous0Bits(cacheLineAddress, 4), which returns the index of the first bit of the first run of 4 contiguous 0-bits in the bit array 318.4 and then sets those 4 bits to 1.

To do so, the controller 206 may read the bit array 318.4 sequentially, starting from the bit with index 00, until it finds the first 4 contiguous bits that all have the value 0 (here, indices 11 to 14). The controller may then return the index 11 and set the values of bits 11 to 14 to 1, which results in the bit array 318.5.
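A single left-to-right scan that tracks the length of the current run of 0-bits is enough to find the first N contiguous 0-bits. The Python sketch below illustrates fetchAndsetFirstNcontiguous0Bits in this way. It assumes a return value of -1, with the array left unchanged, when no such run exists. The 24-bit starting array is hypothetical: it is not given in the source and was chosen only so that operations 305 and 307 return 06 and 11 as described.

```python
def fetch_and_set_first_n_contiguous_0_bits(bits: list[int], n: int) -> int:
    """Find the first run of n contiguous 0-bits, set them to 1, return its start.

    Returns -1 (assumed failure value) if no such run exists; bits are then unchanged.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    run = 0                                # length of the current run of 0-bits
    for i, b in enumerate(bits):
        run = run + 1 if b == 0 else 0
        if run == n:
            start = i - n + 1
            bits[start:i + 1] = [1] * n    # claim the whole run at once
            return start
    return -1


# Hypothetical state after operation 303 (24 bits, indices 00..23).
bits = [1, 1, 1, 1, 0, 1, 0, 0,
        0, 0, 1, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0]
print(fetch_and_set_first_n_contiguous_0_bits(bits, 2))  # 6   (operation 305)
print(fetch_and_set_first_n_contiguous_0_bits(bits, 4))  # 11  (operation 307)
```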

Operation 309 of the sequence corresponds to fetchAndsetNgiven0Bits(cacheLineAddress, 3, 0x040911). The 2nd argument, 3, indicates that 3 bits are to be changed from 0 to 1. The 3rd argument, 0x040911, indicates that the indices of the 3 desired bits are 0x04, 0x09 and 0x11 (= 17). The controller may then read the value of each of these 3 bits from the bit array 318.5 and determine whether each bit has the value 0. Since all three bits have the value 0, the controller may return 1 to indicate success and set the value of each of the 3 bits to 1, which results in the bit array 318.6.

Operation 311 of the sequence corresponds to fetchAndsetNgiven0Bits(cacheLineAddress, 2, 0x0108). The 2nd argument, 2, indicates that 2 bits are to be changed from 0 to 1, and the 3rd argument, 0x0108, indicates that the indices of the 2 desired bits are 0x01 and 0x08. The controller may then read the value of each of these 2 bits from the bit array 318.6 and determine whether each bit has the value 0. In this case, bit 0x01 already has the value 1 (operation 303 found bits 01 and 02 set), so the controller may return a predefined failure value 0 to indicate that the request failed, without changing the content of the bit array 318.6.
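Operations 309 and 311 behave as an all-or-nothing request: the given bits are set only if every one of them is currently 0. The Python sketch below illustrates this. It assumes, based only on the two examples in the text, that each index occupies one byte of the 3rd argument with the first index in the most significant byte. With that assumption, a single byte could not address every bit of a 1024-bit line, so a real encoding may differ. The starting array is hypothetical, with only bit 0x01 set so that the failure case can be reproduced, and the function names are illustrative.

```python
def decode_indices(n: int, packed: int) -> list[int]:
    """Unpack n one-byte bit indices, most significant byte first (assumed encoding)."""
    return [(packed >> (8 * (n - 1 - k))) & 0xFF for k in range(n)]


def fetch_and_set_n_given_0_bits(bits: list[int], n: int, packed: int) -> int:
    """Set all n given bits to 1 only if each of them is currently 0.

    Returns 1 on success and 0 on failure; on failure the array is unchanged.
    """
    indices = decode_indices(n, packed)
    if any(bits[i] != 0 for i in indices):   # check every bit before writing any
        return 0
    for i in indices:
        bits[i] = 1
    return 1


bits = [0] * 24
bits[0x01] = 1                                           # hypothetical: bit 0x01 already set
print(decode_indices(3, 0x040911))                       # [4, 9, 17]
print(fetch_and_set_n_given_0_bits(bits, 3, 0x040911))   # 1  (operation 309)
before = list(bits)
print(fetch_and_set_n_given_0_bits(bits, 2, 0x0108))     # 0  (operation 311)
print(bits == before)                                    # True
```

Checking all bits before writing any of them is what keeps the array unchanged when the request fails.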

Fig. 4 is a flow chart of an exemplary method and system for operating a memory system, such as the memory system shown in Fig. 1, to implement a bit array in a cache line, e.g. in a single cache line of the memory system.

In step 401, the cache line is configured in the memory such that at least part of the cache line may be accessed as a bit array. For example, the cache line may be accessed as a bit array with 1024 elements.

In step 403, the controller receives a request for an operation on the bit array from a user or requester. The request 210 provides the address of the cache line for the operation. The requester issuing the request may be any suitable user, such as a thread running on a processor, a processing element included in a buffered memory stack, or a thread communicating over a network via network interface logic.

In step 405, the controller identifies, for the operation, one or more actions on the bit array using the information specifying the request. The one or more actions are encoded in the controller. Since multiple actions may be encoded in the controller, identifying the one or more actions may comprise selecting them from these multiple encoded actions.

In step 407, the controller executes the one or more actions to perform the request. For example, the request may be a fetch-and-set-first-0-bit request, which returns the index of the first bit in the bit array with value 0 and then sets that bit to 1.

As shown in the diagram, after step 407 the controller waits to serve the next request 210 and returns to step 403.

The configuration step 401 may be performed once, during initial configuration, while steps 403 to 407 are repeated for each request for an operation that the controller 206 serves.

List of Reference Numerals

computing system
112 server
114 external devices
116 processor
network adapter
122 I/O interface
124 display
128 memory system
206 controller
208 memory storage
210 request
211 cache line
212 reply
213 metadata field
214 receiver
215 elements field
216 transmitter
218-220 communication
301-311 operations
311 cache line
333 indices
401-407 steps