CN113508363A - Arithmetic and logical operations in a multi-user network - Google Patents


Info

Publication number
CN113508363A
Authority
CN
China
Prior art keywords
bit
hypothesized
circuitry
bit string
hypothetical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202080016545.5A
Other languages
Chinese (zh)
Other versions
CN113508363B
Inventor
V. S. Ramesh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Micron Technology Inc
Original Assignee
Micron Technology Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US16/287,156 (US11074100B2)
Priority claimed from US16/286,941 (US10990387B2)
Application filed by Micron Technology Inc
Publication of CN113508363A
Application granted
Publication of CN113508363B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F15/17331 Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
    • G06F7/544 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations
    • G06F7/575 Basic arithmetic logic units, i.e. devices selectable to perform either addition, subtraction or one of several logical operations, using, at least partially, the same circuitry
    • G06F7/607 Methods or arrangements for performing computations using a digital non-denominational number representation, i.e. number representation without radix; Computing devices using combinations of denominational and non-denominational quantity representations, e.g. using difunction pulse trains, STEELE computers, phase computers number-of-ones counters, i.e. devices for counting the number of input lines set to ONE among a plurality of input lines, also called bit counters or parallel counters
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G06F9/541 Interprogram communication via adapters, e.g. between incompatible applications
    • G06F2009/45583 Memory management, e.g. access or allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Advance Control (AREA)

Abstract

Systems, devices, and methods related to arithmetic and logical operations in a multi-user network are described. The circuitry may be part of a shared pool of computing resources in a multi-user network. Data (e.g., one or more bit strings) received by the circuitry may be selectively operated on. The circuitry may operate on the data to convert it between one or more formats, such as floating point and/or universal number (unum) formats (e.g., a posit format), and may further perform arithmetic and/or logical operations on the converted data. For example, the circuitry may be configured to receive a request to perform an arithmetic operation and/or a logical operation using at least one posit bit string operand. The request may include a parameter corresponding to performing the operation. The circuitry may perform the arithmetic operation and/or the logical operation based at least in part on the parameter.

Description

Arithmetic and logical operations in a multi-user network
Technical Field
The present disclosure relates generally to semiconductor memories and methods, and more particularly, to apparatus, systems, and methods related to arithmetic and logical operations in multi-user networks.
Background
Memory devices are typically provided as internal semiconductor integrated circuits in computers or other electronic systems. There are many different types of memory, including volatile and non-volatile memory. Volatile memory may require power to maintain its data (e.g., host data, error data, etc.) and includes Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Synchronous Dynamic Random Access Memory (SDRAM), and Thyristor Random Access Memory (TRAM), among others. Non-volatile memory may provide persistent data by retaining stored data when not powered, and may include NAND flash memory, NOR flash memory, and resistance variable memory, such as Phase Change Random Access Memory (PCRAM), Resistive Random Access Memory (RRAM), and Magnetoresistive Random Access Memory (MRAM), such as spin torque transfer random access memory (STT RAM), among others.
The memory device may be coupled to a host (e.g., a host computing device) to store data, commands, and/or instructions for use by the host in operating the computer or electronic system. For example, data, commands, and/or instructions may be transferred between a host and a memory device during operation of a computing or other electronic system.
The host and/or memory devices may operate in a multi-user network (e.g., a software-defined data center) in which Virtual Machines (VMs), virtual workloads, data compute nodes, clusters, containers, etc., are deployed. A VM is a software implementation of a computer that executes application software in a manner similar to a physical computer. VMs have the advantage of not being tied to physical resources, which allows VMs to be moved around and scaled to meet the changing needs of an enterprise without affecting the use of the enterprise's applications. A VM may be deployed on a hypervisor provisioned with a pool of computing resources (e.g., processing resources, memory devices that may include memory resources, etc.).
Drawings
Fig. 1 is a functional block diagram in the form of a computing system including an apparatus including a host and acceleration circuitry, according to several embodiments of the present disclosure.
Fig. 2A is a functional block diagram in the form of a computing system including an apparatus including a host and a memory device, according to several embodiments of the present disclosure.
Fig. 2B is a functional block diagram in the form of a computing system deployed in a multi-user network including hosts, memory devices, application specific integrated circuits, field programmable gate arrays, and virtual compute clusters, according to several embodiments of the present disclosure.
Fig. 3 is an example of an n-bit posit with es exponent bits.
Fig. 4A is an example of positive values for a 3-bit posit.
Fig. 4B is an example of posit construction using two exponent bits.
Fig. 5 is a functional block diagram in the form of acceleration circuitry according to several embodiments of the present disclosure.
Fig. 6 is a diagram of a host, hypervisor, multiple virtual compute instances, and agents, according to several embodiments of the present disclosure.
Fig. 7A is a diagram of a virtual compute cluster, according to several embodiments of the present disclosure.
Fig. 7B is another diagram of a virtual compute cluster, according to several embodiments of the present disclosure.
Fig. 8 is a diagram of an apparatus according to several embodiments of the present disclosure.
Fig. 9 is a diagram of a machine according to several embodiments of the present disclosure.
Fig. 10 is a flow diagram representing an example method involving arithmetic and logical operations in a multi-user network in accordance with several embodiments of the present disclosure.
Detailed Description
Systems, devices, and methods related to arithmetic and logical operations in a multi-user network are described. The circuitry may be part of a shared pool of computing resources in a multi-user network. Data (e.g., one or more bit strings) received by the circuitry may be selectively operated on. The circuitry may operate on the data to convert it between one or more formats, such as floating point and/or universal number (unum) formats (e.g., a posit format), and may further perform arithmetic and/or logical operations on the converted data. For example, the circuitry may be configured to receive a request to perform an arithmetic operation and/or a logical operation using at least one posit bit string operand. The request may include parameters corresponding to performing the operation. The circuitry may perform the arithmetic operations and/or logical operations based at least in part on the parameters.
Computing systems may perform a wide range of operations, including various calculations that may require differing degrees of accuracy. However, computing systems and/or multi-user networks have a limited amount of resources with which to perform such operations. For example, the memory resources in which the operands used in computations are stored, and/or the processing resources used to perform such computations, may be limited in a computing system or multi-user network. To facilitate operations using operands stored by a computing system or multi-user network within the constraints imposed by limited resources, in some approaches the operands are stored in a particular format. One such format is referred to, for simplicity, as a "floating point" format or a "floating point number" (e.g., the IEEE 754 floating point format).
According to the floating point standard, a bit string (e.g., data, a string of bits that can represent a number, etc.), such as a string of binary digits, is represented in terms of three sets of integers or bits: a sign bit, a set of bits called an "exponent," and a set of bits called a "mantissa" (or significand), with the base (radix) implied by the format rather than stored. The set of integers or bits defining the format in which a binary string is stored may be referred to herein as a "format." For example, the three sets of bits described above (e.g., sign, exponent, and mantissa) that define a floating point bit string may be referred to as a format (e.g., a first format). As described in more detail below, a posit bit string may include four sets of integers or bits (e.g., sign, regime, exponent, and mantissa), which may also be referred to as a "format" (e.g., a second format). Furthermore, according to the floating point standard, two infinities (e.g., +∞ and -∞) and/or two kinds of "not a number" (NaN) values (quiet NaN and signaling NaN) may be included in a bit string.
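As a rough illustration of the three fields described above, the following Python sketch assumes the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits with a bias of 127, and 23 mantissa bits), extracts the fields from a value's bit pattern, and reassembles normal values. The function names are illustrative only; note that the base 2 never appears in the bit string and is supplied by the format itself.

```python
import struct

def float32_fields(x: float) -> tuple:
    """Split the IEEE 754 binary32 encoding of x into (sign, exponent, mantissa)."""
    # Pack x as big-endian binary32, then reinterpret the 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31               # 1 bit
    exponent = (bits >> 23) & 0xFF  # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF      # 23 bits; normal numbers have an implicit leading 1
    return sign, exponent, mantissa

def float32_value(sign: int, exponent: int, mantissa: int) -> float:
    """Reassemble a finite, normal binary32 value; the base 2 is implied, never stored."""
    return (-1) ** sign * (1 + mantissa / 2**23) * 2.0 ** (exponent - 127)

s, e, m = float32_fields(-6.5)
print(s, e, m)                 # 1 129 5242880
print(float32_value(s, e, m))  # -6.5
```

Here -6.5 is -1.625 × 2², so the stored exponent is 2 + 127 = 129 and the mantissa field holds 0.625 × 2²³.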
Floating point standards have been used in computing systems for a number of years and define arithmetic formats, interchange formats, rounding rules, operations, and exception handling for computation carried out by many computing systems. Arithmetic formats may include binary and/or decimal floating point data, which may include finite numbers, infinities, and/or special NaN values. Interchange formats may include encodings (e.g., bit strings) that may be used to exchange floating point data. Rounding rules may include a set of properties that may be satisfied when rounding numbers during arithmetic operations and/or conversion operations. Floating point operations may include arithmetic operations and/or other computational operations, such as trigonometric functions. Exception handling may include indications of exceptional conditions, such as division by zero, overflow, and the like.
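The special values and the interchange-encoding and rounding behavior mentioned above can be made concrete for IEEE 754 binary32 with a small sketch. It assumes the IEEE 754-2008 recommendation that a set leading mantissa bit marks a quiet NaN, and it relies on Python's `struct` module, whose `'f'` format packs a value into the binary32 interchange encoding, to perform the rounding. The helper names are hypothetical.

```python
import struct

def classify_float32(bits: int) -> str:
    """Classify a 32-bit pattern under the IEEE 754 binary32 encoding."""
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    if exponent == 0xFF:  # all-ones exponent is reserved for infinities and NaNs
        if mantissa == 0:
            return "-inf" if bits >> 31 else "+inf"
        # IEEE 754-2008 recommends: leading mantissa bit set means quiet NaN.
        return "quiet NaN" if mantissa & 0x400000 else "signaling NaN"
    if exponent == 0:
        return "zero" if mantissa == 0 else "subnormal"
    return "normal"

def round_to_float32(x: float) -> float:
    """Round a binary64 Python float to the nearest binary32 value."""
    # Encoding into the binary32 interchange format performs the rounding
    # (round to nearest under the default rounding mode).
    return struct.unpack(">f", struct.pack(">f", x))[0]

print(classify_float32(0x7FC00000))  # quiet NaN
print(round_to_float32(0.1))         # 0.10000000149011612
```

`struct.pack` raises `OverflowError` for finite values too large for binary32, so a fuller sketch would map those to infinity explicitly as an overflow exception would.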
An alternative to the floating point format is known as the "universal number" (unum) format. There are several forms of unum formats: type I unums, type II unums, and type III unums, which may be referred to as "posits" and/or "valids." Type I unums are a superset of the IEEE 754 standard floating point format that use a "ubit" at the end of the fraction to indicate whether a real number is an exact floating point number or whether it lies in the interval between adjacent floating point numbers. The sign, exponent, and fraction bits in a type I unum take their definitions from the IEEE 754 floating point format; however, the lengths of the exponent and fraction fields of a type I unum may vary dramatically, from a single bit to a maximum user-definable length. Because they take their sign, exponent, and fraction bits from the IEEE 754 standard floating point format, type I unums may behave similarly to floating point numbers; however, the variable bit lengths of the exponent and fraction fields of type I unums may require additional management compared to floating point numbers.
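A minimal sketch of how a ubit changes the meaning of a value, under loudly simplified assumptions: the underlying fields are fixed at IEEE 754 binary64 widths (a real type I unum varies them), and a set ubit is taken here to mean the open interval reaching one unit in the last place farther from zero. `type1_unum_interval` is a hypothetical helper; `math.nextafter` requires Python 3.9 or later.

```python
import math

def type1_unum_interval(value: float, ubit: int) -> tuple:
    """Interpret (value, ubit) as a simplified type I unum; returns (low, high, exact).

    Assumptions: fields are fixed at binary64 widths, and a set ubit denotes the
    open interval between value and the adjacent float farther from zero.
    """
    if ubit == 0:
        return value, value, True  # exact floating point number
    neighbor = math.nextafter(value, math.copysign(math.inf, value))
    low, high = sorted((value, neighbor))
    return low, high, False  # the real number lies strictly between low and high

print(type1_unum_interval(1.0, 0))  # (1.0, 1.0, True)
print(type1_unum_interval(1.0, 1))  # (1.0, 1.0000000000000002, False)
```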
Type II unums are generally incompatible with floating point numbers, which permits a clean mathematical design based on projected real numbers. A type II unum may include n bits and may be described in terms of a "u-lattice" in which each quadrant of a circular projection is populated with an ordered set of $2^{n-3}-1$ real numbers. The values of a type II unum may be reflected about an axis bisecting the circular projection such that positive values lie in the upper right quadrant of the circular projection and their negative counterparts lie in the upper left quadrant. The lower half of the circular projection representing a type II unum may contain the reciprocals of the values located in the upper half. Type II unums generally rely on a look-up table for most operations; as a result, in some cases the size of the look-up table may limit the efficacy of type II unums. However, type II unums may provide improved computational functionality compared to floating point numbers under some conditions.
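The $2^{n-3}-1$ count follows from the ring structure: an n-bit type II unum has $2^n$ points on the circle, alternating between exact values and the open intervals between them, so $2^{n-1}$ points are exact. Removing the four cardinal points (0, 1, -1, and a single unsigned infinity) leaves $2^{n-1}-4$ exact values spread evenly over four quadrants, or $2^{n-3}-1$ per quadrant. The sketch below builds all exact points from an upper-right-quadrant lattice by negation and reciprocation; the function name and the lattice values are arbitrary illustrations, not a published u-lattice.

```python
from fractions import Fraction

def type2_exact_values(lattice: list, n: int) -> list:
    """Build the exact points of an n-bit type II unum ring from a u-lattice.

    The lattice supplies the 2**(n-3) - 1 exact values strictly between 1 and
    infinity (upper-right quadrant). Negation fills the upper-left quadrant and
    reciprocals fill the lower half.
    """
    if len(lattice) != 2 ** (n - 3) - 1:
        raise ValueError("lattice must hold 2**(n-3) - 1 values")
    if any(v <= 1 for v in lattice) or sorted(lattice) != list(lattice):
        raise ValueError("lattice values must be sorted and greater than 1")
    upper_right = list(lattice)
    upper_left = [-v for v in upper_right]
    lower_right = [1 / v for v in upper_right]
    lower_left = [-1 / v for v in upper_right]
    finite = [Fraction(0), Fraction(1), Fraction(-1)]
    finite += upper_right + upper_left + lower_right + lower_left
    # Projective reals: +inf and -inf are the same point, so it appears once.
    return sorted(finite) + ["inf"]

n = 5
lattice = [Fraction(2), Fraction(3), Fraction(4)]  # illustrative values only
values = type2_exact_values(lattice, n)
print(len(values))  # 16
```

Using `Fraction` keeps the reciprocals exact, which mirrors the way a type II lookup table stores exact ring points rather than rounded approximations.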
The type III unum format is referred to herein as a "posit format" or, for simplicity, a "posit." In contrast to floating point bit strings, posits may, under some conditions, allow for a wider dynamic range and higher accuracy (e.g., precision) than floating point numbers with the same bit width. This may allow operations performed by a computing system or multi-user network to be performed at a higher rate (e.g., faster) when using posits than when using floating point numbers, which in turn may improve the performance of the computing system or multi-user network by, for example, reducing the number of clock cycles used in performing operations, thereby reducing the processing time and/or power consumed in performing such operations. In addition, the use of posits in a computing system or multi-user network may allow for higher accuracy and/or precision than floating point numbers, which may further improve the functioning of the computing system or multi-user network in comparison to some approaches (e.g., approaches that rely on floating point bit strings).
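The dynamic-range claim can be checked with a short calculation. For an n-bit posit with es exponent bits, the regime scale factor is $\text{useed} = 2^{2^{es}}$, the largest positive value is $\text{maxpos} = \text{useed}^{\,n-2}$, and the smallest positive value is $\text{minpos} = 1/\text{maxpos}$. The sketch compares a 16-bit posit with es = 1 (a parameter choice assumed here for illustration) against IEEE 754 binary16 including subnormals. It measures range only; posit precision is highest near 1 and tapers toward minpos and maxpos.

```python
import math

def posit_range(n: int, es: int) -> tuple:
    """Return (minpos, maxpos) for an n-bit posit with es exponent bits."""
    # useed = 2**(2**es), so maxpos = useed**(n-2) = 2**((2**es) * (n-2)).
    scale = (2 ** es) * (n - 2)
    return math.ldexp(1.0, -scale), math.ldexp(1.0, scale)

def decades(lo: float, hi: float) -> float:
    """Dynamic range expressed in decimal orders of magnitude."""
    return math.log10(hi / lo)

# IEEE 754 binary16: smallest positive subnormal is 2**-24, largest finite is 65504.
fp16_decades = decades(2.0 ** -24, 65504.0)
posit16_decades = decades(*posit_range(16, 1))
print(round(fp16_decades, 1), round(posit16_decades, 1))  # 12.0 16.9
```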
Embodiments herein are directed to hardware circuitry (e.g., logic circuitry, arithmetic logic units, field programmable gate arrays, application specific integrated circuits, etc.) configured to perform various operations using bit strings to improve the overall functioning of a computing device and/or multi-user network (e.g., a software-defined data center, a cloud computing environment, etc.). For example, embodiments herein are directed to hardware circuitry deployed in a computing device or multi-user network and configured to perform a conversion operation to convert the format of a bit string from a first format (e.g., a floating point format) to a second format (e.g., a unum format, a posit format, etc.). Once the bit string has been converted to the second format, the circuitry may be operated to perform an operation (e.g., an arithmetic operation, a logical operation, a bitwise operation, a vector operation, etc.) on the converted bit string.
In some embodiments, the circuitry may be further operable to convert the results of the operations back to a first format (e.g., into a floating point format), which may in turn be communicated to different circuitry (e.g., a host, a memory device, a portion of a shared computing resource, etc.) of the computing system or multi-user network. By operating in this manner, the circuitry may help improve the performance of the computing system or multi-user network by allowing for improved accuracy and/or precision of the operations performed, improved speed of performing the operations, and/or reduced storage space required for bit strings before, during, or after performing arithmetic, logical, or other operations.
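The convert, operate, and convert-back flow described in the two paragraphs above can be sketched in software. This is an illustrative Python model, not the disclosed circuitry: `decode_posit` walks the sign, regime, exponent, and fraction fields of an n-bit posit, while `encode_posit` substitutes an exhaustive nearest-value search for real conversion hardware, which is only practical for small n. Rounding by arithmetic distance may differ from the posit standard's bit-level rounding in some cases (for example where exponent bits are truncated when es > 0). All function names are hypothetical.

```python
import math

def decode_posit(bits: int, n: int, es: int) -> float:
    """Decode an n-bit posit (sign, regime, exponent, fraction) to a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):
        return math.nan  # NaR, "not a real"
    sign = bits >> (n - 1)
    if sign:
        bits = -bits & mask  # negative posits are stored in two's complement
    body = bits & ((1 << (n - 1)) - 1)  # the n-1 bits after the sign bit
    i = n - 2                           # index of the first regime bit
    first = (body >> i) & 1
    run = 0
    while i >= 0 and (body >> i) & 1 == first:  # regime: run of identical bits
        run += 1
        i -= 1
    k = run - 1 if first else -run
    i -= 1                               # skip the regime's terminating bit
    left = max(i + 1, 0)                 # bits remaining for exponent and fraction
    tail = body & ((1 << left) - 1)
    e_len = min(es, left)                # exponent bits cut off by the regime count as 0
    exponent = (tail >> (left - e_len)) << (es - e_len)
    f_len = left - e_len
    fraction = tail & ((1 << f_len) - 1)
    value = 2.0 ** (k * (1 << es) + exponent) * (1 + fraction / (1 << f_len))
    return -value if sign else value

def encode_posit(x: float, n: int, es: int) -> int:
    """Round x to an n-bit posit by exhaustive search (only sensible for small n)."""
    nar = 1 << (n - 1)
    if math.isnan(x) or math.isinf(x):
        return nar
    if x == 0:
        return 0
    best, best_err = 0, math.inf
    for bits in range(1, 1 << n):  # zero is skipped: nonzero x never rounds to 0
        if bits == nar:
            continue
        err = abs(decode_posit(bits, n, es) - x)
        # Nearest value wins; ties go to the even bit pattern.
        if err < best_err or (err == best_err and bits % 2 == 0):
            best, best_err = bits, err
    return best

def posit_add(a: float, b: float, n: int = 8, es: int = 0) -> float:
    """Convert two floats to posits, add them, and convert the result back."""
    pa, pb = encode_posit(a, n, es), encode_posit(b, n, es)    # first -> second format
    exact = decode_posit(pa, n, es) + decode_posit(pb, n, es)  # exact for 8-bit posits
    return decode_posit(encode_posit(exact, n, es), n, es)     # round, then back to float

print(posit_add(1.5, 2.25))                         # 3.75
print(decode_posit(encode_posit(0.1, 8, 0), 8, 0))  # 0.09375
```

With 8-bit posits and es = 0, 0.1 rounds to 0.09375 (the nearest representable neighbors are 0.09375 and 0.109375), and values beyond maxpos saturate to 64 rather than overflowing, which is the posit behavior for out-of-range results.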
In some embodiments, the circuitry may be deployed as part of a shared pool of computing resources in a multi-user network. As used herein, a "multi-user network" generally refers to a collection of computing systems in which one or more hosts (e.g., host computing systems) are configured to provide computing functionality via a network such as the internet. Multi-user networks are dynamic in nature. For example, Virtual Compute Instances (VCIs) and/or various application services may be created, used, moved, or destroyed within a multi-user network. When a VCI is created (e.g., when a container is initialized), various processes and/or services begin to run and consume resources.
In a multi-user network, resources may be accessed by multiple users in different geographic locations, which are not necessarily the same geographic location as the computing resources themselves. As used herein, a "resource" is a physical or virtual component of limited availability within a computer or multi-user network. For example, resources include processing resources, memory resources, electrical power, and/or input/output resources, among others. A multi-user network may include a shared pool of resources (e.g., processing resources, memory resources, etc.) shared by multiple users. Alternatively, a multi-user network may be referred to herein as a software-defined data center or a cloud computing environment.
In some embodiments, the circuitry is accessible by the VCI as part of a shared pool of computing resources available to the VCI. For example, the circuitry may be deployed in a memory device that is provided as part of a shared pool of computing resources that may be used for a multi-user network. However, embodiments are not so limited, and the circuitry may be deployed on a host, a blade server, a graphics processing unit, a field programmable gate array, an application specific integrated circuit, or other physical or virtualized hardware component that is provided as part of a shared pool of computing resources that may be used for a multi-user network.
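To picture circuitry provided as part of a shared pool that several VCIs draw on, the hypothetical model below treats the circuitry as a fixed number of units guarded by a semaphore, with threads standing in for VCIs. `SharedCircuitryPool` and its methods are illustrative names only, not part of the disclosure; a real multi-user network would schedule such requests through its own resource management layer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class SharedCircuitryPool:
    """Hypothetical model of a limited number of circuitry units shared by many VCIs."""

    def __init__(self, units: int):
        self._slots = threading.BoundedSemaphore(units)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    def run(self, vci_id: str, op, *args):
        # Block until a unit is free: the resource has limited availability.
        with self._slots:
            with self._lock:
                self.in_use += 1
                self.peak = max(self.peak, self.in_use)
            try:
                return vci_id, op(*args)
            finally:
                with self._lock:
                    self.in_use -= 1

pool = SharedCircuitryPool(units=2)
with ThreadPoolExecutor(max_workers=8) as executor:
    # Eight "VCIs" contend for two units; each squares its own index.
    futures = [executor.submit(pool.run, f"vci-{i}", pow, i, 2) for i in range(8)]
    results = dict(f.result() for f in futures)
```

However many VCIs submit work at once, `pool.peak` never exceeds the two configured units.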
The term "virtual compute instance" (VCI) encompasses a range of computing functionality. The VCI may comprise a data compute node, such as a Virtual Machine (VM), running on a hypervisor. In contrast, a container may run on a host operating system without the need for a hypervisor or a separate operating system, such as a container running within Linux. The container may be provided by a Virtual Machine (VM) that includes a container virtualization layer (e.g., Docker). A VM generally refers to an isolated end-user space instance that may be executed within a virtualized environment. Other technologies besides hardware virtualization that can provide isolated end-user space instances may also be referred to as VCI. The term "VCI" encompasses these examples and combinations of different types of VCIs, and the like.
In some embodiments, a VM operates on a host with its own guest operating system, using the resources of the host virtualized by virtualization software (e.g., a hypervisor, virtual machine monitor, etc.). The tenant (i.e., the owner of the VM) can choose which applications operate on top of the guest operating system. Some containers, on the other hand, are constructs that run on top of a host operating system without the need for a hypervisor or a separate guest operating system.
The host operating system may use namespaces to isolate containers from each other and thus may provide operating-system-level separation for different groups of applications operating within different containers. This separation is akin to the VM separation that may be provided in hypervisor-virtualized environments that virtualize system hardware, and thus may be viewed as a form of virtualization that isolates different groups of applications operating in different containers. Such containers may be more "lightweight" than VMs, at least because they share an operating system rather than operating with their own guest operating system.
Multiple VCIs may be configured to communicate with each other in a multi-user network. In such systems, information may be propagated from an end user to at least one of the VCIs in the system, between the VCIs in the system, and/or between at least one of the VCIs in the system and a non-virtualized physical host.
Containerized cloud-native applications can be used to accelerate application delivery in a multi-user network. As used herein, "containerized" or "containerization" refers to a virtualization technique in which an application (or a portion of an application, such as a stream corresponding to an application) is wrapped into a container (e.g., a Docker container, a Linux container, etc.) as an alternative to full machine virtualization. Because containerization may include loading the application onto the VCI, the application may run on any suitable physical machine without concern for application dependencies. Additionally, as used herein, "cloud-native applications" refer to applications (e.g., computer programs, software packages, etc.) that are assembled as containerized workloads in containers deployed in a multi-user network. A "containerized workload" refers to a computing architecture in which an application is structured as a collection of loosely coupled (e.g., containerized) services. The containerized workload architecture may allow for improved application modularity, extensibility, and continuous deployment compared to traditional application development environments.
In embodiments where circuitry to perform operations to convert bit strings between various formats and/or perform arithmetic and/or logical operations using bit strings is provided in a multi-user network, portions of the operations may be performed with the aid of one or more VCIs and/or containers (e.g., containerized workloads). For example, one or more VCIs or containers may be deployed in a multi-user network and may be configured to access circuitry to request operations to convert bit strings between various formats and/or to request arithmetic and/or logical operations using bit strings.
In some embodiments, operations to convert bit strings between various formats and/or arithmetic and/or logical operations using bit strings may be performed based on parameters received by the multi-user network. For example, a request to perform an operation to convert a bit string between various formats and/or a request to perform an arithmetic and/or logical operation on a bit string may be accompanied by one or more parameters corresponding to performing an operation to convert a bit string between various formats and/or an arithmetic and/or logical operation on a bit string. The parameters may include an amount of processing resources to be used for performing the operation, an amount of time to be allocated for performing the operation, a bit length of an operand to be used for performing the operation, and/or an exponent bit length of an operand to be used for performing the operation, among others.
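The parameters listed above could travel with a request as a small structured record. The sketch below is one hypothetical shape for such a request in Python; the field names, the operation set, and the validation limits are assumptions chosen for illustration rather than anything specified by the disclosure.

```python
from dataclasses import dataclass
from enum import Enum

class Operation(Enum):
    CONVERT = "convert"    # convert a bit string between formats
    ADD = "add"            # arithmetic operation
    MULTIPLY = "multiply"  # arithmetic operation
    AND = "and"            # logical operation

@dataclass(frozen=True)
class OperationRequest:
    """Hypothetical request a VCI could send to shared bit-string circuitry."""
    operation: Operation
    operands: tuple                # operand bit strings, as unsigned integers
    processing_units: int = 1      # amount of processing resources to use
    time_budget_ms: float = 10.0   # time allocated to perform the operation
    bit_length: int = 16           # bit length n of each operand
    exponent_bits: int = 1         # exponent bit length es of each operand

    def __post_init__(self):
        # Validate parameters before the request is dispatched (limits are illustrative).
        if self.processing_units < 1:
            raise ValueError("processing_units must be at least 1")
        if self.time_budget_ms <= 0:
            raise ValueError("time_budget_ms must be positive")
        if not 3 <= self.bit_length <= 64:
            raise ValueError("bit_length must be between 3 and 64")
        if not 0 <= self.exponent_bits <= self.bit_length - 3:
            raise ValueError("exponent_bits must leave room for sign and regime bits")
        for value in self.operands:
            if not 0 <= value < (1 << self.bit_length):
                raise ValueError("operand does not fit in bit_length bits")

request = OperationRequest(Operation.ADD, (0x50, 0x62), bit_length=8, exponent_bits=0)
```

Making the record frozen means the parameters a VCI submitted cannot change while the request waits for a shared unit, and validating in `__post_init__` rejects malformed requests before any circuitry time is spent.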
By performing operations to convert bit strings between various formats and/or arithmetic and/or logical operations using bit strings in a multi-user network based on such parameters, application developers or other users of the multi-user network may be able to fine-tune their resource consumption when requesting these operations. This may reduce both the monetary and the resource-related costs associated with performing large computations in a multi-user network, as compared to approaches that do not take such parameters into account.
In the following detailed description of the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration ways in which one or more embodiments of the disclosure may be practiced. These embodiments are described in sufficient detail to enable those of ordinary skill in the art to practice the embodiments of this disclosure, and it is to be understood that other embodiments may be utilized and that process, electrical, and structural changes may be made without departing from the scope of the present disclosure.
As used herein, designators such as "N", "M", and the like, particularly with respect to reference numerals in the figures, indicate that a number of the particular feature so designated may be included. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting. As used herein, the singular forms "a," "an," and "the" may include both singular and plural referents unless the context clearly dictates otherwise. Further, "a number of," "at least one," and "one or more" (e.g., a number of memory banks) can refer to one or more memory banks, whereas "a plurality" is intended to refer to more than one of such things.
Moreover, the words "can" and "may" are used throughout this application in a permissive sense (i.e., having the potential to, being able to), not in a mandatory sense (i.e., must). The term "comprising" and its derivatives mean "including, but not limited to." Depending on the context, the term "coupled" means to be physically connected, directly or indirectly, or to be used to access and move (transfer) commands and/or data. Depending on the context, the terms "bit string," "data," and "data value" are used interchangeably herein and may have the same meaning.
The figures herein follow a numbering convention in which the first one or more digits correspond to the drawing figure number and the remaining digits identify an element or component in the drawing. Similar elements or components between different figures may be identified by the use of similar digits. For example, 120 may refer to element "20" in FIG. 1, and a similar element may be referred to as 220 in FIG. 2. A group or plurality of similar elements or components may generally be referred to herein with a single element number. For example, the plurality of reference elements 433-1, 433-2, ..., 433-N may be referred to collectively as 433. As will be appreciated, elements shown in the various embodiments herein can be added, exchanged, and/or eliminated so as to provide a number of additional embodiments of the present disclosure. In addition, the proportion and/or the relative scale of the elements provided in the figures are intended to illustrate certain embodiments of the present disclosure and should not be taken in a limiting sense.
Fig. 1 is a functional block diagram in the form of a computing system 100 including an apparatus including a host 102 and acceleration circuitry 120, according to several embodiments of the present disclosure. As used herein, "apparatus" may refer to, but is not limited to, any of a variety of structures or combinations of structures, such as, for example, a circuit or circuitry, one or more dies, one or more modules, one or more devices, or one or more systems. Each of the components (e.g., host 102, acceleration circuitry 120, logic circuitry 122, and/or memory resources 124) may be individually referred to herein as a "device".
As illustrated in fig. 1, the host 102 may be coupled to the acceleration circuitry 120. In various embodiments, the host 102 may be coupled to the acceleration circuitry 120 via one or more channels 103 (e.g., buses, interfaces, communication paths, etc.). The channel 103 may be used to transfer data between the acceleration circuitry 120 and the host 102, and may be in the form of a standardized interface. For example, the channel 103 may be a Serial Advanced Technology Attachment (SATA) interface, a peripheral component interconnect express (PCIe) interface, a Universal Serial Bus (USB) interface, or a Double Data Rate (DDR) interface, among other connectors and interfaces. In general, however, the channel 103 may provide an interface for passing control, address, data, and other signals between the acceleration circuitry 120 and a host 102 having compatible receptors for the channel 103.
The host 102 may be a host system, such as a personal laptop computer, a desktop computer, a digital camera, a mobile telephone, an internet-of-things (IoT) enabled device, a memory card reader, or a graphics processing unit (e.g., a video card), among various other types of hosts. The host 102 may include a system motherboard and/or backplane and may include a number of memory access devices, such as a number of processing devices (e.g., one or more processors, microprocessors, or some other type of control circuitry). One of ordinary skill in the art will appreciate that "processor" may mean one or more processors, such as a parallel processing system, a number of coprocessors, and the like. The host 102 may be provided in a multi-user network, such as the multi-user network 201 illustrated in fig. 2B herein. Thus, in some embodiments, the host 102 may include physical and/or virtualized hardware configured to execute a host operating system.
The system 100 may include separate integrated circuits, or both the host 102 and the acceleration circuitry 120 may be on the same integrated circuit. The system 100 may be, for example, a server system and/or a High Performance Computing (HPC) system and/or a portion thereof. Although the example shown in fig. 1 illustrates a system having a Von Neumann architecture, embodiments of the present disclosure may be implemented in a non-Von Neumann architecture that may not include one or more components (e.g., CPU, ALU, etc.) typically associated with a Von Neumann architecture.
In some embodiments, the host 102 may be responsible for executing an operating system for the computing system 100 that includes the acceleration circuitry 120 and/or other components, such as the memory device 204 illustrated in fig. 2A and 2B, the field programmable gate array 221 illustrated in fig. 2B, the application specific integrated circuit 223 illustrated in fig. 2B, the virtual compute cluster 251 illustrated in fig. 2B, and so forth. Thus, in some embodiments, the host 102 may be responsible for controlling the operation of the acceleration circuitry 120. For example, the host 102 may execute instructions (e.g., in the form of an operating system) that manage the hardware of the computing system 100 (e.g., schedule tasks, execute applications, control peripherals, etc.).
The acceleration circuitry 120 may include logic circuitry 122 and memory resources 124. The logic circuitry 122 may be provided in the form of an integrated circuit, such as an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a system on a chip, or another combination of hardware and/or circuitry configured to perform the operations described in greater detail herein. In some embodiments, the logic circuitry 122 may include an Arithmetic Logic Unit (ALU). The ALU may include circuitry (e.g., hardware, logic, one or more processing devices, etc.) to perform operations (e.g., arithmetic operations, logical operations, bitwise operations, etc.) such as those described above on integer binary bit strings, such as bit strings in a posit format. However, embodiments are not limited to ALUs, and in some embodiments, the logic circuitry 122 may include a state machine and/or an instruction set architecture (or a combination thereof) in addition to or in place of an ALU, as described in more detail herein in connection with fig. 2B and 5.
The logic circuitry 122 may be configured to receive one or more bit strings (e.g., a plurality of bits) stored in a first format (e.g., a plurality of bits in a floating-point format), convert the bit strings to a second format (e.g., a posit format), and/or cause operations, such as arithmetic and/or logical operations, to be performed using the bit strings having the second format. As used herein, a bit string stored in the second format (e.g., a bit string in the posit format) includes at least one bit referred to as a "sign," a set of bits referred to as a "regime," a set of bits referred to as an "exponent," and a set of bits referred to as a "mantissa" (or significand). As used herein, a set of bits is intended to refer to a subset of the bits contained in a bit string. Examples of the sign bit, regime bits, exponent bits, and mantissa bits are described in more detail herein in connection with fig. 3 and 4A-4B.
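To make the four fields concrete, the sketch below splits an n-bit posit<n, es> pattern (the posit, or "hypothetical number," format) into its sign, regime, exponent, and fraction (mantissa) bits and computes the exact value it encodes. It assumes the standard posit interpretation, magnitude = useed^k · 2^e · (1 + f) with useed = 2^(2^es), and reads the fields of a negative pattern after taking its two's complement. This is a plain-software illustration rather than the logic circuitry 122, and the names `PositFields` and `decode_posit` are hypothetical.

```python
from fractions import Fraction
from typing import NamedTuple, Optional


class PositFields(NamedTuple):
    sign: int        # the single sign bit
    regime: str      # run of identical bits plus its terminating bit (if any)
    exponent: str    # up to es exponent bits (may be cut short at the end of the word)
    fraction: str    # remaining fraction (mantissa) bits
    value: Fraction  # exact real value encoded by the pattern


def decode_posit(bits: int, n: int, es: int) -> Optional[PositFields]:
    """Split an n-bit posit<n, es> pattern into its fields; return None for NaR."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 1 << (n - 1):
        return None                                # NaR ("not a real")
    if bits == 0:
        return PositFields(0, "", "", "", Fraction(0))
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask                      # negative: decode the two's complement
    body = format(bits, f"0{n}b")[1:]              # the n - 1 bits after the sign bit
    run = len(body) - len(body.lstrip(body[0]))    # length of the regime run
    k = run - 1 if body[0] == "1" else -run        # regime value
    regime = body[: run + 1]
    rest = body[run + 1 :]
    exp_str, frac_str = rest[:es], rest[es:]
    # Exponent bits that fall off the end of the word are treated as zeros.
    e = int(exp_str.ljust(es, "0"), 2) if es else 0
    f = Fraction(int(frac_str, 2), 1 << len(frac_str)) if frac_str else Fraction(0)
    # Magnitude = useed**k * 2**e * (1 + f), with useed = 2**(2**es).
    magnitude = (1 + f) * Fraction(2) ** (k * (1 << es) + e)
    return PositFields(sign, regime, exp_str, frac_str, -magnitude if sign else magnitude)


fields = decode_posit(0b01001000, n=8, es=1)
# fields.regime == "10", fields.exponent == "0", fields.fraction == "1000",
# fields.value == Fraction(3, 2)
```

Exact `Fraction` arithmetic is used so that the decoded value is never itself rounded.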
For example, once a floating-point bit string has been converted to a bit string in the posit format, the logic circuitry 122 may be configured to perform (or cause to be performed), using the posit bit string: arithmetic operations such as addition, subtraction, multiplication, division, fused multiply-add, multiply-accumulate, dot product, greater-than or less-than comparison, absolute value (e.g., FABS()), fast Fourier transform, inverse fast Fourier transform, sigmoid function, convolution, square root, exponential, and/or logarithm operations; and/or logical operations such as AND, OR, XOR, NOT, etc.; and trigonometric functions such as sine, cosine, tangent, etc. As will be appreciated, the foregoing list of operations is intended to be neither exhaustive nor limiting, and the logic circuitry 122 may be configured to perform (or cause to be performed) other arithmetic operations, logical operations, bitwise operations, vector operations, and the like.
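Some of the listed operations are inexpensive on posits because of a known property of the encoding: posit bit patterns sort in the same order as the n-bit two's-complement integers they resemble, and negation is two's-complement negation of the whole pattern. The Python sketch below uses this property to implement less-than comparison, negation, and absolute value without decoding any fields. The helper names are hypothetical, and this is one possible approach rather than a description of how the logic circuitry 122 performs these operations.

```python
def to_signed(bits: int, n: int) -> int:
    """Read an n-bit pattern as a two's-complement signed integer."""
    bits &= (1 << n) - 1
    return bits - (1 << n) if bits >> (n - 1) else bits


def posit_less_than(a: int, b: int, n: int) -> bool:
    # Posit patterns order like signed integers (NaR, 100...0, sorts below every real).
    return to_signed(a, n) < to_signed(b, n)


def posit_negate(bits: int, n: int) -> int:
    # Two's-complement negation of the whole word; 0 and NaR map to themselves.
    return (-bits) & ((1 << n) - 1)


def posit_abs(bits: int, n: int) -> int:
    # Negative posits have the sign bit set, i.e. a negative signed-integer reading.
    return posit_negate(bits, n) if to_signed(bits, n) < 0 else bits & ((1 << n) - 1)


# With posit<8, 0>: 0x20 encodes 0.5, 0x40 encodes 1.0, and 0xC0 encodes -1.0.
```

Because the comparison reuses ordinary integer hardware, no separate floating-point comparator is needed, which is one reason posit comparisons are often described as cheap.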
The acceleration circuitry 120 may further include memory resources 124, which may be communicatively coupled to the logic circuitry 122. The memory resources 124 may include volatile memory resources, non-volatile memory resources, or a combination of volatile and non-volatile memory resources. In some embodiments, the memory resources may be a random access memory (RAM), such as static random access memory (SRAM). However, embodiments are not so limited, and the memory resources may be a cache, one or more registers, NVRAM, ReRAM, FeRAM, MRAM, PCM, an "emerging" memory device such as a 3-D cross-point (3D XP) memory device, or a combination thereof. A 3D XP array of non-volatile memory can store bits based on a change in bulk resistance, in conjunction with a stackable cross-gridded data access array. Additionally, in contrast to many flash-based memories, 3D XP non-volatile memory can perform write-in-place operations, in which a non-volatile memory cell can be programmed without the cell being previously erased.
The embodiment of fig. 1 may include additional circuitry not illustrated to avoid obscuring embodiments of the present disclosure. For example, the system 100 may include address circuitry to latch address signals provided over I/O connections through I/O circuitry. Address signals may be received and decoded by a row decoder and a column decoder to access devices within the system 100. Those skilled in the art will appreciate that the number of address input connections may depend on the density and architecture of the system 100.
Fig. 2A is a functional block diagram in the form of a computing system including an apparatus 200 including a host 202 and a memory device 204, according to several embodiments of the present disclosure. The memory device 204 may include acceleration circuitry 220, which may be similar to the acceleration circuitry 120 illustrated in fig. 1. Similarly, the host 202 may be similar to the host 102 illustrated in fig. 1. Each of the components (e.g., host 202, acceleration circuitry 220, logic circuitry 222, memory resources 224, and/or memory array 230, etc.) may be individually referred to herein as a "device."
The host 202 may be communicatively coupled to the memory device 204 via one or more channels 203, 205. The channels 203, 205 may be interfaces or other physical connections that allow data and/or commands to be transferred between the host 202 and the memory device 204. For example, commands that cause operations to be performed by the acceleration circuitry 220 (e.g., an operation that converts a bit string in a floating-point format to a bit string in a posit format, and subsequent arithmetic and/or logical operations on the posit bit string) may be communicated from the host 202 via the channels 203, 205. It should be noted that, in some embodiments, the acceleration circuitry 220 may perform operations in response to an initiation command transmitted from the host 202 via one or more of the channels 203, 205, without an intervening command from the host 202. That is, once the acceleration circuitry 220 has received a command from the host 202 to initiate an operation, the acceleration circuitry 220 may perform the operation without additional commands from the host 202.
Memory device 204 may include one or more memory modules (e.g., single inline memory modules, dual inline memory modules, etc.). The memory device 204 may include volatile memory and/or non-volatile memory. In a number of embodiments, the memory device 204 may comprise a multi-chip device. A multi-chip device may include several different memory types and/or memory modules. For example, memory device 204 may include non-volatile or volatile memory on any type of module.
Memory device 204 may provide a main memory for computing system 200 or may be used as additional memory or storage throughout computing system 200. The memory device 204 can include one or more memory arrays 230 (e.g., an array of memory cells), which can include volatile and/or nonvolatile memory cells. For example, the memory array 230 may be a flash array having a NAND architecture. Embodiments are not limited to a particular type of memory device. For example, memory device 204 may include RAM, ROM, DRAM, SDRAM, PCRAM, RRAM, flash memory, and so forth.
In embodiments where memory device 204 comprises non-volatile memory, memory device 204 may comprise a flash memory device, such as a NAND or NOR flash memory device. However, embodiments are not so limited, and memory device 204 may include other non-volatile memory devices such as non-volatile random access memory devices (e.g., NVRAM, ReRAM, FeRAM, MRAM, PCM), an "emerging" memory device such as a 3-D cross-point (3D XP) memory device, or a combination thereof.
As shown in fig. 2A, memory device 204 may include a register access component 206, a High Speed Interface (HSI) 208, a controller 210, one or more extended row address (XRA) components 212, main memory input/output (I/O) circuitry 214, Row Address Strobe (RAS)/Column Address Strobe (CAS) chain control circuitry 216, a RAS/CAS chain component 218, acceleration circuitry 220, and a memory array 230. As shown in fig. 2A, the acceleration circuitry 220 is located in an area of the memory device 204 that is physically distinct from the memory array 230. That is, in some embodiments, the acceleration circuitry 220 is located in a peripheral location of the memory array 230.
The register access component 206 may facilitate data transfer and extraction from the host 202 to the memory device 204 and from the memory device 204 to the host 202. For example, the register access component 206 may store an address (or facilitate a lookup of an address), such as a memory address, corresponding to data to be transferred from the memory device 204 to the host 202 or from the host 202 to the memory device 204. In some embodiments, the register access component 206 may facilitate transferring and extracting data to be operated on by the acceleration circuitry 220, and/or the register access component 206 may facilitate transferring and extracting data that has been operated on by the acceleration circuitry 220 for transfer to the host 202.
The HSI 208 may provide an interface between the host 202 and the memory device 204 for commands and/or data that traverse the channel 205. The HSI 208 may be a Double Data Rate (DDR) interface, such as a DDR3, DDR4, or DDR5 interface. However, embodiments are not limited to DDR interfaces, and the HSI 208 may be a Quad Data Rate (QDR) interface, a Peripheral Component Interconnect (PCI) interface (e.g., a peripheral component interconnect express (PCIe) interface), or another suitable interface for transferring commands and/or data between the host 202 and the memory device 204.
The controller 210 may be responsible for executing instructions from the host 202 and accessing the acceleration circuitry 220 and/or the memory array 230. The controller 210 may be a state machine, a sequencer, or some other type of controller. The controller 210 may receive commands from the host 202 (e.g., via the HSI 208) and control the acceleration circuitry 220 and/or the operation of the memory array 230 based on the received commands. In some embodiments, the controller 210 may receive commands from the host 202 to cause operations to be performed using the acceleration circuitry 220 (e.g., to convert a bit string between various formats, to perform arithmetic and/or logical operations using the bit string, etc.). In response to receiving such a command, the controller 210 may instruct the acceleration circuitry 220 to begin operating.
In some embodiments, the controller 210 may be a global processing controller and may provide power management functions to the memory device 204. The power management functions may include control of the power consumed by the memory device 204 and/or the memory array 230. For example, the controller 210 may control the power provided to the various banks of the memory array 230 to control which banks of the memory array 230 are operational at different times during operation of the memory device 204. This may include shutting off certain banks of the memory array 230 while power is provided to other banks of the memory array 230, in order to optimize the power consumption of the memory device 204. In some embodiments, controlling the power consumption of the memory device 204 may include the controller 210 controlling power to various cores of the memory device 204 and/or to the acceleration circuitry 220, the memory array 230, and the like.
The XRA component 212 is intended to provide additional functionality (e.g., peripheral amplifiers) that sense (e.g., read, store, cache) data values of memory cells in the memory array 230 and that are distinct from the memory array 230. The XRA components 212 may include latches and/or registers. For example, additional latches may be included in the XRA component 212. The latches of the XRA component 212 may be located on the periphery of the memory array 230 of the memory device 204 (e.g., on the periphery of one or more groups of memory cells).
Main memory input/output (I/O) circuitry 214 may facilitate the transfer of data and/or commands to and from the memory array 230. For example, the main memory I/O circuitry 214 may facilitate the transfer of bit strings, data, and/or commands from the host 202 and/or the acceleration circuitry 220 to and from the memory array 230. In some embodiments, the main memory I/O circuitry 214 may include one or more Direct Memory Access (DMA) components that can transfer bit strings (e.g., posit bit strings stored as blocks of data) from the acceleration circuitry 220 to the memory array 230, and vice versa.
In some embodiments, the main memory I/O circuitry 214 may facilitate the transfer of bit strings, data, and/or commands from the memory array 230 to the acceleration circuitry 220 so that the acceleration circuitry 220 can perform operations on the bit strings. Similarly, the main memory I/O circuitry 214 may facilitate the transfer, to the memory array 230, of bit strings on which the acceleration circuitry 220 has performed one or more operations. As described in more detail herein, the operations may include operations that convert a bit string formatted according to a floating-point standard to a bit string formatted as a posit (and vice versa), arithmetic operations performed on bit strings formatted as posits, logical operations performed on bit strings formatted as posits, and the like.
Row Address Strobe (RAS)/Column Address Strobe (CAS) chain control circuitry 216 and the RAS/CAS chain component 218 may be used in conjunction with the memory array 230 to latch a row address and/or a column address to initiate a memory cycle. In some embodiments, the RAS/CAS chain control circuitry 216 and/or the RAS/CAS chain component 218 may resolve the row and/or column addresses of the memory array 230 at which read and write operations associated with the memory array 230 are to be initiated or terminated. For example, upon completion of an operation using the acceleration circuitry 220, the RAS/CAS chain control circuitry 216 and/or the RAS/CAS chain component 218 may latch and/or resolve a specific location in the memory array 230 at which a bit string that has been operated on by the acceleration circuitry 220 is to be stored. Similarly, before the acceleration circuitry 220 operates on a bit string, the RAS/CAS chain control circuitry 216 and/or the RAS/CAS chain component 218 may latch and/or resolve a specific location in the memory array 230 from which the bit string is to be transferred to the acceleration circuitry 220.
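As a simplified illustration of resolving a location in the memory array 230, the Python sketch below splits a flat address into bank, row, and column fields; in a DRAM, the row portion is latched with the RAS and the column portion with the CAS. The field widths, their ordering, and the names `ArrayLocation` and `resolve_address` are hypothetical and chosen only for illustration; an actual mapping depends on the density and architecture of the memory device.

```python
from typing import NamedTuple


class ArrayLocation(NamedTuple):
    bank: int
    row: int     # selects an access line (word line); latched with RAS
    column: int  # selects sense lines (digit lines); latched with CAS


def resolve_address(addr: int, col_bits: int = 10, row_bits: int = 16,
                    bank_bits: int = 4) -> ArrayLocation:
    """Split a flat address into bank/row/column fields.

    Hypothetical layout: column in the low bits, then row, then bank.
    """
    if not 0 <= addr < 1 << (col_bits + row_bits + bank_bits):
        raise ValueError("address outside the array")
    column = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    bank = addr >> (col_bits + row_bits)
    return ArrayLocation(bank, row, column)


# Bank 3, row 5, column 7 under the default (hypothetical) widths.
location = resolve_address((3 << 26) | (5 << 10) | 7)
```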
As described above in connection with fig. 1 and in more detail below in connection with fig. 5, the acceleration circuitry 220 may be configured to receive one or more bit strings in a first format (e.g., a plurality of bits in a floating-point format), convert the one or more bit strings to a second format (e.g., encode the plurality of bits in a posit format), and/or cause operations, such as arithmetic and/or logical operations, to be performed using the one or more bit strings having the second format.
The acceleration circuitry 220 may include logic circuitry (e.g., the logic circuitry 122 illustrated in fig. 1) and/or memory resources (e.g., the memory resources 124 illustrated in fig. 1). Bit strings (e.g., data, a plurality of bits, etc.) may be received by the acceleration circuitry 220 from, for example, the host 202 and/or the memory array 230, and stored by the acceleration circuitry 220, for example, in the memory resources of the acceleration circuitry 220. The acceleration circuitry (e.g., the logic circuitry of the acceleration circuitry 220) may perform operations (or cause operations to be performed) on the bit strings to convert them from a floating-point format to a posit format, to perform arithmetic and/or logical operations on the posit bit strings, and/or to convert the results of the arithmetic and/or logical operations to a different format (e.g., a floating-point format), as described in more detail herein in connection with fig. 5.
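The following Python sketch illustrates, in software, a conversion from a floating-point value to the posit (hypothetical number) format and back. It assumes the standard posit<n, es> interpretation and builds a table of every posit value, so it is practical only for small widths such as n ≤ 16. It rounds to the nearest representable value with ties going to the pattern whose last bit is 0, saturates at ±maxpos, never rounds a nonzero input to zero, and maps NaN and infinities to NaR. Nearest-value rounding is a simplification: in the extreme regime ranges where exponent bits are truncated, it can differ from the bit-string rounding used by common posit implementations. All function names are hypothetical.

```python
import bisect
import math
from fractions import Fraction
from functools import lru_cache
from typing import List, Tuple


def posit_value(bits: int, n: int, es: int) -> Fraction:
    """Exact value of a posit<n, es> pattern (the NaR pattern is not handled here)."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask                      # decode |x| from the two's complement
    body = format(bits, f"0{n}b")[1:]              # bits after the sign bit
    run = len(body) - len(body.lstrip(body[0]))    # regime run length
    k = run - 1 if body[0] == "1" else -run
    rest = body[run + 1:]                          # skip the regime terminator
    e = int(rest[:es].ljust(es, "0"), 2) if es else 0
    frac = rest[es:]
    f = Fraction(int(frac, 2), 1 << len(frac)) if frac else Fraction(0)
    v = (1 + f) * Fraction(2) ** (k * (1 << es) + e)
    return -v if sign else v


@lru_cache(maxsize=None)
def _sorted_posits(n: int, es: int) -> Tuple[List[Fraction], List[int]]:
    """All non-NaR posit<n, es> values in ascending order, with their patterns."""
    nar = 1 << (n - 1)
    pairs = sorted((posit_value(p, n, es), p) for p in range(1 << n) if p != nar)
    return [v for v, _ in pairs], [p for _, p in pairs]


def float_to_posit(x: float, n: int = 16, es: int = 1) -> int:
    """Round a float to a posit<n, es> pattern (table-based sketch for small n)."""
    if math.isnan(x) or math.isinf(x):
        return 1 << (n - 1)                        # NaR
    values, patterns = _sorted_posits(n, es)
    target = Fraction(x)                           # exact binary value of the float
    i = bisect.bisect_left(values, target)
    if i == 0:
        best = 0                                   # at or below -maxpos: saturate
    elif i == len(values):
        best = len(values) - 1                     # above maxpos: saturate
    else:
        lo, hi = i - 1, i
        d_lo, d_hi = target - values[lo], values[hi] - target
        if d_lo != d_hi:
            best = lo if d_lo < d_hi else hi
        else:
            best = lo if patterns[lo] % 2 == 0 else hi   # tie: even pattern wins
    if patterns[best] == 0 and x != 0:
        # Posits never round a nonzero value to zero; use +/-minpos instead.
        return 1 if x > 0 else (1 << n) - 1
    return patterns[best]


def posit_to_float(bits: int, n: int = 16, es: int = 1) -> float:
    """Convert a posit<n, es> pattern back to a Python float (NaR becomes NaN)."""
    if bits & ((1 << n) - 1) == 1 << (n - 1):
        return math.nan
    return float(posit_value(bits, n, es))
```

For instance, `float_to_posit(1.0, 8, 0)` yields the pattern `0x40`, and `posit_to_float(float_to_posit(3.0))` recovers 3.0 because 3.0 is exactly representable as a 16-bit posit with es = 1.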
As described in more detail in connection with fig. 3 and 4A-4B, posits can provide improved accuracy and may require less storage space (e.g., may contain a smaller number of bits) than corresponding bit strings represented in a floating-point format. Accordingly, by using the acceleration circuitry 220 to convert floating-point bit strings to posit bit strings, the performance of the memory device 204 may be improved in comparison to approaches that utilize only floating-point bit strings, because operations may be performed more quickly on posit bit strings (e.g., because bit strings in the posit format are smaller and therefore require less time to operate on), and because less memory space is needed in the memory device 204 to store bit strings in the posit format, which can free up additional space in the memory device 204 for other bit strings, data, and/or other operations to be performed.
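The accuracy claim above can be quantified by counting fraction (mantissa) bits. In a posit<n, es> (hypothetical number) bit string, the regime field grows as values move away from 1, so the number of fraction bits is tapered: for regime value k, the regime occupies k + 2 bits when k ≥ 0 and -k + 1 bits when k < 0. The Python sketch below (hypothetical function name) counts the remaining fraction bits. For example, a 16-bit posit with es = 1 keeps 12 fraction bits for magnitudes in [1/4, 4), compared with the 10 fraction bits of an IEEE 754 binary16 value, while its precision falls off toward the extremes of its range.

```python
IEEE_BINARY16_FRACTION_BITS = 10
IEEE_BINARY32_FRACTION_BITS = 23


def posit_fraction_bits(n: int, es: int, k: int) -> int:
    """Fraction (mantissa) bits left in posit<n, es> for regime value k."""
    regime_len = k + 2 if k >= 0 else -k + 1   # run of identical bits + terminating bit
    regime_len = min(regime_len, n - 1)         # the run may fill the word with no terminator
    return max(0, n - 1 - regime_len - es)      # subtract sign, regime, and exponent bits


# Tapered precision of posit<16, 1> across regimes k = -3 .. 2.
profile = {k: posit_fraction_bits(16, 1, k) for k in range(-3, 3)}
```

The same count shows the trade-off: near 1, posit<32, 2> has 27 fraction bits versus 23 for binary32, but both posit sizes have fewer fraction bits than the IEEE formats at very large or very small magnitudes.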
Once the acceleration circuitry 220 has performed an operation to convert data from a floating-point format to a posit format, the acceleration circuitry 220 may perform (or cause to be performed) arithmetic and/or logical operations on the posit bit strings. For example, as discussed above, the acceleration circuitry 220 may be configured to perform (or cause to be performed): arithmetic operations such as addition, subtraction, multiplication, division, fused multiply-add, multiply-accumulate, dot product, greater-than or less-than comparison, absolute value (e.g., FABS()), fast Fourier transform, inverse fast Fourier transform, sigmoid function, convolution, square root, exponential, and/or logarithm operations; and/or logical operations such as AND, OR, XOR, NOT, etc.; and trigonometric functions such as sine, cosine, tangent, etc. As will be appreciated, the foregoing list of operations is intended to be neither exhaustive nor limiting, and the acceleration circuitry 220 may be configured to perform (or cause to be performed) other arithmetic and/or logical operations on posit bit strings.
In some embodiments, the acceleration circuitry 220 may perform the operations listed above in conjunction with the execution of one or more machine learning algorithms. For example, the acceleration circuitry 220 may perform operations related to one or more neural networks. A neural network may allow an algorithm to be trained over time to determine an output response based on input signals. For example, over time, a neural network may learn to better maximize the likelihood of completing a particular goal. This may be advantageous in machine learning applications, because a neural network may be trained over time with new data to better maximize the likelihood of completing a particular goal, and may be trained over time to improve its performance on specific tasks and/or specific goals.
However, in some approaches, machine learning (e.g., neural network training) may be processing intensive (e.g., may consume large amounts of computer processing resources) and/or time intensive (e.g., may require lengthy calculations that consume multiple cycles). In contrast, by performing such operations using the acceleration circuitry 220, for example, on bit strings that have been converted by the acceleration circuitry 220 to a posit format, the amount of processing resources and/or the amount of time consumed in performing the operations may be reduced in comparison to approaches in which such operations are performed using bit strings in a floating-point format.
The acceleration circuitry 220 may be communicatively coupled to the memory array 230 via one or more channels, interfaces, and/or buses. The memory array 230 may be, for example, a DRAM array, an SRAM array, an STT RAM array, a PCRAM array, a TRAM array, an RRAM array, a NAND flash array, and/or a NOR flash array, although embodiments are not limited to these particular examples. The memory array 230 may serve as the main memory for the computing system. In some embodiments, the memory array 230 may be configured to store bit strings operated on by the acceleration circuitry 220 and/or to store bit strings to be transferred to the acceleration circuitry 220. The array 230 can include memory cells arranged in rows coupled by access lines (which may be referred to herein as word lines or select lines) and columns coupled by sense lines (which may be referred to herein as data lines or digit lines). Although a single array 230 is shown in fig. 2A, embodiments are not so limited. For example, the memory device 204 may include a number of memory arrays 230 (e.g., a number of banks of DRAM cells, NAND flash cells, etc.).
The embodiment of fig. 2A may include additional circuitry not illustrated to avoid obscuring embodiments of the present disclosure. For example, the memory device 204 may include address circuitry to latch address signals provided over I/O connections through I/O circuitry. Address signals may be received and decoded by a row decoder and a column decoder to access the memory device 204 and/or the memory array 230. Those skilled in the art will appreciate that the number of address input connections may depend on the density and architecture of the memory device 204 and/or the memory array 230.
Fig. 2B is a functional block diagram in the form of a computing system 200 deployed in a multi-user network 201 including a shared pool of computing resources 246 including a host 202, a memory device 204 (which may include logic circuitry 222), an application specific integrated circuit 223, a field programmable gate array 221, and a Virtual Compute Cluster (VCC) 251, according to several embodiments of the present disclosure. As shown in fig. 2B, the shared pool of computing resources 246 may further include processing resources 245 and memory resources 247, which may be included within the host 202, separate from the host 202, or a combination thereof. Each of the components (e.g., host 202, conversion component 211, memory device 204, FPGA 221, ASIC 223, VCC 251, etc.) may be individually referred to herein as an "apparatus."
The multi-user network 201 may be a software-defined data center, a cloud computing environment, a data center, or other such network or computing environment in which Virtual Compute Instances (VCIs), Virtual Machines (VMs), virtual workloads, data compute nodes, clusters, and containers, and the like, are deployed. The multi-user network 201 may extend virtualization concepts such as abstraction, aggregation, and automation of data center resources and services to provide information technology as a service (ITaaS). In the multi-user network 201, infrastructure (e.g., networking, processing, and security) may be virtualized and delivered as services. The multi-user network 201 may include software-defined networking and/or software-defined storage. In some embodiments, the components of the multi-user network 201 may be provided, operated, and/or managed through an Application Programming Interface (API). Thus, multiple users may access resources associated with the multi-user network 201 from different locations via, for example, a computing node 207 communicatively coupled to the multi-user network 201. Although a single compute node 207 is shown in fig. 2B, it should be appreciated that multiple compute nodes may be communicatively coupled to the multi-user network 201.
The computing node 207 may be a user device, such as a personal computer, a laptop computer, a tablet, a handset, a smartphone, or another device that can access the multi-user network 201 via, for example, an edge device. The computing node 207 may be configured to send commands to the multi-user network 201 to facilitate performance of operations using the bit strings described herein (e.g., posit bit strings). The commands may include a command to initiate performance of an operation using a bit string, and/or may include one or more parameters that specify criteria under which the operation is to be performed. Table 1 shows several non-limiting examples of parameters that specify criteria according to which operations are to be performed.
Processing time | Processing resources | 8-bit posit (n, es) | 16-bit posit (n, es) | 32-bit posit (n, es) | 64-bit posit (n, es)
15 minutes | 2 cores | (8,0) | (16,0) | (32,0) | (64,0)
30 minutes | 4 cores | (8,1) | (16,1) | (32,1) | (64,1)
45 minutes | 8 cores | (8,2) | (16,2) | (32,2) | (64,2)
60 minutes | 16 cores | (8,3) | (16,3) | (32,3) | (64,3)
90 minutes | 32 cores | (8,4) | (16,4) | (32,4) | (64,4)
TABLE 1
The non-limiting example parameters shown in Table 1 include a processing time (e.g., an amount of time to be allocated for performing operations), processing resources (e.g., an amount of processing resources to be allocated for performing operations), and a posit parameter (e.g., a posit precision parameter, such as a requested bit length and a requested exponent bit length of a bit string to be used for performing operations).
As indicated in Table 1, the processing time may be selected from various preset time ranges (e.g., 15 minutes, 30 minutes, 45 minutes, 60 minutes, 90 minutes). Because access to resources in a multi-user network may be costly and may be priced based on the amount of time for which access is provided, allowing a user to select the time frame within which operations are completed may help the user better plan for the expenses associated with performing the operations described herein within the multi-user network. However, while particular time ranges are shown in Table 1, embodiments are not so limited, and additional processing time ranges may be provided, or processing times (e.g., 20 minutes, 161.80339 minutes, etc.) may be customized via, for example, user input.
As indicated in Table 1, the processing resource parameters may be selected from a variety of preset processing resource parameters available in the multi-user network 201. For example, the number of processing cores (e.g., 2 cores, 4 cores, 8 cores, 16 cores, 32 cores, etc.) of processing resources 245 to be allocated by the multi-user network 201 for performing operations may be selected before the operations are initiated. Because access to resources in a multi-user network may be costly and may be priced based on the number of processing cores requested, allowing a user to select the processing resources 245 used to complete the operations may help the user better plan for the expenses associated with performing the operations described herein within the multi-user network. However, while specific processing resources are shown in Table 1, embodiments are not so limited, and additional processing resources may be provided, or the amount of processing resources requested may be customized via, for example, user input.
As indicated in Table 1, the posit parameters (e.g., posit precision parameters) can be selected from various preset posit parameters (e.g., (8,0), (16,1), (32,4), etc.). The posit parameters shown in Table 1 may correspond to the bit length and the exponent bit length of a posit bit string to be used as an operand when performing arithmetic and/or logical operations. The bit length may correspond to the total number of bits in the posit bit string, while the exponent bit length may correspond to the number of exponent bits (e.g., the exponent bits es, described in more detail herein in connection with FIGS. 3 and 4A-4B). In the notation of Table 1, a posit bit string having a bit length of eight bits and an exponent bit length of two bits can be written as (8,2), and a posit bit string having a bit length of sixty-four bits and an exponent bit length of four bits can be written as (64,4).
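The request parameters of Table 1 can be pictured as a small validated record. The sketch below is a hypothetical Python model, not an interface described in the source: the class name `OperationRequest`, its fields, and the `allow_custom` flag are assumptions made for illustration, and only the preset values come from Table 1. It checks a request against the presets (or accepts custom values when allowed, in the spirit of the custom processing times and resources mentioned above) and renders the (n, es) posit ("hypothetical number") notation.

```python
from dataclasses import dataclass

# Preset values copied from Table 1; the class and its validation rules
# below are hypothetical and exist only for illustration.
PRESET_MINUTES = (15, 30, 45, 60, 90)
PRESET_CORES = (2, 4, 8, 16, 32)
PRESET_BIT_LENGTHS = (8, 16, 32, 64)
PRESET_EXPONENT_LENGTHS = (0, 1, 2, 3, 4)


@dataclass(frozen=True)
class OperationRequest:
    """Hypothetical parameters a compute node might send with a command."""
    minutes: float              # processing time to allocate
    cores: int                  # processing cores to allocate
    n: int                      # posit bit length
    es: int                     # posit exponent bit length
    allow_custom: bool = False  # accept values outside the Table 1 presets

    def validate(self) -> None:
        if self.minutes <= 0 or self.cores <= 0:
            raise ValueError("processing time and cores must be positive")
        # A posit needs at least a sign bit and one regime bit.
        if self.n < 2 or self.es < 0:
            raise ValueError(f"invalid posit parameters ({self.n},{self.es})")
        if self.allow_custom:
            return
        checks = (
            (self.minutes, PRESET_MINUTES),
            (self.cores, PRESET_CORES),
            (self.n, PRESET_BIT_LENGTHS),
            (self.es, PRESET_EXPONENT_LENGTHS),
        )
        for value, presets in checks:
            if value not in presets:
                raise ValueError(f"{value} is not a preset; set allow_custom=True")

    @property
    def notation(self) -> str:
        """The (bit length, exponent bit length) notation of Table 1."""
        return f"({self.n},{self.es})"
```

For example, `OperationRequest(minutes=30, cores=4, n=8, es=2)` validates against the presets and reports its posit parameters as `(8,2)`, while a 20-minute request is rejected unless `allow_custom=True`.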
In some embodiments, the compute node 207 may be configured to display a Graphical User Interface (GUI) to facilitate operations using bit strings that involve the host 202, the memory device 204, the FPGA 221, the ASIC 223, and/or the VCC 251. The compute node 207 may be configured to display a GUI through which requests to perform operations are selected or otherwise input and/or parameters specifying criteria under which the operations are to be performed are selected. For example, the GUI may be similar to the example shown in Table 1, and may allow a user to select a processing time, processing resources, and/or a posit parameter for an operation that uses a posit bit string as an operand. However, embodiments are not so limited, and in some embodiments, the GUI of the compute node 207 may allow a user to enter specific parameters or parameter values not necessarily listed in Table 1.
As shown in FIG. 2B, a host 202 may be coupled to a memory device 204 via a channel 203, which may be similar to the channel 103 illustrated in FIG. 1. A Field Programmable Gate Array (FPGA) 221 can be coupled to the host 202 via a channel 217, and an Application Specific Integrated Circuit (ASIC) 223 can be coupled to the host 202 via a channel 219. In some embodiments, channels 217 and/or 219 may comprise peripheral component interconnect express (PCIe) interfaces; however, embodiments are not so limited, and channels 217 and/or 219 may comprise other types of interfaces, buses, communication channels, etc. to facilitate data transfers between the host 202 and the FPGA 221 and/or the ASIC 223. For example, channels 203, 217, and/or 219 may be communication paths that utilize multi-user network 201 communication protocols such as TCP/IP, MQTT, HTTP, and the like.
In some embodiments, FPGA 221 and/or ASIC 223 may receive a bit string, convert the bit string from a first format (e.g., a floating-point format) to a second format (e.g., a posit format), perform arithmetic and/or logical operations on the posit bit string to produce a resulting posit bit string representing the result of the operation performed on the received posit bit string, and/or convert the resulting bit string from the second format back to the first format based on parameters (such as the parameters shown in Table 1).
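The first step of that pipeline, converting a floating-point value into a posit ("hypothetical number") bit string, can be sketched in Python. This is a minimal sketch under stated assumptions, not the method used by the FPGA 221 or ASIC 223: the helper name `float_to_posit` is hypothetical, the result is returned as an n-bit two's-complement integer, and rounding follows the common posit convention of round-to-nearest, ties-to-even on the bit pattern, with nonzero values saturating at maxpos or minpos instead of becoming ±∞ or zero. Exact `fractions.Fraction` arithmetic avoids any intermediate rounding.

```python
import math
from fractions import Fraction


def float_to_posit(x: float, n: int, es: int) -> int:
    """Hypothetical helper: encode x as an n-bit posit with es exponent bits.

    Returns the bit pattern as an unsigned int holding the two's-complement
    encoding. Rounds to nearest, ties to even, and never rounds a nonzero
    value to zero or to the +/-infinity pattern.
    """
    nar = 1 << (n - 1)                  # one followed by zeros: +/-infinity
    if x == 0:
        return 0
    if math.isinf(x) or math.isnan(x):
        return nar
    v = Fraction(abs(x))                # exact value of the float

    # Scale s = floor(log2(v)), computed exactly.
    s = v.numerator.bit_length() - v.denominator.bit_length()
    if Fraction(2) ** s > v:
        s -= 1
    k, e = divmod(s, 1 << es)           # s = k * 2**es + e, 0 <= e < 2**es
    frac = v / Fraction(2) ** s - 1     # fraction after the hidden one, in [0, 1)

    # Regime: k+1 ones then a zero (k >= 0), or -k zeros then a one (k < 0).
    bits = [1] * (k + 1) + [0] if k >= 0 else [0] * (-k) + [1]
    bits += [(e >> i) & 1 for i in reversed(range(es))]  # exponent, MSB first

    # Fraction bits until there are n - 1 payload bits plus one guard bit.
    while len(bits) < n and frac:
        frac *= 2
        bit = 1 if frac >= 1 else 0
        bits.append(bit)
        frac -= bit
    sticky = bool(frac) or any(bits[n:])  # anything left below the guard bit
    bits += [0] * (n - len(bits))

    payload = int("".join(map(str, bits[: n - 1])), 2)
    guard = bits[n - 1]
    if guard and (sticky or payload & 1):   # round to nearest, ties to even
        payload += 1
    payload = min(max(payload, 1), nar - 1)  # clamp to [minpos, maxpos]

    # Negative values are stored as the two's complement of the magnitude.
    return (-payload) % (1 << n) if x < 0 else payload
```

For example, `float_to_posit(1.0, 8, 0)` yields `0x40`, `float_to_posit(-1.0, 8, 0)` yields its two's complement `0xC0`, and `float_to_posit(477 / 2**27, 16, 3)` yields the bit string `0000110111011101` that is decoded in Table 4.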
As described above, non-limiting examples of arithmetic and/or logical operations that may be performed by the FPGA 221 and/or the ASIC 223 using posit bit strings include: arithmetic operations such as addition, subtraction, multiplication, division, fused multiply-add, multiply-accumulate, dot product units, greater than or less than, absolute value (e.g., FABS()), fast Fourier transforms, inverse fast Fourier transforms, sigmoid functions, convolution, square root, exponential, and/or logarithmic operations; logical operations such as AND, OR, XOR, NOT, etc.; and trigonometric functions such as sine, cosine, tangent, etc.
The FPGA 221 may include a state machine 227 and/or registers 229. State machine 227 may include one or more processing devices configured to operate on inputs and generate outputs. In some embodiments, the FPGA 221 can receive a command (e.g., from the compute node 207) to initiate an operation using one or more bit strings. The command may include one or more parameters, such as those shown in table 1.
For example, FPGA 221 may be configured to receive a bit string, convert the bit string from a first format (e.g., a floating-point format) to a second format (e.g., a posit format), perform arithmetic and/or logical operations on the posit bit string to produce a resulting posit bit string that represents the result of the operation performed on the received posit bit string, and/or convert the resulting bit string from the second format back to the first format based on parameters received with the command to initiate the operation.
Registers 229 of FPGA 221 may be configured to buffer and/or store bit strings before the state machine 227 operates on received posit bit strings. Furthermore, registers 229 of FPGA 221 may be configured to buffer and/or store a resulting posit bit string representing a result of an operation performed on a received posit bit string before the result is transferred to circuitry external to FPGA 221 (e.g., host 202, memory device 204, compute node 207, memory resources 247, etc.).
ASIC 223 may include logic 241 and/or cache 243. Logic 241 may include circuitry configured to operate on inputs and generate outputs. In some embodiments, the ASIC 223 may receive a command (e.g., from the compute node 207) to initiate an operation using one or more bit strings. The command may include one or more parameters, such as those shown in table 1.
In some embodiments, ASIC 223 may be configured to receive a bit string, convert the bit string from a first format (e.g., a floating-point format) to a second format (e.g., a posit format), perform arithmetic and/or logical operations on the posit bit string to produce a resulting posit bit string that represents the result of the operation performed on the received posit bit string, and/or convert the resulting bit string from the second format back to the first format based on parameters received with the command to initiate the operation.
Cache 243 of ASIC 223 may be configured to buffer and/or store a posit bit string before logic 241 operates on the received posit bit string. Furthermore, cache 243 of ASIC 223 may be configured to buffer and/or store a resulting posit bit string representing the result of an operation performed on a received posit bit string before the result is transferred to circuitry external to ASIC 223 (e.g., host 202, memory device 204, compute node 207, memory resources 247, etc.).
Although FPGA 221 is shown to include state machine 227 and registers 229, in some embodiments, FPGA 221 may include logic such as logic 241 and/or a cache such as cache 243 in addition to or in place of state machine 227 and/or registers 229. Similarly, in some embodiments, ASIC 223 may include a state machine, such as state machine 227, and/or registers, such as registers 229, in addition to or in place of logic 241 and/or cache 243.
The VCC 251 may include a scheduling agent, a plurality of Virtual Compute Instances (VCIs), and/or a hypervisor, which are described in more detail herein in connection with FIGS. 6 and 7A-7B. The VCC 251 can be communicatively coupled to the host 202, the memory device 204, the FPGA 221, and/or the ASIC 223 of the multi-user network 201, and/or the VCC 251 can be communicatively coupled to the compute node 207. As described in more detail in connection with FIGS. 7A and 7B, the VCC 251 may facilitate operations to convert bit strings between various formats and/or arithmetic and/or logical operations using bit strings. For example, a VCI (or the hypervisor) of the VCC 251 can have a posit arithmetic agent running thereon that can facilitate operations to convert bit strings between various formats and/or arithmetic and/or logical operations using the bit strings.
An agent may be a set of instructions, code, or script, or some combination of the three, in software, firmware, or hardware residing on a computer or computing device. The agent may communicate with another device or program periodically or aperiodically. The agent may act with or without explicit commands (e.g., to monitor activity, execute commands, or access memory or storage). In some instances, the agent is an autonomous agent. For example, an agent may be configured to execute instructions using a computing resource (e.g., hardware) that is available to the agent in a computing resource pool (e.g., the shared computing resource pool 246 illustrated in FIG. 2B).
In some embodiments, circuitry (e.g., the logic circuitry 122 illustrated in FIG. 1, the acceleration circuitry 220 illustrated in FIG. 2A, the FPGA 221, and/or the ASIC 223) may be configured to receive a request to perform an arithmetic operation and/or a logical operation using at least one posit bit string operand. The request may include at least one of the parameters described above in connection with Table 1. In some embodiments, the request may be received by the circuitry from the compute node 207. In response to the request, the circuitry may perform the arithmetic operation and/or the logical operation using the posit bit string operand(s) based, at least in part, on the received parameters.
For example, if a parameter specifies an amount of computing resources (e.g., an amount of processing resources and/or an amount of memory resources) from the shared pool of computing resources 246 available to the multi-user network 201, the circuitry may be configured to access and allocate the amount of computing resources specified by the parameter for performing the arithmetic and/or logical operations using the at least one posit bit string operand. In some embodiments, to access the specified amount of computing resources, the circuitry may generate a request asking the multi-user network 201 to allocate that amount of computing resources for the arithmetic and/or logical operations using the at least one posit bit string operand.
In another example, if the parameter specifies an amount of time (e.g., a particular amount of time) allowed for arithmetic and/or logical operations using at least one posit bit string operand, the circuitry may be configured to complete the operations within the amount of time specified by the parameter. In some embodiments, the parameters may specify a posit parameter (e.g., a posit precision parameter), as described above in connection with Table 1. In embodiments where the parameter specifies a posit parameter, the circuitry may be configured to generate the posit bit string operand such that its bit length and/or exponent bit length corresponds to the bit length and/or exponent bit length specified by the parameter. The circuitry may then perform arithmetic and/or logical operations using the posit bit string operands based on the specified parameters.
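One way to picture the processing-resource and processing-time parameters is a scheduler that caps the number of workers and enforces a deadline for a batch of operations. The Python sketch below is hypothetical: `run_with_budget` is an illustrative name, threads from `concurrent.futures.ThreadPoolExecutor` stand in for allocated processing cores (in CPython, the global interpreter lock limits true parallelism for pure-Python arithmetic), and an overrun is reported as a `TimeoutError`, which is an assumption rather than a behavior described in the source.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def run_with_budget(
    operation: Callable[[T, T], T],
    operand_pairs: Iterable[Tuple[T, T]],
    cores: int,
    time_limit_s: float,
) -> List[T]:
    """Hypothetical scheduler: apply `operation` to each operand pair on at
    most `cores` workers, aborting if the batch overruns `time_limit_s`."""
    deadline = time.monotonic() + time_limit_s
    with ThreadPoolExecutor(max_workers=cores) as pool:
        futures = [pool.submit(operation, a, b) for a, b in operand_pairs]
        results = []
        try:
            for fut in futures:
                # Each wait gets only whatever remains of the shared budget.
                remaining = max(0.0, deadline - time.monotonic())
                results.append(fut.result(timeout=remaining))
        except FutureTimeout:
            for fut in futures:
                fut.cancel()  # drop queued work that has not started yet
            raise TimeoutError(
                f"operations exceeded the {time_limit_s} s budget"
            ) from None
        return results
```

For example, `run_with_budget(lambda a, b: a + b, [(1, 2), (3, 4)], cores=2, time_limit_s=5.0)` returns `[3, 7]`. In a real deployment the operation would be a posit arithmetic or logical operation, and the worker count would come from the cores allocated by the multi-user network.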
In some embodiments, the circuitry may retrieve the posit bit string operand from a memory location within the shared pool of computing resources 246 before performing the arithmetic operation and/or the logical operation. For example, if the posit bit string operand is stored in memory device 204 (or in another memory resource accessible by the multi-user network 201, such as memory resource 247), the circuitry may generate a request for the posit bit string operand and retrieve it from the memory location in which it is stored before performing the arithmetic and/or logical operation. If the bit string operands are not already in a posit format (e.g., if they are stored in a different format, such as a floating-point format, in memory locations accessible by the multi-user network 201), the circuitry may perform an operation to convert the bit strings to posit bit strings before performing the arithmetic and/or logical operation.
FIG. 3 is an example of an n-bit universal number, or "unum," with es exponent bits. In the example of FIG. 3, the n-bit unum is a posit bit string 331. As shown in FIG. 3, the n-bit posit 331 may include a set of sign bits (e.g., sign bit 333), a set of regime bits (e.g., regime bits 335), a set of exponent bits (e.g., exponent bits 337), and a set of mantissa bits (e.g., mantissa bits 339). Mantissa bits 339 may alternatively be referred to as a "fraction portion" or as "fraction bits," and may represent the portion of a bit string after the binary point.
Sign bit 333 may be zero (0) for positive numbers and one (1) for negative numbers. The regime bits 335 are described below in connection with Table 2, which shows (binary) bit strings and their associated numerical meaning k. In Table 2, the numerical meaning k is determined by the run length of the bit string. The letter X in the binary portion of Table 2 indicates that the bit value is irrelevant for determining the regime, because the run of identical bits terminates either when a bit flips or when the end of the bit string is reached. For example, in the (binary) bit string 0010, the run of zeros terminates when a zero flips to a one, so the final zero is not relevant to the regime. All that is considered for the regime are the leading identical bits and the first opposite bit that terminates the run (if the bit string contains such a bit).
Binary | 0000 | 0001 | 001X | 01XX | 10XX | 110X | 1110 | 1111
Numerical value (k) | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3
TABLE 2
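The run-length rule behind Table 2 can be checked mechanically. In this Python sketch (the function name `regime_value` is hypothetical), a run of m zeros gives k = -m and a run of m ones gives k = m - 1; characters after the terminating opposite bit, such as the X entries in Table 2, are never read. The loop at the end checks every column of Table 2.

```python
def regime_value(bits: str) -> int:
    """Return the regime value k for a bit string read MSB first.

    A run of m zeros gives k = -m; a run of m ones gives k = m - 1.
    The run ends at the first opposite bit or at the end of the string,
    so anything after the terminating bit (e.g. 'X') is ignored.
    """
    if not bits or bits[0] not in "01":
        raise ValueError("regime must start with '0' or '1'")
    lead = bits[0]
    m = 1
    while m < len(bits) and bits[m] == lead:
        m += 1
    return m - 1 if lead == "1" else -m


# Every column of Table 2.
TABLE_2 = {
    "0000": -4, "0001": -3, "001X": -2, "01XX": -1,
    "10XX": 0, "110X": 1, "1110": 2, "1111": 3,
}
for pattern, k in TABLE_2.items():
    assert regime_value(pattern) == k, pattern
```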
In FIG. 3, the regime bits 335 labeled $r$ correspond to the identical bits in the bit string, and the regime bit $\bar{r}$ corresponds to the opposite bit that terminates the bit string. For example, for the numerical value k = -2 shown in Table 2, the regime bits $r$ correspond to the first two leading zeros, while the regime bit $\bar{r}$ corresponds to the one. As noted above, the final bit corresponding to the numerical value k, which is represented by the X in Table 2, is irrelevant to the regime.
If m corresponds to the number of identical bits in the run, then k = -m if the bits are zeros, and k = m - 1 if the bits are ones. This is illustrated in Table 2, where, for example, the (binary) bit string 10XX has a single one, so k = m - 1 = 0. Similarly, the (binary) bit string 0001 contains three zeros, so k = -m = -3. The regime can indicate a scaling factor of $\mathrm{useed}^k$, where $\mathrm{useed} = 2^{2^{es}}$. Table 3 below shows several example values of useed.
es | 0 | 1 | 2 | 3 | 4
useed | 2 | $2^2 = 4$ | $4^2 = 16$ | $16^2 = 256$ | $256^2 = 65536$
TABLE 3
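The useed values in Table 3, together with the extreme values they imply, can be computed directly. This Python sketch assumes the standard posit relations maxpos = useed^(n-2) and minpos = 1/maxpos, for the largest and smallest positive n-bit posits (whose regimes fill every bit after the sign bit); the function names are illustrative. For a 3-bit posit with es = 2, as in FIG. 4A, it gives maxpos = 16 and minpos = 1/16.

```python
from fractions import Fraction


def useed(es: int) -> int:
    """useed = 2**(2**es); each extra exponent bit squares the previous useed."""
    return 2 ** (2 ** es)


def maxpos(n: int, es: int) -> int:
    """Largest positive n-bit posit: a regime of n - 1 ones, so k = n - 2,
    leaving no room for exponent or mantissa bits."""
    return useed(es) ** (n - 2)


def minpos(n: int, es: int) -> Fraction:
    """Smallest positive n-bit posit: n - 2 zeros then a one, i.e. 1 / maxpos."""
    return Fraction(1, maxpos(n, es))


# Reproduce Table 3 as (es, useed) pairs.
table_3 = [(es, useed(es)) for es in range(5)]
```

Because each useed is the square of the previous one, adding a single exponent bit squares the dynamic range covered by each regime step, which is why maxpos grows so quickly with es.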
Exponent bits 337 correspond to an exponent e, which is an unsigned number. In contrast to floating-point numbers, the exponent bits 337 described herein may have no bias associated with them. Thus, the exponent bits 337 described herein may represent scaling by a factor of $2^e$. As shown in FIG. 3, there may be up to es exponent bits ($e_1, e_2, e_3, \ldots, e_{es}$), depending on how many bits remain to the right of the regime bits 335 of the n-bit posit 331. In some embodiments, this may give the n-bit posit 331 tapered accuracy, in which numbers closer to one in magnitude have greater accuracy than the largest or smallest numbers. However, because very large or very small numbers may be used infrequently in certain kinds of operations, the tapered accuracy of the n-bit posit 331 shown in FIG. 3 may be desirable in a wide range of situations.
Mantissa bits 339 (or fraction bits) represent any additional bits of the n-bit posit 331 that are located to the right of the exponent bits 337. Similar to floating-point bit strings, the mantissa bits 339 represent a fraction f, which may be analogous to the fraction 1.f, where f includes one or more bits to the right of the binary point. In contrast to floating-point bit strings, however, which may include subnormal numbers whose "hidden bit" is zero (e.g., 0.f), the "hidden bit" of the n-bit posit 331 shown in FIG. 3 may always be one (e.g., 1.f).
FIG. 4A is an example of positive values for a 3-bit posit 431. In FIG. 4A, only the right half of the projective real numbers is shown; however, it should be appreciated that negative values corresponding to the positive values shown in FIG. 4A may exist on a curve representing a reflection about the y-axis of the curve shown in FIG. 4A.
In the example of FIG. 4A, es = 2, and thus $\mathrm{useed} = 2^{2^{2}} = 16$.
The precision of the posit 431 may be increased by appending bits to its bit string, as shown in FIG. 4B. For example, appending a bit with a value of one (1) to the bit string of the posit 431 increases the precision of the posit 431, as shown by the posit 431-2 in FIG. 4B. Similarly, appending a bit with a value of one to the bit string of the posit 431-2 in FIG. 4B increases the precision of the posit 431-2, as shown by the posit 431-3 in FIG. 4B. The following is an example of interpolation rules that may be used to append bits to the bit string of the posit 431 shown in FIG. 4A to obtain the posits 431-2, 431-3 illustrated in FIG. 4B.
If maxpos is the largest positive value of the bit strings of the posits 431-1, 431-2, 431-3 and minpos is the smallest positive value of the bit strings of the posits 431-1, 431-2, 431-3 shown in FIG. 4B, maxpos may be equivalent to useed and minpos may be equivalent to $1/\mathrm{useed}$. Between maxpos and $\pm\infty$, a new bit value may be $\mathrm{maxpos} \times \mathrm{useed}$, and between zero and minpos, a new bit value may be $\mathrm{minpos}/\mathrm{useed}$. These new bit values may correspond to new regime bits 335. Between existing values $x = 2^m$ and $y = 2^n$, where m and n differ by more than one, the new bit value may be given by the geometric mean:

$$\sqrt{x \times y} = 2^{(m+n)/2}$$

which corresponds to a new exponent bit 337. If the new bit value is midway between the existing x value and the y value immediately following it, the new bit value may represent the arithmetic mean

$$\frac{x + y}{2}$$

which corresponds to a new mantissa bit 339.
FIG. 4B is an example of posit construction using two exponent bits. In FIG. 4B, only the right half of the projective real numbers is shown; however, it should be appreciated that negative values corresponding to the positive values shown in FIG. 4B may exist on a curve representing a reflection about the y-axis of the curve shown in FIG. 4B. The posits 431-1, 431-2, 431-3 shown in FIG. 4B each contain only two exception values: zero (0) when all bits of the bit string are zero, and $\pm\infty$ when the bit string is a one (1) followed by all zeros. It should be noted that the numerical values of the posits 431-1, 431-2, 431-3 shown in FIG. 4B are exactly $\mathrm{useed}^k$. That is, they are exactly useed raised to the power of the k value represented by the regime (e.g., the regime bits 335 described above in connection with FIG. 3). In FIG. 4B, the posit 431-1 has es = 2, so $\mathrm{useed} = 2^{2^{2}} = 16$; the posit 431-2 has es = 3, so $\mathrm{useed} = 2^{2^{3}} = 256$; and the posit 431-3 has es = 4, so $\mathrm{useed} = 2^{2^{4}} = 65536$.
As an illustrative example of adding bits to the 3-bit posit 431-1 to create the 4-bit posit 431-2 of FIG. 4B, useed = 256, so the bit string corresponding to the useed of 256 has an additional regime bit appended to it, and the former useed, 16, has a terminating regime bit $\bar{r}$ appended to it. As described above, between existing values, the corresponding bit strings have additional exponent bits appended to them. For example, the numerical values 1/16, 1/4, 1, and 4 will have exponent bits appended to them. That is, the final one exponent bit corresponds to the numerical value 4, the final zero exponent bit corresponds to the numerical value 1, and so on. This pattern can be further seen in the posit 431-3, which is a 5-bit posit generated from the 4-bit posit 431-2 according to the rules above. If another bit were added to the posit 431-3 in FIG. 4B to generate a 6-bit posit, mantissa bits 339 would be appended to the numerical values between 1/16 and 16.
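The interpolation rules can be replayed on exact fractions to reproduce the pattern of values discussed above. This Python sketch illustrates the rules only and does not model the circuitry: the helper names are hypothetical, and it assumes useed = 16 (es = 2) is held fixed at every step, which yields the 4-bit values 1/256, 1/16, 1/4, 1, 4, 16, and 256 from the 3-bit values 1/16, 1, and 16.

```python
from fractions import Fraction
from typing import List, Optional


def log2_if_power_of_two(v: Fraction) -> Optional[int]:
    """Return m if v == 2**m exactly, otherwise None."""
    num, den = v.numerator, v.denominator
    if den == 1 and num & (num - 1) == 0:
        return num.bit_length() - 1
    if num == 1 and den & (den - 1) == 0:
        return -(den.bit_length() - 1)
    return None


def append_one_bit(values: List[Fraction], useed: int) -> List[Fraction]:
    """Given the sorted positive values of an n-bit posit, return the sorted
    positive values of the (n+1)-bit posit under the interpolation rules."""
    new = [values[0] / useed]                # between 0 and minpos: regime bit
    for x, y in zip(values, values[1:]):
        new.append(x)
        mx, my = log2_if_power_of_two(x), log2_if_power_of_two(y)
        if mx is not None and my is not None and my - mx > 1:
            # Geometric mean 2**((m+n)/2): a new exponent bit. The exponent
            # sum is even here because the gaps are powers of two >= 2.
            new.append(Fraction(2) ** ((mx + my) // 2))
        else:
            new.append((x + y) / 2)          # arithmetic mean: a mantissa bit
    new.append(values[-1])
    new.append(values[-1] * useed)           # between maxpos and infinity: regime bit
    return new


three_bit = [Fraction(1, 16), Fraction(1), Fraction(16)]
four_bit = append_one_bit(three_bit, 16)
five_bit = append_one_bit(four_bit, 16)
six_bit = append_one_bit(five_bit, 16)
```

In this run, the 6-bit step is the first to insert arithmetic means (new mantissa bits), and every one of them lies between 1/16 and 16, matching the description above.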
The following is a non-limiting example of decoding a posit (e.g., the posit 431) to obtain its numerical equivalent. In some embodiments, the bit string corresponding to a posit p is interpreted as a signed (two's complement) integer ranging from $-2^{n-1}$ to $2^{n-1} - 1$, k is an integer corresponding to the regime bits 335, and e is an unsigned integer corresponding to the exponent bits 337. If the set of mantissa bits 339 is denoted $\{f_1 f_2 \ldots f_{fs}\}$ and f is the value represented by $1.f_1 f_2 \ldots f_{fs}$ (e.g., a one, followed by the binary point, followed by the mantissa bits 339), then the numerical value x of p may be given by Equation 1 below.

$$x = \begin{cases} 0, & p = 0, \\ \pm\infty, & p = -2^{n-1}, \\ \operatorname{sign}(p) \times \mathrm{useed}^{k} \times 2^{e} \times f, & \text{all other } p. \end{cases}$$

Equation 1
Another illustrative example of decoding a posit bit string is provided below in connection with the posit bit string 0000110111011101 shown in Table 4.
Sign | Regime | Exponent | Mantissa
0 | 0001 | 101 | 11011101
TABLE 4
In Table 4, the posit bit string 0000110111011101 is broken down into its constituent sets of bits (e.g., sign bit 333, regime bits 335, exponent bits 337, and mantissa bits 339). Since es = 3 in the posit bit string shown in Table 4 (e.g., because there are three exponent bits), useed = 256. Because the sign bit 333 is zero, the value of the numerical expression corresponding to the posit bit string shown in Table 4 is positive. The regime bits 335 have a run of three consecutive zeros, corresponding to a value of -3 (as described above in connection with Table 2). As a result, the scale factor contributed by the regime bits 335 is $256^{-3}$ (e.g., $\mathrm{useed}^k$). The exponent bits 337 represent five (5) as an unsigned integer and therefore contribute an additional scale factor of $2^e = 2^5 = 32$. Lastly, the mantissa bits 339, given in Table 4 as 11011101, represent two hundred twenty-one (221) as an unsigned integer, so the value f given above is

$$f = 1 + \frac{221}{256}$$

Using these values and Equation 1, the numerical value corresponding to the posit bit string given in Table 4 is

$$+256^{-3} \times 2^{5} \times \left(1 + \frac{221}{256}\right) = \frac{477}{134217728} \approx 3.55393 \times 10^{-6}$$
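Equation 1 and the Table 4 walk-through translate into a short decoder. In this Python sketch, the function name `decode_posit` is hypothetical; it returns an exact `fractions.Fraction`, returns `None` for the single ±∞ pattern, decodes negative posits by taking the two's complement first, and treats exponent bits that do not fit in the word as zeros (an assumption about truncated exponents that the text above does not state).

```python
from fractions import Fraction
from typing import Optional


def decode_posit(p: int, n: int, es: int) -> Optional[Fraction]:
    """Decode an n-bit posit bit string p (given as an int) per Equation 1."""
    mask = (1 << n) - 1
    p &= mask
    if p == 0:
        return Fraction(0)                 # all bits zero
    if p == 1 << (n - 1):
        return None                        # a one followed by zeros: +/- infinity
    negative = bool(p >> (n - 1))          # sign bit
    if negative:
        p = (-p) & mask                    # two's complement, then decode magnitude

    rest = [(p >> i) & 1 for i in reversed(range(n - 1))]  # bits after the sign

    # Regime: a run of identical bits; k = -m for zeros, k = m - 1 for ones.
    lead, m = rest[0], 1
    while m < len(rest) and rest[m] == lead:
        m += 1
    k = m - 1 if lead else -m
    pos = m + 1                            # skip the run and its terminating bit

    # Exponent: up to es unsigned bits; missing low-order bits count as zero.
    exp_bits = rest[pos:pos + es]
    e = 0
    for b in exp_bits:
        e = (e << 1) | b
    e <<= es - len(exp_bits)

    # Mantissa: f = 1.f1 f2 ... (the hidden bit is always one).
    f = Fraction(1)
    for i, b in enumerate(rest[pos + es:], start=1):
        f += Fraction(b, 2 ** i)

    useed = 2 ** (2 ** es)
    value = Fraction(useed) ** k * 2 ** e * f
    return -value if negative else value
```

Decoding the Table 4 bit string with n = 16 and es = 3, `decode_posit(0b0000110111011101, 16, 3)`, reproduces the exact value 477/134217728.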
Fig. 5 is a functional block diagram in the form of an apparatus 500 including acceleration circuitry 520 according to several embodiments of the present disclosure. The acceleration circuitry 520 may include logic circuitry 522 and memory resource 524, which may be similar to the logic circuitry 122/222 and memory resource 124/224 illustrated in fig. 1 and 2 herein. Logic circuitry 522 and/or memory resources 524 may be considered "devices" individually.
The acceleration circuitry 520 may be configured to receive a command (e.g., a start command) from a host (e.g., host 102/202 illustrated in fig. 1 and 2 herein) and/or a controller (e.g., controller 210 illustrated in fig. 2 herein) that initiates one or more operations (e.g., format conversion operations, arithmetic operations, logical operations, bitwise operations, etc.) on data stored in the memory resources 524. Once the start command has been received by the acceleration circuitry 520, the acceleration circuitry may perform the operations described above without an intervening command from the host and/or controller. For example, the acceleration circuitry 520 may include sufficient processing resources and/or instructions to operate on bit strings stored in the memory resources 524 without receiving additional commands from circuitry external to the acceleration circuitry 520.
The logic circuitry 522 may be an Arithmetic Logic Unit (ALU), a state machine, a sequencer, a controller, an instruction set architecture, or another type of control circuitry. As described above, the ALU may include circuitry that performs operations such as those described above on integer binary numbers, such as bit strings in a posit format (e.g., operations that convert a bit string from a first format (a floating-point format) to a second format (a posit format) and/or arithmetic operations, logical operations, bitwise operations, etc.). The Instruction Set Architecture (ISA) may include a Reduced Instruction Set Computing (RISC) device. In embodiments where the logic circuitry 522 comprises a RISC device, the RISC device may comprise processing resources that employ an ISA such as the RISC-V ISA; however, embodiments are not limited to the RISC-V ISA, and other processing devices and/or ISAs may be used.
In some embodiments, logic circuitry 522 may be configured to execute instructions (e.g., instructions stored in the INSTR 525 portion of memory resource 524) to perform the above operations. For example, the logic circuitry 522 may be provided with sufficient processing resources to perform such operations on data (e.g., on bit strings) received by the acceleration circuitry 520.
Once operated on by the logic circuitry 522, the resulting bit string may be stored in the memory resource 524 and/or a memory array (e.g., memory array 230 illustrated in FIG. 2 herein). The stored resulting bit string may be addressed so that it can be used to perform an operation. For example, the bit string may be stored in the memory resource 524 and/or a memory array at a particular physical address (which may have a corresponding logical address) so that the bit string can be accessed when performing an operation.
In some embodiments, memory resource 524 may be a random access memory (e.g., RAM, SRAM, etc.). However, embodiments are not so limited, and memory resource 524 may include various registers, caches, buffers, and/or memory arrays (e.g., 1T1C, 2T2C, 3T, etc. DRAM arrays). The memory resource 524 may be configured to receive bit strings from, for example, a host (such as host 102/202 illustrated in FIGS. 1 and 2) and/or a memory array (such as memory array 130/230 illustrated in FIGS. 1 and 2). In some embodiments, the memory resource 524 may have a size of approximately 256 kilobytes (KB); however, embodiments are not limited to this particular size, and the memory resource 524 may have a size greater than or less than 256 KB.
Memory resource 524 may be partitioned into one or more addressable memory regions. As shown in FIG. 5, memory resource 524 may be partitioned into addressable memory regions so that various types of data may be stored therein. For example, one or more memory regions may store instructions ("INSTR") 525 executed by the logic circuitry 522, one or more memory regions may store data 526-1, ..., 526-N (e.g., data such as bit strings retrieved from a host and/or memory array), and/or one or more memory regions may serve as a local memory ("LOCAL MEM") 528 portion of memory resource 524. Although 20 different memory regions are shown in FIG. 5, it should be appreciated that memory resource 524 may be partitioned into any number of different memory regions.
As discussed above, the bit string may be retrieved from the host and/or the memory array in response to a message and/or command generated by the host, a controller (e.g., controller 210 illustrated in fig. 2 herein), or logic circuitry 522. In some embodiments, commands and/or messages may be processed by logic circuitry 522. Once the bit string is received by the acceleration circuitry 520 and stored in the memory resource 524, it may be processed by the logic circuitry 522. Processing the bit string by the logic circuitry 522 may include converting the bit string from a first format to a second format, performing arithmetic and/or logical operations on the converted bit string, and/or converting the bit string on which the operation has been performed from the second format to the first format.
In a non-limiting neural network training application, the acceleration circuitry 520 may convert a floating-point bit string to an 8-bit posit with es = 0. In contrast to some approaches that utilize half-precision 16-bit floating-point bit strings for neural network training, an 8-bit posit bit string with es = 0 can provide comparable neural network training results two to four times faster than half-precision 16-bit floating-point bit strings.
A common function used in training neural networks is the sigmoid function f(x) (e.g., a function that asymptotically approaches zero as $x \to -\infty$ and asymptotically approaches one as $x \to \infty$). An example of a sigmoid function that may be used in neural network training applications is

$$f(x) = \frac{1}{1 + e^{-x}}$$

which may require up to one hundred clock cycles to compute using a half-precision 16-bit floating-point bit string. However, using an 8-bit posit with es = 0, the same function can be evaluated by the acceleration circuitry 520 flipping the first bit of the posit representing x and shifting the result two bits to the right, an operation that may take at least an order of magnitude fewer clock signals than evaluating the same function using a half-precision 16-bit floating-point bit string.
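The bit-flip-and-shift evaluation can be tried directly on 8-bit, es = 0 posits. This Python sketch is illustrative: the function names are hypothetical, the shift is taken to be a logical (unsigned) right shift of the 8-bit pattern, and the result approximates 1/(1 + e^{-x}) rather than evaluating it exactly. For example, x = 1 is the pattern 0x40; flipping the first bit gives 0xC0, and shifting right by two gives 0x30, which decodes to 0.75, compared with 1/(1 + e^{-1}) ≈ 0.731.

```python
import math


def posit8_es0_to_float(p: int) -> float:
    """Decode an 8-bit posit with es = 0 (useed = 2); 0x80 (+/-inf) maps to NaN."""
    p &= 0xFF
    if p == 0:
        return 0.0
    if p == 0x80:
        return math.nan
    sign = -1.0 if p & 0x80 else 1.0
    if sign < 0:
        p = (-p) & 0xFF                      # two's complement for negative posits
    bits = [(p >> i) & 1 for i in range(6, -1, -1)]  # the 7 bits after the sign
    lead, m = bits[0], 1
    while m < 7 and bits[m] == lead:
        m += 1
    k = m - 1 if lead else -m                # regime run-length rule
    frac_bits = bits[m + 1:]                 # es = 0: the fraction follows the regime
    f = 1.0 + sum(b / 2 ** (i + 1) for i, b in enumerate(frac_bits))
    return sign * 2.0 ** k * f


def fast_sigmoid_bits(p: int) -> int:
    """Approximate 1/(1 + exp(-x)) for an 8-bit, es = 0 posit x by flipping
    the first bit and logically shifting the 8-bit pattern right by two."""
    return ((p ^ 0x80) & 0xFF) >> 2


# Compare the bit trick against the exact sigmoid for a few inputs.
for x_bits in (0xC0, 0x00, 0x20, 0x40, 0x60):
    x = posit8_es0_to_float(x_bits)
    approx = posit8_es0_to_float(fast_sigmoid_bits(x_bits))
    exact = 1 / (1 + math.exp(-x))
    print(f"x={x:+.2f}  fast={approx:.4f}  exact={exact:.4f}")
```

The trick replaces an exponential and a division with one XOR and one shift, which is why it costs so few clock cycles; the price is an approximation whose error is visible at these sample points.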
In this example, operating the acceleration circuitry 520 to convert a floating-point bit string to an 8-bit posit bit string having es = 0, and then operating the acceleration circuitry 520 to perform operations that evaluate the example sigmoid function on the 8-bit posit bit string, may reduce processing time, resource consumption, and/or memory space compared with approaches that lack acceleration circuitry 520 configured to perform such conversions and subsequent operations. This reduction may improve the functioning of the computing device by reducing the number of clock signals used to perform such operations (which may reduce the power consumed by the computing device and/or the time taken to perform the operations) and by freeing processing and/or memory resources for other tasks and functions.
FIG. 6 is a diagram of a host 602, a hypervisor 642, a plurality of Virtual Compute Instances (VCIs) 641-1, 641-2, ..., 641-N, and a posit operation agent 643, according to several embodiments of the present disclosure. The system may include a host 602 having processing resources 645 (e.g., one or more processors), memory resources 647 (e.g., one or more main memory devices, such as memory device 204 illustrated in fig. 2A and 2B herein, and/or storage memory devices), and/or a network interface 649. Host 602 may be included in a multi-user network, such as multi-user network 201 illustrated in fig. 2B. Multi-user networks may extend virtualization concepts, such as abstraction, aggregation, and automation of data center resources and services, to provide information technology as a service (ITaaS). In a multi-user network, infrastructure such as networking, processing, and security may be virtualized and delivered as a service. The multi-user network may include software-defined networking and/or software-defined storage. In some embodiments, the components of the multi-user network may be provisioned, operated, and/or managed through an Application Programming Interface (API).
The host 602 may incorporate a hypervisor 642 that may execute several VCIs 641-1, 641-2, ..., 641-N (collectively referred to herein as "VCIs 641"). The VCIs 641 may be provisioned with processing resources 645 and/or memory resources 647 and may communicate via a network interface 649. The processing resources 645 and memory resources 647 provisioned to the VCIs 641 may be local and/or remote to the host 602 (e.g., the VCIs 641 may ultimately be executed by hardware that is not physically associated with the VCIs 641). For example, in a multi-user network, the VCIs 641 may be provisioned with resources that are generally available to the multi-user network and not tied to any particular hardware device. By way of example, the memory resources 647 may include volatile and/or non-volatile memory available to the VCIs 641. The VCIs 641 may be moved to a different host (not specifically illustrated) such that a different hypervisor manages them. In some embodiments, the host 602 may be connected to (e.g., in communication with) a posit operation agent 643, which may be deployed on a VCI 641 or a container (not explicitly shown).
The VCIs 641 may include one or more containers on which containerized workloads run. The containerized workloads may correspond to one or more applications, or portions of applications, executed by the VCIs 641 and/or the host 602. An application may be configured to perform certain tasks and/or functions with respect to the VCIs 641 and/or the host 602, such as converting bit strings between various formats and performing arithmetic and/or logical operations using posit bit strings. Executing an application as multiple containerized workloads may improve its scalability and/or portability compared with approaches in which the application is monolithic.
The posit operation agent 643 may be configured to cause operations to be performed, such as operations that convert bit strings between various formats and/or arithmetic and/or logical operations on the bit strings, as described in more detail herein. In some embodiments, the posit operation agent 643 may be deployed on (e.g., may run on) one or more of the host 602 and/or the VCIs 641.
In some embodiments, the posit operation agent 643 may include a combination of software and hardware, or the posit operation agent 643 may include software and may be provided by the processing resources 645. Examples of the posit operation agent 643 are illustrated and described in more detail herein in connection with FIGS. 7A and 7B. In some embodiments, operations performed by the posit operation agent 643 may be scheduled by a container scheduling agent (e.g., scheduling agent 752 illustrated in FIGS. 7A and 7B herein) such as DOCKER, etc.
The posit operation agent 643 may be deployed in a multi-user network (such as the multi-user network 201 illustrated in fig. 2B herein). The posit operation agent 643 may be configured to receive parameters corresponding to at least one of an arithmetic operation and a logical operation using one or more posit bit strings. The parameters may include at least one of the parameters described above in connection with Table 1. For example, the parameters may include a processing time parameter, a parameter corresponding to an amount of processing resources to be used for performing operations using the one or more posit bit strings, a parameter corresponding to a bit length of the one or more posit bit strings, a parameter corresponding to a number of exponent bits of the one or more posit bit strings, or a combination thereof.
The posit operation agent 643 may be configured to allocate, based on the parameters, computing resources available to the multi-user network for performing arithmetic operations and/or logical operations using one or more posit bit strings. For example, the posit operation agent 643 may be configured to allocate an amount of time available for performing the arithmetic and/or logical operations, an amount of processing resources available for performing the arithmetic and/or logical operations, a bit length of the posit bit strings to be used for performing the arithmetic and/or logical operations, and/or an exponent bit length of the posit bit strings to be used for performing the arithmetic and/or logical operations.
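One way to picture the allocation step is the minimal Python sketch below. Every name in it (`PositOpParams`, `SharedPool`, `Allocation`, `allocate`) is hypothetical. The policy it implements, granting what was requested but capping it by what the shared pool has free, is an assumption for illustration and not the allocation policy of the disclosure.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PositOpParams:
    """Hypothetical request parameters sent by a compute node."""
    max_time_s: float    # processing time parameter
    cores: int           # processing resource parameter
    bit_length: int      # posit bit length n
    exponent_bits: int   # posit exponent bit count es

@dataclass
class SharedPool:
    """Toy model of the multi-user network's shared pool of computing resources."""
    free_cores: int
    max_time_s: float    # longest time slice granted per request (assumed policy)

@dataclass(frozen=True)
class Allocation:
    cores: int
    time_s: float
    bit_length: int
    exponent_bits: int

def allocate(pool: SharedPool, p: PositOpParams) -> Allocation:
    """Grant resources for a posit operation, capped by what the pool has available."""
    if p.cores < 1 or p.max_time_s <= 0:
        raise ValueError("request must ask for at least one core and a positive time")
    if p.bit_length < 2 or p.exponent_bits < 0:
        raise ValueError("posit needs n >= 2 and es >= 0")
    cores = min(p.cores, pool.free_cores)
    if cores == 0:
        raise RuntimeError("shared pool has no free cores")
    pool.free_cores -= cores                       # reserve the granted cores
    return Allocation(cores, min(p.max_time_s, pool.max_time_s),
                      p.bit_length, p.exponent_bits)
```

With a pool of 16 free cores and a 2-second cap, a request for 4 cores and 5 seconds of 8-bit, es = 0 work is granted 4 cores and 2 seconds, leaving 12 cores free for other tenants.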
In some embodiments, the posit operation agent 643 may receive a request to initiate an arithmetic operation and/or a logical operation using one or more posit bit strings, and/or may cause the arithmetic operation and/or logical operation to be performed using the one or more posit bit strings based, at least in part, on the received parameters. For example, the posit operation agent 643 may access circuitry (such as the logic circuitry 122 illustrated in FIG. 1, or the FPGA 221 and/or the ASIC 223 illustrated in FIG. 2B) to perform arithmetic operations and/or logical operations using one or more posit bit strings. The request and/or parameters may be received from a compute node, such as compute node 207 illustrated in fig. 2B herein.
If the bit strings used to perform the arithmetic and/or logical operations are stored in a repository of the multi-user network (e.g., memory device 204 or memory resource 247 illustrated in fig. 2B, or another data store or data repository associated with the multi-user network), the posit operation agent 643 may be configured to retrieve the one or more posit bit strings from the repository accessible by the multi-user network before causing the arithmetic and/or logical operations to be performed using them.
In some embodiments, the bit strings may be stored according to a format other than a posit format. For example, the bit strings may be stored in a floating-point format. If a bit string requested for performing the arithmetic and/or logical operation is stored in a format other than the posit format, the posit operation agent 643 may be configured to perform an operation that converts the bit string into the posit format prior to performing the arithmetic operation and/or the logical operation (or causing it to be performed using, for example, the logic circuitry 122 illustrated in FIG. 1, or the FPGA 221 and/or the ASIC 223 illustrated in FIG. 2B).
Fig. 7A is a diagram of a Virtual Compute Cluster (VCC) 751, according to several embodiments of the present disclosure. VCC 751, which may be similar to VCC 251 illustrated in fig. 2B, may be deployed in a multi-user network such as multi-user network 201 illustrated in fig. 2B herein. As shown in fig. 7A, a cluster 751 (e.g., VCC) may include multiple Virtual Compute Instances (VCIs) 741 provided with a pool of computing resources (e.g., shared pool of computing resources 246 illustrated in fig. 2B) and executable by hardware. In some embodiments, at least a first VCI (e.g., VCI 741-1) is deployed on a first hypervisor (e.g., hypervisor 742-1) of the VCC 751 and at least a second VCI (e.g., VCI 741-2) is deployed on a second hypervisor (e.g., hypervisor 742-M) of the VCC 751. Although not explicitly shown, in some embodiments, the VCI 741 may include a container running thereon.
Each VCI 741 may include a corresponding posit operation agent 743. For example, a first posit operation agent 743-1 may be deployed on a first VCI 741-1, a second posit operation agent 743-2 may be deployed on a second VCI 741-2, and an Nth posit operation agent 743-N may be deployed on an Nth VCI 741-N. As described above, the posit operation agents 743 may be configured to perform, or cause to be performed, operations such as converting bit strings between various formats, as well as arithmetic and/or logical operations using the converted (e.g., posit) bit strings. In some embodiments, a posit operation agent may be provided as a posit operation engine and/or a posit operation module, as described in more detail herein in connection with FIGS. 8 and 9.
The scheduling agent 752 may be provided with computing resources and may be configured to coordinate the deployment of VCIs 741 and/or containers within the VCC 751. In some embodiments, the scheduling agent 752 may be a container scheduler such as DOCKER, etc. The scheduling agent 752 may determine when to deploy a VCI 741 (or container) to run a posit operation agent 743 in response to a request, received by the VCC 751, to perform an operation on posit bit strings. For example, if a request to perform a particular arithmetic and/or logical operation using posit bit strings is received, the scheduling agent 752 may deploy a VCI (e.g., VCI 741-1) and/or a container to run a posit operation agent (e.g., posit operation agent 743-1) to facilitate performance of the requested operation.
Fig. 7B is another diagram of a virtual compute cluster 751, according to several embodiments of the present disclosure. VCC 751 may be deployed in a multi-user network, such as multi-user network 201 illustrated in fig. 2B herein. As shown in fig. 7B, a cluster 751 (e.g., VCC) may include a plurality of Virtual Compute Instances (VCIs) 741 provided with a pool of computing resources (e.g., processing resources 645 and/or memory resources 647 illustrated in fig. 6 herein) and executable by hardware. In some embodiments, at least a first VCI (e.g., VCI 741-1) is deployed on a first hypervisor (e.g., hypervisor 742-1) of the VCC 751 and at least a second VCI (e.g., VCI 741-2) is deployed on a second hypervisor (e.g., hypervisor 742-M) of the VCC 751. Although not explicitly shown, in some embodiments, the VCI 741 may include a container.
The hypervisors 742-1, ..., 742-M may include corresponding posit operation agents 743. For example, a first posit operation agent 743-1 may be deployed on a first hypervisor 742-1, and an Mth posit operation agent 743-M may be deployed on an Mth hypervisor 742-M. As described above, the posit operation agents 743 may be configured to perform, or cause to be performed, operations such as converting bit strings between various formats, as well as arithmetic and/or logical operations using the converted (e.g., posit) bit strings. The posit operation agents are described in more detail herein in connection with FIGS. 8 and 9.
In some embodiments, the posit operation agent 743 may be provided with computing resources and may be executed by hardware. For example, the posit operation agent 743 may be provided with computing resources (e.g., processing resources, memory resources, etc.) that are available to a multi-user network, such as the multi-user network 201 illustrated in fig. 2B herein. As described in more detail herein, due to the dynamic nature of multi-user networks, the posit operation agent 743 can be deployed on a VCI, as shown in fig. 7A, or on a hypervisor 742, as shown in fig. 7B. Regardless of where the posit operation agent 743 is deployed, it may ultimately be executed by hardware available to the multi-user network or the VCC 751.
The posit operation agent 743 may be configured to receive a request to perform at least one of an arithmetic operation and a logical operation between a first posit bit string operand and a second posit bit string operand, as described above. In some embodiments, the posit operation agent 743 may be configured to allocate an amount of computing resources for performing the arithmetic operation and/or logical operation between the first and second posit bit string operands, and/or to cause that operation to be performed. The amount of computing resources the posit operation agent 743 allocates for performing the operations may be based on various parameters, such as those described above in connection with Table 1.
In some embodiments, circuitry (e.g., logic circuitry 122 illustrated in fig. 1, acceleration circuitry 220 illustrated in fig. 2A, or FPGA 221 and/or ASIC 223 illustrated in fig. 2B) may be communicatively coupled to the VCC 751, as shown in fig. 2B. For example, because the VCC 751 can be deployed in a multi-user network (such as multi-user network 201 illustrated in FIG. 2B), the circuitry can be accessed by the VCC 751. In some embodiments, the posit operation agent 743 may cause the first posit bit string operand and the second posit bit string operand to be loaded into the circuitry, and the circuitry may be configured to perform the arithmetic operation and/or the logical operation between them, as described above.
If the bit strings are not available in the posit format (e.g., if the requested bit strings are stored in a floating-point format), the posit operation agent 743 may access the circuitry and cause a first floating-point bit string and a second floating-point bit string to be loaded into the circuitry. However, embodiments are not limited to floating-point bit strings, and the bit strings may be in other numerical formats, such as fixed-width formats.
Once the bit strings are loaded into the circuitry, the circuitry may convert the first floating-point bit string to the posit format to produce the first posit bit string operand and convert the second floating-point bit string to the posit format to produce the second posit bit string operand. After converting the floating-point bit strings to the posit format, the circuitry may perform arithmetic and/or logical operations between the first posit bit string operand and the second posit bit string operand, as described herein in connection with fig. 1, 2A-2B, and 5.
As described above, the posit operation agent 743 may receive various parameters and perform (or cause to be performed) operations that convert bit strings between various formats, as well as arithmetic and/or logical operations using the bit strings. For example, the posit operation agent 743 may receive a parameter as part of a request to perform an operation received from a compute node (such as compute node 207 illustrated in fig. 2B herein).
For example, the posit operation agent 743 can receive a processing resource parameter corresponding to performing an arithmetic operation and/or a logical operation, and allocate an amount of computing resources available for performing the arithmetic operation and/or the logical operation based at least in part on the processing resource parameter. In another example, the posit operation agent 743 can receive a processing time parameter corresponding to performing an arithmetic operation and/or a logical operation, and allocate an amount of time available for performing the arithmetic operation and/or the logical operation based at least in part on the processing time parameter.
In yet another example, the posit operation agent 743 may receive a posit precision parameter corresponding to performing at least one of an arithmetic operation and a logical operation. Based at least in part on the posit precision parameter, it may set a bit length of the first posit bit string operand and the second posit bit string operand and/or set an exponent bit length of the first and second posit bit string operands.
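The effect of the bit length n and exponent bit length es set by such a precision parameter can be quantified with the standard posit dynamic-range relations: useed = 2^(2^es), maxpos = useed^(n-2), and minpos = 1/maxpos. The Python sketch below computes these values exactly. The `PRECISION_PRESETS` mapping from a precision level to (n, es) is a hypothetical example that uses the (8, 0), (16, 1), and (32, 2) configurations commonly cited for posits.

```python
from fractions import Fraction

# Hypothetical mapping from a requested precision level to (n, es).
PRECISION_PRESETS = {"low": (8, 0), "medium": (16, 1), "high": (32, 2)}

def posit_range(n: int, es: int) -> tuple:
    """Return (minpos, maxpos) for an n-bit posit with es exponent bits, as exact fractions."""
    if n < 2 or es < 0:
        raise ValueError("need n >= 2 and es >= 0")
    useed = 2 ** (2 ** es)                 # scale factor contributed by each regime step
    maxpos = Fraction(useed) ** (n - 2)    # largest positive value
    return 1 / maxpos, maxpos              # minpos is the reciprocal of maxpos

if __name__ == "__main__":
    for level, (n, es) in PRECISION_PRESETS.items():
        minpos, maxpos = posit_range(n, es)
        print(f"{level}: n={n}, es={es}, maxpos=2^{maxpos.numerator.bit_length() - 1}")
```

For the 8-bit, es = 0 configuration discussed above, this gives maxpos = 64 and minpos = 1/64. The 16-bit, es = 1 preset widens the range to maxpos = 2^28, because larger n and es both extend the regime's reach, and each added exponent bit trades fraction precision for range.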
Fig. 8 is a diagram of an apparatus 853 according to several embodiments of the present disclosure. The apparatus 853 may include a database 854, a subsystem 855, and/or a number of engines, such as a posit operation engine 856, and may be in communication with the database 854 via a communication link. The apparatus 853 may include more or fewer engines than illustrated to perform the various functions described herein. The apparatus 853 may represent program instructions and/or hardware of a machine (e.g., machine 957 as referenced in fig. 9, etc.). As used herein, an "engine" may include program instructions and/or hardware, but at least includes hardware. Hardware is a physical component of a machine that enables it to perform a function. Examples of hardware may include processing resources, memory resources, logic gates, and so forth. In some embodiments, the apparatus 853 may be similar to the posit operation agent 643 illustrated and described herein in connection with fig. 6.
An engine (e.g., 856) may comprise a combination of hardware and program instructions configured to perform several functions described herein. Program instructions (e.g., software, firmware, etc.) may be stored in memory resources (e.g., machine-readable media) as well as in hardwired programs (e.g., logic). Hardwired program instructions (e.g., logic) may be viewed as both program instructions and hardware.
In some embodiments, the posit operation engine 856 may include a combination of hardware and program instructions that may be configured to perform the operations described above in connection with the posit operation agents 643 and/or 743 of fig. 6 and 7A-7B.
Fig. 9 is a diagram of a machine 957 according to several embodiments of the present disclosure. The machine 957 may utilize software, hardware, firmware, and/or logic to perform several functions. The machine 957 may be a combination of hardware and program instructions configured to perform several functions (e.g., actions). The hardware may include, for example, a number of processing resources 945 and a number of memory resources 947, such as a machine-readable medium (MRM) or other memory resources 947. The memory resources 947 may be internal and/or external to the machine 957 (e.g., the machine 957 may include internal memory resources and may have access to external memory resources). In some embodiments, the machine 957 may be a virtual machine, or the machine 957 may be a server. Program instructions, such as machine-readable instructions (MRI), may include instructions stored on the MRM to implement particular functions (e.g., actions involving logic circuitry in a multi-user network, such as converting bit strings between various formats within the multi-user network and performing arithmetic and/or logical operations on the converted bit strings). The set of MRI may be executed by one or more of the processing resources 945. The memory resources 947 can be coupled to the machine 957 in a wired and/or wireless manner. For example, the memory resources 947 can be internal memory, portable memory, a portable disk, and/or memory associated with another resource, e.g., enabling MRI to be transferred and/or executed across a network such as the Internet. As used herein, a "module" may include program instructions and/or hardware, but at least includes program instructions.
The memory resources 947 can be non-transitory and can include volatile and/or non-volatile memory. Volatile memory may include memory that depends upon power to store information, such as various types of Dynamic Random Access Memory (DRAM), among others. Non-volatile memory may include memory that does not depend upon power to store information. Examples of non-volatile memory may include solid-state media such as flash memory, electrically erasable programmable read-only memory (EEPROM), Phase Change Random Access Memory (PCRAM), magnetic memory, optical memory, and/or a solid-state drive (SSD), among other types of machine-readable media.
The processing resources 945 can be coupled to the memory resources 947 via a communication path 958. The communication path 958 may be local to or remote from the machine 957. Examples of a local communication path 958 may include an electronic bus internal to the machine, where the memory resources 947 communicate with the processing resources 945 via the electronic bus. Examples of such electronic buses may include Industry Standard Architecture (ISA), Peripheral Component Interconnect (PCI), Advanced Technology Attachment (ATA), Small Computer System Interface (SCSI), and Universal Serial Bus (USB), among other types and variations of electronic buses. The communication path 958 may be such that the memory resources 947 are remote from the processing resources 945, such as in a network connection between the memory resources 947 and the processing resources 945. That is, the communication path 958 may be a network connection. Examples of such network connections may include a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), and the Internet, among others.
As shown in fig. 9, the MRI stored in the memory resources 947 may be segmented into several modules (e.g., 959) that may perform several functions when executed by the processing resources 945. As used herein, a module includes a set of instructions included to perform a particular task or action. Module 959 may be a sub-module of other modules. Examples are not limited to the particular module 959 illustrated in fig. 9.
The module 959 may include program instructions and/or a combination of hardware and program instructions that, when executed by the processing resources 945, can function as a corresponding engine as described with respect to fig. 8. For example, the posit operation module 959 may include program instructions and/or a combination of hardware and program instructions that, when executed by the processing resources 945, can function as the posit operation engine 856 illustrated and described in connection with fig. 8.
Fig. 10 is a flow diagram representing an example method 1060 involving arithmetic and logical operations in a multi-user network in accordance with several embodiments of the present disclosure. At block 1062, the method 1060 may include receiving a request to perform an arithmetic operation and/or a logical operation between a first operand and a second operand. As shown at block 1062 of method 1060, the request may include a parameter corresponding to an amount of shared computing resources to be allocated for performing the arithmetic operation and/or logical operation. Receiving the request may further include receiving a request to perform the arithmetic operation and/or logical operation using a posit bit string operand as at least one of the first operand and the second operand. For example, in some embodiments, at least one of the first operand and the second operand may be a posit bit string operand.
As described above in connection with table 1, the parameters may include a parameter corresponding to an amount of time allowed for performing at least one of an arithmetic operation and a logical operation, an amount of processing resources allowed for performing at least one of an arithmetic operation and a logical operation, and/or a first bit string length and a first exponent bit length of the first bit string operand and a second bit string length and a second exponent bit length of the second bit string operand.
In some embodiments, the method 1060 may further include causing at least one of the arithmetic operation and the logical operation to be performed during an amount of time allowed for performing the at least one of the arithmetic operation and the logical operation, and/or causing at least one of the arithmetic operation and the logical operation to be performed using an allowed amount of processing resources specified by the parameter. However, embodiments are not so limited, and in some embodiments, the method 1060 may further include setting a first bit string length and a first exponent bit length of the first bit string operand based on the parameter, setting a second bit string length and a second exponent bit length of the second bit string operand based on the parameter, and/or causing at least one of an arithmetic operation and a logical operation to be performed using the first bit string operand and the second bit string operand.
At block 1064, the method 1060 may include allocating an amount of shared computing resources to be used for performing arithmetic operations and/or logical operations based, at least in part, on the parameter. For example, method 1060 may include allocating a particular amount of processing resources for performing arithmetic operations and/or logical operations as described above in connection with table 1.
At block 1066, the method 1060 may further include causing arithmetic operations and/or logical operations to be performed using the allocated amount of the shared computing resource. In some embodiments, causing the arithmetic operation and/or the logical operation to be performed using the allocated amount of the shared computing resource may further include enabling logic circuitry communicatively coupled to the shared computing resource to perform the arithmetic operation and/or the logical operation. The logic circuitry may be similar to the logic circuitry 122 illustrated in fig. 1 herein.
Method 1060 may further include generating a graphical user interface to be displayed by a compute node connected to the shared pool of computing resources that includes the amount of shared computing resources, and/or receiving the request via input provided to the graphical user interface. The graphical user interface may include prompts and/or selectable items that allow a user to select parameters for performing the operations described herein. For example, the graphical user interface may include prompts and/or selectable items corresponding to an amount of processing resources to be used for performing operations (e.g., a number of compute cores), an amount of time in which to perform operations (e.g., a processing time parameter), a bit length of the operands to be used for performing operations, and/or a number of exponent bits of the operands to be used for performing operations.
Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that an arrangement calculated to achieve the same results may be substituted for the specific embodiments shown. This disclosure is intended to cover adaptations or variations of one or more embodiments of the present disclosure. It is to be understood that the above description has been made in an illustrative fashion, and not a restrictive one. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the above description. The scope of one or more embodiments of the present disclosure includes other applications in which the above structures and processes are used. The scope of one or more embodiments of the disclosure should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
In the foregoing detailed description, certain features are grouped together in a single embodiment for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the disclosed embodiments of the disclosure have to use more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed embodiment. Thus the following claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate embodiment.

Claims (27)

1. An apparatus, comprising:
circuitry communicatively coupled to a shared pool of computing resources deployed in a multi-user network, wherein the circuitry is configured to:
receive a request to perform an arithmetic operation or a logical operation, or both, using at least one posit bit string operand, wherein the request includes parameters corresponding to performing the operation using the at least one posit bit string operand; and
perform the arithmetic operation or the logical operation, or both, using the at least one posit bit string operand based at least in part on the received parameters.
2. The apparatus of claim 1, wherein the circuitry is configured to access an amount of computing resources, among the shared pool of computing resources, specified by the parameters for performing the arithmetic operation or the logical operation, or both, using the at least one posit bit string operand.
3. The apparatus of claim 1, wherein the circuitry is configured to perform the arithmetic operation or the logical operation, or both, using the at least one posit bit string operand for a particular amount of time specified by the parameters.
4. The apparatus of any one of claims 1-3, wherein the parameters correspond to a bit length of the at least one posit bit string operand, a number of exponent bits of the at least one posit bit string operand, or both.
5. The apparatus of any one of claims 1-3, wherein the circuitry is configured to:
receive at least one floating-point bit string; and
generate the at least one posit bit string operand by converting the at least one floating-point bit string to a posit bit string prior to performing at least one of the arithmetic operation and the logical operation.
6. The apparatus of any of claims 1-3, wherein the circuitry is further configured to, in response to receipt of the request to perform the operation, request allocation of an amount of processing resources and an amount of memory resources from the shared pool of computing resources for performing the arithmetic operation or the logical operation, or both, using the at least one posit bit string operand.
7. The apparatus of any of claims 1-3, wherein the circuitry is further configured to retrieve the at least one posit bit string operand from a memory location within the shared pool of computing resources prior to performing the arithmetic operation or the logical operation, or both.
8. A system, comprising:
a multi-user network comprising a pool of shared computing resources;
a computing node configured to access the multi-user network; and
circuitry communicatively coupled to the shared pool of computing resources, wherein the circuitry is configured to:
receive a request from the computing node to perform an arithmetic operation or a logical operation, or both, using at least one posit bit string operand;
receive parameters from the computing node corresponding to performance of the operation using the at least one posit bit string operand; and
cause the arithmetic operation or the logical operation, or both, to be performed using the shared pool of computing resources based at least in part on the request and the received parameters.
9. The system of claim 8, wherein the circuitry is configured to:
request allocation of an amount of computing resources from the shared pool of computing resources for performing the arithmetic operation or the logical operation, or both, based on the received parameters; and
cause the arithmetic operation or the logical operation, or both, to be performed using the allocated amount of computing resources.
10. The system of any one of claims 8-9, wherein the parameters include an amount of time allowed for the arithmetic operation or the logical operation, or both, and wherein the circuitry is configured to cause the arithmetic operation or the logical operation, or both, to be performed within the allowed amount of time.
11. The system of any of claims 8-8, wherein the parameters include a bit string length and an exponent bit length of the first bit string operand, and a second bit string length and a second exponent bit length of the at least one posit bit string, and wherein the circuitry is configured to set the bit string length and the exponent bit length of the at least one posit bit string operand based on the parameters prior to performing the arithmetic operation or the logical operation, or both.
12. The system of any of claims 8-9, wherein the circuitry is configured to:
receive at least one floating-point bit string; and
generate the at least one posit bit string by converting the at least one floating-point bit string to a posit format prior to performing the arithmetic operation or the logical operation, or both.
13. The system of any one of claims 8 to 9, wherein the circuitry is configured to access a memory location within the shared pool of computing resources to retrieve the at least one posit bit string operand prior to performing the arithmetic operation or the logical operation, or both.
14. An apparatus, comprising:
an agent deployed in a multi-user network, the agent provided with processing resources and executable by hardware, wherein the agent is configured to:
receive parameters corresponding to an arithmetic operation or a logical operation, or both, using one or more posit bit strings;
receive a request to initiate the arithmetic operation or the logical operation, or both, using the one or more posit bit strings; and
cause the arithmetic operation or the logical operation, or both, to be performed using the one or more posit bit strings based, at least in part, on the received parameters.
15. The apparatus of claim 14, wherein the parameters comprise a parameter corresponding to an amount of time to perform the operation, a parameter corresponding to an amount of processing resources to perform the operation, a parameter corresponding to a bit length of the one or more posit bit strings, a parameter corresponding to a number of exponent bits of the one or more posit bit strings, or a combination thereof.
16. The apparatus of claim 14, wherein the agent is further configured to allocate, based on the parameters, computing resources available to the multi-user network for performing the arithmetic operation or the logical operation, or both, using the one or more posit bit strings.
17. The apparatus of any one of claims 14-16, further comprising logic circuitry communicatively coupled to the agent, wherein the agent is further configured to cause the one or more posit bit strings to be communicated to the logic circuitry, and wherein the logic circuitry is configured to perform the arithmetic operation or the logical operation, or both, using the one or more posit bit strings.
18. The apparatus of claim 17, wherein the agent is further configured to cause the logic circuitry to retrieve the one or more posit bit strings from a memory resource accessible by the multi-user network prior to causing the arithmetic operation or the logical operation, or both, to be performed using the one or more posit bit strings.
19. The apparatus of any one of claims 14-16, wherein the agent is further configured to cause conversion of one or more floating-point bit strings to a posit format to generate the one or more posit bit string operands prior to causing the arithmetic operation or the logical operation, or both, to be performed using the one or more posit bit strings.
20. A system, comprising:
a Virtual Compute Cluster (VCC);
an agent deployed within the VCC, the VCC provided with computing resources and executable by hardware, wherein the agent is configured to:
receive a request to perform an arithmetic operation or a logical operation, or both, between a first posit bit string operand and a second posit bit string operand;
allocate an amount of computing resources available for performing the arithmetic operation or the logical operation, or both, between the first posit bit string operand and the second posit bit string operand; and
cause the arithmetic operation or the logical operation, or both, to be performed between the first posit bit string operand and the second posit bit string operand.
21. The system of claim 20, further comprising logic circuitry communicatively coupled to the VCC, wherein the agent is further configured to:
access the logic circuitry; and
cause the first posit bit string operand and the second posit bit string operand to be loaded into the logic circuitry, and wherein the logic circuitry is configured to perform the arithmetic operation or the logical operation, or both, between the first posit bit string operand and the second posit bit string operand.
22. The system of claim 20, wherein the logic circuitry comprises at least one of an application specific integrated circuit and a field programmable gate array.
23. The system of claim 20, further comprising logic circuitry communicatively coupled to the VCC, wherein the agent is further configured to:
access the logic circuitry; and
cause a first floating-point bit string and a second floating-point bit string to be loaded into the logic circuitry, and wherein the logic circuitry is configured to:
convert the first floating-point bit string to a first posit bit string to generate the first posit bit string operand;
convert the second floating-point bit string to a second posit bit string to generate the second posit bit string operand; and
perform the arithmetic operation or the logical operation, or both, between the first posit bit string operand and the second posit bit string operand.
24. The system of any of claims 20 to 23, wherein the agent is configured to:
receive a processing resource parameter corresponding to performing the arithmetic operation or the logical operation, or both; and
allocate an amount of the computing resources available to perform the arithmetic operation or the logical operation, or both, based at least in part on the processing resource parameter.
25. The system of any of claims 20 to 23, wherein the agent is configured to:
receive a processing time parameter corresponding to performing the arithmetic operation or the logical operation, or both; and
allocate an amount of time available to perform the arithmetic operation or the logical operation, or both, based at least in part on the processing time parameter.
26. The system of any of claims 20 to 23, wherein the agent is configured to:
receive a posit precision parameter corresponding to performing the arithmetic operation or the logical operation, or both;
set a bit length of the first posit bit string operand and the second posit bit string operand based, at least in part, on the posit precision parameter; and
set an exponent bit length of the first posit bit string operand and the second posit bit string operand based, at least in part, on the posit precision parameter.
27. The system of any of claims 20 to 23, wherein the agent runs on a hypervisor deployed in the VCC, a virtual compute instance deployed in the VCC, or a container deployed in the VCC.
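The claims above recite converting floating-point bit strings into posit bit strings under a bit-length parameter and an exponent-bit parameter, and then performing arithmetic on the resulting operands. The following Python sketch is a software reference model written under stated assumptions; it is not the circuitry or agent described in the claims. It assumes the common posit layout (sign bit, regime run, up to `es` exponent bits, fraction), two's-complement negation, round-to-nearest-even on the encoded bit pattern, and saturation at maxpos/minpos. The names `posit_from_float`, `posit_to_float`, `PositRequest`, and `handle_request` are hypothetical.

```python
import math
import operator
from dataclasses import dataclass


def posit_from_float(x: float, nbits: int, es: int) -> int:
    """Encode a float as an nbits-wide posit with es exponent bits."""
    if x == 0.0:
        return 0
    if math.isnan(x) or math.isinf(x):
        return 1 << (nbits - 1)                  # NaR ("not a real")
    m, exp = math.frexp(abs(x))                  # |x| = m * 2**exp, 0.5 <= m < 1
    scale = exp - 1                              # |x| = (2m) * 2**scale, 1 <= 2m < 2
    k, e = scale >> es, scale & ((1 << es) - 1)  # scale = k * 2**es + e
    fbits = 53
    frac = int((2.0 * m - 1.0) * (1 << fbits))   # exact for IEEE 754 doubles
    if k >= 0:
        regime, rlen = ((1 << (k + 1)) - 1) << 1, k + 2  # k+1 ones, then a 0
    else:
        regime, rlen = 1, 1 - k                         # -k zeros, then a 1
    body = (((regime << es) | e) << fbits) | frac
    blen = rlen + es + fbits
    width = nbits - 1                            # bits available after the sign bit
    if blen <= width:
        bits = body << (width - blen)
    else:
        shift = blen - width
        bits = body >> shift
        rem = body & ((1 << shift) - 1)
        half = 1 << (shift - 1)
        if rem > half or (rem == half and bits & 1):
            bits += 1                            # round to nearest, ties to even
    bits = min(max(bits, 1), (1 << width) - 1)   # saturate: never round to 0 or NaR
    return (-bits) & ((1 << nbits) - 1) if x < 0 else bits


def posit_to_float(bits: int, nbits: int, es: int) -> float:
    """Decode an nbits-wide posit with es exponent bits into a float."""
    mask = (1 << nbits) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):
        return math.nan                          # NaR
    negative = bool(bits >> (nbits - 1))
    if negative:
        bits = (-bits) & mask                    # negation is two's complement
    pos = nbits - 2                              # regime starts after the sign bit
    first = (bits >> pos) & 1
    run = 0
    while pos >= 0 and ((bits >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    remaining = max(pos, 0)                      # bits after the regime terminator
    ebits = min(es, remaining)                   # exponent field may be cut short
    e = (bits >> (remaining - ebits)) & ((1 << ebits) - 1)
    e <<= es - ebits                             # missing exponent bits read as 0
    fbits = remaining - ebits
    frac = bits & ((1 << fbits) - 1)
    value = math.ldexp(1.0 + frac / (1 << fbits), k * (1 << es) + e)
    return -value if negative else value


@dataclass
class PositRequest:
    """Hypothetical request carrying operands and precision parameters."""
    op: str          # "add", "sub" or "mul"
    a: float         # floating-point inputs, converted to posits before the operation
    b: float
    nbits: int = 16  # posit bit length parameter
    es: int = 1      # number-of-exponent-bits parameter


_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def handle_request(req: PositRequest) -> int:
    """Convert both operands to posits, apply the operation, return the result bits."""
    fa = posit_to_float(posit_from_float(req.a, req.nbits, req.es), req.nbits, req.es)
    fb = posit_to_float(posit_from_float(req.b, req.nbits, req.es), req.nbits, req.es)
    # Reference model only: the result is formed in double precision and then
    # rounded to the posit format, so double rounding is possible for wide posits.
    return posit_from_float(_OPS[req.op](fa, fb), req.nbits, req.es)
```

In this sketch, `nbits` and `es` stand in for the claimed bit-length and exponent-bit parameters (claims 4, 11, 15 and 26). The time and resource-allocation parameters of claims 3, 9, 10, 24 and 25 concern scheduling on the shared pool and are not modeled here. Hardware would operate on the bit fields directly instead of routing through double precision, and this model is only meant for modest `nbits` and `es` values.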

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US16/287,156 US11074100B2 (en) 2019-02-27 2019-02-27 Arithmetic and logical operations in a multi-user network
US16/287,156 2019-02-27
US16/286,941 US10990387B2 (en) 2019-02-27 2019-02-27 Converting floating-point operands into universal number format operands for processing in a multi-user network
US16/286,941 2019-02-27
PCT/US2020/015369 WO2020176184A1 (en) 2019-02-27 2020-01-28 Arithmetic and logical operations in a multi-user network

Publications (2)

Publication Number Publication Date
CN113508363A true CN113508363A (en) 2021-10-15
CN113508363B CN113508363B (en) 2022-09-16

Family

ID=72238547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080016545.5A Active CN113508363B (en) 2019-02-27 2020-01-28 Arithmetic and logical operations in a multi-user network

Country Status (4)

Country Link
EP (1) EP3931695A4 (en)
KR (1) KR20210121266A (en)
CN (1) CN113508363B (en)
WO (1) WO2020176184A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120311299A1 (en) * 2001-02-24 2012-12-06 International Business Machines Corporation Novel massively parallel supercomputer
US20180276040A1 (en) * 2017-03-23 2018-09-27 Amazon Technologies, Inc. Event-driven scheduling using directed acyclic graphs
CN109213723A (en) * 2017-07-01 2019-01-15 Intel Corporation Processor, method and system for the configurable space accelerator with safety, power reduction and performance characteristic

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US5652862A (en) * 1992-12-30 1997-07-29 Apple Computer, Inc. Method and appartus for determining a precision of an intermediate arithmetic for converting values between a first numeric format and a second numeric format
US7353368B2 (en) * 2000-02-15 2008-04-01 Intel Corporation Method and apparatus for achieving architectural correctness in a multi-mode processor providing floating-point support
US20180217838A1 (en) * 2017-02-01 2018-08-02 Futurewei Technologies, Inc. Ultra lean vector processor

Non-Patent Citations (1)

Title
GUSTAFSON et al.: "Beating Floating Point at its Own Game: Posit Arithmetic", Supercomputing Frontiers and Innovations: An International Journal *

Also Published As

Publication number Publication date
KR20210121266A (en) 2021-10-07
EP3931695A1 (en) 2022-01-05
EP3931695A4 (en) 2022-12-14
CN113508363B (en) 2022-09-16
WO2020176184A1 (en) 2020-09-03

Similar Documents

Publication Publication Date Title
CN111724832B (en) Apparatus, system, and method for memory array data structure positive number operation
CN112420092B (en) Bit string conversion
CN111625183A (en) Systems, devices, and methods involving acceleration circuitry
CN113965205A (en) Bit string compression
CN111696610A (en) Apparatus and method for bit string conversion
CN113906386B (en) Bit string operations using computation tiles
CN113805974A (en) Application-based data type selection
US11074100B2 (en) Arithmetic and logical operations in a multi-user network
US20200341762A1 (en) Bit sting operations using a computing tile
CN113508363B (en) Arithmetic and logical operations in a multi-user network
CN113918117B (en) Dynamic precision bit string accumulation
CN114096948B (en) Bit string lookup data structure
CN113553278A (en) Acceleration circuitry for posit operations
US11875150B2 (en) Converting floating-point bit strings in a multi-user network
CN113454916B (en) Host-based bit string conversion
CN113641602B (en) Acceleration circuitry for posit operations
CN113924622B (en) Accumulation of bit strings in the periphery of a memory array
US20200293289A1 (en) Bit string conversion
CN116048456A (en) Matrix multiplier, method of matrix multiplication, and computing device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant