DE102018115219A1

DE102018115219A1 - Systems and methods for mapping reduction operations

Info

Publication number: DE102018115219A1
Application number: DE102018115219.1A
Authority: DE
Inventors: Martin Langhammer; Gregg William Baeckler; Bogdan Pasca
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 2017-07-14
Filing date: 2018-06-25
Publication date: 2019-01-17

Abstract

Adder trees may be built for efficient packing of arithmetic operators into an integrated circuit. The operands of the trees may be truncated to pack an integer number of nodes per logic array block. As a result, arithmetic operations may pack more efficiently onto the integrated circuit while providing increased precision and performance.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a non-provisional application claiming priority to U.S. Provisional Application No. 62/532,871, entitled "Systems and Methods for Mapping Reduction Operations," filed July 14, 2017, which is incorporated herein by reference.

BACKGROUND

The present disclosure relates generally to integrated circuits and, more particularly, to increasing the efficiency of mapping reduction operations (e.g., summation of multiple operands) onto programmable devices (e.g., field-programmable gate array (FPGA) devices). In particular, the present disclosure relates to small-precision, multiplication-based dot products for machine learning operations.

This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.

Machine learning is becoming an increasingly valuable application area. For example, it may be used in natural language processing, object recognition, bioinformatics, and economics, among other fields and applications. Implementations of machine learning may involve large arithmetic operations, such as the summation of many operands. However, large arithmetic operations are difficult to fit onto the integrated circuits (e.g., FPGAs) that may implement machine learning. For example, fitting arithmetic operations onto an integrated circuit may be particularly difficult when the operands have high precision, when there are many operands to sum, and/or when a high percentage of the logic in the device is used for the arithmetic operations. The summation of the many operands involved in machine learning may therefore occupy a large portion of the area of the integrated circuit. In addition, the physical arrangement of logic resources and the manner in which they may be used in such designs can limit the logic that is usable for dense arithmetic designs. For example, in some arithmetic designs, soft logic resources that include adder resources (e.g., adders) to perform arithmetic functions are often grouped together. If a ripple-carry adder used for a particular node in an adder tree takes up more than half of a soft logic group, the remaining logic in the group may not be usable for similar nodes. Consequently, much of the logic in the integrated circuit may be unreachable.
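The packing problem described above can be illustrated with a minimal Python sketch. It is not taken from the disclosure: the group size of 20 adder bits (`GROUP_BITS`) and the function names are hypothetical values chosen for illustration only. The sketch counts how many whole ripple-carry adders of a given width fit in one soft logic group and what fraction of the group's adder bits they use.

```python
def nodes_per_group(group_bits: int, adder_bits: int) -> int:
    """Number of whole adders of width adder_bits that fit in one group."""
    return group_bits // adder_bits


def utilization(group_bits: int, adder_bits: int) -> float:
    """Fraction of the group's adder bits used by the adders that fit."""
    return nodes_per_group(group_bits, adder_bits) * adder_bits / group_bits


# Hypothetical group size; real devices differ.
GROUP_BITS = 20

for width in (10, 11):
    n = nodes_per_group(GROUP_BITS, width)
    u = utilization(GROUP_BITS, width)
    print(f"{width}-bit adder: {n} node(s) per group, {u:.0%} of adder bits used")
```

Under these assumed numbers, widening the adder by a single bit (from 10 to 11) halves the number of nodes per group, because the second adder no longer fits in the remaining logic.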

BRIEF DESCRIPTION OF THE DRAWINGS

Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings, in which:

FIG. 1 is a block diagram of a system for implementing arithmetic operations, according to an embodiment;
FIG. 2 is a block diagram of an integrated circuit in which arithmetic operations may be implemented, according to an embodiment;
FIG. 3 is a block diagram of a data processing system in which an integrated circuit may be implemented, according to an embodiment;
FIG. 4 is a block diagram of an adder tree in which the arithmetic operations may be performed, according to an embodiment;
FIG. 5 is a block diagram of a second embodiment of an adder tree;
FIG. 6 is a block diagram of the adder tree of FIG. 5 and a subsequent adder tree, according to an embodiment;
FIG. 7 is a block diagram of a second embodiment of a subsequent adder tree;
FIG. 8 is a block diagram of a sum of the adder tree of FIG. 5 and a sum of the subsequent adder tree of FIG. 7, according to an embodiment;
FIG. 9 is a block diagram of a third embodiment of an adder tree;
FIG. 10 is a flow chart of a method for determining a total average truncation value involved in truncating operands in the adder tree of FIG. 9, according to an embodiment;
FIG. 11 is a diagram of a static distribution of the bits truncated from the operand in the adder tree of FIG. 9, according to an embodiment;
FIG. 12 is a diagram of a system in which a dynamic distribution of the bits truncated from the operands in the adder tree of FIG. 9 and a total average truncation value are determined, according to an embodiment;
FIG. 13 is a block diagram of an adder tree node, according to an embodiment;
FIG. 14 is a block diagram of an adder tree node in which a compressor structure is implemented, according to an embodiment;
FIG. 15 is a block diagram of a block floating point tree, according to an embodiment;
FIG. 16 is a block diagram of a simplified block floating point tree, according to an embodiment; and
FIG. 17 is a block diagram of a block floating point combination tree, according to an embodiment.

DETAILED DESCRIPTION OF SPECIFIC EMBODIMENTS

One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.

Machine learning has become a valuable use case for integrated circuits (e.g., field-programmable gate arrays, also known as FPGAs) and may use one or more arithmetic operations (e.g., reduction operations). To perform an arithmetic operation, an integrated circuit may include a logic array block (LAB), which may include a number of adaptive logic modules (ALMs) and/or other logic elements. The ALMs may include resources such as various look-up tables (LUTs), adders, carry chains, and the like, so that each ALM, and in turn the LAB containing the ALMs, may be configured to implement the arithmetic function. A machine learning implementation may therefore use, for example, an FPGA with a LAB to perform a number of arithmetic operations. In these cases, the LAB may sum multiple operands using its ALM resources. Although the operand sizes involved in machine learning are generally relatively small, many parallel reduction operations may be implemented, and these may use a large portion of the integrated circuit.

Accordingly, in certain embodiments of the present disclosure, adder trees may be built for efficient packing of arithmetic operators into integrated circuits. The operands of the trees may be truncated (e.g., pruned) to pack an integer number of nodes per logic array block. Further, the techniques described herein provide mechanisms for determining the likely error, as well as hardware structures to mitigate an error associated with truncating the operands. As a result, arithmetic operations may pack more efficiently onto an integrated circuit with increased precision and performance.
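The trade-off behind truncation can be shown with a short Python sketch. It illustrates only the general idea of dropping low-order operand bits before summation and bounding the resulting error; it is not the specific truncation scheme of the disclosure, and the names `truncate` and `truncated_sum` and the operand values are hypothetical.

```python
def truncate(value: int, dropped_bits: int) -> int:
    """Zero the lowest dropped_bits bits of a non-negative integer."""
    return (value >> dropped_bits) << dropped_bits


def truncated_sum(operands, dropped_bits: int) -> int:
    """Sum the operands after truncating each one."""
    return sum(truncate(x, dropped_bits) for x in operands)


# Hypothetical 4-bit operands, each truncated by one bit.
operands = [11, 7, 13, 6]
dropped = 1

exact = sum(operands)
approx = truncated_sum(operands, dropped)
error = exact - approx

# Each operand loses at most (2**dropped - 1), so the total error is bounded.
max_error = len(operands) * ((1 << dropped) - 1)

print(f"exact={exact} approx={approx} error={error} bound={max_error}")
```

Each dropped bit narrows the adder at that node by one bit, which is what may allow a whole number of nodes to fit in a logic array block, at the cost of an error that never exceeds the bound computed above.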

With the foregoing in mind, FIG. 1 illustrates a block diagram of a system 10 that implements arithmetic operations. A designer may desire to implement functionality on an integrated circuit 12, which may include, for example, an FPGA, an application-specific integrated circuit (ASIC), a system-on-chip (SoC), or the like. The designer may specify a program to be implemented, which may enable the designer to provide programming instructions to implement a circuit design for the integrated circuit 12. For example, the designer may specify that the programming instructions configure, or partially configure, a region of the integrated circuit 12.

The designer may implement a design using design software 14, such as a version of Quartus by Intel Corporation. The design software 14 may use a compiler 16 to convert the program into a low-level program. The compiler 16 may provide machine-readable instructions representative of the program to a host 18 and to the integrated circuit 12. In an example in which the integrated circuit 12 includes an FPGA fabric, the integrated circuit 12 may receive one or more kernel programs 20 that describe the hardware implementations to be programmed into the programmable fabric of the integrated circuit. The host 18 may receive a host program 22, which may be implemented by the kernel programs 20. To implement the host program 22, the host 18 may communicate instructions from the host program 22 to the integrated circuit 12 via a communications link 24, which may be, for example, direct memory access (DMA) communications or Peripheral Component Interconnect Express (PCIe) communications. In some embodiments, the kernel programs 20 and the host 18 may enable configuration of a LAB 26 on the integrated circuit 12. The LAB 26 may include a number of ALMs and/or other logic elements and may be configured to implement the arithmetic functions.

In an example shown in FIG. 2, the integrated circuit 12 may include a programmable logic device, such as a field-programmable gate array (FPGA) 40. For the purposes of this example, the device is referred to as an FPGA 40, though it should be understood that the device may be any type of logic device (e.g., an application-specific integrated circuit (ASIC) and/or an application-specific standard product (ASSP)). As shown, the FPGA 40 may have input/output circuitry 42 for driving signals off of the FPGA 40 and for receiving signals from other devices via input/output pins 44. Interconnection resources 46, such as global and local vertical and horizontal conductive lines and buses, may be used to route signals on the FPGA 40. Additionally, the interconnection resources 46 may include fixed interconnects (conductive lines) and programmable interconnects (i.e., programmable connections between respective fixed interconnects). Programmable logic 48 may include combinational and sequential logic circuitry. For example, the programmable logic 48 may include look-up tables, registers, and multiplexers. In various embodiments, the programmable logic 48 may be configured to perform a custom logic function. The programmable interconnects associated with the interconnection resources may be considered part of the programmable logic 48.

Programmable logic devices, such as the FPGA 40, may contain programmable elements 50 within the programmable logic 48. For example, as discussed above, a designer (e.g., a customer) may program (e.g., configure) the programmable logic 48 to perform one or more desired functions. For example, the FPGA 40 may be programmed by configuring the programmable elements 50 using mask programming arrangements, which is performed during semiconductor manufacturing. In another example, the FPGA 40 may be configured after semiconductor fabrication operations have been completed, such as by using electrical programming or laser programming to program the programmable elements 50. In general, the programmable elements 50 may be based on any suitable programmable technology, such as fuses, antifuses, electrically programmable read-only memory technology, random-access memory cells, mask-programmed elements, and so forth.

The FPGA 40 may be electrically programmed. With electrical programming arrangements, the programmable elements 50 may be formed from one or more memory cells. For example, during programming, configuration data is loaded into the memory cells using the input/output pins 44 and the input/output circuitry 42. In one embodiment, the memory cells may be implemented as random-access memory (RAM) cells. The use of memory cells based on RAM technology described herein is intended to be only one example. Further, because these RAM cells are loaded with configuration data during programming, they are sometimes referred to as configuration RAM (CRAM) cells. These memory cells may each provide a corresponding static control output signal that controls the state of an associated logic component in the programmable logic 48. For example, in some embodiments, the output signals may be applied to the gates of metal-oxide-semiconductor (MOS) transistors within the programmable logic 48.

The circuitry of the FPGA 40 may be organized using any suitable architecture. As an example, the logic of the FPGA 40 may be organized in a series of rows and columns of larger programmable logic regions, each of which may contain multiple smaller logic regions. The logic resources of the FPGA 40 may be interconnected by the interconnection resources 46, such as associated vertical and horizontal conductors. For example, in some embodiments, these conductors may include global conductive lines that span substantially all of the FPGA 40, fractional lines such as half-lines or quarter-lines that span part of the FPGA 40, staggered lines of a particular length (e.g., sufficient to interconnect several logic regions), smaller local lines, or any other suitable arrangement of interconnection resources. Moreover, in further embodiments, the logic of the FPGA 40 may be arranged in more levels or layers in which multiple large regions are interconnected to form still larger portions of logic. Still further, other device arrangements may use logic that is arranged in a manner other than rows and columns.

FIG. 3 shows a data processing system 100, which may be an example of one of the many electronic devices in which an integrated circuit 12, such as the FPGA 40, may be used. The data processing system 100 may include a processor 101, memory 102, input/output (I/O) ports 103, peripheral devices 104, and/or additional or fewer components. The components in the data processing system 100 may be coupled together by a system bus 105 and populated on a circuit board 106, which may be contained in an end-user system 107.

The data processing system 100 may be used in a wide variety of applications. For example, it may be used in computer or data networking, instrumentation, video or data signal processing, or other applications in which programmable logic may be found advantageous. The integrated circuit 12 may be used within the data processing system 100 to perform logic functions. The integrated circuit 12 may be configured as a processor or controller that may work in cooperation with the processor 101, and/or the integrated circuit 12 may interface between the processor 101 and other components in the data processing system 100, among other examples.

Referring now to FIG. 4, in some embodiments an ALM and/or LAB 26 may perform the summation of multiple operands via an adder tree 200, which may sum the operands 201 at multiple nodes 207 and/or stages 208 until a final sum is generated. In the illustrated embodiment, for example, a first adder tree 210 may include input circuitry to receive four operands 201 in a first stage 208A, which are ultimately summed into a final sum 206. The operands 201 may be paired at two nodes 207 to form two sets of operands, which a set of adders (e.g., adder circuitry) may separately sum into two intermediate results 203. An additional adder may sum the two intermediate results 203 at a single node 207 in a second stage 208B of the first adder tree 210 into the final sum 206 in a third stage 208C of the first adder tree 210, completing the arithmetic operation of summing all four of the operands 201. While adders are not shown in the illustrated embodiment, it should be understood that a result (e.g., 203, 206) of summing two or more operands 201 may be obtained by using an adder (e.g., adder circuitry).

The addition of two operands 201 may produce a result with a larger number of bits than either of the operands 201. For example, the addition of a first 6-bit operand and a second 6-bit operand may produce a 7-bit result when the addition involves a carry operation that affects a most significant bit (MSB) of the first 6-bit operand or the second 6-bit operand. Therefore, in some cases, the intermediate results 203 and the final sum 206 of an adder tree 200 may each include additional bits compared to the operands 201 in a previous stage (e.g., 208A, 208B). An adder tree 200 with multiple stages may, for example, produce a final sum 206 having more bits than any one of a set of operands 201 input to the adder tree 200. Therefore, the growth of the results of the arithmetic operations in the adder tree may involve the use of additional resources and/or additional area on an integrated circuit 12 and may further adversely affect the packing efficiency of the integrated circuit 12.
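
The bit growth described above can be checked with a short calculation. The following is a minimal sketch, assuming unsigned operands; the function names `sum_width` and `tree_output_width` are hypothetical and do not appear in the disclosure.

```python
def sum_width(width_a: int, width_b: int) -> int:
    """Worst-case bit width of the sum of two unsigned operands."""
    max_sum = (2**width_a - 1) + (2**width_b - 1)
    return max_sum.bit_length()


def tree_output_width(operand_width: int, num_operands: int) -> int:
    """Worst-case bit width of the full-precision sum of several
    unsigned operands of equal width (no truncation anywhere)."""
    max_total = num_operands * (2**operand_width - 1)
    return max_total.bit_length()


if __name__ == "__main__":
    print(sum_width(6, 6))           # two 6-bit operands
    print(tree_output_width(6, 4))   # four 6-bit operands, as in FIG. 4
    print(tree_output_width(13, 7))  # seven 13-bit operands, as in FIG. 9
```

Under these assumptions, two 6-bit operands need 7 bits, and a full-precision tree over four 6-bit operands needs 8 bits, which is the growth that truncation is meant to contain.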

Therefore, the operands 201 may be truncated (e.g., pruned) at each stage (e.g., 208A-208C), or at a subset of the stages 208, to control the growth of the results of an arithmetic operation and thus the packing. For example, by truncating the operands 201 in the first adder tree 210, the adder tree 200 may pack more efficiently onto the integrated circuit 12. In the illustrated embodiment, for example, each of the operands 201 is 6 bits wide, and the size of 6 bits is carried through the first adder tree 210. Therefore, the intermediate results 203 and the final sum 206 in the illustrated embodiment are each also 6 bits wide. To facilitate the constant size of 6 bits, a least significant bit (LSB) (e.g., 202, 204) of each operand 201 may be dropped. For example, the soft logic circuitry (e.g., logic within the LAB) may shift the operands 201 one bit to the right to truncate the LSBs. That is, in the example shown in FIG. 4, only the upper five bits of each operand 201 may be used, resulting in the addition of two 5-bit operands 201 at each stage (e.g., 208A, 208B) of the first adder tree 210. Therefore, the intermediate results 203 in the second stage 208B have a size of 6 bits because, as discussed, the arithmetic operation may lead to a result with single-bit growth. Therefore, an LSB 204 of each of the intermediate results 203 may also be truncated before the addition operation at the second stage 208B occurs. The final sum 206 may therefore have a width of 6 bits. Accordingly, the first adder tree 210 is an illustrative example of a reduction tree with single-bit pruning, since a single bit is removed (e.g., pruned) from each operand 201 at each stage 208.
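
The single-bit truncation scheme of FIG. 4 can be modeled in software as follows. This is a minimal sketch rather than the disclosed circuit: it assumes unsigned operands and a power-of-two operand count, and the function name `truncated_tree_sum` is hypothetical. Because one bit is dropped before each of the two additions, the result is expressed in units of 4 relative to the exact sum.

```python
def truncated_tree_sum(operands, shift=1):
    """Sum operands pairwise, dropping `shift` LSBs from every input
    of every adder so the width stays constant from stage to stage.
    Assumes the number of operands is a power of two."""
    level = list(operands)
    while len(level) > 1:
        # Truncate: right-shift each operand before it enters an adder.
        level = [x >> shift for x in level]
        # Add neighbouring pairs; each sum grows back by at most one bit.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    ops = [10, 20, 30, 40]           # four 6-bit operands
    approx = truncated_tree_sum(ops)
    print(approx, sum(ops) / 4)      # truncated result vs. exact sum scaled by 1/4
```

With the largest 6-bit inputs (63 each) the result stays within 6 bits, which matches the constant width described above; the small shortfall against the scaled exact sum is the truncation error.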

As an additional illustrative example, a signed 8-bit by 3-bit multiplier, which may be used in machine learning implementations, may have a signed output range of 10 bits. Accordingly, an input precision of an adder tree used to generate a product of the signed 8-bit by 3-bit multiplier may be 10 bits. In some embodiments, maintaining 10 bits for each operand 201 at an adder node 207 may allow two nodes 207 to be packed into each routing group (e.g., soft logic group), since 20 adder bits is a logic grouping in an integrated circuit 12. Therefore, to maintain 10 bits for each operand 201 and to produce a final sum with a width of 10 bits, each operand 201 at each node 207 of the adder tree may be shifted right (e.g., truncated) by a single bit, which may address the single-bit growth in a sum output from a node 207.

In other embodiments, the truncation may involve removing an LSB group 305 (e.g., a set of two or more LSBs) instead of or in addition to removing a single LSB. For example, in some embodiments, a second adder tree 300 may include input circuitry to receive operands 201 with many bits (e.g., large operands), as FIG. 5 may illustrate. Therefore, soft logic may, for example, truncate an LSB group 305 (e.g., 302-304) from each of the operands 201 to improve packing efficiency. For example, the second adder tree 300 may handle the addition of four 5-bit operands 201 instead of four 8-bit operands 201 in a first stage 208A. Consequently, the intermediate results 203 may include 6 bits, so to produce a final sum 206 with a width of 6 bits, soft logic may, for example, truncate a single LSB 204 from each of the intermediate results 203. Therefore, at each stage (e.g., 208A-208C) of an adder tree (e.g., the second adder tree 300), a different number of LSBs may be truncated as needed to improve packing efficiency. Further, while FIGS. 4 and 5 illustrate the truncation of a single LSB and/or of an LSB group 305 depicted as a set of three LSBs (e.g., 302-304), any suitable number of LSBs may be truncated from an operand 201 at any suitable stage 208 of an adder tree.

The final sum 206 may contain error compared to the actual results (e.g., with full precision) of a respective complete, non-truncated adder tree. For example, since the truncated LSBs (e.g., 202, 204, and 302-304) are not included in the final sum 206 of the illustrated adder trees (e.g., 210 and 300, respectively), the final sum 206 may differ from the actual results of the respective complete adder trees. The final sum 206 may not differ significantly from the actual results of the respective complete adder trees; however, in some embodiments, more accurate adder tree sums are advantageous.

Therefore, an adder tree 200 may be split into multiple trees to improve the accuracy of a final sum 206 while maintaining efficient packing in the integrated circuit 12. Accordingly, FIG. 6 illustrates an embodiment of the second adder tree 300 that is split into a main tree 308 (e.g., the second adder tree 300), which corresponds to the summation of the truncated operands 201, and a first subsequent adder tree 310A, which corresponds to the summation of the LSB groups 305 truncated from the operands 201 in the first stage 208A of the main tree 308. Therefore, the LSB groups 305 may be summed separately from the truncated operands 201. Further, depending on the number of LSBs 302-304 truncated from the operands 201, the first subsequent adder tree 310A may or may not implement further truncation. In some embodiments, for example, the first subsequent adder tree 310A may not implement any truncation of the LSB groups 305 if the size and/or number of the truncated LSB groups 305 is suitable to pack efficiently into the integrated circuit 12. In other embodiments, many and/or large LSB groups 305 in the first subsequent adder tree 310A may lead to truncation of LSBs from the LSB groups 305 themselves.

Further, while FIG. 6 may illustrate the use of a first subsequent adder tree 310A to sum the LSB groups 305 from the first stage 208A of the main tree 308, in some embodiments the main tree 308 may truncate the operands 201 at more than one stage. In the illustrated embodiment, for example, soft logic circuitry may truncate the LSB 204 from each of the intermediate results 203 in the main tree 308. Therefore, in some embodiments, the summation of the LSBs truncated at each stage 208 may be split into a separate subsequent tree 310 corresponding to that stage 208. The embodiment in FIG. 7, for example, illustrates the first subsequent adder tree 310A, which may handle the summation of the LSBs 302-304 from the first stage 208A of the main tree 308, and a second subsequent adder tree 310B, which may handle the summation of the LSBs 204 from the second stage 208B of the main tree 308. In these embodiments, a summation of the first subsequent adder tree 310A and the second subsequent adder tree 310B may be aligned. For example, the LSB 204 may share the same bit position as the bit 351 in the intermediate results 203 in the second stage of the first subsequent adder tree 310A. Therefore, the final sum 206 of the first subsequent adder tree 310A may be aligned with the final sum 206 of the second subsequent adder tree 310B so that they can be summed into a final sum 206', which is generated by summing the first subsequent adder tree 310A and the second subsequent adder tree 310B.

Accordingly, as illustrated in FIG. 8, the final sum 320 of the main tree 308 may be summed together with the final sum 206' of the summed first subsequent adder tree 310A and second subsequent adder tree 310B. In some embodiments, the MSBs of the final sum 206' may be aligned with the LSBs of the final sum 206. For example, in the illustrated embodiment, the bits 323 and 322 of the final sum 206 may be aligned with the bits 380 and 381 of the final sum 206', respectively.
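
The recombination of a main tree with its subsequent trees can be sketched numerically. The model below is an illustration under stated assumptions, not the disclosed hardware: it assumes four unsigned operands, a three-bit LSB group dropped at the first stage, and one LSB dropped from each intermediate result at the second stage, loosely following FIGS. 5 through 8. The function name `split_tree_sum` and its return layout are hypothetical.

```python
def split_tree_sum(operands, group_bits=3):
    """Model a truncated main tree plus two subsequent trees that sum
    the dropped bits, then realign and recombine all three parts."""
    assert len(operands) == 4
    mask = (1 << group_bits) - 1

    # Main tree, stage 1: drop the LSB group, then add pairs.
    hi = [x >> group_bits for x in operands]
    i1, i2 = hi[0] + hi[1], hi[2] + hi[3]
    # Main tree, stage 2: drop one LSB from each intermediate, then add.
    main = (i1 >> 1) + (i2 >> 1)

    # First subsequent tree: sum of the LSB groups (weight 2**0).
    trail_a = sum(x & mask for x in operands)
    # Second subsequent tree: sum of the intermediate LSBs (weight 2**group_bits).
    trail_b = (i1 & 1) + (i2 & 1)

    # Align the second subsequent sum with the first, then align the
    # main sum above both before the final addition.
    trail = trail_a + (trail_b << group_bits)
    total = (main << (group_bits + 1)) + trail
    return main, trail, total


if __name__ == "__main__":
    ops = [200, 123, 77, 255]        # four 8-bit operands
    main, trail, total = split_tree_sum(ops)
    print(main, trail, total, sum(ops))
```

In this model the recombined total equals the exact sum, because every dropped bit is accounted for in one of the subsequent trees at its original bit weight.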

While it may appear that splitting an adder tree 200 (e.g., the second adder tree 300) into separate trees (e.g., 308, 310A, 310B) packs less efficiently than a single adder tree 200, the truncated main tree 308 may pack at 100% into current FPGAs 40. Although the subsequent adder tree 310 is not pruned in some cases, there are many ways in which it can pack efficiently into the FPGA. For example, since a subsequent adder tree 310 may use only a small fraction of a soft logic group, multiple nodes of the subsequent adder tree 310 may pack into a single logic group. Further, since the subsequent adder tree 310 may handle small arithmetic operations (e.g., fewer operands, smaller operands, and/or the like) compared to a main tree 308, the subsequent adder tree 310 may pack efficiently into an integrated circuit 12.

In some embodiments, other arithmetic structures may additionally or alternatively be used to build a subsequent adder tree 310. For example, the subsequent adder tree 310 may include compressors. Although compressors may use more logic per bit, the packing ratio may be very high, because the operation may be structured more like an arbitrary logic function than like an arithmetic structure constrained by a carry-chain mapping.

In some embodiments, building multiple trees may be costly in resources and area. In many cases, however, the accuracy of the final sum 206 of a summation is more valuable than the accuracy of the intermediate results 203. Therefore, in addition or as an alternative to handling a contribution of truncated bits by adding the first subsequent adder tree 310A into the final sum 206 of the main tree 308, the contribution of the truncated bits may be accounted for with a constant that is added to the main tree 308. For example, the accuracy of the final sum 206 after pruning or truncating the operands 201 of the main tree 308 may be improved by adding a constant or a set of constants to the main tree 308 based on an estimate of a value that the truncated bits would have contributed to the final sum 206.
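
The effect of such a compensating constant can be illustrated with a small exhaustive calculation. This is a sketch under assumptions that are not taken from the disclosure: a single truncation stage, unsigned operands, uniformly distributed dropped bits, and a constant expressed in units of the truncated LSB. The function name `mean_truncation_error` is hypothetical.

```python
from itertools import product


def mean_truncation_error(num_operands: int, dropped_bits: int, constant: int) -> float:
    """Mean of (exact sum - compensated truncated sum), in units of the
    original LSB, averaged over every combination of dropped low bits.
    The retained high bits cancel out of the difference, so only the
    dropped bits need to be enumerated."""
    low_values = range(1 << dropped_bits)
    errors = []
    for lows in product(low_values, repeat=num_operands):
        exact = sum(lows)                   # what the dropped bits would have contributed
        approx = constant << dropped_bits   # constant added in truncated units
        errors.append(exact - approx)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Two operands, three dropped bits each: compare no constant with a constant of 1.
    print(mean_truncation_error(2, 3, 0))
    print(mean_truncation_error(2, 3, 1))
```

For two operands with three dropped bits each, the uncompensated mean error is 7 original LSBs, and a constant of 1 (worth 8 original LSBs) brings the mean error to -1. The example uses only two operands to keep the exhaustive enumeration small.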

Accordingly, FIG. 9 illustrates a third adder tree 400, which may include input circuitry configured to receive seven 13-bit operands 201 and may output a single 10-bit final sum 206. In the first stage 208A of the third adder tree 400, each of the 13-bit operands 201 is truncated to a 10-bit operand 201. For example, as shown on the left side of FIG. 9, three LSBs are truncated from the operands 201. As discussed, the operands 201 may be truncated to pack more efficiently into the integrated circuit 12. The truncated operands 201 are then added together using first adders 401A. Since an odd number of operands 201 is involved in the first stage 208A of the adder tree 200, and since the first adders 401A may each receive two operands 201, a no-operation (no-op) block 402 may receive the operand 201 that does not fit into the set of first adders 401A. Further, after the ten-bit operands 201 are added with the first adders 401A, the intermediate results 203 may include 11 bits due to 1-bit growth.

Therefore, in the second stage 208B of the third adder tree 400, an LSB of the 11-bit intermediate results 203 formed by the first adders 401A may be truncated to form three 10-bit intermediate results 203. The LSB of the operand 201 handled by the no-op block 402 may also be truncated to form a fourth 10-bit intermediate result 203 and to maintain alignment between the 10-bit intermediate results 203. For example, the operand 201 handled by the no-op block 402 may be zero-padded and/or sign-extended before the LSB is truncated, to produce a 10-bit intermediate result 203 that is bit-aligned with the other 10-bit intermediate results 203. Therefore, in the second stage 208B, a set of second adders 401B may sum two sets of two operands from the four 10-bit intermediate results 203. The second stage 208B may thus output two 11-bit intermediate results 203.

The third stage 208C of the third adder tree 400 may receive the two 11-bit intermediate results 203 and truncate the LSB of each of the 11-bit intermediate results to form two 10-bit intermediate results. A third adder 401C may then sum these 10-bit intermediate results to form an 11-bit final sum 206. Accordingly, the fourth stage 208D may truncate an LSB of the 11-bit final sum 206 to form a 10-bit final sum 206. However, since, as discussed, bits are truncated from the operands 201 and intermediate results 203 at each stage (208A-208D) of the third adder tree 400, the 10-bit final sum 206 may contain error compared to an actual final result produced by an adder tree 200 without truncation. Therefore, in some embodiments, a set of constants (e.g., A-F) may be added to the adder tree at certain stages 208 to reduce an average relative error caused by truncating LSBs at any stage 208 of the third adder tree 400.
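
The four stages of the third adder tree 400 can be traced in software as follows. This is a minimal sketch assuming unsigned operands and no compensating constants; the function name `third_tree` is hypothetical, and the no-op block is modeled as simply passing its operand through to the next stage.

```python
def third_tree(operands):
    """Model of a seven-input, four-stage truncating adder tree:
    13-bit inputs in, 10-bit result out (in units of 2**6)."""
    assert len(operands) == 7

    # Stage 1: drop three LSBs (13 -> 10 bits), three adders plus one no-op.
    s1 = [x >> 3 for x in operands]
    r1 = [s1[0] + s1[1], s1[2] + s1[3], s1[4] + s1[5], s1[6]]

    # Stage 2: drop one LSB from all four values (including the
    # passed-through operand, to keep bit alignment), then two adders.
    s2 = [x >> 1 for x in r1]
    r2 = [s2[0] + s2[1], s2[2] + s2[3]]

    # Stage 3: drop one LSB from each value, then a single adder.
    s3 = [x >> 1 for x in r2]
    r3 = s3[0] + s3[1]

    # Stage 4: drop one LSB from the 11-bit sum to get the 10-bit result.
    return r3 >> 1


if __name__ == "__main__":
    ops = [8191] * 7                       # largest 13-bit operands
    print(third_tree(ops), sum(ops) / 64)  # truncated result vs. exact sum scaled by 1/64
```

Since each truncation rounds down, the modeled result never exceeds the exact sum divided by 64, and the gap between the two is the error that the constants A-F are intended to reduce on average.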

With the foregoing in mind, FIG. 10 illustrates a flowchart of a method 500 for determining a suitable set of constants (e.g., A-F) to add into the stages 208 of an adder tree 200, in accordance with embodiments described herein. Although the following description of the method 500 is presented in a particular order, which represents a particular embodiment, it should be noted that the method 500 may be performed in any suitable order. Further, certain steps of the method 500 may be skipped altogether, and additional steps may be included. Although the following description of the method 500 is described as being performed by the processor 101, which may include one or more processing systems, it should be noted that the method 500 may be performed by any suitable computing device.

At block 502, the processor 101 may receive and/or determine a number of inputs, represented by wIn, wOut, and N in the illustrated embodiment. That is, the processor 101 may receive a width of input data (wIn) (e.g., a width of the operands 201), a width of output data (wOut) (e.g., a width of the final sum 206), and a number of operands 201 to be summed together in the adder tree 200 (N). For example, with reference to the third adder tree 400 of FIG. 9, the processor 101 may receive wIn = 13 (e.g., 13-bit operands 201), wOut = 10 (e.g., 10-bit final sum 206), and N = 7 (e.g., 7 input operands 201) as inputs.

At block 504, the processor 101 may then set the number of level inputs (LI), that is, the number of operands to be summed at a given stage 208 of the adder tree 200, to N, because the first stage 208A of the adder tree 200 may receive all of the operands 201. Accordingly, for the third adder tree 400 of FIG. 9, the processor 101 may set LI = 7.

At block 506, the processor 101 may determine a truncation value, or an average error incurred by truncation at the first stage 208A of the adder tree 200, based on a distribution of the operands 201 (e.g., inputs) of the adder tree 200. In some embodiments, the processor 101 may determine the truncation value at the first level of the adder tree 200 based on an assumption that the values of the operands 201 are uniformly distributed. For example, with reference to FIG. 9, the LSB groups 305 truncated at the first stage 208A may have a value from 0 (e.g., 000) to 7 (e.g., 111). With a uniform distribution of the operands 201, the truncation value incurred by removing the LSB group 305 from an operand 201 is 3.5. Consequently, the sum of the truncation values for the seven 13-bit operands 201 truncated to 10-bit operands 201 is 24.5.
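
The arithmetic of block 506 under the uniform-distribution assumption can be sketched as follows. This is only an illustrative sketch; the function name `mean_truncated_value` is hypothetical and does not appear in the disclosure.

```python
from fractions import Fraction

def mean_truncated_value(dropped_bits: int) -> Fraction:
    """Mean value held in `dropped_bits` LSBs, assuming every value is equally likely."""
    n_values = 2 ** dropped_bits          # e.g. 8 possible values for 3 dropped bits
    return Fraction(sum(range(n_values)), n_values)

# Example from the text: 13-bit operands truncated to 10 bits drop 3 LSBs each.
per_operand = mean_truncated_value(3)     # average of 0..7
stage_total = 7 * per_operand             # seven operands enter the first stage
print(float(per_operand), float(stage_total))
```

Exact fractions are used here only to keep the averages free of rounding; ordinary floating-point values would serve equally well for these small numbers.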

At block 508, the processor 101 may then update a total average truncation value for the adder tree 200. Because the total average truncation value may be initialized to 0, after block 506 completes the total average truncation value may be updated to match the value calculated at block 506.

At block 510, the processor 101 may then determine whether LI is greater than or equal to 2. In this way, the processor may determine whether it is handling calculations for a last stage 208 of the adder tree 200 or for an earlier stage 208 of the tree. If LI is greater than or equal to 2, the processor may, at block 512, update LI to LI = ceil(LI/2). That is, the processor may round the result of LI/2 up to its ceiling. For example, the third adder tree 400 of FIG. 9 receives seven 13-bit operands 201 at its first stage 208A. The initial value is therefore LI = 7, which is greater than 2. Consequently, at block 512, the processor 101 may update LI according to ceil(7/2), which yields LI = 4, corresponding to the number of 11-bit intermediate results 203 received by the second stage 208B of the adder tree. In a next iteration, the processor 101 may update the value of LI to 2, corresponding to the number of 11-bit intermediate results 203 received by the third stage 208C, and in a final iteration, or at the last stage 208D, the value of LI will be 1, corresponding to the final sum 206.
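
The loop formed by blocks 510 and 512 can be illustrated with a short sketch. The function name `level_inputs` is an assumption made for illustration; only the update rule LI = ceil(LI/2) and the test LI >= 2 are taken from the description.

```python
def level_inputs(n_operands: int) -> list[int]:
    """Return LI for each stage, starting with the first stage (LI = N)."""
    li = n_operands
    levels = [li]
    while li >= 2:              # block 510: is this an earlier (non-final) stage?
        li = (li + 1) // 2      # block 512: LI = ceil(LI / 2), in integer arithmetic
        levels.append(li)
    return levels

print(level_inputs(7))
```

For N = 7 this yields one LI value per stage 208A-208D of the example tree.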

At block 514, the processor 101 may calculate the average truncation value for the level inputs of a next stage 208 based on an average truncation value. To do so, the processor 101 may update the stage 208 of the adder tree 200 under examination after updating the value of LI and, similar to block 508, determine the average truncation value resulting from the truncation of intermediate results 203 at the stage 208 under examination. However, while the processor 101 may use distribution-based data (e.g., the assumption that the operands are uniformly distributed) to determine the truncation value at block 506, the processor may determine the truncation value based on an average at block 508. For example, the processor 101 may calculate the truncation value of the first stage 208A of the third adder tree 400 at block 506 and subsequently, at block 512, determine a truncation value of the second stage 208B when LI = 4, of the third stage 208C when LI = 2, and of the fourth stage 208D when LI = 1. Because the processor 101 may determine the truncation value based on an average truncation value for each intermediate result 203 of the adder tree 200, truncating a single bit with a maximum value of 8 (e.g., 2³) in the second stage 208B will have an average value of 4. Thus, across four 11-bit intermediate results 203, the processor 101 may determine an average truncation value of 16 for the second stage 208B. The maximum value of the truncated LSB is 8 because, although it is a single bit, the LSB truncated from the 11-bit intermediate result 203 in the second stage 208B is the fourth bit relative to the original 13-bit operand 201. Further, for the third stage 208C, the processor 101 may calculate an average truncation value of 8 and a total truncation value of 16 across the two 11-bit intermediate results 203, because the maximum value of the truncated LSB, which is in the fifth bit position relative to the 13-bit operand 201, is 16. For the fourth stage 208D, the processor 101 may calculate an average truncation value and a total truncation value of 16, because a single LSB, in the sixth bit position relative to the 13-bit operand 201, is truncated from the 11-bit final sum 206.

Accordingly, when the method 500 loops back, the processor may, at block 508, add the most recently calculated truncation value to the total average truncation value to account for the truncation value of each stage 208 of the adder tree 200. Thus, for the third adder tree 400, each time it reaches block 508 the processor 101 may iteratively add to the total average truncation value: 24.5 for the first stage 208A, 16 for the second stage 208B, 16 for the third stage 208C, and 16 for the fourth stage 208D, for a total average truncation value of 72.5.
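
The accumulation across stages can be sketched as follows. This is a minimal sketch, not the method 500 itself: it assumes uniformly distributed LSBs at the first stage and assumes that every later stage drops exactly one LSB whose weight doubles from stage to stage, as in the FIG. 9 example. The names `total_average_truncation`, `w_in`, `w_out`, and `n` are illustrative.

```python
def total_average_truncation(w_in: int, w_out: int, n: int):
    """Return (per-stage truncation totals, overall total) for the example tree shape."""
    dropped = w_in - w_out
    # First stage (block 506): each operand loses `dropped` LSBs, mean (2^k - 1) / 2.
    stage_totals = [n * (2 ** dropped - 1) / 2]
    li = n
    bit_weight = 2 ** dropped        # weight of the single LSB dropped at the next stage
    while li >= 2:                   # block 510
        li = (li + 1) // 2           # block 512: LI = ceil(LI / 2)
        # Block 514 (assumed form): a single bit of this weight averages weight / 2.
        stage_totals.append(li * bit_weight / 2)
        bit_weight *= 2
    return stage_totals, sum(stage_totals)   # block 508 accumulates the running sum

totals, overall = total_average_truncation(13, 10, 7)
print(totals, overall)
```

With wIn = 13, wOut = 10, and N = 7, the per-stage values match the 24.5, 16, 16, and 16 stated above.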

After the processor 101 determines at block 510 that LI is less than 2, the processor 101 may, at block 516, round the average truncation value to a nearest value representable in the adder tree, which may be determined based on the location of the constants (e.g., A-F). For example, in FIG. 9, because the constants A-C each represent a single bit carried into each of the first adders 401A, and because the constants A-C are carried into the fourth bit position relative to the 13-bit operands 201, they may each represent an 8 or a 0. Similarly, the constants D-E, carried into the set of second adders 401B at the fifth bit position, may represent a 16 or a 0, and the constant F, carried into the third adder 401C at the sixth bit position, may represent a 32 or a 0. Thus, A*8 + B*8 + C*8 + D*16 + E*16 + F*32, where A-F are integer values, is a representative equation of a value representable in the adder tree 200. A total average truncation value of 72.5 may therefore be approximated in the adder tree 200 by 72 by setting A = 1, F = 2, and all other constants to 0, among other combinations of the constants.
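
One way to illustrate the rounding of block 516 is a brute-force search over small integer values of the constants. This is a sketch under stated assumptions, not the disclosed procedure: the search strategy, the name `nearest_constants`, and the upper bound `max_value = 2` per constant are all hypothetical (the bound is chosen only because the example above uses F = 2).

```python
from itertools import product

# Bit weights of the constants relative to the 13-bit operands, as in the text.
WEIGHTS = {"A": 8, "B": 8, "C": 8, "D": 16, "E": 16, "F": 32}

def nearest_constants(target: float, max_value: int = 2):
    """Return (representable total, constants) closest to `target`."""
    best = None
    for values in product(range(max_value + 1), repeat=len(WEIGHTS)):
        total = sum(v * w for v, w in zip(values, WEIGHTS.values()))
        err = abs(total - target)
        if best is None or err < best[0]:
            best = (err, total, dict(zip(WEIGHTS, values)))
    return best[1], best[2]

total, constants = nearest_constants(72.5)
print(total, constants)
```

Several combinations reach the same total, so the particular assignment returned depends on the search order; A = 1, F = 2 from the text is one of them, since 1*8 + 2*32 = 72.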

After determining a value for the average truncation value that is representable in the adder tree 200, for example after determining a suitable combination of constants that most closely matches the average truncation value, the processor 101 may return the constants to be added at the corresponding stages of the adder tree 200. Accordingly, the constants may be incorporated into the design of the adder tree 200 when it is generated in an integrated circuit 12.

While the method 500 is described with reference to the third adder tree 400 of FIG. 9, it should be noted that the method may be applied to any suitable adder tree with suitable values for wIn, wOut, and N.

Further, in the embodiment described above, at block 506 the processor 101 may determine the truncation value at the first level of the adder tree 200 based on an assumption that the values of the operands 201 are uniformly distributed. Additionally or alternatively, the processor 101 may use data regarding the operands 201 and/or determine that the values of the operands 201 are not uniformly distributed. With reference to FIG. 11, for example, given an unsigned 8-bit by 5-bit multiplier, a table 600 may capture a distribution of the 3 LSBs in the resulting 13-bit product, which may be input into the adder tree 200 as a 13-bit operand 201. For example, the table 600 may illustrate the distribution of each of the possible values, which may range discretely from 0 to 7, of the 3 LSBs in a product that may result from each of the possible inputs to the multiplier. As illustrated, for any given input to the multiplier, the value of the 3 LSBs is most likely 0. Accordingly, a weighted average of the value of the 3 LSBs may land at 2.375, as indicated by the line 602, rather than at the unweighted average of 3.5. As such, when the processor 101 determines at block 506 that, for example, the operands 201 result from an unsigned 8-bit by 5-bit multiplier, the processor 101 may use the distribution information provided in the table 600 to determine that the truncation value for each of the 7 operands 201 is 2.375 and that the total for the operands 201 is 16.625. In these embodiments, the total average truncation value for the adder tree 200 determined by the processor may have improved accuracy, which may lead to a set of constants that more accurately corrects for the truncation errors.
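
A weighted average of this kind can be sketched as below. The entries of table 600 are not reproduced in the text, so the sketch takes a distribution as input rather than reconstructing one; the function name `weighted_mean` and the dictionary layout (LSB value mapped to a count or probability) are assumptions for illustration.

```python
def weighted_mean(distribution: dict[int, float]) -> float:
    """Mean LSB value, weighting each value by its count or probability."""
    total_weight = sum(distribution.values())
    return sum(value * weight for value, weight in distribution.items()) / total_weight

# A uniform distribution over 0..7 recovers the unweighted average from block 506.
uniform = {value: 1 for value in range(8)}
print(weighted_mean(uniform))

# With the per-operand value stated in the text, the first-stage total for 7 operands:
per_operand = 2.375
print(7 * per_operand)   # 16.625
```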

Further, in some embodiments, the set of constants may be updated dynamically based on a current distribution of the LSBs of the operands. For example, while, as discussed with reference to FIGS. 9 and 10, the processor 101 may determine the set of constants that may reduce a truncation error in an adder tree 200 before the adder tree 200 is built, in some embodiments the data processing system 100 and/or the processor 101 may periodically update the values of the set of constants after the adder tree 200 is built. Accordingly, FIG. 12 illustrates an embodiment of an adder tree system 620 that may change the values of a set of constants (e.g., A-F) added into the adder tree 200.

In these embodiments, a register and/or location in the memory 102 may map to each of the set of constants (e.g., A-F), so that while the adder tree 200 executes an arithmetic operation, the value of each of the set of constants (e.g., A-F) may be retrieved and input into the adder tree 200. Further, as illustrated, a processing system, such as the data processing system 100, may receive data regarding the LSBs 622 from the adder tree 200. The data regarding the LSBs 622 may include information such as the value of the LSBs as well as the stage 208 of the adder tree 200 from which the LSBs are received. For example, the data processing system 100 may receive data regarding the LSBs 622 for LSBs at each stage 208 within the adder tree 200. Further, the data processing system 100 may include LSB distribution logic 624. A suitable combination of components (e.g., the processor 101 and the memory 102) of the data processing system 100 may implement and/or contribute to the LSB distribution logic 624. In any case, the LSB distribution logic 624 may receive the data regarding the LSBs 622 and determine and/or update a distribution of LSBs for any suitable stage 208 (e.g., a stage 208 from which data regarding the LSBs 622 was received). Thus, the data processing system 100 may maintain one or more sets of LSB distributions, as illustrated in FIG. 11.
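
A software analogue of per-stage LSB bookkeeping might look like the following sketch. It is purely illustrative: the class name `LsbDistributionTracker` and its methods are hypothetical, and the disclosure does not specify how the LSB distribution logic 624 stores or updates its distributions.

```python
from collections import Counter, defaultdict

class LsbDistributionTracker:
    """Keeps one histogram of observed truncated-LSB values per adder tree stage."""

    def __init__(self) -> None:
        self._counts: defaultdict[int, Counter] = defaultdict(Counter)

    def record(self, stage: int, lsb_value: int) -> None:
        """Register one observed LSB value for the given stage."""
        self._counts[stage][lsb_value] += 1

    def mean(self, stage: int) -> float:
        """Average truncated value observed so far at the given stage."""
        counts = self._counts[stage]
        samples = sum(counts.values())
        if samples == 0:
            raise ValueError(f"no LSB data recorded for stage {stage}")
        return sum(value * count for value, count in counts.items()) / samples

tracker = LsbDistributionTracker()
for observed in (0, 0, 4, 7):        # made-up sample values for stage 0
    tracker.record(0, observed)
print(tracker.mean(0))
```

The per-stage means produced this way could then feed the same summation and rounding steps used when the constants are first determined.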

The calculation logic 626 may use the one or more sets of LSB distributions maintained by the LSB distribution logic 624 to determine a suitable set of constants (e.g., A-F) to input into the adder tree 200 to reduce a truncation error. To determine the suitable set of constants (e.g., A-F), as discussed above, the calculation logic 626 may sum the truncation error values for each stage 208 of the adder tree 200 and determine the nearest value of the total truncation error that is representable by the constants input into the adder tree 200, based on the location (e.g., stage 208) at which the constants are input (e.g., based on an equation such as A*8 + B*8 + C*8 + D*16 + E*16 + F*32). After the calculation logic 626 has calculated the suitable set of constants, the data processing system 100 may transmit the suitable set of constants to their respective registers to update a current value associated with each constant stored in a respective register. Accordingly, an updated value for the set of constants may flow into the adder tree 200 via the respective set of registers.

In some embodiments, in addition or as an alternative to following the method 500 to calculate a set of constants suitable for offsetting a truncation error, a set of pre-calculated fixed adjustment values may be used without analyzing weights or applications. In one case, for example, these values may be fixed at twice the number of input values into the tree. In another case, heuristics may be used to determine a most likely adjustment value.

In any case, a constant number may be added to the adder tree 200, which may be accomplished in several ways. A simple method may involve adding the constant number directly to an output of the adder tree 200. For example, the adder tree 200 may calculate and round its intermediate results 203 and final sum 206 without any change to the tree, and after the final sum 206 is generated, the constant number may be added to the final sum 206. This method, however, may add a cycle of latency and additional soft logic adder resources. If, however, there is an unpaired tuple in the adder tree 200, the constant may be added into the unpaired tuple with less latency than adding the constant to the final sum 206 of the adder tree 200. Further, in some embodiments, by using soft logic in conjunction with the embedded ripple-carry adders of modern FPGAs 40 to add the constant, no additional latency or area may be needed to correct the truncation error.

Accordingly, FIG. 13 illustrates an embodiment of an adder tree node 207 mapped to a 2-input adder 650. The 2-input adder 650 may receive a first 4-bit operand A (e.g., A1-A4) and a second 4-bit operand B (e.g., B1-B4) and output a 4-bit result S (e.g., S1-S4). To generate the 4-bit result S, the 2-input adder 650 may include a ripple-carry adder 654 for each bit in the first 4-bit operand A and/or the second 4-bit operand B. For example, the first bit of the first 4-bit operand A (A1) and the first bit of the second 4-bit operand B (B1) may map to a first ripple-carry adder 654A, the second bit of the first 4-bit operand A (A2) and the second bit of the second 4-bit operand B (B2) may map to a second ripple-carry adder 654B, the third bit of the first 4-bit operand A (A3) and the third bit of the second 4-bit operand B (B3) may map to a third ripple-carry adder 654C, and the fourth bit of the first 4-bit operand A (A4) and the fourth bit of the second 4-bit operand B (B4) may map to a fourth ripple-carry adder 654D. Each ripple-carry adder 654 may thus output a single bit (e.g., S1-S4) to form the 4-bit result S and pass a carry bit to the next ripple-carry adder 654, where it may be added into the sum calculated by that ripple-carry adder 654. In some embodiments, the first ripple-carry adder 654A may receive a carry bit even though no ripple-carry adder 654 precedes the operation of the first ripple-carry adder 654A, which may affect the resulting sum in the 4-bit result S.
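
The behavior of such a chain of one-bit adders can be modeled in software as below. This is a behavioral sketch only, not a description of the FPGA hardware; the function names are hypothetical, and the bit lists are assumed to be ordered least significant bit first (so that A1 and B1 are taken to be the LSBs).

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit adder cell: returns (sum bit, carry out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out

def ripple_carry_add(a_bits, b_bits, carry_in: int = 0):
    """Add two equal-length bit lists (LSB first); returns (sum bits, final carry)."""
    carry = carry_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bit, carry = full_adder(a, b, carry)   # carry ripples to the next cell
        sum_bits.append(sum_bit)
    return sum_bits, carry

a = [1, 0, 1, 0]   # 5, LSB first
b = [1, 1, 0, 0]   # 3, LSB first
print(ripple_carry_add(a, b))               # 5 + 3
print(ripple_carry_add(a, b, carry_in=1))   # 5 + 3 + 1: the carry-in shifts the result
```

The second call shows how a carry bit fed into the first cell changes the 4-bit result, which is the mechanism by which a one-bit constant can be folded into an existing adder.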

Further, each operand bit (e.g., A1-A4 and/or B1-B4) may not map directly to a respective adder (e.g., 654A-654D); instead, a set of soft logic blocks 652 associated with the adders 654 may process the operand bits and output a result to the adders 654. For example, in the illustrated embodiment, a soft logic block 652 receives the first bit of the first 4-bit operand A (A1) and outputs a result based on that bit to the first ripple-carry adder 654A, and a soft logic block 652 receives the first bit of the second 4-bit operand B (B1) and outputs a result based on that bit. In some embodiments, the operand bits (e.g., A1-A4 and/or B1-B4) may pass through the soft logic blocks 652 unchanged before reaching the adders 654. For example, the 2-input adder 650 may behave as if the soft logic blocks 652 were not present between the operand bits (e.g., A1-A4 and/or B1-B4) and the adders 654, or as if the operand bits were coupled directly to the adders 654. In some embodiments, however, the soft logic blocks may affect the 4-bit result S of the 2-input adder 650.

Referring to FIG. 14, the soft logic blocks 652 may emulate a constant (e.g., constant compression) for a 3-2 compressor structure 700. In these embodiments, additional connections between the soft logic blocks 652 and the adders 654 may be available to implement 3-input adders by first generating a 3-2 compression. Accordingly, two operands, A and B, may route to multiple soft logic blocks 652 simultaneously. Because an integrated circuit 12 may have many redundant connections available, this routing may not significantly stress local routing. The soft logic blocks 652 may directly encode a third operand, which may be a constant. A constant (e.g., the third operand) may therefore be added anywhere in the tree without additional logic, routing, or latency impact.
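The arithmetic behind a 3-2 compression followed by a 2-input addition can be sketched as below. This is a minimal model under stated assumptions, not the structure 700 itself: operands are non-negative Python integers, and the names `compress_3_2` and `add_with_constant` are hypothetical. The per-bit XOR and majority functions stand in for what the soft logic blocks would compute when the third operand is a fixed constant.

```python
def compress_3_2(a, b, c, width):
    """Reduce three operands to a sum vector and a carry vector.

    Per bit position: sum = a XOR b XOR c, carry = majority(a, b, c).
    The identity a + b + c == sum_vec + 2 * carry_vec always holds.
    """
    mask = (1 << width) - 1
    sum_vec = (a ^ b ^ c) & mask
    carry_vec = ((a & b) | (a & c) | (b & c)) & mask
    return sum_vec, carry_vec


def add_with_constant(a, b, constant, width):
    """Three-input add: 3-2 compression, then one 2-input addition."""
    sum_vec, carry_vec = compress_3_2(a, b, constant, width)
    # The carry vector has weight 2, so it is shifted left by one bit.
    return sum_vec + (carry_vec << 1)


total = add_with_constant(9, 6, 5, width=4)  # 9 + 6 + 5
```

Because the constant is known ahead of time, the XOR and majority functions of each bit position depend only on the two variable operand bits, which is why a lookup table per bit can absorb the constant.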

Returning to FIG. 13, the soft logic blocks 652 may, in some embodiments, include a LUT. The sum bits (e.g., S1-S4) and the carry bits generated by a respective ripple-carry adder 654 may therefore be determined based on mappings in the LUT of the soft logic blocks 652. Further, in some embodiments, the soft logic blocks 652 may determine the carry bit received by the first ripple-carry adder 654A based in part on LUTs. In these embodiments, the soft logic blocks 652 may account for a rounding error associated with truncating an LSB. For example, the soft logic blocks 652 may determine the carry bit that the LSB would have contributed to the 4-bit result S had it not been truncated, and add that carry bit into the first ripple-carry adder 654A to reduce the rounding error that would otherwise result. To do so, a bit truncated from the first operand A (A0) and a sign bit of the first operand A (SA) may feed into a first soft logic block 652. The first soft logic block 652 may include a LUT that maps the inputs A0 and SA to an output (A0 XOR SA), the exclusive OR of A0 and SA. Likewise, a bit truncated from the second operand B (B0) and a sign bit of the second operand B (SB) may feed into a second soft logic block 652. The second soft logic block may include a LUT that maps the inputs B0 and SB to an output (B0 XOR SB), the exclusive OR of B0 and SB. The outputs of the first soft logic block 652 and the second soft logic block 652 may feed into a ripple-carry adder 654. In some embodiments, the ripple-carry adder 654 may also receive SA as a carry-in bit. The ripple-carry adder 654 may therefore generate a sum bit and a carry-out bit based on the addition of (A0 XOR SA), (B0 XOR SB), and SA (e.g., (A0 XOR SA) + (B0 XOR SB) + SA). Accordingly, Table 1 shows the possible combinations of SA, SB, A0, and B0 and the sum and carry that result from each combination.

Table 1. (A0 XOR SA) + (B0 XOR SB) + SA

Sign A | Sign B | A0 | B0 | Sum | Carry
0 | 0 | 0 | 0 | 0 | 0
0 | 0 | 0 | 1 | 1 | 0
0 | 0 | 1 | 0 | 1 | 0
0 | 0 | 1 | 1 | 0 | 1
0 | 1 | 0 | 0 | 1 | 0
0 | 1 | 0 | 1 | 0 | 0
0 | 1 | 1 | 0 | 0 | 1
0 | 1 | 1 | 1 | 1 | 0
1 | 0 | 0 | 0 | 0 | 1
1 | 0 | 0 | 1 | 1 | 1
1 | 0 | 1 | 0 | 1 | 0
1 | 0 | 1 | 1 | 0 | 1
1 | 1 | 0 | 0 | 1 | 1
1 | 1 | 0 | 1 | 0 | 1
1 | 1 | 1 | 0 | 0 | 1
1 | 1 | 1 | 1 | 1 | 0
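The entries of Table 1 follow directly from the expression (A0 XOR SA) + (B0 XOR SB) + SA, and can be regenerated with a short enumeration. The sketch below is illustrative only; the function name `truncated_lsb_adder` and the dictionary layout are assumptions made for the example.

```python
from itertools import product


def truncated_lsb_adder(sa, sb, a0, b0, carry_in):
    """Model one adder cell fed by two XOR lookup tables.

    Returns (sum_bit, carry_out) of (a0 XOR sa) + (b0 XOR sb) + carry_in.
    """
    total = (a0 ^ sa) + (b0 ^ sb) + carry_in
    return total & 1, total >> 1


# Table 1: the sign bit SA is also supplied as the carry-in.
table1 = {
    (sa, sb, a0, b0): truncated_lsb_adder(sa, sb, a0, b0, carry_in=sa)
    for sa, sb, a0, b0 in product((0, 1), repeat=4)
}

for (sa, sb, a0, b0), (sum_bit, carry) in sorted(table1.items()):
    print(sa, sb, a0, b0, sum_bit, carry)
```

The printed rows appear in the same order as Table 1, with Sign A as the most significant column.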

Further, as discussed, the soft logic blocks 652 may use the LUTs to determine the carry bit that the addition of A0 and B0 would have contributed to the summation of the first 4-bit operand A and the second 4-bit operand B had A0 and B0 not been truncated. For two's complement operands, however, both SA and SB may be added to (A0 XOR SA) + (B0 XOR SB) to generate the appropriate carry bit. The carry bit generated according to Table 1 (e.g., according to the result of the ripple-carry adder 654) does not include the contribution of SB because, for example, the ripple-carry adder 654 can receive only a single carry-in bit. Consequently, the carry bit may feed into the first ripple-carry adder 654A of the 2-input adder 650, and SB may be added at a suitable bit position in any suitable portion of the 2-input adder 650 and/or a later portion of an adder tree 200.

Further, in some embodiments, the SA bit may not be able to feed into the ripple-carry adder 654 because of the location of the ripple-carry adder 654. For example, the ripple-carry adder 654 may not receive a carry-in bit. In these embodiments, the output of the adder may represent the summation (A0 XOR SA) + (B0 XOR SB) without the contribution of SA. Table 2 therefore shows the results of this summation without the contribution of SA for each combination of SA, SB, A0, and B0.

Table 2. (A0 XOR SA) + (B0 XOR SB)

Sign A | Sign B | A0 | B0 | Sum | Carry
0 | 0 | 0 | 0 | 0 | 0
0 | 0 | 0 | 1 | 1 | 0
0 | 0 | 1 | 0 | 1 | 0
0 | 0 | 1 | 1 | 0 | 1
0 | 1 | 0 | 0 | 1 | 0
0 | 1 | 0 | 1 | 0 | 0
0 | 1 | 1 | 0 | 0 | 1
0 | 1 | 1 | 1 | 1 | 0
1 | 0 | 0 | 0 | 1 | 0*
1 | 0 | 0 | 1 | 0 | 1
1 | 0 | 1 | 0 | 0 | 0
1 | 0 | 1 | 1 | 1 | 0*
1 | 1 | 0 | 0 | 0 | 1
1 | 1 | 0 | 1 | 1 | 0*
1 | 1 | 1 | 0 | 1 | 0*
1 | 1 | 1 | 1 | 0 | 0

The bits in the Carry column marked with an asterisk (*) may represent carry bit errors caused by the absence of the SA carry-in to the ripple-carry adder 654. For example, the asterisked carries of Table 2 are incorrect compared with Table 1 for the same combinations of SA, SB, A0, and B0 (e.g., {SA, SB, A0, B0} = {1, 0, 0, 0}, {1, 0, 1, 1}, {1, 1, 0, 1}, {1, 1, 1, 0}). Therefore, to account for the missing SA carry-in, the LUTs and/or the soft logic blocks 652 may include additional logic to force a carry-out value of 1 for the asterisked combinations of SA, SB, A0, and B0.
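The asterisked rows can be found mechanically by computing the carry with and without the SA carry-in and comparing the two. The following is a sketch under assumptions: `lsb_adder`, `mismatches`, and `forced_carry` are hypothetical names, and the forcing logic is modeled as a simple set lookup rather than as any particular LUT encoding.

```python
from itertools import product


def lsb_adder(sa, sb, a0, b0, carry_in):
    """Return (sum_bit, carry_out) of (a0 XOR sa) + (b0 XOR sb) + carry_in."""
    total = (a0 ^ sa) + (b0 ^ sb) + carry_in
    return total & 1, total >> 1


combos = list(product((0, 1), repeat=4))  # (SA, SB, A0, B0)
with_sa = {k: lsb_adder(*k, carry_in=k[0]) for k in combos}  # Table 1
without_sa = {k: lsb_adder(*k, carry_in=0) for k in combos}  # Table 2

# Rows whose carry differs between the two tables (the asterisked rows).
mismatches = sorted(k for k in combos if with_sa[k][1] != without_sa[k][1])


def forced_carry(sa, sb, a0, b0):
    """Carry-out with a 1 forced for the mismatching combinations."""
    key = (sa, sb, a0, b0)
    if key in mismatches:
        return 1
    return without_sa[key][1]
```

With the forcing applied, `forced_carry` agrees with the Table 1 carry for all sixteen combinations.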

In some embodiments, packing may be improved by removing an LSB from a level in a multiplier. A multiplier may compute a product by generating a set of partial products, each at a different level, before adding them all together. For example, in a first level of a multiplier, a partial product may be generated by multiplying (e.g., logical AND) a first bit of the multiplier in a multiplication operation by each bit of the multiplicand in the multiplication operation. In a second level, a partial product may be generated by multiplying a second bit of the multiplier by each bit of the multiplicand. Removing an LSB from a level in a multiplier may therefore involve removing an LSB from a multiplier and a multiplicand involved in a multiplication operation. Assuming that the multiplication operation is a signed-magnitude operation (e.g., both the multiplier and the multiplicand are in signed-magnitude format), the error associated with truncating (e.g., removing) the LSBs of the multiplier and the multiplicand may be accounted for by computing the carry that the multiplier and the multiplicand would have produced at the level from which the LSBs were removed. To do so, the LSB of the output (e.g., the partial product) of the multiplier may be determined both at the multiplier level from which the LSBs are removed and at the following multiplier level. For example, for a multiplier A with an LSB A0 that is truncated at a first multiplier level and a multiplicand B with an LSB B0 that is truncated at the first multiplier level, the output LSB of the first multiplier level (e.g., A0 AND B0) may be determined, and the output LSB of the second multiplier level, which is associated with the second bit A1 of the multiplier and the second bit B1 of the multiplicand (e.g., A1 AND B1), may be determined. The contribution of the LSBs A0 and B0 may then be determined by taking the exclusive OR (XOR) of the output LSB of the first multiplier level and a sign bit of the output (Sign1) of the first multiplier level (e.g., (A0 AND B0) XOR Sign1), by taking the XOR of the output LSB of the second multiplier level and a sign bit of the output (Sign2) of the second multiplier level (e.g., (A1 AND B1) XOR Sign2), and by adding the results of these two operations (e.g., ((A0 AND B0) XOR Sign1) + ((A1 AND B1) XOR Sign2)). Logic for the operations discussed above may be included in at least a portion of one or more ALMs. In some embodiments, the logic to compute ((A1 AND B1) XOR Sign2) may be included in an MSB half or MSB portion of an ALM, because the A1 and B1 bits are in a more significant bit position than the A0 and B0 bits. The LSB half of the ALM may include logic to compute ((A0 AND B0) XOR Sign1) + ((A0 AND B0) XOR Sign1), the result of which may force a carry into the MSB, so that the contribution of A0 and B0 is applied to A1 and B1 even though A0 and B0 are truncated. In these embodiments, the multiplier may pack more efficiently into an integrated circuit 12, and the final product computed by the multiplier may remain unchanged because the contribution of the truncated bits (e.g., A0 and B0) is taken into account.
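One way to read the carry-forcing step is that adding a bit to itself always yields a sum bit of 0 and a carry-out equal to that bit, so the LSB half can hand its value to the MSB half purely through the carry chain. The sketch below models that reading only; it is an interpretation rather than the ALM logic itself, and the names `lsb_half`, `msb_half`, and `combined` are hypothetical.

```python
def lsb_half(a0, b0, sign1):
    """Add t = (a0 AND b0) XOR sign1 to itself.

    Returns (sum_bit, carry_out). Since t + t == 2 * t, the sum bit is
    always 0 and the carry-out equals t.
    """
    t = (a0 & b0) ^ sign1
    total = t + t
    return total & 1, total >> 1


def msb_half(a1, b1, sign2, carry_in):
    """Add u = (a1 AND b1) XOR sign2 to the incoming carry."""
    u = (a1 & b1) ^ sign2
    total = u + carry_in
    return total & 1, total >> 1


def combined(a0, b0, sign1, a1, b1, sign2):
    """Chain the two halves; the result represents t + u."""
    _, carry = lsb_half(a0, b0, sign1)
    return msb_half(a1, b1, sign2, carry)


sum_bit, carry_out = combined(1, 1, 0, 1, 1, 0)  # t = 1, u = 1
```

Under this reading the two-bit output of `combined` equals ((A0 AND B0) XOR Sign1) + ((A1 AND B1) XOR Sign2), which matches the sum named in the paragraph above.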

Further, in some embodiments, packing may be improved by removing an MSB from the result of the first stage 208A of an adder tree 200. In particular, in embodiments that involve the addition of two or more signed-magnitude operands 201 in the first stage 208A, a sign bit (e.g., MSB) of one or more of the results of the addition may be truncated. In these embodiments, a suitable sign bit may be inserted later, such as by being added to the final sum 206 of the adder tree 200. For example, if a first operand 201A with seven bits in signed magnitude is added to a second operand 201B with six bits in signed magnitude, the result of the addition would include seven data bits and an eighth bit that handles sign extension. The MSB of the result is therefore the sign bit. However, if both the sign of the first operand 201A and the sign of the second operand 201B are 0 (e.g., the first operand 201A and the second operand 201B are positive), the sign bit of the result will also be 0. If both the sign of the first operand 201A and the sign of the second operand 201B are 1 (e.g., the first operand 201A and the second operand 201B are negative), the sign bit of the result will also be 1, and if the sign of the first operand 201A does not match the sign of the second operand 201B, the sign bit of the result will match the sign of the first operand 201A (e.g., the MSB of the first operand 201A). Accordingly, the MSB of the result may be encoded based on the sign bits of the first operand 201A and the second operand 201B. The MSB of the result may therefore be truncated and later decoded in order to be added back into the intermediate result 203 or the final sum 206 of the adder tree 200. As a result, resources associated with maintaining the MSB at each stage 208 of the adder tree are removed (e.g., decoupled) from the adder tree because the MSB is not included in the result, which may lead to more efficient packing of the adder tree.

Referring now to FIG. 15, a block floating-point tree 800 illustrates a method of using soft logic in pruned adder trees 200 (e.g., adder trees 200 with truncated operands 201) to increase the accuracy of the final sum 206. The block floating-point tree 800 may include a set of XOR gates 802. Each XOR gate 802 may be configured to receive two MSBs 805 of the operands 201 that are input to the block floating-point tree 800. At each XOR gate 802, the two MSBs 805 of each operand 201 are examined to predict a possible overflow in a following stage 208 of the block floating-point tree 800. The XOR gates 802 are thus used to examine the dynamic range of the operands 201 in the first stage 208A. An OR gate 804 then logically ORs the results from the XOR gates 802 together to generate a single determining factor for the entire first stage 208A. The determining factor may feed into a set of multiplexers 806 (MUXes) as a select signal used to choose between the operands 201 and the operands 201 shifted right by one bit (e.g., with the LSB 202 truncated), which are input to the MUXes 806 via shift blocks 808. For example, if an XOR gate 802 coupled to one of the operands 201 in the first stage 208A predicts a possible overflow due to the respective operand 201, the OR gate 804 may select the operands 201 shifted right by one bit from each of the MUXes 806 to account for bit growth. If, on the other hand, none of the XOR gates 802 predicts a possible overflow due to one of the operands 201, the OR gate 804 may select the unchanged operand 201 from each MUX 806. In either case, the outputs of the MUXes 806 may be added together in two sets of two to generate the intermediate results 203.
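The overflow prediction and operand selection of a single stage can be modeled in a few lines. This is a behavioral sketch, not the gate-level structure of FIG. 15: it assumes two's complement operands held as Python integers of a fixed `width`, and the names `top_two_bits_differ` and `select_stage` are introduced only for illustration.

```python
def top_two_bits_differ(value, width):
    """XOR of the two most significant bits of a width-bit value.

    A result of 1 means the value uses the upper half of its range, so
    adding two such values could overflow width bits.
    """
    msb = (value >> (width - 1)) & 1
    next_msb = (value >> (width - 2)) & 1
    return msb ^ next_msb


def select_stage(values, width):
    """Model the XOR gates, the OR gate, and the multiplexers of one stage.

    Returns (selected_values, flag). If any operand is flagged, every
    operand is shifted right by one bit; otherwise all pass unchanged.
    """
    flag = 0
    for value in values:
        flag |= top_two_bits_differ(value, width)  # OR of all XOR outputs
    if flag:
        selected = [value >> 1 for value in values]  # arithmetic shift
    else:
        selected = list(values)
    return selected, flag


selected, flag = select_stage([100, -20, 30, 5], width=8)
```

In this example only the operand 100 is flagged, yet all four operands are shifted, because one shared select signal drives every multiplexer in the stage.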

Similar to the first stage 208A of the block floating-point tree 800, in the second stage 208B of the block floating-point tree 800 the two MSBs 805 of the intermediate results 203 may feed into a second set of XOR gates 802B. The second set of XOR gates 802B may predict a possible overflow of the intermediate results 203 and may be coupled to a second OR gate 804B, so that the second OR gate 804B may select between the intermediate results 203 and the intermediate results 203 shifted right by one bit via the shift blocks 808B, based on the overflow predictions for the intermediate results 203. The second OR gate 804B may select the intermediate results 203 if no overflow is predicted for any of the intermediate results 203, and may select the intermediate results 203 shifted right by one bit if at least one intermediate result 203 is predicted to overflow.

In the third stage 208C of the block floating-point tree 800, the outputs selected from the second set of MUXes 806B are added together to generate the final sum 206 with a suitable number of bits. Further, the outputs of the OR gate 804A and the second OR gate 804B are summed by an adder structure 810 (e.g., an adder tree 200) to generate a common block floating-point exponent (e.g., normalization factor) for the block floating-point tree 800; however, the adder structure 810 may sum the outputs of the OR gates (e.g., 804A-804B) at any suitable location in the block floating-point tree 800 and is not limited to summing the outputs in the third stage 208C. In some embodiments, the adder structure 810 may handle any suitable number of inputs and/or sum the inputs in any suitable number of stages 208.
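Putting the stages together, the tree produces a mantissa (the final sum) and an exponent (the count of stages that shifted). The following is a behavioral sketch under assumptions: operands are two's complement Python integers of a fixed `width`, the operand count is a power of two, and the names `needs_shift` and `block_fp_sum` are hypothetical. It is meant to illustrate the data flow, not to reproduce the circuit of FIG. 15.

```python
def needs_shift(values, width):
    """OR together the XOR of the top two bits of every value."""
    flag = 0
    for value in values:
        msb = (value >> (width - 1)) & 1
        next_msb = (value >> (width - 2)) & 1
        flag |= msb ^ next_msb
    return flag


def block_fp_sum(operands, width):
    """Sum operands pairwise, shifting a whole stage right on predicted overflow.

    Returns (mantissa, exponent) such that mantissa * 2**exponent
    approximates the exact sum. Assumes len(operands) is a power of two.
    """
    values = list(operands)
    exponent = 0
    while len(values) > 1:
        flag = needs_shift(values, width)
        if flag:
            values = [value >> 1 for value in values]
        exponent += flag  # running sum of the per-stage OR outputs
        # Add neighbouring pairs to form the next stage's inputs.
        values = [values[i] + values[i + 1] for i in range(0, len(values), 2)]
    return values[0], exponent


mantissa, exponent = block_fp_sum([100, 90, -20, 30], width=8)
```

For these operands the exact sum is 200, which does not fit in eight signed bits, while the sketch returns a mantissa of 49 with an exponent of 2 (49 × 4 = 196). The small difference comes from the LSBs dropped by the two shifts.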

In some cases, the output of the adder structure 810 may be a block floating-point exponent (e.g., normalization factor), and the final sum 206 may represent a block floating-point mantissa. In some embodiments, the block floating-point representation may use an integer mantissa. Further, the mantissa may or may not be normalized, and may be represented in signed-magnitude or other formats.

While the techniques involved in addition using the block floating-point tree 800 may offer increased accuracy compared to some of the other embodiments of adder trees 200 described herein, it may be difficult to implement large trees (e.g., dot products with many elements) directly with this technique. For example, because there is a large fan-in of results from the XOR gates 802A and the second set of XOR gates 802B into the OR gate 804A and the second OR gate 804B, respectively, and a large fan-out of the select signals from the OR gate 804A and the second OR gate 804B to the MUXes 806A and 806B, respectively, the block floating-point tree 800 may suffer long delays that can reduce performance.

Therefore, in some embodiments, a stage 208 of the block floating-point tree 800 may be skipped to reduce the complexity associated with these techniques. Accordingly, FIG. 16 illustrates a simplified block floating-point tree 850 in which the operands 201 of the first stage 208A are not adjusted (e.g., selected) before they are summed in the second stage 208B. Thus, a result of the first OR gate 802A passes directly to the adder structure 810. Further, because a stage 208 is skipped, the output of the OR gate 804 may, in order to adjust the intermediate results 203 in the second stage 208B accordingly, select at the MUXes 806 between the intermediate results 203 and the intermediate results shifted right by two bits. A set of shift blocks 808 may shift the intermediate results 203 right by two bits to account for an overflow of the first stage 208A and of the second stage 208B at the same time. Accordingly, the OR gate 804 may receive inputs from the XOR gates 802 to select the outputs of the MUXes 806. In these embodiments, each XOR gate 802 may receive the first three MSBs 805 of a respective operand 201 so that the OR gate 804 can predict an overflow in the first stage 208A and/or the second stage 208B.
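The behavior of the simplified tree described above can be modeled in software. The following is a minimal sketch under stated assumptions, not the circuit of FIG. 16: it assumes four signed two's-complement operands, it models the XOR gates 802 as a check that the three most significant bits of an operand are not all equal, and the function names are hypothetical.

```python
def top_bits_differ(x: int, width: int, k: int = 3) -> bool:
    """Return True if the k most significant bits of the width-bit
    two's-complement encoding of x are not all equal (assumed model of
    the XOR-style check on the MSBs of one operand)."""
    bits = x & ((1 << width) - 1)      # two's-complement encoding of x
    top = bits >> (width - k)          # the k most significant bits
    return top not in (0, (1 << k) - 1)


def simplified_bfp_sum(operands, width):
    """Sum four operands in two stages, skipping the first-stage
    adjustment. Returns (mantissa, exponent) with the sum approximated
    by mantissa * 2**exponent."""
    assert len(operands) == 4
    # OR of all per-operand checks: one select signal for the whole block.
    select = any(top_bits_differ(x, width) for x in operands)
    shift = 2 if select else 0
    # First stage: operands are added without adjustment.
    s0 = operands[0] + operands[1]
    s1 = operands[2] + operands[3]
    # Second stage: intermediate results are taken as-is or shifted right
    # by two bits (Python's >> is an arithmetic shift for negative ints).
    mantissa = (s0 >> shift) + (s1 >> shift)
    return mantissa, shift


print(simplified_bfp_sum([1, 2, 3, 4], 8))
print(simplified_bfp_sum([100, 50, -3, 7], 8))
```

In this model, when no operand is flagged every operand lies in the range -32 to 31 for an 8-bit width, so the sum of four operands still fits in 8 bits. When an operand is flagged, the two-bit shift absorbs the growth of both stages at the cost of the two discarded low-order bits.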

In some embodiments, the OR gate 804 may output a select signal comprising a single bit or a set of bits to select either the intermediate results 203 (e.g., shifted by 0 bits) or the intermediate results shifted right by 2 bits. For example, in some embodiments the select signal may equal the value of the selected shift operation (e.g., 0 or 2), which may involve using two bits, and in other embodiments the select signal may use a single bit that encodes a shift of 0 or 2. In either case, the adder structure 810 may still sum all of the shift values and adjust its output based on the type of select signal used (e.g., a single bit or a set of bits). In some embodiments, the encoded select signals may be accounted for elsewhere in the integrated circuit 12.
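The two select-signal encodings can be illustrated with a short sketch. It assumes that the adjustment for a single-bit encoding is a doubling of the summed bits, since each set bit stands for a shift of 2; the text does not specify the adjustment, and the function name is hypothetical.

```python
def block_exponent(selects, one_bit_encoding):
    """Sum the select signals of all stages into one block exponent.

    selects: one value per stage; either the shift amount itself (0 or 2)
    or, with one_bit_encoding, a single bit (0 or 1) standing for it.
    """
    total = sum(selects)
    # Assumed adjustment: with the single-bit encoding each set bit
    # represents a shift of 2, so the summed bits are doubled.
    return total << 1 if one_bit_encoding else total


print(block_exponent([2, 0, 2], one_bit_encoding=False))
print(block_exponent([1, 0, 1], one_bit_encoding=True))
```

Both calls describe the same sequence of shifts, so both yield the same block exponent.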

In addition or as an alternative to skipping stages 208, as illustrated in FIG. 16, the complexity associated with the techniques described with reference to FIG. 15 may be reduced by using a set of block floating-point trees 800 and/or a set of simplified block floating-point trees 850. For example, a set of operands 201 may be divided into subsets of operands 201, and each of a set of block floating-point trees 800 and/or simplified block floating-point trees 850 may receive and sum a different subset of operands 201. The number of operands 201 received by any one block floating-point tree 800 and/or simplified block floating-point tree 850 is thereby reduced, which reduces the fan-in and fan-out that can cause delays in a large block floating-point tree 800. The results of each of the block floating-point trees 800 and/or simplified block floating-point trees 850 may then be combined into a single block floating-point representation.

Accordingly, FIG. 17 illustrates an embodiment of a block floating-point combination tree 900 that may implement the summation of multiple block floating-point trees 800. In the illustrated embodiment, a first block floating-point tree 800A, a second block floating-point tree 800B, and a third block floating-point tree 800C each output both a block floating-point exponent and a block floating-point mantissa. The block floating-point exponents from each block floating-point tree (e.g., 800A-800C) may be sorted by circuitry 902, which may select and output the highest block floating-point exponent received from the block floating-point trees 800A-800C. A set of subtractors 904 may be coupled to the circuitry 902 and may subtract each block floating-point exponent from this highest block floating-point exponent. The outputs of the subtractors 904 may then feed a set of shifters 906A-906C to normalize the block floating-point mantissas. Thus, each of the shifters 906A-906C may shift a respective block floating-point mantissa of a corresponding block floating-point tree (e.g., 800A-800C) right by a number of bits equal to the respective output of the subtractor 904. For example, in some embodiments the first shifter 906A may not shift the block floating-point mantissa of the first block floating-point tree 800A at all, because the block floating-point exponent of the first block floating-point tree 800A may be the highest block floating-point exponent. In such cases, the second shifter 906B may, for example, shift the second block floating-point mantissa of the second block floating-point tree 800B right by three bits if the result of the subtractor 904 subtracting the block floating-point exponent of the second block floating-point tree 800B from the highest block floating-point exponent is three. Further, the third shifter 906C may, for example, shift the third block floating-point mantissa of the third block floating-point tree 800C right by two bits based on the output of the respective subtractor 904 that is input to the third shifter 906C. The shifters 906A-906C may be relatively small and/or shallow logic structures, because the block floating-point exponents are likely to be small. For example, as described with reference to FIG. 15, the exponents are typically incremented by at most 1 at each stage 208 of the block floating-point tree 800, so the shift operations needed to normalize the block floating-point mantissas are likely to be small as well.

Accordingly, each of the block floating-point mantissas output by the shifters 906A-906C may be normalized with respect to the highest block floating-point exponent, and the block floating-point mantissas may therefore feed into a final block floating-point tree 800D to be summed. The final block floating-point tree 800D may receive the block floating-point mantissas as operands and produce a final block floating-point mantissa and a final block floating-point exponent based on the summation of each of the block floating-point trees 800A-800C.
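The alignment and final summation performed by the combination tree can be sketched as follows. This is an illustrative model only: the function name is hypothetical, the final block floating-point tree 800D is replaced by a plain integer sum as a simplifying assumption, and the sample values are chosen to reproduce the shift amounts of zero, three, and two bits used in the example above.

```python
def combine_bfp(partials):
    """Combine (mantissa, exponent) pairs from several trees.

    Returns (mantissa, exponent) with the total approximated by
    mantissa * 2**exponent.
    """
    # Circuitry 902: pick the highest block exponent.
    max_exp = max(exp for _, exp in partials)
    # Subtractors 904 and shifters 906: shift each mantissa right by the
    # difference between the highest exponent and its own exponent.
    aligned = [man >> (max_exp - exp) for man, exp in partials]
    # Final tree 800D, modeled here as a plain sum (assumption).
    return sum(aligned), max_exp


# Shifts are 0, 3 and 2 bits for the three partial results.
mantissa, exponent = combine_bfp([(40, 3), (25, 0), (12, 1)])
print(mantissa, exponent)
```

For these values the exact total is 40·8 + 25 + 12·2 = 369, while the combined result represents 46·8 = 368. The difference comes from the low-order bits discarded by the right shifts.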

Further, in some embodiments, the block floating-point combination tree 900 may include any suitable combination of block floating-point trees 800 and/or simplified block floating-point trees 850. That is, it should be noted that FIG. 17 and its description are intended to be illustrative and not limiting.

While the embodiments set forth in the present disclosure may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. It should be understood, however, that the disclosure is not intended to be limited to the particular forms disclosed. The disclosure is intended to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the following appended claims.

Embodiments of the current application

The following numbered clauses define embodiments of the current application.

Clause A1. An integrated circuit having an adder tree configured to generate a sum based at least in part on an output and an additional value, the adder tree comprising:

first input circuitry configured to receive a first operand, wherein the first operand comprises a first plurality of bits;
second input circuitry configured to receive a second operand, wherein the second operand comprises a second plurality of bits;
soft logic circuitry configured to separate one or more bits from the first plurality of bits to generate a first subset operand and configured to separate an additional one or more bits from the second plurality of bits to generate a second subset operand;
adder circuitry configured to generate the output based at least in part on the first subset operand and the second subset operand; and
additional circuitry configured to generate the additional value based at least in part on the one or more bits.

Clause A2. The integrated circuit of clause A1, wherein the additional circuitry comprises a subsequent adder tree, wherein the subsequent adder tree comprises additional adder circuitry configured to generate the additional value based at least in part on a summation of the one or more bits and the additional one or more bits.

Clause A3. The integrated circuit of clause A2, wherein the subsequent adder tree is configured to separate a bit from the one or more bits to generate a subset of the one or more bits, and wherein the additional adder circuitry is configured to generate the additional value based at least in part on a summation of the subset of the one or more bits and the additional one or more bits.

Clause A4. The integrated circuit of any one of clauses A1 or A2, wherein the additional circuitry is configured to generate the additional value based in part on a distribution of possible values of the one or more bits.

Clause A5. The integrated circuit of any one of clauses A1, A2, or A4, wherein the soft logic circuitry comprises the additional circuitry and is configured to generate the additional value by emulating a constant compression of the additional value.

Clause A6. The integrated circuit of any one of clauses A1, A2, A4, or A5, wherein the soft logic circuitry comprises a lookup table configured to generate the additional value based in part on the one or more bits.

Clause A7. The integrated circuit of any one of clauses A1, A2, A4, A5, or A6, wherein the one or more bits comprise one or more least significant bits.

Clause A8. The integrated circuit of any one of clauses A1, A2, A4, A5, A6, or A7, wherein the adder tree is configured to append the additional value to the output, to prepend the additional value to the output, or a combination thereof.

Clause A9. The integrated circuit of any one of clauses A1, A2, A4, A5, A6, A7, or A8, wherein the adder tree is configured to generate the sum based at least in part on a summation of the additional value and the output.

Clause A10. The integrated circuit of any one of clauses A1, A2, A4, A5, A6, A7, A8, or A9, wherein the adder tree is configured to perform a multiplication operation of a multiplicand and a multiplier, wherein the first operand comprises the multiplicand, wherein the second operand comprises the multiplier, wherein the adder circuitry is configured to generate the output based at least in part on a first partial product of the first subset operand and the second subset operand and a second partial product of the first subset operand and the second subset operand, and wherein the additional circuitry is configured to generate the additional value based at least in part on a first least significant bit of a third partial product of the multiplicand and the multiplier and a second least significant bit of a fourth partial product of the multiplicand and the multiplier.

Clause A11. The integrated circuit of clause A10, wherein the sum comprises a product of the multiplication operation, and wherein the additional value comprises a carry-in value to the output.

Clause A12. The integrated circuit of any one of clauses A1, A2, A4, A5, A6, A7, A8, A9, or A10, wherein the one or more bits comprise one or more most significant bits.

Clause A13. The integrated circuit of clause A12, wherein the one or more most significant bits comprise one or more sign bits, and wherein the additional value comprises a sign bit decoded based in part on the one or more bits and the additional one or more bits.

Clause A14. An integrated circuit having an adder tree stage, the adder tree stage comprising:

adder circuitry comprising an input and configured to generate an output and a normalization factor based at least in part on the input; and
input circuitry configured to:

receive an operand, wherein the operand comprises a plurality of bits;
determine a most significant bit of the plurality of bits; and
selectively route, based at least in part on the most significant bit, the operand or a subset operand to the input, wherein the subset operand comprises a subset of the plurality of bits.

Clause A15. The integrated circuit of clause A14, wherein, upon selectively routing the subset operand to the input, the input circuitry is configured to increment the normalization factor.

Clause A16. The integrated circuit of clause A14 or A15, wherein the input circuitry is configured to selectively route the operand or the subset operand to an additional input of additional adder circuitry, wherein the additional adder circuitry is disposed within an additional adder tree stage.

Clause A17. The integrated circuit of clause A16, wherein a number of bits in the subset of the plurality of bits depends in part on whether the input circuitry is configured to route the subset operand to the adder circuitry or to the additional adder circuitry.

Clause A18. A tangible, non-transitory, machine-readable medium comprising machine-readable instructions that, when executed by one or more processors, cause the processors to:

determine a number of operands for input to an adder tree;
determine a bit width of each of the operands;
determine a second bit width of an output of the adder tree;
determine a number of removable bits to separate from each of the operands based at least in part on the second bit width;
determine a value based at least in part on an additional value of the removable bits; and
construct the adder tree, wherein the adder tree is configured to:
- receive the operands as inputs;
- separate the removable bits from each of the operands to generate a plurality of subset operands; and
- generate an output based in part on a sum of the subset operands and the value.
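The steps recited in clause A18 can be pictured with a small software model. Everything in this sketch is an assumption made for illustration: the function names are hypothetical, the rule that derives the number of removable bits from the output width is one plausible choice, and the value is taken to be the rounded expected sum of the removed bits under a uniform distribution (one reading of the distribution mentioned in clause A19). Operands are assumed to be non-negative integers.

```python
def plan_removable_bits(num_operands, operand_width, output_width):
    """Assumed rule: remove enough low-order bits that the full-precision
    sum, truncated, fits the requested output width."""
    growth = (num_operands - 1).bit_length()   # ceil(log2(num_operands))
    return max(0, operand_width + growth - output_width)


def correction_value(num_operands, removable_bits):
    """Expected sum of the removed bits, expressed in units of the
    truncated least significant bit and rounded half up. Assumes the
    removed bits are uniformly distributed."""
    r = removable_bits
    numerator = num_operands * ((1 << r) - 1)  # n * (2**r - 1)
    return (numerator + (1 << r)) >> (r + 1)   # round(numerator / 2**(r+1))


def truncated_sum(operands, removable_bits, value):
    """Sum the subset operands and add the precomputed value."""
    return sum(x >> removable_bits for x in operands) + value


operands = [5, 6, 7, 8, 9, 10, 11, 12]
r = plan_removable_bits(len(operands), operand_width=8, output_width=9)
value = correction_value(len(operands), r)
print(r, value, truncated_sum(operands, r, value))
```

With these inputs two bits are removed from each operand, the value is 3, and the truncated sum is 17. This equals the exact sum of 68 divided by 4, although in general the correction is only accurate on average.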

Clause A19. The machine-readable medium of clause A18, wherein the machine-readable instructions, when executed by one or more processors, cause the processors to determine the value based at least in part on a distribution of possible additional values of the removable bits.

Clause A20. The machine-readable medium of any one of the preceding clauses, wherein the value comprises a carry-in value to the sum.

Clause B1. An integrated circuit having an adder tree configured to generate a sum based at least in part on an output and an additional value, the adder tree comprising:

Klausel B2. Integrierte Schaltung nach Klausel B1, wobei die zusätzlichen Schaltungen einen nachfolgenden Addiererbaum umfassen, wobei der nachfolgende Addiererbaum zusätzliche Addiererschaltungen umfasst, die konfiguriert sind, den zusätzlichen Wert zumindest teilweise auf Grundlage einer Summierung des einen oder der mehreren Bits und des/der zusätzlichen einen oder mehreren Bits zu erzeugen.clause B2 , Integrated circuit after clause B1 wherein the additional circuitry comprises a subsequent adder tree, the subsequent adder tree comprising additional adder circuits configured to generate the additional value based at least in part on a summation of the one or more bits and the additional one or more bits.

Clause B3. Integrated circuit according to clause B2, wherein the subsequent adder tree is configured to separate a bit from the one or more bits to produce a subset of the one or more bits, and wherein the additional adder circuits are configured to generate the additional value based at least in part on a summation of the subset of the one or more bits and the additional one or more bits.

Clause B4. Integrated circuit according to any one of clauses B1 or 2, wherein the additional circuitry is configured to generate the additional value based in part on a distribution of possible values of the one or more bits.

Clause B5. Integrated circuit according to any one of clauses B1, 2 or 4, wherein the soft logic circuits comprise the additional circuitry and are configured to generate the additional value by emulating a constant compression of the additional value.

Clause B6. Integrated circuit according to any one of clauses B1, 2, 4 or 5, wherein the soft logic circuits comprise a look-up table configured to generate the additional value based in part on the one or more bits.

Clause B7. Integrated circuit according to any one of clauses B1, 2, 4, 5 or 6, wherein the one or more bits comprise one or more least significant bits.

Clause B8. Integrated circuit according to any one of clauses B1, 2, 4, 5, 6 or 7, wherein the adder tree is configured to append the additional value to the output, to prepend the additional value to the output, or a combination thereof.

Clause B9. Integrated circuit according to any one of clauses B1, 2, 4, 5, 6, 7 or 8, wherein the adder tree is configured to generate the sum based at least in part on a summation of the additional value and the output.

Clause B10. Integrated circuit according to any one of clauses B1, 2, 4, 5, 6, 7, 8 or 9, wherein the adder tree is configured to perform a multiplication operation of a multiplicand and a multiplier, wherein the first operand comprises the multiplicand, wherein the second multiplier comprises the multiplier, wherein the adder circuits are configured to generate the output based at least in part on a first partial product of the first subset operand and the second subset operand and a second partial product of the first subset operand and the second subset operand, and wherein the additional circuitry is configured to generate the additional value based at least in part on a first least significant bit of a third partial product of the multiplicand and the multiplier and a second least significant bit of a fourth partial product of the multiplicand and the multiplier.

Clause B11. Integrated circuit according to clause B10, wherein the sum comprises a product of the multiplication operation and wherein the additional value comprises a carry-in value to the output.

Clause B12. Integrated circuit according to any one of clauses B1, 2, 4, 5, 6, 7, 8, 9 or 10, wherein the one or more bits comprise one or more most significant bits.

Clause B13. Integrated circuit according to clause B12, wherein the one or more most significant bits comprise one or more sign bits, and wherein the additional value comprises a sign bit that is decoded based in part on the one or more bits and the additional one or more bits.

Clause B14. Integrated circuit having an adder tree stage, the adder tree stage comprising: adder circuits comprising an input and configured to generate an output and a normalization factor based at least in part on the input; and
input circuits configured to:

Clause B15. Integrated circuit according to clause B14, wherein, in selectively routing the subset operand to the input, the input circuits are configured to increment the normalization factor.

Clause B16. Integrated circuit according to clause B14 or 15, wherein the input circuits are configured to selectively route the operand or the subset operand to an additional input of additional adder circuits, wherein the additional adder circuits are arranged within an additional adder tree stage.

Clause B17. Integrated circuit according to clause B16, wherein a number of bits in the subset of the plurality of bits depends in part on whether the input circuits are configured to route the subset operand to the adder circuits or to the additional adder circuits.

Clause B18. A method of building an adder tree, comprising:

determining a number of operands for input to an adder tree;
determining a bit width of each of the operands;
determining a second bit width of an output of the adder tree;
determining a number of removable bits to separate from each of the operands based at least in part on the second bit width;
determining a value based at least in part on an additional value of the removable bits; and implementing the adder tree with circuits configured to:
- receive the operands as inputs;
- separate the removable bits from each of the operands to produce a plurality of subset operands; and
- generate an output based in part on a sum of the subset operands and the value.
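As an illustration only, the steps of clause B18 can be sketched in Python. Everything named here is hypothetical and not taken from the disclosure: the function names, the rule that just enough least significant bits are removed for the sum to fit the output width, and the assumption that the removed bits are uniformly distributed, so that the value is their expected sum rounded to the output scale.

```python
def removable_bits(num_operands, operand_width, output_width):
    """Assumed rule: drop just enough LSBs for the sum to fit the output width."""
    # A sum of n operands of w bits needs w + ceil(log2(n)) bits.
    full_width = operand_width + (num_operands - 1).bit_length()
    return max(0, full_width - output_width)


def compensation_value(num_operands, k):
    """Expected sum of the removed k-bit fields, in units of 2**k.

    Assumes each removed field is uniform on [0, 2**k - 1], so its mean is
    (2**k - 1) / 2. The result is rounded half up using integer arithmetic.
    """
    if k == 0:
        return 0
    return (num_operands * (2**k - 1) + 2**k) // 2 ** (k + 1)


def truncated_sum(operands, k, value):
    """Sum the subset operands (upper bits only) and add the constant value."""
    return sum(x >> k for x in operands) + value


k = removable_bits(num_operands=8, operand_width=8, output_width=8)
value = compensation_value(8, k)
approx = truncated_sum([255] * 8, k, value)
```

With eight 8-bit operands and an 8-bit output, this sketch removes 3 bits per operand and uses a constant of 4. The result is expressed in units of 2**k, so it approximates the exact sum divided by 2**k.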

Clause B19. Method according to clause B18, wherein determining the value comprises determining the value based at least in part on a distribution of possible additional values of the removable bits.

Clause B20. Method according to clause B18 or 19, wherein the value comprises a carry-in value to the sum.

Clause B21. A tangible, non-transitory, machine-readable medium comprising machine-readable instructions that, when executed by one or more processors, cause the processors to perform the method according to clause B18, 19 or 20.

Clause C1. Integrated circuit having an adder tree configured to generate a sum based at least in part on an output and an additional value, the adder tree comprising:

Clause C2. Integrated circuit according to clause C1, wherein the additional circuitry comprises a subsequent adder tree, the subsequent adder tree comprising additional adder circuits configured to generate the additional value based at least in part on a summation of the one or more bits and the additional one or more bits.

Clause C3. Integrated circuit according to clause C1 or 2, wherein the additional circuitry is configured to generate the additional value based in part on a distribution of possible values of the one or more bits.

Clause C4. Integrated circuit according to clause C1, 2 or 3, wherein the soft logic circuits comprise the additional circuitry and are configured to generate the additional value by emulating a constant compression of the additional value.

Clause C5. Integrated circuit according to clause C1, 2, 3 or 4, wherein the soft logic circuits comprise a look-up table configured to generate the additional value based in part on the one or more bits.

Clause C6. Integrated circuit according to clause C1, 2, 3, 4 or 5, wherein the adder tree is configured to append the additional value to the output, to prepend the additional value to the output, or a combination thereof.

Clause C7. Integrated circuit according to clause C1, 2, 3, 4, 5 or 6, wherein the adder tree is configured to generate the sum based at least in part on a summation of the additional value and the output.

Clause C8. Integrated circuit according to clause C1, 2, 3, 4, 5, 6 or 7, wherein the adder tree is configured to perform a multiplication operation of a multiplicand and a multiplier, wherein the first operand comprises the multiplicand, wherein the second multiplier comprises the multiplier, wherein the adder circuits are configured to generate the output based at least in part on a first partial product of the first subset operand and the second subset operand and a second partial product of the first subset operand and the second subset operand, and wherein the additional circuitry is configured to generate the additional value based at least in part on a first least significant bit of a third partial product of the multiplicand and the multiplier and a second least significant bit of a fourth partial product of the multiplicand and the multiplier.

Clause C9. Integrated circuit according to clause C8, wherein the sum comprises a product of the multiplication operation and wherein the additional value comprises a carry-in value to the output.

Clause C10. Integrated circuit according to clause C1, 2, 3, 4, 5, 6, 7 or 8, wherein the one or more bits comprise one or more most significant bits, one or more least significant bits, or a combination thereof.

Clause C11. Integrated circuit according to clause C10, wherein the one or more most significant bits comprise one or more sign bits, and wherein the additional value comprises a sign bit that is decoded based in part on the one or more bits and the additional one or more bits.

Clause C12. Integrated circuit having an adder tree stage, the adder tree stage comprising: adder circuits comprising an input and configured to generate an output and a normalization factor based at least in part on the input; and
input circuits configured to:

Clause C13. Integrated circuit according to clause C12, wherein, in selectively routing the subset operand to the input, the input circuits are configured to increment the normalization factor.

Clause C14. A tangible, non-transitory, machine-readable medium comprising machine-readable instructions that, when executed by one or more processors, cause the processors to:

Clause C15. Machine-readable medium according to clause C14, wherein the value comprises a carry-in value to the sum.

CITATIONS INCLUDED IN THE DESCRIPTION

This list of documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

US 62532871 [0001]

Claims

An integrated circuit having an adder tree configured to generate a sum based at least in part on an output and an additional value, the adder tree comprising:
first input circuits configured to receive a first operand, the first operand comprising a first plurality of bits;
second input circuits configured to receive a second operand, the second operand comprising a second plurality of bits;
soft logic circuits configured to separate one or more bits from the first plurality of bits to generate a first subset operand and configured to separate an additional one or more bits from the second plurality of bits to generate a second subset operand;
adder circuits configured to generate the output based at least in part on the first subset operand and the second subset operand; and
additional circuitry configured to generate the additional value based at least in part on the one or more bits.
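A minimal behavioral sketch of this two-operand arrangement follows, under stated assumptions: the separated bits are the k least significant bits of each operand, and the additional value is the carry produced by summing those bits. The function names and the parameter k are hypothetical. The claim covers other choices of separated bits and other ways of deriving the additional value.

```python
def split_operand(x, k):
    """Separate the k least significant bits; return (subset_operand, removed_bits)."""
    return x >> k, x & ((1 << k) - 1)


def adder_tree_sum(a, b, k):
    """Sum of the subset operands plus an additional value from the removed bits."""
    a_sub, a_low = split_operand(a, k)
    b_sub, b_low = split_operand(b, k)
    output = a_sub + b_sub                    # models the adder circuits
    additional_value = (a_low + b_low) >> k   # models the additional circuitry
    return output + additional_value


result = adder_tree_sum(0b101111, 0b010011, 2)
```

Under these assumptions the result equals the full-precision sum shifted right by k bits, because the carry out of the removed bits is restored by the additional value.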

Integrated circuit according to claim 1, wherein the additional circuitry comprises a subsequent adder tree, the subsequent adder tree comprising additional adder circuits configured to generate the additional value based at least in part on a summation of the one or more bits and the additional one or more bits.

Integrated circuit according to claim 2, wherein the subsequent adder tree is configured to separate a bit from the one or more bits to produce a subset of the one or more bits, and wherein the additional adder circuits are configured to generate the additional value based at least in part on a summation of the subset of the one or more bits and the additional one or more bits.

Integrated circuit according to any one of claims 1 or 2, wherein the additional circuitry is configured to generate the additional value based in part on a distribution of possible values of the one or more bits.

Integrated circuit according to any one of claims 1, 2 or 4, wherein the soft logic circuits comprise the additional circuitry and are configured to generate the additional value by emulating a constant compression of the additional value.

Integrated circuit according to any one of claims 1, 2, 4 or 5, wherein the soft logic circuits comprise a look-up table configured to generate the additional value based in part on the one or more bits.
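One way to picture such a look-up table is sketched below. It assumes the table is indexed by the concatenation of two k-bit separated fields and stores the carry of their sum. The names build_carry_lut and lut_additional_value, and the indexing scheme, are hypothetical and are not taken from the disclosure.

```python
def build_carry_lut(k):
    """Precompute the carry of (a_low + b_low) for every pair of k-bit fields."""
    size = 1 << k
    # The index is (a_low << k) | b_low, so the table has 2**(2*k) entries.
    return [((idx >> k) + (idx & (size - 1))) >> k for idx in range(size * size)]


def lut_additional_value(lut, a_low, b_low, k):
    """Read the additional value for the two removed fields from the table."""
    return lut[(a_low << k) | b_low]


lut = build_carry_lut(2)
carry = lut_additional_value(lut, 0b11, 0b10, 2)
```

In this sketch the table replaces a small adder: the removed bits address the table directly and the stored entry is the additional value.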

Integrated circuit according to any one of claims 1, 2, 4, 5 or 6, wherein the one or more bits comprise one or more least significant bits.

Integrated circuit according to any one of claims 1, 2, 4, 5, 6 or 7, wherein the adder tree is configured to append the additional value to the output, to prepend the additional value to the output, or a combination thereof.

Integrated circuit according to any one of claims 1, 2, 4, 5, 6, 7 or 8, wherein the adder tree is configured to generate the sum based at least in part on a summation of the additional value and the output.

Integrated circuit according to any one of claims 1, 2, 4, 5, 6, 7, 8 or 9, wherein the adder tree is configured to perform a multiplication operation of a multiplicand and a multiplier, wherein the first operand comprises the multiplicand, wherein the second multiplier comprises the multiplier, wherein the adder circuits are configured to generate the output based at least in part on a first partial product of the first subset operand and the second subset operand and a second partial product of the first subset operand and the second subset operand, and wherein the additional circuitry is configured to generate the additional value based at least in part on a first least significant bit of a third partial product of the multiplicand and the multiplier and a second least significant bit of a fourth partial product of the multiplicand and the multiplier.

Integrated circuit according to claim 10, wherein the sum comprises a product of the multiplication operation and wherein the additional value comprises a carry-in value to the output.

Integrated circuit according to any one of claims 1, 2, 4, 5, 6, 7, 8, 9 or 10, wherein the one or more bits comprise one or more most significant bits.

Integrated circuit according to claim 12, wherein the one or more most significant bits comprise one or more sign bits, and wherein the additional value comprises a sign bit that is decoded based in part on the one or more bits and the additional one or more bits.

Integrated circuit having an adder tree stage, the adder tree stage comprising: adder circuits comprising an input and configured to generate an output and a normalization factor based at least in part on the input; and input circuits configured to: receive an operand, the operand comprising a plurality of bits; determine a most significant bit of the plurality of bits; and, based at least in part on the most significant bit, selectively route the operand or a subset operand to the input, the subset operand comprising a subset of the plurality of bits.

Integrated circuit according to claim 14, wherein, in selectively routing the subset operand to the input, the input circuits are configured to increment the normalization factor.
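The input-circuit behavior of the two preceding claims can be pictured with the following sketch. It rests on assumptions that the claims do not spell out: the subset operand is taken to be the upper width - 1 bits of the operand, and it is selected when the most significant bit is set. The function name route_operand and its parameters are hypothetical.

```python
def route_operand(operand, width, norm_factor):
    """Return (value routed to the adder input, updated normalization factor)."""
    msb = (operand >> (width - 1)) & 1
    if msb:
        # Assumed policy: drop the least significant bit and record the
        # scaling by incrementing the normalization factor.
        return operand >> 1, norm_factor + 1
    # MSB clear: the operand already fits in width - 1 bits and passes unchanged.
    return operand, norm_factor


routed_large, norm_large = route_operand(0b1011, 4, 0)
routed_small, norm_small = route_operand(0b0011, 4, 0)
```

In both cases the routed value fits in width - 1 bits, and shifting it left by the normalization factor recovers the operand to within the discarded bit.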

The integrated circuit of any one of the preceding claims, wherein the input circuits are configured to selectively route the operand or the subset operand to an additional input of additional adder circuits, wherein the additional adder circuits are arranged within an additional adder tree stage.

Integrated circuit according to claim 16, wherein a number of bits in the subset of the plurality of bits depends in part on whether the input circuits are configured to route the subset operand to the adder circuits or to the additional adder circuits.

A tangible, non-transitory, machine-readable medium comprising machine-readable instructions that, when executed by one or more processors, cause the processors to: determine a number of operands for input to an adder tree; determine a bit width of each of the operands; determine a second bit width of an output of the adder tree; determine a number of removable bits to separate from each of the operands based at least in part on the second bit width; determine a value based at least in part on an additional value of the removable bits; and build the adder tree such that it is configured to: receive the operands as inputs; separate the removable bits from each of the operands to produce a plurality of subset operands; and generate an output based in part on a sum of the subset operands and the value.

Machine-readable medium according to claim 18, wherein the machine-readable instructions, when executed by one or more processors, cause the processors to determine the value based at least in part on a distribution of possible additional values of the removable bits.

The machine-readable medium of any one of the preceding claims, wherein the value comprises a carry-in value to the sum.