WO2021185261A1 - Computing device, method, board card, and computer-readable storage medium - Google Patents

Info

Publication number
WO2021185261A1
Authority
WO
WIPO (PCT)
Prior art keywords
component
bit width
bit
value
width value
Prior art date
Application number
PCT/CN2021/081188
Other languages
English (en)
French (fr)
Inventor
刘少礼
周诗怡
刘道福
Original Assignee
安徽寒武纪信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 安徽寒武纪信息科技有限公司
Priority to JP2021576637A (JP7269382B2, ja)
Priority to EP21771952.5A (EP4024288B1, en)
Publication of WO2021185261A1 (zh)
Priority to US17/557,669 (US20220253280A1, en)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/491 Computations with decimal numbers radix 12 or 20.
    • G06F7/498 Computations with decimal numbers radix 12 or 20. using counter-type accumulators
    • G06F7/4981 Adding; Subtracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/499 Denomination or exception handling, e.g. rounding or overflow
    • G06F7/49942 Significance control
    • G06F7/49947 Rounding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • This disclosure generally relates to data processing. More specifically, the present disclosure relates to a computing device, method, integrated circuit board, and computer-readable storage medium for processing multi-bit width values.
  • The data bit width that a processor can handle may differ between types of processors.
  • For certain processors, the data bit width that can be processed is often limited.
  • The data bit width such a processor can handle usually does not exceed 16 bits, as with 16-bit integer data.
  • How to enable a processor with a limited bit width to process data of a larger bit width has therefore become a technical problem that needs to be solved.
  • To address this, the present disclosure provides a scheme for splitting multi-bit-width data.
  • With this scheme, multi-bit-width data can be split into at least two pieces of data with smaller bit widths, so that where the processing bit width of the processor is limited, the two smaller pieces of data can take part in the calculation instead.
  • The present disclosure provides a computing device for processing multi-bit width values for neural network operations, including: an input circuit configured to receive the multi-bit width value and configuration information, wherein the configuration information includes at least the bit width information of a first component and the bit width information of a second component that represent the multi-bit width value; a first component calculation circuit configured to calculate the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value; a second component calculation circuit configured to perform calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value; and an output circuit configured to output at least one of the first component and the second component.
  • The present disclosure provides an integrated circuit chip including the aforementioned computing device.
  • The present disclosure provides an integrated circuit board card, which includes the aforementioned integrated circuit chip.
  • The present disclosure provides a method for processing multi-bit width values for neural network operations, including: receiving the multi-bit width value and configuration information, wherein the configuration information includes at least the bit width information of a first component and the bit width information of a second component that represent the multi-bit width value; calculating the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value; performing calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value; and outputting at least one of the first component and the second component.
  • The present disclosure provides a computing device for processing multi-bit width values, including: a processor; and a memory for storing program instructions which, when executed by the processor, cause the computing device to execute the aforementioned method.
  • The present disclosure provides a computer-readable storage medium on which program instructions for processing multi-bit width values for neural network operations are stored; when the program instructions are run by a processor, the aforementioned method is executed.
  • The solution of the present disclosure can split a multi- (or high) bit width value into multiple small (or low) bit width values for expression, so that in artificial intelligence application scenarios such as neural network operations, or in other general scenarios, computation is not limited by the processing bit width of the processor and the computing power of the processor is fully utilized. Furthermore, in some neural network computing scenarios that require low-bit-width values, the solution of the present disclosure can also simplify the calculation of the neural network by splitting the multi-bit width value into multiple low-bit-width expressions, thereby improving calculation efficiency.
  • FIG. 1 is a simplified block diagram showing a computing device according to an embodiment of the present disclosure.
  • FIG. 2 is a detailed block diagram showing a computing device according to an embodiment of the present disclosure.
  • FIG. 3 is a flowchart showing a process for processing a multi-bit width value according to an embodiment of the present disclosure.
  • FIG. 4 is a structural diagram showing a combined processing device according to an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram showing the structure of a board card according to an embodiment of the present disclosure.
  • The solution of the present disclosure overcomes the limitation of the processor's bit width and reduces computational complexity by allowing a multi-bit width (for example, 24-bit) value to be expressed by at least two low-bit-width (for example, 16-bit and 8-bit) components, thereby improving the computational efficiency of, for example, neural network calculations.
  • According to the bit distribution of the input multi-bit width value and the configuration information, the source (or initial) value can be divided into a high-bit part and a low-bit part, and split calculation is performed on the high-bit part and the low-bit part respectively to obtain the first component and the second component corresponding to them.
  • According to the configuration information, the solution of the present disclosure can also decompose the aforementioned source value into the required number of components through repeated split calculation, to obtain three or more components.
  • FIG. 1 is a simplified block diagram showing a computing device 100 according to an embodiment of the present disclosure.
  • The computing device 100 can process multi-bit width values for use in various application scenarios, such as artificial intelligence applications including neural network operations, or general scenarios that need to split values for calculation.
  • The multi-bit width value may include two parts, a high-bit part and a low-bit part, for subsequent splitting into two or more components.
  • The aforementioned neural network operations may include various operations in training neural networks, such as weight update or gradient calculation during back propagation.
  • The computing device 100 includes an input circuit 102 configured to receive a multi-bit width value and configuration information, wherein the configuration information includes at least the bit width information of the first component and the bit width information of the second component that represent the multi-bit width value.
  • The configuration information may further include sign information indicating whether the bit width information of the second component includes a sign bit.
  • The configuration information may also include valid bit information, for example, which bits in the multi-bit width value are forcibly designated as valid bits, so that these valid bits are calculated in the subsequent split calculation.
  • The first component calculation circuit may be configured to calculate the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value.
  • The first component calculation circuit may determine a scaling factor according to the bit width information of the second component, and use the scaling factor to calculate the multi-bit width value to obtain the first component. For example, when the bit width of the first component is n1 and the bit width of the second component is n2, the scaling factor may be 2^(n2-1) when the sign bit is not considered in n2. In contrast, when the sign bit is considered to be included in n2, the scaling factor may be 2^n2.
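  • The choice of scaling factor described above can be sketched in a few lines of Python. The function name `scaling_factor` and the flag `second_has_sign_bit` are illustrative names, not part of the disclosure; the mapping of the flag to the two factors follows the reading of the preceding paragraph.

```python
def scaling_factor(n2: int, second_has_sign_bit: bool) -> int:
    """Return the scaling factor for a second component of bit width n2.

    Assumed reading of the text: 2**(n2 - 1) when the sign bit is not
    considered in n2, and 2**n2 when the sign bit is included in n2.
    """
    if n2 < 1:
        raise ValueError("n2 must be a positive bit width")
    return 1 << n2 if second_has_sign_bit else 1 << (n2 - 1)


print(scaling_factor(8, second_has_sign_bit=False))
print(scaling_factor(8, second_has_sign_bit=True))
```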
  • The computing device of the present disclosure may also compare the multi-bit width value with designated data to perform a corresponding numerical adjustment. For example, when the designated data is zero and the multi-bit width value is greater than or equal to zero, a given constant is added to the multi-bit width value to obtain the adjusted multi-bit width value. In contrast, when the multi-bit width value is less than zero, the given constant is subtracted from the multi-bit width value to obtain the adjusted multi-bit width value.
  • The second component calculation circuit may be configured to perform calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value.
  • When the sign information indicates that the bit width information of the second component does not include a sign bit, for example, when the highest bit of the second component bit width is not a sign bit, the second component calculation circuit may be configured to subtract the value of the first component from the adjusted multi-bit width value to obtain the second component.
  • The value of the first component may be the product of the first component and the aforementioned scaling factor.
  • When the sign bit is taken into account, the second component calculation circuit may be configured to determine an adjustment value according to the bit width information of the second component, and to perform calculation according to the multi-bit width value, the value of the first component, and the adjustment value to obtain the second component. It is understandable that in some scenarios the aforementioned configuration information may not include sign information. In this case, through, for example, initial default settings, the computing device of the present disclosure can be configured to directly perform the splitting operation with the sign bit considered, or without the sign bit considered, instead of deciding based on the configuration information whether to consider the sign bit.
  • Depending on how the aforementioned multi-bit width value is adjusted, various corresponding rounding operations may be performed on the intermediate value before the first component and the second component are obtained. These rounding operations can include rounding toward zero, rounding to the nearest value, rounding up, or rounding down. For example, after the adjusted multi-bit width value is calculated using the scaling factor, the obtained value may be rounded toward zero to obtain the first component. Similarly, after the value of the first component and the adjustment value are subtracted from the adjusted multi-bit width value, the obtained value may also be rounded toward zero to obtain the second component.
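  • To make the rounding modes concrete, the following Python sketch compares rounding toward zero, rounding down, and rounding up, and shows what happens when the sign-dependent adjustment by a constant is followed by rounding toward zero. The helper names `to_zero` and `adjust` and the default constant 0.5 are assumptions chosen for illustration.

```python
import math


def to_zero(x: float) -> int:
    """Round toward zero by discarding the fractional part."""
    return math.trunc(x)


def adjust(x: float, constant: float = 0.5) -> float:
    """Add the constant to non-negative values, subtract it from negative ones."""
    return x + constant if x >= 0 else x - constant


# Columns: value, toward zero, down, up, adjusted then toward zero.
for x in (2.4, 2.5, -2.4, -2.5):
    print(x, to_zero(x), math.floor(x), math.ceil(x), to_zero(adjust(x)))
```

  • In this sketch, adjusting and then rounding toward zero behaves like rounding to the nearest integer with ties away from zero, which is one way to see why the adjustment and the rounding mode are chosen together.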
  • The computing device of the present disclosure further includes an output circuit 108, which can be configured to output at least one of the first component and the second component.
  • The first component and the second component output by the output circuit can be used in various neural network calculations that need low-bit-width data, such as weight update and gradient calculation in back propagation during neural network training. In some application scenarios, the obtained first component and second component can also be stored directly in place of the multi-bit width value for later use.
  • The low-bit-width first component and second component output by the output circuit can be used for the fixed-point operations of a fixed-point processor.
  • In this way, the processor is freed from the limitation of not being able to process multi- or high-bit-width data, which expands the calculation scenarios of the fixed-point processor and simplifies the calculation, thereby also improving calculation efficiency and reducing calculation overhead.
  • FIG. 2 is a detailed block diagram showing a computing device 200 according to an embodiment of the present disclosure. It can be seen from FIG. 2 that the computing device 200 not only includes the input circuit 102, the first component calculation circuit 104, the second component calculation circuit 106, and the output circuit 108 of the computing device 100 in FIG. 1, but also shows the multiple circuits contained in the first and second component calculation circuits, as well as several additional devices. Since the functions of the input circuit, the first and second component calculation circuits, and the output circuit have been described in detail above in conjunction with FIG. 1, they will not be repeated below.
  • The computing device 200 may further include a type converter 110, which may be configured to convert the input data into the same data type as the multi-bit width value, that is, a data type supported by the first component calculation circuit and the second component calculation circuit.
  • In this way, the computing device of the present disclosure can split data whose type differs from the data type supported by the splitting operation. For example, when the computing device of the present disclosure supports the splitting of fixed-point values and the input circuit receives a floating-point value, the floating-point value can be converted into a fixed-point integer value by the type converter, so that the first component calculation circuit and the second component calculation circuit can split it.
  • Similarly, when the computing device of the present disclosure supports the splitting of floating-point values and the input circuit receives a fixed-point value, the fixed-point value can be converted into a floating-point value by the type converter, so that the first component calculation circuit and the second component calculation circuit can split it.
  • The computing device may further include a determination circuit 112, which may be configured to compare the multi-bit width value with the designated data and send the determination result to the addition circuit 114.
  • The addition circuit may be configured to perform an addition or subtraction operation on the multi-bit width value and a given constant based on the determination result, to obtain the adjusted multi-bit width value.
  • The addition circuit here may include a negation circuit, so that the subtraction operation can be converted into an addition operation.
  • The aforementioned addition circuit may be an adder that supports subtraction operations.
  • The adder is only exemplary. According to the teaching of the present disclosure, those skilled in the art can also arrange an adder and a subtractor separately to complete the addition operation and the subtraction operation respectively.
  • As further illustrated, the first component calculation circuit 104 may include a scaling circuit 1041 and a rounding circuit 1042.
  • The scaling circuit is configured to perform a shift operation on the adjusted multi-bit width value according to the aforementioned scaling factor. For example, when the scaling factor is 2^n2, the shift operation moves the multi-bit width value toward the higher bits by n2 bits. When the high bits are on the left side of the multi-bit width value and the low bits are on the right side, moving n2 bits toward the higher bits means moving n2 bits to the left.
  • The shift circuit here can be constructed from a multiplier.
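  • The equivalence between a shift toward the higher bits and a multiplication by a power of two, which is why a multiplier can stand in for the shifter, can be checked with a short Python sketch. The function name `shift_toward_high_bits` is an illustrative assumption.

```python
def shift_toward_high_bits(value: int, bits: int) -> int:
    """Shift an integer toward the higher bits (to the left) by `bits` positions."""
    return value << bits


n2 = 8
for value in (3, -3, 1000):
    shifted = shift_toward_high_bits(value, n2)
    # A left shift by n2 bits gives the same result as multiplying by 2**n2.
    print(value, shifted, shifted == value * 2 ** n2)
```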
  • The first component calculation circuit further includes a rounding circuit 1042, which may be configured to round the multi-bit width value after the shift operation to obtain the first component.
  • The rounding operation here may take various forms, such as rounding up, rounding down, and rounding toward zero. After such a rounding operation, the first component related to the high-bit part of the multi-bit width value is obtained.
  • The second component calculation circuit 106 may include a subtraction circuit 1061 configured to subtract the value of the first component and the adjustment value from the multi-bit width value to obtain the second component.
  • The value of the first component here may be the product of the first component and the scaling factor, and the adjustment value is the aforementioned value that takes the sign bit of the low-bit part into account.
  • The computing device 200 of the present disclosure additionally includes a selector 116, which may be configured to pass at least one of the first component and the second component to the output circuit 108 for output.
  • The selector 116 may select to output the first component, the second component, or both according to the information about the output items included in the configuration information.
  • Such selective output has technical advantages in some application scenarios. For example, when only the first component or the second component is required to participate in subsequent operations, the output circuit does not need to output both, thereby saving output overhead.
  • The computing device of the present disclosure may also calculate and output only the first component, thereby further saving calculation overhead.
  • The computing device of the present disclosure can also be used to decompose the multi-bit width value into multiple components, as specified by the user or required by the algorithm, for expression.
  • In this case, the aforementioned configuration information may include information on the number of components.
  • The computing device of the present disclosure repeatedly executes the first component calculation circuit and the second component calculation circuit according to the configuration information until the specified number of components is obtained.
  • For example, the first component calculation circuit and the second component calculation circuit can be used to split a 24-bit width value into an 8-bit-wide first component and a 16-bit-wide intermediate second component. The 16-bit-wide intermediate second component is then fed back into the first component calculation circuit and the second component calculation circuit to further split it into an 8-bit-wide second component and an 8-bit-wide third component.
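  • The repeated split of a 24-bit value into three 8-bit components can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's exact procedure: it uses the sign-bit variant (scaling factor 2^n2 and adjustment value 2^(n2-1)), adds the constant 0.5 unconditionally, and applies floor to both components, a combination chosen here because it reconstructs integer inputs exactly. All function names are hypothetical.

```python
import math
from typing import Tuple


def split_signed(value: int, n2: int) -> Tuple[int, int]:
    """Split `value` into (high, low), where low fits in n2 signed bits.

    Assumptions: scaling factor 2**n2, adjustment value 2**(n2 - 1),
    constant 0.5 added before rounding, floor() used for both parts.
    """
    scale = 1 << n2
    offset = 1 << (n2 - 1)
    adjusted = value + 0.5
    high = math.floor(adjusted / scale)
    low = math.floor(adjusted - high * scale - offset)
    return high, low


def merge_signed(high: int, low: int, n2: int) -> int:
    """Inverse of split_signed for integer inputs."""
    return high * (1 << n2) + low + (1 << (n2 - 1))


def split_24_into_three(value: int) -> Tuple[int, int, int]:
    first, middle = split_signed(value, 16)  # 8-bit first, 16-bit intermediate
    second, third = split_signed(middle, 8)  # 8-bit second, 8-bit third
    return first, second, third


def merge_three(first: int, second: int, third: int) -> int:
    middle = merge_signed(second, third, 8)
    return merge_signed(first, middle, 16)


parts = split_24_into_three(1000)
print(parts, merge_three(*parts))
```

  • Under these assumptions, every component of a signed 24-bit input stays within the signed 8-bit range, and merging the three components recovers the original integer.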
  • FIG. 3 is a flowchart illustrating a method 300 for processing a multi-bit width value according to an embodiment of the present disclosure. Through the processing of the method 300, at least the multi-bit width value can be split into the first component and the second component representing it.
  • The method 300 receives the multi-bit width value and configuration information, where the configuration information includes at least the bit width information of the first component and the bit width information of the second component that represent the multi-bit width value. With such configuration information, the method 300 can at least determine the bit width corresponding to the first component or the second component to be split.
  • The method 300 calculates the adjusted multi-bit width value according to the scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value.
  • The adjustment here can be to compare the multi-bit width value with the designated data, and to add a given constant to, or subtract it from, the multi-bit width value according to the result of the comparison.
  • The method 300 may determine a scaling factor according to the bit width information of the second component, and use the scaling factor to calculate the adjusted multi-bit width value to obtain the first component.
  • The method 300 may determine the scaling factor according to sign information, included in the configuration information, about whether the second component contains a sign bit.
  • After calculating the first component representing the multi-bit width value, the method 300 proceeds to step 306. At step 306, the method 300 performs calculation at least according to the adjusted multi-bit width value and the aforementioned value of the first component to obtain the second component representing the multi-bit width value.
  • The method 300 may further include determining an adjustment value according to the bit width information of the second component, and calculating according to the adjusted multi-bit width value, the value of the first component, and the adjustment value to obtain the second component.
  • The value of the first component and the adjustment value may be subtracted from the adjusted multi-bit width value to obtain the second component. Further, after this subtraction, a rounding operation corresponding to the operation used to adjust the multi-bit width value may be performed on the obtained value to obtain the second component.
  • After obtaining the first component and the second component, the method 300 proceeds to step 308.
  • At step 308, the method 300 outputs at least one of the first component and the second component.
  • The first component, the second component, or both may be selectively output according to the configuration information.
  • The method 300 may repeatedly perform the split operation according to the number of components in the configuration information, until the multi-bit width value is split into the required number of components.
  • The method 300 may further include determining, according to the configuration information, at least one of the first component and the second component as the new multi-bit width value to be processed next.
  • The method 300 can perform an adjustment operation on the new multi-bit width value to obtain the adjusted new multi-bit width value, and can then perform calculations based on the bit width information in the configuration information to obtain the first component and the second component representing the new multi-bit width value. The method 300 may repeat the foregoing calculation steps for the first component and the second component until the predetermined number of components is obtained.
  • The value f can be adjusted by the following formulas (1) and (2):
  • The "0.5" in the above formulas is the aforementioned given constant. Through the calculation of formula (1) or (2), the adjusted multi-bit width value is obtained.
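  • The formula images for (1) and (2) are not present in this text. Based on the adjustment described earlier (the given constant is added when the value is greater than or equal to zero and subtracted when it is less than zero) and the constant 0.5, they plausibly take the following form, where the symbol f' for the adjusted value is an assumption of this reconstruction:

```latex
\begin{align}
f' &= f + 0.5, \quad f \ge 0 \tag{1} \\
f' &= f - 0.5, \quad f < 0 \tag{2}
\end{align}
```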
  • The first component I1 can be calculated by the following formula (3):
  • Here, to_zero is the round-toward-zero function and 2^(n2-1) is the aforementioned scaling factor; the second component does not include the sign bit in this case.
  • The scaling factor also depends on whether the second component includes a sign bit.
  • When the second component includes a sign bit, the scaling factor can be 2^n2 (described below).
  • The second component I2 can be calculated by the following formula (4):
  • The term I1 × 2^(n2-1) in the above formula (4) is the aforementioned value of the first component.
  • Multiplying I1 by the scaling factor 2^(n2-1) is equivalent to shifting it by n2-1 bits toward the higher bits.
  • Through the above calculation, the first component and the second component representing the multi-bit width value are obtained.
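  • The split without a sign bit in the second component can be sketched in Python. Since the formula images are missing from this text, the sketch assumes the forms suggested by the surrounding description: the adjusted value is f plus or minus 0.5 depending on the sign of f, I1 = to_zero(adjusted / 2^(n2-1)), and I2 = to_zero(adjusted - I1 × 2^(n2-1)). The function name `split_no_sign_bit` is hypothetical.

```python
import math
from typing import Tuple


def split_no_sign_bit(f: float, n2: int) -> Tuple[int, int]:
    """Split f into (I1, I2) with scaling factor 2**(n2 - 1).

    Assumed reconstruction of formulas (1) to (4); math.trunc plays
    the role of the to_zero function.
    """
    scale = 2 ** (n2 - 1)
    adjusted = f + 0.5 if f >= 0 else f - 0.5   # formulas (1) and (2)
    i1 = math.trunc(adjusted / scale)           # formula (3)
    i2 = math.trunc(adjusted - i1 * scale)      # formula (4)
    return i1, i2


for f in (1000, -1000):
    i1, i2 = split_no_sign_bit(f, 8)
    # For integer inputs, I1 * 2**(n2 - 1) + I2 recovers the original value.
    print(f, i1, i2, i1 * 2 ** 7 + i2)
```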
  • The value f can be adjusted by the following formulas (5) and (6):
  • The "0.5" in the above formulas is the aforementioned given constant. Through the calculation of formula (5) or (6), the adjusted multi-bit width value is obtained.
  • The first component I1 can be calculated by the following formula (7):
  • Here 2^n2 is used as the aforementioned scaling factor instead of the "2^(n2-1)" in formula (3).
  • The second component I2 can be calculated by the following formulas (8) and (9):
  • Formulas (8) and (9) contain the adjustment value "2^(n2-1)", which is not included in formula (4). Furthermore, the term "I1 × 2^n2" in formulas (8) and (9) is the value of the first component.
  • The solution of the present disclosure can also perform a corresponding rounding operation on the obtained value, such as the round-toward-zero function "to_zero()" used in formula (8) and the round-down function "floor()" used in formula (9), to obtain the final first and second components. Since formulas (8) and (9) take the sign bit into account, the obtained first component and second component lose less when expressing the multi-bit width value before splitting.
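  • The formula images for (8) and (9) are likewise missing from this text. From the description (the value of the first component I1 × 2^n2 and the adjustment value 2^(n2-1) are both subtracted, with to_zero used in (8) and floor used in (9)), they plausibly read as follows. The symbol f' for the adjusted value is an assumption of this reconstruction, and formula (7) is not reconstructed because the text does not state its rounding function.

```latex
\begin{align}
I_2 &= \operatorname{to\_zero}\left(f' - I_1 \times 2^{n_2} - 2^{n_2 - 1}\right) \tag{8} \\
I_2 &= \operatorname{floor}\left(f' - I_1 \times 2^{n_2} - 2^{n_2 - 1}\right) \tag{9}
\end{align}
```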
  • Tables (1) and (2) list the splitting results when the sign bit is considered, and Tables (3) and (4) list the splitting results when the sign bit is not considered.
  • FIG. 4 is a structural diagram of a combined processing device 400 according to an embodiment of the present disclosure.
  • The combined processing device 400 includes the aforementioned computing device 402, which can be configured to execute the splitting method described above in conjunction with the accompanying drawings.
  • The combined processing device also includes a universal interconnection interface 404 and other processing devices 406.
  • The computing device 402 according to the present disclosure can interact with the other processing devices 406 via the universal interconnection interface 404 to jointly complete user-specified operations, such as splitting a multi-bit width value to obtain at least the first component and the second component.
  • The other processing devices may include one or more types of general-purpose and/or special-purpose processors, such as a central processing unit ("CPU"), a graphics processing unit ("GPU"), and an artificial intelligence processor.
  • The number of processors is not limited and can be determined according to actual needs.
  • The other processing devices can serve as an interface between the computing device of the present disclosure (which can be embodied as a computing device for artificial intelligence operations such as neural network operations) and external data and control, performing operations including but not limited to data transfer, and completing basic control such as starting and stopping the computing device; the other processing devices can also cooperate with the computing device to complete computing tasks.
  • The universal interconnection interface can be used to transmit data and control commands between the computing device and the other processing devices.
  • The computing device may obtain input data to be split from the other processing devices via the universal interconnection interface, and write the input data to the on-chip storage device (also called memory) of the computing device.
  • The computing device may obtain control instructions from the other processing devices via the universal interconnection interface, and write them into the on-chip control buffer of the computing device.
  • The universal interconnection interface can also read the data in the storage module of the computing device and transmit it to the other processing devices.
  • The combined processing device may further include a storage device 408, which may be connected to the computing device and the other processing devices respectively.
  • The storage device may be used to store data of the computing device and the other processing devices, especially data that cannot be fully held in the internal or on-chip storage of the computing device or the other processing devices.
  • the combined processing device of the present disclosure can be used as a system-on-chip (SoC) for mobile phones, robots, drones, video surveillance equipment and other equipment, effectively reducing the core area of the control part, increasing processing speed, and reducing overall power consumption.
  • the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, monitor, mouse, keyboard, network card or Wi-Fi interface.
  • the present disclosure also discloses a chip, which includes the above-mentioned test device or combined processing device. In other embodiments, the present disclosure also discloses a chip packaging structure, which includes the above-mentioned chip.
  • the present disclosure also discloses a board card, which includes the above-mentioned chip packaging structure. FIG. 5 shows an exemplary board card. In addition to the aforementioned chip 502, the board card may also include other supporting components, including but not limited to: a storage device 504, an interface device 506, and a control device 508.
  • the storage device is connected to the chip in the chip packaging structure through a bus for storing data.
  • the storage device may include multiple groups of storage units 510. Each group of storage units is connected to the chip by a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
  • the storage device may include 4 groups of storage units. Each group of storage units may include a plurality of DDR4 chips. In an embodiment, the chip may include four 72-bit DDR4 controllers. In each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC verification.
  • each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip for controlling the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the chip in the chip packaging structure.
  • the interface device is used to implement data transmission between the chip and an external device 512 (for example, a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be other interfaces.
  • the present disclosure does not limit the specific form of the other interfaces mentioned above, as long as the interface unit can provide the transfer function.
  • the calculation result of the chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the chip.
  • the control device is used to monitor the state of the chip.
  • the chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a microcontroller unit (MCU).
  • the chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, which can drive multiple loads. Therefore, the chip can be in different working states such as heavy-load and light-load.
  • the control device can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the chip.
  • the present disclosure also discloses an electronic device or device, which includes the above-mentioned board.
  • electronic equipment or devices can include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the vehicles include airplanes, ships, and/or cars;
  • the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance instruments, B-mode ultrasound scanners and/or electrocardiographs.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, optical, acoustic, magnetic or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or in the form of software program modules.
  • if the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • the technical solution of the present disclosure can be embodied in the form of a software product (for example, a computer-readable storage medium).
  • the computer software product is stored in a memory and includes a number of instructions that enable a computer device (which can be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: USB flash drive, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, optical disk and other media that can store program code.
  • Clause 1 A computing device for processing a multi-bit width value, comprising: an input circuit configured to receive the multi-bit width value and configuration information, wherein the configuration information at least includes bit width information of a first component and bit width information of a second component that represent the multi-bit width value; a first component calculation circuit configured to perform a calculation on the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value; a second component calculation circuit configured to perform a calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value; and an output circuit configured to output at least one of the first component and the second component.
  • Clause 2 The computing device according to Clause 1, further comprising a determination circuit and an addition circuit, wherein: the determination circuit is configured to compare the multi-bit width value with designated data and send the determination result to the addition circuit; and the addition circuit is configured to perform an addition or subtraction operation on the multi-bit width value and a given constant based on the determination result to obtain the adjusted multi-bit width value.
  • Clause 3 The computing device according to clause 2, wherein the configuration information further includes sign information on whether the bit width information of the second component includes a sign bit, and the first component calculation circuit is configured to determine the scaling factor according to the sign information.
  • Clause 4 The computing device according to clause 1, wherein the first component calculation circuit includes a scaling circuit and a rounding circuit, wherein the scaling circuit is configured to perform a shift operation on the adjusted multi-bit width value according to the scaling factor, and the rounding circuit is configured to perform a rounding operation on the multi-bit width value after the shift operation to obtain the first component.
  • Clause 5 The computing device according to clause 3, wherein the sign information indicates that the bit width information of the second component includes a sign bit, and the second component calculation circuit is configured to: determine an adjustment value according to the bit width information of the second component; and perform a calculation according to the adjusted multi-bit width value, the value regarding the first component, and the adjustment value to obtain the second component.
  • Clause 6 The computing device according to Clause 5, wherein the second component calculation circuit includes a subtraction circuit configured to subtract the value regarding the first component and the adjustment value from the adjusted multi-bit width value to obtain the second component.
  • Clause 7 The computing device according to clause 1, further comprising a type converter configured to convert the input data into the same data type as the multi-bit width value.
  • Clause 8 The computing device according to clause 1, further comprising a selector configured to select at least one of the first component and the second component to the output circuit according to the configuration information.
  • Clause 9 The computing device according to any one of clauses 1-8, wherein the first component and the second component are used to represent a rounded value of the multi-bit bit width value.
  • Clause 10 The computing device according to any one of clauses 1-8, wherein the configuration information further includes information on the number of components, and when the number of components is a positive integer greater than 2, the computing device repeatedly executes the first component calculation circuit and the second component calculation circuit according to the configuration information until that number of components is obtained.
  • Clause 13 A method for processing a multi-bit width value for use in neural network operations, comprising: receiving the multi-bit width value and configuration information, wherein the configuration information at least includes bit width information of a first component and bit width information of a second component that represent the multi-bit width value; performing a calculation on the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value; performing a calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value; and outputting at least one of the first component and the second component.
  • Clause 14 The method according to Clause 13, further comprising: comparing the multi-bit width value with designated data; and based on the comparison result, performing an addition on the multi-bit width value and a given constant to obtain the adjusted multi-bit width value.
  • Clause 15 The method according to Clause 14, wherein the configuration information further includes sign information on whether the bit width information of the second component includes a sign bit, and the method further includes determining the scaling factor according to the sign information.
  • Clause 16 The method according to clause 13, wherein obtaining the first component representing the multi-bit width value comprises: performing a shift operation on the adjusted multi-bit width value according to the scaling factor; and performing a rounding operation on the multi-bit width value after the shift operation to obtain the first component.
  • Clause 17 The method according to Clause 15, wherein the sign information indicates that the bit width information of the second component includes a sign bit, and the method further includes: determining an adjustment value according to the bit width information of the second component; And calculating according to the adjusted multi-bit width value, the value about the first component, and the adjustment value to obtain the second component.
  • Clause 18 The method according to Clause 13, further comprising: converting the input data into the same data type as the multi-bit width value.
  • Clause 19 The method according to clause 13, further comprising: selecting at least one of the first component and the second component for output according to the configuration information.
  • Clause 20 The method according to any one of clauses 13-19, wherein the first component and the second component are used to represent rounded values of the multi-bit width value.
  • Clause 21 The method according to clause 13, wherein the configuration information further includes information on the number of components, and when the number of components is a positive integer greater than 2, the method further includes: determining, according to the configuration information, at least one of the first component and the second component as a new multi-bit width value to be processed next; performing a calculation on the adjusted new multi-bit width value according to the scaling factor associated with the bit width information of the second component of the new multi-bit width value in the configuration information, to obtain a first component representing the new multi-bit width value; performing a calculation according to the adjusted new multi-bit width value and the value of the first component of the new multi-bit width value, to obtain a second component representing the new multi-bit width value; and repeating the above determining and calculating steps until the specified number of components is obtained.
  • Clause 22 A computing device for processing a multi-bit width value for use in neural network operations, comprising: a processor; and a memory for storing program instructions which, when executed by the at least one processor, cause the computing device to perform the method according to any one of clauses 13-21.
  • Clause 23 A computer-readable storage medium storing program instructions for processing a multi-bit width value for use in neural network operations, wherein the program instructions, when run by a processor, perform the method according to any one of clauses 13-21.
  • the term “if” can be interpreted as “when” or “once” or “in response to determination” or “in response to detection” depending on the context.
  • the phrase “if determined” or “if detected [described condition or event]” can be interpreted, depending on the context, as meaning “once determined” or “in response to determination” or “once detected [described condition or event]” or “in response to detection of [described condition or event]”.

Abstract

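A computing device (402) for processing a multi-bit width value, an integrated circuit board card, a method and a computer-readable storage medium, wherein the computing device (402) may be included in a combined processing device (400), which may further include a universal interconnection interface (404) and other processing devices (406). The computing device (402) interacts with the other processing devices (406) to jointly complete a computing operation specified by a user. The combined processing device (400) may further include a storage device (408), which is connected to the device and to the other processing devices (406) respectively and stores data of the device and of the other processing devices (406). The device can split a multi-bit width value so that the processing capability of a processor is not affected by bit width.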

Description

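Computing device, method, board card and computer-readable storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 2020101883411, filed on March 17, 2020 and entitled "Computing device, method, board card and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates generally to data processing. More specifically, the present disclosure relates to a computing device, a method, an integrated circuit board card and a computer-readable storage medium for processing a multi-bit width value.
Background
Currently, different types of processors may handle different data bit widths. For a processor that performs operations on a specific data type, the data bit width it can handle is often limited. For example, a fixed-point arithmetic unit can usually handle a data bit width of no more than 16 bits, such as 16-bit integer data. However, in order to save computing cost and overhead and to improve computing efficiency, how to enable a processor with a limited bit width to process data of larger bit widths has become a technical problem to be solved.
Summary
To at least partially solve the technical problem mentioned in the background, the present disclosure provides a scheme for splitting multi-bit width data. With this splitting scheme, multi-bit width data can be split into at least two pieces of data of smaller bit width, so that in scenarios where the processing bit width of the processor is limited, the two pieces of smaller bit width can take part in the calculation instead.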
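In a first aspect, the present disclosure provides a computing device for processing a multi-bit width value for use in neural network operations, comprising: an input circuit configured to receive the multi-bit width value and configuration information, wherein the configuration information at least includes bit width information of a first component and bit width information of a second component that represent the multi-bit width value; a first component calculation circuit configured to perform a calculation on the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value; a second component calculation circuit configured to perform a calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value; and an output circuit configured to output at least one of the first component and the second component.
In a second aspect, the present disclosure provides an integrated circuit chip that includes the aforementioned computing device.
In a third aspect, the present disclosure provides an integrated circuit board card that includes the aforementioned integrated circuit chip.
In a fourth aspect, the present disclosure provides a method for processing a multi-bit width value for use in neural network operations, comprising: receiving the multi-bit width value and configuration information, wherein the configuration information at least includes bit width information of a first component and bit width information of a second component that represent the multi-bit width value; performing a calculation on the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value; performing a calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value; and outputting at least one of the first component and the second component.
In a fifth aspect, the present disclosure provides a computing device for processing a multi-bit width value, comprising: a processor; and a memory for storing program instructions which, when executed by the at least one processor, cause the computing device to perform the aforementioned method.
In a sixth aspect, the present disclosure provides a computer-readable storage medium storing program instructions for processing a multi-bit width value for use in neural network operations, wherein the program instructions, when run by a processor, perform the aforementioned method.
With the computing device, integrated circuit board card, method and computer-readable storage medium provided above, the scheme of the present disclosure can express one multi-bit (or high-bit) width value as several few-bit (or low-bit) width values, so that in artificial intelligence scenarios such as neural network operations, or in other general scenarios, the computing power of the processor can be fully used without being limited by its processing bit width. Further, in neural network scenarios that require low-bit width values, the scheme can also simplify neural network computation by splitting a multi-bit width value into several low-bit width representations, thereby improving computing efficiency.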
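Brief description of the drawings
The above features of the present disclosure can be better understood with reference to the accompanying drawings, and its many objects, features and advantages will be apparent to those skilled in the art. The drawings described below are only some embodiments of the present disclosure; those of ordinary skill in the art can derive other drawings from them without creative effort, in which:
FIG. 1 is a simplified block diagram of a computing device according to an embodiment of the present disclosure;
FIG. 2 is a detailed block diagram of a computing device according to an embodiment of the present disclosure;
FIG. 3 is a flowchart for processing a multi-bit width value according to an embodiment of the present disclosure;
FIG. 4 is a structural diagram of a combined processing device according to an embodiment of the present disclosure; and
FIG. 5 is a schematic structural diagram of a board card according to an embodiment of the present disclosure.
Detailed description
By expressing a multi-bit width value (for example 24 bits) as at least two low-bit width components (for example 16 bits and 8 bits), the scheme of the present disclosure overcomes the obstacle of a limited processor bit width and reduces computational complexity, thereby improving the efficiency of, for example, neural network computation. In one or more embodiments, the source or initial value can be divided into a high-bit part and a low-bit part according to the bit distribution of the input multi-bit width value and the configuration information, and a split calculation can be performed on the high-bit part and the low-bit part to obtain a first component and a second component corresponding to them. In actual calculation, at least one of the first component and the second component can then take part in place of the source value. In other embodiments, the scheme can also decompose the source value into as many components as required according to the configuration information, for example by repeatedly applying a similar split calculation to at least one of the first component and the second component so as to obtain three or more components.
Embodiments of the present disclosure are described in detail below with reference to the drawings.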
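FIG. 1 is a simplified block diagram of a computing device 100 according to an embodiment of the present disclosure. In one or more embodiments, the computing device 100 can process a multi-bit width value for use in various application scenarios, for example artificial intelligence applications that include neural network operations, or general scenarios in which a value needs to be split for calculation. Here, the multi-bit width value may comprise a high-bit part and a low-bit part, to be subsequently split into two or more components. In addition, the neural network operations may include the various operations involved in training a neural network, for example weight updates or gradient calculations in the back-propagation direction.
As shown in FIG. 1, the computing device 100 includes an input circuit 102 configured to receive a multi-bit width value and configuration information, wherein the configuration information at least includes bit width information of a first component and bit width information of a second component that represent the multi-bit width value. For example, when the multi-bit width value is 24 bits wide, it can be divided into an 8-bit high-bit part and a 16-bit low-bit part, that is, an 8-bit component and a 16-bit component to be obtained by the split. In one or more embodiments, the configuration information may further include sign information indicating whether the bit width information of the second component includes a sign bit. For example, when this is expressed as a single bit, "1" indicates that the second component includes a sign bit and "0" indicates that the low-bit part does not. Additionally, the configuration information may include significant-bit information, for example explicitly specifying which bits of the multi-bit width value are significant, so that the subsequent split calculation operates on the significant bits.
Connected to the input circuit 102 are a first component calculation circuit 104 and a second component calculation circuit 106. In one or more embodiments, the first component calculation circuit may be configured to perform a calculation on the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value. In one scenario, the first component calculation circuit may determine the scaling factor from the bit width information of the second component and use that scaling factor in a calculation on the multi-bit width value to obtain the first component. For example, when the bit width of the first component is n1 and that of the second component is n2, the scaling factor may be $2^{n2-1}$ when a sign bit within n2 is not considered, and $2^{n2}$ when a sign bit within n2 is considered.
Before calculating on the multi-bit width value, in one or more embodiments the computing device of the present disclosure may also compare the multi-bit width value with designated data in order to perform a corresponding value adjustment. For example, with the designated data being zero, when the multi-bit width value is greater than or equal to zero, a given constant is added to it to obtain the adjusted multi-bit width value. Conversely, when the multi-bit width value is less than zero, the given constant is subtracted from it to obtain the adjusted multi-bit width value.
In one or more embodiments, the second component calculation circuit may be configured to perform a calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value. In one scenario, when the sign information indicates that the bit width information of the second component does not include a sign bit, for example when the most significant bit of the second component is not a sign bit, the second component calculation circuit may be configured to subtract the value of the first component from the adjusted multi-bit width value to obtain the second component. Here, the value of the first component may be the product of the first component and the aforementioned scaling factor. In another scenario, when the sign information in the configuration information indicates that the bit width information of the second component includes a sign bit, for example when the most significant bit of the second component is a sign bit, the second component calculation circuit may be configured to determine an adjustment value according to the bit width information of the second component, and to calculate according to the multi-bit width value, the value regarding the first component and the adjustment value to obtain the second component. It will be understood that in some scenarios the configuration information may not include sign information. In that case, through for example an initial default setting, the computing device of the present disclosure may be configured to directly perform a split that considers the sign bit or one that does not, without making that decision on the basis of the configuration information.
The above describes how the first component and the second component of the present disclosure are obtained, but it should be understood that this description is merely exemplary and not limiting, and that those skilled in the art may conceive of other options or alternatives on the basis of it. For example, in some scenarios, depending on how the multi-bit width value is adjusted, various corresponding rounding operations may be applied to the intermediate values from which the first and second components are obtained. These rounding operations may include rounding toward zero, rounding half up, rounding up or rounding down. For example, after the adjusted multi-bit width value is calculated with the scaling factor, the result may be rounded toward zero to obtain the first component. Similarly, after the value regarding the first component and the adjustment value are subtracted from the adjusted multi-bit width value, the result may also be rounded toward zero to obtain the second component.
After the first component and the second component that can represent the multi-bit width value are obtained, the computing device of the present disclosure further includes an output circuit 108, which may be configured to output at least one of the first component and the second component. As mentioned above, the components output by the output circuit can be used in the various neural network calculations that require low-bit width data, for example weight updates and gradient calculations in back propagation during neural network training. In some application scenarios, the obtained first and second components can also be stored directly in place of the multi-bit width value for later use. Where a neural network uses a fixed-point processor that supports only low bit widths, feeding the low-bit width first and second components from the output circuit into the fixed-point operations frees the processor from its inability to handle multi-bit or high-bit width data, extends the computing scenarios of the fixed-point processor and simplifies calculation, thereby improving computing efficiency and reducing computing overhead.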
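FIG. 2 is a detailed block diagram of a computing device 200 according to an embodiment of the present disclosure. As can be seen from FIG. 2, the computing device 200 not only includes the input circuit 102, the first component calculation circuit 104, the second component calculation circuit 106 and the output circuit 108 of the computing device 100 in FIG. 1, but also shows several circuits contained in the first and second component calculation circuits, as well as several additional devices. Because the functions of the input circuit, the first and second component calculation circuits and the output circuit have been described in detail with reference to FIG. 1, they are not repeated below.
As shown in FIG. 2, the computing device 200 may further include a type converter 110, which may be configured to convert input data into the same data type as the multi-bit width value, that is, the data type supported by the first and second component calculation circuits. With the type converter, the computing device of the present disclosure can split data whose type differs from the type supported by the split operation. For example, when the computing device supports splitting fixed-point values and the input circuit receives a floating-point value, the type converter can convert that floating-point value into a fixed-point integer value so that it can be split by the first and second component calculation circuits. Similarly, when the computing device supports splitting floating-point values and the input circuit receives a fixed-point value, the type converter can convert that fixed-point value into a floating-point value so that it can be split by the first and second component calculation circuits.
Further, the computing device may include a determination circuit 112, which may be configured to compare the multi-bit width value with designated data and send the determination result to an addition circuit 114. In one embodiment, the addition circuit may be configured to perform an addition or subtraction operation on the multi-bit width value and a given constant based on the determination result, to obtain the adjusted multi-bit width value. In one implementation scenario, the addition circuit may include a negation circuit, so that a subtraction can be turned into an addition. In another implementation scenario, the addition circuit may be an adder that supports subtraction. The adder here is merely exemplary; following the teaching of the present disclosure, those skilled in the art may also arrange a separate adder and subtractor to perform the addition and the subtraction respectively.
The first component calculation circuit 104, shown in further detail, may include a scaling circuit 1041 and a rounding circuit 1042. In one embodiment, the scaling circuit is configured to perform a shift operation on the adjusted multi-bit width value according to the aforementioned scaling factor. For example, when the scaling factor is $2^{n2}$, performing the shift operation with the shift circuit means moving the multi-bit width value n2 bits toward the high-order bits. When the high-order bits are on the left of the multi-bit width value and the low-order bits on its right, moving n2 bits toward the high-order bits means shifting left by n2 bits. In terms of implementation, the shift circuit here can be built from a multiplier. After the corresponding shift operation on the multi-bit width value, in one embodiment the rounding circuit 1042 of the first component calculation circuit may be configured to perform a rounding operation on the shifted multi-bit width value to obtain the first component. Depending on the application scenario, the rounding operation may take various forms, for example rounding up, rounding down or rounding toward zero. After such a rounding operation, the first component related to the high-bit part of the multi-bit width value is obtained.
In one or more embodiments, the second component calculation circuit 106 may include a subtraction circuit 1061 configured to subtract the value regarding the first component and the adjustment value from the multi-bit width value to obtain the second component. In one scenario, the value of the first component may be the product of the first component and the scaling factor, and the adjustment value is the aforementioned value that accounts for the sign bit of the low-bit part. By using the subtraction circuit to subtract the product and the adjustment value from the multi-bit width value, the second component related to the low-bit part of the multi-bit width value is obtained.
To allow flexible output, the computing device 200 of the present disclosure additionally includes a selector 116, which may be configured to pass at least one of the first component and the second component to the output circuit 108 for output. In one embodiment, the selector 116 may choose to output the first component, the second component or both according to information about the output items included in the configuration information. Such selective output has technical advantages in some application scenarios. For example, when only the first component or only the second component is needed in subsequent operations, the output circuit does not need to output both, which saves output overhead. In addition, when only the first component needs to be output, the computing device may calculate and output only the first component, further saving computing overhead.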
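In one or more embodiments, the computing device of the present disclosure can also be used to decompose a multi-bit width value into a number of components specified by the user or required by an algorithm. To this end, the configuration information can include information on the number of components. When the number of components is a positive integer greater than 2, the computing device repeatedly executes the first component calculation circuit and the second component calculation circuit according to the configuration information until that number of components is obtained. For example, for a 24-bit width value, when the configuration information indicates a split into three components that are each 8 bits wide, the first and second component calculation circuits can first split the 24-bit width value into an 8-bit first component and a 16-bit intermediate second component. The value of the 16-bit intermediate second component is then fed back into the first and second component calculation circuits to be further split into an 8-bit second component and an 8-bit third component.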
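The repeated split described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the circuit of the disclosure: `split_once` and `split_repeatedly` are hypothetical names, the input is assumed to be an integer already (so no 0.5 adjustment is applied), each pass uses a scaling factor of 2^(n2-1) with rounding toward zero, and no check is made that a component fits its configured bit width.

```python
from typing import List, Tuple


def split_once(value: int, n2: int) -> Tuple[int, int]:
    """Split an integer into (first, second) with scaling factor 2**(n2 - 1).

    Assumption: rounding toward zero, done here with integer arithmetic.
    """
    scale = 1 << (n2 - 1)
    first = abs(value) // scale
    if value < 0:
        first = -first
    second = value - first * scale
    return first, second


def split_repeatedly(value: int, second_widths: List[int]) -> List[int]:
    """Feed the second component back in, once per entry of second_widths."""
    components = []
    remaining = value
    for n2 in second_widths:
        first, remaining = split_once(remaining, n2)
        components.append(first)
    components.append(remaining)
    return components


# 24-bit value: first pass leaves a 16-bit intermediate, second pass an 8-bit one.
parts = split_repeatedly(5_000_000, [16, 8])
print(parts)  # [152, 150, 64]

# The components recombine with the scaling factors used in each pass.
print(parts[0] * 2**15 + parts[1] * 2**7 + parts[2])  # 5000000
```

The list of second-component widths stands in for the "number of components" field of the configuration information; a three-way split needs two passes.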
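FIG. 3 is a flowchart of a method 300 for processing a multi-bit width value according to an embodiment of the present disclosure. Through the method 300, a multi-bit width value can be split into at least a first component and a second component that represent it.
As shown in FIG. 3, at step 302 the method 300 receives the multi-bit width value and configuration information, wherein the configuration information at least includes bit width information of a first component and bit width information of a second component that represent the multi-bit width value. From this configuration information, the method 300 can at least determine the bit widths of the first or second component to be produced. Next, at step 304, the method 300 performs a calculation on the adjusted multi-bit width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit width value. As described with reference to FIG. 1 and FIG. 2, the adjustment may consist of comparing the multi-bit width value with designated data and, according to the result, adding a given constant to it or subtracting the constant from it. In one embodiment, the method 300 may determine the scaling factor from the bit width information of the second component and use that scaling factor in a calculation on the adjusted multi-bit width value to obtain the first component. In one scenario, the method 300 may determine the scaling factor according to sign information in the configuration information on whether the second component includes a sign bit.
After the first component has been calculated, the method 300 proceeds to step 306. At step 306, the method 300 performs a calculation at least according to the adjusted multi-bit width value and the value of the first component, to obtain the second component representing the multi-bit width value. In one embodiment, when the sign information in the configuration information indicates that the bit width information of the second component includes a sign bit, the method 300 may further include determining an adjustment value according to the bit width information of the second component, and calculating according to the adjusted multi-bit width value, the value regarding the first component and the adjustment value to obtain the second component. In one implementation scenario, the value regarding the first component and the adjustment value can be subtracted from the adjusted multi-bit width value to obtain the second component. Further, after that subtraction, a rounding operation corresponding to the adjustment applied to the multi-bit width value can be performed on the result to obtain the second component.
After the first and second components are obtained, the method 300 proceeds to step 308, where it outputs at least one of the first component and the second component. In one embodiment, the first component, the second component or both can be output selectively according to the configuration information. In some embodiments, when the configuration information includes the number of components after splitting, the method 300 may repeat the split operation according to that number until the multi-bit width value has been split into the required number of components. For example, in one scenario, when the required number of components is a positive integer greater than 2, the method 300 may further include determining, according to the configuration information, at least one of the first component and the second component as a new multi-bit width value to be processed next. The method 300 may then adjust the new multi-bit width value to obtain an adjusted new multi-bit width value, and calculate according to the bit width information in the configuration information to obtain a first component and a second component representing the new multi-bit width value. To reach the predetermined number of components, the method 300 may repeat these calculation steps until that number of components is obtained.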
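The split operation performed by the computing device or method of the present disclosure is described below from a mathematical point of view. Through the following exemplary formulas and the split results for specific values, those skilled in the art can further understand the scheme of the present disclosure and its implementation. For brevity, let f denote the multi-bit width value, with a bit width of n0; let $I_1$ be the first component obtained by the split, with a bit width of n1; and let $I_2$ be the second component obtained by the split, with a bit width of n2, where n0 = n1 + n2.
First, f can be adjusted by equations (1) and (2):
if f >= 0: f = f + 0.5        (1)
if f < 0: f = f - 0.5        (2)
The "0.5" in the above equations is the aforementioned given constant. Equation (1) or (2) yields the adjusted multi-bit width value.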
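The adjustment in equations (1) and (2) can be sketched as follows. The function name `adjust` is hypothetical, and the example pairs the adjustment with rounding toward zero only to show its effect: together they behave like rounding to the nearest integer with ties away from zero.

```python
import math


def adjust(f: float, constant: float = 0.5) -> float:
    """Equations (1) and (2): move f away from zero by the given constant."""
    return f + constant if f >= 0 else f - constant


# Adjusting and then rounding toward zero rounds to nearest, ties away from zero.
print(math.trunc(adjust(2.5)))    # 3
print(math.trunc(adjust(-2.5)))   # -3
print(math.trunc(adjust(2.25)))   # 2
```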
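Next, the first component $I_1$ can be calculated by equation (3):
$I_1 = \mathrm{to\_zero}(f / 2^{n2-1})$        (3)
where to_zero is the round-toward-zero function and $2^{n2-1}$ is the aforementioned scaling factor; in this case the second component does not include a sign bit. As noted earlier, the scaling factor also depends on whether the second component includes a sign bit. When it does, the scaling factor may be $2^{n2}$ (described below).
Next, the second component $I_2$ can be calculated by:
$I_2 = \mathrm{to\_zero}(f - I_1 \times 2^{n2-1})$        (4)
It can be seen that the term "$I_1 \times 2^{n2-1}$" in equation (4) is the aforementioned value of the first component. Multiplying $I_1$ by the scaling factor $2^{n2-1}$ is equivalent to shifting it n2-1 bits toward the high-order bits. The above calculations yield the first component and the second component that represent the multi-bit width value.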
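A minimal Python sketch of equations (1) to (4), for the case where the second component does not include a sign bit. The function name is hypothetical, and the form of equation (3), a division by 2^(n2-1) followed by rounding toward zero, is an assumption inferred from equation (4) and the surrounding text, since the original formula is an image.

```python
import math
from typing import Tuple


def split_without_sign_bit(f: float, n2: int) -> Tuple[int, int]:
    """Return (I1, I2) for a value f and a second-component width n2."""
    # Equations (1)/(2): adjust by the given constant 0.5.
    f_adj = f + 0.5 if f >= 0 else f - 0.5
    scale = 2 ** (n2 - 1)                  # scaling factor, no sign bit case
    i1 = math.trunc(f_adj / scale)         # equation (3), assumed form
    i2 = math.trunc(f_adj - i1 * scale)    # equation (4)
    return i1, i2


i1, i2 = split_without_sign_bit(100000.3, 16)
print(i1, i2)                   # 3 1696
print(i1 * 2 ** 15 + i2)        # 100000, the rounded value of f

print(split_without_sign_bit(-100000.3, 16))  # (-3, -1696)
```

Recombining the two components gives the rounded value of f rather than f itself, which matches the clauses stating that the components represent a rounded value of the multi-bit width value.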
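The above describes the split when the second component does not include a sign bit. The split when the second component includes a sign bit is described below, with the symbols in the formulas having the same meaning as above.
First, f can be adjusted by equations (5) and (6):
if f >= 0: f = f + 0.5             (5)
if f < 0: f = f - 0.5          (6)
The "0.5" in the above equations is the aforementioned given constant. Equation (5) or (6) yields the adjusted multi-bit width value.
Next, the first component $I_1$ can be calculated by equation (7):
Figure PCTCN2021081188-appb-000002
Because this case considers a sign bit in the second component, the scaling factor here is $2^{n2}$ rather than the "$2^{n2-1}$" used in equation (3).
Next, the second component $I_2$ can be calculated by equations (8) and (9):
If f <= 0, then $I_2 = \mathrm{to\_zero}(f - I_1 \times 2^{n2} - 2^{n2-1})$       (8)
If f > 0, then $I_2 = \mathrm{floor}(f - I_1 \times 2^{n2} - 2^{n2-1})$      (9)
It can be seen that, because the sign bit is considered, equations (8) and (9) contain the adjustment value "$2^{n2-1}$", which equation (4) does not. Further, the term "$I_1 \times 2^{n2}$" in equations (8) and (9) is the value of the first component. In addition, after the value of the first component and the adjustment value are subtracted from the multi-bit width value, the scheme of the present disclosure can apply a corresponding rounding operation to the result, for example the round-toward-zero function "to_zero()" used in equation (8) and the round-down function "floor()" used in equation (9), to obtain the final first and second components. Because equations (8) and (9) take the sign bit into account, the resulting first and second components lose less in representing the multi-bit width value before the split.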
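Equations (8) and (9) can be sketched in Python as below. The function name is hypothetical. Because equation (7) is only available as an image, the first component is taken as an input here instead of being computed, and the values passed for it in the examples are illustrative assumptions.

```python
import math


def second_component_with_sign_bit(f_adj: float, i1: int, n2: int) -> int:
    """Second component from the adjusted value f_adj and a given first component.

    The adjustment value 2**(n2 - 1) accounts for the sign bit of the
    second component.
    """
    residual = f_adj - i1 * 2 ** n2 - 2 ** (n2 - 1)
    if f_adj <= 0:
        return math.trunc(residual)   # equation (8): round toward zero
    return math.floor(residual)       # equation (9): round down


n2 = 16
f_adj = 100000.3 + 0.5                # equation (5)
i1 = 1                                # illustrative first component
i2 = second_component_with_sign_bit(f_adj, i1, n2)
print(i2)                                    # 1696
print(i1 * 2 ** n2 + 2 ** (n2 - 1) + i2)     # 100000

f_adj = -100000.3 - 0.5               # equation (6)
i1 = -2                               # illustrative first component
i2 = second_component_with_sign_bit(f_adj, i1, n2)
print(i2)                                    # -1696
print(i1 * 2 ** n2 + 2 ** (n2 - 1) + i2)     # -100000
```

In both examples the first component's value, the adjustment value and the second component add back up to the rounded original value.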
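In some application scenarios, when the multi-bit width value is a floating-point number, it can first be converted to a fixed-point number by a rounding operation (for example rounding half up), and the first and second components obtained by the split in equations (1)-(4) or equations (5)-(9) can then take part in operations on behalf of that fixed-point number, which is particularly beneficial for fixed-point operations in artificial intelligence applications. In addition, in different split scenarios, n0, n1 and n2 can take different positive integer values, for example n0=24, n1=8, n2=16, or n0=32, n1=16, n2=16, or n0=25, n1=9, n2=16. In some split scenarios, n0, n1 and n2 can also satisfy n0 <= n1+n2, for example n0=25, n1=16, n2=16, that is, a 25-bit width value is split into two 16-bit width components.
The tables below list split results for specific values (using floating-point numbers as examples), where Tables 1 and 2 list results that consider the sign bit and Tables 3 and 4 list results that do not.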
Table 1
Figure PCTCN2021081188-appb-000003
Table 2
Figure PCTCN2021081188-appb-000004
Figure PCTCN2021081188-appb-000005
Table 3
Figure PCTCN2021081188-appb-000006
Table 4
Figure PCTCN2021081188-appb-000007
图4是示出根据本披露实施例的一种组合处理装置400的结构图。如图所示,该组合处理装置400包括前述的计算装置402,其可以配置用于执行前述结合附图所描述的拆分方法。另外,该组合处理装置还包括通用互联接口404和其他处理装置406。根据本披露的计算装置402可以通过通用互联接口404与其他处理装置406进行交互,共同完成用户指定的操作,例如对多比特位宽数值进行拆分,以至少获得第一分量和第二分量。
根据本披露的方案,该其他处理装置可以包括中央处理器(“CPU”)、图形处理器(“GPU”)、人工智能处理器等通用和/或专用处理器中的一种或多种类型的处理器,其数目可以不做限制而是根据实际需要来确定。在一个或多个实施例中,该其他处理装置可以作为本披露的计算装置(其可以具体化为人工智能例如神经网络运算的相关运算装置)与外部数据和控制的接口,执行包括但不限于数据搬运,完成对计算装置的开启、停止等的基本控制;其他处理装置也可以和该计算装置协作共同完成运算任务。
根据本披露的方案,该通用互联接口可以用于在计算装置与其他处理装置间传输数据和控制指令。例如,该计算装置可以经由所述通用互联接口从其他处理装置中获取待拆分的输入数据,写入该计算装置片上的存储装置(或称存储器)。进一步,该计算装置可以经由所述通用互联接口从其他处理装置中获取控制指令,写入计算装置片上的控制缓存。替代地或可选地,通用互联接口也可以读取计算装置的存储模块中的数据并传输给其他处理装置。
可选的,该组合处理装置还可以包括存储装置408,其可以分别与所述计算装置和所述其他处理装置连接。在一个或多个实施例中,存储装置可以用于保存所述计算装置和所 述其他处理装置的数据,尤其那些在计算装置或其他处理装置的内部或片上存储装置中无法全部保存的数据。
根据应用场景的不同,本披露的组合处理装置可以作为手机、机器人、无人机、视频监控设备等设备的SOC片上系统,有效降低控制部分的核心面积,提高处理速度,降低整体功耗。在此情况下,该组合处理装置的通用互联接口与设备的某些部件相连接。某些部件例如摄像头、显示器、鼠标、键盘、网卡或wifi接口。
在一些实施例里,本披露还公开了一种芯片,其包括了上述测试装置或组合处理装置。在另一些实施例里,本披露还公开了一种芯片封装结构,其包括了上述芯片。
在一些实施例里,本披露还公开了一种板卡,其包括了上述芯片封装结构。参阅图5,其提供了前述的示例性板卡,上述板卡除了包括上述芯片502以外,还可以包括其他的配套部件,该配套部件包括但不限于:存储器件504、接口装置506和控制器件508。
所述存储器件与所述芯片封装结构内的芯片通过总线连接,用于存储数据。所述存储器件可以包括多组存储单元510。每一组所述存储单元与所述芯片通过总线连接。可以理解,每一组所述存储单元可以是DDR SDRAM(“Double Data Rate SDRAM,双倍速率同步动态随机存储器”)。
上述DDR不需要提高时钟频率就能加倍提高SDRAM的速度。DDR允许在时钟脉冲的上升沿和下降沿读出数据。DDR的速度是标准SDRAM的两倍。在一个实施例中,所述存储器件可以包括4组所述存储单元。每一组所述存储单元可以包括多个DDR4颗粒(芯片)。在一个实施例中,所述芯片内部可以包括4个72位DDR4控制器,上述72位DDR4控制器中64bit用于传输数据,8bit用于ECC校验。
在一个实施例中,每一组所述存储单元包括多个并联设置的双倍速率同步动态随机存储器。DDR在一个时钟周期内可以传输两次数据。在所述芯片中设置控制DDR的控制器,用于对每个所述存储单元的数据传输与数据存储的控制。
所述接口装置与所述芯片封装结构内的芯片电连接。所述接口装置用于实现所述芯片与外部设备512(例如服务器或计算机)之间的数据传输。例如在一个实施例中,所述接口装置可以为标准PCIE接口。比如,待处理的数据由服务器通过标准PCIE接口传递至所述芯片,实现数据转移。在另一个实施例中,所述接口装置还可以是其他的接口,本披露并不限制上述其他的接口的具体表现形式,所述接口单元能够实现转接功能即可。另外,所述芯片的计算结果仍由所述接口装置传送回外部设备(例如服务器)。
所述控制器件与所述芯片电连接。所述控制器件用于对所述芯片的状态进行监控。具体地,所述芯片与所述控制器件可以通过SPI接口电连接。所述控制器件可以包括单片机(Micro Controller Unit,MCU)。在一个或多个实施例中,所述芯片可以包括多个处理芯片、多个处理核或多个处理电路,可以带动多个负载。因此,所述芯片可以处于多负载和轻负载等不同的工作状态。通过所述控制装置可以实现对所述芯片中多个处理芯片、多个处理和/或多个处理电路的工作状态的调控。
在一些实施例里,本披露还公开了一种电子设备或装置,其包括了上述板卡。根据不同的应用场景,电子设备或装置可以包括数据处理装置、机器人、电脑、打印机、扫描仪、平板电脑、智能终端、手机、行车记录仪、导航仪、传感器、摄像头、服务器、云端服务器、相机、摄像机、投影仪、手表、耳机、移动存储、可穿戴设备、交通工具、家用电器、和/或医疗设备。所述交通工具包括飞机、轮船和/或车辆;所述家用电器包括电视、空调、 微波炉、冰箱、电饭煲、加湿器、洗衣机、电灯、燃气灶、油烟机;所述医疗设备包括核磁共振仪、B超仪和/或心电图仪。
需要说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本披露并不受所描述的动作顺序的限制,因为依据本披露,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于可选实施例,所涉及的动作和模块并不一定是本披露所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本披露所提供的几个实施例中,应该理解到,所公开的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性、光学、声学、磁性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本披露各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件程序模块的形式实现。
所述集成的单元如果以软件程序模块的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储器中。基于这样的理解,当本披露的技术方案可以以软件产品(例如计算机可读存储介质)的形式体现出来,该计算机软件产品存储在一个存储器中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本披露各个实施例所述方法的全部或部分步骤。而前述的存储器包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。
在本披露的上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。上述实施例的各技术特征可以进行任意的组合,为使描述简洁,未对上述实施例中的各个技术特征所有可能的组合都进行描述,然而,只要这些技术特征的组合不存在矛盾,都应当认为是本说明书记载的范围。
依据以下条款可更好地理解前述内容:
条款1、一种用于对多比特位宽数值进行处理的计算装置,包括:输入电路,其配置成接收所述多比特位宽数值和配置信息,其中所述配置信息至少包括代表所述多比特位宽数值的第一分量的位宽信息和第二分量的位宽信息;第一分量计算电路,其配置成根据与所述第二分量的位宽信息关联的缩放系数对经调整的多比特位宽数值进行计算,以获得代表所述多比特位宽数值的第一分量;第二分量计算电路,其配置成至少根据所述经调整的多比特位宽数值和所述第一分量的数值进行计算,以获得代表所述多比特位宽数值的第二分量;以及输出电路,其配置成输出所述第一分量和第二分量中的至少一个。
条款2、根据条款1所述的计算装置,进一步包括判定电路和加法电路,其中:所述判定电路配置成判定所述多比特位宽数值与指定数据的大小,并且将判定结果发送给所述加法电路;以及所述加法电路配置成基于所述判定结果将所述多比特位宽数值与给定常量执行加法或减法操作,以获得所述经调整的多比特位宽数值。
条款3、根据条款2所述的计算装置,其中所述配置信息还包括关于所述第二分量的位宽信息是否包含符号位的符号信息,并且所述第一分量计算电路配置成根据所述符号信息来确定所述缩放系数。
条款4、根据条款1所述的计算装置,其中所述第一分量计算电路包括:缩放电路和舍入电路,其中所述缩放电路配置成根据所述缩放系数对所述经调整的多比特位宽数值执行移位操作,并且所述舍入电路配置成对执行移位操作后的多比特位宽数值进入舍入操作,以获得所述第一分量。
条款5、根据条款3所述的计算装置,其中所述符号信息指示所述第二分量的位宽信息包含符号位,所述第二分量计算电路配置成:根据所述第二分量的位宽信息确定调整值;以及根据所述经调整的多比特位宽数值、关于所述第一分量的数值和所述调整值进行计算,以获得所述第二分量。
条款6、根据条款5所述的计算装置,其中所述第二分量计算电路包括减法电路,其配置成从所述经调整的多比特位宽数值中减去关于所述第一分量的数值和所述调整值,以获得所述第二分量。
条款7、根据条款1所述的计算装置,进一步包括类型转换器,其配置成将输入的数据转换成与所述多比特位宽数值相同的数据类型。
条款8、根据条款1所述的计算装置,进一步包括选择器,其配置成根据所述配置信息选择所述第一分量和第二分量中的至少一个至所述输出电路。
条款9、根据条款1-8的任意一项所述的计算装置,其中所述第一分量和所述第二分量用于代表所述多比特位宽数值的舍入值。
条款10、根据条款1-8的任意一项所述的计算装置,其中所述配置信息还包括分量数目的信息,并且当所述分量数目为大于2的正整数时,所述计算装置根据所述配置信息反复执行所述第一分量计算电路和第二分量计算电路,直到获得所述分量数目的分量。
条款11、一种集成电路芯片,包括根据条款1-10的任意一项所述的计算装置。
条款12、一种集成电路板卡,包括根据条款11所述的计算装置。
条款13、一种用于对多比特位宽数值进行处理以用于神经网络运算的方法,包括:接收所述多比特位宽数值和配置信息,其中所述配置信息至少包括代表所述多比特位宽数值的第一分量的位宽信息和第二分量的位宽信息;根据与所述第二分量的位宽信息关联的缩放系数对经调整的多比特位宽数值进行计算,以获得代表所述多比特位宽数值的第一分量;至少根据所述经调整的多比特位宽数值和所述第一分量的数值进行计算,以获得代表所述多比特位宽数值的第二分量;以及输出所述第一分量和第二分量中的至少一个。
条款14、根据条款13所述的方法,进一步包括:判定所述多比特位宽数值与指定数据的大小;以及基于该判定结果,将所述多比特位宽数值与给定常量执行加法,以获得所述经调整的多比特位宽数值。
条款15、根据条款14所述的方法,其中所述配置信息还包括关于所述第二分量的位宽信息是否包含符号位的符号信息,所述方法进一步包括根据所述符号信息来确定所述缩 放系数。
条款16、根据条款13所述的方法,其中在获得代表所述多比特位宽数值的第一分量中,所述方法包括:根据所述缩放系数对所述经调整的多比特位宽数值执行移位操作;以及对执行移位操作后的多比特位宽数值进入舍入操作,以获得所述第一分量。
条款17、根据条款15所述的方法,其中所述符号信息指示所述第二分量的位宽信息包含符号位,所述方法还包括:根据所述第二分量的位宽信息确定调整值;以及根据所述经调整的多比特位宽数值、关于所述第一分量的数值和所述调整值进行计算,以获得所述第二分量。
条款18、根据条款13所述的方法,进一步包括:将输入的数据转换成与所述多比特位宽数值相同的数据类型。
条款19、根据条款13所述的方法,进一步包括:根据所述配置信息选择所述第一分量和第二分量中的至少一个来进行输出。
条款20、根据条款13-19的任意一项所述的方法,其中所述第一分量和所述第二分量用于代表所述多比特位宽数值的舍入值。
条款21、根据条款13的所述的方法,其中所述配置信息还包括分量数目的信息,并且当所述分量数目为大于2的正整数时,所述方法还包括:根据配置信息确定所述第一分量和第二分量中的至少一个作为下一待处理的新的多比特位宽数值;根据所述配置信息中所述新的多比特位宽数值的第二分量的位宽信息关联的缩放系统对经调整的新的多比特位宽数值进行计算,以获得代表新的多比特位宽数值的第一分量;根据经调整的新的多比特位宽数值和新的多比特位宽数值的第一分量的数值进行计算,以获得代表新的多比特位宽数值的第二分量;以及反复执行上述确定步骤和计算步骤,直到获得所述分量数目的分量。
条款22、一种用于对多比特位宽数值进行处理以用于神经网络运算的计算装置,包括:处理器;存储器,其用于存储程序指令,当所述程序指令由所述至少一个处理器执行时,使得所述计算装置执行根据条款13-21的任意一项所述的方法。
条款23、一种计算机可读存储介质,其上存储有用于对多比特位宽数值进行处理以用于神经网络运算的程序指令,当所述程序指令由处理器运行时,执行根据条款13-21的任意一项所述的方法。
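The embodiments of the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the description of the above embodiments is only intended to help in understanding the method of the present disclosure and its core idea. At the same time, those of ordinary skill in the art may, based on the ideas of the present disclosure, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present disclosure.
It should be understood that the terms "first", "second", "third", and "fourth" in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, not to describe a particular order. The terms "include" and "comprise" used in the specification and claims of the present disclosure indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this specification is only for the purpose of describing particular embodiments and is not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
The embodiments of the present disclosure have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present disclosure, and the description of the above embodiments is only intended to help in understanding the method of the present disclosure and its core idea. At the same time, any changes or variations that those skilled in the art make to the specific implementations and the scope of application based on the ideas of the present disclosure fall within the scope of protection of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.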

Claims (23)

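  1. A computing apparatus for processing a multi-bit-width value, comprising:
    an input circuit configured to receive the multi-bit-width value and configuration information, wherein the configuration information at least includes bit width information of a first component and bit width information of a second component that represent the multi-bit-width value;
    a first component calculation circuit configured to perform a calculation on an adjusted multi-bit-width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit-width value;
    a second component calculation circuit configured to perform a calculation based at least on the adjusted multi-bit-width value and the value of the first component, to obtain the second component representing the multi-bit-width value; and
    an output circuit configured to output at least one of the first component and the second component.
  2. The computing apparatus according to claim 1, further comprising a determination circuit and an addition circuit, wherein:
    the determination circuit is configured to compare the multi-bit-width value with specified data and to send the comparison result to the addition circuit; and
    the addition circuit is configured to perform, based on the comparison result, an addition or subtraction operation on the multi-bit-width value and a given constant, to obtain the adjusted multi-bit-width value.
  3. The computing apparatus according to claim 2, wherein the configuration information further includes sign information indicating whether the bit width information of the second component includes a sign bit, and the first component calculation circuit is configured to determine the scaling factor according to the sign information.
  4. The computing apparatus according to claim 1, wherein the first component calculation circuit comprises:
    a scaling circuit and a rounding circuit, wherein the scaling circuit is configured to perform a shift operation on the adjusted multi-bit-width value according to the scaling factor, and the rounding circuit is configured to perform a rounding operation on the shifted multi-bit-width value to obtain the first component.
  5. The computing apparatus according to claim 3, wherein the sign information indicates that the bit width information of the second component includes a sign bit, and the second component calculation circuit is configured to:
    determine an adjustment value according to the bit width information of the second component; and
    perform a calculation based on the adjusted multi-bit-width value, a value related to the first component, and the adjustment value, to obtain the second component.
  6. The computing apparatus according to claim 5, wherein the second component calculation circuit comprises a subtraction circuit configured to subtract the value related to the first component and the adjustment value from the adjusted multi-bit-width value, to obtain the second component.
  7. The computing apparatus according to claim 1, further comprising a type converter configured to convert input data into the same data type as the multi-bit-width value.
  8. The computing apparatus according to claim 1, further comprising a selector configured to select, according to the configuration information, at least one of the first component and the second component and pass it to the output circuit.
  9. The computing apparatus according to any one of claims 1-8, wherein the first component and the second component are used to represent a rounded value of the multi-bit-width value.
  10. The computing apparatus according to any one of claims 1-8, wherein the configuration information further includes information on a number of components, and when the number of components is a positive integer greater than 2, the computing apparatus repeatedly executes the first component calculation circuit and the second component calculation circuit according to the configuration information until that number of components is obtained.
  11. An integrated circuit chip, comprising the computing apparatus according to any one of claims 1-10.
  12. An integrated circuit board card, comprising the integrated circuit chip according to claim 11.
  13. A method for processing a multi-bit-width value, comprising:
    receiving the multi-bit-width value and configuration information, wherein the configuration information at least includes bit width information of a first component and bit width information of a second component that represent the multi-bit-width value;
    performing a calculation on an adjusted multi-bit-width value according to a scaling factor associated with the bit width information of the second component, to obtain the first component representing the multi-bit-width value;
    performing a calculation based at least on the adjusted multi-bit-width value and the value of the first component, to obtain the second component representing the multi-bit-width value; and
    outputting at least one of the first component and the second component.
  14. The method according to claim 13, further comprising:
    comparing the multi-bit-width value with specified data; and
    based on the comparison result, adding a given constant to the multi-bit-width value to obtain the adjusted multi-bit-width value.
  15. The method according to claim 14, wherein the configuration information further includes sign information indicating whether the bit width information of the second component includes a sign bit, the method further comprising determining the scaling factor according to the sign information.
  16. The method according to claim 13, wherein obtaining the first component representing the multi-bit-width value comprises:
    performing a shift operation on the adjusted multi-bit-width value according to the scaling factor; and
    performing a rounding operation on the shifted multi-bit-width value to obtain the first component.
  17. The method according to claim 15, wherein the sign information indicates that the bit width information of the second component includes a sign bit, the method further comprising:
    determining an adjustment value according to the bit width information of the second component; and
    performing a calculation based on the adjusted multi-bit-width value, a value related to the first component, and the adjustment value, to obtain the second component.
  18. The method according to claim 13, further comprising:
    converting input data into the same data type as the multi-bit-width value.
  19. The method according to claim 13, further comprising:
    selecting, according to the configuration information, at least one of the first component and the second component for output.
  20. The method according to any one of claims 13-19, wherein the first component and the second component are used to represent a rounded value of the multi-bit-width value.
  21. The method according to claim 13, wherein the configuration information further includes information on a number of components, and when the number of components is a positive integer greater than 2, the method further comprises:
    determining, according to the configuration information, at least one of the first component and the second component as a new multi-bit-width value to be processed next;
    performing a calculation on the adjusted new multi-bit-width value according to a scaling factor associated with the bit width information, in the configuration information, of the second component of the new multi-bit-width value, to obtain a first component representing the new multi-bit-width value;
    performing a calculation based on the adjusted new multi-bit-width value and the value of the first component of the new multi-bit-width value, to obtain a second component representing the new multi-bit-width value; and
    repeating the above determining step and calculation steps until that number of components is obtained.
  22. A computing apparatus for processing a multi-bit-width value, comprising:
    a processor; and
    a memory for storing program instructions which, when executed by the processor, cause the computing apparatus to perform the method according to any one of claims 13-21.
  23. A computer-readable storage medium storing program instructions for processing a multi-bit-width value for use in neural network operations, wherein the program instructions, when run by a processor, perform the method according to any one of claims 13-21.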
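The repeated decomposition recited for more than two components can be pictured with a short Python sketch. It is only one possible reading, under assumptions that are not in the source: the higher-order (first) component is the one chosen as the next value to split, every lower-order component is signed, and the names `split_components`, `recombine`, and `lo_widths` are hypothetical.

```python
from typing import List


def split_components(x: int, lo_widths: List[int]) -> List[int]:
    """Split x into len(lo_widths) + 1 components, least significant first.

    Each width n in lo_widths yields a signed component in
    [-2**(n-1), 2**(n-1)); whatever remains becomes the final component.
    """
    parts = []
    rest = x
    for n in lo_widths:
        if n < 1:
            raise ValueError("each width must be at least 1")
        half = 1 << (n - 1)           # assumed adjustment value for width n
        adjusted = rest + half        # adjusted value for this round
        hi = adjusted >> n            # first component of this round
        lo = adjusted - (hi << n) - half  # second component of this round
        parts.append(lo)
        rest = hi                     # assumption: re-split the first component
    parts.append(rest)
    return parts


def recombine(parts: List[int], lo_widths: List[int]) -> int:
    """Inverse of split_components: rebuild the original integer."""
    value = parts[-1]
    for lo, n in zip(reversed(parts[:-1]), reversed(lo_widths)):
        value = (value << n) + lo
    return value


if __name__ == "__main__":
    widths = [8, 8]                   # two 8-bit low components plus a remainder
    for value in (123456789, -123456789, 0, 255, -256):
        parts = split_components(value, widths)
        assert recombine(parts, widths) == value
        print(value, parts)
```

Splitting the higher-order part each round keeps every lower-order component within its configured width, so the loop ends after one round per entry in `lo_widths`.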
PCT/CN2021/081188 2020-03-17 2021-03-16 Computing apparatus, method, board card and computer-readable storage medium WO2021185261A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021576637A JP7269382B2 (ja) 2020-03-17 2021-03-16 Computing apparatus, method, board card and computer-readable storage medium
EP21771952.5A EP4024288B1 (en) 2020-03-17 2021-03-16 Computing apparatus, method, board card and computer-readable storage medium
US17/557,669 US20220253280A1 (en) 2020-03-17 2021-12-21 Computing apparatus, method, board card and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010188341.1 2020-03-17
CN202010188341.1A CN113408717A (zh) 2020-03-17 2020-03-17 Computing apparatus, method, board card and computer-readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/557,669 Continuation US20220253280A1 (en) 2020-03-17 2021-12-21 Computing apparatus, method, board card and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021185261A1 true WO2021185261A1 (zh) 2021-09-23

Family

ID=77677169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/081188 WO2021185261A1 (zh) 2020-03-17 2021-03-16 Computing apparatus, method, board card and computer-readable storage medium

Country Status (5)

Country Link
US (1) US20220253280A1 (zh)
EP (1) EP4024288B1 (zh)
JP (1) JP7269382B2 (zh)
CN (1) CN113408717A (zh)
WO (1) WO2021185261A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
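CN102761509A (zh) * 2011-04-27 2012-10-31 联芯科技有限公司 Receiving system of an OFDM system and method for reducing the memory of the receiving system
CN105978611A (zh) * 2016-05-12 2016-09-28 京信通信系统(广州)有限公司 Frequency-domain signal compression method and device
US20190339937A1 (en) * 2018-05-04 2019-11-07 Microsoft Technology Licensing, Llc Block floating point computations using reduced bit-width vectors
CN110780845A (zh) * 2019-10-17 2020-02-11 浙江大学 Configurable approximate multiplier for quantized convolutional neural networks and implementation method thereof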

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0831024B2 (ja) * 1989-02-03 1996-03-27 日本電気株式会社 Arithmetic processor
JP3602884B2 (ja) * 1994-04-14 2004-12-15 松下電器産業株式会社 Image processing device
JP2001109613A (ja) 1999-10-05 2001-04-20 Mitsubishi Electric Corp Arithmetic unit
US20030065699A1 (en) * 2001-10-01 2003-04-03 Koninklijke Philips Electronics N.V. Split multiplier for efficient mixed-precision DSP
US8495125B2 (en) * 2009-05-27 2013-07-23 Microchip Technology Incorporated DSP engine with implicit mixed sign operands
US10579338B2 (en) 2017-01-30 2020-03-03 Arm Limited Apparatus and method for processing input operand values
US10853067B2 (en) * 2018-09-27 2020-12-01 Intel Corporation Computer processor for higher precision computations using a mixed-precision decomposition of operations

Also Published As

Publication number Publication date
EP4024288B1 (en) 2024-05-01
JP7269382B2 (ja) 2023-05-08
EP4024288A1 (en) 2022-07-06
CN113408717A (zh) 2021-09-17
JP2022538238A (ja) 2022-09-01
EP4024288A4 (en) 2023-09-06
US20220253280A1 (en) 2022-08-11

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21771952; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2021576637; Country of ref document: JP; Kind code of ref document: A)
ENP Entry into the national phase (Ref document number: 2021771952; Country of ref document: EP; Effective date: 20220330)
NENP Non-entry into the national phase (Ref country code: DE)