CN110413254B - Data processor, method, chip and electronic equipment - Google Patents


Info

Publication number
CN110413254B
Authority
CN
China
Prior art keywords
data
partial product
bit
circuit
target code
Prior art date
Legal status
Active
Application number
CN201910902610.3A
Other languages
Chinese (zh)
Other versions
CN110413254A (en)
Inventor
Inventor not disclosed
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910902610.3A
Priority to CN201911349822.XA (CN111008003B)
Publication of CN110413254A
Application granted
Publication of CN110413254B
Active (current legal status)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Neurology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Image Processing (AREA)

Abstract

The application provides a data processor, a method, a chip and electronic equipment. The data processor comprises a first multiplication circuit, a second multiplication circuit and a partial product exchange circuit. The first multiplication circuit comprises a first modified coding sub-circuit and a first modified compression sub-circuit, and the second multiplication circuit comprises a second modified coding sub-circuit and a second modified compression sub-circuit. Because the data processor applies regular signed number coding (canonical signed digit, CSD, coding) to the received data, fewer effective partial products are generated, which reduces the complexity of implementing multiplication or multiply-accumulate operations in the data processor.

Description

Data processor, method, chip and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processor, a method, a chip, and an electronic device.
Background
With the continuous development of digital electronics, the rapid growth of various artificial intelligence (AI) chips has increased the demand for high-performance data processors such as multipliers, adders and multiply-accumulators. Neural network algorithms, which are among the algorithms most widely used by intelligent chips, rely heavily on multiply-accumulate operations, and these operations are performed by multiply-accumulators.
Currently, a data processor takes each three-bit group of the multiplier as a code, obtains partial products from the multiplicand according to these codes, and compresses all the partial products with a Wallace tree to obtain a multiplication or multiply-accumulate result. However, in this conventional technique the codes contain many non-zero values, so many effective partial products are generated, and the complexity of implementing multiplication or multiply-accumulate operations in the data processor is high.
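The conventional scheme above, in which each overlapping three-bit group of the multiplier selects a partial product, matches radix-4 (modified) Booth encoding; the sketch below assumes that reading. It is a minimal Python illustration, not the patent's circuit, and the function names are hypothetical. It recodes a two's-complement multiplier into Booth digits and counts the non-zero digits, each of which corresponds to one effective partial product that the Wallace tree has to compress.

```python
def booth_radix4_digits(multiplier: int, width: int) -> list[int]:
    """Recode a width-bit two's-complement multiplier into radix-4 Booth digits.

    Each digit comes from an overlapping 3-bit window (b[2i+1], b[2i], b[2i-1])
    and lies in {-2, -1, 0, 1, 2}. Digits are returned least significant first,
    so multiplier == sum(d * 4**i for i, d in enumerate(digits)).
    """
    if width % 2:
        raise ValueError("width must be even for radix-4 grouping")
    # Python's >> is arithmetic, so this yields two's-complement bits for negatives too.
    bits = [(multiplier >> i) & 1 for i in range(width)]
    digits = []
    prev = 0  # implicit bit b[-1] = 0 to the right of the LSB
    for i in range(0, width, 2):
        digits.append(-2 * bits[i + 1] + bits[i] + prev)
        prev = bits[i + 1]
    return digits


def effective_partial_products(digits: list[int]) -> int:
    """Each non-zero digit selects one shifted, possibly negated, multiplicand row."""
    return sum(1 for d in digits if d != 0)


digits = booth_radix4_digits(119, 8)        # 119 = 0b0111_0111
print(digits)                                # [-1, 2, -1, 2]
print(effective_partial_products(digits))    # 4
```

In this example every Booth digit is non-zero, so all four partial products are effective, which is the situation the background identifies as the source of complexity.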
Disclosure of Invention
In view of the above, it is necessary to provide a data processor, a method, a chip and an electronic device that reduce the number of effective partial products obtained and thereby reduce the computational complexity.
A data processor, comprising a first multiplication circuit, a second multiplication circuit and a partial product exchange circuit. The first multiplication circuit comprises a first modified coding sub-circuit and a first modified compression sub-circuit, and the second multiplication circuit comprises a second modified coding sub-circuit and a second modified compression sub-circuit. The first modified coding sub-circuit comprises a first coding branch and a first selection branch, and the second modified coding sub-circuit comprises a second coding branch and a second selection branch. A first output terminal of the first modified coding sub-circuit is connected to a first input terminal of the partial product exchange circuit, and a second output terminal of the first modified coding sub-circuit is connected to an input terminal of the first modified compression sub-circuit. A first output terminal of the partial product exchange circuit is connected to an input terminal of the first modified coding sub-circuit, and a second output terminal of the partial product exchange circuit is connected to an input terminal of the second modified coding sub-circuit. A first output terminal of the second modified coding sub-circuit is connected to a second input terminal of the partial product exchange circuit, and a second output terminal of the second modified coding sub-circuit is connected to an input terminal of the second modified compression sub-circuit.
The first coding branch is configured to perform regular signed number coding on received first data to obtain sign-bit-extended first partial products; the first selection branch is configured to select the first partial products of the target code from the sign-bit-extended first partial products; and the first modified compression sub-circuit is configured to compress the first partial products of the target code to obtain a first target operation result. Likewise, the second coding branch is configured to perform regular signed number coding on received second data to obtain sign-bit-extended second partial products; the second selection branch is configured to select the second partial products of the target code from the sign-bit-extended second partial products; and the second modified compression sub-circuit is configured to compress the second partial products of the target code to obtain a second target operation result. The partial product exchange circuit is configured to exchange the sign-bit-extended first partial products and the sign-bit-extended second partial products.
In one embodiment, each of the first and second multiplication circuits includes a first input terminal for receiving a function selection mode signal, and the partial product exchange circuit includes a third input terminal for receiving the function selection mode signal. The function selection mode signal indicates which mode of data operation the data processor currently performs.
In one embodiment, the first modified coding sub-circuit comprises a first modified coding processing branch and a first partial product selection branch, and the output terminal of the first modified coding processing branch is connected to the input terminal of the first partial product selection branch.
The first modified coding processing branch is configured to perform regular signed number coding on the received first data to obtain the first target code. The first partial product selection branch is configured to select among the sign-bit-extended first partial products obtained according to the first target code, to receive the sign-bit-extended second partial products output by the partial product exchange circuit, and to use the received sign-bit-extended second partial products together with the selected sign-bit-extended first partial products as the first partial products of the target code.
In one embodiment, the first modified coding processing branch comprises a first modified coding unit, a lower partial product obtaining unit, a lower selector bank unit, an upper partial product obtaining unit and an upper selector bank unit. A first output terminal of the first modified coding unit is connected to a first input terminal of the lower partial product obtaining unit, and an output terminal of the lower selector bank unit is connected to a second input terminal of the lower partial product obtaining unit; a second output terminal of the first modified coding unit is connected to a first input terminal of the upper partial product obtaining unit, and an output terminal of the upper selector bank unit is connected to a second input terminal of the upper partial product obtaining unit.
The first modified coding unit is configured to perform regular signed number coding on the received first data, to determine from the received function selection mode signal the data bit width that the data processor can process, and to obtain the first target code according to that bit width. The lower partial product obtaining unit is configured to obtain a sign-bit-extended first lower partial product from the first lower target code in the received first target code and the first data, and the lower selector bank unit is configured to gate values in the sign-bit-extended first lower partial product. The upper partial product obtaining unit is configured to obtain a sign-bit-extended first upper partial product from the first upper target code in the received first target code and the first data, and the upper selector bank unit is configured to gate values in the sign-bit-extended first upper partial product.
In one embodiment, the first modified coding unit includes a first data input port, a first mode selection signal input port, a lower target code output port and an upper target code output port. The first data input port is configured to receive the first data; the first mode selection signal input port is configured to receive the function selection mode signal; the lower target code output port is configured to output the first lower target code obtained by regular signed number coding of the first data; and the upper target code output port is configured to output the first upper target code obtained by regular signed number coding of the first data.
In one embodiment, the lower partial product obtaining unit includes a lower target code input port, a gated value input port, a first data input port and a lower partial product output port. The lower target code input port is configured to receive the first lower target code output by the first modified coding unit; the gated value input port is configured to receive the values of the sign-bit-extended first lower partial product gated by the lower selector bank unit; the first data input port is configured to receive the first data; and the lower partial product output port is configured to output the sign-bit-extended first lower partial product.
In one embodiment, the upper partial product obtaining unit includes an upper target code input port, a gated value input port, a first data input port and an upper partial product output port. The upper target code input port is configured to receive the first upper target code output by the first modified coding unit; the gated value input port is configured to receive the values of the sign-bit-extended first upper partial product gated by the upper selector bank unit; the first data input port is configured to receive the first data; and the upper partial product output port is configured to output the sign-bit-extended first upper partial product.
In one embodiment, the lower selector bank unit includes a lower selector for gating values in the sign-bit-extended first lower partial product.
In one embodiment, the upper selector bank unit includes an upper selector for gating values in the sign-bit-extended first upper partial product.
In one embodiment, the first partial product selection branch comprises a function selection mode signal input port, a first partial product input port, a second partial product input port, a first partial product output port and a gated partial product output port. The function selection mode signal input port is configured to receive the function selection mode signal; the first partial product input port is configured to receive the sign-bit-extended first partial products output by the first modified coding processing branch; the second partial product input port is configured to receive the sign-bit-extended second partial products supplied by the partial product exchange circuit; the first partial product output port is configured to output the sign-bit-extended first partial products that are to be exchanged by the partial product exchange circuit; and the gated partial product output port is configured to output the gated sign-bit-extended first partial products together with the received sign-bit-extended second partial products.
In one embodiment, the first modified compression sub-circuit comprises a modified Wallace tree group unit and an accumulation unit, and the output terminal of the modified Wallace tree group unit is connected to the input terminal of the accumulation unit. The modified Wallace tree group unit is configured to accumulate the values in each column of the first partial products of the target code obtained in the different operation modes to produce accumulation results, and the accumulation unit is configured to add the accumulation results.
In one embodiment, the modified Wallace tree group unit includes a low-order Wallace tree subunit, a selector and a high-order Wallace tree subunit; the output terminal of the low-order Wallace tree subunit is connected to the input terminal of the selector, and the output terminal of the selector is connected to the input terminal of the high-order Wallace tree subunit. The low-order Wallace tree subunit is configured to accumulate the values in each low-order column of the first partial products of the target code, the selector is configured to gate the carry input signal received by the high-order Wallace tree subunit, and the high-order Wallace tree subunit is configured to accumulate the values in each high-order column of the first partial products of the target code; together they produce the accumulation results.
In one embodiment, the accumulation unit includes an adder for adding the accumulation results.
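The split between a Wallace tree and a final adder can be illustrated as follows: columns of partial-product bits are compressed with carry-save adders until at most two rows remain, and one carry-propagate adder then produces the result. The Python sketch below is a generic model under that assumption, not the patent's modified Wallace tree group unit; `wallace_reduce` and `accumulate` are hypothetical names, and the reduction uses only full adders (3:2 compressors), whereas a textbook Wallace tree also uses half adders.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compressor: three bits of equal weight -> (sum bit, carry bit of double weight)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def wallace_reduce(rows: list[int], width: int) -> tuple[int, int]:
    """Compress partial-product rows column by column until every column holds
    at most two bits, and return the redundant (sum row, carry row) pair.

    Carries of weight 2**width and above are dropped, so arithmetic is modulo 2**width.
    """
    columns = [[(r >> k) & 1 for r in rows] for k in range(width)]
    while any(len(col) > 2 for col in columns):
        nxt = [[] for _ in range(width)]
        for k, col in enumerate(columns):
            i = 0
            while len(col) - i >= 3:  # compress three bits of this column at a time
                s, c = full_adder(col[i], col[i + 1], col[i + 2])
                nxt[k].append(s)
                if k + 1 < width:
                    nxt[k + 1].append(c)  # the carry moves to the next column
                i += 3
            nxt[k].extend(col[i:])  # 0, 1 or 2 leftover bits pass through unchanged
        columns = nxt
    sum_row = sum(col[0] << k for k, col in enumerate(columns) if len(col) > 0)
    carry_row = sum(col[1] << k for k, col in enumerate(columns) if len(col) > 1)
    return sum_row, carry_row


def accumulate(sum_row: int, carry_row: int, width: int) -> int:
    """Final carry-propagate addition, the role played here by the accumulation unit."""
    return (sum_row + carry_row) & ((1 << width) - 1)


rows = [0x00FF, 0x0F0F, 0x3333, 0x5555]
s, c = wallace_reduce(rows, 16)
print(accumulate(s, c, 16) == sum(rows) & 0xFFFF)  # True
```

Keeping the result as two rows until the very end is what lets the tree avoid carry propagation inside each compression level; only the final adder pays for a full-width carry chain.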
In one embodiment, the second modified coding sub-circuit comprises a second modified coding processing branch and a second partial product selection branch, and the output terminal of the second modified coding processing branch is connected to the input terminal of the second partial product selection branch.
The second modified coding processing branch is configured to perform regular signed number coding on the received second data to obtain the second target code. The second partial product selection branch is configured to select among the sign-bit-extended second partial products obtained according to the second target code, to receive the sign-bit-extended first partial products output by the partial product exchange circuit, and to use the selected sign-bit-extended second partial products together with the received sign-bit-extended first partial products as the second partial products of the target code.
In one embodiment, the second partial product selection branch comprises a function selection mode signal input port, a second partial product input port, a first partial product input port, a second partial product output port and a gated partial product output port. The function selection mode signal input port is configured to receive the function selection mode signal; the second partial product input port is configured to receive the sign-bit-extended second partial products; the first partial product input port is configured to receive the sign-bit-extended first partial products supplied by the partial product exchange circuit; the second partial product output port is configured to output the sign-bit-extended second partial products that are to be exchanged by the partial product exchange circuit; and the gated partial product output port is configured to output the gated sign-bit-extended second partial products together with the received sign-bit-extended first partial products.
In one embodiment, the partial product exchange circuit comprises a function selection mode signal input port, a first partial product input port, a first partial product output port, a second partial product input port and a second partial product output port. The function selection mode signal input port is configured to receive the function selection mode signal; the first partial product input port is configured to receive the sign-bit-extended first partial products that the first partial product selection branch outputs for exchange; the first partial product output port is configured to output these sign-bit-extended first partial products; the second partial product input port is configured to receive the sign-bit-extended second partial products that the second partial product selection branch outputs for exchange; and the second partial product output port is configured to output these sign-bit-extended second partial products.
A method of data processing, the method comprising:
receiving data to be processed and a function selection mode signal, where the function selection mode signal indicates the mode of data operation that the data processor currently performs;
determining, according to the function selection mode signal, whether the data to be processed needs to be split;
if the data to be processed needs to be split, splitting it to obtain split data;
performing regular signed number coding on the split data to obtain a target code;
performing conversion according to the target code and the split data to obtain sign-bit-extended partial products;
determining, according to the function selection mode signal, whether the sign-bit-extended partial products need to be exchanged;
if the sign-bit-extended partial products do not need to be exchanged, using them as the partial products of the target code; and
compressing the partial products of the target code to obtain a target operation result.
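The core of these steps (coding, conversion to sign-extended partial products, compression) can be traced for a single signed multiplication, leaving out the split and exchange decisions. The Python sketch below is an illustrative model under stated assumptions, not the claimed circuit: it recodes the multiplier with the standard non-adjacent-form algorithm (which yields the canonical signed digit representation), forms one shifted, possibly negated copy of the multiplicand per non-zero digit, sign-extends each copy to twice the operand width, and replaces the Wallace-tree compression with a plain sum. `naf_digits` and `multiply_via_csd` are hypothetical names.

```python
def naf_digits(value: int) -> list[int]:
    """Canonical signed digit (non-adjacent form) recoding, least significant digit first.

    Digits are in {-1, 0, 1} and no two adjacent digits are non-zero. Negative
    integers work because Python's & and >> follow two's-complement semantics.
    """
    digits = []
    while value != 0:
        if value & 1:
            d = 2 - (value & 3)  # +1 when value % 4 == 1, -1 when value % 4 == 3
            value -= d           # value is now divisible by 4
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits


def multiply_via_csd(a: int, b: int, width: int) -> tuple[int, int]:
    """Multiply two width-bit signed integers by recoding b, building sign-extended
    partial products of a and summing them.

    Returns (product, number of effective partial products).
    """
    out_width = 2 * width
    mask = (1 << out_width) - 1
    partial_products = []
    for k, d in enumerate(naf_digits(b)):
        if d == 0:
            continue  # a zero digit yields no effective partial product
        pp = (a << k) if d > 0 else -(a << k)
        partial_products.append(pp & mask)  # two's complement, sign-extended to out_width bits
    total = sum(partial_products) & mask    # stand-in for Wallace-tree compression + final adder
    product = total - (1 << out_width) if total >> (out_width - 1) else total
    return product, len(partial_products)


print(multiply_via_csd(-5, 7, 8))  # (-35, 2)
```

For the multiplier 7 (binary 0111, three 1s) the signed-digit code is 8 - 1, so only two partial products are generated instead of three.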
In one embodiment, determining whether the data to be processed needs to be split according to the function selection mode signal includes: determining, according to the function selection mode signal, whether the bit width of the data to be processed is equal to the data bit width of the corresponding operation mode that the data processor can currently process.
In one embodiment, after it is determined according to the function selection mode signal whether the bit width of the data to be processed equals the data bit width of the corresponding operation mode that the data processor can currently process, the method further includes: if the data to be processed does not need to be split, performing regular signed number coding directly on the data to be processed to obtain the target code.
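This section does not say how the data are split. Purely as an assumption, the Python sketch below splits a wider signed operand into chunks of the mode's bit width (for example a 16-bit value into two 8-bit halves) so that the original value is the weighted sum of the chunks, and skips the split when the widths already match. `prepare_operand`, `split_operand` and the integer widths standing in for the function selection mode signal are all hypothetical.

```python
def split_operand(x: int, width: int, part_width: int) -> list[int]:
    """Split a width-bit signed integer into part_width-bit chunks, least significant first.

    Lower chunks are unsigned and the most significant chunk keeps the sign, so
    x == sum(chunk << (i * part_width) for i, chunk in enumerate(chunks)).
    """
    if width % part_width:
        raise ValueError("width must be a multiple of part_width")
    n = width // part_width
    chunk_mask = (1 << part_width) - 1
    chunks = []
    for i in range(n):
        chunk = (x >> (i * part_width)) & chunk_mask
        if i == n - 1 and chunk >> (part_width - 1):
            chunk -= 1 << part_width  # reinterpret the top chunk as signed
        chunks.append(chunk)
    return chunks


def prepare_operand(x: int, data_width: int, mode_width: int) -> list[int]:
    """Split only when the data width differs from the width of the selected mode."""
    if data_width == mode_width:
        return [x]  # no split: the data are coded directly
    return split_operand(x, data_width, mode_width)


print(prepare_operand(-2, 16, 8))   # [254, -1]
print(prepare_operand(-2, 16, 16))  # [-2]
```

Keeping only the top chunk signed means each lower chunk can be multiplied as an unsigned value and the chunk products recombined with shifts.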
In one embodiment, performing regular signed number coding on the split data to obtain a target code includes: replacing every run of $l$ consecutive bits of value 1 in the split data by $l+1$ bits whose highest bit is 1, whose lowest bit is $-1$ and whose remaining bits are 0, where $l \ge 2$, thereby obtaining the target code. The replacement preserves the value because a run occupying bit positions $k$ to $k+l-1$ satisfies $\sum_{i=k}^{k+l-1} 2^i = 2^{k+l} - 2^k$.
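The run-replacement rule can be applied repeatedly from the least significant bit upward until no run of two or more 1s remains; the 1 carried into the position above a run may join the next run, so the scan resumes at that position. The Python sketch below is an illustration that assumes a non-negative input (sign handling is omitted), and `csd_encode` is a hypothetical name. The result never has two adjacent non-zero digits, which is the canonical signed digit form.

```python
def csd_encode(value: int) -> list[int]:
    """Rewrite each run of l >= 2 consecutive 1s as l + 1 digits: 1, 0, ..., 0, -1.

    Digits are returned least significant first and each is in {-1, 0, 1}.
    Sketch only: non-negative inputs; sign handling is not modelled.
    """
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    # Binary digits, LSB first, plus one spare position for the final carry.
    digits = [(value >> i) & 1 for i in range(value.bit_length())] + [0]
    i = 0
    while i < len(digits):
        if digits[i] != 1:
            i += 1
            continue
        j = i
        while j < len(digits) and digits[j] == 1:
            j += 1  # j ends one past the run of 1s
        if j - i >= 2:
            digits[i] = -1            # lowest digit of the run becomes -1
            for k in range(i + 1, j):
                digits[k] = 0         # interior digits become 0
            if j == len(digits):
                digits.append(0)
            digits[j] = 1             # new highest digit (this position was 0)
        i = j  # resume at j: a carried 1 may start or extend the next run
    while len(digits) > 1 and digits[-1] == 0:
        digits.pop()  # drop unused leading zeros
    return digits


print(csd_encode(7))   # [-1, 0, 0, 1]        i.e. 8 - 1
print(csd_encode(27))  # [-1, 0, -1, 0, 0, 1]  i.e. 32 - 4 - 1
```

For 27 (binary 11011, four 1s) the first rewrite turns the lower run into a carry that merges with the upper run, and the second rewrite leaves three non-zero digits, so three partial products are needed instead of four.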
In one embodiment, performing regular signed number coding on the split data to obtain a target code includes:
performing regular signed number coding on the split data to obtain an intermediate code;
and obtaining the target code according to the intermediate code and the function selection mode signal.
In one embodiment, performing conversion according to the target code and the split data to obtain sign-bit-extended partial products includes:
performing conversion according to the target code and the split data to obtain original partial products;
and performing sign bit extension on the original partial products to obtain the sign-bit-extended partial products.
In one embodiment, determining whether the sign-bit-extended partial products need to be exchanged according to the function selection mode signal includes: determining, according to the function selection mode signal, whether the data bit widths currently processed by the data processor are the same.
In one embodiment, after determining whether the sign-bit-extended partial products need to be exchanged according to the function selection mode signal, the method further includes: if the sign-bit-extended partial products need to be exchanged, exchanging the upper-order or lower-order partial products among them.
In one embodiment, compressing the partial products of the target code to obtain the target operation result includes:
accumulating the partial products of the target code to obtain an intermediate operation result;
and accumulating the intermediate operation result to obtain the target operation result.
In one embodiment, accumulating the intermediate operation result to obtain the target operation result includes:
the low-order Wallace tree subunit accumulates the values in each column of all the partial products of the target code to obtain an accumulation result;
the selector gates the accumulation result according to the function selection mode signal to obtain a carry gating signal;
and the high-order Wallace tree subunit performs accumulation according to the carry gating signal and the values in each column of the partial products of the target code to obtain the target operation result.
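The selector's role can be illustrated by modelling the low-order and high-order stages on integers. In this Python sketch, which assumes two equal column halves and a Boolean `fused` flag standing in for the function selection mode signal (both hypothetical), forwarding the carry makes the two stages act as one wide compressor, while blocking it lets the same columns hold two independent, narrower results. Plain column sums replace the actual Wallace-tree cells.

```python
def compress_with_gated_carry(rows: list[int], half: int, fused: bool) -> tuple[int, int]:
    """Model a low and a high compression stage joined by a carry selector.

    rows are non-negative partial-product rows spanning 2 * half columns. The low
    stage sums columns [0, half) and the high stage sums columns [half, 2 * half).
    The selector forwards the low stage's carry-out only when fused is True.
    Returns (high_result, low_result), each half bits wide.
    """
    mask = (1 << half) - 1
    low_total = sum(r & mask for r in rows)
    low_result = low_total & mask
    carry_out = low_total >> half             # carries leaving the low stage
    carry_in = carry_out if fused else 0      # the gated carry seen by the high stage
    high_total = sum((r >> half) & mask for r in rows) + carry_in
    return high_total & mask, low_result


rows = [0x80FF, 0x7F01, 0x0101]
print(compress_with_gated_carry(rows, 8, fused=True))   # (1, 1): one 16-bit sum, 0x0101
print(compress_with_gated_carry(rows, 8, fused=False))  # (0, 1): two independent 8-bit sums
```

With the carry forwarded, `(high << half) | low` equals the full sum modulo `2**(2 * half)`; with it blocked, each half is the sum of its own lane only.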
In the data processor and method provided by this embodiment, the first modified coding sub-circuit and the second modified coding sub-circuit each perform regular signed number coding on the received data, obtaining sign-bit-extended first partial products and sign-bit-extended second partial products, respectively. According to the received function selection mode signal, it is determined whether these partial products need to be exchanged through the partial product exchange circuit; after any required exchange, each of the two modified coding sub-circuits uses the sign-bit-extended partial products it currently holds as its partial products of the target code, giving the first and second partial products of the target code. Finally, the first modified compression sub-circuit and the second modified compression sub-circuit compress the first and second partial products of the target code, respectively, to obtain the first and second target operation results. Because the received data are coded with regular signed number coding, fewer effective partial products are obtained, which reduces the complexity of implementing multiplication or multiply-accumulate operations in the data processor.
The machine learning arithmetic device provided by the embodiments of the application comprises one or more of the data processors described above. It is configured to acquire data to be operated on and control information from other processing devices, execute specified machine learning operations, and transmit the execution results to the other processing devices through an I/O interface;
when the machine learning arithmetic device comprises a plurality of data processors, the data processors are connected through a specific preset structure and transmit data among themselves;
the data processors are interconnected through a PCIe bus to transmit data and support larger-scale machine learning operations; the data processors share the same control system or have their own control systems; they share memory or have their own memories; and they may be interconnected in any topology.
The combined processing device provided by the embodiments of the application comprises the machine learning arithmetic device, a universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete the operations specified by the user. The combined processing device may further include a storage device, which is connected to the machine learning arithmetic device and to the other processing devices and is configured to store data of the machine learning arithmetic device and the other processing devices.
The neural network chip provided by the embodiment of the application comprises the data processor, the machine learning arithmetic device or the combined processing device.
The neural network chip packaging structure provided by the embodiment of the application comprises the neural network chip.
The board card provided by the embodiment of the application comprises the neural network chip packaging structure.
The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.
An embodiment of the present application provides a chip, which includes at least one data processor as described in any one of the above.
An electronic device provided by the embodiment of the application comprises the chip.
Drawings
Fig. 1 is a schematic circuit diagram of a data processor according to an embodiment.
Fig. 2 is a schematic circuit diagram of another data processor according to another embodiment.
Fig. 3 is a specific circuit structure diagram of a data processor according to an embodiment.
Fig. 4a is a schematic diagram illustrating a distribution rule of partial products obtained by 16-bit data multiplication according to an embodiment.
Fig. 4b is a schematic diagram illustrating a distribution rule of partial products obtained by multiply-accumulate operations of 16 bits by 8 bits according to an embodiment.
Fig. 5 is a specific circuit configuration diagram of a data processor according to another embodiment.
Fig. 6 is a schematic flowchart of a data processing method according to an embodiment.
Fig. 7 is a specific circuit configuration diagram of the compression circuit for 8-bit data operation according to another embodiment.
Fig. 8 is a flowchart illustrating another data processing method according to an embodiment.
Fig. 9 is a structural diagram of a combined processing device according to an embodiment.
Fig. 10 is a block diagram of another combined processing device according to an embodiment.
Fig. 11 is a schematic structural diagram of a board card according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The data processor provided by the present application can be applied to an AI chip, a Field-Programmable Gate Array (FPGA) chip, or other hardware circuit devices to perform multiplication or multiply-accumulate processing; its structure is shown in figs. 1 and 2.
As shown in fig. 1, which is a block diagram of a data processor according to an embodiment, the data processor includes a first multiplication circuit 11, a second multiplication circuit 12, and a partial product swap circuit 13. The first multiplication circuit 11 includes a first modified coding sub-circuit 111 and a first modified compression sub-circuit 112, and the second multiplication circuit 12 includes a second modified coding sub-circuit 121 and a second modified compression sub-circuit 122. The first modified coding sub-circuit 111 includes a first coding branch 111a and a first selection branch 111b, and the second modified coding sub-circuit 121 includes a second coding branch 121a and a second selection branch 121b. A first output terminal of the first modified coding sub-circuit 111 is connected to a first input terminal of the partial product swap circuit 13, and a second output terminal of the first modified coding sub-circuit 111 is connected to an input terminal of the first modified compression sub-circuit 112. A first output terminal of the partial product swap circuit 13 is connected to an input terminal of the first modified coding sub-circuit 111, and a second output terminal of the partial product swap circuit 13 is connected to an input terminal of the second modified coding sub-circuit 121. A first output terminal of the second modified coding sub-circuit 121 is connected to a second input terminal of the partial product swap circuit 13, and a second output terminal of the second modified coding sub-circuit 121 is connected to an input terminal of the second modified compression sub-circuit 122.
The first coding branch 111a is configured to perform regular signed number encoding on the received first data to obtain the sign bit extended first partial product; the first selection branch 111b is configured to select the first partial product of the target code from the sign bit extended first partial product; and the first modified compression sub-circuit 112 is configured to compress the first partial product of the target code to obtain a first target operation result. Likewise, the second coding branch 121a is configured to perform regular signed number encoding on the received second data to obtain the sign bit extended second partial product; the second selection branch 121b is configured to select the second partial product of the target code from the sign bit extended second partial product; and the second modified compression sub-circuit 122 is configured to compress the second partial product of the target code to obtain a second target operation result. The partial product swap circuit 13 is configured to exchange the sign bit extended first partial product with the sign bit extended second partial product.
Specifically, the data processor may perform a data multiplication operation or a data multiply-accumulate operation. Optionally, the first modified coding sub-circuit 111 may receive first data and the second modified coding sub-circuit 121 may receive second data. The first data and the second data may each include two sub-data of the same bit width, which may be identical or different; each sub-data may serve as a multiplicand or as a multiplier in a multiplication or multiply-accumulate operation. Optionally, the two sub-data in the first data may be spliced and input to the first modified coding sub-circuit 111 as a whole, or input to it separately and simultaneously; likewise, the two sub-data in the second data may be spliced and input to the second modified coding sub-circuit 121 as a whole, or input to it separately and simultaneously. The sub-data may be fixed-point numbers with a bit width of 2N, in which case the data obtained by splicing two sub-data is 4N bits wide. Optionally, the first modified coding sub-circuit 111 may include a plurality of data processing units with different functions; a data processing unit may perform regular signed number encoding or another conversion, which is not limited in this embodiment.
When the data processor performs a given data operation, one sub-data received by the first modified coding sub-circuit 111 may serve as the multiplicand and the other as the multiplier; likewise, one sub-data received by the second modified coding sub-circuit 121 may serve as the multiplicand and the other as the multiplier. It can further be understood that, when the data processor is performing a multiplication or multiply-accumulate operation, the bit width of the sign bit extended first partial product and of the sign bit extended second partial product may be twice the bit width of the multiplicand; the number of sign bit extended first partial products may equal the number of first partial products of the target code; and the number of sign bit extended second partial products may equal the number of second partial products of the target code. The sign bit extended first partial product may include a sign bit extended first lower partial product and a sign bit extended first upper partial product; the sign bit extended second partial product may include a sign bit extended second lower partial product and a sign bit extended second upper partial product; the first partial product of the target code may include a first lower partial product of the target code and a first upper partial product of the target code; and the second partial product of the target code may include a second lower partial product of the target code and a second upper partial product of the target code.
In this embodiment, the first modified coding sub-circuit 111 may receive the multiplier of the operation and perform regular signed number encoding on it to obtain the target code. The regular signed number encoding can be characterized as follows. For an N-bit multiplier, the bits are processed from low order to high order. If there is a run of l (l >= 2) consecutive bits with value 1, the run is converted into the (l+1)-digit string "1 (0)^(l-1) (-1)", that is, a 1 followed by l-1 zeros and then -1, and the converted (l+1) digits are combined with the remaining (N-l) bit values to obtain new data. The new data then serves as the input to the next stage of conversion, and the process repeats until the new data contains no run of l (l >= 2) consecutive 1s. Performing regular signed number encoding on an N-bit multiplier yields a target code whose bit width may be N+1. For instance, 11 can be rewritten as 100 - 001, i.e., equivalently as 10(-1); 111 can be rewritten as 1000 - 0001, i.e., equivalently as 100(-1); longer runs of l (l >= 2) consecutive 1s are converted in the same way.
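The run conversion rests on a geometric-series identity, shown below as a worked check. The symbol k (the position of the lowest 1 in the run) is introduced here for illustration only. A run of l ones at bit positions k through k+l-1 sums to 2^(k+l) - 2^k, so it can be replaced by +1 at position k+l and -1 at position k, with the digits in between set to 0. If a run ends at the top bit N-1, the +1 lands at position N, which is consistent with the target code being allowed N+1 digits.

```latex
\begin{aligned}
\sum_{i=k}^{k+l-1} 2^{i} &= 2^{k+l} - 2^{k}, \qquad l \ge 2,\\
l = 2:\quad 11_2 &= 100_2 - 001_2 = 4 - 1 = 3 \;\mapsto\; 1\,0\,(-1),\\
l = 3:\quad 111_2 &= 1000_2 - 0001_2 = 8 - 1 = 7 \;\mapsto\; 1\,0\,0\,(-1).
\end{aligned}
```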
For example, suppose the multiplier received by the first modified coding sub-circuit 111 is "001010101101110". The first-stage conversion yields the first new data "0010101011100(-1)0"; the second-stage conversion of the first new data yields "0010101100(-1)00(-1)0"; the third stage yields "0010110(-1)00(-1)00(-1)0"; the fourth stage yields "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth stage yields "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l >= 2) consecutive 1s, so it may be referred to as the initial code; padding the initial code with one bit then yields an intermediate code, which completes the regular signed number encoding. The bit width of the initial code may be equal to the bit width of the multiplier. Optionally, if the highest and second-highest digits of the new data (i.e., the initial code) are "10" or "01", the first modified coding sub-circuit 111 may append a 0 one position above the highest digit, so that the top three digits of the corresponding intermediate code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently processed by the data processor plus 1.
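The stage-by-stage conversion can be sketched in Python. This is a minimal sketch, assuming the multiplier is given as an MSB-first string of 0s and 1s and that each stage rewrites the lowest remaining run of two or more 1s, which reproduces the five stages listed above. The function names are hypothetical, and the final one-digit padding simply prefixes a 0.

```python
def to_digits(bits: str) -> list[int]:
    """MSB-first '0'/'1' string -> LSB-first list of digits."""
    return [int(b) for b in reversed(bits)]


def render(digits: list[int]) -> str:
    """LSB-first digits -> MSB-first string in the notation used above."""
    return "".join("(-1)" if d == -1 else str(d) for d in reversed(digits))


def lowest_run(digits: list[int]):
    """(start, end) of the lowest run of >= 2 consecutive 1s, or None.

    The run occupies positions start .. end-1.
    """
    i = 0
    while i < len(digits):
        if digits[i] != 1:
            i += 1
            continue
        j = i
        while j < len(digits) and digits[j] == 1:
            j += 1
        if j - i >= 2:
            return i, j
        i = j
    return None


def recode_stages(bits: str) -> list[str]:
    """Rewrite one run per stage: 1...1 (l ones) -> 1 0...0 (-1)."""
    digits = to_digits(bits)
    stages = []
    while (run := lowest_run(digits)) is not None:
        start, end = run
        if end == len(digits):   # carry out of the top bit; not hit when the MSB is 0
            digits.append(0)
        digits[end] += 1         # +2**end (the leading 1)
        for p in range(start + 1, end):
            digits[p] = 0        # middle digits become 0
        digits[start] = -1       # -2**start (the trailing -1)
        stages.append(render(digits))
    return stages


def value(digits: list[int]) -> int:
    """Numeric value of LSB-first signed digits."""
    return sum(d << i for i, d in enumerate(digits))
```

Usage on the example multiplier:

```python
stages = recode_stages("001010101101110")
initial_code = stages[-1]            # "010(-1)0(-1)0(-1)00(-1)00(-1)0"
intermediate_code = "0" + initial_code
```

The procedure resembles canonical signed-digit (CSD) recoding: the recoded example has 6 nonzero digits against 8 ones in the binary multiplier, so fewer partial products need to be generated.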
In addition, if the data received by the data processor is 2N bits wide and the data processor is currently performing an N-bit data operation, the first modified coding sub-circuit 111 can split the 2N-bit data into two groups of N-bit data and process each group separately; the two resulting (N+1)-bit intermediate codes can then be combined and used as the target code. If the data processor is currently performing a 2N-bit data operation, the first modified coding sub-circuit 111 can append one bit value 0 one position above the highest digit of the (2N+1)-bit intermediate code and use the resulting (2N+2)-bit data as the target code. In this embodiment, the data processor may thus apply one padding step to the initial code and, in the 2N-bit case, a further padding step to the intermediate code.
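Both branches produce a target code of the same width, since 2(N+1) = 2N+2; one plausible reading is that the padding keeps the downstream datapath width fixed across modes. A minimal Python sketch follows, assuming codes are MSB-first lists of signed digits and that the code of the upper N-bit group is placed in the upper digits; `target_code` and `digit_value` are hypothetical names.

```python
Code = list[int]  # MSB-first signed digits, each in {-1, 0, 1}


def digit_value(code: Code) -> int:
    """Numeric value of an MSB-first signed-digit code."""
    total = 0
    for d in code:
        total = 2 * total + d
    return total


def target_code(intermediates: list[Code], n: int, two_n_mode: bool) -> Code:
    """Assemble the target code from intermediate code(s).

    N-bit mode:  two (N+1)-digit intermediate codes are concatenated
                 (upper group first: an assumption).
    2N-bit mode: one (2N+1)-digit intermediate code gets a 0 on top.
    Both cases give 2N+2 digits.
    """
    if two_n_mode:
        (inter,) = intermediates
        if len(inter) != 2 * n + 1:
            raise ValueError("expected a (2N+1)-digit intermediate code")
        return [0] + inter
    high, low = intermediates
    if len(high) != n + 1 or len(low) != n + 1:
        raise ValueError("expected two (N+1)-digit intermediate codes")
    return high + low
```

Padding with a leading 0 leaves the numeric value unchanged, so the 2N-bit target code still encodes the same multiplier.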
Optionally, the first multiplication circuit 11 and the second multiplication circuit 12 each include a first input terminal for receiving a function selection mode signal, and the partial product swap circuit 13 includes a third input terminal for receiving the function selection mode signal. Optionally, the function selection mode signal indicates which mode of data operation the data processor is currently performing.
In this embodiment, each data processing unit included in the first multiplication circuit 11 and each data processing unit included in the second multiplication circuit 12 can receive the function selection mode signal. In addition, when the data processor performs a given data operation, the function selection mode signals received by the first multiplication circuit 11, the second multiplication circuit 12, and the partial product swap circuit 13 may be equal. Optionally, the function selection mode signal may take four different values, corresponding to four modes of data operation that the data processor can perform: N-bit × N-bit multiplication, N-bit × N-bit multiply-accumulate, 2N-bit × 2N-bit multiplication, and 2N-bit × N-bit multiply-accumulate. For example, if the first data includes two 2N-bit sub-data and the second data includes two 2N-bit sub-data, the data processor determines from the received function selection mode signal which specific mode of data operation it is currently performing. The four function selection mode signals may be represented by the binary values 00, 01, 10 and 11, or in another representation: mode=00 may indicate N-bit × N-bit multiplication, mode=01 N-bit × N-bit multiply-accumulate, mode=10 2N-bit × 2N-bit multiplication, and mode=11 2N-bit × N-bit multiply-accumulate. Any one-to-one correspondence between the four function selection mode signals and the four modes of data operation may be used, which is not limited in this embodiment.
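The example mode encoding can be written as a small lookup table. This is a minimal Python sketch of the 00/01/10/11 assignment given above; the names `Op`, `MODE_TABLE`, `decode_mode` and `is_accumulate` are hypothetical, and any other one-to-one table would be equally valid.

```python
from enum import Enum


class Op(Enum):
    """The four operation modes listed in the text."""
    MUL_N_X_N = "N-bit x N-bit multiplication"
    MAC_N_X_N = "N-bit x N-bit multiply-accumulate"
    MUL_2N_X_2N = "2N-bit x 2N-bit multiplication"
    MAC_2N_X_N = "2N-bit x N-bit multiply-accumulate"


# The example encoding from the text; any one-to-one mapping would also be valid.
MODE_TABLE = {
    0b00: Op.MUL_N_X_N,
    0b01: Op.MAC_N_X_N,
    0b10: Op.MUL_2N_X_2N,
    0b11: Op.MAC_2N_X_N,
}


def decode_mode(mode: int) -> Op:
    """Map a 2-bit function selection mode signal to an operation mode."""
    try:
        return MODE_TABLE[mode]
    except KeyError:
        raise ValueError(f"mode must be 0b00..0b11, got {mode!r}") from None


def is_accumulate(op: Op) -> bool:
    """True for the two multiply-accumulate modes."""
    return op in (Op.MAC_N_X_N, Op.MAC_2N_X_N)
```

Because every circuit receives the same signal for a given operation, a single decode like this could drive all of them.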
In addition, when the data processor performs a 2N-bit × N-bit multiply-accumulate operation, the partial product swap circuit 13 may, according to actual requirements, exchange the sign bit extended first lower partial product or sign bit extended first upper partial product obtained by the first modified coding sub-circuit 111 with the sign bit extended second lower partial product or sign bit extended second upper partial product obtained by the second modified coding sub-circuit 121. When the data processor performs an operation in any of the other three modes, the partial product swap circuit 13 is in a floating state and no sign bit extended lower or upper partial products are exchanged. Suppose the two sub-data in the first data are both 2N bits wide and the two sub-data in the second data are both 2N bits wide. If the data processor is currently performing one N-bit × N-bit multiplication, then, according to actual requirements, one of the first data and the second data is 0, and in the other, either the high-order parts of both sub-data are 0 or the low-order parts of both sub-data are 0; the first data and the second data can then be operated on as received. If the data processor is currently performing one 2N-bit × 2N-bit multiplication, one of the first data and the second data is 0, and in the other, both the high-order and low-order parts of the two sub-data are nonzero. If the data processor is currently performing two 2N-bit × 2N-bit multiplications, neither the first data nor the second data is 0.
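The swap rule can be modeled as a conditional exchange of one group of partial products. This is a minimal Python sketch, assuming the sign bit extended partial products are held as integer tuples split into lower and upper groups. The class `PartialProducts`, the function `swap_partial_products`, and the default of exchanging the first upper group with the second lower group are hypothetical, because the text leaves the choice of group to actual requirements.

```python
from dataclasses import dataclass, replace

GROUPS = ("lower", "upper")


@dataclass(frozen=True)
class PartialProducts:
    """Hypothetical container for sign bit extended partial products."""
    lower: tuple[int, ...]
    upper: tuple[int, ...]


def swap_partial_products(first, second, mode_is_2n_by_n_mac,
                          first_group="upper", second_group="lower"):
    """Model of swap circuit 13: exchange one group only in 2N x N MAC mode."""
    if first_group not in GROUPS or second_group not in GROUPS:
        raise ValueError("group must be 'lower' or 'upper'")
    if not mode_is_2n_by_n_mac:
        return first, second  # other three modes: circuit floating, pass through
    a = getattr(first, first_group)
    b = getattr(second, second_group)
    return (replace(first, **{first_group: b}),
            replace(second, **{second_group: a}))
```

Keeping the containers immutable makes it explicit that the swap produces new partial-product sets for the two compression sub-circuits rather than editing them in place.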
In the data processor provided in this embodiment, the first modified coding sub-circuit and the second modified coding sub-circuit each perform regular signed number encoding on the received data to obtain the sign bit extended first partial product and the sign bit extended second partial product, and the partial product swap circuit determines, according to the received function selection mode signal, whether these partial products need to be exchanged. After any required exchange, the first and second modified coding sub-circuits each use their current sign bit extended partial products as the partial products of the target code, yielding the first partial product of the target code and the second partial product of the target code; finally, the first and second modified compression sub-circuits compress these partial products to obtain the target operation results. The data processor can therefore perform both multiplication and multiply-accumulate operations, which improves its versatility. It completes a multiply-accumulate operation in a single pass, without a separate accumulation of the multiplication result, which reduces its power consumption. In addition, because regular signed number encoding yields fewer effective partial products, the complexity of implementing multiplication or multiply-accumulate in the data processor is reduced.
As shown in fig. 2, which is a schematic structural diagram of a data processor according to another embodiment, the data processor includes a regular signed number encoding circuit 21, a first partial product obtaining circuit 22, a second partial product obtaining circuit 23, a first compressing circuit 24, and a second compressing circuit 25. The regular signed number encoding circuit 21 includes a regular signed number encoding processing unit 211, whose output end is connected both to a first input end of the first partial product obtaining circuit 22 and to a first input end of the second partial product obtaining circuit 23. The output end of the first partial product obtaining circuit 22 is connected to a first input end of the first compressing circuit 24, and the output end of the second partial product obtaining circuit 23 is connected to a first input end of the second compressing circuit 25.
The regular signed number coding processing unit 211 is configured to perform a regular signed number coding process on received first data to obtain a target code, the first partial product obtaining circuit 22 is configured to receive second data and obtain a first partial product of the target code according to the target code and the second data, the second partial product obtaining circuit 23 is configured to receive the second data and obtain a second partial product of the target code according to the target code and the second data, the first compressing circuit 24 is configured to perform an accumulation process on the first partial product of the target code, and the second compressing circuit 25 is configured to perform an accumulation process on the second partial product of the target code.
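The encode, partial-product, and compress flow of fig. 2 can be sketched end to end. This is a minimal Python sketch under stated assumptions: it substitutes the standard non-adjacent-form (NAF) recoding for the stage-by-stage rule (for the example multiplier 5486 = "001010101101110" both give the digits "010(-1)0(-1)0(-1)00(-1)00(-1)0"), and it models compression as a generic 3:2 carry-save reduction on unbounded Python integers rather than the fixed-width compressor tree of the circuits. All function names are hypothetical.

```python
def naf(k: int) -> list[int]:
    """Non-adjacent-form digits of k, LSB first (stand-in for the encoder)."""
    digits = []
    while k != 0:
        d = 2 - (k % 4) if k & 1 else 0  # +1 or -1 for odd k, else 0
        digits.append(d)
        k = (k - d) // 2
    return digits


def partial_products(code: list[int], multiplicand: int) -> list[int]:
    """One shifted partial product per nonzero signed digit."""
    return [(d * multiplicand) << i for i, d in enumerate(code) if d != 0]


def csa(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 carry-save compressor: a + b + c == sum_bits + carry."""
    sum_bits = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry


def compress(terms: list[int]) -> int:
    """Reduce terms with 3:2 compressors, then do one final addition."""
    terms = list(terms)
    while len(terms) > 2:
        terms.extend(csa(terms.pop(), terms.pop(), terms.pop()))
    return sum(terms)
```

Feeding the partial products of two products into one reduction returns their sum directly, which illustrates how a multiply-accumulate can finish without a separate accumulation pass:

```python
a, c, b, d = 5486, 37, 1234, -19
terms = partial_products(naf(a), c) + partial_products(naf(b), d)
result = compress(terms)  # equals a * c + b * d
```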
Specifically, the first data and the second data may each include two sub-data. The two sub-data in the first data can be used as multipliers in a multiplication or multiply-accumulate operation, and the two sub-data in the second data can be used as multiplicands. Optionally, the bit width of each sub-data may be 2N. In addition, the two sub-data in the first data may be spliced and input to the regular signed number encoding processing unit 211 as a whole, or input to it separately and simultaneously; the two sub-data in the second data may be spliced and input as a whole to the first partial product obtaining circuit 22 and the second partial product obtaining circuit 23, or input separately and simultaneously to each of these two circuits. Optionally, regular signed number encoding of the two sub-data in the first data yields a first target code and a second target code, respectively, which are collectively referred to as target codes. Optionally, the bit width of the first target code may be equal to the bit width of the second target code, and may also be equal to the bit width of the multiplier currently processed by the data processor plus 1; the number of first partial products of the target code may be equal to the bit width of the first target code, and the number of second partial products of the target code may be equal to the bit width of the second target code. Optionally, the first target code may include a first lower target code and a first upper target code, and the second target code may include a second lower target code and a second upper target code.
For example, suppose the first data includes data A and data B, the second data includes data C and data D, and the data processor needs to multiply data A by data C and data B by data D. The regular signed number encoding processing unit 211 can encode data A to obtain a first target code and encode data B to obtain a second target code. The first target code (and/or the second target code) together with data C (or the second data) can then be input to the first partial product obtaining circuit 22, and the second target code (and/or the first target code) together with data D (or the second data) to the second partial product obtaining circuit 23; alternatively, the first target code (and/or the second target code) and data C (or the second data) can be input to the second partial product obtaining circuit 23, and the second target code (and/or the first target code) and data D (or the second data) to the first partial product obtaining circuit 22. Meanwhile, if the first partial product obtaining circuit 22 and the second partial product obtaining circuit 23 receive second data formed by splicing two sub-data, each of them can split the second data (i.e., the multiplicands) to obtain the sub-data it needs to multiply, and obtain partial products from that sub-data and the first or second target code according to actual requirements. Here, actual requirements can be understood as the correspondence between each multiplicand the data processor currently needs to process and its corresponding target code.
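The routing in this example can be sketched as follows. This is a minimal Python sketch, assuming the spliced second data holds data C in its upper 2N bits and data D in its lower 2N bits as two's-complement values, and that target codes are MSB-first signed-digit lists; `split_second_data`, `route`, `evaluate` and the `crossed` flag (selecting the alternative wiring) are hypothetical.

```python
def split_second_data(word: int, n: int) -> tuple[int, int]:
    """Split a spliced 4N-bit word into two signed 2N-bit sub-data (C, D).

    Assumption: C sits in the upper half, D in the lower half.
    """
    width = 2 * n
    mask = (1 << width) - 1

    def to_signed(v: int) -> int:
        return v - (1 << width) if (v >> (width - 1)) & 1 else v

    return to_signed((word >> width) & mask), to_signed(word & mask)


def route(code_a, code_b, second_word, n, crossed=False):
    """Pair each target code with its multiplicand for circuits 22 and 23."""
    c, d = split_second_data(second_word, n)
    pair_ac, pair_bd = (code_a, c), (code_b, d)
    return (pair_bd, pair_ac) if crossed else (pair_ac, pair_bd)


def evaluate(pair) -> int:
    """Value one circuit's partial products sum to: code value x multiplicand."""
    code, multiplicand = pair
    value = 0
    for digit in code:  # MSB-first signed digits
        value = 2 * value + digit
    return value * multiplicand
```

Whichever wiring is chosen, each circuit still computes one of the two required products; only their assignment to circuits 22 and 23 changes.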
In addition, if the bit width of the first target code is 2N, the first upper target code may be the upper N bits of the first target code, and the first lower target code may be the lower N bits of the first target code.
In the data processor, the first partial product obtaining circuit 22 may receive the first target code and the multiplicand input by the regular signed number coding processing unit 211, and obtain a first partial product of the target code; the second partial product obtaining circuit 23 may receive the second target code and the multiplicand input from the regular signed number code processing unit 211, and obtain the second partial product of the target code. Optionally, the first partial product of the target code may include a first lower partial product of the target code and a first upper partial product of the target code; the second partial product of the target code may include a second lower bit partial product of the target code and a second upper bit partial product of the target code. Optionally, the first lower partial product of the target code may be a partial product corresponding to the first lower target code, and the first upper partial product of the target code may be a partial product corresponding to the first upper target code; the second lower partial product of the target code may be a partial product corresponding to the second lower target code and the second upper partial product of the target code may be a partial product corresponding to the second upper target code.
Further, the first compressing circuit 24 in the data processor may accumulate the first partial products of the target code (i.e., the first lower partial product and the first upper partial product of the target code) obtained by the first partial product obtaining circuit 22, and the second compressing circuit 25 may accumulate the second partial products of the target code (i.e., the second lower partial product and the second upper partial product of the target code) obtained by the second partial product obtaining circuit 23, thereby obtaining the target operation result. In addition, in this embodiment, the sub-data included in the first data and the second data received by the data processor are all 2N bits wide.
Optionally, the regular signed number encoding processing unit 211 includes a first input end for receiving a function selection mode signal; the first partial product obtaining circuit 22 and the second partial product obtaining circuit 23 each include a second input end for receiving the function selection mode signal; and the first compression circuit 24 and the second compression circuit 25 each include a second input end for receiving the function selection mode signal. Optionally, the function selection mode signal indicates which mode of data operation the data processor is currently performing.
It will be appreciated that the function selection mode signal (mode) may take four different values, corresponding to the four modes of data operation that the data processor can perform. Optionally, when a given data operation is performed, the function selection mode signals (modes) received by the regular signed number encoding processing unit 211, the first partial product obtaining circuit 22, the second partial product obtaining circuit 23, the first compressing circuit 24 and the second compressing circuit 25 may all be equal. The four signals may be represented in binary as mode=00, mode=01, mode=10 and mode=11, and the four modes of data operation may include N-bit × N-bit multiplication, N-bit × N-bit multiply-accumulate, 2N-bit × 2N-bit multiplication, and 2N-bit × N-bit multiply-accumulate. According to the received function selection mode signal, the first partial product obtaining circuit 22 and the second partial product obtaining circuit 23 may each control whether to receive the first target code or the second target code from the regular signed number encoding processing unit 211, or to perform subsequent operations on both the first target code and the second target code.
In this embodiment, the regular signed number encoding processing unit 211 may receive the multiplier of the operation and perform regular signed number encoding on it to obtain the target code. The regular signed number encoding can be characterized as follows. For an N-bit multiplier, the bits are processed from low order to high order. If there is a run of l (l >= 2) consecutive bits with value 1, the run is converted into the (l+1)-digit string "1 (0)^(l-1) (-1)", that is, a 1 followed by l-1 zeros and then -1, and the converted (l+1) digits are combined with the remaining (N-l) bit values to obtain new data. The new data then serves as the input to the next stage of conversion, and the process repeats until the new data contains no run of l (l >= 2) consecutive 1s. Performing regular signed number encoding on an N-bit multiplier yields a target code whose bit width may be N+1. For instance, 11 can be rewritten as 100 - 001, i.e., equivalently as 10(-1); 111 can be rewritten as 1000 - 0001, i.e., equivalently as 100(-1); longer runs of l (l >= 2) consecutive 1s are converted in the same way.
For example, suppose the multiplier received by the regular signed number encoding processing unit 211 is "001010101101110". The first-stage conversion yields the first new data "0010101011100(-1)0"; the second stage yields "0010101100(-1)00(-1)0"; the third stage yields "0010110(-1)00(-1)00(-1)0"; the fourth stage yields "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth stage yields "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l >= 2) consecutive 1s, so it may be referred to as the initial code; padding the initial code with one bit then yields an intermediate code, which completes the regular signed number encoding. The bit width of the initial code may be equal to the bit width of the multiplier. Optionally, if the highest and second-highest digits of the new data (i.e., the initial code) obtained by the regular signed number encoding processing unit 211 are "10" or "01", the unit 211 may append a 0 one position above the highest digit, so that the top three digits of the corresponding intermediate code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently processed by the data processor plus 1.
In addition, if the data received by the data processor is 2N bits wide and the data processor is currently performing an N-bit data operation, the regular signed number encoding processing unit 211 can split the 2N-bit data into two groups of N-bit data and process each group separately; the two resulting (N+1)-bit intermediate codes can then be combined and used as the target code. If the data processor is currently performing a 2N-bit data operation, the regular signed number encoding processing unit 211 can append one bit value 0 one position above the highest digit of the (2N+1)-bit intermediate code (i.e., padding) and use the resulting (2N+2)-bit data as the target code.
In the data processor provided in this embodiment, the regular signed number encoding processing unit performs regular signed number encoding on the received first data to obtain the target code; the first partial product obtaining circuit obtains the corresponding first partial products of the target code from the received second data and the target code; the second partial product obtaining circuit likewise obtains the corresponding second partial products of the target code; and the first and second compression circuits accumulate these partial products to obtain the target operation result. Because regular signed number encoding yields fewer effective partial products, the complexity of implementing multiplication or multiply-accumulate in the data processor is reduced. The data processor can perform both multiplication and multiply-accumulate operations, which improves its versatility. In addition, it completes a multiply-accumulate operation in a single pass, without a separate accumulation of the multiplication result, which reduces its power consumption.
Fig. 3 is a schematic diagram of a detailed structure of a data processor according to another embodiment, in which a first modified encoding sub-circuit 111 in the data processor includes: a first modified coding processing branch 1111 and a first partial product selection branch 1112, wherein an output terminal of the first modified coding processing branch 1111 is connected to an input terminal of the first partial product selection branch 1112;
the first modified coding processing branch 1111 is configured to perform regular signed number encoding on the received first data to obtain the first target code, and the first partial product selection branch 1112 is configured to select among the sign-bit-extended first partial products obtained according to the first target code, receive the sign-bit-extended second partial products output by the partial product exchange circuit 13, and use the received sign-bit-extended second partial products together with the selected sign-bit-extended first partial products as the first partial product of the target code.
Specifically, the first modified coding sub-circuit 111 may perform regular signed number encoding on the multiplier in the received first data to obtain the first target code, and obtain the sign-bit-extended first partial products from the multiplicand in the first data and the first target code. Optionally, the bit width of the first target code may be equal to the bit width of the multiplier plus 1, and the bit width of each sign-bit-extended first partial product may be equal to twice the bit width of the multiplicand currently processed by the data processor. Optionally, the number of sign-bit-extended first partial products may be equal to the number of first partial products of the target code, and may also be equal to the bit width of the first target code.
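A sketch of these width and count relationships, assuming a signed multiplicand and multiplier of equal width W: the target code has W + 1 digits, one sign-bit-extended partial product is produced per digit, and each is 2W bits wide. Masking a negative Python integer to 2W bits yields exactly its sign-extended two's complement pattern. The helper names are illustrative only.

```python
def csd_digits(v: int, count: int) -> list[int]:
    """CSD (non-adjacent form) digits of v, least significant first, padded to `count`."""
    out = []
    while v:
        d = 2 - (v & 3) if v & 1 else 0
        out.append(d)
        v = (v - d) >> 1
    return out + [0] * (count - len(out))


def sign_extended_partial_products(multiplicand: int, multiplier: int, width: int) -> list[int]:
    """One 2*width-bit pattern per target-code digit (width + 1 patterns in total).

    Each pattern is d_i * multiplicand in two's complement, sign-extended to twice
    the multiplicand width by masking; the shift by i is left to accumulate().
    """
    mask = (1 << (2 * width)) - 1
    return [(d * multiplicand) & mask for d in csd_digits(multiplier, width + 1)]


def accumulate(patterns: list[int], width: int) -> int:
    """Add the patterns at their digit weights and keep 2*width result bits."""
    mask = (1 << (2 * width)) - 1
    return sum(p << i for i, p in enumerate(patterns)) & mask
```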
Illustratively, suppose the data processor receives two pieces of data, each 16 bits wide. If the data processor can currently process 8-bit by 8-bit multiplication, the first modified encoding sub-circuit 111 may divide each 16-bit datum into two groups, the upper 8 bits and the lower 8 bits, and process them separately. In this case, each sign-bit-extended first partial product is 16 bits wide; processing the upper 8 bits yields 9 sign-bit-extended first upper partial products, and processing the lower 8 bits yields 9 sign-bit-extended first lower partial products. If the data processor can currently process 16-bit by 16-bit multiplication, the first modified coding sub-circuit 111 may process the two complete 16-bit data. In this case, each sign-bit-extended first partial product is 32 bits wide and 18 of them are obtained: the partial products corresponding to the upper 9 digits of the first target code are called the sign-bit-extended first upper partial products, and those corresponding to the lower 9 digits are called the sign-bit-extended first lower partial products.
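The 16-bit example can be checked numerically. The sketch below, with hypothetical names and assuming signed two's complement data, computes the two 8-bit by 8-bit products of the halves from 9 partial products each (16-bit results), and the full 16-bit by 16-bit product from 18 partial products (17 CSD digits plus the padded 0, 32-bit result).

```python
def to_signed(pattern: int, bits: int) -> int:
    """Interpret the low `bits` bits of `pattern` as a two's complement integer."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def csd_digits(v: int, count: int) -> list[int]:
    """CSD (non-adjacent form) digits of v, least significant first, padded to `count`."""
    out = []
    while v:
        d = 2 - (v & 3) if v & 1 else 0
        out.append(d)
        v = (v - d) >> 1
    return out + [0] * (count - len(out))


def accumulate(multiplicand: int, digits: list[int], out_bits: int) -> int:
    """Sum of the partial products d_i * multiplicand * 2**i, kept to out_bits bits."""
    return sum((d * multiplicand) << i for i, d in enumerate(digits)) & ((1 << out_bits) - 1)


def mode_8x8(x: int, y: int) -> list[tuple[int, int]]:
    """8-bit mode on 16-bit data: (partial product count, 16-bit product), upper half first."""
    results = []
    for shift in (8, 0):
        digits = csd_digits(to_signed(y >> shift, 8), 9)      # 9-digit target code per half
        results.append((len(digits), accumulate(to_signed(x >> shift, 8), digits, 16)))
    return results


def mode_16x16(x: int, y: int) -> tuple[int, int]:
    """16-bit mode: (partial product count, 32-bit product)."""
    digits = csd_digits(to_signed(y, 16), 17) + [0]           # 17 digits + padded 0 = 18
    return len(digits), accumulate(to_signed(x, 16), digits, 32)
```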
Optionally, the second modified encoding sub-circuit 121 includes a second modified coding processing branch 1211 and a second partial product selecting branch 1212, where an output terminal of the second modified coding processing branch 1211 is connected to an input terminal of the second partial product selecting branch 1212. The second modified coding processing branch 1211 is configured to perform regular signed number encoding on the received second data to obtain the second target code, and the second partial product selecting branch 1212 is configured to select among the sign-bit-extended second partial products obtained according to the second target code, receive the sign-bit-extended first partial products output by the partial product exchanging circuit 13, and use the received sign-bit-extended first partial products together with the selected sign-bit-extended second partial products as the second partial product of the target code.
It should be noted that when the data processor performs a multiply-accumulate operation on 2N-bit data in N-bit units, the partial product exchanging circuit 13 may, according to actual requirements, exchange the sign-bit-extended first lower partial products or first upper partial products obtained by the first modified coding processing branch 1111 with the sign-bit-extended second lower partial products or second upper partial products obtained by the second modified encoding sub-circuit 121. Optionally, after the exchange, the first modified coding processing branch 1111 may combine its non-exchanged sign-bit-extended first partial products with the received sign-bit-extended second partial products and use them as the first partial product of the target code; likewise, the second modified coding processing branch 1211 may combine its non-exchanged sign-bit-extended second partial products with the received sign-bit-extended first partial products and use them as the second partial product of the target code.
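The text leaves the choice of which partial products to exchange to "actual requirements". As one hypothetical choice, the sketch below swaps branch 1's upper partial products with branch 2's lower ones, so that one compressor receives the lower-half partial products of both data and the other receives the upper-half ones; each side then needs only one accumulation to produce a multiply-accumulate result. The pairing, the function names and the signed N-bit halves are assumptions for illustration.

```python
def to_signed(pattern: int, bits: int) -> int:
    """Interpret the low `bits` bits of `pattern` as a two's complement integer."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def csd_digits(v: int, count: int) -> list[int]:
    """CSD (non-adjacent form) digits of v, least significant first, padded to `count`."""
    out = []
    while v:
        d = 2 - (v & 3) if v & 1 else 0
        out.append(d)
        v = (v - d) >> 1
    return out + [0] * (count - len(out))


def branch_partial_products(multiplicand: int, multiplier: int, n: int) -> dict:
    """Lower/upper partial products of one branch in N-bit mode (2N-bit inputs split in halves)."""
    out = {}
    for name, shift in (("lower", 0), ("upper", n)):
        a = to_signed(multiplicand >> shift, n)
        b = to_signed(multiplier >> shift, n)
        out[name] = [(d * a) << i for i, d in enumerate(csd_digits(b, n + 1))]
    return out


def exchange(branch1: dict, branch2: dict) -> tuple[list[int], list[int]]:
    """Hypothetical exchange: branch 1's upper products go to side 2 and
    branch 2's lower products go to side 1, so each side holds same-half products."""
    first = branch1["lower"] + branch2["lower"]    # kept + received
    second = branch2["upper"] + branch1["upper"]   # kept + received
    return first, second


def dual_mac(x1: int, y1: int, x2: int, y2: int, n: int) -> tuple[int, int]:
    """One accumulation per side: (sum of lower-half products, sum of upper-half products)."""
    first, second = exchange(branch_partial_products(x1, y1, n),
                             branch_partial_products(x2, y2, n))
    return sum(first), sum(second)
```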
The second modified encoding processing branch 1211 processes data in essentially the same way as the first modified encoding processing branch 1111, so its method is not described in detail in this embodiment.
In the data processor provided by this embodiment, the first modified coding processing branch performs regular signed number encoding on the received first data and obtains sign-bit-extended first partial products; the first partial product selection branch selects among these according to the data mode currently processed by the data processor to obtain the first partial product of the target code; and the first modified compression sub-circuit accumulates the first partial product of the target code to obtain the target operation result. The data processor completes a multiply-accumulate or multiplication operation in a single pass, without a second accumulation over the multiplication result, which reduces its power consumption. Meanwhile, because regular signed number encoding yields fewer effective partial products, the complexity of implementing multiplication or multiply-accumulate operations in the data processor is reduced.
As an embodiment, the first modified encoding processing branch 1111 in the data processor includes a first modified coding unit 1111a, a low-bit partial product obtaining unit 1111b, a low-bit selector set unit 1111c, a high-bit partial product obtaining unit 1111d, and a high-bit selector set unit 1111e. A first output terminal of the first modified coding unit 1111a is connected to a first input terminal of the low-bit partial product obtaining unit 1111b; an output terminal of the low-bit selector set unit 1111c is connected to a second input terminal of the low-bit partial product obtaining unit 1111b; a second output terminal of the first modified coding unit 1111a is connected to a first input terminal of the high-bit partial product obtaining unit 1111d; and an output terminal of the high-bit selector set unit 1111e is connected to a second input terminal of the high-bit partial product obtaining unit 1111d.
The first modified encoding unit 1111a is configured to perform regular signed number encoding on the received first data, determine the data bit width that the data processor can process according to the received function selection mode signal, and obtain the first target code according to that bit width. The low-bit partial product obtaining unit 1111b is configured to obtain the sign-bit-extended first lower partial product from the first lower target code in the received first target code and the first data; the low-bit selector set unit 1111c is configured to gate values in the sign-bit-extended first lower partial product; the high-bit partial product obtaining unit 1111d is configured to obtain the sign-bit-extended first upper partial product from the first upper target code in the received first target code and the first data; and the high-bit selector set unit 1111e is configured to gate values in the sign-bit-extended first upper partial product.
Specifically, the first modified coding processing branch 1111 may receive the multiplier in the first data and perform regular signed number encoding on it to obtain the first target code. The low-bit partial product obtaining unit 1111b may obtain sign-bit-extended low-bit partial products from the multiplicand in the received first data and the first target code obtained by the first modified coding unit 1111a, and the high-bit partial product obtaining unit 1111d may likewise obtain sign-bit-extended high-bit partial products from the multiplicand in the received first data and the first target code obtained by the first modified coding unit 1111a. The first data may include the multiplier and the multiplicand of a multiplication or multiply-accumulate operation. If the data processor can currently process N-bit data and the two data received by the first modified encoding unit 1111a are 2N bits wide, the first modified coding unit 1111a may automatically split the received 2N-bit data into upper N-bit data and lower N-bit data, and then perform regular signed number encoding on each. The bit width of the resulting first upper target code is N + 1, and the bit width of the first lower target code is also N + 1; accordingly, the number of first upper partial products and the number of first lower partial products corresponding to the target code may each be N + 1. If the data processor can currently process 2N-bit data and the two data received by the first modified coding processing branch 1111 are 2N bits wide, the first modified coding processing branch 1111 may perform regular signed number encoding on the received 2N-bit data to obtain a (2N+1)-bit intermediate code, apply complement processing to the intermediate code to obtain (2N+2)-bit data, and use this (2N+2)-bit data as the first target code, where complement processing means padding a value 0 one position above the most significant bit of the data. In this case, the upper (N+1) digits of the first target code may be referred to as the first upper target code, and the lower (N+1) digits as the first lower target code. Optionally, the most significant digit of the first target code is the value 0 introduced by complement processing, so every value in the corresponding partial product of the target code may be 0.
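The mode-dependent construction of the first lower and first upper target codes can be sketched as follows, assuming signed two's complement data and a hypothetical `mode_bits` parameter standing in for the function selection mode signal; the function names are illustrative, not the patent's. In 2N-bit mode the padded top digit is always 0, so the partial product it selects is 0.

```python
def to_signed(pattern: int, bits: int) -> int:
    """Interpret the low `bits` bits of `pattern` as a two's complement integer."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def csd_digits(v: int, count: int) -> list[int]:
    """CSD (non-adjacent form) digits of v, least significant first, padded to `count`."""
    out = []
    while v:
        d = 2 - (v & 3) if v & 1 else 0
        out.append(d)
        v = (v - d) >> 1
    return out + [0] * (count - len(out))


def first_target_code(multiplier: int, n: int, mode_bits: int) -> tuple[list[int], list[int]]:
    """Return (first lower target code, first upper target code), N + 1 digits each.

    multiplier is a 2N-bit pattern; mode_bits == n selects N-bit mode and
    mode_bits == 2 * n selects 2N-bit mode.
    """
    if mode_bits == n:
        # N-bit mode: encode each N-bit half separately.
        lower = csd_digits(to_signed(multiplier, n), n + 1)
        upper = csd_digits(to_signed(multiplier >> n, n), n + 1)
    elif mode_bits == 2 * n:
        # 2N-bit mode: (2N + 1)-digit intermediate code plus one padded 0 on top.
        code = csd_digits(to_signed(multiplier, 2 * n), 2 * n + 1) + [0]
        lower, upper = code[: n + 1], code[n + 1 :]
    else:
        raise ValueError("unsupported function selection mode")
    return lower, upper
```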
It should be noted that, according to the received function selection mode signal, the low-bit selector set unit 1111c may gate certain bit values of the sign-bit-extended first lower partial product to either the value obtained in an N-bit multiplication or the value obtained in a 2N-bit multiplication. Similarly, according to the received function selection mode signal, the high-bit selector set unit 1111e may gate certain bit values of the sign-bit-extended first upper partial product to either the value obtained in an N-bit multiplication or the value obtained in a 2N-bit multiplication.
It will be appreciated that if the data processor receives 2N-bit data and can currently process 2N-bit data operations, the low-bit partial product obtaining unit 1111b may obtain the corresponding sign-bit-extended low-bit partial product for each digit of the first lower target code; the low-bit selector set unit 1111c may gate values in the sign-bit-extended first lower partial product; and the gated values are then combined with the sign-bit-extended low-bit partial product to obtain the sign-bit-extended first lower partial product. Optionally, the high-bit partial product obtaining unit 1111d may obtain the corresponding sign-bit-extended high-bit partial product for each digit of the first upper target code; the high-bit selector set unit 1111e may gate values in the sign-bit-extended first upper partial product; and the gated values are then combined with the sign-bit-extended high-bit partial product to obtain the sign-bit-extended first upper partial product. Optionally, in regular signed number encoding, the bit width of the first lower target code may be equal to the bit width of the first upper target code, and may also be equal to the number of sign-bit-extended first lower partial products corresponding to the lower N-bit data, or to the number of sign-bit-extended first upper partial products corresponding to the upper N-bit data.
Optionally, the first modified encoding processing branch 1111 may include (N+1) low-bit partial product obtaining units 1111b and (N+1) high-bit partial product obtaining units 1111d. Optionally, each low-bit partial product obtaining unit 1111b may include 4N value generation subunits, as may each high-bit partial product obtaining unit 1111d, and each value generation subunit produces one bit of the corresponding sign-bit-extended partial product. Meanwhile, the low-bit partial product obtaining unit 1111b may determine the first lower partial product of the target code from the sign-bit-extended first lower partial products it obtains, and the high-bit partial product obtaining unit 1111d may determine the first upper partial product of the target code from the sign-bit-extended first upper partial products it obtains.
In addition, the second modified coding processing branch 1211 implements regular signed number encoding in the same way as the first modified coding processing branch 1111, and the two branches have the same internal structure and external port functions; therefore, the method and structure by which the second modified coding processing branch 1211 processes data are not described in detail in this embodiment.
In the data processor provided in this embodiment, the first modified coding unit in the first modified coding processing branch performs regular signed number encoding on the received data to obtain a first lower target code and a first upper target code; the low-bit partial product obtaining unit obtains sign-bit-extended low-bit partial products from the first lower target code, and the high-bit partial product obtaining unit obtains sign-bit-extended high-bit partial products from the first upper target code. The data processor then determines whether these low-bit and high-bit partial products need to be exchanged to form the partial product of the target code, and accumulates the partial product of the target code to obtain the target operation result. The data processor can therefore perform both multiplication and multiply-accumulate operations, which improves its versatility; meanwhile, because regular signed number encoding yields fewer effective partial products, the complexity of implementing multiplication or multiply-accumulate operations is reduced.
As one embodiment, the first modified encoding unit 1111a in the data processor includes a first data input port 1111aa, a first mode selection signal input port 1111ab, a lower target code output port 1111ac, and an upper target code output port 1111ad. The first data input port 1111aa is configured to receive the first data; the first mode selection signal input port 1111ab is configured to receive the function selection mode signal; the lower target code output port 1111ac is configured to output the first lower target code obtained by regular signed number encoding of the first data; and the upper target code output port 1111ad is configured to output the first upper target code obtained by regular signed number encoding of the first data.
Specifically, during a multiplication operation, the first modified coding unit 1111a may receive the first data through the first data input port 1111aa and the function selection mode signal through the first mode selection signal input port 1111ab, perform regular signed number encoding on the multiplier in the first data to obtain an intermediate code, determine from the received function selection mode signal whether complement processing needs to be applied to the intermediate code, and thereby obtain the first target code. It then outputs the first lower target code of the first target code through the lower target code output port 1111ac and the first upper target code through the upper target code output port 1111ad.
According to the data processor provided by the embodiment, the data processor can perform regular signed number encoding processing on received data to reduce the number of effective partial products acquired in a multiplication process, so that the complexity of the data processor in realizing multiplication is reduced, the operation efficiency of the multiplication is improved, and the power consumption of the data processor is effectively reduced.
In one embodiment, the low-bit partial product obtaining unit 1111b in the data processor includes a lower target code input port 1111ba, a gated value input port 1111bb, a first data input port 1111bc, and a lower partial product output port 1111bd. The lower target code input port 1111ba is configured to receive the first lower target code output by the first modified coding unit 1111a; the gated value input port 1111bb is configured to receive the values of the sign-bit-extended first lower partial product gated by the low-bit selector set unit 1111c; the first data input port 1111bc is configured to receive the first data; and the lower partial product output port 1111bd is configured to output the sign-bit-extended first lower partial product.
Specifically, the low-bit partial product obtaining unit 1111b may receive the first lower target code output by the first modified encoding unit 1111a through the lower target code input port 1111ba, and receive the multiplicand in the first data through the first data input port 1111bc. Optionally, the low-bit partial product obtaining unit 1111b may obtain the sign-bit-extended first lower partial product from the received first lower target code and the multiplicand of the multiplication or multiply-accumulate operation. Optionally, if the multiplicand received through the first data input port 1111bc is N bits wide, the sign-bit-extended first lower partial product obtained by the low-bit partial product obtaining unit 1111b may be 2N bits wide. Illustratively, if the low-bit partial product obtaining unit 1111b receives an N-bit multiplicand X, it can obtain the corresponding original partial product from X and each of the three values that can appear in the first lower target code (-1, 1 and 0), and then obtain the sign-bit-extended low-bit partial product from the original partial product: the lower (N+1) bits of the sign-bit-extended partial product equal all the bits of the original partial product, and each of its upper (N-1) bits equals the sign bit (i.e., the most significant bit) of the original partial product.
When a digit of the first lower target code is -1, the original partial product may be -X; when it is 1, the original partial product may be X; and when it is 0, the original partial product may be 0.
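A bit-level sketch of this sign extension, assuming a signed N-bit multiplicand X: each list element stands for the bit produced by one value generation subunit, the lower N + 1 bits are the original partial product (-X, X or 0), and the remaining N - 1 bits copy its sign bit. The function name is hypothetical.

```python
def sign_extended_low_partial_product(x: int, digit: int, n: int) -> list[int]:
    """Bits (LSB first) of one 2N-bit sign-extended first lower partial product.

    x is a signed N-bit multiplicand and digit is -1, 0 or 1; each list entry
    stands for the output of one value generation subunit.
    """
    original = {-1: -x, 1: x, 0: 0}[digit]             # original partial product
    pattern = original & ((1 << (n + 1)) - 1)         # its (N + 1)-bit two's complement form
    low = [(pattern >> k) & 1 for k in range(n + 1)]  # lower N + 1 bits: the original bits
    return low + [low[-1]] * (n - 1)                  # upper N - 1 bits: copies of the sign bit
```

The original partial product always fits in N + 1 bits because |digit * X| is at most 2^(N-1) for a signed N-bit X, so repeating its top bit reproduces the full 2N-bit two's complement value.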
It should be noted that the low-bit partial product obtaining unit 1111b may receive, through the gated value input port 1111bb, the bit values gated by the low-bit selector set unit 1111c, i.e., the corresponding bits of the sign-bit-extended first lower partial product for the current data operation mode. It then combines the sign-bit-extended low-bit partial product it currently obtains with these gated bit values to obtain the sign-bit-extended first lower partial product.
Optionally, the high-bit partial product obtaining unit 1111d in the data processor includes an upper target code input port 1111da, a gated value input port 1111db, a data input port 1111dc, and an upper partial product output port 1111dd. The upper target code input port 1111da is configured to receive the first upper target code output by the first modified coding unit 1111a; the gated value input port 1111db is configured to receive the values of the sign-bit-extended first upper partial product gated by the high-bit selector set unit 1111e; the data input port 1111dc is configured to receive the first data; and the upper partial product output port 1111dd is configured to output the sign-bit-extended first upper partial product.
It is understood that the method for the high-order partial product obtaining unit 1111d to obtain the first high-order partial product after sign bit extension is the same as the method for the low-order partial product obtaining unit 1111b to obtain the first low-order partial product after sign bit extension, and the method for the high-order partial product obtaining unit 1111d to obtain the partial product will not be described in detail in this embodiment. In addition, the internal circuit structures of the low-bit partial product obtaining unit 1111b and the high-bit partial product obtaining unit 1111d may be the same, and the functions of the external output ports may also be the same.
In the data processor provided in this embodiment, the low-bit partial product obtaining unit obtains a sign-bit-extended low-bit partial product from the first lower target code and combines it with the values gated by the low-bit selector set unit to obtain the sign-bit-extended first lower partial product. The data processor then determines whether the sign-bit-extended first lower and first upper partial products need to be exchanged to form the partial product of the target code, and accumulates the partial product of the target code to obtain data operation results in different modes. The data processor can thus perform data operations in different modes, which improves its versatility; meanwhile, because regular signed number encoding yields fewer effective partial products, the complexity of implementing multiplication in the data processor is reduced.
In one embodiment, the low-bit selector set unit 1111c in the data processor comprises a plurality of low-bit selectors 1111ca, which are used to gate values in the sign-bit-extended first lower partial product.
Specifically, the number of low-bit selectors 1111ca in the low-bit selector set unit 1111c may be equal to 3N × (N+1), where 2N denotes the bit width of the data currently processed by the data processor, and all low-bit selectors 1111ca in the low-bit selector set unit 1111c may have the same internal circuit structure. Optionally, during a multiplication or multiply-accumulate operation, the first modified coding unit 1111a is connected to (N+1) low-bit partial product obtaining units 1111b. Each low-bit partial product obtaining unit 1111b may contain 4N value generation subunits, 2N of which may be connected to 2N low-bit selectors 1111ca, one selector per value generation subunit. Optionally, these 2N value generation subunits are the ones that produce the upper 2N bits of the sign-bit-extended first lower partial product. Besides the function selection mode signal input port (mode), each of these 2N low-bit selectors 1111ca has two other input ports. Optionally, if the data processor supports four data operation modes and receives 2N-bit data, the two other input ports of the low-bit selector 1111ca receive, respectively, the value 0 and the sign bit of the sign-bit-extended first lower partial product that the low-bit partial product obtaining unit 1111b obtains when the data processor operates on 2N-bit data.
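The selector count 3N × (N+1) follows from how the 4N value generation subunits of each low-bit partial product obtaining unit are wired. The sketch below assumes the partition described in this paragraph and the next few: N subunits (the lowest N bits) have no selector, N subunits have one selector each, and the upper 2N subunits have one selector each whose non-mode inputs are 0 and the 2N-bit-mode sign bit. Bit positions are counted from 0, and all names are hypothetical.

```python
def low_selector_layout(n: int) -> dict:
    """Hypothetical partition of one low-bit unit's 4N value generation subunits."""
    layout = {
        "unselected": list(range(0, n)),     # bits 0..N-1: no selector
        "middle": list(range(n, 2 * n)),     # bits N..2N-1: N selectors
        "upper": list(range(2 * n, 4 * n)),  # bits 2N..4N-1: 2N selectors
    }
    per_unit = len(layout["middle"]) + len(layout["upper"])   # 3N selectors per unit
    return {"layout": layout, "per_unit": per_unit, "total": per_unit * (n + 1)}


def upper_region_selector(mode_2n: bool, sign_bit_2n_mode: int) -> int:
    """One of the 2N upper-region selectors: constant 0 in N-bit mode,
    the 2N-bit-mode sign bit otherwise."""
    return sign_bit_2n_mode if mode_2n else 0
```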
The (N+1) low-bit partial product obtaining units 1111b may be connected to (N+1) groups of 2N low-bit selectors 1111ca. The sign bit values received by different groups may be the same or different; however, the 2N low-bit selectors 1111ca in the same group receive the same sign bit value, which is taken from the sign-bit-extended first lower partial product acquired by the low-bit partial product obtaining unit 1111b connected to that group.
In addition, of the 4N value generation subunits in each low-bit partial product obtaining unit 1111b, N may not be connected to any low-bit selector 1111ca. The values produced by these N value generation subunits are the corresponding bits of the sign-bit-extended first lower partial product obtained from the digits of the first lower target code, whatever data bit width the data processor is currently processing; in other words, they are all the bit values from the lowest (1st) bit up to the Nth bit of that sign-bit-extended first lower partial product.
Note that the remaining N of the 4N value generation subunits in each low-bit partial product obtaining unit 1111b may be connected to N low-bit selectors 1111ca, one selector per value generation subunit. Besides the function selection mode signal input port (mode), each of these N low-bit selectors 1111ca has two other input ports, which receive, respectively, the sign bit of the sign-bit-extended first lower partial product obtained when the data processor performs an N-bit data operation, and the corresponding bit of the sign-bit-extended low-bit partial product obtained when the data processor performs a 2N-bit data operation. The (N+1) low-bit partial product obtaining units 1111b may be connected to (N+1) groups of N low-bit selectors 1111ca. The sign bit values received by different groups may be the same or different, but the N low-bit selectors 1111ca in the same group receive the same sign bit value, which is taken from the sign-bit-extended first lower partial product acquired by the low-bit partial product obtaining unit 1111b connected to that group.
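Putting the three regions together, the sketch below assembles one unshifted 4N-bit first lower partial product in either mode, so it can be checked against direct multiplication. It rests on an interpretation rather than an explicit statement in the text: the middle N selectors choose between the N-bit-mode sign bit and the 2N-bit-mode bit value, and the lowest N bits need no selector because the low N bits of -X or X depend only on the low N bits of X. Names are hypothetical and operands are assumed to be signed.

```python
def to_signed(pattern: int, bits: int) -> int:
    """Interpret the low `bits` bits of `pattern` as a two's complement integer."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def lower_partial_product(x: int, digit: int, n: int, mode_2n: bool) -> int:
    """Assemble one unshifted 4N-bit first lower partial product region by region.

    x is a 2N-bit multiplicand pattern (only its low N bits matter in N-bit mode);
    digit is a first-lower-target-code digit (-1, 0 or 1).
    """
    pp_n = (digit * to_signed(x, n)) & ((1 << (n + 1)) - 1)           # (N+1)-bit original, N mode
    pp_2n = (digit * to_signed(x, 2 * n)) & ((1 << (2 * n + 1)) - 1)  # (2N+1)-bit original, 2N mode
    sign_n = (pp_n >> n) & 1
    sign_2n = (pp_2n >> (2 * n)) & 1
    result = 0
    for k in range(4 * n):
        if k < n:            # no selector: bits 0..N-1 agree in both modes
            bit = (pp_2n >> k) & 1
        elif k < 2 * n:      # middle selectors: N-mode sign bit or 2N-mode bit value
            bit = (pp_2n >> k) & 1 if mode_2n else sign_n
        else:                # upper selectors: constant 0 or 2N-mode sign bit
            bit = sign_2n if mode_2n else 0
        result |= bit << k
    return result
```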
In addition, the corresponding bit values of the sign-bit-extended first lower partial product received by each group of N low-bit selectors 1111ca may be determined from the corresponding bit values obtained by the low-bit partial product obtaining unit 1111b connected to that group; within a group, the bit values received by individual low-bit selectors 1111ca may be the same or different. The 4N value generation subunits of each low-bit partial product obtaining unit 1111b may be positioned one bit to the left of the 4N value generation subunits of the preceding low-bit partial product obtaining unit 1111b. Optionally, among the partial products of all target code digits that participate in the subsequent operation, only the first one, i.e., the first lower partial product corresponding to the lowest digit, has the full bit width of 4N; each subsequent partial product is one bit narrower than the previous one, so that the last one, i.e., the first upper partial product corresponding to the highest digit, may be (2N-1) bits wide.
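The width pattern can be checked in 2N-bit mode: with 2N + 2 target-code digits, the i-th partial product is shifted left by i, so only its low 4N - i bits can reach the 4N-bit result, giving widths 4N, 4N - 1, ..., 2N - 1. The sketch assumes signed 2N-bit operands; the names are hypothetical.

```python
def to_signed(pattern: int, bits: int) -> int:
    """Interpret the low `bits` bits of `pattern` as a two's complement integer."""
    pattern &= (1 << bits) - 1
    return pattern - (1 << bits) if pattern >> (bits - 1) else pattern


def csd_digits(v: int, count: int) -> list[int]:
    """CSD (non-adjacent form) digits of v, least significant first, padded to `count`."""
    out = []
    while v:
        d = 2 - (v & 3) if v & 1 else 0
        out.append(d)
        v = (v - d) >> 1
    return out + [0] * (count - len(out))


def truncated_partial_products(x: int, y: int, n: int) -> list[tuple[int, int]]:
    """2N-bit mode: (pattern, width) for each of the 2N + 2 partial products.

    The i-th partial product is shifted left by i, so only its low 4N - i bits
    can reach the 4N-bit result; the higher bits are dropped.
    """
    a = to_signed(x, 2 * n)
    digits = csd_digits(to_signed(y, 2 * n), 2 * n + 1) + [0]   # padded (2N + 2)-digit code
    out = []
    for i, d in enumerate(digits):
        width = 4 * n - i
        out.append(((d * a) & ((1 << width) - 1), width))
    return out


def accumulate(pps: list[tuple[int, int]], n: int) -> int:
    """Shift each truncated partial product by its digit index and keep 4N bits."""
    return sum(p << i for i, (p, _) in enumerate(pps)) & ((1 << (4 * n)) - 1)
```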
Optionally, the high selector bank unit 1111e includes high selectors 1111ea; a plurality of high selectors 1111ea are used to gate the values in the sign-extended first upper partial product.
It should be noted that the way the high selectors 1111ea gate values is described below.
Optionally, the number of high selectors 1111ea in the high selector bank unit 1111e may be equal to 3N*(N+1), where 2N may represent the bit width of the data currently processed by the data processor, and the internal circuit structure of each high selector 1111ea in the high selector bank unit 1111e may be the same. Optionally, during multiplication or multiply-accumulate operations, the first modified encoding unit 1111a may be connected to (N+1) upper partial product obtaining units, each of which may contain 4N value generation subunits; 2N of these value generation subunits may be connected to 2N high selectors 1111ea, one per value generation subunit. Optionally, the 2N value generation subunits corresponding to the 2N high selectors 1111ea are those that generate the lower 2N bit values of the upper partial product of the target code, and besides the function selection mode signal input port (mode), each of these 2N high selectors 1111ea has two other external input ports. Optionally, if the data processor can process four different modes of data operation and the bit width of the data it receives is 2N, the two other input ports of each high selector 1111ea receive, respectively, the value 0 and the corresponding bit value in the sign-extended partial product obtained by the upper partial product obtaining unit when the data processor operates on 2N-bit-wide data. The (N+1) upper partial product obtaining units may be connected to (N+1) groups of 2N high selectors 1111ea, and within each group the corresponding bit values received by the 2N high selectors 1111ea may be the same or different.
In addition, of the 4N value generation subunits in each upper partial product obtaining unit, a corresponding N value generation subunits may be connected to N high selectors 1111ea, one high selector 1111ea per value generation subunit. These N high selectors 1111ea may have the same internal circuit configuration as the other selectors, and besides the function selection mode signal input port (mode), each has two other external input ports. The signals received on these two ports are, respectively, the sign bit value in the sign-extended partial product obtained when the data processor performs a 2N-bit data operation, and the sign bit value in the correspondingly sign-extended partial product obtained when the data processor performs a 2N-bit data operation. The (N+1) upper partial product obtaining units may be connected to (N+1) groups of N high selectors 1111ea. The sign bit values received by different groups may be the same or different, but all N high selectors 1111ea in the same group receive the same sign bit value, which may be determined from the sign bit value in the sign-extended partial product obtained by the upper partial product obtaining unit connected to that group.
In addition, the sign bit value received by each group of N high selectors 1111ea may be determined from the sign bit value of the partial product obtained by the upper partial product obtaining unit connected to that group, and within each group of N high selectors 1111ea, the corresponding bit values received by the individual selectors may be the same or different.
Note that the remaining N of the 4N value generation subunits in each upper partial product obtaining unit may not be connected to high selectors 1111ea. In that case, the values produced by these N value generation subunits may be the corresponding bit values in the sign-extended partial product, derived from the values in the upper target code obtained when the data processor processes data of the current bit width; equivalently, they may be all the values from bit (2N+1) to bit 3N of the sign-extended upper partial product, counting from the lowest bit (bit 1) toward the highest bit. The positions of the 4N value generation subunits in each upper partial product obtaining unit are shifted left by one position relative to those in the previous upper partial product obtaining unit. Optionally, among the upper partial products of all target codes participating in subsequent operations, only the upper partial product of the first target code may have a bit width equal to 4N; the upper partial product of each remaining target code is one bit narrower than that of the previous target code, and the bit width of the upper partial product of the last target code may be equal to (2N-1).
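One consistent reading of the high-selector wiring is that the 4N value generation subunits of an upper partial product obtaining unit fall into three classes: the lowest 2N positions feed selectors choosing between 0 and a data bit, positions 2N+1 through 3N (counting from 1) are taken directly, and the top N positions feed selectors choosing between sign bits. Placing the sign selectors at the top is an inference, and the function names are hypothetical; the sketch only checks that this layout reproduces the 3N*(N+1) selector count given above.

```python
def classify_upper_unit(n: int) -> list[str]:
    """Role of each of the 4N value generation subunits (index 0 = lowest bit).

    Hypothetical reading: lowest 2N -> selector(0 or data bit),
    bits 2N+1..3N (1-indexed) -> taken directly, top N -> selector(sign bits).
    """
    roles = []
    for p in range(4 * n):
        if p < 2 * n:
            roles.append("selector_zero_or_bit")
        elif p < 3 * n:
            roles.append("direct")
        else:
            roles.append("selector_sign")
    return roles


def high_selector_count(n: int) -> int:
    """Selectors per unit times the (N+1) upper partial product obtaining units."""
    per_unit = sum(1 for r in classify_upper_unit(n) if r.startswith("selector"))
    return per_unit * (n + 1)
```

For N = 8 (16-bit data), `high_selector_count(8)` evaluates to 216, that is, 24 selectors in each of 9 units.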
In the data processor provided by this embodiment, the low selector bank unit may gate the values in the lower partial product to obtain the sign-extended first lower partial product, from which the first partial product of the target code is obtained; the compression circuit then accumulates the first partial products of the target codes to obtain target operation results in different modes.
In one embodiment, the data processor includes a first partial product selection branch 1112, which comprises a function selection mode signal input port (mode) 1112a, a first partial product input port 1112b, a second partial product input port 1112c, a first partial product output port 1112d, and a gated partial product output port 1112e. The function selection mode signal input port (mode) 1112a receives the function selection mode signal; the first partial product input port 1112b receives the sign-extended first partial product input by the first modified coding sub-circuit 111; the second partial product input port 1112c receives the sign-extended second partial product exchanged by the partial product exchange circuit 13; the first partial product output port 1112d outputs the sign-extended first partial product that needs to be exchanged by the partial product exchange circuit 13; and the gated partial product output port 1112e outputs the gated sign-extended first partial product together with the received sign-extended second partial product.
Specifically, if the data processor is currently processing a 2N-bit by N-bit multiply-accumulate operation, the partial product exchange circuit 13 may exchange the sign-extended second lower partial product with the sign-extended first lower partial product, or the sign-extended second upper partial product with the sign-extended first upper partial product. In this case, the first partial product selection branch 1112 may receive, through the second partial product input port 1112c, the sign-extended second partial product exchanged by the partial product exchange circuit 13, and may output, through the first partial product output port 1112d, the sign-extended first partial product that is to be exchanged to the partial product exchange circuit 13. The gated partial product output port 1112e may output the sign-extended first partial products that do not need to be exchanged together with the received sign-extended second partial product; the first partial product selection branch 1112 then passes these non-exchanged sign-extended first partial products and/or the received sign-extended second partial products, as the first partial products of the target code, to the first modified compression sub-circuit 112 for compression.
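The routing through the two selection branches and the exchange circuit can be modeled at the level of whole rows. In this sketch the per-row exchange flags are hypothetical inputs, since which rows are swapped depends on encoding details not given here; only the rule that an exchange happens solely in the 2N-bit by N-bit multiply-accumulate mode is taken from the text. The function name `select_and_exchange` is illustrative.

```python
def select_and_exchange(
    rows_a: list[int],
    rows_b: list[int],
    exchange_a: list[bool],
    exchange_b: list[bool],
    mac_2n_by_n: bool,
) -> tuple[list[int], list[int]]:
    """Return the rows each branch forwards to its modified compression sub-circuit.

    rows_a / rows_b: sign-extended partial products of the first / second branch.
    exchange_a / exchange_b: hypothetical per-row flags marking rows to swap.
    """
    if not mac_2n_by_n:
        # Other modes: no exchange, every row stays in its own branch.
        return list(rows_a), list(rows_b)
    keep_a = [r for r, x in zip(rows_a, exchange_a) if not x]
    send_a = [r for r, x in zip(rows_a, exchange_a) if x]  # leaves via port 1112d
    keep_b = [r for r, x in zip(rows_b, exchange_b) if not x]
    send_b = [r for r, x in zip(rows_b, exchange_b) if x]  # leaves via port 1212d
    # Gated output ports (1112e, 1212e): kept rows plus rows received from the other branch.
    return keep_a + send_b, keep_b + send_a
```

For instance, with rows `[1, 2, 3]` and `[4, 5, 6]`, flags `[False, False, True]` and `[True, False, False]`, and the 2N-bit by N-bit mode active, the branches forward `[1, 2, 4]` and `[5, 6, 3]`.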
In the data processor provided by this embodiment, the first partial product selection branch selects among the sign-extended first partial products to obtain the first partial products of the target code, so the data processor can perform not only multiplication and multiply-accumulate operations on data of equal bit width but also multiply-accumulate operations on data of different bit widths, which improves its versatility.
In one embodiment, the data processor includes a first modified compression sub-circuit 112, which comprises a modified Wallace tree group unit 1121 and an accumulation unit 1122; the output of the modified Wallace tree group unit 1121 is connected to the input of the accumulation unit 1122. The modified Wallace tree group unit 1121 accumulates the values in each column of the first partial products of the target code obtained when data in different modes is processed, producing an accumulation result, and the accumulation unit 1122 adds the accumulation result.
Specifically, the modified Wallace tree group unit 1121 may accumulate the values in each column of the first partial products of the target code obtained by the first modified coding sub-circuit 111, and the accumulation unit 1122 then adds the two results output by the modified Wallace tree group unit 1121 to obtain the target operation result. When the modified Wallace tree group unit 1121 performs the accumulation, the first partial products of all target codes may be distributed as follows: the lowest-order value of the first partial product in each row is shifted right by one position relative to the lowest-order value of the first partial product in the next row, while the highest-order value of every target code's first partial product lies in the same column as the highest-order value of the first target code's first partial product. Optionally, the modified Wallace tree group unit 1121 may accumulate the values in each column of the first partial products of all target codes according to this distribution. Optionally, the two results output by the modified Wallace tree group unit 1121 may be a sum output signal Sum and a carry output signal Carry.
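Reducing many partial product rows to a Sum and a Carry output is what a carry-save (3:2 compressor) tree does. The sketch below is the generic word-level version of that technique, one common way of building a Wallace tree; it does not model the split into lower and upper subunits or the particular row alignment described above, and `csa` and `wallace_reduce` are illustrative names. Rows are handled as width-bit two's-complement patterns.

```python
def csa(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """One layer of full adders (3:2 compressor) applied to every column at once."""
    mask = (1 << width) - 1
    s = (a ^ b ^ c) & mask                                   # per-column sum bit
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask      # per-column carry, moved one column left
    return s, carry


def wallace_reduce(rows: list[int], width: int) -> tuple[int, int]:
    """Reduce any number of rows to two rows (Sum, Carry) using carry-save adders."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]          # two's-complement rows of fixed width
    while len(rows) < 2:
        rows.append(0)
    while len(rows) > 2:
        nxt: list[int] = []
        full = len(rows) - len(rows) % 3     # rows that form complete groups of three
        for i in range(0, full, 3):
            nxt.extend(csa(rows[i], rows[i + 1], rows[i + 2], width))
        nxt.extend(rows[full:])              # leftover rows pass to the next layer
        rows = nxt
    return rows[0], rows[1]
```

Adding the two returned values modulo 2^width, which is the job of the accumulation unit 1122, gives the sum of all input rows modulo 2^width.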
For example, if the data processor is currently processing a 16-bit by 16-bit fixed-point multiplication, the distribution of the first partial products of the 9 target codes obtained by the first partial product selection branch 1112 is shown in Fig. 4a, where open circles represent the bit values of the partial products and filled circles represent the sign-extension bit values.
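Nine target codes for a 16-bit operand is the count produced by radix-4 (modified) Booth recoding when the multiplier is extended by two bits. The sketch below assumes that encoding purely to illustrate the count; the actual rules of the first modified coding sub-circuit are defined elsewhere in the patent and may differ, and `booth_radix4` is an illustrative name.

```python
def booth_radix4(y: int, bits: int, signed: bool) -> list[int]:
    """Radix-4 (modified) Booth digits in {-2, -1, 0, 1, 2}, least significant first.

    The multiplier is sign- or zero-extended by two bits, giving bits // 2 + 1 digits,
    so both signed and unsigned operands are represented exactly.
    """
    if bits % 2:
        raise ValueError("bits must be even")
    y &= (1 << bits) - 1
    if signed and (y >> (bits - 1)) & 1:
        y -= 1 << bits                     # interpret as a negative two's-complement value
    ext = bits + 2
    y_ext = y & ((1 << ext) - 1)           # extended two's-complement pattern
    digits = []
    prev = 0                               # implicit bit y_{-1} = 0
    for i in range(0, ext, 2):
        b0 = (y_ext >> i) & 1
        b1 = (y_ext >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # d_i = y_{2i-1} + y_{2i} - 2*y_{2i+1}
        prev = b1
    return digits
```

Each digit selects one of 0, ±X or ±2X of the multiplicand X, and consecutive partial products are shifted by two bit positions.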
If the data processor has the circuit structure shown in Fig. 3 and is currently processing a 16-bit by 8-bit fixed-point multiply-accumulate operation, the distribution of the first partial products of the target code received by the first modified compression sub-circuit 112 or the second modified compression sub-circuit 122 is shown in Fig. 4b. Open circles represent the partial products obtained by the first partial product selection branch 1112 or the second partial product selection branch 1212; crossed open circles represent either the sign-extended second partial products that the first partial product selection branch 1112 receives from the second partial product selection branch 1212 through the partial product exchange circuit 13, or the sign-extended first partial products that the second partial product selection branch 1212 receives from the first partial product selection branch 1112 through the partial product exchange circuit 13.
In addition, the second modified compression sub-circuit 122 processes data in the same way as the first modified compression sub-circuit 112; their internal structures and the functions of their external output ports are also the same, so the method and structure by which the second modified compression sub-circuit 122 processes data are not repeated in this embodiment.
In the data processor provided by this embodiment, the first modified compression sub-circuit accumulates the first partial products of the target code, and the accumulation unit adds the accumulation results to obtain the target operation result.
In one embodiment, the modified Wallace tree group unit 1121 of the data processor includes lower Wallace tree subunits 1121a, a selector 1121b, and upper Wallace tree subunits 1121c. The output of the lower Wallace tree subunits 1121a is connected to the input of the selector 1121b, and the output of the selector 1121b is connected to the input of the upper Wallace tree subunits 1121c. The lower Wallace tree subunits 1121a accumulate the values in each column of the first partial products of the target code to obtain an accumulation result, the selector 1121b gates the carry input signal received by the upper Wallace tree subunits 1121c, and the upper Wallace tree subunits 1121c accumulate the values in each column of the first partial products of the target code to obtain an accumulation result.
Specifically, each lower Wallace tree subunit 1121a may be built from a combination of full adders and half adders, or from a combination of 4-2 compressors, and the same holds for each upper Wallace tree subunit 1121c. Both the lower Wallace tree subunit 1121a and the upper Wallace tree subunit 1121c can be understood as circuits that take a multi-bit input and add it to produce a two-bit output. Optionally, the number of upper Wallace tree subunits 1121c in the modified Wallace tree group unit 1121 may be equal to N, the bit width of the multiplicand in the multiplication or multiply-accumulate operation the data processor can currently process, and may also be equal to the number of lower Wallace tree subunits 1121a. Adjacent lower Wallace tree subunits 1121a may be connected in series, as may adjacent upper Wallace tree subunits 1121c. Optionally, the output of the last lower Wallace tree subunit 1121a is connected to the input of the selector 1121b, and the output of the selector 1121b is connected to the input of the first upper Wallace tree subunit 1121c. Optionally, in the modified Wallace tree group unit 1121, each lower Wallace tree subunit 1121a may add the values in its corresponding column of the first partial products of all target codes and output two signals, a carry signal Carry_i and a sum signal Sum_i, where i is the number of the lower Wallace tree subunit 1121a, the first lower Wallace tree subunit 1121a being numbered 1.
Optionally, the number of input signals received by each lower Wallace tree subunit 1121a may be equal to the number of first partial products of the target code. In the modified Wallace tree group unit 1121, the total number of upper Wallace tree subunits 1121c and lower Wallace tree subunits 1121a may be equal to 2N, and the total number of columns from the lowest to the highest column of the first partial products of all target codes may be equal to 2N; the N lower Wallace tree subunits 1121a may accumulate the values in each of the lower N columns of the first partial products of all target codes, and the N upper Wallace tree subunits 1121c may accumulate the values in each of the upper N columns.
For example, if the data processor currently needs to process a 2N-bit by 2N-bit multiplication, the selector 1121b may gate the carry output signal Cout_N of the last lower Wallace tree subunit 1121a in the modified Wallace tree group unit 1121 through as the carry input signal Cin_(N+1) of the first upper Wallace tree subunit 1121c. If the data processor currently needs to process N-bit by N-bit multiplications, the selector 1121b may instead gate the value 0 as the carry input signal Cin_(N+1) of the first upper Wallace tree subunit 1121c. Equivalently, the data processor may split each received 2N-bit sub-data into upper N-bit data and lower N-bit data and multiply them; the numbers i of the first through last lower Wallace tree subunits 1121a in the modified Wallace tree group unit 1121 may be 1, 2, …, N, and those of the first through last upper Wallace tree subunits 1121c may be N+1, N+2, …, 2N.
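The effect of the selector 1121b can be shown with a simplified model in which each Wallace tree subunit is reduced to a single full-adder column adding two bits plus a carry. This is an illustration under that simplification, not the subunit circuit itself; `add_with_gated_carry` is a hypothetical name, and columns are 0-indexed in the code, so column N corresponds to subunit N+1.

```python
def add_with_gated_carry(a: int, b: int, n: int, split: bool) -> int:
    """Add two 2N-bit rows column by column, gating the carry into column N.

    split=False: the carry out of column N-1 (Cout_N) feeds column N (Cin_(N+1)).
    split=True:  the selector forces Cin_(N+1) to 0, giving two independent N-bit sums.
    """
    result, carry = 0, 0
    for col in range(2 * n):
        if col == n and split:
            carry = 0  # selector 1121b gates the constant 0 instead of Cout_N
        s = ((a >> col) & 1) + ((b >> col) & 1) + carry
        result |= (s & 1) << col
        carry = s >> 1
    return result  # the carry out of the top column is discarded (result mod 2**(2N))
```

With N = 4, adding `0x1F` and `0x01` gives `0x20` when the carry is passed through, but `0x10` in split mode, because the low half wraps to 0 without carrying into the high half.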
It should be noted that the signals of each lower Wallace tree subunit 1121a and upper Wallace tree subunit 1121c in the modified Wallace tree group unit 1121 may include a carry input signal Cin_i, partial product value input signals, and a carry output signal Cout_i. Optionally, the partial product value input signals received by each lower Wallace tree subunit 1121a and upper Wallace tree subunit 1121c may be the values of the corresponding column of the first partial products of all target codes, and the number of carry output signals Cout_i of each subunit may satisfy N_Cout = floor((N_I + N_Cin)/2) - 1, where N_I is the number of partial product value input signals of the lower Wallace tree subunit 1121a or upper Wallace tree subunit 1121c, N_Cin is the number of its carry input signals, N_Cout is the minimum number of its carry output signals, and floor denotes the floor function. Optionally, in the modified Wallace tree group unit 1121, the carry input signals received by each lower Wallace tree subunit 1121a or upper Wallace tree subunit 1121c may be the carry output signals of the previous lower or upper subunit, and the carry input signal received by the first lower Wallace tree subunit 1121a is the value 0. The carry input signal received by the first upper Wallace tree subunit 1121c may be determined by the bit width of the data, in the different modes, currently processed by the data processor, and by the bit width of the multiplicand in the multiplication or multiply-accumulate operation it is currently processing.
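The carry-count relation can be evaluated along a chain of serially connected subunits, in which each subunit's carry inputs are the previous subunit's carry outputs. This sketch takes the formula exactly as stated; the clamp at 0 for very small columns and the `first_cin` parameter (how many constant-0 carry inputs the first subunit has) are assumptions, and the function names are illustrative.

```python
def min_carry_outputs(n_inputs: int, n_carry_in: int) -> int:
    """N_Cout = floor((N_I + N_Cin) / 2) - 1, clamped at 0 (the clamp is an assumption)."""
    return max(0, (n_inputs + n_carry_in) // 2 - 1)


def chain_carry_counts(inputs_per_subunit: list[int], first_cin: int = 0) -> list[tuple[int, int]]:
    """(N_Cin, N_Cout) for each subunit in a serially connected chain.

    Subunit i receives the carry outputs of subunit i-1 as its carry inputs;
    first_cin is the hypothetical number of constant-0 carry inputs of subunit 1.
    """
    counts = []
    cin = first_cin
    for n_i in inputs_per_subunit:
        cout = min_carry_outputs(n_i, cin)
        counts.append((cin, cout))
        cin = cout
    return counts
```

For three subunits that each receive nine partial product values, with no carry inputs on the first, the chain gives carry-output counts of 3, 5 and 6.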
In the data processor provided by this embodiment, the modified Wallace tree group unit accumulates the partial products of the target code to obtain two output signals, and the accumulation unit adds these two signals to obtain data operation results in different modes. Because the data processor supports data operations in several modes, its versatility is improved and the AI chip area it occupies is effectively reduced. In addition, the data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; it performs the multiply-accumulate or multiplication in a single pass, which reduces its power consumption.
In one embodiment, the accumulation unit 1122 of the data processor includes an adder 1122a, which adds the accumulation results.
Specifically, the adder 1122a may be an adder of any of various bit widths. Optionally, the adder 1122a may receive the two signals output by the modified Wallace tree group unit 1121, add them, and output the data operation result for the data processor's current processing mode. Optionally, the adder 1122a may be a carry-lookahead adder whose bit width equals the bit width of the operation result output by the modified Wallace tree group unit 1121.
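A carry-lookahead adder computes every carry directly from generate (g = a AND b) and propagate (p = a XOR b) signals instead of letting it ripple from column to column. The sketch below expands the lookahead equations bit by bit for clarity; practical designs group them into blocks, and `cla_add` is an illustrative name rather than anything defined in the patent.

```python
def cla_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Carry-lookahead addition of two width-bit values; returns (sum, carry_out).

    Each carry c[i+1] is formed only from g, p and cin:
    c[i+1] = OR_j (g[j] AND p[j+1] AND ... AND p[i]) OR (p[0] AND ... AND p[i] AND cin).
    """
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # generate terms
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate terms
    carries = [cin]
    for i in range(width):
        c = 0
        for j in range(i + 1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]               # a carry generated at j survives up to i
            c |= term
        all_p = cin
        for k in range(i + 1):
            all_p &= p[k]                  # cin propagated through every column
        carries.append(c | all_p)
    s = 0
    for i in range(width):
        s |= (p[i] ^ carries[i]) << i
    return s, carries[width]
```

Feeding it the Sum and Carry rows produced by a carry-save tree completes the addition in one step.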
In the data processor provided by this embodiment, the accumulation unit adds the two signals output by the modified Wallace tree group unit and outputs data operation results in different modes. The data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; it performs the multiplication or multiply-accumulate operation in a single pass, which reduces its power consumption.
In one embodiment, the second partial product selection branch 1212 of the data processor comprises a function selection mode signal input port (mode) 1212a, a second partial product input port 1212b, a first partial product input port 1212c, a second partial product output port 1212d, and a gated partial product output port 1212e. The function selection mode signal input port (mode) 1212a receives the function selection mode signal; the second partial product input port 1212b receives the sign-extended second partial product input by the second modified coding sub-circuit 121; the first partial product input port 1212c receives the sign-extended first partial product exchanged by the partial product exchange circuit 13; the second partial product output port 1212d outputs the sign-extended second partial product that needs to be exchanged by the partial product exchange circuit 13; and the gated partial product output port 1212e outputs the gated sign-extended second partial product together with the received sign-extended first partial product.
Specifically, if the data processor is currently processing a 2N-bit by N-bit multiply-accumulate operation, the partial product exchange circuit 13 may exchange the sign-extended second partial product with the sign-extended first partial product. The second partial product selection branch 1212 may receive, through the first partial product input port 1212c, the sign-extended first partial product exchanged by the partial product exchange circuit 13, and output, through the second partial product output port 1212d, the sign-extended second partial product to be exchanged to the partial product exchange circuit 13. The gated partial product output port 1212e may output the sign-extended second partial products that do not need to be exchanged together with the received sign-extended first partial product; the second partial product selection branch 1212 then passes these non-exchanged sign-extended second partial products and/or the received sign-extended first partial products, as the second partial products of the target code, to the second modified compression sub-circuit 122 for compression.
In the data processor provided by this embodiment, the second partial product selection branch selects among the sign-extended partial products to obtain the partial products of the target code, so the data processor can perform not only multiplication and multiply-accumulate operations on data of equal bit width but also multiply-accumulate operations on data of different bit widths, which improves its versatility.
In one embodiment, the partial product exchange circuit 13 in the data processor comprises a function selection mode signal input port (mode) 131, a first partial product input port 132, a first partial product output port 133, a second partial product input port 134, and a second partial product output port 135. The function selection mode signal input port (mode) 131 receives the function selection mode signal; the first partial product input port 132 receives the sign-extended first partial product, input by the first modified coding sub-circuit 111, that needs to be exchanged; the first partial product output port 133 outputs that sign-extended first partial product; the second partial product input port 134 receives the sign-extended second partial product, input by the second modified coding sub-circuit 121, that needs to be exchanged; and the second partial product output port 135 outputs that sign-extended second partial product.
Specifically, the partial product exchange circuit 13 determines, according to the function selection mode signal received through the function selection mode signal input port (mode) 131, whether to exchange the sign-extended first partial product and the sign-extended second partial product. It may exchange the sign-extended first lower partial product with the sign-extended second lower partial product, or the sign-extended first upper partial product with the sign-extended second upper partial product. In this embodiment, however, the partial product exchange circuit 13 needs to exchange sign-extended partial products only when the data processor processes a 2N-bit by N-bit multiply-accumulate operation; for the other three modes of data operation, no exchange is needed.
In the data processor provided in this embodiment, the partial product exchange circuit exchanges the sign-extended first partial product obtained by the first modified coding sub-circuit with the sign-extended second partial product obtained by the second modified coding sub-circuit, which enables 2N-bit by N-bit multiply-accumulate operations. The data processor can therefore perform multiply-accumulate operations not only on data of equal bit width but also on data of different bit widths, improving its versatility.
Another embodiment provides a data processor in which the regular signed number encoding processing unit 211 includes a first data input port 2111, a function selection mode signal input port 2112, and a target encoding output port 2113. The first data input port 2111 receives the first data to be encoded, the function selection mode signal input port 2112 receives the function selection mode signal, and the target encoding output port 2113 outputs the target code obtained by applying regular signed number encoding to the first data.
Specifically, the regular signed number encoding processing unit 211 may determine from the received function selection mode signal whether the data bit width the data processor can currently process is N or 2N. If the bit width currently processed is N, the regular signed number encoding processing unit 211 may split each of the two received 2N-bit sub-data into upper N-bit data (the high-order data) and lower N-bit data (the low-order data) and apply regular signed number encoding to the high-order and low-order data separately; if the bit width currently processed is 2N, the unit 211 may encode each of the two 2N-bit sub-data as a whole.
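Regular signed number encoding is read here as canonical signed digit (CSD) encoding, also known as the non-adjacent form, in which no two adjacent digits are non-zero; that reading is an assumption. The sketch also models the mode-dependent split of a 2N-bit sub-data into two N-bit halves, each treated as a signed N-bit value (another assumption). The names `csd_digits` and `encode_sub_data` are hypothetical.

```python
def csd_digits(x: int, width: int) -> list[int]:
    """CSD (non-adjacent form) digits in {-1, 0, 1} of a width-bit two's-complement x, LSB first."""
    v = x & ((1 << width) - 1)
    if (v >> (width - 1)) & 1:
        v -= 1 << width                    # signed interpretation
    digits = []
    while v != 0:
        if v & 1:
            d = 2 - (v & 3)                # +1 if v % 4 == 1, -1 if v % 4 == 3
            v -= d                         # now v is divisible by 4, forcing the next digit to 0
        else:
            d = 0
        digits.append(d)
        v >>= 1                            # exact division by 2 (v is even here)
    return digits + [0] * (width + 1 - len(digits))


def encode_sub_data(data: int, n: int, split: bool) -> list[list[int]]:
    """Encode one 2N-bit sub-data whole (2N-bit mode) or as lower/upper N-bit halves."""
    if not split:
        return [csd_digits(data, 2 * n)]
    low = data & ((1 << n) - 1)
    high = (data >> n) & ((1 << n) - 1)
    return [csd_digits(low, n), csd_digits(high, n)]
```

For example, `0x3C` with N = 4 encodes whole to the value 60 in 2N-bit mode, while in N-bit mode its halves encode to -4 (low half `0xC`) and 3 (high half). Because no two adjacent digits are non-zero, at most about half the digits produce a non-zero partial product.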
It should be noted that the first data may include two 2N-bit sub-data. If the regular signed number encoding processing unit 211 currently needs to encode 2N-bit data, the low-order data in the first data may include the two low-order parts corresponding to the two 2N-bit sub-data; if it currently needs to process N-bit data, the unit 211 may split each of the two 2N-bit sub-data into two N-bit sub-data, i.e., four N-bit sub-data in total, and the low-order data in the first data may include the four low-order parts corresponding to the two 2N-bit sub-data. In addition, during regular signed number encoding, the number of lower target codes obtained by the regular signed number encoding processing unit 211 may be equal to the number of upper target codes obtained, to the number of first lower partial products of the target codes corresponding to the low-order data, or to the number of first upper partial products of the target codes corresponding to the high-order data. If the data processor is currently processing an N-bit by N-bit multiplication, one sub-data of each of the first data and the second data is 0, i.e., either the upper N bits or the lower N bits of the first data and the second data are all 0; if the data processor is currently processing a 2N-bit by 2N-bit multiplication, one sub-data of each of the first data and the second data is 0, and the other 2N-bit sub-data is non-zero.
In the data processor provided by this embodiment, the regular signed number encoding processing unit performs regular signed number encoding on the received first data to obtain target codes; the data processor then obtains partial products of the target codes from the target codes and accumulates these partial products to obtain the target operation result, so that data operations in several different modes can be implemented. Because the regular signed number encoding yields only a small number of effective (non-zero) partial products, the complexity of implementing multiplication or multiply-accumulate operations in the data processor is reduced. At the same time, since one data processor can perform data operations in several different modes, its versatility is improved and the area it occupies on the AI chip is effectively reduced.
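The claim that the encoding yields fewer effective partial products can be illustrated in software. The text here does not spell out the digit rules of "regular signed number" encoding; the sketch below assumes it corresponds to the canonical signed digit (CSD) form, also known as the non-adjacent form, where every digit is -1, 0 or +1 and no two adjacent digits are non-zero. Each non-zero digit corresponds to one effective partial product. The names `csd_digits` and `nonzero_count` are illustrative only.

```python
from typing import List


def csd_digits(x: int) -> List[int]:
    """Canonical signed digit (non-adjacent form) digits of x, least significant first.

    Works for negative x as well, because Python's bitwise operators use
    two's complement semantics and >> is an arithmetic (floor) shift.
    """
    digits: List[int] = []
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)  # x = 1 (mod 4) -> +1, x = 3 (mod 4) -> -1
            x -= d           # x is now even, so the shift below is exact
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits


def nonzero_count(digits: List[int]) -> int:
    """Number of effective (non-zero) partial products a digit string produces."""
    return sum(1 for d in digits if d != 0)
```

For instance, 7 = 0b111 needs three non-zero binary digits, whereas `csd_digits(7)` gives `[-1, 0, 0, 1]` (that is, 8 - 1), so only two partial products are non-zero.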
As one embodiment, the first partial product obtaining circuit 22 in the data processor includes a low-order partial product obtaining unit 221, a low-order selector group unit 222, a high-order partial product obtaining unit 223 and a high-order selector group unit 224. A first input of the low-order partial product obtaining unit 221 and a first input of the high-order partial product obtaining unit 223 are both connected to the output of the regular signed number encoding processing unit 211; a second input of the low-order partial product obtaining unit 221 is connected to the output of the low-order selector group unit 222, and a second input of the high-order partial product obtaining unit 223 is connected to the output of the high-order selector group unit 224.
The low-order partial product obtaining unit 221 is configured to obtain a sign-extended first low-order partial product from the low-order target code in the target code and the second data, and to obtain the first low-order partial product of the target code from the sign-extended first low-order partial product. The low-order selector group unit 222 is configured to gate, according to the received function selection mode signal, values in the sign-extended first low-order partial product. The high-order partial product obtaining unit 223 is configured to obtain a sign-extended first high-order partial product from the high-order target code in the target code and the second data, and to obtain the first high-order partial product of the target code from the sign-extended first high-order partial product. The high-order selector group unit 224 is configured to gate, according to the received function selection mode signal, values in the sign-extended first high-order partial product.
Specifically, the low-order partial product obtaining unit 221 may obtain, from each digit of the low-order target code input by the regular signed number encoding processing unit 211, the corresponding sign-extended low-order partial product, and the low-order selector group unit 222 may gate values in the sign-extended first low-order partial product. The values produced directly by the unit 221 are then combined with the gated values to form the complete sign-extended first low-order partial product, from which the first low-order partial product of the target code is obtained. Similarly, the high-order partial product obtaining unit 223 may obtain, from each digit of the high-order target code input by the unit 211, the sign-extended high-order partial product corresponding to the high-order data in the first data, and the high-order selector group unit 224 may gate values in the sign-extended first high-order partial product. The values produced directly by the unit 223 are combined with the gated values to form the complete sign-extended first high-order partial product, from which the first high-order partial product of the target code is obtained.
In this embodiment, the first partial product of the target code may consist of the first low-order partial product and the first high-order partial product of the target code. If the bit width of the first target code is 2N, the digits of the first low-order target code, counted from the lowest digit, may be numbered 1, ..., N; the corresponding sign-extended first low-order partial products may also be numbered 1, ..., N, and the first low-order partial products of the target code are numbered in the same way. Likewise, if the digits of the first high-order target code, counted from the lowest digit, are numbered N+1, ..., 2N, the corresponding sign-extended first high-order partial products, and the first high-order partial products of the target code, carry the same numbers N+1, ..., 2N. The distribution of the first partial products of all target codes can be further characterized as follows. The first low-order partial product of the first target code may be equal to the first sign-extended first low-order partial product, i.e., the first partial product of the first target code. From the first low-order partial product of the second target code onward, the highest-order value of each first low-order partial product may lie in the same column as the highest-order value of the first partial product of the first target code, while its lowest-order value is shifted one bit to the left relative to the lowest-order value of the previous first low-order partial product. The partial product following the first low-order partial product of the last target code may be the first high-order partial product of the first target code. Its bit width may be N: it corresponds to the first sign-extended first high-order partial product shifted left by N bits relative to the column of the first sign-extended first low-order partial product, and the bit values outside these N bits are not part of the first partial product of the target code. The first high-order partial products of the other target codes are distributed by analogy.
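The row layout described above (each row shifted one column further left, all rows truncated at the same top column) is what makes the plain sum of the rows equal the product. A minimal sketch, assuming the digits of the multiplier are already available as a list of values in {-1, 0, 1}, least significant first; `partial_product_rows`, `to_signed` and `product_from_rows` are hypothetical names, and modular `width`-bit arithmetic stands in for the fixed-width adder array.

```python
from typing import List


def partial_product_rows(digits: List[int], b: int, width: int) -> List[int]:
    """Row i is digit_i * b shifted left by i, kept as a width-bit two's complement pattern.

    Truncating every row to `width` bits puts the highest-order bit of every row
    in the same column, while each row's lowest-order bit moves one column left.
    """
    mask = (1 << width) - 1
    return [((d * b) << i) & mask for i, d in enumerate(digits)]


def to_signed(v: int, width: int) -> int:
    """Interpret the low `width` bits of v as a two's complement number."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def product_from_rows(rows: List[int], width: int) -> int:
    """Sum all rows modulo 2**width and read the result back as a signed value."""
    return to_signed(sum(rows), width)
```

With the digit string `[-1, 0, 0, 1]` (which represents 7) and a multiplicand of 5, the rows are `[0xFFFB, 0, 0, 40]` and `product_from_rows` returns 35; with a multiplicand of -3 it returns -21.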
It should be noted that if the data processor is currently to perform a 2N-bit × 2N-bit multiplication, the first partial product obtaining circuit 22 may include (N+1) low-order partial product obtaining units 221 and (N+1) high-order partial product obtaining units 223, and each low-order partial product obtaining unit 221 and each high-order partial product obtaining unit 223 may include 4N value generation subunits. If the data processor currently needs to process N-bit data, the first partial product obtaining circuit 22 may include (N+1)/2 low-order partial product obtaining units 221 and (N+1)/2 high-order partial product obtaining units 223, each of which may include 2N value generation subunits. Each value generation subunit obtains one value of the sign-extended first partial product.
Optionally, the second partial product obtaining circuit 23 includes a low-order partial product obtaining unit 231, a low-order selector group unit 232, a high-order partial product obtaining unit 233 and a high-order selector group unit 234. A first input of the low-order partial product obtaining unit 231 and a first input of the high-order partial product obtaining unit 233 are both connected to the output of the regular signed number encoding processing unit 211; a second input of the low-order partial product obtaining unit 231 is connected to the output of the low-order selector group unit 232, and a second input of the high-order partial product obtaining unit 233 is connected to the output of the high-order selector group unit 234.
The low-order partial product obtaining unit 231 is configured to obtain a sign-extended second low-order partial product from the low-order target code in the target code and the second data, and to obtain the second low-order partial product of the target code from it. The low-order selector group unit 232 is configured to gate, according to the received function selection mode signal, values in the sign-extended second low-order partial product. The high-order partial product obtaining unit 233 is configured to obtain a sign-extended second high-order partial product from the high-order target code in the target code and the second data, and to obtain the second high-order partial product of the target code from it. The high-order selector group unit 234 is configured to gate, according to the received function selection mode signal, values in the sign-extended second high-order partial product.
The first partial product obtaining circuit 22 obtains the sign-extended first partial product in the same way as the second partial product obtaining circuit 23 obtains the sign-extended second partial product, so the method used by circuit 23 is not repeated in this embodiment. In addition, circuits 22 and 23 may have the same internal structure and the same external port functions, so the detailed structure of the second partial product obtaining circuit 23 is likewise not repeated.
In the data processor provided in this embodiment, the low-order partial product obtaining unit, the high-order partial product obtaining unit and the selector group units obtain the sign-extended first partial products from the low-order and high-order target codes; the first partial products of the target codes are derived from these, and are then accumulated to obtain the target operation result. Because the data processor produces fewer effective partial products, the complexity of implementing multiplication or multiply-accumulate operations is reduced. Moreover, the data processor does not need a separate accumulation step after the multiplication to complete a multiply-accumulate operation: multiplication and multiply-accumulate are both carried out in a single operation pass, which reduces power consumption. The data processor can also perform data operations in different modes, which improves its versatility.
In one embodiment, the low-order partial product obtaining unit 221 in the data processor includes a low-order target code input port 2211, a gated value input port 2212, a second data input port 2213 and a low-order partial product output port 2214. The low-order target code input port 2211 is configured to receive the first low-order target code from the regular signed number encoding processing unit 211; the gated value input port 2212 is configured to receive the values of the sign-extended first low-order partial product gated by the low-order selector group unit 222; the second data input port 2213 is configured to receive the second data; and the low-order partial product output port 2214 is configured to output the first low-order partial product of the target code.
Specifically, the low-order partial product obtaining unit 221 may receive the low-order target code output by the regular signed number encoding processing unit 211 through the low-order target code input port 2211, and the two sub-data of the second data (i.e., the multiplicands) through the second data input port 2213. Optionally, the unit 221 may obtain the sign-extended low-order partial product corresponding to the low-order data from the received low-order target code and the multiplicand of the multiplication or multiply-accumulate operation, and obtain the first low-order partial product of the target code from it. Optionally, if the multiplicand received at the second data input port 2213 has a bit width of N, the sign-extended first low-order partial product obtained by the unit 221 may have a bit width of 2N.
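Extending an N-bit multiplicand to 2N bits before forming the partial product leaves room for the one case that does not fit in N bits: -1 times the most negative N-bit value. The sketch below shows sign extension and the per-digit partial product for a digit in {-1, 0, 1}; the function names are hypothetical, and negation is done the usual two's complement way (invert and add one).

```python
def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide two's complement bit pattern to to_bits bits."""
    pattern &= (1 << from_bits) - 1
    if (pattern >> (from_bits - 1)) & 1:
        # copy the sign bit into every bit position above from_bits
        pattern |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return pattern


def digit_partial_product(digit: int, multiplicand: int, n: int) -> int:
    """2N-bit partial product for one encoded digit and an N-bit multiplicand pattern."""
    mask = (1 << (2 * n)) - 1
    ext = sign_extend(multiplicand, n, 2 * n)
    if digit == 0:
        return 0
    if digit == 1:
        return ext
    if digit == -1:
        return (~ext + 1) & mask  # two's complement negation within 2N bits
    raise ValueError("digit must be -1, 0 or 1")
```

With N = 4, the pattern `0b1101` (-3) extends to `0xFD`; digit -1 gives `0x03`, and digit -1 applied to `0b1000` (-8) gives `0x08`, a value that only fits because of the extra N bits.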
It should be noted that, through the gated value input port 2212, the low-order partial product obtaining unit 221 may receive the bit values of the sign-extended low-order partial product that the low-order selector group unit 222 has gated for the current operation mode. The unit 221 then combines the sign-extended low-order partial product it has computed with these gated bit values to obtain the sign-extended first low-order partial product.
Optionally, the high-order partial product obtaining unit 223 of the data processor includes a high-order target code input port 2231, a gated value input port 2232, a second data input port 2233 and a high-order partial product output port 2234. The high-order target code input port 2231 is configured to receive the high-order target code output by the regular signed number encoding processing unit 211; the gated value input port 2232 is configured to receive the values of the sign-extended first high-order partial product gated by the high-order selector group unit 224; the second data input port 2233 is configured to receive the second data; and the high-order partial product output port 2234 is configured to output the first high-order partial product of the target code.
The low-order partial product obtaining unit 221 obtains the first low-order partial product of the target code in the same way as the high-order partial product obtaining unit 223 obtains the first high-order partial product, so the method used by the unit 223 is not described in detail in this embodiment. In addition, units 221 and 223 may have the same internal circuit structure and similar external port functions, so the detailed structure of the high-order partial product obtaining unit 223 is not described either.
In the data processor provided by this embodiment, the low-order partial product obtaining unit obtains a sign-extended low-order partial product from each digit of the low-order target code, combines it with the values gated by the low-order selector group unit to form the sign-extended first low-order partial product, and derives the first low-order partial product of the target code from it. The first low-order and first high-order partial products of the target codes are then accumulated to obtain data operation results in different modes. Because few effective partial products are produced, the complexity of implementing multiplication or multiply-accumulate operations is reduced; and because the data processor supports several operation modes, its versatility is improved.
In one embodiment, the low-order selector group unit 222 of the data processor includes a plurality of low-order selectors 2221, which are used to gate values in the sign-extended first low-order partial product.
Specifically, the number of low-order selectors 2221 in the low-order selector group unit 222 may be equal to 3N×(N+1), where 2N denotes the bit width of the data currently processed by the data processor, and all low-order selectors 2221 may have the same internal circuit structure. Optionally, if the data processor is currently to perform a 2N-bit × 2N-bit multiplication, each regular signed number encoding processing unit 211 is connected to (N+1) low-order partial product obtaining units 221, each of which may include 4N value generation subunits; 2N of these subunits may be connected to 2N low-order selectors 2221, one selector per subunit. Optionally, these 2N subunits are the ones that generate the upper 2N bits of the sign-extended first low-order partial product. These 2N low-order selectors 2221 may have the same internal circuit structure as the selector 212 and, besides the function selection mode signal input port (mode), each has two other input ports. Optionally, if the data processor can perform data operations in four different modes and receives a 2N-bit multiplicand, the two other input ports of each low-order selector 2221 may receive, respectively, a signal with the value 0 and the sign bit of the corresponding sign-extended first low-order partial product obtained by the low-order partial product obtaining unit 221 when the data processor performs a 2N-bit × 2N-bit multiplication.
The (N+1) low-order partial product obtaining units 221 may be connected to (N+1) groups of 2N low-order selectors 2221. The sign bit values received by different groups may be the same or different, but all 2N low-order selectors 2221 in the same group receive the same sign bit value, which may be determined by the sign bit of the sign-extended first low-order partial product obtained by the low-order partial product obtaining unit 221 connected to that group.
In addition, of the 4N value generation subunits in each low-order partial product obtaining unit 221, N may not be connected to any low-order selector 2221. The values produced by these N subunits are the corresponding bit values of the sign-extended first low-order partial product obtained from the values of the first low-order target code, whatever the bit width of the data currently being multiplied; in other words, these N subunits produce all values from the 1st (lowest) bit to the N-th bit of the corresponding sign-extended first low-order partial product.
Of the 4N value generation subunits in each low-order partial product obtaining unit 221, the remaining N subunits may be connected to N low-order selectors 2221, one selector per subunit. These N low-order selectors 2221 may have the same internal circuit structure as the selector 212 and, besides the function selection mode signal input port (mode), each has two other input ports. These two inputs receive, respectively, the sign bit of the sign-extended first low-order partial product obtained when the data processor performs an N-bit × N-bit multiplication, and the corresponding bit value of the sign-extended first low-order partial product obtained when the data processor performs a 2N-bit × 2N-bit multiplication. The (N+1) low-order partial product obtaining units 221 may be connected to (N+1) groups of N low-order selectors 2221. The sign bit values received by different groups may be the same or different, but all N low-order selectors 2221 in the same group receive the same sign bit value, which may be determined by the sign bit of the sign-extended first low-order partial product obtained by the low-order partial product obtaining unit 221 connected to that group.
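The three kinds of bit positions described above (N ungated low bits, N bits behind one kind of selector, 2N bits behind the other) can be sketched as a bit-level model of one 4N-bit row. The text names the two selector inputs but not which mode picks which; the sketch assumes that the 2N-bit mode selects the 2N × 2N value (row bit or its sign) and the N-bit mode selects the N × N value (the sign bit, or 0 for the top 2N columns). The function `low_row` and the flag `wide` are hypothetical. The low N bits need no selector because digit × multiplicand has the same low N bits whichever width the multiplicand is read at.

```python
def low_row(digit: int, b_pattern: int, n: int, wide: bool) -> int:
    """Assemble one 4N-bit sign-extended low-order partial product row (model only).

    digit     -- encoded digit in {-1, 0, 1}
    b_pattern -- multiplicand bit pattern (its low N bits are used in N-bit mode)
    wide      -- True for 2N x 2N mode, False for N x N mode (assumed mapping)
    """
    def signed(p: int, bits: int) -> int:
        p &= (1 << bits) - 1
        return p - (1 << bits) if p >> (bits - 1) else p

    narrow_row = (digit * signed(b_pattern, n)) & ((1 << (2 * n)) - 1)    # N x N row
    wide_row = (digit * signed(b_pattern, 2 * n)) & ((1 << (4 * n)) - 1)  # 2N x 2N row
    sign_narrow = (narrow_row >> (2 * n - 1)) & 1
    sign_wide = (wide_row >> (4 * n - 1)) & 1

    row = 0
    for k in range(4 * n):
        if k < n:
            bit = (wide_row >> k) & 1                           # ungated: same in both modes
        elif k < 2 * n:
            bit = (wide_row >> k) & 1 if wide else sign_narrow  # selector: row bit or N x N sign
        else:
            bit = sign_wide if wide else 0                      # selector: 2N x 2N sign or 0
        row |= bit << k
    return row
```

Under these assumptions the assembled row equals the ordinary two's complement product row of the selected width, which is what the checks below confirm for N = 4 over every multiplicand pattern.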
In addition, the corresponding bit values of the sign-extended first low-order partial product received by each group of N low-order selectors 2221 may be determined by the corresponding bit values of the sign-extended first low-order partial product obtained by the low-order partial product obtaining unit 221 connected to that group; within a group, the bit values received by the individual selectors may be the same or different. The 4N value generation subunits of each low-order partial product obtaining unit 221 may be positioned one bit to the left of those of the previous unit 221. Optionally, among the first low-order partial products of all target codes that take part in the subsequent operation, only the first low-order partial product of the first target code may have the full bit width of the first sign-extended first low-order partial product, i.e., 4N; each remaining first low-order partial product is one bit narrower than the previous one, and the bit width of the first high-order partial product of the last target code may be equal to (2N-1).
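The selector count 3N×(N+1) stated above follows directly from the subunit split: of the 4N value generation subunits in each low-order partial product obtaining unit, N are ungated, N sit behind one kind of selector and 2N behind the other, and there are (N+1) such units. As a worked check of that arithmetic:

$$
4N = \underbrace{N}_{\text{ungated}} + \underbrace{N}_{\text{gated}} + \underbrace{2N}_{\text{gated}},
\qquad
(N+1)\,(N + 2N) = 3N(N+1),
\qquad
N = 8:\; 3 \cdot 8 \cdot 9 = 216 .
$$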
Optionally, the high-order selector group unit 224 includes a plurality of high-order selectors 2241, which are used to gate values in the sign-extended first high-order partial product.
It should be noted that the high-order selectors 2241 gate values in the same way as the low-order selectors 2221, so the gating method of the high-order selectors 2241 is not described in detail in this embodiment.
In the data processor provided by this embodiment, the low-order selector group unit gates values of the low-order partial product to form the sign-extended first low-order partial product; the first partial products of the target codes are derived from it and accumulated by the compression circuit to obtain target operation results in different modes. Because the data processor supports several operation modes, its versatility is improved.
Fig. 5 is a schematic structural diagram of a data processor according to another embodiment. The data processor includes a first compression circuit 24, which includes a modified Wallace tree group unit 241 and an accumulation unit 242; the output of the modified Wallace tree group unit 241 is connected to the input of the accumulation unit 242. The modified Wallace tree group unit 241 is configured to accumulate the values in each column of the first partial products of all target codes obtained when data are processed in the different modes, producing an accumulation result, and the accumulation unit 242 is configured to add the components of that accumulation result.
Specifically, the modified Wallace tree group unit 241 may accumulate, column by column, the values of the first low-order and first high-order partial products of the target codes obtained by the first partial product obtaining circuit 22, and the accumulation unit 242 adds the two results produced by the unit 241 to obtain the target operation result. When the modified Wallace tree group unit 241 performs the accumulation, the first partial products of all target codes may be distributed such that the lowest-order value of each row is one bit to the right of the lowest-order value of the next row, while the highest-order value of every row lies in the same column as the highest-order value of the first partial product of the first target code. Optionally, the unit 241 may accumulate the values in each column according to this distribution. Optionally, the two results produced by the unit 241 may be a sum output signal Sum and a carry output signal Carry.
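The division of labour between the Wallace tree (reduce many rows to a Sum row and a Carry row without propagating carries) and the accumulation unit (one final carry-propagate addition) can be modelled at word level. This sketch uses rows of full adders, i.e. 3:2 carry-save compressors, rather than the 4-2 compressors the text also allows, and it does not reproduce the "modified" tree's exact column wiring; `carry_save_reduce` and `final_add` are hypothetical names.

```python
from typing import List, Tuple


def carry_save_reduce(rows: List[int], width: int) -> Tuple[int, int]:
    """Reduce any number of width-bit rows to a (Sum, Carry) pair with 3:2 compressors."""
    mask = (1 << width) - 1
    pending = [r & mask for r in rows]
    while len(pending) > 2:
        nxt: List[int] = []
        while len(pending) >= 3:
            a, b, c = pending.pop(), pending.pop(), pending.pop()
            nxt.append(a ^ b ^ c)  # sum bit of a full adder in every column
            # majority gives each column's carry, which moves one column to the left
            nxt.append((((a & b) | (a & c) | (b & c)) << 1) & mask)
        nxt.extend(pending)  # rows not grouped at this level pass straight through
        pending = nxt
    pending += [0] * (2 - len(pending))
    return pending[0], pending[1]


def final_add(sum_row: int, carry_row: int, width: int) -> int:
    """The single carry-propagate addition performed after the tree."""
    return (sum_row + carry_row) & ((1 << width) - 1)
```

Each 3:2 step preserves the total (a + b + c equals the XOR row plus twice the majority row), so `final_add(*carry_save_reduce(rows, w), w)` always matches `sum(rows)` modulo 2**w.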
Optionally, the second compression circuit 25 includes a modified Wallace tree group unit 251 and an accumulation unit 252; the output of the modified Wallace tree group unit 251 is connected to the input of the accumulation unit 252. The modified Wallace tree group unit 251 is configured to accumulate the values in each column of the second partial products of all target codes obtained when data are processed in the different modes, producing an accumulation result, and the accumulation unit 252 is configured to add the components of that accumulation result.
It should be noted that the first compression circuit 24 compresses the first partial products of the target codes in the same way as the second compression circuit 25 compresses the second partial products, so the compression method of circuit 25 is not repeated in this embodiment. In addition, circuits 24 and 25 have identical internal structures and external port functions, so the detailed structure of the second compression circuit 25 is not repeated either.
In the data processor provided by this embodiment, the modified Wallace tree group unit accumulates the first low-order and first high-order partial products of the target codes to obtain an accumulation result, and the accumulation unit then adds the components of that result to obtain the target operation result.
In one embodiment, with continued reference to the detailed structural diagram of the data processor in Fig. 5, the modified Wallace tree group unit 241 includes low-order Wallace tree subunits 2411, a selector 2412 and high-order Wallace tree subunits 2413. The output of the low-order Wallace tree subunits 2411 is connected to the input of the selector 2412, and the output of the selector 2412 is connected to the input of the high-order Wallace tree subunits 2413. The low-order Wallace tree subunits 2411 accumulate the values in each of their columns of the first partial products of the target codes, the selector 2412 gates the carry input signal received by the high-order Wallace tree subunits 2413, and the high-order Wallace tree subunits 2413 accumulate the values in each of their columns of the first partial products of the target codes to produce the accumulation result.
Specifically, each low-order Wallace tree subunit 2411 may be built from a combination of full adders and half adders, or from a combination of 4-2 compressors, and the same applies to each high-order Wallace tree subunit 2413. Both kinds of subunit can be understood as circuits that take a multi-bit input signal and sum it into a two-bit output signal. Optionally, the number of high-order Wallace tree subunits 2413 in the modified Wallace tree group unit 241 may be equal to the multiplicand bit width N of the multiplication or multiply-accumulate operation the data processor currently performs, or equal to the number of low-order Wallace tree subunits 2411; the low-order Wallace tree subunits 2411 may be connected in series, and so may the high-order Wallace tree subunits 2413. Optionally, the output of the last low-order Wallace tree subunit 2411 is connected to the input of the selector 2412, and the output of the selector 2412 is connected to the input of the first high-order Wallace tree subunit 2413. Optionally, in the modified Wallace tree group unit 241, each low-order Wallace tree subunit 2411 may add the values in its corresponding column of the partial products of all target codes and output two signals, a carry signal Carry_i and a sum signal Sum_i, where i is the index of the subunit and the first low-order Wallace tree subunit 2411 has index 0. Optionally, the number of input signals received by each low-order Wallace tree subunit 2411 may be equal to the number of first partial products of the target codes.
In the modified Wallace tree group unit 241, the total number of high-order Wallace tree subunits 2413 and low-order Wallace tree subunits 2411 may be equal to 2N, and the first partial products of all target codes may span 2N columns from the lowest to the highest column. The N low-order Wallace tree subunits 2411 may accumulate the lower N columns of the first partial products of all target codes, and the N high-order Wallace tree subunits 2413 may accumulate the upper N columns.
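The role of the selector 2412 between the lower N columns and the upper N columns resembles a partitioned adder. The sketch below assumes that in 2N-bit mode the selector passes the carries out of the low half into the high half, and in N-bit mode forces them to 0 so the two halves produce independent N-bit results; the text states only that the selector gates the carry input, so this mode mapping, the function name `split_accumulate` and the flag `wide` are assumptions. Column reduction is collapsed into plain integer sums for brevity.

```python
from typing import List


def split_accumulate(rows: List[int], n: int, wide: bool) -> int:
    """Accumulate 2N-column rows as one 2N-bit lane (wide) or two N-bit lanes (narrow)."""
    lane_mask = (1 << n) - 1
    low_total = sum(r & lane_mask for r in rows)           # work of the lower N columns
    high_total = sum((r >> n) & lane_mask for r in rows)   # work of the upper N columns
    # The selector decides whether carries leaving column N-1 reach column N.
    carry_into_high = (low_total >> n) if wide else 0
    low = low_total & lane_mask
    high = (high_total + carry_into_high) & lane_mask
    return (high << n) | low
```

With rows `[0xA5, 0x3C, 0xFF, 0x01]` and N = 4, wide mode returns the ordinary 8-bit sum, while narrow mode returns the two 4-bit lane sums side by side with no carry leaking between them.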
Optionally, the modified Wallace tree group unit 251 in the second compression circuit 25 includes low-order Wallace tree subunits 2511, a selector 2512, and high-order Wallace tree subunits 2513. The output of the low-order Wallace tree subunits 2511 is connected to the input of the selector 2512, and the output of the selector 2512 is connected to the input of the high-order Wallace tree subunits 2513. The low-order Wallace tree subunits 2511 accumulate the values in each lower column of the second partial products of the target code; the selector 2512 gates the carry input signal received by the high-order Wallace tree subunits 2513; and the high-order Wallace tree subunits 2513 accumulate the values in each upper column of the second partial products of the target code to obtain the accumulation result.
It should be noted that the modified Wallace tree group unit 251 in the second compression circuit 25 has the same circuit structure and function as the modified Wallace tree group unit 241 in the first compression circuit 24, so its detailed structure is not repeated in this embodiment.
With the data processor provided by this embodiment, the modified Wallace tree group unit accumulates the partial products of the target codes into two output signals, and these two signals are then added to obtain the data operation result for each mode. Because a single data processor can perform data operations in different modes, its versatility is improved and the AI chip area it occupies is effectively reduced. In addition, the data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; either multiplication or multiply-accumulate is completed in a single operation pass, which reduces the power consumption of the data processor.
Another embodiment provides a data processor in which the accumulation unit 242 includes an adder 2421 configured to add the accumulated result.
Specifically, the adder 2421 may be an adder of any of several bit widths. Optionally, the adder 2421 receives the two signals output by the modified Wallace tree group unit 241, adds them, and outputs the data operation result for the current processing mode of the data processor. Alternatively, the adder 2421 may be a carry-lookahead adder.
In the data processor provided by this embodiment, the accumulation unit adds the two signals output by the modified Wallace tree group unit and outputs the data operation result for each mode. The data processor does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; multiplication or multiply-accumulate is completed in a single operation pass, which reduces the power consumption of the data processor.
In one embodiment, the adder 2421 of the data processor includes a carry signal input port 2421a, a sum signal input port 2421b, and an operation result output port 2421c. The carry signal input port 2421a receives the carry signal, the sum signal input port 2421b receives the sum signal, and the operation result output port 2421c outputs the result of adding the carry signal and the sum signal.
Specifically, the adder 2421 may receive the carry signal Carry output by the modified Wallace tree group unit 241 through the carry signal input port 2421a and the sum signal Sum output by the modified Wallace tree group unit 241 through the sum signal input port 2421b, and then output the result of adding Carry and Sum through the operation result output port 2421c.
It should be noted that, during operation, the data processor may use an adder 2421 of an appropriate bit width to add the carry output signal Carry and the sum output signal Sum of the modified Wallace tree group unit 241. When the data processor performs a multiplication or multiply-accumulate operation, the data bit width handled by the adder 2421 may be equal to twice the bit width of the multiplicand.
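The text allows the adder 2421 to be a carry-lookahead adder of width 2N. A minimal sketch of one well-known carry-lookahead scheme, the Kogge-Stone parallel-prefix adder, is shown below at word level; choosing Kogge-Stone specifically is an assumption for illustration, and the function name `cla_add` is hypothetical. It adds the Carry and Sum rows modulo 2**width.

```python
def cla_add(carry_row: int, sum_row: int, width: int) -> int:
    """Add two width-bit rows with Kogge-Stone carry lookahead; the final carry-out is dropped."""
    mask = (1 << width) - 1
    a, b = carry_row & mask, sum_row & mask
    g = a & b          # generate: this column produces a carry on its own
    p = a ^ b          # propagate: this column passes an incoming carry upward
    half_sum = p       # per-column sum before carries are applied
    d = 1
    while d < width:
        # merge each column's (g, p) with the group that ends d columns below it
        g = g | (p & (g << d))
        p = p & (p << d)
        d <<= 1
    # after the prefix steps, g bit i is the carry out of columns 0..i
    carries = (g << 1) & mask
    return (half_sum ^ carries) & mask


# Usage: with N = 8, the adder is 2N = 16 bits wide.
result = cla_add(0x1234, 0x0FF0, width=16)
```

All carries are computed in about log2(width) combining steps instead of rippling column by column, which is the point of a lookahead design.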
In the data processor provided by this embodiment, the adder in the accumulation unit adds the carry and sum signals output by the modified Wallace tree group unit and outputs the data operation result for each mode. As a result, the multiplication result does not need a separate accumulation pass; multiplication or multiply-accumulate is completed in a single operation pass, which reduces the power consumption of the data processor.
Fig. 6 is a flowchart of a data processing method according to an embodiment, which can be executed by the data processors shown in Fig. 1 and Fig. 3. This embodiment relates to implementing data operations in four different modes. As shown in Fig. 6, the method includes:
S101. Receive data to be processed and a function selection mode signal, where the function selection mode signal indicates which mode of data operation the data processor is currently to perform.
Specifically, the data to be processed may include the multiplier and multiplicand of a multiplication or multiply-accumulate operation. Optionally, the data processor may receive one piece of data to be processed through the first modified coding sub-circuit and the second modified coding sub-circuit; this data may contain two pieces of sub-data to be processed, which may be identical or different but have the same bit width. Optionally, the two pieces of sub-data may be spliced together and input to the first and second modified coding sub-circuits, or input to them separately at the same time. Each piece of sub-data may be a fixed-point number with a bit width of 2N, so the spliced data may be 4N bits wide.
It should be noted that the first multiplication circuit and the second multiplication circuit can both receive the same function selection mode signal. There are four function selection mode signals, corresponding to the four modes of data operation the data processor can handle: N-bit × N-bit multiplication, N-bit × N-bit multiply-accumulate, 2N-bit × 2N-bit multiplication, and 2N-bit × N-bit multiply-accumulate. From the received function selection mode signal, the data processor determines which mode of data operation it is currently to perform. In addition, when the data processor performs a multiplication or multiply-accumulate operation, one piece of sub-data in the data to be processed may serve as the multiplier and the other as the multiplicand.
S102. Determine, according to the function selection mode signal, whether the data to be processed needs to be split.
Specifically, the data processor may determine from the received function selection mode signal the bit width of data it can currently process, and thereby decide whether to split the data to be processed. Splitting means dividing the data to be processed into several groups of data with the same bit width.
Optionally, the step in S102 of determining whether the data to be processed needs to be split may include: determining, according to the function selection mode signal, whether the bit width of the data to be processed equals the bit width of the data the data processor can currently process in the corresponding mode.
Optionally, after the determination in S102, the method may further include: if the data to be processed does not need to be split, performing regular signed number encoding on it directly to obtain the target code.
It should be noted that determining, according to the function selection mode signal, whether the data to be processed needs to be split amounts to determining whether the bit width of the data to be processed equals the bit width of the data the data processor can currently process in the corresponding mode: if they are equal, no split is needed; otherwise, the data must be split. For example, if the two pieces of data received by the first and second modified coding sub-circuits are both N bits wide and the data processor can currently perform N-bit × N-bit multiplication, the bit width of the data to be processed equals the processable bit width for that mode. Regular signed number encoding can be characterized as an encoding process that uses the digit values 0, -1, and 1. Optionally, the bit width of the target code may be equal to the bit width of the data currently processed by the data processor plus 1.
S103. If the data to be processed needs to be split, split it to obtain split data.
For example, if the two pieces of data received by the first and second modified coding sub-circuits are each 2N bits wide and the data processor can currently perform N-bit × N-bit multiplication, the first and second modified coding sub-circuits automatically split each received piece of data into its high N bits and low N bits, so as to match the data bit width of the mode the data processor can currently handle.
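Steps S102 and S103 together can be sketched as follows, assuming the operand is handled as a raw unsigned bit pattern and that its width is either the processable width N or exactly 2N (the only cases the text describes). The function name `prepare_operand` is hypothetical.

```python
def prepare_operand(value: int, data_width: int, proc_width: int) -> list[int]:
    """S102/S103 sketch: keep the operand whole, or split it into [high, low] halves."""
    if data_width == proc_width:
        return [value]                      # widths match: no split, encode directly
    if data_width != 2 * proc_width:
        raise ValueError("sketch only handles data_width == proc_width or 2 * proc_width")
    mask = (1 << proc_width) - 1
    high = (value >> proc_width) & mask     # upper N bits
    low = value & mask                      # lower N bits
    return [high, low]


# Usage: a 16-bit operand while the processor is in an 8-bit mode.
halves = prepare_operand(0xABCD, data_width=16, proc_width=8)
```

Only the bit pattern is split here; how each half's sign is interpreted afterwards is left to the encoding stage.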
S104. Perform regular signed number encoding on the split data to obtain the target code.
Optionally, the step in S104 may include: converting every run of l consecutive 1 bits in the split data (l ≥ 2) into an (l+1)-digit group whose highest digit is 1, whose lowest digit is -1, and whose remaining digits are 0, thereby obtaining the target code.
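The replacement preserves the numeric value. For a run of l ones occupying bit positions q through q+l-1 (the position index q is introduced here only for this derivation), the geometric series gives

```latex
\sum_{k=q}^{q+l-1} 2^{k}
  \;=\; 2^{q}\left(2^{l}-1\right)
  \;=\; 1\cdot 2^{q+l} \;+\; \sum_{k=q+1}^{q+l-1} 0\cdot 2^{k} \;+\; (-1)\cdot 2^{q},
  \qquad l \ge 2
```

so the (l+1)-digit group 1(0)^(l-1)(-1) has exactly the weight of the run. For example, with q = 0 and l = 3, binary 111 = 7 = 8 - 1, which is the group 100(-1).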
Specifically, if the data to be processed received by the data processor is 2N bits wide and the data processor can currently process N-bit data, both the first and second modified coding sub-circuits automatically split the 2N-bit data into high N-bit data and low N-bit data, and perform regular signed number encoding on each of them to obtain the corresponding high-order target code and low-order target code. Optionally, after splitting, the data to be processed may thus comprise high N-bit data and low N-bit data: when the data to be processed is 2N bits wide, its high N bits may be called the high-order data to be processed, and its low N bits the low-order data to be processed.
S105. Perform conversion according to the target code and the split data to obtain sign-extended partial products.
Specifically, the conversion can be characterized as converting each digit of the target code, based on the multiplicand of the multiplication, into a sign-extended partial product. Optionally, the bit width of each sign-extended partial product may be equal to twice the bit width of the data currently processed by the data processor.
S106. Determine, according to the function selection mode signal, whether the sign-extended partial products need to be exchanged.
Optionally, the step in S106 may include: determining, according to the function selection mode signal, whether the bit widths of the data currently processed by the data processor are the same.
In particular, when the data processor performs data operations in any of the three modes other than 2N-bit × N-bit, the partial product exchange circuit is idle, and the sign-extended low-order and high-order partial products are not exchanged. Meanwhile, the two pieces of sub-data in the first data and in the second data are each 2N bits wide. If the data processor is currently to perform one N-bit × N-bit multiplication, then, as required, one of the first data and the second data is 0, and in the other, the high-order values of both pieces of sub-data are all 0 or the low-order values are all 0; alternatively, the first data and the second data can be computed as the original data, as required. If the data processor is currently to perform one 2N-bit × 2N-bit multiplication, one of the first data and the second data is 0, and in the other, both the high-order and low-order values of the two pieces of sub-data are non-zero. If the data processor is currently to perform two 2N-bit × 2N-bit multiplications, neither the first data nor the second data is 0.
It should be noted that determining whether the bit widths of the data currently processed by the data processor are the same is, in effect, determining whether the bit widths of the multiplicand and the multiplier currently being processed are equal.
Optionally, after the determination in S106, the method may further include: if the sign-extended partial products need to be exchanged, exchanging the high-order partial products or the low-order partial products among them.
S107. If the sign-extended partial products do not need to be exchanged, use them as the partial products of the target code.
Specifically, if no exchange is needed, the first modified coding sub-circuit may use the first partial products obtained by sign extension as the first partial products of the target code, and the second modified coding sub-circuit may use the second partial products obtained by sign extension as the second partial products of the target code.
S108. Compress the partial products of the target code to obtain the target operation result.
Specifically, the data processor may accumulate the values in each column of the partial products of all target codes to obtain the target operation result. Optionally, the bit width of the target operation result may be equal to twice the bit width of the data currently processed by the data processor.
The data processing method provided by this embodiment receives data to be processed and a function selection mode signal; determines from the mode signal whether the data needs to be split and, if so, splits it; performs regular signed number encoding on the split data to obtain target codes; converts the target codes and split data into sign-extended partial products; determines from the mode signal whether those partial products need to be exchanged and, if not, uses them as the partial products of the target codes; and compresses the partial products of the target codes. With this method the data processor can perform both multiplication and multiply-accumulate operations, which improves its versatility. In addition, a multiply-accumulate operation does not require a separate accumulation pass over the multiplication result; multiplication or multiply-accumulate is completed in a single operation pass, which reduces the power consumption of the data processor. Finally, because the received data is encoded as regular signed numbers, fewer effective (non-zero) partial products are produced, which reduces the complexity of the multiplication or multiply-accumulate operation.
As an embodiment, the step in S104 of performing regular signed number encoding on the split data to obtain the target code may include:
S1041. Perform regular signed number encoding on the split data to obtain an intermediate code.
Specifically, the split data subjected to the regular signed number encoding processing may be a multiplier in a multiplication operation or a multiply-accumulate operation.
S1042. Obtain the target code according to the intermediate code and the function selection mode signal.
Specifically, regular signed number encoding can be characterized as follows. For an N-bit multiplier, the bits are processed from the low-order end toward the high-order end. If there is a run of l consecutive 1 bits (l ≥ 2), the run is converted into the (l+1)-digit string "1(0)^(l-1)(-1)", that is, a 1 followed by l-1 zeros and then -1, and the remaining (N-l) bits are combined with the converted (l+1) digits to form new data. The new data is then used as the input of the next conversion stage, until the new data contains no run of l (l ≥ 2) consecutive 1s. After regular signed number encoding of an N-bit multiplier, the resulting target code may be (N+1) bits wide. For instance, 11 can be converted as 100 - 001, so 11 is equivalent to 10(-1); 111 can be converted as 1000 - 0001, so 111 is equivalent to 100(-1); other runs of l (l ≥ 2) consecutive 1s are converted in the same way.
For example, suppose the multiplier received by the first or second modified coding sub-circuit of the data processor is "001010101101110". The first conversion stage yields the first new data "0010101011100(-1)0"; the second stage, applied to it, yields "0010101100(-1)00(-1)0"; the third stage yields "0010110(-1)00(-1)00(-1)0"; the fourth stage yields "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth stage yields "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l ≥ 2) consecutive 1s, so it may be called the initial code; after one bit-padding step, regular signed number encoding is complete and the intermediate code is obtained. The bit width of the initial code may be equal to the bit width of the multiplier. Optionally, after the first or second modified coding sub-circuit encodes the multiplier and obtains the new data (the initial code), if the highest and next-highest digits of the new data are "10" or "01", the sub-circuit may pad a 0 above the highest digit, so that the top three digits of the intermediate code are "010" or "001", respectively.
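A minimal sketch of the stage-by-stage conversion described above, assuming the multiplier is given as an MSB-first bit string and that one extra 0 digit is reserved on top so the result has N+1 digits, like the intermediate code. Each pass finds the lowest run of two or more consecutive 1s and replaces it with 1(0)^(l-1)(-1), as in the example; the result is what is usually called canonical signed digit (CSD) form. The names `csd_encode` and `signed_value` are hypothetical.

```python
def csd_encode(multiplier: str) -> list[int]:
    """Recode an MSB-first binary string into N+1 signed digits (-1, 0, 1), MSB first."""
    digits = [int(b) for b in reversed(multiplier)] + [0]  # LSB first, plus one top digit
    while True:
        # find the lowest run of two or more consecutive 1 digits
        run = None
        i, n = 0, len(digits)
        while i < n:
            if digits[i] == 1:
                j = i
                while j < n and digits[j] == 1:
                    j += 1
                if j - i >= 2:
                    run = (i, j)            # run occupies positions i .. j-1
                    break
                i = j
            else:
                i += 1
        if run is None:
            break                           # no run of l >= 2 ones is left
        start, end = run
        digits[start] = -1                  # lowest digit of the group becomes -1
        for k in range(start + 1, end):
            digits[k] = 0                   # middle digits become 0
        if end == len(digits):
            digits.append(0)                # defensive: make room for the new top digit
        digits[end] = 1                     # the digit above the run becomes 1
    return list(reversed(digits))


def signed_value(code: list[int]) -> int:
    """Numeric value of an MSB-first signed-digit code."""
    value = 0
    for d in code:
        value = 2 * value + d
    return value


code = csd_encode("001010101101110")
```

Running `csd_encode("001010101101110")` reproduces the fifth-stage data from the example with the padded 0 digit on top, and `signed_value` confirms that both forms equal 5486.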
Optionally, the bit width of the intermediate code may be equal to the bit width of the data currently processed by the data processor plus 1.
In addition, if the data received by the data processor is 2N bits wide and the data processor can currently process N-bit data operations, the first or second modified coding sub-circuit may split the 2N-bit data into two groups of N-bit data and operate on each group; the two resulting (N+1)-bit intermediate codes are then combined to form the target code. If the data processor can currently process 2N-bit data operations, the first or second modified coding sub-circuit pads a 0 above the highest digit of the resulting (2N+1)-bit intermediate code and uses the padded (2N+2)-bit data as the target code.
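The two ways of forming the target code can be sketched as below. Note that both paths give a target code of the same width, since 2(N+1) = 2N+2. The sketch assumes digits are listed MSB first and that the high-order intermediate code is placed above the low-order one when they are combined; that ordering and the name `assemble_target_code` are assumptions.

```python
def assemble_target_code(codes: list[list[int]], n: int) -> list[int]:
    """Form the (2N+2)-digit target code, MSB first, from intermediate codes.

    N-bit mode:  codes = [high, low], each an (N+1)-digit intermediate code.
    2N-bit mode: codes = [code], a single (2N+1)-digit intermediate code.
    """
    if len(codes) == 2:
        high, low = codes
        if len(high) != n + 1 or len(low) != n + 1:
            raise ValueError("each intermediate code must have N+1 digits")
        return high + low                  # combine the two (N+1)-digit codes
    if len(codes) == 1 and len(codes[0]) == 2 * n + 1:
        return [0] + codes[0]              # pad one 0 above the highest digit
    raise ValueError("expected two (N+1)-digit codes or one (2N+1)-digit code")


# Usage with N = 2: two 3-digit codes, or one 5-digit code.
split_mode = assemble_target_code([[0, 1, 0], [1, 0, -1]], n=2)
wide_mode = assemble_target_code([[1, 0, -1, 0, 1]], n=2)
```

Keeping one target-code width for every mode means the downstream partial-product hardware can be shared.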
The data processing method provided by this embodiment performs regular signed number encoding on the split data to obtain an intermediate code and then obtains the target code from the intermediate code and the function selection mode signal. The method can therefore perform multiplication and multiply-accumulate operations on data of different bit widths, effectively reducing the AI chip area occupied by the data processor. At the same time, regular signed number encoding reduces the number of effective partial products generated during the operation, which lowers the complexity of multiplication or multiply-accumulate and improves operation efficiency.
In one embodiment, the step in S105 of performing conversion according to the target code and the split data to obtain sign-extended partial products may include:
S1051. Convert the split data according to the target code to obtain original partial products.
Specifically, if the split data is X, then a target-code digit of -1 gives the original partial product -X, a digit of 1 gives X, and a digit of 0 gives 0.
S1052. Perform sign bit extension on the original partial products to obtain the sign-extended partial products.
Specifically, the bit width of an original partial product may be equal to the bit width N of the data currently processed by the data processor, and the bit width of a sign-extended partial product may be equal to 2N. The N bits of the original partial product become the low N bits of the sign-extended partial product, and each of the high N bits of the sign-extended partial product equals the highest bit of the original partial product, i.e. its sign bit.
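Steps S1051 and S1052 can be sketched at bit-pattern level as follows, assuming X is a signed N-bit two's-complement value. One caveat applies to this sketch: when X = -2^(N-1) and the digit is -1, the value -X does not fit in N bits, so that corner case is outside what it handles. The names `raw_partial_product`, `sign_extend` and `to_signed` are hypothetical.

```python
def raw_partial_product(digit: int, x: int, n: int) -> int:
    """S1051: select -X, 0 or X for one target-code digit, as an n-bit two's-complement pattern."""
    if digit not in (-1, 0, 1):
        raise ValueError("target-code digits must be -1, 0 or 1")
    return (digit * x) & ((1 << n) - 1)   # assumes -2**(n-1) < digit*x < 2**(n-1)


def sign_extend(pattern: int, n: int) -> int:
    """S1052: widen an n-bit pattern to 2n bits by copying its sign bit into the high n bits."""
    sign = (pattern >> (n - 1)) & 1
    high = (1 << n) - 1 if sign else 0
    return (high << n) | pattern


def to_signed(pattern: int, bits: int) -> int:
    """Helper for checking: read a bit pattern as a signed two's-complement integer."""
    return pattern - (1 << bits) if (pattern >> (bits - 1)) & 1 else pattern


# Usage with N = 8: digit -1 applied to X = 37 gives the 16-bit pattern for -37.
pp = sign_extend(raw_partial_product(-1, 37, 8), 8)
```

Because every partial product now has the full 2N-bit width, the compression stage can add them directly without special handling of sign bits.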
The data processing method provided by the embodiment can acquire fewer effective partial products, so that the complexity of multiplication or multiply-accumulate operation is reduced.
As an embodiment, the step in S108 of compressing the partial products of the target code to obtain the target operation result may include:
S1081. Accumulate the partial products of the target code to obtain an intermediate operation result.
For example, number the digits of the low-order target code (N+1 digits wide) from the lowest to the highest as 1 to N+1, and number the corresponding low-order partial products of the target code in the same way; likewise, number the digits of the high-order target code (M+1 digits wide) from the lowest to the highest as 1 to M+1, and number the corresponding high-order partial products in the same way. The distribution of the low-order and high-order partial products of all target codes can then be characterized as follows: the lowest bit of the high-order partial product numbered 1 is in the same column as the next-lowest bit of the low-order partial product numbered N+1; taking the first high-order partial product as the reference, the next-lowest bit of each high-order partial product is in the same column as the lowest bit of the following high-order partial product; and taking the first low-order partial product as the reference, the next-lowest bit of each low-order partial product is in the same column as the lowest bit of the following low-order partial product.
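Each partial product sits one column higher than the one before it, so digit i of the target code contributes d_i × X shifted left by i columns. The sketch below shows only this general shift-and-add alignment, not the exact interleaving of low-order and high-order partial products described above. It assumes a 2N-bit result field, signed digits listed LSB first, and hypothetical function names.

```python
def aligned_rows(digits_lsb_first: list[int], x: int, width: int) -> list[int]:
    """One row per signed digit d_i: the partial product d_i * X shifted left by i columns."""
    mask = (1 << width) - 1
    return [((d * x) << i) & mask for i, d in enumerate(digits_lsb_first)]


def column_sum(rows: list[int], width: int) -> int:
    """Add the rows column by column, as the Wallace tree plus final adder do together."""
    mask = (1 << width) - 1
    total = 0
    for col in range(width):
        ones = sum((r >> col) & 1 for r in rows)  # number of 1 bits in this column
        total += ones << col                      # each column has weight 2**col
    return total & mask


# Usage: 7 recoded as 100(-1) is [-1, 0, 0, 1] LSB first; multiply by X = 5 in an 8-bit field.
product = column_sum(aligned_rows([-1, 0, 0, 1], 5, 8), 8)
```

Zero digits produce all-zero rows, which is why a recoding with fewer non-zero digits means fewer effective partial products to compress.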
It should be noted that the modified Wallace tree group unit may accumulate the values in each column of the partial products of all target codes.
S1082. Accumulate the intermediate operation result through the accumulation unit to obtain the target operation result.
Optionally, the step in S1082 of accumulating the intermediate operation result through the accumulation unit to obtain the target operation result may specifically include: the low-order Wallace tree subunits accumulate the column values in the partial products of all target codes to obtain an accumulation result; the selector gates the accumulation result according to the function selection mode signal to obtain a carry gating signal; and the high-order Wallace tree subunits accumulate according to the carry gating signal and the column values in the partial products of the target code to obtain the target operation result.
Specifically, according to the distribution of the low-order and high-order partial products of all target codes, the values of the partial products of all target codes span 2N columns in total (N being the bit width of the data currently processed by the data processor). Numbering the columns from the lowest bit upward as 0, …, 2N-1, columns 0 to N-1 can be called the low N columns. Optionally, the accumulation result may be the carry output signal Cout of the last low-order Wallace tree subunit.
It should be noted that the N low-order Wallace tree subunits may accumulate the low N columns of values in numbering order to obtain the accumulation result. Optionally, the accumulation result may include the carry output signal Carry and sum output signal Sum of each Wallace tree subunit, and the carry output signal Cout of the last low-order Wallace tree subunit.
It will be appreciated that the selector in the modified Wallace tree group unit may, according to the received function selection mode signal, select either the output signal Cout of the last low-order Wallace tree subunit or the value 0, to obtain the carry gating signal.
In this embodiment, according to the distribution of the partial products of all target codes, their values span 2N columns in total (N being the bit width of the data currently processed by the data processor). Numbering the columns from the lowest bit upward as 0, …, 2N-1, columns N to 2N-1 can be called the high N columns.
It should be noted that the N high-order Wallace tree subunits may accumulate the high N columns of values in numbering order and output the accumulation result. The carry input signal received by the first high-order Wallace tree subunit may be the carry gating signal output by the selector. If the data processor is currently processing 8-bit data operations, the circuit structure of the corresponding modified compression sub-circuit is shown in Fig. 7.
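The effect of the selector can be illustrated on a plain adder split into a low half and a high half. When the selector passes Cout, the halves behave as one 2N-bit addition. When it passes 0, they behave as two independent N-bit additions. The sketch models this gating at adder level rather than inside the Wallace tree columns. It is an assumption that 0 is selected in the modes where the two halves hold independent results, and `gated_add` is a hypothetical name.

```python
def gated_add(a: int, b: int, n: int, link_halves: bool) -> int:
    """Add two 2n-bit rows as a low half and a high half joined through a carry selector."""
    mask = (1 << n) - 1
    low = (a & mask) + (b & mask)
    cout = low >> n                           # carry out of the low n columns
    carry_in = cout if link_halves else 0     # selector: pass Cout or force 0
    high = ((a >> n) & mask) + ((b >> n) & mask) + carry_in
    return ((high & mask) << n) | (low & mask)


# Usage with N = 8: the low-half overflow crosses into the high half only when linked.
linked = gated_add(0x00FF, 0x0001, 8, link_halves=True)
isolated = gated_add(0x00FF, 0x0001, 8, link_halves=False)
```

A single gated carry path is what allows one set of subunits to serve both the wide mode and the packed narrow modes.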
In the data processing method provided by this embodiment, the modified Wallace tree group unit accumulates the partial products of the target code to obtain an intermediate operation result, and the accumulation circuit accumulates that intermediate result to obtain the target operation result. The method can therefore multiply data of different bit widths according to the function selection mode signal received by the data processor, effectively reducing the AI chip area occupied by the data processor. It also produces fewer effective partial products, which reduces the complexity of multiplication or multiply-accumulate operations and improves operation efficiency. In addition, a multiply-accumulate operation does not require a separate accumulation pass over the multiplication result; multiplication or multiply-accumulate is completed in a single operation pass, effectively reducing the power consumption of the data processor.
Fig. 8 is a flowchart of a data processing method according to an embodiment, which can be executed by the data processors shown in Fig. 2 and Fig. 5. This embodiment relates to implementing data operations in four different modes. As shown in Fig. 8, the method includes:
S201. Receive data to be processed and a function selection mode signal, where the function selection mode signal indicates the mode of data operation the data processor can currently perform.
Specifically, the data processor may receive one piece of data to be processed through the regular signed number encoding circuit and another piece of data to be processed through the first partial product obtaining circuit and the second partial product obtaining circuit, and the regular signed number encoding circuit, the first partial product obtaining circuit and the second partial product obtaining circuit may all receive the same function selection mode signal at the same time. Optionally, each piece of data to be processed may include two pieces of sub-data to be processed, which may be identical or different but have the same bit width. Optionally, the two pieces of sub-data in one piece of data to be processed may be spliced into a whole and input to the regular signed number encoding circuit, or input to it separately and simultaneously; likewise, the two pieces of sub-data in the other piece of data to be processed may be spliced into a whole and input simultaneously to the first partial product obtaining circuit and the second partial product obtaining circuit, or input to them separately and simultaneously. The sub-data to be processed may be fixed-point numbers with a bit width of 2N, so the data obtained by splicing two pieces of sub-data may be 4N bits wide.
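A minimal sketch of the splicing described above, assuming the sub-data are handled as unsigned bit patterns and that the first sub-datum occupies the high half of the spliced word (the text does not fix the ordering). The function names `splice` and `unsplice` are hypothetical.

```python
def splice(high_sub: int, low_sub: int, n: int) -> int:
    """Concatenate two 2n-bit sub-data patterns into one 4n-bit word (high_sub on top)."""
    width = 2 * n
    mask = (1 << width) - 1
    # Reject negative values or values wider than 2n bits.
    if high_sub & ~mask or low_sub & ~mask:
        raise ValueError("each sub-datum must fit in 2n bits")
    return (high_sub << width) | low_sub


def unsplice(word: int, n: int) -> tuple[int, int]:
    """Recover the two 2n-bit sub-data patterns from a 4n-bit word."""
    width = 2 * n
    mask = (1 << width) - 1
    return (word >> width) & mask, word & mask


if __name__ == "__main__":
    n = 4  # each sub-datum is 2n = 8 bits, the spliced word is 4n = 16 bits
    word = splice(0xA5, 0x3C, n)
    print(hex(word))           # 0xa53c
    print(unsplice(word, n))   # (165, 60)
```

Whether the hardware splices the sub-data or feeds them separately, the two views carry the same bits; the sketch only makes the 2N/4N width relationship concrete.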
It should be noted that there may be four function selection mode signals, corresponding respectively to the four modes of data operation that the data processor can process: N-bit × N-bit multiplication, N-bit × N-bit multiply-accumulate, 2N-bit × 2N-bit multiplication, and 2N-bit × N-bit multiply-accumulate. In addition, when the data processor performs a multiplication or multiply-accumulate operation, the sub-data in one piece of data to be processed may serve as the multiplier, and the sub-data in the other piece of data to be processed may serve as the multiplicand.
S202, according to the function selection mode signal, performing regular signed number coding processing on the data to be processed to obtain a target code.
Optionally, the step in S202 of performing regular signed number coding on the data to be processed according to the function selection mode signal to obtain the target code includes: according to the function selection mode signal, converting every run of l consecutive 1 bits in the data to be processed into an (l+1)-bit value whose highest bit is 1, whose lowest bit is -1 and whose remaining bits are 0, thereby obtaining the target code, where l is greater than or equal to 2.
Specifically, if the data to be processed received by the data processor is 2N bits wide and the data bit width that the data processor can currently process is N, the regular signed number encoding circuit in the data processor may automatically split the 2N-bit data into high N-bit data and low N-bit data, and perform regular signed number coding on both simultaneously to obtain the corresponding high-order target code and low-order target code.
Further, the regular signed number encoding may proceed as follows. For an N-bit multiplier, the bits are processed from low order to high order. If there is a run of l (l ≥ 2) consecutive 1s, the run is converted into "1(0)^(l-1)(-1)", that is, a 1 followed by l-1 zeros and then -1, and the remaining (N-l) bits are combined with the converted (l+1) bits to form new data. The new data then serves as the input of the next conversion stage, until the new data produced by a stage contains no run of l (l ≥ 2) consecutive 1s. After regular signed number coding of an N-bit multiplier, the bit width of the resulting target code may be (N+1). For example, data 11 can be converted as (100 - 001), that is, 11 is equivalent to 10(-1); data 111 can be converted as (1000 - 0001), that is, 111 is equivalent to 100(-1); runs of other lengths l (l ≥ 2) are converted in the same way.
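The conversion rule rests on a simple identity: a run of l ones equals 2^l - 1. Written as a worked equation, with the digit -1 shown as (-1) as in the text:

$$
\underbrace{1\,1\cdots 1}_{l\ \text{ones}}{}_{(2)}
\;=\; \sum_{k=0}^{l-1} 2^{k}
\;=\; 2^{l} - 2^{0}
\;=\; 1\,\underbrace{0\cdots 0}_{l-1\ \text{zeros}}\,(-1)
$$

For l = 2 this gives 11 = 100 - 001 = 10(-1), i.e. 3 = 4 - 1, and for l = 3 it gives 111 = 1000 - 0001 = 100(-1), i.e. 7 = 8 - 1. The converted form spans l + 1 digit positions, one more than the run it replaces, which is consistent with the (N+1)-bit width of the target code.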
For example, if the multiplier received by the regular signed number encoding circuit is "001010101101110", the first conversion stage yields the first new data "0010101011100(-1)0"; the second stage, applied to the first new data, yields the second new data "0010101100(-1)00(-1)0"; the third stage yields the third new data "0010110(-1)00(-1)00(-1)0"; the fourth stage yields the fourth new data "00110(-1)0(-1)00(-1)00(-1)0"; and the fifth stage yields the fifth new data "010(-1)0(-1)0(-1)00(-1)00(-1)0". The fifth new data contains no run of l (l ≥ 2) consecutive 1s, so it may be taken as the initial code; padding the initial code with one bit completes the regular signed number representation and yields the intermediate code. The bit width of the initial code may equal the bit width of the multiplier. Optionally, after the regular signed number encoding circuit encodes the multiplier to obtain new data (i.e., the initial code), if the highest and next-highest values of the new data are "10" or "01", the circuit may pad a 0 above the highest value, so that the three highest values of the intermediate code are "010" or "001", respectively. Optionally, the bit width of the intermediate code may equal the bit width of the data currently processed by the data processor plus 1.
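The worked example can be checked mechanically. The sketch below (helper names are hypothetical) parses each stage, writing the digit -1 as "(-1)" as the text does, evaluates it as a signed-digit binary number, and confirms that every stage keeps the multiplier's value of 5486 while only the last stage is free of runs of two or more 1s. The last nonzero digit of the fifth stage is read as -1, which is the reading that preserves the value.

```python
import re

# Stage strings from the worked example; "(-1)" denotes the digit -1.
STAGES = [
    "001010101101110",                  # multiplier
    "0010101011100(-1)0",               # first new data
    "0010101100(-1)00(-1)0",            # second new data
    "0010110(-1)00(-1)00(-1)0",         # third new data
    "00110(-1)0(-1)00(-1)00(-1)0",      # fourth new data
    "010(-1)0(-1)0(-1)00(-1)00(-1)0",   # fifth new data (initial code)
]


def parse_digits(text: str) -> list[int]:
    """Split a stage string into signed digits, most significant digit first."""
    return [-1 if tok == "(-1)" else int(tok) for tok in re.findall(r"\(-1\)|[01]", text)]


def digit_value(digits: list[int]) -> int:
    """Evaluate an MSB-first signed-digit binary number with Horner's rule."""
    total = 0
    for d in digits:
        total = 2 * total + d
    return total


def has_run_of_ones(digits: list[int], min_len: int = 2) -> bool:
    """Return True if the digits contain min_len or more consecutive 1s."""
    run = 0
    for d in digits:
        run = run + 1 if d == 1 else 0
        if run >= min_len:
            return True
    return False


if __name__ == "__main__":
    for stage in STAGES:
        digits = parse_digits(stage)
        # Every stage has 15 digits and the value 5486; only the last has no run of 1s.
        print(f"{stage:34} width={len(digits)} value={digit_value(digits)} "
              f"run_of_ones={has_run_of_ones(digits)}")
```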
In addition, if the data received by the data processor is 2N bits wide and the data processor can currently process N-bit data operations, the regular signed number encoding circuit in the data processor splits the 2N-bit data into two groups of N-bit data and encodes each group separately; the two resulting (N+1)-bit intermediate codes may then be concatenated to serve as the target code. If the data processor can currently process 2N-bit data operations, the regular signed number encoding circuit pads one 0 above the highest value of the resulting (2N+1)-bit intermediate code and uses the padded (2N+2)-bit data as the target code.
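A minimal sketch of this encoding path, under stated assumptions: the multiplier is an unsigned bit pattern, digits are listed most significant first, and the spare 0 is placed above the data before conversion rather than padded afterwards. This gives the same (N+1)-digit intermediate code whenever the conversion fits, and leaves room for the carried 1 when the top bits are a run of 1s. `recode` converts the lowest run of l ≥ 2 ones at each stage, as in the worked example; `target_code` then either concatenates two (N+1)-digit codes or pads one (2N+1)-digit code. All names, and the `split` flag standing in for the function selection mode signal, are hypothetical.

```python
def _lowest_run(digits: list[int]):
    """Return (start, end) of the lowest run of >= 2 ones in an LSB-first list, else None."""
    start = None
    for i, d in enumerate(digits + [0]):      # trailing sentinel closes a run at the top
        if d == 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= 2:
                return start, i
            start = None
    return None


def recode(bits: int, width: int) -> list[int]:
    """Encode an unsigned width-bit value into width + 1 digits in {-1, 0, 1}, MSB first."""
    if bits < 0 or bits >> width:
        raise ValueError("value does not fit in the given width")
    digits = [(bits >> i) & 1 for i in range(width)] + [0]   # LSB first, spare top digit
    run = _lowest_run(digits)
    while run is not None:
        start, end = run
        digits[start] = -1                 # lowest bit of the run becomes -1
        for k in range(start + 1, end):
            digits[k] = 0                  # interior bits become 0
        digits[end] = 1                    # a 1 moves into the position above the run
        run = _lowest_run(digits)
    return digits[::-1]


def target_code(bits: int, n: int, split: bool) -> list[int]:
    """Build the (2n + 2)-digit target code for 2n-bit data.

    split=True : N-bit mode; encode the high and low n-bit halves separately
                 and concatenate their (n + 1)-digit codes.
    split=False: 2N-bit mode; encode all 2n bits and pad one 0 on top of the
                 (2n + 1)-digit code.
    """
    if split:
        high, low = bits >> n, bits & ((1 << n) - 1)
        return recode(high, n) + recode(low, n)
    return [0] + recode(bits, 2 * n)


if __name__ == "__main__":
    print(recode(0b001010101101110, 15))   # multiplier from the worked example
    print(target_code(0xB7, 4, split=True))
    print(target_code(0xB7, 4, split=False))
```

Both modes produce 2N + 2 digits, so a fixed-width downstream partial product stage could consume either layout.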
S203, according to the target code and the data to be processed, a first partial product of the target code and a second partial product of the target code are obtained.
Specifically, according to the actual operation requirement, the data processor may obtain the first partial product and the second partial product of the target code from the target code derived from one piece of sub-data to be processed (the multiplier in a multiplication or multiply-accumulate operation) and from the corresponding other sub-data to be processed (the multiplicand in a multiplication or multiply-accumulate operation). The data processor may obtain the first partial product of the target code through the first partial product obtaining circuit, and the second partial product of the target code through the second partial product obtaining circuit.
S204, compressing the first partial product of the target code according to the function selection mode signal to obtain a first target operation result.
Optionally, the step in S204 of compressing the first partial product of the target code according to the function selection mode signal to obtain a first target operation result includes: the low-order Wallace tree subunit accumulates the column values in the first partial products of all target codes to obtain a first accumulation operation result; the selector gates the first accumulation operation result according to the function selection mode signal to obtain a first carry gating signal; and the high-order Wallace tree subunit performs accumulation according to the first carry gating signal and the column values in the first partial product of the target code to obtain the first target operation result.
Specifically, the data processor may accumulate the first partial product of the target code through the modified Wallace tree group unit in the first compression circuit to obtain a first accumulation operation result, determine which signal to gate as the first carry gating signal according to the data operation mode corresponding to the received function selection mode signal, and then, using the first carry gating signal as the carry input signal of the next addition, add the column values in the first partial product of the target code to obtain the first target operation result. Optionally, the first accumulation operation result may include the sum output signal Sum and the carry output signal Carry produced by the modified Wallace tree group unit, where Sum and Carry may have the same bit width. The accumulation unit then adds the sum output signal Sum and the carry output signal Carry. Optionally, the first target operation result may be zero or nonzero.
It should be noted that the data processor may use the adder in the accumulation unit to add the carry output signal Carry and the sum output signal Sum output by the modified Wallace tree group unit, and output the addition result. Optionally, each Wallace tree subunit in the modified Wallace tree group unit may output a carry output signal Carry_i and a sum output signal Sum_i (i = 0, …, 2N-1, where i is the number of the corresponding Wallace tree subunit, counted from 0). Optionally, the adder receives Carry = {[Carry_0 … Carry_(2N-2)], 0}; that is, the carry output signal Carry received by the adder is 2N bits wide, its first 2N-1 values are the carry output signals of the first 2N-1 Wallace tree subunits in the modified Wallace tree group unit, and its last value is replaced with 0. Optionally, the sum output signal Sum received by the adder is 2N bits wide, and its values may equal the sum output signals of the individual Wallace tree subunits in the modified Wallace tree group unit.
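At word level, the compress-then-add structure can be sketched as carry-save reduction: rows of full adders (3:2 compressors) reduce the partial-product rows until only a Sum word and a Carry word remain, and a single adder then adds them. This is a generic Wallace-style model using Python integers, not the patent's column-by-column circuit, and the function names are hypothetical. Shifting each carry word left by one column, with 0 entering column 0, appears to correspond to the Carry alignment described above.

```python
def full_adder_row(a: int, b: int, c: int) -> tuple[int, int]:
    """Bitwise 3:2 compression: a + b + c == s + 2 * carry for every bit column."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compress(rows: list[int], width: int) -> tuple[int, int]:
    """Reduce partial-product rows to (Sum, Carry) words, modulo 2**width."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3          # rows that form complete groups of three
        for k in range(0, full, 3):
            s, c = full_adder_row(rows[k], rows[k + 1], rows[k + 2])
            next_rows += [s, (c << 1) & mask]     # carries feed the next column up
        next_rows += rows[full:]                  # 1 or 2 leftover rows pass through
        rows = next_rows
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def final_add(sum_word: int, carry_word: int, width: int) -> int:
    """Accumulation unit: one carry-propagate addition of Sum and Carry, modulo 2**width."""
    return (sum_word + carry_word) & ((1 << width) - 1)


if __name__ == "__main__":
    rows = [13, 7, 250, 99, 1]
    s, c = compress(rows, 16)
    print(s, c, final_add(s, c, 16), sum(rows))   # the last two values agree
```

The point of the structure is that only the final addition has to propagate carries across the full width; every compression level works column by column.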
In this embodiment, according to the distribution of the low-order partial products and high-order partial products of all target codes, the values of the partial products of all target codes occupy 2N columns in total (N being the bit width of the data currently processed by the data processor), numbered 0, …, 2N-1 starting from the lowest column; columns 0 to N-1 may be called the low N column values. Optionally, the accumulation operation result may be the carry output signal Cout of the last low-order Wallace tree subunit.
It should be noted that the N low-order Wallace tree subunits may accumulate the low N column values in numbering order to obtain an accumulation operation result. Optionally, the accumulation operation result may include the carry output signal Carry and the sum output signal Sum of each Wallace tree subunit, as well as the output signal Cout of the last low-order Wallace tree subunit.
It will be appreciated that the selector in the modified Wallace tree group unit may select either the output signal Cout of the last low-order Wallace tree subunit or the value 0 according to the received function selection mode signal, thereby obtaining a carry strobe signal.
In this embodiment, according to the distribution of the partial products of all target codes, the values of the partial products of all target codes occupy 2N columns in total (N being the bit width of the data currently processed by the data processor), numbered 0, …, 2N-1 starting from the lowest column; columns N to 2N-1 may be called the high N column values.
It should be noted that the N high-order Wallace tree subunits may accumulate the high N column values in numbering order and output an accumulation operation result. The carry input signal received by the first high-order Wallace tree subunit may be the first carry strobe signal output by the selector.
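A word-level sketch of the selector's effect, assuming the low and high N columns have already been reduced to two 2N-bit operands: the carry entering the high half is either the low half's carry-out Cout, giving one 2N-bit result, or the constant 0, giving two independent N-bit results. Which mode signal selects which input is an assumption made for illustration; `split_add` and `join` are hypothetical names.

```python
def split_add(a: int, b: int, n: int, join: bool) -> int:
    """Add two 2n-bit words whose carry chain is joined or cut between columns n-1 and n."""
    if a < 0 or b < 0 or a >> (2 * n) or b >> (2 * n):
        raise ValueError("operands must be unsigned 2n-bit values")
    low_mask = (1 << n) - 1
    low_total = (a & low_mask) + (b & low_mask)
    cout = low_total >> n                       # carry out of the last low-order column
    carry_in = cout if join else 0              # selector: Cout or the value 0
    high_total = (a >> n) + (b >> n) + carry_in
    return ((high_total & low_mask) << n) | (low_total & low_mask)


if __name__ == "__main__":
    print(hex(split_add(0x1F, 0x01, 4, join=True)))    # one 8-bit sum: 0x20
    print(hex(split_add(0x1F, 0x01, 4, join=False)))   # two 4-bit sums: 0x10
```

Cutting the carry at column N is what lets one set of Wallace tree subunits serve both a single wide operation and two packed narrow ones.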
S205, compressing the second partial product of the target code according to the function selection mode signal to obtain a second target operation result.
Optionally, the step in S205 of compressing the second partial product of the target code according to the function selection mode signal to obtain a second target operation result includes: the low-order Wallace tree subunit accumulates the column values in the second partial products of all target codes to obtain a second accumulation operation result; the selector gates the second accumulation operation result according to the function selection mode signal to obtain a second carry gating signal; and the high-order Wallace tree subunit performs accumulation according to the second carry gating signal and the column values in the second partial product of the target code to obtain the second target operation result.
Further, the data processor may accumulate the second partial product of the target code through the modified Wallace tree group unit in the second compression circuit to obtain a second accumulation operation result, gate the second carry gating signal according to the function selection mode signal and the second accumulation operation result, and accumulate the second accumulation operation result according to the second carry gating signal to obtain the second target operation result. Optionally, the second target operation result may be zero or nonzero.
In this embodiment, the data processor may perform step S204 and step S205 in parallel, and this embodiment does not limit their order.
According to the data processing method provided by this embodiment, the specific mode of data operation that can currently be processed is determined from the received function selection mode signal, so that both multiplication and multiply-accumulate operations can be implemented, which improves the versatility of the data processor. In addition, the method does not need a separate accumulation pass over the multiplication result to complete a multiply-accumulate operation; multiplication or multiply-accumulate is completed in a single operation pass, which also effectively reduces the power consumption of the data processor. Furthermore, the method performs regular signed number coding on the received data to be processed, so fewer effective partial products are obtained, which reduces the complexity of multiplication and multiply-accumulate operations and improves operation efficiency.
In one embodiment, the step of obtaining the first partial product of the target code and the second partial product of the target code according to the target code and the data to be processed in S203 includes:
S2031, performing conversion according to the first target code and the data to be processed to obtain a first original partial product.
Specifically, if a value in the first target code is -1 and the data to be processed is X, the first original partial product may be -X; if the value in the first target code is 1, the first original partial product may be X; and if the value in the first target code is 0, the first original partial product may be 0.
S2032, sign bit expansion processing is carried out according to the first original partial product and the data to be processed, and the first partial product of the target code is obtained.
Specifically, the bit width of the first original partial product may equal the data bit width N currently processed by the data processor, and the bit width of the sign-extended first partial product may be 2N, i.e., twice that data bit width. The N bits of the first original partial product may form the low N bits of the sign-extended first partial product, and each of the high N bits of the sign-extended first partial product may equal the highest bit of the first original partial product, i.e., its sign bit.
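A minimal sketch of steps S2031 and S2032, assuming X is a signed N-bit two's-complement value other than -2^(N-1), so that -X also fits in N bits, and that the multiplier's target-code digits are given least significant first. Adding the shifted, sign-extended partial products modulo 2^(2N) stands in here for the Wallace tree and adder; all function names are hypothetical.

```python
def original_partial_product(digit: int, x: int, n: int) -> int:
    """Map a target-code digit (-1, 0 or 1) to -X, 0 or X as an n-bit two's-complement pattern."""
    if digit not in (-1, 0, 1):
        raise ValueError("digit must be -1, 0 or 1")
    if not -(1 << (n - 1)) < x < (1 << (n - 1)):
        raise ValueError("x must be a signed n-bit value other than -2**(n-1)")
    return (digit * x) & ((1 << n) - 1)


def sign_extend(pattern: int, n: int) -> int:
    """Low n bits: the original partial product; high n bits: copies of its sign bit."""
    sign = (pattern >> (n - 1)) & 1
    high = (1 << n) - 1 if sign else 0
    return (high << n) | pattern


def to_signed(word: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's-complement integer."""
    return word - (1 << width) if (word >> (width - 1)) & 1 else word


def multiply(x: int, digits_lsb_first: list[int], n: int) -> int:
    """Add each sign-extended partial product at its digit weight, modulo 2**(2n)."""
    mask = (1 << (2 * n)) - 1
    total = 0
    for i, d in enumerate(digits_lsb_first):
        pp = sign_extend(original_partial_product(d, x, n), n)
        total = (total + (pp << i)) & mask
    return total


if __name__ == "__main__":
    n = 8
    y_digits = [0, -1, 0, 0, -1, 0, 0, 1]   # 110 = 128 - 16 - 2, least significant digit first
    print(hex(sign_extend(original_partial_product(-1, 37, n), n)))   # -37 as 16 bits: 0xffdb
    print(to_signed(multiply(37, y_digits, n), 2 * n))                # 37 * 110 = 4070
    print(to_signed(multiply(-37, y_digits, n), 2 * n))               # -37 * 110 = -4070
```

Only three of the eight digits are nonzero here, so only three partial products contribute; this is the reduction in effective partial products that the text attributes to the encoding.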
S2033, the conversion processing is carried out according to the second target code and the data to be processed, and a second original partial product is obtained.
S2034, sign bit expansion processing is carried out according to the second original partial product and the data to be processed, and a second partial product of the target code is obtained.
Optionally, the data processor may perform steps S2031 and S2032 in parallel with steps S2033 and S2034, and their processing order is not limited.
The data processing method provided by this embodiment obtains fewer effective partial products, which reduces the complexity of multiplication and multiply-accumulate operations.
An embodiment of the present application further provides a machine learning arithmetic device, which includes one or more of the data processors mentioned in this application and is configured to obtain data to be operated on and control information from other processing devices, perform specified machine learning operations, and transmit the execution results to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces and servers. When more than one data processor is included, the data processors may be linked and exchange data through a specific structure, for example a PCIe bus, to support larger-scale machine learning operations. In this case, they may share one control system or have separate control systems, and they may share memory or each accelerator may have its own memory. In addition, any interconnection topology may be used.
The machine learning arithmetic device is highly compatible and can be connected to various types of servers through a PCIe interface.
An embodiment of the present application further provides a combined processing device, which includes the above machine learning arithmetic device, a universal interconnection interface and other processing devices. The machine learning arithmetic device interacts with the other processing devices to jointly complete operations specified by the user. Fig. 9 is a schematic diagram of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs) and neural network processors; the number of processors included is not limited. The other processing devices serve as the interface between the machine learning arithmetic device and external data and control, performing data transfer and basic control such as starting and stopping the machine learning arithmetic device; they can also cooperate with the machine learning arithmetic device to complete computing tasks.
The universal interconnection interface is used to transfer data and control instructions between the machine learning arithmetic device and the other processing devices. The machine learning arithmetic device obtains the required input data from the other processing devices and writes it into the storage device on the machine learning arithmetic device; it can obtain control instructions from the other processing devices and write them into the control cache on the machine learning arithmetic device chip; and it can read data from its storage module and transfer it to the other processing devices.
Optionally, as shown in Fig. 10, the structure may further include a storage device connected to both the machine learning arithmetic device and the other processing devices. The storage device stores data of the machine learning arithmetic device and the other processing devices, and is particularly suitable for data to be operated on that cannot be fully held in the internal storage of the machine learning arithmetic device or the other processing devices.
The combined processing device can serve as the SoC (system on chip) of equipment such as mobile phones, robots, unmanned aerial vehicles and video surveillance equipment, effectively reducing the core area of the control part, increasing processing speed and reducing overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the equipment, such as a camera, display, mouse, keyboard, network card or Wi-Fi interface.
In some embodiments, a chip is also claimed, which includes the above machine learning arithmetic device or the combined processing device.
In some embodiments, a chip package structure is provided, which includes the above chip.
In some embodiments, a board card is provided, which includes the above chip package structure. As shown in Fig. 11, in addition to the chip 389, the board card may include other supporting components, including but not limited to a memory device 390, a receiving device 391 and a control device 392.
The memory device 390 is connected to the chip in the chip package structure through a bus and is used to store data. The memory device may include multiple groups of memory units 393, each connected to the chip through a bus. It can be understood that each group of memory units may be DDR SDRAM (Double Data Rate SDRAM).
DDR doubles the data rate of SDRAM without increasing the clock frequency, because it transfers data on both the rising and falling edges of the clock pulse; DDR is therefore twice as fast as standard SDRAM. In one embodiment, the memory device may include 4 groups of memory units, and each group may include a plurality of DDR4 dies (chips). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transfer and 8 bits for ECC checking. It can be understood that when DDR4-3200 dies are used in each group of memory units, the theoretical data transfer bandwidth can reach 25600 MB/s.
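The 25600 MB/s figure follows from simple arithmetic, assuming the usual reading of DDR4-3200 as 3200 mega-transfers per second (a 1600 MHz clock with two transfers per cycle) and counting only the 64 data bits of each 72-bit controller. The helper name is hypothetical.

```python
def ddr_peak_bandwidth_mb_s(clock_mhz: int, data_bits: int) -> int:
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes) of a DDR interface."""
    transfers_per_us = 2 * clock_mhz           # data moves on both clock edges
    return transfers_per_us * data_bits // 8   # bytes per transfer times transfers per microsecond


if __name__ == "__main__":
    # DDR4-3200: 1600 MHz clock, 3200 MT/s; only the 64 data bits count, not the 8 ECC bits.
    print(ddr_peak_bandwidth_mb_s(1600, 64))   # 25600
```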
In one embodiment, each group of memory units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for the DDR is provided in the chip to control data transfer to and data storage in each memory unit.
The receiving device is electrically connected to the chip in the chip package structure and is used to implement data transfer between the chip and an external device (such as a server or computer). For example, in one embodiment, the receiving device may be a standard PCIe interface, through which the server transfers the data to be processed to the chip. Preferably, when a PCIe 3.0 ×16 interface is used, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may be another interface; the present application does not limit the specific form of such an interface, as long as the interface unit can implement the transfer function. In addition, the calculation results of the chip are transmitted back to the external device (e.g., a server) by the receiving device.
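The 16000 MB/s figure is the raw per-direction rate of a PCIe 3.0 ×16 link: 8 GT/s per lane times 16 lanes, divided by 8 bits per byte. PCIe 3.0 also uses 128b/130b line coding, so the usable rate is slightly lower, about 15754 MB/s. The helper below is a hypothetical illustration of that arithmetic.

```python
def pcie_bandwidth_mb_s(gt_per_s: float, lanes: int,
                        payload_bits: int = 128, line_bits: int = 130) -> tuple[float, float]:
    """Per-direction PCIe bandwidth in MB/s: (raw, after line coding)."""
    raw = gt_per_s * 1000 * lanes / 8            # one bit per transfer per lane
    return raw, raw * payload_bits / line_bits


if __name__ == "__main__":
    raw, coded = pcie_bandwidth_mb_s(8, 16)      # PCIe 3.0: 8 GT/s per lane, 128b/130b coding
    print(f"raw {raw:.0f} MB/s, after 128b/130b coding about {coded:.0f} MB/s")
```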
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores or multiple processing circuits, and may drive multiple loads; therefore, the chip can be in different working states such as multi-load and light load. The control device can regulate the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the chip.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device may be a data processor, robot, computer, printer, scanner, tablet computer, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance and/or medical device.
The vehicles include aircraft, ships and/or automobiles; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves and range hoods; the medical devices include nuclear magnetic resonance instruments, B-mode ultrasound scanners and/or electrocardiographs.
It should be noted that, for simplicity of description, the foregoing method embodiments are described as a series of circuit combinations, but those skilled in the art should understand that the present application is not limited by the described circuit combinations, because some circuits may be implemented in other ways or structures according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all optional embodiments, and the devices and modules involved are not necessarily required by this application.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
The above embodiments express only several implementations of the present application, and although they are described specifically and in detail, they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (21)

1. A data processor, characterized in that the data processor comprises a first multiplication circuit, a second multiplication circuit and a partial product exchange circuit, wherein the first multiplication circuit comprises a first correction coding sub-circuit and a first correction compression sub-circuit, and the second multiplication circuit comprises a second correction coding sub-circuit and a second correction compression sub-circuit; the first correction coding sub-circuit comprises a first coding branch and a first selection branch, and the second correction coding sub-circuit comprises a second coding branch and a second selection branch; a first output end of the first correction coding sub-circuit is connected to a first input end of the partial product exchange circuit, a second output end of the first correction coding sub-circuit is connected to an input end of the first correction compression sub-circuit, a first output end of the partial product exchange circuit is connected to an input end of the first correction coding sub-circuit, a second output end of the partial product exchange circuit is connected to an input end of the second correction coding sub-circuit, a first output end of the second correction coding sub-circuit is connected to a second input end of the partial product exchange circuit, and a second output end of the second correction coding sub-circuit is connected to an input end of the second correction compression sub-circuit;
the first coding branch is configured to perform regular signed number coding on received first data to obtain a sign-extended first partial product, the first selection branch is configured to select a first partial product of a target code from the sign-extended first partial product, the first correction compression sub-circuit is configured to compress the first partial product of the target code to obtain a first target operation result, the second coding branch is configured to perform regular signed number coding on received second data to obtain a sign-extended second partial product, the second selection branch is configured to select a second partial product of the target code from the sign-extended second partial product, the second correction compression sub-circuit is configured to compress the second partial product of the target code to obtain a second target operation result, and the partial product exchange circuit is configured to determine, according to a received function selection mode signal, whether to exchange the high-order partial product of the sign-extended first partial product with the high-order partial product of the sign-extended second partial product, or to exchange the low-order partial product of the sign-extended first partial product with the low-order partial product of the sign-extended second partial product.
2. The data processor of claim 1, wherein each of the first and second multiplication circuits comprises a first input end for receiving the function selection mode signal; the partial product exchange circuit comprises a third input end for receiving the function selection mode signal; and the function selection mode signal is used to determine which mode of data operation the data processor can currently process.
3. The data processor of claim 1, wherein the first correction compression sub-circuit comprises a modified Wallace tree group unit and an accumulation unit, an output end of the modified Wallace tree group unit being connected to an input end of the accumulation unit; the modified Wallace tree group unit is configured to accumulate each column value in the first partial product of the target code obtained when data operations of different modes are performed, to obtain an accumulation operation result, and the accumulation unit is configured to add the accumulation operation result.
4. The data processor of claim 3, wherein the modified Wallace tree group unit comprises a low-order Wallace tree subunit, a selector, and a high-order Wallace tree subunit; the output end of the low-order Wallace tree subunit is connected to the input end of the selector, and the output end of the selector is connected to the input end of the high-order Wallace tree subunit; the low-order Wallace tree subunit is configured to accumulate the values in each column of the first partial product of the target code to obtain the accumulation operation result, the selector is configured to gate the carry input signal received by the high-order Wallace tree subunit, and the high-order Wallace tree subunit is configured to accumulate the values in each column of the first partial product of the target code to obtain the accumulation operation result.
5. The data processor of claim 3, wherein the accumulation unit comprises an adder configured to add the accumulation operation result.
6. A method of data processing, the method comprising:
receiving data to be processed and a function selection mode signal, wherein the function selection mode signal indicates which mode of data operation the data processor is currently set to process;
determining, according to the function selection mode signal, whether the data to be processed needs to be split;
if the data to be processed needs to be split, splitting the data to be processed to obtain split data;
performing regular signed number coding on the split data to obtain a target code;
performing conversion according to the target code and the split data to obtain sign-bit-expanded partial products;
determining, according to the function selection mode signal, whether the sign-bit-expanded partial products need to be exchanged;
if the sign-bit-expanded partial products do not need to be exchanged, using them as the partial products of the target code;
if the sign-bit-expanded partial products need to be exchanged, exchanging the high-order or low-order partial products among them to obtain the partial products of the target code; and
compressing the partial products of the target code to obtain a target operation result.
7. The method of claim 6, wherein determining, according to the function selection mode signal, whether the data to be processed needs to be split comprises: determining, according to the function selection mode signal, whether the bit width of the data to be processed equals the bit width of the data for the operation mode the data processor is currently set to process.
8. The method of claim 7, further comprising, after determining whether the bit width of the data to be processed equals the bit width corresponding to the current operation mode: if the data to be processed does not need to be split, performing regular signed number coding directly on the data to be processed to obtain the target code.
9. The method of any one of claims 6 to 8, wherein performing regular signed number coding on the split data to obtain the target code comprises: converting each run of l consecutive bits with value 1 in the split data into l + 1 bits in which the highest bit is 1, the lowest bit is -1, and the remaining bits are 0, to obtain the target code, where l is greater than or equal to 2.
10. The method according to claim 6, wherein the performing regular signed number coding processing on the split data to obtain a target code includes:
carrying out regular signed number coding processing on the split data to obtain an intermediate code;
and obtaining the target code according to the intermediate code and the function selection mode signal.
11. The method of claim 6, wherein the performing a conversion process according to the target code and the split data to obtain a sign-bit-extended partial product comprises:
performing conversion processing according to the target code and the split data to obtain an original partial product;
performing sign bit expansion on the original partial product to obtain the sign-bit-expanded partial product.
12. The method of claim 6, wherein determining, according to the function selection mode signal, whether the sign-bit-expanded partial products need to be exchanged comprises: determining, according to the function selection mode signal, whether the bit widths of the data currently processed by the data processor are the same.
13. The method of claim 6, wherein the compressing the partial product of the target code to obtain the target operation result comprises:
accumulating the partial product of the target code to obtain an intermediate operation result;
and accumulating the intermediate operation result to obtain the target operation result.
14. The method of claim 13, wherein accumulating the intermediate operation result to obtain the target operation result comprises:
the low-order Wallace tree subunit accumulates the values in each column of the partial products of the target code to obtain an accumulation operation result;
the selector gates the accumulation operation result according to the function selection mode signal to obtain a carry gating signal; and
the high-order Wallace tree subunit performs accumulation according to the carry gating signal and the values in each column of the partial products of the target code to obtain the target operation result.
15. A machine learning arithmetic device, comprising one or more data processors according to any one of claims 1 to 5, configured to acquire input data to be operated on and control information from other processing devices, perform a specified machine learning operation, and transmit the processing result to the other processing devices via an I/O interface;
when the machine learning arithmetic device comprises a plurality of data processors, the data processors are connected through a preset specific structure to transmit data;
the data processors are interconnected through a PCIE bus to transmit data, so as to support larger-scale machine learning operations; the data processors share the same control system or each have their own control system; the data processors share memory or each have their own memory; and the data processors may use any interconnection topology.
16. A combined processing apparatus, characterized in that the combined processing apparatus comprises the machine learning arithmetic apparatus according to claim 15, a universal interconnect interface and other processing apparatus;
and the machine learning arithmetic device interacts with the other processing devices to jointly complete the calculation operation designated by the user.
17. The combined processing device of claim 16, further comprising a storage device connected to both the machine learning arithmetic device and the other processing device and configured to store data of the machine learning arithmetic device and the other processing device.
18. A neural network chip, comprising the machine learning arithmetic device according to claim 15 or the combined processing device according to claim 16 or 17.
19. An electronic device, characterized in that it comprises a chip according to claim 18.
20. A board card, characterized in that the board card comprises a storage device, a receiving device, a control device, and the neural network chip according to claim 18;
wherein the neural network chip is respectively connected with the storage device, the control device and the receiving device;
the storage device is used for storing data;
the receiving device is used for realizing data transmission between the chip and external equipment;
and the control device is used for monitoring the state of the chip.
21. The board card of claim 20, wherein:
the storage device comprises a plurality of groups of memory cells, each group being connected to the chip through a bus, and the memory cells are DDR SDRAM;
the chip comprises a DDR controller configured to control data transmission to and data storage in each memory cell; and
the receiving device is a standard PCIE interface.
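Claims 7 and 8 reduce the split decision to a bit-width comparison driven by the function selection mode signal. The Python sketch below illustrates that comparison under stated assumptions: the mode signal is collapsed into a single hypothetical integer `mode_width`, and "splitting" is assumed to mean cutting a wider operand into `mode_width`-bit chunks, lowest-order chunk first. The claims specify neither the chunk order nor how non-multiple widths are handled, so both are illustrative choices.

```python
def split_if_needed(value: int, data_width: int, mode_width: int) -> list[int]:
    """Hypothetical model of the split decision in claims 7 and 8.

    mode_width stands in for the bit width selected by the function
    selection mode signal (an assumption; the claims do not encode it).
    """
    if data_width == mode_width:
        # Claim 8: the widths match, so the data is encoded without splitting.
        return [value]
    if data_width % mode_width != 0:
        # Assumed: only whole multiples of the mode width are handled here.
        raise ValueError("data width must be a multiple of the mode width")
    mask = (1 << mode_width) - 1
    count = data_width // mode_width
    # Assumed ordering: lowest-order chunk first.
    return [(value >> (i * mode_width)) & mask for i in range(count)]


print(split_if_needed(0xABCD, 16, 16))  # [43981]
print(split_if_needed(0xABCD, 16, 8))   # [205, 171]
```

Each returned chunk would then go through the coding step on its own, which is how one datapath can serve several narrower operations.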
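Claim 9 gives the recoding rule: a run of l >= 2 consecutive 1 bits becomes l + 1 digits with 1 on top, -1 at the bottom, and 0 in between. The value is preserved because 2^i + ... + 2^(i+l-1) = 2^(i+l) - 2^i. The following Python sketch applies that rule once to each maximal run of an unsigned bit pattern. Treating the input as unsigned and listing digits least-significant first are assumptions, and the helper names are hypothetical. A single pass of this rule can leave adjacent nonzero digits (for example, two runs separated by one 0), so the output is not always in strict non-adjacent canonical form.

```python
def recode_runs(bits: list[int]) -> list[int]:
    """Apply the run-replacement rule of claim 9 to an LSB-first bit list.

    Returns LSB-first signed digits in {-1, 0, 1}, one digit longer than
    the input so the top digit of a run ending at the MSB has a place.
    """
    n = len(bits)
    digits = [0] * (n + 1)
    i = 0
    while i < n:
        if bits[i] == 0:
            i += 1
            continue
        j = i
        while j < n and bits[j] == 1:  # find the end of this run of 1s
            j += 1
        if j - i >= 2:
            digits[i] = -1  # lowest digit of the run becomes -1
            digits[j] = 1   # digit just above the run becomes 1
        else:
            digits[i] = 1   # an isolated 1 is kept unchanged
        i = j
    return digits


def to_bits(x: int, width: int) -> list[int]:
    """LSB-first bit list of a non-negative integer."""
    return [(x >> k) & 1 for k in range(width)]


def digits_value(digits: list[int]) -> int:
    """Integer value of an LSB-first signed-digit list."""
    return sum(d * (1 << k) for k, d in enumerate(digits))


print(recode_runs(to_bits(0b0111, 4)))  # [-1, 0, 0, 1, 0]
```

For 7 = 0b0111 the three nonzero bits become two nonzero digits (8 - 1); fewer nonzero digits means fewer partial products to compress.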
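Claim 11 derives an original partial product from the target code and the data and then sign-extends it. One conventional reading, assumed here rather than taken from the claims, is that each nonzero signed digit d at position k of the multiplier contributes the row d * A * 2^k, where A is an n-bit two's-complement multiplicand. The Python sketch below forms each row as an (n + 1)-bit pattern (one extra bit so that -A is representable when A = -2^(n-1)), sign-extends it to 2n bits, and shifts it into place. All function names are hypothetical.

```python
def sign_extend(pattern: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a from_bits-wide pattern into bits up to to_bits."""
    pattern &= (1 << from_bits) - 1
    if (pattern >> (from_bits - 1)) & 1:
        pattern |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return pattern


def partial_product_rows(a: int, digits: list[int], n: int) -> list[int]:
    """Build sign-extended 2n-bit partial-product rows.

    a      -- multiplicand, an n-bit two's-complement value (Python int)
    digits -- LSB-first signed digits of the multiplier, each in {-1, 0, 1}
    """
    width = 2 * n
    mask = (1 << width) - 1
    rows = []
    for k, d in enumerate(digits):
        if d == 0:
            continue  # a zero digit produces no partial product
        original = (d * a) & ((1 << (n + 1)) - 1)  # (n+1)-bit original partial product
        extended = sign_extend(original, n + 1, width)  # sign bit expansion
        rows.append((extended << k) & mask)  # weight by 2**k within 2n bits
    return rows


def to_signed(pattern: int, width: int) -> int:
    """Interpret a width-bit pattern as two's complement."""
    return pattern - (1 << width) if (pattern >> (width - 1)) & 1 else pattern


rows = partial_product_rows(-5, [-1, 0, 0, 1, 0], 4)  # digits encode 7
print(rows, to_signed(sum(rows) % 256, 8))  # [5, 216] -35
```

Summing the rows modulo 2^(2n) recovers the product as long as it fits in 2n signed bits, which holds for an n-bit signed multiplicand and an n-bit unsigned multiplier.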
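Claim 13 splits compression into two accumulation steps. A common way to realize such a split, assumed here, is a Wallace-style carry-save reduction: layers of 3:2 compressors (full adders applied bitwise) reduce the partial-product rows to two rows, which form the intermediate result, and a single carry-propagate addition then yields the final value. The Python sketch models rows as integers modulo 2^width; it does not model the mode-dependent corrections of the modified Wallace tree, and the function names are hypothetical.

```python
def compress_3_2(x: int, y: int, z: int, mask: int) -> tuple[int, int]:
    """Bitwise full adder: x + y + z == sum_row + carry_row (mod 2**width)."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row & mask, carry_row & mask


def compress_rows(rows: list[int], width: int) -> int:
    """Reduce partial-product rows to one width-bit result."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    # Stage 1: 3:2 layers until at most two rows remain (intermediate result).
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(compress_3_2(rows[i], rows[i + 1], rows[i + 2], mask))
        next_rows.extend(rows[full:])  # one or two leftover rows pass through
        rows = next_rows
    # Stage 2: one carry-propagate addition of the remaining rows.
    return sum(rows) & mask


print(compress_rows([5, 216], 8))              # 221
print(compress_rows([1, 2, 3, 4, 5, 6, 7], 8))  # 28
```

Each layer turns every three rows into two without propagating carries across the word, which is why the slow carry chain is paid only once, in stage 2.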
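Claim 14 places a selector between the low-order and high-order Wallace tree subunits, so that the function selection mode signal decides whether carries from the low-order columns reach the high-order columns. The Python sketch below shows the same gating idea on a plain adder split into two half-width sections: with the gate open the halves behave as one wide adder, and with it closed they compute two independent narrow sums. The parameter `split_mode` is an assumed stand-in for the mode signal, and each subunit is simplified to an ordinary addition.

```python
def gated_add(x: int, y: int, half: int, split_mode: bool) -> int:
    """Add two 2*half-bit values with a mode-controlled carry between halves."""
    mask = (1 << half) - 1
    low = (x & mask) + (y & mask)  # low-order section
    carry_out = low >> half
    carry_in = 0 if split_mode else carry_out  # the selector gates the carry
    high = ((x >> half) & mask) + ((y >> half) & mask) + carry_in  # high-order section
    return ((high & mask) << half) | (low & mask)


print(hex(gated_add(0x00FF, 0x0001, 8, split_mode=False)))  # 0x100
print(hex(gated_add(0x00FF, 0x0001, 8, split_mode=True)))   # 0x0
```

In the split setting, 0xFF + 0x01 wraps to 0x00 in the low half instead of carrying into the high half, so the two halves never contaminate each other's results.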
CN201910902610.3A 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment Active CN110413254B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910902610.3A CN110413254B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment
CN201911349822.XA CN111008003B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910902610.3A CN110413254B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201911349822.XA Division CN111008003B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Publications (2)

Publication Number Publication Date
CN110413254A CN110413254A (en) 2019-11-05
CN110413254B CN110413254B (en) 2020-01-10

Family

ID=68370615

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201911349822.XA Active CN111008003B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment
CN201910902610.3A Active CN110413254B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Country Status (1)

Country Link
CN (2) CN111008003B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113031911A (en) * 2019-12-24 2021-06-25 Shanghai Cambricon Information Technology Co Ltd Multiplier, data processing method, device and chip
CN113033788B (en) * 2019-12-24 2023-08-18 Shanghai Cambricon Information Technology Co Ltd Data processor, method, device and chip
CN113033799B (en) * 2019-12-24 2023-09-08 Shanghai Cambricon Information Technology Co Ltd Data processor, method, device and chip
CN111767025B (en) * 2020-08-04 2023-11-21 Tencent Technology (Shenzhen) Co Ltd Chip comprising multiply accumulator, terminal and floating point operation control method
CN112558920B (en) * 2020-12-21 2022-09-09 Tsinghua University Signed/unsigned multiply-accumulate device and method

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101923459A (en) * 2009-06-17 2010-12-22 Fudan University Reconfigurable multiplication/addition arithmetic unit for digital signal processing
CN104011665A (en) * 2011-12-23 2014-08-27 Intel Corporation Super Multiply Add (Super MADD) Instruction

Family Cites Families (13)

Publication number Priority date Publication date Assignee Title
US5784305A (en) * 1995-05-01 1998-07-21 Nec Corporation Multiply-adder unit
ATE300761T1 (en) * 2000-10-16 2005-08-15 Nokia Corp MULTIPLIER AND SHIFT ARRANGEMENT USING SIGN DIGITAL NUMBER REPRESENTATION
CN1324456C (en) * 2004-01-09 2007-07-04 Shanghai Jiao Tong University Digital signal processor using mixed compression two stage flow multiplication addition unit
CN100356315C (en) * 2004-09-02 2007-12-19 National University of Defense Technology Design method of number mixed multiplier for supporting single-instruction multiple-operated
US7912891B2 (en) * 2005-12-09 2011-03-22 Electronics And Telecommunications Research Institute High speed low power fixed-point multiplier and method thereof
CN100552620C (en) * 2007-09-21 2009-10-21 Tsinghua University Large number multiplication device based on quadratic Booth coding
CN101625634A (en) * 2008-07-09 2010-01-13 Institute of Semiconductors, Chinese Academy of Sciences Reconfigurable multiplier
CN102591615A (en) * 2012-01-16 2012-07-18 National University of Defense Technology Structured mixed bit-width multiplying method and structured mixed bit-width multiplying device
CN103955585B (en) * 2014-05-13 2017-02-15 Fudan University FIR (finite impulse response) filter structure for low-power fault-tolerant circuit
CN105183424B (en) * 2015-08-21 2017-09-01 University of Electronic Science and Technology of China A fixed-bit-width multiplier with high precision and low energy consumption
CN107977191B (en) * 2016-10-21 2021-07-27 Institute of Microelectronics, Chinese Academy of Sciences Low-power-consumption parallel multiplier
CN108459840B (en) * 2018-02-14 2021-07-09 Institute of Electronics, Chinese Academy of Sciences SIMD structure floating point fusion point multiplication operation unit
CN110190843B (en) * 2018-04-10 2020-03-10 Cambricon Technologies Corporation Limited Compressor circuit, Wallace tree circuit, multiplier circuit, chip and apparatus

Also Published As

Publication number Publication date
CN110413254A (en) 2019-11-05
CN111008003B (en) 2023-10-13
CN111008003A (en) 2020-04-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant