CN110688087B - Data processor, method, chip and electronic equipment - Google Patents


Info

Publication number
CN110688087B
Authority
CN
China
Prior art keywords
partial product
data
circuit
order
low
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910902845.2A
Other languages
Chinese (zh)
Other versions
CN110688087A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201910902845.2A
Publication of CN110688087A
Application granted
Publication of CN110688087B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application provides a data processor, a method, a chip, and an electronic device. The data processor comprises a first multiplication circuit, a second multiplication circuit, and a partial product exchange circuit. The output of the first multiplication circuit is connected to the first input of the partial product exchange circuit; the first output of the partial product exchange circuit is connected to the first input of the second multiplication circuit; the output of the second multiplication circuit is connected to the second input of the partial product exchange circuit; and the second output of the partial product exchange circuit is connected to the input of the first multiplication circuit. The data processor can thus perform both multiplication and multiply-accumulate operations, which improves its versatility. In addition, it completes a multiply-accumulate operation without separately accumulating the multiplication results, so either operation is carried out in a single pass, which reduces the power consumption of the data processor.

Description

Data processor, method, chip and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a data processor, a method, a chip, and an electronic device.
Background
With the continuous development of digital electronics and the rapid growth of artificial intelligence (AI) chips of all kinds, the demand for high-performance data processors, such as multipliers, adders, and multiply-accumulators, keeps increasing. Neural network algorithms are among the algorithms most widely used on intelligent chips, and the multiply-accumulate operation performed by a multiply-accumulator is a common operation in them.
Generally, a data processor uses a plurality of multipliers, with the same or different input bit widths, to perform multiplications in parallel and obtain a plurality of multiplication results, and then uses an adder to accumulate these results into the target multiply-accumulate result. However, existing data processors can only perform multiply-accumulate operations on data of the same bit width, which limits their versatility. In addition, in the prior art the multiplication results must be accumulated in a separate step to complete the multiply-accumulate operation, which increases the power consumption of the data processor.
Disclosure of Invention
Accordingly, there is a need for a data processor, a method, a chip, and an electronic device with low power consumption and high versatility.
A data processor comprises a first multiplication circuit, a second multiplication circuit, and a partial product exchange circuit. The first multiplication circuit comprises a first encoding branch, a first selection branch, and a first compression branch, and the second multiplication circuit comprises a second encoding branch, a second selection branch, and a second compression branch. The output of the first multiplication circuit is connected to the first input of the partial product exchange circuit, the first output of the partial product exchange circuit is connected to the first input of the second multiplication circuit, the output of the second multiplication circuit is connected to the second input of the partial product exchange circuit, and the second output of the partial product exchange circuit is connected to the input of the first multiplication circuit;
the first encoding branch encodes the received first data to obtain a sign-extended first partial product; the first selection branch selects the first partial product of the target code from the sign-extended first partial product; the first compression branch compresses the first partial product of the target code to obtain a first target operation result; the second encoding branch encodes the received second data to obtain a sign-extended second partial product; the second selection branch selects the second partial product of the target code from the sign-extended second partial product; the second compression branch compresses the second partial product of the target code to obtain a second target operation result; and the partial product exchange circuit exchanges the sign-extended first partial product and the sign-extended second partial product.
In one embodiment, the first multiplication circuit and the second multiplication circuit each include a first input for receiving a function selection mode signal, and the partial product exchange circuit includes a third input for receiving the function selection mode signal. The function selection mode signal determines which mode of data operation the data processor is currently to process.
In one embodiment, the partial product exchange circuit includes a function selection mode signal input port, a first partial product input port, a first partial product output port, a second partial product input port, and a second partial product output port. The function selection mode signal input port receives the function selection mode signal; the first partial product input port receives the sign-extended first partial product that is input by the first multiplication circuit and needs to be exchanged; the first partial product output port outputs that sign-extended first partial product; the second partial product input port receives the sign-extended second partial product that is input by the second multiplication circuit and needs to be exchanged; and the second partial product output port outputs that sign-extended second partial product.
In one embodiment, the first multiplication circuit includes a first modified encoding sub-circuit, a first partial product selection sub-circuit, and a first modified compression sub-circuit. The output of the first modified encoding sub-circuit is connected to the first input of the first partial product selection sub-circuit, the second input of the first partial product selection sub-circuit is connected to the second output of the partial product exchange circuit, and the output of the first partial product selection sub-circuit is connected to the first input of the first modified compression sub-circuit;
the first modified encoding sub-circuit performs Booth encoding on the received first data to obtain a sign-extended first partial product; the first partial product selection sub-circuit receives the sign-extended second partial product output by the partial product exchange circuit, selects from the sign-extended first partial product, and passes the received sign-extended second partial product together with the selected sign-extended first partial product to the first modified compression sub-circuit as the first partial product of the target code; and the first modified compression sub-circuit accumulates the first partial product of the target code.
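The Booth encoding step can be illustrated in software. The following Python sketch assumes that the modified encoding is the standard radix-4 (modified) Booth scheme and that the multiplicand and multiplier are n-bit two's-complement integers; the function names are hypothetical and are not taken from the patent. Each partial product is shifted into place and sign-extended to 2n bits, in line with the statement in the detailed description that the sign-extended partial products may be twice the multiplicand width.

```python
def to_signed(value, width):
    """Interpret the low `width` bits of `value` as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_radix4_digits(multiplier, n):
    """Radix-4 Booth digits (each in -2..2) of an n-bit two's-complement multiplier; n must be even."""
    bits = multiplier & ((1 << n) - 1)
    digits = []
    prev = 0  # implicit bit y[-1] = 0 for the lowest group
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # digit = -2*y[i+1] + y[i] + y[i-1]
        prev = b1
    return digits


def sign_extended_partial_products(multiplicand, multiplier, n):
    """One partial product per Booth digit, shifted by 2 bits per digit and sign-extended to 2n bits."""
    mask = (1 << (2 * n)) - 1
    return [((d * multiplicand) << (2 * i)) & mask
            for i, d in enumerate(booth_radix4_digits(multiplier, n))]


pps = sign_extended_partial_products(-7, 5, 8)
print(len(pps), to_signed(sum(pps), 16))  # 4 partial products summing to -35
```

Radix-4 encoding halves the number of partial products (n/2 instead of n), which is why it is the usual choice in front of a Wallace tree; whether the patent's "modified" encoding matches this exact digit rule is an assumption.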
In one embodiment, the first modified encoding sub-circuit includes a low-order Booth encoding unit, a low-order partial product acquisition unit, a selector, a high-order Booth encoding unit, a high-order partial product acquisition unit, a low-order selector group unit, and a high-order selector group unit. The first output of the low-order Booth encoding unit is connected to the input of the selector, and its second output is connected to the first input of the low-order partial product acquisition unit; the output of the selector is connected to the first input of the high-order Booth encoding unit; the output of the high-order Booth encoding unit is connected to the first input of the high-order partial product acquisition unit; the output of the low-order selector group unit is connected to the second input of the low-order partial product acquisition unit; and the output of the high-order selector group unit is connected to the second input of the high-order partial product acquisition unit;
the low-order Booth encoding unit performs Booth encoding on the low-order data of the received first data to obtain a first low-order target code; the low-order partial product acquisition unit obtains a sign-extended first low-order partial product from the first low-order target code; the selector selects the supplementary bit value used when the high-order data of the first data is Booth-encoded; the high-order Booth encoding unit performs Booth encoding on the high-order data of the received first data together with the supplementary bit value to obtain a first high-order target code; the high-order partial product acquisition unit obtains a sign-extended first high-order partial product from the first high-order target code; the low-order selector group unit gates values in the sign-extended first low-order partial product; and the high-order selector group unit gates values in the sign-extended first high-order partial product.
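The role of the selector in front of the high-order Booth encoding unit can be sketched as follows. This is an interpretation, not the patent's circuit: it assumes radix-4 Booth encoding, in which each digit looks at bits y[2i+1], y[2i], and y[2i-1], so the lowest group of the high-order half needs a supplementary bit. The sketch assumes that bit is the most significant bit of the low-order half when the two halves form one 2n-bit multiplier, and 0 when they are two independent n-bit multipliers. `encode_halves`, `full_width`, and the other names are hypothetical.

```python
def to_signed(value, width):
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_digits(bits, n, supplementary_bit):
    """Radix-4 Booth digits of an n-bit pattern; `supplementary_bit` plays the role of bit -1."""
    digits = []
    prev = supplementary_bit
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def encode_halves(multiplier_bits, n, full_width):
    """Booth-encode the low and high n-bit halves of a 2n-bit multiplier separately."""
    low = multiplier_bits & ((1 << n) - 1)
    high = (multiplier_bits >> n) & ((1 << n) - 1)
    low_digits = booth_digits(low, n, 0)
    # Hypothetical model of the selector: gate the supplementary bit according to the mode.
    supplementary = (low >> (n - 1)) & 1 if full_width else 0
    high_digits = booth_digits(high, n, supplementary)
    return low_digits, high_digits


def digits_value(digits):
    """Value represented by a list of radix-4 digits, least significant first."""
    return sum(d * 4 ** i for i, d in enumerate(digits))
```

With `full_width=True`, `digits_value(low + high)` equals the signed 2n-bit multiplier, so one pair of encoders serves a wide multiplication; with `full_width=False`, `low` and `high` encode the two halves as independent signed n-bit values.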
In one embodiment, the low-order Booth encoding unit includes a low-order data input port and a low-order target code output port. The low-order data input port receives the low-order data of the first data to be Booth-encoded, and the low-order target code output port outputs the first low-order target code obtained by Booth-encoding that low-order data.
In one embodiment, the high-order Booth encoding unit includes a high-order data input port and a high-order target code output port. The high-order data input port receives the high-order data of the first data to be Booth-encoded, and the high-order target code output port outputs the first high-order target code obtained by Booth-encoding that high-order data.
In one embodiment, the low-order partial product acquisition unit includes a low-order target code input port, a gated value input port, a data input port, and a low-order partial product output port. The low-order target code input port receives the low-order target code output by the low-order Booth encoding unit, the gated value input port receives the values of the sign-extended low-order partial product gated by the low-order selector group unit, the data input port receives the second data, and the low-order partial product output port outputs the sign-extended low-order partial product.
In one embodiment, the high-order partial product acquisition unit includes a high-order target code input port, a gated value input port, a data input port, and a high-order partial product output port. The high-order target code input port receives the high-order target code output by the high-order Booth encoding unit, the gated value input port receives the values of the sign-extended high-order partial product gated by the high-order selector group unit, the data input port receives the second data, and the high-order partial product output port outputs the sign-extended high-order partial product.
In one embodiment, the selector includes a function selection mode signal input port, a first gated value input port, a second gated value input port, and a gated result output port. The function selection mode signal input port receives the function selection mode signal corresponding to the current mode of data operation, the first gated value input port receives a first gated value, the second gated value input port receives a second gated value, and the gated result output port 1113d outputs whichever of the first and second gated values is selected.
In one embodiment, the low-order selector group unit includes low-order selectors that gate values in the sign-extended low-order partial product.
In one embodiment, the high-order selector group unit includes high-order selectors that gate values in the sign-extended high-order partial product.
In one embodiment, the first partial product selection sub-circuit includes a function selection mode signal input port, a first partial product input port, a second partial product input port, a first partial product output port, and a gated partial product output port. The function selection mode signal input port receives the function selection mode signal; the first partial product input port receives the sign-extended first partial product from the first modified encoding sub-circuit; the second partial product input port receives the sign-extended second partial product exchanged by the partial product exchange circuit; the first partial product output port outputs the sign-extended first partial product that is to be exchanged by the partial product exchange circuit; and the gated partial product output port outputs the gated sign-extended first partial product together with the sign-extended second partial product.
In one embodiment, the second multiplication circuit includes a second modified encoding sub-circuit, a second partial product selection sub-circuit, and a second modified compression sub-circuit. The output of the second modified encoding sub-circuit is connected to the first input of the second partial product selection sub-circuit, the second input of the second partial product selection sub-circuit is connected to the first output of the partial product exchange circuit, and the output of the second partial product selection sub-circuit is connected to the first input of the second modified compression sub-circuit;
the second modified encoding sub-circuit performs Booth encoding on the received second data to obtain a sign-extended second partial product; the second partial product selection sub-circuit receives the sign-extended first partial product output by the partial product exchange circuit, selects from the sign-extended second partial product, and passes the received sign-extended first partial product together with the selected sign-extended second partial product to the second modified compression sub-circuit as the second partial product of the target code; and the second modified compression sub-circuit accumulates the second partial product of the target code.
In one embodiment, the first modified compression sub-circuit includes a modified Wallace tree group circuit and an accumulation circuit, the output of the modified Wallace tree group circuit being connected to the input of the accumulation circuit. The modified Wallace tree group circuit accumulates the values in each column of the first low-order partial product of the target code and of the first high-order partial product of the target code obtained in the different modes of data operation, yielding an accumulation result, and the accumulation circuit adds the accumulation results.
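The column-wise accumulation performed by a Wallace tree followed by a final adder can be modelled at the word level with 3:2 carry-save compressors. The Python sketch below is a generic Wallace-style reduction, not the patent's modified Wallace tree group circuit: it repeatedly compresses groups of three rows into a sum row and a carry row until at most two rows remain, then adds them once, with ordinary addition standing in for the accumulation circuit. The row width and function names are assumptions.

```python
def compress_3_to_2(a, b, c, width):
    """Full adders applied to every column at once: a + b + c == sum + carry (mod 2**width)."""
    mask = (1 << width) - 1
    partial_sum = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return partial_sum & mask, carry & mask


def wallace_reduce(rows, width):
    """Reduce any number of rows to at most two using layers of 3:2 compressors."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3
        next_rows = []
        for k in range(0, full, 3):
            next_rows.extend(compress_3_to_2(rows[k], rows[k + 1], rows[k + 2], width))
        next_rows.extend(rows[full:])  # rows left over in this layer pass straight through
        rows = next_rows
    return rows


def final_add(rows, width):
    """Single carry-propagate addition of the remaining rows (the accumulation step)."""
    return sum(rows) & ((1 << width) - 1)
```

No carry ripples through the compressor layers, so their delay grows only logarithmically with the number of rows; the one slow carry-propagate addition is deferred to the end, which is the usual reason for pairing a Wallace tree with a single final adder.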
In one embodiment, the modified Wallace tree group circuit includes a low-order Wallace tree sub-circuit, a selector, and a high-order Wallace tree sub-circuit. The output of the low-order Wallace tree sub-circuit is connected to the input of the selector, and the output of the selector is connected to the input of the high-order Wallace tree sub-circuit. The low-order Wallace tree sub-circuit accumulates the values in each column of the first partial product of the target code to obtain an accumulation result, the selector gates the carry input signal received by the high-order Wallace tree sub-circuit, and the high-order Wallace tree sub-circuit likewise accumulates the values in each column of the first partial product of the target code to obtain an accumulation result.
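The selector between the low-order and high-order Wallace tree sub-circuits gates the carry that crosses the boundary between the two halves. As a simplified illustration, the sketch below applies the same idea to a plain adder split into two n-bit lanes. It assumes, as an interpretation, that the carry is passed through when the halves form one 2n-bit result and blocked when they hold two independent n-bit results; `split_add` and `full_width` are hypothetical names.

```python
def split_add(a, b, n, full_width):
    """Add two 2n-bit words as two n-bit lanes, gating the carry between the lanes."""
    mask = (1 << n) - 1
    low_sum = (a & mask) + (b & mask)
    carry_out = low_sum >> n
    # Hypothetical selector: forward the low lane's carry only in the 2n-bit mode.
    carry_in = carry_out if full_width else 0
    high_sum = ((a >> n) & mask) + ((b >> n) & mask) + carry_in
    return ((high_sum & mask) << n) | (low_sum & mask)
```

For example, `split_add(0x00FF, 0x0001, 8, True)` gives `0x0100` (one 16-bit sum), while `split_add(0x00FF, 0x0001, 8, False)` gives `0x0000` (two independent 8-bit sums, each wrapping). A single gated carry is what lets one piece of hardware serve both widths.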
In one embodiment, the accumulation circuit includes a carry adder that adds the accumulation results.
A data processing method comprises:
receiving data to be processed and a function selection mode signal, the function selection mode signal indicating which mode of data operation the data processor is currently to process;
encoding the data to be processed according to the function selection mode signal to obtain a target code;
obtaining a sign-extended partial product from the target code and the data to be processed;
obtaining a partial product of the target code according to the function selection mode signal and the sign-extended partial product; and
compressing the partial product of the target code to obtain a target operation result.
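The five method steps can be traced for the multiplication modes with a short Python sketch. It assumes the example mode encoding given in the detailed description (00 for N-bit multiplication, 10 for 2N-bit multiplication), radix-4 Booth encoding, and a plain modular sum in place of the compression hardware; the constant `N = 8` and all function names are hypothetical. Step 4 simply passes the partial products through, which corresponds to the case in which no exchange is needed.

```python
N = 8  # hypothetical base bit width


def to_signed(value, width):
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_radix4_digits(multiplier, n):
    bits = multiplier & ((1 << n) - 1)
    digits, prev = [], 0
    for i in range(0, n, 2):
        b0, b1 = (bits >> i) & 1, (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def process(multiplicand, multiplier, mode):
    # Step 1: the function selection mode signal selects the operand width.
    if mode == 0b00:
        n = N
    elif mode == 0b10:
        n = 2 * N
    else:
        raise ValueError("this sketch covers only the multiplication modes 00 and 10")
    width = 2 * n
    mask = (1 << width) - 1
    # Step 2: encode the multiplier to obtain the target code (Booth digits).
    target_code = booth_radix4_digits(multiplier, n)
    # Step 3: sign-extended partial products from the target code and the multiplicand.
    partial_products = [((d * multiplicand) << (2 * i)) & mask
                        for i, d in enumerate(target_code)]
    # Step 4: no exchange needed, so the partial products of the target code are unchanged.
    target_partial_products = partial_products
    # Step 5: compress (modelled as a modular sum) to obtain the target operation result.
    return to_signed(sum(target_partial_products), width)
```

For instance, `process(-7, 5, 0b00)` returns `-35` and `process(1000, -300, 0b10)` returns `-300000`.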
In one embodiment, encoding the data to be processed according to the function selection mode signal to obtain the target code includes: determining, from the function selection mode signal, the specific mode of data operation the data processor is currently to process; and Booth-encoding the data to be processed according to that mode to obtain the target code.
In one embodiment, obtaining the sign-extended partial product from the target code and the data to be processed includes:
obtaining a sign-extended first partial product from a first target code and the data to be processed; and
obtaining a sign-extended second partial product from a second target code and the data to be processed.
In one embodiment, obtaining the partial product of the target code according to the function selection mode signal and the sign-extended partial product includes:
determining, from the function selection mode signal, the mode of data operation the data processor is currently to process;
determining, according to that mode, whether the sign-extended partial products need to be exchanged; and
if no exchange is needed, using the sign-extended partial product as the partial product of the target code.
In one embodiment, determining according to the mode whether the sign-extended partial products need to be exchanged includes determining whether the sign-extended first partial product and the sign-extended second partial product need to be exchanged.
In one embodiment, the method further includes: if an exchange is needed, exchanging the sign-extended partial products.
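How exchanging partial products can remove the separate accumulation step can be sketched as follows. The model is an interpretation under stated assumptions, not the patent's circuit: each multiplication circuit produces radix-4 Booth partial products for its own operand pair; when an exchange is needed (assumed here to be the multiply-accumulate modes, that is, mode values whose low bit is 1 in the example encoding 01/11), the second circuit's sign-extended partial products are routed to the first circuit, which compresses both sets together and directly yields a*b + c*d. The operand width `n` is a parameter rather than being decoded from the mode, and one guard bit is added to the accumulation width so the sum cannot overflow; the patent text here does not say how overflow is handled. All names are hypothetical.

```python
def to_signed(value, width):
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def partial_products(multiplicand, multiplier, n, width):
    """Radix-4 Booth partial products of an n-bit multiplier, sign-extended to `width` bits."""
    bits = multiplier & ((1 << n) - 1)
    mask = (1 << width) - 1
    pps, prev = [], 0
    for i in range(0, n, 2):
        b0, b1 = (bits >> i) & 1, (bits >> (i + 1)) & 1
        digit = -2 * b1 + b0 + prev
        pps.append(((digit * multiplicand) << i) & mask)  # digit weight is 4**(i/2) == 2**i
        prev = b1
    return pps


def needs_exchange(mode):
    # Assumption: the multiply-accumulate modes (01 and 11 in the example encoding) exchange.
    return bool(mode & 0b01)


def two_circuits(a, b, c, d, mode, n=8):
    """Return (first result, second result); in exchange modes the second is None."""
    width = 2 * n + 1  # one guard bit so a*b + c*d always fits
    first = partial_products(a, b, n, width)   # first multiplication circuit
    second = partial_products(c, d, n, width)  # second multiplication circuit
    if needs_exchange(mode):
        # The exchange circuit routes the second circuit's partial products to the first,
        # whose compression step then accumulates both sets in one pass.
        return to_signed(sum(first + second), width), None
    return to_signed(sum(first), width), to_signed(sum(second), width)
```

`two_circuits(3, 4, 5, 6, 0b01)` returns `(42, None)`, while `two_circuits(3, 4, 5, 6, 0b00)` returns `(12, 30)`. Because the products are never formed separately in the exchange case, the extra adder pass that the Background section attributes to the prior art disappears.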
In the data processor and method described above, the first multiplication circuit and the second multiplication circuit each Booth-encode the received data to obtain sign-extended partial products. The partial product exchange circuit exchanges these sign-extended partial products between the first and second multiplication circuits, which then determine the first partial product of the target code and the second partial product of the target code, respectively, and compress them to obtain the target operation results. The data processor can therefore perform both multiplication and multiply-accumulate operations, which improves its versatility. In addition, because it completes a multiply-accumulate operation without separately accumulating the multiplication results, a multiplication or a multiply-accumulate operation is carried out in a single pass, which reduces the power consumption of the data processor.
An embodiment of the application provides a machine learning operation device comprising one or more data processors. The machine learning operation device acquires data to be operated on and control information from other processing devices, performs the specified machine learning operation, and transmits the execution result to the other processing devices through an I/O interface;
when the machine learning operation device comprises a plurality of data processors, the data processors are connected through a preset specific structure and transmit data to one another;
the data processors are interconnected through a PCIe bus and transmit data to support larger-scale machine learning operations; the data processors may share one control system or have their own control systems, and may share a memory or have their own memories; and they may be interconnected in any topology.
An embodiment of the application provides a combined processing device comprising the machine learning operation device, a universal interconnection interface, and other processing devices. The machine learning operation device interacts with the other processing devices to jointly complete the operation specified by the user. The combined processing device may further include a storage device, connected to both the machine learning operation device and the other processing devices, for storing their data.
The neural network chip provided in the embodiments of the present application includes the data processor, the machine learning operation device, or the combined processing device.
The embodiment of the application provides a neural network chip packaging structure, which comprises the neural network chip.
The embodiment of the application provides a board card, which comprises the neural network chip packaging structure.
The embodiment of the application provides an electronic device, which comprises the neural network chip or the board card.
The chip provided by the embodiment of the application comprises at least one data processor as described in any one of the above.
The electronic device provided by the embodiment of the application comprises the chip.
Drawings
FIG. 1 is a schematic diagram of a circuit structure of a data processor according to an embodiment;
FIG. 2 is a schematic circuit diagram of another data processor according to another embodiment;
FIG. 3 is a schematic diagram of a data processor according to an embodiment;
FIG. 4a is a schematic diagram showing a distribution rule of partial products obtained by a 16-bit data multiplication operation according to an embodiment;
FIG. 4b is a schematic diagram showing a distribution rule of partial products obtained by a 16-bit × 8-bit data multiply-accumulate operation according to an embodiment;
FIG. 5 is a specific circuit configuration diagram of a data processor according to another embodiment;
FIG. 6 is a flow chart of a data processing method according to an embodiment;
FIG. 7 is a circuit diagram showing a compression circuit for 8-bit data operation according to another embodiment;
FIG. 8 is a flowchart illustrating another data processing method according to an embodiment;
FIG. 9 is a block diagram of a combination processing apparatus according to an embodiment;
FIG. 10 is a block diagram of another combination processing apparatus according to one embodiment;
FIG. 11 is a schematic structural diagram of a board according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The data processor provided by the application can be applied to an AI chip, a field-programmable gate array (FPGA) chip, or other hardware circuit devices that perform multiplication or multiply-accumulate operations. Schematic structural diagrams of the data processor are shown in FIG. 1 and FIG. 2.
Referring now to FIG. 1, FIG. 1 is a block diagram of a data processor according to one embodiment. As shown in FIG. 1, the data processor includes a first multiplication circuit 11, a second multiplication circuit 12, and a partial product exchange circuit 13. The first multiplication circuit 11 includes a first encoding branch 11a, a first selection branch 11b, and a first compression branch 11c, and the second multiplication circuit 12 includes a second encoding branch 12a, a second selection branch 12b, and a second compression branch 12c. The output of the first multiplication circuit 11 is connected to the first input of the partial product exchange circuit 13, the first output of the partial product exchange circuit 13 is connected to the first input of the second multiplication circuit 12, the output of the second multiplication circuit 12 is connected to the second input of the partial product exchange circuit 13, and the second output of the partial product exchange circuit 13 is connected to the input of the first multiplication circuit 11.
The first encoding branch 11a encodes the received first data to obtain a sign-extended first partial product; the first selection branch 11b selects the first partial product of the target code from the sign-extended first partial product; the first compression branch 11c compresses the first partial product of the target code to obtain a first target operation result; the second encoding branch 12a encodes the received second data to obtain a sign-extended second partial product; the second selection branch 12b selects the second partial product of the target code from the sign-extended second partial product; the second compression branch 12c compresses the second partial product of the target code to obtain a second target operation result; and the partial product exchange circuit 13 exchanges the sign-extended first partial product and the sign-extended second partial product.
Specifically, the first multiplication circuit 11 and the second multiplication circuit 12 may each receive one piece of data, which may include two sub-data. The two sub-data may have the same bit width or different bit widths, and serve as the multiplicand and the multiplier in a multiplication or multiply-accumulate operation. Optionally, the two sub-data in the first data and in the second data may be spliced together and input to the first multiplication circuit 11 or the second multiplication circuit 12 as a whole, or may be input separately and simultaneously. The sub-data may be fixed-point numbers with a bit width of 2N, in which case the spliced data is 4N bits wide. Optionally, the first multiplication circuit 11 may include a plurality of data processing units with different functions; these may be units that perform binary encoding or units that perform other operations, which is not limited in this embodiment. When the data processor performs a given data operation, the first multiplication circuit 11 or the second multiplication circuit 12 uses one of the sub-data as the multiplicand and the other as the multiplier. It will also be appreciated that, whether the data processor is currently processing a multiplication or a multiply-accumulate operation, the bit width of the sign-extended first partial product and of the sign-extended second partial product may each be twice the bit width of the multiplicand.
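The splicing of two 2N-bit sub-data into one 4N-bit input, and the resulting partial product width, can be illustrated with a small Python sketch. Which sub-data occupies the upper half, the choice N = 8, and the function names are assumptions made for illustration; the sketch also assumes two's-complement fixed-point sub-data.

```python
def to_signed(value, width):
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def splice(upper, lower, sub_width):
    """Pack two signed sub-data of `sub_width` bits into one 2*sub_width-bit word."""
    mask = (1 << sub_width) - 1
    return ((upper & mask) << sub_width) | (lower & mask)


def unsplice(word, sub_width):
    """Recover the two signed sub-data from a spliced word."""
    return to_signed(word >> sub_width, sub_width), to_signed(word, sub_width)


N = 8
sub_width = 2 * N                      # each sub-datum is 2N = 16 bits wide
word = splice(-300, 1000, sub_width)   # one 4N = 32-bit input
multiplier, multiplicand = unsplice(word, sub_width)
partial_product_width = 2 * sub_width  # twice the multiplicand width
```

Masking each sub-datum before shifting keeps a negative lower value from smearing its sign bits into the upper half, and `to_signed` restores the sign on the way out.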
The first partial product after the sign bit expansion may include a first low-order partial product after the sign bit expansion and a first high-order partial product after the sign bit expansion, and the second partial product after the sign bit expansion may include a second low-order partial product after the sign bit expansion and a second high-order partial product after the sign bit expansion.
Optionally, the first multiplication circuit 11 and the second multiplication circuit 12 each include a first input terminal for receiving a function selection mode signal, and the partial product exchange circuit 13 includes a third input terminal for receiving the same signal. Optionally, the function selection mode signal indicates which mode of data operation the data processor is currently to perform.
It should be noted that, during a given data operation, the first multiplication circuit 11, the second multiplication circuit 12 and the partial product exchange circuit 13 may all receive the same function selection mode signal. There may be four function selection mode signals, corresponding to four modes of data operation that the data processor can perform: multiplication of N-bit data, multiply-accumulate of N-bit data, multiplication of 2N-bit data and multiply-accumulate of 2N-bit data. For example, if the first data and the second data each include two 2N-bit sub-data, the data processor may determine, according to the received function selection mode signal, which mode of data operation it currently needs to perform. The four function selection mode signals may be represented by the binary values 00, 01, 10 and 11, or in other ways, which is not limited in this embodiment. For example, mode=00 may indicate that the data processor is to perform multiplication of N-bit data, mode=01 multiply-accumulate of N-bit data, mode=10 multiplication of 2N-bit data, and mode=11 multiply-accumulate of 2N-bit data. Any other correspondence between the four function selection mode signals and the four modes of data operation may also be used, which is not limited in this embodiment.
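The example mode encoding above can be written as a small decoding table. This is a sketch only: `Mode` and `decode_mode` are hypothetical names, and the 00/01/10/11 assignment is just the example correspondence given in the text, which explicitly allows any other mapping.

```python
from typing import NamedTuple


class Mode(NamedTuple):
    operand_bits: int  # width of the multiplicand and multiplier in this mode
    accumulate: bool   # True: multiply-accumulate, False: plain multiplication


def decode_mode(mode: int, n: int) -> Mode:
    """Decode a 2-bit function selection mode signal (example mapping only)."""
    table = {
        0b00: Mode(n, False),      # multiplication of N-bit data
        0b01: Mode(n, True),       # multiply-accumulate of N-bit data
        0b10: Mode(2 * n, False),  # multiplication of 2N-bit data
        0b11: Mode(2 * n, True),   # multiply-accumulate of 2N-bit data
    }
    if mode not in table:
        raise ValueError(f"unknown mode signal: {mode:#b}")
    return table[mode]


# With N = 8 the processor handles 8-bit or 16-bit operands.
print(decode_mode(0b11, 8))  # Mode(operand_bits=16, accumulate=True)
```

Because every circuit receives the same signal, a single decode like this would drive the encoding, exchange and compression stages consistently.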
In addition, when the data processor performs a multiply-accumulate operation on 2N-bit data, the partial product exchange circuit 13 may, according to actual requirements, exchange the first low-order partial product after sign bit expansion or the first high-order partial product after sign bit expansion obtained by the first multiplication circuit 11 with the second low-order partial product after sign bit expansion or the second high-order partial product after sign bit expansion obtained by the second multiplication circuit 12. When the data processor performs data operations in the other three modes, the partial product exchange circuit 13 is idle, and the low-order and high-order partial products after sign bit expansion are not exchanged. Meanwhile, the two sub-data in each of the first data and the second data are 2N bits wide. If the data processor is currently to perform a multiplication of N-bit data, then, according to actual requirements, either one of the first data and the second data is 0 and, in the other, the high-order halves (or the low-order halves) of both sub-data are 0, or the first data and the second data are operated on as originally received. If the data processor is currently to perform a multiplication of 2N-bit data, then, according to actual requirements, one of the first data and the second data is 0, and the high-order and low-order halves of both sub-data in the other data are non-zero. If the data processor is currently to perform two 2N-bit by 2N-bit multiplications, neither the first data nor the second data is set to 0.
In the data processor provided in this embodiment, the first multiplication circuit and the second multiplication circuit perform Booth encoding on the received data to obtain partial products after sign bit expansion; the partial product exchange circuit exchanges these partial products between the two multiplication circuits as required, so that the first partial product and the second partial product of the target encoding are determined; and these are then compressed to obtain the target operation results. The data processor can therefore perform both multiplication and multiply-accumulate operations, which improves its versatility. In addition, the data processor can complete a multiply-accumulate operation without separately accumulating multiplication results: either operation is performed in a single pass, which reduces the power consumption of the data processor.
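The single-pass multiply-accumulate idea can be modeled in a few lines of Python, assuming the radix-4 Booth encoding used throughout this description: instead of forming a×b and c×d and then adding the two results, all Booth partial products of both products are pooled and reduced once. The helper names are hypothetical, and Python's built-in `sum` stands in for the hardware compression stage.

```python
def booth_digits(y: int, width: int) -> list[int]:
    """Radix-4 Booth digits (-2..2) of a width-bit two's-complement multiplier y."""
    y &= (1 << width) - 1
    padded = y << 1  # bit-fill: append y_{-1} = 0 below the LSB
    digits = []
    for i in range(width // 2):
        group = (padded >> (2 * i)) & 0b111  # bits y_{2i+1} y_{2i} y_{2i-1}
        digits.append(-2 * (group >> 2) + ((group >> 1) & 1) + (group & 1))
    return digits


def weighted_partial_products(x: int, y: int, width: int) -> list[int]:
    """Partial products PP_i = digit_i * x, each scaled by its weight 4**i."""
    return [d * x * 4 ** i for i, d in enumerate(booth_digits(y, width))]


def multiply_accumulate(a: int, b: int, c: int, d: int, width: int) -> int:
    """a*b + c*d computed by reducing all partial products in one pass."""
    pool = weighted_partial_products(a, b, width) + weighted_partial_products(c, d, width)
    return sum(pool)  # stands in for a single compression (adder-tree) stage


print(multiply_accumulate(-7, 13, 25, -4, 8))  # -191
```

In hardware the pooled partial products would typically be reduced by carry-save compressors followed by one carry-propagate adder; the sketch only shows that the accumulation is absorbed into the same reduction.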
As shown in fig. 2, fig. 2 is a schematic structural diagram of a data processor according to another embodiment, where the data processor includes a booth encoding circuit 21, a first partial product acquiring circuit 22, a second partial product acquiring circuit 23, a first compressing circuit 24, and a second compressing circuit 25; the output end of the booth encoding circuit 21 is connected to the first input end of the first partial product obtaining circuit 22, the output end of the first partial product obtaining circuit 22 is connected to the first input end of the first compression circuit 24, the output end of the booth encoding circuit 21 is also connected to the first input end of the second partial product obtaining circuit 23, and the output end of the second partial product obtaining circuit 23 is connected to the first input end of the second compression circuit 25.
The booth encoding circuit 21 is configured to perform booth encoding processing on received first data to obtain a target encoding, the first partial product obtaining circuit 22 is configured to receive second data and obtain a first partial product of the target encoding according to the target encoding, the second partial product obtaining circuit 23 is configured to receive second data and obtain a second partial product of the target encoding according to the target encoding, the first compression circuit 24 is configured to perform accumulation processing on the first partial product of the target encoding to obtain a first target operation result, and the second compression circuit 25 is configured to perform accumulation processing on the second partial product of the target encoding to obtain a second target operation result.
Specifically, the first data and the second data may each include two sub-data. The two sub-data in the first data may serve as multipliers, and the two sub-data in the second data as multiplicands, in the multiplication or multiply-accumulate operation performed by the data processor; the multiplier and the multiplicand may each be 2N bits wide. The two sub-data in the first data may be spliced and input to the Booth encoding circuit 21 as a whole, or may be input to it separately; likewise, the two sub-data in the second data may be spliced and input to the first partial product acquiring circuit 22 and the second partial product acquiring circuit 23 as a whole, or may be input separately. Optionally, the first partial product acquiring circuit 22 and the second partial product acquiring circuit 23 may both receive the target codes output by the Booth encoding circuit 21, and each obtain the partial products of the corresponding target codes according to the received second data. Before Booth encoding, the Booth encoding circuit 21 may automatically perform bit-filling on the two sub-data in the received first data, that is, append a bit of value 0 below the least significant bit of each sub-data. For example, if one sub-data (i.e., a multiplier) is $y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0$, the Booth encoding circuit 21 may, before Booth encoding, bit-fill it into $y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0 0$.
Optionally, the number of target codes may be equal to half the bit width of the sub-data (i.e., the multiplier) currently being processed by the data processor.
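The bit-filling and grouping just described can be sketched as follows. The helper `booth_groups` is hypothetical and assumes radix-4 Booth encoding with an even multiplier width; it returns the overlapping three-bit groups, lowest group first, and their count is half the multiplier width, matching the number of target codes stated above.

```python
def booth_groups(y: int, width: int) -> list[str]:
    """Bit-fill a width-bit multiplier and split it into overlapping 3-bit Booth groups.

    Group i is the string y_{2i+1} y_{2i} y_{2i-1}, returned lowest group first;
    adjacent groups share one bit, and y_{-1} is the appended fill bit 0.
    """
    if width % 2:
        raise ValueError("radix-4 Booth grouping assumes an even bit width")
    bits = format(y & ((1 << width) - 1), f"0{width}b") + "0"  # MSB first, fill bit last
    # bits[k] holds y_{width-1-k}; the final character is y_{-1}
    groups = []
    for i in range(width // 2):
        start = width - 2 - 2 * i  # index of y_{2i+1}
        groups.append(bits[start:start + 3])
    return groups


groups = booth_groups(0b10110110, 8)
print(groups)       # ['100', '011', '110', '101']
print(len(groups))  # 4, i.e. half of the 8-bit width
```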
It should be noted that, if the bit width of the data the data processor is currently to process equals the bit width of the sub-data, the Booth encoding circuit 21 obtains two sets of target codes, one for each of the two sub-data, after Booth encoding. The Booth encoding circuit 21 may input both sets, or one set, of target codes to the first partial product acquiring circuit 22, and both sets, or the other set, to the second partial product acquiring circuit 23. In other words, the first partial product acquiring circuit 22 and the second partial product acquiring circuit 23 may each receive one or two sets of target codes together with the second data (i.e., the multiplicand) and, according to actual requirements, obtain the partial products of one set of target codes and one sub-data of the second data. The partial products of the target codes may include low-order partial products corresponding to the low-order data in the sub-data and high-order partial products corresponding to the high-order data in the sub-data. The first compression circuit 24 may accumulate the partial products of the two sets of target codes obtained by the first partial product acquiring circuit 22 (i.e., the first partial product of the target codes), and the second compression circuit 25 may accumulate the partial products of the two sets of target codes obtained by the second partial product acquiring circuit 23 (i.e., the second partial product of the target codes), thereby obtaining the target operation results of the multiplication. In this embodiment, the sub-data included in the first data and the second data received by the data processor are 2N bits wide.
Optionally, the Booth encoding circuit 21 includes a first input terminal for receiving a function selection mode signal; the first partial product acquiring circuit 22 and the second partial product acquiring circuit 23 each include a second input terminal for receiving the function selection mode signal; and the first compression circuit 24 and the second compression circuit 25 each include a second input terminal for receiving the function selection mode signal. Optionally, the function selection mode signal indicates which mode of data operation the data processor is currently to perform.
It will be appreciated that there may be four function selection mode signals (modes), corresponding to four modes of data operation that the data processor can perform. Optionally, during a given data operation, the Booth encoding circuit 21, the first partial product acquiring circuit 22, the second partial product acquiring circuit 23, the first compression circuit 24 and the second compression circuit 25 all receive the same function selection mode signal. The four signals may be represented in binary as mode=00, mode=01, mode=10 and mode=11, and the four corresponding modes of data operation may be multiplication of N-bit data, multiply-accumulate of N-bit data, multiplication of 2N-bit data and multiply-accumulate of 2N-bit data. According to the received function selection mode signal, the first partial product acquiring circuit 22 and the second partial product acquiring circuit 23 may each accept one set or two sets of target codes from the Booth encoding circuit 21 for the operation.
In the data processor provided by this embodiment, the Booth encoding circuit encodes the received first data to obtain target codes; the first partial product acquiring circuit and the second partial product acquiring circuit each obtain the partial products of the corresponding target codes according to the received second data and the target codes; and the first compression circuit and the second compression circuit each accumulate these partial products. The data processor can therefore perform both multiplication and multiply-accumulate operations, which improves its versatility. In addition, it can complete a multiply-accumulate operation without separately accumulating multiplication results, performing either operation in a single pass, which reduces its power consumption.
Fig. 3 is a schematic diagram of a specific structure of a data processor according to another embodiment. The first multiplication circuit 11 in the data processor includes a first correction coding sub-circuit 111, a first partial product selection sub-circuit 112 and a first correction compression sub-circuit 113. The output terminal of the first correction coding sub-circuit 111 is connected to a first input terminal of the first partial product selection sub-circuit 112, a second input terminal of the first partial product selection sub-circuit 112 is connected to a second output terminal of the partial product exchange circuit 13, and the output terminal of the first partial product selection sub-circuit 112 is connected to a first input terminal of the first correction compression sub-circuit 113. The first correction coding sub-circuit 111 is configured to perform Booth encoding on the received first data to obtain the first partial product after sign bit expansion. The first partial product selection sub-circuit 112 is configured to receive the second partial product after sign bit expansion output by the partial product exchange circuit 13, to select from the first partial product after sign bit expansion, and to input the received second partial product after sign bit expansion together with the selected first partial product after sign bit expansion to the first correction compression sub-circuit 113 as the first partial product of the target code. The first correction compression sub-circuit 113 is configured to accumulate the first partial product of the target code.
Specifically, the first correction coding sub-circuit 111 may include a plurality of data processing units with different functions. Optionally, the first correction coding sub-circuit 111 may perform Booth encoding on the received first data; that is, it may Booth-encode the received multiplier to obtain a first target code, and obtain the first partial product after sign bit expansion according to the received multiplicand and the first target code. When the data processor is performing a multiplication, the bit width of the first partial product after sign bit expansion may be twice the bit width of the multiplicand. For example, suppose the first correction coding sub-circuit 111 receives 16-bit data. If the data processor is currently to perform a multiplication of 8-bit data, the first correction coding sub-circuit 111 divides the 16-bit data into an upper 8-bit group and a lower 8-bit group and processes each separately, and the first partial product after sign bit expansion is 16 bits wide. If the data processor is currently to perform a multiplication of 16-bit data, the first correction coding sub-circuit 111 operates on the whole 16-bit data, and the partial product after sign bit expansion is 32 bits wide.
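The 16-bit example can be modeled as follows. This sketch uses hypothetical names and assumes radix-4 Booth encoding; it treats a 16-bit word either as two independent 8-bit lanes, whose partial products are sign-extended to 16 bits, or as one 16-bit operand, whose partial products are sign-extended to 32 bits. Keeping every partial product at twice the operand width lets the reduction discard carries out of the top bit and still yield the exact signed product.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as two's complement."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_digits(y: int, width: int) -> list[int]:
    """Radix-4 Booth digits of a width-bit multiplier, with fill bit y_{-1} = 0."""
    padded = (y & ((1 << width) - 1)) << 1
    return [-2 * ((padded >> (2 * i + 2)) & 1) + ((padded >> (2 * i + 1)) & 1)
            + ((padded >> (2 * i)) & 1) for i in range(width // 2)]


def multiply_sign_extended(x: int, y: int, width: int) -> int:
    """x*y using partial products sign-extended to 2*width bits."""
    out_width = 2 * width
    mask = (1 << out_width) - 1
    total = 0
    for i, d in enumerate(booth_digits(y, width)):
        pp = (d * x) & mask                       # sign-extended partial product
        total = (total + (pp << (2 * i))) & mask  # weight 4**i; drop carries past 2*width bits
    return to_signed(total, out_width)


def multiply_word(x_word: int, y_word: int, op_bits: int, word_bits: int = 16) -> list[int]:
    """Multiply two words lane by lane; op_bits == word_bits gives one whole-word product."""
    results = []
    for lane in range(word_bits // op_bits):  # lane 0 holds the low-order bits
        x = to_signed(x_word >> (lane * op_bits), op_bits)
        y = to_signed(y_word >> (lane * op_bits), op_bits)
        results.append(multiply_sign_extended(x, y, op_bits))
    return results


print(multiply_word(0xFB07, 0x0CFD, 8))   # [-21, -60]
print(multiply_word(0xFB07, 0x0CFD, 16))  # [-4232725]
```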
Optionally, the second multiplication circuit 12 includes a second correction coding sub-circuit 121, a second partial product selection sub-circuit 122 and a second correction compression sub-circuit 123. The output terminal of the second correction coding sub-circuit 121 is connected to a first input terminal of the second partial product selection sub-circuit 122, a second input terminal of the second partial product selection sub-circuit 122 is connected to a first output terminal of the partial product exchange circuit 13, and the output terminal of the second partial product selection sub-circuit 122 is connected to a first input terminal of the second correction compression sub-circuit 123. The second correction coding sub-circuit 121 is configured to perform Booth encoding on the received second data to obtain the second partial product after sign bit expansion. The second partial product selection sub-circuit 122 is configured to receive the first partial product after sign bit expansion output by the partial product exchange circuit 13, to select from the second partial product after sign bit expansion, and to input the received first partial product after sign bit expansion together with the selected second partial product after sign bit expansion to the second correction compression sub-circuit 123 as the second partial product of the target code. The second correction compression sub-circuit 123 is configured to accumulate the second partial product of the target code.
It should be noted that the second multiplication circuit 12 processes data in substantially the same way as the first multiplication circuit 11, so its processing is not described in detail in this embodiment.
In the data processor provided by this embodiment, the correction coding sub-circuit encodes the received first data to obtain the first partial product after sign bit expansion; the first partial product selection sub-circuit selects the first partial product of the target code according to the data operation mode currently being processed; and the correction compression sub-circuit accumulates the first partial product of the target code to obtain the target operation result. The data processor can thus complete a multiply-accumulate operation without accumulating the multiplication result again, performing multiply-accumulate or multiplication in a single pass, which reduces its power consumption.
As one embodiment, the first correction coding sub-circuit 111 in the data processor includes a low-order Booth encoding unit 1111, a low-order partial product acquisition unit 1112, a selector 1113, a high-order Booth encoding unit 1114, a high-order partial product acquisition unit 1115, a low-order selector group unit 1116 and a high-order selector group unit 1117. A first output terminal of the low-order Booth encoding unit 1111 is connected to the input terminal of the selector 1113, and a second output terminal of the low-order Booth encoding unit 1111 is connected to a first input terminal of the low-order partial product acquisition unit 1112; the output terminal of the selector 1113 is connected to a first input terminal of the high-order Booth encoding unit 1114; the output terminal of the high-order Booth encoding unit 1114 is connected to a first input terminal of the high-order partial product acquisition unit 1115; the output terminal of the low-order selector group unit 1116 is connected to a second input terminal of the low-order partial product acquisition unit 1112; and the output terminal of the high-order selector group unit 1117 is connected to a second input terminal of the high-order partial product acquisition unit 1115.
The low-order Booth encoding unit 1111 is configured to perform Booth encoding on the low-order data in the received first data to obtain a first low-order target code; the low-order partial product acquisition unit 1112 is configured to obtain the first low-order partial product after sign bit expansion according to the first low-order target code; the selector 1113 is configured to gate the fill bit value used when the high-order data in the first data is Booth-encoded; the high-order Booth encoding unit 1114 is configured to perform Booth encoding on the high-order data in the received first data together with the fill bit value to obtain a first high-order target code; the high-order partial product acquisition unit 1115 is configured to obtain the first high-order partial product after sign bit expansion according to the first high-order target code; the low-order selector group unit 1116 is configured to gate values in the first low-order partial product after sign bit expansion; and the high-order selector group unit 1117 is configured to gate values in the first high-order partial product after sign bit expansion.
Specifically, the first correction coding sub-circuit 111 may receive the multiplier and the multiplicand of the multiply-accumulate operation, Booth-encode the multiplier to obtain the first target code, and obtain the first partial product after sign bit expansion according to the first target code and the received multiplicand. Before Booth encoding, the low-order Booth encoding unit 1111 may automatically bit-fill the low-order data in the first data received by the first correction coding sub-circuit 111, and then Booth-encode it to obtain the first low-order target code. Here the low-order data may belong to the multiplier of a multiplication, and the low-order data in the first data may include the low-order data of both sub-data in the first data. Optionally, if the multiplier received by the first correction coding sub-circuit 111 is N bits wide, the low-order data may be its lower N/2 bits, and bit-filling appends a bit of value 0 below the least significant bit of the low-order data. For example, if the data processor is currently to perform an 8-bit by 8-bit fixed-point multiplication and the multiplier is "$y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0$", the low-order Booth encoding unit 1111 may, before Booth encoding, automatically bit-fill the multiplier into "$y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0 0$". Optionally, the number of first low-order target codes may be half the bit width of the low-order data, and equal to the number of first low-order partial products after sign bit expansion corresponding to the low-order data.
It should be noted that, regardless of whether the bit width of the data the first correction coding sub-circuit 111 is currently to process equals the bit width of the sub-data in the first data it receives, the low-order Booth encoding unit 1111 always automatically bit-fills the low-order data to be encoded before Booth encoding, and the fill value is 0.
Meanwhile, the high-order data of the multiplier received by the first correction coding sub-circuit 111 is Booth-encoded by the high-order Booth encoding unit 1114 to obtain the first high-order target code. Before this, the selector 1113 gates a value that serves as the fill bit for the high-order data; the high-order data and this fill bit are combined into bit-filled high-order data, which the high-order Booth encoding unit 1114 then Booth-encodes to obtain the first high-order target code. Optionally, the selector 1113 may be a two-way selector, and the gated value may be either 0 or the most significant bit of the low-order data (i.e., of the multiplier) in the first data. For example, suppose the two sub-data in each of the first data and the second data received by the data processor are 2N bits wide. If the data processor is currently to perform an N-bit by N-bit multiplication, the selector 1113 gates the value 0; that is, the data processor divides each received 2N-bit datum into its upper N bits and lower N bits and processes them separately. If the data processor is currently to perform a 2N-bit by 2N-bit multiplication, the selector 1113 gates the most significant bit of the low-order data of the first data, so that the data processor Booth-encodes each received 2N-bit datum as a whole. In addition, the selector 1113 may determine which fill value to gate according to the received function selection mode signal.
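The role of the selector 1113 can be checked with a small model. This sketch assumes radix-4 Booth encoding and uses hypothetical function names; the boolean `whole_word` stands in for the function selection mode signal. With fill bit 0 the two halves are encoded as independent N-bit multipliers; with the most significant bit of the low half as the fill bit, the low-order and high-order target codes together are exactly the encoding of the whole 2N-bit multiplier.

```python
def booth_digits(bits: int, width: int, fill: int = 0) -> list[int]:
    """Radix-4 Booth digits of a width-bit field whose fill bit y_{-1} is `fill`."""
    padded = ((bits & ((1 << width) - 1)) << 1) | fill
    digits = []
    for i in range(width // 2):
        g = (padded >> (2 * i)) & 0b111  # y_{2i+1} y_{2i} y_{2i-1}
        digits.append(-2 * (g >> 2) + ((g >> 1) & 1) + (g & 1))
    return digits


def encode_multiplier(word: int, n: int, whole_word: bool) -> tuple[list[int], list[int]]:
    """Encode a 2n-bit multiplier as low-order and high-order target codes."""
    low = word & ((1 << n) - 1)
    high = (word >> n) & ((1 << n) - 1)
    low_codes = booth_digits(low, n, fill=0)           # low half: fill bit is always 0
    gated = (low >> (n - 1)) & 1 if whole_word else 0  # two-way selector output
    high_codes = booth_digits(high, n, fill=gated)
    return low_codes, high_codes


def code_value(codes: list[int]) -> int:
    """Numeric value represented by a list of radix-4 Booth digits."""
    return sum(d * 4 ** i for i, d in enumerate(codes))


low, high = encode_multiplier(0xB6C3, 8, whole_word=False)
print(code_value(low), code_value(high))  # -61 -74  (two independent 8-bit multipliers)
low, high = encode_multiplier(0xB6C3, 8, whole_word=True)
print(code_value(low + high))             # -18749  (one 16-bit multiplier)
```

The low-order codes are the same in both modes; only the single gated bit changes how the high-order codes are formed.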
It should be noted that the low-order partial product acquisition unit 1112 may, according to each first low-order target code, obtain the partial product after sign bit expansion corresponding to the low-order data in the first data, using the values of the first low-order partial product after sign bit expansion gated by the low-order selector group unit 1116, thereby obtaining the first low-order partial product after sign bit expansion. Likewise, the high-order partial product acquisition unit 1115 may, according to each first high-order target code, obtain the partial product after sign bit expansion corresponding to the high-order data in the first data, using the values gated by the high-order selector group unit 1117, thereby obtaining the first high-order partial product after sign bit expansion. Optionally, in Booth encoding, the number of first low-order target codes may equal the number of first high-order target codes, and may also equal the number of first low-order partial products after sign bit expansion corresponding to the low-order data, or the number of first high-order partial products after sign bit expansion corresponding to the high-order data. Optionally, if the data processor is currently to perform an N-bit multiplication, the first correction coding sub-circuit 111 may include N/4 low-order Booth encoding units 1111 and N/4 high-order Booth encoding units 1114, and may likewise include N/4 low-order partial product acquisition units 1112 and N/4 high-order partial product acquisition units 1115.
Optionally, each low-order partial product acquisition unit 1112 and each high-order partial product acquisition unit 1115 may include 2N number generating subunits, each of which produces the value of one bit of the first low-order or first high-order partial product after sign bit expansion.
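Each number generating subunit can be modeled as producing one bit of a sign-extended partial product. In this sketch, which is an assumption since the text does not describe the subunit logic, bit j selects the corresponding bit of X, 2X or 0 and inverts it when the target code is negative; the +1 that completes the two's-complement negation is returned separately as `neg`, a common technique in which that bit is injected later in the compression stage. For an N-bit operand the loop runs 2N times, one iteration per subunit. The names `generate_pp_bits` and `assemble` are hypothetical.

```python
def generate_pp_bits(x: int, digit: int, width: int) -> tuple[list[int], int]:
    """Produce a 2*width-bit sign-extended partial product one bit at a time."""
    out_width = 2 * width
    mask = (1 << out_width) - 1
    magnitude, neg = abs(digit), int(digit < 0)  # digit is a Booth code in -2..2
    selected = (x * magnitude) & mask            # X, 2X or 0, sign-extended to out_width bits
    # One iteration per hypothetical number generating subunit: pick bit j, invert if negative.
    bits = [((selected >> j) & 1) ^ neg for j in range(out_width)]
    return bits, neg


def assemble(bits: list[int], neg: int, width: int) -> int:
    """Value of the partial product once the deferred +1 (neg) has been added."""
    mask = (1 << (2 * width)) - 1
    return (sum(b << j for j, b in enumerate(bits)) + neg) & mask


bits, neg = generate_pp_bits(-5, -2, 8)
print(len(bits), neg)             # 16 1
print(assemble(bits, neg, 8))     # 10, i.e. (-2) * (-5)
```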
In addition, the second correction coding sub-circuit 121 encodes data in the same way as the first correction coding sub-circuit 111, and the two have the same internal structure and the same external output port functions, so the data processing method and structure of the second correction coding sub-circuit 121 are not repeated in this embodiment.
According to the data processor provided by the embodiment, the low-order Booth coding unit, the selector and the high-order Booth coding unit in the correction coding circuit are used for carrying out Booth coding processing on received data to obtain the low-order target coding and the high-order target coding, the low-order partial product acquisition unit and the high-order partial product acquisition unit are used for obtaining the partial product of the target coding according to the low-order target coding and the high-order target coding, and then the partial product of the target coding is subjected to accumulation processing to obtain a target operation result.
As one embodiment, the low-order Booth encoding unit 1111 in the data processor includes a low-order data input port 1111a and a low-order target code output port 1111b. The low-order data input port 1111a is configured to receive the low-order data in the first data to be Booth-encoded, and the low-order target code output port 1111b is configured to output the first low-order target code obtained by Booth-encoding the low-order data in the first data.
Specifically, during operation, the first correction coding sub-circuit 111 may Booth-encode the multipliers (i.e., the two sub-data in the first data) of the multiplication or multiply-accumulate operation. The low-order Booth encoding unit 1111 in the first correction coding sub-circuit 111 may receive, through the low-order data input port 1111a, three adjacent bits of the low-order data of the two sub-data as one group of data to be encoded. After each low-order Booth encoding unit 1111 processes its group of data to be encoded, it outputs the resulting low-order target code through the low-order target code output port 1111b. In addition, the first low-order Booth encoding unit 1111 in the first correction coding sub-circuit 111 may receive, through the low-order data input port 1111a, the fill value 0 together with the lowest two bits of the low-order data.
For example, suppose one of the sub-data (i.e., a multiplier) received by the data processor is the 16-bit sub-data "$y_{15} y_{14} y_{13} y_{12} y_{11} y_{10} y_9 y_8 y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0$", whose bits are numbered 0 (least significant) to 15 (most significant), and the low-order Booth encoding units 1111 Booth-encode the low-order data $y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0$. Before Booth encoding, the 8-bit low-order data is bit-filled into the 9-bit data $y_7 y_6 y_5 y_4 y_3 y_2 y_1 y_0 0$, and the low-order Booth encoding units 1111 Booth-encode its four groups $y_7 y_6 y_5$, $y_5 y_4 y_3$, $y_3 y_2 y_1$ and $y_1 y_0 0$ respectively; each low-order Booth encoding unit 1111 receives the three adjacent bits of its group through its low-order data input port 1111a.
It should be noted that in each Booth encoding pass, the padded low-order data is divided into multiple groups of sub-data to be encoded, and the low-order Booth encoding units 1111 encode all of these groups simultaneously. Optionally, the groups are formed by taking every 3 adjacent bits of the padded data as one group, where the highest bit of each group also serves as the lowest bit of the next adjacent group; that shared bit acts as the padding bit for the next group's Booth encoding. Optionally, the target coding rules of the Booth encoding are given in Table 1, where $y_{2i+1}$, $y_{2i}$, and $y_{2i-1}$ denote the bits of each group of sub-data to be encoded (i.e., the multiplier), and X denotes the sub-data in the second data received by the data processor (i.e., the multiplicand). Booth-encoding each group yields the corresponding target code $PP_i$ ($i = 0, 1, 2, \ldots, n$). According to Table 1, the target codes fall into five classes, namely -2X, -X, 0, X, and 2X. For example, if the multiplicand received by the data processor is $x_7x_6x_5x_4x_3x_2x_1x_0$, then X is $x_7x_6x_5x_4x_3x_2x_1x_0$.
TABLE 1
Illustratively, continuing the above example, when i = 0, $y_{2i+1} = y_1$, $y_{2i} = y_0$, and $y_{2i-1} = y_{-1}$, where $y_{-1}$ denotes the padding bit 0 appended below $y_0$ (i.e., the padded multiplier is written $y_7y_6y_5y_4y_3y_2y_1y_0y_{-1}$). During Booth encoding, the four groups of sub-data to be encoded, $y_{-1}y_0y_1$, $y_1y_2y_3$, $y_3y_4y_5$, and $y_5y_6y_7$, are each target-encoded to obtain 4 low-order target codes, where the highest bit of each group serves as the lowest bit of the next adjacent group.
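Table 1 itself is not reproduced in this text. Assuming it follows the standard radix-4 (modified) Booth rules, which yield exactly the five target-code classes listed above, the overlapping three-bit grouping can be sketched as below. `BOOTH_TABLE`, `booth_digits`, and `to_signed` are illustrative names, not taken from the patent; each code is the multiple of X that the corresponding partial product represents.

```python
# Standard radix-4 (modified) Booth rules, assumed to match Table 1:
# (y_{2i+1}, y_{2i}, y_{2i-1}) -> multiple of the multiplicand X.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_digits(y: int, width: int) -> list[int]:
    """Booth-encode a width-bit multiplier into width // 2 target codes.

    A padding bit y_{-1} = 0 is appended below y_0, and adjacent
    three-bit groups overlap by one bit, as in the example above.
    """
    bits = [0] + [(y >> k) & 1 for k in range(width)]  # bits[k + 1] = y_k
    return [BOOTH_TABLE[(bits[2 * i + 2], bits[2 * i + 1], bits[2 * i])]
            for i in range(width // 2)]


# 8-bit low-order data y7..y0 = 1011 0110 (signed value -74).
codes = booth_digits(0b10110110, 8)
print(codes)  # [-2, 2, -1, -1], i.e. PP_0..PP_3 = -2X, 2X, -X, -X
# Weighted by 4**i, the codes reproduce the signed multiplier.
assert sum(c * 4 ** i for i, c in enumerate(codes)) == to_signed(0b10110110, 8)
```

Because each code carries weight $4^i$, the four codes together stand for the whole signed multiplier, which is why only four partial products are needed for 8 bits.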
Optionally, the data processor further includes the high-order Booth encoding unit 1114, which includes a high-order data input port 1114a and a high-order target code output port 1114b. The high-order data input port 1114a receives the high-order data of the first data to be Booth-encoded, and the high-order target code output port 1114b outputs the high-order target code obtained by Booth-encoding that high-order data.
Specifically, in each Booth encoding pass, the high-order Booth encoding unit 1114 encodes the high-order data of the first data in the same way that the low-order Booth encoding unit 1111 encodes the low-order data, so the encoding method of the high-order Booth encoding unit 1114 is not repeated in this embodiment. In addition, the high-order Booth encoding unit 1114 and the low-order Booth encoding unit 1111 may have the same internal circuit structure and the same external port functions, so the specific structure of the high-order Booth encoding unit 1114 is not described in detail in this embodiment.
In the data processor provided by this embodiment, the low-order Booth encoding unit Booth-encodes the low-order data of the first data to obtain the corresponding low-order target code; the low-order partial product acquiring unit then obtains the low-order partial product of the target code from that low-order target code; and the low-order and high-order partial products of the target code are accumulated to obtain the results of data operations in different modes.
In one embodiment, the data processor includes a low-order partial product acquiring unit 1112, which includes a low-order target code input port 1112a, a strobe value input port 1112b, a data input port 1112c, and a low-order partial product output port 1112d. The low-order target code input port 1112a receives the low-order target code output by the low-order Booth encoding unit 1111; the strobe value input port 1112b receives the bit values of the sign-extended low-order partial product selected by the low-order selector group unit 1116; the data input port 1112c receives the second data; and the low-order partial product output port 1112d outputs the sign-extended low-order partial product.
Specifically, the low-order partial product acquiring unit 1112 receives the low-order target code output by the low-order Booth encoding unit 1111 through the low-order target code input port 1112a, and receives the two sub-data (i.e., multiplicands) of the second data through the data input port 1112c. Optionally, during a multiplication or multiply-accumulate operation, the low-order partial product acquiring unit 1112 obtains the sign-extended partial product corresponding to the low-order data from the received low-order target code and multiplicand, and then combines this sign-extended partial product with the gated bit values of the sign-extended low-order partial product to obtain the final sign-extended low-order partial product. Optionally, if the multiplicand received through the data input port 1112c is N bits wide, the sign-extended low-order partial product obtained by the low-order partial product acquiring unit 1112 may be 2N bits wide. For example, given an N-bit multiplicand X, the low-order partial product acquiring unit 1112 obtains the sign-extended partial product from X and the five classes of low-order target codes (-2X, -X, 0, X, and 2X). The low (N+1) bits of the sign-extended partial product equal the original partial product, and the high (N-1) bits all equal the sign bit of the original partial product, i.e., its highest bit.
When the low-order target code is -2X, the original partial product is obtained by shifting X left by one bit, inverting every bit, and adding 1; when it is 2X, by shifting X left by one bit; when it is -X, by inverting every bit of the sign-extended X and adding 1; when it is X, by prepending the sign bit of X (i.e., the highest bit of X) to X; and when it is +0, the original partial product is 0, i.e., every bit of the (N+1)-bit original partial product (9 bits when N = 8) is 0.
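The five cases and the sign extension described above can be modeled arithmetically as follows. This is a sketch of the bit patterns, not of the gate-level circuit; `original_partial_product` and `sign_extend` are hypothetical helper names. One input, X = $-2^{N-1}$ with code -2X, has the exact value $2^N$, which does not fit in N+1 signed bits; the sketch does not handle that case.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def original_partial_product(x: int, code: int, n: int) -> int:
    """(N+1)-bit original partial product for one target code (hypothetical helper).

    x is the raw bit pattern of an N-bit two's-complement multiplicand X;
    code is -2, -1, 0, 1 or 2 for the -2X, -X, 0, X and 2X target codes.
    """
    mask = (1 << (n + 1)) - 1
    sign = (x >> (n - 1)) & 1
    x_ext = (x & ((1 << n) - 1)) | (sign << n)   # X with its sign bit prepended
    if code == 0:
        return 0                                 # every bit is 0
    if code == 1:
        return x_ext                             # X combined with its sign bit
    if code == 2:
        return (x_ext << 1) & mask               # X shifted left by one bit
    if code == -1:
        return (~x_ext + 1) & mask               # invert every bit, add 1
    if code == -2:
        return (~(x_ext << 1) + 1) & mask        # shift left, invert, add 1
    raise ValueError("code must be one of -2, -1, 0, 1, 2")


def sign_extend(pp: int, n: int) -> int:
    """Widen an (N+1)-bit partial product to 2N bits; the high N-1 bits copy its sign bit."""
    sign = (pp >> n) & 1
    high = ((1 << (n - 1)) - 1) if sign else 0
    return pp | (high << (n + 1))


# X = 1111 0011 (-13) with target code -2X gives +26.
pp = original_partial_product(0b11110011, -2, 8)
print(to_signed(sign_extend(pp, 8), 16))  # 26
```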
It should be noted that, through the strobe value input port 1112b, the low-order partial product acquiring unit 1112 receives the bit values of the sign-extended low-order partial product that the low-order selector group unit 1116 selects for the current operation mode; the unit then combines the sign-extended partial product it has computed with these gated bit values to obtain the sign-extended low-order partial product.
Optionally, the data processor includes the high-order partial product acquiring unit 1115, which includes a high-order target code input port 1115a, a strobe value input port 1115b, a data input port 1115c, and a high-order partial product output port 1115d. The high-order target code input port 1115a receives the high-order target code output by the high-order Booth encoding unit 1114; the strobe value input port 1115b receives the bit values of the sign-extended high-order partial product selected by the high-order selector group unit 1117; the data input port 1115c receives the second data; and the high-order partial product output port 1115d outputs the sign-extended high-order partial product.
It should be understood that the high-order partial product acquiring unit 1115 obtains the sign-extended high-order partial product in the same way that the low-order partial product acquiring unit 1112 obtains the sign-extended low-order partial product, so the method is not repeated in this embodiment. In addition, the low-order partial product acquiring unit 1112 and the high-order partial product acquiring unit 1115 may have the same internal circuit structure and similar external port functions, so the specific structure of the high-order partial product acquiring unit 1115 is not described in detail in this embodiment.
With the low-order partial product acquiring unit of this data processor, a sign-extended partial product is obtained from each low-order target code and combined with the values gated by the low-order selector group unit to produce the sign-extended low-order partial product. The low-order partial product of the target code is then derived from it and accumulated with the high-order partial product of the target code to obtain the results of data operations in different modes.
In one embodiment, the data processor includes a selector 1113, which includes a function selection mode signal input port (mode) 1113a, a first strobe value input port 1113b, a second strobe value input port 1113c, and a strobe result output port 1113d. The function selection mode signal input port 1113a receives the function selection mode signal corresponding to the data operation mode; the first strobe value input port 1113b receives a first strobe value; the second strobe value input port 1113c receives a second strobe value; and the strobe result output port 1113d outputs whichever of the two strobe values is selected.
Specifically, based on the function selection mode signal received at the function selection mode signal input port 1113a, the selector 1113 determines the data mode that the first modified encoding sub-circuit 111 currently needs to process, and thus whether the strobe result output port 1113d outputs the first or the second strobe value. Optionally, the first strobe value may be 0 or the highest bit of the low-order data in the sub-data to be encoded, and the second strobe value may likewise be 0 or that highest bit. That is, one of the two strobe value input ports 1113b and 1113c receives 0 and the other receives the highest bit of the low-order data in the sub-data to be encoded; this embodiment does not restrict which port receives which value.
For example, during operation, if the two sub-data in the first data received by the first modified encoding sub-circuit 111 are 2N bits wide and the full 2N-bit data is to be Booth encoded, the selector 1113 receives the first strobe value through the first strobe value input port 1113b and the second strobe value through the second strobe value input port 1113c, and one of these may be 0. If the two sub-data are 2N bits wide and only N-bit data is to be Booth encoded, the selector 1113 likewise receives the first and second strobe values through ports 1113b and 1113c, and one of these may be the highest bit of the sub-data to be encoded.
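The text leaves the port-to-value mapping open. Under standard radix-4 Booth grouping, the group $(y_{N+1}, y_N, y_{N-1})$ straddles the two halves, so encoding the full 2N-bit multiplier requires the high half to be padded with $y_{N-1}$, while encoding the halves as two independent N-bit multipliers requires a padding bit of 0. A minimal sketch under that assumption, where `select_pad` is a hypothetical stand-in for selector 1113 and `booth_digits` reuses the standard Booth table:

```python
BOOTH_TABLE = {  # standard radix-4 Booth rules (assumed to match Table 1)
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def booth_digits(y: int, width: int, pad: int) -> list[int]:
    """Radix-4 Booth codes of a width-bit field, using `pad` as y_{-1}."""
    bits = [pad] + [(y >> k) & 1 for k in range(width)]  # bits[k + 1] = y_k
    return [BOOTH_TABLE[(bits[2 * i + 2], bits[2 * i + 1], bits[2 * i])]
            for i in range(width // 2)]


def select_pad(low_half: int, n: int, full_width: bool) -> int:
    """Hypothetical selector 1113: padding bit for the high half."""
    return (low_half >> (n - 1)) & 1 if full_width else 0


def encode_2n(y: int, n: int, full_width: bool) -> tuple[list[int], list[int]]:
    """Encode the low and high N-bit halves of a 2N-bit multiplier."""
    low, high = y & ((1 << n) - 1), y >> n
    low_codes = booth_digits(low, n, pad=0)
    high_codes = booth_digits(high, n, pad=select_pad(low, n, full_width))
    return low_codes, high_codes


lo, hi = encode_2n(0x0080, 8, full_width=True)
value = sum(c * 4 ** i for i, c in enumerate(lo)) + \
    sum(c * 4 ** (i + 4) for i, c in enumerate(hi))
print(value)  # 128
```

For y = 0x0080, the low half alone encodes to -128; in full-width mode the padding bit $y_7 = 1$ makes the high half contribute $+1 \cdot 4^4 = 256$, which restores the 16-bit value 128.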
In the data processor provided by this embodiment, the function selection mode signal received by the selector determines the padding bit used when the high-order data is Booth encoded. The padded high-order data is then Booth encoded, which lets the data processor handle data operations in different modes and makes it more versatile.
In one embodiment, the data processor includes a low-order selector group unit 1116, which includes a plurality of low-order selectors 1116a used to gate values in the sign-extended low-order partial products.
Specifically, the number of low-order selectors 1116a in the low-order selector group unit 1116 may equal 3/8 of the square of the multiplicand bit width of the multiplication or multiply-accumulate operation currently performed by the data processor, and all the low-order selectors 1116a in the unit may have identical internal circuit structures. Optionally, if the data processor currently needs to multiply N-bit data, each low-order Booth encoding unit 1111 is connected to a corresponding low-order partial product acquiring unit 1112, which contains 2N number generation subunits. N of these number generation subunits are connected to N low-order selectors 1116a, one selector per subunit. Optionally, these N subunits generate the high N bits of the sign-extended low-order partial product, and the N low-order selectors 1116a may have the same internal circuit structure as the selector 1113. Besides the function selection mode signal input port (mode), each of these N low-order selectors 1116a has two other input ports. Optionally, if the data processor supports data operations in four different modes and the multiplicand received by the data processor is N bits wide, the two other input ports of a low-order selector 1116a may receive, respectively, the value 0 and the sign bit of the corresponding sign-extended low-order partial product that the low-order partial product acquiring unit 1112 obtains when the data processor multiplies N-bit data.
The N/4 low-order partial product acquiring units 1112 may be connected to N/4 groups of N low-order selectors 1116a. The sign bit values received by different groups may be the same or different, but all N low-order selectors 1116a within one group receive the same sign bit value, namely the sign bit of the sign-extended low-order partial product obtained by the low-order partial product acquiring unit 1112 to which that group is connected.
In addition, among the 2N number generation subunits in each low-order partial product acquiring unit 1112, N/2 subunits are not connected to any low-order selector 1116a. The values these N/2 subunits produce are the corresponding bits of the sign-extended low-order partial product for whichever bit width the data processor is currently multiplying; that is, counting from the lowest bit (bit 1) toward the highest, they are bits (N/2)+1 through N of the corresponding sign-extended low-order partial product.
The remaining N/2 number generation subunits in each low-order partial product acquiring unit 1112 are connected to N/2 low-order selectors 1116a, one selector per subunit. These N/2 low-order selectors 1116a may have the same internal circuit structure as the selector 1113 and, besides the function selection mode signal input port (mode), have two other input ports. These two ports receive, respectively, the sign bit of the low-order partial product obtained when the data processor multiplies N/2-bit data by N/2-bit data, and the sign bit of the low-order partial product obtained when it multiplies N-bit data. The N/4 low-order partial product acquiring units 1112 may be connected to N/4 groups of N/2 low-order selectors 1116a; the sign bit values received by different groups may be the same or different, but all N/2 low-order selectors 1116a within one group receive the same sign bit values, taken from the sign-extended low-order partial product obtained by the low-order partial product acquiring unit 1112 to which that group is connected.
In addition, the bit values of the sign-extended low-order partial product received by each group of N/2 low-order selectors 1116a are determined by the corresponding bits of the sign-extended low-order partial product obtained by the low-order partial product acquiring unit 1112 connected to that group, and within a group the bit value received by each low-order selector 1116a may be the same or different. The 2N number generation subunits of each low-order partial product acquiring unit 1112 are positioned two subunits to the left of those in the preceding low-order partial product acquiring unit 1112. Optionally, among the sign-extended low-order partial products, only the first is 2N bits wide; each subsequent one drops the two highest bits relative to the preceding one, so the last low-order partial product is (3N/2+2) bits wide.
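The counts stated above are mutually consistent, which the following arithmetic sketch checks: N/4 low-order partial product acquiring units, each with N + N/2 selector-connected subunits (out of 2N subunits), give $(N/4)(3N/2) = 3N^2/8$ low-order selectors, and shrinking each successive partial product by two bits from 2N leaves $2N - 2(N/4 - 1) = 3N/2 + 2$ bits for the last one. The function names are illustrative only.

```python
def low_selector_count(n: int) -> int:
    """Low-order selectors 1116a for an N-bit multiplicand, per the text.

    N/4 low-order partial product acquiring units; in each, N subunits
    (the high N bits) and N/2 further subunits each drive one selector,
    while the other N/2 subunits have no selector.
    """
    units = n // 4
    per_unit = n + n // 2
    assert n + n // 2 + n // 2 == 2 * n   # 2N subunits per unit in total
    return units * per_unit


def low_partial_product_widths(n: int) -> list[int]:
    """Widths of the N/4 sign-extended low-order partial products.

    The first is 2N bits; each later one drops the two highest bits.
    """
    return [2 * n - 2 * k for k in range(n // 4)]


for n in (8, 16, 32):
    widths = low_partial_product_widths(n)
    assert low_selector_count(n) == 3 * n * n // 8
    assert widths[0] == 2 * n and widths[-1] == 3 * n // 2 + 2
    print(n, low_selector_count(n), widths)
# 8 24 [16, 14]
# 16 96 [32, 30, 28, 26]
# 32 384 [64, 62, 60, 58, 56, 54, 52, 50]
```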
Optionally, the high-order selector group unit 1117 includes a plurality of high-order selectors 1117a used to gate values in the sign-extended high-order partial products.
Note that the high-order selectors 1117a gate values in the same way as the low-order selectors 1116a, so the gating method of the high-order selector 1117a is not described in detail in this embodiment.
In the data processor provided by this embodiment, the low-order selector group unit gates values in the sign-extended low-order partial product; the low-order partial product of the target code is obtained from the resulting sign-extended low-order partial product; and the compression circuit then accumulates the low-order and high-order partial products of the target code to obtain the results of operations in different modes.
In one embodiment, the data processor includes a first partial product selection sub-circuit 112, which includes a function selection mode signal input port (mode) 1121, a first partial product input port 1122, a second partial product input port 1123, a first partial product output port 1124, and a strobe partial product output port 1125. The function selection mode signal input port (mode) 1121 receives the function selection mode signal; the first partial product input port 1122 receives the sign-extended first partial product from the first modified encoding sub-circuit 111; the second partial product input port 1123 receives the sign-extended second partial product from the partial product switching circuit 13; the first partial product output port 1124 outputs the sign-extended first partial product that the partial product switching circuit 13 needs to exchange; and the strobe partial product output port 1125 outputs the selected sign-extended first and second partial products.
Specifically, if the data processor is currently performing a multiply-accumulate operation on 2N-bit data, the partial product switching circuit 13 exchanges the sign-extended second partial product and the sign-extended first partial product. The first partial product selection sub-circuit 112 then receives, through the second partial product input port 1123, the sign-extended second partial product passed over by the partial product switching circuit 13, and sends the sign-extended first partial product to be exchanged to the partial product switching circuit 13 through the first partial product output port 1124. The strobe partial product output port 1125 gates the sign-extended first and second partial products that do not need to be exchanged, and the first partial product selection sub-circuit 112 passes the sign-extended first and/or second partial products, as the partial products of the target code, to the first modified compression sub-circuit 113 for compression.
With the data processor provided by this embodiment, the first partial product selection sub-circuit selects among the sign-extended partial products to obtain the partial products of the target code, so the data processor can perform multiplication and multiply-accumulate operations on data of the same bit width as well as multiply-accumulate operations on data of different bit widths, which improves its versatility.
In one embodiment, the data processor includes a first modified compression sub-circuit 113, which includes a modified Wallace tree group circuit 1131 and an accumulation circuit 1132, the output of the modified Wallace tree group circuit 1131 being connected to the input of the accumulation circuit 1132. The modified Wallace tree group circuit 1131 accumulates each column of the first low-order and first high-order partial products of the target code obtained during data operations in different modes, and the accumulation circuit 1132 adds the resulting accumulation outputs.
Specifically, the modified Wallace tree group circuit 1131 accumulates each column of the first low-order and first high-order partial products of the target code obtained by the first modified encoding sub-circuit 111, and the accumulation circuit 1132 adds the two results produced by the modified Wallace tree group circuit 1131 to obtain the target operation result. During this accumulation, the partial products of the target code are arranged as follows: the lowest bit of each row's partial product lies two bit positions to the right of the lowest bit of the next row's partial product, while the highest bit of every row lies in the same column as the highest bit of the first row. The modified Wallace tree group circuit 1131 accumulates each column of all the partial products of the target code according to this arrangement, where all the partial products of the target code include the first low-order and first high-order partial products of the target code. Optionally, the two results produced by the modified Wallace tree group circuit 1131 are a sum output signal Sum and a carry output signal Carry.
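The reduction to a Sum and a Carry row followed by one final addition can be sketched with 3:2 carry-save adders. This simplified model applies the 3:2 steps one after another on whole rows; a Wallace tree performs the same kind of 3:2 (or 4:2) reductions but schedules them in parallel levels and per column. `carry_save_reduce` and `accumulate` are hypothetical names, and the example uses plain shifted unsigned addends rather than the Booth partial products above.

```python
def carry_save_reduce(rows: list[int], width: int) -> tuple[int, int]:
    """Reduce addends to a (Sum, Carry) pair with 3:2 carry-save adders.

    Each step replaces three rows a, b, c by their bitwise full-adder sum and
    their majority (carry) bits shifted one column left; the total modulo
    2**width is unchanged.
    """
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        total = a ^ b ^ c
        carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
        rows += [total, carry]
    rows += [0] * (2 - len(rows))   # pad to exactly two rows
    return rows[0], rows[1]


def accumulate(sum_row: int, carry_row: int, width: int) -> int:
    """Final carry-propagate addition, the role of accumulation circuit 1132."""
    return (sum_row + carry_row) & ((1 << width) - 1)


# Unsigned 8x8 example: one shifted addend per multiplier bit.
a, b = 0xB7, 0x5D
rows = [(a << k) if (b >> k) & 1 else 0 for k in range(8)]
s, c = carry_save_reduce(rows, 16)
print(accumulate(s, c, 16) == a * b)  # True
```

Keeping Sum and Carry separate until the end avoids carry propagation inside the tree; only the accumulation circuit pays for a full-width carry chain.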
For example, if the data processor is currently processing a 16-bit fixed-point multiplication, the 4 low-order and 4 high-order partial products of the target code obtained by the first partial product obtaining circuit 22 are arranged as shown in Fig. 4a, where "Σ" denotes each bit of a low-order partial product, a second symbol denotes each bit of a high-order partial product, and "+." denotes a sign-extension bit of a low-order or high-order partial product.
With the circuit structure shown in Fig. 3, if the data processor is currently processing a 16-bit by 8-bit fixed-point multiply-accumulate operation, the partial products of the target code received by the first modified compression sub-circuit 113 or the second modified compression sub-circuit 123 are arranged as shown in Fig. 4b, where "Σ" denotes a partial product obtained by the first partial product selection sub-circuit 112 or the second partial product selection sub-circuit 122, and a second symbol denotes either a second partial product that the first partial product selection sub-circuit 112 obtains from the second partial product selection sub-circuit 122 through the partial product switching circuit 13, or a first partial product that the second partial product selection sub-circuit 122 obtains from the first partial product selection sub-circuit 112 through the partial product switching circuit 13.
In addition, the second modified compression sub-circuit 123 processes data in the same way as the first modified compression sub-circuit 113, and the two have the same internal structure and external port functions, so the data processing method and structure of the second modified compression sub-circuit 123 are not repeated in this embodiment.
In the data processor provided by this embodiment, the modified Wallace tree group circuit accumulates the low-order and high-order partial products of the target code, and the accumulation circuit adds the accumulation results to obtain the target operation result.
In one embodiment, the data processor includes a modified Wallace tree group circuit 1131, which includes low-order Wallace tree sub-circuits 1131a, a selector 1131b, and high-order Wallace tree sub-circuits 1131c. The output of the low-order Wallace tree sub-circuits 1131a is connected to the input of the selector 1131b, and the output of the selector 1131b is connected to the input of the high-order Wallace tree sub-circuits 1131c. The low-order Wallace tree sub-circuits 1131a accumulate the low-order columns of the first partial products of the target code, the selector 1131b gates the carry input signal received by the high-order Wallace tree sub-circuits 1131c, and the high-order Wallace tree sub-circuits 1131c accumulate the high-order columns of the first partial products of the target code; each produces its part of the accumulation operation result.
Specifically, each of the low-order Wallace tree sub-circuits 1131a and high-order Wallace tree sub-circuits 1131c may be built from a combination of full adders and half adders, or from 4-2 compressors; more generally, it can be any circuit that adds a multi-bit input signal and produces a two-bit output. Optionally, the number of high-order Wallace tree sub-circuits 1131c in the modified Wallace tree group circuit 1131 may equal the bit width N of the multiplicand in the multiplication or multiply-accumulate operation currently processed by the data processor, and may also equal the number of low-order Wallace tree sub-circuits 1131a; the low-order Wallace tree sub-circuits 1131a are connected in series, as are the high-order Wallace tree sub-circuits 1131c. Optionally, the output of the last low-order Wallace tree sub-circuit 1131a is connected to the input of the selector 1131b, and the output of the selector 1131b is connected to the input of the first high-order Wallace tree sub-circuit 1131c. Optionally, in the modified Wallace tree group circuit 1131, each low-order Wallace tree sub-circuit 1131a adds one column of all the partial products of the target code and outputs two signals, a carry signal $Carry_i$ and a sum signal $Sum_i$, where i is the index of the low-order Wallace tree sub-circuit 1131a and the first low-order Wallace tree sub-circuit 1131a has index 0. Optionally, the number of input signals received by each low-order Wallace tree sub-circuit 1131a may equal the number of first partial products of the target code.
In the modified Wallace tree group circuit 1131, the high-order Wallace tree sub-circuits 1131c and low-order Wallace tree sub-circuits 1131a may number 2N in total, and all the first partial products of the target code may span 2N columns from the lowest column to the highest. The N low-order Wallace tree sub-circuits 1131a accumulate the low N columns of all the first partial products of the target code, one column each, and the N high-order Wallace tree sub-circuits 1131c accumulate the high N columns, one column each.
For example, if the data processor currently needs to multiply 2N-bit data by 2N-bit data, the selector 1131b passes the carry output signal Cout_N of the last low-order Wallace tree sub-circuit 1131a through as the carry input signal Cin_{N+1} of the first high-order Wallace tree sub-circuit 1131c in the modified Wallace tree group circuit 1131. If the data processor currently needs to multiply N-bit data, the selector 1131b gates 0 as the carry input signal Cin_{N+1} of the first high-order Wallace tree sub-circuit 1131c; in this case the data processor effectively splits each received 2N-bit sub-datum into high-order data and low-order data and multiplies them separately. Here the low-order Wallace tree sub-circuits 1131a, from first to last, are numbered i = 1, 2, …, N, and the high-order Wallace tree sub-circuits 1131c, from first to last, are numbered i = N+1, N+2, …, 2N.
Note that each low-order Wallace tree sub-circuit 1131a and each high-order Wallace tree sub-circuit 1131c in the modified Wallace tree group circuit 1131 handles carry input signals Cin_i, partial product value input signals, and carry output signals Cout_i. Optionally, the partial product value input signals of each sub-circuit may be the values in the corresponding column of all first partial products of the target code, and the number of carry output signals Cout_i produced by each sub-circuit may be

$$N_{Cout} = \left\lfloor \frac{N_I + N_{Cin}}{2} \right\rfloor - 1,$$

where N_I is the number of partial product value input signals of the low-order Wallace tree sub-circuit 1131a or high-order Wallace tree sub-circuit 1131c, N_Cin is its number of carry input signals, N_Cout is its minimum number of carry output signals, and ⌊·⌋ denotes the floor function. Optionally, in the modified Wallace tree group circuit 1131, the carry input signals of each low-order Wallace tree sub-circuit 1131a or high-order Wallace tree sub-circuit 1131c may be the carry output signals of the preceding sub-circuit, and the carry input signal of the first low-order Wallace tree sub-circuit 1131a is 0.
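The column-by-column behaviour described above can be sketched as a small behavioural model. This is a minimal sketch under stated assumptions, not the patented circuit: each column sub-circuit is modelled with full adders plus at most one half adder, reducing its inputs to one sum bit Sum_i (weight 2^i) and one carry Carry_i (weight 2^(i+1)), while every other carry is passed to the next column as a Cout wire. The names compress_column, modified_wallace and final_add are hypothetical, columns are indexed from 0 with W low-order and W high-order columns, and in split mode each half's result is assumed to be read modulo 2^W by a final adder that does not carry across the boundary. Under these assumptions the number of Cout wires per column equals floor((N_I + N_Cin)/2) - 1 whenever the column has at least two inputs.

```python
def compress_column(bits):
    """Reduce one column to a single sum bit with full adders and at most one
    half adder. Returns (sum_bit, carries); every carry has weight 2**(i + 1)."""
    bits = list(bits)
    carries = []
    while len(bits) >= 3:
        a, b, c = bits.pop(), bits.pop(), bits.pop()
        bits.append(a ^ b ^ c)                       # full-adder sum stays in this column
        carries.append((a & b) | (a & c) | (b & c))  # full-adder carry moves up one column
    if len(bits) == 2:                               # finish with one half adder
        a, b = bits
        bits = [a ^ b]
        carries.append(a & b)
    return (bits[0] if bits else 0), carries


def modified_wallace(columns, split):
    """columns[i] holds the partial-product bits of column i (column 0 is the LSB)
    and len(columns) == 2 * W. Returns the Sum_i row, the Carry_i row and the
    number of Cout wires each column passes to the next one."""
    half = len(columns) // 2
    sums, carry_row, cout_counts = [], [], []
    cin = []                                   # the first column receives no carries
    for i, column in enumerate(columns):
        if i == half and split:
            cin = [0] * len(cin)               # selector gates 0 into the first high-order column
        s, carries = compress_column(column + cin)
        sums.append(s)
        carry_row.append(carries[-1] if carries else 0)  # Carry_i, weight 2**(i + 1)
        cin = carries[:-1]                               # Cout_i wires feed column i + 1
        cout_counts.append(len(cin))
    return sums, carry_row, cout_counts


def final_add(sums, carry_row, lo, hi):
    """Add the Sum and Carry rows over columns lo..hi-1, read modulo 2**(hi - lo)."""
    value = sum((sums[i] << (i - lo)) + (carry_row[i] << (i - lo + 1))
                for i in range(lo, hi))
    return value % (1 << (hi - lo))


if __name__ == "__main__":
    cols = [[1, 1, 1], [1, 0, 1, 1], [1], [1, 1], [0, 1, 1, 1, 1], [1, 1, 1], [1], [1, 1, 1, 1]]
    for split in (False, True):
        s, c, couts = modified_wallace(cols, split)
        print(split, final_add(s, c, 0, 8), couts)
```

In full mode the two rows add up to the exact column total modulo 2^(2W); in split mode the low and high halves behave as two independent W-column accumulators because the selector feeds zeros instead of the low half's carries.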
The carry input signal received by the first high-order Wallace tree sub-circuit 1131c may be determined by the data-width mode the data processor is currently processing and by the bit width of the multiplicand in the multiplication or multiply-accumulate operation to be performed.
In the data processor provided by this embodiment, the modified Wallace tree group circuit accumulates the partial products of the target code into two output signals, and the accumulation circuit adds these two signals to obtain the data operation result for each mode. Moreover, the data processor completes a multiply-accumulate operation without a separate accumulation of multiplication results: multiplication or multiply-accumulate is carried out in a single pass, which reduces the power consumption of the data processor.
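The single-pass multiply-accumulate idea can be illustrated with a carry-save sketch: the partial products of two products are fed into one 3:2 compressor tree, and only one carry-propagate addition is performed at the end. For brevity this sketch assumes unsigned operands and plain AND-array partial products instead of the sign-extended Booth partial products used by the data processor; csa, partial_products and fused_mac are hypothetical names.

```python
def csa(x, y, z):
    """3:2 carry-save adder on whole rows; invariant: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def partial_products(a, b, width):
    """Unsigned AND-array partial products of a * b (b is width bits wide)."""
    return [a << j if (b >> j) & 1 else 0 for j in range(width)]


def fused_mac(a, b, c, d, width):
    """Compute a * b + c * d with one compressor tree and one final addition."""
    rows = partial_products(a, b, width) + partial_products(c, d, width)
    while len(rows) > 2:
        next_rows = []
        while len(rows) >= 3:                  # compress rows three at a time
            next_rows.extend(csa(rows.pop(), rows.pop(), rows.pop()))
        rows = next_rows + rows                # 0 to 2 leftover rows pass through
    return sum(rows)                           # single carry-propagate addition


print(fused_mac(13, 11, 7, 9, 4))  # 206 (= 143 + 63)
```

Because the accumulation happens inside the compressor tree, no intermediate product is ever completed with its own carry-propagate addition.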
In one embodiment, the data processor includes an accumulation circuit 1132, which includes an adder 1132a configured to add the accumulated result.
Specifically, the adder 1132a may be a carry adder of any suitable bit width. Optionally, the adder 1132a receives the two signals output by the modified Wallace tree group circuit 1131, adds them, and outputs the data operation result for the current processing mode of the data processor. Optionally, the adder 1132a may be a carry-lookahead adder whose data bit width equals the bit width of the operation result output by the modified Wallace tree group circuit 1131.
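One common way to build a carry-lookahead adder is a parallel-prefix network. The Kogge-Stone sketch below is an assumption for illustration (the text does not say which lookahead scheme the adder 1132a uses); prefix_add is a hypothetical name, and its two operands stand for the two rows output by the modified Wallace tree group circuit 1131.

```python
def prefix_add(a, b, width):
    """Add two width-bit rows with a Kogge-Stone carry-lookahead (parallel-prefix)
    network; the result is (a + b) mod 2**width and the carry-in is 0."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate
    G, P = g[:], p[:]
    span = 1
    while span < width:                   # ceil(log2(width)) prefix levels
        G_next, P_next = G[:], P[:]
        for i in range(span, width):
            G_next[i] = G[i] | (P[i] & G[i - span])
            P_next[i] = P[i] & P[i - span]
        G, P, span = G_next, P_next, span * 2
    # After the prefix levels, G[i] is the group generate of bits 0..i,
    # i.e. the carry out of bit i, so the carry into bit i is G[i - 1].
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total


print(prefix_add(0b1011, 0b0110, 4))  # 1 (17 mod 16)
```

All carries are computed in logarithmic depth from the generate and propagate signals rather than rippling bit by bit, which is the property a lookahead adder is chosen for.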
In the data processor provided by this embodiment, the accumulation circuit adds the two signals output by the modified Wallace tree group circuit and outputs the data operation result for each mode. Because the data processor completes a multiply-accumulate operation without re-accumulating multiplication results, multiplication or multiply-accumulate is performed in a single pass, which reduces power consumption.
In one embodiment, the data processor includes a second partial product selection sub-circuit 122, which includes a function selection mode signal input port (mode) 1221, a second partial product input port 1222, a first partial product input port 1223, a second partial product output port 1224, and a gated partial product output port 1225. The function selection mode signal input port (mode) 1221 receives the function selection mode signal; the second partial product input port 1222 receives the sign-extended second partial products output by the second modified encoding sub-circuit 121; the first partial product input port 1223 receives the sign-extended first partial products swapped in by the partial product switching circuit 13; the second partial product output port 1224 outputs the sign-extended second partial products that are to be swapped by the partial product switching circuit 13; and the gated partial product output port 1225 outputs the gated sign-extended second partial products together with the received sign-extended first partial products.
Specifically, if the data processor is currently performing a multiply-accumulate operation on 2N-bit data, the partial product switching circuit 13 swaps sign-extended second partial products with sign-extended first partial products. The second partial product selection sub-circuit 122 then receives the swapped-in sign-extended first partial products through the first partial product input port 1223 and sends the sign-extended second partial products to be swapped to the partial product switching circuit 13 through the second partial product output port 1224. The gated partial product output port 1225 gates the sign-extended second partial products that do not need to be swapped, together with the received sign-extended first partial products, and the second partial product selection sub-circuit 122 passes these, as the partial products of the target code, to the second modified compression sub-circuit 123 for compression.
In the data processor provided by this embodiment, the second partial product selection sub-circuit selects among the sign-extended partial products to obtain the partial products of the target code, so the data processor can perform multiplication and multiply-accumulate on data of equal bit width as well as multiply-accumulate on data of different bit widths, which improves its versatility.
In one embodiment, the data processor includes a partial product switching circuit 13, which includes a function selection mode signal input port (mode) 131, a first partial product input port 132, a first partial product output port 133, a second partial product input port 134, and a second partial product output port 135. The function selection mode signal input port (mode) 131 receives the function selection mode signal; the first partial product input port 132 receives the sign-extended first partial products to be swapped, supplied by the first multiplication circuit 11; the first partial product output port 133 outputs sign-extended first partial products; the second partial product input port 134 receives the sign-extended second partial products to be swapped, supplied by the second multiplication circuit 12; and the second partial product output port 135 outputs sign-extended second partial products.
Specifically, the partial product switching circuit 13 uses the function selection mode signal received at its function selection mode signal input port (mode) 131 to decide whether the sign-extended first partial products and the sign-extended second partial products currently need to be swapped. If so, it may swap the sign-extended first low-order partial products with the sign-extended second low-order partial products, or swap the sign-extended first high-order partial products with the sign-extended second high-order partial products. In this embodiment, the swap is needed only when the data processor performs a multiply-accumulate operation on 2N-bit data; for the other three operation modes, the partial product switching circuit 13 performs no swap.
In the data processor provided by this embodiment, the partial product switching circuit swaps the sign-extended first partial products obtained by the first multiplication circuit with the sign-extended second partial products obtained by the second multiplication circuit, which enables multiply-accumulate on 2N-bit data. The data processor can therefore perform multiply-accumulate on data of equal bit width as well as on data of different bit widths, which improves its versatility.
Fig. 5 is a schematic diagram of the specific structure of a data processor according to another embodiment. The data processor includes a Booth encoding circuit 21, which includes a low-order Booth encoding unit 211, a selector 212, and a high-order Booth encoding unit 213. The output of the low-order Booth encoding unit 211 is connected to the input of the selector 212, and the output of the selector 212 is connected to the input of the high-order Booth encoding unit 213. The low-order Booth encoding unit 211 Booth-encodes the low-order data in the received first data to obtain the low-order target code. The selector 212 receives the function selection mode signal and, according to it, gates the padding bit value used when the high-order data in the first data is Booth-encoded. The high-order Booth encoding unit 213 Booth-encodes the high-order data in the first data to obtain the high-order target code.
Specifically, the data processor determines, from the function selection mode signal received by the selector 212, whether the data bit width currently to be processed by the Booth encoding circuit 21 is N or 2N. If it is N, the Booth encoding circuit 21 splits each 2N-bit sub-datum into its high N bits (the high-order data) and its low N bits (the low-order data) and Booth-encodes the two parts separately; if it is 2N, the Booth encoding circuit 21 Booth-encodes each 2N-bit sub-datum as a whole. In either case, whether or not the bit width to be processed matches the bit width of the sub-data in the received first data, the low-order Booth encoding unit 211 automatically pads the low-order data being encoded, and the padding bit is 0.
For example, if the Booth encoding circuit 21 currently needs to process 2N-bit data, it Booth-encodes the complete sub-datum A in the first data. Denote the high-order data of A by A_1 and the low-order data by A_2. The low-order Booth encoding unit 211 Booth-encodes A_2, automatically padding it with 0, and the selector 212 gates the most significant bit of A_2 as the padding bit that the high-order Booth encoding unit 213 uses when Booth-encoding A_1. If the Booth encoding circuit 21 currently needs to process N-bit data, it Booth-encodes the high-order data A_1 and the low-order data A_2 of sub-datum A separately, and the selector 212 gates 0 as the padding bit that the high-order Booth encoding unit 213 uses for A_1.
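The effect of the selector 212 can be checked with a small radix-4 Booth model. Assuming radix-4 encoding (each digit is -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] the padding bit) and an even N, passing the MSB of A_2 as the padding bit for A_1 reproduces the encoding of the whole 2N-bit value, while gating 0 makes A_1 and A_2 encode as independent signed N-bit numbers. The radix choice and the names signed, booth_digits, digits_value and encode_sub_data are assumptions for illustration.

```python
def signed(v, width):
    """Interpret the low `width` bits of v as a two's-complement integer."""
    v &= (1 << width) - 1
    return v - (1 << width) if (v >> (width - 1)) & 1 else v


def booth_digits(value, width, pad=0):
    """Radix-4 Booth digits of a width-bit field (width even); `pad` is the bit
    appended below the LSB, i.e. y[-1]."""
    bits = [pad] + [(value >> k) & 1 for k in range(width)]  # bits[k + 1] is y[k]
    return [-2 * bits[2 * i + 2] + bits[2 * i + 1] + bits[2 * i]
            for i in range(width // 2)]


def digits_value(digits):
    """Value represented by radix-4 digits, least significant digit first."""
    return sum(d * 4 ** i for i, d in enumerate(digits))


def encode_sub_data(a, n, full_width):
    """Booth-encode one 2n-bit sub-datum A = A_1:A_2 (A_1 high half, A_2 low half).
    full_width=True models the 2N-bit mode (selector passes the MSB of A_2);
    full_width=False models the N-bit mode (selector gates 0)."""
    a2 = a & ((1 << n) - 1)
    a1 = (a >> n) & ((1 << n) - 1)
    low = booth_digits(a2, n, pad=0)                      # low-order unit always pads 0
    selector_out = (a2 >> (n - 1)) & 1 if full_width else 0
    high = booth_digits(a1, n, pad=selector_out)          # high-order unit uses selector output
    return low, high


if __name__ == "__main__":
    low, high = encode_sub_data(0xB5, 4, full_width=True)
    print(low, high, digits_value(low) + (1 << 4) * digits_value(high), signed(0xB5, 8))
```

The low-order digits are identical in both modes; only the first high-order digit changes, by exactly the MSB of A_2, which is what turns two signed N-bit encodings into one signed 2N-bit encoding.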
Note that the first data may include two 2N-bit sub-data. If the Booth encoding circuit 21 currently encodes 2N-bit sub-data, the low-order data in the first data consists of the two low-order parts of the two 2N-bit sub-data. If it currently processes N-bit data, this is equivalent to splitting each of the two 2N-bit sub-data into two N-bit sub-data, giving four N-bit sub-data, and the low-order data in the first data may then include four low-order parts derived from the two 2N-bit sub-data. In addition, during Booth encoding the number of low-order target codes obtained by the Booth encoding circuit 21 may be equal to the number of high-order target codes; it may also equal the number of first low-order partial products of the target code corresponding to the low-order data, or the number of first high-order partial products of the target code corresponding to the high-order data. Finally, if the data processor is currently performing a multiplication of N-bit data, one of the sub-data in the first data and the second data is 0, and either the high N bits or the low N bits of the other sub-datum are all 0.
It will also be appreciated that if the Booth encoding circuit 21 currently needs to process 2N-bit data, it may include N low-order Booth encoding units 211, N high-order Booth encoding units 213, and one selector 212; if it currently needs to process N-bit data, it may include N/2 low-order Booth encoding units 211, N/2 high-order Booth encoding units 213, and one selector 212.
In the data processor provided by this embodiment, the low-order Booth encoding unit, the selector, and the high-order Booth encoding unit in the Booth encoding circuit Booth-encode the received first data into a low-order target code and a high-order target code, from which data operations in several different modes are then carried out.
In one embodiment, the data processor includes the low-order Booth encoding unit 211, which includes a low-order data input port 2111 and a low-order target code output port 2112. The low-order data input port 2111 receives the low-order data in the first data to be Booth-encoded, and the low-order target code output port 2112 outputs the first low-order target code obtained by Booth-encoding that low-order data.
Optionally, the data processor further includes the high-order Booth encoding unit 213, which includes a high-order data input port 2131 and a high-order target code output port 2132. The high-order data input port 2131 receives the high-order data in the first data to be Booth-encoded, and the high-order target code output port 2132 outputs the high-order target code obtained by Booth-encoding that high-order data.
Specifically, in each Booth encoding pass, the high-order Booth encoding unit 213 encodes the high-order data in the first data in the same way that the low-order Booth encoding unit 211 encodes the low-order data, so the encoding method of the high-order Booth encoding unit 213 is not repeated in this embodiment. Likewise, the two units may have the same internal circuit structure and the same external port functions, so the specific structure of the high-order Booth encoding unit 213 is not described in detail.
In the data processor provided by this embodiment, the low-order Booth encoding unit Booth-encodes the low-order data in the first data to obtain the corresponding low-order target code, the low-order partial product acquisition unit derives the low-order partial products of the target code from that low-order target code, and the low-order and high-order partial products of the target code are then accumulated to obtain the data operation results of the different modes.
In one embodiment, the data processor includes the selector 212, which includes a function selection mode signal input port (mode) 2121, a first gated value input port 2122, a second gated value input port 2123, and a gated result output port 2124. The function selection mode signal input port 2121 receives the function selection mode signal corresponding to the data operation of each mode, the first gated value input port 2122 receives a first candidate value, the second gated value input port 2123 receives a second candidate value, and the gated result output port 2124 outputs whichever of the two values is selected.
Specifically, the selector 212 determines, from the function selection mode signal received at the function selection mode signal input port 2121, which data mode the Booth encoding circuit 21 currently needs to process, and accordingly whether the gated result output port 2124 outputs the first or the second candidate value. Optionally, each candidate value may be either 0 or the most significant bit of the low-order data in the sub-datum being encoded; that is, one of the input ports 2122 and 2123 may carry 0 and the other the most significant bit of the low-order data, and this embodiment does not limit which is which.
In the data processor provided by this embodiment, the function selection mode signal received by the selector determines the padding bit used when Booth-encoding the high-order data, and the padded high-order data is then Booth-encoded. Data operations in different modes are thus supported, which improves the versatility of the data processor.
In one embodiment, the first partial product acquisition circuit 22 in the data processor includes a low-order partial product acquisition unit 221, a low-order selector group unit 222, a high-order partial product acquisition unit 223, and a high-order selector group unit 224. The first input of the low-order partial product acquisition unit 221 is connected to the output of the low-order Booth encoding unit 211, and its second input to the output of the low-order selector group unit 222; the first input of the high-order partial product acquisition unit 223 is connected to the output of the high-order Booth encoding unit 213, and its second input to the output of the high-order selector group unit 224.
The low-order partial product acquisition unit 221 obtains the sign-extended first low-order partial products from the low-order target code and the second data, and derives the first low-order partial products of the target code from them. The low-order selector group unit 222 gates values in the sign-extended first low-order partial products according to the received function selection mode signal. The high-order partial product acquisition unit 223 obtains the sign-extended first high-order partial products from the high-order target code and the second data, and derives the first high-order partial products of the target code from them. The high-order selector group unit 224 gates values in the sign-extended first high-order partial products according to the received function selection mode signal.
Specifically, the low-order partial product acquisition unit 221 obtains a partial product for each low-order target code supplied by the low-order Booth encoding unit 211; the low-order selector group unit 222 gates certain values of the sign-extended first low-order partial product; and combining that partial product with the gated values yields the sign-extended first low-order partial product. Similarly, the high-order partial product acquisition unit 223 obtains a partial product, corresponding to the high-order data in the first data, for each high-order target code supplied by the high-order Booth encoding unit 213; the high-order selector group unit 224 gates certain values of the sign-extended first high-order partial product; and combining that partial product with the gated values yields the sign-extended first high-order partial product.
Note that if the data processor is currently multiplying 2N-bit data by 2N-bit data, the first partial product acquisition circuit 22 may include N/2 low-order partial product acquisition units 221 and N/2 high-order partial product acquisition units 223, each containing 4N value generation sub-units. If the data processor currently processes N-bit data, the first partial product acquisition circuit 22 may include N/4 low-order partial product acquisition units 221 and N/4 high-order partial product acquisition units 223, each containing 2N value generation sub-units. Optionally, each value generation sub-unit produces one value of the sign-extended first partial product.
In addition, the first partial product acquisition circuit 22 obtains the sign-extended first partial products in the same way that the second partial product acquisition circuit 23 obtains the sign-extended second partial products, so the method used by the second partial product acquisition circuit 23 is not repeated in this embodiment. The two circuits may also share the same internal circuit structure and the same external port functions, so the specific structure of the second partial product acquisition circuit 23 is not described in detail.
Further, the first partial product acquisition circuit 22 derives the first low-order partial products of the target code from all of the sign-extended first low-order partial products, arranged as follows. The first low-order partial product of the first target code equals the first sign-extended first low-order partial product, that is, the one corresponding to the lowest value in the low-order target code. From the second target code onward, the most significant value of each target code's first low-order partial product lies in the same column as the most significant value of the first target code's first low-order partial product. Each such partial product equals the corresponding sign-extended first low-order partial product, positioned so that its lowest value lies in the same column as the next higher value of the previous target code's first low-order partial product; any values of the sign-extended partial product that would fall above the highest column of the first target code's first low-order partial product are discarded.
Meanwhile, the first partial product acquisition circuit 22 derives the first high-order partial products of the target code from the sign-extended first high-order partial products. The first high-order partial product of the first high-order target code, that is, the one corresponding to the lowest value in the high-order target code, follows the first low-order partial product of the last low-order target code and equals the corresponding sign-extended first high-order partial product. For each subsequent high-order target code, the lowest value of its sign-extended first high-order partial product lies in the same column as the next higher value of the previous target code's first high-order partial product, and its most significant value lies in the same column as the most significant value of the first high-order target code's first high-order partial product. Values in the highest columns of a sign-extended first high-order partial product that extend beyond the first high-order partial product of the first target code do not participate in subsequent operations.
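The alignment and truncation rules above can be sketched as follows, assuming radix-4 Booth digits (so each successive partial product starts two columns higher than the previous one), an N-bit two's-complement multiplicand whose partial products are sign-extended to 2N bits, and a 2N-column result. Every bit that would land above column 2N-1 is masked off, mirroring the statement that such values do not participate in later operations; the remaining rows still sum, modulo 2^(2N), to the signed product. The radix, the shift of two columns, and the names signed, booth_digits, aligned_partial_products and booth_multiply are assumptions for illustration.

```python
def signed(v, width):
    """Interpret the low `width` bits of v as a two's-complement integer."""
    v &= (1 << width) - 1
    return v - (1 << width) if (v >> (width - 1)) & 1 else v


def booth_digits(value, width, pad=0):
    """Radix-4 Booth digits of a width-bit field (width even); `pad` is y[-1]."""
    bits = [pad] + [(value >> k) & 1 for k in range(width)]  # bits[k + 1] is y[k]
    return [-2 * bits[2 * i + 2] + bits[2 * i + 1] + bits[2 * i]
            for i in range(width // 2)]


def aligned_partial_products(x, y, n):
    """Sign-extended partial products of x * y, each shifted two columns past the
    previous one and truncated to 2n columns."""
    mask = (1 << (2 * n)) - 1
    rows = []
    for i, digit in enumerate(booth_digits(y, n)):
        pp = (digit * signed(x, n)) & mask     # digit * x, sign-extended to 2n bits
        rows.append((pp << (2 * i)) & mask)    # bits above column 2n - 1 are dropped
    return rows


def booth_multiply(x, y, n):
    """Signed n-bit by n-bit product from the truncated, aligned partial products."""
    total = sum(aligned_partial_products(x, y, n)) & ((1 << (2 * n)) - 1)
    return signed(total, 2 * n)


print(booth_multiply(0b1001, 0b1101, 4))  # 21, i.e. (-7) * (-3)
```

Dropping the overflow columns is safe because every discarded bit has weight 2^(2N) or higher, which vanishes modulo 2^(2N), and a signed N-bit by N-bit product always fits in 2N bits.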
Optionally, the second partial product acquisition circuit 23 includes a low-order partial product acquisition unit 231, a low-order selector group unit 232, a high-order partial product acquisition unit 233, and a high-order selector group unit 234. The first input of the low-order partial product acquisition unit 231 is connected to the output of the low-order Booth encoding unit 211, and its second input to the output of the low-order selector group unit 232; the first input of the high-order partial product acquisition unit 233 is connected to the output of the high-order Booth encoding unit 213, and its second input to the output of the high-order selector group unit 234.
The low-order partial product acquisition unit 231 obtains the sign-extended first low-order partial products from the low-order target code and the second data, and derives the first low-order partial products of the target code from them. The low-order selector group unit 232 gates values in the sign-extended first low-order partial products according to the received function selection mode signal. The high-order partial product acquisition unit 233 obtains the sign-extended first high-order partial products from the high-order target code and the second data, and derives the first high-order partial products of the target code from them. The high-order selector group unit 234 gates values in the sign-extended first high-order partial products according to the received function selection mode signal.
In this embodiment, the second partial product acquisition circuit 23 has the same internal circuit structure and function as the first partial product acquisition circuit 22, so its specific structure and function are not described in detail.
In the data processor of this embodiment, the low-order and high-order partial product acquiring units, together with the selector group units, obtain sign-extended first partial products from the low-order and high-order target codes. The first partial products of the target code are derived from these sign-extended partial products and then accumulated to produce the operation result. Because the selector group units gate values according to the function selection mode signal, the same data processor can also perform data operations in different modes, which makes it more general-purpose.
In one embodiment, the low-order partial product acquiring unit 221 of the data processor includes a low-order target code input port 2211, a strobe value input port 2212, a data input port 2213, and a low-order partial product output port 2214. The low-order target code input port 2211 receives the low-order target code output by the low-order Booth encoding unit 211; the strobe value input port 2212 receives the values of the sign-extended low-order partial product that have been gated by the low-order selector group unit 222; the data input port 2213 receives the second data; and the low-order partial product output port 2214 outputs the sign-extended low-order partial product.
Specifically, the low-order partial product acquiring unit 221 may receive the low-order target code output by the low-order Booth encoding unit 211 through the low-order target code input port 2211, and may receive the two sub-data in the second data (i.e., the multiplicands) through the data input port 2213. In a multiplication or multiply-accumulate operation, the low-order partial product acquiring unit 221 may then obtain the sign-extended low-order partial product corresponding to the low-order data from the received low-order target code and multiplicand. If the multiplicand received at the data input port 2213 is N bits wide, the sign-extended low-order partial product may be 2N bits wide.
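To make the 2N-bit width concrete, the following is a minimal Python sketch of one sign-extended partial product. It assumes standard radix-4 Booth encoding, so each target-code digit is one of -2, -1, 0, 1, 2; the function names `booth_partial_product` and `to_signed` and the integer bit-pattern representation are illustrative choices, not taken from the patent.

```python
def to_signed(pattern: int, width: int) -> int:
    """Interpret a width-bit pattern as a two's-complement integer."""
    return pattern - (1 << width) if (pattern >> (width - 1)) & 1 else pattern


def booth_partial_product(multiplicand: int, digit: int, n: int) -> int:
    """Hypothetical model: digit * multiplicand as a sign-extended 2N-bit pattern.

    multiplicand is a signed N-bit value; digit is an assumed radix-4 Booth digit.
    """
    if digit not in (-2, -1, 0, 1, 2):
        raise ValueError("radix-4 Booth digits lie in {-2, -1, 0, 1, 2}")
    if not -(1 << (n - 1)) <= multiplicand < (1 << (n - 1)):
        raise ValueError("multiplicand must fit in N signed bits")
    mask = (1 << (2 * n)) - 1
    m = multiplicand & mask                                   # sign-extend the multiplicand to 2N bits
    magnitude = {0: 0, 1: m, 2: (m << 1) & mask}[abs(digit)]  # select 0, M or 2M
    if digit < 0:
        magnitude = (~magnitude + 1) & mask                   # two's-complement negation: invert, add 1
    return magnitude


# A negative digit on a negative multiplicand: (-2) * (-3) = 6, as a 16-bit pattern.
print(hex(booth_partial_product(-3, -2, 8)))  # 0x6
```

The negation is written as invert-plus-one for readability; hardware Booth multipliers commonly defer the "+1" as a separate correction bit in the partial product array instead of performing a full addition per row.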
Note that the low-order partial product acquiring unit 221 may receive, through the strobe value input port 2212, the bit values of the sign-extended low-order partial product that the low-order selector group unit 222 has gated for the current operation mode. It then forms the final sign-extended low-order partial product by combining the partial product it has currently computed with these gated bit values.
Optionally, the high-order partial product acquiring unit 223 of the data processor includes a high-order target code input port 2231, a strobe value input port 2232, a data input port 2233, and a high-order partial product output port 2234. The high-order target code input port 2231 receives the high-order target code output by the high-order Booth encoding unit 213; the strobe value input port 2232 receives the values of the sign-extended high-order partial product that have been gated by the high-order selector group unit 224; the data input port 2233 receives the second data; and the high-order partial product output port 2234 outputs the sign-extended high-order partial product.
It can be understood that the high-order partial product acquiring unit 223 obtains its sign-extended high-order partial product in the same way that the low-order partial product acquiring unit 221 obtains its sign-extended low-order partial product, so the method is not repeated here. In addition, the two units may share the same internal circuit structure and have similar external ports, so the specific structure of the high-order partial product acquiring unit 223 is not described in detail in this embodiment.
In the data processor of this embodiment, the low-order partial product acquiring unit obtains a sign-extended low-order partial product from each low-order target code and completes it with the values gated by the low-order selector group unit. The low-order partial product of the target code is derived from this sign-extended partial product and is then accumulated together with the high-order partial product of the target code, yielding data operation results for the different modes.
In one embodiment, the low-order selector group unit 222 of the data processor includes a plurality of low-order selectors 2221 configured to gate the values in the sign-extended low-order partial product.
Specifically, the number of low-order selectors 2221 in the low-order selector group unit 222 may equal 3/8 of the square of the multiplicand bit width in the multiplication or multiply-accumulate operation currently performed by the data processor, and the low-order selectors 2221 may all have the same internal circuit structure. If the data processor currently needs to multiply N-bit data, the low-order partial product acquiring unit 221 corresponding to each low-order Booth encoding unit 211 may include 2N number generation subunits, N of which are connected to N low-order selectors 2221, one selector per subunit. These N subunits may be the ones that generate the high N bits of the low-order partial product of the target code. The N low-order selectors 2221 may have the same internal circuit structure as the selector 212, and besides the function selection mode signal input (mode), each has two further inputs. If the data processor supports four operation modes and receives an N-bit multiplicand, these two inputs of a low-order selector 2221 may receive, respectively, the value 0 and the sign bit of the corresponding sign-extended low-order partial product that the low-order partial product acquiring unit 221 obtains when the data processor multiplies N-bit data.
The N/4 low-order partial product acquiring units 221 may be connected to N/4 groups of N low-order selectors 2221. The sign bit value may differ from group to group but is the same for all N low-order selectors 2221 within a group; it is taken from the sign-extended low-order partial product obtained by the low-order partial product acquiring unit 221 to which that group is connected.
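Each of the N selectors described above has two data inputs besides the mode input: a constant 0 and a sign bit. A simple behavioural reading, assumed here and not spelled out in the text, is that the mode signal makes every such selector output either 0 or the partial product's sign bit. The function `gate_extension` and its `split_mode` flag are hypothetical names used only for this sketch.

```python
def gate_extension(pp_bits: int, pp_width: int, total_width: int, split_mode: bool) -> int:
    """Fill bits [pp_width, total_width) of a partial product via 2:1 selectors.

    Hypothetical model: each selector outputs 0 in split_mode, otherwise the
    partial product's sign bit (ordinary two's-complement sign extension).
    """
    if not 0 < pp_width <= total_width:
        raise ValueError("need 0 < pp_width <= total_width")
    pp_bits &= (1 << pp_width) - 1
    sign = (pp_bits >> (pp_width - 1)) & 1
    fill = 0 if split_mode else sign          # the mode signal drives every selector
    result = pp_bits
    for position in range(pp_width, total_width):
        result |= fill << position            # one selector per extension bit
    return result


print(bin(gate_extension(0b1110, 4, 8, split_mode=False)))  # 0b11111110
print(bin(gate_extension(0b1110, 4, 8, split_mode=True)))   # 0b1110
```

Zero-filling keeps a partial product belonging to a packed, narrower operand from spilling into the columns of its neighbour, which is one plausible reason for gating these bits by mode; the text itself only specifies the selector inputs.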
In addition, N/2 of the 2N number generation subunits in each low-order partial product acquiring unit 221 may not be connected to any low-order selector 2221. The values produced by these N/2 subunits are then the corresponding bits of the sign-extended low-order partial product for whichever data bit width the data processor is currently multiplying; counting from the lowest bit (bit 1), they are bits (N/2)+1 through N of the sign-extended low-order partial product.
Note that the remaining N/2 of the 2N number generation subunits in each low-order partial product acquiring unit 221 may be connected to N/2 further low-order selectors 2221, one selector per subunit. These N/2 low-order selectors 2221 may have the same internal circuit structure as the selector 212, and besides the function selection mode signal input (mode), each has two further inputs, which may receive, respectively, the sign bit of the corresponding sign-extended low-order partial product obtained when the data processor multiplies N/2-bit data and the sign bit obtained when it multiplies N-bit data. The N/4 low-order partial product acquiring units 221 may be connected to N/4 groups of N/2 low-order selectors 2221. The sign bit values may differ from group to group but are the same within a group; they are taken from the sign-extended low-order partial product obtained by the low-order partial product acquiring unit 221 to which that group is connected.
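The 3/8 factor stated for the number of low-order selectors 2221 follows from the per-unit breakdown: each of the N/4 low-order partial product acquiring units 221 uses N selectors on its high N bits plus N/2 further selectors. The value N = 16 below is only an illustrative example.

```latex
\frac{N}{4}\left(N + \frac{N}{2}\right) = \frac{N}{4}\cdot\frac{3N}{2} = \frac{3N^{2}}{8},
\qquad N = 16:\quad 4 \times 24 = 96 = \frac{3 \cdot 16^{2}}{8}.
```

The same arithmetic applies to the high-order selector group unit 224, whose N/4 high-order partial product acquiring units 223 are likewise described with N plus N/2 selectors each.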
In addition, the bit values of the sign-extended low-order partial product received by each group of N/2 low-order selectors 2221 may be determined by the low-order partial product acquiring unit 221 connected to that group, and within a group the value received by each low-order selector 2221 may be the same or different. Within each low-order partial product acquiring unit 221, the 2N number generation subunits may be positioned two subunits to the left of those in the previous low-order partial product acquiring unit 221.
In the data processor of this embodiment, the low-order selector group unit gates values in the low-order partial product to produce the sign-extended low-order partial product, from which the low-order partial product of the target code is obtained; the compression circuit then accumulates the low-order and high-order partial products of the target code to obtain operation results for the different modes.
In one embodiment, the high-order selector group unit 224 of the data processor includes a plurality of high-order selectors 2241 configured to gate the values in the sign-extended high-order partial product.
Specifically, the number of high-order selectors 2241 in the high-order selector group unit 224 may equal 3/8 of the square of the multiplicand bit width in the multiplication or multiply-accumulate operation currently performed by the data processor, and the high-order selectors 2241 may all have the same internal circuit structure. If the data processor currently needs to multiply N-bit data, where N is the bit width of the multiplicand it currently receives, the high-order partial product acquiring unit 223 corresponding to each high-order Booth encoding unit 213 may include 2N number generation subunits, N of which are connected to N high-order selectors 2241, one selector per subunit. These N subunits may be the ones that generate the low N bits of the high-order partial product of the target code. The N high-order selectors 2241 may have the same internal circuit structure as the selector 113, and besides the function selection mode signal input (mode), each has two further inputs. If the data processor supports four operation modes and receives an N-bit multiplicand, these two inputs of a high-order selector 2241 may receive, respectively, the value 0 and the sign bit of the corresponding sign-extended high-order partial product obtained by the high-order Booth encoding unit 213 when the data processor multiplies N-bit data.
The N/4 high-order partial product acquiring units 223 may be connected to N/4 groups of N high-order selectors 2241, and within each group the bit values received by the N high-order selectors 2241 may be the same or different.
In addition, N/2 of the 2N number generation subunits in each high-order partial product acquiring unit 223 may be connected to N/2 high-order selectors 2241, one selector per subunit. These N/2 high-order selectors 2241 may have the same internal circuit structure as the selector 212, and besides the function selection mode signal input (mode), each has two further inputs, which may receive, respectively, the sign bit of the high-order partial product obtained when the data processor multiplies N/2-bit data and the sign bit of the high-order partial product obtained when it multiplies N-bit data. The N/4 high-order partial product acquiring units 223 may be connected to N/4 groups of N/2 high-order selectors 2241. The sign bit values may differ from group to group but are the same within a group; they are taken from the high-order partial product obtained by the high-order partial product acquiring unit 223 to which that group is connected.
In addition, the bit values of the sign-extended high-order partial product received by each group of N/2 high-order selectors 2241 may be determined by the sign bit of the sign-extended high-order partial product obtained by the high-order partial product acquiring unit 223 connected to that group, and within a group the value received by each high-order selector 2241 may be the same or different.
Note that the remaining N/2 of the 2N number generation subunits in each high-order partial product acquiring unit 223 may not be connected to any high-order selector 2241. The values produced by these N/2 subunits are then the corresponding bits of the sign-extended high-order partial product for whichever data bit width the data processor is currently processing; counting from the lowest bit (bit 1), they are bits N+1 through 3N/2 of the sign-extended high-order partial product. Within each high-order partial product acquiring unit 223, the 2N number generation subunits may be positioned two subunits to the left of those in the previous high-order partial product acquiring unit 223. Alternatively, only the first high-order partial product of the target code may be 3N/2 bits wide, with each subsequent high-order partial product having two more high-order bits than the previous one.
In the data processor of this embodiment, the high-order selector group unit gates values in the high-order partial product to obtain the high-order partial product of the target code, and the compression circuit then accumulates the high-order and low-order partial products of the target code to obtain operation results for the different modes.
Fig. 5 is a schematic diagram of a specific structure of a data processor according to another embodiment. The first compression circuit 24 of the data processor includes a modified Wallace tree group circuit 241 and an accumulation circuit 242, with the output of the modified Wallace tree group circuit 241 connected to the input of the accumulation circuit 242. The modified Wallace tree group circuit 241 accumulates each column of values in the first low-order and first high-order partial products of the target code obtained during data operations in the different modes, producing an accumulation result, and the accumulation circuit 242 adds the outputs of that accumulation.
Specifically, the modified Wallace tree group circuit 241 may accumulate each column of values in the first low-order and first high-order partial products of the target code obtained by the first partial product acquiring circuit 22, and the accumulation circuit 242 then adds the two results produced by the modified Wallace tree group circuit 241 to obtain the target operation result. The modified Wallace tree group circuit 241 works on the following arrangement of all partial products of the target code: the lowest bit of each row's partial product lies two columns to the right of the lowest bit of the next row's partial product, while the highest bit of every row lies in the same column as the highest bit of the first row. The modified Wallace tree group circuit 241 accumulates each column of all partial products of the target code according to this arrangement, where all partial products of the target code may include the first low-order and first high-order partial products. The two results produced by the modified Wallace tree group circuit 241 may be a sum output signal Sum and a carry output signal Carry.
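The arrangement described above (row i starting two columns further left, every row extended up to one common top column) can be checked numerically. This sketch assumes radix-4 Booth digits and an N x N signed multiplication; it adds the columns directly instead of compressing them, so it only validates the partial product layout, not the Wallace tree. `booth_rows` and `sum_columns` are hypothetical helper names.

```python
def booth_rows(multiplicand: int, multiplier: int, n: int) -> list:
    """Hypothetical model of the partial product array for an N x N signed multiply.

    Row i is (Booth digit i) * multiplicand, shifted left by 2*i columns and
    sign-extended up to column 2N-1, so every row's top bit shares one column.
    """
    width = 2 * n
    mask = (1 << width) - 1
    bits = multiplier & ((1 << n) - 1)
    rows, prev = [], 0                       # prev models the implicit bit y[-1] = 0
    for i in range(n // 2):
        y0 = (bits >> (2 * i)) & 1
        y1 = (bits >> (2 * i + 1)) & 1
        digit = -2 * y1 + y0 + prev          # radix-4 Booth digit in {-2, ..., 2}
        prev = y1
        pattern = ((digit * multiplicand) << (2 * i)) & mask
        rows.append([(pattern >> col) & 1 for col in range(width)])
    return rows


def sum_columns(rows: list, width: int) -> int:
    """Add the array column by column (column c has weight 2**c), modulo 2**width."""
    total = sum(sum(row[col] for row in rows) << col for col in range(width))
    return total & ((1 << width) - 1)


result = sum_columns(booth_rows(-7, 5, 4), 8)
print(result - 256 if result >= 128 else result)   # -35
```

Because every row is extended to the same top column, the column sums can simply be truncated to 2N bits; the product of two signed N-bit values always fits in 2N signed bits.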
Optionally, the second compression circuit 25 includes a modified Wallace tree group circuit 251 and an accumulation circuit 252, with the output of the modified Wallace tree group circuit 251 connected to the input of the accumulation circuit 252. The modified Wallace tree group circuit 251 accumulates each column of values in the first low-order and first high-order partial products of the target code obtained during data operations in the different modes, producing an accumulation result, and the accumulation circuit 252 adds the outputs of that accumulation.
Note that the second compression circuit 25 compresses the second partial products of the target code in the same way that the first compression circuit 24 compresses the first partial products, so its compression method is not described again. The first compression circuit 24 and the second compression circuit 25 also have identical internal structures and external port functions, so the specific structure of the second compression circuit 25 is not described in detail in this embodiment.
In the data processor of this embodiment, the modified Wallace tree group circuit accumulates the low-order and high-order partial products of the target code, and the accumulation circuit adds the accumulation results to obtain the target operation result.
In one embodiment, the structure of the data processor shown in Fig. 5 is further detailed: the modified Wallace tree group circuit 241 includes low-order Wallace tree sub-circuits 2411, a selector 2412, and high-order Wallace tree sub-circuits 2413. The output of the low-order Wallace tree sub-circuits 2411 is connected to the input of the selector 2412, and the output of the selector 2412 is connected to the input of the high-order Wallace tree sub-circuits 2413. The low-order Wallace tree sub-circuits 2411 accumulate each column of values in the first partial products of the target code to obtain the accumulation result, the selector 2412 gates the carry input signal received by the high-order Wallace tree sub-circuits 2413, and the high-order Wallace tree sub-circuits 2413 likewise accumulate each column of values in the first partial products of the target code to obtain the accumulation result.
Specifically, the low-order Wallace tree sub-circuits 2411 and high-order Wallace tree sub-circuits 2413 may each be built from a combination of full adders and half adders, or from 4-2 compressors; in either case, each is a circuit that adds a multi-bit input signal and reduces it to a two-bit output signal. Optionally, the number of high-order Wallace tree sub-circuits 2413 in the modified Wallace tree group circuit 241 may equal the multiplicand bit width N of the multiplication or multiply-accumulate operation the data processor currently performs, and may also equal the number of low-order Wallace tree sub-circuits 2411. The low-order Wallace tree sub-circuits 2411 may be connected in series, as may the high-order Wallace tree sub-circuits 2413. Optionally, the output of the last low-order Wallace tree sub-circuit 2411 is connected to the input of the selector 2412, and the output of the selector 2412 is connected to the input of the first high-order Wallace tree sub-circuit 2413. Optionally, in the modified Wallace tree group circuit 241, each low-order Wallace tree sub-circuit 2411 may add the values in one column of all partial products of the target code and output two signals, a carry signal Carry_i and a sum signal Sum_i, where i is the index of the low-order Wallace tree sub-circuit 2411 and the first one is numbered 0. The number of input signals received by each low-order Wallace tree sub-circuit 2411 may equal the number of first partial products of the target code.
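The reduction to two output signals can be illustrated with carry-save compression. The sketch below is a word-level model built only from full adders (3:2 compressors): in each layer, every bit position of the XOR and majority operations corresponds to one full adder in that column. It does not model the half-adder or 4-2 compressor variants the text also allows, nor the per-column sub-circuit partitioning; `full_adder_layer` and `wallace_reduce` are hypothetical names.

```python
def full_adder_layer(a: int, b: int, c: int, mask: int) -> tuple:
    """One layer of 3:2 compressors: a full adder in every column at once."""
    sum_bits = a ^ b ^ c                                # per-column sum bit
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1    # per-column carry, moved one column left
    return sum_bits & mask, carry_bits & mask


def wallace_reduce(operands: list, width: int) -> tuple:
    """Reduce any number of width-bit rows to two rows (Sum, Carry), modulo 2**width."""
    mask = (1 << width) - 1
    rows = [x & mask for x in operands]
    while len(rows) > 2:
        leftover = len(rows) % 3
        next_rows = []
        for k in range(0, len(rows) - leftover, 3):
            next_rows.extend(full_adder_layer(rows[k], rows[k + 1], rows[k + 2], mask))
        next_rows.extend(rows[len(rows) - leftover:])   # rows not grouped this layer pass through
        rows = next_rows
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


s, c = wallace_reduce([0xF9, 0xE4, 0x10, 0x03], 8)
print((s + c) & 0xFF == (0xF9 + 0xE4 + 0x10 + 0x03) & 0xFF)  # True
```

Each layer preserves the total because a + b + c equals (a XOR b XOR c) plus twice the bitwise majority, so only the final Sum + Carry addition needs a carry-propagating adder.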
In the modified Wallace tree group circuit 241, the total number of high-order Wallace tree sub-circuits 2413 and low-order Wallace tree sub-circuits 2411 may be 2N, matching the 2N columns, from lowest to highest, spanned by all first partial products of the target code. The N low-order Wallace tree sub-circuits 2411 accumulate the low N columns of all first partial products of the target code, one column each, and the N high-order Wallace tree sub-circuits 2413 accumulate the high N columns.
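The text states only that the selector 2412 gates the carry input of the high-order Wallace tree sub-circuits 2413. One plausible purpose, assumed here rather than stated, is to let the low N columns and the high N columns act as independent lanes when narrower operands are packed side by side. The sketch below is a behavioural model of that boundary, not of the Wallace tree itself: it adds column by column with a running carry and optionally blocks the carry at column N. `accumulate_halves` and `split` are hypothetical names.

```python
def accumulate_halves(operands: list, n: int, split: bool) -> int:
    """Behavioural model of 2N column accumulators with a gated carry at column N.

    Hypothetical: when split is True the selector feeds 0 into the high half,
    so columns [0, N) and [N, 2N) are summed independently (each modulo 2**N).
    """
    width = 2 * n
    result, carry = 0, 0
    for col in range(width):
        if col == n and split:
            carry = 0                                   # selector blocks the low-half carry
        total = sum((op >> col) & 1 for op in operands) + carry
        result |= (total & 1) << col                    # this column's result bit
        carry = total >> 1                              # the rest moves up one column
    return result


# Two 4-bit lanes packed into each 8-bit operand: (high lane << 4) | low lane.
ops = [(9 << 4) | 12, (8 << 4) | 7]
print(hex(accumulate_halves(ops, 4, split=True)))   # 0x13: lanes 9+8 -> 1, 12+7 -> 3
print(hex(accumulate_halves(ops, 4, split=False)))  # 0x23: the low-lane carry reaches the high lane
```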
Optionally, the modified Wallace tree group circuit 251 includes low-order Wallace tree sub-circuits 2511, a selector 2512, and high-order Wallace tree sub-circuits 2513. The output of the low-order Wallace tree sub-circuits 2511 is connected to the input of the selector 2512, and the output of the selector 2512 is connected to the input of the high-order Wallace tree sub-circuits 2513. The low-order Wallace tree sub-circuits 2511 accumulate each column of values in the first partial products of the target code to obtain the accumulation result, the selector 2512 gates the carry input signal received by the high-order Wallace tree sub-circuits 2513, and the high-order Wallace tree sub-circuits 2513 likewise accumulate each column of values in the first partial products of the target code to obtain the accumulation result.
In this embodiment, the modified Wallace tree group circuit 251 has the same internal circuit structure and function as the modified Wallace tree group circuit 241, so it is not described in detail here.
In the data processor of this embodiment, the modified Wallace tree group circuit accumulates the partial products of the target code into two output signals, and the accumulation circuit adds these two signals to obtain data operation results for the different modes. In addition, the data processor does not need a separate accumulation step after the multiplication to complete a multiply-accumulate operation: both multiplication and multiply-accumulate are completed in a single pass, which reduces the power consumption of the data processor.
In another embodiment, the accumulation circuit 242 of the data processor includes an adder 2421 configured to add the results of the accumulation operation.
Specifically, the adder 2421 may be a carry adder of any of several bit widths. The adder 2421 may receive the two signals output by the modified Wallace tree group circuit 241, add them, and output the data operation result for the data processor's current processing mode. Optionally, the adder 2421 may be a carry-lookahead adder.
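As a sketch of the carry-lookahead option, the following models a parallel-prefix carry-lookahead adder in the Kogge-Stone arrangement, chosen here only for illustration (the text does not name a specific lookahead structure). It adds the Sum and Carry words over a configurable width, which the text says may be twice the multiplicand bit width. `carry_lookahead_add` is a hypothetical name.

```python
def carry_lookahead_add(sum_bits: int, carry_bits: int, width: int) -> int:
    """Add two width-bit words using parallel-prefix (Kogge-Stone) carry lookahead."""
    mask = (1 << width) - 1
    a, b = sum_bits & mask, carry_bits & mask
    g = a & b                      # generate: this bit creates a carry
    p = a ^ b                      # propagate: this bit passes an incoming carry on
    big_g, big_p = g, p
    span = 1
    while span < width:            # ceil(log2(width)) prefix levels
        big_g = big_g | (big_p & (big_g << span))
        big_p = big_p & (big_p << span)
        span <<= 1
    carries = (big_g << 1) & mask  # carry into bit i = group generate of bits 0..i-1
    return (p ^ carries) & mask


print(hex(carry_lookahead_add(0xC5, 0x3A, 8)))  # 0xff
```

After the loop, bit i of `big_g` says whether bits 0 through i together generate a carry; this is what lets every carry be computed in logarithmic depth instead of rippling through all 2N positions.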
In the data processor of this embodiment, the accumulation circuit adds the two signals output by the modified Wallace tree group circuit and outputs data operation results for the different modes. A multiply-accumulate operation can be completed without a further accumulation step on the multiplication result, and multiplication or multiply-accumulate is performed in a single pass, reducing the power consumption of the data processor.
In one embodiment, the adder 2421 of the data processor includes a carry signal input port 2421a, a sum bit signal input port 2421b, and an operation result output port 2421c. The carry signal input port 2421a receives a carry signal, the sum bit signal input port 2421b receives a sum bit signal, and the operation result output port 2421c outputs the result of adding the carry signal and the sum bit signal.
Specifically, the adder 2421 receives the carry signal Carry output by the modified Wallace tree group circuit 241 through the carry signal input port 2421a, receives the sum bit signal Sum output by the modified Wallace tree group circuit 241 through the sum bit signal input port 2421b, and outputs the sum of Carry and Sum through the operation result output port 2421c.
Note that, during operation, the data processor may use an adder 2421 of the appropriate bit width to add the carry output signal Carry and the sum output signal Sum from the modified Wallace tree group circuit 241; the data bit width the adder 2421 can handle may be twice the multiplicand bit width of the multiplication or multiply-accumulate operation the data processor needs to perform.
In the data processor of this embodiment, the accumulation circuit adds the two signals output by the modified Wallace tree group circuit to output data operation results for the different modes. A multiply-accumulate operation can be completed without a second accumulation pass over the multiplication result, so multiplication or multiply-accumulate is performed in a single pass, reducing the power consumption of the data processor.
Fig. 6 is a flow chart of a data processing method according to an embodiment. The method may be executed by the data processor shown in Fig. 1 and Fig. 3, and this embodiment concerns how data operations in four different modes are performed. As shown in Fig. 6, the method includes:
S101: receive data to be processed and a function selection mode signal, where the function selection mode signal indicates which mode of data operation the data processor is currently to perform.
Specifically, the data processor may receive a piece of data to be processed through the first multiplication circuit and the second multiplication circuit. The data to be processed may include two pieces of sub-data, which may have the same bit width or different bit widths. The two pieces of sub-data may be concatenated and input as a whole to the first or second multiplication circuit, or they may be input separately and simultaneously. The sub-data to be processed may be fixed-point numbers with a bit width of 2N, in which case the concatenated data is 4N bits wide.
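The concatenated input format can be sketched as follows, assuming two signed 2N-bit fixed-point sub-data packed into one 4N-bit word. The order of the two halves is not specified in the text; here the first argument is simply placed in the upper 2N bits. `splice` and `unsplice` are hypothetical names.

```python
def splice(high: int, low: int, n: int) -> int:
    """Concatenate two signed 2N-bit sub-data into one 4N-bit word (high:low)."""
    w = 2 * n
    mask = (1 << w) - 1
    return ((high & mask) << w) | (low & mask)


def unsplice(word: int, n: int) -> tuple:
    """Recover the two signed 2N-bit sub-data from a 4N-bit word."""
    w = 2 * n
    mask = (1 << w) - 1

    def signed(x: int) -> int:
        return x - (1 << w) if (x >> (w - 1)) & 1 else x

    return signed((word >> w) & mask), signed(word & mask)


word = splice(-5, 100, 4)             # two 8-bit sub-data -> one 16-bit word
print(hex(word), unsplice(word, 4))   # 0xfb64 (-5, 100)
```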
It should be noted that, the first multiplication circuit and the second multiplication circuit may both receive the same function selection mode, and the function selection mode signal may have four kinds of function selection mode signals, where the four kinds of function selection mode signals respectively correspond to four kinds of data operations that can be processed by the data processor, and the four kinds of data operations may be multiplication operations of N bits by N bits of data, multiplication operations of 2N bits by 2N bits of data, and multiplication operations of 2N bits by N bits of data. The data processor selects the mode signals according to the received different functions, and can determine the data operation which needs to process the corresponding mode currently. In addition, one of the sub-data to be processed may be a multiplier when the data processor performs a multiplication or multiply-accumulate operation, and the other sub-data to be processed may be a multiplicand when the data processor performs a multiplication or multiply-accumulate operation.
S102, encoding the data to be processed according to the function selection mode signal to obtain target codes.
Optionally, in S102, encoding the data to be processed according to the function selection mode signal to obtain the target codes includes: determining, from the function selection mode signal, the mode of data operation the data processor is currently to process; and Booth-encoding the data to be processed according to that mode to obtain the target codes.
Specifically, the data processor may determine from the received function selection mode signal which mode of data operation it can currently process. The two pieces of sub-data contained in the data to be processed are 2N bits wide, so from that bit width and the current mode the data processor can determine whether the first and second multiplication circuits must Booth-encode N-bit data or 2N-bit data, and thereby obtain two corresponding groups of target codes. The first and second multiplication circuits Booth-encode the multiplier they receive; the multiplicand they receive is not Booth-encoded.
It should be noted that the Booth encoding rule for the target codes is given in Table 1 and in the embodiments describing the structure of the first modified encoding sub-circuit 111 above. Optionally, the target codes may include first target codes obtained by the first multiplication circuit and second target codes obtained by the second multiplication circuit. Each target code may be determined by three adjacent bits of the multiplier in the multiplication or multiply-accumulate operation.
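Table 1 is not reproduced in this section, so the sketch below assumes the conventional radix-4 (modified) Booth rule: code k is computed from the overlapping triplet (y[2k+1], y[2k], y[2k-1]) of the multiplier as -2*y[2k+1] + y[2k] + y[2k-1], with y[-1] = 0. The function name `booth_radix4` is hypothetical, and the rule may differ in detail from Table 1. An n-bit multiplier yields n/2 codes in {-2, -1, 0, 1, 2}, and their weighted sum recovers the signed multiplier.

```python
def booth_radix4(multiplier: int, width: int) -> list[int]:
    """Radix-4 Booth codes of a width-bit two's-complement multiplier, least significant first."""
    if width % 2:
        raise ValueError("width must be even")
    y = multiplier & ((1 << width) - 1)   # two's-complement bit pattern
    codes = []
    prev = 0                              # y[-1] is an implicit 0
    for k in range(width // 2):
        b0 = (y >> (2 * k)) & 1           # y[2k]
        b1 = (y >> (2 * k + 1)) & 1       # y[2k+1]
        codes.append(-2 * b1 + b0 + prev)
        prev = b1                         # becomes y[2k-1] of the next triplet
    return codes


print(booth_radix4(-3, 8))   # [1, -1, 0, 0], and 1*1 + (-1)*4 = -3
```

Encoding an N-bit multiplier gives N/2 codes and a 2N-bit multiplier gives N codes, which matches the code counts stated for the Booth encoding circuit 21.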
S103, obtaining sign-extended partial products from the target codes and the data to be processed.
Specifically, the first and second multiplication circuits may obtain the corresponding sign-extended partial products from the target codes they produced and from the sub-data to be processed (that is, the multiplicand) in the data they received. The bit width of a sign-extended partial product may be equal to twice the bit width of the corresponding multiplicand. Optionally, the sign-extended partial products may include first sign-extended partial products obtained by the first multiplication circuit and second sign-extended partial products obtained by the second multiplication circuit.
S104, obtaining the target-code partial products according to the function selection mode signal and the sign-extended partial products.
It will be appreciated that the data processor may determine from the function selection mode signal which mode of data operation it currently has to process, determine the first target-code partial products from the first and/or second sign-extended partial products, and determine the second target-code partial products from the first and/or second sign-extended partial products. The target-code partial products may include first target-code partial products obtained by the first multiplication circuit and second target-code partial products obtained by the second multiplication circuit.
S105, compressing the target-code partial products to obtain a target operation result.
Specifically, the compression described above may also be called accumulation. The target operation result may be the result of an N-bit multiplication, an N-bit multiply-accumulate, a 2N-bit multiplication or a 2N-bit multiply-accumulate. For an N-bit multiply-accumulate, the data processor can compress all target-code partial products obtained from the two pieces of data to be processed directly into a carry signal and a sum signal, and then add these two signals to obtain the target operation result.
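The single-pass multiply-accumulate described above can be illustrated with a short functional model: all partial products of both products enter one carry-save (3:2) reduction that leaves a sum word and a carry word, and a single final addition yields a*b + c*d modulo 2^(2N). To keep the sketch short it uses plain unsigned AND-array partial products instead of the Booth-coded, sign-extended ones of this embodiment; all function names are hypothetical.

```python
def carry_save_reduce(rows: list[int], width: int) -> tuple[int, int]:
    """Reduce any number of rows to a (sum, carry) pair using 3:2 compressors."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]               # work on a masked copy
    while len(rows) > 2:
        a, b, c = rows.pop(), rows.pop(), rows.pop()
        s = a ^ b ^ c                             # bitwise sum of three rows
        cy = ((a & b) | (a & c) | (b & c)) << 1   # majority bits, shifted to the next weight
        rows += [s, cy & mask]
    rows += [0] * (2 - len(rows))                 # pad when fewer than two rows were given
    return rows[0], rows[1]


def and_array_rows(a: int, b: int, n: int) -> list[int]:
    """Unsigned shift-and-add partial products of n-bit a times n-bit b (illustrative only)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def multiply_accumulate(a: int, b: int, c: int, d: int, n: int) -> int:
    """a*b + c*d: both sets of partial products are compressed together, then added once."""
    width = 2 * n
    s, cy = carry_save_reduce(and_array_rows(a, b, n) + and_array_rows(c, d, n), width)
    return (s + cy) & ((1 << width) - 1)


print(multiply_accumulate(200, 100, 15, 7, 8))   # 20105
```

Because the reduction only needs the rows, adding the second product costs extra compressor inputs rather than a second carry-propagate addition, which is the saving the text describes.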
With the data processing method provided by this embodiment, the mode of data operation that can currently be processed is determined from the received function selection mode signal, so both multiplication and multiply-accumulate can be performed, which improves the versatility of the data processor. In addition, a multiply-accumulate operation is completed without a separate accumulation over the multiplication result, and either multiplication or multiply-accumulate is carried out in a single operation pass, which effectively reduces the power consumption of the data processor.
In one embodiment, obtaining the sign-extended partial products from the target codes and the data to be processed in S103 includes:
S1031, obtaining the first sign-extended partial products from the first target codes and the data to be processed.
Specifically, once the data processor has determined the mode of data operation it can currently process, the first multiplication circuit may obtain the corresponding first sign-extended partial products from the first target codes and the sub-data to be processed (that is, the multiplicand) in the data it received.
For example, suppose both pieces of sub-data to be processed are 2N bits wide and the sub-data serving as the multiplicand is denoted X. The first target codes may then take five values: -2X, -X, 0, X and 2X. If the data processor is to perform an N-bit by N-bit multiplication, the first multiplication circuit may obtain the corresponding first sign-extended partial product directly from the multiplicand X and the first target code. The first sign-extended partial product is 2N bits wide: its low (N+1) bits equal the original partial product, and its high (N-1) bits all equal the sign bit of the original partial product, i.e. its most significant bit. When the first target code is -2X, the original partial product may be obtained by shifting X left by one bit, inverting every bit and adding 1; when it is 2X, by shifting X left by one bit; when it is -X, by inverting every bit of X and adding 1; when it is X, by prepending the sign bit of X (its most significant bit) to X; and when it is 0, every bit of the original partial product is 0 (for example, all nine bits when N = 8).
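The construction of a sign-extended partial product can be sketched as follows for an N-bit two's-complement multiplicand X and a code in {-2, -1, 0, 1, 2}. One deliberate deviation, an assumption borrowed from common Booth multiplier practice rather than something stated here: the +1 of the two's-complement negation is returned as a separate bit `neg` instead of being folded into the (N+1)-bit field, because folding it in overflows for X = -2^(N-1) with code -2X (the true value 2^N does not fit in N+1 signed bits). The function name is hypothetical.

```python
def sign_extended_partial_product(x: int, code: int, n: int) -> tuple[int, int]:
    """Return (pp, neg) for an n-bit multiplicand x and a Booth code in {-2, ..., 2}.

    pp is the 2n-bit sign-extended partial product: the low (n+1) bits hold the
    original partial product and the high (n-1) bits repeat its sign bit.
    neg is the +1 owed by a negative code, to be added at the partial product's LSB.
    """
    field = (1 << (n + 1)) - 1               # mask of the (n+1)-bit original partial product
    x_ext = x & ((1 << n) - 1)
    if x_ext >> (n - 1):                     # prepend the sign bit of X to get n+1 bits
        x_ext |= 1 << n
    if abs(code) == 2:
        mag = (x_ext << 1) & field           # 2X: shift X left by one bit
    elif abs(code) == 1:
        mag = x_ext                          # X
    elif code == 0:
        mag = 0
    else:
        raise ValueError("code must be in {-2, -1, 0, 1, 2}")
    neg = 1 if code < 0 else 0
    vec = (mag ^ field) if neg else mag      # -X / -2X: invert every bit (+1 kept in neg)
    sign = (vec >> n) & 1                    # most significant bit of the (n+1)-bit field
    high = ((1 << (n - 1)) - 1) if sign else 0
    return (high << (n + 1)) | vec, neg


print(sign_extended_partial_product(-128, -2, 8))   # (255, 1): 255 + 1 = 256 = -2 * -128
```

Keeping `neg` separate lets the compressor absorb the +1 as one more input bit, so no extra carry-propagate step is needed per partial product.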
S1032, obtaining the second sign-extended partial products from the second target codes and the data to be processed.
Likewise, once the data processor has determined the mode of data operation it can currently process, the second multiplication circuit may obtain the corresponding second sign-extended partial products from the second target codes and the sub-data to be processed (that is, the multiplicand) in the data it received.
With the data processing method provided by this embodiment, the data processor obtains the first sign-extended partial products through the first multiplication circuit and the second sign-extended partial products through the second multiplication circuit, and then determines the target-code partial products from them according to the mode of data operation it can currently process. Data operations in different modes are thereby supported, which improves the versatility of the data processor.
In one embodiment, obtaining the target-code partial products in S104 according to the function selection mode signal and the sign-extended partial products includes:
S1041, determining, from the function selection mode signal, the mode of data operation the data processor can currently process.
Specifically, the data processor may determine the specific mode of data operation to be processed from the function selection mode signal it receives.
S1042, determining, according to that mode of data operation, whether the sign-extended partial products need to be exchanged.
Optionally, in S1042, determining whether the sign-extended partial products need to be exchanged according to the mode of data operation includes: determining, according to that mode, whether the first sign-extended partial products and the second sign-extended partial products need to be exchanged.
The sign-extended partial products may include the first sign-extended partial products obtained by the first multiplication circuit and the second sign-extended partial products obtained by the second multiplication circuit. Optionally, the data processor may receive four different function selection mode signals, each indicating the mode of data operation it can currently process. The four modes may be N-bit multiplication, N-bit multiply-accumulate, 2N-bit multiplication and 2N-bit multiply-accumulate. It will be appreciated that the data processor needs the partial product exchange circuit to exchange the first sign-extended partial products of the first multiplication circuit with the second sign-extended partial products of the second multiplication circuit only when the function selection mode signal indicates that a 2N-bit multiply-accumulate is to be processed.
S1043, if no exchange is needed, using the sign-extended partial products as the target-code partial products.
Specifically, if the function selection mode signal indicates that an N-bit multiplication, an N-bit multiply-accumulate or a 2N-bit multiplication is to be processed, the data processor performs no exchange: it uses the first sign-extended partial products as the first target-code partial products and the second sign-extended partial products as the second target-code partial products, and compresses each set separately. Optionally, the first and second sign-extended partial products may be zero or non-zero.
With the data processing method provided by this embodiment, the data processor determines from the received function selection mode signal whether the sign-extended partial products need to be exchanged; if not, it uses them directly as the target-code partial products and then compresses the target-code partial products.
In one embodiment, after determining in S1042 whether the sign-extended partial products need to be exchanged according to the mode of data operation, the method further includes: if an exchange is needed, exchanging the sign-extended partial products.
The data processor may exchange the first low-order sign-extended partial products with the second low-order sign-extended partial products, or exchange the first high-order sign-extended partial products with the second high-order sign-extended partial products.
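The mode decision above can be summarised as a small lookup. The mode names, the string values and the operand-width table are hypothetical labels chosen for this sketch; the widths follow the four operations listed for this data processor (N by N multiply, N by N multiply-accumulate, 2N by 2N multiply, 2N by N multiply-accumulate), and only the 2N-bit multiply-accumulate mode requests the partial product exchange.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical names for the four function selection modes."""
    MUL_N = "N by N multiply"
    MAC_N = "N by N multiply-accumulate"
    MUL_2N = "2N by 2N multiply"
    MAC_2N = "2N by N multiply-accumulate"


# (multiplier width, multiplicand width, accumulate?) in multiples of N
MODE_TABLE = {
    Mode.MUL_N: (1, 1, False),
    Mode.MAC_N: (1, 1, True),
    Mode.MUL_2N: (2, 2, False),
    Mode.MAC_2N: (2, 1, True),
}


def needs_exchange(mode: Mode) -> bool:
    """Only the 2N-bit multiply-accumulate mode exchanges partial products between the circuits."""
    return mode is Mode.MAC_2N


for m in Mode:
    print(m.name, MODE_TABLE[m], needs_exchange(m))
```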
For example, suppose the data processor receives two pieces of data to be processed whose sub-data are 2N bits wide: one piece (the multiplicands) contains sub-data a and b, the other (the multipliers) contains sub-data c and d, and the 2N-bit multiplications a by c and b by d are required. The first multiplication circuit then Booth-encodes c to obtain a group of target codes, forms the first sign-extended partial products from those target codes and a, and, since no exchange is needed in this mode, compresses them directly as the first target-code partial products. The second multiplication circuit likewise Booth-encodes d, forms the second sign-extended partial products from the target codes of d and b, and compresses them as the second target-code partial products.
Continuing the example, if the data processor is to perform a multiply-accumulate over two groups of 2N-bit by N-bit data, c (the multiplier) received by the first multiplication circuit may be non-zero 2N-bit data, while in a (the multiplicand) either the low N bits or the high N bits are all 0; likewise, d (the multiplier) received by the second multiplication circuit may be non-zero 2N-bit data, while in b (the multiplicand) either the low N bits or the high N bits are all 0. During the operation, the first multiplication circuit obtains the first sign-extended partial products from the target codes of c and the N non-zero bits of a, and the second multiplication circuit obtains the second sign-extended partial products from the target codes of d and the N non-zero bits of b. The two circuits then exchange the corresponding low-order or high-order sign-extended partial products, use the results as the target-code partial products, and compress them to obtain the multiply-accumulate result.
With the data processing method provided by this embodiment, the data processor determines from the received function selection mode signal whether the sign-extended partial products need to be exchanged; if so, it obtains the target-code partial products after the exchange and then compresses them.
In another embodiment, compressing the target-code partial products in S105 to obtain the target operation result includes:
S1051, accumulating the target-code partial products to obtain intermediate operation results.
Specifically, the data processor may accumulate the first target-code partial products through the first multiplication circuit to obtain one intermediate operation result, and accumulate the second target-code partial products through the second multiplication circuit to obtain another. Optionally, each intermediate operation result may include a Sum bit output signal Sum and a Carry output signal Carry of the same bit width, and either intermediate operation result may be zero or non-zero. For example, if only one group of 2N-bit data is to be multiplied, one of the two pieces of data to be processed received by the data processor is 0, so the target codes, sign-extended partial products and intermediate operation result derived from it may all be 0. When two groups of 2N-bit data are to be multiplied, the data processor receives two groups of non-zero data and obtains two groups of target codes, so both sets of sign-extended partial products and both intermediate operation results may be non-zero.
S1052, accumulating the intermediate operation result to obtain the target operation result.
Specifically, the data processor may use the accumulation circuit to accumulate each of the two intermediate operation results, obtaining a first target operation result and a second target operation result. The first target operation result is the operation result produced through the first multiplication circuit, and the second target operation result is the one produced through the second multiplication circuit.
In addition, the data processor may add the Carry output signal Carry and the Sum bit output signal Sum output by the modified Wallace tree group circuit with an adder in the accumulation circuit, and output the sum. Optionally, each Wallace tree sub-circuit (i.e., each low-order or high-order Wallace tree sub-circuit) in the modified Wallace tree group circuit may output one carry output signal $\mathrm{Carry}_i$ and one sum bit output signal $\mathrm{Sum}_i$, where $i = 0, \dots, 2N-1$ is the index of the Wallace tree sub-circuit, numbered from 0. Optionally, the Carry signal received by the adder is $\mathrm{Carry} = \{[\mathrm{Carry}_0 : \mathrm{Carry}_{2N-2}], 0\}$; that is, it is 2N bits wide, 2N-1 of its bits are the carry output signals of the first 2N-1 Wallace tree sub-circuits, and the last bit is filled with 0. Optionally, the Sum bit output signal Sum received by the adder is also 2N bits wide, and its bits are the sum bit output signals of the individual Wallace tree sub-circuits in the modified Wallace tree group circuit.
For example, if the data processor currently needs to process an 8-bit by 8-bit multiplication, the adder may be a 16-bit carry-lookahead adder. As shown in Fig. 7, each of the 16 Wallace tree sub-circuits of the modified Wallace tree group circuit outputs a sum bit output signal and a carry output signal. The 16-bit carry-lookahead adder receives the complete Sum output signal of the modified Wallace tree group circuit, together with a Carry signal formed by concatenating the carry output signals of all Wallace tree sub-circuits except the last one with a 0.
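A functional model of this arrangement is sketched below for a width of 2N (16 in the 8-bit example). Assumptions not taken from this section: each column sub-circuit is modelled as a chain of full adders whose internal carries pass to the next column, the last full adder of column i emits Sum_i (weight 2^i) and Carry_i (weight 2^(i+1)), and the appended 0 therefore occupies the least significant bit of the Carry word. The model checks arithmetic only and says nothing about the depth or timing of the real modified Wallace tree; the function names are hypothetical.

```python
def full_add(a: int, b: int, c: int) -> tuple[int, int]:
    """One-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def column_wallace(rows: list[int], width: int) -> tuple[list[int], list[int]]:
    """Reduce each column i of the partial-product rows to (Sum_i, Carry_i)."""
    sums, carries = [], []
    incoming: list[int] = []                     # internal carries arriving from column i-1
    for i in range(width):
        bits = [(r >> i) & 1 for r in rows] + incoming
        incoming = []
        while len(bits) > 3:                     # compress until one full adder remains
            s, c = full_add(bits.pop(), bits.pop(), bits.pop())
            bits.append(s)
            incoming.append(c)                   # weight 2^(i+1): goes to the next column
        bits += [0] * (3 - len(bits))
        s, c = full_add(*bits)                   # last full adder of the column
        sums.append(s)                           # Sum_i, weight 2^i
        carries.append(c)                        # Carry_i, weight 2^(i+1)
    return sums, carries


def final_add(sums: list[int], carries: list[int], width: int) -> int:
    """Width-bit adder fed with Sum and Carry = {Carry_0 .. Carry_(width-2), 0}."""
    sum_word = sum(s << i for i, s in enumerate(sums))
    carry_word = 0                               # bit 0 holds the appended 0
    for i, c in enumerate(carries[: width - 1]): # Carry_(width-1) is not used
        carry_word |= c << (i + 1)
    return (sum_word + carry_word) & ((1 << width) - 1)


rows = [0x00FF, 0x0F0F, 0x3333, 0x5555, 0xFFFF]
print(final_add(*column_wallace(rows, 16), 16))   # 39061 == sum(rows) % 2**16
```

Dropping the last sub-circuit's carry is safe here because its weight is 2^(2N), which lies outside the 2N-bit result.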
With the data processing method provided by this embodiment, the target-code partial products are accumulated to obtain intermediate operation results, and the accumulation circuit then accumulates the intermediate operation results to obtain the target operation result.
Fig. 8 is a flowchart of a data processing method provided in an embodiment. The method may be performed by the data processor shown in Fig. 2 and Fig. 5, and this embodiment concerns carrying out data operations in four different modes. As shown in Fig. 8, the method includes:
S201, receiving data to be processed and a function selection mode signal, where the function selection mode signal indicates the mode of data operation that the data processor is currently to process.
Specifically, the data processor may receive one piece of data to be processed through the Booth encoding circuit, and another piece through the first partial product acquisition circuit and the second partial product acquisition circuit; the Booth encoding circuit and the two partial product acquisition circuits may all receive the same function selection mode signal at the same time. Optionally, each piece of data to be processed may include two pieces of sub-data to be processed, which may be identical sub-data of the same bit width or different sub-data of different bit widths. The two pieces of sub-data in the first piece of data may be spliced and input to the Booth encoding circuit as a whole, or input to it separately and simultaneously; the two pieces of sub-data in the other piece of data may be spliced and input as a whole to the first and second partial product acquisition circuits simultaneously, or input to them separately and simultaneously. The sub-data to be processed may be fixed-point numbers with a bit width of 2N, in which case the spliced data is 4N bits wide.
It should be noted that there may be four function selection mode signals, corresponding respectively to the four data operations the data processor can perform: multiplication of N-bit by N-bit data, multiply-accumulate of N-bit by N-bit data, multiplication of 2N-bit by 2N-bit data, and multiply-accumulate of 2N-bit by N-bit data. In addition, when the data processor performs a multiplication or multiply-accumulate operation, one piece of sub-data to be processed may serve as the multiplier and the other as the multiplicand.
S202, Booth-encoding the data to be processed according to the function selection mode signal to obtain target codes.
Optionally, in S202, Booth-encoding the data to be processed according to the function selection mode signal to obtain the target codes includes: determining, from the function selection mode signal, the mode of data operation the data processor is currently to process; and Booth-encoding the data to be processed according to that mode to obtain the target codes.
Specifically, the data processor may determine from the received function selection mode signal the specific mode of data operation it can currently process; the two pieces of sub-data in the data received by the Booth encoding circuit may be the multipliers of the operation. Because the two pieces of sub-data are 2N bits wide, the data processor can determine, from this bit width and the current mode, whether the Booth encoding circuit must encode N-bit data or 2N-bit data, and thereby obtain two corresponding groups of target codes.
It should be noted that the Booth encoding rule for the target codes is given in Table 1 and in the embodiments describing the structure of the Booth encoding circuit 21 above. Optionally, if the Booth encoding circuit currently needs to process N-bit data, the number of target codes may be N/2; if it needs to process 2N-bit data, the number of target codes may be N. Each target code may be determined by three adjacent bits of the multiplier in the multiplication or multiply-accumulate operation.
S203, obtaining first target-code partial products and second target-code partial products from the target codes and the data to be processed.
Specifically, according to the actual operation requirement, the data processor may obtain the first and second target-code partial products from the corresponding target codes and the corresponding sub-data to be processed: the first target-code partial products through the first partial product acquisition circuit, and the second target-code partial products through the second partial product acquisition circuit.
S204, compressing the first target-code partial products to obtain a first target operation result.
Optionally, compressing the first target-code partial products in S204 to obtain the first target operation result includes: accumulating the first target-code partial products to obtain a first intermediate operation result; and accumulating the first intermediate operation result to obtain the first target operation result.
Specifically, the data processor may accumulate the first target-code partial products with the modified Wallace tree group circuit in the first compression circuit to obtain the first intermediate operation result, and then accumulate the first intermediate operation result with the accumulation circuit to obtain the first target operation result. Optionally, the first intermediate operation result may consist of the Sum bit output signal Sum and the Carry output signal Carry produced by the modified Wallace tree group circuit, which may have the same bit width; the accumulation circuit then adds Sum and Carry. Optionally, the first target operation result may be 0 or non-zero data.
It should be noted that the data processor may add the Carry output signal Carry and the Sum bit output signal Sum output by the modified Wallace tree group circuit with the adder in the accumulation circuit, and output the sum. Optionally, each Wallace tree sub-circuit in the modified Wallace tree group circuit may output one carry output signal $\mathrm{Carry}_i$ and one sum bit output signal $\mathrm{Sum}_i$, where $i = 0, \dots, 2N-1$ is the index of the Wallace tree sub-circuit, numbered from 0. Optionally, the Carry signal received by the adder is $\mathrm{Carry} = \{[\mathrm{Carry}_0 : \mathrm{Carry}_{2N-2}], 0\}$; that is, it is 2N bits wide, 2N-1 of its bits are the carry output signals of the first 2N-1 Wallace tree sub-circuits, and the last bit is filled with 0. Optionally, the Sum bit output signal Sum received by the adder is also 2N bits wide, and its bits are the sum bit output signals of the individual Wallace tree sub-circuits in the modified Wallace tree group circuit.
S205, compressing the second partial product of the target code to obtain a second target operation result.
Optionally, the step of compressing the second partial product of the target code to obtain a second target operation result in S205 includes: accumulating the second partial product of the target code to obtain a second intermediate operation result; and accumulating the second intermediate operation result to obtain the second target operation result.
Specifically, the data processor may perform an accumulation operation on the second partial product of the target code through a modified wallace tree group circuit in the second compression circuit to obtain a second intermediate operation result, and then perform an accumulation process on the second intermediate operation result through the accumulation circuit to obtain a second target operation result. Optionally, the second target operation result may be a value of 0, and may also be non-0 data.
In this embodiment, the data processor may execute step S204 and step S205 simultaneously, and the sequence of these two steps is not limited in this embodiment.
According to the data processing method provided by the embodiment, the data operation of the current processable corresponding mode can be determined according to the received function selection mode signal, so that multiplication operation can be realized, multiplication accumulation operation can be realized, and the universality of a data processor is improved; in addition, the multiplication operation can be completed without carrying out the accumulation operation on the multiplication operation result, and the multiplication operation or the multiplication operation can be directly realized only by one operation process, thereby effectively reducing the power consumption of the data processor.
As one embodiment, the step of obtaining the first partial product of the target code and the second partial product of the target code in S203 according to the target code and the data to be processed may include:
S2031, obtaining a first partial product after sign bit expansion and a second partial product after sign bit expansion according to the target code and the data to be processed.
Specifically, the data to be processed may be the multiplicand (i.e., X) in a multiplication or multiply-accumulate operation, and the target code may include a first target code and a second target code. Optionally, the data processor may obtain a first partial product after sign bit expansion according to the multiplicand and the first target code, and may obtain a second partial product after sign bit expansion according to the multiplicand and the second target code.
It should be noted that when a value in the first target code is -1, the original partial product may be -X; when it is 1, the original partial product may be X; and when it is 0, the original partial product may be 0. Correspondingly, the bit width of the first partial product after sign bit expansion may be twice the multiplicand bit width N (i.e., 2N bits): the low (N+1) bits of the first partial product after sign bit expansion may be the (N+1) bits of the original partial product, and each of the high (N-1) bits may equal the highest bit of the original partial product, i.e., its sign bit. Optionally, each value in the first target code yields one corresponding first partial product after sign bit expansion.
Similarly, the second partial product after sign bit expansion is obtained from the second target code and the multiplicand in the same manner as the first partial product after sign bit expansion, which is not described in detail in this embodiment.
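The sign bit expansion can be illustrated with a small sketch. It assumes the target code values are the radix-2 Booth digits -1, 0 and 1 and that X is an N-bit two's-complement number; the helper names are hypothetical. The original partial product is kept at N+1 bits because -X does not fit in N bits when X = -2^(N-1).

```python
def original_partial_product(code: int, x: int, n: int) -> int:
    """(N+1)-bit two's-complement pattern of code * X, with code in {-1, 0, 1}."""
    if code not in (-1, 0, 1):
        raise ValueError("target code value must be -1, 0 or 1")
    return (code * x) & ((1 << (n + 1)) - 1)


def sign_extend(pp: int, n: int) -> int:
    """2N-bit partial product: low N+1 bits = pp, high N-1 bits = copies of its sign bit."""
    sign = (pp >> n) & 1                    # highest (sign) bit of the original partial product
    high = ((1 << (n - 1)) - 1) << (n + 1) if sign else 0
    return high | pp
```

With N = 4, `sign_extend(original_partial_product(-1, 5, 4), 4)` returns `0b11111011`, the 8-bit two's-complement pattern of -5.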
S2032, performing shift processing on the first partial product after the sign bit expansion and the second partial product after the sign bit expansion to obtain a first partial product of the target code and a second partial product of the target code.
It should be noted that, in the arrangement of the first partial products of all target codes, the first partial product of each target code may equal the corresponding first partial product after sign bit expansion, or only part of its bit values. The first partial product of the first target code may equal the first of the first partial products after sign bit expansion. Starting from the second target code, the lowest bit of each target code's first partial product may lie in the same column as the second-lowest bit of the previous target code's first partial product; that is, each first partial product after sign bit expansion is shifted one column to the left relative to the previous one. The highest bit of each target code's first partial product may lie in the same column as the highest bit of the first target code's first partial product, so bit values shifted beyond that column are not placed in the array and are not accumulated. Optionally, the number of columns occupied by the first partial products of all target codes may equal twice the data bit width currently processed by the data processor.
Similarly, the second partial product of the target code may be obtained from the second partial product after sign bit expansion in the same manner as the first partial product of the target code, which is not described in detail in this embodiment.
The data processing method provided by this embodiment can perform both multiplication and multiply-accumulate operations, which improves the versatility of the data processor. In addition, the multiply-accumulate operation can be completed without a separate accumulation step on the multiplication result: either a multiplication or a multiply-accumulate operation is carried out directly in a single operation pass, which effectively reduces the power consumption of the data processor.
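The column arrangement above (each sign-extended partial product shifted one column left of the previous one, and every row truncated at column 2N-1) can be checked with the following sketch. It assumes radix-2 Booth target codes and N-bit two's-complement operands, and replaces the Wallace tree and adder with a plain integer sum; the function names are hypothetical.

```python
from typing import List


def booth_radix2_digits(y: int, n: int) -> List[int]:
    """Radix-2 Booth target code values d_i = y_{i-1} - y_i, with y_{-1} = 0."""
    digits, prev = [], 0
    for i in range(n):
        bit = (y >> i) & 1
        digits.append(prev - bit)
        prev = bit
    return digits


def aligned_partial_products(x: int, y: int, n: int) -> List[int]:
    """One 2N-column row per target code value, laid out as described above."""
    mask = (1 << (2 * n)) - 1
    rows = []
    for i, d in enumerate(booth_radix2_digits(y, n)):
        pp = (d * x) & ((1 << (n + 1)) - 1)              # original (N+1)-bit partial product
        sign = (pp >> n) & 1
        high = ((1 << (n - 1)) - 1) << (n + 1) if sign else 0
        extended = high | pp                             # 2N-bit sign-extended partial product
        rows.append((extended << i) & mask)              # shift i columns, drop bits past column 2N-1
    return rows


def multiply(x: int, y: int, n: int) -> int:
    """Signed product recovered from the truncated 2N-column array."""
    width = 2 * n
    total = sum(aligned_partial_products(x, y, n)) & ((1 << width) - 1)
    return total - (1 << width) if total >> (width - 1) else total
```

Dropping the bits shifted past column 2N-1 is harmless because the product of two N-bit signed numbers always fits in 2N signed bits, so all arithmetic can be done modulo 2^(2N).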
The embodiment of the application also provides a machine learning operation device, which includes one or more data processors used for acquiring data to be operated on and control information from other processing devices, executing specified machine learning operations, and transmitting the execution results to peripheral devices through an I/O interface. Peripheral devices include, for example, cameras, displays, mice, keyboards, network cards, Wi-Fi interfaces and servers. When more than one data processor is included, the data processors may be linked and transfer data through a specific structure, for example interconnecting and transferring data via a PCIE bus, to support larger-scale machine learning operations. In this case, they may share the same control system or have independent control systems, and may share memory or each accelerator may have its own memory. In addition, the interconnection may use any interconnection topology.
The machine learning operation device has good compatibility and can be connected to various types of servers through a PCIE interface.
The embodiment of the application also provides a combined processing device which comprises the machine learning operation device, a general interconnection interface and other processing devices. The machine learning operation device interacts with other processing devices to jointly complete the operation designated by the user. Fig. 9 is a schematic diagram of a combination processing apparatus.
The other processing devices include one or more general-purpose or special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs) and neural network processors; the number of processors they include is not limited. The other processing devices serve as the interface between the machine learning operation device and external data and control, performing data transfer and basic control such as starting and stopping the machine learning operation device; they may also cooperate with the machine learning operation device to complete computing tasks.
The universal interconnect interface is used to transfer data and control instructions between the machine learning operation device and the other processing devices. The machine learning operation device obtains the required input data from the other processing devices and writes it into the on-chip storage device of the machine learning operation device; it can obtain control instructions from the other processing devices and write them into the on-chip control cache; and it can read the data in its storage module and transmit it to the other processing devices.
Optionally, as shown in Fig. 10, the structure may further include a storage device connected to both the machine learning operation device and the other processing devices. The storage device stores data of the machine learning operation device and the other processing devices, and is particularly suitable for data that does not fit in the internal storage of the machine learning operation device or the other processing devices.
The combined processing device can serve as the SoC (system on chip) of equipment such as mobile phones, robots, unmanned aerial vehicles and video surveillance equipment, effectively reducing the core area of the control part, increasing processing speed and lowering overall power consumption. In this case, the universal interconnect interface of the combined processing device is connected to certain components of the equipment, such as cameras, displays, mice, keyboards, network cards and Wi-Fi interfaces.
In some embodiments, a chip is also disclosed, which includes the machine learning computing device or the combination processing device.
In some embodiments, a chip package structure is disclosed, which includes the chip.
In some embodiments, a board card is provided that includes the chip package structure described above. As shown in Fig. 11, in addition to the chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, a receiving device 391 and a control device 392.
The storage device 390 is connected to the chip in the chip package structure through a bus and is used to store data. The storage device may include multiple groups of storage units 393, each group being connected to the chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
DDR doubles the data rate of SDRAM without increasing the clock frequency, because it transfers data on both the rising and falling edges of the clock pulse. In one embodiment, the storage device may include 4 groups of storage units, and each group may include a plurality of DDR4 chips. In one embodiment, the chip may internally include four 72-bit DDR4 controllers, in each of which 64 bits are used for data transfer and 8 bits for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data transfer bandwidth can reach 25600 MB/s.
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice per clock cycle. A controller for the DDR is arranged in the chip to control data transfer to and data storage in each storage unit.
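The 25600 MB/s figure follows from simple arithmetic. The sketch below assumes that DDR4-3200 means 3200 mega-transfers per second on a 1600 MHz I/O clock (two transfers per cycle), that only the 64 data bits of each 72-bit controller count, and that MB means 10^6 bytes; the helper name is hypothetical.

```python
def ddr_peak_bandwidth_mb_s(io_clock_mhz: float, data_bits: int) -> float:
    """Theoretical peak bandwidth in MB/s of one DDR data path."""
    mega_transfers_per_s = 2 * io_clock_mhz      # data on rising and falling edge
    return mega_transfers_per_s * data_bits / 8  # bits -> bytes; 10^6 bytes/s == MB/s
```

`ddr_peak_bandwidth_mb_s(1600, 64)` returns `25600.0`; the 8 ECC bits carry check data only and are therefore excluded from the data width.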
The receiving device is electrically connected to the chip in the chip package structure and is used to transfer data between the chip and an external device (such as a server or a computer). For example, in one embodiment, the receiving device may be a standard PCIE interface, through which the data to be processed is transferred from the server to the chip. Preferably, when a PCIE 3.0 x16 interface is used, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the receiving device may be another interface; this application does not limit its specific form, as long as the interface unit can implement the relay function. In addition, the computation results of the chip are transmitted back to the external device (e.g., a server) by the receiving device.
The control device is electrically connected to the chip and is used to monitor the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a microcontroller unit (MCU). The chip may include multiple processing chips, multiple processing cores or multiple processing circuits and can drive multiple loads, so it can operate in different working states such as heavy load and light load. The control device can regulate the working states of the multiple processing chips, processing cores and/or processing circuits in the chip.
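The 16000 MB/s figure corresponds to the raw per-direction rate of a 16-lane PCIe 3.0 link, which runs at 8 GT/s per lane; the 128b/130b line encoding lowers the usable figure slightly. A minimal sketch of that arithmetic, with a hypothetical helper name and ignoring packet and protocol overhead:

```python
def pcie3_bandwidth_mb_s(lanes: int, include_encoding: bool = False) -> float:
    """Per-direction PCIe 3.0 bandwidth in MB/s (8 GT/s per lane, 1 bit per transfer)."""
    raw = 8000 * lanes / 8                     # 8000 Mbit/s per lane -> MB/s
    return raw * 128 / 130 if include_encoding else raw
```

`pcie3_bandwidth_mb_s(16)` returns `16000.0`, while `pcie3_bandwidth_mb_s(16, include_encoding=True)` is about 15754 MB/s.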
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device may be a data processing device, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a mobile phone, a dashboard camera, a navigator, a sensor, a webcam, a server, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicles include aircraft, ships and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves and range hoods; the medical devices include nuclear magnetic resonance instruments, B-mode ultrasound scanners and/or electrocardiographs.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of circuit combinations, but those skilled in the art should appreciate that the present application is not limited by the circuit combinations described, as some circuits may be implemented in other manners or structures according to the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the devices and modules referred to are not necessarily required in the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims (28)

1. A data processor, the data processor comprising: a first multiplication circuit, a second multiplication circuit and a partial product exchange circuit, wherein the first multiplication circuit comprises a first encoding branch, a first selection branch and a first compression branch, and the second multiplication circuit comprises a second encoding branch, a second selection branch and a second compression branch; the output end of the first multiplication circuit is connected to the first input end of the partial product exchange circuit, the first output end of the partial product exchange circuit is connected to the input end of the second multiplication circuit, the output end of the second multiplication circuit is connected to the second input end of the partial product exchange circuit, and the second output end of the partial product exchange circuit is connected to the input end of the first multiplication circuit;
The first encoding branch is used for encoding received first data to obtain a first partial product after sign bit expansion; the first selection branch is used for selecting a first partial product of the target code from the first partial product after sign bit expansion; the first compression branch is used for compressing the first partial product of the target code to obtain a first target operation result; the second encoding branch is used for encoding received second data to obtain a second partial product after sign bit expansion; the second selection branch is used for selecting a second partial product of the target code from the second partial product after sign bit expansion; the second compression branch is used for compressing the second partial product of the target code to obtain a second target operation result; the partial product exchange circuit is used for exchanging the first partial product after sign bit expansion and the second partial product after sign bit expansion when the corresponding mode that the data processor can currently process is the multiply-accumulate operation mode; and the partial product exchange circuit is in an idle (suspended) state when the corresponding mode that the data processor can currently process is another operation mode.
2. The data processor of claim 1, wherein the first multiplication circuit and the second multiplication circuit each include a first input end for receiving a function selection mode signal; the partial product exchange circuit includes a third input end for receiving the function selection mode signal; and the function selection mode signal is used to determine the data operation of the corresponding mode that the data processor can currently process.
3. A data processor according to claim 1 or 2, wherein the partial product exchange circuit comprises: a function selection mode signal input port, a first partial product input port, a first partial product output port, a second partial product input port and a second partial product output port; the function selection mode signal input port is used for receiving the function selection mode signal, the first partial product input port is used for receiving a first partial product after sign bit expansion, input by the first multiplication circuit, that needs to be exchanged, the first partial product output port is used for outputting the first partial product after sign bit expansion, the second partial product input port is used for receiving a second partial product after sign bit expansion, input by the second multiplication circuit, that needs to be exchanged, and the second partial product output port is used for outputting the second partial product after sign bit expansion.
4. The data processor of claim 1 or 2, wherein the first multiplication circuit comprises: a first modified encoding sub-circuit, a first partial product selection sub-circuit and a first modified compression sub-circuit; the output end of the first modified encoding sub-circuit is connected to the first input end of the first partial product selection sub-circuit, the second input end of the first partial product selection sub-circuit is connected to the second output end of the partial product exchange circuit, and the output end of the first partial product selection sub-circuit is connected to the first input end of the first modified compression sub-circuit;
the first modified encoding sub-circuit is used for performing Booth encoding on the received first data to obtain the first partial product after sign bit expansion; the first partial product selection sub-circuit is used for receiving the second partial product after sign bit expansion output by the partial product exchange circuit, selecting among the first partial products after sign bit expansion, and inputting the received second partial product after sign bit expansion and the selected first partial product after sign bit expansion to the first modified compression sub-circuit as the first partial product of the target code; and the first modified compression sub-circuit is used for accumulating the first partial product of the target code.
5. The data processor of claim 4, wherein the first modified encoding sub-circuit comprises: a low-order Booth encoding unit, a low-order partial product acquisition unit, a selector, a high-order Booth encoding unit, a high-order partial product acquisition unit, a low-order selector group unit and a high-order selector group unit; the first output end of the low-order Booth encoding unit is connected to the input end of the selector, the second output end of the low-order Booth encoding unit is connected to the first input end of the low-order partial product acquisition unit, the output end of the selector is connected to the first input end of the high-order Booth encoding unit, the output end of the high-order Booth encoding unit is connected to the first input end of the high-order partial product acquisition unit, the output end of the low-order selector group unit is connected to the second input end of the low-order partial product acquisition unit, and the output end of the high-order selector group unit is connected to the second input end of the high-order partial product acquisition unit;
the low-order Booth encoding unit is used for performing Booth encoding on the low-order data in the received first data to obtain a first low-order target code; the low-order partial product acquisition unit is used for obtaining a first low-order partial product after sign bit expansion according to the first low-order target code; the selector is used for gating the padding bit value used when the high-order data in the first data is Booth-encoded; the high-order Booth encoding unit is used for performing Booth encoding on the high-order data in the received first data together with the padding bit value to obtain a first high-order target code; the high-order partial product acquisition unit is used for obtaining a first high-order partial product after sign bit expansion according to the first high-order target code; the low-order selector group unit is used for gating values in the first low-order partial product after sign bit expansion; and the high-order selector group unit is used for gating values in the first high-order partial product after sign bit expansion.
6. The data processor of claim 5, wherein the low-order Booth encoding unit comprises: a low-order data input port and a low-order target code output port; the low-order data input port is used for receiving the low-order data in the first data to be Booth-encoded, and the low-order target code output port is used for outputting the first low-order target code obtained by Booth-encoding the low-order data in the first data.
7. The data processor of claim 5, wherein the high-order Booth encoding unit comprises: a high-order data input port and a high-order target code output port; the high-order data input port is used for receiving the high-order data in the first data to be Booth-encoded, and the high-order target code output port is used for outputting the first high-order target code obtained by Booth-encoding the high-order data in the first data.
8. The data processor of claim 5, wherein the low-order partial product acquisition unit comprises: a low-order target code input port, a gating value input port, a data input port and a low-order partial product output port; the low-order target code input port is used for receiving the low-order target code output by the low-order Booth encoding unit, the gating value input port is used for receiving the values of the low-order partial product gated by the low-order selector group unit, the data input port is used for receiving the second data, and the low-order partial product output port is used for outputting the low-order partial product after sign bit expansion.
9. The data processor of claim 5, wherein the high-order partial product acquisition unit comprises: a high-order target coding input port, a gating value input port, a data input port and a high-order partial product output port; the high-order target coding input port is used for receiving the high-order target codes output by the high-order booth coding unit, the gating value input port is used for receiving the numerical value in the high-order partial product after the sign bit expansion output after the gating of the high-order selector group unit, the data input port is used for receiving the second data, and the high-order partial product output port is used for outputting the high-order partial product after the sign bit expansion.
10. The data processor of claim 5, wherein the selector comprises: a function selection mode signal input port, a first gating value input port, a second gating value input port and a gating result output port; the function selection mode signal input port is used for receiving the function selection mode signals corresponding to data operation of different modes, the first gating value input port is used for receiving a first gating value, the second gating value input port is used for receiving a second gating value, and the gating result output port is used for outputting the first gating value or the second gating value after gating.
11. The data processor of claim 5, wherein the low-order selector group unit comprises a low-order selector used for gating values in the low-order partial product after sign bit expansion.
12. The data processor of claim 5, wherein the high-order selector group unit comprises a high-order selector used for gating values in the high-order partial product after sign bit expansion.
13. The data processor of claim 4, wherein the first partial product selection sub-circuit comprises: a function selection mode signal input port, a first partial product input port, a second partial product input port, a first partial product output port and a gated partial product output port; the function selection mode signal input port is used for receiving the function selection mode signal, the first partial product input port is used for receiving the first partial product after sign bit expansion input by the first modified encoding sub-circuit, the second partial product input port is used for receiving the second partial product after sign bit expansion exchanged by the partial product exchange circuit, the first partial product output port is used for outputting the first partial product after sign bit expansion that needs to be exchanged by the partial product exchange circuit, and the gated partial product output port is used for outputting the gated first partial product after sign bit expansion together with the second partial product after sign bit expansion.
14. A data processor according to claim 1 or 2, wherein the second multiplication circuit comprises: a second modified encoding sub-circuit, a second partial product selection sub-circuit and a second modified compression sub-circuit; the output end of the second modified encoding sub-circuit is connected to the first input end of the second partial product selection sub-circuit, the second input end of the second partial product selection sub-circuit is connected to the first output end of the partial product exchange circuit, and the output end of the second partial product selection sub-circuit is connected to the first input end of the second modified compression sub-circuit;
the second modified encoding sub-circuit is used for performing Booth encoding on the received second data to obtain the second partial product after sign bit expansion; the second partial product selection sub-circuit is used for receiving the first partial product after sign bit expansion output by the partial product exchange circuit, selecting among the second partial products after sign bit expansion, and inputting the received first partial product after sign bit expansion and the selected second partial product after sign bit expansion to the second modified compression sub-circuit as the second partial product of the target code; and the second modified compression sub-circuit is used for accumulating the second partial product of the target code.
15. The data processor of claim 4, wherein the first modified compression sub-circuit comprises a modified Wallace tree group circuit and an accumulation circuit, the output end of the modified Wallace tree group circuit being connected to the input end of the accumulation circuit; the modified Wallace tree group circuit is used for accumulating each column value in the first low-order partial product of the target code and the first high-order partial product of the target code obtained during data operations of different modes to obtain an accumulation operation result, and the accumulation circuit is used for performing an addition operation on the accumulation operation result.
16. The data processor of claim 15, wherein the modified Wallace tree group circuit comprises: a low-order Wallace tree sub-circuit, a selector and a high-order Wallace tree sub-circuit, wherein the output end of the low-order Wallace tree sub-circuit is connected to the input end of the selector, and the output end of the selector is connected to the input end of the high-order Wallace tree sub-circuit; the low-order Wallace tree sub-circuit is used for accumulating each column value in the first partial product of the target code to obtain an accumulation operation result, the selector is used for gating the carry input signal received by the high-order Wallace tree sub-circuit, and the high-order Wallace tree sub-circuit is used for accumulating each column value in the first partial product of the target code to obtain the accumulation operation result.
17. The data processor of claim 15, wherein the accumulation circuit comprises an adder used for performing an addition operation on the accumulation operation result.
18. A method of data processing, the method comprising:
receiving data to be processed and a function selection mode signal, wherein the function selection mode signal is used for indicating data operation of a corresponding mode which can be processed currently by a data processor;
according to the function selection mode signal, carrying out coding processing on the data to be processed to obtain a target code; the target code comprises a first target code and a second target code;
obtaining a first partial product of sign bit expansion through the first target code and the data to be processed; obtaining a second partial product of the sign bit expansion through the second target code and the data to be processed;
determining the data operation of the corresponding mode which can be processed by the data processor currently according to the function selection mode signal;
according to the data operation of the corresponding mode, determining whether exchange processing is needed between the first partial product after the sign bit expansion and the second partial product after the sign bit expansion, and acquiring a partial product of target coding according to a determination result;
compressing the partial product of the target code to obtain a target operation result.
19. The method of claim 18, wherein the encoding the data to be processed according to the function selection mode signal to obtain a target code comprises: determining the data operation of a specific mode that the data processor can currently process according to the function selection mode signal; and performing Booth encoding on the data to be processed according to the data operation of the specific mode to obtain the target code.
20. The method according to claim 18 or 19, wherein the obtaining the partial product of the target code according to the determination result includes:
and if the determination result is that the exchange processing is not needed, taking the first partial product after the sign bit expansion and the second partial product after the sign bit expansion as the partial products of the target coding.
21. The method of claim 20, further comprising: if the determination result is that the exchange is needed, exchanging the sign-extended first partial product and the sign-extended second partial product.
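Claims 20 and 21 amount to a conditional swap of the two sign-extended partial products before compression. The sketch below models that decision with a hypothetical lookup table from mode name to an exchange flag. The mode names, the table contents, and the idea that the pair is routed to low-order and high-order compressor inputs are all assumptions for illustration, since the claims do not say which modes require the exchange.

```python
from typing import Sequence, Tuple

# Hypothetical mode table: does this mode require exchanging the two partial products?
EXCHANGE_REQUIRED = {
    "int16": False,
    "int8x2": False,
    "int8x2_swapped": True,  # illustrative only; not a mode named in the patent
}


def target_code_partial_products(
    first: Sequence[int], second: Sequence[int], mode: str
) -> Tuple[Sequence[int], Sequence[int]]:
    """Return (low_order_input, high_order_input) for the compressor.

    No exchange (claim 20): the sign-extended partial products are used as-is.
    Exchange (claim 21): the two sign-extended partial products are swapped.
    """
    try:
        needs_exchange = EXCHANGE_REQUIRED[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    return (second, first) if needs_exchange else (first, second)
```

For example, `target_code_partial_products([1, 2], [3, 4], "int8x2_swapped")` returns `([3, 4], [1, 2])`. In a circuit, a swap like this would typically be a row of 2:1 multiplexers driven by the decoded mode signal.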
22. A machine learning computing device, characterized in that it comprises one or more data processors according to any one of claims 1-17 and is configured to obtain input data to be computed and control information from other processing devices, perform specified machine learning operations, and transmit the execution results to the other processing devices through an I/O interface;
when the machine learning computing device comprises a plurality of data processors, the data processors are connected through a preset specific structure and transmit data to one another;
the data processors are interconnected through a PCIE bus to transmit data, so as to support larger-scale machine learning operations; the plurality of data processors share the same control system or have their own control systems, and share a memory or have their own memories; and the data processors can be interconnected in any interconnection topology.
23. A combination processing device, comprising the machine learning computing device of claim 22, a universal interconnect interface, and other processing devices;
the machine learning computing device interacts with the other processing devices to jointly complete a computing operation specified by the user.
24. The combination processing device of claim 23, further comprising a storage device connected to the machine learning computing device and to the other processing devices, respectively, for storing data of the machine learning computing device and the other processing devices.
25. A neural network chip, characterized in that it comprises the machine learning computing device of claim 22, the combination processing device of claim 23, or the combination processing device of claim 24.
26. An electronic device comprising the neural network chip of claim 25.
27. A board card, characterized in that the board card comprises: a storage device, a receiving device, a control device, and the neural network chip of claim 25;
the neural network chip is connected to the storage device, the control device, and the receiving device, respectively;
the storage device is used for storing data;
the receiving device is used for realizing data transmission between the neural network chip and external equipment;
the control device is used for monitoring the state of the neural network chip.
28. The board card of claim 27, wherein:
the storage device comprises multiple groups of storage units, each group of storage units being connected to the neural network chip through a bus, and the storage units being DDR SDRAM;
the neural network chip comprises a DDR controller for controlling data transmission to and data storage in each storage unit; and
the receiving device is a standard PCIE interface.
CN201910902845.2A 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment Active CN110688087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910902845.2A CN110688087B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Publications (2)

Publication Number Publication Date
CN110688087A (en) 2020-01-14
CN110688087B (en) 2024-03-19

Family

ID=69109972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910902845.2A Active CN110688087B (en) 2019-09-24 2019-09-24 Data processor, method, chip and electronic equipment

Country Status (1)

Country Link
CN (1) CN110688087B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113222132B (en) * 2021-05-22 2023-04-18 Shanghai Zhenliang Intelligent Technology Co., Ltd. Multiplier, data processing method, chip, computer device and storage medium

Citations (9)

Publication number Priority date Publication date Assignee Title
CN1550975A (en) * 2003-05-09 2004-12-01 Samsung Electronics Co., Ltd. Montgomery modular multiplier and method thereof
CN1729628A (en) * 2002-10-11 2006-02-01 The MITRE Corporation System for direct acquisition of received signals
CN101010665A (en) * 2004-08-26 2007-08-01 Matsushita Electric Industrial Co., Ltd. Multiplying device
CN101382882A (en) * 2008-09-28 2009-03-11 Ningbo University Booth encoder based on CTGAL and adiabatic two's-complement multiplier-accumulator
CN101384991A (en) * 2006-02-15 2009-03-11 Matsushita Electric Industrial Co., Ltd. Multiplier, digital filter, signal processing device, synthesis device, synthesis program, and synthesis program recording medium
CN101685385A (en) * 2008-09-28 2010-03-31 Peking University Shenzhen Graduate School Complex multiplier
US9519460B1 (en) * 2014-09-25 2016-12-13 Cadence Design Systems, Inc. Universal single instruction multiple data multiplier and wide accumulator unit
CN106897046A (en) * 2017-01-24 2017-06-27 Qingdao Langsi Information Technology Co., Ltd. Fixed-point multiply-accumulator
CN210006030U (en) * 2019-09-24 2020-01-31 Shanghai Cambricon Information Technology Co Ltd Data processor

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US7797366B2 (en) * 2006-02-15 2010-09-14 Qualcomm Incorporated Power-efficient sign extension for booth multiplication methods and systems
KR20130111721A (en) * 2012-04-02 2013-10-11 Samsung Electronics Co., Ltd. Method of generating booth code, computer system and computer readable medium, and digital signal processor
US10817587B2 (en) * 2017-02-28 2020-10-27 Texas Instruments Incorporated Reconfigurable matrix multiplier system and method

Non-Patent Citations (2)

Title
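Design of an 8-bit multiplier-adder embedded in a microprocessor; Han Guize; Hu Yueli; Xiang Huifang; Computer Measurement & Control (No. 05); full text *
Design of a speed-optimized multifunctional multiply-accumulator; Zhang Xiaoxiao; Chen Jie; Han Liang; Lin Chuan; Science Technology and Engineering (No. 13); full text *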

Also Published As

Publication number Publication date
CN110688087A (en) 2020-01-14

Similar Documents

Publication Publication Date Title
CN111008003B (en) Data processor, method, chip and electronic equipment
CN110515589B (en) Multiplier, data processing method, chip and electronic equipment
CN110362293B (en) Multiplier, data processing method, chip and electronic equipment
CN110554854B (en) Data processor, method, chip and electronic equipment
CN110673823B (en) Multiplier, data processing method and chip
CN110515587B (en) Multiplier, data processing method, chip and electronic equipment
CN111258544B (en) Multiplier, data processing method, chip and electronic equipment
CN110688087B (en) Data processor, method, chip and electronic equipment
CN111258633B (en) Multiplier, data processing method, chip and electronic equipment
CN111258541B (en) Multiplier, data processing method, chip and electronic equipment
CN113031912A (en) Multiplier, data processing method, device and chip
CN210109789U (en) Data processor
CN110647307B (en) Data processor, method, chip and electronic equipment
CN210006030U (en) Data processor
CN210006029U (en) Data processor
CN110515588B (en) Multiplier, data processing method, chip and electronic equipment
CN110515586B (en) Multiplier, data processing method, chip and electronic equipment
CN111258545B (en) Multiplier, data processing method, chip and electronic equipment
CN210006031U (en) Multiplier and method for generating a digital signal
CN209879493U (en) Multiplier and method for generating a digital signal
CN113033788B (en) Data processor, method, device and chip
WO2020108486A1 (en) Data processing apparatus and method, chip, and electronic device
CN113031915A (en) Multiplier, data processing method, device and chip
CN113031909B (en) Data processor, method, device and chip
CN113033799B (en) Data processor, method, device and chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant