WO2019023910A1 - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
WO2019023910A1
Authority
WO
WIPO (PCT)
Prior art keywords
bit
data processing
instruction
multipliers
adder
Prior art date
Application number
PCT/CN2017/095334
Other languages
English (en)
French (fr)
Inventor
仇晓颖
韩彬
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2017/095334 (WO2019023910A1)
Priority to CN201780004422.8A (CN108475188A)
Publication of WO2019023910A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting

Definitions

  • the present disclosure relates to the field of data processing technology, and more particularly, the present disclosure relates to a data processing method and apparatus.
  • the multiplier is a key component of high-performance digital signal processing (DSP) and is the core of real-time high-speed signal processing.
  • the multiply-and-accumulate (MAC) operation is a basic operation of many DSP applications, such as the fast Fourier transform (FFT), convolution, and filtering.
  • for DSP applications, the MAC unit is an important factor in the critical-path delay, and is therefore key to the performance of DSP applications. A low-latency, high-throughput MAC unit is thus critical to a high-performance DSP. On the other hand, different DSP applications require multiplications of different bit widths. How to implement high-bit-width multipliers from low-bit-width multipliers, and thereby reuse resources, is therefore also an important part of DSP architecture design.
  • the present disclosure provides a MAC multi-mode work processing unit with resource multiplexing capability, which is capable of selecting different modes of multiply and accumulate operations according to different instructions. Specifically, the processing unit generates a partial product in parallel using a multiplier array, and then implements multiplication splicing of a low bit width to a high bit width by shifting and adding operations.
  • a data processing circuit including a calculation unit, an input unit, and an output unit.
  • the calculation unit includes an adder and a plurality of N-bit multipliers.
  • the input unit is configured to provide input to the multiplier.
  • the output unit is configured to output a calculation result of the calculation unit.
  • the data processing circuit further includes a configuration unit configured to configure at least one of the plurality of N-bit multipliers such that the configured N-bit multiplier can perform operations based on control information.
  • the input unit is configured to generate an input of the multiplier based on the control information.
  • the output unit is configured to output a calculation result of the calculation unit based on the control information.
  • control information indicates at least one of the following modes of operation: an N-bit multiplication or a 2N-bit multiplication.
  • the computational unit in the data processing circuit includes two N-bit multipliers and one 2N-bit adder.
  • the computational unit in the data processing circuit includes four N-bit multipliers and one N-bit adder and three 2N-bit adders.
  • a data processing system comprising a data processing circuit according to the above and an instruction decoding unit, the instruction decoding unit being configured to obtain an instruction and decode the instruction to obtain control information for the data processing circuit.
  • the instruction decoding unit is further configured to not decode the current instruction if a new instruction conflicts with the current instruction.
  • if the new instruction and the current instruction would, after decoding, input data to any one of the multipliers multiple times within a set threshold time, or if the required multipliers exceed the idle multipliers in the data processing system, it is determined that the new instruction conflicts with the current instruction.
  • control information indicates at least one of the following modes of operation: an N-bit multiplication or a 2N-bit multiplication.
  • a method for processing data, comprising: providing inputs to a plurality of N-bit multipliers; performing a calculation using adders and the plurality of N-bit multipliers; and outputting a final calculation result.
  • the method further includes configuring at least one of the plurality of N-bit multipliers such that the configured N-bit multiplier is capable of performing operations in accordance with the control information.
  • control information indicates at least one of the following modes of operation: an N-bit multiplication or a 2N-bit multiplication.
  • the method further includes obtaining an instruction and decoding the instruction to obtain control information.
  • if a new instruction conflicts with the current instruction, the current instruction is not decoded.
  • if the new instruction and the current instruction would, after decoding, input data to any one of the multipliers multiple times within a set threshold time, or if the required multipliers exceed the idle multipliers in the data processing system, it is determined that the new instruction conflicts with the current instruction.
  • in accordance with another aspect of the present disclosure, a processor is provided that includes a data processing system in accordance with the above.
  • the technical solution of the present disclosure can achieve high-bit-width multiplication by shifting and splicing low-bit-width multiplications, which minimizes the bit width of the adders, reduces the use of hardware resources, and simplifies the circuit structure.
  • FIG. 1 is a block diagram showing a data processing circuit in accordance with one embodiment of the present disclosure
  • FIG. 2 is a block diagram showing the details of the data processing circuit of Figure 1;
  • FIG. 3 is a block diagram showing details of the data processing circuit of Figure 1;
  • FIG. 4 is a block diagram showing a data processing system in accordance with one embodiment of the present disclosure.
  • FIG. 5 is a block diagram showing details of the data processing system of Figure 4.
  • FIG. 6 is a block diagram showing details of the data processing system of Figure 4.
  • FIG. 7 is a flowchart illustrating a data processing method according to an embodiment of the present disclosure.
  • the present disclosure provides a MAC processing circuit with resource multiplexing capability, which can reduce the amount of hardware resources by splicing multiple lower-bit-width multipliers into higher-bit-width multipliers and by using lower-bit-width adders. Furthermore, the MAC processing circuit proposed by the present disclosure is capable of selecting different modes of MAC operation according to different instructions. Thus, through resource reuse, the amount of hardware resources is reduced and the structure of the circuit is simplified.
  • FIG. 1 is a block diagram showing a data processing circuit in accordance with one embodiment of the present disclosure.
  • the data processing circuit 100 includes an input unit 110, a calculation unit 120, and an output unit 130.
  • data processing circuit 100 may also include configuration unit 140 (shown in phantom in Figure 1). Next, the operation of the respective components of the data processing circuit 100 will be described in detail.
  • the calculation unit 120 may include adders and a plurality of N-bit multipliers, where N = 2^n and n is a natural number greater than zero.
  • an N-bit multiplier can be used to perform multiplication of N bits * N bits.
  • the calculation unit 120 may include a plurality of multipliers having different values of N.
  • for example, the calculation unit 120 may include one or more 2^(n-1)-bit multipliers together with one or more 2^n-bit multipliers, or one or more 2^(n+1)-bit multipliers, and so on.
  • for example, four N-bit multipliers can first be used to implement a 2N*2N multiplication in accordance with the teachings of the present disclosure, and two 2N*2N multipliers thus formed are then used to implement a 4N*2N multiplication.
  • similarly, a 2N*2N multiplication can first be implemented using four N-bit multipliers, and four 2N*2N multipliers thus formed are then used to implement a 4N*4N multiplication.
  • the same principle can be applied to higher-order multiplication implementations.
  • Input unit 110 is configured to provide input to a plurality of N-bit multipliers in computing unit 120. Further, the input unit 110 can receive input data and extract a plurality of operands from the input data as inputs to a plurality of N-bit multipliers. Details of this operation will be described in detail below with reference to the accompanying drawings and examples.
  • the output unit 130 is configured to output a calculation result of the calculation unit 120. Further, the output unit 130 may select appropriate data from the calculation results of the calculation unit 120 to output. Again, the details of this operation will be described in detail below in conjunction with the drawings and examples.
  • the technical solution of the present disclosure can achieve high-bit-width multiplication by shifting and splicing low-bit-width multiplications, which minimizes the bit width of the adders, reduces the use of hardware resources, and simplifies the circuit structure.
  • data processing circuit 100 can also optionally include configuration unit 140.
  • the configuration unit 140 is configured to configure the input unit 110 and the output unit 130 such that the data processing circuit 100 can perform operations in accordance with control information.
  • control information may indicate, for example, at least one of the following operation modes: N-bit multiplication (N-bit * N-bit multiplication) or 2N-bit multiplication (including N-bit * 2N-bit multiplication and 2N-bit * 2N-bit multiplication).
  • for example, assuming that the computing unit 120 includes four N-bit multipliers, the configuration unit 140 can, by configuring the input unit 110 and the output unit 130, use the four N-bit multipliers in the computing unit 120 to perform one 2N-bit * 2N-bit multiplication, or two 2N-bit * N-bit multiplications, or one 2N-bit * N-bit multiplication together with one or two N-bit * N-bit multiplications.
  • the same principle applies to the case where the computing unit 120 includes a greater number of multipliers.
  • the configuration unit 140 can configure the input unit 110 such that the input unit 110 generates the inputs of at least a portion of the plurality of N-bit multipliers based on the control information. For example, assume that computing unit 120 includes four N-bit multipliers. If one 2N-bit * 2N-bit multiplication or two 2N-bit * N-bit multiplications are to be performed, the configuration unit 140 configures the input unit 110 to generate inputs for all four N-bit multipliers. Alternatively, if one 2N-bit * N-bit multiplication or two N-bit * N-bit multiplications are to be performed, the configuration unit 140 configures the input unit 110 to generate inputs for two N-bit multipliers.
  • when the input unit 110 provides inputs to the plurality of N-bit multipliers, if the input data exceeds what all the multipliers can accept, or if one multiplier has several different sets of inputs at the same time, the configuration unit 140 may configure the inputs accordingly to ensure that the data processing circuit works properly.
  • the specific configuration method may be sequential, for example first-in first-out (FIFO), in which data that enters the input unit 110 first also enters the calculation unit first. It may also sort by resource occupancy, for example performing calculations with larger resource occupancy first and those with smaller resource occupancy later. Different tasks may also be given sequence numbers, with earlier-numbered tasks calculated first, and so on. The specific configuration method is not limited here.
  • the configuration unit 140 may configure the output unit 130 such that the output unit 130 is configured to output the calculation result of the calculation unit 120 according to the control information.
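  • for example, still assume that the computing unit 120 includes four N-bit multipliers.
  • if one 2N-bit * 2N-bit multiplication or two 2N-bit * N-bit multiplications are to be performed, the configuration unit 140 configures the output unit 130 to form the final calculation result from the outputs of all four N-bit multipliers.
  • alternatively, if one 2N-bit * N-bit multiplication or two N-bit * N-bit multiplications are to be performed, the configuration unit 140 configures the output unit 130 to form the final calculation result from the outputs of two N-bit multipliers.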
  • the corresponding output can be selected according to the selected input, or the corresponding output can be directly selected according to the configuration information.
  • the data processing circuit can be fully utilized, and the parallelization of multiple calculations can be realized while resource multiplexing can be realized, which saves resources and improves computational efficiency.
  • the input unit 110 may extract the input data to obtain an operand for the calculation unit 120.
  • the sign bit of each operand can be extracted and processed by the sign processing module to obtain the final sign bit to be merged.
  • the absolute value of each extracted operand can be calculated.
  • these absolute values are provided to the combined multiplier, which includes multiple unsigned multipliers, and are processed by the respective unsigned multipliers.
  • the output of the combined multiplier and the output (sign bit) of the sign processing module are merged (i.e., the result is left unchanged for a positive sign and converted to its two's complement for a negative sign).
  • the output unit 130 extends the output to a specified bit width for output.
  • FIG. 3 is a block diagram showing details of the data processing circuit of FIG. 1.
  • the circuit structure shown in FIG. 3 can be considered a specific embodiment of the circuit in FIG. 2.
  • the extraction and absolute value module (Extract & ABS) is part of the input unit 110 shown in FIG. 1.
  • the example input data shown in Figure 3 has a width of 128 bits (input in Figure 3: 127...0).
  • the extraction and absolute value module extracts operands from the input data based on the control information (ie, SEL in Figure 3). Assuming that the data processing circuit shown in Figure 3 includes four 16-bit multipliers, the meaning of SEL can be: 00 for four 16x16 multiplications, 01 for two 32x16 multiplications, and 10 for one 32x32 multiplication.
  • the extraction and absolute value module may obtain the extracted unsigned operand and the corresponding sign bit according to the control information.
  • the extracted unsigned operand is provided to the selection unit (which is also part of the input unit 110 shown in FIG. 1).
  • the extraction and absolute value module can calculate the absolute value of each operand and split it into 16-bit operands for supply to the corresponding 16x16 unsigned multipliers.
  • the selection unit is represented by "selector (MUX)" in FIG. 3.
  • the eight selectors in Figure 3 provide input to four 16-bit multipliers (16*16 multipliers) based on the SEL signal, respectively.
  • when the SEL signal is 10, one 32*32 multiplication is to be calculated; the four 16-bit multipliers perform the multiplication operations and their results are sent to the adders.
  • three 32-bit + 16-bit adders and one 16-bit + 16-bit adder are used to add the outputs of the four 16-bit multipliers to obtain the desired result, where the appropriate most significant bits (MSB) and least significant bits (LSB) are used to complete the design of the entire calculation unit.
  • when SEL is 01, one or two 16*32 multiplications are to be calculated. If only one 16*32 multiplication is calculated, it is performed using two 16-bit multipliers. If two 16*32 multiplications are calculated, they are performed using all four 16-bit multipliers.
  • when SEL is 00, one to four 16*16 multiplications are to be calculated, using a corresponding number of 16-bit multipliers.
  • the SEL signal can indicate that only a portion of a plurality of N-bit multipliers are used, for example, only three 16*16 multipliers operate.
  • the above scheme can also be arbitrarily combined under the reasonable allocation of multiplier resources.
  • the SEL signal may indicate that one 16-bit * 32-bit multiplication and two 16-bit * 16-bit multiplications are simultaneously calculated.
  • FIG. 3 only shows the case where there are four multipliers.
  • the multiplier and the adder can be added according to the above rules to perform higher bit calculation.
  • eight or sixteen 16-bit * 16-bit multipliers can be used to implement the MAC processing circuit of the present disclosure. Accordingly, the same principle can be applied to the SEL signal to achieve resource multiplexing.
  • the output of the adders and the result of the sign operation are provided to the selector at the bottom of FIG. 3, which can serve as the output unit 130 shown in FIG. 1. Specifically, the selector can output the result of the adders and the sign operation at a specified bit width according to the SEL signal (i.e., control information).
  • resource multiplexing of multipliers of different bit widths can thus be realized, and the structural design is simplified. That is, low-bit-width addition, shift operations, and bit splicing are used to achieve high-bit-width addition, thereby reducing resource usage.
  • implementing a 32x16 multiplier requires only two 16x16 multipliers and one 32+16 adder.
  • Implementing a 32x32 multiplier requires only four 16x16 multipliers, three 32+16 adders, and one 16+16 adder.
  • the multiplier used can also be an 8-bit multiplier for 8*16, 16*16, 16*32, 32*32, etc. calculations.
  • the (2N)*N or (2N)*(2N) multiplier can be implemented by a plurality of N*N multipliers according to the technical solution of the present disclosure. For example, implementing one (2N)*N multiplier requires two N*N multipliers and one (2N)+N adder, while implementing one (2N)*(2N) multiplier requires four N*N multipliers, three (2N)+N adders, and one N+N adder.
  • data processing system 400 includes an instruction decoding unit 410 and a data processing circuit 420. In the following, the operation of the various components of data processing system 400 is described in detail.
  • Instruction decoding unit 410 is configured to obtain an instruction and decode the instruction to obtain control information for data processing circuit 420.
  • the control information may indicate at least one of the following modes of operation: N-bit multiplication (N-bit * N-bit multiplication) or 2N-bit multiplication (including N-bit * 2N-bit multiplication and 2N-bit * 2N-bit multiplication).
  • if a new instruction conflicts with the current instruction, the instruction decoding unit 410 does not decode the current instruction. For example, when the new instruction and the current instruction would, after decoding, input data to any one of the multipliers multiple times within a set threshold time, or when the required multipliers exceed the idle multipliers in the data processing system, the instruction decoding unit 410 does not decode the current instruction or delays decoding it.
  • Data processing circuit 420 is similar to the data processing circuit described above (e.g., data processing circuit 100 shown in FIG. 1). Therefore, a detailed description of the data processing circuit 420 is omitted here.
  • by having the instruction decoding unit 410 decode the instruction, control information for the data processing circuit 420 can be generated efficiently.
  • the instruction decoding unit 410 has an error detection function, which can avoid executing an erroneous instruction when an instruction conflict occurs, thereby ensuring normal operation of the data processing circuit 420.
  • FIG. 5 is a block diagram showing details of the data processing system of Figure 4. As shown in Figure 5, the data processing system can be divided into control paths and data paths.
  • the control path generates control information for the data path in a joint configuration of the instruction drive and control registers (eg, by the instruction decoder (Instrc_decoder) shown in FIG. 5).
  • the data path completes the corresponding computing function.
  • control path can include control register port control logic and instruction paths, and this combination can effectively reduce the number of instructions.
  • Control Register Port Control Logic generates control information for the data path by parsing the control registers.
  • the data path can include a load unit, a calculation unit, and a storage unit.
  • the load unit completes the operation of fetching from the data port.
  • the data and control signals loaded by the load unit are input to the calculation unit.
  • the calculation unit (represented by the pre-processing (P_Proc) module and MAC0...MAC31 in Fig. 5) performs a data calculation operation.
  • the storage unit stores the calculation result of the calculation unit to provide an output.
  • FIG. 6 is a block diagram showing details of the data processing system of Figure 4.
  • the instruction classification update (Instr_assort_update) module is an overall control module of the instruction path, which receives an instruction load (Instr_load) signal and determines whether to receive a new instruction according to the current state.
  • if an instruction conflict occurs, the instruction classification update module rejects the current instruction and generates an instruction conflict signal (instr_conflict).
  • if no instruction conflict occurs, the instruction classification update module generates control signals for the instruction decoder 0 and the instruction decoder 1 based on the instruction status and the instruction information acquired through the instruction bus (Instr_bus) interface.
  • the instruction decoder 0 and the instruction decoder 1 are responsible for decoding the instructions to generate control information.
  • the Control Signal Assort module combines the control information to finally control the data path (Data_path_M0...Datapath_M2 shown in Figure 6).
  • FIG. 7 is a flow chart showing a data processing method 70 in accordance with one embodiment of the present disclosure.
  • in step S710, inputs are provided to a plurality of N-bit multipliers, where N = 2^n and n is a natural number greater than zero.
  • in step S720, a calculation is performed using adders and the plurality of N-bit multipliers, the adders including an N-bit adder and a 2N-bit adder.
  • in step S730, the final calculation result is output.
  • control information may indicate at least one of the following modes of operation: an N-bit multiplication operation or a 2N-bit multiplication operation.
  • instructions can be obtained and decoded to obtain control information. If the new instruction conflicts with the current instruction, the current instruction is not decoded. For example, when a data input to the same multiplier occurs at the same time, or when the required multiplier exceeds the idle multiplier, the current instruction is not decoded.
  • processors can include a data processing system (eg, data processing system 400 shown in FIG. 4) in accordance with the above.
  • the processor can be designed as a digital signal processor (DSP) that can be applied to a variety of scenarios including, but not limited to, machine vision processing, image signal processing, and so on.
  • the above-described embodiments of the present disclosure may be implemented by software, hardware, or a combination of both software and hardware.
  • embodiments of the present disclosure disclosed herein can be implemented on a computer program product.
  • the computer program product is a product having a computer readable medium encoded with computer program logic that, when executed on a computing device, provides related operations to implement the above technical solution of the present disclosure.
  • when executed on at least one processor of a computing system, the computer program logic causes the processor to perform the operations (methods) described in the embodiments of the present disclosure.
  • the arrangements of the present disclosure are typically provided as software, code, and/or other data structures arranged or encoded on a computer readable medium such as an optical medium (e.g., a CD-ROM), a floppy disk, or a hard disk, or as firmware or microcode on other media such as one or more ROM, RAM, or PROM chips, or as downloadable software images, shared databases, etc. in one or more modules.
  • Software or firmware or such a configuration may be installed on the computing device such that one or more processors in the computing device perform the technical solutions described in the embodiments of the present disclosure.
  • each functional module or individual feature of the device used in each of the above embodiments may be implemented or executed by circuitry, typically one or more integrated circuits.
  • Circuitry designed to perform the various functions described in this specification can include general purpose processors, digital signal processors (DSPs), application specific integrated circuits (ASICs) or general purpose integrated circuits, field programmable gate arrays (FPGAs), or others.
  • a general purpose processor may be a microprocessor, or the processor may be an existing processor, controller, microcontroller, or state machine.
  • the above general purpose processor or each circuit may be configured by a digital circuit or may be configured by a logic circuit.
  • the present disclosure can also use an integrated circuit obtained by using the advanced technology.
  • the program running on the device may be a program that causes a computer to implement the functions of the embodiments of the present disclosure by controlling a central processing unit (CPU).
  • the program or information processed by the program may be temporarily stored in a volatile memory (such as a random access memory RAM), a hard disk drive (HDD), a non-volatile memory (such as a flash memory), or other memory system.
  • a program for realizing the functions of the embodiments of the present disclosure may be recorded on a computer readable recording medium.
  • the corresponding functions can be realized by causing a computer system to read programs recorded on the recording medium and execute the programs.
  • the so-called "computer system” herein may be a computer system embedded in the device, and may include an operating system or hardware (such as a peripheral device).
  • the "computer readable recording medium” may be a semiconductor recording medium, an optical recording medium, a magnetic recording medium, a recording medium of a short-term dynamic storage program, or any other recording medium readable by a computer.
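
The sign-handling path described in the definitions above (extract the sign bits, take absolute values, multiply with unsigned multipliers, then re-apply the sign as a two's complement) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `signed_mul`, the default width `n=16`, and the use of raw two's-complement bit patterns as inputs are choices made here, not details taken from the patent.

```python
def signed_mul(a, b, n=16):
    """Multiply two n-bit two's-complement bit patterns via an unsigned multiplier.

    a and b are raw bit patterns (0 <= a, b < 2**n). The return value is the
    2n-bit two's-complement bit pattern of the signed product.
    Hypothetical helper for illustration only.
    """
    sign_bit = 1 << (n - 1)
    in_mask = (1 << n) - 1
    out_mask = (1 << (2 * n)) - 1

    # Sign processing: extract each operand's sign and combine them.
    neg_a = bool(a & sign_bit)
    neg_b = bool(b & sign_bit)
    result_negative = neg_a ^ neg_b

    # Absolute values: negate (two's complement) when the operand is negative.
    abs_a = ((~a + 1) & in_mask) if neg_a else a
    abs_b = ((~b + 1) & in_mask) if neg_b else b

    # Unsigned multiplication on the magnitudes.
    magnitude = abs_a * abs_b

    # Merge: keep the magnitude for a positive result, take the
    # two's complement of it for a negative result.
    if result_negative:
        return (~magnitude + 1) & out_mask
    return magnitude
```

Working on magnitudes means the multiplier array itself only ever sees unsigned operands, which is what allows the same unsigned N-bit multipliers to be shared across modes.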

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Executing Machine-Instructions (AREA)
  • Advance Control (AREA)

Abstract

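A data processing circuit, comprising: a calculation unit including adders and a plurality of N-bit multipliers; an input unit configured to provide inputs to the multipliers; and an output unit configured to output the calculation result of the calculation unit; wherein the adders include an N-bit adder and a 2N-bit adder, N = 2^n, and n is a natural number greater than 0. A method of operating the data processing circuit and a corresponding device are also provided.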

Description

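Data processing method and device
Technical Field
The present disclosure relates to the field of data processing technology and, more particularly, to a data processing method and device.
Background
The multiplier is a key component of a high-performance digital signal processor (DSP) and the core of real-time, high-speed signal processing. The multiply-and-accumulate (MAC) operation is a basic operation of many DSP applications, such as the fast Fourier transform (FFT), convolution, and filtering.
On the one hand, for DSP applications the MAC unit is an important factor in the critical-path delay, and is therefore key to the performance of DSP applications. A low-latency, high-throughput MAC unit is thus critical to a high-performance DSP. On the other hand, different DSP applications need multiplications of different bit widths, so implementing high-bit-width multipliers from low-bit-width multipliers, and thereby reusing resources, is also an important part of DSP architecture design.
Summary
The present disclosure provides a multi-mode MAC processing unit with resource-reuse capability, which can select different modes of multiply-and-accumulate operation according to different instructions. Specifically, the processing unit uses a multiplier array to generate partial products in parallel, and then splices low-bit-width multiplications into a high-bit-width multiplication by shift and add operations.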
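As a worked identity for the shift-and-add splicing just described, each 2N-bit operand can be split into two N-bit halves. The symbols a_H, a_L, b_H, and b_L are introduced here for illustration and do not appear in the original text.

```latex
\begin{aligned}
a &= a_H \cdot 2^{N} + a_L, \qquad b = b_H \cdot 2^{N} + b_L,
  \qquad 0 \le a_H, a_L, b_H, b_L < 2^{N},\\
a \cdot b &= a_H b_H \cdot 2^{2N} + \left(a_H b_L + a_L b_H\right) \cdot 2^{N} + a_L b_L .
\end{aligned}
```

Each of the four products on the right is an N-bit * N-bit multiplication, and each multiplication by a power of two is a shift, so a 2N-bit * 2N-bit product needs only four N-bit multipliers plus shifts and additions.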
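According to one aspect of the present disclosure, a data processing circuit is provided, including a calculation unit, an input unit, and an output unit. The calculation unit includes adders and a plurality of N-bit multipliers. The input unit is configured to provide inputs to the multipliers. The output unit is configured to output the calculation result of the calculation unit. The adders include an N-bit adder and a 2N-bit adder, where N = 2^n and n is a natural number greater than 0.
In one embodiment, the data processing circuit further includes a configuration unit that configures at least one of the plurality of N-bit multipliers such that the configured N-bit multiplier can perform operations according to control information.
In one embodiment, the input unit is configured to generate the inputs of the multipliers according to the control information.
In one embodiment, the output unit is configured to output the calculation result of the calculation unit according to the control information.
In one embodiment, the control information indicates at least one of the following operation modes: an N-bit multiplication or a 2N-bit multiplication.
In one embodiment, the calculation unit in the data processing circuit includes two N-bit multipliers and one 2N-bit adder.
In one embodiment, the calculation unit in the data processing circuit includes four N-bit multipliers, one N-bit adder, and three 2N-bit adders.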
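The two embodiments above fix the multiplier and adder counts but not the wiring. The following is a minimal sketch of one arrangement that stays within those counts, assuming unsigned operands, a default of N = 16, and a carry-in on the last 2N-bit adder; the function names and the exact wiring are assumptions made here for illustration, not the patent's circuit.

```python
def mul_2n_by_n(a, b, n=16):
    """(2N-bit) * (N-bit) unsigned product from two N*N products and one (2N)+N add."""
    mask = (1 << n) - 1
    assert 0 <= a < (1 << (2 * n)) and 0 <= b < (1 << n)
    p_lo = (a & mask) * b        # N-bit multiplier 0: low half of a times b
    p_hi = (a >> n) * b          # N-bit multiplier 1: high half of a times b
    upper = p_hi + (p_lo >> n)   # the single (2N)+N adder
    assert upper < (1 << (2 * n))  # the sum always fits in 2N bits
    # The low N bits of p_lo are spliced in directly; no adder is needed for them.
    return (upper << n) | (p_lo & mask)


def mul_2n_by_2n(a, b, n=16):
    """(2N-bit) * (2N-bit) unsigned product: four N*N products,
    three (2N)+N adds and one N+N add (assumed wiring)."""
    mask = (1 << n) - 1
    x = mul_2n_by_n(a, b & mask, n)   # a * low half of b  (adder 1)
    y = mul_2n_by_n(a, b >> n, n)     # a * high half of b (adder 2)
    x_hi = x >> n                     # upper 2N bits of x
    s = (x_hi & mask) + (y & mask)    # the N+N adder, N+1 bits of result
    carry = s >> n
    upper = (y >> n) + (x_hi >> n) + carry   # adder 3: (2N)+N with carry-in
    assert upper < (1 << (2 * n))
    # Splice the three fields: bits [4N-1:2N], [2N-1:N], [N-1:0].
    return (upper << (2 * n)) | ((s & mask) << n) | (x & mask)
```

The `assert` lines check that no intermediate sum exceeds 2N bits, which is the property that lets the design avoid adders wider than 2N bits.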
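According to another aspect of the present disclosure, a data processing system is provided, including the data processing circuit described above and an instruction decoding unit, the instruction decoding unit being configured to obtain an instruction and decode it to obtain control information for the data processing circuit.
In one embodiment, the instruction decoding unit is further configured to not decode the current instruction if a new instruction conflicts with the current instruction.
In one embodiment, if the new instruction and the current instruction would, after decoding, input data to any one multiplier multiple times within a set threshold time, or if the required multipliers exceed the idle multipliers in the data processing system, it is determined that the new instruction conflicts with the current instruction.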
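The two conflict conditions in this embodiment can be illustrated with a small check. This is a sketch under assumptions: the function name `instructions_conflict`, the representation of an instruction as a set of multiplier indices plus an issue time in cycles, and the strict `<` comparison against the threshold are all hypothetical choices made here.

```python
def instructions_conflict(cur_multipliers, cur_time,
                          new_multipliers, new_time,
                          idle_count, threshold):
    """Return True if the new instruction conflicts with the current one.

    cur_multipliers / new_multipliers: sets of multiplier indices each
    instruction feeds after decoding (hypothetical representation).
    cur_time / new_time: cycle at which each instruction feeds its data.
    idle_count: number of multipliers currently idle.
    threshold: the set threshold time, in cycles.
    """
    # Condition 2: more multipliers are required than are idle.
    if len(new_multipliers) > idle_count:
        return True
    # Condition 1: some multiplier would receive data more than once
    # within the threshold time window.
    shared = set(cur_multipliers) & set(new_multipliers)
    return bool(shared) and abs(new_time - cur_time) < threshold
```

For example, with `threshold=4`, two instructions that both drive multiplier 1 one cycle apart are reported as conflicting, while the same pair twenty cycles apart is not.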
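In one embodiment, the control information indicates at least one of the following operation modes: an N-bit multiplication or a 2N-bit multiplication.
According to another aspect of the present disclosure, a method for processing data is provided, including: providing inputs to a plurality of N-bit multipliers; performing a calculation using adders and the plurality of N-bit multipliers; and outputting a final calculation result. The adders include an N-bit adder and a 2N-bit adder, where N = 2^n and n is a natural number greater than 0.
In one embodiment, the method further includes: configuring at least one of the plurality of N-bit multipliers such that the configured N-bit multiplier can perform operations according to control information.
In one embodiment, the control information indicates at least one of the following operation modes: an N-bit multiplication or a 2N-bit multiplication.
In one embodiment, the method further includes: obtaining an instruction and decoding it to obtain the control information.
In one embodiment, if a new instruction conflicts with the current instruction, the current instruction is not decoded.
In one embodiment, if the new instruction and the current instruction would, after decoding, input data to any one multiplier multiple times within a set threshold time, or if the required multipliers exceed the idle multipliers in the data processing system, it is determined that the new instruction conflicts with the current instruction.
According to another aspect of the present disclosure, a processor is provided that includes the data processing system described above.
The technical solution of the present disclosure can achieve high-bit-width multiplication by shifting and splicing low-bit-width multiplications, which minimizes the bit width of the adders, reduces the use of hardware resources, and simplifies the circuit structure.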
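Brief Description of the Drawings
The above and other features of the present disclosure will become more apparent from the following detailed description taken in conjunction with the drawings, in which:
FIG. 1 is a block diagram showing a data processing circuit according to one embodiment of the present disclosure;
FIG. 2 is a block diagram showing details of the data processing circuit of FIG. 1;
FIG. 3 is a block diagram showing details of the data processing circuit of FIG. 1;
FIG. 4 is a block diagram showing a data processing system according to one embodiment of the present disclosure;
FIG. 5 is a block diagram showing details of the data processing system of FIG. 4;
FIG. 6 is a block diagram showing details of the data processing system of FIG. 4; and
FIG. 7 is a flowchart showing a data processing method according to one embodiment of the present disclosure.
In the following description, the same or similar elements or steps are denoted by the same or similar reference numerals. Note that the elements in the drawings are not necessarily drawn to scale; they are intended to illustrate the principles of the technical solution of the present disclosure.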
具体实施方式
下面结合附图和具体实施方式对本公开进行详细阐述。应当注 意,本公开不应局限于下文所述的具体实施方式。另外,为了简便起见,省略了对与本公开没有直接关联的公知技术的详细描述,以防止对本公开的理解造成混淆。显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。在不冲突的情况下,下述的实施例及实施方式中的特征可以相互组合。
本公开提供了一种具有资源复用能力的MAC处理电路,通过采用多个较低位的乘法器来拼接成较高位的乘法器,并且采用较低位的加法器,能够节省硬件资源的数量。此外,本公开所提出的MAC处理电路能够根据不同指令选择不同模式的MAC操作。由此,通过资源复用,降低了硬件资源用量,简化了电路的结构。
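Fig. 1 is a block diagram showing a data processing circuit according to an embodiment of the present disclosure. As shown in Fig. 1, the data processing circuit 100 includes an input unit 110, a computing unit 120 and an output unit 130. Optionally, the data processing circuit 100 may further include a configuration unit 140 (shown in dashed lines in Fig. 1). The operation of each component of the data processing circuit 100 is described in detail below.
The computing unit 120 may include adders and a plurality of N-bit multipliers, where N = 2^n and n is a natural number greater than 0. As is well known to those skilled in the art, an N-bit multiplier performs an N-bit × N-bit multiplication. Note that the computing unit 120 may include multipliers with different values of N. For example, it may include one or more 2^(n-1)-bit multipliers together with one or more 2^n-bit multipliers, or one or more 2^(n+1)-bit multipliers, and so on.
The computing unit 120 may further include an N-bit adder and a 2N-bit adder, where N = 2^n and n is a natural number greater than 0. It will be understood that a 2N-bit adder may be a (2N-bit + N-bit) adder or a (2N-bit + 2N-bit) adder.
In the technical solution of the present disclosure, when several N-bit multipliers are used to perform an N-bit × 2N-bit multiplication or a 2N-bit × 2N-bit multiplication, adders of at most 2N bits are used; no adder wider than 2N bits is needed. In addition, several N-bit multipliers can be used, following the teaching of the present disclosure, to implement 4N-bit × 2N-bit or 4N-bit × 4N-bit multiplication. For example, four N-bit multipliers can first be used to implement a 2N × 2N multiplication, and two 2N × 2N multipliers formed in this way can then be used to implement a 4N × 2N multiplication. Likewise, four N-bit multipliers can first be used to implement a 2N × 2N multiplication, and four 2N × 2N multipliers formed in this way can then be used to implement a 4N × 4N multiplication. The same principle applies to higher-order multiplications.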
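The hierarchical composition described above can be sketched in software. The following Python function is only an illustrative model, not the circuit of the disclosure: the name `mul_from_base` and its parameters are hypothetical, and the partial products are summed with Python's unbounded integers, so the sketch shows the splitting and shifting but does not model the narrow adders.

```python
def mul_from_base(a, b, width, base=16):
    """Multiply two unsigned `width`-bit integers using only `base`-bit multiplies.

    Returns (product, number_of_base_multiplies).
    `width` must equal `base` times a power of two (hypothetical constraint).
    """
    if width == base:
        # Leaf: stands for one N-bit x N-bit hardware multiplier.
        return a * b, 1
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    # Four half-width partial products, each built by the same rule.
    ll, n0 = mul_from_base(a_lo, b_lo, half, base)
    hl, n1 = mul_from_base(a_hi, b_lo, half, base)
    lh, n2 = mul_from_base(a_lo, b_hi, half, base)
    hh, n3 = mul_from_base(a_hi, b_hi, half, base)
    # Shift each partial product to its weight and sum.
    product = ll + ((hl + lh) << half) + (hh << (2 * half))
    return product, n0 + n1 + n2 + n3


if __name__ == "__main__":
    x, y = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    product, count = mul_from_base(x, y, width=64, base=16)
    print(product == x * y, count)
```

With `base=16`, a 32-bit call uses four base multiplies and a 64-bit call uses sixteen, matching the four-by-four nesting described in the text.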
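The input unit 110 is configured to provide inputs to the N-bit multipliers in the computing unit 120. Further, the input unit 110 can receive input data and extract several operands from it as the inputs of the N-bit multipliers. Details of this operation are described below with reference to the drawings and examples.
The output unit 130 is configured to output the calculation result of the computing unit 120. Further, the output unit 130 can select the appropriate data from the calculation result of the computing unit 120 for output. Details of this operation are likewise described below with reference to the drawings and examples.
The technical solution of the present disclosure implements high-bit-width multiplication by shifting and splicing low-bit-width multiplications, which minimizes the adder bit width, reduces hardware resource usage and simplifies the circuit structure.
As shown in Fig. 1, the data processing circuit 100 may optionally include a configuration unit 140. The configuration unit 140 is configured to configure the input unit 110 and the output unit 130 so that the data processing circuit 100 can operate according to control information.
Herein, the control information may, for example, indicate at least one of the following operating modes: N-bit multiplication (N-bit × N-bit) or 2N-bit multiplication (including N-bit × 2N-bit and 2N-bit × 2N-bit).
For example, assuming the computing unit 120 includes four N-bit multipliers, the configuration unit 140 can, by configuring the input unit 110 and the output unit 130, use the four N-bit multipliers to perform one 2N-bit × 2N-bit multiplication, or two 2N-bit × N-bit multiplications, or one 2N-bit × N-bit multiplication together with one or two N-bit × N-bit multiplications. The same principle applies when the computing unit 120 includes more multipliers.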
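In addition, the configuration unit 140 may configure the input unit 110 so that the input unit 110 generates, according to the control information, the inputs of at least some of the N-bit multipliers. For example, assume the computing unit 120 includes four N-bit multipliers. If one 2N-bit × 2N-bit multiplication or two 2N-bit × N-bit multiplications are to be performed, the configuration unit 140 configures the input unit 110 to generate inputs for all four N-bit multipliers. Alternatively, if one 2N-bit × N-bit multiplication or two N-bit × N-bit multiplications are to be performed, the configuration unit 140 configures the input unit 110 to generate inputs for two N-bit multipliers. When the input unit 110 feeds the N-bit multipliers, if the input data exceeds what all the multipliers can accept, or if several different sets of inputs target one multiplier at the same time, the configuration unit 140 can arrange the inputs accordingly so that the data processing circuit keeps working correctly. The arrangement may be sequential, for example first-in first-out (FIFO), where data that enters the input unit 110 first is computed first. It may be ordered by resource usage, for example with tasks that occupy more resources computed first and tasks that occupy fewer computed later. It may also follow sequence numbers assigned to different tasks, with lower-numbered tasks computed first, and so on. The specific arrangement method is not limited here.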
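The three ordering options mentioned above (FIFO, largest resource usage first, and task sequence number) can be illustrated with a small sketch. The `Task` record, its field names and the policy strings are assumptions introduced only for this example; the disclosure does not prescribe any particular data structure.

```python
from collections import namedtuple

# Hypothetical task record: arrival order, multipliers required, sequence number.
Task = namedtuple("Task", "name arrival multipliers priority")


def order_tasks(tasks, policy="fifo"):
    """Return tasks in the order they would be sent to the computing unit."""
    if policy == "fifo":
        # First in, first out: earliest arrival is computed first.
        return sorted(tasks, key=lambda t: t.arrival)
    if policy == "largest_first":
        # Tasks occupying more multipliers are computed first.
        return sorted(tasks, key=lambda t: -t.multipliers)
    if policy == "priority":
        # A lower sequence number means the task is computed earlier.
        return sorted(tasks, key=lambda t: t.priority)
    raise ValueError("unknown policy: " + policy)


if __name__ == "__main__":
    pending = [Task("a", 0, 1, 1), Task("b", 1, 4, 2), Task("c", 2, 2, 0)]
    for policy in ("fifo", "largest_first", "priority"):
        print(policy, [t.name for t in order_tasks(pending, policy)])
```

Python's `sorted` is stable, so tasks that tie under a policy keep their original relative order.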
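Correspondingly, the configuration unit 140 may configure the output unit 130 so that the output unit 130 outputs the calculation result of the computing unit 120 according to the control information. For example, again assume the computing unit 120 includes four N-bit multipliers. If one 2N-bit × 2N-bit multiplication or two 2N-bit × N-bit multiplications are to be performed, the configuration unit 140 configures the output unit 130 to form the final result from the outputs of all four N-bit multipliers. Alternatively, if one 2N-bit × N-bit multiplication or two N-bit × N-bit multiplications are to be performed, the configuration unit 140 configures the output unit 130 to form the final result from the outputs of two N-bit multipliers. The output may be selected to match the chosen inputs, or selected directly according to the configuration information, and so on.
With the configuration unit, the data processing circuit can be fully utilized: resources are reused while several calculations run in parallel, which saves resources and improves computational efficiency.
Fig. 2 is a block diagram showing details of the data processing circuit of Fig. 1. As shown in Fig. 2, the input unit 110 can extract operands for the computing unit 120 from the input data. On the one hand, the sign bit of each operand is extracted and processed by a sign processing module to obtain the sign bit to be merged at the end. On the other hand, the absolute value of each extracted operand is computed. The combined multiplier (which includes several multipliers) then supplies these absolute values to the individual unsigned multipliers for processing. Next, the output of the combined multiplier is merged with the output of the sign processing module (the sign bit): a positive result is left unchanged, and a negative result is converted to two's complement. Finally, the output unit 130 extends the output to the specified bit width.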
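The sign-and-magnitude flow of Fig. 2 can be modelled as follows. This is a minimal sketch assuming two's-complement inputs; the function names `to_signed` and `signed_mul` and the default widths are illustrative choices, not taken from the disclosure.

```python
def to_signed(x, width):
    """Interpret a `width`-bit pattern as a two's-complement integer."""
    return x - (1 << width) if x >> (width - 1) else x


def signed_mul(a_bits, b_bits, n=16, out_width=64):
    """Multiply two N-bit two's-complement patterns via unsigned magnitudes.

    Returns the product as an `out_width`-bit two's-complement pattern.
    """
    mask = (1 << n) - 1
    # Extract the sign bit of each operand.
    sign_a = a_bits >> (n - 1)
    sign_b = b_bits >> (n - 1)
    # Sign processing: the product is negative when the signs differ.
    sign = sign_a ^ sign_b
    # Absolute values (negate a negative operand in two's complement).
    mag_a = ((~a_bits + 1) & mask) if sign_a else a_bits
    mag_b = ((~b_bits + 1) & mask) if sign_b else b_bits
    # Unsigned multiplication of the magnitudes.
    mag = mag_a * mag_b
    # Merge the sign and extend to the output width.
    out_mask = (1 << out_width) - 1
    return (-mag) & out_mask if sign else mag


if __name__ == "__main__":
    r = signed_mul(0xFFFF, 0x0002)  # 16-bit patterns for -1 and 2
    print(hex(r), to_signed(r, 64))
```

Masking a negative Python integer with `out_mask` yields its two's-complement pattern at the chosen width, which plays the role of the final sign extension.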
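Fig. 3 is a block diagram showing details of the data processing circuit of Fig. 1. The circuit structure shown in Fig. 3 can be regarded as one specific implementation of the circuit in Fig. 2. It is described below.
In the example of Fig. 3, the extract-and-absolute-value module (Extract&ABS) is part of the input unit 110 shown in Fig. 1. The example input data shown in Fig. 3 is 128 bits wide (input: 127…0 in Fig. 3).
The Extract&ABS module extracts operands from the input data according to the control information (SEL in Fig. 3). Assuming the data processing circuit of Fig. 3 includes four 16-bit multipliers, SEL may be encoded as follows: 00 means four 16×16 multiplications, 01 means two 32×16 multiplications, and 10 means one 32×32 multiplication.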
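To make the SEL encoding concrete, the sketch below decodes SEL and slices operand pairs out of a 128-bit input word. The SEL values follow the text, but the bit layout (operand pairs packed back to back starting from bit 0) is purely an assumption for illustration; the disclosure does not state how operands are arranged within the 128 bits. The names `MODES` and `extract_operands` are hypothetical.

```python
# SEL value -> (number of multiplications, width of operand A, width of operand B)
MODES = {
    0b00: (4, 16, 16),  # four 16x16 multiplications
    0b01: (2, 32, 16),  # two 32x16 multiplications
    0b10: (1, 32, 32),  # one 32x32 multiplication
}


def extract_operands(data, sel):
    """Slice (a, b) operand pairs from a 128-bit word according to SEL.

    Assumed layout: pairs are packed from bit 0 upward, A first, then B.
    """
    if sel not in MODES:
        raise ValueError("unsupported SEL value: " + bin(sel))
    count, width_a, width_b = MODES[sel]
    pairs = []
    pos = 0
    for _ in range(count):
        a = (data >> pos) & ((1 << width_a) - 1)
        pos += width_a
        b = (data >> pos) & ((1 << width_b) - 1)
        pos += width_b
        pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    word = 0x0004000300020001
    print(extract_operands(word, 0b00))
    print(extract_operands(word, 0b10))
```

Under this assumed packing, mode 00 consumes all 128 bits, mode 01 consumes 96 bits and mode 10 consumes 64 bits.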
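Specifically, the Extract&ABS module obtains the extracted unsigned operands and the corresponding sign bits according to the control information. The extracted unsigned operands are supplied to a selection unit (also part of the input unit 110 shown in Fig. 1). More precisely, the Extract&ABS module computes the absolute value of each operand and splits the operands into 16-bit pieces to be supplied to the corresponding 16×16 unsigned multipliers. The selection unit is represented in Fig. 3 by the multiplexers (MUX). For example, the eight multiplexers in Fig. 3 supply inputs to the four 16-bit multipliers (16×16 multipliers) according to the SEL signal. When the SEL signal is 10, one 32×32 multiplication is to be computed: the four 16-bit multipliers perform their multiplications and send the results to the adders. In Fig. 3, three (32-bit + 16-bit) adders and one (16-bit + 16-bit) adder sum the outputs of the four 16-bit multipliers to obtain the desired result, with the appropriate most significant bits (MSB) and least significant bits (LSB) selected within the computing unit to complete its design.
When SEL is 01, one or two 16×32 multiplications are to be computed. If only one 16×32 multiplication is computed, two 16-bit multipliers are used. If two 16×32 multiplications are computed, all four 16-bit multipliers are used.
When SEL is 00, one to four 16×16 multiplications are to be computed, using the corresponding number of 16-bit multipliers.
In addition, the SEL signal may indicate that only some of the N-bit multipliers are used, for example only three 16×16 multipliers.
The above modes can also be combined arbitrarily, provided the multiplier resources are allocated sensibly. For example, the SEL signal may indicate that one 16-bit × 32-bit multiplication and two 16-bit × 16-bit multiplications are computed at the same time.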
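Whether a given combination of operations fits the available multipliers comes down to simple counting. The following sketch assumes the per-operation costs stated in the text (one, two and four 16-bit multipliers); the names `COST`, `multipliers_needed` and `fits` are hypothetical.

```python
# Number of 16-bit multipliers each operation occupies, as stated in the text.
COST = {"16x16": 1, "32x16": 2, "32x32": 4}


def multipliers_needed(ops):
    """Total number of 16-bit multipliers required by a list of operations."""
    return sum(COST[op] for op in ops)


def fits(ops, available=4):
    """True if the operations can run concurrently on `available` multipliers."""
    return multipliers_needed(ops) <= available


if __name__ == "__main__":
    print(fits(["32x16", "16x16", "16x16"]))  # the combination from the text
    print(fits(["32x32", "16x16"]))
```

The example combination of one 16×32 and two 16×16 multiplications needs exactly four multipliers, so it fits the four-multiplier circuit of Fig. 3.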
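The embodiment in Fig. 3 shows only the case of four multipliers. Those skilled in the art will understand that multipliers and adders can be added according to the above rules to perform wider calculations as needed. For example, eight or sixteen 16-bit × 16-bit multipliers can be used to implement the MAC processing circuit of the present disclosure. The SEL signal can be implemented on the same principle to achieve resource reuse.
On the other hand, as shown in Fig. 3, the sign bits of the extracted operands are combined (the Combsign module in Fig. 3).
Finally, the adder outputs and the result of the sign operation are supplied to the multiplexer at the bottom of Fig. 3, which can serve as the output unit 130 shown in Fig. 1. Specifically, according to the SEL signal (the control information), this multiplexer extends the adder and sign results to the specified bit width and outputs them.
With the circuit structure shown in Fig. 3, multipliers of different bit widths can share resources, and the structure is kept as simple as possible. That is, low-bit-width additions, shift operations and bit concatenation are used to implement high-bit-width addition, which reduces resource usage. For example, a 32×16 multiplier needs only two 16×16 multipliers and one 32+16 adder. A 32×32 multiplier needs only four 16×16 multipliers, three 32+16 adders and one 16+16 adder. Of course, 8-bit multipliers could also be used to perform 8×16, 16×16, 16×32, 32×32 and similar calculations.
Further, according to the technical solution of the present disclosure, several N×N multipliers can be used to implement a (2N)×N or (2N)×(2N) multiplier. For example, one (2N)×N multiplier requires two N×N multipliers and one (2N)+N adder, while one (2N)×(2N) multiplier requires four N×N multipliers, three (2N)+N adders and one N+N adder.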
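One way to reach these adder counts is sketched below. It is an assumed arrangement chosen to match the stated numbers (two multipliers and one (2N)+N adder; four multipliers, three (2N)+N adders and one N+N adder), not necessarily the wiring of Fig. 3. Low-order bits are concatenated rather than added, which is what keeps every adder at 2N bits or less. The function names are hypothetical.

```python
def mul_2n_by_n(a, b, n=16):
    """Unsigned (2N-bit a) x (N-bit b): two N x N multipliers, one (2N)+N adder."""
    mask = (1 << n) - 1
    p_lo = (a & mask) * b       # multiplier 0: A_lo * B, a 2N-bit product
    p_hi = (a >> n) * b         # multiplier 1: A_hi * B, a 2N-bit product
    s = p_hi + (p_lo >> n)      # (2N)+N adder; the sum still fits in 2N bits
    # The low N bits of p_lo pass straight through by concatenation.
    return (s << n) | (p_lo & mask)


def mul_2n_by_2n(a, b, n=16):
    """Unsigned 2N x 2N: four N x N multipliers, three (2N)+N adders, one N+N adder."""
    mask = (1 << n) - 1
    a_hi, a_lo = a >> n, a & mask
    b_hi, b_lo = b >> n, b & mask
    p0 = a_lo * b_lo            # weight 2^0
    p1 = a_hi * b_lo            # weight 2^N
    p2 = a_lo * b_hi            # weight 2^N
    p3 = a_hi * b_hi            # weight 2^(2N)
    s1 = p1 + (p0 >> n)         # (2N)+N adder 1
    s2 = p2 + (s1 & mask)       # (2N)+N adder 2
    t = (s1 >> n) + (s2 >> n)   # N+N adder; its carry-out makes t up to N+1 bits
    s3 = p3 + t                 # (2N)+N adder 3, assumed to accept that carry bit
    # Result bits: [N-1:0] from p0, [2N-1:N] from s2, [4N-1:2N] from s3.
    return (s3 << (2 * n)) | ((s2 & mask) << n) | (p0 & mask)


if __name__ == "__main__":
    x, y = 0x12345678, 0x9ABCDEF0
    print(mul_2n_by_2n(x, y) == x * y)
    print(mul_2n_by_n(x, 0xBEEF) == x * 0xBEEF)
```

Each intermediate sum stays below 2^(2N) because an N×N product is at most (2^N - 1)^2, which leaves room for one extra N-bit addend.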
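Fig. 4 is a block diagram showing a data processing system according to an embodiment of the present disclosure. As shown in Fig. 4, the data processing system 400 includes an instruction decoding unit 410 and a data processing circuit 420. The operation of each component of the data processing system 400 is described in detail below.
The instruction decoding unit 410 is configured to obtain an instruction and decode it to obtain control information for the data processing circuit 420. As described above, the control information may indicate at least one of the following operating modes: N-bit multiplication (N-bit × N-bit) or 2N-bit multiplication (including N-bit × 2N-bit and 2N-bit × 2N-bit).
Optionally, if a new instruction conflicts with the current instruction, the instruction decoding unit 410 does not decode the current instruction. For example, when the new instruction and the current instruction would, after decoding, feed data into any one multiplier more than once within a set threshold time, or when the multipliers required exceed the idle multipliers in the data processing system, the instruction decoding unit 410 does not decode the current instruction, or delays decoding it.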
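The two conflict conditions can be expressed as a small predicate. In this sketch each decoded instruction is represented as a dictionary mapping a multiplier index to the time at which it receives input; that representation, the function name `has_conflict` and the time unit are all assumptions made for illustration.

```python
def has_conflict(current, new, free, threshold):
    """Return True if `new` conflicts with `current`.

    current, new: dict mapping multiplier index -> input time (hypothetical model)
    free:         number of idle multipliers in the system
    threshold:    minimum spacing between two inputs to the same multiplier
    """
    # Condition 2: the new instruction needs more multipliers than are idle.
    if len(new) > free:
        return True
    # Condition 1: the same multiplier is fed twice within the threshold time.
    for mul_id, t_new in new.items():
        t_cur = current.get(mul_id)
        if t_cur is not None and abs(t_new - t_cur) < threshold:
            return True
    return False


if __name__ == "__main__":
    running = {0: 10, 1: 10}
    print(has_conflict(running, {2: 11, 3: 11}, free=2, threshold=4))
    print(has_conflict(running, {1: 12}, free=2, threshold=4))
```

A decoder built around such a check could either drop the conflicting instruction or retry it later, corresponding to the two reactions described in the text.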
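The data processing circuit 420 is similar to the data processing circuit described above (for example, the data processing circuit 100 shown in Fig. 1), so its detailed description is omitted here.
In this embodiment, decoding instructions with the instruction decoding unit 410 efficiently generates control information for the data processing circuit 420. In addition, the instruction decoding unit 410 provides error detection: when an instruction conflict occurs, it avoids executing an erroneous instruction, which ensures that the data processing circuit 420 works correctly.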
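Fig. 5 is a block diagram showing details of the data processing system of Fig. 4. As shown in Fig. 5, the data processing system can be divided into a control path and a data path. The control path generates the control information for the data path through the joint configuration of instruction driving and control registers (performed, for example, by the instruction decoder (Instrc_decoder) shown in Fig. 5). The data path performs the corresponding arithmetic functions.
Specifically, the control path may include control-register port control logic and an instruction path; this combination effectively reduces the number of instructions. The control-register port control logic generates control information for the data path by parsing the control registers. As shown in Fig. 5, the data path may include a load unit, a computing unit and a store unit. The load unit fetches data from the data port. The data and control signals loaded by the load unit are fed to the computing unit. The computing unit (represented in Fig. 5 by the preprocessing (P_Proc) module and MAC0…MAC31) performs the data calculations. The store unit stores the results of the computing unit to provide the output.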
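Fig. 6 is a block diagram showing details of the data processing system of Fig. 4. As shown in Fig. 6, the instruction assort-and-update (Instr_assort_update) module is the overall control module of the instruction path. It receives the instruction load (Instr_load) signal and decides, based on the current state, whether to accept a new instruction.
If the currently executing instruction conflicts with the new instruction, the Instr_assort_update module rejects the current instruction and generates an instruction conflict signal (instr_conflict).
If no instruction conflict occurs, the Instr_assort_update module generates control signals for instruction decoder 0 and instruction decoder 1 based on the instruction state and the instruction information obtained from the instruction bus (Instr_bus) interface.
Instruction decoder 0 and instruction decoder 1 decode the instructions to generate control information.
The control signal assort (Control signal assort) module merges the control information and ultimately controls the data paths (Data_path_M0…Data_path_M2 in Fig. 6).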
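Fig. 7 is a flowchart showing a data processing method 70 according to an embodiment of the present disclosure.
In step S710, inputs are provided to a plurality of N-bit multipliers, where N = 2^n and n is a natural number greater than 0.
In step S720, a calculation is performed using adders and the plurality of N-bit multipliers. The adders include an N-bit adder and a 2N-bit adder.
In step S730, the final calculation result is output.
Optionally, at least one of the plurality of N-bit multipliers is configured so that the configured N-bit multiplier can operate according to control information. The control information may indicate at least one of the following operating modes: N-bit multiplication or 2N-bit multiplication.
Optionally, an instruction may be obtained and decoded to obtain the control information. If a new instruction conflicts with the current instruction, the current instruction is not decoded. For example, the current instruction is not decoded when data would be fed into the same multiplier at the same time, or when the multipliers required exceed the idle multipliers.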
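In addition, other embodiments of the present disclosure provide a processor that may include the data processing system described above (for example, the data processing system 400 shown in Fig. 4). For example, the processor may be designed as a digital signal processor (DSP), which can be applied in many scenarios, including but not limited to machine vision processing and image signal processing.
With the above embodiments of the present disclosure, high-bit-width multiplication can be implemented by shifting and splicing low-bit-width multiplications. This simplifies the circuit structure and reduces hardware resource usage.
The method and related devices of the present disclosure have been described above in connection with preferred embodiments. Those skilled in the art will understand that the method shown above is only exemplary. The method of the present disclosure is not limited to the steps and order shown above.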
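It should be understood that the above embodiments of the present disclosure can be implemented in software, in hardware, or in a combination of both. In addition, the embodiments disclosed herein can be implemented as a computer program product. More specifically, the computer program product has a computer-readable medium encoded with computer program logic that, when executed on a computing device, provides the operations needed to implement the above technical solution of the present disclosure. When executed on at least one processor of a computing system, the computer program logic causes the processor to perform the operations (methods) described in the embodiments of the present disclosure.
The arrangements of the present disclosure are typically provided as software, code and/or other data structures arranged or encoded on a computer-readable medium such as an optical medium (e.g., a CD-ROM), a floppy disk or a hard disk, or on other media such as firmware or microcode on one or more ROM, RAM or PROM chips, or as downloadable software images, shared databases and the like in one or more modules. The software or firmware or such configurations can be installed on a computing device so that one or more processors in the computing device carry out the technical solutions described in the embodiments of the present disclosure.
In addition, each functional module or feature of the devices used in each of the above embodiments can be implemented or executed by circuits, typically one or more integrated circuits. Circuits designed to perform the functions described in this specification may include a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC) or general-purpose integrated circuit, a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these. A general-purpose processor may be a microprocessor, or the processor may be an existing processor, controller, microcontroller or state machine. The general-purpose processor or each circuit may be configured by digital circuits or by logic circuits. Furthermore, if advances in semiconductor technology produce an advanced technology capable of replacing current integrated circuits, the present disclosure may also use integrated circuits obtained with that technology.
A program running on a device according to the present disclosure may be a program that causes a computer to implement the functions of the embodiments of the present disclosure by controlling a central processing unit (CPU). The program, or the information it processes, may be stored temporarily in volatile memory (such as random access memory, RAM), a hard disk drive (HDD), non-volatile memory (such as flash memory) or another memory system. A program implementing the functions of the embodiments of the present disclosure may be recorded on a computer-readable recording medium. The corresponding functions can be realized by having a computer system read and execute the programs recorded on the recording medium. The "computer system" here may be a computer system embedded in the device and may include an operating system or hardware (such as peripheral devices). The "computer-readable recording medium" may be a semiconductor recording medium, an optical recording medium, a magnetic recording medium, a recording medium that stores a program dynamically for a short time, or any other computer-readable recording medium.
The embodiments of the present disclosure have been described in detail above with reference to the drawings. However, the specific structure is not limited to the above embodiments, and the present disclosure also covers any design change that does not depart from its gist. In addition, various changes can be made to the present disclosure within the scope of the claims, and embodiments obtained by appropriately combining the technical means disclosed in different embodiments are also within the technical scope of the present disclosure. Furthermore, components described in the above embodiments that have the same effect may be substituted for one another.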

Claims (18)

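  1. A data processing circuit, comprising:
    a computing unit including adders and a plurality of N-bit multipliers;
    an input unit configured to provide inputs to the multipliers; and
    an output unit configured to output a calculation result of the computing unit;
    wherein the adders include an N-bit adder and a 2N-bit adder, N = 2^n, and n is a natural number greater than 0.
  2. The data processing circuit according to claim 1, further comprising:
    a configuration unit configured to configure at least one of the plurality of N-bit multipliers so that the configured N-bit multiplier can operate according to control information.
  3. The data processing circuit according to claim 2, wherein the input unit is configured to generate the multiplier inputs according to the control information.
  4. The data processing circuit according to claim 2, wherein the output unit is configured to output the calculation result of the computing unit according to the control information.
  5. The data processing circuit according to any one of claims 2-4, wherein the control information indicates at least one of the following operating modes: N-bit multiplication or 2N-bit multiplication.
  6. The data processing circuit according to claim 1, wherein the plurality of N-bit multipliers includes two N-bit multipliers, and the adders include one 2N-bit adder.
  7. The data processing circuit according to claim 1, wherein the plurality of N-bit multipliers includes four N-bit multipliers, and the adders include one N-bit adder and three 2N-bit adders.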
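  8. A data processing system, comprising:
    the data processing circuit according to any one of claims 1-7; and
    an instruction decoding unit configured to obtain an instruction and decode the instruction to obtain control information for the data processing circuit.
  9. The data processing system according to claim 8, wherein the instruction decoding unit is further configured not to decode the current instruction if a new instruction conflicts with the current instruction.
  10. The data processing system according to claim 9, wherein the new instruction conflicting with the current instruction comprises: the new instruction and the current instruction, after decoding, feeding data into any one multiplier more than once within a set threshold time, or the multipliers required exceeding the idle multipliers in the data processing system.
  11. The data processing system according to claim 8, wherein the control information indicates at least one of the following operating modes: N-bit multiplication or 2N-bit multiplication.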
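  12. A method for processing data, comprising:
    providing inputs to a plurality of N-bit multipliers;
    performing a calculation using adders and the plurality of N-bit multipliers; and
    outputting a final calculation result;
    wherein the adders include an N-bit adder and a 2N-bit adder, N = 2^n, and n is a natural number greater than 0.
  13. The method according to claim 12, further comprising:
    configuring at least one of the plurality of N-bit multipliers so that the configured N-bit multiplier can operate according to control information.
  14. The method according to claim 13, wherein the control information indicates at least one of the following operating modes: N-bit multiplication or 2N-bit multiplication.
  15. The method according to claim 13, further comprising:
    obtaining an instruction and decoding the instruction to obtain the control information.
  16. The method according to claim 15, wherein, if a new instruction conflicts with the current instruction, the current instruction is not decoded.
  17. The method according to claim 16, wherein the new instruction conflicting with the current instruction comprises: the new instruction and the current instruction, after decoding, feeding data into any one multiplier more than once within a set threshold time, or the multipliers required exceeding the idle multipliers.
  18. A processor, comprising the data processing system according to any one of claims 8-11.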
PCT/CN2017/095334 2017-07-31 2017-07-31 Data processing method and device WO2019023910A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/095334 WO2019023910A1 (zh) 2017-07-31 2017-07-31 Data processing method and device
CN201780004422.8A CN108475188A (zh) 2017-07-31 2017-07-31 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2019023910A1 true WO2019023910A1 (zh) 2019-02-07

Family

ID=63266457

Country Status (2)

Country Link
CN (1) CN108475188A (zh)
WO (1) WO2019023910A1 (zh)

Also Published As

Publication number Publication date
CN108475188A (zh) 2018-08-31


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17919971

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17919971

Country of ref document: EP

Kind code of ref document: A1