WO2021120646A1 - Data processing system - Google Patents

Info

Publication number
WO2021120646A1
WO2021120646A1 PCT/CN2020/108985 CN2020108985W WO2021120646A1 WO 2021120646 A1 WO2021120646 A1 WO 2021120646A1 CN 2020108985 W CN2020108985 W CN 2020108985W WO 2021120646 A1 WO2021120646 A1 WO 2021120646A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
register
result
module
arithmetic
Prior art date
Application number
PCT/CN2020/108985
Other languages
French (fr)
Chinese (zh)
Inventor
蒋文
Original Assignee
深圳云天励飞技术股份有限公司
Priority date
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司
Publication of WO2021120646A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/262 Analysis of motion using transform domain methods, e.g. Fourier domain methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20048 Transform domain processing
    • G06T2207/20056 Discrete and fast Fourier transform, [DFT, FFT]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the technical field of image processing, in particular to a data processing system.
  • the calculation of the correlation between the feature values of two frames is usually implemented in software (that is, a general-purpose processor such as an ARM core, or a digital signal processor (DSP), executes software program instructions); however, when the correlation between the feature values of two frames is calculated purely in software, the calculation efficiency is low.
  • the embodiment of the present invention provides a data processing system, which improves the efficiency of calculating the correlation between the first feature value and the second feature value through cooperation between multiple hardware modules.
  • the embodiment of the present invention provides a data processing system, which includes a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module;
  • the data transmission module is configured to receive a first feature value and a second feature value, each of the first feature value and the second feature value including D × W × H feature vectors, where D stands for dimension, W stands for width, H stands for height, and D, W, and H are all positive integers;
  • the control module is configured to use the acquired W × H feature vectors of the i-th dimension of the first feature value as a first operation value, control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the first operation value to obtain W × H first results, and control the arithmetic module to perform a fast Fourier transform on each column of the first results to obtain H × W second results, where 1 ≤ i ≤ D;
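  • the control module is further configured to use the acquired W × H feature vectors of the i-th dimension of the second feature value as a second operation value, control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the second operation value to obtain W × H third results, and control the arithmetic module to perform a fast Fourier transform on each column of the third results to obtain H × W fourth results;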
  • the storage control module is configured to perform a conjugate multiplication operation on the second results and the fourth results to obtain W × H fifth results;
  • the control module is also configured to control the arithmetic module to perform an inverse fast Fourier transform on each row of feature vectors of the fifth results to obtain W × H sixth results, and to control the arithmetic module to perform an inverse fast Fourier transform on each column of the sixth results to obtain H × W seventh results;
  • the storage control module is further configured to accumulate the real parts at the same row and the same column of the seventh results of each of the D dimensions to obtain W × H eighth results, and use the eighth results as the degree of correlation between the first feature value and the second feature value;
  • the storage control module is further configured to control the storage module to store the first to seventh results and the correlation degree.
  • the arithmetic module includes M butterfly arithmetic units, and each butterfly arithmetic unit performs fast Fourier transform or inverse fast Fourier transform on the received data.
  • the control module includes an arithmetic controller, a register unit, and a data selection unit.
  • the register unit is used to register the data to be calculated;
  • the arithmetic controller is used to control the data selection unit, according to the magnitude relationship between W and M or between H and M, to select corresponding data from the register unit and output the selected data to the M butterfly operation units.
  • the arithmetic controller is further configured to control the data selection unit to select W data from the register unit if W is less than 2M, and output the selected data to the M butterfly operation units;
  • the arithmetic controller is further configured to control the data selection unit to select 2M data from the register unit each time if W is greater than or equal to 2M, and output the selected data to the M butterfly operation units;
  • the arithmetic controller is further configured to control the data selection unit to select H data from the register unit if H is less than 2M, and output the selected data to the M butterfly operation units;
  • the arithmetic controller is further configured to control the data selection unit to select 2M data from the register unit each time if H is greater than or equal to 2M, and output the selected data to the M butterfly operation units.
  • the data selection unit includes data selection arbitration logic and data selection logic;
  • the data selection arbitration logic determines the rotation factor according to the number of data selected from the register unit each time and the current stage where the butterfly operation unit performs fast Fourier transform or inverse fast Fourier transform;
  • the data selection logic determines the serial numbers of the two data input to each butterfly operation unit of the M butterfly operation units according to the rotation factor.
  • the register unit includes two register sets, each register set includes X register groups, one register group includes P registers, and the P registers in one register group are used to store one row or one column of feature vectors, where both X and P are positive integers;
  • the arithmetic controller is configured to store X rows or columns of feature vectors in one of the two register sets;
  • the arithmetic controller is also used to control the data selection unit to select corresponding data from that register set and output the selected data to the M butterfly arithmetic units, while at the same time the arithmetic controller controls the other of the two register sets to store feature vectors other than those X rows or columns.
  • the system further includes a register interface;
  • the register interface is used to obtain register configuration information, and transmit the register configuration information to the operation controller;
  • the arithmetic controller is used for storing feature vectors in the M register sets according to the register configuration information.
  • the storage module includes a first memory, a second memory, and a third memory;
  • the first memory is used to store the first result, the third result, and the sixth result
  • the second memory is used to store the second result, the fourth result, the fifth result, and the seventh result
  • the third memory is used to store the eighth result.
  • the system further includes a task controller, and the task controller is configured to send a task start signal to the arithmetic controller when a correlation calculation instruction is detected, where the task start signal is used to instruct the arithmetic controller to perform a fast Fourier transform or an inverse fast Fourier transform on the input data.
  • the data processing system includes a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module. These five modules are all hardware modules, and they cooperate with one another to calculate the correlation degree. The calculation is performed in units of rows or columns. Therefore, compared with pure software calculation, the calculation efficiency of the data processing system is higher.
  • Fig. 1 is a functional block diagram of a data processing system provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a first characteristic value and a second characteristic value provided by an embodiment of the present invention.
  • Fig. 3 is a functional block diagram of an arithmetic module provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a butterfly operation unit provided by an embodiment of the present invention.
  • Fig. 5 is a functional block diagram of another data processing system provided by an embodiment of the present invention.
  • Fig. 6 is a schematic diagram of an FFT operation provided by an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a register unit provided by an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of each storage unit in the storage module provided by an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a pipeline processing provided by an embodiment of the present invention.
  • FIG. 1 is a functional block diagram of a data processing system according to an embodiment of the present invention.
  • the data processing system may include a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module.
  • the data transmission module, the control module, the calculation module, the storage control module, and the storage module are all hardware modules.
  • the data transmission module is configured to receive a first feature value F1 and a second feature value F2.
  • the first feature value F1 and the second feature value F2 each include D × W × H feature vectors, where D stands for dimension, W stands for width, H stands for height, and D, W, and H are all positive integers.
  • the control module is configured to use the acquired W × H feature vectors of the i-th dimension of the first feature value F1 as the first operation value, control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the first operation value to obtain W × H first results, and control the arithmetic module to perform a fast Fourier transform on each column of the first results to obtain H × W second results, where 1 ≤ i ≤ D.
  • that is, the control module first performs a fast Fourier transform on each row of feature vectors to obtain W × H first results, and then performs a fast Fourier transform on each column of the first results to obtain H × W second results.
  • the control module is further configured to use the acquired W × H feature vectors of the i-th dimension of the second feature value F2 as a second operation value, control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the second operation value to obtain W × H third results, and control the arithmetic module to perform a fast Fourier transform on each column of the third results to obtain H × W fourth results;
  • that is, the control module first performs a fast Fourier transform on each row of feature vectors to obtain W × H third results, and then performs a fast Fourier transform on each column of the third results to obtain H × W fourth results.
  • the storage control module is configured to perform a conjugate multiplication operation on the second results and the fourth results to obtain W × H fifth results.
  • the control module is also configured to control the arithmetic module to perform an inverse fast Fourier transform on each row of feature vectors of the fifth results to obtain W × H sixth results, and to control the arithmetic module to perform an inverse fast Fourier transform on each column of the sixth results to obtain H × W seventh results.
  • the storage control module is further configured to accumulate the real parts at the same row and the same column of the seventh results of each of the D dimensions to obtain W × H eighth results, and use the eighth results as the degree of correlation between the first feature value and the second feature value.
  • the storage control module is further configured to control the storage module to store the first to seventh results and the correlation degree.
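The sequence of first to eighth results described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware design: the function names (`fft`, `fft2`, `correlation`) are hypothetical, W and H are assumed to be powers of two, the results are kept in H × W orientation throughout, and the choice of which operand is conjugated in the conjugate multiplication is an assumption because the text does not specify it.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. No 1/N scaling."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft2(plane, inverse=False):
    """Transform every row, then every column, of an H x W plane."""
    h, w = len(plane), len(plane[0])
    rows = [fft(r, inverse) for r in plane]              # first / third / sixth results
    cols = [fft(list(c), inverse) for c in zip(*rows)]   # second / fourth / seventh results
    scale = h * w if inverse else 1                      # apply 1/(H*W) once for the inverse
    return [[cols[x][y] / scale for x in range(w)] for y in range(h)]

def correlation(f1, f2):
    """f1, f2: D planes of H x W values. Returns the H x W eighth results."""
    h, w = len(f1[0]), len(f1[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for p1, p2 in zip(f1, f2):                           # one pass per dimension i
        second = fft2(p1)
        fourth = fft2(p2)
        # Assumption: the fourth result is the conjugated operand.
        fifth = [[a * b.conjugate() for a, b in zip(ra, rb)]
                 for ra, rb in zip(second, fourth)]
        seventh = fft2(fifth, inverse=True)
        for y in range(h):
            for x in range(w):
                acc[y][x] += seventh[y][x].real          # accumulate real parts
    return acc
```

Under these assumptions, the value at row y and column x equals the circular cross-correlation of the two feature values at that shift, summed over the D dimensions.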
  • the arithmetic module may include M butterfly arithmetic units, and M is a positive integer.
  • Each butterfly operation unit is used to perform fast Fourier transform or inverse fast Fourier transform on the received data.
  • each butterfly operation unit includes two input interfaces for receiving input data input 1 and input 2 respectively.
  • Input 2 and the twiddle factor are multiplied by the multiplier to get the rotated input 2.
  • Input 1 and the rotated input 2 are added by the adder to get the result 1.
  • Input 1 and rotated input 2 are subtracted by the subtractor to get the result 2.
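The multiply, add, and subtract steps of one butterfly operation unit can be modelled in a few lines. This is a minimal sketch: the function names are hypothetical, and the twiddle factor is assumed to follow the usual convention exp(-2πjk/N) for the forward transform and exp(+2πjk/N) for the inverse transform, which the text does not state explicitly.

```python
import cmath

def twiddle(k, n, inverse=False):
    """Twiddle factor W_N^k (assumed convention: negative exponent for the FFT)."""
    sign = 1 if inverse else -1
    return cmath.exp(sign * 2j * cmath.pi * k / n)

def butterfly(in1, in2, w):
    """One butterfly: rotate input 2 by the twiddle factor, then add and subtract."""
    rotated = in2 * w           # multiplier
    result1 = in1 + rotated     # adder
    result2 = in1 - rotated     # subtractor
    return result1, result2
```

For instance, a single butterfly with W_2^0 = 1 computes a complete 2-point transform: `butterfly(3, 5, twiddle(0, 2))` yields the sum and difference of the two inputs.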
  • the M butterfly arithmetic units include 2M input interfaces and can process one stage of a 2M-point FFT (Fast Fourier Transform) or IFFT (Inverse Fast Fourier Transform) operation.
  • For an X-point FFT or IFFT operation, when X ≤ 2M, the arithmetic module can process one stage of the X-point FFT or IFFT operation in one clock cycle; when X > 2M and X is Y times 2M, the arithmetic module can process one stage of the X-point FFT or IFFT operation in Y clock cycles.
  • 2M points mentioned above mean that the number of feature vectors that can be input by the arithmetic module is 2M.
  • For example, when M is 4, the arithmetic module can complete one stage of an 8-point FFT or IFFT operation in one clock cycle.
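The relationship between the number of points X, the number of butterfly units M, and the clock cycles needed per stage can be written as a small helper. This is a sketch only: the function name is hypothetical, and rounding up for values of X that are not an exact multiple of 2M is an assumption, since the text covers only X ≤ 2M and X equal to Y times 2M.

```python
def cycles_per_stage(x, m):
    """Clock cycles needed to process one stage of an x-point FFT/IFFT
    with m butterfly units (2*m inputs per cycle)."""
    capacity = 2 * m
    return -(-x // capacity)  # ceiling division
```

With M = 4, an 8-point stage takes one cycle and a 16-point stage takes two.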
  • FIG. 5 is a functional block diagram of another data processing system according to an embodiment of the present invention.
  • the data processing system may include a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module.
  • the functions of the data transmission module, the control module, the arithmetic module, the storage control module, and the storage module are similar to those of the corresponding modules in the embodiment provided in FIG. 1, and are not repeated here.
  • the data transmission module can be connected to an external memory to transmit the image feature values stored in the external memory to the control module, where the image feature values can include a first feature value F1 of the first image and a second feature value F2 of the second image.
  • Each of the first feature value F1 and the second feature value F2 may include multiple feature vectors. As shown in FIG. 2, each feature value may include D × W × H feature vectors, where D stands for dimension, W stands for width, and H stands for height.
  • the data transmission module transmits the first characteristic value and the second characteristic value from the external memory to the control module, which can facilitate subsequent calculations, thereby greatly improving calculation efficiency.
  • the control module may include an arithmetic controller, a register unit, and a data selection unit.
  • the register unit is used to register the data to be operated on, and the arithmetic controller is used to control the data selection unit, according to the magnitude relationship between W and M or between H and M, to select corresponding data from the register unit and output the selected data to the M butterfly operation units. For example, if W is less than 2M, the data selection unit is controlled to select W data from the register unit and output the selected data to the M butterfly operation units.
  • For example, if W is 4 and the arithmetic module integrates 4 butterfly arithmetic units, which can process an 8-point FFT at the same time, then W is less than 8 and the processing can be completed in one time period: the 4 feature vectors are input directly to the arithmetic module for operation. It can be understood that the feature vectors input to the arithmetic module occupy only two butterfly operation units.
  • If W is greater than or equal to 2M, the data selection unit is controlled to select 2M data from the register unit each time, and the selected data is output to the M butterfly operation units.
  • For example, if W is 16 and the arithmetic module integrates 4 butterfly arithmetic units, which can process an 8-point FFT at the same time, then W is greater than 2M, so all W data cannot be input at once; only 2M feature vectors can be input to the arithmetic module for operation at a time.
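The selection rule above (take all W data when W is less than 2M, otherwise take 2M data at a time) can be illustrated as follows. The function names are hypothetical and the sketch models only how many items are handed to the butterfly units per selection, not which items are paired.

```python
def select_batches(data, m):
    """Split one row or column into the groups handed to m butterfly units."""
    width = 2 * m
    if len(data) < width:
        return [list(data)]                      # W < 2M: select all W data at once
    return [list(data[i:i + width])              # W >= 2M: select 2M data each time
            for i in range(0, len(data), width)]

def units_occupied(batch):
    """Each butterfly unit consumes two inputs."""
    return len(batch) // 2
```

With M = 4, a row of 4 values forms a single batch that occupies two butterfly units, while a row of 16 values is split into two batches of 8.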
  • the data selection unit includes data selection arbitration logic and data selection logic.
  • the data selection arbitration logic determines the rotation factor according to the number of data selected from the register unit each time and the current stage where the butterfly operation unit performs fast Fourier transform or inverse fast Fourier transform.
  • the data selection logic determines the serial numbers of the two data input to each butterfly operation unit of the M butterfly operation units according to the rotation factor, and inputs the corresponding butterfly operation unit according to the serial number.
  • In the twiddle (rotation) factor W_N^k, W represents the weight, N represents the number of FFT or IFFT points (generally 4 points, 8 points, 16 points, etc.), and k represents the number (index) of the weight. It can be seen from the schematic diagram of the butterfly operation process that the values of N and k in the twiddle factor W_N^k are related to the number of points involved in the FFT operation (the value of N) and to the current stage of the fast Fourier transform or inverse fast Fourier transform performed by the butterfly operation unit.
  • the data selection arbitration logic is based on the number of data selected from the register unit each time (that is, the number of points involved in the FFT operation), and the current level of the butterfly operation unit performing fast Fourier transform or inverse fast Fourier transform. Determine the rotation factor, that is, determine the size of N and k.
  • the data selection logic can then determine the serial numbers of the two data items input to each of the M butterfly operation units. The serial numbers can be obtained by numbering the data sequentially as it is taken out of the register unit or, for an intermediate result of the FFT operation, by numbering the intermediate results sequentially after each stage of the operation. For example, the numbers are 0-7 for an 8-point FFT operation and 0-15 for a 16-point FFT operation. The data input to each butterfly operation unit is determined according to these serial numbers.
  • For example, when the point spacing is 4 (where k has four values), serial numbers 0 and 4, 1 and 5, 2 and 6, and 3 and 7 are selected as pairs.
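How the stage number determines both the point spacing and the twiddle exponent k can be sketched for a radix-2 decimation-in-time FFT. The decimation-in-time structure is an assumption (it matches a butterfly that multiplies input 2 by the twiddle factor before the add and subtract), and the function name is hypothetical.

```python
def stage_pairs(n, stage):
    """Return (upper, lower, k) triples for one stage of an n-point
    radix-2 decimation-in-time FFT. `stage` is 1-based; the twiddle
    factor applied to the lower input is W_n^k."""
    half = 1 << (stage - 1)        # point spacing in this stage
    step = n // (2 * half)         # stride of the twiddle exponent
    pairs = []
    for start in range(0, n, 2 * half):
        for j in range(half):
            pairs.append((start + j, start + j + half, j * step))
    return pairs
```

For an 8-point transform, stage 3 has a point spacing of 4 and pairs serial numbers 0 and 4, 1 and 5, 2 and 6, and 3 and 7, with k taking the four values 0 to 3.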
  • the above-mentioned register unit may include a first register set and a second register set.
  • the first register set and the second register set each include X register groups, and one register group includes P registers, where both X and P are positive integers. The P registers in a register group are used to store one row or one column of feature vectors, so one register set can store X rows or columns of feature vectors.
  • the arithmetic controller is used to control the first register set to store X rows or columns of feature vectors;
  • the arithmetic controller is also used to control the data selection unit to select corresponding data from the first register set and output the selected data to the M butterfly arithmetic units, while at the same time the arithmetic controller controls the second register set to store feature vectors other than those X rows or columns. For example, if the first register set stores rows or columns 1 to X of the feature vectors, the second register set stores rows or columns X+1 to 2X.
  • the register unit may include two register sets, where the first register set includes register groups 1-4 and the second register set includes register groups 5-8.
  • a register group can be used to store a row of feature vectors; for example, register groups 1-4 are used to store rows 1-4 of the feature value F1, and register groups 5-8 are used to store rows 5-8 of the feature value F1.
  • register groups 1-4 and register groups 5-8 can operate in ping-pong fashion: while the feature values of rows 1-4 are being processed, the feature values of rows 5-8 can be stored in the second register set. After the feature values of rows 1-4 have been processed, the feature values of rows 5-8 are already stored in the second register set and can be processed directly. This removes the waiting time for storing the feature values of rows 5-8 into the second register set, thereby greatly improving processing efficiency.
  • register groups 1-4 can also be used to store rows 9-12, 17-20, and so on, and register groups 5-8 can also be used to store rows 13-16, 21-24, and so on.
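The ping-pong assignment of rows to register sets and register groups can be expressed as a simple mapping. This is an illustrative sketch with a hypothetical function name; it assumes four register groups per set, as in the example above.

```python
def register_slot(row, groups_per_set=4):
    """Map a 1-based row number to (register set, register group)."""
    block, offset = divmod(row - 1, groups_per_set)
    reg_set = 1 + block % 2                          # alternate between set 1 and set 2
    group = 1 + (reg_set - 1) * groups_per_set + offset
    return reg_set, group
```

Rows 1-4, 9-12, and 17-20 land in register groups 1-4 of the first set, while rows 5-8, 13-16, and 21-24 land in register groups 5-8 of the second set, so one set can be filled while the other is being processed.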
  • the storage control module is configured to perform a conjugate multiplication operation on the second results and the fourth results obtained by the above calculation to obtain W × H fifth results.
  • the control module is also configured to control the arithmetic module to perform an inverse fast Fourier transform on each row of feature vectors of the fifth results to obtain W × H sixth results, and to control the arithmetic module to perform an inverse fast Fourier transform on each column of the sixth results to obtain H × W seventh results.
  • the way the arithmetic module performs the inverse fast Fourier transform on each row of feature vectors of the W × H fifth results, and on each column of the W × H sixth results to obtain the H × W seventh results, is similar to the row-direction fast Fourier transform process and the column-direction fast Fourier transform process of the foregoing embodiment, and is not repeated here.
  • the storage control module is further configured to accumulate the real parts at the same row and the same column of the seventh results of each of the D dimensions to obtain W × H eighth results, and use the eighth results as the correlation degree between the first feature value and the second feature value. The correlation degree can be used in the calculation of a kernel correlation filtering algorithm (Kernel Correlation Filter, KCF), that is, to calculate the correlation between the current image frame and the previous image frame.
  • each time the H × W seventh results of one dimension are calculated, they are accumulated, at the same row and the same column, into the accumulated result that has already been stored.
  • once the H × W seventh results of all D dimensions have been accumulated, the W × H eighth results are obtained, and the eighth results are used as the correlation degree between the first feature value and the second feature value.
  • the storage control module is further configured to control the storage module to store the first to seventh results and the correlation degree.
  • the storage module includes a first memory, a second memory, and a third memory.
  • the first memory is used to store the first result, the third result, and the sixth result.
  • the second memory is used to store the second result, the fourth result, the fifth result, and the seventh result.
  • the third memory is used to store the eighth result.
  • the first result obtained after the fast Fourier transform is performed on each row of feature vectors of the first feature value F1 is stored in the first memory. The arithmetic controller then reads the first result from the first memory, controls the arithmetic module to perform a fast Fourier transform on each column of the first result to obtain the second result, and stores the second result in the second memory.
  • the third result obtained by performing the fast Fourier transform on each row of the second feature value can be stored in the first memory. It should be noted that this can be overwriting storage, that is, the third result overwrites the first result, or it can be stored without overwriting.
  • the arithmetic controller reads the third result from the first memory, and controls the arithmetic module to perform fast Fourier transform on each column of the third result to obtain the fourth result.
  • the fourth result can be stored in the second memory, but it must not overwrite the second result, because the second result stored in the second memory is still needed for the conjugate multiplication operation.
  • the storage control module controls the second result and the fourth result to perform conjugate multiplication to obtain the fifth result, and store the fifth result in the second memory.
  • the fifth result is further read from the second memory, and the inverse fast Fourier transform is performed on each row of the fifth result to obtain the sixth result. Store the sixth result in the first memory.
  • the sixth result is further read from the first memory, and the inverse fast Fourier transform is performed on each column of the sixth result to obtain the seventh result.
  • the seventh result is stored in the second memory, and the seventh result is further read from the second memory.
  • the real parts of the same row and the same column of the seventh result of all dimensions are added to obtain the eighth result, which is the degree of correlation.
  • the correlation degree is stored in the third memory. For example, for feature values with two dimensions, two rows, and two columns, the real parts of the two second inverse fast Fourier transform results (seventh results) in the first row and first column of the two dimensions are added together, the real parts of the two results in the first row and second column are added together, and so on, giving the sums of the real parts for both rows and both columns.
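The per-dimension accumulation of real parts can be illustrated with the two-dimension, two-row, two-column case. The function name and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def accumulate_real(acc, seventh):
    """Add the real part of each seventh result into the running total
    held at the same row and column."""
    for y, row in enumerate(seventh):
        for x, value in enumerate(row):
            acc[y][x] += value.real
    return acc

# Hypothetical seventh results for two dimensions of a 2 x 2 feature value.
dim1 = [[1 + 2j, 3 - 1j], [0.5 + 0j, -2 + 4j]]
dim2 = [[2 - 2j, 1 + 1j], [1.5 + 3j, 2 + 0j]]

eighth = [[0.0, 0.0], [0.0, 0.0]]
for seventh in (dim1, dim2):
    accumulate_real(eighth, seventh)
```

After both dimensions have been accumulated, `eighth` holds one real sum per row and column position; the imaginary parts are discarded.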
  • each memory (such as the first memory, the second memory, and the third memory) in the storage module may include multiple storage units (such as Bank0-Bank31).
  • Each storage unit has an address number, and each storage unit is used to store one row or one column of data ( Figure 7 shows an example of a row).
  • the number of storage units included in each memory may be the same or different, and the address numbers of the storage units in each memory may be the same or different.
  • each memory includes the same number of storage units and the address numbers of the storage units in each memory are also the same.
  • For example, each memory includes 32 storage units, and the address numbers of the 32 storage units are address 0 to address 31.
  • a storage unit sequentially stores each result in a row. When fast Fourier transform is performed on each column of data, the results with the same address number in multiple storage units are taken out to form a column of data. For example, all the data at address 0 is taken to form the first column of data.
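The row-per-storage-unit layout, and the way a column is assembled by reading the same address from every storage unit, can be sketched as follows. The function names are hypothetical, and each storage unit is modelled as a plain Python list indexed by address.

```python
def store_rows(rows):
    """Storage unit i holds row i; address j within a unit holds element j."""
    return [list(row) for row in rows]

def read_column(banks, address):
    """Form one column by taking the value at the same address in every unit."""
    return [bank[address] for bank in banks]
```

Reading address 0 from every storage unit yields the first column, which is what the column-direction transform consumes after the row-direction results have been written.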
  • each of the above calculations (the fast Fourier transform of each row of data, the fast Fourier transform of each column of data, the inverse fast Fourier transform of each row of data, or the inverse fast Fourier transform of each column of data) needs to go through the following four processes:
  • the arithmetic controller loads the first feature value F1 and the second feature value F2 transmitted by the data transmission module, or the intermediate result output by the arithmetic module, or the first result and the third result output by the storage module, into the register unit;
  • the data selection unit selects the data loaded into the register unit, and transmits the selected data to the arithmetic module;
  • the calculation module performs fast Fourier transform or inverse fast Fourier transform on the selected data, and transmits the calculation result to the storage module;
  • the storage module stores the received calculation results (such as the first to seventh results).
  • Fig. 9 is a schematic diagram of pipeline processing provided by an embodiment of the present invention. As shown in Fig. 9, a cycle includes four sub-cycles, and each sub-cycle corresponds to one of the above-mentioned processes. In the first sub-cycle, the arithmetic controller loads the corresponding data into register group 1. In the second sub-cycle, the data selection unit selects the data loaded into register group 1, while the arithmetic controller loads the corresponding data into register group 2. In the third sub-cycle, the arithmetic module performs FFT or IFFT operations on the data selected from register group 1, while the data selection unit selects the data loaded into register group 2 and the arithmetic controller loads the corresponding data into register group 3. In the fourth sub-cycle, the storage module stores the result of the operation on the data of register group 1 output by the arithmetic module, while the arithmetic module performs FFT or IFFT operations on the data selected from register group 2, the data selection unit selects the data loaded into register group 3, and the arithmetic controller loads the corresponding data into register group 4.
  • each cycle completes the FFT or IFFT operation of one row or one column of data, and each cycle has the participation of 4 register groups, which can process 4 rows or 4 columns of data.
  • the calculation of the fast Fourier transform of the 4 rows of feature values is completed after 4 time periods, yielding the fast Fourier transform results of the 4 rows. It should be noted that, in order to avoid waiting for the feature values of rows 5-8 to be read from outside, that data can be read into the second register set while the fast Fourier transform is being calculated on the feature values of rows 1-4.
  • the number of processes required to calculate the FFT or IFFT operation of one row or one column of data is the same as the number of sub-cycles included in one cycle and the number of register groups included in a register set. For example, when calculating the FFT or IFFT operation of one row or one column of data requires four processes, one cycle includes four sub-cycles, and one register set includes four register groups.
  • the data processing system may also include a register interface.
  • the register interface is used to obtain register configuration information and transmit the register configuration information to the arithmetic controller.
  • the arithmetic controller is configured to store feature vectors in the first register set and the second register set according to the register configuration information.
  • the register configuration information indicates that data is stored in the first register set first, and then when the data stored in the first register set is processed, the data is stored in the second register set.
  • the data processing system may further include a task controller. The task controller is configured to send a task start signal to the arithmetic controller when a correlation calculation instruction is detected, and the task start signal is used to instruct the arithmetic controller to perform fast Fourier transform or inverse fast Fourier transform on the input data.
  • the program can be stored in a computer-readable storage medium. When executed, the program may include the procedures of the above-mentioned method embodiments.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), etc.
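The four-process pipeline of Fig. 9 described in the list above can be sketched as a small scheduling table. This is only an illustration: the stage names, the function `pipeline_schedule`, and the drain sub-cycles after the fourth one are assumptions of the sketch, not details taken from the description.

```python
# Hypothetical stage names for the four processes (load, select, compute, store).
STAGES = ("load", "select", "compute", "store")


def pipeline_schedule(num_groups=4):
    """Return, per sub-cycle, a dict mapping each busy stage to its register group."""
    schedule = []
    total_subcycles = num_groups + len(STAGES) - 1  # includes assumed drain sub-cycles
    for t in range(total_subcycles):
        active = {}
        for s, stage in enumerate(STAGES):
            group = t - s  # 0-based register group occupying stage s at sub-cycle t
            if 0 <= group < num_groups:
                active[stage] = group + 1
        schedule.append(active)
    return schedule


for t, active in enumerate(pipeline_schedule(), start=1):
    print(f"sub-cycle {t}: {active}")
```

In the fourth sub-cycle all four stages are busy at once, each with a different register group, which is the overlap the description relies on.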

Abstract

A data processing system. The data processing system comprises: a data transmission module, a control module, a calculation module, a storage control module, and a storage module. The data transmission module is configured to receive a first feature value and a second feature value, the control module is configured to control the calculation module to calculate data of the first feature value and the second feature value, the storage control module is configured to perform conjugate multiplication operation and perform calculation of correlation degree of the first feature value and the second feature value, and the storage module is configured to store a calculation result. The system completes the calculation of correlation degree by means of cooperation among a plurality of hardware modules, and the calculation efficiency is high.

Description

A data processing system
Technical field
The present invention relates to the technical field of image processing, and in particular to a data processing system.
This application claims priority to Chinese patent application No. 201911296752.6, entitled "A data processing system" and filed with the Chinese Patent Office on December 16, 2019, the entire content of which is incorporated herein by reference.
Background
In target tracking, the correlation between the current frame and the next frame is typically used to detect whether a target is present at the predicted position in the next frame. Computing the correlation degree between the feature value of the current frame and the feature value of the next frame is therefore a key step. At present, this computation is usually implemented in software, that is, by software program instructions executed on a general-purpose processor (such as an ARM processor) or a digital signal processor (DSP). However, computing the correlation degree between the feature values of two frames purely in software is inefficient.
Technical solution
An embodiment of the present invention provides a data processing system that improves the efficiency of computing the correlation degree between a first feature value and a second feature value through cooperation among multiple hardware modules.
An embodiment of the present invention provides a data processing system comprising a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module;
The data transmission module is configured to receive a first feature value and a second feature value, each of which includes D×W×H feature vectors, where D represents the dimension, W represents the width, H represents the height, and D, W, and H are all positive integers;
The control module is configured to take the acquired W×H feature vectors of the i-th dimension of the first feature value as a first operation value, control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the first operation value to obtain W×H first results, and control the arithmetic module to perform a fast Fourier transform on each column of the first results to obtain H×W second results, where 1≤i≤D;
The control module is further configured to take the acquired W×H feature vectors of the i-th dimension of the second feature value as a second operation value, control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the second operation value to obtain W×H third results, and control the arithmetic module to perform a fast Fourier transform on each column of the third results to obtain H×W fourth results;
The storage control module is configured to perform a conjugate multiplication operation on the second results and the fourth results to obtain W×H fifth results;
The control module is further configured to control the arithmetic module to perform an inverse fast Fourier transform on each row of feature vectors of the fifth results to obtain W×H sixth results, and to control the arithmetic module to perform an inverse fast Fourier transform on each column of the sixth results to obtain H×W seventh results;
The storage control module is further configured to accumulate the real parts at the same row and the same column of the seventh results of each of the D dimensions to obtain W×H eighth results, and to take the eighth results as the correlation degree between the first feature value and the second feature value;
The storage control module is further configured to control the storage module to store the first to seventh results and the correlation degree.
In a possible design, the arithmetic module includes M butterfly operation units, and each butterfly operation unit performs a fast Fourier transform or an inverse fast Fourier transform on the received data.
In a possible design, the control module includes an arithmetic controller, a register unit, and a data selection unit. The register unit is configured to hold the data about to be operated on, and the arithmetic controller is configured to control the data selection unit, according to the magnitude relationship between W and M or between H and M, to select corresponding data from the register unit and output the selected data to the M butterfly operation units.
In a possible design, the arithmetic controller is further configured to, if W is less than 2M, control the data selection unit to select W data from the register unit and output the selected data to the M butterfly operation units;
The arithmetic controller is further configured to, if W is greater than or equal to 2M, control the data selection unit to select 2M data from the register unit each time and output the selected data to the M butterfly operation units;
The arithmetic controller is further configured to, if H is less than 2M, control the data selection unit to select H data from the register unit and output the selected data to the M butterfly operation units;
The arithmetic controller is further configured to, if H is greater than or equal to 2M, control the data selection unit to select 2M data from the register unit each time and output the selected data to the M butterfly operation units.
In a possible design, the data selection unit includes data selection arbitration logic and data selection logic;
The data selection arbitration logic determines the twiddle factor according to the number of data selected from the register unit each time and the stage at which the butterfly operation unit currently is in the fast Fourier transform or inverse fast Fourier transform;
The data selection logic determines, according to the twiddle factor, the serial numbers of the two data items input to each of the M butterfly operation units.
In a possible design, the register unit includes two register sets, each register set includes X register groups, and each register group includes P registers. The P registers of one register group are used to store one row or one column of feature vectors, and X and P are both positive integers;
The arithmetic controller is configured to store X rows or columns of feature vectors in one of the two register sets;
The arithmetic controller is further configured to control the data selection unit to select corresponding data from that register set and output the selected data to the M butterfly operation units, while the arithmetic controller stores feature vectors other than the X rows or columns of feature vectors in the other of the two register sets.
In a possible design, the system further includes a register interface;
The register interface is configured to obtain register configuration information and transmit the register configuration information to the arithmetic controller;
The arithmetic controller is configured to store feature vectors in the M register sets according to the register configuration information.
In a possible design, the storage module includes a first memory, a second memory, and a third memory;
The first memory is used to store the first result, the third result, and the sixth result;
The second memory is used to store the second result, the fourth result, the fifth result, and the seventh result;
The third memory is used to store the eighth result.
In a possible design, the system further includes a task controller. The task controller is configured to send a task start signal to the arithmetic controller when a correlation degree calculation instruction is detected, and the task start signal is used to instruct the arithmetic controller to perform a fast Fourier transform or an inverse fast Fourier transform on the input data.
In the embodiments of the present invention, the data processing system includes a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module, all five of which are hardware modules. The correlation degree is computed through cooperation among these five hardware modules, and the computation proceeds in units of rows or columns. Compared with a purely software computation, the data processing system therefore achieves higher computational efficiency.
Description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below.
Fig. 1 is a functional block diagram of a data processing system provided by an embodiment of the present invention.
Fig. 2 is a schematic diagram of a first feature value and a second feature value provided by an embodiment of the present invention.
Fig. 3 is a functional block diagram of an arithmetic module provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of a butterfly operation unit provided by an embodiment of the present invention.
Fig. 5 is a functional block diagram of another data processing system provided by an embodiment of the present invention.
Fig. 6 is a schematic diagram of an FFT operation provided by an embodiment of the present invention.
Fig. 7 is a schematic diagram of a register unit provided by an embodiment of the present invention.
Fig. 8 is a schematic diagram of each storage unit in the storage module provided by an embodiment of the present invention.
Fig. 9 is a schematic diagram of pipeline processing provided by an embodiment of the present invention.
Embodiments of the present invention
Please refer to Fig. 1, which is a functional block diagram of a data processing system provided by an embodiment of the present invention.
As shown in Fig. 1, the data processing system may include a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module, all of which are hardware modules.
Referring to Fig. 1 and Fig. 2 together, the data transmission module is configured to receive a first feature value F1 and a second feature value F2, each of which includes D×W×H feature vectors, where D represents the dimension, W represents the width, H represents the height, and D, W, and H are all positive integers.
The control module is configured to take the acquired W×H feature vectors of the i-th dimension of the first feature value F1 as a first operation value, control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the first operation value to obtain W×H first results, and control the arithmetic module to perform a fast Fourier transform on each column of the first results to obtain H×W second results, where 1≤i≤D.
Optionally, for the W×H feature vectors of each dimension of the first feature value F1, the control module first performs a fast Fourier transform on each row of feature vectors to obtain W×H first results, and then performs a fast Fourier transform on each column of the first results to obtain H×W second results.
The control module is further configured to take the acquired W×H feature vectors of the i-th dimension of the second feature value F2 as a second operation value, control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the second operation value to obtain W×H third results, and control the arithmetic module to perform a fast Fourier transform on each column of the third results to obtain H×W fourth results.
Optionally, for the W×H feature vectors of each dimension of the second feature value F2, the control module first performs a fast Fourier transform on each row of feature vectors to obtain W×H third results, and then performs a fast Fourier transform on each column of the third results to obtain H×W fourth results.
The storage control module is configured to perform a conjugate multiplication operation on the second results and the fourth results to obtain W×H fifth results.
The control module is further configured to control the arithmetic module to perform an inverse fast Fourier transform on each row of feature vectors of the fifth results to obtain W×H sixth results, and to control the arithmetic module to perform an inverse fast Fourier transform on each column of the sixth results to obtain H×W seventh results.
The storage control module is further configured to accumulate the real parts at the same row and the same column of the seventh results of each of the D dimensions to obtain W×H eighth results, and to take the eighth results as the correlation degree between the first feature value and the second feature value.
The storage control module is further configured to control the storage module to store the first to seventh results and the correlation degree.
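The data flow from the first result to the eighth result can be sketched in software as follows. This is a minimal reference model under stated assumptions, not the hardware design: a direct O(n²) DFT stands in for the butterfly-based FFT, the function names (`dft`, `transform_2d`, `correlation`) are hypothetical, and the choice of which operand is conjugated is assumed because the text only says "conjugate multiplication".

```python
import cmath


def dft(seq, inverse=False):
    """Direct DFT standing in for the hardware FFT/IFFT (inverse is scaled by 1/n)."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [sum(x * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t, x in enumerate(seq))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def transform_2d(plane, inverse=False):
    """Transform every row, then every column, of one H-row by W-column plane."""
    rows = [dft(row, inverse) for row in plane]             # row pass
    cols = [dft(list(col), inverse) for col in zip(*rows)]  # column pass
    return [list(row) for row in zip(*cols)]                # back to row-major


def correlation(f1, f2):
    """f1 and f2 are lists of D planes; returns the accumulated real parts."""
    height, width = len(f1[0]), len(f1[0][0])
    acc = [[0.0] * width for _ in range(height)]
    for p1, p2 in zip(f1, f2):
        second = transform_2d(p1)
        fourth = transform_2d(p2)
        # Assumption: the fourth result is the conjugated operand.
        fifth = [[a * b.conjugate() for a, b in zip(r1, r2)]
                 for r1, r2 in zip(second, fourth)]
        seventh = transform_2d(fifth, inverse=True)
        for r in range(height):
            for c in range(width):
                acc[r][c] += seventh[r][c].real  # accumulate over the D dimensions
    return acc
```

For instance, correlating a plane that contains a single 1 at row 0, column 0 with itself yields a correlation map whose only non-zero entry is at row 0, column 0.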
Referring to Fig. 3, the arithmetic module may include M butterfly operation units, where M is a positive integer. Each butterfly operation unit is used to perform a fast Fourier transform or an inverse fast Fourier transform on the received data.
Referring to Fig. 4, each butterfly operation unit includes two input interfaces, which receive input 1 and input 2 respectively. Input 2 is multiplied by the twiddle factor in a multiplier to obtain the rotated input 2. An adder adds input 1 and the rotated input 2 to obtain result 1; optionally, for an IFFT operation, the sum is additionally divided by 2 to obtain result 1. A subtractor subtracts the rotated input 2 from input 1 to obtain result 2; optionally, for an IFFT operation, the difference is additionally divided by 2 to obtain result 2.
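The arithmetic of a single butterfly operation unit, as described for Fig. 4, can be written out as a short sketch. The function name `butterfly` and its parameter names are hypothetical; the operations follow the description (multiply input 2 by the twiddle factor, add, subtract, and halve both outputs for an IFFT).

```python
def butterfly(input1, input2, twiddle, inverse=False):
    """One radix-2 butterfly: returns (result1, result2)."""
    rotated = input2 * twiddle   # multiplier: rotated input 2
    result1 = input1 + rotated   # adder
    result2 = input1 - rotated   # subtractor
    if inverse:
        # For the IFFT, each output is additionally divided by 2.
        result1 /= 2
        result2 /= 2
    return result1, result2


print(butterfly(3, 1, 1j))
```

Halving at every stage of an N-point IFFT amounts to an overall scaling of 1/N after log2(N) stages.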
The M butterfly operation units provide 2M input interfaces and can complete one stage of a 2M-point FFT (Fast Fourier Transform) or IFFT (Inverse Fast Fourier Transform) operation in one clock cycle. For an X-point FFT or IFFT operation, when X≤2M the arithmetic module can complete one stage of the X-point operation in one clock cycle; when X>2M and X is Y times 2M, the arithmetic module can complete one stage of the X-point operation in Y clock cycles. It should be noted that "2M points" means that the arithmetic module can accept 2M feature vectors as input at a time.
Specifically, when M=8 and X=8, the arithmetic module can complete one stage of an 8-point FFT or IFFT operation in one clock cycle. When M=8 and X=16, the arithmetic module can complete one stage of a 16-point FFT or IFFT operation in one clock cycle. When M=8 and X=32, the arithmetic module can complete one stage of a 32-point FFT or IFFT operation in two clock cycles. When M=8 and X=64, the arithmetic module can complete one stage of a 64-point FFT or IFFT operation in four clock cycles.
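The relationship between the number of points X, the number of butterfly operation units M, and the clock cycles per stage can be captured in a one-line helper. The function name `cycles_per_stage` is hypothetical, and rounding up for values of X that are not exact multiples of 2M is an assumption of this sketch; the text only covers X≤2M and X equal to Y times 2M.

```python
import math


def cycles_per_stage(points, num_butterflies):
    """Clock cycles for one stage of a `points`-point FFT/IFFT with M butterfly units."""
    inputs_per_cycle = 2 * num_butterflies  # M units accept 2M inputs per cycle
    return max(1, math.ceil(points / inputs_per_cycle))


for x in (8, 16, 32, 64):
    print(x, cycles_per_stage(x, 8))
```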
Please refer to Fig. 5, which is a functional block diagram of another data processing system provided by an embodiment of the present invention.
As shown in Fig. 5, the data processing system may include a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module. The functions of these modules are similar to those of the corresponding modules in the embodiment of Fig. 1 and are not repeated here.
The data transmission module may be connected to an external memory in order to transmit the image feature values stored in the external memory to the control module. The image feature values may include a first feature value F1 of a first image and a second feature value F2 of a second image. Each of the first feature value F1 and the second feature value F2 may include multiple feature vectors; as shown in Fig. 2, each feature value may include D×W×H feature vectors, where D represents the dimension, W represents the width, and H represents the height. Having the data transmission module transfer the first feature value and the second feature value from the external memory to the control module facilitates the subsequent operations and thereby improves computational efficiency.
The control module may include an arithmetic controller, a register unit, and a data selection unit.
The register unit is configured to hold the data about to be operated on, and the arithmetic controller is configured to control the data selection unit, according to the magnitude relationship between W and M or between H and M, to select corresponding data from the register unit and output the selected data to the M butterfly operation units. For instance, if W is less than 2M, the data selection unit is controlled to select W data from the register unit and output the selected data to the M butterfly operation units. For example, when W is 4 and the arithmetic module integrates 4 butterfly operation units, the module can process an 8-point FFT at once. Since W is less than 8, the processing completes in a single time period: the 4 feature vectors are input directly to the arithmetic module, where they occupy only two of the butterfly operation units.
As another instance, if W is greater than or equal to 2M, the data selection unit is controlled to select 2M data from the register unit each time and output the selected data to the M butterfly operation units. For example, when W is 16 and the arithmetic module integrates 4 butterfly operation units, the module can process an 8-point FFT at once. Since W is greater than 2M, the W data cannot all be input at once; only 2M feature vectors can be input to the arithmetic module at a time.
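The selection rule above (W data when W is less than 2M, otherwise 2M data at a time) can be sketched as follows. The function name `select_batches` is hypothetical, and slicing the row into contiguous batches is a simplifying assumption: the text only fixes the batch size, while the actual members of each batch depend on the butterfly pairing of the current stage.

```python
def select_batches(data, num_butterflies):
    """Split one row or column into the batches handed to the M butterfly units."""
    capacity = 2 * num_butterflies  # 2M inputs per batch
    if len(data) < capacity:
        return [list(data)]         # fewer than 2M items: select all of them at once
    # Assumption: contiguous slices of 2M items each.
    return [list(data[i:i + capacity]) for i in range(0, len(data), capacity)]


print(select_batches(range(4), 4))   # W = 4, M = 4: one batch of 4
print(select_batches(range(16), 4))  # W = 16, M = 4: two batches of 8
```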
The data selection unit includes data selection arbitration logic and data selection logic. The data selection arbitration logic determines the twiddle factor according to the number of data selected from the register unit each time and the stage at which the butterfly operation unit currently is in the fast Fourier transform or inverse fast Fourier transform.
The data selection logic determines, according to the twiddle factor, the serial numbers of the two data items input to each of the M butterfly operation units, and inputs the data to the corresponding butterfly operation unit according to the serial numbers.
As shown in Fig. 6, in the twiddle factor $W_N^k$, W denotes the weight, N denotes the number of points of the FFT or IFFT (typically 4, 8, 16, and so on), and k denotes the index of the weight. As the butterfly operation diagram shows, the values of N and k depend on the number of points participating in the FFT operation (the value of N) and on the stage at which the butterfly operation unit currently is in the fast Fourier transform or inverse fast Fourier transform. For example, in an 8-point FFT operation, the first stage uses N=8 with k=0, 1, 2, and 3; the second stage uses N=8 with k=0 or 2; and the third stage uses N=8 with k=0.
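The stage-dependent choice of k in the 8-point example can be reproduced with a short helper. This sketch assumes the pattern of the example generalizes to other power-of-two sizes, and it uses the standard convention $W_N^k = e^{-2\pi i k/N}$ (conjugated for the inverse transform), which the text does not spell out; the function names are hypothetical.

```python
import cmath


def twiddle_exponents(n_points, stage):
    """Exponents k of W_N^k used in a given stage (1-based), per the Fig. 6 example."""
    step = 2 ** (stage - 1)              # spacing between successive k values
    count = n_points // (2 ** stage)     # number of distinct k values in this stage
    return [j * step for j in range(count)]


def twiddle(n_points, k, inverse=False):
    """W_N^k under the standard convention; the inverse uses the conjugate."""
    sign = 1 if inverse else -1
    return cmath.exp(sign * 2j * cmath.pi * k / n_points)


for stage in (1, 2, 3):
    print(stage, twiddle_exponents(8, stage))
```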
Therefore, the data selection arbitration logic determines the twiddle factor, that is, the values of N and k, according to the number of data selected from the register unit each time (the number of points participating in the FFT operation) and the stage at which the butterfly operation unit currently is in the fast Fourier transform or inverse fast Fourier transform.
According to the twiddle factor, the data selection logic can determine the serial numbers of the two data items input to each of the M butterfly operation units. The serial numbers may be obtained by numbering the data sequentially as they are taken out of the register unit. Alternatively, for intermediate results of an FFT operation, the serial numbers may be obtained by numbering the intermediate results of the first-stage operation sequentially in order. For example, for an 8-point FFT operation the numbers are 0-7, and for a 16-point FFT operation the numbers are 0-15. The data input to each butterfly operation unit are determined by serial number. For example, in the first stage of an 8-point FFT operation, N=8 and k=0, 1, 2, and 3, so the two data points input to the same butterfly operation unit are 4 apart (k takes four values): serial numbers 0 and 4, 1 and 5, 2 and 6, and 3 and 7 are paired. In the second stage of an 8-point FFT operation, N=8 and k=0 or 2, so the spacing is 2 (k takes two values): serial numbers 0 and 2, 1 and 3, 4 and 6, and 5 and 7 are paired. In the third stage, N=8 and k=0, so the spacing is 1 (k takes one value): serial numbers 0 and 1, 2 and 3, 4 and 5, and 6 and 7 are paired.
It can be understood that different numbers of points correspond to different numbers of stages; for example, an X-point transform is divided into log2(X) stages.
Optionally, the register unit may include a first register set and a second register set. Each of the two register sets contains X register groups, and each register group includes P registers, where X and P are positive integers. The P registers of one register group store one row or one column of feature vectors, so one register set can store X rows or columns of feature vectors.
The arithmetic controller is configured to control the first register set to store X rows or columns of feature vectors;
The arithmetic controller is further configured to control the data selection unit to select the corresponding data from the first register set and output the selected data to the M butterfly operation units, while at the same time controlling the second register set to store feature vectors other than those X rows or columns. For example, if the first register set stores rows or columns 1 to X of the feature vectors, the second register set stores rows or columns X+1 to 2X.
In one embodiment, as shown in FIG. 7, the register unit may include two register sets: the first register set contains register groups 1-4 and the second register set contains register groups 5-8. One register group can store one row of feature vectors; for example, register groups 1-4 store rows 1-4 of the feature value F1, and register groups 5-8 store rows 5-8 of the feature value F1. To improve data processing efficiency, register groups 1-4 and register groups 5-8 can operate in ping-pong fashion: while the feature values of rows 1-4 are being processed, the feature values of rows 5-8 are stored into the second register set. Once rows 1-4 have been processed, rows 5-8 are already in the second register set and can be processed directly. This removes the wait for rows 5-8 to be loaded into the second register set and thereby improves processing efficiency.
It should be noted that register groups 1-4 can subsequently be reused to store rows 9-12, rows 17-20, and so on, and register groups 5-8 can be reused to store rows 13-16, rows 21-24, and so on.
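The ping-pong assignment of rows to register groups can be written as a small index calculation. The following Python sketch assumes the FIG. 7 layout (two register sets of four register groups each); the function name `register_group_for_row` and its parameter are hypothetical and serve only to illustrate the mapping.

```python
def register_group_for_row(row, groups_per_set=4):
    """Map a 1-based row number to (register set, register group).

    Assumes two register sets used alternately (ping-pong), with register
    groups numbered 1..2*groups_per_set across both sets.
    """
    index = row - 1
    set_number = (index // groups_per_set) % 2 + 1
    group_number = index % (2 * groups_per_set) + 1
    return set_number, group_number


for row in (1, 4, 5, 8, 9, 13, 17, 21):
    print(row, register_group_for_row(row))
# 1 (1, 1)
# 4 (1, 4)
# 5 (2, 5)
# 8 (2, 8)
# 9 (1, 1)
# 13 (2, 5)
# 17 (1, 1)
# 21 (2, 5)
```

Rows 9-12 and 17-20 land in register groups 1-4 again, and rows 13-16 and 21-24 in register groups 5-8, matching the reuse pattern described above.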
It can be understood that register groups 1-4 and register groups 5-8 store the feature value F2 in the same way as the feature value F1, so the details are not repeated here.
Optionally, the storage control module is configured to perform a conjugate multiplication on the second result and the fourth result calculated above to obtain W×H fifth results.
The control module is further configured to control the arithmetic module to perform an inverse fast Fourier transform on each row of feature vectors of the fifth result to obtain W×H sixth results, and to control the arithmetic module to perform an inverse fast Fourier transform on each column of the sixth result to obtain H×W seventh results.
For the way the arithmetic module performs the inverse fast Fourier transform on each row of feature vectors of the W×H fifth results, and on each column of the W×H sixth results to obtain the H×W seventh results, refer to the descriptions of the row-direction and column-direction fast Fourier transform processes in the foregoing embodiments; the details are not repeated here.
The storage control module is further configured to accumulate the real parts at the same row and the same column of the seventh results of each of the D dimensions to obtain W×H eighth results, and to take the eighth results as the correlation between the first feature value and the second feature value. This correlation can be used in the computation of the Kernel Correlation Filter (KCF) algorithm, that is, to compute the correlation between the current image frame and the previous image frame.
Optionally, each time the H×W seventh results of one dimension have been calculated, they are added, at the same row and the same column, to the accumulated result already stored. Once the H×W seventh results of all dimensions have been calculated, the W×H eighth results are obtained, and the eighth results are taken as the correlation between the first feature value and the second feature value.
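The running accumulation over dimensions can be illustrated as follows. This is a minimal Python sketch with made-up sample values; the name `accumulate_real` and the 2×2 data are hypothetical and only show that the real part of each element is added into the stored sum at the same row and column as each dimension finishes.

```python
def accumulate_real(accumulated, seventh_result):
    """Add the real parts of one dimension's results into the running sum."""
    for r, row in enumerate(seventh_result):
        for c, value in enumerate(row):
            accumulated[r][c] += value.real
    return accumulated


# Illustrative seventh results for D = 2 dimensions, 2 rows by 2 columns.
per_dimension = [
    [[1 + 2j, 3 - 1j], [0.5 + 0j, -2 + 4j]],
    [[2 - 3j, 1 + 1j], [1.5 + 2j, 4 - 4j]],
]

correlation = [[0.0, 0.0], [0.0, 0.0]]
for result in per_dimension:
    accumulate_real(correlation, result)

print(correlation)  # [[3.0, 4.0], [2.0, 2.0]]
```

Accumulating as each dimension completes means only one W×H buffer of real values has to be kept for the eighth results, rather than the seventh results of all D dimensions.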
The storage control module is further configured to control the storage module to store the first to seventh results and the correlation.
Optionally, the storage module includes a first memory, a second memory, and a third memory.
The first memory is used to store the first result, the third result, and the sixth result.
The second memory is used to store the second result, the fourth result, the fifth result, and the seventh result.
The third memory is used to store the eighth result.
Specifically, and optionally, the first result obtained by Fourier-transforming each row of feature vectors of the first feature value F1 is stored in the first memory. The arithmetic controller then reads the first result from the first memory and controls the arithmetic module to perform a fast Fourier transform on each column of the first result to obtain the second result, which is stored in the second memory.
Because the first result stored in the first memory has by then been consumed by the column-direction fast Fourier transform, the third result, obtained by performing a fast Fourier transform on each row of the second feature value, can be stored in the first memory to save memory. This may be an overwriting store, in which the third result overwrites the first result, or a non-overwriting store.
Next, the arithmetic controller reads the third result from the first memory and controls the arithmetic module to perform a fast Fourier transform on each column of the third result to obtain the fourth result. The fourth result can be stored in the second memory, but not by overwriting, because the second result stored in the second memory is still needed for the conjugate multiplication.
Next, the second result and the fourth result are read from the second memory, and the storage control module performs the conjugate multiplication of the second result and the fourth result to obtain the fifth result, which is stored in the second memory.
Next, the fifth result is read from the second memory, and an inverse fast Fourier transform is performed on each row of the fifth result to obtain the sixth result. The sixth result is stored in the first memory.
Next, the sixth result is read from the first memory, and an inverse fast Fourier transform is performed on each column of the sixth result to obtain the seventh result.
The seventh result is stored in the second memory. The seventh result is then read from the second memory, and the real parts at the same row and the same column of the seventh results of all dimensions are added to obtain the eighth result, i.e. the correlation, which is stored in the third memory. For example, for feature values with two dimensions, two rows and two columns, the real parts of the two second inverse fast Fourier transform results (the seventh results) at the first row and first column of the two dimensions are added, the real parts of the two second inverse fast Fourier transform results at the first row and second column of the two dimensions are added, and so on, giving a two-row, two-column array of summed real parts.
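The sequence of results one to eight can be traced end to end in software. The Python sketch below is a simplified model under stated assumptions, not the hardware described here: a naive O(n²) DFT stands in for the butterfly-based FFT/IFFT, the conjugate is taken on the fourth result (the text does not say which operand is conjugated), and all function names are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive DFT of a 1-D sequence, standing in for the FFT/IFFT hardware."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t, v in enumerate(values))
        out.append(total / n if inverse else total)
    return out


def transform_rows(matrix, inverse=False):
    return [dft(row, inverse) for row in matrix]


def transform_columns(matrix, inverse=False):
    columns = [dft(list(col), inverse) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]


def correlation(f1, f2):
    """f1, f2: lists of D equally sized matrices, one per dimension."""
    rows, cols = len(f1[0]), len(f1[0][0])
    eighth = [[0.0] * cols for _ in range(rows)]
    for plane1, plane2 in zip(f1, f2):
        second = transform_columns(transform_rows(plane1))  # results 1 -> 2
        fourth = transform_columns(transform_rows(plane2))  # results 3 -> 4
        # Assumption: the fourth result is the conjugated operand.
        fifth = [[a * b.conjugate() for a, b in zip(ra, rb)]
                 for ra, rb in zip(second, fourth)]
        sixth = transform_rows(fifth, inverse=True)
        seventh = transform_columns(sixth, inverse=True)
        for r in range(rows):
            for c in range(cols):
                eighth[r][c] += seventh[r][c].real
    return eighth


plane = [[1, 2], [3, 4]]
print(correlation([plane], [plane]))  # approximately [[30, 28], [22, 20]]
```

Correlating a plane with itself puts the sum of squares (1 + 4 + 9 + 16 = 30) at row 1, column 1, which is a convenient sanity check; the other entries are the circular correlations at the corresponding shifts.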
As shown in FIG. 8, in one embodiment, each memory in the storage module (such as the first memory, the second memory, and the third memory) may include multiple storage units (such as Bank0-Bank31). Each storage unit has an address number, and each storage unit stores one row or one column of data (FIG. 7 takes rows as an example). The memories may include the same or different numbers of storage units, and the address numbers of the storage units in each memory may be the same or different.
In one embodiment, every memory includes the same number of storage units and uses the same address numbers; for example, each memory includes 32 storage units whose address numbers are address 0 to address 31. One storage unit stores the results of one row in order. When a fast Fourier transform is performed on each column of data, the results with the same address number are taken from the multiple storage units to form one column of data. For example, taking the data at address 0 from every storage unit yields the first column of data.
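Reading a column by taking the same address from every storage unit can be pictured with a list of lists. This Python sketch is only an illustration: it models each bank as a list holding one row, and the name `read_column` and the sample values are hypothetical.

```python
def read_column(banks, address):
    """Gather the element at the same address from every bank to form a column."""
    return [bank[address] for bank in banks]


# Each bank holds one row of results, stored in order.
banks = [
    [10, 11, 12, 13],  # Bank0: row 0
    [20, 21, 22, 23],  # Bank1: row 1
    [30, 31, 32, 33],  # Bank2: row 2
]

print(read_column(banks, 0))  # [10, 20, 30]
print(read_column(banks, 3))  # [13, 23, 33]
```

Because each element of a column sits in a different bank, a column can be fetched without first transposing the stored row results.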
In one embodiment, computing the fast Fourier transform of each row or each column of data, as well as the inverse fast Fourier transform of each row or each column of data, goes through the following four processing steps:
Data loading: the arithmetic controller loads into the register unit the first feature value F1 and the second feature value F2 transmitted by the data transmission module, or the intermediate results output by the arithmetic module, or the first result and the third result output by the storage module;
Data selection: the data selection unit selects from the data loaded into the register unit and transmits the selected data to the arithmetic module;
Data operation: the arithmetic module performs a fast Fourier transform or an inverse fast Fourier transform on the selected data and transmits the calculation result to the storage module;
Data storage: the storage module stores the received calculation results (such as the first to seventh results).
Thus the four processing steps must be executed in order: data loading first, then data selection, then data operation, and finally data storage. In other words, the four processing steps are handled as a pipeline.
FIG. 9 is a schematic diagram of pipeline processing provided by an embodiment of the present invention. As shown in FIG. 9, one cycle includes four sub-cycles, and each sub-cycle corresponds to one of the processing steps above. In the first sub-cycle, the arithmetic controller loads the corresponding data into register group 1. In the second sub-cycle, the data selection unit selects from the data loaded into register group 1, while the arithmetic controller loads the corresponding data into register group 2. In the third sub-cycle, the arithmetic module performs the FFT or IFFT operation on the data selected from register group 1, while the data selection unit selects from the data loaded into register group 2 and the arithmetic controller loads the corresponding data into register group 3. In the fourth sub-cycle, the storage module stores the result that the arithmetic module outputs for the data of register group 1, while the arithmetic module performs the FFT or IFFT operation on the data selected from register group 2, the data selection unit selects from the data loaded into register group 3, and the arithmetic controller loads the corresponding data into register group 4. And so on: each cycle completes the FFT or IFFT operation of one row or one column of data, and each cycle involves 4 register groups, so 4 rows or 4 columns of data can be processed. After 4 cycles, the calculation of the fast Fourier transform of the 4 rows of feature values is complete and the fast Fourier transform results of the 4 rows are obtained. It should be noted that, to avoid waiting for the feature values of rows 5-8 to be read in from outside, those rows can be read into the second register set while the fast Fourier transform of the feature values of rows 1-4 is being computed.
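The overlap of the four processing steps across rows can be tabulated with a short scheduling sketch. The Python below is an idealized model, assuming each step takes exactly one sub-cycle and a new row enters on every sub-cycle; the names `STAGES` and `pipeline_schedule` are hypothetical.

```python
STAGES = ["load", "select", "compute", "store"]


def pipeline_schedule(num_rows):
    """For each sub-cycle, report which 1-based row occupies each stage."""
    schedule = []
    for tick in range(num_rows + len(STAGES) - 1):
        active = {}
        for stage_index, stage in enumerate(STAGES):
            row = tick - stage_index + 1  # row that entered stage_index ticks ago
            if 1 <= row <= num_rows:
                active[stage] = row
        schedule.append(active)
    return schedule


for tick, active in enumerate(pipeline_schedule(4), start=1):
    print(tick, active)
# 1 {'load': 1}
# 2 {'load': 2, 'select': 1}
# 3 {'load': 3, 'select': 2, 'compute': 1}
# 4 {'load': 4, 'select': 3, 'compute': 2, 'store': 1}
# 5 {'select': 4, 'compute': 3, 'store': 2}
# 6 {'compute': 4, 'store': 3}
# 7 {'store': 4}
```

The first four lines reproduce the four sub-cycles described for FIG. 9, with rows 1-4 held in register groups 1-4. In this model the pipeline is full from the fourth sub-cycle on, so one row finishes per sub-cycle after that.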
In one embodiment, the number of processing steps needed to compute the FFT or IFFT operation of one row or one column of data equals the number of sub-cycles in one cycle and the number of register groups in one register set. For example, when computing the FFT or IFFT operation of one row or one column of data requires four processing steps, one cycle includes four sub-cycles and one register set includes four register groups.
Referring again to FIG. 5, the data processing system may further include a register interface. The register interface is used to obtain register configuration information and transmit the register configuration information to the arithmetic controller. The arithmetic controller is configured to store feature vectors into the first register set and the second register set according to the register configuration information.
For example, the register configuration information indicates that data are first stored into the first register set, and that data are then stored into the second register set while the data stored in the first register set are being processed.
Referring again to FIG. 5, the data processing system may further include a task controller. The task controller is configured to send a task start signal to the arithmetic controller when a correlation calculation instruction is detected, and the task start signal instructs the arithmetic controller to perform a fast Fourier transform or an inverse fast Fourier transform on the input data.
A person of ordinary skill in the art can understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Claims (9)

  1. A data processing system, characterized in that the data processing system comprises a data transmission module, a control module, an arithmetic module, a storage control module, and a storage module;
    the data transmission module is configured to receive a first feature value and a second feature value, the first feature value and the second feature value comprising D×W×H feature vectors, where D denotes dimension, W denotes width, H denotes height, and D, W and H are all positive integers;
    the control module is configured to take the acquired W×H feature vectors of the i-th dimension of the first feature value as a first operation value, to control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the first operation value to obtain W×H first results, and to control the arithmetic module to perform a fast Fourier transform on each column of the first results to obtain H×W second results, where 1≤i≤D;
    the control module is further configured to take the acquired W×H feature vectors of the i-th dimension of the second feature value as a second operation value, to control the arithmetic module to perform a fast Fourier transform on each row of feature vectors of the second operation value to obtain W×H third results, and to control the arithmetic module to perform a fast Fourier transform on each column of the third results to obtain H×W fourth results;
    the storage control module is configured to perform a conjugate multiplication on the second results and the fourth results to obtain W×H fifth results;
    the control module is further configured to control the arithmetic module to perform an inverse fast Fourier transform on each row of feature vectors of the fifth results to obtain W×H sixth results, and to control the arithmetic module to perform an inverse fast Fourier transform on each column of the sixth results to obtain H×W seventh results;
    the storage control module is further configured to accumulate the real parts at the same row and the same column of the seventh results of each of the D dimensions to obtain W×H eighth results, and to take the eighth results as the correlation between the first feature value and the second feature value;
    the storage control module is further configured to control the storage module to store the first to seventh results and the correlation.
  2. The data processing system according to claim 1, characterized in that the arithmetic module comprises M butterfly operation units, M being a positive integer, and each butterfly operation unit performs a fast Fourier transform or an inverse fast Fourier transform on the data it receives.
  3. The data processing system according to claim 2, characterized in that the control module comprises an arithmetic controller, a register unit, and a data selection unit, the register unit is used to hold the data about to be operated on, and the arithmetic controller is configured to control, according to the magnitude relationship between W and M or between H and M, the data selection unit to select the corresponding data from the register unit and output the selected data to the M butterfly operation units.
  4. The data processing system according to claim 3, characterized in that the arithmetic controller is further configured to, if W is less than 2M, control the data selection unit to select W data from the register unit and output the selected data to the M butterfly operation units;
    the arithmetic controller is further configured to, if W is greater than or equal to 2M, control the data selection unit to select 2M data from the register unit each time and output the selected data to the M butterfly operation units;
    the arithmetic controller is further configured to, if H is less than 2M, control the data selection unit to select H data from the register unit and output the selected data to the M butterfly operation units;
    the arithmetic controller is further configured to, if H is greater than or equal to 2M, control the data selection unit to select 2M data from the register unit each time and output the selected data to the M butterfly operation units.
  5. The data processing system according to claim 4, characterized in that the data selection unit comprises data selection arbitration logic and data selection logic;
    the data selection arbitration logic determines a twiddle factor according to the number of data selected from the register unit each time and the stage at which the butterfly operation unit currently is in the fast Fourier transform or inverse fast Fourier transform;
    the data selection logic determines, according to the twiddle factor, the sequence numbers of the two data fed to each of the M butterfly operation units.
  6. The data processing system according to claim 3, characterized in that the register unit comprises a first register set and a second register set, the first register set and the second register set each contain X register groups, one register group comprises P registers, the P registers of one register group are used to store one row or one column of feature vectors, and X and P are both positive integers;
    the arithmetic controller is configured to store X rows or columns of feature vectors into the first register set;
    the arithmetic controller is further configured to control the data selection unit to select the corresponding data from the first register set and output the selected data to the M butterfly operation units, while the arithmetic controller stores into the second register set feature vectors other than the X rows or columns of feature vectors.
  7. The data processing system according to claim 6, the system further comprising a register interface;
    the register interface is used to obtain register configuration information and transmit the register configuration information to the arithmetic controller;
    the arithmetic controller is configured to store feature vectors into the first register set and the second register set according to the register configuration information.
  8. The data processing system according to claim 1, the storage module comprising a first memory, a second memory, and a third memory;
    the first memory is used to store the first result, the third result, and the sixth result;
    the second memory is used to store the second result, the fourth result, the fifth result, and the seventh result;
    the third memory is used to store the eighth result.
  9. The data processing system according to claim 3, the system further comprising a task controller, the task controller being configured to send a task start signal to the arithmetic controller when a correlation calculation instruction is detected, the task start signal instructing the arithmetic controller to perform a fast Fourier transform or an inverse fast Fourier transform on the input data.
PCT/CN2020/108985 2019-12-16 2020-08-13 Data processing system WO2021120646A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911296752.6 2019-12-16
CN201911296752.6A CN111145075B (en) 2019-12-16 2019-12-16 Data processing system

Publications (1)

Publication Number Publication Date
WO2021120646A1 true WO2021120646A1 (en) 2021-06-24

Family

ID=70518468

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/108985 WO2021120646A1 (en) 2019-12-16 2020-08-13 Data processing system

Country Status (2)

Country Link
CN (1) CN111145075B (en)
WO (1) WO2021120646A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145075B (en) * 2019-12-16 2023-05-12 深圳云天励飞技术有限公司 Data processing system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101729463A (en) * 2008-10-24 2010-06-09 中兴通讯股份有限公司 Hardware device and method for implementing Fourier transform and Fourier inverse transform
CN103714531A (en) * 2013-12-05 2014-04-09 南京理工大学 FPGA-based phase correlation method image registration system and method
WO2017013877A1 (en) * 2015-07-20 2017-01-26 Okinawa Institute Of Science And Technology School Corporation 2d discrete fourier transform with simultaneous edge artifact removal for real-time applications
CN108335330A (en) * 2017-12-31 2018-07-27 华中科技大学 A kind of collection of illustrative plates collaboration real time processing system
CN109859178A (en) * 2019-01-18 2019-06-07 北京航空航天大学 A kind of infrared remote sensing image real-time target detection method based on FPGA
CN111145075A (en) * 2019-12-16 2020-05-12 深圳云天励飞技术有限公司 Data processing system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955447B (en) * 2014-04-28 2017-04-12 中国人民解放军国防科学技术大学 FFT accelerator based on DSP chip

Also Published As

Publication number Publication date
CN111145075B (en) 2023-05-12
CN111145075A (en) 2020-05-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20902533

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21.11.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20902533

Country of ref document: EP

Kind code of ref document: A1