CN1900927A - Reconstructable digital signal processor - Google Patents

Publication number
CN1900927A
Authority
CN
China
Prior art keywords
data
output
module
complex multiplication
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200610086398
Other languages
Chinese (zh)
Other versions
CN100594491C (en)
Inventor
洪一
郭二辉
赵斌
洪灏
彭勇俊
陈风波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui core Century Technology Co., Ltd.
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute
Priority to CN200610086398A
Publication of CN1900927A
Application granted
Publication of CN100594491C
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a reconfigurable digital signal processor (DSP) whose internal hardware resources can be reconfigured according to different application requirements so as to realize different forms of filtering operation. It combines the advantages of an application-specific integrated circuit (ASIC) and a general-purpose DSP, offers computing capability comparable to that of large-scale dedicated devices, and is suitable for real-time digital signal processing such as fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), and FIR pulse-group processing. It is simple to use and low in cost.

Description

Reconstructable digital signal processor
Technical field
The invention relates to a reconfigurable digital signal processor (DSP) for real-time digital signal processing such as fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), FIR pulse-group processing, and correlation processing.
Background art
Since the 1960s, with the rapid development of computing and information technology, digital signal processing has grown quickly into an independent discipline and is widely applied in many fields. With the rapid development of large-scale integrated circuit and semiconductor technology and the ever-increasing demands of real-time processing, digital signal processing capability has also risen at an exponential rate and plays an increasingly important role in scientific research, military, and civilian fields; the digital signal processor has become an essential condition supporting the rapid development of these fields. In real-time digital signal processing, filtering operations such as the fast Fourier transform (FFT), inverse fast Fourier transform (IFFT), FIR pulse-group processing, and correlation processing are the most widely used. At present there are three main hardware implementations: those based on general-purpose digital signal processors, those based on field-programmable gate arrays (FPGA) or complex programmable logic devices (CPLD), and those based on application-specific integrated circuits (ASIC). On the one hand, each of the three kinds of device has limitations. The advantage of a general-purpose digital signal processor is the flexibility and generality of programming, but its computing capability is limited. A large-capacity FPGA/CPLD has more internal hardware resources, but firmware logic must be developed separately for each application, labor cost is high, and large-capacity FPGAs/CPLDs are expensive. A traditional ASIC has a fixed architecture and fixed hard-wired connections, its function is relatively narrow, and its range of application is greatly limited.
On the other hand, the technical requirements of digital signal processing keep rising: as wideband operating scenarios expand, the number of array elements to be processed keeps growing, and the computation involved in processing cooperative and non-cooperative targets keeps increasing, the required signal processing speed keeps climbing. For example, a 1024-point complex FFT is required to run at more than 100 MHz, and in some cases at more than 500 MHz. The three kinds of device above find it increasingly difficult to meet the requirements of real-time digital signal processing in terms of function, price, adaptability, and ease of use.
Summary of the invention
The technical problem to be solved by the invention is to provide a reconfigurable digital signal processor for real-time digital signal processing that has the computing capability of a dedicated large-scale integrated circuit, can adapt to different real-time digital signal processing tasks such as FFT, IFFT, FIR pulse-group processing, and correlation processing, and is at the same time simple to use and inexpensive.
The technical solution adopted by the invention is as follows:
The internal hardware architecture and hardware interconnections of the reconfigurable digital signal processor can be structurally reorganized by a configuration control word, thereby realizing various forms of filtering operation such as FFT/IFFT, FIR pulse-group processing, and correlation processing.
The main architecture comprises an input unit, an output unit, a data exchange unit, and 4 basic units. The basic units contain 160 real floating-point multiply-accumulators, evenly distributed among the 4 basic units.
The organization of the hardware can be reorganized by the configuration control word: by configuring the control word and control signals, the organization of the 160 real floating-point multiply-accumulators and the data exchange unit can be changed to select different working modes, adapting to three different computing tasks: FFT/IFFT, FIR pulse-group processing, and correlation.
The hardware scheduling scheme uses a two-level scheduling method that combines centralized and distributed control: the control word is first decoded at the first level by the global module and then decoded at the second level by each basic unit.
The overall architecture adopts two-level control: a global control module coordinates the 4 basic units, and each basic unit has its own local control logic. The data exchange unit between the 4 basic units sends the data in each basic unit to the other 3 basic units as required by the different control settings.
The input unit receives the control word, performs first-level decoding on it, and distributes control words to each unit. The control word shares a port with coefficient input 1. The control word receiving module receives the control word and passes it to the first-level decoding module. Decoding produces global control signals that generate on-chip timing and provide synchronization for coefficients and data, and the control word distribution module sends the control word to the other on-chip units. The coefficient synchronization module synchronizes coefficient input 1 with coefficient input 2. The data synchronization module synchronizes data input 1 with data input 2.
The data exchange unit is a group of multi-input, multi-output switches that exchanges data among the 4 basic units.
The output unit sorts the operation results of the basic units and outputs them in different formats. In FFT/IFFT and FIR pulse-group processing, the basic-unit output sorting module arranges the outputs of the basic units in channel order; in correlation mode, it converts the data output frame by frame from each basic unit into a continuous data stream at the same rate as the input data. The modulus module converts the real/imaginary output format into the modulus/phase-angle output format. The logarithm module converts the input modulus value into logarithmic representation. The floating-point/fixed-point conversion module converts the output of the real/imaginary module or the modulus module from floating-point format to fixed-point format. The exponent normalization module shifts the mantissa accordingly so that the exponents of floating-point results are unified to a fixed value. There are two sets of each of these 4 format conversion modules, one per output port, so that the two output ports can each independently output in any format.
In a basic unit, the data memory comprises 8 dual-port RAMs of 512 × 40, used for input buffering of operand data, temporary storage of intermediate results, and output buffering of results. The data buffer can feed data simultaneously to each complex multiply-accumulator and each complex multiply-accumulator subarray. The coefficient memory comprises 10 dual-port RAMs of 256 × 32, used to store the correlation and FIR filter coefficients and the weighting coefficients for FFT operation. The FFT iteration coefficients are a fixed table implemented in dedicated logic. Each complex multiply-accumulator corresponds to 4 real floating-point multiply-accumulators; each complex multiply-accumulator subarray contains 16 real floating-point multiply-accumulators, equivalent to 4 complex multiply-accumulators. The two subarrays and the complex multiply-accumulators are combined in different ways according to the mode to complete different operations. The real floating-point multiply-accumulator is divided into 5 parts: fixed-point multiplication, truncation, exponent alignment, fixed-point addition, and exponent judgment.
For the three filtering operations, the corresponding organization is as follows:
For FFT/IFFT, complex multiply-accumulator subarray A and complex multiply-accumulator subarray B each serve as one iteration node of the radix-8 algorithm; the 4 complex multiply-accumulators in a subarray complete one stage of radix-8 iteration. Complex multiply-accumulator A and complex multiply-accumulator B, which precede the subarrays, act as windowing units and apply a window to the data. The windowed data enter the subarray for one stage of radix-8 iteration. The output of each iteration stage is first written to the data buffer and then sent through the data exchange unit to the corresponding other subarray for the next iteration stage.
For FIR pulse-group processing, the multiply-accumulators in the 2 complex multiply-accumulator modules and the 2 complex multiply-accumulator subarray modules inside a basic unit are used in parallel, each complex multiply-accumulator corresponding to one channel of the FIR pulse group. With conjugate operation, each corresponds to 2 channels.
For correlation, the complex multiply-accumulators and the subarrays are used in series, equivalent to 10 complex multiply-accumulators in series; the multiply-accumulators inside the basic unit are then configured in cascade.
The hardware scheduling scheme adopted is as follows:
Hardware resources are scheduled through the control word and control signals in a two-level mode that combines global control with local control inside the basic units. The control word and control signals first enter the global control module for first-level decoding and control word distribution. This module first receives the control word and then generates a number of global control signals from it. These global controls include coordinating the actions of the 4 basic units, determining the output format of results, and setting the on-chip master clock. The second role of the global control module is to send control information to the local control module of each basic unit. This information includes the working mode, the subtype of the working mode, the number of operation points, the number of channels, and the number of channels to be processed.
After receiving the information sent by the global control module, the local control module in each basic unit performs second-level decoding on it, converting it into detailed hardware scheduling control signals.
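The two-level scheme can be illustrated with a small software model. This is only a sketch under loud assumptions: the text here does not give the bit layout of the control word, so every field position and width below is hypothetical, as are the function names `global_decode` and `local_decode`.

```python
# Hypothetical 64-bit control-word layout. Field positions are NOT from the
# patent text; they exist only to make the two decode levels concrete.
MODE_NAMES = {0: "FFT/IFFT", 1: "FIR pulse-group", 2: "correlation"}


def field(word, lsb, width):
    """Extract an unsigned bit field from an integer."""
    return (word >> lsb) & ((1 << width) - 1)


def global_decode(word):
    """First level: derive global settings and the message sent to every basic unit."""
    global_signals = {
        "output_format": field(word, 60, 4),   # assumed position
        "clock_setting": field(word, 56, 4),   # assumed position
    }
    unit_message = {
        "mode": field(word, 54, 2),
        "subtype": field(word, 50, 4),
        "points": field(word, 34, 16),
        "channels": field(word, 26, 8),
    }
    return global_signals, unit_message


def local_decode(unit_id, msg):
    """Second level, inside one basic unit: expand the message into detailed controls."""
    mode = MODE_NAMES[msg["mode"]]
    # The wiring per mode follows the description: windowing plus radix-8 for FFT,
    # parallel multiply-accumulators for FIR, cascaded ones for correlation.
    wiring = {
        "FFT/IFFT": "window+radix8",
        "FIR pulse-group": "parallel",
        "correlation": "cascade",
    }[mode]
    return {"unit": unit_id, "mode": mode, "mac_wiring": wiring, "points": msg["points"]}
```

In this model the global decoder runs once per control word and each of the 4 basic units runs `local_decode` on the same message, which mirrors the centralized-plus-distributed split described above.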
The invention offers significant technical progress and beneficial effects:
The invention combines the advantages of a traditional ASIC and a general-purpose digital signal processor.
High-speed processing capability: the invention adopts a reconfigurable dedicated digital signal processor scheme. The chip contains abundant hardware resources, with as many as 160 floating-point multiply-accumulators, and its processing capability matches or exceeds that of traditional ASICs. Because the algorithms inside the device are all implemented in hardware, it has high-speed performance that general-purpose DSP chips cannot match. Internal arithmetic is floating point. The maximum FFT size a single chip can complete is 4096 points; a 4096-point FFT takes 25.6 µs, and an FFT+IFFT of the same size takes 51.2 µs. For FIR pulse-group processing, the maximum number of channels is 128; the maximum filter length is 256 for 80 channels or fewer and 128 for more than 80 channels. For correlation processing, the maximum single-chip, single-channel operation length is 4000, up to 16 channels can be processed in parallel, and the correspondence between each channel's data and each group of coefficients is very flexible.
Reconfigurability: unlike traditional ASIC chips, the device's internal resources can be structurally reorganized according to different application demands, so that various forms of filtering operation such as FFT, IFFT, FIR pulse-group processing, and correlation processing can be realized. This increases the flexibility of the device and broadens its range of application.
Ease of use: the invention incorporates, to a certain extent, the flexibility of a general-purpose digital signal processor, using a dedicated 64-bit function control word to configure the device for different application demands. It is very simple to use: there is no tedious programming and debugging, nor is logic design and timing analysis needed as with an FPGA. The user only needs to send the 64-bit control word and feed in operand data according to the standard timing. As soon as the control word is changed, the device's internal resource structure is reorganized and the device works in another mode.
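As a quick check on the timing figures quoted above, the following arithmetic derives the implied throughput. The variable names are illustrative; only the 4096-point size and the 25.6 µs and 51.2 µs times come from the text.

```python
fft_points = 4096          # largest single-chip FFT size stated in the text
fft_time_us = 25.6         # stated time for one 4096-point FFT

# Complex points processed per microsecond while the FFT is running.
points_per_us = fft_points / fft_time_us

# An FFT followed by an IFFT of the same size takes two passes.
fft_ifft_time_us = 2 * fft_time_us
```

Under these figures the device sustains about 160 complex points per microsecond during a 4096-point transform, and the doubled time agrees with the 51.2 µs quoted for FFT+IFFT.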
Description of the drawings
Fig. 1 Hardware architecture block diagram 1
Fig. 2 Basic unit structure block diagram
Fig. 3 Real multiply-accumulator structure block diagram
Fig. 4 Basic unit configuration block diagram for FFT operation
Fig. 5 Basic unit configuration block diagram for FIR pulse-group operation
Fig. 6 Basic unit configuration block diagram for correlation operation
Fig. 7 Two-level hardware resource scheduling architecture
Fig. 8 Hardware architecture block diagram 2
Fig. 9 Input unit implementation block diagram
Fig. 10 Output unit implementation block diagram
Fig. 11 Timing diagram for 32-bit control word input
Fig. 12 Timing diagram for 16-bit control word input
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings.
The main body of the invention is 160 multiply-accumulators, evenly distributed among 4 basic units (basic unit). By configuring the control word and control signals, hardware resources such as these 160 multiply-accumulators, the memories, and the data exchange unit can be made to work in different organizations, and these organizations determine the device's different working modes. In addition, the invention uses a two-level scheduling method for hardware resources, combining centralized control with distributed control.
The embodiment is described at 3 levels: the first is the device hardware architecture, the second is the organization of the hardware, and the third is the hardware resource scheduling method. Each is described below.
I. Main hardware architecture
The hardware architecture of the invention is shown in Fig. 1 and Fig. 8.
The invention uses a fully synchronous design, that is, the whole chip has only one clock domain. For balanced functional partitioning and balanced use of logic resources, the top-level design of the device has 7 parts: the 4 basic units as the main body, plus the input unit, the output unit, and the data exchange unit.
The device adopts a two-level control architecture: a global control module coordinates the 4 basic units, and each basic unit has its own local control logic. The 4 basic units are the main body of the device and perform the main functions such as data buffering, computation, and address generation. The data exchange unit between the 4 basic units sends the data in each basic unit to the other 3 basic units as required by the different control settings. To suit different applications, the device integrates several auxiliary function modules, including a modulus circuit, a logarithm circuit, an exponent normalization circuit, and a floating-point/fixed-point conversion circuit. These dedicated modules can output the results of the basic units in different ways, and the two output ports are fully independent and do not affect each other. In addition, the device integrates a phase-locked loop (PLL) for clock multiplication, so that the internal operation clock can either be supplied directly from outside or be derived by supplying a low-speed external clock and multiplying it with the internal PLL.
The implementation of the input unit is shown in Fig. 9. Except for the test input pins used for design for testability, all functional input pins first enter the input unit. The input unit has the following tasks: receiving the control word, performing first-level decoding on it, distributing control words to each unit, synchronizing the input coefficients, synchronizing the input data, and generating the global timing signals required on chip. The modules in the input unit are briefly described in turn below. The control word shares a port with coefficient input 1 (coeff_in[31:0]), so the incoming control word must first be received; this is the task of the control word receiving module. This module consists mainly of a group of registers; under the enable signal supplied from off chip, the 64-bit control word is written into this group of registers in 2 or 4 passes. After the control word is received, the first-level decoding module decodes part of it into global control signals, which are used to generate on-chip timing and to provide synchronization for coefficients and data. The control word distribution module sends the decoded 64-bit control word to the other on-chip units. Because coefficient input 1 also carries the control word while coefficient input 2 does not, the two coefficient inputs have different delays, so a synchronization module must synchronize coefficient input 1 with coefficient input 2. Similarly, data input 1 and data input 2 differ somewhat in function, so the data from these two inputs must be synchronized according to the working mode and data type.
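Loading the 64-bit control word in 2 or 4 writes can be modeled as below. This is a sketch, not the register-level design: the most-significant-chunk-first order, the class name `ControlWordReceiver`, and the helper `split_control_word` are all assumptions made for illustration.

```python
def split_control_word(word64, chunk_bits):
    """Split a 64-bit control word into 32-bit or 16-bit writes.

    Sending the most significant chunk first is an assumption; the text does
    not state the order.
    """
    if chunk_bits not in (16, 32):
        raise ValueError("chunk_bits must be 16 or 32")
    if not 0 <= word64 < (1 << 64):
        raise ValueError("control word must fit in 64 bits")
    n = 64 // chunk_bits
    mask = (1 << chunk_bits) - 1
    return [(word64 >> (chunk_bits * (n - 1 - i))) & mask for i in range(n)]


class ControlWordReceiver:
    """Behavioral model of a register group filled under an external enable."""

    def __init__(self, chunk_bits):
        self.chunk_bits = chunk_bits
        self.writes_needed = 64 // chunk_bits
        self.register = 0
        self.count = 0

    def write(self, chunk, enable=True):
        """Latch one chunk when enabled; return the full word once complete, else None."""
        if not enable:
            return None
        self.register = ((self.register << self.chunk_bits) | chunk) & ((1 << 64) - 1)
        self.count += 1
        if self.count == self.writes_needed:
            word, self.register, self.count = self.register, 0, 0
            return word
        return None
```

A 32-bit port then needs 2 enabled writes and a 16-bit port needs 4, matching the two input timings listed for Fig. 11 and Fig. 12.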
The implementation of the output unit is shown in Fig. 10. Its main role is to sort the operation results of the basic units and output them in different formats. Filtering is completed inside the basic units, and the 4 basic units each output their own results, so the outputs of the 4 basic units must be sorted to guarantee that the final results are output in order. The "basic-unit output sorting" module works as follows: when the device operates in FFT/IFFT or FIR pulse-group mode, it arranges the outputs of the basic units in channel order; in correlation mode, each basic unit outputs in frames of 10 items with a gap between successive frames, and the module adjusts this format, converting the data output frame by frame from each basic unit into a continuous data stream at the same rate as the input data. The modulus module converts the real/imaginary output format into the modulus/phase-angle output format. The logarithm module converts the input modulus value into logarithmic representation. The floating-point/fixed-point conversion module converts the output of the real/imaginary module or the modulus module from floating-point format to fixed-point format. The exponent normalization module is similar in effect to the floating-point/fixed-point conversion module: it shifts the mantissa accordingly so that the exponents of the floating-point results are unified to a fixed value.
There are two sets of each of these 4 format conversion modules, one per chip output port, which guarantees that the two output ports can each independently output in any format.
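The format conversions can be sketched numerically as follows. The function names are hypothetical, the phase unit (radians) and the logarithm scale (20·log10) are assumptions because the text does not specify them, and mantissa width limits are not modeled.

```python
import cmath
import math


def to_modulus_phase(re, im):
    """Real/imaginary pair to (modulus, phase); phase in radians is an assumption."""
    return cmath.polar(complex(re, im))


def to_log(modulus, floor=1e-30):
    """Logarithmic form of a modulus; the 20*log10 (dB) scale is an assumption."""
    return 20.0 * math.log10(max(modulus, floor))


def normalize_exponent(mantissa, exponent, target_exponent):
    """Re-express mantissa * 2**exponent with a fixed exponent by shifting the mantissa.

    A right shift discards low-order bits, so precision can be lost; overflow of
    a fixed-width hardware mantissa is not modeled here.
    """
    shift = exponent - target_exponent
    if shift >= 0:
        return mantissa << shift, target_exponent
    return mantissa >> -shift, target_exponent
```

For example, `normalize_exponent(5, 3, 1)` gives `(20, 1)`, which represents the same value 40 with the exponent forced to 1.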
The data exchange unit is in effect a group of multi-input, multi-output switches. It receives 8 groups of inputs from the 4 basic units and uses selector switches to route any group of inputs to any group of outputs. Data can thus be passed freely among the 4 basic units, guaranteeing the data paths needed for the 4 basic units to cooperate when implementing a particular algorithm.
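The switching behavior amounts to a crossbar. A minimal sketch follows, assuming 8 outputs to match the 8 inputs (the text gives only the input count) and using a hypothetical `crossbar` function with a per-output selector list.

```python
def crossbar(inputs, select):
    """Route inputs to outputs: output j carries inputs[select[j]].

    Any input may feed any output, and one input may feed several outputs.
    """
    if any(not 0 <= s < len(inputs) for s in select):
        raise ValueError("selector index out of range")
    return [inputs[s] for s in select]


# Two input groups per basic unit, labeled by unit and port (labels are illustrative).
inputs = [f"u{u}.{p}" for u in range(4) for p in "AB"]

# Example setting: each output takes the data of the next unit's matching port.
rotate_units = [(i + 2) % 8 for i in range(8)]
outputs = crossbar(inputs, rotate_units)
```

Changing only the selector list re-routes the data, which is how a control setting could steer each FFT stage's results to a different basic unit.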
The 4 basic units are the main part of the invention and perform most of the functions, such as data buffering, computation, and address generation. Their structure is shown in Fig. 2. The 4 basic units share a uniform architecture. The "data memory" comprises 8 dual-port RAMs of 512 × 40, used for input data buffering in the operating modes, temporary storage of FFT intermediate results, and output buffering of FFT results. As the figure shows, the "data buffer" can feed data simultaneously to each "complex multiply-accumulator" and each "complex multiply-accumulator subarray". The "coefficient memory" comprises 10 dual-port RAMs of 256 × 32, used to store the filter coefficients for correlation and FIR filtering and the weighting coefficients for the FFT. The "FFT iteration coefficients" block is a fixed table implemented in dedicated logic that supplies the iteration coefficients for FFT/IFFT. Each "complex multiply-accumulator" in the figure corresponds to 4 real floating-point multiply-accumulators, and each "complex multiply-accumulator subarray" contains 16 real floating-point multiply-accumulators, equivalent to 4 "complex multiply-accumulators". The two "complex multiply-accumulator subarrays" and the "complex multiply-accumulators" can be combined in different ways according to the mode to complete different operations.
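The resource figures above are mutually consistent, as the following tally shows. The constant names are illustrative; the counts themselves (2 complex multiply-accumulators A and B, 2 subarrays A and B, and the RAM sizes) come from the description.

```python
BASIC_UNITS = 4
CMACS_PER_UNIT = 2              # complex multiply-accumulators A and B
SUBARRAYS_PER_UNIT = 2          # subarrays A and B
REAL_MACS_PER_CMAC = 4
REAL_MACS_PER_SUBARRAY = 16

real_macs_per_unit = (CMACS_PER_UNIT * REAL_MACS_PER_CMAC
                      + SUBARRAYS_PER_UNIT * REAL_MACS_PER_SUBARRAY)
total_real_macs = BASIC_UNITS * real_macs_per_unit

# Expressed in complex multiply-accumulator equivalents (4 real MACs each).
equivalent_cmacs_per_unit = real_macs_per_unit // REAL_MACS_PER_CMAC

# On-chip memory per basic unit, in bits.
data_ram_bits_per_unit = 8 * 512 * 40
coeff_ram_bits_per_unit = 10 * 256 * 32
```

The tally gives 40 real multiply-accumulators per basic unit and 160 in total, and the 10 complex equivalents per unit match the "10 in series" and "10 channels" figures given for the correlation and FIR modes.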
For FFT/IFFT, the "complex multiply-accumulator" acts as a weighting multiplier, weighting the input data that require multi-stage operation and then passing them to the "complex multiply-accumulator subarray" for radix-8 iteration. In other words, in FFT/IFFT mode "complex multiply-accumulator A" and "complex multiply-accumulator subarray A" in a basic unit form one radix-8 core, and "complex multiply-accumulator B" and "complex multiply-accumulator subarray B" form another, so the whole chip has 8 such cores, and an FFT/IFFT computation is decomposed across these 8 cores for parallel processing.
For FIR pulse-group processing, the multiply-accumulators in the 2 "complex multiply-accumulator" modules and the 2 "complex multiply-accumulator subarray" modules inside a basic unit are used in parallel, each complex multiply-accumulator corresponding to one channel of the FIR pulse group. With conjugate operation, each complex multiply-accumulator corresponds to 2 channels, and with one further level of multiplexing it can serve still more channels.
For correlation, the "complex multiply-accumulators" and "complex multiply-accumulator subarrays" are used in series, equivalent to 10 complex multiply-accumulators in series. For FIR filtering, they are used in parallel, corresponding to 10 filter channels. For FFT, the "complex multiply-accumulator" acts as a weighting multiplier, and the weighted data are passed to the "complex multiply-accumulator subarray" for iteration.
The invention performs complex arithmetic in a floating-point data format. A complex operation consists of 4 real operations, so the real floating-point multiply-accumulator is the core arithmetic component of the device. Its structure, shown in Fig. 3, is divided into 5 parts: fixed-point multiplication, truncation, exponent alignment, fixed-point addition, and exponent judgment.
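The statement that one complex operation consists of 4 real operations can be made concrete with the textbook expansion of a complex multiply-accumulate. The sketch below uses ordinary Python numbers rather than the device's floating-point format, and the names `complex_mac` and `fir_tap_sum` are illustrative.

```python
def complex_mac(acc, a, b):
    """Return acc + a*b for complex values given as (re, im) tuples.

    The product uses exactly four real multiply-accumulate operations.
    """
    acc_re, acc_im = acc
    a_re, a_im = a
    b_re, b_im = b
    acc_re = acc_re + a_re * b_re      # real MAC 1
    acc_re = acc_re - a_im * b_im      # real MAC 2
    acc_im = acc_im + a_re * b_im      # real MAC 3
    acc_im = acc_im + a_im * b_re      # real MAC 4
    return acc_re, acc_im


def fir_tap_sum(samples, coeffs):
    """Accumulate sample*coefficient products, as one FIR channel would per output."""
    acc = (0, 0)
    for s, c in zip(samples, coeffs):
        acc = complex_mac(acc, s, c)
    return acc
```

For instance, `complex_mac((0, 0), (1, 2), (3, 4))` returns `(-5, 10)`, the real and imaginary parts of (1+2j)(3+4j).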
The device's input operand data are 20 bits wide: a 4-bit unsigned exponent plus a 16-bit signed mantissa, giving a dynamic range of $-2^{15} \times 2^{15}$ to $2^{15} \times 2^{15} - 1$. Coefficients are 16-bit signed fixed-point numbers with a range of $-2^{15}$ to $2^{15} - 1$. For intermediate data, as a compromise between arithmetic precision and hardware area and speed, a 24-bit floating-point format is used: a 4-bit unsigned exponent plus a 20-bit signed mantissa. The multiply-accumulator operates in the following 5 steps.
1. Fixed-point multiplication: the 16-bit signed mantissa of the data is multiplied, in fixed point, by the 16-bit signed coefficient.
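The 20-bit operand format can be illustrated with a small encoder and decoder. This is a sketch under assumptions: placing the exponent in the upper 4 bits is a guess at the packing, the value is taken as mantissa × 2^exponent, and `encode20` and `decode20` are hypothetical names.

```python
def encode20(mantissa, exponent):
    """Pack a 16-bit signed mantissa and a 4-bit unsigned exponent into 20 bits.

    Exponent in the upper 4 bits is an assumed packing order.
    """
    if not -(1 << 15) <= mantissa < (1 << 15):
        raise ValueError("mantissa must fit in 16 signed bits")
    if not 0 <= exponent <= 15:
        raise ValueError("exponent must fit in 4 unsigned bits")
    return (exponent << 16) | (mantissa & 0xFFFF)


def decode20(word):
    """Return the integer value mantissa * 2**exponent of a 20-bit word."""
    exponent = (word >> 16) & 0xF
    mantissa = word & 0xFFFF
    if mantissa & 0x8000:          # two's-complement sign bit set
        mantissa -= 1 << 16
    return mantissa * (1 << exponent)
```

Under this interpretation the most negative value is −2^15 × 2^15 and the largest is (2^15 − 1) × 2^15, which lies within the dynamic range stated above.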
2. Truncation: this part preserves as much of the fixed-point product's precision as possible. Based on the number of redundant sign bits in the 32-bit product, it determines the 20-bit mantissa to keep for the 24-bit intermediate datum and the corresponding 4-bit exponent. If the 32-bit product has k redundant sign bits and only 20 bits can be kept, then to obtain maximum precision the k redundant sign bits should be removed (a left shift by k) while k is subtracted from the exponent. The mantissa then consists of one sign bit followed by 19 data bits. This is equivalent to applying the following two operations simultaneously to the 32-bit product $a_{31}a_{30}a_{29}\ldots a_{2}a_{1}a_{0} \times 2^{e}$:
Remove the redundant sign bits and truncate:
[Equation images A20061008639800171 and A20061008639800172 not reproduced]
Subtract k from the original exponent:
[Equation image A20061008639800173 not reproduced]
For the case e > k > 12, that is, when the left shift exceeds 12 bits, zeros are filled in at the low end. Note also that the exponent must not drop below 0 after k is subtracted. Truncation must therefore consider both the number of redundant sign bits and the size of the exponent: if the original exponent is smaller than the number of redundant sign bits, i.e. e < k, only e redundant sign bits can be removed and the exponent is reduced to 0:
Remove the redundant sign bits and truncate:
[Equation image A20061008639800182 not reproduced]
Reduce the exponent from its original value:
[Equation image not reproduced]
3. Exponent alignment: two floating-point numbers can have their mantissas added only when their exponents are equal. The exponent alignment part adjusts the exponent of one operand so that the two are equal. Suppose $A_1 = a_1 \times 2^{e_1}$ and $A_2 = a_2 \times 2^{e_2}$ are to be added and $e_1 < e_2$; then the exponent $e_1$ of $A_1$ is adjusted to the same value as $e_2$, while the mantissa of $A_1$ is sign-extended by $e_2 - e_1$ bits and shifted right by $e_2 - e_1$ bits.
4. Fixed-point addition: after exponent alignment the two operands have the same exponent, and their adjusted mantissas can be sign-extended and added in fixed point, giving a 21-bit fixed-point sum.
5. Exponent judgment: because the exponent alignment in step 3 always replaces the smaller exponent with the larger one, after the fixed-point addition the 21-bit sum must be truncated and its redundant sign bits removed, and the value of the exponent after the addition determined. This is the role of exponent judgment.
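Steps 2 to 5 above can be sketched as a behavioral model. This is not the patented circuit: the function names are hypothetical, the normalization policy in step 5 is inferred from the description, exponent overflow above 15 is not modeled, and the mantissa produced by truncation carries an implicit fixed scale of 2^12 because 12 low bits of the 32-bit product are dropped.

```python
MANT_BITS = 20   # intermediate mantissa width given in the text


def redundant_sign_bits(value, width):
    """Count sign-bit copies beyond the one needed in a width-bit two's-complement value."""
    magnitude_bits = value.bit_length() if value >= 0 else (~value).bit_length()
    return width - 1 - magnitude_bits


def truncate_product(product32, e):
    """Step 2: drop up to k redundant sign bits without pushing the exponent below 0,
    then keep the top 20 bits of the 32-bit product."""
    k = redundant_sign_bits(product32, 32)
    shift = min(k, e)                                   # only e bits when e < k
    mant = (product32 << shift) >> (32 - MANT_BITS)     # Python >> is arithmetic
    return mant, e - shift


def align(m1, e1, m2, e2):
    """Step 3: raise the smaller exponent to the larger, shifting that mantissa right."""
    if e1 < e2:
        m1 >>= e2 - e1
        e1 = e2
    elif e2 < e1:
        m2 >>= e1 - e2
    return m1, m2, e1


def add_and_normalize(m1, e1, m2, e2):
    """Steps 4-5: fixed-point add (up to 21 bits), then refit to 20 bits and settle the exponent."""
    a, b, e = align(m1, e1, m2, e2)
    s = a + b
    k = redundant_sign_bits(s, MANT_BITS + 1)
    if k == 0:               # the sum needs all 21 bits: drop one low bit, raise the exponent
        s >>= 1
        e += 1
    else:                    # the sum fits: remove spare sign bits while the exponent allows
        shift = min(k - 1, e)
        s <<= shift
        e -= shift
    return s, e
```

As a usage example, `truncate_product(0x123456, 15)` removes 10 redundant sign bits and returns `(298261, 5)`, while the same product with an exponent of 3 can give up only 3 bits and returns `(2330, 0)`.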
II. Organization of hardware resources
The invention is a reconfigurable dedicated digital signal processor. "Reconfigurable" means that the hardware resources can be organized by configuring different control words so that they operate in different modes. The invention can be configured into three classes of working mode: FFT/IFFT, FIR pulse-group processing, and correlation processing. Within each class, key parameters can be adjusted through the control word to meet different processing demands. The organization of hardware resources in each of the three classes is described below with reference to the drawings.
1. FFT/IFFT
The basic formula of the discrete Fourier transform (DFT) is:
$$X(k) = \sum_{i=0}^{N-1} w(i)\,x(i)\,W_N^{ik}, \qquad k = 0, 1, \ldots, N-1$$
where w(i) is the DFT weighting factor, x(i) is the input data, $W_N^{ik} = e^{-j 2\pi i k / N}$ is the twiddle factor, and N is the number of points of the DFT. If N is a composite number, a long DFT can be converted into two short DFTs of N1 and N2 points (N = N1 × N2). If N1 and N2 can themselves be decomposed, the decomposition can be continued and the operation count reduced further.
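The decomposition of a long DFT into shorter ones can be checked numerically. The Python below is not the device's radix-8 datapath; it is a generic two-stage (Cooley-Tukey style) decomposition for N = N1 × N2, with the function names `dft` and `dft_two_stage` chosen here purely for illustration, and it only shows that the two-stage result agrees with the direct formula.

```python
import cmath


def dft(x, w=None):
    """Direct windowed DFT: X(k) = sum_i w(i) x(i) exp(-j 2 pi i k / N)."""
    n = len(x)
    if w is None:
        w = [1.0] * n                      # rectangular window
    return [
        sum(w[i] * x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
        for k in range(n)
    ]


def dft_two_stage(x, n1, n2):
    """Same transform computed as n2 DFTs of length n1, then n1 DFTs of length n2."""
    n = n1 * n2
    assert len(x) == n
    # Stage 1: one short DFT per residue class i2 of the input index.
    inner = [dft(x[i2::n2]) for i2 in range(n2)]          # inner[i2][k1]
    out = [0j] * n
    for k1 in range(n1):
        # Twiddle factors W_N^(i2*k1) link the two stages.
        column = [
            inner[i2][k1] * cmath.exp(-2j * cmath.pi * i2 * k1 / n)
            for i2 in range(n2)
        ]
        # Stage 2: short DFT of length n2 over i2.
        for k2, value in enumerate(dft(column)):
            out[k1 + n1 * k2] = value
    return out
```

With N1 = N2 = 8 this corresponds to two stages of 8-point transforms for a 64-point DFT; the direct form needs N² complex multiplications, the two-stage form roughly N·(N1 + N2).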
Depending on the decomposition, FFT algorithms include radix-2, radix-4, radix-8, mixed-radix and others. The FFT/IFFT of the invention uses radix-8 operations. This operating mode is subdivided into three types, FFT, IFFT and FFT+IFFT, and the number of points can be 256, 512, 1024, 2048 or 4096. When the number of points is less than or equal to 2048, FFT or IFFT processing can also be carried out on two data streams simultaneously. Whatever the number of points, in this class of operating mode the 4 basic units (basic_unit) of the device are all configured as shown in Fig. 4. The "complex multiply-accumulate sub-array A" and "complex multiply-accumulate sub-array B" of Fig. 2 then each serve as one iteration node of the radix-8 algorithm, and the 4 complex multiply-accumulators in each sub-array complete one stage of radix-8 iteration. The "complex multiply-accumulator A" and "complex multiply-accumulator B" in front of the sub-arrays act as windowing units and apply the window to the data. The windowed data enter the "complex multiply-accumulate sub-array" for one stage of radix-8 iteration. The output of each iteration stage is first written to the data buffer and then passed through the data exchange unit of Fig. 1 to the other "complex multiply-accumulate sub-array" for the next iteration stage. The hardware organization in FFT/IFFT mode is described below for a single basic unit.
When the GA3816 device performs an FFT of more than 256 points, it works internally in a time-division multiplexed manner; the larger the number of points, the more multiplexing passes are needed. The data memory is configured as a ping-pong data buffer. The total data memory capacity of the device is 32 × 256 × 40 bits, of which 2 × 4 × 256 × 40 bits are assigned to each basic unit as data buffer. Each basic unit sets aside half of this address space, i.e. 4 × 256 × 40 bits, as the data input buffer. In this way the 4 basic units of a single device can complete an FFT of up to 4096 points.
When an FFT is performed, the input data must be windowed to suppress side lobes. The GA3816 device contains 40 × 256 × 32 bits of dual-port RAM as coefficient memory, of which 10 × 256 × 32 bits are assigned to each basic unit. In each basic unit, 8 × 256 × 32 bits serve as the window-function buffer. This buffer is divided into two groups, each with an addressing depth of 1024, which supply the window function to the windowing unit in front of each "complex multiply-accumulate sub-array".
Because the twiddle factors of the FFT/IFFT are regular, the invention stores the twiddle factors of the radix-8 FFT in fixed tables. The largest FFT/IFFT that a single chip can perform is 4096 points. The chip contains 8 twiddle-factor tables of 512 × 32 bits, distributed evenly among the four basic units (basic_unit). If the number of points is less than 4096, the twiddle factors are obtained by decimating this table. In addition, a coefficient table of 4 × 8 × 32 bits stores the coefficients used in the radix-8 operation itself.
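Decimating a single full-size table works because W_N^k = W_4096^(k·4096/N) whenever N divides 4096. The sketch below illustrates that index arithmetic only; the table is held as Python complex numbers, whereas the 32-bit entry format and the split into 8 physical tables on the chip are not modelled, and the names `TABLE` and `twiddle` are hypothetical.

```python
import cmath

MAX_POINTS = 4096  # 8 tables x 512 entries in the text

# One full turn of the unit circle, sampled for the largest supported FFT.
TABLE = [cmath.exp(-2j * cmath.pi * k / MAX_POINTS) for k in range(MAX_POINTS)]


def twiddle(points: int, k: int) -> complex:
    """Return W_points^k by reading every (4096/points)-th table entry."""
    if MAX_POINTS % points != 0:
        raise ValueError("points must divide 4096")
    stride = MAX_POINTS // points
    return TABLE[(k * stride) % MAX_POINTS]
```

A 1024-point transform, for instance, reads every fourth entry of the table.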
When the FFT size is less than or equal to 2048 points, the invention provides a second hardware organization for FFT/IFFT, in which two data streams are transformed simultaneously. In this dual-input mode the two basic units basic_unit0 and basic_unit2 form one group, which processes the first stream; the two basic units basic_unit1 and basic_unit3 form the other group, which processes the second stream. Although the hardware resources are divided into two groups, the principle of the radix-8 operation in each group is unchanged. The two data streams enter simultaneously through data input port 1 and data input port 2, and the two sets of results are output in parallel from output port 1 and output port 2. This mode can be used to apply an FFT/IFFT to two independent data sets at the same time, or to apply a two-dimensional FFT/IFFT to one data set.
2. FIR pulse-group processing
The basic function performed by FIR pulse-group processing is a matrix operation, whose basic formula is: Y=H*X
where:
Figure A20061008639800212
In implementation, the matrix operation above can be regarded as a set of multiply-accumulate operations. Considering separately one row of the H matrix (hereinafter also called the "coefficients") and one column of the X matrix (hereinafter also called the "data"), with complex operands, the basic operation is a multiply-accumulate, expressed as:
FIR pulse-group processing thus consists in total of i groups of multiply-accumulate operations, each group comprising j multiplications and j-1 additions.
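The multiply-accumulate view of Y = H*X can be written out directly. The sketch below treats X as a single column of complex data, which is a simplifying assumption, and the function name `pulse_group_filter` is hypothetical; each row of H produces one output through j multiplications and j-1 additions.

```python
def pulse_group_filter(h_rows, x):
    """Compute one output per coefficient row by multiply-accumulate."""
    outputs = []
    for row in h_rows:
        if len(row) != len(x):
            raise ValueError("each coefficient row must match the data length")
        acc = row[0] * x[0]                 # first product, no addition yet
        for h, value in zip(row[1:], x[1:]):
            acc += h * value                # j-1 further multiply-adds
        outputs.append(acc)
    return outputs
```

For H = [[1, 1j], [2, 0]] and x = [1+1j, 2] the two outputs are 1+3j and 2+2j.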
Besides the basic form of FIR pulse-group processing, the invention proposes several extended forms: sliding FIR pulse-group processing, FIR pulse-group processing of two parallel data groups, the two-channel-addition form of FIR pulse-group processing, and so on. For the basic form and the extended forms alike, in this class of operating mode the hardware resources in the 4 basic units (basic_unit) are all configured as shown in Fig. 5.
In the organization of Fig. 5, the 4 complex multiply-accumulators in "complex multiply-accumulate sub-array A" together with "complex multiply-accumulator A" form "parallel multiply-accumulate array 1"; the 4 complex multiply-accumulators in "complex multiply-accumulate sub-array B" together with "complex multiply-accumulator B" form "parallel multiply-accumulate array 2". In a parallel multiply-accumulate array, one complex multiply-accumulator handles 1 channel of FIR pulse-group processing (2 channels in the conjugate case), so one array can process 5 channels in parallel (10 in the conjugate case) and one basic unit can process 10 channels in parallel (20 in the conjugate case). Therefore, when the input data rate equals the chip's operating clock, the 4 basic units can process at most 80 channels in parallel (4 × 20, in the conjugate case). If the parallel multiply-accumulate arrays are multiplexed, at most 128 channels of FIR pulse-group processing can be completed.
The coefficients for FIR pulse-group processing are stored in the coefficient memory shown in Fig. 2. Each basic unit has 10 dual-port RAMs of 256 × 32 bits for coefficient storage. Each multiplier unit has its own dual-port RAM, which supplies coefficients to one particular complex multiply-accumulator of the parallel multiply-accumulate array. A multiply-accumulator together with the dual-port RAM that supplies its coefficients constitutes one channel of FIR pulse-group processing.
Each basic unit (basic_unit) has 8 dual-port RAMs of 512 × 40 bits as data memory. In FIR pulse-group processing all 8 dual-port RAMs are used as data buffer. The data buffers of all basic units receive the same data from the data input port. The data are then applied to the data input of each multiply-accumulator and multiply-accumulated with the coefficients of the different channels.
The second hardware organization for FIR pulse-group processing handles two parallel groups of data. In this organization the first basic unit (basic_unit0) and the third basic unit (basic_unit2) form one group, and the second basic unit (basic_unit1) and the fourth basic unit (basic_unit3) form the other. The parallel multiply-accumulate arrays of the first group process the data entering through the "data input 1" port, and those of the second group process the data entering through the "data input 2" port. The data memory and coefficient memory of the first group store the first group of data and coefficients; those of the second group store the second group of data and coefficients. A single device can thus apply FIR pulse-group processing to two different groups of data in parallel.
The mode just described processes two groups of data in parallel: the hardware resources of the whole chip are divided into two groups, each of which processes one group of data on its own. Similarly, when the number of channels is less than or equal to 40 and the filter length is greater than or equal to 80, the data to be processed can be divided into two equal segments at half the filter length in order to increase speed. The first segment enters through data input port 1 and the second through data input port 2. The two groups of on-chip hardware resources then multiply and accumulate their own segments simultaneously, and the two multiply-accumulate results are finally added to obtain the complete result. This is the third hardware organization in the FIR pulse-group operating mode.
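The third organization relies on the fact that a long multiply-accumulate can be split into two halves whose partial sums are added at the end. A minimal sketch, with the hypothetical function name `split_multiply_accumulate` and plain Python numbers standing in for the device's floating-point format:

```python
def split_multiply_accumulate(data, coeffs):
    """Accumulate the two halves separately, then add the partial sums."""
    if len(data) != len(coeffs):
        raise ValueError("data and coefficients must have equal length")
    half = len(coeffs) // 2
    # First half: the segment that would enter through data input port 1.
    first = sum(h * x for h, x in zip(coeffs[:half], data[:half]))
    # Second half: the segment that would enter through data input port 2.
    second = sum(h * x for h, x in zip(coeffs[half:], data[half:]))
    return first + second
```

Since the two halves are independent until the final addition, each can run on its own hardware group, which roughly halves the accumulation time for a filter of length 80 or more.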
3. Correlation processing
Correlation processing takes N consecutive data, in a sliding manner, from a continuously sampled data sequence and applies a filtering operation to them. Its characteristic is that two adjacent filtering operations share N-1 identical data. Its mathematical expression is:
$$y(n) = \sum_{i=0}^{N-1} h_i\, x_{n+i}$$
where N is the filter length, x_i is the input signal, and h_i is the corresponding filter coefficient.
The hardware configuration of a basic unit in the correlation operating mode is shown in Fig. 6. The multiply-accumulators inside the basic unit are now configured in cascade. Each basic unit contains 10 floating-point complex multiply-accumulators. The output of the first complex multiply-accumulator is fed to the second, where it is added to the second multiplication result; the sum is the output of the second complex multiply-accumulator and is fed to the third as an addend of the third addition; and so on, so that the 10 complex multiply-accumulators are used in cascade. Each basic unit (basic_unit) can thus be organized as a cascade of 10 complex multiply-accumulators. By multiplexing this "cascaded multiply-accumulate array" a different number of times, correlations with a filter length of 10*N points (N = 1, 2, 3, …, 100) can be performed. The 4 basic units are cascaded in the same way: the partial multiply-accumulate sum obtained by one basic unit is passed to the next, where it serves as the addend of the first complex multiply-accumulator of that unit's cascaded multiply-accumulate array. The 4 basic units accumulate in cascade one after another, and the final correlation result is produced in the fourth basic unit (basic_unit3).
Every basic unit of the device has data memory of the same capacity: 8 dual-port RAMs of 512 × 40 bits. In correlation mode the data memory of each basic unit is uniformly addressed and used as data buffer. When data are input, the data buffers of all basic units store the same data. During operation, the data buffer supplies the same datum to the 10 cascaded complex multiply-accumulators of a basic unit on the same clock cycle.
The correlation coefficients are held in the 10 dual-port RAMs of 256 × 32 bits in each basic unit. Within a basic unit, each 32-bit dual-port RAM corresponds to one complex multiply-accumulator of the cascaded multiply-accumulate array and supplies coefficients to that multiply-accumulator. The coefficient sequence is stored in these 10 dual-port RAMs in the order h_n, h_{n+1}, …, h_{n+9}, one coefficient in each of the 1st to 10th RAM. In other words, within one dual-port RAM the coefficients stored at address n and address n+1 have indices that differ by 10. This layout matches the 10 complex multiply-accumulators of the cascaded array: its pipeline performs 10 multiply-accumulate operations in the same clock cycle, so the coefficient buffer must supply 10 consecutive coefficients in that cycle, and the coefficients are stored accordingly.
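The storage rule amounts to distributing coefficient h_n to RAM number n mod 10 at address n div 10. A small sketch, with the hypothetical function name `interleave_coefficients` and lists standing in for the dual-port RAMs:

```python
def interleave_coefficients(coeffs, rams=10):
    """Place coefficient n in bank n % rams at address n // rams."""
    banks = [[] for _ in range(rams)]
    for index, h in enumerate(coeffs):
        banks[index % rams].append(h)
    return banks
```

Reading the same address from all 10 banks then yields 10 consecutive coefficients, one for each multiply-accumulator of the cascade.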
III. Scheduling of the hardware resources
The invention schedules its hardware resources with a two-level architecture, and its control method combines centralized control with distributed control. Fig. 7 shows the two-level scheduling framework of the device. Control words and control signals first enter the "global control" module, which performs the first level of decoding and distributes the control word. This module receives the control word and generates a number of global control signals from it. These global controls coordinate the actions of the 4 basic units, determine the output format of the results, set the on-chip master clock, and so on. The second function of the "global control" module is to send control information to the "local control" module of each basic unit: the control words related to the operation parameters are decoded in the global module and sent to each basic unit. The information sent includes the operating mode, the subtype of the operating mode, the number of points, the number of channels, the number of channels processed, and so on.
After receiving the information sent by the global control module, the local control module in each basic unit (basic_unit) performs the second level of decoding and converts it into detailed hardware scheduling signals. For example, the operating mode and its type determine the state of certain selector switches; the number of channels processed determines how the coefficients are stored and how the coefficient memory addresses are generated; the number of points determines how many times the multiply-accumulate array is multiplexed, and hence whether and when a given register is updated. Dividing the scheduling into two levels keeps the control logic clear and economical and makes the implementation easier to follow.
Inside the basic unit, key parameters are decoded by arithmetic operations. For FFT/IFFT, FIR pulse-group processing and correlation alike, the storage and access addresses of data and coefficients, the multiplexing count of the multiply-accumulate array and similar quantities all follow regular patterns. On the basis of these patterns the control word can be decoded arithmetically to obtain all control information and timing signals. For example, in an FIR pulse-group operation with a filter order of 40 and 40 points, a dedicated decoding multiplier calculates that 40 × 40 = 1600 coefficients are needed in total. When the coefficient write addresses are generated, write addresses 0 to 1599 are produced in sequence and the coefficients are written to the corresponding buffer. A further reason for arithmetic decoding is the trade-off between scheduling complexity and resource usage. First, an 8 × 8 fixed-point multiplier does not occupy many resources. Second, if every operating state of the processor were enumerated and decoded by table lookup, the work would be tedious and very large, and the storage cells and lookup logic would also occupy considerable resources.
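The example in the text reduces to one multiplication followed by a counter. The sketch below shows that arithmetic only; the parameter names `filter_order` and `points` and the function name are hypothetical, and the mapping of a sequential address onto a particular RAM is not modelled.

```python
def coefficient_write_addresses(filter_order: int, points: int) -> range:
    """Derive the write-address sequence from two control-word parameters."""
    total = filter_order * points      # the job of the decoding multiplier
    return range(total)                # addresses 0 .. total-1, in order
```

For a filter order of 40 and 40 points this gives 1600 addresses, 0 to 1599, without any lookup table.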
The control word that the invention uses for hardware resource scheduling has 64 bits. Its interpretation in the FFT/IFFT, FIR pulse-group and correlation operating modes is given in Table 1, Table 2 and Table 3 respectively.
Control word | Name | Meaning and value
chip_id(3:0) | Cascade position | Not used; may be tied to 0000.
chip_num(3:0) | Number of cascaded chips | Not used; may be tied to 0000.
chan_num(7:0) | Number of channels | Not used; may be tied to 00000000.
work_model(4:0) | Operating mode | FFT: 10000; IFFT: 10010; FFT+IFFT: 10011; dual-input FFT: 10100; dual-input IFFT: 10110.
coeff_num(4:0) | Number of coefficient groups | Not used; may be tied to 00000.
coeff_chan(4:0) | Coefficient channel | Not used; may be tied to 00000.
conj | Conjugate select | Not used; may be set to '0'.
length(7:0) | Filter length | The low four bits length(3:0) give the data length of the FFT/IFFT (also the number of points in one group of weighting coefficients). The length is 2 raised to the positive integer value of length(3:0). The length codes currently supported are: "1000", 256 points; "1001", 512 points; "1010", 1024 points; "1011", 2048 points; "1100", 4096 points. The positive integer value of length(6:4) plus 1 is the number of stages needed to split a long FFT/IFFT into short radix-8 operations, and depends on the length. 256 to 512 points need three stages, code "010"; 1024 to 4096 points need four stages, code "011". The top bit length(7) is currently unused and is set to '0'.
sel_coeff_ram(4:0) | Coefficient base address | In FFT/IFFT mode the coefficient group can be exchanged during operation. The integer value of sel_coeff_ram(4:0) indicates which group of coefficients in the coefficient memory is to be used. The number of selectable groups depends on the number of points: 4096 points, 2 groups; 2048 points, 4 groups; 1024 points, 8 groups; 512 points, 16 groups; 256 points, 32 groups. sel_coeff_ram therefore ranges from "00000" to "11111".
sel_data | Weighting enable | In FIR pulse-group mode, selects whether the input data are weighted. Always '0' in FFT mode.
sel_coeff_pp | Coefficient ping-pong | Whether coefficients are written in ping-pong fashion in dual-input mode. '0': no ping-pong; '1': ping-pong.
result1_mod(1:0) | Output port 1 output format | Final result output port 1 supports 4 output formats. 00: I/Q data; 01: magnitude and phase; 10: logarithm and phase; 11: correlation cascade data.
sel_AGC1(1:0) | Output port 1 gain control | Final result output port 1 can output I and Q each as a 20-bit floating-point value in the format of the input data, can normalize the floating-point values to the exponent given by control word AGC1(3:0), or can convert I/Q to 20-bit fixed-point numbers. 00 or 11: floating-point output; 01: exponent-normalized output; 10: fixed-point output.
AGC1(3:0) | Output port 1 normalization level | When sel_AGC1 = 01 or 10 a reference exponent must be specified. AGC1(3:0) is that reference exponent and is used for both exponent-normalized output and fixed-point output.
result2_mod(1:0) | Output port 2 output format | As for result1_mod(1:0). 00: I/Q data; 01: magnitude and phase; 10: logarithm and phase; 11: correlation cascade data.
sel_AGC2(1:0) | Output port 2 gain control | As for sel_AGC1(1:0). 00 or 11: floating-point output; 01: exponent-normalized output; 10: fixed-point output.
AGC2(3:0) | Output port 2 normalization level | As for AGC1(3:0). When sel_AGC2 = 01 or 10, the AGC2 value is the reference exponent for exponent normalization or floating-to-fixed conversion of the output.
Table 1
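The encoding of the length field in FFT/IFFT mode follows directly from Table 1. The sketch below packs length(7:0) for a supported number of points and also reproduces the number of exchangeable coefficient groups. The function names are hypothetical, and the constant 8192 is inferred from the table values (groups × points is the same in every row) rather than stated in the text.

```python
SUPPORTED_POINTS = (256, 512, 1024, 2048, 4096)


def encode_fft_length(points: int) -> int:
    """Build length(7:0): bits 3:0 = log2(points), bits 6:4 = stages - 1."""
    if points not in SUPPORTED_POINTS:
        raise ValueError("unsupported FFT size")
    log2_points = points.bit_length() - 1
    stages = (log2_points + 2) // 3        # each radix-8 stage resolves 3 bits
    return ((stages - 1) << 4) | log2_points   # bit 7 stays 0


def coefficient_groups(points: int) -> int:
    """Number of window-coefficient groups selectable by sel_coeff_ram."""
    if points not in SUPPORTED_POINTS:
        raise ValueError("unsupported FFT size")
    return 8192 // points                  # inferred constant, see lead-in
```

For 1024 points this gives the bit pattern 0 011 1010, i.e. four stages and a length code of "1010".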
Control word | Name | Meaning and value
chip_id(3:0) | Cascade position | Always 0000 in FIR mode.
chip_num(3:0) | Number of cascaded chips | Always 0000 in FIR mode.
chan_num(7:0) | Number of channels | Number of channels of the FIR pulse-group operation, set according to the application. For example, for 50 parallel channels, chan_num+1=50.
work_model(4:0) | Operating mode | FIR pulse-group mode: 01000; sliding FIR pulse-group mode: 00010; FIR pulse-group mode with two parallel data groups: 00100; two-channel-addition FIR pulse-group mode: 00001. For DFT implemented by FIR pulse-group processing the value is 01000, as in the ordinary FIR pulse-group mode, and control word sel_data must be set to '1' at the same time.
coeff_num(4:0) | Number of coefficient groups | Always 00000 in FIR pulse-group mode.
coeff_chan(4:0) | Coefficient channel | Always 00000 in FIR pulse-group mode.
conj | Conjugate select | Whether the coefficients are conjugated in FIR pulse-group mode. If the filter bank uses conjugate coefficients between channels, the effective operation speed is doubled. '0': no conjugation; '1': conjugation.
length(7:0) | Filter length | Filter length in FIR pulse-group mode; filter length N=length+1.
sel_coeff_ram(4:0) | Coefficient base address | Base address for exchanging coefficients during operation in FIR pulse-group mode. The upper 2 bits are always 00; the lower 3 bits indicate which region of the coefficient memory is used.
sel_data | Weighting enable | In FIR pulse-group mode, selects whether the input data are weighted. '0': no weighting; '1': weighting. This control word is '1' if and only if the chip implements a DFT in FIR pulse-group mode.
sel_coeff_pp | Coefficient ping-pong | Fixed at '0' in FIR pulse-group mode.
result1_mod(1:0) | Output port 1 output format | Final result output port 1 supports 4 output formats. 00: I/Q data; 01: magnitude and phase; 10: logarithm and phase; 11: correlation cascade data.
sel_AGC1(1:0) | Output port 1 gain control | Final result output port 1 can output I and Q each as a 20-bit floating-point value in the format of the input data, can normalize the floating-point values to the exponent given by control word AGC1(3:0), or can convert I/Q to 20-bit fixed-point numbers. 00 or 11: floating-point output; 01: exponent-normalized output; 10: fixed-point output.
AGC1(3:0) | Output port 1 normalization level | When sel_AGC1 = 01 or 10 a reference exponent must be specified. AGC1(3:0) is that reference exponent and is used for both exponent-normalized output and fixed-point output.
result2_mod(1:0) | Output port 2 output format | As for result1_mod(1:0). 00: I/Q data; 01: magnitude and phase; 10: logarithm and phase; 11: correlation cascade data.
sel_AGC2(1:0) | Output port 2 gain control | As for sel_AGC1(1:0). 00 or 11: floating-point output; 01: exponent-normalized output; 10: fixed-point output.
AGC2(3:0) | Output port 2 normalization level | As for AGC1(3:0). When sel_AGC2 = 01 or 10, the AGC2 value is the reference exponent for exponent normalization or floating-to-fixed conversion of the output.
Table 2
Control word | Name | Meaning and value
chip_id(3:0) | Cascade position | When several chips are cascaded, the position of this chip in the cascade chain. For example, if this chip is the 3rd in the chain, chip_id+1=3. For single-chip use, chip_id is always 0000.
chip_num(3:0) | Number of cascaded chips | When several chips are cascaded, the total number of devices in the cascade chain. For example, if 3 devices are cascaded, chip_num+1=3. For single-chip use, chip_num is always 0000.
chan_num(7:0) | Number of channels | Value between 00000000 and 00001111, set according to the application. For example, for 7 channels operating in parallel, chan_num+1=7.
work_model(4:0) | Operating mode | 11000 in correlation mode.
coeff_num(4:0) | Number of coefficient groups | Total number of coefficient groups needed in multichannel operation. The value must not exceed the current number of channels chan_num(7:0). For example, with 3 groups of coefficients, coeff_num+1=3.
coeff_chan(4:0) | Coefficient channel | Correspondence between coefficient groups and channels, used together with the number of coefficient groups coeff_num and the number of channels chan_num. The relationship among the three is coeff_chan+1=[(chan_num+1)/(coeff_num+1)]+1, i.e. (coeff_chan+1) is (chan_num+1) divided by (coeff_num+1), rounded down, plus 1. For example, if chan_num+1=7 and coeff_num+1=3, then coeff_chan+1=3. The meaning of coeff_chan+1 is the number of channels served by each group of coefficients. In the example above, with 7 channels and 3 groups of coefficients, coeff_chan+1 is 3: channels 1, 2 and 3 use coefficient group 1, channels 4, 5 and 6 use group 2, and channel 7 uses group 3.
conj | Conjugate select | Always 0 in correlation mode.
length(7:0) | Filter length | The filter length of the correlation (also the number of points in one group of coefficients). If the window length is X, then X=40*(length+1).
sel_coeff_ram(4:0) | Coefficient read base address | In correlation mode the coefficient memory is divided into two regions, so that the coefficients can be exchanged in real time during operation. The upper 4 bits of sel_coeff_ram(4:0) are always 0000; the lowest bit indicates which region's coefficients are used.
sel_data | Weighting enable | Fixed at '0' in correlation mode.
sel_coeff_pp | Coefficient ping-pong | Fixed at '0' in correlation mode.
result1_mod(1:0) | Output 1 mode | 00: result1 outputs I/Q directly; 01: result1 outputs magnitude and phase; 10: result1 outputs logarithm and phase; 11: result1 is used for correlation cascade.
sel_AGC1(1:0) | Output 1 gain control select | 00 or 11: no gain processing of result1; 01: exponent normalization of result1; 10: floating-to-fixed conversion of result1.
AGC1(3:0) | Output 1 gain | When sel_AGC1 is 01 or 10, the exponent of floating-point result 1 is normalized according to the value of AGC1.
result2_mod(1:0) | Output 2 mode | 00: result2 outputs I/Q directly; 01: result2 outputs magnitude and phase; 10: result2 outputs logarithm and phase; 11: result2 is used for correlation cascade.
sel_AGC2(1:0) | Output 2 gain control select | 00 or 11: no gain processing of result2; 01: exponent normalization of result2; 10: floating-to-fixed conversion of result2.
AGC2(3:0) | Output 2 gain | When sel_AGC2 is 01 or 10, the exponent of floating-point result 2 is normalized according to the value of AGC2.
Table 3
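The relationship between chan_num, coeff_num and coeff_chan in Table 3 can be checked with a short sketch. It applies the formula exactly as the table states it and reproduces the 7-channel, 3-group example; the function names are hypothetical and channels are numbered from 1, as in the table.

```python
def coeff_chan_field(channels: int, groups: int) -> int:
    """Value of coeff_chan, from coeff_chan + 1 = floor(channels / groups) + 1."""
    return channels // groups


def group_of_channel(channel: int, channels: int, groups: int) -> int:
    """1-based coefficient group used by a 1-based channel number."""
    per_group = coeff_chan_field(channels, groups) + 1   # channels per group
    return (channel - 1) // per_group + 1
```

For 7 channels and 3 groups, channels 1 to 3 map to group 1, channels 4 to 6 to group 2, and channel 7 to group 3.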
64 control words are by coefficient input port coeff_in input, input mode is divided into two kinds of 32 inputs and 16 inputs,, respectively do for oneself at control_en1, control_en2 and send into respectively when hanging down as enabling by pin control_en1, control_en2. During 32 inputs, the function control word is divided two groups, take whole 32 of coeff_in (31:0), the low level of control_en1, control_en2 continues 1 coeff_en cycle respectively, being low coeff_en send the function control word in the cycle front 32 at control_en1, is low coeff_en send control word in the cycle rear 32 at control_en2. During 16 inputs, the function control word takies the high 16 of coeff_in (31:0), the low level of control_en1, control_en2 continues 2 coeff_en cycles, being that 2 low coeff_en send front 32 of function control word in the cycle at control_en1, is 2 low coeff_en send the function control word in the cycle rear 32 at control_en2. Sequential Figure 11 of 32 control word input timings and 16 control word inputs, shown in Figure 12.
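The two input modes can be pictured as a list of bus transfers, each tagged with the enable pin that is low during it. The sketch below is an illustration only. It assumes that the "first 32 bits" are the most significant half of the 64-bit word and that, in 16-bit mode, the upper 16 bits of each half are sent first; the text fixes neither ordering, and the function name is hypothetical.

```python
def control_word_transfers(word: int, bus_bits: int) -> list[tuple[str, int]]:
    """Return (active-low enable pin, coeff_in(31:0) value) per coeff_en cycle."""
    if bus_bits not in (16, 32):
        raise ValueError("bus_bits must be 16 or 32")
    halves = [
        ("control_en1", (word >> 32) & 0xFFFFFFFF),   # assumed: upper half first
        ("control_en2", word & 0xFFFFFFFF),
    ]
    transfers = []
    for pin, half in halves:
        if bus_bits == 32:
            transfers.append((pin, half))             # one cycle per half
        else:
            # Two cycles per half, data on the upper 16 bits of coeff_in.
            transfers.append((pin, half & 0xFFFF0000))
            transfers.append((pin, (half & 0xFFFF) << 16))
    return transfers
```

In 32-bit mode the word takes two coeff_en cycles in total; in 16-bit mode it takes four.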
On a general-purpose signal processing module that uses the present invention as its main processor, a 1024-point FFT, a 10th-order FIR pulse-group processing operation, and a 360-point correlation of a linear FM signal are carried out. The corresponding control word settings are shown in Table 4, Table 5, and Table 6, respectively.
Control word Title Meaning and value
  work_model(4:0) Operating mode FFT: the value is 10000.
  length(7:0) Filter length The low four bits, length(3:0), give the data length of the FFT/IFFT operation (also the number of points in one group of weighting coefficients). The length is 2 raised to the power of the positive integer value of length(3:0). The value here is "1010", corresponding to 1024 points. The positive integer value of length(6:4) plus 1 gives the number of stages needed when an ultra-long FFT/IFFT is decomposed into radix-8 operations; this value depends on the operation length. A 1024-point operation requires four stages, so the code here is "011". The most significant bit, length(7), is set to '0'.
  result1_mod(1:0) Output port 1 output mode Final-result output port 1 supports 4 output modes. The control word is 01 here: output magnitude and phase angle.
  sel_AGC1(1:0) Output port 1 gain control Final-result output port 1 can output I and Q each as a 20-bit floating-point number in the same format as the input data; it can output a floating-point value normalized to the reference given by control word AGC1(3:0); or it can convert I and Q to 20-bit fixed-point numbers. The value here is 10: fixed-point output.
  AGC1(3:0) Output port 1 normalization level When sel_AGC1 = 01 or 10, a reference exponent value must be specified. AGC1(3:0) is that reference exponent and is used for both exponent-normalized output and fixed-point output. The reference here is 1011.
Table 4
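The encoding of length(7:0) in FFT mode, as described in Table 4, can be checked with a short Python sketch. The function name `decode_fft_length` is hypothetical; the bit-field rules are taken from the table.

```python
def decode_fft_length(length):
    """Decode length(7:0) in FFT/IFFT mode per the rules in Table 4.

    Returns (points, stages):
      points = 2 ** length(3:0)
      stages = length(6:4) + 1   (radix-8 decomposition stages)
    """
    if not 0 <= length <= 0xFF:
        raise ValueError("length must be an 8-bit value")
    points = 1 << (length & 0xF)
    stages = ((length >> 4) & 0x7) + 1
    return points, stages


if __name__ == "__main__":
    # length(7) = 0, length(6:4) = 011, length(3:0) = 1010
    print(decode_fft_length(0b00111010))
```

For the 1024-point example the full field is 0 011 1010, which decodes to 1024 points and four stages.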
Control word Title Meaning and value
  chan_num(7:0) Channel count Number of channels for the FIR pulse-group operation. Rule: number of channels = chan_num + 1. This is a 10th-order FIR pulse-group operation, so chan_num is 00001001.
  work_model(4:0) Operating mode The value for the FIR pulse-group operating mode is 01000, so it is set to 01000 here.
  conj Conjugate select Whether the coefficients are conjugated in FIR pulse-group mode. If the filter bank uses conjugate coefficients between channels, the operating speed is effectively doubled. Set to '1' here: conjugation is selected.
  length(7:0) Filter length Filtering length in FIR pulse-group mode; filter length N = length + 1. This is a 10th-order FIR pulse-group operation, so length is set to 00001001.
  result1_mod(1:0) Output port 1 output mode Final-result output port 1 supports 4 output modes. The value here is 00: output I/Q data.
  sel_AGC1(1:0) Output port 1 gain control Final-result output port 1 can output I and Q each as a 20-bit floating-point number in the same format as the input data; it can output a floating-point value normalized to the reference given by control word AGC1(3:0); or it can convert I and Q to 20-bit fixed-point numbers. sel_AGC1(1:0) is 10 here: fixed-point output.
  AGC1(3:0) Output port 1 normalization level When sel_AGC1 = 01 or 10, a reference exponent value must be specified. AGC1(3:0) is that reference exponent and is used for both exponent-normalized output and fixed-point output. The reference AGC1 is 0111 here.
Table 5
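Both chan_num and length in Table 5 use an offset-by-one encoding (stored value = count - 1). A minimal Python sketch of that rule follows; the helper name `encode_minus_one` is hypothetical.

```python
def encode_minus_one(count, width=8):
    """Encode a count as the binary string of (count - 1), zero-padded.

    Mirrors the rules "number of channels = chan_num + 1" and
    "filter length N = length + 1" from Table 5.
    """
    if not 1 <= count <= (1 << width):
        raise ValueError("count out of range for field width")
    return format(count - 1, "0{}b".format(width))


if __name__ == "__main__":
    print(encode_minus_one(10))  # chan_num / length for the 10-tap example
```

Storing the count minus one lets an 8-bit field represent 1 to 256 rather than 0 to 255, so no code is spent on a meaningless zero-length configuration.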
Control word Title Meaning and value
  chip_id(3:0) Cascade position When multiple chips are cascaded, the position of this chip in the cascade chain. A single chip is used here, so chip_id is set to 0000.
  Chip_num(3:0) Cascade chip count When multiple chips are cascaded, the total number of devices in the cascade chain. A single chip is used here, so chip_num is set to 0000.
  Chan_num(7:0) Channel count Number of channels. Rule: number of channels = chan_num + 1. A single channel is used here, so chan_num is set to 00000000.
  Work_model(4:0) Operating mode The value for the correlation operating mode is 11000, so work_model is set to 11000 here.
  coeff_num(4:0) Coefficient group count The number of coefficient groups needed for multichannel operation. The value must not exceed the current number of channels. Only one group of coefficients is used here, so coeff_num is set to 00000.
  Coeff_chan(4:0) Coefficient channel The correspondence between each coefficient group and each channel. Here one group of coefficients corresponds to one channel, so coeff_chan is set to 00000.
  Length(7:0) Filter length The number of points of the correlation operation. If the window length is X, then X = 40 * (length + 1). This is a 360-point correlation, so length is set to 00001001.
  Result1_mod(1:0) Output 1 mode The mode in which the result1 output port delivers its output. Result1_mod is set to 10 here: result1 outputs logarithm and phase angle.
Table 6

Claims (5)

1. A reconfigurable digital signal processor, characterized in that:
The internal hardware architecture and hardware interconnections of the device can be structurally reorganized by configuring control words, thereby implementing multiple forms of filtering operation such as fast Fourier transform / inverse fast Fourier transform, FIR pulse-group processing, and correlation processing.
2. The reconfigurable digital signal processor as claimed in claim 1, characterized in that:
The main architecture comprises an input unit, an output unit, a data exchange unit, and 4 basic units; the basic units contain 160 real floating-point multiply-accumulators, distributed evenly among the 4 basic units;
The organization of the hardware can be reorganized by configuring control words: through the configuration of control words and control signals, the organization of the 160 real floating-point multiply-accumulators and the data exchange unit can be changed so as to select different operating modes, adapting to three different computational tasks: FFT/IFFT, FIR pulse-group processing, and correlation;
The hardware scheduling scheme uses a two-level scheduling method that combines centralized and distributed control: the control word first undergoes first-level decoding in the global module and then second-level decoding in each basic unit.
3. The reconfigurable digital signal processor as claimed in claim 1 or 2, characterized in that:
The overall architecture uses a two-level control structure: a global control module coordinates the 4 basic units, and each basic unit has its own local control logic. The data exchange unit among the 4 basic units sends data in each basic unit to the other 3 basic units according to the different control requirements;
The input unit receives the control word, performs first-level decoding on it, and distributes control words to each unit: the control word shares a port with coefficient input 1; the control word receiving module receives the control word and passes it to the first-level decoding module for decoding, which on the one hand generates global control signals used to produce on-chip timing and to provide synchronization for coefficients and data, and on the other hand sends control words to the other on-chip units through the control word distribution module; the coefficient synchronization module synchronizes coefficient input 1 and coefficient input 2, and the data synchronization module synchronizes data input 1 and data input 2;
The data exchange unit is a set of multi-input, multi-output switches that exchanges data among the 4 basic units;
The output unit sorts the operation results of each basic unit and outputs them in different formats: the basic-unit output sorting module arranges the outputs of the basic units in channel order when operating in FFT/IFFT and FIR pulse-group processing modes, and, when operating in correlation mode, adjusts the frame-by-frame data output from each basic unit into a continuous data stream at the same rate as the input data; the magnitude module converts the real/imaginary output format to the magnitude/phase-angle output format; the logarithm module converts the input magnitude to a logarithmic representation; the floating-point/fixed-point conversion module can convert the output of the real/imaginary module or the magnitude module from floating-point format to fixed-point format; the exponent normalization module unifies the exponent of the floating-point operation result to a fixed value by shifting the mantissa accordingly; there are two sets of each of the above 4 format conversion modules, one set per output port, ensuring that the two output ports can each output independently in any of the formats;
In the basic unit, the data memory comprises 8 dual-port RAMs of 512 × 40, used for input buffering of operation data, temporary storage of intermediate results, and output buffering of operation results; the data buffer can feed data simultaneously to each complex multiply-accumulator and complex multiply-accumulator submatrix; the coefficient memory comprises 10 dual-port RAMs of 256 × 32, used to store the correlation and FIR filter coefficients and the weighting coefficients for FFT operations; the coefficients of the FFT iterative operation form a fixed table implemented in dedicated logic; each complex multiply-accumulator corresponds to 4 real floating-point multiply-accumulators; each complex multiply-accumulator submatrix contains 16 real floating-point multiply-accumulators, equivalent to 4 complex multiply-accumulators; the two complex multiply-accumulator submatrices and the complex multiply-accumulators are combined in different ways according to the mode in order to perform different operations; the real floating-point multiply-accumulator is divided into 5 parts: a fixed-point multiplication part, a truncation part, an exponent replacement part, a fixed-point addition part, and an exponent judgment part.
4. The reconfigurable digital signal processor as claimed in claim 1 or 2, characterized in that a corresponding organization is adopted for each of the three filtering operations, as follows:
For FFT/IFFT operations, complex multiply-accumulate submatrix A and complex multiply-accumulate submatrix B each serve as a node of one iteration of the radix-8 algorithm, and the 4 complex multiply-accumulators in a submatrix perform one stage of radix-8 iteration: complex multiply-accumulator A and complex multiply-accumulator B in front of the submatrices serve as windowing units and apply the window to the data; the windowed data enter the complex multiply-accumulate submatrix for one stage of radix-8 iteration; the output of each iteration stage is first written to the data buffer and then sent through the data exchange unit to the corresponding other complex multiply-accumulate submatrix for the next iteration stage;
For FIR pulse-group processing, the multiply-accumulators in the 2 complex multiply-accumulator modules and the 2 complex multiply-accumulator submatrix modules inside the basic unit are used in parallel; each complex multiply-accumulator corresponds to one channel of the FIR pulse group, or to 2 channels when conjugate operation is used;
For correlation, the complex multiply-accumulators and the complex multiply-accumulator submatrices are used in series, equivalent to 10 complex multiply-accumulators connected in series.
5. The reconfigurable digital signal processor as claimed in claim 1 or 2, characterized by the hardware scheduling scheme adopted:
Hardware resources are scheduled in a two-level mode that combines global control with local control inside the basic units: control words and control signals first enter the global control module for first-level decoding and control word distribution; in this module the control word is first received, and global control signals are then generated from it, including: coordinating the actions of the 4 basic units, determining the output format of the operation results, and setting the on-chip master clock; the second function of the global control module is to send control information to the local control module of each basic unit, including: the operating mode, the subtype of the operating mode, the number of operation points, the number of channels, and the number of channels to process;
After receiving the information sent by the global control module, the local control module in each basic unit performs second-level decoding on it and converts it into detailed hardware scheduling control signals.
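The resource figures stated in claims 2 to 4 can be cross-checked with a small Python sketch, which also shows why one complex multiply-accumulate maps onto four real multiplications. The constant and function names are hypothetical, and the arithmetic is ordinary complex multiplication rather than a description of the device's internal datapath.

```python
REAL_MACS_PER_COMPLEX_MAC = 4    # claim 3: one complex MAC = 4 real MACs
COMPLEX_MACS_PER_SUBMATRIX = 4   # claim 3: 16 real MACs per submatrix
BASIC_UNITS = 4                  # claim 2


def complex_macs_per_unit(single_macs=2, submatrices=2):
    """Complex MACs in one basic unit (claim 4: 2 modules + 2 submatrices)."""
    return single_macs + submatrices * COMPLEX_MACS_PER_SUBMATRIX


def complex_mac(acc, a, b):
    """Complex multiply-accumulate written as four real multiplications."""
    real = a.real * b.real - a.imag * b.imag
    imag = a.real * b.imag + a.imag * b.real
    return acc + complex(real, imag)


if __name__ == "__main__":
    per_unit = complex_macs_per_unit()
    print(per_unit, per_unit * REAL_MACS_PER_COMPLEX_MAC * BASIC_UNITS)
    print(complex_mac(0j, 1 + 2j, 3 + 4j))
```

Two stand-alone complex multiply-accumulators plus two submatrices of four give 10 per basic unit, which agrees with the 10 in series used for correlation and with the total of 160 real multiply-accumulators across 4 basic units.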
CN200610086398A 2006-07-14 2006-07-14 Reconstructable digital signal processor Active CN100594491C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200610086398A CN100594491C (en) 2006-07-14 2006-07-14 Reconstructable digital signal processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200610086398A CN100594491C (en) 2006-07-14 2006-07-14 Reconstructable digital signal processor

Publications (2)

Publication Number Publication Date
CN1900927A true CN1900927A (en) 2007-01-24
CN100594491C CN100594491C (en) 2010-03-17

Family

ID=37656814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200610086398A Active CN100594491C (en) 2006-07-14 2006-07-14 Reconstructable digital signal processor

Country Status (1)

Country Link
CN (1) CN100594491C (en)

Cited By (40)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010083723A1 (en) * 2009-01-21 2010-07-29 上海芯豪微电子有限公司 Reconfigurable data processing platform
CN102087640B (en) * 2009-12-08 2013-06-05 中兴通讯股份有限公司 Method and device for realizing Fourier transform
CN102122275A (en) * 2010-01-08 2011-07-13 上海芯豪微电子有限公司 Configurable processor
WO2011082690A1 (en) * 2010-01-08 2011-07-14 Shanghai Xin Hao Micro Electronics Co. Ltd. Reconfigurable processing system and method
CN101833540A (en) * 2010-04-07 2010-09-15 华为技术有限公司 Signal processing method and device
CN101833540B (en) * 2010-04-07 2012-06-06 华为技术有限公司 Signal processing method and device
CN102043761A (en) * 2011-01-04 2011-05-04 东南大学 Fourier transform implementation method based on reconfigurable technology
CN103999039B (en) * 2011-10-27 2018-08-10 英特尔公司 Digital processing unit with the instruction set with complex exponential nonlinear function
CN103999039A (en) * 2011-10-27 2014-08-20 Lsi公司 Digital processor having instruction set with complex exponential non-linear function
CN103390071A (en) * 2012-05-07 2013-11-13 北京大学深圳研究生院 Hierarchical interconnection structure of reconfigurable operator array
CN103543984A (en) * 2012-07-11 2014-01-29 世意法(北京)半导体研发有限责任公司 Modification type balance throughput data path architecture for special corresponding applications
CN103543983A (en) * 2012-07-11 2014-01-29 世意法(北京)半导体研发有限责任公司 Novel data access method for improving FIR operation performance on balance throughput data path architecture
CN103543984B (en) * 2012-07-11 2016-08-10 世意法(北京)半导体研发有限责任公司 Modified form balance throughput data path architecture for special related application
US9424033B2 (en) 2012-07-11 2016-08-23 Stmicroelectronics (Beijing) R&D Company Ltd. Modified balanced throughput data-path architecture for special correlation applications
CN103543983B (en) * 2012-07-11 2016-08-24 世意法(北京)半导体研发有限责任公司 For improving the novel data access method of the FIR operating characteristics in balance throughput data path architecture
CN104932992B (en) * 2015-07-08 2017-10-03 中国电子科技集团公司第五十四研究所 A kind of flexible retransmission method of the variable Digital Microwave of bandwidth granularity
CN104932992A (en) * 2015-07-08 2015-09-23 中国电子科技集团公司第五十四研究所 Designing method for microwave digital flexible forwarding technology variable in bandwidth granularity
CN105206282A (en) * 2015-09-24 2015-12-30 深圳市冠旭电子有限公司 Noise acquisition method and device
CN105206282B (en) * 2015-09-24 2019-12-13 深圳市冠旭电子股份有限公司 noise collection method and device
CN106959936A (en) * 2016-01-08 2017-07-18 福州瑞芯微电子股份有限公司 A kind of the hardware-accelerated of FFT realizes device and method
US11658687B2 (en) 2016-11-30 2023-05-23 Micron Technology, Inc. Wireless devices and systems including examples of mixing input data with coefficient data
US11695503B2 (en) 2016-11-30 2023-07-04 Micron Technology, Inc. Wireless devices and systems including examples of mixing coefficient data specific to a processing mode selection
CN110024345B (en) * 2016-11-30 2022-01-25 美光科技公司 Wireless device and system including instances of mixing processing mode selection specific coefficient data
CN110024345A (en) * 2016-11-30 2019-07-16 美光科技公司 Wireless device and system comprising mixing the example of the coefficient data selected specific to tupe
US11115256B2 (en) 2016-11-30 2021-09-07 Micron Technology, Inc. Wireless devices and systems including examples of mixing input data with coefficient data
US11088888B2 (en) 2016-11-30 2021-08-10 Micron Technology, Inc. Wireless devices and systems including examples of mixing coefficient data specific to a processing mode selection
CN106951211B (en) * 2017-03-27 2019-10-18 南京大学 A kind of restructural fixed and floating general purpose multipliers
CN106951394A (en) * 2017-03-27 2017-07-14 南京大学 A kind of general fft processor of restructural fixed and floating
CN106951211A (en) * 2017-03-27 2017-07-14 南京大学 A kind of restructural fixed and floating general purpose multipliers
CN109782661A (en) * 2019-01-04 2019-05-21 中国科学院声学研究所东海研究站 It is applied to hydrolocation based on FPGA and realizes reconfigurable and multi output real time processing system and method
US11671291B2 (en) 2019-02-22 2023-06-06 Micron Technology, Inc. Mixing coefficient data specific to a processing mode selection using layers of multiplication/accumulation units for wireless communication
CN110247872A (en) * 2019-03-25 2019-09-17 南京南瑞微电子技术有限公司 A kind of synchronization detecting method and device for power line carrier communication chip
CN110247872B (en) * 2019-03-25 2021-11-23 南京杰思微电子技术有限公司 Synchronous detection method and device for power line carrier communication chip
CN110674456A (en) * 2019-09-26 2020-01-10 电子科技大学 Time-frequency conversion method of signal acquisition system
CN110674456B (en) * 2019-09-26 2022-11-22 电子科技大学 Time-frequency conversion method of signal acquisition system
CN111158636A (en) * 2019-12-03 2020-05-15 中国人民解放军战略支援部队信息工程大学 Reconfigurable computing structure and routing addressing method and device of multiply-accumulate computing processing array
CN111064912A (en) * 2019-12-20 2020-04-24 江苏芯盛智能科技有限公司 Frame format conversion circuit and method
CN111064912B (en) * 2019-12-20 2022-03-22 江苏芯盛智能科技有限公司 Frame format conversion circuit and method
CN111782581A (en) * 2020-07-30 2020-10-16 中国电子科技集团公司第十四研究所 Reconfigurable signal processing arithmetic unit and recombination unit based on same
CN111782581B (en) * 2020-07-30 2024-01-12 中国电子科技集团公司第十四研究所 Reconfigurable signal processing operation unit and recombination unit based on same

Also Published As

Publication number Publication date
CN100594491C (en) 2010-03-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20070124

Assignee: China Information & Electronice Development Inc., Ltd., Hefei

Assignor: No.38 Inst., China Electronic Sci. & Tech. Group Co.

Contract record no.: 2013340000054

Denomination of invention: Reconstructable digital signal processor

Granted publication date: 20100317

License type: Exclusive License

Record date: 20130422

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20191119

Address after: 5 / F, airborne center, 38 new area, No. 199, Xiangzhang Avenue, hi tech Zone, Hefei City, Anhui Province 230000

Patentee after: Anhui core Century Technology Co., Ltd.

Address before: 230031 Hefei thirty-eighth Research Institute, 9023 mailbox, Anhui, China

Patentee before: No.38 Inst., China Electronic Sci. & Tech. Group Co.