CN103139110A

CN103139110A - Device and method for baseband processing

Info

Publication number: CN103139110A
Application number: CN2011103906724A
Authority: CN
Inventors: 张电波; 周代彬
Original assignee: Alcatel Lucent Shanghai Bell Co Ltd
Current assignee: Nokia Shanghai Bell Co Ltd
Priority date: 2011-11-30
Filing date: 2011-11-30
Publication date: 2013-06-05
Anticipated expiration: 2031-11-30
Also published as: CN103139110B

Abstract

The invention provides a device and a method for baseband processing. A cubic flow processing device for baseband processing comprises an instruction obtaining and decoding device, an instruction distribution device and a special processing device. The instruction obtaining and decoding device obtains an instruction from an instruction cache device and decodes it to obtain a to-be-processed instruction. The instruction distribution device judges whether the to-be-processed instruction is a specific instruction and, when it is, issues the instruction to the special processing device. On receiving the to-be-processed instruction, the special processing device obtains data from the data area corresponding to itself in the register array of the cubic flow processing device, processes the data, and stores the processing result in that data area. Because the special processing device handles the specific instructions, in particular user-defined complex instructions, the efficiency of processing specific instructions is improved significantly.

Description

Device and method for baseband processing

Technical field

The present invention relates to the field of communications, and in particular to a device and method for baseband processing.

Background art

Software-defined radio (SDR) is used to implement efficient multi-mode baseband processing schemes in wireless base stations. However, the SDR approach is being challenged by the growing complexity of multi-antenna transmission techniques and by tightening energy budgets.

The processing devices commonly used for baseband processing are mostly based on multi-core DSPs and large-scale FPGAs. One example is the latest DSP-based system-on-chip, Turbo Nyquist (TMS320C6618), which contains 4 DSP cores and several coprocessors, such as a bit-level processor, an FFT unit and a Turbo decoder; this chip can handle a 20 MHz bandwidth and a four-antenna array. Using such general-purpose processing devices nevertheless raises problems. First, they tend to be costly. Second, they depend heavily on the chip supplier's product roadmap and product requirements. Third, current general-purpose processing devices cannot handle more complex wireless environments, such as LTE-A, microwave transmission and multi-mode convergence processing, and cannot meet the development needs of such environments.

Summary of the invention

The object of the present invention is to provide a device and method for baseband processing.

According to one aspect of the present invention, a composite processing device is provided. The composite processing device comprises a three-dimensional cell array structure made up of a plurality of processing units, wherein each processing unit comprises four inputs and at least one output, and each processing unit is connected to at least one other processing unit.

According to one aspect of the present invention, an m-ALU component is also provided, wherein the m-ALU component contains the composite processing device.

According to another aspect of the present invention, a cubic flow processing device for baseband processing is provided. The cubic flow processing device comprises an internal system clock, a bus clock, a program counter, an instruction cache device, a register array, a data cache, a multithread scheduling device, a store/load device, a TLB and L2 cache, a DMA controller, a DDR2 controller, a wireless IP and a Common Public Radio Interface (CPRI) interface. The cubic flow processing device further comprises an instruction obtaining and decoding device, an instruction distribution device and a special processing device, the special processing device comprising one or more of the m-ALU components, wherein:

the instruction obtaining and decoding device is used to obtain an instruction from the instruction cache device and decode it, so as to obtain a to-be-processed instruction;

the instruction distribution device is used to judge whether the to-be-processed instruction is a specific instruction and, when it is judged to be a specific instruction, to send the instruction to the special processing device;

the special processing device is used, after receiving the to-be-processed instruction, to obtain data from the data area corresponding to itself in the register array of the cubic flow processing device, to process the data, and to store the processing result in that data area.

According to another aspect of the present invention, a method for baseband processing is also provided. The method is implemented on the basis of the cubic flow processing device and comprises the following steps:

- obtaining an instruction from the instruction cache device and decoding it, so as to obtain a to-be-processed instruction;

- judging whether the to-be-processed instruction is a specific instruction and, when it is judged to be a specific instruction, sending the instruction to the special processing device;

- after the to-be-processed instruction is received, obtaining data from the data area corresponding to the special processing device in the register array of the cubic flow processing device, processing the data, and storing the processing result in that data area.

Compared with the prior art, the present invention has the following advantages. By adopting a three-dimensional cell array structure composed of a plurality of processing units, the processing efficiency for each instruction can be greatly improved, and the efficiency for complex instructions in particular is greatly increased. At the same time, having a plurality of G-ALUs process ordinary instructions in parallel avoids, on the one hand, the situation in which the special processing device handles all instructions, becomes overloaded and therefore cannot bring its processing advantage fully into play, and on the other hand further improves the processing efficiency of the processing chip. Moreover, the device according to the present invention can be implemented on the basis of existing basic components, such as the DSP components in an FPGA or ASIC, and therefore does not depend on a supplier's product development direction.

Brief description of the drawings

Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, given with reference to the accompanying drawings:

Fig. 1 is a structural schematic diagram of the three-dimensional cell array, made up of a plurality of processing units, that is contained in a composite processing device according to one aspect of the present invention;

Fig. 2 is a structural schematic diagram of a cubic flow processing device for baseband processing according to another aspect of the present invention;

Fig. 3 is a flow chart of a method for baseband processing according to another aspect of the present invention.

In the drawings, the same or similar reference numerals denote the same or similar components.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the drawings.

Fig. 1 shows the structure of the three-dimensional cell array, made up of a plurality of processing units, that is contained in a composite processing device according to one aspect of the present invention. The composite processing device comprises a three-dimensional cell array structure made up of a plurality of processing units, wherein each processing unit comprises four inputs and at least one output, and each processing unit is connected to at least one other processing unit.

Specifically, as shown in Fig. 1, A1 to A4, B1 to B4, C1 to C4 and D1 to D4 are the inputs of the three-dimensional cell array, and O1 to O4 are its outputs.

Preferably, the composite processing device comprises 4*4*4 processing units and one control unit, wherein each processing unit comprises 3 DSP logic chips, and each DSP logic chip comprises 4 inputs and 1 output.

Each DSP logic chip can handle at least a 28*18-bit multiplication. Preferably, an ASIC-based DSP logic chip can also handle a 32*32-bit multiplication, or multiplications of even higher bit widths. The processing unit can therefore complete a complex multiplication of 32-bit data (comprising 18 I and 18 Q) within 2 instruction cycles.
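The text does not say how the three DSP logic chips of a processing unit share a complex multiplication. One common way to compute a complex product with exactly three real multiplications is the Gauss rearrangement, sketched below purely as an illustration; the function name and the mapping of one product per DSP logic chip are assumptions, not details taken from the patent.

```python
def complex_multiply_3mul(a_i, a_q, b_i, b_q):
    """Multiply (a_i + j*a_q) by (b_i + j*b_q) with three real multiplications.

    Assumption for illustration: each of the three products k1, k2, k3
    could be mapped onto one DSP logic chip of a processing unit.
    """
    k1 = b_i * (a_i + a_q)
    k2 = a_i * (b_q - b_i)
    k3 = a_q * (b_i + b_q)
    real = k1 - k3  # equals a_i*b_i - a_q*b_q
    imag = k1 + k2  # equals a_i*b_q + a_q*b_i
    return real, imag


# (1 + 2j) * (3 + 4j) = -5 + 10j
print(complex_multiply_3mul(1, 2, 3, 4))
```

The rearrangement trades one multiplication for extra additions, which is usually a good exchange when multipliers are the scarce resource.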

Based on the above structure, a processing array composed of 4*4 processing units can complete the transpose of a 2*2 matrix within 4 instruction cycles. For example, at 250 MHz, a composite processing device using the three-dimensional cell array needs only 48 ns, that is, 12 instruction cycles, to complete the transpose of a 4*4 matrix, which greatly improves processing efficiency.
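The 48 ns figure follows directly from the clock rate and the cycle count. The short check below reproduces the arithmetic; the helper name is an illustrative assumption.

```python
def cycles_to_ns(cycles, clock_hz):
    """Convert a number of instruction cycles into nanoseconds at a given clock rate."""
    return cycles * 1_000_000_000 / clock_hz


# 12 instruction cycles at 250 MHz (one cycle lasts 4 ns)
print(cycles_to_ns(12, 250_000_000))  # 48.0
```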

Preferably, the processing array can process user-defined special operation instructions, for example the matrix multiplication instruction MUL, the matrix transpose instruction INV, the matrix decomposition instruction QR and the matrix inversion instruction MMSE.

Taking the matrix transpose instruction INV as an example, when the composite processing device of the present invention performs matrix decomposition on a 4*4 matrix, the processing completes in only 8 instruction cycles, with only 4 memory read operations and 4 data move operations.

Preferably, the composite processing device may comprise 4 of the processing arrays composed of 4*4 processing units, that is, the composite processing device comprises 64*3=192 DSP logic chips. Based on this structure, the composite processing device can process the data of three sectors simultaneously.
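The unit and DSP counts above can be checked with a small model of the array. This is only a bookkeeping sketch: the dictionary layout and all names are assumptions made for illustration, and the model says nothing about how the units are actually wired.

```python
from itertools import product

ARRAYS = 4              # processing arrays in the composite processing device
ROWS, COLS = 4, 4       # processing units per array
DSP_CHIPS_PER_UNIT = 3  # DSP logic chips per processing unit


def build_cell_array():
    """Return one record per processing unit, keyed by (array, row, col)."""
    return {
        (a, r, c): {
            "dsp_chips": DSP_CHIPS_PER_UNIT,
            "inputs": 4,   # four inputs per processing unit
            "outputs": 1,  # the text requires at least one output
        }
        for a, r, c in product(range(ARRAYS), range(ROWS), range(COLS))
    }


units = build_cell_array()
total_dsp_chips = sum(unit["dsp_chips"] for unit in units.values())
print(len(units), total_dsp_chips)  # 64 192
```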

The composite processing device is contained in an m-ALU component, so that it can be used in a variety of processing chips.

The composite processing device according to the present invention, by adopting the three-dimensional cell array structure, can greatly improve processing efficiency, and the improvement is especially significant for complex operation instructions. Moreover, the device according to the present invention can be implemented on the basis of existing basic components, such as the DSP components in an FPGA or ASIC, and therefore does not depend on a supplier's product development direction.

Fig. 2 shows the structure of a cubic flow processing device for baseband processing according to another aspect of the present invention. The cubic flow processing device comprises an internal system clock (not shown), a bus clock (not shown), a program counter (not shown), an instruction cache device, a register array, a data cache, a multithread scheduling device, a store/load device, a TLB and L2 cache, a DMA controller, a DDR2 controller, a wireless IP and a Common Public Radio Interface (CPRI) interface. The cubic flow processing device further comprises an instruction obtaining and decoding device, an instruction distribution device and a special processing device. The special processing device comprises one or more m-ALU components, and each m-ALU component comprises the composite processing device described above with reference to Fig. 1.

Preferably, the special processing device and the plurality of parallel G-ALUs all correspond to the same instruction cache device, register array, instruction obtaining and decoding device and instruction distribution device.

The cubic flow processing device may be a processing chip integrated in an FPGA or ASIC.

Specifically, the instruction obtaining and decoding device obtains an instruction from the instruction cache and decodes it to obtain a to-be-processed instruction.

Next, the instruction distribution device judges whether the to-be-processed instruction obtained after decoding is a specific instruction and, when it is judged to be a specific instruction, sends the instruction to the special processing device. Preferably, the specific instruction is a user-defined special operation instruction.

Specifically, the cubic flow processing device judges whether the to-be-processed instruction is contained in a predetermined list of specific instructions and, when it is, sends the instruction to the special processing device.

For example, when the to-be-processed instruction obtained by the instruction obtaining and decoding device is INV, and the cubic flow processing device judges that INV is a predetermined specific instruction, the instruction distribution device decides to dispatch INV to the special processing device.

It should be noted that the above example is given only to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art will understand that any implementation that judges whether a to-be-processed instruction obtained after decoding is a specific instruction and, when it is, sends the instruction to the special processing device falls within the scope of the present invention.
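The list lookup described above can be sketched as a simple membership test. The mnemonics are the custom instructions named in the text; the set contents, the function name and the returned labels are hypothetical and serve only to illustrate the decision.

```python
# Hypothetical predetermined list of specific instructions.
SPECIFIC_INSTRUCTIONS = frozenset({"MUL", "INV", "QR", "MMSE"})


def distribute(opcode):
    """Return the name of the unit that should receive the decoded instruction."""
    if opcode in SPECIFIC_INSTRUCTIONS:
        return "special_processing_device"
    return "g_alu_pool"


print(distribute("INV"))  # special_processing_device
print(distribute("ADD"))  # g_alu_pool
```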

After receiving the to-be-processed instruction, the special processing device obtains data from the data area corresponding to itself in the register array of the cubic flow processing device, processes the data, and stores the processing result in that data area.

Specifically, after receiving the to-be-processed instruction, the special processing device obtains the data corresponding to that instruction from the predetermined data area assigned to the special processing device in the register array of the cubic flow processing device to which it belongs, executes the instruction, and stores the result of the execution back in an available region of that data area.
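A minimal software model of this read, execute and write-back sequence is given below. It is a sketch under stated assumptions: the `RegisterArray` class, the region name `"m_alu"`, the slot names and the choice of a 2*2 matrix multiplication for MUL are all hypothetical and are not taken from the patent.

```python
class RegisterArray:
    """Toy register array split into named data areas (hypothetical layout)."""

    def __init__(self):
        self.areas = {}

    def read(self, area, slot):
        return self.areas[area][slot]

    def write(self, area, slot, value):
        self.areas.setdefault(area, {})[slot] = value


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def special_execute(regs, opcode, area="m_alu"):
    """Fetch operands from the device's own data area, execute, write the result back."""
    if opcode != "MUL":
        raise ValueError(f"unsupported instruction in this sketch: {opcode}")
    result = matmul(regs.read(area, "a"), regs.read(area, "b"))
    regs.write(area, "result", result)  # stored in a free slot of the same area
    return result


regs = RegisterArray()
regs.write("m_alu", "a", [[1, 2], [3, 4]])
regs.write("m_alu", "b", [[5, 6], [7, 8]])
special_execute(regs, "MUL")
print(regs.read("m_alu", "result"))  # [[19, 22], [43, 50]]
```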

As one of the preferred embodiments of the present invention, the cubic flow processing device according to the present invention also comprises a plurality of G-ALUs capable of parallel processing.

In this case, when the instruction distribution device judges that the instruction obtained by the instruction obtaining and decoding device is not a specific instruction, it also splits the to-be-processed instruction into a plurality of sub-processing instructions and sends each sub-processing instruction to one of the plurality of parallel G-ALUs.

Specifically, when the instruction distribution device judges that the obtained instruction is not a specific instruction, it splits the to-be-processed instruction according to a predetermined instruction processing scheme, so as to obtain a plurality of sub-processing instructions, each corresponding to one G-ALU in the processing chip, and sends each sub-processing instruction to its corresponding G-ALU.

For example, the cubic flow processing device shown in Fig. 2 comprises 4 G-ALUs capable of parallel processing, G-ALU_1 to G-ALU_4. When the instruction distribution device judges that the obtained to-be-processed instruction instruction1 is not a specific instruction, it splits the instruction to obtain the sub-processing instructions instru1, instru2, instru3 and instru4, and sends them to G-ALU_1, G-ALU_2, G-ALU_3 and G-ALU_4 respectively.

It should be noted that the above example is given only to better illustrate the technical solution of the present invention and does not limit it. Those skilled in the art will understand that any implementation that, when the obtained instruction is judged not to be a specific instruction, splits the to-be-processed instruction into a plurality of sub-processing instructions and sends each of them to one of the plurality of parallel G-ALUs falls within the scope of the present invention.
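The patent leaves the splitting scheme open. As one possible illustration, the sketch below splits an element-wise instruction over a data range into contiguous sub-ranges, one per G-ALU. The dictionary fields, the function name and the contiguous-range policy are assumptions.

```python
def split_instruction(opcode, length, num_alus=4):
    """Split an element-wise instruction over `length` items into one sub-instruction per G-ALU."""
    base, extra = divmod(length, num_alus)
    subs = []
    start = 0
    for alu in range(1, num_alus + 1):
        size = base + (1 if alu <= extra else 0)  # spread the remainder over the first ALUs
        subs.append({"alu": f"G-ALU_{alu}", "opcode": opcode, "start": start, "stop": start + size})
        start += size
    return subs


for sub in split_instruction("ADD", 10):
    print(sub)
```

With 10 items and 4 G-ALUs, the sub-ranges cover 3, 3, 2 and 2 items, so no item is dropped or processed twice.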

Next, each of the plurality of parallel G-ALUs, after receiving its to-be-processed instruction, obtains data from the data area corresponding to itself in the register array of the cubic flow processing device, and stores the processing result in that data area.

Specifically, after a G-ALU among the plurality of parallel G-ALUs receives its to-be-processed instruction, it obtains, according to the received instruction, the data corresponding to the sub-processing instruction from the data area corresponding to itself in the register array of the cubic flow processing device, performs the required operation to obtain the sub-result corresponding to the sub-processing instruction, and stores the sub-result back in that data area of the register array.

For example, in the cubic flow processing device shown in Fig. 2, G-ALU_1 obtains data from data areas 1 and 2 of the register array, performs the required operation to obtain the sub-result corresponding to its sub-processing instruction, and stores the sub-result back in that data area of the register array.

Next, the cubic flow processing device merges the sub-results of the plurality of parallel G-ALUs to obtain the final processing result.

Specifically, the cubic flow processing device obtains the sub-result currently held by each G-ALU from the data storage area corresponding to that G-ALU in the register array, and merges the sub-results according to a predetermined merging rule to obtain the final processing result.

Those skilled in the art can determine, according to the actual situation and requirements, how the sub-results are merged under the predetermined merging rule to obtain the final processing result; this is not described further here.
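Since the merging rule is left to the implementer, the following sketch uses the simplest plausible rule, concatenating the sub-results in G-ALU order. The area names, the `"result"` slot and the concatenation rule itself are assumptions chosen for illustration.

```python
def merge_sub_results(register_areas, alu_order):
    """Concatenate the sub-results stored in each G-ALU's data area, in the given order."""
    merged = []
    for alu in alu_order:
        merged.extend(register_areas[alu]["result"])
    return merged


# Hypothetical register-array contents after the four G-ALUs have finished.
areas = {
    "G-ALU_1": {"result": [1, 2]},
    "G-ALU_2": {"result": [3, 4]},
    "G-ALU_3": {"result": [5, 6]},
    "G-ALU_4": {"result": [7, 8]},
}
print(merge_sub_results(areas, ["G-ALU_1", "G-ALU_2", "G-ALU_3", "G-ALU_4"]))
```

Other rules, such as summing partial results for a reduction, would fit the same interface.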

As one of the preferred embodiments of the present invention, the special processing device can send data to each of the G-ALUs, or obtain data from each of the G-ALUs.

According to the solution of the present invention, because the three-dimensional cell array contained in the m-ALU greatly improves the processing efficiency for each instruction, using a special processing device made up of m-ALUs to process the various kinds of instructions, and user-defined special instructions in particular, greatly increases the efficiency with which those instructions are processed. At the same time, having a plurality of G-ALUs process ordinary instructions in parallel avoids, on the one hand, the situation in which the special processing device handles all instructions, becomes overloaded and therefore cannot bring its processing advantage fully into play, and on the other hand further improves the processing efficiency of the processing chip. Moreover, the device according to the present invention can be implemented on the basis of existing basic components, such as the DSP components in an FPGA or ASIC, and therefore does not depend on a supplier's product development direction.

Fig. 3 shows a flow chart of a method for baseband processing according to another aspect of the present invention. The method is implemented on the basis of the cubic flow processing device.

The method comprises step S1, step S2 and step S3.

In step S1, the cubic flow processing device obtains an instruction from the instruction cache and decodes it.

In step S2, the cubic flow processing device judges whether the to-be-processed instruction obtained after decoding is a specific instruction and, when it is judged to be a specific instruction, sends the instruction to the special processing device. Preferably, the specific instruction is a user-defined special operation instruction.

For example, when the to-be-processed instruction obtained by the cubic flow processing device in step S1 is INV, and INV is judged to be a predetermined specific instruction, the cubic flow processing device decides to dispatch INV to the special processing device.

In step S3, after receiving the to-be-processed instruction, the special processing device obtains data from the data area corresponding to it in the register array, processes the data, and stores the processing result in that data area.

Specifically, after receiving the to-be-processed instruction, the special processing device obtains the data corresponding to that instruction from the predetermined data area assigned to the special processing device in the register array of the cubic flow processing device, executes the instruction, and stores the result of the execution back in an available region of that data area.

As one of the preferred embodiments of the present invention, the method according to the present invention also comprises step S4 (not shown), step S5 (not shown) and step S6 (not shown).

In step S4, when the cubic flow processing device judges in step S2 that the obtained instruction is not a specific instruction, it splits the to-be-processed instruction into a plurality of sub-processing instructions and sends each sub-processing instruction to one of the plurality of parallel G-ALUs.

Specifically, when the cubic flow processing device judges in step S2 that the obtained instruction is not a specific instruction, it splits the to-be-processed instruction according to a predetermined instruction processing scheme, so as to obtain a plurality of sub-processing instructions, each corresponding to one G-ALU in the cubic flow processing device, and sends each sub-processing instruction to its corresponding G-ALU.

For example, when the cubic flow processing device comprises 4 G-ALUs capable of parallel processing, G-ALU_1 to G-ALU_4, and the cubic flow processing device judges that the to-be-processed instruction instruction1 obtained in step S1 is not a specific instruction, it splits the instruction to obtain the sub-processing instructions instru1, instru2, instru3 and instru4, and sends them to G-ALU_1, G-ALU_2, G-ALU_3 and G-ALU_4 respectively.

In step S5, each of the plurality of parallel G-ALUs, after receiving its to-be-processed instruction, obtains data from the data area corresponding to itself in the register array, and stores the processing result in that data area.

Specifically, after a G-ALU among the plurality of parallel G-ALUs receives its to-be-processed instruction, it obtains, according to the received instruction, the data corresponding to the sub-processing instruction from the data area corresponding to itself in the register array of the cubic flow processing device, performs the required operation to obtain the sub-result corresponding to the sub-processing instruction, and stores the sub-result back in that data area of the register array.

Then, in step S6, the sub-results of the plurality of parallel G-ALUs are merged to obtain the final processing result.
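Steps S1 to S6 can be tied together in a small end-to-end model. Everything here is a hypothetical sketch: instructions are modeled as `(opcode, operand)` pairs, `NEG` stands in for an ordinary instruction, the special processing device is a placeholder that only tags its input, and a thread pool merely emulates the parallel G-ALUs.

```python
from concurrent.futures import ThreadPoolExecutor

SPECIFIC_INSTRUCTIONS = frozenset({"MUL", "INV", "QR", "MMSE"})  # assumed list


def fetch_and_decode(instruction_cache, pc):
    """S1: fetch the instruction at `pc` and decode it into (opcode, operand)."""
    opcode, operand = instruction_cache[pc]
    return opcode, operand


def run_on_special_device(opcode, operand):
    """S3: placeholder for the m-ALU based special processing device."""
    return ("m-ALU", opcode, operand)


def g_alu_execute(sub_instruction):
    """S5: one G-ALU executes its sub-instruction on its own chunk of data."""
    opcode, chunk = sub_instruction
    if opcode == "NEG":
        return [-x for x in chunk]
    raise ValueError(f"unsupported instruction in this sketch: {opcode}")


def process(instruction_cache, pc, num_alus=4):
    opcode, operand = fetch_and_decode(instruction_cache, pc)
    if opcode in SPECIFIC_INSTRUCTIONS:  # S2: specific instruction goes to the special device
        return run_on_special_device(opcode, operand)
    # S4: split into contiguous chunks, at most one per G-ALU
    size = max(1, -(-len(operand) // num_alus))  # ceiling division
    subs = [(opcode, operand[i:i + size]) for i in range(0, len(operand), size)]
    with ThreadPoolExecutor(max_workers=num_alus) as pool:
        sub_results = list(pool.map(g_alu_execute, subs))  # map keeps input order
    # S6: merge by concatenation (assumed merging rule)
    return [value for part in sub_results for value in part]


cache = [("NEG", [1, 2, 3, 4, 5, 6, 7, 8]), ("INV", [[1, 0], [0, 1]])]
print(process(cache, 0))
print(process(cache, 1))
```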

According to the method of the present invention, because a special processing device made up of m-ALUs is used to process specific instructions, in particular user-defined complex instructions, the efficiency of processing specific instructions is improved. In addition, having a plurality of G-ALUs process ordinary instructions in parallel avoids, on the one hand, the situation in which the special processing device handles all instructions, becomes overloaded and cannot bring its processing advantage fully into play, and on the other hand further improves the processing efficiency of the cubic flow processing device.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be implemented in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in all respects as exemplary and non-limiting. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in the present invention. No reference numeral in a claim should be regarded as limiting the claim concerned. Furthermore, the word "comprising" obviously does not exclude other units or steps, and the singular does not exclude the plural. A plurality of units or devices recited in a system claim may also be implemented by a single unit or device, in software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims

1. A composite processing device, wherein the composite processing device comprises a three-dimensional cell array structure made up of a plurality of processing units, each processing unit comprises four inputs and at least one output, and each processing unit is connected to at least one other processing unit.

2. The composite processing device according to claim 1, wherein the composite processing device comprises 4*4*4 processing units and one control unit, each processing unit comprises 3 DSP logic chips, and each DSP logic chip comprises 4 inputs and 1 output.

3. An m-ALU component, wherein the m-ALU component contains the composite processing device according to claim 1 or 2.

4. A cubic flow processing device for baseband processing, comprising an internal system clock, a bus clock, a program counter, an instruction cache device, a register array, a data cache, a multithread scheduling device, a store/load device, a TLB and L2 cache, a DMA controller, a DDR2 controller, a wireless IP and a Common Public Radio Interface (CPRI) interface, wherein the cubic flow processing device further comprises an instruction obtaining and decoding device, an instruction distribution device and a special processing device, and the special processing device comprises one or more m-ALU components according to claim 3, wherein:

5. The cubic flow processing device according to claim 4, wherein the cubic flow processing device further comprises a plurality of parallel G-ALUs, wherein:

the instruction distribution device is further used, when the obtained instruction is judged not to be a specific instruction, to split the to-be-processed instruction into a plurality of sub-processing instructions and to send each sub-processing instruction to one of the plurality of parallel G-ALUs;

each of the plurality of parallel G-ALUs is used, after receiving its to-be-processed instruction, to obtain data from the data area corresponding to itself in the register array of the cubic flow processing device, and to store the processing result in that data area.

6. The cubic flow processing device according to claim 4 or 5, wherein the special processing device can send data to each of the G-ALUs, or obtain data from each of the G-ALUs.

7. The cubic flow processing device according to any one of claims 4 to 6, wherein the special processing device and the plurality of parallel G-ALUs all correspond to the same instruction cache device, register array, instruction obtaining and decoding device and instruction distribution device.

8. A method for baseband processing, the method being implemented on the basis of the cubic flow processing device according to any one of claims 4 to 7, wherein the method comprises the following steps:

- after the to-be-processed instruction is received, obtaining data from the corresponding data area in the register array of the cubic flow processing device to which the receiving device belongs, processing the data, and storing the processing result in that data area.

9. The method according to claim 8, wherein the cubic flow processing device further comprises a plurality of parallel G-ALUs, and the method further comprises the following steps:

- when the obtained instruction is judged not to be a specific instruction, splitting the to-be-processed instruction into a plurality of sub-processing instructions, and sending each sub-processing instruction to one of the plurality of parallel G-ALUs;

wherein each of the plurality of parallel G-ALUs performs the following step:

- on receiving a sub-processing instruction, obtaining data from the data area corresponding to itself in the register array, and storing the processing result in that data area;

wherein the method further comprises the following step:

- merging the processing results of the plurality of parallel G-ALUs to obtain the final processing result.