CN1437417A - Iterative operation structure and method suitable for being realized via software radio technology - Google Patents

Info

Publication number
CN1437417A
Authority
CN
China
Prior art keywords
common factor
slave
sub
module
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 02103915
Other languages
Chinese (zh)
Other versions
CN1168328C (en)
Inventor
汪东艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Academy of Telecommunications Technology CATT
Original Assignee
Datang Mobile Communications Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Datang Mobile Communications Equipment Co Ltd
Priority to CNB021039151A (CN1168328C)
Publication of CN1437417A
Application granted
Publication of CN1168328C
Anticipated expiration
Expired - Lifetime

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

An iterative operation structure suitable for implementation with software radio technology comprises at least a processor module, a central control unit, a system matrix memory, master and slave common factor generator units, and master and slave common factor memories. The processor module consists of one or more sub-processing modules of identical structure; each sub-processing module is connected to the above units through separate buses and comprises a sub-processor unit, a memory and a multiplexing unit. Data to be processed is fed to the sub-processor unit through the memory and the multiplexing unit, and the data output by the sub-processor unit is written back to the memory as the input for the next step.

Description

Iterative operation structure and method suitable for software radio technology implementation
Technical Field
The invention relates to an iterative operation technology in wireless communication, in particular to a structure and a method which are suitable for a software radio technology and are used for processing iterative operation with a plurality of common factors among multi-step iterations when a plurality of processors work in parallel.
Background
Mobile communication is now the main form of modern wireless communication. It operates in a complex and varied mobile environment, so the effects of severe time variation and multipath propagation must be taken into account; see International Telecommunication Union (ITU) Recommendation M.1225. In modern wireless communication systems, particularly Code Division Multiple Access (CDMA) systems, smart antennas are generally used together with joint detection techniques in order to increase system capacity, improve system sensitivity, and reach greater communication distances at lower transmit power.
Many published technical documents study beamforming algorithms for smart antennas, and their results show that the more capable the algorithm, the more complex it is. In a mobile communication environment, however, joint detection and beamforming must be completed in real time, with the time available for the algorithm measured in microseconds. Given the current state of microelectronics, Digital Signal Processors (DSPs) and application-specific integrated circuits (ASICs) cannot perform overly complex real-time processing in so short a time.
Meanwhile, mobile communication technologies and standards are continually being proposed and updated, and software radio technology is attracting increasing attention. How to support the air interfaces of different systems on a common hardware platform, using programmable devices such as a Digital Signal Processor (DSP) or a field-programmable gate array (FPGA), has become a main research topic for communication companies around the world. Software radio is useful not only in user terminals, where it addresses the problem of multimode handsets, but also in wireless base stations. With third generation mobile communication technologies and standards being updated continuously, software radio technology is the only practical way for a product to keep up with technological development.
Among the implementation technologies for software radio, research has shown that programmable logic devices outperform the DSPs in wide use today, with a clear advantage for highly parallel operations. A programmable logic device not only raises the operation speed but, through an effective and flexible design method, also raises the overall working efficiency of the system hardware: as many logic resources as possible are kept in an effective working state, and system power consumption is reduced. Neither current dedicated chips nor DSPs can match this. However, for operations with a high degree of iteration, it is generally considered difficult to achieve a good cost-performance ratio in an FPGA. Take the solution of an equation as an example: given the vector $\mathbf{e}$ and the matrix $\mathbf{A}$, solve for the vector $\mathbf{d}$ in equation (1), where $\mathbf{A}$ is an m × m non-negative definite Hermitian matrix.
$$\mathbf{e} = \mathbf{A}\,\mathbf{d} \qquad (1)$$
Solving this by the conventional method generally requires the following three iterative operations:
Step one: decompose the matrix $\mathbf{A}$ as shown in formula (2):
$$\mathbf{A} = \mathbf{L}^{*T}\,\mathbf{L} \qquad (2)$$
where $\mathbf{L}$ is a lower triangular matrix and $\mathbf{L}^{*T}$ is the conjugate transpose of $\mathbf{L}$.
Step two: complete the iterative operation shown in formula (3):
$$\mathbf{L}^{*T}\,\mathbf{y} = \mathbf{e} \qquad (3)$$
where $\mathbf{y}$ is an intermediate variable to be solved.
Step three: complete the iterative operation shown in formula (4):
$$\mathbf{L}\,\mathbf{d} = \mathbf{y} \qquad (4)$$
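Steps two and three can be sketched as two triangular substitutions. The Python below is a minimal illustration, not the patent's implementation: it assumes the lower triangular factor L from step one is already available (a small hypothetical L is hard-coded), and the helper names `conj_transpose`, `matvec`, `back_substitute` and `forward_substitute` are invented for this example.

```python
def conj_transpose(M):
    """Return the conjugate transpose of a square matrix given as a list of rows."""
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]


def matvec(M, v):
    """Multiply matrix M by vector v."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]


def back_substitute(U, b):
    """Solve U x = b for an upper triangular U, last unknown first."""
    n = len(b)
    x = [0j] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - known) / U[i][i]
    return x


def forward_substitute(L, b):
    """Solve L x = b for a lower triangular L, first unknown first."""
    n = len(b)
    x = [0j] * n
    for i in range(n):
        known = sum(L[i][k] * x[k] for k in range(i))
        x[i] = (b[i] - known) / L[i][i]
    return x


# Hypothetical lower triangular factor standing in for the result of step one.
L = [[2, 0], [1 - 1j, 3]]
d_true = [1 + 2j, -1j]

LH = conj_transpose(L)
e = matvec(LH, matvec(L, d_true))  # e = A d with A = L^{*T} L, formula (2)

y = back_substitute(LH, e)         # step two, formula (3): L^{*T} y = e
d = forward_substitute(L, y)       # step three, formula (4): L d = y
print(d)                           # matches d_true up to rounding
```

Each unknown in a substitution depends on the unknowns solved before it, which is why both steps are inherently iterative.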
In the above solving process, when the A matrix holds a large amount of data, several processors often have to work in parallel to complete these steps fast enough. When N processors carry out step one in parallel, the result is as shown in fig. 2: during the initial operation time T1, all processors Processor 1, Processor 2, …, Processor N are active; during T1 < T < Tn, Processor 1 is idle; during T2 < T < Tn, Processor 2 is idle; and so on until time Tn, when step one is complete. Processor 1 thus has a life cycle of T1, Processor 2 a life cycle of T2, Processor 3 a life cycle of T3, …, Processor N-1 a life cycle of T(N-1), and Processor N a life cycle of Tn. When an iterative operation run on multiple processors behaves as shown in fig. 2, this behavior is referred to as a staircase operation characteristic. Taking Tn as the computation time of the whole iterative operation, the idle time of Processor 1 is Tn - T1, that of Processor 2 is Tn - T2, that of Processor 3 is Tn - T3, …, and that of Processor N-1 is Tn - T(N-1). The hardware resources wasted by the iterative operation unit are therefore the sum of these idle times: (N - 1) × Tn - (T1 + T2 + T3 + T4 + … + T(N-1)). Adding processors raises the processing speed but also wastes more hardware resources. Steps two and three suffer from the same problem.
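The waste implied by the staircase characteristic can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `staircase_idle_times` and the example life cycles are assumptions, chosen so that processor i is busy for Ti = i time units.

```python
def staircase_idle_times(life_cycles):
    """Return the per-processor idle times and their total.

    life_cycles[i] is the busy time Ti of processor i + 1. The whole
    operation lasts as long as the longest life cycle (Tn in the text).
    """
    total_time = max(life_cycles)
    idle = [total_time - t for t in life_cycles]
    return idle, sum(idle)


# Hypothetical example: four processors with life cycles T1..T4 = 1, 2, 3, 4.
life_cycles = [1, 2, 3, 4]
idle, wasted = staircase_idle_times(life_cycles)
utilization = sum(life_cycles) / (len(life_cycles) * max(life_cycles))

print(idle)         # [3, 2, 1, 0]
print(wasted)       # 6
print(utilization)  # 0.625
```

In this example more than a third of the available processor time goes unused, and the idle share grows as processors are added to the same staircase.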
Based on the above analysis, designers tend to implement such iterative operations on a DSP, and therefore have to split a complete operation module into several sub-modules: the highly parallel operations with demanding performance requirements are implemented in an FPGA, and the highly iterative operations in a DSP. This brings a series of negative effects, the most prominent being the added overhead of data communication between the modules, which lowers the overall performance of the system. Moreover, as smart antenna and joint detection algorithms keep improving, they demand ever higher baseband processing capability and speed, and even the highest-performance DSPs or ASICs available today cannot handle overly complex real-time processing. A higher-performance processing method is therefore needed.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide an iterative operation structure and method suitable for software radio technology implementation that make full use of hardware resources, improve calculation efficiency, reduce hardware resource occupation, increase operation processing speed and improve baseband processing capability, while being simple to implement and performing well.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
An iterative operation structure suitable for software radio technology implementation comprises at least: a processor module for data processing and computation; a central control unit for controlling and coordinating the work of each module; a system matrix memory for storing system matrix data; a master common factor generator unit for extracting the master common factor; a slave common factor generator unit for generating slave operation factors; a master common factor memory for storing the main operation factors; and a slave common factor memory for storing the slave operation factors. The central control unit controls the master common factor generator unit, the slave common factor generator unit and the processor module; the data stored in the system matrix memory is sent to the slave common factor generator unit as input; the slave common factor generator unit is connected to the slave common factor memory; the output of the slave common factor memory is connected to the input of the system matrix memory; the master common factor generator unit is connected to the master common factor memory; and the output of the master common factor memory is connected to the input of the slave common factor generator unit.
The key point is that the processor module comprises one or more sub-processing modules of identical structure. Each sub-processing module mainly comprises a sub-processor unit, a memory and a multiplexing unit; data to be processed is input to the sub-processor unit through the memory and the multiplexing unit, and the data processed by the sub-processor unit is written to the memory as the data to be processed in the next step.
The central control unit is connected to the multiplexing unit of each sub-processing module through a bus. The system matrix memory, the slave common factor generator unit, the slave common factor memory, the master common factor generator unit and the master common factor memory each supply data over buses to the sub-processor units of all sub-processing modules for processing.
The number of the sub-processing modules is determined according to the size of the data to be processed, the time specified for completing the corresponding operation and the number of available hardware resources.
The sub-processor unit further comprises a master operation module and a slave operation module, and the central control unit selects which of the two works according to a condition: all data that the selected module needs for its calculation must already have been generated.
A method for realizing iterative operation with the above iterative operation structure is characterized in that: when more than one processor is needed for iterative operation processing, iterative operations with complementary staircase operation characteristics are completed by at least two processors within the same time slice, provided a certain condition is met.
Complementary staircase operation means that two iterative operation steps share one or more common factors and that, in a calculation framework with the processors on the vertical axis and the time slices on the horizontal axis, the per-step calculations of the two iterative operations form complementary staircases. The iterative operations may be more than one different iterative operation step, or different levels of iteration within the same iterative operation step. The condition is that all data required by the next iterative operation step has been generated by the previous step, or that all data required for processing the next data item in the current iterative operation step has been generated by the current step.
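The idea of complementary staircases on a processor-by-time-slice grid can be pictured with a toy occupancy table. The sketch below is purely illustrative and makes several assumptions that are not in the text: processor p is busy with a first operation (marked `M`) for p + 1 time slices, a second operation (marked `S`) exactly fills the remaining slots, and data dependencies between the two operations are ignored.

```python
def staircase_grid(n):
    """Occupancy with a single iterative operation.

    grid[p][t] is 'M' when processor p works in time slice t and '.'
    when it is idle. Processor p is busy for p + 1 slices (a staircase).
    """
    return [['M' if t <= p else '.' for t in range(n)] for p in range(n)]


def complementary_grid(n):
    """Occupancy when a second operation 'S' fills every idle slot,
    forming the complementary staircase."""
    return [['M' if t <= p else 'S' for t in range(n)] for p in range(n)]


def utilization(grid):
    """Fraction of (processor, time slice) cells that are not idle."""
    cells = [cell for row in grid for cell in row]
    return sum(cell != '.' for cell in cells) / len(cells)


n = 4
single = staircase_grid(n)
paired = complementary_grid(n)
for row in paired:
    print(''.join(row))     # MSSS, MMSS, MMMS, MMMM
print(utilization(single))  # 0.625
print(utilization(paired))  # 1.0
```

With one operation alone the grid is only partly occupied; pairing it with a complementary one keeps every processor busy in every time slice.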
Therefore, the iterative operation structure and method suitable for software radio technology implementation provided by the invention have the following advantages and characteristics:
1) Because the iP structure and the pP structure contain several multiplexed modules, the logic resources of the programmable gate array can be reused so that the available logic resources are utilized to the maximum extent in every time slice. The best performance is thus obtained with optimal use of resources, that is, the hardware is used to its limit.
2) All sub-processors in the iP structure work simultaneously and in parallel, so the life cycles of all sub-processor units in the iP structure are the same, current hardware resources are fully utilized, and resource utilization improves. The iterative operation required by the system is completed with higher performance, which provides a solution for realizing software radio with an FPGA or similar devices.
3) The invention improves only the iteration part of the overall operation structure and leaves the overall structure unchanged, so it is simple to implement and well suited to real-time calculation.
4) When an iterative operation is carried out, the invention extracts common factors across the several iterative sub-operations it contains. The master operation module and the slave operation module logically serve different iterative sub-operations but share the same multiplexed hardware resources, so the scheme occupies few resources, computes efficiently, and can outperform the highest-performance DSP on iterative operations.
5) Used in a mobile communication system, the invention provides higher capacity and better performance, greatly improves baseband processing capability, and makes complex baseband algorithms easier to realize.
Drawings
FIG. 1 is a schematic diagram of a structure of an iterative calculator;
FIG. 2 is a schematic diagram of the life cycle of each processor when iterative operations are performed according to a conventional method;
FIG. 3 is a schematic diagram of an iP operation structure in the iterative structure of the present invention;
FIG. 4 is a diagram illustrating a structure of a pP operation used in the embodiment of the present invention;
FIG. 5 is a schematic diagram of the life cycle of the cooperative work of the subprocessors of the iP structure of the invention;
FIG. 6 is a diagram illustrating an example of a calculation performed in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic diagram of the structure of an iterative arithmetic unit, which mainly includes an iP structure 10 and a pP structure 30. The iP structure 10 is the module that performs iterative operations, and the pP structure 30 is the module that performs flat, i.e. non-iterative, operations. As shown in fig. 1, the result of the iterative operation in the iP structure 10, signal S101, is the input of the pP structure 30; signal S100 is a control signal output by the central control unit in the iP structure and is used to start a module in the pP structure 30 and control its signal processing and calculation, thereby completing the operation required by the user.
In a wireless communication system, the data symbol sequence $\mathbf{d}$ transmitted by each user is generally recovered from the received signal $\mathbf{e}$ on the basis of channel estimation. Channel estimation is usually performed in the channel estimation module; after the other preceding modules in the system have acted, the corresponding channel impulse response is obtained, and the system matrix $\mathbf{A}$ is derived from it. This embodiment describes the operation structure that uses the system matrix $\mathbf{A}$ to recover, from the received signal $\mathbf{e}$, the data symbol sequence $\mathbf{d}$ transmitted by each user.
The data symbol sequence transmitted by the user is recovered from the received signal and the system matrix using the following equation:
$$\mathbf{e}^{(k_a)} = \mathbf{A}^{(k_a)} \cdot \mathbf{d} + \mathbf{n}^{(k_a)}, \qquad k_a = 0, \ldots, K_a - 1$$
where $\mathbf{A}^{(k_a)}$ is the channel impulse response matrix corresponding to a particular antenna, $\mathbf{d}$ is the symbol vector transmitted by the transmitting end, $\mathbf{n}^{(k_a)}$ is the interference vector corresponding to the particular antenna $k_a$, and $\mathbf{e}^{(k_a)}$ is the signal received at the particular antenna $k_a$. The purpose of this embodiment is to solve this equation for $\mathbf{d}$; the same can be done for all $K_a$ antennas.
To simplify the explanation of the algorithm structure, the interference vector $\mathbf{n}^{(k_a)}$ may be left out of consideration, and the above equation simplifies to:
$$\mathbf{e} = \mathbf{A}\,\mathbf{d} \qquad (5)$$
Assume that $\mathbf{A}$ in (5) is a non-negative definite Hermitian matrix. Solved by the conventional method, the problem requires three steps of iterative operation; solved with the iterative operation structure containing the iP structure, it requires only one step of iterative operation and two steps of flat operation. This embodiment adopts a multiplexed iP structure and a multiplexed pP structure; their implementation process, principle and effect are described in detail below with reference to fig. 3, fig. 4 and fig. 5.
As shown in fig. 3, the process of solving equations using the iterative calculation structure of the present invention is as follows:
The iP structure 10 in fig. 3 mainly comprises a central control unit 100, a system matrix memory 102, a master common factor generator unit 104, a slave common factor generator unit 101, a master common factor memory 103, a slave common factor memory 105 and a processor module 106. The central control unit 100 controls the master common factor generator unit 104, the slave common factor generator unit 101 and the processor module 106. The data stored in the system matrix memory 102 is sent as input to the slave common factor generator unit 101, which is connected to the slave common factor memory 105. The output of the slave common factor memory 105 is in turn connected to the input of the system matrix memory 102: because the system matrix memory 102 has far more storage space than the slave common factor memory 105, some of the data processed by the slave common factor memory 105 is also stored in the system matrix memory 102. The master common factor generator unit 104 is connected to the master common factor memory 103, whose output is in turn connected to the input of the slave common factor generator unit 101.
The processor module 106 comprises several sub-processing modules U1, U2, …, Un. Each sub-processing module consists of a sub-processor unit (P1, P2, …, Pn), a memory (M1, M2, …, Mn) and a multiplexing unit (X1, X2, …, Xn). Data to be processed is input to the sub-processor unit through the memory and the multiplexing unit, and the data processed by the sub-processor unit is written back to the memory as the data to be processed next. The central control unit 100 is connected to the multiplexing unit of each sub-processing module via a bus. The input data of each sub-processor unit P1, P2, …, Pn comes from four sources, all transmitted over the bus: data to be processed from the system matrix memory 102; processed data output by the slave common factor memory 105; stored data from the master common factor memory 103; and data from the memories M1, M2, …, Mn. The number of sub-processing modules is determined by the size of the data to be processed, the required computational performance and the amount of available hardware resources. All sub-processing modules have the same structure and can operate in parallel under the scheduling and control of the central control unit 100.
The iterative operation in the iP structure comprises at least the following three steps:
a) Suppose that $\mathbf{A}$ is a known system matrix stored in a predetermined manner in the system matrix memory 102. Under the control of the central control unit 100, the system matrix memory 102 supplies the input parameters $a_{ij}$, taken from the system matrix $\mathbf{A}$, to the slave common factor generator unit 101 and to each of the sub-processor units P1, P2, …, Pn in the processor module 106. At the same time, the master common factor memory 103 supplies the parameters $l_{ij}$ to the slave common factor generator unit 101; the initial value of $l_{ij}$ is set to 0. The slave common factor generator unit 101 then obtains the slave operation factor $q_j$ by the calculation in formula (6):
$$q_j = \Bigl(a_{jj} - \sum_{k=1}^{j-1} l_{jk}\, l_{jk}^{*}\Bigr)^{-1/2}, \qquad j = 1, 2, \ldots, m \qquad (6)$$
where $a_{ij}$ is an element of the system matrix $\mathbf{A}$ and $l_{jk}^{*}$ is the complex conjugate of $l_{jk}$.
b) Under the control of the central control unit 100, the slave operation factor $q_j$ is buffered in the slave common factor memory 105 and then supplied to each sub-processor unit P1, P2, …, Pn. At the same time, the master common factor memory 103 supplies the main operation factors $(p_{i1}, p_{i2}, \ldots, p_{ik})$ to each sub-processor unit P1, P2, …, Pn, and the system matrix memory 102 supplies the input parameters $a_{ij}$, taken from the system matrix $\mathbf{A}$, to the sub-processor units P1, P2, …, Pn, which then calculate the intermediate results $l_{ij}$ and $t_{ij}$.
Taking the sub-processing module 200A as an example: after the slave operation factor $q_j$, the main operation factor $p_{ik}$ and the system matrix element $a_{ij}$ are input to the sub-processing module 200A, they are fed, together with the iteration data in the memory M1, through the multiplexing unit X1 to the master operation module or the slave operation module inside the sub-processor unit P1, which calculates the intermediate results $l_{ij}$ and $t_{ij}$. The master operation module and the slave operation module in the sub-processor unit P1 are multiplexed modules.
The master operation module in the sub-processor unit P1 works according to formula (7) and mainly calculates the intermediate result $l_{ij}$:
$$l_{ij} = q_j \Bigl(a_{ij} - \sum_{k=1}^{j-1} p_{ik}\, l_{jk}^{*}\Bigr), \qquad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, m \qquad (7)$$
In formula (7), $a_{ij}$ is an element of the system matrix $\mathbf{A}$, $q_j$ is the slave operation factor, $p_{ik}$ is the main operation factor, $l_{jk}^{*}$ is the complex conjugate of $l_{jk}$, and $l_{ij}$ and $l_{jk}$ are intermediate operation results of the iP structure.
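Read together, formulas (6) and (7) have the shape of a column-by-column Cholesky-type factorization in which the slave operation factor q_j is the reciprocal of the diagonal element. The sequential Python sketch below illustrates that reading under stated assumptions: the main operation factor p_ik is taken to equal l_ik (the text does not spell this out), only elements with i ≥ j are computed, and the function name `ip_decompose` and the example matrix are hypothetical. It does not model the parallel hardware.

```python
def ip_decompose(A):
    """Compute L and the slave operation factors q from a Hermitian matrix A.

    Assumes p_ik = l_ik and a positive value under the root in formula (6).
    """
    m = len(A)
    L = [[0j] * m for _ in range(m)]
    q = [0.0] * m
    for j in range(m):
        # Formula (6): q_j = (a_jj - sum_k l_jk * conj(l_jk)) ** (-1/2)
        diag = A[j][j] - sum(L[j][k] * L[j][k].conjugate() for k in range(j))
        q[j] = diag.real ** -0.5
        # Formula (7): l_ij = q_j * (a_ij - sum_k p_ik * conj(l_jk)), p_ik = l_ik
        for i in range(j, m):
            acc = A[i][j] - sum(L[i][k] * L[j][k].conjugate() for k in range(j))
            L[i][j] = q[j] * acc
    return L, q


# Hypothetical 2 x 2 Hermitian example.
A = [[4, 2 + 2j], [2 - 2j, 11]]
L, q = ip_decompose(A)
print(L)
print(q)
```

Under these assumptions the computed L satisfies A = L·L^{*T}, and every element of column j reuses the same q_j, which is what makes q_j a common factor.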
The slave operation module in the sub-processor unit P1 works according to formula (8) and mainly calculates the intermediate result $t_{ij}$:
$$t_{ij} = -q_j \sum_{k=j}^{i-1} \bigl(p_{ik}\, t_{kj}\bigr), \qquad i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, m \qquad (8)$$
In formula (8), $q_j$ is the slave operation factor, $p_{ik}$ is the main operation factor, and $t_{ij}$ is the operation result of the iP operation module, which satisfies $t_{jj} = q_j$.
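Formula (8) resembles the forward-substitution recurrence for inverting a lower triangular matrix. The Python sketch below makes that reading explicit under two assumptions that the text does not state: p_ik is taken to equal l_ik, and the factor in front of the sum is taken to be q_i = 1/l_ii (the printed formula shows q_j), because that choice is the one for which T·L equals the identity matrix. The function name and the example L are hypothetical.

```python
def invert_lower_triangular(L):
    """Return T with T * L = I for a lower triangular L, column by column."""
    m = len(L)
    T = [[0j] * m for _ in range(m)]
    for j in range(m):
        # Diagonal element: t_jj = q_j = 1 / l_jj
        T[j][j] = 1 / L[j][j]
        for i in range(j + 1, m):
            # Off-diagonal element: t_ij = -q_i * sum_{k=j}^{i-1} l_ik * t_kj
            acc = sum(L[i][k] * T[k][j] for k in range(j, i))
            T[i][j] = -acc / L[i][i]
    return T


# Hypothetical lower triangular input.
L = [[2, 0], [1 - 1j, 3]]
T = invert_lower_triangular(L)
print(T)
```

Each t_ij needs the entries t_kj above it in the same column, so rows become available one after another, while different columns j can be processed independently.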
c) The $l_{ij}$ and $t_{ij}$ calculated by each sub-processor are sent to and stored in the memories M1, M2, …, Mn. Meanwhile, the master common factor generator unit 104 takes the output results of P1, P2, …, Pn as its input and generates the main operation factors $p_{ik}$ required for the next iteration. The master common factor memory 103 supplies the input parameters $l_{ij}$ to the slave common factor generator unit 101, which uses them to generate the slave operation factor $q_j$ required for the next iteration, and supplies the main operation factors $p_{ik}$ as input parameters to each sub-processor unit P1, P2, …, Pn in the processor module 106.
Steps a) to c) are repeated until all data in the system matrix memory 102 has been processed and stored in M1, M2, …, Mn. At this point all operations in the iP structure are complete, and the system can start the subsequent operations in the pP structure. The system may also start the pP structure before all operations in the iP structure have finished, provided one condition is met: the parameters required by the pP structure have already been generated.
Under the control of the central control unit 100, the master common factor generator unit 104, the master common factor memory 103, the slave common factor generator unit 101, the slave common factor memory 105 and the processor module 106 form a pipelined operation structure in which the data streams of the several parts are processed as a pipeline. Throughout the operation, all hardware modules work cooperatively, so the hardware resources are used effectively and the real-time requirements of the system are met.
After the operation in the iP structure is finished, the final operation result is sent to the pP structure as an input signal, and equation (5) is then solved. The pP operation structure is a flat structure: all elements of the vector to be solved have an equal chance of being solved, with no required order of solution, i.e. the structure is not iterative.
The pP operation structure mainly comprises a local controller 302, a multiplexed processing module 303, multiplexers 305 and 306, a conjugate transpose module 301 and a memory 304; S100, S101 and S102 are its three input signals. The local controller 302 controls the multiplexed processing module 303, the multiplexers 305 and 306, the conjugate transpose module 301 and the memory 304. The output of the processing module 303 is input to the multiplexer 306 through the memory 304; the output signal S101 of the iP structure is input to the multiplexer 305 either directly or through the conjugate transpose module 301; the output of the multiplexer 305 is connected to the multiplexed processing module 303; and the control signal S100 of the iP structure is connected directly to the local controller 302. As shown in fig. 4, S100 is a control signal output by the central control unit 100 in the iP structure and is used to start the local controller 302 in the pP structure; S101 is the calculation result $t_{ij}$ of the iP structure; and S102 is the known received signal data $\mathbf{e}^{(k_a)}$ input by the system. The functional operation performed by the pP structure mainly comprises the following two steps:
the first step is as follows: <math> <mrow> <msub> <mi>r</mi> <mi>i</mi> </msub> <mo>=</mo> <munderover> <mi>&Sigma;</mi> <mrow> <mi>k</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>i</mi> </munderover> <msub> <mi>t</mi> <mi>ik</mi> </msub> <msub> <mi>e</mi> <mi>k</mi> </msub> <mi>i</mi> <mo>=</mo> <mn>1,2</mn> <mo>,</mo> <mo>.</mo> <mo>.</mo> <mo>.</mo> <mi>m</mi> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>9</mn> <mo>)</mo> </mrow> </mrow> </math>
in the formula (9), tikIs the resulting vector from the iP structure, i.e., signal S101 in fig. 4; e.g. of the typekIs receiving a data vectore (k,a)I.e., signal S102 in fig. 4; the result obtained by this step is riI.e. signal S105 in fig. 4, which is stored in the memory 304 and further used for the next operation.
The second step:

$$ d_i = \sum_{k=1}^{m} t_{ik}^{*} r_k, \quad i = 1, 2, \ldots, m \tag{10} $$
In formula (10), t_ik^* is the conjugate transpose of t_ik, i.e., signal S106 in Fig. 4; d_i is an element of the symbol vector d transmitted by the transmitting end, i.e., signal S107 in Fig. 4; m is the length of r.
Under the control of the local controller 302, the multiplexers 305 and 306 select between the signals S101 and S106, and between S102 and S105, respectively; their outputs S103 and S104 are the input parameters of the processing module 303. The two steps above yield the final output, i.e., signal S107, which is the sequence d of user transmission data symbols sought by this embodiment.
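The two steps of the pP structure can be sketched in Python as follows. This is only a minimal illustration, not the hardware described here: the function names `pp_step1` and `pp_step2`, the list-of-lists layout for t, and the sample values are assumptions introduced for the example, and the index ranges simply follow formulas (9) and (10) as printed.

```python
def pp_step1(t, e):
    """Formula (9): r_i = sum over k = 1..i of t_ik * e_k (0-based indices here)."""
    m = len(e)
    return [sum(t[i][k] * e[k] for k in range(i + 1)) for i in range(m)]


def pp_step2(t, r):
    """Formula (10): d_i = sum over k = 1..m of conj(t_ik) * r_k."""
    m = len(r)
    return [sum(t[i][k].conjugate() * r[k] for k in range(m)) for i in range(m)]


# Hypothetical 2x2 lower-triangular result of the iP structure (signal S101)
t = [[1 + 0j, 0j],
     [1j, 2 + 0j]]
# Hypothetical received data (signal S102)
e = [1 + 0j, 1 + 0j]

r = pp_step1(t, e)  # intermediate result kept in memory (signal S105)
d = pp_step2(t, r)  # final output (signal S107)
```

Because every r_i and every d_i depends only on inputs that are already complete, the elements can be evaluated in any order, which is what makes the structure flat rather than iterative.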
Although only the d corresponding to antenna k_a is solved in the present embodiment, d can likewise be solved for all K_a antennas according to the structure and method of the present invention. In the present embodiment, the matrix A is assumed to be a non-negative-definite Hermitian matrix; in practical applications, as long as the leading principal minors of the matrix A are not zero, a similar architecture can be used for the implementation and higher performance achieved.
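The condition on the leading principal minors can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the described structure: the helper names `leading_principal_minors` and `has_nonzero_leading_minors`, the tolerance `tol`, and the example matrix are all hypothetical, and the determinant uses a simple Laplace expansion that is only practical for small matrices.

```python
def leading_principal_minors(a):
    """Return the determinants of the top-left 1x1, 2x2, ..., mxm blocks of a."""
    def det(mat):
        # Laplace expansion along the first row; adequate for small examples
        n = len(mat)
        if n == 1:
            return mat[0][0]
        total = 0
        for j in range(n):
            minor = [row[:j] + row[j + 1:] for row in mat[1:]]
            total += (-1) ** j * mat[0][j] * det(minor)
        return total

    return [det([row[:k] for row in a[:k]]) for k in range(1, len(a) + 1)]


def has_nonzero_leading_minors(a, tol=1e-12):
    """True if every leading principal minor is nonzero within the tolerance."""
    return all(abs(d) > tol for d in leading_principal_minors(a))


# Hypothetical 2x2 Hermitian example matrix
A = [[2, 1j],
     [-1j, 2]]
minors = leading_principal_minors(A)
```

A matrix such as [[0, 1], [1, 0]] fails the check because its first leading minor is zero, even though the matrix itself is invertible.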
When the iterative operation structure provided by the invention runs multiple processors in parallel, the n sub-processors shown in Fig. 3 process in parallel. As the iterative operations expressed by formulas (7) and (8) show, the structure extracts the common factors q_j and p_ik of the two iterative operations and multiplexes the common operation modules, namely the master and slave operation modules in the sub-processor units P1, P2, ..., Pn, so that the performance shown in Fig. 5 can be achieved, i.e., the life cycles of the sub-processors are essentially the same. For example, in the above process of solving the equation from its coefficient matrix, assume that the coefficient matrix A is an m × m Hermitian matrix and that m processors are used for parallel processing, so that n = m in this example. Because A is a Hermitian matrix, triangular decomposition factors it into two mutually conjugate matrices, and the elements of the i-th column (or row) of the generated matrix depend on those of the (i-1)-th column in a specific way: processing the elements of column (or row) 2 requires that all elements of column (or row) 1 have been processed, processing the elements of column (or row) 3 requires that all elements of column (or row) 2 have been processed, and so on. Meanwhile, the processing formulas (7) and (8) have two common factors.
Then, at time t1, the m processors work in parallel to process the m elements of column (or row) 1 of matrix A according to formula (7), yielding m non-zero elements. At time t2, m-1 processors work in parallel to process the elements of column (or row) 2 of matrix A according to formula (7), yielding m-1 non-zero elements; the input data of formula (8) is already available by then, so processor 1 processes according to formula (8). At time t3, m-2 processors work in parallel to process the elements of column (or row) 3 of matrix A according to formula (7), yielding m-2 non-zero elements, while processors 1 and 2 process the elements of column 3 according to formula (8); and so on. When m = 5, the case shown in Fig. 6 is obtained, where P1 to P5 denote the 5 processors, T1 to T5 denote 5 time instants, the part filled with positive-slope hatching is the calculation result of the iterative sub-operation of formula (7), and the part filled with negative-slope hatching is the calculation result of the iterative sub-operation of formula (8). The positive-slope and negative-slope parts have complementary staircase-like operational characteristics: the calculation results of all iterative sub-operations of formula (7) and of formula (8) each form a staircase, and the two staircases are complementary, i.e., they fill each other in to form a rectangle. The elements in the grid-filled part are completed by the common factor generator unit 101 of the illustrated embodiment, which computes the operation factors required by formulas (7) and (8). In this way, the life cycle of each sub-processor is close to Tn, which means that the hardware resources are fully utilized; computational efficiency improves and resource utilization improves greatly.
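The complementary staircase allocation described above can be made concrete with a short Python sketch. It is an illustration only: the function name `staircase_schedule` is hypothetical, the assumption that the first processors are the ones switched to formula (8) follows the "processor 1", "processors 1 and 2" wording of the text, and the grid-filled common-factor work of Fig. 6 is not modelled.

```python
def staircase_schedule(m):
    """Return schedule[j][p]: the formula run by processor p+1 in time slice j+1."""
    schedule = []
    for j in range(m):          # time slices T1..Tm
        row = []
        for p in range(m):      # processors P1..Pm
            # In slice j+1 the first j processors run formula (8);
            # the remaining m-j processors run formula (7).
            row.append(8 if p < j else 7)
        schedule.append(row)
    return schedule


sched = staircase_schedule(5)
for j, row in enumerate(sched, start=1):
    print(f"T{j}: " + " ".join(f"P{p}=({f})" for p, f in enumerate(row, start=1)))
```

Every time slice assigns work to all m processors, so the two staircases together fill the m × m rectangle and no processor is idle in any slice.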
The above embodiment is mainly applied to implementing the iterative algorithms among the software radio algorithms of a wireless communication system; the operation structure and implementation method of the invention provide a high-performance solution for implementing software radio on a single large-scale programmable logic device. With only slight changes to the input signals and the composition of the structure, the invention can also be applied wherever a system of linear equations in several unknowns must be solved, such as image processing systems and pattern recognition systems. Provided that the leading principal minors of the coefficient matrix of the linear system are not zero, or that some operation has the step-shaped operational characteristic shown in Fig. 2, a similar system structure can carry out several operations with complementary step-shaped operational characteristics and achieve higher performance, and therefore a better cost-performance ratio.
The hardware architecture of the invention can be used directly in the design of hard cores and soft cores based on iterative operation, and provides a solution for the design of high-performance special-purpose chips.
In short, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (8)

1. An iterative operation structure suitable for software radio technology implementation, comprising at least: a processor module for data processing and computation; a central control unit for controlling and coordinating the work of each module; a system matrix memory for storing system matrix data; a master common factor generator unit for extracting a master common factor; a slave common factor generator unit for generating slave operation factors; a master common factor memory storing master operation factors; a slave common factor memory storing slave operation factors; wherein the central control unit controls the master common factor generator unit, the slave common factor generator unit and the processor module; the data stored in the system matrix memory is sent to the slave common factor generator unit as input, the slave common factor generator unit is connected with the slave common factor memory, the output of the slave common factor memory is connected with the input of the system matrix memory, the master common factor generator unit is connected with the master common factor memory, and the output of the master common factor memory is connected with the input of the slave common factor generator unit;
characterized in that: the processor module further comprises more than one sub-processing module with the same structure; each sub-processing module mainly comprises a sub-processor unit, a memory and a multiplexing unit, wherein data to be processed is input into the sub-processor unit for processing through the memory and the multiplexing unit, and the data processed by the sub-processor unit is input into the memory as the data to be processed in the next step;
the central control unit is connected with the multiplexing unit of each sub-processing module through a bus; and the system matrix memory, the slave common factor generator unit, the slave common factor memory, the master common factor generator unit and the master common factor memory each input data through buses to the sub-processor units of all the sub-processing modules for processing.
2. The structure of claim 1, wherein: the number of the sub-processing modules is determined according to the size of the data to be processed, the time specified for completing the corresponding operation and the number of available hardware resources.
3. The structure of claim 1, wherein: the sub-processor unit further comprises a main operation module and a slave operation module, and the central control unit controls and selects the main operation module or the slave operation module to work according to certain conditions.
4. The structure of claim 3, wherein: the certain condition is that all data required by the calculation of the master operation module or the slave operation module are generated.
5. A method for realizing iterative operation using the iterative operation structure, characterized in that: when more than one processor is needed for the iterative operation processing, iterative operations with complementary step-like operation characteristics are completed by at least two processors in the same time slice under a certain condition.
6. The method of claim 5, wherein: the iterative operations are more than one different iterative operation step, or different levels of iteration within the same iterative operation step.
7. The method of claim 5 or 6, wherein: the certain condition means that all data required by the next iterative operation step has been generated by the previous iterative operation step, or that all data required for processing the next data in the current iterative operation step has been generated by the current iterative operation step.
8. The method of claim 5, wherein: the complementary step-shaped operation means that more than one common factor exists between two iterative operation steps, and that, in a calculation framework with the processors on the vertical axis and the time slices on the horizontal axis, the calculation results of each step of the two iterative operations form complementary step shapes.
CNB021039151A 2002-02-07 2002-02-07 Iterative operation structure and method suitable for being realized via software radio technology Expired - Lifetime CN1168328C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB021039151A CN1168328C (en) 2002-02-07 2002-02-07 Iterative operation structure and method suitable for being realized via software radio technology

Publications (2)

Publication Number Publication Date
CN1437417A true CN1437417A (en) 2003-08-20
CN1168328C CN1168328C (en) 2004-09-22

Family

ID=27627935

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB021039151A Expired - Lifetime CN1168328C (en) 2002-02-07 2002-02-07 Iterative operation structure and method suitable for being realized via software radio technology

Country Status (1)

Country Link
CN (1) CN1168328C (en)

Also Published As

Publication number Publication date
CN1168328C (en) 2004-09-22

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
PC01 Cancellation of the registration of the contract for pledge of patent right

Date of registration: 20070510

Pledge (preservation): Pledge

PE01 Entry into force of the registration of the contract for pledge of patent right

Effective date of registration: 20070510

Pledge (preservation): Pledge

PC01 Cancellation of the registration of the contract for pledge of patent right

Date of cancellation: 20100413

Granted publication date: 20040922

Pledgee: National Development Bank

Pledgor: Datang Mobile Communications Equipment Co|Shanghai Datang Mobile Communications Equipment Co|Telecom Research Institute of science and technology

Registration number: 2007110000354

ASS Succession or assignment of patent right

Owner name: INST OF TELECOMMUNICATION SCIENCE AND TECHNOLGOY

Free format text: FORMER OWNER: DATANG MOBILE COMMUNICATION EQUIPMENT CO., LTD.

Effective date: 20110706

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 100083 NO. 40, XUEYUAN ROAD, HAIDIAN DISTRICT, BEIJING TO: 100191 NO. 40, XUEYUAN ROAD, HAIDIAN DISTRICT, BEIJING

TR01 Transfer of patent right

Effective date of registration: 20110706

Address after: 100191 Haidian District, Xueyuan Road, No. 40,

Patentee after: Inst of Telecommunication Science and Technolgoy

Address before: 100083 No. 40, Haidian District, Beijing, Xueyuan Road

Patentee before: Datang Mobile Communication Equipment Co., Ltd.

CX01 Expiry of patent term

Granted publication date: 20040922