CN101553994A

CN101553994A - Method and apparatus for joint detection

Info

Publication number: CN101553994A
Application number: CNA2007800299454A
Authority: CN
Inventors: 严爱国; 利德温·马蒂诺; 马可·卡西可; 保罗·D·克里瓦切克; 托马斯·J·小巴伯; 约翰·子军·申
Original assignee: MediaTek Inc
Current assignee: MediaTek Inc; Analog Devices Inc
Priority date: 2006-09-29
Filing date: 2007-09-27
Publication date: 2009-10-07
Also published as: CN101663829A; CN101536333A; CN101553995A; CN101663829B; CN101553995B; CN101536333B

Abstract

A joint detection system and associated methods are provided. The joint detection system is configured to perform joint detection of received signals and includes a joint detector accelerator and a programmable digital signal processor (DSP). The joint detector accelerator is configured to perform front-end processing of first data inputted to the joint detector accelerator and output second data resulting from the front-end processing. The joint detector accelerator is further configured to perform back-end processing using at least third data inputted to the joint detector accelerator. The programmable DSP is coupled to the joint detector accelerator, and the programmable DSP is programmed to perform at least one intermediate processing operation using the second data outputted by the joint detector accelerator. The programmable DSP is further programmed to output the third data resulting from the intermediate processing operation to the joint detector accelerator.

Description

Method and apparatus for joint detection

Technical field

This application relates to joint detection methods and circuits for wireless communication.

Background

TD-SCDMA (Time Division Synchronous Code Division Multiple Access) is a wireless communication standard that combines TDD/TDMA (time division duplexing/time division multiple access) operation with synchronous CDMA (code division multiple access). TD-SCDMA can assign different time slots and spreading codes to users, so that each time slot can contain data associated with different users who are distinguished by different spreading codes. Fig. 1 shows a TD-SCDMA scheme 100 in which a frequency band 110 can be used for communications associated with multiple users by assigning these users different time slots 121, 122, 123, 124, etc., and different spreading codes 1, 2, 3, etc. TD-SCDMA currently uses at most 16 spreading codes per time slot, which allows different spreading codes to be assigned simultaneously to as many as 16 users in a given time slot. In some cases, multiple spreading codes can be assigned to one user.

TD-SCDMA supports asymmetric traffic and services: the uplink and downlink traffic allocations can be modified using a flexible frame structure, which allows the uplink and downlink allocations to be changed dynamically during a call. TD-SCDMA also reduces multiple access interference (MAI) by using joint detection and antenna systems. In a joint detection scheme, the data of multiple users associated with a time slot are estimated simultaneously, and the data of a specific user are extracted from the received signal. In this way, interference caused by signals associated with other users is accounted for, and data with reduced interference can be provided to the user.

Summary of the invention

In one aspect, a joint detection system is configured to perform joint detection of received signals. The joint detection system comprises a joint detector accelerator configured to perform front-end processing of first data input to the joint detector accelerator and to output second data resulting from the front-end processing, wherein the joint detector accelerator is further configured to perform back-end processing using at least third data input to the joint detector accelerator. The joint detection system also comprises a programmable digital signal processor (DSP) coupled to the joint detector accelerator, wherein the programmable DSP is programmed to perform at least one intermediate processing operation using the second data output by the joint detector accelerator, and wherein the programmable DSP is further programmed to output third data resulting from the at least one intermediate processing operation to the joint detector accelerator.

In another aspect, a joint detector accelerator is configured to perform at least some of the processing associated with joint detection of received signals. The joint detector accelerator comprises a processor configured to perform front-end processing of first data input to the joint detector accelerator and to output second data resulting from the front-end processing, and to perform back-end processing using at least third data input to the joint detector accelerator, wherein the third data is based at least in part on the second data.

In another aspect, a method is provided for performing joint detection of received signals using a joint detector accelerator. The method comprises receiving first data using the joint detector accelerator. The method further comprises performing front-end processing of the first data using the joint detector accelerator to obtain second data, wherein the front-end processing comprises at least some operations of the joint detection of the received signals. The method further comprises outputting, using the joint detector accelerator, the second data resulting from the front-end processing. The method further comprises receiving, using the joint detector accelerator, third data based at least in part on the second data, and performing, using the joint detector accelerator, back-end processing using at least the third data, wherein the back-end processing comprises at least some other operations of the joint detection of the received signals.

Other aspects, embodiments and features of the invention will become apparent from the following detailed description when read in conjunction with the accompanying drawings. The drawings are schematic and are not drawn to scale. In the drawings, each identical or substantially similar component illustrated in the various figures is represented by a single reference numeral or symbol.

For clarity, not every component is labeled in every drawing. Nor is every component of each embodiment of the invention shown where illustration is not necessary for those skilled in the art to understand the invention. All patent applications and patents incorporated herein by reference are incorporated in their entirety. In case of conflict, the present specification, including its definitions, controls.

Description of drawings

In the drawings, identical reference numerals denote identical components:

Fig. 1 is a schematic diagram of TD-SCDMA time slots and spreading codes;

Fig. 2 is a schematic diagram of a TD-SCDMA downlink channel model;

Fig. 3 is a schematic diagram of various matrices involved in the joint detection process;

Fig. 4 is a block diagram of a receiver system implementing a joint detection system according to one embodiment;

Fig. 5 is a block diagram of an inner receiver chain according to one embodiment;

Fig. 6 is a block diagram of a joint detection process according to one embodiment;

Fig. 7 is a schematic diagram of the time slot format of a received signal according to one embodiment;

Fig. 8 is a schematic diagram of the contents of an accumulator before and after a shift process according to one embodiment;

Fig. 9 is a flow chart of a process for shifting the contents of an accumulator and storing multiple subsets of the accumulator according to one embodiment;

Fig. 10 is a block diagram of a hardware architecture that can implement the shift process according to one embodiment;

Fig. 11 is a flow chart of a pre-scaling process for propagation channel estimates according to one embodiment;

Fig. 12 is a schematic diagram of pre-scaling of channel estimates according to one embodiment;

Fig. 13 is a block diagram of a joint detector accelerator architecture according to one embodiment; and

Fig. 14 is a flow chart of a process that can be performed by a finite state machine of a joint detector according to one embodiment.

Detailed description

A joint detection system can be implemented as a software solution, for example using a digital signal processor (DSP), or as a hardware solution in the form of a circuit called a joint detection accelerator (JDA). Compared with a joint detection system implemented in software, a JDA can reduce power consumption and increase operating speed.

The applicant has recognized that a joint detection system can include some processing operations that benefit from the flexibility of a programmable software implementation and other processing operations that benefit from the speed and reduced power consumption of a JDA. The operations implemented in the JDA can include mature algorithms that are unlikely to change and are not readily customized by different handset manufacturers. Conversely, the operations implemented in the programmable DSP can include algorithms that are likely to change and are readily customized by different handset manufacturers.

The applicant has also recognized that, provided performance specifications (such as block error rate performance) are met, a JDA can benefit from a fixed-point implementation, which can reduce chip area and power consumption. Reducing the data bit width of the fixed-point implementation shrinks the chip area and lowers power consumption. In addition, a smaller bit width means that processing can be completed in less time, which allows longer chip sleep or idle periods. The applicant has recognized that a JDA with a reduced data bit width can still retain high precision during multiply-and-accumulate operations by performing the operations in an accumulator with a larger number of bits and saving only a smaller number of bits to memory. The applicant has also recognized that one or more shift values can be determined internally by the JDA and/or set by an external source (such as a programmable DSP).
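The accumulate-then-shift idea described above can be illustrated with a short sketch. It is not the JDA datapath: the 16-bit storage width, the saturation step and the names `saturate` and `mac_shift_store` are assumptions chosen for illustration only.

```python
def saturate(value, bits):
    """Clamp a signed integer to the range of a `bits`-wide two's-complement word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def mac_shift_store(a, b, shift, store_bits=16):
    """Multiply-accumulate in a wide accumulator, then keep only `store_bits` bits.

    Python integers are unbounded, so `acc` stands in for an accumulator that is
    wider than the stored word. `shift` selects which accumulator bits survive.
    """
    acc = 0
    for x, y in zip(a, b):
        acc += x * y  # full-precision products, no intermediate rounding
    # Arithmetic right shift (hypothetical shift value), then clamp to storage width.
    return saturate(acc >> shift, store_bits)
```

With a shift of 0 a large sum saturates at the storage limit, whereas a suitable shift keeps the most significant bits of the result; this is why the choice of shift value matters in a fixed-point design.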

The applicant has also recognized that initial channel estimates in current joint detection systems may require a larger bit width in order to accommodate differences between the amplitudes of the channels. This situation can arise from the way channel estimation is performed. For example, in a TD-SCDMA system, one or more training sequences (midambles) are provided in each burst, and the receiver uses the training sequences to estimate the propagation channel between the transmitter and the receiver. However, the receiver performs the initial channel estimation without accounting for differences in the power levels of the training sequences or in the number of training sequences. Although these differences are eventually resolved using the scale factors generated by the active code detection algorithm, in a fixed-point implementation of the JDA the channel estimates produced by the initial channel estimation can require a larger bit width than would be needed if these effects had been accounted for at the outset. The applicant has recognized that a joint detection system can benefit from pre-scaling one or more propagation channels before they are sent to the JDA, which can reduce the bit width in a fixed-point implementation of the JDA.

It should be appreciated that the techniques described herein can be implemented in any manner, since they are not limited to any specific implementation. Examples of such implementations are discussed below, but these implementations are presented only as illustrative examples, and the embodiments can be implemented in other ways. The examples presented below are described for a joint detection system used with a TD-SCDMA scheme. However, the techniques described herein can be used with other suitable communication schemes and/or with other joint detection systems implemented in various ways, and their use is not limited to any particular type of joint detection system.

As discussed below, one application of the techniques described herein is a joint detection system for a TD-SCDMA receiver. This example is merely illustrative, however, since the techniques described herein can be used in any type of system in which joint detection of received signals can be performed.

As shown in Fig. 2, a TD-SCDMA downlink channel model 200 includes channelization and scrambling codes c1, c2, ..., cKa; channel impulse responses h1, h2, ..., hKa; random noise z added to each channel; and a joint detection data receiver 210. At the base station, data d1, d2, ..., dKa are multiplied by the channelization and scrambling codes c1, c2, ..., cKa, respectively, and sent to the channel. Each code channel can be modeled as a channel impulse response h1, h2, ..., hKa followed by noise z. Because smart antennas are used, the channel impulse responses of the code channels are independent of one another. The received data r are sampled by the analog portion of receiver 210 and input to the joint detection system of receiver 210. The output x of the joint detection system contains user data, which can be further decoded by a downlink bit processor.

The cumulative effect of the channelization/scrambling code and the channel impulse response is the convolution of the channelization/scrambling code cx with the channel impulse response hx. The cumulative effect of all channels on a single data symbol can be represented by a matrix V, each column of which is the convolution of the channelization/scrambling code and the channel impulse response of one code channel. The number of columns of matrix V is the number of active code channels Ka. A combined response matrix T for the entire data field can be constructed by arranging copies of matrix V along the diagonal of matrix T.

Fig. 3 is a schematic diagram of the channel impulse response matrix H, the channelization/scrambling code matrix C, matrix V and matrix T. As shown, matrix H has Ka columns and W rows, and matrix C has Ka columns and Q rows, where W is the length of the channel impulse response, Q is the spreading factor, and Ka is the number of active channels. Matrix V has Ka columns and W+Q-1 rows, and matrix T has N*Ka columns and N*Q+W-1 rows, where N is the number of data symbols per block. The number of rows of matrix T is the length of the data field (N*Q chips for TD-SCDMA) plus the length W of the channel impulse response (between 1 and 17 chip periods) minus one, that is, N*Q+W-1.
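As a rough illustration of these dimensions, the following sketch assembles matrix T from a given matrix V. The function name `build_T` and the list-of-rows representation are assumptions made for this example; the JDA described here does not need to form T explicitly.

```python
def build_T(V, N, Q):
    """Place N copies of V (shape (W+Q-1) x Ka) along the block diagonal of T.

    Each copy is shifted down by Q rows (one symbol period) and right by Ka
    columns, so T has N*Q + W - 1 rows and N*Ka columns.
    """
    rows_v, Ka = len(V), len(V[0])
    W = rows_v - Q + 1  # recover the channel length from the shape of V
    T = [[0j] * (N * Ka) for _ in range(N * Q + W - 1)]
    for n in range(N):
        for j in range(rows_v):
            for k in range(Ka):
                T[n * Q + j][n * Ka + k] = V[j][k]
    return T
```

Because consecutive copies of V overlap by W-1 rows, most entries of T are zero, which is what later allows y and A to be computed from V alone.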

Using the matrices defined above, the received data r can be expressed in terms of the transmitted data d, the cumulative effect of the channelization/scrambling codes and channel impulse responses represented by matrix T, and the added noise z:

$$r = Td + z.$$

A joint detection algorithm can be used to recover the transmitted data d from the received data r. A first algorithm that can be used to solve for the transmitted data d uses a least squares (LS) criterion:

$$\hat{d}_{LS} = (T^H T)^{-1} T^H r,$$

where $T^H$ is the conjugate transpose of matrix T. The least squares algorithm may not perform well at a low received signal-to-noise ratio (SNR), so another algorithm, based on a minimum mean squared error (MMSE) criterion, can be used:

$$\hat{d}_{MMSE} = (T^H T + \sigma^2 I)^{-1} T^H r,$$

where $\sigma^2$ is the variance of the noise z and I is the identity matrix. The LS algorithm and the MMSE algorithm can both be reduced to the same equation:

$$Ad = y,$$

where $y = T^H r$ is called the matched filter output, and $A = T^H T$ for the LS algorithm or $A = T^H T + \sigma^2 I$ for the MMSE algorithm.

In general, solving this equation would involve inverting matrix A. Because of the properties of matrix A, a Cholesky decomposition can be used to represent matrix A by an upper triangular matrix L and a diagonal matrix D through the equation $A = L^H D L$, where $L^H$ is the conjugate transpose of matrix L; this factorization can be used to solve the equation Ad = y iteratively. For illustration, Fig. 3 also includes schematic diagrams of matrix A and matrix L. Matrix L consists of N blocks arranged along the diagonal of the matrix, and, as discussed further below, matrix L can be approximated by computing only a limited number of blocks (for example, blocks B1 and B2) and setting the remaining blocks equal to the last computed block (for example, setting blocks B3, B4, ..., BN to the value of B2).

The procedure for solving the equation Ad = y comprises forward substitution, dot division and backward substitution to obtain the values of the transmitted data:

(1) Forward substitution: $L^H f = y$

(2) Dot division: $g = f\,./\,D$

(3) Backward substitution: $L d = g$,

where f is the intermediate vector obtained from the forward substitution equation, and g is the intermediate vector obtained from the dot division equation. The output of the joint detection algorithm includes the data of a given user equipment (UE). The data of UEs other than the given UE can be removed, so that the final output contains only the data of the given UE.
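The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the JDA implementation: the factor L is assumed to have a unit diagonal (which the text does not specify), the full matrix is factored rather than a limited number of blocks, floating-point arithmetic is used instead of fixed point, and the names `ldl_upper` and `solve` are hypothetical.

```python
def ldl_upper(A):
    """Factor a Hermitian positive-definite A as L^H D L.

    Returns (L, D): L is unit upper triangular (list of rows), D is the list of
    diagonal entries. Internally the lower factor M = L^H is computed first.
    """
    n = len(A)
    M = [[0j] * n for _ in range(n)]  # unit lower triangular, M = L^H
    D = [0.0] * n
    for j in range(n):
        D[j] = (A[j][j] - sum(abs(M[j][k]) ** 2 * D[k] for k in range(j))).real
        M[j][j] = 1 + 0j
        for i in range(j + 1, n):
            s = sum(M[i][k] * M[j][k].conjugate() * D[k] for k in range(j))
            M[i][j] = (A[i][j] - s) / D[j]
    L = [[M[c][r].conjugate() for c in range(n)] for r in range(n)]
    return L, D


def solve(A, y):
    """Solve A d = y by forward substitution, dot division, backward substitution."""
    L, D = ldl_upper(A)
    n = len(y)
    # (1) forward substitution: L^H f = y (L^H is unit lower triangular)
    f = [0j] * n
    for i in range(n):
        f[i] = y[i] - sum(L[k][i].conjugate() * f[k] for k in range(i))
    # (2) dot division: g = f ./ D
    g = [f[i] / D[i] for i in range(n)]
    # (3) backward substitution: L d = g (L is unit upper triangular)
    d = [0j] * n
    for i in reversed(range(n)):
        d[i] = g[i] - sum(L[i][k] * d[k] for k in range(i + 1, n))
    return d
```

Only step (2) involves divisions, one per diagonal entry of D, so the explicit inverse of A is never formed.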

In one embodiment, a joint detection system implementing the joint detection algorithm can include a JDA and a programmable DSP, where the programmable DSP performs one or more of the processing operations involved in the joint detection algorithm. The programmable DSP allows one or more of the joint detection processing operations performed by the DSP to be customized in software. The programmable DSP can perform processing operations before the JDA receives data, can perform intermediate processing operations after the JDA has performed some processing, and/or can perform post-processing after the JDA has finished processing the data. In some embodiments in which the DSP performs an intermediate processing operation, the JDA processing can include JDA front-end processing performed before the intermediate processing by the DSP and JDA back-end processing performed after it. As discussed below, in one embodiment the intermediate processing operation performed by the DSP is active code detection. In some embodiments in which the DSP processes data before the data are sent to the JDA, the DSP can perform channel estimation, which generates matrix H and matrix C. The JDA can be used to solve the linear equation Ad = y, and the DSP can provide the received data r, matrices H and C, and the noise power $\sigma^2$ to the JDA.

Fig. 4 is a block diagram of an exemplary receiver system 400 according to one embodiment, which implements a joint detection system including a JDA 415 and a programmable DSP 425. The programmable DSP 425 can perform one or more of the processing operations involved in the joint detection algorithm. System 400 can include a combined radio and analog baseband unit 450, in which the radio component can receive the signal transmitted by a base station and the analog baseband component can process the received signal provided by the radio component. A digital domain component 440 can then process the signal provided by the analog baseband component.

The digital domain component 440 can include a digital baseband component and a coprocessor that can perform digital domain processing. The digital baseband component can include the programmable DSP 425, which can perform digital processing of the received signal. The digital baseband component can communicate with the coprocessor, which can process the received signal in the digital domain.

The coprocessor includes the JDA 415 and a bit rate processor (BRP) 416. In one embodiment, the JDA 415 can perform one or more processing operations of the joint detection algorithm, and the DSP 425 can also perform one or more processing operations of the joint detection algorithm. The JDA 415 and the DSP 425 can communicate with each other, so that the DSP 425 can perform one or more joint detection processing operations and send the results of these operations to the JDA 415 for further processing. Additionally or alternatively, the JDA 415 can perform one or more processing operations of the joint detection algorithm and send the results of these operations to the DSP 425. In this way, the DSP 425 can perform any number of operations of the joint detection algorithm, and the JDA 415 can perform any number of operations of the joint detection algorithm. The JDA 415 can generate soft-decision outputs, which can then be converted into the most probable hard decisions by the bit rate processor 416. The bit rate processor 416 can perform channel decoding and error detection on the transport channels, de-interleaving to improve channel coding performance, de-rate-matching to adjust the data rate, demultiplexing of the transport channels, and de-mapping of the coded composite transport channel from the physical channels.

Fig. 5 is a block diagram of a TD-SCDMA inner receiver chain 500 according to one embodiment. The inner receiver chain 500 can include a root raised cosine filter 520 at the receiving end, which can be implemented in the analog baseband (for example, component 470 of system 400). The root raised cosine filter 520 provides the received signal to one or more pre-processing components, such as a DC removal component 530 and an I/Q compensation component 540. In one embodiment, the DC removal component 530 and the I/Q compensation component 540 are implemented by a programmable DSP (such as DSP 425 of system 400). Before the received data are sent to the joint detection system 550, the I/Q samples collected from one time slot can be pre-processed by the DC removal component 530 and the I/Q compensation component 540: the DC removal component 530 can perform DC offset correction, and the I/Q compensation component 540 can correct I/Q phase imbalance. In some embodiments, the joint detection system 550 includes a JDA and a programmable DSP, and the programmable DSP allows one or more joint detection processing operations to be customized in software. In one embodiment, the DSP can perform pre-processing operations before data are sent to the JDA. As discussed below, the pre-processing operations performed by the DSP can include channel estimation and/or training sequence interference cancellation.

Fig. 6 is a block diagram 600 of a joint detection process according to one embodiment. The joint detection process shown in block diagram 600 can be performed by a joint detection system, such as the joint detection system 550 of the inner receiver chain of Fig. 5. The operations of the joint detection process can be performed jointly by a JDA and a programmable DSP.

The joint detection process can begin with receiving a signal that has undergone DC offset correction, I/Q phase imbalance compensation and/or any other pre-processing. The received signal can include two data fields separated by a training sequence. Fig. 7 shows the time slot format 700 of the received signal, which comprises a first data field 710, followed by a training sequence field 720, followed by a second data field 730, followed by a guard period 740. Because of dispersion in the over-the-air propagation channel, the tail of data field 710 interferes with the training sequence 720, and the tail of the training sequence 720 interferes with the second data field 730, which yields data field 712 (r1) and data field 722 (r2).

The data splitting operation 610 of the joint detection process shown in Fig. 6 can process the received signal to split it into separate signals: the data fields (r1 and r2) and the training sequence. The joint detection process can handle data fields r1 and r2 one after the other in any desired order, and the processing of one data field (for example, r2) can reuse some of the results obtained from processing the other data field (for example, r1), as described below. Accordingly, the description below may refer to the processing of data field r1 and/or r2. In some embodiments, data field r2 is processed before data field r1. This approach is preferred when the second data field contains command instructions, such as synchronization and power control instructions, which can then be handled in a flow in which data field r2 is processed first and data field r1 afterwards.
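The data splitting step can be sketched as follows. The chip counts (352 chips per data field, 144 for the training sequence, 16 for the guard period, 864 in total) are taken from the TD-SCDMA burst structure rather than from the text above, and the convention of extending each data field by W-1 chips to capture the channel tail is an assumption of this sketch, as is the function name `split_slot`.

```python
DATA_LEN = 352      # chips per data field (TD-SCDMA burst; assumption, not stated above)
MIDAMBLE_LEN = 144  # chips in the training sequence (midamble)
GUARD_LEN = 16      # chips in the guard period


def split_slot(samples, W):
    """Split one 864-chip slot into (r1, midamble, r2).

    Each data field is extended by W-1 chips so that it also contains the tail
    that a channel of impulse-response length W spreads into the next field.
    """
    tail = W - 1
    slot_len = 2 * DATA_LEN + MIDAMBLE_LEN + GUARD_LEN
    if len(samples) != slot_len or not 0 <= tail <= GUARD_LEN:
        raise ValueError("unexpected slot length or channel length")
    d2_start = DATA_LEN + MIDAMBLE_LEN
    r1 = samples[:DATA_LEN + tail]
    midamble = samples[DATA_LEN:d2_start]
    r2 = samples[d2_start:d2_start + DATA_LEN + tail]
    return r1, midamble, r2
```

With these lengths, r1 and r2 each contain N*Q+W-1 samples when N*Q = 352, which matches the number of rows of matrix T.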

Channel estimation operation 615 can process the training sequence provided by data splitting operation 610 and generate a channel estimate matrix H and a code matrix C, such as the matrices H and C shown in Fig. 3. As is known, channel estimation can use a known signal pattern (for example, the training sequence signal) to estimate the over-the-air propagation channel from the base station to the receiver. If smart antennas are used, each code channel of the TD-SCDMA scheme can be associated with a different propagation channel. The results of channel estimation can be used to make a crude estimate of the number of active codes, but it is prudent to overestimate the number of active channels so as to avoid designating an active code as inactive. Ultimately, the active code detection of the joint detection process can provide a better determination of which codes are active. The output of channel estimation operation 615 can include matrix H and matrix C.

Training sequence interference cancellation operation 620 can process the data fields output by operation 610, removing the effect of training sequence interference on the data fields. The training sequence interference cancellation operation uses the channel estimates from channel estimation operation 615. The output of operation 620 can be data fields from which the training sequence interference has been cancelled. In one embodiment, data splitting, training sequence interference cancellation and/or channel estimation are performed by the DSP. This allows one or more of these operations to be customized without changing the receiver chipset.

In one embodiment, the results of the pre-processing performed by the DSP are sent to the JDA for front-end processing. The JDA can receive the matrix H and matrix C sent by the DSP (for example, via an external coprocessor interface, as shown in the system of Fig. 4) and construct matrix V in operation 625. Constructing matrix V involves the channel estimate matrix H and the code matrix C: the i-th column of matrix V is the convolution of the i-th column of matrix H with the i-th column of matrix C. As described below, in some embodiments the JDA can shift the results before saving them in memory.
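The column-wise convolution that produces matrix V can be sketched as follows. The function name `build_V` and the list-of-rows layout are assumptions for illustration, and the sketch uses floating-point arithmetic without the result shifting mentioned above.

```python
def build_V(H, C):
    """Column i of V is the linear convolution of column i of C with column i of H.

    H: W x Ka channel estimates, C: Q x Ka codes (both as lists of rows).
    Returns V with W+Q-1 rows and Ka columns.
    """
    W, Q, Ka = len(H), len(C), len(H[0])
    V = [[0j] * Ka for _ in range(W + Q - 1)]
    for k in range(Ka):
        for q in range(Q):
            for w in range(W):
                V[q + w][k] += C[q][k] * H[w][k]
    return V
```

Each output column has W+Q-1 entries, in agreement with the dimensions of matrix V given for Fig. 3.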

In addition, operation 630 of the JDA can perform the matched filter computation, which constructs the matched filter output $y = T^H r$, where r is r1 and/or r2. The matched filter operation can construct y from matrix V and vector r; because many entries of matrix $T^H$ are zero, as shown in Fig. 3, the entire matrix $T^H$ need not be constructed. Matched filter operation 630 can receive the matrix V constructed by operation 625. Matched filter operation 630 can also receive the r1 and r2 data fields from training sequence interference cancellation operation 620. As described below, in some embodiments the JDA can shift the results before saving them in memory.
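A minimal sketch of a matched filter that works directly from matrix V is given below. It assumes that the columns of T are ordered symbol by symbol with Ka columns per symbol, and the function name `matched_filter` is hypothetical; fixed-point shifting is omitted.

```python
def matched_filter(V, r, N, Q):
    """Compute y = T^H r using only V; T itself is never formed.

    The column of T for symbol n and code k holds column k of V starting at
    row n*Q, so only those rows of r contribute to y[n*Ka + k].
    """
    rows_v, Ka = len(V), len(V[0])
    y = [0j] * (N * Ka)
    for n in range(N):
        for k in range(Ka):
            y[n * Ka + k] = sum(
                V[j][k].conjugate() * r[n * Q + j] for j in range(rows_v)
            )
    return y
```

Each output needs only W+Q-1 multiply-accumulates, compared with N*Q+W-1 if a full row of $T^H$ were used.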

In operation 635, the JDA can also compute the power of y and of each column of matrix V, which can be used for active code detection. Computing the power of each column of matrix V involves summing the squared magnitudes of the entries of matrix V in the given column. The power of y is computed for y1 and need not be computed for y2, because vector y1 is sufficient for the purpose of active code detection. In some embodiments, the JDA can shift the resulting power values before saving them in memory.

In one embodiment, active code detection is performed by the DSP. In operation 640, the DSP can receive from the JDA the power results for y and (optionally) for matrix V, and can use these power values to determine the active codes and a scale factor for each channelization code. Performing active code detection in the DSP allows the active code detection algorithm to be customized. The DSP software can be modified as the active code detection process evolves, and the same chipset can be used to perform the modified active code detection process.

Any suitable algorithm can be used to perform active code detection. For example, an active code detection process can involve determining which codes have a matched filter output (y) power greater than a threshold level. This is only one example of a simple active code detection procedure, and any active code detection process can be used. The active code detection process can also determine a scale factor to be applied to each channelization code. The scale factor of each channelization code can be represented by a mantissa value and an exponent value. Active code detection need not always be performed, and in some cases this operation can be omitted, for example when a spreading factor of "" is used, or when the user equipment already has an indication of which codes are active in a given time slot.
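The simple threshold test mentioned above can be sketched as follows. The ordering of y (Ka entries per symbol), the threshold value supplied by the caller and the function names `code_powers` and `detect_active_codes` are assumptions of this example; scale factor computation is not shown.

```python
def code_powers(y, Ka):
    """Sum |y|^2 over all symbols for each of the Ka candidate codes."""
    return [sum(abs(v) ** 2 for v in y[k::Ka]) for k in range(Ka)]


def detect_active_codes(y, Ka, threshold):
    """Return the indices of codes whose matched filter power exceeds `threshold`."""
    return [k for k, p in enumerate(code_powers(y, Ka)) if p > threshold]
```

In the split described here, the power values would be produced by the JDA and the comparison would run in DSP software, so the decision rule can be replaced without touching the hardware.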

The results of the active code detection operation can be used by a signal-to-interference ratio (SIR) estimation operation 655, which can also be performed by the DSP. SIR estimation operation 655 can use the results of channel estimation operation 615 and of active code detection operation 640. The SIR estimation operation can output the noise power $\sigma^2$. In some embodiments, SIR estimation can be performed without using the results of active code detection. In that case, SIR estimation can be performed by the DSP after channel estimation, and the SIR estimate can be sent to the JDA before the JDA performs front-end processing. Alternatively, SIR estimation can be performed by the DSP at least partly while the JDA performs front-end processing.

In some embodiments, an indication of the active codes and scale factors determined by the active code detection performed by the DSP, and/or the noise computed by the DSP, is sent to the JDA for back-end processing. The JDA back-end processing can include a y rescaling and reordering operation 645 and a V rescaling and reordering operation 650. Based on the results sent by active code detection operation 640, these operations can reorder and rescale y and the columns of matrix V, where the reordering eliminates any columns corresponding to inactive codes. As a result of this reordering, the JDA back-end processing can use the same matrix indexing regardless of which codes are active.
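A rough sketch of the rescaling and reordering step is shown below. It assumes real-valued scale factors given as a list indexed by code, a list `active` of code indices to keep, and a y vector ordered with Ka entries per symbol; these conventions and the function names are illustrative assumptions only.

```python
def rescale_reorder_V(V, active, scale):
    """Keep only the active columns of V, in `active` order, each times its scale."""
    return [[row[k] * scale[k] for k in active] for row in V]


def rescale_reorder_y(y, Ka, active, scale):
    """Apply the same column selection and scaling to y, symbol by symbol."""
    N = len(y) // Ka
    return [y[n * Ka + k] * scale[k] for n in range(N) for k in active]
```

After this step the remaining columns are packed contiguously, so later stages can index them as 0, 1, ..., len(active)-1 without knowing which original codes were active.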

The back-end processing performed by the JDA can also include matrix A computation operation 660, which receives the rescaled and reordered matrix V generated by operation 650 and the noise generated by operation 655, and constructs matrix A by evaluating the matrix operation $T^H T + \sigma^2 I$. Because the elements of matrix A can be computed directly from matrix V, and because many elements of matrix T are zero, constructing matrix A does not necessarily involve constructing matrix T. Accordingly, computation can be performed only for the nonzero elements of matrix A, and only those nonzero elements need be stored (for example, known zero elements need not be stored). In some embodiments, the JDA can shift these results before the values of the resulting matrix A are saved in memory.

The JDA back-end processing can also include Cholesky decomposition operation 665, which can decompose matrix A into matrix L and matrix D. The Cholesky decomposition can be performed without computing all of the elements of matrix L. Matrix L can be divided into a plurality of blocks that converge numerically, and the number of blocks computed depends on the desired accuracy. In one implementation, the number of blocks of matrix L that are computed is 2. Computing a reduced number of blocks of matrix L reduces the number of divisions in the dot division computation when the joint detection algorithm is implemented. Accordingly, computation can be performed only for a subset of the nonzero elements of matrix L, and only those nonzero elements need be stored (for example, known zero elements need not be stored).

The JDA back-end processing can also include linear equation solver operation 670, which solves the linear equation Ax=y (for example, using forward substitution, dot division, and backward substitution, as described above). Linear equation solver operation 670 can receive the data fields from y rescaling and reordering operation 645 and can receive matrix L and matrix D from Cholesky decomposition operation 665. Linear equation solver operation 670 can generate data fields (x1 and x2). In some embodiments, the JDA can shift the results of the forward substitution, dot division, and/or backward substitution before saving these results in memory.
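The decomposition and the three solver stages can be illustrated with a small floating-point sketch. It assumes the factorization has the form A = L D L^T with unit-diagonal L, and it uses a real symmetric matrix for brevity, whereas the matrices handled by the JDA are complex and processed in fixed point. The function names `ldl_decompose` and `ldl_solve` are hypothetical, and the sketch computes every element of L rather than a reduced number of blocks.

```python
def ldl_decompose(A):
    """Factor a real symmetric positive-definite matrix as A = L D L^T."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - s) / D[j]
    return L, D


def ldl_solve(L, D, y):
    """Solve A x = y given A = L D L^T."""
    n = len(y)
    # Forward substitution: solve L f = y.
    f = [0.0] * n
    for i in range(n):
        f[i] = y[i] - sum(L[i][k] * f[k] for k in range(i))
    # Dot division: g = f ./ D (element-wise division by the diagonal).
    g = [f[i] / D[i] for i in range(n)]
    # Backward substitution: solve L^T x = g.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = g[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))
    return x


A = [[4.0, 2.0], [2.0, 3.0]]
L, D = ldl_decompose(A)
print(ldl_solve(L, D, [10.0, 8.0]))  # [1.75, 1.5]
```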

Data fields x1 and x2 can be processed by user extraction operation 675, which can use the code(s) being used by the UE to extract the data of that particular UE. Linear equation solver operation 670 can generate the two data fields x1 and x2 in succession, and user data extraction block 675 can also merge the two data fields to obtain one unified data field x, which is then output by data extraction operation 675 for processing by other components. For example, the DSP can perform post joint detection processing. If codes other than the UE codes are also needed (for example, codes used for power measurement), the other codes can be included in the output.

In some embodiments, the JDA can be realized as a fixed-point implementation in which an operation result (for example, in an accumulator) is shifted before a reduced number of bits, located at fixed bit positions of the accumulator, are saved to memory. This operation is equivalent to selecting which bits of the accumulator are stored in memory without shifting the accumulator contents. The shift value, and the fixed bit positions of the accumulator that are stored in memory, can be selected to ensure that the value in the accumulator is adequately represented in memory (for example, as an accurate value without any significant bit clipping).

In one embodiment, the JDA includes a memory component in which variables are stored as signed N-bit fractions. The values of the stored variables therefore lie between -1 and +1, including -1 but excluding +1. Alternatively, the variables in the JDA memory can be signed N-bit integers, because the techniques described here are not limited to fractions. When an operation is performed on two or more variables stored in the JDA, the operation result (for example, stored in the accumulator) may not fall within the range of the variables stored in the JDA memory. The shifting technique described here allows values to be stored using the desired bit width.

It should be understood that many of the operations in the JDA are multiplication and/or addition operations, such as the operation $c_j = \sum_i a_i b_i$. The JDA can perform such an operation using an accumulator whose data bit width is significantly greater than the data bit width of the memory in which the final multiply and accumulate (MAC) result will be stored, thereby maintaining higher precision during the MAC operation. After an operation (such as a MAC) completes, a subset of the accumulator bits can be saved to memory. Selecting which accumulator bits are saved to memory involves shifting the accumulator contents according to a shift value and saving the bit values from fixed bit positions of the accumulator to memory.

Fig. 8 is a diagram, according to one embodiment, of an accumulator and of the subset of accumulator bits that is saved to memory after a shift operation. The bit values 800 shown in the figure are for illustrative purposes only, and the techniques described are not limited in this respect. Accumulator 810 contains more bits than will be stored in data memory location 820 after an operation (for example, a MAC) completes. Accumulator 810 can include any number of bits (for example, 28), including a sign bit, and the number of bits N of the result to be stored in memory can be any number smaller than the number of accumulator bits (for example, 11), which can also include a sign bit. It should be understood that the above accumulator and data bit width values are only examples, and the techniques described here are not limited in this respect. In addition, the accumulator size can be chosen to be large enough, based on the size of the data being operated on, to ensure that there is no significant loss of precision.

The number of bits of accumulator 810 that are saved to memory can be selected based on the desired memory data bit width N. In addition, the specific location of the fixed bits of accumulator 810 that are saved to memory can be selected arbitrarily, because the shift applied to the accumulator contents before the result is saved to memory can be adjusted accordingly based on the selected fixed bit positions. In Fig. 8, the accumulator bit values to be saved to memory are the bit values within rectangle 840.

The data being operated on and the operation being performed can be such that accumulator decimal point 830 lies between two specific bit positions 831 and 832 of accumulator 810, as is the case when the data being operated on are signed fractions. The data bits of accumulator 810 to be saved to memory (that is, the bits within rectangle 840) have been selected such that bit position 831 contains the leftmost bit to be saved to memory.

After an operation (for example, a MAC) completes and the operation result is in accumulator 810, the bit values to be saved to memory are selected based on a shift applied to the bit contents of accumulator 810. Fig. 8 shows a shift being applied to the contents of accumulator 810 to shift the accumulator bit values, as shown in the resulting accumulator 810'. Accumulator 810' is identical to accumulator 810 after the bit values in accumulator 810 have been shifted to the right according to shift value S. As described later, the shift value is a signed integer and can be determined or set in any suitable manner. A positive shift value S (where S is a positive integer) can be associated with shifting the accumulator bits to the left. A negative shift value -S (where S is a positive integer) can be associated with shifting the accumulator contents to the right. It should be understood that the sign of the shift value is arbitrary and depends on convention, and the techniques described here are not limited in this respect.

In the illustration of Fig. 8, the shift value is such that the shift operation moves the first sign bit value from bit position 833 to accumulator bit position 831. After the accumulator contents have been shifted, the bit values located at the fixed accumulator bit positions, shown by rectangle 840, are saved to memory. In the illustrative example of Fig. 8, the shift value is selected such that the repeated sign bits of the binary number in the accumulator are not stored in memory, which is referred to here as normalization. It should be understood that the values can be shifted by any amount, because the example in Fig. 8 is for illustrative purposes only. In some embodiments, the values stored in memory are signed N-bit fractions, and the shift value applied to the accumulator contents is such that the shifted contents to the left of the accumulator decimal point include only repeated sign bits (for example, the bits to the left of bit position 833 of accumulator 810).

Fig. 9 is a flowchart of process 900, by which an operation result stored in an accumulator can be stored in memory. Process 900 can be performed by hardware in the JDA; in the case of a MAC operation, this hardware can include a multiply and accumulate unit and one or more shift registers. In action 902, an operation (for example, a MAC) is performed, with the result being stored in the accumulator. After the operation completes, in action 904 the accumulator contents are shifted according to a shift value. The shift operation is equivalent to multiplying or dividing the accumulator contents by $2^{\mathrm{SHIFT}}$, where SHIFT is the shift value. The shift operation can be performed by an output shift register, and the shift value can be determined internally by the JDA or can be provided by a system external to the JDA (for example, a programmable DSP, in which case it can be specified by a user).

In action 908, the accumulator contents are rounded in anticipation of the subset of bits being stored in memory. The rounding can round up (or round down) to the last bit of the subset of accumulator bits to be stored in memory. It should be understood, however, that the rounding can be performed in any other suitable manner, because the techniques described here are not limited in this respect. As is known to those skilled in the art, an overflow check can be performed in action 910 to determine whether the rounded result overflows. If it is determined that overflow has occurred, the accumulator contents to be saved to memory are saturated (action 912). Saturation involves setting the value to be stored in memory to the maximum positive N-bit number or the minimum negative N-bit number.

If it is determined (in action 910) that no overflow has occurred, or if overflow has occurred and the value has been saturated accordingly, the process continues with action 914, in which N consecutive bits located at the specified fixed positions of the accumulator are saved to memory, where N is less than the total number of accumulator bits. The number of bits N to be saved to memory and the fixed positions of the accumulator bits to be saved to memory can be specified by the hardware designer when the JDA is designed. Process 900 can then terminate.
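The shift, round, saturate, and save sequence of process 900 can be modeled in software as follows. This is a behavioral sketch under stated assumptions, not the hardware design: the accumulator is modeled as an unbounded Python integer, the window position `LSB_POS` is a hypothetical choice, rounding adds half of the window's least significant bit, and the function name `store_from_accumulator` is invented for the example. The widths 28 and 11 are the example values from the text.

```python
ACC_BITS = 28  # example accumulator width from the text (not enforced by this model)
N = 11         # example stored word width from the text
LSB_POS = 8    # hypothetical: lowest accumulator bit of the fixed N-bit window


def store_from_accumulator(acc, shift, n=N, lsb_pos=LSB_POS):
    """Return the signed n-bit value that would be saved to memory."""
    # Action 904: shift the contents (positive = left, negative = right).
    shifted = acc << shift if shift >= 0 else acc >> -shift
    # Action 908: round to the last bit of the window by adding half an LSB.
    rounded = shifted + (1 << (lsb_pos - 1))
    # Drop the bits below the window; >> on Python ints is an arithmetic shift.
    value = rounded >> lsb_pos
    # Actions 910 and 912: overflow check and saturation to the signed n-bit range.
    max_val = (1 << (n - 1)) - 1
    min_val = -(1 << (n - 1))
    if value > max_val:
        return max_val
    if value < min_val:
        return min_val
    # Action 914: the n-bit window is what gets saved.
    return value


print(store_from_accumulator(0x12345, 0))   # 291
print(store_from_accumulator(0x12345, 2))   # 1023 (saturated)
print(store_from_accumulator(-0x12345, 0))  # -291
```

Choosing a different `lsb_pos` only changes which shift value yields the same stored word, which mirrors the observation that the fixed bit positions can be chosen arbitrarily as long as the shift is adjusted to match.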

In some embodiments, different variables can have different associated shift values. In some embodiments, the same shift value is assigned to every element of a vector or matrix stored in memory. In other embodiments, different columns or different rows of a matrix are assigned different shift values. Allowing different shift values for different columns or rows of a matrix can improve accuracy, because the shift value of each column or row can be tailored to the values in that column or row.

It should be understood that an addition operation performed by the JDA can involve two or more stored variables that may have been stored using different shifts. Stored variables associated with different shift values can be regarded as mantissa values stored with different exponents. For such an operation, the JDA can ensure that the shifts of all variables being added are the same before the addition is performed. For example, when performing an operation such as $c_j = d_j + \sum_i a_i b_i$, the JDA can determine whether one or more of the vector elements were shifted before being stored in memory. If one or more of the vectors were shifted, the JDA can ensure that all of the vectors have the same shift value before the addition is performed. For example, if vector a was shifted according to shift value A_SHIFT before being saved to memory, vector b was shifted according to shift value B_SHIFT before being saved to memory, and vector d was shifted according to shift value D_SHIFT before being saved to memory, then the JDA can shift the d vector elements obtained from memory by A_SHIFT+B_SHIFT-D_SHIFT before adding them to the sum $\sum_i a_i b_i$. This operation can be expressed mathematically as $(d_j \ll (\mathrm{A\_SHIFT} + \mathrm{B\_SHIFT} - \mathrm{D\_SHIFT})) + \sum_i a_i b_i$, where the operator $\ll$ ("<<") denotes the shift operation applied to $d_j$. The result of the operation can also be shifted before being stored in a memory location.
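A small integer example makes the alignment rule concrete. The sketch assumes the convention that a stored value equals the true value multiplied by 2 raised to its shift value (so a positive shift is a left shift); the function name `mac_with_aligned_offset` and the sample numbers are made up for illustration.

```python
def mac_with_aligned_offset(a, b, d_j, a_shift, b_shift, d_shift):
    """Compute (d_j << (A_SHIFT + B_SHIFT - D_SHIFT)) + sum_i a_i * b_i on stored values."""
    align = a_shift + b_shift - d_shift
    # Bring d_j to the same scale as the products a_i * b_i before accumulating.
    acc = d_j << align if align >= 0 else d_j >> -align
    for a_i, b_i in zip(a, b):
        acc += a_i * b_i
    return acc


# True values: a = [3, 1], b = [2, 5], d = 4, so c = 4 + 3*2 + 1*5 = 15.
A_SHIFT, B_SHIFT, D_SHIFT = 2, 1, 1
a_stored = [3 << A_SHIFT, 1 << A_SHIFT]   # [12, 4]
b_stored = [2 << B_SHIFT, 5 << B_SHIFT]   # [4, 10]
d_stored = 4 << D_SHIFT                   # 8

acc = mac_with_aligned_offset(a_stored, b_stored, d_stored, A_SHIFT, B_SHIFT, D_SHIFT)
print(acc)                          # 120
print(acc >> (A_SHIFT + B_SHIFT))   # 15, the true result once the net shift is removed
```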

Fig. 10 is a block diagram showing hardware architecture 1000 in the JDA, which can implement the shifting process described above for an operation that adds variable d to the summation result $\sum_i a_i b_i$. In one embodiment, the number of bits N used to store variables a, b, and d is 11 and the number of accumulator bits is 28, although other data bit widths can be used, because the techniques described here are not limited in this respect. Hardware 1000 can include input shift register 1008 for shifting the d value before the d value is loaded into accumulator 1006. If vector a was shifted according to shift value A_SHIFT before being saved to memory, vector b was shifted according to shift value B_SHIFT before being saved to memory, and vector d was shifted according to shift value D_SHIFT before being saved to memory, then the shift value used by shift register 1008 can be A_SHIFT+B_SHIFT-D_SHIFT.

Hardware 1000 can also include multiplier 1002 for multiplying input values $a_i$ and $b_i$, and adder 1004 for adding the contents of accumulator 1006 to the product of $a_i$ and $b_i$ provided by multiplier 1002. Accumulator 1006 can include A bits, where A can be greater than the number of bits N of the input data. After the multiply and accumulate operation completes, the value in accumulator 1006 can be shifted according to shift value C_SHIFT by output shift register 1010, and a subset of the accumulator bits can be saved to memory. As described above, the subset of accumulator bits can include N bits at fixed positions in the accumulator. Shift value C_SHIFT can be selected such that the high-order bits of the resulting accumulator value are saved in memory. By reducing the number of bits used to store intermediate values computed during joint detection processing, the JDA can achieve the desired speed, memory area, and/or power consumption.

In some embodiments, one or more shift values used by the JDA are set by a system external to the JDA (for example, by a programmable DSP). The external system can include a programmable DSP, which allows a designer to program software that determines the shift values of one or more variables stored by the JDA. Alternatively, or in addition, the designer can set fixed shift values (for example, through the programmable DSP), which are then provided to the JDA. The determination of shift values by the DSP can involve using the results of the front-end processing performed by the JDA. For example, based on the result of the active code detection process, which can be performed by the DSP, the DSP can determine the shift values of one or more variables stored by the JDA. This can be desirable because the selection of shift values for the JDA back-end processing may depend on the number of additions in a summation operation, and that number is related to the number of active codes. Accordingly, the shift values can be determined by the DSP based at least in part on the result of the active code detection process and then sent to the JDA.

In some embodiments, one or more shift values used by the JDA are determined internally by the JDA. Internal determination by the JDA of one or more shift values can involve analyzing data results stored in memory. In the case of matrix A, because matrix V is stored inside the JDA, computing the maximum possible output shift in software is relatively difficult, so it is desirable for the JDA to determine the shift value of matrix A. In some embodiments, the maximum possible output shift of matrix A can be computed internally by the JDA. In that case, the maximum possible output shift can be determined based on the largest element of matrix A. The largest element of matrix A lies on the diagonal, because the diagonal elements of matrix A represent the autocorrelation of each code channel (plus noise), and the autocorrelation of each code channel is larger than its cross-correlation with any other code channel. To determine the maximum possible output shift of matrix A, each diagonal element of matrix A can be computed, and the maximum possible shift value of the largest element can be used as the maximum possible output shift for all elements of matrix A. It should be understood that this automatic internal determination of the shift value uses only a small number of cycles compared with the total number of cycles used for the complete joint detection processing.
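One way to picture the internal determination is to count the redundant sign bits of the largest diagonal element and reuse that count for the whole matrix. The sketch below rests on assumptions that the text does not spell out: it interprets the "maximum possible shift" as the largest left shift that keeps a value inside a signed field of `width` bits, it treats `width` as the number of accumulator bits up to and including the sign bit of the stored window, and it ignores the effect of rounding. The function names are hypothetical.

```python
def max_left_shift(value, width):
    """Largest left shift that keeps `value` inside a signed `width`-bit field.

    A negative result means the value must be shifted right to fit.
    """
    magnitude_bits = value.bit_length() if value >= 0 else (~value).bit_length()
    return (width - 1) - magnitude_bits


def shift_for_matrix(diagonal, width):
    """Use the headroom of the largest diagonal element for every element."""
    return max_left_shift(max(diagonal), width)


# Made-up diagonal elements of A as accumulator integers; window sign bit at bit 18.
diagonal = [1000, 5000, 3000]
print(shift_for_matrix(diagonal, width=19))  # 5
```

Because every off-diagonal element is assumed to be no larger than the largest diagonal element, a shift that is safe for that element is safe for the rest of the matrix.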

The storage of other variables in the JDA can also benefit from internal determination of shift values by the JDA. For example, the result of the dot division process can benefit from internal determination of the shift value that should be applied to the result before it is stored in memory. As described above, in the case of dot division, such as the f./D operation of the linear equation solver operation, the operation involves multiplying a fraction by the inverse of diagonal matrix D. Because the elements of D are positive fractions, the result of the dot division may not be a fraction. A shift derived internally from the inverse diagonal elements of matrix D (that is, $1/d_{ii}$) can be used to ensure that the result of the dot division is also a fraction. In some embodiments, a single shift value is used for all elements of the matrix, which can minimize computational complexity and memory area. In such an embodiment, the single shift value can be determined by finding the maximum possible shift value of the largest element of the matrix and then using that maximum possible shift value for all elements of the matrix.

The division by matrix D can be performed in several parts for each element of matrix D. First, as is known in the art, each diagonal element of matrix D can be normalized by a shifting process in which each element is shifted according to a shift value, which determines the mantissa and exponent of each diagonal element and eliminates the repeated sign bits. The shift value applied to all elements can be the same, to reduce computational complexity, or can differ, because the techniques described here are not limited in this respect. The normalized value of element $d_{ii}$ of matrix D, referred to as normalized($d_{ii}$), is less than 1 and greater than or equal to 0.5. Each value 0.5/normalized($d_{ii}$) is therefore a fraction greater than 0.5 and less than or equal to 1. The value 0.5/normalized($d_{ii}$) can be computed in an intermediate divider having a plurality of bits (for example, 21), and a reduced number of these bits can then be saved in memory in the form of a mantissa (for example, an 11-bit value) and an exponent (for example, a 5-bit value). In addition, the maximum exponent of the values 0.5/normalized($d_{ii}$) can be determined and used as the shift value applied before the result of the dot division operation g=f./D is stored in memory; it should be understood that the maximum exponent can be used as the shift value for all elements of vector g.
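The normalization and reciprocal steps can be mimicked in floating point with `math.frexp`, which returns a mantissa in [0.5, 1) for a positive input and therefore plays the role of normalized(d_ii). The sketch is only an arithmetic illustration: it does not model the 21-bit divider or the 11-bit and 5-bit storage formats, and the function name and sample values of D are assumptions.

```python
import math


def reciprocal_mantissa_exponent(d_ii):
    """Return (mantissa, exponent) with 1/d_ii == mantissa * 2**exponent.

    The mantissa is 0.5/normalized(d_ii), which lies in (0.5, 1].
    """
    m, e = math.frexp(d_ii)   # d_ii == m * 2**e with 0.5 <= m < 1
    return 0.5 / m, 1 - e     # 1/d_ii == (0.5/m) * 2**(1 - e)


D = [0.75, 0.1, 0.3]  # made-up positive fractional diagonal elements
recips = [reciprocal_mantissa_exponent(d) for d in D]
# The largest exponent can serve as a common shift for every element of g = f ./ D.
max_exponent = max(e for _, e in recips)
print(max_exponent)  # 4
```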

In some embodiments, the designer can select, for the shift value of a given variable, whether the shift is determined internally by the JDA or is set by a system external to the JDA (for example, by a programmable DSP). The designer can, for example, set a bit variable through the programmable DSP that communicates with the JDA, where the bit variable indicates whether the JDA should use a shift value for the given variable that is determined internally by the JDA (for example, for matrix A and matrix 1./D as described above) or one that is set by the system external to the JDA (for example, programmed into the DSP by the designer). This allows the designer to select which variables are stored using internally determined shifts and which variables are stored using shifts determined or set by an external source (for example, a programmable DSP). This approach provides flexibility: through externally determined shift values (for example, shift values programmed by the designer), the designer can select which bits are significant for some variables, while the JDA uses the results of intermediate processing to determine the shift values of other variables internally. A shift value set by the system external to the JDA may involve using intermediate results provided to the external system to compute the shift value, or may be a fixed shift value provided by the designer.

It should be understood that one or more of the techniques for using, determining, and/or setting shifts in a fixed-point implementation of a JDA can be used alone or together with other techniques described here. Shifting can be used in a JDA that communicates with a programmable DSP that performs one or more processing operations, such as one or more intermediate processing operations, but the techniques for shifting variables in a JDA can also be used by a JDA that does not have all of the features described here (for example, a JDA that does not necessarily use a DSP to perform intermediate processing operations).

In some embodiments, the joint detection algorithm can include a pre-scaling operation performed on one or more propagation channel estimates (for example, one or more columns of matrix H of Fig. 3) before the channel estimates are sent to the JDA. The pre-scaling operation can be included in the channel estimation operation and performed once the initial channel estimation is complete and before the channel estimates are output. The pre-scaling operation can also be performed in the programmable DSP that performs the initial channel estimation processing. Pre-scaling one or more propagation channel estimates before the channel estimates are sent to the JDA can improve accuracy in a fixed-point implementation of the JDA.

Fig. 11 shows flowchart 1100 of a pre-scaling process for propagation channel estimates. The process can begin at action 1102, in which one or more pre-scaling coefficients are determined. These pre-scaling coefficients are to be applied to the propagation channel estimates determined by the initial channel estimation procedure. The pre-scaling coefficient of each propagation channel can be different, although it should be understood that the technique is not limited in this respect. The one or more pre-scaling coefficients can be determined in any suitable manner.

The pre-scaling coefficient of a propagation channel estimate can be selected based at least in part on the maximum absolute value element of the propagation channel estimate and/or on the power of the propagation channel estimate. The pre-scaling coefficients can be selected to achieve various objectives, including but not limited to: (1) pre-scaling the propagation channel estimates so that their maximum absolute value elements have the same exponent after pre-scaling; (2) pre-scaling the propagation channel estimates so that they have substantially the same maximum absolute value element after pre-scaling (for example, the same exponent and the same absolute value of the mantissa for their maximum absolute value elements); or (3) pre-scaling the propagation channel estimates so that they have substantially the same power after pre-scaling.

In one embodiment, the pre-scaling coefficients can be selected to ensure that the exponent of the maximum absolute value element of each channel estimate is the same after pre-scaling. When a channel estimate includes complex entries, the maximum absolute value element can be selected as the maximum of the set containing the absolute values of the real parts and the absolute values of the imaginary parts of those entries. In this case, each complex entry consists of two real elements, namely the real part and the imaginary part of the complex number. Let the initial propagation channel estimates be given by

$$h_1 = [h_1(0),\ h_1(1),\ \ldots,\ h_1(w-1)]$$

$$h_2 = [h_2(0),\ h_2(1),\ \ldots,\ h_2(w-1)]$$

$$\vdots$$

$$h_{ka} = [h_{ka}(0),\ h_{ka}(1),\ \ldots,\ h_{ka}(w-1)]$$

where $h_1, h_2, \ldots, h_{ka}$ are the initial propagation channel estimates (the columns of matrix H) and each initial propagation channel estimate is a vector of w complex entries, indexed 0 through w-1. Each complex entry $h_i(j)$ consists of a real element real($h_i(j)$) and an imaginary element imag($h_i(j)$). Therefore, as expressed here, the maximum absolute value element of a given propagation channel estimate $h_i$ (also referred to as the maximum absolute value of the values that make up the initial propagation channel estimate) can be expressed as the maximum of the set given by {abs(real($h_i(j)$)), abs(imag($h_i(j)$)), j = 0, ..., w-1}.

In another embodiment, the maximum absolute value element of a given propagation channel estimate (for example, a given column of matrix H) is determined, and the pre-scaling coefficient of that channel can be set to the reciprocal of the maximum absolute value element, thereby ensuring that after pre-scaling the elements of the given propagation channel estimate are less than or equal to 1 (for example, fractions). The pre-scaling coefficient of each propagation channel estimate can be represented using a separate mantissa and exponent.

In another embodiment, the power of each propagation channel estimate (for example, the squared norm of each column of matrix H) is determined, and each channel can be scaled according to its pre-scaling coefficient so that the channels have substantially the same power after scaling. The pre-scaling coefficient of each propagation channel estimate can therefore be chosen as the reciprocal of the power of that propagation channel estimate.

In action 1104, each propagation channel estimate is pre-scaled using the pre-scaling coefficients determined in action 1102. The pre-scaled propagation channel estimates and the corresponding pre-scaling coefficients can then be provided to the JDA.
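Actions 1102 and 1104 can be sketched for the first objective (a common exponent for the maximum absolute value element of every channel estimate). The sketch makes a specific choice that the text does not require: each pre-scaling coefficient is a power of two derived with `math.frexp`, so that every scaled maximum lands in [0.5, 1). The function names and sample channel values are hypothetical.

```python
import math


def max_abs_element(h):
    """Maximum over all entries of abs(real part) and abs(imaginary part)."""
    return max(max(abs(z.real), abs(z.imag)) for z in h)


def prescale(channels):
    """Action 1102: pick one coefficient per channel; action 1104: apply it."""
    scaled, coeffs = [], []
    for h in channels:
        _, e = math.frexp(max_abs_element(h))  # max == m * 2**e, 0.5 <= m < 1
        p = 2.0 ** (-e)                        # power-of-two pre-scaling coefficient
        coeffs.append(p)
        scaled.append([z * p for z in h])
    return scaled, coeffs


# Two made-up channel estimates with very different magnitudes.
channels = [
    [0.3 + 0.1j, -0.05j],
    [0.02 - 0.04j, 0.01 + 0j],
]
scaled, coeffs = prescale(channels)
print(coeffs)  # [2.0, 16.0]
```

The coefficients are returned alongside the scaled estimates because their effect has to be removed from the final output later.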

Fig. 12 shows an illustrative embodiment in which the initial propagation channel estimates are pre-scaled using a pre-scaling coefficient for each propagation channel. The initial channel estimates can be represented by the columns of matrix H, as shown in matrix H 1210, where each column corresponds to one channel and has length W. Each column of matrix H therefore includes W entries (for example, complex numbers). Fig. 12 shows each initial propagation channel estimate (each column of matrix H) being pre-scaled according to a pre-scaling coefficient $P_i$ to construct pre-scaled matrix H 1250. Each pre-scaling coefficient can be determined in any suitable manner, for example as described for action 1102 of the method illustrated in the flowchart of Fig. 11.

It should be understood that the pre-scaling technique can be used alone or together with other techniques described here. Pre-scaling can be used in a JDA that communicates with a programmable DSP that performs one or more processing operations, such as one or more intermediate processing operations, but the pre-scaling technique can also be used in other types of JDA.

Once the user data have been extracted, the effect of the pre-scaling coefficients can be removed, either during joint detection processing or when joint detection processing completes. In some embodiments, all of the coefficients used to scale or shift the intermediate results generated during joint detection processing can be removed when joint detection processing completes. These coefficients can include the pre-scaling coefficients, the scaling factors specified in active code detection, and the shift values used for JDA memory storage. For example, if matrix T is scaled, the effect of these scaling factors (for example, from active code detection and/or pre-scaling) can be removed from the final output. In addition, or alternatively, the shifts applied to the intermediate results generated during joint detection processing can be removed by shifting the final output by the negative of the net shift value.

Because the techniques described here are not limited in this respect, any suitable hardware architecture can be used to implement them. Figure 13 is a block diagram of a JDA architecture 1300 according to one embodiment. The JDA architecture 1300 of Figure 13 includes a communication bus interface 1320 that enables the JDA to communicate with external components (for example, a programmable DSP) via a communication bus 1321. The JDA can include a finite state machine 1350 that controls a plurality of hardware blocks to execute a joint detection algorithm, such as the JDA algorithm shown in the block diagram of Figure 6. The JDA hardware blocks can include a data address generator 1303, registers 1302, a joint detection memory 1304, a first multiply-and-accumulate unit 1306 (for example, a complex multiply-and-accumulate unit), a second multiply-and-accumulate unit 1307 (for example, a complex multiply-and-accumulate unit), and a divider 1308. The registers 1302 can store settings and status information, and the joint detection memory 1304 can store the data values and parameters used in the joint detection algorithm. The architecture can include an input multiplexer 1314 arranged to direct data to the first multiply-and-accumulate unit 1306, the second multiply-and-accumulate unit 1307, or the divider 1308. An output multiplexer 1316 can be arranged to direct the results of the executed operations back to the joint detection memory 1304.

As shown in the architecture of Figure 13, the JDA can include multiple data paths that perform different types of operations. The JDA architecture shown in the block diagram of Figure 13 includes three data paths: a primary data path, a secondary data path, and a divider 1308 data path. The primary data path includes the complex multiply-and-accumulate unit 1306, and the secondary data path includes the simplified complex multiply-and-accumulate unit 1307.

The primary data path, which includes the multiply-and-accumulate unit 1306, can perform operations such as $\sum_i a_i b_i + d_j$. The primary data path shown includes an input shift register 1310 and an output shift register 1312, which can multiply or divide a value by a power of two (that is, multiply it by $2^{\mathrm{SHIFT}}$). The input shift register 1310 can be used to shift the bits of the data input value d_j that is to be added into the accumulator, and the output shift register 1312 can be used to shift the bits of the value held in the accumulator before a subset of those bits (for example, fixed bit positions of the accumulator) is stored in the joint detection memory 1304. In some embodiments, the primary data path can be used to perform the multiply-and-accumulate operations other than the computation of matrix V. It should be understood, as those skilled in the art know, that although the multiply-and-accumulate unit 1306 is shown with a multiplier 1361 and an adder 1362, the complex multiply-and-accumulate unit 1306 can include four multipliers and two adders to compute the real part and the imaginary part of the product of two complex numbers.
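The arithmetic of the primary data path can be modelled in a few lines. The sketch below uses Python floats instead of the fixed-point words a hardware accumulator would hold, so it shows only the order of operations: shift the input d_j, accumulate the complex products, then shift the result. The names `complex_multiply` and `primary_mac` and the sign convention of the shift arguments are assumptions made for the illustration.

```python
def complex_multiply(a, b):
    """Complex product built from four real multiplies and two adds."""
    rr = a.real * b.real
    ii = a.imag * b.imag
    ri = a.real * b.imag
    ir = a.imag * b.real
    return complex(rr - ii, ri + ir)

def primary_mac(a_values, b_values, d=0j, in_shift=0, out_shift=0):
    """Compute (sum_i a_i*b_i + d * 2**in_shift) * 2**out_shift.

    Assumption: positive shift values mean multiplication by a power of
    two, negative values mean division.
    """
    acc = d * 2.0 ** in_shift            # input shift register
    for a, b in zip(a_values, b_values):
        acc += complex_multiply(a, b)    # multiply-and-accumulate
    return acc * 2.0 ** out_shift        # output shift register

result = primary_mac([1 + 2j, 3 - 1j], [2 - 1j, 1j],
                     d=1 + 1j, in_shift=1, out_shift=-1)
```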

The secondary data path, which includes the simplified complex multiply-and-accumulate unit 1307, can perform operations such as $\sum_i a_i b_i$, where each b_i is +1, -1, +j, or -j. This operation arises in the computation of matrix V, which involves convolving each column of matrix H with the code matrix C. Because the elements of the code matrix C can be restricted to the set {+1, -1, +j, -j}, the secondary data path can be used to compute the elements of matrix V. The secondary data path can perform the multiplication a_i b_i more efficiently by using one or more multiplexers 1309, which select the real or imaginary part of the input value a_i depending on whether b_i is +1, -1, +j, or -j. The accumulate operation performed by the secondary data path need not add a d_j value to the sum of the a_i b_i terms, and therefore need not shift a d_j input. It should be understood that the secondary data path can include an output shift register that operates on the accumulator result to shift the output. In some embodiments, the secondary data path is used to compute matrix V.
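Why no multiplier is needed when b_i is restricted to {+1, -1, +j, -j} can be seen from a small sketch: each product is just a selection, with optional negation, of the real and imaginary parts of a_i. The functions `mux_multiply` and `secondary_mac` are hypothetical names used only for this illustration.

```python
def mux_multiply(a, b):
    """Multiply a by b in {+1, -1, +1j, -1j} using only selection and negation."""
    re, im = a.real, a.imag
    if b == 1:
        return complex(re, im)
    if b == -1:
        return complex(-re, -im)
    if b == 1j:
        return complex(-im, re)    # (re + j*im) * j  = -im + j*re
    if b == -1j:
        return complex(im, -re)    # (re + j*im) * -j =  im - j*re
    raise ValueError("b must be one of +1, -1, +1j, -1j")

def secondary_mac(a_values, b_values):
    """Accumulate sum_i a_i*b_i without any real multiplications."""
    acc = 0j
    for a, b in zip(a_values, b_values):
        acc += mux_multiply(a, b)
    return acc

total = secondary_mac([1 + 2j, 3 - 1j], [1j, -1])
```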

The division data path includes the divider 1308 and can be used to compute the 1/d_ii operations in the Cholesky decomposition. As is known in the art, the division data path can be used to perform a normalized division, in which each diagonal element of matrix D is normalized by shifting it according to a shift value, thereby removing redundant sign bits. Because the techniques described here are not limited in this respect, the shift value can be the same for all elements, which reduces computational complexity, or it can differ between elements. The value 0.5/normalized(d_ii) can be computed in the divider with a larger number of bits (for example, 20 bits), and a reduced number of those bits can then be stored in the JDA memory 1304 as a mantissa (for example, an 11-bit value) and an exponent (for example, a 5-bit value). It should be understood that, because the mantissa can be stored in the real part of a complex number and the exponent in its imaginary part, no dedicated memory is necessarily needed to store the mantissa and exponent of each inverse element of matrix D.
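One possible reading of the normalized division and the mantissa/exponent storage is sketched below. The source gives only example widths (11 and 5 bits), so the exact encoding here is an assumption: d_ii is normalized into [0.5, 1), the quotient 0.5/normalized(d_ii) is quantized to 10 fractional bits, and the exponent records the power of two needed to recover 1/d_ii. The names `store_reciprocal` and `load_reciprocal` are hypothetical.

```python
import math

MANTISSA_FRACTION_BITS = 10  # assumed: an 11-bit field can hold values up to 1024

def store_reciprocal(d):
    """Return (mantissa, exponent) approximating 1/d for a positive real d."""
    if d <= 0:
        raise ValueError("diagonal elements are expected to be positive")
    normalized, shift = math.frexp(d)   # d = normalized * 2**shift, 0.5 <= normalized < 1
    quotient = 0.5 / normalized         # lies in (0.5, 1.0]
    mantissa = round(quotient * (1 << MANTISSA_FRACTION_BITS))
    exponent = 1 - shift                # 1/d = quotient * 2**(1 - shift)
    return mantissa, exponent           # range check for a 5-bit exponent omitted

def load_reciprocal(mantissa, exponent):
    """Rebuild the approximate reciprocal from its stored form."""
    return math.ldexp(mantissa / (1 << MANTISSA_FRACTION_BITS), exponent)

mantissa, exponent = store_reciprocal(4.0)
packed = complex(mantissa, exponent)    # mantissa in the real part, exponent in the imaginary part
```

Holding the pair as one complex word mirrors the remark that no dedicated memory is needed for the mantissa and exponent.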

The finite state machine 1350 can control the operation of each hardware block according to a plurality of pipeline stages, comprising an address generation stage 1360, a data fetch stage 1370, an execute stage 1380, and a data write stage 1390. The address generation stage 1360 can be associated with control of the data address generator 1303. The data fetch stage 1370 can be associated with control of the joint detection memory 1304. The execute stage 1380 can be associated with control of the primary data path, the secondary data path, or the divider data path. The data write stage 1390 can be associated with writing the results of the execute stage to the joint detection memory 1304. In one embodiment, a memory access (whether a read or a write) occurs in every clock cycle, except for the initial memory accesses at the start of each joint detection task computation.
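How four pipeline stages overlap successive operations can be shown with a small occupancy table. This is a generic model of a four-stage pipeline, not a description of the actual control logic: it assumes one operation enters per clock cycle and that no stage ever stalls. The stage labels and the function `pipeline_schedule` are illustrative.

```python
STAGES = ("address", "fetch", "execute", "write")

def pipeline_schedule(num_ops):
    """Return, per clock cycle, which operation index occupies each stage.

    Assumption: one new operation is issued every cycle with no stalls.
    """
    schedule = []
    for cycle in range(num_ops + len(STAGES) - 1):
        active = {}
        for depth, stage in enumerate(STAGES):
            op = cycle - depth           # operation that entered `depth` cycles ago
            if 0 <= op < num_ops:
                active[stage] = op
        schedule.append(active)
    return schedule

schedule = pipeline_schedule(3)
for cycle, active in enumerate(schedule):
    print(cycle, active)
```

With three operations the pipeline drains after six cycles; only the first and last few cycles leave some stages idle.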

The JDA processing operations, for example those shown in the block diagram of Figure 6, can be carried out by the architecture of Figure 13 under the control of the finite state machine 1350. Figure 14 shows a flowchart 1400 of a process that the finite state machine 1350 can execute to control the hardware blocks of the JDA architecture and to perform one or more joint detection processing tasks. The finite state machine can begin by constructing matrix V (action 1402), controlling the pipeline stages described above to compute the elements of matrix V. The finite state machine can then construct the first matched filter output (for example, y1), controlling the pipeline stages in a similar manner to compute the matched filter output (action 1404). The finite state machine can then compute the power of matrix V and/or the power of the previously computed matched filter output, again controlling the pipeline stages in a similar manner to compute the power values (action 1406); in some embodiments, the power of only one matched filter output needs to be computed.

Next, the finite state machine can construct the second matched filter output (for example, y2), controlling the pipeline stages in a similar manner to compute the matched filter output y2 (action 1408). The finite state machine can also control the execution of the actions related to active code detection, which can be carried out concurrently with other actions (such as constructing the matched filter output y2).

In some embodiments, the actions related to active code detection can include determining whether active code detection is required. The designer can indicate whether active code detection is required. This indication can take the form of a parameter setting that specifies whether active code detection should be performed or skipped. Because these embodiments are not limited in this respect, the parameter setting can be provided to the JDA in any suitable manner. The parameter setting that indicates whether active code detection should be skipped can be provided by the DSP connected to the JDA, and the designer can provide this parameter to the DSP. The finite state machine can decide whether to skip active code detection based on the value of this parameter (action 1409). If the decision is to skip active code detection, the finite state machine skips the remaining active code detection actions and proceeds with processing. If the decision is not to skip active code detection, so that the detection should be performed, the finite state machine can direct that active code detection be initiated; the detection itself can be performed by the programmable DSP connected to the JDA. In this way, while the DSP performs active code detection, the JDA can concurrently perform other operations that do not use the active code detection result, such as constructing the matched filter output y2 (action 1408).
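The overlap between the code detection work on the DSP and the construction of y2 on the JDA can be mimicked in software with a worker thread standing in for the DSP. This is an analogy only: the function `run_overlapped`, its arguments, and the use of a thread pool are assumptions made for illustration and say nothing about how the JDA and DSP actually exchange data.

```python
from concurrent.futures import ThreadPoolExecutor

def run_overlapped(build_y2, detect_active_codes, skip_detection):
    """Start code detection (unless skipped), build y2 meanwhile, then wait.

    `build_y2` and `detect_active_codes` are caller-supplied callables that
    stand in for the JDA task and the DSP task respectively.
    """
    with ThreadPoolExecutor(max_workers=1) as dsp:
        # Skip decision: only launch detection when it is not skipped
        pending = None if skip_detection else dsp.submit(detect_active_codes)
        # Work that does not depend on the detection result
        y2 = build_y2()
        # Wait for detection to finish before any dependent step
        codes = pending.result() if pending is not None else None
    return y2, codes

y2, codes = run_overlapped(lambda: [1, 2], lambda: {0, 3}, skip_detection=False)
y2_skipped, codes_skipped = run_overlapped(lambda: [1, 2], lambda: {0, 3}, skip_detection=True)
```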

The finite state machine can then wait for active code detection to complete (action 1410); as described above, in some embodiments active code detection is performed by a component other than the JDA (for example, the programmable DSP). When active code detection completes, the finite state machine can control the rescaling and reordering of the matched filter outputs y1 and y2 and of matrix V (action 1412). The finite state machine can then proceed to the Cholesky decomposition, controlling the pipeline stages in a similar manner to perform the decomposition (action 1414). The finite state machine can then proceed to the computation that solves the linear equation A*x1=y1 for x1, controlling the pipeline stages in a similar manner to perform that computation (action 1416). The finite state machine can then extract user data from the solution x1 (action 1418). The finite state machine can then proceed to the computation that solves the linear equation A*x2=y2 for x2, controlling the pipeline stages in a similar manner to perform that computation (action 1420). The finite state machine can then extract user data from the solution x2 (action 1422). The finite state machine can then wait for the next time slot (action 1424) and, on receiving the next time slot, repeat the process starting from action 1402. It should be understood that, in some embodiments, the data and control parameters for the next time slot (sometimes referred to collectively here as data) can be loaded before processing of the current time slot completes. In some such embodiments, loading of the data and control parameters for the next time slot can begin immediately after the JDA finishes loading the control parameters and data for the current time slot. The control parameters can include shift values, a parameter indicating whether active code detection should be skipped, the channel length (W), and/or the number of codes.
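The decomposition and the two solves in the sequence above share one factorization of A. The sketch below assumes, based on the references to matrix D and to 1/d_ii, that the decomposition takes the form A = L·D·L^H with unit lower-triangular L and real diagonal D. That form, the floating-point arithmetic, and the names `ldl_decompose` and `ldl_solve` are assumptions; the source does not spell out the exact variant.

```python
def ldl_decompose(A):
    """Factor a Hermitian positive-definite A as L * diag(d) * L^H."""
    n = len(A)
    L = [[0j] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = (A[j][j] - sum(abs(L[j][k]) ** 2 * d[k] for k in range(j))).real
        L[j][j] = 1 + 0j
        inv = 1.0 / d[j]                 # the 1/d_jj step a divider would provide
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k].conjugate() * d[k] for k in range(j))
            L[i][j] = s * inv
    return L, d

def ldl_solve(L, d, y):
    """Solve A x = y using the factors returned by ldl_decompose."""
    n = len(y)
    z = list(y)
    for i in range(n):                   # forward substitution: L z = y
        z[i] = y[i] - sum(L[i][k] * z[k] for k in range(i))
    w = [z[i] / d[i] for i in range(n)]  # diagonal scaling: D w = z
    x = list(w)
    for i in reversed(range(n)):         # back substitution: L^H x = w
        x[i] = w[i] - sum(L[k][i].conjugate() * x[k] for k in range(i + 1, n))
    return x

A = [[4, 2 - 2j], [2 + 2j, 6]]
L, d = ldl_decompose(A)                  # factor once
x1 = ldl_solve(L, d, [6 - 2j, 12 - 2j])  # reuse for y1
x2 = ldl_solve(L, d, [4, 2 + 2j])        # and again for y2
```

Factoring once and reusing L and d for both right-hand sides matches the order of the actions: one decomposition followed by two solves.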

Various aspects of the invention can be used alone, in combination, or in a variety of arrangements not specifically discussed in the foregoing embodiments, and the aspects of the invention described here are therefore not limited in their application to the details and arrangement of components set forth in the foregoing description or illustrated in the drawings. These aspects of the invention are capable of other embodiments and of being practiced or carried out in various ways. Various aspects of the invention can be implemented in connection with any type of network, cluster, or configuration. No limitation is placed on the network implementation.

Accordingly, the foregoing description and drawings are by way of example only.

In addition, the phraseology and terminology used here are for the purpose of description and should not be regarded as limiting. The use here of "including", "comprising", "having", "involving", and variations thereof is meant to encompass the items listed thereafter and equivalents thereof, as well as additional items.

Claims

1. A joint detection system configured to perform joint detection of a received signal, the joint detection system comprising:

a joint detector accelerator configured to perform front-end processing of first data input to the joint detector accelerator and to output second data resulting from the front-end processing, wherein the joint detector accelerator is further configured to perform back-end processing using at least third data input to the joint detector accelerator; and

a programmable digital signal processor (DSP) coupled to the joint detector accelerator, wherein the programmable DSP is programmed to perform at least one intermediate processing operation using the second data output by the joint detector accelerator, and wherein the programmable DSP is further programmed to output the third data resulting from the at least one intermediate processing operation to the joint detector accelerator.

2. The joint detection system according to claim 1, wherein the at least one intermediate processing operation comprises an active code detection operation that determines a plurality of active codes of the received signal.

3. The joint detection system according to claim 2, wherein the at least one intermediate processing operation further comprises a signal noise estimation operation that determines a noise estimate of the received signal.

4. The joint detection system according to claim 1, wherein the joint detector accelerator is configured to receive a parameter indicating whether to perform an active code detection operation that determines a plurality of active codes of the received signal.

5. The joint detection system according to claim 1, wherein the programmable DSP is further programmed to perform at least one pre-processing operation and to send the data resulting from the at least one pre-processing operation as the first data input to the joint detector accelerator.

6. The joint detection system according to claim 5, wherein the at least one pre-processing operation comprises a channel estimation operation that determines an initial channel estimate, the initial channel estimate comprising a plurality of channel estimates corresponding to a plurality of propagation channels.

7. The joint detection system according to claim 6, wherein the at least one pre-processing operation comprises a pre-scaling operation that pre-scales the plurality of channel estimates corresponding to the plurality of propagation channels to generate pre-scaled channel estimates, and wherein at least some of the plurality of channel estimates are pre-scaled by different scaling factors.

8. The joint detection system according to claim 5, wherein the at least one pre-processing operation comprises a midamble interference cancellation operation.

9. The joint detection system according to claim 1, wherein the joint detector accelerator is further configured to perform at least one processing operation after outputting the second data and before receiving the third data.

10. The joint detection system according to claim 1, wherein the first data comprises at least one data field and at least one control parameter.

11. The joint detection system according to claim 1, wherein the front-end processing and the back-end processing comprise processing operations for a TD-SCDMA communication scheme.

12. The joint detection system according to claim 1, wherein the back-end processing comprises a user data extraction operation that outputs data associated with a user.

13. A joint detector accelerator configured to perform at least some of the processing associated with joint detection of a received signal, the joint detector accelerator comprising:

a processor configured to:

perform front-end processing of first data input to the joint detector accelerator and output second data resulting from the front-end processing; and

perform back-end processing using at least third data input to the joint detector accelerator, wherein the third data is based at least in part on the second data.

14. The joint detector accelerator according to claim 13, wherein the third data comprises an indication of the active codes of the received signal.

15. The joint detector accelerator according to claim 14, wherein the third data comprises a signal noise estimate of the received signal.

16. The joint detector accelerator according to claim 13, wherein the first data comprises data resulting from pre-processing of the received signal.

17. The joint detector accelerator according to claim 16, wherein the first data comprises a plurality of channel estimates corresponding to a plurality of propagation channels.

18. The joint detector accelerator according to claim 16, wherein the first data comprises a plurality of pre-scaled channel estimates corresponding to a plurality of propagation channels.

19. The joint detector accelerator according to claim 16, wherein the first data comprises data resulting from midamble interference cancellation performed on the received signal.

20. The joint detector accelerator according to claim 13, wherein the first data comprises at least one data field and at least one control parameter.

21. A method of performing joint detection of a received signal using a joint detector accelerator, the method comprising:

receiving first data using the joint detector accelerator;

performing front-end processing of the first data using the joint detector accelerator to obtain second data, wherein the front-end processing comprises at least some operations of the joint detection of the received signal;

outputting, using the joint detector accelerator, the second data resulting from the front-end processing;

receiving, using the joint detector accelerator, third data based at least in part on the second data; and

performing, using the joint detector accelerator, back-end processing using at least the third data, wherein the back-end processing comprises at least some other operations of the joint detection of the received signal.

22. The method according to claim 21, further comprising:

performing active code detection of the received signal using the second data to obtain the third data.

23. The method according to claim 22, further comprising:

performing signal noise estimation of the received signal.

24. The method according to claim 23, wherein performing the signal noise estimation of the received signal comprises performing the signal noise estimation of the received signal using the third data.

25. The method according to claim 21, further comprising:

pre-processing the received signal to obtain the first data.

26. The method according to claim 25, wherein the pre-processing comprises performing channel estimation that determines an initial channel estimate, the initial channel estimate comprising a plurality of channel estimates corresponding to a plurality of propagation channels.

27. The method according to claim 26, wherein the pre-processing comprises pre-scaling the plurality of channel estimates corresponding to the plurality of propagation channels to generate pre-scaled channel estimates, and wherein at least some of the plurality of channel estimates are pre-scaled by different scaling factors.

28. The method according to claim 25, wherein the pre-processing comprises performing midamble interference cancellation.