CN109905898A - Baseband processing resource distribution method - Google Patents
Baseband processing resource distribution method
- Publication number: CN109905898A
- Application number: CN201711282563.4A
- Authority
- CN
- China
- Prior art keywords
- dsp
- baseband processing
- processing unit
- core
- task
- Prior art date
- Legal status: Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The present invention provides a method for allocating baseband processing resources, wherein the baseband processing is executed by multiple multi-core DSPs. The method includes: determining, according to the amount of data to be processed, the number of DSPs that constitute a baseband processing unit; and selecting the DSPs included in the baseband processing unit according to the load of each DSP and the determined DSP count. The method of the invention realizes parallel processing at the system level, the task level, and the instruction/thread level, thereby effectively improving the resource utilization and processing speed of baseband processing.
Description
Technical field
The present invention relates to the field of wireless communication technology, and in particular to a baseband processing resource allocation system and a baseband processing resource allocation method.
Background art
The processing capacity of a traditionally deployed base station is relatively fixed and cannot be expanded on demand, so to meet broadband and narrowband service demand operators must greatly increase the number of base stations, at the cost of expensive construction and maintenance as well as high site and lease fees. In addition, traditional base stations operate independently of one another, so resources cannot be statistically multiplexed, which causes significant waste of resources and energy.
To overcome these limitations of traditional base stations and cope with the explosive growth of data communication and multimedia services, new base station architectures are gradually replacing them; at present these are mainly the Cloud-RAN (C-RAN) architecture and the centralized super base station architecture. For example, Fig. 1 shows the topology of the centralized super base station architecture proposed by the Institute of Computing Technology of the Chinese Academy of Sciences. The architecture is divided into four resource layers, each of which follows the principle of horizontal resource pooling. The first layer consists of distributed fiber-extended remote radio units (RRHs), which handle wireless signal transmission and reception and simple signal processing, and send radio-frequency data over optical fiber back to the super base station equipment room, where a radio-frequency switch forwards the data to any baseband processing unit as needed. The second layer is a multi-mode reconfigurable baseband processing resource pool, in which DSPs implement the baseband processing of the base station. The third layer is a multi-mode reconfigurable protocol processing resource pool, which mainly handles the layer-2 and layer-3 protocol processing of the base station. The fourth layer is a global resource management and control pool, which mainly manages and controls the base station, e.g. RRM (radio resource management) and OAM (operation and maintenance), and performs resource allocation and scheduling control for the whole super base station system.
Although the centralized base station architecture can effectively reduce energy consumption, improve infrastructure utilization, and realize the sharing and load balancing of computing resources, the introduction of new technologies such as orthogonal frequency division multiplexing (OFDM), multi-antenna transmission and reception (MIMO), and coordinated multipoint (CoMP) has increased the complexity of wireless algorithms, which places higher demands on baseband signal processing capability. Existing baseband processing cannot yet effectively meet the real-time requirements of baseband signal processing.

Therefore, the prior art needs to be improved in order to raise the efficiency with which base stations process large amounts of data.
Summary of the invention
An object of the present invention is to overcome the above defects of the prior art and to provide a baseband processing resource allocation system and a baseband processing resource allocation method.

According to the first aspect of the invention, a method for allocating baseband processing resources is provided, wherein the baseband processing is executed by multiple multi-core DSPs, and the method includes the following steps:

Step 1: determining, according to the amount of data to be processed, the number of DSPs that constitute a baseband processing unit;

Step 2: selecting the DSPs included in the baseband processing unit according to the load of each DSP and the determined DSP count.
In one embodiment of the invention, the method further includes: when the load of a DSP in the baseband processing unit is higher than a first threshold, transferring load among the DSPs of the baseband processing unit, so as to balance the load across the DSPs included in the unit.

In one embodiment of the invention, the method further includes: when the loads of multiple DSPs in the baseband processing unit are all below a second threshold, migrating load among the DSPs of the baseband processing unit, so as to aggregate the load of lightly loaded DSPs onto more heavily loaded DSPs.
In one embodiment of the invention, the method further includes a step 3 comprising the following sub-steps:

Step 31: dividing each DSP of the baseband processing unit into a master core and multiple slave cores;

Step 32: decomposing the uplink task and the downlink task of baseband processing into multiple uplink subtasks and multiple downlink subtasks, respectively;

Step 33: distributing the uplink subtasks and the downlink subtasks, based on their computational complexity, to the master core and slave cores of each DSP of the baseband processing unit, wherein at least one uplink subtask and at least one downlink subtask are assigned to the same core.
In one embodiment of the invention, the method further includes: while a DSP executes a task, decomposing iterative loop operations and statements with little mutual dependence into multiple threads that execute in parallel.

In one embodiment of the invention, the number of threads equals the number of cores of the DSP executing the task.

In one embodiment of the invention, the method further includes: for iterative loop operations, scheduling the loop instructions so that a new iteration starts before the previous iteration has completed.

In one embodiment of the invention, the method further includes: while a DSP executes a task, linking multiple single instructions together using very long instruction words.

In one embodiment of the invention, the method further includes: reading or writing the multiple operands of a single instruction simultaneously.
Compared with the prior art, the advantages of the present invention are as follows: the invention uses a multi-core DSP array to increase the speed of baseband processing and, exploiting the characteristics of such an array, parallelizes baseband processing along multiple dimensions, including the instruction/thread level, the task level, and the system level. This parallelization scheme, combining coarse and fine granularity, greatly improves baseband processing efficiency and resource utilization, thereby solving the real-time problem of processing large amounts of data.
Brief description of the drawings
The following drawings serve only to illustrate and explain the present invention and are not intended to limit its scope, in which:

Fig. 1 shows the topology of the prior-art centralized super base station architecture;

Fig. 2 shows a centralized baseband architecture according to an embodiment of the invention;

Fig. 3 shows a flow chart of a baseband processing method according to an embodiment of the invention;

Fig. 4 shows the interaction between a baseband processing board and the baseband management board according to an embodiment of the invention;

Fig. 5 illustrates the processes of load balancing and load aggregation according to an embodiment of the invention;

Fig. 6(a) illustrates the uplink/downlink allocation scheme of an existing TD-LTE system;

Fig. 6(b) illustrates an uplink/downlink allocation scheme according to an embodiment of the invention;

Fig. 7 shows the structure of the fork-join execution model;

Fig. 8 compares execution speed for different thread counts;

Fig. 9 shows the internal structure of a DSP using VLIW+SIMD according to an embodiment of the invention.
Specific embodiment
In order to make the purpose, technical solutions, design methods, and advantages of the present invention clearer, the invention is described in more detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The invention is illustrated below taking the centralized super base station architecture and the TD-LTE system as an example. Fig. 2 shows the centralized baseband architecture of one embodiment of the invention. The architecture comprises a high-speed radio-frequency switching section, a baseband processing resource pool, and a protocol processing resource pool. The high-speed radio-frequency switching section consists of intermediate-frequency boards, radio-frequency front ends, and CPRI interface boards; the baseband processing resource pool consists of multiple baseband processing boards and a small number of baseband management boards; the protocol processing resource pool consists of an operation, administration, and maintenance board, a radio resource management board, a computing resource management board, and at least one protocol processing board. The boards exchange data with one another through high-speed interfaces or switching networks such as CPRI or SRIO, and the protocol processing resource pool connects to the remote control system and the core network through a high-speed Ethernet switching network. The present invention improves the baseband processing boards and the baseband management board of the baseband processing resource pool, so the other processing boards and functional units are not described further.
In the present invention, each baseband processing board contains a DSP array composed of multiple DSPs, each DSP containing multiple cores; the baseband processing boards are mainly responsible for the baseband signal processing of the base station system. The baseband management board is mainly responsible for resource monitoring and load management of the baseband processing boards, e.g. monitoring the working state and load of each DSP. In one embodiment, the baseband processing boards and the baseband management board each integrate two C66-series multi-core DSPs from TI (Texas Instruments), for example TI's 4-core C6618 DSP.

For a baseband processing board built from a multi-core DSP array, an embodiment of the invention proposes parallelization along multiple dimensions, including the system-level, task-level, and instruction/thread-level dimensions, where the instruction/thread-level and task-level parallelism is realized mainly on the baseband processing board, while system-level parallelism requires the joint participation of the baseband management board and the baseband processing boards.
Fig. 3 shows a flow chart of the baseband-processing resource allocation method of one embodiment of the invention, which includes the following steps.

First step: dynamically allocate the DSPs composing a baseband processing unit, to realize system-level parallelism.

In short, the system-level parallel method dynamically selects, based on the load of each DSP on the baseband processing boards and the amount of resources to be allocated, the DSPs that constitute the current baseband processing unit, and the data is then processed in real time by that unit.
Fig. 4 shows the interaction between one baseband processing board and the baseband management board; the board contains DSP1 and DSP2, each with 4 cores, i.e. CORE0-CORE3. With reference to Fig. 4, the system-level parallel method of the invention includes the following sub-steps:

Step S11: obtain the DSP load on each baseband processing board.

The baseband management board periodically monitors the load of the DSPs on each baseband processing board, e.g. their resource occupancy. For example, in a TD-LTE system, since the transmission period of upper-layer protocol data is 1 ms, the DSPs can be configured to report their load every 1 ms, and the baseband management board updates its statistics from the reported states accordingly. This guarantees that the DSP state of every baseband processing board is up to date before resources are allocated, which helps allocate the DSPs of a baseband processing unit reasonably.
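As a rough illustration of this bookkeeping, the management board's load table could be kept as sketched below; the structure, field names, and the freshness check are illustrative assumptions, since the patent only specifies the 1 ms reporting period.

```c
#define NUM_DSP 8

/* Hypothetical load record kept by the baseband management board.
 * Each DSP reports its occupancy every 1 ms (one TD-LTE TTI); the
 * board overwrites the entry so allocation always sees fresh state. */
typedef struct {
    int dsp_id;
    double occupancy;   /* resource occupancy ratio, 0.0 .. 1.0 */
    unsigned last_tti;  /* TTI (1 ms tick) of the latest report  */
} dsp_load_t;

static dsp_load_t load_table[NUM_DSP];

/* Called when a DSP's 1 ms load report arrives. */
void on_load_report(int dsp_id, double occupancy, unsigned tti)
{
    load_table[dsp_id].dsp_id = dsp_id;
    load_table[dsp_id].occupancy = occupancy;
    load_table[dsp_id].last_tti = tti;
}

/* An entry is fresh if the DSP reported within the last max_age TTIs. */
int load_is_fresh(int dsp_id, unsigned now_tti, unsigned max_age)
{
    return now_tti - load_table[dsp_id].last_tti <= max_age;
}
```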
Step S12: calculate the number of DSPs to be allocated according to the amount of data to be processed.

The baseband management board calculates, from the data volume delivered by the upper-layer protocol, the number of baseband resources, i.e. the number of DSPs, that need to be allocated.
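The patent does not give the sizing formula; assuming each DSP has a fixed per-period processing capacity, one minimal sketch is a ceiling division (the function name and parameters are hypothetical):

```c
/* Number of DSPs needed = ceil(data volume / per-DSP capacity).
 * Both quantities are assumed to be in the same unit, e.g. bytes
 * per 1 ms transmission period. */
int dsps_needed(unsigned data_bytes, unsigned capacity_per_dsp)
{
    if (data_bytes == 0)
        return 0;
    return (int)((data_bytes + capacity_per_dsp - 1) / capacity_per_dsp);
}
```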
Step S13: compose the current baseband processing unit.

According to the load of the DSPs on the baseband processing boards and the calculated required DSP count, the baseband management board allocates resources reasonably, combining DSPs into the currently used baseband processing unit; the data to be processed is then dispatched to that unit for execution, e.g. executed in parallel by the multiple DSPs allocated to it.

In one embodiment, during base station operation the baseband management board monitors and evaluates the load of each DSP and dynamically changes the number of DSPs composing the baseband processing unit, or the number of cores occupied, to realize load balancing or load aggregation, thereby allocating baseband processing resources reasonably so that each DSP reaches the desired load.
Referring to the load balancing and load aggregation illustrated in Fig. 5: before load balancing, the resources of the baseband processing unit comprise 4 cores of DSP1, 3 cores of DSP2, and 2 cores of DSP3; after load balancing, each of DSP1-DSP3 occupies 3 cores. In one embodiment, load balancing is triggered when the load of some DSP in the baseband processing unit rises above a predetermined threshold or falls below another predetermined threshold. Load balancing allows the DSPs on every baseband processing board to be fully used, raising the operating efficiency of the baseband resource pool while achieving the best load distribution, and avoiding frequent overload or idleness of individual DSPs, thus realizing multiplexing of hardware resources. In another embodiment, when multiple DSPs of the baseband processing unit are in a low-load state, load aggregation merges the tasks of the multiple DSPs onto one or a few DSPs, and the DSPs thus released can enter a low-power state. Choosing load balancing or load aggregation according to the scenario or application yields the best trade-off among processing efficiency, transmission efficiency, and power consumption.

Through this system-level parallelization, the DSP count of the baseband processing unit is dynamically expanded or shrunk according to the load of each DSP on each baseband processing board and the data demand, realizing parallelism across multiple DSPs in the baseband processing resource pool while reducing power through load aggregation.
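A minimal sketch of this decision logic, under the assumption of two thresholds (balance when any member DSP exceeds the upper threshold, aggregate when all members are below the lower one); the names and exact threshold semantics are illustrative, not specified by the patent:

```c
typedef enum { ACT_NONE, ACT_BALANCE, ACT_AGGREGATE } action_t;

/* Per-period decision of the management board over the member DSPs
 * of one baseband processing unit, given their occupancy ratios. */
action_t decide(const double *load, int n, double high, double low)
{
    int any_high = 0, all_low = 1;
    for (int i = 0; i < n; i++) {
        if (load[i] > high) any_high = 1;
        if (load[i] >= low) all_low = 0;
    }
    if (any_high) return ACT_BALANCE;    /* spread load across members   */
    if (all_low)  return ACT_AGGREGATE;  /* pack load, release idle DSPs */
    return ACT_NONE;
}
```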
Second step: distribute the uplink tasks and downlink tasks across the DSP cores, to realize task-level parallelism.

Fig. 6(a) shows the uplink/downlink allocation scheme of an existing TD-LTE system: the uplink and the downlink are bound to different cores, i.e. some cores of the DSP process the uplink while the other cores process the downlink. This scheme is simple to implement and easy to manage and maintain, and suits cases where the uplink and downlink processing complexity is low. However, when the processing complexity grows, the cores responsible for one direction become overloaded and short of processing capacity while the other cores sit idle, producing load imbalance between cores.

Fig. 6(b) shows the uplink/downlink allocation scheme of the invention: the uplink and the downlink are bound to the same DSP cores, i.e. every core carries both an uplink task and a downlink task. If an uplink subframe is currently being processed, the downlink task of each core is suspended; if a downlink subframe is currently being processed, the uplink task of each core is suspended. This ensures that in every subframe all cores participate in either uplink or downlink processing, which balances the load across cores and lets the uplink task or downlink task run in parallel on all cores, significantly improving execution efficiency.
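The per-subframe switching described here can be sketched as follows; the types and the stand-in tasks are illustrative, not the patent's implementation:

```c
typedef enum { SF_UPLINK, SF_DOWNLINK } subframe_t;

/* Each core holds both an uplink and a downlink task; which one runs
 * is selected per subframe, so all cores follow the subframe's
 * direction together while the other task stays suspended. */
typedef struct {
    void (*uplink_task)(int core);
    void (*downlink_task)(int core);
} core_tasks_t;

void run_subframe(core_tasks_t *c, int core, subframe_t sf)
{
    if (sf == SF_UPLINK)
        c->uplink_task(core);    /* downlink task stays suspended */
    else
        c->downlink_task(core);  /* uplink task stays suspended   */
}

/* Illustrative stand-in tasks that count their invocations. */
static int ul_runs, dl_runs;
static void uplink_stub(int core)   { (void)core; ul_runs++; }
static void downlink_stub(int core) { (void)core; dl_runs++; }
```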
Specifically, the task-level parallel method of the invention includes the following sub-steps:

Step S210: for the DSPs included in the baseband processing unit, designate a master core and slave cores.

For example, one core of a DSP is set as the master core and the remaining cores as slave cores; e.g. CORE0 is the master core and CORE1-CORE3 are slave cores. The master core receives all data from the upper layer or the bottom layer, undertakes part of the processing and the task distribution work, and combines, schedules, and switches the different uplink and downlink tasks as required; after each core finishes its processing, the master core aggregates and synchronizes the data and sends it on.
Step S220: allocate the downlink tasks to each core.

Taking the TD-LTE system as an example, the downlink tasks mainly process the reference signals, synchronization signals, and channels, e.g. the reference signal RS, the primary synchronization signal PSS, the secondary synchronization signal SSS, and the channels PBCH, PDSCH, PDCCH, PCFICH, PHICH, etc. In one embodiment, after analyzing and evaluating the processing complexity, the downlink tasks are allocated to the cores as follows:

The master core CORE0 handles: the processing of PSS, SSS, RS, PBCH, and PHICH, plus part of the symbol IFFT;

Slave core CORE1 handles: the processing of PCFICH and PDCCH, plus part of the symbol IFFT;

Slave core CORE2 handles: the processing of PDSCH, plus part of the symbol IFFT;

Slave core CORE3 handles the parallel processing of time-consuming downlink modules or algorithms, such as PDSCH resource mapping, plus part of the symbol IFFT.

After cores CORE1-CORE3 finish their processing, each sends a synchronization signal to the master core CORE0 to indicate completion. Once the resource mapping of all channels and signals has been processed, OFDM processing is performed; the most time-consuming module in OFDM is the IFFT, which a single core can hardly complete within the deadline, so it is processed in parallel on the four cores. For example, after the synchronization signals from the slave cores are received, the four-core parallel IFFT is started; after all cores have finished, a second synchronization is performed on CORE0, and the aggregated data is then sent.
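The slave-to-master completion signaling can be sketched with simple per-slave done flags; this is an illustrative host-side model, not TI's inter-core signaling mechanism (which would typically use IPC registers or shared-memory semaphores):

```c
#define NUM_SLAVES 3

/* One completion flag per slave core, in shared memory. Each slave
 * sets its flag when its channel processing is done; the master
 * proceeds to the joint IFFT only once all slaves have reported. */
static volatile int done_flag[NUM_SLAVES];

void slave_done(int slave)
{
    done_flag[slave] = 1;
}

int all_slaves_done(void)
{
    for (int i = 0; i < NUM_SLAVES; i++)
        if (!done_flag[i])
            return 0;
    return 1;
}
```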
Step S230: allocate the uplink tasks to each core.

The uplink tasks of the TD-LTE system mainly process the PRACH, PUCCH, and PUSCH channels and the SRS and DMRS signals. The channels are independent of one another, and among them the PUSCH channel has the most complex and most time-consuming processing flow. The uplink tasks are allocated to the cores as follows:

The master core CORE0 handles: SRS, DMRS, and part of the symbol FFT;

Slave core CORE1 handles: the processing of PRACH and PUCCH, plus part of the symbol FFT;

Slave core CORE2 handles: the processing of PUSCH, plus part of the symbol FFT;

Slave core CORE3 handles the parallel processing of time-consuming uplink modules or algorithms, such as PUSCH resource demapping, plus part of the symbol FFT.

After the master core CORE0 receives the baseband signal from the bottom layer, it removes the cyclic prefix (CP), moves the data into a shared memory region via EDMA (enhanced direct memory access), and distributes tasks among the cores, notifying the slave cores to start their tasks. Task division takes the OFDM symbol as the basic processing unit: each core reads from the shared region the data of the OFDM symbols assigned to it, completes the FFT processing and the processing of its physical channels and signals, and then performs inter-core synchronization; after the master core CORE0 aggregates and synchronizes the data, it is reported to the upper layer.
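Symbol-granularity task division can be illustrated with a simple ownership rule. The patent states that the OFDM symbol is the basic unit but not the exact mapping, so the round-robin distribution below is an assumption:

```c
/* Assumed round-robin ownership: OFDM symbols are dealt to the cores
 * by index, so each core FFTs its own share out of shared memory. */
int symbol_owner(int symbol_idx, int num_cores)
{
    return symbol_idx % num_cores;
}

/* Number of symbols a given core owns out of `total_symbols`,
 * e.g. 14 symbols per 1 ms subframe with normal CP in LTE. */
int symbols_for_core(int core, int num_cores, int total_symbols)
{
    int count = 0;
    for (int s = 0; s < total_symbols; s++)
        if (symbol_owner(s, num_cores) == core)
            count++;
    return count;
}
```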
The hybrid uplink/downlink task-level parallel method proposed by the invention binds the uplink and the downlink to the same cores and uses those cores by time-division multiplexing, improving resource utilization. In addition, the uplink and the downlink are each divided into multiple subtasks which, according to the characteristics of the communication protocol, are distributed to different cores for processing; this suits cases of high uplink and downlink processing complexity and avoids load imbalance between cores.
Third step: apply instruction/thread-level parallelism inside each DSP.

The instruction/thread-level parallel method of the invention is a parallel-processing method for the individual cores inside a DSP, combining the hardware features of the DSP to achieve finer-grained parallel processing.
According to one embodiment of the invention, the instruction/thread-level parallel method includes the following sub-steps:

Step S310: decompose the main task into multiple threads to realize parallelization.

For example, the main task is decomposed using OpenMP, a set of compiler directives for multiprocessor programming on shared-memory parallel systems that provides a high-level abstract description of parallel algorithms. Specifically, for the numerous iterative loops in the communication algorithms (e.g. for loops) and for statements with little mutual dependence, OpenMP's fork-join execution model decomposes the main task into multiple threads for parallel execution, as shown in Fig. 7, where fork creates threads or wakes existing ones, and join is the convergence of the threads. When a fork-join program starts, only one active thread, called the master thread, exists; when the master thread encounters work that needs parallel computation, it forks threads to execute the task in parallel. During parallel execution the master thread and the forked threads work together; after the parallel task finishes, the forked threads exit or block and no longer work, and control flow returns to the single master thread.

Concretely, the programmer expresses intent by adding dedicated pragmas to the source code. For example, a piece of code that computes PI needs 100,000 loop iterations (num_steps = 100000); a pragma can specify that it be executed with 2 threads (#define NUM_THREADS 2), whereupon the compiler parallelizes the program automatically and inserts synchronization, mutual exclusion, and communication where necessary.
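The PI code referred to above is not reproduced in the text; the following is a reconstruction of the classic OpenMP example under the stated parameters (num_steps = 100000, 2 threads). The pragma is ignored when the code is compiled without OpenMP support, so the numerical result is the same either way:

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define NUM_THREADS 2

/* Midpoint-rule integration of 4/(1+x^2) over [0,1], which equals PI.
 * With OpenMP, the loop iterations are split across NUM_THREADS
 * threads and the partial sums are combined by the reduction clause. */
double compute_pi(void)
{
    const long num_steps = 100000;
    const double step = 1.0 / (double)num_steps;
    double sum = 0.0;

#ifdef _OPENMP
    omp_set_num_threads(NUM_THREADS);
#endif
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = ((double)i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```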
Fig. 8 compares the execution speed of the same piece of code with different thread counts, where the abscissa is the thread count and the ordinate is the execution time (in DSP cycles; larger values mean slower execution); series 1 corresponds to the case of 10 loop iterations and series 2 to 40 iterations. In both cases, execution is fastest when the thread count is 2. Evidently, when parallelizing with OpenMP, more threads are not always better, because more threads also mean more inter-core interaction overhead. In a preferred embodiment, the thread count is set equal to the core count of the DSP, which keeps the overhead low while maintaining comparatively high operating efficiency.
Step S320: schedule the loop instructions so that different iterations execute in an overlapped fashion.

In this step, software pipelining is arranged so that a new loop iteration starts before the previous iteration has completed, letting successive iterations of a loop within one core execute in parallel.

In a concrete implementation, the minimum iteration interval of the loop must be determined: the minimum number of cycles that must elapse between the starts of two adjacent iterations of the loop. The smaller the iteration interval, the fewer cycles one pass of the loop takes. Taking a fixed-point dot product on a TI DSP as an example: without pipelining, 1200 iterations take 19224 cycles; after software pipelining, the same 1200 iterations take 696 cycles, an efficiency improvement of more than 27x.

By arranging software pipelining, resources are used more effectively and operating efficiency is improved.
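The dot-product loop used in the cycle-count comparison above would look roughly as follows in C; on a C6000-class compiler, the `restrict` qualifiers and a known trip count help the software pipeliner reach a small iteration interval, while the same source also runs (unpipelined) on any host. The function name is illustrative:

```c
/* Fixed-point dot product. Written as a simple countable loop with
 * non-aliasing inputs: exactly the shape a software-pipelining
 * compiler overlaps, issuing the load/multiply/accumulate of several
 * iterations in the same cycle. */
int dot_product(const short * restrict a, const short * restrict b, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```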
Step S330: link multiple instructions together, to increase instruction-level parallelism.

For example, multiple single instructions are linked together using very long instruction words (VLIW). The basic idea of VLIW is that the DSP places the control of all functional units into one long instruction word under the compiler's control, enabling the compiler to schedule precisely which operation executes on which unit.

Fig. 9 shows the internal structure of the DSP used in the invention, composed of four parts: the fetch unit, the decode unit, the execution units, and the storage area. Taking a TI C6000-series DSP as an example, the processor has 8 execution units and can theoretically execute 8 single instructions per cycle; these 8 single instructions are treated as one instruction packet, and the fetch unit, decode unit, and execution units operate on one instruction packet at a time. VLIW treats the multiple single instructions as one long instruction, and its working process can be regarded as the fetching, decoding, and execution of each long instruction. Specifically, the C6618 DSP used in the invention supports fetch packets of eight 32-bit single instructions, i.e. 256 bits in total, which form one very long instruction word; each single instruction is a 32-bit opcode, a fetch packet contains at most 8 single instructions, and each 256-bit instruction packet can be dispatched to the 8 execution units for parallel execution.

Using the VLIW structure exploits instruction-level parallelism and makes full use of the DSP's resources, thereby greatly increasing computation speed.
Step S340: fetch multiple operands of a single instruction simultaneously, to improve operating efficiency.

SIMD (single instruction, multiple data) packs multiple operands into a group of registers and operates on them with one instruction. Compared with SISD (single instruction, single data), the SIMD technique used in the present invention provides vector processing capability. For example, with SISD, after an add instruction is decoded, the execution unit first accesses memory to fetch the first operand, then accesses memory again to fetch the second operand, and only then can the addition be performed. With SIMD, after the add instruction is decoded, several execution units access memory simultaneously, fetching all operands at once before computing. For example, the SIMD instructions of the C6618 are extended to support 128-bit vector data; the QMPY32 instruction, for instance, can perform the multiplication of the four corresponding 32-bit values of two complex numbers.
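The packed-operand idea can be illustrated with a host-side emulation in the spirit of the C6000 `_add2` intrinsic, which adds the two 16-bit halves of a 32-bit word independently (one opcode, two data lanes, with per-lane wraparound); the emulation below is for illustration and is not TI's implementation:

```c
#include <stdint.h>

/* Packed 2x16-bit add: each 16-bit half of x is added to the
 * corresponding half of y, with independent per-lane wraparound
 * (carries do not cross the lane boundary). */
uint32_t add2_emul(uint32_t x, uint32_t y)
{
    uint16_t lo = (uint16_t)((x & 0xFFFFu) + (y & 0xFFFFu)); /* wraps */
    uint16_t hi = (uint16_t)((x >> 16) + (y >> 16));         /* wraps */
    return ((uint32_t)hi << 16) | lo;
}
```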
In one embodiment, a hybrid VLIW+SIMD structure is used, as also shown in Fig. 9: the instruction-fetch and memory-access stages use SIMD technology, so that multiple operands of an instruction can be fetched or stored simultaneously, while the decode and execute stages use the VLIW structure, so that multiple instructions execute in parallel. In this way the degree of parallelism is improved at a finer granularity (the instruction/thread level).
For example, to obtain more instruction/thread-level concurrency, the instruction architecture of the multi-core DSP chip used in the present invention supports two-way SIMD operations on 16-bit data and four-way SIMD operations on 8-bit data. Multiple operands can be replicated and processed by a single instruction: several identical data items are packed into a 64-bit register and the same operation (addition, subtraction, multiplication, division, shifts, logic operations, etc.) is applied to all of them at once, accelerating computation. In a specific implementation, this hybrid structure can be realized with linear assembly; by specifying registers, defining custom parallel instructions and the like, linear assembly improves the scheduling efficiency of the compiler.
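The packed-register arithmetic described above (several lanes in one 64-bit register, one operation applied to every lane) can be sketched as follows. This is our illustration, with hypothetical function names and a parameterized lane width, so that both the two-way 16-bit and four-way 8-bit cases mentioned in the text fit the same model.

```python
# Illustrative lane-wise SIMD arithmetic on a 64-bit register.
REG_BITS = 64

def pack_lanes(vals, lane_bits):
    """Pack values into one 64-bit register, lowest lane first."""
    assert len(vals) * lane_bits <= REG_BITS
    reg = 0
    for i, v in enumerate(vals):
        assert 0 <= v < (1 << lane_bits)
        reg |= v << (i * lane_bits)
    return reg

def unpack_lanes(reg, lanes, lane_bits):
    """Recover the individual lane values from a packed register."""
    mask = (1 << lane_bits) - 1
    return [(reg >> (i * lane_bits)) & mask for i in range(lanes)]

def simd_add(reg_a, reg_b, lanes, lane_bits):
    """Apply the same addition to every lane at once (wrap-around per lane)."""
    mask = (1 << lane_bits) - 1
    out = 0
    for i in range(lanes):
        s = (((reg_a >> (i * lane_bits)) & mask) +
             ((reg_b >> (i * lane_bits)) & mask)) & mask
        out |= s << (i * lane_bits)
    return out
```

With `lane_bits=16` and `lanes=4` this mimics packed 16-bit adds on a 64-bit register; hardware performs all lanes in one instruction, whereas the model loops only to make the per-lane masking explicit.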
It should be noted that although the steps above are described in a particular order, this does not mean that the steps must be executed in that particular order; in fact, some of these steps may be executed concurrently, or even in a different order, as long as the required functions can be realized. In addition, the idea of the present invention is also applicable to other mobile communication standards.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction-execution device. The computer-readable storage medium may include, but is not limited to, an electric storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or raised structures in a groove having instructions recorded thereon, and any suitable combination of the above.
The embodiments of the present invention have been described above. The foregoing description is exemplary rather than exhaustive, and it is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the illustrated embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application or their technical improvement over the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
Claims (11)
1. A method for allocating baseband processing resources, the baseband processing being executed by a plurality of multi-core DSPs, the method comprising the following steps:
Step 1: determining, according to the amount of data to be processed, the number of DSPs constituting a baseband processing unit;
Step 2: selecting the DSPs included in the baseband processing unit according to the load condition of each DSP and the determined number of DSPs.
2. The method according to claim 1, further comprising:
when the load of one DSP in the baseband processing unit is higher than a first threshold, transferring load among the DSPs included in the baseband processing unit, so as to achieve load balancing among the DSPs included in the baseband processing unit.
3. The method according to claim 1, further comprising:
when the loads of multiple DSPs in the baseband processing unit are all lower than a second threshold, migrating load among the DSPs included in the baseband processing unit, so as to aggregate the load on DSPs with relatively low load levels onto DSPs with relatively high load levels.
4. The method according to any one of claims 1 to 3, further comprising step 3, which comprises the following sub-steps:
Step 31: dividing each DSP of the baseband processing unit into a main core and a plurality of slave cores;
Step 32: decomposing the uplink task and the downlink task of the baseband processing into multiple uplink subtasks and multiple downlink subtasks, respectively;
Step 33: distributing, based on computational complexity, the uplink subtasks and the downlink subtasks to the main core and the slave cores of each DSP of the baseband processing unit, wherein at least one uplink subtask and at least one downlink subtask are distributed to the same core.
5. The method according to claim 4, further comprising:
during task execution by a DSP, decomposing iterative loop operations and statements with low inter-dependency into multiple threads for parallel execution.
6. The method according to claim 5, wherein the number of threads obtained by the decomposition is equal to the number of cores of the DSP executing the task.
7. The method according to claim 5, further comprising:
for iterative loop operations, scheduling loop instructions so that a new iteration is started before the preceding iteration has completed.
8. The method according to claim 5, further comprising:
during task execution by a DSP, linking multiple single instructions together using a very long instruction word.
9. The method according to claim 5, further comprising:
reading or writing simultaneously multiple operands corresponding to one single instruction.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
11. A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711282563.4A CN109905898B (en) | 2017-12-07 | 2017-12-07 | Baseband processing resource allocation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109905898A true CN109905898A (en) | 2019-06-18 |
CN109905898B CN109905898B (en) | 2022-10-11 |
Family
ID=66938907
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711282563.4A Active CN109905898B (en) | 2017-12-07 | 2017-12-07 | Baseband processing resource allocation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109905898B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1984395A (en) * | 2005-12-12 | 2007-06-20 | 大唐移动通信设备有限公司 | Baseband processor based on multiple kernel construction processor |
CN102681902A (en) * | 2012-05-15 | 2012-09-19 | 浙江大学 | Load balancing method based on task distribution of multicore system |
CN105045658A (en) * | 2015-07-02 | 2015-11-11 | 西安电子科技大学 | Method for realizing dynamic dispatching distribution of task by multi-core embedded DSP (Data Structure Processor) |
CN105915462A (en) * | 2016-06-03 | 2016-08-31 | 中国航天科技集团公司第九研究院第七七研究所 | Symmetrical RSS circuit facing TCP session |
US20160316485A1 (en) * | 2015-04-21 | 2016-10-27 | Anoop Kumar | Traffic scheduling system for wireless communication system |
2017-12-07 CN CN201711282563.4A patent/CN109905898B/en active Active
Non-Patent Citations (1)
Title |
---|
潘必超等 (Pan Bichao et al.): "Design and Implementation of a Communication Module for an ESP Dual-Core Control Architecture", 《计算机与现代化》 (Computer and Modernization) *
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112243266A (en) * | 2019-07-18 | 2021-01-19 | 大唐联仪科技有限公司 | Data packaging method and device |
CN112243266B (en) * | 2019-07-18 | 2024-04-19 | 大唐联仪科技有限公司 | Data packet method and device |
WO2021089114A1 (en) * | 2019-11-04 | 2021-05-14 | NEC Laboratories Europe GmbH | Autonomous virtual radio access network control |
CN113038607A (en) * | 2019-12-24 | 2021-06-25 | 大唐移动通信设备有限公司 | Channel processing method, device and base station |
CN113038607B (en) * | 2019-12-24 | 2022-11-15 | 大唐移动通信设备有限公司 | Channel processing method, device and base station |
CN112653638A (en) * | 2020-12-14 | 2021-04-13 | 中科院计算技术研究所南京移动通信与计算创新研究院 | Device for switching routes of multiple paths of intermediate frequencies and baseband at high speed and communication method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN109905898B (en) | 2022-10-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109905898A (en) | Baseband processing resource distribution method | |
CN102360309B (en) | Scheduling system and scheduling execution method of multi-core heterogeneous system on chip | |
CN106095583B (en) | Principal and subordinate's nuclear coordination calculation and programming frame based on new martial prowess processor | |
Meng et al. | Dedas: Online task dispatching and scheduling with bandwidth constraint in edge computing | |
CN108363615B (en) | Method for allocating tasks and system for reconfigurable processing system | |
CN103970580B (en) | A kind of data flow towards multinuclear cluster compiles optimization method | |
CN105159762B (en) | Heuristic cloud computing method for scheduling task based on Greedy strategy | |
CN104536937B (en) | Big data all-in-one machine realization method based on CPU GPU isomeric groups | |
CN104331321B (en) | Cloud computing task scheduling method based on tabu search and load balancing | |
CN103279390B (en) | A kind of parallel processing system (PPS) towards little optimization of job | |
CN109992407B (en) | YARN cluster GPU resource scheduling method, device and medium | |
CN101366004A (en) | Methods and apparatus for multi-core processing with dedicated thread management | |
CN103809936A (en) | System and method for allocating memory of differing properties to shared data objects | |
CN103699432B (en) | Multi-task runtime collaborative scheduling system under heterogeneous environment | |
CN103279445A (en) | Computing method and super-computing system for computing task | |
CN103793255B (en) | Starting method for configurable multi-main-mode multi-OS-inner-core real-time operating system structure | |
CN102193779A (en) | MPSoC (multi-processor system-on-chip)-oriented multithread scheduling method | |
CN102135949A (en) | Computing network system, method and device based on graphic processing unit | |
CN101464965B (en) | Multi-nuclear parallel ant group design method based on TBB | |
Zheng et al. | Architecture-based design and optimization of genetic algorithms on multi-and many-core systems | |
CN104090826B (en) | Task optimization deployment method based on correlation | |
CN111158790B (en) | FPGA virtualization method for cloud deep learning reasoning | |
WO2012152948A1 (en) | Microcomputer for low power efficient baseband processing | |
Shafique et al. | Minority-game-based resource allocation for run-time reconfigurable multi-core processors | |
CN104965762B (en) | A kind of scheduling system towards hybrid task |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||