CN109905898A - Baseband processing resource distribution method - Google Patents


Info

Publication number
CN109905898A
CN109905898A
Authority
CN
China
Prior art keywords
dsp
baseband processing
processing unit
core
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711282563.4A
Other languages
Chinese (zh)
Other versions
CN109905898B (en)
Inventor
王妮娜
杨喜宁
曹阳阳
曹欢
周一青
石晶林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Polytron Technologies Inc
Original Assignee
Beijing Zhongke Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Polytron Technologies Inc filed Critical Beijing Zhongke Polytron Technologies Inc
Priority to CN201711282563.4A priority Critical patent/CN109905898B/en
Publication of CN109905898A publication Critical patent/CN109905898A/en
Application granted granted Critical
Publication of CN109905898B publication Critical patent/CN109905898B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The present invention provides a method for allocating baseband processing resources, the baseband processing being executed by multiple multi-core DSPs. The method includes: determining, according to the volume of data to be processed, the number of DSPs that constitute a baseband processing unit; and selecting the DSPs included in the baseband processing unit according to the load of each DSP and the determined DSP count. The method of the invention achieves parallel processing at the system-level, task-level, and instruction/thread-level dimensions, thereby effectively improving the resource utilization and processing speed of baseband processing.

Description

Baseband processing resource distribution method
Technical field
The present invention relates to the field of wireless communication technology, and in particular to a baseband processing resource allocation system and a baseband processing resource allocation method.
Background art
Traditional base stations have relatively fixed processing capacity that cannot be expanded on demand. To meet the service demands of broadband (e.g., WiMAX) and narrowband communications, operators must greatly increase the number of base stations, incurring expensive construction and maintenance costs as well as high site-support and lease fees. Moreover, traditional base stations operate independently of one another, so resources cannot be statistically multiplexed, causing significant waste of both resources and energy.
To overcome these limitations of traditional base stations and cope with the explosive growth of data communication and multimedia services, new base station architectures are gradually replacing traditional ones, chiefly the Cloud-RAN (C-RAN) network architecture and the centralized super base station architecture. For example, Fig. 1 shows the topology of the centralized super base station architecture proposed by the Institute of Computing Technology, Chinese Academy of Sciences. The architecture is divided into four resource layers, each adopting the principle of horizontal resource pooling. The first layer consists of distributed fiber-extended remote radio units (RRHs), which handle radio signal transmission/reception and simple signal processing; RF data is carried over fiber to the super base station equipment room, where an RF switch routes it to any baseband processing unit as needed. The second layer is a multi-mode reconfigurable baseband processing resource pool, which can use DSPs to implement the base station's baseband processing. The third layer is a multi-mode reconfigurable protocol processing resource pool, mainly completing the Layer 2 and Layer 3 protocol processing of the base station. The fourth layer is a global resource management and control pool, mainly responsible for base station management and control, such as RRM (radio resource management) and OAM (operation, administration and maintenance), as well as resource allocation and scheduling control for the entire super base station system.
Although the centralized base station architecture can effectively reduce energy consumption, improve infrastructure utilization, and achieve sharing and load balancing of computing resources, the introduction of new technologies such as orthogonal frequency-division multiplexing (OFDM), multi-antenna transmission and reception (MIMO), and coordinated multipoint (CoMP) has increased the complexity of wireless algorithms, placing higher demands on baseband signal processing capability. Existing baseband processing cannot yet effectively meet the real-time requirements of baseband signal processing.
Therefore, the prior art needs to be improved to raise the efficiency with which base stations process large volumes of data.
Summary of the invention
The object of the present invention is to overcome the above defects of the prior art by providing a baseband processing resource allocation system and a baseband processing resource allocation method.
According to a first aspect of the invention, a method for allocating baseband processing resources is provided, wherein the baseband processing is executed by multiple multi-core DSPs, the method comprising the following steps:
Step 1: determining, according to the volume of data to be processed, the number of DSPs that constitute a baseband processing unit;
Step 2: selecting the DSPs included in the baseband processing unit according to the load of each DSP and the determined DSP count.
In one embodiment of the invention, the method further includes: when the load of a DSP in the baseband processing unit exceeds a first threshold, transferring load among the DSPs included in the baseband processing unit, so as to balance the load across them.
In one embodiment of the invention, the method further includes: when the loads of multiple DSPs in the baseband processing unit are all below a second threshold, migrating load among the DSPs included in the unit so as to aggregate the load of DSPs with relatively low load onto DSPs with relatively high load.
In one embodiment of the invention, the method further includes a step 3 comprising the following substeps:
Step 31: dividing each DSP of the baseband processing unit into a master core and multiple slave cores;
Step 32: decomposing the uplink task and the downlink task of baseband processing into multiple uplink subtasks and multiple downlink subtasks, respectively;
Step 33: assigning the uplink subtasks and the downlink subtasks to the master core and slave cores of each DSP of the baseband processing unit based on computational complexity, wherein at least one uplink subtask and at least one downlink subtask are assigned to the same core.
In one embodiment of the invention, the method further includes: during task execution on a DSP, decomposing iterative loop operations and statements with little data dependence into multiple threads that execute in parallel.
In one embodiment of the invention, the number of threads created equals the number of cores of the DSP executing the task.
In one embodiment of the invention, the method further includes: for iterative loop operations, scheduling the loop instructions so that a new iteration starts before the previous iteration has completed.
In one embodiment of the invention, the method further includes: during task execution on a DSP, linking multiple single instructions together using very long instruction words.
In one embodiment of the invention, the method further includes: reading or writing the multiple operands of a single instruction simultaneously.
Compared with the prior art, the advantages of the present invention are as follows:
The invention uses a multi-core DSP array to increase the speed of baseband processing and, exploiting the characteristics of the array, parallelizes baseband processing across multiple dimensions: the instruction/thread level, the task level, and the system level. This parallelization scheme, combining coarse and fine granularity, greatly improves baseband processing efficiency and resource utilization, thereby solving the real-time problem of processing massive data.
Description of the drawings
The following drawings give only a schematic description and explanation of the invention and are not intended to limit its scope:
Fig. 1 shows the topology of the prior-art centralized super base station architecture;
Fig. 2 shows a centralized baseband architecture according to an embodiment of the invention;
Fig. 3 shows the flowchart of a baseband processing method according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the interaction between a baseband processing board and the baseband management board according to an embodiment of the invention;
Fig. 5 illustrates load balancing and load aggregation according to an embodiment of the invention;
Fig. 6(a) illustrates the uplink/downlink task assignment of an existing TD-LTE system;
Fig. 6(b) illustrates the uplink/downlink task assignment according to an embodiment of the invention;
Fig. 7 shows the structure of the fork-join execution model;
Fig. 8 compares execution speed for different thread counts;
Fig. 9 shows the internal structure of a DSP using VLIW+SIMD according to an embodiment of the invention.
Detailed description of the embodiments
In order to make the purpose, technical solutions, design methods, and advantages of the present invention clearer, the invention is described in more detail below through specific embodiments in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The invention is illustrated below using the centralized super base station architecture and a TD-LTE system as an example. Fig. 2 shows the centralized baseband architecture of one embodiment of the invention. The architecture includes a high-speed RF switch, a baseband processing resource pool, and a protocol processing resource pool. The high-speed RF switch consists of IF boards, RF front-ends, and CPRI interface boards. The baseband processing resource pool consists of multiple baseband processing boards and a small number of baseband management boards. The protocol processing resource pool consists of an operation, administration and maintenance board, a radio resource management board, a computing resource management board, and at least one protocol processing board. The boards exchange data through high-speed interfaces or switching networks such as CPRI or SRIO, and the protocol processing resource pool connects to the remote control system and the core network through a high-speed Ethernet switching network. The present invention improves the baseband processing boards and the baseband management board in the baseband processing resource pool; the other processing boards and functional units are therefore not described further.
In the present invention, each baseband processing board contains a DSP array made up of multiple DSPs, each DSP containing multiple cores; the baseband processing boards are mainly responsible for the baseband signal processing of the base station system. The baseband management board is mainly responsible for resource monitoring and load management of each baseband processing board, for example detecting the working state and load of each DSP. In one embodiment, the baseband processing boards and the baseband management board each integrate two C66-series multi-core DSP chips from TI (Texas Instruments), for example TI's 4-core C6618 DSP.
For a baseband processing board built from a multi-core DSP array, one embodiment of the invention proposes methods for parallelization along multiple dimensions, including the system-level, task-level, and instruction/thread-level dimensions. Parallelism at the instruction/thread level and the task level is carried out mainly on the baseband processing board, while system-level parallelism requires the joint participation of the baseband management board and the baseband processing boards.
Fig. 3 shows the flowchart of a baseband processing resource allocation method according to an embodiment of the invention, comprising the following steps:
Step 1: dynamically allocate the DSPs constituting a baseband processing unit, to achieve parallelization at the system-level dimension.
In short, the system-level parallel method dynamically allocates the DSPs that constitute the current baseband processing unit based on the load of each DSP on the baseband processing boards and the amount of resources to be allocated; the data are then processed in real time by that baseband processing unit.
Fig. 4 shows the interaction between one baseband processing board and the baseband management board; the board contains DSP1 and DSP2, each with 4 cores, i.e., CORE0-CORE3. With reference to Fig. 4, the system-level parallel method of the invention includes the following substeps:
Step S11: obtain the DSP load on each baseband processing board.
The baseband management board periodically monitors the load of the DSPs on each baseband processing board, such as their resource occupancy. For example, in a TD-LTE system, since the transmission period of upper-layer protocol data is 1 ms, the DSPs can be configured to report their load every 1 ms; the baseband management board updates its statistics from these reports, ensuring that the DSP states on each baseband processing board are up to date before resources are allocated, which helps allocate the DSPs constituting a baseband processing unit reasonably.
Step S12: calculate the number of DSPs to allocate according to the volume of data to be processed.
The baseband management board calculates the number of baseband resources to allocate, i.e., the DSP count, from the data volume delivered by the upper-layer protocol.
Step S13: assemble the current baseband processing unit.
According to the load of the DSPs on the current baseband processing boards and the calculated required DSP count, the baseband management board allocates resources reasonably and combines them into the currently used baseband processing unit; the data to be processed are dispatched to this baseband processing unit for execution, for example executed in parallel by the multiple DSPs allocated to it.
In one embodiment, during operation of the base station system, the baseband management board monitors and evaluates the load state of each DSP and dynamically changes the number of DSPs constituting a baseband processing unit, or the number of DSP cores occupied, to achieve load balancing or load aggregation, thereby allocating baseband processing resources reasonably so that each DSP reaches a desired load.
Referring to the load balancing and load aggregation illustrated in Fig. 5: before load balancing, the resources of the baseband processing unit comprise 4 cores of DSP1, 3 cores of DSP2, and 2 cores of DSP3; after load balancing, DSP1-DSP3 each occupy 3 cores. In one embodiment, load balancing is triggered when the load of some DSP in the baseband processing unit rises above a predetermined threshold or falls below another predetermined threshold. Load balancing lets the DSPs on each baseband processing board be fully utilized and improves the operating efficiency of the baseband resource pool, reaching an optimal load while avoiding the frequent overloading or idling of a single DSP, thereby multiplexing hardware resources. In another embodiment, when multiple DSPs in a baseband processing unit are lightly loaded, load aggregation merges the tasks of those DSPs onto one or a few DSPs, and the freed DSPs can then enter a low-power state. Choosing load balancing or load aggregation as appropriate to the scenario or application optimizes all three of processing efficiency, transmission efficiency, and power consumption.
Through this system-level parallelization, the DSP count of a baseband processing unit is dynamically expanded or shrunk according to the load of each board's DSPs and the data demand, achieving parallelism across multiple DSPs within the baseband processing resource pool and reducing power consumption through load aggregation.
Step 2: distribute the uplink tasks and downlink tasks across the DSP cores, to achieve parallelization at the task-level dimension.
Fig. 6(a) shows the uplink/downlink task assignment of an existing TD-LTE system: the uplink and the downlink are bound to different cores, that is, some cores of a DSP handle the uplink while the others handle the downlink. This method is simple to implement and easy to manage and maintain, and suits cases where the uplink/downlink processing complexity is not high. When the complexity grows, however, the cores responsible for one direction become overloaded and short of processing capacity while other cores sit idle, causing inter-core load imbalance.
Fig. 6(b) shows the uplink/downlink assignment of the invention: the uplink and the downlink are bound to the same DSP cores, i.e., each core carries both uplink and downlink tasks. When an uplink subframe is being processed, the downlink tasks of each core are suspended; when a downlink subframe is being processed, the uplink tasks of each core are suspended. In every subframe, therefore, all cores participate in either uplink or downlink processing. This balances the load across cores and lets the uplink or downlink tasks run in parallel on all cores, significantly improving execution efficiency.
Specifically, the task-level parallel method of the invention includes the following substeps:
Step S210: designate a master core and slave cores for each DSP included in the baseband processing unit.
For example, one core of the DSP is set as the master core and the remaining cores as slave cores, e.g., CORE0 as the master and CORE1-CORE3 as slaves. The master core is responsible for receiving all data from the upper layer or the bottom layer, undertakes part of the processing and the task-distribution work, and combines, schedules, and switches the different uplink and downlink tasks as needed. After each core finishes its processing, the master core aggregates and synchronizes the data and sends them on.
Step S220: assign downlink tasks to each core.
Taking the TD-LTE system as an example, the downlink tasks mainly complete the processing of reference signals, synchronization signals, and channels, e.g., the reference signal RS, the primary synchronization signal PSS, the secondary synchronization signal SSS, and the channels PBCH, PDSCH, PDCCH, PCFICH, and PHICH. In one embodiment, based on analysis and assessment of processing complexity, the downlink tasks are assigned to the cores as follows:
Master core CORE0 is responsible for the processing of PSS, SSS, RS, PBCH, and PHICH, plus part of the symbol-level IFFT;
Slave core CORE1 is responsible for the processing of PCFICH and PDCCH, plus part of the symbol-level IFFT;
Slave core CORE2 is responsible for the processing of PDSCH, plus part of the symbol-level IFFT;
Slave core CORE3 is responsible for the parallel processing of time-consuming downlink modules or algorithms, such as PDSCH resource mapping and part of the symbol-level IFFT.
After cores CORE1-CORE3 finish their processing, they send synchronization signals to master core CORE0 to indicate completion. Once the resource mapping of all channels and signals has been processed, OFDM processing is carried out. The most time-consuming module in OFDM is the IFFT, which a single core can hardly complete within the timing requirement, so all four cores process it in parallel. For example, after the synchronization signals sent by the slave cores are received, the four-core parallel IFFT is started; when all cores have finished, another synchronization takes place on CORE0, after which the aggregated data are sent.
Step S230: assign uplink tasks to each core.
The uplink tasks of a TD-LTE system mainly complete the processing of the PRACH, PUCCH, and PUSCH channels and the SRS and DMRS signals. The channels exist independently of one another, and among all channels PUSCH has the most complex processing flow and the longest processing time. The uplink tasks are assigned to the cores as follows:
Master core CORE0 is responsible for the processing of SRS and DMRS, plus part of the symbol-level FFT;
Slave core CORE1 is responsible for the processing of PRACH and PUCCH, plus part of the symbol-level FFT;
Slave core CORE2 is responsible for the processing of PUSCH, plus part of the symbol-level FFT;
Slave core CORE3 is responsible for the parallel processing of time-consuming uplink modules or algorithms, such as PUSCH resource demapping and part of the symbol-level FFT.
After master core CORE0 receives the baseband signal from the bottom layer, it removes the CP, moves the data into a shared memory region via EDMA (enhanced direct memory access), distributes tasks among the cores, and notifies the slave cores to start their corresponding tasks. Tasks are divided with the OFDM symbol as the basic processing unit; each core dynamically reads the data of the OFDM symbols assigned to it from the shared storage region, completes the FFT processing and the processing of its physical channels and signals, and then synchronizes with the other cores. After master core CORE0 aggregates and synchronizes the data, they are reported to the upper layer.
The hybrid uplink/downlink task-level parallel method proposed by the invention binds the uplink and downlink to the same cores and uses those cores by time-division multiplexing, improving resource utilization. In addition, the uplink and downlink are each divided into multiple subtasks, which are assigned to different cores according to the characteristics of the communication protocol; this adapts to high uplink/downlink processing complexity and avoids inter-core load imbalance.
Step 3: implement parallelization at the instruction/thread-level dimension inside each DSP.
The instruction/thread-level parallel method of the invention is a parallel processing method for each core inside a DSP, performing finer-grained parallel processing in combination with the DSP's hardware features.
According to one embodiment of the invention, the instruction/thread-level parallel method includes the following substeps:
Step S310: decompose the main task into multiple threads for parallel execution.
For example, the main task is decomposed with OpenMP, a set of compiler-directive schemes for multiprocessor programming on shared-memory parallel systems that provides a high-level abstract description of parallel algorithms. Specifically, for the many iterative loop operations present in communication algorithms (e.g., for loops) and for statements with little data dependence, OpenMP's fork-join execution model decomposes the main task into multiple threads for parallelization, as shown in Fig. 7, where fork creates threads or wakes existing threads and join is the convergence of the threads. When a fork-join program starts executing, only one active thread, called the master thread, exists. When the master thread encounters a region requiring parallel computation, it forks threads to execute the task in parallel. During parallel execution, the master thread and the forked threads work together; after the parallel task finishes, the forked threads exit or block and no longer work, and control flow returns to the single master thread.
Specifically, the programmer expresses intent by adding dedicated pragmas to the source code. For example, for the following piece of code computing PI, which must loop 100,000 times (num_steps = 100000), a pragma can specify simultaneous execution with 2 threads (#define NUMTHREADS 2); the compiler then parallelizes the program automatically and inserts synchronization, mutual exclusion, and communication where necessary.
Fig. 8 compares the execution speed of the same piece of code with different thread counts, where the abscissa is the thread count and the ordinate the execution time (in DSP cycles; a larger value means more time-consuming). Curve 1 corresponds to the case of 10 loop iterations, curve 2 to 40 iterations. In both cases, execution is fastest with 2 threads. Evidently, when parallelizing with OpenMP, more threads are not always better, because more threads also bring more inter-core interaction overhead. In a preferred embodiment, the thread count is set equal to the number of cores of the DSP, which keeps the overhead low while maintaining relatively high operational efficiency.
Step S320: schedule loop instructions so that different iterations execute in an overlapped manner.
In this step, software pipelining is arranged so that a new loop iteration starts before the previous iteration has completed, letting successive iterations of a loop within one core execute in parallel.
In a concrete implementation, the minimum iteration interval of a loop must be determined, namely the minimum number of cycles that must elapse between the starts of two adjacent iterations. The smaller the iteration interval, the fewer cycles the loop takes. Taking a fixed-point dot product on a TI DSP as an example: without pipelining, 1200 iterations take 19224 cycles; after software pipelining, 1200 iterations take 696 cycles, an efficiency improvement of more than 27 times.
Arranging software pipelining in this way makes effective use of resources and improves operating efficiency.
Step S330: link multiple instructions together to increase instruction-level parallelism.
For example, multiple single instructions are linked together using very long instruction words (VLIW). The basic idea of VLIW is that the DSP processor assigns the compiler control over all functional units within one long instruction word, enabling the compiler to precisely schedule where each operation executes.
Fig. 9 shows the internal structure of the DSP according to the invention, composed of four parts: a fetch unit, a decode unit, execution units, and a storage area. For TI's C6000-series DSPs, for example, the processor has 8 execution units and can in theory execute 8 single instructions per cycle; these 8 single instructions are regarded as one fetch packet, and the fetch unit, decode unit, and execution units operate on one fetch packet at a time. VLIW treats multiple single instructions as one long instruction, and its working process can be regarded as the fetch, decode, and execution of each long instruction. Specifically, the C6618 DSP used in the present invention supports fetch packets of eight 32-bit single instructions, i.e., 256 bits in total, forming one very long instruction word in which each single instruction is a 32-bit opcode; a fetch packet can contain at most 8 single instructions, and each 256-bit instruction packet can be dispatched to the 8 execution units for parallel execution.
Using the VLIW structure exploits instruction-level parallelism and makes full use of the DSP's resources, thereby greatly increasing computation speed.
Step S340, while multiple operands of individual instructions are obtained, to improve operation efficiency.
SIMD (single instruction, multiple data) technology replicates multiple operands and packs them into one instruction over a group of registers. Relative to SISD (single instruction, single data), the use of SIMD in the present invention provides vector processing capability. For example, with SISD, after an addition instruction is decoded, the execution unit first accesses memory to obtain the first operand, accesses memory again to obtain the second operand, and only then performs the addition. With SIMD, after the addition instruction is decoded, several execution units can access memory simultaneously, obtaining all operands at once before performing the operation. For example, the SIMD instructions of the C6618 are extended to support 128-bit vector data; the QMPY32 instruction, for instance, can multiply the 4 32-bit values corresponding to two complex numbers.
In one embodiment, a mixed VLIW+SIMD structure is used, again as shown in Fig. 9: the fetch and memory-access stages use SIMD technology, so that multiple operands of a load or store instruction can be obtained simultaneously, while the decode and execute stages use the VLIW structure, so that a plurality of instructions execute in parallel. This approach improves the degree of parallelism at a finer granularity (i.e., the instruction level).
For example, in order to obtain more instruction-level concurrency, the instruction architecture of the multi-core DSP chip used in the present invention supports two-way SIMD operations on 16-bit data and four-way SIMD operations on 8-bit data. It can replicate multiple operands and execute all the instructions at once: multiple identical data items are packed into a 64-bit register and the same operation (addition, subtraction, multiplication, division, shift, logical operation, etc.) is applied to all of them simultaneously, thereby accelerating the operation speed. In a specific implementation, this mixed structure can be realized in linear assembly, where designating registers, writing custom parallel instructions, and similar techniques improve the scheduling efficiency of the compiler.
It should be noted that, although the steps are described above in a particular order, this does not mean that the steps must be executed in that particular order; in fact, some of these steps may be executed concurrently, or even in a different order, as long as the required functions can be realized. In addition, the idea of the present invention is also applicable to other mobile communication standards.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement aspects of the present invention.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may include, for example, but is not limited to, an electric storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
The embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or improvements over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A method for allocating baseband processing resources, wherein the baseband processing is executed by a plurality of multi-core DSPs, the method comprising the following steps:
Step 1: determining, according to the amount of data to be processed, the number of DSPs comprised in a baseband processing unit;
Step 2: selecting the DSPs comprised in the baseband processing unit according to the load condition of each DSP and the determined number of DSPs.
2. The method according to claim 1, further comprising:
when the load of one DSP in the baseband processing unit is higher than a first threshold, performing load transfer among the DSPs comprised in the baseband processing unit, so as to achieve load balancing among the DSPs comprised in the baseband processing unit.
3. The method according to claim 1, further comprising:
when the loads of a plurality of DSPs in the baseband processing unit are all below a second threshold, performing load migration among the DSPs comprised in the baseband processing unit, so as to aggregate the load on DSPs with a relatively low load level onto DSPs with a relatively high load level.
4. The method according to any one of claims 1 to 3, further comprising a step 3, the step 3 comprising the following sub-steps:
Step 31: dividing each DSP of the baseband processing unit into a main core and a plurality of slave cores;
Step 32: decomposing the uplink task and the downlink task of the baseband processing into a plurality of uplink subtasks and a plurality of downlink subtasks, respectively;
Step 33: distributing, based on computational complexity, the uplink subtasks and the downlink subtasks to the main core and the slave cores of each DSP of the baseband processing unit, wherein at least one uplink subtask and at least one downlink subtask are distributed to the same core.
5. The method according to claim 4, further comprising:
during task execution by a DSP, decomposing iterative loop operations and weakly correlated statements into a plurality of threads for parallel execution.
6. The method according to claim 5, wherein the number of decomposed threads is equal to the number of cores of the DSP executing the task.
7. The method according to claim 5, further comprising:
for iterative loop operations, scheduling the loop instructions such that a new iteration is started before the preceding iteration has completed.
8. The method according to claim 5, further comprising:
during task execution by a DSP, linking a plurality of individual instructions together using a very long instruction word.
9. The method according to claim 5, further comprising:
reading or writing simultaneously the multiple operands corresponding to one individual instruction.
10. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
11. A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 9.
CN201711282563.4A 2017-12-07 2017-12-07 Baseband processing resource allocation method Active CN109905898B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711282563.4A CN109905898B (en) 2017-12-07 2017-12-07 Baseband processing resource allocation method


Publications (2)

Publication Number Publication Date
CN109905898A true CN109905898A (en) 2019-06-18
CN109905898B CN109905898B (en) 2022-10-11

Family

ID=66938907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711282563.4A Active CN109905898B (en) 2017-12-07 2017-12-07 Baseband processing resource allocation method

Country Status (1)

Country Link
CN (1) CN109905898B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112243266A (en) * 2019-07-18 2021-01-19 大唐联仪科技有限公司 Data packaging method and device
CN112653638A (en) * 2020-12-14 2021-04-13 中科院计算技术研究所南京移动通信与计算创新研究院 Device for switching routes of multiple paths of intermediate frequencies and baseband at high speed and communication method thereof
WO2021089114A1 (en) * 2019-11-04 2021-05-14 NEC Laboratories Europe GmbH Autonomous virtual radio access network control
CN113038607A (en) * 2019-12-24 2021-06-25 大唐移动通信设备有限公司 Channel processing method, device and base station

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1984395A (en) * 2005-12-12 2007-06-20 大唐移动通信设备有限公司 Baseband processor based on multiple kernel construction processor
CN102681902A (en) * 2012-05-15 2012-09-19 浙江大学 Load balancing method based on task distribution of multicore system
CN105045658A (en) * 2015-07-02 2015-11-11 西安电子科技大学 Method for realizing dynamic dispatching and distribution of tasks by a multi-core embedded DSP (Digital Signal Processor)
CN105915462A (en) * 2016-06-03 2016-08-31 中国航天科技集团公司第九研究院第七七研究所 Symmetrical RSS circuit facing TCP session
US20160316485A1 (en) * 2015-04-21 2016-10-27 Anoop Kumar Traffic scheduling system for wireless communication system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PAN Bichao et al.: "Design and Implementation of a Communication Module for the ESP Dual-Core Control Architecture", Computer and Modernization *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112243266A (en) * 2019-07-18 2021-01-19 大唐联仪科技有限公司 Data packaging method and device
CN112243266B (en) * 2019-07-18 2024-04-19 大唐联仪科技有限公司 Data packet method and device
WO2021089114A1 (en) * 2019-11-04 2021-05-14 NEC Laboratories Europe GmbH Autonomous virtual radio access network control
CN113038607A (en) * 2019-12-24 2021-06-25 大唐移动通信设备有限公司 Channel processing method, device and base station
CN113038607B (en) * 2019-12-24 2022-11-15 大唐移动通信设备有限公司 Channel processing method, device and base station
CN112653638A (en) * 2020-12-14 2021-04-13 中科院计算技术研究所南京移动通信与计算创新研究院 Device for switching routes of multiple paths of intermediate frequencies and baseband at high speed and communication method thereof

Also Published As

Publication number Publication date
CN109905898B (en) 2022-10-11

Similar Documents

Publication Publication Date Title
CN109905898A (en) Baseband processing resource distribution method
CN102360309B (en) Scheduling system and scheduling execution method of multi-core heterogeneous system on chip
CN106095583B (en) Principal and subordinate's nuclear coordination calculation and programming frame based on new martial prowess processor
Meng et al. Dedas: Online task dispatching and scheduling with bandwidth constraint in edge computing
CN108363615B (en) Method for allocating tasks and system for reconfigurable processing system
CN103970580B (en) A kind of data flow towards multinuclear cluster compiles optimization method
CN105159762B (en) Heuristic cloud computing method for scheduling task based on Greedy strategy
CN104536937B (en) Big data all-in-one machine realization method based on CPU GPU isomeric groups
CN104331321B (en) Cloud computing task scheduling method based on tabu search and load balancing
CN103279390B (en) A kind of parallel processing system (PPS) towards little optimization of job
CN109992407B (en) YARN cluster GPU resource scheduling method, device and medium
CN101366004A (en) Methods and apparatus for multi-core processing with dedicated thread management
CN103809936A (en) System and method for allocating memory of differing properties to shared data objects
CN103699432B (en) Multi-task runtime collaborative scheduling system under heterogeneous environment
CN103279445A (en) Computing method and super-computing system for computing task
CN103793255B (en) Starting method for configurable multi-main-mode multi-OS-inner-core real-time operating system structure
CN102193779A (en) MPSoC (multi-processor system-on-chip)-oriented multithread scheduling method
CN102135949A (en) Computing network system, method and device based on graphic processing unit
CN101464965B (en) Multi-nuclear parallel ant group design method based on TBB
Zheng et al. Architecture-based design and optimization of genetic algorithms on multi-and many-core systems
CN104090826B (en) Task optimization deployment method based on correlation
CN111158790B (en) FPGA virtualization method for cloud deep learning reasoning
WO2012152948A1 (en) Microcomputer for low power efficient baseband processing
Shafique et al. Minority-game-based resource allocation for run-time reconfigurable multi-core processors
CN104965762B (en) A kind of scheduling system towards hybrid task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant