CN109656861A - Multi-core parallel signal processing system and method based on SRIO bus - Google Patents


Info

Publication number
CN109656861A
Authority
CN
China
Legal status
Pending
Application number
CN201811230162.9A
Other languages
Chinese (zh)
Inventor
杨经纬
黄勇
唐琳
陈曦
李爽爽
李灿乐
Current Assignee
Shanghai Radio Equipment Research Institute
Original Assignee
Shanghai Radio Equipment Research Institute
Application filed by Shanghai Radio Equipment Research Institute
Priority to CN201811230162.9A
Publication of CN109656861A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4208 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a system bus, e.g. VME bus, Futurebus, Multibus
    • G06F13/4213 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being a system bus, e.g. VME bus, Futurebus, Multibus with asynchronous protocol

Abstract

The invention discloses a multi-core parallel signal processing system and method based on the SRIO bus, covering both the hardware architecture and the software architecture. An SRIO high-speed switch chip provides high-speed interconnection between the DSPs. An asynchronous message mechanism maximizes the computational throughput of each DSP core, which effectively improves the overall efficiency of the system and the degree of parallelism of the computation. The programming model of the invention is based on asynchronous messages and lets developers map parallel signal processing algorithms onto DSP processing units efficiently and quickly. The invention can be used in fields such as radar and digital image processing.

Description

Multi-core parallel signal processing system and method based on SRIO bus
Technical field
The invention belongs to the fields of digital signal processing and parallel computing. In particular, it relates to a multi-core parallel signal processing system and method based on the SRIO bus.
Background art
As signal processing precision and complexity increase, multi-core and many-core signal processing systems have become widely used. An existing signal processing algorithm is parallelized into a program that runs simultaneously on multiple DSP (digital signal processor) cores. This accelerates the algorithm and improves the real-time performance of the related application. Cores distribute and collect data through shared memory or, for cores on different processors, through a high-speed bus. The acceleration a multi-core system can achieve therefore depends not only on the speed of each processor core but also on the inter-core communication rate. For systems built from several multi-core DSPs, the present invention proposes a parallel signal processing framework based on a high-speed communication bus such as SRIO, which can serve as the architecture of various signal processing systems. The high-speed bus is not limited to SRIO (Serial RapidIO). HyperLink, PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard), GbE (Gigabit Ethernet) and similar buses can be used instead.
One prior-art scheme proposes a signal processing system architecture for a specific application. It is built from one large Xilinx Virtex-6 FPGA and four TI KeyStone-series TMS320C6678 high-performance multi-core DSPs. Devices communicate over several high-speed serial interfaces such as SRIO, GTX, PCIe or HyperLink. The system can be used in radar signal processing applications such as SAR/ISAR imaging, pulse detection, and SAR image matching and recognition. However, the scheme only describes a specific hardware framework. It does not address the corresponding software architecture, does not discuss the specific interconnection, and remains fairly general.
The field also includes a design method for a modem hardware abstraction layer on a hardware platform with an SRIO bus architecture, in particular an implementation of a DSP-based modem hardware abstraction layer (hereinafter MHAL). The method creates a series of key tables: a mapping table from destination component LD to destination device PD, a registry of local components LD, and a mapping table from local component LD to component task semaphore. With these tables it separates the underlying DSP hardware from upper-layer application software, and it constrains software development and deployment with a standard framework to ensure portability, reusability and scalability. Its emphasis is on the use of the SRIO bus and the abstraction of component devices, and it says little about deploying upper-layer algorithms.
Researchers have also proposed a parallel computing system based on GbE and SRIO and a parallel computing method that uses it. An external input interface feeds the data to be processed through a data input module to the Master, which divides the data evenly among the available slaves. Each slave receives its share, divides it evenly among its own cores according to its core count, executes it, and returns the results to the Master. The Master merges the results from all slaves and outputs them through a data output module and an external output interface. In this method the data distribution is rather mechanical and lacks flexibility. Because it includes GbE, it is better suited to large-scale distributed signal processing systems composed of multiple independent subsystems.
Researchers have also surveyed the development and state of parallel processing technology. The survey discusses processing units, parallel processor network structures, parallel algorithms and task scheduling algorithms, with the emphasis on analyzing and organizing existing technology. It also compares ADI's ADSP21160 DSP with TI's TMS320C67x DSP, and it ends with a discussion of parallelizing the FFT algorithm. The work says little about concrete architectures or the practical implementation of algorithms, and it does not cover how inter-core or inter-processor communication is realized.
In addition, the prior art uses SRIO as the internal system interconnect in a multi-DSP parallel signal processing system whose core processor is the TMS320C6455, which has an integrated SRIO interface. That work analyzes the system design approach, gives the hardware structure, and verifies the bandwidth and reconfigurability of the system. Again, it does not discuss how algorithms are mapped onto the architecture, and it remains fairly general.
To address the shortcomings of the above methods, the present invention proposes an SRIO-based multi-core (multiprocessor) parallel signal processing system covering both hardware architecture and software architecture. Its main difference from the above methods is its use of an asynchronous message mechanism. This mechanism maximizes the computational throughput of each processor core, which effectively improves overall system efficiency and signal processing speed. On this basis, the invention describes a programming model based on asynchronous messages. With this model, developers can map parallel signal processing algorithms onto DSP processing units efficiently and quickly.
Summary of the invention
The object of the present invention is to provide a multi-core parallel signal processing system and method based on the SRIO bus or another high-speed serial bus, comprising a hardware architecture and an algorithm framework. The invention provides a complete multi-core signal processing scheme. The scheme makes full use of the computing and communication resources of existing DSPs and fully exploits the parallelism hidden in signal processing algorithms. It thereby optimizes the operating efficiency of the signal processing system and improves its real-time performance.
To achieve the above object, the invention discloses a multi-core parallel signal processing system based on the SRIO bus, comprising:
Several DSPs, each with a built-in SRIO communication module;
A high-speed interconnect switch chip that implements SRIO interconnection between any two DSPs; the SRIO ports of the switch chip are also brought out through a VPX interface;
A CPLD for power management and interface management of each processing unit;
An FPGA chip connected to the high-speed interconnect switch chip through an SRIO interface, so that the FPGA chip and the DSPs are interconnected.
Preferably, there are four DSPs, each a TMS320C6678 digital signal processor. Each DSP has its own Flash storage unit and several interfaces. Each DSP contains 8 processor cores, and the processor cores use a very long instruction word (VLIW) architecture.
Preferably, the high-speed interconnect switch chip is a CPS1848. The CPS1848 provides 12 4X SRIO ports: 4 are connected to the four TMS320C6678 digital signal processors, 2 to the FPGA chip, and the remaining 6 to the VPX interface.
Preferably, the VPX interface carries one or more of the following: power, GPIO, interrupt, GbE and PCIe interface signals.
Preferably, the DSPs, the high-speed interconnect switch chip, the CPLD and the FPGA chip are all arranged on one board. The board also provides 2 RS422 interfaces and 32 GPIO signals to the VPX interface. The board carries temperature sensor chips, which the FPGA chip controls and reads.
Preferably, the multi-core parallel signal processing system further comprises:
Multiple parallel computation components into which the signal processing algorithm is divided; they exchange data with one another by message passing and together complete the signal processing computation, and they are distributed over different DSP cores;
Logical connections between the computation components;
A message distributor that runs on each DSP core and is called by the computation components; it checks for messages sent to the local core and promptly activates the corresponding computation component;
A thread pool containing multiple idle threads. When a computation component has received its data message and an idle thread exists, the component enters the ready state. The thread provides the running environment for the component, and component threads in the ready state are scheduled for execution.
Preferably, each computation component comprises the data to be processed and a corresponding processing function. The data to be processed is delivered as messages to the data buffer of the computation component, and a computation component can run only after its pending data has arrived in its data buffer;
after the computation finishes, the processing function sends the result as a message to the next-stage computation component.
Preferably, each computation component runs in its own thread space and supports preemption.
Preferably, the multi-core parallel signal processing system is based on an asynchronous message model, so a computation component can call other computation components during its computation. While waiting for another component to return a result, the calling component does not block. Instead, it registers a computation component that will handle the returned result and then exits. When the other component's result returns, the message distributor activates the registered component to complete the remaining computation. Until the result returns, the current DSP core is free. The message distributor can then schedule other computation components whose data messages have arrived or that are otherwise ready to run, which makes full use of the DSP computing resources.
The invention also provides a multi-core parallel signal processing method that uses the SRIO-based multi-core parallel signal processing system described above. The method comprises the following steps:
The multi-core parallel signal processing system starts, and the DSPs load their respective boot programs in parallel;
After the boot programs start executing, the system performs initial configuration as required;
All DSPs establish physical connections with the high-speed interconnect switch chip, which enables data transfer between DSPs;
The algorithm is developed for parallelism. Using communication primitives as cut points, the algorithm code is divided into multiple relatively independent sub-blocks, and computation components are built from the sub-blocks. The sub-blocks are then mapped onto DSP cores according to their demand for DSP computing power and the communication relationships between them;
The computation components assigned to the same processor core are compiled and linked into an independent process;
After all processes have started, logical connections are established and message buffers are allocated according to the correspondence between computation components, and each computation component is then registered to its message channel;
When a computation component finishes its computation, it sends the result to the next-stage computation component through a unified, abstract message-sending function, and this repeats until the computation is complete.
Compared with the prior art, the invention has the following beneficial effects:
(1) The hardware architecture consists of multi-core DSPs. To improve overall system performance, the DSPs are interconnected over the high-speed serial bus SRIO, which provides the bandwidth and latency required for communication between the DSP processors.
(2) The communication primitives between cores of the same processor should be consistent with those between cores of different processors. To achieve this, the invention abstracts the inter-core communication interface into a unified interface that can adapt flexibly to different underlying communication media. The SRIO-based signal processing method proposed here can therefore also be used in multiprocessor systems interconnected by HyperLink, PCIe, GbE and similar buses.
(3) The upper-layer algorithms are based on an asynchronous message mechanism in which messages drive the computation components. Each computation component consists of the data to be processed and the corresponding processing method. It completes its signal processing task in a closed environment and sends the result to the next-stage processing component.
(4) The invention makes full use of the computing and communication resources of existing DSPs and fully exploits the parallelism hidden in signal processing algorithms. It thereby optimizes the operating efficiency of the signal processing system and improves its real-time performance.
Description of the drawings
Fig. 1 is a hardware block diagram of the SRIO-based multi-core parallel signal processing system of the invention;
Fig. 2 is a software architecture block diagram of the SRIO-based multi-core parallel signal processing method of the invention.
Detailed description of the embodiments
The invention discloses a multi-core parallel signal processing system and method based on the SRIO bus. To make the invention clearer and easier to understand, it is described further below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the SRIO-based multi-core parallel signal processing system of the invention comprises a board carrying multiple multi-core DSPs. The board holds 4 DSPs, one CPS1848 high-speed interconnect switch chip, one VPX interface, one CPLD and one FPGA chip.
Each DSP contains 8 processor cores. The cores use a VLIW (very long instruction word) architecture, and each core can in theory execute up to 8 instructions in parallel.
The CPS1848 high-speed interconnect switch chip interconnects the DSPs through their built-in SRIO communication modules.
The VPX interface mainly carries power, GPIO (General Purpose Input/Output), interrupt, GbE, PCIe and other interface signals. The CPS1848 also brings SRIO ports out through the VPX interface, which enables board-to-board interconnection.
The CPLD (Complex Programmable Logic Device, a digital integrated circuit whose logic function the user configures as needed) is mainly used for power management and interface management of each processing unit on the board. It implements the board's power-up sequencing control and its low-speed interface functions.
The FPGA chip is mainly used to build acceleration modules and to perform interface conversion. It is also connected to the CPS1848 through an SRIO interface, so the FPGA chip and the DSPs are interconnected.
The following example uses the TMS320C6678, TI's multi-core fixed- and floating-point DSP based on the KeyStone architecture. It describes the complete operating scenario of the system, covering hardware startup, peripheral configuration, component mapping, establishment of communication connections, sending and receiving of data, and message scheduling.
In this embodiment, each DSP has SRIO, HyperLink, PCIe and SGMII (or GbE) interfaces and runs its cores at 1.0 GHz. Each DSP also has 2 GB of DDR3 SDRAM and 1 Gb of program Flash. The SGMII interfaces of all TMS320C6678 processors on the board are linked in a daisy chain, and one SGMII link is brought off the board.
The CPS1848 provides 12 4X SRIO ports: 4 are connected to the four TMS320C6678 processors, 2 to the FPGA chip, and the remaining 6 to the VPX interface. Boards can therefore also be interconnected over SRIO.
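A small sketch can check the switch port budget. For illustration only, it assumes that the switch ports are numbered 0 to 11 and are assigned in the order DSP, FPGA, VPX. The text does not give the actual CPS1848 port numbering on the board, and `build_port_plan` is a hypothetical helper.

```python
from collections import Counter

PORT_COUNT = 12  # 4X SRIO ports on the CPS1848, per the text

def build_port_plan(n_dsp=4, n_fpga=2, n_vpx=6):
    """Map switch port number -> attached endpoint (hypothetical port order)."""
    if n_dsp + n_fpga + n_vpx != PORT_COUNT:
        raise ValueError(f"endpoints must use exactly {PORT_COUNT} ports")
    plan = {}
    port = 0
    for kind, count in (("DSP", n_dsp), ("FPGA", n_fpga), ("VPX", n_vpx)):
        for idx in range(count):
            plan[port] = f"{kind}{idx}"
            port += 1
    return plan

plan = build_port_plan()
# Count endpoints per kind by stripping the trailing index from each label.
usage = Counter(label.rstrip("0123456789") for label in plan.values())
print(plan)
print(dict(usage))
```

The 6 VPX ports are what let several such boards join one SRIO fabric.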
In addition, each board provides 2 RS422 interfaces and 32 GPIO signals to the VPX connector. For reliability, temperature sensor chips are placed near several processing units on the board, and the FPGA chip controls and reads them.
As shown in Fig. 2, the invention also provides an upper-layer software architecture, described below.
The software architecture comprises computation components, logical connections between computation components, message distributors, and process and thread entities. The signal processing algorithm is divided into multiple parallel computation components. These components exchange data by message passing and together complete the signal processing computation.
The architecture abstracts the primitives of the underlying communication medium, and all computation components exchange data through a unified message interface. Computation components are not tied one-to-one to DSP cores. Several components can be bound to each DSP core, and the components are distributed across different DSP cores. Messages between components drive the computation forward until it completes. Components on the same DSP communicate through on-chip SRAM (Static Random-Access Memory, for example L2 or the on-chip shared memory MSM). Components on different DSPs communicate over SRIO. The underlying communication medium is transparent to the computation components.
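The following minimal sketch shows how a unified send interface can hide the medium from components. It assumes that a (DSP, core) pair identifies a component's location. `Location`, `select_medium` and `Messenger` are hypothetical names. In place of the driver calls a real system would make, the sketch simply records each message.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    dsp: int   # which DSP on the board (0..3 in this embodiment)
    core: int  # which core on that DSP (0..7)

def select_medium(src: Location, dst: Location) -> str:
    """Choose the underlying medium for a message (hidden from components)."""
    if src.dsp == dst.dsp:
        # Same chip: pass the message through on-chip SRAM (L2 or shared MSM).
        return "on-chip SRAM"
    # Different chips: route the message through the SRIO switch.
    return "SRIO"

class Messenger:
    """Hypothetical unified send interface used by every component."""

    def __init__(self) -> None:
        self.sent = []  # (medium, src, dst, payload) records, for inspection

    def send(self, src: Location, dst: Location, payload) -> None:
        medium = select_medium(src, dst)
        # A real system would call the SRAM or SRIO transfer routine here.
        self.sent.append((medium, src, dst, payload))

bus = Messenger()
bus.send(Location(0, 1), Location(0, 5), "range profile")
bus.send(Location(0, 1), Location(2, 0), "range profile")
print([record[0] for record in bus.sent])
```

The caller of `send` never names the medium, so moving a component to another DSP changes only its `Location`, not its code.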
Each computation component contains the data to be processed and a corresponding processing function. The data is delivered as messages to the component's data buffer, and message transfer is based on SRIO, DDR or on-chip SRAM (L2 or MSM). A computation component can start running only after its pending data has arrived. When the computation finishes, the processing function sends the result as a message to the next-stage computation component. Each computation component runs in its own thread space and supports preemption.
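The data-driven component described above can be sketched as follows. The sketch is an illustrative model only, not the patent's implementation. `Component`, `deliver` and `inputs_needed` are hypothetical names, and the sketch leaves out threads and preemption.

```python
from typing import Any, Callable, List, Optional

class Component:
    """A computation component: a data buffer plus its processing function."""

    def __init__(self, name: str, process: Callable[..., Any],
                 inputs_needed: int = 1,
                 next_stage: Optional["Component"] = None) -> None:
        self.name = name
        self.process = process
        self.inputs_needed = inputs_needed  # messages required before running
        self.buffer: List[Any] = []         # the component's data buffer
        self.next_stage = next_stage

    def deliver(self, data: Any) -> bool:
        """A message arrives: store it and report whether the component can run."""
        self.buffer.append(data)
        return self.ready()

    def ready(self) -> bool:
        # Precondition for running: all pending data has arrived.
        return len(self.buffer) >= self.inputs_needed

    def run(self) -> Any:
        if not self.ready():
            raise RuntimeError(f"{self.name}: input data has not arrived")
        inputs = self.buffer[:self.inputs_needed]
        del self.buffer[:self.inputs_needed]
        result = self.process(*inputs)
        # Forward the result as a message to the next-stage component.
        if self.next_stage is not None:
            self.next_stage.deliver(result)
        return result

# Two-stage example: scale the samples, then compute their energy.
energy = Component("energy", lambda xs: sum(x * x for x in xs))
scale = Component("scale", lambda xs: [2 * x for x in xs], next_stage=energy)
scale.deliver([1, 2, 3])
scale.run()
total = energy.run()
print(total)
```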
A message distributor runs on each DSP core and is called by the computation components. It checks for messages sent to the local core and promptly activates the corresponding computation component.
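Below is a minimal per-core message distributor. It assumes that each message carries a channel name and that each channel has at most one registered component. The class and method names are hypothetical. In this sketch, a message on an unregistered channel is simply dropped, a case the text does not specify.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

class MessageDistributor:
    """Per-core distributor: maps message channels to registered components."""

    def __init__(self) -> None:
        self.channels: Dict[str, Callable[[Any], Any]] = {}
        self.inbox: Deque[Tuple[str, Any]] = deque()  # messages for this core

    def register(self, channel: str, component: Callable[[Any], Any]) -> None:
        self.channels[channel] = component

    def post(self, channel: str, payload: Any) -> None:
        """Called by the transport layer when a message reaches this core."""
        self.inbox.append((channel, payload))

    def poll(self) -> List[Any]:
        """Check arrived messages and activate the component on each channel."""
        results = []
        while self.inbox:
            channel, payload = self.inbox.popleft()
            component = self.channels.get(channel)
            if component is None:
                continue  # no receiver registered; this sketch drops the message
            results.append(component(payload))
        return results

dist = MessageDistributor()
dist.register("pulse", lambda samples: max(samples))
dist.register("doppler", lambda bins: len(bins))
dist.post("pulse", [3, 9, 4])
dist.post("doppler", [0.1, 0.2])
dist.post("unknown", None)
activated = dist.poll()
print(activated)
```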
The invention also provides a thread pool containing multiple idle threads. When a computation component receives a data message and the system has an idle thread, the component enters the ready state. Only component threads in the ready state can be scheduled for execution.
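The ready-state rule can be modeled with a counter of idle threads. This sketch assumes a fixed-size pool and a single-threaded simulation in which "running" a component just means calling it. `ThreadPool`, `activate` and `run_next` are hypothetical names.

```python
from collections import deque
from typing import Any, Callable, Deque, Optional

class ThreadPool:
    """Fixed set of threads; an activated component needs one to become ready."""

    def __init__(self, n_threads: int) -> None:
        self.idle = n_threads
        self.ready: Deque[Callable[[], Any]] = deque()    # bound to a thread
        self.pending: Deque[Callable[[], Any]] = deque()  # waiting for a thread

    def activate(self, component: Callable[[], Any]) -> None:
        """Called when a component's data message has arrived."""
        if self.idle > 0:
            self.idle -= 1
            self.ready.append(component)  # enters the ready state
        else:
            self.pending.append(component)

    def run_next(self) -> Optional[Any]:
        """Schedule one ready component; only ready components may execute."""
        if not self.ready:
            return None
        component = self.ready.popleft()
        result = component()
        self.idle += 1  # the thread returns to the pool
        if self.pending:
            self.activate(self.pending.popleft())
        return result

pool = ThreadPool(n_threads=1)
pool.activate(lambda: "detect")
pool.activate(lambda: "track")  # no idle thread left: waits in pending
first = pool.run_next()
second = pool.run_next()
print(first, second)
```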
A complete computation may be divided into many computation components distributed across different DSP cores, and messages between the components drive the computation forward until it completes. Because this embodiment uses an asynchronous message model, a computation component may need to call other computation components during its computation. While waiting for their results, it does not block. Instead, it registers a computation component that will handle the returned result and then exits. When the other components' results return, the message distributor activates the registered component to complete the remaining computation. Until the result returns, the current DSP core is free. The message distributor can then schedule other computation components whose data messages have arrived or that are otherwise ready to run, which makes full use of the DSP computing resources.
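The non-blocking call pattern resembles continuation passing on an event loop. For illustration, the sketch below simulates one DSP core as a message queue. The caller posts a request together with a handler for the result and returns immediately, so unrelated work runs before the result comes back. All names are hypothetical.

```python
from collections import deque

class Core:
    """One DSP core running components to completion from a message queue."""

    def __init__(self) -> None:
        self.queue = deque()  # (component, args) messages awaiting dispatch
        self.trace = []

    def post(self, component, *args) -> None:
        self.queue.append((component, args))

    def run(self) -> None:
        while self.queue:
            component, args = self.queue.popleft()
            component(*args)

core = Core()

def magnitude_service(samples, reply_to):
    # The "other" component: computes a result and sends it back as a message.
    core.trace.append("service")
    core.post(reply_to, [abs(s) for s in samples])

def caller_start(samples):
    core.trace.append("caller: request sent")
    # Register caller_finish as the handler for the result, then exit
    # instead of blocking while the service computes.
    core.post(magnitude_service, samples, caller_finish)

def caller_finish(magnitudes):
    core.trace.append(f"caller: finished with {sum(magnitudes)}")

def unrelated_work():
    core.trace.append("unrelated")

core.post(caller_start, [-1, 2, -3])
core.post(unrelated_work)
core.run()
print(core.trace)
```

Splitting the caller into `caller_start` and `caller_finish` lets `unrelated_work` run in the gap instead of leaving the core idle.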
In one embodiment, each DSP has its own Flash storage unit, as shown in Fig. 1. After the processing system starts, the 4 DSPs therefore load their respective boot programs in parallel. Before computation tasks start, the boot program performs a series of hardware and software initializations. Depending on the configuration, a DSP can boot in several ways: from Flash connected over SPI, from Flash connected over EMIF, or over Ethernet, PCIe, SRIO, HyperLink, IIC and other interfaces. After the boot program starts executing, it initializes the following as the configuration requires: the system PLL (Phase Locked Loop), the UART (Universal Asynchronous Receiver/Transmitter), the clock counter, the DDR memory controller, GPIO, IIC, SRIO and the CPS1848 switch chip.
In this embodiment, the reference clock of the TMS320C6678 digital signal processor is 100 MHz. After the multiplier and divider registers of the core PLL are configured, the final core frequency is 1.0 GHz. The boot program also configures the dividers of the other configurable PLLs in the TMS320C6678 internal clock tree to supply operating clocks to the on-chip peripherals. As shown in Fig. 1, the TMS320C6678 memory uses DDR3 chips with a 64-bit bus and a 100 MHz reference clock. After multiplier and divider configuration, the equivalent rate is 1333 MHz (DDR3-1333). The UART is configured for 115200 baud, 8 data bits, 1 stop bit and no parity, and is mainly used here for terminal input and output. Some GPIOs are configured as interrupt inputs and others as control outputs, which reset external chips. IIC is mainly used to configure the CPS1848, with the bus frequency set to 400 kHz. Through the IIC interface, the TMS320C6678 (the master DSP only) configures the CPS1848 SRIO ports: port speed, port width, port flow control, port routing and port enables.
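The clock and interface figures above reduce to simple arithmetic. This sketch computes three values: the overall core PLL ratio, the theoretical peak DDR3 bandwidth for a 64-bit bus at 1333 MT/s, and the UART byte rate for 8N1 framing. The text does not give the actual multiplier and divider register values, so only their net ratio is derived. Real DDR throughput is also lower than the theoretical peak.

```python
REF_CLK_HZ = 100_000_000      # 100 MHz reference clock
CORE_CLK_HZ = 1_000_000_000   # 1.0 GHz target core clock
core_pll_ratio = CORE_CLK_HZ // REF_CLK_HZ  # net multiply/divide ratio

DDR_BUS_BITS = 64
DDR_RATE_MT_S = 1333          # DDR3-1333: mega-transfers per second
ddr_peak_mb_s = DDR_BUS_BITS // 8 * DDR_RATE_MT_S  # MB/s, theoretical peak

UART_BAUD = 115_200
UART_FRAME_BITS = 1 + 8 + 0 + 1  # start + 8 data + no parity + 1 stop
uart_bytes_per_s = UART_BAUD // UART_FRAME_BITS

print(core_pll_ratio, ddr_peak_mb_s, uart_bytes_per_s)  # prints: 10 10664 11520
```

At about 11.5 KB/s, the UART is suitable only for the console role described above. Bulk data goes over SRIO and DDR.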
In this embodiment, the SRIO ports are configured in 4X mode at 3.125 Gbaud per lane. Finally, the boot program configures the SRIO module inside each TMS320C6678 and assigns each DSP a fixed device ID, which the CPS1848 uses for port routing. At this point all DSPs have physical connections to the CPS1848, and data can be transferred between DSPs by device ID.
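The following is a rough link budget for one 4X port. It assumes that the 3.125 figure is the per-lane symbol rate in Gbaud and that 8b/10b line coding applies, as it does for Serial RapidIO at this rate. The result is an upper bound, before packet-header and protocol overhead. The device IDs and the route table are hypothetical placeholders for the fixed IDs mentioned above.

```python
LANE_BAUD = 3_125_000_000  # 3.125 Gbaud per lane
LANES = 4                  # 4X port
# 8b/10b line coding carries 8 data bits per 10 line bits.
raw_bps = LANE_BAUD * LANES
data_bps = raw_bps * 8 // 10

# Hypothetical device IDs for the four DSPs and the switch ports they sit on.
ROUTE_TABLE = {0x10: 0, 0x11: 1, 0x12: 2, 0x13: 3}

def egress_port(dest_id: int) -> int:
    """Look up the switch port that leads to a destination device ID."""
    try:
        return ROUTE_TABLE[dest_id]
    except KeyError:
        raise ValueError(f"no route for device ID {dest_id:#x}") from None

print(raw_bps, data_bps, egress_port(0x12))  # prints: 12500000000 10000000000 2
```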
Once the hardware configuration is complete, the system configures the software environment in two stages: design-time configuration and run-time configuration. In the design stage, the programmer develops the algorithm for parallelism using information along several dimensions, such as DSP performance, the DSP interconnect topology, SRIO bandwidth, data processing scale and data processing latency. Using communication primitives as cut points, the programmer divides the algorithm code into multiple relatively independent sub-blocks and builds computation components from them. The sub-blocks are then mapped onto DSP cores according to their demand for DSP computing power and the communication relationships between them. The mapping follows these principles:
(1) Based on the communication relationships between sub-blocks and the goal of maximizing computational parallelism, the set of computation components of the algorithm is divided into multiple subsets. Components within one subset place higher demands on bandwidth, latency and frequency of communication with each other than components in different subsets do;
(2) Where computing power permits, the components of one subset are assigned to the same DSP core as far as possible. If their computing-power demand exceeds the capacity of one DSP core, the next best option is to assign them to the same DSP as far as possible. If the remaining cores on that DSP still cannot meet the subset's demand, the components are assigned to two or more DSPs with the lowest SRIO communication latency between them;
(3) Once the components of one subset are assigned, the next subset is allocated. If a DSP core used in an earlier allocation still has enough spare computing power for this subset, the subset goes to that core first. Otherwise, the subset goes to a new DSP core or processor, following the same principle as (2);
(4) During allocation, if the computing-power demand of the current subset exceeds what one DSP core or processor can provide, the excess components can be assigned to other DSP cores or processors with spare computing power, following the same principle as (2);
(5) Allocation continues in this way until all subsets are allocated, and the algorithm design process then ends.
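Principles (2) to (4) can be read as a greedy bin-packing pass. The sketch below is one interpretation, not the patent's algorithm. It assumes that the subsets are already formed according to principle (1) and that loads are integer percentages of one core's capacity. Because all four DSPs connect to a single CPS1848, it treats SRIO latency between any two DSPs as equal. `map_subsets` is a hypothetical name.

```python
from typing import Dict, List, Tuple

CoreId = Tuple[int, int]  # (dsp index, core index)

def map_subsets(subsets: List[List[Tuple[str, int]]], n_dsp: int = 4,
                cores_per_dsp: int = 8, capacity: int = 100) -> Dict[str, CoreId]:
    """Greedy placement of component subsets; loads are % of one core."""
    free = {(d, c): capacity for d in range(n_dsp) for c in range(cores_per_dsp)}
    placement: Dict[str, CoreId] = {}

    def put(name: str, load: int, core: CoreId) -> None:
        free[core] -= load
        placement[name] = core

    for subset in subsets:
        total = sum(load for _, load in subset)
        fitting = [k for k in free if free[k] >= total]
        if fitting:
            # Rules (2)/(3): whole subset on one core, preferring a core that
            # already carries work (least spare capacity that still fits).
            core = min(fitting, key=lambda k: (free[k], k))
            for name, load in subset:
                put(name, load, core)
            continue
        # Rule (2), fallback: keep the subset on the DSP with most spare power.
        spare = {d: sum(free[(d, c)] for c in range(cores_per_dsp))
                 for d in range(n_dsp)}
        home = max(range(n_dsp), key=lambda d: (spare[d], -d))
        for name, load in sorted(subset, key=lambda item: -item[1]):
            # Home-DSP cores first, then other DSPs for the overflow (rule (4)).
            order = sorted(free, key=lambda k: (k[0] != home, k))
            core = next((k for k in order if free[k] >= load), None)
            if core is None:
                raise RuntimeError(f"no core has spare capacity for {name}")
            put(name, load, core)
    return placement

subsets = [[("a", 30), ("b", 40)], [("c", 20)], [("d", 90), ("e", 90)]]
print(map_subsets(subsets))
```

In this example, subsets of `a`, `b` and `c` share one core, while `d` and `e` cannot share a core and are kept on the same DSP.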
On the basis of this design, the computation components assigned to the same processor core are compiled and linked into an independent process. During startup, the system therefore starts the corresponding process on each processor core according to the allocation mapping between processes and processor cores. Because this design introduces processes and threads, the software system must support them. Either an operating system can be used, or a custom runtime system that supports processes and threads can be implemented.
After all processes have started, the system establishes logical connections and allocates message buffers according to the correspondence between computation components, and it then registers each computation component to its message channel. This essentially completes the initialization of the software environment. When a computation component finishes its computation, it sends the result to the next-stage component through the unified, abstract message-sending function. The message distributor in the process that hosts the next-stage component detects the arriving message at an appropriate time. When a new message arrives on a message channel, the distributor activates the component registered to that channel. If the thread pool of the current process has an idle thread, the activated component runs in that thread's context: the thread enters the ready state and waits to be scheduled. The message distributor is responsible for checking message arrivals. It typically runs when a message send completes, when a thread yields the DSP, when interrupt handling finishes, or when the system is idle. If the application requires it, messages can be assigned priorities, and the distributor then schedules computation components according to message priority. A high-priority message can even preempt the computation component that handles a low-priority message.
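Priority-ordered dispatch can be sketched with a binary heap. The convention that a larger number means higher priority is an assumption. The sketch only orders dispatch; preempting a running low-priority component would need support from the operating system or runtime, which it does not model.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple

class PriorityDistributor:
    """Dispatches higher-priority messages first; FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, Any]] = []
        self._seq = itertools.count()  # arrival order, used as a tie-breaker

    def post(self, priority: int, channel: str, payload: Any) -> None:
        # heapq is a min-heap, so negate: a larger number means more urgent.
        heapq.heappush(self._heap, (-priority, next(self._seq), channel, payload))

    def next_message(self) -> Optional[Tuple[str, Any]]:
        if not self._heap:
            return None
        _, _, channel, payload = heapq.heappop(self._heap)
        return channel, payload

dist = PriorityDistributor()
dist.post(1, "bulk", "frame 1")
dist.post(5, "control", "stop")
dist.post(1, "bulk", "frame 2")
order = [dist.next_message() for _ in range(3)]
print(order)
```

The sequence counter keeps equal-priority messages in arrival order. It also ensures that the heap never has to compare payloads.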
An asynchronous message mechanism is used here; its main advantage is higher computational throughput and a greater degree of parallelism. In the traditional programming model, all the code of a task executes sequentially in a single thread context. If a section of code in the middle blocks while sending or receiving a message, all subsequent code stalls, even code that does not depend on that send or receive. This severely limits the program's parallelism.
In the present invention, the task's code is partitioned into multiple computation modules. If the computation module containing a given section of code blocks on a message send or receive, subsequent computation modules that do not depend on that operation can continue executing on the same DSP core without waiting for the message to arrive or the receive to finish. A subsequent computation module may even run on another DSP ahead of the blocked module, or in parallel with it. This effectively improves DSP core utilization and reduces idle time.
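The throughput argument above can be checked with a simple timing model. The Python sketch below assumes one non-preemptive DSP core, a module A that computes, waits for a remote reply, then computes again, and an independent module B. The time units and function names are hypothetical. In the traditional model B waits behind the blocked thread; in the split-module model B fills the wait.

```python
def run_blocking(a_pre, reply_at, a_post, b):
    """Traditional model: one thread runs A, blocks on the reply, then runs B."""
    t = a_pre                       # A's work before the message exchange
    idle = max(0, reply_at - t)     # core sits idle while the thread is blocked
    t = max(t, reply_at) + a_post   # A resumes once the reply arrives
    t += b                          # B could not start until A finished
    return t, idle


def run_split(a_pre, reply_at, a_post, b):
    """Module model: A is split at the receive, and independent B runs meanwhile."""
    t = a_pre + b                   # B does not depend on the reply, so it runs first
    idle = max(0, reply_at - t)     # idle only if B finishes before the reply
    t = max(t, reply_at) + a_post   # A's continuation is activated by the reply
    return t, idle


print(run_blocking(2, 10, 2, 5), run_split(2, 10, 2, 5))
```

With 2 units of work before the exchange, a reply arriving at t = 10, 2 units afterwards and 5 units for B, the blocking model finishes at t = 17 with 8 idle units, while the split model finishes at t = 12 with 3 idle units.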
Through partitioning of the algorithm code into blocks, mapping of modules, establishment of communication connections, sending and receiving of messages, and scheduling of computation modules (threads), a complete signal processing algorithm is mapped onto the DSP processing units and executed efficiently. The algorithm's parallelism is fully exploited, and overall system performance is effectively improved.
Although the present invention has been described in detail through the above preferred embodiments, it should be understood that the above description is not to be regarded as limiting the invention. Various modifications and substitutions will be apparent to those skilled in the art after reading the above. The scope of protection of the present invention should therefore be defined by the appended claims.

Claims (10)

1. A multi-core parallel signal processing system based on the SRIO bus, characterized by comprising:
several DSPs, each internally provided with an SRIO communication module;
a high-speed interconnect switch chip, which implements SRIO interconnection between any two DSPs; the high-speed interconnect switch chip brings SRIO interfaces out through a VPX interface to achieve board-level interconnection;
a CPLD for power management and interface management of each processing unit;
an FPGA chip, connected to the high-speed interconnect switch chip through an SRIO interface, so that the FPGA chip and the DSPs are interconnected.
2. The multi-core parallel signal processing system based on the SRIO bus according to claim 1, characterized in that:
the number of DSPs is four, and each DSP is a TMS320C6678 digital signal processor;
each DSP has an independent Flash storage unit, and each DSP includes several interfaces;
each DSP contains 8 processor cores, and the processor cores use a very long instruction word (VLIW) architecture.
3. The multi-core parallel signal processing system based on the SRIO bus according to claim 1 or 2, characterized in that:
the high-speed interconnect switch chip is a CPS1848 high-speed interconnect switch chip;
the CPS1848 provides 12 4x SRIO links, of which 4 are connected to the 4 TMS320C6678 digital signal processors, 2 are connected to the FPGA chip, and the remaining 6 are routed to the VPX interface.
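The link allocation in claim 3 amounts to a static port plan for the switch. As an illustration only, the Python sketch below assigns hypothetical port numbers to the 4 DSP, 2 FPGA and 6 VPX links and derives a destination-ID-to-port table of the kind a RapidIO switch uses to forward packets. Port numbers, endpoint names and destination IDs are assumptions, and programming the actual CPS1848 routing registers is not shown.

```python
# Hypothetical port plan for the 12 4x SRIO links of the switch.
# Port numbers, endpoint names and destination IDs are illustrative only.
PORT_PLAN = {}
PORT_PLAN.update({p: f"DSP{p}" for p in range(0, 4)})        # 4 links to the C6678 DSPs
PORT_PLAN.update({p: f"FPGA{p - 4}" for p in range(4, 6)})   # 2 links to the FPGA
PORT_PLAN.update({p: f"VPX{p - 6}" for p in range(6, 12)})   # 6 links to the VPX connector


def build_route_table(dest_ids):
    """Map each on-board endpoint's SRIO destination ID to its switch port."""
    port_of = {endpoint: port for port, endpoint in PORT_PLAN.items()}
    return {dest_id: port_of[endpoint] for endpoint, dest_id in dest_ids.items()}


# Hypothetical destination IDs for the on-board endpoints.
dest_ids = {"DSP0": 0x10, "DSP1": 0x11, "DSP2": 0x12, "DSP3": 0x13,
            "FPGA0": 0x20, "FPGA1": 0x21}
routes = build_route_table(dest_ids)
assert len(PORT_PLAN) == 12   # 4 + 2 + 6 links, as stated in claim 3
print({hex(k): v for k, v in routes.items()})
```

Off-board destinations reached through the VPX ports would need their own entries, which depend on the backplane and are not modelled.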
4. The multi-core parallel signal processing system based on the SRIO bus according to claim 1, characterized in that:
the VPX interface carries one or more of the following signals: power, GPIO, interrupt, GbE, and PCIe.
5. The multi-core parallel signal processing system based on the SRIO bus according to claim 1 or 4, characterized in that:
the DSPs, the high-speed interconnect switch chip, the CPLD, and the FPGA chip are arranged on one board;
the board also provides 2 RS422 interfaces and 32 GPIO signals to the VPX interface;
the board carries a temperature sensor chip whose reads and writes are controlled by the FPGA chip.
6. The multi-core parallel signal processing system based on the SRIO bus according to claim 1, characterized in that
it further comprises:
multiple parallel computation modules obtained by partitioning a signal processing algorithm, which exchange data with each other by message passing and together complete the signal processing computation; the computation modules are distributed across different DSP cores;
a logical connection module between the computation modules;
a message distributor, called by the computation modules and running on a DSP core, which checks for messages sent to the local core and promptly activates the corresponding computation modules;
a thread pool module containing multiple idle threads; when a computation module receives a data message and an idle thread is available, the module enters the ready state, the thread provides the runtime environment for the module, and the module thread in the ready state is scheduled for execution.
7. The multi-core parallel signal processing system based on the SRIO bus according to claim 6, characterized in that:
each computation module includes data to be processed and a corresponding processing function; the data to be processed are delivered by message to the module's data buffer, and the precondition for running a computation module is that the pending data have arrived in its data buffer;
after the calculation, the processing function sends the result to the next-stage computation module, again by message.
8. The multi-core parallel signal processing system based on the SRIO bus according to claim 6 or 7, characterized in that:
each computation module runs in an independent thread space and supports a preemption mechanism.
9. The multi-core parallel signal processing system based on the SRIO bus according to claim 6 or 7, characterized in that:
the multi-core parallel signal processing system is based on an asynchronous messaging model, so that a computation module can call other computation modules during its calculation without blocking while it waits for them to return results. Instead, it registers a computation module that will process the returned result and then exits. When the other modules' results return, the message distributor activates the registered module to complete the subsequent calculation. Until the result returns, the current DSP core is free, and the message distributor can schedule other computation modules that have data messages waiting or are otherwise ready to run, making full use of DSP computing resources.
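The non-blocking call in claim 9 is essentially continuation passing: the caller registers a result-handling module and exits instead of waiting. A minimal Python sketch, assuming a hypothetical FIFO `Distributor` for one core and illustrative channel names, shows an unrelated module running between the caller's exit and the arrival of the result.

```python
from collections import deque


class Distributor:
    """Hypothetical FIFO message distributor for one DSP core (simulation only)."""

    def __init__(self):
        self.handlers = {}     # channel -> computation module
        self.inbox = deque()   # (channel, payload) in arrival order
        self.trace = []        # order in which modules ran

    def register(self, channel, module):
        self.handlers[channel] = module

    def send(self, channel, payload):
        self.inbox.append((channel, payload))

    def run(self):
        while self.inbox:
            channel, payload = self.inbox.popleft()
            self.trace.append(channel)
            self.handlers[channel](payload)


d = Distributor()
results = []


def caller(x):
    # Instead of blocking on the result, register a continuation and exit.
    d.register("square.result", lambda r: d.send("report", x + r))
    d.send("square.request", x)


def square(x):
    d.send("square.result", x * x)   # the called module returns its result by message


def other(_msg):
    pass                             # unrelated ready work that runs in between


d.register("start", caller)
d.register("square.request", square)
d.register("other", other)
d.register("report", results.append)

d.send("start", 3)
d.send("other", "tick")
d.run()
print(d.trace, results)
```

The recorded trace is start, other, square.request, square.result, report: the unrelated module runs while the caller's result is still outstanding. Registering the continuation on a fixed channel is a simplification; concurrent callers would need per-request channels or tags.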
10. A multi-core parallel signal processing method using the multi-core parallel signal processing system based on the SRIO bus according to any one of claims 1 to 9, characterized in that the method comprises the following steps:
the multi-core parallel signal processing system starts, and the DSPs load their respective boot programs in parallel;
after the boot programs begin executing, the multi-core parallel signal processing system performs initial configuration as required;
all DSPs establish physical connections through the high-speed interconnect switch chip, enabling data transmission between DSPs;
the algorithm is parallelized: the algorithm code is divided into multiple relatively independent sub-blocks, using communication primitives as cut points; computation modules are built from the sub-blocks; and the sub-blocks of the algorithm are mapped onto DSP cores according to each sub-block's demand for DSP computing power and the relationships between sub-blocks;
the computation modules assigned to the same processor core are compiled and linked to form an independent process;
after all processes have started, logical connections are established according to the correspondence between computation modules, message buffers are allocated, and each computation module is then registered with its message channel;
after a computation module completes its calculation, it sends the result to the next-stage computation module through a unified abstract message-send function, and so on until the computation is complete.
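The partitioning and mapping steps of claim 10 can be sketched in Python as follows. The sketch assumes the algorithm is given as a flat list of operations in which `comm` entries mark communication primitives; it cuts the list at those points and then assigns sub-blocks to cores. The mapping heuristic (greedy longest-processing-time-first onto the least-loaded core) is an assumed stand-in, since the claim only states that mapping follows each sub-block's demand for computing power. Operation names and costs are hypothetical.

```python
import heapq


def split_at_comm(ops):
    """Cut an operation list into sub-blocks, using 'comm' ops as cut points.

    ops: list of (kind, name, cost); kind is "compute" or "comm".
    The comm op itself becomes a message between modules, so it is not kept.
    """
    blocks, current = [], []
    for kind, name, cost in ops:
        if kind == "comm":
            if current:
                blocks.append(current)
            current = []
        else:
            current.append((name, cost))
    if current:
        blocks.append(current)
    return blocks


def map_to_cores(blocks, n_cores):
    """Greedy LPT: the heaviest remaining block goes to the least-loaded core."""
    heap = [(0, core) for core in range(n_cores)]   # (current load, core id)
    cost = [sum(c for _, c in block) for block in blocks]
    mapping = {}
    for i in sorted(range(len(blocks)), key=lambda i: -cost[i]):
        load, core = heapq.heappop(heap)
        mapping[i] = core
        heapq.heappush(heap, (load + cost[i], core))
    return mapping


ops = [("compute", "window", 2), ("compute", "fft", 8), ("comm", "send_to_dsp1", 0),
       ("compute", "cfar", 5), ("comm", "recv_tracks", 0), ("compute", "track", 3)]
blocks = split_at_comm(ops)
mapping = map_to_cores(blocks, n_cores=2)
print(blocks, mapping)
```

With two cores, the heavy window/FFT block goes to core 0, and the lighter CFAR and tracking blocks share core 1.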
CN201811230162.9A 2018-10-22 2018-10-22 A kind of multi-core parallel concurrent signal processing system and method based on SRIO bus Pending CN109656861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811230162.9A CN109656861A (en) 2018-10-22 2018-10-22 A kind of multi-core parallel concurrent signal processing system and method based on SRIO bus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811230162.9A CN109656861A (en) 2018-10-22 2018-10-22 A kind of multi-core parallel concurrent signal processing system and method based on SRIO bus

Publications (1)

Publication Number Publication Date
CN109656861A (en) 2019-04-19

Family

ID=66110328

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811230162.9A Pending CN109656861A (en) 2018-10-22 2018-10-22 A kind of multi-core parallel concurrent signal processing system and method based on SRIO bus

Country Status (1)

Country Link
CN (1) CN109656861A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110045743A (en) * 2019-05-05 2019-07-23 西北工业大学 It is a kind of to be added in the air by oily visual sensing system hardware structure based on nobody of DSP and FPGA
CN110737219A (en) * 2019-10-12 2020-01-31 四川赛狄信息技术股份公司 DSP-based digital signal processing device
CN110851337A (en) * 2019-11-18 2020-02-28 天津津航计算技术研究所 High-bandwidth multi-channel multi-DSP computing blade device suitable for VPX architecture
CN110855552A (en) * 2019-11-01 2020-02-28 中国人民解放军国防科技大学 Hardware abstraction layer message forwarding method based on cache static allocation
CN110908946A (en) * 2019-11-05 2020-03-24 北京理工大学 VPX high-performance digital signal processing board
CN112601234A (en) * 2020-11-20 2021-04-02 中电科仪器仪表(安徽)有限公司 Multi-core DSP-based multi-channel 5G signal demodulation device
CN112825101A (en) * 2019-11-21 2021-05-21 北京希姆计算科技有限公司 Chip architecture, data processing method thereof, electronic device and storage medium
CN112882962A (en) * 2019-11-29 2021-06-01 上海微电子装备(集团)股份有限公司 Data interaction system and method based on VPX architecture
CN112905124A (en) * 2021-02-23 2021-06-04 记忆科技(深圳)有限公司 Asynchronous low-power-consumption signal processing method and device, computer equipment and storage medium
CN112946580A (en) * 2021-01-14 2021-06-11 无锡国芯微电子系统有限公司 Multiprocessor cooperative radiation source frequency parameter estimation device and method
CN113608894A (en) * 2021-08-04 2021-11-05 电子科技大学 Fine granularity-oriented algorithm component operation method
CN114050838A (en) * 2021-10-30 2022-02-15 西南电子技术研究所(中国电子科技集团公司第十研究所) 100Gbps bandwidth RapidIO signal source
CN114826849A (en) * 2022-03-19 2022-07-29 西安电子科技大学 DSP local reconstruction method and system for communication signal identification processing

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110045743A (en) * 2019-05-05 2019-07-23 西北工业大学 It is a kind of to be added in the air by oily visual sensing system hardware structure based on nobody of DSP and FPGA
CN110737219A (en) * 2019-10-12 2020-01-31 四川赛狄信息技术股份公司 DSP-based digital signal processing device
CN110855552A (en) * 2019-11-01 2020-02-28 中国人民解放军国防科技大学 Hardware abstraction layer message forwarding method based on cache static allocation
CN110855552B (en) * 2019-11-01 2021-09-03 中国人民解放军国防科技大学 Hardware abstraction layer message forwarding method based on cache static allocation
CN110908946A (en) * 2019-11-05 2020-03-24 北京理工大学 VPX high-performance digital signal processing board
CN110851337A (en) * 2019-11-18 2020-02-28 天津津航计算技术研究所 High-bandwidth multi-channel multi-DSP computing blade device suitable for VPX architecture
CN110851337B (en) * 2019-11-18 2023-08-08 天津津航计算技术研究所 High-bandwidth multichannel multi-DSP (digital Signal processor) computing blade device suitable for VPX (virtual private X) architecture
CN112825101A (en) * 2019-11-21 2021-05-21 北京希姆计算科技有限公司 Chip architecture, data processing method thereof, electronic device and storage medium
CN112825101B (en) * 2019-11-21 2024-03-08 广州希姆半导体科技有限公司 Chip architecture, data processing method thereof, electronic equipment and storage medium
CN112882962A (en) * 2019-11-29 2021-06-01 上海微电子装备(集团)股份有限公司 Data interaction system and method based on VPX architecture
CN112882962B (en) * 2019-11-29 2023-06-02 上海微电子装备(集团)股份有限公司 Data interaction system and method based on VPX architecture
CN112601234A (en) * 2020-11-20 2021-04-02 中电科仪器仪表(安徽)有限公司 Multi-core DSP-based multi-channel 5G signal demodulation device
CN112946580A (en) * 2021-01-14 2021-06-11 无锡国芯微电子系统有限公司 Multiprocessor cooperative radiation source frequency parameter estimation device and method
CN112905124A (en) * 2021-02-23 2021-06-04 记忆科技(深圳)有限公司 Asynchronous low-power-consumption signal processing method and device, computer equipment and storage medium
CN112905124B (en) * 2021-02-23 2023-03-21 记忆科技(深圳)有限公司 Asynchronous low-power-consumption signal processing method and device, computer equipment and storage medium
CN113608894A (en) * 2021-08-04 2021-11-05 电子科技大学 Fine granularity-oriented algorithm component operation method
CN114050838A (en) * 2021-10-30 2022-02-15 西南电子技术研究所(中国电子科技集团公司第十研究所) 100Gbps bandwidth RapidIO signal source
CN114050838B (en) * 2021-10-30 2023-12-29 西南电子技术研究所(中国电子科技集团公司第十研究所) 100Gbps bandwidth RapidIO signal source
CN114826849A (en) * 2022-03-19 2022-07-29 西安电子科技大学 DSP local reconstruction method and system for communication signal identification processing

Similar Documents

Publication Publication Date Title
CN109656861A (en) A kind of multi-core parallel concurrent signal processing system and method based on SRIO bus
US10951458B2 (en) Computer cluster arrangement for processing a computation task and method for operation thereof
Pinto et al. Thymesisflow: A software-defined, hw/sw co-designed interconnect stack for rack-scale memory disaggregation
Yang et al. The TianHe-1A supercomputer: its hardware and software
CN102073481B (en) Multi-kernel DSP reconfigurable special integrated circuit system
CN108829515A (en) A kind of cloud platform computing system and its application method
CN104657308A (en) Method for realizing server hardware acceleration by using FPGA (field programmable gate array)
CN110297661B (en) Parallel computing method, system and medium based on AMP framework DSP operating system
CN100550003C (en) The implementation method of chip-on communication of built-in isomerization multicore architecture interconnection organisational level
Attia et al. Network interface sharing for SoCs based NoC
CN113672549B (en) Microsystem architecture based on non-shared storage multi-core processor
EP2113841A1 (en) Allocating resources in a multicore environment
CN113608861B (en) Virtualized distribution method and device for software load computing resources
WO2021213075A1 (en) Inter-node communication method and device based on multiple processing nodes
CN202033745U (en) On-chip heterogeneous multi-core system based on star-shaped interconnection framework
CN112416053A (en) Synchronizing signal generating circuit and chip of multi-core architecture and synchronizing method and device
CN217606354U (en) Reconfigurable edge calculation module
US20230134744A1 (en) Execution State Management
Cao et al. Design of hpc node with heterogeneous processors
US20230125149A1 (en) Fractional Force-Quit for Reconfigurable Processors
Wu et al. A embedded real-time polymorphic computing platform architecture
CN117370241A (en) Radar signal processing architecture based on HuaRui No. 2
CN112269755A (en) Distributed parallel packing method and system for FPGA chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190419