CN108234147A - DMA broadcast data transmission method based on host counting in GPDSP - Google Patents
- Publication number
- CN108234147A, CN201711480231.7A
- Authority
- CN
- China
- Prior art keywords
- dma
- broadcast
- data
- core
- host
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
- G06F13/287—Multiplexed DMA
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/40—Bus structure
- G06F13/4063—Device-to-bus coupling
- G06F13/4068—Electrical coupling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1863—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast comprising mechanisms for improved reliability, e.g. status reports
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/28—DMA
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Multi Processors (AREA)
Abstract
The invention discloses a DMA broadcast data transmission method based on host counting in GPDSP, which comprises the following steps: starting DMA broadcast data transmission by a host DMA, generating a broadcast read request and then sending the broadcast read request to the outside of a core through an on-chip network; the host DMA receives the read return data of each slave DMA, counts to confirm whether the data transmission is finished, when the data transmission is confirmed to be finished, the host DMA sends out a buffer emptying command to all the slave DMAs, and each slave DMA receives the buffer emptying command and executes the buffer emptying operation to finish the broadcast transmission. The invention can start one DMA transmission transaction to realize DMA broadcast data transmission, and has the advantages of simple realization principle, low cost, low DMA transmission power consumption and starting overhead, high data transmission efficiency and DDR reading efficiency, large transmission bandwidth and the like.
Description
Technical field
The present invention relates to the technical field of GPDSPs (General-Purpose Digital Signal Processors), and in particular to a host-counting-based DMA (Direct Memory Access) broadcast data transmission method in a GPDSP.
Background art
A GPDSP is a new architecture that retains the essential features of embedded DSPs, including their high performance and low power consumption, while also efficiently supporting general-purpose scientific computing. It overcomes the shortcomings of general DSPs for scientific computing and provides efficient support for both 64-bit high-performance computing and embedded high-precision signal processing. The architecture has the following features: (1) direct representation of double-precision floating-point and 64-bit fixed-point data, with general registers, data buses, and instruction width of 64 bits or more, and an address bus of 40 bits or more; (2) tightly coupled heterogeneous CPU and DSP cores, in which the CPU cores support a complete operating system and the scalar units of the DSP cores support an operating-system microkernel; (3) a unified programming model that takes into account the CPU cores, the DSP cores, and the vector array structure inside the DSP cores; (4) cross-simulation debugging from a separate machine is retained, and a local CPU-hosted debugging mode is also provided; (5) the essential features of ordinary DSPs, other than word width, are retained.
A GPDSP usually builds a processing array from multiple homogeneous 64-bit processing units to obtain higher floating-point performance. However, because the volume of data a GPDSP must process is huge, large amounts of data must be exchanged between the in-core storage of the GPDSP and the off-core storage. Data held in the off-core memory space must first be moved into the in-core memory space so that the core can compute on it, and the results computed by the core must be moved back to the off-core memory space for storage. The data transfer rate between in-core and off-core storage therefore becomes a key factor limiting GPDSP processing speed; like general-purpose processors, GPDSPs face the "memory wall" problem.
DMA can move data at high speed in the background while the processor cores compute, and the move does not require the cores' participation, so DMA alleviates the "memory wall" problem fairly well. Because DMA overlaps the core's computation with data movement between storage components, it reduces, to some extent, the impact of the transfer speed between in-core and off-core storage on GPDSP processing performance. However, as the number of processor cores integrated in a GPDSP keeps increasing, existing DMA transfer modes can no longer meet the data demands of multi-core parallel processing. An efficient multi-core DMA design must take into account both the memory-access requirements of applications and the hardware architecture of the multi-core GPDSP.
When common algorithms and applications such as matrix multiplication, the fast Fourier transform (FFT), and HPL (High Performance Linpack) run in parallel on a multi-core GPDSP, all cores may access the same memory region within a period of time. For example, in GEMM matrix multiplication (C += AB), matrix A is shared and every DSP core needs it. With the conventional DMA transfer mode, each DSP core initiates a point-to-point transfer to read the data block at the same DDR location. Because each core is at a different distance from the DDR, the data being read by different cores may lie on different DDR pages at a given moment. This causes DDR page misses, increases the number of DDR page switches and the memory-access latency, and greatly reduces DDR read efficiency. If several or all cores start DMA transfer transactions, this not only consumes a large amount of power but also loads the network, and accesses to the off-core DDR memory space suffer contention or page misses.
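The page-switch problem described above can be illustrated with a toy model. The sketch below is not taken from the patent: it assumes a DDR with a single open page, cores that each read the same pages in order but start a few steps apart, and a controller that serves the active cores in core order. All function names and parameters are hypothetical.

```python
def count_page_switches(page_sequence):
    """Count how often the open DDR page changes while serving requests in order."""
    switches = 0
    open_page = None
    for page in page_sequence:
        if page != open_page:
            switches += 1
            open_page = page
    return switches


def point_to_point_sequence(num_cores, num_pages, skew):
    """Each core reads pages 0..num_pages-1 in order; core i starts i*skew steps late.

    At every time step the (assumed) controller serves the active cores in core order.
    """
    sequence = []
    total_steps = num_pages + (num_cores - 1) * skew
    for step in range(total_steps):
        for core in range(num_cores):
            page = step - core * skew
            if 0 <= page < num_pages:
                sequence.append(page)
    return sequence


def broadcast_sequence(num_pages):
    """A single broadcast transaction reads every page exactly once."""
    return list(range(num_pages))


p2p = count_page_switches(point_to_point_sequence(num_cores=4, num_pages=8, skew=1))
bcast = count_page_switches(broadcast_sequence(num_pages=8))
print("point-to-point page switches:", p2p)
print("broadcast page switches:", bcast)
```

In this model the interleaved point-to-point requests keep reopening pages, while the single broadcast stream opens each page once. Real DDR controllers have multiple banks and request reordering, so the numbers are only indicative.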
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a host-counting-based DMA broadcast data transmission method in a GPDSP that is simple in principle, low in cost, low in DMA transfer power consumption and start-up overhead, high in data transmission efficiency and DDR read efficiency, and large in transmission bandwidth.
To solve the above technical problem, the technical solution proposed by the present invention is:
A host-counting-based DMA broadcast data transmission method in a GPDSP, comprising: the host DMA starts a DMA broadcast data transmission, generates a broadcast read request, and sends it through the on-chip network to the off-core memory space; the off-core memory space sends read return data to the on-chip network according to the broadcast read request; each core in the GPDSP receives the read return data from the on-chip network and writes it into its in-core memory space; and the host DMA receives the read return data and counts it to confirm whether the data transmission is complete.
As a further improvement of the present invention, confirming whether the transmission is complete specifically comprises: setting in advance broadcast transmission parameters comprising a source frame count SrcArrCnt, a source-frame remaining unit count SrcEleCnt, a destination frame count DstArrCnt, and a destination-frame remaining unit count DstEleCnt. SrcArrCnt configures the number of frames of data to be moved from the off-core memory space; SrcEleCnt counts the data units not yet read in the current source frame; DstArrCnt configures the number of data frames to be written into the in-core memory space; and DstEleCnt counts the data units not yet written in the current destination frame. Whether the data transmission is complete is confirmed from the values of the broadcast transmission parameters.
As a further improvement of the present invention, the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt satisfy the following formula:
(SrcArrCnt + 1) * SrcEleCnt == (DstArrCnt + 1) * DstEleCnt;
where SrcArrCnt + 1 is the number of frames of data to be moved from the off-core memory space, SrcEleCnt is the number of data units not yet read in the current source frame, DstArrCnt + 1 is the number of data frames to be written into the in-core memory space, and DstEleCnt is the number of data units not yet written in the current destination frame.
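The constraint above says that the source side and the destination side must describe the same total number of data units. A minimal sketch of that check follows; the class name, field names, and example values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BroadcastParams:
    """Hypothetical container for the four broadcast transmission parameters."""
    src_arr_cnt: int  # SrcArrCnt: source frame count minus one
    src_ele_cnt: int  # SrcEleCnt: data units per source frame at configuration time
    dst_arr_cnt: int  # DstArrCnt: destination frame count minus one
    dst_ele_cnt: int  # DstEleCnt: data units per destination frame at configuration time

    def source_units(self) -> int:
        return (self.src_arr_cnt + 1) * self.src_ele_cnt

    def destination_units(self) -> int:
        return (self.dst_arr_cnt + 1) * self.dst_ele_cnt

    def is_consistent(self) -> bool:
        # (SrcArrCnt + 1) * SrcEleCnt == (DstArrCnt + 1) * DstEleCnt
        return self.source_units() == self.destination_units()


# Illustrative values: 4 source frames of 96 units written as 1 destination frame of 384 units.
ok = BroadcastParams(src_arr_cnt=3, src_ele_cnt=96, dst_arr_cnt=0, dst_ele_cnt=384)
bad = BroadcastParams(src_arr_cnt=3, src_ele_cnt=96, dst_arr_cnt=0, dst_ele_cnt=96)
print(ok.is_consistent(), bad.is_consistent())
```

Rejecting an inconsistent configuration before starting a transfer is a design choice of this sketch; the source only states that the parameters satisfy the formula.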
As a further improvement of the present invention, the method further comprises a transmission mode parameter TMODE for configuring the DMA transmission mode; when the transmission mode parameter TMODE is valid, the DMA broadcast data transmission is started and performed.
As a further improvement of the present invention, the broadcast read request includes a read return selection vector RetVec that identifies the cores to which data is returned; the destination cores of the read return data are determined from RetVec.
As a further improvement of the present invention, the read return selection vector RetVec has multiple bits, and each bit indicates whether the corresponding core participating in the transmission needs to receive the read return data.
As a further improvement of the present invention, the broadcast read request further includes one or more of a read address, a read mask, and a read return address.
As a further improvement of the present invention, when completion of the data transmission is confirmed, the method further comprises a buffer-emptying step, specifically: the host DMA sends a buffer-emptying command to all slave DMAs, and each slave DMA receives the command and performs the buffer-emptying operation, which ends the broadcast transmission.
As a further improvement of the present invention, the specific steps of the method are:
S1. Set in advance the broadcast transmission parameters comprising the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt;
S2. After the broadcast transmission parameters are configured, the host DMA starts the DMA broadcast data transmission, generates a broadcast read request according to the broadcast transmission parameters, and sends it through the on-chip network to the off-core memory space;
S3. The off-core memory space sends read return data to the on-chip network according to the broadcast read request; each core in the GPDSP receives the read return data from the on-chip network and writes it into its in-core memory space; the host DMA receives the read return data and updates the broadcast transmission parameters to perform the counting;
S4. When the host DMA receives the last block of data, the counting is complete and the host DMA sends a buffer-emptying command to all DSP cores. Each slave DMA performs the emptying operation after receiving the command and, once emptying is complete, issues an interrupt request according to the interrupt enable bit. After the slave DMA buffer has been emptied, the preset internal broadcast-end register BOR is set and the broadcast transmission transaction ends.
As a further improvement of the present invention, when the host DMA receives read return data in step S3, the method further comprises a data validity check, specifically: determine whether the data is valid; if it is valid, forward the data to the in-core memory space and then have the host DMA count it; if it is invalid, have the host DMA count it directly.
Compared with the prior art, the advantages of the present invention are as follows:
1) The host-counting-based DMA broadcast data transmission method in a GPDSP of the present invention moves the same data block from the off-core memory space into the in-core memory space of every DSP core on the chip through a single DMA transfer transaction. The host DMA generates the read request and counts the transferred data blocks to confirm that the transfer is complete, so only one DSP core needs to start a DMA broadcast transaction for the same off-core data block to be broadcast to all DSP cores on the chip. This satisfies the data needs of all DSP cores, avoids having every core start a DMA transfer at the same time, effectively reduces DMA transfer power consumption and start-up overhead, and relieves congestion on the on-chip network.
2) The host-counting-based DMA broadcast data transmission method in a GPDSP of the present invention can broadcast shared data such as matrix A in GEMM matrix multiplication (C += AB). Because a single DMA transfer transaction satisfies the demand of all DSP cores for the off-core data, the method greatly reduces the number of page switches in the off-core DDR memory space and the number of DDR accesses, which substantially improves DDR read efficiency and DDR row hit rate and effectively increases transmission bandwidth.
3) The host-counting-based DMA broadcast data transmission method in a GPDSP of the present invention further sets the broadcast transmission parameters SrcArrCnt (source frame count), SrcEleCnt (source-frame remaining unit count), DstArrCnt (destination frame count), and DstEleCnt (destination-frame remaining unit count), so that DMA broadcast data transmission can be controlled simply by configuring these parameters. Starting the DMA broadcast transaction of a single DSP core is thus a simple and efficient way to broadcast the same off-core data block to all DSP cores on the chip, and the broadcast transmission parameters allow flexible configuration.
Description of the drawings
Fig. 1 is a schematic diagram of the GPDSP architecture used in the present embodiment.
Fig. 2 is a schematic diagram of the position and operating principle of the DMA within the GPDSP in the present embodiment.
Fig. 3 is a schematic diagram of DMA broadcast data transmission in a specific embodiment of the invention.
Fig. 4 is a schematic diagram of the transfer parameter word for DMA broadcast data transmission in a specific embodiment of the invention.
Fig. 5 is a flow diagram of DMA broadcast data transmission as implemented in the present embodiment.
Detailed description of the embodiments
The invention is further described below with reference to the accompanying drawings and specific preferred embodiments, which do not limit the scope of protection of the invention.
As shown in Figs. 1 to 5, the host-counting-based DMA broadcast data transmission method in a GPDSP of the present embodiment comprises: the host DMA starts a DMA broadcast data transmission, generates a broadcast read request, and sends it through the on-chip network to the off-core memory space; the off-core memory space sends read return data to the on-chip network according to the broadcast read request; each core in the GPDSP receives the read return data from the on-chip network and writes it into its in-core memory space; and the host DMA receives the read return data and counts it to confirm whether the data transmission is complete. In other words, the DSP core that initiates the DMA transfer transaction acts as the host in the broadcast transmission: it generates the read request and also counts the data transferred to all other cores to confirm that the transmission is complete.
With the above method, the present embodiment moves the same data block from the off-core memory space into the in-core memory space of every DSP core on the chip through a single DMA transfer transaction. The host DMA generates the read request and counts the transferred data blocks to confirm that the transfer is complete, so only one DSP core needs to start a DMA broadcast transaction for the same off-core data block to be broadcast to all DSP cores on the chip. This satisfies the data needs of all DSP cores, avoids having every core start a DMA transfer at the same time, effectively reduces DMA transfer power consumption and start-up overhead, and relieves congestion on the on-chip network.
The above method can broadcast shared data such as matrix A in GEMM matrix multiplication (C += AB). Because a single DMA transfer transaction satisfies the demand of all DSP cores for the off-core data, the method greatly reduces the number of page switches in the off-core DDR memory space and the number of DDR accesses, which substantially improves DDR read efficiency and DDR row hit rate and effectively increases transmission bandwidth.
The GPDSP architecture used in the present embodiment is shown in Fig. 1. The multi-core GPDSP consists of core nodes, I/O nodes, an on-chip network, a DDR controller, and the off-core DDR storage. Each core node contains two DSP cores, the DDR controller controls DDR data movement, and the on-chip network carries data between the DSPs and between the DSPs and the off-core memory space.
As shown in Fig. 2, within a DSP core in the present embodiment the DMA is connected to the SPU through the configuration bus PBUS, to the in-core memory space (the vector memory AM and the scalar memory SM) through the data bus, and to the off-core DDR memory space through the off-core bus interface. The SPU (scalar processing unit) generates the transfer parameter words for the DMA, so that the DMA can actively move data from the in-core memory space to the off-core memory space or from the off-core memory space to the in-core memory space. The DMA can also passively receive read and write requests from the on-chip network.
In the present embodiment, confirming whether the transmission is complete specifically comprises: setting in advance broadcast transmission parameters comprising the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt. SrcArrCnt configures the number of frames of data to be moved from the off-core memory space, which is SrcArrCnt + 1. SrcEleCnt counts the data units not yet read in the current source frame, where a data unit is the minimum granularity of a DMA transfer in the GPDSP. DstArrCnt configures the number of data frames to be written into the in-core memory space, which is DstArrCnt + 1. DstEleCnt counts the data units not yet written in the current destination frame. Whether the data transmission is complete is confirmed from the values of the broadcast transmission parameters.
By setting the broadcast transmission parameters SrcArrCnt, SrcEleCnt, DstArrCnt, and DstEleCnt, the present embodiment makes it easy to control DMA broadcast data transmission through parameter configuration. Starting the DMA broadcast transaction of a single DSP core is thus a simple and efficient way to broadcast the same off-core data block of the GPDSP to all DSP cores on the chip.
In the present embodiment, the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt satisfy the following formula:
(SrcArrCnt + 1) * SrcEleCnt == (DstArrCnt + 1) * DstEleCnt;
where SrcArrCnt + 1 is the number of frames of data to be moved from the off-core memory space, SrcEleCnt is the number of data units not yet read in the current source frame, DstArrCnt + 1 is the number of data frames to be written into the in-core memory space, and DstEleCnt is the number of data units not yet written in the current destination frame.
When the DMA of the present embodiment performs a broadcast data transmission, the host DMA initiates a broadcast read request for one data block in the off-core DDR memory space, and the returned data is sent to all DSP cores. The source frame count SrcArrCnt gives the number of frames to be moved from the off-core memory space, which is SrcArrCnt + 1. SrcEleCnt is the remaining unit count of the current source frame, where a data unit is the smallest unit of a DMA transfer. The size of the broadcast data is (SrcArrCnt + 1) * SrcEleCnt. When SrcEleCnt reaches 0, read-request generation for the current frame is finished and SrcArrCnt is decremented by 1. When SrcArrCnt is 0 and SrcEleCnt is also 0, read-request generation is finished. DstArrCnt is the destination frame count and DstEleCnt is the remaining unit count of the current destination frame, where (SrcArrCnt + 1) * SrcEleCnt == (DstArrCnt + 1) * DstEleCnt.
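The countdown of SrcEleCnt and SrcArrCnt described above can be sketched as a small generator. This is an illustration rather than the hardware logic: the function name and the `units_per_request` parameter are hypothetical, and reloading SrcEleCnt with the configured frame length at the start of each new frame is an assumption, since the text does not spell out how the per-frame count is restored.

```python
def generate_read_requests(src_arr_cnt, src_ele_cnt, units_per_request=1):
    """Yield (frame_index, unit_offset, units) for each read request.

    src_arr_cnt: SrcArrCnt, the number of source frames minus one.
    src_ele_cnt: SrcEleCnt, data units per source frame at configuration time.
    """
    if src_arr_cnt < 0 or src_ele_cnt <= 0 or units_per_request <= 0:
        raise ValueError("SrcArrCnt must be >= 0; the other counts must be > 0")
    frame_len = src_ele_cnt  # assumed reload value for each new frame
    frame_index = 0
    while True:
        offset = frame_len - src_ele_cnt
        units = min(units_per_request, src_ele_cnt)
        yield frame_index, offset, units
        src_ele_cnt -= units
        if src_ele_cnt == 0:          # current frame fully requested
            if src_arr_cnt == 0:      # both counters are 0: generation finished
                return
            src_arr_cnt -= 1          # move on to the next frame
            frame_index += 1
            src_ele_cnt = frame_len


requests = list(generate_read_requests(src_arr_cnt=3, src_ele_cnt=96, units_per_request=8))
print(len(requests), sum(units for _, _, units in requests))
```

With these illustrative values the generator covers (SrcArrCnt + 1) * SrcEleCnt = 384 units in total, which matches the broadcast data size formula in the text.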
The present embodiment further includes a transmission mode parameter TMODE for configuring the DMA transmission mode. When TMODE is valid, the DMA broadcast data transmission is started and performed. Specifically, TMODE = 2'b11 selects broadcast data transmission; that is, when TMODE = 2'b11, the host DMA starts a broadcast data transmission.
In the present embodiment, the broadcast read request includes a read return selection vector RetVec that identifies the cores to which data is returned, and the destination cores of the read return data are determined from RetVec. In other words, the broadcast read request carries the data-return flags, and the value of RetVec determines whether each DSP core receives the read return data. RetVec has n bits, and each bit corresponds to one DSP core and indicates whether data is returned to that core. The broadcast read request also carries a read address, a read mask, a read return address, and so on; that is, the read request carries the read address, read mask, read return address, RetVec, and similar information.
In a specific embodiment, when TMODE = 2'b11 (that is, the DMA performs a broadcast data transmission), the DMA initiates the broadcast data transmission once the broadcast transmission parameters have been configured. The DMA generates a broadcast read request from the broadcast parameters SrcArrCnt, SrcEleCnt, DstArrCnt, and DstEleCnt and sends it to the on-chip network. The read request includes the read address, the read mask, the read return address, and the read return selection vector RetVec. RetVec has n bits; in a broadcast transmission all n bits are 1, indicating that the read return data is returned to all DSP cores. The off-core memory space returns data to the on-chip network according to the read request, and all slave DMAs passively receive the data through the on-chip network and write it into their cores.
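The read return selection vector can be modelled as a plain bit mask. The sketch below assumes that bit i of RetVec corresponds to core i, which the text does not specify, and the helper names are hypothetical.

```python
def make_ret_vec(target_cores, num_cores):
    """Build an n-bit RetVec with one bit set per destination core (bit i = core i, assumed)."""
    vec = 0
    for core in target_cores:
        if not 0 <= core < num_cores:
            raise ValueError(f"core {core} is outside 0..{num_cores - 1}")
        vec |= 1 << core
    return vec


def broadcast_ret_vec(num_cores):
    """In a broadcast transmission all n bits of RetVec are 1."""
    return (1 << num_cores) - 1


def cores_selected(ret_vec, num_cores):
    """List the cores whose RetVec bit is set, i.e. the cores that receive the return data."""
    return [core for core in range(num_cores) if (ret_vec >> core) & 1]


vec = broadcast_ret_vec(12)
print(f"{vec:012b}", cores_selected(vec, 12))
```

A vector with only some bits set would describe a return to a subset of cores; in the broadcast mode described above, every bit is set.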
Building on the point-to-point DMA transmission mode, the present embodiment controls the transfer by configuring five parameters: the transmission mode TMODE, the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt. The host DMA generates the broadcast data transmission request from the configured parameters and counts the moved data until the broadcast transmission ends. Starting the DMA broadcast transaction of a single DSP core is therefore enough to broadcast the same off-core data block of the GPDSP to all DSP cores on the chip.
In the present embodiment, when completion of the data transmission is confirmed, a buffer-emptying step follows, specifically: once the host DMA has finished counting, it sends a buffer-emptying command to all slave DMAs; each slave DMA receives the command and performs the buffer-emptying operation, and the transfer transaction ends when emptying is complete.
Fig. 3 shows DMA broadcast data transmission in a specific embodiment of the invention. The chip carries 12 DSP cores, each with its own DMA and LM, where LM is the in-core memory space comprising the vector memory AM and the scalar memory SM. Array denotes a frame being moved, and C0 to C11 denote the data blocks in each row, each 512 bits (8 words) in size. The broadcast data block in this example is 4 x 96 words: the DMA initiates a broadcast data transmission that moves 4 frames of 96 words each. The DDR moves data in the direction of the arrows in the figure; the DMA first moves one DDR page of data and, on each page switch, moves the next page. The DDR sends data to the on-chip network according to the read request, and the slave DSPs passively receive the data through the network. As can be seen, the broadcast transmission mode of the present embodiment considerably reduces the number of DDR page switches, lowers the transfer latency, and reduces the number of DDR accesses, while improving transmission bandwidth and DDR read efficiency.
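The figures quoted for this example can be checked with a few lines of arithmetic. The block, word, frame, and core counts come from the text; the comparison with point-to-point transfers assumes that every core would otherwise read the whole block itself with no request merging, which is an illustrative assumption rather than a measured result.

```python
BLOCK_BITS = 512        # size of one row block C0..C11
WORDS_PER_BLOCK = 8     # 512 bits = 8 words
FRAMES = 4              # frames moved by the broadcast
WORDS_PER_FRAME = 96    # words per frame
NUM_CORES = 12          # DSP cores on the chip

word_bits = BLOCK_BITS // WORDS_PER_BLOCK               # width of one word in bits
blocks_per_frame = WORDS_PER_FRAME // WORDS_PER_BLOCK   # row blocks per frame
total_words = FRAMES * WORDS_PER_FRAME
total_blocks = FRAMES * blocks_per_frame
total_bytes = total_words * word_bits // 8

# Assumption: in point-to-point mode each core fetches every block separately.
broadcast_block_reads = total_blocks
point_to_point_block_reads = NUM_CORES * total_blocks

print("word width (bits):", word_bits)
print("blocks per frame:", blocks_per_frame)
print("total words / blocks / bytes:", total_words, total_blocks, total_bytes)
print("DDR block reads, broadcast vs point-to-point:",
      broadcast_block_reads, point_to_point_block_reads)
```

Each 96-word frame holds exactly 12 row blocks, which is consistent with the labels C0 to C11.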
Fig. 4 shows the transfer parameter word for DMA broadcast data transmission in a specific embodiment of the invention. It comprises the transmission mode TMODE, the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt. TMODE is 2 bits wide; when TMODE is 2'b11, the DMA starts a broadcast data transmission that moves the same data block from the off-core DDR memory space to the 12 DSP cores. SrcArrCnt is the source frame count, 32 bits wide, for a maximum of 2^32 frames. SrcEleCnt is the remaining unit count of the current source frame, 32 bits wide, with a maximum value of 2^32 - 1. DstArrCnt is the destination frame count, 32 bits wide, with a maximum of 2^32. DstEleCnt is the remaining unit count of the current destination frame, with a maximum value of 2^32 - 1.
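The field widths above can be exercised with a small pack/unpack sketch. The field order and bit positions are hypothetical, because the actual layout is defined in Fig. 4, which is not reproduced here; the 32-bit width for DstEleCnt is inferred from its stated maximum of 2^32 - 1.

```python
TMODE_BROADCAST = 0b11  # TMODE = 2'b11 selects broadcast data transmission

# (field name, width in bits), packed from the least significant bit upward.
# This ordering is an assumption made for illustration only.
FIELDS = (
    ("tmode", 2),
    ("src_arr_cnt", 32),
    ("src_ele_cnt", 32),
    ("dst_arr_cnt", 32),
    ("dst_ele_cnt", 32),
)


def pack_param_word(**values):
    """Pack the five parameters into one integer, checking each field's range."""
    word = 0
    shift = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack_param_word(word):
    """Recover the five parameters from a packed integer."""
    fields = {}
    shift = 0
    for name, width in FIELDS:
        fields[name] = (word >> shift) & ((1 << width) - 1)
        shift += width
    return fields


word = pack_param_word(tmode=TMODE_BROADCAST, src_arr_cnt=3, src_ele_cnt=96,
                       dst_arr_cnt=3, dst_ele_cnt=96)
print(unpack_param_word(word))
```

Storing the frame counts as "frames minus one" is what lets a 32-bit SrcArrCnt describe up to 2^32 frames, as stated in the text.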
As shown in FIG. 5, the specific steps by which this embodiment implements DMA broadcast data transfer in the GPDSP are:
S1. Set in advance the broadcast transfer parameters, namely the source frame count SrcArrCnt, the source frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination frame remaining unit count DstEleCnt;
S2. After the broadcast transfer parameters are configured, the host DMA starts the DMA broadcast data transfer and generates a broadcast read request from the broadcast transfer parameters SrcArrCnt, SrcEleCnt, DstArrCnt, and DstEleCnt. The read request comprises a read address, a read mask, a read return address, and the read return selection vector RetVec. The generated broadcast read request is sent to the off-core memory space over the network-on-chip;
S3. The off-core memory space sends the read return data to the network-on-chip in response to the broadcast read request; each core in the GPDSP receives the read return data from the network-on-chip and writes it to its in-core memory space; the host DMA receives the read return data and updates the broadcast transfer parameters to count the data transferred to its own core;
S4. When the host DMA receives the last block of data, counting is complete. The host DMA sends a clear-buffer command to all DSP cores; each slave DMA performs the clear operation after receiving the command and, once the clear is complete, issues an interrupt request according to the interrupt enable bit. After the slave DMA buffer has been cleared, the slave sets the preset internal broadcast end register BOR, and the broadcast transfer transaction ends.
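Steps S1 to S4 can be sketched as a small software simulation. This is a simplified model under stated assumptions, not the hardware design: the names `ReadRequest`, `Core`, and `broadcast` are hypothetical, the off-core memory is a plain Python list, one read request is issued per data unit, and RetVec is modeled as an integer with one bit per core.

```python
from dataclasses import dataclass, field

NUM_CORES = 12  # the embodiment broadcasts to 12 DSP cores


@dataclass
class ReadRequest:
    """Hypothetical broadcast read request (field names are illustrative)."""
    address: int
    mask: int
    return_address: int
    ret_vec: int  # one bit per core; a set bit means the core receives the data


@dataclass
class Core:
    memory: dict = field(default_factory=dict)


def broadcast(ddr, total_units):
    cores = [Core() for _ in range(NUM_CORES)]
    remaining = total_units          # S1: host-side count set up before the transfer
    ret_vec = (1 << NUM_CORES) - 1   # assumption: every core takes part
    for unit in range(total_units):
        # S2: the host DMA issues a broadcast read request.
        req = ReadRequest(address=unit, mask=0xFF,
                          return_address=unit, ret_vec=ret_vec)
        data = ddr[req.address]      # S3: off-core memory returns the data
        for core_id, core in enumerate(cores):
            if (req.ret_vec >> core_id) & 1:
                core.memory[req.return_address] = data
        remaining -= 1               # S3: only the host DMA keeps the count
    flushed = remaining == 0         # S4: flush is ordered once counting completes
    return cores, flushed


cores, flushed = broadcast([10, 20, 30], 3)
```

The point of the sketch is that the completion decision depends on a single counter held by the host, while every selected core ends up with the same data.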
In this embodiment, when the host DMA receives read return data in step S3, a data validity check is also performed, as follows: determine whether the data is valid; if it is valid, forward the data to the in-core memory space and start the host DMA count; if it is invalid, start the host DMA count directly.
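The validity check can be summarized in a few lines of Python. The function name `host_receive` and the representation of returned data as `(data, valid)` pairs are assumptions made for illustration only.

```python
def host_receive(blocks):
    """Process (data, valid) pairs as seen by the host DMA.

    Returns the data forwarded to in-core memory and the number of
    blocks counted.
    """
    core_memory = []
    counted = 0
    for data, valid in blocks:
        if valid:
            core_memory.append(data)  # valid data is forwarded to in-core memory
        counted += 1                  # the count advances for valid and invalid data
    return core_memory, counted


forwarded, counted = host_receive([(1, True), (2, False), (3, True)])
```

Counting invalid data as well keeps the host's count aligned with the number of returns, so the end of the transfer is still detected.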
As shown in FIG. 5, once the broadcast transfer parameter word has been configured in this embodiment, the host DMA initiates the broadcast data transfer by sending a broadcast read request to the off-core memory space DDR. The DDR returns the read return data to the network-on-chip; the host DMA receives the read return data of each core from the network-on-chip, and all other cores passively receive the read return data from the network-on-chip. If the host DMA detects that the data is valid, it writes the data to its own in-core memory space and counts it; if the data is invalid, it only counts it. When the host has finished counting, it sends a clear-buffer command to the other cores; each slave sets the broadcast end flag register to 1 and, after completing the clear operation, issues an interrupt if the interrupt enable bit is 1.
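The end-of-transfer handshake on the slave side can be sketched as follows. The class `SlaveDMA`, its attribute names, and the function `host_finish` are hypothetical, and the relative order of setting the end register and raising the interrupt is an assumption, since the description does not fix it unambiguously.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveDMA:
    """Hypothetical slave DMA with a buffer, an end flag, and an interrupt line."""
    interrupt_enable: bool
    buffer: list = field(default_factory=list)
    bor: int = 0                  # broadcast end register
    interrupt_raised: bool = False

    def on_clear_command(self):
        self.buffer.clear()       # clear the buffer on the host's command
        self.bor = 1              # mark the broadcast transaction as finished
        if self.interrupt_enable:
            self.interrupt_raised = True  # interrupt only if the enable bit is 1


def host_finish(slaves):
    # Called by the host DMA once its count is complete.
    for slave in slaves:
        slave.on_clear_command()


slaves = [SlaveDMA(True, [1, 2]), SlaveDMA(False, [3])]
host_finish(slaves)
```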
The above are merely preferred embodiments of the present invention and do not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Therefore, any simple modification, equivalent change, or adaptation made to the above embodiments in accordance with the technical essence of the present invention, without departing from the content of its technical solution, shall fall within the scope of protection of the technical solution of the present invention.
Claims (10)
1. A DMA broadcast data transmission method based on host counting in a GPDSP, characterized in that the method comprises: the host DMA starts a DMA broadcast data transfer, generates a broadcast read request, and sends it to the off-core memory space over the network-on-chip; the off-core memory space sends the read return data to the network-on-chip in response to the broadcast read request; each core in the GPDSP receives the read return data from the network-on-chip and writes it to its in-core memory space; and the host DMA receives the read return data and counts it to confirm whether the data transfer is complete.
2. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, characterized in that confirming whether the transfer is complete specifically comprises: setting in advance broadcast transfer parameters comprising the source frame count SrcArrCnt, the source frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination frame remaining unit count DstEleCnt, wherein the source frame count SrcArrCnt configures the number of frames of data to be moved from outside the core, the source frame remaining unit count SrcEleCnt counts the data units not yet read in the current source frame, the destination frame count DstArrCnt configures the number of data frames to be written to the in-core memory space, and the destination frame remaining unit count DstEleCnt counts the data units not yet written in the current destination frame; and confirming whether the data transfer is complete from the values of the broadcast transfer parameters.
3. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 2, characterized in that the source frame count SrcArrCnt, the source frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination frame remaining unit count DstEleCnt satisfy the following formula:
(SrcArrCnt + 1) * SrcEleCnt == (DstArrCnt + 1) * DstEleCnt;
where SrcArrCnt + 1 is the number of frames of data to be moved from outside the core, SrcEleCnt is the number of data units not yet read in the current source frame, DstArrCnt + 1 is the number of data frames to be written to the in-core memory space, and DstEleCnt is the number of data units not yet written in the current destination frame.
4. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, 2, or 3, characterized in that the method further comprises a transfer mode parameter TMODE for configuring the DMA transfer mode; when the transfer mode parameter TMODE is valid, the DMA broadcast data transfer is started.
5. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, 2, or 3, characterized in that the broadcast read request comprises a read return selection vector RetVec that identifies the cores to which data is returned, and the destination cores to which the read return data must be returned are determined from the read return selection vector RetVec.
6. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 5, characterized in that the read return selection vector RetVec has multiple bits, each bit indicating whether one corresponding core participating in the transfer needs the read return data returned to it.
7. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 6, characterized in that the broadcast read request further comprises one or more of a read address, a read mask, and a read return address.
8. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, 2, or 3, characterized in that, when the data transfer is confirmed complete, the method further comprises a buffer clearing step, specifically: the host DMA sends a clear-buffer command to all slave DMAs; each slave DMA receives the clear-buffer command and performs the buffer clearing operation; and the broadcast transfer ends.
9. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, 2, or 3, characterized in that the specific steps of the method are:
S1. Set in advance the broadcast transfer parameters, namely the source frame count SrcArrCnt, the source frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination frame remaining unit count DstEleCnt;
S2. After the broadcast transfer parameters are configured, the host DMA starts the DMA broadcast data transfer, generates a broadcast read request from the broadcast transfer parameters, and sends it to the off-core memory space over the network-on-chip;
S3. The off-core memory space sends the read return data to the network-on-chip in response to the broadcast read request; each core in the GPDSP receives the read return data from the network-on-chip and writes it to its in-core memory space; the host DMA receives the read return data and updates the broadcast transfer parameters to perform counting;
S4. When the host DMA receives the last block of data, counting is complete. The host DMA sends a clear-buffer command to all DSP cores; each slave DMA performs the clear operation after receiving the command and, once the clear is complete, issues an interrupt request according to the interrupt enable bit. After the slave DMA buffer has been cleared, the slave sets the preset internal broadcast end register BOR, and the broadcast transfer transaction ends.
10. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 9, characterized in that, when the host DMA receives read return data in step S3, a data validity check is also performed, specifically: determine whether the data is valid; if it is valid, forward the data to the in-core memory space and start the host DMA count; if it is invalid, start the host DMA count directly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711480231.7A CN108234147B (en) | 2017-12-29 | 2017-12-29 | DMA broadcast data transmission method based on host counting in GPDSP |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108234147A true CN108234147A (en) | 2018-06-29 |
CN108234147B CN108234147B (en) | 2021-06-18 |
Family
ID=62647085
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711480231.7A Active CN108234147B (en) | 2017-12-29 | 2017-12-29 | DMA broadcast data transmission method based on host counting in GPDSP |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108234147B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150103826A1 (en) * | 2009-10-30 | 2015-04-16 | Calxeda Inc. | System and method for using a multi-protocol fabric module across a distributed server interconnect fabric |
CN102521201A (en) * | 2011-11-16 | 2012-06-27 | 刘大可 | Multi-core DSP (digital signal processor) system-on-chip and data transmission method |
CN104679691A (en) * | 2015-01-22 | 2015-06-03 | 中国人民解放军国防科学技术大学 | Multi-core DMA (direct memory access) subsection data transmission method used for GPDSP and adopting host counting |
CN104679689A (en) * | 2015-01-22 | 2015-06-03 | 中国人民解放军国防科学技术大学 | Multi-core DMA (direct memory access) subsection data transmission method used for GPDSP (general purpose digital signal processor) and adopting slave counting |
Non-Patent Citations (2)
Title |
---|
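张帅: "Design and Implementation of a DMA Host Module Supporting Multiple Transfer Modes", China Master's Theses Full-text Database, Information Science and Technology * |
胡月安: "Design and Verification of a DMA Supporting Efficient Data Transfer in a 32-bit High-Performance M_DSP", China Master's Theses Full-text Database, Information Science and Technology * |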
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114024920A (en) * | 2021-11-24 | 2022-02-08 | 苏州暴雪电子科技有限公司 | Data packet routing method for on-chip message network |
CN114024920B (en) * | 2021-11-24 | 2023-10-27 | 苏州暴雪电子科技有限公司 | Data packet routing method for on-chip message network |
CN118170702A (en) * | 2024-05-13 | 2024-06-11 | 北京壁仞科技开发有限公司 | DMA controller and data handling method for broadcasting |
Also Published As
Publication number | Publication date |
---|---|
CN108234147B (en) | 2021-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110647480B (en) | Data processing method, remote direct access network card and equipment | |
CN107301455B (en) | Hybrid cube storage system for convolutional neural network and accelerated computing method | |
CN104699631B (en) | It is multi-level in GPDSP to cooperate with and shared storage device and access method | |
KR101719092B1 (en) | Hybrid memory device | |
EP2423821A2 (en) | Processor, apparatus, and method for fetching instructions and configurations from a shared cache | |
CN105389277B (en) | Towards the high-performance DMA components of scientific algorithm in GPDSP | |
CN102449611B (en) | For the method and apparatus of issuing memory barrier commands in weak sequence storage system | |
CN103645994A (en) | Data processing method and device | |
CN105183662A (en) | Cache consistency protocol-free distributed sharing on-chip storage framework | |
CN104679691B (en) | A kind of multinuclear DMA segment data transmission methods using host count for GPDSP | |
CN106775477B (en) | SSD (solid State disk) master control data transmission management device and method | |
CN105556503A (en) | Dynamic memory control method and system thereof | |
CN102968395B (en) | Method and device for accelerating memory copy of microprocessor | |
CN104679689B (en) | A kind of multinuclear DMA segment data transmission methods counted using slave for GPDSP | |
CN107015923A (en) | Uniformity for managing snoop operations is interconnected and data processing equipment including it | |
CN102314400A (en) | Method and device for dispersing converged DMA (Direct Memory Access) | |
CN108234147A (en) | DMA broadcast data transmission method based on host counting in GPDSP | |
CN104317754B (en) | The data transfer optimization method that strides towards heterogeneous computing system | |
CN102262608A (en) | Method and device for controlling read-write operation of processor core-based coprocessor | |
CN100405333C (en) | Method and device for processing memory access in multi-processor system | |
JP6679570B2 (en) | Data processing device | |
CN113535611A (en) | Data processing method and device and heterogeneous system | |
CN115174673B (en) | Data processing device, data processing method and apparatus having low-latency processor | |
JP7177948B2 (en) | Information processing device and information processing method | |
CN108062282A (en) | DMA data merging transmission method in GPDSP |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||