CN108234147A - DMA broadcast data transmission method based on host counting in GPDSP - Google Patents

DMA broadcast data transmission method based on host counting in GPDSP

Info

Publication number
CN108234147A
Authority
CN
China
Prior art keywords
dma
broadcast
data
core
host
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711480231.7A
Other languages
Chinese (zh)
Other versions
CN108234147B (en)
Inventor
马胜
雷元武
张美迪
万江华
陈胜刚
李勇
彭元喜
孙书为
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201711480231.7A
Publication of CN108234147A
Application granted
Publication of CN108234147B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G06F13/287 Multiplexed DMA
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/40 Bus structure
    • G06F13/4063 Device-to-bus coupling
    • G06F13/4068 Electrical coupling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00 Data switching networks
    • H04L12/02 Details
    • H04L12/16 Arrangements for providing special services to substations
    • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1863 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast comprising mechanisms for improved reliability, e.g. status reports
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/28 DMA
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a DMA broadcast data transmission method based on host counting in a GPDSP, comprising the following steps: a host DMA starts a DMA broadcast data transfer, generates a broadcast read request, and sends it off-core through the network-on-chip; the host DMA receives the read return data of each slave DMA and counts it to confirm whether the data transfer is complete; when the transfer is confirmed complete, the host DMA issues a clear-buffer command to all slave DMAs, and each slave DMA receives the command and performs the buffer-clearing operation, ending the broadcast transfer. The invention needs to start only one DMA transfer transaction to achieve DMA broadcast data transfer, and has the advantages of a simple implementation principle, low cost, low DMA transfer power consumption and startup overhead, high data transfer efficiency and DDR read efficiency, and large transfer bandwidth.

Description

DMA broadcast data transmission method based on host counting in GPDSP
Technical field
The present invention relates to the technical field of GPDSP (General Purpose Digital Signal Processor), and in particular to a DMA (Direct Memory Access) broadcast data transmission method based on host counting in a GPDSP.
Background art
GPDSP is a new architecture that keeps the essential characteristics of embedded DSPs, including their high performance and low power consumption, while also efficiently supporting general-purpose scientific computing. The architecture overcomes the shortcomings of general DSPs for scientific computing and can efficiently support both 64-bit high-performance computing and embedded high-precision signal processing. It has the following features: (1) direct representation of double-precision floating-point and 64-bit fixed-point data, with general registers, data buses, and instruction widths of 64 bits or more and an address bus of 40 bits or more; (2) tightly coupled heterogeneous multi-core CPU and DSP, where the CPU core supports a full operating system and the scalar unit of the DSP core supports an operating-system microkernel; (3) a unified programming model that takes into account the CPU core, the DSP core, and the vector array structure inside the DSP core; (4) cross-machine simulation debugging is retained, and a local CPU host debugging mode is also provided; (5) the essential characteristics of ordinary DSPs, apart from word width, are retained.
A GPDSP usually forms a processing array from multiple homogeneous 64-bit processing units to obtain higher floating-point performance. However, because the volume of data a GPDSP must process is huge, large amounts of data have to be exchanged between the in-core storage components and the off-core storage components. Data held in off-core storage space must first be moved into in-core storage space so that the core can compute on it, and the results computed by the core must be moved back to off-core storage space to be saved. The data transfer rate between in-core and off-core storage components therefore becomes a key factor limiting GPDSP processing speed; like general-purpose processors, the GPDSP faces the "memory wall" problem.
DMA can move data at high speed in the background while the processor cores compute, and the move does not require the cores to take part, so DMA alleviates the "memory wall" problem well. Because DMA overlaps the core's computation with the data movement of the storage components, it reduces, to some extent, the impact of the transfer speed between in-core and off-core storage components on GPDSP processing performance. However, as the number of processor cores integrated in a GPDSP keeps growing, existing DMA transfer modes can no longer meet the data demands of multi-core parallel processing. An efficient multi-core DMA design must take into account both the memory access requirements of applications and the hardware architecture of the multi-core GPDSP.
When commonly used algorithms and applications such as matrix multiplication, the fast Fourier transform (FFT), and HPL (High Performance Linpack) run in parallel on a multi-core GPDSP, all cores may access the same storage space during a given period. For example, in GEMM matrix multiplication (C += AB), matrix A is a shared matrix and every DSP core needs it. With the traditional DMA transfer mode, each DSP core initiates a point-to-point transfer to read the data block at the same DDR location. Because each core is at a different distance from the DDR, the data being read by different cores may lie on different DDR pages, which causes DDR page misses, increases the number of DDR page switches, increases memory access latency, and greatly reduces DDR read efficiency. If many or all cores start DMA transfer transactions, this not only consumes a large amount of power but also puts pressure on the network, and contention or page misses can occur when the off-core DDR storage space is accessed.
Summary of the invention
The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the present invention provides a DMA broadcast data transmission method based on host counting in a GPDSP that has a simple implementation principle, low cost, low DMA transfer power consumption and startup overhead, high data transfer efficiency and DDR read efficiency, and large transfer bandwidth.
To solve the above technical problem, the technical solution proposed by the present invention is:
A DMA broadcast data transmission method based on host counting in a GPDSP, the method comprising: the host DMA starts a DMA broadcast data transfer, generates a broadcast read request, and sends it through the network-on-chip to the off-core storage space; the off-core storage space sends read return data to the network-on-chip according to the broadcast read request; each core in the GPDSP receives the read return data from the network-on-chip and writes it to in-core storage space; and the host DMA receives the read return data and counts it to confirm whether the data transfer is complete.
As a further improvement of the present invention, confirming whether the transfer is complete specifically includes: presetting broadcast transfer parameters comprising a source frame count SrcArrCnt, a source frame remaining unit count SrcEleCnt, a destination frame count DstArrCnt, and a destination frame remaining unit count DstEleCnt. SrcArrCnt configures the number of data frames moved from off-core storage; SrcEleCnt counts the data units not yet read in the current source frame; DstArrCnt configures the number of data frames written to in-core storage space; DstEleCnt counts the data units not yet written in the current destination frame. Whether the data transfer is complete is confirmed from the values of the broadcast transfer parameters.
As a further improvement of the present invention, the source frame count SrcArrCnt, source frame remaining unit count SrcEleCnt, destination frame count DstArrCnt, and destination frame remaining unit count DstEleCnt satisfy the following formula:
(SrcArrCnt+1) * SrcEleCnt == (DstArrCnt+1) * DstEleCnt;
where SrcArrCnt+1 is the number of data frames to be moved from off-core storage, SrcEleCnt is the number of data units not yet read in the current source frame, DstArrCnt+1 is the number of data frames to be written to in-core storage space, and DstEleCnt is the number of data units not yet written in the current destination frame.
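The constraint above says that the source side and the destination side must describe the same total number of data units, even if they split it into frames differently. A minimal sketch of that check follows; the function name is hypothetical and is not taken from the patent.

```python
def broadcast_sizes_match(src_arr_cnt, src_ele_cnt, dst_arr_cnt, dst_ele_cnt):
    """Return True when source and destination describe the same total size.

    Hypothetical helper: the parameter names mirror SrcArrCnt, SrcEleCnt,
    DstArrCnt and DstEleCnt, but the function itself is only an illustration.
    """
    # The frame-count fields hold "number of frames minus one".
    src_total = (src_arr_cnt + 1) * src_ele_cnt
    dst_total = (dst_arr_cnt + 1) * dst_ele_cnt
    return src_total == dst_total
```

For instance, 4 source frames of 96 units and 8 destination frames of 48 units both describe 384 units, so `broadcast_sizes_match(3, 96, 7, 48)` holds.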
As a further improvement of the present invention, the method further includes a transfer mode parameter TMODE for configuring the DMA transfer mode; when the transfer mode parameter TMODE is valid, DMA broadcast data transfer is started and executed.
As a further improvement of the present invention, the broadcast read request includes a read return selection vector RetVec that identifies the cores to which data is returned; the destination cores to which the read return data must be returned are determined from RetVec.
As a further improvement of the present invention, the read return selection vector RetVec has multiple bits, and each bit indicates whether one core participating in the transfer needs the read return data returned to it.
As a further improvement of the present invention, the broadcast read request further includes one or more of a read address, a read mask, and a read return address.
As a further improvement of the present invention, when completion of the data transfer is confirmed, the method further includes a buffer-clearing step, specifically: the host DMA issues a clear-buffer command to all slave DMAs, each slave DMA receives the clear-buffer command and performs the buffer-clearing operation, and the broadcast transfer ends.
As a further improvement of the present invention, the specific steps of the method are:
S1. Preset the broadcast transfer parameters, comprising the source frame count SrcArrCnt, source frame remaining unit count SrcEleCnt, destination frame count DstArrCnt, and destination frame remaining unit count DstEleCnt;
S2. After the broadcast transfer parameters are configured, the host DMA starts the DMA broadcast data transfer, generates a broadcast read request according to the broadcast transfer parameters, and sends it through the network-on-chip to the off-core storage space;
S3. The off-core storage space sends read return data to the network-on-chip according to the broadcast read request; each core in the GPDSP receives the read return data from the network-on-chip and writes it to in-core storage space; the host DMA receives the read return data and updates the broadcast transfer parameters to perform the counting;
S4. When the host DMA receives the last data block, counting is complete and the host DMA issues a clear-buffer command to all DSP cores; each slave DMA performs the clearing operation after receiving the command and, once clearing is complete, issues an interrupt request according to the interrupt enable bit; after the slave DMA buffers have been cleared, the preset broadcast end register BOR is set and the broadcast transfer transaction ends.
As a further improvement of the present invention, when the host DMA receives read return data in step S3, the method further includes a data validity check, specifically: determine whether the data is valid; if it is valid, forward the data to in-core storage space and start the host DMA count; if it is invalid, start the host DMA count directly.
Compared with the prior art, the advantages of the present invention are:
1) In the DMA broadcast data transmission method based on host counting in a GPDSP of the present invention, a single DMA transfer transaction moves the same data block in off-core storage space into the in-core storage space of all DSP cores on the chip. The host DMA generates the read requests and counts the transferred data blocks to confirm completion, so only one DSP core needs to start a DMA broadcast transfer transaction to deliver the same off-core data block to all DSP cores on the chip in broadcast form. This serves transfer patterns in which all DSP cores need the same data, avoids all cores starting DMA transfers at the same time, effectively reduces DMA transfer power consumption and startup overhead, and relieves congestion on the network-on-chip.
2) The method can implement broadcast transfers such as that of matrix A in GEMM matrix multiplication (C += AB). Because a single DMA transfer transaction satisfies every DSP core's demand for off-core data, the number of page switches in the off-core DDR storage space and the number of DDR accesses are greatly reduced, which greatly improves DDR read efficiency and the DDR row hit rate while effectively increasing transfer bandwidth.
3) Further, by setting broadcast transfer parameters comprising the source frame count SrcArrCnt, source frame remaining unit count SrcEleCnt, destination frame count DstArrCnt, and destination frame remaining unit count DstEleCnt, the method allows DMA broadcast data transfer to be controlled conveniently by configuring these parameters. A DMA broadcast transfer transaction started by one DSP core is thus implemented simply and efficiently, the same off-core data block of the GPDSP can be transferred to all DSP cores on the chip in broadcast form, and the transfer can be configured flexibly through the broadcast transfer parameters.
Description of the drawings
Fig. 1 is a schematic diagram of the GPDSP architecture used in this embodiment.
Fig. 2 is a schematic diagram of the position and working principle of the DMA in the GPDSP in this embodiment.
Fig. 3 is a schematic diagram of how DMA broadcast data transfer is implemented in a specific embodiment of the present invention.
Fig. 4 is a schematic diagram of the transfer parameter word for DMA broadcast data transfer in a specific embodiment of the present invention.
Fig. 5 is a flow diagram of the DMA broadcast data transfer as implemented in this embodiment.
Specific embodiments
The invention is further described below with reference to the accompanying drawings and specific preferred embodiments, which do not thereby limit the scope of protection of the invention.
As shown in Figs. 1 to 5, the DMA broadcast data transmission method based on host counting in a GPDSP of this embodiment includes: the host DMA starts a DMA broadcast data transfer, generates a broadcast read request, and sends it through the network-on-chip to the off-core storage space; the off-core storage space sends read return data to the network-on-chip according to the broadcast read request; each core in the GPDSP receives the read return data from the network-on-chip and writes it to in-core storage space; and the host DMA receives the read return data and counts it to confirm whether the data transfer is complete. In other words, the DSP core that initiates the DMA transfer transaction acts as the host in the broadcast transfer: it is responsible for generating the read requests and also counts the transferred data on behalf of all other cores to confirm that the data transfer is complete.
With the above method of this embodiment, a single DMA transfer transaction moves the same data block in off-core storage space into the in-core storage space of all DSP cores on the chip. The host DMA generates the read requests and counts the transferred data blocks to confirm completion, so only one DSP core needs to start a DMA broadcast transfer transaction to deliver the same off-core data block of the GPDSP to all DSP cores on the chip in broadcast form. This serves transfer patterns in which all DSP cores need the same data, avoids all cores starting DMA transfers at the same time, effectively reduces DMA transfer power consumption and startup overhead, and relieves congestion on the network-on-chip.
The above method of this embodiment can implement broadcast transfers such as that of matrix A in GEMM matrix multiplication (C += AB). Because a single DMA transfer transaction satisfies every DSP core's demand for off-core data, the number of page switches in the off-core DDR storage space and the number of DDR accesses are greatly reduced, which greatly improves DDR read efficiency and the DDR row hit rate while effectively increasing transfer bandwidth.
The GPDSP architecture used in this embodiment is shown in Fig. 1. The multi-core GPDSP consists of core nodes, I/O nodes, a network-on-chip, a DDR controller, and the off-core DDR storage component. Each core node contains two DSP cores; the DDR controller controls the movement of DDR data; the network-on-chip carries data communication among the DSPs and between the DSPs and the off-core storage space.
As shown in Fig. 2, in this embodiment the DMA in a DSP core is connected to the SPU through the configuration bus PBUS, to the in-core storage space (vector storage component AM and scalar storage component SM) through the data bus, and to the off-core DDR storage space through the off-core bus interface. The SPU (scalar processing unit) generates the transfer parameter word for the DMA, so that the DMA can actively move data from in-core storage space to off-core storage space or from off-core storage space to in-core storage space; the DMA can also passively receive read and write requests from the network-on-chip.
In this embodiment, confirming whether the transfer is complete specifically includes: presetting broadcast transfer parameters comprising the source frame count SrcArrCnt, source frame remaining unit count SrcEleCnt, destination frame count DstArrCnt, and destination frame remaining unit count DstEleCnt. SrcArrCnt configures the number of data frames moved from off-core storage, the number of frames moved being SrcArrCnt+1; SrcEleCnt counts the data units not yet read in the current source frame, a data unit being the smallest granularity of DMA transfer in the GPDSP. DstArrCnt configures the number of data frames written to in-core storage space, the number of frames written being DstArrCnt+1; DstEleCnt counts the data units not yet written in the current destination frame. Whether the data transfer is complete is confirmed from the values of the broadcast transfer parameters.
By setting broadcast transfer parameters comprising the source frame count SrcArrCnt, source frame remaining unit count SrcEleCnt, destination frame count DstArrCnt, and destination frame remaining unit count DstEleCnt, this embodiment allows DMA broadcast data transfer to be controlled conveniently by configuring these parameters. A DMA broadcast transfer transaction started by one DSP core is thus implemented simply and efficiently, and the same off-core data block of the GPDSP can be transferred to all DSP cores on the chip in broadcast form.
In this embodiment, the source frame count SrcArrCnt, source frame remaining unit count SrcEleCnt, destination frame count DstArrCnt, and destination frame remaining unit count DstEleCnt satisfy the following formula:
(SrcArrCnt+1) * SrcEleCnt == (DstArrCnt+1) * DstEleCnt;
where SrcArrCnt+1 is the number of data frames to be moved from off-core storage, SrcEleCnt is the number of data units not yet read in the current source frame, DstArrCnt+1 is the number of data frames to be written to in-core storage space, and DstEleCnt is the number of data units not yet written in the current destination frame.
When the DMA performs a broadcast data transfer in this embodiment, the host DMA initiates a broadcast read request for a data block in the off-core DDR storage space, and the returned data is sent to all DSP cores. The source frame count SrcArrCnt represents the number of data frames moved from off-core storage, which is SrcArrCnt+1; SrcEleCnt represents the remaining unit count of the current source frame, a data unit being the smallest unit of DMA transfer; the broadcast transfer size is (SrcArrCnt+1) * SrcEleCnt. When SrcEleCnt reaches 0, read request generation for the current frame is finished and SrcArrCnt is decremented by 1; when SrcArrCnt is 0 and SrcEleCnt is also 0, read request generation is finished. DstArrCnt represents the destination frame count and DstEleCnt the remaining unit count of the current destination frame, where (SrcArrCnt+1) * SrcEleCnt == (DstArrCnt+1) * DstEleCnt.
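The countdown described above can be sketched as a small software model. This is an illustration only: it assumes one data unit is requested per step and that SrcEleCnt is reloaded with the per-frame unit count at the start of each new frame, neither of which the description states explicitly. The function name is hypothetical.

```python
def count_read_requests(src_arr_cnt, src_ele_cnt):
    """Model the SrcEleCnt / SrcArrCnt countdown and return units requested.

    Assumptions (not from the patent): one data unit per request step, and
    SrcEleCnt is reloaded to its configured value when a new frame begins.
    """
    if src_ele_cnt <= 0:
        raise ValueError("a frame must contain at least one data unit")
    frame_len = src_ele_cnt
    arr, ele = src_arr_cnt, src_ele_cnt
    requested = 0
    while True:
        ele -= 1                  # one more unit of the current frame requested
        requested += 1
        if ele == 0:              # current frame finished
            if arr == 0:          # both counters are 0: generation is finished
                break
            arr -= 1              # move on to the next frame
            ele = frame_len       # assumed reload of the per-frame counter
    return requested
```

Under these assumptions the total always equals (SrcArrCnt+1) * SrcEleCnt, which matches the broadcast transfer size given in the text.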
In this embodiment, the method further includes the transfer mode parameter TMODE for configuring the DMA transfer mode; when TMODE is valid, DMA broadcast data transfer is started and executed. Specifically, TMODE = 2'b11 selects broadcast data transfer as the transfer mode; that is, when TMODE = 2'b11 the host DMA starts a broadcast data transfer.
In this embodiment, the broadcast read request includes the read return selection vector RetVec, which identifies the cores to which data is returned; the destination cores to which the read return data must be returned are determined from RetVec. That is, the broadcast read request carries the data return information, and each DSP core determines from the value of RetVec whether the read return data is to be returned to it. RetVec has n bits in total; each bit corresponds to one DSP core and indicates whether the read return data is returned to that core. The broadcast read request further includes a read address, a read mask, a read return address, and so on; that is, the read request carries the read address, read mask, read return address, read return selection vector RetVec, and similar information.
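The RetVec encoding can be illustrated with a small bit-vector sketch. The description only says that each of the n bits corresponds to one DSP core; the mapping of bit i to core i (least significant bit for core 0) and both function names below are assumptions made for the example.

```python
def broadcast_ret_vec(n_cores):
    """Return an n-bit vector with every bit set, as used for a full broadcast."""
    return (1 << n_cores) - 1


def cores_selected(ret_vec, n_cores):
    """List the cores whose RetVec bit is 1.

    Assumption: bit i of the vector corresponds to core i.
    """
    return [core for core in range(n_cores) if (ret_vec >> core) & 1]
```

With 12 cores, `broadcast_ret_vec(12)` is `0xFFF` and selects every core, while a vector such as `0b101` would select only cores 0 and 2 under the assumed bit order.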
In a specific embodiment, when TMODE = 2'b11 (that is, the DMA performs a broadcast data transfer), the DMA initiates the broadcast data transfer once the broadcast transfer parameters have been configured. The DMA generates broadcast read requests from the broadcast parameters SrcArrCnt, SrcEleCnt, DstArrCnt, and DstEleCnt and sends them to the network-on-chip. Each read request includes the read address, read mask, read return address, and the read return selection vector RetVec. RetVec has n bits; in a broadcast transfer all n bits of RetVec are 1, indicating that the read return data is returned to all DSP cores. The off-core storage space returns data to the network-on-chip according to the read request, and all slave DMAs passively receive the data through the network-on-chip and write it into their cores.
This embodiment builds on the point-to-point DMA transfer mode and controls the transfer process by configuring five parameters: the transfer mode TMODE, source frame count SrcArrCnt, source frame remaining unit count SrcEleCnt, destination frame count DstArrCnt, and destination frame remaining unit count DstEleCnt. The host DMA generates broadcast data transfer requests from the configured transfer parameters and at the same time counts the data moved until the broadcast transfer ends. Starting a DMA broadcast transfer transaction on a single DSP core is therefore enough to transfer the same off-core data block of the GPDSP to all DSP cores on the chip in broadcast form.
In this embodiment, when completion of the data transfer is confirmed, the method further includes a buffer-clearing step, specifically: once the host DMA count is complete, the host DMA issues a clear-buffer command to all slave DMAs; each slave DMA receives the clear-buffer command and performs the buffer-clearing operation, and the transfer transaction ends once clearing is done.
DMA broadcast data transfer in a specific embodiment of the present invention is shown in Fig. 3. The chip has 12 DSP cores, each with its own DMA and LM, where LM is the in-core storage space comprising the vector storage component AM and the scalar storage component SM. Array denotes a frame being moved; C0 to C11 denote the rows of the data block, each 512 bits, that is, 8 words. The block size of this broadcast transfer is 4x96 words: the DMA initiates a broadcast data transfer that moves 4 frames in total, 96 words per frame. The DDR data is moved in the direction shown by the arrows in the figure; the DMA first finishes moving one DDR page of data and moves the next page each time the page is switched. The DDR sends data to the network-on-chip according to the read request, and the slave DSPs passively receive the data through the network. As can be seen, the broadcast data transfer mode of this embodiment considerably reduces the number of DDR page switches and the transfer latency, effectively reduces the number of DDR accesses, and improves transfer bandwidth and DDR read efficiency.
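The sizes quoted for Fig. 3 can be checked with a few lines of arithmetic. The comparison with point-to-point transfer is a rough sketch that assumes each of the 12 cores would otherwise read the whole shared block from DDR itself, as the background section describes; the variable names are illustrative.

```python
BITS_PER_ROW = 512      # each row C0..C11 is 512 bits
WORDS_PER_ROW = 8       # stated as 8 words per row
ROWS_PER_FRAME = 12     # rows C0..C11
FRAMES = 4              # frames moved by this broadcast transfer
N_CORES = 12            # DSP cores on the chip

bits_per_word = BITS_PER_ROW // WORDS_PER_ROW        # 64-bit words
words_per_frame = ROWS_PER_FRAME * WORDS_PER_ROW     # 96 words per frame
words_total = FRAMES * words_per_frame               # 4 x 96 = 384 words

# Words fetched from DDR: once for a broadcast, once per core otherwise
# (assumption: every core reads the full shared block point-to-point).
words_read_broadcast = words_total
words_read_point_to_point = N_CORES * words_total
```

Under that assumption the broadcast reads 384 words from DDR where twelve point-to-point transfers would read 4608, a factor equal to the number of cores.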
The transfer parameter word for DMA broadcast data transfer in a specific embodiment of the present invention is shown in Fig. 4. It comprises the transfer mode TMODE, source frame count SrcArrCnt, source frame remaining unit count SrcEleCnt, destination frame count DstArrCnt, and destination frame remaining unit count DstEleCnt. TMODE is 2 bits wide; when TMODE is 2'b11, the DMA starts a broadcast data transfer and moves the same data block from the off-core DDR storage space into the 12 DSP cores. SrcArrCnt is the source frame count, 32 bits wide, for a maximum of 2^32 frames. SrcEleCnt is the remaining unit count of the current source frame, 32 bits wide, with a maximum value of 2^32 - 1. DstArrCnt is the destination frame count, 32 bits wide, for a maximum of 2^32 frames. DstEleCnt is the remaining unit count of the current destination frame, with a maximum value of 2^32 - 1.
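The field widths listed above can be captured in a small container that rejects out-of-range values. This is a sketch under stated assumptions: the class and attribute names are hypothetical, the bit layout of the parameter word in Fig. 4 is not reproduced here, and DstEleCnt is treated as 32 bits wide because its stated maximum is 2^32 - 1.

```python
from dataclasses import dataclass

TMODE_BROADCAST = 0b11      # 2'b11 selects broadcast transfer
MAX_U32 = 2**32 - 1         # largest value a 32-bit field can hold


@dataclass(frozen=True)
class BroadcastParams:
    """Hypothetical container for the five broadcast transfer parameters."""
    tmode: int
    src_arr_cnt: int
    src_ele_cnt: int
    dst_arr_cnt: int
    dst_ele_cnt: int

    def __post_init__(self):
        if not 0 <= self.tmode <= 0b11:
            raise ValueError("TMODE is a 2-bit field")
        for name in ("src_arr_cnt", "src_ele_cnt", "dst_arr_cnt", "dst_ele_cnt"):
            if not 0 <= getattr(self, name) <= MAX_U32:
                raise ValueError(f"{name} does not fit in 32 bits")

    @property
    def is_broadcast(self):
        return self.tmode == TMODE_BROADCAST

    @property
    def source_frames(self):
        # The field stores "frames minus one", so 32 bits cover up to 2**32 frames.
        return self.src_arr_cnt + 1
```

Storing the frame count minus one is what lets a 32-bit field describe up to 2^32 frames, which is consistent with the maximum quoted in the text.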
As shown in Figure 5, the specific steps by which this embodiment implements DMA broadcast data transmission in the GPDSP are:
S1. Set in advance the broadcast transmission parameters, comprising the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt;
S2. After the broadcast transmission parameters are configured, the host DMA starts the DMA broadcast data transfer and generates a broadcast read request from the parameters SrcArrCnt, SrcEleCnt, DstArrCnt and DstEleCnt. The read request includes a read address, a read mask, a read return address, and the read-return selection vector RetVec. The generated broadcast read request is sent through the network-on-chip to the off-core memory space;
S3. The off-core memory space sends read return data to the network-on-chip according to the broadcast read request. Each core in the GPDSP receives the read return data from the network-on-chip and writes it to its in-core memory space. The host DMA receives the read return data and updates the broadcast transmission parameters to count the data transferred to its own core;
S4. When the host DMA receives the last block of data, counting is complete. The host DMA issues a buffer-clear command to all DSP cores; each slave DMA performs the clear operation on receiving the command and, once the clear is complete, issues an interrupt request according to its interrupt enable bit. After the slave DMA buffers are cleared, the internal preset broadcast end register BOR is set, and the broadcast transfer transaction ends.
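The broadcast read request generated in step S2 can be pictured as a small record. The sketch below is only an illustration: the field names are hypothetical, the text does not specify the encoding of the read mask, and mapping bit i of RetVec to core i is an assumption (claim 6 states only that each bit indicates whether one participating core needs the read return data).

```python
from dataclasses import dataclass

NUM_CORES = 12  # cores in the embodiment


@dataclass(frozen=True)
class BroadcastReadRequest:
    """Hypothetical layout of a broadcast read request."""
    read_addr: int     # read address in the off-core memory space (DDR)
    read_mask: int     # read mask; encoding not specified in the text
    return_addr: int   # read return address
    ret_vec: int       # RetVec: assumed bit i set -> core i receives the data


def make_ret_vec(core_ids):
    """Build a RetVec with one bit set per participating core."""
    vec = 0
    for cid in core_ids:
        if not 0 <= cid < NUM_CORES:
            raise ValueError(f"core id {cid} out of range")
        vec |= 1 << cid
    return vec


def target_cores(ret_vec):
    """List the cores selected by a RetVec."""
    return [cid for cid in range(NUM_CORES) if (ret_vec >> cid) & 1]


# A request that returns data to all 12 cores; the addresses are made-up values.
request = BroadcastReadRequest(
    read_addr=0x8000_0000,
    read_mask=0xFF,
    return_addr=0x0000_1000,
    ret_vec=make_ret_vec(range(NUM_CORES)),
)
print(hex(request.ret_vec), target_cores(request.ret_vec))
```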
In this embodiment, step S3 further includes a data validity check when the host DMA receives read return data: the host DMA determines whether the data is valid; if it is valid, the data is forwarded to the in-core memory space and the host DMA then counts; if it is invalid, the host DMA counts directly.
As shown in Figure 5, once the broadcast transfer parameter word is configured in this embodiment, the host DMA initiates the broadcast transfer by sending a broadcast read request to the off-core memory space DDR. The DDR returns the read data to the network-on-chip; the host DMA receives the read return data from the network-on-chip, and all other cores passively receive it from the network-on-chip. If the host DMA finds the data valid, it writes the data to its own in-core memory space; if the data is invalid, it only counts. When the host finishes counting, it issues a buffer-clear command to the other cores. Each slave sets its broadcast end flag register to 1 and, after the clear operation completes, raises an interrupt if its interrupt enable bit is 1.
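The flow just described can be mimicked with a toy simulation. The sketch below is not the patented hardware: the class and function names are hypothetical, clearing a slave buffer is modeled as draining buffered data into core memory (an interpretation of the buffer-clear operation), and the relative order of setting BOR and raising the interrupt is simplified. What it does preserve is the key idea that only the host counts, and that it counts every returned block whether or not the data is valid.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveDMA:
    """Toy slave DMA: buffers broadcast data until told to clear."""
    interrupt_enable: bool = True
    buffer: list = field(default_factory=list)   # data received passively
    memory: list = field(default_factory=list)   # in-core memory space
    bor: int = 0                                 # broadcast end register
    interrupt_raised: bool = False

    def receive(self, data):
        self.buffer.append(data)

    def clear_buffer(self):
        # Assumed meaning of "clear": drain the buffer into core memory.
        self.memory.extend(self.buffer)
        self.buffer.clear()
        self.bor = 1
        if self.interrupt_enable:
            self.interrupt_raised = True


def run_broadcast(returned_blocks, total_blocks, slaves):
    """Host-side loop: count every returned block, store only valid ones."""
    host_memory = []
    remaining = total_blocks
    for data, valid in returned_blocks:
        for slave in slaves:
            slave.receive(data)          # slaves receive passively
        if valid:
            host_memory.append(data)     # valid data goes to host core memory
        remaining -= 1                   # counted whether valid or not
        if remaining == 0:
            for slave in slaves:
                slave.clear_buffer()     # host tells the slaves to clear
            break
    return host_memory, remaining


slaves = [SlaveDMA() for _ in range(11)]
blocks = [("blk0", True), ("blk1", False), ("blk2", True)]
host_memory, remaining = run_broadcast(blocks, total_blocks=3, slaves=slaves)
print(host_memory, remaining, slaves[0].bor, slaves[0].interrupt_raised)
```

Keeping the counter in the host alone means the slaves need no completion logic of their own; they only react to the clear command.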
The above are merely preferred embodiments of the invention and do not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, these are not intended to limit it. Therefore, any simple modification, equivalent change, or refinement made to the above embodiments in accordance with the technical essence of the invention, without departing from the content of its technical solution, shall fall within the scope of protection of the technical solution of the invention.

Claims (10)

1. A DMA broadcast data transmission method based on host counting in a GPDSP, characterized in that the method comprises: a host DMA starts a DMA broadcast data transfer, generates a broadcast read request, and sends it through the network-on-chip to the off-core memory space; the off-core memory space sends read return data to the network-on-chip according to the broadcast read request; each core in the GPDSP receives the read return data from the network-on-chip and writes it to its in-core memory space; and the host DMA receives the read return data and counts it to confirm whether the data transfer is complete.
2. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, characterized in that confirming whether the transfer is complete specifically comprises: setting in advance broadcast transmission parameters comprising a source frame count SrcArrCnt, a source-frame remaining unit count SrcEleCnt, a destination frame count DstArrCnt, and a destination-frame remaining unit count DstEleCnt, wherein SrcArrCnt configures the number of frames of data moved from off-core memory, SrcEleCnt counts the data units not yet read in the current source frame, DstArrCnt configures the number of data frames written to the in-core memory space, and DstEleCnt counts the data units not yet written in the current destination frame; and confirming whether the data transfer is complete according to the values of the broadcast transmission parameters.
3. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 2, characterized in that the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt satisfy the following formula:
(SrcArrCnt + 1) * SrcEleCnt == (DstArrCnt + 1) * DstEleCnt;
where SrcArrCnt + 1 is the number of frames of data to be moved from off-core memory, SrcEleCnt is the number of data units not yet read in the current source frame, DstArrCnt + 1 is the number of data frames to be written to the in-core memory space, and DstEleCnt is the number of data units not yet written in the current destination frame.
4. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, 2 or 3, characterized in that the method further comprises a transmission mode parameter TMODE for configuring the DMA transmission mode; when the transmission mode parameter TMODE is valid, DMA broadcast data transmission is started.
5. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, 2 or 3, characterized in that the broadcast read request includes a read-return selection vector RetVec that identifies the cores to which data is returned, and the destination cores to which the read return data must be returned are determined from the read-return selection vector RetVec.
6. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 5, characterized in that the read-return selection vector RetVec has multiple bits, each bit indicating whether the corresponding core participating in the transfer needs the read return data returned to it.
7. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 6, characterized in that the broadcast read request further includes one or more of a read address, a read mask, and a read return address.
8. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, 2 or 3, characterized in that, when the data transfer is confirmed complete, the method further comprises a buffer-clear step: the host DMA issues a buffer-clear command to all slave DMAs, and each slave DMA receives the buffer-clear command and performs the buffer-clear operation, ending the broadcast transfer.
9. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 1, 2 or 3, characterized in that the specific steps of the method are:
S1. Set in advance the broadcast transmission parameters, comprising the source frame count SrcArrCnt, the source-frame remaining unit count SrcEleCnt, the destination frame count DstArrCnt, and the destination-frame remaining unit count DstEleCnt;
S2. After the broadcast transmission parameters are configured, the host DMA starts the DMA broadcast data transfer, generates a broadcast read request from the broadcast transmission parameters, and sends it through the network-on-chip to the off-core memory space;
S3. The off-core memory space sends read return data to the network-on-chip according to the broadcast read request; each core in the GPDSP receives the read return data from the network-on-chip and writes it to its in-core memory space; the host DMA receives the read return data and updates the broadcast transmission parameters to count;
S4. When the host DMA receives the last block of data, counting is complete; the host DMA issues a buffer-clear command to all DSP cores; each slave DMA performs the clear operation on receiving the command and, once the clear is complete, issues an interrupt request according to its interrupt enable bit; after the slave DMA buffers are cleared, the internal preset broadcast end register BOR is set, and the broadcast transfer transaction ends.
10. The DMA broadcast data transmission method based on host counting in a GPDSP according to claim 9, characterized in that step S3 further includes a data validity check when the host DMA receives read return data: the host DMA determines whether the data is valid; if it is valid, the data is forwarded to the in-core memory space and the host DMA then counts; if it is invalid, the host DMA counts directly.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711480231.7A CN108234147B (en) 2017-12-29 2017-12-29 DMA broadcast data transmission method based on host counting in GPDSP

Publications (2)

Publication Number Publication Date
CN108234147A true CN108234147A (en) 2018-06-29
CN108234147B CN108234147B (en) 2021-06-18

Family

ID=62647085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711480231.7A Active CN108234147B (en) 2017-12-29 2017-12-29 DMA broadcast data transmission method based on host counting in GPDSP

Country Status (1)

Country Link
CN (1) CN108234147B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521201A (en) * 2011-11-16 2012-06-27 刘大可 Multi-core DSP (digital signal processor) system-on-chip and data transmission method
US20150103826A1 (en) * 2009-10-30 2015-04-16 Calxeda Inc. System and method for using a multi-protocol fabric module across a distributed server interconnect fabric
CN104679691A (en) * 2015-01-22 2015-06-03 中国人民解放军国防科学技术大学 Multi-core DMA (direct memory access) subsection data transmission method used for GPDSP and adopting host counting
CN104679689A (en) * 2015-01-22 2015-06-03 中国人民解放军国防科学技术大学 Multi-core DMA (direct memory access) subsection data transmission method used for GPDSP (general purpose digital signal processor) and adopting slave counting

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
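张帅 (Zhang Shuai): "一种支持多种传输模式的DMA主机模块设计与实现" [Design and Implementation of a DMA Host Module Supporting Multiple Transmission Modes], 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *
胡月安 (Hu Yue'an): "32位高性能M_DSP中支持高效数据传输的DMA设计与验证" [Design and Verification of a DMA Supporting Efficient Data Transmission in a 32-bit High-Performance M_DSP], 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *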

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114024920A (en) * 2021-11-24 2022-02-08 苏州暴雪电子科技有限公司 Data packet routing method for on-chip message network
CN114024920B (en) * 2021-11-24 2023-10-27 苏州暴雪电子科技有限公司 Data packet routing method for on-chip message network
CN118170702A (en) * 2024-05-13 2024-06-11 北京壁仞科技开发有限公司 DMA controller and data handling method for broadcasting

Also Published As

Publication number Publication date
CN108234147B (en) 2021-06-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant