CN102567256A - Processor system, multi-channel memory copy DMA accelerator, and method thereof

Info

Publication number: CN102567256A
Application number: CN2011104255307A
Authority: CN
Inventors: 苏文; 苏孟豪
Original assignee: Loongson Technology Corp Ltd
Current assignee: Loongson Technology Corp Ltd
Priority date: 2011-12-16
Filing date: 2011-12-16
Publication date: 2012-07-11
Anticipated expiration: 2031-12-16
Also published as: CN102567256B

Abstract

The invention provides a processor system, a multi-channel memory copy DMA (direct memory access) accelerator, and a method thereof. The processor system comprises a multi-channel DMA accelerator connected between a processor core and a memory through a data bus. When the processor core issues the data read/write request of a memory copy command, the multi-channel DMA accelerator evaluates and decomposes the request according to its task information. Then, according to the decomposed task information, the read/write frequencies and priorities of the read/write channels given in that information, and the values of the channels' tag bits, it controls a plurality of read/write channels to issue multiple read/write requests to the memory in parallel, thereby completing the data transfer. The processor system offers high bandwidth, low latency, a high degree of parallelism, reconfigurability, and platform independence.

Description

Processor system, multi-channel memory copy DMA accelerator, and method

Technical field

The present invention relates to the fields of computer hardware architecture and processor design, and in particular to a processor system based on a multi-channel direct memory access (Direct Memory Access, DMA) engine embedded in the processor, which supports asynchronous memory access and parallel memory reads and writes, and to a multi-channel memory copy DMA accelerator and method.

Background art

In existing computer systems, memory copy is an important operation that transfers data between different locations in memory. It is widespread in operating systems and in all kinds of applications; related research has found that memory copy operations can account for 20%-40% of the total time overhead in TCP/IP protocol processing. In the operating system, memory copy is implemented through standard system functions defined by the kernel, such as memcpy and bcopy. The concrete implementation of this group of functions differs across computer architectures. In user programs, the C standard library (ANSI C) also provides implementations of memory copy functions.

As shown in Fig. 1, the structure of an existing processor system that performs memory copy comprises a processor core (CPU) 1, a level-2 cache (Cache) module 2, and a memory 3, wherein the processor core comprises a control module 11, an arithmetic unit 12, a register file 13, a decode and memory access unit 14, a level-1 cache (Cache) module 15, and so on. At the microscopic level, a typical memory copy operation can be decomposed into a series of alternating read and write operations on memory. The processor core first issues a read to address A; after it completes, the core issues a write that stores the returned value V(A) to address B; it then issues a read request to A+1 and writes the returned result V(A+1) to B+1. This process repeats until the whole memory copy operation is complete.
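The alternating read/write sequence described above can be sketched as follows. This is only an illustrative model: the function name `serial_copy` and the use of a `bytearray` to stand in for memory are assumptions made for the example, not part of the patent.

```python
def serial_copy(memory, src, dst, length):
    """Model of the processor-driven copy: one read, then one write, repeated."""
    for i in range(length):
        value = memory[src + i]   # read request to A+i, wait for V(A+i)
        memory[dst + i] = value   # write V(A+i) to B+i before the next read


# Hypothetical usage: copy 8 bytes from address 0 to address 8.
mem = bytearray(range(16))
serial_copy(mem, 0, 8, 8)
```

Every iteration is issued by the core itself, which is why the core cannot do other work while the copy is in progress.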

One existing memory copy acceleration method optimizes memory copy by instruction reordering. According to the pipeline characteristics of a processor of a particular architecture, this method rearranges the instructions of the memory copy function so as to obtain a continuous stream of memory accesses, improving memory access efficiency and reducing latency. It embeds the corresponding assembly code in the operating system kernel's memory copy function, replacing the original generic C code to improve execution efficiency. It also reorders the assembly memory access instructions according to the characteristics of the particular architecture to reduce pipeline stalls during instruction execution; for example, under the MIPS architecture the load and store instructions of the memory copy function are arranged in groups of four.

Another existing memory copy acceleration method optimizes the synchronization of memory copy and memory access. This method adds an extra hardware module that records and analyzes the addresses of memory copy operations and memory access operations, so that the processor is not blocked and instruction execution efficiency improves. It also provides optimized copy primitives in the operating system to synchronize the copy process with other memory access processes.

Existing memory copy acceleration methods have the following shortcomings:

(1) Low system efficiency. In the prior art the processor must still execute the associated memory access and control instructions during a memory copy, so it cannot perform other operations for the duration of the copy; in essence this is a serial, synchronous memory copy controlled by the processor.

(2) Slow copying. Prior-art processors generally integrate one or two memory access units, and a later memory access instruction can execute only after the current one completes. Existing memory copy methods therefore access memory serially at the microscopic level and cannot perform unrelated memory reads and writes at the same time, which makes copying slow.

(3) No general compatibility. These methods are closely tied to the processor structure and instruction set; a memory copy routine optimized for one architecture is not compatible with another.

Summary of the invention

The object of the present invention is to provide a processor system and a multi-channel memory copy DMA accelerator and method thereof, which offer high bandwidth, low latency, a high degree of parallelism, reconfigurability, and platform independence.

To achieve this object, a processor system is provided, comprising a processor core and a memory, and further comprising a multi-channel DMA accelerator connected between the processor core and the memory through a data bus;

The multi-channel DMA accelerator is configured, when the processor core issues a memory copy command and thereby produces a data read/write request, to evaluate and decompose the request according to its task information, and, according to the decomposed task information, the read/write frequencies and priorities of the read/write channels given therein, and the values of the channels' tag bits, to control a plurality of read/write channels to issue multiple read/write requests to the memory in parallel and complete the data transfer.

Preferably, the processor system further comprises a cache module connected between the memory and the multi-channel DMA accelerator, for caching data transferred between the memory and the multi-channel DMA accelerator.

Preferably, the multi-channel DMA accelerator comprises at least one DMA engine module and at least two interfaces;

The DMA engine module is configured to evaluate and decompose the data read/write request according to its task information, and, according to the decomposed task information, the read/write frequencies and priorities of the read/write channels given therein, and the values of the channels' tag bits, to control a plurality of read/write channels to issue multiple read/write requests to the memory in parallel;

The at least two interfaces are at least one data interface and one control and communication interface;

The data interface is used to transfer the data to be read or written by the data read/write request of the memory copy command;

The control and communication interface is used to communicate with the processor core and the memory, to receive configuration data from the processor core, and to write that configuration data into the configuration registers of the read/write channels.

Preferably, the DMA engine module comprises a plurality of read/write channels with their corresponding tag bits, and two flow control units and a data buffer connecting the processor core and the memory;

The plurality of read/write channels comprises at least one read channel and one write channel;

The read channel is used, under the control of the flow control unit, to read data from the memory to the processor core;

The write channel is used, under the control of the flow control unit, to write data sent by the processor core to the memory;

The flow control unit is configured, according to the configuration data in the configuration registers and the values of the channels' tag bits, and using the priorities set at initialization, to track each read/write channel's use of the data bus, to grant the data bus to the read/write channel with the highest priority, to control the frequency and priority of each read/write channel, to control different channels to start different data transfer tasks, and to exchange the working state of the read/write channels with the control module;

The data buffer is used to buffer the data of the read/write channels;

Each read/write channel comprises a configuration register for receiving and storing the configuration data sent by the processor core, which the read/write channel uses for its reads and writes;

Each tag bit marks the amount of data read or written by each read/write request of the read/write channel.

Preferably, the control module of the processor core comprises an initialization subunit and a configuration subunit, wherein:

The initialization subunit is used, when the processor core initializes, to initialize the flow control unit, set its initial state, and start the channels' data transfer tasks;

The configuration subunit is used to send configuration data for each read/write channel to that channel's configuration register through the control and communication interface, thereby setting the value of the configuration register of each read/write channel.

To achieve the object of the invention, a multi-channel memory copy DMA accelerator is also provided, configured, upon receiving a copy data read/write request sent by the processor core to the memory, to evaluate and decompose the request according to its task information, and, according to the decomposed task information, the read/write frequencies and priorities of the read/write channels given therein, and the values of the channels' tag bits, to control a plurality of read/write channels to issue multiple read/write requests to the memory in parallel and complete the data transfer.

To achieve the object of the invention, a memory copy acceleration method is further provided, comprising the following steps:

Step S101: the control module of the processor core sends a memory copy command to the multi-channel DMA accelerator;

Step S102: upon receiving the memory copy data read/write request sent by the processor core to the memory, the multi-channel memory copy DMA accelerator evaluates and decomposes the request according to its task information and, according to the decomposed task information, the read/write frequencies and priorities of the read/write channels given therein, and the values of the channels' tag bits, controls a plurality of read/write channels to issue multiple read/write requests to the memory in parallel, reading and writing data in parallel until all read/write operations are complete.

Preferably, in step S102, evaluating and decomposing the task information of the data read/write request comprises the following steps:

The multi-channel memory copy DMA accelerator makes a determination according to the task information: when the total length of the data to be copied is greater than the bus width of a single channel of the multi-channel DMA accelerator, the task information of the memory copy command is decomposed, and the multi-channel DMA accelerator issues multiple read/write requests to the memory through a plurality of channels according to the decomposed task information;

Otherwise, the multi-channel memory copy DMA accelerator randomly selects one channel to issue the data read/write request to the memory.
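The decision above can be illustrated with a minimal sketch. The text only states the condition (total length versus single-channel bus width) and the random choice for short copies; the function name `decompose_copy`, the 16-byte default width, the channel count of 4, and the round-robin assignment of pieces to channels are all assumptions made for this example.

```python
import random


def decompose_copy(src, dst, length, bus_width_bytes=16, num_channels=4, rng=random):
    """Return a list of (channel, src_addr, dst_addr, size) requests."""
    if length <= bus_width_bytes:
        # Short copy: one request on a randomly selected channel.
        return [(rng.randrange(num_channels), src, dst, length)]
    requests = []
    offset = 0
    while offset < length:
        size = min(bus_width_bytes, length - offset)
        # Assumed policy: hand out bus-width pieces to channels in turn.
        channel = (offset // bus_width_bytes) % num_channels
        requests.append((channel, src + offset, dst + offset, size))
        offset += size
    return requests


# Hypothetical usage: a 40-byte copy is split into pieces of 16, 16 and 8 bytes.
reqs = decompose_copy(0x1000, 0x2000, 40)
```

The pieces on different channels can then be issued in parallel, which is the property the decomposition is meant to provide.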

Preferably, the parallel data reading and writing comprises the following steps:

Step S1021: the cache module, connected between the memory and the multi-channel DMA accelerator, caches the data transferred between the memory and the multi-channel DMA accelerator;

After receiving a data read/write request from the multi-channel DMA accelerator, the cache module determines whether a copy of the data accessed by the request already exists in the cache module;

Step S1022: if a copy of the accessed data exists in the cache module, step S1023 is executed; otherwise step S1024 is executed;

Step S1023: the accessed data is read or written in the cache module according to the data read/write request, i.e. the memory access hits in the cache; for a read operation the requested data is returned, and for a write operation the cache block holding the accessed data is updated;

Step S1024: the accessed data cannot be found in the cache module, i.e. the memory access misses in the cache; this triggers a cache replacement operation that brings the data to be accessed from external memory into the cache module, after which the requested data is returned for a read operation, or the cached copy of the accessed data is updated for a write operation.
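Steps S1022 to S1024 can be sketched as a simple hit/miss lookup. This is a simplified model under stated assumptions: the class name `CacheModule` is hypothetical, the cache is addressed per location rather than per block, capacity is unlimited (so no victim is ever evicted), and write-back to memory is not modeled.

```python
class CacheModule:
    """Toy model of the cache between the DMA accelerator and memory."""

    def __init__(self, memory):
        self.memory = memory      # backing store: address -> value
        self.lines = {}           # cached copies: address -> value
        self.hits = 0
        self.misses = 0

    def access(self, addr, write_value=None):
        if addr in self.lines:
            self.hits += 1                        # S1023: cache hit
        else:
            self.misses += 1                      # S1024: cache miss
            self.lines[addr] = self.memory[addr]  # bring the data into the cache
        if write_value is not None:
            self.lines[addr] = write_value        # update the cached copy
            return None
        return self.lines[addr]                   # return the data for a read


# Hypothetical usage
mem = {0: 11, 1: 22}
cache = CacheModule(mem)
first = cache.access(0)            # miss, filled from memory
second = cache.access(0)           # hit
cache.access(1, write_value=99)    # miss, then the cached copy is updated
third = cache.access(1)            # hit, returns the updated copy
```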

Preferably, step S102 further comprises the following steps:

Step S201: the processor core sets the configuration register of each channel of the multi-channel DMA accelerator and initializes the value of s-tag;

Step S202: each channel of the multi-channel DMA module starts the corresponding data transfer task according to the value of the configuration register set in step S201;

Step S203: in each clock cycle, each channel sets the value of p-tag according to its own working state and at the same time checks tag0 to tag7 of s-tag;

If the channel has completed the amount of data represented by an s-tag tag bit initialized in step S201, it pauses and sends an interrupt request to the processor core, and step S204 is then executed; otherwise the flow returns to step S202 and continues;

Step S204: the processor core checks the value of s-tag, queries the value of p-tag, and determines from the two values whether the data transfer is complete; if so, the whole transfer task ends; otherwise, the transfer of the data represented by the next tag bit begins.

Preferably, the cache module is a level-2 cache module.

The processor system, multi-channel memory copy accelerator, and method of the present invention have the following beneficial effects:

(1) High bandwidth and low latency: by adding a multi-channel direct memory access (DMA) accelerator to the processor and pipelining the processing across multiple channels, the invention can obtain very high bandwidth and very low latency;

(2) High degree of parallelism: by providing each channel of the multi-channel DMA accelerator with a group of tag bits that mark the channel's working state, the invention realizes a finer-grained mechanism for parallel operation between processor core computation and direct memory access (Direct Memory Access, DMA) data transfer;

(3) Reconfigurability: through the control and communication interface, the invention lets the CPU reconfigure the data transfer process of the direct memory access (DMA) module in real time;

(4) Platform independence: the invention effectively avoids dependence on the software platform and has good portability.

Description of drawings

Fig. 1 is a schematic structural diagram of an existing processor system that performs memory copy;

Fig. 2 is a schematic structural diagram of the processor system that performs memory copy according to an embodiment of the invention;

Fig. 3 is a schematic structural diagram of the multi-channel DMA accelerator of Fig. 2;

Fig. 4 is a schematic structural diagram of the control module of Fig. 2;

Fig. 5 is a schematic diagram of the interaction among the processor core (CPU), the multi-channel DMA accelerator, and the cache module during one memory copy operation according to an embodiment of the invention;

Fig. 6 is a schematic diagram of the working process of the parallel memory copy function memcpy() that uses the processor multi-channel memory copy method of the invention.

Detailed description of embodiments

In order to make the objects, technical solutions, and advantages of the invention clearer, the processor system and its multi-channel memory copy DMA accelerator and method are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.

Embodiment one

As shown in Fig. 2, the processor system of an embodiment of the invention comprises a processor core 1, a memory 3, and a multi-channel DMA (Direct Memory Access) accelerator 4 connected between the processor core 1 and the memory 3 through a data bus;

The multi-channel DMA accelerator 4 is configured, when the processor core 1 issues the data read/write request of a memory copy command, to evaluate and decompose the request according to its task information, and, according to the decomposed task information, the read/write frequencies and priorities of the read/write channels given therein, and the values of the channels' tag bits, to control a plurality of read/write channels to issue multiple read/write requests to the memory 3 in parallel and complete the data transfer.

Preferably, as one possible implementation, the processor system of the embodiment further comprises a cache module 2 connected between the memory 3 and the multi-channel DMA accelerator 4, for caching data transferred between the memory 3 and the multi-channel DMA accelerator 4.

As one possible implementation, as shown in Fig. 2, the processor core 1 comprises a control module 11, an arithmetic unit 12, a register file 13, a decode and memory access unit 14, a level-1 cache (Cache) module 15, and so on. The control module 11, arithmetic unit 12, register file 13, decode and memory access unit 14, and level-1 cache (Cache) module 15 are packaged as a single hardware module, the processor core 1, and the multi-channel DMA accelerator 4 is connected through the data bus between the control module 11 of the processor core and the memory 3.

By integrating the multi-channel DMA accelerator 4, the processor system of the embodiment eliminates the processor core's involvement in the memory copy process. Using the multi-channel DMA accelerator 4 integrated in the processor system together with the existing cache (Cache) module 2, it minimizes the latency of DMA reads and writes to memory, reduces the impact on the memory subsystem while remaining structure-independent, and allows the arithmetic unit's computation to proceed in parallel with the asynchronous DMA engine's memory copy operation.

As one possible implementation, the multi-channel DMA accelerator 4 of the embodiment is located inside the processor system, connected between the control module 11 of the processor core 1 and the memory 3, opening a direct data path between the processor core 1 and the memory 3.

As one possible implementation, as shown in Fig. 3, the multi-channel DMA accelerator 4 of the embodiment comprises at least one DMA engine module 41 and two interfaces 42, 43;

The DMA engine module 41 is configured to evaluate and decompose the data read/write request according to its task information and, according to the decomposed task information, to issue multiple read/write requests to the memory 3 through a plurality of channels.

Preferably, as shown in Fig. 3, the DMA engine module 41 of the embodiment comprises a plurality of read/write channels with their corresponding tag bits 412, two flow control units 411, and a data buffer 413.

The plurality of read/write channels 412 comprises at least one read channel and one write channel.

The read channel is used, under the control of the flow control unit, to read data from the memory 3 to the processor core;

The write channel is used, under the control of the flow control unit, to write data sent by the processor core to the memory 3.

The flow control unit 411 is configured, according to the configuration data sent by the processor core and the values of the channels' tag bits, and using the priorities set at initialization, to track each read/write channel's use of the data bus, to grant the data bus to the read/write channel with the highest priority, to control the frequency and priority of each read/write channel, to control different channels to start different data transfer tasks, and to exchange the working state of the read/write channels with the control module 11.

As one possible implementation, the flow control unit 411 uses a group of registers to record the number of memory access requests each DMA channel has issued so far (each time a channel issues a request, its register is incremented by 1). When several channels issue memory access requests at the same time, the channel with the smallest register value obtains the highest priority, according to the fair scheduling principle.
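A minimal sketch of this fair scheduling rule follows. The class name `FairArbiter` is hypothetical, and breaking ties in favor of the lowest channel index is an assumption; the text does not say how equal counts are resolved.

```python
class FairArbiter:
    """Grant the bus to the requesting channel that has issued the fewest requests."""

    def __init__(self, num_channels):
        # One counter register per DMA channel.
        self.sent = [0] * num_channels

    def grant(self, requesting):
        # Smallest counter wins; ties go to the lowest index (assumed policy).
        winner = min(requesting, key=lambda ch: (self.sent[ch], ch))
        self.sent[winner] += 1    # the winner issues one request
        return winner


# Hypothetical usage: three channels request the bus in every cycle.
arb = FairArbiter(3)
grants = [arb.grant([0, 1, 2]) for _ in range(6)]
```

With all channels requesting continuously, the counters stay within one of each other, so no channel is starved.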

The data buffer 413 is used to buffer the data of the read/write channels.

Each read/write channel 412 comprises a configuration register 4121 for receiving and storing the configuration data sent by the processor core, which the read/write channel uses for its reads and writes.

As shown in Fig. 3, the at least two interfaces of the multi-channel DMA accelerator 4 in the embodiment are at least one data interface 43 and one control and communication interface 42.

The data interface 43 is used to transfer the data to be read or written by the data read/write request of the memory copy command;

The control and communication interface 42 is used to communicate with the processor core 1 and the memory 3, to receive configuration data from the processor core, and to write that configuration data into the configuration registers 4121 of the read/write channels.

Preferably, as one possible implementation, the interface is a 128-bit AXI (Advanced eXtensible Interface) bus interface, and reads and writes of memory locations are completed over this data bus; the cache module 2 is a level-2 cache module.

Correspondingly, as one possible implementation, as shown in Fig. 4, the control module 11 of the processor core comprises an initialization subunit 111 and a configuration subunit 112, wherein:

The initialization subunit 111 is used, when the processor core initializes, to initialize the flow control unit 411, set its initial state, and start the channels' data transfer tasks.

The configuration subunit 112 is used to send configuration data for each read/write channel to that channel's configuration register 4121 through the control and communication interface 42, thereby setting the value of the configuration register of each read/write channel.

Preferably, the configuration data includes information such as the source address, the destination address, and the data segment length.

The processor core 1 can set the value of the configuration register 4121 of each read/write channel through the control and communication interface 42, so that the flow control unit can start different data transfer tasks according to the configuration data.

As the core component of the embodiment, the DMA engine module 41 reduces the number of instructions the processor core executes, supports memory copy operations with a continuous stride, cache (Cache) coherence, and program locality, and increases the parallelism between computation and data transfer during memory copy, thereby achieving very high program execution efficiency with very low power and area overhead.

As one possible implementation, the DMA engine module 41 of the embodiment, as shown in Fig. 3, comprises three read channels and one write channel; each channel has its own independent configuration register and tag bits (tag). While one read/write channel is starting up, the other read/write channels can carry on their data transfers; that is, the four read/write channels can process data in parallel. This effectively reduces the overhead of starting, switching, pausing, and restarting channels. In addition, because multiple channels can work in parallel, the data transfer bandwidth increases greatly.

The configuration subunit 112 of the control module 11 of the processor core 1 sets the value of each channel's configuration register 4121 through the control and communication interface 42; preferably, the value of the channel configuration register 4121 includes the source address, the destination address, the data segment length, and so on. At the same time, the initialization subunit of the control module completes the initialization of the flow control unit and starts the channels' data transfer tasks.

During data transfer, the flow control unit 411 feeds back each read/write channel's tag bits (tag) to the control module 11 of the processor core through the control and communication interface 42, and the control module 11 of the processor core uses them to identify the working state of each channel.

As one possible implementation, in the embodiment the data buffer operates in first-in first-out (FIFO) mode and holds 2K bytes of data; data read back by a read channel is first held in the data buffer and then written back to memory by the write channel.
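The buffer behavior can be sketched as a bounded FIFO. The class name `DataBuffer` and the choice to accept only as many bytes as fit (rather than, say, stalling the read channel) are assumptions made for this example; only the FIFO order and the 2K-byte capacity come from the text.

```python
from collections import deque


class DataBuffer:
    """Toy 2K-byte FIFO between the read channels and the write channel."""

    CAPACITY = 2048

    def __init__(self):
        self.fifo = deque()

    def push(self, data):
        """Called by a read channel; returns the number of bytes accepted."""
        room = self.CAPACITY - len(self.fifo)
        accepted = data[:room]        # assumed policy: drop nothing, accept what fits
        self.fifo.extend(accepted)
        return len(accepted)

    def pop(self, n):
        """Called by the write channel; returns up to n bytes in arrival order."""
        n = min(n, len(self.fifo))
        return bytes(self.fifo.popleft() for _ in range(n))


# Hypothetical usage
buf = DataBuffer()
accepted = buf.push(b"abcdef")
head = buf.pop(4)
```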

Using the priorities set at initialization, the flow control unit 411 tracks each channel's use of the bus and grants the bus to the channel with the highest priority, thereby controlling the frequency and priority of each channel.

To reduce the latency of control and interaction, in the embodiment the multi-channel DMA accelerator 4 provides a group of tag bits (tag) in each read/write channel, and the flow control unit controls the use of those tag bits.

Specifically, each group of tag bits (tag) comprises one 8-bit s-tag (tag0-tag7) and one 8-bit p-tag (tag0-tag7).

Here s-tag denotes the control tag and p-tag denotes the status tag.

The s-tag lets the CPU preset the working conditions of a DMA channel, so that the DMA can be controlled after it has started working. As one possible implementation, in the embodiment s-tag has 8 bits, numbered 0-7, and the whole DMA transfer is divided into 8 segments (in increments of 1/8). For example, if bits 1 and 5 of s-tag are set to 1 in advance, the DMA is stalled when it has transferred 2/8 and 6/8 of the configured total data length, respectively. Concretely, if 80 bytes are to be transferred by DMA, then once s-tag bits 1 and 5 are set, the DMA automatically enters the paused state when it has transferred 20 bytes and 60 bytes (this counting and comparison is done by the transfer statistics function, i.e. a counter register, in the flow control unit); after a pause, the CPU issues a command to resume the transfer manually.
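The relation between s-tag bits and pause points can be checked with a short sketch. The function name `stall_points` is hypothetical, and the sketch assumes the total length is a multiple of 8 so that each segment is a whole number of bytes.

```python
def stall_points(total_bytes, s_tag):
    """Byte counts at which the DMA pauses, given an 8-bit s-tag value."""
    segment = total_bytes // 8    # each tag bit stands for 1/8 of the task
    # Bit k set means: pause once (k + 1)/8 of the data has been transferred.
    return [(bit + 1) * segment for bit in range(8) if (s_tag >> bit) & 1]


# The 80-byte example from the text: bits 1 and 5 set.
points = stall_points(80, (1 << 1) | (1 << 5))
```

For the 80-byte task this yields pause points at 20 and 60 bytes, matching the example in the text.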

The p-tag reflects how much of the current DMA transfer has been completed. For example, suppose the DMA transfers 80 bytes without interruption; then bits 0-7 of p-tag are set to 1 in turn as the amount transferred reaches 10, 20, ..., 80 bytes. This tag mainly makes it convenient for the CPU to query the progress of the current DMA transfer.
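The p-tag value for a given amount of transferred data can be sketched as follows. The function name `p_tag` is hypothetical; the sketch assumes a bit is set as soon as the transferred amount is greater than or equal to its threshold, and that the total length is a multiple of 8.

```python
def p_tag(total_bytes, transferred):
    """8-bit status tag: bit k is 1 once (k + 1)/8 of the task has been transferred."""
    segment = total_bytes // 8
    completed = min(transferred // segment, 8)   # number of finished segments
    return (1 << completed) - 1                  # lowest `completed` bits set


# Hypothetical usage with the 80-byte task: 25 bytes done means two full segments.
status = p_tag(80, 25)
```

Because p-tag depends only on the transfer counter, it changes whether or not any s-tag bit has been set.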

s-tag and p-tag do not interact during operation. For example, s-tag may be left entirely unset (that is, an uninterrupted transfer), and p-tag will still change according to the progress of the DMA transfer. Of the two, s-tag is the more important: it is the setting that realizes pre-programmed (intervention-free) DMA control, whereas p-tag is provided for the convenience of CPU interaction.

The 8-bit s-tag and p-tag divide a channel's transfer task into 8 segments. After each basic transfer request completes, the flow control unit checks (by comparing against its counter register) whether the amount of data transferred so far is greater than or equal to the threshold required by s-tag (1/8, 2/8, ..., 8/8). The transfer stages represented by p-tag are the same as for s-tag: its bits correspond to 1/8, 2/8, ..., 8/8 respectively.

s-tag and p-tag change according to the CPU's settings and the amount of data the DMA has transferred; the tags themselves do not control the flow control unit. Reads and writes of the tag bits are issued by the CPU, and the flow control unit accepts them and returns the corresponding state.

The amount of data each tag bit represents is determined by the total data volume of the task, so it may differ between tasks, but the bits always stand for 1/8, 2/8, ... of the total. For example:

For an 80-byte task, each tag bit represents 10 bytes.

For an 800-byte task, each tag bit represents 100 bytes.

As one possible implementation, when the initialization subunit performs initialization, every bit of s-tag and p-tag is cleared to 0. Starting from the LSB (Least Significant Bit), the flow control unit sets each bit of s-tag to 1; when the data transfer task represented by a tag bit is complete, that is, once a data transfer request has read or written the corresponding amount of data through the read/write channel, the corresponding read/write channel pauses, and the channel then sets the corresponding bit in p-tag to 1.

In one concrete example, each tag bit of the s-tag and p-tag of read channel 1 represents 1/8 of the data transfer task.

After receiving the data read/write request in the memory copy command, the flow control unit evaluates the task information in the request, decomposes the request into multiple read/write requests according to the preset decomposition method, and then, according to the configuration data in the configuration registers, the read/write frequency and priority of each read/write channel, and the tag bit values, starts multiple read/write channels in parallel to read and write the data.

After reading 1/8 of the data task, read channel 1 pauses and sends an interrupt request to the processor core (CPU);

After the control module of the processor core (CPU) receives the interrupt request, it checks the values of s-tag and p-tag and thereby learns that read channel 1 has written 1/8 of the data segment to the data buffer via the flow control unit.

An example of a 1280-byte (160 × 8) memory copy (from address A to address B) is given below, as shown in Figure 5, to further illustrate the multi-channel DMA accelerator 4 of the processor system of the embodiment of the invention.

First, the memory copy method is set up:

(11) One read channel r and one write channel w are set to work simultaneously to complete the copy (in the simple case, when arbitrating among multiple channels of the same type, the channel currently working can be regarded as having the highest priority; reads and writes proceed in parallel and need no arbitration).

(12) The marker bit strategy is set (only the s-tag is used here). The read channel pauses after every 160 bytes it completes; the CPU then starts the write channel to write the 160 bytes just read to the destination address and, in parallel, resumes the read channel. Like the read channel, the write channel pauses after transferring 160 bytes. This cycle repeats 8 times until the memory copy of the 1280 bytes is complete.

Then, the DMA channels and marker bits are configured:

(21) Configure the source address (A), destination address (B), and length (1280) of the read channel and the write channel.

(22) At this point, configure only the s-tag of read channel r: set its bits 0-7 to 1, and start read channel r of the DMA.

Next, read channel r pauses for the first time:

(31) After read channel r completes reading 160 bytes (1/8), the condition for s-tag bit 0 is triggered (bit 0 corresponds to 1/8); read channel r pauses and sends an interrupt signal to notify the CPU.

(32) After receiving the interrupt, the CPU sets marker bits 0-7 of write channel w to 1 and starts write channel w (which begins writing the 160 bytes read earlier to the destination address). At the same time, it resumes read channel r and clears bit 0 of the read channel's s-tag to 0 (if the bit were not cleared, interrupts would be raised repeatedly while the transferred amount is greater than 1/8 but less than 2/8).

Finally, the remaining process is carried out:

(41) After read channel r completes the transfer of 320 bytes (2/8), it raises an interrupt again. The CPU then examines the p-tag of write channel w to check whether write channel w has finished writing 160 bytes (1/8). If it has, the CPU resumes read channel r and write channel w and clears the corresponding s-tag bit (for the same reason as in step (32)); otherwise, it continues polling the p-tag of write channel w.

(42) Repeat step (41) until the copy of the 1280 bytes is complete.
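Steps (11) to (42) can be modelled in software roughly as follows. This is a simplified sketch under stated assumptions, not the accelerator's actual logic: the `Channel` class and `copy_with_tags` function are hypothetical names, the parallel work of the two channels is modelled as two calls within one loop iteration, and only the read channel's interrupts are counted.

```python
SEG_COUNT = 8  # one segment per tag bit (160 bytes each for a 1280-byte copy)


class Channel:
    """Toy model of one DMA channel with an 8-bit s-tag and p-tag."""

    def __init__(self, move):
        self.move = move      # callback that transfers segment k
        self.next_seg = 0
        self.s_tag = 0        # pause points set by the CPU
        self.p_tag = 0        # progress reported by the channel

    def step(self) -> bool:
        """Transfer one 1/8 segment; return True if the s-tag requests a pause."""
        k = self.next_seg
        self.move(k)
        self.p_tag |= 1 << k
        self.next_seg += 1
        return bool(self.s_tag & (1 << k))


def copy_with_tags(src: bytes):
    seg = len(src) // SEG_COUNT
    buf = bytearray(len(src))   # stands in for the data buffer
    dst = bytearray(len(src))

    def read(k):
        buf[k * seg:(k + 1) * seg] = src[k * seg:(k + 1) * seg]

    def write(k):
        dst[k * seg:(k + 1) * seg] = buf[k * seg:(k + 1) * seg]

    r, w = Channel(read), Channel(write)
    r.s_tag = 0xFF                         # step (22): bits 0-7 of read channel set to 1
    interrupts = 0
    for k in range(SEG_COUNT):
        if k > 0:
            w.step()                       # writes segment k-1 while segment k is read
        if r.step():                       # steps (31)/(41): read channel pauses, interrupts CPU
            interrupts += 1
            if k == 0:
                w.s_tag = 0xFF             # step (32): configure and start write channel
            else:
                if not w.p_tag & (1 << (k - 1)):
                    raise RuntimeError("previous write not finished")
                w.s_tag &= ~(1 << (k - 1))
            r.s_tag &= ~(1 << k)           # clear the bit so the threshold does not re-fire
    w.step()                               # last 1/8 is written alone
    return bytes(dst), interrupts
```

For a 1280-byte source buffer, `copy_with_tags` returns an identical copy together with the number of read-channel interrupts handled by the CPU.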

The whole process shows the crucial role of the s-tag. According to the embodiment of the invention, the CPU interacts with the s-tag and p-tag in interrupt handling each time a 1/8 (160-byte) unit of data has been transferred. The specific method of interaction depends on the particular configuration.

Apart from the first 1/8 of read channel r and the last 1/8 of the write channel, the read and write channels work in parallel throughout the process. Assuming a serial copy of 1280 bytes takes 16 time units (read 160, write 160, read 160, ..., write the last 160 bytes), the implementation based on the embodiment of the invention needs only 10 time units (read 160, write 160 while reading the next 160, ..., write the last 160). In between, the CPU only needs to change the value of the s-tag and query the p-tag in the interrupts (8 in total), so the overhead is very small.

Embodiment two

Correspondingly, the embodiment of the invention provides a memory copy acceleration method comprising the following steps:

Step S102: upon receiving the data read/write request of a memory copy command issued by the processor core to the memory, the multi-channel DMA accelerator judges and decomposes the task information of the data read/write request according to that task information, and, according to the task information of the decomposed request, the read/write frequencies and priorities of the multiple read/write channels therein, and the values of the marker bits of the read/write channels, controls the multiple read/write channels to issue multiple read/write requests to the memory in parallel, reading and writing data in parallel until all read/write operations are complete;

Preferably, in step S102, judging and decomposing the task information of the data read/write request comprises the following steps:

The multi-channel DMA accelerator makes a judgment according to the task information. When the total length of the data to be copied is greater than the bus bit width of a single channel of the multi-channel DMA accelerator, the task information of the memory copy command is decomposed, and the multi-channel DMA accelerator issues multiple read/write requests to the memory through multiple channels according to the decomposed task information;

Otherwise, the multi-channel DMA accelerator randomly selects one channel to issue the data read/write request to the memory;
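The decision just described can be illustrated with a short sketch. The text only refers to a preset decomposition method without specifying it, so the bus-width-sized chunks and round-robin channel assignment used here are assumptions of this example, as are the function name `decompose`, its parameters, and the use of bytes as the unit of width.

```python
import random


def decompose(src, dst, length, channel_width_bytes, num_channels, rng=random):
    """Return a list of (channel, src_addr, dst_addr, length) requests.

    Hypothetical helper: the chunking policy is an assumption, not taken
    from the described embodiment.
    """
    if length <= channel_width_bytes:
        # Small copy: a single request on a randomly selected channel.
        return [(rng.randrange(num_channels), src, dst, length)]

    requests = []
    offset = 0
    channel = 0
    while offset < length:
        chunk = min(channel_width_bytes, length - offset)
        requests.append((channel, src + offset, dst + offset, chunk))
        offset += chunk
        channel = (channel + 1) % num_channels  # spread chunks over the channels
    return requests
```

For instance, a 40-byte copy over 16-byte-wide channels is split into three requests of 16, 16, and 8 bytes, while an 8-byte copy stays a single request on one channel.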

Preferably, in step S102, the parallel data reading and writing comprises the following steps:

After the cache module receives a data read/write request from the multi-channel DMA accelerator, it judges whether the data accessed by the request already has a corresponding copy in the cache module;

Step S1022: if the data accessed by the data read/write request has a corresponding copy in the cache (Cache) module, execute step S1023; otherwise, execute step S1024;

Step S1023: the accessed data is read or written in the cache module according to the data read/write request, i.e., the memory access hits in the cache (Cache); the accessed data required by a read operation is returned, or the cache (Cache) block corresponding to the accessed data of a write operation is updated;

Step S1024: the data read/write request cannot obtain the accessed data from a corresponding copy in the cache module, i.e., the memory access misses in the cache (Cache). This triggers a cache (Cache) replacement operation (Cache Evict), which brings the data to be accessed from external memory into the cache module; the accessed data required by a read operation is then returned, or, for a write operation, the copy of the accessed data in the cache (Cache) is updated.
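The hit and miss paths of steps S1022 to S1024 can be sketched with a toy cache model. The text treats the replacement operation as prior art and does not fix a policy, so the least-recently-used eviction and write-back behaviour below are assumptions of this sketch; `TinyCache` and its members are hypothetical names.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache in front of a dict that stands in for external memory."""

    def __init__(self, memory: dict, capacity: int = 2):
        self.memory = memory
        self.capacity = capacity
        self.lines = OrderedDict()   # address -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def _fill(self, addr):
        if addr in self.lines:                 # S1023: cache hit
            self.hits += 1
            self.lines.move_to_end(addr)
            return
        self.misses += 1                       # S1024: cache miss
        if len(self.lines) >= self.capacity:
            old_addr, old_val = self.lines.popitem(last=False)  # evict LRU line
            self.memory[old_addr] = old_val                      # write it back
        self.lines[addr] = self.memory.get(addr, 0)              # bring data in

    def read(self, addr):
        self._fill(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self._fill(addr)
        self.lines[addr] = value
```

A read or write first checks for a copy in the cache; only on a miss is a line evicted and the requested data brought in from the backing memory.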

The cache (Cache) replacement operation (Cache Evict) is prior art and is therefore not described in detail in the embodiments of the invention.

As one possible embodiment, preferably, in the memory copy acceleration method of the embodiment of the invention, step S102 further comprises the following steps:

Step S201: the processor core (CPU) sets the configuration register of each channel of the multi-channel DMA accelerator and initializes the value of the s-tag;

If a channel completes the amount of data represented by a marker bit of the s-tag initialized in step S201, it temporarily stops working and sends an interrupt request to the processor core (CPU), and step S204 is then executed; otherwise, the method returns to step S202 and continues.

Step S204: the processor core (CPU) checks the value of the s-tag and queries the value of the p-tag, and judges from the values of the s-tag and p-tag whether the data transfer is complete; if so, the whole transfer task ends; otherwise, transfer of the data represented by the next marker bit begins.

The processor system, multi-channel memory copy accelerator, and method of the embodiment of the invention are illustrated below by example.

In one specific implementation example, Figure 6 shows the operation of a parallel memory copy function memcpy() that uses the processor system, multi-channel memory copy accelerator, and method of the invention.

In the function memcpy(src, dst, len), src denotes the source address of the data segment to be copied, dst denotes the destination address of the copy, and len denotes the length of the data segment.

load(src) denotes that a read channel of the multi-channel DMA accelerator reads data from the source address;

store(dst) denotes that a write channel of the multi-channel DMA accelerator writes data to the destination address.

As shown in Figure 6, in this embodiment the read and write operations of the memory copy process can be pipelined on the basis of the marker bits (tags) of the read/write channels. The read channel first reads the amount of data represented by tag0 from the source address and then sends an interrupt request to the CPU. If the CPU determines that this portion of the data is ready, it starts the data transfer task of the write channel, which writes the data that has been read back to the destination address in memory. Meanwhile, the read channel continues by reading the amount of data represented by tag1, so that the write channel works in parallel with the read channel. Through this pipelined processing, overheads such as starting the write channel and switching, pausing, and restarting the read/write channels are all hidden, which reduces the load on the processor and also yields very high bandwidth utilization.
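The overlap of load and store operations in this pipelined memcpy can be made explicit with a small schedule model. It is only a sketch: the function `memcpy_pipelined` and its inner `load` and `store` helpers are hypothetical stand-ins for the read and write channels, the length is assumed to be a multiple of the segment count, and each loop iteration represents one stage in which a store and a load would proceed in parallel.

```python
def memcpy_pipelined(src: bytes, segments: int = 8):
    """Copy src segment by segment and record which operations overlap."""
    seg = len(src) // segments
    buf = [None] * segments          # stands in for the data buffer
    dst = bytearray(len(src))
    schedule = []                    # one (loaded_segment, stored_segment) pair per stage

    def load(k):                     # read channel: source -> buffer
        buf[k] = src[k * seg:(k + 1) * seg]

    def store(k):                    # write channel: buffer -> destination
        dst[k * seg:(k + 1) * seg] = buf[k]

    for stage in range(segments + 1):
        loaded = stored = None
        if stage >= 1:
            store(stage - 1)         # write back the segment read in the previous stage
            stored = stage - 1
        if stage < segments:
            load(stage)              # meanwhile, read the next segment
            loaded = stage
        schedule.append((loaded, stored))
    return bytes(dst), schedule
```

In the returned schedule, the first stage contains only a load and the last stage only a store; every stage in between pairs the store of one segment with the load of the next.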

Finally, it should be noted that those skilled in the art can obviously make various changes and modifications to the invention without departing from its spirit and scope. Thus, if these changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to encompass them as well.

Claims

1. A processor system comprising a processor core and a memory, characterized in that it further comprises a multi-channel DMA accelerator connected between the processor core and the memory through a data bus;

2. The processor system according to claim 1, characterized in that it further comprises a cache module connected between the memory and the multi-channel DMA accelerator, used to buffer the data transferred between the memory and the multi-channel DMA accelerator.

3. The processor system according to claim 1, characterized in that the multi-channel DMA accelerator comprises at least one DMA engine module and two interfaces;

The data interface is used to transfer the data to be read or written by the data read/write request of the memory copy command;

4. The processor system according to claim 3, characterized in that the interface is an interface based on a 128-bit bus, and the read/write operations on memory units are completed through this data bus; the cache module is an L2 cache module.

5. The processor system according to claim 3, characterized in that the DMA engine module comprises multiple read/write channels and their corresponding marker bits, a flow control unit connecting both the processor core and the memory, and a data buffer;

The read channel is used to read data from the memory to the processor core under the control of the flow control unit;

The flow control unit is used to, according to the configuration data of the configuration registers and the values of the marker bits of the read/write channels, and according to the priorities set at initialization, keep statistics on each read/write channel's use of the data bus, allocate the data bus to the read/write channel with the highest priority, control the frequency and priority of each read/write channel, control different channels to start different data transfer tasks, and exchange the working state of the read/write channels with the control unit of the processor core;

The data buffer is used to buffer the data of the read/write channels;

6. The processor system according to any one of claims 1 to 5, characterized in that the control unit of the processor core comprises an initialization subunit and a configuration subunit, wherein:

7. The processor system according to claim 6, characterized in that the configuration data comprises a source address, a destination address, and a data segment length.

8. The processor system according to claim 5, characterized in that the multiple read/write channels comprise three read channels and one write channel; each channel has its own independent configuration register and marker bits; while one read/write channel is starting up, the other read/write channels can carry out their data transfers, i.e., the four read/write channels can process data in parallel.

9. A multi-channel memory copy DMA accelerator, characterized in that it is used to, upon receiving a data read/write request for copying issued by a processor core to a memory, judge and decompose the task information of the data read/write request according to that task information, and, according to the task information of the decomposed request, the read/write frequencies and priorities of the multiple read/write channels therein, and the values of the marker bits of the read/write channels, control the multiple read/write channels to issue multiple read/write requests to the memory in parallel to complete the data reads and writes.

10. The multi-channel memory copy DMA accelerator according to claim 9, characterized in that the multi-channel DMA accelerator comprises at least one DMA engine module and two interfaces;

11. The multi-channel memory copy DMA accelerator according to claim 9 or 10, characterized in that the DMA engine module comprises multiple read/write channels and their corresponding marker bits, a flow control unit connecting both the processor core and the memory, and a data buffer;

The flow control unit is used to, according to the configuration data of the configuration registers and the values of the marker bits of the read/write channels, and according to the priorities set at initialization, keep statistics on each read/write channel's use of the data bus, allocate the data bus to the read/write channel with the highest priority, control the frequency and priority of each read/write channel, control different channels to start different data transfer tasks, and exchange the working state of the read/write channels with the control unit of the processor;

The data buffer is used to buffer the data of the read/write channels;

12. A memory copy acceleration method, characterized in that it comprises the following steps:

13. The memory copy acceleration method according to claim 12, characterized in that, in step S102, judging and decomposing the task information of the data read/write request comprises the following steps:

14. The memory copy acceleration method according to claim 12, characterized in that the parallel data reading and writing comprises the following steps:

15. The memory copy acceleration method according to claim 14, characterized in that step S102 further comprises the following steps:

16. The memory copy acceleration method according to claim 14, characterized in that the cache module is an L2 cache module.