CN107870879A - A kind of data-moving method, accelerator board, main frame and data-moving system - Google Patents


Info

Publication number
CN107870879A
CN107870879A (application CN201610848270.7A; granted as CN107870879B)
Authority
CN
China
Prior art keywords
data
accelerator
main frame
board
moving
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610848270.7A
Other languages
Chinese (zh)
Other versions
CN107870879B (en)
Inventor
段然
白常明
崔春风
任容玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Communications Co Ltd
Priority to CN201610848270.7A
Publication of CN107870879A
Application granted
Publication of CN107870879B
Active legal status
Anticipated expiration legal status

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a data-moving method, an accelerator board, a host, and a data-moving system. The accelerator board acts as the master of data moving, eliminating the DMA-configuration interaction between the host and the accelerator board during data moving and improving processing efficiency. In addition, the host checks memory by polling, rather than polling accelerator registers over the interface, which improves query efficiency. The invention further binds the polling thread to a dedicated CPU core and uses the huge-page technique to improve the real-time performance of the general-purpose processing platform, reducing the impact of its non-real-time nature.

Description

Data-moving method, accelerator board, host, and data-moving system
Technical field
The present invention relates to the technical field of general-purpose processor platforms, and in particular to a data-moving method, an accelerator board, a host, and a data-moving system.
Background technology
The centralized radio access network (C-RAN) architecture is a revolutionary change to the conventional radio access network (RAN). One of its important goals is the separation of software from hardware: base-station functions are implemented in software on a general-purpose processing platform, ultimately allowing hardware resources to be shared among base stations of different standards or among multiple base-station carriers. To achieve this goal, IT-based schemes in which base-station physical-layer processing is implemented purely in software play an important role on the C-RAN evolution path. Such a solution is entirely different from the traditional vertically integrated base-station solution built on digital signal processors (DSPs) plus field-programmable gate arrays (FPGAs), and has advantages such as easy extension, easy upgrading, and good forward/backward compatibility.
Although a pure-software physical-layer implementation on a general-purpose processor platform has the above advantages, the new architecture still has many shortcomings compared with the traditional base-station architecture in processing efficiency, performance indicators, and reliability. In base-station signal processing in particular, a general-purpose processor is far less efficient than a dedicated DSP at compute-intensive operations. Introducing a dedicated accelerator for intensive computation on the general-purpose processing platform has therefore become an option: hardware-acceleration schemes based on the PCIe bus or on the internal bus of a general-purpose processor SoC, such as CPU+GPU or CPU+FPGA, have been designed. In these schemes, however, the host and the accelerator usually use an interrupt-triggered handshake to notify the program running on the CPU of the accelerator's result and to trigger the process of moving the data stream from the accelerator board (e.g., a PCIe device) to host memory. Taking a typical PCIe accelerator device as an example, two processes are described below: starting the accelerator board, and the feedback after an accelerator-board task finishes.
In the start-up flow of the accelerator board shown in Fig. 1, data is issued from the host side to the accelerator board for processing. When the host needs to invoke hardware accelerator resources during operation, the following steps are generally performed:
Step 11, the CPU of the host moves the pending data into a designated memory area, where it waits to be moved;
Step 12, the host CPU writes to the accelerator board the host memory address for direct memory access (DMA, Direct Memory Access) transfer, the memory data-moving size, and the data-moving start flag bit in the DMA control register;
Step 13, the accelerator board receives the memory-read request packet sent by the host CPU, parses the request packet, and writes the corresponding contents into the DMA register group;
Step 14, the data-read flow of the accelerator board is initiated; the control packet of the memory-read request is encapsulated and handed to the underlying transaction layer for transmission to the host;
Step 15, after receiving the read request, the host CPU encapsulates the data packet and sends it through the interface unit to the accelerator board;
Step 16, after the interface unit of the accelerator board receives the above packet, the packet-receiving module unpacks it to obtain the original data content;
Step 17, the data is forwarded to the accelerator functional unit, and accelerator processing of the packet is started.
At this point, a register inside the accelerator board may indicate the state of the current accelerator functional unit, or the accelerator board may use an interrupt over the interface to inform the host that the accelerator has started processing data. The host-side application can be triggered by the interrupt response, or it can query the accelerator's working state by polling the register group of the accelerator board. The part below the dotted line in the figure is processed automatically by hardware; faster accelerator data processing can be achieved there without excessive host intervention.
Fig. 2 shows the flow of feedback to the host after the accelerator board finishes processing. "Processing finished" means that the accelerator functional unit has completed processing the data issued by the host and is waiting to feed the result back to the host. The host can learn the working state of the accelerator board in two ways: one is to obtain the running state from the interface via an interrupt; the other is to poll the register-group state of the accelerator board to judge whether the task is complete. After the host learns that the task has been processed, it initiates the steps shown in Fig. 2:
Step 21, the host CPU accesses the DMA copy register units of the accelerator board and writes control information; specifically, it writes the host-memory copy address and the expected data length into the DMA copy address register and the DMA copy data-length register of the accelerator board;
Step 22, the accelerator board receives the control information sent by the host over the interface between the accelerator and the host and stores it in the corresponding registers;
Step 23, the packet-encapsulation module of the underlying transaction-layer unit receives the DMA kick command word;
Step 24, the packet-encapsulation module assembles the data waiting for transmission into memory-write data packets;
Step 25, the encapsulated packets are transmitted via the underlying transaction layer and sent through the interface unit to the host;
Step 26, the underlying transaction-layer unit of the host unpacks the received packets and caches them into memory;
Step 27, the host CPU reads the accelerator-processed data through the address.
The start-up flow and the finish flow of the accelerator board are somewhat similar, and the host CPU is the master of the entire data-moving process and of the accelerator's working process. The above flows have three main defects:
1) When an application runs on the host CPU, its system calls are subject to blocking, so the application on the host CPU may not discover the execution state of the accelerator functional unit in time, introducing a certain response delay;
2) All data moves are initiated by the host CPU. When the DMA memory-move command word is sent, the destination address and data size of each move can be controlled flexibly, which offers a certain flexibility; but for data that must be moved quickly and periodically, if every move is initiated by the CPU, the control-process interaction becomes redundant and transmission efficiency drops;
3) If the system design aims to maximize resource utilization and the host CPU is notified of the accelerator board's state by interrupt, the delay of interrupt response lowers the execution efficiency of the accelerator.
In summary, if the above accelerator-board scheme is applied to services with high real-time requirements, such as real-time digital signal processing in base-station virtualization scenarios, execution efficiency will decline and the high real-time requirements cannot be met.
Summary of the invention
The technical problem to be solved by the embodiments of the present invention is to provide a data-moving method, an accelerator board, a host, and a data-moving system, so as to improve the processing efficiency of the accelerator board.
To solve the above technical problem, a data-moving method provided by an embodiment of the present invention includes:
the accelerator board receiving the host memory address for direct memory access (DMA) transfer and the memory data-moving size sent by the host;
the accelerator board presetting the input/output data addresses of the accelerator board according to the host memory address and the memory data-moving size, and performing assignment of read and write operations to the DMA register group;
the accelerator board starting data moving between the accelerator board and the host.
An embodiment of the present invention further provides another data-moving method, including:
the host writing the data to be processed by the accelerator board into a read-memory address;
the host sending to the accelerator board the host memory address for direct memory access (DMA) transfer and the memory data-moving size, the host memory address including the read-memory address and a write-memory address;
the host receiving a host-memory read request packet sent by the accelerator board;
the host encapsulating the data to be processed by the accelerator board at the read-memory address, constructing a memory-data encapsulation packet, and sending it to the accelerator board.
An embodiment of the present invention provides an accelerator board, which includes a DMA register group and further includes:
an interface unit, configured to receive the host memory address for direct memory access (DMA) transfer and the memory data-moving size sent by the host;
a register assignment unit, configured to preset the input/output data addresses of the accelerator board according to the host memory address and the memory data-moving size, and to perform assignment of read and write operations to the DMA register group;
an accelerator functional unit, configured to fill in the DMA start flag bit and start data moving between the accelerator board and the host.
An embodiment of the present invention further provides a host, including:
a memory unit and an interface unit connected with the accelerator board;
a CPU unit, configured to write the data to be processed by the accelerator board into a read-memory address, and to send to the accelerator board the host memory address for direct memory access (DMA) transfer and the memory data-moving size, the host memory address including the read-memory address and a write-memory address;
an underlying transaction-layer unit, configured to receive the host-memory read request packet sent by the accelerator board, encapsulate the data to be processed by the accelerator board at the read-memory address, construct a memory-data encapsulation packet, and send it to the accelerator board.
An embodiment of the present invention further provides a data-moving system, including the accelerator board and the host described above.
Compared with the prior art, in the data-moving method, accelerator board, host, and data-moving system provided by the embodiments of the present invention, the accelerator board acts as the master of data moving, eliminating the DMA-configuration interaction between the host and the accelerator board during data moving and improving processing efficiency.
Brief description of the drawings
Fig. 1 is a schematic diagram of the start-up flow of a prior-art accelerator board;
Fig. 2 is a schematic flow chart of the feedback to the host after a prior-art accelerator board finishes processing;
Fig. 3 is a schematic flow chart of a data-moving method provided by an embodiment of the present invention;
Fig. 4 is another schematic flow chart of a data-moving method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of data moving between the host and the accelerator board in an embodiment of the present invention;
Fig. 6 is another schematic diagram of data moving between the host and the accelerator board in an embodiment of the present invention;
Fig. 7 is a schematic diagram of the application environment of an application example of data moving provided by an embodiment of the present invention;
Fig. 8 is a schematic flow chart of an application example of data moving provided by an embodiment of the present invention.
Detailed description
To make the technical problem to be solved, the technical solution, and the advantages of the present invention clearer, a detailed description is given below in conjunction with the accompanying drawings and specific embodiments. In the following description, specific details such as particular configurations and components are provided only to help a comprehensive understanding of the embodiments of the invention. It will therefore be apparent to those skilled in the art that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and brevity.
It should be understood that references throughout this specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present invention. Thus, appearances of "in one embodiment" or "in an embodiment" throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In the various embodiments of the present invention, it should be understood that the magnitude of the sequence numbers of the following processes does not imply an order of execution; the execution order of each process is determined by its function and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present invention.
On the basis of the original accelerator functions, the embodiments of the present invention propose a real-time optimization scheme. The embodiments design a fixed host-memory structure that stores the host's task allocation information and configures flag bits for the accelerator functions; the host writes the data to be processed by the accelerator into host memory; the accelerator functional unit, as the master of data moving, controls the data-moving process between the host and the accelerator board, reading data from host memory or writing data into host memory; the host reads the data written back after accelerator-board processing by polling, judges the data return state, and performs follow-up task processing. This process reduces processing response delay and improves the accelerator board's processing efficiency.
Referring to Fig. 3, an embodiment of the present invention provides a data-moving method applied to an accelerator board; the accelerator board can process tasks for the host and feed back processing results. Here, the host is a device using a general-purpose processing platform, such as base-station equipment; the accelerator board is an auxiliary device connected to the host through an interface to perform specific computational processing. As shown in Fig. 3, the data-moving method includes the following steps:
Step 31, the accelerator board receives the host memory address for direct memory access (DMA) transfer and the memory data-moving size sent by the host.
Here, the host has allocated the host memory addresses in advance, including a read-memory address and a write-memory address, and has configured the memory data-moving size.
Step 32, the accelerator board presets the input/output data addresses of the accelerator board according to the host memory address and the memory data-moving size, and performs assignment of read and write operations to the DMA register group.
Here, the accelerator board configures the DMA register group to set the parameters of the subsequent data-moving process. For example, the DMA copy address register is configured according to the host memory address: the input data address among the input/output data addresses of the accelerator board is configured according to the read-memory address, and the output data address is configured according to the write-memory address. In addition, the DMA copy data-length register can be configured according to the data-moving size.
Step 33, the accelerator board starts data moving between the accelerator board and the host.
Here, the accelerator board, as the master of data moving, starts the data-moving process by filling in the DMA start flag bit.
In the above steps of the embodiment of the present invention, every round of data moving is started by the accelerator board, which becomes the master of data moving. This solves the prior-art problems that, with the host as master, the control-process interaction is relatively redundant and transmission efficiency drops.
Specifically, for a host-memory read operation, i.e., when data is moved from the host to the accelerator board, step 33 specifically includes:
Step 331, the accelerator board fills in the DMA start flag bit that triggers the host-memory read operation.
Step 332, the accelerator board encapsulates a host-memory read request packet and sends it to the host, receives the memory-data encapsulation packet sent by the host, and unpacks it to obtain the original data.
Step 333, the validity of the original data is judged according to a preset data structure, and the original data is processed when it is valid.
Here, a data structure can be configured for the host memory, in which a flag bit is set to identify whether the original data is valid (i.e., whether the accelerator board needs to process it); the accelerator board judges from this flag bit whether the original data needs to be processed.
For a host-memory write operation, i.e., when data is moved from the accelerator board to the host, step 33 specifically includes:
Step 331', after the accelerator board processes the original data sent by the host and obtains, as the result, the data to be moved, it fills in the DMA start flag bit that triggers the host-memory write operation.
Step 332', the accelerator board assembles the data to be moved into memory-write data packets.
Step 333', the memory-write data packets are transmitted to the host memory address according to the input/output data addresses of the accelerator board.
Through the above process, bidirectional data moving between the host and the accelerator board is achieved.
Referring to Fig. 4, an embodiment of the present invention provides a data-moving method applied to the host side. As shown in Fig. 4, the data-moving method includes the following steps:
Step 41, the host writes the data to be processed by the accelerator board into the read-memory address.
Step 42, the host sends to the accelerator board the host memory address for direct memory access (DMA) transfer and the memory data-moving size, the host memory address including the read-memory address and the write-memory address.
Here, the host CPU can write the data to be processed by the accelerator board into the read-memory address, and send to the accelerator board the host memory address for DMA transfer and the memory data-moving size.
Step 43, the host receives the host-memory read request packet sent by the accelerator board.
Step 44, the host encapsulates the data to be processed by the accelerator board at the read-memory address, constructs a memory-data encapsulation packet, and sends it to the accelerator board.
Here, the underlying transaction-layer unit of the host can receive the host-memory read request packet sent by the accelerator board, encapsulate the data to be processed by the accelerator board at the read-memory address, construct the memory-data encapsulation packet, and send it to the accelerator board.
The above presents the flow of moving data from the host to the accelerator board. It can be seen that this moving flow is initiated by the accelerator board, which solves the prior-art problems of relatively redundant control interaction and reduced transmission efficiency with the host as master. In addition, the host CPU does not participate in the data reception of steps 43 and 44, which saves CPU processing resources for other work.
For moving data from the accelerator board to the host, the above method further includes the following steps:
Step 45, the host receives the memory-write data packets for the host memory address sent by the accelerator board.
Step 46, the host unpacks the memory-write data packets, obtains the data to be written, and writes it to the write-memory address of the host memory.
Through the above process, the flow of moving data from the accelerator board to the host is achieved. In the embodiment of the present invention, the host can use polling to check whether there is data fed back by the accelerator board: specifically, by polling the write-memory address, the host judges whether newly written data exists, and if so, it triggers the task processing flow for the newly written data.
Considering the characteristics of a general-purpose processing platform, the real-time performance of the polling module may not be guaranteed. The embodiments of the present invention therefore further optimize the host-side polling, superimposing general-platform enhancement functions to improve its real-time performance and reliability. Specifically, this embodiment can use the CPU affinity technique and the huge-page technique to improve the real-time performance of processing. The concepts behind these two techniques and the related processing of the embodiments are briefly introduced below.
1) Huge-page technique
Processor memory management involves two concepts: physical memory and virtual memory. Inside the Linux operating system, physical memory is managed in frames and virtual memory in pages. The memory management unit (MMU) performs the translation from virtual memory addresses to physical memory addresses. The information the MMU needs for address translation is stored in a data structure called the page table, and a page-table lookup is an extremely time-consuming operation. To reduce page-table lookups, Intel processors implement a cache that stores lookup results, called the TLB (Translation Lookaside Buffer); it holds the mappings from virtual addresses to physical addresses. Before any virtual address is translated into a physical address, the processor first checks whether a valid mapping exists in the TLB; if no valid mapping is found, i.e., on a TLB miss, the processor performs the page-table lookup. Page-table lookups have a large impact on performance, so TLB misses should be reduced as much as possible. Under the default configuration of x86 processor hardware, the page size is 4 KB, but larger page sizes, such as 2 MB or 1 GB, are also supported. With huge pages, one TLB entry can cover a much larger memory region, greatly reducing TLB misses.
The embodiments of the present invention can further use the huge-page technique: the above host memory is allocated in HugePages, a memory pool (mempool) is managed there, and equal-size mbufs are pre-allocated, one for each data packet. By pre-allocating the host memory for DMA transfer from huge pages (HugePages) and allocating multiple equal-size buffer units for packet transmission, the embodiments can reduce page-table lookup time and improve data transmission efficiency.
2) CPU " affine " (affinity) technology
Modern operating system is all based on timesharing method of calling to realize task scheduling, and multiple processes or thread are at multinuclear Manage and be constantly alternately performed on some core of device.Each handoff procedure, it is required for the status register of processor being stored in In storehouse, and recover the status information of current process, this is a kind of processing expense in fact to system.One thread is fixed one Run on core, the overhead that switching is brought can be eliminated.In addition by process or thread migration to the other of polycaryon processor When being run on core, the data in processor cache are also required to be purged, and cause the utilizing status of processor cache to reduce. CPU affine technologs, exactly by some process, either thread is tied on specific one or multiple cores and performed, without being migrated Run on to other cores, so ensure that the performance of dedicated program.
In the embodiment of the present invention, the main frame further by the predetermined core of host CPU, is performed in being write described in the poll Deposit the operation of address.Specifically, main frame can call Linux pthread storehouses, corresponding thread and CPU are carried out in systems Core is bound, and then corresponding thread carries out data polling processing using independent resource as far as possible, so as to reach lifting real-time The purpose of processing.
The data-moving process of the embodiments of the present invention has been described above from the accelerator-board side and the host side respectively. The interaction between the host and the accelerator board is further described below with reference to the flows of Fig. 5 and Fig. 6.
Referring to Fig. 5 and Fig. 6, the two flows, sending data from the host to the accelerator unit and the host receiving data from the accelerator, are combined into one description:
Step 51, the host CPU sends, through the interface unit, the host memory addresses for DMA transfer, including the DMA-write host memory and the DMA-read host memory, and the memory data-moving size to the accelerator board.
Step 52, the accelerator board receives, through its interface unit, the host memory addresses and the memory data-moving size sent by the host, writes the corresponding contents into the registers, and presets the input/output data addresses of the accelerator functions.
Step 53, the accelerator functional unit, as the task master, fills in the DMA start flag bit and triggers data moving.
Step 54, the packet encapsulation/decapsulation module of the underlying transaction-layer unit receives the DMA kick command word. The host-memory write operation and the host-memory read operation are described separately below:
If host memory write operation, then following step is performed:
Step 55, treat that moving data group bag is internal memory data packets by what the processing of accelerator functional unit was completed.
Step 56, packaged packet is sent to bottom affairs layer unit, transmitted by interface unit to main frame Bottom affairs layer unit.
Step 57, the packet sended over is unpacked and cached to host memory by the bottom affairs layer unit of main frame.
Step 58, the poll processing unit audit memory state of host CPU, real time access accelerator processing terminate after it is defeated Go out the data situation of core position, accelerator data processing is completed and reached internal memory if finding, carries out data processing Follow-up process.
For a host memory read operation, the following steps are performed:
In step 54 above, the bottom transaction layer unit automatically assembles a memory read request packet;
Step 57: the host encapsulates the preset data in memory (the preset data being the pending data filled in beforehand by the polling processing unit of step 58); after constructing the encapsulated data packet, the bottom transaction layer of the host sends the data through the interface to the accelerator board.
Step 56: after the interface unit of the accelerator board receives the data, the packet receiving module is responsible for unpacking it and obtaining the original data content.
Step 55: the original data content is forwarded to the accelerator functional unit, which starts the accelerator processing of the packet; the accelerator judges the validity of the obtained data and then performs the subsequent operations.
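The read path can be modelled in the same style. The header-byte validity check below is only one plausible interpretation of the "preset data structure" used to judge validity; `MAGIC` and the XOR "processing" step are assumptions of this sketch, not details from the patent:

```python
# Illustrative model of the host-memory read path: the accelerator issues a
# read request, the host answers with an encapsulated packet, and the
# accelerator validates the unpacked data before processing it.

MAGIC = 0xA5  # assumed marker byte of the preset data structure

def make_read_request(addr, size):
    # Step 54: the bottom transaction layer assembles a memory read request
    return {"type": "read", "addr": addr, "size": size}

def host_answer(request, memory):
    # Step 57: the host encapsulates the preset data at the requested address
    data = memory.get(request["addr"], b"")
    return {"type": "data", "payload": data}

def accelerator_process(packet):
    # Steps 56/55: unpack, judge validity, and process only valid data
    payload = packet["payload"]
    if not payload or payload[0] != MAGIC:
        return None  # invalid data moved into memory: do not process
    return bytes(b ^ 0xFF for b in payload[1:])  # stand-in "processing"

host_mem = {0x2000: bytes([MAGIC]) + b"\x00\x0f"}
req = make_read_request(0x2000, 3)
result = accelerator_process(host_answer(req, host_mem))
```

The validity gate matters because, with the accelerator initiating transfers, it may read a memory region before the host has filled it; invalid data must simply be skipped.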
It can also be seen from the above flows that the embodiment of the present invention differs from the traditional flow in at least the following respects:
1) The start of every round of data moving is initiated by the accelerator functional unit in the accelerator board. 2) A polling processing unit is added to the host CPU; the host CPU no longer participates in the processing flow of the accelerator functional unit and only handles changes to memory. 3) After the accelerator functional unit obtains the pending data, a data validity judging function must additionally be added; if the data moved into memory is invalid, the data is not processed.
The embodiment of the present invention further provides a data moving system, including the accelerator board shown in Fig. 5 and a host. The accelerator board provided in the embodiment of the present invention includes:
a DMA register group, specifically including a data transmission control register group, a DMA copy address register, a DMA copy data length register, a DMA control register and the like;
a bottom transaction layer unit;
an interface unit, configured to receive the host memory address and the memory data moving size for the direct memory access (DMA) transfer sent by the host;
a register assignment unit, configured to preset the input/output data addresses of the accelerator board according to the host memory address and the memory data moving size, and to perform the assignment processing of read and write operations on the DMA register group;
an accelerator functional unit, configured to fill in the DMA start flag bit and start the data moving between the accelerator board and the host.
When a host memory read operation is performed, the bottom transaction layer unit is configured to encapsulate, upon the trigger of the DMA start flag bit, a read request packet for host memory and send it to the host, and to receive the memory data encapsulation packet sent by the host and unpack it to obtain the original data. The accelerator functional unit is specifically configured to fill in the DMA start flag bit that triggers the host memory read operation, to judge, according to a preset data structure, the validity of the original data obtained by the bottom transaction layer unit, and to process the original data when the original data is valid.
When a host memory write operation is performed, the accelerator functional unit is further configured to fill in, after processing the original data sent by the host and obtaining the processing result as the data to be moved, the DMA start flag bit that triggers the host memory write operation. The bottom transaction layer unit is further configured to assemble, upon the trigger of the DMA start flag bit, the data to be moved into a memory write data packet, and to transmit the memory write data packet to the host memory address according to the input/output data address of the accelerator board.
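The division of labour between the register assignment unit and the accelerator functional unit can be sketched at register level as follows. The field names (`copy_addr`, `copy_len`, `control`, `start`) are invented for this illustration and are not taken from the patent:

```python
# Minimal model of the DMA register group and the start-flag handshake:
# the register assignment unit fills the address/length registers, and the
# accelerator functional unit sets the start flag that kicks the transfer.

class DMARegisterGroup:
    def __init__(self):
        self.copy_addr = 0   # DMA copy address register
        self.copy_len = 0    # DMA copy data length register
        self.control = 0     # DMA control register (assumed: 1 = write, 0 = read)
        self.start = False   # DMA start flag bit

def assign_registers(regs, host_addr, move_size, write=True):
    # Register assignment unit: preset addresses and configure read/write
    regs.copy_addr = host_addr
    regs.copy_len = move_size
    regs.control = 1 if write else 0
    return regs

def kick_dma(regs):
    # Accelerator functional unit: fill in the start flag to trigger moving
    regs.start = True
    return regs.start

regs = assign_registers(DMARegisterGroup(), 0x80000000, 4096, write=True)
kick_dma(regs)
```

Splitting configuration (done once per transfer setup) from the start flag (written by the board itself) is what removes the host from the per-transfer control path.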
The host shown in Fig. 5 includes:
a memory unit and an interface unit connected to the accelerator board;
a CPU unit, configured to write the data that needs to be processed by the accelerator board into a read memory address, and to send to the accelerator board the host memory address and the memory data moving size for the direct memory access (DMA) transfer, the host memory address including the read memory address and a write memory address; and
a bottom transaction layer unit, configured to receive the read request packet for host memory sent by the accelerator board, encapsulate the data in the read memory address that needs to be processed by the accelerator board, construct a memory data encapsulation packet and send it to the accelerator board.
When data is received from the accelerator board side, the bottom transaction layer unit is further configured to receive the memory write data packet for the host memory address sent by the accelerator board, unpack the memory write data packet, obtain the data to be written, and write it into the host memory address.
The above host further includes a polling processing unit, which may be arranged in the CPU unit. Specifically, the polling processing unit is configured to poll the write memory address, judge whether newly written data exists, and, when newly written data exists, trigger the processing flow of the newly written data.
In order to improve host processing efficiency and reduce delay, the polling processing unit further performs the operation of polling the write memory address through a dedicated core of the host CPU. In addition, the CPU unit may also be configured to allocate the host memory for DMA transfer from HugePages in advance, and to allocate multiple buffer units of the same size for packet transmission.
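The pre-allocation of equal-sized buffer units from one large region can be sketched as follows. A real implementation would map the region from HugePages (for example via `mmap` with `MAP_HUGETLB` on Linux) and pin the polling thread to a dedicated core; here a plain `bytearray` stands in for the hugepage-backed region, and the buffer size is an assumed value:

```python
# Sketch of carving fixed-size packet buffers out of one pre-allocated
# region, as the CPU unit does with the HugePage-backed DMA memory.

BUF_SIZE = 2048          # assumed per-packet buffer unit size
NUM_BUFS = 8

region = bytearray(BUF_SIZE * NUM_BUFS)  # stand-in for hugepage memory

def buffer_unit(index):
    # Return a zero-copy view of the index-th fixed-size buffer unit
    view = memoryview(region)
    return view[index * BUF_SIZE:(index + 1) * BUF_SIZE]

buf0 = buffer_unit(0)
buf0[:5] = b"hello"      # writes land directly in the shared region
```

Equal-sized units keep the address arithmetic trivial for the DMA engine, and allocating the whole region once avoids page faults and TLB misses inside the polling loop.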
Finally, the present invention is further described with a specific example between a channel coding accelerator for the radio air interface and a base station device using a general processing platform.
Functional description of the channel coding accelerator: in 4G wireless communication, one core module of the LTE physical layer is the channel coding module. Its function is to encode the source code stream to be sent to the terminal into 1:3 redundant data; through the coding correlation and repetitiveness of the redundant data, the receiver can parse out the original data stream with a decoding algorithm, improving the error correction capability of data transmission in the wireless communication process. Turbo coding is one of the commonly used error correction codes. This example does not elaborate on the principle of Turbo codes, but instead emphasizes the working mechanism of the polling processing unit on the base station side after Turbo channel coding is embedded into the general processing platform as an accelerator, so as to achieve real-time communication with the Turbo coding accelerator module. The task judging and polling procedure of the polling processing unit are shown in Fig. 7, and the task flow of the Turbo coding polling processing unit is shown in Fig. 8.
As can be seen from Fig. 7 and Fig. 8, the maintenance of the task queue in the flow is realized by the polling processing unit. The task queue can be implemented with a linked list: when a new Turbo coding task arrives, the newest Turbo coding task is added to the task queue by polling, and the input and output memory address spaces and lengths are configured for the task. The data in the returned data space is then polled; if data is returned, it indicates that some task in the task queue has been completed, and the task that has finished its coding process is deleted from the task queue. The polling processing unit never terminates; it keeps polling the tasks delivered by the upper-layer algorithm and the data space returned from the accelerator board.
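One iteration of this queue maintenance can be modelled as follows. A `deque` stands in for the linked list, and names such as `TurboTask` and `poll_once` are illustrative, not from the patent:

```python
# Model of one polling iteration: enqueue newly arrived Turbo coding tasks,
# then scan the return data space and retire any task whose result came back.

from collections import deque

class TurboTask:
    def __init__(self, task_id, in_addr, out_addr, length):
        self.task_id = task_id
        self.in_addr = in_addr    # configured input memory address space
        self.out_addr = out_addr  # configured output memory address space
        self.length = length

def poll_once(queue, new_tasks, returned_data):
    # Add the newest tasks to the queue (linked-list append in the patent)
    queue.extend(new_tasks)
    # A returned result means that task's coding finished: delete it
    done = [t for t in queue if t.out_addr in returned_data]
    for t in done:
        queue.remove(t)
    return [t.task_id for t in done]

q = deque()
poll_once(q, [TurboTask(1, 0x10, 0x20, 64), TurboTask(2, 0x30, 0x40, 64)], {})
finished = poll_once(q, [], {0x20: b"coded"})  # task 1's output came back
```

Because the loop never blocks on the accelerator, the same iteration both feeds new work in and drains completed work out.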
Correspondingly, the Turbo coding hardware acceleration unit can periodically copy the whole task queue and compare it with the task queue copied last time. If a new task insertion is found, it judges, according to its own running state (whether there are enough resources to execute the new task), whether to start the memory copy moving work for the pending data. Since the accelerator function is realized by hardware, its real-time performance is high, and the period of copying the task queue can reach the microsecond level, ensuring the real-time response of the Turbo coding accelerator to the base station main equipment.
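The accelerator-side copy-and-compare step can be sketched as a difference against the last snapshot. The free-slot counter used as the resource check here is purely an assumption for illustration:

```python
# Model of the accelerator periodically copying the task queue and diffing
# it against the previous copy to discover newly inserted tasks.

def find_new_tasks(current_queue, last_snapshot):
    # Tasks present now but absent from the previous copy are new insertions
    return [t for t in current_queue if t not in last_snapshot]

def maybe_start_copy(new_tasks, free_slots):
    # Start the pending-data memory copy only for tasks the accelerator's
    # running state can accommodate (resource check reduced to a counter)
    return new_tasks[:free_slots]

snapshot = ["task-1"]
queue_now = ["task-1", "task-2", "task-3"]
new = find_new_tasks(queue_now, snapshot)
started = maybe_start_copy(new, free_slots=1)
snapshot = list(queue_now)  # this copy becomes the next comparison baseline
```

In hardware this diff runs every few microseconds, so the host-side queue and the accelerator's view stay closely synchronized without any interrupt traffic.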
As can be seen from the embodiment of the present invention described above, compared with the traditional hardware accelerator processing scheme, the interaction between the host and the accelerator board for the DMA configuration of each data moving is eliminated, improving processing efficiency. In addition, the embodiment of the present invention checks memory by polling at the host rather than polling the accelerator registers through the interface, improving the checking efficiency. Furthermore, the embodiment of the present invention additionally uses thread core binding and HugePages to improve the real-time performance of the polling procedure on the general processing platform, reducing the impact of the non-real-time nature of the general processing platform.
Described above are preferred embodiments of the present invention. It should be noted that those skilled in the art may also make improvements and modifications without departing from the principle of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (16)

  1. A data moving method, characterized by comprising:
    an accelerator board receiving a host memory address and a memory data moving size for a direct memory access (DMA) transfer sent by a host;
    the accelerator board presetting input/output data addresses of the accelerator board according to the host memory address and the memory data moving size, and performing assignment processing of read and write operations on a DMA register group;
    the accelerator board starting data moving between the accelerator board and the host.
  2. The data moving method as claimed in claim 1, characterized in that the step of the accelerator board starting data moving between the accelerator board and the host comprises:
    the accelerator board filling in a DMA start flag bit for triggering a host memory read operation;
    encapsulating a read request packet for host memory and sending it to the host, receiving a memory data encapsulation packet sent by the host, and unpacking it to obtain original data;
    judging the validity of the original data according to a preset data structure, and processing the original data when the original data is valid.
  3. The data moving method as claimed in claim 2, characterized in that the step of the accelerator board starting data moving between the accelerator board and the host further comprises:
    the accelerator board processing the original data sent by the host, obtaining the processing result as data to be moved, and filling in a DMA start flag bit for triggering a host memory write operation;
    assembling the data to be moved into a memory write data packet;
    transmitting the memory write data packet to the host memory address according to the input/output data addresses of the accelerator board.
  4. A data moving method, characterized by comprising:
    a host writing data that needs to be processed by an accelerator board into a read memory address;
    the host sending to the accelerator board a host memory address and a memory data moving size for a direct memory access (DMA) transfer, the host memory address including the read memory address and a write memory address;
    the host receiving a read request packet for host memory sent by the accelerator board;
    the host encapsulating the data in the read memory address that needs to be processed by the accelerator board, constructing a memory data encapsulation packet and sending it to the accelerator board.
  5. The data moving method as claimed in claim 4, characterized by further comprising:
    the host receiving a memory write data packet for the host memory address sent by the accelerator board; and unpacking the memory write data packet, obtaining data to be written, and writing it into the host memory address.
  6. The data moving method as claimed in claim 4, characterized by further comprising:
    the host polling the write memory address and judging whether newly written data exists;
    when newly written data exists, triggering a task processing flow of the newly written data.
  7. The data moving method as claimed in claim 4, characterized in that,
    before the step of the host sending to the accelerator board the host memory address and the memory data moving size for the DMA transfer, the method further comprises:
    allocating the host memory for the DMA transfer from HugePages in advance, and allocating multiple buffer units of the same size for packet transmission.
  8. An accelerator board, including a DMA register group, characterized by further comprising:
    an interface unit, configured to receive a host memory address and a memory data moving size for a direct memory access (DMA) transfer sent by a host;
    a register assignment unit, configured to preset input/output data addresses of the accelerator board according to the host memory address and the memory data moving size, and to perform assignment processing of read and write operations on the DMA register group;
    an accelerator functional unit, configured to start data moving between the accelerator board and the host.
  9. The accelerator board as claimed in claim 8, characterized by further comprising:
    a bottom transaction layer unit, configured to encapsulate, upon the trigger of a DMA start flag bit, a read request packet for host memory and send it to the host, and to receive a memory data encapsulation packet sent by the host and unpack it to obtain original data;
    the accelerator functional unit being specifically configured to fill in the DMA start flag bit for triggering a host memory read operation, to judge the validity of the original data obtained by the bottom transaction layer unit according to a preset data structure, and to process the original data when the original data is valid.
  10. The accelerator board as claimed in claim 9, characterized in that:
    the accelerator functional unit is further configured to fill in, after processing the original data sent by the host and obtaining the processing result as data to be moved, the DMA start flag bit for triggering a host memory write operation;
    the bottom transaction layer unit is further configured to assemble, upon the trigger of the DMA start flag bit, the data to be moved into a memory write data packet, and to transmit the memory write data packet to the host memory address according to the input/output data addresses of the accelerator board.
  11. A host, characterized by comprising:
    a memory unit and an interface unit connected to an accelerator board;
    a CPU unit, configured to write data that needs to be processed by the accelerator board into a read memory address, and to send to the accelerator board a host memory address and a memory data moving size for a direct memory access (DMA) transfer, the host memory address including the read memory address and a write memory address; and
    a bottom transaction layer unit, configured to receive a read request packet for host memory sent by the accelerator board, encapsulate the data in the read memory address that needs to be processed by the accelerator board, construct a memory data encapsulation packet and send it to the accelerator board.
  12. The host as claimed in claim 11, characterized in that:
    the bottom transaction layer unit is further configured to receive a memory write data packet for the host memory address sent by the accelerator board, unpack the memory write data packet, obtain data to be written, and write it into the host memory address.
  13. The host as claimed in claim 12, characterized in that the CPU unit includes:
    a polling processing unit, configured to poll the write memory address and judge whether newly written data exists, and, when newly written data exists, trigger a processing flow of the newly written data.
  14. The host as claimed in claim 13, characterized in that:
    the polling processing unit further performs the operation of polling the write memory address through a dedicated core of the host CPU.
  15. The host as claimed in claim 11, characterized in that:
    the CPU unit is further configured to allocate the host memory for the DMA transfer from HugePages in advance, and to allocate multiple buffer units of the same size for packet transmission.
  16. A data moving system, characterized by comprising the accelerator board as claimed in any one of claims 8 to 10 and the host as claimed in any one of claims 11 to 15.
CN201610848270.7A 2016-09-23 2016-09-23 Data moving method, accelerator board card, host and data moving system Active CN107870879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610848270.7A CN107870879B (en) 2016-09-23 2016-09-23 Data moving method, accelerator board card, host and data moving system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610848270.7A CN107870879B (en) 2016-09-23 2016-09-23 Data moving method, accelerator board card, host and data moving system

Publications (2)

Publication Number Publication Date
CN107870879A true CN107870879A (en) 2018-04-03
CN107870879B CN107870879B (en) 2020-04-03

Family

ID=61751562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610848270.7A Active CN107870879B (en) 2016-09-23 2016-09-23 Data moving method, accelerator board card, host and data moving system

Country Status (1)

Country Link
CN (1) CN107870879B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111031011A (en) * 2019-11-26 2020-04-17 中科驭数(北京)科技有限公司 Interaction method and device of TCP/IP accelerator
CN111045817A (en) * 2019-11-08 2020-04-21 福州瑞芯微电子股份有限公司 PCIe transmission management method, system and device
CN111367839A (en) * 2020-02-21 2020-07-03 苏州浪潮智能科技有限公司 Data synchronization method between host terminal and FPGA accelerator
CN111817818A (en) * 2019-04-11 2020-10-23 中国移动通信集团四川有限公司 Method and device for transmitting data packet
CN111858457A (en) * 2020-07-15 2020-10-30 苏州浪潮智能科技有限公司 Data processing method, device and system and FPGA
CN112131164A (en) * 2020-09-23 2020-12-25 山东云海国创云计算装备产业创新中心有限公司 Data scheduling method and device applied to acceleration board card, acceleration board card and medium
CN112286688A (en) * 2020-11-05 2021-01-29 北京深维科技有限公司 Memory management and use method, device, equipment and medium
CN113190480A (en) * 2021-07-01 2021-07-30 深圳华云信息系统有限公司 Data transmission method and device, electronic equipment and computer readable storage medium
CN114741338A (en) * 2022-06-06 2022-07-12 飞腾信息技术有限公司 Bypass conversion buffer, data updating method, memory management unit and chip

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101046786A (en) * 2007-04-27 2007-10-03 北京中星微电子有限公司 DMA controller and transmission method of implementing high efficient DMA transmission
CN103646001A (en) * 2013-12-12 2014-03-19 北京经纬恒润科技有限公司 DMA (direct memory access) data transmission control method and system

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101046786A (en) * 2007-04-27 2007-10-03 北京中星微电子有限公司 DMA controller and transmission method of implementing high efficient DMA transmission
CN103646001A (en) * 2013-12-12 2014-03-19 北京经纬恒润科技有限公司 DMA (direct memory access) data transmission control method and system

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111817818A (en) * 2019-04-11 2020-10-23 中国移动通信集团四川有限公司 Method and device for transmitting data packet
CN111045817A (en) * 2019-11-08 2020-04-21 福州瑞芯微电子股份有限公司 PCIe transmission management method, system and device
CN111045817B (en) * 2019-11-08 2023-09-26 瑞芯微电子股份有限公司 PCIe transmission management method, system and device
CN111031011B (en) * 2019-11-26 2020-12-25 中科驭数(北京)科技有限公司 Interaction method and device of TCP/IP accelerator
CN111031011A (en) * 2019-11-26 2020-04-17 中科驭数(北京)科技有限公司 Interaction method and device of TCP/IP accelerator
WO2021164452A1 (en) * 2020-02-21 2021-08-26 苏州浪潮智能科技有限公司 Method for data synchronization between host end and fpga accelerator
US11762790B2 (en) 2020-02-21 2023-09-19 Inspur Suzhou Intelligent Technology Co., Ltd. Method for data synchronization between host side and FPGA accelerator
CN111367839B (en) * 2020-02-21 2021-06-29 苏州浪潮智能科技有限公司 Data synchronization method between host terminal and FPGA accelerator
CN111367839A (en) * 2020-02-21 2020-07-03 苏州浪潮智能科技有限公司 Data synchronization method between host terminal and FPGA accelerator
JP2022554039A (en) * 2020-02-21 2022-12-27 ▲蘇▼州浪潮智能科技有限公司 Data synchronization method between host side and FPGA accelerator
JP7270113B2 (en) 2020-02-21 2023-05-09 ▲蘇▼州浪潮智能科技有限公司 Data synchronization method between host side and FPGA accelerator
CN111858457A (en) * 2020-07-15 2020-10-30 苏州浪潮智能科技有限公司 Data processing method, device and system and FPGA
CN111858457B (en) * 2020-07-15 2023-01-10 苏州浪潮智能科技有限公司 Data processing method, device and system and FPGA
CN112131164A (en) * 2020-09-23 2020-12-25 山东云海国创云计算装备产业创新中心有限公司 Data scheduling method and device applied to acceleration board card, acceleration board card and medium
CN112131164B (en) * 2020-09-23 2022-06-17 山东云海国创云计算装备产业创新中心有限公司 Data scheduling method and device applied to acceleration board card, acceleration board card and medium
CN112286688A (en) * 2020-11-05 2021-01-29 北京深维科技有限公司 Memory management and use method, device, equipment and medium
CN112286688B (en) * 2020-11-05 2024-01-05 北京深维科技有限公司 Memory management and use method, device, equipment and medium
CN113190480A (en) * 2021-07-01 2021-07-30 深圳华云信息系统有限公司 Data transmission method and device, electronic equipment and computer readable storage medium
CN114741338B (en) * 2022-06-06 2022-09-13 飞腾信息技术有限公司 Bypass conversion buffer, data updating method, memory management unit and chip
CN114741338A (en) * 2022-06-06 2022-07-12 飞腾信息技术有限公司 Bypass conversion buffer, data updating method, memory management unit and chip

Also Published As

Publication number Publication date
CN107870879B (en) 2020-04-03

Similar Documents

Publication Publication Date Title
CN107870879A (en) A kind of data-moving method, accelerator board, main frame and data-moving system
EP3748510A1 (en) Network interface for data transport in heterogeneous computing environments
US7996569B2 (en) Method and system for zero copy in a virtualized network environment
US9781211B2 (en) Storage device having master and slave storage device modes
US20200133909A1 (en) Writes to multiple memory destinations
US8494833B2 (en) Emulating a computer run time environment
US20080189432A1 (en) Method and system for vm migration in an infiniband network
EP3593253B1 (en) Communication method and device for virtual base stations
CN102301671B (en) Message communication techniques
US20160350261A1 (en) Memory Deduplication Support for Remote Direct Memory Access (RDMA)
EP2284702A1 (en) Operating cell processors over a network
CN114553635B (en) Data processing method, data interaction method and product in DPU network equipment
CN105677597A (en) Data writing method and device
US20050144402A1 (en) Method, system, and program for managing virtual memory
EP3077914B1 (en) System and method for managing and supporting virtual host bus adaptor (vhba) over infiniband (ib) and for supporting efficient buffer usage with a single external memory interface
US20230251894A1 (en) Method of executing programmable atomic unit resources within a multi-process system
US20220138142A1 (en) Initialization sequencing of chiplet i/o channels within a chiplet system
WO2022115166A2 (en) Registering a custom atomic operation with the operating system
CN102799491A (en) Inter-virtual-machine secure communication method
CN114925012A (en) Ethernet frame issuing method, Ethernet frame uploading method and related devices
CN103678163A (en) Method, device and system for switching of data stream
US11698791B2 (en) On-demand programmable atomic kernel loading
CN115473811A (en) Network performance optimization method, device, equipment and medium
KR102426416B1 (en) Method for processing input and output on multi kernel system and apparatus for the same
US20220121486A1 (en) Rescheduling a failed memory request in a processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant