CN101650698B - Method for realizing direct memory access - Google Patents

Method for realizing direct memory access

Info

Publication number
CN101650698B
CN101650698B (application CN2009100918351A)
Authority
CN
China
Prior art keywords
buffer area
message
network interface
interface card
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN2009100918351A
Other languages
Chinese (zh)
Other versions
CN101650698A (en)
Inventor
聂华
邵宗有
历军
窦晓光
刘新春
刘朝晖
李永成
贺志强
刘兴奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dawning Information Industry Beijing Co Ltd
Dawning Information Industry Co Ltd
Original Assignee
Dawning Information Industry Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dawning Information Industry Beijing Co Ltd filed Critical Dawning Information Industry Beijing Co Ltd
Priority to CN2009100918351A priority Critical patent/CN101650698B/en
Publication of CN101650698A publication Critical patent/CN101650698A/en
Application granted granted Critical
Publication of CN101650698B publication Critical patent/CN101650698B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a method and an apparatus for realizing direct memory access (DMA). The method comprises the following steps: a network interface card (NIC) determines the thread corresponding to a received message according to the message's control information; the NIC determines the buffer on the host corresponding to that thread and writes the message into the buffer, wherein each buffer corresponds to a CPU core on the host. With the invention, each DMA queue corresponds to one CPU core and one software thread, and the software's data-processing threads rarely interact, which avoids the access conflicts of the related art and reduces the multithread synchronization overhead of single-queue DMA methods. The system's processor resources are fully utilized, improving DMA transfer bandwidth and processing efficiency.

Description

Method for implementing direct memory access
Technical field
The present invention relates to the field of communications, and in particular to a method for implementing direct memory access (abbreviated DMA).
Background technology
DMA is a high-speed data-transfer mechanism that allows data to be written directly between an external device and memory. The CPU performs only a small amount of processing when a transfer begins and ends; the read/write process itself neither passes through the CPU nor requires its intervention, so the CPU can perform other work during the transfer. In other words, for most of a DMA operation, the CPU's processing and the memory I/O proceed in parallel, which can greatly improve the efficiency of the whole computer system.
In existing symmetrical multiprocessing (abbreviated SMP) systems, DMA is implemented with a single engine and a single queue: either multiple processors share one copy of the DMA data, or each CPU core maintains its own copy by memory copying. As a result, when multiple CPU cores process the DMA data of one device in parallel, heavy memory-access conflicts arise between the CPUs; avoiding these conflicts requires synchronization, which consumes large amounts of CPU resources and reduces the efficiency of DMA operations.
For this related-art problem, in which access conflicts in DMA operations consume CPU resources and degrade DMA efficiency, no effective solution has yet been proposed.
Summary of the invention
In view of the related-art problem that access conflicts in direct memory access operations consume CPU resources and degrade DMA efficiency, the object of the present invention is to provide an implementation of direct memory access that addresses at least one of the above problems.
To achieve the above object, the present invention provides a method for implementing direct memory access.
The method according to the invention comprises: a network interface card (NIC) determines the thread corresponding one-to-one to a received message according to the message's control information; the NIC then determines the buffer on the host corresponding one-to-one to that thread and writes the message into the buffer, wherein each buffer corresponds one-to-one to a CPU core on the host.
The operation of determining the message's corresponding thread comprises: the NIC computes a hash of the message's control information to obtain a hash value, and determines the corresponding thread according to that value.
Further, after the NIC determines the buffer corresponding one-to-one to the thread, the method may comprise: the NIC determines the address and length of the buffer according to buffer descriptor information cached locally on the NIC, so that the subsequent write of the message can be performed.
In addition, the NIC's processing of writing the message into the buffer comprises: the NIC sends a write request to the host, and completes the write of the message when the host returns a completion signal in response to the write request.
Finally, after the NIC has written the message into the buffer, the method may further comprise: the NIC updates, in its local cache, the end address written in that buffer.
Preferably, the control information is one of the following: 1-tuple, 2-tuple, 3-tuple, 4-tuple, or 5-tuple information.
According to another aspect of the invention, an apparatus for implementing direct memory access is provided; the apparatus may be arranged on the NIC side.
The apparatus according to the invention comprises: a receiving module for receiving messages; a first determination module for determining, from the control information of the message received by the receiving module, the thread corresponding one-to-one to the message; a second determination module for determining the buffer on the host corresponding one-to-one to the determined thread; and a writing module for writing the message into the determined buffer, wherein each buffer corresponds one-to-one to a CPU core on the host.
The first determination module may compute a hash of the message's control information to obtain a hash value, and determine the corresponding thread according to that value.
The apparatus may further comprise: a register for storing the address and length of each buffer on the host, for the writing module to consult when writing messages.
Preferably, the control information is one of the following: 1-tuple, 2-tuple, 3-tuple, 4-tuple, or 5-tuple information.
Through at least one of the above technical schemes, DMA adopts a multi-queue single-engine method in which each DMA queue corresponds to one CPU core and one software thread. The software's data-processing threads are almost entirely independent, which avoids the access conflicts of the related art, reduces the multithread synchronization overhead of single-queue DMA methods, makes full use of the system's processor resources, and improves DMA transfer bandwidth and processing efficiency.
Description of drawings
Fig. 1 is a flowchart of the method for implementing direct memory access according to the method embodiment of the invention;
Fig. 2 is a block diagram of the apparatus for implementing direct memory access according to the apparatus embodiment of the invention;
Fig. 3 is a schematic diagram of the apparatus according to the apparatus embodiment of the invention performing a message write to the host.
Embodiment
Functional overview
In view of the related-art problem that access conflicts in DMA operations consume CPU resources and degrade DMA efficiency, the present invention proposes configuring, on the host, one buffer for each of the host's CPU cores; each buffer likewise corresponds one-to-one to the thread on that CPU core. The NIC writes each received message into the corresponding buffer according to a pre-configured mapping between threads and messages, thereby avoiding the access conflicts caused by multiple CPU cores accessing a shared buffer and improving the efficiency of DMA operations.
Embodiments of the invention are described in detail below.
Method embodiment
This embodiment provides a method for implementing direct memory access.
Before the scheme of the invention is put into practice, a corresponding buffer must first be configured for the thread of each CPU core of the host, the information of each buffer must be recorded on the NIC, and the message hash value corresponding to each thread must be configured; concretely, the two sides of the mapping may be hash values and thread numbers.
Fig. 1 is a flowchart of the method for implementing direct memory access according to this embodiment. As shown in Fig. 1, the method comprises:
Step S102: the NIC determines the thread corresponding one-to-one to a received message according to the message's control information. Specifically, the NIC may compute a hash of the control information and determine the thread corresponding to the hash value; the control information may be 1-tuple, 2-tuple, 3-tuple, 4-tuple, or 5-tuple information;
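The hash-based steering of step S102 can be sketched as follows. The patent does not name a concrete hash algorithm, so CRC32 stands in here as an illustrative placeholder; the function name `select_queue` and the sample 5-tuple are hypothetical, introduced only for illustration.

```python
import zlib

def select_queue(five_tuple, num_queues):
    """Hash a message's control information (here a 5-tuple) onto a
    queue index. The queue index maps one-to-one to a software thread
    and to a per-CPU-core buffer on the host. CRC32 is a stand-in for
    the patent's unspecified hash algorithm."""
    key = ":".join(str(field) for field in five_tuple).encode()
    return zlib.crc32(key) % num_queues

# Messages of the same flow always land on the same queue/thread:
flow = ("10.0.0.1", 1234, "10.0.0.2", 80, "TCP")
q1 = select_queue(flow, 8)
q2 = select_queue(flow, 8)
assert q1 == q2 and 0 <= q1 < 8
```

Because the mapping is a pure function of the header fields, no inter-thread coordination is needed to keep a flow's messages on one core.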
Step S104: the NIC determines the buffer on the host corresponding one-to-one to that thread and writes the message into the buffer, wherein each buffer corresponds one-to-one to a CPU core on the host.
After the NIC determines the buffer corresponding one-to-one to the thread, it must determine the address and length of that buffer according to the buffer descriptor information cached locally on the NIC, so that the subsequent write of the message can be performed.
When writing a message into the buffer, the NIC sends a write request to the host, and completes the write of the message when the host returns a completion signal in response to the request. If the write request is large, the NIC first divides it into several smaller requests before sending them to the host.
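The division of an oversized write request can be sketched as follows. This is a minimal model: the patent only states that large requests are divided before being sent, so the function name and the `max_payload` parameter (standing in for the host interface's maximum request size, e.g. the PCIe maximum payload) are assumptions.

```python
def split_write_request(addr, length, max_payload):
    """Split one large DMA write into a list of (address, length)
    sub-requests, each no larger than max_payload bytes."""
    requests = []
    offset = 0
    while offset < length:
        chunk = min(max_payload, length - offset)  # last chunk may be short
        requests.append((addr + offset, chunk))
        offset += chunk
    return requests

# A 700-byte write with a 256-byte payload limit becomes three requests:
assert split_write_request(0x1000, 700, 256) == [
    (0x1000, 256), (0x1100, 256), (0x1200, 188)]
```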
After the NIC writes a message into a buffer, it updates, in its local cache, the end address written in that buffer, so that the next message written into that buffer continues from that address.
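The NIC-local bookkeeping of each buffer's end address might look like the following sketch. The class and field names are illustrative, not taken from the patent, which says only that the NIC records each buffer's address and length and updates the end address after every write.

```python
class BufferDescriptor:
    """NIC-local cache of one host buffer's descriptor: base address,
    total length, and the current write offset (end address)."""
    def __init__(self, base, length):
        self.base = base
        self.length = length
        self.write_off = 0  # end address, relative to base

    def claim(self, n):
        """Return the host address for an n-byte message and advance
        the end address so the next message continues from there."""
        if self.write_off + n > self.length:
            raise BufferError("buffer queue full")
        addr = self.base + self.write_off
        self.write_off += n
        return addr

d = BufferDescriptor(base=0x2000, length=4096)
assert d.claim(100) == 0x2000
assert d.claim(50) == 0x2064  # next write starts at the updated end address
```

Because each queue's descriptor is touched by exactly one thread's traffic, this update needs no locking.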
Through the above processing, each CPU accesses only its own corresponding buffer, avoiding the access conflicts caused by interleaved access in the related art, saving CPU resources, and effectively improving the efficiency of DMA operations.
Device embodiment
This embodiment provides an apparatus for implementing direct memory access; the apparatus may be arranged on the NIC side.
Fig. 2 is a block diagram of the apparatus for implementing direct memory access according to this embodiment. As shown in Fig. 2, the apparatus comprises:
Receiving module 1, for receiving messages;
First determination module 2, connected to receiving module 1, for determining, from the control information of the received message, the thread corresponding one-to-one to the message. Preferably, first determination module 2 computes a hash of the message's control information to obtain a hash value, and determines the corresponding thread from a pre-configured mapping of hash values to thread numbers. Preferably, the control information may be 1-tuple, 2-tuple, 3-tuple, 4-tuple, or 5-tuple information;
Second determination module 3, connected to first determination module 2, for determining the buffer on the host corresponding one-to-one to the thread determined by first determination module 2;
Writing module 4, connected to second determination module 3, for writing the message into the buffer determined by second determination module 3, wherein each buffer corresponds one-to-one to a CPU core on the host.
The apparatus may further comprise: register 5, connected to writing module 4 and second determination module 3, for storing the address and length of each buffer on the host, consulted when second determination module 3 determines a buffer and when writing module 4 writes a message, so as to locate the buffer to be written.
In practical applications, the apparatus according to the invention may be connected to the host in the manner shown in Fig. 3, and the modules of the apparatus may be further divided or merged by function.
As shown in Fig. 3, the left side is the NIC side and the right side is the host side. The NIC side mainly comprises: a data acquisition and processing module, which receives messages and may compute a hash of each message's control information to obtain a hash value; a determination module, which processes the hash value from the data acquisition and processing module and then determines the corresponding thread and buffer (that is, it implements the determination functions of first determination module 2 and second determination module 3 of Fig. 2); a DMA write engine (the DMA write control module of Fig. 3), which organizes DMA write requests, performs queue selection (i.e., selects one buffer among many), runs the flow-distribution algorithm (i.e., determines the thread), maintains the DMA write descriptors (i.e., updates the buffer address, length, and other information kept in the NIC's local registers), sends write requests to the host's interface controller before writing to a host buffer, and may also split a pending request into packets in the controller's data-packet format before sending it to the host interface controller; and registers, which store the address, length, and other information of each buffer.
When writing, the concrete processing is as follows:
(1) When the host loads the driver bound to the NIC device, it allocates one buffer queue for the DMA write engine per CPU core, i.e., each CPU core is assigned its own one-to-one buffer, and the descriptor information of the buffer queues is written into the NIC registers;
(2) On receiving an IP message, the data acquisition and processing module extracts the 4-tuple (or a 1-tuple, 2-tuple, 3-tuple, or 5-tuple) from the IP header, computes a hash value with a hash algorithm, and submits the IP message together with its hash value to the determination module;
(3) The determination module computes the message's thread number from the hash value and the software-configured flow-distribution algorithm, then selects the descriptor information (queue address and length) of the corresponding DMA buffer queue according to the thread number;
(4) The DMA write control module packages the write request into the PCIe transaction-layer packet format and sends it to the PCIe interface (or PCIe interface controller) for transmission; when all data of the request have been transmitted, a completion signal is returned to the DMA write control module;
(5) After receiving the completion signal, the DMA write control module finishes the transmission of one IP message, then updates the information of the DMA write-buffer queue in the registers and begins processing the transmission of the next IP packet.
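Steps (1) through (5) can be modeled end-to-end in a toy simulation. All names here are illustrative, CRC32 stands in for the patent's unspecified hash algorithm, and the completion-signal handshake of steps (4)-(5) is collapsed into a single call.

```python
import zlib

class QueueState:
    """Per-queue descriptor: base address, capacity, current end offset."""
    def __init__(self, base, length):
        self.base, self.length, self.off = base, length, 0

class NicModel:
    """Toy model of write steps (1)-(5) from the embodiment."""
    def __init__(self, queue_bases, queue_len):
        # Step (1): at driver load, one buffer queue is registered per CPU core.
        self.queues = [QueueState(b, queue_len) for b in queue_bases]

    def receive(self, header_tuple, payload):
        # Step (2): hash the tuple extracted from the IP header.
        h = zlib.crc32(repr(header_tuple).encode())
        # Step (3): the flow-distribution algorithm maps the hash to a
        # thread number, which selects the corresponding buffer queue.
        idx = h % len(self.queues)
        q = self.queues[idx]
        # Step (4): the write request targets the queue's current end address.
        addr = q.base + q.off
        # Step (5): after the completion signal, advance the end address.
        q.off += len(payload)
        return idx, addr

nic = NicModel([0x1000, 0x2000], 4096)
q1, a1 = nic.receive(("1.1.1.1", 1, "2.2.2.2", 2), b"x" * 10)
q2, a2 = nic.receive(("1.1.1.1", 1, "2.2.2.2", 2), b"y" * 5)
assert q1 == q2          # same flow stays on the same queue/core
assert a2 == a1 + 10     # second message continues at the updated end address
```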
During the writes in the above steps, the host-side chipset places the data into the corresponding buffer in the host's main memory; the main memory contains buffers 1 through n corresponding one-to-one to the CPU cores core 1, core 2, ..., core n.
In the present invention, the DMA write engine (DMA write control module) can support 64 DMA write-buffer queues, and the number of buffer queues started by software and the traffic proportion of each queue are configurable, satisfying the requirements of mainstream SMP servers now and in the coming years. The DMA write engine can also easily be extended to support 128 or even more buffer queues.
Through the above apparatus, each CPU accesses only its own corresponding buffer, avoiding the access conflicts caused by interleaved access in the related art, saving CPU resources, and effectively improving the efficiency of DMA operations.
In summary, for the multi-CPU, multi-core architecture of symmetrical multiprocessing (abbreviated SMP) systems, the invention proposes a multi-queue single-engine DMA method in which each DMA queue corresponds to one CPU core and one software thread. The software's data-processing threads are almost entirely independent, which avoids the access conflicts and resource contention of the related art, reduces the multithread synchronization overhead and operating-system protocol-stack overhead of single-queue DMA methods, makes full use of the system's processor resources, improves DMA transfer bandwidth and processing efficiency, and improves system performance.
Obviously, those skilled in the art should understand that the above modules or steps of the invention can be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they may each be made into individual integrated-circuit modules, or several of the modules or steps may be made into a single integrated-circuit module. Thus the invention is not restricted to any specific combination of hardware and software.
The above are only the preferred embodiments of the invention and are not intended to limit it; for those skilled in the art, the invention may have various changes and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall be included within the scope of protection of the invention.

Claims (8)

1. A method for implementing direct memory access, characterized by comprising:
a network interface card computes a hash of a message's control information to obtain a hash value and determines the corresponding thread according to the hash value, thereby determining the thread corresponding one-to-one to the message;
the network interface card determines the buffer on a host corresponding one-to-one to the thread and writes the message into the buffer, wherein each buffer corresponds one-to-one to a CPU core on the host.
2. The method according to claim 1, characterized in that, after the network interface card determines the buffer corresponding one-to-one to the thread, the method further comprises:
the network interface card determines the address and length of the buffer according to buffer descriptor information cached locally, so that the subsequent write of the message can be performed.
3. The method according to claim 1, characterized in that the network interface card's processing of writing the message into the buffer comprises:
the network interface card sends a write request to the host and completes the write of the message when the host returns a completion signal in response to the write request.
4. The method according to claim 1, characterized in that, after the network interface card writes the message into the buffer, the method further comprises:
the network interface card updates, in its local cache, the end address written in the buffer.
5. The method according to any one of claims 1 to 4, characterized in that the control information is one of the following: 1-tuple, 2-tuple, 3-tuple, 4-tuple, or 5-tuple information.
6. An apparatus for implementing direct memory access, arranged on the network interface card side, characterized in that the apparatus comprises:
a receiving module, for receiving messages;
a first determination module, for computing a hash of the message's control information to obtain a hash value and determining the corresponding thread according to the hash value, thereby determining the thread corresponding one-to-one to the message;
a second determination module, for determining the buffer on a host corresponding one-to-one to the determined thread;
a writing module, for writing the message into the determined buffer, wherein each buffer corresponds one-to-one to a CPU core on the host.
7. The apparatus according to claim 6, characterized by further comprising:
a register, for storing the address and length of each buffer on the host, for the writing module to consult when writing messages.
8. The apparatus according to claim 6 or 7, characterized in that the control information is one of the following: 1-tuple, 2-tuple, 3-tuple, 4-tuple, or 5-tuple information.
CN2009100918351A 2009-08-28 2009-08-28 Method for realizing direct memory access Active CN101650698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100918351A CN101650698B (en) 2009-08-28 2009-08-28 Method for realizing direct memory access

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100918351A CN101650698B (en) 2009-08-28 2009-08-28 Method for realizing direct memory access

Publications (2)

Publication Number Publication Date
CN101650698A CN101650698A (en) 2010-02-17
CN101650698B true CN101650698B (en) 2011-11-16

Family

ID=41672937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100918351A Active CN101650698B (en) 2009-08-28 2009-08-28 Method for realizing direct memory access

Country Status (1)

Country Link
CN (1) CN101650698B (en)

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103490B (en) * 2010-12-17 2014-07-23 曙光信息产业股份有限公司 Method for improving memory efficiency by using stream processing
CN102045199A (en) * 2010-12-17 2011-05-04 天津曙光计算机产业有限公司 Performance optimization method for multi-server multi-buffer zone parallel packet sending
CN102541779B (en) * 2011-11-28 2015-07-08 曙光信息产业(北京)有限公司 System and method for improving direct memory access (DMA) efficiency of multi-data buffer
CN102497322A (en) * 2011-12-19 2012-06-13 曙光信息产业(北京)有限公司 High-speed packet filtering device and method realized based on shunting network card and multi-core CPU (Central Processing Unit)
CN102567226A (en) * 2011-12-31 2012-07-11 曙光信息产业股份有限公司 Data access implementation method and data access implementation device
CN102571580A (en) * 2011-12-31 2012-07-11 曙光信息产业股份有限公司 Data receiving method and computer
CN102541803A (en) * 2011-12-31 2012-07-04 曙光信息产业股份有限公司 Data sending method and computer
CN102662865B (en) * 2012-04-06 2014-11-26 福建星网锐捷网络有限公司 Multi-core CPU (central processing unit) cache management method, device and equipment
CN102984085A (en) * 2012-11-21 2013-03-20 网神信息技术(北京)股份有限公司 Mapping method and device
CN103532876A (en) * 2013-10-23 2014-01-22 中国科学院声学研究所 Processing method and system of data stream
CN104753813B (en) * 2013-12-27 2018-03-16 国家计算机网络与信息安全管理中心 The method that DMA transmits message
CN105786733B (en) * 2014-12-26 2020-08-07 南京中兴新软件有限责任公司 Method and device for writing TCAM (ternary content addressable memory) entries
CN105187235A (en) * 2015-08-12 2015-12-23 广东睿江科技有限公司 Message processing method and device
CN105094992B (en) * 2015-09-25 2018-11-02 浪潮(北京)电子信息产业有限公司 A kind of method and system of processing file request
CN107025064B (en) * 2016-01-30 2019-12-03 北京忆恒创源科技有限公司 A kind of data access method of the high IOPS of low latency
CN107615259B (en) * 2016-04-13 2020-03-20 华为技术有限公司 Data processing method and system
CN105930397B (en) * 2016-04-15 2019-05-17 北京思特奇信息技术股份有限公司 A kind of message treatment method and system
CN108134855B (en) * 2017-12-18 2021-03-09 东软集团股份有限公司 ARP table management method, processor core, storage medium and electronic device
US11132233B2 (en) * 2018-05-07 2021-09-28 Micron Technology, Inc. Thread priority management in a multi-threaded, self-scheduling processor
CN111240813A (en) * 2018-11-29 2020-06-05 杭州嘉楠耘智信息科技有限公司 DMA scheduling method, device and computer readable storage medium
CN110012025B (en) * 2019-04-17 2020-10-30 浙江禾川科技股份有限公司 Data transmission method, system and related device in image acquisition process
CN110147254A (en) * 2019-05-23 2019-08-20 苏州浪潮智能科技有限公司 A kind of data buffer storage processing method, device, equipment and readable storage medium storing program for executing
US11301295B1 (en) * 2019-05-23 2022-04-12 Xilinx, Inc. Implementing an application specified as a data flow graph in an array of data processing engines
CN113296972A (en) * 2020-07-20 2021-08-24 阿里巴巴集团控股有限公司 Information registration method, computing device and storage medium
CN112100111B (en) * 2020-09-15 2022-04-26 浪潮集团有限公司 Control method of multiple AWG board cards
CN112702275A (en) * 2020-12-29 2021-04-23 迈普通信技术股份有限公司 Method, device, network equipment and computer storage medium based on packet-by-packet forwarding
CN112929183B (en) * 2021-01-26 2023-01-20 北京百度网讯科技有限公司 Intelligent network card, message transmission method, device, equipment and storage medium
WO2022227053A1 (en) * 2021-04-30 2022-11-03 华为技术有限公司 Communication device and communication method

Also Published As

Publication number Publication date
CN101650698A (en) 2010-02-17

Similar Documents

Publication Publication Date Title
CN101650698B (en) Method for realizing direct memory access
EP2406723B1 (en) Scalable interface for connecting multiple computer systems which performs parallel mpi header matching
US9935899B2 (en) Server switch integration in a virtualized system
US8209690B2 (en) System and method for thread handling in multithreaded parallel computing of nested threads
CN100590609C (en) Method for managing dynamic internal memory base on discontinuous page
CN100592273C (en) Apparatus and method for performing DMA data transfer
JP4275085B2 (en) Information processing apparatus, information processing method, and data stream generation method
CN103412786B (en) High performance server architecture system and data processing method thereof
CN111431757B (en) Virtual network flow acquisition method and device
WO2015066489A2 (en) Efficient implementations for mapreduce systems
WO2014166404A1 (en) Network data packet processing method and device
JP2006107514A (en) System and device which have interface device which can perform data communication with external device
US9727521B2 (en) Efficient CPU mailbox read access to GPU memory
CN105302489B (en) A kind of remote embedded accumulator system of heterogeneous polynuclear and method
US20080225858A1 (en) Data transferring apparatus and information processing system
CN106293953A (en) A kind of method and system accessing shared video data
CN109101439B (en) Message processing method and device
CN105224258B (en) The multiplexing method and system of a kind of data buffer zone
WO2005088458A2 (en) A method and system for coalescing coherence messages
US6847990B2 (en) Data transfer unit with support for multiple coherency granules
US20120096245A1 (en) Computing device, parallel computer system, and method of controlling computer device
CN115529275B (en) Message processing system and method
US9424227B2 (en) Providing byte enables for peer-to-peer data transfer within a computing environment
CN101572689A (en) Method and device for transmitting data between network interface card and accelerators in multi-processor system
CN103377085A (en) Method, device and system for instruction management and operation core

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP02 Change in the address of a patent holder

Address after: 100193 Beijing, Haidian District, northeast Wang West Road, building 8, No. 36

Patentee after: Dawning Information Industry (Beijing) Co.,Ltd.

Address before: 100084 No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, Haidian District

Patentee before: Dawning Information Industry (Beijing) Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20220726

Address after: 100089 building 36, courtyard 8, Dongbeiwang West Road, Haidian District, Beijing

Patentee after: Dawning Information Industry (Beijing) Co.,Ltd.

Patentee after: DAWNING INFORMATION INDUSTRY Co.,Ltd.

Address before: 100193 No. 36 Building, No. 8 Hospital, Wangxi Road, Haidian District, Beijing

Patentee before: Dawning Information Industry (Beijing) Co.,Ltd.

TR01 Transfer of patent right