CN117714400A - Network performance comprehensive optimization method for Loongson 3A domestic software and hardware platform


Publication number
CN117714400A
Authority
CN
China
Prior art keywords
interrupt
loongson
hardware platform
optimization method
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311621493.6A
Other languages
Chinese (zh)
Inventor
王云涛
虞文武
张振华
孟浩飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Abstract

The invention provides a comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform. An optimized network card driver reduces the interrupt frequency and the number of copies at the domestic network card, and avoids a large number of DMA memory allocation and release operations during data forwarding, thereby reducing repeated memory operations and the memory management burden of the system. After a data packet is forwarded from the domestic network card to the domestic operating system kernel, the generated interrupts are load-balanced in the kernel by an interrupt rotation load balancing method, so that every processor core of the Loongson 3A takes part in network interrupt processing, which effectively improves network throughput and reduces the packet loss rate. At the same time, an optimized cache lock reduces the data access frequency in the kernel protocol stack. The resulting comprehensive data forwarding optimization method, designed for the Loongson 3A processor on a domestic software and hardware platform, can meet application requirements for high bandwidth and low latency.

Description

Network performance comprehensive optimization method for Loongson 3A domestic software and hardware platform
Technical Field
The invention belongs to the technical field of network performance optimization, relates to adaptation and optimization technology for domestic software and hardware platforms, and in particular relates to a comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform.
Background
Loongson 3 is a domestic multi-core high-performance processor developed by the Institute of Computing Technology, Chinese Academy of Sciences, and is the first four-core processor in China with fully independent intellectual property rights. The network performance of domestic software and hardware platforms built on the Loongson 3A series of multi-core processors still lags behind that of international mainstream processor platforms. A network performance optimization method for the Loongson 3A domestic software and hardware platform is therefore urgently needed to achieve high-bandwidth, low-latency network data forwarding on Loongson multi-core platforms.
Disclosure of Invention
To solve the above problems, the invention provides a comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform.
To achieve the above purpose, the invention provides the following technical solution:
The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform comprises the following steps:
when the Loongson 3A domestic software and hardware platform receives a data packet transmitted by a transmitting end, data forwarding is performed by a network card with an optimized driver;
after the data packet is forwarded from the network card to the operating system kernel, the generated interrupts are load-balanced in the kernel by an interrupt rotation load balancing method; at the same time, a cache locking operation is performed in the kernel protocol stack.
Further, data forwarding by the network card with the optimized driver comprises the following steps:
(1) Selecting a suitable number of receive message buffers for the network card driver, and establishing a receive message buffer model based on the NAPI polling mechanism;
(2) After the relevant parameters are obtained from the receive message buffer model, pre-allocating that number of receive message buffers when the network card is initialized, performing a DMA (direct memory access) streaming mapping on each receive message buffer, and storing the mapping as a DMA channel parameter;
(3) When the network card receives a data packet and triggers an interrupt, judging whether the network card is under high load and, if so, disabling the hardware interrupt in the interrupt handler and activating a polling thread;
(4) In the polling thread, transferring data packets by DMA one by one into the recorded receive message buffers, setting the specific fields of each message buffer, and passing it to the upper protocol stack;
(5) After the polling thread has processed all data packets in the receive queue, or the maximum number of data packets, re-enabling the hardware interrupt, whereupon the system continues with other tasks until the next interrupt is generated.
Further, when the hardware interrupt is disabled, a timer is started, and when the timer reaches the delay time threshold, the data is sent to the protocol stack.
Further, the interrupt rotation load balancing method comprises the following steps:
after an interrupt signal is received, the interrupt controller processes it in a round-robin manner and sends an inter-core interrupt to the target processor using the designated interrupt signal;
the target processor core that receives the inter-core interrupt reads the IPI_Status register to obtain the distributed interrupt number, and then performs a secondary distribution according to that interrupt number.
Further, the cache lock locks the skb_buff data structure.
Further, the cache lock locks header information in the data packet buffer.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The optimized network card driver reduces the interrupt frequency and the number of copies at the domestic network card, and avoids a large number of DMA memory allocation and release operations during data forwarding, thereby reducing repeated memory operations and the memory management burden of the system. Data packets are processed with a receive model that mixes interrupts and polling: when the network is under high load, the hardware interrupt of the network card is disabled and data packets are handled by polling; otherwise, data packets are received through the interrupt mechanism. The invention load-balances the generated interrupts in the kernel by an interrupt rotation load balancing method. Cache locking is applied to the skb_buff data structure and to the header information in the data packet buffer, so that the optimized cache lock reduces the data access frequency in the kernel protocol stack. The resulting comprehensive data forwarding optimization method, designed for the Loongson 3A processor on a domestic software and hardware platform, can meet application requirements for high bandwidth and low latency.
Drawings
Fig. 1 is a flow chart of a network performance comprehensive optimization method for a Loongson 3A domestic software and hardware platform.
Detailed Description
The technical scheme provided by the present invention will be described in detail with reference to the following specific examples, and it should be understood that the following specific examples are only for illustrating the present invention and are not intended to limit the scope of the present invention.
The invention provides a comprehensive network performance optimization method that is applied to a Loongson 3A domestic software and hardware platform. The Loongson 3A domestic software and hardware platform comprises a Loongson 3A series multi-core processor, a domestic network chip and a domestic operating system.
The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform comprises the following steps:
Step 1. When the Loongson 3A domestic software and hardware platform receives a data packet transmitted by a transmitting end, the optimized network card driver reduces the interrupt frequency and the number of copies at the domestic network card.
The optimization is realized by modifying the network card driver and comprises the following steps:
(1) Selecting a suitable number of receive message buffers for the domestic network card driver, and establishing a receive message buffer model, based on the NAPI (New API) polling mechanism, to determine that number;
(2) After the relevant parameter (the number of buffers) is obtained from the receive message buffer model, pre-allocating that number of receive message buffers when the domestic network card is initialized, performing a DMA (Direct Memory Access) streaming mapping on each receive message buffer, and storing the mapping as a DMA channel parameter;
(3) When the domestic network card receives a data packet and triggers an interrupt, judging whether the domestic network card is under high load and, if so, disabling the hardware interrupt in the interrupt handler and activating a polling thread;
(4) In the polling thread, transferring data packets by DMA one by one into the recorded receive message buffers, setting the specific fields of each message buffer, and passing it to the upper protocol stack;
(5) After the polling thread has processed all data packets in the receive queue, or the maximum number of data packets, re-enabling the hardware interrupt, whereupon the system continues with other tasks until the next interrupt is generated.
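Steps (1) to (5) can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions, not the patented driver: the class names, the `budget` and `high_load_threshold` parameters, and the rule "queue length at or above a threshold means high load" are all hypothetical, since the text does not define how high load is detected.

```python
from collections import deque


class RxBufferPool:
    """Receive buffers allocated once at initialization, then recycled."""

    def __init__(self, count, size):
        self.free = deque(bytearray(size) for _ in range(count))

    def take(self):
        return self.free.popleft() if self.free else None

    def give_back(self, buf):
        self.free.append(buf)


class HybridReceiver:
    """Mixed interrupt/polling receive path (hypothetical model)."""

    def __init__(self, pool, budget, high_load_threshold):
        self.pool = pool
        self.budget = budget                      # max packets per poll round
        self.high_load_threshold = high_load_threshold
        self.rx_queue = deque()                   # packets waiting in the NIC
        self.irq_enabled = True
        self.delivered = []                       # stands in for the protocol stack

    def _receive_one(self):
        if not self.rx_queue:
            return False
        buf = self.pool.take()
        if buf is None:
            return False                          # no free buffer, packet stays queued
        payload = self.rx_queue.popleft()
        n = min(len(payload), len(buf))
        buf[:n] = payload[:n]                     # simulated DMA into the buffer
        self.delivered.append(bytes(buf[:n]))     # hand over to the upper layer
        self.pool.give_back(buf)                  # recycle instead of freeing
        return True

    def on_interrupt(self):
        """Step (3): under high load, mask the interrupt and switch to polling."""
        if len(self.rx_queue) >= self.high_load_threshold:
            self.irq_enabled = False
            self.poll()
        else:
            self._receive_one()

    def poll(self):
        """Steps (4) and (5): drain up to `budget` packets, then unmask."""
        done = 0
        while done < self.budget and self._receive_one():
            done += 1
        self.irq_enabled = True
        return done


pool = RxBufferPool(count=4, size=64)
rx = HybridReceiver(pool, budget=3, high_load_threshold=2)
rx.rx_queue.extend([b"a", b"b", b"c", b"d"])
rx.on_interrupt()   # high load: polls and delivers up to the budget
rx.on_interrupt()   # low load: delivers the remaining packet directly
```

Because every buffer returns to the pool after use, no allocation or release happens on the receive path, which is the effect the text attributes to pre-allocation.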
In particular, the network card optimization of the invention is embodied as an interrupt adjustment algorithm built on two principles: (1) collect as many data packets as possible per interrupt signal; (2) respond to the interrupt signal as quickly as possible, rather than maximizing the number of data packets handled in a single interrupt signal. The interrupt adjustment algorithm designed on these principles uses an empirically chosen delay time threshold of 250 microseconds. When the interrupt signal for a data packet is received, the receive interrupt is disabled immediately and a countdown timer is started; incoming data is then stored in the receive DMA ring, which is a circular buffer. After at most 250 microseconds, the data is sent to the protocol stack and the previously disabled receive interrupt is re-enabled. The process of transmitting data is similar to the process of receiving data.
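The effect of the 250-microsecond delay window can be shown with a short calculation over packet arrival times. This is an illustrative sketch only: the function name `coalesce` is hypothetical, and it assumes that a packet arriving exactly when the timer expires starts a new window, a boundary case the text does not specify.

```python
DELAY_US = 250  # delay time threshold stated in the text


def coalesce(arrivals_us, delay_us=DELAY_US):
    """Group packet arrival times (microseconds) into one batch per interrupt.

    The first packet of a batch raises the interrupt and starts the timer;
    packets arriving before the timer expires join the same batch.
    """
    batches = []
    window_end = None
    for t in sorted(arrivals_us):
        if window_end is None or t >= window_end:
            batches.append([t])          # new interrupt, new timer
            window_end = t + delay_us
        else:
            batches[-1].append(t)        # absorbed without a new interrupt
    return batches


batches = coalesce([0, 100, 249, 250, 400, 900])
interrupts = len(batches)
```

In this example six packets are delivered with three interrupts, and no packet waits longer than the delay threshold before being handed to the protocol stack.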
The interrupt signal is judged as follows:
In the interrupt function, the udma_inr() function (return type irqreturn_t) is called to judge whether the received interrupt signal is a transmit interrupt or a receive interrupt; if the received signal is the interrupt for the first data packet, the udma_irq_disable() function is called to disable the interrupt.
The udma_ring_clean_rx_irq() function is then called to judge whether the current DMA ring descriptor has completed; if so, the udma_rx_ring_pop() function is called to clean the DMA ring descriptor and the data is forwarded onward, thereby cleaning the DMA ring.
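The descriptor check and pop described above can be modelled as follows. This is a simplified simulation, not the driver code: `Desc`, `RxRing`, `hw_write` and `clean` are hypothetical names, and the single `done` flag is an assumed stand-in for whatever completion status the real descriptor carries.

```python
from dataclasses import dataclass


@dataclass
class Desc:
    data: bytes = b""
    done: bool = False   # set by the (simulated) hardware when DMA completes


class RxRing:
    """Circular receive descriptor ring cleaned in order from the head."""

    def __init__(self, size):
        self.descs = [Desc() for _ in range(size)]
        self.head = 0    # next descriptor the software will inspect

    def hw_write(self, index, data):
        """Simulate the NIC finishing a DMA transfer into one descriptor."""
        self.descs[index] = Desc(data, True)

    def clean(self, forward):
        """Pop completed descriptors in order; stop at the first pending one."""
        cleaned = 0
        while cleaned < len(self.descs) and self.descs[self.head].done:
            desc = self.descs[self.head]
            forward(desc.data)                 # pass the packet onward
            desc.data, desc.done = b"", False  # hand the slot back to hardware
            self.head = (self.head + 1) % len(self.descs)
            cleaned += 1
        return cleaned


ring = RxRing(4)
ring.hw_write(0, b"p0")
ring.hw_write(1, b"p1")
ring.hw_write(3, b"p3")   # slot 2 is still pending, so p3 must wait
received = []
ring.clean(received.append)
```

Stopping at the first incomplete descriptor preserves packet order, which is why the completed slot 3 is not forwarded until slot 2 finishes.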
Step 2. After the data packet is forwarded from the domestic network card to the domestic operating system kernel, the generated interrupts are load-balanced in the kernel by an interrupt rotation load balancing method; at the same time, an optimized cache lock reduces the data access frequency in the kernel protocol stack.
The interrupt rotation load balancing method is implemented as follows. The Loongson 3A processor has an inter-core interrupt status register, IPI_Status. When any bit of IPI_Status is set to 1 and the corresponding bit of the IPI_Enable register is enabled, the INT4 interrupt line of the Loongson processor core is asserted; INT4 corresponds to IP6 of the STATUS register, that is, to the inter-core interrupt, so an inter-core interrupt is triggered. After the interrupt controller receives an interrupt, it processes the interrupt number in a round-robin manner and sends an inter-core interrupt to the target processor using the designated interrupt number. The target processor core that receives the inter-core interrupt reads its own IPI_Status register to obtain the distributed interrupt number, and then performs a secondary distribution according to that interrupt number by executing do_IRQ().
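The register logic on the receiving core can be sketched as a bit-mask calculation. The code below is an assumption-laden illustration: it treats each set bit position as one interrupt number, which the text does not confirm, and the function names and the `handlers` table (standing in for do_IRQ()) are hypothetical.

```python
def pending_irqs(ipi_status, ipi_enable):
    """Return the bit positions set in both registers, lowest first."""
    active = ipi_status & ipi_enable     # a bit counts only if it is enabled
    bits = []
    while active:
        lowest = active & -active        # isolate the lowest set bit
        bits.append(lowest.bit_length() - 1)
        active ^= lowest                 # clear it and continue
    return bits


def secondary_dispatch(ipi_status, ipi_enable, handlers):
    """Run the handler registered for each pending interrupt number."""
    handled = []
    for irq in pending_irqs(ipi_status, ipi_enable):
        handler = handlers.get(irq)
        if handler is not None:
            handler(irq)
            handled.append(irq)
    return handled


seen = []
secondary_dispatch(0b101000, 0b111111, {3: seen.append, 5: seen.append})
```

Masking the status value with the enable value mirrors the condition in the text that an inter-core interrupt is raised only when a status bit and its enable bit are both set.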
The interrupt rotation load balancing technique is implemented as follows:
In the kernel interrupt function map_irq_dispatch(), a CPU mask variable is set to mark the target processor core for the next round. The get_irq_ht() function is called in the dispatch_ip3() function to obtain the interrupt number from the HT interrupt register. If the interrupt number is 3 or 5, the loongson3_send_irq_by_ipi() function is called to send an inter-core interrupt and the CPU mask variable is modified, that is, the target processor core for the next round is updated. The loongson3_ipi_interrupt() function is then called to process the inter-core interrupt, and finally the do_IRQ() function is called to handle interrupt distribution.
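The rotation of the target core can be modelled in a few lines. This is a sketch of the idea only, assuming four cores and assuming that interrupts other than 3 and 5 stay on the core that received them; the class and its fields are hypothetical and do not reproduce the kernel functions named in the text.

```python
class RoundRobinDispatcher:
    """Rotate selected interrupts across cores; keep the rest local."""

    def __init__(self, num_cores, forwarded_irqs=(3, 5)):
        self.num_cores = num_cores
        self.forwarded_irqs = set(forwarded_irqs)  # numbers given in the text
        self.next_core = 0                         # plays the role of the CPU mask
        self.per_core = [0] * num_cores            # interrupts handled per core

    def dispatch(self, irq, local_core=0):
        if irq in self.forwarded_irqs:
            target = self.next_core
            # Update the target for the next round, as the text describes.
            self.next_core = (self.next_core + 1) % self.num_cores
        else:
            target = local_core
        self.per_core[target] += 1
        return target


dispatcher = RoundRobinDispatcher(num_cores=4)
targets = [dispatcher.dispatch(3) for _ in range(8)]
```

Over eight network interrupts each of the four cores is chosen twice, which is the even participation of all cores that the method aims for.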
After optimization with the interrupt rotation load balancing method, every processor core of the Loongson 3A takes part in network interrupt processing, which effectively improves network throughput and reduces the packet loss rate.
The optimized cache lock is implemented as follows. Throughout system operation, the addresses of the DMA descriptors and of the transmit and receive queues are fixed, and the dynamically allocated skb_buff structures and data buffers fall within a certain address range. The data involved in the network processing flow therefore has good spatial locality, and cache locking can be used to improve network processing performance. Because the system applies zero-copy technology, the skb_buff data structure is accessed frequently while the data packet buffer is accessed less often; in the specific implementation, therefore, only the header information in the data packet buffer (MAC address, IP address, and TCP/UDP header information) is cache-locked.
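To give a sense of how little cache is needed when only headers are locked, the following sketch computes which cache lines cover the header bytes of one packet buffer. The header lengths are the standard minimum sizes (no VLAN tag, no IP or TCP options); the buffer addresses and the cache line sizes of 32 and 64 bytes are assumptions chosen for illustration, and `lines_to_lock` is a hypothetical helper, not a kernel or hardware interface.

```python
ETH_HDR = 14    # Ethernet II header without VLAN tag
IPV4_HDR = 20   # IPv4 header without options
TCP_HDR = 20    # TCP header without options
UDP_HDR = 8     # UDP header


def lines_to_lock(buf_addr, header_len, line_size):
    """Return the start addresses of the cache lines that cover the header."""
    first = buf_addr // line_size
    last = (buf_addr + header_len - 1) // line_size
    return [n * line_size for n in range(first, last + 1)]


tcp_header_len = ETH_HDR + IPV4_HDR + TCP_HDR   # 54 bytes
udp_header_len = ETH_HDR + IPV4_HDR + UDP_HDR   # 42 bytes

aligned_64 = lines_to_lock(0x1000, tcp_header_len, 64)
aligned_32 = lines_to_lock(0x1000, tcp_header_len, 32)
unaligned_64 = lines_to_lock(0x1030, udp_header_len, 64)
```

With an aligned buffer, the 54-byte TCP/IPv4 header occupies one 64-byte line or two 32-byte lines, whereas a misaligned buffer can spill into an extra line, so buffer alignment affects how many lines must be locked per packet.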
The technical means disclosed by the invention are not limited to those disclosed in the above embodiment, and also include technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art may make modifications and refinements without departing from the principles of the invention, and such modifications and refinements are also regarded as falling within the scope of protection of the invention.

Claims (6)

1. A comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform, characterized by comprising the following steps:
when the Loongson 3A domestic software and hardware platform receives a data packet transmitted by a transmitting end, data forwarding is performed by a network card with an optimized driver;
after the data packet is forwarded from the network card to the operating system kernel, the generated interrupts are load-balanced in the kernel by an interrupt rotation load balancing method; at the same time, a cache locking operation is performed in the kernel protocol stack.
2. The comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform according to claim 1, wherein data forwarding by the network card with the optimized driver comprises the following steps:
(1) Selecting a suitable number of receive message buffers for the network card driver, and establishing a receive message buffer model based on the NAPI polling mechanism;
(2) After the relevant parameters are obtained from the receive message buffer model, pre-allocating that number of receive message buffers when the network card is initialized, performing a DMA (direct memory access) streaming mapping on each receive message buffer, and storing the mapping as a DMA channel parameter;
(3) When the network card receives a data packet and triggers an interrupt, judging whether the network card is under high load and, if so, disabling the hardware interrupt in the interrupt handler and activating a polling thread;
(4) In the polling thread, transferring data packets by DMA one by one into the recorded receive message buffers, setting the specific fields of each message buffer, and passing it to the upper protocol stack;
(5) After the polling thread has processed all data packets in the receive queue, or the maximum number of data packets, re-enabling the hardware interrupt, whereupon the system continues with other tasks until the next interrupt is generated.
3. The comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform according to claim 2, wherein a timer is started when the hardware interrupt is disabled, and data is sent to the protocol stack when the timer reaches a delay time threshold.
4. The comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform according to claim 1, wherein the interrupt rotation load balancing method comprises the following steps:
after an interrupt signal is received, the interrupt controller processes it in a round-robin manner and sends an inter-core interrupt to the target processor using the designated interrupt signal;
the target processor core that receives the inter-core interrupt reads the IPI_Status register to obtain the distributed interrupt number, and then performs a secondary distribution according to that interrupt number.
5. The comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform according to claim 1, wherein the cache lock locks the skb_buff data structure.
6. The comprehensive network performance optimization method for a Loongson 3A domestic software and hardware platform according to claim 5, wherein the cache lock locks header information in the data packet buffer.
Priority Applications (1)

Application Number: CN202311621493.6A
Priority Date: 2023-11-30
Filing Date: 2023-11-30
Title: Network performance comprehensive optimization method for Loongson 3A domestic software and hardware platform
Status: Pending

Publications (1)

Publication Number: CN117714400A
Publication Date: 2024-03-15

Family

ID=90156198

Country Status (1)

Country: CN
Link: CN117714400A (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination