CN117714400A - Network performance comprehensive optimization method for Loongson 3A domestic software and hardware platform - Google Patents
- Publication number: CN117714400A
- Application number: CN202311621493.6A
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention provides a comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform. At the domestic network card, an optimized network card driver reduces the interrupt frequency and the number of copies, and cuts down the large number of DMA memory allocation and release operations during data forwarding, thereby reducing repeated memory operations and the system's memory management burden. After a data packet is forwarded from the domestic network card to the domestic operating system kernel, the generated interrupts are load-balanced in the kernel by an interrupt rotation load balancing method, so that every processor core of the Loongson 3A participates in network interrupt processing, which effectively improves network throughput and reduces the packet loss rate. At the same time, an optimized cache lock reduces the data access frequency in the kernel protocol stack. This comprehensive data forwarding optimization method, designed for the Loongson 3A processor on a domestic software and hardware platform, can meet application requirements for high bandwidth and low latency.
Description
Technical Field
The invention belongs to the technical field of network performance optimization, relates to adaptation and optimization techniques for domestic software and hardware platforms, and in particular relates to a comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform.
Background
Loongson 3 is a domestic multi-core high-performance processor developed by the Institute of Computing Technology, Chinese Academy of Sciences, and was the first quad-core processor in China with complete independent intellectual property rights. The network performance of domestic software and hardware platforms built on the Loongson 3A series multi-core processors still lags behind that of international mainstream processor platforms, so designing a network performance optimization method for the Loongson 3A domestic software and hardware platform is urgently needed to achieve high-bandwidth, low-latency network data forwarding on the Loongson multi-core platform.
Disclosure of Invention
To solve the above problems, the invention provides a comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform.
To achieve the above purpose, the present invention provides the following technical solution:
The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform comprises the following steps:
when the Loongson 3A domestic software and hardware platform receives a data packet transmitted by a sending end, data forwarding is performed by the network card with the optimized driver;
after the data packet is forwarded from the network card to the operating system kernel, the generated interrupts are load-balanced in the kernel by the interrupt rotation load balancing method; at the same time, cache locking is performed in the kernel protocol stack.
Further, the process of forwarding data by the network card with the optimized driver comprises the following steps:
(1) Selecting an appropriate number of receive buffers for the network card driver, and establishing a receive buffer model based on the NAPI polling mechanism;
(2) After obtaining the relevant parameters through the receive buffer model, pre-allocating the obtained number of receive buffers when the network card is initialized, performing DMA (Direct Memory Access) streaming mapping on each receive buffer, and saving the mapping as the DMA channel parameters;
(3) When the network card receives a data packet and triggers an interrupt, judging whether the network card is under a high-load condition; if so, disabling the hardware interrupt in the interrupt handler and activating a polling thread;
(4) In the polling thread, transferring data packets by DMA one by one into the recorded receive buffers, setting specific fields of each buffer, and passing it to the upper protocol stack;
(5) After the polling thread has processed all packets in the receive queue, or the maximum number of packets, re-enabling the hardware interrupt; the system continues with other tasks until the next interrupt is generated.
Further, when the hardware interrupt is disabled, a timer is triggered to count, and when the count reaches the delay time threshold the data is sent to the protocol stack.
Further, the interrupt rotation load balancing method comprises the following steps:
after an interrupt signal is received, it is processed in the interrupt controller in round-robin fashion, and an inter-core interrupt is sent to the target processor through the designated interrupt number;
the target processor core receiving the inter-core interrupt reads the IPI_Status register to obtain the distributed interrupt number, and then performs secondary distribution according to that interrupt number.
Further, the cache lock locks the skb_buff data structure.
Further, the cache lock locks the header information in the data packet buffer.
Compared with the prior art, the invention has the following advantages and beneficial effects:
At the domestic network card, the optimized network card driver reduces the interrupt frequency and the number of copies, and cuts down the large number of DMA memory allocation and release operations during data forwarding, thereby reducing repeated memory operations and the system's memory management burden. Data packets are handled with a mixed interrupt-and-polling receive model: when the network is under high load, the network card's hardware interrupt is disabled and packets are processed by polling; otherwise, an interrupt mechanism is used to receive packets. The invention load-balances the generated interrupts in the kernel through the interrupt rotation load balancing method, and performs cache locking on the skb_buff data structure and on the header information in the data packet buffer, so that the optimized cache lock reduces the data access frequency in the kernel protocol stack. This comprehensive data forwarding optimization method, designed for the Loongson 3A processor on a domestic software and hardware platform, can meet application requirements for high bandwidth and low latency.
Drawings
Fig. 1 is a flow chart of the comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform.
Detailed Description
The technical scheme provided by the present invention will be described in detail with reference to the following specific examples. It should be understood that these examples are only for illustrating the invention and are not intended to limit its scope.
The invention provides a comprehensive network performance optimization method applied to the Loongson 3A domestic software and hardware platform, which comprises a Loongson 3A series multi-core processor, a domestic network chip, and a domestic operating system.
The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform comprises the following steps:
Step 1: when the Loongson 3A domestic software and hardware platform receives a data packet transmitted by a sending end, reducing the interrupt frequency and the number of copies at the domestic network card through the optimized network card driver.
The optimization is realized by modifying the network card driver, and comprises the following steps:
(1) Selecting an appropriate number of receive buffers for the domestic network card driver, and establishing a receive buffer model for that number based on the NAPI (New API) polling mechanism;
(2) After obtaining the relevant parameters (the number of buffers) through the receive buffer model, pre-allocating the obtained number of receive buffers when the domestic network card is initialized, performing DMA (Direct Memory Access) streaming mapping on each receive buffer, and saving the mapping as the DMA channel parameters;
(3) When the domestic network card receives a data packet and triggers an interrupt, judging whether the network card is under a high-load condition; if so, disabling the hardware interrupt in the interrupt handler and activating a polling thread;
(4) In the polling thread, transferring data packets by DMA one by one into the recorded receive buffers, setting specific fields of each buffer, and passing it to the upper protocol stack;
(5) After the polling thread has processed all packets in the receive queue, or the maximum number of packets, re-enabling the hardware interrupt; the system continues with other tasks until the next interrupt is generated.
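Steps (3)-(5) above can be sketched as a small user-space state machine. This is an illustrative simplification: the budget constant, structure, and function names are assumptions for explanation, not the actual driver code.

```c
/* Hypothetical sketch of the mixed interrupt/polling (NAPI-style) receive
   model: mask the IRQ under high load, drain the queue in a polling thread,
   and re-enable the IRQ once the queue is empty. */
#include <stdbool.h>

#define POLL_BUDGET 64   /* max packets handled per poll round (assumed) */

struct nic_state {
    bool irq_enabled;   /* hardware interrupt mask state */
    bool polling;       /* is the polling thread active? */
    int  rx_pending;    /* packets waiting in the receive queue */
    int  delivered;     /* packets handed to the upper protocol stack */
};

/* Step (3): on interrupt under high load, mask the IRQ and start polling. */
static void on_rx_interrupt(struct nic_state *s, bool high_load)
{
    if (high_load) {
        s->irq_enabled = false;
        s->polling = true;
    } else if (s->rx_pending > 0) {
        /* low load: deliver the packet directly from the interrupt path */
        s->rx_pending--;
        s->delivered++;
    }
}

/* Steps (4)-(5): drain up to POLL_BUDGET packets per round, then re-enable
   the IRQ once the queue is fully drained. */
static void poll_round(struct nic_state *s)
{
    int done = 0;
    while (s->rx_pending > 0 && done < POLL_BUDGET) {
        s->rx_pending--;    /* DMA packet into a pre-mapped receive buffer */
        s->delivered++;     /* set buffer fields, pass to the upper stack */
        done++;
    }
    if (s->rx_pending == 0) {   /* queue drained: back to interrupt mode */
        s->polling = false;
        s->irq_enabled = true;
    }
}
```

Under sustained load this model replaces one interrupt per packet with one interrupt per batch, which is the source of the claimed reduction in interrupt frequency.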
Specifically, the network card optimization of the invention is embodied as an interrupt adjustment algorithm that follows two principles: (1) collect as many data packets as possible per interrupt signal; (2) respond to interrupt signals as quickly as possible rather than maximizing the number of packets handled in a single interrupt. The interrupt adjustment algorithm designed on these principles empirically sets a delay time threshold of 250 microseconds. When the interrupt signal for a data packet is received, the receive interrupt is immediately disabled and a trigger timer starts counting; the data is stored in the circular receive DMA ring buffer, and after at most the 250-microsecond delay the data is sent to the protocol stack, after which the previously disabled receive interrupt is re-enabled. The transmit path works similarly to the receive path.
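The coalescing rule above (accumulate packets in the DMA ring, flush to the protocol stack when the ring fills or the 250-microsecond threshold expires) can be sketched as follows. The ring capacity and all names are assumptions for illustration; only the 250-microsecond threshold comes from the text.

```c
/* Illustrative sketch of the 250 us interrupt-coalescing rule: packets are
   buffered while the receive interrupt is masked, and the whole batch is
   flushed to the protocol stack when either limit is reached. */
#define DELAY_THRESHOLD_US 250   /* empirical threshold from the text */
#define RING_CAPACITY      256   /* assumed DMA ring size */

struct rx_coalescer {
    int  buffered;   /* packets held in the DMA ring */
    long timer_us;   /* microseconds since the first buffered packet */
    long flushed;    /* packets delivered to the protocol stack so far */
};

/* Called once per received packet while the receive interrupt is masked;
   elapsed_us is the trigger timer's current count. */
static void coalesce_rx(struct rx_coalescer *c, long elapsed_us)
{
    c->buffered++;
    c->timer_us = elapsed_us;
    if (c->buffered >= RING_CAPACITY || c->timer_us >= DELAY_THRESHOLD_US) {
        c->flushed += c->buffered;   /* push the batch to the protocol stack */
        c->buffered = 0;
        c->timer_us = 0;             /* re-enable the receive interrupt here */
    }
}
```

The design trade-off is latency against interrupt rate: the threshold caps the added delay at 250 microseconds while still batching every packet that arrives inside that window.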
The specific process of judging the interrupt signal is as follows:
In the interrupt function, the irqreturn_t udma_inr() function is called to judge whether the received interrupt signal is a transmit-data interrupt or a receive-data interrupt; if the received signal is the interrupt for the first data packet, the udma_irq_disable() function is called to disable the interrupt.
The udma_ring_clean_rx_irq() function is then called to judge whether the current DMA ring descriptor has completed; if so, the udma_rx_ring_pop() function is called to clean the descriptor and forward the data behind it, thereby realizing the DMA ring cleanup function.
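The ring cleanup described above can be sketched as follows. The descriptor layout, ring size, and names are assumptions for illustration, not the actual udma driver structures.

```c
/* Hypothetical sketch of the DMA ring cleanup loop: walk the ring from the
   clean index, pop every descriptor that hardware has marked complete, and
   forward the data it covers. */
#include <stdbool.h>

#define RX_RING_SIZE 8   /* small ring for illustration */

struct rx_desc {
    bool done;           /* set by hardware when the DMA transfer completes */
};

struct rx_ring {
    struct rx_desc desc[RX_RING_SIZE];
    int clean_idx;       /* next descriptor to reclaim */
    int forwarded;       /* packets handed to the upper layer */
};

/* Returns the number of descriptors reclaimed in this pass. */
static int clean_rx_ring(struct rx_ring *r)
{
    int cleaned = 0;
    while (r->desc[r->clean_idx].done) {
        r->desc[r->clean_idx].done = false;        /* "pop": recycle it */
        r->forwarded++;                            /* forward its data */
        r->clean_idx = (r->clean_idx + 1) % RX_RING_SIZE;
        cleaned++;
        if (cleaned == RX_RING_SIZE)               /* full lap: stop */
            break;
    }
    return cleaned;
}
```

Because descriptors are recycled in place rather than freed, this loop is also where the claimed savings in DMA memory allocation and release operations come from.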
Step 2: after the data packet is forwarded from the domestic network card to the domestic operating system kernel, load-balancing the generated interrupts in the kernel through the interrupt rotation load balancing method, while reducing the data access frequency in the kernel protocol stack through the optimized cache lock.
The interrupt rotation load balancing method is implemented as follows. The Loongson 3A processor has an IPI_Status inter-core interrupt status register; when any bit of it is set to 1 and the corresponding bit of the IPI_Enable register is enabled, the INT4 interrupt line of the Loongson processor core is asserted. INT4 corresponds to IP6 of the STATUS register, i.e. to the inter-core interrupt, thereby triggering an inter-core interrupt. After the interrupt controller receives an interrupt, it processes the interrupt number in round-robin fashion and sends an inter-core interrupt to the target processor through the designated interrupt number. The target processor core receiving the inter-core interrupt reads its own IPI_Status register to obtain the distributed interrupt number, then performs secondary distribution according to that number and executes do_IRQ().
The interrupt rotation load balancing technique is specifically implemented as follows:
In the kernel interrupt function map_irq_dispatch(), a cpumask flag variable is set to mark the target processor core of the next round. The get_irq_ht() function is called in the dispatch_ip3() function to obtain an interrupt number from the HT interrupt register; if the interrupt number is 3 or 5, the loongson3_send_irq_by_ipi() function is called to send an inter-core interrupt and the cpumask variable is modified, i.e. the target processor core of the next round is changed. The loongson3_ipi_interrupt() function is then called to process the inter-core interrupt, and finally the do_IRQ() function is called to perform interrupt distribution.
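The rotation logic above can be sketched in user-space C as follows. The core count matches the quad-core Loongson 3A and the qualifying interrupt numbers (3 and 5) come from the text; everything else is an illustrative simplification of the kernel path, not actual kernel code.

```c
/* Sketch of round-robin target-core selection for interrupt rotation load
   balancing: each qualifying network interrupt is forwarded to the next
   core in turn via an inter-core interrupt, and the cursor advances. */
#define NR_CORES 4   /* Loongson 3A: four processor cores */

static int next_core;           /* the "cpumask" rotation cursor */
static int handled[NR_CORES];   /* interrupts dispatched to each core */

/* Returns the core chosen for this interrupt, or -1 if the interrupt is
   not rotated (numbers other than 3/5 are handled locally). */
static int dispatch_irq(int irqnr)
{
    if (irqnr != 3 && irqnr != 5)
        return -1;                           /* not a network IRQ: no IPI */
    int target = next_core;
    handled[target]++;                       /* send IPI; target runs do_IRQ() */
    next_core = (next_core + 1) % NR_CORES;  /* rotate for the next round */
    return target;
}
```

The effect is that interrupt work is spread evenly: over N qualifying interrupts, each of the four cores handles roughly N/4, instead of core 0 taking them all.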
After optimization with the interrupt rotation load balancing method, every processor core of the Loongson 3A participates in network interrupt processing, which effectively improves network throughput and reduces the packet loss rate.
The optimized cache lock is realized as follows. During operation of the whole system, the DMA descriptors and the transmit and receive queue addresses are fixed, and the dynamically allocated skb_buff structures and data buffers fall within a certain address range, so the data portions of the whole network processing flow have good spatial locality and a cache lock can be used to improve network processing performance. Because the system applies zero-copy technology, the skb_buff data structure is accessed frequently while the data packet buffer is accessed less often; therefore, in the specific implementation, within the data packet buffer only the header information (MAC address, IP address, and TCP/UDP header information) is cache-locked.
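The locking granularity described above can be illustrated by computing the byte range the lock would cover: only the packet headers, not the payload. The fixed header sizes below are the common MAC/IPv4/TCP lengths without options; the lock primitive itself is platform-specific, so it is represented here only by the byte range, and all names are illustrative.

```c
/* Illustrative calculation of the region an optimized cache lock covers:
   the MAC + IP + TCP/UDP headers at the start of a receive buffer, leaving
   the payload subject to normal cache replacement. */
#include <stddef.h>

#define ETH_HLEN  14   /* MAC (Ethernet) header */
#define IP_HLEN   20   /* IPv4 header without options */
#define TCP_HLEN  20   /* TCP header without options */

struct lock_range {
    size_t offset;   /* start of the locked region in the packet buffer */
    size_t len;      /* number of bytes to pin in the cache */
};

/* Compute the header region of a receive buffer to be cache-locked. */
static struct lock_range header_lock_range(void)
{
    struct lock_range r = { 0, ETH_HLEN + IP_HLEN + TCP_HLEN };
    return r;
}
```

Pinning only these 54 bytes per buffer keeps the frequently consulted fields resident without wasting cache capacity on payload data that is touched at most once under zero copy.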
The technical means disclosed by the invention are not limited to those disclosed in the above embodiments, but also include technical solutions formed by any combination of the above technical features. It should be noted that modifications and adaptations may occur to those skilled in the art without departing from the principles of the invention, and such modifications are also considered to be within the scope of the invention.
Claims (6)
1. A comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform, characterized by comprising the following steps:
when the Loongson 3A domestic software and hardware platform receives a data packet transmitted by a sending end, performing data forwarding by the network card with the optimized driver;
after the data packet is forwarded from the network card to the operating system kernel, load-balancing the generated interrupts in the kernel by the interrupt rotation load balancing method, and at the same time performing cache locking in the kernel protocol stack.
2. The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform according to claim 1, wherein the process of forwarding data by the network card with the optimized driver comprises the following steps:
(1) selecting an appropriate number of receive buffers for the network card driver, and establishing a receive buffer model based on the NAPI polling mechanism;
(2) after obtaining the relevant parameters through the receive buffer model, pre-allocating the obtained number of receive buffers when the network card is initialized, performing DMA (Direct Memory Access) streaming mapping on each receive buffer, and saving the mapping as the DMA channel parameters;
(3) when the network card receives a data packet and triggers an interrupt, judging whether the network card is under a high-load condition; if so, disabling the hardware interrupt in the interrupt handler and activating a polling thread;
(4) in the polling thread, transferring data packets by DMA one by one into the recorded receive buffers, setting specific fields of each buffer, and passing it to the upper protocol stack;
(5) after the polling thread has processed all packets in the receive queue, or the maximum number of packets, re-enabling the hardware interrupt; the system continues with other tasks until the next interrupt is generated.
3. The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform according to claim 2, wherein a timer is triggered to count when the hardware interrupt is disabled, and the data is sent to the protocol stack when the count reaches the delay time threshold.
4. The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform according to claim 1, wherein the interrupt rotation load balancing method comprises the following steps:
after an interrupt signal is received, processing it in the interrupt controller in round-robin fashion, and sending an inter-core interrupt to the target processor through the designated interrupt number;
the target processor core receiving the inter-core interrupt reads the IPI_Status register to obtain the distributed interrupt number, and then performs secondary distribution according to that interrupt number.
5. The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform according to claim 1, wherein the cache lock locks the skb_buff data structure.
6. The comprehensive network performance optimization method for the Loongson 3A domestic software and hardware platform according to claim 5, wherein the cache lock locks the header information in the data packet buffer.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311621493.6A | 2023-11-30 | 2023-11-30 | Network performance comprehensive optimization method for Loongson 3A domestic software and hardware platform
Publications (1)

Publication Number | Publication Date
---|---
CN117714400A | 2024-03-15

Family ID: 90156198
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202311621493.6A | Network performance comprehensive optimization method for Loongson 3A domestic software and hardware platform | 2023-11-30 | 2023-11-30

Country Status (1)

Country | Link
---|---
CN | CN117714400A (en), filed 2023-11-30, status Pending
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination