CN103220226A - Transparent real-time traffic compression method and system between data centers


Info

Publication number
CN103220226A
CN103220226A · CN2013101586913A · CN201310158691A · CN103220226B
Authority
CN
China
Prior art keywords
compression
data
stream
central processor
compressing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101586913A
Other languages
Chinese (zh)
Other versions
CN103220226B (en)
Inventor
王燕飞 (Wang Yanfei)
吴教仁 (Wu Jiaoren)
刘晓光 (Liu Xiaoguang)
刘涛 (Liu Tao)
刘宁 (Liu Ning)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310158691.3A
Publication of CN103220226A
Application granted
Publication of CN103220226B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a transparent real-time traffic compression method between data centers. The method comprises the following steps: performing flow-based fine-grained compression on a data stream, namely compressing the data stream according to the compression strategy assigned to its flow partition, determined by the port attribute of the stream, to obtain a plurality of compressed data blocks; and, according to the flow types of the compressed data blocks, transmitting blocks of different flow types simultaneously through different tunnels, the blocks being transmitted with a batch-processing strategy and a local buffer pool method. By compressing the redundant information of specific services, the method fully exploits the bandwidth of the transmission links between data centers and optimizes system overhead; it offers good flexibility, high efficiency, and a clear performance advantage. The invention further discloses a transparent real-time traffic compression system between data centers.

Description

Transparent real-time traffic compression method and system between data centers
Technical field
The present invention relates to the field of computer technology, and in particular to a transparent real-time traffic compression method and system between data centers.
Background art
On a high-speed link at the 10 Gbps level, real-time packet compression faces serious challenges, described below in three aspects.

(1) In the system implementation, a hardware acceleration device is generally used to offload (Offload) processing pressure. However, an acceleration device attached as a peripheral incurs heavy overhead whenever it interacts with the system, for example I/O overhead (including PCIe bandwidth utilization and high-latency device register accesses) and operating-system overhead (including system call costs and packet copies between kernel mode and user mode).

(2) On a general-purpose multi-core platform, line-rate processing at the 10 Gbps level necessarily calls for a concurrent design. By Amdahl's law, the serial portion of a system bounds its maximum parallel speedup, so optimizing the concurrent design is essential. Yet at present some compression cards, in both their hardware and their drivers, are poorly adapted to highly concurrent high-speed processing. Although increasing the length of the data submitted for compression improves the efficiency of the compression device, it also lengthens per-packet processing delay, and a blocking I/O communication pattern cannot fully exploit concurrent processing between the general-purpose processor and the acceleration device.

(3) At the network design level, a transparent compression design must satisfy a symmetric compression/decompression deployment. This raises the problem of how to exploit application characteristics to the fullest and apply different compression strategies according to the data redundancy of different applications. To optimize compression bandwidth and efficiency, multiple packets are usually aggregated and compressed together; but packet loss caused by the unreliability of the underlying IP network may then degrade the TCP performance of several services at once, so mitigating the performance loss caused by packet drops is essential, and adaptive control must also be considered in the design. Packets aggregated and compressed together must furthermore be re-encapsulated with a new header, and the single-flow appearance produced by the traditional tunnel pattern may hurt packet transmission efficiency in the data center core layer, so an efficient tunnel design is likewise essential.
The prior art is mainly applied to high-latency transmission links such as satellite links. Because such work does not face the high bandwidth demands of inter-data-center traffic, existing patents and techniques do not optimize the overhead of the compression engine.

In existing compression techniques, to improve compression efficiency, a block-based compression configuration is adopted in which multiple packets are compressed together. A virtual-link method realizes transparent packet compression: the compressed data messages are carried over a virtual link, with compression and decompression at its two ends, so the virtual link provides transparent compression.
The prior art has the following shortcomings:

(1) Although block-based compression improves the efficiency of the acceleration device to some degree, it lacks fine-grained control, which amplifies the impact of anomalies such as packet loss on network performance: a single loss may affect several TCP flows, or several services, at once. Moreover, without fine-grained control it is difficult to exploit the differing data redundancy of different TCP flows and different services, limiting compression efficiency.

(2) In the data center core network, a single virtual link may prevent high-speed traffic from fully utilizing the underlying network.

(3) With no high-throughput design requirement, the prior art does little to address system overhead, and in particular is hard to apply to acceleration on general-purpose multi-core parallel platforms. A high-performance real-time system at the 10 Gbps level must optimize system overhead, with particular emphasis on efficient parallelization strategies.
Summary of the invention
The present invention aims to solve at least one of the technical problems existing in the prior art.

To this end, one object of the present invention is to propose a transparent real-time traffic compression method between data centers. The method compresses the redundant information of specific services, fully exploits the bandwidth utilization of inter-data-center transmission links, and optimizes system overhead; it is flexible and efficient and offers a clear performance advantage.

A second object of the present invention is to propose a transparent real-time traffic compression system between data centers.

To achieve the above objects, an embodiment of the first aspect of the present invention proposes a transparent real-time traffic compression method between data centers, comprising the steps of: performing flow-based fine-grained compression on a data stream, including compressing the data stream, according to its port attribute, with the compression strategy corresponding to its flow partition, to obtain a plurality of compressed data blocks; and, according to the flow types of the plurality of compressed data blocks, transmitting the compressed data blocks of the several flow types simultaneously through different tunnels, wherein the compressed data blocks are transmitted using a batch-processing strategy and a local buffer pool method.

The transparent real-time traffic compression method between data centers according to this embodiment adopts a fine-grained control optimization strategy: it performs finer-grained performance optimization according to the characteristics of the carried applications, compresses the redundant information of specific services, fully exploits inter-data-center bandwidth resources, improves bandwidth utilization, optimizes cost, reduces the impact of packet loss on an unreliable network, and provides flexibility and a performance advantage, optimizing transmission performance. In a data center network, to improve transmission efficiency and redundancy, packets of different flows can propagate simultaneously along different routed paths, improving propagation efficiency and optimizing system overhead.
In one embodiment of the invention, the number of tunnels is of the same order of magnitude as the number of packet groups of the data stream. This optimizes the tunnel design: the multi-tunnel pattern removes the single-tunnel restriction and makes full use of the bandwidth of the data center network.
In one embodiment of the invention, the flow-based fine-grained compression of the data stream operates under a pipelining strategy: the central processing unit (CPU) fills a request compression queue, a compression card compresses the entries of the request queue and places the compressed data into a response dequeue queue, and the CPU retrieves the compressed data from the response queue. The pipelining strategy keeps the CPU and the compression card running at high load simultaneously, fully exploiting both the general-purpose multi-core CPU and the compression card, avoiding idle spinning and thereby reducing system overhead.

In one embodiment of the invention, a buffer system between the CPU and the compression card is used, and the buffer size between them is increased according to the amplitude of system load jitter.

In one embodiment of the invention, the request compression queue and the response dequeue queue are polled. Polling the two queues reduces system interrupt overhead and context-switch overhead.

In one embodiment of the invention, the method further comprises the step of transmitting the compressed data blocks with a user-mode-driver I/O model. This optimizes system call overhead and the packet-copy overhead between kernel mode and user mode, reduces the CPU consumption and cache pollution that copying brings under large-packet workloads, and thereby improves system performance.
An embodiment of the second aspect of the present invention proposes a transparent real-time traffic compression system between data centers, comprising a compression processing device and a flow management device.

The compression processing device performs flow-based fine-grained compression on a data stream: according to the port attribute of the data stream, it compresses the stream with the compression strategy corresponding to its flow partition to obtain a plurality of compressed data blocks. The flow management device, according to the flow types of the plurality of compressed data blocks, transmits the compressed data blocks of the several flow types simultaneously through different tunnels, using a batch-processing strategy and a local buffer pool method.

The transparent real-time traffic compression system between data centers according to this embodiment adopts a fine-grained control optimization strategy: it performs finer-grained performance optimization according to the characteristics of the carried applications, compresses the redundant information of specific services, fully exploits inter-data-center bandwidth resources, improves bandwidth utilization, optimizes cost, reduces the impact of packet loss on an unreliable network, and provides flexibility and a performance advantage, optimizing transmission performance. In a data center network, to improve transmission efficiency and redundancy, packets of different flows can propagate simultaneously along different routed paths, improving propagation efficiency and optimizing system overhead.

In one embodiment of the invention, the number of tunnels is of the same order of magnitude as the number of packet groups of the data stream, which optimizes the tunnel design: the multi-tunnel pattern removes the single-tunnel restriction and makes full use of the bandwidth of the data center network.

In one embodiment of the invention, the compression processing device performs the flow-based fine-grained compression under a pipelining strategy, the compression processing device comprising a central processing unit (CPU) and a compression card, wherein the CPU is configured to fill a request compression queue; the compression card is configured to compress the entries of the request queue and place the compressed data into a response dequeue queue; and the CPU is further configured to retrieve the compressed data from the response queue. The pipelining strategy keeps the CPU and the compression card running at high load simultaneously, fully exploiting both, avoiding idle spinning and thereby reducing system overhead.

In one embodiment of the invention, the compression processing device further comprises a buffer system between the CPU and the compression card, which increases the buffer size between them according to the amplitude of system load jitter.

In one embodiment of the invention, the compression processing device also polls the request compression queue and the response dequeue queue, reducing system interrupt overhead and context-switch overhead.
In one embodiment of the invention, the flow management device transmits the compressed data blocks with a user-mode-driver I/O model, which optimizes system call overhead and the packet-copy overhead between kernel mode and user mode, reduces the CPU consumption and cache pollution that copying brings under large-packet workloads, and thereby improves system performance.
Additional aspects and advantages of the invention will be set forth in part in the description below, will in part become apparent from that description, or may be learned through practice of the invention.
Description of drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flow chart of a transparent real-time traffic compression method between data centers according to an embodiment of the invention;

Fig. 2 is a flow chart of the pipelining strategy according to an embodiment of the invention;

Fig. 3 is the overall framework of the system according to an embodiment of the invention; and

Fig. 4 is a schematic structural diagram of a transparent real-time traffic compression system between data centers according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numbers throughout denote identical or similar elements, or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary, intended only to explain the invention, and are not to be construed as limiting it.
A transparent real-time traffic compression method between data centers according to an embodiment of the invention is described below with reference to Fig. 1 and comprises the following steps.

Step S110: perform flow-based fine-grained compression on the data stream, including: according to the port attribute of the data stream, compress the stream with the compression strategy corresponding to its flow partition, to obtain a plurality of compressed data blocks.

In one embodiment of the invention, the flow-based fine-grained compression operates under a pipelining strategy: the central processing unit (CPU) fills a request compression queue, a compression card compresses the entries of the request queue and places the compressed data into a response dequeue queue, and the CPU retrieves the compressed data from the response queue. A buffer system between the CPU and the compression card is used, and the buffer size between them is increased according to the amplitude of system load jitter.

The request compression queue and the response dequeue queue are polled.

Step S120: according to the flow types of the plurality of compressed data blocks, transmit the compressed data blocks of the several flow types simultaneously through different tunnels, wherein the compressed data blocks are transmitted using a batch-processing strategy and a local buffer pool method.

Here the number of tunnels is of the same order of magnitude as the number of packet groups of the data stream.

The method further comprises the step of transmitting the compressed data blocks with a user-mode-driver I/O model.
The transparent real-time traffic compression method between data centers according to the embodiment of the invention is described below with a concrete example. It should be understood that the following description is for illustrative purposes only; embodiments of the invention are not limited to it.

Step S210: perform flow-based fine-grained compression on the data stream.

The fine-grained, flow-based (Flow) compression design is as follows:
The data stream is partitioned according to its port attribute, and different partitions receive different compression strategies, yielding a plurality of compressed data blocks. Because data within the same flow tends to have similar redundancy characteristics, this improves compression efficiency and optimizes bandwidth. When multi-packet aggregated compression is used to raise bandwidth, the impact of a packet loss on the system is confined within a single TCP flow, so losses are not triggered across many flows simultaneously and multiple connections do not enter congestion avoidance at the same time, which optimizes overall transfer performance. A sketch of per-flow strategy selection follows.
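The per-flow strategy selection could look like the following minimal C sketch. It is illustrative only, not the patent's implementation: the strategy names, the `strategy_for_port` function, and the port-to-strategy table are assumptions made for this example.

```c
/*
 * Minimal sketch of per-flow strategy selection. Illustrative only: the
 * function name, the strategy names, and the port-to-strategy table are
 * assumptions made for this example, not taken from the patent.
 */
#include <stdint.h>
#include <stdio.h>

typedef enum { STRATEGY_NONE, STRATEGY_FAST, STRATEGY_DEEP } strategy_t;

/* Choose a compression strategy from the destination port, so that packets
 * of the same service (the same flow partition) share one strategy and
 * their common redundancy can be exploited together. */
static strategy_t strategy_for_port(uint16_t dst_port)
{
    switch (dst_port) {
    case 80:                    /* hypothetical: text-heavy web traffic */
    case 8080:
        return STRATEGY_DEEP;
    case 3306:                  /* hypothetical: latency-sensitive replication */
        return STRATEGY_FAST;
    default:
        return STRATEGY_NONE;   /* e.g. already-compressed payloads */
    }
}

int main(void)
{
    uint16_t ports[] = { 80, 3306, 443 };
    for (int i = 0; i < 3; i++)
        printf("port %u -> strategy %d\n",
               (unsigned)ports[i], (int)strategy_for_port(ports[i]));
    return 0;
}
```

Keying the table on the port attribute is what lets the packets of one service land in one flow partition and be compressed under one strategy.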
Step S220: according to the flow types of the plurality of compressed data blocks, transmit the compressed data blocks of the several flow types simultaneously through different tunnels.

A single-tunnel pattern would limit the propagation efficiency of high-volume traffic in the data center network; to improve transmission efficiency and redundancy, packets of different flows in the data center network propagate simultaneously along different routed paths. Specifically, the virtual transmission links are designed in a multi-tunnel (Tunnel) pattern with a NAT-like placement strategy: ports and IP addresses jointly map onto tunnels, which reduces IP address consumption. In the extreme case, the number of tunnels is of the same order of magnitude as the number of packet groups of the data stream, optimizing the tunnel design. The multi-tunnel pattern removes the single-tunnel restriction and makes full use of the bandwidth of the data center network; a sketch of the mapping follows.
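As an illustration of the NAT-like mapping, the following sketch hashes a flow's IP addresses and ports jointly onto one of a fixed set of tunnel indices. The `NUM_TUNNELS` constant, the `flow_key` layout, and the hash function are assumptions made for this example; the patent does not specify them.

```c
/* Sketch of NAT-like multi-tunnel selection: a flow is mapped to one of N
 * tunnels by hashing its IP addresses and ports together, so many tunnels
 * can share a few IP addresses. Illustrative only. */
#include <stdint.h>
#include <stdio.h>

#define NUM_TUNNELS 16   /* hypothetical tunnel count */

struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

/* Jointly map ports and IP addresses onto a tunnel index; flows hashing to
 * different indices travel different routed paths through the core. */
static unsigned tunnel_for_flow(const struct flow_key *k)
{
    uint32_t h = k->src_ip ^ k->dst_ip;
    h ^= ((uint32_t)k->src_port << 16) | k->dst_port;
    h *= 2654435761u;            /* Knuth multiplicative hash */
    return h % NUM_TUNNELS;
}

int main(void)
{
    struct flow_key a = { 0x0A000001, 0x0A000002, 40000, 80 };
    struct flow_key b = { 0x0A000001, 0x0A000002, 40001, 80 };
    printf("flow a -> tunnel %u\nflow b -> tunnel %u\n",
           tunnel_for_flow(&a), tunnel_for_flow(&b));
    return 0;
}
```

Because the index depends on ports as well as addresses, flows that hash to different indices take different routed paths, which is what spreads high-volume traffic across the core.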
To optimize the average per-packet bookkeeping overhead (Per-Packet Bookkeeping) at the 10 Gbps level, for example the high-latency register accesses of the accelerator card's PCIe device and per-packet memory allocation costs, the compressed data blocks are transmitted with a batch-processing strategy and a local buffer pool method, optimizing contention and amortized overhead; a sketch of such a pool follows.
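A local buffer pool with batched refill could look like the sketch below, where per-packet allocations are replaced by a per-thread free list refilled one batch at a time. The batch size, the buffer size, and the use of `malloc` as the stand-in backing pool are assumptions made for this example.

```c
/* Sketch of a per-thread local buffer pool with batched refill: instead of
 * one allocation per packet, buffers come from a local free list refilled
 * in batches, amortizing the per-packet bookkeeping cost. Illustrative. */
#include <stdlib.h>
#include <stdio.h>

#define BATCH    32      /* hypothetical refill batch size */
#define BUF_SIZE 2048

struct local_pool {
    void *free_list[BATCH];
    int   count;
};

static void *pool_get(struct local_pool *p)
{
    if (p->count == 0) {                        /* refill in one batch */
        for (int i = 0; i < BATCH; i++)
            p->free_list[i] = malloc(BUF_SIZE); /* stands in for a shared pool */
        p->count = BATCH;
    }
    return p->free_list[--p->count];
}

static void pool_put(struct local_pool *p, void *buf)
{
    if (p->count < BATCH)
        p->free_list[p->count++] = buf;         /* return to local cache */
    else
        free(buf);                              /* overflow back to the system */
}

int main(void)
{
    struct local_pool pool = { .count = 0 };
    void *b = pool_get(&pool);
    printf("got buffer %p from local pool\n", b);
    pool_put(&pool, b);
    return 0;
}
```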
The lock-free concurrent design of the method, at each level, is as follows:

(1) Application level: to fully exploit the concurrent processing capacity of a general-purpose multi-core system, a fine-grained flow-affinity strategy lets packets of different flows be compressed in parallel, without locks. The flow-level (Flow-level) partitioning strategy guarantees that the different physical cores of the multi-core processor can run the compression/decompression process fully concurrently and lock-free.

(2) Driver level: the driver is thread-safe and fully concurrent, so that no serial code inside it becomes a concurrency bottleneck for the system. The driver should make full use of the hardware concurrent-access support offered by the accelerator card, minimizing lock contention and keeping high-volume real-time compression fully concurrent.

(3) Hardware level: acceleration hardware with hardware multi-queue support is selected, so that the driver and the application can access it concurrently without locks. With an accelerator card that lacks hardware multi-queue support, a highly concurrent driver and the upper-layer software inevitably serialize their accesses under mutual exclusion, and this serial execution time directly limits the speedup of a multi-core, multi-threaded system.

(4) Data structure level: every potentially allocating path in the compression system uses per-core data structures (Per-Core Data Structure software caches) and global lock-free (Lock-free) queues, maximizing the concurrency and flexibility of the system; a minimal lock-free ring sketch follows this list.
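For one producer and one consumer, the lock-free queue named in point (4) could be realized along the lines of this C11 sketch; the ring size and field layout are illustrative assumptions, not taken from the patent.

```c
/* Sketch of a single-producer/single-consumer lock-free ring of the kind
 * the per-core design above calls for: one core enqueues, another dequeues,
 * and no lock is taken on the fast path. C11 atomics; illustrative only. */
#include <stdatomic.h>
#include <stdio.h>

#define RING_SIZE 256            /* power of two, hypothetical */

struct spsc_ring {
    void *slots[RING_SIZE];
    _Atomic unsigned head;       /* advanced only by the consumer */
    _Atomic unsigned tail;       /* advanced only by the producer */
};

static int ring_push(struct spsc_ring *r, void *item)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return -1;                              /* full */
    r->slots[tail % RING_SIZE] = item;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}

static int ring_pop(struct spsc_ring *r, void **item)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return -1;                              /* empty */
    *item = r->slots[head % RING_SIZE];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

int main(void)
{
    static struct spsc_ring ring;
    int x = 42;
    void *out;
    ring_push(&ring, &x);
    if (ring_pop(&ring, &out) == 0)
        printf("popped %d\n", *(int *)out);
    return 0;
}
```

Because the producer and the consumer each advance only their own index, the fast path takes no lock, which is precisely the property the flow-level partitioning across cores relies on.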
The flow-based fine-grained compression of the data stream operates under a pipelining strategy. Pipelining keeps the CPU and the compression card running at high load simultaneously, fully exploiting the general-purpose multi-core CPU and the compression card, avoiding idle spinning and reducing system overhead. The specific design, shown in Fig. 2, involves the CPU, a Request Ring buffer, a Response Ring buffer, and the hardware HW: in Fig. 2, CPU denotes the central processing unit, the buffer area consists of the Request Ring buffer and the Response Ring buffer, and HW denotes the compression card. The design comprises:

(1) Make full use of the buffer system between the CPU and the compression card, enlarging the buffers appropriately according to the amplitude of system load jitter.

(2) Split compression into three stages: a request-enqueue stage handled by the CPU, a compression stage handled by the compression card, and a response-dequeue stage handled by the CPU.
The concrete steps, shown in Fig. 2, are:

Step S1: the CPU enqueues compression requests onto the request compression queue (Request Ring buffer).

Step S2: the compression card HW compresses the entries of the Request Ring buffer.

Step S3: the compression card HW places the compressed data into the response dequeue queue (Response Ring buffer).

Step S4: the CPU retrieves the compressed data from the response queue.
In Fig. 2, the goal of the pipelined design is to keep both the CPU and the compression-card hardware HW in a relatively busy state, fully exploiting the throughput of the pipeline.

(3) In the I/O pattern, avoid a blocking (Block) communication model. Under a blocking model the CPU can, in the extreme case, stall until a packet's compression completes, which lowers pipeline efficiency. A user-mode-driver I/O model can be adopted to transmit the compressed data blocks: to optimize system call overhead and the packet-copy overhead between kernel mode and user mode, an I/O model built on a user-space driver (User Space Driver, UIO) realizes zero-copy I/O, reducing the CPU consumption and cache pollution (Cache Pollution) that copying brings under large-packet workloads and thereby improving system performance.

(4) Poll (Polling) the request compression queue and the response dequeue queue, extracting the most from the pipeline: polling the two queues reduces system interrupt overhead and context-switch overhead. A sketch of the polled pipeline of steps S1 through S4 follows.
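Steps S1 through S4 and the polling discipline of point (4) can be pictured with the sketch below, in which the compression card is simulated by an ordinary function so the example runs on any host; all names, the ring size, and the stand-in "compression" are assumptions made for illustration.

```c
/* Sketch of the polling pipeline of Fig. 2: the CPU enqueues packets onto a
 * request ring, a (here simulated) compression card drains it into a
 * response ring, and the CPU polls the response ring instead of sleeping on
 * an interrupt. Illustrative only. */
#include <stdio.h>

#define RING 8

struct pkt { int id; int compressed; };

static struct pkt *request_ring[RING];
static struct pkt *response_ring[RING];
static int req_head, req_tail, rsp_head, rsp_tail;

/* Step S1: the CPU puts a packet on the request ring. */
static void submit(struct pkt *p) { request_ring[req_tail++ % RING] = p; }

/* Steps S2/S3: stand-in for the compression card, which in hardware runs
 * concurrently with the CPU; here it drains one request per call. */
static void hw_poll(void)
{
    if (req_head != req_tail) {
        struct pkt *p = request_ring[req_head++ % RING];
        p->compressed = 1;                      /* pretend to compress */
        response_ring[rsp_tail++ % RING] = p;   /* Step S3: response ring */
    }
}

/* Step S4: the CPU polls the response ring; no interrupt, no context switch. */
static struct pkt *poll_response(void)
{
    return (rsp_head != rsp_tail) ? response_ring[rsp_head++ % RING] : NULL;
}

int main(void)
{
    struct pkt p = { .id = 1, .compressed = 0 };
    struct pkt *done;
    submit(&p);
    hw_poll();                                  /* hardware side of the pipe */
    while ((done = poll_response()) == NULL)
        hw_poll();                              /* busy-poll until complete */
    printf("packet %d compressed=%d\n", done->id, done->compressed);
    return 0;
}
```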
Fig. 3 shows the overall framework of a system realized according to this method; it should be understood that Fig. 3 is for illustrative purposes only and embodiments of the invention are not limited to it. The system is divided into a flow management module and a compression processing module. The fine-grained flow management process supplies the compression processing module with contiguous compressed data blocks (Chunk) partitioned by flow; in the middle, the compressed data is transmitted efficiently through multiple tunnels; and this symmetric system provides a transparent compression process. Since the decompression side and the compression side of the system are largely symmetric, the compression process on the left of Fig. 3 is taken as the example below. Flows Management denotes the flow management module; C0, C1, and C2 denote physical processor cores of the multi-core platform, each of which, under the protocol parallelization strategy, owns a number of session contexts, labeled flows in Fig. 3. The session contexts are generally organized as a hash table; data messages from the different flows are organized according to the strategy and submitted through the to-be-compressed chunk queue (Flows chunk in Fig. 3) to the compression engine (Compress Engine); the final result (New data) is re-encapsulated with a new header (shown as +header in Fig. 3) and sent through the tunnel (Tunnel) to the peer for decompression.
The transparent real-time traffic compression method between data centers according to embodiments of the invention is mainly intended for a unified inter-data-center high-performance data transmission platform, optimizing packet transmission over links with a large bandwidth-delay product, for purposes including bandwidth utilization optimization and latency optimization. By compressing the redundant information of specific services with a real-time compression design, it fully exploits inter-data-center bandwidth resources, improves the bandwidth utilization of services with high data redundancy, optimizes cost, and provides flexibility and a performance advantage. It adopts a fine-grained control optimization strategy, performing finer-grained performance optimization according to the characteristics of the carried applications, and an I/O design that is highly optimized for performance, avoiding the heavy overhead that a high-throughput system design would otherwise introduce.
A transparent real-time traffic compression system 100 between data centers according to an embodiment of the invention is described below with reference to Fig. 4 and comprises a compression processing device 110 and a flow management device 120.

The compression processing device 110 performs flow-based fine-grained compression on the data stream: according to the port attribute of the data stream, it compresses the stream with the compression strategy corresponding to its flow partition to obtain a plurality of compressed data blocks. The flow management device 120, according to the flow types of the plurality of compressed data blocks, transmits the compressed data blocks of the several flow types simultaneously through different tunnels, using a batch-processing strategy and a local buffer pool method.

The number of tunnels is of the same order of magnitude as the number of packet groups of the data stream.

The compression processing device 110 performs the flow-based fine-grained compression under a pipelining strategy. The compression processing device 110 comprises a central processing unit (CPU) 111 and a compression card 112, wherein the CPU 111 is configured to fill the request compression queue; the compression card 112 is configured to compress the entries of the request queue and place the compressed data into a response dequeue queue; and the CPU 111 is further configured to retrieve the compressed data from the response queue.

The compression processing device 110 further comprises a buffer system 113 between the CPU 111 and the compression card 112, which increases the buffer size between them according to the amplitude of system load jitter.

The compression processing device 110 also polls the request compression queue and the response dequeue queue.
The flow management device transmits the compressed data blocks with a user-mode-driver I/O model.
The transparent real-time traffic compression system between data centers according to the embodiment of the invention is described below with a concrete example. It should be understood that the following description is for illustrative purposes only; embodiments of the invention are not limited to it.

The compression processing device 110 performs flow-based fine-grained compression on the data stream.

The compression processing device 110 comprises the CPU 111, the compression card 112, and the buffer system 113, the buffer system 113 sitting between the CPU 111 and the compression card 112. The flow-based (Flow) fine-grained compression design in the compression processing device 110 is as follows:

The compression processing device 110 partitions the data stream according to its port attribute and applies different compression strategies to different partitions, yielding a plurality of compressed data blocks. It exploits the similar redundancy characteristics of data within the same flow to improve compression efficiency and optimize bandwidth. When multi-packet aggregated compression is used to raise bandwidth, the impact of a packet loss on the system is confined within a single TCP flow, so losses are not triggered across many flows simultaneously and multiple connections do not enter congestion avoidance at the same time, which optimizes overall transfer performance.

The flow management device 120, according to the flow types of the plurality of compressed data blocks, transmits the compressed data blocks of the several flow types simultaneously through different tunnels.

A single-tunnel pattern would limit the propagation efficiency of high-volume traffic in the data center network; to improve transmission efficiency and redundancy, packets of different flows in the data center network propagate simultaneously along different routed paths. Specifically, the flow management device 120 designs the virtual transmission links in a multi-tunnel (Tunnel) pattern with a NAT-like placement strategy: ports and IP addresses jointly map onto tunnels, which reduces IP address consumption. In the extreme case, the number of tunnels is of the same order of magnitude as the number of packet groups of the data stream, optimizing the tunnel design. The multi-tunnel pattern removes the single-tunnel restriction and makes full use of the bandwidth of the data center network.

To optimize the average per-packet bookkeeping overhead (Per-Packet Bookkeeping) at the 10 Gbps level, for example the high-latency register accesses of the accelerator card's PCIe device and per-packet memory allocation costs, the flow management device 120 transmits the compressed data blocks with a batch-processing strategy and a local buffer pool method, optimizing contention and amortized overhead.
The lock-free concurrent design of the system, at each level, is as follows:

(1) Application level: to fully exploit the concurrent processing capacity of a general-purpose multi-core system, the fine-grained flow-affinity strategy of the compression processing device 110 lets packets of different flows be compressed in parallel, without locks. The flow-level (Flow-level) partitioning strategy guarantees that the different physical cores of the multi-core processor can run the compression/decompression process fully concurrently and lock-free.

(2) Driver level: the driver is thread-safe and fully concurrent, so that no serial code inside it becomes a concurrency bottleneck for the system. The driver should make full use of the hardware concurrent-access support offered by the accelerator card, minimizing lock contention and keeping high-volume real-time compression fully concurrent.

(3) Hardware level: acceleration hardware with hardware multi-queue support is selected, so that the driver and the application can access it concurrently without locks. With an accelerator card that lacks hardware multi-queue support, a highly concurrent driver and the upper-layer software inevitably serialize their accesses under mutual exclusion, and this serial execution time directly limits the speedup of a multi-core, multi-threaded system.

(4) Data structure level: every potentially allocating path in the compression system uses per-core data structures (Per-Core Data Structure software caches) and global lock-free (Lock-free) queues, maximizing the concurrency and flexibility of the system.
The compression processing device 110 performs the flow-based fine-grained compression of the data stream under a pipelining strategy. Pipelining keeps the CPU 111 and the compression card 112 running at high load simultaneously, fully exploiting the general-purpose multi-core CPU and the compression card 112, avoiding idle spinning and reducing system overhead. The specific design, shown in Fig. 2, involves the CPU, the Request Ring buffer, the Response Ring buffer, and HW: in the figure, CPU denotes the central processing unit 111, the buffer system 113 consists of the Request Ring buffer and the Response Ring buffer, and HW denotes the compression card 112. The design comprises:

(1) Make full use of the buffer system 113 between the CPU and the compression card 112, enlarging the buffers appropriately according to the amplitude of system load jitter.

(2) Split compression into three stages: a request-enqueue stage handled by the CPU 111, a compression stage handled by the compression card 112, and a response-dequeue stage handled by the CPU 111.
The concrete steps, shown in Fig. 2, are:

Step S1: the CPU 111 enqueues compression requests onto the request compression queue (Request Ring buffer).

Step S2: the compression card 112 (HW) compresses the entries of the Request Ring buffer.

Step S3: the compression card 112 (HW) places the compressed data into the response dequeue queue (Response Ring buffer).

Step S4: the CPU 111 retrieves the compressed data from the response queue.
In Fig. 2, the goal of the pipelined design is to keep both the CPU and the compression-card hardware (HW) 112 in a relatively busy state, fully exploiting the throughput of the pipeline.

(3) In the I/O pattern, avoid a blocking (Block) communication model. Under a blocking model the CPU can, in the extreme case, stall until a packet's compression completes, which lowers pipeline efficiency. A user-mode-driver I/O model can be adopted to transmit the compressed data blocks: to optimize system call overhead and the packet-copy overhead between kernel mode and user mode, the flow management device adopts an I/O model built on a user-space driver (User Space Driver, UIO), realizing zero-copy I/O, reducing the CPU consumption and cache pollution (Cache Pollution) that copying brings under large-packet workloads and thereby improving system performance.

(4) Poll (Polling) the request compression queue and the response dequeue queue, extracting the most from the pipeline: the compression processing device 110 polls the request compression queue and the response dequeue queue, reducing system interrupt overhead and context-switch overhead.
The transparent real-time traffic compression system between data centers according to embodiments of the invention is mainly intended for a unified inter-data-center high-performance data transmission platform, optimizing packet transmission over links with a large bandwidth-delay product, for purposes including bandwidth utilization optimization and latency optimization. By compressing the redundant information of specific services with a real-time compression design, it fully exploits inter-data-center bandwidth resources, improves the bandwidth utilization of services with high data redundancy, optimizes cost, and provides flexibility and a performance advantage. It adopts a fine-grained control optimization strategy, performing finer-grained performance optimization according to the characteristics of the carried applications, and an I/O design that is highly optimized for performance, avoiding the heavy overhead that a high-throughput system design would otherwise introduce.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a particular feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example, and the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (12)

1. A transparent real-time traffic compression method between data centers, characterized by comprising the steps of:
performing flow-based fine-grained compression on a data stream, including: according to the port attribute of the data stream, compressing the data stream with the compression strategy corresponding to its flow partition, to obtain a plurality of compressed data blocks; and
according to the flow types of the plurality of compressed data blocks, transmitting the compressed data blocks of the several flow types simultaneously through different tunnels,
wherein the compressed data blocks are transmitted using a batch-processing strategy and a local buffer pool method.
2. The method of claim 1, characterized in that the number of tunnels is of the same order of magnitude as the number of packet groups of the data stream.
3. the method for claim 1, it is characterized in that, described data stream is carried out adopting the running of flowing water strategy based on the fine granularity compression of stream, wherein, central processor CPU receives the request compression queue, have compressing card that the described request compression queue is compressed, and will compress the back data and send into and respond out group formation, described central processor CPU obtains the described described packed data that responds out in group formation.
4. method as claimed in claim 3 is characterized in that, utilizes the buffer system between described central processor CPU and the compressing card, increases buffer size between described central processor CPU and the compressing card according to the system load jitter amplitude.
5. method as claimed in claim 3 is characterized in that, to the described request compression queue with describedly respond out group formation and carry out polling operation.
6. the method for claim 1 is characterized in that, comprises the steps: that also the I/O model that adopts user's attitude to drive transmits described compression data block.
7. A transparent real-time traffic compression system between data centers, characterized by comprising:
a compression processing device, configured to perform flow-based fine-grained compression on a data stream, wherein the compression processing device, according to the port attribute of the data stream, compresses the data stream with the compression strategy corresponding to its flow partition to obtain a plurality of compressed data blocks; and
a flow management device, configured to transmit, according to the flow types of the plurality of compressed data blocks, the compressed data blocks of the several flow types simultaneously through different tunnels, wherein the compressed data blocks are transmitted using a batch-processing strategy and a local buffer pool method.
8. The system of claim 7, characterized in that the number of tunnels is of the same order of magnitude as the number of packet groups of the data stream.
9. The system of claim 7, characterized in that the compression processing device performs the flow-based fine-grained compression of the data stream under a pipelining strategy, the compression processing device comprising a central processing unit (CPU) and a compression card, wherein
the CPU is configured to fill a request compression queue;
the compression card is configured to compress the entries of the request compression queue and place the compressed data into a response dequeue queue; and
the CPU is further configured to retrieve the compressed data from the response dequeue queue.
10. The system of claim 9, characterized in that the compression processing device further comprises a buffer system between the CPU and the compression card, the buffer system being configured to increase the buffer size between the CPU and the compression card according to the amplitude of system load jitter.
11. The system of claim 7, characterized in that the compression processing device is further configured to poll the request compression queue and the response dequeue queue.
12. The system of claim 7, characterized in that the flow management device transmits the compressed data blocks with a user-mode-driver I/O model.
CN201310158691.3A 2013-05-02 2013-05-02 Transparent real-time traffic compression method and system between data centers Active CN103220226B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310158691.3A CN103220226B (en) 2013-05-02 2013-05-02 Transparent real-time traffic compression method and system between data centers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310158691.3A CN103220226B (en) 2013-05-02 2013-05-02 Transparent real-time traffic compression method and system between data centers

Publications (2)

Publication Number Publication Date
CN103220226A 2013-07-24
CN103220226B CN103220226B (en) 2016-04-20

Family

ID=48817705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310158691.3A Active CN103220226B (en) Transparent real-time traffic compression method and system between data centers

Country Status (1)

Country Link
CN (1) CN103220226B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020091905A1 (en) * 1999-01-29 2002-07-11 Interactive Silicon, Incorporated, Parallel compression and decompression system and method having multiple parallel compression and decompression engines
CN101977162A (en) * 2010-12-03 2011-02-16 电子科技大学 Load balancing method of high-speed network
CN202907104U (en) * 2012-08-07 2013-04-24 上海算芯微电子有限公司 Compression and decompression system of video data
CN102916905A (en) * 2012-10-18 2013-02-06 曙光信息产业(北京)有限公司 Gigabit network card multi-path shunting method and system based on hash algorithm
CN102984269A (en) * 2012-12-10 2013-03-20 北京网御星云信息技术有限公司 Method and device for peer-to-peer flow identification

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105931278A (en) * 2015-02-28 2016-09-07 阿尔特拉公司 Methods And Apparatus For Two-dimensional Block Bit-stream Compression And Decompression
CN113301123A (en) * 2021-04-30 2021-08-24 阿里巴巴新加坡控股有限公司 Data stream processing method, device and storage medium
CN113301123B (en) * 2021-04-30 2024-04-05 阿里巴巴创新公司 Data stream processing method, device and storage medium
CN114827125A (en) * 2022-03-23 2022-07-29 深圳北鲲云计算有限公司 Parallel data transmission method, system and medium for high-performance computing cloud platform

Also Published As

Publication number Publication date
CN103220226B (en) 2016-04-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant