CN110890956A - Improved data blocking method for key data stream - Google Patents


Info

Publication number
CN110890956A
Authority
CN
China
Prior art keywords
data
block
repeated
data block
key
Prior art date
Legal status
Granted
Application number
CN201911057388.8A
Other languages
Chinese (zh)
Other versions
CN110890956B (en)
Inventor
高明
罗锦
焦海
周慧颖
应丽莉
Current Assignee
Zhejiang Gongshang University
Original Assignee
Zhejiang Gongshang University
Priority date
2019-10-31
Filing date
2019-10-31
Publication date
2020-03-17
Application filed by Zhejiang Gongshang University
Priority to CN201911057388.8A
Publication of CN110890956A
Application granted
Publication of CN110890956B
Status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643 Hash functions, e.g. MD5, SHA, HMAC or f9 MAC

Abstract

The invention discloses an improved data blocking method for key data streams, an acceleration mechanism based on a software-defined wide area network (SD-WAN). The invention combines the fixed-block algorithm, the Bloom Filter algorithm and the MD5 algorithm. It can satisfy the differing acceleration requirements of different data streams in the SD-WAN and supports functions such as network traffic load balancing and data stream classification. The method can apply different acceleration strategies and scheduling schemes to different data streams, meeting user requirements and maximizing network utilization.

Description

Improved data blocking method for key data stream
Technical Field
The invention belongs to the technical field of network communication, and particularly relates to an improved data blocking method for key data streams.
Background
Combining SDN technology with WAN transmission improves network transmission capability, and integrating an improved WAN acceleration technique into the SD-WAN would greatly improve transmission efficiency for large-file transfers and repeated data transfers. Traditional WAN acceleration, however, also accelerates many data streams with low QoS requirements that do not need it, and this interferes with the transmission of important data streams, so the transmission quality of key data streams cannot be guaranteed.
Disclosure of Invention
An improved data blocking method for key data streams, comprising the following steps:
Step (1): divide the data transmitted over the wide area network into two types: key data streams and non-key data streams.
The key data stream class contains data streams of several different types. Each of these streams has some QoS requirement, and all of them are grouped into a single class that uses the same acceleration strategy.
A non-key data stream is a data stream with no QoS requirement.
Step (2): take a key data stream as input and check its size. If it is larger than 4 KB, execute step (3); otherwise, leave the key data stream unprocessed.
Step (3): using a technique similar to fixed-size blocking, divide the key data stream into equal 4 KB data blocks, recording only the position of each block point.
Step (4): starting from each block point, scan with a 256-byte sliding window, as described in step (5).
Step (5): compute the MD5 fingerprint of the data in the sliding window and use it as input to the Bloom Filter. If the fingerprint passes the Bloom Filter, the data block is a high-frequency data block; execute step (6). If it does not pass, execute step (7).
Step (6): look up the MD5 fingerprint of the data block in the duplicate database. If it is found, execute step (8); otherwise, execute step (9).
The duplicate database stores the original data block corresponding to each MD5 value. When an MD5 value computed by the key data stream compression module is found in the duplicate database, the data block is a duplicate data block, and the corresponding index (label) in the duplicate database is used to replace it.
Step (7): move the sliding window forward by one byte and execute step (3) again, until the next block point is reached.
Step (8): the data block is a duplicate data block; replace it with its index value in the duplicate database and transmit the index.
Step (9): the data block is not a duplicate data block but is a high-frequency data block, so add it to the duplicate database.
Step (10): repeat from step (5) until the data stream ends.
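Steps (2) to (10) can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the patented implementation: the names `BloomFilter`, `compress_key_stream`, `fp_index`, `store` and `decode` are illustrative. The text does not say how fingerprints enter the Bloom Filter, so the sketch assumes a fingerprint counts as high-frequency once it has been seen before (every miss is added to the filter). It also assumes that the 256-byte window is the unit replaced by an index, that scanning resumes right after a processed window, and it reads step (7) as "slide one byte, then recompute the fingerprint as in step (5)". Lookups use the MD5 value only, as in the text, and `decode` uses the sender's store as a stand-in for the receiver's copy of the duplicate database.

```python
import hashlib

BLOCK_SIZE = 4 * 1024  # step (3): fixed block size
WINDOW = 256           # step (4): sliding-window size


class BloomFilter:
    """Minimal Bloom filter keyed by 16-byte MD5 digests (illustrative)."""

    def __init__(self, m_bits=1 << 23, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, digest):
        # Reuse the digest itself: slice it into k 32-bit integers (k <= 4).
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, digest):
        for p in self._positions(digest):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, digest):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(digest))


def compress_key_stream(data, bloom, fp_index, store):
    """Turn one key data stream into ("raw", bytes) and ("ref", index) tokens."""
    if len(data) <= BLOCK_SIZE:  # step (2): streams of at most 4 KB pass through
        return [("raw", data)]
    tokens, raw_start = [], 0

    def flush(upto):
        # Emit the pending literal bytes data[raw_start:upto].
        if upto > raw_start:
            tokens.append(("raw", data[raw_start:upto]))

    # Step (3): only block-point offsets are used; the data is never copied into blocks.
    for point in range(0, len(data), BLOCK_SIZE):
        pos, next_point = max(point, raw_start), point + BLOCK_SIZE
        while pos < next_point and pos + WINDOW <= len(data):
            window = data[pos:pos + WINDOW]
            fp = hashlib.md5(window).digest()  # step (5)
            if fp in bloom:                    # high-frequency data block
                if fp in fp_index:             # steps (6) and (8): replace with index
                    flush(pos)
                    tokens.append(("ref", fp_index[fp]))
                    raw_start = pos + WINDOW
                else:                          # step (9): add to duplicate database
                    fp_index[fp] = len(store)
                    store.append(window)
                pos += WINDOW
            else:
                bloom.add(fp)  # ASSUMPTION: a fingerprint seen once is high-frequency next time
                pos += 1       # step (7): slide the window forward by one byte
    flush(len(data))
    return tokens


def decode(tokens, store):
    """Hypothetical receiver side: expand index references back into bytes."""
    return b"".join(store[v] if kind == "ref" else v for kind, v in tokens)


# Usage: a 12,800-byte stream made of two copies of the same pseudorandom 6,400 bytes,
# sent twice with shared state so the second transmission can reuse stored windows.
seed = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(200))
data = seed * 2
bloom, fp_index, store = BloomFilter(), {}, []
first = compress_key_stream(data, bloom, fp_index, store)
second = compress_key_stream(data, bloom, fp_index, store)
```

Keeping the MD5 value as the single key for both the Bloom Filter and the duplicate database mirrors the point made in the text that only one hash has to be computed per window.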
The invention has the following beneficial effects:
The invention combines the fixed-block algorithm, the Bloom Filter algorithm and the MD5 algorithm. It can satisfy the differing acceleration requirements of different data streams in the SD-WAN and supports functions such as network traffic load balancing and data stream classification. Different acceleration strategies and scheduling schemes can be applied to different data streams, meeting user requirements and maximizing network utilization. Specifically:
(1) fixed-size blocking is much faster than a content-defined chunking (CDC) algorithm, and the data is not physically split into blocks;
(2) the hash value of each data block is computed with MD5, which runs at about 227 MB/s versus only 83 MB/s for SHA-1, i.e. nearly three times as fast, which makes it well suited to WAN acceleration;
(3) the original data only has to go through a single blocking pass; unlike a CDC algorithm, which computes a Rabin fingerprint and then a SHA-1 value, the method computes only the MD5 value, which serves both as the Bloom Filter input and as the lookup key for the duplicate database, saving considerable time and space;
(4) the overall computation is simple, and the load on the system is far lower than that of data-encoding schemes that require complex encoding.
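The hashing-speed comparison in item (2) can be checked on a given machine with Python's standard `hashlib`. This is a rough sketch: the helper name `throughput_mb_s`, the buffer size and the best-of-N timing are illustrative choices, and results depend heavily on the CPU and the OpenSSL build behind `hashlib` (on CPUs with SHA hardware extensions, SHA-1 may even outrun MD5), so the 227 MB/s and 83 MB/s figures should be read as environment-specific. Note that the quoted figures imply a ratio of about 2.7.

```python
import hashlib
import time

# Figures quoted in item (2) of the beneficial effects.
QUOTED_MD5_MB_S = 227
QUOTED_SHA1_MB_S = 83
QUOTED_RATIO = QUOTED_MD5_MB_S / QUOTED_SHA1_MB_S  # about 2.7, i.e. "nearly three times"


def throughput_mb_s(algorithm: str, size: int = 32 * 1024 * 1024, rounds: int = 3) -> float:
    """Best-of-`rounds` throughput of a hashlib algorithm, in MB/s (10**6 bytes per second)."""
    buf = bytes(size)  # all-zero input; MD5 and SHA-1 cost does not depend on content
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(algorithm, buf).digest()
        best = min(best, time.perf_counter() - start)
    return size / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    md5 = throughput_mb_s("md5")
    sha1 = throughput_mb_s("sha1")
    print(f"MD5 {md5:.0f} MB/s, SHA-1 {sha1:.0f} MB/s, ratio {md5 / sha1:.2f}")
    print(f"ratio implied by the quoted figures: {QUOTED_RATIO:.2f}")
```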
Drawings
FIG. 1 is a flow chart of the compression of a key data stream.
Detailed Description
The invention is further illustrated by the following figures and examples.
An improved data blocking method for key data streams, comprising the following steps:
Step (1): divide the data transmitted over the wide area network into two types: key data streams and non-key data streams.
The key data stream class contains data streams of several different types. Each of these streams has some QoS requirement, and all of them are grouped into a single class that uses the same acceleration strategy.
A non-key data stream is a data stream with no QoS requirement.
Step (2): take a key data stream as input and check its size. If it is larger than 4 KB, execute step (3); otherwise, leave the key data stream unprocessed.
Step (3): using a technique similar to fixed-size blocking, divide the key data stream into equal 4 KB data blocks, recording only the position of each block point.
Step (4): starting from each block point, scan with a 256-byte sliding window, as described in step (5).
Step (5): compute the MD5 fingerprint of the data in the sliding window and use it as input to the Bloom Filter. If the fingerprint passes the Bloom Filter, the data block is a high-frequency data block; execute step (6). If it does not pass, execute step (7).
Step (6): look up the MD5 fingerprint of the data block in the duplicate database. If it is found, execute step (8); otherwise, execute step (9).
The duplicate database stores the original data block corresponding to each MD5 value. When an MD5 value computed by the key data stream compression module is found in the duplicate database, the data block is a duplicate data block, and the corresponding index (label) in the duplicate database is used to replace it.
Step (7): move the sliding window forward by one byte and execute step (3) again, until the next block point is reached.
Step (8): the data block is a duplicate data block; replace it with its index value in the duplicate database and transmit the index.
Step (9): the data block is not a duplicate data block but is a high-frequency data block, so add it to the duplicate database.
Step (10): repeat from step (5) until the data stream ends.
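Steps (1) and (2) amount to a dispatch decision made before any blocking takes place. The sketch below is hypothetical: the `Flow` type, its `qos_class` field (with `None` meaning "no QoS requirement") and the `route` function are assumptions for illustration, since the text does not say how a stream's QoS requirement is represented or detected.

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_SIZE = 4 * 1024  # size threshold from step (2)


@dataclass
class Flow:
    name: str
    payload: bytes
    qos_class: Optional[str] = None  # hypothetical: None means "no QoS requirement"


def is_key_stream(flow: Flow) -> bool:
    # Step (1): any QoS requirement places the flow in the single key class.
    return flow.qos_class is not None


def route(flow: Flow) -> str:
    """Return the processing path for one flow under steps (1) and (2)."""
    if not is_key_stream(flow):
        return "forward"      # non-key stream: no acceleration
    if len(flow.payload) <= BLOCK_SIZE:
        return "forward"      # step (2): a key stream of at most 4 KB is left as-is
    return "deduplicate"      # steps (3) to (10) apply


if __name__ == "__main__":
    for f in (Flow("backup", b"\0" * 10_000),
              Flow("voip", b"\0" * 200, "low-latency"),
              Flow("db-sync", b"\0" * 8_192, "guaranteed")):
        print(f.name, route(f))
```

Because all key streams share one class, a single boolean test is enough here; a finer-grained scheduler could branch on `qos_class`, but the text does not describe one.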

Claims (1)

1. An improved data blocking method for key data streams, comprising the following steps:
Step (1): divide the data transmitted over the wide area network into two types: key data streams and non-key data streams.
The key data stream class contains data streams of several different types. Each of these streams has some QoS requirement, and all of them are grouped into a single class that uses the same acceleration strategy.
A non-key data stream is a data stream with no QoS requirement.
Step (2): take a key data stream as input and check its size. If it is larger than 4 KB, execute step (3); otherwise, leave the key data stream unprocessed.
Step (3): using a technique similar to fixed-size blocking, divide the key data stream into equal 4 KB data blocks, recording only the position of each block point.
Step (4): starting from each block point, scan with a 256-byte sliding window.
Step (5): compute the MD5 fingerprint of the data in the sliding window and use it as input to the Bloom Filter. If the fingerprint passes the Bloom Filter, the data block is a high-frequency data block; execute step (6). If it does not pass, execute step (7).
Step (6): look up the MD5 fingerprint of the data block in the duplicate database. If it is found, execute step (8); otherwise, execute step (9).
The duplicate database stores the original data block corresponding to each MD5 value. When an MD5 value computed by the key data stream compression module is found in the duplicate database, the data block is a duplicate data block, and the corresponding index (label) in the duplicate database is used to replace it.
Step (7): move the sliding window forward by one byte and execute step (3) again, until the next block point is reached.
Step (8): the data block is a duplicate data block; replace it with its index value in the duplicate database and transmit the index.
Step (9): the data block is not a duplicate data block but is a high-frequency data block, so add it to the duplicate database.
Step (10): repeat from step (5) until the data stream ends.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911057388.8A CN110890956B (en) 2019-10-31 2019-10-31 Improved data blocking method for key data stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911057388.8A CN110890956B (en) 2019-10-31 2019-10-31 Improved data blocking method for key data stream

Publications (2)

Publication Number Publication Date
CN110890956A 2020-03-17
CN110890956B 2023-04-18

Family

ID=69746677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911057388.8A Active CN110890956B (en) 2019-10-31 2019-10-31 Improved data blocking method for key data stream

Country Status (1)

Country Link
CN (1) CN110890956B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050091234A1 (en) * 2003-10-23 2005-04-28 International Business Machines Corporation System and method for dividing data into predominantly fixed-sized chunks so that duplicate data chunks may be identified
CN103970744A (en) * 2013-01-25 2014-08-06 华中科技大学 (Huazhong University of Science and Technology) Extendible repeated data detection method
US20190199636A1 (en) * 2017-09-21 2019-06-27 Citrix Systems, Inc. Encapsulating traffic entropy into virtual wan overlay for better load balancing
US20190236283A1 (en) * 2018-01-30 2019-08-01 International Business Machines Corporation Data analysis in streaming data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
时立锋 (Shi Lifeng) et al.: "Research on a wide area network transmission optimization system based on data deduplication" *

Also Published As

Publication number Publication date
CN110890956B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CA2898667C (en) Data object processing method and apparatus
KR20200024193A (en) Apparatus and method for single pass entropy detection on data transfer
EP2895968B1 (en) Optimal data representation and auxiliary structures for in-memory database query processing
WO2010135082A1 (en) Localized weak bit assignment
US11551785B2 (en) Gene sequencing data compression preprocessing, compression and decompression method, system, and computer-readable medium
CN106257403A (en) The apparatus and method of the single-pass entropy detection for transmitting about data
WO2020207410A1 (en) Data compression method, electronic device, and storage medium
CN106295500B (en) A kind of repetition dither signal and normal signal separation method
CN103678158B (en) A kind of data layout optimization method and system
CN109376797B (en) Network traffic classification method based on binary encoder and multi-hash table
CN103412858A (en) Method for large-scale feature matching of text content or network content analyses
CN105917304A (en) Apparatus and method for de-duplication of data
CN110890956B (en) Improved data blocking method for key data stream
US20220004524A1 (en) Chunking method and apparatus
CN112104658B (en) Message compression method and system
CN105930104B (en) Date storage method and device
CN104751459A (en) Multi-dimensional feature similarity measuring optimizing method and image matching method
CN102622354B (en) Aggregated data quick searching method based on feature vector
CN103744899A (en) Distributed environment based mass data rapid classification method
CN111159996B (en) Short text set similarity comparison method and system based on text fingerprint algorithm
CN106682107A (en) Method and device for determining database table incidence relation
CN107193862A (en) A kind of variance optimization histogram construction method and device based on Spark Streaming
CN103258035A (en) Method and device for data processing
US11971856B2 (en) Efficient database query evaluation
Wen et al. MASC: A bitmap index encoding algorithm for fast data retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant