CN113452621A - Simple and efficient multilink data deduplication method - Google Patents

Simple and efficient multilink data deduplication method

Info

Publication number
CN113452621A
Authority
CN
China
Prior art keywords
link data
data packet
link
packet
packets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110652823.2A
Other languages
Chinese (zh)
Other versions
CN113452621B (en)
Inventor
张凯
郑应强
刘同鹤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing LSSEC Technology Co Ltd
Original Assignee
Beijing LSSEC Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing LSSEC Technology Co Ltd
Priority to CN202110652823.2A
Publication of CN113452621A
Application granted
Publication of CN113452621B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/31 Flow control; Congestion control by tagging of packets, e.g. using discard eligibility [DE] bits
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/32 Flow control; Congestion control by discarding or delaying data units, e.g. packets or frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a simple and efficient multilink data deduplication method, comprising the following steps: acquiring a plurality of link data packets received by a receiving end; determining the type of each link data packet and classifying the link data packets into a first link data packet set and a second link data packet set, wherein the first link data packet set comprises a plurality of first link data packets, which are link fragment data, and the second link data packet set comprises a plurality of second link data packets, which are link complete data; judging, for each first link data packet in the first link data packet set, whether it is a redundant packet, and screening out and deduplicating the first link data packets that are redundant packets; and judging, for each second link data packet in the second link data packet set, whether it is a redundant packet, and screening out and deduplicating the second link data packets that are redundant packets. Deduplication efficiency is thereby improved.

Description

Simple and efficient multilink data deduplication method
Technical Field
The invention relates to the technical field of data processing, in particular to a simple and efficient multilink data deduplication method.
Background
With the continuous development of equipment redundancy technology, it has been widely applied in the communication field. In data communication, to make transmission over a link as reliable as possible, a multilink device transmits the original user data with three-level redundancy, which ensures that the data received by the receiving end is complete and avoids incomplete data caused by packet loss and similar faults during transmission. However, because of the three-level redundant transmission, the receiving end receives repeated fragment data when it reassembles the data; the excess redundant fragments occupy more memory and lengthen memory garbage collection. In more extreme cases, to maximize reliability, the sending ends of multiple links may each copy the original user data several times and send the copies to the receiving end of the multilink device. If all of these data packets are output to the service port of the receiving end, the load on downstream processing increases and the processing results of the service software may be affected.
In the prior art, deduplication of the redundant data acquired by the receiving end is incomplete, inefficient and inaccurate. Inaccurate deduplication may cause incomplete data to be reassembled, so that the system performs erroneous control based on the incomplete data, which impairs the safety and reliability of the system.
Disclosure of Invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art described above. To this end, the invention provides a simple and efficient multilink data deduplication method in which the type of each data packet is accurately determined, whether each classified data packet is a redundant packet is accurately judged, and a different deduplication method is applied to each type, thereby improving deduplication efficiency. On the one hand, the system resources occupied at the receiving end, such as memory and CPU processing time, are reduced as far as possible. On the other hand, no matter how the sending end sends the data, the receiving end ultimately receives only one copy of the original data, so the user's services are not adversely affected. The completeness and accuracy of the data reassembled at the receiving end are guaranteed and the quality of data transmission is improved. Incomplete data caused by inaccurate deduplication, and the erroneous control the system would perform based on such data, are avoided, which improves the safety and reliability of the system and makes multilink data deduplication simpler and more efficient.
To achieve the above object, an embodiment of the present invention provides a simple and efficient multilink data deduplication method, including:
acquiring a plurality of link data packets received by a receiving end;
determining the type of each link data packet and classifying the link data packets into a first link data packet set and a second link data packet set; the first link data packet set comprises a plurality of first link data packets, which are link fragment data; the second link data packet set comprises a plurality of second link data packets, which are link complete data;
judging, for each first link data packet in the first link data packet set, whether it is a redundant packet, and screening out and deduplicating the first link data packets that are redundant packets;
and judging, for each second link data packet in the second link data packet set, whether it is a redundant packet, and screening out and deduplicating the second link data packets that are redundant packets.
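The classification step above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `LinkPacket` type, its field names and the `is_fragment` flag are hypothetical, since the text does not say how a packet's type is encoded.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LinkPacket:
    """Hypothetical packet record; every field name is an assumption."""
    source_id: int      # identifies the sender
    seq: int            # packet sequence number
    is_fragment: bool   # True: link fragment data, False: link complete data
    payload: bytes = b""


def classify(packets: Iterable[LinkPacket]) -> Tuple[List[LinkPacket], List[LinkPacket]]:
    """Split received packets into (first set: fragments, second set: complete)."""
    first_set: List[LinkPacket] = []
    second_set: List[LinkPacket] = []
    for pkt in packets:
        (first_set if pkt.is_fragment else second_set).append(pkt)
    return first_set, second_set


received = [
    LinkPacket(source_id=1, seq=10, is_fragment=True),
    LinkPacket(source_id=1, seq=11, is_fragment=False),
    LinkPacket(source_id=2, seq=10, is_fragment=True),
]
fragments, complete = classify(received)
```

Each set can then be passed to its own redundancy check, which is the point of classifying first: fragments and complete packets are deduplicated with different data structures.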
According to some embodiments of the present invention, judging whether each first link data packet in the first link data packet set is a redundant packet, and screening out and deduplicating the first link data packets that are redundant packets, includes:
acquiring the source id of historical first link data packets, and establishing a deduplication red-black tree;
the deduplication red-black tree comprises a plurality of tree nodes, and each tree node stores a source port, a destination port and a historical first link data packet sequence number;
establishing a first hash table according to the source port, the destination port and the historical first link data packet sequence number stored in each tree node;
the first hash table comprises a plurality of first nodes, and each first node stores history information;
acquiring the source id of a first link data packet, and determining a target tree node in the deduplication red-black tree according to that source id;
locating the corresponding position in the first hash table according to the first target packet sequence number stored in the target tree node, and determining a first target node;
comparing the first to-be-stored information of the current first link data packet with the history information stored in the first target node and, when the two are consistent, judging whether the time difference between the history information stored in the first target node and the first to-be-stored information of the current first link data packet is smaller than a preset time difference;
and when that time difference is determined to be smaller than the preset time difference, the first link data packet is a redundant packet; it is deduplicated and does not enter the subsequent data reassembly flow.
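A rough sketch of the fragment check described above is given below. Several choices are assumptions made for illustration only: a sorted key list searched with `bisect` stands in for the red-black tree (Python's standard library has none; both give O(log n) lookup), the hash table key includes the fragment offset, and "consistent" is reduced to comparing the fragment length. All names are hypothetical.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

WINDOW_S = 3.0  # preset time difference; 3 s in some embodiments

Key = Tuple[int, int, int, int]   # (source port, destination port, seq, offset)
Info = Tuple[int, float]          # (fragment length, reception time)


@dataclass
class TreeNode:
    source_id: int
    table: Dict[Key, Info] = field(default_factory=dict)  # per-source hash table


class OrderedIndex:
    """Sorted-key index keyed by source id; a stand-in for the red-black tree."""

    def __init__(self) -> None:
        self._keys: List[int] = []
        self._nodes: List[TreeNode] = []

    def find(self, source_id: int) -> Optional[TreeNode]:
        i = bisect.bisect_left(self._keys, source_id)
        if i < len(self._keys) and self._keys[i] == source_id:
            return self._nodes[i]
        return None

    def insert(self, node: TreeNode) -> None:
        i = bisect.bisect_left(self._keys, node.source_id)
        self._keys.insert(i, node.source_id)
        self._nodes.insert(i, node)


def is_redundant_fragment(index: OrderedIndex, source_id: int, src_port: int,
                          dst_port: int, seq: int, offset: int, length: int,
                          now: float) -> bool:
    node = index.find(source_id)
    if node is None:                      # first packet from this source
        node = TreeNode(source_id)
        index.insert(node)
    key = (src_port, dst_port, seq, offset)
    prev = node.table.get(key)
    if prev is not None and prev[0] == length and (now - prev[1]) < WINDOW_S:
        return True                       # same fragment seen recently: drop it
    node.table[key] = (length, now)       # record (or refresh) the history entry
    return False


index = OrderedIndex()
first = is_redundant_fragment(index, 7, 5000, 6000, seq=1, offset=0, length=512, now=100.0)
dup = is_redundant_fragment(index, 7, 5000, 6000, seq=1, offset=0, length=512, now=101.0)
late = is_redundant_fragment(index, 7, 5000, 6000, seq=1, offset=0, length=512, now=110.0)
```

In this sketch a matching fragment that arrives after the window has elapsed is treated as new data and replaces the stored entry, which is one plausible reading of the time check.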
According to some embodiments of the present invention, judging whether each second link data packet in the second link data packet set is a redundant packet, and screening out and deduplicating the second link data packets that are redundant packets, includes:
acquiring the source id of historical second link data packets, and establishing a second hash table and a third hash table; the second hash table uses the source id of the historical second link data packets as its key; establishing an association between the second hash table and the third hash table;
acquiring the source id of a second link data packet, and looking up a second target node in the second hash table;
locating the corresponding position in the third hash table according to the second target packet sequence number stored in the second target node, and determining a third target node;
when the third target packet sequence number stored in the third target node is determined to be consistent with the sequence number of the second link data packet, judging whether the time difference between the time at which the third target node stored the third target packet sequence number and the time at which the current second link data packet was received is smaller than a preset time difference;
and when that time difference is determined to be smaller than the preset time difference, the second link data packet is a redundant packet; it is deduplicated and does not enter the subsequent data reassembly flow.
According to some embodiments of the invention, the preset time difference is 3 s.
According to some embodiments of the present invention, the history information includes the sequence number of the historical first link data packet, a bitmap value used for data reassembly, the length of the historical first link data packet, the offset of the historical first link data packet within the data packet to be transmitted, the unique identifier ID of the historical first link data packet, and the time at which the historical first link data packet was received.
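The history record listed above maps naturally onto a small data class. The sketch below is illustrative only: the field names are hypothetical, and treating "consistent" as equality of every field except the reception time is an assumption the text does not spell out.

```python
from dataclasses import dataclass


@dataclass
class FragmentHistory:
    seq: int            # sequence number of the first link data packet
    bitmap: int         # bitmap value used for data reassembly
    length: int         # fragment length in bytes
    offset: int         # offset within the data packet to be transmitted
    packet_id: int      # unique identifier ID
    received_at: float  # reception time in seconds

    def matches(self, other: "FragmentHistory") -> bool:
        """True when every field except the reception time is identical."""
        return (self.seq, self.bitmap, self.length, self.offset, self.packet_id) == (
            other.seq, other.bitmap, other.length, other.offset, other.packet_id)

    def within_window(self, other: "FragmentHistory", window_s: float = 3.0) -> bool:
        """True when the two reception times differ by less than the window."""
        return abs(other.received_at - self.received_at) < window_s


stored = FragmentHistory(seq=4, bitmap=0b0010, length=256, offset=256,
                         packet_id=99, received_at=10.0)
incoming = FragmentHistory(seq=4, bitmap=0b0010, length=256, offset=256,
                           packet_id=99, received_at=11.5)
stale = FragmentHistory(seq=4, bitmap=0b0010, length=256, offset=256,
                        packet_id=99, received_at=20.0)

redundant = stored.matches(incoming) and stored.within_window(incoming)
redundant_stale = stored.matches(stale) and stored.within_window(stale)
```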
According to some embodiments of the present invention, after deduplicating the first link data packets screened out as redundant packets, the method further includes:
randomly selecting two first link data packets, a first link data packet A and a first link data packet B, from the deduplicated first link data packet set;
calculating the similarity between the first link data packet A and the first link data packet B and judging whether it is greater than a preset similarity; when the similarity is determined to be greater than the preset similarity, the deduplication of the first link data packets in the first link data packet set is unqualified and must be performed again;
calculating the similarity between the first link data packet A and the first link data packet B includes:
acquiring the sub-link data $(A_1, A_2, \ldots, A_m)$ included in the first link data packet A;
acquiring the sub-link data $(B_1, B_2, \ldots, B_n)$ included in the first link data packet B;
calculating the similarity $S(A, B)$ of the first link data packet A and the first link data packet B based on formula (1):
Figure BDA0003112379190000051
wherein $m$ is the number of sub-link data included in the first link data packet A; $n$ is the number of sub-link data included in the first link data packet B; $i$ indexes the $i$-th sub-link data in the first link data packet A; $j$ indexes the $j$-th sub-link data in the first link data packet B; $k_{ij}$ is a judgment coefficient indicating whether the $i$-th sub-link data in the first link data packet A is the same as the $j$-th sub-link data in the first link data packet B: $k_{ij} = 1$ when they are the same, and $k_{ij} = 0$ otherwise.
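As an illustration of the judgment coefficient, the sketch below counts matching sub-link data pairs. Formula (1) itself survives only as an image reference, so the normalisation used here (dividing the sum of the coefficients by the larger of m and n) is an assumption and should not be read as the patent's formula; the threshold value 0.9 is likewise hypothetical.

```python
from typing import Sequence


def similarity(a: Sequence[bytes], b: Sequence[bytes]) -> float:
    """Share of matching sub-link data pairs between packets A and B."""
    m, n = len(a), len(b)
    if m == 0 or n == 0:
        return 0.0
    # k_ij = 1 when A_i equals B_j, else 0; sum over all (i, j) pairs.
    matches = sum(1 for ai in a for bj in b if ai == bj)
    return matches / max(m, n)  # assumed normalisation, not taken from formula (1)


PRESET_SIMILARITY = 0.9  # hypothetical threshold

pkt_a = [b"h1", b"d1", b"d2"]
pkt_b = [b"h1", b"d1", b"d3"]
s = similarity(pkt_a, pkt_b)
needs_rerun = s > PRESET_SIMILARITY
```

With this normalisation, two packets whose sub-link data are pairwise distinct and identical score 1.0, so a score above the threshold after deduplication suggests a duplicate slipped through.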
According to some embodiments of the present invention, before determining the types of the plurality of link data packets and classifying them, the method further includes:
judging whether each of the plurality of link data packets includes a marked image, and determining the link data packets that include a marked image as link data packets to be detected;
performing grayscale processing on the marked image in each link data packet to be detected, and calculating a grayscale transformation function;
calculating the average power of noise signals in the marked image according to the grayscale transformation function, judging whether the average power is greater than a preset average power, and performing noise reduction on the marked image when the average power is determined to be greater than the preset average power;
calculating the average power of noise signals in the marked image according to the grayscale transformation function comprises:
calculating the average power of the noise signal in the length direction of the marked image (Figure BDA0003112379190000061):
Figure BDA0003112379190000062
wherein N is the length of the marked image; M is the width of the marked image; f(x, y) is the grayscale transformation function at pixel point (x, y) of the marked image;
calculating the average power of the noise signal in the width direction of the marked image (Figure BDA0003112379190000063):
Figure BDA0003112379190000064
calculating the average power of the noise signal in the marked image from the average powers of the noise signal in the length direction and in the width direction:
Figure BDA0003112379190000065
wherein Figure BDA0003112379190000066 is the average power of the noise signal in the marked image.
According to some embodiments of the present invention, before determining the types of the plurality of link data packets and classifying them, the method further includes:
determining the ratio of the number of link data packets received by the receiving end to the number of link data packets sent by the sending end, and judging whether the ratio is smaller than a preset ratio;
when the ratio is determined to be smaller than the preset ratio, detecting the duty ratio of each preset network node in the link used for the current transmission of link data packets;
determining the load of each preset network node according to its duty ratio, and screening out the preset network nodes whose load is greater than a preset load as marked preset network nodes;
determining, with each marked preset network node as an extension point, the information of the links that include the marked preset network node;
and setting the priority of the used links according to the link information, so that the ratio of the number of link data packets subsequently received by the receiving end to the number of link data packets subsequently sent by the sending end is greater than or equal to the preset ratio.
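The link-priority adjustment above might look like the following sketch. The text does not say how load is derived from the duty ratio or how priorities are assigned, so two assumptions are made here: the load is taken to be the duty ratio itself, and links that cross fewer marked nodes are ranked higher. All names and threshold values are hypothetical.

```python
from typing import Dict, List

PRESET_RATIO = 0.9   # hypothetical minimum received/sent ratio
PRESET_LOAD = 0.9    # hypothetical load threshold


def delivery_ratio(received: int, sent: int) -> float:
    """Ratio of packets received to packets sent (0.0 when nothing was sent)."""
    return received / sent if sent else 0.0


def marked_nodes(duty_ratio: Dict[str, float], max_load: float) -> List[str]:
    """Nodes whose load (assumed equal to the duty ratio) exceeds the threshold."""
    return sorted(node for node, duty in duty_ratio.items() if duty > max_load)


def rank_links(links: Dict[str, List[str]], marked: List[str]) -> List[str]:
    """Order links so that those crossing fewer marked nodes come first."""
    marked_set = set(marked)
    return sorted(
        links,
        key=lambda name: (sum(node in marked_set for node in links[name]), name),
    )


links = {"link1": ["n1", "n2"], "link2": ["n3", "n4"], "link3": ["n2", "n4"]}
duty = {"n1": 0.40, "n2": 0.95, "n3": 0.30, "n4": 0.50}

ratio = delivery_ratio(received=80, sent=100)
priority: List[str] = list(links)
marked: List[str] = []
if ratio < PRESET_RATIO:
    marked = marked_nodes(duty, PRESET_LOAD)
    priority = rank_links(links, marked)
```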
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
Fig. 1 is a flow chart of a simple and efficient multilink data deduplication method according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
As shown in Fig. 1, an embodiment of the present invention provides a simple and efficient multilink data deduplication method, including steps S1-S4:
S1, acquiring a plurality of link data packets received by a receiving end;
S2, determining the type of each link data packet and classifying the link data packets into a first link data packet set and a second link data packet set; the first link data packet set comprises a plurality of first link data packets, which are link fragment data; the second link data packet set comprises a plurality of second link data packets, which are link complete data;
S3, judging, for each first link data packet in the first link data packet set, whether it is a redundant packet, and screening out and deduplicating the first link data packets that are redundant packets;
S4, judging, for each second link data packet in the second link data packet set, whether it is a redundant packet, and screening out and deduplicating the second link data packets that are redundant packets.
The working principle of the technical scheme is as follows: a plurality of link data packets received by the receiving end are acquired; the type of each link data packet is determined and the link data packets are classified into a first link data packet set and a second link data packet set. The first link data packet set comprises a plurality of first link data packets, which are link fragment data; the second link data packet set comprises a plurality of second link data packets, which are link complete data. That is, a redundant first link data packet is a case of link fragment data redundancy, and a redundant second link data packet is a case of link complete data redundancy. Each first link data packet in the first link data packet set is judged as to whether it is a redundant packet, and the first link data packets that are redundant packets are screened out and deduplicated; likewise, each second link data packet in the second link data packet set is judged as to whether it is a redundant packet, and the second link data packets that are redundant packets are screened out and deduplicated.
The beneficial effects of the above technical scheme are as follows: the type of each link data packet is accurately determined, whether each classified link data packet is a redundant packet is accurately judged, and a different deduplication method is applied to each type, thereby improving deduplication efficiency. On the one hand, the system resources occupied at the receiving end, such as memory and CPU processing time, are reduced as far as possible. On the other hand, no matter how the sending end sends the data, the receiving end ultimately receives only one copy of the original data, so the user's services are not adversely affected. The completeness and accuracy of the data reassembled at the receiving end are guaranteed and the quality of data transmission is improved. Incomplete data caused by inaccurate deduplication, and the erroneous control the system would perform based on such data, are avoided, which improves the safety and reliability of the system and makes multilink data deduplication simpler and more efficient.
According to some embodiments of the present invention, judging whether each first link data packet in the first link data packet set is a redundant packet, and screening out and deduplicating the first link data packets that are redundant packets, includes:
acquiring the source id of historical first link data packets, and establishing a deduplication red-black tree;
the deduplication red-black tree comprises a plurality of tree nodes, and each tree node stores a source port, a destination port and a historical first link data packet sequence number;
establishing a first hash table according to the source port, the destination port and the historical first link data packet sequence number stored in each tree node;
the first hash table comprises a plurality of first nodes, and each first node stores history information;
acquiring the source id of a first link data packet, and determining a target tree node in the deduplication red-black tree according to that source id;
locating the corresponding position in the first hash table according to the first target packet sequence number stored in the target tree node, and determining a first target node;
comparing the first to-be-stored information of the current first link data packet with the history information stored in the first target node and, when the two are consistent, judging whether the time difference between the history information stored in the first target node and the first to-be-stored information of the current first link data packet is smaller than a preset time difference;
and when that time difference is determined to be smaller than the preset time difference, the first link data packet is a redundant packet; it is deduplicated and does not enter the subsequent data reassembly flow.
The working principle of the technical scheme is as follows: a red-black tree is a self-balancing binary search tree, a data structure used in computer science, typically to implement associative arrays. When judging whether a first link data packet in the first link data packet set is a redundant packet, the source id of historical first link data packets is acquired and a deduplication red-black tree is established. The deduplication red-black tree comprises a plurality of tree nodes, and each tree node stores a source port, a destination port and a historical first link data packet sequence number; the source port is the start of the transmission link and the destination port is its end. A hash table, also called a hash map, is a data structure that is accessed directly by key: it maps the key to a position in the table and accesses the record there, which speeds up lookup. A first hash table is established according to the source port, the destination port and the historical first link data packet sequence number stored in each tree node; the first hash table comprises a plurality of first nodes, and each first node stores history information. The source id of a first link data packet is acquired, and a target tree node is determined in the deduplication red-black tree according to that source id; the corresponding position in the first hash table is located according to the first target packet sequence number stored in the target tree node, and a first target node is determined. The first to-be-stored information of the current first link data packet is compared with the history information stored in the first target node and, when the two are consistent, it is judged whether the time difference between the history information stored in the first target node and the first to-be-stored information of the current first link data packet is smaller than a preset time difference. When that time difference is determined to be smaller than the preset time difference, the first link data packet is a redundant packet; it is deduplicated and does not enter the subsequent data reassembly flow. The historical first link data packets are historical data of the same type as the first link data packet.
The beneficial effects of the above technical scheme are as follows: the deduplication red-black tree established from historical first link data packets allows a first link data packet to be looked up and located quickly, determining whether it has been recorded before. Whether the first link data packet is a redundant packet is then accurately determined from two key factors: whether the first to-be-stored information of the current first link data packet is consistent with the history information stored in the first target node, and whether the time difference between the two is smaller than the preset time difference.
According to some embodiments of the present invention, judging whether each second link data packet in the second link data packet set is a redundant packet, and screening out and deduplicating the second link data packets that are redundant packets, includes:
acquiring the source id of historical second link data packets, and establishing a second hash table and a third hash table; the second hash table uses the source id of the historical second link data packets as its key; establishing an association between the second hash table and the third hash table;
acquiring the source id of a second link data packet, and looking up a second target node in the second hash table;
locating the corresponding position in the third hash table according to the second target packet sequence number stored in the second target node, and determining a third target node;
when the third target packet sequence number stored in the third target node is determined to be consistent with the sequence number of the second link data packet, judging whether the time difference between the time at which the third target node stored the third target packet sequence number and the time at which the current second link data packet was received is smaller than a preset time difference;
and when that time difference is determined to be smaller than the preset time difference, the second link data packet is a redundant packet; it is deduplicated and does not enter the subsequent data reassembly flow.
The working principle of the technical scheme is as follows: when judging whether a second link data packet in the second link data packet set is a redundant packet, the source id of historical second link data packets is acquired, and a second hash table and a third hash table are established; the second hash table uses the source id of the historical second link data packets as its key, and an association is established between the second hash table and the third hash table. The source id of a second link data packet is acquired, and a second target node is looked up in the second hash table; the corresponding position in the third hash table is located according to the second target packet sequence number stored in the second target node, and a third target node is determined. When the third target packet sequence number stored in the third target node is determined to be consistent with the sequence number of the second link data packet, it is judged whether the time difference between the time at which the third target node stored the third target packet sequence number and the time at which the current second link data packet was received is smaller than a preset time difference. When that time difference is determined to be smaller than the preset time difference, the second link data packet is a redundant packet; it is deduplicated and does not enter the subsequent data reassembly flow. The second hash table serves as the primary table and the third hash table as the secondary table. The second hash table comprises a plurality of second nodes and the third hash table comprises a plurality of third nodes; each node maintains the sequence number and time of the link data packet most recently output at the corresponding position.
The historical second link data packets are historical data of the same type as the second link data packet.
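The two-level lookup for complete packets can be sketched as below. This is a simplified illustration under stated assumptions: the primary table is a `dict` keyed by source id whose value is that source's secondary table, the secondary table is a fixed-size slot array indexed by `seq % SLOTS`, and the slot count of 1024 is arbitrary. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

WINDOW_S = 3.0   # preset time difference; 3 s in some embodiments
SLOTS = 1024     # size of each secondary table (hypothetical)


@dataclass
class Slot:
    seq: int      # sequence number of the packet most recently output here
    time: float   # time at which that packet was output


class TwoLevelDedup:
    def __init__(self) -> None:
        # Primary table: source id -> secondary table for that source.
        self.primary: Dict[int, List[Optional[Slot]]] = {}

    def is_redundant(self, source_id: int, seq: int, now: float) -> bool:
        secondary = self.primary.setdefault(source_id, [None] * SLOTS)
        pos = seq % SLOTS                 # position derived from the sequence number
        slot = secondary[pos]
        if slot is not None and slot.seq == seq and (now - slot.time) < WINDOW_S:
            return True                   # same sequence number seen recently: drop
        secondary[pos] = Slot(seq, now)   # remember the most recently output packet
        return False


dedup = TwoLevelDedup()
results = [
    dedup.is_redundant(source_id=1, seq=42, now=0.0),   # first copy
    dedup.is_redundant(source_id=1, seq=42, now=0.5),   # copy from another link
    dedup.is_redundant(source_id=2, seq=42, now=0.6),   # different source
    dedup.is_redundant(source_id=1, seq=42, now=5.0),   # outside the window
]
```

Keeping only the most recent sequence number per slot bounds memory per source, at the cost of forgetting older entries that hash to the same position.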
The beneficial effects of the above technical scheme are as follows: a two-level hash table is established from the historical second link data packets, so whether a received second link data packet is a redundant packet can be judged accurately, and the method is simpler and more efficient.
According to some embodiments of the invention, the preset time difference is 3 s.
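The two-level lookup described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and field names (`CompletePacketDeduper`, `Slot`, `SourceEntry`), the secondary-table size, and the choice to index the secondary table by `seq % slots` are all assumptions made for the example. Only the source-id key, the sequence-number comparison, and the 3 s window come from the text.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

PRESET_TIME_DIFF_S = 3.0  # the 3 s value given in the text
SECONDARY_SLOTS = 1024    # assumed size of the secondary table


@dataclass
class Slot:
    seq: int        # sequence number most recently recorded at this position
    seen_at: float  # time at which that sequence number was recorded


@dataclass
class SourceEntry:
    # Secondary table for one source: slot index -> Slot (hypothetical layout).
    slots: Dict[int, Slot] = field(default_factory=dict)


class CompletePacketDeduper:
    """Primary table keyed by source id, secondary table keyed by slot index."""

    def __init__(self, window_s: float = PRESET_TIME_DIFF_S,
                 slots: int = SECONDARY_SLOTS) -> None:
        self.window_s = window_s
        self.slots = slots
        self.primary: Dict[str, SourceEntry] = {}

    def is_redundant(self, source_id: str, seq: int,
                     now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Level 1: find (or create) the node for this source id.
        entry = self.primary.setdefault(source_id, SourceEntry())
        # Level 2: jump to the position for this sequence number (assumed rule).
        index = seq % self.slots
        slot = entry.slots.get(index)
        duplicate = (
            slot is not None
            and slot.seq == seq
            and now - slot.seen_at < self.window_s
        )
        if not duplicate:
            # Record the packet that is allowed through to reassembly.
            entry.slots[index] = Slot(seq, now)
        return duplicate
```

With this sketch, a second copy of the same sequence number from the same source inside the window is reported as redundant, while the same sequence number arriving after the window, or from a different source, is passed on.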
According to some embodiments of the present invention, the history information includes a history first link data packet sequence number, a bit map value for reconstructing data, a length of the history first link data packet, an offset position of the history first link data packet in a data packet to be transmitted, a unique identifier ID of the history first link data packet, and a received time of the history first link data packet.
Whether the first link data packet is a redundant packet is judged accurately from the unique identification ID of the historical first link data packet and the first link data packet received this time.
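As a rough illustration of how such history information might be compared, the sketch below stores one record per fragment and flags a newly received fragment as redundant when a matching record exists inside the preset time difference. The record fields follow the list above; the names `FragmentInfo` and `check_fragment`, the lookup key, and the use of a plain dictionary in place of the red-black tree and first hash table are assumptions for the example only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

PRESET_TIME_DIFF_S = 3.0  # preset time difference from the text


@dataclass(frozen=True)
class FragmentInfo:
    seq: int            # sequence number of the fragment's packet
    bitmap: int         # bit map value used for reassembly
    length: int         # length of the fragment
    offset: int         # offset within the data packet to be transmitted
    packet_id: int      # unique identification ID
    received_at: float  # time the fragment was received

    def same_fragment(self, other: "FragmentInfo") -> bool:
        # Everything except the receive time must agree.
        return (self.seq, self.bitmap, self.length, self.offset,
                self.packet_id) == (other.seq, other.bitmap, other.length,
                                    other.offset, other.packet_id)


# Assumed key: (source id, sequence number, offset). A dict stands in for
# the tree and hash table described in the document.
Key = Tuple[str, int, int]


def check_fragment(history: Dict[Key, FragmentInfo], source_id: str,
                   info: FragmentInfo,
                   window_s: float = PRESET_TIME_DIFF_S) -> bool:
    """Return True if `info` is redundant; otherwise record it."""
    key = (source_id, info.seq, info.offset)
    old = history.get(key)
    if (old is not None and old.same_fragment(info)
            and info.received_at - old.received_at < window_s):
        return True
    history[key] = info
    return False
```

A fragment with a different offset in the same packet is treated as new data here, so it still reaches reassembly.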
According to some embodiments of the present invention, after performing deduplication processing on the first link data packet screened out as the redundant packet, the method further includes:
randomly selecting two first link data packets from the first link data packet set subjected to the duplicate removal processing, wherein the two first link data packets are a first link data packet A and a first link data packet B respectively;
calculating the similarity between the first link data packet A and the first link data packet B, judging whether the similarity is greater than a preset similarity, and when the similarity is determined to be greater than the preset similarity, indicating that the duplicate removal processing of the first link data packet in the first link data packet set is unqualified and the duplicate removal processing needs to be carried out again;
calculating the similarity between the first link data packet A and the first link data packet B, including:
acquiring the sub-link data $(A_1, A_2, \ldots, A_m)$ included in the first link data packet A;
acquiring the sub-link data $(B_1, B_2, \ldots, B_n)$ included in the first link data packet B;
calculating the similarity $S(A, B)$ of the first link data packet A and the first link data packet B based on formula (1):
[Formula (1): image BDA0003112379190000141, not reproduced in this text]
wherein m is the number of sub-link data included in the first link data packet A; n is the number of sub-link data included in the first link data packet B; i denotes the ith sub-link data in the first link data packet A; j denotes the jth sub-link data in the first link data packet B; $k_{ij}$ is a judgment coefficient indicating whether the ith sub-link data in the first link data packet A is the same as the jth sub-link data in the first link data packet B: $k_{ij} = 1$ when they are the same; otherwise $k_{ij} = 0$.
The working principle and beneficial effects of this technical scheme are as follows. After the first link data packets screened out as redundant packets have been deduplicated, the deduplication effect on the resulting first link data packet set is checked. Specifically, two first link data packets, a first link data packet A and a first link data packet B, are randomly selected from the deduplicated set and their similarity is calculated. When the similarity is greater than a preset similarity, the deduplication of the first link data packet set is judged unqualified and must be performed again. In another embodiment, two first link data packets may be randomly selected multiple times, the similarity calculated for each pair, and the mean of the similarities used for the judgment, which improves the accuracy of the check. When the similarity between the first link data packet A and the first link data packet B is calculated, the sub-link data of the two packets are compared with each other and the arrangement order of the sub-link data within each packet is taken into account, so the calculated similarity is more reasonable; formula (1) yields an accurate similarity, which improves the accuracy of the comparison against the preset similarity.
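The judgment coefficient and the repeated-sampling variant can be illustrated with the sketch below. Formula (1) itself is not available in this text, so the normalisation used here (the sum of all judgment coefficients divided by m times n) is a stand-in chosen for the example and does not attempt to reproduce formula (1), in particular its treatment of arrangement order. The function names are hypothetical.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence


def judgment_coefficients(a: Sequence[bytes],
                          b: Sequence[bytes]) -> List[List[int]]:
    """k[i][j] is 1 when sub-link data a[i] equals b[j], else 0."""
    return [[1 if x == y else 0 for y in b] for x in a]


def similarity(a: Sequence[bytes], b: Sequence[bytes]) -> float:
    """Assumed stand-in for formula (1): share of matching (i, j) pairs."""
    if not a or not b:
        return 0.0
    k = judgment_coefficients(a, b)
    return sum(map(sum, k)) / (len(a) * len(b))


def mean_sampled_similarity(packets: Sequence[Sequence[bytes]], rounds: int,
                            rng: Optional[random.Random] = None) -> float:
    """Average similarity over several randomly chosen packet pairs."""
    if rng is None:
        rng = random.Random()
    scores = []
    for _ in range(rounds):
        a, b = rng.sample(list(packets), 2)  # two distinct packets
        scores.append(similarity(a, b))
    return mean(scores)
```

A deduplicated set would be judged unqualified when the returned value exceeds the preset similarity, whatever threshold is chosen for it.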
According to some embodiments of the present invention, before determining and classifying the types of the plurality of link packets, the method further includes:
respectively judging whether the plurality of link data packets comprise the marked images, and determining the link data packets with the marked images as to-be-detected link data packets;
carrying out gray processing on the marked image in the link data packet to be detected, and calculating to obtain a gray conversion function;
calculating the average power of noise signals in the marked image according to the gray scale transformation function, judging whether the average power is greater than a preset average power or not, and performing noise reduction processing on the marked image when the average power is determined to be greater than the preset average power;
calculating the average power of noise signals in the marked image according to the gray scale transformation function, comprising:
calculating the average power of the noise signal in the length direction in the marked image:
[Formula images BDA0003112379190000151 and BDA0003112379190000161, not reproduced in this text]
wherein N is the length of the marked image; m is the width of the marked image; f(x, y) is the gray scale transformation function with respect to a pixel point (x, y) on the marked image;
calculating the average power of the noise signal in the width direction in the marked image:
[Formula images BDA0003112379190000162 and BDA0003112379190000163, not reproduced in this text]
calculating the average power of the noise signal in the marked image according to the average power of the noise signal in the length direction and the average power of the noise signal in the width direction in the marked image:
[Formula image BDA0003112379190000164, not reproduced in this text]
wherein [symbol image BDA0003112379190000165] is the average power of the noise signal in the marked image.
The working principle and beneficial effects of this technical scheme are as follows. Before the types of the plurality of link data packets are determined and the packets classified, it is judged for each link data packet whether it includes a marked image, and the link data packets that include a marked image are determined to be link data packets to be detected. The marked image is a sending identifier that marks the link data packet. It makes it easy to judge whether a link data packet was lost in transmission and, when one was, to determine which packet was lost so that it can be sent again, which improves the accuracy of data transmission. Gray processing is performed on the marked image in the link data packet to be detected and a gray scale transformation function is obtained. The average power of the noise signal in the marked image is calculated from the gray scale transformation function, and when it is greater than a preset average power, noise reduction is performed on the marked image. This improves the recognition accuracy of the marked image and, in turn, the accuracy of determining the types of the link data packets and classifying them in the subsequent steps. The gray scale transformation function is prior art and is not described here.
The average power of the noise signal in the marked image is calculated from its average power in the length direction and its average power in the width direction, which improves the accuracy of the total average power and, in turn, the accuracy of the comparison against the preset average power.
According to some embodiments of the present invention, before determining and classifying the types of the plurality of link packets, the method further includes:
determining the ratio of the number of the link data packets received by the receiving end to the number of the link data packets sent by the sending end, and judging whether the ratio is smaller than a preset ratio or not;
when the ratio is determined to be smaller than the preset ratio, detecting the duty ratio of the preset network nodes in the link used to send the link data packets this time;
determining the load capacity of the preset network nodes according to the duty ratio, screening out the preset network nodes with the load capacity larger than the preset load capacity as marked preset network nodes;
determining link information comprising the marked preset network node by taking the marked preset network node as an extension point;
and setting the priority level of the used link according to the link information to ensure that the ratio of the number of the link data packets received again by the receiving end to the number of the link data packets sent again by the sending end is greater than or equal to a preset ratio.
The working principle and beneficial effects of this technical scheme are as follows. Before the types of the plurality of link data packets are determined and the packets classified, the sending success rate and the number of lost link data packets are checked, so that when the receiving end reassembles the data, the data packets can completely form the data to be transmitted as determined by the sending end. The ratio of the number of link data packets received by the receiving end to the number sent by the sending end is determined and compared with a preset ratio. A ratio smaller than the preset ratio indicates that too many packets were lost, so the quality of the link used to send the link data packets must be examined: the duty ratio of the preset network nodes in the link used this time is detected; the load capacity of each preset network node is determined from its duty ratio; the preset network nodes whose load capacity is greater than a preset load capacity are screened out as marked preset network nodes; link information that includes the marked preset network nodes is determined by taking each marked preset network node as an extension point; and the priority level of the used link is set according to the link information, so that the ratio of the number of link data packets received again by the receiving end to the number sent again by the sending end is greater than or equal to the preset ratio. Setting the priority level of the used link guarantees priority transmission of the link data packets on the current link, reduces the number of lost packets, ensures the number of link data packets received by the receiving end, and improves the accuracy of data transmission.
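The check described above might be sketched as follows. The text does not specify how load capacity is derived from the duty ratio or how link information maps to a priority level, so this example makes two loudly labeled assumptions: the load capacity is taken to be the duty ratio itself, and links that pass through fewer marked nodes are ranked first. All function names and the sample values are hypothetical.

```python
from typing import Dict, List, Set


def delivery_ratio(received: int, sent: int) -> float:
    """Ratio of packets received by the receiver to packets sent."""
    return received / sent if sent else 0.0


def mark_overloaded(node_load: Dict[str, float],
                    preset_load: float) -> Set[str]:
    """Nodes whose load (assumed equal to the duty ratio) exceeds the preset."""
    return {node for node, load in node_load.items() if load > preset_load}


def prioritise_links(links: Dict[str, List[str]],
                     marked: Set[str]) -> List[str]:
    """Order link names so that links with fewer marked nodes come first.

    This ordering rule is an assumption; ties are broken by link name.
    """
    return sorted(
        links,
        key=lambda name: (sum(n in marked for n in links[name]), name),
    )


PRESET_RATIO = 0.9  # hypothetical threshold
if delivery_ratio(received=80, sent=100) < PRESET_RATIO:
    marked_nodes = mark_overloaded({"n1": 0.95, "n2": 0.40, "n3": 0.91}, 0.9)
    order = prioritise_links(
        {"link-a": ["n1", "n3"], "link-b": ["n2"], "link-c": ["n2", "n3"]},
        marked_nodes,
    )
```

In a real system the resulting order would feed whatever priority mechanism the links support, and the delivery ratio would be measured again after retransmission.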
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A simple and efficient multilink data deduplication method is characterized by comprising the following steps:
acquiring a plurality of link data packets received by a receiving end;
respectively determining the types of a plurality of link data packets, classifying the link data packets, and determining the link data packets as a first link data packet set and a second link data packet set; the first link data packet set comprises a plurality of first link data packets, and the first link data packets are link fragment data; the second link data packet set comprises a plurality of second link data packets, and the second link data packets are link complete data;
respectively judging whether a first link data packet in a first link data packet set is a redundant packet, screening out the first link data packet which is the redundant packet, and performing deduplication processing;
and respectively judging whether the second link data packets in the second link data packet set are redundant packets, screening out the second link data packets which are the redundant packets, and performing deduplication processing.
2. The simple and efficient method for removing duplicate data in multiple links according to claim 1, wherein determining whether the first link data packet in the first link data packet set is a redundant packet, respectively, and screening out the first link data packet that is the redundant packet and performing the duplicate removal process includes:
acquiring a source id of a historical first link data packet, and establishing a deduplication red-black tree;
the deduplication red-black tree comprises a plurality of tree nodes, and a source port, a destination port and a historical first link data packet sequence number are stored in each tree node;
establishing a first hash table according to a source port, a destination port and a historical first link data packet serial number stored in each tree node;
the first hash table comprises a plurality of first nodes, and history information is stored in each first node;
obtaining a source id of a first link data packet, and determining a target tree node on the deduplication red-black tree according to the source id of the first link data packet;
quickly positioning to a corresponding position in a first hash table according to a first target packet serial number stored in a target tree node, and determining a first target node;
comparing the first to-be-stored information of the current first link data packet with the historical information stored by the first target node, and judging whether the time difference between the historical information stored by the first target node and the first to-be-stored information of the current first link data packet is smaller than a preset time difference or not when the first to-be-stored information and the historical information stored by the first target node are consistent;
and when the time difference between the historical information stored by the first target node and the first to-be-stored information of the current first link data packet is determined to be smaller than the preset time difference, the first link data packet is represented as a redundant packet, and the deduplication processing is carried out without entering the subsequent data reassembly processing flow.
3. The simple and efficient method for removing duplicate data in multiple links according to claim 1, wherein determining whether the second link data packets in the second link data packet set are redundant packets, respectively, and screening out the second link data packets that are redundant packets and performing the duplicate removal process includes:
acquiring a source id of a historical second link data packet, and establishing a second hash table and a third hash table; the second hash table takes the source id of the historical second link data packet as KEY; establishing an association between the second hash table and the third hash table;
acquiring a source id of a second link data packet, and quickly searching a second target node in a second hash table;
quickly positioning to a corresponding position in a third hash table according to a second target packet serial number stored in the second target node, and determining a third target node;
when the third target packet sequence number stored in the third target node is determined to be consistent with the second link data packet sequence number, judging whether the time difference between the third target node historically storing the third target packet sequence number and the second link data packet received this time is smaller than a preset time difference;
and when the time difference between the third target node historically storing the third target packet serial number and the current received second link data packet is determined to be smaller than the preset time difference, the second link data packet is represented as a redundant packet, and the duplicate removal processing is carried out without entering the subsequent data recombination processing flow.
4. A simple and efficient multilink data deduplication method as recited in claim 2 or 3, wherein the time difference is 3 s.
5. The simple and efficient multilink data deduplication method of claim 2, wherein the historical information comprises a historical first link data packet sequence number, a bit map value for reassembly data, a length of the historical first link data packet, an offset position of the historical first link data packet within a data packet to be transmitted, a unique identification ID of the historical first link data packet, and a received time of the historical first link data packet.
6. The simple and efficient method for removing duplicate data in multiple links according to claim 1, wherein after the first link data packet screened out as the redundant packet is subjected to the duplicate removal process, the method further comprises:
randomly selecting two first link data packets from the first link data packet set subjected to the duplicate removal processing, wherein the two first link data packets are a first link data packet A and a first link data packet B respectively;
calculating the similarity between the first link data packet A and the first link data packet B, judging whether the similarity is greater than a preset similarity, and when the similarity is determined to be greater than the preset similarity, indicating that the duplicate removal processing of the first link data packet in the first link data packet set is unqualified and the duplicate removal processing needs to be carried out again;
calculating the similarity between the first link data packet A and the first link data packet B, including:
acquiring the sub-link data $(A_1, A_2, \ldots, A_m)$ included in the first link data packet A;
acquiring the sub-link data $(B_1, B_2, \ldots, B_n)$ included in the first link data packet B;
calculating the similarity $S(A, B)$ of the first link data packet A and the first link data packet B based on formula (1):
[Formula (1): image FDA0003112379180000041, not reproduced in this text]
wherein m is the number of sub-link data included in the first link data packet A; n is the number of sub-link data included in the first link data packet B; i denotes the ith sub-link data in the first link data packet A; j denotes the jth sub-link data in the first link data packet B; $k_{ij}$ is a judgment coefficient indicating whether the ith sub-link data in the first link data packet A is the same as the jth sub-link data in the first link data packet B: $k_{ij} = 1$ when they are the same; otherwise $k_{ij} = 0$.
7. The simple and efficient method for removing duplicate content in multilink data according to claim 1, further comprising, before determining and classifying the types of the plurality of link packets, respectively:
respectively judging whether the plurality of link data packets comprise the marked images, and determining the link data packets with the marked images as to-be-detected link data packets;
carrying out gray processing on the marked image in the link data packet to be detected, and calculating to obtain a gray conversion function;
calculating the average power of noise signals in the marked image according to the gray scale transformation function, judging whether the average power is greater than a preset average power or not, and performing noise reduction processing on the marked image when the average power is determined to be greater than the preset average power;
calculating the average power of noise signals in the marked image according to the gray scale transformation function, comprising:
calculating the average power of the noise signal in the length direction in the marked image:
[Formula images FDA0003112379180000051 and FDA0003112379180000052, not reproduced in this text]
wherein N is the length of the marked image; m is the width of the marked image; f(x, y) is the gray scale transformation function with respect to a pixel point (x, y) on the marked image;
calculating the average power of the noise signal in the width direction in the marked image:
[Formula images FDA0003112379180000053 and FDA0003112379180000054, not reproduced in this text]
calculating the average power of the noise signal in the marked image according to the average power of the noise signal in the length direction and the average power of the noise signal in the width direction in the marked image:
[Formula image FDA0003112379180000055, not reproduced in this text]
wherein [symbol image FDA0003112379180000056] is the average power of the noise signal in the marked image.
8. The simple and efficient method for removing duplicate content in multilink data according to claim 1, further comprising, before determining and classifying the types of the plurality of link packets, respectively:
determining the ratio of the number of the link data packets received by the receiving end to the number of the link data packets sent by the sending end, and judging whether the ratio is smaller than a preset ratio or not;
when the ratio is determined to be smaller than the preset ratio, detecting the duty ratio of the preset network nodes in the link used to send the link data packets this time;
determining the load capacity of the preset network nodes according to the duty ratio, screening out the preset network nodes with the load capacity larger than the preset load capacity as marked preset network nodes;
determining link information comprising the marked preset network node by taking the marked preset network node as an extension point;
and setting the priority level of the used link according to the link information to ensure that the ratio of the number of the link data packets received again by the receiving end to the number of the link data packets sent again by the sending end is greater than or equal to a preset ratio.
CN202110652823.2A 2021-06-11 2021-06-11 Simple and efficient multilink data deduplication method Active CN113452621B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110652823.2A CN113452621B (en) 2021-06-11 2021-06-11 Simple and efficient multilink data deduplication method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110652823.2A CN113452621B (en) 2021-06-11 2021-06-11 Simple and efficient multilink data deduplication method

Publications (2)

Publication Number Publication Date
CN113452621A true CN113452621A (en) 2021-09-28
CN113452621B CN113452621B (en) 2022-02-25

Family

ID=77811363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110652823.2A Active CN113452621B (en) 2021-06-11 2021-06-11 Simple and efficient multilink data deduplication method

Country Status (1)

Country Link
CN (1) CN113452621B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6658478B1 (en) * 2000-08-04 2003-12-02 3Pardata, Inc. Data storage system
CN1362809A (en) * 2001-01-08 2002-08-07 陈文胜 Data transmission method and equipment based on network
CN106657400A (en) * 2017-02-20 2017-05-10 北京古盘创世科技发展有限公司 Data transmitting-receiving device and electronic equipment
CN109379432A (en) * 2018-10-31 2019-02-22 腾讯科技(深圳)有限公司 Data processing method, device, server and computer readable storage medium
CN112165457A (en) * 2020-09-04 2021-01-01 苏州浪潮智能科技有限公司 Method, system and device for file rearrangement and readable storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114244781A (en) * 2021-12-20 2022-03-25 苏州盛科通信股份有限公司 DPDK-based message deduplication processing method and device
CN114244781B (en) * 2021-12-20 2023-12-22 苏州盛科通信股份有限公司 Message de-duplication processing method and device based on DPDK

Also Published As

Publication number Publication date
CN113452621B (en) 2022-02-25

Similar Documents

Publication Publication Date Title
CN110309336A (en) Image search method, device, system, server and storage medium
CN113452621B (en) Simple and efficient multilink data deduplication method
CN106257403A (en) The apparatus and method of the single-pass entropy detection for transmitting about data
US20110289270A1 (en) System, method and computer program product for data transfer management
CN109522316A (en) Log processing method, device, equipment and storage medium
CN113489619A (en) Network topology inference method and device based on time series analysis
US7834784B1 (en) Data redundancy elimination mechanism including fast lookup of data patterns exhibiting spatial locality
CN103227818A (en) Terminal, server, file transferring method, file storage management system and file storage management method
CN111026917A (en) Data packet classification method and system based on convolutional neural network
WO2019107149A1 (en) Bit assignment assessment device, bit assignment assessment method, and program
CN114124447B (en) Intrusion detection method and device based on Modbus data packet reorganization
CN110990603B (en) Method and system for format recognition of segmented image data
US20060274762A1 (en) Method and system for supporting efficient and cache-friendly TCP session lookup operations based on canonicalization tags
KR960014184B1 (en) Method for detecting class error of a classified vector guantized image
US20150081649A1 (en) In-line deduplication for a network and/or storage platform
CN113064554B (en) Optimal storage node matching method, device and medium based on distributed storage
CN111586052B (en) Multi-level-based crowd sourcing contract abnormal transaction identification method and identification system
CN113900886A (en) Abnormal log monitoring method
CN114677584A (en) Water immersion identification method and system for power distribution station of double-attention power system
CN112333155A (en) Abnormal flow detection method and system, electronic equipment and storage medium
CN115167767B (en) Dirty data prevention method and system based on BBC exclusive OR check
CN116934733B (en) Reliability test method and system for chip
CN111177092A (en) Deduplication method and device based on erasure codes
CN117112039B (en) Transmission optimization system and operation method of data center
CN111641538B (en) MAC address table capacity test method, device, electronic device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant