CN114900555A - Data distribution method and device based on lossless compression algorithm - Google Patents


Info

Publication number
CN114900555A
Authority
CN
China
Prior art keywords
data
terminal
subscribing
publishing
message
Legal status
Withdrawn
Application number
CN202111524920.XA
Other languages
Chinese (zh)
Inventor
夏科睿
彭超
马姓
涂凡凡
姬鹏鹏
Current Assignee
Hefei Hagong Xuanyuan Intelligent Technology Co ltd
Original Assignee
Hefei Hagong Xuanyuan Intelligent Technology Co ltd
Application filed by Hefei Hagong Xuanyuan Intelligent Technology Co ltd
Priority to CN202111524920.XA
Publication of CN114900555A
Withdrawn (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/06 Notations for structuring of protocol data, e.g. abstract syntax notation one [ASN.1]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention discloses a data distribution method and device based on a lossless compression algorithm. The method is applied to a publish-subscribe system consisting of a publishing terminal and a plurality of subscribing terminals. It comprises the following steps:
- initializing the publishing terminal and the subscribing terminals;
- connecting the publishing terminal and the subscribing terminals to an intermediate proxy, which connects the publishing and subscribing terminals that share the same topic;
- the publishing terminal serializes the cached data packets into a binary file with a lossless compression algorithm to form an encapsulated message and publishes it to the intermediate proxy;
- the intermediate proxy receives the encapsulated message, decompresses it and sends the decompressed message to the subscribing terminals.
The advantage of the invention is that it reduces data packet loss during data distribution.

Description

Data distribution method and device based on lossless compression algorithm
Technical Field
The invention relates to the field of data transmission, in particular to a data distribution method and device based on a lossless compression algorithm.
Background
Message middleware is currently shifting from proxy-based (broker) designs to proxy-less (brokerless) ones. It is also moving toward low-latency, high-throughput, data-intensive communication. As data distribution services with publish-subscribe capability continue to improve, such middleware is expected to become one of the mainstream middleware types of the cloud era.
When messages are sent and received at high frequency between a publishing terminal and a subscribing terminal, the receiving terminal can read data from its buffer only by invoking a system interrupt. Its buffer memory is therefore released more slowly than messages are sent. Considering also that the bandwidth utilization of message traffic over Ethernet is normally low, the remaining buffer space of the receiving terminal tends to shrink over time until the buffer is exhausted and overflows. The receiving terminal's trigger mechanism then cannot cope with the resulting heavy packet loss.

This phenomenon is common across publish-subscribe systems. For example, "ZeroMQ: Cloud-Era Ultra-Fast Messaging Library" (Publishing House of Electronics Industry, 2015) describes ZeroMQ's publish/subscribe mode in detail. Suppose a publisher and a subscriber establish a connection and high-frequency data is then distributed in that mode. The traditional data distribution model cannot effectively control the buffer size. It therefore cannot prevent the heavy packet loss caused by memory overflow under the trigger mechanism.
ZeroMQ uses an edge-triggered mechanism at its bottom layer, which has a limitation. When a read or write event occurs on a monitored file descriptor, epoll_wait() notifies the handler to read or write. If the handler does not read or write all of the data in that pass (for example, because the read/write buffer is too small), the next call to epoll_wait() will not notify it again. It stays silent until a new read or write event occurs on that file descriptor.

The subscriber must also run its business process after receiving each message. When that process is costly, it blocks the recv thread in the subscriber process, and the subscriber resumes receiving the publisher's messages only after the blocking ends. Consequently, if several messages arrive while the subscriber's business thread is blocked, the trigger fires only once. Only the messages covered by that trigger are read and the rest are discarded, which is how buffer overflow causes packet loss during high-frequency data distribution.

When the publishing terminal distributes large data at high frequency, the buffer fills quickly and overflows. The packet loss rate varies with data size: the larger the data, the higher the loss rate.
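The overflow mechanism described above can be illustrated with a deliberately simplified model. The sketch is an assumption and does not model epoll or ZeroMQ internals; ZeroMQ's high-water marks, for instance, count messages rather than bytes. It works as follows:
- A hypothetical receiver has a fixed byte budget.
- The publisher pushes a fixed number of messages per tick, and any message that does not fit is dropped.
- The receiver then frees a fixed number of bytes before the next tick.

The function name `simulate_loss` and all default parameters are illustrative.

```python
def simulate_loss(msg_size: int, n_ticks: int = 100, msgs_per_tick: int = 10,
                  buffer_bytes: int = 64_000, drain_per_tick: int = 40_000) -> float:
    """Toy model of a receive buffer that is filled faster than it is drained.

    Returns the fraction of messages dropped because they did not fit.
    """
    used = sent = dropped = 0
    for _ in range(n_ticks):
        # Publisher burst: each message either fits in the buffer or is lost.
        for _ in range(msgs_per_tick):
            sent += 1
            if used + msg_size > buffer_bytes:
                dropped += 1
            else:
                used += msg_size
        # Receiver frees at most drain_per_tick bytes before the next burst.
        used = max(0, used - drain_per_tick)
    return dropped / sent


for size in (1000, 3000, 6000, 12000):
    print(size, round(simulate_loss(size), 3))
```

In this toy model, 1000- and 3000-byte messages are never dropped, while 6000- and 12000-byte messages lose a growing fraction. This mirrors the observation that larger messages raise the loss rate. Shrinking each message (for example by compressing it) moves the traffic back under the drain rate.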
Disclosure of Invention
The technical problem addressed by the present invention is that prior-art message middleware easily loses data packets when distributing data in publish-subscribe mode.
The invention solves this technical problem through the following technical means: a data distribution method based on a lossless compression algorithm. The method is applied to a publish-subscribe system composed of a publishing terminal and a plurality of subscribing terminals, and comprises the following steps:
S1: initializing both the publishing terminal and the subscribing terminals;
S2: connecting the publishing terminal and the subscribing terminals to the intermediate proxy, which connects the publishing and subscribing terminals that share the same topic;
S3: the publishing terminal caches data packets destined for the intermediate proxy according to the requests of the subscribing terminals. When the number of cached data packets reaches the set number of messages to send, the cached packets undergo data compression processing to form an encapsulated message, which is then published to the intermediate proxy. The data compression processing comprises:
(a) if two pieces of content in the data are identical, replacing the later piece with a pair of values: the distance between the two pieces and the length of the identical content;
(b) treating each value of a preset bit length in the data as a symbol and re-encoding the symbols according to their frequency in the data, so that the more often a symbol occurs, the fewer bits its Huffman code uses;
S4: the intermediate proxy receives the encapsulated message published by the publishing terminal, decompresses it, and sends the decompressed message to the subscribing terminals.
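Steps S1 to S4 can be sketched end to end in memory. This is a hedged illustration under several assumptions:
- There is no network.
- JSON is the serialization format.
- Python's `zlib` stands in for the unspecified compressor. It implements DEFLATE, which combines LZ77-style back-references with Huffman coding, the two techniques named in S3.

The classes `Broker` and `Publisher` and the `batch_size` parameter (the "set number of messages to send") are hypothetical. Following S4, the proxy decompresses each batch before forwarding it.

```python
import json
import zlib
from collections import defaultdict


class Broker:
    """Hypothetical intermediate proxy: decompresses batches (S4), forwards by topic (S2)."""

    def __init__(self):
        self.inboxes = defaultdict(list)  # topic -> list of subscriber inboxes

    def subscribe(self, topic: str) -> list:
        inbox = []
        self.inboxes[topic].append(inbox)
        return inbox

    def publish(self, topic: str, payload: bytes) -> None:
        # S4: decompress, deserialize, then forward to every subscriber of the topic.
        messages = json.loads(zlib.decompress(payload).decode("utf-8"))
        for inbox in self.inboxes[topic]:
            inbox.extend(messages)


class Publisher:
    """Caches messages and publishes one compressed batch per `batch_size` messages (S3)."""

    def __init__(self, broker: Broker, topic: str, batch_size: int):
        self.broker, self.topic, self.batch_size = broker, topic, batch_size
        self.pending = []

    def send(self, message) -> None:
        self.pending.append(message)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        raw = json.dumps(self.pending).encode("utf-8")       # serialize the cached batch
        self.broker.publish(self.topic, zlib.compress(raw))  # DEFLATE: LZ77 + Huffman
        self.pending = []
```

Batching trades latency for fewer, better-compressed frames. Messages still waiting in `pending` are delivered only when the threshold is reached or `flush()` is called.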
The data of each publishing terminal is serialized into a binary file by the data compression processing and then published to the intermediate proxy. The proxy decompresses the binary file and publishes it to the subscribing terminals. Introducing a lossless compression algorithm into the conventional message middleware data distribution process means the transmitted data is serialized and losslessly compressed. This greatly reduces the memory consumed by the subscribing terminal (the receiving terminal) while it receives data. It thereby addresses the heavy packet loss that occurs during high-frequency data distribution when the receiving terminal's buffer is gradually exhausted until it overflows.
Further, the initialization in step S1 includes the following:
- the publishing terminal and the subscribing terminals each extract their own port number and IP address and send them to the intermediate proxy;
- after receiving the port numbers and IP addresses from the publishing terminal and the subscribing terminals, the intermediate proxy sends a feedback message to each of them.
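The initialization handshake (S1) and topic matching (S2) can be sketched as proxy-side bookkeeping. `Endpoint`, `ProxyRegistry`, the feedback message layout and the port check are all hypothetical. The text only states that IP addresses and port numbers are sent and that a feedback message is returned.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Endpoint:
    role: str   # "pub" or "sub"
    ip: str
    port: int
    topic: str


@dataclass
class ProxyRegistry:
    """Hypothetical intermediate-proxy state for steps S1 and S2."""
    endpoints: list = field(default_factory=list)

    def register(self, ep: Endpoint) -> dict:
        # S1: record the sender's IP address and port, then return a feedback message.
        if not 0 < ep.port < 65536:
            raise ValueError(f"invalid port {ep.port}")
        self.endpoints.append(ep)
        return {"ack": True, "role": ep.role, "addr": f"{ep.ip}:{ep.port}"}

    def pairs(self, topic: str) -> list:
        # S2: pair every publisher with every subscriber of the same topic.
        pubs = [e for e in self.endpoints if e.role == "pub" and e.topic == topic]
        subs = [e for e in self.endpoints if e.role == "sub" and e.topic == topic]
        return [(p, s) for p in pubs for s in subs]
```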
Still further, step S3 includes the following steps:
S31: initializing the cached data packets and adding file header information to them as a whole;
S32: performing data compression processing on the cached data packets to which the file header information has been added;
S33: checking all data after the data compression processing. If an encoding error occurred during compression, the method returns to S31; if the encoding is correct, the encoding result, i.e., the encapsulated message, is output.
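The following is a minimal sketch of S31 to S33. It assumes a hypothetical header layout: a 4-byte tag, the original length and a CRC-32 of the batch. The check in S33 is done by decompressing the output and comparing it with the input. The text mentions a Zlib fault-tolerance mode without giving details, so this verification method is an assumption. `encapsulate`, `unpack`, `MAGIC` and `max_attempts` are illustrative names.

```python
import struct
import zlib

MAGIC = b"PKZ1"                  # hypothetical 4-byte header tag
HEADER = struct.Struct("!4sII")  # tag, original length, CRC-32 of the original data


def encapsulate(batch: bytes, max_attempts: int = 3) -> bytes:
    """S31-S33 sketch: add a header, compress, verify, retry on failure."""
    for _ in range(max_attempts):
        # S31: header describing the cached packets as a whole
        header = HEADER.pack(MAGIC, len(batch), zlib.crc32(batch))
        # S32: lossless compression of header + data
        body = zlib.compress(header + batch)
        # S33: verify by decompressing; only a faithful encoding is output
        if zlib.decompress(body) == header + batch:
            return body
    raise RuntimeError("compression check failed; giving up")


def unpack(body: bytes) -> bytes:
    """Decompress and use the header to validate the packet."""
    raw = zlib.decompress(body)
    magic, length, crc = HEADER.unpack_from(raw)
    data = raw[HEADER.size:]
    if magic != MAGIC or len(data) != length or zlib.crc32(data) != crc:
        raise ValueError("corrupt packet")
    return data
```

With a deterministic compressor such as zlib, the retry branch should rarely if ever be taken. It is kept to mirror the return-to-S31 path.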
Furthermore, a start-bit marker is uniformly appended after (or prepended before) the encoding result of each symbol. The marker differs from the encoding results of all symbols.
Further, step S32 is preceded by a step of establishing a Huffman dictionary.
Further, in step S33, the data is Huffman-encoded according to the Huffman coding rules set in the Huffman dictionary, so as to complete the data compression processing.
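One reading of the pre-built Huffman dictionary is a byte-to-code table computed once from representative data and reused for later batches. This avoids rebuilding the tree for every message. This interpretation is an assumption. The sketch builds a standard Huffman code with `heapq`. For readability it encodes to a string of '0'/'1' characters rather than packed bits. `build_huffman_dictionary` and `encode` are hypothetical names. Bytes that never appear in the sample cannot be encoded with the resulting table.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_dictionary(sample: bytes) -> dict:
    """Map each byte value seen in `sample` to a prefix-free bit string."""
    freq = Counter(sample)
    if not freq:
        return {}
    if len(freq) == 1:                   # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    tiebreak = count()                   # keeps heap entries comparable
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prefix their codes with 0 / 1.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(data: bytes, dictionary: dict) -> str:
    """Encode with a previously built dictionary (raises KeyError on unseen bytes)."""
    return "".join(dictionary[b] for b in data)


table = build_huffman_dictionary(b"aaaabbc")  # built once...
print(encode(b"abba", table))                 # ...reused for later messages
```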
Further, the decompression in step S4 proceeds as follows:
the intermediate proxy receives the encapsulated message published by the publishing terminal, which is in binary file format. The proxy constructs a Huffman tree table from the encapsulated message. Using that table, it searches the binary file for matches, copies them and replaces them with the corresponding symbols to complete decompression. The decompressed data is then deserialized and returned to the subscribing terminals.
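The decompression in S4 can be sketched as follows: rebuild the code table from the message, match codes, replace them with symbols, then deserialize. The packet layout is a hypothetical choice: a dict carrying the code table alongside a '0'/'1' bit string. JSON is assumed as the serialization format, which is why table keys arrive as strings and are converted back to byte values. Decoding relies on the code being prefix-free, so the first accumulated bit pattern that matches a table entry is always a complete symbol. `decode_bits` and `unpack_message` are illustrative names.

```python
import json


def decode_bits(bits: str, table: dict) -> bytes:
    """Decode a prefix-free bit string using a {byte_value: code} table."""
    reverse = {code: sym for sym, code in table.items()}  # code -> symbol lookup
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:  # prefix-free, so the first match is the symbol
            out.append(reverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return bytes(out)


def unpack_message(packet: dict):
    """Hypothetical packet layout: {"table": {"<byte>": "<code>", ...}, "bits": "..."}."""
    table = {int(sym): code for sym, code in packet["table"].items()}
    return json.loads(decode_bits(packet["bits"], table).decode("utf-8"))
```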
The invention also provides a data distribution device based on a lossless compression algorithm. The device is applied to a publish-subscribe system consisting of a publishing terminal and a plurality of subscribing terminals, and comprises:
an initialization module, configured to initialize both the publishing terminal and the subscribing terminals;
a connection module, configured to connect the publishing terminal and the subscribing terminals to the intermediate proxy, which connects the publishing and subscribing terminals that share the same topic;
a data compression module, configured to have the publishing terminal cache data packets destined for the intermediate proxy according to the requests of the subscribing terminals. When the number of cached data packets reaches the set number of messages to send, the cached packets undergo data compression processing to form an encapsulated message, which is then published to the intermediate proxy. The data compression processing comprises:
(a) if two pieces of content in the data are identical, replacing the later piece with a pair of values: the distance between the two pieces and the length of the identical content;
(b) treating each value of a preset bit length in the data as a symbol and re-encoding the symbols according to their frequency in the data, so that the more often a symbol occurs, the fewer bits its Huffman code uses;
and a data decompression output module, configured to have the intermediate proxy receive the encapsulated message published by the publishing terminal, decompress it, and send the decompressed message to the subscribing terminals.
Further, the initialization performed by the initialization module includes the following:
- the publishing terminal and the subscribing terminals each extract their own port number and IP address and send them to the intermediate proxy;
- after receiving the port numbers and IP addresses from the publishing terminal and the subscribing terminals, the intermediate proxy sends a feedback message to each of them.
Still further, the data compression module further comprises:
an initialization unit, configured to initialize the cached data packets and add file header information to them as a whole;
a data compression unit, configured to perform data compression processing on the cached data packets to which the file header information has been added;
a checking unit, configured to check all data after the data compression processing. If an encoding error occurred during compression, execution returns to the initialization unit; if the encoding is correct, the encoding result, i.e., the encapsulated message, is output.
Furthermore, a start-bit marker is uniformly appended after (or prepended before) the encoding result of each symbol. The marker differs from the encoding results of all symbols.
Furthermore, a Huffman dictionary is established before the data compression unit is executed.
Furthermore, the checking unit Huffman-encodes the data according to the Huffman coding rules set in the established Huffman dictionary, so as to complete the data compression processing.
Further, the decompression in the data decompression output module proceeds as follows:
the intermediate proxy receives the encapsulated message published by the publishing terminal, which is in binary file format. The proxy constructs a Huffman tree table from the encapsulated message. Using that table, it searches the binary file for matches, copies them and replaces them with the corresponding symbols to complete decompression. The decompressed data is then deserialized and returned to the subscribing terminals.
The advantages of the invention are as follows:
(1) The data of each publishing terminal is serialized into a binary file by the data compression processing and published to the intermediate proxy, which decompresses it and publishes it to the subscribing terminals. Introducing a lossless compression algorithm into the conventional message middleware data distribution process means the transmitted data is serialized and losslessly compressed. This greatly reduces the memory consumed by the subscribing terminal (the receiving terminal) while it receives data. It also addresses the heavy packet loss that occurs during high-frequency data distribution when the receiving terminal's buffer is gradually exhausted until it overflows.
(2) The invention introduces the lossless compression algorithm into the distribution process: data is compressed before it is sent into the transmission channel, which improves bandwidth utilization during high-frequency data distribution.
(3) If two pieces of content in the data are identical, the later piece is replaced by a pair of values: the distance between the two pieces and the length of the identical content. Because this pair is smaller than the content it replaces, the file is compressed.
(4) The Huffman coding treats each value of a preset bit length in the data as a symbol and re-encodes the symbols according to how often they occur in the data. Frequently occurring symbols are represented with fewer bits and rarely occurring symbols with more bits. Some parts of the file thus become shorter and others longer. Because the reduction outweighs the increase, the file is further compressed.
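Advantage (3) describes LZ77-style substitution of repeated content by a (distance, length) pair. The greedy, brute-force sketch below is illustrative only. Real DEFLATE implementations find matches with hash chains (as the detailed description mentions) and cap match lengths at 258 bytes; neither is modelled here. `lz77_tokens`, `lz77_decode`, `window` and `min_len` are hypothetical names.

```python
def lz77_tokens(data: bytes, window: int = 4096, min_len: int = 3) -> list:
    """Greedy LZ77: emit literal byte values or (distance, length) back-references."""
    i, out = 0, []
    while i < len(data):
        best_len = best_dist = 0
        # Brute-force search of the sliding window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            # Overlapping matches are allowed (length may exceed i - j).
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            out.append((best_dist, best_len))  # replace the repeat by (distance, length)
            i += best_len
        else:
            out.append(data[i])  # literal byte
            i += 1
    return out


def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):  # byte-by-byte copy handles overlapping matches
                out.append(out[-dist])
        else:
            out.append(tok)
    return bytes(out)


print(lz77_tokens(b"abcabcabcabc"))  # [97, 98, 99, (3, 9)]
```

A match may overlap the bytes it produces. For example, the pair (distance 1, length 4) after a single 'a' expands into five 'a's, which is why the decoder copies one byte at a time.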
Drawings
FIG. 1 is a timing diagram of a conventional publish-subscribe pattern;
FIG. 2 is a flow chart of a data distribution method based on a lossless compression algorithm according to an embodiment of the present invention;
FIG. 3 is a data distribution timing chart of a data distribution method based on a lossless compression algorithm according to an embodiment of the present invention;
FIG. 4 is a flowchart of a publisher in a data distribution method based on a lossless compression algorithm according to an embodiment of the present invention;
FIG. 5 is a flowchart of the subscriber in the data distribution method based on the lossless compression algorithm according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the embodiments herein without creative effort shall fall within the protection scope of the present invention.
Example 1
The flow of the existing data distribution method is shown in FIG. 1. A publish-subscribe system consists of a publishing terminal and a plurality of subscribing terminals, and the flow is as follows:
the publishing terminal and the subscribing terminals each send their port number and IP address to the intermediate proxy;
after receiving the port numbers and IP addresses from the publishing terminal and the subscribing terminals, the intermediate proxy sends a feedback message to each of them, indicating that its port number and IP address have been received;
the publishing terminal and the subscribing terminals are connected through the intermediate proxy, which connects the publishing and subscribing terminals that share the same topic;
when a message is published, the publishing terminal sends the message for the topic requested by a subscribing terminal to the intermediate proxy, which forwards it to that subscribing terminal.
The publish-subscribe system communicates asynchronously over PUB-SUB sockets:
- the publishing terminal publishes data on a PUB socket, and the data is fanned out to every subscribing terminal;
- each subscribing terminal receives data on a SUB socket and must first set a filter for the message types it subscribes to, i.e., identify the data content it needs;
- a subscribing terminal that sets no subscription receives no messages at all;
- each subscribing terminal receives messages in a loop, and the publishing terminal repeatedly calls zmq_send() to publish messages as required.
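The subscription filtering described above can be modelled without a network. ZeroMQ's subscription semantics are as follows:
- A SUB socket matches subscriptions as byte prefixes of each message.
- An empty subscription matches everything.
- A SUB socket with no subscriptions receives nothing.

In pyzmq a subscription is set with `sock.setsockopt(zmq.SUBSCRIBE, b"temp")`. The classes below are a hypothetical stand-in that reproduces only this filtering rule, not ZeroMQ's transport or queueing. `ToySub` and `fan_out` are illustrative names.

```python
class ToySub:
    """Models ZeroMQ subscription semantics: deliver a message only if it starts
    with one of the subscribed prefixes; with no subscriptions, deliver nothing."""

    def __init__(self):
        self.prefixes = []
        self.inbox = []

    def subscribe(self, prefix: bytes) -> None:
        self.prefixes.append(prefix)

    def deliver(self, msg: bytes) -> None:
        # b"" is a prefix of every message, so it subscribes to everything.
        if any(msg.startswith(p) for p in self.prefixes):
            self.inbox.append(msg)


def fan_out(msg: bytes, subscribers) -> None:
    """PUB side: every published message is offered to every connected subscriber."""
    for sub in subscribers:
        sub.deliver(msg)
```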
The labels in FIG. 1 mean the following:
- publish: the publishing terminal; Middleware: the intermediate proxy; subscribe: the subscribing terminal;
- zmq.socket(zmq.PUB): the publishing terminal sends its port number and IP address to the intermediate proxy;
- zmq.socket(zmq.SUB): the subscribing terminal sends its port number and IP address to the intermediate proxy;
- return: the intermediate proxy feeds back a receipt signal to the publishing or subscribing terminal;
- zmq.bind(): the publishing terminal is bound to (i.e., connected with) the intermediate proxy;
- zmq.connect(): the subscribing terminal is connected to the intermediate proxy;
- zmq.send: the publishing terminal sends the message requested by the subscribing terminal to the intermediate proxy;
- zmq.recv(): the subscribing terminal receives a message published by the publishing terminal.
However, the prior-art data distribution method described above has the following problem. When messages are sent and received at high frequency between the publishing terminal and the subscribing terminals, a subscribing terminal can read the data in its buffer only by invoking a system interrupt. Its buffer memory is therefore released more slowly than messages are sent. Considering also that the bandwidth utilization of message traffic over Ethernet is normally low, the remaining buffer space of the subscribing terminal tends to shrink over time until the buffer is exhausted and overflows. The subscribing terminal then cannot use its own trigger mechanism to cope with the resulting heavy packet loss.
As shown in FIG. 3, prior-art data distribution easily loses data packets. To address this, the present invention provides a data distribution method based on a lossless compression algorithm, which adds compression and decompression of the data to the existing distribution flow to reduce the packet loss rate. The method is applied to a publish-subscribe system consisting of a publishing terminal and a plurality of subscribing terminals, and comprises:
S1: initializing both the publishing terminal and the subscribing terminals. During initialization, each terminal extracts its port number and IP address and sends them to the intermediate proxy. After receiving the port numbers and IP addresses from the publishing terminal and the subscribing terminals, the intermediate proxy sends a feedback message to each of them. The labels in FIG. 3 mean the following:
- self_init_(): both the publishing and subscribing terminals are initialized;
- compress: compression; decompress: decompression;
- zmq.pub: the publishing terminal; zmq.sub: the subscribing terminal;
- package.duration(): the result of compressing the cached packets;
- return package.loads(): returning the decompressed result.
S2: connecting the publishing terminal and the subscribing terminals to the intermediate proxy, which connects the publishing and subscribing terminals that share the same topic;
S3: the publishing terminal caches data packets destined for the intermediate proxy according to the requests of the subscribing terminals. When the number of cached data packets reaches the set number of messages to send, the cached packets undergo data compression processing to form an encapsulated message, which is then published to the intermediate proxy. The data compression processing comprises:
(a) if two pieces of content in the data are identical, replacing the later piece with a pair of values: the distance between the two pieces and the length of the identical content;
(b) treating each value of a preset bit length in the data as a symbol and re-encoding the symbols according to their frequency in the data, so that the more often a symbol occurs, the fewer bits its Huffman code uses;
A lossless compression algorithm exploits the statistical redundancy of data and can fully recover the original data without any distortion. It works in two parts:
- a hash algorithm generates and continuously updates a compression dictionary for the current block;
- the dictionary is looked up to find data matches, which are stored in the buffer together with counts of how often they occur.
The original data is thus described by unmatched bytes, match lengths and match distances. After the data in the processing window has been handled, its matches are encoded with both dynamic and static Huffman coding, and whichever compresses better is selected for the encoded output. The flow is shown in FIG. 2. There, the LZ77 algorithm is the lossless compression algorithm described in step S3, and the Zlib fault-tolerance mode is the checking method described in step S33; both are prior art and are not described further here. The specific compression process is as follows:
S31: initializing the cached data packets and adding file header information to them as a whole, so that the message packet can be identified and unpacked according to that header during decompression;
S32: performing data compression processing on the cached data packets to which the file header information has been added. The processing is as follows.

If two pieces of content in the data are identical, the later piece is replaced by a pair of values: the distance between the two pieces and the length of the identical content. Because this pair is smaller than the content it replaces, the file is compressed.

Huffman coding treats each value of a preset bit length in the data as a symbol, encodes the symbols according to their frequency in the file, builds a Huffman tree and outputs the codes. For example, the 256 possible 8-bit values, i.e., the 256 values of a byte, are treated as symbols. The symbols are re-encoded according to their frequency in the data: the more often a symbol occurs, the fewer bits its Huffman code uses. Very frequent symbols are thus represented with fewer bits and rare symbols with more bits.

For example, suppose symbol A occurs 100 times, B 200 times, C 300 times and D 400 times. D may then be represented with one bit, C with two bits, B with three bits and A with four bits. Alternatively, D may be represented by the one-bit code 0 and C by the one-bit code 1, which differs from D's code. The specific code lengths are adjusted to suit the actual application.

During encoding, a start-bit marker is uniformly appended after (or prepended before) the encoding result of each symbol. The marker differs from the encoding results of all symbols, so the symbols can be distinguished easily during decoding. Some parts of the file thus take fewer bits and others more, and because the reduction is larger than the increase, the file is compressed.

Before the cached data packets with the file header information are compressed, a Huffman dictionary can be generated and updated. The data is then Huffman-encoded according to the coding rules established in that dictionary, so that the Huffman tree, and hence the compression, is completed more quickly.
S33: checking all data after the data compression processing. If an encoding error occurred during compression, the method returns to step S31 to compress again; if the encoding is correct, the encoding result, i.e., the encapsulated message, is output.
S4: the intermediate proxy receives the encapsulated message published by the publishing terminal, decompresses it and sends the decompressed message to the subscribing terminals. Specifically, after receiving the compressed data, the intermediate proxy reads the binary file and constructs a Huffman tree table. Using that table, it searches the binary file for matches and replaces them with the corresponding symbols to complete decompression. It then deserializes the decompressed data and returns it to the subscribers.

Because the lossless compression algorithm compresses the distributed data before it enters the transmission channel, the data distribution model makes better use of bandwidth during high-frequency distribution. This also addresses the memory overflow caused by the subscriber's buffer filling quickly while it receives data.
Fig. 3 is a timing diagram of the data distribution service improved by the method of the present invention. Because the improved method adds the lossless compression algorithm, the publishing end calls a send_compress() interface to send data, and the subscriber calls a recv_compress() interface to return the data received on its subscription-port Socket. Figs. 4 and 5 are flow charts of the modified publisher and subscriber.
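The patent names send_compress() and recv_compress() but gives neither their signatures nor their wire format. As an illustration only, the sketch below assumes each call carries one zlib-compressed message behind a 4-byte big-endian length prefix; the parameters, the framing, and the helper `_recv_exact` are assumptions.

```python
import socket
import struct
import zlib

_LEN = struct.Struct("!I")  # assumed framing: 4-byte big-endian length prefix


def send_compress(sock: socket.socket, data: bytes) -> None:
    """Compress `data` and send it as one length-prefixed frame."""
    body = zlib.compress(data)
    sock.sendall(_LEN.pack(len(body)) + body)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed mid-frame")
        buf += chunk
    return bytes(buf)


def recv_compress(sock: socket.socket) -> bytes:
    """Receive one frame and return the decompressed payload."""
    (length,) = _LEN.unpack(_recv_exact(sock, _LEN.size))
    return zlib.decompress(_recv_exact(sock, length))
```

A length prefix is used because a stream Socket does not preserve message boundaries; recv_compress() therefore reads exactly one frame before decompressing it.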
With this technical solution, the data of each publishing terminal is serialized into a binary file, losslessly compressed, and then delivered to the subscribing terminals. Introducing the lossless compression algorithm into the conventional message-middleware data distribution process means that the transmitted data are serialized and losslessly compressed, which greatly reduces the memory consumed by the subscribing (receiving) terminal while it receives data. This addresses the heavy packet loss that high-frequency data distribution tends to cause when the receiving terminal's buffer is gradually exhausted and overflows.
Example 2
Based on Embodiment 1, Embodiment 2 of the present invention further provides a data distribution apparatus based on a lossless compression algorithm. The apparatus is applied to a publish-subscribe system consisting of a publishing terminal and a plurality of subscribing terminals, and includes:
an initialization module, configured to initialize both the publishing terminal and the subscribing terminals;
a connection module, configured to connect the publishing terminal and the subscribing terminals to the intermediate proxy, where the intermediate proxy connects publishing and subscribing terminals that share the same topic;
a data compression module, configured so that the publishing terminal buffers data packets according to the requests of the subscribing terminals and caches them for the intermediate proxy; when the number of buffered packets reaches the set message sending number, the buffered packets are compressed to form the encapsulated message, which is then published to the intermediate proxy. The compression process is as follows: if two pieces of content in the data are identical, the later one is replaced by a pair consisting of the distance between the two pieces and the length of the identical content; each value of a preset bit length in the data is treated as a symbol and recoded according to how often it occurs in the data, so that the more often a symbol occurs, the fewer bits its Huffman code uses;
and a data decompression output module, configured so that the intermediate proxy receives the encapsulated message published by the publishing terminal, decompresses it, and sends the decompressed message to the subscribing terminals.
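The buffering rule in the data compression module (cache packets until the set message sending number is reached, then compress the whole batch) can be sketched as follows. `PacketBatcher` and `send_count` are hypothetical names, and Python's standard zlib module stands in for the compression step.

```python
import zlib
from typing import List, Optional


class PacketBatcher:
    """Caches packets and emits one compressed message per `send_count` packets."""

    def __init__(self, send_count: int) -> None:
        if send_count < 1:
            raise ValueError("send_count must be at least 1")
        self.send_count = send_count  # the "set message sending number"
        self.pending: List[bytes] = []

    def add(self, packet: bytes) -> Optional[bytes]:
        """Cache one packet; return a compressed batch once the count is reached."""
        self.pending.append(packet)
        if len(self.pending) < self.send_count:
            return None  # keep caching
        batch, self.pending = self.pending, []
        return zlib.compress(b"".join(batch))
```

This sketch simply concatenates the cached packets; the file header information described for the compression step is what would let the receiver split them apart again.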
Specifically, the initialization performed by the initialization module includes: the publishing terminal and the subscribing terminals each extract their own port number and IP address and send them to the intermediate proxy; after receiving the port numbers and IP addresses from the publishing terminal and the subscribing terminals, the intermediate proxy sends a feedback message to each of them.
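The initialization handshake and the topic matching done by the connection module can be modeled with an in-memory stand-in for the intermediate proxy. `Broker`, `Endpoint`, `register()`, `subscribe()` and `route()` are hypothetical names, the "ACK" string is an assumed form of the feedback message, and no networking is involved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int


class Broker:
    """In-memory stand-in for the intermediate proxy (hypothetical API)."""

    def __init__(self) -> None:
        self.registered: set = set()
        self.subscribers: dict = defaultdict(list)  # topic -> [Endpoint, ...]

    def register(self, ip: str, port: int) -> str:
        # Initialization: record the terminal's address, then send feedback.
        self.registered.add(Endpoint(ip, port))
        return f"ACK {ip}:{port}"

    def subscribe(self, topic: str, ip: str, port: int) -> None:
        ep = Endpoint(ip, port)
        if ep not in self.registered:
            raise PermissionError("terminal must register first")
        self.subscribers[topic].append(ep)

    def route(self, topic: str) -> list:
        # Connection: a publisher's topic is matched to that topic's subscribers.
        return list(self.subscribers.get(topic, []))
```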
Specifically, the data compression module further includes:
an initialization unit, configured to initialize the buffered data packets and add file header information to them as a whole;
a data compression unit, configured to compress the buffered data packets to which the file header information has been added;
a checking unit, configured to check all data after compression; if an encoding error occurred during compression, execution returns to the initialization unit; if the encoding is correct, the encoding result, i.e., the encapsulated message, is output.
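The patent identifies its check with the Zlib fault-tolerance mechanism but does not detail it. A minimal sketch of the checking unit, assuming the check is a decode-and-compare round trip using Python's standard zlib module; `compress_with_check` and `max_attempts` are hypothetical names.

```python
import zlib


def compress_with_check(payload: bytes, max_attempts: int = 3) -> bytes:
    """Compress, verify by round trip, and retry on mismatch."""
    expected_crc = zlib.crc32(payload)
    for _ in range(max_attempts):
        encoded = zlib.compress(payload, 6)
        # Check the encoding result by decoding it and comparing checksums.
        try:
            ok = zlib.crc32(zlib.decompress(encoded)) == expected_crc
        except zlib.error:
            ok = False
        if ok:
            return encoded  # the encapsulated message body
    raise RuntimeError("compression check failed after retries")
```

Because zlib compression is deterministic, a retry can only help when the failure was transient (for example, a memory fault); a persistent failure surfaces as the exception.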
More specifically, a start-bit symbol is uniformly added after or before the code of each symbol, and the start-bit symbol differs from the codes of all symbols.
More specifically, a Huffman dictionary is established before the data compression unit is executed.
More specifically, the checking unit Huffman-codes the data according to the Huffman coding rules set in the established Huffman dictionary, thereby completing the data compression.
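The patent leaves the format of the Huffman dictionary open. A minimal sketch, assuming the dictionary is a symbol-to-bit-string table built once from symbol frequencies with the standard Huffman merge procedure and then reused for encoding; `build_huffman_dictionary` and `encode` are hypothetical names. Applied to the frequencies in the worked example of step S32 in Embodiment 3 (A = 100, B = 200, C = 300, D = 400), standard Huffman coding gives D 1 bit, C 2 bits, and A and B 3 bits each: 1,900 bits in total, compared with 2,000 bits for the 1/2/3/4-bit assignment given in that example.

```python
import heapq
from collections.abc import Mapping
from itertools import count


def build_huffman_dictionary(freqs: Mapping) -> dict:
    """Map each symbol to a prefix-free bit string via Huffman's merge procedure."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A lone symbol still needs a non-empty code.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prepend one bit: 0 for the lighter subtree, 1 for the heavier one.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]


def encode(data, table: dict) -> str:
    """Encode a sequence of symbols with a prebuilt dictionary."""
    return "".join(table[s] for s in data)


codes = build_huffman_dictionary({"A": 100, "B": 200, "C": 300, "D": 400})
# codes == {"D": "0", "C": "10", "A": "110", "B": "111"}
```

Huffman codes are prefix-free, so a decoder can separate consecutive codes without any delimiter.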
Specifically, the process of decompressing in the data decompression output module is as follows:
The intermediate proxy receives the encapsulated message published by the publishing terminal. The encapsulated message is in binary file format; a Huffman tree table is constructed from it, matching codes in the binary file are looked up according to the table and replaced with the corresponding symbols to complete decompression, and the decompressed data are deserialized and returned to the subscribing terminal.
Example 3
As shown in Fig. 3, Embodiment 3 of the present invention provides a data distribution method applied to the publishing terminal of a publish-subscribe system consisting of one publishing terminal and a plurality of subscribing terminals. The method includes:
S1: The publishing terminal initializes: it extracts its own port number and IP address, sends them to the intermediate proxy, and obtains the intermediate proxy's feedback message.
S2: The publishing terminal establishes a connection with the subscribing terminals through the intermediate proxy;
S3: The publishing terminal buffers data packets according to the subscribing terminals' requests forwarded by the intermediate proxy and caches them for the intermediate proxy. When the number of buffered packets reaches the set message sending number, the buffered packets are compressed into a binary file to form the encapsulated message, which is then published to the intermediate proxy;
Lossless compression algorithms exploit the statistical redundancy of data and can recover the original data completely, without distortion. On the one hand, a compression dictionary for the current block is generated and continuously updated using a hash algorithm; on the other hand, matches found by looking up the dictionary are stored in a buffer and their frequencies are counted. The original data are thereby described by unmatched bytes, match lengths, and match distances. After the data in the processing window have been processed, their matches are encoded with both dynamic and static Huffman coding, and whichever gives the better compression is selected for the encoded output. The flow chart is shown in Fig. 2. The LZ77 algorithm in Fig. 2 is the lossless compression algorithm described in step S3 of the present invention, and the Zlib fault-tolerance mechanism is the checking method described in step S33; both are prior art and are not described further here. The compression process is as follows:
S31: Initialize the buffered data packets and add file header information (header information) to them as a whole, so that the message packets can be identified and unpacked according to the header during decompression;
S32: Compress the buffered data packets to which the file header information has been added. The process is as follows. If two pieces of content in the data are identical, the later one is replaced by a pair of values: the distance between the two pieces and the length of the identical content. Because this pair is smaller than the content it replaces, the file is compressed. Huffman coding then treats each value of a preset bit length in the data as a symbol, codes the symbols according to how often they occur in the file, builds a Huffman tree, and outputs the codes. For example, the 256 possible 8-bit values, i.e., the 256 values of a byte, are treated as symbols. The symbols are recoded according to their frequency in the data so that the more often a symbol occurs, the fewer bits its code uses: very frequent symbols are represented with fewer bits and rare symbols with more. For example, if symbol A occurs 100 times, B 200 times, C 300 times, and D 400 times, D may be represented with 1 bit, C with 2 bits, B with 3 bits, and A with 4 bits; alternatively, D may be represented by the 1-bit code 0 and C by the different 1-bit code 1. The exact code lengths are adjusted according to the actual application.
During encoding, a start-bit symbol is added uniformly before or after the code of each symbol. Because the start-bit symbol differs from the code of every symbol, each symbol can be told apart during decoding. As a result, some parts of the file are represented with fewer bits and others with more; because the savings outweigh the additions, the file as a whole is compressed.
Before the buffered data packets (with file header information added) are compressed, a Huffman dictionary can be generated and updated. The data are then Huffman-coded according to the coding rules stored in this dictionary, so that the Huffman tree is built, and the compression completed, more quickly.
S33: Check all data after compression. If an encoding error occurred during compression, return to S31 and compress again; if no encoding error occurred, output the encoding result, i.e., the encapsulated message.
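Embodiment 3 describes finding repeated content through a hash-built compression dictionary and replacing it with (distance, length) pairs. The following is a minimal greedy LZ77 sketch under stated assumptions: the dictionary is a Python dict keyed on 3-byte prefixes that remembers only the most recent position (zlib itself uses hash chains and lazy matching), and the 3-byte minimum match, 258-byte maximum match and 32 KiB window are borrowed from Deflate. `lz77_tokens` and the token tuples are hypothetical.

```python
MIN_MATCH = 3       # shortest match worth replacing (as in Deflate)
MAX_MATCH = 258     # longest match length (Deflate's limit)
WINDOW = 32 * 1024  # how far back a match may point (Deflate's window)


def lz77_tokens(data: bytes) -> list:
    """Greedy LZ77 parse: literals plus (distance, length) back-references."""
    head = {}  # hash dictionary: 3-byte prefix -> latest position seen
    tokens = []
    i, n = 0, len(data)
    while i < n:
        key = data[i:i + MIN_MATCH]
        full_key = len(key) == MIN_MATCH
        cand = head.get(key) if full_key else None
        length = 0
        if cand is not None and i - cand <= WINDOW:
            # Extend the match; it may overlap the current position.
            while (length < MAX_MATCH and i + length < n
                   and data[cand + length] == data[i + length]):
                length += 1
        if full_key:
            head[key] = i  # keep the dictionary updated with the newest position
        if length >= MIN_MATCH:
            tokens.append(("match", i - cand, length))
            # Index positions covered by the match so later data can refer to them.
            for j in range(i + 1, min(i + length, n - MIN_MATCH + 1)):
                head[data[j:j + MIN_MATCH]] = j
            i += length
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens
```

For `b"abcabcabc"` this yields three literals followed by `("match", 3, 6)`: the second and third "abc" are replaced by one pair pointing 3 bytes back.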
Example 4
As shown in Fig. 3, Embodiment 4 of the present invention provides a data distribution method applied to the intermediate proxy of a publish-subscribe system consisting of a publishing terminal and a plurality of subscribing terminals. The method includes:
S1: After receiving the port numbers and IP addresses sent by the publishing terminal and the subscribing terminals, the intermediate proxy sends a feedback message to each of them.
S2: The intermediate proxy establishes a connection with the publishing terminal and with each subscribing terminal, and connects publishing and subscribing terminals that share the same topic;
S3: The intermediate proxy receives the encapsulated message sent by the publishing terminal;
S4: The intermediate proxy decompresses the encapsulated message and sends the result to the subscribing terminals. Specifically, the encapsulated message is a binary file. After receiving it, the intermediate proxy reads the binary file, constructs a Huffman tree table, looks up matching codes in the binary file according to the table, and replaces each with the corresponding symbol. The symbols obtained are deserialized into values of the preset bit length, which together form the complete data. Wherever the data contain a pair giving the distance between two pieces of content and the length of the identical content, the position of the identical content is located using the distance, and that content replaces the pair, completing decompression. The decompressed data still include the file header information, which is used to separate the individual data packets before they are sent to the subscribing terminals.
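The second half of S4, replacing each (distance, length) pair with the content it points back to, can be sketched as below. The token format `("lit", byte)` / `("match", distance, length)` is a hypothetical representation, not the patent's binary format, and `lz77_expand` is a hypothetical name.

```python
def lz77_expand(tokens: list) -> bytes:
    """Rebuild data from ("lit", byte) and ("match", distance, length) tokens."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, distance, length = tok
            if not 0 < distance <= len(out):
                raise ValueError("distance points before the start of the data")
            start = len(out) - distance
            # Copy byte by byte: length may exceed distance (overlapping copy).
            for k in range(length):
                out.append(out[start + k])
    return bytes(out)
```

Copying byte by byte matters when the length exceeds the distance: a pair (1, 3) after a single `a` expands to `aaa`, repeating bytes that the copy itself has just written.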
Example 5
As shown in Fig. 3, Embodiment 5 of the present invention provides a data distribution method applied to a subscribing terminal of a publish-subscribe system consisting of a publishing terminal and a plurality of subscribing terminals. The method includes:
S1: The subscribing terminal initializes: it extracts its own port number and IP address, sends them to the intermediate proxy, and obtains the intermediate proxy's feedback message;
S2: The subscribing terminal connects to the intermediate proxy and requests data on a specified topic;
S3: The subscribing terminal receives the data packets sent by the intermediate proxy.
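The file header information added in step S31 is what lets the proxy separate individual data packets after decompression (step S4 of Embodiment 4) before they reach the subscribing terminal. The patent does not specify the header layout; the sketch below assumes a hypothetical one: a 4-byte magic value, a 16-bit packet count, and a 32-bit length per packet, all big-endian. `MAGIC`, `add_header` and `split_packets` are hypothetical names.

```python
import struct

MAGIC = b"PKT1"               # hypothetical format marker
_HDR = struct.Struct("!4sH")  # magic value, packet count
_LEN = struct.Struct("!I")    # length of one packet


def add_header(packets: list) -> bytes:
    """Prefix a batch with header information so it can be split again."""
    parts = [_HDR.pack(MAGIC, len(packets))]
    parts += [_LEN.pack(len(p)) for p in packets]
    return b"".join(parts) + b"".join(packets)


def split_packets(blob: bytes) -> list:
    """Use the header to recover the individual packets from a batch."""
    magic, n_packets = _HDR.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("unknown header")
    offset = _HDR.size
    lengths = [_LEN.unpack_from(blob, offset + k * _LEN.size)[0]
               for k in range(n_packets)]
    offset += n_packets * _LEN.size
    packets = []
    for size in lengths:
        packets.append(blob[offset:offset + size])
        offset += size
    return packets
```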
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A data distribution method based on a lossless compression algorithm, characterized in that the method is applied to a publish-subscribe system consisting of a publishing terminal and a plurality of subscribing terminals, and comprises the following steps:
S1: initializing both the publishing terminal and the subscribing terminals;
S2: connecting the publishing terminal and the subscribing terminals to an intermediate proxy, the intermediate proxy connecting publishing and subscribing terminals that share the same topic;
S3: the publishing terminal buffering data packets according to the requests of the subscribing terminals and caching them for the intermediate proxy; when the number of buffered packets reaches the set message sending number, compressing the buffered packets to form an encapsulated message and publishing the encapsulated message to the intermediate proxy; wherein the compression comprises: if two pieces of content in the data are identical, replacing the later piece with a pair consisting of the distance between the two pieces and the length of the identical content; and treating each value of a preset bit length in the data as a symbol and recoding the symbols according to their frequency in the data, the more often a symbol occurs, the fewer bits its Huffman code uses;
S4: the intermediate proxy receiving the encapsulated message published by the publishing terminal, decompressing it, and sending the decompressed message to the subscribing terminals.
2. The lossless compression algorithm-based data distribution method according to claim 1, wherein the initialization in step S1 includes: the publishing terminal and the subscribing terminals each extracting their own port number and IP address and sending them to the intermediate proxy; and the intermediate proxy, after receiving the port numbers and IP addresses from the publishing terminal and the subscribing terminals, sending a feedback message to each of them.
3. The lossless compression algorithm-based data distribution method according to claim 2, wherein the step S3 includes the steps of:
S31: initializing the buffered data packets and adding file header information to them as a whole;
S32: compressing the buffered data packets to which the file header information has been added;
S33: checking all data after compression, returning to S31 if an encoding error occurred during compression, and outputting the encoding result, namely the encapsulated message, if the encoding is correct.
4. The method according to claim 3, wherein a start-bit symbol is uniformly added after or before the code of each symbol, and the start-bit symbol differs from the codes of all symbols.
5. The lossless compression algorithm-based data distribution method according to claim 3, wherein step S32 is preceded by a step of establishing a Huffman dictionary.
6. The lossless compression algorithm-based data distribution method according to claim 5, wherein in step S33 the data are Huffman-coded according to the Huffman coding rules set in the Huffman dictionary to complete the data compression.
7. The lossless compression algorithm-based data distribution method according to claim 1, wherein the decompression in step S4 is performed as follows:
the intermediate proxy receives the encapsulated message published by the publishing terminal, the encapsulated message being in binary file format; a Huffman tree table is constructed from it; matching codes in the binary file are looked up according to the table and replaced with the corresponding symbols to complete decompression; and the decompressed data are deserialized and returned to the subscribing terminal.
8. A data distribution apparatus based on a lossless compression algorithm, characterized in that the apparatus is applied to a publish-subscribe system consisting of a publishing terminal and a plurality of subscribing terminals, and comprises:
an initialization module, configured to initialize both the publishing terminal and the subscribing terminals;
a connection module, configured to connect the publishing terminal and the subscribing terminals to an intermediate proxy, the intermediate proxy connecting publishing and subscribing terminals that share the same topic;
a data compression module, configured so that the publishing terminal buffers data packets according to the requests of the subscribing terminals and caches them for the intermediate proxy; when the number of buffered packets reaches the set message sending number, the buffered packets are compressed to form an encapsulated message, which is then published to the intermediate proxy; wherein the compression comprises: if two pieces of content in the data are identical, replacing the later piece with a pair consisting of the distance between the two pieces and the length of the identical content; and treating each value of a preset bit length in the data as a symbol and recoding the symbols according to their frequency in the data, the more often a symbol occurs, the fewer bits its Huffman code uses;
and a data decompression output module, configured so that the intermediate proxy receives the encapsulated message published by the publishing terminal, decompresses it, and sends the decompressed message to the subscribing terminals.
9. The lossless compression algorithm-based data distribution apparatus according to claim 8, wherein the initialization in the initialization module includes: the publishing terminal and the subscribing terminals each extracting their own port number and IP address and sending them to the intermediate proxy; and the intermediate proxy, after receiving the port numbers and IP addresses from the publishing terminal and the subscribing terminals, sending a feedback message to each of them.
10. The lossless compression algorithm-based data distribution apparatus according to claim 9, wherein the data compression module further comprises:
an initialization unit, configured to initialize the buffered data packets and add file header information to them as a whole;
a data compression unit, configured to compress the buffered data packets to which the file header information has been added;
a checking unit, configured to check all data after compression, return to the initialization unit if an encoding error occurred during compression, and output the encoding result, namely the encapsulated message, if the encoding is correct.
CN202111524920.XA 2021-12-14 2021-12-14 Data distribution method and device based on lossless compression algorithm Withdrawn CN114900555A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111524920.XA CN114900555A (en) 2021-12-14 2021-12-14 Data distribution method and device based on lossless compression algorithm

Publications (1)

Publication Number Publication Date
CN114900555A true CN114900555A (en) 2022-08-12

Family

ID=82714391

Country Status (1)

Country Link
CN (1) CN114900555A (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004006486A2 (en) * 2002-07-08 2004-01-15 Precache, Inc. Packet routing via payload inspection for alert services, for digital content delivery and for quality of service management and caching with selective multicasting in a publish-subscribe network
CN101159710A (en) * 2007-11-06 2008-04-09 中国科学院计算技术研究所 Service combination searching method and system of structure facing to service
CN101848236A (en) * 2010-05-06 2010-09-29 北京邮电大学 Real-time data distribution system with distributed network architecture and working method thereof
CN102143198A (en) * 2010-09-30 2011-08-03 华为技术有限公司 Method, device and system for transferring messages
CN105183299A (en) * 2015-09-30 2015-12-23 珠海许继芝电网自动化有限公司 Human-computer interface service processing system and method
CN105335221A (en) * 2015-10-09 2016-02-17 中国电子科技集团公司第二十九研究所 Reconstructible distributed software bus
CN105472042A (en) * 2016-01-15 2016-04-06 中煤电气有限公司 WEB terminal controlled message middleware system and data transmission method thereof
EP3188442A1 (en) * 2015-12-30 2017-07-05 VeriSign, Inc. Detection, prevention, and/or mitigation of dos attacks in publish/subscribe infrastructure
US20170353424A1 (en) * 2016-06-07 2017-12-07 Machine Zone, Inc. Message compression in scalable messaging system
CN107592117A (en) * 2017-08-15 2018-01-16 深圳前海信息技术有限公司 Compression data block output intent and device based on Deflate
CN107688439A (en) * 2017-08-15 2018-02-13 深圳前海信息技术有限公司 The generation method and device of onrelevant compression blocks based on Deflate
CN111600936A (en) * 2020-04-24 2020-08-28 国电南瑞科技股份有限公司 Asymmetric processing system based on multiple containers and suitable for ubiquitous electric power internet of things edge terminal
US20210176324A1 (en) * 2019-12-10 2021-06-10 Vmware, Inc. Topic-based data routing in a publish-subscribe messaging environment
KR20210073005A (en) * 2019-12-10 2021-06-18 (주)구름네트웍스 A middleware apparatus operating method of data distribution services for providing a efficient message processing
US20210226647A1 (en) * 2015-12-30 2021-07-22 Teraki Gmbh Method and system for obtaining and storing sensor data
CN113778759A (en) * 2021-11-05 2021-12-10 北京泰策科技有限公司 Failure detection and recovery method in data distribution process

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20220812)