CN114900555A - Data distribution method and device based on lossless compression algorithm

Info

Publication number: CN114900555A
Application number: CN202111524920.XA
Authority: CN
Inventors: 夏科睿; 彭超; 马姓; 涂凡凡; 姬鹏鹏
Original assignee: Hefei Hagong Xuanyuan Intelligent Technology Co ltd
Current assignee: Hefei Hagong Xuanyuan Intelligent Technology Co ltd
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-08-12

Abstract

The invention discloses a data distribution method and device based on a lossless compression algorithm. The method is applied to a publish-subscribe system consisting of one publishing terminal and a plurality of subscribing terminals, and comprises the following steps: initializing both the publishing terminal and the subscribing terminals; connecting the publishing terminal and the subscribing terminals to an intermediate proxy, which links the publishing terminal and the subscribing terminals that share the same topic; at the publishing terminal, serializing the cached data packets into a binary file through the lossless compression algorithm to form an encapsulated message, and publishing the encapsulated message to the intermediate proxy; and, at the intermediate proxy, receiving the encapsulated message published by the publishing terminal, decompressing it, and sending the decompressed message to the subscribing terminals. The advantage of the invention is that data packet loss during data distribution is reduced.

Description

Data distribution method and device based on lossless compression algorithm

Technical Field

The invention relates to the field of data transmission, in particular to a data distribution method and device based on a lossless compression algorithm.

Background

Currently, message middleware is evolving from proxy-based to proxy-less designs and toward low-latency, high-throughput, data-intensive communication. With the continuous improvement of data distribution services that provide publish-subscribe functionality, it is expected to become one of the mainstream middleware types of the cloud era.

During data distribution, when messages are sent and received at high frequency between a publishing terminal and a subscribing terminal, the receiving terminal can read the data in its buffer only by invoking a system interrupt, so the buffer memory of the receiving terminal is released more slowly than messages arrive. Considering also that the bandwidth utilization of message traffic on Ethernet is normally low, the remaining cache space of the receiving terminal gradually shrinks over time until the cache is exhausted and overflows, and the receiving terminal cannot use its own trigger mechanism to deal with the resulting heavy packet loss. This phenomenon is widespread in publish-subscribe systems. For example, the book "ZeroMQ: An Ultra-Fast Messaging Library for the Cloud Era" (Publishing House of Electronics Industry, 2015) describes the publish/subscribe pattern of that system in detail; when a publisher and a subscriber establish a connection and high-frequency data is then distributed using the ZeroMQ publish/subscribe pattern, the traditional data distribution model can neither control the buffer size effectively nor avoid the heavy packet loss caused by memory overflow under the trigger mechanism.
The underlying layer of ZeroMQ uses an edge-triggered mechanism: when a read or write event occurs on a monitored file descriptor, epoll_wait() notifies the handler to read or write. If, however, the read/write buffer is too small and the data has not been completely read or written when the edge trigger fires, the next call to epoll_wait() does not notify the handler again; it does so only when a second read or write event occurs on the file descriptor. Because the subscriber must run its business logic after receiving a message, the recv thread in the subscriber process blocks when that processing is time-consuming, and resumes listening for publisher messages only after the blocking ends. Consequently, if several messages arrive while the subscriber's business thread is blocked, the trigger fires only once; only the messages actually triggered are read and the untriggered messages are discarded, which results in packet loss from buffer overflow during high-frequency data distribution. When the publishing terminal distributes large data at high frequency, the cache fills quickly and overflows; the packet loss rate varies with the data size, and the larger the data, the higher the packet loss rate.

Disclosure of Invention

The technical problem to be solved by the present invention is that message middleware in the prior art is prone to data packet loss when distributing data in publish-subscribe mode.

The invention solves the above technical problem by the following technical means: a data distribution method based on a lossless compression algorithm, applied to a publish-subscribe system composed of one publishing terminal and a plurality of subscribing terminals, the method comprising the following steps:

S1: initializing both the publishing terminal and the subscribing terminals;

S2: connecting the publishing terminal and the subscribing terminals to an intermediate proxy, the intermediate proxy linking the publishing terminal and the subscribing terminals that share the same topic;

S3: the publishing terminal caches data packets according to the requests of the subscribing terminals forwarded by the intermediate proxy; when the number of cached data packets reaches the set message sending number, the cached data packets are compressed to form an encapsulated message, which is then published to the intermediate proxy. The data compression processing comprises: if two pieces of content in the data are identical, replacing the later piece with a pair of values, namely the distance between the two pieces and the length of the identical content; and treating each value of a preset bit length in the data as a symbol and re-encoding the symbols according to their frequency of occurrence in the data, the number of occurrences of a symbol being inversely proportional to the number of data bits of its Huffman code;

S4: the intermediate proxy receives the encapsulated message published by the publishing terminal, decompresses it, and sends the decompressed message to the subscribing terminals.

The data of each publishing terminal is serialized into a binary file through data compression processing and then published to the intermediate proxy, which decompresses the binary file and delivers it to the subscribing terminals. By introducing a lossless compression algorithm into the data distribution process of traditional message middleware, the transmitted data is serialized and losslessly compressed, which greatly reduces the memory consumed by the subscribing terminal (the receiving terminal) while receiving data and thereby better addresses the heavy packet loss that arises in high-frequency data distribution when the receiving terminal's cache is gradually exhausted until it overflows.
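The batching and compression flow of steps S3 and S4 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: pickle stands in for the unspecified serialization, zlib (a Deflate implementation that combines LZ77 matching with Huffman coding) stands in for the compression step, and the names BatchingPublisher, proxy_forward and BATCH_SIZE are hypothetical and do not appear in the original disclosure.

```python
import pickle
import zlib

BATCH_SIZE = 3  # hypothetical value for the "set message sending number"


class BatchingPublisher:
    """Caches packets and emits one compressed blob per full batch (step S3)."""

    def __init__(self, batch_size=BATCH_SIZE):
        self.batch_size = batch_size
        self.cache = []

    def publish(self, packet):
        """Cache a packet; return an encapsulated message once the batch is full."""
        self.cache.append(packet)
        if len(self.cache) < self.batch_size:
            return None  # keep caching, nothing is sent yet
        blob = zlib.compress(pickle.dumps(self.cache))  # serialize, then compress
        self.cache = []
        return blob


def proxy_forward(blob):
    """Intermediate proxy side (step S4): decompress, then deserialize."""
    return pickle.loads(zlib.decompress(blob))


publisher = BatchingPublisher()
packets = [{"seq": i, "payload": "sensor-reading " * 20} for i in range(3)]
blobs = [publisher.publish(p) for p in packets]
encapsulated = blobs[-1]           # only the third call produces a message
restored = proxy_forward(encapsulated)
```

In this sketch the first two calls return None because the batch is not yet full, and the restored list equals the original packets, which is what "lossless" means here.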

Further, the initialization process in step S1 includes: the publishing terminal and the subscribing terminal each extract their own port number and IP address and send them to the intermediate proxy; after receiving the port numbers and IP addresses sent by the publishing terminal and the subscribing terminal, the intermediate proxy sends a feedback message to each of them.

Still further, step S3 includes the following steps:

S31: initializing the cached data packets and adding file header information to the cached data packets as a whole;

S32: performing data compression processing on the cached data packets to which the file header information has been added;

S33: checking all the data after the data compression processing; if an encoding error occurred during compression, returning to step S31; if the encoding is correct, outputting the encoding result, that is, the encapsulated message.
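Steps S31 to S33 can be sketched as a header, compress, and check loop. The following Python sketch is an assumption-laden illustration rather than the disclosed implementation: the header layout (a 4-byte tag, the payload length, and a CRC-32), the names MAGIC, HEADER, MAX_ATTEMPTS, encapsulate and verify, and the retry limit are all hypothetical; zlib is used for compression because the description refers to Zlib for the check in S33.

```python
import struct
import zlib

MAGIC = b"PKT1"                   # hypothetical 4-byte header tag
HEADER = struct.Struct(">4sII")   # tag, payload length, CRC-32 of the payload
MAX_ATTEMPTS = 3                  # hypothetical retry limit for the S33 -> S31 loop


def verify(message: bytes, payload: bytes) -> bool:
    """S33: decompress the message and check header fields against the payload."""
    try:
        raw = zlib.decompress(message)
    except zlib.error:
        return False
    magic, length, crc = HEADER.unpack_from(raw)
    body = raw[HEADER.size:]
    return (magic == MAGIC and length == len(body)
            and crc == zlib.crc32(body) and body == payload)


def encapsulate(payload: bytes) -> bytes:
    """Run S31 (add header), S32 (compress), S33 (check), retrying on failure."""
    for _ in range(MAX_ATTEMPTS):
        header = HEADER.pack(MAGIC, len(payload), zlib.crc32(payload))  # S31
        message = zlib.compress(header + payload)                       # S32
        if verify(message, payload):                                    # S33
            return message
    raise ValueError("compression check failed after retries")


payload = b"abc" * 100
message = encapsulate(payload)
corrupted = message[:-4] + b"\x00\x00\x00\x00"  # damage the zlib checksum trailer
```

Here verify(message, payload) succeeds, while the corrupted copy is rejected because zlib detects the damaged checksum during decompression.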

Furthermore, a start bit flag is uniformly added after or before the encoding result of each symbol, and the start bit flag differs from the encoding results of all the symbols.

Further, a step of establishing a Huffman dictionary precedes step S32.

Further, in step S33, the data is Huffman encoded according to the Huffman encoding rules set in the Huffman dictionary, so as to complete the data compression processing.

Further, the decompression process in step S4 is as follows:

The intermediate proxy receives the encapsulated message published by the publishing terminal, the encapsulated message being in binary file format. A Huffman tree table is constructed from the binary file; according to this table, the binary file is searched, and each matched code is copied and replaced by its corresponding symbol, which completes decompression. The decompressed data is then deserialized and returned to the subscribing terminals.
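The table-driven replacement of codes by symbols can be illustrated with a small Python sketch. It assumes a prefix-free code table is already available; the table CODE_TABLE and the function name decode are hypothetical, and bits are modelled as a string of "0" and "1" characters for readability rather than as packed binary data.

```python
def decode(bits: str, code_table: dict) -> str:
    """Replace each matched code in the bit string by its symbol."""
    lookup = {code: symbol for symbol, code in code_table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:       # a complete code has been matched
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)


# Hypothetical prefix-free table: no code is a prefix of another.
CODE_TABLE = {"D": "0", "C": "10", "B": "110", "A": "111"}
encoded = "".join(CODE_TABLE[s] for s in "DCBAD")
decoded = decode(encoded, CODE_TABLE)
```

Because the table is prefix-free, the decoder never has to look ahead: the first code that matches is the only one that can match.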

The invention also provides a data distribution device based on a lossless compression algorithm, applied to a publish-subscribe system composed of one publishing terminal and a plurality of subscribing terminals, the device comprising:

an initialization module for initializing both the publishing terminal and the subscribing terminals;

a connection module for connecting the publishing terminal and the subscribing terminals to an intermediate proxy, the intermediate proxy linking the publishing terminal and the subscribing terminals that share the same topic;

a data compression module by which the publishing terminal caches data packets according to the requests of the subscribing terminals forwarded by the intermediate proxy and, when the number of cached data packets reaches the set message sending number, compresses the cached data packets to form an encapsulated message and publishes it to the intermediate proxy; the data compression processing comprises: if two pieces of content in the data are identical, replacing the later piece with a pair of values, namely the distance between the two pieces and the length of the identical content; and treating each value of a preset bit length in the data as a symbol and re-encoding the symbols according to their frequency of occurrence in the data, the number of occurrences of a symbol being inversely proportional to the number of data bits of its Huffman code;

and a data decompression output module by which the intermediate proxy receives the encapsulated message published by the publishing terminal, decompresses it, and sends the decompressed message to the subscribing terminals.

Further, the initialization process in the initialization module includes: the publishing terminal and the subscribing terminal each extract their own port number and IP address and send them to the intermediate proxy; after receiving the port numbers and IP addresses sent by the publishing terminal and the subscribing terminal, the intermediate proxy sends a feedback message to each of them.

Still further, the data compression module further comprises:

an initialization unit for initializing the cached data packets and adding file header information to the cached data packets as a whole;

a data compression unit for performing data compression processing on the cached data packets to which the file header information has been added;

a checking unit for checking all the data after the data compression processing; if an encoding error occurred during compression, control returns to the initialization unit; if the encoding is correct, the encoding result, that is, the encapsulated message, is output.

Furthermore, a Huffman dictionary is established before the data compression unit is executed.

Furthermore, the checking unit performs Huffman coding on the data according to the Huffman coding rules set in the established Huffman dictionary, so as to complete the data compression processing.

Further, the decompression process in the data decompression output module is the same as the decompression process of step S4 described above.

The invention has the advantages that:

(1) The data of each publishing terminal is serialized into a binary file through data compression processing and then published to the intermediate proxy, which decompresses the binary file and delivers it to the subscribing terminals. By introducing a lossless compression algorithm into the data distribution process of traditional message middleware, the transmitted data is serialized and losslessly compressed, which greatly reduces the memory consumed by the subscribing terminal (the receiving terminal) while receiving data and thereby better addresses the heavy packet loss that arises in high-frequency data distribution when the receiving terminal's cache is gradually exhausted until it overflows.

(2) The invention introduces the lossless compression algorithm into the distribution process, so that data is compressed before it enters the transmission channel, which improves bandwidth utilization during high-frequency data distribution.

(3) If two pieces of content in the data are identical, the later piece is replaced by a pair of values: the distance between the two pieces and the length of the identical content. Because this (distance, length) pair is smaller than the content it replaces, the file is compressed.

(4) The Huffman coding of the invention treats each value of a preset bit length in the data as a symbol and re-encodes the symbols according to their frequency of occurrence in the data, representing frequently occurring symbols with fewer data bits and rarely occurring symbols with more data bits. The bit count of some parts of the file therefore decreases and that of other parts increases; because the reduction outweighs the increase, the file is further compressed.
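The (distance, length) substitution described in advantage (3) can be illustrated with a deliberately naive LZ77-style sketch in Python. It is not the implementation used by the invention: the function names lz77_encode and lz77_decode, the window size, and the minimum match length are hypothetical choices, and the brute-force search is written for clarity rather than speed.

```python
def lz77_encode(data: bytes, window: int = 255, min_match: int = 3):
    """Emit literal byte values and (distance, length) pairs for repeats."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):   # candidate earlier positions
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_match:
            tokens.append((best_dist, best_len))  # replace the repeated content
            i += best_len
        else:
            tokens.append(data[i])                # unmatched byte stays literal
            i += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Rebuild the data by copying from the already decoded output."""
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):               # byte-wise copy allows overlap
                out.append(out[-distance])
        else:
            out.append(token)
    return bytes(out)


data = b"abcabcabcabc"
tokens = lz77_encode(data)
restored = lz77_decode(tokens)
```

For this input the twelve bytes reduce to three literals followed by one pair, and decoding restores the original exactly.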

Drawings

FIG. 1 is a timing diagram of a conventional publish-subscribe pattern;

FIG. 2 is a flow chart of a data distribution method based on a lossless compression algorithm according to an embodiment of the present invention;

FIG. 3 is a data distribution timing chart of a data distribution method based on a lossless compression algorithm according to an embodiment of the present invention;

FIG. 4 is a flowchart of a publisher in a data distribution method based on a lossless compression algorithm according to an embodiment of the present invention;

FIG. 5 is a flowchart of a subscriber in the data distribution method based on the lossless compression algorithm according to the embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

The process of the existing data distribution method is shown in FIG. 1. A publish-subscribe system is composed of one publishing terminal and a plurality of subscribing terminals.

The publishing terminal and the subscribing terminal each send their own port number and IP address to the intermediate proxy.

After receiving the port numbers and IP addresses sent by the publishing terminal and the subscribing terminal, the intermediate proxy sends a feedback message to each of them to indicate that the port number and IP address have been received.

The publishing terminal and the subscribing terminal are connected through the intermediate proxy, which links the publishing terminal and the subscribing terminals that share the same topic.

When a message is published, the publishing terminal sends the message for the topic requested by the subscribing terminal to the intermediate proxy, which forwards it to that subscribing terminal. The publish-subscribe system uses PUB-SUB sockets for asynchronous communication: the publishing terminal publishes data with a PUB socket, and the data is sent to every subscribing terminal in a fan-out manner; the subscribing terminal receives data with a SUB socket. Before receiving data, a subscribing terminal must filter the message types it subscribes to, that is, it must identify the data content it needs; a subscribing terminal that has not set its subscription content receives no messages. Each subscribing terminal receives messages in a wait loop, and the publishing terminal calls the zmq_send() method to publish messages continuously as required.
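The fan-out and subscription filtering described above can be modelled without any networking. The following Python sketch is a pure in-memory illustration, not ZeroMQ code: the class Broker and its methods are hypothetical, and topics are matched by prefix, which is how ZeroMQ SUB sockets filter messages.

```python
class Broker:
    """In-memory stand-in for the intermediate proxy."""

    def __init__(self):
        self.subscriptions = {}   # subscriber name -> list of topic prefixes
        self.inboxes = {}         # subscriber name -> received (topic, message)

    def connect(self, name):
        """Register a subscriber; it has no subscriptions yet."""
        self.subscriptions.setdefault(name, [])
        self.inboxes.setdefault(name, [])

    def subscribe(self, name, prefix):
        self.subscriptions[name].append(prefix)

    def publish(self, topic, message):
        """Fan the message out to every subscriber whose filter matches."""
        for name, prefixes in self.subscriptions.items():
            if any(topic.startswith(p) for p in prefixes):
                self.inboxes[name].append((topic, message))


broker = Broker()
for name in ("s1", "s2", "s3"):
    broker.connect(name)
broker.subscribe("s1", "robot/")      # matches every robot topic
broker.subscribe("s2", "robot/arm")   # matches the arm topic only
# s3 sets no subscription content and therefore receives nothing

broker.publish("robot/arm", "joint angles")
broker.publish("robot/base", "odometry")
```

The subscriber s3 illustrates the point made in the text: without a subscription filter, no message is delivered.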

In FIG. 1, publish denotes the publishing terminal, Middleware denotes the intermediate proxy, and subscribe denotes the subscribing terminal. zmp.socket(zmp.pub) indicates that the publishing terminal sends its port number and IP address to the intermediate proxy, and zmp.socket(zmp.sub) indicates that the subscribing terminal sends its port number and IP address to the intermediate proxy; return indicates that the intermediate proxy acknowledges receipt to the publishing terminal or the subscribing terminal. zmp.bind() indicates that the publishing terminal binds to, that is, connects with, the intermediate proxy, and zmp.connect() indicates that the subscribing terminal connects to the intermediate proxy. zmp.send indicates that the publishing terminal sends the message requested by the subscribing terminal to the intermediate proxy, and zmp.recv() indicates that the subscribing terminal receives a message published by the publishing terminal.

However, in the prior-art data distribution method described above, when messages are sent and received at high frequency between the publishing terminal and the subscribing terminal, the subscribing terminal can read the data in its buffer only by invoking a system interrupt, so the buffer memory of the subscribing terminal is released more slowly than messages arrive. Considering also that the bandwidth utilization of message traffic on Ethernet is normally low, the remaining cache space of the subscribing terminal gradually shrinks over time until the cache is exhausted and overflows, and the subscribing terminal cannot use its own trigger mechanism to deal with the resulting heavy packet loss.

As shown in FIG. 3, to address the data packet loss that readily occurs during data distribution in the prior art, the present invention provides a data distribution method based on a lossless compression algorithm, which adds compression and decompression of the data to the existing data distribution process in order to reduce the packet loss rate. The method is applied to a publish-subscribe system composed of one publishing terminal and a plurality of subscribing terminals, and includes:

S1: initializing both the publishing terminal and the subscribing terminals. During initialization, the publishing terminal and the subscribing terminal each extract their own port number and IP address and send them to the intermediate proxy; after receiving them, the intermediate proxy sends a feedback message to each of the publishing terminal and the subscribing terminal. In FIG. 3, self _init _ () indicates that both the publishing terminal and the subscribing terminal are initialized, compress indicates compression, decompress indicates decompression, zmp.pub indicates the publishing terminal, zmp.sub indicates the subscribing terminal, package.duration() indicates the result of compressing the cached data packets, and return package.loads() indicates that the decompressed result is returned.

Lossless compression algorithms exploit the statistical redundancy of data and can fully recover the original data without any distortion. On the one hand, a compression dictionary for the current block is generated and continuously updated through a hash algorithm; on the other hand, the dictionary is looked up, the data matches found are stored in a buffer, and their frequencies are counted. The matching relationships in the original data are thus described by unmatched bytes, match lengths, and match distances. After the data in the processing window has been processed, its matches are encoded with both the dynamic and the static Huffman algorithm, and whichever yields the better compression is selected for the compressed output. The flow chart is shown in FIG. 2. The LZ77 algorithm in FIG. 2 is the lossless compression algorithm described in step S3 of the present invention, and the Zlib fault-tolerance mode is the checking mode described in step S33; both are prior art and are not described further here. The specific compression process is as follows:

S31: initializing the cached data packets and adding file header information, that is, header information, to the cached data packets as a whole, so that the message packet can be identified and unpacked according to the file header information during decompression;

S32: performing data compression processing on the cached data packets to which the file header information has been added. The processing is as follows: if two pieces of content in the data are identical, the later piece is replaced by a pair of values, namely the distance between the two pieces and the length of the identical content. Because this pair is smaller than the content it replaces, the file is compressed. Huffman coding treats each value of a preset bit length in the data as a symbol, Huffman-encodes the symbols according to their frequency of occurrence in the file, builds the Huffman tree, and outputs the codes. For example, the 256 possible values of an 8-bit unit, that is, of one byte, are treated as symbols. The symbols are re-encoded according to their frequency in the data, the number of occurrences of a symbol being inversely proportional to the number of data bits in its Huffman code; that is, symbols that occur very often are represented with fewer data bits and symbols that occur rarely with more. For example, if symbol A occurs 100 times, symbol B 200 times, symbol C 300 times, and symbol D 400 times, then D is represented with one data bit, C with two, B with three, and A with four; alternatively, D is represented by the one-bit code 0 and C by the one-bit code 1, which differs from that of D. The specific code bits are adjusted according to the actual application.
During encoding, a start bit flag is uniformly added after or before the encoding result of each symbol; the start bit flag differs from the encoding results of all the symbols, so that the symbols can be told apart during decoding. In this way the bit count of some parts of the file decreases and that of other parts increases; because the reduction outweighs the increase, the file is compressed.
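The relationship between symbol frequency and code length in the A, B, C, D example can be checked with the standard Huffman construction. The Python sketch below is illustrative only; the function name huffman_code_lengths is hypothetical, and it computes code lengths rather than the codes themselves. Note that the standard construction assigns three bits to both A and B, giving a total of 1900 bits, whereas the 1, 2, 3, 4 bit assignment given in the text totals 2000 bits.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs: dict) -> dict:
    """Return the Huffman code length, in bits, for each symbol."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: 0}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, d1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, d2 = heapq.heappop(heap)
        # Merging pushes every symbol in both subtrees one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


freqs = {"A": 100, "B": 200, "C": 300, "D": 400}
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[s] * lengths[s] for s in freqs)
```

The most frequent symbol D receives the shortest code and the rarest symbols A and B the longest, which is the inverse relationship between frequency and code length that the text describes.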

Before the data compression processing is performed on the cached data packets to which the file header information has been added, a Huffman dictionary can be generated and updated, and the data is Huffman encoded according to the encoding rules established by the dictionary, so that the Huffman tree is built, and data compression completed, more quickly.

S33: checking all the data after the data compression processing; if an encoding error occurred during compression, returning to step S31 to compress again; if the encoding is correct, outputting the encoding result, that is, the encapsulated message.

S4: the intermediate proxy receives the encapsulated message published by the publishing terminal, decompresses it, and sends the decompressed message to the subscribing terminals. The specific process is as follows: after receiving the compressed data, the intermediate proxy reads the binary file and constructs a Huffman tree table; according to this table, it searches the binary file, copies each matched code and replaces it with the corresponding symbol, which completes decompression; it then deserializes the decompressed data and returns the deserialized data to the subscriber. By introducing a lossless compression algorithm, the distributed data is compressed before it enters the transmission channel, which improves bandwidth utilization when the data distribution model distributes data at high frequency and mitigates the memory overflow caused by the cache filling quickly when the subscriber receives data.

FIG. 3 is a timing diagram of the data distribution service as improved by the method of the present invention. Because the lossless compression algorithm is added in the improved data distribution method, the publishing terminal calls the send_compress() interface to send data during distribution, and the subscribing terminal calls the recv_compress() interface to return the data received by the subscription socket. FIG. 4 and FIG. 5 are flow charts of the improved publishing terminal and subscribing terminal.
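A possible shape for the send_compress() and recv_compress() interfaces is sketched below in Python. The original text names these interfaces but does not give their signatures, so the parameters shown here are assumptions. FakeSocket is a hypothetical in-memory stand-in for a real PUB/SUB socket pair, pickle is an assumed serializer, and zlib is an assumed compressor.

```python
import pickle
import zlib
from collections import deque


class FakeSocket:
    """Hypothetical stand-in for a socket: a simple in-memory FIFO of frames."""

    def __init__(self):
        self._frames = deque()

    def send(self, frame: bytes) -> None:
        self._frames.append(frame)

    def recv(self) -> bytes:
        return self._frames.popleft()


def send_compress(socket, obj) -> None:
    """Serialize the object, compress it losslessly, and send one frame."""
    socket.send(zlib.compress(pickle.dumps(obj)))


def recv_compress(socket):
    """Receive one frame, decompress it, and deserialize the object."""
    return pickle.loads(zlib.decompress(socket.recv()))


sock = FakeSocket()
message = {"topic": "status", "values": list(range(100))}
send_compress(sock, message)
received = recv_compress(sock)
```

Wrapping compression inside the send and receive calls keeps the application code unchanged apart from the function names, which appears to be the intent of the improved timing diagram.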

With the above technical solution, the data of each publishing terminal is serialized into a binary file through the lossless compression algorithm and then delivered to the subscribing terminals. By introducing a lossless compression algorithm into the data distribution process of traditional message middleware, the transmitted data is serialized and losslessly compressed, which greatly reduces the memory consumed by the subscribing terminal (the receiving terminal) while receiving data and thereby better addresses the heavy packet loss that arises in high-frequency data distribution when the receiving terminal's cache is gradually exhausted until it overflows.

Example 2

Based on embodiment 1 of the present invention, embodiment 2 of the present invention further provides a data distribution apparatus based on a lossless compression algorithm, where the apparatus is applied to a publish-subscribe system, the publish-subscribe system is composed of a publishing terminal and a plurality of subscribing terminals, and the apparatus includes:

Specifically, the initialization process in the initialization module includes: the publishing terminal and the subscribing terminal each extract their own port number and IP address and send them to the intermediate proxy; after receiving the port numbers and IP addresses sent by the publishing terminal and the subscribing terminal, the intermediate proxy sends a feedback message to each of them.

Specifically, the data compression module further includes:

More specifically, a start bit flag is uniformly added after or before the encoding result of each symbol, and the start bit flag differs from the encoding results of all the symbols.

More specifically, a Huffman dictionary is established before the data compression unit is executed.

More specifically, the checking unit performs Huffman coding on the data according to the Huffman coding rules set in the established Huffman dictionary, thereby completing the data compression processing.

Specifically, the decompression process in the data decompression output module is the same as the decompression process of step S4 described in embodiment 1.

Example 3

As shown in fig. 3, embodiment 3 of the present invention provides a data distribution method, where the method is applied to a publishing terminal of a publish-subscribe system, where the publish-subscribe system is composed of one publishing terminal and a plurality of subscribing terminals, and the method includes:

s1: and initializing the issuing end, wherein the issuing end extracts the port number and the IP address of the issuing end per se in the initialization process and sends the port number and the IP address to the intermediate agent to acquire the feedback message of the intermediate agent.

S2: the publishing terminal establishes connection with the subscribing terminal through the intermediate proxy;

S3: the publishing terminal caches the data packets to be sent to the intermediate proxy according to the request of the subscribing terminal forwarded by the intermediate proxy; when the number of cached data packets reaches the set message sending number, the cached data packets are serialized into a binary file through data compression to complete the encapsulation message, and the encapsulation message is then published to the intermediate proxy;

Lossless compression algorithms exploit the statistical redundancy of data and allow the original data to be fully recovered without any distortion. On the one hand, a compression dictionary for the current block is generated and continuously updated by means of a hash algorithm; on the other hand, the dictionary is looked up, the data matches found are stored in a buffer, and their frequencies are counted. The matching relationships in the original data are thus described by unmatched bytes, match lengths and match distances. After the data in the processing window has been processed, its matches are encoded with both the dynamic and the static Huffman algorithm, and the one giving the better compression is selected for the compressed output. The flow chart is shown in fig. 2; the LZ77 algorithm in fig. 2 is the lossless compression algorithm described in step S3 of the present invention, and the Zlib fault-tolerance scheme is the checking method described in step S33 of the present invention. Both are prior art and are not described further here. The specific compression process is as follows:

S31: the cached data packets are initialized, and file header information (header information) is added to the cached data packets as a whole, so that message packets can be identified and packaged according to the file header information during decompression;

S32: data compression processing is performed on the cached data packets to which the file header information has been added. The processing is as follows: if two pieces of content in the data are identical, the later piece is replaced by a pair of values, namely the distance between the two pieces and the length of the identical content. Because this pair is smaller than the content it replaces, the file is compressed.

Huffman coding treats each value of a preset bit length in the data as a symbol, builds a Huffman tree from the frequency with which each symbol occurs in the file, and outputs the resulting codes. For example, the 256 possible values of an 8-bit byte are taken as the symbols. The symbols are re-encoded according to their frequency in the data: the more often a symbol occurs, the fewer bits its code uses, and the less often it occurs, the more bits its code uses. For example, if symbol A occurs 100 times, symbol B 200 times, symbol C 300 times and symbol D 400 times, then D is represented by one bit, C by two bits, B by three bits and A by four bits; alternatively, D is represented by the one-bit code 0 and C by the one-bit code 1, which differs from that of D. The specific code bits are adjusted according to the actual application.

During encoding, a start bit symbol is uniformly added after or before the coding result of each symbol, and the start bit symbol differs from the coding results of all symbols, so that the individual symbols can be distinguished during decoding. As a result, some parts of the file take fewer bits and others take more; because the reduction is larger than the increase, the file is compressed.

S33: all data produced by the data compression processing is checked; if a coding error occurred during compression, the process returns to S31 and compression is performed again; if no coding error occurred, the coding result, namely the encapsulation message, is output.
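Steps S3 and S31 to S33 can be sketched with Python's standard `zlib` module, whose DEFLATE format combines LZ77 matching with Huffman coding. This is an assumption-laden illustration rather than the patented implementation: the header layout (`MAGIC` tag, packet count, length prefixes), the `BATCH_SIZE` value, the `Publisher` class and the round-trip check used for S33 are all hypothetical choices made here.

```python
import struct
import zlib

MAGIC = b"PKT1"   # hypothetical tag identifying the file header
BATCH_SIZE = 3    # hypothetical "set message sending number"


def encapsulate(packets):
    """Turn a batch of cached packets into one compressed binary message."""
    # S31: add header information to the batch as a whole
    # (tag + packet count), then each packet with a length prefix.
    body = MAGIC + struct.pack(">I", len(packets))
    for packet in packets:
        body += struct.pack(">I", len(packet)) + packet
    # S32: DEFLATE compression (LZ77 matching followed by Huffman coding).
    compressed = zlib.compress(body, 9)
    # S33: check the result; here a round trip stands in for the check.
    # A real system would go back to S31 instead of raising.
    if zlib.decompress(compressed) != body:
        raise ValueError("compression check failed")
    return compressed


class Publisher:
    """Caches packets and publishes once the batch size is reached (S3)."""

    def __init__(self, send):
        self.cache = []
        self.send = send   # callable that delivers bytes to the proxy

    def publish(self, packet: bytes):
        self.cache.append(packet)
        if len(self.cache) >= BATCH_SIZE:
            self.send(encapsulate(self.cache))
            self.cache = []


sent = []                       # stands in for the intermediate proxy
pub = Publisher(sent.append)
for i in range(3):
    pub.publish(b"temperature=21.5;unit=C;seq=%d" % i)
```

Batching before compression is what lets the LZ77 stage find repeated content across packets, which is why the sketch compresses the whole cache at once instead of packet by packet.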

Example 4

As shown in fig. 3, embodiment 4 of the present invention provides a data distribution method, where the method is applied to an intermediate proxy of a publish-subscribe system, where the publish-subscribe system is composed of a publishing terminal and a plurality of subscribing terminals, and the method includes:

S1: after receiving the port number and the IP address sent by the publishing terminal and the port number and the IP address sent by the subscribing terminal, the intermediate proxy sends a feedback message to each of them;

S2: the intermediate proxy establishes connections with the publishing terminal and the subscribing terminal respectively, and connects the publishing terminal and the subscribing terminals that share the same topic;

S3: the intermediate proxy receives the encapsulation message sent by the publishing terminal;

S4: the intermediate proxy decompresses the encapsulation message and sends the decompressed data to the subscribing terminal. The specific process is as follows: the encapsulation message is a binary file; after receiving it, the intermediate proxy reads the binary file and constructs a Huffman tree table; the binary file is searched and matched against the Huffman tree table, and each matched code is replaced by its corresponding symbol, which completes the Huffman decoding; the symbols obtained are deserialized into values of the preset bit length, and all such values together form the complete data; if the data contains a pair of values giving the distance between two pieces of content and the length of the identical content, the position of the identical content is located from the distance and the identical content is substituted for the pair, which completes the decompression; the decompressed data includes the file header information, according to which the different data packets are distinguished and sent to the subscribing terminal.
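The replacement of distance and length pairs described in S4 can be illustrated with a small LZ77-style round trip. The sketch below is a simplified model under stated assumptions: the token format (an integer for an unmatched byte, a `(distance, length)` tuple for a match), the greedy search, and the `window` and `min_len` parameters are choices made here for illustration and do not reproduce the bit-level format used by zlib.

```python
def lz77_encode(data: bytes, window: int = 255, min_len: int = 3):
    """Greedy encoder: emit literal bytes or (distance, length) pairs."""
    tokens, i = [], 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the preceding window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))   # replace repeated content
            i += best_len
        else:
            tokens.append(data[i])                 # unmatched byte
            i += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    """Substitute the referenced content back for every pair."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            start = len(out) - dist
            # Copy byte by byte so that overlapping matches also work.
            for k in range(length):
                out.append(out[start + k])
        else:
            out.append(tok)
    return bytes(out)


tokens = lz77_encode(b"abcabcabc")
restored = lz77_decode(tokens)
```

For the input `b"abcabcabc"` the encoder keeps the first three bytes as literals and describes the remaining six bytes with a single pair (distance 3, length 6), which the decoder expands back to the original data.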

Example 5

As shown in fig. 3, embodiment 5 of the present invention provides a data distribution method, where the method is applied to a subscribing terminal of a publish-subscribe system, where the publish-subscribe system is composed of a publishing terminal and a plurality of subscribing terminals, and the method includes:

S1: the subscribing terminal is initialized; during initialization it extracts its own port number and IP address, sends them to the intermediate proxy, and obtains the feedback message of the intermediate proxy;

S2: the subscribing terminal establishes a connection with the intermediate proxy and requests from the intermediate proxy the data of a specified topic;

S3: the subscribing terminal receives the data packets sent by the intermediate proxy.
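The topic-based delivery in steps S2 and S3 can be sketched as a small in-memory broker. The class name `TopicBroker`, its methods and the topic names are hypothetical and serve only to show how packets reach the subscribing terminals that requested a given topic.

```python
from collections import defaultdict


class TopicBroker:
    """Hypothetical stand-in for the intermediate proxy's topic matching."""

    def __init__(self):
        # topic -> list of delivery callbacks, one per subscribing terminal
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """S2: a subscribing terminal requests data of a specified topic."""
        self.subscribers[topic].append(callback)

    def forward(self, topic, packet):
        """S3: deliver a packet to every terminal subscribed to the topic."""
        targets = self.subscribers.get(topic, [])
        for deliver in targets:
            deliver(packet)
        return len(targets)


broker = TopicBroker()
inbox_a, inbox_b = [], []
broker.subscribe("joint_states", inbox_a.append)
broker.subscribe("camera", inbox_b.append)
delivered = broker.forward("joint_states", b"q=[0.1, 0.2]")
```

Only the terminal subscribed to `joint_states` receives the packet; the terminal subscribed to `camera` is left untouched.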

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. The data distribution method based on the lossless compression algorithm is characterized in that the method is applied to a publishing and subscribing system, the publishing and subscribing system consists of a publishing terminal and a plurality of subscribing terminals, and the method comprises the following steps:

S1: initializing both a publishing terminal and a subscribing terminal;

2. The lossless compression algorithm-based data distribution method according to claim 1, wherein the initialization process in step S1 includes: the publishing terminal and the subscribing terminal respectively extract the port number and the IP address of the publishing terminal and the subscribing terminal and send the port number and the IP address to the intermediate agent; after receiving the port number and the IP address sent by the publishing terminal and the port number and the IP address sent by the subscribing terminal, the intermediate proxy respectively sends a feedback message to the publishing terminal and the subscribing terminal.

3. The lossless compression algorithm-based data distribution method according to claim 2, wherein the step S3 includes the steps of:

4. The method of claim 3, wherein the start bit sign is uniformly added after or before the coding result of each symbol, and the start bit sign is different from the coding result of all symbols.

5. The data distribution method based on lossless compression algorithm of claim 3, wherein the step S32 is preceded by the step of establishing a Huffman dictionary.

6. The data distribution method based on lossless compression algorithm of claim 5, wherein in step S33, data is Huffman coded according to the Huffman coding rules set in the Huffman dictionary to complete the data compression process.

7. The data distribution method based on lossless compression algorithm according to claim 1, wherein the decompression in step S4 is performed by:

8. The data distribution device based on lossless compression algorithm is characterized in that the device is applied to a publishing and subscribing system, the publishing and subscribing system is composed of a publishing terminal and a plurality of subscribing terminals, and the device comprises:

9. The apparatus for data distribution based on lossless compression algorithm according to claim 8, wherein the initialization process in the initialization module includes: the publishing terminal and the subscribing terminal respectively extract the port number and the IP address of the publishing terminal and the subscribing terminal and send the port number and the IP address to the intermediate agent; after receiving the port number and the IP address sent by the publishing terminal and the port number and the IP address sent by the subscribing terminal, the intermediate proxy respectively sends a feedback message to the publishing terminal and the subscribing terminal.

10. The lossless compression algorithm-based data distribution apparatus according to claim 9, wherein the data compression module further comprises: