CN115906809A - Text transmission method and electronic equipment - Google Patents


Publication number
CN115906809A
Authority
CN
China
Prior art keywords
segmentation
text
word segmentation
dictionary
compressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211529772.5A
Other languages
Chinese (zh)
Inventor
文江
江文龙
斯奇能
李原
朱崇凯
白璐
吴梦秋
周明伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202211529772.5A priority Critical patent/CN115906809A/en
Publication of CN115906809A publication Critical patent/CN115906809A/en
Pending legal-status Critical Current

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention discloses a text transmission method and an electronic device. The text transmission method comprises: acquiring a text to be transmitted; performing word segmentation on the text to be transmitted, and acquiring the word frequency of each word segment in the text to be transmitted so as to determine a weight for each word segment; sorting the word segments by weight to obtain a word segment ranking; when the word segment ranking differs from the ranking in the historical word segmentation compression dictionary, updating the historical word segmentation compression dictionary with the new ranking to obtain a current word segmentation compression dictionary; and constructing compressed information for the text to be transmitted based on the current word segmentation compression dictionary so as to send the compressed information. In this way, a text compression dictionary can be built from the word segment weights of the transmitted text and kept up to date, and the compressed information for the text to be transmitted is built from the updated dictionary, which improves the compression ratio and therefore the transmission efficiency.

Description

Text transmission method and electronic equipment
Technical Field
The present application relates to the field of data transmission technologies, and in particular, to a text transmission method and an electronic device.
Background
With the continued development and adoption of cloud-native technology and the continued improvement of big data technology, text transmission now carries a large number of repeated, context-related fields.
In the research and practice of the prior art, as network technologies develop further, more and more data needs to be transmitted over the network. In particular, the large-scale deployment of cloud-native and distributed applications has sharply increased the amount of data transmitted and stored, so the overall network bandwidth overhead is very large.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a text transmission method and an electronic device that can build a text compression dictionary from the word segment weights of the transmitted text, update that dictionary, and construct compressed information for the text to be transmitted based on the updated dictionary, thereby improving the compression ratio and the transmission efficiency.
In order to solve the above technical problem, one technical solution adopted by the present application is to provide a text transmission method applied to a sending end, comprising: acquiring a text to be transmitted; performing word segmentation on the text to be transmitted, and acquiring the word frequency of each word segment in the text to be transmitted so as to determine the weight of each word segment; sorting the word segments by weight to obtain a word segment ranking; when the word segment ranking differs from the ranking in the historical word segmentation compression dictionary, updating the historical word segmentation compression dictionary with the word segment ranking to obtain a current word segmentation compression dictionary; and constructing compressed information for the text to be transmitted based on the current word segmentation compression dictionary so as to send the compressed information.
In an embodiment of the application, sorting the word segments by weight to obtain the word segment ranking includes: acquiring the weight of each word segment; and sorting the word segments whose weights are greater than a preset weight threshold by weight to obtain the sorted word segment ranking.
In an embodiment of the application, after the sorted word segment ranking is obtained, the text transmission method further includes: caching a preset number of top-ranked word segments according to the word segment ranking; and determining a word segmentation compression dictionary identifier for each cached word segment according to the word segment ranking, and constructing an initial word segmentation compression dictionary based on these identifiers.
In an embodiment of the application, updating the historical word segmentation compression dictionary with the word segment ranking, when that ranking differs from the ranking in the historical dictionary, to obtain the current word segmentation compression dictionary includes: when a new word segment ranking is obtained, determining the initial word segmentation compression dictionary as the historical word segmentation compression dictionary; and if the word segmentation compression dictionary identifier of a new word segment differs from the identifiers in the historical dictionary, updating the historical dictionary with the identifier of the new word segment to obtain the current word segmentation compression dictionary.
In an embodiment of the application, after the current word segmentation compression dictionary is obtained, the text transmission method further includes: synchronizing the current word segmentation compression dictionary to a receiving end by incremental update within a preset period, wherein the increment includes newly added word segments and the deletion of word segments whose utilization rate is lower than a preset utilization rate.
In an embodiment of the application, constructing compressed information for the text to be transmitted based on the current word segmentation compression dictionary so as to send the compressed information includes: constructing a transmission format based on a transmission protocol, wherein the transmission format comprises a transmission type, a number of length bytes, a length description, and data content; and encoding each word segment of the text to be transmitted according to the current word segmentation compression dictionary and the transmission format to obtain the compressed information corresponding to the text to be transmitted, so as to send the compressed information.
In order to solve the above technical problem, another technical solution adopted by the present application is to provide a text transmission method applied to a receiving end, comprising: receiving compressed information; decoding the compressed information based on a word segmentation compression dictionary to obtain the word segmentation compression dictionary identifier of each word segment, wherein the word segmentation compression dictionary is updated and synchronized by a sending end according to the word segment ranking of the transmitted text; and acquiring the corresponding actual data based on the identifier of each word segment so as to determine the text content of the transmitted text.
In an embodiment of the application, decoding the compressed information based on the word segmentation compression dictionary to obtain the identifier of each word segment includes: acquiring the word segmentation compression dictionary that the sending end has updated and synchronized according to the word segment ranking of the transmitted text; and decoding the compressed information according to the word segmentation compression dictionary to obtain the identifiers corresponding to the compressed information, together with the transmission type and data content corresponding to each identifier.
In an embodiment of the application, acquiring the corresponding actual data based on the identifier of each word segment so as to determine the text content of the transmitted text includes: determining, based on the transmission type, whether the compressed information contains compressed content; if the transmission type indicates that compressed content exists, decoding the compressed content according to the word segmentation compression dictionary to obtain the actual data corresponding to the transmitted text, and determining that data as the text content of the transmitted text; and if the transmission type indicates that no compressed content exists, directly determining the actual data corresponding to the data content as the text content of the transmitted text.
In order to solve the above technical problem, the present application adopts another technical solution: there is provided an electronic device comprising a memory and a processor coupled to the memory, the memory storing at least one computer program which, when loaded and executed by the processor, is adapted to carry out a text transmission method as described above.
Different from the prior art, the text transmission method provided by the present application comprises: acquiring a text to be transmitted; performing word segmentation on the text to be transmitted, and acquiring the word frequency of each word segment in the text to be transmitted so as to determine the weight of each word segment; sorting the word segments by weight to obtain a word segment ranking; when the word segment ranking differs from the ranking in the historical word segmentation compression dictionary, updating the historical word segmentation compression dictionary with the word segment ranking to obtain a current word segmentation compression dictionary; and constructing compressed information for the text to be transmitted based on the current word segmentation compression dictionary so as to send the compressed information. The word segments are sorted by weight, the historical word segmentation compression dictionary is updated with the current ranking whenever the two rankings differ, and the compressed information for the text to be transmitted is constructed from the updated dictionary. Word-segment compression improves the compression ratio, and dynamically updating the dictionary by word segment weight keeps the compression ratio high throughout the transmission process, which improves transmission efficiency when the compressed information is sent.
Drawings
FIG. 1 is a schematic flow chart diagram of a first embodiment of a text transmission method of the present application;
FIG. 2 is a schematic flow chart of an embodiment of step S12 of the present application;
FIG. 3 is a schematic flowchart of an embodiment of step S13 of the present application;
FIG. 4 is a schematic flow chart of the steps of the present application performed after the sorted word segment ranking is obtained;
FIG. 5 is a schematic flowchart of an embodiment of step S14 of the present application;
FIG. 6 is a flowchart illustrating an embodiment of step S15 of the present application;
FIG. 7 is a flowchart illustrating a second embodiment of the text transmission method of the present application;
FIG. 8 is a flowchart illustrating an embodiment of step S22 of the present application;
FIG. 9 is a flowchart illustrating an embodiment of step S23 of the present application;
FIG. 10 is a schematic block diagram of an embodiment of a text transmission system of the present application;
FIG. 11 is a schematic block diagram of an embodiment of an electronic device of the present application;
FIG. 12 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The invention is described in further detail below with reference to the figures and examples. It is to be noted that the following examples are only illustrative of the present invention, and do not limit the scope of the present invention. Likewise, the following examples are only some but not all examples of the present invention, and all other examples obtained by those skilled in the art without any inventive step are within the scope of the present invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In conventional text transmission methods, improving transmission efficiency requires maintaining a large amount of word segment index information during compressed transmission, which occupies a lot of memory; in addition, a characteristic value has to be generated by compression for every transmission, which occupies a lot of bandwidth.
The text transmission method provided by the present application therefore dynamically updates the word segmentation compression dictionary according to word segment weights, improving the compression ratio and thus the transmission efficiency.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a first embodiment of the text transmission method of the present application. It should be noted that, provided substantially the same result is obtained, the method is not limited to the flow sequence shown in FIG. 1. As shown in FIG. 1, the text transmission method, applied to the sending end, includes the following steps:
S11, acquiring a text to be transmitted;
The text to be transmitted is the text data that needs to be transmitted.
In some embodiments, the text to be transmitted may be txt, doc, hlp, wps, rtf, htm, or pdf text, or the like; a single text or multiple texts may be transmitted at the same time.
Specifically, at the sending end, the text data that needs to be transmitted is determined as the text to be transmitted.
S12, performing word segmentation on the text to be transmitted, and acquiring the word frequency of each word segment in the text to be transmitted so as to determine the weight of each word segment;
Word segmentation is a prerequisite step for word-segment compression: it splits the text content into reasonable units, with words as the unit. The word frequency of a word segment is the number of times it occurs, and the word segment weight is set according to the word frequency and indicates how important the word segment is.
In some embodiments, a word segment may be a single character, a phrase, an idiom, a proverb, a colloquialism, a technical term, or a combination of several of these, and the text to be transmitted can be segmented according to actual requirements so that the split is reasonable.
Specifically, word segmentation is performed on the acquired text to be transmitted to obtain its word segments; the occurrences of each word segment are counted to determine its word frequency; and the weight of each word segment is then obtained from its word frequency.
S13, sorting the word segments by weight to obtain a word segment ranking;
Each word segment has a corresponding weight, and sorting by these weights yields a ranking of the word segments ordered by weight.
Specifically, the word segments are sorted in descending order of the numerical value of their weights, and the position of each word segment in this order gives its ranking result.
S14, when the word segment ranking differs from the ranking in the historical word segmentation compression dictionary, updating the historical word segmentation compression dictionary with the new ranking to obtain a current word segmentation compression dictionary;
The ranking obtained this time is the current word segment ranking, and the historical word segmentation compression dictionary was built from the previously obtained ranking; each time a new ranking is obtained, the existing word segmentation compression dictionary becomes the historical word segmentation compression dictionary.
S15, constructing compressed information for the text to be transmitted based on the current word segmentation compression dictionary, and sending the compressed information.
The compressed information is obtained by encoding the text to be transmitted with the current word segmentation compression dictionary.
Specifically, the word segments of the text to be transmitted are matched against the entries of the current word segmentation compression dictionary. A word segment that matches is transmitted as compressed information according to its dictionary entry; a word segment that cannot be matched is transmitted as it appears in the segmentation result of the text to be transmitted.
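The matching step in S15 can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoding defined by the application: the record layout (one type byte, one byte giving the size of the length field, the length itself, then the data) is only one possible reading of a format made of transmission type, number of length bytes, length description, and data content, and the names `encode`, `decode`, `COMPRESSED`, and `LITERAL` are hypothetical.

```python
COMPRESSED, LITERAL = 1, 0  # hypothetical transmission-type values


def _byte_width(value):
    """Smallest number of bytes (at least 1) that can hold a non-negative int."""
    return max(1, (value.bit_length() + 7) // 8)


def _record(kind, payload):
    """Assumed layout: type | number of length bytes | length | data."""
    length = len(payload)
    width = _byte_width(length)
    return bytes([kind, width]) + length.to_bytes(width, "big") + payload


def encode(segments, dictionary):
    """Replace segments found in the dictionary by their identifier."""
    out = bytearray()
    for seg in segments:
        if seg in dictionary:
            ident = dictionary[seg]
            out += _record(COMPRESSED, ident.to_bytes(_byte_width(ident), "big"))
        else:
            # Unmatched segments are sent as their raw UTF-8 bytes.
            out += _record(LITERAL, seg.encode("utf-8"))
    return bytes(out)


def decode(data, dictionary):
    """Inverse of encode; the receiver holds the same dictionary."""
    by_ident = {ident: seg for seg, ident in dictionary.items()}
    segments, pos = [], 0
    while pos < len(data):
        kind, width = data[pos], data[pos + 1]
        pos += 2
        length = int.from_bytes(data[pos:pos + width], "big")
        pos += width
        payload = data[pos:pos + length]
        pos += length
        if kind == COMPRESSED:
            segments.append(by_ident[int.from_bytes(payload, "big")])
        else:
            segments.append(payload.decode("utf-8"))
    return segments


dictionary = {"device": 0, "status": 1}
segments = ["device", "status", "unknown", "device"]
packet = encode(segments, dictionary)
restored = decode(packet, dictionary)
```

With this layout every record carries three bytes of header, so only segments longer than the header plus identifier actually shrink, which is consistent with filtering out very short word segments before building the dictionary.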
Different from the prior art, the present application determines the weight of each word segment from the word segmentation result of the text to be transmitted, sorts the word segments by weight to obtain a word segment ranking, updates the historical word segmentation compression dictionary when that ranking differs from the ranking in the historical dictionary, and constructs compressed information for the text to be transmitted based on the updated dictionary so as to send it. Because the word segmentation compression dictionary is adjusted dynamically according to the word segment weights, a high compression ratio can be maintained throughout the transmission process, which improves transmission efficiency.
Referring to FIG. 2, FIG. 2 is a flowchart of an embodiment of step S12 of the present application. To save network resources, word segments with few bytes and word segments that occur rarely may be selectively filtered out. As shown in FIG. 2, step S12 includes:
S121, filtering out word segments smaller than a preset number of bytes;
The preset number of bytes is set in order to filter out word segments with few bytes.
In some embodiments, the preset number of bytes may be 1 byte, or may be set according to actual conditions to meet requirements such as filtering out letters, commas, semicolons, periods, and the like.
Specifically, if a word segment has fewer bytes than the preset number, it is filtered out; if it has at least the preset number of bytes, it is kept. For example, among the word segments of the text to be transmitted, those with fewer than 1 byte are filtered out and those with 1 byte or more are retained.
S122, acquiring the word frequency of each remaining word segment from its accumulated number of occurrences;
Each remaining word segment is longer than 1 byte. Its word frequency, that is, how often it occurs, is obtained by accumulating its occurrences.
Specifically, the word segments longer than 1 byte are obtained, their occurrences are accumulated, and the word frequency of each word segment is determined.
S123, determining the weight of each remaining word segment whose word frequency is greater than a preset number of times.
The preset number of times is set in order to filter out word segments that occur rarely, which reduces the cache space of the word segmentation compression dictionary and the network resources it occupies.
In some embodiments, the preset number of times may be 1 or 2, or may be set according to actual conditions.
Specifically, after the word segments smaller than 1 byte have been filtered out of the text to be transmitted and the word frequency of each remaining word segment has been obtained, the word frequency is compared with the preset number of times. If the word frequency is greater than the preset number of times, the weight of the word segment is determined from its word frequency; if the word frequency is less than or equal to the preset number of times, the word segment is filtered out.
For example, set the preset number of times to 1, denote each word segment as Di, and obtain its word frequency Fi from its number of occurrences. If the word frequency is greater than 1, that is, the word segment occurs more than once, its weight Wi is determined from Fi and Di as Wi = Di × (Fi - 1). The term Fi - 1 filters out word segments that occur only once: their weight is reset to 0 and they take no part in the subsequent construction of the word segmentation compression dictionary.
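Steps S121 to S123 can be sketched in a few lines. This is an illustrative sketch, not the application's implementation: it assumes that Di in Wi = Di × (Fi - 1) stands for the UTF-8 byte length of the word segment, which the text does not state explicitly, and the function name `segment_weights` and its parameters are hypothetical.

```python
from collections import Counter


def segment_weights(segments, min_bytes=1, min_count=1):
    """Return {segment: weight} with weight = Di * (Fi - 1).

    Assumption: Di is the UTF-8 byte length of the segment.
    """
    # S121: keep only segments with at least `min_bytes` bytes.
    kept = [s for s in segments if len(s.encode("utf-8")) >= min_bytes]
    # S122: the word frequency Fi is the accumulated occurrence count.
    freq = Counter(kept)
    # S123: only segments occurring more than `min_count` times get a weight.
    return {
        seg: len(seg.encode("utf-8")) * (count - 1)
        for seg, count in freq.items()
        if count > min_count
    }


tokens = ["device", "id", "device", "status", "device", "id", ","]
weights = segment_weights(tokens, min_bytes=2)
print(weights)  # {'device': 12, 'id': 2}
```

In this run the comma is dropped by the byte filter, "status" is dropped because it occurs only once, and "device" gets weight 6 × (3 - 1) = 12.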
Referring to FIG. 3, FIG. 3 is a flowchart of an embodiment of step S13 of the present application. To further save network resources, word segments with small weights may be filtered out according to a preset weight threshold. As shown in FIG. 3, step S13 includes:
S131, acquiring the weight of each word segment;
Here, the weight of each word segment is the one obtained in step S12.
Specifically, once step S12 has determined the weight of each word segment from its word frequency in the text to be transmitted, that weight is acquired.
S132, sorting the word segments whose weights are greater than the preset weight threshold by weight to obtain the sorted word segment ranking.
Each weight has a numerical value. A preset weight threshold is set and each weight is compared with it, so that word segments with small weights are filtered out and fewer network resources are occupied.
Specifically, after the weight of each word segment is acquired, the word segments whose weights are smaller than the preset weight threshold are filtered out, and the remaining word segments are then sorted by weight.
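The threshold filter and sort of steps S131 and S132 might look like the sketch below. The function name `rank_segments` is hypothetical, and the alphabetical tie-break is an assumption added only to make the order deterministic; the text does not say how equal weights are ordered.

```python
def rank_segments(weights, weight_threshold=0):
    """Keep segments whose weight exceeds the threshold, largest weight first."""
    kept = [(seg, w) for seg, w in weights.items() if w > weight_threshold]
    # Descending weight; ties broken alphabetically (an assumption).
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [seg for seg, _ in kept]


ranking = rank_segments(
    {"device": 12, "id": 2, "status": 5, "ok": 2},
    weight_threshold=2,
)
print(ranking)  # ['device', 'status']
```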
In some embodiments, to further save network resources, lower-ranked word segments may be filtered out according to a preset number. Referring to FIG. 4, which shows the flow after the sorted word segment ranking is obtained, the text transmission method further includes:
S133, caching the preset number of top-ranked word segments according to the word segment ranking;
The word segment ranking is determined from the weights of the word segments. A preset number is set so that the lower-ranked part of the sorted word segments is filtered out, reducing the network resources occupied.
Specifically, a preset number is set. After the word segments have been sorted by weight to determine the ranking, the top-ranked word segments, up to the preset number, are selected and cached; word segments ranked beyond the preset number are not cached.
S134, determining a word segmentation compression dictionary identifier for each cached word segment according to the word segment ranking, and constructing an initial word segmentation compression dictionary based on these identifiers.
The word segmentation compression dictionary identifier identifies a word segment within the word segmentation compression dictionary; the dictionary stores the word segmentation result of each word segment together with its identifier.
In some embodiments, the word segmentation result may include the word frequency and/or the number of bytes of the word segment, or other content that represents the word segmentation result and is obtained according to actual conditions.
Specifically, after the ranking has been determined by sorting the word segments by weight and the top-ranked word segments, up to the preset number, have been cached, each cached word segment is assigned a word segmentation compression dictionary identifier, and the initial word segmentation compression dictionary is then constructed from the identifier and the word segmentation result of each cached word segment. For example, with the word frequency used as the word segmentation result, the initial dictionary is constructed from the identifier and the word frequency of each cached word segment.
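Steps S133 and S134 can be illustrated with a short sketch. It assumes that the identifier of a word segment is simply its position in the ranking and that the stored word segmentation result is the word frequency; both choices, and the name `build_initial_dictionary`, are illustrative assumptions rather than details given in the text.

```python
def build_initial_dictionary(ranking, frequencies, preset_number):
    """Map each top-ranked segment to (identifier, word frequency)."""
    # S133: cache only the first `preset_number` segments of the ranking.
    cached = ranking[:preset_number]
    # S134: the identifier is taken to be the rank position (an assumption).
    return {seg: (ident, frequencies[seg]) for ident, seg in enumerate(cached)}


initial_dictionary = build_initial_dictionary(
    ranking=["device", "status", "id"],
    frequencies={"device": 3, "status": 2, "id": 2},
    preset_number=2,
)
print(initial_dictionary)  # {'device': (0, 3), 'status': (1, 2)}
```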
In some embodiments, the word segmentation compression dictionary may be cached with an LFU (Least Frequently Used) policy: when the cache space is full or elements need to be removed, the least-accessed data is evicted first, and if several entries share the lowest access count, the one whose last access is furthest from the current time is evicted.
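A small LFU cache with the tie-break described above could be sketched as follows. The class name `LFUCache` and its methods are hypothetical, and the linear scan for the eviction victim is chosen for brevity; it is adequate for a small dictionary but is not an efficient production design.

```python
import itertools


class LFUCache:
    """Evicts the least-accessed key; ties go to the least recently accessed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}                  # key -> value
        self._hits = {}                  # key -> access count
        self._last = {}                  # key -> logical time of last access
        self._clock = itertools.count()  # monotonically increasing counter

    def _touch(self, key):
        self._hits[key] = self._hits.get(key, 0) + 1
        self._last[key] = next(self._clock)

    def get(self, key):
        if key not in self._data:
            return None
        self._touch(key)
        return self._data[key]

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key not in self._data and len(self._data) >= self.capacity:
            # Lowest access count first, then oldest last access.
            victim = min(self._data, key=lambda k: (self._hits[k], self._last[k]))
            for table in (self._data, self._hits, self._last):
                del table[victim]
        self._data[key] = value
        self._touch(key)


cache = LFUCache(capacity=2)
cache.put("device", 0)
cache.put("status", 1)
cache.get("device")   # "device" now has two accesses, "status" one
cache.put("id", 2)    # evicts "status", the least-accessed entry
```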
Referring to fig. 5, fig. 5 is a flowchart illustrating an embodiment of step S14 according to the present application, in order to ensure a high compression ratio in the whole transmission process, dynamic update of the segmentation compression dictionary is required, as shown in fig. 5, step S14 includes:
s141, when a new word segmentation ordering result is obtained, determining the initial word segmentation compression dictionary as a historical word segmentation compression dictionary;
after the initial segmentation compression dictionary is constructed, the segmentation ordering results obtained subsequently can be called as new segmentation ordering results.
Specifically, the first obtained ranking result of the participles is used to construct an initial participle compression dictionary, and the first obtained ranking result of the participles corresponds to the initial participle compression dictionary, so that the initial participle compression dictionary may be determined as a historical participle compression dictionary when a new ranking result of the participles is obtained.
S142, if a word segmentation compression dictionary identifier corresponding to a new segmented word differs from the word segmentation compression dictionary identifiers in the historical word segmentation compression dictionary, updating the historical word segmentation compression dictionary with the identifier corresponding to the new segmented word to obtain the current word segmentation compression dictionary.
Because the cache selects the top-ranked segmented words up to the preset number and a word segmentation compression dictionary identifier is assigned to each cached segmented word, every newly cached segmented word has such an identifier.
Specifically, after a new word segmentation ordering result is obtained, the word segmentation compression dictionary identifier corresponding to each newly cached segmented word is obtained and compared with the identifiers in the historical word segmentation compression dictionary. If the identifier corresponding to a new segmented word differs from the identifiers in the historical word segmentation compression dictionary, the historical word segmentation compression dictionary is updated with the identifier corresponding to the new segmented word to obtain the current word segmentation compression dictionary.
In some embodiments, the update may include two parts: newly added segmented words, and segmented words removed because of low utilization. For example, when the identifier corresponding to a new segmented word differs from the identifiers in the historical word segmentation compression dictionary, the word segmentation result and the identifier of the newly added segmented word are added to the historical word segmentation compression dictionary, and the word segmentation results and identifiers of the segmented words with low utilization are deleted from it, to obtain the current word segmentation compression dictionary.
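The two-part update (additions and deletions) can be sketched as a difference between the historical dictionary and the new ranking. The function name `update_dictionary` is hypothetical, and the policy of keeping identifiers stable for surviving words and reusing the identifiers of deleted words is an assumption of this sketch; the application does not state how identifiers are reassigned.

```python
def update_dictionary(historical, new_ranking):
    """Compute the current dictionary and its delta against the historical one.

    historical:  {token: identifier}
    new_ranking: tokens that should be cached now, best first
    Returns (current, added, removed), each a {token: identifier} mapping.
    """
    wanted = set(new_ranking)
    removed = {tok: i for tok, i in historical.items() if tok not in wanted}
    current = {tok: i for tok, i in historical.items() if tok in wanted}
    # Assumption: identifiers freed by deletions are reused before new ones are minted.
    free_ids = sorted(removed.values())
    next_id = max(historical.values(), default=-1) + 1
    added = {}
    for tok in new_ranking:
        if tok in current:
            continue
        if free_ids:
            ident = free_ids.pop(0)
        else:
            ident = next_id
            next_id += 1
        current[tok] = ident
        added[tok] = ident
    return current, added, removed

historical = {"GET": 0, "ok": 1, "/api": 2}
current, added, removed = update_dictionary(historical, ["GET", "ok", "/health"])
```

When both `added` and `removed` are empty, the historical dictionary can be used unchanged.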
In some embodiments, because updating the word segmentation weights is time-consuming, the update may be performed periodically in a separate thread or process, so that updating the word segmentation compression dictionary does not occupy too many network resources. Huffman compression is not used here because only the segmented words, rather than the whole message, are compressed; if Huffman coding were used, each code would have to be described by carrying length information or by padding to a long code, such as the length of the longest Huffman code.
Referring to fig. 6, fig. 6 is a flowchart of an embodiment of step S15 of the present application. To further improve transmission efficiency, a uniform transmission format needs to be constructed for the transmitted data. As shown in fig. 6, step S15 includes:
S151, constructing a transmission format based on a transmission protocol, wherein the transmission format comprises a transmission type, a length byte number, a length description and data content;
wherein the transmission protocol is a part of the data link layer and is the mechanism provided for transmitting data; it may be the Trivial File Transfer Protocol (TFTP), a network transmission protocol (such as TCP/IP, NetBEUI, DHCP, or FTP), or the File Transfer Protocol (FTP); the transmission format is the format that needs to be defined before data transmission.
Specifically, to achieve higher transmission efficiency, an existing transmission format is modified to construct a more efficient one. For example, the existing transmission format is the TLV (type-length-value) format, and the modified transmission format includes a transmission type, a length byte number, a length description, and data content. The transmission type occupies 2 bits and indicates the type of the current content: 00 indicates instruction information, 01 indicates compressed content, 10 indicates text content, and 11 is reserved. The length byte number occupies 2 bits and indicates the size of the subsequent length description: 00 indicates 1 byte (8 bits), 01 indicates 2 bytes, 10 indicates 3 bytes, and 11 indicates 4 bytes. The length description occupies 1 to 4 bytes, as indicated by the length byte number, and is converted in little-endian order, that is, as an int-type value. The data content carries the specific content, such as instruction information, compressed content, or text content.
In some embodiments, the instruction information may include setting word segmentation compression and cancelling word segmentation compression; the message may be transmitted in a format including, but not limited to, JSON, YAML, or XML, or in a custom format.
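A block in this format could be encoded as sketched below. The application gives the width of each field but not how the two 2-bit fields are packed, so the layout used here (transmission type in the two highest bits of a single header byte, length byte number in the next two bits, the low four bits left as zero) is an assumption, as are the names `encode_frame` and the `TYPE_*` constants.

```python
TYPE_INSTRUCTION = 0b00  # instruction information
TYPE_COMPRESSED = 0b01   # compressed content
TYPE_TEXT = 0b10         # text content (0b11 is reserved)

def encode_frame(transmission_type, payload):
    """Encode one block as header byte + little-endian length + data content."""
    if not 0 <= transmission_type <= 0b11:
        raise ValueError("transmission type must fit in 2 bits")
    length = len(payload)
    # Smallest number of bytes (1 to 4) able to hold the length.
    n_len_bytes = max(1, (length.bit_length() + 7) // 8)
    if n_len_bytes > 4:
        raise ValueError("payload too long for a 4-byte length description")
    # Assumed packing: type in bits 7-6, (n_len_bytes - 1) in bits 5-4.
    header = (transmission_type << 6) | ((n_len_bytes - 1) << 4)
    return bytes([header]) + length.to_bytes(n_len_bytes, "little") + payload

short_frame = encode_frame(TYPE_TEXT, b"hi")
long_frame = encode_frame(TYPE_COMPRESSED, bytes(300))
```

With this layout a payload shorter than 256 bytes costs two bytes of overhead, and the length description grows only when the payload needs it.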
S152, setting each word segmentation of the text to be transmitted according to the current word segmentation compression dictionary and the transmission format to obtain the compression information corresponding to the text to be transmitted so as to send the compression information.
To improve transmission efficiency, each segmented word of the text to be transmitted needs to be set according to the transmission format so as to form a uniform transport stream. Setting each segmented word according to the current word segmentation compression dictionary and the transmission format yields the compressed information corresponding to the text to be transmitted.
Specifically, after the current word segmentation compression dictionary is obtained and the transmission format is constructed, each segmented word of the text to be transmitted is set according to the current word segmentation compression dictionary and the transmission format to obtain the compressed information corresponding to the text to be transmitted, and the compressed information is sent to the receiving end in the constructed transmission format based on the transmission protocol.
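Step S152 can be sketched as a pass over the segmented words that replaces each word found in the dictionary with its identifier and leaves the rest as text. The function name `compress_tokens` and the little-endian identifier encoding are assumptions of this sketch; each record is shown as a (transmission type, data content) pair and would still need to be serialized in the transmission format before sending.

```python
TYPE_COMPRESSED = 0b01  # data content is a dictionary identifier
TYPE_TEXT = 0b10        # data content is the literal segmented word

def compress_tokens(tokens, dictionary):
    """Map segmented words to (transmission_type, data_content) records.

    dictionary: {token: identifier}, the current compression dictionary.
    """
    records = []
    for tok in tokens:
        ident = dictionary.get(tok)
        if ident is None:
            # Not in the dictionary: send the word itself as text content.
            records.append((TYPE_TEXT, tok.encode("utf-8")))
        else:
            # Assumption: the identifier is sent as a minimal little-endian integer.
            n_bytes = max(1, (ident.bit_length() + 7) // 8)
            records.append((TYPE_COMPRESSED, ident.to_bytes(n_bytes, "little")))
    return records

records = compress_tokens(["GET", "/health", "ok"], {"GET": 0, "ok": 1})
```

How separators between segmented words are represented is not covered by this sketch.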
In some embodiments, after obtaining the sorting result of the word segmentation, the text transmission method further comprises:
and if the segmentation compression dictionary identification corresponding to the new segmentation is consistent with the segmentation compression dictionary identification in the historical segmentation compression dictionary, constructing compression information for the text to be transmitted based on the historical segmentation compression dictionary, and sending the compression information.
In some embodiments, the sending end does not need to rely on historical word segmentation compression dictionary records at startup; the dictionary can be constructed in real time from the transmitted text. At startup, the sending end can select suitable parameters according to service requirements and scenarios to construct an LFU cache of a specified size. If the client accesses different data streams whose information differs greatly, the sending end can also construct a separate LFU cache for each data stream and maintain that stream's word segmentation compression dictionary independently, thereby improving transmission compression efficiency. Within a certain range, a larger LFU cache space provides a better compression ratio; beyond that range, a larger LFU cache no longer improves compression. Because a good compression effect can be obtained even when only a small space is allocated to the LFU cache, the method is friendly to mobile devices and the like.
When the sending end connects to a new receiving end, it sends a word segmentation compression dictionary reset instruction to tell the receiving end to empty its cache, which prevents a stale receiving-end cache from affecting decompression of subsequent text transmissions. As text is transmitted, the word segmentation compression dictionary cached in the LFU is continuously updated. The update includes two parts: newly added segmented words, and segmented words eliminated because of low utilization. When either kind of update occurs, the sending end needs to notify the receiving end to update its word segmentation dictionary incrementally, so that the dictionaries on both sides remain consistent.
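On the receiving end, the reset instruction and the incremental update could be applied as sketched below. The JSON field names `op`, `add`, and `remove` and the function name `apply_instruction` are hypothetical; the application only states that instructions may be carried in formats such as JSON and does not define the message schema.

```python
import json

def apply_instruction(dictionary, payload):
    """Apply one instruction to the receiver's {identifier: token} dictionary."""
    msg = json.loads(payload)
    if msg["op"] == "reset":
        # Sent when the sending end connects to a new receiving end.
        dictionary.clear()
    elif msg["op"] == "update":
        # Deletions first, so a freed identifier can be reused by an addition.
        for ident in msg.get("remove", []):
            dictionary.pop(ident, None)
        for ident, tok in msg.get("add", {}).items():
            dictionary[int(ident)] = tok  # JSON object keys are always strings
    else:
        raise ValueError("unknown instruction: %r" % msg["op"])
    return dictionary

receiver = {0: "GET", 1: "ok", 2: "/api"}
delta = json.dumps({"op": "update", "remove": [2], "add": {"2": "/health"}})
apply_instruction(receiver, delta)
after_update = dict(receiver)
apply_instruction(receiver, json.dumps({"op": "reset"}))
```

Sending only the delta keeps the synchronization traffic proportional to the number of changed entries rather than to the size of the dictionary.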
Different from the prior art, the technical scheme of this embodiment dynamically adjusts the word segmentation compression dictionary through the word segmentation weights, so that a high compression ratio is maintained throughout the transmission process and transmission efficiency is improved. In addition, filtering out segmented words with few bytes, few occurrences, small word segmentation weights, or low ranking reduces the occupation of network resources. The technical scheme therefore saves network resources while maintaining a high compression ratio throughout transmission, and is suitable for large-scale deployment in cloud-native environments.
Referring to fig. 7, fig. 7 is a flowchart of a text transmission method according to a second embodiment of the present application. It should be noted that, provided substantially the same result is obtained, the text transmission method of the present application is not limited to the flow sequence shown in fig. 7. As shown in fig. 7, the text transmission method is applied to the receiving end and includes the following steps:
S21, receiving compressed information;
wherein, the compressed information is transmitted by the transmitting end.
Specifically, the receiving end receives the compressed information transmitted by the transmitting end.
S22, decoding the compressed information based on the word segmentation compressed dictionary to obtain word segmentation compressed dictionary identifications corresponding to all the words, wherein the word segmentation compressed dictionary is updated and synchronized by the sending end according to word segmentation sequencing results of the transmission text;
The word segmentation compression dictionary is shared by the sending end and the receiving end, so it needs to be updated and synchronized at both ends so that no decoding error occurs when the compressed information is decoded with the dictionary.
Specifically, after the sending end updates the word segmentation compression dictionary according to the word segmentation ordering result of the transmission text, the dictionary needs to be updated synchronously at the receiving end to ensure that the dictionaries at the two ends are consistent. The compressed information is then decoded based on the word segmentation compression dictionary to obtain the word segmentation compression dictionary identifier and the word segmentation result corresponding to each segmented word in the compressed information.
And S23, acquiring corresponding actual data based on the segmentation compression dictionary identification of each segmentation to determine the text content of the transmission text.
The actual data is the text content of the transmission text, that is, its text content before compression. The word segmentation compression dictionary caches the identifier and the word segmentation result of each segmented word; because each identifier corresponds to one segmented word and the segmented words together make up the text content before compression, the text content of the transmission text can be recovered.
Specifically, after obtaining the segmentation compressed dictionary identifier corresponding to each segmentation, the actual data corresponding to each segmentation may be obtained based on the segmentation compressed dictionary identifier of each segmentation, so as to determine the text content of the transmission text.
Different from the prior art, the technical scheme of the embodiment updates and synchronizes the segmentation compressed dictionary according to the segmentation ordering result of the transmission text by the sending end, and decodes the received compressed information based on the segmentation compressed dictionary to obtain the text content of the transmission text; the scheme can dynamically adjust the word segmentation compression dictionary to ensure that the whole transmission process can keep high compression ratio and improve transmission efficiency.
Referring to fig. 8, fig. 8 is a schematic flowchart of an embodiment of step S22 of the present application, and as shown in fig. 8, step S22 includes:
S221, acquiring a word segmentation compression dictionary which is updated and synchronized by the sending end according to a word segmentation ordering result of the transmission text;
The word segmentation compression dictionary is shared by the sending end and the receiving end, and the receiving end's copy needs to be consistent with that of the sending end. If the sending end maintains a single word segmentation compression dictionary as a whole, the receiving end also needs to maintain only one word segmentation compression dictionary according to the identifier of the sending end; if the sending end maintains a separate word segmentation compression dictionary for each data stream, the receiving end also needs to maintain them separately. This prevents message decompression failures, or even message errors, caused by inconsistency between the word segmentation dictionaries maintained by the sending end and the receiving end.
Specifically, after the sending end updates the word segmentation compression dictionary according to the word segmentation ordering result of the transmission text, the update needs to be propagated to the receiving end, so that the word segmentation compression dictionary shared by the sending end and the receiving end stays synchronized.
S222, decoding the compressed information according to the word segmentation compression dictionary to obtain word segmentation compression dictionary identifications corresponding to the compressed information, transmission types corresponding to the word segmentation compression dictionary identifications and data contents.
The sending end transmits the word segmentation compression dictionary identifiers in the constructed transmission format, which includes a transmission type, a length byte number, a length description, and data content. Decoding therefore yields the word segmentation compression dictionary identifiers corresponding to the compressed information, together with the transmission type and data content corresponding to each identifier.
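Reading one block back out of a received byte stream could look like the sketch below. It assumes the same header packing as a sender that places the transmission type in bits 7-6 and the length byte number in bits 5-4 of a single header byte; that packing and the function name `decode_frame` are assumptions, since the application only gives the field widths.

```python
def decode_frame(buffer, offset=0):
    """Parse one block; return (transmission_type, data_content, next_offset)."""
    header = buffer[offset]
    transmission_type = header >> 6            # assumed: bits 7-6
    n_len_bytes = ((header >> 4) & 0b11) + 1   # assumed: bits 5-4, 00 means 1 byte
    start = offset + 1 + n_len_bytes
    if start > len(buffer):
        raise ValueError("truncated length description")
    # The length description is little-endian.
    length = int.from_bytes(buffer[offset + 1:start], "little")
    end = start + length
    if end > len(buffer):
        raise ValueError("truncated data content")
    return transmission_type, bytes(buffer[start:end]), end

stream = b"\x80\x02hi\x40\x01\x07"
first = decode_frame(stream)
second = decode_frame(stream, first[2])
```

Returning the next offset lets the caller walk a stream of consecutive blocks without copying the buffer.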
Referring to fig. 9, fig. 9 is a schematic flowchart of an embodiment of step S23 in the present application, and as shown in fig. 9, step S23 includes:
S231, determining whether the compressed information has compressed content or not based on the transmission type;
The transmission type is part of the transmission format. In the transmission format, the transmission type occupies 2 bits and indicates the type of the current content: 00 indicates instruction information, 01 indicates compressed content, 10 indicates text content, and 11 is reserved. The length byte number occupies 2 bits and indicates the size of the subsequent length description: 00 indicates 1 byte (8 bits), 01 indicates 2 bytes, 10 indicates 3 bytes, and 11 indicates 4 bytes. The length description occupies 1 to 4 bytes, as indicated by the length byte number, and is converted in little-endian order, that is, as an int-type value. The data content carries the specific content, such as instruction information, compressed content, or text content; the compressed content is the content compressed with the word segmentation compression dictionary during transmission.
Specifically, based on the bits of the transmission type, it is determined what specific content the compressed information contains, for example whether it contains compressed content, instruction information, or text content.
S232, if the transmission type indicates that compressed content exists, decoding the compressed content according to the word segmentation compression dictionary to obtain actual data corresponding to the transmission text and determine the actual data as text content of the transmission text; and if the transmission type indicates that compressed content does not exist, directly determining actual data corresponding to the data content as the text content of the transmission text.
The compressed content is not directly readable and therefore needs to be decoded; because it was compressed based on the word segmentation compression dictionary, it needs to be decoded according to that dictionary.
Specifically, when the transmission type indicates that the compressed information contains compressed content, the word segmentation compression dictionary is used to decode the compressed content to obtain the actual data of the segmented word corresponding to each word segmentation compression dictionary identifier, and the actual data is combined and determined as the text content of the transmission text; if the transmission type indicates that no compressed content exists, the actual data corresponding to the transmitted data content is taken directly as the text content of the transmission text.
In some embodiments, for scenarios such as log transmission, where data needs to be archived but is rarely read, the receiving end can store the transport stream directly and, when the data is needed, parse the stored transport-stream file to recover the original file. This reduces the CPU usage spent on parsing and saves storage.
In some embodiments, the instruction information may include setting word segmentation compression and cancelling word segmentation compression; the message may be transmitted in a format including, but not limited to, JSON, YAML, or XML, or in a custom format. For example, after the receiving end receives the transport stream, if the transmission type of a data block is 00, the data block is used to construct the dictionary; if the transmission type is 01, the actual data corresponding to the value of the data part of the data block is looked up in the dictionary; if the transmission type is 10, the content of the data part of the data block is taken directly as the actual data. In this way, the original data can be recovered at the receiving end or when an archived transport stream is read.
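The three-way dispatch on the transmission type can be sketched as follows, operating on blocks that have already been parsed into (transmission type, data content) pairs. Several details are assumptions made for the example: the function name `restore_text`, an instruction payload that is a JSON object mapping identifiers to segmented words, identifiers encoded as little-endian integers, and segmented words rejoined with a single space.

```python
import json

def restore_text(records):
    """Rebuild the text from (transmission_type, data_content) records."""
    dictionary = {}  # identifier -> segmented word
    words = []
    for transmission_type, data in records:
        if transmission_type == 0b00:
            # Instruction block: used here to construct the dictionary.
            for ident, word in json.loads(data).items():
                dictionary[int(ident)] = word
        elif transmission_type == 0b01:
            # Compressed block: the data part is an identifier to look up.
            words.append(dictionary[int.from_bytes(data, "little")])
        elif transmission_type == 0b10:
            # Text block: the data part is the actual data.
            words.append(data.decode("utf-8"))
        else:
            raise ValueError("transmission type 11 is reserved")
    # Assumption: segmented words were separated by single spaces.
    return " ".join(words)

records = [
    (0b00, b'{"0": "GET", "1": "ok"}'),
    (0b01, b"\x00"),
    (0b10, b"/health"),
    (0b01, b"\x01"),
]
text = restore_text(records)
```

The same routine could be run over an archived transport stream, since every block needed to rebuild the dictionary is part of the stream itself.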
Referring to fig. 10, fig. 10 is a schematic structural diagram of an embodiment of a text transmission system of the present application, as shown in fig. 10, the text transmission system 200 includes a sending end 210, a server 220, and a receiving end 230; sender 210 is configured to implement the text transmission method applied to sender 210 as described above, server 220 is configured to communicate between sender 210 and receiver 230, and receiver 230 is configured to implement the text transmission method applied to receiver 230 as described above.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of an electronic device according to the present application. The electronic device can execute the steps in the text transmission method, and the related contents refer to the detailed description in the text transmission method, which is not described in detail herein.
The electronic device 300 includes a memory 320 and a processor 310 coupled to the memory, the memory 320 storing program instructions; the processor 310 executes the program instructions to implement the steps in the text transmission method described above.
Referring to fig. 12, fig. 12 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention. The computer-readable storage medium 400 stores a computer program 410, which computer program 410 realizes the steps in the text transmission method described above when executed by a processor. For a detailed description of the text transmission method, reference is made to the above text transmission method, which is not described in detail herein.
According to the scheme, the text to be transmitted is acquired; word segmentation is performed on the text to be transmitted, and the word frequency corresponding to each segmented word in the text to be transmitted is acquired so as to determine the word segmentation weight corresponding to each segmented word; the segmented words are sorted according to the magnitude of their word segmentation weights to obtain a word segmentation sorting result; when the word segmentation sorting result differs from the word segmentation sorting result in the historical word segmentation compression dictionary, the historical word segmentation compression dictionary is updated with the word segmentation sorting result to obtain a current word segmentation compression dictionary; and compressed information is constructed for the text to be transmitted based on the current word segmentation compression dictionary so as to send the compressed information. That is, the scheme of the present application can ensure a high compression ratio during transmission and improve transmission efficiency.
In the several embodiments provided in the present invention, it should be understood that the disclosed system and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A text transmission method is applied to a sending end, and comprises the following steps:
acquiring a text to be transmitted;
performing word segmentation on the text to be transmitted, and acquiring word segmentation word frequency corresponding to each word segmentation in the text to be transmitted so as to determine word segmentation weight corresponding to each word segmentation;
sorting the corresponding participles according to the size relation of the participle weights to obtain a sorting result of the participles;
when the sorting result of the participles is different from the sorting result of the participles in the historical participle compression dictionary, updating the historical participle compression dictionary by using the sorting result of the participles to obtain a current participle compression dictionary;
and constructing compressed information for the text to be transmitted based on the current word segmentation compressed dictionary so as to send the compressed information.
2. The text transmission method according to claim 1,
the sorting of the corresponding participles according to the magnitude relation of the participle weights to obtain the sorting result of the participles comprises the following steps:
acquiring a segmentation weight corresponding to each segmentation;
and sequencing the participles with the participle weights larger than a preset weight threshold value according to the size relation of the participle weights, and obtaining the sequencing result of the sequenced participles.
3. The text transmission method according to claim 2,
after obtaining the sorting result of the sorted word segmentation, the text transmission method further comprises the following steps:
caching the word segmentation with a preset number in the front according to the sorting result of the word segmentation;
and determining the segmentation compression dictionary identification corresponding to each cached segmentation according to the sequencing result of the segmentation, and constructing an initial segmentation compression dictionary based on the segmentation compression dictionary identification.
4. The text transmission method according to claim 3,
when the sorting result of the participles is different from the sorting result of the participles in the historical participle compression dictionary, updating the historical participle compression dictionary by using the sorting result of the participles to obtain a current participle compression dictionary, wherein the method comprises the following steps:
when a new word segmentation ordering result is obtained, determining the initial word segmentation compression dictionary as a historical word segmentation compression dictionary;
and if a word segmentation compression dictionary identifier corresponding to a new segmented word differs from the word segmentation compression dictionary identifiers in the historical word segmentation compression dictionary, updating the historical word segmentation compression dictionary by using the identifier corresponding to the new segmented word to obtain the current word segmentation compression dictionary.
5. The text transmission method according to claim 1,
after obtaining the current word segmentation compression dictionary, the text transmission method further comprises the following steps:
and incrementally updating and synchronizing the current word segmentation compression dictionary to a receiving end at a preset period, wherein the increment comprises newly added segmented words and deleted segmented words whose utilization rate is lower than a preset utilization rate.
6. The text transmission method according to claim 1,
the constructing of compressed information for the text to be transmitted based on the current word segmentation compressed dictionary to send the compressed information includes:
constructing a transmission format based on a transmission protocol, wherein the transmission format comprises a transmission type, a length byte number, a length description and data content;
and setting each word segmentation of the text to be transmitted according to the current word segmentation compression dictionary and the transmission format to obtain compression information corresponding to the text to be transmitted so as to send the compression information.
7. A text transmission method is applied to a receiving end, and the text transmission method comprises the following steps:
receiving compressed information;
decoding the compressed information based on a segmentation compression dictionary to obtain segmentation compression dictionary identifications corresponding to all the segmentation words, wherein the segmentation compression dictionary is updated and synchronized by a sending end according to a segmentation ordering result of a transmission text;
and acquiring corresponding actual data based on the segmentation compression dictionary identification of each segmentation so as to determine the text content of the transmission text.
8. The text transmission method according to claim 7,
the decoding the compressed information based on the participle compression dictionary to obtain the corresponding participle compression dictionary identifier of each participle comprises the following steps:
acquiring a segmentation compressed dictionary which is updated and synchronized by a sending end according to a segmentation ordering result of a transmission text;
and decoding the compressed information according to the word segmentation compression dictionary to obtain word segmentation compression dictionary identifications corresponding to the compressed information, transmission types corresponding to the word segmentation compression dictionary identifications and data contents.
9. The text transmission method according to claim 8,
the acquiring of corresponding actual data based on the segmentation compression dictionary identification of each segmentation to determine the text content of the transmission text comprises:
determining whether the compressed information has compressed content based on the transmission type;
if the transmission type indicates that compressed content exists, decoding the compressed content according to the word segmentation compression dictionary to obtain actual data corresponding to the transmission text and determine the actual data as text content of the transmission text; and if the transmission type indicates that compressed content does not exist, directly determining actual data corresponding to the data content as the text content of the transmission text.
10. An electronic device, comprising a memory and a processor coupled to the memory, the memory storing at least one computer program which, when loaded and executed by the processor, implements the text transmission method according to any one of claims 1 to 9.
CN202211529772.5A 2022-11-30 2022-11-30 Text transmission method and electronic equipment Pending CN115906809A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211529772.5A CN115906809A (en) 2022-11-30 2022-11-30 Text transmission method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211529772.5A CN115906809A (en) 2022-11-30 2022-11-30 Text transmission method and electronic equipment

Publications (1)

Publication Number Publication Date
CN115906809A (en) 2023-04-04

Family

ID=86482173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211529772.5A Pending CN115906809A (en) 2022-11-30 2022-11-30 Text transmission method and electronic equipment

Country Status (1)

Country Link
CN (1) CN115906809A (en)

Similar Documents

Publication Publication Date Title
US20150006475A1 (en) Data deduplication in a file system
CN109947668B (en) Method and device for storing data
US8972488B2 (en) System, methods, and media for providing in-memory non-relational databases
US9471646B2 (en) Method and server device for exchanging information items with a plurality of client entities
US11748335B2 (en) Maintaining consistency of data between computing nodes of a distributed computer architecture
US10187081B1 (en) Dictionary preload for data compression
CN110427368A (en) Data processing method, device, electronic equipment and storage medium
CN102194499A (en) Method and device for synchronizing compressed dictionary
WO2013079999A1 (en) Methods and devices for encoding and decoding messages
CN114666212A (en) Configuration data issuing method
CN112335203A (en) Processing local area network diagnostic data
Doblander et al. Shared dictionary compression in publish/subscribe systems
CN115906809A (en) Text transmission method and electronic equipment
CN116010348A (en) Distributed mass object management method and device
US20210286801A1 (en) Record transmitting method and device
EP2291952A2 (en) Adapter for synchronizing data over different networks
CN111435332B (en) Data processing method and device
CN113228596B (en) Method and device for transmitting list information
CN110784775A (en) Video fragment caching method and device and video-on-demand system
CN116339646B (en) Flight test data storage method, device, equipment and storage medium
US20100023479A1 (en) Hexadecimal file fast decompression method
CN114363640B (en) Data storage method, device and system
EP4290775A1 (en) Data compression method and apparatus
CN117640780A (en) Data transmission method, device, electronic equipment and computer storage medium
CN115048418A (en) Data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination