CN116032292A - Efficient big data storage method based on translation file - Google Patents

Publication number
CN116032292A
Authority
CN
China
Prior art keywords
sequence
character
buffer
phrase
characters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310300380.XA
Other languages
Chinese (zh)
Other versions
CN116032292B (en)
Inventor
郑春光
葛玉梅
梅康
颜妍
杨玉猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Smart Translation Information Technology Co ltd
Original Assignee
Shandong Smart Translation Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Smart Translation Information Technology Co ltd filed Critical Shandong Smart Translation Information Technology Co ltd
Priority to CN202310300380.XA priority Critical patent/CN116032292B/en
Publication of CN116032292A publication Critical patent/CN116032292A/en
Application granted granted Critical
Publication of CN116032292B publication Critical patent/CN116032292B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to the technical field of data processing for data compression, and in particular to an efficient big data storage method based on a translation file. The method comprises: obtaining a translation file and preprocessing it to obtain a translation sequence; compressing the translation sequence using a character search buffer together with a phrase search buffer to obtain a compressed sequence; obtaining the binary sequences corresponding to all coding objects and encoding the compressed sequence according to them to obtain a coded sequence; and sending the coded sequence to a terminal at the conference site. By setting a smaller search buffer for the LZ77 compression algorithm and providing both a character search buffer and a phrase search buffer, the invention keeps the execution time of compressing the translation short while keeping the compression efficiency high. This resolves the conflict in the LZ77 compression algorithm whereby the size of the search buffer affects execution time and compression efficiency in opposite directions, and achieves efficient storage and rapid retrieval of the translation file.

Description

Efficient big data storage method based on translation file
Technical Field
The invention relates to the technical field of data processing for data compression, in particular to a high-efficiency storage method for big data based on a translation file.
Background
Simultaneous interpretation is a mode of translation in which the interpreter gives the corresponding translation while still listening to the speaker. Its main advantage is efficiency: the speaker's train of thought is neither affected nor interrupted, so the speech remains coherent. In some situations the interpreter cannot be present at the venue. The audio recorded while the speaker talks must then be sent to the interpreter, who produces the corresponding translation sequence, and the translation sequence is sent to the terminal at the conference site.
Because simultaneous interpretation is highly time-sensitive, the translation sequence must be compressed with high efficiency and a short execution time, so a simple and fast compression algorithm is needed for the translation sequence to be obtained efficiently and quickly at the conference site. Moreover, because simultaneous interpretation is performed in real time, the statistical characteristics of the translation cannot be known in advance, so the translation cannot be compressed by an algorithm that relies on such statistics, such as the Huffman compression algorithm.
For these reasons the translation is compressed with the LZ77 compression algorithm. LZ77 uses the already-compressed data of the translation as a search buffer and the data still to be compressed as a look-ahead buffer, and exploits the locality of the data to compress efficiently. The size of the search buffer determines both the execution time and the compression efficiency of the algorithm, and it affects the two in opposite directions. How to obtain the translation sequence at the conference site both efficiently and quickly is therefore a problem in urgent need of a solution.
Disclosure of Invention
The invention provides a high-efficiency storage method of big data based on a translation file, which aims to solve the existing problems.
The invention discloses a high-efficiency storage method for big data based on a translation file, which adopts the following technical scheme:
the embodiment of the invention provides a method for efficiently storing big data based on a translation file, which comprises the following steps:
obtaining a translation file, and preprocessing the translation file to obtain a translation sequence;
compressing the translation sequence using the character search buffer together with the phrase search buffer to obtain a compressed sequence, comprising:
S1, padding the front of the translation sequence with empty characters, the number of which equals the second preset length; taking the sequence formed by the leading characters of the padded translation sequence, the number of which equals the first preset length, as a sliding window; and setting an empty sequence as the compressed character sequence;
S2, taking the sequence formed by the leading characters of the sliding window, the number of which equals the second preset length, as the character search buffer, and taking the sequence formed by the remaining characters of the sliding window as the look-ahead buffer;
S3, obtaining the phrase search buffer from the compressed character sequence;
S401, obtaining all character strings of the look-ahead buffer and judging whether the phrase search buffer contains a phrase identical to the 1st character string of the look-ahead buffer; if not, executing S402; if so, searching the phrase search buffer to obtain the maximum match of the look-ahead buffer, obtaining an output result and a slide amount from the maximum match, and executing S5;
S402, searching the character search buffer to obtain the maximum match of the look-ahead buffer, obtaining an output result and a slide amount from the maximum match, and executing S5;
S5, sliding the sliding window by the slide amount and taking the window after sliding as the new sliding window; re-executing from S2 with the new sliding window until the size of the new sliding window is smaller than the second preset length; and recording the sequence formed by all output results, in the order obtained, as the compressed sequence of the translation sequence;
obtaining the binary sequences corresponding to all coding objects, and encoding the compressed sequence according to them to obtain a coded sequence; and temporarily storing the coded sequence and sending it to the terminal at the conference site.
Further, preprocessing the translation file to obtain a translation sequence comprises the following specific steps:
a first preset symbol is taken as the identifier of the end of an English word and a second preset symbol as the identifier of the end of a sentence; the end-of-word identifiers of all English words in the translation file are replaced with the first preset symbol, and the end-of-sentence identifiers of all sentences in the translation file are replaced with the second preset symbol; a first preset symbol is added at the beginning of the translation file; the first preset symbol, the second preset symbol and all English letters are taken as characters; and the sequence formed by all characters contained in the translation file, in order, is recorded as the translation sequence.
Further, obtaining the phrase search buffer from the compressed character sequence comprises the following specific steps:
the sequence formed by all English letters between two symbols in the compressed character sequence is recorded as a phrase, and each symbol in the compressed character sequence is also recorded as a phrase, the symbols being of two kinds, namely the first preset symbol and the second preset symbol; all phrases obtained from the compressed character sequence are arranged in order, and the sequence formed by the last phrases, the number of which equals the second preset length, is taken as the phrase search buffer for the padded translation sequence.
Further, the obtaining all the character strings of the look-ahead buffer includes the following specific steps:
dividing the look-ahead buffer into a plurality of subsequences according to symbols, marking each subsequence and each symbol as a character string, and obtaining all character strings of the look-ahead buffer, wherein the symbols comprise two kinds of symbols, namely a first preset symbol and a second preset symbol.
Further, searching the phrase search buffer to obtain the maximum match of the look-ahead buffer, and obtaining the output result and the slide amount from the maximum match, comprises the following specific steps:
when the phrase search buffer contains a phrase identical to the 1st character string of the look-ahead buffer, that phrase is obtained and is denoted as the i-th phrase in the phrase search buffer; if there exists an integer z, whose admissible range is limited by s, such that all phrases from the i-th phrase to the (i+z)-th phrase in the phrase search buffer are respectively identical to all character strings from the 1st character string to the (1+z)-th character string of the look-ahead buffer, the sequence formed by all character strings from the 1st character string to the (1+z)-th character string of the look-ahead buffer is recorded as the maximum match, where s denotes the number of character strings in the look-ahead buffer; the sequence number i of the phrase identical to the 1st character string is recorded as the offset, and the integer z is taken as the match length; the two-tuple formed by the offset and the match length is taken as the output result; and the sum of the lengths of all character strings from the 1st character string to the (1+z)-th character string of the look-ahead buffer is taken as the slide amount.
Further, searching the character search buffer to obtain the maximum match of the look-ahead buffer, and obtaining the output result and the slide amount from the maximum match, comprises the following specific steps:
judging whether the character search buffer contains a character identical to the 1st character of the look-ahead buffer:
if the character search buffer contains no character identical to the 1st character of the look-ahead buffer, the 1st character of the look-ahead buffer is taken as the maximum match, the maximum match is taken as the output result, and the number of characters in the maximum match is taken as the slide amount;
if the character search buffer contains a character identical to the 1st character of the look-ahead buffer, that character is obtained and is denoted as the j-th character in the character search buffer; if there exists an integer r, whose admissible range is limited by the first preset length and the second preset length, such that all characters from the j-th character to the (j+r)-th character in the character search buffer are respectively identical to all characters from the 1st character to the (1+r)-th character of the look-ahead buffer, the sequence formed by all characters from the 1st character to the (1+r)-th character of the look-ahead buffer is recorded as the maximum match; the sequence number j of the character identical to the 1st character is recorded as the offset, and the integer r is taken as the match length; the two-tuple formed by the offset and the match length is taken as the output result; and the match length is taken as the slide amount.
Further, obtaining the binary sequences corresponding to all coding objects comprises the following specific steps:
the first preset symbol, the second preset symbol and all English letters are recorded as characters, and all characters together with all integers in a range determined by the second preset length are recorded as coding objects;
short binary sequences of a fixed length are used to represent all characters, the length being determined from w with rounding up, where w denotes the number of kinds of characters; the short binary sequences corresponding to all characters are set as follows: the 1st bit of the short binary sequence is recorded as the distinguishing bit and is set to 0; the remaining bits of the short binary sequence hold binary data chosen so that the short binary sequences corresponding to any two characters are different;
long binary sequences of a fixed length are used to represent all integers in that range, the length being determined from the second preset length; the long binary sequences corresponding to all integers are set as follows: the 1st bit of the long binary sequence is recorded as the distinguishing bit and is set to 1; the binary data corresponding to the integer occupies the bits that follow the distinguishing bit; the last bit of the long binary sequence is recorded as the recognition bit, which is set to 1 if the offset and the match length were obtained from the character search buffer and to 0 if they were obtained from the phrase search buffer.
Further, encoding the compressed sequence according to the binary sequences corresponding to all coding objects to obtain the coded sequence comprises the following specific steps:
the short binary sequences corresponding to all characters in the compressed sequence and the long binary sequences corresponding to all offsets and match lengths are obtained, and the sequence formed by all these short and long binary sequences, in order, is recorded as the coded sequence.
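The bit layout described above can be illustrated with a minimal Python sketch. The exact field widths appear only as formula images in the source, so the widths used here are assumptions: the character index is given just enough bits to distinguish w characters, and each integer is given just enough bits to hold values up to the second preset length, which is assumed to be 7. The names `ALPHABET`, `short_code`, `long_code` and `encode`, and the representation of a match as an `(offset, length, from_char_buffer)` tuple, are hypothetical.

```python
import string

# Assumed alphabet: the two preset symbols plus the 26 lowercase letters (w = 28).
ALPHABET = "#&" + string.ascii_lowercase


def short_code(ch):
    """Distinguishing bit 0 followed by a fixed-width index of the character."""
    width = (len(ALPHABET) - 1).bit_length()  # assumed width: ceil(log2(w))
    return "0" + format(ALPHABET.index(ch), f"0{width}b")


def long_code(value, second_len, from_char_buffer):
    """Distinguishing bit 1, the integer in binary, then the recognition bit."""
    width = second_len.bit_length()  # assumed width: enough bits for 0..second_len
    recognition = "1" if from_char_buffer else "0"
    return "1" + format(value, f"0{width}b") + recognition


def encode(compressed, second_len=7):
    """Concatenate the codes of literal characters and of match tuples."""
    bits = []
    for item in compressed:
        if isinstance(item, str):
            bits.append(short_code(item))
        else:
            offset, length, from_char_buffer = item
            bits.append(long_code(offset, second_len, from_char_buffer))
            bits.append(long_code(length, second_len, from_char_buffer))
    return "".join(bits)
```

Under these assumptions a reader of the bit stream can tell from the leading distinguishing bit whether the next field is a short sequence or a long one, which is what makes the concatenation decodable without separators.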
The beneficial effects of the technical scheme of the invention are as follows. Setting a smaller search buffer for the LZ77 compression algorithm keeps the execution time of the algorithm short. Providing both a character search buffer and a phrase search buffer lets the algorithm exploit the locality of characters that are close together in the translation as well as the locality of characters that are far apart. As a result, the translation is compressed in a short execution time and with high compression efficiency. This resolves the conflict whereby the size of the search buffer in the LZ77 compression algorithm affects execution time and compression efficiency in opposite directions, achieves efficient storage of translation files, and ensures that the translation sequence can be obtained quickly and efficiently at the conference site.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of a method for efficiently storing big data based on a translation file according to the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the invention to achieve the preset aim, the following is a detailed description of a specific implementation, structure, characteristics and effects of the method for efficiently storing big data based on translation files according to the invention, which is provided by the invention, with reference to the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
The following specifically describes a specific scheme of the high-efficiency storage method of big data based on translation files.
Referring to fig. 1, a flowchart of a method for efficiently storing big data based on a translation file according to an embodiment of the present invention is shown, where the method includes the following steps:
s001, obtaining a translation file, and preprocessing the translation file to obtain a translation sequence.
In this embodiment, the terminal at the conference site receives the speaker's speech signal from the sound pickup equipment and sends the received speech signal to the translation end; the interpreter at the translation end translates the received speech signal online in real time to obtain translated text, and all the translated text within a preset time period forms a translation file.
In order to ensure that the conference participants understand the speech throughout, simultaneous interpretation has high requirements on timeliness, and efficient and rapid acquisition of a translation sequence is required on the conference site, so in this embodiment, the preset time period is 5 seconds, and in other embodiments, the implementation personnel can set the preset time period as required.
It should be noted that the translation file in this embodiment is an English translation file, so it consists of English words, end-of-word identifiers and end-of-sentence identifiers, where each English word consists of several English letters. A first preset symbol is taken as the identifier of the end of an English word and a second preset symbol as the identifier of the end of a sentence; the end-of-word identifiers of all English words in the translation file are replaced with the first preset symbol, and the end-of-sentence identifiers of all sentences are replaced with the second preset symbol; a first preset symbol is added at the beginning of the translation file; the first preset symbol, the second preset symbol and all English letters are taken as characters; and the sequence formed by all characters contained in the translation file, in order, is recorded as the translation sequence. The first preset symbol is "#" and the second preset symbol is "&".
For example, in this embodiment the translation file is "i have a dog. The dog"; after the end-of-word and end-of-sentence identifiers in the translation file are replaced, the resulting translation sequence is "#i#have#a#dog&the#dog".
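The preprocessing step can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: it assumes that whitespace marks the end of a word, that ".", "!" or "?" marks the end of a sentence, and that letters are lowercased (as the example suggests); other punctuation is not handled. The names `preprocess`, `WORD_END` and `SENTENCE_END` are hypothetical.

```python
import re

WORD_END = "#"      # first preset symbol
SENTENCE_END = "&"  # second preset symbol


def preprocess(text):
    """Turn a translation file into a translation sequence."""
    text = text.strip().lower()
    # Replace each end-of-sentence mark (and any following whitespace).
    text = re.sub(r"[.!?]+\s*", SENTENCE_END, text)
    # Replace each remaining run of whitespace, i.e. each end of word.
    text = re.sub(r"\s+", WORD_END, text)
    # Add a first preset symbol at the beginning.
    return WORD_END + text
```

Applied to the example translation file, `preprocess("i have a dog. The dog")` produces the translation sequence given in the example.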
S002, compressing the translation sequence by combining the character searching buffer area and the phrase searching buffer area to obtain a compressed sequence.
It should be noted that, because simultaneous interpretation is highly time-sensitive, the translation sequence must be compressed with high efficiency and a short execution time, so a simple and fast compression algorithm is needed to ensure that the translation sequence is obtained efficiently and quickly at the conference site. Moreover, because simultaneous interpretation is performed in real time, the statistical characteristics of the translation cannot be known in advance, so the translation cannot be compressed by an algorithm that relies on such statistics, such as the Huffman compression algorithm. This embodiment therefore compresses the translation with the LZ77 compression algorithm before transmission.
The LZ77 compression algorithm uses the already-compressed characters of the translation as the search buffer and the characters still to be compressed as the look-ahead buffer, and exploits the locality of the characters to compress efficiently. The size of the search buffer determines both the execution time and the compression efficiency of the algorithm. On the one hand, compressing the translation with LZ77 requires searching the search buffer for the longest match of the look-ahead buffer, so the smaller the search buffer, the shorter the search time and hence the shorter the execution time of the algorithm. On the other hand, LZ77 compresses by checking whether the characters to be compressed in the look-ahead buffer already occur among the compressed characters and replacing the repeated characters with a reference to them. When the search buffer is small, fewer compressed characters are available: only the locality of characters that are close together in the translation can be exploited, not that of characters that are far apart, so locality is not fully used and efficient compression cannot be achieved. The smaller the search buffer, therefore, the lower the compression efficiency of the algorithm. In summary, the size of the LZ77 search buffer affects execution time and compression efficiency in opposite directions.
To ensure that the translation sequence is obtained efficiently and quickly at the conference site, a small search buffer is required, but a small search buffer lowers compression efficiency. The invention therefore provides two search buffers, a character search buffer and a phrase search buffer. The character search buffer operates on single characters and exploits the locality of characters that are close together in the translation; the phrase search buffer operates on phrases formed by several characters and exploits the locality of characters that are far apart in the translation. Together they ensure that the translation sequence can be obtained efficiently and quickly at the conference site.
1. Set the size of the sliding window and the size of the search buffer, specifically: the size of the sliding window is set equal to the first preset length, and the size of the search buffer within the sliding window is set equal to the second preset length. To ensure that translations can be sent quickly, in this embodiment the first preset length is [Figure SMS_45] and the second preset length is [Figure SMS_46]; in other embodiments, the implementer may set the first preset length and the second preset length as required, but must ensure that the first preset length is greater than the second preset length.
2. Obtain the sliding window and the compressed character sequence: pad the front of the translation sequence with empty characters, the number of which equals the second preset length; take the sequence formed by the leading characters of the padded translation sequence, the number of which equals the first preset length, as the sliding window; and set an empty sequence as the compressed character sequence.
3. Obtain the character search buffer and the look-ahead buffer in the sliding window, specifically: the sequence formed by the leading characters of the sliding window, the number of which equals the second preset length, is taken as the character search buffer, and the sequence formed by the remaining characters at the rear of the sliding window is taken as the look-ahead buffer. For example, for the padded translation sequence "$$$$$$$#i#have#a#dog&the#dog", the first sliding window obtained is "$, $, $, $, $, $, $, #, i, #, h, a, v, e, #", the character search buffer in the sliding window is "$, $, $, $, $, $, $", and the look-ahead buffer in the sliding window is "#, i, #, h, a, v, e, #"; this embodiment uses "$" to denote an empty character.
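The padding and the split of the first sliding window can be sketched as follows. The lengths 15 and 7 passed in the usage line are assumptions read off the example window (seven padding characters followed by eight look-ahead characters); the source gives the actual preset lengths only as formula images. The function name `initial_window` is hypothetical.

```python
PAD = "$"  # stands for the empty character, as in the example


def initial_window(seq, first_len, second_len):
    """Return (character search buffer, look-ahead buffer) of the first window."""
    if first_len <= second_len:
        raise ValueError("first preset length must exceed the second")
    padded = PAD * second_len + seq   # pad the front of the translation sequence
    window = padded[:first_len]       # leading first_len characters
    return window[:second_len], window[second_len:]


search, ahead = initial_window("#i#have#a#dog&the#dog", 15, 7)
```

With these assumed lengths, `search` holds only padding and `ahead` holds the first eight characters of the translation sequence.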
4. Obtain the phrase search buffer from the compressed character sequence, specifically: the sequence formed by all English letters between two symbols in the compressed character sequence is recorded as a phrase, and each symbol in the compressed character sequence is also recorded as a phrase, the symbols being of two kinds, namely the first preset symbol and the second preset symbol; all phrases obtained from the compressed character sequence are arranged in order, and the sequence formed by the last phrases, the number of which equals the second preset length, is taken as the phrase search buffer.
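A minimal sketch of how the phrase search buffer might be derived from the compressed character sequence is given below. It assumes that a trailing run of letters not yet closed by a symbol is not counted as a phrase, which is one reading of "between two symbols", and that `n`, the number of phrases kept, is at least 1. The function name `phrase_search_buffer` is hypothetical.

```python
import re

SYMBOLS = {"#", "&"}  # first and second preset symbols


def phrase_search_buffer(compressed, n):
    """Return the last n phrases of the compressed character sequence."""
    # Each symbol is a phrase; each run of letters is a candidate phrase.
    phrases = re.findall(r"[#&]|[a-z]+", compressed)
    # Assumption: drop a trailing letter run that no symbol has closed yet.
    if phrases and phrases[-1] not in SYMBOLS:
        phrases.pop()
    return phrases[-n:]
```

For instance, `phrase_search_buffer("#i#have#a#", 3)` keeps only the three most recent phrases, which shows how a buffer of a few entries can still reach back across many characters when the entries are whole words.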
5. Searching in the phrase searching buffer area to obtain the maximum matching item of the preceding buffer area, and obtaining an output result and a sliding quantity according to the maximum matching item, wherein the method specifically comprises the following steps:
(1) Obtain all character strings of the look-ahead buffer, specifically: divide the look-ahead buffer into several subsequences at the symbols, and record each subsequence and each symbol as a character string, thereby obtaining all character strings of the look-ahead buffer, the symbols being of two kinds, namely the first preset symbol and the second preset symbol. For example, the character strings of the look-ahead buffer "#, i, #, h, a, v, e, #" are "#", "i", "#", "have" and "#", respectively.
(2) Judge whether the phrase search buffer contains a phrase identical to the 1st character string of the look-ahead buffer. There are two cases:
Case 1: the phrase search buffer contains no phrase identical to the 1st character string of the look-ahead buffer; step (3) is executed.
Case 2: the phrase search buffer contains a phrase identical to the 1st character string of the look-ahead buffer; that phrase is obtained and is denoted as the i-th phrase in the phrase search buffer. If there exists an integer z, whose admissible range is limited by s, such that all phrases from the i-th phrase to the (i+z)-th phrase in the phrase search buffer are respectively identical to all character strings from the 1st character string to the (1+z)-th character string of the look-ahead buffer, the sequence formed by all character strings from the 1st character string to the (1+z)-th character string of the look-ahead buffer is recorded as the maximum match, where s denotes the number of character strings in the look-ahead buffer. The sequence number i of the phrase identical to the 1st character string is recorded as the offset, and the integer z is taken as the match length; the two-tuple formed by the offset and the match length is taken as the output result; the sum of the lengths of all character strings from the 1st character string to the (1+z)-th character string of the look-ahead buffer is taken as the slide amount; step 6 is performed.
(3) Judge whether the character search buffer contains a character identical to the 1st character of the look-ahead buffer. There are two cases:
Case 1: the character search buffer contains no character identical to the 1st character of the look-ahead buffer; the 1st character of the look-ahead buffer is taken as the maximum match, the maximum match is taken as the output result, and the number of characters in the maximum match is taken as the slide amount; step 6 is performed.
Case 2: the character search buffer contains a character identical to the 1st character of the look-ahead buffer; that character is obtained and is denoted as the j-th character in the character search buffer. If there exists an integer r such that all characters from the j-th character to the (j+r)-th character in the character search buffer are respectively identical to all characters from the 1st character to the (1+r)-th character of the look-ahead buffer, the sequence formed by all characters from the 1st character to the (1+r)-th character of the look-ahead buffer is recorded as the maximum match. The sequence number j of the character identical to the 1st character is recorded as the offset, and the integer r is taken as the match length; the two-tuple formed by the offset and the match length is taken as the output result; the match length is taken as the slide amount; step 6 is performed.
6. Obtain a new sliding window according to the sliding amount: slide the sliding window to the right by a length equal to the sliding amount, and take the sliding window after sliding as the new sliding window. Re-execute from step 3 with the new sliding window until the size of the new sliding window is smaller than the second preset length. The sequence formed by all output results, in the order obtained, is recorded as the compressed sequence of the translation sequence.
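One pass through the matching steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the symbols "#" and "&" are inferred from the example sequence, the helper names split_strings, longest_match and next_output are hypothetical, offsets are taken as 1-based positions inside the buffer passed in, and the matching length is taken as the number of matched items. It does not attempt to reproduce the offsets of Table 1.

```python
import re


def split_strings(text):
    """Split text into character strings: each symbol ('#' or '&', assumed)
    is one string, and each run of letters between symbols is one string."""
    return re.findall(r"[#&]|[^#&]+", text)


def longest_match(search, lookahead):
    """Longest prefix of `lookahead` that occurs inside `search`.

    Returns (offset, length): offset is the 1-based start in `search`,
    length is the number of matched items; (0, 0) means no match.
    Works on lists of phrases as well as on plain strings of characters.
    """
    best = (0, 0)
    for start in range(len(search)):
        n = 0
        while (n < len(lookahead) and start + n < len(search)
               and search[start + n] == lookahead[n]):
            n += 1
        if n > best[1]:
            best = (start + 1, n)
    return best


def next_output(phrase_buf, char_buf, lookahead):
    """One compression step: return (output_result, sliding_amount)."""
    strings = split_strings(lookahead)
    if strings[0] in phrase_buf:
        # Phrase-level match is tried first.
        offset, length = longest_match(phrase_buf, strings)
        slide = sum(len(s) for s in strings[:length])
        return (offset, length, "phrase"), slide
    if lookahead[0] not in char_buf:
        # Case 1: no matching character, output the literal character.
        return lookahead[0], 1
    # Case 2: character-level match.
    offset, length = longest_match(char_buf, lookahead)
    return (offset, length, "char"), length
```

For instance, with the phrase buffer ["#", "the", "#", "dog"] and the look-ahead buffer "#dog&", the sketch returns a phrase-level match of two phrases and a sliding amount of 4 characters.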
As shown in Table 1, compressing the translation sequence "#i#have#a#dog&the#dog" according to the above steps yields the compressed sequence "#, i, 7, 1, h, a, v, e, 6, 1, 5, 1, 4, 1, d, o, g, &, t, h, e, 7, 2".
TABLE 1
[Table 1 appears as an image in the original and is not reproduced here.]
In this embodiment, a relatively small search buffer is set for the LZ77 compression algorithm, which keeps the execution time of the compression algorithm short. By providing both a character search buffer and a phrase search buffer, the compression algorithm can exploit the locality of characters at short distances in the translation as well as the locality of characters at longer distances. As a result, the translation is compressed in a short execution time and with high compression efficiency. This resolves the conflict in the LZ77 compression algorithm, in which the size of the search buffer affects execution time and compression efficiency in opposite directions, achieves efficient storage of translation files, and ensures that the translation sequence can be obtained quickly and efficiently at the conference site.
S003, coding the compressed sequence according to binary sequences corresponding to all coding objects to obtain a coding sequence.
The compressed sequence obtained in step S002 consists of characters and two-tuples of an offset and a matching length. The characters comprise the first preset symbol, the second preset symbol and all English letters, and each offset and matching length is in essence an integer within a bounded range determined by the second preset length. Therefore, the objects to be encoded when encoding the compressed sequence comprise all characters and all integers in that range. Null characters carry no semantics and serve only to keep the look-ahead buffer complete; they are not encoded and therefore need not be considered.
The invention encodes the compressed sequence using binary sequences, each consisting of a number of 0s and 1s. The specific encoding process is as follows:
1. Obtain the binary sequences corresponding to all encoding objects.
All integers in the above range are decimal data, so the binary data of each integer must be obtained. Since the number of distinct integers is fixed by that range, the number of bits of binary data needed to represent any of these integers is the base-2 logarithm of that number, rounded up.
There are w = 28 characters in total, namely the first preset symbol, the second preset symbol and the 26 English letters. Therefore, only 5 bits of binary data (the base-2 logarithm of 28, rounded up) are needed to represent a character.
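The bit-width arithmetic above can be checked with a short Python sketch. The function name bits_needed is hypothetical; it uses integer arithmetic that gives the same result as rounding up the base-2 logarithm.

```python
def bits_needed(kinds):
    """Smallest number of bits that can distinguish `kinds` distinct values,
    i.e. the base-2 logarithm of `kinds` rounded up (for kinds >= 1)."""
    return (kinds - 1).bit_length()


# 28 characters (two preset symbols plus 26 letters) need 5 bits.
char_bits = bits_needed(28)
```

The same function gives the width of the integer field once the number of distinct offsets and matching lengths is known, for example 3 bits for 8 distinct integers.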
It should be noted that the binary data corresponding to an integer and the binary data corresponding to a character have different numbers of bits, and that integers and characters are mixed in the compressed sequence. In the coding sequence obtained by encoding the compressed sequence, the two kinds of binary data would therefore be intermixed and could not be told apart, so decoding would be impossible. For this reason, the invention sets a distinguishing bit in the binary sequences corresponding to integers and characters.
The integers comprise all offsets and matching lengths. In the compressed sequence, some offsets and matching lengths are obtained from the character search buffer and others from the phrase search buffer. To ensure that the coding sequence can be decoded correctly, an identification bit must also be set in the binary sequence corresponding to each integer.
In summary, all characters are represented by short binary sequences, each consisting of one distinguishing bit followed by the binary data of the character, and all characters in the compressed sequence are then encoded with these short binary sequences. The short binary sequence of each character is set as follows: the 1st bit of the short binary sequence is the distinguishing bit and is set to 0; any binary data of the bit width required for a character is used as the remaining bits of the short binary sequence. It should be noted that the short binary sequences corresponding to any two characters must be different.
All integers are represented by long binary sequences, each consisting of one distinguishing bit, the binary data of the integer and one identification bit, and all offsets and matching lengths in the compressed sequence are then encoded with these long binary sequences. The long binary sequence of each integer is set as follows: the 1st bit of the long binary sequence is the distinguishing bit and is set to 1; the binary data corresponding to the integer occupies the bits from the 2nd bit to the second-to-last bit; the last bit of the long binary sequence is the identification bit, which is set to 1 if the offset and matching length were obtained from the character search buffer and to 0 if they were obtained from the phrase search buffer.
It should be noted that the correspondence between all encoding objects and their binary sequences must be stored both at the translation end and at the terminal at the conference site.
2. Encode the compressed sequence according to the binary sequences corresponding to all encoding objects to obtain the coding sequence.
Obtain the short binary sequences corresponding to all characters in the compressed sequence and the long binary sequences corresponding to all offsets and matching lengths. The sequence formed by all of these short and long binary sequences, in order, is recorded as the coding sequence, which completes the encoding of the compressed sequence.
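The encoding of step S003 can be illustrated with the following Python sketch. It is a minimal sketch under assumptions, not the patent's own code: the alphabet order, the symbols "#" and "&", the function name encode and the parameter int_bits (the width of the integer field, which depends on the integer range) are all hypothetical choices. The coding sequence is represented as a text string of "0" and "1" for readability.

```python
ALPHABET = "#&abcdefghijklmnopqrstuvwxyz"  # 28 characters; '#' and '&' are assumed symbols
CHAR_BITS = 5                               # 28 characters fit in 5 bits


def encode(compressed, int_bits):
    """Encode a compressed sequence as a string of '0'/'1' characters.

    Each item is either a single character or a tuple
    (offset, length, from_char_buffer).
    """
    out = []
    for item in compressed:
        if isinstance(item, str):
            # Short sequence: distinguishing bit 0, then the character code.
            out.append("0" + format(ALPHABET.index(item), f"0{CHAR_BITS}b"))
            continue
        offset, length, from_char_buffer = item
        flag = "1" if from_char_buffer else "0"
        for value in (offset, length):
            if not 0 <= value < 2 ** int_bits:
                raise ValueError(f"{value} does not fit in {int_bits} bits")
            # Long sequence: distinguishing bit 1, integer, identification bit.
            out.append("1" + format(value, f"0{int_bits}b") + flag)
    return "".join(out)
```

With int_bits set to 3, the character "a" becomes "000010", and a character-buffer match with offset 7 and matching length 1 becomes "11111" followed by "10011".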
S004, the coding sequence is sent to a terminal on the conference site, and a translation sequence is obtained by decoding and decompressing the coding sequence.
The coding sequence obtained at the translation end is sent in real time to the terminal at the conference site, which obtains the translation sequence by decoding and decompressing the coding sequence, so that the conference site can obtain the translation sequence quickly and efficiently. All English words are split apart at the first preset symbols in the translation sequence and all sentences at the second preset symbols, and the resulting sequence is recorded as the translation file. The terminal at the conference site converts the translation file into audio and transmits both the translation file and the audio to all participants.
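The splitting of a recovered translation sequence into sentences and words can be sketched as follows. The symbols "#" (end of word) and "&" (end of sentence) are assumptions inferred from the example sequence, and the function name split_translation is hypothetical.

```python
def split_translation(sequence, word_symbol="#", sentence_symbol="&"):
    """Split a translation sequence into sentences, each a list of words."""
    sentences = []
    for chunk in sequence.split(sentence_symbol):
        # Drop the empty pieces produced by leading or doubled symbols.
        words = [w for w in chunk.split(word_symbol) if w]
        if words:
            sentences.append(words)
    return sentences
```

Applied to "#i#have#a#dog&the#dog", the sketch yields two sentences, the first with four words and the second with two.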
1. Decode the coding sequence according to the correspondence between the encoding objects and the binary sequences to obtain the compressed sequence.
(1) Read the 1st bit of the coding sequence. In the correspondence between the encoding objects and the binary sequences, the 1st bit is the distinguishing bit both in the short binary sequence of a character and in the long binary sequence of an integer. Therefore, if the 1st bit of the coding sequence is 0, the leading bits of the coding sequence, as many as the length of a short binary sequence, are taken as a short binary sequence, and the character corresponding to it is looked up in the correspondence. If the 1st bit of the coding sequence is 1, the leading bits of the coding sequence, as many as the length of a long binary sequence, are taken as a long binary sequence, and the integer corresponding to it is looked up in the correspondence.
(2) Remove the obtained short or long binary sequence from the coding sequence to obtain a new coding sequence.
(3) Re-execute from step (1) with the new coding sequence until the new coding sequence is empty.
The sequence formed by all obtained characters and integers, in order, is recorded as the compressed sequence.
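The decoding loop above can be sketched in Python as follows. As before, this is only a sketch under assumptions: the alphabet order, the symbols "#" and "&", the function name decode and the parameter int_bits are hypothetical, and the coding sequence is a text string of "0" and "1". Instead of repeatedly removing the decoded prefix, the sketch advances a read position, which has the same effect.

```python
ALPHABET = "#&abcdefghijklmnopqrstuvwxyz"  # must match the table used for encoding
CHAR_BITS = 5


def decode(bits, int_bits):
    """Turn a coding sequence back into characters and integers.

    Characters are returned as one-character strings; each integer is
    returned as (value, from_char_buffer), where the flag is taken from
    the identification bit.
    """
    items = []
    pos = 0
    while pos < len(bits):
        if bits[pos] == "0":
            # Short sequence: 1 distinguishing bit + CHAR_BITS code bits.
            code = bits[pos + 1:pos + 1 + CHAR_BITS]
            items.append(ALPHABET[int(code, 2)])
            pos += 1 + CHAR_BITS
        else:
            # Long sequence: 1 distinguishing bit + integer + 1 identification bit.
            value = int(bits[pos + 1:pos + 1 + int_bits], 2)
            from_char_buffer = bits[pos + 1 + int_bits] == "1"
            items.append((value, from_char_buffer))
            pos += 2 + int_bits
    return items
```

Two consecutive integers in the result form one two-tuple of offset and matching length, and both carry the same identification flag.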
2. Decompressing the compressed sequence to obtain a translation sequence.
(1) Set an empty sequence as the translation sequence, and take the sequence of the leading characters of the translation sequence, of the first preset length, as the sliding window.
(2) Take the sequence of the leading characters of the sliding window, of the second preset length, as the character search buffer, and take the sequence of the remaining characters at the rear of the sliding window as the look-ahead buffer.
(3) Record a sequence formed by all English letters between two symbols in the translation sequence as a phrase, and also record each symbol as a phrase, where the symbols are of two kinds, namely the first preset symbol and the second preset symbol. Arrange all phrases obtained in order, and take the sequence of the last phrases, as many as the second preset length, as the phrase search buffer.
(4) For the v-th item in the compressed sequence: if it is a character, append the character to the end of the translation sequence. If it is an integer, take it together with the adjacent integer on its right as a two-tuple, the first integer of the two-tuple being the offset p and the second integer being the matching length q. If the identification bit in the long binary sequences corresponding to the offset and the matching length is 0, append the character string formed by the p-th phrase to the (p+q)-th phrase in the phrase search buffer to the end of the translation sequence, and take the length of that character string as the sliding amount. If the identification bit is 1, append the character string formed by the p-th character to the (p+q)-th character in the character search buffer to the end of the translation sequence, and take the matching length q as the sliding amount.
(5) Slide the sliding window to the right by a length equal to the sliding amount, and take the sliding window after sliding as the new sliding window. Re-execute from step (2) with the new sliding window until the size of the new sliding window is smaller than the second preset length. The translation sequence at that point is the decompression result of the compressed sequence.
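The decompression steps can be approximated by the following Python sketch. It simplifies the procedure under explicit assumptions: no null padding is used, both buffers are taken as the most recent buf_len items of the text recovered so far, offsets are 1-based, the matching length counts matched items, and the symbols "#" and "&" are inferred from the example sequence. The names decompress, split_strings and buf_len are hypothetical, and the sketch only works with a compressor that follows the same conventions.

```python
import re


def split_strings(text):
    """Split text into phrases: each symbol, and each run of letters."""
    return re.findall(r"[#&]|[^#&]+", text)


def decompress(compressed, buf_len):
    """Rebuild the translation sequence from a compressed sequence.

    Items are single characters or tuples (offset, length, source),
    where source is "phrase" or "char".
    """
    text = ""
    for item in compressed:
        if isinstance(item, str):
            text += item                      # literal character
            continue
        offset, length, source = item
        if source == "phrase":
            # Phrase search buffer: the last buf_len phrases recovered so far.
            phrases = split_strings(text)[-buf_len:]
            text += "".join(phrases[offset - 1:offset - 1 + length])
        else:
            # Character search buffer: the last buf_len characters so far.
            chars = text[-buf_len:]
            text += chars[offset - 1:offset - 1 + length]
    return text
```

For example, after the literals of "#the#dog&", a phrase-level two-tuple with offset 1 and matching length 4 copies "#the#dog" to the end of the recovered text.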
The invention keeps the execution time of the compression algorithm short by setting a relatively small search buffer for the LZ77 compression algorithm. By providing both a character search buffer and a phrase search buffer, the compression algorithm can exploit the locality of characters at short distances in the translation as well as the locality of characters at longer distances. As a result, the translation is compressed in a short execution time and with high compression efficiency. This resolves the conflict in the LZ77 compression algorithm, in which the size of the search buffer affects execution time and compression efficiency in opposite directions, achieves efficient storage of translation files, and ensures that the translation sequence can be obtained quickly and efficiently at the conference site.
It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

Claims (8)

1. The method for efficiently storing big data based on the translation file is characterized by comprising the following steps:
obtaining a translation file, and preprocessing the translation file to obtain a translation sequence;
compressing the translation sequence in combination with the character search buffer and the phrase search buffer to obtain a compressed sequence, comprising:
S1, padding null characters of a second preset length in front of the translation sequence, taking the sequence consisting of the leading characters of the translation sequence, of a first preset length, as a sliding window, and setting an empty sequence as a compressed character sequence;
S2, taking the sequence formed by the leading characters in the sliding window, of the second preset length, as a character search buffer, and taking the sequence formed by the remaining characters in the sliding window as a look-ahead buffer;
S3, obtaining a phrase search buffer according to the compressed character sequence;
S401, obtaining all character strings of the look-ahead buffer, and judging whether the phrase search buffer contains a phrase identical to the 1st character string of the look-ahead buffer; if not, executing S402; if so, searching in the phrase search buffer to obtain the maximum matching item of the look-ahead buffer, obtaining an output result and a sliding amount according to the maximum matching item, and executing S5;
S402, searching in the character search buffer to obtain the maximum matching item of the look-ahead buffer, obtaining an output result and a sliding amount according to the maximum matching item, and executing S5;
S5, sliding the sliding window according to the sliding amount, and taking the sliding window after sliding as a new sliding window; re-executing from S2 with the new sliding window until the size of the new sliding window is smaller than the second preset length; recording the sequence formed by all obtained output results, in order, as the compressed sequence of the translation sequence;
obtaining binary sequences corresponding to all the coding objects, coding the compressed sequences according to the binary sequences corresponding to all the coding objects to obtain coding sequences, temporarily storing the obtained coding sequences, and sending the obtained coding sequences to a terminal on a conference site.
2. The method for efficiently storing big data based on a translation file according to claim 1, wherein the step of preprocessing the translation file to obtain a translation sequence comprises the following specific steps:
the method comprises the steps of taking a first preset symbol as an identifier of the end of an English word, taking a second preset symbol as an identifier of the end of a sentence, replacing the identifiers of the end of all English words in a translation file with the first preset symbol, replacing the identifiers of the end of all sentences in the translation file with the second preset symbol, adding a first preset symbol at the beginning of the translation file, taking the first preset symbol, the second preset symbol and all English letters as characters, and recording a sequence formed by all the characters contained in the translation file according to a sequence as a translation sequence.
3. The method for efficiently storing big data based on a translation file according to claim 1, wherein the step of obtaining a phrase search buffer according to a compressed character sequence comprises the following specific steps:
recording a sequence formed by all English letters between two symbols in the compressed character sequence as a phrase, and also recording each symbol in the compressed character sequence as a phrase, wherein the symbols are of two kinds, namely the first preset symbol and the second preset symbol; arranging all phrases obtained from the compressed character sequence in order, and taking the sequence consisting of the last phrases, as many as the second preset length, as the phrase search buffer for the padded translation sequence.
4. The method for efficiently storing big data based on a translation file according to claim 1, wherein the step of obtaining all the strings in the look-ahead buffer comprises the following specific steps:
dividing the look-ahead buffer into a plurality of subsequences according to symbols, marking each subsequence and each symbol as a character string, and obtaining all character strings of the look-ahead buffer, wherein the symbols comprise two kinds of symbols, namely a first preset symbol and a second preset symbol.
5. The method for efficiently storing big data based on a translation file according to claim 1, wherein the searching in the phrase searching buffer area obtains the largest matching item of the look-ahead buffer area, and the outputting result and the sliding amount are obtained according to the largest matching item, comprising the following specific steps:
obtaining, in the phrase search buffer, the phrase identical to the 1st character string of the look-ahead buffer, this phrase being the i-th phrase in the phrase search buffer; if there is an integer z, bounded in terms of s, such that all phrases from the i-th phrase to the (i+z)-th phrase in the phrase search buffer are identical, in order, to all character strings from the 1st character string to the (1+z)-th character string of the look-ahead buffer, recording the sequence formed by all character strings from the 1st to the (1+z)-th character string of the look-ahead buffer as the maximum matching item, wherein s denotes the number of all character strings in the look-ahead buffer; recording the serial number i of the phrase in the phrase search buffer that is identical to the 1st character string of the look-ahead buffer as the offset, and taking the integer z as the matching length; taking the two-tuple consisting of the offset and the matching length as the output result; and taking the sum of the lengths of all character strings from the 1st to the (1+z)-th character string of the look-ahead buffer as the sliding amount.
6. The method for efficiently storing big data based on a translation file according to claim 1, wherein the searching in the character search buffer to obtain the maximum matching item of the look-ahead buffer, and the obtaining of the output result and the sliding amount according to the maximum matching item, comprise the following specific steps:
judging whether the character search buffer contains a character identical to the 1st character of the look-ahead buffer:
if the character search buffer contains no character identical to the 1st character of the look-ahead buffer, taking the 1st character of the look-ahead buffer as the maximum matching item, taking the maximum matching item as the output result, and taking the number of characters in the maximum matching item as the sliding amount;
if the character search buffer contains a character identical to the 1st character of the look-ahead buffer, obtaining that character, which is the j-th character in the character search buffer; if there is an integer r, bounded in terms of the first preset length and the second preset length, such that all characters from the j-th character to the (j+r)-th character in the character search buffer are identical, in order, to all characters from the 1st character to the (1+r)-th character of the look-ahead buffer, recording the sequence formed by all characters from the 1st to the (1+r)-th character of the look-ahead buffer as the maximum matching item; recording the serial number j as the offset, and taking the integer r as the matching length; taking the two-tuple consisting of the offset and the matching length as the output result; and taking the matching length as the sliding amount.
7. The method for efficiently storing big data based on a translation file according to claim 1, wherein the step of obtaining binary sequences corresponding to all the encoded objects comprises the following specific steps:
recording the first preset symbol, the second preset symbol and all English letters as characters, and recording all characters and all integers in a range determined by the second preset length as encoding objects;
representing all characters by short binary sequences whose length is determined by w, wherein w denotes the number of kinds of all characters, the short binary sequences corresponding to all characters being set as follows: the 1st bit of the short binary sequence is recorded as a distinguishing bit and set to 0; any binary data of the bit width required for w kinds of characters is used as the remaining bits of the short binary sequence, and the short binary sequences corresponding to any two characters are different;
representing all integers in said range by long binary sequences, the long binary sequences corresponding to all integers being set as follows: the 1st bit of the long binary sequence is recorded as a distinguishing bit and set to 1; the binary data corresponding to the integer is used as the bits from the 2nd bit to the second-to-last bit of the long binary sequence; the last bit of the long binary sequence is recorded as an identification bit, which is set to 1 if the offset and the matching length are obtained from the character search buffer and set to 0 if the offset and the matching length are obtained from the phrase search buffer.
8. The method for efficiently storing big data based on a translation file according to claim 7, wherein the encoding the compressed sequence according to the binary sequences corresponding to all the encoding objects to obtain the encoded sequence comprises the following specific steps:
and obtaining short binary sequences corresponding to all characters in the compressed sequence and long binary sequences corresponding to all offsets and matching lengths, and recording the sequences formed by all the obtained short binary sequences and all the long binary sequences according to the sequence as a coding sequence.
CN202310300380.XA 2023-03-27 2023-03-27 Efficient big data storage method based on translation file Active CN116032292B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310300380.XA CN116032292B (en) 2023-03-27 2023-03-27 Efficient big data storage method based on translation file

Publications (2)

Publication Number Publication Date
CN116032292A true CN116032292A (en) 2023-04-28
CN116032292B CN116032292B (en) 2023-06-09

Family

ID=86089448

Country Status (1)

Country Link
CN (1) CN116032292B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1992002989A1 (en) * 1990-08-09 1992-02-20 Telcor Systems Corporation Compounds adaptive data compression system
US20020140583A1 (en) * 2000-12-22 2002-10-03 Cilys 53 Inc. System and method for compressing and decompressing data in real time
JP2007257188A (en) * 2006-03-22 2007-10-04 Casio Comput Co Ltd Dictionary search device and its control program
CN202931289U (en) * 2012-11-14 2013-05-08 无锡芯响电子科技有限公司 Hardware LZ 77 compression implement system
JP2013162474A (en) * 2012-02-08 2013-08-19 Tamura Seisakusho Co Ltd Data compression method and device
US20170373702A1 (en) * 2016-06-22 2017-12-28 Fujitsu Limited Data compression device and data decompression device
US20180102789A1 (en) * 2016-10-06 2018-04-12 Fujitsu Limited Computer-readable recording medium, encoding apparatus, and encoding method
CN108768403A (en) * 2018-05-30 2018-11-06 中国人民解放军战略支援部队信息工程大学 Lossless data compression, decompressing method based on LZW and LZW encoders, decoder
US20220121770A1 (en) * 2020-10-19 2022-04-21 Duality Technologies, Inc. Efficient secure string search using homomorphic encryption
CN114567331A (en) * 2022-01-29 2022-05-31 山东云海国创云计算装备产业创新中心有限公司 LZ 77-based compression method, device and medium thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
D. R. VASANTHI, R. ANUSHA AND B. K. VINAY: "Implementation of Robust Compression Technique Using LZ77 Algorithm on Tensilica's Xtensa Processor", 2016 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY (ICIT), pages 148 - 153 *
满天星: "Improved search algorithm on LZ-series compressed text" (改进的LZ系列压缩文本上的搜索算法), China Master's Theses Full-text Database, Information Science and Technology, pages 138 - 353 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116683916A (en) * 2023-08-03 2023-09-01 山东五棵松电气科技有限公司 Disaster recovery system of data center
CN116683916B (en) * 2023-08-03 2023-10-10 山东五棵松电气科技有限公司 Disaster recovery system of data center

Also Published As

Publication number Publication date
CN116032292B (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant