CN105893358A - A real-time compression method for files - Google Patents

Info

Publication number
CN105893358A
Authority
CN
China
Prior art keywords
file
node
character
weights
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410464824.4A
Other languages
Chinese (zh)
Inventor
陈宏庆
顾永青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Gm-Winlead Intelligent Technology Co Ltd
Original Assignee
Jiangsu Gm-Winlead Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-09-12
Filing date
2014-09-12
Publication date
2016-08-24
Application filed by Jiangsu Gm-Winlead Intelligent Technology Co Ltd
Priority to CN201410464824.4A
Publication of CN105893358A
Legal status: Pending

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to real-time compression of files during network transmission. In the prior art, a file transmitted over a network is compressed only after it has been completely received and stored in local storage: the file is scanned once to gather statistics on the probability distribution of its characters and to derive a suitable encoding strategy, then scanned a second time to generate a compression code for each character and write it to a compressed file, and finally the original file is deleted. Network transmission and the two scans take considerable time, and because the original file and the compressed file coexist during compression, they occupy additional local storage. The invention provides a method for rapidly compressing a file as it is transmitted over the network: the compressed file is complete as soon as the transfer finishes, and the compression process occupies no additional storage space.

Description

A real-time compression method for files
I. Technical field
The present invention applies to file compression and decompression in network data transmission. It mainly addresses the problem of compressing file data while the file is being transmitted over a network.
II. Background
Traditionally, a file transmitted over a network is compressed only after the transfer has finished and the file has been saved to local storage. The file is then scanned to count how often each character occurs, and a suitable coding strategy is derived from that probability distribution. A second scan applies the compression code to each character and writes the result to a compressed file, after which the original file is deleted. Network transmission and the two scans take a large amount of time, and the original file and the compressed file exist simultaneously during compression, which occupies a large amount of storage if the file is large. The main purpose of the present invention is to provide a method that compresses a network-transmitted file quickly and does not occupy additional storage space during compression.
III. Summary of the invention
A sender transmits a file to a receiver, and the receiver compresses the file as it receives it. By the time the file has been sent completely, the compressed file has been generated as well; no temporary file is produced and no additional storage space is occupied during compression.
1. The sender and the receiver communicate over two TCP connections: one carries instructions and the other carries data.
2. The sender sends a file transfer request over the instruction connection. The request message has the following format:
type, fid, file-size, filename-length, filename
type: 8 bits, unsigned; the message type. A transfer request has type 1.
fid: 64 bits, unsigned; the unique identifier of the file being transferred. After the instruction TCP connection has been established, fid is initialized to a random number in the range 0 to 2^64-1. It is incremented by one for each file transferred and wraps around to 0 after reaching 2^64-1.
file-size: 64 bits, unsigned; the file size.
filename-length: 16 bits, unsigned; the length of the file name.
filename: character string with no fixed length; the file name.
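The request layout above can be sketched in Python as follows. This is only an illustration under stated assumptions: the text does not specify byte order or file name encoding, so big-endian (network order) and UTF-8 are assumed here, and the function names pack_transfer_request and next_fid are hypothetical.

```python
import struct

MSG_TRANSFER_REQUEST = 1  # type value given in the text


def pack_transfer_request(fid: int, file_size: int, filename: str) -> bytes:
    """Serialise type, fid, file-size, filename-length, filename."""
    name = filename.encode("utf-8")  # assumption: UTF-8 file names
    if len(name) > 0xFFFF:
        raise ValueError("file name does not fit the 16-bit length field")
    # ">BQQH": big-endian (assumed) u8 type, u64 fid, u64 size, u16 name length
    header = struct.pack(">BQQH", MSG_TRANSFER_REQUEST, fid, file_size, len(name))
    return header + name


def next_fid(fid: int) -> int:
    """Advance the identifier by one, wrapping to 0 after 2**64 - 1."""
    return (fid + 1) % (1 << 64)


msg = pack_transfer_request(fid=7, file_size=6, filename="a.txt")
```

Under these assumptions the fixed part of the request is 19 bytes, followed by the file name bytes.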
3. The receiver replies to the request. A normal reply (ready to receive) has the following format:
type, fid
type: 8 bits, unsigned; the message type. A ready-to-receive message has type 2.
fid: 64 bits, unsigned; the unique identifier of the file to be received.
An error reply has the following format:
type, fid, code
type: 8 bits, unsigned; the message type. An error message has type 3.
fid: 64 bits, unsigned; the unique identifier of the file to be received.
code: 8 bits, unsigned; the error reason code, as listed in the following table:
code  Meaning
1     Insufficient storage space
2     File is being transferred
3     No write permission
4     Other reasons
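A receiver's reply could be decoded along the following lines. This is a sketch, not the patent's implementation: big-endian byte order is assumed, and the names ErrorCode and parse_reply are hypothetical.

```python
import struct
from enum import IntEnum

MSG_READY = 2  # ready-to-receive reply
MSG_ERROR = 3  # error reply


class ErrorCode(IntEnum):
    """Reason codes from the table above."""
    INSUFFICIENT_STORAGE = 1
    FILE_BEING_TRANSFERRED = 2
    NO_WRITE_PERMISSION = 3
    OTHER = 4


def parse_reply(data: bytes):
    """Return (type, fid, error); error is None for a ready reply."""
    msg_type, fid = struct.unpack_from(">BQ", data, 0)  # assumed big-endian
    if msg_type == MSG_READY:
        return msg_type, fid, None
    if msg_type == MSG_ERROR:
        (code,) = struct.unpack_from(">B", data, 9)  # code follows the 9-byte prefix
        return msg_type, fid, ErrorCode(code)
    raise ValueError(f"unexpected reply type {msg_type}")
```

Mapping the code to an enumeration makes an unknown reason code fail loudly instead of being silently accepted.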
4. The receiver creates the compressed file in local storage. Its name is the original file name with ".gmf" appended, and the file begins with the following content:
file-size, filename-length, filename
file-size: 64 bits, unsigned; the size of the original file.
filename-length: 16 bits, unsigned; the length of the file name.
filename: character string with no fixed length; the file name.
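The header at the start of the .gmf file might be written and read back as in the sketch below. Byte order and name encoding are again assumptions (big-endian, UTF-8), and the helper names are hypothetical.

```python
import io
import struct


def compressed_name(original: str) -> str:
    """The compressed file keeps the original name and appends .gmf."""
    return original + ".gmf"


def write_gmf_header(out, file_size: int, filename: str) -> None:
    """Write file-size (u64), filename-length (u16) and the file name."""
    name = filename.encode("utf-8")  # assumption: UTF-8 file names
    out.write(struct.pack(">QH", file_size, len(name)))
    out.write(name)


def read_gmf_header(src):
    """Read the header back and return (file_size, filename)."""
    file_size, name_len = struct.unpack(">QH", src.read(10))
    return file_size, src.read(name_len).decode("utf-8")


buf = io.BytesIO()
write_gmf_header(buf, 6, "a.txt")
buf.seek(0)
header = read_gmf_header(buf)
```

Storing the original size in the header lets a decompressor know when to stop, since the encoded data has no end marker of its own.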
5. The receiver initializes the code tree. The tree consists of a single empty leaf node whose symbol is TERM, whose weight is always 0, and whose number is 1024.
6. The sender starts sending data over the data connection, and the receiver receives it. Each time the receiver reads a character, it checks whether that character is already present in the code tree:
1) If the character is not present, it is encoded as the path from the root of the tree to the TERM node, writing 0 for each step to a left child and 1 for each step to a right child, followed by the character itself. The resulting code is written to the compressed file. A new subtree is then generated and replaces the original TERM node. The parent node of this subtree has an empty symbol, weight 1 (the weight 0 of TERM plus the weight 1 of the newly added symbol node), and the number that the TERM node originally had. Its right child is the character just read: the node's symbol is that character, its weight is 1, and its number is the parent's number minus 1. Its left child is a new empty leaf node TERM whose number is the parent's number minus 2. Because a new node has been added, the weights of the nodes on the path to the root must be adjusted, in ascending order of node number. Before a node's weight is modified, the current node is swapped with the highest-numbered node in the block of nodes that have the same weight (symbols and weights are exchanged, numbers are not). The weight is then increased by 1, and the parent of that highest-numbered node becomes the new current node. This continues until the root node is reached.
2) If the character is present, it is encoded as the path from the root of the tree to the character's node, writing 0 for each step to a left child and 1 for each step to a right child. The resulting code is written to the compressed file. The weights of the nodes are then adjusted in ascending order of node number: before a node's weight is modified, the current node is swapped with the highest-numbered node in the block of nodes that have the same weight, and the parent of that highest-numbered node becomes the new current node. This continues until the root node is reached.
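The tree procedure in steps 5 and 6 can be sketched in Python as below. This is an illustrative reading of the text, not a definitive implementation. The class and method names are hypothetical. For readability, encode returns the code as text that mixes bit characters with the raw character, the same notation the description uses, where a real implementation would pack bits. A node's own parent is skipped when looking for the swap partner; that is an assumption inferred from the swaps the description performs, because the rule itself is not stated explicitly.

```python
TERM = object()  # sentinel symbol for the empty leaf


class Node:
    def __init__(self, num, weight=0, sym=None, parent=None):
        self.num, self.weight, self.sym, self.parent = num, weight, sym, parent
        self.left = self.right = None


class AdaptiveTree:
    def __init__(self, root_num=1024):
        self.root = Node(root_num, sym=TERM)
        self.nodes = [self.root]       # every node in the tree
        self.leaf = {TERM: self.root}  # symbol -> leaf node holding it

    def path(self, node):
        """Bits from the root to node: 0 for a left child, 1 for a right child."""
        bits = ""
        while node.parent is not None:
            bits = ("0" if node.parent.left is node else "1") + bits
            node = node.parent
        return bits

    def _swap(self, a, b):
        """Exchange contents (symbol, weight, children); numbers stay in place."""
        a.sym, b.sym = b.sym, a.sym
        a.weight, b.weight = b.weight, a.weight
        a.left, b.left = b.left, a.left
        a.right, b.right = b.right, a.right
        for n in (a, b):
            for child in (n.left, n.right):
                if child is not None:
                    child.parent = n
            if n.sym is not None:
                self.leaf[n.sym] = n

    def _update(self, node):
        """Swap with the block leader, add 1 to the weight, move to the parent."""
        while node is not None:
            block = [n for n in self.nodes
                     if n.weight == node.weight and n is not node.parent]
            leader = max(block, key=lambda n: n.num)
            if leader is not node:
                self._swap(node, leader)
                node = leader
            node.weight += 1
            node = node.parent

    def encode(self, sym):
        """Return the code for sym, then update the tree."""
        if sym in self.leaf:
            node = self.leaf[sym]
            code = self.path(node)
            self._update(node)
            return code
        term = self.leaf[TERM]
        code = self.path(term) + sym  # path to TERM, then the raw character
        term.sym = None               # the old TERM becomes the subtree's parent
        term.right = Node(term.num - 1, 1, sym, term)
        term.left = Node(term.num - 2, 0, TERM, term)
        self.nodes += [term.right, term.left]
        self.leaf[sym], self.leaf[TERM] = term.right, term.left
        self._update(term)
        return code


tree = AdaptiveTree()
encoded = "".join(tree.encode(ch) for ch in "abccab")
print(encoded)  # a0b00c101101101
```

Scanning every node to find the block leader keeps the sketch short; it costs time proportional to the tree size per update, which is acceptable for a tree of at most a few hundred nodes.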
7. After the sender has sent all the data, it sends a transfer-complete message over the instruction TCP connection. The message has the following format:
type, fid
type: 8 bits, unsigned; the message type. A transfer-complete message has type 4.
fid: 64 bits, unsigned; the unique identifier of the file that has been sent.
8. After receiving the transfer-complete message, the receiver normally returns a successful-reception message with the following format:
type, fid
type: 8 bits, unsigned; the message type. A successful-reception message has type 5.
fid: 64 bits, unsigned; the unique identifier of the file whose reception is confirmed.
An error reply has the following format:
type, fid, code
type: 8 bits, unsigned; the message type. An error message has type 6.
fid: 64 bits, unsigned; the unique identifier of the file whose reception failed.
code: 8 bits, unsigned; the error reason code, as listed in the following table:
code  Meaning
1     Insufficient storage space
2     File is being transferred
3     Received file data is incomplete
4     Other reasons
The following illustrates how the compressed code is generated and how a file is decompressed. Suppose the original file is named a.txt and its binary content is the bytes abccab. The compression steps are as follows:
1. The receiver initializes the code tree. The tree consists of a single empty leaf node with symbol TERM, initial weight 0, and number 1024. Part a of Figure 1 shows the code tree after this step.
2. The first byte read is a. The tree does not contain a, so the code for a is the path from the root to the TERM node followed by the character a itself. Because the tree contains only the TERM node, the path is empty and the code is simply a, which is written as compressed output. A new subtree is then generated and replaces the original TERM node. Its parent node has an empty symbol and keeps the original TERM number, 1024. Its right child holds the character a, with weight 1 and number 1023 (the parent's number minus 1). Its left child is a new empty leaf node TERM with number 1022 (the parent's number minus 2). The weight of the root node increases, but the root has the highest number, so no swap is performed; its weight is set to 1 (the weight 0 of TERM plus the weight 1 of node a). Part b of Figure 1 shows the code tree after this step.
3. The second byte read is b. The tree does not contain b, so the code is the path from the root to the TERM node, 0, followed by the character b itself; 0b is written as compressed output. A subtree containing a new TERM node and the character b replaces the old TERM node, and the weight of the root node is increased by 1. Part c of Figure 1 shows the code tree after this step.
4. The third byte read is c. The tree does not contain c, so c is encoded as 00c. A subtree containing a new TERM node and the character c replaces the old TERM node; part d of Figure 1 shows the code tree after this step. The weight of node 1022 must now be increased by 1. The nodes in the block with the same weight as node 1022 are 1023, 1021 and 1019, of which 1023 has the highest number, so nodes 1022 and 1023 are swapped and the weight of node 1023 is increased by 1. The parent of node 1023 then becomes the current node; because it is the root, its weight is simply increased by 1. Part e of Figure 1 shows the code tree after this step.
5. The fourth byte read is c. The tree contains c, and c is encoded as 101. The weight of node 1019 must be increased by 1. Apart from its parent, the nodes with the same weight as node 1019 are 1021 and 1022, of which 1022 has the highest number, so nodes 1022 and 1019 are swapped and the weight of node 1022 is then increased by 1. The parent of node 1022 becomes the current node; because it is the root, its weight is simply increased by 1. Part f of Figure 1 shows the code tree after this step.
6. The fifth byte read is a. The tree contains a, and a is encoded as 101. The weight of node 1019 must be increased by 1. Apart from its parent, node 1021 is the only node with the same weight as node 1019, and 1021 is higher than 1019, so nodes 1021 and 1019 are swapped and the weight of node 1021 is then increased by 1. Part g of Figure 1 shows the code tree after this step. The weight of node 1023, the parent of node 1021, must be adjusted next. The block with the same weight as node 1023 (weight 2) contains node 1022, but 1022 is lower than 1023, so no swap is performed and the weight of node 1023 is increased by 1 to 3. The weight of node 1024 is adjusted next; because it is the root, its weight is simply increased by 1 to 5. Part h of Figure 1 shows the code tree after this step.
7. The sixth byte read is b. The tree contains b, and b is encoded as 101. The weight of node 1019 must be increased by 1. Apart from its parent, no node has the same weight (1) as node 1019, so no swap is needed and the weight of node 1019 is increased by 1 to 2. The weight of node 1020 must be increased next; no other node has the same weight (1) as node 1020, so no swap is needed and its weight is increased by 1 to 2. The weight of node 1023 must be increased next; no other node has the same weight as node 1023, so no swap is needed and its weight is increased by 1 to 4. Finally the weight of node 1024 must be increased; because it is the root, its weight is simply increased by 1 to 6. Part i of Figure 1 shows the code tree after this step.
8. The final compressed code for abccab is a0b00c101101101, and the file is named a.txt.gmf. Storing the original content requires 48 bits; after compression only 36 bits are needed.
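The 48-bit and 36-bit figures in step 8 can be checked with a few lines of Python. The sketch assumes that each literal character costs 8 bits and each path digit costs 1 bit, and it counts only the encoded content, not the file header written at the start of the .gmf file. The token list and the helper name token_bits are illustrative.

```python
# One token per input byte, in the notation of the worked example.
tokens = ["a", "0b", "00c", "101", "101", "101"]


def token_bits(token: str) -> int:
    """Path digits cost 1 bit each; a literal character costs 8 bits."""
    return sum(1 if ch in "01" else 8 for ch in token)


compressed_bits = sum(token_bits(t) for t in tokens)  # 8 + 9 + 10 + 3 + 3 + 3
original_bits = 8 * len("abccab")                     # 6 bytes of 8 bits

print(original_bits, compressed_bits)  # 48 36
```

The saving comes entirely from the three repeated characters, each of which is written as a 3-bit path instead of an 8-bit byte.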
The compressed file is decompressed as follows; the expected original binary content is abccab:
1. The original file size, 6 bytes, and the original file name, a.txt, are read from the compressed file, and an empty file is created under the original file name.
2. The code tree is initialized. The tree consists of a single empty leaf node with symbol TERM, initial weight 0, and number 1024. Part a of Figure 1 shows the code tree after this step.
3. The first byte of the compressed code is read. It is the character a, which is written to the original file. The character a is then added to the tree in the same way as during compression. Part b of Figure 1 shows the code tree after this step.
4. The bit 0 is read from the compressed code, which leads to the TERM node. One byte is then read from the compressed code; it is the character b. The character b is written to the original file and added to the tree. Part c of Figure 1 shows the code tree after this step.
5. The bit 0 is read from the compressed code, which leads to node 1022; another 0 is read, which leads to the TERM node. One byte is then read from the compressed code; it is the character c. The character c is written to the original file and added to the tree. Part e of Figure 1 shows the code tree after this step.
6. The bits 101 are read from the compressed code, which leads to node 1019, whose character is c. The character c is written to the original file, and the tree is updated in the same way as during compression, starting with the weight of node 1019. Part f of Figure 1 shows the code tree after this step.
7. The bits 101 are read from the compressed code, which leads to node 1019, whose character is a. The character a is written to the original file, and the tree is updated in the same way as during compression, starting with the weight of node 1019. Part h of Figure 1 shows the code tree after this step.
8. The bits 101 are read from the compressed code, which leads to node 1019, whose character is b. The character b is written to the original file, and the tree is updated in the same way as during compression, starting with the weight of node 1019. Part i of Figure 1 shows the code tree after this step.
9. Decompression is then complete. The original binary content is abccab, as expected.
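The decompression steps can be sketched as a small Python decoder that rebuilds the same tree while it reads the code. As before, this is an illustration under assumptions rather than the patent's implementation: the input is the textual notation of the worked example (bit characters mixed with literal characters), the function names are hypothetical, and a node's own parent is skipped when choosing the swap partner.

```python
TERM = object()  # sentinel symbol for the empty leaf


class Node:
    def __init__(self, num, weight=0, sym=None, parent=None):
        self.num, self.weight, self.sym, self.parent = num, weight, sym, parent
        self.left = self.right = None


def swap(a, b):
    """Exchange contents (symbol, weight, children); numbers stay in place."""
    a.sym, b.sym = b.sym, a.sym
    a.weight, b.weight = b.weight, a.weight
    a.left, b.left = b.left, a.left
    a.right, b.right = b.right, a.right
    for n in (a, b):
        for child in (n.left, n.right):
            if child is not None:
                child.parent = n


def update(nodes, node):
    """Swap with the block leader, add 1 to the weight, move to the parent."""
    while node is not None:
        block = [n for n in nodes
                 if n.weight == node.weight and n is not node.parent]
        leader = max(block, key=lambda n: n.num)
        if leader is not node:
            swap(node, leader)
            node = leader
        node.weight += 1
        node = node.parent


def decode(stream, size, root_num=1024):
    """Decode `size` characters from the textual code in `stream`."""
    root = Node(root_num, sym=TERM)
    nodes, out, pos = [root], [], 0
    while len(out) < size:
        node = root
        while node.left is not None:  # walk down: 0 = left, 1 = right
            node = node.left if stream[pos] == "0" else node.right
            pos += 1
        if node.sym is TERM:          # a new character follows literally
            sym = stream[pos]
            pos += 1
            node.sym = None           # the old TERM becomes the subtree's parent
            node.right = Node(node.num - 1, 1, sym, node)
            node.left = Node(node.num - 2, 0, TERM, node)
            nodes += [node.right, node.left]
        else:
            sym = node.sym
        out.append(sym)
        update(nodes, node)
    return "".join(out)


print(decode("a0b00c101101101", 6))  # abccab
```

The decoder needs the original size from the file header to know when to stop, because the code itself carries no terminator.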
Fig. 1 shows the successive code trees that occur during file compression and decompression.
In Fig. 1, a is the initial code tree, which contains only one empty node;
In Fig. 1, b is the code tree after the node for character a has been added;
In Fig. 1, c is the code tree after the node for character b has been added;
In Fig. 1, d is the code tree after the node for character c has been added;
In Fig. 1, e is the code tree after the weights have been propagated following the addition of the node for character c;
In Fig. 1, f is the code tree after character c has occurred again;
In Fig. 1, g is the code tree after character a has occurred again;
In Fig. 1, h is the code tree after the weights have been propagated following the second occurrence of character a;
In Fig. 1, i is the code tree after character b has occurred again.
IV. Detailed description of embodiments
1. Configure a TCP/IP network in which the two ends can communicate with each other.
2. Start the sender at one end of the network and the receiver at the other end.
3. The sender sends a file, and the receiver produces the compressed file.

Claims (2)

1. The present invention uses a variable tree to encode the file data being transmitted during file transfer; protection is therefore claimed for encoding a file in network transmission by means of a variable tree.
2. The present invention uses dual-channel TCP connection control when compressing during file transfer, with one channel carrying control instructions and the other carrying file data; protection is therefore claimed for the design method that uses dual-channel TCP connections to compress a file while it is being transmitted.
CN201410464824.4A 2014-09-12 2014-09-12 A real-time compression method for files Pending CN105893358A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410464824.4A CN105893358A (en) 2014-09-12 2014-09-12 A real-time compression method for files

Publications (1)

Publication Number Publication Date
CN105893358A true CN105893358A (en) 2016-08-24

Family

ID=56999973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410464824.4A Pending CN105893358A (en) 2014-09-12 2014-09-12 A real-time compression method for files

Country Status (1)

Country Link
CN (1) CN105893358A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110401723A (en) * 2019-08-16 2019-11-01 北京浪潮数据技术有限公司 Method, system, equipment and the storage medium of OVA file upload services device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101521661A (en) * 2008-02-29 2009-09-02 北京盖特佳信息安全技术股份有限公司 Dual-channel information exchange method based on load balancing technique
CN102546108A (en) * 2011-12-28 2012-07-04 深圳市新为软件有限公司 Method and device for transmitting network resources by tree structure
CN102546105A (en) * 2011-12-28 2012-07-04 深圳市新为软件有限公司 Method and device for network resource transmission
CN103181168A (en) * 2010-08-17 2013-06-26 三星电子株式会社 Video encoding method and apparatus using transformation unit of variable tree structure, and video decoding method and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
DD01 Delivery of document by public notice

Addressee: JIANGSU GM-WINLEAD INTELLIGENT TECHNOLOGY CO., LTD.

Document name: Notification of Publication of the Application for Invention

SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160824