CN102982046B - A web data compression and storage method and system - Google Patents

A web data compression and storage method and system

Info

Publication number
CN102982046B
Authority
CN
China
Prior art keywords
block
webpage
compressed data
storage
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201110264127.0A
Other languages
Chinese (zh)
Other versions
CN102982046A (en)
Inventor
闫瑞
韩金宇
罗志国
孙少陵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a web data compression and storage method and system. When any web page needs to be compressed, the web page is divided into two or more blocks. For each block, it is determined whether its corresponding compressed data has already been stored; if not, the block is compressed and the compressed data is stored; if so, no compression is performed. The scheme of the present invention can improve compression efficiency and save storage space.

Description

A web data compression and storage method and system
Technical field
The present invention relates to data processing technology, and in particular to a web data compression and storage method and system.
Background
To improve data transmission efficiency and save storage space, data compression techniques can be used to compress data. Data can be compressed because it contains redundancy; data compression uses an algorithm to reduce that redundancy as far as possible while keeping distortion as low as possible.
Data compression techniques are generally divided into lossless compression and lossy compression.
Lossless compression means that the data obtained by restoring the compressed data is identical to the original data. Lossless compression is mainly used where the reconstructed signal must be exactly the same as the original signal, such as the compression of text data and programs. The compression ratio of lossless compression is relatively low, usually 1/2 to 1/5. Typical lossless compression algorithms include Huffman coding, arithmetic coding, and run-length coding.
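The lossless property described above can be illustrated with a minimal sketch. It assumes Python's standard zlib module (DEFLATE) as the algorithm; the source does not name a specific library, and the sample page content is made up for illustration.

```python
import zlib

# Hypothetical sample page with a lot of repeated markup (i.e. redundancy).
original = ("<html><body>" + "<p>hello</p>" * 50 + "</body></html>").encode("utf-8")

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossless: the restored bytes are identical to the original bytes.
is_lossless = restored == original

# Ratio expressed as compressed size divided by original size.
ratio = len(compressed) / len(original)
```

The exact ratio depends on how redundant the input is, so it is not stated here.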
Lossy compression means that the data obtained by restoring the compressed data differs from the original data without affecting the information the original data expresses, so the compression ratio is much higher. Lossy compression is mainly used for data such as voice, images, and video. Typical lossy compression algorithms include pulse code modulation (PCM), predictive coding, transform coding, interpolation, and extrapolation.
Web data is generally compressed losslessly, with a single web page as the unit of compression. The implementation is as follows: the web page to be compressed is obtained and compressed according to some algorithm, and the resulting compressed data is saved together with the uniform resource locator (URL) of the web page. Later, when the web page needs to be read, its compressed data is found according to its URL and decompressed to obtain the web page.
However, this approach has a problem in practice. In some cases different web pages have content in common, for example different web pages of the same website. Because the prior art compresses each web page as a unit, it does not take the content shared between web pages into account. If, for example, 40% of the content of two web pages is identical, that identical 40% is compressed twice and stored twice, which both lowers compression efficiency and increases storage usage.
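The per-page scheme and its drawback can be sketched as follows. This is an illustration only: the class name WholePageStore, the example URLs, and the use of zlib are assumptions, not part of the source.

```python
import zlib


class WholePageStore:
    """Prior-art style store: one compressed blob per URL (illustrative only)."""

    def __init__(self):
        self._blobs = {}  # URL -> compressed bytes of the whole page

    def save(self, url, html):
        # The whole page is compressed as a single unit.
        self._blobs[url] = zlib.compress(html.encode("utf-8"))

    def load(self, url):
        # Look up by URL, then decompress to recover the page.
        return zlib.decompress(self._blobs[url]).decode("utf-8")

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())


# Two hypothetical pages that share most of their content.
shared = "<div>navigation and ads</div>" * 20
page1 = shared + "<div>body text of page 1</div>"
page2 = shared + "<div>body text of page 2</div>"

store = WholePageStore()
store.save("http://example.com/1", page1)
store.save("http://example.com/2", page2)
```

The shared part is compressed and stored once per page, so the total stored size is simply the sum of the two independently compressed pages.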
Summary of the invention
In view of this, the present invention provides a web data compression and storage method and system that can improve compression efficiency and save storage space.
To achieve the above object, the technical solution of the present invention is implemented as follows:
A web data compression and storage method, including:
when any web page needs to be compressed, dividing the web page into two or more blocks;
for each block, determining whether its corresponding compressed data has already been stored; if not, compressing the block and storing the compressed data; if so, performing no compression.
A web data compression storage system, including:
a compression server, configured to divide any web page that needs to be compressed into two or more blocks and, for each block, to send an inquiry request to a storage server asking whether the compressed data corresponding to the block is already stored there; if a denial message is received, to compress the block and store the compressed data in the storage server; if a confirmation message is received, to perform no compression;
the storage server, configured to store compressed data and to return a confirmation or denial message to the compression server according to the inquiry request received from it.
As can be seen, with the scheme of the present invention, if the compressed data corresponding to a block of a web page already exists, that is, an earlier web page contained the same block and that block has already been compressed and stored, the block is not compressed again; otherwise it is compressed. This improves compression efficiency. Moreover, because complete compressed data need not be stored for every web page, storage space is saved.
Brief description of the drawings
Fig. 1 is a schematic diagram of a template.
Fig. 2 is a flow chart of an embodiment of the web data compression and storage method of the present invention.
Fig. 3 is a schematic diagram of the DOM tree corresponding to the template shown in Fig. 1.
Fig. 4 is a flow chart of a preferred embodiment of the web data compression and storage method of the present invention.
Fig. 5 is a schematic structural diagram of an embodiment of the web data compression storage system of the present invention.
Detailed description of the embodiments
To address the problems in the prior art, the present invention proposes an improved web data compression and storage scheme that can improve compression efficiency and save storage space.
As mentioned above, in some cases different web pages have content in common, for example different web pages of the same website.
Most web pages of a given website are generated from one or a few templates. Fig. 1 is a schematic diagram of a template. As shown in Fig. 1, a web page built on this template can be divided into five parts, A, B, C, D, and E, where parts A, B, C, and D hold information such as navigation and advertisements and part E holds the body text. For different web pages generated from the template shown in Fig. 1, parts A, B, C, and D are generally identical and only part E differs.
Thus, if web page 1 and web page 2 are both generated from the template shown in Fig. 1, and parts A, B, C, D, and E of web page 1 have already been compressed and stored, then when web page 2 later needs to be compressed and stored, parts A, B, C, and D of web page 2 need not be compressed and stored again; only part E, which differs from web page 1, has to be compressed and stored.
To make the technical solution of the present invention clearer, the scheme is described in further detail below with reference to the drawings and embodiments.
Fig. 2 is a flow chart of an embodiment of the web data compression and storage method of the present invention. As shown in Fig. 2, the method includes the following steps:
Step 21: When any web page X needs to be compressed (for ease of description, web page X denotes any web page), web page X is divided into two or more blocks.
How to divide a web page into two or more blocks is prior art. For example, the web page can be parsed through its Document Object Model (DOM) tree to obtain the blocks.
Fig. 3 is a schematic diagram of the DOM tree corresponding to the template shown in Fig. 1. As shown in Fig. 3, the part other than parts A, B, C, and D is part E.
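One possible way to obtain blocks from a DOM-like tree is sketched below. The source leaves the splitting method to the prior art, so everything here is an assumption: the page is taken to be well-formed XHTML, each direct child of the body element is treated as one block, and the function name split_into_blocks is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_into_blocks(page):
    """Return one serialized block per direct child of <body>.

    Assumes the page is well-formed XHTML; real-world HTML would need a
    more tolerant parser, and real templates may need a finer split.
    """
    root = ET.fromstring(page)
    body = root.find("body")
    if body is None:
        raise ValueError("page has no <body> element")
    # Each top-level child of <body> (for example parts A..E of a
    # template) becomes one block.
    return [ET.tostring(child, encoding="unicode") for child in body]


page = '<html><body><div id="A">nav</div><div id="E">article</div></body></html>'
blocks = split_into_blocks(page)
```

Note that re-serializing through a parser may not reproduce the page byte for byte, so a production splitter would need to preserve the original text of each block.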
Step 22: For each block Y (for ease of description, block Y denotes any block), it is determined whether its corresponding compressed data has already been stored; if not, block Y is compressed and the compressed data is stored; if so, no compression is performed.
In this step, identification information is generated for each block Y obtained in step 21. In practice the identification information can be signature information; how to generate it is prior art. It is then determined whether the identification information has already been stored. If not, block Y is compressed, the compressed data is stored, and the identification information is stored with it; if so, no compression is performed.
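The check in step 22 can be sketched as follows. The source only says the identification information can be signature information; using a SHA-256 digest of the block content is an assumption made here for illustration, as are the function names and the in-memory dictionary standing in for the store.

```python
import hashlib
import zlib


def block_id(block):
    """Identification information for a block (assumed: SHA-256 hex digest)."""
    return hashlib.sha256(block.encode("utf-8")).hexdigest()


def store_block_if_new(block, compressed_by_id):
    """Compress and store the block only if its identifier is not yet stored.

    Returns (identifier, was_compressed).
    """
    ident = block_id(block)
    if ident in compressed_by_id:
        # Identifier already stored: skip compression entirely.
        return ident, False
    # Identifier not stored: compress, then store data under the identifier.
    compressed_by_id[ident] = zlib.compress(block.encode("utf-8"))
    return ident, True
```

Keying the store by a content digest means two blocks are treated as the same exactly when their digests match, which in practice is taken to mean identical content.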
In addition, if the compressed data corresponding to block Y has not been stored, then after the compressed data of block Y is stored, the correspondence between the storage location of that compressed data and the URL of web page X is recorded. If the compressed data corresponding to block Y has already been stored, the correspondence between the storage location of the compressed data of block Y and the URL of web page X is recorded directly.
After every block has been processed in this way, the correspondences between the URL of web page X and multiple storage locations have been recorded; the number of storage locations equals the number of blocks into which web page X was divided.
Then, when web page X needs to be read, each block of web page X can be found according to the storage locations corresponding to the URL of web page X, each block is decompressed, and the decompressed blocks are spliced together to generate web page X.
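The read path can be sketched as below. The two dictionaries, the location keys such as "loc-1", and the assumption that storage locations are recorded in the page's original block order are all illustrative; the source does not specify how locations are represented.

```python
import zlib


def read_page(url, locations_by_url, compressed_by_location):
    """Rebuild a page from the storage locations recorded for its URL."""
    parts = []
    for location in locations_by_url[url]:
        # Fetch the compressed block at this location and decompress it.
        data = compressed_by_location[location]
        parts.append(zlib.decompress(data).decode("utf-8"))
    # Splice the decompressed blocks together in recorded order.
    return "".join(parts)


# Hypothetical stored state: two pages sharing the block at "loc-1".
compressed_by_location = {
    "loc-1": zlib.compress(b"<div>nav</div>"),
    "loc-2": zlib.compress(b"<div>text 1</div>"),
    "loc-3": zlib.compress(b"<div>text 2</div>"),
}
locations_by_url = {
    "http://example.com/1": ["loc-1", "loc-2"],
    "http://example.com/2": ["loc-1", "loc-3"],
}
```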
The process shown in Fig. 2 is further described below through a preferred embodiment.
Fig. 4 is a flow chart of a preferred embodiment of the web data compression and storage method of the present invention. As shown in Fig. 4, it includes the following steps:
Step 41: When any web page X needs to be compressed, web page X is divided into two or more blocks.
Step 42: For each block Y, its identification information is generated and it is determined whether the identification information has already been stored; if not, step 43 is performed; if so, step 44 is performed.
Step 43: Block Y is compressed, the compressed data is stored, and its identification information is stored with it; the correspondence between the storage location of the compressed data of block Y and the URL of web page X is also recorded. Step 45 is then performed.
Step 44: The correspondence between the storage location of the compressed data of block Y and the URL of web page X is recorded. Step 45 is then performed.
Step 45: When web page X needs to be read, each block of web page X is found according to the storage locations corresponding to the URL of web page X, each block is decompressed, and the decompressed blocks are spliced together to generate web page X. The flow then ends.
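Steps 41 to 45 can be put together in one small sketch, applied to two pages built from the five-part template of Fig. 1. The class name BlockStore, the SHA-256 identifier, zlib, and the use of a list index as the storage location are assumptions for illustration; step 41 (splitting) is assumed to have been done by the caller.

```python
import hashlib
import zlib


class BlockStore:
    """In-memory sketch of the flow in Fig. 4 (illustrative only)."""

    def __init__(self):
        self.data_by_location = []  # storage location (index) -> compressed data
        self.location_by_id = {}    # identification info -> storage location
        self.locations_by_url = {}  # page URL -> ordered storage locations
        self.compress_calls = 0     # how many blocks were actually compressed

    def compress_page(self, url, blocks):
        locations = []
        for block in blocks:
            raw = block.encode("utf-8")
            ident = hashlib.sha256(raw).hexdigest()  # step 42: identifier
            location = self.location_by_id.get(ident)
            if location is None:
                # Step 43: not stored yet, so compress and store the block.
                self.data_by_location.append(zlib.compress(raw))
                self.compress_calls += 1
                location = len(self.data_by_location) - 1
                self.location_by_id[ident] = location
            # Steps 43 and 44: record the location against the page URL.
            locations.append(location)
        self.locations_by_url[url] = locations

    def read_page(self, url):
        # Step 45: find, decompress, and splice the blocks of the page.
        return "".join(
            zlib.decompress(self.data_by_location[loc]).decode("utf-8")
            for loc in self.locations_by_url[url]
        )


template = ["<div>A</div>", "<div>B</div>", "<div>C</div>", "<div>D</div>"]
page1 = template + ["<div>E: text of page 1</div>"]
page2 = template + ["<div>E: text of page 2</div>"]

store = BlockStore()
store.compress_page("http://example.com/1", page1)
store.compress_page("http://example.com/2", page2)
```

In this example ten blocks are submitted in total but only six are compressed, because parts A to D of the second page are already stored.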
This completes the description of the method embodiments of the present invention.
Based on the above description, Fig. 5 is a schematic structural diagram of an embodiment of the web data compression storage system of the present invention. As shown in Fig. 5, the system includes:
Compression server 51, configured to divide any web page that needs to be compressed into two or more blocks and, for each block, to send an inquiry request to storage server 52 asking whether the compressed data corresponding to the block is already stored there; if a denial message is received, to compress the block and store the compressed data in storage server 52; if a confirmation message is received, to perform no compression;
Storage server 52, configured to store compressed data and to return a confirmation or denial message to compression server 51 according to the inquiry request received from it.
Compression server 51 can be further configured to, when the web page needs to be read, obtain each stored block of the web page from storage server 52, decompress each block, and splice the decompressed blocks together to generate the web page.
In addition, compression server 51 can be further configured to generate identification information for each block and send it to storage server 52 in the inquiry request. Correspondingly, storage server 52 determines whether it has already stored the identification information; if not, it returns a denial message to compression server 51; if so, it returns a confirmation message to compression server 51. If compression server 51 receives a denial message for a block, it stores the identification information of the block together with the compressed data of the block in storage server 52.
The inquiry request can further carry the URL of the web page. Correspondingly, storage server 52 can be further configured to: if the compressed data corresponding to a block has not been stored, record, after storing that compressed data, the correspondence between the storage location of the compressed data of the block and the URL of the web page; if the compressed data corresponding to the block has already been stored, directly record the correspondence between the storage location of the compressed data of the block and the URL of the web page. Compression server 51 obtains the storage locations corresponding to the URL of the web page from storage server 52 and finds each block of the web page according to those storage locations.
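The division of work between compression server 51 and storage server 52 can be sketched with two in-process classes. This is a simplification under stated assumptions: method calls stand in for network messages, a boolean stands in for the confirmation or denial message, the block identifier doubles as the storage location, and the method names (begin_page, query, store, locations, fetch) are hypothetical.

```python
import hashlib
import zlib


class StorageServer:
    """Stores compressed blocks and the URL-to-location records."""

    def __init__(self):
        self.blobs = {}       # identifier (used as location) -> compressed data
        self.page_index = {}  # URL -> ordered list of locations

    def begin_page(self, url):
        # Hypothetical helper: start a fresh record when a page is (re)compressed.
        self.page_index[url] = []

    def query(self, url, ident):
        """Inquiry request carrying URL and identifier.

        Returns True (confirmation) or False (denial).
        """
        if ident in self.blobs:
            # Already stored: directly record location against the URL.
            self.page_index[url].append(ident)
            return True
        return False

    def store(self, url, ident, data):
        # Store new compressed data, then record its location against the URL.
        self.blobs[ident] = data
        self.page_index[url].append(ident)

    def locations(self, url):
        return list(self.page_index[url])

    def fetch(self, location):
        return self.blobs[location]


class CompressionServer:
    def __init__(self, storage):
        self.storage = storage

    def compress_page(self, url, blocks):
        """Return how many blocks actually had to be compressed."""
        self.storage.begin_page(url)
        compressed = 0
        for block in blocks:
            raw = block.encode("utf-8")
            ident = hashlib.sha256(raw).hexdigest()
            if not self.storage.query(url, ident):  # denial received
                self.storage.store(url, ident, zlib.compress(raw))
                compressed += 1
        return compressed

    def read_page(self, url):
        return "".join(
            zlib.decompress(self.storage.fetch(loc)).decode("utf-8")
            for loc in self.storage.locations(url)
        )
```

In a real deployment the query and store calls would be separate network requests, so a block that two pages submit at the same moment could be compressed twice; the sketch does not address that.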
The identification information can be signature information.
For the specific workflow of the system embodiment shown in Fig. 5, refer to the corresponding description in the method embodiments above; it is not repeated here.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A web data compression and storage method, characterized by including:
when any web page needs to be compressed, dividing the web page into two or more blocks;
for each block, determining whether its corresponding compressed data has already been stored; if not, compressing the block and storing the compressed data; if so, performing no compression;
if the compressed data corresponding to a block has not been stored, recording, after the compressed data of the block is stored, the correspondence between the storage location of the compressed data of the block and the uniform resource locator (URL) of the web page; if the compressed data corresponding to the block has already been stored, directly recording the correspondence between the storage location of the compressed data of the block and the URL of the web page;
wherein said determining, for each block, whether its corresponding compressed data has already been stored and, if not, compressing the block and storing the compressed data includes:
for any block, generating its identification information and determining whether the identification information has already been stored; if not, compressing the block, storing the compressed data, and correspondingly storing the identification information.
2. The method according to claim 1, characterized in that the method further includes:
when the web page needs to be read, obtaining each stored block of the web page, decompressing each block, and splicing the decompressed blocks together to generate the web page.
3. The method according to claim 2, characterized in that
said obtaining each stored block of the web page includes: finding each block of the web page according to the storage locations corresponding to the URL of the web page.
4. The method according to claim 1, characterized in that the identification information is signature information.
5. A web data compression storage system, characterized by including:
a compression server, configured to divide any web page that needs to be compressed into two or more blocks; for each block, to send an inquiry request to a storage server asking whether the compressed data corresponding to the block is already stored there; if a denial message is received, to compress the block and store the compressed data in the storage server; if a confirmation message is received, to perform no compression; if the compressed data corresponding to a block has not been stored, after the compressed data corresponding to the block is stored, the correspondence between the storage location of the compressed data of the block and the uniform resource locator (URL) of the web page is recorded; if the compressed data corresponding to the block has already been stored, the correspondence between the storage location of the compressed data of the block and the URL of the web page is recorded directly; for any block, to generate its identification information and send it to the storage server in the inquiry request; if a denial message is received for any block, to store the identification information of the block together with the compressed data of the block in the storage server;
the storage server, configured to store compressed data and to return a confirmation or denial message to the compression server according to the inquiry request received from it, wherein the inquiry request carries the URL of the web page; and to determine whether it has already stored the identification information; if not, to return a denial message to the compression server; if so, to return a confirmation message to the compression server.
6. The system according to claim 5, characterized in that the compression server is further configured to, when the web page needs to be read, obtain each stored block of the web page from the storage server, decompress each block, and splice the decompressed blocks together to generate the web page.
7. The system according to claim 6, characterized in that
the compression server obtains the storage locations corresponding to the URL of the web page from the storage server and finds each block of the web page according to those storage locations.
8. The system according to claim 5, characterized in that the identification information is signature information.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110264127.0A CN102982046B (en) 2011-09-07 2011-09-07 A kind of web data compression and storage method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110264127.0A CN102982046B (en) 2011-09-07 2011-09-07 A kind of web data compression and storage method and system

Publications (2)

Publication Number Publication Date
CN102982046A CN102982046A (en) 2013-03-20
CN102982046B true CN102982046B (en) 2017-09-26

Family

ID=47856082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110264127.0A Expired - Fee Related CN102982046B (en) 2011-09-07 2011-09-07 A kind of web data compression and storage method and system

Country Status (1)

Country Link
CN (1) CN102982046B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104376584B (en) * 2013-08-15 2018-02-13 华为技术有限公司 A kind of method of data compression, computer system and device
CN103473214B (en) * 2013-09-06 2017-04-12 百度在线网络技术(北京)有限公司 Method and device for displaying page characters
EP3229444B1 (en) 2015-12-29 2019-10-16 Huawei Technologies Co., Ltd. Server and method for compressing data by server
CN113742335A (en) * 2021-01-28 2021-12-03 北京沃东天骏信息技术有限公司 Data compression management method and device
WO2022198483A1 (en) * 2021-03-24 2022-09-29 深圳市大疆创新科技有限公司 Data compression method and apparatus, movable platform, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127044A (en) * 2007-06-08 2008-02-20 北京大学 Dynamic web page segmentation method
CN101944109A (en) * 2010-09-06 2011-01-12 华南理工大学 System and method for extracting picture abstract based on page partitioning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL133888A0 (en) * 2000-01-05 2001-04-30 Keselman Alexander Method and algorithm for viewing search results in the internet and multi-page system using the same
CN1332527A (en) * 2000-07-10 2002-01-23 刘明 WAP-based transmitted data compressing process
CN1182682C (en) * 2001-09-24 2004-12-29 北京大学 Multimedia web site spliting and reconstructing method
CN101079895B (en) * 2006-12-21 2010-12-01 腾讯科技(深圳)有限公司 A method, system and proxy service device for quick access to Web page
CN102148833A (en) * 2011-04-18 2011-08-10 中国工商银行股份有限公司 Method for transmitting data report, server, client and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
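An online block-based caching method for dynamic web pages; 尤朝 et al.; Acta Electronica Sinica (《电子学报》); 2009-05-31; Vol. 37 (No. 5); full text *
Improvement and implementation of a vision-based web page segmentation algorithm; 高乐 et al.; Computer Systems & Applications (《计算机系统应用》); 2009-04-30 (No. 4); full text *
A web page segmentation algorithm for mobile devices; 路松峰 et al.; Journal of Chinese Computer Systems (《小型微型计算机系统》); 2007-09-30 (No. 9); full text *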

Also Published As

Publication number Publication date
CN102982046A (en) 2013-03-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170926
