CN103593467B - Method and device for generating webpage template and achieving incremental transmission - Google Patents
- Publication number
- CN103593467B (application CN201310612919.1A)
- Authority
- CN
- China
- Prior art keywords
- web page
- webpage
- page template
- data
- hash value
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses a method and device for generating a webpage template and achieving incremental transmission. The method includes the steps of obtaining the webpage data of a webpage, generating hash value labels for the webpage data, searching for the webpage templates corresponding to the hash value labels, calculating the incremental encoding data between each found webpage template and the webpage, and determining, according to the calculated incremental encoding data, whether to generate a new webpage template. The method and device solve the prior-art problem of high system overhead when webpage templates are generated for incremental transmission, and thereby reduce system overhead.
Description
Technical field
The present invention relates to the field of browsers, and in particular to a method and device for generating a webpage template for realizing incremental transmission.
Background
For a mobile phone browser with a client/server (C/S) architecture, the browser caches webpage templates locally when a user browses webpages, so the server only needs to transmit the incremental encoding data of each webpage. This reduces network data transmission and speeds up browsing.
In practice, not every webpage uses a cached webpage template. Whether a cached template is used usually depends on the size of the incremental encoding data between the template and the webpage: if the incremental encoding data are small, the cached template is used; if they are not small enough, the cached template is not used and a new webpage template is created. In the prior art, deciding whether to generate a new template by the size of the incremental encoding data requires calculating the incremental encoding data of every cached webpage template against the webpage, so when a large number of templates are cached, the system overhead becomes considerable.
No effective solution has yet been proposed for this prior-art problem of high system overhead when webpage templates are generated for incremental transmission.
Summary of the invention
The main object of the present invention is to provide a method and device for generating a webpage template for realizing incremental transmission, so as to solve the prior-art problem of high system overhead when webpage templates are generated for incremental transmission.
To achieve the above object, according to one aspect of the invention, a method for generating a webpage template for realizing incremental transmission is provided. The method includes: obtaining the webpage data of a webpage; generating hash value labels for the webpage data; searching for the webpage templates corresponding to the hash value labels; calculating the incremental encoding data between each found webpage template and the webpage; and determining, according to the calculated incremental encoding data, whether to generate a new webpage template.
Further, generating hash value labels for the webpage data includes: generating a hash value from the webpage data, permuting the hash value according to a preset rule, and obtaining multiple hash value labels from the domain name together with prefixes of the permuted hash values. Searching for the webpage templates corresponding to the hash value labels includes: looking up a template table with the hash value labels to obtain the webpage templates corresponding to the multiple hash value labels.
Further, determining according to the calculated incremental encoding data whether to generate a new webpage template includes: comparing the found webpage template data with the webpage data to obtain the webpage incremental encoding data; judging whether the webpage incremental encoding data exceed a set threshold; if they are less than or equal to the set threshold, transmitting the incremental encoding data based on the corresponding webpage template; and if they exceed the set threshold, generating a new webpage template.
Further, determining according to the calculated incremental encoding data whether to generate a new webpage template may alternatively include: comparing the found webpage template data with the webpage data to obtain the webpage incremental encoding data; calculating the ratio of the webpage incremental encoding data to the webpage data; judging whether this ratio is less than a set ratio threshold; if it is, adding the webpage to the webpage set covered by the webpage template corresponding to the incremental encoding data; and if the ratio is greater than or equal to the set ratio threshold, generating a new webpage template.
Further, after the webpage templates corresponding to the hash value labels are found, the method also includes: judging whether two hash value labels correspond to the same webpage template; and if so, obtaining the webpages corresponding to the two hash value labels and adding them to the webpage set covered by that webpage template.
To achieve the above object, according to another aspect of the present invention, a device for generating a webpage template for realizing incremental transmission is provided. The device includes: an acquiring unit for obtaining the webpage data of a webpage; a tag unit for generating hash value labels for the webpage data; a searching unit for searching for the webpage templates corresponding to the hash value labels; a computing unit for calculating the incremental encoding data between each found webpage template and the webpage; and a generation unit for determining, according to the calculated incremental encoding data, whether to generate a new webpage template.
Further, the tag unit includes: a first generation module for generating a hash value from the webpage data; a permutation module for permuting the hash value according to a preset rule; and a label module for obtaining multiple hash value labels from the domain name and prefixes of the permuted hash values. The searching unit is configured to look up the template table with the hash value labels to obtain the webpage templates corresponding to the multiple hash value labels.
Further, the generation unit includes: a comparison module for comparing the found webpage template data with the webpage data to obtain the webpage incremental encoding data; a first judging module for judging whether the webpage incremental encoding data exceed a set threshold; a first transmission module for transmitting, when the webpage incremental encoding data are less than or equal to the set threshold, the incremental encoding data based on the corresponding webpage template; and a second generation module for generating a new webpage template when the webpage incremental encoding data exceed the set threshold.
Further, the generation unit may alternatively include: a comparison module for comparing the found webpage template data with the webpage data to obtain the webpage incremental encoding data; a computing module for calculating the ratio of the webpage incremental encoding data to the webpage data; a second judging module for judging whether this ratio is less than a set ratio threshold; a second transmission module for adding, when the ratio is less than the set ratio threshold, the webpage to the webpage set covered by the webpage template corresponding to the incremental encoding data; and a third generation module for generating a new webpage template when the ratio is greater than or equal to the set ratio threshold.
Further, the device also includes: a judging unit for judging whether two hash value labels correspond to the same webpage template; and a combining unit for obtaining, when two hash value labels correspond to the same webpage template, the webpages corresponding to the two hash value labels and adding them to the webpage set covered by that webpage template.
With the present invention, a fixed number of hash value labels is generated from the hash value of the webpage data, and webpage templates are looked up by these labels. The size of the incremental encoding data therefore only needs to be judged for the found templates rather than calculated for all webpage templates. This solves the prior-art problem of high system overhead when webpage templates are generated for incremental transmission, and thereby saves system overhead.
Brief description of the drawings
The accompanying drawings, which form part of this application, are provided for a further understanding of the present invention. The schematic embodiments of the invention and their descriptions explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a first embodiment of the present invention;
Fig. 2 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a fourth embodiment of the present invention;
Fig. 5 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a fifth embodiment of the present invention;
Fig. 6 is a flow chart of the webpage template generation method for realizing incremental transmission according to an embodiment of the present invention;
Fig. 7 is a flow chart of generating hash value labels in the webpage template generation method for realizing incremental transmission according to an embodiment of the present invention;
Fig. 8 is a flow chart of determining whether to generate a webpage template in the webpage template generation method for realizing incremental transmission according to an embodiment of the present invention;
Fig. 9 is a flow chart of a preferred method of determining whether to generate a webpage template in the webpage template generation method for realizing incremental transmission according to an embodiment of the present invention; and
Fig. 10 is a flow chart of the webpage template generation method for realizing incremental transmission according to a second embodiment of the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with each other. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the description, claims and drawings of this specification are used to distinguish similar objects and not to describe a specific order or sequence. Data so labeled are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprising" and "having" and any variations of them are intended to cover a non-exclusive inclusion: for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product or device.
An embodiment of the invention provides a webpage template generating device for realizing incremental transmission. The device generates webpage templates so that webpage content can be transmitted incrementally.
Fig. 1 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a first embodiment of the present invention. As shown in the figure, the device includes an acquiring unit 10, a tag unit 20, a searching unit 30, a computing unit 40 and a generation unit 50.
The acquiring unit 10 obtains the webpage data of a webpage. It can obtain the webpage data of any webpage, and the obtained data can include all the content of the webpage, such as its news data, advertisement data and link data.
The webpage data can be obtained by first obtaining the webpage address and then retrieving the corresponding webpage data from a webpage data table according to that address. The webpage data table can store a field part and a description part. For example, the stored field is "URL", and its description is "the webpage address with the protocol part and the anchor part removed, and with the domain name reversed label by label". For example, for http://www.sina.com.cn/a/b.php?ac=b#ab, the corresponding stored value is:
cn.com.sina.www/a/b.php?ac=b
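The normalization rule for the "URL" field can be sketched in Python as follows. This is an illustrative reading of the rule rather than the patent's implementation: the function name `normalize_url` is hypothetical, and dropping any explicit port (via `urlsplit(...).hostname`, which also lowercases the host) is an assumption the description does not address.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Strip protocol and anchor and reverse the domain labels (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""              # assumption: port and credentials dropped
    key = ".".join(reversed(host.split(".")))
    key += parts.path
    if parts.query:                          # the query string is kept
        key += "?" + parts.query
    return key                               # parts.fragment (the anchor) is discarded


print(normalize_url("http://www.sina.com.cn/a/b.php?ac=b#ab"))
# cn.com.sina.www/a/b.php?ac=b
```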
The tag unit 20 generates hash value labels for the webpage data. One hash value is generated per webpage; it may be 64 bits or 128 bits, and the bit width can be chosen according to the needs of the system. For example, if a 64-bit hash value already meets the system's requirements, generating a 128-bit hash value would only add load, so the bit width should be chosen appropriately.
From the hash value generated from the webpage data, the tag unit 20 can generate multiple hash value labels. Each hash value label corresponds to one webpage template, and a webpage template covers one or more webpages.
The searching unit 30 searches for the webpage template corresponding to the hash value labels. A webpage template can itself be a webpage: one webpage can serve as the template of another. One webpage template corresponds to multiple hash value labels and can be found through any one of them. Webpage templates can be stored in a cache, and the searching unit 30 searches the cache for the template corresponding to a hash value label.
The computing unit 40 calculates the incremental encoding data between the found webpage template and the webpage. These incremental encoding data can be the parts of the data that differ between the webpage data and the webpage template data. If multiple webpage templates are found, the incremental encoding data between each of them and the webpage are calculated.
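The description does not fix an encoding format for the incremental encoding data. As one hypothetical illustration, the Python sketch below uses the standard library's `difflib.SequenceMatcher` to express the webpage as "copy from template" and "literal data" instructions, and sizes the result with an assumed cost model (literal bytes plus 8 bytes per copy instruction). The names `delta_encode`, `delta_size` and `apply_delta` are illustrative, not taken from the patent.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# ("copy", start, length) reuses template bytes; ("data", bytes) carries new bytes.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def delta_encode(template: bytes, page: bytes) -> List[Op]:
    """Describe `page` as copies from `template` plus the differing literal data."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, template, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            ops.append(("data", page[j1:j2]))
        # "delete": template bytes absent from the page need no instruction
    return ops


def delta_size(ops: List[Op]) -> int:
    """Assumed cost model: literal bytes, plus 8 bytes per copy instruction."""
    return sum(8 if op[0] == "copy" else len(op[1]) for op in ops)


def apply_delta(template: bytes, ops: List[Op]) -> bytes:
    """Client side: rebuild the page from the cached template and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += template[start:start + length]
        else:
            out += op[1]
    return bytes(out)
```

`SequenceMatcher` can be slow on large pages, so a real system would more likely use a dedicated delta algorithm; the sketch only shows what "the differing data" means in practice.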
The generation unit 50 determines, according to the calculated incremental encoding data, whether to generate a new webpage template. If the calculated incremental encoding data exceed the set threshold, a new webpage template is generated; if they are less than or equal to the set threshold, the cached webpage template is used directly.
As the above description shows, the webpage templates are found through the hash value labels, and the incremental encoding data are calculated only between the found templates and the webpage to decide whether to use a cached template directly or to generate a new one. This reduces the number of template-to-webpage comparisons and saves system overhead.
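The saving comes from calculating incremental encoding data only for the templates reached through the hash value labels. The Python sketch below makes this concrete; `candidate_deltas`, the dictionary layouts (label to template id, template id to template data) and the injected `delta_size` function are hypothetical.

```python
from typing import Callable, Dict, Iterable


def candidate_deltas(
    page: bytes,
    labels: Iterable[str],
    template_table: Dict[str, str],          # hash value label -> template id
    templates: Dict[str, bytes],             # template id -> cached template data
    delta_size: Callable[[bytes, bytes], int],
) -> Dict[str, int]:
    """Delta sizes for the label-matched templates only, not for every cached one."""
    candidates = {template_table[lbl] for lbl in labels if lbl in template_table}
    return {tid: delta_size(templates[tid], page) for tid in sorted(candidates)}
```

A brute-force approach would instead call `delta_size` once for every cached template.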
Fig. 2 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a second embodiment of the present invention. The device shown in Fig. 2 can serve as a preferred embodiment of the device shown in Fig. 1. This preferred device includes the acquiring unit 10, tag unit 20, searching unit 30, computing unit 40 and generation unit 50, where the tag unit 20 includes a first generation module 201, a permutation module 202 and a label module 203.
In the embodiment shown in Fig. 2, the acquiring unit 10, searching unit 30, computing unit 40 and generation unit 50 have the same functions as in the embodiment shown in Fig. 1 and are not described again here.
The first generation module 201 generates a hash value from the webpage data. The hash value calculated from the webpage data can be a simhash value; simhash is a locality-sensitive hashing algorithm. The method is described in detail below, using the generation of a 64-bit simhash value as an example.
First, a 64-dimensional integer vector V is initialized to 0, i.e., every component V[i] is set to 0.
Second, a substring n bytes long is cut starting at each byte position of the webpage; these substrings form the feature set of the webpage. The number of features can equal the number of bytes of the whole webpage, and each feature is an n-byte string. n can be 64, 32, or another value such as 20.
Third, for each feature in the feature set, a string hash function produces a 64-bit binary integer. For each bit position i of this integer, if the bit is 1, V[i] is incremented by 1; otherwise V[i] is decremented by 1.
Finally, a new 64-bit integer is created whose bits correspond one-to-one to the components of the vector from the previous step: bit i is set to 1 if V[i] is not less than 0, and to 0 otherwise. The resulting 64-bit binary integer is the simhash value.
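The four steps above can be sketched in Python as follows. The patent does not name the string hash function; a 64-bit BLAKE2b digest from `hashlib` is an assumed stand-in, and the helper names are hypothetical. Only full n-byte windows are used, with the whole page taken as a single feature when it is shorter than n; the description leaves the tail handling open.

```python
import hashlib

HASH_BITS = 64


def feature_hash64(feature: bytes) -> int:
    """Assumed string hash: 64-bit BLAKE2b digest read as an unsigned integer."""
    return int.from_bytes(hashlib.blake2b(feature, digest_size=8).digest(), "big")


def simhash64(page: bytes, n: int = 64) -> int:
    """64-bit simhash over the n-byte substrings starting at each byte position."""
    v = [0] * HASH_BITS                                # step 1: V[i] = 0
    for start in range(max(1, len(page) - n + 1)):     # step 2: feature set
        h = feature_hash64(page[start:start + n])      # step 3: hash each feature
        for i in range(HASH_BITS):
            v[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(HASH_BITS):                         # step 4: V[i] >= 0 -> bit 1
        if v[i] >= 0:
            fingerprint |= 1 << i
    return fingerprint
```

The "not less than 0" rule is kept as described; some simhash variants set the bit only when V[i] is strictly positive.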
The permutation module 202 applies random bitwise permutations to the hash value according to a preset rule. For example, 32 random permutations can be applied to the 64-bit hash value.
It should be noted that the number of permutations can be determined according to the needs of the actual system and is not limited to the number given in this embodiment. That number is provided only to illustrate the embodiment and is neither exhaustive nor limiting.
The label module 203 obtains multiple hash value labels from the domain name and the permuted hash values. After the hash value is permuted, a fixed number of leading bits of each permuted hash value is taken, together with the domain name, as a hash value label. For example, applying 32 random bitwise permutations to the 64-bit hash value yields 32 permuted hash values. The first 16 bits of each permuted hash value and the domain name of the webpage can be taken as a hash value label in the form "domain name/first 16 bits of the permuted hash value". Thus, 32 random bitwise permutations of the 64-bit hash value produce 32 hash value labels.
Note that instead of the first 16 bits, the first 32 or 8 bits of the permuted hash value can also be taken; 16 bits are used here only to explain the solution more clearly and do not unduly limit it.
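A sketch of the permutation and label steps in Python follows. The fixed seed, the hexadecimal encoding of the 16-bit prefix and the function names are hypothetical choices. The same permutations must be applied to every webpage and template for labels to match, which is why they are generated once from a fixed seed rather than at random per call. Following the described "domain name/first 16 bits" format, a label does not record which permutation produced it.

```python
import random
from typing import List

HASH_BITS = 64
PREFIX_BITS = 16


def make_permutations(count: int = 32, seed: int = 0) -> List[List[int]]:
    """Generate `count` fixed bit permutations (the seed value is a hypothetical choice)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(count):
        perm = list(range(HASH_BITS))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def permute_bits(value: int, perm: List[int]) -> int:
    """Bit j of the result is bit perm[j] of `value`."""
    out = 0
    for j, src in enumerate(perm):
        if (value >> src) & 1:
            out |= 1 << j
    return out


def hash_labels(domain: str, simhash: int, perms: List[List[int]]) -> List[str]:
    """One label per permutation: 'domain/<leading PREFIX_BITS bits as hex>'."""
    labels = []
    for perm in perms:
        prefix = permute_bits(simhash, perm) >> (HASH_BITS - PREFIX_BITS)
        labels.append(f"{domain}/{prefix:0{PREFIX_BITS // 4}x}")
    return labels
```

Because a permutation only moves bits, two similar simhash values that differ in a few bits are likely to agree on the leading 16 bits of at least one permuted copy, which is what lets a label lookup reach similar templates.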
The searching unit 30 looks up the template table with the hash value labels to obtain the webpage templates corresponding to the multiple hash value labels; there may be one or several such templates. The searching unit can perform N lookups, where N does not exceed the number of permutations. For example, after 32 random permutations of the hash value, at most 32 lookups are performed.
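The template table lookup can be sketched as a Python dictionary from hash value label to template id. The layout, the names and the `register_template` helper (entering a newly generated template's labels without overwriting existing entries) are assumptions for illustration; the description only specifies that labels are looked up in a template table.

```python
from typing import Dict, List


def find_templates(labels: List[str], template_table: Dict[str, str]) -> List[str]:
    """Look up each label once (at most len(labels) lookups); return distinct template ids."""
    found: List[str] = []
    for label in labels:
        tid = template_table.get(label)
        if tid is not None and tid not in found:
            found.append(tid)
    return found


def register_template(labels: List[str], tid: str, template_table: Dict[str, str]) -> None:
    """Hypothetical helper: enter a new template's labels, keeping existing entries."""
    for label in labels:
        template_table.setdefault(label, tid)
```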
Generating a hash value from the webpage data, deriving hash value labels from it, and looking up webpage templates with these labels greatly reduces the number of template queries, increases the speed at which the system processes data, and improves the accuracy of template lookup.
Fig. 3 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a third embodiment of the present invention. The device shown in Fig. 3 can serve as a preferred embodiment of the device shown in Fig. 1. This preferred device includes the acquiring unit 10, tag unit 20, searching unit 30, computing unit 40 and generation unit 50, where the generation unit 50 includes a comparison module 501, a first judging module 503, a first transmission module 505 and a second generation module 507.
In the embodiment shown in Fig. 3, the acquiring unit 10, tag unit 20, searching unit 30 and computing unit 40 have the same functions as in the embodiment shown in Fig. 1 and are not described again here.
The comparison module 501 compares the found webpage template data with the webpage data to obtain the webpage incremental encoding data. One or several webpage templates may be found; when several are found, the incremental encoding data between each of them and the webpage data can be compared. The comparison can be done one template at a time, or the incremental encoding data of all found templates can be compared with the webpage data simultaneously.
The first judging module 503 judges whether the webpage incremental encoding data exceed the set threshold. Larger incremental encoding data mean lower similarity between the webpage and the webpage template; when they exceed the threshold, the found template cannot meet the requirements of the webpage and cannot serve as its template. The incremental encoding data therefore need to be compared with the threshold.
The first transmission module 505 transmits, when the webpage incremental encoding data are less than or equal to the set threshold, the incremental encoding data based on the corresponding webpage template.
If the incremental encoding data are less than or equal to the set threshold, the webpage is sufficiently similar to the found template, which can therefore meet the webpage's requirements. The webpage can then be stored in the webpage set covered by that template, and the incremental encoding data can be transmitted based on the found template.
The second generation module 507 generates a new webpage template when the webpage incremental encoding data exceed the set threshold. In that case the similarity between the webpage and the found template does not meet the webpage's requirements, so a new template is generated; for example, the obtained webpage itself can be used as the new template.
Comparing the size of the incremental encoding data directly with the set threshold to decide whether to use a cached webpage template or to generate a new one makes this decision convenient and accurate.
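For the threshold test above, a minimal Python sketch follows. When several templates are found, the description does not say which one is used; selecting the template with the smallest incremental encoding data is an assumption here, as are byte-count sizes and the function name `choose_template`.

```python
from typing import Dict, Optional, Tuple


def choose_template(deltas: Dict[str, int], threshold: int) -> Tuple[Optional[str], str]:
    """deltas: template id -> incremental encoding size in bytes (assumed layout).

    Returns (template_id, "reuse") when the best delta is within the threshold,
    otherwise (None, "new_template").
    """
    if deltas:
        best = min(deltas, key=deltas.get)   # assumption: smallest delta wins
        if deltas[best] <= threshold:        # "less than or equal to" -> reuse
            return best, "reuse"
    return None, "new_template"
```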
Fig. 4 is a schematic diagram of the webpage template generating device for realizing incremental transmission according to a fourth embodiment of the present invention. The device shown in Fig. 4 can serve as a preferred embodiment of the device shown in Fig. 1. This preferred device includes the acquiring unit 10, tag unit 20, searching unit 30, computing unit 40 and generation unit 50, where the generation unit 50 includes a comparison module 501, a computing module 502, a second judging module 504, a second transmission module 506 and a third generation module 508.
In the embodiment shown in Fig. 4, the acquiring unit 10, tag unit 20, searching unit 30 and computing unit 40 have the same functions as in the embodiment shown in Fig. 1 and are not described again here.
The comparison module 501 compares the found webpage template data with the webpage data to obtain the webpage incremental encoding data. One or several webpage templates may be found; when several are found, the incremental encoding data between each of them and the webpage data can be compared, either one template at a time or for all found templates simultaneously.
The computing module 502 calculates the ratio of the webpage incremental encoding data to the webpage data, i.e., the size of the incremental encoding data divided by the size of the webpage data. The smaller this ratio, the more similar the webpage template and the webpage; the larger the ratio, the less similar they are.
The second judging module 504 judges whether the ratio of the webpage incremental encoding data to the webpage data is less than the set ratio threshold. This judgment determines whether the similarity between the webpage and the webpage template meets the needs of the webpage.
The second transmission module 506 adds, when the ratio of the webpage incremental encoding data to the webpage data is less than the set ratio threshold, the webpage to the webpage set covered by the webpage template corresponding to the incremental encoding data. In that case the similarity between the webpage and the template meets the webpage's requirements, so the webpage is added to the template's covered set. When the template is needed again, it can be used directly and the incremental encoding data transmitted based on it, without generating a template again.
The third generation module 508 generates a new webpage template when the ratio of the webpage incremental encoding data to the webpage data is greater than or equal to the set ratio threshold. In that case the similarity between the webpage and the template does not meet the webpage's requirements, so a new webpage template can be generated, or the webpage itself can be used directly as the new template.
Comparing the ratio of incremental encoding data to webpage data with a set ratio threshold allows the similarity between a webpage and a webpage template to be judged more accurately. A single ratio threshold applies to webpages of different data sizes, so there is no need to set different thresholds on the incremental encoding data for pages of different sizes. This improves the applicability and convenience of transmitting webpage delta files based on webpage templates.
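The ratio test performed by modules 504, 506 and 508 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the threshold value 0.3 and the names `Decision` and `decide_by_ratio` are hypothetical, and sizes are plain byte counts supplied by the caller.

```python
from enum import Enum


class Decision(Enum):
    REUSE_TEMPLATE = "reuse"  # add the page to the template's covered set
    NEW_TEMPLATE = "new"      # generate a new template (e.g. the page itself)


def decide_by_ratio(delta_size: int, page_size: int,
                    ratio_threshold: float = 0.3) -> Decision:
    """Compare (incremental encoding data / webpage data) with a set ratio threshold.

    The threshold value 0.3 is a hypothetical default, not taken from the patent.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    ratio = delta_size / page_size
    # Below the threshold: similar enough to reuse; otherwise generate a new template
    return Decision.REUSE_TEMPLATE if ratio < ratio_threshold else Decision.NEW_TEMPLATE
```

With a 2,000-byte delta on a 10,000-byte page and a 40,000-byte delta on a 200,000-byte page, both ratios are 0.2, so the same threshold accepts both; a fixed absolute threshold of, say, 5,000 bytes would have rejected the larger page.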
Fig. 5 is a schematic diagram of the webpage template generation device for realizing incremental transmission according to a fifth embodiment of the present invention. The device of the embodiment shown in Fig. 5 can serve as a preferred embodiment of the device of the embodiment shown in Fig. 1. The device of this preferred embodiment comprises an acquiring unit 10, a tag unit 20, a searching unit 30, a computing unit 40, a generation unit 50, a judging unit 60, and a combining unit 70.
The acquiring unit 10, tag unit 20, searching unit 30, computing unit 40, and generation unit 50 of the embodiment shown in Fig. 5 have the same functions as the corresponding units of the embodiment shown in Fig. 1 and are not described again here.
Judging unit 60 is used to judge whether two hash value labels correspond to the same webpage template. One hash value label can correspond to one or more webpages; if two hash value labels correspond to the same webpage template, the webpages corresponding to both labels also correspond to that same template.
Combining unit 70 is used, when two hash value labels correspond to the same webpage template, to obtain the webpages corresponding to the two labels and add them to the set of webpages covered by that template. Because one hash value label can correspond to multiple webpages, and those webpages can correspond to multiple webpage templates, the webpages may be filed under different templates in the webpage template table. If the two labels correspond to the same template, the webpages of both labels are added to the set of webpages covered by that template in the webpage template table.
Merging webpages and updating the webpage template table in this way groups webpages that may use the same webpage template into the set of webpages covered by that template. The next time the webpage template table is queried, the required template can be found more easily and invoked directly, avoiding the system overhead of generating it again.
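A minimal sketch of what judging unit 60 and combining unit 70 do, assuming the webpage template table is held as plain dictionaries: `label_to_template` (hash value label to template id), `label_to_pages` (label to webpage URLs) and `covered` (template id to the set of webpages it covers). These structures and the function name are hypothetical; the patent does not specify a storage layout.

```python
from typing import Dict, Set


def merge_if_same_template(label_a: str, label_b: str,
                           label_to_template: Dict[str, str],
                           label_to_pages: Dict[str, Set[str]],
                           covered: Dict[str, Set[str]]) -> bool:
    """Merge the pages of two labels into one covered set if they share a template.

    Returns True when a merge happened.
    """
    template_a = label_to_template.get(label_a)
    template_b = label_to_template.get(label_b)
    # Judging unit: do both labels point at one and the same template?
    if template_a is None or template_a != template_b:
        return False
    # Combining unit: file every page of both labels under that template
    pages = label_to_pages.get(label_a, set()) | label_to_pages.get(label_b, set())
    covered.setdefault(template_a, set()).update(pages)
    return True
```

For example, if labels `a` and `b` both map to template `T1`, their pages end up together in `covered["T1"]`, while a label mapping to a different template leaves the table unchanged.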
An embodiment of the present invention further provides a webpage template generation method for realizing incremental transmission. The method can be performed by the webpage template generation device for realizing incremental transmission provided by the embodiments of the present invention, and that device can likewise be used to perform the method.
The webpage template generation method for realizing incremental transmission is described in detail below with reference to the accompanying drawings. It should be noted that the steps illustrated in the flowcharts can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Fig. 6 is a flowchart of the webpage template generation method for realizing incremental transmission according to an embodiment of the present invention. The method of this embodiment is described below with reference to the flowchart. As shown, the method comprises the following steps:
Step S101: obtain the webpage data of a webpage. The webpage data of any webpage can be obtained, and it can include the data of all content on the page, such as news data, advertisement data, and link data.
The webpage data can be obtained by first obtaining the webpage address and then retrieving the corresponding webpage data from a webpage data table using that address. The webpage data table can store a field part and a description part. For example, the stored field is "URL", and its description is "the webpage address with the protocol part removed, the anchor part removed, and the domain name reversed label by label". For example, for http://www.sina.com.cn/a/b.php?ac=b#ab, the corresponding stored value can be:
cn.com.sina.www/a/b.php?ac=b
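The "URL" field normalization can be sketched with Python's standard `urllib.parse.urlsplit`. This is an illustration under assumptions the description does not address: the query string is kept, the host is lowercased, and any port is dropped. `normalize_url` is a hypothetical name.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Drop the protocol and anchor, and reverse the domain name label by label."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # lowercased host without port (assumption)
    reversed_host = ".".join(reversed(host.split(".")))
    key = reversed_host + parts.path
    if parts.query:              # the query string is kept, as in the example
        key += "?" + parts.query
    return key                   # parts.fragment (the anchor) is discarded


print(normalize_url("http://www.sina.com.cn/a/b.php?ac=b#ab"))
# prints cn.com.sina.www/a/b.php?ac=b
```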
Step S102: generate a hash value label for the webpage data. One hash value can be generated per webpage data. The generated hash value can be 64 bits or 128 bits, with an appropriate bit length chosen according to the needs of the system. For example, in a system where a 64-bit hash value already meets the requirements, generating a 128-bit hash value would only add to the system's burden, so the hash length is chosen to suit the system.
A hash value label is generated from the hash value of the webpage data. Each hash value label can correspond to one webpage or to multiple webpages.
Step S103: search for the webpage template corresponding to the hash value label. A webpage template can itself be a webpage: one webpage can serve as the template for another, and each hash value label corresponds to one webpage template. Webpage templates can be kept in a cache, and searching unit 30 searches the cache for the template corresponding to the hash value label.
Step S104: calculate the incremental encoding data between the webpage template found and the webpage. The incremental encoding data can be the portions of data that differ between the webpage data and the webpage template data. If multiple webpage templates are found, the incremental encoding data between each template and the webpage is calculated.
Step S105: determine, according to the calculated incremental encoding data, whether to generate a new webpage template. If the calculated incremental encoding data exceeds the set threshold, a new webpage template is generated; if it is less than or equal to the set threshold, the cached webpage template is invoked directly.
As described above, the webpage template corresponding to a hash value label is looked up, the incremental encoding data between that template and the webpage is calculated, and on that basis it is decided whether to invoke the cached template directly or to generate a new one. This reduces the number of comparisons between webpage templates and webpages and saves system overhead.
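Steps S103 to S105 can be sketched as a single function. This is a simplified illustration under assumptions: the template cache is a dictionary from hash value label to template content, the caller supplies both the labels and a `delta_size` function returning the size of the incremental encoding data, and a new template is simply the page itself registered under all of its labels. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def process_page(page: str,
                 labels: List[str],
                 template_cache: Dict[str, str],
                 delta_size: Callable[[str, str], int],
                 threshold: int) -> Tuple[str, str]:
    """Sketch of steps S103-S105 for one page whose labels were built in S102."""
    # S103: collect every cached template reachable through the page's labels
    candidates = {template_cache[label] for label in labels if label in template_cache}
    # S104: measure the incremental data against each candidate, keep the smallest
    best = min(candidates, key=lambda tpl: delta_size(tpl, page), default=None)
    # S105: reuse the cached template if the delta is within the threshold
    if best is not None and delta_size(best, page) <= threshold:
        return "reuse", best
    # Otherwise the page itself becomes the new template for all of its labels
    for label in labels:
        template_cache[label] = page
    return "new", page
```

Only the pages reachable through a label are ever compared, which is where the saving in comparisons comes from.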
Fig. 7 is a flowchart of generating hash value labels in the webpage template generation method for realizing incremental transmission according to an embodiment of the present invention. The method comprises the following steps:
Step S201: generate a hash value from the webpage data. The hash value calculated from the webpage data can be a simhash value. The generation method is described in detail below, taking a 64-bit simhash value as an example.
First, initialize a 64-dimensional integer vector V[i] to 0, that is, set every component of V to 0.
Second, starting from each byte position of the webpage, cut a substring of n bytes; these substrings form the feature set of the webpage. The number of features in the feature set can equal the number of bytes in the whole page, and each feature is an n-byte string. n can be 64, 32, or another value such as 20.
Third, for each feature in the feature set, use a string hash function to produce a 64-bit binary integer. For each bit position i of that 64-bit integer: if the bit is 1, add 1 to V[i]; otherwise, subtract 1 from V[i].
Finally, create a new 64-bit integer whose bits correspond one-to-one to the components of the vector from the previous step: for each bit position i, set the bit to 1 if V[i] is not less than 0, and to 0 otherwise. The resulting 64-bit binary integer is the simhash value.
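The four simhash steps above can be sketched in Python as follows. The description leaves the string hash function open; this sketch assumes BLAKE2b from `hashlib` with an 8-byte digest as the 64-bit feature hash. Following the text, it takes one feature per byte position, so features near the end of the page are shorter than n bytes. `simhash` and `feature_hash` are hypothetical names, and `bits` must be a multiple of 8 (for example 64 or 128, matching step S102).

```python
import hashlib


def feature_hash(feature: bytes, bits: int = 64) -> int:
    """Hash one feature to a `bits`-bit integer (BLAKE2b is an assumed choice)."""
    digest = hashlib.blake2b(feature, digest_size=bits // 8).digest()
    return int.from_bytes(digest, "big")


def simhash(data: bytes, n: int = 32, bits: int = 64) -> int:
    """Compute a simhash value of `data` following the four steps above."""
    # Step 1: integer vector V with one counter per bit, all set to 0
    v = [0] * bits
    # Step 2: one n-byte feature starting at every byte position
    for start in range(len(data)):
        h = feature_hash(data[start:start + n], bits)
        # Step 3: add 1 to V[i] where bit i is 1, subtract 1 where it is 0
        for i in range(bits):
            v[i] += 1 if (h >> i) & 1 else -1
    # Step 4: bit i of the result is 1 when V[i] is not less than 0
    result = 0
    for i in range(bits):
        if v[i] >= 0:
            result |= 1 << i
    return result
```

Because each bit is a majority vote over all features, pages that share most of their n-byte substrings tend to receive simhash values that differ in only a few bits.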
Step S202: permute the bits of the hash value according to preset rules. For example, 32 random bitwise permutations can be applied to the 64-bit hash value.
It should be noted that the number of permutations can be determined according to the needs of the actual system and is not limited to the number given in this embodiment. The number given here only serves to explain the implementation of the invention and is neither exhaustive nor limiting.
Step S203: obtain multiple hash value labels from the domain name and the permuted hash values. After the hash value is permuted, a fixed-length prefix of each permuted hash value is taken and combined with the domain name to form a hash value label. For example, applying 32 random bitwise permutations to the 64-bit hash value yields 32 permuted hash values. The first 16 bits of each permuted hash value, together with the domain name of the webpage, can form a hash value label of the form "domain name/first 16 bits of the permuted hash value". Thus 32 random permutations of the 64-bit hash value yield 32 hash value labels.
Note that the first 32 or 8 bits of the permuted hash value could be taken instead of the first 16. The value used here only serves to explain the solution more clearly and does not unduly limit the invention.
Step S204: look up the template table according to the hash value labels to obtain the webpage templates corresponding to the multiple labels. There can be one or several such templates. The searching unit can perform n lookups, where n does not exceed the number of permutations; for example, after 32 random permutations of the hash value, at most 32 lookups are performed.
Generating a hash value from the webpage data, deriving hash value labels from it, and looking up webpage templates by those labels can greatly reduce the number of template queries, increase the system's data processing speed, and improve the accuracy of template lookup.
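Steps S202 to S204 can be sketched as below, in the spirit of the well-known permuted-simhash lookup. Because the permutations are fixed, near-duplicate pages whose simhash values differ in only a few bits are likely to share the leading 16 bits of at least one permuted value, and therefore a label. Here a seeded pseudo-random generator stands in for the unspecified "preset rules"; the seed, the hexadecimal formatting of the prefix, and all function names are assumptions.

```python
import random
from typing import Dict, List, Set


def make_permutations(count: int = 32, bits: int = 64, seed: int = 2013) -> List[List[int]]:
    """S202 'preset rules': `count` fixed bit permutations from a seeded generator."""
    rng = random.Random(seed)
    perms = []
    for _ in range(count):
        perm = list(range(bits))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def permute_bits(value: int, perm: List[int]) -> int:
    """Bit i of the result is bit perm[i] of `value`."""
    out = 0
    for i, src in enumerate(perm):
        if (value >> src) & 1:
            out |= 1 << i
    return out


def hash_labels(domain: str, value: int, perms: List[List[int]],
                bits: int = 64, prefix_bits: int = 16) -> List[str]:
    """S203: one 'domain/prefix' label per permutation."""
    labels = []
    for perm in perms:
        # keep the leading `prefix_bits` bits of the permuted hash value
        prefix = permute_bits(value, perm) >> (bits - prefix_bits)
        labels.append(f"{domain}/{prefix:0{prefix_bits // 4}x}")
    return labels


def find_templates(labels: List[str], template_table: Dict[str, str]) -> Set[str]:
    """S204: one table lookup per label, so never more lookups than permutations."""
    found = set()
    for label in labels:
        template = template_table.get(label)
        if template is not None:
            found.add(template)
    return found
```

The same permutation list must be reused for every page; generating fresh permutations per page would make labels of similar pages incomparable.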
Fig. 8 is a flowchart of determining whether to generate a webpage template in the webpage template generation method for realizing incremental transmission according to an embodiment of the present invention. The method comprises the following steps:
Step S301: compare the webpage template data found with the webpage data to obtain the webpage's incremental encoding data. One or more webpage templates may be found; when there are several, the incremental encoding data between each template and the webpage is computed.
Step S302: judge whether the webpage's incremental encoding data exceeds the set threshold. Larger incremental encoding data means lower similarity between the webpage and the webpage template. When the incremental encoding data exceeds the threshold, the template found cannot meet the webpage's requirements and cannot serve as its template, which is why the incremental encoding data must be compared with the threshold.
Step S303: if the webpage's incremental encoding data is less than or equal to the set threshold, transmit the incremental encoding data based on the corresponding webpage template. In that case the webpage is highly similar to the template found, the template meets the webpage's requirements, and the incremental encoding data can be transmitted based on it.
Step S304: if the webpage's incremental encoding data exceeds the set threshold, generate a new webpage template. In that case the similarity between the webpage and the template found does not meet the webpage's requirements, so a new webpage template is generated; the acquired webpage itself can be used as the new template.
Deciding between invoking the cached webpage template and generating a new one by directly comparing the size of the incremental encoding data with the set threshold makes that decision simple and accurate.
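Steps S301 to S304 can be illustrated with the standard `difflib.SequenceMatcher`. This is a stand-in, not the patent's encoder: a production system would use a real delta format (for example VCDIFF), whereas here the size of the incremental encoding data is approximated by the number of page characters that fall outside blocks matched against the template. The threshold and function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def delta_size(template: str, page: str) -> int:
    """Approximate incremental data size: page characters not matched in `template`."""
    matcher = SequenceMatcher(None, template, page, autojunk=False)
    return sum(j2 - j1 for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
               if tag in ("replace", "insert"))


def choose_template(page: str, templates: Iterable[str],
                    threshold: int) -> Tuple[str, Optional[str], int]:
    """Return (action, template, size) following steps S301-S304."""
    best: Optional[str] = None
    best_delta = 0
    for tpl in templates:                       # S301: one delta per candidate
        d = delta_size(tpl, page)
        if best is None or d < best_delta:
            best, best_delta = tpl, d
    if best is not None and best_delta <= threshold:   # S302-S303
        return "transmit_delta", best, best_delta
    return "new_template", page, len(page)             # S304: page becomes template
```

For instance, `choose_template("abcXef", ["abcdef", "zzzzzz"], 2)` picks `"abcdef"` with a delta of 1 and transmits the delta, while an unrelated page exceeds the threshold and becomes a new template.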
Fig. 9 is a flowchart of a preferred method for determining whether to generate a webpage template in the webpage template generation method for realizing incremental transmission according to an embodiment of the present invention. The method comprises the following steps:
Step S401: compare the webpage template data found with the webpage data to obtain the webpage's incremental encoding data. One or more webpage templates may be found; when there are several, the incremental encoding data between each template and the webpage is computed.
Step S402: calculate the ratio of the webpage's incremental encoding data to the webpage data, that is, divide the incremental encoding data by the webpage data. The smaller the ratio, the more similar the webpage template is to the webpage; the larger the ratio, the less similar they are.
Step S403: judge whether the ratio of the webpage's incremental encoding data to the webpage data is less than the set ratio threshold. This check determines whether the similarity between the webpage and the webpage template meets the needs of the webpage.
Step S404: if the ratio of the webpage's incremental encoding data to the webpage data is less than the set ratio threshold, add the webpage to the set of webpages covered by the webpage template corresponding to the incremental encoding data. In that case the similarity between the webpage and the template meets the webpage's requirements; when the template is needed again, it can be invoked directly and the incremental encoding data transmitted based on it.
Step S405: if the ratio of the webpage's incremental encoding data to the webpage data is greater than or equal to the set ratio threshold, generate a new webpage template. In that case the similarity between the webpage and the template does not meet the webpage's requirements, so a new webpage template can be generated; the webpage itself can also be used directly as the new template.
Comparing the ratio of incremental encoding data to webpage data with a set ratio threshold allows the similarity between a webpage and a webpage template to be judged more accurately. A single ratio threshold applies to webpages of different data sizes, so there is no need to set different thresholds on the incremental encoding data for pages of different sizes. This improves the applicability and convenience of transmitting webpage delta files based on webpage templates.
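The ratio-based variant (steps S401 to S405), including maintenance of each template's covered webpage set, can be sketched with a small registry. Assumptions: the delta size for each candidate template is computed elsewhere (for instance as in step S401), templates are keyed by an id, the ratio threshold defaults to a hypothetical 0.3, and a page that becomes a new template is registered under its own URL. `TemplateRegistry` and `assign` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class TemplateRegistry:
    """Hypothetical in-memory store of templates and the webpages each covers."""
    templates: Dict[str, str] = field(default_factory=dict)     # template id -> content
    covered: Dict[str, Set[str]] = field(default_factory=dict)  # template id -> page URLs

    def assign(self, url: str, page: str, deltas: Dict[str, int],
               ratio_threshold: float = 0.3) -> str:
        """Return the id of the template that will serve `page`.

        `deltas` maps each candidate template id to the size of the incremental
        encoding data of `page` against that template (S401).
        """
        if not page:
            raise ValueError("page must be non-empty")
        if deltas:
            best_id = min(deltas, key=deltas.get)               # most similar candidate
            if deltas[best_id] / len(page) < ratio_threshold:   # S402-S403
                self.covered.setdefault(best_id, set()).add(url)  # S404
                return best_id
        # S405: ratio too large (or no candidate), so the page becomes a new template
        self.templates[url] = page
        self.covered.setdefault(url, set()).add(url)
        return url
```

Recording the page in the covered set, rather than only returning a yes/no decision, is what lets a later query of the template table find and reuse the template directly.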
Fig. 10 is a flowchart of the webpage template generation method for realizing incremental transmission according to a second embodiment of the present invention. The method comprises the following steps:
Step S501: obtain the webpage data of a webpage. The webpage data of any webpage can be obtained, and it can include the data of all content on the page.
Step S502: generate a hash value label for the webpage data. One hash value can be generated per webpage data. The generated hash value can be 64 bits or 128 bits, with an appropriate bit length chosen according to the needs of the system.
Step S503: search for the webpage template corresponding to the hash value label. A webpage template can itself be a webpage, since one webpage can serve as the template for another, and there can be one or several webpage templates corresponding to the hash value label. Webpage templates can be kept in a cache, from which the template corresponding to the hash value label is retrieved when needed.
Step S504: judge whether two hash value labels correspond to the same webpage template. One hash value label can correspond to multiple webpages; if two hash value labels correspond to the same webpage template, the webpages corresponding to both labels also correspond to that same template.
Step S505: if two hash value labels correspond to the same webpage template, obtain the webpages corresponding to the two labels and add them to the set of webpages covered by that template. Because one hash value label corresponds to multiple webpages, and those webpages may correspond to multiple webpage templates, they may be filed under different templates in the webpage template table. If the two labels correspond to the same template, their webpages are filed under that same template in the webpage template table.
Adding webpages that correspond to the same webpage template to the set of webpages covered by that template, and updating the webpage template table accordingly, groups together the webpages that can share a template. The next time the webpage template table is queried, the required template can be found more conveniently and quickly and invoked directly.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
1. A webpage template generation method for realizing incremental transmission, characterized by comprising:
obtaining webpage data of a webpage;
generating a hash value label for the webpage data;
searching for a webpage template corresponding to the hash value label;
calculating incremental encoding data between the webpage template found and the webpage; and
determining, according to the calculated incremental encoding data, whether to generate a new webpage template,
wherein generating the hash value label for the webpage data comprises: generating a hash value from the webpage data, permuting the hash value according to preset rules, and obtaining multiple hash value labels from the domain name and prefixes of the permuted hash values;
and searching for the webpage template corresponding to the hash value label comprises: looking up a template table according to the hash value labels to obtain the webpage templates corresponding to the multiple hash value labels.
2. The webpage template generation method for realizing incremental transmission according to claim 1, characterized in that determining, according to the calculated incremental encoding data, whether to generate a new webpage template comprises:
comparing the webpage template data found with the webpage data to obtain webpage incremental encoding data;
judging whether the webpage incremental encoding data is greater than a set threshold;
if the webpage incremental encoding data is less than or equal to the set threshold, transmitting the incremental encoding data based on the webpage template corresponding to the incremental encoding data;
if the webpage incremental encoding data is greater than the set threshold, generating a new webpage template.
3. The webpage template generation method for realizing incremental transmission according to claim 1, characterized in that determining, according to the calculated incremental encoding data, whether to generate a new webpage template comprises:
comparing the webpage template data found with the webpage data to obtain webpage incremental encoding data;
calculating the ratio of the webpage incremental encoding data to the webpage data;
judging whether the ratio of the webpage incremental encoding data to the webpage data is less than a set ratio threshold;
if the ratio of the webpage incremental encoding data to the webpage data is less than the set ratio threshold, adding the webpage to the set of webpages covered by the webpage template corresponding to the incremental encoding data;
if the ratio of the webpage incremental encoding data to the webpage data is greater than or equal to the set ratio threshold, generating a new webpage template.
4. The webpage template generation method for realizing incremental transmission according to claim 1, characterized in that, after searching for the webpage template corresponding to the hash value label, the method further comprises:
judging whether two hash value labels correspond to the same webpage template;
if the two hash value labels correspond to the same webpage template, obtaining the webpages corresponding to the two hash value labels and adding them to the set of webpages covered by that same webpage template.
5. A webpage template generation device for realizing incremental transmission, characterized by comprising:
an acquiring unit, configured to obtain webpage data of a webpage;
a tag unit, configured to generate a hash value label for the webpage data;
a searching unit, configured to search for a webpage template corresponding to the hash value label;
a computing unit, configured to calculate incremental encoding data between the webpage template found and the webpage; and
a generation unit, configured to determine, according to the calculated incremental encoding data, whether to generate a new webpage template,
wherein the tag unit comprises:
a first generation module, configured to generate a hash value from the webpage data;
a permutation module, configured to permute the hash value according to preset rules;
a label module, configured to obtain multiple hash value labels from the domain name and prefixes of the permuted hash values,
and wherein the searching unit is configured to look up a template table according to the hash value labels to obtain the webpage templates corresponding to the multiple hash value labels.
6. The webpage template generation device for realizing incremental transmission according to claim 5, characterized in that the generation unit comprises:
a comparison module, configured to compare the webpage template data found with the webpage data to obtain webpage incremental encoding data;
a first judging module, configured to judge whether the webpage incremental encoding data is greater than a set threshold;
a first transmission module, configured to transmit, when the webpage incremental encoding data is less than or equal to the set threshold, the incremental encoding data based on the webpage template corresponding to the incremental encoding data;
a second generation module, configured to generate a new webpage template when the webpage incremental encoding data is greater than the set threshold.
7. The webpage template generation device for realizing incremental transmission according to claim 5, characterized in that the generation unit comprises:
a comparison module, configured to compare the webpage template data found with the webpage data to obtain webpage incremental encoding data;
a computing module, configured to calculate the ratio of the webpage incremental encoding data to the webpage data;
a second judging module, configured to judge whether the ratio of the webpage incremental encoding data to the webpage data is less than a set ratio threshold;
a second transmission module, configured to add, when the ratio of the webpage incremental encoding data to the webpage data is less than the set ratio threshold, the webpage to the set of webpages covered by the webpage template corresponding to the incremental encoding data;
a third generation module, configured to generate a new webpage template when the ratio of the incremental encoding data to the webpage data is greater than or equal to the set ratio threshold.
8. The webpage template generation device for realizing incremental transmission according to claim 5, characterized in that the device further comprises:
a judging unit, configured to judge whether two hash value labels correspond to the same webpage template;
a combining unit, configured to obtain, when the two hash value labels correspond to the same webpage template, the webpages corresponding to the two hash value labels and to add them to the set of webpages covered by that same webpage template.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310612919.1A CN103593467B (en) | 2013-11-26 | 2013-11-26 | Method and device for generating webpage template and achieving incremental transmission |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103593467A CN103593467A (en) | 2014-02-19 |
CN103593467B true CN103593467B (en) | 2017-05-24 |
Family
ID=50083608
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310612919.1A Active CN103593467B (en) | 2013-11-26 | 2013-11-26 | Method and device for generating webpage template and achieving incremental transmission |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103593467B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066175B (en) * | 2017-04-18 | 2020-06-16 | 湖南福米信息科技有限责任公司 | Method and device for generating display interface of securities |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101950312A (en) * | 2010-08-18 | 2011-01-19 | 赵清政 | Method for analyzing webpage content of internet |
CN102129436A (en) * | 2010-01-20 | 2011-07-20 | 北大方正集团有限公司 | Method, system and device for constructing webpage template |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102651002B (en) | A kind of method for abstracting web page information and its system | |
CN102043862B (en) | Directional web data extraction method | |
CN102096712A (en) | Method and device for cache-control of mobile terminal | |
CN104462285B (en) | A kind of method for secret protection of Information Mobile Service inquiry system | |
CN102622444B (en) | XML (extensible markup language) message processing method and XML message processing device | |
CN103345493B (en) | Method that content of text on mobile terminal shows, Apparatus and system | |
CN102185901A (en) | Client message conversion method | |
CN104239162A (en) | Data check method and data check device | |
CN102708168A (en) | System and method for sorting search results of teaching resources | |
CN103491089B (en) | Code-transferring method and system in a kind of data convert based on HTTP | |
CN102184240B (en) | Webpage layout method and system based on mobile communication equipment terminal | |
CN104200380B (en) | The localization method and device of promotion message | |
CN102799655B (en) | The treating method and apparatus of imperfect picture information in a kind of webpage | |
CN106790444A (en) | Network data exchange method and device | |
CN104346443A (en) | Web text processing method and device | |
CN103593467B (en) | Method and device for generating webpage template and achieving incremental transmission | |
CN107784107A (en) | Dark chain detection method and device based on flight behavior analysis | |
CN103426325A (en) | Two-dimensional bar code electronic station board | |
CN103049445B (en) | A kind of method, system and picture state server for inquiring about pictorial information | |
CN103076894A (en) | Method and equipment for building input entries for object identity information according to object identity information | |
CN103631944B (en) | A kind of content-based similar webpage splitting method | |
CN107229653A (en) | Pseudo- static Web page generation method and device | |
CN113126980A (en) | Page generation method and device and electronic equipment | |
CN103970755A (en) | Novel catalog entry identification method, device and system | |
Rajkumar et al. | Dynamic web page segmentation based on detecting reappearance and layout of tag patterns for small screen devices |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20200610. Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province. Patentee after: Alibaba (China) Co.,Ltd. Address before: 100080, room 16, building 10-20, Building 29, Haidian District, Suzhou Street, Beijing. Patentee before: UC MOBILE Ltd. |