CN103593467B - Method and device for generating webpage template and achieving incremental transmission - Google Patents

Method and device for generating webpage template and achieving incremental transmission

Publication number
CN103593467B
Authority
CN
China
Prior art keywords
web page
webpage
page template
data
cryptographic hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310612919.1A
Other languages
Chinese (zh)
Other versions
CN103593467A (en)
Inventor
周向根
郑海洪
翟光亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Ucweb Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ucweb Inc filed Critical Ucweb Inc
Priority to CN201310612919.1A priority Critical patent/CN103593467B/en
Publication of CN103593467A publication Critical patent/CN103593467A/en
Application granted granted Critical
Publication of CN103593467B publication Critical patent/CN103593467B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/957: Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577: Optimising the visualization of content, e.g. distillation of HTML documents

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a method and device for generating a webpage template and achieving incremental transmission. The method includes the steps of obtaining webpage data of a webpage, generating a hash value label for the webpage data, searching for the webpage template corresponding to the hash value label, calculating and finding incremental code data between the webpage template and the webpage, and determining whether to generate a new webpage template or not according to the incremental code data obtained through calculation. By means of the method and the device, the problem that in the prior art, the system overhead is large when incremental transmission is achieved and the webpage template is generated is solved, and the effect of saving the system overhead is achieved.

Description

Method and device for generating a web page template for realizing incremental transmission
Technical field
The present invention relates to the field of browsers, and in particular to a method and device for generating a web page template for realizing incremental transmission.
Background technology
For a mobile phone browser with a client/server (C/S) architecture, when a user browses a web page, the browser caches web page templates locally, so the server only needs to transmit the incremental encoding data of the web page. This saves network data transmission and improves browsing speed.
In practice, not every web page uses a cached web page template. Whether a cached template is used usually depends on the size of the incremental encoding data between the template and the web page: if the incremental encoding data is small, the cached template is used; if it is not small enough, the cached template is not used and a new web page template is created. In the prior art, deciding on the basis of the size of the incremental encoding data whether a new web page template needs to be generated requires calculating the incremental encoding data of every cached web page template against the web page. If a large number of web page templates are cached, the system overhead is high.
No effective solution has yet been proposed for the prior-art problem of high system overhead when generating web page templates for incremental transmission.
Summary of the invention
The main object of the present invention is to provide a method and device for generating a web page template for realizing incremental transmission, so as to solve the prior-art problem of high system overhead when generating web page templates for incremental transmission.
To achieve the above object, according to one aspect of the present invention, a method for generating a web page template for realizing incremental transmission is provided. The method includes: obtaining the web data of a web page; generating hash value labels for the web data; searching for the web page template corresponding to the hash value labels; calculating the incremental encoding data between the web page template found and the web page; and determining, according to the calculated incremental encoding data, whether to generate a new web page template.
Further, generating hash value labels for the web data includes: generating a hash value from the web data, permuting the hash value according to a preset rule, and obtaining multiple hash value labels from the domain name and a prefix of each permuted hash value. Searching for the web page template corresponding to the hash value labels includes: looking up a template table by hash value label to obtain the web page templates corresponding to the multiple hash value labels.
Further, determining whether to generate a new web page template according to the calculated incremental encoding data includes: comparing the data of the web page template found with the web data to obtain the incremental encoding data of the web page; judging whether the incremental encoding data of the web page is greater than a set threshold; if the incremental encoding data is less than or equal to the set threshold, transmitting the incremental encoding data based on the web page template to which it corresponds; and if the incremental encoding data is greater than the set threshold, generating a new web page template.
Further, determining whether to generate a new web page template according to the calculated incremental encoding data includes: comparing the data of the web page template found with the web data to obtain the incremental encoding data of the web page; calculating the ratio of the incremental encoding data to the web data; judging whether this ratio is less than a set ratio threshold; if the ratio is less than the set ratio threshold, adding the web page to the set of web pages covered by the web page template to which the incremental encoding data corresponds; and if the ratio is greater than or equal to the set ratio threshold, generating a new web page template.
Further, after searching for the web page template corresponding to the hash value labels, the method also includes: judging whether two hash value labels correspond to the same web page template; and, if two hash value labels correspond to the same web page template, obtaining the web pages corresponding to the two hash value labels and adding them to the set of web pages covered by that web page template.
To achieve the above object, according to another aspect of the present invention, a device for generating a web page template for realizing incremental transmission is provided. The device includes: an acquiring unit for obtaining the web data of a web page; a label unit for generating hash value labels for the web data; a searching unit for searching for the web page template corresponding to the hash value labels; a calculating unit for calculating the incremental encoding data between the web page template found and the web page; and a generating unit for determining, according to the calculated incremental encoding data, whether to generate a new web page template.
Further, the label unit includes: a first generating module for generating a hash value from the web data; a permutation module for permuting the hash value according to a preset rule; and a label module for obtaining multiple hash value labels from the domain name and a prefix of each permuted hash value, wherein the searching unit is used to look up a template table by hash value label to obtain the web page templates corresponding to the multiple hash value labels.
Further, the generating unit includes: a comparison module for comparing the data of the web page template found with the web data to obtain the incremental encoding data of the web page; a first judging module for judging whether the incremental encoding data is greater than a set threshold; a first transmission module for transmitting the incremental encoding data based on the web page template to which it corresponds when the incremental encoding data is less than or equal to the set threshold; and a second generating module for generating a new web page template when the incremental encoding data is greater than the set threshold.
Further, the generating unit includes: a comparison module for comparing the data of the web page template found with the web data to obtain the incremental encoding data of the web page; a calculation module for calculating the ratio of the incremental encoding data to the web data; a second judging module for judging whether this ratio is less than a set ratio threshold; a second transmission module for adding the web page to the set of web pages covered by the web page template to which the incremental encoding data corresponds when the ratio is less than the set ratio threshold; and a third generating module for generating a new web page template when the ratio is greater than or equal to the set ratio threshold.
Further, the device also includes: a judging unit for judging whether two hash value labels correspond to the same web page template; and a merging unit for, when two hash value labels correspond to the same web page template, obtaining the web pages corresponding to the two hash value labels and adding them to the set of web pages covered by that web page template.
With the present invention, a fixed number of hash value labels is generated from the hash value of the web data, and web page templates are looked up by these labels. The size of the incremental encoding data only needs to be judged for the templates found, rather than calculated for all web page templates. This solves the prior-art problem of high system overhead when generating web page templates for incremental transmission and thereby saves system overhead.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention. The illustrative embodiments of the invention and their description serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a first embodiment of the present invention;
Fig. 2 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a second embodiment of the present invention;
Fig. 3 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a third embodiment of the present invention;
Fig. 4 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a fourth embodiment of the present invention;
Fig. 5 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a fifth embodiment of the present invention;
Fig. 6 is a flow chart of the web page template generation method for realizing incremental transmission according to an embodiment of the present invention;
Fig. 7 is a flow chart of generating hash value labels in the web page template generation method for realizing incremental transmission according to an embodiment of the present invention;
Fig. 8 is a flow chart of determining whether to generate a web page template in the web page template generation method for realizing incremental transmission according to an embodiment of the present invention;
Fig. 9 is a flow chart of a preferred method of determining whether to generate a web page template in the web page template generation method for realizing incremental transmission according to an embodiment of the present invention; and
Fig. 10 is a flow chart of the web page template generation method for realizing incremental transmission according to a second embodiment of the present invention.
Detailed description of the embodiments
It should be noted that, provided they do not conflict, the embodiments in this application and the features in those embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not used to describe a specific order or sequence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the invention described here can be implemented in an order other than the one illustrated or described. In addition, the terms "comprising" and "having", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of the present invention provides a web page template generating device for realizing incremental transmission. The device generates web page templates so that web page content can be transmitted incrementally.
Fig. 1 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a first embodiment of the present invention. As shown in the figure, the device includes an acquiring unit 10, a label unit 20, a searching unit 30, a calculating unit 40, and a generating unit 50.
The acquiring unit 10 is used to obtain the web data of a web page. The acquiring unit 10 can obtain the web data of any web page, and the web data obtained may include all the content of the page, such as news data, advertisement data, and link data in the page.
To obtain the web data, the web page address may be obtained first, and the corresponding web data is then read from a web data table according to that address. The web data table may store a field part and a description part for the web data. For example, the stored field is "URL" and its description is "the web page address with the protocol part removed, the anchor part removed, and the domain name reversed label by label". For example, for http://www.sina.com.cn/a/b.php?ac=b#ab the corresponding description part would be: cn.com.sina.www/a/b.php?ac=b.
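The address normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `normalize_address` is hypothetical, and the standard-library `urllib.parse.urlsplit` is used to separate the parts of the address; the patent does not say how the address is parsed.

```python
from urllib.parse import urlsplit

def normalize_address(url: str) -> str:
    """Drop the protocol and the anchor, and reverse the domain name label by label."""
    parts = urlsplit(url)  # scheme, netloc, path, query, fragment
    reversed_domain = ".".join(reversed(parts.netloc.split(".")))
    key = reversed_domain + parts.path
    if parts.query:
        key += "?" + parts.query  # the query string is kept; the #anchor is dropped
    return key

print(normalize_address("http://www.sina.com.cn/a/b.php?ac=b#ab"))
# prints cn.com.sina.www/a/b.php?ac=b
```

Reversing the domain puts the most general label first, so addresses from the same site share a common prefix in the table.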
The label unit 20 is used to generate hash value labels for the web data. One hash value can be generated per item of web data. The hash value may be 64 bits or 128 bits long, and a suitable length can be chosen according to the needs of the system. For example, if a 64-bit hash value already meets the system's requirements, generating a 128-bit hash value would only add to the load on the system, so it is sufficient to generate a hash value of suitable length.
The label unit 20 can generate multiple hash value labels from the hash value of the web data. Each hash value label corresponds to one web page template, and that template covers one or more web pages.
The searching unit 30 is used to search for the web page template corresponding to a hash value label. A web page template may itself be a web page, since one web page can serve as the template for another. One web page template corresponds to multiple hash value labels and can be found through any one of them. The web page templates may be stored in a cache, in which case the searching unit 30 looks up the template corresponding to a hash value label in the cache.
The calculating unit 40 is used to calculate the incremental encoding data between the web page template found and the web page. This incremental encoding data may be the part of the data in which the web data and the web page template data differ. If several web page templates are found, the incremental encoding data between each of them and the web page is calculated.
The generating unit 50 is used to determine, according to the calculated incremental encoding data, whether to generate a new web page template. If the calculated incremental encoding data is greater than the set threshold, a new web page template is generated; if it is less than or equal to the set threshold, the cached web page template is used directly.
As the above description shows, web page templates are looked up by hash value label, the incremental encoding data between the templates found and the web page is calculated, and it is then decided whether to use a cached template directly or to generate a new one. This reduces the number of comparisons between web page templates and the web page and saves system overhead.
Fig. 2 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a second embodiment of the present invention. The device of the embodiment shown in Fig. 2 can serve as a preferred embodiment of the device shown in Fig. 1. The device of this preferred embodiment includes an acquiring unit 10, a label unit 20, a searching unit 30, a calculating unit 40, and a generating unit 50, wherein the label unit 20 includes a first generating module 201, a permutation module 202, and a label module 203.
The acquiring unit 10, searching unit 30, calculating unit 40, and generating unit 50 in the embodiment shown in Fig. 2 have the same functions as the acquiring unit 10, searching unit 30, calculating unit 40, and generating unit 50 in the embodiment shown in Fig. 1, and are not described again here.
The first generating module 201 is used to generate a hash value from the web data. The hash value calculated from the web data may be a simhash value, simhash being one kind of locality-sensitive hashing algorithm. The method of generating the hash value is described in detail below, taking a 64-bit simhash value as an example.
First, a 64-dimensional integer vector V[i] is initialized to 0, that is, every component of the vector is set to 0.
Second, starting at each byte position of the web page, a substring of n bytes is cut out, and these substrings form the feature set of the web page. The number of features in the feature set can equal the number of bytes in the whole web page. Each feature in the feature set is a string of n bytes. n may be 64, 32, 20, or another value.
Next, for each feature in the feature set, a string hash function is used to produce a 64-bit binary integer, and for each bit position i of this 64-bit integer: if the bit is 1, V[i] is increased by 1; otherwise V[i] is decreased by 1.
Finally, a new 64-bit integer is created whose bits correspond one-to-one to the components of the vector obtained in the previous step: for each bit position i of the new integer, the bit is set to 1 if V[i] from the previous step is not less than 0, and to 0 otherwise. The resulting 64-bit binary integer is the simhash value.
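The four steps above can be sketched in Python as follows. The names `feature_hash64` and `simhash64` are hypothetical, and the string hash function is an assumption: the patent only requires some hash that yields a 64-bit integer, so an 8-byte BLAKE2b digest from the standard library stands in for it here. Features near the end of the page are shorter than n bytes in this sketch, so that there is one feature per byte position.

```python
import hashlib

def feature_hash64(feature: bytes) -> int:
    # Assumed string hash function: an 8-byte BLAKE2b digest read as a 64-bit integer.
    return int.from_bytes(hashlib.blake2b(feature, digest_size=8).digest(), "big")

def simhash64(data: bytes, n: int = 20) -> int:
    v = [0] * 64                          # step 1: 64-dimensional vector, all zeros
    for start in range(len(data)):        # step 2: one feature per byte position
        h = feature_hash64(data[start:start + n])
        for i in range(64):               # step 3: each bit votes +1 or -1
            v[i] += 1 if (h >> i) & 1 else -1
    result = 0
    for i in range(64):                   # step 4: bit i is 1 when V[i] is not less than 0
        if v[i] >= 0:
            result |= 1 << i
    return result

page = b"<html><body><h1>News</h1><p>Some article text</p></body></html>"
print(f"{simhash64(page):016x}")
```

Because every feature contributes one vote per bit, changing a small part of the page changes only a few features and tends to flip only a few bits of the result, which is the property the label lookup relies on.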
The permutation module 202 is used to apply random bitwise permutations to the hash value according to a preset rule. For example, 32 random permutations may be applied to a 64-bit hash value.
It should be noted that the number of permutations can be chosen according to the needs of the actual system and is not limited to the number given in this embodiment. The number of permutations given here serves only to illustrate the embodiment of the invention and is neither exhaustive nor limiting.
The label module 203 is used to obtain multiple hash value labels from the domain name and the permuted hash values. After the hash value has been permuted, a fixed number of leading bits of each permuted hash value is taken and combined with the domain name to form a hash value label. For example, applying 32 random bitwise permutations to a 64-bit hash value yields 32 permuted hash values. The first 16 bits of each permuted hash value and the domain name of the web page may be taken as a hash value label, in the form "domain name/first 16 bits of the permuted hash value". In this way, 32 random bitwise permutations of a 64-bit hash value yield 32 hash value labels.
It should be pointed out that, instead of the first 16 bits of the permuted hash value, the first 32 bits or the first 8 bits may also be taken. The value used here only serves to explain the solution of the present invention more clearly and does not unduly limit it.
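A minimal sketch of the permutation and labelling step is given below. Several details are assumptions rather than part of the source: the "preset rule" is modelled as permutations drawn once from a seeded `random.Random`, the "first 16 bits" are taken to be the most significant bits, and the prefix is written as four hexadecimal digits. All function names are hypothetical.

```python
import random

HASH_BITS = 64
NUM_PERMUTATIONS = 32
PREFIX_BITS = 16

def make_permutations(seed=2013):
    """Build a fixed list of bit orders; the same seed gives the same preset rule."""
    rng = random.Random(seed)
    perms = []
    for _ in range(NUM_PERMUTATIONS):
        order = list(range(HASH_BITS))
        rng.shuffle(order)
        perms.append(order)
    return perms

def permute_bits(value, order):
    """Move bit order[dst] of value to position dst."""
    out = 0
    for dst, src in enumerate(order):
        out |= ((value >> src) & 1) << dst
    return out

def hash_labels(domain, hash_value, perms):
    """One label per permutation: 'domain/prefix of the permuted hash value'."""
    labels = []
    for order in perms:
        permuted = permute_bits(hash_value, order)
        prefix = permuted >> (HASH_BITS - PREFIX_BITS)  # leading 16 bits (assumed: most significant)
        labels.append(f"{domain}/{prefix:04x}")
    return labels

labels = hash_labels("cn.com.sina.www", 0x0123456789ABCDEF, make_permutations())
print(len(labels), labels[:3])
```

Two pages whose hash values differ in only a few bits are likely to agree on the 16-bit prefix under at least one of the 32 permutations, so they share at least one label.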
The searching unit 30 is used to look up the template table by hash value label and obtain the web page templates corresponding to the multiple hash value labels. There may be one such web page template or several. The searching unit may perform N lookups, where the number of lookups does not exceed the number of permutations. For example, after 32 random permutations have been applied to the hash value, at most 32 lookups are performed.
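The lookup can be illustrated with a plain dictionary standing in for the template table. The table layout, the template identifiers, and the function name `find_candidate_templates` are assumptions made for this sketch.

```python
def find_candidate_templates(labels, template_table):
    """Return the distinct template ids reachable from any of the labels."""
    candidates = []
    for label in labels:  # at most one lookup per permutation
        template_id = template_table.get(label)
        if template_id is not None and template_id not in candidates:
            candidates.append(template_id)
    return candidates

# Hypothetical table: several labels may point at the same template.
template_table = {
    "cn.com.sina.www/1a2b": "tpl-news",
    "cn.com.sina.www/ffee": "tpl-news",
    "cn.com.sina.www/0042": "tpl-sports",
}
labels = ["cn.com.sina.www/1a2b", "cn.com.sina.www/9999", "cn.com.sina.www/ffee"]
print(find_candidate_templates(labels, template_table))  # prints ['tpl-news']
```

Only the templates returned here need to be compared with the page, however many templates the cache holds.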
Generating a hash value from the web data, deriving hash value labels from the hash value, and using the hash value labels to look up web page templates greatly reduces the number of template queries, speeds up data processing, and improves the accuracy of template lookup.
Fig. 3 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a third embodiment of the present invention. The device of the embodiment shown in Fig. 3 can serve as a preferred embodiment of the device shown in Fig. 1. The device of this preferred embodiment includes an acquiring unit 10, a label unit 20, a searching unit 30, a calculating unit 40, and a generating unit 50, wherein the generating unit 50 includes a comparison module 501, a first judging module 503, a first transmission module 505, and a second generating module 507.
The acquiring unit 10, label unit 20, searching unit 30, and calculating unit 40 of the embodiment shown in Fig. 3 have the same functions as the acquiring unit 10, label unit 20, searching unit 30, and calculating unit 40 of the embodiment shown in Fig. 1, and are not described again here.
The comparison module 501 is used to compare the data of the web page template found with the web data to obtain the incremental encoding data of the web page. One web page template or several may be found. When several templates are found, the incremental encoding data between each of them and the web data can be compared. The incremental encoding data can be obtained by comparing the templates one by one, or by comparing each of the templates with the web data at the same time.
The first judging module 503 is used to judge whether the incremental encoding data of the web page is greater than a set threshold. The larger the incremental encoding data, the lower the similarity between the web page and the web page template. When the incremental encoding data exceeds the threshold, the template found cannot meet the requirements of the web page and cannot serve as its template, which is why the incremental encoding data needs to be compared with the threshold.
The first transmission module 505 is used to transmit the incremental encoding data based on the web page template to which it corresponds when the incremental encoding data of the web page is less than or equal to the set threshold.
If the incremental encoding data is less than or equal to the set threshold, the similarity between the web page and the template found is high and the template meets the requirements of the web page. The web page can then be added to the set of web pages covered by that template, and the incremental encoding data can be transmitted based on the template found.
The second generating module 507 is used to generate a new web page template when the incremental encoding data of the web page is greater than the set threshold. If the incremental encoding data is greater than the set threshold, the similarity between the web page and the template found does not meet the requirements of the web page, so a new web page template is generated. The new template may simply be the web page that was obtained.
By directly comparing the size of the incremental encoding data with the set threshold to decide whether to use a cached web page template or generate a new one, the decision can be made more conveniently and accurately.
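The threshold decision can be sketched as below. The patent does not specify the delta encoder, so `difflib.SequenceMatcher` from the standard library stands in for it and the "size" of the incremental encoding data is approximated by the number of page bytes that cannot be copied from the template. Choosing the candidate with the smallest delta when several templates are found is likewise an assumption of this sketch, as are all function and action names.

```python
from difflib import SequenceMatcher

def delta_size(template: bytes, page: bytes) -> int:
    """Count page bytes not copied from the template (stand-in for a real delta encoder)."""
    matcher = SequenceMatcher(None, template, page, autojunk=False)
    return sum(j2 - j1
               for tag, _i1, _i2, j1, j2 in matcher.get_opcodes()
               if tag in ("replace", "insert"))

def choose_by_size(page: bytes, candidates: dict, threshold: int):
    """candidates maps template id to template bytes; returns (action, template id)."""
    best_id, best_size = None, None
    for template_id, template in candidates.items():
        size = delta_size(template, page)
        if best_size is None or size < best_size:
            best_id, best_size = template_id, size
    if best_size is not None and best_size <= threshold:
        return ("send_delta", best_id)   # reuse the cached template
    return ("new_template", None)        # the page itself may become the new template

template = b"<html><h1>Title</h1><p>old story</p></html>"
page = b"<html><h1>Title</h1><p>new story</p></html>"
print(choose_by_size(page, {"tpl-news": template}, threshold=16))
```

Only the candidates returned by the label lookup are passed in, so the comparison cost no longer grows with the total number of cached templates.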
Fig. 4 is a schematic diagram of the web page template generating device for realizing incremental transmission according to a fourth embodiment of the present invention. The device of the embodiment shown in Fig. 4 can serve as a preferred embodiment of the device shown in Fig. 1. The device of this preferred embodiment includes an acquiring unit 10, a label unit 20, a searching unit 30, a calculating unit 40, and a generating unit 50, wherein the generating unit 50 includes a comparison module 501, a calculation module 502, a second judging module 504, a second transmission module 506, and a third generating module 508.
The acquiring unit 10, label unit 20, searching unit 30, and calculating unit 40 of the embodiment shown in Fig. 4 have the same functions as the acquiring unit 10, label unit 20, searching unit 30, and calculating unit 40 of the embodiment shown in Fig. 1, and are not described again here.
Comparison module 501 compares the web page template data and the web data for finding, and obtains webpage incremental encoding number According to.The web page template for finding can be a web page template, or multiple web page templates, in the web page template for finding During for multiple template, the incremental encoding data between each template and web data in multiple template can be compared.Can lead to Cross the mode for comparing one by one and obtain webpage incremental encoding data, it is also possible at the same each template in being respectively compared multiple template with Incremental encoding data between web data.
Computing module 502 is used to calculate the ratio of webpage incremental encoding data and web data.Incremental encoding number can be used According to than upper web data, then incremental encoding data are smaller with the ratio of web data, then the similarity of web page template and webpage Higher, incremental encoding data are bigger with the ratio of web data, then web page template is lower with the similarity of webpage.
Second judge module 504 is used to judge whether the incremental encoding data of webpage are less than setting with the ratio of web data Fractional threshold.Can be by judging whether webpage incremental encoding data are less than setting fractional threshold with the ratio of web data, really Determine webpage and web page template similarity whether meet webpage the need for.
Second transport module 506 is used for the ratio in webpage incremental encoding data and web data less than setting fractional threshold When, the webpage that webpage is added to the corresponding web page template covering of incremental encoding data is concentrated.If incremental encoding data and net The ratio of page data is less than setting fractional threshold, then webpage disclosure satisfy that the requirement of webpage with the similarity of web page template, can The webpage concentration that the corresponding web page template of incremental encoding data is covered is added to by the webpage, when reusing the web page template The web page template can be directly invoked, and incremental encoding data are transmitted based on the web page template for calling, without again Generation.
The third generation module 508 generates a new web page template when the ratio of the incremental encoding data to the web page data is greater than or equal to the set ratio threshold. In that case the similarity between the web page and the found web page template does not meet the requirement of the web page, so a new web page template is generated; the web page itself may also be used directly as the new template.
Comparing the ratio of the incremental encoding data to the web page data against a set ratio threshold gives a more accurate judgment of the similarity between the web page and the web page template. The same ratio threshold applies to web pages of different data volumes, so there is no need to set a different incremental-encoding-data threshold for each page size. This widens the applicability of the method of transmitting web page delta files based on web page templates and makes it more convenient.
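As a worked illustration of why a ratio threshold carries across page sizes, write |Δ| for the size of the incremental encoding data and |P| for the size of the web page data. The symbols, the byte sizes and the threshold value τ = 0.3 are assumptions chosen for this example; the description does not specify them.

```latex
r = \frac{|\Delta|}{|P|}, \qquad
\text{reuse the template if } r < \tau, \qquad
\text{generate a new template if } r \ge \tau .

% The same 4 KB of incremental encoding data on two pages, with \tau = 0.3:
r_1 = \frac{4\ \mathrm{KB}}{10\ \mathrm{KB}} = 0.4 \ge \tau
\quad (\text{new template}), \qquad
r_2 = \frac{4\ \mathrm{KB}}{100\ \mathrm{KB}} = 0.04 < \tau
\quad (\text{reuse the template}).
```

A fixed absolute threshold on |Δ| would treat both pages identically, whereas the ratio separates a page that mostly differs from its template from one that is almost entirely covered by it.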
Fig. 5 is a schematic diagram of a web page template generating device for realizing incremental transmission according to a fifth embodiment of the present invention. The device of the embodiment shown in Fig. 5 may serve as a preferred implementation of the web page template generating device for realizing incremental transmission of the embodiment shown in Fig. 1. The device of this preferred embodiment includes an acquiring unit 10, a tag unit 20, a searching unit 30, a computing unit 40, a generation unit 50, a judging unit 60 and a combining unit 70.
The acquiring unit 10, tag unit 20, searching unit 30, computing unit 40 and generation unit 50 of the embodiment shown in Fig. 5 have the same functions as the acquiring unit 10, tag unit 20, searching unit 30, computing unit 40 and generation unit 50 of the embodiment shown in Fig. 1, and are not described again here.
The judging unit 60 judges whether two hash value labels correspond to the same web page template. A hash value label may correspond to one or more web pages; if two hash value labels correspond to the same web page template, the web pages corresponding to both labels also correspond to that same template.
The combining unit 70, when two hash value labels correspond to the same web page template, obtains the web pages corresponding to the two labels and adds them to the web page set covered by that template. A hash value label may correspond to multiple web pages, and those web pages may correspond to multiple web page templates, so in the web page template table they may be recorded under different templates. If the two hash value labels correspond to the same web page template, the web pages corresponding to both labels are added to the web page set covered by that template in the web page template table.
By merging web pages and updating the web page template table in this way, web pages that can use the same web page template are gathered into the web page set covered by that template. The required template can then be found more easily the next time the web page template table is queried and can be called directly, avoiding the system overhead of generating it again.
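The merging performed by the judging unit and the combining unit can be sketched as follows. This is a minimal illustration, not the patented implementation: the two dictionaries standing in for the web page template table, and the function name `merge_pages_by_template`, are assumptions made for the example.

```python
from collections import defaultdict


def merge_pages_by_template(label_to_template, label_to_pages):
    """Group pages under the template shared by their hash value labels.

    label_to_template: {hash value label: template id}   (hypothetical table)
    label_to_pages:    {hash value label: set of page URLs}
    Returns {template id: set of page URLs covered by that template}.
    """
    covered = defaultdict(set)
    for label, template_id in label_to_template.items():
        # Labels that map to the same template contribute to the same set.
        covered[template_id].update(label_to_pages.get(label, ()))
    return dict(covered)


label_to_template = {"example.com/0a1b": "T1", "example.com/ffee": "T1",
                     "example.com/1234": "T2"}
label_to_pages = {"example.com/0a1b": {"/news/1", "/news/2"},
                  "example.com/ffee": {"/news/3"},
                  "example.com/1234": {"/about"}}
covered = merge_pages_by_template(label_to_template, label_to_pages)
```

Here the two labels that point at `T1` have their pages merged into one covered set, so a later query for any of those pages finds `T1` directly.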
An embodiment of the present invention further provides a web page template generation method for realizing incremental transmission. The method may be performed by the web page template generating device for realizing incremental transmission provided by the embodiments of the present invention, and that device may likewise be used to perform the method.
The web page template generation method for realizing incremental transmission is described in detail below with reference to the accompanying drawings. It should be noted that the steps below and the steps shown in the flowcharts may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Fig. 6 is a flowchart of a web page template generation method for realizing incremental transmission according to an embodiment of the present invention. The method of this embodiment is described below with reference to the flowchart. As shown in the figure, the method includes the following steps:
Step S101: obtain the web page data of a web page. The web page may be any web page, and the web page data obtained may include the data of all content in the page, such as news data, advertisement data and link data.
To obtain the web page data, the web page address may be obtained first, and the corresponding web page data then fetched from a web page data table according to that address. The web page data table may store a field part and a description part for the web page data. For example, the stored field is "URL" and its description is "the web page address with the protocol part removed, the anchor part removed, and the domain name reversed label by label". For example, for http://www.sina.com.cn/a/b.php?ac=b#ab the corresponding description part may be: cn.com.sina.www/a/b.php?ac=b.
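The address normalization described above (drop the protocol, drop the anchor, reverse the domain labels) can be sketched with the Python standard library. The function name `normalize_url` is hypothetical, and ignoring any port number is an assumption of this sketch, since the description does not mention ports.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Return the table key for a URL: reversed host + path + query.

    The scheme ("http://") and the fragment ("#ab") are dropped.
    A port, if present, is ignored (assumption of this sketch).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    reversed_host = ".".join(reversed(host.split(".")))
    key = reversed_host + parts.path
    if parts.query:
        key += "?" + parts.query
    return key


key = normalize_url("http://www.sina.com.cn/a/b.php?ac=b#ab")
# key == "cn.com.sina.www/a/b.php?ac=b"
```

Reversing the domain labels places pages of the same site next to each other when keys are sorted, which is a common reason for storing addresses in this form.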
Step S102: generate hash value labels for the web page data. One hash value may be generated per web page data; it may be a 64-bit or a 128-bit hash value, and the number of bits is chosen according to the needs of the system. For example, in a system for which a 64-bit hash value suffices, generating a 128-bit hash value would only add to the load on the system, so the bit length is chosen to suit the system.
Hash value labels are generated from the hash value of the web page data. Each hash value label may correspond to one web page or to multiple web pages.
Step S103: look up the web page template corresponding to the hash value label. A web page template may itself be a web page, since one web page can serve as the template of another, and a hash value label corresponds to a web page template. The web page templates may be stored in a cache, from which the searching unit 30 looks up the template corresponding to the hash value label.
Step S104: calculate the incremental encoding data between the found web page template and the web page. The incremental encoding data may be the portion of the data that differs between the web page data and the web page template data. If multiple web page templates are found, the incremental encoding data between each of them and the web page is calculated.
Step S105: determine, according to the calculated incremental encoding data, whether to generate a new web page template. If the calculated incremental encoding data is greater than a set threshold, a new web page template is generated; if it is less than or equal to the set threshold, the cached web page template is called directly.
As the above description shows, the corresponding web page template is looked up by hash value label, the incremental encoding data between the found template and the web page is calculated, and it is then determined whether to call the cached template directly or to generate a new one. This reduces the number of comparisons between web page templates and web pages and saves system overhead.
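The description does not say how the incremental encoding data of step S104 is computed. As one possible sketch, the block below swaps in Python's `difflib.SequenceMatcher` to express a page as "copy" ranges from the template plus literal "insert" text; the operation format and the size measure `delta_size` are assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def encode_delta(template: str, page: str):
    """Encode `page` relative to `template` as copy/insert operations."""
    ops = []
    matcher = SequenceMatcher(None, template, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse template[i1:i2]
        elif j2 > j1:
            ops.append(("insert", page[j1:j2]))   # text absent from template
        # "delete" opcodes need no operation: the range is simply not copied
    return ops


def apply_delta(template: str, ops) -> str:
    """Rebuild the page from the template and the operations."""
    out = []
    for op in ops:
        out.append(template[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(out)


def delta_size(ops) -> int:
    """Count literal characters carried by the delta (assumed size measure)."""
    return sum(len(op[1]) for op in ops if op[0] == "insert")


template = "<html><body><h1>News</h1><p>old story</p></body></html>"
page = "<html><body><h1>News</h1><p>a brand new story</p></body></html>"
ops = encode_delta(template, page)
```

A receiver that already holds the template only needs `ops` to reconstruct the page with `apply_delta`, which is the point of transmitting increments instead of whole pages.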
Fig. 7 is a flowchart of generating hash value labels in the web page template generation method for realizing incremental transmission according to an embodiment of the present invention. The method includes the following steps:
Step S201: generate a hash value from the web page data. The hash value calculated from the web page data may be a simhash value. The generation of the hash value is described in detail below, taking a 64-bit simhash value as an example.
First, a 64-dimensional integer vector V[i] is initialized to 0. Initializing to 0 may be done by setting the modulus of the 64-dimensional integer vector V[i] to 0, that is, by setting every component to 0.
Second, starting at each byte position of the web page, a substring of length n bytes is cut out, and these substrings form the feature set of the web page. The number of features in the feature set may equal the number of bytes of the whole web page, and each feature is a string of n bytes. n may be 64, 32, 20, or another value.
Third, for each feature in the feature set, a string hash function produces a 64-bit binary integer. For each bit position i of that integer: if the bit is 1, V[i] is incremented by 1; otherwise V[i] is decremented by 1.
Finally, a new 64-bit integer is created whose bit positions correspond one to one with the components of the vector obtained in the previous step: for each bit position i, the bit is set to 1 if V[i] from the previous step is not less than 0, and to 0 otherwise. The resulting 64-bit binary integer is the simhash value.
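The four steps above can be sketched as follows. The description only calls for "a string hash function", so the use of `hashlib.blake2b` with an 8-byte digest is an assumption, as are the function names and the choice to let features near the end of the page be shorter than n bytes so that there is one feature per byte position.

```python
import hashlib


def simhash64(data: bytes, n: int = 20) -> int:
    """Compute a 64-bit simhash over n-byte substrings of `data`."""
    v = [0] * 64                                  # step 1: vector of zeros
    for start in range(len(data)):                # step 2: one feature per byte
        feature = data[start:start + n]           # tail features are shorter
        digest = hashlib.blake2b(feature, digest_size=8).digest()
        h = int.from_bytes(digest, "big")         # step 3: 64-bit feature hash
        for i in range(64):
            v[i] += 1 if (h >> i) & 1 else -1
    result = 0
    for i in range(64):                           # step 4: sign of each component
        if v[i] >= 0:
            result |= 1 << i
    return result


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two simhash values differ."""
    return bin(a ^ b).count("1")


page_a = b"<html><body><h1>News</h1><p>story one</p></body></html>"
page_b = b"<html><body><h1>News</h1><p>story two</p></body></html>"
distance = hamming_distance(simhash64(page_a), simhash64(page_b))
```

Pages that share most of their n-byte features push the components of V in the same direction, so their simhash values tend to differ in few bits; `hamming_distance` is included only to make that property easy to inspect.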
Step S202: randomly permute the bits of the hash value according to a preset rule. For example, 32 random permutations may be applied to the 64-bit hash value.
It should be noted that the number of permutations may be determined according to the needs of the actual system and is not limited to the number given in this embodiment. The number given here serves only to illustrate the implementation of the invention and is neither exhaustive nor limiting.
Step S203: obtain multiple hash value labels from the domain name and the permuted hash values. Permuting the hash value yields permuted hash values; a fixed-length prefix of each permuted hash value, together with the domain name, forms a hash value label. For example, applying 32 random bitwise permutations to the 64-bit hash value yields 32 permuted hash values. The first 16 bits of each permuted hash value and the domain name of the web page may be taken as a hash value label, in the form "domain name/first 16 bits of the permuted hash value". In this way, 32 bitwise random permutations of the 64-bit hash value yield 32 hash value labels.
It should be pointed out that, instead of the first 16 bits of the permuted hash value, the first 32 bits or the first 8 bits may also be taken. The value of 16 is used here only to explain the solution of the present invention more clearly and does not unduly limit it.
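Steps S202 and S203 can be sketched as follows. The "preset rule" is modelled here as a list of bit permutations generated once from a fixed seed, so that every page is permuted the same way; that modelling, the hexadecimal rendering of the prefix, and all function names are assumptions of this sketch.

```python
import random


def make_permutations(count: int = 32, bits: int = 64, seed: int = 0):
    """Build `count` fixed bit permutations (the assumed 'preset rule')."""
    rng = random.Random(seed)
    perms = []
    for _ in range(count):
        perm = list(range(bits))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def permute_bits(value: int, perm) -> int:
    """Move bit perm[dst] of `value` to position dst (0 = most significant)."""
    bits = len(perm)
    out = 0
    for dst, src in enumerate(perm):
        bit = (value >> (bits - 1 - src)) & 1
        out |= bit << (bits - 1 - dst)
    return out


def hash_labels(domain: str, hash_value: int, perms, prefix_bits: int = 16):
    """One label per permutation: 'domain/prefix of the permuted hash'."""
    bits = len(perms[0])
    labels = []
    for perm in perms:
        prefix = permute_bits(hash_value, perm) >> (bits - prefix_bits)
        labels.append(f"{domain}/{prefix:0{prefix_bits // 4}x}")
    return labels


perms = make_permutations()
labels = hash_labels("www.sina.com.cn", 0x0123456789ABCDEF, perms)
```

Because each permutation brings a different subset of 16 bits to the front, two pages whose hash values differ in only a few bits are likely to share at least one label, which is what lets an exact-match table lookup find similar pages.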
Step S204: look up the template table by hash value label to obtain the web page templates corresponding to the multiple hash value labels. There may be one such template or several. The searching unit may perform N lookups, where the number of lookups does not exceed the number of permutations. For example, after 32 random permutations of the hash value, at most 32 lookups are performed.
A hash value is generated from the web page data, hash value labels are derived from the hash value, and web page templates are looked up by hash value label. This greatly reduces the number of web page template queries, increases the speed at which the system processes data, and improves the accuracy of finding web page templates.
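The lookup of step S204 can be sketched with a dictionary standing in for the template table. The table layout (label to template identifier) and the function name `find_templates` are hypothetical; the description does not fix a storage format.

```python
def find_templates(labels, template_table):
    """Look up each label once and return the distinct templates found.

    labels:         hash value labels of the page (one per permutation)
    template_table: {hash value label: template id}  (hypothetical layout)
    """
    found = []
    for label in labels:                 # at most len(labels) lookups
        template_id = template_table.get(label)
        if template_id is not None and template_id not in found:
            found.append(template_id)
    return found


template_table = {"example.com/0a1b": "T1", "example.com/ffee": "T2",
                  "example.com/9c3d": "T1"}
page_labels = ["example.com/0a1b", "example.com/0000", "example.com/9c3d"]
candidates = find_templates(page_labels, template_table)
```

Only the candidates returned here need to be compared with the page in the following steps, instead of every cached template.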
Fig. 8 is a flowchart of the method of determining whether to generate a web page template in the web page template generation method for realizing incremental transmission according to an embodiment of the present invention. The method includes the following steps:
Step S301: compare the data of the found web page template with the web page data to obtain the web page's incremental encoding data. One web page template or several may be found; when several are found, the incremental encoding data between each template and the web page is obtained.
Step S302: judge whether the web page's incremental encoding data is greater than a set threshold. The larger the incremental encoding data, the lower the similarity between the web page and the web page template. When the incremental encoding data exceeds the threshold, the found template cannot meet the requirement of the web page and cannot serve as its template, which is why the incremental encoding data must be compared with the threshold.
Step S303: if the web page's incremental encoding data is less than or equal to the set threshold, transmit the incremental encoding data based on the web page template corresponding to that incremental encoding data. Incremental encoding data at or below the threshold means the similarity between the web page and the found template is high and the template meets the requirement of the web page, so the incremental encoding data can be transmitted on the basis of the found template.
Step S304: if the web page's incremental encoding data is greater than the set threshold, generate a new web page template. Incremental encoding data above the threshold means the similarity between the web page and the found template does not meet the requirement of the web page, so a new web page template is generated. The new template may be the obtained web page itself.
Directly comparing the size of the incremental encoding data with the set threshold determines whether to call the cached web page template or to generate a new one, which makes this decision more convenient and more accurate.
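The decision of steps S302 to S304 can be sketched as follows. When several templates were found, this sketch picks the one with the smallest incremental encoding data before applying the threshold; that selection rule, the byte units and the function name `choose_template` are assumptions, since the description only states the threshold comparison.

```python
def choose_template(delta_sizes, threshold):
    """Decide between reusing a cached template and generating a new one.

    delta_sizes: {template id: size of the incremental encoding data}
    threshold:   largest acceptable size (same unit as the sizes)
    Returns ("reuse", template id) or ("new", None).
    """
    if delta_sizes:
        best = min(delta_sizes, key=delta_sizes.get)   # smallest delta wins
        if delta_sizes[best] <= threshold:             # S303: reuse and send delta
            return ("reuse", best)
    return ("new", None)                               # S304: new template


decision_a = choose_template({"T1": 1200, "T2": 300}, threshold=500)
decision_b = choose_template({"T1": 1200}, threshold=500)
```

With these inputs, `decision_a` reuses `T2` and `decision_b` falls through to generating a new template.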
Fig. 9 is a flowchart of a preferred method of determining whether to generate a web page template in the web page template generation method for realizing incremental transmission according to an embodiment of the present invention. The method includes the following steps:
Step S401: compare the data of the found web page template with the web page data to obtain the web page's incremental encoding data. One web page template or several may be found; when several are found, the incremental encoding data between each template and the web page is obtained.
Step S402: calculate the ratio of the web page's incremental encoding data to the web page data, that is, the incremental encoding data divided by the web page data. The smaller this ratio, the higher the similarity between the web page template and the web page; the larger the ratio, the lower the similarity.
Step S403: judge whether the ratio of the web page's incremental encoding data to the web page data is less than a set ratio threshold. This comparison determines whether the similarity between the web page and the web page template meets the needs of the web page.
Step S404: if the ratio of the web page's incremental encoding data to the web page data is less than the set ratio threshold, add the web page to the web page set covered by the web page template corresponding to that incremental encoding data. A ratio below the threshold means the similarity between the web page and the template meets the requirement of the web page, so the web page can be added to the set covered by that template. When the template is needed again it can be called directly, and the incremental encoding data transmitted on the basis of it.
Step S405: if the ratio of the web page's incremental encoding data to the web page data is greater than or equal to the set ratio threshold, generate a new web page template. In that case the similarity between the web page and the template does not meet the requirement of the web page, so a new web page template is generated; the web page itself may also be used directly as the new template.
Comparing the ratio of the incremental encoding data to the web page data against a set ratio threshold gives a more accurate judgment of the similarity between the web page and the web page template. The same ratio threshold applies to web pages of different data volumes, so there is no need to set a different incremental-encoding-data threshold for each page size. This widens the applicability of the method of transmitting web page delta files based on web page templates and makes it more convenient.
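Steps S402 to S405 can be sketched as follows, including the update of the web page set covered by a template. The default ratio threshold of 0.3, the use of the page URL as the identifier of a newly created template, and the rule of preferring the template with the lowest ratio are all assumptions of this sketch.

```python
def assign_page(page_url, page_size, delta_sizes, covered_pages,
                ratio_threshold=0.3):
    """Attach a page to a sufficiently similar template or make it a template.

    page_size:     size of the web page data (must be > 0)
    delta_sizes:   {template id: size of the incremental encoding data}
    covered_pages: {template id: set of page URLs}, updated in place
    Returns the id of the template that now covers the page.
    """
    best_id, best_ratio = None, None
    for template_id, delta in delta_sizes.items():
        ratio = delta / page_size                        # S402
        if best_ratio is None or ratio < best_ratio:
            best_id, best_ratio = template_id, ratio
    if best_id is not None and best_ratio < ratio_threshold:   # S403 / S404
        covered_pages.setdefault(best_id, set()).add(page_url)
        return best_id
    covered_pages[page_url] = {page_url}                 # S405: page is the template
    return page_url


covered = {"T1": {"/news/1"}}
first = assign_page("/news/2", 1000, {"T1": 100, "T2": 500}, covered)
second = assign_page("/video/9", 1000, {"T1": 400}, covered)
```

The first call adds `/news/2` to the set covered by `T1` (ratio 0.1); the second call finds no template below the threshold (ratio 0.4), so the page becomes a new template.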
Fig. 10 is a flowchart of a web page template generation method for realizing incremental transmission according to a second embodiment of the present invention. The method includes the following steps:
Step S501: obtain the web page data of a web page. The web page may be any web page, and the web page data obtained may include the data of all content in the page.
Step S502: generate hash value labels for the web page data. One hash value may be generated per web page data; it may be a 64-bit or a 128-bit hash value, and the number of bits is chosen according to the needs of the system.
Step S503: look up the web page template corresponding to the hash value label. A web page template may itself be a web page, since one web page can serve as the template of another, and a hash value label may correspond to one web page template or to several. The web page templates may be stored in a cache, and when a template is needed, the template corresponding to the hash value label can be looked up from the cache.
Step S504: judge whether two hash value labels correspond to the same web page template. A hash value label may correspond to multiple web pages; if two hash value labels correspond to the same web page template, the web pages corresponding to both labels also correspond to that same template.
Step S505: if two hash value labels correspond to the same web page template, obtain the web pages corresponding to the two labels and add them to the web page set covered by that template. A hash value label corresponds to multiple web pages, and those web pages may correspond to multiple web page templates and be recorded under different templates in the web page template table. If the two hash value labels correspond to the same web page template, the web pages corresponding to both labels are recorded under that same template in the web page template table.
By adding the web pages that correspond to the same web page template to the web page set covered by that template and updating the web page template table, web pages that can share a template are gathered into the set covered by that template. The required template can then be found more conveniently and quickly the next time the web page template table is queried, and can be called directly.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (8)

1. A web page template generation method for realizing incremental transmission, characterized by comprising:
Obtaining web page data of a web page;
Generating hash value labels for the web page data;
Looking up a web page template corresponding to the hash value labels;
Calculating incremental encoding data between the found web page template and the web page; and
Determining, according to the calculated incremental encoding data, whether to generate a new web page template,
Wherein generating hash value labels for the web page data comprises: generating a hash value from the web page data, permuting the hash value according to a preset rule, and obtaining multiple hash value labels from the domain name and a prefix taken from each permuted hash value;
Looking up a web page template corresponding to the hash value labels comprises: looking up a template table by the hash value labels to obtain the web page template corresponding to the multiple hash value labels.
2. The web page template generation method for realizing incremental transmission according to claim 1, characterized in that determining, according to the calculated incremental encoding data, whether to generate a new web page template comprises:
Comparing the data of the found web page template with the web page data to obtain web page incremental encoding data;
Judging whether the web page incremental encoding data is greater than a set threshold;
If the web page incremental encoding data is less than or equal to the set threshold, transmitting the incremental encoding data based on the web page template corresponding to the incremental encoding data;
If the web page incremental encoding data is greater than the set threshold, generating a new web page template.
3. The web page template generation method for realizing incremental transmission according to claim 1, characterized in that determining, according to the calculated incremental encoding data, whether to generate a new web page template comprises:
Comparing the data of the found web page template with the web page data to obtain web page incremental encoding data;
Calculating the ratio of the web page incremental encoding data to the web page data;
Judging whether the ratio of the incremental encoding data of the web page to the web page data is less than a set ratio threshold;
If the ratio of the web page incremental encoding data to the web page data is less than the set ratio threshold, adding the web page to the web page set covered by the web page template corresponding to the incremental encoding data;
If the ratio of the web page incremental encoding data to the web page data is greater than or equal to the set ratio threshold, generating a new web page template.
4. The web page template generation method for realizing incremental transmission according to claim 1, characterized in that, after looking up the web page template corresponding to the hash value labels, the method further comprises:
Judging whether two hash value labels correspond to the same web page template;
If the two hash value labels correspond to the same web page template, obtaining the web pages corresponding to the two hash value labels, and adding the web pages corresponding to the two hash value labels to the web page set covered by the same web page template.
5. A web page template generating device for realizing incremental transmission, characterized by comprising:
An acquiring unit, for obtaining web page data of a web page;
A tag unit, for generating hash value labels for the web page data;
A searching unit, for looking up a web page template corresponding to the hash value labels;
A computing unit, for calculating incremental encoding data between the found web page template and the web page; and
A generation unit, for determining, according to the calculated incremental encoding data, whether to generate a new web page template,
Wherein the tag unit comprises:
A first generation module, for generating a hash value from the web page data;
A permutation module, for permuting the hash value according to a preset rule;
A label module, for obtaining multiple hash value labels from the domain name and a prefix taken from each permuted hash value,
Wherein the searching unit is used to look up a template table by the hash value labels to obtain the web page template corresponding to the multiple hash value labels.
6. The web page template generating device for realizing incremental transmission according to claim 5, characterized in that the generation unit comprises:
A comparison module, for comparing the data of the found web page template with the web page data to obtain web page incremental encoding data;
A first judging module, for judging whether the web page incremental encoding data is greater than a set threshold;
A first transmission module, for transmitting the incremental encoding data based on the web page template corresponding to the incremental encoding data when the web page incremental encoding data is less than or equal to the set threshold;
A second generation module, for generating a new web page template when the web page incremental encoding data is greater than the set threshold.
7. The web page template generating device for realizing incremental transmission according to claim 5, characterized in that the generation unit comprises:
A comparison module, for comparing the data of the found web page template with the web page data to obtain web page incremental encoding data;
A computing module, for calculating the ratio of the web page incremental encoding data to the web page data;
A second judging module, for judging whether the ratio of the incremental encoding data of the web page to the web page data is less than a set ratio threshold;
A second transmission module, for adding the web page to the web page set covered by the web page template corresponding to the incremental encoding data when the ratio of the web page incremental encoding data to the web page data is less than the set ratio threshold;
A third generation module, for generating a new web page template when the ratio of the incremental encoding data to the web page data is greater than or equal to the set ratio threshold.
8. The web page template generating device for realizing incremental transmission according to claim 5, characterized in that the device further comprises:
A judging unit, for judging whether two hash value labels correspond to the same web page template;
A combining unit, for obtaining, when the two hash value labels correspond to the same web page template, the web pages corresponding to the two hash value labels, and adding the web pages corresponding to the two hash value labels to the web page set covered by the same web page template.
CN201310612919.1A 2013-11-26 2013-11-26 Method and device for generating webpage template and achieving incremental transmission Active CN103593467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310612919.1A CN103593467B (en) 2013-11-26 2013-11-26 Method and device for generating webpage template and achieving incremental transmission

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310612919.1A CN103593467B (en) 2013-11-26 2013-11-26 Method and device for generating webpage template and achieving incremental transmission

Publications (2)

Publication Number Publication Date
CN103593467A CN103593467A (en) 2014-02-19
CN103593467B true CN103593467B (en) 2017-05-24

Family

ID=50083608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310612919.1A Active CN103593467B (en) 2013-11-26 2013-11-26 Method and device for generating webpage template and achieving incremental transmission

Country Status (1)

Country Link
CN (1) CN103593467B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107066175B (en) * 2017-04-18 2020-06-16 湖南福米信息科技有限责任公司 Method and device for generating display interface of securities

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101950312A (en) * 2010-08-18 2011-01-19 赵清政 Method for analyzing webpage content of internet
CN102129436A (en) * 2010-01-20 2011-07-20 北大方正集团有限公司 Method, system and device for constructing webpage template

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129436A (en) * 2010-01-20 2011-07-20 北大方正集团有限公司 Method, system and device for constructing webpage template
CN101950312A (en) * 2010-08-18 2011-01-19 赵清政 Method for analyzing webpage content of internet

Also Published As

Publication number Publication date
CN103593467A (en) 2014-02-19


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200610

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 100080, room 16, building 10-20, Building 29, Haidian District, Suzhou Street, Beijing

Patentee before: UC MOBILE Ltd.

TR01 Transfer of patent right