WO2015078231A1 - Web page template generation method and server - Google Patents

Web page template generation method and server

Info

Publication number
WO2015078231A1
Authority
WO
WIPO (PCT)
Prior art keywords
webpage
template
data
webpage template
list
Prior art date
Application number
PCT/CN2014/087822
Other languages
English (en)
French (fr)
Inventor
翟光亚
郑海洪
江蔚然
周向根
Original Assignee
优视科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201310605106.XA (patent CN103685476B)
Priority claimed from CN201310612915.3A (patent CN103605770A)
Application filed by 优视科技有限公司
Publication of WO2015078231A1
Priority to US15/156,753 (patent US10747951B2)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/04 Protocols for data compression, e.g. ROHC

Definitions

  • The present invention relates to the field of mobile browsers, and in particular to a web page template generation method and server.
  • A compression technique has been proposed in the prior art in which the website is allowed to provide templates and delta files. The template needs to be downloaded only for the first request; for subsequent requests, only the delta file needs to be downloaded, and the original page is rebuilt from the delta file and the template file, which reduces the client's access traffic. The technique thus exploits the portions shared between multiple web pages to compress traffic.
  • The defect of this technique is that the target website must support the protocol: the dependence on the target website is relatively strong, and the target website itself needs to provide the template and the corresponding delta files. This is one of the reasons why the compression technique cannot be widely adopted.
  • The prior-art methods for automatically generating a template mainly extract the common part according to a DOM (Document Object Model) tree structure. Such methods are computationally expensive, make extraction difficult, and have poor compatibility.
  • Existing template generation programs generally target a single website, and their processing scale is small.
  • When a user browses webpages, the terminal device needs to receive a large amount of webpage data sent by the server in order to present them. There is often a large amount of duplicate data between the displayed webpages. Each time the user browses webpages with duplicate data, that data has to be transmitted and loaded again, which not only occupies more bandwidth during transmission but also increases the response time when the page is loaded, resulting in slow page browsing.
  • The main purpose of the present invention is to provide a webpage template generation method and a server, so as to solve the problem that webpage template generation methods in the prior art depend strongly on the target website.
  • A web page template generation method includes: collecting webpage data of a webpage; generating a webpage template of the webpage according to the webpage data; and generating a template index according to the generated webpage template, wherein the webpage template corresponding to the webpage can be retrieved through the template index.
  • The webpage template generating method further includes: publishing the webpage template and the template index to a plurality of template servers that provide the webpage template; the plurality of template servers respectively storing the webpage template and the template index; and a first template server among the plurality of template servers using the template index to retrieve the webpage template matching the webpage and providing the matching template to the other template servers of the plurality of template servers.
  • Publishing the webpage template and the template index to the plurality of template servers that provide the webpage template includes: after generating a plurality of webpage templates and the template index, calculating an overall difference rate between the set of the plurality of webpage templates and the historical template set; determining whether the overall difference rate is greater than a preset overall difference rate threshold; if the overall difference rate is greater than the preset overall difference rate threshold, publishing the webpage template and the template index; and if the overall difference rate is not greater than the preset overall difference rate threshold, not publishing the webpage template and the template index.
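The publication check can be illustrated with a minimal sketch. The text does not define how the overall difference rate is computed, so the definition used here (the fraction of template keys that are new, changed, or removed relative to the historical set) is an assumption, as are the function names and the default threshold.

```python
def overall_difference_rate(new_templates: dict, history: dict) -> float:
    """Assumed metric: share of template keys whose content differs between the two sets."""
    keys = set(new_templates) | set(history)
    if not keys:
        return 0.0
    changed = sum(1 for k in keys if new_templates.get(k) != history.get(k))
    return changed / len(keys)


def should_publish(new_templates: dict, history: dict, threshold: float = 0.2) -> bool:
    """Publish only when the overall difference rate is greater than the threshold."""
    return overall_difference_rate(new_templates, history) > threshold


history = {"a": "<html>old A</html>", "b": "<html>old B</html>"}
new = {"a": "<html>old A</html>", "b": "<html>new B</html>", "c": "<html>C</html>"}
print(should_publish(new, history))      # two of three keys differ
print(should_publish(history, history))  # nothing changed, so no publication
```

Skipping publication when little has changed avoids pushing near-identical template sets to every template server.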
  • Generating a template index according to the generated webpage template includes: selecting templates whose quality meets a predetermined quality condition; determining the URL paths to which each template is applicable; selecting, from those URL paths, the URL paths applicable to the templates whose quality meets the predetermined quality condition; and converting the selected paths into a template index.
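As a rough illustration of these steps, the sketch below builds an index from URL paths to template IDs and looks a URL up by walking from the full path towards the host. The record fields (`id`, `quality`, `paths`), the numeric quality score, and the prefix-walking lookup are all assumptions made for the example; the text does not specify the quality condition or the index layout.

```python
from urllib.parse import urlparse


def build_template_index(templates, min_quality=0.5):
    """Keep templates that meet the (assumed) quality bar and map each URL path to their IDs."""
    index = {}
    for t in templates:
        if t["quality"] < min_quality:
            continue  # template fails the quality condition, so its paths are pruned
        for path in t["paths"]:
            index.setdefault(path, []).append(t["id"])
    return index


def lookup(index, url):
    """Return template IDs for the longest indexed path that prefixes the URL."""
    parsed = urlparse(url)
    key = parsed.netloc + parsed.path
    while key:
        if key in index:
            return index[key]
        if "/" not in key:
            break
        key = key.rsplit("/", 1)[0]
    return []


templates = [
    {"id": "t1", "quality": 0.9, "paths": ["example.com/news"]},
    {"id": "t2", "quality": 0.1, "paths": ["example.com/blog"]},
]
index = build_template_index(templates)
print(lookup(index, "http://example.com/news/2014/item.html"))
```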
  • The webpage template generating method further includes: determining whether the number of webpage templates reaches a preset number; if the number of webpage templates reaches the preset number, calculating the coverage of each webpage template; comparing each webpage template whose coverage is smaller than a first preset coverage threshold with the webpage templates whose coverage is greater than the first preset coverage threshold; and if the difference rate between a webpage template whose coverage is smaller than the first preset coverage threshold and a webpage template whose coverage is greater than the first preset coverage threshold is less than a preset difference rate threshold, merging the former into the latter.
  • Comparing the webpage templates whose coverage is smaller than the first preset coverage threshold with the webpage templates whose coverage is greater than the first preset coverage threshold includes: sorting the plurality of webpage templates by coverage in descending order; and comparing each lower-ranked webpage template with the higher-ranked webpage templates.
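The merging procedure might look like the following sketch. The difference rate is not defined in the text, so it is approximated here as one minus the `difflib.SequenceMatcher` similarity ratio; that choice, the tuple layout, and the default thresholds are assumptions. Merging is modelled as dropping the low-coverage template and adding its coverage to the template it was merged into.

```python
from difflib import SequenceMatcher


def difference_rate(a: str, b: str) -> float:
    """Assumed metric: 1 minus the SequenceMatcher similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def merge_templates(templates, coverage_threshold=0.1, diff_threshold=0.2):
    """templates: list of (template_id, content, coverage) tuples."""
    ranked = sorted(templates, key=lambda t: t[2], reverse=True)  # high coverage first
    kept = []
    for tid, content, cov in ranked:
        target = None
        if cov < coverage_threshold:
            # Compare the lower-ranked template with the higher-ranked ones.
            for i, (_, kept_content, kept_cov) in enumerate(kept):
                if (kept_cov > coverage_threshold
                        and difference_rate(content, kept_content) < diff_threshold):
                    target = i
                    break
        if target is None:
            kept.append((tid, content, cov))
        else:
            kid, kcontent, kcov = kept[target]
            kept[target] = (kid, kcontent, kcov + cov)  # absorb the merged coverage
    return kept


merged = merge_templates([
    ("a", "<html><body>news list</body></html>", 0.6),
    ("b", "<html><body>news list!</body></html>", 0.05),
    ("c", "completely different text", 0.04),
])
print([t[0] for t in merged])
```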
  • Generating a template index according to the generated webpage template includes: storing a plurality of webpage templates; calculating the coverage of each webpage template; determining whether the sum of the coverages of the webpage templates under each path reaches a second preset coverage threshold; and deleting the webpage templates under any path for which the sum of the coverages does not reach the second preset coverage threshold.
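A minimal sketch of this path-level pruning follows, assuming each stored template is a `(path, template_id, coverage)` tuple and assuming a default for the second coverage threshold; neither the layout nor the value comes from the text.

```python
from collections import defaultdict


def prune_paths(templates, min_total_coverage=0.5):
    """Drop every template under a path whose summed coverage is below the threshold."""
    totals = defaultdict(float)
    for path, _, coverage in templates:
        totals[path] += coverage
    return [t for t in templates if totals[t[0]] >= min_total_coverage]


stored = [("/news", "t1", 0.4), ("/news", "t2", 0.3), ("/blog", "t3", 0.2)]
print(prune_paths(stored))
```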
  • A web page template server includes: an acquisition unit, configured to collect webpage data of a webpage; a generating unit, configured to generate a webpage template of the webpage according to the webpage data; and an indexing unit, configured to generate a template index according to the generated webpage template.
  • The webpage template server further includes: a publishing unit, configured to publish, after the webpage template of the webpage is generated according to the webpage data, the webpage template and the template index to a plurality of template servers providing the webpage template; a storage unit, configured to store the webpage template and the template index on each of the plurality of template servers; and a template retrieving unit, configured to retrieve the webpage template matching the webpage by using the template index and to provide the matching template to other servers.
  • The publishing unit includes: a calculating module, configured to calculate an overall difference rate between the set of the plurality of webpage templates and the historical template set; a determining module, configured to determine whether the overall difference rate is greater than a preset overall difference rate threshold; and a module configured to publish the webpage template when it is determined that the overall difference rate is greater than the preset overall difference rate threshold, and not to publish the webpage template when it is determined that the overall difference rate is not greater than the preset overall difference rate threshold.
  • The indexing unit includes: a template selection module for selecting templates whose quality meets a predetermined quality condition; a template path derivation module for determining the URL paths to which each template is applicable; a template path pruning module for selecting, from those URL paths, the URL paths applicable to the templates whose quality meets the predetermined quality condition; and a template index generation module for converting the selected paths into a template index.
  • The webpage template server further includes: a judging unit, configured to determine, after the webpage template of the webpage is generated according to the webpage data, whether the number of webpage templates reaches a preset number; a calculating unit, configured to calculate the coverage of each webpage template when it is determined that the number of webpage templates reaches the preset number; a comparison unit, configured to compare the webpage templates whose coverage is smaller than a first preset coverage threshold with the webpage templates whose coverage is greater than the first preset coverage threshold; and a unit configured to merge a webpage template whose coverage is smaller than the first preset coverage threshold into a webpage template whose coverage is greater than the first preset coverage threshold when the difference rate between the two is smaller than a preset difference rate threshold.
  • The comparison unit includes: a sorting module, configured to sort the plurality of webpage templates by coverage; and a comparison module, configured to compare the lower-ranked webpage templates with the higher-ranked webpage templates.
  • The indexing unit includes: a storage module, configured to store a plurality of webpage templates after the webpage template of the webpage is generated according to the webpage data; a computing module, configured to calculate the coverage of each webpage template; a third determining module, configured to determine whether the sum of the coverages of the webpage templates under each path reaches a second preset coverage threshold; and a deleting module, configured to delete the webpage templates under any path for which the sum of the coverages does not reach the second preset coverage threshold.
  • In the present invention, the webpage data of the webpage is collected and the webpage template of the webpage is generated according to the webpage data, which solves the problem that webpage template generation methods in the prior art depend strongly on the target website, and thereby reduces the dependence of webpage template generation on the target website.
  • A web page template server, comprising:
  • a webpage template data storage unit configured to store webpage template data;
  • a webpage template data obtaining unit configured to acquire, from the webpage template data storage unit, the webpage template data corresponding to the webpage data that the middleware server acquired and forwarded after receiving the webpage browsing request from the terminal device;
  • a difference data generating unit configured to generate difference data between the webpage data and the webpage template data, based on the webpage data received from the middleware server and the webpage template data corresponding to the webpage data; and
  • a sending unit configured to forward the difference data to the terminal device via the middleware server, so that the terminal device displays the requested webpage according to the difference data and the webpage template data, stored locally by the terminal device, that corresponds to the difference data.
  • The webpage browsing request includes a first webpage template ID list;
  • the webpage template data obtaining unit is configured to sequentially acquire the webpage template IDs in the first webpage template ID list and, based on each acquired webpage template ID, obtain webpage template data from the webpage template data storage unit; and
  • the difference data generating unit includes:
  • a difference data calculation module configured to calculate difference data between the webpage data and the webpage template data obtained from the webpage template data storage unit; and
  • a determining module configured to determine the calculated difference data as the difference data to be sent when the compression ratio between the calculated difference data and the webpage data is less than a first predetermined threshold,
  • wherein the webpage template data obtaining unit and the difference data generating unit are configured to repeat this processing until the difference data is generated.
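The loop over template IDs can be sketched as follows. The text does not specify the difference format or how the compression ratio is measured, so this example assumes a copy/insert encoding built with `difflib.SequenceMatcher`, a size estimate of 8 units per copy operation, and a ratio defined as estimated difference size divided by page length. `apply_delta` shows how a terminal device could rebuild the page from its local template under the same assumptions.

```python
import difflib


def make_delta(template, page):
    """Encode `page` as copy/insert operations against `template`."""
    ops = []
    matcher = difflib.SequenceMatcher(None, template, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))         # reuse template[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", page[j1:j2]))  # literal text not in the template
        # "delete" spans of the template are simply never copied
    return ops


def apply_delta(template, ops):
    """Rebuild the page from the locally stored template and the operations."""
    return "".join(template[op[1]:op[2]] if op[0] == "copy" else op[1] for op in ops)


def delta_size(ops):
    """Rough size estimate: literal characters plus an assumed 8 units per copy."""
    return sum(8 if op[0] == "copy" else len(op[1]) for op in ops)


def generate_delta(page, template_ids, template_store, ratio_threshold=0.5):
    """Try each template ID in order; return (template_id, ops), or None if none qualifies."""
    for template_id in template_ids:
        template = template_store.get(template_id)
        if template is None:
            continue
        ops = make_delta(template, page)
        if page and delta_size(ops) / len(page) < ratio_threshold:
            return template_id, ops
    return None


store = {
    "t1": "unrelated content " * 5,
    "t2": "<html><head><title>Site</title></head><body><div id='nav'>Home News Sports"
          "</div><p>PLACEHOLDER</p></body></html>",
}
page = ("<html><head><title>Site</title></head><body><div id='nav'>Home News Sports"
        "</div><p>Hello world</p></body></html>")
template_id, ops = generate_delta(page, ["t1", "t2"], store)
print(template_id, apply_delta(store[template_id], ops) == page)
```

Trying the client's templates in order means the first template that compresses well enough is used, without evaluating every candidate.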
  • When the webpage browsing request includes a webpage address and a first webpage template ID list, the webpage template server includes:
  • a webpage template ID list library configured to store a second webpage template ID list in association with the webpage address;
  • a webpage template ID list obtaining module configured to obtain the corresponding second webpage template ID list from the webpage template ID list library according to the webpage address of the webpage requested to be browsed; and
  • a webpage template ID list merging unit configured to merge the first webpage template ID list and the second webpage template ID list into a third webpage template ID list,
  • wherein the webpage template data obtaining unit is configured to sequentially acquire the webpage template IDs in the third webpage template ID list and obtain webpage template data from the webpage template data storage unit based on each acquired webpage template ID, and
  • the difference data generating unit includes: a difference data calculating module, configured to calculate difference data between the webpage data and the webpage template data acquired from the webpage template data storage unit; and
  • a determining module configured to determine the calculated difference data as the difference data to be sent when the compression ratio between the calculated difference data and the webpage data is less than a first predetermined threshold,
  • wherein the webpage template data obtaining unit and the difference data generating unit are configured to repeat this processing until the difference data is generated.
  • The webpage template ID list merging unit is configured to merge the webpage template IDs in the first webpage template ID list and the second webpage template ID list according to priority to form the third webpage template ID list, wherein the intersection of the first webpage template ID list and the second webpage template ID list has the highest priority, the remaining part of the first webpage template ID list has the second-highest priority, and the remaining part of the second webpage template ID list has the lowest priority.
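The priority merge described above can be sketched in a few lines. The ordering within each priority tier is not specified in the text; this example keeps each ID in the order it had in its source list, which is an assumption, as is the function name.

```python
def merge_template_id_lists(first, second):
    """Merge two ID lists: shared IDs first, then first-only IDs, then second-only IDs."""
    first_set, second_set = set(first), set(second)
    both = [t for t in first if t in second_set]
    only_first = [t for t in first if t not in second_set]
    only_second = [t for t in second if t not in first_set]
    # dict.fromkeys removes duplicates while preserving order
    return list(dict.fromkeys(both + only_first + only_second))


print(merge_template_id_lists(["a", "b", "c"], ["c", "d", "a"]))
```

IDs in the intersection are both held by the terminal device and known to match the address, so they are the most promising candidates to try first.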
  • The webpage template server of the present invention further includes: a difference data saving unit configured to store the difference data in association with the webpage template ID and the webpage address; and
  • a difference data query unit configured to query the difference data saving unit for the associated difference data according to the webpage template ID and the webpage address,
  • wherein the difference data generating unit is configured to generate the difference data when the difference data query unit does not find the associated difference data.
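The saving and query units behave like a cache keyed by webpage template ID and webpage address. A minimal in-memory sketch, assuming a hypothetical `DeltaCache` class and a caller-supplied `generate` callback (neither is named in the text), could be:

```python
class DeltaCache:
    """In-memory stand-in for the difference data saving and query units."""

    def __init__(self):
        self._store = {}

    def get_or_generate(self, template_id, url, generate):
        """Return stored difference data for (template_id, url), generating it on a miss."""
        key = (template_id, url)
        if key not in self._store:
            self._store[key] = generate()
        return self._store[key]


calls = []
cache = DeltaCache()
make = lambda: calls.append(1) or "delta-1"  # records each call, then returns the data
first = cache.get_or_generate("t1", "http://example.com/a", make)
second = cache.get_or_generate("t1", "http://example.com/a", make)
print(first, second, len(calls))
```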
  • Preferably, the difference data generating unit of the present invention further includes:
  • a counting unit configured to count the number of calculations performed by the difference data calculating module when the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold,
  • wherein the webpage template data obtaining unit is configured to acquire the next webpage template ID and obtain webpage template data from the webpage template data storage unit based on the next webpage template ID.
  • The webpage template server of the present invention further includes: a difference data generation failure message generating unit, configured to generate a difference data generation failure message when the number of calculations exceeds a second predetermined threshold, and
  • the sending unit is further configured to return the difference data generation failure message to the middleware server, so that the middleware server, after receiving the message, returns the webpage data to the terminal device for display.
  • The webpage template server of the present invention further includes: a second determining unit, configured to determine, after the difference data is generated, whether the webpage template ID currently used by the webpage template data obtaining unit belongs to the first webpage template ID list, and
  • when the currently used webpage template ID belongs to the first webpage template ID list, the sending unit is configured to return the generated difference data and the currently used webpage template ID to the middleware server, which forwards them to the terminal device;
  • when the currently used webpage template ID does not belong to the first webpage template ID list, the sending unit is configured to return the currently used webpage template ID to the middleware server, and the middleware server sends the received webpage template ID and the webpage data to the terminal device.
  • A web page template generation method is provided.
  • The method for generating a webpage template may be a method, executed by a webpage template server, for realizing webpage presentation by using a webpage template, and the method includes:
  • obtaining, from the webpage template data storage unit in the webpage template server, the webpage template data corresponding to the webpage data; and
  • forwarding the difference data generated between the webpage data and the webpage template data to the terminal device via the middleware server, so that the terminal device displays the requested webpage according to the difference data and the webpage template data, stored locally by the terminal device, that corresponds to the difference data.
  • The webpage browsing request includes a first webpage template ID list, and
  • obtaining the webpage template data corresponding to the webpage data from the webpage template data storage unit, and generating the difference data between the webpage data and the webpage template data based on the webpage data and the webpage template data, includes:
  • sequentially acquiring the webpage template IDs in the first webpage template ID list and repeatedly performing the following process until the difference data is generated:
  • determining the calculated difference data as the difference data.
  • The webpage browsing request includes a webpage address of the requested webpage and a first webpage template ID list,
  • the webpage template ID list library of the webpage template server stores a second webpage template ID list in association with the webpage address, and
  • obtaining the webpage template data corresponding to the webpage data from the webpage template data storage unit, and generating the difference data between the webpage data and the webpage template data based on the webpage data and the webpage template data, includes:
  • determining the calculated difference data as the difference data; and
  • obtaining the next webpage template ID from the third webpage template ID list as the new currently acquired webpage template ID.
  • The first webpage template ID list and the second webpage template ID list are merged into a third webpage template ID list, including
  • The method further includes: after generating the difference data, the webpage template server determines whether the currently used webpage template ID belongs to the first webpage template ID list;
  • when the currently used webpage template ID belongs to the first webpage template ID list, the webpage template server returns the generated difference data and the currently used webpage template ID to the middleware server, which forwards them to the terminal device; and
  • when the currently used webpage template ID does not belong to the first webpage template ID list, the webpage template server returns the currently used webpage template ID to the middleware server, and the middleware server sends the received webpage template ID and the webpage data to the terminal device.
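The two response cases can be illustrated with a small sketch. For brevity it folds the template server and middleware server roles into one hypothetical function, and the dictionary keys (`template_id`, `delta`, `page`) are illustrative names rather than a message format defined in the text.

```python
def build_response(template_id, delta, page, first_id_list):
    """Choose what ultimately reaches the terminal device for the template that was used."""
    if template_id in first_id_list:
        # The terminal device already holds this template, so the difference data suffices.
        return {"template_id": template_id, "delta": delta}
    # The terminal device lacks this template, so the full webpage data is sent with the ID.
    return {"template_id": template_id, "page": page}


print(build_response("t1", ["ops"], "<html>...</html>", ["t1", "t2"]))
print(build_response("t9", ["ops"], "<html>...</html>", ["t1", "t2"]))
```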
  • A computer readable medium having program code executable by a processor, wherein, when executed, the program code causes the processor to perform the following steps: collecting webpage data of a webpage; generating a webpage template of the webpage according to the webpage data; and generating a template index according to the generated webpage template.
  • With the method for realizing webpage presentation by using a webpage template and the webpage template server of the present invention, the webpage template server stores the webpage templates and calculates the difference data between the webpage template and the webpage data, the middleware server sends the difference data to the terminal device, and the terminal device locally calls the webpage template corresponding to the difference data, thereby realizing the presentation of the webpage.
  • FIG. 1 is a flowchart of a method for generating a web page template according to a first embodiment of the present invention.
  • FIG. 2 is a flowchart of a method for generating a web page template according to a second embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for generating a web page template according to a third embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for generating a web page template according to a fourth embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a web page template server according to a first embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a web page template server according to a second embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a web page template server according to a third embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a web page template server according to a fourth embodiment of the present invention.
  • FIG. 9 is a block diagram showing a connection between a webpage template server and an intermediate server and a terminal device according to an embodiment of the present invention.
  • FIG. 10 is a block diagram showing an embodiment of a terminal device according to an embodiment of the present invention.
  • FIG. 11 is a block schematic diagram of one embodiment of a middleware server in accordance with an embodiment of the present invention.
  • FIG. 12 is a block schematic diagram of one embodiment of a web page template server in accordance with an embodiment of the present invention.
  • FIG. 13 is a block diagram showing an embodiment of a delta data generating unit of a web page template server according to an embodiment of the present invention.
  • FIG. 14 is a block diagram showing a second embodiment of a web page template server according to an embodiment of the present invention.
  • FIG. 15 is a flow chart of an embodiment of a method for implementing web page presentation using a web page template in accordance with the present invention.
  • FIG. 16a and FIG. 16b are flowcharts showing a first embodiment of the step S703 in the case where the webpage browsing request of the method for realizing webpage presentation using the webpage template according to the present invention includes the first webpage template ID list;
  • FIG. 17a and FIG. 17b are flowcharts showing a second embodiment of the step S703 in the case where the webpage browsing request of the method for realizing webpage presentation using the webpage template according to the present invention includes the first webpage template ID list;
  • FIG. 18a and FIG. 18b are flowcharts of a process for a terminal device to acquire webpage template data in a method for realizing webpage presentation by using a webpage template.
  • An embodiment of the invention provides a method for generating a webpage template.
  • FIG. 1 is a flow chart of a method for generating a web page template according to a first embodiment of the present invention. As shown in the figure, the web page template generating method includes the following steps:
  • Step S101: Collect webpage data of a webpage.
  • The webpage data of the webpage is the data of the webpage to be browsed.
  • The webpage data may come from one client or from a plurality of clients. The collected webpage data may be webpage data of one or more webpages of one client, and may also be data of webpages under the same domain name or under different domain names. The collected webpage data is stored.
  • Webpage data of a webpage can be collected according to the user's need to browse the webpage.
  • The sources of webpage data listed above are only examples; they do not mean that, in the process of collecting webpage data, the webpage data of all webpages from all of the above sources must be collected.
  • Step S102: Generate a webpage template of the webpage according to the collected webpage data.
  • The webpage template can be calculated and generated using a locality-sensitive hashing algorithm (simhash).
  • Specifically, the simhash algorithm may be used to generate an N-bit hash value for the webpage data, and T tag values are generated from the N-bit hash value by a random hash prefix method. Each tag value is used to look up a webpage template among the webpage templates of the same domain name. If a suitable webpage template is found, it can be used as the webpage template of the webpage to be browsed for transmitting incremental data. If no suitable webpage template is found, the webpage to be browsed can be stored in the template library as a webpage template.
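A simplified sketch of this lookup is given below. It computes a 64-bit simhash over word tokens using MD5 as the per-token hash, and it stands in for the "random hash prefix method", which the text does not detail, by cutting the fingerprint into T equal bands and using each `(band, bits)` pair as a tag value. The tokenisation, the banding scheme, the index layout, and all function names are assumptions.

```python
import hashlib
import re


def simhash(text, bits=64):
    """Compute a simhash fingerprint over the lowercase word tokens of `text`."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i, w in enumerate(weights) if w > 0)


def tag_values(fingerprint, bits=64, bands=4):
    """Assumed tagging scheme: split the fingerprint into `bands` equal slices."""
    width = bits // bands
    mask = (1 << width) - 1
    return [(b, (fingerprint >> (b * width)) & mask) for b in range(bands)]


def find_candidates(index, domain, fingerprint):
    """index maps (domain, tag) to template IDs; return IDs sharing at least one tag."""
    found = []
    for tag in tag_values(fingerprint):
        for template_id in index.get((domain, tag), []):
            if template_id not in found:
                found.append(template_id)
    return found


template_text = "home news sports weather contact about"
fp = simhash(template_text)
index = {("example.com", tag): ["t1"] for tag in tag_values(fp)}
print(find_candidates(index, "example.com", simhash(template_text)))
```

Two pages whose fingerprints agree on any one band land in the same bucket, so near-duplicate pages can be matched to a template without comparing against every stored template.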
  • Step S103: Generate a template index according to the generated webpage template.
  • The template index is generated according to the generated webpage template and can be used to find the webpage template that matches a webpage.
  • The webpage templates generated by the above method may include identical or similar webpage templates, and these may be stored in different clients. To reduce the storage space occupied by the webpage templates and make them more representative, one of the identical or similar webpage templates can be kept and the rest deleted.
  • The webpage template of the webpage may be established by using the collected webpage data, so that the establishment of the template does not depend on a specific target website, the dependence on the target website is reduced, and a corresponding webpage template can be created for any target website.
  • the web page template generating method includes the following steps:
  • Step S201 collecting webpage data of the webpage.
  • The webpage data may be the data of a webpage that needs to be browsed.
  • The webpage data may come from one client or from multiple clients, and may be the data of one or more webpages of a single client.
  • The collected webpage data may also be data of webpages under the same domain name or under different domain names.
  • The webpage data can be collected according to the user's need to browse webpages.
  • The sources of webpage data listed above are only examples. They do not require that the webpage data of all webpages from all of the above sources be collected.
  • Step S202 Generate a webpage template of the webpage according to the collected webpage data.
  • the generated webpage template needs to be filtered. For the convenience of screening, the following steps S203 to S205 are first performed.
  • Step S203 publishing a webpage template and a template index to a plurality of template servers that provide a webpage template.
  • the webpage template may be published to a plurality of template servers that provide the webpage template.
  • multiple template servers can provide webpage templates to different websites.
  • Step S204 The plurality of template servers respectively store the webpage template and the template index.
  • The plurality of template servers each store the received webpage templates, so that every webpage template exists on each of the template servers. When webpage data needs to be transmitted on the basis of a webpage template, the webpage template on whichever template server has the better network condition can be selected to transmit the incremental data, which makes calling the webpage template more convenient and reliable.
  • the first template server of the plurality of template servers uses the template index to retrieve the webpage template that matches the webpage, and provides a template matching the webpage to the template servers other than the first template server.
  • The first template server can be any one of the plurality of template servers.
  • The template index is used to quickly determine whether a webpage request matches a webpage template stored in the server, and the matching webpage template is determined according to the requested webpage. After the matching webpage template is determined, it is sent to the other template servers.
  • The steps of the webpage template generating method of the embodiments of the present invention, for example collecting webpage data, generating webpage templates, publishing webpage templates, and retrieving webpage templates, may be deployed on multiple servers that cooperate. That is, the functions of the webpage template server of the present invention can be performed jointly by a plurality of servers.
  • FIG. 3 is a flow chart of a method for generating a web page template according to a third embodiment of the present invention.
  • the embodiment shown in the figure can be used as a preferred embodiment for publishing a webpage template and a template index to a plurality of template servers providing webpage templates in step S203 in the embodiment shown in FIG. 2.
  • After step S202 shown in FIG. 2 is performed, the following steps are performed:
  • Step S301 after generating a plurality of webpage templates, establishing an index of the plurality of webpage templates.
  • a plurality of webpage template indexes are created for conveniently finding the webpage template.
  • the web page template index can index web page templates by URL or domain name.
  • an index of the webpage template may be obtained by generating a row label value or an MD5 value of the domain name.
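Indexing templates by the MD5 value of the domain name, as mentioned for step S301, might look like the sketch below. The dictionary layout and the helper names `domain_index_key` and `add_template` are assumptions made for illustration; the source only states that an MD5 value of the domain name can serve as the index.

```python
import hashlib
from urllib.parse import urlsplit


def domain_index_key(url):
    """Return the MD5 hex digest of the URL's host name (lowercased by urlsplit)."""
    host = urlsplit(url).hostname or ""
    return hashlib.md5(host.encode("utf-8")).hexdigest()


def add_template(index, url, template_id):
    """Record template_id under the index key of the URL's domain."""
    index.setdefault(domain_index_key(url), []).append(template_id)
```

All pages of one domain map to the same key, so a lookup for a requested page only has to scan the templates registered for that domain.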
  • Step S302 Calculate an overall difference rate between the set of the plurality of webpage templates and the set of the history template. In order to avoid waste of resources caused by replacing the webpage template to regenerate the delta file when the webpage template changes are small, the overall difference rate of the collection of the plurality of webpage templates and the historical template collection is calculated.
  • Step S303 determining whether the overall difference rate is greater than a preset overall difference rate threshold. If the overall difference rate between the set of the plurality of webpage templates and the historical template set is greater than the preset overall difference rate threshold, the webpage templates have changed substantially and are published. If it is not greater than the threshold, the webpage templates have changed little and are not published.
  • Step S304 if it is determined that the overall difference rate is greater than the preset overall difference rate threshold, the webpage template is published. In this case the set of the plurality of webpage templates differs considerably from the historical template set, so the webpage templates may be published.
  • Step S305 if it is determined that the overall difference rate is not greater than the preset overall difference rate threshold, the webpage template is not published. In this case the generated set of webpage templates differs little from the historical template set, so incremental files can still be transmitted on the basis of the historical templates and the webpage templates need not be published.
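The publish decision of steps S302 to S305 can be sketched as below. The source does not define how the overall difference rate is computed, so this sketch assumes it is the share of index keys whose template is new, removed, or changed relative to the historical set. The function names and the default threshold of 0.2 are likewise hypothetical.

```python
def overall_difference_rate(new_templates, history_templates):
    """Assumed definition: fraction of keys whose template differs from history.

    Both arguments map an index key (for example a URL path) to template content.
    """
    keys = set(new_templates) | set(history_templates)
    if not keys:
        return 0.0
    changed = sum(
        1 for k in keys if new_templates.get(k) != history_templates.get(k)
    )
    return changed / len(keys)


def should_publish(new_templates, history_templates, threshold=0.2):
    """Publish only when the new set differs from history by more than threshold."""
    return overall_difference_rate(new_templates, history_templates) > threshold
```

Skipping publication when the rate stays at or below the threshold keeps clients on the historical templates, which avoids regenerating delta files for small changes.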
  • the template index is generated according to the generated webpage template, and the method for generating the template index is as follows:
  • Compared with a template whose quality does not meet the predetermined quality condition, a template whose quality meets the condition saves more data transfer.
  • From the URL paths, a URL path suitable for the template whose quality meets the predetermined quality condition is selected. Since templates on short paths have better coverage, the search can start from the shortest path, the one closest to the root directory.
  • the selected path is converted to a template index.
  • the path of the webpage template selected according to the URL path corresponds to the webpage accessed by the user, and forms a template index.
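A lookup that starts from the shortest path, closest to the root directory, could be sketched like this. The index layout (a dictionary keyed by host and directory prefix) and the names `path_prefixes` and `find_template` are assumptions for illustration only.

```python
from urllib.parse import urlsplit


def path_prefixes(url):
    """Return the directory prefixes of the URL path, shortest (root) first."""
    path = urlsplit(url).path
    parts = [p for p in path.split("/") if p]
    if not path.endswith("/"):
        parts = parts[:-1]  # drop the page name, keep only directories
    return ["/"] + ["/" + "/".join(parts[:i]) + "/" for i in range(1, len(parts) + 1)]


def find_template(index, url):
    """Return the template registered for the shortest matching path, or None."""
    host = urlsplit(url).hostname
    for prefix in path_prefixes(url):
        template_id = index.get((host, prefix))
        if template_id is not None:
            return template_id
    return None
```

With templates registered for both `/news/` and `/news/sports/`, a request for `/news/sports/1.html` resolves to the `/news/` template, because the shorter path is tried first.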
  • FIG. 4 is a flow chart of a method for generating a web page template according to a fourth embodiment of the present invention. As shown in the figure, the embodiment shown in the figure can be used as a preferred embodiment of the embodiment shown in FIG. 1. The specific steps are as follows:
  • Step S401 collecting webpage data of a webpage.
  • The webpage data may be the data of a webpage that needs to be browsed.
  • The webpage data may come from one client or from multiple clients, and may be the data of one or more webpages of a single client.
  • The collected webpage data may also be data of webpages under the same domain name or under different domain names.
  • Step S402 determining whether the number of webpage templates reaches a preset number. After webpage templates are generated from the webpage data, it is determined whether their number reaches the preset number. If it does not, webpage templates continue to be generated from the webpage data. If it does, the coverage rate of each webpage template can be calculated.
  • Step S403 if it is determined that the number of webpage templates reaches a preset number, the coverage rate of each webpage template is calculated.
  • Template coverage is an important indicator to measure the quality of the generated webpage template.
  • The template coverage rate can be the ratio of the number of webpages in the website to which the webpage template can be applied to the total number of webpages on the website.
  • The larger the template coverage rate, the more pages within the website the template can be applied to.
  • Template coverage measures not only the quality of a webpage template across a whole website but also its quality under a single path. For example, a webpage template whose coverage over the whole website is not very high may have very high coverage under a certain path, and such a template can still achieve good results in practical applications.
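The coverage ratio, both site-wide and under a single path, can be illustrated with the sketch below. The source does not say how to decide whether a template "can be applied" to a page, so the `applies` test here (at least 80% of the template's lines occur in the page) is a hypothetical criterion, as are the function names and the `pages` layout.

```python
def applies(template_html, page_html, min_shared=0.8):
    """Hypothetical test: the page contains at least min_shared of the template's lines."""
    template_lines = template_html.splitlines()
    if not template_lines:
        return False
    page_lines = set(page_html.splitlines())
    shared = sum(1 for line in template_lines if line in page_lines)
    return shared / len(template_lines) >= min_shared


def coverage(template_html, pages, path_prefix="/"):
    """Share of pages under path_prefix to which the template applies.

    pages maps a URL path to the page's HTML.
    """
    scoped = [html for path, html in pages.items() if path.startswith(path_prefix)]
    if not scoped:
        return 0.0
    return sum(1 for html in scoped if applies(template_html, html)) / len(scoped)
```

A template shared by two news pages out of four pages on a site has a site-wide coverage of 0.5 but a coverage of 1.0 under `/news/`, which matches the observation that a template with modest overall coverage can still perform well under one path.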
  • Step S404 Compare a webpage template whose coverage is smaller than the first preset coverage threshold with a webpage template whose coverage is greater than the first preset coverage threshold. After the coverage rate of each webpage template is calculated, templates below the threshold can be compared with templates above it, to avoid selecting a separate but similar webpage template for incremental file transmission when the templates differ only slightly.
  • Step S405 If the difference rate between the webpage template whose coverage is smaller than the first preset coverage threshold and the webpage template whose coverage is greater than the first preset coverage threshold is less than the preset difference rate threshold, the two webpage templates are merged.
  • The difference rate between the webpage template whose coverage is smaller than the first preset coverage threshold and the webpage template whose coverage is greater than the first preset coverage threshold may be the ratio of the size of the difference between the two templates, as computed by the open-vcdiff algorithm, to the size of the webpage template whose coverage is smaller than the first preset coverage threshold. The difference rate measures the degree of difference between the two webpage templates.
  • If the difference rate is less than the preset difference rate threshold, the webpage template whose coverage is smaller than the first preset coverage threshold is considered similar to the webpage template whose coverage is greater than the first preset coverage threshold, and the two are merged. The merging may consist of merging the data of the webpage template whose coverage is smaller than the first preset coverage threshold into the data of the webpage template whose coverage is greater than the first preset coverage threshold.
  • The plurality of webpage templates may be sorted by coverage rate, and the lower-ranked webpage templates are then compared with the higher-ranked webpage templates.
  • Webpage templates whose difference rate is less than the preset difference rate threshold are merged.
  • The webpage template index may be obtained according to the URL or domain name of the webpage, and both the webpage template data and the webpage template index are published.
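The compare-and-merge procedure of steps S404 and S405 can be sketched as follows. The source computes the difference with open-vcdiff; this sketch swaps in a much simpler line-based delta so that it stays self-contained, and that substitution is an assumption. The record layout (`id`, `html`, `coverage`) and the function names are hypothetical, and "merging" is represented only as a mapping from the low-coverage template to the high-coverage one, since the text does not specify how the data is combined.

```python
def difference_rate(low_html, high_html):
    """Bytes of lines in low_html missing from high_html, relative to low_html's size.

    Simplified stand-in for a delta computed by open-vcdiff.
    """
    high_lines = set(high_html.splitlines())
    low_lines = low_html.splitlines()
    total = sum(len(line.encode("utf-8")) for line in low_lines)
    if total == 0:
        return 0.0
    delta = sum(len(line.encode("utf-8")) for line in low_lines if line not in high_lines)
    return delta / total


def merge_similar(templates, coverage_threshold, diff_threshold):
    """Fold low-coverage templates into similar high-coverage ones.

    Returns (kept templates, mapping of merged template id -> target template id).
    """
    ranked = sorted(templates, key=lambda t: t["coverage"], reverse=True)
    high = [t for t in ranked if t["coverage"] > coverage_threshold]
    kept = list(high)
    merged_into = {}
    for low in ranked[len(high):]:
        # Compare each lower-ranked template with the higher-ranked ones.
        target = next(
            (h for h in high
             if difference_rate(low["html"], h["html"]) < diff_threshold),
            None,
        )
        if target is None:
            kept.append(low)
        else:
            merged_into[low["id"]] = target["id"]
    return kept, merged_into
```

Dividing by the size of the lower-coverage template follows the ratio described in the text: a small delta relative to that template means it adds little beyond the higher-coverage one.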
  • generating a template index according to the generated webpage template includes the following steps:
  • Step S501 storing a plurality of webpage templates.
  • the generated plurality of webpage templates are stored.
  • Step S502 calculating the coverage rate of each webpage template. Since a template close to the root directory usually has better coverage, templates close to the root directory are processed first when searching for a template. Therefore, before the coverage rates are calculated, the generated webpage templates are first sorted by path depth, so that templates on short paths, which are closer to the root directory, come before templates on deep paths.
  • the coverage rate of each webpage template in one path can be calculated when calculating the coverage rate of each webpage template.
  • the coverage of each webpage template may be the coverage ratio of the webpage template relative to all the webpage templates in the entire path.
  • the ranking can be sorted from high to low.
  • A certain number of webpage templates can be taken in order of path depth, from long to short, so that the number of webpage templates under the same path is reduced and the calculation is faster.
  • Step S503 Determine whether the total coverage of the webpage template under each path reaches a second preset coverage threshold.
  • After the coverage rate of each webpage template is calculated, it is determined whether the total coverage of the webpage templates under each path reaches the second preset coverage threshold. If it does, the path is retained. If it does not, the webpage templates under that path are deleted.
  • Step S504 deleting the webpage templates under any path whose total template coverage does not reach the second preset coverage threshold. Such webpage templates do not need to be processed or used, so they can be deleted to save storage resources.
  • the template index can be generated according to the generated webpage template by the above steps S501 to S504. Therefore, when the user visits the webpage, the template index can be used to find the matching webpage template.
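Steps S501 to S504 can be sketched as a pruning pass over the stored templates. This sketch assumes that the "total coverage" of a path is the sum of the per-template coverage rates under that path, which the source does not state explicitly. The input layout and the name `prune_paths` are hypothetical.

```python
def prune_paths(templates_by_path, total_threshold):
    """Build a template index that keeps only paths with enough total coverage.

    templates_by_path maps a URL path to a list of (template_id, coverage) pairs,
    where coverage is measured within that path.
    """
    index = {}
    # Visit short paths (close to the root directory) before deep ones.
    for path in sorted(templates_by_path, key=lambda p: p.count("/")):
        ranked = sorted(templates_by_path[path], key=lambda t: t[1], reverse=True)
        total = sum(cov for _, cov in ranked)
        if total >= total_threshold:
            # Retain the path, with its templates ranked from high to low coverage.
            index[path] = [template_id for template_id, _ in ranked]
        # Otherwise the path's templates are dropped (step S504).
    return index
```

The resulting dictionary maps each retained path to its templates, so it can serve directly as the template index consulted when a user visits a page.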
  • FIG. 5 is a schematic diagram of a webpage template server according to the first embodiment of the present invention.
  • the webpage template server may be the same server as the template server in the foregoing embodiment.
  • The webpage template server includes a collecting unit 10, a generating unit 30, and an indexing unit 60.
  • the collecting unit 10 is configured to collect webpage data of a webpage.
  • The webpage data may be the data of a webpage that needs to be browsed.
  • The webpage data may come from one client or from multiple clients, and may be the data of one or more webpages of a single client.
  • The collected webpage data may also be data of webpages under the same domain name or under different domain names. The collected webpage data is stored.
  • The webpage data can be collected according to the user's need to browse webpages.
  • The sources of webpage data listed above are only examples. They do not limit the webpage data, and they do not require that the webpage data of all webpages from all of the above sources be collected.
  • the generating unit 30 is configured to generate a template corresponding to the webpage according to the collected webpage data, for example, the webpage template of the webpage may be generated according to the webpage data of the webpage.
  • The webpage template can be generated using a locality-sensitive hashing algorithm (simhash).
  • The simhash algorithm may be used to compute an N-bit hash value for the webpage data, and T label values are then derived from the N-bit hash value by a random hash prefix method. Each label value is used to look for a webpage template among the webpage templates of the same domain name. If a suitable webpage template is found, it can be used as the webpage template for transmitting the incremental data of the webpage to be browsed. If no suitable webpage template is found, the webpage to be browsed can itself be stored in the template library as a webpage template.
  • the indexing unit 60 is configured to generate a template index according to the generated webpage template.
  • the mapping unit can establish a mapping relationship between the URL path of the webpage template and the template according to the generated webpage template, and use the mapping relationship as a template index.
  • Generating webpage templates by the above method may produce identical or similar webpage templates, and these may be stored in different clients. To reduce the storage space occupied by the webpage templates and to make them more representative, one of the identical or similar webpage templates can be kept and the rest deleted.
  • The webpage template of a webpage can be built from the collected webpage data, so building the template does not depend on a specific target website. Dependency on the target website is reduced, and a corresponding webpage template can be created for any target website.
  • FIG. 6 is a schematic diagram of a web page template server in accordance with a second embodiment of the present invention. This embodiment can be used as a preferred embodiment of the embodiment shown in FIG. 5.
  • The webpage template server includes a collecting unit 10, a generating unit 30, a publishing unit 40, a storage unit 50, an indexing unit 60, and a template retrieval unit 20.
  • The publishing unit 40 is configured to publish the webpage template to the plurality of template servers that provide webpage templates after the webpage template of the webpage is generated according to the webpage data.
  • the webpage template may be published to a plurality of template servers that provide the webpage template.
  • multiple template servers can send webpage templates to multiple websites, and can also collect webpage data from multiple websites.
  • the storage unit 50 is configured to separately store webpage templates in a plurality of template servers.
  • The plurality of template servers each store the received webpage templates, so that every webpage template exists on each of the template servers. When webpage data needs to be transmitted on the basis of a webpage template, the webpage template on whichever template server has the better network condition can be selected to transmit the incremental data, which makes loading webpage data based on the webpage template more convenient and reliable.
  • the indexing unit 60 is configured to generate a template index according to the generated webpage template.
  • the mapping unit can establish a mapping relationship between the URL path of the webpage template and the template according to the generated webpage template, and use the mapping relationship as a template index.
  • the template retrieval unit 20 is configured to retrieve a webpage template matching the webpage by using the template index, and provide a template matching the webpage to other servers.
  • the template index is used to quickly determine whether a webpage request matches the webpage template stored in the server, and the matching webpage template is determined according to the webpage of the requested webpage. After any one of the plurality of template generation servers determines the webpage template that matches the webpage, the matching webpage template is sent to other servers in the plurality of template generation servers.
  • The steps of the webpage template generation method of the embodiments of the present invention, for example collecting webpage data, generating webpage templates, publishing webpage templates, and retrieving webpage templates, may be deployed on multiple servers that cooperate to provide the service. That is, the functions of the webpage template server of the present invention can be performed jointly by a plurality of servers. Different functional modules can be deployed on different servers, and the same functional module can also be deployed on different servers.
  • the index unit 60 includes a template selection module, a template path derivation module, a template path pruning module, and a template index generation module.
  • The template selection module is used to select templates whose quality meets a predetermined quality condition. It searches the generated webpage templates for templates that meet the condition. The predetermined quality condition may be that the template's coverage of the webpages accessed by users is greater than a predetermined threshold. Compared with a template whose quality does not meet the condition, a template whose quality meets it saves more data transfer.
  • The template path derivation module is used to determine the URL paths to which a template applies. Looking up all webpage templates under a path by the URL path the template applies to speeds up template lookup.
  • The template path pruning module is used to select, from the URL paths, a URL path suitable for the template whose quality meets the predetermined quality condition. Since templates on short paths have better coverage, the search can start from the shortest path, the one closest to the root directory.
  • the template index generation module is used to convert the selected path into a template index.
  • the path of the webpage template selected according to the URL path corresponds to the webpage accessed by the user, and forms a template index.
  • FIG. 7 is a schematic diagram of a web page template server according to a third embodiment of the present invention.
  • This embodiment can be used as a preferred embodiment of the embodiment shown in FIG. 5.
  • The webpage template server includes a collecting unit 10, a generating unit 30, a publishing unit 40, a storage unit 50, and an indexing unit 60, wherein the publishing unit 40 comprises a calculating module 401, a determining module 402, and a publishing module 403.
  • the calculation module 401 is configured to calculate an overall difference rate of the set of the plurality of webpage templates and the set of the history template. In order to avoid waste of resources caused by replacing the webpage template to regenerate the delta file when the webpage template changes are small, the overall difference rate of the collection of the plurality of webpage templates and the historical template collection is calculated.
  • The determining module 402 is configured to determine whether the overall difference rate is greater than a preset overall difference rate threshold. If the overall difference rate between the set of the plurality of webpage templates and the historical template set is greater than the preset overall difference rate threshold, the webpage templates have changed substantially and are published. If it is not greater than the threshold, the webpage templates have changed little and are not published.
  • The publishing module 403 is configured to publish the webpage template when it is determined that the overall difference rate is greater than the preset overall difference rate threshold, and not to publish it when the overall difference rate is not greater than the threshold. In the first case the set of the plurality of webpage templates differs considerably from the historical template set, so the webpage templates may be published. In the second case the generated set of webpage templates differs little from the historical template set, so incremental files can still be transmitted on the basis of the historical templates and the webpage templates need not be published.
  • FIG. 8 is a schematic diagram of a web page template server according to a fourth embodiment of the present invention.
  • The webpage template server includes a collecting unit 10, a generating unit 30, a determining unit 60, a calculating unit 70, a comparing unit 80, and a merging unit 90.
  • the functions of the collecting unit 10 and the generating unit 30 shown in FIG. 8 are the same as those of the collecting unit 10 and the generating unit 30 in the embodiment shown in FIG. 5, and are not described herein.
  • The determining unit 60 is configured to determine whether the number of webpage templates reaches a preset number after webpage templates are generated from the webpage data. If the number does not reach the preset number, webpage templates continue to be generated from the webpage data. If it does, the coverage rate of each webpage template can be calculated.
  • the calculating unit 70 is configured to calculate the coverage rate of each webpage template when it is determined that the number of webpage templates reaches a preset number.
  • Template coverage is an important indicator to measure the quality of the generated webpage template.
  • The template coverage rate can be the ratio of the number of webpages in the website to which the webpage template can be applied to the total number of webpages on the website.
  • The larger the template coverage rate, the more pages within the website the template can be applied to.
  • Template coverage measures not only the quality of a webpage template across a whole website but also its quality under a single path. For example, a webpage template whose coverage over the whole website is not very high may have very high coverage under a certain path, and such a template can still achieve good results in practical applications.
  • The comparing unit 80 is configured to compare a webpage template whose coverage is less than the preset coverage threshold with a webpage template whose coverage is greater than the preset coverage threshold. After the coverage rate of each webpage template is calculated, templates below the first preset coverage threshold can be compared with templates above it, to avoid selecting a separate but similar webpage template for incremental file transmission when the templates differ only slightly.
  • The merging unit 90 is configured to merge the webpage template whose coverage is smaller than the preset coverage threshold with the webpage template whose coverage is greater than the preset coverage threshold when the difference rate between the two is less than the preset difference rate threshold.
  • The difference rate between the webpage template whose coverage is smaller than the first preset coverage threshold and the webpage template whose coverage is greater than the first preset coverage threshold may be the ratio of the size of the difference between the two templates, as computed by the open-vcdiff algorithm, to the size of the webpage template whose coverage is smaller than the first preset coverage threshold. The difference rate measures the degree of difference between the two webpage templates.
  • If the difference rate is less than the preset difference rate threshold, the webpage template whose coverage is smaller than the first preset coverage threshold is considered similar to the webpage template whose coverage is greater than the first preset coverage threshold, and the two are merged. The merging may consist of merging the data of the webpage template whose coverage is smaller than the first preset coverage threshold into the data of the webpage template whose coverage is greater than the first preset coverage threshold.
  • The comparing unit 80 includes a sorting module and a comparing module. The sorting module is used to sort the plurality of webpage templates by coverage rate, and the comparing module is used to compare the lower-ranked webpage templates with the higher-ranked webpage templates.
  • Webpage templates whose difference rate is less than the preset difference rate threshold are merged.
  • the webpage template index may be obtained according to the webpage or the domain name of the webpage, and the webpage template data and the webpage template index may be published.
  • the index unit 60 further includes: a storage module, a calculation module, a third determination module, and a deletion module.
  • the storage module is configured to store a plurality of webpage templates after the webpage template of the webpage is generated according to the webpage data. To facilitate the invocation of the template, the generated plurality of webpage templates are stored after the webpage template of the webpage is generated.
  • The calculation module is used to calculate the coverage rate of each webpage template. Since a template close to the root directory usually has better coverage, templates close to the root directory are processed first when searching for a template. Therefore, before the coverage rates are calculated, the generated webpage templates are first sorted by path depth, so that templates on short paths, which are closer to the root directory, come before templates on deep paths.
  • the coverage rate of each webpage template in one path can be calculated when calculating the coverage rate of each webpage template.
  • the coverage of each webpage template may be the coverage ratio of the webpage template relative to all the webpage templates in the entire path.
  • the ranking can be sorted from high to low.
  • A certain number of webpage templates can be taken in order of path depth from long to short, so that fewer webpage templates remain in the same path and the amount of calculation is reduced.
  • the third determining module is configured to determine whether the total coverage of the webpage template under each path reaches a preset coverage threshold. After the coverage of each web page template is calculated, it is determined whether the total coverage of the web page template in each path reaches the second preset coverage threshold, and if the second preset coverage threshold is reached, the path is retained; If the total coverage of the webpage template in each path does not reach the second preset coverage threshold, the webpage template in the path where the total coverage of the webpage template does not reach the second preset coverage threshold is deleted.
  • the deletion module is configured to delete a webpage template under the path that the total coverage of the webpage template does not reach the preset coverage threshold. If the total coverage of the webpage template does not reach the second preset coverage threshold, the webpage template in the path where the total coverage of the webpage template does not reach the second preset coverage threshold does not need to be processed and used. Therefore, the webpage template under the path that the total coverage of the webpage template does not reach the second preset coverage threshold may be deleted to save storage resources.
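
The per-path coverage check performed by the calculation, third determining, and deletion modules can be sketched as below. The tuple layout `(path, template_id, coverage)` and the function names are assumptions made for illustration only.

```python
from collections import defaultdict


def path_depth(path: str) -> int:
    """Number of non-empty segments; '/' has depth 0."""
    return len([segment for segment in path.split("/") if segment])


def prune_paths(templates, second_threshold):
    """Keep only paths whose templates' total coverage reaches the threshold.

    templates: iterable of (path, template_id, coverage) tuples (hypothetical layout).
    Returns {path: [(template_id, coverage), ...]} with shallow paths first
    and each path's templates ordered from high to low coverage.
    """
    by_path = defaultdict(list)
    for path, template_id, coverage in templates:
        by_path[path].append((template_id, coverage))

    kept = {}
    for path in sorted(by_path, key=path_depth):  # paths near the root come first
        entries = sorted(by_path[path], key=lambda e: e[1], reverse=True)
        if sum(coverage for _, coverage in entries) >= second_threshold:
            kept[path] = entries  # retained
        # otherwise the path's templates are dropped to save storage
    return kept
```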
  • the webpage template may be a webpage, and one webpage can be used as a webpage template of another webpage.
  • If webpage A can cover most of the content of webpage B, that is, webpage A and webpage B have similar structures, contents, or codes and share a large amount of duplicate data, then webpage A can serve as a webpage template for webpage B; likewise, webpage B can serve as a webpage template for webpage A.
  • a web page can have one or more web page templates, and a web page template can also serve as a template for one or more web pages.
  • FIG. 9 is a block diagram showing the connection of a webpage template server to a middleware server and a terminal device according to an embodiment of the present invention.
  • As shown in FIG. 9, the terminal device 10 is configured to send a webpage browsing request to the middleware server 20, to receive the difference data returned by the middleware server 20 in response to the webpage browsing request, and to present the requested webpage according to the difference data and the locally stored webpage template data corresponding to the difference data. The difference data is generated in the webpage template server 30 based on the webpage data of the requested webpage and the webpage template data corresponding to that webpage data.
  • the user operates the terminal device 10 to issue a browsing request through the terminal device 10.
  • the terminal device 10 receives the browsing request of the web page, and transmits the browsing request of the web page to the middleware server 20.
  • The user can issue a browsing request to the terminal device 10 through a click action.
  • The middleware server 20 is configured to acquire the requested webpage data according to the received webpage browsing request, forward the webpage data to the webpage template server 30, and, after receiving the difference data returned by the webpage template server 30, forward the difference data to the terminal device 10.
  • The webpage template server 30 is configured to generate the difference data between the webpage data and the webpage template data, based on the webpage data received from the middleware server 20 and the locally acquired webpage template data corresponding to the webpage data, and to forward the difference data to the middleware server 20.
  • Because difference data exists between a webpage and its corresponding webpage template, if the webpage template exists locally in the terminal device 10, only the difference data needs to be transmitted when the webpage data is transmitted; it is not necessary to transmit all the data of the webpage.
  • the middleware server 20 directly returns the acquired webpage data.
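
To make the idea of difference data concrete, the sketch below builds a delta on the server side and restores the page on the terminal side. The text does not name a difference algorithm, so Python's `difflib.SequenceMatcher` is used here purely as a stand-in, and the `copy`/`insert` operation format is a hypothetical encoding.

```python
from difflib import SequenceMatcher


def make_delta(template: str, page: str):
    """Server side: describe `page` as copies from `template` plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, template, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))          # reuse template[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", page[j1:j2]))   # text only the page contains
        # "delete": template text absent from the page, nothing to transmit
    return ops


def apply_delta(template: str, ops) -> str:
    """Terminal side: rebuild the page from the local template and the delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(template[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

Only the literal text in the `insert` operations has to travel over the network; everything else is recovered from the template already stored on the terminal device.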
  • the web page template server 30 of the present invention can also generate web page template data of a new web page template.
  • the webpage template server 30 of the present invention generates webpage template data of a new webpage template, which may be generated by the webpage template server in advance by receiving webpage data forwarded by the middleware server.
  • the webpage template server 30 receives a large amount of webpage data from the middleware server 20, and the embodiment of the present invention may adopt a Hadoop (distributed system infrastructure) cluster.
  • the web page template server 30 is a server cluster composed of a plurality of servers.
  • The server cluster stores webpage data, template data, template indexes, and the like in an HBase (distributed, column-oriented open source database) database built on Hadoop.
  • Web page template data generation uses MapReduce (parallel computing method for large-scale data sets) computing framework.
  • A Hadoop cluster is a natively distributed storage and computing framework. Increasing the number of servers that generate webpage templates in the webpage template server 30 scales the cluster horizontally, and the cluster has good disaster tolerance.
  • When the webpage template server 30 is a server cluster, it is configured to generate the difference data between the webpage data and the webpage template data, based on the webpage data received from the middleware server 20 and the locally acquired webpage template data corresponding to the webpage data, and to forward the difference data to the middleware server 20.
  • Local acquisition here means acquisition within the server cluster.
  • the web page data referred to here includes structural data, content data or encoded data of the web page, and the data is transmitted from the middleware server 20 to the terminal device 10 via the radio communication network or the Internet or transmitted to the middleware server 20 by the terminal device 10.
  • The webpage template of the present invention is stored in the cache in encoded form. Therefore, when the webpage is displayed, the terminal device 10 must decode the webpage template data and the difference data, and the restored webpage template data is combined with the difference data to obtain the webpage to be displayed.
  • the difference data is smaller than the web page data, in the case where the web page template exists in the terminal device 10, only the difference data may be transmitted when the web page data is transmitted.
  • the difference data is a part of the webpage data, so the method of transmitting the differential data can be transmitted in the same manner as the transmission method of the webpage data through a network such as a radio communication network or the Internet.
  • The middleware server 20 sends the difference data to the terminal device 10, and the terminal device 10 invokes the webpage template corresponding to the webpage to present it. This effectively saves network resources, reduces bandwidth usage, and improves the loading speed of the webpage, which in turn improves the speed at which users browse the web.
  • Figure 10 is a block schematic diagram of one embodiment of a terminal device in accordance with an embodiment of the present invention.
  • the terminal device 10 includes a web page browsing request transmitting unit 101, a difference amount receiving unit 102, and a web page presenting unit 103.
  • The webpage browsing request sending unit 101 is configured to send a webpage browsing request to the middleware server 20. Before the webpage browsing request sending unit 101 sends the request, the terminal device 10 needs to search locally for webpage templates matching the webpage requested to be browsed.
  • If matching webpage templates are found, a first template ID list containing their webpage template IDs is included in the webpage browsing request packet; if none is found, the list is empty.
  • The local search for webpage templates matching the requested webpage may be performed according to the webpage address of the requested webpage, or the webpage may be processed to generate a webpage label used for the query, for example a hash value label.
  • The matching principle between a webpage and a webpage template is determined according to the requirements of different websites or webpages.
  • For example, when coverage is used, a webpage template is considered to match webpage A when the coverage between the template and webpage A reaches a predetermined value, so the templates whose coverage of webpage A reaches the predetermined value are the ones to be queried.
  • The matching between a webpage template and a webpage may also use criteria other than coverage, such as the compression ratio; coverage is merely an example, and the criteria listed are not exhaustive.
  • The size of the first template ID list, or the number of template IDs in it, needs to stay within a certain range. For example, a request packet can carry at most 5 template IDs at a time.
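
The terminal-side lookup can be sketched as follows. The tag function (a SHA-1 hash of the host plus the first path segment), the `local_index` mapping, and the request dictionary layout are all assumptions for illustration; the text only says that a label such as a hash value may be generated and that the list is capped, for example at 5 IDs.

```python
import hashlib
from urllib.parse import urlsplit

MAX_TEMPLATE_IDS = 5  # example limit from the text: at most 5 IDs per request packet


def page_tag(url: str) -> str:
    """Hypothetical label: hash of the host plus the first path segment."""
    parts = urlsplit(url)
    first_segment = parts.path.strip("/").split("/")[0]
    return hashlib.sha1(f"{parts.netloc}/{first_segment}".encode()).hexdigest()


def build_request(url: str, local_index: dict) -> dict:
    """Attach the first template ID list to a browsing request.

    local_index maps a page tag to locally stored template IDs, best match first.
    The list is empty when no matching template is stored locally.
    """
    template_ids = local_index.get(page_tag(url), [])[:MAX_TEMPLATE_IDS]
    return {"url": url, "template_ids": template_ids}
```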
  • The difference data receiving unit 102 is configured to receive the difference data returned by the middleware server 20 in response to the webpage browsing request, where the difference data is generated in the webpage template server 30 based on the webpage data of the requested webpage and the webpage template data corresponding to the webpage data.
  • The webpage presentation unit 103 is configured to present the requested webpage according to the difference data and the webpage template data, stored locally by the terminal device 10, that corresponds to the difference data. Data is transmitted using the TCP/IP protocol. If the data received by the webpage presentation unit 103 is difference data, the requested webpage is presented according to the locally stored webpage template data and the difference data; if the received data is webpage data, the webpage is presented directly.
  • the web page presentation unit 103 needs to restore the encoded data and display the original web page together with the difference data.
  • the number of webpage templates or the total size of webpage template data stored by the terminal device 10 in the present invention is limited, and a threshold may be set, for example, only 100 templates may be saved and the total size may not exceed 10MB. If the threshold is exceeded, the template may be eliminated according to the LRU (Least Recently Used), that is, the least recently used page replacement algorithm.
  • The LRU algorithm deletes webpage templates that have been used least recently and are therefore unlikely to be used for a long time in the future, which saves the storage resources of the terminal device 10.
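
A minimal sketch of such a bounded template store is shown below, using the example limits from the text (100 templates, 10MB in total). The class and method names are hypothetical.

```python
from collections import OrderedDict


class TemplateCache:
    """LRU store bounded by both template count and total byte size."""

    def __init__(self, max_count=100, max_bytes=10 * 1024 * 1024):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self._items = OrderedDict()  # template_id -> bytes, least recently used first
        self._total = 0

    def get(self, template_id):
        data = self._items.get(template_id)
        if data is not None:
            self._items.move_to_end(template_id)  # mark as most recently used
        return data

    def put(self, template_id, data: bytes):
        if template_id in self._items:
            self._total -= len(self._items.pop(template_id))
        self._items[template_id] = data
        self._total += len(data)
        # Evict least recently used templates until both limits hold again.
        while self._items and (len(self._items) > self.max_count
                               or self._total > self.max_bytes):
            _, evicted = self._items.popitem(last=False)
            self._total -= len(evicted)
```

In this sketch a single template larger than `max_bytes` is evicted immediately after insertion; a real store might reject it up front instead.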
  • the terminal device 10 further includes a webpage template downloading unit 104 and a webpage template data saving unit 105.
  • The webpage template downloading unit 104 is configured to download the corresponding webpage template data from the webpage template server 30 via the middleware server 20 after receiving, from the middleware server 20, a webpage template ID that does not belong to the first webpage template ID list.
  • The webpage template downloading unit 104 is an independent worker thread that can download templates when the network is idle or in a Wi-Fi environment, which avoids occupying bandwidth and affecting the user's browsing experience.
  • the webpage template data saving unit 105 is configured to store the webpage template data downloaded by the webpage template downloading unit 104 in association with the corresponding webpage template ID.
  • The webpage template data is stored in the terminal device 10 in the same way as in the previous embodiment. Given the storage capability of the terminal device 10, the number of templates stored by the webpage template data saving unit 105, or the total size of the webpage template data, is limited, and a threshold may be set, for example at most 100 templates with a total size not exceeding 10MB. If the threshold is exceeded, templates may be evicted according to the LRU (Least Recently Used) page replacement algorithm. The LRU algorithm deletes webpage templates that have been used least recently and are therefore unlikely to be used for a long time in the future, which saves the storage resources of the terminal device 10.
  • The terminal device 10 of the present invention may be any terminal device with a display function that can perform web browsing, such as a mobile terminal, a PDA, or an iPad.
  • FIG. 11 is a block schematic diagram of one embodiment of a middleware server in accordance with an embodiment of the present invention.
  • The middleware server 20 shown in FIG. 11 includes a webpage data obtaining unit 201, configured to acquire the requested webpage data after receiving the webpage browsing request sent by the terminal device 10. The cache of the middleware server 20 may first be queried for the webpage data; if the data is not cached, it is obtained by accessing the target web server.
  • the forwarding unit 202 is further configured to forward the acquired webpage data to the webpage template server 30, and forward the difference data to the terminal device 10 after receiving the delta data returned by the webpage template server 30.
  • the forwarding unit 202 can transmit data using the TCP/IP protocol.
  • The forwarding unit 202 transmits to the terminal device 10 the recommended template ID and the webpage data acquired by the webpage data obtaining unit 201.
  • The middleware server 20 further includes a template data obtaining module 203 for receiving the template ID of the webpage template data to be downloaded, sent by the webpage template downloading unit 104 of the terminal device 10, and for downloading the webpage template data from the webpage template server 30 using that template ID.
  • The webpage template data is then passed to the forwarding unit 202, which transmits it to the terminal device 10, where it is saved by the webpage template data saving unit 105.
  • FIG. 12 is a block schematic diagram of one embodiment of a web page template server in accordance with an embodiment of the present invention.
  • the web page template server 30 shown in FIG. 12 includes a web page template data storage unit 301, a web page template data acquiring unit 302, a difference data generating unit 303, and a transmitting unit 304.
  • the webpage template data storage unit 301 is configured to store webpage template data. Specifically, the webpage template data storage unit 301 stores the webpage template ID and the webpage template data in association.
  • the webpage template data obtaining unit 302 is configured to acquire webpage template data corresponding to the received webpage data from the webpage template data storage unit 301.
  • The webpage template data obtaining unit 302 acquires the webpage template data from the webpage template data storage unit 301 using the first template ID list, or using the first template ID list together with the webpage address of the requested webpage.
  • the difference data generating unit 303 is configured to generate difference data between the web page data and the web page template data based on the web page data received from the middleware server 20 and the web page template data corresponding to the web page data.
  • the transmitting unit 304 is configured to transmit the generated difference data to the middleware server 20.
  • FIG. 13 is a block diagram showing an embodiment of a delta data generating unit of a web page template server according to an embodiment of the present invention.
  • When the terminal device 10 locally stores webpage templates matching the requested webpage, the first template ID list, containing the IDs of all matching templates, is sent to the middleware server 20 together with the webpage browsing request, and the middleware server 20 forwards the first template ID list to the webpage template server 30. The webpage template data obtaining unit 302 of the webpage template server 30 sequentially acquires the webpage template IDs in the first webpage template ID list and, based on each acquired webpage template ID, acquires the webpage template data from the webpage template data storage unit 301.
  • the difference data generating unit 303 includes the difference data calculating module 3031 and the determining module 3032 as shown in FIG. 5.
  • the difference data calculation module 3031 is configured to calculate difference data between the web page data and the web page template data acquired from the web page template data storage unit 301.
  • the difference data calculation module 3031 calculates the web page template data and the web page data using a difference algorithm.
  • The determining module 3032 is configured to take the calculated difference data as the difference data to be returned when the compression ratio between the calculated difference data and the webpage data is less than a first predetermined threshold. When the compression ratio is not less than the first predetermined threshold, the processing of the webpage template data acquiring unit 302 and the difference data generating unit 303 is repeated until the difference data is generated.
  • The compression ratio between the difference data and the webpage data is the ratio of the compressed size of the difference data to the compressed size of the webpage data.
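
The compression ratio test can be sketched as below. The text does not say which compressor is used, so `zlib` is an assumption here, and the default threshold of 0.5 is a hypothetical value chosen only for the example.

```python
import zlib


def compression_ratio(delta: bytes, page: bytes) -> float:
    """Compressed size of the difference data over compressed size of the page."""
    return len(zlib.compress(delta)) / len(zlib.compress(page))


def accept_delta(delta: bytes, page: bytes, first_threshold: float = 0.5) -> bool:
    """The delta is used only if it is sufficiently smaller than the page."""
    return compression_ratio(delta, page) < first_threshold
```

A ratio close to 1 means the template saved almost nothing, in which case the next candidate template is tried.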
  • FIG. 14 is a block schematic diagram of a second embodiment of a web page template server in accordance with an embodiment of the present invention.
  • When the terminal device 10 locally stores webpage templates matching the requested webpage, the first template ID list, containing the IDs of all matching templates, is sent to the middleware server 20 together with the webpage browsing request, and the middleware server 20 forwards the first template ID list and the requested webpage address to the webpage template server 30; that is, the webpage browsing request includes the webpage address and the first webpage template ID list. The webpage template server 30 includes:
  • the webpage template ID list library 305 is configured to store the second webpage template ID list in association with the webpage address.
  • the second webpage template ID list is a template ID list corresponding to the address of the webpage requested to be browsed recommended by the webpage template server 30.
  • the template ID of the webpage template data stored in the webpage template server 30 that matches the address of the webpage constitutes a second webpage template ID list recommended by the webpage template server 30.
  • the webpage template ID list obtaining unit 306 is configured to obtain a corresponding second webpage template ID list from the webpage template ID list library 305 according to the webpage address of the webpage requested to be browsed.
  • the webpage template ID list merging unit 307 is configured to merge the first webpage template ID list and the second webpage template ID list into a third webpage template ID list.
  • The webpage template ID list merging unit 307 merges the first webpage template ID list and the second webpage template ID list into the third webpage template ID list as follows: the webpage template IDs in the two lists are merged according to priority, where the intersection of the first webpage template ID list and the second webpage template ID list has the highest priority, the remaining part of the first webpage template ID list comes next, and the remaining part of the second webpage template ID list has the lowest priority.
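
The priority merge can be sketched as follows. It assumes the ordering intersection first, then the rest of the first list, then the rest of the second list, and that each input list contains no duplicates; the `limit` parameter is a hypothetical addition reflecting the size cap mentioned elsewhere in the text.

```python
def merge_template_ids(first, second, limit=None):
    """Build the third template ID list from the terminal's and the server's lists.

    Priority (assumed): IDs present in both lists, then IDs only in `first`
    (already stored on the terminal), then IDs only in `second`.
    """
    first_set, second_set = set(first), set(second)
    in_both = [t for t in first if t in second_set]
    only_first = [t for t in first if t not in second_set]
    only_second = [t for t in second if t not in first_set]
    merged = in_both + only_first + only_second
    return merged if limit is None else merged[:limit]
```

For example, merging `["a", "b", "c"]` with `["c", "d", "a"]` yields `["a", "c", "b", "d"]`.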
  • The webpage template data acquiring unit 302 sequentially acquires the webpage template IDs in the third webpage template ID list and, based on each acquired webpage template ID, acquires webpage template data from the webpage template data storage unit 301.
  • the process in which the difference data calculation module 3031 and the determination module 3032 included in the difference data generation unit generate the difference data is the same as the embodiment shown in FIG.
  • A counting unit (not shown in the figure) is further provided. When the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, the counting unit counts the number of calculations performed by the difference data calculation unit, and the webpage template data obtaining unit 302 acquires the next webpage template ID and, based on it, acquires webpage template data from the webpage template data storage unit 301.
  • When the number of calculations exceeds a second predetermined threshold, the webpage template server 30 returns a difference data generation failure message to the middleware server 20, so that the middleware server 20, after receiving the failure message, returns the webpage data to the terminal device 10 for presentation.
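
The retry loop with a calculation limit can be sketched as below. The difference algorithm and the ratio function are passed in as callables because the text does not fix them; all parameter names are hypothetical, and the sketch allows at most `max_attempts` calculations before reporting failure.

```python
from typing import Callable, Iterable, Optional, Tuple


def generate_delta(
    page: bytes,
    template_ids: Iterable[str],
    load_template: Callable[[str], Optional[bytes]],
    make_delta: Callable[[bytes, bytes], bytes],
    ratio: Callable[[bytes, bytes], float],
    first_threshold: float,
    max_attempts: int,
) -> Optional[Tuple[str, bytes]]:
    """Try candidate templates in order until one yields a small enough delta.

    Returns (template_id, delta) on success, or None to signal a generation
    failure, in which case the caller sends the full page instead.
    """
    attempts = 0  # plays the role of the counting unit
    for template_id in template_ids:
        if attempts >= max_attempts:
            break
        template = load_template(template_id)
        if template is None:
            continue  # unknown ID: skip without counting a calculation
        delta = make_delta(template, page)
        attempts += 1
        if ratio(delta, page) < first_threshold:
            return template_id, delta
    return None
```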
  • The webpage template server 30 further includes a difference data saving unit (not shown) for storing the difference data in association with the webpage template ID and the webpage address, and a difference data query unit (not shown) for querying the associated difference data in the difference data saving unit according to the webpage template ID and the webpage address.
  • The difference data generating unit 303 is configured to generate the difference data. A certain number of difference calculation results are saved; when the same template ID and webpage request occur again, the difference data generating unit 303 does not need to recalculate the difference data and can obtain it directly from the difference data saving unit, which improves the response speed.
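
A minimal sketch of the saving and query units follows. The class name and the `compute` callable are hypothetical; the key is the pair of webpage template ID and webpage address named in the text. The sketch does not handle the case where the page content at an address changes, which a real store would need to consider.

```python
class DeltaStore:
    """Saves computed difference data keyed by (template ID, webpage address)."""

    def __init__(self, compute):
        self._compute = compute   # difference algorithm: (template, page) -> delta
        self._saved = {}
        self.computations = 0     # how often the delta was actually calculated

    def get(self, template_id, url, template, page):
        key = (template_id, url)
        if key not in self._saved:           # query unit found nothing
            self._saved[key] = self._compute(template, page)
            self.computations += 1
        return self._saved[key]              # served from the saving unit
```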
  • The webpage template server 30 further includes a second determining unit (not shown) for determining, after the difference data is generated, whether the webpage template ID currently used by the webpage template data acquiring unit 302 belongs to the first webpage template ID list.
  • When the currently used webpage template ID belongs to the first webpage template ID list, the sending unit 304 returns the generated difference data and the currently used webpage template ID to the middleware server 20, which forwards them to the terminal device 10.
  • When the currently used webpage template ID does not belong to the first webpage template ID list, the sending unit 304 returns the currently used webpage template ID to the middleware server 20, and the middleware server 20 sends the received webpage template ID together with the webpage data to the terminal device 10. The webpage template downloading unit 104 of the terminal device 10 then downloads the webpage template data corresponding to that webpage template ID when the network is idle or on Wi-Fi.
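
The two response cases can be sketched as follows. The function name and the dictionary layout of the response are assumptions made for illustration.

```python
def build_response(used_template_id, delta, page, first_list):
    """Decide what the terminal device receives for one request."""
    if used_template_id in first_list:
        # The terminal already stores this template: the delta is enough.
        return {"template_id": used_template_id, "delta": delta}
    # The terminal lacks the template: send the full page now and the ID,
    # so the template can be downloaded later when idle or on Wi-Fi.
    return {"template_id": used_template_id, "page": page}
```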
  • the webpage template server 30 of the present invention may further include a webpage template data generating unit 308, a webpage collecting unit 309, and a webpage saving unit 310.
  • the webpage collecting unit 309 is configured to receive the webpage data sent by the middleware server 20.
  • the webpage saving unit 310 is configured to store the webpage data sent by the middleware server 20 received by the webpage collecting unit 309.
  • The webpage template data generating unit 308 is configured to generate webpage template data from the webpage data, sent by the middleware server 20, that is stored by the webpage saving unit 310, to generate a corresponding webpage template ID, to store the webpage template data and the webpage template ID in association in the webpage template data storage unit 301, and to store the webpage template ID in association with the webpage address in the webpage template ID list library 305.
  • the webpage template data generating unit 308 is configured to generate the webpage template data according to the webpage data sent by the middleware server.
  • The webpage template data generating unit 308 uses a dedicated algorithm to generate the webpage template data quickly.
  • The algorithm for quickly generating webpage template data may, for example, generate a hash value for a webpage or branch the webpage data. Since a webpage template may itself be a webpage, the webpage itself may also be used as a webpage template.
  • The webpage template data generating unit 308 may generate the webpage template data by creating a new webpage template from the webpage requested by the user when the difference data generating unit 303 fails to generate the difference data, or the webpage template server 30 may generate it in advance from the webpage data forwarded by the middleware server 20.
  • The webpage template server 30 can receive webpage data from multiple middleware servers 20, so it receives a large amount of webpage data.
  • the embodiments of the present invention need to store a large amount of data and perform a large number of operations on a large amount of webpage data to generate a webpage template. Therefore, embodiments of the present invention can employ Hadoop (Distributed System Infrastructure) clusters for data storage and computation. That is, the web page template server 30 is a server cluster composed of a plurality of servers.
  • the web page template data generating unit 308 can be disposed in a plurality of servers of the server cluster.
  • The server cluster stores webpage data, template data, template indexes, and the like in an HBase (distributed, column-oriented open source database) database built on Hadoop.
  • Template generation uses the MapReduce (parallel computing method for large-scale data sets) computing framework. A Hadoop cluster is a natively distributed storage and computing framework: increasing the number of servers that generate webpage templates in the webpage template server 30, that is, the number of servers including the webpage template data generating unit 308, scales the cluster horizontally, and the cluster has good disaster tolerance.
  • The size of the template ID list is limited; for example, at most five webpage template IDs can be returned each time.
  • The webpage template server 30 may further include a webpage template deletion unit (not shown) for deleting the least recently used webpage template data in the webpage template data storage unit 301 when it determines that the number of webpage templates, or the size of the webpage template data, in the webpage template data storage unit 301 exceeds a predetermined value.
  • Least recently used rests on the assumption that webpage template data that has not been used for a long time is unlikely to be used for a long time in the future. According to this principle, the webpage template data that has not been used in the recent period is identified, and the webpage template deletion unit deletes it.
  • the web page presentation system of the present invention may include only terminal devices and servers. That is, the webpage template server 30 of the present invention cannot be considered as a limitation on a certain physical server.
  • The webpage template server 30 can be a single server or, to relieve the pressure of calculation and storage, a server cluster. Likewise, the functions of the middleware server 20 can be completed on a single physical server or by a server cluster.
  • The functional modules included in the middleware server 20 and the webpage template server 30 of the present invention can be distributed across multiple servers.
  • For example, one or more servers including the webpage template data generating unit 308, the webpage collecting unit 309, and the webpage saving unit 310 may be provided, and one or more servers including the webpage template data acquiring unit 302 and the difference data generating unit 303 may be provided; together these form the server cluster of the webpage template server 30 of the present invention.
  • the webpage presentation system of the present invention stores and calculates the difference data between the webpage template and the webpage data by setting the webpage template server 30, and the middleware server 20 transmits the difference data to the terminal device 10, and the terminal device 10 locally calls the The webpage template corresponding to the difference data, thereby realizing the presentation of the webpage.
  • the difference data is smaller than the web page data. It can effectively save network resources, reduce bandwidth consumption, and improve the loading speed of web pages, further improving the speed of users browsing web pages.
  • FIG. 15 is a flow diagram of an embodiment of a method for implementing web page presentation using a web page template in accordance with the present invention.
  • the embodiment of the invention provides a webpage presentation method.
  • the method is used for transmitting webpage data, and can improve the webpage presentation speed.
  • The webpage presentation method of the embodiment of the present invention may be performed by the webpage presentation system provided by the embodiment of the present invention, and that webpage presentation system may likewise be used to perform the webpage presentation method of the embodiment of the present invention.
  • the web page presentation method shown in FIG. 15 includes the following steps.
  • Step S701: The terminal device acquires a browsing request sent by the user, locally searches for webpage templates that match the webpage requested by the browsing request, and sends to the middleware server a webpage browsing request that includes a first template ID list containing the IDs of the matching webpage templates. If no matching template is found, the list is empty.
  • the local search for the webpage template that matches the webpage requesting the webpage request may be based on the requested webpage address, or the webpage may be processed to generate a webpage tag for querying, for example, generating a hash value tag.
  • the matching principle between web pages and web page templates depends on the needs of different websites or web pages. For example, when the coverage ratio is used, that is, the coverage between the webpage template of the webpage A and the webpage A reaches a predetermined value, it is considered to match the webpage A.
  • the user submits a browsing request to the terminal device, and the terminal device acquires a browsing request sent by the user.
  • the terminal device can be connected to the middleware server through a radio communication network or the Internet to implement communication and data transmission between the terminal device and the middleware server.
  • the user can make a browsing request to the terminal device by clicking the action.
  • a request packet can only be attached with a maximum of 5 template IDs at a time.
  • the matching manner between the webpage template and the webpage may also be other than the coverage ratio, which is merely an example and is not exhaustive.
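  • As an illustration of step S701, the following minimal Python sketch builds the browsing request on the terminal device. The names `url_tag`, `BrowseRequest`, `build_request`, and the `local_index` mapping are hypothetical and do not appear in the original; the use of SHA-256 as the hash tag is likewise an assumption. Only the limit of 5 template IDs per request comes from the text.

```python
import hashlib
from dataclasses import dataclass, field

MAX_TEMPLATE_IDS = 5  # per-request limit stated in the description


def url_tag(url: str) -> str:
    """Hash-value tag for a webpage address (SHA-256 is an assumed choice)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


@dataclass
class BrowseRequest:
    url: str
    first_template_ids: list = field(default_factory=list)


def build_request(url: str, local_index: dict) -> BrowseRequest:
    """Look up locally stored template IDs for the URL and attach at most 5.

    local_index is a hypothetical mapping: url tag -> list of template IDs.
    When nothing matches, the first template ID list stays empty.
    """
    ids = local_index.get(url_tag(url), [])
    return BrowseRequest(url=url, first_template_ids=list(ids[:MAX_TEMPLATE_IDS]))
```

  • A terminal that stores more than 5 matching templates would, under this sketch, simply send the first 5; how the terminal ranks its local templates is not specified in the text.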
  • Step S702: after receiving the webpage browsing request sent by the terminal device, the middleware server acquires the requested webpage data based on the webpage browsing request and forwards the obtained webpage data to the webpage template server.
  • The middleware server can store some webpage addresses and webpage data locally. After receiving the webpage browsing request sent by the terminal device, it checks locally whether the requested webpage exists and otherwise obtains the webpage from the web server. The middleware server forwards the obtained webpage data to the webpage template server and also sends the requested webpage address to the webpage template server.
  • Step S703: the webpage template server locally obtains the webpage template data corresponding to the webpage data, generates the difference data between the webpage data and the webpage template data based on the received webpage data and the acquired webpage template data, and sends the generated difference data to the middleware server.
  • The webpage template data corresponding to the webpage data is the data of the webpage template matching the webpage; the matching principle here may be the same as or different from that of the previous step. A webpage and its corresponding webpage template share some data but also differ in other data. The difference data may be the data that exists in the webpage but does not exist in the webpage template.
  • The webpage data referred to herein includes structural data, content data, or encoded data of the webpage, and is transmitted between the middleware server and the terminal device via the radio communication network or the Internet.
  • The webpage template server locally obtains the webpage template data corresponding to the webpage data, and generates the difference data between the webpage data and the webpage template data based on the received webpage data and the acquired webpage template data.
  • The difference data holding unit further stores the difference data in association with the webpage template ID and the webpage address.
  • When the webpage template server receives the webpage data sent by the middleware server, the requested webpage URL, and the webpage template ID corresponding to the webpage, the difference data query unit queries the difference data holding unit for the associated difference data according to the webpage template ID and the webpage address. When no associated difference data is found, the process proceeds to step S703.
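  • The lookup-before-compute behaviour of the difference data holding unit and query unit can be sketched as follows. This is a minimal in-memory illustration; the class `DeltaStore`, the function `get_delta`, and the `compute` callback are hypothetical names, and a real server would presumably use persistent storage.

```python
class DeltaStore:
    """Holds difference data keyed by (template ID, webpage address)."""

    def __init__(self):
        self._deltas = {}

    def save(self, template_id, url, delta):
        self._deltas[(template_id, url)] = delta

    def query(self, template_id, url):
        # Returns None when no associated difference data has been stored.
        return self._deltas.get((template_id, url))


def get_delta(store, template_id, url, compute):
    """Return stored difference data, or compute and store it on a miss.

    compute is a hypothetical zero-argument callback standing in for step S703.
    """
    cached = store.query(template_id, url)
    if cached is not None:
        return cached
    delta = compute()
    store.save(template_id, url, delta)
    return delta
```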
  • Step S704: the middleware server forwards the received difference data to the terminal device.
  • Step S705: the terminal device displays the requested webpage according to the received difference data and the locally stored webpage template data corresponding to the difference data.
  • After receiving the difference data transmitted over the network, the terminal device looks up the webpage template data locally by the webpage template ID or by a tag that characterizes the webpage template, and displays the webpage from the webpage template data together with the difference data.
  • The data of the webpage template includes data such as encoding information of the webpage template.
  • The webpage is displayed according to the webpage template data and the difference data, where the webpage data can be obtained by decoding the webpage template data and the difference data.
  • Because the difference data is smaller than the webpage data, only the difference data needs to be transmitted when the webpage template already exists locally on the terminal device.
  • The difference data is a part of the webpage data, so it is transmitted in the same way as webpage data, through a network such as a radio communication network or the Internet.
  • The middleware server sends the difference data to the terminal device, and the terminal device invokes the webpage template corresponding to the webpage, thereby presenting the webpage.
  • The difference data is much smaller than the webpage data, so transmitting it occupies far fewer network resources than transmitting the webpage data. This improves the transmission efficiency of webpage data and further improves the loading speed of the webpage.
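  • The description does not specify how the difference data is encoded. As one possible illustration, the sketch below uses Python's standard `difflib.SequenceMatcher` on lists of lines: parts shared with the template are sent as copy ranges, and only the lines absent from the template are sent literally. The functions `make_delta` and `apply_delta` and the operation format are assumptions made for this example, not the patent's encoding.

```python
from difflib import SequenceMatcher


def make_delta(template: list, page: list) -> list:
    """Server side: encode page as copy ranges into template plus new lines."""
    ops = []
    matcher = SequenceMatcher(a=template, b=page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # lines already in the template
        elif j2 > j1:                           # 'replace' or 'insert'
            ops.append(("data", page[j1:j2]))   # lines only in the page
        # 'delete' contributes nothing to the page, so nothing is emitted
    return ops


def apply_delta(template: list, delta: list) -> list:
    """Terminal side: rebuild the page from the local template and the delta."""
    page = []
    for op in delta:
        if op[0] == "copy":
            page.extend(template[op[1]:op[2]])
        else:
            page.extend(op[1])
    return page
```

  • With this encoding the terminal device needs only its locally stored template and the operation list to rebuild the page, which matches the division of work between steps S703 and S705.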
  • FIG. 16 is a flow chart of a first embodiment of step S703 of the method for implementing webpage presentation using a webpage template according to the present invention, for the case where the webpage browsing request includes the first webpage template ID list.
  • The webpage template data acquiring unit sequentially acquires the first webpage template IDs in the first webpage template ID list, and the process then proceeds to step S802.
  • The webpage template data acquiring unit acquires the webpage template data from the webpage template data storage unit based on the currently acquired first webpage template ID.
  • The difference data calculation module calculates the difference data between the webpage data and the webpage template data acquired from the webpage template data storage unit.
  • In step S804, it is determined whether the compression ratio between the difference data and the webpage data is less than a first predetermined threshold.
  • When the compression ratio is less than the first predetermined threshold, in step S805 the determining module determines the calculated difference data to be the delta data, and the process then proceeds to step S806.
  • The sending unit returns the generated delta data and the currently used webpage template ID to the middleware server, which forwards them to the terminal device.
  • When the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, the process proceeds to step S807, where it is determined whether the current first webpage template ID is the last webpage template ID in the first webpage template ID list. If not, the process proceeds to step S810, in which the webpage template data acquiring unit acquires the next first webpage template ID from the first webpage template ID list as the new currently acquired first webpage template ID, and the process then returns to step S802. If so, the process proceeds to step S811, in which the sending unit returns information that the difference data calculation failed to the middleware server, the middleware server returns only the webpage data to the terminal device, and the process ends.
  • To keep the webpage template server from performing excessive calculation, step S807 of this embodiment may be replaced by steps S808 and S809.
  • In step S808, the counting unit adds 1 to the number of difference calculations performed by the difference data calculating unit, and step S809 determines whether that number exceeds a second predetermined threshold. When the number of calculations does not exceed the second predetermined threshold, the process proceeds to step S810. When the second predetermined threshold is exceeded, the process proceeds to step S811.
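  • The selection loop of this embodiment can be sketched as follows. The function `select_delta`, its parameter names, and the default thresholds are hypothetical; `load_template` and `diff` are caller-supplied stand-ins for the webpage template data storage unit and the difference data calculation module. For brevity the sketch applies both stopping conditions (end of list, and the calculation counter of steps S808 and S809), whereas the text presents them as alternatives.

```python
def select_delta(page, template_ids, load_template, diff,
                 ratio_threshold=0.5, max_failed=3):
    """Try templates in order until one yields a small enough delta.

    ratio_threshold plays the role of the first predetermined threshold and
    max_failed the role of the second predetermined threshold; both default
    values are illustrative assumptions.
    Returns (template_id, delta) on success, or None when calculation fails.
    """
    if not page:
        return None
    failed = 0
    for template_id in template_ids:
        delta = diff(load_template(template_id), page)
        if len(delta) / len(page) < ratio_threshold:
            return template_id, delta        # accepted as the delta data
        failed += 1                          # count the unsuccessful calculation
        if failed > max_failed:              # stop to limit server load
            break
    return None  # the caller falls back to sending the full webpage data
```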
  • FIG. 17 is a flow chart of a second embodiment of step S703 of the method for implementing webpage presentation using a webpage template according to the present invention, for the case where the webpage browsing request includes the first webpage template ID list.
  • The webpage template ID list obtaining unit acquires the corresponding second webpage template ID list from the webpage template ID list library according to the webpage address of the webpage requested to be browsed.
  • The second webpage template ID list is stored in the webpage template ID list library in association with the webpage address.
  • The webpage template ID list merging unit merges the first webpage template ID list and the second webpage template ID list into a third webpage template ID list.
  • The third webpage template ID list may be generated by combining the webpage template IDs of the first and second webpage template ID lists according to priority: the intersection of the two lists has the highest priority, the remainder of the first webpage template ID list comes second, and the remainder of the second webpage template ID list has the lowest priority.
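  • The priority merge described above can be written in a few lines. The function name `merge_template_ids` is hypothetical, and keeping each group in the order of its source list is an assumption, since the text only fixes the order of the three groups.

```python
def merge_template_ids(first: list, second: list) -> list:
    """Merge two template ID lists into a third list by priority.

    Order: IDs in both lists, then IDs only in the first list,
    then IDs only in the second list. Duplicates are dropped.
    """
    first_set, second_set = set(first), set(second)
    in_both = [t for t in first if t in second_set]
    only_first = [t for t in first if t not in second_set]
    only_second = [t for t in second if t not in first_set]
    # dict.fromkeys removes repeats while preserving order
    return list(dict.fromkeys(in_both + only_first + only_second))
```

  • Templates in the intersection are both present on the terminal device and known to the server for that address, which is presumably why they are tried first.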
  • In step S903, the webpage template data acquiring unit sequentially acquires the third webpage template IDs in the third webpage template ID list. Then, in step S904, the webpage template data is acquired from the webpage template data storage unit based on the acquired webpage template ID.
  • The difference data calculation module calculates the difference data between the webpage data and the webpage template data acquired from the webpage template data storage unit.
  • In step S906, it is determined whether the compression ratio between the difference data and the webpage data is less than a first predetermined threshold.
  • When the compression ratio between the calculated difference data and the webpage data is less than the first predetermined threshold, the process proceeds to step S907, in which the determining module determines the calculated difference data to be the delta data. The process then proceeds to step S908, in which the sending unit returns the generated delta data and the currently used webpage template ID to the middleware server, which forwards them to the terminal device.
  • When the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, the process proceeds to step S909, where it is determined whether the current third webpage template ID is the last webpage template ID in the third webpage template ID list. If not, the process proceeds to step S910, in which the webpage template data acquiring unit acquires the next third webpage template ID from the third webpage template ID list as the new currently acquired third webpage template ID, and the process then returns to step S904. If so, the process proceeds to step S913, in which the sending unit returns information that the difference data calculation failed to the middleware server, the middleware server returns only the webpage data to the terminal device, and the process ends.
  • To keep the webpage template server from performing excessive calculation, step S909 of this embodiment may be replaced by steps S911 and S912.
  • In step S911, the counting unit adds 1 to the number of difference calculations performed by the difference data calculating unit, and step S912 then determines whether that number exceeds a second predetermined threshold. When the number of calculations does not exceed the second predetermined threshold, the process returns to step S910. When the second predetermined threshold is exceeded, the process proceeds to step S913.
  • A preferred form of this embodiment further includes step S915, in which the second determining unit determines whether the webpage template ID currently used by the webpage template data acquiring unit belongs to the first webpage template ID list.
  • When it does not, the process proceeds to step S916, in which the sending unit returns the currently used webpage template ID to the middleware server so that the middleware server sends the received webpage template ID together with the webpage data to the terminal device.
  • The webpage template downloading unit of the terminal device downloads the corresponding webpage template data from the webpage template data storage unit via the middleware server based on the webpage template ID, and the webpage template data saving unit saves the downloaded webpage template data in association with the webpage template ID.
  • The download may be requested when the terminal device is idle after the webpage has been displayed, or when Wi-Fi is available, which avoids occupying bandwidth and improves the user's browsing experience.
  • The middleware server sends the recommended webpage template data to the terminal device when the network is idle, so that the terminal device can use it directly when the webpage template is needed again. This not only reduces bandwidth consumption but also speeds up browsing and improves the user experience.
  • In step S701 of FIG. 7, the terminal device acquires a browsing request sent by the user and locally searches for a webpage template that matches the webpage requested by the webpage browsing request. If none is found, it sends to the middleware server a webpage browsing request that does not include the first template ID list.
  • If no webpage template matching the webpage browsing request is found, the terminal device does not locally store a webpage template for the requested webpage.
  • In this case, the present invention further includes a step in which the terminal device searches for and downloads the webpage template from the webpage template server through the middleware server.
  • The following describes, with reference to a flow chart, an embodiment of step S703 of the method for implementing webpage presentation using a webpage template according to the present invention, for the case where the webpage browsing request does not include the first webpage template ID list.
  • The present invention further includes step S1001: sending to the middleware server a webpage browsing request that does not include the first template ID list of webpage template IDs.
  • Step S1002: after receiving the webpage browsing request sent by the terminal device, the middleware server acquires the requested webpage data based on the webpage browsing request and forwards the obtained webpage data to the webpage template server.
  • In step S1003, the webpage template ID list obtaining unit acquires the corresponding second webpage template ID list from the webpage template ID list library according to the webpage address of the webpage requested to be browsed.
  • The second webpage template ID list is stored in the webpage template ID list library in association with the webpage address.
  • The webpage template data acquiring unit sequentially acquires the second webpage template IDs in the second webpage template ID list. Then, in step S1005, the webpage template data acquiring unit acquires the webpage template data from the webpage template data storage unit based on the currently acquired second webpage template ID.
  • In step S1006, the difference data calculation module calculates the difference data between the webpage data and the webpage template data acquired from the webpage template data storage unit.
  • In step S1007, it is determined whether the compression ratio between the difference data and the webpage data is less than a first predetermined threshold.
  • When the compression ratio is less than the first predetermined threshold, in step S1008 the determining module determines the calculated difference data to be the delta data. The process then proceeds to step S1009, in which the sending unit returns the currently used webpage template ID to the middleware server so that the middleware server sends the received webpage template ID together with the webpage data to the terminal device.
  • The webpage template downloading unit of the terminal device downloads the corresponding webpage template data from the webpage template data storage unit via the middleware server based on the webpage template ID, and the webpage template data saving unit saves the downloaded webpage template data in association with the webpage template ID.
  • The download may be requested when the terminal device is idle after the webpage has been displayed, or when Wi-Fi is available, which avoids occupying bandwidth and improves the user's browsing experience.
  • The middleware server sends the recommended webpage template data to the terminal device when the network is idle, so that the terminal device can use it directly when the webpage template is needed again. This not only reduces bandwidth consumption but also speeds up browsing and improves the user experience.
  • When the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, the process proceeds to step S1010, where it is determined whether the current second webpage template ID is the last webpage template ID in the second webpage template ID list. If not, the process proceeds to step S1011, in which the webpage template data acquiring unit acquires the next second webpage template ID from the second webpage template ID list as the new currently acquired second webpage template ID, and the process then returns to step S1005. If so, the process proceeds to step S1014, in which the sending unit returns information that the difference data calculation failed to the middleware server, the middleware server returns only the webpage data to the terminal device, and the process ends.
  • To keep the webpage template server from performing excessive calculation, step S1010 of this embodiment may be replaced by steps S1012 and S1013.
  • In step S1012, the counting unit adds 1 to the number of difference calculations performed by the difference data calculation unit, and step S1013 then determines whether that number exceeds a second predetermined threshold. When the number of calculations does not exceed the second predetermined threshold, the process returns to step S1011. When the second predetermined threshold is exceeded, the process proceeds to step S1014.
  • The webpage template data generating unit is configured to generate webpage template data from the webpage data sent by the middleware server, to generate a corresponding webpage template ID, to store the webpage template data and the webpage template ID in association in the webpage template data storage unit, and to store the webpage template ID and the webpage address in association in the webpage template ID list library.
  • The webpage template data generating unit generates the webpage template data quickly by means of a dedicated algorithm, which may be a method of generating a hash value for the webpage or a method of branch generation from the webpage data. Since a webpage template can itself be a webpage, the webpage data itself can also be used as the webpage template data.
  • The time at which the present invention generates a webpage template is not limited to steps S811, S913, and S1014. The webpage template server may also generate webpage template data in advance from the webpage data forwarded by the middleware server. In that case, because the middleware server accesses many webpages every day and the webpage template server can receive webpage data from multiple middleware servers, the webpage template server receives a huge amount of webpage data from the middleware servers.
  • The embodiments of the present invention need to store a large amount of data and perform a large number of operations on a large amount of webpage data to generate webpage templates. Therefore, embodiments of the present invention can employ Hadoop (a distributed system infrastructure) clusters for data storage and computation. That is, the webpage template server is a server cluster composed of multiple servers, and the webpage template data generating unit may be deployed on several servers of the cluster. The server cluster stores webpage data, template data, the template index, and the like in HBase, a distributed, column-oriented open source database built on Hadoop. Template generation uses the MapReduce computing framework, a parallel computing method for large-scale data sets. A Hadoop cluster is inherently a distributed storage and computing framework. Capacity can be increased simply by adding servers that generate webpage templates, that is, servers that include the webpage template data generating unit, so the webpage template server scales horizontally and has good disaster tolerance.
  • The webpage display method of the present invention may further include a step of evicting webpage template data from the terminal device. For example, if a threshold is exceeded, templates may be evicted according to the LRU (Least Recently Used) page replacement algorithm.
  • The LRU algorithm can be used to delete webpage templates that have been used least recently and are unlikely to be used for a long time to come, thereby saving storage resources of the terminal device.
  • The webpage display method may likewise include a step of evicting webpage template data on the webpage template server.
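  • A minimal sketch of LRU eviction for locally stored templates, using Python's standard `collections.OrderedDict`. The class name `TemplateCache` and the use of a template count as the threshold are assumptions; the text does not say whether the threshold is a count or a storage size.

```python
from collections import OrderedDict


class TemplateCache:
    """Keeps at most `capacity` templates, evicting the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last

    def get(self, template_id):
        if template_id not in self._items:
            return None
        self._items.move_to_end(template_id)  # mark as most recently used
        return self._items[template_id]

    def put(self, template_id, data):
        self._items[template_id] = data
        self._items.move_to_end(template_id)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)   # drop the least recently used
```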
  • A program for executing the webpage template generating method of the embodiment of the present invention may be stored in a computer readable storage medium. Accordingly, an embodiment of the present invention further provides a computer readable storage medium storing such a program. In one embodiment of the present invention, a computer readable medium has program code executable by a processor which, when executed, causes the processor to perform the steps of: collecting webpage data of a webpage; generating a webpage template of the webpage according to the webpage data; and generating a template index according to the generated webpage template.
  • the mobile terminal of the present invention may be a variety of handheld terminal devices, such as mobile phones, personal digital assistants (PDAs), etc., and thus the scope of protection of the present invention should not be limited to a particular type of mobile terminal.
  • the method according to the invention can also be implemented as a computer program executed by a CPU.
  • When the computer program is executed by the CPU, the above-described functions defined in the method of the present invention are performed.
  • the method steps and system units described above may also be implemented with a controller and a computer readable storage device for storing a computer program that causes the controller to implement the steps or unit functions described above.
  • A computer readable storage device (e.g., a memory) can be a volatile memory or a nonvolatile memory, or can include both volatile and nonvolatile memory.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory can include random access memory (RAM), which can act as external cache memory.
  • RAM is available in a variety of forms, such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
  • Storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
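  • The units described herein may be implemented with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA).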
  • a general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • the processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
  • a software unit may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor, such that the processor can read information from or write information to the storage medium.
  • the storage medium can be integrated with a processor.
  • the processor and the storage medium can reside in an ASIC.
  • the ASIC can reside in the user terminal.
  • the processor and the storage medium may reside as discrete components in the user terminal.
  • the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer readable medium.
  • Computer readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one location to another.
  • a storage medium may be any available media that can be accessed by a general purpose or special purpose computer.
  • The computer readable medium may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store required program code in the form of instructions or data structures and that can be accessed by a general purpose or special purpose computer or a general purpose or special purpose processor. Also, any connection is properly termed a computer-readable medium.
  • For example, if a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technology such as infrared, radio, and microwave is used to transmit software from a website, server, or other remote source, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technology such as infrared, radio, and microwave is included in the definition of the medium.
  • Magnetic disks and optical discs, as used herein, include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks, and Blu-ray discs, where magnetic disks generally reproduce data magnetically and optical discs reproduce data optically using a laser. Combinations of the above should also be included within the scope of computer readable media.
  • The modules or steps of the present invention described above can be implemented by general-purpose computing devices, and can be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device, or they may be fabricated into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. Thus, the invention is not limited to any specific combination of hardware and software.

Abstract

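The present invention discloses a webpage template generation method and a server. The webpage template generation method includes: collecting webpage data of a webpage, and generating a webpage template of the webpage according to the webpage data. The present invention solves the problem in the prior art that webpage template generation methods depend strongly on the target website, and reduces that dependence.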

Description

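Webpage Template Generation Method and Server
Technical Field
The present invention relates to the field of mobile browsers, and in particular to a webpage template generation method and a server.
Background Art
For webpage browsing, the prior art proposes a compression technique that extends HTTP requests to allow a website to provide templates and delta files. When a client accesses webpages that share the same template, the template needs to be downloaded only the first time; other requests download only the delta file, and the original page is rebuilt from the delta file and the template file, which reduces the client's traffic. The technique can thus use the parts shared by multiple webpages to compress traffic.
The inventors found that the drawback of this technique is that the target website must support the protocol, so the dependence on the target website is strong, and the target website must itself provide the templates and the corresponding delta files. This is one of the reasons the compression technique has not been widely adopted.
In addition, prior-art approaches to automatic template generation mainly parse the DOM (Document Object Model) tree structure of a webpage and extract the common parts. This approach is computationally expensive, the common parts are hard to extract, and compatibility is poor. Moreover, the template generation programs in common use target a single website and handle only a small scale.
When a user browses webpages, the terminal device needs to receive a large amount of webpage data sent by a server in order to present the webpages. The presented webpages often share a large amount of duplicate data. Each time the user browses webpages containing duplicate data, that data has to be loaded and transmitted again, which not only occupies considerable bandwidth during transmission but also increases the response time when the page loads, making webpage browsing slow.
No effective solution has yet been proposed for the problem that prior-art webpage template generation methods depend strongly on the target website.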
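Summary of the Invention
The main object of the present invention is to provide a webpage template generation method and a server, so as to solve the problem that prior-art webpage template generation methods depend strongly on the target website.
To achieve the above object, according to one aspect of the present invention, a webpage template generation method is provided. The webpage template generation method according to the present invention includes: collecting webpage data of a webpage; generating a webpage template of the webpage according to the webpage data; and generating a template index according to the generated webpage template, through which the webpage template corresponding to a webpage can be retrieved.
Further, after the webpage template of the webpage is generated according to the webpage data, the webpage template generation method further includes: publishing the webpage template and the template index to multiple template servers that provide webpage templates; the multiple template servers each storing the webpage template and the template index; and a first template server among the multiple template servers using the template index to retrieve the webpage template matching a webpage and providing the matching template to the template servers other than the first template server.
Further, publishing the webpage template and the template index to the multiple template servers that provide webpage templates includes: after multiple webpage templates and template indexes have been generated, calculating the overall difference rate between the set of the multiple webpage templates and a historical template set; determining whether the overall difference rate is greater than a preset overall difference rate threshold; if the overall difference rate is greater than the preset overall difference rate threshold, publishing the webpage templates and the template indexes; and if the overall difference rate is not greater than the preset overall difference rate threshold, not publishing the webpage templates and the template indexes.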
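The publishing decision above can be illustrated with a short Python sketch. The text does not define how the overall difference rate between the new template set and the historical template set is computed; the sketch assumes, purely for illustration, the Jaccard distance between the two sets of template identifiers. The function names and the default threshold are hypothetical.

```python
def overall_difference_rate(new_templates: set, history: set) -> float:
    """Assumed metric: Jaccard distance, 1 - |A & B| / |A | B|."""
    union = new_templates | history
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(new_templates & history) / len(union)


def should_publish(new_templates: set, history: set, threshold: float = 0.2) -> bool:
    """Publish only when the new set differs enough from the historical set."""
    return overall_difference_rate(new_templates, history) > threshold
```

Skipping publication when little has changed avoids pushing nearly identical template sets to every template server.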
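Further, generating the template index according to the generated webpage template includes: selecting templates whose quality meets a predetermined quality condition; determining the URL paths to which the templates apply; selecting, from those URL paths, the URL paths to which templates whose quality meets the predetermined quality condition apply; and converting the selected paths into the template index.
Further, after the webpage template of the webpage is generated according to the webpage data, the webpage template generation method further includes: determining whether the number of webpage templates reaches a preset number; if the number of webpage templates reaches the preset number, calculating the coverage of each webpage template; comparing webpage templates whose coverage is less than a first preset coverage threshold with webpage templates whose coverage is greater than the first preset coverage threshold; and if the difference rate between a webpage template below the first preset coverage threshold and a webpage template above the first preset coverage threshold is less than a preset difference rate threshold, merging the webpage template below the threshold with the webpage template above the threshold.
Further, comparing webpage templates whose coverage is less than the first preset coverage threshold with webpage templates whose coverage is greater than the first preset coverage threshold includes: sorting the multiple webpage templates by coverage in descending order; and comparing the webpage templates ranked later with the webpage templates ranked earlier.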
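The coverage-based merging step can be sketched as follows. The text does not define the difference rate or what merging produces, so the sketch makes two labeled assumptions: the difference rate is `1 - SequenceMatcher.ratio()` from Python's standard `difflib`, and merging means that the low-coverage template is absorbed into the first sufficiently similar high-coverage template. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def difference_rate(a: str, b: str) -> float:
    """Assumed metric: 1 minus difflib's similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def merge_low_coverage(templates: dict, coverage: dict,
                       cov_threshold: float, diff_threshold: float,
                       diff_rate=difference_rate):
    """Absorb low-coverage templates into similar high-coverage ones.

    templates maps template ID -> content; coverage maps template ID -> rate.
    Returns (kept_ids, merged_into), where merged_into maps an absorbed
    template ID to the ID of the template that absorbed it.
    """
    # Sort by coverage, largest first, so later entries compare against earlier ones.
    ranked = sorted(templates, key=lambda tid: coverage[tid], reverse=True)
    high = [tid for tid in ranked if coverage[tid] > cov_threshold]
    kept, merged_into = [], {}
    for tid in ranked:
        if coverage[tid] < cov_threshold:
            target = next((h for h in high
                           if diff_rate(templates[tid], templates[h]) < diff_threshold),
                          None)
            if target is not None:
                merged_into[tid] = target
                continue
        kept.append(tid)
    return kept, merged_into
```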
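Further, generating the template index according to the generated webpage template includes: storing multiple webpage templates; calculating the coverage of each webpage template; determining whether the sum of the coverage of the webpage templates under each path reaches a second preset coverage threshold; and deleting the webpage templates under any path for which the sum of the coverage does not reach the second preset coverage threshold.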
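A minimal Python sketch of this path pruning follows. The function name `prune_paths` and the two input mappings are hypothetical; the sketch assumes that the coverage of every template has already been computed.

```python
def prune_paths(templates_by_path: dict, coverage: dict, min_total: float) -> dict:
    """Keep only the paths whose templates together reach the coverage threshold.

    templates_by_path maps a URL path to the template IDs stored under it;
    coverage maps a template ID to its coverage rate; min_total plays the role
    of the second preset coverage threshold.
    """
    return {
        path: ids
        for path, ids in templates_by_path.items()
        if sum(coverage[tid] for tid in ids) >= min_total
    }
```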
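To achieve the above object, according to another aspect of the present invention, a webpage template server is provided. The webpage template server according to the present invention includes: a collecting unit, configured to collect webpage data of a webpage; a generating unit, configured to generate a webpage template of the webpage according to the webpage data; and an index unit, configured to generate a template index according to the generated webpage template.
Further, the webpage template server further includes: a publishing unit, configured to publish the webpage template and the template index to multiple template servers that provide webpage templates after the webpage template of the webpage is generated according to the webpage data; a storage unit, configured to store the webpage template and the template index on each of the multiple template servers; and a template retrieval unit, configured to use the template index to retrieve the webpage template matching a webpage and to provide the matching template to other servers.
Further, the publishing unit includes: a calculation module, configured to calculate the overall difference rate between the set of multiple webpage templates and a historical template set; a judging module, configured to determine whether the overall difference rate is greater than a preset overall difference rate threshold; and a publishing module, configured to publish the webpage templates when the overall difference rate is greater than the preset overall difference rate threshold, and not to publish the webpage templates when the overall difference rate is not greater than the preset overall difference rate threshold.
Further, the index unit includes: a template selection module, configured to select templates whose quality meets a predetermined quality condition; a template path derivation module, configured to determine the URL paths to which the templates apply; a template path pruning module, configured to select, from the URL paths, the URL paths to which templates whose quality meets the predetermined quality condition apply; and a template index generation module, configured to convert the selected paths into the template index.
Further, the webpage template server further includes: a judging unit, configured to determine whether the number of webpage templates reaches a preset number after the webpage template of the webpage is generated according to the webpage data; a calculation unit, configured to calculate the coverage of each webpage template when the number of webpage templates reaches the preset number; a comparison unit, configured to compare webpage templates whose coverage is less than a first preset coverage threshold with webpage templates whose coverage is greater than the first preset coverage threshold; and a merging unit, configured to merge a webpage template below the first preset coverage threshold with a webpage template above the first preset coverage threshold when the difference rate between them is less than a preset difference rate threshold.
Further, the comparison unit includes: a sorting module, configured to sort the multiple webpage templates by coverage in descending order; and a comparison module, configured to compare the webpage templates ranked later with the webpage templates ranked earlier.
Further, the index unit includes: a storage module, configured to store multiple webpage templates after the webpage template of the webpage is generated according to the webpage data; a calculation module, configured to calculate the coverage of each webpage template; a third judging module, configured to determine whether the sum of the coverage of the webpage templates under each path reaches a second preset coverage threshold; and a deletion module, configured to delete the webpage templates under any path for which the sum of the coverage does not reach the second preset coverage threshold.
By collecting the webpage data of a webpage and generating the webpage template of the webpage according to the webpage data, the present invention solves the problem that prior-art webpage template generation methods depend strongly on the target website, and thereby reduces the dependence of the webpage template generation method on the target website.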
为了实现上述目的,根据本发明的另一方面,提供了一种网页模板服务器,该网页模板服务器包括:
网页模板数据存储单元,用于存储网页模板数据;
网页模板数据获取单元,用于从网页模板数据存储单元获取与中间件服务器在接收到来自终端设备的网页浏览请求后获取并转发的网页数据对应的网页模板数据;
差量数据生成单元,用于基于从中间件服务器接收的网页数据和与该网页数据对应的网页模板数据,生成所述网页数据和网页模板数据之间的差量数据,以及
发送单元,用于将所生成的差量数据经由中间件服务器向终端设备转发所述差量数据,以供终端设备根据所述差量数据和终端设备本地存储的与差量数据对应的网页模板数据展现所请求的网页。
其中,所述网页浏览请求包含第一网页模板ID列表,所述网页模板数据获取单元被配置为顺序获取第一网页模板ID列表中的网页模板ID,并且基于所获取的网页模板ID,从所述网页模板数据存储单元中获取网页模板数据,以及
所述差量数据生成单元包括:
差值数据计算模块,用于计算网页数据和从网页模板数据存储单元中获取的网页模板数据之间的差值数据;和
确定模块,用于在所计算出的差值数据与网页数据之间压缩比小于第一预定阈值时,将所述差值数据确定为所述差量数据,以及
在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,所述网页模板数据获取单元和所述差量数据生成单元被配置为重复执行处理过程,直到生成所述差量数据。
其中,当所述网页浏览请求包含网页地址和第一网页模板ID列表,所述网页模板服务器包括,
网页模板ID列表库,用于与网页地址相关联地存储第二网页模板ID列表;
网页模板ID列表获取模块,用于根据所请求浏览的网页的网页地址,从网页模板ID列表库中获取对应的第二网页模板ID列表,
网页模板ID列表合并单元,用于将第一网页模板ID列表和第二网页模板ID列表合并成第三网页模板ID列表;
所述网页模板数据获取单元被配置为顺序获取第三网页模板ID列表中的网页模板ID,并且基于所获取的网页模板ID,从所述网页模板数据存储单元中获取网页模板数据,以及
所述差量数据生成单元包括:差值数据计算模块,用于计算网页数据和从网页模板数据存储单元中获取的网页模板数据之间的差值数据;和
确定模块,用于在所计算出的差值数据与网页数据之间压缩比小于第一预定阈值时,将所述差值数据确定为所述差量数据,以及
在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,所述网页模板数据获取单元和所述差量数据生成单元被配置为重复执行处理过程,直到生成所述差量数据。
其中,所述网页模板ID列表合并单元被配置为对第一网页模板ID列表和第二网页模板ID列表中的网页模板ID按照优先级进行合并,形成第三网页模板ID列表,其中第一网页模板ID列表和第二网页模板ID列表的交集的优先级最高,第一网页模板ID列表中的剩余部分次之,第二网页模板ID列表中的剩余部分最低。
作为优选的本发明的网页模板服务器还包括:差量数据保存单元,用于与网页模板ID和网页地址相关联地存储差量数据;以及
差量数据查询单元,用于根据网页模板ID和网页地址,在所述差量数据保存单元中查询相关联的差量数据,以及
在所述差量数据查询单元没有查询到相关联的差量数据时,所述差量数据生成单元被配置为生成所述差量数据。
作为优选的本发明的所述差量数据生成单元还包括:
计数单元,用于在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,计数所述差值数据计算单元的计算次数,以及
在所述计算次数不超过第二预定阈值时,所述网页模板数据获取单元被配置为获取下一网页模板ID,并且基于所述下一网页模板ID,从所述网页模板数据存储单元中获取网页模板数据,以及
作为优选的本发明的所述网页模板服务器还包括:差量数据生成失败消息生成单元,用于在所述计算次数超过第二预定阈值时,生成差量数据生成失败消息,以及
所述发送单元还被配置为向所述中间件服务器返回差量数据生成失败消息,以便所述中间件服务器在接收到所述差量数据生成失败消息后,向终端设备返回网页数据来进行展现。
作为优选的本发明的网页模板服务器,还包括:第二判断单元,用于在生成所述差量数据后,判断所述网页模板数据获取单元当前使用的网页模板ID是否属于第一网页模板ID列表,以及
在当前使用的网页模板ID属于第一网页模板ID列表时,所述发送单元被配置为将所生成的差量数据和该当前使用的网页模板ID返回给中间件服务器并经由中间件服务器转发给终端设备,
在当前使用的网页模板ID不属于第一网页模板ID列表时,所述发送单元被配置为将当前使用的网页模板ID返回给中间件服务器,并且中间件服务器将所接收的网页模板ID和网页数据发送给终端设备。
为了实现上述目的,根据本发明的另一方面,提供了一种网页模板生成方法。该网页模板生成方法可以为网页模板服务器执行的利用网页模板实现网页展现的方法,该方法包括:
在获取到中间件服务器响应于所接收的来自终端设备的网页浏览请求后获取并转发的网页数据后,从网页模板服务器中的网页模板数据存储单元获取与所述网页数据对应的网页模板数据;
基于所述网页数据和所述网页模板数据,生成所述网页数据和网页模板数据之间的差量数据,以及
将所生成的差量数据经由中间件服务器转发给终端设备,以供终端设备根据所述差量数据和终端设备本地存储的与差量数据对应的网页模板数据展现所请求的网页。
其中,所述网页浏览请求包含第一网页模板ID列表,以及
从网页模板数据存储单元获取的网页数据对应的网页模板数据,以及基于所述网页数据和所述网页模板数据,生成所述网页数据和网页模板数据之间的差量数据包括:
顺序获取第一网页模板ID列表中的第一网页模板ID来重复执行下述过程,直到生成所述差量数据:
基于当前获取的第一网页模板ID,从网页模板数据存储单元中获取网页模板数据,以及
计算网页数据和从网页模板数据存储单元中获取的网页模板数据之间的差值数据,
在所计算出的差值数据与网页数据之间压缩比小于第一预定阈值时,将所述差值数据确定为所述差量数据,以及
在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,从第一网页模板ID列表中获取下一第一网页模板ID,作为新的当前获取的第一网页模板ID。
其中,所述网页浏览请求包含所请求的网页的网页地址和第一网页模板ID列表,所述网页模板服务器的网页模板ID列表库中与网页地址相关联地存储有第二网页模板ID列表,
从网页模板数据存储单元获取的网页数据对应的网页模板数据,以及基于所述网页数据和所述网页模板数据,生成所述网页数据和网页模板数据之间的差量数据包括:
根据所请求浏览的网页的网页地址,从网页模板ID列表库中获取对应的第二网页模板ID列表,
将第一网页模板ID列表和第二网页模板ID列表合并成第三网页模板ID列表;
顺序获取第三网页模板ID列表中的网页模板ID来重复执行下述过程,直到生成所述差量数据:
基于当前获取的网页模板ID,从网页模板数据存储单元中获取网页模板数据,以及
计算网页数据和从网页模板数据存储单元中获取的网页模板数据之间的差值数据,
在所计算出的差值数据与网页数据之间压缩比小于第一预定阈值时,将所述差值数据确定为所述差量数据,以及
在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,从第三网页模板ID列表中获取下一网页模板ID,作为新的当前获取的网页模板ID。
其中,将第一网页模板ID列表和第二网页模板ID列表合并成第三网页模板ID列表包括,
对第一网页模板ID列表和第二网页模板ID列表中的网页模板ID按照优先级进行合并,形成第三网页模板ID列表,其中第一网页模板ID列表和第二网页模板ID列表的交集的优先级最高,第一网页模板ID列表中的剩余部分次之,第二网页模板ID列表中的剩余部分最低。
其中,还包括:在生成所述差量数据后,所述网页模板服务器判断当前使用的网页模板ID是否属于第一网页模板ID列表,以及
在当前使用的网页模板ID属于第一网页模板ID列表时,所述网页模板服务器将所生成的差量数据和该当前使用的网页模板ID返回给中间件服务器并经由中间件服务器转发给终端设备,
在当前使用的网页模板ID不属于第一网页模板ID列表时,所述网页模板服务器将当前使用的网页模板ID返回给中间件服务器,并且中间件服务器将所接收的网页模板ID和网页数据发送给终端设备。
为了实现上述目的,根据本发明的另一方面,提供了一种具有处理器可执行的程序代码的计算机可读介质,其特征在于,在被执行时,所述程序代码使得处理器执行下述步骤:采集网页的网页数据;根据所述网页数据生成所述网页的网页模板;根据生成的所述网页模板生成模板索引。
为了实现上述目的,根据本发明的另一方面,提供了一种计算机程序,该计算机程序用于执行本发明提供的任意一种网页模板生成方法。
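With the method for presenting web pages using web page templates and the web page template server of the present invention, a web page template server stores web page templates and computes the delta data between a web page template and the web page data; the middleware server sends the delta data to the terminal device; and the terminal device locally invokes the web page template corresponding to the delta data to present the web page. Only the delta data is transmitted, and the delta data is smaller than the web page data. This saves network resources, reduces bandwidth usage, shortens page loading time, and thus speeds up the user's browsing.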
附图说明
构成本申请的一部分的附图用来提供对本发明的进一步理解,本发明的示意性实施例及其说明用于解释本发明,并不构成对本发明的不当限定。在附图中:
图1是根据本发明第一实施例的网页模板生成方法的流程图;
图2是根据本发明第二实施例的网页模板生成方法的流程图;
图3是根据本发明第三实施例的网页模板生成方法的流程图;
图4是根据本发明第四实施例的网页模板生成方法的流程图;
图5是根据本发明第一实施例的网页模板服务器的示意图;
图6是根据本发明第二实施例的网页模板服务器的示意图;
图7是根据本发明第三实施例的网页模板服务器的示意图;
图8是根据本发明第四实施例的网页模板服务器的示意图;
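Fig. 9 is a block diagram showing the connection between the web page template server, the middleware server, and the terminal device according to an embodiment of the present invention;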
图10是根据本发明实施例的终端设备的一个实施例的方框示意图;
图11是根据本发明实施例的中间件服务器的一个实施例的方框示意图;
图12是根据本发明实施例的网页模板服务器的一个实施例的方框示意图;
图13是根据本发明实施例的网页模板服务器的差量数据生成单元一个实施例的方框示意图;
图14是根据本发明实施例的网页模板服务器的第二个实施例的方框示意图;
图15是根据本发明利用网页模板实现网页展现的方法的实施例流程图;
图16a和图16b是根据本发明利用网页模板实现网页展现的方法的网页浏览请求包含第一网页模板ID列表的情况下S703步骤的第一实施例流程图;
图17a和图17b是根据本发明利用网页模板实现网页展现的方法的网页浏览请求包含第一网页模板ID列表的情况下S703步骤的第二实施例流程图;以及
图18a和图18b为本发明利用网页模板实现网页展现的方法中终端设备获取网页模板数据的过程的流程图。
具体实施方式
需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互组合。下面将参考附图并结合实施例来详细说明本发明。
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分的实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本发明保护的范围。
需要说明的是,本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本发明的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
本发明实施例提供了一种网页模板生成方法。
图1是根据本发明第一实施例的网页模板生成方法的流程图。如图所示,该网页模板生成方法包括如下步骤:
步骤S101,采集网页的网页数据。采集网页的网页数据是需要浏览网页的网页数据,网页的网页数据来自一个客户端或多个客户端,采集网页的网页数据可以是来自一个客户端的一个或者多个网页的网页数据,采集网页的网页数据还可以是相同域名或不同域名下网页的数据。存储这些采集到的网页数据。
需要说明的是,采集网页的网页数据可以根据用户浏览网页的需要进行采集,上述网页的网页数据的来源只是为了举例说明可以采集上述来源的网页的数据,并不用于限定在采集网页的网页数据过程中一定要采集上述所有网页来源的所有网页的网页数据。
步骤S102,根据采集到的网页数据生成该网页的网页模板。
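A web page template may be generated using SimHash, a locality-sensitive hashing algorithm. Specifically, SimHash produces an N-bit hash value from the web page data; T label values are then derived from the N-bit hash value by random hashing and prefix extraction; and for each label value, the web page templates under the same domain name are searched. If a suitable template is found, it can serve as the template of the page to be browsed for incremental data transmission. If no suitable template is found, the page to be browsed can itself be stored in the template library as a web page template.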
步骤S103,根据生成的网页模板生成模板索引。通过该模板索引,可以查找与网页对应的网页模板,为了模板调用方便,根据生成的网页模板生成模板索引,利用模板索引查找匹配的网页模板。
由于通过上述方法生成的网页模板可能会导致出现相同或相似的网页模板,这些相同或相似的模板可能存储在不同的客户端中,为了使得到的网页模板减少存储空间的占用以及使得得到的网页模板更加有代表性,可以保留相同或相似的网页模板中的其中一个,将其余相同或相似的模板删除。
在该实施例中,在建立网页模板时,可以利用采集到的网页数据建立该网页的网页模板,这样,模板的建立不依赖于特定的目标网站,降低了对目标网站的依赖性,能够针对任何的目标网站建立相应的网页模板。
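The SimHash-based lookup described for step S102 can be illustrated with a minimal sketch. The text does not specify the tokenization, the hash function, or how the "random hashing and prefix extraction" is performed, so the MD5-based token hash, the seeded bit permutations, and the names `simhash` and `prefix_labels` below are all assumptions made for illustration only.

```python
import hashlib
import random


def simhash(tokens, bits=64):
    """Return a `bits`-wide SimHash fingerprint for an iterable of string tokens."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def prefix_labels(fingerprint, bits=64, t=4, prefix_len=16, seed=0):
    """Derive T labels (assumed scheme): permute bit positions T times, keep a prefix of each."""
    rng = random.Random(seed)  # fixed seed so every page uses the same permutations
    labels = []
    for _ in range(t):
        order = list(range(bits))
        rng.shuffle(order)
        label = 0
        for pos in order[:prefix_len]:
            label = (label << 1) | ((fingerprint >> pos) & 1)
        labels.append(label)
    return labels
```

Two pages whose fingerprints differ in only a few bits are likely to share at least one label, so each label can serve as a lookup key into the templates stored for the same domain.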
图2是根据本发明第二实施例的网页模板生成方法的流程图。如图所述,该网页模板生成方法包括如下步骤:
步骤S201,采集网页的网页数据。采集网页的网页数据可以是需要浏览网页的网页数据,网页的网页数据可以来自一个客户端或多个客户端,采集网页的网页数据可以是来自一个客户端的一个或者多个网页的网页数据,采集网页的网页数据还可以是相同域名或不同域名下网页的数据。
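It should be noted that web page data may be collected according to the user's browsing needs. The sources listed above merely illustrate where web page data can be collected from; they do not require that the web page data of all pages from all of these sources be collected.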
步骤S202,根据采集到的网页数据生成该网页的网页模板。
由于通过上述步骤生成的模板不止一个,为了能够获得与用户浏览的网页匹配的网页模板,还需要对生成的网页模板进行筛选。为了筛选方便,首先执行以下步骤S203至步骤S205。
步骤S203,向提供网页模板的多个模板服务器发布网页模板和模板索引。在根据网页数据生成网页的网页模板之后,可以向提供网页模板的多个模板服务器发布网页模板。其中,多个模板服务器可以向不同的网站提供网页模板。
步骤S204,多个模板服务器分别存储网页模板和模板索引。多个模板服务器分别存储接收到的网页模板,这样,在多个模板服务器中的每个模板服务器中都存在网页模板,需要在该网页模板的基础上传输网页数据的时候,可以选择多个模板服务器中的网络状况较好的模板服务器中的网页模板进行增量数据的传输,从而增加了调用网页模板的方便性和可靠性。
步骤S205,多个模板服务器中的第一模板服务器利用模板索引检索与网页匹配的网页模板,向该多个模板服务器中除第一模板服务器外的其它模板服务器提供与网页匹配的模板,第一模板服务器可以为多个模板服务器中的任意一个模板服务器。利用模板索引快速确定一个网页请求是否匹配服务器中存储的网页模板,并且根据请求网页的网址确定匹配的网页模板。在确定与网页匹配的网页模板之后,将匹配的网页模板发送至其他模板服务器。
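Because the above process handles a very large amount of data, the programs may be built on a distributed computing framework such as Hadoop, with a large-scale storage service such as HBase. In addition, to improve reliability, the stages of the web page template generation method of the embodiments, such as collecting web page data, generating web page templates, publishing web page templates, and retrieving web page templates, can each be deployed on multiple servers that cooperate to provide the service. In other words, the functions of the web page template server of the present invention may be performed jointly by multiple servers.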
图3是根据本发明第三实施例的网页模板生成方法的流程图。该图所示实施例可以作为图2所示实施例中步骤S203向提供网页模板的多个模板服务器发布网页模板和模板索引的优选实施方式,在执行图2所示步骤S202之后,执行以下步骤:
步骤S301,在生成多个网页模板之后,建立多个网页模板的索引。在生成多个网页模板之后,为了方便查找网页模板建立多个网页模板索引。网页模板索引可以通过网址或者域名来索引网页模板。具体地,为了提高索引网页模板的准确性,可以利用生成行标签值或者域名的MD5值获得网页模板的索引。
步骤S302,计算多个网页模板的集合与历史模板集合的整体差异率。为了避免网页模板的变动较小时更换网页模板重新生成增量文件而造成的资源浪费,因此计算多个网页模板的集合与历史模板集合的整体差异率。
步骤S303,判断整体差异率是否大于预设整体差异率阈值。判断多个网页模板的集合与历史模板集合的整体差异率是否大于预设整体差异率阈值,如果多个网页模板的集合与历史模板集合的整体差异率大于预设整体差异率阈值,则网页模板变动较大,直接发布网页模板,如果多个网页模板的集合与历史模板集合的整体差异率小于预设整体差异率阈值,则网页模板变动较小,不发布网页模板。
步骤S304,如果判断出整体差异率大于预设整体差异率阈值,则发布网页模板。如果多个网页模板的集合与历史模板集合的整体差异率大于预设整体差异阈值,表示生成的多个网页模板的集合较历史模板集合的变动较大,可以发布网页模板。
步骤S305,如果判断出整体差异率不大于预设整体差异率阈值,则不发布网页模板。如果多个网页模板的集合与历史模板集合的整体差异率小于预设整体差异阈值,表示生成的多个网页模板的集合较历史模板集合的变动较小,可以基于历史模板进行增量文件传输,可以不发布网页模板。
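Steps S302 to S305 amount to a publish gate. The text does not define how the overall difference rate is computed, so the sketch below assumes one simple definition: the fraction of newly generated templates whose content does not appear in the historical set. The function names and the default threshold are likewise hypothetical.

```python
def overall_difference_rate(new_templates, history_templates):
    """Assumed definition: share of new templates whose content is absent from the history set."""
    if not new_templates:
        return 0.0
    history = set(history_templates)
    changed = sum(1 for t in new_templates if t not in history)
    return changed / len(new_templates)


def should_publish(new_templates, history_templates, threshold=0.2):
    """Publish only when the new set differs from the history set by more than the threshold."""
    return overall_difference_rate(new_templates, history_templates) > threshold
```

With a gate like this, small changes keep the historical templates in service, so previously generated incremental files stay valid.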
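To make it easy to find, among the generated web page templates, the template that matches a web page, a template index is generated from the generated templates as follows: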
首先,选取质量符合预定质量条件的模板。在生成的网页模板中查找符合预定质量条件的模板,其中,预定质量条件可以是模板对用户访问的网页的覆盖率大于预定阈值,质量符合预定质量条件的模板相较于质量不符合预定质量条件的模板能够节约差量数据的传输量。
其次,确定模板适用的URL路径。根据模板使用的URL路径查找该路径下的所有网页模板,从而能够提高查找网页模板的速度。
再次,从URL路径中选取质量符合预定质量条件的模板适用的URL路径。由于短路径的模板的覆盖度更好,因此查找URL路径可以从距离根目录最近的短路径开始查找。
最后,将选取的路径转换成模板索引。将根据URL路径选取的网页模板的路径与用户访问的网页相对应,形成模板索引。
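The four index-building steps above can be sketched as a mapping from URL path prefixes to template IDs, searched from the shortest prefix outward. The data layout, the coverage cutoff, and the names `path_prefixes`, `build_index`, and `lookup` are assumptions for illustration; the text does not prescribe a concrete index structure.

```python
from urllib.parse import urlsplit


def path_prefixes(url):
    """Yield host-qualified path prefixes of `url`, shortest (closest to the root) first."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    prefix = parts.netloc + "/"
    yield prefix
    for segment in segments[:-1]:  # the last segment is treated as the page itself
        prefix += segment + "/"
        yield prefix


def build_index(templates, min_coverage=0.5):
    """templates: iterable of (template_id, path_prefix, coverage) tuples."""
    index = {}
    for template_id, prefix, coverage in templates:
        if coverage > min_coverage:  # keep only templates meeting the quality condition
            index.setdefault(prefix, []).append(template_id)
    return index


def lookup(index, url):
    """Return the template IDs of the shortest indexed prefix that covers `url`."""
    for prefix in path_prefixes(url):
        if prefix in index:
            return index[prefix]
    return []
```

Searching short prefixes first follows the observation in the text that templates on short paths tend to have better coverage.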
图4是根据本发明第四实施例的网页模板生成方法的流程图。如图所示,该图所示实施例可以作为图1所示实施例的优选实施方式,具体步骤如下:
步骤S401,采集网页的网页数据。采集网页的网页数据可以是需要浏览网页的网页数据,网页的网页数据可以来自一个客户端或多个客户端,采集网页的网页数据可以是来自一个客户端的一个或者多个网页的网页数据,采集网页的网页数据还可以是相同域名或不同域名下网页的数据。
步骤S402,判断网页模板的数量是否达到预设数量。在根据网页数据生成网页的网页模板之后,需要判断网页模板的数量是否达到预设数量,如果判断出网页模板的数量没有达到预设数量,可以继续根据网页数据生成网页的网页模板,如果判断出网页模板的数量达到预设数量,可以计算每个网页模板的覆盖率。
步骤S403,如果判断出网页模板的数量达到预设数量,则计算每个网页模板的覆盖率。模板覆盖率是衡量生成的网页模板质量的重要指标,模板覆盖率可以是一个网站内,网页模板能够应用到网站内的网页上的数量与该网站全部网页数量的比值,模板覆盖率越大,能应用到该网站内网页的数量也就越多。模板覆盖率不仅可以衡量一个网站的网站模板质量,还可以衡量某一个路径下的网页模板的质量。例如,某个网页模板的网站覆盖率不是很高,但是在某个路径下的覆盖率很高,该网页模板在实际应用中也能达到很好的效果。
步骤S404,将覆盖率小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板进行对比。在计算出每个网页模板的覆盖率之后,为了避免网页模板变动较小的情况下重新选择相似的网页模板进行增量文件传输,可以比较覆盖率小于第一预设覆盖率阈值的模板与覆盖率大于第一预设覆盖率阈值的网页模板。
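Step S405: if the difference rate between a web page template whose coverage is below the first preset coverage threshold and a web page template whose coverage is above that threshold is less than a preset difference rate threshold, merge the two templates. The difference rate may be the ratio of the size of the difference between the two templates, as computed by the open-vcdiff algorithm, to the size of the template whose coverage is below the first preset coverage threshold. It measures how much the two templates differ.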
如果小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板的差异率小于预设差异率阈值,则认为小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板相似,将小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板合并,合并的过程可以是将小于第一预设覆盖率阈值的网页模板的数据合并到大于第一预设覆盖率阈值的网页模板的数据中。
Preferably, to make this comparison convenient, the web page templates may be sorted by coverage in descending order, and templates ranked lower are then compared with templates ranked higher.
在对多个网页模板按照覆盖率大小进行由大到小的排序之后,通过对队列中的网页模板进行两两比较或者逐个比较,能够将网页模板的差异率小于预设差异率阈值的网页模板合并。
在将网页模板的差异率小于预设差异率阈值的网页模板合并之后,根据网页的网址或者域名得到网页模板索引将该网页模板数据和网页模板索引都发布出去。
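The sort-compare-merge procedure of steps S403 to S405 can be sketched as follows. The text computes the difference with open-vcdiff; this sketch substitutes Python's standard `difflib` purely as a stand-in, and it assumes that "merging" means the stronger template absorbs the weaker template's coverage. The thresholds and the record layout are hypothetical.

```python
from difflib import SequenceMatcher


def difference_rate(low, high):
    """Share of the low-coverage template that is not found in the high-coverage one."""
    if not low:
        return 0.0
    matcher = SequenceMatcher(None, low, high, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return (len(low) - matched) / len(low)


def merge_templates(templates, coverage_threshold=0.3, diff_threshold=0.2):
    """templates: list of dicts with 'id', 'body', 'coverage'. Returns the surviving templates."""
    ranked = sorted(templates, key=lambda t: t["coverage"], reverse=True)
    kept = []
    for tpl in ranked:
        target = None
        if tpl["coverage"] < coverage_threshold:
            # Compare a lower-ranked template with the higher-ranked ones kept so far.
            for strong in kept:
                if (strong["coverage"] > coverage_threshold
                        and difference_rate(tpl["body"], strong["body"]) < diff_threshold):
                    target = strong
                    break
        if target is None:
            kept.append(dict(tpl))
        else:
            target["coverage"] += tpl["coverage"]  # assumption: the stronger template absorbs the pages
    return kept
```

Dividing by the size of the low-coverage template follows the definition of the difference rate given in step S405.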
为了方便存储和调用,优选地,根据生成的网页模板生成模板索引包括如下步骤:
步骤S501,存储多个网页模板。为了方便模板的调用,在生成网页的网页模板之后,存储生成的多个网页模板。
步骤S502,计算每个网页模板的覆盖率。由于接近根目录的模板通常具有更好的覆盖率,查找模板的时候优先处理接近根目录的模板,因此在计算多个网页模板的时候,首先将生成的网页模板按照路径深度进行排序,短路径相对于深路径的网页模板的排列位置更靠近根目录。
由于已经对多个网页模板按照路径深度进行排序,在计算每个网页模板的覆盖率时可以计算一个路径下每个网页模板的覆盖率。其中,每个网页模板的覆盖率可以是该网页模板相对于整个路径下的所有网页模板的覆盖率。
为了便于使用覆盖率较大的网页模板进行增量文件的传输,可以按覆盖率从高到低进行排序。同时,在同一路径下网页模板数量较多时,可以按照路径深度从长到短的顺序截取一定数量的网页模板,避免同一路径下网页模板数量较多降低计算速度。
步骤S503,判断每个路径下的网页模板的覆盖率的总和是否达到第二预设覆盖率阈值。
在计算出每个网页模板的覆盖率之后,判断每个路径下的网页模板的覆盖率的总和是否达到第二预设覆盖率阈值,如果达到第二预设覆盖率阈值保留此路径;如果判断出每个路径下的网页模板的覆盖率总和没有达到第二预设覆盖率阈值,则将网页模板的覆盖率的总和未达到第二预设覆盖率阈值的路径下的网页模板删除。
步骤S504,删除网页模板的覆盖率的总和未达到第二预设覆盖率阈值的路径下的网页模板。由于网页模板的覆盖率的总和未达到第二预设覆盖率阈值,那么网页模板的覆盖率的总和未达到第二预设覆盖率阈值的路径下的网页模板都不需要再进行处理和使用,因此可以将网页模板的覆盖率的总和未达到第二预设覆盖率阈值的路径下的网页模板删除,以节省存储资源。
通过上述步骤S501至步骤S504能够根据生成的网页模板生成模板索引。从而在用户访问网页时可以利用模板索引查找匹配的网页模板。
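Steps S501 to S504 can be sketched as a per-path pruning pass. The tuple layout, the `max_per_path` cap, and the threshold value are illustrative assumptions; the text states only that paths whose summed coverage falls short of the second preset coverage threshold are dropped, and that the number of templates kept per path may be limited.

```python
from collections import defaultdict


def prune_paths(templates, total_threshold=0.5, max_per_path=10):
    """templates: iterable of (path, template_id, coverage). Returns {path: [template_id, ...]}."""
    by_path = defaultdict(list)
    for path, template_id, coverage in templates:
        by_path[path].append((coverage, template_id))
    index = {}
    # Shallow paths first, since templates near the root tend to cover more pages.
    for path in sorted(by_path, key=lambda p: p.count("/")):
        # Highest coverage first; keep at most `max_per_path` templates per path.
        ranked = sorted(by_path[path], reverse=True)[:max_per_path]
        if sum(c for c, _ in ranked) >= total_threshold:
            index[path] = [tid for _, tid in ranked]
    return index
```

Paths that survive the pruning form the index; templates under dropped paths need no further storage or processing.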
图5是根据本发明第一实施例的网页模板服务器的示意图,该网页模板服务器可以和前述实施例中的模板服务器为同一个服务器。如图所示,该网页模板服务器包括采集单元10、生成单元30和索引单元60。
采集单元10用于采集网页的网页数据。采集网页的网页数据可以是需要浏览网页的网页数据,网页的网页数据可以来自一个客户端或多个客户端,采集网页的网页数据可以是来自一个客户端的一个或者多个网页的网页数据,采集网页的网页数据还可以是相同域名或不同域名下网页的数据。存储这些采集到的网页数据。
需要说明的是,采集网页的网页数据可以根据用户浏览网页的需要进行采集,上述网页的网页数据的来源只是为了举例说明可以采集上述来源的网页的数据,并不用于限定在采集网页的网页数据过程中一定要采集上述所有网页来源的所有网页的网页数据。
生成单元30用于根据采集到的网页数据生成该网页对应的模板,例如,可以根据网页的网页数据生成该网页的网页模板。
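A web page template may be generated using SimHash, a locality-sensitive hashing algorithm. Specifically, SimHash produces an N-bit hash value from the web page data; T label values are then derived from the N-bit hash value by random hashing and prefix extraction; and for each label value, the web page templates under the same domain name are searched. If a suitable template is found, it can serve as the template of the page to be browsed for incremental data transmission. If no suitable template is found, the page to be browsed can itself be stored in the template library as a web page template.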
索引单元60用于根据生成的网页模板生成模板索引。索引单元能够根据生成的网页模板可以建立网页模板的URL路径与模板的映射关系,将该映射关系作为模板索引。
由于通过上述方法生成的网页模板可能会导致出现相同或相似的网页模板,这些相同或相似的模板可能存储在不同的客户端中,为了使得到的网页模板减少存储空间的占用以及使得得到的网页模板更加有代表性,可以保留相同或相似的网页模板中的其中一个,将其余相同或相似的模板删除。
在该实施例中,在建立网页模板时,可以利用采集到的网页数据建立该网页的网页模板,这样,模板的建立不依赖于特定的目标网站,降低了对目标网站的依赖性,能够针对任何的目标网站建立相应的网页模板。
图6是根据本发明第二实施例的网页模板服务器的示意图。该实施例可以作为图5所示实施例的优选实施方式,如图所示,该网页模板服务器包括采集单元10、生成单元30、发布单元40、存储单元50、索引单元60和模板检索单元20。
发布单元40用于在根据网页数据生成网页的网页模板之后,向提供网页模板的多个模板服务器发布网页模板。在根据网页数据生成网页的网页模板之后,可以向提供网页模板的多个模板服务器发布网页模板。其中,多个模板服务器可以向多个网站发送网页模板,还可以采集来自多个网站的网页数据。
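The storage unit 50 is configured to store the web page templates on each of the multiple template servers. Each template server stores the templates it receives, so every template server holds the web page templates. When web page data needs to be transmitted on the basis of a web page template, the template on a template server with good network conditions can be chosen for incremental data transmission, which makes loading web page data from templates more convenient and reliable.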
索引单元60用于根据生成的网页模板生成模板索引。索引单元能够根据生成的网页模板可以建立网页模板的URL路径与模板的映射关系,将该映射关系作为模板索引。
模板检索单元20用于利用模板索引检索与网页匹配的网页模板,向其它服务器提供与网页匹配的模板。利用模板索引快速确定一个网页请求是否匹配服务器中存储的网页模板,并且根据请求网页的网址确定匹配的网页模板。在多个模板生成服务器中的任意一个模板生成服务器在确定与网页匹配的网页模板之后,将匹配的网页模板发送至多个模板生成服务器中的其他服务器。
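Because the above process handles a very large amount of data, the programs may be built on a distributed computing framework such as Hadoop, with a large-scale storage service such as HBase. In addition, to improve reliability, the stages of the web page template generation method of the embodiments, such as collecting web page data, generating web page templates, publishing web page templates, and retrieving web page templates, can each be deployed on multiple servers that cooperate to provide the service. In other words, the functions of the web page template server of the present invention may be performed jointly by multiple servers. Different functional modules can be deployed on different servers, and the same functional module can also be deployed on several servers.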
优选地,为了生成模板索引,上述索引单元60包括模板选取模块、模板路径推导模块、模板路径剪枝模块和模板索引生成模块。
模板选取模块用于选取质量符合预定质量条件的模板。在生成的网页模板中查找符合预定质量条件的模板,其中,预定质量条件可以是模板对用户访问的网页的覆盖率大于预定阈值,质量符合预定质量条件的模板相较于质量不符合预定质量条件的模板能够节约差量数据的传输量。
模板路径推导模块用于确定模板适用的URL路径。根据模板使用的URL路径查找该路径下的所有网页模板,从而能够提高查找网页模板的速度。
模板路径剪枝模块用于从URL路径中选取质量符合预定质量条件的模板适用的URL路径。由于短路径的模板的覆盖度更好,因此查找URL路径可以从距离根目录最近的短路径开始查找。
模板索引生成模块用于将选取的路径转换成模板索引。将根据URL路径选取的网页模板的路径与用户访问的网页相对应,形成模板索引。
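Fig. 7 is a schematic diagram of a web page template server according to a third embodiment of the present invention. This embodiment may serve as a preferred implementation of the embodiment shown in Fig. 5. As shown, the web page template server includes a collection unit 10, a generation unit 30, a publishing unit 40, a storage unit 50, and an index unit 60, where the publishing unit 40 includes a calculation module 401, a judgment module 402, and a publishing module 403.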
计算模块401用于计算多个网页模板的集合与历史模板集合的整体差异率。为了避免网页模板的变动较小时更换网页模板重新生成增量文件而造成的资源浪费,因此计算多个网页模板的集合与历史模板集合的整体差异率。
判断模块402用于判断整体差异率是否大于预设整体差异率阈值。判断多个网页模板的集合与历史模板集合的整体差异率是否大于预设整体差异率阈值,如果多个网页模板的集合与历史模板集合的整体差异率大于预设整体差异率阈值,则网页模板变动较大,直接发布网页模板,如果多个网页模板的集合与历史模板集合的整体差异率小于预设整体差异率阈值,则网页模板变动较小,不发布网页模板。
发布模块403用于在判断出整体差异率大于预设整体差异率阈值,发布网页模板,在判断出整体差异率不大于预设整体差异率阈值,不发布网页模板。如果多个网页模板的集合与历史模板集合的整体差异率大于预设整体差异阈值,表示生成的多个网页模板的集合较历史模板集合的变动较大,可以发布网页模板。如果多个网页模板的集合与历史模板集合的整体差异率小于预设整体差异阈值,表示生成的多个网页模板的集合较历史模板集合的变动较小,可以基于历史模板进行增量文件传输,可以不发布网页模板。
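Fig. 8 is a schematic diagram of a web page template server according to a fourth embodiment of the present invention. As shown, the web page template server includes a collection unit 10, a generation unit 30, a judgment unit 60, a calculation unit 70, a comparison unit 80, and a merging unit 90. The collection unit 10 and the generation unit 30 shown in Fig. 8 have the same functions as the collection unit 10 and the generation unit 30 in the embodiment shown in Fig. 5 and are not described again here.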
判断单元60用于在根据网页数据生成网页的网页模板之后判断网页模板的数量是否达到预设数量。在根据网页数据生成网页的网页模板之后,需要判断网页模板的数量是否达到预设数量,如果判断出网页模板的数量没有达到预设数量,可以继续根据网页数据生成网页的网页模板,如果判断出网页模板的数量达到预设数量,可以计算每个网页模板的覆盖率。
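The calculation unit 70 is configured to calculate the coverage of each web page template when the number of web page templates is determined to have reached the preset number. Template coverage is an important indicator of the quality of a generated template. Within a website, it may be the ratio of the number of pages to which the template can be applied to the total number of pages of the website; the higher the coverage, the more pages of the website the template can be applied to. Coverage can measure template quality not only for a whole website but also for a particular path. For example, a template whose site-wide coverage is not very high but whose coverage under a certain path is high can still work well in practice.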
对比单元80用于将覆盖率小于预设覆盖率阈值的网页模板与大于预设覆盖率阈值的网页模板进行对比。在计算出每个网页模板的覆盖率之后,为了避免网页模板变动较小的情况下重新选择相似的网页模板进行增量文件传输,可以比较覆盖率小于第一预设覆盖率阈值的模板与覆盖率大于第一预设覆盖率阈值的网页模板。
合并单元90用于在小于预设覆盖率阈值的网页模板与大于预设覆盖率阈值的网页模板的差异率小于预设差异率阈值,将小于预设覆盖率阈值的网页模板与大于预设覆盖率阈值的网页模板合并。小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板的差异率可以是两个模板以open-vcdiff算法算出两个网页模板的差值与小于第一预设覆盖率阈值的网页模板的大小的比值,小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板的差异率可以用来衡量小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板的差异程度。
如果小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板的差异率小于预设差异率阈值,则认为小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板相似,将小于第一预设覆盖率阈值的网页模板与大于第一预设覆盖率阈值的网页模板合并,合并的过程可以是将小于第一预设覆盖率阈值的网页模板的数据合并到大于第一预设覆盖率阈值的网页模板的数据中。
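Preferably, to make this comparison convenient, the comparison unit 80 includes a sorting module and a comparison module. The sorting module is configured to sort the web page templates by coverage in descending order; the comparison module is configured to compare templates ranked lower with templates ranked higher.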
在对多个网页模板按照覆盖率大小进行由大到小的排序之后,通过对队列中的网页模板进行两两比较或者逐个比较,能够将网页模板的差异率小于预设差异率阈值的网页模板合并。
在将网页模板的差异率小于预设差异率阈值的网页模板合并之后,可以根据网页的网址或者域名得到网页模板索引,可以将该网页模板数据和网页模板索引都发布出去。
为了方便存储和调用,索引单元60还包括:存储模块、计算模块、第三判断模块和删除模块。
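The storage module is configured to store the multiple web page templates after the web page templates are generated from the web page data. Storing the generated templates makes them easy to invoke later.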
计算模块,用于计算每个网页模板的覆盖率。由于接近根目录的模板通常具有更好的覆盖率,查找模板的时候优先处理接近根目录的模板,因此在计算多个网页模板的时候,首先将生成的网页模板按照路径深度进行排序,短路径相对于深路径的网页模板的排列位置更靠近根目录。
由于已经对多个网页模板按照路径深度进行排序,在计算每个网页模板的覆盖率时可以计算一个路径下每个网页模板的覆盖率。其中,每个网页模板的覆盖率可以是该网页模板相对于整个路径下的所有网页模板的覆盖率。
为了便于使用覆盖率较大的网页模板进行增量文件的传输,可以按覆盖率从高到低进行排序。同时,在同一路径下网页模板数量较多时,可以按照路径深度从长到短的顺序截取一定数量的网页模板,避免同一路径下网页模板数量较多降低计算速度。
第三判断模块,用于判断每个路径下的网页模板的覆盖率的总和是否达到预设覆盖率阈值。在计算出每个网页模板的覆盖率之后,判断每个路径下的网页模板的覆盖率的总和是否达到第二预设覆盖率阈值,如果达到第二预设覆盖率阈值保留此路径;如果判断出每个路径下的网页模板的覆盖率总和没有达到第二预设覆盖率阈值,则将网页模板的覆盖率的总和未达到第二预设覆盖率阈值的路径下的网页模板删除。
删除模块,用于删除网页模板的覆盖率的总和未达到预设覆盖率阈值的路径下的网页模板。由于网页模板的覆盖率的总和未达到第二预设覆盖率阈值,那么网页模板的覆盖率的总和未达到第二预设覆盖率阈值的路径下的网页模板都不需要再进行处理和使用,因此可以将网页模板的覆盖率的总和未达到第二预设覆盖率阈值的路径下的网页模板删除,以节省存储资源。
在本发明实施例中,网页模板即可以是一个网页,一个网页能够作为另外一个网页的网页模板。例如,如果网页A能够覆盖网页B的大部分的内容,即网页A与网页B结构、内容或者编码相似,网页A与网页B之间存在大量重复的数据,则网页A可以作为网页B的网页模板,同样,网页B也可以作为网页A的网页模板。一个网页可以有一个或多个网页模板,一个网页模板也可以作为一个或多个网页的模板。
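Fig. 9 is a block diagram showing the connection between the web page template server, the middleware server, and the terminal device according to an embodiment of the present invention.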
终端设备10用于向中间件服务器20发送网页浏览请求,接收中间件服务器20响应于所述网页浏览请求而返回的差量数据,以及根据终端设备10本地存储的与差量数据对应的网页模板数据和差量数据展现所请求的网页,所述差量数据是在网页模板服务器30中基于所请求的网页的网页数据和与该网页数据对应的网页模板数据生成的。在进行网页浏览时,用户操作终端设备10,通过终端设备10发出浏览请求,此时,终端设备10接收网页的浏览请求,且将网页的浏览请求发送给中间件服务器20。用户可以通过点击的动作向终端设备10提出浏览请求。
中间件服务器20用于根据所接收的网页浏览请求,获取所请求的网页数据并转发给网页模板服务器30,以及在接收到网页模板服务器30返回的差量数据后,向终端设备10转发所述差量数据。
网页模板服务器30用于基于从中间件服务器20接收的网页数据以及本地获取的与该网页数据对应的网页模板数据,生成所述网页数据和网页模板数据之间的差量数据并转发给中间件服务器20。
网页与相应的网页模板之间存在的差量数据,在传输网页数据的时候,如果终端设备10本地存在网页模板,则仅仅传输差量数据,不必传输网页的全部数据。
在网页模板服务器30无法获取与需要展现的网页匹配的网页模板时,中间件服务器20直接返回获取的网页数据。
本发明的网页模板服务器30还可以生成新的网页模板的网页模板数据。
本发明的网页模板服务器30生成新的网页模板的网页模板数据,可以是网页模板服务器预先通过接收中间件服务器转发的网页数据而生成的。网页模板服务器预先通过接收中间件服务器转发的网页数据的方法中,由于网页模板服务器30从中间件服务器20接收的是海量的网页数据,本发明实施例可以采用Hadoop(分布式系统基础架构)集群进行数据存储与计算。即网页模板服务器30是一个服务器集群,由多个服务器组成。所述服务器集群存放网页数据、模板数据、模板索引等采用的是基于Hadoop的HBase(分布式、面向列的开源式数据库)数据库。网页模板数据生成采用的是MapReduce(大规模数据集的并行运算方法)计算框架。Hadoop集群是天然的分布式存储和计算框架。只需要网页模板服务器30中增加生成网页模板的服务器的数量就能够对集群进行横向扩展,具备良好的容灾能力。
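When the web page template server 30 is a server cluster, it is configured to generate, based on the web page data received from the middleware server 20 and the locally obtained web page template data corresponding to that web page data, the delta data between the web page data and the web page template data, and to forward the delta data to the middleware server 20. Here, "locally obtained" means obtained from within the server cluster.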
从上述的分析可以知道,网页和网页模板之间存在重复数据,也存在差量数据,其中,差量数据是网页中存在的数据而网页模板中不存在的数据。这里所说的网页数据包括网页的结构数据、内容数据或者编码数据,这些数据通过无线电通讯网络或者互联网由中间件服务器20发送至终端设备10或者由终端设备10发送至中间件服务器20。本发明的网页模板会以编码的形式存储在缓存中,因此在展现网页时,需要终端设备10对网页模板数据和差量数据进行解码,网页模板数据与差量数据一起还原得到需要展现的网页。
由于差量数据相较于网页数据较小,在终端设备10存在网页模板的情况下,传输网页数据时可仅仅传输差量数据。差量数据是网页数据的一部分,因此差量数据的传输方法可以与网页数据的传输方法相同,通过无线电通讯网络或者互联网等网络传输。中间件服务器20将差量数据发送至终端设备10,终端设备10调用该网页对应的网页模板,从而实现网页的展现,能够有效的节约网络资源,减少带宽的占用,并且提高了网页的加载速度,进一步提高用户的浏览网页的速度。
图10是根据本发明实施例的终端设备的一个实施例的方框示意图。
如图10所示,终端设备10包括网页浏览请求发送单元101、差量数据接收单元102、网页展现单元103。
网页浏览请求发送单元101,用于向中间件服务器20发送网页浏览请求;本发明的终端设备10在网页浏览请求发送单元101向中间件服务器20发送网页浏览请求前,需要在本地查找找到与网页浏览请求的网页相匹配的网页模板,如果找到相匹配的网页模板则需要在所述网页浏览请求包中带上包含该网页模板ID的第一模板ID列表,找不到则列表为空。本地查找找到与网页浏览请求的网页相匹配的网页模板可以是根据请求的网页的网页地址进行查询,或者对网页进行处理生成网页标签进行查询,例如生成哈希值标签等。网页与网页模板的匹配原则根据不同网站或者网页的需求而定,例如,用覆盖率的方式时,即网页A的网页模板与网页A之间的覆盖率达到预定值才认为与网页A相匹配,则需要在网页模板库中查询与网页A的覆盖率达到预定值的网页模板。需要说明的是,网页模板与网页之间的匹配方式还可以是压缩比等除覆盖率以外的其他方式,这里只是举例说明,不做穷举。
需要说明的是,为了减少传输资源的负担,提高终端设备10响应速度,需要限制第一模板ID列表的大小或者包含模板ID的数量在一定的数值范围以内。例如,请求包每次最多只能附带5个模板ID。
差量数据接收单元102,用于接收中间件服务器20响应于所述网页浏览请求而返回的差量数据,所述差量数据是在网页模板服务器30中基于所请求的网页的网页数据和与该网页数据对应的网页模板数据生成的,以及
网页展现单元103,用于根据终端设备10本地存储的与差量数据对应的网页模板数据和差量数据展现所请求的网页。采用TCP/IP协议传输。如果网页展现单元103接收的数据是差量数据,则根据终端设备10本地存储的与差量数据对应的网页模板数据和差量数据展现所请求的网页,如果接收的数据是网页数据,则可直接进行网页展现。
由于网页模板以编码的形式在通讯网络中传输,因此网页展现单元103需要将这些编码数据还原,并与差量数据一起展示原始网页。
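In view of the storage capacity of the terminal device 10, the number of web page templates stored locally on the terminal device 10, or their total size, is limited. A threshold can be set, for example at most 100 templates with a total size of no more than 10 MB. If the threshold is exceeded, templates can be evicted according to the LRU (Least Recently Used) page replacement algorithm. LRU removes the templates that have gone unused for the longest time, on the assumption that they are unlikely to be needed in the near future, which saves storage on the terminal device 10.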
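The storage limit and LRU eviction described above can be sketched with an ordered dictionary. The class name `TemplateCache` and its methods are hypothetical; only the two limits (100 templates, 10 MB) come from the example in the text.

```python
from collections import OrderedDict


class TemplateCache:
    """Local template store bounded by both entry count and total size, evicting LRU first."""

    def __init__(self, max_count=100, max_bytes=10 * 1024 * 1024):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.total_bytes = 0
        self._items = OrderedDict()  # template_id -> bytes, least recently used first

    def get(self, template_id):
        data = self._items.get(template_id)
        if data is not None:
            self._items.move_to_end(template_id)  # mark as most recently used
        return data

    def put(self, template_id, data):
        if template_id in self._items:
            self.total_bytes -= len(self._items.pop(template_id))
        self._items[template_id] = data
        self.total_bytes += len(data)
        # Evict from the least recently used end until both limits hold again.
        while self._items and (len(self._items) > self.max_count
                               or self.total_bytes > self.max_bytes):
            _, evicted = self._items.popitem(last=False)
            self.total_bytes -= len(evicted)

    def ids(self):
        return list(self._items)
```

The IDs returned by `ids()` are the candidates a terminal could place in the first template ID list of a browsing request.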
本发明另一个实施例中,终端设备10还包括网页模板下载单元104、网页模板数据保存单元105。
网页模板下载单元104,用于在从中间件服务器20接收到不属于第一网页模板ID列表的网页模板ID后,基于该网页模板ID,经由中间件服务器20从网页模板服务器30中下载对应的网页模板数据。网页模板下载单元104是一个独立的工作线程,可智能的在网络空闲时或者wifi环境下进行模板请求下载,避免占用带宽,影响用户的浏览体验。
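The web page template data saving unit 105 is configured to save the web page template data downloaded by the web page template download unit 104 in association with the corresponding web page template ID; it holds the web page template data of the terminal device 10. As in the previous embodiment, in view of the storage capacity of the terminal device 10, the number of templates stored in the web page template data saving unit 105, or their total size, is limited. A threshold can be set, for example at most 100 templates with a total size of no more than 10 MB. If the threshold is exceeded, templates can be evicted according to the LRU (Least Recently Used) page replacement algorithm. LRU removes the templates that have gone unused for the longest time, on the assumption that they are unlikely to be needed in the near future, which saves storage on the terminal device 10.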
本发明的终端设备10可以包括移动终端、PDA、IPad等具有显示功能,可以进行网页浏览的终端设备。
图11是根据本发明实施例的中间件服务器的一个实施例的方框示意图。
如图11所示中间件服务器20包括网页数据获取单元201,用于在接收到终端设备10发送的网页浏览请求后,获取所请求的网页数据;网页数据的获取可先从中间件服务器20缓存中查询是否有缓存的网页数据,如果没有则需要访问目标网站服务器获取。
还包括转发单元202,用于向网页模板服务器30转发所获取的网页数据,以及在接收到网页模板服务器30返回的差量数据后,向终端设备10转发所述差量数据。转发单元202可以采用TCP/IP协议传输数据。向网页模板服务器30转发所获取的网页数据的同时,还将网页浏览请求所请求的网页网址、第一模板ID列表一起发送到网页模板服务器30。
在接收到的数据不是差量数据,而是网页模板服务器30返回的推荐的模板ID时,转发单元202向终端设备10发送推荐的模板ID,以及网页数据获取单元201获取的网页数据。
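In another preferred embodiment of the present invention, the middleware server 20 further includes a template data obtaining module 203, configured to receive from the web page template download unit 104 of the terminal device 10 the template ID of the web page template data to be downloaded, and to download that web page template data from the web page template server 30 using the template ID. The data is then passed to the forwarding unit 202, which sends it to the terminal device 10, where it is saved by the web page template data saving unit 105.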
图12是根据本发明实施例的网页模板服务器的一个实施例的方框示意图。
如图12所示所述网页模板服务器30包括网页模板数据存储单元301、网页模板数据获取单元302、差量数据生成单元303和发送单元304。
其中网页模板数据存储单元301用于存储网页模板数据。网页模板数据存储单元301中具体是相关联地存储着网页模板ID和网页模板数据。
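The web page template data obtaining unit 302 is configured to obtain, from the web page template data storage unit 301, the web page template data corresponding to the received web page data. It does so either by looking up the IDs of the first template ID list in the web page template data storage unit 301, or by using both the first template ID list and the address of the requested web page.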
差量数据生成单元303用于基于从中间件服务器20接收的网页数据和与该网页数据对应的网页模板数据,生成所述网页数据和网页模板数据之间的差量数据。
发送单元304用于将所生成的差量数据发送给中间件服务器20。
图13是根据本发明实施例的网页模板服务器的差量数据生成单元一个实施例的方框示意图。
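In the present invention, when the terminal device 10 locally stores web page templates that match the requested web page, it sends a first template ID list containing the IDs of all matching templates to the middleware server 20 together with the web page browsing request, and the middleware server 20 forwards the first template ID list to the web page template server 30. In this case, the web page template data obtaining unit 302 of the web page template server 30 is configured to obtain the web page template IDs in the first web page template ID list in order and, based on each obtained ID, to obtain web page template data from the web page template data storage unit 301. The delta data generation unit 303 then includes the difference data calculation module 3031 and the determination module 3032 shown in Fig. 13.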
差值数据计算模块3031用于计算网页数据和从网页模板数据存储单元301中获取的网页模板数据之间的差值数据。差值数据计算模块3031是将网页模板数据和网页数据使用差量算法进行计算。
确定模块3032用于在所计算出的差值数据与网页数据之间压缩比小于第一预定阈值时,将所述差值数据确定为所述差量数据,在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,重复所述网页模板数据获取单元302和所述差量数据生成单元303的处理过程,直到生成所述差量数据。
差值数据与网页数据之间压缩比,即为差值数据经过压缩后的数值与网页数据进行压缩后的数据的比值,这个压缩比越小说明该差值数据对应的网页模板数据与网页数据之间的差异越小。
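The compression ratio defined above can be illustrated with a small sketch. The text does not name the delta algorithm used by the difference data calculation module, so this example assumes one simple stand-in: DEFLATE primed with the template as a preset dictionary, via the `zdict` parameter of Python's standard `zlib` module. The function names are hypothetical.

```python
import zlib


def make_delta(page: bytes, template: bytes) -> bytes:
    """Encode `page` against `template` by priming DEFLATE with the template as a dictionary."""
    compressor = zlib.compressobj(level=9, zdict=template)
    return compressor.compress(page) + compressor.flush()


def compression_ratio(page: bytes, template: bytes) -> float:
    """Compressed delta size divided by compressed page size; smaller means a closer template."""
    return len(make_delta(page, template)) / len(zlib.compress(page, 9))
```

A template that shares most of its bytes with the page lets the encoder replace long runs with back-references, which drives the ratio down; an unrelated template leaves the ratio near 1.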
图14是根据本发明实施例的网页模板服务器的第二个实施例的方框示意图。
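As shown in Fig. 14, when the terminal device 10 locally stores web page templates that match the requested web page, it sends a first template ID list containing the IDs of all matching templates to the middleware server 20 together with the web page browsing request, and the middleware server 20 forwards the first template ID list and the requested web page address to the web page template server 30. That is, the web page browsing request contains the web page address and the first web page template ID list. In this case the web page template server 30 includes:
A web page template ID list library 305, configured to store second web page template ID lists in association with web page addresses. A second web page template ID list is the list of template IDs that the web page template server 30 recommends for the address of the requested web page. It consists of the template IDs of the web page template data stored on the web page template server 30 that matches the web page address.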
网页模板ID列表获取单元306,用于根据所请求浏览的网页的网页地址,从网页模板ID列表库305中获取对应的第二网页模板ID列表。
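A web page template ID list merging unit 307, configured to merge the first web page template ID list and the second web page template ID list into a third web page template ID list. The unit merges the web page template IDs of the two lists by priority: the intersection of the first and second lists has the highest priority, the remainder of the first list comes next, and the remainder of the second list has the lowest priority.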
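The priority merge performed by the list merging unit 307 is simple enough to sketch directly. The function name is hypothetical, and the order of IDs inside each priority group (here, their original order) is an assumption, since the text fixes only the order of the three groups.

```python
def merge_template_id_lists(first, second):
    """Return the third list: the intersection, then the rest of `first`, then the rest of `second`."""
    first = list(dict.fromkeys(first))    # drop duplicates, keep original order
    second = list(dict.fromkeys(second))
    first_set, second_set = set(first), set(second)
    common = [t for t in first if t in second_set]
    only_first = [t for t in first if t not in second_set]
    only_second = [t for t in second if t not in first_set]
    return common + only_first + only_second
```

Templates in the intersection are both held by the terminal and recommended by the server, so trying them first makes it more likely that a delta alone can be returned.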
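After the third web page template ID list is formed, the web page template data obtaining unit 302 obtains the web page template IDs in the third list in order and, based on each obtained ID, obtains web page template data from the web page template data storage unit 301. The difference data calculation module 3031 and the determination module 3032 of the delta data generation unit then generate the delta data in the same way as in the embodiment shown in Fig. 13.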
本发明的优选实施例中为了避免差量数据生成单元303进行生成所述差量数据时,在计算失败后进行过多的计算而影响系统运行效率,还设置有计数单元(图中未示出),用于在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,计数所述差值数据计算单元的计算次数,以及
在所述计算次数不超过第二预定阈值时,所述网页模板数据获取单元302被配置为获取下一网页模板ID,并且基于所述下一网页模板ID,从所述网页模板数据存储单元301中获取网页模板数据,以及
在所述计算次数超过第二预定阈值时,所述网页模板服务器30向所述中间件服务器20返回差量数据生成失败消息,以便所述中间件服务器20在接收到所述差量数据生成失败消息后,向终端设备10返回网页数据来进行展现。
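The interplay of the first predetermined threshold (compression ratio) and the second predetermined threshold (number of failed calculations) can be sketched as a bounded search over the template ID list. As a stand-in for the unspecified delta algorithm, this sketch uses `zlib` with the template as a preset dictionary; `generate_delta`, the `store` mapping, and the default threshold values are all hypothetical.

```python
import zlib


def _delta_and_ratio(page, template):
    """Return (delta, compressed delta size / compressed page size) for one template."""
    compressor = zlib.compressobj(level=9, zdict=template)
    delta = compressor.compress(page) + compressor.flush()
    return delta, len(delta) / len(zlib.compress(page, 9))


def generate_delta(page, template_ids, store, first_threshold=0.5, second_threshold=5):
    """Return (template_id, delta) for the first acceptable template, or None on failure."""
    failed = 0
    for template_id in template_ids:
        template = store.get(template_id)
        if template is None:
            continue  # unknown ID: nothing to diff against
        delta, ratio = _delta_and_ratio(page, template)
        if ratio < first_threshold:
            return template_id, delta
        failed += 1  # counted only when the ratio is not below the first threshold
        if failed > second_threshold:
            break  # give up; the caller falls back to sending the full page
    return None
```

A `None` result corresponds to the delta data generation failure message, after which the middleware server returns the full web page data to the terminal device.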
本发明另一优选实施例中,网页模板服务器30还包括差量数据保存单元(图中未示出),用于与网页模板ID和网页地址相关联地存储差量数据;以及
差量数据查询单元(图中未示出),用于根据网页模板ID和网页地址,在所述差量数据保存单元中查询相关联的差量数据,以及
在所述差量数据查询单元没有查询到相关联的差量数据时,所述差量数据生成单元303被配置为生成所述差量数据。保存一定数量的差量计算结果,当出现相同的模板ID和网页请求时,差量数据生成单元303不需要进行差量数据计算,可直接从差量数据保存单元获取差量数据,提高响应速度。
本发明另一优选实施例中网页模板服务器30还包括第二判断单元(图中未示出),用于在生成所述差量数据后,判断所述网页模板数据获取单元302当前使用的网页模板ID是否属于第一网页模板ID列表,以及
在当前使用的网页模板ID属于第一网页模板ID列表时,所述发送单元304将所生成的差量数据和该当前使用的网页模板ID返回给中间件服务器20并经由中间件服务器20转发给终端设备10,
在当前使用的网页模板ID不属于第一网页模板ID列表时,所述发送单元304将当前使用的网页模板ID返回给中间件服务器20,并且中间件服务器20将所接收的网页模板ID和网页数据发送给终端设备10,供终端设备10的网页模板下载单元104在空闲时或者WIFI情况下下载网页模板ID对应的网页模板数据。
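The decision made by the second judgment unit can be sketched as a small function. The dictionary-based response format and the name `build_response` are illustrative assumptions; the text specifies only which pieces of data travel in each case.

```python
def build_response(template_id, delta, page, first_list):
    """Choose what reaches the terminal, depending on whether it already holds the template."""
    if template_id in first_list:
        # The terminal already stores this template: the delta and the ID are enough.
        return {"template_id": template_id, "delta": delta}
    # The terminal lacks the template: send the full page now and recommend the ID,
    # so the template can be downloaded later (for example when idle or on Wi-Fi).
    return {"template_id": template_id, "page": page}
```

In the second case the terminal pays for one full page transfer, but later requests for similar pages can be served with deltas once the recommended template has been downloaded.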
同时本发明的网页模板服务器30还可以包括网页模板数据生成单元308、网页采集单元309和网页保存单元310。
网页采集单元309用于接收中间件服务器20发送过来的网页数据。
网页保存单元310用于存储网页采集单元309接收的中间件服务器20发送过来的网页数据。
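The web page template data generation unit 308 is configured to generate web page template data from the web page data that was sent by the middleware server 20 and stored in the web page saving unit 310, to generate a corresponding web page template ID, to store the web page template data and the web page template ID in association in the web page template data storage unit 301, and to store the web page template ID in association with the web page address in the web page template ID list library 305. The web page template data generation unit 308 uses a dedicated algorithm to generate web page template data quickly, for example by computing a hash value of the web page or by splitting the web page data into lines. Because a web page template can itself be a web page, the web page can also be used directly as the template.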
本发明的网页模板数据生成单元308生成网页模板数据可以在差量数据生成单元303生成差量数据失败时,根据用户请求浏览的网页建立新的网页模板,也可以是模板服务器30预先通过接收中间件服务器20转发的网页数据而生成的。
本发明的实施例中网页模板服务器30通过接收中间件服务器20转发的网页数据而生成网页模板数据的方法中,由于每天通过中间件服务器20访问的网页很多,且网页模板服务器30可以接收多个中间件服务器20的网页数据,所以网页模板服务器30从中间件服务器20接收的是海量的网页数据。本发明实施例需要对海量数据进行存储和需要对海量的网页数据进行大量的运算来生成网页模板。所以本发明的实施例可以采用Hadoop(分布式系统基础架构)集群进行数据存储与计算。即网页模板服务器30是一个服务器集群,由多个服务器组成。网页模板数据生成单元308可以设置在服务器集群的多个服务器中。而所述服务器集群存放网页数据、模板数据、模板索引等采用的是基于Hadoop的HBase(分布式、面向列的开源式数据库)数据库。模板生成采用的是MapReduce(大规模数据集的并行运算方法)计算框架。Hadoop集群是天然的分布式存储和计算框架。只需要网页模板服务器30中增加生成网页模板的服务器的数量,即增加包含网页模板数据生成单元308的服务器就能够对集群进行横向扩展,具备良好的容灾能力。本发明的网页展现系统中,当网页模板服务器30向中间件服务器20返回模板ID列表时,为了不影响用户浏览网页的速度,模板ID列表的大小有限制,例如每次返回的网页模板ID最多只能是5个。
本发明的另一优选实施例中网页模板服务器30还可以包括网页模板删除单元(图中未示出),用于在判断出网页模板数据存储单元301中的网页模板数量或占用空间大小超出预定阈值时,删除网页模板数据存储单元301中最近最少使用的网页模板数据。其中,最近最少使用是指,已经很久没有使用的网页模板数据可能在未来较长的一段时间内不会被用到。那么,根据最近最少原理,分析得到最近一段时间没有使用的网页模板数据,并且可能在未来较长的一段时间内也不会被用到的网页模板数据,则网页模板删除单元将最近一段时间内没有使用的网页模板数据删除。
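It is worth noting that the web page presentation system of the present invention may include only a terminal device and a server. That is, the web page template server 30 should not be understood as limited to a particular physical server: it may be a single server or, to relieve computing and storage pressure, a server cluster. Likewise, the functions of the middleware server 20 may be performed on a single physical server or by a server cluster. The functional modules of the middleware server 20 and the web page template server 30 can be distributed over multiple servers. For example, one or more servers may contain the web page template data generation unit 308, the web page collection unit 309, and the web page saving unit 310, and one or more servers may contain the web page template data obtaining unit 302 and the delta data generation unit 303; together these servers form the server cluster of the web page template server 30.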
本发明的网页展现系统,通过设置网页模板服务器30来存储和计算网页模板和网页数据之间的差量数据,由中间件服务器20将差量数据发送至终端设备10,终端设备10本地调用该差量数据对应的网页模板,从而实现网页的展现。传输网页数据时仅仅传输差量数据,而差量数据相较于网页数据较小。能够有效的节约网络资源,减少带宽的占用,并且提高了网页的加载速度,进一步提高用户的浏览网页的速度。
图15是根据本发明利用网页模板实现网页展现的方法的实施例流程图。
本发明实施例提供了一种网页展现方法。该方法用于传输网页数据,能够提高网页展现速度。
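The web page presentation method of the embodiments of the present invention can be performed by the web page presentation system provided by the embodiments, and that system can likewise be used to perform the web page presentation method provided by the embodiments.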
如图15所示该网页展现方法包括如下步骤。
步骤S701,终端设备获取用户发送的浏览请求,在本地查找找到与网页浏览请求的网页相匹配的网页模板,向中间件服务器发送包含该网页模板ID的第一模板ID列表的网页浏览请求。如果找不到,则列表为空。本地查找找到与网页浏览请求的网页相匹配的网页模板可以是根据请求的网页地址进行查询,或者对网页进行处理生成网页标签进行查询,例如生成哈希值标签等。网页与网页模板的匹配原则根据不同网站或者网页的需求而定。例如,用覆盖率的方式时,即网页A的网页模板与网页A之间的覆盖率达到预定值才认为与网页A相匹配。在进行网页浏览时,用户向终端设备提出浏览请求,终端设备获取用户发送的浏览请求。终端设备能通过无线电通讯网络或者互联网与中间件服务器相连接,以实现终端设备与中间件服务器之间的通信和数据传输。用户可以通过点击的动作向终端设备提出的浏览请求。
作为优选实施例,为了减少传输资源的负担,提高终端设备响应速度,需要限制向中间件服务器发送的网页浏览请求第一模板ID列表的大小或者包含模板ID的数量。例如,请求包每次最多只能附带5个模板ID。
需要说明的是,网页模板与网页之间的匹配方式还可以是除覆盖率以外的其他方式,这里只是举例说明,不做穷举。
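Next, in step S702, after receiving the web page browsing request sent by the terminal device, the middleware server obtains the requested web page data based on the request and forwards the obtained web page data to the web page template server.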
中间件服务器可以在本地相关联的存储一些网页地址和网页数据。在接收到终端设备发送的网页浏览请求后,根据网页浏览请求在本地查找是否存在请求的网页,或者去网页服务器获取网页。中间件服务器将所获取的网页数据转发给网页模板服务器的同时会将请求的网页地址发送给网页模板服务器。
步骤S703,网页模板服务器本地获取与该网页数据对应的网页模板数据,基于所接收的网页数据和所获取的网页模板数据,生成所述网页数据和网页模板数据之间的差量数据,并将所生成的差量数据发送给中间件服务器。
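The web page template data corresponding to the web page data is the data of the web page template that matches the web page; the matching rule here may be the same as or different from that of the preceding step. A web page and its template share identical data but also differ in some data. The delta data may be the data that is present in the web page but absent from the web page template. The web page data here includes the structural data, content data, or encoding data of the web page, which is sent from the middleware server to the terminal device, or from the terminal device to the middleware server, over a radio communication network or the Internet.
In a preferred embodiment of the web page presentation method of the present invention, after the web page template server has locally obtained the web page template data corresponding to the web page data and generated the delta data between the web page data and the web page template data, the delta data saving unit stores the web page template ID, the web page address, and the delta data in association. Correspondingly, when the web page template server receives from the middleware server the web page data, the requested web page address, and the web page template IDs corresponding to that address, it first queries the delta data saving unit for associated delta data according to the web page template ID and web page address, and proceeds to step S703 only if the delta data query unit finds no associated delta data.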
步骤S704,中间件服务器将所接收的差量数据转发给所述终端设备。
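Step S705: the terminal device presents the requested web page according to the received delta data and the locally stored web page template data corresponding to the delta data. After receiving the delta data over the network, the terminal device looks up the web page template data locally by the web page template ID, or by a label that identifies the web page template, and can then present the web page from the web page template data together with the delta data. The web page template data includes data such as the encoding information of the template. The web page data can be obtained by decoding the web page template data together with the delta data.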
由于差量数据相较于网页数据较小,在终端设备本地存在网页模板的情况下,传输网页数据时可以仅仅传输差量数据。差量数据是网页数据的一部分,因此差量数据的传输方法与网页数据的传输方法相同,通过无线电通讯网络或者互联网等网络传输。中间件服务器将差量数据发送至终端设备,终端设备调用该网页对应的网页模板,从而实现网页的展现。差量数据的大小远远小于网页数据,因此,传输差量数据所占用的网络资源也远小于传输网页数据所占用的网络资源,提高了网页数据的传输效率,进一步提高了网页的加载速度。
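The round trip of steps S703 to S705 (delta generation on the server, page restoration on the terminal) can be sketched as follows. The encoding is an assumption: the text does not name the delta format, so this example uses `zlib` with the template as a preset dictionary on both sides. The function names are hypothetical.

```python
import zlib


def encode_delta(page: bytes, template: bytes) -> bytes:
    """Server side: compress the page using the template as a preset dictionary."""
    compressor = zlib.compressobj(level=9, zdict=template)
    return compressor.compress(page) + compressor.flush()


def restore_page(delta: bytes, template: bytes) -> bytes:
    """Terminal side: rebuild the page from the delta and the locally stored template."""
    decompressor = zlib.decompressobj(zdict=template)
    return decompressor.decompress(delta) + decompressor.flush()
```

Restoration only works when the terminal holds exactly the template the server used, which is why the template ID is returned together with the delta.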
图16是根据本发明利用网页模板实现网页展现的方法的网页浏览请求包含第一网页模板ID列表的情况下S703步骤的第一实施例流程图。
所述网页浏览请求包含第一网页模板ID列表的情况下,如图16所示,进入步骤S801,网页模板数据获取单元顺序获取第一网页模板ID列表中的第一网页模板ID,之后步骤S802,网页模板数据获取单元基于当前获取的第一网页模板ID,从网页模板数据存储单元中获取网页模板数据。
之后S803步骤,差值数据计算模块计算网页数据和从网页模板数据存储单元中获取的网页模板数据之间的差值数据。
然后S804步骤,判断所述差值数据与网页数据之间压缩比是否小于第一预定阈值。
在所计算出的差值数据与网页数据之间压缩比小于第一预定阈值时,在进入到步骤S805,确定模块将所述差值数据确定为所述差量数据,之后进入步骤S806,所述发送单元将所生成的差量数据和该当前使用的网页模板ID返回给中间件服务器并经由中间件服务器转发给终端设备。
在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,进入到步骤S807,判断当前第一网页模板ID是否第一网页模板ID列表中最后一个网页模板ID。如果不是,则进入步骤S810,网页模板数据获取单元从第一网页模板ID列表中获取下一第一网页模板ID,作为新的当前获取的第一网页模板ID,之后返回步骤S802。如果是,则进入步骤S811,发送单元返回差量数据计算失败的信息给中间件服务器,中间件服务器仅仅返回网页数据给终端设备,本流程结束。
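To prevent an excessive number of template IDs in the first web page template ID list from overloading the web page template server, in a preferred embodiment step S807 may be replaced by steps S808 and S809. In step S808 the counting unit increments by 1 the number of difference calculations performed by the difference data calculation unit; in step S809 it is judged whether that number exceeds the second predetermined threshold. If the number does not exceed the second predetermined threshold, the process proceeds to step S810; if it does, the process proceeds to step S811.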
图17是根据本发明利用网页模板实现网页展现的方法的网页浏览请求包含第一网页模板ID列表的情况下S703步骤的第二实施例流程图。
如图17所示,在网页模板服务器接收到包含第一网页模板列表的网页浏览请求后,步骤S901,网页模板ID列表获取单元根据所请求浏览的网页的网页地址,从网页模板ID列表库中获取对应的第二网页模板ID列表。网页模板ID列表库中与网页地址相关联地存储着第二网页模板ID列表。
之后步骤S902,网页模板ID列表合并单元,将第一网页模板ID列表和第二网页模板ID列表合并成第三网页模板ID列表。此步骤中第三网页模板ID列表的生成方法可以是对第一网页模板ID列表和第二网页模板ID列表中的网页模板ID按照优先级进行合并,形成第三网页模板ID列表,其中第一网页模板ID列表和第二网页模板ID列表的交集的优先级最高,第一网页模板ID列表中的剩余部分次之,第二网页模板ID列表中的剩余部分最低。
之后步骤S903,网页模板数据获取单元顺序获取第三网页模板ID列表中的第三网页模板ID。之后步骤S904,并且基于所获取的网页模板ID,从所述网页模板数据存储单元中获取网页模板数据。
获取到网页模板数据后,步骤S905,差值数据计算模块计算网页数据和从网页模板数据存储单元中获取的网页模板数据之间的差值数据。
然后S906步骤,判断所述差值数据与网页数据之间压缩比是否小于第一预定阈值。
在所计算出的差值数据与网页数据之间压缩比小于第一预定阈值时,进入步骤S907中,确定模块将所述差值数据确定为所述差量数据。之后进入步骤S908,所述发送单元将所生成的差量数据和该当前使用的网页模板ID返回给中间件服务器并经由中间件服务器转发给终端设备。
在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,进入到步骤S909,判断当前第三网页模板ID是否第三网页模板ID列表中最后一个网页模板ID。如果不是,则进入步骤S910,网页模板数据获取单元从第三网页模板ID列表中获取下一第三网页模板ID,作为新的当前获取的第三网页模板ID,之后返回步骤S904。如果是,则进入步骤S913,发送单元返回差量数据计算失败的信息给中间件服务器,中间件服务器仅仅返回网页数据给终端设备,本流程结束。
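To prevent an excessive number of template IDs in the third web page template ID list from overloading the web page template server, in a preferred embodiment step S909 may be replaced by steps S911 and S912. In step S911 the counting unit increments by 1 the number of difference calculations performed by the difference data calculation unit; in step S912 it is judged whether that number exceeds the second predetermined threshold. If the number does not exceed the second predetermined threshold, the process proceeds to step S910; if it does, the process proceeds to step S913.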
进入步骤S908之前,本实施例优选实例中还包括步骤S915,第二判断单元判断所述网页模板数据获取单元当前使用的网页模板ID是否属于第一网页模板ID列表,
When the currently used web page template ID belongs to the first web page template ID list, the process proceeds to step S908.
当在当前使用的网页模板ID不属于第一网页模板ID列表时,则进入步骤S916所述发送单元将当前使用的网页模板ID返回给中间件服务器供中间件服务器将所接收的网页模板ID和网页数据同时发送给终端设备。
之后终端设备的网页模板下载单元基于该网页模板ID经由中间件服务器从网页模板数据存储单元中下载对应的网页模板数据,且由网页模板数据保存单元将网页模板下载单元下载的网页模板数据与对应的网页模板ID相关联地保存。
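As a preferred embodiment of the present invention, the web page template download unit of the terminal device may download the corresponding web page template data from the web page template data storage unit via the middleware server, based on the web page template ID, after the web page has been presented. It requests the download when the network is idle or over Wi-Fi, so that the download does not compete for bandwidth and the browsing experience improves.
When the terminal device has no matching web page template locally, the middleware server sends the recommended web page template data to the terminal device when the network is idle, so that the template can be used directly the next time the terminal device needs it. This reduces bandwidth usage, speeds up browsing, and improves the user experience.
In step S701 of Fig. 15, the terminal device receives the user's browsing request and searches locally for a web page template matching the requested page. If none is found, the web page browsing request sent to the middleware server does not contain a first template ID list. Failing to find a matching template means that the terminal device has not stored locally any web page template matching the requested page. In this case the present invention further includes steps in which the terminal device searches for and downloads a web page template from the web page template server via the middleware server.
Fig. 18 is a flowchart of the process by which the terminal device obtains web page template data in the method for presenting web pages using web page templates according to the present invention.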
在终端设备在本地无法查找到与网页浏览请求的网页相匹配的网页模板后,如图18所示,本发明还包括步骤S1001,向中间件服务器发送不包含该网页模板ID的第一模板ID列表的网页浏览请求。
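Step S1002: after receiving the web page browsing request sent by the terminal device, the middleware server obtains the requested web page data based on the request and forwards the obtained web page data to the web page template server.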
步骤S1003,网页模板ID列表获取单元根据所请求浏览的网页的网页地址,从网页模板ID列表库中获取对应的第二网页模板ID列表。网页模板ID列表库中与网页地址相关联地存储着第二网页模板ID列表。
进入步骤S1004,网页模板数据获取单元顺序获取第二网页模板ID列表中的第二网页模板ID,之后步骤S1005,网页模板数据获取单元基于当前获取的第二网页模板ID,从网页模板数据存储单元中获取网页模板数据。
之后S1006步骤,差值数据计算模块计算网页数据和从网页模板数据存储单元中获取的网页模板数据之间的差值数据。
然后S1007步骤,判断所述差值数据与网页数据之间压缩比是否小于第一预定阈值。
在所计算出的差值数据与网页数据之间压缩比小于第一预定阈值时,在进入到步骤S1008,确定模块将所述差值数据确定为所述差量数据,之后进入步骤S1009,发送单元将当前使用的网页模板ID返回给中间件服务器供中间件服务器将所接收的网页模板ID和网页数据同时发送给终端设备。
之后终端设备的网页模板下载单元基于该网页模板ID经由中间件服务器从网页模板数据存储单元中下载对应的网页模板数据,且由网页模板数据保存单元将网页模板下载单元下载的网页模板数据与对应的网页模板ID相关联地保存。
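As a preferred embodiment of the present invention, the web page template download unit of the terminal device may download the corresponding web page template data from the web page template data storage unit via the middleware server, based on the web page template ID, after the web page has been presented. It requests the download when the network is idle or over Wi-Fi, so that the download does not compete for bandwidth and the browsing experience improves.
When the terminal device has no matching web page template locally, the middleware server sends the recommended web page template data to the terminal device when the network is idle, so that the template can be used directly the next time the terminal device needs it. This reduces bandwidth usage, speeds up browsing, and improves the user experience.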
在所计算出的差值数据与网页数据之间压缩比不小于所述第一预定阈值时,进入到步骤S1010,判断当前第二网页模板ID是否第二网页模板ID列表中最后一个网页模板ID。如果不是,则进入步骤S1011,网页模板数据获取单元从第二网页模板ID列表中获取下一第二网页模板ID,作为新的当前获取的第二网页模板ID,之后返回步骤S1005。如果是,则进入步骤S1014,发送单元返回差量数据计算失败的信息给中间件服务器,中间件服务器仅仅返回网页数据给终端设备,本流程结束。
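To prevent an excessive number of template IDs in the second web page template ID list from overloading the web page template server, in a preferred embodiment step S1010 may be replaced by steps S1012 and S1013. In step S1012 the counting unit increments by 1 the number of difference calculations performed by the difference data calculation unit; in step S1013 it is judged whether that number exceeds the second predetermined threshold. If the number does not exceed the second predetermined threshold, the process proceeds to step S1011; if it does, the process proceeds to step S1014.
As a preferred embodiment of the present invention, before steps S811, S913, and S1014 the method further includes: the web page template data generation unit generates web page template data from the web page data sent by the middleware server, generates a corresponding web page template ID, stores the web page template data and the web page template ID in association in the web page template data storage unit, and stores the web page template ID in association with the web page address in the web page template ID list library. The unit uses a dedicated algorithm to generate web page template data quickly, for example by computing a hash value of the web page or by splitting the web page data into lines. Because a web page template can itself be a web page, the web page data can also be used directly as the web page template data.
Of course, the present invention is not limited to generating web page templates before steps S811, S913, and S1014. The web page template server may also generate web page template data in advance from the web page data forwarded by the middleware server. In that case, because a large number of pages are accessed through the middleware server every day, and the web page template server can receive web page data from multiple middleware servers, the web page template server receives a massive amount of web page data. The embodiments therefore need to store massive data and perform a large amount of computation on it to generate web page templates, and may use a Hadoop (distributed system infrastructure) cluster for data storage and computation. That is, the web page template server is a server cluster composed of multiple servers, and the web page template data generation unit can be deployed on several servers of the cluster. The cluster stores web page data, template data, template indexes, and so on in HBase (a distributed, column-oriented open-source database) built on Hadoop. Templates are generated using the MapReduce computing framework (a parallel computing method for large data sets). A Hadoop cluster is by nature a distributed storage and computing framework: the cluster can be scaled out simply by adding servers that generate web page templates, that is, servers containing the web page template data generation unit, and it offers good disaster tolerance.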
基于终端设备的存储能力考虑,终端设备本地存储的网页模板数量或网页模板数据总大小是有限制的,可以设定阈值,如最多只能保存100个模板并且总大小不能超过10MB。本发明网页展现方法中,还可以包括终端设备网页模板数据淘汰步骤。例如如果超过阈值,则可根据LRU(Least Recently Used,简称最近最少)即最近最少使用页面置换算法对模板进行淘汰。利用LRU算法可以对最近使用较少,并且在未来较长一段时间不使用的网页模板进行删除,能够节省终端设备的存储资源。
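Likewise, in view of the storage capacity of the web page template server, the web page presentation method of the present invention may also include a step of evicting web page template data on the web page template server.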
用于执行本发明实施例的网页模板生成方法的程序可以存储在计算机可读存储介质中。因而,本发明实施例还提供了一种计算机可读存储介质,该计算机可读存储介质存储有用于执行本发明实施例的网页模板生成方法的程序。相应地,在本发明的一个实施例中,提供了一种具有处理器可执行的程序代码的计算机可读介质,在被执行时,程序代码使得处理器执行下述步骤:采集网页的网页数据;根据所述网页数据生成所述网页的网页模板;根据生成的所述网页模板生成模板索引。
此外,典型地,本发明所述的移动终端可为各种手持终端设备,例如手机、个人数字助理(PDA)等,因此本发明的保护范围不应限定为某种特定类型的移动终端。
此外,根据本发明的方法还可以被实现为由CPU执行的计算机程序。在该计算机程序被CPU执行时,执行本发明的方法中限定的上述功能。
此外,上述方法步骤以及系统单元也可以利用控制器以及用于存储使得控制器实现上述步骤或单元功能的计算机程序的计算机可读存储设备实现。
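In addition, it should be understood that the computer-readable storage devices (for example, memory) described herein may be volatile memory or non-volatile memory, or may include both. By way of example and not limitation, non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which may act as external cache memory. By way of example and not limitation, RAM is available in many forms, such as synchronous RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to include, without being limited to, these and other suitable types of memory.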
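The various illustrative logic blocks, units, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions described herein: a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general-purpose processor may be a microprocessor, but alternatively the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. A software unit may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. In one alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a user terminal. In another alternative, the processor and the storage medium may reside as discrete components in a user terminal.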
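In one or more exemplary designs, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, the latter including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store the required program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer or a general-purpose or special-purpose processor. Also, any connection may properly be termed a computer-readable medium. For example, if software is transmitted from a website, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.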
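Obviously, those skilled in the art should understand that the modules or steps of the present invention described above may be implemented with general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may each be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.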

Claims (28)

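  1. A webpage template generating method, characterized by comprising:
    collecting webpage data of a webpage;
    generating a webpage template of the webpage according to the webpage data; and
    generating a template index according to the generated webpage template.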
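  2. The webpage template generating method according to claim 1, characterized in that, after generating the webpage template of the webpage according to the webpage data, the webpage template generating method further comprises:
    publishing the webpage template and the template index to a plurality of template servers that provide webpage templates;
    storing, by each of the plurality of template servers, the webpage template and the template index; and
    retrieving, by a first template server among the plurality of template servers, a webpage template matching the webpage by using the template index, and providing the template matching the webpage to the template servers other than the first template server among the plurality of template servers.
  3. The webpage template generating method according to claim 2, characterized in that publishing the webpage template and the template index to the plurality of template servers that provide webpage templates comprises:
    after a plurality of the webpage templates and the template indexes are generated,
    calculating an overall difference rate between the set of the plurality of webpage templates and a historical template set;
    determining whether the overall difference rate is greater than a preset overall difference rate threshold;
    if the overall difference rate is determined to be greater than the preset overall difference rate threshold, publishing the webpage template and the template index; and
    if the overall difference rate is determined to be not greater than the preset overall difference rate threshold, not publishing the webpage template and the template index.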
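  4. The webpage template generating method according to claim 1, characterized in that generating the template index according to the generated webpage template comprises:
    selecting templates whose quality meets a predetermined quality condition;
    determining the URL paths to which the templates apply;
    selecting, from the URL paths, the URL paths to which the templates whose quality meets the predetermined quality condition apply; and
    converting the selected paths into the template index.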
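  5. The webpage template generating method according to claim 1, characterized in that, after generating the webpage template of the webpage according to the webpage data, the webpage template generating method further comprises:
    determining whether the number of webpage templates reaches a preset number;
    if the number of webpage templates is determined to reach the preset number, calculating the coverage rate of each webpage template;
    comparing webpage templates whose coverage rate is less than a first preset coverage threshold with webpage templates whose coverage rate is greater than the first preset coverage threshold; and
    if the difference rate between a webpage template below the first preset coverage threshold and a webpage template above the first preset coverage threshold is less than a preset difference rate threshold, merging the webpage template below the first preset coverage threshold with the webpage template above the first preset coverage threshold.
  6. The webpage template generating method according to claim 5, characterized in that comparing webpage templates whose coverage rate is less than the first preset coverage threshold with webpage templates whose coverage rate is greater than the first preset coverage threshold comprises:
    sorting the plurality of webpage templates in descending order of coverage rate; and
    comparing lower-ranked webpage templates with higher-ranked webpage templates.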
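  7. The webpage template generating method according to claim 1, characterized in that generating the template index according to the generated webpage template comprises:
    storing a plurality of the webpage templates;
    calculating the coverage rate of each of the webpage templates;
    determining whether the sum of the coverage rates of the webpage templates under each path reaches a second preset coverage threshold; and
    deleting the webpage templates under any path for which the sum of the coverage rates of the webpage templates does not reach the second preset coverage threshold.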
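  8. The webpage template generating method according to claim 1, characterized in that, after generating the webpage template of the webpage according to the webpage data, the webpage template generating method comprises:
    after obtaining webpage data that a middleware server has fetched and forwarded in response to a webpage browsing request received from a terminal device, obtaining webpage template data corresponding to the webpage data from a webpage template data storage unit in a webpage template server;
    generating, based on the webpage data and the webpage template data, delta data between the webpage data and the webpage template data; and
    forwarding the generated delta data to the terminal device via the middleware server, so that the terminal device presents the requested webpage according to the delta data and the webpage template data that is stored locally on the terminal device and corresponds to the delta data.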
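  9. The method according to claim 8, wherein the webpage browsing request contains a first webpage template ID list, and
    obtaining, from the webpage template data storage unit, the webpage template data corresponding to the webpage data, and generating, based on the webpage data and the webpage template data, the delta data between the webpage data and the webpage template data comprise:
    sequentially obtaining first webpage template IDs from the first webpage template ID list and repeating the following process until the delta data is generated:
    obtaining webpage template data from the webpage template data storage unit based on the currently obtained first webpage template ID, and
    calculating difference data between the webpage data and the webpage template data obtained from the webpage template data storage unit,
    when the compression ratio between the calculated difference data and the webpage data is less than a first predetermined threshold, determining the difference data to be the delta data, and
    when the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, obtaining the next first webpage template ID from the first webpage template ID list as the new currently obtained first webpage template ID.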
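  10. The method according to claim 8, wherein the webpage browsing request contains the webpage address of the requested webpage and a first webpage template ID list, and a second webpage template ID list is stored in association with the webpage address in a webpage template ID list library of the webpage template server,
    obtaining, from the webpage template data storage unit, the webpage template data corresponding to the webpage data, and generating, based on the webpage data and the webpage template data, the delta data between the webpage data and the webpage template data comprise:
    obtaining the corresponding second webpage template ID list from the webpage template ID list library according to the webpage address of the webpage requested for browsing,
    merging the first webpage template ID list and the second webpage template ID list into a third webpage template ID list;
    sequentially obtaining webpage template IDs from the third webpage template ID list and repeating the following process until the delta data is generated:
    obtaining webpage template data from the webpage template data storage unit based on the currently obtained webpage template ID, and
    calculating difference data between the webpage data and the webpage template data obtained from the webpage template data storage unit,
    when the compression ratio between the calculated difference data and the webpage data is less than a first predetermined threshold, determining the difference data to be the delta data, and
    when the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, obtaining the next webpage template ID from the third webpage template ID list as the new currently obtained webpage template ID.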
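  11. The method according to claim 10, wherein merging the first webpage template ID list and the second webpage template ID list into the third webpage template ID list comprises:
    merging the webpage template IDs in the first webpage template ID list and the second webpage template ID list by priority to form the third webpage template ID list, wherein the intersection of the first webpage template ID list and the second webpage template ID list has the highest priority, the remainder of the first webpage template ID list has the next highest priority, and the remainder of the second webpage template ID list has the lowest priority.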
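  12. The method according to claim 10 or 11, further comprising:
    after the delta data is generated, determining, by the webpage template server, whether the currently used webpage template ID belongs to the first webpage template ID list, and
    when the currently used webpage template ID belongs to the first webpage template ID list, returning, by the webpage template server, the generated delta data and the currently used webpage template ID to the middleware server for forwarding to the terminal device via the middleware server,
    when the currently used webpage template ID does not belong to the first webpage template ID list, returning, by the webpage template server, the currently used webpage template ID to the middleware server, and sending, by the middleware server, the received webpage template ID and the webpage data to the terminal device.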
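  13. A webpage template server, characterized by comprising:
    a collection unit configured to collect webpage data of a webpage;
    a generation unit configured to generate a webpage template of the webpage according to the webpage data; and
    an index unit configured to generate a template index according to the generated webpage template.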
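  14. The webpage template server according to claim 13, characterized in that the webpage template server further comprises:
    a publishing unit configured to publish, after the webpage template of the webpage is generated according to the webpage data, the webpage template and the template index to a plurality of template servers that provide webpage templates;
    a storage unit configured to store the webpage template and the template index on each of the plurality of template servers; and
    a template retrieval unit configured to retrieve a webpage template matching the webpage by using the template index and to provide the template matching the webpage to other servers.
  15. The webpage template server according to claim 14, characterized in that the publishing unit comprises:
    a calculation module configured to calculate an overall difference rate between the set of a plurality of the webpage templates and a historical template set;
    a judging module configured to determine whether the overall difference rate is greater than a preset overall difference rate threshold; and
    a publishing module configured to publish the webpage template when the overall difference rate is determined to be greater than the preset overall difference rate threshold, and not to publish the webpage template when the overall difference rate is determined to be not greater than the preset overall difference rate threshold.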
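  16. The webpage template server according to claim 14, characterized in that the index unit comprises:
    a template selection module configured to select templates whose quality meets a predetermined quality condition;
    a template path derivation module configured to determine the URL paths to which the templates apply;
    a template path pruning module configured to select, from the URL paths, the URL paths to which the templates whose quality meets the predetermined quality condition apply; and
    a template index generation module configured to convert the selected paths into the template index.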
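  17. The webpage template server according to claim 14, characterized in that the webpage template server further comprises:
    a judging unit configured to determine, after the webpage template of the webpage is generated according to the webpage data, whether the number of webpage templates reaches a preset number;
    a calculation unit configured to calculate the coverage rate of each webpage template when the number of webpage templates is determined to reach the preset number;
    a comparison unit configured to compare webpage templates whose coverage rate is less than a first preset coverage threshold with webpage templates whose coverage rate is greater than the first preset coverage threshold; and
    a merging unit configured to merge, when the difference rate between a webpage template below the first preset coverage threshold and a webpage template above the first preset coverage threshold is less than a preset difference rate threshold, the webpage template below the first preset coverage threshold with the webpage template above the first preset coverage threshold.
  18. The webpage template server according to claim 16, characterized in that the comparison unit comprises:
    a sorting module configured to sort the plurality of webpage templates in descending order of coverage rate; and
    a comparison module configured to compare lower-ranked webpage templates with higher-ranked webpage templates.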
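  19. The webpage template server according to claim 14, characterized in that the index unit comprises:
    a storage module configured to store a plurality of the webpage templates after the webpage template of the webpage is generated according to the webpage data;
    a calculation module configured to calculate the coverage rate of each of the webpage templates;
    a third judging module configured to determine whether the sum of the coverage rates of the webpage templates under each path reaches a second preset coverage threshold; and
    a deletion module configured to delete the webpage templates under any path for which the sum of the coverage rates of the webpage templates does not reach the second preset coverage threshold.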
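  20. The webpage template server according to claim 13, characterized by further comprising:
    a webpage template data storage unit configured to store webpage template data;
    a webpage template data obtaining unit configured to obtain, from the webpage template data storage unit, webpage template data corresponding to webpage data that a middleware server has fetched and forwarded after receiving a webpage browsing request from a terminal device;
    a delta data generation unit configured to generate, based on the webpage data received from the middleware server and the webpage template data corresponding to that webpage data, delta data between the webpage data and the webpage template data; and
    a sending unit configured to forward the generated delta data to the terminal device via the middleware server, so that the terminal device presents the requested webpage according to the delta data and the webpage template data that is stored locally on the terminal device and corresponds to the delta data.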
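  21. The webpage template server according to claim 20, wherein the webpage browsing request contains a first webpage template ID list, and the webpage template data obtaining unit is configured to sequentially obtain webpage template IDs from the first webpage template ID list and, based on the obtained webpage template ID, obtain webpage template data from the webpage template data storage unit, and
    the delta data generation unit comprises:
    a difference data calculation module configured to calculate difference data between the webpage data and the webpage template data obtained from the webpage template data storage unit; and
    a determination module configured to determine the difference data to be the delta data when the compression ratio between the calculated difference data and the webpage data is less than a first predetermined threshold, and
    when the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, the webpage template data obtaining unit and the delta data generation unit are configured to repeat the processing until the delta data is generated.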
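  22. The webpage template server according to claim 20, wherein the webpage browsing request contains a webpage address and a first webpage template ID list, and the webpage template server comprises:
    a webpage template ID list library configured to store a second webpage template ID list in association with the webpage address;
    a webpage template ID list obtaining module configured to obtain the corresponding second webpage template ID list from the webpage template ID list library according to the webpage address of the webpage requested for browsing,
    a webpage template ID list merging unit configured to merge the first webpage template ID list and the second webpage template ID list into a third webpage template ID list;
    the webpage template data obtaining unit is configured to sequentially obtain webpage template IDs from the third webpage template ID list and, based on the obtained webpage template ID, obtain webpage template data from the webpage template data storage unit, and
    the delta data generation unit comprises:
    a difference data calculation module configured to calculate difference data between the webpage data and the webpage template data obtained from the webpage template data storage unit; and
    a determination module configured to determine the difference data to be the delta data when the compression ratio between the calculated difference data and the webpage data is less than a first predetermined threshold, and
    when the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, the webpage template data obtaining unit and the delta data generation unit are configured to repeat the processing until the delta data is generated.
  23. The webpage template server according to claim 22, wherein the webpage template ID list merging unit is configured to merge the webpage template IDs in the first webpage template ID list and the second webpage template ID list by priority to form the third webpage template ID list, wherein the intersection of the first webpage template ID list and the second webpage template ID list has the highest priority, the remainder of the first webpage template ID list has the next highest priority, and the remainder of the second webpage template ID list has the lowest priority.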
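  24. The webpage template server according to any one of claims 20 to 23, further comprising:
    a delta data saving unit configured to store delta data in association with a webpage template ID and a webpage address; and
    a delta data query unit configured to query the delta data saving unit for the associated delta data according to the webpage template ID and the webpage address, and
    when the delta data query unit does not find the associated delta data, the delta data generation unit is configured to generate the delta data.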
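  25. The webpage template server according to claim 22 or 23, wherein the delta data generation unit further comprises:
    a counting unit configured to count the number of calculations performed by the difference data calculation unit when the compression ratio between the calculated difference data and the webpage data is not less than the first predetermined threshold, and
    when the number of calculations does not exceed a second predetermined threshold, the webpage template data obtaining unit is configured to obtain the next webpage template ID and, based on the next webpage template ID, obtain webpage template data from the webpage template data storage unit, and
    the webpage template server further comprises:
    a delta data generation failure message generating unit configured to generate a delta data generation failure message when the number of calculations exceeds the second predetermined threshold, and
    the sending unit is further configured to return the delta data generation failure message to the middleware server, so that the middleware server, after receiving the delta data generation failure message, returns the webpage data to the terminal device for presentation.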
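  26. The webpage template server according to claim 22 or 23, further comprising:
    a second judging unit configured to determine, after the delta data is generated, whether the webpage template ID currently used by the webpage template data obtaining unit belongs to the first webpage template ID list, and
    when the currently used webpage template ID belongs to the first webpage template ID list, the sending unit is configured to return the generated delta data and the currently used webpage template ID to the middleware server for forwarding to the terminal device via the middleware server,
    when the currently used webpage template ID does not belong to the first webpage template ID list, the sending unit is configured to return the currently used webpage template ID to the middleware server, and the middleware server sends the received webpage template ID and the webpage data to the terminal device.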
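  27. A computer-readable medium having processor-executable program code, characterized in that, when executed, the program code causes a processor to perform the following steps:
    collecting webpage data of a webpage;
    generating a webpage template of the webpage according to the webpage data; and
    generating a template index according to the generated webpage template.
  28. A computer program, characterized by being used to execute the webpage template generating method according to any one of claims 1 to 12.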
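
As an informal illustration of the priority merge recited in claims 11 and 23, the following Python sketch builds the third webpage template ID list from the first and second lists. The function name `merge_template_id_lists` is hypothetical, and the ordering inside each priority tier (here, the order in which IDs appear in their source list) is an assumption, since the claims only fix the order of the three tiers.

```python
from typing import List


def merge_template_id_lists(first: List[str], second: List[str]) -> List[str]:
    """Merge two ID lists: intersection first, then rest of first, then rest of second."""
    first_set = set(first)
    second_set = set(second)
    in_both = [t for t in first if t in second_set]           # highest priority
    only_first = [t for t in first if t not in second_set]    # next priority
    only_second = [t for t in second if t not in first_set]   # lowest priority
    # dict.fromkeys removes repeated IDs while keeping first-seen order.
    return list(dict.fromkeys(in_both + only_first + only_second))


if __name__ == "__main__":
    print(merge_template_id_lists(["a", "b", "c"], ["c", "d", "a"]))
```

IDs known to both the terminal and the server come first because a delta against such a template can be applied by the terminal without transferring the template itself.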
PCT/CN2014/087822 2013-11-26 2014-09-29 Webpage template generating method and server WO2015078231A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/156,753 US10747951B2 (en) 2013-11-26 2016-05-17 Webpage template generating method and server

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201310605106.X 2013-11-26
CN201310612915.3 2013-11-26
CN201310605106.XA CN103685476B (zh) 2013-11-26 2013-11-26 Method for presenting webpages using webpage templates, and webpage template server
CN201310612915.3A CN103605770A (zh) 2013-11-26 2013-11-26 Webpage template generating method and server

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/156,753 Continuation US10747951B2 (en) 2013-11-26 2016-05-17 Webpage template generating method and server

Publications (1)

Publication Number Publication Date
WO2015078231A1 true WO2015078231A1 (zh) 2015-06-04

Family

ID=53198319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/087822 WO2015078231A1 (zh) 2013-11-26 2014-09-29 Webpage template generating method and server

Country Status (2)

Country Link
US (1) US10747951B2 (zh)
WO (1) WO2015078231A1 (zh)

Also Published As

Publication number Publication date
US20160335243A1 (en) 2016-11-17
US10747951B2 (en) 2020-08-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14866268

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 02/08/2016)

122 Ep: pct application non-entry in european phase

Ref document number: 14866268

Country of ref document: EP

Kind code of ref document: A1