WO2017107570A1 - A mobile Web cache optimization method based on HTML5 application cache - Google Patents

A mobile Web cache optimization method based on HTML5 application cache

Info

Publication number
WO2017107570A1
Authority
WO
WIPO (PCT)
Prior art keywords
resource
cache
resources
page
time
Prior art date
Application number
PCT/CN2016/098292
Other languages
English (en)
French (fr)
Inventor
黄罡
刘譞哲
马郓
东帅亮
梅宏
Original Assignee
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学
Priority to US15/514,632 (US20180285470A1)
Publication of WO2017107570A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9574 Browsing optimisation, e.g. caching or content distillation of access to content, e.g. by caching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44568 Immediately runnable code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504 Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45529 Embedded in an application, e.g. JavaScript in a Web browser
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/563 Data redirection of data network streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/18 Information format or content conversion, e.g. adaptation by the network of the transmitted or received information for the purpose of wireless delivery to users or terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates

Definitions

  • The invention is a mobile web cache optimization method based on the HTML5 application cache and belongs to the technical field of software.
  • Web applications are applications developed with web technologies such as HTML, JavaScript, and CSS and accessed through a browser. They are one of the most important forms of software on mobile devices. Compared with traditional personal computers, mobile devices have limited computing power and poor network conditions, so mobile web applications load slowly and consume a lot of data traffic, which seriously harms their user experience. Caching is an important technique for improving the performance of web applications.
  • A web application consists of many web resources. Caching stores downloaded web resources in local storage so that, when a cached resource is requested again, it can be loaded locally. Caching reduces the number of network requests, which reduces the data traffic consumed when the web application is accessed and in turn speeds up loading. Obtaining resources locally also saves computing resources on the mobile device, which suits the lightweight computing that mobile devices require.
  • Traditional web caching is based on the caching mechanism provided by the HTTP protocol.
  • The mechanism provides two models. The expiration model requires the developer to configure an expiration time for a web resource; until that time is reached, the browser can load the resource directly from the cache. The validation model requires the developer to configure an identifier for a web resource, which may be a modification time or a unique identifier.
  • When the resource expires, the browser sends the configured identifier to the server, and the server uses it to determine whether the resource has changed. If there is no change, the server returns only header information; otherwise it returns the updated resource to the browser.
  • In practice, because web developers configure caching improperly and many resources are dynamic, mobile web caching performs poorly: it produces a large number of redundant requests and degrades the performance of mobile web applications.
  • Application Cache is an offline application interface provided by HTML5: web developers can create a Manifest file, declare in it the list of resources that may be cached locally, and attach the Manifest file to the main HTML page of the web application.
  • When the user accesses the web application offline, the resources declared in the Manifest file can be read directly from local storage; when the user accesses it online, the browser automatically checks whether the Manifest file has been updated, and when the Manifest has changed, the browser automatically updates all resources declared in it.
  • The HTML5 application cache thus provides a fine-grained control interface for web application caching. Accordingly, the invention proposes an automated development technique to help developers optimize the caching of mobile web applications.
  • The object of the invention is to provide a method for optimizing the mobile web cache based on the HTML5 application cache. The core idea is as follows: for a mobile web application, the server automatically obtains the update status of the resources the application contains and predicts the update time of each resource, so that a relatively stable set of resources is selected and configured into the Manifest file of the HTML5 application cache, and the Manifest is updated when the content of a resource in the Manifest file changes.
  • On the client side, a JavaScript runtime library is provided for the browser; developers add the runtime library to their mobile web application so that the application can use the HTML5 application cache. The invention lets developers adapt their applications quickly and conveniently.
  • The invention is divided into three main parts:
  • A tool that runs on the server and automatically generates, maintains, and updates Manifest files.
  • The core of the invention is a tool that analyzes the resource data of the mobile web application and maintains the Manifest list, thereby providing an effective caching service to clients.
  • The core tool consists of four steps:
  • Automatic crawling: the tool repeatedly crawls all resources of the given mobile web application at a fixed interval to obtain the resource information at each time point.
  • Resource mapping: the tool maps the URL of each resource to a regular expression.
  • Resources that match the same regular expression are treated as the same resource. That is, for resources with different URLs but the same content (such as a.jpg?123 and a.jpg?345), the server-side crawl reveals that they are the same picture (same content), so a single expression is generated to stand for both resources. Generating one regular expression for the URLs of resources with identical content avoids downloading those resources repeatedly.
  • Time prediction: from the resource information at each time point, the tool learns how resources change and predicts how long each resource will remain unchanged.
  • In detail, the tool first crawls the resources of the target mobile web application at a fixed interval to obtain the resource information at each time point.
  • Using the specified URL and access interval, the tool repeatedly visits and renders the page, parses the resources the page contains, and records resource information such as the size of each resource, the MD5 value of its content, and its cache time configuration.
  • The access interval can be supplied by the developer based on the actual behaviour of the website, or it can be chosen automatically by the tool.
  • Resource mapping: the tool can identify resources whose URLs change dynamically. Many of the resources obtained in the first step are generated dynamically. Such resources have different URLs even when their content is exactly the same, and the tool maps them to the same resource. For example, resources requested dynamically through AJAX often carry an AJAX timestamp while the host name, path, and port number are identical; in the mapping, these timestamped resources are all mapped to one resource. Note that the correspondence between URLs and regular expressions is inherently loose: if the regular expression for a group of URLs is too broad, regular expressions may conflict with one another. By default, the tool uses a fairly strict generation method: it builds the mapping target from the longest common substring of the group of URLs that have the same content but different URLs.
  • The pseudocode of the resource mapping algorithm is as follows:
  • The input of the algorithm is the regularized resource list H_{t-1} at time t-1 and the concrete resource list R_t at time t; the output is the regularized resource list H_t at time t.
  • "Regularized" means that every resource in H can be uniquely identified by a regular expression. The algorithm first initializes (pseudocode lines L1-L4): H_t is initialized to H_{t-1}, and the state of every resource is set to "inexistence" (non-existent). The main part (L5-L20) determines, for each resource r in R_t, the mapping between its URL and the regular expressions in H_t.
  • If no resource in H_t corresponds to r, a new record for r is added to H_t (L12-L15). If exactly one resource in H_t corresponds to r, r is mapped to that resource and the regular expression is recomputed (L8-L11). If several resources in H_t correspond to r, the existing mapping has failed: the existing mappings are deleted and a new record for r is added to H_t (L16-L19).
  • Time prediction: the crawl history is used to predict how long each resource will remain unchanged. Only resources that stay unchanged for a long time bring a worthwhile gain when placed in the application cache; conversely, if resources placed in the application cache change too often, the whole application cache is refreshed constantly, which cancels out the optimization and costs more than it saves.
  • The tool extracts the MD5 of each resource at each time point from the history, obtains the time series of changes, and completes the prediction with linear regression over that time series.
  • The pseudocode of the time prediction algorithm is as follows:
  • The input of the algorithm is the complete state history of one resource. There are three possible historical states: unchanged, changed, and non-existent. By the nature of network resources, a resource that disappears at one time point is unlikely to reappear at the next. Therefore, for a resource whose current state is "non-existent", the algorithm predicts a time of 0 (L1-L3). For other resources, the algorithm uses linear regression to predict the time of change. GDM is a gradient descent algorithm commonly used for linear regression and is an efficient online algorithm (L4-L9). Finally, the algorithm also deletes resources whose predicted time is very short, which reduces the number of resources to process and improves computational efficiency (L10-L12).
  • Resource selection: the tool weighs all properties of a resource to decide which resources are placed in the application cache.
  • The factors that affect whether a resource is cached are the size of the resource, the predicted time it remains unchanged, its cache configuration, and the user distribution of the mobile web application itself. Larger resources and resources that remain stable for a long time usually yield a greater benefit.
  • The cache configuration also matters a great deal: resources already configured with a long cache time work well under the HTTP caching protocol alone; correspondingly, the shorter a resource's own configured cache time, the greater the additional benefit.
  • The distribution of user accesses to the application also influences the selection of resources.
  • The tool weighs all these factors, computes the best resource set, and configures it into the Manifest file of the HTML5 application cache.
  • The pseudocode of the resource selection algorithm is as follows:
  • Because the overall update time of a resource list is determined by its most frequently updated resource, the algorithm enumerates the list by update time from shortest to longest. Given an update time, the traffic saved by placing a resource in the application cache is given by pseudocode line L7.
  • The expression in L7 states that the traffic a resource saves by being placed in the application cache results from the difference between the cache time it is expected to reach once in the application cache and its previous default cache time.
  • That is, the traffic saved by placing a resource in the application cache = (expected cache time - cache time configured for the resource) * resource size. Multiplying this by the user access distribution gives the overall saved traffic.
  • For a given update time T_i, benefit(i) is given by a formula that is not reproduced in this extract, in which σ is the user access distribution function. This allows the benefit of every possible combination to be enumerated (L2-L10). Finally, the algorithm selects the combination with the greatest benefit, that is, the maximum over all benefit(i), and writes its resource set into the Manifest file of the HTML5 application cache.
  • A JavaScript library that runs in the client browser.
  • The tool provides developers with a complete deployment scheme.
  • The deployment is divided into three steps.
  • The first step is to add a call to the JavaScript library to the target page.
  • The second step is to generate a blank page as the proxy page and resolve the URL of the original home page to the proxy page.
  • The original home page becomes a resource requested from the proxy page. This blank page is called a proxy page because it is used to load the resources of the original page.
  • The third step is to run the tool.
  • Calling the JavaScript library in the first step gives the original page the ability to intercept URL requests and obtain cache information.
  • Because of a limitation of the HTML5 application cache, the deployed application page has to be replaced by an automatically generated proxy page, within which the original page is requested as a resource (step 2).
  • The first and second steps are mechanical and can be generated automatically by the tool in one click.
  • The scheme uses the tool of the invention to obtain network resource information simply and effectively; by predicting times in advance it raises the cache hit rate of resources, saves access time, and improves the user experience on mobile devices.
  • Figure 1 is a flow chart of the method of the invention.
  • This section uses the website of the School of Information Science and Technology of Peking University (http://eecs.pku.edu.cn) as an example of applying the caching method.
  • The processing flow is shown in the figure.
  • The website is the portal of the School of Information Science and Technology of Peking University and includes modules such as school news, notices and announcements, academic affairs notices, and lecture information.
  • A proxy page is generated and the URL of the original home page is resolved to the proxy page, so that the original home page becomes a resource requested from the proxy page.
  • The original URL, such as http://eecs.pku.edu.cn, is then accessed.
  • The client first requests the proxy page, and the proxy page then requests all the original resources. If the URLs of some of these resources match regular expressions recorded in the resource list, the previously added JavaScript function automatically replaces the URL and requests the cached resource instead.
  • The server runs the tool automatically.
  • The tool automatically crawls and analyzes the pages and provides and maintains the cached-resource list (Manifest) on the server; the list contains the various information about the resources and is connected to the proxy page through the application cache interface.
  • Users still access the web application through the original URL and get a better experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

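A mobile Web cache optimization method based on the HTML5 application cache. The method is as follows: 1) the server periodically crawls the resource information of a given mobile Web application; 2) resources with identical content but different URLs are mapped to the same resource; 3) a set of stable resources is selected and configured into the cached-resource list, and a resource mapping file is generated at the same time; 4) a JavaScript runtime library is provided, and a call to the library is added to every target page; 5) a proxy page is generated for every target page and the URL of the target page is resolved to the corresponding proxy page; when a target page is then accessed, the resource mapping file is queried for each requested resource, and the matching cached resource is read from the cached-resource list according to the query result and loaded into the proxy page. The method saves access time and data traffic for mobile Web applications and improves the user experience on mobile devices.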

Description

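A Mobile Web Cache Optimization Method Based on HTML5 Application Cache
Technical Field
The invention is a mobile Web cache optimization method based on the HTML5 application cache and belongs to the field of software technology.
Background
Web applications are application software developed with Web technologies such as HTML, JavaScript, and CSS and accessed through a browser; they are one of the most important forms of software on mobile devices. Compared with traditional personal computers, mobile devices have limited computing power and poor network conditions, so mobile Web applications load slowly and consume a lot of data traffic, which seriously harms their user experience. Caching is an important technique for improving the performance of Web applications. A Web application consists of many Web resources; caching stores Web resources that have already been downloaded in local storage, so that when a cached resource is requested again it can be loaded directly from local storage. Caching reduces the number of network requests, which reduces the data traffic consumed when the Web application is accessed and in turn speeds up loading. Obtaining resources locally also saves computing resources on the mobile device, which suits the lightweight computing that mobile devices require.
Traditional Web caching is based on the caching mechanism provided by the HTTP protocol. The mechanism provides two models. The expiration model requires the developer to configure an expiration time for a Web resource; until that time is reached, the browser can load the resource directly from the cache. The validation model requires the developer to configure an identifier for a Web resource, which can be a modification time or a unique identifier; when the resource expires, the browser sends the configured identifier to the server, the server uses it to decide whether the resource has changed, and it returns only header information if there is no change, or the updated resource otherwise. In practice, because Web developers configure caching improperly and because many resources are dynamic, mobile Web caching performs poorly: it causes a large number of redundant requests and degrades the performance of mobile Web applications.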
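The two HTTP caching models described above can be illustrated with a small decision function. This is a minimal sketch, not part of the patent: the `CachedEntry` type and the `plan_request` function are hypothetical names, the identifier is assumed to be an entity tag sent back in an `If-None-Match` header, and times are plain seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedEntry:
    body: bytes
    stored_at: float        # seconds since epoch when the response was cached
    max_age: float          # expiration model: lifetime configured by the developer
    etag: Optional[str]     # validation model: identifier configured by the developer


def plan_request(entry: Optional[CachedEntry], now: float) -> dict:
    """Decide how a browser-like client would obtain a resource."""
    if entry is None:
        return {"action": "fetch", "headers": {}}
    if now - entry.stored_at < entry.max_age:
        # Expiration model: still fresh, load from cache with no network request.
        return {"action": "use_cache", "headers": {}}
    if entry.etag is not None:
        # Validation model: send the identifier; an unchanged resource is
        # answered with headers only (304 Not Modified), a changed one in full.
        return {"action": "revalidate", "headers": {"If-None-Match": entry.etag}}
    return {"action": "fetch", "headers": {}}
```

A resource with a short configured lifetime falls into the `revalidate` or `fetch` branch on almost every visit, which is the redundant traffic the method aims to remove.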
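The development and spread of HTML5 offer a new technical approach to improving the experience of mobile Web applications. The application cache (Application Cache) is an offline application interface provided by HTML5: a Web developer can create a Manifest file that declares the list of resources that may be cached locally, and attach the Manifest file to the main HTML page of the Web application. When the user accesses the Web application offline, the resources declared in the Manifest file can be read directly from local storage; when the user accesses it online, the browser automatically checks whether the Manifest file has been updated, and when the Manifest has changed, the browser automatically updates all resources declared in it. The HTML5 application cache thus provides a fine-grained control interface for Web application caching. The invention therefore proposes an automated development technique to help developers optimize the caching of mobile Web applications.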
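For readers unfamiliar with the Manifest format, the following sketch renders one from a list of URLs. The function name `render_manifest` and the version-comment convention are illustrative assumptions; the patent does not specify how its tool writes the file. The format itself (a first line of `CACHE MANIFEST`, `#` comments, and `CACHE:` and `NETWORK:` sections) is that of the HTML5 application cache.

```python
from typing import Iterable


def render_manifest(urls: Iterable[str], version: str) -> str:
    """Render an Application Cache manifest that caches `urls`."""
    lines = [
        "CACHE MANIFEST",
        f"# version {version}",   # editing this comment changes the file content
        "",
        "CACHE:",
    ]
    lines.extend(sorted(set(urls)))
    # Resources not listed above are still fetched from the network.
    lines += ["", "NETWORK:", "*"]
    return "\n".join(lines) + "\n"
```

Because the browser refreshes every declared resource when the Manifest changes, bumping the version comment is a simple way to trigger an update after a listed resource's content has changed.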
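Summary of the Invention
To address the problems of existing mobile Web application caching, the object of the invention is to provide a method for optimizing the mobile Web cache based on the HTML5 application cache. Its core idea is as follows. For a mobile Web application, the server automatically obtains the update status of the resources the application contains and predicts the update time of each resource, so that a relatively stable set of resources is selected and configured into the Manifest file of the HTML5 application cache, and the Manifest is updated whenever the content of a resource listed in it changes. On the client side, a JavaScript runtime library is provided for the browser; developers add the library to their mobile Web application so that the application can use the HTML5 application cache. The invention lets developers adapt their applications quickly and conveniently.
The invention consists of three main parts:
1. A tool that runs on the server and automatically generates, maintains, and updates the Manifest file.
2. A JavaScript library that runs in the client browser.
3. A deployment scheme.
The core of the invention is a tool that analyzes the resource data of a mobile Web application and maintains the Manifest list, thereby providing an effective caching service to clients. The core tool comprises four steps:
1. Automatic crawling. The tool repeatedly crawls all resources of the given mobile Web application at a fixed interval and obtains the resource information at each time point.
2. Resource mapping. The tool maps the URL of each resource to a regular expression. Resources that match the same regular expression are treated as the same resource. That is, for resources whose URLs differ but whose content is identical (such as a.jpg?123 and a.jpg?345), the server-side crawl reveals that they are the same picture (same content), so a single expression is generated to stand for both resources. Generating one regular expression for the URLs of resources with identical content avoids downloading those resources repeatedly.
3. Time prediction. From the resource information at each time point, the tool learns how resources change and predicts how long each resource will remain unchanged.
4. Resource selection. Based on the prediction results, the tool selects the best set of resources and generates or updates the Manifest configuration file of the HTML5 application cache.
The technical details of these steps are as follows:
1. Automatic crawling. The tool repeatedly crawls the resources of the target mobile Web application at a fixed interval and obtains the resource information at each time point. Using the specified URL and access interval, the tool repeatedly visits and renders the page, parses the resources the page contains, and records resource information such as the size of each resource, the MD5 value of its content, and its cache time configuration. The access interval can be supplied by the developer based on the actual behaviour of the website or chosen automatically by the tool.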
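The per-resource record collected at each crawl can be sketched as follows. The `ResourceSnapshot` type and `take_snapshot` function are hypothetical, and reading the cache time from a `Cache-Control: max-age` header is an assumption about what "cache time configuration" means; fetching and rendering the page is left out.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ResourceSnapshot:
    url: str
    crawled_at: float            # time point of this crawl, in seconds
    size: int                    # resource size in bytes
    md5: str                     # MD5 of the resource content
    max_age: Optional[int]       # configured cache time in seconds, if any


def take_snapshot(url: str, body: bytes, headers: Mapping[str, str],
                  crawled_at: float) -> ResourceSnapshot:
    """Record the per-resource facts named in the text for one crawl."""
    cache_control = ""
    for name, value in headers.items():
        if name.lower() == "cache-control":   # header names are case-insensitive
            cache_control = value
    match = re.search(r"max-age=(\d+)", cache_control)
    return ResourceSnapshot(
        url=url,
        crawled_at=crawled_at,
        size=len(body),
        md5=hashlib.md5(body).hexdigest(),
        max_age=int(match.group(1)) if match else None,
    )
```

Storing one such snapshot per resource per crawl yields the history that the later mapping and prediction steps consume.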
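2. Resource mapping. The tool can identify resources whose URLs change dynamically. Many of the resources obtained in the first step are generated dynamically. Such resources have different URLs even when their content is exactly the same, and the tool maps them to the same resource. For example, resources requested dynamically through AJAX often carry an AJAX timestamp while the host name, path, and port number are identical; in the mapping, these timestamped resources are all mapped to one resource. Note that the correspondence between URLs and regular expressions is inherently loose: if the regular expression for a group of URLs is too broad, regular expressions may conflict with one another. By default the tool uses a fairly strict generation method: it builds the mapping target from the longest common substring of the group of URLs that have the same content but different URLs. The pseudocode of the resource mapping algorithm is as follows:
Figure PCTCN2016098292-appb-000001
The input of the algorithm is the regularized resource list H_{t-1} at time t-1 and the concrete resource list R_t at time t; the output is the regularized resource list H_t at time t. "Regularized" means that every resource in H can be uniquely identified by a regular expression. The algorithm first initializes (pseudocode lines L1-L4): H_t is initialized to H_{t-1}, and the state of every resource is set to "inexistence" (non-existent). The main part (L5-L20) determines, for each resource r in R_t, the mapping between its URL and the regular expressions in H_t. If no resource in H_t corresponds to r, a new record for r is added to H_t (L12-L15). If exactly one resource in H_t corresponds to r, r is mapped to that resource and the regular expression is recomputed (L8-L11). If several resources in H_t correspond to r, the existing mapping has failed: the existing mappings are deleted and a new record for r is added to H_t (L16-L19).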
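Since the pseudocode figure is not reproduced in this text, the following is one possible reading of the mapping step, not the patent's algorithm verbatim. Several points are assumptions: a resource "corresponds" to an entry when their MD5 values are equal or the entry's pattern matches the URL; the pattern is built from a common substring found by folding pairwise longest-common-substring calls; and the names `Entry`, `urls_to_pattern`, and `update_mapping` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


def longest_common_substring(a: str, b: str) -> str:
    """Classic O(len(a) * len(b)) dynamic programme."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def urls_to_pattern(urls: List[str]) -> str:
    """Strict pattern: the shared substring, with wildcards only where needed."""
    common = urls[0]
    for url in urls[1:]:
        common = longest_common_substring(common, url)
    if not common:
        raise ValueError("URLs share no common substring")
    head = "" if all(u.startswith(common) for u in urls) else ".*"
    tail = "" if all(u.endswith(common) for u in urls) else ".*"
    return head + re.escape(common) + tail


@dataclass
class Entry:
    md5: str
    urls: List[str]
    pattern: str
    state: str = "inexistence"


def update_mapping(previous: List[Entry], crawled: Dict[str, str]) -> List[Entry]:
    """Build H_t from H_{t-1} (`previous`) and R_t (`crawled`: URL -> MD5)."""
    current = [Entry(e.md5, list(e.urls), e.pattern, "inexistence") for e in previous]
    for url, md5 in crawled.items():
        hits = [e for e in current if e.md5 == md5 or re.fullmatch(e.pattern, url)]
        if len(hits) == 1:                      # unique match: extend the mapping
            entry = hits[0]
            if url not in entry.urls:
                entry.urls.append(url)
            entry.state = "unchanged" if entry.md5 == md5 else "changed"
            entry.md5 = md5
            entry.pattern = urls_to_pattern(entry.urls)
        else:
            if len(hits) > 1:                   # ambiguous: drop conflicting mappings
                current = [e for e in current if all(e is not h for h in hits)]
            current.append(Entry(md5, [url], re.escape(url), "changed"))
    return current
```

With the example URLs from the text, `a.jpg?123` and `a.jpg?345` collapse into one entry whose pattern also matches a later `a.jpg?999`, while an unrelated URL does not match.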
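3. Time prediction. The crawl history is used to predict how long each resource will remain unchanged. Only resources that stay unchanged for a long time bring a worthwhile gain when configured into the application cache; conversely, if resources placed in the application cache change too often, the whole application cache is refreshed constantly, which cancels out the optimization and costs more than it saves. Technically, the tool extracts the MD5 of each resource at each time point from the history, obtains the time series of changes, and completes the prediction with linear regression over that time series. The pseudocode of the time prediction algorithm is as follows:
Figure PCTCN2016098292-appb-000002
The input of the algorithm is the complete state history of one resource. There are three possible historical states: unchanged, changed, and non-existent. By the nature of network resources, a resource that disappears at one time point is unlikely to reappear at the next, so for a resource whose current state is "non-existent" the algorithm predicts a time of 0 (L1-L3). For other resources, the algorithm uses linear regression to predict the time of change. GDM is a gradient descent algorithm commonly used for linear regression and is an efficient online algorithm (L4-L9). Finally, the algorithm also deletes resources whose predicted time is very short, which reduces the number of resources to process and improves computational efficiency (L10-L12).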
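The prediction step can be sketched as below, under stated assumptions. The text does not define the regression features or the exact GDM variant, so this sketch fits a straight line to the lengths of past unchanged intervals with plain batch gradient descent and extrapolates one interval ahead. The function names, the learning rate, and the epoch count are all illustrative; `None` in the history stands for the "non-existent" state.

```python
from typing import List, Optional, Sequence, Tuple


def change_intervals(md5_history: Sequence[Optional[str]], step: float) -> List[float]:
    """Lengths of the completed runs during which the MD5 stayed the same."""
    intervals: List[float] = []
    run = 0
    previous = None
    for md5 in md5_history:
        if md5 is None:
            continue                     # resource missing at this time point
        if previous is not None and md5 != previous:
            intervals.append(run * step)
            run = 0
        run += 1
        previous = md5
    return intervals


def fit_line(ys: Sequence[float], lr: float = 0.1, epochs: int = 5000) -> Tuple[float, float]:
    """Least-squares line y = a*x + b over x in [0, 1], by batch gradient descent."""
    n = len(ys)                          # callers guarantee n >= 2
    xs = [i / (n - 1) for i in range(n)]
    a = b = 0.0
    for _ in range(epochs):
        grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def predict_unchanged_time(md5_history: Sequence[Optional[str]], step: float) -> float:
    if not md5_history or md5_history[-1] is None:
        return 0.0                       # currently "non-existent": predict 0
    intervals = change_intervals(md5_history, step)
    if not intervals:
        return len(md5_history) * step   # never changed while observed
    if len(intervals) == 1:
        return intervals[0]
    a, b = fit_line(intervals)
    n = len(intervals)
    return max(0.0, a * (n / (n - 1)) + b)   # extrapolate to the next interval
```

The final pruning described in the text would then amount to discarding resources whose predicted time falls below some threshold, whose value the text does not give.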
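4. Resource selection. In this step, the tool weighs all properties of a resource to decide which resources go into the application cache. The factors that affect whether a resource is cached are the size of the resource, the predicted time it remains unchanged, its cache configuration, and the user distribution of the mobile Web application itself. Larger resources and resources that remain stable for a long time usually yield a greater benefit. The cache configuration also matters a great deal: resources already configured with a long cache time work well under the HTTP caching protocol alone; correspondingly, the shorter a resource's own configured cache time, the greater the additional benefit. Finally, the distribution of user accesses to the application also influences the selection. The tool weighs all these factors, computes the best resource set, and configures it into the Manifest file of the HTML5 application cache. The pseudocode of the resource selection algorithm is as follows:
Figure PCTCN2016098292-appb-000003
Because the overall update time of a resource list is determined by its most frequently updated resource, the algorithm enumerates the list by update time from shortest to longest. Given an update time, the transfer traffic saved by placing a resource in the application cache is given by pseudocode line L7. The expression in L7 states that the traffic a resource saves by being placed in the application cache results from the difference between the cache time it is expected to reach once in the application cache and its previous default cache time, that is:
traffic saved by placing a resource in the application cache = (expected cache time - cache time configured for the resource) * resource size
Multiplying the expression above by the user access distribution gives the overall traffic that can be saved. Therefore, for a given update time T_i,
Figure PCTCN2016098292-appb-000004
where σ is the user access distribution function. In this way the benefit of every possible combination can be enumerated and computed (L2-L10). Finally, the algorithm selects the combination with the greatest benefit, that is, the maximum over all benefit(i), and writes the corresponding resource set into the Manifest file of the HTML5 application cache.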
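The enumeration can be sketched as follows. Because the formula figure is not reproduced, the exact form of benefit(i) here is an assumption: the expected cache time of every chosen resource is taken to be T_i, the per-resource saving follows the stated expression, and σ is modelled as a caller-supplied function of T_i. `Candidate` and `select_resources` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    url: str
    size: int               # bytes
    stable_for: float       # predicted time the resource stays unchanged
    configured_ttl: float   # cache time already configured for the resource


def select_resources(candidates: List[Candidate],
                     sigma: Callable[[float], float]) -> Tuple[List[Candidate], float]:
    """Enumerate manifest update times T_i and keep the most beneficial set."""
    ordered = sorted(candidates, key=lambda c: c.stable_for)
    best_set: List[Candidate] = []
    best_benefit = 0.0
    for i, pivot in enumerate(ordered):
        t_i = pivot.stable_for
        # The manifest is refreshed as often as its least stable member,
        # so a manifest updated every T_i can hold resources i..n only.
        chosen = ordered[i:]
        # (expected cache time - configured cache time) * size, per resource.
        saved = sum((t_i - c.configured_ttl) * c.size for c in chosen)
        benefit = sigma(t_i) * saved
        if benefit > best_benefit:
            best_set, best_benefit = chosen, benefit
    return best_set, best_benefit
```

Note that a resource whose configured cache time already exceeds T_i contributes a negative term, which matches the observation in the text that such resources gain little from the application cache.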
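The JavaScript library that runs in the client browser includes:
1. An interface for intercepting page requests and obtaining the requested URLs. When a page calls this interface, the URLs of all requests issued while the page is being parsed are intercepted automatically and compared with the resource list in the application cache. If the cache list contains a regular-expression mapping for the resource, the URL is replaced automatically, which avoids transferring redundant resources.
2. Functions for interacting with the HTML5 application cache, mainly querying cached resources, checking them, and matching regular expressions.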
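The URL replacement performed by the library reduces to a pattern lookup. The patent's library is JavaScript; the lookup is sketched here in Python only to show the logic, and the function names and the shape of `mapping` (regular expression to cached URL) are assumptions.

```python
import re
from typing import Dict, Optional


def resolve_cached_url(request_url: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the cached URL whose pattern matches `request_url`, if any.

    `mapping` maps a regular expression to the URL stored in the cache.
    """
    for pattern, cached_url in mapping.items():
        if re.fullmatch(pattern, request_url):
            return cached_url
    return None


def rewrite(request_url: str, mapping: Dict[str, str]) -> str:
    """Swap in the cached URL when one exists; otherwise keep the request."""
    return resolve_cached_url(request_url, mapping) or request_url
```

A timestamped request such as `a.jpg?999` is thereby redirected to the copy already held in the application cache, while unmatched requests go to the network unchanged.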
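Deployment scheme:
The tool gives developers a complete deployment scheme, in three steps. Step 1: add a call to the JavaScript library to the target page. Step 2: generate a blank page to serve as the proxy page and resolve the URL of the original home page to it; the original home page becomes a resource requested from the proxy page. We call this blank page a proxy page because it is used to load the resources of the original page. Step 3: run the tool. Calling the JavaScript library in step 1 gives the original page the ability to intercept URL requests and obtain cache information. Because of a limitation of the HTML5 application cache, the deployed application page has to be replaced by an automatically generated proxy page, within which the original page is requested as a resource (step 2). Steps 1 and 2 are mechanical and can be generated automatically by the tool in one click.
Note that the URL of the original page must be redirected to the newly generated proxy page. The redirection is needed to work around the drawback that the application cache caches HTML pages. This deployment scheme is the more general one. For websites whose home page is fixed, step 2 of the deployment scheme can be omitted. Both schemes are mechanical and can be generated by the tool in one click or invoked manually by the developer.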
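Generating the proxy page in step 2 might look like the sketch below. Only the `manifest` attribute on the `<html>` element is standard application cache usage; how the proxy page actually loads the original page is not described in the text, so the `data-origin` attribute and the idea that the runtime script reads it are purely hypothetical, as is the function name.

```python
from html import escape


def render_proxy_page(manifest_url: str, runtime_url: str, original_url: str) -> str:
    """Render a blank proxy page that carries the manifest and the runtime library."""
    return (
        "<!DOCTYPE html>\n"
        f'<html manifest="{escape(manifest_url, quote=True)}">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        # Hypothetical convention: the runtime script reads data-origin and
        # requests the original page as an ordinary resource.
        f'  <script src="{escape(runtime_url, quote=True)}"\n'
        f'          data-origin="{escape(original_url, quote=True)}"></script>\n'
        "</head>\n"
        "<body></body>\n"
        "</html>\n"
    )
```

Keeping the proxy page blank means the page pinned by the application cache never carries content of its own, so updates to the original page are not masked by a cached HTML copy.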
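Compared with the prior art, the positive effects of the invention are:
The scheme uses the tool of the invention to obtain network resource information simply and effectively; by predicting times in advance it effectively raises the cache hit rate of resources, saves access time, and improves the user experience on mobile devices.
Brief Description of the Drawings
Figure 1 is a flow chart of the method of the invention.
Detailed Description
This section uses the website of the School of Information Science and Technology of Peking University (http://eecs.pku.edu.cn) as an example of applying the caching method; the processing flow is shown in the figure. The website is the school's portal and contains modules such as school news, notices and announcements, academic affairs notices, and lecture information.
First, a command that calls the JavaScript library is added to the HTML file of the original page; it takes on the task of automatically intercepting URL resolution requests and can interact with the cache list.
Next, the proxy page is generated and the URL of the original home page is resolved to the proxy page, so that the original home page becomes a resource requested from the proxy page. When the original URL, such as http://eecs.pku.edu.cn, is now accessed, the client first requests the proxy page, and the proxy page then requests all the original resources. If the URLs of some of these resources match regular expressions recorded in the resource list, the JavaScript function added earlier automatically replaces the URL and requests the cached resource instead.
Finally, the server runs the tool automatically. The tool crawls and analyzes the pages automatically and provides and maintains the cached-resource list (Manifest) on the server; the list contains the various information about the resources and is connected to the proxy page through the application cache interface.
Users still access the Web application through the original URL and get a better experience.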

Claims (8)

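  1. A mobile Web cache optimization method based on the HTML5 application cache, comprising the steps of:
    1) for a given mobile Web application, periodically crawling, on the server, the resource information of the mobile Web application;
    2) mapping crawled resources that have identical content but different URLs to the same resource;
    3) predicting, from the history of the crawled resources, the time each resource remains unchanged, selecting a set of stable resources and configuring them into the cached-resource list Manifest file of the HTML5 application cache, and at the same time generating a resource mapping file, the resource mapping file storing the mapping between each resource and its corresponding URLs;
    4) providing a JavaScript runtime library, and adding to each target page an instruction that calls the JavaScript runtime library, so that when the client browser accesses the target page, the URL resolution requests of the target page are intercepted automatically; wherein a target page is a page of the given mobile Web application and each target page has a number of resources;
    5) generating a proxy page for each target page and resolving the URL of the target page to the corresponding proxy page; then, when a target page is accessed through the client browser, querying the resource mapping file for the requested resource, and, according to the query result, reading the matching cached resource from the cached-resource list Manifest file and loading it into the proxy page.
  2. The method of claim 1, wherein the resource information comprises the size of the resource, the MD5 value of the resource content, and the cache time configuration of the resource.
  3. The method of claim 2, wherein the MD5 value of each resource at each time point is extracted from the history, a time series of resource changes is obtained, and the time each resource remains unchanged is finally predicted with the GDM algorithm.
  4. The method of claim 1, wherein mapping crawled resources that have identical content but different URLs to the same resource comprises: first, generating the regularized resource list H_t at time t from the regularized resource list H_{t-1} at time t-1 and the concrete resource list R_t at time t; then initializing H_t to H_{t-1} and setting the state of each resource to non-existent; then, for each resource r in the resource list R, if no resource in H_t corresponds to r, adding a record for r to H_t; if exactly one resource in H_t corresponds to r, mapping r into H_t and recomputing the regular expression of r; and if several resources in H_t correspond to r, deleting the existing mappings and adding a new record for r to H_t.
  5. The method of claim 1, wherein a set of resources is selected and configured into the cached-resource list Manifest file according to the size of the resources, the predicted time they remain unchanged, the cache configuration, and the user distribution of the mobile Web application itself.
  6. The method of claim 5, wherein selecting a set of resources and configuring them into the cached-resource list Manifest file comprises: for a given update time T_i of the cached-resource list Manifest, computing the transfer traffic saved by placing a resource in the application cache, then computing the total benefit of each application cache combination, and finally writing the resource set of the combination with the greatest total benefit into the Manifest file of the HTML5 application cache.
  7. The method of claim 6, wherein the traffic saved by placing a resource in the application cache = (expected cache time - cache time configured for the resource) * resource size;
    Figure PCTCN2016098292-appb-100001
    wherein σ is the user access distribution function.
  8. The method of any one of claims 1 to 7, wherein the server updates the Manifest file when the content of a resource in the Manifest file changes.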
PCT/CN2016/098292 2015-12-23 2016-09-07 A mobile Web cache optimization method based on HTML5 application cache WO2017107570A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/514,632 US20180285470A1 (en) 2015-12-23 2016-09-07 A Mobile Web Cache Optimization Method Based on HTML5 Application Caching

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510980489.8 2015-12-23
CN201510980489.8A CN105550338B (zh) 2015-12-23 2015-12-23 A mobile Web cache optimization method based on HTML5 application cache

Publications (1)

Publication Number Publication Date
WO2017107570A1 (zh) 2017-06-29

Family

ID=55829527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/098292 WO2017107570A1 (zh) 2015-12-23 2016-09-07 A mobile Web cache optimization method based on HTML5 application cache

Country Status (3)

Country Link
US (1) US20180285470A1 (zh)
CN (1) CN105550338B (zh)
WO (1) WO2017107570A1 (zh)

Also Published As

Publication number Publication date
CN105550338B (zh) 2018-11-23
CN105550338A (zh) 2016-05-04
US20180285470A1 (en) 2018-10-04

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 15514632

Country of ref document: US

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16877381

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16877381

Country of ref document: EP

Kind code of ref document: A1