WO2011097992A1 - Method, system and front-end server for implementing webpage access - Google Patents

Method, system and front-end server for implementing webpage access

Info

Publication number
WO2011097992A1
WO2011097992A1 PCT/CN2011/070703 CN2011070703W WO2011097992A1 WO 2011097992 A1 WO2011097992 A1 WO 2011097992A1 CN 2011070703 W CN2011070703 W CN 2011070703W WO 2011097992 A1 WO2011097992 A1 WO 2011097992A1
Authority
WO
WIPO (PCT)
Prior art keywords
webpage
data
digital television
receiving terminal
television receiving
Prior art date
Application number
PCT/CN2011/070703
Other languages
English (en)
French (fr)
Inventor
易睿
Original Assignee
深圳市同洲电子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市同洲电子股份有限公司
Publication of WO2011097992A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/8126 Monomedia components thereof involving additional data, e.g. news, sports, stocks, weather forecasts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/435 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
    • H04N21/4355 Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream involving reformatting operations of additional data, e.g. HTML pages on a television screen
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/482 End-user interface for program selection

Definitions

  • The present invention relates to the field of communications, and in particular to a method, a system, and a front-end server for implementing webpage access.
  • Background
  • A digital television receiving terminal such as a set-top box is generally an embedded system with a relatively low-performance CPU, so accessing webpages through it is slow.
  • The object of the present invention is to provide a method, a system and a front-end server for implementing webpage access that establish a transit platform between the digital television receiving terminal and the Internet. The platform can relay Internet webpage content in real time and solves the problems of slow and incomplete webpage access caused by the low CPU speed of the digital television receiving terminal.
  • an embodiment of the present invention discloses a method for implementing webpage access, including: collecting webpage data according to a webpage collection policy;
  • an embodiment of the present invention further discloses a front-end server, including:
  • An acquisition module configured to collect webpage data according to a webpage collection strategy
  • a data processing module configured to perform analysis and processing on the webpage data collected by the collection module, and perform data conversion on the parsed webpage data
  • a sending module configured to send, according to the received webpage access request of the digital television receiving terminal, the webpage data converted by the data processing module to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding webpage according to the webpage data.
  • the embodiment of the invention further discloses a system for implementing webpage access, comprising a digital television receiving terminal, further comprising: a front-end server,
  • the digital television receiving terminal is configured to send a webpage access request to the front-end server, and receive webpage data sent by the front-end server, and perform corresponding webpage display according to the received webpage data;
  • the front-end server is configured to collect webpage data according to a webpage collection policy; to analyze and process the collected webpage data and perform data conversion on the parsed webpage data; and, according to the received webpage access request of the digital television receiving terminal, to send the converted webpage data to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding webpage according to the converted webpage data.
  • The invention establishes a transit platform between the digital television receiving terminal and the Internet. According to the webpage access request sent by the digital television receiving terminal, the transit server collects, processes and converts the webpage data, so that the content of a webpage (including a large webpage) is reduced to functions supported by the browser in the digital television receiving terminal. The browser work of the terminal is thereby moved to the back end, and complex, tedious webpage parsing and processing are performed on the server.
  • This solves the problem of slow webpage access caused by the low CPU speed of the digital television terminal, as well as the problem of incomplete webpage access caused by the weak functionality of the digital television receiving terminal, for example hardware that does not support rmvb decoding or Flash animation playback.
  • DRAWINGS
  • FIG. 1 is a schematic structural diagram of an embodiment of a system for implementing webpage access according to the present invention
  • FIG. 2 is a schematic structural diagram of an embodiment of a front-end server according to the present invention
  • FIG. 3 is a flowchart of a first embodiment of a method for implementing webpage access according to the present invention
  • FIG. 4 is a flowchart of a second embodiment of a method for implementing webpage access according to the present invention.
  • DETAILED DESCRIPTION
  • FIG. 1 is a schematic structural diagram of an embodiment of a system for implementing webpage access according to the present invention.
  • the system for implementing web page access includes: a digital television receiving terminal 10 and a front end server 20.
  • the digital television receiving terminal 10 is configured to send a webpage access request to the front-end server 20, and receive webpage data sent by the front-end server 20, and perform corresponding webpage display according to the received webpage data.
  • The digital television receiving terminal 10 includes, but is not limited to, a set-top box (STB), an IPTV (Internet Protocol Television) terminal, a television mobile phone, a digital television all-in-one machine, and other terminals capable of receiving digital television.
  • the digital television receiving terminal 10 according to the embodiment of the present invention includes an embedded browser to support the function of webpage access.
  • The webpage access request of the digital television receiving terminal 10 includes any one or more of: the model of the digital television receiving terminal 10, the display requirements of the requested webpage, a keyword of the requested webpage, and the URL (Uniform/Universal Resource Locator) of the requested webpage.
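  • The request fields listed above can be pictured as a small record. The following is a minimal sketch, not the patent's wire format: the class name `WebpageAccessRequest` and its field names are hypothetical, and the only rule taken from the text is that a request carries any one or more of the four items.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WebpageAccessRequest:
    """Hypothetical container for the four optional request fields."""

    model: Optional[str] = None                 # model of the receiving terminal
    display_requirement: Optional[str] = None   # display requirements of the page
    keyword: Optional[str] = None               # keyword of the requested page
    url: Optional[str] = None                   # URL of the requested page

    def __post_init__(self) -> None:
        # "Any one or more": reject a request that carries none of the fields.
        if not any((self.model, self.display_requirement, self.keyword, self.url)):
            raise ValueError("a request must carry at least one field")
```

  • A request with only a URL, or only a keyword, is valid under this sketch; an empty request is rejected.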
  • The front-end server 20 is configured to collect the requested webpage data according to a webpage collection policy; to analyze and process the collected webpage data and perform data conversion on the processed webpage data; and, according to the webpage access request of the digital television receiving terminal 10, to send the converted webpage data to the digital television receiving terminal 10, so that the digital television receiving terminal 10 displays the corresponding webpage according to the converted webpage data.
  • The browser of the digital television receiving terminal 10 sends a webpage access request, such as an HTTP (HyperText Transfer Protocol) request, to the front-end server 20 through a television network.
  • The front-end server 20 accesses the website through the Internet, collects the webpage data, and, according to the display requirements of the webpage included in the webpage access request, performs analysis processing, data conversion, and the like on the collected webpage data, so that the data forms a smaller, more compact page suitable for display on the digital television receiving terminal 10. It then transmits the webpage data to the digital television receiving terminal 10 according to the model of the terminal included in the webpage access request.
  • the digital television receiving terminal 10 performs display of the webpage according to the received webpage data.
  • For example, when the digital television receiving terminal 10 is a set-top box, the display function of the television connected to the set-top box may be used to display the final page content to the user and to provide human-computer interaction, so that the user can play audio and video, browse pictures, and so on in the displayed page.
  • As another example, when the digital television receiving terminal 10 is a digital television all-in-one machine or the like, the display unit of the all-in-one machine can be used to display the final page content to the user and to provide human-computer interaction, so that the user can play audio and video, browse pictures, and so on in the displayed page.
  • The invention establishes a transit platform between the digital television receiving terminal and the Internet. According to the webpage access request sent by the digital television receiving terminal, the transit server collects, processes and converts the webpage data, so that the content of a webpage (including a large webpage) is reduced to functions supported by the browser in the digital television receiving terminal. The browser work of the terminal is thereby moved to the back end, and complex, tedious webpage parsing and processing are performed on the server.
  • This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to webpages.
  • the front-end server 20 includes: an acquisition module 201, a data processing module 202, and a sending module 203.
  • the collecting module 201 is configured to collect webpage data according to a webpage collection policy.
  • the front-end server 20 includes a database, which mainly includes: a temporary webpage database, a URL database, a webpage cache database, a content database, a keyword index database, a URL index database, and a behavior database.
  • the collecting module 201 mainly collects corresponding webpage data by loading a webpage collecting program (also referred to as a web crawler).
  • The front-end server 20 may also include a DNS (Domain Name System) cache. When the webpage collection program collects data, it obtains the IP (Internet Protocol) address of the webpage directly from the DNS cache instead of resolving the domain name every time, which reduces resolution time.
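  • The DNS cache described above can be sketched as a dictionary in front of a resolver. This is an illustrative sketch only: the class name `DnsCache` is hypothetical, the default resolver is the standard library's `socket.gethostbyname`, and expiry of cached entries is deliberately left out.

```python
import socket
from typing import Callable, Dict


class DnsCache:
    """Remembers host -> IP lookups so each domain is resolved only once."""

    def __init__(self, resolver: Callable[[str], str] = socket.gethostbyname) -> None:
        self._resolver = resolver          # performs the real DNS lookup
        self._cache: Dict[str, str] = {}   # host name -> IP address

    def lookup(self, host: str) -> str:
        ip = self._cache.get(host)
        if ip is None:                     # cache miss: resolve once and remember
            ip = self._resolver(host)
            self._cache[host] = ip
        return ip
```

  • Passing the resolver as a parameter keeps the sketch testable without network access.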
  • The webpage collection program can be deployed in a distributed manner: the URLs only need to be grouped according to the collection policy, with one webpage collection program assigned to each group. Each webpage collection program then collects webpage data for a different URL group, which effectively avoids repeated collection.
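  • One way to picture the URL grouping is to hash each URL's host name into a fixed number of groups, so that every URL belongs to exactly one collection program. The patent does not say how the groups are formed; hashing by host, the function name `group_urls`, and the parameter `num_crawlers` are assumptions made for this sketch.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def group_urls(urls: Iterable[str], num_crawlers: int) -> Dict[int, List[str]]:
    """Assign every URL to exactly one crawler, keyed by a hash of its host."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or ""
        # A stable digest (unlike the built-in hash()) gives the same group on every run.
        digest = hashlib.sha256(host.encode("utf-8")).digest()
        groups[int.from_bytes(digest[:4], "big") % num_crawlers].append(url)
    return dict(groups)
```

  • Because the groups are disjoint, two collection programs never fetch the same URL.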
  • The collection policy of the collection module 201 may be a breadth-first, depth-first, or linear-priority policy. User behavior data may also be analyzed to obtain information such as the user's frequently used links, frequently used keywords, and access counts. A weighting coefficient is assigned to each URL based on this information, so that the URL database has priorities. For example, the URL of a user's instant request has a larger weighting coefficient and a higher priority.
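  • The weighted URL priorities can be sketched with a heap. The weighting formula below (access count plus a fixed bonus for instant requests) is purely hypothetical, as are the class name `UrlQueue` and its parameters; the patent states only that instant requests receive a larger coefficient.

```python
import heapq
import itertools
from typing import List, Tuple


class UrlQueue:
    """Hands out URLs in descending order of weighting coefficient."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._order = itertools.count()    # tie-breaker: first in, first out

    def push(self, url: str, access_count: int = 0, instant: bool = False) -> None:
        # Hypothetical weighting: access count, plus a large bonus for instant requests.
        weight = access_count + (1000.0 if instant else 0.0)
        # heapq is a min-heap, so the weight is negated to pop the largest first.
        heapq.heappush(self._heap, (-weight, next(self._order), url))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]
```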
  • The collection module 201 loads the webpage collection program according to the priorities of the URL database and the collection policy, extracts the URL of the webpage, and collects the original webpage data, that is, the webpage data requested to be accessed.
  • A record table may be used when collecting webpage data. The record table holds information such as visited, unvisited, and a content summary, so that repeated collection of a webpage can be avoided. The record table works much like the storage and record methods of existing data access and is not described further here.
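  • A record table of this kind might look like the sketch below. The names `Record` and `RecordTable` are hypothetical, and using a SHA-256 digest as the content summary is an assumption; the patent does not say what the summary contains.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Record:
    visited: bool = False
    summary: Optional[str] = None   # assumed here to be a digest of the content


class RecordTable:
    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add(self, url: str) -> None:
        """Register a URL as not yet visited (no-op if already known)."""
        self._records.setdefault(url, Record())

    def needs_collection(self, url: str) -> bool:
        record = self._records.get(url)
        return record is None or not record.visited

    def mark_collected(self, url: str, content: bytes) -> None:
        summary = hashlib.sha256(content).hexdigest()
        self._records[url] = Record(visited=True, summary=summary)
```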
  • the data processing module 202 is configured to perform analysis processing on the webpage data collected by the collection module 201, and perform data conversion on the parsed webpage data.
  • The data processing module 202 analyzes and processes the webpage data by loading a webpage analysis program. The processing specifically includes any one or more of: denoising the webpage data, removing advertisement data, removing navigation bar data, removing unsupported function tags and attribute data, removing JavaScript script data, removing CSS (Cascading Style Sheets) syntax data, and compressing the webpage data.
  • After this processing, the content of the webpage and the basic HTML (HyperText Mark-up Language) tags are preserved, providing high-quality material for subsequent data conversion and index generation.
  • The data processing module 202 then performs data conversion on the resulting webpage data, including any one or more of: image data conversion, audio and video data format conversion, and tube conversion.
  • The converted data is imported into the content database for storage.
  • the webpage data processed by the data processing module 202 is already suitable for display by the browser of the digital television receiving terminal 10.
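  • Part of the analysis step (removing script data, CSS data, and unsupported tags and attributes) can be sketched with the standard library's `html.parser`. The allow-lists below are hypothetical, since the patent does not enumerate which tags a given terminal browser supports, and advertisement and navigation bar removal are not attempted here.

```python
from html import escape
from html.parser import HTMLParser
from typing import List

DROP_WITH_CONTENT = {"script", "style"}   # removed together with their bodies
ALLOWED_TAGS = {"html", "body", "p", "a", "img", "h1", "h2", "br"}   # hypothetical
ALLOWED_ATTRS = {"href", "src", "alt"}    # hypothetical
VOID_TAGS = {"br", "img"}                 # tags that take no closing tag


class PageCleaner(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth += 1
        elif self._skip_depth == 0 and tag in ALLOWED_TAGS:
            kept = "".join(
                f' {name}="{escape(value or "")}"'
                for name, value in attrs if name in ALLOWED_ATTRS
            )
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth == 0 and tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_depth == 0:   # keep text, but not script or style bodies
            self.out.append(escape(data, quote=False))


def clean_page(html_text: str) -> str:
    cleaner = PageCleaner()
    cleaner.feed(html_text)
    cleaner.close()
    return "".join(cleaner.out)
```

  • Text inside an unsupported tag is kept while the tag itself is dropped, which matches the stated goal of preserving page content and basic HTML tags.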
  • The sending module 203 is configured to send the webpage data converted by the data processing module 202 to the digital television receiving terminal 10, so that the digital television receiving terminal 10 displays the corresponding webpage according to the webpage data.
  • The invention establishes a transit platform between the digital television receiving terminal and the Internet. The transit server collects, processes and converts webpage data in advance according to a given webpage collection strategy, so that the content of a webpage (including a large webpage) is reduced to functions supported by the browser in the digital television receiving terminal, and sends the webpage content to the digital television receiving terminal according to the webpage access request sent by that terminal.
  • The browser work of the digital television receiving terminal is thereby moved to the back end, so that complex and cumbersome webpage parsing and processing are performed on the server. This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to webpages.
  • the front-end server 20 may further include: a retrieval module 204, a rearrangement module 205, an index generation module 206, and an update module 207.
  • The retrieval module 204 is configured to check, according to the webpage access request of the digital television receiving terminal 10, whether the requested webpage data exists in the database. If it does, the sending module 203 sends the retrieved webpage data to the digital television receiving terminal 10; if it does not, the collection module 201 collects the requested webpage data.
  • Specifically, the retrieval module 204 first searches the webpage cache database for the requested webpage data, using the keyword of the requested webpage, the URL of the requested webpage, and the like contained in the webpage access request of the digital television receiving terminal 10. If the data is found, it is sent directly to the digital television receiving terminal 10. If not, the retrieval module 204 goes on to search the URL index database for the requested URL; if the URL is found, the corresponding webpage data is located in the content database through the index in the URL index database and sent to the digital television receiving terminal 10.
  • If the requested URL is not found in the URL index database, the index database may be searched for the input keyword. If the keyword is found, the corresponding webpage data is located in the content database through that keyword; if not, the collection module 201 collects the webpage data from the Internet.
  • The index database stores the keywords of webpages, the URL index database stores the URLs of webpages, and the content database stores the corresponding webpage data; the index database, the URL index database, and the content database are associated with one another.
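  • The lookup order described above (webpage cache, then URL index, then keyword index, then fresh collection) can be sketched as follows. Plain dictionaries stand in for the databases, and the class name `FrontEndStore` and the `collect` callback are hypothetical stand-ins for the retrieval module and the collection module 201.

```python
from typing import Callable, Dict, Optional


class FrontEndStore:
    """In-memory stand-ins for the cache, URL index, keyword index and content databases."""

    def __init__(self, collect: Callable[[str], str]) -> None:
        self.page_cache: Dict[str, str] = {}      # URL or keyword -> webpage data
        self.url_index: Dict[str, int] = {}       # URL -> content id
        self.keyword_index: Dict[str, int] = {}   # keyword -> content id
        self.content: Dict[int, str] = {}         # content id -> webpage data
        self._collect = collect                   # stands in for collection module 201

    def retrieve(self, url: Optional[str] = None, keyword: Optional[str] = None) -> Optional[str]:
        # 1. Webpage cache database, by URL or keyword.
        for key in (url, keyword):
            if key is not None and key in self.page_cache:
                return self.page_cache[key]
        # 2. URL index database -> content database.
        if url is not None and url in self.url_index:
            return self.content[self.url_index[url]]
        # 3. Keyword index database -> content database.
        if keyword is not None and keyword in self.keyword_index:
            return self.content[self.keyword_index[keyword]]
        # 4. Nothing stored: fall back to collecting from the Internet.
        return self._collect(url) if url is not None else None
```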
  • The rearrangement module 205 is configured to re-typeset the converted webpage data according to the webpage access request of the digital television receiving terminal 10 after the data processing module 202 has performed data conversion; the sending module 203 then sends the re-typeset webpage data to the digital television receiving terminal 10.
  • Different models of digital television receiving terminal 10 have different display requirements for a webpage. The rearrangement module 205 can therefore re-typeset the converted webpage data according to the model and display requirements of the digital television receiving terminal 10 contained in its webpage access request, so that the data suits the layout of the terminal's browser.
  • For example, according to the display requirements of the terminal, high-definition video data is reformatted into ordinary video data and sent to the digital television receiving terminal 10 for display. As another example, when a terminal of the given model still cannot open a large webpage quickly after the webpage data has been converted, the data can be reformatted into smaller pages and then sent to the digital television receiving terminal 10 for display.
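  • Reformatting a large webpage into smaller pages can be illustrated by greedily packing paragraphs up to a per-model size limit. The model names and character limits below are invented for the sketch; the patent gives no concrete models or sizes.

```python
from typing import Dict, List

# Hypothetical per-model limits; the patent does not list concrete models or sizes.
MAX_PAGE_CHARS: Dict[str, int] = {"basic-stb": 20, "hd-stb": 200}
DEFAULT_MAX_CHARS = 50


def split_for_model(paragraphs: List[str], model: str) -> List[List[str]]:
    """Greedily pack paragraphs into pages no larger than the model's limit."""
    limit = MAX_PAGE_CHARS.get(model, DEFAULT_MAX_CHARS)
    pages: List[List[str]] = []
    current: List[str] = []
    size = 0
    for paragraph in paragraphs:
        # Start a new page when adding this paragraph would exceed the limit.
        if current and size + len(paragraph) > limit:
            pages.append(current)
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)
    if current:
        pages.append(current)
    return pages
```

  • A paragraph longer than the limit still gets a page of its own in this sketch rather than being cut.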
  • the index generating module 206 is configured to generate a keyword index and a URL index according to the parsed webpage data when the data processing module 202 performs data conversion on the webpage data.
  • The index generating module 206 performs the corresponding index generation. In addition to the regular keyword index, a URL index is generated, because webpage access by the digital television receiving terminal 10 is a process of URL access. Therefore, to ensure real-time webpage access, both a keyword index and a URL index are generated and placed in the index database and the URL index database respectively.
  • each database must store some website content (webpage data) that the user frequently visits, such as: Sina, Netease, Sohu, Tencent, and the like.
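  • Generating the two indexes side by side might look like the sketch below. The class name `Indexes` is hypothetical, and the keyword extraction (every lowercase alphanumeric word of three or more characters) is a deliberately naive assumption; the patent does not describe how keywords are chosen.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class Indexes:
    def __init__(self) -> None:
        self.content: Dict[int, str] = {}                           # content id -> page text
        self.url_index: Dict[str, int] = {}                         # URL -> content id
        self.keyword_index: Dict[str, Set[int]] = defaultdict(set)  # keyword -> content ids

    def add_page(self, url: str, text: str) -> int:
        content_id = len(self.content) + 1
        self.content[content_id] = text
        self.url_index[url] = content_id
        # Naive keyword extraction: every lowercase word of 3+ characters.
        for word in re.findall(r"[a-z0-9]{3,}", text.lower()):
            self.keyword_index[word].add(content_id)
        return content_id
```

  • Both indexes point at the same content id, which mirrors the statement that the index databases and the content database are associated with one another.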
  • The update module 207 is configured to update the database according to the webpage access request of the digital television receiving terminal 10.
  • The update module 207 can update the database by loading a behavior analysis program. When a webpage access request of the digital television receiving terminal 10 is received, the behavior analysis program automatically analyzes the request to obtain information such as frequently used links, frequently used keywords, and access counts, redefines the storage policy of the database based on this information, and then updates the database.
  • the URL database is updated in real time according to the analysis of the user's access content, that is, the commonly used link, etc., and the commonly used URL data is stored in the URL database.
  • some web pages are collected (grabbed) through the webpage collection program, and then analyzed by the behavior analysis program, and then the database is updated according to the analysis result, so that the real-time nature of the webpage access system can be ensured.
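  • The behavior analysis can be sketched as counting requests and keeping the most frequent URLs. The class name `BehaviorAnalyzer` and the idea of keeping the top `n` URLs are assumptions; the patent does not specify how the storage policy is derived from the counts.

```python
from collections import Counter
from typing import List, Optional


class BehaviorAnalyzer:
    def __init__(self) -> None:
        self.url_counts: Counter = Counter()       # URL -> number of requests
        self.keyword_counts: Counter = Counter()   # keyword -> number of requests

    def record(self, url: Optional[str] = None, keyword: Optional[str] = None) -> None:
        """Count one webpage access request."""
        if url:
            self.url_counts[url] += 1
        if keyword:
            self.keyword_counts[keyword] += 1

    def common_urls(self, n: int) -> List[str]:
        """The n most requested URLs, candidates for the URL database."""
        return [url for url, _ in self.url_counts.most_common(n)]
```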
  • The invention establishes a transit platform between the digital television receiving terminal and the Internet. According to the webpage access request sent by the digital television receiving terminal, the transit server collects, processes and converts the webpage data, so that the content of a webpage (including a large webpage) is reduced to functions supported by the browser in the digital television receiving terminal. The browser work of the terminal is thereby moved to the back end, and complex, tedious webpage parsing and processing are performed on the server.
  • This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to webpages.
  • FIG. 3 is a flowchart of a first embodiment of a method for implementing webpage access according to the present invention.
  • the method includes:
  • The webpage data collection strategy may be a breadth-first, depth-first, or linear-priority policy. User behavior data may also be analyzed to obtain information such as the user's frequently used links, frequently used keywords, and access counts. A weighting coefficient is assigned to each URL based on this information, so that the URL database has priorities. For example, the URL of a user's instant request has a larger weighting coefficient and a higher priority.
  • The webpage collection program is loaded, the URL of the webpage is extracted, and the original webpage data, that is, the webpage data requested to be accessed, is collected.
  • A record table may be used when collecting webpage data. The record table holds information such as visited, unvisited, and a content summary, so that repeated collection of a webpage can be avoided. The record table works much like the storage and record methods of existing data access and is not described further here.
  • The collected webpage data is analyzed and processed, including one or more of: denoising the webpage data, removing advertisement data, removing navigation bar data, removing unsupported function tags and attribute data, removing JavaScript script data, removing CSS syntax data, and compressing the webpage data.
  • The data conversion includes one or more of: image data conversion, audio and video data format conversion, and tube conversion.
  • The webpage access request of the digital television receiving terminal includes: the model of the digital television receiving terminal, the display requirements of the requested webpage, a keyword of the requested webpage, and the URL of the requested webpage.
  • The invention establishes a transit platform between the digital television receiving terminal and the Internet. According to the webpage access request sent by the digital television receiving terminal, the transit server collects, processes and converts the webpage data, so that the content of a webpage (including a large webpage) is reduced to functions supported by the browser in the digital television receiving terminal. The browser work of the terminal is thereby moved to the back end, and complex, tedious webpage parsing and processing are performed on the server.
  • This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to webpages.
  • FIG. 4 is a flowchart of a second embodiment of a method for implementing webpage access according to the present invention.
  • the method includes:
  • S201: the digital television receiving terminal issues a webpage access request.
  • S202: the database is searched for the requested webpage data; if the search result is yes, step S206 is performed; if the search result is no, step S203 is performed.
  • Specifically, the front-end server first searches the webpage cache database for the requested webpage data, using the keyword of the requested webpage, the URL of the requested webpage, and the like contained in the webpage access request. If the data is found, step S206 is performed. If not, S202 goes on to search the URL index database for the requested URL; if the URL is found, the corresponding webpage data is located in the content database through the index in the URL index database, and step S206 is performed.
  • If the requested URL is not found in the URL index database, the index database may be searched for the input keyword. If the keyword is found, the corresponding webpage data is located in the content database through that keyword, and step S206 is performed; if not, step S203 is performed.
  • The index database stores the keywords of webpages, the URL index database stores the URLs of webpages, and the content database stores the corresponding webpage data; the index database, the URL index database, and the content database are associated with one another.
  • S202 must ensure that, when the requested webpage data is found in the database and used to answer the webpage access request, the retrieved webpage data is the latest webpage data (that is, the page data is not out of date).
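  • The patent does not say how freshness is checked. One common approach, assumed here purely for illustration, is to store the collection time with each page and treat copies older than a maximum age as out of date; the names `StoredPage`, `is_fresh`, and `max_age_seconds` are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredPage:
    data: str
    collected_at: float   # seconds since the epoch


def is_fresh(page: StoredPage, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """True if the stored copy is younger than max_age_seconds."""
    current = time.time() if now is None else now
    return current - page.collected_at < max_age_seconds
```

  • Under this assumption, a stale page would be treated like a miss and collected again in S203.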
  • S203 mainly collects the corresponding webpage data by loading a webpage collection program (also called a web crawler).
  • When the webpage collection program collects data, the IP address of the webpage is obtained directly from the DNS cache, so the domain name does not need to be resolved every time, which reduces resolution time.
  • The webpage collection program can be deployed in a distributed manner: the URLs only need to be grouped according to the collection policy, with one webpage collection program assigned to each group. Each webpage collection program then collects webpage data for a different URL group, which effectively avoids repeated collection.
  • S203 loads the webpage collection program according to the priorities of the URL database and the collection policy, extracts the URL of the webpage, and collects the original webpage data, that is, the webpage data requested to be accessed.
  • A record table may be used when collecting webpage data. The record table holds information such as visited, unvisited, and a content summary, so that repeated collection of a webpage can be avoided. The record table works much like the storage and record methods of existing data access and is not described further here.
  • S204 analyzes and processes the webpage data by loading the webpage analysis program. The processing specifically includes any one or more of: denoising the webpage data, removing advertisement data, removing navigation bar data, removing unsupported function tags and attribute data, removing JavaScript script data, removing CSS syntax data, and compressing the webpage data.
  • the content of the web page and the basic HTML tags are preserved, providing high-quality material for subsequent data conversion and index generation.
  • S204 then performs data conversion on the resulting webpage data, including any one or more of: image data conversion, audio and video data format conversion, and tube conversion.
  • The converted data is imported into the content database for storage.
  • the webpage data processed by S204 described above is already suitable for display by the browser of the digital television receiving terminal.
  • S205 re-typesets the converted webpage data according to the model and display requirements of the digital television receiving terminal, so that the data suits the layout of the terminal's browser. For example, according to the display requirements of the digital television receiving terminal, high-definition video data is reformatted into ordinary video data. As another example, when a terminal of the given model still cannot open a large webpage quickly after the webpage data has been converted, the data can be reformatted into smaller pages.
  • the digital television receiving terminal displays the corresponding webpage according to the webpage data.
  • The method of the second embodiment of the present invention may further include:
  • S208: generating a keyword index and a URL index from the web page data analyzed and processed in S204.
  • S208 performs the corresponding index generation process. In addition to conventional keyword index generation, a URL index is also generated, because web page access by the digital television receiving terminal is a URL access process. To keep web page access real-time, both a keyword index and a URL index are therefore needed.
  • The generated keyword index and URL index are placed in the index database and the URL index database, respectively. To speed up access, the databases must store some website content (web page data) that users visit frequently, such as Sina, NetEase, Sohu, and Tencent.
  • The method of the second embodiment of the present invention may further include a database update process, specifically S209: updating the database according to the web page access request of S201 and/or the keyword index and URL index generated in S208.
  • S209 updates the database according to the web page access request of S201.
  • S209 can update the database by loading a behavior analysis program.
  • The behavior analysis program automatically analyzes the request, extracting information such as frequently used links, frequently used keywords, and access counts; based on this information, the database storage policy is redefined and the database is then updated.
  • The web page collection program randomly collects (crawls) some web pages, the behavior analysis program analyzes them, and the database is then updated according to the analysis result. In this way, the real-time behavior of the web page access system can be guaranteed.
  • Apart from S201 and S207, which are executed by the digital television receiving terminal, all remaining processing steps are executed by the front-end server.
  • The invention sets up a relay platform between the digital television receiving terminal and the Internet. According to the web page access request sent by the terminal, the relay server collects, processes, and converts the web page data and simplifies the web page (including large web pages).
  • The simplified web page content uses only functions that the browser in the digital television receiving terminal supports, and the browser's work is moved to the back end, so that complex and tedious web page parsing and processing are performed on the server.
  • This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to web pages.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Transfer Between Computers (AREA)

Description

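Method, System, and Front-End Server for Implementing Web Page Access
This application claims priority to Chinese Patent Application No. 201010112317.6, filed with the Chinese Patent Office on February 9, 2010 and entitled "Method, System, and Front-End Server for Implementing Web Page Access", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of communications, and in particular to a method, a system, and a front-end server for implementing web page access.
Background
With the development of network technology and the diversification of the functions of digital television receiving terminals, using a digital television receiving terminal to access web pages on the Internet has become easy and common. An existing digital television receiving terminal (for example, a set-top box) accesses web pages on websites directly through its own embedded browser.
In the course of implementing the present invention, the inventor found that this existing web page access scheme has the following defects:
1. Because digital television receiving terminals such as set-top boxes are generally embedded systems with low CPU performance, web pages are accessed through them slowly.
2. Because some digital television receiving terminals such as set-top boxes have limited capabilities (for example, the hardware does not support rmvb decoding, and the software does not support functions such as converting animations for Flash playback), the full functionality of a web page cannot be accessed.
Summary of the Invention
An objective of the present invention is to provide a method, a system, and a front-end server for implementing web page access that set up a relay platform between the digital television receiving terminal and the Internet and simplify Internet web page content on the fly, thereby solving the slow access and incomplete functionality caused by the low CPU speed of the digital television receiving terminal.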
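To achieve the above objective, an embodiment of the present invention discloses a method for implementing web page access, comprising:
collecting web page data according to a web page collection policy;
analyzing and processing the collected web page data, and performing data conversion on the analyzed and processed web page data; and
sending, according to a received web page access request from a digital television receiving terminal, the converted web page data to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding web page according to the converted web page data.
Correspondingly, an embodiment of the present invention further discloses a front-end server, comprising:
a collection module, configured to collect web page data according to a web page collection policy;
a data processing module, configured to analyze and process the web page data collected by the collection module, and to perform data conversion on the analyzed and processed web page data; and
a sending module, configured to send, according to a received web page access request from a digital television receiving terminal, the web page data converted by the data processing module to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding web page according to the web page data.
Correspondingly, an embodiment of the present invention further discloses a system for implementing web page access, comprising a digital television receiving terminal and further comprising a front-end server, wherein:
the digital television receiving terminal is configured to send a web page access request to the front-end server, receive the web page data sent by the front-end server, and display the corresponding web page according to the received web page data; and
the front-end server is configured to collect web page data according to a web page collection policy; analyze and process the collected web page data, and perform data conversion on the analyzed and processed web page data; and send, according to the received web page access request from the digital television receiving terminal, the converted web page data to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding web page according to the converted web page data.
By setting up a relay platform between the digital television receiving terminal and the Internet, the present invention has the relay server collect, process, and convert web page data according to the web page access request sent by the digital television receiving terminal and simplify web pages (including large web pages). The simplified web page content uses only functions that the browser in the digital television receiving terminal supports, and the browser's work is moved to the back end, so that complex and tedious web page parsing and processing are all performed on the server. This solves the slow web page access caused by the low CPU speed of the digital television terminal, as well as the inability to access a web page completely because of the terminal's limited capabilities (for example, hardware that does not support rmvb decoding and software that does not support functions such as converting animations for Flash playback).
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of an embodiment of the system for implementing web page access according to the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of the front-end server according to the present invention;
FIG. 3 is a flowchart of a first embodiment of the method for implementing web page access according to the present invention;
FIG. 4 is a flowchart of a second embodiment of the method for implementing web page access according to the present invention.
Detailed Description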
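The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Refer to FIG. 1, a schematic structural diagram of an embodiment of the system for implementing web page access according to the present invention. The system includes a digital television receiving terminal 10 and a front-end server 20.
The digital television receiving terminal 10 is configured to send a web page access request to the front-end server 20, receive the web page data sent by the front-end server 20, and display the corresponding web page according to the received web page data.
In a specific implementation, the digital television receiving terminal 10 includes, but is not limited to, terminals capable of receiving digital television such as a set-top box (STB), an IPTV (Internet Protocol Television) terminal, a television-capable mobile phone, and an integrated digital television. Specifically, the digital television receiving terminal 10 in the embodiments of the present invention includes an embedded browser to support web page access.
Specifically, the web page access request of the digital television receiving terminal 10 includes any one or more of: the model of the digital television receiving terminal 10, the display requirements for the requested web page, keywords of the requested web page, and the URL (Uniform Resource Locator) of the requested web page.
The front-end server 20 is configured to collect the requested web page data according to a web page collection policy; analyze and process the collected web page data, and perform data conversion on the analyzed and processed web page data; and send, according to the web page access request of the digital television receiving terminal 10, the converted web page data to the digital television receiving terminal 10, so that the digital television receiving terminal 10 displays the corresponding web page according to the converted web page data.
In a specific implementation, the browser of the digital television receiving terminal 10 sends a web page access request, for example an HTTP (HyperText Transfer Protocol) request, to the front-end server 20 over the television network. The front-end server 20 accesses the website over the Internet, collects the web page data, and, according to the display requirements and other information included in the request, analyzes, processes, and converts the collected data into a smaller, simplified page suitable for display on the digital television receiving terminal 10. It then sends the web page data to the digital television receiving terminal 10 according to the terminal model included in the request. The digital television receiving terminal 10 displays the web page according to the received web page data. For example, when the digital television receiving terminal 10 is a set-top box or similar device, the display function of the television set connected to the set-top box can be used to present the final page content to the user and to provide human-computer interaction, with the user performing operations such as audio and video playback and picture browsing on the displayed page. As another example, when the digital television receiving terminal 10 is an integrated digital television or similar device, the display unit of the integrated television can be used to present the final page content to the user and to provide the same human-computer interaction.
By setting up a relay platform between the digital television receiving terminal and the Internet, the present invention has the relay server collect, process, and convert web page data according to the web page access request sent by the digital television receiving terminal and simplify web pages (including large web pages). The simplified web page content uses only functions that the browser in the digital television receiving terminal supports, and the browser's work is moved to the back end, so that complex and tedious web page parsing and processing are all performed on the server. This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to web pages.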
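To describe the present invention more clearly, the front-end server is described in detail below.
Refer to FIG. 2, a schematic structural diagram of an embodiment of the front-end server according to the present invention. The front-end server 20 includes a collection module 201, a data processing module 202, and a sending module 203.
The collection module 201 is configured to collect web page data according to a web page collection policy.
In a specific implementation, the front-end server 20 includes databases, mainly a temporary web page database, a URL database, a web page cache database, a content database, a keyword index database, a URL index database, and a behavior database. The collection module 201 collects web page data mainly by loading a web page collection program (also called a web crawler).
The front-end server 20 may further include a DNS (Domain Name System) cache. When the web page collection program collects pages, it obtains the IP (Internet Protocol) address of a web page directly from the DNS cache instead of resolving the domain name every time, which reduces resolution time.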
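The DNS cache described above can be illustrated with a minimal sketch. The class name `DnsCache`, the injectable `resolver` parameter, and the time-to-live value are assumptions made for illustration; the specification says only that resolved addresses are kept so that the crawler does not resolve a domain name on every fetch.

```python
import socket
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Hypothetical in-memory DNS cache: host name -> (IP address, expiry time)."""

    def __init__(self,
                 resolver: Callable[[str], str] = socket.gethostbyname,
                 ttl_seconds: float = 300.0) -> None:
        self._resolver = resolver      # real lookups go through this callable
        self._ttl = ttl_seconds        # assumed lifetime of a cached entry
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, host: str) -> str:
        now = time.monotonic()
        cached = self._entries.get(host)
        if cached is not None and cached[1] > now:
            return cached[0]           # cache hit: no resolution needed
        ip = self._resolver(host)      # cache miss: resolve once, then remember
        self._entries[host] = (ip, now + self._ttl)
        return ip
```

Passing the resolver in as a parameter keeps the sketch testable without network access; a deployment would simply use the default `socket.gethostbyname`.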
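In addition, the web page access programs can be deployed in a distributed manner: the URLs are grouped according to the collection policy and one web page access program is assigned to each group. Each program then collects web page data for its own URL group, which effectively avoids repeated collection.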
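One possible way to form such URL groups is to hash the host name, so that every URL of a site lands in the same group and no two programs fetch the same page. This is a sketch under that assumption; the specification does not say how the groups are formed, and `assign_group` and `partition_urls` are hypothetical names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List
from urllib.parse import urlsplit


def assign_group(url: str, group_count: int) -> int:
    """Map a URL to a group number based on a stable hash of its host name."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % group_count


def partition_urls(urls: Iterable[str], group_count: int) -> Dict[int, List[str]]:
    """Split URLs into disjoint groups, one group per collection program."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for url in urls:
        groups[assign_group(url, group_count)].append(url)
    return dict(groups)
```

Because the hash depends only on the host, adding a new URL never moves existing URLs between groups as long as `group_count` stays fixed.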
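Specifically, the collection policy of the collection module 201 may be breadth-first, depth-first, linear-first, or a similar policy. User behavior data may also be analyzed to obtain information such as the user's frequently used links, frequently used keywords, and access counts, and weighting coefficients for URLs are set according to this information. Entries in the URL database therefore have different priorities; for example, a user's immediate request has a URL with a larger weighting coefficient and thus a higher priority. According to the priorities in the URL database and the collection policy, the collection module 201 loads the web page collection program, extracts the URLs of web pages, and collects the original web page data, that is, the web page data requested for access. In a specific implementation, a record table may be used during collection. The record table holds information such as visited, unvisited, and content summary, which also helps avoid repeated collection of web pages. The record-table method is similar to existing methods of storing and recording in data access and is not described further here.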
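The weighted URL priorities and the record table can be sketched together as a crawl frontier. The class `UrlFrontier` and its method names are hypothetical, and the content-summary column of the record table is omitted; the sketch shows only that higher-weighted URLs are collected first and that a URL already recorded is never queued twice.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class UrlFrontier:
    """Hypothetical crawl frontier: highest weight first, each URL at most once."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()   # tie-breaker keeps insertion order
        self._record: Dict[str, str] = {}   # record table: URL -> "unvisited" / "visited"

    def add(self, url: str, weight: float) -> bool:
        """Queue a URL unless the record table already knows it."""
        if url in self._record:
            return False
        self._record[url] = "unvisited"
        # heapq is a min-heap, so the weight is negated to pop the largest first.
        heapq.heappush(self._heap, (-weight, next(self._counter), url))
        return True

    def next_url(self) -> Optional[str]:
        """Return the highest-priority unvisited URL and mark it visited."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        self._record[url] = "visited"
        return url
```

A user's immediate request would be added with a large weight, while URLs found by background crawling would receive smaller weights derived from behavior analysis.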
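The data processing module 202 is configured to analyze and process the web page data collected by the collection module 201, and to perform data conversion on the analyzed and processed web page data.
In a specific implementation, the data processing module 202 analyzes and processes the web page data by loading a web page analysis program. The processing includes any one or more of: denoising the web page data, removing advertisement data, removing navigation bar data, removing unsupported function tags and attribute data, removing JavaScript script data, removing CSS (Cascading Style Sheets) syntax data, and compressing the web page data. After this processing, the substantive content of the web page and the basic HTML (HyperText Markup Language) tags are retained, providing high-quality material for the subsequent data conversion and index generation.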
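Part of this analysis step can be sketched with Python's standard `html.parser` module. The tag and attribute whitelists are assumptions chosen for illustration (the specification does not list which tags a terminal supports), and advertisement detection and compression are left out; the sketch drops scripts, style sheets, navigation bars, unsupported tags, and unsupported attributes while keeping text and basic tags.

```python
from html import escape
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Assumed lists; a real deployment would derive them from the terminal's browser.
DROPPED_ELEMENTS = {"script", "style", "nav", "iframe", "object"}
KEPT_TAGS = {"html", "body", "h1", "h2", "h3", "p", "a", "img", "ul", "ol", "li", "br"}
KEPT_ATTRS = {"href", "src", "alt"}
VOID_TAGS = {"br", "img"}


def _format_attrs(attrs: List[Tuple[str, Optional[str]]]) -> str:
    parts = []
    for name, value in attrs:
        if name in KEPT_ATTRS:
            parts.append(' {}="{}"'.format(name, escape(value or "")))
    return "".join(parts)


class PageSimplifier(HTMLParser):
    """Keeps text and basic tags; removes dropped elements with their content."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._out: List[str] = []
        self._skip_depth = 0  # greater than 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_ELEMENTS:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in KEPT_TAGS:
            return  # unsupported tag: drop the tag itself, keep its text
        self._out.append("<{}{}>".format(tag, _format_attrs(attrs)))

    def handle_endtag(self, tag):
        if tag in DROPPED_ELEMENTS:
            if self._skip_depth:
                self._skip_depth -= 1
            return
        if self._skip_depth or tag not in KEPT_TAGS or tag in VOID_TAGS:
            return
        self._out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self._skip_depth:
            self._out.append(escape(data, quote=False))

    def result(self) -> str:
        return "".join(self._out)


def simplify_html(source: str) -> str:
    parser = PageSimplifier()
    parser.feed(source)
    parser.close()
    return parser.result()
```

Inline `style` and event-handler attributes disappear because only whitelisted attributes are re-emitted, which corresponds to removing CSS and script data at the attribute level.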
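After the above analysis and processing, the data processing module 202 performs data conversion on the resulting web page data, including any one or more of picture data conversion, audio and video data format conversion, and simplification conversion. The converted data is imported into the content database for storage.
The web page data processed by the data processing module 202 is already reasonably well suited to display by the browser of the digital television receiving terminal 10.
The sending module 203 is configured to send the web page data converted by the data processing module 202 to the digital television receiving terminal 10, so that the digital television receiving terminal 10 displays the corresponding web page according to the web page data.
By setting up a relay platform between the digital television receiving terminal and the Internet, the present invention has the relay server collect, process, and convert web page data in advance according to a given web page collection policy and simplify web pages (including large web pages). The simplified web page content uses only functions that the browser in the digital television receiving terminal supports, and it is sent to the digital television receiving terminal according to the web page access request sent by the terminal. The browser's work is moved to the back end, so that complex and tedious web page parsing and processing are all performed on the server. This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to web pages.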
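Referring again to FIG. 2, the front-end server 20 may further include a retrieval module 204, a re-layout module 205, an index generation module 206, and an update module 207.
The retrieval module 204 is configured to search the databases, according to the web page access request of the digital television receiving terminal 10, for the requested web page data. If the search result is yes, the sending module 203 sends the retrieved web page data to the digital television receiving terminal 10; if the search result is no, the collection module 201 collects the requested web page data.
In a specific implementation, using the keywords, the URL, and other information about the requested web page included in the web page access request of the digital television receiving terminal 10, the retrieval module 204 first searches the web page cache database for the requested web page data. If the data is there, it is sent directly to the digital television receiving terminal 10. If not, the retrieval module 204 goes on to search the URL index database for the requested URL. If the requested URL is found, the corresponding web page data is located in the content database through the index in the URL index database and sent to the digital television receiving terminal 10. If the requested URL is not found in the URL index database, the index database may also be searched using the entered keywords. If a matching keyword exists, the corresponding web page data is located in the content database directly from the keyword in the index database; if not, the collection module 201 collects the web page data from the Internet.
Specifically, the index database stores the keywords of web pages, the URL index database stores the URLs of web pages, and the content database stores the corresponding web page data; the index database, the URL index database, and the content database are associated with one another.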
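The lookup order described above (web page cache, then URL index, then keyword index, then collection) can be sketched as follows. The `Databases` container and its dictionary layout are assumptions for illustration; the specification states only that the three databases are associated with one another.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Databases:
    """Hypothetical in-memory stand-ins for the databases named in the text."""
    page_cache: Dict[str, str] = field(default_factory=dict)           # URL -> ready page
    url_index: Dict[str, int] = field(default_factory=dict)            # URL -> content id
    keyword_index: Dict[str, List[int]] = field(default_factory=dict)  # keyword -> content ids
    content: Dict[int, str] = field(default_factory=dict)              # content id -> page data


def retrieve(db: Databases, url: Optional[str], keywords: Iterable[str]) -> Optional[str]:
    """Return stored page data, or None when the page must be collected."""
    # 1. Web page cache database.
    if url is not None and url in db.page_cache:
        return db.page_cache[url]
    # 2. URL index database, which points into the content database.
    if url is not None and url in db.url_index:
        return db.content.get(db.url_index[url])
    # 3. Keyword index database, which also points into the content database.
    for keyword in keywords:
        for content_id in db.keyword_index.get(keyword, []):
            if content_id in db.content:
                return db.content[content_id]
    # 4. Nothing stored: the caller falls back to the collection module.
    return None
```

Returning `None` rather than raising keeps the fallback to collection an ordinary branch in the caller.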
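The re-layout module 205 is configured to lay out the converted web page data, after the data processing module 202 has performed data conversion on the web page data, according to the web page access request of the digital television receiving terminal 10. The laid-out web page data is sent to the digital television receiving terminal 10 by the sending module 203.
In a specific implementation, because digital television receiving terminals 10 come in different models with different display requirements for web pages, the re-layout module 205 re-lays out the converted web page data according to the terminal model and display requirements included in the web page access request of the digital television receiving terminal 10, so that the data suits the layout and display of the terminal's browser. For example, according to the display request of the digital television receiving terminal 10, high-definition video data is re-laid out as ordinary video data and sent to the terminal for display. As another example, when a terminal of the given model still cannot open a large web page quickly after the web page data has been converted, the data can be re-laid out into smaller pages and then sent to the digital television receiving terminal 10 for display.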
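Splitting a large page into smaller pages for a weaker terminal model might look like the sketch below. The model names, the byte budgets in `MAX_PAGE_BYTES`, and the idea of splitting at block boundaries are all hypothetical; the specification says only that the page is re-laid out into smaller pages according to the terminal model.

```python
from typing import Dict, List

# Hypothetical per-model page budgets in bytes.
MAX_PAGE_BYTES: Dict[str, int] = {"stb-basic": 2048, "stb-hd": 16384}
DEFAULT_MAX_PAGE_BYTES = 4096


def paginate(blocks: List[str], model: str) -> List[List[str]]:
    """Group content blocks (for example paragraphs) into pages that fit the model."""
    limit = MAX_PAGE_BYTES.get(model, DEFAULT_MAX_PAGE_BYTES)
    pages: List[List[str]] = []
    current: List[str] = []
    size = 0
    for block in blocks:
        block_size = len(block.encode("utf-8"))
        # Start a new page when adding this block would exceed the budget.
        if current and size + block_size > limit:
            pages.append(current)
            current, size = [], 0
        current.append(block)
        size += block_size
    if current:
        pages.append(current)
    return pages
```

A block larger than the budget still gets a page of its own, so no content is lost.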
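The index generation module 206 is configured to generate a keyword index and a URL index from the analyzed and processed web page data while the data processing module 202 performs data conversion on the web page data.
In a specific implementation, the index generation module 206 performs the corresponding index generation process on the analyzed and processed web page data. In addition to conventional keyword index generation, a URL index is also generated, because web page access by the digital television receiving terminal 10 is a URL access process. To keep web page access real-time, both a keyword index and a URL index are therefore needed; the generated keyword index and URL index are placed in the index database and the URL index database, respectively. Specifically, to speed up access, the databases must store some website content (web page data) that users visit frequently, such as Sina, NetEase, Sohu, and Tencent.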
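A minimal sketch of generating both indexes for one stored page follows. The class `PageIndexes` is hypothetical, and the `\w+` tokenizer is a simplifying assumption: it splits on whitespace and punctuation and would need a word segmenter for Chinese text.

```python
import re
from collections import defaultdict
from typing import Dict, Set


class PageIndexes:
    """Hypothetical keyword index and URL index over stored page content."""

    def __init__(self) -> None:
        self.keyword_index: Dict[str, Set[int]] = defaultdict(set)  # word -> content ids
        self.url_index: Dict[str, int] = {}                         # URL -> content id

    def add_page(self, content_id: int, url: str, text: str) -> None:
        # URL index: direct lookup for the URL-driven access of the terminal.
        self.url_index[url] = content_id
        # Keyword index: inverted index from each word to the pages containing it.
        for word in re.findall(r"\w+", text.lower()):
            self.keyword_index[word].add(content_id)
```

The URL index answers the common case (the terminal requests a specific URL) in one lookup, while the keyword index serves requests that carry only keywords.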
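The update module 207 is configured to update the databases according to the web page access request of the digital television receiving terminal 10.
Because the databases need to store some commonly used web page data to speed up web page access, and because the web page data in the databases must be the latest, non-stale data so that web page access requests can be answered in real time, the update module 207 updates the databases according to the web page access request of the digital television receiving terminal 10. In a specific implementation, the update module 207 can update the databases by loading a behavior analysis program. When a web page access request from the digital television receiving terminal 10 is received, the behavior analysis program automatically analyzes the request to extract information such as frequently used links, frequently used keywords, and access counts; the database storage policy is then redefined according to this information and the databases are updated. For example, based on analysis of what users access (frequently used links and so on), the URL database is updated in real time so that frequently used URL data is stored in it. In addition, the web page collection program collects (crawls) some web pages, the behavior analysis program analyzes them, and the databases are updated according to the analysis results. This helps guarantee the real-time behavior of the web page access system.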
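The behavior analysis can be sketched as simple counting of requested URLs and keywords. The class `BehaviorAnalyzer` and the "top N URLs" storage policy are assumptions; the specification names the statistics that are gathered but not how the storage policy is derived from them.

```python
from collections import Counter
from typing import Iterable, List, Optional


class BehaviorAnalyzer:
    """Hypothetical request statistics used to decide which pages to keep stored."""

    def __init__(self) -> None:
        self.url_hits: Counter = Counter()      # access count per URL
        self.keyword_hits: Counter = Counter()  # access count per keyword

    def record_request(self, url: Optional[str], keywords: Iterable[str] = ()) -> None:
        """Update the statistics for one web page access request."""
        if url:
            self.url_hits[url] += 1
        self.keyword_hits.update(keywords)

    def frequent_urls(self, top_n: int) -> List[str]:
        """URLs that the storage policy would keep in the URL database."""
        return [url for url, _ in self.url_hits.most_common(top_n)]
```

The result of `frequent_urls` could also feed the URL weighting used by the collection policy, so that popular pages are refreshed first.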
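By setting up a relay platform between the digital television receiving terminal and the Internet, the present invention has the relay server collect, process, and convert web page data according to the web page access request sent by the digital television receiving terminal and simplify web pages (including large web pages). The simplified web page content uses only functions that the browser in the digital television receiving terminal supports, and the browser's work is moved to the back end, so that complex and tedious web page parsing and processing are all performed on the server. This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to web pages.
To describe the present invention more clearly, the method for implementing web page access is described in detail below.
Refer to FIG. 3, a flowchart of the first embodiment of the method for implementing web page access according to the present invention. The method includes:
S101: Collect web page data according to a web page collection policy.
In a specific implementation, the collection policy for web page data may be breadth-first, depth-first, linear-first, or a similar policy. User behavior data may also be analyzed to obtain information such as the user's frequently used links, frequently used keywords, and access counts, and weighting coefficients for URLs are set according to this information. Entries in the URL database therefore have different priorities; for example, a user's immediate request has a URL with a larger weighting coefficient and thus a higher priority. According to the priorities in the URL database and the collection policy, the web page collection program is loaded, the URLs of web pages are extracted, and the original web page data, that is, the web page data requested for access, is collected.
In a specific implementation, a record table may be used during collection. The record table holds information such as visited, unvisited, and content summary, which also helps avoid repeated collection of web pages. The record-table method is similar to existing methods of storing and recording in data access and is not described further here.
S102: Analyze and process the collected web page data, and perform data conversion on the analyzed and processed web page data.
In a specific implementation, analyzing and processing the collected web page data includes any one or more of: denoising the web page data, removing advertisement data, removing navigation bar data, removing unsupported function tags and attribute data, removing JavaScript script data, removing CSS syntax data, and compressing the web page data.
The data conversion includes any one or more of picture data conversion, audio and video data format conversion, and simplification conversion.
S103: According to the web page access request of the digital television receiving terminal, send the converted web page data to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding web page according to the converted web page data.
In a specific implementation, the web page access request of the digital television receiving terminal includes any one or more of: the model of the digital television receiving terminal, the display requirements for the requested web page, keywords of the requested web page, and the URL of the requested web page.
By setting up a relay platform between the digital television receiving terminal and the Internet, the present invention has the relay server collect, process, and convert web page data according to the web page access request sent by the digital television receiving terminal and simplify web pages (including large web pages). The simplified web page content uses only functions that the browser in the digital television receiving terminal supports, and the browser's work is moved to the back end, so that complex and tedious web page parsing and processing are all performed on the server. This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to web pages.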
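Refer to FIG. 4, a flowchart of the second embodiment of the method for implementing web page access according to the present invention. The method includes:
S201: The digital television receiving terminal sends a web page access request.
S202: Search the databases for the requested web page data. If the search result is yes, perform step S206; if the search result is no, perform step S203.
In a specific implementation, using the keywords, the URL, and other information about the requested web page included in the web page access request, the front-end server in S202 first searches the web page cache database for the requested web page data. If the data is there, step S206 is performed. If not, S202 goes on to search the URL index database for the requested URL. If the requested URL is found, the corresponding web page data is located in the content database through the index in the URL index database, and step S206 is then performed. If the requested URL is not found in the URL index database, the index database may also be searched using the entered keywords. If a matching keyword exists, the corresponding web page data is located in the content database directly from the keyword in the index database, and step S206 is then performed; if not, step S203 is performed.
Specifically, the index database stores the keywords of web pages, the URL index database stores the URLs of web pages, and the content database stores the corresponding web page data; the index database, the URL index database, and the content database are associated with one another. In a specific implementation, S202 must ensure that, when the requested web page data is found in the databases and used to respond to the web page access request, the retrieved web page data is the latest web page data (that is, the data is not stale).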
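The specification does not say how S202 decides that stored data is stale. One common approach, shown here purely as an assumption, is to record the collection time of each stored page and treat pages older than a maximum age as missing, which sends the request on to collection in S203. `StoredPage` and `fresh_or_none` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredPage:
    data: str
    fetched_at: float  # collection time in seconds, on the same clock as `now`


def fresh_or_none(page: Optional[StoredPage],
                  max_age_seconds: float,
                  now: Optional[float] = None) -> Optional[str]:
    """Return the page data if it is recent enough, otherwise None."""
    if page is None:
        return None
    current = time.time() if now is None else now
    if current - page.fetched_at > max_age_seconds:
        return None  # stale: the caller should collect the page again
    return page.data
```

The `now` parameter exists only so the age check can be exercised deterministically; callers would normally omit it.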
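S203: Collect the requested web page data.
In a specific implementation, S203 collects web page data mainly by loading a web page collection program (also called a web crawler). When the web page collection program collects pages, it obtains the IP address of a web page directly from the DNS cache instead of resolving the domain name every time, which reduces resolution time. In addition, the web page access programs can be deployed in a distributed manner: the URLs are grouped according to the collection policy and one web page access program is assigned to each group. Each program then collects web page data for its own URL group, which effectively avoids repeated collection. Specifically, according to the priorities in the URL database and the collection policy, S203 loads the web page collection program, extracts the URLs of web pages, and collects the original web page data, that is, the web page data requested for access. In a specific implementation, a record table may be used during collection. The record table holds information such as visited, unvisited, and content summary, which also helps avoid repeated collection of web pages. The record-table method is similar to existing methods of storing and recording in data access and is not described further here.
S204: Analyze and process the collected web page data and perform data conversion.
In a specific implementation, S204 analyzes and processes the web page data by loading a web page analysis program. The processing includes any one or more of: denoising the web page data, removing advertisement data, removing navigation bar data, removing unsupported function tags and attribute data, removing JavaScript script data, removing CSS syntax data, and compressing the web page data. After this processing, the substantive content of the web page and the basic HTML tags are retained, providing high-quality material for the subsequent data conversion and index generation.
After the above analysis and processing, S204 performs data conversion on the resulting web page data, including any one or more of picture data conversion, audio and video data format conversion, and simplification conversion. The converted data is imported into the content database for storage. The web page data processed in S204 is already reasonably well suited to display by the browser of the digital television receiving terminal.
S205: Lay out the analyzed and processed web page data.
In a specific implementation, because digital television receiving terminals come in different models with different display requirements for web pages, S205 re-lays out the converted web page data according to the model and display requirements of the digital television receiving terminal, so that the data suits the layout and display of the terminal's browser. For example, according to the display request of the digital television receiving terminal, high-definition video data is re-laid out as ordinary video data. As another example, when a terminal of the given model still cannot open a large web page quickly after the web page data has been converted, the data can be re-laid out into smaller pages.
S206: Send the web page data to the digital television receiving terminal.
S207: The digital television receiving terminal displays the corresponding web page according to the web page data.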
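Preferably, after step S204, the method of the second embodiment of the present invention may further include:
S208: Generate a keyword index and a URL index from the web page data analyzed and processed in S204.
In a specific implementation, S208 performs the corresponding index generation process on the analyzed and processed web page data. In addition to conventional keyword index generation, a URL index is also generated, because web page access by the digital television receiving terminal is a URL access process. To keep web page access real-time, both a keyword index and a URL index are therefore needed; the generated keyword index and URL index are placed in the index database and the URL index database, respectively. Specifically, to speed up access, the databases must store some website content (web page data) that users visit frequently, such as Sina, NetEase, Sohu, and Tencent.
Preferably, the method of the second embodiment of the present invention may further include a database update process, specifically:
S209: Update the databases according to the web page access request of S201 and/or the keyword index and URL index generated in S208.
Because the databases need to store some commonly used web page data to speed up web page access, and because the web page data in the databases must be the latest, non-stale data so that web page access requests can be answered in real time, S209 updates the databases according to the web page access request of S201. In a specific implementation, S209 can update the databases by loading a behavior analysis program. When the digital television receiving terminal sends the web page access request in S201, the behavior analysis program automatically analyzes the request to extract information such as frequently used links, frequently used keywords, and access counts; the database storage policy is then redefined according to this information and the databases are updated. In addition, at idle times (that is, when the digital television receiving terminal 10 is not sending a web page access request), the web page collection program can randomly collect (crawl) some web pages, the behavior analysis program analyzes them, and the databases are updated according to the analysis results. This helps guarantee the real-time behavior of the web page access system.
Among the above steps, S201 and S207 are executed by the digital television receiving terminal; all remaining processing steps are executed by the front-end server.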
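The server-side portion of the flow (S202 through S206) can be tied together in a short sketch. The function `handle_request` and its callable parameters are hypothetical stand-ins for the retrieval, collection, processing, and re-layout steps; storing the converted data on a miss is an assumption based on the statement that converted data is imported into the content database.

```python
from typing import Callable, Dict


def handle_request(url: str,
                   stored: Dict[str, str],
                   collect: Callable[[str], str],
                   analyze_and_convert: Callable[[str], str],
                   relayout: Callable[[str], str]) -> str:
    """Return the page data the front-end server would send for one request."""
    page = stored.get(url)                # S202: search the databases
    if page is not None:
        return page                       # hit: go straight to S206 (send)
    raw = collect(url)                    # S203: collect from the Internet
    converted = analyze_and_convert(raw)  # S204: analyze, process, convert
    stored[url] = converted               # converted data enters the content database
    return relayout(converted)            # S205: re-layout, then S206 (send)
```

Passing each stage in as a callable keeps the control flow of FIG. 4 visible without committing to any particular crawler or parser.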
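By setting up a relay platform between the digital television receiving terminal and the Internet, the present invention has the relay server collect, process, and convert web page data according to the web page access request sent by the digital television receiving terminal and simplify web pages (including large web pages). The simplified web page content uses only functions that the browser in the digital television receiving terminal supports, and the browser's work is moved to the back end, so that complex and tedious web page parsing and processing are all performed on the server. This solves the problem of the low CPU speed of the digital television terminal and, at the same time, speeds up the terminal's access to web pages.
What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the claims of the present invention. A person of ordinary skill in the art can understand and implement all or part of the procedures of the foregoing embodiment, and equivalent changes made in accordance with the claims of the present invention still fall within the scope covered by the invention.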

Claims

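Claims
1. A method for implementing web page access, characterized by:
collecting web page data according to a web page collection policy;
analyzing and processing the collected web page data, and performing data conversion on the analyzed and processed web page data; and
sending, according to a received web page access request from a digital television receiving terminal, the converted web page data to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding web page according to the converted web page data.
2. The method according to claim 1, characterized in that, before the step of collecting web page data according to a web page collection policy is performed, the method further comprises:
searching a database, according to the received web page access request from the digital television receiving terminal, for the requested web page data;
if the search result is yes, sending the retrieved web page data to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding web page according to the retrieved web page data; and
if the search result is no, performing the step of collecting web page data according to a web page collection policy.
3. The method according to claim 1, characterized in that, after the data conversion is performed on the analyzed and processed web page data and before the converted web page data is sent to the digital television receiving terminal, the method further comprises:
laying out the converted web page data according to the web page access request of the digital television receiving terminal.
4. The method according to claim 1, characterized in that, after the data conversion is performed on the analyzed and processed web page data, the method further comprises:
generating a keyword index and a URL index from the analyzed and processed web page data.
5. The method according to claim 1, characterized by further comprising:
updating the database according to the received web page access request from the digital television receiving terminal.
6. The method according to any one of claims 1 to 5, characterized in that:
the web page access request of the digital television receiving terminal comprises any one or more of: the model of the digital television receiving terminal, the display requirements for the requested web page, keywords of the requested web page, and the URL of the requested web page; and
the database comprises: a temporary web page database, a URL database, a web page cache database, a content database, a keyword index database, a URL index database, and a behavior database.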
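7. A front-end server, characterized by comprising:
a collection module, configured to collect web page data according to a web page collection policy;
a data processing module, configured to analyze and process the web page data collected by the collection module, and to perform data conversion on the analyzed and processed web page data; and
a sending module, configured to send, according to a received web page access request from a digital television receiving terminal, the web page data converted by the data processing module to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding web page according to the web page data.
8. The server according to claim 7, characterized by further comprising:
a retrieval module, configured to search a database, before the collection module collects web page data and according to the web page access request of the digital television receiving terminal, for the requested web page data;
wherein if the search result of the retrieval module is yes, the sending module sends the retrieved web page data to the digital television receiving terminal; and
if the search result of the retrieval module is no, the collection module collects the requested web page data.
9. The server according to claim 7, characterized by further comprising:
a re-layout module, configured to lay out the converted web page data, after the data processing module has performed data conversion on the web page data, according to the web page access request of the digital television receiving terminal;
wherein the laid-out web page data is sent to the digital television receiving terminal by the sending module.
10. The server according to claim 7, characterized by further comprising:
an index generation module, configured to generate a keyword index and a URL index from the analyzed and processed web page data while the data processing module performs data conversion on the web page data.
11. The server according to claim 7, characterized by further comprising:
an update module, configured to update the database according to the web page access request of the digital television receiving terminal.
12. A system for implementing web page access, comprising a digital television receiving terminal, characterized by further comprising a front-end server, wherein:
the digital television receiving terminal is configured to send a web page access request to the front-end server, receive the web page data sent by the front-end server, and display the corresponding web page according to the received web page data; and
the front-end server is configured to collect web page data according to a web page collection policy; analyze and process the collected web page data, and perform data conversion on the analyzed and processed web page data; and send, according to the received web page access request from the digital television receiving terminal, the converted web page data to the digital television receiving terminal, so that the digital television receiving terminal displays the corresponding web page according to the converted web page data.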
PCT/CN2011/070703 2010-02-09 2011-01-27 Method, system, and front-end server for implementing web page access WO2011097992A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201010112317A CN101808114A (zh) 2010-02-09 2010-02-09 Method, system, and front-end server for implementing web page access
CN201010112317.6 2010-02-09

Publications (1)

Publication Number Publication Date
WO2011097992A1 (zh) 2011-08-18

Family

ID=42609734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/070703 WO2011097992A1 (zh) 2010-02-09 2011-01-27 Method, system, and front-end server for implementing web page access

Country Status (2)

Country Link
CN (1) CN101808114A (zh)
WO (1) WO2011097992A1 (zh)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
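CN101808114A (zh) * 2010-02-09 2010-08-18 深圳市同洲电子股份有限公司 Method, system, and front-end server for implementing web page access
CN102411576B (zh) * 2010-09-25 2017-03-08 上海掌门科技有限公司 Method for browsing forums with an e-book reader
WO2012071798A1 (zh) * 2010-12-01 2012-06-07 深圳市同洲软件有限公司 Method, apparatus, and system for sharing web pages between a mobile terminal and a digital television receiving terminal
CN102611913B (zh) * 2011-01-24 2015-04-29 北京东方广视科技股份有限公司 Service platform, set-top box, system, and method for accessing web pages via cable television
CN102364461A (zh) * 2011-06-30 2012-02-29 广州市动景计算机科技有限公司 Method and server for acquiring web page content data
CN102255970B (zh) * 2011-07-20 2013-12-18 北京视博云科技有限公司 Remote access system for interactive services
CN102724189B (zh) * 2012-06-06 2016-06-15 杭州华三通信技术有限公司 Method and apparatus for controlling user URL access
CN106202264A (zh) * 2016-06-29 2016-12-07 乐视控股(北京)有限公司 Data processing method and apparatus
CN106021615A (zh) * 2016-07-01 2016-10-12 广东小天才科技有限公司 Question search optimization method and apparatus
CN106446299A (zh) * 2016-11-30 2017-02-22 深圳Tcl数字技术有限公司 Website information download method and apparatus
CN112988860B (zh) * 2019-12-18 2023-09-26 菜鸟智能物流控股有限公司 Accelerated data processing method, apparatus, and electronic device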

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
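CN101211349A (zh) * 2006-12-28 2008-07-02 深圳市同洲电子股份有限公司 System and method for generating a digital television boot portal page
CN101378472A (zh) * 2007-08-27 2009-03-04 奇景光电股份有限公司 Digital television viewing terminal, electronic program guide service system, and display method thereof
CN101527783A (zh) * 2008-12-25 2009-09-09 深圳市同洲电子股份有限公司 Method and system for acquiring interface data, and digital television receiving terminal
CN101808114A (zh) * 2010-02-09 2010-08-18 深圳市同洲电子股份有限公司 Method, system, and front-end server for implementing web page access
CN101908048A (zh) * 2009-06-04 2010-12-08 深圳市彪骐数码科技有限公司 Method and system for searching Internet film and television content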

Also Published As

Publication number Publication date
CN101808114A (zh) 2010-08-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11741863

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11741863

Country of ref document: EP

Kind code of ref document: A1