CN101763419A - Method for synchronously updating remote rss data by local database - Google Patents


Info

Publication number
CN101763419A
Authority
CN
China
Prior art keywords
rss
item
source
information
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910255744A
Other languages
Chinese (zh)
Inventor
袁东风
颜廷芝
王恒
徐超
林贺
陈飞
魏斌
石祚夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN200910255744A
Publication of CN101763419A

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a method for synchronously updating remote RSS data by a local database and belongs to the technical field of network database updating. The method comprises the following steps: 1) a content server parses all the RSS sources and puts the parsed RSS information into the local database; 2) the RSS information is classified and integrated into the local database of the content server; and 3) the content server updates the RSS content. The method solves the problems that direct client access to RSS data is slow, that classifying RSS data is complex, and that the operation is cumbersome.

Description

A method for synchronously updating remote RSS data by a local database
Technical field
The present invention relates to a method for synchronously updating remote RSS data by a local database, and belongs to the technical field of network database updating.
Background
RSS (Really Simple Syndication), also called aggregated content, is a simple way of sharing content online. RSS subscriptions are typically used for time-sensitive content so that information can be obtained faster; a website that provides RSS output helps users get its latest updates. RSS is also used to share information between websites: with client-side aggregator software that supports RSS (for example Sharp Reader, RSS Reader, NewzCrawler, Feed Demon), a network user can read the content of a website that supports RSS output without opening the site's pages. An RSS source is a format for describing and synchronizing website content, and is currently the most widely used XML application. RSS has built a technical platform for the rapid spread of information, making everyone a potential information provider. Once an RSS file is published, the information in that RSS feed can be called directly by other websites, and because the data are in standard XML format, they can also be used in other terminals and services. RSS is currently the most popular resource-sharing application and can be regarded as an extension of the resource-sharing model.
RSS syntax: RSS is a text-based format. It is an XML (Extensible Markup Language) format; put simply, an RSS file is an XML file with an associated DTD (Document Type Definition). An RSS file is a piece of standard XML data and generally uses rss, xml, or rdf as its file extension. An RSS file is usually labeled as XML, and an RSS file (often also called an RSS feed or channel) normally contains only a simple list of items. In general, each item contains a title (title), a short introduction (description), and a URL link (link, for example the address of a web page). Other information, such as the date (pubdate) and the creator (author), is optional. RSS structure: an RSS 2.0 file consists of a channel element and its item child elements. Every RSS file must conform to the XML 1.0 specification, and the version attribute of the root element <rss> indicates which RSS specification the document follows. The channel element describes the RSS feed. Three of its child elements are required: <title>, <description>, and <link>, where <title> gives the title of the RSS source, <description> describes the channel, and <link> gives the URL corresponding to the channel. The other child elements are optional, such as <image>, <language>, <category>, <copyright>, and <pubdate>: <image> defines a GIF, JPEG, or PNG picture displayed with the channel, <language> states the language used by the feed, <category> declares one or more categories the channel belongs to, <copyright> is the copyright notice of the feed, and <pubdate> gives the date the feed was published.
The <item> element is the most important part of an RSS document. Each <channel> element can have one or more <item> elements, and each <item> element defines one article or "story" in the RSS feed. Its content changes frequently and is used to show updates. Within the <item> element, <title>, <description>, and <link> are required: <title> gives the title of the item, <description> describes the item, and <link> gives the corresponding URL. There are also optional elements such as <pubdate>, <source>, <author>, <comments>, <category>, and <guid>: <pubdate> is the date the item was published, <source> specifies a third-party source for the item, <author> gives author information, <comments> lets the item link to comments about it, <category> indicates the category the item belongs to, and <guid> defines a unique identifier for the item. The description of an item may contain the full text of a news story, or only an excerpt or summary. The item's link usually leads to the full content, so that the user can read the latest information on the website.
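The channel and item structure described above can be illustrated with a small Python sketch. The feed text, the URLs, and the helper name read_feed are invented for illustration and do not come from the patent; the element name is written pubDate, as RSS 2.0 feeds spell it.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, invented for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Hypothetical feed used for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <description>Summary of the first story</description>
      <pubDate>Mon, 28 Dec 2009 08:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
      <description>Summary of the second story</description>
      <pubDate>Mon, 28 Dec 2009 06:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_feed(xml_text):
    """Return the RSS version, the channel header, and the list of items."""
    root = ET.fromstring(xml_text)      # root element is <rss>
    channel = root.find("channel")      # single <channel> child
    header = {name: channel.findtext(name)
              for name in ("title", "link", "description")}
    # Each <item> becomes a dict mapping child tag name to its text.
    items = [{child.tag: child.text for child in item}
             for item in channel.findall("item")]
    return {"version": root.get("version"), "channel": header, "items": items}


feed = read_feed(SAMPLE_FEED)
print(feed["version"], feed["channel"]["title"], len(feed["items"]))
```

Each item ends up as a plain dictionary keyed by element name, which is the shape the later storage and classification steps can work with.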
RSS enables content sharing between websites, and by subscribing to RSS a user can read the content of a website that supports RSS output without opening its pages. Users can generally also read RSS content directly through an RSS reader or an online tool. These applications, however, require the user to actively look for RSS sources and add them to a list by hand; the operation is cumbersome and the data are mixed together, which is inconvenient for the user.
In the traditional RSS subscription mode, the elements of an RSS feed are obtained by accessing the feed directly: the user first sends an RSS connection request to a server, the server connects to the Internet after receiving the request, the Internet returns the RSS data to the server, and the server finally returns the data to the user. The steps are as follows:
(1) the user sends a connection request to the server;
(2) the server connects to the RSS source on the Internet according to the RSS link submitted by the user;
(3) the Internet returns the RSS data to the server;
(4) the server returns the RSS information to the user.
This approach is slow to access and complex in its content processing, and once the server goes offline the system cannot run normally, so its reliability drops sharply and it cannot provide users with high-quality service. The approach described in "Research on RSS-Based Individual Information Service", Computer Knowledge and Technology, Vol. 5, No. 9, March 2009, belongs to this category.
Summary of the invention
To overcome the defects and shortcomings of the prior art, namely the slow speed of direct user access to RSS data, the complex and confusing classification of RSS data, and the cumbersome operation, the present invention provides a method for synchronously updating remote RSS data by a local database.
The technical idea of the present invention is to carry out the RSS subscription directly on the content server, classify the RSS information from different sources, integrate it into a local database, and have the content server provide the RSS service directly to users.
The technical solution of the present invention is as follows:
A method for synchronously updating remote RSS data by a local database, comprising the following steps:
1) the content server parses all RSS sources and puts the parsed RSS information into the local database;
2) the RSS information obtained is classified and integrated into the local database of the content server;
3) the content server updates the RSS content.
In step 1) above, the content server parses all RSS sources and puts the parsed RSS information into the local database. The specific steps are as follows:
(1) generate the XML_RSS object corresponding to an RSS source: $rss =& new XML_RSS($url); where $url is the link of that RSS source;
(2) parse the RSS source: $rss->parse();
(3) get all items in the RSS source: $items = $rss->getItems();
(4) parse all RSS sources and store the information of each RSS source in a data table, where one data table is created in the local database for each RSS source so that the information of different RSS sources can be told apart.
Usually the title, description, link, and pubdate elements of each item are extracted, and the item is classified according to its title and description elements by fuzzy matching against the channel to which the RSS source belongs, then stored in the corresponding data table in the local database.
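The parse-and-store flow of step 1) can be sketched as follows. The patent's own fragments use the PHP XML_RSS class; this sketch mirrors the same flow in Python with the standard library and SQLite. The table naming scheme rss_source_<id>, the column layout, the use of link as the key, and the function names are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The four item elements the text says are usually extracted.
FIELDS = ("title", "description", "link", "pubDate")


def parse_items(xml_text):
    """Parse one RSS source and return its items as dictionaries."""
    channel = ET.fromstring(xml_text).find("channel")
    return [{f: (item.findtext(f) or "") for f in FIELDS}
            for item in channel.findall("item")]


def store_source(conn, source_id, xml_text):
    """Store the items of one RSS source in that source's own table.

    One table per source follows the text; the table name and the
    choice of link as primary key are assumptions of this sketch.
    """
    table = f"rss_source_{int(source_id)}"  # int() keeps the name safe
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        "link TEXT PRIMARY KEY, title TEXT, description TEXT, pub_date TEXT)"
    )
    rows = [(i["link"], i["title"], i["description"], i["pubDate"])
            for i in parse_items(xml_text)]
    # INSERT OR IGNORE skips items whose link is already stored.
    conn.executemany(f"INSERT OR IGNORE INTO {table} VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return table
```

Calling store_source once per RSS source, with a distinct source_id each time, gives the one-table-per-source layout described in step (4); storing the same feed twice leaves the table unchanged.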
In step 2) above, the RSS information obtained is classified and integrated into the local database of the content server. The specific steps are as follows:
(1) determine from the channel classification information which categories the RSS information has, where one data table is created for each channel so that the data of different channels can be told apart;
(2) determine which category an item belongs to from the <title> and <description> obtained by parsing;
(3) obtain the full text of the item according to its link, and add the item to the corresponding category;
(4) classify all items.
In step (2) above, the category of an item is determined from the <title> and <description> obtained by parsing. The specific steps are as follows:
1. first assign different weights to <title> and <description>;
2. carry out column matching analysis to determine the column to which the item belongs;
3. carry out channel analysis within that column to determine the channel category to which the item belongs;
4. check whether the item already exists in that category; if it does, do not insert it into the data table, otherwise insert the item into the data table of the corresponding channel category.
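The four classification steps above might look like the sketch below. The patent does not specify the weights, the taxonomy, or the matching algorithm, so everything concrete here is an assumption: the weight values, the column and channel names, the keyword lists, and the use of plain substring matching in place of the fuzzy matching the text mentions.

```python
# Assumed weights: a hit in the title counts twice as much as one
# in the description. The patent only says the weights differ.
TITLE_WEIGHT = 2.0
DESCRIPTION_WEIGHT = 1.0

# Hypothetical taxonomy: column -> channel -> keywords.
TAXONOMY = {
    "sports": {"football": ["goal", "league"], "basketball": ["nba", "dunk"]},
    "technology": {"mobile": ["phone", "android"], "internet": ["browser", "rss"]},
}


def score(keywords, title, description):
    """Weighted count of keywords found in the title and the description."""
    t, d = title.lower(), description.lower()
    return sum(TITLE_WEIGHT * (k in t) + DESCRIPTION_WEIGHT * (k in d)
               for k in keywords)


def classify(title, description):
    """Return (column, channel) for an item, or None if nothing matches."""
    # Column matching: score each column over all of its channel keywords.
    column_scores = {
        col: score([k for kws in chans.values() for k in kws], title, description)
        for col, chans in TAXONOMY.items()
    }
    column = max(column_scores, key=column_scores.get)
    if column_scores[column] == 0:
        return None
    # Channel analysis inside the chosen column.
    channel_scores = {ch: score(kws, title, description)
                      for ch, kws in TAXONOMY[column].items()}
    return column, max(channel_scores, key=channel_scores.get)


def insert_if_new(tables, channel, item):
    """Insert the item into its channel table unless it is already there."""
    table = tables.setdefault(channel, {})  # one table (dict) per channel
    if item["link"] in table:
        return False
    table[item["link"]] = item
    return True
```

Here the channel tables are in-memory dictionaries keyed by link so that the duplicate check of step 4 is visible; in the method described, they would be data tables in the local database.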
To keep the RSS information timely, the RSS content must be updated on the content server.
In step 3) above, the content server updates the RSS content. The specific steps are as follows:
(1) if the first item of an RSS source already exists in the database, stop updating; otherwise continue to extract the items of that source, insert them into the data table corresponding to the source, and classify them;
(2) update the RSS source item by item until an extracted item already exists in the data table;
(3) update all RSS sources.
When the content server updates the RSS sources in step 3), different update frequencies can be set for different RSS sources according to how fast each source is updated.
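The stop-at-first-known-item rule of steps (1) and (2) can be sketched as below. It assumes that a feed lists its newest items first and that items are identified by their link; both are assumptions of this sketch, as is the function name update_source.

```python
def update_source(stored_links, feed_items):
    """Return the items of one RSS source that are not yet stored.

    stored_links: set of links already in the source's data table.
    feed_items:   items from the freshly parsed feed, newest first.
    """
    new_items = []
    for item in feed_items:
        if item["link"] in stored_links:
            break  # first known item reached: everything after it is old
        new_items.append(item)
    # Record the new links so the next run stops earlier.
    stored_links.update(i["link"] for i in new_items)
    return new_items
```

If the first item of the feed is already stored, the loop exits immediately and nothing is inserted, which corresponds to step (1). The returned items would then be inserted into the source's table and classified.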
In step (3) above, all RSS sources are updated. The specific steps are as follows:
1. analyze the <pubdate> (publication time) of the items of an RSS source, determine the similarity of publication times between adjacent items, and build a similarity vector table;
2. determine the update frequency of that RSS source from the similarity vector table;
3. build similarity vector tables for all RSS sources and determine the update frequency of every RSS source.
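The patent does not define how the similarity of publication times is computed or how it maps to an update frequency. One possible reading, offered only as an assumption, treats the similarity vector as the list of time gaps between adjacent items and polls a source at roughly its median gap, clamped to a floor and a ceiling. The function names and the default bounds are hypothetical.

```python
from email.utils import parsedate_to_datetime
from statistics import median


def gap_vector(pub_dates):
    """Seconds between adjacent items, newest first (assumed 'similarity vector')."""
    times = sorted((parsedate_to_datetime(d) for d in pub_dates), reverse=True)
    return [(a - b).total_seconds() for a, b in zip(times, times[1:])]


def polling_interval(pub_dates, floor=300.0, ceiling=86400.0):
    """Suggested seconds between updates of one RSS source.

    Uses the median gap between adjacent items, clamped to
    [floor, ceiling]; both bounds are assumptions of this sketch.
    """
    gaps = gap_vector(pub_dates)
    if not gaps:
        return ceiling  # fewer than two items: poll rarely
    return min(max(median(gaps), floor), ceiling)


def polling_plan(sources):
    """Map each source name to its interval; sources: name -> list of pubDate strings."""
    return {name: polling_interval(dates) for name, dates in sources.items()}
```

A source that publishes every few minutes then gets a short interval and a source that publishes weekly gets the ceiling, which matches the idea of setting different update frequencies for different sources.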
The system used by the method of the invention consists of three parts: the user terminal, the content server that stores the RSS information, and the Internet that provides the RSS sources. The content server integrates and classifies the data of the different RSS sources on the Internet, stores it in the local database, and provides a subscription service to users. Synchronously updating remote RSS data by a local database can be divided into two functional modules: a database update module and an information subscription module.
The content server extracts the remote RSS data and stores it, classified, on the local content server. The system can output different RSS information to a user according to the user's customized settings, and the user does not need to know where the content comes from. Because the RSS information is stored on the local content server, user access is faster and the user's time is saved. The method keeps the user terminal from connecting to the RSS sources directly, which improves access speed, and when the content server loses its connection to an RSS source the system can still work, which avoids service interruptions caused by network failures. The content server updates the information of the RSS sources regularly, which keeps the user's information timely and provides quality service; the user only needs to subscribe to a category of information to receive the RSS service for all content on the network that belongs to that category.
In the RSS subscription mode of the background art, the user subscribes to RSS information through an RSS reader, and the elements of an RSS feed are obtained by accessing the feed directly: the user first sends an RSS connection request, the server connects to the Internet after receiving the request, the Internet returns the RSS data to the server, and the server finally returns the RSS data to the user. In the method of the present invention, by contrast, the content server separates the user from the Internet: the content server obtains the information of the RSS sources from the Internet, and the user only needs to submit a subscription to the content server and send a browse request to receive information that originates from the Internet.
The method of the invention solves the problem of slow access when subscribing to RSS directly. Building on traditional RSS subscription, it carries out the RSS subscription on the content server, parses the RSS sources, integrates and classifies the parsed RSS information, and stores it in the local database with an optimized data storage format. In other words, it adds an intermediate service layer at the content server to improve access speed.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the invention, where 1) to 3) are its steps.
Fig. 2 is a flowchart of the specific steps of step 1) in Fig. 1, where (1) to (4) are its steps.
Fig. 3 is a flowchart of the specific steps of step 2) in Fig. 1, where a to d are its steps.
Fig. 4 is a flowchart of the specific steps of step b in Fig. 3, where e to h are its steps.
Fig. 5 is a flowchart of the specific steps of step 3) in Fig. 1, where i to k are its steps.
Fig. 6 is a flowchart of the specific steps of step k in Fig. 5, where l to n are its steps.
Detailed description
The invention is further described below with reference to the drawings and an embodiment, but is not limited thereto.
Embodiment:
A method for synchronously updating remote RSS data by a local database, as shown in Fig. 1, comprising the following steps:
1) the content server parses all RSS sources and puts the parsed RSS information into the local database;
2) the RSS information obtained is classified and integrated into the local database of the content server;
3) the content server updates the RSS content.
In step 1) above, the content server parses all RSS sources and puts the parsed RSS information into the local database. As shown in Fig. 2, the specific steps are as follows:
(1) generate the XML_RSS object corresponding to an RSS source: $rss =& new XML_RSS($url); where $url is the link of that RSS source;
(2) parse the RSS source: $rss->parse();
(3) get all items in the RSS source: $items = $rss->getItems();
(4) parse all RSS sources and store the information of each RSS source in a data table, where one data table is created in the local database for each RSS source so that the information of different RSS sources can be told apart.
In step 2) above, the RSS information obtained is classified and integrated into the local database of the content server. As shown in Fig. 3, the specific steps are as follows:
a. determine from the channel classification information which categories the RSS information has, where one data table is created for each channel so that the data of different channels can be told apart;
b. determine which category an item belongs to from the <title> and <description> obtained by parsing;
c. obtain the full text of the item according to its link, and add the item to the corresponding category;
d. classify all items.
In step b above, the category of an item is determined from the <title> and <description> obtained by parsing. As shown in Fig. 4, the specific steps are as follows:
e. first assign different weights to <title> and <description>;
f. carry out column matching analysis to determine the column to which the item belongs;
g. carry out channel analysis within that column to determine the channel category to which the item belongs;
h. check whether the item already exists in that category; if it does, do not insert it into the data table, otherwise insert the item into the data table of the corresponding channel category.
In step 3) above, the content server updates the RSS content. As shown in Fig. 5, the specific steps are as follows:
i. if the first item of an RSS source already exists in the database, stop updating; otherwise continue to extract the items of that source, insert them into the data table corresponding to the source, and classify them;
j. update the RSS source item by item until an extracted item already exists in the data table;
k. update all RSS sources.
In step k above, all RSS sources are updated. As shown in Fig. 6, the specific steps are as follows:
l. analyze the <pubdate> (publication time) of the items of an RSS source, determine the similarity of publication times between adjacent items, and build a similarity vector table;
m. determine the update frequency of that RSS source from the similarity vector table;
n. build similarity vector tables for all RSS sources and determine the update frequency of every RSS source.

Claims (6)

1. the method for a synchronously updating remote rss data by local database, step is as follows:
1) content server is resolved all rss sources, and the rss information of resolving is put into local data base;
2) the rss information that obtains is classified, be incorporated into the local data base of content server;
3) the content server end carries out the rss content update.
2. The method according to claim 1, wherein in step 1) the content server parses all RSS sources and puts the parsed RSS information into the local database by the following specific steps:
(1) generate the XML_RSS object corresponding to an RSS source: $rss =& new XML_RSS($url); where $url is the link of that RSS source;
(2) parse the RSS source: $rss->parse();
(3) get all items in the RSS source: $items = $rss->getItems();
(4) parse all RSS sources and store the information of each RSS source in a data table, where one data table is created in the local database for each RSS source so that the information of different RSS sources can be told apart.
3. The method according to claim 1, wherein in step 2) the RSS information obtained is classified and integrated into the local database of the content server by the following specific steps:
(1) determine from the channel classification information which categories the RSS information has, where one data table is created for each channel so that the data of different channels can be told apart;
(2) determine which category an item belongs to from the <title> and <description> obtained by parsing;
(3) obtain the full text of the item according to its link, and add the item to the corresponding category;
(4) classify all items.
4. The method according to claim 3, wherein in step (2) the category of an item is determined from the <title> and <description> obtained by parsing by the following specific steps:
1. first assign different weights to <title> and <description>;
2. carry out column matching analysis to determine the column to which the item belongs;
3. carry out channel analysis within that column to determine the channel category to which the item belongs;
4. check whether the item already exists in that category; if it does, do not insert it into the data table, otherwise insert the item into the data table of the corresponding channel category.
5. The method according to claim 1, wherein in step 3) the content server updates the RSS content by the following specific steps:
(1) if the first item of an RSS source already exists in the database, stop updating; otherwise continue to extract the items of that source, insert them into the data table corresponding to the source, and classify them;
(2) update the RSS source item by item until an extracted item already exists in the data table;
(3) update all RSS sources.
When the content server updates the RSS sources in step 3), different update frequencies can be set for different RSS sources according to how fast each source is updated.
6. The method according to claim 5, wherein in step (3) all RSS sources are updated as follows:
1. analyze the <pubdate> (publication time) of the items of an RSS source, determine the similarity of publication times between adjacent items, and build a similarity vector table;
2. determine the update frequency of that RSS source from the similarity vector table;
3. build similarity vector tables for all RSS sources and determine the update frequency of every RSS source.
CN200910255744A 2009-12-28 2009-12-28 Method for synchronously updating remote rss data by local database Pending CN101763419A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910255744A CN101763419A (en) 2009-12-28 2009-12-28 Method for synchronously updating remote rss data by local database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910255744A CN101763419A (en) 2009-12-28 2009-12-28 Method for synchronously updating remote rss data by local database

Publications (1)

Publication Number Publication Date
CN101763419A true CN101763419A (en) 2010-06-30

Family

ID=42494583

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910255744A Pending CN101763419A (en) 2009-12-28 2009-12-28 Method for synchronously updating remote rss data by local database

Country Status (1)

Country Link
CN (1) CN101763419A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012016404A1 (en) * 2010-08-05 2012-02-09 中兴通讯股份有限公司 Really simple syndication subscription method and client thereof
CN102779146A (en) * 2012-04-26 2012-11-14 新奥特(北京)视频技术有限公司 Method and system for updating data in local database in real time
CN102799602A (en) * 2012-04-26 2012-11-28 新奥特(北京)视频技术有限公司 Method and system for acquiring data from Internet
CN103207859A (en) * 2012-01-11 2013-07-17 北京四维图新科技股份有限公司 Method and device for integrating databases
WO2013185587A1 (en) * 2012-06-11 2013-12-19 腾讯科技(深圳)有限公司 Information syndication file synchronizing method, device and system
CN108665654A (en) * 2018-05-18 2018-10-16 任飞翔 Cash register information synchronization method and cash register system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012016404A1 (en) * 2010-08-05 2012-02-09 中兴通讯股份有限公司 Really simple syndication subscription method and client thereof
CN103207859A (en) * 2012-01-11 2013-07-17 北京四维图新科技股份有限公司 Method and device for integrating databases
CN103207859B (en) * 2012-01-11 2016-07-06 北京四维图新科技股份有限公司 The method and apparatus of integrated database
CN102779146A (en) * 2012-04-26 2012-11-14 新奥特(北京)视频技术有限公司 Method and system for updating data in local database in real time
CN102799602A (en) * 2012-04-26 2012-11-28 新奥特(北京)视频技术有限公司 Method and system for acquiring data from Internet
CN102799602B (en) * 2012-04-26 2018-03-16 新奥特(北京)视频技术有限公司 A kind of method and system that data are obtained from internet
WO2013185587A1 (en) * 2012-06-11 2013-12-19 腾讯科技(深圳)有限公司 Information syndication file synchronizing method, device and system
CN108665654A (en) * 2018-05-18 2018-10-16 任飞翔 Cash register information synchronization method and cash register system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20100630