CN102799602A - Method and system for acquiring data from Internet - Google Patents


Info

Publication number
CN102799602A
CN102799602A CN2012101264116A CN201210126411A CN102799602A CN 102799602 A CN102799602 A CN 102799602A CN 2012101264116 A CN2012101264116 A CN 2012101264116A CN 201210126411 A CN201210126411 A CN 201210126411A CN 102799602 A CN102799602 A CN 102799602A
Authority
CN
China
Prior art keywords
xml file
rss
xml
target database
gauge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101264116A
Other languages
Chinese (zh)
Other versions
CN102799602B (en)
Inventor
王征
赵海军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xinaote Intelligent Sports Innovation Development Co., Ltd.
Original Assignee
China Digital Video Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Digital Video Beijing Ltd
Priority to CN201210126411.6A
Publication of CN102799602A
Application granted
Publication of CN102799602B
Expired - Fee Related
Anticipated expiration


Abstract

The invention discloses a method and system for acquiring data from the Internet. The method comprises the following steps: acquiring an Extensible Markup Language (XML) file from a network data provider; judging whether the acquired XML file is valid; if it is valid, analyzing the XML file, wherein the XML file is in Really Simple Syndication (RSS) format if it conforms to the standard RSS format and is otherwise in a non-standard RSS format; if it is not valid, acquiring the XML file from the network data provider again; and storing the XML file in a target database adaptively according to its format. With the method and the system, XML files in different formats, including RSS and non-standard RSS, can be identified automatically from the Internet and stored in the target database; the flexibility of acquiring data from the Internet is thereby improved, and users are provided with more convenient and more timely network resources.

Description

A method and system for acquiring data from the Internet
Technical Field
The present invention relates to the field of Internet information technology, and in particular to a method and system for acquiring data from the Internet.
Background Art
With the rapid development of information technology, the world has entered the information age. Information is vast and varied; because some of it can be put to use by particular groups of people, it is regarded as a resource, and such usable information is what is referred to here as information content. An information broadcasting system, also called an image-and-text information broadcasting system, is defined in contrast to a traditional television broadcast system. A traditional video broadcast system is mainly tasked with broadcasting moving television pictures and the accompanying sound, whereas an information broadcasting system disseminates various kinds of information primarily through text, graphics, and charts, with moving pictures in a supporting role. It can independently handle the broadcast of a television channel (an information channel or a TV shopping channel), or it can be attached to a traditional broadcast system to increase the amount of information the channel broadcasts. Existing information broadcasting systems have the following characteristics:
1. pictures, video, upward-rolling text, leftward-flying text, and animated corner logos are broadcast on the same screen;
2. multiple lines of information can be modified in real time and broadcast in real time;
3. templates for all kinds of TV columns can be customized, and column packaging can be applied directly;
4. layout styles are flexible and varied, and any number of advertising slots can be set;
5. an unlimited number of caption layers can be superimposed in real time;
6. a large amount of display advertising information and animation files can be added to the advertisement window, and each advertisement can carry a title and text;
7. financial and similar information can be broadcast at the same time, for example exchange rate windows, stock market updates, and weather forecasts.
The data broadcast by an information broadcasting system is obtained from network data providers.
Extensible Markup Language (XML) is a markup language used to mark up electronic documents so that they have structure. It can be used to mark up data and define data types, and it is a source language that allows users to define their own markup languages. XML is a subset of the Standard Generalized Markup Language (SGML) and is well suited to transmission over the Web. XML provides a uniform way to describe and exchange structured data independently of any application or vendor.
RSS is one format of XML file. RSS (Really Simple Syndication, also referred to as content aggregation) is a format for describing and synchronizing website content. The abbreviation RSS has three expansions: Really Simple Syndication; RDF (Resource Description Framework) Site Summary; and Rich Site Summary. All three in fact refer to the same syndication technology. RSS is currently widely used by online news channels, blogs, and wikis; the main versions are 0.91, 1.0, and 2.0. Subscribing via RSS allows information to be obtained more quickly, and a website that provides RSS output makes it easier for users to obtain the latest updates to its content.
In the course of realizing the present invention, the inventors found the following defect in the prior art: when XML files are acquired from the Internet, only data in a single format can be subscribed to and acquired, and data in multiple formats cannot be identified at the same time.
Summary of the Invention
In view of the defect in the prior art, the present invention can automatically identify XML files in different formats from the Internet, including RSS and non-standard RSS, which improves the flexibility of acquiring data from the Internet and provides users with more convenient and more timely network resources.
To solve the above technical problem, the present invention provides a method for acquiring data from the Internet, which specifically comprises:
acquiring an Extensible Markup Language (XML) file from a network data provider;
judging whether the acquired XML file is valid; if it is valid, analyzing the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS) and is otherwise in a non-standard RSS format; if it is not valid, acquiring the XML file from the network data provider again;
storing the XML file in a target database adaptively according to its format, which specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the target database; or, when the format of the XML file is non-standard RSS, storing the file directly in the target database.
Acquiring the XML file from the network data provider specifically comprises:
passing in the XML address as a parameter according to the user's request;
analyzing the XML address to obtain the corresponding URL link;
acquiring the XML file by reading the URL link.
Judging whether the acquired XML file is valid specifically comprises:
judging, according to XML syntax rules, whether the acquired XML file is valid.
Parsing the file and then storing it in the target database when the format of the XML file is RSS specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the T_XmlRss table of the target database by line scanning.
Storing the file directly in the target database when the format of the XML file is non-standard RSS specifically comprises:
when the format of the XML file is non-standard RSS, storing the XML directly in the T_XmlOriginal table of the target database.
The present invention also provides a system for acquiring data from the Internet, which specifically comprises:
an acquiring unit, configured to acquire an Extensible Markup Language (XML) file from a network data provider;
a judging unit, configured to judge whether the acquired XML file is valid;
an analyzing unit, configured to analyze the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS) and is otherwise in a non-standard RSS format;
a storage unit, configured to store XML files of different formats adaptively in a target database, and further comprising a parsing unit configured to parse the XML file and then store it in the target database when the format of the XML file is RSS; or, when the format of the XML file is non-standard RSS, the file is stored directly in the target database.
The acquiring unit specifically comprises an input unit, an analyzing unit, and a reading unit, wherein:
the input unit is configured to pass in the XML address as a parameter according to the user's request;
the analyzing unit is configured to analyze the XML address to obtain the corresponding URL link;
the reading unit is configured to acquire the XML file by reading the URL link.
The judging unit is specifically configured to:
judge, according to XML syntax rules, whether the acquired XML file is valid.
Parsing the file and then storing it in the target database when the format of the XML file is RSS specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the T_XmlRss table of the target database by line scanning.
Storing the file directly in the target database when the format of the XML file is non-standard RSS specifically comprises:
when the format of the XML file is non-standard RSS, storing the XML directly in the T_XmlOriginal table of the target database.
Compared with the prior art, the embodiments of the present invention have the following advantage: XML files in different formats, including RSS and non-standard RSS, are identified automatically from the Internet and stored in the target database, which improves the flexibility of acquiring data from the Internet and provides users with more convenient and more timely network resources.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for acquiring data from the Internet in Embodiment 1 of the present invention;
Fig. 2 is a structural diagram of a system for acquiring data from the Internet in Embodiment 2 of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment 1 of the present invention provides a method for acquiring data from the Internet which, as shown in Fig. 1, comprises the following steps:
Step S101: acquire an Extensible Markup Language (XML) file from a network data provider, which specifically comprises:
passing in the XML address as a parameter according to the user's request; multiple XML addresses are separated by spaces, and a username and password, if required, are separated by commas, for example 'xmlReader.exe http: gapore.info.afg.xml; User, pass';
analyzing the XML address to obtain the corresponding URL link;
acquiring the XML file by reading the URL link.
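The parameter convention of step S101 can be illustrated with a minimal Python sketch. It assumes that each space-separated argument has the shape `url` or `url,username,password`; the exact delimiter layout is not fully recoverable from the example in the text, and `FeedAddress` and `parse_address_args` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FeedAddress:
    """One XML address, with optional credentials (hypothetical helper type)."""
    url: str
    username: Optional[str] = None
    password: Optional[str] = None


def parse_address_args(args: List[str]) -> List[FeedAddress]:
    """Turn already space-split arguments into FeedAddress records.

    Assumption: each argument is either 'url' or 'url,username,password'.
    URLs that themselves contain commas are not handled by this sketch.
    """
    addresses = []
    for arg in args:
        parts = [p.strip() for p in arg.split(",")]
        if len(parts) == 1:
            addresses.append(FeedAddress(url=parts[0]))
        elif len(parts) == 3:
            addresses.append(FeedAddress(*parts))
        else:
            raise ValueError(f"expected 'url' or 'url,user,password', got {arg!r}")
    return addresses
```

In a command-line program the list would typically come from `sys.argv[1:]`, since the shell already splits the arguments on spaces.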
Step S102: judge whether the acquired XML file is valid, which specifically comprises:
judging, according to XML syntax rules, whether the acquired XML file is valid; if it is valid, step S103 is performed; if it is not valid, the XML file is acquired again from the network data provider.
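One way to read step S102 is as a well-formedness check followed by a retry. The sketch below assumes that "valid according to XML syntax rules" means the text parses as well-formed XML, and it caps the number of retries with a hypothetical `max_attempts` parameter, since the text does not say how often acquisition is repeated. The `fetch` callable stands in for whatever reads the URL link.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def fetch_valid_xml(fetch: Callable[[], str], max_attempts: int = 3) -> Optional[str]:
    """Call fetch() until it returns well-formed XML or attempts run out.

    max_attempts is an assumption of this sketch; the source sets no limit.
    """
    for _ in range(max_attempts):
        xml_text = fetch()
        if is_well_formed(xml_text):
            return xml_text
    return None
```

Passing the fetch operation in as a callable keeps the check independent of the transport, so the same logic applies whether the file is read over HTTP or from a local cache.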
Step S103: analyze the XML file; if it conforms to the standard format of Really Simple Syndication (RSS), the XML file is in RSS format; otherwise it is in a non-standard RSS format.
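The text does not spell out how conformance to the standard RSS format is tested. As a minimal sketch, one plausible heuristic is shown below: a document counts as RSS if its root is `<rss>` with a `version` attribute and a `<channel>` child (the 0.91 and 2.0 layout), or if its root is `rdf:RDF` containing a channel in the RSS 1.0 namespace. The function name `classify_xml` and the returned labels are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"


def classify_xml(xml_text: str) -> str:
    """Return 'rss' for documents that look like RSS, else 'non-standard'.

    The checks are a heuristic assumed for this sketch, not a full
    validation against the RSS specifications.
    """
    root = ET.fromstring(xml_text)
    # RSS 0.91 / 2.0 layout: <rss version="..."> with a <channel> child.
    if root.tag == "rss" and root.get("version") and root.find("channel") is not None:
        return "rss"
    # RSS 1.0 layout: <rdf:RDF> root with a channel in the RSS 1.0 namespace.
    if root.tag == f"{{{RDF_NS}}}RDF" and root.find(f"{{{RSS10_NS}}}channel") is not None:
        return "rss"
    return "non-standard"
```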
Step S104: store the XML file in a target database adaptively according to its format, which specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the target database; or, when the format of the XML file is non-standard RSS, storing the file directly in the target database.
Parsing the file and then storing it in the target database when the format of the XML file is RSS specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the T_XmlRss table of the target database by line scanning.
Storing the file directly in the target database when the format of the XML file is non-standard RSS specifically comprises:
when the format of the XML file is non-standard RSS, storing the XML directly in the T_XmlOriginal table of the target database.
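The two storage paths of step S104 can be sketched with SQLite. Only the table names T_XmlRss and T_XmlOriginal come from the text; the column layout, the choice of SQLite, and the helper names `init_db` and `store_xml` are assumptions. "Line scanning" is read here as writing one row per parsed `<item>`, which is also an assumption.

```python
import sqlite3
import xml.etree.ElementTree as ET


def init_db(conn: sqlite3.Connection) -> None:
    """Create the two target tables; the columns are assumed for this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS T_XmlRss "
        "(source_url TEXT, title TEXT, link TEXT, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS T_XmlOriginal (source_url TEXT, raw_xml TEXT)"
    )


def store_xml(conn: sqlite3.Connection, source_url: str, xml_text: str, is_rss: bool) -> None:
    """Store RSS as one parsed row per item; store anything else as raw XML."""
    if is_rss:
        root = ET.fromstring(xml_text)
        rows = [
            (source_url, item.findtext("title"), item.findtext("link"),
             item.findtext("description"))
            for item in root.iter("item")
        ]
        conn.executemany("INSERT INTO T_XmlRss VALUES (?, ?, ?, ?)", rows)
    else:
        conn.execute("INSERT INTO T_XmlOriginal VALUES (?, ?)", (source_url, xml_text))
    conn.commit()
```

Keeping the raw text for non-standard documents means no data is lost when the structure is unknown; a consumer can parse T_XmlOriginal rows later with format-specific logic.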
The technical solution of this embodiment of the present invention brings the following beneficial effect: XML files in different formats, including RSS and non-standard RSS, are identified automatically from the Internet and stored in the target database, which improves the flexibility of acquiring data from the Internet and provides users with more convenient and more timely network resources.
Embodiment 2 of the present invention provides a system for acquiring data from the Internet which, as shown in Fig. 2, comprises:
an acquiring unit 201, configured to acquire an Extensible Markup Language (XML) file from a network data provider;
wherein the acquiring unit specifically comprises an input unit, an analyzing unit, and a reading unit:
the input unit 2011 is configured to pass in the XML address as a parameter according to the user's request;
the analyzing unit 2012 is configured to analyze the XML address to obtain the corresponding URL link;
the reading unit 2013 is configured to acquire the XML file by reading the URL link.
A judging unit 202, configured to judge whether the acquired XML file is valid, specifically by:
judging, according to XML syntax rules, whether the acquired XML file is valid.
An analyzing unit 203, configured to analyze the XML file; if it conforms to the standard format of Really Simple Syndication (RSS), the XML file is in RSS format; otherwise it is in a non-standard RSS format.
A storage unit 204, configured to store XML files of different formats adaptively in a target database, and further comprising a parsing unit 2031 configured to parse the XML file and then store it in the target database when the format of the XML file is RSS; or, when the format of the XML file is non-standard RSS, the file is stored directly in the target database.
Parsing the file and then storing it in the target database when the format of the XML file is RSS specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the T_XmlRss table of the target database by line scanning.
Storing the file directly in the target database when the format of the XML file is non-standard RSS specifically comprises:
when the format of the XML file is non-standard RSS, storing the XML directly in the T_XmlOriginal table of the target database.
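How the four units of Embodiment 2 might cooperate can be sketched as a small Python class in which each unit is a pluggable callable. The class name `AcquisitionSystem`, the field names, and the `max_attempts` retry cap are hypothetical; the text describes the units only functionally and sets no retry limit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AcquisitionSystem:
    """Wires the four units together; each unit is supplied as a callable."""
    acquire: Callable[[str], str]            # acquiring unit 201: address -> XML text
    is_valid: Callable[[str], bool]          # judging unit 202
    is_rss: Callable[[str], bool]            # analyzing unit 203
    store: Callable[[str, str, bool], None]  # storage unit 204: (address, xml, is_rss)
    max_attempts: int = 3                    # assumption: the source sets no limit

    def run(self, address: str) -> Optional[str]:
        """Acquire, validate, classify, and store one address.

        Returns the detected format, or None if no valid XML was obtained
        within max_attempts acquisitions.
        """
        for _ in range(self.max_attempts):
            xml_text = self.acquire(address)
            if self.is_valid(xml_text):
                rss = self.is_rss(xml_text)
                self.store(address, xml_text, rss)
                return "rss" if rss else "non-standard"
        return None
```

Because the units are injected, the modules can be merged or split as the description allows without changing the control flow in `run`.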
The technical solution of this embodiment of the present invention brings the following beneficial effect: XML files in different formats, including RSS and non-standard RSS, are identified automatically from the Internet and stored in the target database, which improves the flexibility of acquiring data from the Internet and provides users with more convenient and more timely network resources.
From the description of the above embodiments, a person skilled in the art can clearly understand that the present invention can be implemented in hardware, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solution of the present invention can be embodied in the form of a software product, which can be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard drive) and which comprises a number of instructions that cause a computer device (such as a personal computer, a server, or a network device) to carry out the methods described in the embodiments of the present invention.
A person skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required for implementing the present invention.
A person skilled in the art will appreciate that the modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be changed accordingly and located in one or more apparatuses different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
The above discloses only several specific embodiments of the present invention; the present invention is not limited to them, and any variation that a person skilled in the art can conceive of shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for acquiring data from the Internet, characterized by comprising:
acquiring an Extensible Markup Language (XML) file from a network data provider;
judging whether the acquired XML file is valid; if it is valid, analyzing the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS) and is otherwise in a non-standard RSS format; if it is not valid, acquiring the XML file from the network data provider again;
storing the XML file in a target database adaptively according to its format, which specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the target database; or, when the format of the XML file is non-standard RSS, storing the file directly in the target database.
2. The method of claim 1, characterized in that acquiring the XML file from the network data provider specifically comprises:
passing in the XML address as a parameter according to the user's request;
analyzing the XML address to obtain the corresponding URL link;
acquiring the XML file by reading the URL link.
3. The method of claim 1, characterized in that judging whether the acquired XML file is valid specifically comprises:
judging, according to XML syntax rules, whether the acquired XML file is valid.
4. The method of claim 1, characterized in that parsing the file and then storing it in the target database when the format of the XML file is RSS specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the T_XmlRss table of the target database by line scanning.
5. The method of claim 1, characterized in that storing the file directly in the target database when the format of the XML file is non-standard RSS specifically comprises:
when the format of the XML file is non-standard RSS, storing the XML directly in the T_XmlOriginal table of the target database.
6. A system for acquiring data from the Internet, characterized by comprising:
an acquiring unit, configured to acquire an Extensible Markup Language (XML) file from a network data provider;
a judging unit, configured to judge whether the acquired XML file is valid;
an analyzing unit, configured to analyze the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS) and is otherwise in a non-standard RSS format;
a storage unit, configured to store XML files of different formats adaptively in a target database, and further comprising a parsing unit configured to parse the XML file and then store it in the target database when the format of the XML file is RSS; or, when the format of the XML file is non-standard RSS, the file is stored directly in the target database.
7. The system of claim 6, characterized in that the acquiring unit specifically comprises an input unit, an analyzing unit, and a reading unit, wherein:
the input unit is configured to pass in the XML address as a parameter according to the user's request;
the analyzing unit is configured to analyze the XML address to obtain the corresponding URL link;
the reading unit is configured to acquire the XML file by reading the URL link.
8. The system of claim 6, characterized in that the judging unit is specifically configured to:
judge, according to XML syntax rules, whether the acquired XML file is valid.
9. The system of claim 6, characterized in that parsing the file and then storing it in the target database when the format of the XML file is RSS specifically comprises:
when the format of the XML file is RSS, parsing the file and then storing it in the T_XmlRss table of the target database by line scanning.
10. The system of claim 6, characterized in that storing the file directly in the target database when the format of the XML file is non-standard RSS specifically comprises:
when the format of the XML file is non-standard RSS, storing the XML directly in the T_XmlOriginal table of the target database.
CN201210126411.6A 2012-04-26 2012-04-26 Method and system for acquiring data from Internet Expired - Fee Related CN102799602B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210126411.6A CN102799602B (en) 2012-04-26 2012-04-26 Method and system for acquiring data from Internet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210126411.6A CN102799602B (en) 2012-04-26 2012-04-26 Method and system for acquiring data from Internet

Publications (2)

Publication Number Publication Date
CN102799602A 2012-11-28
CN102799602B 2018-03-16

Family

ID=47198714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210126411.6A Expired - Fee Related CN102799602B (en) 2012-04-26 2012-04-26 Method and system for acquiring data from Internet

Country Status (1)

Country Link
CN (1) CN102799602B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1672523A2 (en) * 2004-12-20 2006-06-21 Microsoft Corporation Method and system for linking data ranges of a computer-generated document with associated extensible markup language elements
CN2852542Y (en) * 2005-11-07 2006-12-27 国网北京电力建设研究院 Wireless communication base station for transmission line monitoring
CN101739421A (en) * 2008-11-21 2010-06-16 上海电机学院 XML-based data integration information exchange platform
CN101763419A (en) * 2009-12-28 2010-06-30 山东大学 Method for synchronously updating remote rss data by local database
US7752224B2 (en) * 2005-02-25 2010-07-06 Microsoft Corporation Programmability for XML data store for documents

Also Published As

Publication number Publication date
CN102799602B (en) 2018-03-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right
Effective date of registration: 20190320
Address after: 100195 No. 621, 6th floor, No. 1 Building, 131 North West Fourth Ring Road, Haidian District, Beijing
Patentee after: Beijing Xinaote Intelligent Sports Innovation Development Co., Ltd.
Address before: 100195 new technology building, 49 Wukesong Road, Haidian District, Beijing
Patentee before: China Digital Video (Beijing) Limited
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20180316
Termination date: 20200426