CN102799602A - Method and system for acquiring data from Internet - Google Patents
Method and system for acquiring data from Internet
- Publication number
- CN102799602A
- Authority
- CN
- China
- Prior art keywords
- xml file
- rss
- xml
- target database
- gauge
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a method and system for acquiring data from the Internet. The method comprises the following steps: acquiring an Extensible Markup Language (XML) document from a network data provider; judging whether the acquired XML document is valid; if it is valid, analyzing the XML document, wherein the XML document is in Really Simple Syndication (RSS) format if it conforms to the RSS standard format and is otherwise in a non-standard RSS format; if it is not valid, acquiring the XML document from the network data provider again; and saving the XML document into a target database adaptively according to its format. With the method and the system, XML documents in different formats, including RSS and non-standard RSS, can be intelligently identified from the Internet and saved in the target database, which improves the flexibility of acquiring data from the Internet and gives users more convenient, real-time network resources.
Description
Technical field
The present invention relates to the field of Internet information technology, and in particular to a method and system for obtaining data from the Internet.
Background art
With the rapid development of information technology, the world has entered the information age, and information is abundant and varied. Because information can be put to use by certain groups of people, it is regarded as a resource, and such usable information is referred to as information content. A so-called information broadcasting system, also known as an image-text information broadcasting system, is defined relative to a traditional television broadcasting system. A traditional video broadcasting system is mainly tasked with broadcasting moving television images and accompanying sound, whereas an information broadcasting system disseminates various kinds of information primarily through text, graphics, and charts, with moving images as a supplement. It can independently handle the broadcast of a television channel (an information channel or a TV shopping channel), or it can be attached to a traditional broadcasting system to increase the amount of information the channel broadcasts. Existing information broadcasting systems have the following characteristics: (1) pictures, video, upward-scrolling text, left-flying text, and animated corner logos are broadcast on the same screen; (2) multiple lines of information can be modified in real time and broadcast in real time; (3) various TV column templates can be customized, and column packaging can be applied directly; (4) layouts are flexible, and multiple advertising positions can be set arbitrarily; (5) an unlimited number of caption layers can be superimposed in real time; (6) a large amount of display advertising information and animation files can be added to the advertisement window, and each advertisement can carry a title and text; (7) financial information can be broadcast at the same time, such as exchange-rate windows, stock market news, and weather forecasts. The data broadcast in an information broadcasting system is obtained from network data providers.
Extensible Markup Language (XML) is a markup language used to mark up electronic documents so that they are structured. It can be used to tag data and define data types, and it is a source language that allows users to define their own markup languages. XML is a subset of the Standard Generalized Markup Language (SGML) and is well suited to transmission over the Web. XML provides a unified way to describe and exchange structured data independent of applications or vendors.
RSS is one of the formats of XML files. RSS (Really Simple Syndication, also called content aggregation) is a format for describing and synchronizing website content. The abbreviation RSS has three expansions: Really Simple Syndication; RDF (Resource Description Framework) Site Summary; and Rich Site Summary. All three, however, refer to the same syndication technology. RSS is currently widely used by online news channels, blogs, and wikis; the main versions are 0.91, 1.0, and 2.0. Subscribing via RSS lets users obtain information more quickly, and a website that provides RSS output makes it easier for users to receive its latest content updates.
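As a concrete illustration of the format described above, the following sketch embeds a hand-written, minimal RSS 2.0 feed in a Python string and reads it with the standard library. The feed content and the `example.com` URLs are made up for illustration and do not come from the patent.

```python
import xml.etree.ElementTree as ET

# A hand-written, minimal RSS 2.0 feed (illustrative content only).
SAMPLE_RSS = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.com/</link>
    <description>Latest updates from an example site</description>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <description>Summary of the first story</description>
    </item>
  </channel>
</rss>"""

# Parse from bytes so the encoding declaration in the prolog is honored.
root = ET.fromstring(SAMPLE_RSS.encode("utf-8"))
channel = root.find("channel")

# The channel describes the site; each <item> describes one update.
feed_title = channel.findtext("title")
item_titles = [item.findtext("title") for item in channel.findall("item")]
print(root.tag, root.get("version"), feed_title, item_titles)
```

The root element, its `version` attribute, and the `channel` element with `title`, `link`, and `description` children are the parts of RSS 2.0 that a consumer typically checks first.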
In the course of realizing the present invention, the inventor found the following defect in the prior art: when XML files are obtained from the Internet, only data in a single format can be subscribed to and obtained; data in multiple formats cannot be recognized at the same time.
Summary of the invention
To address the defect in the prior art, the present invention can intelligently identify XML files in different formats, including RSS and non-standard RSS, from the Internet, which improves the flexibility of obtaining data from the Internet and gives users more convenient, real-time network resources.
To solve the above technical problem, the present invention provides a method for obtaining data from the Internet, which specifically comprises:
Obtaining an Extensible Markup Language (XML) file from a network data provider;
Judging whether the obtained XML file is valid; if it is valid, analyzing the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS), and is otherwise in non-standard RSS format; if it is not valid, obtaining the XML file from the network data provider again;
Storing the XML file in a target database adaptively according to its format, which specifically comprises:
When the format of the XML file is RSS, storing it in the target database after parsing; or, when the format of the XML file is non-standard RSS, storing it directly in the target database.
Wherein obtaining the Extensible Markup Language (XML) file from the network data provider specifically comprises:
Passing in the XML address as a parameter according to the user's request;
Analyzing the XML address to obtain the corresponding URL link;
Obtaining the XML file by reading the URL link.
Wherein judging whether the obtained XML file is valid specifically comprises:
Judging, according to the syntactic properties of XML, whether the obtained XML file is valid.
Wherein storing the XML file in the target database after parsing when its format is RSS specifically comprises:
When the format of the XML file is RSS, the XML file is parsed and then stored, in a line-scan manner, in the T_XmlRss table of the target database.
Wherein storing the XML file directly in the target database when its format is non-standard RSS specifically comprises:
When the format of the XML file is non-standard RSS, the XML is stored directly in the T_XmlOriginal table of the target database.
The present invention also provides a system for obtaining data from the Internet, which specifically comprises:
An acquiring unit, configured to obtain an Extensible Markup Language (XML) file from a network data provider;
A judging unit, configured to judge whether the obtained XML file is valid;
An analysis unit, configured to analyze the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS), and is otherwise in non-standard RSS format;
A storage unit, configured to store XML files of different formats adaptively in a target database, and further comprising a parsing unit configured to store the XML file in the target database after parsing when its format is RSS, or to store it directly in the target database when its format is non-standard RSS.
Wherein the acquiring unit specifically comprises an input unit, an analysis unit, and a reading unit, wherein:
The input unit is configured to pass in the XML address as a parameter according to the user's request;
The analysis unit is configured to analyze the XML address to obtain the corresponding URL link;
The reading unit is configured to obtain the XML file by reading the URL link.
Wherein the judging unit is specifically configured to:
Judge, according to the syntactic properties of XML, whether the obtained XML file is valid.
Wherein storing the XML file in the target database after parsing when its format is RSS specifically comprises:
When the format of the XML file is RSS, the XML file is parsed and then stored, in a line-scan manner, in the T_XmlRss table of the target database.
Wherein storing the XML file directly in the target database when its format is non-standard RSS specifically comprises:
When the format of the XML file is non-standard RSS, the XML is stored directly in the T_XmlOriginal table of the target database.
Compared with the prior art, the embodiments of the present invention have the following advantage: XML files in different formats, including RSS and non-standard RSS, are intelligently identified from the Internet and stored in the target database, which improves the flexibility of obtaining data from the Internet and gives users more convenient, real-time network resources.
Description of drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for obtaining data from the Internet in Embodiment 1 of the present invention;
Fig. 2 is a structural diagram of a system for obtaining data from the Internet in Embodiment 2 of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Embodiment 1 of the present invention provides a method for obtaining data from the Internet which, as shown in Fig. 1, comprises the following steps:
Step S101: obtain an Extensible Markup Language (XML) file from a network data provider, which specifically comprises:
Passing in the XML address as a parameter according to the user's request. Multiple XML addresses are separated by spaces; if a username and password are required, they are separated by commas, for example ' xmlReader.exehttp: gapore.info.afg.xml; User, pass ';
Analyzing the XML address to obtain the corresponding URL link;
Obtaining the XML file by reading the URL link.
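One possible reading of step S101 is sketched below. Because the example command line in the text is garbled, the sketch assumes that each space-separated argument is either a bare URL or `url,user,password`. The names `FeedRequest`, `parse_feed_args`, and `fetch_xml` are hypothetical, and the use of HTTP Basic authentication is likewise an assumption rather than something the patent states.

```python
import urllib.request
from typing import List, NamedTuple, Optional


class FeedRequest(NamedTuple):
    """One XML address plus optional credentials (hypothetical structure)."""
    url: str
    user: Optional[str]
    password: Optional[str]


def parse_feed_args(args: List[str]) -> List[FeedRequest]:
    """Turn space-separated command-line arguments into fetch requests.

    Assumed syntax: 'url' or 'url,user,password' per argument.
    """
    requests = []
    for arg in args:
        parts = [p.strip() for p in arg.split(",")]
        if len(parts) == 1:
            requests.append(FeedRequest(parts[0], None, None))
        elif len(parts) == 3:
            requests.append(FeedRequest(*parts))
        else:
            raise ValueError(f"expected 'url' or 'url,user,password', got {arg!r}")
    return requests


def fetch_xml(req: FeedRequest, timeout: float = 10.0) -> bytes:
    """Read the URL and return the raw bytes of the XML file."""
    handlers = []
    if req.user is not None:
        # Assumption: the provider accepts HTTP Basic authentication.
        mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        mgr.add_password(None, req.url, req.user, req.password)
        handlers.append(urllib.request.HTTPBasicAuthHandler(mgr))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(req.url, timeout=timeout) as resp:
        return resp.read()
```

In a command-line tool the argument list would typically be `sys.argv[1:]`. A URL that itself contains a comma would be split incorrectly under this assumed syntax.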
Step S102: judge whether the obtained XML file is valid, which specifically comprises:
Judging, according to the syntactic properties of XML, whether the obtained XML file is valid; if it is valid, performing step S103; if it is not valid, obtaining the XML file from the network data provider again.
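Step S102 can be sketched as a well-formedness check followed by a retry. The sketch assumes that "valid according to the syntactic properties of XML" means the document parses without error. The retry limit `max_attempts` and the function names are illustrative assumptions, since the patent does not say how many times the file is fetched again.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional


def is_well_formed(data: bytes) -> bool:
    """Return True if the bytes parse as a well-formed XML document."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def fetch_until_valid(fetch: Callable[[], bytes],
                      max_attempts: int = 3) -> Optional[bytes]:
    """Call fetch() until it yields well-formed XML or attempts run out.

    Returns the XML bytes on success, or None if every attempt failed.
    """
    for _ in range(max_attempts):
        data = fetch()
        if is_well_formed(data):
            return data  # valid: the caller proceeds to the analysis step
    return None
```

Bounding the retries keeps a permanently broken feed from looping forever; an unbounded loop would be closer to the literal wording of the step.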
Step S103: analyze the XML file; if it conforms to the standard format of Really Simple Syndication (RSS), the XML file is in RSS format; otherwise it is in non-standard RSS format.
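The patent does not spell out what "conforms to the standard format of RSS" means. The sketch below is one assumed interpretation: the root element is `rss` with a `version` attribute, and it has a `channel` child containing `title`, `link`, and `description`, which are the required channel elements in RSS 2.0. The function name `classify_xml` and the returned labels are hypothetical, and RDF-based RSS 1.0 feeds, which use namespaces, are deliberately not handled.

```python
import xml.etree.ElementTree as ET

# Required children of <channel> in RSS 2.0.
REQUIRED_CHANNEL_FIELDS = ("title", "link", "description")


def classify_xml(data: bytes) -> str:
    """Label already well-formed XML as 'rss' or 'non-standard'."""
    root = ET.fromstring(data)
    if root.tag != "rss" or root.get("version") is None:
        return "non-standard"
    channel = root.find("channel")
    if channel is None:
        return "non-standard"
    if any(channel.find(name) is None for name in REQUIRED_CHANNEL_FIELDS):
        return "non-standard"
    return "rss"
```

Anything that is well-formed XML but fails these checks is treated as non-standard, which matches the two-way split described in the step.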
Step S104: store the XML file in the target database adaptively according to its format, which specifically comprises:
When the format of the XML file is RSS, storing it in the target database after parsing; or, when the format of the XML file is non-standard RSS, storing it directly in the target database,
Wherein storing the XML file in the target database after parsing when its format is RSS specifically comprises:
When the format of the XML file is RSS, the XML file is parsed and then stored, in a line-scan manner, in the T_XmlRss table of the target database;
Wherein storing the XML file directly in the target database when its format is non-standard RSS specifically comprises:
When the format of the XML file is non-standard RSS, the XML is stored directly in the T_XmlOriginal table of the target database.
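A minimal sketch of step S104 follows, using SQLite as a stand-in for the unspecified target database. Only the table names `T_XmlRss` and `T_XmlOriginal` come from the text. The column names, the functions `init_db` and `store_xml`, the `fmt` labels, and the reading of the line-scan step as "one row per `item` element" are all assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def init_db(conn: sqlite3.Connection) -> None:
    """Create the two tables named in the text (columns are assumed)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS T_XmlRss "
        "(source_url TEXT, title TEXT, link TEXT, description TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS T_XmlOriginal "
        "(source_url TEXT, raw_xml TEXT)"
    )


def store_xml(conn: sqlite3.Connection, source_url: str,
              data: bytes, fmt: str) -> None:
    """Store parsed RSS items, or the raw document for any other format."""
    if fmt == "rss":
        root = ET.fromstring(data)
        # One database row per <item>, visited in document order.
        rows = [
            (source_url, item.findtext("title"),
             item.findtext("link"), item.findtext("description"))
            for item in root.iter("item")
        ]
        conn.executemany("INSERT INTO T_XmlRss VALUES (?, ?, ?, ?)", rows)
    else:
        # Non-standard RSS: keep the document as-is for later handling.
        conn.execute(
            "INSERT INTO T_XmlOriginal VALUES (?, ?)",
            (source_url, data.decode("utf-8", errors="replace")),
        )
    conn.commit()
```

Keeping the raw text of non-standard documents means no data is lost when a feed does not match the expected structure; it can be re-parsed later once its layout is understood.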
The technical solution of this embodiment of the present invention brings the following beneficial effect: XML files in different formats, including RSS and non-standard RSS, are intelligently identified from the Internet and stored in the target database, which improves the flexibility of obtaining data from the Internet and gives users more convenient, real-time network resources.
Embodiment 2 of the present invention provides a system for obtaining data from the Internet which, as shown in Fig. 2, comprises:
An acquiring unit 201, configured to obtain an Extensible Markup Language (XML) file from a network data provider;
Wherein the acquiring unit specifically comprises an input unit, an analysis unit, and a reading unit, wherein:
The input unit 2011 is configured to pass in the XML address as a parameter according to the user's request;
The analysis unit 2012 is configured to analyze the XML address to obtain the corresponding URL link;
The reading unit 2013 is configured to obtain the XML file by reading the URL link.
A judging unit 202, configured to judge whether the obtained XML file is valid, specifically by:
Judging, according to the syntactic properties of XML, whether the obtained XML file is valid.
An analysis unit 203, configured to analyze the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS), and is otherwise in non-standard RSS format.
A storage unit 204, configured to store XML files of different formats adaptively in a target database, and further comprising a parsing unit 2031 configured to store the XML file in the target database after parsing when its format is RSS, or to store it directly in the target database when its format is non-standard RSS,
Wherein storing the XML file in the target database after parsing when its format is RSS specifically comprises:
When the format of the XML file is RSS, the XML file is parsed and then stored, in a line-scan manner, in the T_XmlRss table of the target database.
Wherein storing the XML file directly in the target database when its format is non-standard RSS specifically comprises:
When the format of the XML file is non-standard RSS, the XML is stored directly in the T_XmlOriginal table of the target database.
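The division into units can be sketched as a small class whose fields stand for the acquiring, judging, analysis, and storage units of Fig. 2. This is only an illustration of how the units might cooperate. The class name `DataAcquisitionSystem`, the callable signatures, and the bounded retry count are assumptions, not details taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataAcquisitionSystem:
    acquire: Callable[[str], bytes]            # acquiring unit 201
    judge: Callable[[bytes], bool]             # judging unit 202
    analyze: Callable[[bytes], str]            # analysis unit 203
    store: Callable[[str, bytes, str], None]   # storage unit 204
    max_attempts: int = 3                      # assumed retry limit

    def run(self, address: str) -> bool:
        """Fetch, validate, classify, and store one XML address.

        Returns True once a valid file has been stored, False if every
        attempt produced an invalid file.
        """
        for _ in range(self.max_attempts):
            data = self.acquire(address)
            if self.judge(data):
                self.store(address, data, self.analyze(data))
                return True
        return False
```

Passing the units in as callables mirrors the statement that modules may be merged or split: any of the four can be replaced without touching the others.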
The technical solution of this embodiment of the present invention brings the following beneficial effect: XML files in different formats, including RSS and non-standard RSS, are intelligently identified from the Internet and stored in the target database, which improves the flexibility of obtaining data from the Internet and gives users more convenient, real-time network resources.
From the description of the above embodiments, a person skilled in the art can clearly understand that the present invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention may be embodied as a software product, which may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard drive) and which includes instructions that cause a computer device (such as a personal computer, a server, or a network device) to perform the methods described in the embodiments of the present invention.
A person skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the modules or flows in the drawings are not necessarily required for implementing the present invention.
A person skilled in the art will appreciate that the modules of the apparatus in an embodiment may be distributed in the apparatus as described in the embodiment, or may be changed accordingly and located in one or more apparatuses different from that of the embodiment. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
The above discloses only several specific embodiments of the present invention; however, the present invention is not limited to them, and any variation that a person skilled in the art can conceive of shall fall within the scope of protection of the present invention.
Claims (10)
1. A method for obtaining data from the Internet, characterized by comprising:
Obtaining an Extensible Markup Language (XML) file from a network data provider;
Judging whether the obtained XML file is valid; if it is valid, analyzing the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS), and is otherwise in non-standard RSS format; if it is not valid, obtaining the XML file from the network data provider again;
Storing the XML file in a target database adaptively according to its format, which specifically comprises:
When the format of the XML file is RSS, storing it in the target database after parsing; or, when the format of the XML file is non-standard RSS, storing it directly in the target database.
2. The method of claim 1, characterized in that obtaining the Extensible Markup Language (XML) file from the network data provider specifically comprises:
Passing in the XML address as a parameter according to the user's request;
Analyzing the XML address to obtain the corresponding URL link;
Obtaining the XML file by reading the URL link.
3. The method of claim 1, characterized in that judging whether the obtained XML file is valid specifically comprises:
Judging, according to the syntactic properties of XML, whether the obtained XML file is valid.
4. The method of claim 1, characterized in that storing the XML file in the target database after parsing when its format is RSS specifically comprises:
When the format of the XML file is RSS, the XML file is parsed and then stored, in a line-scan manner, in the T_XmlRss table of the target database.
5. The method of claim 1, characterized in that storing the XML file directly in the target database when its format is non-standard RSS specifically comprises:
When the format of the XML file is non-standard RSS, the XML is stored directly in the T_XmlOriginal table of the target database.
6. A system for obtaining data from the Internet, characterized by comprising:
An acquiring unit, configured to obtain an Extensible Markup Language (XML) file from a network data provider;
A judging unit, configured to judge whether the obtained XML file is valid;
An analysis unit, configured to analyze the XML file, wherein the XML file is in RSS format if it conforms to the standard format of Really Simple Syndication (RSS), and is otherwise in non-standard RSS format;
A storage unit, configured to store XML files of different formats adaptively in a target database, and further comprising a parsing unit configured to store the XML file in the target database after parsing when its format is RSS, or to store it directly in the target database when its format is non-standard RSS.
7. The system of claim 6, characterized in that the acquiring unit specifically comprises an input unit, an analysis unit, and a reading unit, wherein:
The input unit is configured to pass in the XML address as a parameter according to the user's request;
The analysis unit is configured to analyze the XML address to obtain the corresponding URL link;
The reading unit is configured to obtain the XML file by reading the URL link.
8. The system of claim 6, characterized in that the judging unit is specifically configured to:
Judge, according to the syntactic properties of XML, whether the obtained XML file is valid.
9. The system of claim 6, characterized in that storing the XML file in the target database after parsing when its format is RSS specifically comprises:
When the format of the XML file is RSS, the XML file is parsed and then stored, in a line-scan manner, in the T_XmlRss table of the target database.
10. The system of claim 6, characterized in that storing the XML file directly in the target database when its format is non-standard RSS specifically comprises: when the format of the XML file is non-standard RSS, the XML is stored directly in the T_XmlOriginal table of the target database.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210126411.6A CN102799602B (en) | 2012-04-26 | 2012-04-26 | A kind of method and system that data are obtained from internet |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210126411.6A CN102799602B (en) | 2012-04-26 | 2012-04-26 | A kind of method and system that data are obtained from internet |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102799602A (en) | 2012-11-28 |
CN102799602B (en) | 2018-03-16 |
Family
ID=47198714
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210126411.6A Expired - Fee Related CN102799602B (en) | 2012-04-26 | 2012-04-26 | A kind of method and system that data are obtained from internet |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102799602B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1672523A2 (en) * | 2004-12-20 | 2006-06-21 | Microsoft Corporation | Method and system for linking data ranges of a computer-generated document with associated extensible markup language elements |
CN2852542Y (en) * | 2005-11-07 | 2006-12-27 | 国网北京电力建设研究院 | Wireless communication base station for transmission line monitoring |
CN101739421A (en) * | 2008-11-21 | 2010-06-16 | 上海电机学院 | XML-based data integration information exchange platform |
CN101763419A (en) * | 2009-12-28 | 2010-06-30 | 山东大学 | Method for synchronously updating remote rss data by local database |
US7752224B2 (en) * | 2005-02-25 | 2010-07-06 | Microsoft Corporation | Programmability for XML data store for documents |
Also Published As
Publication number | Publication date |
---|---|
CN102799602B (en) | 2018-03-16 |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20190320. Address after: 100195 No. 621, 6th floor, No. 1 Building, 131 North West Fourth Ring Road, Haidian District, Beijing. Patentee after: Beijing Xinaote Intelligent Sports Innovation Development Co., Ltd. Address before: 100195 new technology building, 49 Wukesong Road, Haidian District, Beijing. Patentee before: China Digital Video (Beijing) Limited |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180316. Termination date: 20200426 |