CN104462547A - Configurable webpage data acquisition method and system - Google Patents
- Publication number
- CN104462547A
- Authority
- CN
- China
- Prior art keywords
- configuration
- information
- website
- module
- content pages
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/957—Browsing optimisation, e.g. caching or content distillation
- G06F16/9577—Optimising the visualization of content, e.g. distillation of HTML documents
Abstract
The invention relates to a configurable webpage data acquisition method and system, especially suited to situations in which the way webpage data is acquired must be updated continuously. The method comprises the following steps: S1, configuration information for webpage data acquisition is obtained from a database; S2, the required classified websites are obtained and logged in to according to the configuration information; S3, the themes to be acquired from the websites are obtained according to the website information after login; S4, the required website content is acquired according to the configuration information and the acquired themes; S5, the required information is extracted from the acquired content pages by means of a regular expression in a configured data table or according to a certain rule; S6, the extracted table data is stored in the database. With this method and system, a user can freely configure the webpage data to be acquired and acquire the relevant data across the whole network according to the configured acquisition scheme, achieving flexible and convenient webpage data acquisition.
Description
Technical field
The present invention relates to the field of network communication technology and, more particularly, to a configurable webpage data collection method and system suited to situations in which the way webpage data is collected must be updated continuously.
Background
With the rapid development of Web technology and Web applications and the arrival of the big data era, applications such as monitoring of Web sites (social platforms in particular), corporate public opinion monitoring, user data collection and big data mining are increasingly widespread, and all trades and professions depend more and more heavily on the Internet and on Internet information. Internet data is massive, however, so how can the data we need be extracted?
Collection systems currently on the market target only one particular website or a few websites; there is no configurable webpage data collection method that can be directed at specified data.
A web page layout may use tables, DIVs, or a mixture of both, so errors or exceptions can occur during collection. After a collected website is redesigned, the program must be developed again, which increases development cost.
A system therefore has to be developed to collect this data. Because every website has its own design and presentation, all websites cannot be collected with a single parsing approach. To avoid writing a parser for each website and modifying the program whenever a website is redesigned, it is necessary to develop a general, configurable webpage data collection system.
Summary of the invention
The technical problem to be solved by the present invention is that existing webpage data collection systems can collect from only one or a few websites and are therefore narrow in scope and of limited practicality. To address this defect, the invention provides a configurable method and system for collecting webpage data with a wide range of application.
The technical solution by which the present invention solves the above technical problem is as follows: a configurable webpage data collection method, the method comprising:
S1, obtaining the configuration information for webpage data collection from a database, the configuration information comprising: the configured classification information of the collection websites, the configured collection theme template information, the configured collection content page template information and the configured data table information;
S2, according to the configured classification information of the collection websites, determining whether the collection website classification is enabled; if so, enabling the classification and obtaining the classified website; otherwise, ending the program;
S3, according to the configured classification information of the collection websites, determining whether to log in to the collected classified website; if so, logging in to the classified website; otherwise, logging in to the classified website by means of a virtual login web page;
S4, according to the configured collection theme template information, obtaining the themes to be collected from the website;
S5, according to the collected theme, determining whether the content of the theme spans multiple pages; if so, obtaining a list of page URLs from the pagination footer; otherwise, obtaining the content page of the theme directly;
S6, intercepting the collection content according to the start marker and end marker of the content page, and obtaining the set of content page URLs according to an expression;
S7, according to the configured collection content page template information, obtaining the content pages to be collected;
S8, according to the collected content page, determining whether it spans multiple pages; if so, obtaining the list of page URLs from the pagination footer and then intercepting the content according to the start marker and end marker of the content page; otherwise, intercepting the content of the content page directly according to the start marker and end marker;
S9, according to the configured data table information, obtaining the expression or relevant rule corresponding to each field and extracting the table data;
S10, storing the extracted table data in the database.
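The marker-based interception in steps S6 and S8 can be illustrated with a minimal sketch. The patent does not give an implementation; the function name `intercept` and the comment-style markers below are assumptions chosen only for illustration.

```python
from typing import Optional


def intercept(html: str, start_marker: str, end_marker: str) -> Optional[str]:
    """Return the text between start_marker and end_marker.

    Hypothetical helper: returns None when either marker is missing,
    so the caller can treat the page as a collection failure.
    """
    start = html.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    # Search for the end marker only after the start marker.
    end = html.find(end_marker, start)
    if end == -1:
        return None
    return html[start:end]


# Example page; the markers are assumed to come from the configured template.
page = '<html><div id="nav">menu</div><!--begin-->Body text<!--end--><div>footer</div></html>'
body = intercept(page, "<!--begin-->", "<!--end-->")
```

Keeping the markers in configuration rather than in code is what lets a site redesign be absorbed by editing the template instead of the program.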
In the configurable webpage data collection method of the present invention, the collection attribute information comprises: the collection URL, the collection website encoding and the collection frequency.
The collection URL is used for collecting the web page addresses that match the configuration;
The collection website encoding is used for collecting the source code of the website;
The collection frequency is set to once every 5 minutes.
In the configurable webpage data collection method of the present invention, the data table information comprises: the collection title, the collection time, the collection content and the source of the collection content.
The collection title is the title of the collected content page;
The collection content is the content of the collected content page;
The source of the collection content is the information on where the content of the collected content page comes from.
In the configurable webpage data collection method of the present invention, the configuration of the configuration information of step S1 comprises:
A, configuring the classification and the collection attribute information of the collection website;
B, configuring the collection theme template information;
C, configuring the collection content page template information;
D, storing the configuration information in the database so that it can be retrieved for later use.
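A minimal sketch of configuration steps A to D follows, assuming the configuration is serialized as JSON and kept in SQLite. The patent names neither a storage engine nor a schema, so the `CollectionConfig` fields, the table name `collection_config` and the helper functions are all hypothetical.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass, field


@dataclass
class CollectionConfig:
    # Step A: classification and collection attributes (names are assumptions).
    category: str
    enabled: bool
    url: str
    encoding: str
    frequency_minutes: int
    # Step B: theme template, step C: content page template.
    theme_template: dict = field(default_factory=dict)
    content_template: dict = field(default_factory=dict)
    # Data table information: one expression per field.
    field_patterns: dict = field(default_factory=dict)


def save_config(conn: sqlite3.Connection, cfg: CollectionConfig) -> None:
    """Step D: store the configuration so it can be retrieved later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collection_config "
        "(category TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO collection_config VALUES (?, ?)",
        (cfg.category, json.dumps(asdict(cfg))),
    )
    conn.commit()


def load_config(conn: sqlite3.Connection, category: str) -> CollectionConfig:
    """Retrieve a stored configuration by category (step S1 of the method)."""
    row = conn.execute(
        "SELECT body FROM collection_config WHERE category = ?", (category,)
    ).fetchone()
    if row is None:
        raise KeyError(category)
    return CollectionConfig(**json.loads(row[0]))


conn = sqlite3.connect(":memory:")
cfg = CollectionConfig(
    category="news",
    enabled=True,
    url="http://example.com/news/",
    encoding="utf-8",
    frequency_minutes=5,
    content_template={"start": "<!--begin-->", "end": "<!--end-->"},
    field_patterns={"title": r"<h1>(.*?)</h1>"},
)
save_config(conn, cfg)
loaded = load_config(conn, "news")
```

Because everything site-specific lives in the stored record, adding a new website becomes a matter of inserting a row rather than changing code.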
A configurable webpage data collection system is also constructed, comprising: a start module, a configuration retrieval module, a judgment module, a configuration information acquisition module, a database, a content interception module and a storage module;
The database is used for storing the configuration information and the table data;
The configuration information acquisition module is used for configuring the webpage data that the user needs to collect;
The configuration information acquisition module comprises a website acquisition module, a website theme acquisition module, a content page acquisition module and a table data acquisition module, wherein:
The website acquisition module is used for obtaining the classified websites required by the user;
The website theme acquisition module is used for obtaining the themes required by the user in the classified website;
The content page acquisition module is used for obtaining the content pages required by the user in the theme;
The table data acquisition module is used for obtaining the table data in the content pages.
The judgment module comprises: a first judgment module, a second judgment module, a third judgment module and a fourth judgment module;
The content interception module comprises: a first content interception module and a second content interception module.
The start module is used for starting the configurable webpage data collection system;
The configuration retrieval module is used for retrieving from the database the configuration information required for collection;
The first judgment module is used for determining whether the classification and collection attributes of the collection website have been configured and whether the collection website classification is enabled; if so, the classification is enabled and the classified website is obtained; otherwise, the program ends;
The second judgment module is used for determining whether to log in to the collected classified website; if so, it logs in to the website; otherwise, it logs in to the classified website by means of a virtual login web page;
The website theme acquisition module is used for obtaining, according to the configured website theme template information, the required themes of the logged-in classified website;
The third judgment module is used for determining whether the theme content spans multiple pages; if so, the list of page URLs is obtained from the pagination footer and the content pages are obtained through this list; otherwise, the content page of the theme is obtained directly;
The first content interception module is used for intercepting content information by means of the start marker and end marker of the content page;
The content page acquisition module is used for obtaining the required content pages from the website theme according to the configured collection content page information;
The fourth judgment module is used for determining whether the content page spans multiple pages; if so, the list of page URLs is obtained from the pagination footer and the content of the content pages is then intercepted according to the start marker and end marker; otherwise, the content of the content page is intercepted directly according to the start marker and end marker;
The second content interception module is used for intercepting content information by means of the start marker and end marker of the web content page;
The table data extraction module is used for extracting the table data, according to the configured collection data table information, with the expression or rule corresponding to each field;
The storage module is used for storing the extracted data in the database.
In the configurable webpage data collection system of the present invention, before executing, the website acquisition module first determines whether the website is enabled and whether it is logged in; if so, the modules for obtaining the website theme and the content pages are run; otherwise, the process ends.
In the configurable webpage data collection system of the present invention, if the fourth judgment module encounters a multi-page situation, the paged content is collected by merging the data in a loop.
Implementing the configurable webpage data collection method and system of the present invention has the following beneficial effects: users can freely configure the webpage data and the conditions to be collected, collect the relevant data across the whole network through the configured collection scheme, and thereby collect data content from any web page flexibly and conveniently.
Brief description of the drawings
Fig. 1 is a flowchart of the first preferred embodiment of the configurable webpage data collection method of the present invention;
Fig. 2 is a flowchart of the second preferred embodiment of the configurable webpage data collection method of the present invention;
Fig. 3 is a flowchart of the configuration step of the first or second preferred embodiment of the configurable webpage data collection method of the present invention;
Fig. 4 is a block diagram of the configurable webpage data collection system of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the present invention and are not intended to limit it.
As shown in Fig. 1, in the flowchart of the first preferred embodiment of the configurable webpage data collection method of the present invention, the method starts at step S100 and then proceeds as follows.
Step S110: the configuration information for webpage data collection is obtained from the database. This configuration information comprises the configured classification information of the collection websites, the configured collection theme template information, the configured collection content page template information and the configured data table information.
Step S120: according to the configured classification information of the collection websites, it is determined whether the collection website classification is enabled; if so, the classification is enabled and the classified website is obtained; otherwise, the program ends.
Step S130: according to the configured classification information of the collection websites, it is determined whether to log in to the collected classified website; if so, the classified website is logged in to; otherwise, the classified website is logged in to by means of a virtual login web page.
Step S140: according to the configured collection theme template information, the themes to be collected from the website are obtained.
Step S150: according to the collected theme, it is determined whether the theme content spans multiple pages; if so, the list of page URLs is obtained from the pagination footer and the content pages are obtained through this list; otherwise, the content page of the theme is obtained directly.
Step S160: the collection content is intercepted according to the start marker and end marker of the content page, and the set of page URLs of the content pages is obtained according to an expression.
Step S170: according to the configured collection content page template information, the content pages to be collected are obtained.
Step S180: according to the collected content page, it is determined whether it spans multiple pages; if so, the list of page URLs is obtained from the pagination footer and the content of the content pages is then intercepted according to the start marker and end marker; otherwise, the content of the content page is intercepted directly according to the start marker and end marker.
Step S190: according to the configured data table information, the expression or relevant rule corresponding to each field is obtained and the table data is extracted.
Step S200: the extracted table data is stored in the database. The method ends at step S210.
Further, the collection attribute information comprises: the collection URL, the collection website encoding and the collection frequency.
Further, the data table information comprises: the collection title, the collection time, the collection content and the source of the collection content.
Further, the expression is a regular expression. For example, the collection time is found by a regular expression, and the regular expression for extracting the date is: \d{4}(-|/|\.)\d{1,2}\1\d{1,2}.
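The date expression can be exercised directly. The sketch below assumes the pattern reads `\d{4}(-|/|\.)\d{1,2}\1\d{1,2}`, with the backslashes restored and the dot escaped (the escaping is an assumption). The backreference `\1` requires the second separator to match the first; the helper name `find_date` is hypothetical.

```python
import re
from typing import Optional

# Four-digit year, a separator (-, / or .), month, the same separator, day.
DATE_PATTERN = re.compile(r"\d{4}(-|/|\.)\d{1,2}\1\d{1,2}")


def find_date(text: str) -> Optional[str]:
    """Return the first date-like substring in text, or None."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


first = find_date("Posted 2014-12-25 10:00")
second = find_date("Updated: 2015/3/25")
mixed = find_date("2014/12-25")  # separators differ, so no match
```

Note that the pattern checks only the shape of the date; it does not reject values such as month 13.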
The configurable webpage data collection method of the present invention gives users a way to customize the webpage data collection they need, increasing its practicality and effectiveness.
As shown in Fig. 2, in the flowchart of the second preferred embodiment of the configurable webpage data collection method of the present invention, the method starts at step S300 and proceeds to step S310, in which the configuration information for webpage data collection is obtained from the database. Next, in step S320, the classified websites are obtained according to the configured classification information of the collection websites. Next, in step S330, the themes to be collected from the website are obtained according to the configured collection theme information. Next, in step S340, the required web page content is collected according to the collected themes. Next, in step S350, the information of the content pages is extracted according to the configured data table information, using a regular expression or a certain rule. Next, in step S360, the extracted table data is stored in the database. The method ends at step S370.
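The simplified flow of the second embodiment (collect content pages, extract fields by expression, store the rows) can be sketched as below. The sketch assumes SQLite as the database, a caller-supplied `fetch` function in place of real HTTP access, and one capturing group per field pattern; the table name `collected` and the field names are hypothetical.

```python
import re
import sqlite3
from typing import Callable, Dict, List


def collect(
    fetch: Callable[[str], str],
    content_urls: List[str],
    field_patterns: Dict[str, str],
    conn: sqlite3.Connection,
) -> int:
    """Extract configured fields from each content page and store them.

    Each pattern must contain exactly one capturing group; a field
    whose pattern does not match is stored as NULL.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collected (url TEXT, title TEXT, collected_at TEXT)"
    )
    stored = 0
    for url in content_urls:
        html = fetch(url)
        row = {}
        for name, pattern in field_patterns.items():
            match = re.search(pattern, html, re.S)
            row[name] = match.group(1).strip() if match else None
        conn.execute(
            "INSERT INTO collected VALUES (?, ?, ?)",
            (url, row.get("title"), row.get("collected_at")),
        )
        stored += 1
    conn.commit()
    return stored


# In-memory stand-in for the web: URL -> HTML.
pages = {
    "http://example.com/a.html": "<h1> First </h1><p>2014-12-25</p>",
    "http://example.com/b.html": "<h1>Second</h1><p>no date</p>",
}
patterns = {
    "title": r"<h1>(.*?)</h1>",
    "collected_at": r"(\d{4}-\d{1,2}-\d{1,2})",
}
conn = sqlite3.connect(":memory:")
count = collect(pages.__getitem__, list(pages), patterns, conn)
rows = conn.execute("SELECT title, collected_at FROM collected ORDER BY url").fetchall()
```

Injecting `fetch` keeps the sketch runnable offline; a real collector would pass a function that performs the HTTP request and decodes the response with the configured website encoding.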
The configurable webpage data collection method of the present invention gives users a way to customize the webpage data collection they need that is simpler and more convenient to use, increasing its practicality and effectiveness.
As shown in Fig. 3, in the flowchart of the configuration step of the first or second preferred embodiment of the configurable webpage data collection method of the present invention, the configuration step starts at step S400 and proceeds to step S410, in which the classification and collection attributes of the collection website are configured. Next, in step S420, the collection theme template is configured. Next, in step S430, the collection content page template is configured. Next, in step S440, the configuration information is stored in the database so that it can be retrieved for later use. The configuration step ends at step S450.
The configuration step of the present invention is clear and simple to implement. It provides the conditions needed to search for and collect the data of the relevant websites in detail, and it facilitates the execution of the method flow.
As shown in Fig. 4, in the block diagram of the configurable webpage data collection system of the present invention, the system comprises: a start module 510, a configuration retrieval module 520, a judgment module 530, a configuration information acquisition module 540, a content interception module 550, a storage module 560 and a database 570;
The judgment module 530 comprises: a first judgment module 531, a second judgment module 532, a third judgment module 533 and a fourth judgment module 534;
The content interception module 550 comprises: a first content interception module 551 and a second content interception module 552;
The database 570 is used for storing the configuration information and the table data;
The configuration information acquisition module 540 comprises: a website acquisition module 541, a website theme acquisition module 542, a content page acquisition module 543 and a table data acquisition module 544.
The start module 510 is used for starting the configurable webpage data collection system;
The configuration retrieval module 520 is used for retrieving from the database the configuration information required for collection;
The first judgment module 531 is used for determining whether the classification and collection attributes of the collection website have been configured and whether the collection website classification is enabled; if so, the classification is enabled; otherwise, the program ends;
The website acquisition module 541 is used for obtaining the required websites from the various kinds of websites according to the configured classification and attribute information of the collection website;
The second judgment module 532 is used for determining whether to log in to the collected classified website; if so, it logs in to the website; otherwise, it logs in to the website by means of a virtual login web page;
The website theme acquisition module 542 is used for obtaining the required theme information of the logged-in website according to the configured website theme template information;
The third judgment module 533 is used for determining whether the theme content spans multiple pages; if so, the list of page URLs is obtained from the pagination footer; otherwise, the web page content of the theme is obtained directly;
The first content interception module 551 is used for intercepting content information by means of the start marker and end marker of the web page content;
The content page acquisition module 543 is used for obtaining the required content page information from the website theme according to the configured collection content page information;
The fourth judgment module 534 is used for determining whether the content page spans multiple pages; if so, the list of page URLs is obtained from the pagination footer and the content is then intercepted according to the start marker and end marker; otherwise, the content is intercepted directly according to the start marker and end marker;
The second content interception module 552 is used for intercepting content information by means of the start marker and end marker of the web content page;
The table data acquisition module 544 is used for extracting the table data, according to the configured collection data table information, with the expression or rule corresponding to each field;
The storage module 560 is used for storing the extracted data in the database.
Further, before executing, the website acquisition module first determines whether the website is enabled and whether it is logged in; if so, the modules for obtaining the website theme and the content pages are run; otherwise, the process ends.
Further, if the fourth judgment module encounters a multi-page situation, the paged content is collected by merging the data in a loop.
Compared with the prior art, the advantage of the configurable webpage data collection method and system of the present invention is that users can freely configure the webpage data to be collected and collect the relevant data across the whole network through the configured collection scheme, achieving flexible and convenient webpage data collection.
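One reading of the loop merge used for paged content is to intercept the marked region of every page in turn and join the pieces into a single content value. The sketch below follows that reading; the patent does not define the merge, so the function name `merge_paged_content`, the separator and the choice to skip pages that lack the markers are assumptions.

```python
from typing import Callable, List


def merge_paged_content(
    fetch: Callable[[str], str],
    urls: List[str],
    start_marker: str,
    end_marker: str,
    separator: str = "\n",
) -> str:
    """Loop over the pages of one article and merge their intercepted content."""
    parts = []
    for url in urls:
        html = fetch(url)
        start = html.find(start_marker)
        if start == -1:
            continue  # page without markers contributes nothing
        start += len(start_marker)
        end = html.find(end_marker, start)
        if end == -1:
            continue
        parts.append(html[start:end].strip())
    return separator.join(parts)


# In-memory stand-in for the pages of one paged article.
pages = {
    "p1": "x<!--s-->Part one <!--e-->y",
    "p2": "<!--s-->Part two<!--e-->",
    "p3": "no markers",
}
merged = merge_paged_content(pages.__getitem__, ["p1", "p2", "p3"], "<!--s-->", "<!--e-->")
```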
The foregoing describes only embodiments of the present invention and does not thereby limit its patent scope. Any equivalent structural transformation made using the content of the description and drawings of the present invention, or any direct or indirect use in other related technical fields, is likewise included in the scope of patent protection of the present invention.
Claims (9)
1. A configurable webpage data collection method, characterized in that the method comprises:
S1, obtaining the configuration information for webpage data collection from a database, the configuration information comprising: the configured classification information of the collection websites, the configured collection theme template information, the configured collection content page template information and the configured data table information;
S2, according to the configured classification information of the collection websites, obtaining the classified websites to be collected;
S3, according to the configured collection theme template information, obtaining the themes to be collected in the collected classified websites;
S4, according to the configured collection content page template information, obtaining the content pages to be collected from the collected themes;
S5, according to the configured data table information, obtaining the expression or relevant rule corresponding to each field and extracting the table data from the collected content pages;
S6, storing the extracted table data in the database.
2. The configurable webpage data collection method according to claim 1, characterized in that, after step S2, it is determined whether to log in to the collected classified website; if so, the classified website is logged in to; otherwise, the classified website is logged in to by means of a virtual login web page.
3. The configurable webpage data collection method according to claim 1, characterized in that, according to the theme obtained in step S3, it is determined whether the theme spans multiple pages; if so, the list of page URLs is obtained from the pagination footer and the content pages are obtained through this list; otherwise, the content page is obtained directly.
4. The configurable webpage data collection method according to claim 1, characterized in that, according to the content page obtained in step S4, it is determined whether it spans multiple pages; if so, the list of page URLs is obtained from the pagination footer and the content of the content pages is intercepted according to the start marker and end marker of the content pages; otherwise, the content of the content page is intercepted directly according to its start marker and end marker.
5. The configurable webpage data collection method according to claim 1, characterized in that the collection attribute information comprises: the collection URL, the collection website encoding and the collection frequency.
6. The configurable webpage data collection method according to claim 1, characterized in that the data table information comprises: the collection title, the collection time, the collection content and the source of the collection content.
7. The configurable webpage data collection method according to claim 1, characterized in that the configuration of the configuration information of step S1 comprises:
A, configuring the classification and collection attributes of the collection website;
B, configuring the collection theme template;
C, configuring the collection content page template;
D, storing the configuration information in the database for later retrieval.
8. A configurable webpage data collection system, characterized in that it comprises a database, a configuration retrieval module and a configuration information acquisition module, wherein:
The configuration retrieval module is used for obtaining the configuration information for webpage data collection from the database;
The database is used for storing the configuration information and the table data;
The configuration information acquisition module comprises a website acquisition module, a website theme acquisition module, a content page acquisition module and a table data acquisition module, wherein:
The website acquisition module is used for obtaining the classified websites to be collected according to the configured classification information of the collection websites;
The website theme acquisition module is used for obtaining the themes to be collected in the collected classified websites according to the configured collection theme template information;
The content page acquisition module is used for obtaining the content pages to be collected from the collected themes according to the configured collection content page template information;
The table data acquisition module is used for obtaining the expression or relevant rule corresponding to each field and extracting the table data from the collected content pages.
9. The configurable webpage data collection system according to claim 8, characterized in that the website acquisition module is further used for determining, after execution, whether the classified website is enabled; if so, the website theme acquisition module and the content page acquisition module are run; otherwise, the process ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410822548.4A CN104462547B (en) | 2014-12-25 | 2014-12-25 | A kind of method and system of configurable collecting webpage data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410822548.4A CN104462547B (en) | 2014-12-25 | 2014-12-25 | A kind of method and system of configurable collecting webpage data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104462547A (en) | 2015-03-25 |
CN104462547B (en) | 2019-04-02 |
Family
ID=52908582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410822548.4A Active CN104462547B (en) | 2014-12-25 | 2014-12-25 | A kind of method and system of configurable collecting webpage data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104462547B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915334A (en) * | 2015-05-29 | 2015-09-16 | 浪潮软件集团有限公司 | Automatic extraction method of key information of bidding project based on semantic analysis |
CN106022126A (en) * | 2016-05-06 | 2016-10-12 | 哈尔滨工程大学 | A webpage feature extracting method for WEB Trojan horse detection |
CN106341470A (en) * | 2016-08-31 | 2017-01-18 | 北京量科邦信息技术有限公司 | Method for keeping conversation and grasping continuously-updated data of conversation |
CN106547749A (en) * | 2015-09-16 | 2017-03-29 | 北京国双科技有限公司 | The method and apparatus of collecting webpage data |
CN108520043A (en) * | 2018-03-30 | 2018-09-11 | 纳思达股份有限公司 | Data object acquisition method, apparatus and system, computer readable storage medium |
CN108549678A (en) * | 2018-04-02 | 2018-09-18 | 北京今朝在线科技有限公司 | Information acquisition system |
CN108763279A (en) * | 2018-04-11 | 2018-11-06 | 北京中科闻歌科技股份有限公司 | A kind of web data distribution template acquisition method and system |
CN109902220A (en) * | 2019-02-27 | 2019-06-18 | 腾讯科技(深圳)有限公司 | Webpage information acquisition methods, device and computer readable storage medium |
CN110188259A (en) * | 2019-05-27 | 2019-08-30 | 厦门商集网络科技有限责任公司 | A kind of data grab method and device of configurableization |
CN110334259A (en) * | 2019-04-22 | 2019-10-15 | 新分享科技服务(深圳)有限公司 | Webpage data acquiring method, device and computer readable storage medium |
CN111953766A (en) * | 2020-08-07 | 2020-11-17 | 福建省天奕网络科技有限公司 | Method and system for collecting network data |
CN112667872A (en) * | 2020-11-17 | 2021-04-16 | 国家计算机网络与信息安全管理中心 | Real-time acquisition method of new coronary pneumonia epidemic situation data |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101034997A (en) * | 2006-03-09 | 2007-09-12 | 新数通兴业科技(北京)有限公司 | Method and system for accurately publishing the data information |
CN101561802A (en) * | 2008-04-18 | 2009-10-21 | 上海复旦光华信息科技股份有限公司 | Web page structural data extraction method and system |
CN103593344A (en) * | 2012-08-13 | 2014-02-19 | 北大方正集团有限公司 | Information acquisition method and device |
- 2014-12-25: CN application CN201410822548.4A, patent CN104462547B, status Active
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104915334A (en) * | 2015-05-29 | 2015-09-16 | 浪潮软件集团有限公司 | Automatic extraction method of key information of bidding project based on semantic analysis |
CN106547749B (en) * | 2015-09-16 | 2021-02-12 | 北京国双科技有限公司 | Webpage data acquisition method and device |
CN106547749A (en) * | 2015-09-16 | 2017-03-29 | 北京国双科技有限公司 | Method and apparatus for collecting webpage data |
CN106022126A (en) * | 2016-05-06 | 2016-10-12 | 哈尔滨工程大学 | A webpage feature extracting method for WEB Trojan horse detection |
CN106022126B (en) * | 2016-05-06 | 2018-07-24 | 哈尔滨工程大学 | Webpage feature extraction method for WEB Trojan horse detection |
CN106341470A (en) * | 2016-08-31 | 2017-01-18 | 北京量科邦信息技术有限公司 | Method for keeping conversation and grasping continuously-updated data of conversation |
CN108520043A (en) * | 2018-03-30 | 2018-09-11 | 纳思达股份有限公司 | Data object acquisition method, apparatus and system, computer readable storage medium |
CN108549678A (en) * | 2018-04-02 | 2018-09-18 | 北京今朝在线科技有限公司 | Information acquisition system |
CN108763279A (en) * | 2018-04-11 | 2018-11-06 | 北京中科闻歌科技股份有限公司 | Web data distribution template acquisition method and system |
CN109902220A (en) * | 2019-02-27 | 2019-06-18 | 腾讯科技(深圳)有限公司 | Webpage information acquisition method, device and computer readable storage medium |
CN109902220B (en) * | 2019-02-27 | 2023-11-24 | 腾讯科技(深圳)有限公司 | Webpage information acquisition method, device and computer readable storage medium |
CN110334259A (en) * | 2019-04-22 | 2019-10-15 | 新分享科技服务(深圳)有限公司 | Webpage data acquiring method, device and computer readable storage medium |
CN110188259A (en) * | 2019-05-27 | 2019-08-30 | 厦门商集网络科技有限责任公司 | Configurable data capture method and device |
CN111953766A (en) * | 2020-08-07 | 2020-11-17 | 福建省天奕网络科技有限公司 | Method and system for collecting network data |
CN112667872A (en) * | 2020-11-17 | 2021-04-16 | 国家计算机网络与信息安全管理中心 | Real-time acquisition method for novel coronavirus pneumonia epidemic data |
Also Published As
Publication number | Publication date |
---|---|
CN104462547B (en) | 2019-04-02 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN104462547A (en) | Configurable webpage data acquisition method and system | |
Wang et al. | How do developers react to RESTful API evolution? | |
US10621255B2 (en) | Identifying equivalent links on a page | |
CN103714115A (en) | Method and device for loading web page content | |
WO2014101783A1 (en) | Method and server for performing cloud detection for malicious information | |
CN103544176A (en) | Method and device for generating page structure template corresponding to multiple pages | |
US11200244B2 (en) | Keyword reporting for mobile applications | |
CN104899220A (en) | Application program recommendation method and system | |
CN103294781A (en) | Method and equipment used for processing page data | |
US9355137B2 (en) | Displaying articles matching a user's interest based on key words and the number of comments | |
CN103853757A (en) | Method and system for displaying information of network, terminal and information displaying and processing device | |
CN103279548A (en) | Method for performing barrier-free detection on websites | |
CN105117434A (en) | Webpage classification method and webpage classification system | |
EP3220285A1 (en) | Data acquisition program, data acquisition method and data acquisition device | |
CN102004805B (en) | Webpage denoising system and method based on maximum similarity matching | |
AU2014209089A1 (en) | Systems and methods for semantic URL handling | |
CN103365961A (en) | Accurate search-oriented website structurization labeling method and system | |
US20170235835A1 (en) | Information identification and extraction | |
CN104298786B (en) | Image search method and device | |
JP5216654B2 (en) | Importance determination device, importance determination method, and program | |
CN103246680A (en) | Method and device for aggregating and displaying webpage contents in browser | |
CN103377207B (en) | Microblog users relation acquisition method based on script engine | |
CN105550279A (en) | Vision-based list page identification method | |
CN106339381B (en) | Information processing method and device | |
JP5380874B2 (en) | Information retrieval method, program and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||