CN104462547A - Configurable webpage data acquisition method and system - Google Patents

Publication number
CN104462547A
Authority
CN
China
Prior art keywords
configuration
information
website
module
content pages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410822548.4A
Other languages
Chinese (zh)
Other versions
CN104462547B (en)
Inventor
吴正辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN LAN-YOU TECHNOLOG Co Ltd
Original Assignee
SHENZHEN LAN-YOU TECHNOLOG Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN LAN-YOU TECHNOLOG Co Ltd
Priority to CN201410822548.4A
Publication of CN104462547A
Application granted
Publication of CN104462547B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/957: Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577: Optimising the visualization of content, e.g. distillation of HTML documents

Abstract

The invention relates to a configurable webpage data acquisition method and system, especially suited to situations in which the way webpage data is acquired must be updated continuously. The method comprises the following steps: S1, configuration information for webpage data acquisition is obtained from a database; S2, the required classified websites are obtained and logged in to according to the configuration information; S3, the topics to be acquired from the websites are obtained according to the website information after login; S4, the required website content is acquired according to the configuration information and the acquired topics; S5, the required information is extracted from the acquired content pages through a regular expression in a configured data table or according to a certain rule; S6, the extracted table data is stored in the database. With this method and system, a user can freely configure the webpage data to be acquired and acquire relevant data across the whole network according to the configured acquisition scheme, which makes webpage data acquisition flexible and convenient.

Description

A method and system for configurable webpage data acquisition
Technical field
The present invention relates to the field of network communication technology, and more particularly to a configurable webpage data acquisition method and system for situations in which the acquisition mode for webpage data must be updated continuously.
Background art
With the rapid development of Web technology and Web applications and the arrival of the big data era, applications such as the monitoring of Web application websites (social platforms in particular), corporate public opinion monitoring, user data collection, and big data mining are becoming more and more widespread, and all trades and professions rely increasingly on the internet and on internet information. The data on the internet is massive, however, so how can the data we need be extracted?
Acquisition systems currently on the market target only one website or a few websites; there is no configurable webpage data acquisition method that targets specified data.
A webpage layout may use tables, DIVs, or a mixture of both, so errors or exceptions can occur during data acquisition; moreover, the program has to be redeveloped after an acquired website is redesigned, which increases development cost.
A system therefore has to be developed to acquire this data. Each website has its own design and presentation, so not all websites can be acquired with the same parsing approach. To avoid writing a parsing method for every website and modifying the program after every website redesign, it is necessary to develop a general, configurable webpage data acquisition system.
Summary of the invention
Technical matters to be solved by this invention is, one or several website can only be gathered for existing collecting webpage data system, there is unicity and the not strong defect of practicality, provide a kind of configurable, the method and system of operation strategies configurable collecting webpage data widely.
The technical scheme that the present invention solves the problems of the technologies described above is as follows: a kind of method of configurable collecting webpage data, and the method comprises:
S1, from database, obtain the configuration information of collecting webpage data, this configuration information comprises: configuration gathers the classified information of website, and configuration gathers theme Template Information, and configuration gathers content pages Template Information and configuration data table information;
S2, gather the classified information of website according to configuration, judge whether to enable the classification gathering website, if it is enable the classification gathering website, obtain classifieds website, otherwise terminate program;
S3, gather the classified information of website according to configuration, judge whether to log in the classifieds website collected, if it is log in this classifieds website, otherwise virtual for employing log-on webpage is logged in this classifieds website;
S4, gather theme Template Information according to configuration, the required theme gathered under obtaining website;
S5, according to the theme gathered, judge whether the content of this theme exists multi-page situation, if it is obtain a list of websites information according to point footers, otherwise the content pages of direct this theme of acquisition;
S6, to intercept according to the opening flag of content pages and end mark and gather content, and obtain the network address set of content pages according to expression formula;
S7, according to configuration collection content pages Template Information, obtain gather content pages;
S8, according to gather content pages, judge whether it exists multi-page situation, if it is the list of websites information of multi-page is obtained according to point footers, then according to opening flag and the end mark intercepting content of content pages, otherwise the content of content pages is directly intercepted according to opening flag and end mark;
S9, obtain expression formula corresponding to field according to the data table information of configuration or dependency rule extracts list data;
S10, the list data extracted to be stored in database.
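The second half of step S6, obtaining the set of content page URLs from an intercepted fragment by means of an expression, could look roughly like the following. This is a minimal sketch under stated assumptions: the link expression, the function name `content_page_urls`, and the example URLs are hypothetical and do not come from the patent.

```python
import re
from urllib.parse import urljoin

# Hypothetical expression a user might configure for a topic page:
# it captures the href value of each anchor tag.
LINK_EXPRESSION = r'<a\s+[^>]*href="([^"]+)"'


def content_page_urls(fragment: str, base_url: str,
                      expression: str = LINK_EXPRESSION) -> list:
    """Return absolute content page URLs found in the fragment,
    in document order and without duplicates."""
    seen = set()
    urls = []
    for href in re.findall(expression, fragment):
        absolute = urljoin(base_url, href)  # resolve relative links
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


fragment = (
    '<li><a href="/news/1.html">A</a></li>'
    '<li><a class="x" href="/news/2.html">B</a></li>'
    '<li><a href="/news/1.html">A again</a></li>'
)
print(content_page_urls(fragment, "http://example.com/list/"))
```

Because the expression is passed in as data, a site redesign would only require changing the stored expression, not the code, which is the point of the configurable approach.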
In the configurable webpage data acquisition method of the present invention, the acquisition attribute information comprises: the acquisition URL, the acquisition website encoding, and the acquisition frequency.
The acquisition URL is used to acquire the webpage addresses that match the configuration;
The acquisition website encoding is used to acquire the source code of the website;
The acquisition frequency is set to once every 5 minutes.
In the configurable webpage data acquisition method of the present invention, the data table information comprises: the acquisition title, the acquisition time, the acquisition content, and the source of the acquisition content.
The acquisition title is used to acquire the title of the content page;
The acquisition content is used to acquire the content of the content page;
The source of the acquisition content is used to acquire information on where the content of the content page comes from.
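A configuration record covering the acquisition attributes and data table fields listed above might be modeled as follows. This is only an illustrative sketch: all class and field names are hypothetical, and the "website encoding" attribute is assumed here to mean the character encoding of the site.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SiteConfig:
    category: str                  # classification of the acquisition website
    url: str                       # acquisition URL
    enabled: bool = True           # whether this classification is enabled
    requires_login: bool = False
    encoding: str = "utf-8"        # assumed meaning of "website encoding"
    frequency_minutes: int = 5     # the text sets this to every 5 minutes


@dataclass
class FieldRule:
    name: str
    expression: Optional[str] = None  # regular expression or rule for the field


def default_fields() -> list:
    # The four data table fields named in the text.
    return [FieldRule(n) for n in ("title", "acquired_at", "content", "source")]


@dataclass
class AcquisitionConfig:
    site: SiteConfig
    topic_template: dict = field(default_factory=dict)
    content_template: dict = field(default_factory=dict)
    table_fields: list = field(default_factory=default_fields)


config = AcquisitionConfig(
    site=SiteConfig(category="news", url="http://example.com/news/")
)
print(config.site.frequency_minutes, [f.name for f in config.table_fields])
```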
In the configurable webpage data acquisition method of the present invention, the configuration steps for the configuration information of step S1 comprise:
a, configure the classification of the acquisition website and the acquisition attribute information;
b, configure the acquisition topic template information;
c, configure the acquisition content page template information;
d, store the configuration information in the database so that it can be retrieved for later use.
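Storing the configuration in a database and retrieving it later (configuration step d and method step S1) can be sketched with Python's built-in `sqlite3` module. The table name, column names, and the choice to serialize the configuration as JSON are assumptions made for illustration; the patent does not specify a schema.

```python
import json
import sqlite3


def save_config(conn: sqlite3.Connection, name: str, config: dict) -> None:
    """Store one named configuration, replacing any earlier version."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS acquisition_config "
        "(name TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO acquisition_config (name, body) VALUES (?, ?)",
        (name, json.dumps(config)),
    )
    conn.commit()


def load_config(conn: sqlite3.Connection, name: str):
    """Return the stored configuration, or None if the name is unknown."""
    row = conn.execute(
        "SELECT body FROM acquisition_config WHERE name = ?", (name,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
save_config(conn, "news", {"category": "news", "frequency_minutes": 5})
print(load_config(conn, "news"))
```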
A system for configurable webpage data acquisition is also provided, comprising: a start module, a configuration retrieval module, a judgment module, an acquisition configuration information module, a database, a content interception module, and a storage module;
The database is used to store the configuration information and the table data;
The acquisition configuration information module is used to configure the webpage data that the user needs to acquire;
The acquisition configuration information module comprises a website acquisition module, a website topic acquisition module, a content page acquisition module, and a table data acquisition module, wherein:
The website acquisition module is used to obtain the classified website required by the user;
The website topic acquisition module is used to obtain the topics required by the user in the classified website;
The content page acquisition module is used to obtain the content pages required by the user within a topic;
The table data acquisition module is used to obtain the table data in a content page.
The judgment module comprises: a first judgment module, a second judgment module, a third judgment module, and a fourth judgment module;
The content interception module comprises: a first content interception module and a second content interception module;
The acquisition configuration information module comprises: the website acquisition module, the website topic acquisition module, the content page acquisition module, and the table data acquisition module.
The start module is used to start the configurable webpage data acquisition system;
The configuration retrieval module is used to retrieve from the database the configuration information required for acquisition;
The first judgment module is used to judge whether the classification and acquisition attributes of the acquisition website have been configured and whether the acquisition website classification is enabled; if so, the classification is enabled and the classified website is obtained; otherwise the program ends;
The second judgment module is used to judge whether to log in to the obtained classified website; if so, it logs in to the website; otherwise it logs in to the classified website through a simulated login page;
The website topic acquisition module is used to obtain the required topics of the logged-in classified website according to the configured website topic template information;
The third judgment module is used to judge whether the topic content spans multiple pages; if so, the list of page URLs is obtained according to the pagination marker and the content pages are obtained through this list; otherwise the content page of the topic is obtained directly;
The first content interception module is used to intercept content information through the start flag and end flag of the content page;
The content page acquisition module is used to obtain the required content pages from the website topic according to the configured acquisition content page information;
The fourth judgment module is used to judge whether the content page spans multiple pages; if so, the list of page URLs is obtained according to the pagination marker and the content of the content page is then intercepted according to the start flag and end flag; otherwise the content of the content page is intercepted directly according to the start flag and end flag;
The second content interception module is used to intercept content information through the start flag and end flag of the webpage content page;
The table data extraction module is used to extract the table data through the expression or rule corresponding to each field, according to the configured acquisition data table information;
The storage module is used to store the extracted data in the database.
In the configurable webpage data acquisition system of the present invention, the website acquisition module first judges, before execution, whether logging in to the website is enabled; if so, the modules for obtaining website topics and content pages are executed; otherwise the process ends.
In the configurable webpage data acquisition system of the present invention, if the fourth judgment module encounters a multi-page situation, the data is acquired by looping over the pages and merging the results when the paged content is collected.
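The loop-and-merge handling of multi-page content can be illustrated as follows. This is a hedged sketch: `merge_paged_content`, the `fetch` and `extract` callables, and the in-memory pages are hypothetical stand-ins, and joining the pieces with a newline is just one possible way of merging.

```python
from typing import Callable, Iterable


def merge_paged_content(page_urls: Iterable[str],
                        fetch: Callable[[str], str],
                        extract: Callable[[str], str]) -> str:
    """Fetch every page once, extract its content, and merge the pieces
    in page order."""
    seen = set()
    parts = []
    for url in page_urls:
        if url in seen:  # a pager may link to the same page more than once
            continue
        seen.add(url)
        piece = extract(fetch(url))
        if piece:
            parts.append(piece)
    return "\n".join(parts)


# In-memory stand-in for the network, keyed by hypothetical page identifiers.
pages = {"p1": "<b>first</b>", "p2": "<b>second</b>"}


def strip_tags(html: str) -> str:
    return html.replace("<b>", "").replace("</b>", "")


print(merge_paged_content(["p1", "p2", "p1"], pages.__getitem__, strip_tags))
```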
Implementing the configurable webpage data acquisition method and system of the present invention has the following beneficial effects: users can freely configure the webpage data and the conditions for acquisition, and acquire relevant data across the whole network through the configured acquisition scheme, so that data content can be acquired from any webpage flexibly and conveniently.
Brief description of the drawings
Fig. 1 is a flowchart of the first preferred embodiment of the configurable webpage data acquisition method of the present invention;
Fig. 2 is a flowchart of the second preferred embodiment of the configurable webpage data acquisition method of the present invention;
Fig. 3 is a flowchart of the configuration information step of the first or second preferred embodiment of the configurable webpage data acquisition method of the present invention;
Fig. 4 is a block diagram of the configurable webpage data acquisition system of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions, and advantages of the present invention clearer, the invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
As shown in Fig. 1, in the flowchart of the first preferred embodiment of the configurable webpage data acquisition method of the present invention, the method starts at step S100. After step S100 it proceeds to step S110, in which the configuration information for webpage data acquisition is obtained from the database; this configuration information comprises the configured acquisition website classification information, the configured acquisition topic template information, the configured acquisition content page template information, and the configured data table information. Next, in step S120, whether the acquisition website classification is enabled is judged according to the configured acquisition website classification information; if so, the classification is enabled and the classified website is obtained; otherwise the program ends. Next, in step S130, whether to log in to the obtained classified website is judged according to the configured acquisition website classification information; if so, the classified website is logged in to; otherwise the classified website is logged in to through a simulated login page. Next, in step S140, the topics to be acquired on the website are obtained according to the configured acquisition topic template information. Next, in step S150, whether the topic content spans multiple pages is judged according to the acquired topic; if so, the list of page URLs is obtained according to the pagination marker and the content pages are obtained through this list; otherwise the content page of the topic is obtained directly. Next, in step S160, the content to be acquired is intercepted according to the start flag and end flag of the content page, and the set of URLs of the content pages is obtained according to an expression. Next, in step S170, the content pages to be acquired are obtained according to the configured acquisition content page template information. Next, in step S180, whether the acquired content page spans multiple pages is judged; if so, the list of page URLs is obtained according to the pagination marker and the content of the content page is then intercepted according to the start flag and end flag; otherwise the content of the content page is intercepted directly according to the start flag and end flag. Next, in step S190, the expression or related rule corresponding to each field is obtained according to the configured data table information and the table data is extracted. Next, in step S200, the extracted table data is stored in the database. Finally, the method ends at step S210.
Further, the acquisition attribute information comprises: the acquisition URL, the acquisition website encoding, and the acquisition frequency.
Further, the data table information comprises: the acquisition title, the acquisition time, the acquisition content, and the source of the acquisition content.
Further, the expression is a regular expression. For example, to find the acquisition time, the regular expression for extracting a date is: \d{4}(-|/|\.)\d{1,2}\1\d{1,2}.
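A minimal sketch of date extraction follows, assuming the intended pattern is `\d{4}(-|/|\.)\d{1,2}\1\d{1,2}`: a four-digit year, a separator, the month, the same separator again (the `\1` backreference), and the day. The function name and sample strings are hypothetical.

```python
import re
from typing import Optional

# \1 refers back to whichever separator matched in the first group,
# so mixed separators such as "2014/12-25" are rejected.
DATE_PATTERN = re.compile(r"\d{4}(-|/|\.)\d{1,2}\1\d{1,2}")


def find_date(text: str) -> Optional[str]:
    """Return the first date-like substring, or None if there is none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


print(find_date("Published 2014-12-25 10:00"))
print(find_date("Updated 2014/12-25"))
```

The pattern only checks the shape of the date, not its validity, so a string such as "2014-99-99" would still match.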
The configurable webpage data acquisition method of the present invention provides users with a way to acquire webpage data according to a customizable configuration, which increases its practicality and effectiveness.
As shown in Fig. 2, in the flowchart of the second preferred embodiment of the configurable webpage data acquisition method of the present invention, the method starts at step S300. After step S300 it proceeds to step S310, in which the configuration information for webpage data acquisition is obtained from the database. Next, in step S320, the classified website is obtained according to the configured acquisition website classification information. Next, in step S330, the topics to be acquired on the website are obtained according to the configured acquisition topic information. Next, in step S340, the required webpage content is acquired according to the obtained topics. Next, in step S350, the information of the content pages is extracted using the regular expression or a certain rule given by the configured data table information. Next, in step S360, the extracted table data is stored in the database. Finally, the method ends at step S370.
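The simplified flow of this embodiment (read a configuration, follow the topic page to its content pages, extract fields by expression, store the rows) can be sketched end to end. Everything concrete here is an assumption for illustration: the configuration keys, the expressions, the table schema, and the in-memory `PAGES` dictionary that stands in for real HTTP requests.

```python
import re
import sqlite3

# Stand-in for the network: URL -> HTML (all hypothetical).
PAGES = {
    "http://example.com/news/":
        '<ul><a href="http://example.com/news/1.html">one</a></ul>',
    "http://example.com/news/1.html":
        "<h1>Title one</h1><span>2014-12-25</span><div>Body one</div>",
}

# Stand-in for the configuration that would be read from the database.
CONFIG = {
    "topic_url": "http://example.com/news/",
    "link_expression": r'href="([^"]+)"',
    "fields": {
        "title": r"<h1>(.*?)</h1>",
        "acquired_at": r"(\d{4}-\d{1,2}-\d{1,2})",
        "content": r"<div>(.*?)</div>",
    },
}


def run(config: dict, fetch, conn: sqlite3.Connection) -> int:
    """Acquire every content page linked from the topic page and
    return the number of rows stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS collected "
        "(title TEXT, acquired_at TEXT, content TEXT, source TEXT)"
    )
    topic_html = fetch(config["topic_url"])
    stored = 0
    for url in re.findall(config["link_expression"], topic_html):
        html = fetch(url)
        row = {}
        for name, expression in config["fields"].items():
            match = re.search(expression, html, re.S)
            row[name] = match.group(1) if match else None
        conn.execute(
            "INSERT INTO collected VALUES (?, ?, ?, ?)",
            (row["title"], row["acquired_at"], row["content"], url),
        )
        stored += 1
    conn.commit()
    return stored


conn = sqlite3.connect(":memory:")
print(run(CONFIG, PAGES.__getitem__, conn))
```

Login handling, pagination, and start/end flag interception are left out here to keep the sketch short.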
The configurable webpage data acquisition method of the present invention provides users with a way to acquire webpage data according to a customizable configuration, in a form that is simpler and more user-friendly, which increases its practicality and effectiveness.
As shown in Fig. 3, in the flowchart of the configuration information step of the first or second preferred embodiment of the configurable webpage data acquisition method of the present invention, the configuration information step starts at step S400. After step S400 it proceeds to step S410, in which the classification and acquisition attributes of the acquisition website are configured. Next, in step S420, the acquisition topic template is configured. Next, in step S430, the acquisition content page template is configured. Next, in step S440, the configuration information is stored in the database so that it can be retrieved for later use. Finally, the configuration step ends at step S450.
The flow of the configuration information step of the present invention is clear to implement and provides detailed condition support for searching and acquiring the data of the relevant websites, which facilitates carrying out the method flow.
As shown in Fig. 4, in the block diagram of the configurable webpage data acquisition system of the present invention, the system comprises: a start module 510, a configuration retrieval module 520, a judgment module 530, an acquisition configuration information module 540, a content interception module 550, a storage module 560, and a database 570;
The judgment module 530 comprises: a first judgment module 531, a second judgment module 532, a third judgment module 533, and a fourth judgment module 534;
The content interception module 550 comprises: a first content interception module 551 and a second content interception module 552;
The database 570 is used to store the configuration information and the table data;
The acquisition configuration information module 540 comprises: a website acquisition module 541, a website topic acquisition module 542, a content page acquisition module 543, and a table data acquisition module 544.
The start module 510 is used to start the configurable webpage data acquisition system;
The configuration retrieval module 520 is used to retrieve from the database the configuration information required for acquisition;
The first judgment module 531 is used to judge whether the classification and acquisition attributes of the acquisition website have been configured and whether the acquisition website classification is enabled; if so, the classification is enabled; otherwise the program ends;
The website acquisition module 541 is used to obtain the required website from the various classes of websites according to the configured classification and attribute information of the acquisition website;
The second judgment module 532 is used to judge whether to log in to the obtained classified website; if so, it logs in to the website; otherwise it logs in to the website through a simulated login page;
The website topic acquisition module 542 is used to obtain the required topic information of the logged-in website according to the configured website topic template information;
The third judgment module 533 is used to judge whether the topic content spans multiple pages; if so, the list of page URLs is obtained according to the pagination marker; otherwise the webpage content of the topic is obtained directly;
The first content interception module 551 is used to intercept content information through the start flag and end flag of the webpage content;
The content page acquisition module 543 is used to obtain the required content page information from the website topic according to the configured acquisition content page information;
The fourth judgment module 534 is used to judge whether the content page spans multiple pages; if so, the list of page URLs is obtained according to the pagination marker and the content is then intercepted according to the start flag and end flag; otherwise the content is intercepted directly according to the start flag and end flag;
The second content interception module 552 is used to intercept content information through the start flag and end flag of the webpage content page;
The table data acquisition module 544 is used to extract the table data through the expression or rule corresponding to each field, according to the configured acquisition data table information;
The storage module 560 is used to store the extracted data in the database.
Further, the website acquisition module first judges, before execution, whether logging in to the website is enabled; if so, the modules for obtaining website topics and content pages are executed; otherwise the process ends.
Further, if the fourth judgment module encounters a multi-page situation, the data is acquired by looping over the pages and merging the results when the paged content is collected.
Compared with the prior art, the advantage of the configurable webpage data acquisition method and system of the present invention is that users can freely configure the webpage data to be acquired and acquire relevant data across the whole network through the configured acquisition scheme, which makes webpage data acquisition flexible and convenient.
The foregoing describes only embodiments of the present invention and does not thereby limit the patent scope of the present invention; any equivalent structural transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims (9)

1. A method for configurable webpage data acquisition, characterized in that the method comprises:
S1, obtain the configuration information for webpage data acquisition from a database, the configuration information comprising: the configured acquisition website classification information, the configured acquisition topic template information, the configured acquisition content page template information, and the configured data table information;
S2, according to the configured acquisition website classification information, obtain the classified website to be acquired;
S3, according to the configured acquisition topic template information, obtain the topics to be acquired in the acquired classified website;
S4, according to the configured acquisition content page template information, obtain the content pages to be acquired from the acquired topics;
S5, according to the configured data table information, obtain the expression or related rule corresponding to each field and extract the table data from the acquired content pages;
S6, store the extracted table data in the database.
2. The method for configurable webpage data acquisition according to claim 1, characterized in that, after step S2, it is judged whether to log in to the obtained classified website; if so, the classified website is logged in to; otherwise the classified website is logged in to through a simulated login page.
3. The method for configurable webpage data acquisition according to claim 1, characterized in that, according to the topic obtained in step S3, it is judged whether the topic spans multiple pages; if so, the list of page URLs is obtained according to the pagination marker and the content pages are obtained through this list; otherwise the content page is obtained directly.
4. The method for configurable webpage data acquisition according to claim 1, characterized in that, according to the content page obtained in step S4, it is judged whether the content page spans multiple pages; if so, the list of page URLs is obtained according to the pagination marker and the content of the content page is intercepted according to the start flag and end flag of the content page; otherwise the content of the content page is intercepted directly according to the start flag and end flag of the content page.
5. The method for configurable webpage data acquisition according to claim 1, characterized in that the acquisition attribute information comprises: the acquisition URL, the acquisition website encoding, and the acquisition frequency.
6. The method for configurable webpage data acquisition according to claim 1, characterized in that the data table information comprises: the acquisition title, the acquisition time, the acquisition content, and the source of the acquisition content.
7. The method for configurable webpage data acquisition according to claim 1, characterized in that the configuration steps for the configuration information of step S1 comprise:
a, configure the classification and acquisition attributes of the acquisition website;
b, configure the acquisition topic template;
c, configure the acquisition content page template;
d, store the configuration information in the database for later retrieval and use.
8. A system for configurable webpage data acquisition, characterized in that it comprises a database, a configuration information obtaining module, and an acquisition configuration information module, wherein:
The configuration information obtaining module is used to obtain the configuration information for webpage data acquisition from the database;
The database is used to store the configuration information and the table data;
The acquisition configuration information module comprises a website acquisition module, a website topic acquisition module, a content page acquisition module, and a table data acquisition module, wherein:
The website acquisition module is used to obtain the classified website to be acquired according to the configured acquisition website classification information;
The website topic acquisition module is used to obtain the topics to be acquired in the acquired classified website according to the configured acquisition topic template information;
The content page acquisition module is used to obtain the content pages to be acquired from the acquired topics according to the configured acquisition content page template information;
The table data acquisition module is used to obtain the expression or related rule corresponding to each field and to extract the table data from the acquired content pages.
9. The system for configurable webpage data acquisition according to claim 8, characterized in that the website acquisition module is further used to judge, after execution, whether the classified website is enabled; if so, the website topic acquisition module and the content page acquisition module are executed; otherwise the process ends.
CN201410822548.4A 2014-12-25 2014-12-25 A kind of method and system of configurable collecting webpage data Active CN104462547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410822548.4A CN104462547B (en) 2014-12-25 2014-12-25 A kind of method and system of configurable collecting webpage data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410822548.4A CN104462547B (en) 2014-12-25 2014-12-25 A kind of method and system of configurable collecting webpage data

Publications (2)

Publication Number Publication Date
CN104462547A (en) 2015-03-25
CN104462547B (en) 2019-04-02

Family

ID=52908582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410822548.4A Active CN104462547B (en) 2014-12-25 2014-12-25 A kind of method and system of configurable collecting webpage data

Country Status (1)

Country Link
CN (1) CN104462547B (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915334A (en) * 2015-05-29 2015-09-16 浪潮软件集团有限公司 Automatic extraction method of key information of bidding project based on semantic analysis
CN106022126A (en) * 2016-05-06 2016-10-12 哈尔滨工程大学 A webpage feature extracting method for WEB Trojan horse detection
CN106341470A (en) * 2016-08-31 2017-01-18 北京量科邦信息技术有限公司 Method for keeping conversation and grasping continuously-updated data of conversation
CN106547749A (en) * 2015-09-16 2017-03-29 北京国双科技有限公司 The method and apparatus of collecting webpage data
CN108520043A (en) * 2018-03-30 2018-09-11 纳思达股份有限公司 Data object acquisition method, apparatus and system, computer readable storage medium
CN108549678A (en) * 2018-04-02 2018-09-18 北京今朝在线科技有限公司 Information acquisition system
CN108763279A (en) * 2018-04-11 2018-11-06 北京中科闻歌科技股份有限公司 A web data distribution template acquisition method and system
CN109902220A (en) * 2019-02-27 2019-06-18 腾讯科技(深圳)有限公司 Webpage information acquisition method, device and computer readable storage medium
CN110188259A (en) * 2019-05-27 2019-08-30 厦门商集网络科技有限责任公司 A configurable data capture method and device
CN110334259A (en) * 2019-04-22 2019-10-15 新分享科技服务(深圳)有限公司 Webpage data acquiring method, device and computer readable storage medium
CN111953766A (en) * 2020-08-07 2020-11-17 福建省天奕网络科技有限公司 Method and system for collecting network data
CN112667872A (en) * 2020-11-17 2021-04-16 国家计算机网络与信息安全管理中心 Real-time acquisition method for COVID-19 epidemic data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101034997A (en) * 2006-03-09 2007-09-12 新数通兴业科技(北京)有限公司 Method and system for accurately publishing the data information
CN101561802A (en) * 2008-04-18 2009-10-21 上海复旦光华信息科技股份有限公司 Web page structural data extraction method and system
CN103593344A (en) * 2012-08-13 2014-02-19 北大方正集团有限公司 Information acquisition method and device

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915334A (en) * 2015-05-29 2015-09-16 浪潮软件集团有限公司 Automatic extraction method of key information of bidding project based on semantic analysis
CN106547749B (en) * 2015-09-16 2021-02-12 北京国双科技有限公司 Webpage data acquisition method and device
CN106547749A (en) * 2015-09-16 2017-03-29 北京国双科技有限公司 Webpage data acquisition method and device
CN106022126A (en) * 2016-05-06 2016-10-12 哈尔滨工程大学 A webpage feature extraction method for WEB Trojan horse detection
CN106022126B (en) * 2016-05-06 2018-07-24 哈尔滨工程大学 A webpage feature extraction method for WEB Trojan horse detection
CN106341470A (en) * 2016-08-31 2017-01-18 北京量科邦信息技术有限公司 Method for maintaining a session and capturing continuously updated session data
CN108520043A (en) * 2018-03-30 2018-09-11 纳思达股份有限公司 Data object acquisition method, apparatus and system, computer readable storage medium
CN108549678A (en) * 2018-04-02 2018-09-18 北京今朝在线科技有限公司 Information acquisition system
CN108763279A (en) * 2018-04-11 2018-11-06 北京中科闻歌科技股份有限公司 A web data distribution template acquisition method and system
CN109902220A (en) * 2019-02-27 2019-06-18 腾讯科技(深圳)有限公司 Webpage information acquisition method, device and computer readable storage medium
CN109902220B (en) * 2019-02-27 2023-11-24 腾讯科技(深圳)有限公司 Webpage information acquisition method, device and computer readable storage medium
CN110334259A (en) * 2019-04-22 2019-10-15 新分享科技服务(深圳)有限公司 Webpage data acquiring method, device and computer readable storage medium
CN110188259A (en) * 2019-05-27 2019-08-30 厦门商集网络科技有限责任公司 A configurable data capture method and device
CN111953766A (en) * 2020-08-07 2020-11-17 福建省天奕网络科技有限公司 Method and system for collecting network data
CN112667872A (en) * 2020-11-17 2021-04-16 国家计算机网络与信息安全管理中心 Real-time acquisition method for COVID-19 epidemic data

Also Published As

Publication number Publication date
CN104462547B (en) 2019-04-02

Similar Documents

Publication Publication Date Title
CN104462547A (en) Configurable webpage data acquisition method and system
Wang et al. How do developers react to RESTful API evolution?
US10621255B2 (en) Identifying equivalent links on a page
CN103714115A (en) Method and device for loading web page content
WO2014101783A1 (en) Method and server for performing cloud detection for malicious information
CN103544176A (en) Method and device for generating page structure template corresponding to multiple pages
US11200244B2 (en) Keyword reporting for mobile applications
CN104899220A (en) Application program recommendation method and system
CN103294781A (en) Method and equipment used for processing page data
US9355137B2 (en) Displaying articles matching a user's interest based on key words and the number of comments
CN103853757A (en) Method and system for displaying information of network, terminal and information displaying and processing device
CN103279548A (en) Method for performing barrier-free detection on websites
CN105117434A (en) Webpage classification method and webpage classification system
EP3220285A1 (en) Data acquisition program, data acquisition method and data acquisition device
CN102004805B (en) Webpage denoising system and method based on maximum similarity matching
AU2014209089A1 (en) Systems and methods for semantic URL handling
CN103365961A (en) Accurate search-oriented website structurization labeling method and system
US20170235835A1 (en) Information identification and extraction
CN104298786B (en) A kind of image search method and device
JP5216654B2 (en) Importance determination device, importance determination method, and program
CN103246680A (en) Method and device for aggregating and displaying webpage contents in browser
CN103377207B (en) Microblog users relation acquisition method based on script engine
CN105550279A (en) Vision-based list page identification method
CN106339381B (en) Information processing method and device
JP5380874B2 (en) Information retrieval method, program and apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant