CN108763279A - Webpage data distributed template acquisition method and system - Google Patents

Webpage data distributed template acquisition method and system

Info

Publication number
CN108763279A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810319851.0A
Other languages
Chinese (zh)
Other versions
CN108763279B (en)
Inventor
方省
王海亮
皇秋曼
王磊
罗引
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Song Polytron Technologies Inc
Original Assignee
Beijing Zhongke Song Polytron Technologies Inc
Application filed by Beijing Zhongke Song Polytron Technologies Inc
Priority to CN201810319851.0A
Publication of CN108763279A
Application granted
Publication of CN108763279B
Legal status: Active

Abstract

The present invention relates to a webpage data distributed template acquisition method and system. The method includes: importing data acquisition templates into different data tables according to webpage type and storing them there; obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected; and distributing the data acquisition templates in the template pool to at least two acquisition clients, each of which extracts data from the webpage according to the data acquisition template, the results then being integrated to obtain the webpage data of the webpage. By building different data acquisition templates, selecting the template that corresponds to the type of the webpage to be collected, and having multiple acquisition clients each collect data from the webpage using that template, the embodiments of the present invention help ensure the accuracy and completeness of the data.

Description

Webpage data distributed template acquisition method and system
Technical field
The present invention relates to the technical field of data acquisition, and in particular to a webpage data distributed template acquisition method and system.
Background
With the rapid development of the Internet of Things and the rise of big data, the demand for data keeps growing: not only is a larger volume of data required, but the requirements on data quality are also rising. Data quality directly determines the quality of the conclusions drawn from big data analysis, and good data greatly improves the accuracy of that analysis. Under these circumstances, data acquisition technology becomes particularly important.
Summary of the invention
To solve the problems existing in the prior art, at least one embodiment of the present invention provides a webpage data distributed template acquisition method, including:
Configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage;
Obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected, and placing the obtained data acquisition template in a template pool;
Distributing the data acquisition templates in the template pool to at least two acquisition clients, each acquisition client extracting data from the webpage according to the data acquisition template to obtain the webpage data of the webpage.
Based on the above technical solution, the embodiments of the present invention may also be improved as follows.
Optionally, the data acquisition template includes: a site-level template, a channel-level template and a body-text-level template;
The site-level template includes: site name, site address, encoding format, country, language and channel list;
The channel-level template includes: channel name, channel address, encoding format, category attribute, whether proxy access is required, and page identifier;
The body-text-level template includes: title parsing, body text parsing, publication time parsing, author parsing, source parsing and picture parsing.
Optionally, configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage, specifically includes:
S11: configure a site-level template according to the type of the webpage to obtain a site template, and determine whether the webpage has a channel address; if so, execute S12; otherwise, the site template is the data acquisition template of the webpage, and S14 is executed;
S12: configure a channel-level template based on the site template to obtain a channel template, and determine whether body text exists at the channel address of the webpage; if so, execute S13; otherwise, the channel template is the data acquisition template, and S14 is executed;
S13: configure a body-text-level template based on the channel template to obtain the data acquisition template;
S14: import the different data acquisition templates into different data tables according to the type of the webpage for storage, and set up a service interface corresponding to the data table.
Optionally, obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected, and placing the obtained data acquisition template in a template pool, specifically includes:
S21: the acquisition server calls the service interface and obtains the corresponding data acquisition template from the data table according to the type of the webpage to be collected;
S22: the acquisition server places the obtained data acquisition template in the template pool and monitors in real time the number of data acquisition templates in the template pool;
S23: when the number of data acquisition templates in the template pool is less than a preset value, execute S21; when the number of data acquisition templates in the template pool is greater than or equal to the preset value, distribute the data acquisition templates in the template pool to at least two acquisition clients.
Optionally, distributing the data acquisition templates in the template pool to at least two acquisition clients specifically includes:
When an acquisition client calls the service interface, the acquisition server distributes a data acquisition template in the template pool to that acquisition client, and also distributes the same data acquisition template to at least one other acquisition client.
Optionally, each acquisition client extracting data from the webpage according to the data acquisition template to obtain the webpage data of the webpage specifically includes:
The acquisition client extracts the site address of the webpage from the data acquisition template and downloads the page from that site address;
Using the data acquisition template, data is extracted from the webpage based on XPath to obtain the webpage data of the webpage.
Optionally, obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected specifically includes:
Obtaining all of the data acquisition templates in the data table corresponding to the type of the webpage.
Optionally, obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected specifically includes:
Obtaining the data table corresponding to the type of the webpage, and obtaining from the data table, according to a preset template ID, the data acquisition template corresponding to that template ID.
Optionally, obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected specifically includes:
Obtaining the data table corresponding to the type of the webpage, and obtaining from the data table, according to a preset template ID, the data acquisition templates whose template ID value is greater than the preset template ID.
The embodiments of the present invention also provide a webpage data distributed template acquisition system, including: a template configuration subsystem, an acquisition server subsystem and an acquisition client subsystem, which are used to implement any of the webpage data distributed template acquisition methods described above.
Compared with the prior art, the above technical solution of the present invention has the following advantages: the embodiments of the present invention build different data acquisition templates, select the corresponding data acquisition template according to the type of the webpage to be collected, and have multiple acquisition clients each collect data from the webpage using the data acquisition template, which helps ensure the accuracy and completeness of the data.
Description of the drawings
Fig. 1 is a schematic flowchart of a webpage data distributed template acquisition method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a webpage data distributed template acquisition method provided by another embodiment of the present invention;
Fig. 3 is a schematic flowchart of a webpage data distributed template acquisition method provided by a further embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a webpage data distributed template acquisition system provided by a further embodiment of the present invention.
Detailed description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As shown in Fig. 1, a webpage data distributed template acquisition method provided by an embodiment of the present invention includes:
Configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage;
Specifically, in this embodiment, different webpages differ in layout, data distribution and page-jump type. Most webpages are browsed by clicking a link that jumps to another, first-level page, and depending on the case there may also be multiple levels of jumps. In the embodiments of the present invention, a corresponding data acquisition template is configured for each type of webpage. Because pages of the same type can still have inconsistent layouts, the data acquisition templates for webpages of the same type may also differ. The data acquisition templates are therefore imported into different data tables by webpage type for storage, which makes later use easier and improves the efficiency of retrieving templates.
Obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected, and placing the obtained data acquisition template in a template pool;
Specifically, in this embodiment, the corresponding data table is selected according to the type of the webpage to be collected and the corresponding data acquisition template is obtained from that data table, which speeds up template retrieval; the obtained data acquisition template is placed in the template pool for temporary storage. The ways of obtaining the corresponding data acquisition template include:
Obtaining all of the data acquisition templates in the data table corresponding to the type of the webpage; or obtaining the data table corresponding to the type of the webpage and obtaining from it, according to a preset template ID, the data acquisition template corresponding to that template ID; or obtaining the data table corresponding to the type of the webpage and obtaining from it, according to a preset template ID, the data acquisition templates whose template ID value is greater than the preset template ID.
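The three retrieval modes above can be sketched as simple queries against a per-type data table. This is only an illustration under stated assumptions: the patent does not name a database, so SQLite is used here for convenience; the table layout (`id`, `body`), the function name `fetch_templates` and the mode names are hypothetical; and the third mode is read as "templates whose ID is greater than the preset template ID".

```python
import sqlite3


def fetch_templates(conn, table, mode="all", template_id=None):
    """Fetch templates from the data table of one webpage type.

    mode is a hypothetical selector: "all", "by_id" or "after_id".
    """
    # Table names cannot be bound as SQL parameters, so validate them first.
    if not table.isidentifier():
        raise ValueError(f"invalid table name: {table!r}")
    if mode == "all":
        sql, args = f"SELECT id, body FROM {table} ORDER BY id", ()
    elif mode == "by_id":
        sql, args = f"SELECT id, body FROM {table} WHERE id = ?", (template_id,)
    elif mode == "after_id":
        sql, args = (f"SELECT id, body FROM {table} WHERE id > ? ORDER BY id",
                     (template_id,))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return conn.execute(sql, args).fetchall()


# Hypothetical data table for webpages of type "news".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE news_templates (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany("INSERT INTO news_templates VALUES (?, ?)",
                 [(1, "tpl-a"), (2, "tpl-b"), (3, "tpl-c")])

all_rows = fetch_templates(conn, "news_templates")
one_row = fetch_templates(conn, "news_templates", "by_id", 2)
later_rows = fetch_templates(conn, "news_templates", "after_id", 1)
```

Keeping one table per webpage type means each query scans only the templates relevant to the page being collected, which is the efficiency argument the description makes.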
Distributing the data acquisition templates in the template pool to at least two acquisition clients, each acquisition client extracting data from the webpage according to the data acquisition template, and integrating the results to obtain the webpage data of the webpage;
Specifically, the data acquisition templates in the template pool are distributed to multiple acquisition clients, each of which extracts data from the webpage and obtains partial or complete webpage data. The data obtained by all acquisition clients is then integrated into complete webpage data. This avoids the situation in which a failed acquisition leaves the webpage data at some position unobtainable, and helps ensure the completeness of the webpage data obtained.
In the above embodiment, different data acquisition templates are built for different types of webpages and stored in different data tables by webpage type, so that the corresponding template can be retrieved easily at acquisition time. When data on a webpage is collected, the data acquisition template is obtained according to the type of the webpage and stored in the template pool. The data acquisition templates in the template pool are distributed to multiple acquisition clients, each of which extracts the webpage data from the webpage, and the results are finally integrated and merged to obtain the webpage data of the webpage, thereby improving the accuracy and completeness of the webpage data.
In this embodiment, the data acquisition template includes: a site-level template, a channel-level template and a body-text-level template;
The site-level template includes: site name, site address, encoding format, country, language and channel list;
The channel-level template includes: channel name, channel address, encoding format, category attribute, whether proxy access is required, and page identifier;
The body-text-level template includes: title parsing, body text parsing, publication time parsing, author parsing, source parsing and picture parsing;
Specifically, in this embodiment the data acquisition template is divided into three levels. The top level is the site level, reachable through the site address, which stores the channel list. The corresponding channel can be reached through the channel list, and clicking a channel address in it reaches the corresponding body text, from which the bottom-level data is obtained. In practice, the number of webpage levels can be increased as required, with navigation proceeding level by level; other cases are not described here, and the data acquisition template can be configured according to the actual situation.
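The three-level template described above maps naturally onto nested records. The sketch below is one possible representation, not the patent's own: the class and field names are hypothetical, and it assumes each "parsing" entry of the body-text level is stored as an XPath expression, in line with the XPath-based extraction mentioned elsewhere in the description.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BodyTemplate:
    # One parsing rule (assumed to be an XPath expression) per extracted field.
    title: str
    body: str
    publish_time: str
    author: str
    source: str
    images: str


@dataclass
class ChannelTemplate:
    name: str
    url: str
    encoding: str
    category: str
    needs_proxy: bool
    page_id: str
    # None when the channel address has no body text beneath it.
    body: Optional[BodyTemplate] = None


@dataclass
class SiteTemplate:
    name: str
    url: str
    encoding: str
    country: str
    language: str
    # An empty list means the site template is itself the final template.
    channels: List[ChannelTemplate] = field(default_factory=list)


# Hypothetical example site with one channel and one body-text template.
site = SiteTemplate(
    name="Example News", url="https://news.example.com", encoding="utf-8",
    country="CN", language="zh",
    channels=[ChannelTemplate(
        name="World", url="https://news.example.com/world", encoding="utf-8",
        category="news", needs_proxy=False, page_id="world-list",
        body=BodyTemplate(
            title=".//h1", body=".//div[@id='content']",
            publish_time=".//span[@class='time']",
            author=".//span[@class='author']",
            source=".//span[@class='source']", images=".//img",
        ),
    )],
)
```

Additional levels, which the description allows for, could be modeled by letting a channel hold nested channels.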
As shown in Fig. 2, in a specific embodiment, a webpage data distributed template acquisition method specifically includes:
S11: configure a site-level template according to the type of the webpage to obtain a site template, and determine whether the webpage has a channel address; if so, execute S12; otherwise, the site template is the data acquisition template of the webpage, and S14 is executed;
S12: configure a channel-level template based on the site template to obtain a channel template, and determine whether body text exists at the channel address of the webpage; if so, execute S13; otherwise, the channel template is the data acquisition template, and S14 is executed;
S13: configure a body-text-level template based on the channel template to obtain the data acquisition template;
S14: import the different data acquisition templates into different data tables according to the type of the webpage for storage, and set up a service interface corresponding to the data table;
Specifically, when a data acquisition template is built, the number of levels of the webpage is determined first. When a channel address exists, a channel template is configured; the channel template may be configured according to the number of channel levels, for example when there are multiple levels of channels. It is then determined whether body text exists under the channel address; if so, a body-text-level template is configured according to the layout of the body text, which completes the construction of the data acquisition template.
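Steps S11 to S14 amount to a short decision chain, sketched below. The page description dict, its keys (`channels`, `body_xpaths`) and both function names are hypothetical, and the data tables are modeled as an in-memory dict keyed by webpage type; the service interface from S14 is not modeled.

```python
def build_template(page):
    """Build a nested template dict for one page description (S11-S13)."""
    # S11: the site-level template is always configured first.
    template = {"site": {"name": page["name"], "url": page["url"]}}
    channels = page.get("channels", [])
    if not channels:
        return template  # no channel address: the site template is final
    template["channels"] = []
    for channel in channels:
        # S12: configure a channel-level template on top of the site template.
        entry = {"name": channel["name"], "url": channel["url"]}
        if channel.get("body_xpaths"):
            # S13: body text exists, so add the body-text-level rules.
            entry["body"] = dict(channel["body_xpaths"])
        template["channels"].append(entry)
    return template


def store_template(tables, page_type, template):
    """S14: file the template under the data table for its webpage type."""
    tables.setdefault(page_type, []).append(template)
    return tables


tables = {}
flat = build_template({"name": "Static site", "url": "https://example.com"})
news = build_template({
    "name": "News site",
    "url": "https://news.example.com",
    "channels": [{
        "name": "World",
        "url": "https://news.example.com/world",
        "body_xpaths": {"title": ".//h1", "body": ".//div[@id='content']"},
    }],
})
store_template(tables, "static", flat)
store_template(tables, "news", news)
```

A page without channels ends up with only the site level, while a channel with body text carries all three levels, mirroring the early exits to S14 in the flow.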
As shown in Fig. 3, in this embodiment, the webpage data distributed template acquisition method further includes:
S21: the acquisition server calls the service interface and obtains the corresponding data acquisition template from the data table according to the type of the webpage to be collected;
S22: the acquisition server places the obtained data acquisition template in the template pool and monitors in real time the number of data acquisition templates in the template pool;
S23: when the number of data acquisition templates in the template pool is less than a preset value, execute S21; when the number of data acquisition templates in the template pool is not less than the preset value, distribute the data acquisition templates in the template pool to at least two acquisition clients.
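The refill loop in S21 to S23 can be sketched as a small pool class. The names `TemplatePool`, `fetch` and `threshold` are hypothetical; `fetch` stands in for the service-interface call of S21. Handing out the remaining templates once the source is exhausted is an assumption made here so the example terminates; the patent does not describe that case.

```python
from collections import deque


class TemplatePool:
    def __init__(self, fetch, threshold):
        self._fetch = fetch          # callable returning a list of templates (S21)
        self._threshold = threshold  # the preset value from S23
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def refill(self):
        """S21-S22: keep fetching while the pool is below the preset value."""
        while len(self._items) < self._threshold:
            batch = self._fetch()
            if not batch:
                break  # assumption: stop when the data table has nothing left
            self._items.extend(batch)

    def take(self):
        """S23: top up when below the preset value, then hand out a template."""
        if len(self._items) < self._threshold:
            self.refill()
        return self._items.popleft() if self._items else None


# Hypothetical template source that yields two batches and then runs dry.
batches = [["t1", "t2"], ["t3"]]
pool = TemplatePool(lambda: batches.pop(0) if batches else [], threshold=2)
taken = [pool.take() for _ in range(4)]
```

A production version would need locking around the deque if several acquisition clients request templates concurrently.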
Distributing the data acquisition templates in the template pool to at least two acquisition clients, each acquisition client extracting data from the webpage according to the data acquisition template, and integrating the results to obtain the webpage data of the webpage;
Specifically, the data acquisition templates in the template pool are distributed to multiple acquisition clients, each of which extracts data from the webpage and obtains partial or complete webpage data. The data obtained by all acquisition clients is then integrated into complete webpage data, which avoids the situation in which a failed acquisition leaves the webpage data at some position unobtainable and helps ensure the completeness of the webpage data obtained. The flow for distributing to at least two acquisition clients specifically includes: when an acquisition client calls the service interface, the acquisition server distributes a data acquisition template in the template pool to that acquisition client, and also distributes the same data acquisition template to at least one other acquisition client. In other words, when one acquisition client starts calling the service interface to collect data, the data acquisition template is also assigned to at least one other acquisition client, so that different acquisition clients each collect the webpage data, which improves the accuracy of the data.
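The distribution and integration steps above can be sketched as two small functions. The patent does not say how the other clients are chosen or how partial results are merged, so both rules below are assumptions: the first available other clients are picked, and for each field the first non-empty value wins. All names are hypothetical.

```python
def distribute(requester, clients, extra=1):
    """Return the requesting client plus at least one other client."""
    others = [c for c in clients if c != requester][:extra]
    if not others:
        raise ValueError("need at least two acquisition clients")
    return [requester] + others


def merge_results(results):
    """Integrate partial results: the first non-empty value per field wins."""
    merged = {}
    for result in results:
        for key, value in result.items():
            if value and not merged.get(key):
                merged[key] = value
    return merged


assigned = distribute("client-1", ["client-1", "client-2", "client-3"])
# Each client may have recovered only part of the page.
merged = merge_results([
    {"title": "Headline", "body": ""},
    {"title": "", "body": "Full article text"},
])
```

A stricter integration rule, such as a majority vote across clients, would serve the accuracy goal more directly when clients return conflicting values.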
Specifically, in this embodiment, the webpage data acquisition process includes the following steps:
The acquisition client extracts the site address of the webpage from the data acquisition template and downloads the webpage from that site address;
Using the data acquisition template, data is extracted from the webpage based on XPath, and the results are integrated to obtain the webpage data of the webpage.
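The extraction step can be illustrated with a template that maps field names to XPath expressions. This is a minimal sketch under several assumptions: the page is an already-downloaded, well-formed XHTML snippet (the download step is omitted), the standard library `xml.etree.ElementTree` is used, which supports only a limited XPath subset, and the field names and expressions are hypothetical. Real-world HTML would normally need a tolerant HTML parser with full XPath support.

```python
import xml.etree.ElementTree as ET


def extract(page_xml, xpaths):
    """Apply each XPath rule of the template to the page and collect the text."""
    root = ET.fromstring(page_xml)
    out = {}
    for field_name, path in xpaths.items():
        node = root.find(path)
        # A rule that matches nothing yields None instead of raising.
        out[field_name] = (
            "".join(node.itertext()).strip() if node is not None else None
        )
    return out


PAGE = """<html><body>
<h1 class="title">Sample headline</h1>
<span class="author">A. Writer</span>
<div id="content"><p>First paragraph.</p></div>
</body></html>"""

# Hypothetical body-text-level template: one XPath rule per field.
TEMPLATE = {
    "title": ".//h1[@class='title']",
    "author": ".//span[@class='author']",
    "body": ".//div[@id='content']",
    "source": ".//span[@class='source']",
}

result = extract(PAGE, TEMPLATE)
```

Fields left as `None` by one client are the gaps that integrating results from several acquisition clients is meant to fill.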
As shown in Fig. 4, the embodiments of the present invention also provide a webpage data distributed template acquisition system, including: a template configuration subsystem, an acquisition server subsystem and an acquisition client subsystem; the acquisition client subsystem includes at least two acquisition clients;
The template configuration subsystem is used to configure different data acquisition templates for different types of webpages and to import the data acquisition templates into different data tables for storage;
The acquisition server subsystem is used to obtain the corresponding data acquisition template from the data table according to the type of the webpage to be collected, and to place the obtained data acquisition template in the template pool;
The acquisition server subsystem is also used to distribute the data acquisition templates in the template pool to at least two acquisition clients;
Each acquisition client is used to extract data from the webpage according to the data acquisition template to obtain the webpage data of the webpage.
In this embodiment, the template configuration subsystem is specifically used to perform S11: configure a site-level template according to the type of the webpage to obtain a site template, and determine whether the webpage has a channel address; if so, execute S12; otherwise, the site template is the data acquisition template of the webpage, and S14 is executed;
S12: configure a channel-level template based on the site template to obtain a channel template, and determine whether body text exists at the channel address of the webpage; if so, execute S13; otherwise, the channel template is the data acquisition template, and S14 is executed;
S13: configure a body-text-level template based on the channel template to obtain the data acquisition template;
S14: import the different data acquisition templates into different data tables for storage, and set up a service interface corresponding to the data table.
In this embodiment, the acquisition server subsystem is specifically used to call the service interface and obtain the corresponding data acquisition template from the data table according to the type of the webpage to be collected; the acquisition server places the obtained data acquisition template in the template pool and monitors in real time the number of data acquisition templates in the template pool; when the number of data acquisition templates in the template pool is less than a preset value, the corresponding data acquisition template is obtained from the data table according to the type of the webpage to be collected; when the number of data acquisition templates in the template pool is greater than or equal to the preset value, the data acquisition templates in the template pool are distributed to at least two acquisition clients.
In this embodiment, the acquisition server subsystem is specifically used so that, when an acquisition client calls the service interface, the acquisition server distributes a data acquisition template in the template pool to that acquisition client and also distributes the same data acquisition template to at least one other acquisition client.
In this embodiment, the acquisition client is specifically used to extract the site address of the webpage from the data acquisition template and download the page from that site address, and, using the data acquisition template, to extract data from the webpage based on XPath to obtain the webpage data of the webpage.
In this embodiment, the acquisition server subsystem is specifically used to obtain all of the data acquisition templates in the data table corresponding to the type of the webpage.
In this embodiment, the acquisition server subsystem is specifically used to obtain the data table corresponding to the type of the webpage and to obtain from it, according to a preset template ID, the data acquisition template corresponding to that template ID.
In this embodiment, the acquisition server subsystem is specifically used to obtain the data table corresponding to the type of the webpage and to obtain from it, according to a preset template ID, the data acquisition templates whose template ID value is greater than the preset template ID.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A webpage data distributed template acquisition method, characterized by including:
Configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage;
Obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected, and placing the obtained data acquisition template in a template pool;
Distributing the data acquisition templates in the template pool to at least two acquisition clients, each acquisition client extracting data from the webpage according to the data acquisition template to obtain the webpage data of the webpage.
2. The webpage data distributed template acquisition method according to claim 1, characterized in that the data acquisition template includes: a site-level template, a channel-level template and a body-text-level template;
The site-level template includes: site name, site address, encoding format, country, language and channel list;
The channel-level template includes: channel name, channel address, encoding format, category attribute, whether proxy access is required, and page identifier;
The body-text-level template includes: title parsing, body text parsing, publication time parsing, author parsing, source parsing and picture parsing.
3. The webpage data distributed template acquisition method according to claim 2, characterized in that configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage, specifically includes:
S11: configure a site-level template according to the type of the webpage to obtain a site template, and determine whether the webpage has a channel address; if so, execute S12; otherwise, the site template is the data acquisition template of the webpage, and S14 is executed;
S12: configure a channel-level template based on the site template to obtain a channel template, and determine whether body text exists at the channel address of the webpage; if so, execute S13; otherwise, the channel template is the data acquisition template, and S14 is executed;
S13: configure a body-text-level template based on the channel template to obtain the data acquisition template;
S14: import the different data acquisition templates into different data tables according to the type of the webpage for storage, and set up a service interface corresponding to the data table.
4. The webpage data distributed template acquisition method according to claim 3, characterized in that obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected, and placing the obtained data acquisition template in a template pool, specifically includes:
S21: the acquisition server calls the service interface and obtains the corresponding data acquisition template from the data table according to the type of the webpage to be collected;
S22: the acquisition server places the obtained data acquisition template in the template pool and monitors in real time the number of data acquisition templates in the template pool;
S23: when the number of data acquisition templates in the template pool is less than a preset value, execute S21; when the number of data acquisition templates in the template pool is greater than or equal to the preset value, distribute the data acquisition templates in the template pool to at least two acquisition clients.
5. The webpage data distributed template acquisition method according to claim 4, characterized in that distributing the data acquisition templates in the template pool to at least two acquisition clients specifically includes:
When an acquisition client calls the service interface, the acquisition server distributes a data acquisition template in the template pool to that acquisition client, and also distributes the same data acquisition template to at least one other acquisition client.
6. The webpage data distributed template acquisition method according to claim 5, characterized in that each acquisition client extracting data from the webpage according to the data acquisition template to obtain the webpage data of the webpage specifically includes:
The acquisition client extracts the site address of the webpage from the data acquisition template and downloads the page from that site address;
Using the data acquisition template, data is extracted from the webpage based on XPath to obtain the webpage data of the webpage.
7. The webpage data distributed template acquisition method according to any one of claims 1-6, characterized in that obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected specifically includes:
Obtaining all of the data acquisition templates in the data table corresponding to the type of the webpage.
8. The webpage data distributed template acquisition method according to any one of claims 1-6, characterized in that obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected specifically includes:
Obtaining the data table corresponding to the type of the webpage, and obtaining from the data table, according to a preset template ID, the data acquisition template corresponding to that template ID.
9. The webpage data distributed template acquisition method according to any one of claims 1-6, characterized in that obtaining the corresponding data acquisition template from the data table according to the type of the webpage to be collected specifically includes:
Obtaining the data table corresponding to the type of the webpage, and obtaining from the data table, according to a preset template ID, the data acquisition templates whose template ID value is greater than the preset template ID.
10. A webpage data distributed template acquisition system, characterized by including: a template configuration subsystem, an acquisition server subsystem and an acquisition client subsystem, which are used to implement the webpage data distributed template acquisition method according to any one of claims 1-9.
CN201810319851.0A 2018-04-11 2018-04-11 Webpage data distributed template acquisition method and system Active CN108763279B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810319851.0A CN108763279B (en) 2018-04-11 2018-04-11 Webpage data distributed template acquisition method and system

Publications (2)

Publication Number Publication Date
CN108763279A CN108763279A (en) 2018-11-06
CN108763279B CN108763279B (en) 2020-12-15

Family

ID=63981462

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810319851.0A Active CN108763279B (en) 2018-04-11 2018-04-11 Webpage data distributed template acquisition method and system

Country Status (1)

Country Link
CN (1) CN108763279B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110262904A (en) * 2019-05-17 2019-09-20 北京达佳互联信息技术有限公司 Collecting method and device
CN110334259A (en) * 2019-04-22 2019-10-15 新分享科技服务(深圳)有限公司 Webpage data acquiring method, device and computer readable storage medium
CN117150105A (en) * 2023-10-27 2023-12-01 四川银亿科技有限公司 Data acquisition method and acquisition platform based on webpage

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719124A (en) * 2008-10-09 2010-06-02 李晶心 System of infinite layering multi-path acquisition based on regular matching
CN101957816A (en) * 2009-07-13 2011-01-26 上海谐宇网络科技有限公司 Webpage metadata automatic extraction method and system based on multi-page comparison
US8250613B2 (en) * 2004-04-29 2012-08-21 Harris Corporation Media asset management system for managing video news segments and associated methods
US8413110B2 (en) * 2007-04-25 2013-04-02 Kai C. Leung Automating applications in a multimedia framework
CN103279507A (en) * 2013-05-16 2013-09-04 北京尚友通达信息技术有限公司 Webpage spider operational method and system
CN103618787A (en) * 2013-11-26 2014-03-05 优视科技有限公司 System and method for displaying webpage
CN104268283A (en) * 2014-10-21 2015-01-07 浪潮集团有限公司 Method for automatically analyzing Internet web page
CN104462547A (en) * 2014-12-25 2015-03-25 深圳联友科技有限公司 Configurable webpage data acquisition method and system
CN104735138A (en) * 2015-03-09 2015-06-24 中国科学院计算技术研究所 Distributed acquisition method and system oriented to user generated content
CN107220250A (en) * 2016-03-21 2017-09-29 北大方正集团有限公司 A kind of template configuration method and system
CN107766234A (en) * 2017-08-31 2018-03-06 广州数沃信息科技有限公司 A kind of assessment method, the apparatus and system of the webpage health degree based on mobile device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
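曹茜茜: "Design and Implementation of Telecom Big Data Analysis Based on Hadoop" (基于Hadoop的电信大数据分析的设计与实现), China Excellent Master's Theses, Information Science and Technology Series *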

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110334259A (en) * 2019-04-22 2019-10-15 新分享科技服务(深圳)有限公司 Webpage data acquiring method, device and computer readable storage medium
CN110262904A (en) * 2019-05-17 2019-09-20 北京达佳互联信息技术有限公司 Collecting method and device
CN117150105A (en) * 2023-10-27 2023-12-01 四川银亿科技有限公司 Data acquisition method and acquisition platform based on webpage
CN117150105B (en) * 2023-10-27 2023-12-26 四川银亿科技有限公司 Data acquisition method and acquisition platform based on webpage

Also Published As

Publication number Publication date
CN108763279B (en) 2020-12-15

Similar Documents

Publication Publication Date Title
DE60216918T2 (en) METHOD AND COMPUTER SYSTEM FOR SELECTION OF A BORDER COMPUTER
KR100573037B1 (en) Content extraction server on the rss and method thereof, service system for idle screen on mobile using the same
CN107071009A (en) A kind of distributed big data crawler system of load balancing
CN103678408B (en) A kind of method and device of inquiry data
CN108763279A (en) A kind of web data distribution template acquisition method and system
DE102013017085A1 (en) System for deep linking and search engine support for websites integrating a third-party application and components
CN107404480B (en) A kind of transmission method of stream medium data, storage medium and streaming media server
CN102663062A (en) Method and device for processing invalid links in search result
CN103425644B (en) The extracting method of picture and device in Web page text
CN106126557A (en) Page processing method and device
CN103309884A (en) User behavior data collecting method and system
CN106250454A (en) The loading method of a kind of page script and device
DE102015101062B4 (en) Server system, method for controlling a server system and storage medium
CN106557584A (en) A kind of web site collection method and device
KR102009020B1 (en) Method and apparatus for providing website authentication data for search engine
US11531733B2 (en) Authority filter method and authority filter device
CN103678317B (en) The automatic adaptation method and system of page layout
US8521719B1 (en) Searchable and size-constrained local log repositories for tracking visitors' access to web content
CN113051460A (en) Elasticissearch-based data retrieval method and system, electronic device and storage medium
US10579699B2 (en) Computing system with dynamic web page feature
CN108075922A (en) A kind of telecommunication network management system
DE69925435T2 (en) A computer-implemented method and apparatus for providing a logical access point to one or more files
CN112749215B (en) Data display method and related equipment
CN106503038B (en) A kind of method and system of automatic buffer network request returned data
CN107871009A (en) A kind of method and device for gathering directory metadata

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant