CN108763279A - Webpage data distributed template acquisition method and system - Google Patents
Webpage data distributed template acquisition method and system
- Publication number
- CN108763279A (application CN201810319851.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- template
- webpage
- acquisition
- data acquisition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The present invention relates to a webpage data distributed template acquisition method and system. The method includes: importing data acquisition templates into different data tables according to the type of the webpage and storing them; obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected; distributing the data acquisition template in the template pool to at least two acquisition clients; and having each acquisition client extract data from the webpage according to the data acquisition template, the results being integrated to obtain the webpage data of the webpage. By building different data acquisition templates, selecting the corresponding data acquisition template according to the type of the webpage being collected, and having multiple acquisition clients each collect data from the webpage using that template, the embodiments of the present invention help ensure the accuracy and completeness of the data.
Description
Technical field
The present invention relates to the technical field of data acquisition, and in particular to a webpage data distributed template acquisition method and system.
Background art
With the rapid development of the Internet of Things and the rise of big data, the demand for data keeps growing: not only is a larger volume of data required, but the requirements on data quality are also rising. Data quality directly determines the quality of the conclusions drawn from big data analysis, and good data greatly improves the accuracy of that analysis. Under these circumstances, data acquisition technology is particularly important.
Summary of the invention
In order to solve the problems existing in the prior art, at least one embodiment of the present invention provides a webpage data distributed template acquisition method, including:
configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage;
obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected, and placing the obtained data acquisition template in a template pool;
distributing the data acquisition template in the template pool to at least two acquisition clients, each acquisition client extracting data from the webpage according to the data acquisition template to obtain the webpage data of the webpage.
On the basis of the above technical solution, the embodiments of the present invention may also be improved as follows.
Optionally, the data acquisition template includes: a site layer template, a channel layer template and a body text layer template;
the site layer template includes: site name, site address, encoding format, country, language and channel list;
the channel layer template includes: channel name, channel address, encoding format, category attribute, whether proxy access is required, and page identifier;
the body text layer template includes: title parsing, body text parsing, publication time parsing, author parsing, source parsing and picture parsing.
Optionally, configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage, specifically includes:
S11: configuring a site layer template according to the type of the webpage to obtain a site template, and judging whether the webpage has a channel address; if so, executing S12; otherwise, the site template is the data acquisition template of the webpage, and S14 is executed;
S12: configuring a channel layer template based on the site template to obtain a channel template, and judging whether body text exists at the channel address of the webpage; if so, executing S13; otherwise, the channel template is the data acquisition template, and S14 is executed;
S13: configuring a body text layer template based on the channel template to obtain the data acquisition template;
S14: importing the different data acquisition templates into different data tables according to the type of the webpage for storage, and setting a service interface corresponding to each data table.
Optionally, obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected, and placing the obtained data acquisition template in the template pool, specifically includes:
S21: an acquisition server calls the service interface and obtains the corresponding data acquisition template from the data tables according to the type of the webpage being collected;
S22: the acquisition server places the obtained data acquisition template in the template pool and monitors, in real time, the number of data acquisition templates in the template pool;
S23: when the number of data acquisition templates in the template pool is less than a preset value, executing S21; when the number of data acquisition templates in the template pool is greater than or equal to the preset value, distributing the data acquisition templates in the template pool to at least two acquisition clients.
Optionally, distributing the data acquisition template in the template pool to at least two acquisition clients specifically includes:
when an acquisition client calls the service interface, the acquisition server distributes the data acquisition template in the template pool to that acquisition client, and also distributes the data acquisition template to at least one other acquisition client.
Optionally, each acquisition client extracting data from the webpage according to the data acquisition template to obtain the webpage data of the webpage specifically includes:
the acquisition client extracts the site address of the webpage according to the data acquisition template and downloads the page according to the site address;
using the data acquisition template, data is extracted from the webpage based on XPath to obtain the webpage data of the webpage.
Optionally, obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected specifically includes:
obtaining all of the data acquisition templates in the data table corresponding to the type of the webpage.
Optionally, obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected specifically includes:
obtaining the data table corresponding to the type of the webpage, and obtaining from the data table, according to a preset template ID, the data acquisition template corresponding to that template ID.
Optionally, obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected specifically includes:
obtaining the data table corresponding to the type of the webpage, and obtaining from the data table, according to a preset template ID, the data acquisition templates whose template ID is greater than the preset template ID value.
An embodiment of the present invention also provides a webpage data distributed template acquisition system, including: a template configuration subsystem, an acquisition server subsystem and an acquisition client subsystem, which are used to implement any of the webpage data distributed template acquisition methods described above.
Compared with the prior art, the above technical solution of the present invention has the following advantages: by building different data acquisition templates, selecting the corresponding data acquisition template according to the type of the webpage being collected, and having multiple acquisition clients each collect data from the webpage using that template, the embodiments of the present invention help ensure the accuracy and completeness of the data.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a webpage data distributed template acquisition method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a webpage data distributed template acquisition method provided by another embodiment of the present invention;
Fig. 3 is a schematic flowchart of a webpage data distributed template acquisition method provided by a further embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a webpage data distributed template acquisition system provided by a further embodiment of the present invention.
Detailed description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
As shown in Fig. 1, a webpage data distributed template acquisition method provided by an embodiment of the present invention includes:
Configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage;
Specifically, in this embodiment, different webpages have different layout styles, data distributions and page jump types. Most webpages use a link click to jump to another, first-level page for browsing, and depending on the case there may also be multi-level jumps. In the embodiments of the present invention, a corresponding data acquisition template is configured for each type of webpage. Webpages of the same type may still have inconsistent page layouts, so the data acquisition templates of webpages of the same type may also differ. The data acquisition templates are therefore imported into different data tables according to the type of the webpage for storage, which makes later use convenient and improves the efficiency of obtaining templates.
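The storage step above can be sketched as follows. This is only an illustration under stated assumptions: the patent does not name a database, a table naming rule or a column layout, so the use of SQLite, the `templates_<type>` naming rule and the `store_template` helper are all hypothetical.

```python
import json
import sqlite3


def table_for(page_type: str) -> str:
    """Map a webpage type to its own table (hypothetical naming rule)."""
    if not page_type.isidentifier():
        raise ValueError(f"unsupported page type: {page_type!r}")
    return f"templates_{page_type}"


def store_template(conn: sqlite3.Connection, page_type: str, template: dict) -> int:
    """Import one data acquisition template into the table for its webpage type."""
    table = table_for(page_type)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(template_id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    cur = conn.execute(f"INSERT INTO {table} (body) VALUES (?)", (json.dumps(template),))
    conn.commit()
    return cur.lastrowid  # template ID assigned within that table


# Illustrative data only; the site names are placeholders.
conn = sqlite3.connect(":memory:")
news_id = store_template(conn, "news", {"site_name": "Example News"})
forum_id = store_template(conn, "forum", {"site_name": "Example Forum"})
```

Keeping one table per webpage type means a later lookup only has to scan templates of the relevant type, which is the efficiency argument made in the paragraph above.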
Obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected, and placing the obtained data acquisition template in a template pool;
Specifically, in this embodiment, the corresponding data table is selected according to the type of the webpage being collected, and the corresponding data acquisition template is obtained from that data table, which speeds up template retrieval; the obtained data acquisition template is placed in the template pool for temporary storage. The ways of obtaining the corresponding data acquisition template include:
obtaining all of the data acquisition templates in the data table corresponding to the type of the webpage; or obtaining the data table corresponding to the type of the webpage and obtaining from it, according to a preset template ID, the data acquisition template corresponding to that template ID; or obtaining the data table corresponding to the type of the webpage and obtaining from it, according to a preset template ID, the data acquisition templates whose template ID is greater than the preset template ID value.
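The three retrieval modes can be illustrated with a short sketch. The in-memory `TABLES` dictionary stands in for the per-type data tables, and the function names are hypothetical. The third mode is read here as "templates whose ID exceeds the preset ID", which is one interpretation of the text above.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the per-type data tables.
TABLES: Dict[str, List[dict]] = {
    "news": [
        {"template_id": 1, "site_name": "Site A"},
        {"template_id": 2, "site_name": "Site B"},
        {"template_id": 3, "site_name": "Site C"},
    ],
}


def fetch_all(page_type: str) -> List[dict]:
    """Mode 1: every template in the table for this webpage type."""
    return list(TABLES.get(page_type, []))


def fetch_by_id(page_type: str, template_id: int) -> List[dict]:
    """Mode 2: the template matching the preset template ID."""
    return [t for t in TABLES.get(page_type, []) if t["template_id"] == template_id]


def fetch_after_id(page_type: str, preset_id: int) -> List[dict]:
    """Mode 3 (as interpreted here): templates with an ID above the preset ID."""
    return [t for t in TABLES.get(page_type, []) if t["template_id"] > preset_id]
```

Mode 3 would suit incremental refills of the template pool, since the server can pass the highest ID it has already fetched.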
Distributing the data acquisition template in the template pool to at least two acquisition clients, each acquisition client extracting data from the webpage according to the data acquisition template, and integrating the results to obtain the webpage data of the webpage;
Specifically, the data acquisition template in the template pool is distributed to multiple acquisition clients, and each acquisition client extracts data from the webpage separately, obtaining partial or complete webpage data of the webpage. The data obtained by the acquisition clients are then integrated to produce complete webpage data. This avoids the situation in which a failed acquisition leaves the webpage data at some position unobtainable, and helps ensure the completeness of the webpage data obtained.
In the above embodiment, different data acquisition templates are built for different types of webpages and stored in different data tables according to the type of the webpage, so that the corresponding template can be conveniently retrieved during acquisition. When the data on a webpage is collected, the data acquisition template is obtained according to the type of the webpage and stored in the template pool. The data acquisition template in the template pool is distributed to multiple acquisition clients, which each extract the webpage data from the webpage; the results are finally integrated and merged to obtain the webpage data of the webpage, thereby improving its accuracy and completeness.
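The integration step is not specified in detail, so the following is a minimal sketch assuming one simple rule: for each field, keep the first non-empty value reported by any acquisition client. The `integrate` function and the field names are hypothetical.

```python
from typing import Dict, Iterable


def integrate(results: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Merge per-client extraction results, filling gaps left by failed extractions."""
    merged: Dict[str, str] = {}
    for result in results:
        for field_name, value in result.items():
            # Only overwrite a field that is still missing or empty.
            if not merged.get(field_name):
                merged[field_name] = value
    return merged


# Client 1 failed to extract the body; client 2 failed to extract the title.
client_results = [
    {"title": "Sample headline", "body": ""},
    {"title": "", "body": "Body text."},
]
page_data = integrate(client_results)
```

A real system might instead vote among clients or compare values for consistency; the rule above only shows how partial results from several clients can yield a complete record.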
In this embodiment, the data acquisition template includes: a site layer template, a channel layer template and a body text layer template;
the site layer template includes: site name, site address, encoding format, country, language and channel list;
the channel layer template includes: channel name, channel address, encoding format, category attribute, whether proxy access is required, and page identifier;
the body text layer template includes: title parsing, body text parsing, publication time parsing, author parsing, source parsing and picture parsing;
Specifically, in this embodiment, the data acquisition template is divided into three layers. The top layer is the site layer, which is accessed through the site address and stores the channel list. The corresponding channel is accessed through the channel list, and following the channel address in it gives access to the corresponding body text, from which the bottom-layer data is obtained. In practice, the number of webpage levels may be increased according to the specific configuration, with navigation proceeding level by level; other cases are not described further here, and the data acquisition template may be configured according to actual conditions.
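The three-layer structure can be pictured as nested records. The field lists below follow the text, but the Python names, the types and the assumption that each parsing rule is an XPath expression string are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BodyTextTemplate:
    # Each parsing rule is assumed here to be an XPath expression string.
    title_rule: str
    body_rule: str
    publish_time_rule: str
    author_rule: str
    source_rule: str
    picture_rule: str


@dataclass
class ChannelTemplate:
    channel_name: str
    channel_address: str
    encoding: str
    category: str
    needs_proxy: bool
    page_identifier: str
    body_text: Optional[BodyTextTemplate] = None  # absent when the channel has no body text


@dataclass
class SiteTemplate:
    site_name: str
    site_address: str
    encoding: str
    country: str
    language: str
    channels: List[ChannelTemplate] = field(default_factory=list)


# Placeholder values; example.com is not a site named in the source.
site = SiteTemplate(
    site_name="Example News",
    site_address="https://example.com",
    encoding="utf-8",
    country="CN",
    language="zh",
    channels=[
        ChannelTemplate(
            channel_name="World",
            channel_address="https://example.com/world",
            encoding="utf-8",
            category="news",
            needs_proxy=False,
            page_identifier="world-list",
        )
    ],
)
```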
As shown in Fig. 2, in a specific embodiment, a webpage data distributed template acquisition method specifically includes:
S11: configuring a site layer template according to the type of the webpage to obtain a site template, and judging whether the webpage has a channel address; if so, executing S12; otherwise, the site template is the data acquisition template of the webpage, and S14 is executed;
S12: configuring a channel layer template based on the site template to obtain a channel template, and judging whether body text exists at the channel address of the webpage; if so, executing S13; otherwise, the channel template is the data acquisition template, and S14 is executed;
S13: configuring a body text layer template based on the channel template to obtain the data acquisition template;
S14: importing the different data acquisition templates into different data tables according to the type of the webpage for storage, and setting a service interface corresponding to each data table;
Specifically, when the data acquisition template is built, the number of levels of the webpage is judged. When a channel address exists, a channel template is configured; the channel layer template may be configured according to the number of channel levels, for example whether there are multiple channel levels. It is then judged whether body text exists under the channel address; if so, a body text layer template is configured according to the layout of the body text, which completes the construction of the data acquisition template.
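Steps S11 to S13 amount to adding layers only while the webpage has them. The sketch below assumes a hypothetical `page` description (a dict with `site_address`, optional `channel_address`, a `has_body_text` flag and an optional `title_rule`); none of these names come from the source, and S14 (storage) is left to the caller.

```python
def build_template(page: dict) -> dict:
    """Build a data acquisition template layer by layer (S11 to S13)."""
    # S11: the site layer is always configured.
    template = {"site": {"site_address": page["site_address"]}}
    channel_address = page.get("channel_address")
    if not channel_address:
        return template  # the site template is the final template; go to S14

    # S12: a channel address exists, so configure the channel layer.
    template["channel"] = {"channel_address": channel_address}
    if not page.get("has_body_text"):
        return template  # the channel template is the final template; go to S14

    # S13: body text exists under the channel, so configure the body text layer.
    template["body_text"] = {"title_rule": page.get("title_rule", "")}
    return template


site_only = build_template({"site_address": "https://example.com"})
full = build_template(
    {
        "site_address": "https://example.com",
        "channel_address": "https://example.com/world",
        "has_body_text": True,
        "title_rule": ".//h1",
    }
)
```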
As shown in Fig. 3, in this embodiment, the webpage data distributed template acquisition method further includes:
S21: the acquisition server calls the service interface and obtains the corresponding data acquisition template from the data tables according to the type of the webpage being collected;
S22: the acquisition server places the obtained data acquisition template in the template pool and monitors, in real time, the number of data acquisition templates in the template pool;
S23: when the number of data acquisition templates in the template pool is less than a preset value, executing S21; when the number of data acquisition templates in the template pool is not less than the preset value, distributing the data acquisition templates in the template pool to at least two acquisition clients.
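The S21 to S23 loop can be sketched as a pool that refills itself until it reaches the preset value. The `TemplatePool` class and its `fetch` callback, which stands in for the service interface call in S21, are hypothetical; the source does not say how monitoring is implemented.

```python
from collections import deque
from typing import Callable, Deque, List


class TemplatePool:
    def __init__(self, preset: int, fetch: Callable[[], List[str]]) -> None:
        self.preset = preset          # threshold from S23
        self.fetch = fetch            # stands in for the service interface call (S21)
        self.items: Deque[str] = deque()

    def refill(self) -> bool:
        """Repeat S21/S22 while the pool is below the preset value (S23)."""
        while len(self.items) < self.preset:
            batch = self.fetch()
            if not batch:
                break                 # nothing more to fetch; stop rather than spin
            self.items.extend(batch)
        return len(self.items) >= self.preset  # True: ready for distribution


# Illustrative batches only.
batches = iter([["tpl-1", "tpl-2"], ["tpl-3", "tpl-4"]])
pool = TemplatePool(preset=3, fetch=lambda: next(batches, []))
ready = pool.refill()
```

The early exit on an empty batch is an addition made for this sketch so that the loop terminates when the data table has no more templates.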
Distributing the data acquisition template in the template pool to at least two acquisition clients, each acquisition client extracting data from the webpage according to the data acquisition template, and integrating the results to obtain the webpage data of the webpage;
Specifically, the data acquisition template in the template pool is distributed to multiple acquisition clients, and each acquisition client extracts data from the webpage separately, obtaining partial or complete webpage data of the webpage. The data obtained by the acquisition clients are then integrated to produce complete webpage data, which avoids the situation in which a failed acquisition leaves the webpage data at some position unobtainable and helps ensure the completeness of the webpage data obtained. The flow for distributing to at least two acquisition clients specifically includes: when an acquisition client calls the service interface, the acquisition server distributes the data acquisition template in the template pool to that acquisition client, and also distributes the data acquisition template to at least one other acquisition client. In other words, when one acquisition client starts calling the service interface to collect data, the same data acquisition template is also assigned to at least one other acquisition client, so that different acquisition clients collect the webpage data separately, which improves the accuracy of the data.
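The distribution rule described above (the requesting client plus at least one other) might look like this in outline. The `distribute` function, the client identifiers and the `extra` parameter are hypothetical, and the choice of which other client receives the template (here simply the first available) is not specified in the source.

```python
from typing import Dict, List


def distribute(template: str, requester: str, clients: List[str], extra: int = 1) -> Dict[str, str]:
    """Assign one template to the requesting client and to `extra` other clients."""
    others = [c for c in clients if c != requester][:extra]
    if not others:
        raise ValueError("at least two acquisition clients are required")
    return {client: template for client in [requester, *others]}


assignment = distribute("tpl-1", "client-1", ["client-1", "client-2", "client-3"])
```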
Specifically, in this embodiment, the webpage data acquisition process includes the following steps:
the acquisition client extracts the site address of the webpage according to the data acquisition template and downloads the webpage according to the site address;
using the data acquisition template, data is extracted from the webpage based on XPath, and the results are integrated to obtain the webpage data of the webpage.
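A minimal sketch of XPath-based extraction follows. It skips the download step and works on an already downloaded, well-formed page, using the limited XPath subset supported by Python's standard `xml.etree.ElementTree`. The `RULES` mapping, the sample page and the field names are hypothetical; real HTML is often not well-formed XML and would typically need a more lenient parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Stand-in for a downloaded page (well-formed XHTML so the stdlib parser accepts it).
PAGE = """<html><body>
<h1>Sample headline</h1>
<span class="author">A. Writer</span>
<div class="content">Body text.</div>
</body></html>"""

# Hypothetical body text layer rules: field name -> XPath expression.
RULES: Dict[str, str] = {
    "title": ".//h1",
    "author": ".//span[@class='author']",
    "body": ".//div[@class='content']",
}


def extract(page: str, rules: Dict[str, str]) -> Dict[str, str]:
    """Apply each XPath rule to the page and collect the matched text."""
    root = ET.fromstring(page)
    # findtext returns None when nothing matches; normalise that to "".
    return {name: (root.findtext(path) or "").strip() for name, path in rules.items()}


data = extract(PAGE, RULES)
```

Fields that fail to match come back as empty strings, which is the kind of gap the integration across acquisition clients is meant to fill.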
As shown in Fig. 4, an embodiment of the present invention also provides a webpage data distributed template acquisition system, including: a template configuration subsystem, an acquisition server subsystem and an acquisition client subsystem; the acquisition client subsystem includes at least two acquisition clients;
the template configuration subsystem is used to configure different data acquisition templates for different types of webpages and to import the data acquisition templates into different data tables for storage;
the acquisition server subsystem is used to obtain the corresponding data acquisition template from the data tables according to the type of the webpage being collected and to place the obtained data acquisition template in the template pool;
the acquisition server subsystem is also used to distribute the data acquisition template in the template pool to at least two acquisition clients;
each acquisition client is used to extract data from the webpage according to the data acquisition template to obtain the webpage data of the webpage.
In this embodiment, the template configuration subsystem is specifically used to perform the following steps:
S11: configuring a site layer template according to the type of the webpage to obtain a site template, and judging whether the webpage has a channel address; if so, executing S12; otherwise, the site template is the data acquisition template of the webpage, and S14 is executed;
S12: configuring a channel layer template based on the site template to obtain a channel template, and judging whether body text exists at the channel address of the webpage; if so, executing S13; otherwise, the channel template is the data acquisition template, and S14 is executed;
S13: configuring a body text layer template based on the channel template to obtain the data acquisition template;
S14: importing the different data acquisition templates into different data tables for storage, and setting a service interface corresponding to each data table.
In this embodiment, the acquisition server subsystem is specifically used to: call the service interface and obtain the corresponding data acquisition template from the data tables according to the type of the webpage being collected; place the obtained data acquisition template in the template pool and monitor, in real time, the number of data acquisition templates in the template pool; when the number of data acquisition templates in the template pool is less than the preset value, obtain the corresponding data acquisition template from the data tables according to the type of the webpage being collected; and when the number of data acquisition templates in the template pool is greater than or equal to the preset value, distribute the data acquisition templates in the template pool to at least two acquisition clients.
In this embodiment, the acquisition server subsystem is specifically used so that, when an acquisition client calls the service interface, the acquisition server distributes the data acquisition template in the template pool to that acquisition client and also distributes the data acquisition template to at least one other acquisition client.
In this embodiment, the acquisition client is specifically used to extract the site address of the webpage according to the data acquisition template and download the page according to the site address, and, using the data acquisition template, to extract data from the webpage based on XPath to obtain the webpage data of the webpage.
In this embodiment, the acquisition server subsystem is specifically used to obtain all of the data acquisition templates in the data table corresponding to the type of the webpage.
In this embodiment, the acquisition server subsystem is specifically used to obtain the data table corresponding to the type of the webpage and to obtain from it, according to a preset template ID, the data acquisition template corresponding to that template ID.
In this embodiment, the acquisition server subsystem is specifically used to obtain the data table corresponding to the type of the webpage and to obtain from it, according to a preset template ID, the data acquisition templates whose template ID is greater than the preset template ID value.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A webpage data distributed template acquisition method, characterized by including:
configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage;
obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected, and placing the obtained data acquisition template in a template pool;
distributing the data acquisition template in the template pool to at least two acquisition clients, each acquisition client extracting data from the webpage according to the data acquisition template to obtain the webpage data of the webpage.
2. The webpage data distributed template acquisition method according to claim 1, characterized in that the data acquisition template includes: a site layer template, a channel layer template and a body text layer template;
the site layer template includes: site name, site address, encoding format, country, language and channel list;
the channel layer template includes: channel name, channel address, encoding format, category attribute, whether proxy access is required, and page identifier;
the body text layer template includes: title parsing, body text parsing, publication time parsing, author parsing, source parsing and picture parsing.
3. The webpage data distributed template acquisition method according to claim 2, characterized in that configuring different data acquisition templates for different types of webpages, and importing the data acquisition templates into different data tables according to the type of the webpage for storage, specifically includes:
S11: configuring a site layer template according to the type of the webpage to obtain a site template, and judging whether the webpage has a channel address; if so, executing S12; otherwise, the site template is the data acquisition template of the webpage, and S14 is executed;
S12: configuring a channel layer template based on the site template to obtain a channel template, and judging whether body text exists at the channel address of the webpage; if so, executing S13; otherwise, the channel template is the data acquisition template, and S14 is executed;
S13: configuring a body text layer template based on the channel template to obtain the data acquisition template;
S14: importing the different data acquisition templates into different data tables according to the type of the webpage for storage, and setting a service interface corresponding to each data table.
4. The webpage data distributed template acquisition method according to claim 3, characterized in that obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected, and placing the obtained data acquisition template in the template pool, specifically includes:
S21: an acquisition server calls the service interface and obtains the corresponding data acquisition template from the data tables according to the type of the webpage being collected;
S22: the acquisition server places the obtained data acquisition template in the template pool and monitors, in real time, the number of data acquisition templates in the template pool;
S23: when the number of data acquisition templates in the template pool is less than a preset value, executing S21; when the number of data acquisition templates in the template pool is greater than or equal to the preset value, distributing the data acquisition templates in the template pool to at least two acquisition clients.
5. The webpage data distributed template acquisition method according to claim 4, characterized in that distributing the data acquisition template in the template pool to at least two acquisition clients specifically includes:
when an acquisition client calls the service interface, the acquisition server distributes the data acquisition template in the template pool to that acquisition client, and also distributes the data acquisition template to at least one other acquisition client.
6. The webpage data distributed template acquisition method according to claim 5, characterized in that each acquisition client extracting data from the webpage according to the data acquisition template to obtain the webpage data of the webpage specifically includes:
the acquisition client extracts the site address of the webpage according to the data acquisition template and downloads the page according to the site address;
using the data acquisition template, data is extracted from the webpage based on XPath to obtain the webpage data of the webpage.
7. The webpage data distributed template acquisition method according to any one of claims 1 to 6, characterized in that obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected specifically includes:
obtaining all of the data acquisition templates in the data table corresponding to the type of the webpage.
8. The webpage data distributed template acquisition method according to any one of claims 1 to 6, characterized in that obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected specifically includes:
obtaining the data table corresponding to the type of the webpage, and obtaining from the data table, according to a preset template ID, the data acquisition template corresponding to that template ID.
9. The webpage data distributed template acquisition method according to any one of claims 1 to 6, characterized in that obtaining the corresponding data acquisition template from the data tables according to the type of the webpage being collected specifically includes:
obtaining the data table corresponding to the type of the webpage, and obtaining from the data table, according to a preset template ID, the data acquisition templates whose template ID is greater than the preset template ID value.
10. A webpage data distributed template acquisition system, characterized by including: a template configuration subsystem, an acquisition server subsystem and an acquisition client subsystem, which are used to implement the webpage data distributed template acquisition method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810319851.0A CN108763279B (en) | 2018-04-11 | 2018-04-11 | Webpage data distributed template acquisition method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810319851.0A CN108763279B (en) | 2018-04-11 | 2018-04-11 | Webpage data distributed template acquisition method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108763279A true CN108763279A (en) | 2018-11-06 |
CN108763279B CN108763279B (en) | 2020-12-15 |
Family ID: 63981462
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810319851.0A Active CN108763279B (en) | 2018-04-11 | 2018-04-11 | Webpage data distributed template acquisition method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108763279B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110262904A (en) * | 2019-05-17 | 2019-09-20 | 北京达佳互联信息技术有限公司 | Collecting method and device |
CN110334259A (en) * | 2019-04-22 | 2019-10-15 | 新分享科技服务(深圳)有限公司 | Webpage data acquiring method, device and computer readable storage medium |
CN117150105A (en) * | 2023-10-27 | 2023-12-01 | 四川银亿科技有限公司 | Data acquisition method and acquisition platform based on webpage |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101719124A (en) * | 2008-10-09 | 2010-06-02 | 李晶心 | System of infinite layering multi-path acquisition based on regular matching |
CN101957816A (en) * | 2009-07-13 | 2011-01-26 | 上海谐宇网络科技有限公司 | Webpage metadata automatic extraction method and system based on multi-page comparison |
US8250613B2 (en) * | 2004-04-29 | 2012-08-21 | Harris Corporation | Media asset management system for managing video news segments and associated methods |
US8413110B2 (en) * | 2007-04-25 | 2013-04-02 | Kai C. Leung | Automating applications in a multimedia framework |
CN103279507A (en) * | 2013-05-16 | 2013-09-04 | 北京尚友通达信息技术有限公司 | Webpage spider operational method and system |
CN103618787A (en) * | 2013-11-26 | 2014-03-05 | 优视科技有限公司 | System and method for displaying webpage |
CN104268283A (en) * | 2014-10-21 | 2015-01-07 | 浪潮集团有限公司 | Method for automatically analyzing Internet web page |
CN104462547A (en) * | 2014-12-25 | 2015-03-25 | 深圳联友科技有限公司 | Configurable webpage data acquisition method and system |
CN104735138A (en) * | 2015-03-09 | 2015-06-24 | 中国科学院计算技术研究所 | Distributed acquisition method and system oriented to user generated content |
CN107220250A (en) * | 2016-03-21 | 2017-09-29 | 北大方正集团有限公司 | A kind of template configuration method and system |
CN107766234A (en) * | 2017-08-31 | 2018-03-06 | 广州数沃信息科技有限公司 | A kind of assessment method, the apparatus and system of the webpage health degree based on mobile device |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8250613B2 (en) * | 2004-04-29 | 2012-08-21 | Harris Corporation | Media asset management system for managing video news segments and associated methods |
US8413110B2 (en) * | 2007-04-25 | 2013-04-02 | Kai C. Leung | Automating applications in a multimedia framework |
CN101719124A (en) * | 2008-10-09 | 2010-06-02 | 李晶心 | System of infinite layering multi-path acquisition based on regular matching |
CN101957816A (en) * | 2009-07-13 | 2011-01-26 | 上海谐宇网络科技有限公司 | Webpage metadata automatic extraction method and system based on multi-page comparison |
CN103279507A (en) * | 2013-05-16 | 2013-09-04 | 北京尚友通达信息技术有限公司 | Webpage spider operational method and system |
CN103618787A (en) * | 2013-11-26 | 2014-03-05 | 优视科技有限公司 | System and method for displaying webpage |
CN104268283A (en) * | 2014-10-21 | 2015-01-07 | 浪潮集团有限公司 | Method for automatically analyzing Internet web page |
CN104462547A (en) * | 2014-12-25 | 2015-03-25 | 深圳联友科技有限公司 | Configurable webpage data acquisition method and system |
CN104735138A (en) * | 2015-03-09 | 2015-06-24 | 中国科学院计算技术研究所 | Distributed acquisition method and system oriented to user generated content |
CN107220250A (en) * | 2016-03-21 | 2017-09-29 | 北大方正集团有限公司 | A kind of template configuration method and system |
CN107766234A (en) * | 2017-08-31 | 2018-03-06 | 广州数沃信息科技有限公司 | A kind of assessment method, the apparatus and system of the webpage health degree based on mobile device |
Non-Patent Citations (1)
Title |
---|
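曹茜茜: "Design and Implementation of Telecom Big Data Analysis Based on Hadoop", China Excellent Master's Theses, Information Science and Technology Series * |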
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110334259A (en) * | 2019-04-22 | 2019-10-15 | 新分享科技服务(深圳)有限公司 | Webpage data acquiring method, device and computer readable storage medium |
CN110262904A (en) * | 2019-05-17 | 2019-09-20 | 北京达佳互联信息技术有限公司 | Collecting method and device |
CN117150105A (en) * | 2023-10-27 | 2023-12-01 | 四川银亿科技有限公司 | Data acquisition method and acquisition platform based on webpage |
CN117150105B (en) * | 2023-10-27 | 2023-12-26 | 四川银亿科技有限公司 | Data acquisition method and acquisition platform based on webpage |
Also Published As
Publication number | Publication date |
---|---|
CN108763279B (en) | 2020-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
DE60216918T2 (en) | METHOD AND COMPUTER SYSTEM FOR SELECTION OF A BORDER COMPUTER | |
KR100573037B1 (en) | Content extraction server on the rss and method thereof, service system for idle screen on mobile using the same | |
CN107071009A (en) | A kind of distributed big data crawler system of load balancing | |
CN103678408B (en) | A kind of method and device of inquiry data | |
CN108763279A (en) | A kind of web data distribution template acquisition method and system | |
DE102013017085A1 (en) | System for deep linking and search engine support for websites integrating a third-party application and components | |
CN107404480B (en) | A kind of transmission method of stream medium data, storage medium and streaming media server | |
CN102663062A (en) | Method and device for processing invalid links in search result | |
CN103425644B (en) | The extracting method of picture and device in Web page text | |
CN106126557A (en) | Page processing method and device | |
CN103309884A (en) | User behavior data collecting method and system | |
CN106250454A (en) | The loading method of a kind of page script and device | |
DE102015101062B4 (en) | Server system, method for controlling a server system and storage medium | |
CN106557584A (en) | A kind of web site collection method and device | |
KR102009020B1 (en) | Method and apparatus for providing website authentication data for search engine | |
US11531733B2 (en) | Authority filter method and authority filter device | |
CN103678317B (en) | The automatic adaptation method and system of page layout | |
US8521719B1 (en) | Searchable and size-constrained local log repositories for tracking visitors' access to web content | |
CN113051460A (en) | Elasticissearch-based data retrieval method and system, electronic device and storage medium | |
US10579699B2 (en) | Computing system with dynamic web page feature | |
CN108075922A (en) | A kind of telecommunication network management system | |
DE69925435T2 (en) | A computer-implemented method and apparatus for providing a logical access point to one or more files | |
CN112749215B (en) | Data display method and related equipment | |
CN106503038B (en) | A kind of method and system of automatic buffer network request returned data | |
CN107871009A (en) | A kind of method and device for gathering directory metadata |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||