CN107682382A - An internet big data acquisition system and its application method - Google Patents
- Publication number
- CN107682382A (application number CN201610616584.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- layer
- internet
- data acquisition
- module
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/02—Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
Abstract
The present invention relates to the internet field and provides an internet big data acquisition system and its application method. The system comprises a collection center and a system administration center, each connected to an external server. The system administration center comprises node administration, system monitoring, and information point deployment. The collection center comprises a transport layer, an application layer, and a data layer; the application layer and the data layer are connected to an external data source and the external server through the transport layer. The collection center and the system administration center cooperate to obtain data from the external data source through the transport layer, collect it, and transmit it to the external server, completing the data acquisition work. Through the cooperation of the collection center and the system administration center, the client can easily complete data collection automatically, which solves the problems that manual data collection is complex, technically demanding, and limited.
Description
Technical field
The present invention relates to the internet field, and more particularly to an internet big data acquisition system and its application method.
Background
The Web is the largest public resource repository in the world. It currently hosts at least 550 million websites with a total of more than several trillion pages, and it grows massively every second. It contains large amounts of valuable information, such as lists and contact details of potential customers, competitors' price lists, real-time financial news, public opinion information, word-of-mouth information, supply and demand information, scientific journals, forum posts, blog articles, and the latest news. However, because this key information exists in semi-structured form scattered across huge numbers of HTML pages on individual websites, it is difficult to aggregate and use directly. Internet data is characterized by large volume, diversity, rapid growth, and value. On the one hand, the amount of information keeps increasing; on the other hand, useful information is difficult to extract and clean, which results in a lack of usable information. At present, the big data acquisition products available on the market rely on manual collection: their collection rules are extremely complex, their technical requirements are high, and their limitations are significant. Therefore, to solve the problems that manual data collection is complex, technically demanding, and limited, the prior art needs to be improved.
Summary of the invention
An object of the present invention is to provide an internet big data acquisition system and its application method that solve the problems that manual data collection is complex, technically demanding, and limited.
To solve the above technical problems, embodiments of the present invention provide an internet big data acquisition system and its application method. The internet big data acquisition system comprises a collection center and a system administration center, each connected to an external server. The system administration center comprises node administration, system monitoring, and information point deployment. The collection center comprises a transport layer, an application layer, and a data layer. The application layer and the data layer are connected to the external data source and the external server through the transport layer. The collection center and the system administration center cooperate to obtain data from the external data source through the transport layer, collect it, and transmit it to the external server, completing the data collection work.
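The cooperation between the two centers can be pictured with a minimal Python sketch. The patent describes the components only at block-diagram level, so every class and method name here (`ExternalServer`, `TransportLayer`, `CollectionCenter`, `AdminCenter`) is hypothetical, and the "data layer" is reduced to a trivial whitespace-stripping step. The point is the data flow: all traffic to the external data source and to the external server passes through the transport layer.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExternalServer:
    """Hypothetical stand-in for the external server that receives results."""
    received: list = field(default_factory=list)

    def accept(self, records: list) -> None:
        self.received.extend(records)


@dataclass
class TransportLayer:
    """Routes data between the external data source and the external server."""
    source: Callable[[str], str]  # hypothetical: maps a URL to raw page text
    server: ExternalServer

    def fetch(self, url: str) -> str:
        return self.source(url)

    def deliver(self, records: list) -> None:
        self.server.accept(records)


@dataclass
class CollectionCenter:
    transport: TransportLayer

    def collect(self, urls: list) -> list:
        # Application layer: fetch each page through the transport layer.
        pages = [self.transport.fetch(u) for u in urls]
        # Data layer: a placeholder extraction step (strip surrounding whitespace).
        return [p.strip() for p in pages]


@dataclass
class AdminCenter:
    """Coordinates a job; node administration and monitoring are reduced to a log."""
    log: list = field(default_factory=list)

    def run(self, center: CollectionCenter, urls: list) -> None:
        self.log.append(f"start {len(urls)} urls")
        records = center.collect(urls)
        center.transport.deliver(records)
        self.log.append(f"delivered {len(records)} records")


# Usage with an in-memory "external data source".
pages = {"http://example.com/a": " alpha ", "http://example.com/b": " beta "}
server = ExternalServer()
center = CollectionCenter(TransportLayer(source=pages.__getitem__, server=server))
admin = AdminCenter()
admin.run(center, list(pages))
print(server.received)  # ['alpha', 'beta']
```

Injecting the data source as a plain callable keeps the sketch testable offline; a real deployment would substitute an HTTP client there.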
Through the cooperation of the collection center and the system administration center, the present invention enables the client to complete data collection automatically and easily, solving the problems that manual data collection is complex, technically demanding, and limited.
Further, the transport layer comprises a task receiving module, a URL reporting module, and a collected-data transmission module; the transport layer is connected to the external data source through the collected-data transmission module.
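The three transport-layer modules can be sketched as a task queue, a deduplicated URL report, and a JSON outbox. This is an assumption about how the modules might behave, not the patented implementation; the class name `TransportLayerSketch` and the payload shape are hypothetical.

```python
import json
import queue


class TransportLayerSketch:
    """Hypothetical transport layer with the three modules named in the text."""

    def __init__(self):
        self.tasks = queue.Queue()  # task receiving module: inbound collection jobs
        self.reported_urls = []     # URL reporting module: URLs reported so far
        self.outbox = []            # collected-data transmission module: sent payloads

    def receive_task(self, task: dict) -> None:
        self.tasks.put(task)

    def report_url(self, url: str) -> None:
        # Report each URL only once.
        if url not in self.reported_urls:
            self.reported_urls.append(url)

    def transmit(self, records: list) -> str:
        # Serialize collected records for the external server.
        payload = json.dumps({"count": len(records), "records": records}, ensure_ascii=False)
        self.outbox.append(payload)
        return payload


t = TransportLayerSketch()
t.receive_task({"start_url": "http://example.com/list"})
task = t.tasks.get()
t.report_url(task["start_url"])
t.report_url(task["start_url"])  # a duplicate report is ignored
payload = t.transmit([{"title": "item 1"}])
print(t.reported_urls)  # ['http://example.com/list']
print(payload)          # {"count": 1, "records": [{"title": "item 1"}]}
```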
Further, the application layer comprises an acquisition module, a transport module, an access module, and a step processing module.
Further, the data layer comprises a data processing module and a node information extraction module.
Further, the node information extraction module comprises a node content information extraction module, a node attribute information extraction module, and a list node information extraction module.
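Node content, node attribute, and list node extraction can be illustrated with the standard library's `html.parser`. In this minimal sketch, which assumes that a "list node" means a repeated HTML element such as `<li>`, each matching element yields one record holding its text (content) and its attributes. The names `NodeExtractor` and `extract_list` are hypothetical.

```python
from html.parser import HTMLParser


class NodeExtractor(HTMLParser):
    """Collects the text and attributes of every element with a given tag."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag
        self.depth = 0   # > 0 while inside a matching element
        self.items = []  # one dict per top-level matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                # Node attribute extraction: keep the element's attributes.
                self.items.append({"attrs": dict(attrs), "text": ""})
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            # Node content extraction: accumulate all text inside the element.
            self.items[-1]["text"] += data


def extract_list(html: str, tag: str = "li") -> list:
    """List node extraction: one record per repeated element."""
    parser = NodeExtractor(tag)
    parser.feed(html)
    parser.close()
    return [{"text": i["text"].strip(), **i["attrs"]} for i in parser.items]


page_html = '<ul><li data-id="1">Apple <b>red</b></li><li data-id="2">Pear</li></ul>'
print(extract_list(page_html))
# [{'text': 'Apple red', 'data-id': '1'}, {'text': 'Pear', 'data-id': '2'}]
```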
Further, the collection center also comprises a browser processing module and a loop processing module.
Further, the loop processing module comprises a loop start point function and a loop end point function.
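One plausible reading of the loop start and end point functions is a pagination loop: collection starts at a given page and ends when there is no "next page". The sketch below assumes that reading; `loop_pages`, `next_of`, and the `max_pages` safety cap are hypothetical.

```python
from typing import Callable, Iterator, Optional


def loop_pages(start: str, next_of: Callable[[str], Optional[str]],
               max_pages: int = 100) -> Iterator[str]:
    """Yield pages from the loop start point until the loop end point.

    The loop ends when next_of returns None (no "next page") or when
    max_pages pages have been visited.
    """
    page, count = start, 0
    while page is not None and count < max_pages:
        yield page
        count += 1
        page = next_of(page)


links = {"p1": "p2", "p2": "p3", "p3": None}
print(list(loop_pages("p1", links.get)))               # ['p1', 'p2', 'p3']
print(list(loop_pages("p1", links.get, max_pages=2)))  # ['p1', 'p2']
```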
Further, the browser processing module comprises a browser open function, a browser close function, a browser forward function, and a browser back function.
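The four browser functions behave like a navigation history. This in-memory model is an illustration only: a real collector would drive an actual browser, and `BrowserSketch` with its methods is hypothetical. Opening a new page discards any "forward" entries, as ordinary browsers do.

```python
class BrowserSketch:
    """Hypothetical in-memory model of the open/close/forward/back functions."""

    def __init__(self):
        self.history = []  # visited URLs
        self.index = -1    # position of the current page in history
        self.is_open = False

    def open(self, url: str) -> str:
        self.is_open = True
        # Opening a new URL discards any "forward" entries.
        self.history = self.history[: self.index + 1] + [url]
        self.index += 1
        return url

    def back(self):
        if self.index > 0:
            self.index -= 1
        return self.current()

    def forward(self):
        if self.index < len(self.history) - 1:
            self.index += 1
        return self.current()

    def current(self):
        return self.history[self.index] if self.is_open and self.index >= 0 else None

    def close(self) -> None:
        self.is_open = False
        self.history, self.index = [], -1


b = BrowserSketch()
b.open("list")
b.open("detail/1")
print(b.back())     # list
print(b.forward())  # detail/1
b.back()
b.open("detail/2")  # forward entry "detail/1" is discarded
print(b.forward())  # detail/2
b.close()
print(b.current())  # None
```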
A method of applying the internet big data acquisition system described above comprises the following steps:
Step 1: First, the collection center receives a task through the transport layer;
Step 2: Then, the transport layer exchanges data with the external data source;
Step 3: The administration center receives data from the external server and coordinates the application layer and the data layer to interact with and collect the external data source;
Step 4: The collection center exchanges the collected data with the external server through the transport layer.
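Steps 1 to 4 can be traced in a single function. In this minimal sketch, a task is assumed to be a dict of URLs plus the fields to keep, the external data source is a dict of already-parsed rows, and the external server is a plain list; `apply_method` and these data shapes are hypothetical.

```python
def apply_method(task: dict, external_data: dict, server: list) -> list:
    """Hypothetical walk-through of Steps 1-4; returns a trace of the steps."""
    trace = []
    # Step 1: the collection center receives a task through the transport layer.
    urls = task["urls"]
    trace.append(("step1", len(urls)))
    # Step 2: the transport layer exchanges data with the external data source.
    raw = {u: external_data[u] for u in urls if u in external_data}
    trace.append(("step2", len(raw)))
    # Step 3: the administration center coordinates the application and data layers.
    fields = task["fields"]
    records = [{f: row.get(f) for f in fields} for row in raw.values()]
    trace.append(("step3", len(records)))
    # Step 4: the collected data goes to the external server via the transport layer.
    server.extend(records)
    trace.append(("step4", len(server)))
    return trace


external = {"u1": {"name": "A", "price": 3, "extra": 1}, "u2": {"name": "B", "price": 5}}
server = []
trace = apply_method({"urls": ["u1", "u2", "u3"], "fields": ["name", "price"]}, external, server)
print(trace)   # [('step1', 3), ('step2', 2), ('step3', 2), ('step4', 2)]
print(server)  # [{'name': 'A', 'price': 3}, {'name': 'B', 'price': 5}]
```

URLs missing from the source (`u3` here) are silently skipped in Step 2; a production system would more likely report them back through the URL reporting module.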
Through the cooperation of the collection center and the system administration center, the present invention enables the client to complete data collection automatically and easily, solving the problems that manual data collection is complex, technically demanding, and limited. For web retrieval of internet big data, the present invention provides a simplified collection rule configuration. Its purpose is to simplify the configuration of internet big data collection rules, lower the barrier to using data acquisition, simplify the collection steps, and thereby improve the sharing of internet data. A further improvement makes it convenient for the client to perform various browser operations.
Brief description of the drawings
Fig. 1 is a working block diagram of the internet big data acquisition system and its application method according to the present invention.
Embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in detail below with reference to the accompanying drawing. However, those skilled in the art will understand that many technical details are set forth in the embodiments so that the reader can better understand the present application. The technical solutions claimed in the claims of the present application can be implemented even without these technical details and with various changes and modifications based on the following embodiments.
The embodiments of the present invention relate to an internet big data acquisition system and its application method. As shown in Fig. 1, the internet big data acquisition system of the present invention comprises a collection center and a system administration center, each connected to an external server. The system administration center comprises node administration 1, system monitoring 2, and information point deployment 3.
The collection center comprises transport layer 4, application layer 5, and data layer 6. Transport layer 4 comprises a task receiving module, a URL reporting module, and a collected-data transmission module, and is connected to the external data source through the collected-data transmission module. Application layer 5 comprises an acquisition module, a transport module, an access module, and a step processing module. Data layer 6 comprises a data processing module and a node information extraction module. The node information extraction module comprises a node content information extraction module, a node attribute information extraction module, and a list node information extraction module.
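The patent gives no detail on node administration 1, system monitoring 2, or information point deployment 3. One common way to realize the first two is a heartbeat registry; the sketch below assumes that approach, and it interprets "information point deployment" as round-robin assignment of collection URLs to healthy nodes. Both readings are assumptions, and `NodeRegistry`, `timeout`, and `assign` are hypothetical names.

```python
import time
from typing import Optional


class NodeRegistry:
    """Hypothetical node administration and monitoring via heartbeats."""

    def __init__(self, timeout: float = 30.0):
        self.timeout = timeout  # seconds without a heartbeat before a node counts as down
        self.last_seen = {}     # node id -> time of last heartbeat

    def heartbeat(self, node_id: str, now: Optional[float] = None) -> None:
        self.last_seen[node_id] = time.monotonic() if now is None else now

    def remove(self, node_id: str) -> None:
        self.last_seen.pop(node_id, None)

    def status(self, now: Optional[float] = None) -> dict:
        now = time.monotonic() if now is None else now
        return {n: ("up" if now - t <= self.timeout else "down")
                for n, t in self.last_seen.items()}

    def assign(self, urls: list, now: Optional[float] = None) -> dict:
        # Distribute URLs round-robin over nodes that are currently up.
        up = sorted(n for n, s in self.status(now).items() if s == "up")
        if not up:
            return {}
        plan = {n: [] for n in up}
        for i, url in enumerate(urls):
            plan[up[i % len(up)]].append(url)
        return plan


r = NodeRegistry(timeout=10)
r.heartbeat("crawler-1", now=0)
r.heartbeat("crawler-2", now=5)
print(r.status(now=12))                     # {'crawler-1': 'down', 'crawler-2': 'up'}
print(r.assign(["u1", "u2", "u3"], now=8))  # {'crawler-1': ['u1', 'u3'], 'crawler-2': ['u2']}
```

Passing `now` explicitly keeps the example deterministic; in practice it would be omitted so that `time.monotonic()` is used.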
The collection center further comprises browser processing module 7 and loop processing module 8. Loop processing module 8 comprises a loop start point function and a loop end point function. Browser processing module 7 comprises a browser open function, a browser close function, a browser forward function, and a browser back function.
Application layer 5 and data layer 6 are connected to the external data source and the external server through transport layer 4. The collection center and the system administration center cooperate: data is obtained from the external data source through transport layer 4, collected, and transmitted to the external server, completing the data collection task.
The application method comprises the following steps:
Step 1: First, the collection center receives a task through transport layer 4;
Step 2: Then, transport layer 4 exchanges data with the external data source;
Step 3: The administration center receives data from the external server and coordinates application layer 5 and data layer 6 to interact with and collect the external data source;
Step 4: The collection center exchanges the collected data with the external server through transport layer 4.
The specific operations are as follows:
I. Preparation:
Step 1: Select "Open URL" and set the corresponding URL;
Step 2: Select "Click button/link", then mark and click the information to be collected in the browsing area;
Step 3: Select "Click button/link", then click the "list" chart in the browsing area;
Step 4: Set the loop start point;
Step 5: Select "Click button/link", then click the first row of the data list;
Step 6: Select "Extract data", then extract the fields;
Step 7: Because the page has no back button, select "Browser back";
Step 8: Select "Click button/link", then click the "Next page" button;
Step 9: Set the loop end point;
Once the configuration is complete, start collection.
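The preparation steps amount to a small rule program that the step processing module executes. The sketch below encodes Steps 1 and 4 to 9 as a rule list and runs it against an in-memory fake site; Steps 2 and 3 (marking information and clicking the list chart) are interactive and omitted. The rule format, op names, `SITE` structure, and `run_rules` are all hypothetical, chosen only to show how a loop start point, a "browser back", and a "Next page" click fit together.

```python
# Hypothetical fake site: list pages link to detail pages and to a next page.
SITE = {
    "list?page=1": {"rows": ["item/1"], "next": "list?page=2"},
    "list?page=2": {"rows": ["item/2"], "next": None},
    "item/1": {"title": "A", "price": "3"},
    "item/2": {"title": "B", "price": "5"},
}

# Steps 1 and 4-9 written as a rule list (op names are hypothetical).
RULES = [
    {"op": "open", "url": "list?page=1"},             # Step 1: open URL
    {"op": "loop_start"},                             # Step 4: loop start point
    {"op": "click_row", "row": 0},                    # Step 5: click first row
    {"op": "extract", "fields": ["title", "price"]},  # Step 6: extract fields
    {"op": "back"},                                   # Step 7: browser back
    {"op": "click_next"},                             # Step 8: click "Next page"
    {"op": "loop_end"},                               # Step 9: loop end point
]


def run_rules(rules: list, site: dict, max_iterations: int = 50) -> list:
    history, records = [], []
    pc, loop_start, iterations = 0, None, 0
    while pc < len(rules):
        rule = rules[pc]
        op = rule["op"]
        if op == "open":
            history.append(rule["url"])
        elif op == "loop_start":
            loop_start = pc
        elif op == "click_row":
            history.append(site[history[-1]]["rows"][rule["row"]])
        elif op == "extract":
            page = site[history[-1]]
            records.append({f: page.get(f) for f in rule["fields"]})
        elif op == "back":
            history.pop()
        elif op == "click_next":
            nxt = site[history[-1]].get("next")
            if nxt is None:
                break  # no "Next page": stop the whole rule program
            history.append(nxt)
        elif op == "loop_end":
            iterations += 1
            if loop_start is not None and iterations < max_iterations:
                pc = loop_start  # jump back; pc += 1 resumes after loop_start
        pc += 1
    return records


records = run_rules(RULES, SITE)
print(records)  # [{'title': 'A', 'price': '3'}, {'title': 'B', 'price': '5'}]
```

Keeping the rules as data rather than code is what makes a "simplified collection rule configuration" possible: a point-and-click front end only has to emit this list.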
II. Collection work:
Step 10: The collection center reads the valid information in the external database through transport layer 4;
Step 11: The administration center coordinates application layer 5 and data layer 6 to interact with and collect the external data source;
Step 12: The collection center exchanges the collected data with the external server through transport layer 4;
Step 13: Exit the collection work.
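Step 10 reads only "valid information", and the background notes that collected data is hard to clean. The following sketch assumes "valid" means that all required fields are non-empty, removes duplicates, and splits the result into batches for transmission to the external server. The validity rule, `clean_and_batch`, and `batch_size` are hypothetical.

```python
def clean_and_batch(records: list, required: tuple, batch_size: int = 100) -> list:
    """Keep valid records, drop duplicates, and split them into batches."""
    seen, valid = set(), []
    for r in records:
        # A record is treated as valid if every required field is non-empty.
        if all(r.get(k) not in (None, "") for k in required):
            key = tuple(r[k] for k in required)
            if key not in seen:
                seen.add(key)
                valid.append(r)
    return [valid[i:i + batch_size] for i in range(0, len(valid), batch_size)]


collected = [
    {"title": "A", "price": "3"},
    {"title": "A", "price": "3"},  # duplicate
    {"title": "", "price": "4"},   # missing title
    {"title": "B", "price": "5"},
    {"title": "C", "price": None}, # missing price
]
print(clean_and_batch(collected, ("title", "price"), batch_size=1))
# [[{'title': 'A', 'price': '3'}], [{'title': 'B', 'price': '5'}]]
```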
Those skilled in the art will understand that the above embodiments are specific embodiments for implementing the present invention, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present invention.
Claims (9)
- 1. An internet big data acquisition system, characterized in that: the internet big data acquisition system comprises a collection center and a system administration center, the collection center and the system administration center each being connected to an external server; the system administration center comprises node administration, system monitoring, and information point deployment; the collection center comprises a transport layer, an application layer, and a data layer; the application layer and the data layer are connected to an external data source and the external server through the transport layer; and the collection center and the system administration center cooperate to acquire data from the external data source through the transport layer and transmit it to the external server, completing the data acquisition work.
- 2. The internet big data acquisition system according to claim 1, characterized in that: the transport layer comprises a task receiving module, a URL reporting module, and a collected-data transmission module; and the transport layer is connected to the external data source through the collected-data transmission module.
- 3. The internet big data acquisition system according to claim 1, characterized in that: the application layer comprises an acquisition module, a transport module, an access module, and a step processing module.
- 4. The internet big data acquisition system according to claim 1, characterized in that: the data layer comprises a data processing module and a node information extraction module.
- 5. The internet big data acquisition system according to claim 4, characterized in that: the node information extraction module comprises a node content information extraction module, a node attribute information extraction module, and a list node information extraction module.
- 6. The internet big data acquisition system according to claim 1, characterized in that: the collection center further comprises a browser processing module and a loop processing module.
- 7. The internet big data acquisition system according to claim 6, characterized in that: the loop processing module comprises a loop start point function and a loop end point function.
- 8. The internet big data acquisition system according to claim 6, characterized in that: the browser processing module comprises a browser open function, a browser close function, a browser forward function, and a browser back function.
- 9. A method of applying the internet big data acquisition system according to any one of claims 1-8, characterized in that: Step 1: first, the collection center receives a task through the transport layer; Step 2: then, the transport layer exchanges data with the external data source; Step 3: the administration center receives data from the external server and coordinates the application layer and the data layer to interact with and collect the external data source; Step 4: the collection center exchanges the collected data with the external server through the transport layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610616584.4A (CN107682382A) | 2016-08-01 | 2016-08-01 | An internet big data acquisition system and its application method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610616584.4A (CN107682382A) | 2016-08-01 | 2016-08-01 | An internet big data acquisition system and its application method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107682382A | 2018-02-09 |
Family
ID=61133043
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201610616584.4A (CN107682382A) | Pending | 2016-08-01 | 2016-08-01 | An internet big data acquisition system and its application method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107682382A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102375837A (en) * | 2010-08-19 | 2012-03-14 | 中国移动通信集团公司 | Data acquiring system and method |
CN103593502A (en) * | 2013-10-16 | 2014-02-19 | 中国水利水电科学研究院 | Temperature and stress analysis and back analysis method used for crack control of concrete dam |
CN104112207A (en) * | 2014-07-29 | 2014-10-22 | 浪潮软件集团有限公司 | Electronic commerce transaction monitoring method based on internet data |
CN104767803A (en) * | 2015-03-27 | 2015-07-08 | 浪潮集团有限公司 | Internet data collecting method |
CN104820670A (en) * | 2015-03-13 | 2015-08-05 | 国家电网公司 | Method for acquiring and storing big data of power information |
CN104915415A (en) * | 2015-06-08 | 2015-09-16 | 浪潮集团有限公司 | Distributed internet data collection and analysis system |
CN105683967A (en) * | 2016-01-30 | 2016-06-15 | 深圳市博信诺达经贸咨询有限公司 | Web page grabbing method and web page grabbing system based on big data |
- 2016-08-01: application CN201610616584.4A filed in CN; CN107682382A status: pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103297469B (en) | The acquisition method of a kind of website data and device | |
CN104077402B (en) | Data processing method and data handling system | |
Beel et al. | Mr. DLib: recommendations-as-a-service (RaaS) for academia | |
CN101408877B (en) | System and method for loading tree node | |
CN105677842A (en) | Log analysis system based on Hadoop big data processing technique | |
CN104462547B (en) | A kind of method and system of configurable collecting webpage data | |
CN101355587B (en) | Method and apparatus for obtaining URL information as well as method and system for implementing searching engine | |
CN103699822A (en) | Application system and detection method for users' abnormal behaviors in e-commerce based on mouse behaviors | |
CN103218431A (en) | System and method for identifying and automatically acquiring webpage information | |
CN103458042A (en) | Microblog advertisement user detection method | |
CN105069087A (en) | Web log data mining based website optimization method | |
CN102750352A (en) | Method and device for classified collection of historical access records in browser | |
CN103399877A (en) | Multi-Android-client service sharing method and system | |
CN103970843A (en) | Conversation combining method based on UUID in Web log preprocessing | |
CN102724184A (en) | Webpage collecting and sharing method and server | |
CN108124007A (en) | The method and apparatus of message data real-time Transmission | |
CN107911466A (en) | A kind of association method under multi-layer framework | |
CN104598604A (en) | Browsing method of website navigation applied in various browsers | |
CN103778156A (en) | Method and device for searching for data and server for data search | |
CN107370628A (en) | Based on the log processing method and system buried a little | |
CN100366002C (en) | Shared access testing system of internet | |
CN106412003A (en) | Information pushing method and device, and information request device | |
CN106651453A (en) | Network platform-oriented automatic promotion method and system, and computing device | |
CN107682382A (en) | A kind of internet big data acquisition system and its application method | |
CN105653533B (en) | A kind of method and apparatus updating classification associated set of words |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180209 |