CN107682382A - An internet big data acquisition system and method of using it - Google Patents

An internet big data acquisition system and method of using it

Info

Publication number
CN107682382A
Authority
CN
China
Prior art keywords
data
layer
internet
data acquisition
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610616584.4A
Other languages
Chinese (zh)
Inventor
陈浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hui Shi Electronic Commerce (shanghai) Co Ltd
Original Assignee
Hui Shi Electronic Commerce (shanghai) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-08-01
Filing date
2016-08-01
Publication date
2018-02-09
Application filed by Hui Shi Electronic Commerce (shanghai) Co Ltd
Priority to CN201610616584.4A
Publication of CN107682382A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]

Abstract

The present invention relates to the internet field and provides an internet big data acquisition system and a method of using it. The system comprises an acquisition center and a system management center, each connected to an external server. The system management center comprises node management, system monitoring, and information point deployment and control. The acquisition center comprises a transport layer, an application layer, and a data layer. The application layer and the data layer are connected to the external data and to the external server through the transport layer. The acquisition center and the system management center cooperate: data is obtained from the external data through the transport layer, collected, and transmitted to the external server, which completes the data acquisition work. The invention lets a user complete data collection automatically through the cooperation of the acquisition center and the system management center, solving the problems that manual data collection is complex, technically demanding, and subject to considerable limitations.

Description

An internet big data acquisition system and method of using it
Technical field
The present invention relates to the internet field, and in particular to an internet big data acquisition system and a method of using it.
Background
The Web is the largest public resource repository in the world. It currently has at least 550 million websites and more than several trillion pages in total, and it grows by an enormous amount every second. It contains a great deal of valuable information, such as lists and contact details of potential customers, price lists of competing products, real-time financial news, public opinion information, word-of-mouth information, supply and demand information, scientific journals, forum posts, blog articles, and the latest news. However, because this key information exists in semi-structured form in large numbers of HTML pages spread across individual websites, it is difficult to gather and use directly. Internet data is characterized by volume, variety, velocity, and value. On the one hand, the amount of information keeps growing; on the other hand, useful information is hard to extract and hard to clean, which results in a lack of usable information. At present, the big data acquisition products on the market rely on manual collection: the collection rules are very complex, the technical requirements are high, and the limitations are considerable. The prior art therefore needs to be improved to solve the problems that manual data collection is complex, technically demanding, and subject to considerable limitations.
Summary of the invention
The object of the present invention is to provide an internet big data acquisition system and a method of using it, in order to solve the problems that manual data collection is complex, technically demanding, and subject to considerable limitations.
To solve the above technical problems, embodiments of the present invention provide an internet big data acquisition system and a method of using it. The internet big data acquisition system comprises an acquisition center and a system management center, each connected to an external server. The system management center comprises node management, system monitoring, and information point deployment and control. The acquisition center comprises a transport layer, an application layer, and a data layer. The application layer and the data layer are connected to the external data and to the external server through the transport layer. The acquisition center and the system management center cooperate: data is obtained from the external data through the transport layer, collected, and transmitted to the external server, which completes the data acquisition work.
The invention lets a user complete data collection automatically through the cooperation of the acquisition center and the system management center, solving the problems that manual data collection is complex, technically demanding, and subject to considerable limitations.
Further, the transport layer comprises a task receiving module, a uniform resource locator (URL) reporting module, and a collected data transmission module; the transport layer is connected to the external data through the collected data transmission module.
Further, the application layer comprises an acquisition module, a transmission module, an access module, and a step processing module.
Further, the data layer comprises a data processing module and a node information extraction module.
Further, the node information extraction module comprises a node content information extraction module, a node attribute information extraction module, and a list node information extraction module.
Further, the acquisition center also comprises a browser processing module and a loop processing module.
Further, the loop processing module comprises a loop start point function and a loop end point function.
Further, the browser processing module comprises a browser open function, a browser close function, a browser forward function, and a browser back function.
A method of using the internet big data acquisition system as claimed, whose operating steps are as follows:
Step 1: the acquisition center first receives a task through the transport layer;
Step 2: the transport layer then exchanges data with the external data;
Step 3: the management center receives data from the external server and coordinates the application layer and the data layer to interact with the external data and collect it;
Step 4: the acquisition center exchanges the collected data with the external server through the transport layer.
The invention lets a user complete data collection automatically through the cooperation of the acquisition center and the system management center, solving the problems that manual data collection is complex, technically demanding, and subject to considerable limitations. For web page collection of internet big data, the invention provides a simplified configuration of collection rules. The aim is to lower the threshold for using data acquisition, simplify the collection steps, and thereby improve the sharing of internet data. The further improvements also make it easier for the user to perform various browser operations.
Brief description of the drawings
Fig. 1 is a working block diagram of the internet big data acquisition system and method of using it according to the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are explained in detail below with reference to the accompanying drawing. Those of ordinary skill in the art will understand that many technical details are set out in the embodiments to help the reader better understand the application. However, the technical solutions claimed in the claims of this application can be realized even without these technical details and with various changes and modifications based on the following embodiments.
An embodiment of the present invention relates to an internet big data acquisition system and a method of using it. As shown in Fig. 1, the internet big data acquisition system of the present invention comprises an acquisition center and a system management center, each connected to an external server. The system management center comprises node management 1, system monitoring 2, and information point deployment and control 3.
The acquisition center comprises a transport layer 4, an application layer 5, and a data layer 6. The transport layer 4 comprises a task receiving module, a uniform resource locator (URL) reporting module, and a collected data transmission module; the transport layer 4 is connected to the external data through the collected data transmission module. The application layer 5 comprises an acquisition module, a transmission module, an access module, and a step processing module. The data layer 6 comprises a data processing module and a node information extraction module. The node information extraction module comprises a node content information extraction module, a node attribute information extraction module, and a list node information extraction module.
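The three kinds of node information extraction named above (node content, node attribute, list node) can be illustrated with a minimal Python sketch. The patent does not say how nodes are addressed or parsed, so the ElementTree path expressions, the function names, and the sample page fragment are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment. Real web pages would need a
# tolerant HTML parser; ElementTree is used here only to keep the sketch
# self-contained.
PAGE = """
<div>
  <h1 id="title">Price list</h1>
  <a class="next" href="/page/2">next page</a>
  <ul>
    <li>item A</li>
    <li>item B</li>
  </ul>
</div>
"""

def extract_node_content(root, path):
    """Node content extraction: text of the first node matching path."""
    node = root.find(path)
    return node.text.strip() if node is not None and node.text else None

def extract_node_attribute(root, path, name):
    """Node attribute extraction: one attribute value of the first match."""
    node = root.find(path)
    return node.get(name) if node is not None else None

def extract_list_nodes(root, path):
    """List node extraction: text of every node matching path."""
    return [n.text.strip() for n in root.findall(path) if n.text]

root = ET.fromstring(PAGE)
title = extract_node_content(root, "h1")
next_url = extract_node_attribute(root, "a[@class='next']", "href")
items = extract_list_nodes(root, "ul/li")
```

Keeping the three extractors as separate functions mirrors the three sub-modules in the text; a real implementation might well share one selector engine behind them.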
The acquisition center further comprises a browser processing module 7 and a loop processing module 8. The loop processing module 8 comprises a loop start point function and a loop end point function. The browser processing module 7 comprises a browser open function, a browser close function, a browser forward function, and a browser back function.
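The four browser functions (open, close, forward, back) behave like operations on a navigation history. The following Python sketch models that history in memory. It is an assumption about how such a module could be organized, not the patent's implementation; the class name, method names, and URLs are hypothetical, and no real browser is driven.

```python
class BrowserSession:
    """In-memory stand-in for the browser processing module (hypothetical)."""

    def __init__(self):
        self.history = []    # URLs visited in this session
        self.index = -1      # position of the current page in history
        self.is_open = False

    @property
    def current(self):
        return self.history[self.index] if self.index >= 0 else None

    def open(self, url):
        """Browser open: start a session on the given URL."""
        self.is_open = True
        self.history, self.index = [url], 0

    def navigate(self, url):
        """Follow a link; any forward history is discarded."""
        self.history = self.history[: self.index + 1] + [url]
        self.index += 1

    def back(self):
        """Browser back: move one page back if possible."""
        if self.index > 0:
            self.index -= 1
        return self.current

    def forward(self):
        """Browser forward: move one page forward if possible."""
        if self.index < len(self.history) - 1:
            self.index += 1
        return self.current

    def close(self):
        """Browser close: end the session and clear its history."""
        self.is_open = False
        self.history, self.index = [], -1


b = BrowserSession()
b.open("http://example.com/list")
b.navigate("http://example.com/item/1")
b.back()   # returns to the list page, as when a page has no back button of its own
```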
The application layer 5 and the data layer 6 are connected to the external data and to the external server through the transport layer 4. The acquisition center and the system management center cooperate: data is obtained from the external data through the transport layer 4, collected, and transmitted to the external server, which completes the data acquisition work.
The operating steps of the method of use are as follows:
Step 1: the acquisition center first receives a task through the transport layer 4;
Step 2: the transport layer 4 then exchanges data with the external data;
Step 3: the management center receives data from the external server and coordinates the application layer 5 and the data layer 6 to interact with the external data and collect it;
Step 4: the acquisition center exchanges the collected data with the external server through the transport layer 4.
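The four steps above can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the external data and the external server are replaced by in-memory stand-ins, and the class, function, and field names are hypothetical, since the patent does not define any interfaces.

```python
from dataclasses import dataclass, field

@dataclass
class TransportLayer:
    """Receives tasks, fetches external data, delivers results (in memory)."""
    external_data: dict                             # url -> raw text, stand-in for the web
    delivered: list = field(default_factory=list)   # stand-in for the external server

    def receive_task(self, task):
        return task["url"]

    def fetch(self, url):
        return self.external_data[url]

    def deliver(self, records):
        self.delivered.extend(records)

def application_layer_collect(raw):
    """Application layer: interact with the page; here it is only split into lines."""
    return raw.splitlines()

def data_layer_process(lines):
    """Data layer: clean the collected lines and drop empty ones."""
    return [ln.strip() for ln in lines if ln.strip()]

def run_task(transport, task):
    url = transport.receive_task(task)        # Step 1: receive the task
    raw = transport.fetch(url)                # Step 2: exchange data with the external data
    records = data_layer_process(             # Step 3: interact and collect
        application_layer_collect(raw))
    transport.deliver(records)                # Step 4: pass results to the external server
    return records

transport = TransportLayer({"http://example.com": " alpha \n\n beta "})
result = run_task(transport, {"url": "http://example.com"})
```

Routing both the fetch and the delivery through the transport object reflects the statement that the application and data layers reach the outside world only through the transport layer.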
The concrete operations are as follows:
I. Preparation:
Step 1: select "open URL" and set the corresponding URL;
Step 2: select "click button/link" and, in the browsing area, click the category of information to be collected;
Step 3: select "click button/link" and click the "list" icon in the browsing area;
Step 4: set the loop start point;
Step 5: select "click button/link" and click the first row of the data list;
Step 6: select "extract data" and extract the fields;
Step 7: because the page has no back button, select "browser back";
Step 8: select "click button/link" and click the "next page" button;
Step 9: set the loop end point;
Once the configuration is complete, start collection.
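The nine preparation steps amount to a collection rule: an ordered list of actions in which the steps between the loop start point and the loop end point repeat for each page. A minimal Python sketch of such a rule and a runner follows. The tuple format, the action names, the placeholder URL, and the stop condition (the loop ends when a step reports failure, for example when there is no next page) are assumptions; the patent does not state how the loop terminates.

```python
# Hypothetical encoding of the nine preparation steps as (action, argument) pairs.
RULE = [
    ("open_url", "http://example.com/list"),  # Step 1 (placeholder URL)
    ("click", "category"),                    # Step 2
    ("click", "list view"),                   # Step 3
    ("loop_start", None),                     # Step 4
    ("click", "first row"),                   # Step 5
    ("extract", "fields"),                    # Step 6
    ("browser_back", None),                   # Step 7
    ("click", "next page"),                   # Step 8
    ("loop_end", None),                       # Step 9
]

def run_rule(rule, perform, max_rounds=100):
    """Run steps in order, repeating the loop body until a step returns False."""
    actions = [step[0] for step in rule]
    start, end = actions.index("loop_start"), actions.index("loop_end")
    for step in rule[:start]:
        perform(step)
    for _ in range(max_rounds):
        # all() stops at the first failing step, which ends the loop.
        if not all(perform(step) for step in rule[start + 1:end]):
            break
    for step in rule[end + 1:]:
        perform(step)

# Fake executor for demonstration: records actions and pretends that two
# further pages exist after the first one.
log = []
pages_left = [2]

def fake_perform(step):
    log.append(step[0])
    if step == ("click", "next page"):
        if pages_left[0] == 0:
            return False
        pages_left[0] -= 1
    return True

run_rule(RULE, fake_perform)
```

With the fake executor, the loop body runs once per page (three pages in total), so "extract" is recorded three times before the failed "next page" click ends the loop.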
II. Collection:
Step 10: the acquisition center reads the useful information in the external database through the transport layer 4;
Step 11: the management center coordinates the application layer 5 and the data layer 6 to interact with the external data and collect it;
Step 12: the acquisition center exchanges the collected data with the external server through the transport layer 4;
Step 13: exit the collection work.
The invention lets a user complete data collection automatically through the cooperation of the acquisition center and the system management center, solving the problems that manual data collection is complex, technically demanding, and subject to considerable limitations. For web page collection of internet big data, the invention provides a simplified configuration of collection rules. The aim is to lower the threshold for using data acquisition, simplify the collection steps, and thereby improve the sharing of internet data. The further improvements also make it easier for the user to perform various browser operations.
Those of ordinary skill in the art will understand that the above embodiments are specific embodiments for realizing the present invention, and that in practical applications various changes in form and detail can be made to them without departing from the spirit and scope of the present invention.

Claims (9)

  1. An internet big data acquisition system, characterized in that: the internet big data acquisition system comprises an acquisition center and a system management center, the acquisition center and the system management center each being connected to an external server;
    the system management center comprises: node management, system monitoring, and information point deployment and control;
    the acquisition center comprises: a transport layer, an application layer, and a data layer;
    the application layer and the data layer are connected to external data and to the external server through the transport layer;
    the acquisition center and the system management center cooperate, data being obtained from the external data through the transport layer, collected, and transmitted to the external server to complete the data acquisition work.
  2. The internet big data acquisition system as claimed in claim 1, characterized in that: the transport layer comprises a task receiving module, a uniform resource locator (URL) reporting module, and a collected data transmission module; the transport layer is connected to the external data through the collected data transmission module.
  3. The internet big data acquisition system as claimed in claim 1, characterized in that: the application layer comprises an acquisition module, a transmission module, an access module, and a step processing module.
  4. The internet big data acquisition system as claimed in claim 1, characterized in that: the data layer comprises a data processing module and a node information extraction module.
  5. The internet big data acquisition system as claimed in claim 4, characterized in that: the node information extraction module comprises a node content information extraction module, a node attribute information extraction module, and a list node information extraction module.
  6. The internet big data acquisition system as claimed in claim 1, characterized in that: the acquisition center further comprises a browser processing module and a loop processing module.
  7. The internet big data acquisition system as claimed in claim 6, characterized in that: the loop processing module comprises a loop start point function and a loop end point function.
  8. The internet big data acquisition system as claimed in claim 6, characterized in that: the browser processing module comprises a browser open function, a browser close function, a browser forward function, and a browser back function.
  9. A method of using the internet big data acquisition system as claimed in any one of claims 1 to 8, characterized in that:
    Step 1: the acquisition center first receives a task through the transport layer;
    Step 2: the transport layer then exchanges data with the external data;
    Step 3: the management center receives data from the external server and coordinates the application layer and the data layer to interact with the external data and collect it;
    Step 4: the acquisition center exchanges the collected data with the external server through the transport layer.
CN201610616584.4A 2016-08-01 2016-08-01 A kind of internet big data acquisition system and its application method Pending CN107682382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610616584.4A CN107682382A (en) 2016-08-01 2016-08-01 A kind of internet big data acquisition system and its application method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610616584.4A CN107682382A (en) 2016-08-01 2016-08-01 A kind of internet big data acquisition system and its application method

Publications (1)

Publication Number Publication Date
CN107682382A true CN107682382A (en) 2018-02-09

Family

ID=61133043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610616584.4A Pending CN107682382A (en) 2016-08-01 2016-08-01 A kind of internet big data acquisition system and its application method

Country Status (1)

Country Link
CN (1) CN107682382A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102375837A (en) * 2010-08-19 2012-03-14 中国移动通信集团公司 Data acquiring system and method
CN103593502A (en) * 2013-10-16 2014-02-19 中国水利水电科学研究院 Temperature and stress analysis and back analysis method used for crack control of concrete dam
CN104112207A (en) * 2014-07-29 2014-10-22 浪潮软件集团有限公司 Electronic commerce transaction monitoring method based on internet data
CN104820670A (en) * 2015-03-13 2015-08-05 国家电网公司 Method for acquiring and storing big data of power information
CN104767803A (en) * 2015-03-27 2015-07-08 浪潮集团有限公司 Internet data collecting method
CN104915415A (en) * 2015-06-08 2015-09-16 浪潮集团有限公司 Distributed internet data collection and analysis system
CN105683967A (en) * 2016-01-30 2016-06-15 深圳市博信诺达经贸咨询有限公司 Web page grabbing method and web page grabbing system based on big data

Similar Documents

Publication Publication Date Title
CN103297469B (en) The acquisition method of a kind of website data and device
CN104077402B (en) Data processing method and data handling system
Beel et al. Mr. DLib: recommendations-as-a-service (RaaS) for academia
CN101408877B (en) System and method for loading tree node
CN105677842A (en) Log analysis system based on Hadoop big data processing technique
CN104462547B (en) A kind of method and system of configurable collecting webpage data
CN101355587B (en) Method and apparatus for obtaining URL information as well as method and system for implementing searching engine
CN103699822A (en) Application system and detection method for users' abnormal behaviors in e-commerce based on mouse behaviors
CN103218431A (en) System and method for identifying and automatically acquiring webpage information
CN103458042A (en) Microblog advertisement user detection method
CN105069087A (en) Web log data mining based website optimization method
CN102750352A (en) Method and device for classified collection of historical access records in browser
CN103399877A (en) Multi-Android-client service sharing method and system
CN103970843A (en) Conversation combining method based on UUID in Web log preprocessing
CN102724184A (en) Webpage collecting and sharing method and server
CN108124007A (en) The method and apparatus of message data real-time Transmission
CN107911466A (en) A kind of association method under multi-layer framework
CN104598604A (en) Browsing method of website navigation applied in various browsers
CN103778156A (en) Method and device for searching for data and server for data search
CN107370628A (en) Based on the log processing method and system buried a little
CN100366002C (en) Shared access testing system of internet
CN106412003A (en) Information pushing method and device, and information request device
CN106651453A (en) Network platform-oriented automatic promotion method and system, and computing device
CN107682382A (en) A kind of internet big data acquisition system and its application method
CN105653533B (en) A kind of method and apparatus updating classification associated set of words

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180209