CN109918558A - Big data acquisition interface and acquisition method based on crawler technology - Google Patents
- Publication number
- CN109918558A
- Authority
- CN
- China
- Prior art keywords
- module
- data
- crawls
- information
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a big data acquisition interface and acquisition method based on crawler technology. The method comprises the following steps: crawling data information from a network server by a crawling module; storing the data information crawled by the crawling module in a data storage module; comparing, by a data analysis module, the data information crawled by the crawling module with data information pre-stored in a comparison module to obtain processed data; and sending the processed data produced by the data processing module to an external receiving end via a sending module. The crawling module crawls data information on a timed schedule. Based on crawler technology, the invention collects big data quickly and effectively, extracts and analyzes it to obtain processed data, and sends the processed data promptly to the receiving end, which improves the efficiency of big data collection and analysis.
Description
Technical field
The invention belongs to the field of communication technology, and in particular relates to a big data acquisition interface and acquisition method based on crawler technology.
Background
With the rapid development of networks, the Internet has become the carrier of a vast amount of information, including public opinion, social events, policy responses, information on various industries, and the talent market. This information is the data foundation for big data public-opinion analysis systems and macroeconomic analysis systems.
Big data refers to data sets that cannot be captured, managed, and processed with conventional software tools within a given time frame. It is a massive, high-growth, and diversified information asset that requires new processing models to deliver stronger decision-making power, insight discovery, and process optimization capability. Analyzing big data can yield a great deal of intelligent, in-depth, and valuable information that is of great significance for enterprise development and social governance. How to extract and use this information efficiently has become a huge challenge.
Summary of the invention
To solve the above problems, the present invention provides a big data acquisition interface based on crawler technology, which collects data information quickly and effectively.
The technical solution of the present invention is as follows: a big data acquisition interface based on crawler technology comprises an interface body, and the interface body includes:
A crawling module, for crawling data information from a network server;
A data storage module, for storing the data information crawled by the crawling module;
A data processing module, for performing analysis and calculation on the data information crawled by the crawling module to obtain processed data;
A sending module, for sending the processed data produced by the data processing module to an external receiving end.
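The four modules listed above can be pictured as one small object that wires them together. The following is a minimal sketch only: the class name `InterfaceBody`, its fields, and the stand-in lambdas are hypothetical and do not come from the patent, which does not specify any implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class InterfaceBody:
    """Hypothetical container wiring the four modules named in the text."""
    crawl: Callable[[], List[str]]              # crawling module
    process: Callable[[List[str]], List[str]]   # data processing module
    send: Callable[[List[str]], None]           # sending module
    storage: List[str] = field(default_factory=list)  # data storage module

    def run_once(self) -> List[str]:
        records = self.crawl()             # crawl data information
        self.storage.extend(records)       # store what was crawled
        processed = self.process(records)  # analysis and calculation
        self.send(processed)               # hand off to the receiving end
        return processed


# Stand-ins for a real network fetch and a real receiving end (assumptions).
received: List[str] = []
body = InterfaceBody(
    crawl=lambda: ["alpha", "beta"],
    process=lambda recs: [r.upper() for r in recs],
    send=received.extend,
)
result = body.run_once()
```

Passing the modules in as callables keeps the sketch independent of any particular crawler library or transport.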
Preferably, the data processing module includes a data analysis module and a comparison module. The data analysis module compares the data information crawled by the crawling module with the data information pre-stored in the comparison module to obtain the processed data.
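The text does not say what form the comparison result takes. One plausible reading, shown here purely as an assumption, is a key-by-key diff between the crawled records and the pre-stored records; the names `Comparison` and `compare` and the sample data are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Comparison(NamedTuple):
    added: List[str]    # keys present only in the crawled data
    changed: List[str]  # keys present in both, with different values
    removed: List[str]  # keys present only in the pre-stored data


def compare(crawled: Dict[str, str], prestored: Dict[str, str]) -> Comparison:
    """Diff crawled records against the pre-stored reference records."""
    added = sorted(crawled.keys() - prestored.keys())
    removed = sorted(prestored.keys() - crawled.keys())
    changed = sorted(
        k for k in crawled.keys() & prestored.keys() if crawled[k] != prestored[k]
    )
    return Comparison(added, changed, removed)


prestored = {"page1": "old text", "page2": "same"}
crawled = {"page2": "same", "page1": "new text", "page3": "fresh"}
diff = compare(crawled, prestored)
```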
Preferably, the interface body further includes an analog-to-digital converter for converting analog information crawled by the crawling module into digital information.
Preferably, the interface body includes acquisition sub-interfaces for at least two file formats.
Preferably, the interface body further includes a timed task module, and the crawling module crawls data information from the network server at the times set by the timing rule of the timed task module.
The present invention also provides a big data acquisition method based on crawler technology, comprising the following steps:
(1) crawling data information from a network server by the crawling module;
(2) storing the data information crawled by the crawling module in the data storage module;
(3) comparing, by the data analysis module, the data information crawled by the crawling module with the data information pre-stored in the comparison module to obtain processed data;
(4) sending the processed data produced by the data processing module to an external receiving end via the sending module.
Preferably, the crawling module crawls data information on a timed schedule.
Compared with the prior art, the beneficial effects of the present invention are as follows:
Based on crawler technology, the present invention collects big data quickly and effectively, extracts and analyzes it to obtain processed data, and sends the processed data promptly to the receiving end, which improves the efficiency of big data collection and analysis.
Detailed description
The big data acquisition interface based on crawler technology in the present invention comprises an interface body, and the interface body includes acquisition sub-interfaces for at least two file formats.
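The patent does not name the file formats the sub-interfaces handle. As an illustration only, the sketch below assumes JSON and CSV and registers one parser per format; `parse_json`, `parse_csv`, `SUB_INTERFACES`, and `parse_payload` are hypothetical names.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, str]


def parse_json(text: str) -> List[Record]:
    """Sub-interface for JSON payloads: expects a list of flat objects."""
    return [{k: str(v) for k, v in row.items()} for row in json.loads(text)]


def parse_csv(text: str) -> List[Record]:
    """Sub-interface for CSV payloads with a header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Registry mapping a format name to its acquisition sub-interface.
SUB_INTERFACES: Dict[str, Callable[[str], List[Record]]] = {
    "json": parse_json,
    "csv": parse_csv,
}


def parse_payload(fmt: str, text: str) -> List[Record]:
    """Dispatch a payload to the sub-interface registered for its format."""
    parser = SUB_INTERFACES.get(fmt)
    if parser is None:
        raise ValueError(f"unsupported format: {fmt}")
    return parser(text)


from_json = parse_payload("json", '[{"id": 1, "title": "a"}]')
from_csv = parse_payload("csv", "id,title\n1,a\n")
```

Both calls normalize their input to the same list-of-records shape, so the downstream modules do not need to know which format was crawled.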
The interface body includes:
A crawling module, for crawling data information from a network server;
A data storage module, for storing the data information crawled by the crawling module;
A data processing module, for performing analysis and calculation on the data information crawled by the crawling module to obtain processed data;
A sending module, for sending the processed data produced by the data processing module to an external receiving end.
The data processing module includes a data analysis module and a comparison module. The data analysis module compares the data information crawled by the crawling module with the data information pre-stored in the comparison module to obtain the processed data.
The above interface body further includes an analog-to-digital converter for converting analog information crawled by the crawling module into digital information.
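The patent gives no parameters for the analog-to-digital converter. To illustrate what the conversion involves, the sketch below models an ideal unipolar converter in software; the function name `quantize`, the reference level `v_ref`, and the 8-bit resolution are all assumptions, not details from the patent.

```python
from typing import Iterable, List


def quantize(samples: Iterable[float], v_ref: float = 1.0, bits: int = 8) -> List[int]:
    """Map analog sample values in [0, v_ref] to unsigned integer codes."""
    levels = 1 << bits  # number of quantization steps, e.g. 256 for 8 bits
    codes = []
    for v in samples:
        code = int(v / v_ref * levels)               # index of the step containing v
        codes.append(min(max(code, 0), levels - 1))  # clamp out-of-range input
    return codes


codes = quantize([0.0, 0.5, 1.0, 1.2, -0.1], v_ref=1.0, bits=8)
```

Inputs at or above `v_ref` saturate at the highest code and negative inputs saturate at zero, which mirrors how an ideal unipolar converter clips.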
The interface body in the present invention further includes a timed task module, and the crawling module crawls data information from the network server at the times set by the timing rule of the timed task module.
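The form of the timing rule is not specified. A minimal sketch, assuming the rule is a fixed interval and a fixed number of runs, can be built on Python's standard `sched` module; `run_periodically` and the stand-in crawl job are hypothetical.

```python
import sched
import time
from typing import Callable, List


def run_periodically(
    job: Callable[[], None],
    interval: float,
    runs: int,
    scheduler: sched.scheduler,
) -> None:
    """Queue `job` to fire `runs` times, `interval` seconds apart, then block until done."""
    for i in range(runs):
        scheduler.enter(i * interval, 1, job)  # delay, priority, action
    scheduler.run()  # blocks until every queued event has fired


crawl_log: List[str] = []
timer = sched.scheduler(time.monotonic, time.sleep)
run_periodically(
    lambda: crawl_log.append("crawled"),  # stand-in for the crawling module
    interval=0.01,
    runs=3,
    scheduler=timer,
)
```

A long-running deployment would more likely use a cron-style rule or a dedicated job scheduler; the fixed run count here only keeps the example finite.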
When the above interface is used for data acquisition, the specific method comprises the following steps:
(1) crawling data information from a network server by the crawling module, wherein the crawling module crawls data information on a timed schedule;
(2) storing the data information crawled by the crawling module in the data storage module;
(3) comparing, by the data analysis module, the data information crawled by the crawling module with the data information pre-stored in the comparison module to obtain processed data;
(4) sending the processed data produced by the data processing module to an external receiving end via the sending module.
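Steps (1) to (4) can be sketched as a single function. Everything concrete here is an assumption made for illustration: `fetch` stands in for the request to the network server, an in-memory SQLite table plays the data storage module, "processed data" is taken to mean records that are new or differ from the pre-stored baseline, and `send` is any callable that delivers a JSON string.

```python
import json
import sqlite3
from typing import Callable, Dict, List


def acquire(
    fetch: Callable[[], Dict[str, str]],
    db: sqlite3.Connection,
    baseline: Dict[str, str],
    send: Callable[[str], None],
) -> List[str]:
    """Run steps (1)-(4) once and return the keys that differ from the baseline."""
    # (1) Crawl: `fetch` stands in for a request to the network server.
    crawled = fetch()

    # (2) Store the crawled records in the data storage module.
    db.execute("CREATE TABLE IF NOT EXISTS crawled (key TEXT PRIMARY KEY, value TEXT)")
    db.executemany("INSERT OR REPLACE INTO crawled VALUES (?, ?)", crawled.items())
    db.commit()

    # (3) Compare with the pre-stored data; new or changed records are kept.
    changed = sorted(k for k, v in crawled.items() if baseline.get(k) != v)

    # (4) Send the processed data to the external receiving end.
    send(json.dumps({k: crawled[k] for k in changed}))
    return changed


outbox: List[str] = []
conn = sqlite3.connect(":memory:")
changed = acquire(
    fetch=lambda: {"a": "1", "b": "2"},
    db=conn,
    baseline={"a": "1", "b": "0"},
    send=outbox.append,
)
```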
Claims (7)
1. A big data acquisition interface based on crawler technology, comprising an interface body, characterized in that the interface body includes:
A crawling module, for crawling data information from a network server;
A data storage module, for storing the data information crawled by the crawling module;
A data processing module, for performing analysis and calculation on the data information crawled by the crawling module to obtain processed data;
A sending module, for sending the processed data produced by the data processing module to an external receiving end.
2. The big data acquisition interface based on crawler technology according to claim 1, characterized in that the data processing module includes a data analysis module and a comparison module, and the data analysis module compares the data information crawled by the crawling module with the data information pre-stored in the comparison module to obtain the processed data.
3. The big data acquisition interface based on crawler technology according to claim 1, characterized in that the interface body further includes an analog-to-digital converter for converting analog information crawled by the crawling module into digital information.
4. The big data acquisition interface based on crawler technology according to claim 1, characterized in that the interface body includes acquisition sub-interfaces for at least two file formats.
5. The big data acquisition interface based on crawler technology according to claim 1, characterized in that the interface body further includes a timed task module, and the crawling module crawls data information from the network server at the times set by the timing rule of the timed task module.
6. A big data acquisition method based on crawler technology, characterized by comprising the following steps:
(1) crawling data information from a network server by a crawling module;
(2) storing the data information crawled by the crawling module in a data storage module;
(3) comparing, by a data analysis module, the data information crawled by the crawling module with data information pre-stored in a comparison module to obtain processed data;
(4) sending the processed data produced by a data processing module to an external receiving end via a sending module.
7. The big data acquisition method based on crawler technology according to claim 6, characterized in that the crawling module crawls data information on a timed schedule.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910193666.6A CN109918558A (en) | 2019-03-14 | 2019-03-14 | Big data acquisition interface and acquisition method based on crawler technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910193666.6A CN109918558A (en) | 2019-03-14 | 2019-03-14 | Big data acquisition interface and acquisition method based on crawler technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109918558A (en) | 2019-06-21 |
Family
ID=66964834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910193666.6A Pending CN109918558A (en) | Big data acquisition interface and acquisition method based on crawler technology | 2019-03-14 | 2019-03-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109918558A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103617174A (en) * | 2013-11-04 | 2014-03-05 | 同济大学 | Distributed searching method based on cloud computing |
CN107071009A (en) * | 2017-03-28 | 2017-08-18 | 江苏飞搏软件股份有限公司 | Load-balanced distributed big data crawler system |
CN107273409A (en) * | 2017-05-03 | 2017-10-20 | 广州赫炎大数据科技有限公司 | Network data acquisition, storage and processing method and system |
CN107633025A (en) * | 2017-08-30 | 2018-01-26 | 苏州朗动网络科技有限公司 | Big data business processing system and method |
CN107870986A (en) * | 2017-10-13 | 2018-04-03 | 平安科技(深圳)有限公司 | User behavior analysis method, application server and computer-readable recording medium based on crawler data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190621 |