CN109918558A - A big data acquisition interface and acquisition method based on crawler technology - Google Patents


Info

Publication number
CN109918558A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910193666.6A
Other languages
Chinese (zh)
Inventor
马文
黄祖源
耿贞伟
赵晓平
苏文伟
张新阳
李辉
张雪坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Baoer Technology Co Ltd
Information Center of Yunnan Power Grid Co Ltd
Original Assignee
Yunnan Baoer Technology Co Ltd
Information Center of Yunnan Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-03-14
Filing date
2019-03-14
Publication date
2019-06-21
Application filed by Yunnan Baoer Technology Co Ltd and Information Center of Yunnan Power Grid Co Ltd


Abstract

The invention discloses a big data acquisition interface and acquisition method based on crawler technology. The method comprises the following steps: a crawling module crawls data information from a network server; the data information crawled by the crawling module is stored in a data storage module; a data analysis module compares the crawled data information with data information pre-stored in a comparison module to obtain processed data; and the processed data obtained by analysis in the data processing module are sent to an external receiving end via a sending module. The crawling module crawls data information periodically. Based on crawler technology, the invention acquires big data quickly and effectively, extracts and analyzes it to obtain processed data, and sends the processed data promptly to the receiving end, thereby improving the efficiency of big data collection and analysis.

Description

A big data acquisition interface and acquisition method based on crawler technology
Technical field
The invention belongs to the field of communication technology, and in particular relates to a big data acquisition interface and acquisition method based on crawler technology.
Background art
With the rapid development of networks, the Internet has become a carrier of massive amounts of information, including public opinion, social events, policy responses, industry information and talent-market information. This information forms the data foundation of big data public opinion analysis systems and macroeconomic analysis systems.
Big data refers to collections of data that cannot be captured, managed and processed with conventional software tools within a given time frame. It is a massive, fast-growing and diversified information asset that requires new processing models to provide stronger decision-making, insight discovery and process optimization capabilities. Analyzing big data can yield a great deal of intelligent, in-depth and valuable information, which is of great significance for enterprise development and social management. How to extract and use this information efficiently has become a major challenge.
Summary of the invention
To solve the above problems, the present invention provides a big data acquisition interface based on crawler technology, which uses web crawler technology to collect data information quickly and effectively.
The technical solution of the present invention is as follows: a big data acquisition interface based on crawler technology comprises an interface body, and the interface body comprises:
A crawling module, for crawling data information from a network server;
A data storage module, for storing the data information crawled by the crawling module;
A data processing module, for performing analysis and calculation on the data information crawled by the crawling module to obtain processed data;
A sending module, for sending the processed data obtained by the data processing module to an external receiving end.
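The patent does not describe how the crawling module traverses a site. As a minimal sketch, assuming a conventional breadth-first crawler that stays on the start URL's host, the module could look like the following. `crawl`, `LinkExtractor` and `max_pages` are hypothetical names, and `fetch` is injected so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 50) -> Dict[str, str]:
    """Breadth-first crawl restricted to the start URL's host (assumed policy).

    Returns a mapping of URL -> page text.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            link = urljoin(url, href).split("#", 1)[0]
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The `max_pages` cap and the same-host restriction are illustrative safeguards against unbounded crawling; a real deployment would also need politeness rules such as rate limits.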
Preferably, the data processing module comprises a data analysis module and a comparison module; the data analysis module compares the data information crawled by the crawling module with the data information pre-stored in the comparison module to obtain the processed data.
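The patent leaves the comparison rule unspecified. One possible interpretation, assumed here purely for illustration, is that crawled records and pre-stored reference records are keyed by an identifier, and the processed data classify each crawled key as new, changed or unchanged. `compare_with_reference` and `ComparisonResult` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical record type: the patent defines no schema, so records are
# assumed to be key -> content mappings.
Records = Dict[str, str]


@dataclass
class ComparisonResult:
    """Processed data produced by comparing crawled and pre-stored records."""
    new: List[str] = field(default_factory=list)        # keys absent from the reference
    changed: List[str] = field(default_factory=list)    # keys whose content differs
    unchanged: List[str] = field(default_factory=list)  # keys with identical content


def compare_with_reference(crawled: Records, reference: Records) -> ComparisonResult:
    """Compare crawled records against the reference held by the comparison module."""
    result = ComparisonResult()
    for key in sorted(crawled):
        if key not in reference:
            result.new.append(key)
        elif crawled[key] != reference[key]:
            result.changed.append(key)
        else:
            result.unchanged.append(key)
    return result
```

For example, `compare_with_reference({"a": "1", "b": "2", "c": "3"}, {"a": "1", "b": "x"})` classifies `c` as new, `b` as changed and `a` as unchanged.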
Preferably, the interface body further comprises an analog-to-digital converter for converting analog information crawled by the crawling module into digital information.
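A web crawler normally retrieves data that is already digital, so the analog-to-digital converter presumably applies to analog measurements reaching the interface; the patent does not say, and this is an assumption. Uniform n-bit quantization is the textbook form of that conversion. The sketch below assumes samples lie in a known range [v_min, v_max]; `quantize` is a hypothetical name.

```python
from typing import Iterable, List


def quantize(samples: Iterable[float], v_min: float, v_max: float, bits: int = 8) -> List[int]:
    """Map continuous sample values onto integer codes 0 .. 2**bits - 1.

    Hypothetical uniform quantizer; values outside [v_min, v_max] are clipped.
    """
    if v_max <= v_min:
        raise ValueError("v_max must be greater than v_min")
    levels = 2 ** bits - 1  # highest output code
    codes = []
    for v in samples:
        clipped = min(max(v, v_min), v_max)
        # Scale into [0, levels] and round to the nearest code.
        codes.append(round((clipped - v_min) / (v_max - v_min) * levels))
    return codes
```

With an 8-bit converter over 0 to 5 V, an input of 1.0 V maps to code 51, and out-of-range inputs saturate at codes 0 and 255.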
Preferably, the interface body comprises acquisition sub-interfaces for at least two file formats.
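The patent does not name the file formats. Assuming JSON and CSV as examples, the sub-interfaces could be a registry that maps a format name to a parser normalizing input into a list of records. `acquire`, `parse_json`, `parse_csv` and `SUB_INTERFACES` are hypothetical names.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Rows = List[dict]  # hypothetical normalized form: a list of flat records


def parse_json(text: str) -> Rows:
    """JSON sub-interface: expects a JSON array of objects."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    return data


def parse_csv(text: str) -> Rows:
    """CSV sub-interface: the first line is the header row."""
    return list(csv.DictReader(io.StringIO(text)))


# Registry mapping a format name to its acquisition sub-interface.
SUB_INTERFACES: Dict[str, Callable[[str], Rows]] = {
    "json": parse_json,
    "csv": parse_csv,
}


def acquire(text: str, fmt: str) -> Rows:
    """Dispatch raw text to the sub-interface registered for `fmt`."""
    try:
        parser = SUB_INTERFACES[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return parser(text)
```

Keeping the formats in a registry means a further sub-interface (for example XML) could be added by registering another parser, without changing `acquire`.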
Preferably, the interface body further comprises a timed task module, and the crawling module periodically crawls data information from the network server according to the timing rule of the timed task module.
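The timing rule of the timed task module is not specified. The following sketch assumes a fixed interval. It schedules each run against a fixed deadline rather than sleeping a fixed amount after every call, so a slow crawl does not push later runs back. `run_periodically` is a hypothetical name, and `sleep` and `clock` are injectable so the schedule can be tested with a fake clock.

```python
import time
from typing import Callable, List


def run_periodically(task: Callable[[], object], interval_s: float, runs: int,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> List[object]:
    """Call `task` `runs` times, starting the calls `interval_s` seconds apart.

    Hypothetical fixed-interval timing rule; returns each call's result.
    """
    results = []
    next_run = clock()
    for _ in range(runs):
        delay = next_run - clock()
        if delay > 0:  # skip sleeping if the previous run overran its slot
            sleep(delay)
        results.append(task())
        next_run += interval_s
    return results
```

A cron-style scheduler could fill the same role where the timing rule is calendar-based rather than a fixed interval.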
The present invention also provides a big data acquisition method based on crawler technology, comprising the following steps:
(1) crawling data information from a network server by a crawling module;
(2) storing the data information crawled by the crawling module in a data storage module;
(3) comparing, by a data analysis module, the data information crawled by the crawling module with data information pre-stored in a comparison module to obtain processed data;
(4) sending the processed data obtained by analysis in the data processing module to an external receiving end via a sending module.
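Steps (1) to (4) can be chained as one pipeline. The sketch below is hypothetical throughout: storage is an in-memory dict, pages are keyed by URL, and the comparison flags each page as new or changed relative to the pre-stored reference, which is an interpretation rather than the patent's stated rule. The default crawler uses the standard library's `urllib.request.urlopen`; tests inject a fake `fetch` instead.

```python
import urllib.request
from typing import Callable, Dict, List

Fetch = Callable[[str], str]


def http_fetch(url: str, timeout: float = 10.0) -> str:
    """Default crawling function: download one page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class AcquisitionPipeline:
    """Hypothetical sketch of the four steps: crawl, store, compare, send."""

    def __init__(self, reference: Dict[str, str], send: Callable[[dict], None],
                 fetch: Fetch = http_fetch) -> None:
        self.fetch = fetch                 # crawling module
        self.storage: Dict[str, str] = {}  # data storage module (in memory here)
        self.reference = reference         # data pre-stored in the comparison module
        self.send = send                   # sending module

    def run(self, urls: List[str]) -> dict:
        # Step (1): crawl each URL.
        crawled = {url: self.fetch(url) for url in urls}
        # Step (2): store the crawled data.
        self.storage.update(crawled)
        # Step (3): compare with the pre-stored data to obtain processed data.
        processed = {
            "new": sorted(u for u in crawled if u not in self.reference),
            "changed": sorted(u for u in crawled
                              if u in self.reference and crawled[u] != self.reference[u]),
        }
        # Step (4): send the processed data to the external receiving end.
        self.send(processed)
        return processed
```

Injecting `fetch` and `send` keeps the pipeline independent of any particular transport, so the same steps can be exercised offline.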
Preferably, the crawling module crawls data information periodically.
Compared with the prior art, the present invention has the following beneficial effects:
Based on crawler technology, the present invention acquires big data quickly and effectively, extracts and analyzes it to obtain processed data, and sends the processed data promptly to the receiving end, thereby improving the efficiency of big data collection and analysis.
Detailed description of the embodiments
The big data acquisition interface based on crawler technology of the present invention comprises an interface body, and the interface body comprises acquisition sub-interfaces for at least two file formats.
The interface body further comprises:
A crawling module, for crawling data information from a network server;
A data storage module, for storing the data information crawled by the crawling module;
A data processing module, for performing analysis and calculation on the data information crawled by the crawling module to obtain processed data;
A sending module, for sending the processed data obtained by the data processing module to an external receiving end.
The data processing module comprises a data analysis module and a comparison module; the data analysis module compares the data information crawled by the crawling module with the data information pre-stored in the comparison module to obtain the processed data.
The interface body further comprises an analog-to-digital converter for converting analog information crawled by the crawling module into digital information.
In the present invention, the interface body further comprises a timed task module, and the crawling module periodically crawls data information from the network server according to the timing rule of the timed task module.
When the above interface performs data acquisition, the specific method comprises the following steps:
(1) crawling data information from a network server by the crawling module, wherein the crawling module crawls data information periodically;
(2) storing the data information crawled by the crawling module in the data storage module;
(3) comparing, by the data analysis module, the data information crawled by the crawling module with the data information pre-stored in the comparison module to obtain processed data;
(4) sending the processed data obtained by analysis in the data processing module to an external receiving end via the sending module.

Claims (7)

1. A big data acquisition interface based on crawler technology, comprising an interface body, characterized in that the interface body comprises:
a crawling module, for crawling data information from a network server;
a data storage module, for storing the data information crawled by the crawling module;
a data processing module, for performing analysis and calculation on the data information crawled by the crawling module to obtain processed data;
a sending module, for sending the processed data obtained by the data processing module to an external receiving end.
2. The big data acquisition interface based on crawler technology according to claim 1, characterized in that the data processing module comprises a data analysis module and a comparison module, and the data analysis module compares the data information crawled by the crawling module with data information pre-stored in the comparison module to obtain the processed data.
3. The big data acquisition interface based on crawler technology according to claim 1, characterized in that the interface body further comprises an analog-to-digital converter for converting analog information crawled by the crawling module into digital information.
4. The big data acquisition interface based on crawler technology according to claim 1, characterized in that the interface body comprises acquisition sub-interfaces for at least two file formats.
5. The big data acquisition interface based on crawler technology according to claim 1, characterized in that the interface body further comprises a timed task module, and the crawling module periodically crawls data information from the network server according to the timing rule of the timed task module.
6. A big data acquisition method based on crawler technology, characterized by comprising the following steps:
(1) crawling data information from a network server by a crawling module;
(2) storing the data information crawled by the crawling module in a data storage module;
(3) comparing, by a data analysis module, the data information crawled by the crawling module with data information pre-stored in a comparison module to obtain processed data;
(4) sending the processed data obtained by analysis in the data processing module to an external receiving end via a sending module.
7. The big data acquisition method based on crawler technology according to claim 6, characterized in that the crawling module crawls data information periodically.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910193666.6A CN109918558A 2019-03-14 2019-03-14 A big data acquisition interface and acquisition method based on crawler technology

Publications (1)

Publication Number Publication Date
CN109918558A 2019-06-21

Family

ID=66964834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910193666.6A Pending CN109918558A A big data acquisition interface and acquisition method based on crawler technology 2019-03-14 2019-03-14

Country Status (1)

Country Link
CN CN109918558A

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103617174A (en) * 2013-11-04 2014-03-05 同济大学 Distributed searching method based on cloud computing
CN107071009A (en) * 2017-03-28 2017-08-18 江苏飞搏软件股份有限公司 Load-balanced distributed big data crawler system
CN107273409A (en) * 2017-05-03 2017-10-20 广州赫炎大数据科技有限公司 Network data acquisition, storage and processing method and system
CN107633025A (en) * 2017-08-30 2018-01-26 苏州朗动网络科技有限公司 Big data business processing system and method
CN107870986A (en) * 2017-10-13 2018-04-03 平安科技(深圳)有限公司 User behavior analysis method based on crawler data, application server and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN108616534B (en) Method and system for preventing DDoS (distributed denial of service) attack of Internet of things equipment based on block chain
CN111190939B (en) User portrait construction method and device
CN101697545B (en) Security incident correlation method and device as well as network server
CN106982150B (en) Hadoop-based mobile internet user behavior analysis method
CN104462213A (en) User behavior analysis method and system based on big data
CN101035111A (en) Intelligent protocol parsing method and device
CN109218223B (en) Robust network traffic classification method and system based on active learning
CN102542061B (en) Intelligent product classification method
CN106101015A Mobile Internet traffic class labeling method and system
TW200613969A (en) Method and system for distinguishing relevant network security threats using comparison of refined intrusion detection audits and intelligent security analysis
CN105447081A (en) Cloud platform-oriented government affair and public opinion monitoring method
CN104376063A (en) Multithreading web crawler method based on sort management and real-time information updating system
CN103218431A (en) System and method for identifying and automatically acquiring webpage information
CN109818949A Neural-network-based anti-crawler method
CN104022924A (en) Method for detecting HTTP (hyper text transfer protocol) communication content
CN112491917A (en) Unknown vulnerability identification method and device for Internet of things equipment
CN105631050A (en) Rule-configuration-based method and system for extracting URL (uniform resource locator) search keywords
CN108289093A (en) The construction method and structure system in App application condition codes library
CN112367273A (en) Knowledge distillation-based flow classification method and device for deep neural network model
Rong et al. Umvd-fsl: Unseen malware variants detection using few-shot learning
CN107506503A (en) A kind of intellectual property outward appearance infringement analysis and management system
CN110120957B (en) Safe disposal digital twin method and system based on intelligent scoring mechanism
Rizal et al. Investigation Internet of Things (IoT) Device using Integrated Digital Forensics Investigation Framework (IDFIF)
CN109344333A Internet big data analysis and extraction method and system
CN109918558A A big data acquisition interface and acquisition method based on crawler technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2019-06-21)