CN109241432A - Discrete data acquisition analysis system and method - Google Patents

Publication number
CN109241432A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811045808.6A
Other languages
Chinese (zh)
Inventor
杨率
付乐爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan Dongba Wen Information Technology Co Ltd
Original Assignee
Yunnan Dongba Wen Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-09-07
Filing date
2018-09-07
Publication date
2019-01-18
Application filed by Yunnan Dongba Wen Information Technology Co Ltd
Priority to CN201811045808.6A
Publication of CN109241432A

Abstract

The present invention discloses a discrete data acquisition and analysis system and method. The system includes: a data acquisition module for acquiring discrete data in real time and across multiple dimensions, the data including web page media text data, internet data captured by web crawlers, Hadoop data, server running log data, and data from other business systems that can be accessed; a data analysis module for parsing and cleaning the collected data; a data storage module for storing the cleaned data in an inverted index and establishing the mapping between each segmented term in the data and the documents in which it occurs; a data retrieval module for analyzing an input search term, extracting keywords, and fetching the target data corresponding to the search term from the data storage module; and a data visualization module for scoring and ranking the target data against the search term and presenting data correlations as a topological graph. The invention draws on many data acquisition channels and makes retrieval more efficient and more accurate.

Description

Discrete data acquisition analysis system and method
Technical field
The present invention relates to the technical field of big data capture, and more particularly to a discrete data acquisition and analysis system and method.
Background technique
In the later stage of the big data era, once large-scale data collection has taken place, the important question is how to clean PB-scale heterogeneous and discrete data of many sources and types and then mine the inherent and latent relationships in that data, even distinguishing popularity and relevance, and how to visualize the internal relationships of the data with various charts. At present, data is usually collected through few channels, so the collected data cannot be comprehensive, which makes search results imprecise and retrieval inefficient. In data mining, effective methods for deeply mining the inherent and latent relationships in discrete data are lacking, so data utilization is low.
Summary of the invention
In view of the problems and deficiencies of the prior art, the present invention provides a discrete data acquisition and analysis system and method.
The present invention solves the above technical problem through the following technical solutions:
The present invention provides a discrete data acquisition and analysis system, characterized in that it comprises a data acquisition module, a data analysis module, a data storage module, a data retrieval module and a data visualization module;
The data acquisition module is used to acquire discrete data in real time and across multiple dimensions, the data including web page media text data, internet data captured by web crawlers, Hadoop data, server running log data, and data from other business systems that can be accessed;
The data analysis module is used to parse and clean the collected data;
The data storage module is used to store the cleaned data in an inverted index and to establish the mapping between each segmented term in the data and the documents in which it occurs;
The data retrieval module is used to analyze the input search term, extract keywords, and fetch the target data corresponding to the search term from the data storage module;
The data visualization module is used to score and rank the target data against the search term and to present data correlations as a topological graph.
Preferably, the data retrieval module is used to provide a corresponding word segmenter for each language.
Preferably, the other business systems include traditional databases, and the traditional databases include Oracle, MySQL and SQL Server databases.
The present invention also provides a discrete data acquisition and analysis method, characterized in that it comprises the following steps:
S1, acquiring discrete data in real time and across multiple dimensions, the data including web page media text data, internet data captured by web crawlers, Hadoop data, server running log data, and data from other business systems that can be accessed;
S2, parsing and cleaning the collected data;
S3, storing the cleaned data in an inverted index, and establishing the mapping between each segmented term in the data and the documents in which it occurs;
S4, analyzing the input search term, extracting keywords, and fetching from the data storage module the data in the documents corresponding to the search term as the target data;
S5, scoring and ranking the target data against the search term, and presenting data correlations as a topological graph.
Preferably, in step S4, a corresponding word segmenter is provided for each language.
Preferably, the other business systems include traditional databases, and the traditional databases include Oracle, MySQL and SQL Server databases.
On the basis of common knowledge in the art, the above preferred conditions can be combined in any way to obtain the preferred embodiments of the present invention.
The positive effects of the present invention are as follows:
1. The invention acquires data through many channels, so the data is more diverse and more comprehensive, which provides a solid guarantee for big data acquisition and analysis within a given industry.
2. The invention provides near-complete recognition of language vocabulary by supplying a corresponding word segmenter for each language, which gives real-time search broader coverage and a higher-quality guarantee.
3. The invention mines and establishes a complete set of inherent relationships within large volumes of non-relational data, can display them graphically as a topological graph, and can connect to any mainstream graph presentation tool or plug-in for any form of graphical display.
4. The invention can, as needed, dump the data in the cluster to storage targets such as files, mail, logs, traditional databases and Hadoop.
Brief description of the drawings
Fig. 1 is a structural block diagram of the discrete data acquisition and analysis system of a preferred embodiment of the present invention;
Fig. 2 is a diagram of the relationship among the original documents, the terms and the inverted index structure in a preferred embodiment of the present invention;
Fig. 3 is a flow chart of the discrete data acquisition and analysis method of a preferred embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present invention, and should not be construed as limiting it. On the contrary, the embodiments of the invention include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
As shown in Fig. 1, this embodiment provides a discrete data acquisition and analysis system comprising a data acquisition module 1, a data analysis module 2, a data storage module 3, a data retrieval module 4 and a data visualization module 5.
The data acquisition module 1 is used to acquire discrete data in real time and across multiple dimensions. The data includes web page media text data, internet data captured by web crawlers, Hadoop data, server running log data, and data from other business systems that can be accessed (for example traditional databases such as Oracle, MySQL and SQL Server). Because data is acquired through many channels, it is more diverse and more comprehensive, which provides a solid guarantee for big data acquisition and analysis within a given industry.
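A minimal sketch of how several acquisition channels might be normalized into one record stream is given here. The patent does not describe an interface, so the `Record` type, the `collect` function and the stand-in channels are all hypothetical names chosen for illustration; real channels would wrap a crawler, a log reader or a database cursor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass(frozen=True)
class Record:
    """One collected item, tagged with the channel it came from (hypothetical type)."""
    source: str   # e.g. "web", "crawler", "hadoop", "server_log", "database"
    doc_id: str
    text: str


def collect(channels: Dict[str, Callable[[], Iterable[str]]]) -> Iterator[Record]:
    """Pull raw text from every registered channel and wrap it in Records."""
    for source, fetch in channels.items():
        for i, text in enumerate(fetch()):
            yield Record(source=source, doc_id=f"{source}-{i}", text=text)


# Stand-in channels that return fixed strings instead of touching real systems.
channels = {
    "server_log": lambda: ["2018-09-07 ERROR disk full", "2018-09-07 INFO restart"],
    "database": lambda: ["order 42 shipped"],
}
records = list(collect(channels))
```

Tagging each record with its source keeps the later stages independent of where the data came from.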
The data analysis module 2 is used to parse and clean the collected data, so as to filter out invalid data among the collected data.
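The patent does not say which cleaning rules are applied. As one possible illustration, the sketch below assumes that "invalid data" means markup remnants, empty items and exact duplicates; the `clean` function and its rules are assumptions, not the patented procedure.

```python
import html
import re
from typing import Iterable, List

_TAG = re.compile(r"<[^>]+>")   # crude HTML tag matcher, enough for a sketch
_WS = re.compile(r"\s+")


def clean(raw_items: Iterable[str]) -> List[str]:
    """Strip tags, decode entities, collapse whitespace, drop empties and duplicates."""
    seen = set()
    cleaned = []
    for raw in raw_items:
        text = html.unescape(_TAG.sub(" ", raw))
        text = _WS.sub(" ", text).strip()
        if not text or text in seen:
            continue  # treated as invalid: nothing left, or already collected
        seen.add(text)
        cleaned.append(text)
    return cleaned


raw = ["<p>Disk &amp; memory report</p>", "   ", "Disk & memory report", "<b>ok</b>"]
result = clean(raw)
```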
The data storage module 3 is used to store the cleaned data in an inverted index and to establish the mapping between each segmented term in the data and the documents in which it occurs. In an inverted index, the data is organized by term rather than by document.
An inverted index is a structure suited to fast full-text search. It consists of a list of all the distinct words that appear in the documents and, for each word, a list of the documents that contain it. The relationship among the original documents, the terms and the inverted index structure is shown in Fig. 2.
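The structure described above can be sketched in a few lines. This is an in-memory illustration only, assuming whitespace tokenization and lowercase normalization; the function name `build_inverted_index` and the sample documents are hypothetical, and the patent does not specify the storage engine it uses.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every distinct term to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


docs = {
    "d1": "disk error on server",
    "d2": "server restart complete",
}
index = build_inverted_index(docs)
```

Looking up a term is then a single dictionary access, which is what makes the structure term-oriented rather than document-oriented.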
The data retrieval module 4 is used to analyze the input search term, extract keywords, and fetch the target data corresponding to the search term from the data storage module 3, with a corresponding word segmenter provided for each language.
Providing a corresponding word segmenter for each language gives real-time search broader coverage and a higher-quality guarantee.
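One way to picture language-specific segmenters is a registry keyed by language code, as sketched below. The two segmenters, the stop-word list and the "match any keyword" lookup are all assumptions made for illustration; in particular, splitting Chinese text into single characters is only a crude stand-in for a real dictionary-based segmenter.

```python
import re
from typing import Callable, Dict, List, Set


def segment_english(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def segment_chinese_chars(text: str) -> List[str]:
    """Crude stand-in: one token per CJK character."""
    return re.findall(r"[\u4e00-\u9fff]", text)


SEGMENTERS: Dict[str, Callable[[str], List[str]]] = {
    "en": segment_english,
    "zh": segment_chinese_chars,
}
STOPWORDS = {"the", "a", "of", "on", "in"}  # hypothetical stop-word list


def search(query: str, lang: str, index: Dict[str, Set[str]]) -> Set[str]:
    """Segment the query, keep the keywords, and return ids of documents containing any of them."""
    keywords = [t for t in SEGMENTERS[lang](query) if t not in STOPWORDS]
    hits: Set[str] = set()
    for kw in keywords:
        hits |= index.get(kw, set())
    return hits


index = {"disk": {"d1"}, "server": {"d1", "d2"}, "磁": {"d3"}, "盘": {"d3"}}
english_hits = search("The disk of the server", "en", index)
chinese_hits = search("磁盘", "zh", index)
```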
The data visualization module 5 is used to score and rank the target data against the search term and to present data correlations as a topological graph. For example, this can support behavior analysis, anti-fraud, network security, drug discovery and personalized medicine, or the construction of personalized recommendations based on continuous real-time data.
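The patent names neither the scoring formula nor the way the graph is derived. The sketch below therefore makes two explicit assumptions: documents are scored with a TF-IDF-style weight, and two documents are linked in the graph when they share at least one term, with the number of shared terms as the edge weight. The function names are hypothetical, and drawing the graph is left to whatever charting tool is attached.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def score_and_rank(keywords: List[str], docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Rank documents by a TF-IDF-style score for the given keywords (assumed formula)."""
    n_docs = len(docs)
    tokens = {d: text.lower().split() for d, text in docs.items()}
    scores = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        score = 0.0
        for kw in keywords:
            df = sum(1 for t in tokens.values() if kw in t)  # documents containing kw
            if df:
                score += tf[kw] * math.log(1 + n_docs / df)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def correlation_edges(docs: Dict[str, str]) -> List[Tuple[str, str, int]]:
    """Edges (doc_a, doc_b, shared_term_count) for every pair that shares a term."""
    term_sets = {d: set(text.lower().split()) for d, text in docs.items()}
    edges = []
    for a, b in combinations(sorted(term_sets), 2):
        shared = len(term_sets[a] & term_sets[b])
        if shared:
            edges.append((a, b, shared))
    return edges


docs = {
    "d1": "disk error disk full",
    "d2": "disk replaced",
    "d3": "network restart",
}
ranking = score_and_rank(["disk"], docs)
edges = correlation_edges(docs)
```

The edge list is a plain data structure, so it can be handed to any graph presentation tool without changing the scoring step.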
As shown in Fig. 3, this embodiment also provides a discrete data acquisition and analysis method comprising the following steps:
Step 101: acquire discrete data in real time and across multiple dimensions, the data including web page media text data, internet data captured by web crawlers, Hadoop data, server running log data, and data stored in traditional databases;
Step 102: parse and clean the collected data;
Step 103: store the cleaned data in an inverted index, and establish the mapping between each segmented term in the data and the documents in which it occurs;
Step 104: analyze the input search term, extract keywords, and fetch from the data storage module the data in the documents corresponding to the search term as the target data;
Step 105: score and rank the target data against the search term, and present data correlations as a topological graph.
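To show how steps 101 to 105 chain together, the following compact sketch runs them in order on in-memory data. It is an illustration under simplifying assumptions rather than the patented implementation: sources are plain lists of strings, cleaning is whitespace normalization, the score is a raw keyword count, and the graph rendering of step 105 is omitted. The name `run_pipeline` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def run_pipeline(sources: Dict[str, List[str]], query: str) -> List[str]:
    # Step 101: collect items from every channel and give each one an id.
    raw = [(f"{name}-{i}", text)
           for name, items in sources.items()
           for i, text in enumerate(items)]

    # Step 102: clean by normalizing whitespace and case, dropping empty items.
    cleaned = {doc_id: " ".join(text.split()).lower()
               for doc_id, text in raw if text.strip()}

    # Step 103: build the inverted index (term -> ids of documents containing it).
    index = defaultdict(set)
    for doc_id, text in cleaned.items():
        for term in text.split():
            index[term].add(doc_id)

    # Step 104: split the query into keywords and fetch the matching documents.
    keywords = query.lower().split()
    targets = set()
    for kw in keywords:
        targets |= index.get(kw, set())

    # Step 105: score by keyword occurrences and rank (ties broken by id).
    def score(doc_id: str) -> int:
        words = cleaned[doc_id].split()
        return sum(words.count(kw) for kw in keywords)

    return sorted(targets, key=lambda d: (-score(d), d))


sources = {
    "log": ["Disk error  disk FULL", "  "],
    "db": ["disk replaced", "network ok"],
}
ranked = run_pipeline(sources, "disk")
```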
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (6)

1. A discrete data acquisition and analysis system, characterized in that it comprises a data acquisition module, a data analysis module, a data storage module, a data retrieval module and a data visualization module;
The data acquisition module is used to acquire discrete data in real time and across multiple dimensions, the data including web page media text data, internet data captured by web crawlers, Hadoop data, server running log data, and data from other business systems that can be accessed;
The data analysis module is used to parse and clean the collected data;
The data storage module is used to store the cleaned data in an inverted index and to establish the mapping between each segmented term in the data and the documents in which it occurs;
The data retrieval module is used to analyze the input search term, extract keywords, and fetch the target data corresponding to the search term from the data storage module;
The data visualization module is used to score and rank the target data against the search term and to present data correlations as a topological graph.
2. The discrete data acquisition and analysis system according to claim 1, characterized in that the data retrieval module is used to provide a corresponding word segmenter for each language.
3. The discrete data acquisition and analysis system according to claim 1, characterized in that the other business systems include traditional databases, and the traditional databases include Oracle, MySQL and SQL Server databases.
4. A discrete data acquisition and analysis method, characterized in that it comprises the following steps:
S1, acquiring discrete data in real time and across multiple dimensions, the data including web page media text data, internet data captured by web crawlers, Hadoop data, server running log data, and data from other business systems that can be accessed;
S2, parsing and cleaning the collected data;
S3, storing the cleaned data in an inverted index, and establishing the mapping between each segmented term in the data and the documents in which it occurs;
S4, analyzing the input search term, extracting keywords, and fetching from the data storage module the data in the documents corresponding to the search term as the target data;
S5, scoring and ranking the target data against the search term, and presenting data correlations as a topological graph.
5. The discrete data acquisition and analysis method according to claim 4, characterized in that, in step S4, a corresponding word segmenter is provided for each language.
6. The discrete data acquisition and analysis method according to claim 4, characterized in that the other business systems include traditional databases, and the traditional databases include Oracle, MySQL and SQL Server databases.
Application CN201811045808.6A, priority date 2018-09-07, filing date 2018-09-07: Discrete data acquisition analysis system and method. Status: Pending. Published as CN109241432A (en).

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190118