CN104484409A - Data mining method for big data processing - Google Patents

Data mining method for big data processing

Info

Publication number
CN104484409A
Authority
CN
China
Prior art keywords
data
correlation
data processing
amalgamation process
carries out
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410783092.5A
Other languages
Chinese (zh)
Inventor
赵迪
高辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhu Leruisi Information Consulting Co Ltd
Original Assignee
Wuhu Leruisi Information Consulting Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhu Leruisi Information Consulting Co Ltd
Priority to CN201410783092.5A
Publication of CN104484409A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of Internet technology, and in particular to a data mining method for big data processing that yields comprehensive mining results and a high data processing speed. The method comprises the following steps: acquiring a user's retrieval demand; processing the retrieval demand into consistent data; performing correlation matching between the consistent data and pre-stored purchase words to obtain at least one correlation data source between the pre-stored purchase words and the retrieval demand data; building a network topology graph; and mining a database based on the network topology graph. Compared with the prior art, the method abandons simple word matching and semantic mining and instead starts from a correlation topology network to mine the data to be analyzed for latent key information, so the results obtained are more comprehensive and more accurate.

Description

Data mining method for big data processing
Technical field:
The present invention relates to the field of Internet technology, and specifically to a data mining method for big data processing that yields comprehensive mining results and a high data processing speed.
Background art:
Big data, also called massive data, refers to data sets so large that current mainstream software tools cannot capture, manage, process, and organize them within a reasonable time into information that helps enterprises make better business decisions. The strategic significance of big data technology lies not in holding huge amounts of data but in processing this meaningful data in a specialized way. In other words, if big data is compared to an industry, the key to that industry's profitability is improving the "processing capability" applied to the data, so that value is added through processing.
Data mining is the process of extracting implicit, previously unknown, but potentially useful information from large volumes of incomplete, noisy, fuzzy, and random data. Data mining is clearly the key to big data technology. Common data mining methods currently fall roughly into the following kinds. The first builds a web page classification system in a semi-automated way, introduces attributes such as data category, query word category, or purchase category, and applies relevance feedback in combination with web search results to obtain the desired information. The second is based on literal character matching. The third is semantics-based and analyzes a latent semantic correlation model to obtain the retrieved data. These analysis systems tend to miss latent features in the information business, so the data analysis results are not comprehensive.
Summary of the invention:
In view of the shortcomings and deficiencies of the prior art, the present invention proposes a data mining method for big data processing that yields comprehensive mining results and a high data processing speed.
The present invention is achieved by the following measures:
A data mining method for big data processing, characterized by comprising the following steps:
Step 1: acquire the user's retrieval demand; the user enters search terms or speaks, and the demand is learned from that input;
Step 2: preprocess the data acquired in step 1, convert it into consistent data, and store it in memory for the next step;
Step 3: retrieve the preprocessed retrieval demand data from memory, perform correlation matching between it and pre-stored purchase words, and obtain at least one correlation data source between the pre-stored purchase words and the retrieval demand data;
Step 4: build a network topology graph from the correlation data obtained in step 3, and mine the database based on the network topology graph;
Step 5: output the mining results.
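The five steps above can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify how correlation is measured or how the graph is mined, so the sketch assumes Jaccard similarity over character bigrams as the correlation measure and a breadth-first traversal as the mining step. All function names, the threshold, and the sample data are hypothetical.

```python
from collections import deque

def normalize(query):
    """Step 2 (assumed form): lower-case and collapse whitespace into consistent data."""
    return " ".join(query.lower().split())

def bigrams(text):
    """Set of character bigrams, used here as a crude similarity feature."""
    return {text[i:i + 2] for i in range(len(text) - 1)}

def correlation(a, b):
    """Assumed correlation measure: Jaccard similarity of character bigrams."""
    x, y = bigrams(a), bigrams(b)
    return len(x & y) / len(x | y) if x | y else 0.0

def build_graph(query, purchase_words, threshold=0.2):
    """Steps 3-4: connect every pair of nodes whose correlation reaches the threshold."""
    nodes = [query] + list(purchase_words)
    graph = {node: set() for node in nodes}
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if correlation(a, b) >= threshold:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def mine(graph, start, database):
    """Step 4: breadth-first walk from the query, collecting records of reachable words."""
    seen, queue, results = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        results.extend(database.get(node, []))
        for neighbor in sorted(graph[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return results

# Hypothetical pre-stored purchase words and database records.
purchase_words = ["running shoes", "trail running", "trail map", "coffee maker"]
database = {
    "running shoes": ["record-1"],
    "trail running": ["record-2"],
    "trail map": ["record-3"],
    "coffee maker": ["record-4"],
}

query = normalize("  Running   SHOE ")          # steps 1-2
graph = build_graph(query, purchase_words)      # steps 3-4
print(mine(graph, query, database))             # step 5
```

In this toy data, "trail map" shares no bigram with the query and would be missed by direct matching, yet it is reached through "trail running". That is the kind of indirect hit a correlation topology is meant to surface.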
In step 2 of the present invention, the preprocessing of the acquired data may use a hash function model to convert high-dimensional data into binary data, which is convenient to store and to analyze further.
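One common way to hash high-dimensional vectors into binary codes is sign random projection, a form of locality-sensitive hashing. The text does not name a specific hash function model, so the choice below is an assumption made for illustration, and the function names, dimensions, and seed are hypothetical.

```python
import random

def make_hyperplanes(dim, bits, seed=0):
    """Draw one random Gaussian hyperplane per output bit (seeded for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def binary_code(vector, hyperplanes):
    """Emit 1 when the vector lies on the non-negative side of a hyperplane, else 0."""
    return [
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    ]

def hamming(a, b):
    """Number of differing bits between two codes of equal length."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=8, bits=16, seed=1)
vector = [0.5, -1.0, 2.0, 0.0, 3.5, -0.25, 1.0, 4.0]
code = binary_code(vector, planes)
print(code)
```

Vectors that point in similar directions tend to receive codes with a small Hamming distance, so the compact codes can stand in for the original vectors during storage and coarse comparison.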
Step 3 of the present invention further comprises fusing the at least one correlation data source, the fusion using a weighted average method.
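A weighted average fusion can be sketched as fused = sum(w_i * s_i) / sum(w_i), where s_i is the correlation score reported by source i and w_i is the trust placed in that source. The text does not say how weights are chosen, so the scores and weights below are hypothetical.

```python
def weighted_mean_fusion(scores, weights):
    """Fuse per-source correlation scores into one score by weighted average."""
    if not scores or len(scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total

# Three hypothetical sources; the first is trusted three times as much as the last.
fused = weighted_mean_fusion([0.8, 0.6, 0.4], [3, 2, 1])
print(round(fused, 4))
```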
Step 3 of the present invention further comprises fusing the at least one correlation data source, the fusion using a Kalman filtering method.
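A minimal sketch of Kalman-style fusion follows, assuming the simplest case: a single scalar correlation value that does not change over time, observed by several sources with known noise variances. The text gives no state or noise model, so this static model, the function name, and the numbers are assumptions.

```python
def kalman_fuse(measurements, estimate=0.0, variance=1e6):
    """Sequentially update a scalar estimate from (value, noise_variance) pairs.

    The large default prior variance expresses that nothing is known beforehand.
    """
    for value, noise_variance in measurements:
        gain = variance / (variance + noise_variance)   # Kalman gain
        estimate = estimate + gain * (value - estimate)  # correct with the innovation
        variance = (1.0 - gain) * variance               # uncertainty shrinks
    return estimate, variance

# Two hypothetical sources reporting 0.8 and 0.6 with equal noise variance 0.04.
fused_estimate, fused_variance = kalman_fuse([(0.8, 0.04), (0.6, 0.04)])
print(round(fused_estimate, 3), round(fused_variance, 3))
```

With equally noisy sources the result sits near their midpoint, and the fused variance is smaller than either source's variance, which is the benefit of fusing.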
Step 3 of the present invention further comprises fusing the at least one correlation data source, the fusion using a statistical decision method.
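One possible reading of a statistical decision method is a Bayesian decision rule: each source contributes a likelihood ratio, and the purchase word is accepted as relevant when the posterior probability passes a threshold. The sketch assumes the sources are conditionally independent, which the text does not state; the function name and the probabilities are hypothetical.

```python
import math

def bayes_decision(likelihood_pairs, prior=0.5, threshold=0.5):
    """Combine per-source evidence and decide relevance.

    likelihood_pairs holds (P(evidence | relevant), P(evidence | not relevant))
    for each source. Returns (decision, posterior probability of relevance).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for p_relevant, p_not_relevant in likelihood_pairs:
        log_odds += math.log(p_relevant / p_not_relevant)
    posterior = 1.0 / (1.0 + math.exp(-log_odds))
    return posterior >= threshold, posterior

# Two hypothetical sources support relevance, one argues against it.
is_relevant, posterior = bayes_decision([(0.9, 0.3), (0.6, 0.5), (0.2, 0.4)])
print(is_relevant, round(posterior, 3))
```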
Step 3 of the present invention further comprises fusing the at least one correlation data source, the fusion using a neural network method.
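The text does not describe a network architecture, so the sketch below uses the smallest possible one, a single sigmoid neuron trained by stochastic gradient descent, to learn how to combine scores from two sources. The training samples, learning rate, and function names are hypothetical, and a practical system would likely use a larger network and real labeled data.

```python
import math

def sigmoid(z):
    """Logistic activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fuse(inputs, weights, bias):
    """Forward pass: weighted sum of source scores followed by the sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, epochs=2000, learning_rate=0.5):
    """Fit weights and bias with per-sample gradient descent on cross-entropy loss."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = fuse(inputs, weights, bias) - label  # gradient w.r.t. pre-activation
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical labeled pairs: (scores from two sources, 1 if truly relevant).
samples = [
    ((0.9, 0.8), 1), ((0.8, 0.9), 1), ((0.7, 0.6), 1),
    ((0.2, 0.3), 0), ((0.1, 0.4), 0), ((0.3, 0.1), 0),
]
weights, bias = train_neuron(samples)
print(fuse((0.85, 0.9), weights, bias) > 0.5, fuse((0.15, 0.2), weights, bias) > 0.5)
```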
Compared with the prior art, the present invention abandons simple text matching and semantic mining. It starts from a correlation topology network and mines the data to be analyzed for latent key information, and so has notable advantages such as more comprehensive and more accurate results.
Description of the drawing:
The drawing is a flow chart of the present invention.
Detailed description:
The present invention is further described below with reference to the drawing.
As shown in the drawing, the present invention proposes a data mining method for big data processing, characterized by comprising the following steps:
Step 1: acquire the user's retrieval demand; the user enters search terms or speaks, and the demand is learned from that input;
Step 2: preprocess the data acquired in step 1, convert it into consistent data, and store it in memory for the next step;
Step 3: retrieve the preprocessed retrieval demand data from memory, perform correlation matching between it and pre-stored purchase words, and obtain at least one correlation data source between the pre-stored purchase words and the retrieval demand data;
Step 4: build a network topology graph from the correlation data obtained in step 3, and mine the database based on the network topology graph;
Step 5: output the mining results.
In step 2 of the present invention, the preprocessing of the acquired data may use a hash function model to convert high-dimensional data into binary data, which is convenient to store and to analyze further.
Step 3 of the present invention further comprises fusing the at least one correlation data source, the fusion using a weighted average method.
Step 3 of the present invention further comprises fusing the at least one correlation data source, the fusion using a Kalman filtering method.
Step 3 of the present invention further comprises fusing the at least one correlation data source, the fusion using a statistical decision method.
Step 3 of the present invention further comprises fusing the at least one correlation data source, the fusion using a neural network method.
Compared with the prior art, the present invention abandons simple text matching and semantic mining. It starts from a correlation topology network and mines the data to be analyzed for latent key information, and so has notable advantages such as more comprehensive and more accurate results.

Claims (6)

1. A data mining method for big data processing, characterized by comprising the following steps:
Step 1: acquire the user's retrieval demand; the user enters search terms or speaks, and the demand is learned from that input;
Step 2: preprocess the data acquired in step 1, convert it into consistent data, and store it in memory for the next step;
Step 3: retrieve the preprocessed retrieval demand data from memory, perform correlation matching between it and pre-stored purchase words, and obtain at least one correlation data source between the pre-stored purchase words and the retrieval demand data;
Step 4: build a network topology graph from the correlation data obtained in step 3, and mine the database based on the network topology graph;
Step 5: output the mining results.
2. The data mining method for big data processing according to claim 1, characterized in that the preprocessing of the acquired data in step 2 uses a hash function model to convert high-dimensional data into binary data, which is convenient to store and to analyze further.
3. The data mining method for big data processing according to claim 1, characterized in that step 3 further comprises fusing the at least one correlation data source, the fusion using a weighted average method.
4. The data mining method for big data processing according to claim 1, characterized in that step 3 further comprises fusing the at least one correlation data source, the fusion using a Kalman filtering method.
5. The data mining method for big data processing according to claim 1, characterized in that step 3 further comprises fusing the at least one correlation data source, the fusion using a statistical decision method.
6. The data mining method for big data processing according to claim 1, characterized in that step 3 further comprises fusing the at least one correlation data source, the fusion using a neural network method.
CN201410783092.5A 2014-12-16 2014-12-16 Data mining method for big data processing Pending CN104484409A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410783092.5A CN104484409A (en) 2014-12-16 2014-12-16 Data mining method for big data processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410783092.5A CN104484409A (en) 2014-12-16 2014-12-16 Data mining method for big data processing

Publications (1)

Publication Number Publication Date
CN104484409A true CN104484409A (en) 2015-04-01

Family

ID=52758950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410783092.5A Pending CN104484409A (en) 2014-12-16 2014-12-16 Data mining method for big data processing

Country Status (1)

Country Link
CN (1) CN104484409A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550704A (en) * 2015-12-10 2016-05-04 南京邮电大学 Hybrid common factor analyzer-based distributed high-dimensional data classification method
CN105653707A (en) * 2015-12-30 2016-06-08 芜湖乐锐思信息咨询有限公司 Network information monitoring and analyzing system
CN106776719A (en) * 2016-11-21 2017-05-31 北海高创电子信息孵化器有限公司 A kind of on-line information consultant search method
WO2024060543A1 (en) * 2022-09-20 2024-03-28 河北网新科技集团股份有限公司 Real-time data processing method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110089A (en) * 2007-09-04 2008-01-23 华为技术有限公司 Method and system for data digging and model building
US20120041901A1 (en) * 2007-10-19 2012-02-16 Quantum Intelligence, Inc. System and Method for Knowledge Pattern Search from Networked Agents
CN103106340A (en) * 2013-01-21 2013-05-15 天津大学 Game level automatic generation system and method based on data mining and data fusion
CN103136337A (en) * 2013-02-01 2013-06-05 北京邮电大学 Distributed knowledge data mining device and mining method used for complex network
CN103699550A (en) * 2012-09-27 2014-04-02 腾讯科技(深圳)有限公司 Data mining system and data mining method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101110089A (en) * 2007-09-04 2008-01-23 华为技术有限公司 Method and system for data digging and model building
US20120041901A1 (en) * 2007-10-19 2012-02-16 Quantum Intelligence, Inc. System and Method for Knowledge Pattern Search from Networked Agents
CN103699550A (en) * 2012-09-27 2014-04-02 腾讯科技(深圳)有限公司 Data mining system and data mining method
CN103106340A (en) * 2013-01-21 2013-05-15 天津大学 Game level automatic generation system and method based on data mining and data fusion
CN103136337A (en) * 2013-02-01 2013-06-05 北京邮电大学 Distributed knowledge data mining device and mining method used for complex network

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550704A (en) * 2015-12-10 2016-05-04 南京邮电大学 Hybrid common factor analyzer-based distributed high-dimensional data classification method
CN105550704B (en) * 2015-12-10 2019-01-01 南京邮电大学 Distributed high dimensional data classification method based on mixing common factor analyzer
CN105653707A (en) * 2015-12-30 2016-06-08 芜湖乐锐思信息咨询有限公司 Network information monitoring and analyzing system
CN106776719A (en) * 2016-11-21 2017-05-31 北海高创电子信息孵化器有限公司 A kind of on-line information consultant search method
WO2024060543A1 (en) * 2022-09-20 2024-03-28 河北网新科技集团股份有限公司 Real-time data processing method and system

Similar Documents

Publication Publication Date Title
CN106250513B (en) Event modeling-based event personalized classification method and system
CN105389349B (en) Dictionary update method and device
CN102722709B (en) Method and device for identifying garbage pictures
CN104615687B (en) A kind of entity fine grit classification method and system towards knowledge base update
CN104199972A (en) Named entity relation extraction and construction method based on deep learning
CN106844640B (en) Webpage data analysis processing method
CN104484409A (en) Data mining method for big data processing
CN105447179A (en) Microblog social network based topic automated recommendation method and system
CN102542061B (en) Intelligent product classification method
RU2010141559A (en) RANKING SEARCH RESULTS USING THE EDITING DISTANCE AND DOCUMENT INFORMATION
CN111899089A (en) Enterprise risk early warning method and system based on knowledge graph
CN101141456A (en) Vertical search based network data excavation method
MY191557A (en) Management server and management method employing same
CN104281698A (en) Efficient big data query method
CN105183710A (en) Method for automatically generating document summary
CN113342989B (en) Knowledge graph construction method and device of patent data, storage medium and terminal
CN104268283A (en) Method for automatically analyzing Internet web page
CN106649329A (en) Safety production big data mining system
GB2603018A (en) Generation of digital well schematics
CN102982048A (en) Method and device for assessing junk information mining rule
CN103984700A (en) Heterogeneous data analysis method for vertical search of scientific information
CN104281710A (en) Network data excavation method
CN103902709A (en) Association analyzing method
CN111723297B (en) Dual-semantic similarity judging method for grid society situation research and judgment
CN112464648B (en) Industry standard blank feature recognition system and method based on multi-source data analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150401
