CN106959948A - System and method for preprocessing big data according to its distributed characteristics - Google Patents

Publication number
CN106959948A
CN106959948A CN201610010843.9A CN201610010843A CN106959948A CN 106959948 A CN106959948 A CN 106959948A CN 201610010843 A CN201610010843 A CN 201610010843A CN 106959948 A CN106959948 A CN 106959948A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201610010843.9A
Other languages
Chinese (zh)
Inventor
顾青
梁佐泉
谢超
梁艳敏
王宁宁
冯四风
赵艳红
田文晋
王亚红
黄奚芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Waterhouse Integrity Information Technology Co Ltd
Original Assignee
Waterhouse Integrity Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Waterhouse Integrity Information Technology Co Ltd
Priority to CN201610010843.9A
Publication of CN106959948A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Abstract

The invention discloses a system for preprocessing big data according to its distributed characteristics, comprising: a preprocessing adapter, which provides the entry point for raw-data preprocessing and is divided into an automated preprocessing adapter and a semi-automated preprocessing adapter; a data processing module, which partitions the data sent by the preprocessing adapter into data blocks according to specified rules and a unified standard data format and distributes the resulting data blocks to different storage nodes, with mutually associated data placed in the same data block and no association between data blocks; and a distributed storage module, which provides multiple storage nodes for storing the data blocks sent by the data processing module. The invention also provides a method for preprocessing big data according to its distributed characteristics. The invention can greatly improve the accuracy and efficiency of big data distributed computing and mining analysis.

Description

System and method for preprocessing big data according to its distributed characteristics
Technical Field
The present invention relates to the computer field, and in particular to a system that preprocesses big data according to the distributed characteristics of the big data. The invention further relates to a method that preprocesses big data according to the distributed characteristics of the big data.
Background
Big data technology is developing rapidly. Data technology has evolved from early processing of single-type data on a single machine to today's processing of multi-type data on computer clusters, enabling data-analysis applications with relaxed time requirements. As data volumes grow to the PB and EB scale and beyond, and faster processing and analysis times are demanded, the development trend of big data technology lies in general-purpose technologies such as dedicated big-data computers, geographically distributed computer clusters, multi-type multi-source data processing and analysis, data networks, analysis of complex data types, and second-level time-series analysis, together with application technologies for various domains. General-purpose big data technologies and open-source projects represented by HDFS, GFS, MapReduce, Hadoop, Spark, Storm, HBase, and MongoDB are developing rapidly. Big data preprocessing is an indispensable step in big data processing, and these big data processing technologies all introduce the concepts of distributed computing and distributed mining analysis.
Big data comes from complex sources and has diverse structures. The collected data must be preprocessed with big data preprocessing techniques so that the information is brought into a unified standard data format that supports subsequent computation and mining analysis. To support distributed computing and mining analysis effectively, big data must be preprocessed according to its distributed characteristics, ensuring that related data reside on the same node and that no data or computational interaction is needed between nodes.
The distributed characteristics of data comprise the distributed computing algorithm, the distributed mining-analysis algorithm, and the mathematical model corresponding to the distributed mining-analysis algorithm.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a system that preprocesses big data using the distributed characteristics of the data (the distributed computing algorithm, the distributed mining-analysis algorithm, and the mathematical model corresponding to the distributed mining-analysis algorithm), so that big data are quickly and effectively processed into a unified standard data format, associated data are placed on the same node, all data are distributed across different nodes in an orderly manner with no association between the data on different nodes, interactive computation between nodes is avoided, and big data distributed computing and mining analysis are effectively supported. The present invention also provides a method for preprocessing big data using the distributed characteristics of the data.
To solve the above technical problem, the system for preprocessing big data according to its distributed characteristics provided by the present invention comprises: a preprocessing adapter, a data processing module, and a distributed storage module;
The preprocessing adapter provides the entry point for raw-data preprocessing and converts raw data into data in a target format; it is divided into an automated preprocessing adapter and a semi-automated preprocessing adapter;
The automated preprocessing adapter: a different automated adapter is provided for each data source format, and converts the raw data into data in the target format;
The semi-automated preprocessing adapter converts raw data into data in the target format, or into data that meet the format requirements of the automated preprocessing adapter, either through secondary development against an open standard preprocessing interface or by adding a corresponding configuration file according to the standard of the automated preprocessing adapter;
The data processing module partitions the data sent by the preprocessing adapter into data blocks according to the specified rules and the unified standard data format, and distributes the resulting data blocks to different storage nodes; data that satisfy a preset association-computation rule are placed in the same data block, and no association exists between data blocks;
The distributed storage module provides multiple storage nodes for storing the data blocks sent by the data processing module.
Wherein the specified rules by which the data processing module partitions data blocks are: the distributed computing algorithm of the data, the distributed mining-analysis algorithm, and the mathematical model corresponding to the distributed mining-analysis algorithm;
Wherein, when data blocks are partitioned for the distributed computing algorithm, this is done as follows:
Data aggregation: data are integrated into data blocks through sorting, classification and summarization, and grouping operations;
Data reorganization: the corresponding data items are extracted according to specific rules and recombined into new data blocks;
Data association: by setting data-association rules, data whose data items satisfy the association rules are placed in one data block;
Data splitting: on the basis of the data aggregation, data association, and data reorganization operations, where a distributed computation requires calculations between different data blocks, or between data on different machines, according to the configured computation model, the data are split by the specified rules according to business requirements, so that the data are distributed across different nodes in an orderly manner;
Computation model: the mathematical formula abstracted from the business requirements;
The data items on which data partitioning can be based in the specified rules include data category, data size, or computed data.
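The aggregation and splitting operations listed above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `aggregate` and `split_across_nodes`, the sample records, the choice of `category` as the partitioning data item, and the round-robin placement are all assumptions made for illustration.

```python
from collections import defaultdict

def aggregate(records, key_field):
    """Data aggregation: sort the records and group them into blocks by one data item."""
    blocks = defaultdict(list)
    for rec in sorted(records, key=lambda r: r[key_field]):
        blocks[rec[key_field]].append(rec)
    return dict(blocks)

def split_across_nodes(blocks, num_nodes):
    """Data splitting: place whole blocks on nodes (round-robin is an assumed rule).

    A block is never divided, so records that belong together stay on one node.
    """
    nodes = [[] for _ in range(num_nodes)]
    for i, key in enumerate(sorted(blocks)):
        nodes[i % num_nodes].append((key, blocks[key]))
    return nodes

# Hypothetical sample data; "category" plays the role of the partitioning data item.
records = [
    {"id": 1, "category": "b", "amount": 5},
    {"id": 2, "category": "a", "amount": 7},
    {"id": 3, "category": "b", "amount": 1},
    {"id": 4, "category": "c", "amount": 2},
]
blocks = aggregate(records, "category")
nodes = split_across_nodes(blocks, 2)
```

Here the two records of category `b` end up in one block on one node, so any calculation restricted to that category needs no data from the other node.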
Wherein, when data are partitioned for the distributed mining-analysis algorithm, this is done as follows:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data, corresponding calculation formulas are set according to the business-analysis objective, and new data items are produced by calculations over the existing data;
Mining-analysis algorithm data format conversion: the raw data are converted into the data format required by the mining-analysis algorithm.
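As a hedged illustration of the data processing and format conversion operations just described, the sketch below derives a new data item from existing ones and then flattens a record into a numeric list. The field names (`revenue`, `cost`, `margin`), the formula, and the idea that the mining-analysis algorithm expects a flat list of floats are assumptions, not details taken from the patent.

```python
def add_derived_item(record, name, formula):
    """Data processing: return a copy of the record with a new item computed from existing ones."""
    out = dict(record)
    out[name] = formula(record)
    return out

def to_feature_vector(record, fields):
    """Format conversion: flatten the chosen items into a list of floats (assumed target format)."""
    return [float(record[f]) for f in fields]

# Hypothetical record and formula chosen for a hypothetical business-analysis objective.
raw = {"id": 7, "revenue": 120.0, "cost": 80.0}
enriched = add_derived_item(
    raw, "margin", lambda r: (r["revenue"] - r["cost"]) / r["revenue"]
)
vector = to_feature_vector(enriched, ["revenue", "cost", "margin"])
```

The raw record is left untouched, so the original data remain available for other analysis objectives.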
Wherein, when data blocks are partitioned using the mathematical model corresponding to the distributed mining-analysis algorithm, this is done as follows:
Through data format conversion and data model extraction, the data items, data types, and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data according to the specified rules to build the mathematical model.
Wherein the configuration items of the automated preprocessing adapter can be mapped one-to-one to the field names or data items of the stored data; the data items of the automated preprocessing adapter can be selected through a configuration page, or by setting or changing the parameter values of the configuration items.
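One way to picture the one-to-one mapping between configuration items and stored field names is a small lookup table. The sketch below is only an assumption about how such a configuration might look: `FIELD_MAP`, the source field names, and the `selected` parameter are hypothetical and do not come from the patent.

```python
# Hypothetical configuration: source field name -> data item name in the target format.
FIELD_MAP = {"cust_no": "customer_id", "amt": "amount", "ts": "timestamp"}

def apply_mapping(row, field_map, selected=None):
    """Rename mapped fields, drop unmapped ones, and optionally keep only selected items."""
    mapped = {field_map[k]: v for k, v in row.items() if k in field_map}
    if selected is not None:
        mapped = {k: v for k, v in mapped.items() if k in selected}
    return mapped

row = {"cust_no": "C001", "amt": 42, "ts": "2016-01-08", "internal_flag": 1}
all_items = apply_mapping(row, FIELD_MAP)
chosen = apply_mapping(row, FIELD_MAP, selected={"customer_id", "amount"})
```

Changing which data items reach the data processing module then only requires editing the table or the selection, not the adapter code.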
The method for preprocessing big data according to its distributed characteristics provided by the present invention comprises:
Step 1: convert the raw data into data in the target format according to the different data source formats; according to the data-analysis objective, set one of the data items as the primary key for the computation; based on the existing data items, combine the data corresponding to the primary keys of any two records among all the data to obtain associated data pairs;
Step 2: based on the primary keys in each associated data pair, set the corresponding computation model, obtain the data items required for the association computation as the value, and convert each pair into a <key, value> key-value pair;
Step 3: partition the key-value pairs into data blocks according to the specified rules to obtain different data blocks, and perform parallel computation on the resulting data blocks to obtain new data blocks;
Step 4: distribute the new data blocks to different nodes, such that no association exists between the resulting data blocks.
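The four steps can be sketched in a few lines of Python. This is an illustrative reading of the method, not the patented implementation: the function names, the sample records, the round-robin partitioning rule, and the absolute-difference computation model are all assumptions, and the per-block loop stands in for what would run in parallel on separate nodes.

```python
from itertools import combinations

def build_pairs(records, key_field):
    """Step 1: combine every two records into an associated data pair."""
    ordered = sorted(records, key=lambda r: r[key_field])
    return list(combinations(ordered, 2))

def to_key_value(pair, key_field, value_field):
    """Step 2: key = both primary keys, value = the items the computation model needs."""
    a, b = pair
    return (a[key_field], b[key_field]), (a[value_field], b[value_field])

def partition(kv_pairs, num_blocks):
    """Step 3 (partitioning): deal key-value pairs into blocks; round-robin is an assumed rule."""
    blocks = [[] for _ in range(num_blocks)]
    for i, kv in enumerate(kv_pairs):
        blocks[i % num_blocks].append(kv)
    return blocks

def compute_block(block, model):
    """Step 3 (computation): each pair carries its own inputs, so no other block is consulted."""
    return [(key, model(*value)) for key, value in block]

# Hypothetical records; "id" is the primary key and "x" the item used in the calculation.
records = [{"id": i, "x": x} for i, x in [(1, 10), (2, 13), (3, 7), (4, 10)]]
kv = [to_key_value(p, "id", "x") for p in build_pairs(records, "id")]
blocks = partition(kv, 2)
# Step 4: each result block could now be placed on its own node.
results = [compute_block(b, lambda u, v: abs(u - v)) for b in blocks]
```

Because every key-value pair already contains both operands, the block-level computation needs no lookup into another block, which is the property the method relies on.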
Wherein, when performing step 3, the specified rules for partitioning data blocks are: the distributed computing algorithm of the data, the distributed mining-analysis algorithm, and the mathematical model corresponding to the distributed mining-analysis algorithm.
Wherein, when data blocks are partitioned using the distributed computing algorithm, this is done as follows:
Data aggregation: data are integrated into data blocks through sorting, classification and summarization, and grouping operations;
Data reorganization: the corresponding data items are extracted according to specific rules and recombined into new data blocks;
Data association: by setting data-association rules, data whose data items satisfy the association rules are placed in one data block;
Data splitting: on the basis of the data aggregation, data association, and data reorganization operations, where a distributed computation requires calculations between different data blocks, or between data on different machines, according to the configured computation model, the data are split by the specified rules according to business requirements, so that the data are distributed across different nodes in an orderly manner.
Computation model: the mathematical formula abstracted from the business requirements;
The data items on which data partitioning can be based in the specified rules include data category, data size, or computed data.
Wherein, when data blocks are partitioned using the distributed mining-analysis algorithm, this is done as follows:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data, corresponding calculation formulas are set according to the business-analysis objective, and new data items are produced by calculations over the existing data;
Mining-analysis algorithm data format conversion: the raw data are converted into the data format required by the mining-analysis algorithm.
Wherein, when data blocks are partitioned using the mathematical model corresponding to the distributed mining-analysis algorithm, this is done as follows:
Through data format conversion and data model extraction, the data items, data types, and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data according to the specified rules to build the mathematical model.
Wherein, when performing step 1, the configuration items of the target-format data can be mapped one-to-one to the field names or data items of the stored data, and the data items of the target-format data can be chosen by selection, or by setting or changing the parameter values of the configuration items.
The working principle of the present invention is illustrated below, taking the calculation of relation values between data as an example.
Assume there are N records in total, and that the unique identifier of each record is set as the primary key. Obtaining the association value between any two records, by a calculation over the same data item of the two records, requires N*(N-1)/2 calculations in total.
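The count follows from choosing 2 records out of N without regard to order. The worked value for N = 1000 is only an illustrative figure, not one given in the patent.

```latex
\binom{N}{2} \;=\; \frac{N!}{2!\,(N-2)!} \;=\; \frac{N(N-1)}{2},
\qquad \text{e.g. } N = 1000 \;\Rightarrow\; \frac{1000 \cdot 999}{2} = 499\,500 .
```

The number of calculations therefore grows quadratically with the number of records, which is why their placement across nodes matters.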
Fig. 1 shows the computation structure of data processed by a traditional preprocessing method.
The traditional data preprocessing method divides the data evenly across m nodes according to data volume. Because every pair of records must be computed to obtain its association value, the calculations for the data in data block 1 of node 1 in Fig. 1 show that three types of data calculation coexist: c1, calculations between any two records in the same data block; c2, calculations between different data blocks on the same machine; c3, calculations between different data blocks on different machines.
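To get a feel for how the three calculation types are distributed under an even split, the following sketch counts them for a small case. The even dealing of records by index, the function name `classify_pairs`, and the sizes used are assumptions for illustration; Fig. 1 itself is not reproduced here.

```python
from itertools import combinations

def classify_pairs(n_records, n_nodes, blocks_per_node):
    """Count c1/c2/c3 pairs when records are dealt out evenly to blocks by index."""
    n_blocks = n_nodes * blocks_per_node
    counts = {"c1": 0, "c2": 0, "c3": 0}
    for i, j in combinations(range(n_records), 2):
        block_i, block_j = i % n_blocks, j % n_blocks
        node_i = block_i // blocks_per_node
        node_j = block_j // blocks_per_node
        if block_i == block_j:
            counts["c1"] += 1   # same data block
        elif node_i == node_j:
            counts["c2"] += 1   # different blocks, same machine
        else:
            counts["c3"] += 1   # different blocks, different machines
    return counts

counts = classify_pairs(n_records=16, n_nodes=2, blocks_per_node=2)
```

In this small case more than half of the 120 pairwise calculations fall into c3, the type that requires communication between machines.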
When calculations are run on data processed by this preprocessing method, frequent interaction is required between different records, between different data blocks, and between different nodes, all of which makes the calculation time-consuming.
Fig. 2 shows the computation structure of data preprocessed by the present invention. In the preprocessed data, the data that must be computed together are stored as a single record, which avoids communication and interaction between different records, between different data blocks, and between different nodes, and greatly improves the efficiency of distributed computation on the data. Data preprocessed by the present invention are processed into the data format required for mining analysis according to the business-analysis objective.
The present invention can substantially improve the efficiency of big data distributed computing and mining analysis.
Brief Description of the Drawings
The present invention is described in further detail below with reference to the drawings and embodiments:
Fig. 1 is a schematic diagram of the computation structure of data processed by a traditional preprocessing method.
Fig. 2 is a schematic diagram of the computation structure of data preprocessed by the present invention.
Fig. 3 is a schematic structural diagram of the preprocessing system of the present invention.
Detailed Description
As shown in Fig. 3, the system for preprocessing big data according to its distributed characteristics provided by the present invention comprises: a preprocessing adapter, a data processing module, and a distributed storage module;
The preprocessing adapter provides the entry point for raw-data preprocessing and converts raw data into data in a target format; it is divided into an automated preprocessing adapter and a semi-automated preprocessing adapter;
The automated preprocessing adapter: a different automated adapter is provided for each data source format, and converts the raw data into data in the target format;
The semi-automated preprocessing adapter converts raw data into data in the target format, or into data that meet the format requirements of the automated preprocessing adapter, either through secondary development against an open standard preprocessing interface or by adding a corresponding configuration file according to the standard of the automated preprocessing adapter;
The data processing module partitions the data sent by the preprocessing adapter into data blocks according to the specified rules and the unified standard data format, and distributes the resulting data blocks to different storage nodes; data that satisfy a preset association-computation rule are placed in the same data block, and no association exists between data blocks;
The distributed storage module provides multiple storage nodes for storing the data blocks sent by the data processing module.
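The three components can be pictured with a compact Python sketch. It is a toy model under stated assumptions, not the system of Fig. 3: the adapter registry `ADAPTERS`, the CSV and JSON-lines source formats, the `DistributedStore` class, the byte-sum placement rule, and the `region` partitioning item are all hypothetical.

```python
import csv
import io
import json

def csv_adapter(text):
    """Automated adapter for a CSV source: one dict per row (the assumed target format)."""
    return list(csv.DictReader(io.StringIO(text)))

def json_lines_adapter(text):
    """Automated adapter for a JSON-lines source."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# One adapter per data source format, as the description suggests.
ADAPTERS = {"csv": csv_adapter, "jsonl": json_lines_adapter}

def process(records, rule_field):
    """Data processing module: records sharing the rule field go into the same block."""
    blocks = {}
    for rec in records:
        blocks.setdefault(str(rec[rule_field]), []).append(rec)
    return blocks

class DistributedStore:
    """Distributed storage module: several in-memory 'nodes' holding whole blocks."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, block_id, block):
        # Deterministic placement by byte sum (an assumed rule, chosen for repeatability).
        index = sum(block_id.encode("utf-8")) % len(self.nodes)
        self.nodes[index][block_id] = block
        return index

store = DistributedStore(3)
records = ADAPTERS["csv"]("id,region\n1,north\n2,south\n3,north\n")
records += ADAPTERS["jsonl"]('{"id": "4", "region": "south"}\n')
placement = {bid: store.put(bid, blk) for bid, blk in process(records, "region").items()}
```

Records from both sources that share a region land in the same block and therefore on the same node, regardless of which adapter produced them.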
Wherein the specified rules by which the data processing module partitions data blocks are: the distributed computing algorithm of the data, the distributed mining-analysis algorithm, and the mathematical model corresponding to the distributed mining-analysis algorithm;
Wherein, when data blocks are partitioned for the distributed computing algorithm, this is done as follows:
Data aggregation: data are integrated into data blocks through sorting, classification and summarization, and grouping operations;
Data reorganization: the corresponding data items are extracted according to specific rules and recombined into new data blocks;
Data association: by setting data-association rules, data whose data items satisfy the association rules are placed in one data block;
Data splitting: on the basis of the data aggregation, data association, and data reorganization operations, where a distributed computation requires calculations between different data blocks, or between data on different machines, according to the configured computation model, the data are split by the specified rules according to business requirements, so that the data are distributed across different nodes in an orderly manner.
Computation model: the mathematical formula abstracted from the business requirements.
The data items on which data partitioning can be based in the specified rules include data category, data size, or computed data.
Wherein, when data are partitioned for the distributed mining-analysis algorithm, this is done as follows:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data, corresponding calculation formulas are set according to the business-analysis objective, and new data items are produced by calculations over the existing data;
Mining-analysis algorithm data format conversion: the raw data are converted into the data format required by the mining-analysis algorithm.
Wherein, when data blocks are partitioned using the mathematical model corresponding to the distributed mining-analysis algorithm, this is done as follows:
Through data format conversion and data model extraction, the data items, data types, and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data according to the specified rules to build the mathematical model.
Wherein the configuration items of the automated preprocessing adapter can be mapped one-to-one to the field names or data items of the stored data; the data items of the automated preprocessing adapter can be selected through a configuration page, or by setting or changing the parameter values of the configuration items.
The present invention provides a method for preprocessing big data according to its distributed characteristics, comprising:
Step 1: convert the raw data into data in the target format according to the different data source formats; according to the data-analysis objective, set one of the data items as the primary key for the computation; based on the existing data items, combine the data corresponding to the primary keys of any two records among all the data to obtain associated data pairs;
Step 2: based on the primary keys in each associated data pair, set the corresponding computation model, obtain the data items required for the association computation as the value, and convert each pair into a <key, value> key-value pair;
Step 3: partition the key-value pairs into data blocks according to the specified rules to obtain different data blocks, and perform parallel computation on the resulting data blocks to obtain new data blocks;
Step 4: distribute the new data blocks to different nodes, such that no association exists between the resulting data blocks.
Wherein, when performing step 3, the specified rules for partitioning data blocks are: the distributed computing algorithm of the data, the distributed mining-analysis algorithm, and the mathematical model corresponding to the distributed mining-analysis algorithm.
Wherein, when data blocks are partitioned using the distributed computing algorithm, this is done as follows:
Data aggregation: data are integrated into data blocks through sorting, classification and summarization, and grouping operations;
Data reorganization: the corresponding data items are extracted according to specific rules and recombined into new data blocks;
Data association: by setting data-association rules, data whose data items satisfy the association rules are placed in one data block;
Data splitting: on the basis of the data aggregation, data association, and data reorganization operations, where a distributed computation requires calculations between different data blocks, or between data on different machines, according to the configured computation model, the data are split by the specified rules according to business requirements, so that the data are distributed across different nodes in an orderly manner.
Computation model: the mathematical formula abstracted from the business requirements.
The data items on which data partitioning can be based in the specified rules include data category, data size, or computed data.
Wherein, when data blocks are partitioned using the distributed mining-analysis algorithm, this is done as follows:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data, corresponding calculation formulas are set according to the business-analysis objective, and new data items are produced by calculations over the existing data;
Mining-analysis algorithm data format conversion: the raw data are converted into the data format required by the mining-analysis algorithm.
Wherein, when data blocks are partitioned using the mathematical model corresponding to the distributed mining-analysis algorithm, this is done as follows:
Through data format conversion and data model extraction, the data items, data types, and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data according to the specified rules to build the mathematical model.
Wherein, when performing step 1, the configuration items of the target-format data can be mapped one-to-one to the field names or data items of the stored data, and the data items of the target-format data can be chosen by selection, or by setting or changing the parameter values of the configuration items.
The present invention has been described in detail above through specific implementations and embodiments, but these do not limit the present invention. Without departing from the principles of the present invention, those skilled in the art may make many variations and improvements, which shall also be regarded as falling within the scope of protection of the present invention.

Claims (10)

1. A system for preprocessing big data according to its distributed characteristics, characterized by comprising: a preprocessing adapter, a data processing module, and a distributed storage module;
The preprocessing adapter provides the entry point for raw-data preprocessing and converts raw data into data in a target format; it is divided into an automated preprocessing adapter and a semi-automated preprocessing adapter;
The automated preprocessing adapter: a different automated adapter is provided for each data source format, and converts the raw data into data in the target format;
The semi-automated preprocessing adapter converts raw data into data in the target format, or into data that meet the format requirements of the automated preprocessing adapter, either through secondary development against an open standard preprocessing interface or by adding a corresponding configuration file according to the standard of the automated preprocessing adapter;
The data processing module partitions the data sent by the preprocessing adapter into data blocks according to the specified rules and the unified standard data format, and distributes the resulting data blocks to different storage nodes; data that satisfy a preset association-computation rule are placed in the same data block, and no association exists between data blocks;
The distributed storage module provides multiple storage nodes for storing the data blocks sent by the data processing module.
2. The system for preprocessing big data according to its distributed characteristics as claimed in claim 1, characterized in that: the specified rules by which the data processing module partitions data blocks are: the distributed computing algorithm of the data, the distributed mining-analysis algorithm, and the mathematical model corresponding to the distributed mining-analysis algorithm.
3. The system for preprocessing big data according to its distributed characteristics as claimed in claim 2, characterized in that: when data blocks are partitioned for the distributed computing algorithm, this is done as follows:
Data aggregation: data are integrated into data blocks through sorting, classification and summarization, and grouping operations;
Data reorganization: the corresponding data items are extracted according to specific rules and recombined into new data blocks;
Data association: by setting data-association rules, data whose data items satisfy the association rules are placed in one data block;
Data splitting: on the basis of the data aggregation, data association, and data reorganization operations, where a distributed computation requires calculations between different data blocks, or between data on different machines, according to the configured computation model, the data are split by the specified rules according to business requirements, so that the data are distributed across different nodes in an orderly manner;
Computation model: the mathematical formula abstracted from the business requirements;
The data items on which data partitioning can be based in the specified rules include data category, data size, or computed data.
4. The system for preprocessing big data according to its distributed characteristics as claimed in claim 2, characterized in that: when data are partitioned for the distributed mining-analysis algorithm, this is done as follows:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data, corresponding calculation formulas are set according to the business-analysis objective, and new data items are produced by calculations over the existing data;
Mining-analysis algorithm data format conversion: the raw data are converted into the data format required by the mining-analysis algorithm.
5. The system for preprocessing big data according to its distributed characteristics as claimed in claim 2, characterized in that: when data blocks are partitioned using the mathematical model corresponding to the distributed mining-analysis algorithm, this is done as follows:
Through data format conversion and data model extraction, the data items, data types, and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data according to the specified rules to build the mathematical model.
6. The system for preprocessing big data according to its distributed characteristics as claimed in claim 1, characterized in that: the configuration items of the automated preprocessing adapter can be mapped one-to-one to the field names or data items of the stored data; the data items of the automated preprocessing adapter can be selected through a configuration page, or by setting or changing the parameter values of the configuration items.
7. a kind of method pre-processed for distributed nature to big data, it is characterised in that including:
The first step, initial data is converted into the data of object format according to different data source formats, major key of one of data item as calculating is set according to data analysis target, based on available data item, data corresponding to the prime key item of any two data in all data are combined, associated data pair is drawn;
Second step, the prime key item based on associated data centering, the corresponding computation model of setting obtain the data item needed for association is calculated as value, are converted to<key,value>Key-value pair;
3rd step, different data blocks is obtained to the division that key-value pair carries out data block according to specified rule, and obtain new data block to obtained data block progress parallel computation;
4th step, new data block is divided on different nodes, does not possess any relevance between obtained data block.
8. The method for preprocessing big data for its distributed nature as claimed in claim 7, characterized in that: when a distributed computing algorithm is used to divide the data into blocks, the division is carried out as follows:
Data aggregation: the data are integrated into data blocks through sorting, classification and summarization, and grouping operations;
Data reorganization: the corresponding data items are extracted according to specific rules and recombined into new data blocks;
Data association: by setting data association rules, data whose data items satisfy the association rules are placed in one data block;
Data splitting: on the basis of the data aggregation, data association and data reorganization operations, where a distributed computation requires calculations, according to the configured computation model, between different data blocks or between data on different machines, the data are split by specified rules according to business requirements, so that the data are distributed across different nodes in an orderly manner;
Computation model: the mathematical formula abstracted from the business requirements;
The data items by which the data may be divided under the specified rules include data category, data size or calculated data.
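The four operations named in claim 8 might be sketched as plain functions over lists of records. The function names, the sample fields (`category`, `size`, `value`) and the fixed-size splitting rule are assumptions made for illustration only.

```python
from itertools import groupby


def aggregate(records, key):
    """Data aggregation: sort, then group records sharing a key into blocks."""
    ordered = sorted(records, key=lambda r: r[key])
    return {k: list(g) for k, g in groupby(ordered, key=lambda r: r[key])}


def reorganize(block, items):
    """Data reorganization: extract the named data items into a new block."""
    return [{name: r[name] for name in items} for r in block]


def associate(records, rule):
    """Data association: records satisfying the rule go into one block."""
    matched = [r for r in records if rule(r)]
    rest = [r for r in records if not rule(r)]
    return matched, rest


def split(block, max_size):
    """Data splitting: cut a block into ordered chunks of at most max_size."""
    return [block[i:i + max_size] for i in range(0, len(block), max_size)]


records = [
    {"category": "b", "size": 4, "value": 1.0},
    {"category": "a", "size": 2, "value": 2.0},
    {"category": "a", "size": 8, "value": 3.0},
]
by_category = aggregate(records, "category")
slim = reorganize(by_category["a"], ["size", "value"])
large, small = associate(records, lambda r: r["size"] >= 4)
chunks = split(records, 2)
```

Here the category and the size play the role of the data items on which division may be based; each chunk returned by `split` could then be sent to a different node.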
9. The method for preprocessing big data for its distributed nature as claimed in claim 7, characterized in that: when a distributed library parser is used to divide the data into blocks, the division is carried out as follows:
Data information extraction: according to the parameter requirements of the parser, the data items that need to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data, corresponding calculation formulas are set according to the business analysis objective, and new data items are produced by calculation between existing data;
Mining analysis algorithm data format conversion: the raw data are converted into the data format required by the mining analysis algorithm.
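A rough sketch of the three operations in claim 9 follows. The field names, the `total` formula and the choice of rows of floats as the format an analysis algorithm expects are assumptions; placement on a single data node is not modelled here.

```python
def extract_items(records, items):
    """Data information extraction: keep only the items the algorithm needs."""
    return [{name: r[name] for name in items} for r in records]


def derive_item(records, name, formula):
    """Data processing: add a new item computed from existing items."""
    return [{**r, name: formula(r)} for r in records]


def to_algorithm_format(records, columns):
    """Format conversion: rows of floats in a fixed column order."""
    return [[float(r[c]) for c in columns] for r in records]


records = [
    {"price": 2, "qty": 3, "note": "x"},
    {"price": 5, "qty": 1, "note": "y"},
]
needed = extract_items(records, ["price", "qty"])
with_total = derive_item(needed, "total", lambda r: r["price"] * r["qty"])
matrix = to_algorithm_format(with_total, ["price", "qty", "total"])
```

The `formula` argument is where a business-specific calculation would be plugged in; the multiplication used above is only a placeholder.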
10. The method for preprocessing big data for its distributed nature as claimed in claim 7, characterized in that: when the mathematical model corresponding to the distributed library parser is used to divide the data into blocks, the division is carried out as follows:
Through data format conversion and data model extraction, the data items, data types and data format required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of representative data is extracted from the raw data according to specified rules to build the mathematical model.
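As an illustration of claim 10, the sketch below converts raw records into the format a model expects, extracts a representative sample by a specified rule, and builds a model from that sample. The `load` field, the every-Nth sampling rule and the mean-baseline model are stand-ins chosen for brevity, not the patent's own model.

```python
from statistics import mean


def to_model_format(records, item):
    """Format conversion: the stand-in model expects a list of floats."""
    return [float(r[item]) for r in records]


def extract_sample(values, step):
    """Model extraction: take every step-th value as representative data."""
    return values[::step]


def build_model(sample):
    """Stand-in mathematical model: deviation from the sample mean."""
    baseline = mean(sample)
    return lambda x: x - baseline


records = [{"load": "10"}, {"load": "20"}, {"load": "30"}, {"load": "40"}]
values = to_model_format(records, "load")
sample = extract_sample(values, 2)
model = build_model(sample)
deviation = model(25.0)
```

Building the model from a sample rather than from all the raw data keeps the model-building step cheap, which appears to be the motivation for extracting only part of the data.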
CN201610010843.9A 2016-01-08 2016-01-08 The system and its preprocess method pre-processed for distributed nature to big data Withdrawn CN106959948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610010843.9A CN106959948A (en) 2016-01-08 2016-01-08 The system and its preprocess method pre-processed for distributed nature to big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610010843.9A CN106959948A (en) 2016-01-08 2016-01-08 The system and its preprocess method pre-processed for distributed nature to big data

Publications (1)

Publication Number Publication Date
CN106959948A true CN106959948A (en) 2017-07-18

Family

ID=59480733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610010843.9A Withdrawn CN106959948A (en) 2016-01-08 2016-01-08 The system and its preprocess method pre-processed for distributed nature to big data

Country Status (1)

Country Link
CN (1) CN106959948A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920516A (en) * 2018-05-31 2018-11-30 北京字节跳动网络技术有限公司 Real-time analysis method, system, device and computer readable storage medium
CN109344131A (en) * 2018-10-10 2019-02-15 国网安徽省电力有限公司信息通信分公司 Data storage method, device and management server
CN110175546A (en) * 2019-05-15 2019-08-27 深圳市商汤科技有限公司 Image processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101089725B1 (en) * 2011-03-18 2011-12-07 동국대학교 산학협력단 Method of designing threshold filter for lossless image compression, apparatus and method for lossless image compression using the filter
CN104408159A (en) * 2014-12-04 2015-03-11 曙光信息产业(北京)有限公司 Data correlating, loading and querying method and device
WO2015113636A1 (en) * 2014-02-03 2015-08-06 Nokia Solutions And Networks Oy Architecture for enhanced processing of big data in network environment
CN103646111B (en) * 2013-12-25 2017-02-15 普元信息技术股份有限公司 System and method for realizing real-time data association in big data environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吕婉琪等: ""Hadoop分布式架构下大数据集的并行挖掘"", 《计算机技术与发展》 *

Similar Documents

Publication Publication Date Title
CN106709012A (en) Method and device for analyzing big data
CN104331435B (en) A kind of efficient mass data abstracting method of low influence based on Hadoop big data platforms
CN106959948A (en) The system and its preprocess method pre-processed for distributed nature to big data
CN105677615B (en) A kind of distributed machines learning method based on weka interface
CN103258049A (en) Association rule mining method based on mass data
CN105049247A (en) Network safety log template extraction method and device
CN103279478A (en) Method for extracting features based on distributed mutual information documents
CN106815307A (en) Public Culture knowledge mapping platform and its use method
US20170060977A1 (en) Data preparation for data mining
KR101617696B1 (en) Method and device for mining data regular expression
CN107463706B (en) Hadoop-based mass wave recording data storage and analysis method and system
CN104615765A (en) Data processing method and data processing device for browsing internet records of mobile subscribers
CN104536830A (en) KNN text classification method based on MapReduce
CN105574032A (en) Rule matching operation method and device
CN104102701A (en) Hive-based method for filing and inquiring historical data
CN104298496B (en) data analysis type software development framework system
CN106874479A (en) The improved method and device of the FP Growth algorithms based on FPGA
CN109284088A (en) A kind of signaling big data processing method and electronic equipment
CN107611962B (en) Power grid system branch searching method and system and electronic equipment
CN103810197A (en) Hadoop-based data processing method and system
CN102637200B (en) Method for distributing multi-level associated data to same node of cluster
CN109857803B (en) Data synchronization method, device, equipment, system and computer readable storage medium
CN105630896A (en) Method for quickly importing mass data
CN110968596A (en) Data processing method based on label system
CN110941598A (en) Data deduplication method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20170718)