CN106959948A - System for preprocessing big data according to distributed characteristics and preprocessing method thereof - Google Patents
System for preprocessing big data according to distributed characteristics and preprocessing method thereof
- Publication number
- CN106959948A (application CN201610010843.9A)
- Authority
- CN
- China
- Prior art keywords
- data
- distributed
- item
- block
- processed
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Abstract
The invention discloses a system for preprocessing big data according to distributed characteristics, comprising: a preprocessing adapter, which provides the entry point for raw-data preprocessing and is divided into an automated preprocessing adapter and a semi-automated preprocessing adapter; a data processing module, which divides the data sent by the preprocessing adapter into data blocks according to a specified rule and a unified standard data format and distributes the resulting data blocks across different storage nodes, such that mutually associated data are placed in the same data block and no association exists between data blocks; and a distributed storage module, which provides multiple storage nodes for storing the data blocks sent by the data processing module. The invention also provides a method for preprocessing big data according to distributed characteristics. The invention can greatly improve the accuracy and efficiency of distributed computation and mining analysis on big data.
Description
Technical field
The present invention relates to the computer field, and in particular to a system that preprocesses big data according to the distributed characteristics of big data. The invention further relates to a method that preprocesses big data according to the distributed characteristics of big data.
Background art
Big data technology is developing rapidly. Data technology has progressed from processing a single type of data on a single machine in its early days to processing multiple types of data on computer clusters today, enabling data analysis applications with relaxed time requirements. As data volumes grow to the PB and EB scale or beyond while faster processing and analysis times are demanded, the development trend of big data technology lies in general technologies such as special-purpose big data computers, geographically distributed computer clusters, processing and analysis of multi-type, multi-source data, data network analysis, and second-level analysis of complex data types such as time series, together with application technologies for various domains. General big data technologies and open source projects represented by HDFS, GFS, MapReduce, Hadoop, Spark, Storm, HBase and MongoDB are developing quickly. Big data preprocessing is an indispensable step in the big data processing pipeline, and all of these big data processing technologies introduce the concepts of distributed computation and distributed mining analysis.
Big data comes from complex information sources and has diverse data structures. The collected data must be preprocessed with big data preprocessing techniques so that the information conforms to a unified data standard, which supports subsequent data computation and mining analysis. To support distributed computation and mining analysis effectively, big data must be preprocessed according to distributed characteristics, ensuring that related data reside on the same node and that no data exchange or computational interaction is needed between nodes.
The distributed characteristics of data include the distributed computation algorithm, the distributed mining analysis algorithm, and the mathematical model corresponding to the distributed mining analysis algorithm.
Summary of the invention
The technical problem to be solved by the present invention is to provide a system that preprocesses big data using the distributed characteristics of the data (the distributed computation algorithm, the distributed mining analysis algorithm, and the mathematical model corresponding to the distributed mining analysis algorithm), so that big data are processed quickly and effectively into a unified standard data format, associated data are placed on the same node, all data are distributed across different nodes in an orderly manner, and no association exists between the data on different nodes. This avoids interactive computation between nodes and effectively supports distributed computation and mining analysis on big data. The present invention also provides a method for preprocessing big data using the distributed characteristics of the data.
To solve the above technical problem, the system for preprocessing big data according to distributed characteristics provided by the present invention comprises: a preprocessing adapter, a data processing module and a distributed storage module;
Preprocessing adapter: provides the entry point for raw-data preprocessing and converts raw data into data in the target format; it is divided into an automated preprocessing adapter and a semi-automated preprocessing adapter;
Automated preprocessing adapter: a different automated adapter is provided for each data source format, and converts raw data into data in the target format;
Semi-automated preprocessing adapter: through secondary development against the open standard preprocessing interface, or by adding a corresponding configuration file according to the standard of the automated preprocessing adapter, converts raw data into data in the target format or into data that meet the format requirements of the automated preprocessing adapter;
Data processing module: divides the data sent by the preprocessing adapter into data blocks according to the specified rule and the unified standard data format, and distributes the resulting data blocks across different storage nodes; data that satisfy a preset association computation rule are placed in the same data block, and no association exists between data blocks;
Distributed storage module: provides multiple storage nodes for storing the data blocks sent by the data processing module.
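By way of illustration only, the entry-point role of the automated preprocessing adapter can be sketched as a lookup from source format to converter. The format names, the choice of a list of dictionaries as the target format, and all function names below are assumptions made for this sketch and do not come from the specification.

```python
import csv
import io
import json

# Assumed target format: a list of dicts, one dict per record.

def csv_adapter(raw: str) -> list[dict]:
    """Automated adapter for CSV text with a header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(raw))]

def json_adapter(raw: str) -> list[dict]:
    """Automated adapter for a JSON array of objects."""
    return [dict(obj) for obj in json.loads(raw)]

# One automated adapter per data source format (format names are illustrative).
ADAPTERS = {"csv": csv_adapter, "json": json_adapter}

def preprocess(raw: str, source_format: str) -> list[dict]:
    """Entry point: pick the adapter that matches the source format."""
    try:
        adapter = ADAPTERS[source_format]
    except KeyError:
        raise ValueError(f"no adapter registered for {source_format!r}") from None
    return adapter(raw)

records = preprocess("id,score\na,1\nb,2\n", "csv")
```

In this sketch a semi-automated adapter would correspond to registering one more function in `ADAPTERS`, which is one possible reading of the secondary development described above.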
Wherein, the specified rule by which the data processing module divides data blocks is: the distributed computation algorithm of the data, the distributed mining analysis algorithm, and the mathematical model corresponding to the distributed mining analysis algorithm;
Wherein, when data blocks are divided according to the distributed computation algorithm, this is done in the following ways:
Data aggregation: data are integrated into data blocks by sorting, classifying and summarizing, and grouping the data;
Data recombination: according to a specific rule, the corresponding data items are extracted and recombined into new data blocks;
Data association: by setting an association rule, data whose data items satisfy the association rule are placed in one data block;
Data splitting: on the basis of the data aggregation, data association and data recombination operations, where computation according to the set computation model would otherwise be required between different data blocks or between data on different machines during distributed computation, the data are split by the specified rule according to business requirements, so that the data are distributed across different nodes in an orderly manner;
Computation model: the mathematical formula abstracted from the business requirements;
The data items by which data may be divided under the specified rule include data category, data size, or computed data.
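The data association operation above can be sketched as follows, under the assumption that the association rule is a symmetric predicate over two records. The sketch groups records into the connected components of that relation with a union-find structure, so that no two records in different blocks satisfy the rule. The specification does not name this technique; it is only one way to obtain blocks with no association between them, and the pairwise check is quadratic in the number of records.

```python
from collections import defaultdict

def associated_blocks(records, is_associated):
    """Group records into blocks so that associated records share a block.

    Blocks are the connected components of the association relation, so no
    pair of records in different blocks satisfies the rule.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair once and merge the components of associated records.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_associated(records[i], records[j]):
                parent[find(i)] = find(j)

    blocks = defaultdict(list)
    for i, record in enumerate(records):
        blocks[find(i)].append(record)
    return list(blocks.values())

# Hypothetical rule: two records are associated when they share a customer.
orders = [
    {"id": 1, "customer": "A"},
    {"id": 2, "customer": "B"},
    {"id": 3, "customer": "A"},
]
blocks = associated_blocks(orders, lambda x, y: x["customer"] == y["customer"])
```

Here orders 1 and 3 land in one block and order 2 in another, so each block could be stored on a different node without any cross-node lookup for this rule.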
Wherein, when data are divided according to the distributed mining analysis algorithm, this is done in the following ways:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data and according to the business analysis goal, a corresponding calculation formula is set, and new data items are produced by computation over the existing data;
Mining analysis algorithm data format conversion: the raw data are converted into the data format required by the mining analysis algorithm.
Wherein, when data blocks are divided using the mathematical model corresponding to the distributed mining analysis algorithm, this is done in the following ways:
Through data format conversion and data model extraction, the data items, data types and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data by the specified rule to build the mathematical model.
Wherein, the configuration items of the automated preprocessing adapter can be put into one-to-one correspondence with the field names or data items of the stored data by mapping; the data items of the automated preprocessing adapter can be selected on a configuration page, or selected by setting and changing the parameter values of the configuration items.
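A minimal sketch of such a mapping follows, assuming the configuration is a dictionary from target data item to source field name. The field names, the `FIELD_MAP` structure and the `selected` parameter are hypothetical and stand in for whatever the configuration page would produce.

```python
# Hypothetical adapter configuration: target data item -> source field name.
FIELD_MAP = {"user_id": "uid", "amount": "amt", "city": "city_name"}

def apply_mapping(row: dict, field_map: dict, selected=None) -> dict:
    """Rename source fields to configured data items, keeping only selected ones."""
    items = selected if selected is not None else list(field_map)
    return {item: row[field_map[item]] for item in items}

source_row = {"uid": "u1", "amt": "9.5", "city_name": "Shanghai", "extra": "x"}
# Selecting two data items mimics choosing them on a configuration page.
mapped = apply_mapping(source_row, FIELD_MAP, selected=["user_id", "amount"])
```

Source fields that have no configuration item, such as `extra` here, are simply dropped from the target-format record.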
The method for preprocessing big data according to distributed characteristics provided by the present invention comprises:
First step: raw data are converted into data in the target format according to the different data source formats; one of the data items is set as the primary key for computation according to the data analysis goal; and, based on the existing data items, the data corresponding to the primary key items of any two records among all the data are combined to obtain associated data pairs;
Second step: based on the primary key items in each associated data pair and the set computation model, the data items required for the association computation are obtained as the value, and the pair is converted into a <key,value> key-value pair;
Third step: the key-value pairs are divided into data blocks according to the specified rule to obtain different data blocks, and the resulting data blocks are computed in parallel to obtain new data blocks;
Fourth step: the new data blocks are distributed to different nodes, and no association exists between the resulting data blocks.
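The four steps above can be sketched as follows. This is a single-process illustration under stated assumptions: round-robin division stands in for the specified rule, the absolute difference of one data item stands in for the computation model, and the per-block computation runs sequentially where a real deployment would run blocks in parallel. All function and field names are hypothetical.

```python
from itertools import combinations

def build_pairs(records, key_field, item_field):
    """Steps 1-2: combine every two records into one <key, value> pair.

    The key is the pair of primary keys; the value carries the data items
    that the association computation needs, so each pair is self-contained.
    """
    return [
        ((a[key_field], b[key_field]), (a[item_field], b[item_field]))
        for a, b in combinations(records, 2)
    ]

def divide_into_blocks(pairs, num_blocks):
    """Step 3a: divide pairs into blocks (round-robin is an assumed rule)."""
    blocks = [[] for _ in range(num_blocks)]
    for index, pair in enumerate(pairs):
        blocks[index % num_blocks].append(pair)
    return blocks

def compute_block(block, model):
    """Step 3b: apply the computation model to each pair independently."""
    return [(key, model(*value)) for key, value in block]

records = [{"id": "a", "x": 1.0}, {"id": "b", "x": 4.0}, {"id": "c", "x": 6.0}]
pairs = build_pairs(records, "id", "x")
blocks = divide_into_blocks(pairs, num_blocks=2)
# Hypothetical computation model: absolute difference of the shared data item.
results = [compute_block(block, lambda u, v: abs(u - v)) for block in blocks]
# Step 4: results[i] would be placed on node i; no block needs data from another.
```

Because every key-value pair already contains both operands, each block can be computed without reading any other block, which is the property the fourth step relies on.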
Wherein, when the third step is carried out, the specified rule for dividing data blocks is: the distributed computation algorithm of the data, the distributed mining analysis algorithm, and the mathematical model corresponding to the distributed mining analysis algorithm.
Wherein, when data blocks are divided using the distributed computation algorithm, this is done in the following ways:
Data aggregation: data are integrated into data blocks by sorting, classifying and summarizing, and grouping the data;
Data recombination: according to a specific rule, the corresponding data items are extracted and recombined into new data blocks;
Data association: by setting an association rule, data whose data items satisfy the association rule are placed in one data block;
Data splitting: on the basis of the data aggregation, data association and data recombination operations, where computation according to the set computation model would otherwise be required between different data blocks or between data on different machines during distributed computation, the data are split by the specified rule according to business requirements, so that the data are distributed across different nodes in an orderly manner.
Computation model: the mathematical formula abstracted from the business requirements;
The data items by which data may be divided under the specified rule include data category, data size, or computed data.
Wherein, when data blocks are divided using the distributed mining analysis algorithm, this is done in the following ways:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data and according to the business analysis goal, a corresponding calculation formula is set, and new data items are produced by computation over the existing data;
Mining analysis algorithm data format conversion: the raw data are converted into the data format required by the mining analysis algorithm.
Wherein, when data blocks are divided using the mathematical model corresponding to the distributed mining analysis algorithm, this is done in the following ways:
Through data format conversion and data model extraction, the data items, data types and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data by the specified rule to build the mathematical model.
Wherein, when the first step is carried out, the configuration items of the target-format data can be put into one-to-one correspondence with the field names or data items of the stored data by mapping; the data items of the target-format data can be selected directly, or selected by setting and changing the parameter values of the configuration items.
The working principle of the present invention is illustrated below, taking the computation of association values between data as an example.
Suppose there are N records in total, and the unique identifier of each record is set as the primary key (key). The association value between any two records is obtained by computation over the same data item of the two records, so N*(N-1)/2 computations are required in total.
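The count follows because each computation corresponds to one unordered pair of distinct records. The value N = 1000 below is chosen only as an illustration and does not appear in the specification.

```latex
\binom{N}{2} \;=\; \frac{N!}{2!\,(N-2)!} \;=\; \frac{N(N-1)}{2},
\qquad\text{e.g. } N = 1000:\quad \frac{1000 \cdot 999}{2} = 499\,500 .
```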
Fig. 1 shows the computation structure of data processed by a traditional preprocessing method.
The traditional data preprocessing method divides the data evenly across m nodes according to data volume. Because every pair of records must be computed to obtain its association value, the computation for data block 1 on node 1 in Fig. 1 shows that three types of data computation coexist: c1, computation between any two records within the same data block; c2, computation between different data blocks on the same machine; c3, computation between different data blocks on different machines.
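To make the three types concrete, the following sketch counts how many record pairs fall into c1, c2 and c3 under an even division. The layout rule (record r goes to block r modulo the block count, and consecutive blocks share a node) and the small sizes are assumptions chosen for illustration, not values from the specification.

```python
from itertools import combinations

def classify_pairs(num_records, num_nodes, blocks_per_node):
    """Count record pairs by where their two records live (c1, c2, c3)."""
    total_blocks = num_nodes * blocks_per_node
    counts = {"c1": 0, "c2": 0, "c3": 0}
    for i, j in combinations(range(num_records), 2):
        # Assumed layout: record r goes to block r % total_blocks, and
        # block k lives on node k // blocks_per_node.
        block_i, block_j = i % total_blocks, j % total_blocks
        if block_i == block_j:
            counts["c1"] += 1  # same data block
        elif block_i // blocks_per_node == block_j // blocks_per_node:
            counts["c2"] += 1  # different blocks, same machine
        else:
            counts["c3"] += 1  # different machines
    return counts

counts = classify_pairs(num_records=8, num_nodes=2, blocks_per_node=2)
```

With 8 records on 2 nodes of 2 blocks each, only 4 of the 28 pairs stay inside one block, while 16 require data from another machine.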
When computation is performed on data processed by this preprocessing method, frequent interaction is needed between different records, between different data blocks and between different nodes, all of which increases computation time.
Fig. 2 shows the computation structure of data preprocessed by the present invention. In the preprocessed data, the data needed for a computation are stored as a single record, which avoids communication and interaction between different records, between different data blocks and between different nodes, and greatly improves the efficiency of distributed computation. Data preprocessed by the present invention are processed into the data format required for mining analysis according to the business analysis goal.
The present invention can substantially improve the efficiency of distributed computation and mining analysis on big data.
Brief description of the drawings
The present invention is described in further detail below with reference to the accompanying drawings and embodiments:
Fig. 1 is a schematic diagram of the computation structure of data processed by a traditional preprocessing method.
Fig. 2 is a schematic diagram of the computation structure of data preprocessed by the present invention.
Fig. 3 is a schematic structural diagram of the preprocessing system of the present invention.
Detailed description of the embodiments
As shown in Fig. 3, the system for preprocessing big data according to distributed characteristics provided by the present invention comprises: a preprocessing adapter, a data processing module and a distributed storage module;
Preprocessing adapter: provides the entry point for raw-data preprocessing and converts raw data into data in the target format; it is divided into an automated preprocessing adapter and a semi-automated preprocessing adapter;
Automated preprocessing adapter: a different automated adapter is provided for each data source format, and converts raw data into data in the target format;
Semi-automated preprocessing adapter: through secondary development against the open standard preprocessing interface, or by adding a corresponding configuration file according to the standard of the automated preprocessing adapter, converts raw data into data in the target format or into data that meet the format requirements of the automated preprocessing adapter;
Data processing module: divides the data sent by the preprocessing adapter into data blocks according to the specified rule and the unified standard data format, and distributes the resulting data blocks across different storage nodes; data that satisfy a preset association computation rule are placed in the same data block, and no association exists between data blocks;
Distributed storage module: provides multiple storage nodes for storing the data blocks sent by the data processing module.
Wherein, the specified rule by which the data processing module divides data blocks is: the distributed computation algorithm of the data, the distributed mining analysis algorithm, and the mathematical model corresponding to the distributed mining analysis algorithm;
Wherein, when data blocks are divided according to the distributed computation algorithm, this is done in the following ways:
Data aggregation: data are integrated into data blocks by sorting, classifying and summarizing, and grouping the data;
Data recombination: according to a specific rule, the corresponding data items are extracted and recombined into new data blocks;
Data association: by setting an association rule, data whose data items satisfy the association rule are placed in one data block;
Data splitting: on the basis of the data aggregation, data association and data recombination operations, where computation according to the set computation model would otherwise be required between different data blocks or between data on different machines during distributed computation, the data are split by the specified rule according to business requirements, so that the data are distributed across different nodes in an orderly manner.
Computation model: the mathematical formula abstracted from the business requirements.
The data items by which data may be divided under the specified rule include data category, data size, or computed data.
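As a non-limiting illustration of data aggregation followed by ordered distribution, the sketch below groups records by a category data item, sorts each group, computes a subtotal, and then places whole blocks on nodes. The field names and the placement rule (sorted categories assigned round-robin) are assumptions made for this sketch.

```python
from collections import defaultdict

def aggregate(records, category_field, value_field):
    """Group records by category, sort each group, and compute a subtotal."""
    groups = defaultdict(list)
    for record in records:
        groups[record[category_field]].append(record)
    blocks = {}
    for category, members in groups.items():
        members.sort(key=lambda r: r[value_field])
        blocks[category] = {
            "records": members,
            "subtotal": sum(r[value_field] for r in members),
        }
    return blocks

def assign_to_nodes(blocks, num_nodes):
    """Place whole blocks on nodes in sorted category order (assumed rule)."""
    nodes = [[] for _ in range(num_nodes)]
    for index, category in enumerate(sorted(blocks)):
        nodes[index % num_nodes].append(category)
    return nodes

sales = [
    {"region": "north", "amount": 30},
    {"region": "south", "amount": 10},
    {"region": "north", "amount": 20},
]
blocks = aggregate(sales, "region", "amount")
placement = assign_to_nodes(blocks, num_nodes=2)
```

Because a block is never split across nodes in this sketch, any per-region computation can run on the node that holds that region.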
Wherein, when data are divided according to the distributed mining analysis algorithm, this is done in the following ways:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data and according to the business analysis goal, a corresponding calculation formula is set, and new data items are produced by computation over the existing data;
Mining analysis algorithm data format conversion: the raw data are converted into the data format required by the mining analysis algorithm.
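The data processing and format conversion operations above might look as follows in a minimal form. The business formula (profit per unit) and the ordered numeric list used as the algorithm's input format are assumptions for illustration only.

```python
def add_derived_item(record, name, formula):
    """Return a copy of the record with a new item computed from existing ones."""
    enriched = dict(record)
    enriched[name] = formula(record)
    return enriched

def to_feature_vector(record, feature_order):
    """Convert a record into the ordered numeric list an algorithm might expect."""
    return [float(record[name]) for name in feature_order]

raw = {"revenue": 120.0, "cost": 80.0, "units": 10}
# Hypothetical business formula: profit per unit.
enriched = add_derived_item(
    raw, "unit_profit", lambda r: (r["revenue"] - r["cost"]) / r["units"]
)
vector = to_feature_vector(enriched, ["revenue", "cost", "unit_profit"])
```

The raw record is left unchanged, so the original data items remain available for other analysis goals.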
Wherein, when data blocks are divided using the mathematical model corresponding to the distributed mining analysis algorithm, this is done in the following ways:
Through data format conversion and data model extraction, the data items, data types and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data by the specified rule to build the mathematical model.
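One simple reading of extracting a portion of typical data is to keep a bounded number of records from each category, so that every category is represented in the data used to build the model. The rule below (the first few records per category, in input order) and the field names are assumptions; the specification does not say which extraction rule is used.

```python
from collections import defaultdict

def extract_typical(records, category_field, per_category):
    """Keep up to `per_category` records from each category, in input order."""
    kept = defaultdict(list)
    for record in records:
        bucket = kept[record[category_field]]
        if len(bucket) < per_category:
            bucket.append(record)
    return [record for bucket in kept.values() for record in bucket]

data = [
    {"label": "spam", "length": 12},
    {"label": "ham", "length": 40},
    {"label": "spam", "length": 15},
    {"label": "spam", "length": 9},
]
sample = extract_typical(data, "label", per_category=2)
```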
Wherein, the configuration items of the automated preprocessing adapter can be put into one-to-one correspondence with the field names or data items of the stored data by mapping; the data items of the automated preprocessing adapter can be selected on a configuration page, or selected by setting and changing the parameter values of the configuration items.
The present invention provides a method for preprocessing big data according to distributed characteristics, comprising:
First step: raw data are converted into data in the target format according to the different data source formats; one of the data items is set as the primary key for computation according to the data analysis goal; and, based on the existing data items, the data corresponding to the primary key items of any two records among all the data are combined to obtain associated data pairs;
Second step: based on the primary key items in each associated data pair and the set computation model, the data items required for the association computation are obtained as the value, and the pair is converted into a <key,value> key-value pair;
Third step: the key-value pairs are divided into data blocks according to the specified rule to obtain different data blocks, and the resulting data blocks are computed in parallel to obtain new data blocks;
Fourth step: the new data blocks are distributed to different nodes, and no association exists between the resulting data blocks.
Wherein, when the third step is carried out, the specified rule for dividing data blocks is: the distributed computation algorithm of the data, the distributed mining analysis algorithm, and the mathematical model corresponding to the distributed mining analysis algorithm.
Wherein, when data blocks are divided using the distributed computation algorithm, this is done in the following ways:
Data aggregation: data are integrated into data blocks by sorting, classifying and summarizing, and grouping the data;
Data recombination: according to a specific rule, the corresponding data items are extracted and recombined into new data blocks;
Data association: by setting an association rule, data whose data items satisfy the association rule are placed in one data block;
Data splitting: on the basis of the data aggregation, data association and data recombination operations, where computation according to the set computation model would otherwise be required between different data blocks or between data on different machines during distributed computation, the data are split by the specified rule according to business requirements, so that the data are distributed across different nodes in an orderly manner.
Computation model: the mathematical formula abstracted from the business requirements.
The data items by which data may be divided under the specified rule include data category, data size, or computed data.
Wherein, when data blocks are divided using the distributed mining analysis algorithm, this is done in the following ways:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data and according to the business analysis goal, a corresponding calculation formula is set, and new data items are produced by computation over the existing data;
Mining analysis algorithm data format conversion: the raw data are converted into the data format required by the mining analysis algorithm.
Wherein, when data blocks are divided using the mathematical model corresponding to the distributed mining analysis algorithm, this is done in the following ways:
Through data format conversion and data model extraction, the data items, data types and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data by the specified rule to build the mathematical model.
Wherein, when the first step is carried out, the configuration items of the target-format data can be put into one-to-one correspondence with the field names or data items of the stored data by mapping; the data items of the target-format data can be selected directly, or selected by setting and changing the parameter values of the configuration items.
The present invention has been described in detail above by way of specific embodiments and examples, but these do not limit the invention. Without departing from the principles of the present invention, those skilled in the art may make many modifications and improvements, which shall also be regarded as falling within the scope of protection of the present invention.
Claims (10)
1. A system for preprocessing big data according to distributed characteristics, characterized by comprising: a preprocessing adapter, a data processing module and a distributed storage module;
Preprocessing adapter: provides the entry point for raw-data preprocessing and converts raw data into data in the target format; it is divided into an automated preprocessing adapter and a semi-automated preprocessing adapter;
Automated preprocessing adapter: a different automated adapter is provided for each data source format, and converts raw data into data in the target format;
Semi-automated preprocessing adapter: through secondary development against the open standard preprocessing interface, or by adding a corresponding configuration file according to the standard of the automated preprocessing adapter, converts raw data into data in the target format or into data that meet the format requirements of the automated preprocessing adapter;
Data processing module: divides the data sent by the preprocessing adapter into data blocks according to the specified rule and the unified standard data format, and distributes the resulting data blocks across different storage nodes; data that satisfy a preset association computation rule are placed in the same data block, and no association exists between data blocks;
Distributed storage module: provides multiple storage nodes for storing the data blocks sent by the data processing module.
2. The system for preprocessing big data according to distributed characteristics as claimed in claim 1, characterized in that: the specified rule by which the data processing module divides data blocks is: the distributed computation algorithm of the data, the distributed mining analysis algorithm, and the mathematical model corresponding to the distributed mining analysis algorithm.
3. The system for preprocessing big data according to distributed characteristics as claimed in claim 2, characterized in that: when data blocks are divided according to the distributed computation algorithm, this is done in the following ways:
Data aggregation: data are integrated into data blocks by sorting, classifying and summarizing, and grouping the data;
Data recombination: according to a specific rule, the corresponding data items are extracted and recombined into new data blocks;
Data association: by setting an association rule, data whose data items satisfy the association rule are placed in one data block;
Data splitting: on the basis of the data aggregation, data association and data recombination operations, where computation according to the set computation model would otherwise be required between different data blocks or between data on different machines during distributed computation, the data are split by the specified rule according to business requirements, so that the data are distributed across different nodes in an orderly manner;
Computation model: the mathematical formula abstracted from the business requirements;
The data items by which data may be divided under the specified rule include data category, data size, or computed data.
4. The system for preprocessing big data according to distributed characteristics as claimed in claim 2, characterized in that: when data are divided according to the distributed mining analysis algorithm, this is done in the following ways:
Data information extraction: according to the parameter requirements of the analysis algorithm, the data items to be analyzed are extracted and stored on the same data node;
Data processing: on the basis of the raw data and according to the business analysis goal, a corresponding calculation formula is set, and new data items are produced by computation over the existing data;
Mining analysis algorithm data format conversion: the raw data are converted into the data format required by the mining analysis algorithm.
5. The system for preprocessing big data according to distributed characteristics as claimed in claim 2, characterized in that: when data blocks are divided using the mathematical model corresponding to the distributed mining analysis algorithm, this is done in the following ways:
Through data format conversion and data model extraction, the data items, data types and data formats required by the mathematical model are extracted, and the data are distributed to different nodes;
Mathematical model data format conversion: the raw data are converted into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, a portion of typical data is extracted from the raw data by the specified rule to build the mathematical model.
6. The system for preprocessing big data according to distributed characteristics as claimed in claim 1, characterized in that: the configuration items of the automated preprocessing adapter can be put into one-to-one correspondence with the field names or data items of the stored data by mapping; the data items of the automated preprocessing adapter can be selected on a configuration page, or selected by setting and changing the parameter values of the configuration items.
7. A method for preprocessing big data according to distributed characteristics, characterized by comprising:
First step: raw data are converted into data in the target format according to the different data source formats; one of the data items is set as the primary key for computation according to the data analysis goal; and, based on the existing data items, the data corresponding to the primary key items of any two records among all the data are combined to obtain associated data pairs;
Second step: based on the primary key items in each associated data pair and the set computation model, the data items required for the association computation are obtained as the value, and the pair is converted into a <key,value> key-value pair;
Third step: the key-value pairs are divided into data blocks according to the specified rule to obtain different data blocks, and the resulting data blocks are computed in parallel to obtain new data blocks;
Fourth step: the new data blocks are distributed to different nodes, and no association exists between the resulting data blocks.
8. The method for preprocessing big data for distributed characteristics as claimed in claim 7, characterized in that, when data blocks are divided for a distributed computing algorithm, the division is implemented as follows:
Data aggregation: integrating the data into data blocks through sorting, classified summarization, and grouping operations;
Data recombination: extracting the corresponding data items according to a specific rule and recombining them into new data blocks;
Data association: setting an association rule so that data whose data items satisfy the rule are divided into one data block;
Data splitting: on the basis of the data aggregation, data association and data recombination operations, where the set computation model would require computation between different data blocks, or between data on different machines, during distributed computing, splitting the data by a specified rule according to business requirements, so that the data are distributed across different nodes in an orderly manner;
The computation model is a mathematical formula abstracted from the business requirements;
The specified rule covers the data items by which the data can be divided, including data category, data size, or computed data.
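The four operations of claim 8 can be sketched in Python as follows. This is an illustrative reading only: the function names, the record layout, and the size-based placement rule in `split` are assumptions, not taken from the patent.

```python
from collections import defaultdict


def aggregate(records, by):
    """Data aggregation: sort the records, then group those sharing the `by` item."""
    blocks = defaultdict(list)
    for rec in sorted(records, key=lambda r: r[by]):
        blocks[rec[by]].append(rec)
    return dict(blocks)


def recombine(records, items):
    """Data recombination: extract the selected data items into new records."""
    return [{k: rec[k] for k in items} for rec in records]


def associate(records, related):
    """Data association: records linked by the rule `related` end up in one block."""
    blocks = []
    for rec in records:
        merged, rest = [rec], []
        for block in blocks:
            if any(related(rec, other) for other in block):
                merged.extend(block)  # rec links this block to the merged one
            else:
                rest.append(block)
        blocks = rest + [merged]
    return blocks


def split(blocks, n_nodes):
    """Data splitting: assumed rule by data size, largest block to the least-loaded node."""
    nodes = [[] for _ in range(n_nodes)]
    for block in sorted(blocks, key=len, reverse=True):
        least_loaded = min(nodes, key=lambda n: sum(len(b) for b in n))
        least_loaded.append(block)  # whole blocks stay together on one node
    return nodes
```

Keeping each associated block whole when placing it is what avoids computation across nodes; balancing by block size is only one of the rules the claim allows (category and computed data are the others).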
9. The method for preprocessing big data for distributed characteristics as claimed in claim 7, characterized in that, when data blocks are divided for a distributed library analysis algorithm, the division is implemented as follows:
Data information extraction: according to the parameter requirements of the analysis algorithm, extracting the data items to be analyzed and storing them on the same data node;
Data processing: on the basis of the raw data, setting a corresponding computation formula according to the business analysis target, and producing new data items by computation over the existing data;
Mining analysis algorithm data format conversion: converting the raw data into the data format required by the mining analysis algorithm.
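A short Python sketch of the three operations of claim 9, under stated assumptions: the field names (`total`, `qty`, `unit_price`) and the choice of dense numeric rows as the format expected by the analysis algorithm are hypothetical examples.

```python
def extract_items(records, items):
    """Data information extraction: keep only the items the algorithm's parameters need."""
    return [{k: rec[k] for k in items} for rec in records]


def derive_item(records, name, formula):
    """Data processing: add a new data item computed from the existing ones."""
    return [{**rec, name: formula(rec)} for rec in records]


def to_algorithm_format(records, items):
    """Format conversion: assumed target is one numeric row per record."""
    return [[float(rec[k]) for k in items] for rec in records]


def prepare(records):
    kept = extract_items(records, ["total", "qty"])
    enriched = derive_item(kept, "unit_price", lambda r: r["total"] / r["qty"])
    return to_algorithm_format(enriched, ["total", "qty", "unit_price"])
```

For example, `prepare([{"user": "u1", "total": 10, "qty": 4}])` drops the unused `user` item, derives `unit_price`, and yields a single numeric row.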
10. The method for preprocessing big data for distributed characteristics as claimed in claim 7, characterized in that, when data blocks are divided for the mathematical model corresponding to a distributed library analysis algorithm, the division is implemented as follows:
By means of data format conversion and data model extraction, extracting the data items, data types and data format required by the mathematical model, and distributing the data to different nodes;
Mathematical model data format conversion: converting the raw data into the data format required by the mathematical model;
Data model extraction: according to the needs of the mining analysis, extracting a portion of representative data from the raw data according to a specified rule to build the mathematical model.
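The model extraction of claim 10 can be illustrated with a small Python sketch. The patent does not name a sampling rule or a model type, so both are assumptions here: systematic sampling (every `step`-th record) stands in for the specified rule, and a least-squares line stands in for the mathematical model. The field names are hypothetical.

```python
def sample_representative(records, step):
    """Data model extraction: assumed rule, keep every `step`-th record."""
    return records[::step]


def to_model_format(records, x_item, y_item):
    """Model format conversion: (x, y) float pairs for the fitting routine."""
    return [(float(rec[x_item]), float(rec[y_item])) for rec in records]


def fit_line(points):
    """Build the assumed model y = slope * x + intercept by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

The model is built from the sample only, so the full data set never has to be gathered on one machine to obtain it.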
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610010843.9A CN106959948A (en) | 2016-01-08 | 2016-01-08 | The system and its preprocess method pre-processed for distributed nature to big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610010843.9A CN106959948A (en) | 2016-01-08 | 2016-01-08 | The system and its preprocess method pre-processed for distributed nature to big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106959948A true CN106959948A (en) | 2017-07-18 |
Family
ID=59480733
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610010843.9A Withdrawn CN106959948A (en) | 2016-01-08 | 2016-01-08 | The system and its preprocess method pre-processed for distributed nature to big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106959948A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101089725B1 (en) * | 2011-03-18 | 2011-12-07 | 동국대학교 산학협력단 | Method of designing threshold filter for lossless image compression, apparatus and method for lossless image compression using the filter |
CN104408159A (en) * | 2014-12-04 | 2015-03-11 | 曙光信息产业(北京)有限公司 | Data correlating, loading and querying method and device |
WO2015113636A1 (en) * | 2014-02-03 | 2015-08-06 | Nokia Solutions And Networks Oy | Architecture for enhanced processing of big data in network environment |
CN103646111B (en) * | 2013-12-25 | 2017-02-15 | 普元信息技术股份有限公司 | System and method for realizing real-time data association in big data environment |
2016-01-08 | CN | CN201610010843.9A | CN106959948A | Withdrawn
Non-Patent Citations (1)
Title |
---|
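Lü Wanqi (吕婉琪) et al.: "Parallel mining of large data sets under the Hadoop distributed architecture" (Hadoop分布式架构下大数据集的并行挖掘), Computer Technology and Development (《计算机技术与发展》) * |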
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920516A (en) * | 2018-05-31 | 2018-11-30 | 北京字节跳动网络技术有限公司 | Real-time analysis method, system, device and computer readable storage medium |
CN108920516B (en) * | 2018-05-31 | 2022-03-22 | 北京字节跳动网络技术有限公司 | Real-time analysis method, system, device and computer readable storage medium |
CN109344131A (en) * | 2018-10-10 | 2019-02-15 | 国网安徽省电力有限公司信息通信分公司 | Data storage method, device and management server |
CN109344131B (en) * | 2018-10-10 | 2022-03-29 | 国网安徽省电力有限公司信息通信分公司 | Data storage method and device and management server |
CN109977271A (en) * | 2019-04-29 | 2019-07-05 | 华北理工大学 | A kind of big data processing system and its processing method |
CN109977271B (en) * | 2019-04-29 | 2022-12-20 | 重庆憨牛技术创新服务有限公司 | Big data processing system and processing method thereof |
CN110175546A (en) * | 2019-05-15 | 2019-08-27 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN111198882A (en) * | 2019-12-26 | 2020-05-26 | 东软集团股份有限公司 | Data processing method and device, storage medium and electronic equipment |
CN111447257A (en) * | 2020-03-09 | 2020-07-24 | 中国建设银行股份有限公司 | Message conversion method and device |
WO2021189695A1 (en) * | 2020-03-25 | 2021-09-30 | 平安科技(深圳)有限公司 | Distributed database dynamic expansion method and apparatus, and device and storage medium |
CN112559483A (en) * | 2020-12-22 | 2021-03-26 | 赛尔网络有限公司 | HDFS-based data management method and device, electronic equipment and medium |
CN112835917A (en) * | 2021-01-28 | 2021-05-25 | 山东浪潮通软信息科技有限公司 | Data caching method and system based on blood relationship distribution |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106959948A (en) | The system and its preprocess method pre-processed for distributed nature to big data | |
CN111259064B (en) | Visual natural language analysis mining system and modeling method thereof | |
CN104331435B (en) | A kind of efficient mass data abstracting method of low influence based on Hadoop big data platforms | |
CN112148772A (en) | Alarm root cause identification method, device, equipment and storage medium | |
CN107885499A (en) | A kind of interface document generation method and terminal device | |
CN105677615B (en) | A kind of distributed machines learning method based on weka interface | |
CN109710703A (en) | A kind of generation method and device of genetic connection network | |
KR101617696B1 (en) | Method and device for mining data regular expression | |
CN106815307A (en) | Public Culture knowledge mapping platform and its use method | |
CN109857803B (en) | Data synchronization method, device, equipment, system and computer readable storage medium | |
CN103279478A (en) | Method for extracting features based on distributed mutual information documents | |
CN114399006B (en) | Multi-source abnormal composition image data fusion method and system based on super-calculation | |
CN104615765A (en) | Data processing method and data processing device for browsing internet records of mobile subscribers | |
CN111210432A (en) | Image semantic segmentation method based on multi-scale and multi-level attention mechanism | |
CN107590225A (en) | A kind of Visualized management system based on distributed data digging algorithm | |
CN114036183A (en) | Data ETL processing method, device, equipment and medium | |
CN103810197A (en) | Hadoop-based data processing method and system | |
CN106874479A (en) | The improved method and device of the FP Growth algorithms based on FPGA | |
CN117093619A (en) | Rule engine processing method and device, electronic equipment and storage medium | |
CN109284088B (en) | Signaling big data processing method and electronic equipment | |
CN110765276A (en) | Entity alignment method and device in knowledge graph | |
CN106682107B (en) | Method and device for determining incidence relation of database table | |
CN114880385B (en) | Method and device for accessing geological disaster data through automatic combination process | |
CN115269654A (en) | Data cache supplementing method, device, equipment and medium | |
CN104268270A (en) | Map Reduce based method for mining triangles in massive social network data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20170718 |