CN108132970A - Big data distributed processing method and system based on cloud computing - Google Patents

Big data distributed processing method and system based on cloud computing


Publication number
CN108132970A
Authority
CN
China
Prior art keywords
file
value
relation
mapping
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711259714.4A
Other languages
Chinese (zh)
Inventor
黄凯锋
周岩
王旭辉
李莉
孟庆超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Luoyang Normal University
Original Assignee
Luoyang Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Luoyang Normal University
Priority to CN201711259714.4A
Publication of CN108132970A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24561 Intermediate data storage techniques for performance improvement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

A big data distributed processing method based on cloud computing comprises the following steps: S1, receiving an input file, dividing it into input splits according to the input file size, and assigning each input split to one map task, where each input split stores the split length and an array recording the positions of the data; S2, performing mapping on the data storage nodes with a pre-written map function to obtain intermediate files; S3, merging duplicate key-value pairs in the intermediate files; S4, allocating a circular memory buffer in memory, used for the output of the map output files; creating a configuration file in the circular memory buffer; a daemon thread pauses writing data into memory and writes a spill file in memory, the spill file identifying the files to be written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output; S5, merging all map output files and storing them on the distributed file storage system.

Description

Big data distributed processing method and system based on cloud computing
Technical field
The present invention relates to the technical field of big data and cloud computing, and in particular to a big data distributed processing method and system based on cloud computing.
Background
With the arrival of the cloud era, big data has attracted more and more attention. The term big data is commonly used to describe large amounts of unstructured and semi-structured data, which are downloaded into relational databases for analysis. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets requires a framework such as MapReduce to distribute the work among tens, hundreds or even thousands of computers. Big data requires special techniques to process large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining systems, distributed file systems, distributed databases, cloud computing platforms, the Internet and scalable storage systems.
In a big data environment, data sources are abundant and data types are diverse, the volume of data to be stored, analyzed and mined is huge, the requirements for data presentation are high, and great importance is attached to the efficiency and availability of data processing. Traditional data processing methods, however, have the following shortcomings: 1. Traditional data collection relies on a single source, and the volume of data stored, managed and analyzed is relatively small, so most of it can be handled with relational databases and parallel data warehouses. As far as relying on parallel computing to speed up data processing is concerned, traditional parallel database technology pursues high consistency and fault tolerance, which, according to the CAP theorem, makes it difficult to guarantee availability and scalability. 2. Traditional data processing methods are processor-centric, which greatly increases computing overhead and cannot meet the demand for processing large amounts of unstructured data.
Summary of the invention
In view of this, the present invention proposes a big data distributed processing method and system based on cloud computing.
A big data distributed processing method based on cloud computing comprises the following steps:
S1, receiving an input file, dividing it into input splits according to the input file size, and assigning each input split to one map task, where each input split stores the split length and an array recording the positions of the data;
S2, performing mapping on the data storage nodes with a pre-written map function to obtain intermediate files;
S3, merging duplicate key-value pairs in the intermediate files to reduce redundancy in the map output files; serializing the merged key-value pairs to obtain map cache files; automatically obtaining the computational load value of each compute node, and assigning each map cache file to a compute node according to the computational load values of the compute nodes;
S4, allocating a circular memory buffer in memory, used for the output of the map output files; creating a configuration file in the circular memory buffer and configuring the memory occupancy threshold of the buffer in the configuration file; when the memory occupancy in the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses writing data into memory and writes a spill file in memory, the spill file identifying the files to be written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
S5, merging all map output files and storing them on the distributed file storage system.
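Step S1 can be illustrated with a minimal Python sketch. It assumes that records are packed greedily into splits of at most `split_size` bytes; the names `InputSplit`, `make_splits` and `split_size` are hypothetical and do not come from the patent, which only states that a split stores its length and an array of data positions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InputSplit:
    length: int          # split length in bytes
    offsets: List[int]   # start position of each record inside the input file


def make_splits(record_lengths: List[int], split_size: int) -> List[InputSplit]:
    """Pack consecutive records into splits of at most split_size bytes.

    A record larger than split_size gets a split of its own (assumption).
    """
    splits: List[InputSplit] = []
    offsets: List[int] = []
    length = 0
    pos = 0
    for rec_len in record_lengths:
        # Close the current split if this record would overflow it.
        if offsets and length + rec_len > split_size:
            splits.append(InputSplit(length, offsets))
            offsets, length = [], 0
        offsets.append(pos)
        length += rec_len
        pos += rec_len
    if offsets:
        splits.append(InputSplit(length, offsets))
    return splits


splits = make_splits([40, 40, 40, 100, 10], split_size=100)
# One map task per input split, as in step S1.
tasks = {f"map-{i}": split for i, split in enumerate(splits)}
```

Each entry of `tasks` pairs one hypothetical map task name with exactly one split, mirroring the one-to-one assignment described in S1.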
In the big data distributed processing method based on cloud computing according to the present invention, dividing the input file into input splits according to its size in step S1 comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value and other relation values, and writing the correspondence between the relation values of each input file into the association table;
dividing the data corresponding to each relation value into the input splits.
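A minimal sketch of the association table, under the assumption that it maps a (file, relation kind) pair to the index of the split holding the corresponding data. The layout, the function name `build_association_table` and the short relation names in `RELATION_KINDS` are illustrative choices, not details given in the patent.

```python
from typing import Dict, List, Tuple

# Short hypothetical names for the seven relation values listed in the text.
RELATION_KINDS = (
    "position", "activity", "structure",
    "function", "functional", "behavior", "other",
)


def build_association_table(
    files: Dict[str, Dict[str, str]]
) -> Tuple[Dict[Tuple[str, str], int], List[str]]:
    """Return (table, splits).

    table maps (file name, relation kind) to the index of the split that
    holds the data for that relation value.
    """
    table: Dict[Tuple[str, str], int] = {}
    splits: List[str] = []
    for name, relations in files.items():
        for kind in RELATION_KINDS:
            if kind in relations:          # a file may lack some relation values
                table[(name, kind)] = len(splits)
                splits.append(relations[kind])
    return table, splits


files = {
    "a.log": {"position": "p1", "behavior": "b1"},
    "b.log": {"structure": "s1"},
}
table, splits = build_association_table(files)
```

Keeping the table separate from the split data lets later steps look up where a relation value came from without rereading the input file.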
In the big data distributed processing method based on cloud computing according to the present invention, step S2 comprises:
mapping the input splits according to the map tasks with the pre-written map function, where the mapping includes arranging the input split content into a row-and-column list according to a preset data format and judging whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value and other relation values exist; if a relation value exists it is retained directly, and if one or several relation values are missing, the missing relation values are set to empty; the relation values are arranged in a consistent order.
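The row-and-column arrangement described for step S2 can be sketched as follows. This is only an illustration: the column names in `COLUMNS` are hypothetical shorthand for the seven relation values, and `None` stands in for the "empty" value.

```python
from typing import Dict, List, Optional

# Fixed column order so that every row lists the relation values consistently.
COLUMNS = [
    "position", "activity", "structure",
    "function", "functional", "behavior", "other",
]


def to_row(record: Dict[str, str]) -> List[Optional[str]]:
    """Keep relation values that exist; fill missing ones with None."""
    return [record.get(col) for col in COLUMNS]


def map_split(records: List[Dict[str, str]]) -> List[List[Optional[str]]]:
    """Arrange the content of one input split into a row-and-column list."""
    return [to_row(r) for r in records]


rows = map_split([
    {"behavior": "click", "position": "x"},
    {"structure": "tree"},
])
```

Because every row has the same width and order, downstream nodes can address a relation value by column index instead of parsing each record again.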
In the big data distributed processing method based on cloud computing according to the present invention,
step S5 comprises:
querying the association table for all index information corresponding to each map output file, and inserting each segment of data corresponding to each map output file into a segment list; recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value and other relation values of the segment data.
In the big data distributed processing method based on cloud computing according to the present invention,
in step S2, mapping the input splits according to the map tasks with the pre-written map function further comprises judging, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
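The patent does not define what counts as a logic error. The sketch below assumes one possible reading: a split is inconsistent if it refers to a file unknown to the association table, or carries a relation kind that the table does not record for its file. The functions `has_logic_error` and `drop_bad_splits` and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Set


def has_logic_error(split: Dict, table: Dict[str, Set[str]]) -> bool:
    """Assumed rule: every relation kind in the split must be recorded
    for the split's file in the association table."""
    recorded = table.get(split["file"])
    if recorded is None:                      # file unknown to the table
        return True
    return not set(split["relations"]) <= recorded


def drop_bad_splits(splits: List[Dict], table: Dict[str, Set[str]]) -> List[Dict]:
    """Discard splits that fail the consistency check before mapping."""
    return [s for s in splits if not has_logic_error(s, table)]


table = {"a.log": {"position", "behavior"}}
splits = [
    {"file": "a.log", "relations": {"position": "p"}},
    {"file": "a.log", "relations": {"structure": "s"}},   # kind not recorded
    {"file": "zzz.log", "relations": {}},                 # unknown file
]
kept = drop_bad_splits(splits, table)
```

Filtering before the map function runs means that inconsistent splits never consume map task time.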
The present invention further provides a big data distributed processing system based on cloud computing, comprising the following units:
a splitting unit, configured to receive an input file, divide it into input splits according to the input file size, and assign each input split to one map task, where each input split stores the split length and an array recording the positions of the data;
a mapping unit, configured to perform mapping on the data storage nodes with a pre-written map function to obtain intermediate files;
a computing unit, configured to merge duplicate key-value pairs in the intermediate files to reduce redundancy in the map output files; to serialize the merged key-value pairs to obtain map cache files; and to automatically obtain the computational load value of each compute node and assign each map cache file to a compute node according to the computational load values of the compute nodes;
an output unit, configured to allocate a circular memory buffer in memory, used for the output of the map output files; to create a configuration file in the circular memory buffer and configure the memory occupancy threshold of the buffer in the configuration file; when the memory occupancy in the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses writing data into memory and writes a spill file in memory, the spill file identifying the files to be written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
a merging and storage unit, configured to merge all map output files and store them on the distributed file storage system.
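The threshold-triggered spill performed by the output unit can be sketched as below. This is a simplified, single-threaded illustration: a plain list stands in for the circular buffer, the threshold is passed as a constructor argument rather than read from a configuration file, and the class `SpillBuffer` with its parameters is hypothetical.

```python
import os
import tempfile
from typing import List


class SpillBuffer:
    """Buffers map output in memory and spills to disk at a threshold."""

    def __init__(self, capacity: int, threshold_ratio: float, spill_dir: str):
        self.threshold = int(capacity * threshold_ratio)  # occupancy threshold
        self.spill_dir = spill_dir
        self.records: List[bytes] = []
        self.used = 0
        self.spill_files: List[str] = []

    def write(self, record: bytes) -> None:
        self.records.append(record)
        self.used += len(record)
        # Occupancy >= threshold: stop accepting writes and spill first.
        if self.used >= self.threshold:
            self.spill()

    def spill(self) -> None:
        if not self.records:
            return
        path = os.path.join(self.spill_dir, f"spill-{len(self.spill_files)}.bin")
        with open(path, "wb") as f:
            for record in self.records:
                f.write(record + b"\n")
        self.spill_files.append(path)
        self.records, self.used = [], 0

    def close(self) -> List[str]:
        """Flush whatever is left so that all map output reaches disk."""
        self.spill()
        return self.spill_files


with tempfile.TemporaryDirectory() as tmp:
    buf = SpillBuffer(capacity=100, threshold_ratio=0.8, spill_dir=tmp)
    for _ in range(5):
        buf.write(b"x" * 30)
    files = buf.close()
    sizes = [os.path.getsize(p) for p in files]
```

In a threaded design the writer would block while a background thread performs the spill; the sketch folds both roles into `write` to stay short.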
In the big data distributed processing system based on cloud computing according to the present invention, dividing the input file into input splits according to its size in the splitting unit comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value and other relation values, and writing the correspondence between the relation values of each input file into the association table;
dividing the data corresponding to each relation value into the input splits.
In the big data distributed processing system based on cloud computing according to the present invention, the mapping unit comprises:
mapping the input splits according to the map tasks with the pre-written map function, where the mapping includes arranging the input split content into a row-and-column list according to a preset data format and judging whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value and other relation values exist; if a relation value exists it is retained directly, and if one or several relation values are missing, the missing relation values are set to empty; the relation values are arranged in a consistent order.
In the big data distributed processing system based on cloud computing according to the present invention,
the merging and storage unit comprises:
querying the association table for all index information corresponding to each map output file, and inserting each segment of data corresponding to each map output file into a segment list; recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value and other relation values of the segment data.
In the big data distributed processing system based on cloud computing according to the present invention,
in the mapping unit, mapping the input splits according to the map tasks with the pre-written map function further comprises judging, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
Compared with the prior art, implementing the big data distributed processing method and system based on cloud computing provided by the present invention has the following beneficial effects: massive data is divided into several parts according to preset rules and distributed to multiple processors for parallel processing; the results produced by each processor are then aggregated to obtain the final result. This makes it possible to process large volumes of unstructured data and improves both the range of data types that can be processed and the processing speed.
Description of the drawings
Fig. 1 is a flow chart of the language transfer method in the improved wireless communication process of an embodiment of the present invention.
Detailed description
As shown in Fig. 1, a big data distributed processing method based on cloud computing comprises the following steps:
S1, receiving an input file, dividing it into input splits according to the input file size, and assigning each input split to one map task, where each input split stores the split length and an array recording the positions of the data;
S2, performing mapping on the data storage nodes with a pre-written map function to obtain intermediate files;
S3, merging duplicate key-value pairs in the intermediate files to reduce redundancy in the map output files; serializing the merged key-value pairs to obtain map cache files; automatically obtaining the computational load value of each compute node, and assigning each map cache file to a compute node according to the computational load values of the compute nodes;
S4, allocating a circular memory buffer in memory, used for the output of the map output files; creating a configuration file in the circular memory buffer and configuring the memory occupancy threshold of the buffer in the configuration file; when the memory occupancy in the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses writing data into memory and writes a spill file in memory, the spill file identifying the files to be written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
S5, merging all map output files and storing them on the distributed file storage system.
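Step S3 combines three operations: merging duplicate keys, serializing, and load-based assignment. A minimal sketch follows, assuming that "merging" means grouping values under the same key, that JSON is an acceptable serialization, and that files are placed greedily on the currently least-loaded node. None of these choices is specified in the patent, and all function names are hypothetical.

```python
import heapq
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def combine(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values that share a key so each key appears once."""
    merged: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        merged[key].append(value)
    return dict(merged)


def serialize(merged: Dict[str, List[int]]) -> bytes:
    """Serialize the merged pairs into the bytes of a map cache file."""
    return json.dumps(merged, sort_keys=True).encode("utf-8")


def assign(cache_files: Dict[str, int], node_loads: Dict[str, int]) -> Dict[str, str]:
    """Greedy placement: largest file first, always onto the least-loaded node."""
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    placement: Dict[str, str] = {}
    for name, size in sorted(cache_files.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        placement[name] = node
        heapq.heappush(heap, (load + size, node))
    return placement


merged = combine([("a", 1), ("b", 2), ("a", 3)])
payload = serialize(merged)
placement = assign({"f1": 5, "f2": 3, "f3": 2}, {"n1": 0, "n2": 4})
```

Merging before serialization shrinks the cache files, and the heap keeps each placement decision at logarithmic cost in the number of nodes.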
In the big data distributed processing method based on cloud computing according to the present invention, dividing the input file into input splits according to its size in step S1 comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value and other relation values, and writing the correspondence between the relation values of each input file into the association table;
dividing the data corresponding to each relation value into the input splits.
By implementing this embodiment of the present invention, data of various types can be uniformly split into the relation values, even if a particular type of data lacks some of them. Distributed processing is then performed on each relation value, which can greatly improve data processing capacity.
In the big data distributed processing method based on cloud computing according to the present invention, step S2 comprises:
mapping the input splits according to the map tasks with the pre-written map function, where the mapping includes arranging the input split content into a row-and-column list according to a preset data format and judging whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value and other relation values exist; if a relation value exists it is retained directly, and if one or several relation values are missing, the missing relation values are set to empty; the relation values are arranged in a consistent order.
By implementing this embodiment, the input split content is arranged into a row-and-column list according to the preset data format, so that subsequent compute nodes consume fewer processing resources.
In the big data distributed processing method based on cloud computing according to the present invention,
step S5 comprises:
querying the association table for all index information corresponding to each map output file, and inserting each segment of data corresponding to each map output file into a segment list; recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value and other relation values of the segment data.
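The merge in step S5 can be sketched as follows, assuming the association table can be queried by map output file name and returns that file's relation values as its index information. The function `merge_outputs` and the record layout are hypothetical.

```python
from typing import Dict, List


def merge_outputs(
    map_outputs: Dict[str, List[str]],
    association_table: Dict[str, Dict[str, str]],
) -> List[Dict]:
    """Build one segment list from all map output files.

    Each entry keeps the segment data together with the relation values
    looked up for its source file.
    """
    segment_list: List[Dict] = []
    for name in sorted(map_outputs):                 # deterministic order
        index_info = association_table.get(name, {})
        for segment in map_outputs[name]:
            segment_list.append({
                "source": name,
                "data": segment,
                "relations": dict(index_info),       # copy, one per segment
            })
    return segment_list


segments = merge_outputs(
    {"out-1": ["s3"], "out-0": ["s1", "s2"]},
    {"out-0": {"position": "p0"}, "out-1": {"behavior": "b1"}},
)
```

The resulting list is what would be written to the distributed file storage system; the write itself is omitted here.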
In the big data distributed processing method based on cloud computing according to the present invention,
in step S2, mapping the input splits according to the map tasks with the pre-written map function further comprises judging, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
By implementing this embodiment, redundancy and error checks can be performed on the data, reducing the amount of computation.
The present invention further provides a big data distributed processing system based on cloud computing, comprising the following units:
a splitting unit, configured to receive an input file, divide it into input splits according to the input file size, and assign each input split to one map task, where each input split stores the split length and an array recording the positions of the data;
a mapping unit, configured to perform mapping on the data storage nodes with a pre-written map function to obtain intermediate files;
a computing unit, configured to merge duplicate key-value pairs in the intermediate files to reduce redundancy in the map output files; to serialize the merged key-value pairs to obtain map cache files; and to automatically obtain the computational load value of each compute node and assign each map cache file to a compute node according to the computational load values of the compute nodes;
an output unit, configured to allocate a circular memory buffer in memory, used for the output of the map output files; to create a configuration file in the circular memory buffer and configure the memory occupancy threshold of the buffer in the configuration file; when the memory occupancy in the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses writing data into memory and writes a spill file in memory, the spill file identifying the files to be written to disk, and the contents of the circular memory buffer are written to disk until all map output files have been output;
a merging and storage unit, configured to merge all map output files and store them on the distributed file storage system.
In the big data distributed processing system based on cloud computing according to the present invention, dividing the input file into input splits according to its size in the splitting unit comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value and other relation values, and writing the correspondence between the relation values of each input file into the association table;
dividing the data corresponding to each relation value into the input splits.
In the big data distributed processing system based on cloud computing according to the present invention, the mapping unit comprises:
mapping the input splits according to the map tasks with the pre-written map function, where the mapping includes arranging the input split content into a row-and-column list according to a preset data format and judging whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value and other relation values exist; if a relation value exists it is retained directly, and if one or several relation values are missing, the missing relation values are set to empty; the relation values are arranged in a consistent order.
In the big data distributed processing system based on cloud computing according to the present invention,
the merging and storage unit comprises:
querying the association table for all index information corresponding to each map output file, and inserting each segment of data corresponding to each map output file into a segment list; recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value and other relation values of the segment data.
In the big data distributed processing system based on cloud computing according to the present invention,
in the mapping unit, mapping the input splits according to the map tasks with the pre-written map function further comprises judging, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
Compared with the prior art, implementing the big data distributed processing method and system based on cloud computing provided by the present invention has the following beneficial effects: massive data is divided into several parts according to preset rules and distributed to multiple processors for parallel processing; the results produced by each processor are then aggregated to obtain the final result. This makes it possible to process large volumes of unstructured data and improves both the range of data types that can be processed and the processing speed.
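The divide, process in parallel, then aggregate pattern named in the beneficial effects can be illustrated with a small word count. This is a sketch under stated assumptions: worker threads stand in for the separate processors of the patent, and the helper names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def chunk(items: List[str], n: int) -> List[List[str]]:
    """Divide the data into at most n parts of roughly equal size."""
    size = max(1, -(-len(items) // n))   # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines: List[str]) -> Counter:
    """Work done independently on one part."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def parallel_word_count(lines: List[str], workers: int = 4) -> Counter:
    parts = chunk(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, parts))
    total: Counter = Counter()
    for partial in partials:             # aggregate the partial results
        total.update(partial)
    return total


result = parallel_word_count(["a b", "b c", "a"], workers=2)
```

On a real cluster each part would be sent to a different machine; only the split and aggregate logic carries over unchanged.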
It should be understood that those of ordinary skill in the art can make various other corresponding changes and modifications based on the technical concept of the present invention, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. a kind of big data distributed approach based on cloud computing, which is characterized in that it includes the following steps:
S1, input file is received, input fragment is carried out according to input file size, distributed each input fragment to a mapping and appoint The array of the position of business, input fragment storage fragment length and record data;
S2, it is mapped to obtain intermediate file on data memory node by the mapping function write in advance;
Duplicate key value in S3, merging intermediate file, maps output file redundancy to reduce;And sequence is carried out to the key assignments after merging Rowization obtain mapped cache file;Automatically the computational load value of each calculate node is obtained, according to the computational load of calculate node Each mapped cache file is assigned in each calculate node by value;
S4, circulating memory buffering area is opened up in memory, circulating memory buffering area exports for mapping output file;In annular It deposits and configuration file is created in buffering area, the EMS memory occupation threshold value of core buffer is configured in configuration file;Delay in circulating memory It rushes in area EMS memory occupation to be greater than or equal to when occupying threshold value, protection thread pause writes data into memory, and be written in memory Spill file, spill file determine the file of write-in disk, and by the file write-in disk of circulating memory buffering area until all Mapping output file output finish;
S5, it all mapping output files and will store on distributed file storage system.
2. the big data distributed approach based on cloud computing as described in claim 1, which is characterized in that the step S1 In to input file size carry out input fragment include:
Incidence relation table is established, input file is split as position relationship value, activity relationship value, structural relation value, functional relationship Value, functional relationship value, behavior relation value and other relation values, and by the correspondence of each relation value of each input file It is written in incidence relation table;
It will be in the corresponding data cut-in input fragment of each relation value.
3. the big data distributed approach based on cloud computing as claimed in claim 2, which is characterized in that the step S2 Including:
It is mapped by the mapping function write in advance by fragment is inputted according to mapping tasks, the mapping is included according to advance The data format of setting will input fragment content and will be aligned into row-column list, judge position relationship value, activity relationship value, structural relation Value, functional relationship value, functional relationship value, behavior relation value and other relation values whether there is, if each relation value exists Then directly retain, if there is no a certain item or a few n-th-trem relation n values, then the relation value lacked is sky;The arrangement of each relationship is suitable Sequence is consistent.
4. the big data distributed approach based on cloud computing as claimed in claim 3, which is characterized in that
The step S5 includes:
The corresponding all index informations of each mapping output file are inquired from incidence relation table, by each mapping output file Each a corresponding segment data is inserted into section list;Record position relationship value, activity relationship value, the structural relation of segment data Value, functional relationship value, functional relationship value, behavior relation value and other relation values.
5. the big data distributed approach based on cloud computing as claimed in claim 3, which is characterized in that
Input fragment is carried out mapping according to mapping tasks to the mapping function by writing in advance in the step S2 to further include Judge that with the presence or absence of logic error, the input fragment is abandoned if existing for input fragment according to incidence relation table.
6. A big data distributed processing system based on cloud computing, characterized in that it comprises the following units:
a split unit, for receiving an input file, dividing it into input fragments according to the input file size, and assigning each input fragment one mapping task, wherein each input fragment stores the fragment length and an array of the positions of the recorded data;
a map unit, for performing mapping on the data storage nodes by the pre-written mapping function to obtain intermediate files;
a computing unit, for merging duplicate key values in the intermediate files to reduce redundancy in the mapping output files; serializing the merged key values to obtain mapping cache files; and automatically obtaining the computational load value of each compute node and assigning each mapping cache file to a compute node according to those load values;
an output unit, for allocating a circular memory buffer in memory, the circular memory buffer being used to output the mapping output files; creating a configuration file in the circular memory buffer, in which the memory occupancy threshold of the buffer is configured; and, when the memory occupancy of the circular memory buffer is greater than or equal to the occupancy threshold, having a guard thread suspend the writing of data into memory and write a spill file in memory, the spill file determining which files are written to disk, and the contents of the circular memory buffer being written to disk until all mapping output files have been output;
a merge storage unit, for merging all mapping output files and storing them in a distributed file storage system.
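The computing unit and the output unit described above can be illustrated with a short sketch. It rests on several assumptions that are not in the claim: duplicate keys are merged by summing integer values, serialization uses Python's `pickle`, assignment is a greedy "largest file to the least-loaded node" heuristic, and the buffer is single-threaded with an in-memory list standing in for spill files on disk (the claim's guard thread is not modeled). All function and class names are hypothetical.

```python
import heapq
import pickle
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def combine(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Merge duplicate keys (here by summing) to shrink the map output."""
    merged: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        merged[key] += value
    return dict(merged)


def serialize(merged: Dict[str, int]) -> bytes:
    """Serialize merged key values into the bytes of a mapping cache file."""
    return pickle.dumps(merged)


def assign_by_load(cache_sizes: Dict[str, int], node_loads: Dict[str, int]) -> Dict[str, str]:
    """Greedy assumption: hand each cache file, largest first, to the node
    whose current load value is lowest, then add the file size to that load."""
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    placement: Dict[str, str] = {}
    for name, size in sorted(cache_sizes.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        placement[name] = node
        heapq.heappush(heap, (load + size, node))
    return placement


class SpillBuffer:
    """Memory buffer that spills once occupancy reaches the configured threshold."""

    def __init__(self, capacity: int, threshold: float):
        self.limit = int(capacity * threshold)  # occupancy threshold in bytes
        self.records: List[bytes] = []
        self.used = 0
        self.spills: List[List[bytes]] = []  # stand-in for spill files on disk

    def write(self, record: bytes) -> None:
        self.records.append(record)
        self.used += len(record)
        if self.used >= self.limit:  # "greater than or equal to" per the claim
            self.spill()

    def spill(self) -> None:
        """Move the buffered records to a new spill and empty the buffer."""
        if self.records:
            self.spills.append(self.records)
            self.records, self.used = [], 0
```

Calling `spill()` once more after the last `write` flushes whatever remains, which corresponds to writing to disk until all mapping output has been output.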
7. The big data distributed processing system based on cloud computing as claimed in claim 6, characterized in that dividing the input file into input fragments according to the input file size in the split unit comprises:
establishing an association relation table, splitting the input file into a position relationship value, an activity relationship value, a structural relationship value, a functional relationship value, a functional relationship value, a behavior relationship value and other relationship values, and writing the correspondence between each input file and each of its relationship values into the association relation table;
cutting the data corresponding to each relationship value into the input fragments.
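One possible reading of this splitting step is sketched below. It assumes that an input file arrives as a mapping from relationship name to its data, that one fragment is cut per relationship value present, and that the "correspondence" stored in the association relation table is the index of that fragment. These choices, and the names `RELATIONS` and `split_by_relation`, are illustrative assumptions rather than details from the patent.

```python
from typing import Dict, List, Tuple

# Relationship names from the claim. The translated text lists "functional"
# twice, so this sketch keeps a single entry for it.
RELATIONS = ("position", "activity", "structural", "functional", "behavior", "other")


def split_by_relation(
    file_id: str, records: Dict[str, list]
) -> Tuple[Dict[Tuple[str, str], int], List[dict]]:
    """Cut one fragment per relationship value that is present, and record
    (file id, relationship name) -> fragment index in the association table."""
    table: Dict[Tuple[str, str], int] = {}
    fragments: List[dict] = []
    for name in RELATIONS:
        data = records.get(name)
        if data:
            table[(file_id, name)] = len(fragments)
            fragments.append({"file_id": file_id, "relation": name, "data": data})
    return table, fragments
```

Keeping the table keyed by file and relationship name lets later stages look up every fragment that belongs to a given input file.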
8. The big data distributed processing system based on cloud computing as claimed in claim 7, characterized in that the map unit comprises:
mapping the input fragments according to the mapping tasks by the pre-written mapping function, the mapping comprising: arranging the content of each input fragment into a row list according to a preset data format; judging whether the position relationship value, activity relationship value, structural relationship value, functional relationship value, functional relationship value, behavior relationship value and other relationship values exist; retaining each relationship value that exists; and, if one or more relationship values are missing, setting the missing relationship values to empty; the relationship values being arranged in the same order in every row.
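The row alignment described in this claim can be sketched as follows, assuming that each record is a dictionary keyed by relationship name and that an empty value is represented by `None`. The column tuple and the function name `to_row` are hypothetical.

```python
from typing import Dict, List, Optional

# Fixed column order shared by every row. The translated claim lists
# "functional" twice; this sketch keeps a single column for it.
COLUMNS = ("position", "activity", "structural", "functional", "behavior", "other")


def to_row(record: Dict[str, object]) -> List[Optional[object]]:
    """Arrange one record as a row: existing values are kept, missing ones
    become None, and the column order is identical for every row."""
    return [record.get(name) for name in COLUMNS]
```

For example, `to_row({"activity": "run", "position": 3})` gives `[3, "run", None, None, None, None]`, so rows from different fragments line up column by column.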
9. The big data distributed processing system based on cloud computing as claimed in claim 8, characterized in that
the merge storage unit comprises:
querying the association relation table for all index information corresponding to each mapping output file, and inserting each segment of data corresponding to each mapping output file into a segment list; and recording the position relationship value, activity relationship value, structural relationship value, functional relationship value, functional relationship value, behavior relationship value and other relationship values of the segment data.
10. The big data distributed processing system based on cloud computing as claimed in claim 9, characterized in that
mapping the input fragments according to the mapping tasks by the pre-written mapping function in the map unit further comprises: judging, according to the association relation table, whether an input fragment contains a logic error, and discarding the input fragment if it does.
CN201711259714.4A 2017-12-04 2017-12-04 Big data distributed approach and system based on cloud computing Pending CN108132970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711259714.4A CN108132970A (en) 2017-12-04 2017-12-04 Big data distributed approach and system based on cloud computing

Publications (1)

Publication Number Publication Date
CN108132970A true CN108132970A (en) 2018-06-08

Family

ID=62388934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711259714.4A Pending CN108132970A (en) 2017-12-04 2017-12-04 Big data distributed approach and system based on cloud computing

Country Status (1)

Country Link
CN (1) CN108132970A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106198A (en) * 2011-11-09 2013-05-15 金蝶软件(中国)有限公司 Tree structure implementation method and tree structure implementation device
CN106528757A (en) * 2016-11-03 2017-03-22 北京中安智达科技有限公司 Big data-oriented relation analysis display method
CN106951475A (en) * 2017-03-07 2017-07-14 郑州铁路职业技术学院 Big data distributed approach and system based on cloud computing

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584068A (en) * 2018-11-02 2019-04-05 深圳市快付通金融网络科技服务有限公司 A kind of distribution of funds formula liquidation method, apparatus and system
CN109522110A (en) * 2018-11-19 2019-03-26 视联动力信息技术股份有限公司 A kind of multiple task management system and method based on view networking
CN109522110B (en) * 2018-11-19 2020-03-31 视联动力信息技术股份有限公司 Multitask management system and method based on video networking

Similar Documents

Publication Publication Date Title
CN106951475A (en) Big data distributed approach and system based on cloud computing
US10372723B2 (en) Efficient query processing using histograms in a columnar database
US10990288B2 (en) Systems and/or methods for leveraging in-memory storage in connection with the shuffle phase of MapReduce
WO2018214388A1 (en) Multi-platform big data system and method for aviation electronics
US9256665B2 (en) Creation of inverted index system, and data processing method and apparatus
US20140358977A1 (en) Management of Intermediate Data Spills during the Shuffle Phase of a Map-Reduce Job
CN106126601A (en) A kind of social security distributed preprocess method of big data and system
CN104036029B (en) Large data consistency control methods and system
US10114846B1 (en) Balanced distribution of sort order values for a multi-column sort order of a relational database
CN103425762A (en) Telecom operator mass data processing method based on Hadoop platform
Humbetov Data-intensive computing with map-reduce and hadoop
CN102456076A (en) Massive fragment data aggregation system and method
CN115169810A (en) Artificial intelligence system construction method and device for power grid regulation
CN105095247A (en) Symbolic data analysis method and system
CN108255966A (en) A kind of data migration method and storage medium
CN104572505A (en) System and method for ensuring eventual consistency of mass data caches
Miller et al. Open source big data analytics frameworks written in scala
CN108132970A (en) Big data distributed approach and system based on cloud computing
US20160203032A1 (en) Series data parallel analysis infrastructure and parallel distributed processing method therefor
CN109947743A (en) A kind of the NoSQL big data storage method and system of optimization
Lytvyn et al. Development of Intellectual System for Data De-Duplication and Distribution in Cloud Storage.
CN106648891A (en) MapReduce model-based task execution method and apparatus
CN105630997A (en) Data parallel processing method, device and equipment
CN110222105A (en) Data summarization processing method and processing device
Neves et al. Analysis of big data vendors for SMEs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 2018-06-08)