CN106951475A - Cloud-computing-based distributed big data processing method and system - Google Patents

Publication number
CN106951475A
Authority
CN
China
Prior art keywords
file
value
relation
mapping
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710130418.8A
Other languages
Chinese (zh)
Inventor
梁明亮
孙逸洁
刘伟
苏东民
董黎生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Railway Vocational and Technical College
Original Assignee
Zhengzhou Railway Vocational and Technical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Railway Vocational and Technical College
Priority to CN201710130418.8A
Publication of CN106951475A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5015 Service provider selection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A cloud-computing-based distributed big data processing method, comprising the following steps: S1, receiving an input file, dividing it into input splits according to the input file size, and assigning one map task to each input split, where each input split stores the split length and an array of the positions of the recorded data; S2, performing mapping on the data storage nodes with a pre-written mapping function to obtain intermediate files; S3, merging the duplicate key values in the intermediate files; S4, allocating a circular memory buffer in memory, the circular memory buffer being used for outputting the map output files; creating a configuration file in the circular memory buffer; a daemon thread pauses the writing of data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the files in the circular memory buffer are written to disk until all map output files have been output; S5, merging all map output files and storing them on a distributed file storage system.

Description

Cloud-computing-based distributed big data processing method and system
Technical field
The present invention relates to the technical field of big data and cloud computing, and in particular to a cloud-computing-based distributed big data processing method and system.
Background art
With the arrival of the cloud era, big data has attracted increasing attention. The term big data is generally used to describe large volumes of unstructured and semi-structured data, which are loaded into a relational database for analysis. Big data analysis is often associated with cloud computing, because real-time analysis of large data sets requires a framework such as MapReduce to distribute the work across tens, hundreds, or even thousands of computers. Big data requires special techniques to process large amounts of data effectively within a tolerable elapsed time. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining systems, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
In a big data environment, data sources are abundant and data types are diverse, the volume of data to be stored, analyzed, and mined is enormous, the requirements on data presentation are high, and great value is placed on the efficiency and availability of data processing. Traditional data processing methods, however, have the following shortcomings: 1. Traditional data collection has a single source, and the volume of data stored, managed, and analyzed is relatively small, so most of it is handled with relational databases and parallel data warehouses. As regards raising data processing speed through parallel computing, traditional parallel database technology pursues high consistency and fault tolerance, and according to the CAP theorem it is therefore difficult to guarantee availability and scalability. 2. Traditional data processing methods are processor-centric, which greatly increases computational overhead and cannot meet the demand for processing large amounts of unstructured data.
Summary of the invention
In view of this, the present invention proposes a cloud-computing-based distributed big data processing method and system.
A cloud-computing-based distributed big data processing method comprises the following steps:
S1, receiving an input file, dividing it into input splits according to the input file size, and assigning one map task to each input split, where each input split stores the split length and an array of the positions of the recorded data;
S2, performing mapping on the data storage nodes with a pre-written mapping function to obtain intermediate files;
S3, merging the duplicate key values in the intermediate files to reduce redundancy in the map output files; serializing the merged key values to obtain map cache files; automatically obtaining the computational load value of each compute node, and assigning each map cache file to a compute node according to the computational load values of the compute nodes;
S4, allocating a circular memory buffer in memory, the circular memory buffer being used for outputting the map output files; creating a configuration file in the circular memory buffer, and configuring the memory occupancy threshold of the memory buffer in the configuration file; when the memory occupancy of the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses the writing of data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the files in the circular memory buffer are written to disk until all map output files have been output;
S5, merging all map output files and storing them on a distributed file storage system.
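Step S1 can be illustrated with a minimal Python sketch. It assumes newline-delimited records and a byte-based target split size; the names `InputSplit`, `make_splits`, and the `map-N` task ids are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InputSplit:
    """One input split: its length in bytes plus the record positions it covers."""
    length: int
    record_offsets: List[int] = field(default_factory=list)


def make_splits(data: bytes, split_size: int) -> List[InputSplit]:
    """Cut newline-delimited data into splits of roughly split_size bytes.

    A record is never cut in half: a split is closed at the first record
    boundary at or after split_size bytes.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits: List[InputSplit] = []
    current = InputSplit(length=0)
    offset = 0
    for line in data.splitlines(keepends=True):
        current.record_offsets.append(offset)  # position of this record in the file
        current.length += len(line)
        offset += len(line)
        if current.length >= split_size:
            splits.append(current)
            current = InputSplit(length=0)
    if current.record_offsets:  # trailing, possibly shorter, split
        splits.append(current)
    return splits


data = b"a,1\nb,2\nc,3\nd,4\ne,5\n"
splits = make_splits(data, split_size=8)
# One map task per input split (the task ids are hypothetical).
map_tasks = {f"map-{i}": split for i, split in enumerate(splits)}
```

With five 4-byte records and an 8-byte target, this yields two full splits and one shorter trailing split, each paired with its own map task.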
In the cloud-computing-based distributed big data processing method of the present invention, dividing the input file into input splits according to its size in step S1 comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value, and other relation values, and writing the correspondence of the relation values of each input file into the association table;
the data corresponding to each relation value is included in the input splits.
In the cloud-computing-based distributed big data processing method of the present invention, step S2 comprises:
mapping the input splits according to the map tasks with the pre-written mapping function, the mapping comprising: aligning the content of each input split as a list according to a preset data format, and determining whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values are present; a relation value that is present is retained as is, and if one or more relation values are absent, the missing relation values are set to null; the relation values are arranged in the same order throughout.
In the cloud-computing-based distributed big data processing method of the present invention, step S5 comprises:
querying the association table for all index information corresponding to each map output file, and inserting, for each map output file, one corresponding segment of data into a segment list; and recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values of the segment data.
In the cloud-computing-based distributed big data processing method of the present invention, the mapping of the input splits according to the map tasks with the pre-written mapping function in step S2 further comprises determining, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
The present invention also provides a cloud-computing-based distributed big data processing system, comprising the following units:
a splitting unit, configured to receive an input file, divide it into input splits according to the input file size, and assign one map task to each input split, where each input split stores the split length and an array of the positions of the recorded data;
a mapping unit, configured to perform mapping on the data storage nodes with a pre-written mapping function to obtain intermediate files;
a computing unit, configured to merge the duplicate key values in the intermediate files to reduce redundancy in the map output files; to serialize the merged key values to obtain map cache files; and to automatically obtain the computational load value of each compute node and assign each map cache file to a compute node according to the computational load values of the compute nodes;
an output unit, configured to allocate a circular memory buffer in memory, the circular memory buffer being used for outputting the map output files; to create a configuration file in the circular memory buffer and configure the memory occupancy threshold of the memory buffer in the configuration file; when the memory occupancy of the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses the writing of data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the files in the circular memory buffer are written to disk until all map output files have been output;
a merging and storage unit, configured to merge all map output files and store them on a distributed file storage system.
In the cloud-computing-based distributed big data processing system of the present invention, dividing the input file into input splits according to its size in the splitting unit comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value, and other relation values, and writing the correspondence of the relation values of each input file into the association table;
the data corresponding to each relation value is included in the input splits.
In the cloud-computing-based distributed big data processing system of the present invention, the mapping unit is configured for:
mapping the input splits according to the map tasks with the pre-written mapping function, the mapping comprising: aligning the content of each input split as a list according to a preset data format, and determining whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values are present; a relation value that is present is retained as is, and if one or more relation values are absent, the missing relation values are set to null; the relation values are arranged in the same order throughout.
In the cloud-computing-based distributed big data processing system of the present invention, the merging and storage unit is configured for:
querying the association table for all index information corresponding to each map output file, and inserting, for each map output file, one corresponding segment of data into a segment list; and recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values of the segment data.
In the cloud-computing-based distributed big data processing system of the present invention, the mapping of the input splits according to the map tasks with the pre-written mapping function in the mapping unit further comprises determining, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
Compared with the prior art, implementing the cloud-computing-based distributed big data processing method and system provided by the present invention has the following beneficial effects: massive big data is divided into several parts according to preset rules and distributed to multiple processors for parallel processing; the results from each processor are then aggregated to obtain the final result. This makes it possible to process large amounts of unstructured data and increases both the range of data types that can be processed and the processing speed.
Brief description of the drawings
Fig. 1 is a flow chart of the language transfer method in an improved wireless communication process according to an embodiment of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, a cloud-computing-based distributed big data processing method comprises the following steps:
S1, receiving an input file, dividing it into input splits according to the input file size, and assigning one map task to each input split, where each input split stores the split length and an array of the positions of the recorded data;
S2, performing mapping on the data storage nodes with a pre-written mapping function to obtain intermediate files;
S3, merging the duplicate key values in the intermediate files to reduce redundancy in the map output files; serializing the merged key values to obtain map cache files; automatically obtaining the computational load value of each compute node, and assigning each map cache file to a compute node according to the computational load values of the compute nodes;
S4, allocating a circular memory buffer in memory, the circular memory buffer being used for outputting the map output files; creating a configuration file in the circular memory buffer, and configuring the memory occupancy threshold of the memory buffer in the configuration file; when the memory occupancy of the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses the writing of data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the files in the circular memory buffer are written to disk until all map output files have been output;
S5, merging all map output files and storing them on a distributed file storage system.
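Step S3 combines three operations: merging duplicate keys, serializing the result, and placing cache files according to node load. The sketch below is one possible reading in Python. JSON as the serialization format, file size as the load increment, and the greedy least-loaded placement are all assumptions; the patent does not specify any of them, and every function name is hypothetical.

```python
import heapq
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_duplicate_keys(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the values of repeated keys so that each key appears only once."""
    merged: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        merged[key].append(value)
    return dict(merged)


def serialize(merged: Dict[str, List[int]]) -> bytes:
    """Serialize merged key values into one map cache file (JSON is an assumption)."""
    return json.dumps(merged, sort_keys=True).encode("utf-8")


def assign_by_load(cache_files: Dict[str, bytes],
                   node_loads: Dict[str, float]) -> Dict[str, str]:
    """Give each cache file to the node that currently has the lowest load.

    The load added per file is taken to be its size in bytes, which is an
    illustrative choice and not a metric defined by the source.
    """
    heap = [(load, node) for node, load in node_loads.items()]
    heapq.heapify(heap)
    assignment: Dict[str, str] = {}
    # Place large files first so that they spread across nodes.
    for name, blob in sorted(cache_files.items(), key=lambda kv: -len(kv[1])):
        load, node = heapq.heappop(heap)
        assignment[name] = node
        heapq.heappush(heap, (load + len(blob), node))
    return assignment


merged = merge_duplicate_keys([("a", 1), ("b", 2), ("a", 3)])
cache = serialize(merged)
assignment = assign_by_load(
    {"f1": b"x" * 10, "f2": b"x" * 5, "f3": b"x"},
    {"n1": 0.0, "n2": 8.0},
)
```

In the example the idle node `n1` takes the largest file, the next file goes to `n2`, and the smallest file returns to `n1`, which is again the less loaded node at that point.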
In the cloud-computing-based distributed big data processing method of the present invention, dividing the input file into input splits according to its size in step S1 comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value, and other relation values, and writing the correspondence of the relation values of each input file into the association table;
the data corresponding to each relation value is included in the input splits.
By implementing this embodiment of the present invention, data of various types can be split uniformly into the relation values, even if certain types of data lack some of the relation values. Distributed processing is then performed on each relation value, which can greatly improve data processing capability.
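The association table can be pictured as a per-file record of which relation values an input file carries. The following Python sketch assumes that each input file has already been parsed into a dictionary; the field identifiers in `RELATION_FIELDS` and the presence-flag layout of the table are hypothetical, since the patent names the seven relation values but not their representation.

```python
from typing import Any, Dict, Optional

# The seven relation values named in the text; the identifiers are hypothetical.
RELATION_FIELDS = (
    "position", "activity", "structure",
    "function", "functional", "behavior", "other",
)


def split_into_relations(record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Return one entry per relation value; values the record lacks become None."""
    return {name: record.get(name) for name in RELATION_FIELDS}


def build_association_table(files: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, bool]]:
    """For each input file, note which relation values it actually carries."""
    table: Dict[str, Dict[str, bool]] = {}
    for file_id, record in files.items():
        relations = split_into_relations(record)
        table[file_id] = {name: value is not None for name, value in relations.items()}
    return table


files = {
    "sensor.log": {"position": "track 3", "activity": "braking"},
    "layout.xml": {"structure": "tree", "function": "routing"},
}
association_table = build_association_table(files)
```

Both example files end up with the same seven entries in the table, which is what allows heterogeneous data types to be handled uniformly later on.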
In the cloud-computing-based distributed big data processing method of the present invention, step S2 comprises:
mapping the input splits according to the map tasks with the pre-written mapping function, the mapping comprising: aligning the content of each input split as a list according to a preset data format, and determining whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values are present; a relation value that is present is retained as is, and if one or more relation values are absent, the missing relation values are set to null; the relation values are arranged in the same order throughout.
By implementing this embodiment, the content of the input splits is aligned as a list according to the preset data format, so that subsequent compute nodes consume fewer processing resources.
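The list alignment described above amounts to projecting every record onto one fixed column order and filling gaps with null. A minimal Python sketch follows, in which `FIELD_ORDER` and `align_to_format` are hypothetical names and `None` plays the role of the empty relation value.

```python
from typing import Any, Dict, List, Optional

# Preset data format: a fixed column order (identifiers are hypothetical).
FIELD_ORDER = (
    "position", "activity", "structure",
    "function", "functional", "behavior", "other",
)


def align_to_format(split_content: Dict[str, Any]) -> List[Optional[Any]]:
    """Return the relation values in the preset order; missing ones become None."""
    return [split_content.get(name) for name in FIELD_ORDER]


records = [
    {"activity": "braking", "position": "track 3"},
    {"behavior": "stop", "structure": "tree"},
]
rows = [align_to_format(record) for record in records]
```

Because every row has the same length and column order, downstream nodes can read values by index without inspecting each record's keys.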
In the cloud-computing-based distributed big data processing method of the present invention, step S5 comprises:
querying the association table for all index information corresponding to each map output file, and inserting, for each map output file, one corresponding segment of data into a segment list; and recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values of the segment data.
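The segment list built in step S5 can be sketched as follows. This Python example assumes that the association table is keyed by map output file name and holds the relation values as index information; the `Segment` class and `build_segment_list` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

RELATION_FIELDS = (
    "position", "activity", "structure",
    "function", "functional", "behavior", "other",
)


@dataclass
class Segment:
    """One segment per map output file, carrying its seven relation values."""
    output_file: str
    relations: Dict[str, Optional[str]]


def build_segment_list(output_files: List[str],
                       association_table: Dict[str, Dict[str, str]]) -> List[Segment]:
    """Look up each output file's index information and append one segment for it."""
    segments: List[Segment] = []
    for name in output_files:
        index_info = association_table.get(name, {})
        relations = {field: index_info.get(field) for field in RELATION_FIELDS}
        segments.append(Segment(output_file=name, relations=relations))
    return segments


association_table = {"part-0": {"position": "track 3", "behavior": "stop"}}
segments = build_segment_list(["part-0", "part-1"], association_table)
```

A file with no entry in the table, such as `part-1` here, still receives a segment, with all of its relation values left empty.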
In the cloud-computing-based distributed big data processing method of the present invention, the mapping of the input splits according to the map tasks with the pre-written mapping function in step S2 further comprises determining, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
By implementing this embodiment, the data can be checked for redundancy and errors, which reduces the amount of computation.
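The patent does not define what counts as a logic error. The Python sketch below therefore adopts one hypothetical rule purely for illustration: a split is inconsistent if it refers to a file missing from the association table, or if it carries a relation value that the table does not list for that file. All names are assumptions.

```python
from typing import Any, Dict, List, Set


def has_logic_error(split: Dict[str, Any],
                    association_table: Dict[str, Set[str]]) -> bool:
    """Apply the hypothetical consistency rule described in the lead-in."""
    allowed = association_table.get(split["file_id"])
    if allowed is None:
        return True  # the split refers to a file the table does not know
    present = {name for name, value in split["relations"].items() if value is not None}
    return not present <= allowed  # a relation value the table does not list


def drop_invalid(splits: List[Dict[str, Any]],
                 association_table: Dict[str, Set[str]]) -> List[Dict[str, Any]]:
    """Keep only the splits that pass the check; the others are discarded."""
    return [s for s in splits if not has_logic_error(s, association_table)]


table = {"f1": {"position", "activity"}}
splits = [
    {"file_id": "f1", "relations": {"position": "p", "activity": None}},  # consistent
    {"file_id": "f1", "relations": {"structure": "s"}},                   # unlisted value
    {"file_id": "f2", "relations": {}},                                   # unknown file
]
kept = drop_invalid(splits, table)
```

Filtering before the map step means that inconsistent splits never consume map task resources, which matches the stated goal of reducing the amount of computation.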
The present invention also provides a cloud-computing-based distributed big data processing system, comprising the following units:
a splitting unit, configured to receive an input file, divide it into input splits according to the input file size, and assign one map task to each input split, where each input split stores the split length and an array of the positions of the recorded data;
a mapping unit, configured to perform mapping on the data storage nodes with a pre-written mapping function to obtain intermediate files;
a computing unit, configured to merge the duplicate key values in the intermediate files to reduce redundancy in the map output files; to serialize the merged key values to obtain map cache files; and to automatically obtain the computational load value of each compute node and assign each map cache file to a compute node according to the computational load values of the compute nodes;
an output unit, configured to allocate a circular memory buffer in memory, the circular memory buffer being used for outputting the map output files; to create a configuration file in the circular memory buffer and configure the memory occupancy threshold of the memory buffer in the configuration file; when the memory occupancy of the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses the writing of data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the files in the circular memory buffer are written to disk until all map output files have been output;
a merging and storage unit, configured to merge all map output files and store them on a distributed file storage system.
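The threshold-triggered spilling performed by the output unit (step S4) can be sketched in Python as shown below. This is a simplified, single-threaded illustration: a plain list stands in for the circular buffer, spilling happens synchronously instead of in a daemon thread, and the class name, parameters, and spill file naming are all hypothetical.

```python
import os
import tempfile
from typing import List


class SpillBuffer:
    """Bounded in-memory buffer that spills to disk at an occupancy threshold."""

    def __init__(self, capacity_bytes: int, threshold: float, spill_dir: str) -> None:
        self.limit = int(capacity_bytes * threshold)  # occupancy threshold in bytes
        self.spill_dir = spill_dir
        self.records: List[bytes] = []
        self.used = 0
        self.spill_files: List[str] = []

    def write(self, record: bytes) -> None:
        """Buffer one record and spill as soon as the threshold is reached."""
        self.records.append(record)
        self.used += len(record)
        if self.used >= self.limit:
            self.spill()

    def spill(self) -> None:
        """Write the buffered records to a new spill file and free the buffer."""
        if not self.records:
            return
        path = os.path.join(self.spill_dir, f"spill-{len(self.spill_files)}.out")
        with open(path, "wb") as fh:
            fh.writelines(self.records)
        self.spill_files.append(path)
        self.records.clear()
        self.used = 0

    def close(self) -> None:
        """Flush whatever is left so that all map output reaches disk."""
        self.spill()


with tempfile.TemporaryDirectory() as tmp:
    buf = SpillBuffer(capacity_bytes=100, threshold=0.8, spill_dir=tmp)
    for i in range(25):
        buf.write(b"rec%02d\n" % i)  # each record is 6 bytes
    buf.close()
    spilled = [os.path.getsize(path) for path in buf.spill_files]
```

Here the occupancy threshold is 80 bytes, so the first spill happens after 14 records and the remaining 11 records are flushed by `close()`. A production design would normally let a background thread spill while the writer continues into the free part of the ring, which this sketch does not model.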
In the cloud-computing-based distributed big data processing system of the present invention, dividing the input file into input splits according to its size in the splitting unit comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value, and other relation values, and writing the correspondence of the relation values of each input file into the association table;
the data corresponding to each relation value is included in the input splits.
In the cloud-computing-based distributed big data processing system of the present invention, the mapping unit is configured for:
mapping the input splits according to the map tasks with the pre-written mapping function, the mapping comprising: aligning the content of each input split as a list according to a preset data format, and determining whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values are present; a relation value that is present is retained as is, and if one or more relation values are absent, the missing relation values are set to null; the relation values are arranged in the same order throughout.
In the cloud-computing-based distributed big data processing system of the present invention, the merging and storage unit is configured for:
querying the association table for all index information corresponding to each map output file, and inserting, for each map output file, one corresponding segment of data into a segment list; and recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values of the segment data.
In the cloud-computing-based distributed big data processing system of the present invention, the mapping of the input splits according to the map tasks with the pre-written mapping function in the mapping unit further comprises determining, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
Compared with the prior art, implementing the cloud-computing-based distributed big data processing method and system provided by the present invention has the following beneficial effects: massive big data is divided into several parts according to preset rules and distributed to multiple processors for parallel processing; the results from each processor are then aggregated to obtain the final result. This makes it possible to process large amounts of unstructured data and increases both the range of data types that can be processed and the processing speed. The invention can be applied in fields such as intelligent robot control and rail transit control, and has broad application prospects.
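The divide, process in parallel, and aggregate pattern summarized above can be shown with a small Python sketch. Worker threads stand in for the multiple processors, round-robin partitioning stands in for the preset rule, and word counting stands in for the per-part processing; all three are illustrative assumptions, as are the function names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(items: List[str], parts: int) -> List[List[str]]:
    """Divide items into the given number of round-robin chunks."""
    return [items[i::parts] for i in range(parts)]


def count_words(chunk: List[str]) -> Counter:
    """Per-part processing: count the words in one chunk of lines."""
    return Counter(word for line in chunk for word in line.split())


def run(items: List[str], workers: int = 4) -> Counter:
    """Process the chunks concurrently, then aggregate the partial results."""
    chunks = partition(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


result = run(["a b", "b c", "c a", "a"], workers=2)
```

The aggregated result is independent of how the lines are partitioned, which is the property that lets the work be spread over any number of workers.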
It will be understood that a person of ordinary skill in the art can make various other corresponding changes and modifications according to the technical concept of the present invention, and all such changes and modifications shall fall within the scope of protection of the claims of the present invention.

Claims (10)

1. A cloud-computing-based distributed big data processing method, characterized in that it comprises the following steps:
S1, receiving an input file, dividing it into input splits according to the input file size, and assigning one map task to each input split, where each input split stores the split length and an array of the positions of the recorded data;
S2, performing mapping on the data storage nodes with a pre-written mapping function to obtain intermediate files;
S3, merging the duplicate key values in the intermediate files to reduce redundancy in the map output files; serializing the merged key values to obtain map cache files; automatically obtaining the computational load value of each compute node, and assigning each map cache file to a compute node according to the computational load values of the compute nodes;
S4, allocating a circular memory buffer in memory, the circular memory buffer being used for outputting the map output files; creating a configuration file in the circular memory buffer, and configuring the memory occupancy threshold of the memory buffer in the configuration file; when the memory occupancy of the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses the writing of data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the files in the circular memory buffer are written to disk until all map output files have been output;
S5, merging all map output files and storing them on a distributed file storage system.
2. The cloud-computing-based distributed big data processing method according to claim 1, characterized in that dividing the input file into input splits according to its size in step S1 comprises:
establishing an association table, splitting the input file into a position relation value, an activity relation value, a structure relation value, a function relation value, a functional relation value, a behavior relation value, and other relation values, and writing the correspondence of the relation values of each input file into the association table;
the data corresponding to each relation value is included in the input splits.
3. The cloud-computing-based distributed big data processing method according to claim 2, characterized in that step S2 comprises:
mapping the input splits according to the map tasks with the pre-written mapping function, the mapping comprising: aligning the content of each input split as a list according to a preset data format, and determining whether the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values are present; a relation value that is present is retained as is, and if one or more relation values are absent, the missing relation values are set to null; the relation values are arranged in the same order throughout.
4. The cloud-computing-based distributed big data processing method according to claim 3, characterized in that step S5 comprises:
querying the association table for all index information corresponding to each map output file, and inserting, for each map output file, one corresponding segment of data into a segment list; and recording the position relation value, activity relation value, structure relation value, function relation value, functional relation value, behavior relation value, and other relation values of the segment data.
5. The cloud-computing-based distributed big data processing method according to claim 3, characterized in that the mapping of the input splits according to the map tasks with the pre-written mapping function in step S2 further comprises determining, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
6. A cloud-computing-based distributed big data processing system, characterized in that it comprises the following units:
a splitting unit, configured to receive an input file, divide it into input splits according to the input file size, and assign one map task to each input split, where each input split stores the split length and an array of the positions of the recorded data;
a mapping unit, configured to perform mapping on the data storage nodes with a pre-written mapping function to obtain intermediate files;
a computing unit, configured to merge the duplicate key values in the intermediate files to reduce redundancy in the map output files; to serialize the merged key values to obtain map cache files; and to automatically obtain the computational load value of each compute node and assign each map cache file to a compute node according to the computational load values of the compute nodes;
an output unit, configured to allocate a circular memory buffer in memory, the circular memory buffer being used for outputting the map output files; to create a configuration file in the circular memory buffer and configure the memory occupancy threshold of the memory buffer in the configuration file; when the memory occupancy of the circular memory buffer is greater than or equal to the occupancy threshold, a daemon thread pauses the writing of data into memory and writes a spill file in memory, the spill file determining which files are written to disk, and the files in the circular memory buffer are written to disk until all map output files have been output;
a merging and storage unit, configured to merge all map output files and store them on a distributed file storage system.
7. The big data distributed processing system based on cloud computing according to claim 6, characterized in that dividing the input file into input splits according to its size in the splitting unit comprises:
establishing an association table, splitting the input file into a position relationship value, activity relationship value, structural relationship value, functional relationship value, functional relationship value, behavior relationship value and other relationship values, and writing the correspondence between the relationship values of each input file into the association table;
each input split contains the data corresponding to each relationship value.
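A minimal sketch of the association table in claim 7 follows, under stated assumptions. The function `split_by_relation`, the key names in `RELATION_KEYS`, and the choice to record a per-relation record count as the "correspondence" are hypothetical. The claim lists "functional relationship value" twice; the sketch uses it once.

```python
# Assumed short names for the relationship values named in the claim.
RELATION_KEYS = ("position", "activity", "structure",
                 "function", "behavior", "other")


def split_by_relation(file_name, records, table):
    """Group (relation_key, payload) records of one input file.

    Unknown relation keys are filed under "other". The association table
    receives, per file, how many payloads each relation value holds.
    """
    split = {key: [] for key in RELATION_KEYS}
    for key, payload in records:
        target = key if key in split else "other"
        split[target].append(payload)
    table[file_name] = {key: len(values) for key, values in split.items()}
    return split
```

The returned dictionary plays the role of an input split that carries the data for every relationship value, while `table` keeps the per-file correspondence.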
8. The big data distributed processing method based on cloud computing according to claim 7, characterized in that the mapping unit is configured such that:
the input splits are mapped according to the mapping tasks by the pre-written mapping function, the mapping comprising aligning the content of each input split into list columns according to a preset data format and determining whether the position relationship value, activity relationship value, structural relationship value, functional relationship value, functional relationship value, behavior relationship value and other relationship values exist; each relationship value that exists is retained directly, and any relationship value or values that are missing are set to null; the order of the relationship values is kept consistent.
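The column alignment in claim 8 can be sketched as follows. The fixed order in `RELATION_ORDER` and the function name `align_record` are assumptions for illustration, and Python's `None` stands in for the null value.

```python
# Assumed fixed column order; the claim only requires that it be consistent.
RELATION_ORDER = ("position", "activity", "structure",
                  "function", "behavior", "other")


def align_record(record):
    """Return the relationship values of one record in a fixed column order.

    Values that are present are kept unchanged; missing ones become None.
    """
    return [record.get(key) for key in RELATION_ORDER]
```

Because every record is emitted with the same columns in the same order, later stages can address a relationship value by position.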
9. The big data distributed processing system based on cloud computing according to claim 8, characterized in that
the merging and storage unit is configured such that:
all index information corresponding to each mapping output file is queried from the association table, and each mapping output file is inserted into a segment list as one corresponding data segment; the position relationship value, activity relationship value, structural relationship value, functional relationship value, functional relationship value, behavior relationship value and other relationship values of the segment data are recorded.
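One way to picture the segment list in claim 9 is the sketch below. The function name `build_segment_list` and the dictionary layout of a segment are hypothetical, and the association table is assumed to map a file name to its relationship values.

```python
def build_segment_list(output_files, association_table):
    """Turn each mapping output file into one segment.

    output_files maps file name -> file content. Each segment records the
    index information found for that file in the association table.
    """
    segments = []
    for name, data in output_files.items():
        index_info = association_table.get(name, {})
        segments.append({
            "file": name,
            "data": data,
            "relations": dict(index_info),  # copy, so the table stays untouched
        })
    return segments
```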
10. The big data distributed processing system based on cloud computing according to claim 9, characterized in that
mapping the input splits according to the mapping tasks by the pre-written mapping function in the mapping unit further comprises determining, according to the association table, whether an input split contains a logic error, and discarding the input split if it does.
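The claim does not define what counts as a logic error. Purely as an assumption, the sketch below treats a split as erroneous when its source file is missing from the association table or when it carries a relationship value the table does not list for that file. The function name `drop_inconsistent_splits` and the split layout are hypothetical.

```python
def drop_inconsistent_splits(splits, association_table):
    """Keep only splits that agree with the association table.

    Each split is a dict with a "file" name and a "relations" dict.
    """
    kept = []
    for split in splits:
        registered = association_table.get(split["file"])
        if registered is None:
            continue  # unknown source file: treated as a logic error
        if all(key in registered for key in split["relations"]):
            kept.append(split)
    return kept
```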
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710130418.8A CN106951475A (en) 2017-03-07 2017-03-07 Big data distributed approach and system based on cloud computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710130418.8A CN106951475A (en) 2017-03-07 2017-03-07 Big data distributed approach and system based on cloud computing

Publications (1)

Publication Number Publication Date
CN106951475A (en) 2017-07-14

Family

ID=59467025

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710130418.8A Pending CN106951475A (en) 2017-03-07 2017-03-07 Big data distributed approach and system based on cloud computing

Country Status (1)

Country Link
CN (1) CN106951475A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132970A (en) * 2017-12-04 2018-06-08 洛阳师范学院 Big data distributed approach and system based on cloud computing
CN109033137A (en) * 2018-06-06 2018-12-18 千寻位置网络有限公司 Dynamic RINEX date storage method and device
CN109117275A (en) * 2018-08-31 2019-01-01 平安科技(深圳)有限公司 Account checking method, device, computer equipment and storage medium based on data fragmentation
CN109584068A (en) * 2018-11-02 2019-04-05 深圳市快付通金融网络科技服务有限公司 A kind of distribution of funds formula liquidation method, apparatus and system
CN110019234A (en) * 2017-12-28 2019-07-16 中国电信股份有限公司 Method and system for fragment storing data
CN110955637A (en) * 2019-11-27 2020-04-03 集奥聚合(北京)人工智能科技有限公司 Method for realizing ordering of oversized files based on low memory
CN111339041A (en) * 2020-03-10 2020-06-26 中国建设银行股份有限公司 File parsing and warehousing and file generating method and device
CN112416865A (en) * 2020-11-20 2021-02-26 中国建设银行股份有限公司 File processing method and device based on big data
CN112529736A (en) * 2020-12-28 2021-03-19 成都工百利自动化设备有限公司 Online wave recording method and system for distributed power grid
CN112653771A (en) * 2021-03-15 2021-04-13 浙江贵仁信息科技股份有限公司 Water conservancy data fragment storage method, on-demand method and processing system
CN113608775A (en) * 2021-06-18 2021-11-05 天津津航计算技术研究所 Flow configuration method based on direct memory read-write
CN113835634A (en) * 2021-09-23 2021-12-24 中国自然资源航空物探遥感中心 Multi-parameter data synchronous recording method and device based on annular memory double buffering

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140379632A1 (en) * 2013-06-19 2014-12-25 International Business Machines Corporation Smarter big data processing using collaborative map reduce frameworks
CN106202278A (en) * 2016-07-01 2016-12-07 武汉泰迪智慧科技有限公司 A kind of public sentiment based on data mining technology monitoring system

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108132970A (en) * 2017-12-04 2018-06-08 洛阳师范学院 Big data distributed approach and system based on cloud computing
CN110019234A (en) * 2017-12-28 2019-07-16 中国电信股份有限公司 Method and system for fragment storing data
CN109033137B (en) * 2018-06-06 2021-11-05 千寻位置网络有限公司 Dynamic RINEX data storage method and device
CN109033137A (en) * 2018-06-06 2018-12-18 千寻位置网络有限公司 Dynamic RINEX date storage method and device
CN109117275A (en) * 2018-08-31 2019-01-01 平安科技(深圳)有限公司 Account checking method, device, computer equipment and storage medium based on data fragmentation
WO2020042427A1 (en) * 2018-08-31 2020-03-05 平安科技(深圳)有限公司 Reconciliation method and apparatus based on data fragments, computer device, and storage medium
CN109117275B (en) * 2018-08-31 2024-05-28 平安科技(深圳)有限公司 Account checking method and device based on data slicing, computer equipment and storage medium
CN109584068A (en) * 2018-11-02 2019-04-05 深圳市快付通金融网络科技服务有限公司 A kind of distribution of funds formula liquidation method, apparatus and system
CN110955637A (en) * 2019-11-27 2020-04-03 集奥聚合(北京)人工智能科技有限公司 Method for realizing ordering of oversized files based on low memory
CN111339041A (en) * 2020-03-10 2020-06-26 中国建设银行股份有限公司 File parsing and warehousing and file generating method and device
CN111339041B (en) * 2020-03-10 2024-01-12 中国建设银行股份有限公司 File analysis and storage method and device and file generation method and device
CN112416865A (en) * 2020-11-20 2021-02-26 中国建设银行股份有限公司 File processing method and device based on big data
CN112529736A (en) * 2020-12-28 2021-03-19 成都工百利自动化设备有限公司 Online wave recording method and system for distributed power grid
CN112653771B (en) * 2021-03-15 2021-06-01 浙江贵仁信息科技股份有限公司 Water conservancy data fragment storage method, on-demand method and processing system
CN112653771A (en) * 2021-03-15 2021-04-13 浙江贵仁信息科技股份有限公司 Water conservancy data fragment storage method, on-demand method and processing system
CN113608775A (en) * 2021-06-18 2021-11-05 天津津航计算技术研究所 Flow configuration method based on direct memory read-write
CN113608775B (en) * 2021-06-18 2023-10-13 天津津航计算技术研究所 Flow configuration method based on memory direct reading and writing
CN113835634A (en) * 2021-09-23 2021-12-24 中国自然资源航空物探遥感中心 Multi-parameter data synchronous recording method and device based on annular memory double buffering
CN113835634B (en) * 2021-09-23 2024-09-17 中国自然资源航空物探遥感中心 Multi-parameter data synchronous recording method and device based on annular memory double buffering

Similar Documents

Publication Publication Date Title
CN106951475A (en) Big data distributed approach and system based on cloud computing
WO2018214388A1 (en) Multi-platform big data system and method for aviation electronics
CN107766402A (en) A kind of building dictionary cloud source of houses big data platform
CN110674154B (en) Spark-based method for inserting, updating and deleting data in Hive
Gilbert Simulation: A new way of doing social science
CN107544984A (en) A kind of method and apparatus of data processing
CN107301214A (en) Data migration method, device and terminal device in HIVE
CN104036029A (en) Big data consistency comparison method and system
CN103699660A (en) Large-scale network streaming data cache-write method
CN108255966A (en) A kind of data migration method and storage medium
CN106570145B (en) Distributed database result caching method based on hierarchical mapping
WO2018214387A1 (en) Distributed mining system and method for aviation-oriented electronic data
CN106528898A (en) Method and device for converting data of non-relational database into relational database
Jun et al. Cloud computing based solution to decision making
CN106055590A (en) Power grid data processing method and system based on big data and graph database
US20230106106A1 (en) Text backup method, apparatus, and device, and computer-readable storage medium
CN104219088A (en) Hive-based network alarm information OLAP method
KR101955376B1 (en) Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method
Jin et al. Association rules redundancy processing algorithm based on hypergraph in data mining
CN108132970A (en) Big data distributed approach and system based on cloud computing
CN109947743A (en) A kind of the NoSQL big data storage method and system of optimization
CN107679133B (en) Mining method applicable to massive real-time PMU data
Ravichandran Big Data processing with Hadoop: a review
Anusha et al. Big data techniques for efficient storage and processing of weather data
CN116227989A (en) Multidimensional business informatization supervision method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170714