CN104320460A - Big data processing method - Google Patents

Info

Publication number
CN104320460A
Authority
CN
China
Prior art keywords
user
processing method
data
data processing
hadoop
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410577834.9A
Other languages
Chinese (zh)
Inventor
王茜
李安颖
史晨昱
葛新
梁小江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Following International Information Ltd Co
Original Assignee
Xi'an Following International Information Ltd Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Following International Information Ltd Co
Priority to CN201410577834.9A
Publication of CN104320460A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers


Abstract

The invention discloses a big data processing method comprising the following steps: creating a Hadoop cluster on an OpenStack cloud platform to provide the basic environment for big data processing; importing data into HDFS and Swift to create data sources; and having the user process the data in the data sources created in step 2, with the result either displayed on a web page or written to an output file under a specified path. With this big data processing method based on OpenStack and Hadoop, server resource utilization is improved and the barrier to entry for big data is lowered.

Description

A big data processing method
Technical field
The invention belongs to the technical field of big data processing and relates to a big data processing method.
Background art
With the spread of the networked information age, the mobile Internet, social networks and e-commerce have greatly expanded the boundaries and applications of the Internet. We live in a "big data" era of explosive data growth. Big data has a far-reaching influence on the economy, politics, culture and everyday life, and the big data era brings new opportunities and challenges to humanity's ability to manage data. Big data is massive, diverse, fast-moving and variable; its data types are many, its value density is relatively low, and its timeliness requirements are high, all of which exceed the processing capacity of traditional database systems. Valuable patterns and information are hidden in the data, but mining them with traditional data processing takes a long time and incurs a huge cost, and some data cannot be processed at all. The wave of cloud computing and the big data revolution has driven the development of the data analysis industry: cloud computing provides the base platform, and big data applications run on it. This is currently recognized as one of the most efficient ways to process big data, and using cloud computing for big data analysis is bound to become one of the future trends. Big data analysis, with Hadoop applications as its representative, is one of the workloads best suited to running on a cloud platform.
OpenStack is an open-source cloud computing technology whose main task is to simplify cloud deployment and give the cloud good scalability.
To process and analyze big data quickly and conveniently and to mine the value of the data, we propose a new processing method based on OpenStack Sahara, with which the information in big data can be mined quickly and at low cost.
Summary of the invention
The object of the invention is to provide a big data processing method that improves server resource utilization and lowers the barrier to entry for big data.
The technical solution of the invention is a big data processing method, implemented according to the following steps:
Step 1: create a Hadoop cluster on the OpenStack cloud platform to provide the basic environment for big data processing;
Step 2: create data sources by importing data into HDFS and Swift;
Step 3: the user processes the data in the data sources created in step 2, and the result is displayed on a web page or written to an output file under a specified path.
The invention is further characterized as follows.
Step 1 is implemented according to the following steps:
Step 1.1: the user applies for an OpenStack account and uses it to log in to the OpenStack cloud platform;
Step 1.2: the user uploads an image to the OpenStack cloud platform and registers it;
Step 1.3: the user creates a network and router, node group templates and a cluster template;
Step 1.4: the user creates the Hadoop cluster by selecting the plugin and Hadoop version, entering a cluster name, and selecting a cluster template, base image, key pair and network.
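The inputs listed in step 1.4 can be pictured as a single cluster-creation request. The following is a minimal sketch under stated assumptions: the field names, the helper `build_cluster_request` and all example values are hypothetical and chosen only to mirror the inputs named in step 1.4; they are not a confirmed Sahara request schema, and no request is sent anywhere.

```python
# Hypothetical sketch: the field names below are assumptions that mirror the
# inputs of step 1.4 (plugin, Hadoop version, cluster name, cluster template,
# base image, key pair, network). They are not a confirmed Sahara schema.
REQUIRED_FIELDS = (
    "name",
    "plugin_name",
    "hadoop_version",
    "cluster_template_id",
    "base_image_id",
    "keypair_name",
    "network_id",
)


def build_cluster_request(**fields):
    """Return a cluster-creation request dict, or raise if an input is missing."""
    missing = [f for f in REQUIRED_FIELDS if not fields.get(f)]
    if missing:
        raise ValueError("missing cluster inputs: " + ", ".join(missing))
    return {f: fields[f] for f in REQUIRED_FIELDS}


# Placeholder values for illustration only.
request = build_cluster_request(
    name="demo-cluster",
    plugin_name="vanilla",
    hadoop_version="2.4.1",
    cluster_template_id="template-id-placeholder",
    base_image_id="image-id-placeholder",
    keypair_name="demo-keypair",
    network_id="network-id-placeholder",
)
print(request["name"])
```

Collecting the inputs in one validated structure makes it clear that steps 1.2 and 1.3 (image, network, templates) have to be completed before step 1.4 can succeed.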
In step 2, the data sources comprise an HDFS data source and a Swift data source.
In step 3, the user processes the data by either a user-interface method or a command-line method.
In the user-interface method, the user interacts through the user interface to create Job Binaries and a Job, runs the job, and views the execution result on a web page.
In the command-line method, the user submits and runs the job with commands in a command-line interface and views the results in the output file under the specified output path.
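As an illustration of the command-line method, the sketch below builds a `hadoop jar` invocation and parses result lines of the kind a text-output job writes. It is only a sketch under assumptions: the jar name, class name and paths are hypothetical, the command is constructed but never executed, and the tab-separated `key<TAB>value` layout is assumed for the output file.

```python
import shlex


def build_hadoop_command(jar_path, main_class, input_path, output_path):
    """Build (but do not run) a `hadoop jar` command as an argument list."""
    return ["hadoop", "jar", jar_path, main_class, input_path, output_path]


def parse_output_lines(lines):
    """Parse `key<TAB>value` lines, the layout assumed for the output file."""
    results = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        results[key] = value
    return results


# Hypothetical jar, class and paths, for illustration only.
command = build_hadoop_command(
    "wordcount.jar", "WordCount", "/user/demo/input", "/user/demo/output"
)
print(shlex.join(command))

# Stand-in for the lines read from an output file under the output path.
sample_output = ["hadoop\t3\n", "openstack\t2\n"]
print(parse_output_lines(sample_output))
```

In practice the sample lines would be read from the file under the output path that the user specified when submitting the job.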
In step 3, the user processes the data with the Map-Reduce framework of Hadoop.
The beneficial effects of the invention are as follows. Sahara makes it possible to deploy a Hadoop cluster rapidly in an OpenStack cloud environment. Acting as a bridge between cloud computing and big data, it promotes the integration of the OpenStack cloud platform and Hadoop, so that the information in big data can be mined quickly and at low cost. This improves server resource utilization and greatly lowers the barrier to entry for big data; running big data applications on a cloud platform is one of the most efficient ways to process big data.
Brief description of the drawings
Fig. 1 is a flowchart of the big data processing method of the invention;
Fig. 2 is a schematic diagram of the Hadoop cluster creation process in the method of the invention;
Fig. 3 is a flowchart of the Map-Reduce processing in the method of the invention.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and specific embodiments.
The big data processing method of the invention, as shown in Fig. 1, comprises the following steps:
Step 1: create a Hadoop cluster on the OpenStack cloud platform to provide the basic environment for big data processing;
As shown in Fig. 2, step 1 is implemented according to the following steps:
Step 1.1: the user applies for an OpenStack account and uses it to log in to the OpenStack cloud platform;
Step 1.2: the user uploads an image to the OpenStack cloud platform and registers it;
Step 1.3: the user creates a network and router, node group templates and a cluster template;
Step 1.4: the user creates the Hadoop cluster by selecting the plugin and Hadoop version, entering a cluster name, and selecting a cluster template, base image, key pair and network;
Step 2: create data sources by importing data into HDFS and Swift;
In step 2, the data sources comprise an HDFS data source and a Swift data source.
An HDFS data source is defined by an input/output data source name, the data source type HDFS, and an input/output URL path.
A Swift data source is defined by an input/output data source name, the data source type Swift, an input/output URL path, and a user name and password.
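The two kinds of data source described above differ only in their type and in whether credentials are required. A minimal sketch of that distinction follows; the `DataSource` class, its field names, the example URLs and the rule that the URL scheme must match the type are all assumptions made for illustration, not part of the method as claimed.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class DataSource:
    """Hypothetical descriptor for an input or output data source."""
    name: str
    source_type: str                 # "hdfs" or "swift"
    url: str
    username: Optional[str] = None   # only needed for Swift
    password: Optional[str] = None   # only needed for Swift

    def validate(self) -> None:
        if self.source_type not in ("hdfs", "swift"):
            raise ValueError(f"unsupported data source type: {self.source_type}")
        # Assumption: the URL scheme is expected to match the source type.
        scheme = urlparse(self.url).scheme
        if scheme != self.source_type:
            raise ValueError(
                f"URL scheme {scheme!r} does not match type {self.source_type!r}"
            )
        if self.source_type == "swift" and not (self.username and self.password):
            raise ValueError("a Swift data source needs a user name and password")


# Placeholder names, hosts and credentials for illustration only.
hdfs_input = DataSource("demo-input", "hdfs", "hdfs://namenode:8020/user/demo/input")
swift_output = DataSource(
    "demo-output", "swift", "swift://demo-container/output",
    username="demo-user", password="demo-password",
)
for source in (hdfs_input, swift_output):
    source.validate()
```

Keeping input and output sources in the same structure reflects the text above, where each data source name is designated as either an input or an output.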
Step 3: the user can process the data in one of two ways. The first is through the user interface: the user creates Job Binaries, creates a job, runs the job and views the execution result on a web page. The second is through the command-line interface: the user submits and runs the job with commands and views the results in the output file under the specified output path. The data processing itself uses the Map-Reduce framework of Hadoop. Map-Reduce amounts to decomposing a task and aggregating the results. The processing procedure is shown in Fig. 3:
Map stage: the Hadoop Map/Reduce framework spawns one map task for each InputSplit, and each InputSplit is generated by the job's InputFormat. The framework groups all intermediate values associated with a given key; the Mapper outputs are sorted and then partitioned per Reducer.
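The Map stage described above can be imitated in memory to show the flow from input splits to per-Reducer partitions. This is a sketch only, not Hadoop code: `make_splits`, `map_task` and `partition` are hypothetical helpers, word counting is chosen as the example job, and a CRC32 hash stands in for whatever partitioning the cluster actually uses.

```python
import zlib
from collections import defaultdict


def make_splits(lines, lines_per_split):
    """Cut the input into fixed-size splits; one map task runs per split."""
    return [lines[i:i + lines_per_split]
            for i in range(0, len(lines), lines_per_split)]


def map_task(split):
    """Emit (word, 1) pairs for one split, sorted by key."""
    pairs = [(word.lower(), 1) for line in split for word in line.split()]
    return sorted(pairs)


def partition(pairs, num_reducers):
    """Assign each pair to a reducer using a stable hash of its key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        reducer = zlib.crc32(key.encode("utf-8")) % num_reducers
        buckets[reducer].append((key, value))
    return dict(buckets)


lines = ["big data on openstack", "hadoop on openstack", "big data big value"]
splits = make_splits(lines, lines_per_split=2)
map_outputs = [map_task(split) for split in splits]
partitions = [partition(output, num_reducers=2) for output in map_outputs]
print(len(splits), "splits,", len(map_outputs), "map outputs")
```

Because the hash depends only on the key, every occurrence of a given word is sent to the same reducer regardless of which split it came from, which is what allows the Reduce stage to see all values for a key together.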
Reduce stage: a Reducer has three primary phases: shuffle, sort and reduce.
Shuffle: the input to the Reducer is the sorted output of the Mappers. In this phase the framework fetches, via HTTP, the relevant partition of the output of all the Mappers for each Reducer.
Sort: in this phase the framework groups the Reducer inputs by key (since different Mappers may have output the same key).
The shuffle and sort phases occur simultaneously; map outputs are merged as they are fetched.
Reduce: in this phase the framework calls the reduce method once for each <key, (list of values)> pair in the grouped inputs. The output of the reduce task is typically written to the file system via OutputCollector.collect.
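The three Reducer phases can likewise be sketched in memory. The example below is an illustration under assumptions, not Hadoop's implementation: `shuffle_and_sort` and `reduce_phase` are hypothetical helpers, the mapper outputs are hard-coded sample data, and a plain list stands in for the file system that receives the reduce output.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(sorted_map_outputs):
    """Merge already-sorted mapper outputs into one key-ordered stream."""
    return heapq.merge(*sorted_map_outputs, key=itemgetter(0))


def reduce_phase(merged_pairs, reduce_fn):
    """Group the merged stream by key and call reduce_fn once per key."""
    output = []
    for key, group in groupby(merged_pairs, key=itemgetter(0)):
        values = [value for _, value in group]
        output.append((key, reduce_fn(key, values)))
    return output


# Sample sorted outputs from two mappers; both contain the key "big".
map_outputs = [
    [("big", 1), ("data", 1), ("hadoop", 1)],
    [("big", 1), ("big", 1), ("data", 1)],
]
result = reduce_phase(
    shuffle_and_sort(map_outputs),
    lambda key, values: sum(values),
)
print(result)  # [('big', 3), ('data', 2), ('hadoop', 1)]
```

Merging sorted streams instead of re-sorting everything mirrors the statement that map outputs are merged while they are being fetched.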

Claims (5)

1. A big data processing method, characterized in that it is implemented according to the following steps:
Step 1: create a Hadoop cluster on the OpenStack cloud platform to provide the basic environment for big data processing;
Step 2: create data sources by importing data into HDFS and Swift;
Step 3: the user processes the data in the data sources created in step 2, and the result is displayed on a web page or written to an output file under a specified path.
2. The big data processing method according to claim 1, characterized in that said step 1 is implemented according to the following steps:
Step 1.1: the user applies for an OpenStack account and uses it to log in to the OpenStack cloud platform;
Step 1.2: the user uploads an image to the OpenStack cloud platform and registers it;
Step 1.3: the user creates a network and router, node group templates and a cluster template;
Step 1.4: the user creates the Hadoop cluster by selecting the plugin and Hadoop version, entering a cluster name, and selecting a cluster template, base image, key pair and network.
3. The big data processing method according to claim 1, characterized in that, in step 2, the data sources comprise an HDFS data source and a Swift data source.
4. The big data processing method according to claim 1, characterized in that, in said step 3, the user processes the data by either a user-interface method or a command-line method,
wherein in said user-interface method the user interacts through the user interface to create Job Binaries and a Job, runs the job, and views the execution result on a web page;
and in said command-line method the user submits and runs the job with commands in a command-line interface and views the results in the output file under the specified output path.
5. The big data processing method according to any one of claims 1 to 4, characterized in that, in step 3, the user processes the data with the Map-Reduce framework of Hadoop.
CN201410577834.9A 2014-10-24 2014-10-24 Big data processing method Pending CN104320460A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410577834.9A CN104320460A (en) 2014-10-24 2014-10-24 Big data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410577834.9A CN104320460A (en) 2014-10-24 2014-10-24 Big data processing method

Publications (1)

Publication Number Publication Date
CN104320460A true CN104320460A (en) 2015-01-28

Family

ID=52375629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410577834.9A Pending CN104320460A (en) 2014-10-24 2014-10-24 Big data processing method

Country Status (1)

Country Link
CN (1) CN104320460A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104734892A (en) * 2015-04-02 2015-06-24 江苏物联网研究发展中心 Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack
CN104767813A (en) * 2015-04-08 2015-07-08 江苏国盾科技实业有限责任公司 Public bank big data service platform based on openstack
CN106971011A (en) * 2017-05-19 2017-07-21 肇庆市智高电机有限公司 A kind of big data analysis method based on cloud platform
CN108241722A (en) * 2016-12-23 2018-07-03 北京金山云网络技术有限公司 A kind of data processing system, method and device
CN110647379A (en) * 2018-06-27 2020-01-03 复旦大学 Hadoop cluster automatic telescopic deployment and Plugin deployment method based on OpenStack cloud
CN113341899A (en) * 2015-10-09 2021-09-03 费希尔-罗斯蒙特系统公司 Distributed industrial performance monitoring and analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130332568A1 (en) * 2012-06-11 2013-12-12 France Telecom Method of data processing by a navigation module
CN103561061A (en) * 2013-10-17 2014-02-05 南京邮电大学 Flexible cloud data mining platform deploying method
CN104065716A (en) * 2014-06-18 2014-09-24 江苏物联网研究发展中心 OpenStack based Hadoop service providing method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ECCP R&D team blog: "ECCP cloud platform Hadoop cluster user guide", 《HTTPS://WWW.TUICOOL.COM/ARTICLES/JRI3EYM》 *
一棹凌烟: "A brief analysis of Sahara, part 3: how to use Sahara", 《HTTPS://WWW.TUICOOL.COM/ARTICLES/7RBQNUN》 *
杨赛: "Overview of Sahara, the OpenStack big data project", 《HTTP://WWW.INFOQ.COM/CN/NEWS/2014/04/OPENSTACK-SAHARA/》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104734892A (en) * 2015-04-02 2015-06-24 江苏物联网研究发展中心 Automatic deployment system for big data processing system Hadoop on cloud platform OpenStack
CN104767813A (en) * 2015-04-08 2015-07-08 江苏国盾科技实业有限责任公司 Public bank big data service platform based on openstack
CN104767813B (en) * 2015-04-08 2018-06-08 江苏国盾科技实业有限责任公司 Public's row big data service platform based on openstack
CN113341899A (en) * 2015-10-09 2021-09-03 费希尔-罗斯蒙特系统公司 Distributed industrial performance monitoring and analysis
CN108241722A (en) * 2016-12-23 2018-07-03 北京金山云网络技术有限公司 A kind of data processing system, method and device
CN106971011A (en) * 2017-05-19 2017-07-21 肇庆市智高电机有限公司 A kind of big data analysis method based on cloud platform
CN110647379A (en) * 2018-06-27 2020-01-03 复旦大学 Hadoop cluster automatic telescopic deployment and Plugin deployment method based on OpenStack cloud
CN110647379B (en) * 2018-06-27 2023-10-17 复旦大学 Method for carrying out Hadoop cluster automatic telescopic deployment and Plugin deployment based on OpenStack cloud

Similar Documents

Publication Publication Date Title
CN104320460A (en) Big data processing method
MX2020014293A (en) Artificial intelligence-based generation of sequencing metadata.
CN107784026B (en) ETL data processing method and device
CN105793822B (en) Dynamic shuffle reconfiguration
WO2020257812A3 (en) Modeling dependencies with global self-attention neural networks
CN105159148B (en) Robot instruction processing method and device
KR20210036226A (en) A distributed computing system including multiple edges and cloud, and method for providing model for using adaptive intelligence thereof
CN106919697B (en) Method for simultaneously importing data into multiple Hadoop assemblies
CN104809231A (en) Mass web data mining method based on Hadoop
WO2005008414A3 (en) Method and apparatus for parallel action processing
Wang et al. Research of massive web log data mining based on cloud computing
CN103514769A (en) Intelligent learning line design system and method
CN106502842A (en) Data reconstruction method and system
CN103853938A (en) High-throughput sequencing data processing and analysis flow control method
CN106599244B (en) General original log cleaning device and method
CN105808577A (en) HBase database-based data batch loading method and device
CN114565316A (en) Task issuing method based on micro-service architecture and related equipment
CN105137839A (en) Cradle head control method and system based on memory database
CN110414939A (en) A method of list is rendered based on component element arrangements and saves form data
Kopczyńska et al. Structured meetings for non-functional requirements elicitation
CN108205578A (en) Index generation method and device
CN105653552B (en) Structured document processing method, device and equipment
CN115455036B (en) Processing method, device, equipment and medium of joint statement
US11442792B2 (en) Systems and methods for dynamic partitioning in distributed environments
US20220172136A1 (en) System for analyzing characteristics of internet of services

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150128