CN109308209A

CN109308209A - A big data virtualization operation method

Info

Publication number: CN109308209A
Application number: CN201710622208.0A
Authority: CN
Inventors: 李笠
Original assignee: Runze Technology Development Co Ltd
Current assignee: Runze Technology Development Co Ltd
Priority date: 2017-07-27
Filing date: 2017-07-27
Publication date: 2019-02-05

Abstract

The invention discloses a big data virtualization operation method comprising the following steps: 1) virtualizing the infrastructure; 2) instantiating virtual machines; 3) installing Eucalyptus; 4) establishing a service layer; 5) processing massive data in relational databases; 6) procedural operation. The invention uses cloud computing virtualization and big data technology to integrate infrastructure resources, providing the platform with highly scalable computing and storage capacity that is easy to manage in a unified way. To address the scalability of data mining algorithms, the interfaces are optimized using several design patterns, so that the parameter configuration interface of the presentation layer is loosely coupled to the S-PLUS logic that analyzes the data. The vertically scaled relational database is turned into a horizontally scaled distributed database, which alleviates the data explosion problem.

Description

A big data virtualization operation method

Technical field

The invention belongs to the field of data technology and relates in particular to a big data virtualization operation method.

Background technique

In recent years, big data has swept the globe like a tide, changing the way people live, work, and think. The industry usually summarizes the characteristics of big data with four Vs. The first is Volume: data sizes have grown from the TB level to the PB level. The second is Variety: data is divided into structured and unstructured data. Compared with the earlier, mostly text-based structured data that was easy to store, unstructured data is increasingly common, including web logs, audio, video, pictures, and geographic location information, and these many data types place higher demands on data processing capability. The third is Value, meaning low value density: value density is inversely proportional to the total amount of data. Taking video as an example, in one hour of continuous surveillance footage the useful data may last only one or two seconds. How to "purify" the value of data more quickly with powerful machine algorithms is an urgent problem in the current big data context. The fourth is Velocity, meaning high processing speed, which is the most significant feature distinguishing big data from traditional data mining. According to IDC's "Digital Universe" report, global data usage is expected to reach 35.2 ZB by 2020. Faced with data on this scale, the efficiency with which data is processed is the lifeblood of an enterprise.

Meanwhile, more and more countries have begun to view big data at a strategic level and to bring big data thinking and technology into governance. Because government data is sensitive, the security requirements for networks and data are also higher. Handling rapidly growing data requires a matching hardware environment that meets the needs of big data processing, and the network architecture used to store and apply big data must suit its characteristics. Many big data storage systems exist in the prior art. They generally use a SAN with optical fiber switches, which is very expensive. Cloud storage technology represented by Hadoop builds massive storage capacity from a large number of inexpensive servers and costs far less than a SAN, but each storage device still needs a corresponding storage server, the demands on network bandwidth are high and usually call for expensive network equipment, and the NameNode remains a single point of failure. Cost, performance, and reliability are therefore still not ideal.

Therefore, a high-performance, low-cost storage arrangement capable of storing big data is needed.

Summary of the invention

The present invention provides a big data storage architecture and, with it, a big data virtualization operation method offering high performance, low investment, and high reliability.

The specific technical solution of the invention is a big data virtualization operation method comprising the following steps:

1) Infrastructure virtualization: the facilities are virtualized using virtualization technology, including server virtualization, storage virtualization, and network virtualization at the physical layer, to form a virtualization layer; a computing virtualization pool and a storage virtualization pool are established; the computing virtualization pool implements virtualization at the computing resource level, and the storage virtualization pool implements virtualization of stored data;

2) Virtual machine instantiation, comprising the following steps:

(1) selecting a virtual machine and customizing it;

(2) saving the customization parameter file;

(3) selecting the target physical server for deployment;

(4) copying the files associated with the virtual machine;

(5) starting the deployed virtual machine on the target machine;

3) Installing the open-source cloud computing solution Eucalyptus: a virtual machine cluster is built on Eucalyptus, and the user installs the cloud computing platform through the following steps:

(1) installing the Linux operating system;

(2) configuring the Yum installation source;

(3) configuring the installation scripts;

(4) installing the operating system on the other nodes;

(5) setting up the Cobbler service;

(6) installing the node operating systems via PXE;

(7) configuring security policies, network bridges, the firewall, and NFS sharing;

4) Establishing the service layer: server hosts and a storage array are deployed and connected to each other through an optical fiber switch; the servers are equipped with virtualization software and cloud management software, and the servers pool their resources by means of the virtualization software;

5) Processing massive data in relational databases: S-PLUS and Hadoop are combined to operate on the relational database; text data files are exported through S-PLUS, uploaded to HDFS, and then converted into text data sets for distributed processing;

6) Procedural operation: the application layer provides the functions of the user service layer through a web interface; analysis parameters are set, data mining is performed, and the analysis results are obtained and displayed.

Further, in the service layer, the replication technology of the MySQL database and the commercial tool S-PLUS are used to implement a customizable data transfer mechanism between Hadoop and the database.

Further, the service layer comprises a hardware firewall, a virtual firewall, multiple application virtualization servers, and a distributed storage system; the hardware firewall is connected to the virtual firewall, the virtual firewall is connected to the application virtualization servers, and the application virtualization servers are connected to the distributed storage system.

Further, an application firewall is arranged between the hardware firewall and the virtual firewall.

Further, in the application layer, a B/S (browser/server) user interface is designed. The user only needs to operate through the graphical interface and does not have to write S-PLUS code directly to perform data analysis and statistics; the actual computation is carried out by calling the S-PLUS language at the bottom layer, which fundamentally shields the user from the complexity of the S-PLUS language.

Further, the server host in step 4) is a Dell PowerEdge 12G R720, the storage array is a Dell SCv2020FC, and the optical fiber switch is a Brocade 300. The virtualization software is VMware vSphere, and the cloud management software is VMware vCenter.

Beneficial effects of the present invention:

(1) Cloud computing virtualization and big data technology are used to integrate infrastructure resources, giving the platform highly scalable computing and storage capacity that is easy to manage in a unified way.

(2) To address the scalability of data mining algorithms, the interfaces are optimized using several design patterns, so that the parameter configuration interface of the presentation layer is loosely coupled to the S-PLUS logic that analyzes the data.

(3) The vertically scaled relational database is turned into a horizontally scaled distributed database, which alleviates the data explosion problem.

Specific embodiment

The invention is described in further detail below with reference to an embodiment. It should be understood that the specific embodiment described here only serves to explain the invention and does not limit it.

A big data virtualization operation method, comprising the following steps:

Step 1: infrastructure virtualization. Virtualization technology is used to integrate and share host and storage resources, which improves resource utilization, reduces cost, and lowers management complexity. The facilities are virtualized, including server virtualization, storage virtualization, and network virtualization. The invention virtualizes mainly in two respects and establishes two virtualization pools: a computing virtualization pool and a storage virtualization pool. The computing virtualization pool mainly implements application virtualization and, at the computing resource level, comprises server virtualization and application middleware virtualization. The storage virtualization pool mainly implements data storage virtualization and, at the storage level, comprises virtualization of the storage hardware architecture and virtualization of the storage software. Hardware such as hosts, management nodes, multiple compute nodes, and network equipment is set up along these lines to provide the hardware foundation required for big data processing.

Step 2: virtual machine instantiation, comprising the following steps:

(1) selecting a virtual machine and customizing it;

(2) saving the customization parameter file;

(3) selecting the target physical server for deployment;

(4) copying the files associated with the virtual machine;

(5) starting the deployed virtual machine on the target machine.
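The five instantiation steps can be sketched as a small Python workflow. This is only an illustration under stated assumptions: the names VmSpec, instantiate_vm, and start_vm, the JSON parameter file, and the use of local directories in place of physical servers are hypothetical and are not taken from the invention. A real deployment would call the tooling of the virtualization platform instead.

```python
import json
import shutil
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Mapping


@dataclass
class VmSpec:
    """Hypothetical customization parameters for one virtual machine."""
    name: str
    vcpus: int
    memory_gb: int


def start_vm(vm_dir: Path) -> str:
    """Placeholder for step (5); a real system would ask the hypervisor to boot the VM."""
    return f"started:{vm_dir.name}"


def instantiate_vm(template_dir: Path, spec: VmSpec,
                   hosts: Mapping[str, Path], work_dir: Path) -> str:
    # (1) The chosen template and its customization arrive as template_dir and spec.
    # (2) Save the customization parameters to a file.
    param_file = work_dir / f"{spec.name}.params.json"
    param_file.write_text(json.dumps(asdict(spec)))

    # (3) Select the target physical server; here simply the first in name order.
    host_name = sorted(hosts)[0]
    target_dir = hosts[host_name] / spec.name

    # (4) Copy the files associated with the virtual machine to the target.
    shutil.copytree(template_dir, target_dir)
    shutil.copy2(param_file, target_dir / param_file.name)

    # (5) Start the deployed virtual machine on the target machine.
    return start_vm(target_dir)
```

The host selection rule in step (3) is deliberately trivial; any placement policy, for example "most free memory", could be substituted without changing the other steps.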

Step 3: installing the open-source cloud computing solution Eucalyptus. A virtual machine cluster is built on Eucalyptus, and the user creates a private cloud computing platform on the existing infrastructure. The installation process comprises the following steps:

(1) installing the Linux operating system;

(2) configuring the Yum installation source;

(3) configuring the installation scripts;

(4) installing the operating system on the other nodes;

(5) setting up the Cobbler service;

(6) installing the node operating systems via PXE;

(7) configuring security policies, network bridges, the firewall, and NFS sharing.

Step 4: establishing the service layer. Server hosts and a storage array are deployed and connected to each other through an optical fiber switch. The servers are equipped with virtualization software and cloud management software, and the servers pool their resources by means of the virtualization software.

In this embodiment, the standard configuration uses four Dell PowerEdge 12G R720 server hosts. Each server is configured with two E5-2650 V2 processors; each processor has 8 cores and 16 threads and reaches 166.4 GFLOPS. A single server therefore provides 16 cores and 32 threads, and the four-machine cluster as a whole can reach 1331.2 GFLOPS, which comfortably meets the needs of database, big data, virtualization, and cloud computing training experiments. In terms of memory, each server is configured with 128 GB. After 8 GB is reserved for the ESXi virtualization system, 120 GB is available to user virtual machines, so a single server can create and run 30 guest systems of 4 GB or 60 guest systems of 2 GB at the same time. After resource consumption such as cloud computing services is deducted, the four servers of the cluster can provide a total of 100 guest systems of 4 GB or 200 guest systems of 2 GB.
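The capacity figures of this embodiment follow from simple arithmetic, which the short Python sketch below reproduces. The constant and function names are illustrative only; the numeric values are the ones stated in the embodiment.

```python
GFLOPS_PER_CPU = 166.4        # per E5-2650 V2 processor, as stated in the embodiment
CPUS_PER_SERVER = 2
SERVERS = 4
MEMORY_PER_SERVER_GB = 128
HYPERVISOR_RESERVED_GB = 8    # memory reserved for the ESXi virtualization system


def cluster_gflops() -> float:
    """Peak throughput of the whole cluster: 166.4 x 2 x 4."""
    return GFLOPS_PER_CPU * CPUS_PER_SERVER * SERVERS


def guests_per_server(guest_memory_gb: int) -> int:
    """Number of guests of the given memory size that fit on one server."""
    usable_gb = MEMORY_PER_SERVER_GB - HYPERVISOR_RESERVED_GB
    return usable_gb // guest_memory_gb


print(cluster_gflops())        # about 1331.2
print(guests_per_server(4))    # 30
print(guests_per_server(2))    # 60
```

Four servers give an upper bound of 4 x 30 = 120 guests of 4 GB; the embodiment's figure of 100 is lower because it also deducts the resources consumed by the cloud computing services.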

For data storage, given the high concurrency of the system, and in order to eliminate I/O hot spots and guarantee performance, the system uses a Dell SCv2020FC high-performance storage array, connected to the server hosts through an optical fiber switch. The SCv2020FC is configured with 24 high-speed 15K SAS disks of 600 GB each, for a total capacity of 14.4 TB, which comfortably meets the mass-storage requirements of big data. For data redundancy, the SCv2020FC supports RAID 5/6, RAID 10, and RAID 10 DM (dual mirror).
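As a rough illustration of how the redundancy level affects capacity, the Python sketch below computes the raw capacity stated above and a simplified usable capacity per RAID level. The function names are hypothetical, and the formulas are textbook approximations that assume a single RAID group and ignore hot spares, stripe width, and metadata; they do not describe how the SCv2020FC actually lays out data.

```python
def raw_capacity_gb(disks: int, disk_gb: int) -> int:
    """Raw capacity before any redundancy overhead."""
    return disks * disk_gb


def usable_capacity_gb(disks: int, disk_gb: int, level: str) -> float:
    """Simplified usable capacity for one RAID group (assumption: no spares)."""
    raw = raw_capacity_gb(disks, disk_gb)
    if level == "RAID10":       # every block stored twice
        return raw / 2
    if level == "RAID10-DM":    # dual mirror: every block stored three times
        return raw / 3
    if level == "RAID5":        # one disk's worth of parity
        return (disks - 1) * disk_gb
    if level == "RAID6":        # two disks' worth of parity
        return (disks - 2) * disk_gb
    raise ValueError(f"unknown RAID level: {level}")


print(raw_capacity_gb(24, 600))               # 14400 GB, i.e. 14.4 TB
print(usable_capacity_gb(24, 600, "RAID10"))  # 7200.0
```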

If only Internet Explorer (IE) is used as the client to access the S-PLUS server, nothing needs to be installed or configured on the client: simply open IE and enter the address http://hostname/statserver to gain access. If another client is used, such as the S-PLUS Publishing client or the Excel client, the client installation software must be run on the client machine. This S-PLUS enterprise server client component interacts with the server through an HTTP-based communication mechanism. The client is installed from the main installation window; clicking the S-PLUS button starts the installation.

For data transfer efficiency, a high-performance Brocade 300 optical fiber switch is used between the servers and the storage. The switch provides full-duplex transfer rates of up to 8.5 Gbit/s, enough to meet the strict system performance requirements of high concurrency and of large-volume data transfers in database and big data projects.

For the security of system data, in addition to the redundancy that RAID 10 provides on every server and on the storage, the VMware data center virtualization software also provides reliable backup of system metadata and user data, so that absolute data security can be achieved.

The virtualization software is vSphere 6.0, and the cloud management software is vCenter 6.0. vSphere is the virtualization product with the highest market share in the industry; it is stable, easy to use and manage, and highly compatible. Its main role is to virtualize server resources, and it must be installed and deployed on every server. vCenter is deployed on top of vSphere and provides cloud publishing and cloud management functions.

Step 5: processing massive data in relational databases. S-PLUS and Hadoop are combined to operate on large-scale data in relational databases. Hadoop provides interfaces for querying and reading data from a relational database, and these allow records to be read directly from the database as MapReduce input, but this is inefficient, and frequent large-volume queries and reads from MapReduce programs increase the access load on the database. The invention therefore adopts a more efficient way to read and process massive records in a relational database: the commercial tool S-PLUS exports the large volume of data to be analyzed as text data files, the files are uploaded to HDFS, and they are then converted into text data sets for distributed processing.
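The export-then-process flow can be sketched in Python as follows. Several substitutions are assumptions made only so that the example is self-contained: an in-memory sqlite3 database stands in for the relational database, Python's csv module stands in for the S-PLUS export, the sales table and its columns are invented, and the map and reduce steps run locally rather than on a Hadoop cluster. The upload itself would typically be done with the HDFS shell, for example `hdfs dfs -put sales.txt <hdfs-directory>`.

```python
import csv
import sqlite3
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, Tuple


def export_table_to_text(conn: sqlite3.Connection, table: str, out_path: Path) -> int:
    """Dump every row of a table to a tab-delimited text file and return the row count.

    The table name is interpolated into the SQL, so it must be a trusted identifier.
    """
    count = 0
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for row in conn.execute(f"SELECT * FROM {table}"):
            writer.writerow(row)
            count += 1
    return count


def map_line(line: str) -> Tuple[str, float]:
    """Mapper: turn one exported line (id, region, amount) into a (region, amount) pair."""
    _row_id, region, amount = line.rstrip("\n").split("\t")
    return region, float(amount)


def reduce_pairs(pairs: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Reducer: sum the amounts per region."""
    totals: Dict[str, float] = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)
```

Because the mapper works on one text line at a time, the same logic could be run over file splits in parallel once the exported file is in HDFS, which is the point of exporting to text instead of querying the database from every map task.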

Step 6: procedural operation. The application layer provides the functions of the user service layer through a web interface. The user controls data input and output, can implement branching and loops, and can define custom functions; the functions include, but are not limited to, the targets that intelligent transportation systems need to track, such as urban management, urban information system services, social supervision, and public safety. The settings comprise: setting the data source, selecting the analysis method, setting the analysis parameters, running data mining and analysis, and obtaining and displaying the analysis results.
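The sequence of settings in this step (data source, analysis method, analysis parameters, run, display) can be illustrated with a small Python sketch in which the request coming from the interface is kept separate from the analysis logic. All names here (AnalysisRequest, ANALYSIS_METHODS, run_analysis, the limit parameter) are hypothetical, and Python's statistics functions stand in for the S-PLUS routines that the invention calls at the bottom layer.

```python
from dataclasses import dataclass, field
from statistics import mean, median
from typing import Callable, Dict, Mapping, Sequence

# Registry of analysis methods; the interface only ever refers to them by name.
ANALYSIS_METHODS: Dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "median": median,
    "max": max,
}


@dataclass
class AnalysisRequest:
    """What the user sets in the interface: source, method, and parameters."""
    data_source: str
    method: str
    parameters: dict = field(default_factory=dict)


def run_analysis(request: AnalysisRequest,
                 sources: Mapping[str, Sequence[float]]) -> dict:
    """Run the selected method on the selected source and return a displayable result."""
    values = sources[request.data_source]
    limit = request.parameters.get("limit")   # optional: use only the first N values
    if limit is not None:
        values = values[:limit]
    result = ANALYSIS_METHODS[request.method](values)
    return {"source": request.data_source, "method": request.method, "result": result}
```

Adding a new analysis method only requires a new registry entry, which is one simple way to keep the parameter configuration interface loosely coupled to the analysis logic.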

The invention uses multiple virtualized servers. The physical resources of the servers are abstracted into logical resources that are no longer bound by physical limits; hardware such as CPU, memory, disk, and I/O becomes a dynamically managed "resource pool", which improves resource utilization, simplifies system administration, and enables server consolidation. One physical server can be divided into multiple small virtual servers, so with server virtualization multiple servers exist on a single physical machine. Because servers are consolidated onto less hardware and efficiency increases, server virtualization reduces cost. Application programs of various kinds can be deployed on the virtualized servers in advance according to actual needs, in order to process or apply the data in different ways. For example, one virtualized server can host ETL (extract, transform, load) and scheduling software that imports data from other data sources into the distributed storage system according to custom scheduling policies and extraction modes. A virtualized server can also host a data quality management system that profiles the quality of the extracted data, finding quality problems by sampling and scanning the data and generating data quality reports. It can also host a web service that exposes system resource interfaces externally.
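The idea of treating CPU and memory as a dynamically managed pool can be illustrated with a minimal Python sketch. The Host and ResourcePool classes and the first-fit placement rule are assumptions chosen for brevity; they are not the scheduling logic of vSphere or of the invention.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Host:
    """Free capacity of one physical server."""
    name: str
    free_cpus: int
    free_memory_gb: int


class ResourcePool:
    """Presents several physical hosts as one pool of logical resources."""

    def __init__(self, hosts: Iterable[Host]) -> None:
        self.hosts = list(hosts)

    def free_memory_gb(self) -> int:
        """Total free memory across the pool."""
        return sum(host.free_memory_gb for host in self.hosts)

    def allocate(self, cpus: int, memory_gb: int) -> Optional[str]:
        """Place a virtual server on the first host with enough room (first fit).

        Returns the chosen host name, or None if no host can take the request.
        """
        for host in self.hosts:
            if host.free_cpus >= cpus and host.free_memory_gb >= memory_gb:
                host.free_cpus -= cpus
                host.free_memory_gb -= memory_gb
                return host.name
        return None
```

The caller asks the pool for capacity and never names a physical machine, which is the sense in which the resources are no longer bound by physical limits.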

To ensure the security of big data storage applications, the virtual application servers must be partitioned, and different security policies must be applied to different virtual zones. The multiple virtual firewalls carved out of the virtual firewall make this possible, since a security policy can be assigned to each virtual firewall individually. Within each zone, the virtual application servers must also be placed on different virtual network segments according to their functions and security requirements, and a virtual firewall can act as the gateway of each segment. Different zones use different security policies, which allows the multiple virtual network segments to be managed securely.
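The zoning scheme can be sketched in Python with the standard ipaddress module: each zone owns a virtual network segment and carries its own policy. The zone names, address ranges, and allowed ports below are invented for illustration and are not specified by the invention.

```python
import ipaddress
from typing import Optional

# Hypothetical zones: one virtual network segment and one policy per zone.
ZONES = {
    "etl": {
        "network": ipaddress.ip_network("10.0.1.0/24"),
        "allowed_ports": {22, 8080},
    },
    "web": {
        "network": ipaddress.ip_network("10.0.2.0/24"),
        "allowed_ports": {443},
    },
}


def zone_for(address: str) -> Optional[str]:
    """Return the zone whose network segment contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, zone in ZONES.items():
        if ip in zone["network"]:
            return name
    return None


def is_allowed(address: str, port: int) -> bool:
    """Apply the policy of the zone that the destination address belongs to."""
    zone = zone_for(address)
    return zone is not None and port in ZONES[zone]["allowed_ports"]
```

Traffic to an address outside every zone is rejected by default, mirroring the role of the virtual firewall as the gateway of each segment.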

The purpose, technical solution, and beneficial effects of the invention have been described in detail above. It should be understood that the above is only a specific embodiment of the invention and does not limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A big data virtualization operation method, characterized by comprising the following steps:

1) infrastructure virtualization: the facilities are virtualized using virtualization technology, including server virtualization, storage virtualization, and network virtualization at the physical layer, to form a virtualization layer; a computing virtualization pool and a storage virtualization pool are established; the computing virtualization pool implements virtualization at the computing resource level, and the storage virtualization pool implements virtualization of stored data;

2) virtual machine instantiation, comprising the following steps:

(1) selecting a virtual machine and customizing it;

(2) saving the customization parameter file;

(3) selecting the target physical server for deployment;

(4) copying the files associated with the virtual machine;

(5) starting the deployed virtual machine on the target machine;

3) installing the open-source cloud computing solution Eucalyptus: a virtual machine cluster is built on Eucalyptus, and the user installs the cloud computing platform through the following steps:

(1) installing the Linux operating system;

(2) configuring the Yum installation source;

(3) configuring the installation scripts;

(4) installing the operating system on the other nodes;

(5) setting up the Cobbler service;

(6) installing the node operating systems via PXE;

4) establishing the service layer: server hosts and a storage array are deployed and connected to each other through an optical fiber switch; the servers are equipped with virtualization software and cloud management software, and the servers pool their resources by means of the virtualization software;

5) processing massive data in relational databases: S-PLUS and Hadoop are combined to operate on the relational database; text data files are exported through S-PLUS, uploaded to HDFS, and then converted into text data sets for distributed processing;

6) procedural operation: the application layer provides the functions of the user service layer through a web interface; analysis parameters are set, data mining is performed, and the analysis results are obtained and displayed.

2. The method according to claim 1, characterized in that, in the service layer, the replication technology of the MySQL database and the commercial tool S-PLUS are used to implement a customizable data transfer mechanism between Hadoop and the database.

3. The method according to claim 1, characterized in that, in the service layer, the replication technology of the MySQL database and the commercial tool S-PLUS are used to implement a customizable data transfer mechanism between Hadoop and the database.

4. The method according to claim 1, characterized in that an application firewall is arranged between the hardware firewall and the virtual firewall.

5. The method according to claim 1, characterized in that, in the application layer, a B/S (browser/server) user interface is designed; the user only needs to operate through the graphical interface and does not have to write S-PLUS code directly to perform data analysis and statistics; the actual computation is carried out by calling the S-PLUS language at the bottom layer, which fundamentally shields the user from the complexity of the S-PLUS language.

6. The method according to claim 1, characterized in that the server host in step 4) is a Dell PowerEdge 12G R720, the storage array is a Dell SCv2020FC, and the optical fiber switch is a Brocade 300.

7. The method according to claim 1, characterized in that the virtualization software in step 4) is VMware vSphere and the cloud management software is VMware vCenter.