CN106445763A - Power distribution and utilization big data platform test method and system - Google Patents
- Publication number
- CN106445763A CN201610815863.3A
- Authority
- CN
- China
- Prior art keywords
- data
- platform
- test
- big data
- data platform
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/22—Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
- G06F11/26—Functional testing
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
The invention relates to a test method and system for a power distribution and utilization big data platform. The method comprises the following steps: simulating and generating power distribution and utilization data tables at a plurality of data volume levels; carrying out a performance test on the target big data platform with each data table, and recording the start execution time, the end execution time, the CPU utilization, the disk IO interface write speed, and the memory usage of the target big data platform at each data volume level; determining, from each start execution time and each end execution time, the execution time of the target big data platform at each data volume level, the number of records executed per unit time, and the data size executed per unit time; and comparing and analyzing the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage of the target big data platform at each data volume level. With the scheme of the invention, a performance test can be carried out on each big data platform.
Description
Technical field
The present invention relates to the technical field of electric power, and in particular to a test method and system for a power distribution and utilization big data platform.
Background art
The core of the smart grid is the deep integration of the electrical energy flow and the information flow. After more than 30 years of rapid development, China has built an electrical energy flow network that is flexible in structure, technologically advanced, and reliable. With the rapid advance of information technology and the sharp fall in construction costs, the matching information flow network has also developed rapidly. Correspondingly, the basic and operational data of power distribution and utilization, after many years of accumulation and especially the explosive growth of recent years, contain a large amount of extremely valuable information that urgently needs to be mined in depth and used comprehensively, so as to provide solid support for building a green, economical, and reliable smart grid.
To meet the challenge posed by the explosive growth of smart power distribution and utilization data and the urgent need for grid business innovation, research can be carried out, relying on the existing electric energy data center and the marketing and distribution information integration platform, in stages covering the architecture of a smart power distribution and utilization big data system, data management, knowledge models, business innovation, and application demonstration. This work further expands the sources and scale of basic data inside and outside the distribution and utilization grid (meteorological, economic, and user data, among others), proposes innovative theories and techniques for power distribution and utilization big data, and builds a demonstration power distribution and utilization big data platform. Based on big data technology, it discovers users' electricity usage patterns, interaction potential, and market behavior in massive distribution and utilization data. It improves the observability and controllability, the security and reliability, and the economic operation level of the distribution grid through broad interaction with users, and improves users' electricity usage efficiency, customer satisfaction, and interactive response level through personalized services. At the same time, it promotes the transformation of the distribution and utilization grid from the traditional business model to a big data service model based on data and information correlation, realizing business model innovation and greater social value for the distribution and utilization business.
At present, the big data platforms on the domestic and international markets are highly varied, and how to evaluate the performance of each big data platform has become a problem that urgently needs to be solved.
Summary of the invention
The object of the present invention is to provide a test method and system for a power distribution and utilization big data platform, with which a performance test can be carried out on each big data platform.
The object of the present invention is achieved through the following technical solutions:
A test method for a power distribution and utilization big data platform, including:
simulating and generating power distribution and utilization data tables at multiple data volume levels;
carrying out a performance test on a target big data platform with each of the data tables, and recording the test parameters of the target big data platform at each data volume level, the test parameters including start execution time, end execution time, CPU utilization, disk IO interface write speed, and memory usage;
determining the execution time of the target big data platform at each data volume level from each start execution time and each end execution time;
determining, from each execution time, the number of records in each data table, and the data size of each data table, the number of records executed per unit time and the data size executed per unit time by the target big data platform at each data volume level;
comparing and analyzing the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage of the target big data platform at each data volume level, to obtain a first performance test result.
A test system for a power distribution and utilization big data platform, including:
a generating unit, for simulating and generating power distribution and utilization data tables at multiple data volume levels;
a test unit, for carrying out a performance test on a target big data platform with each of the data tables, and recording the test parameters of the target big data platform at each data volume level, the test parameters including start execution time, end execution time, CPU utilization, disk IO interface write speed, and memory usage;
a first processing unit, for determining the execution time of the target big data platform at each data volume level from each start execution time and each end execution time;
a second processing unit, for determining, from each execution time, the number of records in each data table, and the data size of each data table, the number of records executed per unit time and the data size executed per unit time by the target big data platform at each data volume level;
a comparative analysis unit, for comparing and analyzing the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage of the target big data platform at each data volume level, to obtain a first performance test result.
According to the above scheme of the present invention, power distribution and utilization data tables at multiple data volume levels are simulated and generated; a performance test is carried out on the target big data platform with each data table, and the test parameters of the target big data platform at each data volume level are recorded, the test parameters including start execution time, end execution time, CPU utilization, disk IO interface write speed, and memory usage; the execution time of the target big data platform at each data volume level is determined from each start execution time and each end execution time; the number of records executed per unit time and the data size executed per unit time at each data volume level are determined from each execution time, the number of records in each data table, and the data size of each data table; and the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage of the target big data platform at each data volume level are compared and analyzed to obtain a first performance test result. The scheme of the present invention can be applied to each big data platform in turn, so that the performance of every big data platform is tested. Based on the test results, the required big data platform can then be selected reasonably according to one's own hardware configuration environment and business data volume.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the implementation of the power distribution and utilization big data platform test method of Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the composition of the power distribution and utilization big data platform test system of Embodiment 2 of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention and do not limit its scope of protection.
Embodiment 1
Embodiment 1 of the present invention provides a test method for a power distribution and utilization big data platform. Fig. 1 is a schematic flowchart of the implementation of this method. As shown in Fig. 1, the test method of this embodiment comprises the following steps:
Step S101: simulating and generating power distribution and utilization data tables at multiple data volume levels;
Specifically, multiple power distribution and utilization data tables at different data volume levels can be simulated and generated in an Oracle database, and a table name is assigned to the data table at each data volume level. Data tables at different data volume levels have different table names.
Here, the number of data volume levels, and the number of records (also called the record count) and the size (also called the data size) of the power distribution and utilization data at each data volume level, can be set according to actual needs. Table 1 shows the parameters of the data tables at each data volume level set in an actual test. The way the data tables at each data volume level are set in this embodiment is not limited to this, however.
Table 1: Parameters of the power distribution and utilization data at each data volume level
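As an illustration of step S101, the following is a minimal sketch of generating simulated data tables at several data volume levels. The source generates the tables in an Oracle database; this sketch builds them in memory instead. The table names, record counts, and row layout (meter ID, timestamp, kWh reading) are all hypothetical and are not taken from Table 1.

```python
import random
from datetime import datetime, timedelta

# Hypothetical data volume levels: table name -> number of records.
# Each level gets its own table name, as the description requires.
LEVELS = {"pdu_data_l1": 1_000, "pdu_data_l2": 10_000, "pdu_data_l3": 100_000}


def generate_table(n_records, seed=0):
    """Return n_records simulated rows of (meter_id, timestamp, kwh).

    The row layout is an assumption made for this sketch only.
    """
    rng = random.Random(seed)  # seeded so the generated table is reproducible
    start = datetime(2016, 1, 1)
    return [
        (
            f"M{rng.randrange(10_000):05d}",
            (start + timedelta(minutes=15 * i)).isoformat(),
            round(rng.uniform(0.0, 5.0), 3),
        )
        for i in range(n_records)
    ]


# One simulated table per data volume level.
tables = {name: generate_table(n) for name, n in LEVELS.items()}
```

In a real test the rows would be inserted into database tables with the corresponding names rather than kept in a dictionary.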
Step S102: carrying out a performance test on the target big data platform with each of the data tables, and recording the test parameters of the target big data platform at each data volume level, the test parameters including start execution time, end execution time, CPU (Central Processing Unit) utilization, disk IO (Input/Output) interface write speed, and memory usage;
Here, the performance test can include one of, or any combination of, a data write test, a data read test, a data query test, a data sorting test, and a data join query test.
The start execution time and the end execution time have different meanings depending on the kind of performance test. For example, in a data write test, the start execution time is the time at which writing starts and the end execution time is the time at which writing ends. The other kinds of performance test are similar and are not described one by one here.
The data write test and the data read test can each include write tests and read tests on three different types of data: structured data, unstructured data, and semi-structured data. Specifically, in a data write test, each data table generated in a specific Oracle database (Oracle platform) is written to the target big data platform; in a data read test, the data tables in the target big data platform are written back to this specific Oracle platform.
Here, the target big data platform is a CDH platform, a TDH platform, an HDP platform, or an Oracle platform. CDH stands for Cloudera's Distribution Including Apache Hadoop, a big data management platform based on Apache Hadoop. TDH stands for Transwarp Data Hub, a Hadoop cluster big data platform. HDP stands for Hortonworks Data Platform, an Apache Hadoop big data management platform.
The CPU utilization, disk IO interface write speed, and memory usage can be recorded once every set time interval, or recorded only once at a set moment.
Step S103: determining the execution time of the target big data platform at each data volume level from each start execution time and each end execution time;
Specifically, the execution time at each data volume level is obtained by subtracting the corresponding start execution time from the end execution time at that level. For example, the end execution time at data volume level 1 minus the start execution time at data volume level 1 gives the execution time at data volume level 1, the end execution time at data volume level 2 minus the start execution time at data volume level 2 gives the execution time at data volume level 2, and so on.
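The subtraction in step S103 can be sketched as follows. The timestamp format and the recorded start and end times are hypothetical values chosen for illustration; the source does not specify how timestamps are stored.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed log format, not from the source


def execution_time_seconds(start, end):
    """Execution time = end execution time - start execution time, in seconds."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds()


# Hypothetical recorded (start, end) times per data volume level.
runs = {
    1: ("2016-09-01 10:00:00", "2016-09-01 10:00:40"),
    2: ("2016-09-01 10:05:00", "2016-09-01 10:12:00"),
}

exec_times = {level: execution_time_seconds(s, e) for level, (s, e) in runs.items()}
```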
Step S104: determining, from each execution time, the number of records in each data table, and the data size of each data table, the number of records executed per unit time and the data size executed per unit time by the target big data platform at each data volume level;
Specifically, the number of records in each data table is divided by the corresponding execution time to obtain the number of records executed per unit time at each data volume level, and the data size of each data table is divided by the corresponding execution time to obtain the data size executed per unit time at each data volume level. For example, the number of records in the data table of data volume level 1 divided by the execution time of data volume level 1 gives the number of records executed per unit time at data volume level 1, and the data size of the data table of data volume level 1 divided by the execution time of data volume level 1 gives the data size executed per unit time at data volume level 1. Likewise, the number of records in the data table of data volume level 2 divided by the execution time of data volume level 2 gives the number of records executed per unit time at data volume level 2, and the data size of the data table of data volume level 2 divided by the execution time of data volume level 2 gives the data size executed per unit time at data volume level 2, and so on.
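The two divisions in step S104 amount to a pair of throughput figures per data volume level. A minimal sketch, assuming seconds as the unit of time and megabytes as the unit of data size (the source fixes neither unit):

```python
def throughput(record_count, data_size_mb, exec_seconds):
    """Return (records per second, MB per second) for one data volume level."""
    if exec_seconds <= 0:
        raise ValueError("execution time must be positive")
    return record_count / exec_seconds, data_size_mb / exec_seconds


# Hypothetical level: 1,000 records totalling 50 MB, executed in 40 s.
records_per_s, mb_per_s = throughput(1_000, 50.0, 40.0)
```

Guarding against a non-positive execution time avoids a division by zero when the start and end timestamps are recorded at too coarse a resolution.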
Step S105: comparing and analyzing the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage of the target big data platform at each data volume level, to obtain a first performance test result.
Specifically, these quantities are compared and analyzed to obtain, for the target big data platform, the trend of each quantity across the data volume levels and whether any abrupt jump occurs. In practice, a separate chart can be drawn for each of the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage, with the data volume level on the horizontal axis and the respective quantity on the vertical axis. The trend of each quantity across the data volume levels can then be read from the charts.
Here, the first performance test result can refer to the trends of the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage across the data volume levels.
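The source does not define what counts as an abrupt jump between data volume levels. One possible reading, sketched below, flags a level whose value differs from the previous level's value by more than a chosen factor. The function name `find_jumps`, the factor of 2, and the sample throughput figures are all assumptions made for illustration.

```python
def find_jumps(values, ratio=2.0):
    """Return indices i where values[i] is more than `ratio` times larger
    or smaller than values[i - 1].

    `values` holds one metric (for example records per second) ordered by
    data volume level. The threshold is an assumption of this sketch.
    """
    jumps = []
    for i in range(1, len(values)):
        prev, cur = values[i - 1], values[i]
        if prev > 0 and (cur / prev > ratio or cur / prev < 1 / ratio):
            jumps.append(i)
    return jumps


# Hypothetical records-per-second figures for four data volume levels.
records_per_second = [100.0, 110.0, 40.0, 42.0]
jump_levels = find_jumps(records_per_second)
```

With these sample figures only the third level is flagged, because throughput drops from 110 to 40 records per second while the other steps change by about 10 percent or less.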
If the CPU utilization, the disk IO interface write speed, and the memory usage are recorded once every set time interval, then what is compared and analyzed here is the average of the CPU utilization values, the average of the disk IO interface write speed values, and the average of the memory usage values recorded at the corresponding data volume level.
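Averaging the periodic samples can be sketched as below. The metric names and sample values are hypothetical; only the averaging step itself comes from the description.

```python
from statistics import mean

# Hypothetical periodic samples recorded at each data volume level.
samples = {
    1: {"cpu_pct": [35.0, 45.0, 40.0], "disk_write_mb_s": [80.0, 100.0], "mem_pct": [50.0, 54.0]},
    2: {"cpu_pct": [60.0, 70.0], "disk_write_mb_s": [120.0, 140.0], "mem_pct": [62.0, 66.0]},
}

# One average per metric per data volume level, used in the comparison.
averages = {
    level: {metric: mean(values) for metric, values in metrics.items()}
    for level, metrics in samples.items()
}
```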
Accordingly, in the scheme of this embodiment, power distribution and utilization data tables at multiple data volume levels are simulated and generated; a performance test is carried out on the target big data platform with each data table, and the test parameters of the target big data platform at each data volume level are recorded, the test parameters including start execution time, end execution time, CPU utilization, disk IO interface write speed, and memory usage; the execution time of the target big data platform at each data volume level is determined from each start execution time and each end execution time; the number of records executed per unit time and the data size executed per unit time at each data volume level are determined from each execution time, the number of records in each data table, and the data size of each data table; and the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage of the target big data platform at each data volume level are compared and analyzed to obtain a first performance test result. The scheme of this embodiment can be applied to each big data platform in turn, so that the performance of every big data platform is tested. Based on the test results of each big data platform, the required big data platform can then be selected reasonably according to one's own hardware configuration environment and business data volume.
In addition to comparing and analyzing the test results of the same big data platform across data volume levels, steps S101 to S104 above can be applied to different big data platforms to obtain, for each of them, the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage at each data volume level, and the corresponding data of the different big data platforms can then be compared and analyzed.
In one embodiment, the test method further includes: comparing and analyzing the processing time, the number of records processed per unit time, the data size processed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage of different types of big data platforms at each data volume level, to obtain a second performance test result.
Specifically, at each data volume level, the processing time, the number of records processed per unit time, the data size processed per unit time, the CPU utilization, the disk IO interface write speed, and the memory usage of the different types of big data platforms can each be compared and analyzed to obtain the second performance test result. The second performance test result can be combined with the hardware configuration environment and the business data volume to determine the most suitable data platform, that is, to select the type of big data platform.
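A comparison across platforms at each data volume level could look like the sketch below, which picks the platform with the shortest processing time per level. The platform names come from the description, but the timing figures are invented placeholders, not measurements, and selecting on processing time alone is a simplification: the description also weighs throughput, resource usage, hardware environment, and business data volume.

```python
# Placeholder processing times in seconds: platform -> {data volume level: time}.
# These numbers are illustrative only and are NOT measured results.
results = {
    "CDH": {1: 40.0, 2: 420.0},
    "TDH": {1: 38.0, 2: 450.0},
    "HDP": {1: 45.0, 2: 410.0},
}


def fastest_per_level(results):
    """Return {level: platform with the shortest processing time at that level}."""
    levels = sorted({level for times in results.values() for level in times})
    return {
        level: min(
            (p for p in results if level in results[p]),
            key=lambda p: results[p][level],
        )
        for level in levels
    }


best = fastest_per_level(results)
```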
The performance test described above mainly tests whether the behavior of the software system of the target big data platform under given conditions meets the performance indicators in the requirements specification. For example, by testing performance indicators such as the maximum transmission time limit, the transmission error rate, the calculation precision, the response time limit, and the recovery time limit, it verifies whether the software system of the big data platform can reach the performance indicators set out in the requirements specification, and finds the performance bottlenecks present in the software system, so that the software system can be optimized.
In addition, reliability is an important evaluation index for a big data platform, so the reliability of the big data platform also needs to be tested. Reliability testing mainly tests how the operation of the whole system is affected when a structured, unstructured, or semi-structured storage node, the network, or a single disk of the big data platform fails (or suffers an accident), and uses the test results to optimize the corresponding big data framework structure and network topology deployment structure.
Specifically, in one embodiment of the test method of the present invention, multiple big data platforms of the same type form a target big data platform cluster, and a reliability test is carried out on the target big data platform cluster. The reliability test includes: selecting a fault to be tested from a preset fault set, the fault set including a primary metadata node fault, a standby metadata node fault, a storage node fault, a storage node single-disk fault, a storage node network fault, a primary job scheduling node fault, and a task node disk fault; simulating the fault to be tested on the target big data platform cluster; and verifying whether the use of the big data platform cluster or of its plug-ins is affected after the fault simulation, to obtain a verification result.
The network topology deployment structure of the target big data platform cluster can be set according to actual needs. The target big data platform cluster includes a primary metadata node, a standby metadata node, storage nodes, a primary job scheduling node, and task nodes. A primary metadata node fault can be simulated by killing the primary node process; a standby metadata node fault can be simulated by deleting the standby metadata node; a storage node fault can be simulated by shutting down a storage node; a storage node single-disk fault can be simulated by manually pulling out all the hard disks of one storage node; and a storage node network fault can be simulated by simulating an automatic crash on one storage node.
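The reliability test flow (select a fault from the preset fault set, simulate it, verify whether the cluster is still usable) can be sketched as follows. The fault names mirror the fault set in the description. The `inject_fault` and `cluster_usable` callables are hypothetical hooks that a test harness would have to supply; they are not part of any real cluster management API.

```python
from enum import Enum


class Fault(Enum):
    """Preset fault set listed in the description."""
    PRIMARY_METADATA_NODE = "primary metadata node fault"
    STANDBY_METADATA_NODE = "standby metadata node fault"
    STORAGE_NODE = "storage node fault"
    STORAGE_NODE_SINGLE_DISK = "storage node single-disk fault"
    STORAGE_NODE_NETWORK = "storage node network fault"
    PRIMARY_JOB_SCHEDULER = "primary job scheduling node fault"
    TASK_NODE_DISK = "task node disk fault"


def run_reliability_test(fault, inject_fault, cluster_usable):
    """Simulate one fault, then record whether the cluster is still usable.

    inject_fault(fault) and cluster_usable() are hypothetical hooks.
    """
    inject_fault(fault)
    return {"fault": fault.value, "cluster_usable": bool(cluster_usable())}


# Usage with stand-in hooks: record the injected fault, report a healthy cluster.
injected = []
result = run_reliability_test(Fault.STORAGE_NODE, injected.append, lambda: True)
```

Passing the hooks in as arguments keeps the test flow independent of how a particular platform is actually disrupted or probed.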
In addition, the purpose of usability testing is to find out whether users are satisfied with using the system software of the big data platform. The aim is to make the system software fit the way users actually work, rather than forcing users to adapt the way they work to the software system. Usability testing of a big data platform mainly tests aspects such as whether the installation and deployment of the big data platform conforms to the preset platform installation and deployment scheme, whether the log audit function is complete, and whether the user management interface is friendly.
In one embodiment, the test method of the present invention can further include: carrying out a usability test on the target big data platform, the usability test including an installation and deployment test and a management interface test. The installation and deployment test includes testing whether the plug-ins installed on the target big data platform are compatible with its version, and checking information such as the platforms the target big data platform is adapted to, the installation complexity, whether a graphical installer is available, the configuration complexity, and the version information. The management interface test includes testing whether the management interface of the target big data platform can be accessed, testing whether the file system of the target big data platform can be used normally, and testing whether, in the management interface of the target big data platform, logs can be viewed, a graphical interface is available, templated service management is available, and a log analysis function is available.
Scalability is a design indicator of the computing and processing capacity of a software system. High scalability represents a kind of elasticity: as the system grows, the software remains viable, and with minor changes or even just the addition of hardware, the processing capacity of the whole system can grow linearly, achieving high throughput, low latency and high performance. Power distribution and utilization data grows rapidly, by several TB every year, so expanding the capacity of the big data platform is routine, and the scalability of the power distribution and utilization big data platform is an important indicator to test. In one embodiment, the power distribution and utilization big data platform test method of the present invention may further include performing a dynamic expansion test on the target big data platform cluster, the dynamic expansion test including: adding a data node to the running target big data platform cluster and verifying whether the use of the file system is affected.
In addition, the power distribution and utilization big data platform test method of the present invention may further include performing a security test on the target big data platform cluster, the security test including: entering valid user information and invalid user information respectively into the running target big data platform cluster and verifying in each case whether the system can be logged into; or, logging into the system in the running target big data platform cluster with each of the various permissions and verifying whether the target big data platform cluster covers all system permissions; or, after accessing the target big data platform cluster in an unauthorized manner, checking whether the audit log has recorded the unauthorized access.
In addition, after usability testing, reliability testing, scalability testing, security testing and the like have been carried out, the test results of different types of big data platforms (or different types of clusters) can be compared and analyzed; this is not described in detail here.
The power distribution and utilization big data platform test method of the present invention may also perform an algorithm model test on the target big data platform according to each power distribution and utilization data table; the algorithm model test includes a K-Means clustering test and a Linear Regression test. The content recorded, and the way it is processed, when the algorithm model test is performed on the target big data platform according to each power distribution and utilization data table are similar to those when the performance test is performed on the target big data platform according to each power distribution and utilization data table.
To facilitate understanding of the solution of the present invention, and taking the data characteristics of the power grid into account, four big data platforms, namely the CDH platform, the TDH platform, the HDP platform and the Oracle platform, are used as examples below.
1) performance test
1.1) data write test
Here, the data write test includes write tests for three different types of data: a structured data write test, an unstructured data write test and a semi-structured data write test.
1.1.1) structural data write test
Structured power distribution and utilization data tables are simulated and generated in an Oracle database at the 9 data magnitude levels shown in Table 1. The tables at the 9 data magnitude levels are imported from the Oracle platform into the CDH platform, the TDH platform, the HDP platform and the Oracle platform respectively, and for each platform at each of the 9 data magnitude levels the following are obtained: the import time (in seconds), the number of data records stored per unit time (in records/second), the size of data stored per unit time (in MB/second), CPU utilization, disk I/O interface write speed (in KB/second) and memory usage. Each of these parameters can be plotted as a chart, with the data magnitude on the abscissa and the corresponding parameter value on the ordinate.
Here, the import time of the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each of the 9 data magnitude levels equals the difference between the corresponding import end time and the corresponding import start time;
the number of data records stored per unit time by the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each of the 9 data magnitude levels equals the ratio of the corresponding total number of data records to the corresponding import time;
the size of data stored per unit time by the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each of the 9 data magnitude levels equals the ratio of the corresponding total data size to the corresponding import time.
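The three derived quantities defined above can be computed from the recorded start time, end time, total record count and total data size of one import run. The following is a minimal sketch only; the names `ImportRun` and `import_metrics` and the field names are illustrative assumptions and do not come from the described method.

```python
from dataclasses import dataclass


@dataclass
class ImportRun:
    """Raw measurements for one import (hypothetical container)."""
    start_s: float      # import start time, in seconds
    end_s: float        # import end time, in seconds
    total_records: int  # total number of data records imported
    total_mb: float     # total size of the imported data, in MB


def import_metrics(run: ImportRun) -> dict:
    """Derive import time, records/second and MB/second for one run."""
    elapsed = run.end_s - run.start_s
    if elapsed <= 0:
        raise ValueError("import end time must be later than start time")
    return {
        "import_time_s": elapsed,
        "records_per_s": run.total_records / elapsed,
        "mb_per_s": run.total_mb / elapsed,
    }


# Example with made-up numbers: 1000 records (50 MB) imported in 20 seconds.
metrics = import_metrics(ImportRun(10.0, 30.0, 1000, 50.0))
```

Computing one such dictionary per platform and per data magnitude level yields the series that can then be plotted against data magnitude.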
While the tables at the 9 data magnitude levels are imported into the CDH platform, the TDH platform, the HDP platform and the Oracle platform respectively, the import start time, the import end time and the server disk I/O, CPU and memory consumption of each platform at each of the 9 data magnitude levels are recorded.
The test results show: One) For import time, the import time of each big data platform increases as the data magnitude increases. When structured power distribution and utilization data below the 100 million level is imported into each big data platform, the import times differ little; but once the data magnitude reaches 700 million, importing data from the Oracle platform into the Oracle platform takes markedly longer, while the import times of the CDH, TDH and HDP platforms still differ little.
Two) For the number of data records stored per unit time, when the data magnitude is within 10 million, the curves of the big data platforms are relatively flat during import; as the data magnitude increases to between 10 million and 100 million, the number of records stored per unit time by each platform increases markedly; when the data magnitude reaches 700 million to 3 billion, as the data volume grows further, the number of records stored per unit time by the Oracle platform shows an obvious downward trend, while that of the other three platforms changes little.
Three) For the size of data stored per unit time, when the data magnitude is within 50 million, the trends of the big data platforms are relatively flat during import; as the data magnitude increases to between 50 million and 100 million, the size of data stored per unit time by the Oracle platform increases markedly, and that of the other three platforms also increases slowly; when the data magnitude reaches 700 million, as the data volume grows further, the size of data stored per unit time by the Oracle platform falls back markedly, while that of the other three platforms remains basically unchanged; when the data magnitude exceeds 700 million, the size of data stored per unit time by each big data platform remains basically unchanged.
Four) For the CPU utilization chart, as the data set grows in each big data platform, the CPU utilization of the service nodes remains basically unchanged and the CPU utilization of the data nodes increases accordingly; however, the CPU utilization of the Oracle platform is much higher than that of the other three platforms, whose CPU utilizations differ little from one another.
Five) For disk I/O interface write speed, as the data set grows in each big data platform, the disk I/O interface write speed of each platform increases markedly. With data sets below 2 million, the disk I/O interface write speeds differ little; with data sets of 5 million to 10 million, the disk I/O interface write speed of the Oracle platform is clearly higher than that of the other three platforms, whose disk I/O interface write speeds rise slowly; with data sets above 50 million, the I/O read/write speed of each platform increases substantially, whereas the disk I/O interface write speed of the Oracle platform falls back markedly; and with a data set of 100 million, the disk I/O interface write speed of the other three platforms changes slowly.
Six) For memory usage, as the data set grows in each big data platform, the memory usage of each platform changes slowly, but memory usage during import on the Oracle platform is clearly lower than on the other three platforms.
Summary: the data write test results for the CDH, TDH, HDP and Oracle platforms show:
1. Below the 100 million record level, the big data platforms built on the Hadoop ecosystem all show a relatively large gap in write performance compared with Oracle; above 100 million records, the advantage of the big data platforms becomes apparent;
2. The CPU utilization of each service node of the big data platforms is relatively low, whereas the CPU utilization of the Oracle iSCSI node server is relatively high;
3. For the disk I/O of each service node of the big data platforms, above 100 million records the I/O speed depends on hardware performance;
4. The memory of the big data platforms is essentially all occupied by the big data services and cannot be used for other purposes.
1.1.2) unstructured data write test and semi-structured data write test
The unstructured data write test and the semi-structured data write test are similar to the structured data write test; only the type of power distribution and utilization data table differs. The unstructured data write test uses unstructured (specifically, video file) power distribution and utilization data tables, and the semi-structured data write test uses semi-structured power distribution and utilization data tables. For brevity, they are not described in detail here.
1.2) data read test
Here, the data read test includes read tests for three different types of data: a structured data read test, an unstructured data read test and a semi-structured data read test.
1.2.1) structural data read test
Structured power distribution and utilization data tables are simulated and generated in an Oracle database at the 9 data magnitude levels shown in Table 1. The tables at the 9 data magnitude levels are imported from the CDH platform, the TDH platform, the HDP platform and the Oracle platform respectively into the Oracle platform, and for each platform at each of the 9 data magnitude levels the following are obtained: the import time (in seconds), the number of data records stored per unit time (in records/second), the size of data stored per unit time (in MB/second), CPU utilization, disk I/O read speed (in KB/second) and memory usage. Each of these parameters can be plotted as a chart, with the data magnitude on the abscissa and the corresponding parameter value on the ordinate.
Here, the import time of the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each of the 9 data magnitude levels equals the difference between the corresponding import end time and the corresponding import start time;
the number of data records stored per unit time by the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each of the 9 data magnitude levels equals the ratio of the corresponding total number of data records to the corresponding import time;
the size of data stored per unit time by the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each of the 9 data magnitude levels equals the ratio of the corresponding total data size to the corresponding import time.
While the tables at the 9 data magnitude levels are imported from the Oracle platform into the CDH platform, the TDH platform, the HDP platform and the Oracle platform respectively, the import start time, the import end time and the server disk I/O, CPU and memory consumption of each platform at each of the 9 data magnitude levels are recorded.
The test results show: One) For import time, the time taken to import from each big data platform into the Oracle platform increases as the data magnitude increases. For data sets below 100 million, the import times of the big data platforms differ little; but once the data volume reaches 700 million, the time taken to import data from the Oracle platform into the Oracle platform falls back markedly, while the import times of the CDH, TDH and HDP platforms differ little.
Two) For the number of data records stored per unit time, when data is imported from each big data platform into the Oracle platform, the number of records stored per unit time at the different magnitudes increases markedly; the Oracle platform accounts for a relatively large share, while the change on the other platforms is not obvious. When the data magnitude exceeds 700 million, the amount of data stored per unit time by each big data platform remains basically unchanged.
Three) For the size of data stored per unit time, when the data magnitude is within 10 million, the curves of the big data platforms are relatively flat during import; as the data magnitude increases to between 10 million and 100 million, the size of data stored per unit time by the Oracle platform increases markedly, and that of the other three platforms also increases slowly; when the data magnitude reaches 700 million, as the data volume grows further, the size of data stored per unit time by the Oracle platform shows an obvious downward trend, while that of the other three platforms remains basically unchanged; when the data magnitude exceeds 700 million, the size of data stored per unit time by each big data platform remains basically unchanged.
Four) For the CPU utilization chart, as the data set grows in each big data platform, the CPU utilization of the service nodes remains basically unchanged and the CPU utilization of the data nodes increases accordingly; however, the CPU utilization of the Oracle platform is higher than that of the other three platforms, whose CPU utilizations differ little from one another.
Five) For disk I/O speed, as the data set grows in each big data platform, the disk I/O read speed of each platform increases markedly. With data sets below 2 million, the disk I/O read speeds differ little; with data sets of 5 million to 10 million, the disk I/O interface write speed of the Oracle platform is clearly higher than that of the other three platforms, whose disk I/O read speeds remain steady; with data sets above 50 million, the I/O read/write speed of each platform increases substantially, whereas the disk I/O read speed of the Oracle platform falls back markedly; and with a data set of 100 million, the disk I/O read/write speed of the other three platforms increases substantially.
Six) For memory usage, as the data set grows in each big data platform, the memory usage of each platform changes slowly, but memory usage during import on the Oracle platform is clearly lower than on the other three platforms.
Summary: the test results for importing structured data from the CDH, TDH, HDP and Oracle platforms into Oracle show:
1. Throughout the test, exporting data from the 3 big data platforms was slower than exporting from Oracle. On analysis, the main reason is that the 3 big data platforms require an HDFS conversion step before the data can be imported into the Oracle platform, whereas Oracle imports directly into another Oracle instance; Oracle therefore outperforms the 3 big data platforms on this indicator;
2. Memory and CPU utilization follow the same trends as in the preceding performance test, except that the disk I/O read/write speed of Oracle rises suddenly at 5 million records and then gradually declines.
1.2.2) unstructured data read test and semi-structured data read test
The unstructured data read test and the semi-structured data read test are similar to the structured data read test; only the type of power distribution and utilization data table differs. The unstructured data read test uses unstructured (specifically, video file) power distribution and utilization data tables, and the semi-structured data read test uses semi-structured power distribution and utilization data tables. For brevity, they are not described in detail here.
1.3) data query test
In the CDH platform, the TDH platform, the HDP platform and the Oracle platform, a specified object is queried in the simulated power distribution and utilization data tables at each of the 9 test data levels described above, for example the February electricity charges by user electricity consumption type, in order to verify the query performance of the big data platform. The query time of the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each test data level is recorded, and the CPU utilization, disk I/O interface write speed and memory usage of each node are recorded at 5-second intervals.
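Recording the query time while sampling resource usage at a fixed interval can be sketched as follows. This is an illustration under stated assumptions, not the tooling used in the tests: `run_with_sampling` is a hypothetical helper, and `probe` stands for whatever callable returns the CPU utilization, disk I/O speed and memory usage of a node.

```python
import threading
import time


def run_with_sampling(query, probe, interval_s=5.0):
    """Run query() while calling probe() every interval_s seconds.

    Returns (elapsed_seconds, samples). The first sample is taken one
    interval after the query starts; sampling stops when the query ends.
    """
    samples = []
    done = threading.Event()

    def sampler():
        # Event.wait() returns False on timeout and True once done is set.
        while not done.wait(interval_s):
            samples.append(probe())

    worker = threading.Thread(target=sampler, daemon=True)
    start = time.monotonic()
    worker.start()
    try:
        query()
    finally:
        done.set()
        worker.join()
    return time.monotonic() - start, samples
```

In use, `query` would submit the statement to the platform under test, and `probe` would read the node counters; both are left to the caller so that the same loop applies to every platform.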
The test results show: for query time, the query time of each platform increases as the data magnitude increases. When queries are run on data below the 100 million level, the query times of the big data platforms differ little; but once the data magnitude reaches 700 million, the query time on the Oracle platform increases substantially, while the query times of the CDH, TDH and HDP platforms differ little.
From the recorded query times, a chart of the number of data records stored per unit time (records/second) at the different magnitudes is generated for the query of the February electricity charges by user electricity consumption type on each big data platform. The chart shows that as the data set of each big data platform grows, the number of records per unit time in the query operation increases markedly at the different magnitudes; the figure for the Oracle platform is relatively steady, while the other platforms show an upward trend.
From the recorded query times, a chart of the size of data stored per unit time (MB/second) at the different magnitudes is generated for the query of the February electricity charges by user electricity consumption type on each big data platform. The chart shows that as the data set of each big data platform grows, the size of data per unit time in the query operation increases markedly at the different magnitudes; the figure for the Oracle platform is relatively steady, while the other platforms show an upward trend.
A CPU utilization chart is generated from the recorded CPU utilization. The chart shows that as the data set grows in each big data platform, CPU utilization during the query operation increases accordingly; the CPU utilization of the Oracle platform is much higher than that of the other three platforms, and when the data sets of the other three platforms exceed 700 million, their CPU utilizations differ little.
A disk I/O interface write speed (KB/s) chart is generated from the recorded disk I/O interface write speed. The chart shows that as the data set grows in each big data platform, the query operation markedly increases the disk I/O read speed of each platform. When data sets below 5 million are queried, the disk I/O read speeds differ little; when data sets of 10 million to 50 million are queried, the disk I/O read speed of each platform rises only slightly; when a data set of 100 million is queried, the I/O read/write speed of each platform increases substantially, whereas the disk I/O read speed of the Oracle platform falls back markedly; and when a data set of 3 billion is queried, the disk I/O read speed of each platform changes greatly.
A memory usage chart is generated from the recorded memory usage. The chart shows that as the queried data set grows in each big data platform, the memory usage of each platform changes slowly, but the memory usage of the Oracle platform is clearly lower than that of the other three platforms.
Summary: the data query test results for the CDH, TDH, HDP and Oracle platforms show:
1. The 3 big data platforms differ little in query performance, with CDH slightly stronger than TDH and HDP; Oracle outperforms the big data platforms for queries below the 100 million record level, after which its processing time becomes longer;
2. The CPU utilization of each service node of the big data platforms is relatively low, whereas the CPU utilization of the Oracle iSCSI node server is relatively high;
3. For the disk I/O of each service node of the big data platforms, above 100 million records the I/O speed depends on hardware performance;
4. The memory of the big data platforms is essentially all occupied by the big data services and cannot be used for other purposes.
1.4) data sorting test
In the CDH platform, the TDH platform, the HDP platform and the Oracle platform, a specified number of records (for example 20) is taken from the simulated power distribution and utilization data tables at each of the 9 test data levels described above, and the records taken are sorted by a query (for example a descending query). The query execution time, disk I/O, CPU and memory consumption of the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each test data level are recorded.
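A timed "take N records, then sort them in descending order" query can be sketched as follows. SQLite from the Python standard library is used here purely as a stand-in engine so that the example runs anywhere; it is not one of the tested platforms, and the `fee` table, its columns and the generated values are assumptions made for illustration.

```python
import sqlite3
import time

# Stand-in data: a hypothetical electricity fee table with synthetic values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fee (user_id INTEGER, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fee VALUES (?, ?, ?)",
    [(i, 1 + i % 12, float((i * 37) % 1000)) for i in range(10_000)],
)


def timed_take_and_sort_desc(conn, n):
    """Take n rows, sort them by amount in descending order, and time it."""
    start = time.perf_counter()
    rows = conn.execute(
        "SELECT user_id, amount FROM "
        "(SELECT user_id, amount FROM fee LIMIT ?) "
        "ORDER BY amount DESC",
        (n,),
    ).fetchall()
    return time.perf_counter() - start, rows


elapsed, rows = timed_take_and_sort_desc(conn, 20)
```

If the intended test instead sorts the whole table and keeps the top N rows, the inner `LIMIT` would move after the `ORDER BY`; the text leaves this choice open.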
The test results are illustrated below using the example of taking 100 records from each big data platform and sorting them in descending order into Oracle.
A chart is generated from the query execution times for taking 100 records from each big data platform and sorting them in descending order into Oracle. The chart shows that as the data magnitude increases, the time each big data platform needs for the sorting operation also increases. When sorting below the 100 million level, the sorting times of the big data platforms differ little; but once the data magnitude reaches 700 million, the time needed to sort data on the Oracle platform increases substantially, while the sorting times of the CDH, TDH and HDP platforms differ little.
A chart is generated from the number of data records stored per unit time (records/second) for taking 100 records from each big data platform and sorting them in descending order into Oracle. The chart shows that when 100 records are taken from each big data platform for a descending query, the number of records stored per unit time at the different magnitudes increases markedly; the figure for the Oracle platform is relatively small, while the other platforms show an upward trend.
A chart is generated from the size of data stored per unit time (MB/second) for taking 100 records from each big data platform and sorting them in descending order into Oracle. The chart shows that when the data magnitude is within 5 million, the curve fluctuates strongly when data is sorted on the Oracle platform; as the data magnitude increases to between 5 million and 3 billion, the curve becomes relatively steady, and the size of data stored per unit time by the other three platforms also increases slowly; when the data magnitude reaches 1.5 billion, as the data magnitude increases further, the size of data stored per unit time by the Oracle platform falls back slightly.
A chart is generated from the CPU utilization for taking 100 records from each big data platform and sorting them in descending order into Oracle. The chart shows that as the sorted data set grows in each big data platform, CPU utilization increases accordingly; the CPU utilization of the Oracle platform is higher than that of the other three platforms, whose CPU utilizations differ little from one another.
A chart is generated from the disk I/O interface write speed (KB/s) for taking 100 records from each big data platform and sorting them in descending order into Oracle. The chart shows that as the data set grows in each big data platform, the sorting operation markedly increases the disk I/O read speed of each platform. When data sets below 5 million are queried, the disk I/O read speed rises little; when data sets of 10 million to 50 million are queried, the disk I/O read speeds of the platforms are similar; when a data set of 100 million is queried, the I/O read/write speed of each platform increases, whereas the disk I/O read speed of the Oracle platform falls back markedly; and when a data set of 3 billion is queried, the disk I/O read speed of each platform changes greatly.
A chart is generated from the memory usage for taking 100 records from each big data platform and sorting them in descending order into Oracle. The chart shows that as the data set grows in each big data platform, the memory usage of each platform changes slowly, but the memory usage of the Oracle platform is clearly lower than that of the other three platforms.
Summary: the data sorting test results for the CDH, TDH, HDP and Oracle platforms are consistent with the data query test results.
1.5) data association inquiry test
First, power distribution and utilization data tables at the 9 test data levels shown in Table 1 are simulated and generated in the Oracle platform. Next, in the CDH platform, the TDH platform, the HDP platform and the Oracle platform, a left join query for target data (for example the February electricity charges by user electricity consumption type) is run on each of the 9 power distribution and utilization data tables, and the query execution time, disk I/O, CPU and memory consumption of the CDH platform, the TDH platform, the HDP platform and the Oracle platform at each test data level are recorded.
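The left join between an electricity fee table and a user type table can be sketched as follows. SQLite is again only a stand-in engine for illustration, and the table names, column names and sample rows are assumptions; the actual schemas of the power distribution and utilization data tables are not given in the text.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_type (user_id INTEGER PRIMARY KEY, usage_type TEXT);
CREATE TABLE fee (user_id INTEGER, month INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO user_type VALUES (?, ?)",
                 [(1, "residential"), (2, "industrial")])
conn.executemany("INSERT INTO fee VALUES (?, ?, ?)",
                 [(1, 2, 30.0), (2, 2, 500.0), (3, 2, 12.5), (1, 3, 28.0)])


def february_fees_by_type(conn):
    """Left-join fee rows for month 2 to the user type table and time it."""
    start = time.perf_counter()
    rows = conn.execute("""
        SELECT f.user_id, u.usage_type, f.amount
        FROM fee AS f
        LEFT JOIN user_type AS u ON u.user_id = f.user_id
        WHERE f.month = 2
        ORDER BY f.user_id
    """).fetchall()
    return time.perf_counter() - start, rows


elapsed, rows = february_fees_by_type(conn)
```

Because the join is a left join, the fee row for user 3, who has no entry in the user type table, is still returned, with a null usage type.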
The test results are illustrated below for the left join query, on each big data platform, of the electricity fee table and the user type table in the power distribution and utilization data tables at each test data level, querying the February electricity charges by user electricity consumption type.
From the query execution times in the test results, a chart of the time (in seconds) at the different test data levels is generated for the left join query of the electricity fee table and the user type table on each big data platform. The chart shows that as the data magnitude increases, the join query time of each platform also increases. When the join query runs on data below the 100 million level, the join query times of the big data platforms differ little; but once the data magnitude reaches 700 million, the join query time on the Oracle platform increases substantially, while the join query times of the CDH, TDH and HDP platforms differ little.
From the query execution times in the test results, a chart of the number of data records stored per unit time (records/second) at the different test data levels is generated for the left join query of the electricity fee table and the user type table on each big data platform. The chart shows that as the data set of the left join query grows on each big data platform, the number of records stored per unit time at the different magnitudes increases markedly; the figure for the Oracle platform is relatively small, while the other platforms show an upward trend.
From the query execution times in the test results, a chart of the size of data stored per unit time (MB/second) at the different test data levels is generated for the left join query of the electricity fee table and the user type table on each big data platform. The chart shows that when the data magnitude is within 5 million, the curve fluctuates strongly during the join query on the Oracle platform; as the data magnitude increases to between 5 million and 3 billion, the curve becomes relatively steady, and the size of data stored per unit time by the other three platforms also increases slowly; when the data magnitude reaches 1.5 billion, as the data volume grows further, the size of data stored per unit time by the Oracle platform falls back slightly.
From the CPU utilization in the test results, a CPU utilization chart at the different test data levels is generated for the left join query of the electricity fee table and the user type table on each big data platform. The chart shows that as the join query data set grows in each big data platform, CPU utilization increases accordingly; the CPU utilization of the Oracle platform is higher than that of the other three platforms, whose CPU utilizations differ little from one another.
From the disk I/O interface write speed (KB/s) in the test results, a disk I/O interface write speed (KB/s) chart at the different test data levels is generated for the left join query of the electricity fee table and the user type table on each big data platform. The chart shows that as the data set grows in each big data platform, the disk I/O read speed of each platform increases markedly. When the data set is below 1.5 billion, the disk I/O read speed changes relatively gently; with a data set of 3 billion, the disk I/O read speed of the Oracle platform is clearly lower than that of the other three platforms, whose disk I/O read speed is very high.
From the memory usage in the test results, a memory usage chart at the different test data levels is generated for the left join query of the electricity fee table and the user type table on each big data platform. The chart shows that as the data set grows in each big data platform, the memory usage of each platform changes slowly, but the memory usage of the Oracle platform is clearly lower than that of the other three platforms.
Summary: the data join query test results for the CDH, TDH, HDP and Oracle platforms are consistent with the data query test results.
2) reliability testing
2.1) Storage node engine test
2.1.1) Main metadata node failure test
Simulate a big data platform fault at the master node of each big data platform cluster and verify whether the fault affects the use of the system and its plug-ins. The purpose is to test whether a fault at the master node (for example, a process dying) has any impact on the big data platform cluster.
Specifically, for a target big data platform cluster (any one of the big data platform clusters), after a normal start-up, perform a directory view operation to look up the process ID of the main metadata node, kill the process with that ID, and then check whether the web page can still be accessed.
The test results show that when the main metadata is lost or fails, system operation breaks down on the three big data platform clusters and on the Oracle cluster platform alike; once the main metadata has been restored, no business data is lost.
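The procedure of test 2.1.1 (look up the process ID, kill the process, check the web page) could be scripted roughly as follows. This is only a sketch under stated assumptions: the description does not name the lookup command, so the JDK tool `jps` and the HDFS process name `NameNode` are assumptions, and the web page URL is supplied by the caller.

```python
import os
import signal
import subprocess
import urllib.request
from typing import Optional


def find_pid(jps_output: str, process_name: str = "NameNode") -> Optional[int]:
    """Return the PID of the first line of `jps` output whose name matches exactly."""
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == process_name:
            return int(parts[0])
    return None


def web_ui_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the management web page answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError and socket errors are subclasses of OSError
        return False


def kill_main_metadata_node(web_url: str) -> bool:
    """Kill the assumed main metadata process, then report web page reachability.

    Destructive: run only on a test cluster node (Unix only, uses SIGKILL).
    """
    listing = subprocess.run(["jps"], capture_output=True, text=True, check=True).stdout
    pid = find_pid(listing)
    if pid is None:
        raise RuntimeError("main metadata node process not found")
    os.kill(pid, signal.SIGKILL)
    return web_ui_reachable(web_url)
```

Only `find_pid` is free of side effects; `kill_main_metadata_node` terminates a real process and is meant to run on the master node of a disposable test cluster.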
2.1.2) Standby metadata node fault test
Simulate a big data platform fault at the standby metadata node of each big data platform cluster and verify whether the fault affects the use of the system and its plug-ins. The purpose is to test whether a fault at the standby metadata node (for example, a hard disk failure) has any impact on the big data platform cluster.
2.1.3) Storage node fault test
Simulate a big data platform fault at one of the data nodes of each big data platform cluster and verify whether the fault at the data node (for example, deleting the node) affects the use of the system and its plug-ins. The purpose is to test whether a big data platform fault (for example, a hard disk failure) at the standby metadata node has any impact on the big data platform cluster.
2.1.4) Storage node single-disk failure test
Simulate a hard disk failure at one of the data nodes of each big data platform cluster and verify whether the failure affects the use of the system and its plug-ins. The purpose is to test whether a hard disk failure at one data node has any impact on the big data platform cluster.
The test results show that a hard disk failure on any node of a big data platform cluster has no impact on the integrity of the stored data.
2.1.5) Storage node network fault test
Simulate a network failure (node downtime) at one of the data nodes of each big data platform cluster and verify whether the failure affects the use of the system and its plug-ins. The purpose is to test whether a network failure at one data node has any impact on the big data platform cluster.
The test results show that the unexpected downtime of any storage node in a big data platform cluster does not affect the normal use of the file system and HBase.
2.2) Parallel computing engine test
2.2.1) Main job scheduling node failure test
Simulate a failure of the main job scheduling node in each big data platform cluster and verify whether the network failure at the data node affects the use of the scheduling system and HBase. The purpose is to test whether a failure of the main job scheduling node in the cluster has any impact on the integrity of data scheduling.
The test results show that a failure of the scheduling master node of each big data platform cluster has an impact on the integrity of the stored data.
2.2.2) Task node disk failure test
Simulate a hard disk failure on one task data node in each big data platform cluster and verify whether the failure affects the use of the scheduling system and Oozie. The purpose is to manually remove any one Oozie node from each big data platform cluster and check whether the file system is affected.
The test results show that deleting an Oozie scheduling node from the cluster of each big data platform has no impact on the integrity of the stored data.
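The reliability tests in section 2 all follow the same pattern: pick a fault, simulate it on the cluster, then verify whether the cluster is still usable. A minimal sketch of that loop is given below. The seven fault names come from the text; the `inject`, `is_usable` and `restore` callables are hypothetical hooks that a real test bench would have to supply for its own cluster.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Fault(Enum):
    """The fault types exercised in section 2."""
    MAIN_METADATA_NODE = "main metadata node failure"
    STANDBY_METADATA_NODE = "standby metadata node failure"
    STORAGE_NODE = "storage node failure"
    STORAGE_NODE_SINGLE_DISK = "storage node single-disk failure"
    STORAGE_NODE_NETWORK = "storage node network failure"
    MAIN_JOB_SCHEDULING_NODE = "main job scheduling node failure"
    TASK_NODE_DISK = "task node disk failure"


@dataclass
class FaultResult:
    fault: Fault
    cluster_usable: bool


def run_reliability_tests(
    faults: Iterable[Fault],
    inject: Callable[[Fault], None],      # hypothetical: simulate the fault
    is_usable: Callable[[], bool],        # hypothetical: check system and plug-ins
    restore: Callable[[Fault], None],     # hypothetical: undo the fault
) -> List[FaultResult]:
    """Inject each fault in turn, record usability, and always restore afterwards."""
    results: List[FaultResult] = []
    for fault in faults:
        inject(fault)
        try:
            results.append(FaultResult(fault, is_usable()))
        finally:
            restore(fault)
    return results
```

Restoring in a `finally` block keeps one failed check from leaving the cluster in a broken state for the next fault.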
3) Usability testing
3.1) Installation and deployment test
The installation and deployment test compares how easy or difficult each big data platform is to install, and tests whether the plug-ins installed on each big data platform and their versions are compatible and whether they work normally. It also records information such as the platforms on which each big data platform can be installed, the complexity of the installation process, whether a graphical installer is available, and the complexity of configuration.
3.2) Administration interface test
Simulated tests check whether the management interface of each big data platform can be accessed and whether the file system HDFS can be used normally, and whether audit logs, monitoring analysis, templated management and similar content can be viewed under the management interface of each big data platform.
4) Extension and security test
4.1) Dynamic expansion test
Add one data node to each big data platform cluster and verify whether this affects the use of the file system.
Manually adding one data node to each big data platform cluster does not affect viewing or using the file system and the HBase database.
4.2) Authentication test
In each running big data cluster, simulate logging in with a password, entering invalid user information, and adding and deleting users, and verify in each case whether the user can log in to the system.
4.3) Access control test
Log in to the system with each of the various permission levels in each running target big data platform cluster, and verify whether the corresponding target big data platform cluster covers all system permissions.
4.4) Audit test
Perform an unauthorized access in each running target big data platform cluster, check whether the corresponding log contains a record of this unauthorized access, and check whether the recorded time, IP address, user name, operation and other information match the unauthorized access.
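The audit test in 4.4 boils down to matching one known unauthorized access against the audit log on time, IP address, user name and operation. The sketch below shows that comparison on already-parsed records. The `AccessEvent` type and the 5-second time tolerance are assumptions made for illustration; parsing the platform-specific log format is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class AccessEvent:
    """One access attempt or one parsed audit log record (hypothetical shape)."""
    time: datetime
    ip: str
    user: str
    operation: str


def find_audit_record(
    attempt: AccessEvent,
    audit_log: Iterable[AccessEvent],
    tolerance: timedelta = timedelta(seconds=5),  # assumed clock tolerance
) -> Optional[AccessEvent]:
    """Return the log record that matches the attempt, or None if it was not logged."""
    for record in audit_log:
        same_fields = (record.ip, record.user, record.operation) == (
            attempt.ip, attempt.user, attempt.operation)
        if same_fields and abs(record.time - attempt.time) <= tolerance:
            return record
    return None
```

A `None` result means the platform failed to record the unauthorized access, or recorded it with information that does not match.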
5) Algorithm model test cases
5.1) K-Means clustering
Specifically, power distribution and utilization data tables at nine data volume levels are generated by simulation in the big data platform, and a K-Means clustering test is run on the table of each data volume level. Each clustering run is executed more than five times, the execution time is recorded, and the average time is taken as the final result of the performance test. At 5-second intervals, the disk I/O, network I/O, memory usage and CPU usage of each node are recorded.
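The repeat-and-average timing described in 5.1 can be sketched with a small harness. This is an illustration only: `job` stands for whatever call submits the K-Means run to the platform under test, and the default of six runs is an assumption chosen to satisfy "more than five times".

```python
import statistics
import time
from typing import Callable, List


def average_runtime(job: Callable[[], object], runs: int = 6) -> float:
    """Run `job` `runs` times and return the mean wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        job()  # hypothetical: submit the clustering job and wait for it to finish
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)
```

For example, `average_runtime(lambda: sum(range(100_000)))` times a stand-in workload; in a real test the lambda would wrap the platform's job submission.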
5.2) Linear Regression classification test
Specifically, power distribution and utilization data tables at nine data volume levels are generated by simulation in the big data platform. A Linear Regression classification test is run on different fields of each table, such as the electricity fee field and the SIC code field; the execution time is recorded, and the average time is taken as the final result of the performance test. At 5-second intervals, the disk I/O, network I/O, memory usage and CPU usage of each node are recorded.
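Recording the disk I/O, network I/O, memory usage and CPU usage of a node at 5-second intervals amounts to a fixed-interval sampling loop. A minimal sketch follows; the probe functions are hypothetical placeholders for whatever monitoring source the test bench uses, and `sleep` is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List

Probe = Callable[[], float]


def sample_metrics(
    probes: Dict[str, Probe],          # hypothetical: metric name -> reader function
    samples: int,
    interval_s: float = 5.0,           # the 5-second interval from the text
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, float]]:
    """Read every probe `samples` times, pausing `interval_s` seconds between reads."""
    rows: List[Dict[str, float]] = []
    for i in range(samples):
        rows.append({name: probe() for name, probe in probes.items()})
        if i < samples - 1:  # no pause needed after the last sample
            sleep(interval_s)
    return rows
```

In practice one such loop would run on each node for the duration of a test run, and the rows would be stored alongside the execution times.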
The results of the above tests are summarized as follows:
(1) Performance evaluation
1. The CDH, TDH, HDP and Oracle platforms were tested in the same hardware environment. At a data volume of about 100 million records (about 10 GB in size), the insert, read/write, association and query performance of the Oracle relational database (the Oracle platform) is better than that of the other three big data platforms; above 700 million records (or 30 GB in size), the performance of the other big data platforms is far better than that of Oracle. In this test environment, the performance of CDH is slightly better than that of TDH and HDP;
2. In this test environment, the utilization of the CDH, TDH and HDP platforms is relatively low, with average utilization all within about 10%. On the Oracle platform, however, the utilization of the access node server reaches 70-80%, while the other service nodes are at 7-9%;
3. As for disk I/O reads and writes, the three big data platforms CDH, TDH and HDP level off after reaching the maximum read/write rate of the hardware disks, whereas the disk I/O of Oracle stays at a relatively low overall average and fluctuates considerably at each order of magnitude;
4. As for memory during the tests, the memory usage of the main metadata node of the three big data platforms CDH, TDH and HDP is relatively low, but the memory usage of the other data nodes stays above 95% throughout. The memory usage of Oracle fluctuates and gradually decreases after reaching a certain peak.
(2) Reliability evaluation
1. In this test environment, once the main metadata of CDH, TDH or HDP fails, the system cannot run normally; after recovery from the standby metadata it runs normally again with no data loss. When the main metadata of Oracle fails, it cannot run normally either and has to be recovered from a backup file;
2. For storage node faults, disk failures and network failures on CDH, TDH and HDP, the system keeps running normally as long as the number of failed nodes stays within the fault range allowed by its configuration. When a storage node of Oracle fails, the platform cannot run normally and the data may not be fully recoverable;
3. In parallel computing on CDH, TDH and HDP, once the main task node fails, computing tasks cannot run normally; when a task node fails, its computing tasks are transferred to other task nodes.
(3) Usability evaluation
1. The three big data platforms CDH, TDH and HDP are all installed from a command window, which is comparatively difficult, and they are suitable only for Linux operating platforms, whereas Oracle has a graphical installation interface and is relatively easy to install and configure;
2. The three big data platforms CDH, TDH and HDP all have a web management interface. Oracle also has a web management interface, but it is fairly simple.
(4) Extension and security evaluation
1. In terms of scalability, all four platforms support node expansion, and data and computation are unaffected after expansion;
2. In terms of security, all four platforms perform authentication and record audit logs, but the audit log records of Oracle are analyzed with commands, whereas those of CDH, TDH and HDP can be viewed directly in the web interface.
In summary, when selecting a data platform, the choice should take into account the hardware configuration environment available and the volume of business data.
Embodiment two
Based on the power distribution and utilization big data platform test method in the above embodiment, the present invention also provides a power distribution and utilization big data platform test system. Fig. 2 is a schematic structural diagram of the power distribution and utilization big data platform test system of embodiment two of the present invention. As shown in Fig. 2, the power distribution and utilization big data platform test system of this embodiment includes a generation unit 201, a test unit 202, a first processing unit 203, a second processing unit 204 and a comparative analysis unit 205, wherein:
The generation unit 201 is configured to generate, by simulation, power distribution and utilization data tables at a plurality of data volume levels;
The test unit 202 is configured to perform a performance test on a target big data platform according to each of the power distribution and utilization data tables, and to record test-related parameters of the target big data platform at each data volume level, the test-related parameters including a start execution time, an end execution time, CPU utilization, disk I/O interface writing speed and memory usage;
The first processing unit 203 is configured to determine the execution time of the target big data platform at each data volume level according to each start execution time and each end execution time;
The second processing unit 204 is configured to determine, according to each execution time, the number of data records in each power distribution and utilization data table and the data volume value of each power distribution and utilization data table, the number of data records executed per unit time and the data volume value executed per unit time by the target big data platform at each data volume level;
The comparative analysis unit 205 is configured to compare and analyze the execution time, the number of data records executed per unit time, the data volume value executed per unit time, the CPU utilization, the disk I/O interface writing speed and the memory usage of the target big data platform at each data volume level, to obtain a first performance test result.
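The calculations performed by the first and second processing units are simple arithmetic: execution time is the end time minus the start time, and the per-unit-time figures are the record count and the data volume value divided by that execution time. A minimal sketch, assuming seconds as the unit of time and kilobytes as the unit of the data volume value (the text fixes neither), with `RunRecord` as a hypothetical container:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass
class RunRecord:
    """One performance test run at one data volume level (hypothetical shape)."""
    level: str
    start: datetime        # start execution time
    end: datetime          # end execution time
    row_count: int         # number of data records in the table
    data_size_kb: float    # data volume value of the table (unit assumed: KB)


def execution_seconds(run: RunRecord) -> float:
    """First processing unit: execution time = end time - start time."""
    return (run.end - run.start).total_seconds()


def throughput(run: RunRecord) -> Tuple[float, float]:
    """Second processing unit: (records per second, KB per second)."""
    seconds = execution_seconds(run)
    if seconds <= 0:
        raise ValueError("end time must be later than start time")
    return run.row_count / seconds, run.data_size_kb / seconds
```

For a run that processes 1,000,000 records totalling 500,000 KB in 50 seconds, `throughput` returns 20,000 records per second and 10,000 KB per second; the comparative analysis unit would then compare such figures across data volume levels.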
It should be pointed out that the above description of the power distribution and utilization big data platform test system provided by the embodiment of the present invention is similar to the description of the power distribution and utilization big data platform test method above, and the system has the same beneficial effects as the method; these are not repeated here for brevity. For technical details not disclosed above for the test system provided by the embodiment of the present invention, refer to the description of the power distribution and utilization big data platform test method given above.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not every possible combination of the technical features in the above embodiments is described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this specification.
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art may make a number of variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. The protection scope of this patent shall therefore be determined by the appended claims.
Claims (10)
1. A power distribution and utilization big data platform test method, characterized by comprising:
generating, by simulation, power distribution and utilization data tables at a plurality of data volume levels;
performing a performance test on a target big data platform according to each of the power distribution and utilization data tables, and recording test-related parameters of the target big data platform at each data volume level, the test-related parameters including a start execution time, an end execution time, CPU utilization, disk I/O interface writing speed and memory usage;
determining the execution time of the target big data platform at each data volume level according to each start execution time and each end execution time;
determining, according to each execution time, the number of data records in each power distribution and utilization data table and the data volume value of each power distribution and utilization data table, the number of data records executed per unit time and the data volume value executed per unit time by the target big data platform at each data volume level;
comparing and analyzing the execution time, the number of data records executed per unit time, the data volume value executed per unit time, the CPU utilization, the disk I/O interface writing speed and the memory usage of the target big data platform at each data volume level, to obtain a first performance test result.
2. The power distribution and utilization big data platform test method according to claim 1, characterized by further comprising:
comparing and analyzing the processing time, the number of data records processed per unit time, the data volume value processed per unit time, the CPU utilization, the disk I/O interface writing speed and the memory usage of different types of big data platforms at each data volume level, to obtain a second performance test result.
3. The power distribution and utilization big data platform test method according to claim 1, characterized in that a plurality of big data platforms of the same type form a target big data platform cluster, and a reliability test is performed on the target big data platform cluster, the reliability test comprising:
selecting a fault to be tested from a preset fault set, the fault set including a main metadata node failure, a standby metadata node failure, a storage node failure, a storage node single-disk failure, a storage node network failure, a main job scheduling node failure and a task node disk failure;
performing fault simulation on the target big data platform cluster according to the fault to be tested, and verifying whether the use of the big data platform cluster or of its plug-ins is affected after the fault simulation, to obtain a verification result.
4. The power distribution and utilization big data platform test method according to claim 1, characterized by further comprising: performing a usability test on the target big data platform, the usability test including an installation and deployment test and an administration interface test;
the installation and deployment test includes testing whether the plug-ins installed on the target big data platform and their versions are compatible, and determining the platforms to which the target big data platform is adapted, the installation complexity, whether a graphical installer is available, the configuration complexity and the version information;
the administration interface test includes testing whether the management interface of the target big data platform can be accessed, testing whether the file system of the target big data platform can be used normally, and testing whether logs can be viewed under the management interface of the target big data platform, whether there is a graphical interface, whether there is templated service management and whether there is a log analysis function.
5. The power distribution and utilization big data platform test method according to claim 3, characterized by further comprising performing a dynamic expansion test on the target big data platform cluster, the dynamic expansion test comprising:
adding one data node to the running target big data platform cluster, and verifying whether the use of the file system is affected.
6. The power distribution and utilization big data platform test method according to claim 3, characterized by further comprising performing a security test on the target big data platform cluster, the security test comprising:
entering valid user information and invalid user information respectively in the running target big data platform cluster, and verifying in each case whether the system can be logged in to;
or
logging in to the system with each of the various permission levels in the running target big data platform cluster, and verifying whether the target big data platform cluster covers all system permissions;
or
after accessing the target big data platform cluster in an unauthorized manner, checking whether the audit log has recorded the unauthorized access.
7. The power distribution and utilization big data platform test method according to claim 1, characterized in that the performance test includes one of, or any combination of, a data write test, a data read test, a data query test, a data sorting test and a data association query test.
8. The power distribution and utilization big data platform test method according to claim 1, characterized by further comprising performing an algorithm model test on the target big data platform according to each of the power distribution and utilization data tables; the algorithm model test includes a K-Means clustering test and a Linear Regression classification test.
9. The power distribution and utilization big data platform test method according to claim 1, characterized in that the target big data platform is a CDH platform, a TDH platform, an HDP platform or an Oracle platform.
10. A power distribution and utilization big data platform test system, characterized by comprising:
a generation unit, configured to generate, by simulation, power distribution and utilization data tables at a plurality of data volume levels;
a test unit, configured to perform a performance test on a target big data platform according to each of the power distribution and utilization data tables, and to record test-related parameters of the target big data platform at each data volume level, the test-related parameters including a start execution time, an end execution time, CPU utilization, disk I/O interface writing speed and memory usage;
a first processing unit, configured to determine the execution time of the target big data platform at each data volume level according to each start execution time and each end execution time;
a second processing unit, configured to determine, according to each execution time, the number of data records in each power distribution and utilization data table and the data volume value of each power distribution and utilization data table, the number of data records executed per unit time and the data volume value executed per unit time by the target big data platform at each data volume level;
a comparative analysis unit, configured to compare and analyze the execution time, the number of data records executed per unit time, the data volume value executed per unit time, the CPU utilization, the disk I/O interface writing speed and the memory usage of the target big data platform at each data volume level, to obtain a first performance test result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610815863.3A CN106445763A (en) | 2016-09-09 | 2016-09-09 | Power distribution and utilization big data platform test method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106445763A true CN106445763A (en) | 2017-02-22 |
Family
ID=58169202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610815863.3A Pending CN106445763A (en) | 2016-09-09 | 2016-09-09 | Power distribution and utilization big data platform test method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106445763A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423818A (en) * | 2017-06-26 | 2017-12-01 | 中国电力科学研究院 | A kind of method and system of the test data set generation of power information acquisition system unified interface |
CN110049028A (en) * | 2019-04-03 | 2019-07-23 | 北京奇安信科技有限公司 | Monitor method, apparatus, computer equipment and the storage medium of domain control administrator |
CN110753025A (en) * | 2019-01-07 | 2020-02-04 | 陈庆梅 | Big data security access control method |
CN111796998A (en) * | 2019-06-27 | 2020-10-20 | 上海市计量测试技术研究院 | AML language performance verification system |
CN111796805A (en) * | 2019-06-27 | 2020-10-20 | 上海市计量测试技术研究院 | AML language performance verification method |
CN112231195A (en) * | 2020-12-14 | 2021-01-15 | 广东睿江云计算股份有限公司 | Cloud service performance testing method |
CN116737554A (en) * | 2023-05-30 | 2023-09-12 | 福芯高照(上海)科技有限公司 | Intelligent analysis processing system and method based on big data |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102254035A (en) * | 2011-08-09 | 2011-11-23 | 广东电网公司电力科学研究院 | Relational database testing method and system |
CN102968374A (en) * | 2012-11-29 | 2013-03-13 | 中国移动(深圳)有限公司 | Data warehouse testing method |
US20130159353A1 (en) * | 2011-12-20 | 2013-06-20 | International Business Machines Corporation | Generating a test workload for a database |
CN103425683A (en) * | 2012-05-18 | 2013-12-04 | 上海宝信软件股份有限公司 | Database performance test system |
CN104794007A (en) * | 2015-04-29 | 2015-07-22 | 中国电力科学研究院 | Mass data parallel processing testing method based on electric large data platform |
CN105389401A (en) * | 2015-12-25 | 2016-03-09 | 北京奇虎科技有限公司 | Method and device for testing performance of database |
CN105912681A (en) * | 2016-04-14 | 2016-08-31 | 国家电网公司 | Aging testing method and system for electricity marketing database |
Non-Patent Citations (1)
Title |
---|
连智博: "面向对象数据库系统评估与测试技术的研究", 《中国优秀硕士学位论文全文数据库》 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107423818A (en) * | 2017-06-26 | 2017-12-01 | 中国电力科学研究院 | A kind of method and system of the test data set generation of power information acquisition system unified interface |
CN110753025A (en) * | 2019-01-07 | 2020-02-04 | 陈庆梅 | Big data security access control method |
CN110049028A (en) * | 2019-04-03 | 2019-07-23 | 北京奇安信科技有限公司 | Monitor method, apparatus, computer equipment and the storage medium of domain control administrator |
CN110049028B (en) * | 2019-04-03 | 2021-03-23 | 奇安信科技集团股份有限公司 | Method and device for monitoring domain control administrator, computer equipment and storage medium |
CN111796998A (en) * | 2019-06-27 | 2020-10-20 | 上海市计量测试技术研究院 | AML language performance verification system |
CN111796805A (en) * | 2019-06-27 | 2020-10-20 | 上海市计量测试技术研究院 | AML language performance verification method |
CN111796998B (en) * | 2019-06-27 | 2024-05-07 | 上海市计量测试技术研究院 | AML language performance verification system |
CN111796805B (en) * | 2019-06-27 | 2024-05-07 | 上海市计量测试技术研究院 | AML language performance verification method |
CN112231195A (en) * | 2020-12-14 | 2021-01-15 | 广东睿江云计算股份有限公司 | Cloud service performance testing method |
CN112231195B (en) * | 2020-12-14 | 2021-03-30 | 广东睿江云计算股份有限公司 | Cloud service performance testing method |
CN116737554A (en) * | 2023-05-30 | 2023-09-12 | 福芯高照(上海)科技有限公司 | Intelligent analysis processing system and method based on big data |
CN116737554B (en) * | 2023-05-30 | 2023-12-22 | 内蒙古蒙嘟嘟科技服务有限公司 | Intelligent analysis processing system and method based on big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170222 |