CN106445763A

CN106445763A - Power distribution and utilization big data platform test method and system

Info

Publication number: CN106445763A
Application number: CN201610815863.3A
Authority: CN
Inventors: 赵云; 李鹏; 钱斌; 肖勇; 李秋硕; 赖宇阳
Original assignee: China South Power Grid International Co ltd; Power Grid Technology Research Center of China Southern Power Grid Co Ltd
Current assignee: China South Power Grid International Co ltd; Power Grid Technology Research Center of China Southern Power Grid Co Ltd
Priority date: 2016-09-09
Filing date: 2016-09-09
Publication date: 2017-02-22

Abstract

The invention relates to a power distribution and utilization big data platform test method and system. The method comprises the following steps: simulating and generating power distribution and utilization data tables at a plurality of data volume levels; carrying out a performance test on a target big data platform according to each power distribution and utilization data table, and recording the start execution time, end execution time, CPU utilization, disk I/O interface write speed, and memory utilization of the target big data platform at each data volume level; determining, from each start execution time and each end execution time, the execution time of the target big data platform at each data volume level, the number of records executed per unit time, and the data size executed per unit time; and comparing and analyzing the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization of the target big data platform at each data volume level. With the scheme of the invention, performance tests of various big data platforms can be realized.

Description

Power distribution and utilization big data platform test method and system

Technical field

The present invention relates to the technical field of electric power, and more particularly to a power distribution and utilization big data platform test method and system.

Background art

The core of the smart grid is the deep integration of the electrical energy flow and the information flow. After more than 30 years of rapid development, China has built an electrical energy network that is flexible in structure, advanced in technology, and reliable. With the rapid advance of information technology and a sharp fall in construction costs, the matching information network has also developed quickly. Correspondingly, the basic and operational data of power distribution and utilization, accumulated over many years and growing explosively in recent years, contain a large amount of extremely valuable information that urgently awaits deep mining and comprehensive utilization, so as to provide solid support for building a green, economical, and reliable smart grid.

To meet the challenge posed by the explosive growth of smart power distribution and utilization data and the urgent need for grid business innovation, research can be carried out, relying on existing electric energy data centers and integrated marketing and distribution information platforms, at levels such as the architecture of a smart power distribution and utilization big data system, data management, knowledge models, business innovation, and application demonstration. Such research further expands the sources and scale of basic data inside and outside the distribution and utilization grid (meteorology, economy, users, etc.), proposes innovative theories and technologies for power distribution and utilization big data, and builds a demonstration power distribution and utilization big data platform. Based on big data technology, it discovers users' electricity consumption patterns, interaction potential, and market behavior in massive power distribution and utilization data; improves the observability and controllability, security and reliability, and economic operation level of the distribution grid through extensive interaction with users; and improves users' electricity efficiency, customer satisfaction, and interactive response level through personalized services. At the same time it promotes the transformation of the distribution and utilization grid from the traditional business model to a big data service model based on data and information correlation, realizing business model innovation and increased social value.

At present, a wide variety of big data platforms are available on the domestic and international markets, and how to evaluate the performance of each big data platform has become a problem that urgently needs to be solved.

Summary of the invention

The object of the invention is to provide a power distribution and utilization big data platform test method and system that can realize performance tests of various big data platforms.

The object of the invention is achieved through the following technical solution:

A power distribution and utilization big data platform test method, comprising:

simulating and generating power distribution and utilization data tables at a plurality of data volume levels;

carrying out a performance test on a target big data platform according to each of the power distribution and utilization data tables, and recording test-related parameters of the target big data platform at each data volume level, the test-related parameters comprising a start execution time, an end execution time, CPU utilization, a disk I/O interface write speed, and memory utilization;

determining the execution time of the target big data platform at each data volume level according to each start execution time and each end execution time;

determining the number of records executed per unit time and the data size executed per unit time by the target big data platform at each data volume level according to each execution time, the record count of each power distribution and utilization data table, and the data size of each power distribution and utilization data table;

comparing and analyzing the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization of the target big data platform at each data volume level, to obtain a first performance test result.

A power distribution and utilization big data platform test system, comprising:

a generating unit, configured to simulate and generate power distribution and utilization data tables at a plurality of data volume levels;

a test unit, configured to carry out a performance test on a target big data platform according to each of the power distribution and utilization data tables, and to record test-related parameters of the target big data platform at each data volume level, the test-related parameters comprising a start execution time, an end execution time, CPU utilization, a disk I/O interface write speed, and memory utilization;

a first processing unit, configured to determine the execution time of the target big data platform at each data volume level according to each start execution time and each end execution time;

a second processing unit, configured to determine the number of records executed per unit time and the data size executed per unit time by the target big data platform at each data volume level according to each execution time, the record count of each power distribution and utilization data table, and the data size of each power distribution and utilization data table;

a comparative analysis unit, configured to compare and analyze the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization of the target big data platform at each data volume level, to obtain a first performance test result.

According to the scheme of the invention above, power distribution and utilization data tables at a plurality of data volume levels are simulated and generated; a performance test is carried out on the target big data platform according to each table, and the test-related parameters of the target big data platform at each data volume level are recorded, comprising the start execution time, end execution time, CPU utilization, disk I/O interface write speed, and memory utilization; the execution time at each data volume level is determined from each start execution time and each end execution time; the number of records executed per unit time and the data size executed per unit time at each data volume level are determined from each execution time and from the record count and data size of each table; and these quantities, together with the CPU utilization, disk I/O interface write speed, and memory utilization, are compared and analyzed to obtain a first performance test result. The scheme of the invention can be applied to each big data platform in turn, realizing a performance test of each platform; based on the test results, the required big data platform can be reasonably selected in light of one's own hardware configuration environment and business data volume.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the power distribution and utilization big data platform test method of Embodiment 1 of the invention;

Fig. 2 is a schematic structural diagram of the power distribution and utilization big data platform test system of Embodiment 2 of the invention.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit its scope of protection.

Embodiment 1

Embodiment 1 of the invention provides a power distribution and utilization big data platform test method; Fig. 1 is a schematic flowchart of the method. As shown in Fig. 1, the power distribution and utilization big data platform test method of this embodiment comprises the following steps:

Step S101: simulating and generating power distribution and utilization data tables at a plurality of data volume levels;

Specifically, power distribution and utilization data tables at a plurality of different data volume levels can be simulated and generated in an Oracle database, and a table name is assigned to the table at each data volume level; tables at different data volume levels have different table names.

Here, the number of data volume levels, and the number of records (also called the record count) and the size (also called the data size) of the power distribution and utilization data at each level, can be set according to actual needs. Table 1 shows the relevant parameters of the power distribution and utilization data tables at the data volume levels set in an actual test. However, the way the data volume levels of the tables are set in this embodiment is not limited to this.

Table 1: Relevant parameters of the power distribution and utilization data at each data volume level
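As a rough illustration of step S101, the sketch below generates simulated tables in memory, one per data volume level, each under its own table name. It does not use an Oracle database. The table names, record counts, and columns (meter ID, timestamp, kWh) are assumptions made for the example and are not the values of Table 1.

```python
import random
from datetime import datetime, timedelta

# Hypothetical data volume levels: (table name, number of records).
# The counts are illustrative only and are not the values of Table 1.
LEVELS = [("PDU_DATA_L1", 1_000), ("PDU_DATA_L2", 10_000), ("PDU_DATA_L3", 100_000)]


def generate_rows(record_count, seed=0):
    """Yield simulated meter readings as (meter_id, timestamp, kwh) tuples."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    start = datetime(2016, 1, 1)
    for i in range(record_count):
        meter_id = f"M{rng.randrange(1000):04d}"
        timestamp = start + timedelta(minutes=15 * i)
        kwh = round(rng.uniform(0.0, 5.0), 3)
        yield meter_id, timestamp.isoformat(), kwh


def build_tables(levels):
    """Return {table_name: list_of_rows}, one distinct table name per level."""
    names = [name for name, _ in levels]
    if len(set(names)) != len(names):
        raise ValueError("each data volume level needs its own table name")
    return {name: list(generate_rows(count)) for name, count in levels}
```

Calling `build_tables(LEVELS)` would produce three tables of increasing size; in a real test the rows would be loaded into the source database instead of being held in memory.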

Step S102: carrying out a performance test on the target big data platform according to each of the power distribution and utilization data tables, and recording test-related parameters of the target big data platform at each data volume level, the test-related parameters comprising a start execution time, an end execution time, CPU (Central Processing Unit) utilization, a disk I/O (Input/Output) interface write speed, and memory utilization;

Here, the performance test can include one of a data write test, a data read test, a data query test, a data sorting test, and a data association query test, or any combination of them.

The start execution time and the end execution time can have different meanings depending on the kind of performance test. For example, in a data write test, the start execution time is the time at which writing starts and the end execution time is the time at which writing ends; the same holds for the other kinds of performance test, which are not described one by one here.

The data write test and the data read test can each include write tests and read tests on three different types of data: structured data, unstructured data, and semi-structured data. Specifically, in a data write test, each power distribution and utilization data table generated in a specific Oracle database (the Oracle platform) is written to the target big data platform; in a data read test, the power distribution and utilization data tables in the target big data platform are written to that specific Oracle platform.

Here, the target big data platform is a CDH platform, a TDH platform, an HDP platform, or an Oracle platform.

CDH stands for Cloudera's Distribution Including Apache Hadoop, a big data management platform based on Apache Hadoop. TDH stands for Transwarp Data Hub, a Hadoop cluster big data platform. HDP stands for Hortonworks Data Platform, an Apache Hadoop big data management platform.

The CPU utilization, disk I/O interface write speed, and memory utilization can be recorded once every set time interval, or recorded only once at a set moment.
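The two recording modes (once per set interval, or a single reading at a set moment) could be sketched as follows. The `probe` callable is hypothetical: it stands in for whatever tool reads CPU utilization, disk I/O write speed, and memory utilization on the server under test, and the dictionary keys are assumed names.

```python
import time


def sample_metrics(probe, interval_s, samples, sleep=time.sleep):
    """Call probe() `samples` times, waiting interval_s seconds between calls.

    probe is a hypothetical callable returning a dict such as
    {"cpu_pct": ..., "disk_write_kbps": ..., "mem_pct": ...}.
    With samples=1 this records a single reading at the moment of the call.
    """
    if samples < 1:
        raise ValueError("at least one sample is required")
    readings = []
    for i in range(samples):
        readings.append(probe())
        if i < samples - 1:  # no wait is needed after the last reading
            sleep(interval_s)
    return readings
```

The `sleep` parameter is injectable only so that the function can be exercised without real waiting.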

Step S103: determining the execution time of the target big data platform at each data volume level according to each start execution time and each end execution time;

Specifically, the execution time at each data volume level is obtained by subtracting the corresponding start execution time from the end execution time at that level. For example, the execution time at data volume level 1 is the end execution time at level 1 minus the start execution time at level 1, the execution time at data volume level 2 is the end execution time at level 2 minus the start execution time at level 2, and so on.
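The subtraction in step S103 can be written out directly. The timestamps below are hypothetical values chosen for illustration.

```python
from datetime import datetime


def execution_time_s(start, end):
    """Execution time in seconds = end execution time - start execution time."""
    if end < start:
        raise ValueError("end execution time precedes start execution time")
    return (end - start).total_seconds()


# Hypothetical recorded (start, end) timestamps for two data volume levels.
recorded = {
    1: (datetime(2016, 9, 9, 10, 0, 0), datetime(2016, 9, 9, 10, 0, 42)),
    2: (datetime(2016, 9, 9, 10, 5, 0), datetime(2016, 9, 9, 10, 8, 30)),
}
exec_times = {level: execution_time_s(s, e) for level, (s, e) in recorded.items()}
# exec_times == {1: 42.0, 2: 210.0}
```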

Step S104: determining the number of records executed per unit time and the data size executed per unit time by the target big data platform at each data volume level according to each execution time, the record count of each power distribution and utilization data table, and the data size of each power distribution and utilization data table;

Specifically, the number of records executed per unit time at each data volume level is obtained by dividing the record count of the corresponding table by the corresponding execution time, and the data size executed per unit time at each data volume level is obtained by dividing the data size of the corresponding table by the corresponding execution time. For example, the record count of the level 1 table divided by the level 1 execution time gives the number of records executed per unit time at level 1, and the data size of the level 1 table divided by the level 1 execution time gives the data size executed per unit time at level 1; the same calculations with the level 2 table and the level 2 execution time give the values at level 2, and so on.
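The two ratios of step S104 amount to a pair of divisions per data volume level. The figures in the example are hypothetical, and MB is assumed as the unit of data size.

```python
def throughput(record_count, size_mb, exec_time_s):
    """Return (records per second, MB per second) for one data volume level."""
    if exec_time_s <= 0:
        raise ValueError("execution time must be positive")
    return record_count / exec_time_s, size_mb / exec_time_s


# Hypothetical level: 1,000,000 records totalling 200 MB, executed in 50 s.
records_per_s, mb_per_s = throughput(1_000_000, 200.0, 50.0)
# records_per_s == 20000.0, mb_per_s == 4.0
```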

Step S105: comparing and analyzing the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization of the target big data platform at each data volume level, to obtain a first performance test result.

Specifically, these six quantities are compared across the data volume levels to find, for the target big data platform, how each of them changes from one data volume level to the next and whether any of them jumps. In a specific implementation, a separate chart can be drawn for each of the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization, with the data volume level on the abscissa and the quantity in question on the ordinate; the trend of each quantity across the data volume levels is then read from its chart.

Here, the first performance test result can refer to the trends of the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization across the data volume levels.
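Besides reading trends from charts, the level-to-level comparison can be done numerically. The sketch below labels each step between adjacent data volume levels as up, down, or flat and flags a jump. The source does not define what counts as a jump, so the `jump_ratio` threshold is an assumption of this example.

```python
def trend_between_levels(values, jump_ratio=2.0):
    """Describe how a metric changes from one data volume level to the next.

    values: metric values ordered by data volume level.
    jump_ratio: assumed threshold; a step is flagged as a jump when the
    value grows or shrinks by at least this factor.
    Returns one (direction, is_jump) pair per pair of adjacent levels.
    """
    steps = []
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            direction = "up"
        elif curr < prev:
            direction = "down"
        else:
            direction = "flat"
        low, high = sorted((abs(prev), abs(curr)))
        # A change away from or down to zero is treated as a jump.
        jump = (high / low >= jump_ratio) if low > 0 else high > 0
        steps.append((direction, jump))
    return steps
```

For execution times of 10, 12, 30, 30, and 10 seconds at five levels, this returns up, up with a jump, flat, and down with a jump.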

If the CPU utilization, the disk I/O interface write speed, and the memory utilization are recorded once every set time interval, then what is compared and analyzed here is the average of the CPU utilization values, the average of the disk I/O interface write speed values, and the average of the memory utilization values recorded at the corresponding data volume level.
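Averaging the periodic readings for one data volume level might look like the sketch below. Each reading is assumed to be a dictionary of metric values; the key names are illustrative.

```python
from statistics import mean


def average_metrics(readings):
    """Average each metric over all readings taken at one data volume level."""
    if not readings:
        raise ValueError("no readings recorded for this level")
    keys = readings[0].keys()
    return {key: mean(r[key] for r in readings) for key in keys}
```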

Accordingly, in the scheme of this embodiment, power distribution and utilization data tables at a plurality of data volume levels are simulated and generated; a performance test is carried out on the target big data platform according to each table, and the test-related parameters of the target big data platform at each data volume level are recorded, comprising the start execution time, end execution time, CPU utilization, disk I/O interface write speed, and memory utilization; the execution time at each data volume level is determined from each start execution time and each end execution time; the number of records executed per unit time and the data size executed per unit time at each data volume level are determined from each execution time and from the record count and data size of each table; and these quantities, together with the CPU utilization, disk I/O interface write speed, and memory utilization, are compared and analyzed to obtain a first performance test result. The scheme of this embodiment can be applied to each big data platform in turn, realizing a performance test of each platform; based on the test results of each platform, the required big data platform can be reasonably selected in light of one's own hardware configuration environment and business data volume.

In addition to comparing and analyzing the test results of one big data platform across data volume levels, steps S101 to S104 can also be applied to different big data platforms to obtain, for each platform at each data volume level, the execution time, the number of records executed per unit time, the data size executed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization, and the data of the different big data platforms can then be compared and analyzed.

In one embodiment, the power distribution and utilization big data platform test method further comprises: comparing and analyzing the processing time, the number of records processed per unit time, the data size processed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization of different types of big data platforms at each data volume level, to obtain a second performance test result.

Specifically, at each data volume level, the processing time, the number of records processed per unit time, the data size processed per unit time, the CPU utilization, the disk I/O interface write speed, and the memory utilization of the different types of big data platforms can each be compared and analyzed to obtain the second performance test result. The optimum data platform can then be determined from the second performance test result together with the hardware configuration environment and the business data volume, which realizes the selection of a big data platform.
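One simple way to compare platforms level by level is to pick, for a chosen metric, the best-performing platform at each data volume level. The sketch assumes results are stored as nested dictionaries; the metric name and any numbers passed in are hypothetical, and real platform selection would also weigh hardware configuration and business data volume, as the text notes.

```python
def best_platform_per_level(results, metric, higher_is_better=True):
    """results: {platform: {level: {metric_name: value}}}.

    Returns {level: platform} naming the best platform for `metric` at each
    level that every platform was tested at.
    """
    level_sets = [set(levels) for levels in results.values()]
    common = set.intersection(*level_sets) if level_sets else set()
    pick = max if higher_is_better else min
    return {
        level: pick(results, key=lambda p: results[p][level][metric])
        for level in sorted(common)
    }
```

For time-like metrics such as processing time, pass `higher_is_better=False`; for throughput metrics the default applies.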

The performance tests above mainly test whether the behavior of the software system of the target big data platform under given conditions meets the performance indicators of the requirements specification. For example, by testing performance indicators such as the maximum transmission time limit, the transmission error rate, the calculation accuracy, the response time limit, and the recovery time limit, one verifies whether the software system of the big data platform can reach the performance indicators set out in the requirements specification and discovers performance bottlenecks in the software system, so that the software system can be optimized.

In addition, reliability is also an important evaluation indicator for a big data platform, so the reliability of the big data platform must also be tested. The reliability test mainly examines how the operation of the whole system is affected when a structured, unstructured, or semi-structured storage node, the network, or a single disk of the big data platform fails (or suffers an accident); based on the test results, the corresponding big data framework structure and network topology deployment structure are optimized.

Specifically, in one embodiment, in the power distribution and utilization big data platform test method of the invention, a target big data platform cluster is formed from a plurality of big data platforms of the same type, and a reliability test is carried out on the target big data platform cluster. The reliability test comprises: selecting a fault to be tested from a preset fault set, the fault set including a main metadata node fault, a standby metadata node fault, a storage node fault, a storage node single-disk fault, a storage node network fault, a main job scheduling node fault, and a task node disk fault; simulating the fault on the target big data platform cluster according to the fault to be tested; and verifying whether the use of the big data platform cluster or of its plug-ins is affected after the fault simulation, to obtain a verification result.

The network topology deployment structure of the target big data platform cluster can be set according to actual needs. The target big data platform cluster includes a main metadata node, a standby metadata node, storage nodes, a main job scheduling node, and task nodes. Simulating a main metadata node fault can be simulating the death of the main node process; simulating a standby metadata node fault can be deleting the standby metadata node; simulating a storage node fault can be shutting down a storage node; simulating a storage node single-disk fault can be manually pulling out all hard disks of one storage node; simulating a storage node network fault can be simulating an automatic crash on one storage node.
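A reliability test harness along these lines can be sketched with the preset fault set as an enumeration. How each fault is actually injected depends on the platform and the cluster, so `inject` and `cluster_usable` are hypothetical hooks supplied by the tester, not functions of any real platform.

```python
from enum import Enum


class Fault(Enum):
    """The preset fault set listed in the text."""
    MAIN_METADATA_NODE = "main metadata node fault"
    STANDBY_METADATA_NODE = "standby metadata node fault"
    STORAGE_NODE = "storage node fault"
    STORAGE_NODE_SINGLE_DISK = "storage node single-disk fault"
    STORAGE_NODE_NETWORK = "storage node network fault"
    MAIN_JOB_SCHEDULING_NODE = "main job scheduling node fault"
    TASK_NODE_DISK = "task node disk fault"


def run_reliability_test(faults, inject, cluster_usable):
    """Inject each fault, then check whether the cluster is still usable.

    inject(fault) and cluster_usable() are hypothetical hooks; restoring the
    cluster between faults is left to the caller.
    """
    results = {}
    for fault in faults:
        inject(fault)
        results[fault] = cluster_usable()
    return results
```

Running it over `list(Fault)` yields one verification result per fault in the set.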

In addition, the purpose of usability testing is to detect whether users are satisfied when using the system software of the big data platform; the aim is to make the system software fit the user's actual way of working rather than force the user's way of working to fit the software system. In usability testing of a big data platform, the main checks are whether the installation and deployment of the platform conform to the preset platform installation and deployment scheme, whether the log audit function is complete, and whether the user management interface is friendly.

In one embodiment, the power distribution and utilization big data platform test method of the invention can further comprise: carrying out a usability test on the target big data platform, the usability test including an installation and deployment test and a management interface test. The installation and deployment test includes testing whether the plug-ins installed on the target big data platform are compatible with its version, and checking information such as the platforms the target big data platform is adapted to, the complexity of installation, whether installation is graphical, the complexity of configuration, and the version information. The management interface test includes testing whether the management interface of the target big data platform can be accessed, whether the file system of the target big data platform can be used normally, and whether, in the management interface of the target big data platform, logs can be viewed, a graphical interface is available, templated service management is available, and a log analysis function is available.

Extensibility is a design indicator of the computing and processing capability of a software system. High extensibility represents elasticity: as the system grows, the software keeps its vitality, and a linear increase in the processing capability of the whole system, with high throughput and low latency, can be achieved with few changes or even simply by adding hardware. Power distribution and utilization data grow rapidly, by several TB every year, so expanding the capacity of the big data platform is routine, and the extensibility of a power distribution and utilization big data platform is therefore an important test indicator. In one embodiment, the power distribution and utilization big data platform test method of the invention can further comprise carrying out a dynamic extension test on the target big data platform cluster, the dynamic extension test comprising: adding a data node to the running target big data platform cluster, and verifying whether the use of the file system is affected.

In addition, the power distribution and utilization big data platform test method of the invention can further comprise carrying out a security test on the target big data platform cluster, the security test comprising: entering valid user information and invalid user information into the running target big data platform cluster, and verifying in each case whether the system can be logged into; or logging into the system of the running target big data platform cluster with each of the various permissions, and verifying whether the target big data platform cluster covers all system permissions; or, after the target big data platform cluster has been accessed in an illegal way, checking whether the audit log has recorded this unauthorized access.
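The login and audit checks of the security test could be automated roughly as follows. The `login` and `audit_log_contains` hooks are hypothetical wrappers around the platform under test, and treating a rejected login by an invalid user as the unauthorized access to look for in the audit log is an assumption of this sketch; the permission coverage check is not shown.

```python
def run_security_checks(login, audit_log_contains, valid_user, invalid_user):
    """Return pass/fail results for the login and audit checks.

    login(user, password) -> bool and audit_log_contains(text) -> bool are
    hypothetical hooks wrapping the platform under test. valid_user and
    invalid_user are (user, password) pairs.
    """
    return {
        "valid_user_accepted": login(*valid_user) is True,
        "invalid_user_rejected": login(*invalid_user) is False,
        # Assumption: the rejected attempt should appear in the audit log.
        "illegal_access_audited": audit_log_contains(invalid_user[0]),
    }
```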

In addition, after a usability test, reliability test, extensibility test, or security test has been carried out, the test results of different types of big data platforms (or different types of clusters) can also be compared and analyzed; this is not described one by one here.

The power distribution and utilization big data platform test method of the invention can also carry out algorithm model tests on the target big data platform according to each power distribution and utilization data table; the algorithm model tests include a K-Means clustering test and a Linear Regression classification test. When algorithm model tests are carried out on the target big data platform according to each table, the content recorded and the way it is processed are similar to those when performance tests are carried out on the target big data platform according to each table.

For the ease of understanding the solution of the present invention it is contemplated that the data characteristic of electrical network, below with respectively to CDH platform, TDH Illustrate as a example platform, HDP platform and four big data platforms of Oracle platform.

1) performance test

1.1) data write test

Here, data write test includes the write test to three kinds of different types of data, and these three are different types of The write test of data is respectively structural data write test, unstructured data write test and semi-structured data and writes Enter.

1.1.1) structural data write test

In oracle database, simulation generates structurized adapted electricity tables of data as other in 9 data magnitudes in table 1, 9 data magnitudes are not imported to CDH platform, TDH platform, HDP platform and Oracle platform from Oracle platform respectively, obtains CDH platform, TDH platform, HDP platform and Oracle platform 9 data magnitudes not under the importing time (unit be second), unit The data record bar number (unit is bar number/second) of time storage, the size of data (unit is the MB/ second) of unit interval storage, CPU Utilization rate, disk I/O interface writing speed (unit is the KB/ second) and memory usage, can be by these parameters drafting pattern respectively Table, abscissa is data magnitude, and vertical coordinate is corresponding value of consult volume.

Here, the import time of the CDH platform, TDH platform, HDP platform and Oracle platform at each of the 9 data volume levels equals the difference between the corresponding import end time and the corresponding import start time;

the number of data records stored per unit time by each platform at each of the 9 data volume levels equals the ratio of the corresponding total number of data records to the corresponding import time;

the data size stored per unit time by each platform at each of the 9 data volume levels equals the ratio of the corresponding total data size to the corresponding import time.
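The three derived quantities defined above can be sketched in a few lines of Python. This is only an illustration: the class name `ImportRun`, the function name `derive_metrics`, the field names and the sample numbers are assumptions and do not come from the source.

```python
from dataclasses import dataclass


@dataclass
class ImportRun:
    """One import of one data table into one platform (illustrative fields)."""
    platform: str          # e.g. "CDH", "TDH", "HDP" or "Oracle"
    total_records: int     # total number of data records imported
    total_size_mb: float   # total data size imported, in MB
    start_time: float      # import start time, in seconds
    end_time: float        # import end time, in seconds


def derive_metrics(run: ImportRun) -> dict:
    """Compute import time, records/second and MB/second as defined in the text."""
    import_time = run.end_time - run.start_time
    if import_time <= 0:
        raise ValueError("end_time must be later than start_time")
    return {
        "import_time_s": import_time,
        "records_per_s": run.total_records / import_time,
        "mb_per_s": run.total_size_mb / import_time,
    }


# Made-up sample values, used only to exercise the formulas.
run = ImportRun("CDH", total_records=1_000_000, total_size_mb=100.0,
                start_time=10.0, end_time=60.0)
metrics = derive_metrics(run)
```

Computing the values once per platform and per data volume level gives the points that are plotted against the data volume level.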

While the tables at the 9 data volume levels are being imported into the CDH platform, TDH platform, HDP platform and Oracle platform, the import start time and import end time of each platform at each level are recorded, together with the disk I/O, CPU and memory consumption of the servers.

The test results show: One) Import time. As the data volume level increases, the import time of each big data platform increases accordingly. When structured power distribution and utilization data of less than 100 million records are imported, the import times of the platforms differ little. Once the data volume level reaches 700 million, however, importing data from the Oracle platform into the Oracle platform takes markedly longer, while the import times of the CDH, TDH and HDP platforms remain close to one another.

Two) Number of data records stored per unit time. When the data volume level is within 10 million, the curves of all the big data platforms are relatively flat during import. As the data volume level increases from 10 million to 100 million, the number of records stored per unit time rises markedly on every platform. As the level increases further, from 700 million to 3 billion, the number of records stored per unit time on the Oracle platform shows a clear downward trend, while that of the other three platforms remains essentially unchanged.

Three) Data size stored per unit time. When the data volume level is within 50 million, the trend for all the big data platforms is relatively flat during import. As the level increases from 50 million to 100 million, the data size stored per unit time on the Oracle platform rises markedly, and that of the other three platforms also increases slowly. Once the level reaches 700 million, the data size stored per unit time on the Oracle platform falls back noticeably as the data volume grows further, while that of the other three platforms remains essentially unchanged. Above 700 million, the data size stored per unit time on each big data platform is essentially unchanged.

Four) CPU utilization. As the data set grows on each big data platform, the CPU utilization of the service nodes remains essentially unchanged, while that of the data nodes increases. The CPU utilization of the Oracle platform, however, is much higher than that of the other three platforms, whose CPU utilizations differ little from one another.

Five) Disk I/O interface write speed. As the data set grows on each big data platform, the disk I/O interface write speed of every platform rises markedly. Below 2 million records, the write speeds differ little. Between 5 million and 10 million records, the disk I/O interface write speed of the Oracle platform is clearly higher than that of the other three platforms, whose write speeds are slow. Above 50 million records, the I/O read/write speed of every platform rises substantially, whereas the disk I/O interface write speed of the Oracle platform falls back noticeably; at 100 million records, the disk I/O interface write speed of the other three platforms changes slowly.

Six) Memory utilization. As the data set grows on each big data platform, the memory utilization of each platform changes slowly, but during import the memory utilization of the Oracle platform is significantly lower than that of the other three platforms.

Summary: the data write test results for the CDH, TDH, HDP and Oracle platforms show:

1. Below a data volume of 100 million records, the big data platforms built on the Hadoop ecosystem lag considerably behind Oracle in write performance; only above 100 million records does the advantage of the big data platforms become apparent;

2. The CPU utilization of each service node of the big data platforms is relatively low, whereas the CPU utilization of the Oracle iSCSI node server is relatively high;

3. Above 100 million records, the disk I/O speed of each service node of the big data platforms depends on the performance of the hardware;

4. The memory of the big data platforms is almost entirely occupied by the big data services and cannot be used for other purposes.

1.1.2) unstructured data write test and semi-structured data write test

The unstructured data write test and the semi-structured data write test are both similar to the structured data write test; only the type of power distribution and utilization data differs. The unstructured data write test uses unstructured power distribution and utilization data (specifically, video files), and the semi-structured data write test uses semi-structured power distribution and utilization data tables. For brevity, they are not described again here.

1.2) data read test

Here, the data read test covers three different types of data: a structured data read test, an unstructured data read test and a semi-structured data read test.

1.2.1) structural data read test

In the Oracle database, structured power distribution and utilization data tables are generated by simulation at the 9 data volume levels listed in Table 1. The tables at the 9 data volume levels are imported into the Oracle platform from the CDH platform, the TDH platform, the HDP platform and the Oracle platform respectively. For each platform at each of the 9 data volume levels this yields the import time (in seconds), the number of data records stored per unit time (records/second), the data size stored per unit time (MB/second), the CPU utilization, the disk I/O read speed (KB/second) and the memory utilization. Each of these parameters can be plotted as a chart, with the data volume level on the horizontal axis and the corresponding parameter value on the vertical axis.

While the tables at the 9 data volume levels are being imported from the Oracle platform into the CDH platform, TDH platform, HDP platform and Oracle platform, the import start time and import end time of each platform at each level are recorded, together with the disk I/O, CPU and memory consumption of the servers.

The test results show: One) Import time. As the data volume level increases, the time taken to import into the Oracle platform from each big data platform increases accordingly. For data sets of less than 100 million records, the import times of the platforms differ little. Once the data volume reaches 700 million, however, the time taken to import data from the Oracle platform into the Oracle platform falls back noticeably, while the import times of the CDH, TDH and HDP platforms remain close to one another.

Two) Number of data records stored per unit time. When data are imported into the Oracle platform from each big data platform, the number of records stored per unit time rises markedly across the data volume levels. Because the value for the Oracle platform is comparatively large, the values of the other platforms are not prominent; when the data volume level exceeds 700 million, the data stored per unit time by each big data platform is essentially unchanged.

Three) Data size stored per unit time. When the data volume level is within 10 million, the curves of all the big data platforms are relatively flat during import. As the level increases from 10 million to 100 million, the data size stored per unit time on the Oracle platform rises markedly, and that of the other three platforms also increases slowly. Once the level reaches 700 million, the data size stored per unit time on the Oracle platform shows a clear downward trend as the data volume grows further, while that of the other three platforms remains essentially unchanged. Above 700 million, the data size stored per unit time on each big data platform is essentially unchanged.

Four) CPU utilization. As the data set grows on each big data platform, the CPU utilization of the service nodes remains essentially unchanged, while that of the data nodes increases. The CPU utilization of the Oracle platform, however, is higher than that of the other three platforms, whose CPU utilizations differ little from one another.

Five) Disk I/O interface write speed. As the data set grows on each big data platform, the disk I/O read speed of every platform rises markedly. Below 2 million records, the disk I/O read speeds differ little. Between 5 million and 10 million records, the disk I/O interface write speed of the Oracle platform is clearly higher than that of the other three platforms, whose disk I/O read speeds are steady. Above 50 million records, the I/O read/write speed of every platform rises substantially, whereas the disk I/O read speed of the Oracle platform falls back noticeably; at 100 million records, the disk I/O read/write speed of the other three platforms rises substantially.

Six) Memory utilization. As the data set grows on each big data platform, the memory utilization of each platform changes slowly, but during import the memory utilization of the Oracle platform is significantly lower than that of the other three platforms.

Summary: the test results for importing structured data from the CDH, TDH, HDP and Oracle platforms into Oracle show:

1. Throughout the test, exporting data from the 3 big data platforms was slower than exporting from Oracle. On analysis, the main reason is that the 3 big data platforms must first convert the data to HDFS before it can be imported into the Oracle platform, whereas Oracle imports directly into another Oracle instance; on this indicator Oracle therefore outperforms the 3 big data platforms;

2. Memory and CPU utilization follow the same trend as in the preceding performance test; only the disk I/O read/write speed of Oracle rises suddenly at 5 million records and then gradually declines.

1.2.2) unstructured data read test and semi-structured data read test

The unstructured data read test and the semi-structured data read test are both similar to the structured data read test; only the type of power distribution and utilization data differs. The unstructured data read test uses unstructured power distribution and utilization data (specifically, video files), and the semi-structured data read test uses semi-structured power distribution and utilization data tables. For brevity, they are not described again here.

1.3) data query test

On the CDH platform, TDH platform, HDP platform and Oracle platform respectively, specified objects are queried in the simulated power distribution and utilization data tables at the 9 test data levels described above, for example the electricity charges for February by user electricity consumption type, in order to verify how well each big data platform executes data queries. The query time of each platform at each test data level is recorded, and the CPU utilization, disk I/O interface write speed and memory utilization of each node are recorded at 5-second intervals.
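The recording scheme described above, timing one query while sampling node metrics at a fixed interval, can be sketched as follows. This is a minimal illustration under stated assumptions: `run_with_sampling`, `task` and `probe` are hypothetical names, and `probe` stands in for whatever mechanism actually reads CPU, disk I/O and memory figures on a node. The demo uses a much shorter interval than 5 seconds only so that it finishes quickly.

```python
import threading
import time


def run_with_sampling(task, probe, interval_s=5.0):
    """Run task() once and call probe() every interval_s seconds until it ends.

    Returns (elapsed_seconds, samples). Both callables are supplied by the
    caller; neither is defined by the source document.
    """
    samples = []
    done = threading.Event()

    def sampler():
        # Event.wait returns False on timeout, meaning the task is still running.
        while not done.wait(interval_s):
            samples.append(probe())

    thread = threading.Thread(target=sampler, daemon=True)
    start = time.monotonic()
    thread.start()
    try:
        task()
    finally:
        done.set()
        thread.join()
    elapsed = time.monotonic() - start
    return elapsed, samples


# Demo with stand-ins: a sleep replaces the query, a constant replaces the probe.
elapsed, samples = run_with_sampling(
    task=lambda: time.sleep(0.25),
    probe=lambda: {"cpu": 0.0},
    interval_s=0.05,
)
```

Sampling in a separate thread keeps the measured query time free of the cost of reading the metrics.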

The test results show that, as the data volume level increases, the query time of each platform increases accordingly. For queries on less than 100 million records, the query times of the big data platforms differ little. Once the data volume level reaches 700 million, however, the query time on the Oracle platform rises substantially, while the query times of the CDH, TDH and HDP platforms remain close to one another.

From the recorded query times, a chart is generated of the number of data records stored per unit time (records/second) at the different data volume levels for the query of February electricity charges by user electricity consumption type on each big data platform. The chart shows that, as the data set on each big data platform grows, the number of records stored per unit time during the query operation rises markedly; the value for the Oracle platform is relatively steady, however, while the values of the other platforms trend upward.

From the recorded query times, a chart is likewise generated of the data size stored per unit time (MB/second) at the different data volume levels for the same query on each big data platform. The chart shows that, as the data set on each big data platform grows, the data size stored per unit time during the query operation rises markedly; the value for the Oracle platform is relatively steady, while the values of the other platforms trend upward.

A CPU utilization chart is generated from the recorded CPU utilization. The chart shows that, as the data set grows on each big data platform, CPU utilization during the query operation increases. The CPU utilization of the Oracle platform is considerably higher than that of the other three platforms, and once the data sets of the other three platforms exceed 700 million records their CPU utilizations differ little.

A disk I/O interface write speed (KB/s) chart is generated from the recorded disk I/O interface write speed. The chart shows that, as the data set grows on each big data platform, the query operation raises the disk I/O read speed of every platform markedly. For queries on data sets of fewer than 5 million records, the disk I/O read speeds differ little. For data sets of 10 million to 50 million records, the increase in disk I/O read speed on each platform is small. For a data set of 100 million records, the I/O read/write speed of every platform rises substantially, whereas the disk I/O read speed of the Oracle platform falls back noticeably. For a data set of 3 billion records, the disk I/O read speed of every platform changes greatly.

A memory utilization chart is generated from the recorded memory utilization. The chart shows that, as the queried data set grows on each big data platform, the memory utilization of each platform changes slowly, but the memory utilization of the Oracle platform is significantly lower than that of the other three platforms.

Summary: the data query test results for the CDH, TDH, HDP and Oracle platforms show:

1. The 3 big data platforms differ very little in query performance, with CDH slightly ahead of TDH and HDP. For queries on fewer than 100 million records, Oracle outperforms the big data platforms; beyond that, its processing time grows longer.

1.4) data sorting test

On the CDH platform, TDH platform, HDP platform and Oracle platform respectively, a specified number of records (for example, 20) are taken from the simulated power distribution and utilization data tables at the 9 test data levels described above, and the records taken are sorted by a query (for example, a descending-order query). The query execution time, disk I/O, CPU and memory consumption of each platform at each test data level are recorded.
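As a rough sketch of such a timed descending-order query, the following uses Python's built-in `sqlite3` module purely as a stand-in; it is not one of the tested platforms. The table name `fee`, its columns, the synthetic rows and the reading of the step as `ORDER BY ... DESC LIMIT n` are all assumptions made for illustration.

```python
import sqlite3
import time

# In-memory stand-in table with synthetic rows (not real power data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fee (user_id INTEGER, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fee VALUES (?, ?, ?)",
    [(i, 2, float(i % 97)) for i in range(1000)],
)


def timed_top_n_desc(connection, n):
    """Return the n rows with the largest amount, plus the query time in seconds."""
    start = time.perf_counter()
    rows = connection.execute(
        "SELECT user_id, amount FROM fee ORDER BY amount DESC LIMIT ?", (n,)
    ).fetchall()
    return rows, time.perf_counter() - start


rows, elapsed = timed_top_n_desc(conn, 20)
```

Repeating the call for a table at each data volume level would give one execution time per level and per platform.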

The test results are illustrated below using the example of taking 100 records from each big data platform and sorting them in descending order to Oracle.

A chart is generated from the query execution times of the descending sort of 100 records taken from each big data platform to Oracle. The chart shows that, as the data volume level increases, the time each big data platform takes for the sorting operation increases accordingly. For sorts on less than 100 million records, the sorting times of the big data platforms differ little. Once the data volume level reaches 700 million, however, the sorting time on the Oracle platform rises substantially, while the sorting times of the CDH, TDH and HDP platforms remain close to one another.

A chart is generated from the number of data records stored per unit time (records/second) for the descending sort of 100 records taken from each big data platform to Oracle. The chart shows that, when each big data platform runs the descending-order query on 100 records, the number of records stored per unit time rises markedly across the data volume levels; the value for the Oracle platform is relatively small, while the values of the other platforms trend upward.

A chart is generated from the data size stored per unit time (MB/second) for the descending sort of 100 records taken from each big data platform to Oracle. The chart shows that, when the data volume level is within 5 million, the curve for sorting data on the Oracle platform fluctuates strongly; as the level increases from 5 million to 3 billion, the curve is relatively steady, and the data size stored per unit time on the other three platforms also increases slowly. Once the level reaches 1.5 billion, the data size stored per unit time on the Oracle platform falls back slightly as the data volume level increases further.

A chart is generated from the CPU utilization for the descending sort of 100 records taken from each big data platform to Oracle. The chart shows that, as the sorted data set grows on each big data platform, CPU utilization increases; the CPU utilization of the Oracle platform is higher than that of the other three platforms, whose CPU utilizations differ little from one another.

A chart is generated from the disk I/O interface write speed (KB/s) for the descending sort of 100 records taken from each big data platform to Oracle. The chart shows that, as the data set grows on each big data platform, the sorting operation raises the disk I/O read speed of every platform markedly. For data sets of fewer than 5 million records, the disk I/O read speed rises only slightly. For data sets of 10 million to 50 million records, the disk I/O read speeds of the platforms are similar. For a data set of 100 million records, the I/O read/write speed of every platform rises, whereas the disk I/O read speed of the Oracle platform falls back noticeably. For a data set of 3 billion records, the disk I/O read speed of every platform changes greatly.

A chart is generated from the memory utilization for the descending sort of 100 records taken from each big data platform to Oracle. The chart shows that, as the data set grows on each big data platform, the memory utilization of each platform changes slowly, but the memory utilization of the Oracle platform is significantly lower than that of the other three platforms.

Summary: the data sorting test results for the CDH, TDH, HDP and Oracle platforms are consistent with the data query test results.

1.5) data association inquiry test

First, power distribution and utilization data tables at the 9 test data volume levels listed in Table 1 are generated by simulation on the Oracle platform. Next, on the CDH platform, TDH platform, HDP platform and Oracle platform respectively, a left join query for the target data (for example, the electricity charges for February by user electricity consumption type) is run on each of the 9 power distribution and utilization data tables, and the query execution time, disk I/O, CPU and memory consumption of each platform at each test data level are recorded.
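A left join query of this kind can be sketched with Python's built-in `sqlite3` module, used here only as a stand-in for the tested platforms. The table names `fee` and `user_type`, their columns and the sample rows are hypothetical; the source names an electricity charge table and a user type table but does not give their schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fee (user_id INTEGER, month INTEGER, amount REAL);
CREATE TABLE user_type (user_id INTEGER, usage_type TEXT);
INSERT INTO fee VALUES (1, 2, 120.5), (2, 2, 80.0), (3, 2, 42.0), (1, 3, 99.0);
INSERT INTO user_type VALUES (1, 'residential'), (2, 'industrial');
""")

# February charges joined to the user type; a LEFT JOIN keeps charge rows
# whose user has no entry in user_type (usage_type is then NULL / None).
rows = conn.execute("""
SELECT f.user_id, u.usage_type, f.amount
FROM fee AS f
LEFT JOIN user_type AS u ON u.user_id = f.user_id
WHERE f.month = 2
ORDER BY f.user_id
""").fetchall()
```

User 3 has a February charge but no user type, so the left join keeps that row with an empty type instead of dropping it.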

The test results are described below for the left join query, on each big data platform, of the electricity charge table and the user type table in the power distribution and utilization data tables at each test data volume level, querying the electricity charges for February by user electricity consumption type.

From the query execution times in the test results, a time (seconds) chart is generated for the left join query of the electricity charge table and the user type table on each big data platform (February electricity charges by user electricity consumption type) at the different test data volume levels. The chart shows that, as the data volume level increases, the join query time of each platform increases accordingly. For join queries on less than 100 million records, the join query times of the big data platforms differ little. Once the data volume level reaches 700 million, however, the join query time on the Oracle platform rises substantially, while the join query times of the CDH, TDH and HDP platforms remain close to one another.

From the query execution times in the test results, a chart of the number of data records stored per unit time (records/second) is generated for the same left join query on each big data platform at the different test data volume levels. The chart shows that, as the data set of the left join query of the electricity charge table and the user type table grows on each big data platform, the number of records stored per unit time rises markedly; the value for the Oracle platform is relatively small, while the values of the other platforms trend upward.

From the query execution times in the test results, a chart of the data size stored per unit time (MB/second) is generated for the same left join query on each big data platform at the different test data volume levels. The chart shows that, when the data volume level is within 5 million, the curve for the join query on the Oracle platform fluctuates strongly; as the level increases from 5 million to 3 billion, the curve is relatively steady, and the data size stored per unit time on the other three platforms also increases slowly. Once the level reaches 1.5 billion, the data size stored per unit time on the Oracle platform falls back slightly as the data volume level increases further.

From the CPU utilization in the test results, a CPU utilization chart is generated for the same left join query on each big data platform at the different test data volume levels. The chart shows that, as the data set of the join query grows on each big data platform, CPU utilization increases; the CPU utilization of the Oracle platform is higher than that of the other three platforms, whose CPU utilizations differ little from one another.

From the disk I/O interface write speed (KB/s) in the test results, a disk I/O interface write speed (KB/s) chart is generated for the same left join query on each big data platform at the different test data volume levels. The chart shows that, as the data set grows on each big data platform, the disk I/O read speed of every platform rises markedly. For data sets below 1.5 billion records, the change in disk I/O read speed is relatively gentle. For a data set of 3 billion records, the disk I/O read speed of the Oracle platform is significantly lower than that of the other three platforms, whose disk I/O read speeds are very high.

From the memory utilization in the test results, a memory utilization chart is generated for the same left join query on each big data platform at the different test data volume levels. The chart shows that, as the data set grows on each big data platform, the memory utilization of each platform changes slowly, but the memory utilization of the Oracle platform is significantly lower than that of the other three platforms.

Summary: the data join query test results for the CDH, TDH, HDP and Oracle platforms are consistent with the data query test results.

2) reliability testing

2.1) memory node engine test

2.1.1) main metadata node failure test

A big data platform fault is simulated at the master node of each big data platform cluster to verify whether such a fault affects the use of the system and its plug-ins. The purpose is to test whether a big data platform fault at the master node (for example, a process dying) affects the big data platform cluster.

Specifically, for the target data platform cluster (any one of the big data platform clusters), after normal startup a directory listing operation is executed and the process number of the corresponding main metadata node is looked up. After the process with that process number is killed, it is checked whether the web page can still be accessed.
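The final step of this procedure, checking whether the web page can still be accessed, might be automated along the following lines. This is a minimal sketch using only the Python standard library; the function name `web_ui_reachable` is hypothetical, and the source does not say how the check is actually performed or which URL is used.

```python
import http.client
import urllib.error
import urllib.request


def web_ui_reachable(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if an HTTP GET to `url` succeeds with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status or malformed reply.
        return False
```

Calling the function with the address of the cluster's management page before and after the process is killed gives a simple pass/fail record for the test.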

The test results show that, when the main metadata is lost or fails, system operation is disrupted on the 3 big data platform clusters and on the Oracle cluster platform alike; once the main metadata has been restored, no business data is lost.

2.1.2) standby metadata node fault test

A big data platform fault is simulated at the standby metadata node of each big data platform cluster to verify whether such a fault affects the use of the system and its plug-ins. The purpose is to test whether a big data platform fault at the standby metadata node (for example, a hard disk failure) affects the big data platform cluster.

2.1.3) memory node fault test

A big data platform fault is simulated at one of the data nodes of each big data platform cluster to verify whether a fault at the data node (for example, deletion of the node) affects the use of the system and its plug-ins. The purpose is to test whether a big data platform fault at the standby metadata node (for example, a hard disk failure) affects the big data platform cluster.

2.1.4) memory node list disk failure test

A hard disk failure is simulated at one of the data nodes of each big data platform cluster to verify whether a hard disk failure at a data node affects the use of the system and its plug-ins. The purpose is to test whether a hard disk failure at one of the data nodes affects the big data platform cluster.

The test results show that a hard disk failure on any node in a big data platform cluster has no effect on the integrity of the stored data.

2.1.5) storage node network fault test

A network failure (machine crash) is simulated at one of the data nodes of each big data platform cluster to verify whether a network failure at a data node affects the use of the system and its plug-ins. The purpose is to test whether a network failure at one of the data nodes affects the big data platform cluster.

The test results show that the unexpected crash of any storage node in a big data platform cluster does not affect the normal use of the file system and HBase.

2.2) parallel computation engine test

2.2.1) main job scheduling node failure test

A failure of the main job scheduling node is simulated in each big data platform cluster to verify whether the network failure at the data node affects the use of the scheduling system and HBase. The purpose is to test whether a failure of the main job scheduling node in the cluster affects the integrity of the data.

The test results show that a failure of the scheduling master node of each big data platform cluster affects the integrity of the stored data.

2.2.2) task node disk failure test

A hard disk failure is simulated at one of the task data nodes of each big data platform cluster to verify whether a hard disk failure at a task data node affects the use of the scheduling system and Oozie. The purpose is to test whether manually removing any one Oozie node in each big data platform cluster affects the file system.

The test results show that deleting a cluster Oozie scheduling node on each big data platform has no effect on the integrity of the stored data.

3) usability testing

3.1) installation and deployment test

The installation and deployment test compares how easy or difficult each big data platform is to install, and tests whether the installed plug-ins and versions of each big data platform are compatible and whether they work normally. It covers information such as the platforms that each big data platform installation supports, the complexity of the installation process, whether a graphical interface is available, and the complexity of configuration.

3.2) administration interface test

Simulated tests check whether the management interface of each big data platform can be accessed and whether the HDFS file system can be used normally. Audit logs, monitoring analysis, template management and similar content are inspected in the management interface of each big data platform.

4) extension and safety test

4.1) dynamic expansion test

One data node is added to each big data platform cluster to verify whether this affects the use of the file system.

Manually adding one data node to each big data platform cluster does not affect viewing or using the file system and the HBase database.

4.2) authentication test

In each running big data platform cluster, logins are simulated with a password, with invalid user information, and with added and deleted users, to check whether each can log in to the system.

4.3) access control test

Logins with various permission levels are made to each running target big data platform cluster to verify whether the corresponding target big data platform cluster covers all system permissions.

4.4) audit testing

Unauthorized access is carried out on each running target big data platform cluster. The corresponding log is checked to see whether the unauthorized access was recorded, and whether the recorded time, IP address, user name, operation and other information are consistent with that unauthorized access.
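The consistency check between an unauthorized access attempt and the log can be sketched as a simple record comparison. The names `AccessEvent` and `find_matching_record`, the field layout and the 2-second tolerance on the recorded time are assumptions for illustration; the source does not specify a log format or a tolerance.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AccessEvent:
    timestamp: float  # seconds since the epoch
    ip: str
    user: str
    operation: str


def find_matching_record(attempt: AccessEvent,
                         log_records: Iterable[AccessEvent],
                         max_skew_s: float = 2.0) -> Optional[AccessEvent]:
    """Return the first log record that matches the attempt, or None.

    IP address, user name and operation must be identical; the recorded time
    may differ from the attempt time by at most max_skew_s seconds.
    """
    for rec in log_records:
        if (rec.ip == attempt.ip
                and rec.user == attempt.user
                and rec.operation == attempt.operation
                and abs(rec.timestamp - attempt.timestamp) <= max_skew_s):
            return rec
    return None


# Made-up example values (documentation-range IP addresses).
attempt = AccessEvent(1000.0, "192.0.2.10", "guest", "read /secure/table")
log = [
    AccessEvent(990.0, "192.0.2.11", "admin", "login"),
    AccessEvent(1000.8, "192.0.2.10", "guest", "read /secure/table"),
]
match = find_matching_record(attempt, log)
```

A result of `None` would mean the audit log either missed the access or recorded it with inconsistent details.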

5) algorithm model test case

5.1) K-Means cluster

Specifically, power distribution and utilization data tables at 9 data volume levels are generated by simulation on the big data platform, and a K-Means clustering test is run on the table at each data volume level. Each clustering is executed 5 or more times, the execution times are recorded, and the average time is taken as the final result of the performance test. The disk I/O, network I/O, memory utilization and CPU usage of each node are recorded at 5-second intervals.
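The timing scheme for the clustering test, repeated runs with the mean taken as the result, can be sketched as follows. The K-Means here is a plain pure-Python Lloyd's algorithm on synthetic 2-D points, chosen only to keep the example self-contained; it is not the implementation used by any of the tested platforms, and the names `kmeans` and `average_runtime` are hypothetical.

```python
import random
import statistics
import time


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on tuples of floats; returns the k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign the point to the nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def average_runtime(fn, runs=5):
    """Run fn() `runs` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations)


# Synthetic stand-in data: two well-separated groups of 2-D points.
rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
points += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]

centroids = sorted(kmeans(points, k=2))
mean_seconds = average_runtime(lambda: kmeans(points, k=2), runs=5)
```

Averaging over several runs smooths out run-to-run noise such as caching effects or background load.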

5.2) Linear Regression classification test

Specifically, power distribution and utilization data tables at 9 data volume levels are generated by simulation in the big data platform; a Linear Regression classification test is carried out on different fields of each data table, such as the electricity charge field and the SIC code field; the execution time is recorded, and the average time is taken as the final result of the performance test; at 5-second intervals, the disk I/O, network I/O, memory usage and CPU usage of each node are recorded.
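The fixed-interval resource recording used in both algorithm tests can be sketched as a small sampling loop. The names here are assumptions: `sample_metrics` is a hypothetical helper, and `probe` stands in for whatever node-level collector returns the disk I/O, network I/O, memory usage and CPU usage readings.

```python
import time


def sample_metrics(probe, interval_s=5.0, samples=3, sleep=time.sleep):
    """Call `probe()` `samples` times, waiting `interval_s` seconds between calls."""
    readings = []
    for i in range(samples):
        if i > 0:
            sleep(interval_s)   # wait between consecutive readings
        readings.append(probe())
    return readings
```

In practice one such loop would run per node for the duration of the test, and the readings would be stored alongside the recorded execution times.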

The results of the above tests are summarized as follows:

(1) performance evaluation

1. The CDH, TDH, HDP and Oracle platforms were tested in the same hardware environment. When the data volume is about 100 million records (about 10 GB in size), the insert, read/write, join and query performance of the Oracle relational database (the Oracle platform) is superior to that of the other 3 big data platforms; when the data volume exceeds 700 million records (or 30 GB in size), the performance of the other big data platforms is much better than that of Oracle. In this test environment, the performance of CDH is slightly better than that of TDH and HDP;

2. In this test environment, the CPU utilization of the CDH, TDH and HDP platforms is relatively low, with average utilization within about 10%, whereas on the Oracle platform the utilization of the accessed node server reached 70-80% while the other service nodes were at 7-9%;

3. As for disk I/O reads and writes, the 3 big data platforms CDH, TDH and HDP level off after reaching the maximum read/write rate of the hardware disks, whereas the disk I/O of Oracle stays at a relatively low average and fluctuates considerably at each order of magnitude;

4. As for memory during the tests, the memory usage of the main metadata node of the 3 big data platforms CDH, TDH and HDP is relatively low, but the memory usage of the other data nodes stays above 95% throughout. The memory usage of Oracle fluctuates: after reaching a certain peak, it decreases gradually.

(2) reliability evaluation

1. In this test environment, once the main metadata node of CDH, TDH or HDP fails, the system cannot run normally; after the standby metadata is restored, the system runs normally again with no data loss. When the Oracle main metadata fails, the system likewise cannot run normally and must be recovered from backup files;

2. For CDH, TDH and HDP, in the event of a storage node fault, disk fault or network fault, the system continues to run normally as long as the fault is within the range of node faults that its configuration allows. When an Oracle storage node fails, the platform cannot run normally and the data may not be fully recoverable;

3. For parallel computation on CDH, TDH and HDP, once the main task fails, the computing task cannot run normally; when a task node fails, its computing tasks are transferred to other task nodes.

(3) usability evaluation

1. The 3 big data platforms CDH, TDH and HDP are all installed from a command window, which is relatively difficult, and they are applicable only to the Linux operating platform, whereas Oracle has a graphical installation interface and is relatively easy to install and configure;

2. The 3 big data platforms CDH, TDH and HDP all have a web administration interface; Oracle also has a web administration interface, but it is fairly simple.

(4) extension and security evaluation

1. In terms of extensibility, all 4 platforms support node expansion, and data and computation are not affected after expansion;

2. In terms of security, all 4 platforms perform authentication and keep audit log records; however, the audit log records of Oracle are analyzed by command, whereas those of CDH, TDH and HDP can be viewed directly in the web interface.

In summary, when selecting a data platform, one should take into account one's own hardware configuration environment and the volume of business data.

Embodiment two

According to the power distribution and utilization big data platform test method in the above embodiment, the present invention also provides a power distribution and utilization big data platform test system. Fig. 2 is a schematic structural diagram of the power distribution and utilization big data platform test system of Embodiment two of the present invention. As shown in Fig. 2, the power distribution and utilization big data platform test system of this embodiment includes a generating unit 201, a test unit 202, a first processing unit 203, a second processing unit 204 and a comparative analysis unit 205, wherein:

The generating unit 201 is configured to generate, by simulation, power distribution and utilization data tables at multiple data volume levels;

The test unit 202 is configured to carry out a performance test on the target big data platform according to each of the power distribution and utilization data tables, and to record test-related parameters of the target big data platform at each of the data volume levels, the test-related parameters including a start execution time, an end execution time, CPU utilization, disk I/O interface write speed and memory usage;

The first processing unit 203 is configured to determine, according to each start execution time and each end execution time, the execution time of the target big data platform at each of the data volume levels;

The second processing unit 204 is configured to determine, according to each execution time, the number of data records in each power distribution and utilization data table and the data volume value of each power distribution and utilization data table, the number of data records executed per unit time and the data volume executed per unit time by the target big data platform at each of the data volume levels;

The comparative analysis unit 205 is configured to compare and analyze the execution time, the number of data records executed per unit time, the data volume executed per unit time, the CPU utilization, the disk I/O interface write speed and the memory usage of the target big data platform at each of the data volume levels, to obtain a first performance test result.
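The calculations attributed to the first and second processing units can be sketched as follows. This is an illustrative sketch only: `LevelRun` and `derive_metrics` are hypothetical names, and the choice of seconds and bytes as units is an assumption, since the text does not fix the units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelRun:
    """Recorded values for one data volume level (hypothetical layout)."""
    level: str          # label of the data volume level
    start_s: float      # start execution time, in seconds
    end_s: float        # end execution time, in seconds
    record_count: int   # number of data records in the data table
    data_bytes: int     # data volume value of the data table, in bytes


def derive_metrics(run):
    """Compute execution time and the two per-unit-time figures for one level."""
    execution_time = run.end_s - run.start_s
    if execution_time <= 0:
        raise ValueError("end time must be later than start time")
    return {
        "level": run.level,
        "execution_time_s": execution_time,
        "records_per_s": run.record_count / execution_time,
        "bytes_per_s": run.data_bytes / execution_time,
    }
```

The comparative analysis step would then place these derived figures side by side across data volume levels, together with the recorded CPU utilization, disk I/O interface write speed and memory usage.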

It should be pointed out that the above description of the power distribution and utilization big data platform test system provided by the embodiment of the present invention is similar to the description of the above power distribution and utilization big data platform test method, and the system has the same beneficial effects as that method; to save space, these are not repeated here. Therefore, for technical details not disclosed above for the test system provided by the embodiment of the present invention, refer to the description of the test method provided above.

The technical features of the embodiments described above can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this specification.

The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be defined by the appended claims.

Claims

1. A power distribution and utilization big data platform test method, characterized by comprising:

Generating, by simulation, power distribution and utilization data tables at multiple data volume levels;

Carrying out a performance test on a target big data platform according to each of the power distribution and utilization data tables, and recording test-related parameters of the target big data platform at each of the data volume levels, the test-related parameters including a start execution time, an end execution time, CPU utilization, disk I/O interface write speed and memory usage;

Determining, according to each start execution time and each end execution time, the execution time of the target big data platform at each of the data volume levels;

Determining, according to each execution time, the number of data records in each power distribution and utilization data table and the data volume value of each power distribution and utilization data table, the number of data records executed per unit time and the data volume executed per unit time by the target big data platform at each of the data volume levels;

Comparing and analyzing the execution time, the number of data records executed per unit time, the data volume executed per unit time, the CPU utilization, the disk I/O interface write speed and the memory usage of the target big data platform at each of the data volume levels, to obtain a first performance test result.

2. The power distribution and utilization big data platform test method according to claim 1, characterized by further comprising:

Comparing and analyzing the processing time, the number of data records processed per unit time, the data volume processed per unit time, the CPU utilization, the disk I/O interface write speed and the memory usage of different types of big data platforms at each data volume level, to obtain a second performance test result.

3. The power distribution and utilization big data platform test method according to claim 1, characterized in that a target big data platform cluster is formed from multiple big data platforms of the same type, and a reliability test is carried out on the target big data platform cluster, the reliability test comprising:

Selecting a fault to be tested from a preset fault set, the fault set including a main metadata node fault, a standby metadata node fault, a storage node fault, a storage node single-disk fault, a storage node network fault, a main job scheduling node fault and a task node disk fault;

Carrying out fault simulation on the target big data platform cluster according to the fault to be tested, and verifying whether the fault simulation affects the use of the big data platform cluster or of its plug-ins, to obtain a verification result.

4. The power distribution and utilization big data platform test method according to claim 1, characterized by further comprising: carrying out a usability test on the target big data platform, the usability test including an installation and deployment test and an administration interface test;

The installation and deployment test includes testing whether the plug-ins and versions installed on the target big data platform are compatible, and detecting information such as the platforms to which the target big data platform adapts, the installation complexity, whether a graphical installer is available and the configuration complexity, as well as version information;

The administration interface test includes testing whether the management interface of the target big data platform can be accessed, testing whether the file system of the target big data platform can be used normally, and testing whether, under the management interface of the target big data platform, logs can be viewed, a graphical interface is available, templated service management is available and a log analysis function is available.

5. The power distribution and utilization big data platform test method according to claim 3, characterized by further comprising carrying out a dynamic expansion test on the target big data platform cluster, the dynamic expansion test comprising:

Adding one data node to the running target big data platform cluster and verifying whether the addition affects the use of the file system.

6. The power distribution and utilization big data platform test method according to claim 3, characterized by further comprising carrying out a security test on the target big data platform cluster, the security test comprising:

Entering valid user information and invalid user information respectively in the running target big data platform cluster, and verifying in each case whether the system can be logged into;

Or

Logging in to the system with each of the various permission levels in the running target big data platform cluster, and verifying whether the target big data platform cluster covers all system permissions;

Or

After accessing the target big data platform cluster in an unauthorized manner, checking whether the audit log has recorded this unauthorized access.

7. The power distribution and utilization big data platform test method according to claim 1, characterized in that the performance test includes one of, or any combination of, a data write test, a data read test, a data query test, a data sorting test and a data association query test.

8. The power distribution and utilization big data platform test method according to claim 1, characterized by further comprising carrying out an algorithm model test on the target big data platform according to each of the power distribution and utilization data tables; the algorithm model test includes a K-Means clustering test and a Linear Regression classification test.

9. The power distribution and utilization big data platform test method according to claim 1, characterized in that the target big data platform is a CDH platform, a TDH platform, an HDP platform or an Oracle platform.

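10. A power distribution and utilization big data platform test system, characterized by comprising:

A generating unit, configured to generate, by simulation, power distribution and utilization data tables at multiple data volume levels;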

A test unit, configured to carry out a performance test on a target big data platform according to each of the power distribution and utilization data tables, and to record test-related parameters of the target big data platform at each of the data volume levels, the test-related parameters including a start execution time, an end execution time, CPU utilization, disk I/O interface write speed and memory usage;

A first processing unit, configured to determine, according to each start execution time and each end execution time, the execution time of the target big data platform at each of the data volume levels;

A second processing unit, configured to determine, according to each execution time, the number of data records in each power distribution and utilization data table and the data volume value of each power distribution and utilization data table, the number of data records executed per unit time and the data volume executed per unit time by the target big data platform at each of the data volume levels;

A comparative analysis unit, configured to compare and analyze the execution time, the number of data records executed per unit time, the data volume executed per unit time, the CPU utilization, the disk I/O interface write speed and the memory usage of the target big data platform at each of the data volume levels, to obtain a first performance test result.