CN107016501A - An efficient multidimensional analysis method for industrial big data - Google Patents
- Publication number
- CN107016501A (application CN201710190553.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- opc
- multidimensional analysis
- row
- json
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The present invention relates to an efficient multidimensional analysis method for industrial big data, comprising the following steps: (1) an OPC extraction program reads data from an OPC server and writes it to the Hadoop distributed file system; (2) the data stored in the Hadoop distributed file system is converted into a detail table in a Hive data warehouse; (3) the required rows and columns of the detail table are filtered out to form a fact table; (4) the resulting fact table is joined with a dimension table to form a topic-oriented wide table, and multidimensional analysis is performed on the wide table to obtain the analysis result. The dimension table comes from an external system and must be manually imported into the Hive data warehouse by the user. The method achieves elastic storage, elastic computation, high availability, and a general and efficient analysis method.
Description
Technical field
The present invention relates to the technical field of databases, and in particular to an efficient multidimensional analysis method for industrial big data.
Background art
With the spread of industrial informatization, factories increasingly use automated control equipment and smart instruments in their production processes, and this equipment generates large volumes of real-time data. The OPC protocol is an industrial standard supported by most automated control equipment and smart instruments. Through the unified OPC interface these devices expose, an application can easily obtain the large amount of real-time data produced during production. These data reflect the various states of the production process, and analyzing them can help optimize production flows, prevent defects and failures, reduce production costs, and improve production efficiency.
Existing industrial real-time data analysis is usually carried out with traditional relational databases. Compared with distributed storage and computing platforms based on Hadoop, a traditional relational database has small storage capacity, weak computing power, and a high cost of expansion. As industrial real-time data keeps growing, analysis with a traditional relational database is forced to sample coarsely and discard large amounts of field data, which reduces the accuracy of the analysis results. As industrial real-time analysis workloads continue to diversify, the computing power of the traditional relational database also becomes a bottleneck that limits the expansion and timeliness of the analysis business.
As for analysis methods, the traditional approach is to develop separately for each individual business: different businesses use separate data models and presentation methods, ignoring what data analysis tasks have in common. As the number of analysis tasks grows, both maintenance and extension become increasingly difficult. Moreover, real data analysis is often exploratory, so this business-customized style of analysis is not flexible enough. It constrains the analyst's divergent thinking, easily leads to fixed mindsets, and hinders the discovery of more valuable information in massive data.
Summary of the invention
To overcome the above shortcomings, the present invention aims to provide an efficient multidimensional analysis method for industrial big data. The invention builds a data warehouse on Hadoop, an open-source distributed computing technology, imports large-scale industrial real-time data in full into the data warehouse, and performs multidimensional analysis modeling for different data analysis businesses according to a unified flow. The data it holds is large and complete, the analysis method is general and efficient, and the whole system is easy to maintain and extend. The method achieves elastic storage, elastic computation, high availability, and a general and efficient analysis method.
The present invention achieves the above object through the following technical solution: an efficient multidimensional analysis method for industrial big data, comprising the following steps:
(1) an OPC extraction program reads data from an OPC server and writes it to the Hadoop distributed file system;
(2) the data stored in the Hadoop distributed file system is converted into a detail table in a Hive data warehouse;
(3) the required rows and columns of the detail table are filtered out to form a fact table;
(4) the resulting fact table is joined with a dimension table to form a topic-oriented wide table, and multidimensional analysis is performed on the wide table to obtain the analysis result; the dimension table comes from an external system and must be manually imported into the Hive data warehouse by the user.
Preferably, the OPC extraction program reads industrial real-time data from the OPC server according to the standard OPC protocol and stamps it with the time of reading; this counts as one read. Suppose the data points provided by the OPC server are key1, key2, key3, with corresponding data value1, value2, value3. The timestamp is represented by timestamp, and its value has the form yyyy-MM-dd-HH-mm-ss, where y denotes the year, M the month, d the day, H the hour, m the minute, and s the second. The OPC extraction program describes the data of one read in JSON format, forming the following string:
{"key1":"value1","key2":"value2","key3":"value3","timestamp":"yyyy-MM-dd-HH-mm-ss"}.
Preferably, the extraction frequency of the OPC extraction program is on the order of seconds. The extracted data is written to the Hadoop distributed file system as one JSON string per line; the JSON strings are merged into one or more files by the hour, and the files belonging to each hour are placed in the same folder. After the OPC extraction program has finished writing one hour of data, it generates a file of size 0 named _SUCCESS in the corresponding folder; the _SUCCESS file serves as the criterion for whether the data in the current folder is complete.
Preferably, the flow by which step (2) obtains the detail table in the Hive data warehouse is as follows: (a) using the workflow scheduling tool Oozie provided with the Hadoop distributed file system, JSON files older than the buffer period are deleted from the Hadoop distributed file system, where the buffer period is preset;
(b) according to the merge time unit of the data, Oozie starts a scheduled task that converts the JSON-format data into a two-dimensional detail table and loads it into the Hive data warehouse on the Hadoop platform for subsequent data processing.
Preferably, the subsequent data processing is as follows:
(I) create a temporary table containing only a single string column and load it on top of the JSON data;
(II) use Hive's JSON parsing function json_tuple to convert the keys in the JSON strings into column names and the values into column values.
Preferably, column filtering of the detail table excludes the data points unrelated to the topic, and row filtering narrows the value range of the data points related to the topic. In step (3), filtering is performed by transposing the detail table, turning rows into columns and columns into rows, keeping the timestamp column unchanged, and adding a new column that holds the name of the data point.
Preferably, the fact table and the dimension table are joined on one or more columns to form the topic-oriented wide table.
Preferably, multidimensional analysis of the wide table is implemented through Impala, the real-time SQL query tool provided with the Hadoop platform: an external data visualization tool accesses the data of the wide table through the standard SQL driver that Impala provides and completes the multidimensional analysis.
The beneficial effects of the present invention are: (1) the OPC extraction program can write the full volume of industrial real-time data into the data warehouse; because the data is stored in the Hadoop distributed file system, which scales out easily, no data has to be discarded through sampling because of limited storage capacity; (2) the topic-oriented wide table is generated with Hadoop-based distributed computing engines, so the modeling process completes in a relatively short time even with large data volumes and intensive computation; (3) because the storage capacity of the Hadoop distributed file system is very large, a certain degree of data redundancy can be tolerated in the wide table, so the data visualization system can avoid multi-table joins and provide low-latency multidimensional analysis; (4) with the Hadoop-based workflow scheduling engine, the modeling computation is highly fault-tolerant and stable; (5) the method has a clear flow and strong generality, and can easily be extended to the various multidimensional analysis businesses of industrial big data.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method of the invention;
Fig. 2 is a schematic diagram of the system architecture implementing the invention.
Detailed description of the embodiments
The invention is further described below with reference to a specific embodiment, but the scope of protection of the invention is not limited thereto:
Embodiment: as shown in Fig. 1, an efficient multidimensional analysis method for industrial big data comprises the following steps:
(1) Read data from the OPC server and write it to the Hadoop distributed file system;
The OPC extraction program reads all industrial real-time data from the OPC server according to the standard OPC protocol and stamps it with the time of reading; this counts as one read. Suppose the data points the OPC server can provide are key1, key2, key3, with corresponding data value1, value2, value3. The timestamp is represented by timestamp, and its value has the form yyyy-MM-dd-HH-mm-ss, where y denotes the year, M the month, d the day, H the hour, m the minute, and s the second. The OPC extraction program describes the data of one read in JSON format, forming the following string:
{"key1":"value1","key2":"value2","key3":"value3","timestamp":"yyyy-MM-dd-HH-mm-ss"}
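As an illustration only, the record format above could be produced as in the following minimal Python sketch. The function name `build_record` and the sample read time are hypothetical; the patent does not specify how the extraction program is implemented or how it talks to the OPC server.

```python
import json
from datetime import datetime

# Python strftime equivalent of the yyyy-MM-dd-HH-mm-ss pattern in the text.
TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"

def build_record(readings, read_time):
    """Serialize one OPC read as a single-line JSON string with a timestamp."""
    record = dict(readings)  # copy so the caller's dict is not modified
    record["timestamp"] = read_time.strftime(TIMESTAMP_FORMAT)
    # Compact separators keep the record on one line without extra spaces.
    return json.dumps(record, separators=(",", ":"))

# Hypothetical read: three data points sampled at an arbitrary instant.
line = build_record(
    {"key1": "value1", "key2": "value2", "key3": "value3"},
    datetime(2017, 3, 28, 9, 30, 5),
)
print(line)
# {"key1":"value1","key2":"value2","key3":"value3","timestamp":"2017-03-28-09-30-05"}
```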
The extraction frequency of the OPC extraction program is on the order of seconds. The extracted data is written to the Hadoop distributed file system as one JSON string per line; the JSON strings are merged into one or more files by the hour, and the files belonging to each hour are placed in the same folder. After the OPC extraction program has finished writing one hour of data, it generates a file of size 0 named _SUCCESS in the corresponding folder; subsequent data processing programs can use the presence of _SUCCESS as the criterion for whether the data in the current folder is complete. The extraction frequency and the merge time unit can be changed to suit the actual business, but in general the extraction interval should be at least an order of magnitude smaller than the merge time unit. The higher extraction frequency captures as much real-time data as possible, so that the physical state represented by the data is not lost; the relatively coarse merge time unit lets the distributed computing engine process the data more efficiently while preserving the timeliness of the data.
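The hourly folder layout and the _SUCCESS marker can be sketched as follows. This is a simplified illustration that uses a local temporary directory in place of the Hadoop distributed file system; the folder naming scheme (yyyy-MM-dd-HH), the file name `part-0000.json`, and all function names are assumptions made for the example.

```python
import tempfile
from datetime import datetime
from pathlib import Path

def hour_folder(root, read_time):
    """Folder holding all files for the hour that contains read_time."""
    return Path(root) / read_time.strftime("%Y-%m-%d-%H")

def append_record(root, read_time, json_line):
    """Append one JSON string as one line to the file of its hour."""
    folder = hour_folder(root, read_time)
    folder.mkdir(parents=True, exist_ok=True)
    with open(folder / "part-0000.json", "a", encoding="utf-8") as f:
        f.write(json_line + "\n")

def mark_complete(root, hour_start):
    """Create the empty _SUCCESS marker once the hour is fully written."""
    folder = hour_folder(root, hour_start)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "_SUCCESS").touch()

def is_complete(root, hour_start):
    """Downstream jobs treat a folder as complete only if _SUCCESS exists."""
    return (hour_folder(root, hour_start) / "_SUCCESS").is_file()

root = tempfile.mkdtemp()  # stand-in for a directory in the file system
t = datetime(2017, 3, 28, 9, 30, 5)
append_record(root, t, '{"key1":"value1","timestamp":"2017-03-28-09-30-05"}')
before = is_complete(root, t)  # hour still being written
mark_complete(root, t)
after = is_complete(root, t)   # hour finished
print(before, after)  # False True
```

Writing the marker last means a reader never has to guess whether a folder is still being filled; it only needs to check for one file.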
(2) Convert the raw data into a detail table in the Hive data warehouse;
The industrial real-time data is stored in the Hadoop distributed file system in the JSON format described above because JSON is logically a natural fit for industrial real-time data and is convenient for the OPC extraction program to produce. However, JSON data is highly redundant: within the data set of a single OPC server the data point keys are repeated over and over, which wastes storage at big-data scale. The JSON-format OPC data is therefore treated only as a buffer layer with a buffer period, for example 3 months. Oozie, the workflow scheduling tool of the Hadoop platform, starts a scheduled task every day that deletes JSON files older than the buffer period from the Hadoop distributed file system.
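The selection logic of such a daily cleanup task might look like the sketch below. It only decides which hourly folders have expired; the actual deletion and the Oozie scheduling are left out. The folder naming scheme and the approximation of 3 months as 90 days are assumptions of this example.

```python
from datetime import datetime, timedelta

def expired_folders(folder_names, now, buffer_days=90):
    """Return the hourly folder names whose data is older than the buffer period."""
    cutoff = now - timedelta(days=buffer_days)
    expired = []
    for name in folder_names:
        # Folder names are assumed to follow the yyyy-MM-dd-HH pattern.
        folder_time = datetime.strptime(name, "%Y-%m-%d-%H")
        if folder_time < cutoff:
            expired.append(name)
    return expired

names = ["2016-12-01-00", "2017-01-15-12", "2017-03-27-23"]
to_delete = expired_folders(names, now=datetime(2017, 3, 28))
print(to_delete)  # ['2016-12-01-00']
```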
After the OPC extraction program has written the data to the Hadoop distributed file system, Oozie starts a scheduled task at each merge time unit that converts the JSON-format data into a two-dimensional detail table and loads it into the Hive data warehouse on the Hadoop platform for subsequent data processing. Converting the JSON-format OPC data into a Hive detail table takes two main steps: first, create a temporary table containing only a single string column and load it on top of the JSON data; then use Hive's JSON parsing function json_tuple to convert the keys in the JSON strings into column names and the values into column values. Converting the JSON string from step (1) into a Hive detail table gives the result shown in Table 1:
key1 | key2 | key3 | timestamp
---|---|---|---
value1 | value2 | value3 | yyyy-MM-dd-HH-mm-ss
Table 1
In Table 1, the first row holds the column names: the names of the OPC data points, followed by the timestamp at which the data points were collected. The second row is the industrial real-time data actually stored in the Hadoop distributed file system. If the data is merged by a unit of time, such as one hour, the detail table will contain multiple rows; these rows are treated as one partition of the detail table, named yyyy-MM-dd-HH. Partitioning in this way makes incremental updates of the detail table convenient and also makes it easy to recompute partition by partition after an exception.
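The effect of the JSON-to-table conversion and the partition naming can be mimicked in plain Python, as in the rough sketch below. It is not Hive code: `to_detail_table` and `partition_name` are hypothetical helpers, and returning None for a missing key is a choice made for this example.

```python
import json

def to_detail_table(json_lines, columns):
    """Turn JSON lines into rows with one column per requested key.

    Keys missing from a record become None in this sketch.
    """
    rows = []
    for line in json_lines:
        record = json.loads(line)
        rows.append(tuple(record.get(col) for col in columns))
    return rows

def partition_name(timestamp):
    """Derive the yyyy-MM-dd-HH partition name from a yyyy-MM-dd-HH-mm-ss timestamp."""
    return "-".join(timestamp.split("-")[:4])

lines = [
    '{"key1":"value1","key2":"value2","key3":"value3","timestamp":"2017-03-28-09-30-05"}',
]
columns = ["key1", "key2", "key3", "timestamp"]
detail = to_detail_table(lines, columns)
print(detail)
# [('value1', 'value2', 'value3', '2017-03-28-09-30-05')]
print(partition_name(detail[0][3]))  # 2017-03-28-09
```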
(3) Filter out the required rows and columns of the detail table to form a fact table;
A detail table reflects all the industrial real-time data an OPC server can provide. In practice an OPC server may have thousands of data points, but a user cares only about those related to a particular topic. The rows and columns of the detail table therefore need to be filtered to generate a fact table. Column filtering excludes the data points unrelated to the topic; row filtering further narrows the value range of the topic-related data points. In general, column filtering is mandatory, while row filtering is optional. So that the fact table can be joined with dimension tables on the data point key in later steps, the detail table must also be transposed during filtering: rows become columns and columns become rows, the timestamp column stays unchanged, and a new column is added to hold the name of the data point. As shown in Table 2, suppose the detail table contains three data points key1, key2, key3, and one partition of the detail table holds three rows of data with timestamps time1, time2, time3. Column filtering excludes the key3 column, row filtering excludes the row for time3, and applying the row-column transformation then yields the fact table shown in Table 3. The names of the OPC data points are now no longer column names but key values in the fact table, while the timestamp column remains the same as in the detail table.
key1 | key2 | key3 | timestamp
---|---|---|---
value11 | value12 | value13 | time1
value21 | value22 | value23 | time2
value31 | value32 | value33 | time3
Table 2
key | value | timestamp
---|---|---
key1 | value11 | time1
key1 | value21 | time2
key2 | value12 | time1
key2 | value22 | time2
Table 3
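The filtering and row-column transformation that lead from Table 2 to Table 3 can be illustrated with a short Python sketch. The function `to_fact_table` and its parameters are hypothetical; in the described system this step would run as a query in the Hive data warehouse rather than in Python.

```python
def to_fact_table(detail_rows, keep_keys, keep_row):
    """Filter a detail table and unpivot it into (key, value, timestamp) rows.

    detail_rows: list of dicts with one entry per data point plus 'timestamp'.
    keep_keys:   data points related to the topic (column filtering).
    keep_row:    predicate selecting the rows to keep (row filtering).
    """
    fact = []
    for key in keep_keys:
        for row in detail_rows:
            if keep_row(row):
                # The data point name becomes a value in the new 'key' column.
                fact.append((key, row[key], row["timestamp"]))
    return fact

detail = [
    {"key1": "value11", "key2": "value12", "key3": "value13", "timestamp": "time1"},
    {"key1": "value21", "key2": "value22", "key3": "value23", "timestamp": "time2"},
    {"key1": "value31", "key2": "value32", "key3": "value33", "timestamp": "time3"},
]
# Drop the key3 column and the time3 row, as in the example in the text.
fact = to_fact_table(detail, ["key1", "key2"], lambda r: r["timestamp"] != "time3")
for row in fact:
    print(row)
# ('key1', 'value11', 'time1')
# ('key1', 'value21', 'time2')
# ('key2', 'value12', 'time1')
# ('key2', 'value22', 'time2')
```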
OPC servers and detail tables are in a one-to-one relationship, whereas detail tables and fact tables are one-to-many: one detail table can generate multiple fact tables, but a fact table can come from only one detail table. This arrangement avoids multi-table joins while generating fact tables, which simplifies the flow and improves execution efficiency.
(4) Join the fact table and the dimension table to form a topic-oriented wide table;
The fact table comes directly from the industrial real-time data, whereas dimension tables come from other external systems and must be manually imported into the Hive data warehouse by the user. For example, a user can edit a dimension table in a relational database and then import it into the Hive data warehouse with Sqoop, the relational data import tool provided with the Hadoop platform. In general, dimension tables change infrequently and hold far less data than fact tables. One or more fact tables, together with one or more dimension tables, are joined on one or more columns to form a wide table for a particular topic. Typically the join columns include the data point key; during the join, multiple columns can take part in a computation according to some formula, and rows or columns can be filtered. Suppose there is a fact table as shown in Table 4 and a dimension table as shown in Table 5. In both tables the key values are the names of OPC data points; value is the value of the data point, and dimension1 and dimension2 are two dimensions related to the data point, such as workshop and process. Joining on the data point key produces the wide table shown in Table 6. A wide table is oriented toward one business topic and should contain as much of the data related to that topic as possible, following the principle that redundancy is preferable to omission. Once formed, the wide table can serve multidimensional analysis: through Impala, the real-time SQL query tool provided with the Hadoop platform, an external data visualization tool can access the data of the wide table using the standard SQL driver Impala provides and offer users ad hoc queries, dashboards, and other multidimensional analysis functions.
key | value | timestamp
---|---|---
key1 | value1 | time1
key2 | value2 | time1
Table 4
key | dimension1 | dimension2
---|---|---
key1 | dim11 | dim21
key2 | dim12 | dim22
key3 | dim13 | dim23
Table 5
key | value | dimension1 | dimension2 | timestamp
---|---|---|---|---
key1 | value1 | dim11 | dim21 | time1
key2 | value2 | dim12 | dim22 | time1
Table 6
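The join that produces Table 6 from Tables 4 and 5 can be sketched in Python as follows. Treating the join as an inner join on the key column is an assumption of this example (it is consistent with key3 being absent from Table 6), and `build_wide_table` is a hypothetical helper rather than part of the described system.

```python
def build_wide_table(fact_rows, dimension_rows):
    """Inner-join fact rows with dimension rows on the data point key."""
    dims_by_key = {d["key"]: d for d in dimension_rows}
    wide = []
    for f in fact_rows:
        d = dims_by_key.get(f["key"])
        if d is None:
            continue  # no matching dimension row: dropped by the inner join
        wide.append({
            "key": f["key"],
            "value": f["value"],
            "dimension1": d["dimension1"],
            "dimension2": d["dimension2"],
            "timestamp": f["timestamp"],
        })
    return wide

fact = [  # Table 4
    {"key": "key1", "value": "value1", "timestamp": "time1"},
    {"key": "key2", "value": "value2", "timestamp": "time1"},
]
dimension = [  # Table 5
    {"key": "key1", "dimension1": "dim11", "dimension2": "dim21"},
    {"key": "key2", "dimension1": "dim12", "dimension2": "dim22"},
    {"key": "key3", "dimension1": "dim13", "dimension2": "dim23"},
]
wide = build_wide_table(fact, dimension)
for row in wide:
    print(row)
```

Because the dimension values are copied into every wide-table row, later queries can group or filter by dimension without joining again, at the cost of the redundancy the text accepts.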
The architecture of a system applying the method of the invention is shown in Fig. 2.
The invention was applied at a large non-ferrous metal manufacturing plant. Using real-time data collected on the production line, the power consumption of the plant's whole production process is analyzed to uncover potential causes of wasted energy, raise alerts on excessive energy consumption, and achieve visual management of energy consumption. The specific implementation steps are as follows:
1. The OPC extraction program obtains all data points on the production line once every 30 seconds; the data is aggregated by hour into table partitions, generating the detail table in the data warehouse.
2. The data points in the detail table related to equipment power consumption are filtered out to generate a power-consumption fact table.
3. Dimension tables relating data point name, device name, device type, workshop, process, shift, and time are edited externally and imported into the data warehouse.
4. The fact table and dimension tables are joined and the corresponding electric energy formulas are applied to generate a wide table whose topic is equipment power consumption; the data visualization tool can then analyze power consumption on the production line in real time along different dimensions.
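To make "analysis along different dimensions" concrete, the sketch below totals a measure over a wide table by any chosen dimension. All device names, workshop and process labels, and kWh figures are invented for illustration; the patent does not give the energy formulas or any sample data, and in the described system such queries would be issued as SQL through Impala.

```python
from collections import defaultdict

def total_by(wide_rows, dimension, measure="kwh"):
    """Sum a numeric measure of the wide table, grouped by one dimension."""
    totals = defaultdict(float)
    for row in wide_rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

# Hypothetical wide table for the equipment power consumption topic.
wide = [
    {"device": "pump-1", "workshop": "A", "process": "smelting", "kwh": 1.5},
    {"device": "pump-2", "workshop": "A", "process": "casting", "kwh": 2.0},
    {"device": "fan-1", "workshop": "B", "process": "smelting", "kwh": 0.5},
]
print(total_by(wide, "workshop"))  # {'A': 3.5, 'B': 0.5}
print(total_by(wide, "process"))   # {'smelting': 2.0, 'casting': 2.0}
```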
The above describes the specific embodiment of the invention and the technical principles used. Any change made according to the concept of the invention whose resulting function does not go beyond the spirit covered by the specification and drawings shall fall within the scope of protection of the invention.
Claims (8)
1. An efficient multidimensional analysis method for industrial big data, characterized by comprising the following steps:
(1) an OPC extraction program reads data from an OPC server and writes it to the Hadoop distributed file system;
(2) the data stored in the Hadoop distributed file system is converted into a detail table in a Hive data warehouse;
(3) the required rows and columns of the detail table are filtered out to form a fact table;
(4) the resulting fact table is joined with a dimension table to form a topic-oriented wide table, and multidimensional analysis is performed on the wide table to obtain the analysis result; the dimension table comes from an external system and must be manually imported into the Hive data warehouse by the user.
2. The efficient multidimensional analysis method for industrial big data according to claim 1, characterized in that: the OPC extraction program reads industrial real-time data from the OPC server according to the standard OPC protocol and stamps it with the time of reading, which counts as one read; if the data points provided by the OPC server are key1, key2, key3, with corresponding data value1, value2, value3, the timestamp is represented by timestamp and its value has the form yyyy-MM-dd-HH-mm-ss, where y denotes the year, M the month, d the day, H the hour, m the minute, and s the second; the OPC extraction program describes the data of one read in JSON format, forming the following string:
{"key1":"value1","key2":"value2","key3":"value3","timestamp":"yyyy-MM-dd-HH-mm-ss"}.
3. The efficient multidimensional analysis method for industrial big data according to claim 2, characterized in that: the extraction frequency of the OPC extraction program is on the order of seconds; the extracted data is written to the Hadoop distributed file system as one JSON string per line; the JSON strings are merged into one or more files by the hour, and the files belonging to each hour are placed in the same folder; after the OPC extraction program has finished writing one hour of data, it generates a file of size 0 named _SUCCESS in the corresponding folder, and _SUCCESS serves as the criterion for whether the data in the current folder is complete.
4. The efficient multidimensional analysis method for industrial big data according to claim 1, characterized in that the flow by which step (2) obtains the detail table in the Hive data warehouse is as follows: (a) using the workflow scheduling tool Oozie provided with the Hadoop distributed file system, JSON files older than the buffer period are deleted from the Hadoop distributed file system, where the buffer period is preset;
(b) according to the merge time unit of the data, Oozie starts a scheduled task that converts the JSON-format data into a two-dimensional detail table and loads it into the Hive data warehouse on the Hadoop platform for subsequent data processing.
5. The efficient multidimensional analysis method for industrial big data according to claim 4, characterized in that the subsequent data processing is as follows:
(I) create a temporary table containing only a single string column and load it on top of the JSON data;
(II) use Hive's JSON parsing function json_tuple to convert the keys in the JSON strings into column names and the values into column values.
6. The efficient multidimensional analysis method for industrial big data according to claim 1, characterized in that: column filtering of the detail table excludes the data points unrelated to the topic, and row filtering narrows the value range of the data points related to the topic; in step (3), filtering is performed by transposing the detail table, turning rows into columns and columns into rows, keeping the timestamp column unchanged, and adding a new column that holds the name of the data point.
7. The efficient multidimensional analysis method for industrial big data according to claim 1, characterized in that: the fact table and the dimension table are joined on one or more columns to form the topic-oriented wide table.
8. The efficient multidimensional analysis method for industrial big data according to claim 7, characterized in that: multidimensional analysis of the wide table is implemented through Impala, the real-time SQL query tool provided with the Hadoop platform; an external data visualization tool accesses the data of the wide table through the standard SQL driver that Impala provides and completes the multidimensional analysis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710190553.1A CN107016501A (en) | 2017-03-28 | 2017-03-28 | A kind of efficient industrial big data multidimensional analysis method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107016501A (en) | 2017-08-04 |
Family
ID=59445129
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710190553.1A Withdrawn CN107016501A (en) | 2017-03-28 | 2017-03-28 | A kind of efficient industrial big data multidimensional analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107016501A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103425762A (en) * | 2013-08-05 | 2013-12-04 | 南京邮电大学 | Telecom operator mass data processing method based on Hadoop platform |
CN104111996A (en) * | 2014-07-07 | 2014-10-22 | 山大地纬软件股份有限公司 | Health insurance outpatient clinic big data extraction system and method based on hadoop platform |
CN104299170A (en) * | 2014-09-29 | 2015-01-21 | 华北电力大学(保定) | Intermittent energy mass data processing method |
CN106021486A (en) * | 2016-05-18 | 2016-10-12 | 广东源恒软件科技有限公司 | Big data-based data multidimensional analyzing and processing method |
CN106528809A (en) * | 2016-11-16 | 2017-03-22 | 常州神盾软件科技有限公司 | Police service big data mining and analyzing platform based on PGIS and cloud computing |
- 2017-03-28 CN CN201710190553.1A patent/CN107016501A/en not_active Withdrawn
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491563A (en) * | 2017-09-28 | 2017-12-19 | 深圳市爱贝信息技术有限公司 | Towards the data processing method and system of settlement for account |
CN107958046A (en) * | 2017-11-24 | 2018-04-24 | 小花互联网金融服务(深圳)有限公司 | Internet finance big data warehouse analysis mining method |
CN110019397B (en) * | 2017-12-06 | 2021-06-29 | 北京京东尚科信息技术有限公司 | Method and device for data processing |
CN110019397A (en) * | 2017-12-06 | 2019-07-16 | 北京京东尚科信息技术有限公司 | For carrying out the method and device of data processing |
CN108664657A (en) * | 2018-05-20 | 2018-10-16 | 湖北九州云仓科技发展有限公司 | A kind of big data method for scheduling task, electronic equipment, storage medium and platform |
CN110209893A (en) * | 2019-04-23 | 2019-09-06 | 北京奇艺世纪科技有限公司 | Task creating method, system and storage medium |
CN110457331A (en) * | 2019-07-19 | 2019-11-15 | 北京邮电大学 | General real-time update multidimensional data visualization system and method |
CN112286674B (en) * | 2019-07-24 | 2023-12-19 | 广东知业科技有限公司 | Edge calculation-based row-column conversion method and system |
CN112286674A (en) * | 2019-07-24 | 2021-01-29 | 广东知业科技有限公司 | Row-to-column method and system based on edge calculation |
CN110850824A (en) * | 2019-11-12 | 2020-02-28 | 北京矿冶科技集团有限公司 | Implementation method for acquiring data of distributed control system to Hadoop platform |
CN111143465A (en) * | 2019-12-11 | 2020-05-12 | 深圳市中电数通智慧安全科技股份有限公司 | Method and device for realizing data center station and electronic equipment |
CN111177220A (en) * | 2019-12-26 | 2020-05-19 | 中国平安财产保险股份有限公司 | Data analysis method, device and equipment based on big data and readable storage medium |
CN110851432A (en) * | 2020-01-14 | 2020-02-28 | 中软信息系统工程有限公司 | Multi-dimensional information extraction method and device based on elastic distributed data model |
CN113868266A (en) * | 2021-12-06 | 2021-12-31 | 广州市玄武无线科技股份有限公司 | Method and device for generating star model layout of web front end and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107016501A (en) | A kind of efficient industrial big data multidimensional analysis method | |
CN110472068B (en) | Big data processing method, equipment and medium based on heterogeneous distributed knowledge graph | |
Jindal et al. | Comparative study of data warehouse design approaches: a survey | |
CN103984755A (en) | Multidimensional model based oil and gas resource data key system implementation method and system | |
CN106126601A (en) | A kind of social security distributed preprocess method of big data and system | |
CN106547918B (en) | Statistical data integration method and system | |
CN103019728A (en) | Effective complex report parsing engine and parsing method thereof | |
CN103646079A (en) | Distributed index for graph database searching and parallel generation method of distributed index | |
CN101609460A (en) | A kind of search method and searching system of supporting the heterogeneous earth science data resource | |
CN105630934A (en) | Data statistic method and system | |
CN103077192B (en) | A kind of data processing method and system thereof | |
CN112749266A (en) | Industrial question and answer method, device, system, equipment and storage medium | |
CN112347071A (en) | Power distribution network cloud platform data fusion method and power distribution network cloud platform | |
CN112100149A (en) | Automatic log analysis system | |
CN115309749A (en) | Big data experiment system for scientific and technological service | |
CN112634004B (en) | Method and system for analyzing blood-cause atlas of credit investigation data | |
CN111125045B (en) | Lightweight ETL processing platform | |
US20070282804A1 (en) | Apparatus and method for extracting database information from a report | |
CN110874366A (en) | Data processing and query method and device | |
CN117573881A (en) | Construction and application method of on-orbit fault knowledge graph of spacecraft control propulsion system | |
CN114077652A (en) | Data processing method based on multidimensional data cube and electronic device | |
Sun et al. | SETL: A scalable and high performance ETL system | |
CN104346378A (en) | Method, device and system for realizing processing of complex data | |
Zema et al. | Energy sales forecasting in a sustainable development context: Bibliometric review | |
Trofimov et al. | Data Representation from Energy Balances by Using Geo-information System |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20170804 |