CN111046054A - Method and system for analyzing power marketing business data - Google Patents
Method and system for analyzing power marketing business data Download PDFInfo
- Publication number
- CN111046054A CN111046054A CN201911224071.9A CN201911224071A CN111046054A CN 111046054 A CN111046054 A CN 111046054A CN 201911224071 A CN201911224071 A CN 201911224071A CN 111046054 A CN111046054 A CN 111046054A
- Authority
- CN
- China
- Prior art keywords
- data
- database
- query
- power marketing
- server
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 38
- 238000004458 analytical method Methods 0.000 claims abstract description 18
- 238000007781 pre-processing Methods 0.000 claims abstract description 10
- 238000011068 loading method Methods 0.000 claims abstract description 7
- 230000004044 response Effects 0.000 claims abstract description 6
- 238000005457 optimization Methods 0.000 claims description 50
- 238000003860 storage Methods 0.000 claims description 30
- 238000013500 data storage Methods 0.000 claims description 20
- 238000005259 measurement Methods 0.000 claims description 18
- 230000005540 biological transmission Effects 0.000 claims description 17
- 238000007405 data analysis Methods 0.000 claims description 16
- 238000012544 monitoring process Methods 0.000 claims description 15
- 238000004364 calculation method Methods 0.000 claims description 12
- 230000009466 transformation Effects 0.000 claims description 12
- 238000012545 processing Methods 0.000 claims description 11
- 230000005611 electricity Effects 0.000 claims description 9
- 238000013075 data extraction Methods 0.000 claims description 6
- 230000006835 compression Effects 0.000 description 13
- 238000007906 compression Methods 0.000 description 13
- 230000008569 process Effects 0.000 description 12
- 238000010276 construction Methods 0.000 description 11
- 230000008901 benefit Effects 0.000 description 10
- 238000009826 distribution Methods 0.000 description 9
- 238000004422 calculation algorithm Methods 0.000 description 8
- 238000013461 design Methods 0.000 description 7
- 230000006870 function Effects 0.000 description 7
- 238000007726 management method Methods 0.000 description 7
- 238000005516 engineering process Methods 0.000 description 6
- 238000001914 filtration Methods 0.000 description 5
- 230000010354 integration Effects 0.000 description 4
- 230000002776 aggregation Effects 0.000 description 3
- 238000004220 aggregation Methods 0.000 description 3
- 238000006243 chemical reaction Methods 0.000 description 3
- 238000011161 development Methods 0.000 description 3
- 230000006872 improvement Effects 0.000 description 3
- 238000012423 maintenance Methods 0.000 description 3
- 238000005065 mining Methods 0.000 description 3
- 241001080526 Vertica Species 0.000 description 2
- 238000012217 deletion Methods 0.000 description 2
- 230000037430 deletion Effects 0.000 description 2
- 238000010586 diagram Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 230000002452 interceptive effect Effects 0.000 description 2
- 239000000203 mixture Substances 0.000 description 2
- 239000002699 waste material Substances 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 239000002253 acid Substances 0.000 description 1
- 230000006978 adaptation Effects 0.000 description 1
- 238000007792 addition Methods 0.000 description 1
- 230000004888 barrier function Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000004891 communication Methods 0.000 description 1
- 238000010219 correlation analysis Methods 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 230000006837 decompression Effects 0.000 description 1
- 230000003111 delayed effect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 230000002068 genetic effect Effects 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 230000001788 irregular Effects 0.000 description 1
- 230000009191 jumping Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 230000005012 migration Effects 0.000 description 1
- 238000013508 migration Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000008520 organization Effects 0.000 description 1
- 238000006116 polymerization reaction Methods 0.000 description 1
- 238000011112 process operation Methods 0.000 description 1
- 238000007619 statistical method Methods 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/466—Transaction processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Software Systems (AREA)
- Water Supply & Treatment (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Strategic Management (AREA)
- Primary Health Care (AREA)
- Computer Security & Cryptography (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Public Health (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and a system for analyzing electric power marketing business data, wherein the method comprises the steps of extracting electric power marketing business data; preprocessing the electric power marketing business data, and loading the electric power marketing business data into a database in batches; establishing a data index, a filter and a dimension table for the power marketing service data, and importing common data and acceleratable data into a memory database; aiming at high concurrent query requests, the response time from millisecond to second is realized; when the front end initiates a data request, whether cache data exist in the memory database or not is checked firstly, if so, all data are returned, if not, the data are searched and fed back to the front end through the relational database, and the result is synchronously loaded into the memory so as to facilitate subsequent efficient query. By adopting the technical scheme of the invention, the quick response capability of the user multidimensional analysis facing the electric power marketing service can be realized by associating massive multidimensional data and constructing the user main index.
Description
Technical Field
The invention relates to the technical field of data analysis, in particular to a method and a system for analyzing electric power marketing business data.
Background
At present, the reform development of the power enterprises faces new situations and tasks, along with the continuous improvement of the power system, especially the emergence of power selling companies which are widely concerned by the market, the trade of the power market is more democratic and open, the trade mode is gradually upgraded, and various flexible and autonomous trade modes such as power grid, internet, credit, futures, retail, wholesale and the like appear, so that the pressure of the power enterprises on developing the power selling market, preventing the operational risk and the like is increased day by day, and the power enterprises face the severe challenges of reducing high-quality customers, reducing market share and losing high-quality talents, and particularly provide higher requirements and new challenges for marketing service businesses.
For the electric power enterprise, a marketing service technology support platform is constructed by utilizing a big data technology, electricity utilization characteristics and requirements of marketing customers are analyzed, data support can be provided for formulating an electric power marketing scheme, and important data support is provided for the electric power enterprise to seize the market and users, so that the economic benefit of the enterprise is improved, the service level and the service quality of the customers are continuously improved, and the method has very important practical significance.
The characteristics of the electric power big data are mainly as follows:
the data Volume is large (Volume). This is an important feature of power big data. With the continuous promotion of power informatization construction, the growth speed and the scale of power data are far beyond the expectation of power enterprises.
Type multiple (Variety), meaning that power big data contains a wide Variety of data types, such as structured, semi-structured, and unstructured data.
The speed (Velocity) is the speed of collecting, processing and analyzing the large power data.
The Value (Value) means that the power data contains much valuable information.
On one hand, with continuous deepening of power system innovation, power enterprises are gradually changed into service enterprises from production enterprises, business processes of the enterprises need to be optimized and recombined around the needs of power customers, and meanwhile, management modes and strategic transformation need to be changed inside the enterprises. On the other hand, with the deep progress of the construction of the "three-set five-large" system, the construction and the progress of systems such as SG186, SG-ERP and the like, the preliminary integration and the shared utilization of enterprise-level data resources are realized, but with the rapid increase of the electric power data volume, the diversification of data types, and the increasingly deepening of cross-professional and cross-platform applications, the electric power large data inevitably face multiple challenges such as high-performance storage and high expandability of data. In the aspect of electric power marketing data, relevant information such as power consumption user information, electric quantity and electric charge reaches TB-level scale, and in the aspect of multidimensional analysis of a single user, a traditional relational database cannot meet the requirement of quick retrieval for the single user.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present invention provide a method and a system for analyzing power marketing service data, which implement a quick response capability of a user multidimensional analysis for power marketing services by associating massive multidimensional data and constructing a user master index.
The embodiment of the invention provides a method for analyzing power marketing business data, which comprises the following steps:
extracting power marketing business data;
preprocessing the electric power marketing business data, and loading the electric power marketing business data into a database in batches;
establishing a data index, a filter and a dimension table for the power marketing service data, and importing common data and acceleratable data into a memory database;
aiming at high concurrent query requests, the response time from millisecond to second is realized;
when the front end initiates a data request, whether cache data exist in the memory database or not is checked firstly, if so, all data are returned, if not, the data are searched and fed back to the front end through the relational database, and the result is synchronously loaded into the memory so as to facilitate subsequent efficient query.
Further, the electricity marketing business data is circularly imported into the distributed column database in a distributed file system file Batch Load mode.
Further, in the distributed column-type database, a data storage model is built by taking the measurement point number plus a timestamp prefix as Rowkey, a column family as t, a device ID as a column and a measurement value as a value corresponding to the column.
Further comprising the steps of:
carrying out syntactic analysis on sentences of the query language, decomposing the grammar into words with representative meanings, carrying out syntactic analysis on different words, and determining language logic relations among different words;
the expression modes in the syntax tree are processed in advance to form an initial query plan with logic execution value;
optimizing the initial query plan to determine a query plan scheme with optimal query application value;
and carrying out physical optimization to generate a space application mode with the most selective value.
Further, the method for preprocessing the expression mode in the syntax tree further comprises the following steps:
and (4) arranging redundant pre-and suffixes of words, removing redundant parts in sentences, and performing scientific calculation by applying a constant expression mode.
Further, optimizing the initial query plan, further comprising the steps of:
technical operation is carried out in a relational algebra model mode, the quantity of executed tasks is reduced through plan adjustment, and the query working efficiency is improved.
The embodiment of the invention also provides a system for analyzing the electric power marketing service data, which comprises an acquisition and transmission power state monitoring system end and a big data platform end, wherein the acquisition and transmission power state monitoring system end further comprises an acquisition database; the big data platform end further comprises a data extraction server, a data analysis server, a data storage server, a database server and a storage device, wherein,
the acquisition database is used for acquiring the utilization and acquisition data and the state monitoring data of the power transmission and transformation equipment from the utilization and transmission and transformation state monitoring system;
the data extraction server is used for extracting the data used for collecting in the database and the state monitoring data of the power transmission and transformation equipment into files and storing the files in a specified directory of the server;
the data analysis server is used for extracting the E file from the server designated directory and analyzing and writing the E file into a big data platform kafka message queue;
the data storage server is used for consuming the accessed measurement data from the big data platform kafka message queue and writing the measurement data into the big data platform column data storage;
the database server is used for storing the warehoused measurement data in a structured mode, carrying out statistical processing on the warehoused measurement data and feeding back a statistical result to the front-end page;
the storage device is used for storing the data related to the acquisition file, the database file and the system log.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
1. in the past, due to the performance problem of a relational database, massive data processing can only be realized by performing database division and table division in a space time-changing mode, and establishing a plurality of copies to perform correlation analysis. By the method, the full-data correlation query is realized, the enterprise construction cost is reduced, the data copy is reduced, and the data repeated construction is avoided. Unified central decision management is constructed on the basis of a big data platform, cooperative decision and cooperative disposal are efficiently implemented, barriers among layers are broken, repeated construction of systems such as data processing, data analysis and scene application is avoided, and service support capacity is improved. Meanwhile, the integrated architecture design scheme can effectively save the investment of companies on system software and hardware resources, and greatly reduce the operation and maintenance workload of the system, thereby improving the economic benefit of enterprises for constructing the project.
2. Through a platform, multi-scene and micro-application construction mode, the requirement of enterprise development is met, and unified management and unified allocation of resources and applications are realized. The daily operation and maintenance and management are effectively supervised and processed, and the integration, the process, the standardization and the precision of the operation and maintenance management are realized.
3. The method can support analysts to shorten the working time of data acquisition, data association, data chart making and the like, and meanwhile, the analysis dimensionality is increased, and the analysis report compiling efficiency is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic structural diagram of a power marketing business data analysis system in an embodiment of the present invention.
FIG. 2 is a flow chart of data analysis of the electricity marketing service in the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus, and associated applications, methods consistent with certain aspects of the invention, as detailed in the following claims.
The technical scheme of the invention mainly comprises the step of meeting the performance requirement of query analysis through the column database.
The columnar database dates back to 1994, where Sybase, a company named Express way technologies, was obtained, and its main product is a tool for assisting the conventional database in report acceleration, namely, the data of the linear database is stored in a columnar manner. This technology was formally introduced as the product Sybase IQ in 1996 and continues to the present that the line-to-line transition can increase the speed of reporting.
First, the line database stores data of fields in a row ' manner, and stores the data continuously in a row ' by row ' manner. In this way, the method is intuitive and efficient for OLTP operations such as writing information of a piece of data into a database, or modifying and deleting some fields in a piece of data.
The line database has the following features: 1. the data is stored in rows; 2. queries without indices use a large amount of I/O; 3. it takes a lot of time and resources to build the index and materialized view; 4. in the face of query requirements, databases must be heavily inflated to meet performance requirements.
When some reports and analyses are performed on the line database, people find that the storage format is not efficient to use. Because most statistical analysis scenarios, for example: calculating the sales and profit changes of each province in a same ratio; operations on some of the fields are performed according to department statistics performance completion, etc. The line database uniformly reads other information of each contract, such as customer name, contract signing time, customer manager and the like, when only sales and profits are analyzed according to a page data reading mode without any condition, so that a large amount of precious I/O is wasted.
The first improvement given by the database community is "indexing", which is just like the directory in front of the dictionary, to achieve fast positioning. However, as analysis scenarios become more complex and varied, data administrators find that indexes can generally only help a portion of queries and analysis, and it is almost impossible to optimize all queries and analysis scenarios in an enterprise-level BI system, both from a composition perspective and from a cost perspective. The waste of storage space due to the large number of indices and the waste of time to maintain the indices may increase exponentially.
The thought principle of the column-type database is not complex, the line-type data is completely disassembled and recombined and stored in a column mode, and the data of all the lines in one column are stored together. Efficient encoding is performed according to the characteristic values of the data in the columns (the characteristic values of dimensional fields such as time, department codes, sales regions and the like are not many, and are quite common from a few to hundreds), and the data are stored in an encoded form in actual storage, so that a large proportion of compression is brought.
The columnar database has the following characteristics:
1. high efficiency storage space utilization
Conventional line databases often reserve some space because the length of each column is not the same, so as to prevent one row of data from jumping to another block during updating. The column-oriented database is completely analyzed from the beginning, and the problem of small update is not considered, so that the data is completely densely stored.
A line database will often have a pseudo-column rowid present to indicate the id of the line. A columnar database typically does not hold rowids.
The different algorithms of the columnar databases, which are invented for the data characteristics of different columns, tend to have much higher compression ratios than the line databases, with common line databases typically having compression ratios of about 3: 1 to 5: 1 and columnar databases typically having compression ratios of about 8: 1 to 30: 1. (InfoBright can reach 40: 1 in particular applications and Vertica can reach 60: 1 in particular applications, generally such high compression rates are network traffic dependent)
Due to its special IO model, the columnar database typically does not require indexing by its data execution engine to accomplish a large number of data filtering tasks (except Sybase IQ). This additionally reduces the space consumption of data storage.
A column-wise database does not require materialized views, a row-wise database generally has two materialized views, a non-aggregated materialized view and an aggregated materialized view of a column in order to reduce IO. The columnar database itself is stored in a distributed manner so that the first is not required, but is well suited for normal aggregation operations due to other characteristics. (other materialized views are not refreshable in real-time, such as ranking functions, concatenations irregular by, etc., which are not included in the database.)
2. Invisible indexing
The columnar database does not need an index in the row database to reduce IO and distribution of faster lookup values because each column of data is sorted selectively. When the database execution engine performs where conditional filtering. The data of the entire block is discarded as long as it finds that the data of any column does not satisfy a certain condition. The final preliminary filtering will only scan for data blocks that may satisfy the condition.
3. Data Iteration (Tuple Iteration)
The L2 cache provided by the multi-core CPU can better utilize the characteristics of the second-level cache and the multi-core concurrence of the CPU when the same function is executed for a plurality of times in a short time. The row-type database has lower execution efficiency than the column-type database because the mixed data of the row-type database cannot make the same simple function call on an array.
4. Compression algorithm
The columnar database is stored separately for each column. It is easy to apply different compression algorithms for the features of each column. Common columnar database Compression algorithms are Run Length Encoding, Data Dictionary, Delta Compression, BitMap Index, LZO, Null Compression, and the like. The compression efficiency varies from 10W: 1 to 10: 1 according to different characteristics. And the larger the data, the more obvious the improvement of the compression efficiency is.
5. Delayed materialization
Due to the special execution engine of the columnar database, the data does not need to be decompressed in the middle process operation of the data, and pointers are used for replacing the operation until the complete data is required to be output finally. Conventional row-wise database operations decompress all the data at the beginning of the operation and then perform subsequent filtering, projection, concatenation, and aggregation operations, as is the case with the execution plan of a row-wise database. In the whole calculation process, no matter filtering, projection, connection and aggregation operations, the columnar database does not decompress the data until the original data value is restored by the last data. This has the benefit of reducing CPU consumption, reducing memory consumption, reducing network transmission consumption, and reducing the need for final storage.
The column-type database has the advantages that: extremely high loading speed (up to the sum of all hard disk IO, basically limited); fitting for large amounts of data rather than small data; real-time loading data is limited to additions (deletion and update requires decompression of the Block and then calculation and then recompression of the store); the high-efficiency compression rate saves not only the storage space but also the calculation memory and the CPU; is very suitable for polymerization operation.
The column-wise database has the following disadvantages: are not suitable for scanning small amounts of data; are not suitable for random updating; batch updating conditions are different, some optimized better column databases (such as Vertica) have better performance, and some databases which are not updated have worse performance; it is not suitable for real-time operations involving deletion and update.
Common error zones: one common error area is believed to be that a row-wise database is more advantageous than a column-wise database if each scan involves more rows or full-column full-table scans. In fact, this is just a misleading area of line-based database understanding, i.e., it is considered that the main advantage of a column-based database is that its columns are stored separately, and the other large features mentioned above of the column-based database are ignored, which is the core of the high performance of the column-based database.
According to the technical scheme, through understanding of system systems such as power utilization information acquisition, SG186 and SG-ERP, and combining the technical advantages of a large data platform, data integration work is mainly divided into three key links of data acquisition access, data storage and historical data migration. The data access optimizes marketing data acquisition by applying a big data technology so as to improve the real-time processing capacity, mass storage capacity and rapid off-line analysis application capacity of an acquisition system and meet the requirements of higher and higher data real-time release and value mining in the future.
Fig. 1 is a schematic structural diagram of a power marketing business data analysis system in an embodiment of the present invention. As shown in fig. 1, the system for analyzing the electric marketing service data includes an electricity and power consumption state monitoring system terminal and a big data platform terminal, the electricity and power consumption state monitoring system terminal further includes an electricity and power consumption database 101, and the big data platform terminal further includes a data extraction server 102, a data analysis server 103, a data storage server 104, a database server 105, and a storage device 106.
The collection database collects the mining data and the power transmission and transformation equipment state monitoring data from the mining power transmission and transformation state monitoring system.
And the data extraction server extracts the data used for collecting in the database and the state monitoring data of the power transmission and transformation equipment into files and stores the files in a specified directory of the server.
And the data analysis server extracts the E file from the specified directory of the server and analyzes and writes the E file into a big data platform kafka message queue.
And the data storage server consumes the accessed measurement data from the big data platform kafka message queue and writes the measurement data into the big data platform column data storage.
The database server stores the warehoused measurement data in a structured mode, performs statistical processing, and feeds back statistical results to the front-end page.
The storage device stores the collection file, the database file and the system log related data.
FIG. 2 is a flow chart of data analysis of the electricity marketing service in the embodiment of the present invention. As shown in fig. 2, the electric marketing business data analysis process includes the following steps:
And step 204, aiming at the high-concurrency query request, and realizing the response time from millisecond level to second level. The method can be divided into online transaction processing and online concurrent processing according to the requirements on transaction characteristics, the online transaction has strict limitation on the ACID transaction characteristics, and the online concurrent processing only needs simple row-level lock read-write consistency.
And data enters the system and is finally stored in a distributed columnar database + distributed file system. Considering that the writing throughput of the original interface of the distributed columnar database is not ideal, in order to further improve the data loading efficiency, the electricity marketing business data is circularly imported into the distributed columnar database in a Batch Load mode of a distributed file system file.
And designing a data storage model in the distributed column-type database, and aiming at the service characteristics of batch query, selecting a value corresponding to Rowkey, a column family, a device ID and a measurement value, wherein Rowkey is the number of the measurement point plus a time stamp prefix, t is the column family, and the measurement value is the column.
The basic flow of distributed columnar database query optimization comprises the following steps:
carrying out syntactic analysis on the sentence of the query language, decomposing the grammar into words with representative meanings, carrying out syntactic analysis on different words, and determining the language logic relationship existing between different words;
the expression modes in the syntax tree are processed in advance to form an initial query plan with logic execution value;
optimizing the initial query plan to determine a query plan scheme with optimal query application value;
and carrying out physical optimization to generate a space application mode with the most selective value.
The method comprises the steps of carrying out preprocessing on an expression mode in a grammar tree, further sorting redundant vocabulary suffixes and suffixes, removing redundant parts in sentences, and carrying out scientific calculation by applying a constant expression mode.
And optimizing the initial query plan, further performing technical operation in a relational algebraic model mode, and reducing the task execution amount through plan adjustment to improve the query working efficiency.
The following describes the technical scheme of column type storage and query optimization in detail.
Due to the main position and the role of the query engine in the database construction and application processes, the promotion of the database query function needs to be realized by optimizing the query as a key control link. In the traditional application field of databases, there are many optimization methods for a database query system, and the distributed column-type database and the traditional database have great differences in data storage location, information reading mode, organization structure distribution and the like, so that there are many differences in query optimization modes, which results in a technical application mode of traditional database query optimization and also reflects a certain degree of inadaptability in the application field of the distributed column-type database. The distributed column-type database based on the memory is to be combined with the data storage characteristics and the information reading mode of the distributed column-type database to explore a query optimization mode suitable for self development so as to promote continuous expansion of the application scope of the distributed column-type database.
Column type storage and query optimization technology: the construction of the column-type database is implemented through a mode of a two-dimensional table, the storage of related model data by a computer is stored through a continuous address mode, the conversion process of the data in the two-dimensional table to a one-dimensional data mode is to convert a spatial logic structure of a description object to a planar physical storage mode, the difference between the column-type database and a traditional database is mainly reflected in the difference of the storage modes, the traditional database generally adopts a row storage mode, namely, after one row is stored, the next row is stored, and the column-type database adopts a column-type storage mode, namely, after one column is stored, the next column is stored. The row type storage mode and the column type storage mode have the advantages and the disadvantages of application, and in order to improve the working efficiency of data query and the application advantages of a database, the application performance of the row table sequence data storage mode and the column type storage model construction can be improved to the greatest extent. The method can also be applied to a column type storage mode of row number index and mixed index, different storage models are suitable for different database application fields, and different application advantages and disadvantages can be reflected. The distributed column-type database has an interactive characteristic on data division nodes, so that the information retrieval and query in a distributed environment based on a memory can show great difference, the transmission cost among the nodes is also a problem that the query needs to pay attention, and the optimization work in the distributed storage environment is different from that of the traditional single-node database. The data query is carried out to distribute the query task to different data nodes, and the transmission cost of the data in the network is also a problem which has to be considered. The data distribution strategy and the execution strategy of the query task have great influence on the query working efficiency and are important problems which need to be concerned about in query optimization. According to the structural composition characteristics of the distributed database, the uniform data form can be dispersed into different physical nodes to be important work content, and the granularity of data splitting is a key factor for determining the query quality of the database.
Basic flow of (I) distributed columnar database query optimization
The data query work starts from the generation of an execution command of a query task, and finally forms a physical operation plan capable of being executed through the data transmission and conversion layer by layer. The main process can be summarized into a grammar parsing stage, a preprocessing stage, execution plan optimization according to calculation and conversion rules, calculation of query cost and determination of an optimal query path.
1. Grammar parsing
The work task at this stage is to parse the statements of the query language and to decompose these grammars into words that can have representative meanings, and to parse the different words to determine the linguistic logical relationships that exist between the different words. At this stage, the query languages which do not accord with the grammar rule need to be screened, so that the query languages entering the execution stage can accord with the query rule, and further effective task query is carried out.
2. Pretreatment of
The work task in the stage can process the expression mode in the grammar tree in advance, the workload of the task execution stage is reduced, the optimization mode in the stage is mainly used for sorting redundant vocabulary prefixes and suffixes, removing redundant parts in sentences, applying a constant expression mode to carry out scientific calculation, and forming an initial query plan with logic execution value through the preprocessing traversal process of the grammar tree.
3. Execution plan optimization based on calculation rules
The work task at this stage is to properly adjust the initial query plan determined at the previous stage, the optimization principle is the work basis for plan adjustment, technical operation is carried out in a mode of a relational algebra model, the work goal of reducing the task execution amount is realized through plan adjustment, further the query work efficiency is improved, and a query plan scheme with the optimal query application value can be determined.
4. Physical optimization
The physical optimization aims to generate a space application mode with the optimal selection value, the physical space can meet the actual application requirements of query work in principle, but different optimization selection costs need to be reflected in the optimization process of the physical space, the query task is executed with execution cost, and the estimation of the execution cost is also the main work content of the physical optimization.
(II) query optimization design
1. Query engine design
The query engine is mainly responsible for analyzing and optimizing the statement of the query language and executing the query task, and the optimization of the query engine can be mainly carried out from two aspects of query optimization and task execution optimization. The two-part optimization functions can be kept independent of each other while still being able to exert mutually-facilitated optimization effects. The optimization of the task execution plan can be respectively optimized from multiple layers of plan analysis, execution, network communication, cache management and the like, and the optimization efficiency of the task execution plan is improved.
2. Data storage and distribution scheme
The data storage is the basic function of the database, and the reasonability of data storage and distribution is the work basis for query optimization. The optimization scheme can select a Group-Key column type storage mode, the application of a distributed joint index mode is the main characteristic of the model, and the data distribution mode can keep the relevance among data and simultaneously can also keep the relevance among the data
The method can process the sliced data, realize a management integration mode of data flattening, and reflect the interactive characteristics among different data modules. The data distribution mode can reflect the original logic relationship among the data and has the advantage of laying distribution storage application for inquiry.
(III) optimization of the details of the Algorithm
Data calculation of different nodes is also important content influencing query efficiency, optimization of algorithm details is emphasized, and calculation efficiency of data in different QE nodes can be improved. And a greedy algorithm and a genetic algorithm are applied to design by combining the data distribution mode of the system.
(IV) query task Tree design
The structure of the query syntax tree also has an important influence on the realization of the optimization target, the syntax tree is also the structural characteristic of data, the reasonable task tree design also needs to be optimized from the structure of the syntax tree and different nodes of the tree, and the information processing efficiency is improved through reasonable structure adjustment.
(V) implementation of query optimization
The implementation of query optimization needs to use a data query process as a work basis, and the implementation of query optimization is mainly embodied in four aspects: logic plan optimization, preprocessing module optimization, rule optimization and physical optimization. The data structure of the logic plan corresponds to a grammar tree, the query task is decomposed into different vocabulary query modules, and unambiguous retrieval of the query task is realized through information decomposition and matching of the query modules. The implementation of the optimization function of the preprocessing module is mainly embodied in the aspect of logic processing for removing redundant parts, and the application of the Eval Const Expression () method and the Eval Or Expression () method can achieve the expected working target.
The rule optimization can reflect the operation and application of the relational algebra, ensure that the query service can be matched with the logic plan, apply the relational algebra to carry out operation, and represent the logical relationship in the syntax tree structure taking the relationship as a unit no matter what form of change occurs on the basis of data, so the table taking the relationship as a representation mode has stronger optimization operation value.
The physical optimization calculates the query cost according to the application state of the system, so that the physical optimization frequently uses the data information of the system, firstly calls Set Base Rel (), Deconstruct JoinTree (), Remove Useless jobs () to perform some initialization operations on the logic plan, finally calls Make FinalRel () to generate an optimal path, and the generated final optimal path Rel Opt Info object is stored in the corresponding member variable of the logic plan.
The implementation of the query optimization strategy needs to perform reasonable abstract imagination, namely, the working requirement of an actual query task is met, meanwhile, the representation of a non-query task needs to be considered, the abstract relation existing in the model relation is applied, and is combined with a column type storage mode, so that the optimization operation mode of relational algebra can be expanded to the construction and application of a database, and the design effect of integral query optimization is achieved.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
Claims (7)
1. A method for analyzing electric power marketing business data is characterized by comprising the following steps:
extracting power marketing business data;
preprocessing the electric power marketing business data, and loading the electric power marketing business data into a database in batches;
establishing a data index, a filter and a dimension table for the power marketing service data, and importing common data and acceleratable data into a memory database;
aiming at high concurrent query requests, the response time from millisecond to second is realized;
when the front end initiates a data request, whether cache data exist in the memory database or not is checked firstly, if so, all data are returned, if not, the data are searched and fed back to the front end through the relational database, and the result is synchronously loaded into the memory so as to facilitate subsequent efficient query.
2. The method of power marketing business data analysis of claim 1, wherein the power marketing business data is circularly imported into the distributed columnar database by way of a distributed file system file Batch Load.
3. The method for analyzing the data of the electricity marketing service according to claim 2, wherein in the distributed column-type database, a data storage model is built by using measurement point numbers, a time stamp prefix, a column family, a device ID and a measurement value as corresponding values in the column.
4. The method of power marketing business data analysis of claim 1, further comprising the steps of:
carrying out syntactic analysis on sentences of the query language, decomposing the grammar into words with representative meanings, carrying out syntactic analysis on different words, and determining language logic relations among different words;
the expression modes in the syntax tree are processed in advance to form an initial query plan with logic execution value;
optimizing the initial query plan to determine a query plan scheme with optimal query application value;
and carrying out physical optimization to generate a space application mode with the most selective value.
5. The method for analyzing electric marketing service data of claim 4, wherein the expression patterns in the syntax tree are processed in advance, further comprising the following steps:
and (4) arranging redundant pre-and suffixes of words, removing redundant parts in sentences, and performing scientific calculation by applying a constant expression mode.
6. The method of power marketing business data analysis of claim 4, wherein the initial query plan is optimized, further comprising the steps of:
technical operation is carried out in a relational algebra model mode, the quantity of executed tasks is reduced through plan adjustment, and the query working efficiency is improved.
7. A system for analyzing electric power marketing business data is characterized by comprising an acquisition and transmission and transformation state monitoring system end and a big data platform end, wherein the acquisition and transmission and transformation state monitoring system end further comprises an acquisition database; the big data platform end further comprises a data extraction server, a data analysis server, a data storage server, a database server and a storage device, wherein,
the acquisition database is used for acquiring the utilization and acquisition data and the state monitoring data of the power transmission and transformation equipment from the utilization and transmission and transformation state monitoring system;
the data extraction server is used for extracting the data used for collecting in the database and the state monitoring data of the power transmission and transformation equipment into files and storing the files in a specified directory of the server;
the data analysis server is used for extracting the E file from the server designated directory and analyzing and writing the E file into a big data platform kafka message queue;
the data storage server is used for consuming the accessed measurement data from the big data platform kafka message queue and writing the measurement data into the big data platform column data storage;
the database server is used for storing the warehoused measurement data in a structured mode, carrying out statistical processing on the warehoused measurement data and feeding back a statistical result to the front-end page;
the storage device is used for storing the data related to the acquisition file, the database file and the system log.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911224071.9A CN111046054A (en) | 2019-12-01 | 2019-12-01 | Method and system for analyzing power marketing business data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911224071.9A CN111046054A (en) | 2019-12-01 | 2019-12-01 | Method and system for analyzing power marketing business data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111046054A true CN111046054A (en) | 2020-04-21 |
Family
ID=70234493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911224071.9A Pending CN111046054A (en) | 2019-12-01 | 2019-12-01 | Method and system for analyzing power marketing business data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111046054A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111815453A (en) * | 2020-07-23 | 2020-10-23 | 国家电网公司西北分部 | Electric power transaction operation system |
CN115309790A (en) * | 2022-10-12 | 2022-11-08 | 国网山东省电力公司新泰市供电公司 | Power data query monitoring system in power system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718593A (en) * | 2016-01-28 | 2016-06-29 | 长春师范大学 | Database query optimization method and system |
CN106126604A (en) * | 2016-06-20 | 2016-11-16 | 华南理工大学 | A kind of social security data log analysis process system based on Distributed Data Warehouse |
CN106651633A (en) * | 2016-10-09 | 2017-05-10 | 国网浙江省电力公司信息通信分公司 | Power utilization information acquisition system and method based on big data technology |
CN107301206A (en) * | 2017-06-01 | 2017-10-27 | 华南理工大学 | A kind of distributed olap analysis method and system based on pre-computation |
CN107329982A (en) * | 2017-06-01 | 2017-11-07 | 华南理工大学 | A kind of big data parallel calculating method stored based on distributed column and system |
-
2019
- 2019-12-01 CN CN201911224071.9A patent/CN111046054A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718593A (en) * | 2016-01-28 | 2016-06-29 | 长春师范大学 | Database query optimization method and system |
CN106126604A (en) * | 2016-06-20 | 2016-11-16 | 华南理工大学 | A kind of social security data log analysis process system based on Distributed Data Warehouse |
CN106651633A (en) * | 2016-10-09 | 2017-05-10 | 国网浙江省电力公司信息通信分公司 | Power utilization information acquisition system and method based on big data technology |
CN107301206A (en) * | 2017-06-01 | 2017-10-27 | 华南理工大学 | A kind of distributed olap analysis method and system based on pre-computation |
CN107329982A (en) * | 2017-06-01 | 2017-11-07 | 华南理工大学 | A kind of big data parallel calculating method stored based on distributed column and system |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111815453A (en) * | 2020-07-23 | 2020-10-23 | 国家电网公司西北分部 | Electric power transaction operation system |
CN111815453B (en) * | 2020-07-23 | 2024-01-23 | 国家电网公司西北分部 | Electric power transaction operation system |
CN115309790A (en) * | 2022-10-12 | 2022-11-08 | 国网山东省电力公司新泰市供电公司 | Power data query monitoring system in power system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9542424B2 (en) | Lifecycle-based horizontal partitioning | |
Lamb et al. | The vertica analytic database: C-store 7 years later | |
Färber et al. | The SAP HANA Database--An Architecture Overview. | |
US20170083573A1 (en) | Multi-query optimization | |
US9020910B2 (en) | Storing tables in a database system | |
AU2008300503B2 (en) | ETL-less zero redundancy system and method for reporting OLTP data | |
EP2350868B1 (en) | Efficient large-scale filtering and/or sorting for querying of column based data encoded structures | |
EP2040180B1 (en) | ETL-less zero-redundancy system and method for reporting OLTP data | |
EP2660732B1 (en) | Unified table query processing | |
US8935232B2 (en) | Query execution systems and methods | |
US20150278334A1 (en) | Interpreting relational database statements using a virtual multidimensional data model | |
CN107092627B (en) | Columnar storage representation of records | |
EP2270691A1 (en) | Computer-implemented method for operating a database and corresponding computer system | |
US10521440B2 (en) | High performance data profiler for big data | |
CN103942342A (en) | Memory database OLTP and OLAP concurrency query optimization method | |
Dehne et al. | The cgmCUBE project: Optimizing parallel data cube generation for ROLAP | |
CN101178732A (en) | Method for quick-speed realizing data store house process based on metadata | |
Samtani et al. | Recent advances and research problems in data warehousing | |
Wang et al. | Efficient query processing framework for big data warehouse: an almost join-free approach | |
CN118227767B (en) | Knowledge graph driven large model business intelligent decision question-answering system and method | |
CN111046054A (en) | Method and system for analyzing power marketing business data | |
Fu et al. | Cubist: a new algorithm for improving the performance of ad-hoc OLAP queries | |
US11086864B2 (en) | Optimizing search for data | |
Li et al. | SP-phoenix: a massive spatial point data management system based on phoenix | |
John et al. | High-dimensional Data Cubes |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |