CN116431611A - Automatic data processing module development method based on custom rules - Google Patents

Automatic data processing module development method based on custom rules

Info

Publication number
CN116431611A
Authority
CN
China
Prior art keywords
data
module
processing
metadata
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310446764.2A
Other languages
Chinese (zh)
Inventor
张永祥
王蔚青
董凌
金金
孙志鹏
陈佳鑫
李丹
马军军
苗轲
甘寿成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qinghai Green Energy Data Co ltd
Original Assignee
Qinghai Green Energy Data Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qinghai Green Energy Data Co ltd filed Critical Qinghai Green Energy Data Co ltd
Priority to CN202310446764.2A priority Critical patent/CN116431611A/en
Publication of CN116431611A publication Critical patent/CN116431611A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/21: Design, administration or maintenance of databases
    • G06F 16/211: Schema design and management
    • G06F 16/212: Schema design and management with details for data modelling support
    • G06F 16/23: Updating
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2455: Query execution
    • G06F 16/248: Presentation of query results
    • G06F 16/25: Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An automatic data processing module development method based on custom rules addresses the shortcomings of traditional data processing systems. It provides a novel data processing module that solves the problems that service scenarios cannot be adapted in time, that different service objects cannot be processed flexibly, and that new service processing capabilities cannot be provided quickly, with the aim of improving the ability of a data processing system to flexibly provide solutions for different scenarios and improving its adaptability. The system comprises a basic data definition module, a data model management module, a data dynamic storage module, a data standard operation UI generation module, a data service definition module, a data service execution module, a data custom query module and a data visual display module. Different data can be accessed without limitation; data processing logic can be adjusted without interrupting the service; the amount of code brought by system upgrades can be effectively reduced, the probability of software system failures is lowered, and the stability of the service system is improved.

Description

Automatic data processing module development method based on custom rules
Technical Field
The invention relates to the field of data processing against a mass-data background, and in particular to an automatic data processing module development method based on custom rules.
Background
In software engineering, mass data comes in many types, including structured data typified by records in a relational database and unstructured data typified by log data. The data volume is large but the value density is low, so data about a particular person or object within a small time range must be accurately located from the mass of data before it can be further analyzed and processed.
When data from an upstream module is used for calculation, statistics and analysis, a messaging system, especially one based on a distributed environment, can be used; both offline processing and real-time processing are provided, and when the data is backed up to another data center in real time and operated in cluster mode, the processing may consist of one service or of several services.
In traditional software engineering, the basic flow of requirement analysis, data modeling, writing business code, testing and going live is well established and widely used. However, under conditions of mass data access, data types unknown at design time, and complex and changeable calculation rules, the traditional design approach cannot respond to user demands in time. What is needed is a new "intelligent system" that meets the performance demands of large-scale business expansion while addressing the cost and complexity challenges of data proliferation, and that flexibly uses the data to provide analysis and insight capabilities.
Existing data processing systems have the following disadvantages:
1. Against the background of mass data access, the traditional data modeling approach involves a heavy workload and cannot be changed promptly and effectively to match actual service scenarios. Mass data grows rapidly and comes in diverse types, forming data sets that cannot be captured, managed and processed with conventional software tools within a reasonable time.
2. A data processing system fixes its business objects during requirement analysis, and subsequent data modeling is designed according to the result of that analysis; different business objects cannot be handled flexibly and promptly as the business changes, and quick access to new data is even more difficult. The usual approach is to quickly import mass data from the front end into a centralized large distributed database or distributed storage cluster, and then use distributed techniques to perform common queries, classification, summarization and the like on the centralized mass data stored there, so as to satisfy most common analysis requirements. This approach faces the challenges of a large volume of imported data, a large volume of data involved in each query, and a large number of query requests.
3. The fundamental drawback of relational databases with hierarchical models is the lack of ability to directly construct the information types required by these applications. Performance problems arise when complex data has to be reconstructed from simple types, which adds complexity to the design process and creates mismatches between the stored data types and those of the programming language. In theory the hierarchical model does not directly support complex data types; a typical consequence is the need to decompose the data structure, and these decomposed structures cannot directly represent application data, while reconstruction from the basic components is cumbersome and time-consuming.
4. The complex query capability of the hierarchical data model is poor. Although a well-defined approach to data querying is provided, it becomes very cumbersome when used to query complex information. In addition, standardized processes in engineering applications typically produce a large number of simple query operations; in such an environment, the queries generated to access the information must handle a large number of tables and complex join operations.
Unless these queries are provided as fixed routines, the user must be very familiar with the hierarchical model to find the required information correctly. Once query patterns are executed as fixed routines, however, the user ends up performing regular maintenance on the application software; changes to the application or to the human interface software may require the routine queries to be modified frequently, and structural changes may also cause the routine queries and the application or human interface software to fail. For these reasons, the maintenance overhead of existing hierarchical model systems can be very large.
Because existing hierarchical models do not provide adequate constructs and performance, many engineering problems cannot be directly decomposed into simple parts during more complex model design.
5. All data processing business is developed and completed in advance according to a preset pattern, so when the system faces new requirements it cannot satisfy the application needs of users.
Data processing services have high development and delivery costs, long cycles and a high technical threshold to master; many problems arise in the implementation phase, the delivery process is very complex, and because the data module is tightly coupled to the service module, a great deal of analysis work is needed for tuning, operation and maintenance, and problem location, which makes operation and maintenance difficult.
6. Data assets are ambiguous and lack the information necessary for assessing data storage and computing resources. Data storage has inherent defects, and slight changes at the bottom layer have a great influence on the upper layers. Calculation calibers (statistical scopes) are inconsistent, and differences in the understanding of condition and rule filtering lead to inconsistent algorithms.
Disclosure of Invention
In view of the shortcomings of traditional data processing systems, the invention provides a novel data processing module that solves the problems that service scenarios cannot be adapted in time, that different service objects cannot be processed flexibly, and that new service processing capabilities cannot be provided quickly, with the aim of improving the ability of a data processing system to flexibly provide solutions for different scenarios and improving its adaptability. Based on the data modeling concept, the method aims at eliminating data islands; through a set of standard methods and tool sets it solves problems such as quality, reuse, expansion and cost in mass data calculation for the service, and it provides a development method that can drive service development.
An automatic data processing module development method based on custom rules comprises a basic data definition module, a data model management module, a data dynamic storage module, a data standard operation UI generation module, a data service definition module, a data service execution module, a data custom query module and a data visual display module.
The basic data definition module is responsible for defining and storing the metadata of the accessed data and tells the system the rules and modes for using and storing the data. The basic data definition module is only responsible for defining and storing metadata and does not process business-related content; the storage format must be completed at definition time, and the system cannot store data of forms and formats other than those described by the metadata.
The data model management module is responsible for converting defined metadata into a service model for use by the business. Metadata cannot be transferred directly into the service model; likewise, data in the service model cannot be transferred as metadata, and service model data must be converted back into metadata before being transferred between different modules.
The data dynamic storage module is responsible for storing data according to the metadata definition, and performs data formatting, cleaning and partial data type conversion when the data is stored, so that subsequent services can use it conveniently. Dynamic storage means that metadata related to existing indicators such as increments and stock is classified and converted according to the rules defined for the implementation, and the classified and converted data is then stored, or forwarded again to be processed or stored by other dynamic storage modules. A dynamic storage module can have several store-and-forward rules or groups of rules, and the store-and-forward rules are executed in a set order.
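A minimal, hypothetical sketch of how a dynamic storage module could apply such an ordered list of store-and-forward rules is given below; the class names (StoreForwardRule, DynamicStorageModule) and the record representation are assumptions made for illustration, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# In this sketch a record is simply a dict of metadata-item name -> value.
Record = dict

@dataclass
class StoreForwardRule:
    """One store-and-forward rule: classify/convert a record, then store it
    locally or hand it on to another dynamic storage module."""
    name: str
    order: int                                   # rules run in ascending order
    convert: Callable[[Record], Record]          # classification / type conversion
    store: Optional[Callable[[Record], None]] = None       # persist locally
    forward_to: Optional["DynamicStorageModule"] = None    # or forward onwards

@dataclass
class DynamicStorageModule:
    rules: List[StoreForwardRule] = field(default_factory=list)

    def accept(self, record: Record) -> None:
        # Execute the store-and-forward rules in their configured order.
        for rule in sorted(self.rules, key=lambda r: r.order):
            record = rule.convert(record)
            if rule.store:
                rule.store(record)
            if rule.forward_to:
                rule.forward_to.accept(record)
```

Each rule either persists the converted record or hands it on to another dynamic storage module, matching the store-or-forward behaviour described above.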
The data standard operation UI generation module is responsible for providing an operation interface that offers users basic data addition, deletion, modification, import and export services. Besides generating the standard UI module, it also generates customized components such as search areas, pagers and batch operations, and these customized components can be added or removed according to the requirements of the standard operations.
The data service definition module allows the user to define data service flows and rules, to define the data processing mode, and to designate the data that the service processes. Different processing flows and rules are defined for different services; the flows and rules must be defined first and then used in a specific service module. A business may have multiple processing flows or rules, the processing of the flows can be pre-arranged by grouping the flows and rules, and execution can proceed one at a time or cyclically, by group or by item.
The data service execution module is responsible for performing service processing and calculation on the designated data according to the processing flows and rules provided by the service definition module, and for returning the result. Execution proceeds in batches and cycles according to the processing flows and rules arranged by the data service definition module; during calculation, stream processing, batch processing or an integrated stream-batch mode can be adopted, the calculated result can be returned in batches or all at once, and the result of one processing step is temporarily stored and used as an input parameter of the next step.
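One possible way to realize this chained execution, in which the intermediate result of each processing step is held and fed into the next step and finished records are returned batch by batch, is sketched below; the function signature and batch size are illustrative assumptions.

```python
from typing import Callable, Iterable, List

Step = Callable[[dict], dict]   # one processing step: takes and returns a record

def execute_rule(steps: List[Step], records: Iterable[dict],
                 batch_size: int = 100) -> List[dict]:
    """Run an ordered chain of processing steps over the input records.
    The result of each step is temporarily held and passed to the next step;
    finished records are collected and returned batch by batch."""
    results, batch = [], []
    for record in records:
        intermediate = record
        for step in steps:            # stream the record through every step in order
            intermediate = step(intermediate)
        batch.append(intermediate)
        if len(batch) >= batch_size:  # flush results batch by batch
            results.extend(batch)
            batch = []
    results.extend(batch)             # remaining records returned together
    return results
```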
The data custom query module is responsible for receiving the set of query conditions submitted by the user, automatically performing data table joins, merges and the like, adding complex conditions such as grouping and calculation, and selecting the data columns the user requires for return. The user generates query conditions through the query components provided in the UI module according to the corresponding functions; the user only needs to combine the conditions and submit them to this module, and the module calls the calculation module to parse, assemble and arrange the query conditions and return the data details the user requires in one pass.
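A hypothetical sketch of how a submitted condition set could be parsed and assembled into a single parameterized query is shown below; the condition format, table and column names are assumptions chosen only to illustrate the idea.

```python
from typing import List, Optional, Tuple

def build_query(table: str, columns: List[str],
                conditions: List[Tuple[str, str, object]],
                group_by: Optional[List[str]] = None) -> Tuple[str, list]:
    """Assemble a combined condition set into one parameterized SQL statement.
    conditions is a list of (column, operator, value) tuples joined with AND."""
    allowed_ops = {"=", "<>", ">", "<", ">=", "<=", "LIKE"}
    where, params = [], []
    for col, op, value in conditions:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        where.append(f"{col} {op} ?")
        params.append(value)
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql, params

# Example: conditions combined in the UI are submitted as one set.
sql, params = build_query(
    "power_readings", ["station_id", "SUM(energy_kwh)"],
    [("ts", ">=", "2023-01-01"), ("station_id", "=", "QH-001")],
    group_by=["station_id"],
)
```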
The data visualization display module is responsible for graphically displaying query results in forms such as line graphs, bar graphs and pie charts for the user to review. The user can query by ranges such as year, month, week or a custom time period, and can add and delete chart results according to actual needs, so the operation is fully customizable.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention can access data of different types, formats and volumes without limitation, can automatically store it, perform data processing and analysis, and feed the calculation result back to the user. The data format is defined in advance according to a predefined mode, and when incoming data does not meet that format the system checks whether a matching format exists among the predefined data formats, which improves the success rate of data processing and storage and reduces data loss caused by abnormal errors.
2. The invention can adjust the data processing logic without interrupting the service and reflect the latest calculation result in real time, which greatly reduces the time cost caused by service adjustment and in effect improves the user's service processing efficiency. The data processing mode is based on near-real-time streaming data connection, ingestion, blending, modeling and generation. The user may perform data preparation operations such as joins, filters and grouping while a time-window aggregation is being executed (see the sketch after this list); a single source of fact is created by forcing connections to go through the data stream instead of the underlying system, and through this single source it is possible to control which data is accessed and how the data is exposed to the associated modules, ultimately mapping the data to the standard definitions.
3. The invention can effectively reduce the amount of code brought by system upgrades, reduce the probability of software system failures and greatly improve the stability of the service system. The reusability of basic data elements can be improved by using reusable conversion logic shared by multiple data sets, so that separate connections to cloud or local data sources do not need to be established. Only individual modules are allowed to access the underlying data source, and the associated modules then provide access to the data stream so that further work can be built on top of it. This approach reduces the load on the underlying system and enables an administrator to better control when the system is loaded by a refresh; the data stream refresh plan is managed directly from the workspace where the data stream is created, just like a data set.
4. The invention is deployed independently and exposed as interface services; the user only needs to define rules, and the system automatically carries out the full service workflow of storing, cleaning, calculating, analyzing and displaying different service data according to the user's instructions.
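The sketch referred to in beneficial effect 2 follows: a small example of performing a filter, a join-like enrichment and grouping while aggregating a near-real-time stream over a fixed time window. The window length, event fields and helper names are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime

WINDOW_MINUTES = 5   # assumed aggregation window length

def window_start(ts: datetime) -> datetime:
    """Align a timestamp to the start of its aggregation window."""
    aligned_minute = (ts.minute // WINDOW_MINUTES) * WINDOW_MINUTES
    return ts.replace(minute=aligned_minute, second=0, microsecond=0)

def aggregate_stream(events, station_names):
    """Filter, enrich (a simple join against station_names) and group
    streaming events into per-window, per-station sums."""
    sums = defaultdict(float)
    for ev in events:                              # ev: {"ts", "station_id", "kwh"}
        if ev["kwh"] <= 0:                         # data preparation: filter
            continue
        name = station_names.get(ev["station_id"], "unknown")   # join / enrich
        key = (window_start(ev["ts"]), name)                    # group by window + station
        sums[key] += ev["kwh"]
    return sums
```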
Drawings
FIG. 1 is a flow chart of the present invention.
Description of the embodiments
The following clearly and completely describes the technical solutions in the embodiments of the present invention. It is apparent that the described embodiments are only some of the embodiments of the invention, not all of them; all other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Embodiment one: the automatic data processing module development method based on custom rules comprises the following steps:
A. First, the user opens the data definition page of the automatic data processing module and enters the definition of the data format to be processed into the system. The definition specifically includes the English name, Chinese name, data type, length, default value, whether null is allowed, unit, input style definition, the module the data belongs to, whether the data has metadata attributes, data checking rules, data identifier, data row set, the character set of the data (when the data contains Chinese), the checking rule of the character set, the value range of the data, whether the data can auto-increment, the auto-increment step, whether the data is common metadata, and so on. In this way the system knows precisely the number, type, specification, protocol and scope of the data fields to be processed, as well as the relations between the fields.
During the data definition process, the system automatically monitors whether identical metadata items exist; if they do, the data can be considered to belong to the same object, and when the data is stored, data with the same metadata items can be stored in the same partition, which speeds up reading.
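A minimal sketch of one metadata field definition and of the check that routes data with identical metadata items to the same partition might look like the following; the attribute subset and the hash-based partition choice are assumptions, not the exact mechanism of the system.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MetadataField:
    """One metadata item from the data definition page (a subset of the fields
    listed above; the exact attribute set is an assumption)."""
    name_en: str
    name_cn: str
    data_type: str          # e.g. "string", "int", "decimal"
    length: int
    nullable: bool = True
    default: object = None
    unit: str = ""
    auto_increment: bool = False
    check_rule: str = ""    # e.g. a regular expression

def partition_for(fields: List[MetadataField], num_partitions: int = 16) -> int:
    """Records whose metadata items are identical map to the same partition,
    so objects sharing a definition are stored together and read faster.
    hash() is illustrative; a stable hash would be used in practice."""
    signature = tuple(sorted((f.name_en, f.data_type) for f in fields))
    return hash(signature) % num_partitions
```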
B. Next, the access data sources are defined, including the data source type, URL, username, password, access frequency and other parameters. Multiple data sources can be selected according to the specific deployment mode; they can be divided into master-slave data sources (including one master with multiple slaves, and multiple masters with multiple slaves) and cluster data sources.
When a data source is in master-slave mode, a master data source (for writing) and slave data sources (for reading) must be defined separately, and the data source type, URL, username, password, access frequency and other parameters set for each.
When a data source is a cluster, only the data source type, URL, username, password, access frequency and other parameters of any one data source in the cluster need to be provided; when a faulty data source becomes unavailable, the monitoring module automatically negotiates with the cluster and switches to a healthy data source.
A connection pool is configured for each data source and is responsible for allocating, managing and releasing data source connections; it allows an application to reuse an existing data source connection instead of establishing a new one. Connections whose idle time exceeds the maximum idle time are released, to avoid connection leaks caused by failing to release database connections. This technique can significantly improve the performance of database operations. The initial number of connections in the pool, the minimum number of connections when idle, the maximum number of connections, the maximum waiting time of a process and so on need to be set.
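As an illustration only, these pool parameters map naturally onto a generic pooling library; the sketch below expresses them with SQLAlchemy's connection pool settings, where the URL, credentials and numeric values are placeholders.

```python
from sqlalchemy import create_engine, text

# Hypothetical master (write) data source; URL and credentials are placeholders.
engine = create_engine(
    "mysql+pymysql://user:password@db-master:3306/metadata_store",
    pool_size=10,        # connections kept open in the pool
    max_overflow=20,     # extra connections allowed, i.e. maximum = 10 + 20
    pool_timeout=30,     # maximum seconds a caller waits for a connection
    pool_recycle=1800,   # recycle connections idle longer than 30 minutes
    pool_pre_ping=True,  # verify a connection is alive before reuse
)

with engine.connect() as conn:
    conn.execute(text("SELECT 1"))   # reuses a pooled connection instead of opening a new one
```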
C. At this point, the accessed data content can be checked through the data standard operation UI, and the basic operations of adding, deleting, importing and exporting can be performed. When any data in the UI is subjected to a standard operation, the operation is restricted according to whether the current user holds the corresponding permission.
It is not recommended to edit metadata that is already used by streaming rules that have generated a large amount of data; doing so can cause read anomalies for data that has already been saved and gaps in historical data when querying. Although viewing the data can still be achieved by adjusting the metadata attributes, there is a certain time cost when processing large amounts of data.
D. The user defines data processing rules with the data service definition module, including the rule name, remarks, data processing expressions, data processing logic conditions, and so on.
This is the core step of data processing. Different data processing rules are provided for different services; previously defined metadata can be used in each processing rule, the metadata can be referenced in multi-level nested and multi-level parallel form when writing data processing expressions, and metadata that is referenced several times can be given an alias to make the expressions easier to write.
The data logic conditions may use the symbols common to programming languages.
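A hypothetical example of such a rule, with an expression that references metadata items (a repeatedly referenced item given the alias e) and a logic condition written with ordinary programming operators, is sketched below; the field names and the eval-based evaluator are simplifying assumptions.

```python
# One processing rule as it might be stored by the data service definition module.
rule = {
    "name": "daily_energy_yield",
    "remark": "Yield per station, only for stations that reported today",
    # 'e' is an alias for the repeatedly referenced metadata item energy_kwh.
    "aliases": {"e": "energy_kwh"},
    "expression": "e * conversion_factor - e * loss_ratio",
    "condition": "station_status == 'online' and e > 0",
}

def run_rule(rule: dict, record: dict):
    """Evaluate a rule against one record: resolve aliases, check the logic
    condition, then compute the expression. eval() is used only as a sketch."""
    scope = dict(record)
    for alias, target in rule["aliases"].items():
        scope[alias] = record[target]
    if not eval(rule["condition"], {"__builtins__": {}}, scope):
        return None
    return eval(rule["expression"], {"__builtins__": {}}, scope)

result = run_rule(rule, {"energy_kwh": 1200.0, "conversion_factor": 0.98,
                         "loss_ratio": 0.02, "station_status": "online"})
```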
E. The user selects the data items to be processed and specifies the data processing rules. A data item is bound to a rule, which designates that the rule uses that item of data. Matching metadata is stream-processed according to the defined execution rule; one rule can comprise several streaming steps, the data processed by one stream is passed on to the next stream in turn, and the processing result is returned when the last stream in the rule has finished.
The data rule should specify the order of the streaming steps; the order can be adjusted by changing the ordering value of each streaming step, together with whether the metadata values produced by a step are retained to serve as the preliminary result of the next calculation.
Whether the final processing result of the rule is stored is also set; when it must be stored, a storage period is configured, and content stored earlier than the period is deleted automatically once the period is exceeded.
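A small sketch of binding a data item to a rule whose streaming steps carry an ordering value, a retain flag and a storage period could look like this; the structure and field names are assumptions rather than the exact format used by the system.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, List, Optional

@dataclass
class StreamStep:
    order: int                         # steps execute in ascending order
    func: Callable[[dict], dict]
    retain_output: bool = True         # keep this step's metadata values as the
                                       # preliminary input of the next step

@dataclass
class RuleBinding:
    data_item: str                     # the bound data item
    steps: List[StreamStep]
    store_result: bool = False
    storage_period: Optional[timedelta] = None   # results older than this are purged

    def process(self, record: dict) -> dict:
        current = record
        for step in sorted(self.steps, key=lambda s: s.order):
            out = step.func(current)
            current = out if step.retain_output else current
        return current
```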
F. At this point, the results of the data processing can be checked through the data standard operation UI. This mainly shows how the rules configured earlier have executed; the execution status of every rule can be inspected, such as whether execution succeeded, the number of executions, the number of exceptions, the volume of data generated by the rule, the volume of data stored by the rule, and so on.
G. The user uses the data custom query module to define the parameters of the query, including query conditions (and, or, not are supported), grouping, calculation, and so on. The processed data being queried is stored separately by year, so queries over a large time span can be slow; the query range can therefore be set according to the actual data volume, and when the data does not fall within a single time range the query can be issued against the individual data sources.
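Because processed data is stored per year, a query spanning several years can be split into one sub-query per yearly data source and the partial results merged, as in the illustrative sketch below; the helper names and source mapping are assumptions.

```python
from datetime import date

def query_by_year(sources: dict, start: date, end: date, conditions: list) -> list:
    """sources maps a year to a query callable for that year's data source.
    The overall range is split per year so each store is queried separately."""
    rows = []
    for year in range(start.year, end.year + 1):
        if year not in sources:
            continue                              # no data stored for that year
        year_start = max(start, date(year, 1, 1))
        year_end = min(end, date(year, 12, 31))
        rows.extend(sources[year](year_start, year_end, conditions))
    return rows
```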
H. The user uses the data visualization display module to assign the custom query to a specific display model (line graph, pie chart, histogram, heat map, etc.), so that the data graph rendered from the query and analysis result can be seen on the display platform. Statistical data produced by aggregation is archived as historical data, only the statistics are retained, and the statistics can be downloaded or saved as screenshots according to the chart settings. A fixed time interval can be set so that the data is prepared when the module is loaded, and the pre-prepared data is loaded directly for rendering when the module is used. The type and position of a chart can be changed manually and dragged to meet personalized and customized requirements; when a chart is added, a data source and a processing rule result must be selected.
The foregoing description covers only preferred embodiments of the invention and is not intended to limit it; any modifications, equivalent substitutions and improvements made within the spirit and principles of the invention are intended to fall within its scope.

Claims (5)

1. An automatic data processing module development method based on custom rules, comprising a basic data definition module, a data model management module, a data dynamic storage module, a data standard operation UI generation module, a data service definition module, a data service execution module, a data custom query module and a data visual display module; characterized in that:
the basic data definition module is responsible for defining and storing the metadata of the accessed data and tells the system the rules and modes for using and storing the data; the basic data definition module is only responsible for defining and storing metadata and does not process business-related content; the storage format must be completed at definition time, and the system cannot store data of forms and formats other than those described by the metadata;
the data model management module is responsible for converting the metadata defined by the basic data definition module into service model data for use by the business; the metadata cannot be transferred directly into the service model; likewise, the data in the service model cannot be transferred as metadata, and the service model data must be converted back into metadata before being transferred between different modules;
the data dynamic storage module is responsible for storing data according to the metadata definition and performs data formatting, cleaning and partial data type conversion when the data is stored, so that subsequent services can use it conveniently; dynamic storage means that metadata related to existing indicators such as increments and stock is classified and converted according to the rules defined for the implementation, and the classified and converted data is then stored or forwarded again to be processed or stored by other dynamic storage modules; a dynamic storage module can have several store-and-forward rules or groups of rules, and the store-and-forward rules are executed in a set order;
the data standard operation UI generation module is responsible for providing an operation interface that offers users basic data addition, deletion, modification, import and export services; besides the standard UI module, it also generates customized components such as search areas, pagers and batch operations, and these customized components can be added or removed according to the requirements of the standard operations;
the data service definition module allows the user to define data service flows and rules, to define the data processing mode, and to designate the data that the service processes; different processing flows and rules are defined for different services, and the flows and rules must be defined first and then used in a specific service module; a business can have multiple processing flows or rules, the processing of the flows can be pre-arranged by grouping the flows and rules, and execution can proceed one at a time or cyclically, by group or by item;
the data service execution module is responsible for performing service processing and calculation on the designated data according to the processing flows and rules provided by the service definition module and returning the result; execution proceeds in batches and cycles according to the processing flows and rules arranged by the data service definition module, stream processing, batch processing or an integrated stream-batch mode can be adopted during calculation, the calculated result can be returned in batches or all at once, and the result of one processing step is temporarily stored and used as an input parameter of the next step;
the data custom query module is responsible for receiving the set of query conditions submitted by the user, automatically performing data table joins, merges and the like, adding complex conditions such as grouping and calculation, and selecting the data columns the user requires for return; the user generates query conditions through the query components provided in the UI module according to the corresponding functions, the user only needs to combine the conditions and submit them to this module, and the module calls the calculation module to parse, assemble and arrange the query conditions and return the data details the user requires in one pass;
the data visual display module is responsible for graphically displaying query results in forms such as line graphs, bar graphs and pie charts for the user to review; the user can query by ranges such as year, month, week or a custom time period, and can add and delete chart results according to actual needs, so the operation is fully customizable.
2. The method for developing an automated data processing module based on custom rules according to claim 1, characterized in that metadata technology is used to automatically maintain the data storage format so as to accommodate different data storage requirements.
3. The method according to claim 1, characterized in that the data is automatically processed according to the custom data business rules and logic and the result is returned.
4. The method for developing an automated data processing module based on custom rules according to claim 1, characterized in that the custom query result is graphically presented in forms such as line graphs, bar graphs and pie charts for the user to review.
5. The method for developing an automated data processing module based on custom rules according to claim 1, wherein the specific development method comprises the following steps:
A. First, the user opens the data definition page of the automatic data processing module and enters the definition of the data format to be processed into the system; the definition specifically includes the English name, Chinese name, data type, length, default value, whether null is allowed, unit, input style definition, the module the data belongs to, whether the data has metadata attributes, data checking rules, data identifier, data row set, the character set of the data (when the data contains Chinese), the checking rule of the character set, the value range of the data, whether the data can auto-increment, the auto-increment step, whether the data is common metadata, and so on; in this way the system knows precisely the number, type, specification, protocol and scope of the data fields to be processed, as well as the relations between the fields;
during the data definition process the system automatically monitors whether identical metadata items exist; if they do, the data can be considered to belong to the same object, and when the data is stored, data with the same metadata items can be stored in the same partition, which speeds up reading;
B. the access data sources are then defined, including the data source type, URL, username, password, access frequency and other parameters; multiple data sources can be selected according to the specific deployment mode, divided into master-slave data sources (including one master with multiple slaves, and multiple masters with multiple slaves) and cluster data sources;
when a data source is master-slave, a master data source (for writing) and slave data sources (for reading) must be defined separately, and the data source type, URL, username, password, access frequency and other parameters set for each;
when a data source is a cluster, only the data source type, URL, username, password, access frequency and other parameters of any one data source in the cluster need to be provided; when a faulty data source becomes unavailable, the monitoring module automatically negotiates with the cluster and switches to a healthy data source;
a connection pool is configured for each data source and is responsible for allocating, managing and releasing data source connections; it allows an application to reuse an existing data source connection instead of establishing a new one; connections whose idle time exceeds the maximum idle time are released, to avoid connection leaks caused by failing to release database connections, which can significantly improve the performance of database operations; the initial number of connections in the pool, the minimum number of connections when idle, the maximum number of connections, the maximum waiting time of a process and so on need to be set;
C. at this point, the accessed data content can be checked through the data standard operation UI, and the basic operations of adding, deleting, importing and exporting can be performed; when any data in the UI is subjected to a standard operation, the operation is restricted according to whether the current user holds the corresponding permission;
it is not recommended to edit metadata already used by streaming rules that have generated a large amount of data, since this can cause read anomalies for data that has already been saved and gaps in historical data when querying; although viewing the data can still be achieved by adjusting the metadata attributes, there is a certain time cost when processing large amounts of data;
D. the user defines data processing rules with the data service definition module, including the rule name, remarks, data processing expressions, data processing logic conditions, and so on;
this is the core step of data processing; different data processing rules are provided for different services, previously defined metadata can be used in each processing rule, the metadata can be referenced in multi-level nested and multi-level parallel form when writing data processing expressions, and metadata that is referenced several times can be given an alias to make the expressions easier to write; the data logic conditions may use the symbols common to programming languages;
E. the user selects the data items to be processed and specifies the data processing rules; a data item is bound to a rule, which designates that the rule uses that item of data; matching metadata is stream-processed according to the defined execution rule, one rule can comprise several streaming steps, the data processed by one stream is passed on to the next stream in turn, and the processing result is returned when the last stream in the rule has finished;
the order of the streaming steps is specified in the data rule; the order can be adjusted by changing the ordering value of each streaming step, together with whether the metadata values produced by a step are retained to serve as the preliminary result of the next calculation;
whether the final processing result of the rule is stored is also set; when it must be stored, a storage period is configured, and content stored earlier than the period is deleted automatically once the period is exceeded;
F. at this point, the results of the data processing can be checked through the data standard operation UI; this mainly shows how the rules configured earlier have executed, and the execution status of every rule can be inspected, such as whether execution succeeded, the number of executions, the number of exceptions, the volume of data generated by the rule, the volume of data stored by the rule, and so on;
G. the user uses the data custom query module to define the parameters of the query, including query conditions (and, or, not are supported), grouping, calculation, and so on; the processed data being queried is stored separately by year, so queries over a large time span can be slow; the query range can therefore be set according to the actual data volume, and when the data does not fall within a single time range the query can be issued against the individual data sources;
H. the user uses the data visualization display module to assign the custom query to a specific display model (line graph, pie chart, histogram, heat map, etc.), so that the data graph rendered from the query and analysis result can be seen on the display platform; statistical data produced by aggregation is archived as historical data, only the statistics are retained, and the statistics can be downloaded or saved as screenshots according to the chart settings; a fixed time interval can be set so that data is prepared when the module is loaded, and the pre-prepared data is loaded directly for rendering when the module is used; the type and position of a chart can be changed manually and dragged to meet personalized and customized requirements; when a chart is added, a data source and a processing rule result must be selected.
CN202310446764.2A 2023-04-24 2023-04-24 Automatic data processing module development method based on custom rules Pending CN116431611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310446764.2A CN116431611A (en) 2023-04-24 2023-04-24 Automatic data processing module development method based on custom rules

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310446764.2A CN116431611A (en) 2023-04-24 2023-04-24 Automatic data processing module development method based on custom rules

Publications (1)

Publication Number Publication Date
CN116431611A true CN116431611A (en) 2023-07-14

Family

ID=87087086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310446764.2A Pending CN116431611A (en) 2023-04-24 2023-04-24 Automatic data processing module development method based on custom rules

Country Status (1)

Country Link
CN (1) CN116431611A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117215542A (en) * 2023-11-07 2023-12-12 上海华创自动化工程股份有限公司 Custom data processing system and method
CN117215542B (en) * 2023-11-07 2024-05-14 上海华创自动化工程股份有限公司 Custom data processing system and method

Similar Documents

Publication Publication Date Title
US7467018B1 (en) Embedded database systems and methods in an industrial controller environment
CN110032604B (en) Data storage device, translation device and database access method
US6289345B1 (en) Design information management system having a bulk data server and a metadata server
CN106874247B (en) Report generation method and device
CN104200402A (en) Publishing method and system of source data of multiple data sources in power grid
CN102750406A (en) Multi-version management method for model set and difference model-based power grid model
CN101566981A (en) Method for establishing dynamic virtual data base in analyzing and processing system
CN107870949B (en) Data analysis job dependency relationship generation method and system
CN110688399A (en) Stream type calculation real-time report system and method
CN110543303A (en) Visual business platform
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN116431611A (en) Automatic data processing module development method based on custom rules
CN113642299A (en) One-key generation method based on power grid statistical form
CN108228726A (en) The increment unusual fluctuation content acquisition method and storage medium of power distribution network red and black figure
CN113282599A (en) Data synchronization method and system
CN111897971A (en) Knowledge graph management method and system suitable for field of power grid dispatching control
CN111143467A (en) Data sharing method of cloud platform for realizing distributed heterogeneous data sharing
Fiser et al. On-line analytical processing on large databases managed by computational grids
CN117349368A (en) Cross-database data real-time synchronous task management system and method based on Flink
CN116777671A (en) Large-scale energy storage resource management method and system based on automatic topology display generation
CN111241176A (en) Data management system
CN116089530A (en) Mass data transmission control method, device and equipment
CN110389953A (en) Date storage method, storage medium, storage device and server based on compression figure
CN115422898A (en) Visual self-defined report form analysis system based on container cloud
CN116149849A (en) Edge computing method for intelligent water affair complex time scale data fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination