CN114518907B

CN114518907B - Extensible self-service data analysis method and system

Info

Publication number: CN114518907B
Application number: CN202210098718.3A
Authority: CN
Inventors: 姜磊 (Jiang Lei); 朱宏飞 (Zhu Hongfei); 杨钊 (Yang Zhao); 李成 (Li Cheng); 钟颖欣 (Zhong Yingxin)
Original assignee: Brilliant Data Analytics Inc
Current assignee: Brilliant Data Analytics Inc
Priority date: 2022-01-27
Filing date: 2022-01-27
Publication date: 2023-11-03
Anticipated expiration: 2042-01-27
Also published as: CN114518907A

Abstract

The invention relates to computer and data analysis technology, and in particular to an extensible self-service data analysis method and system. The method comprises: defining the underlying structure of data analysis by abstracting the entity objects involved, defining analysis objects and the relationships among them, and abstracting analysis operations, thereby forming the infrastructure model of the system; designing and encapsulating a core layer based on a microkernel architecture, in which extension plug-ins are developed on a distributed architecture so that changes caused by extending business functions are encapsulated in the plug-ins; defining extensible interfaces that expose each business function of the extension layer, in a standard interface form, on top of the analysis capabilities provided by the core layer; and implementing the analysis objects and their relationships as a business layer in the form of plug-ins, where the analysis models within the analysis objects are validated and managed by the core layer and the core layer drives the plug-ins through events. The system offers a stable structure, high availability, and zero data loss.

Description

Extensible self-service data analysis method and system

Technical Field

The invention relates to computer and data analysis technology, and in particular to an extensible self-service data analysis method and system.

Background

Data analysis software is now mature, and there are many products in China and abroad, but most of them do not support open extension. During data analysis, the system often cannot meet a requirement, for example when a new algorithm must be added to the analysis process or a new analysis model is needed. In that case the system's developer has to be engaged, which takes a very long time, is expensive, and seriously hinders the analysis work. This is a main reason why most data analysis tools cannot be used comprehensively.

In addition, in existing data analysis systems the analysis models are tightly coupled with the core analysis-flow management, so every new analysis model requires changes to the core code, which makes the corresponding computer program unstable.

Disclosure of Invention

In view of these problems in the prior art, the invention provides an extensible self-service data analysis method and system. A microkernel design provides plug-in management and rapid extension of data analysis models, and a distributed architecture keeps the analysis process stable. Functions are easy to extend, so individual requirements can be met quickly, which promotes the healthy development of the data analysis field. The system offers a stable structure, high availability, and zero data loss.

The analysis method adopts the following technical solution. The extensible self-service data analysis method comprises the following steps:

S1. Define the underlying structure of data analysis: abstract the entity objects of data analysis, define the analysis objects and the relationships among them, and abstract the analysis operations, forming the infrastructure model of the system;

S2. Design and encapsulate a core layer based on a microkernel architecture. The core layer is the analysis engine; it converts analysis tasks from the business layer into data query tasks and returns the data, so as to support the analysis requirements of different business scenarios. In the core layer, extension plug-ins are developed on a distributed architecture, and changes caused by extending business functions are encapsulated in the plug-ins;

S3. Define extensible interfaces: based on the analysis capabilities provided by the core layer, expose each business function of the extension layer in a standard interface form;

S4. Implement the analysis objects defined in step S1, and the relationships among them, as a business layer in the form of plug-ins; develop the corresponding analysis models in the analysis objects according to the actual business calculation logic; the core layer validates and manages them and drives the plug-ins through events.

In a preferred embodiment, step S1 abstracts the entity objects from the overall business, and the defined analysis objects include: the analysis board, analysis graph, analysis window, analysis node, analysis path, analysis model, and analysis method. The analysis board manages the analysis windows and lays out their positions; the analysis path records the computation order and dependency relationships between analysis nodes; the analysis window displays the data and chart configuration provided by its analysis node; each analysis node carries an extensible analysis model and analysis method. The analysis model is the parameter configuration of an analysis algorithm and can be persisted; the analysis method is the business logic code of the algorithm and performs the actual analysis calculation according to the parameter configuration defined by the analysis model.

Further preferably, the data of an analysis object is modified through analysis operations and analysis operation parameters; every modification of the analysis board produces one analysis operation, and every analysis operation is serializable.

The analysis system adopts the following technical solution. An extensible self-service data analysis system comprises:

an underlying structure definition module, configured to abstract the entity objects of data analysis, define the analysis objects and the relationships among them, and abstract the analysis operations, forming the infrastructure model of the system;

a core layer design module, configured to design and encapsulate a core layer based on a microkernel architecture. The core layer is the analysis engine; it converts analysis tasks from the business layer into data query tasks and returns the data, so as to support the analysis requirements of different business scenarios. In the core layer, extension plug-ins are developed on a distributed architecture, and changes caused by extending business functions are encapsulated in the plug-ins;

an extensible interface definition module, configured to expose each business function of the extension layer in a standard interface form, based on the analysis capabilities provided by the core layer;

a business layer setting module, configured to implement the analysis objects defined by the underlying structure definition module, and the relationships among them, as a business layer in the form of plug-ins; the corresponding analysis models in the analysis objects are developed according to the actual business calculation logic, validated and managed by the core layer, and driven by the core layer through events.

While providing the basic functions of self-service data analysis, the invention opens every business function to extension: a required function can be added to the system simply by implementing it against the specified interface standard, so individual requirements can be met quickly and the healthy development of the data analysis field is promoted. Compared with the prior art, the invention has the following advantages and beneficial effects:

1. A microkernel design provides plug-in management and rapid extension of data analysis models while keeping the system architecture stable: a new model is a mounted plug-in, so the core code does not need to be modified and the kernel is unaffected, which makes the whole system more robust. A distributed architecture further keeps the analysis process stable, with high availability and zero data loss.

2. The microkernel-based core layer provides a complete event mechanism, analysis lifecycle hooks, and real-time reactive programming, which makes new plug-ins easier to develop: a plug-in only needs to register its events in one place or mount an analysis lifecycle hook. This reflects the principle of high cohesion and low coupling; each plug-in has a single responsibility and is easy to maintain. Built-in plug-ins such as intelligent titles, node execution, scheduled refresh, and early warning are each implemented in a single file and are easy to write.

3. The invention supports multi-user analysis sessions. Different users working on the same analysis path structure often have different permissions, so a separate analysis session can be started for each user, and the analysis within that session is constrained by the current user's permissions.

4. Analysis operation (Action) requests are decoupled from the actual analysis execution, so the interface response time of an operation request is more stable and is not prolonged when the business database is short of resources.

5. The microkernel architecture strictly constrains what code may call, which reduces the chance of writing faulty code: for example, analysis board data cannot be modified except by executing an analysis operation, which guarantees that every modification goes through an analysis operation instruction. Each operation has a deterministic result with fixed inputs and outputs, which makes unit testing easier; the state of the analysis board can be recorded as a JSON file, and executing each analysis operation represents a state transition of the board. In addition, analysis operations (Actions) are serializable, enabling undo, redo, and even replay of the analysis process, re-rendering the board step by step as it was built.

6. Traditional load balancing relies on a set of predefined rules, whereas the invention uses AI to compute the optimal load in real time, so that resource utilization can be optimized according to each machine's real-time resource availability.
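Item 3 above, which runs one analysis path under each user's own permissions by starting a separate session per user, can be sketched as follows. The permission table, the region-based row filter, and the `AnalysisSession` class are hypothetical; the patent does not specify how permissions are expressed.

```python
from dataclasses import dataclass

# Hypothetical row-level permissions: user -> regions that user may see.
PERMISSIONS = {"alice": {"East"}, "bob": {"East", "West"}}


@dataclass(frozen=True)
class AnalysisSession:
    user: str

    def constrain(self, rows: list[dict]) -> list[dict]:
        # Apply the current user's permission constraint before any computation.
        allowed = PERMISSIONS.get(self.user, set())
        return [r for r in rows if r["region"] in allowed]

    def run_path(self, rows: list[dict]) -> float:
        # The same analysis path (filter, then total) runs for every user.
        return sum(r["amount"] for r in self.constrain(rows))


rows = [{"region": "East", "amount": 10.0}, {"region": "West", "amount": 5.0}]
print(AnalysisSession("alice").run_path(rows))  # 10.0
print(AnalysisSession("bob").run_path(rows))    # 15.0
```

Because the session object only carries the user identity, the analysis path itself stays shared; only the constraint step differs per session.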

Drawings

FIG. 1 is a flow chart of a method of data analysis in an embodiment of the invention;

FIG. 2 is a diagram of the relationship of analysis objects abstractly defined in an embodiment of the invention;

FIG. 3 is an application architecture diagram of a microkernel system designed in an embodiment of the present invention;

FIG. 4 is a schematic diagram of an analysis operation interface call and an asynchronous execution process in an embodiment of the invention.

Detailed Description

The invention will be further described with reference to examples and figures; it will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments obtained without inventive effort by a person skilled in the art based on the embodiments of the invention are within the scope of protection of the invention.

Example 1

As shown in FIG. 1, the extensible self-service data analysis method of this embodiment comprises the following steps:

S1. Define the underlying structure of data analysis: abstract the entity objects of data analysis, define the analysis objects and the relationships among them, and abstract the analysis operations, forming the infrastructure model of the system.

Based on the overall business, this embodiment abstracts the entity objects and defines the following analysis objects: the analysis board (Dashboard), analysis graph (Analysis Graph), analysis window (Analysis Window), analysis node (Analysis Node), analysis path (Analysis Connector), analysis model (Analysis Model, extensible), and analysis method (Analysis Method, extensible), as shown in FIG. 2. FIG. 2 schematically illustrates the relationships among analysis objects such as the analysis board, analysis paths, analysis windows, and analysis nodes. The analysis board manages the analysis windows and lays out their positions; the analysis path records the computation order and dependency relationships between analysis nodes; the analysis window displays the data and chart configuration provided by its analysis node; each analysis node carries an extensible analysis model and analysis method. The analysis model is the parameter configuration of an analysis algorithm and can be persisted; the analysis method is the business logic code of the algorithm and performs the actual analysis calculation according to the parameter configuration defined by the analysis model. Analysis models cover analysis types such as conversion analysis, multidimensional analysis, and operator analysis.
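The object relationships of FIG. 2 can be sketched as a small object model. This is a minimal illustration only: the class and field names (`AnalysisNode`, `AnalysisConnector`, `Dashboard`, `execution_order`) are hypothetical, since the patent does not publish its data structures. It shows how the analysis paths that record dependencies between nodes yield an order in which the nodes can be computed, using the standard-library `graphlib` module.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class AnalysisNode:
    # Hypothetical node: names the pluggable analysis model it runs.
    node_id: str
    model_type: str


@dataclass
class AnalysisConnector:
    # Analysis path: the target node depends on the source node's result.
    source: str
    target: str


@dataclass
class AnalysisWindow:
    # Displays one node's data; the dashboard owns its layout position.
    node_id: str
    x: int = 0
    y: int = 0


@dataclass
class Dashboard:
    nodes: dict[str, AnalysisNode] = field(default_factory=dict)
    connectors: list[AnalysisConnector] = field(default_factory=list)
    windows: list[AnalysisWindow] = field(default_factory=list)

    def execution_order(self) -> list[str]:
        """Node ids ordered so every node follows the nodes it depends on."""
        sorter = TopologicalSorter({node_id: set() for node_id in self.nodes})
        for c in self.connectors:
            sorter.add(c.target, c.source)  # target depends on source
        return list(sorter.static_order())  # raises CycleError on cyclic paths


board = Dashboard()
for nid, model in [("sales", "multidimensional"), ("funnel", "conversion"), ("fit", "operator")]:
    board.nodes[nid] = AnalysisNode(nid, model)
board.connectors += [AnalysisConnector("sales", "funnel"), AnalysisConnector("funnel", "fit")]
print(board.execution_order())  # ['sales', 'funnel', 'fit']
```

A topological sort is one natural way to turn recorded dependencies into a computation order; cyclic paths surface as an error rather than an infinite loop.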

The data of an analysis object is modified through analysis operations (Analysis Action, extensible) and analysis operation parameters (Analysis Action Arguments, extensible). Every user modification of the analysis board produces an analysis operation, for example moving a chart to new coordinates, changing a chart's display type, or adding a fitting calculation. Notably, every analysis operation is serializable, so it can be saved to a log and the analysis process can be replayed.
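Because every change to an analysis board is an analysis operation and every operation is serializable, the operation log alone is enough to rebuild the board. A minimal sketch, assuming a hypothetical JSON envelope (`type`, `args`) and two illustrative operation types (`move_window`, `set_chart_type`) taken from the examples in the text; the real operation schema is not disclosed.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ActionRecord:
    # Hypothetical serialized form of an analysis operation.
    type: str
    args: dict

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "ActionRecord":
        return ActionRecord(**json.loads(text))


def apply_action(state: dict, action: ActionRecord) -> dict:
    """Return a new board state; the input state is left unchanged."""
    new_state = json.loads(json.dumps(state))  # deep copy via a JSON round-trip
    window = new_state["windows"][action.args["window"]]
    if action.type == "move_window":
        window["x"], window["y"] = action.args["x"], action.args["y"]
    elif action.type == "set_chart_type":
        window["chart"] = action.args["chart"]
    else:
        raise ValueError(f"unknown action type: {action.type}")
    return new_state


def replay(initial: dict, log: list[str]) -> list[dict]:
    """Rebuild every intermediate board state from a JSON operation log."""
    states = [initial]
    for line in log:
        states.append(apply_action(states[-1], ActionRecord.from_json(line)))
    return states


initial = {"windows": {"w1": {"x": 0, "y": 0, "chart": "bar"}}}
log = [
    ActionRecord("move_window", {"window": "w1", "x": 3, "y": 1}).to_json(),
    ActionRecord("set_chart_type", {"window": "w1", "chart": "line"}).to_json(),
]
final = replay(initial, log)[-1]
print(final)  # {'windows': {'w1': {'x': 3, 'y': 1, 'chart': 'line'}}}
```

Keeping every intermediate state, as `replay` does, is what allows the board to be re-rendered step by step.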

Preferably, to improve performance, every analysis object supports two serialization formats: binary serialization based on Protobuf and text serialization based on JSON. Protobuf serialization is compact and fast, which suits cache loading; it is used when analysis objects are transferred between the Leader and the Followers of the server cluster, improving response performance for client requests. JSON serialization is human-readable; it is used when analysis objects are transferred between the front end and the back end, and it is also used for database storage, which makes upgrades easier.

S2. Design and encapsulate a core layer based on a microkernel architecture. The core layer acts as the analysis engine and keeps the data analysis system running stably and efficiently. Its main responsibility is to convert analysis tasks from the business layer into data query tasks and return the data; its work should be as abstract as possible so that it can support the analysis requirements of different business scenarios.

In this embodiment, the design of the core layer follows the microkernel architecture. The essence of a microkernel architecture is to encapsulate the changes caused by extending business functions in plug-ins, so that the system can be extended quickly and flexibly without affecting its overall stability. The functions of the core system are stable and are not continually modified as business functions are extended, while plug-in modules can keep growing with business needs. Although business functions, that is, the data analysis code, change along with business requirements, the microkernel-based core layer designed in this embodiment rarely changes, which makes the whole data analysis system more stable. If a problem appears after a business extension, it is confined to the changed extension module and does not affect other modules, which makes it possible to locate problems quickly while the system is running.
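The microkernel idea can be sketched as a small, stable core that only knows a plug-in contract, while analysis models are registered from outside. A minimal sketch, assuming hypothetical names (`AnalysisPlugin`, `Kernel`, `register`, `run`); it illustrates that adding a model does not touch the kernel code and that a failing plug-in is isolated instead of crashing the core.

```python
from typing import Protocol


class AnalysisPlugin(Protocol):
    # Hypothetical plug-in contract: the only thing the kernel depends on.
    name: str

    def compute(self, rows: list[dict]) -> object: ...


class Kernel:
    """Stable core: loads and runs plug-ins, contains no business logic."""

    def __init__(self) -> None:
        self._plugins: dict[str, AnalysisPlugin] = {}

    def register(self, plugin: AnalysisPlugin) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def run(self, name: str, rows: list[dict]) -> dict:
        try:
            return {"ok": True, "result": self._plugins[name].compute(rows)}
        except Exception as exc:  # confine the fault to the offending plug-in
            return {"ok": False, "error": f"{name}: {exc}"}


class SumPlugin:
    # Business extension written outside the kernel.
    name = "sum"

    def compute(self, rows: list[dict]) -> object:
        return sum(r["amount"] for r in rows)


class BrokenPlugin:
    name = "broken"

    def compute(self, rows: list[dict]) -> object:
        raise RuntimeError("bad formula")


kernel = Kernel()
kernel.register(SumPlugin())
kernel.register(BrokenPlugin())
rows = [{"amount": 5}, {"amount": 7}]
print(kernel.run("sum", rows))     # {'ok': True, 'result': 12}
print(kernel.run("broken", rows))  # {'ok': False, 'error': 'broken: bad formula'}
```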

In the core layer, extension plug-ins are developed on a distributed architecture. According to the CAP theorem, a distributed system can guarantee at most two of consistency, availability, and partition tolerance, but not all three. To ensure that no data is lost during analysis, partition tolerance is required: the data must be redundant and backup copies must be stored. Furthermore, an analysis board is usually used by one user or a few users during analysis, not by all users globally, so strict global consistency is not required. This embodiment therefore chooses AP from the three CAP properties, that is, availability and partition tolerance, to ensure high availability and partition tolerance of the system.

In addition, the core layer in this embodiment uses the Raft algorithm combined with analysis operation logging. When a user initiates an analysis operation, the operation is first serialized into JSON text, forwarded uniformly to the Leader, and then distributed by the Leader to multiple Follower machines, which hold redundant copies that back each other up. When some machines fail, the cluster recovers automatically: if the Leader goes down, the cluster elects a new Leader; if a Follower goes down, its data can be restored from the backups on other Followers and the redundant copies are re-established, ensuring data safety.
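The forwarding path described above (serialize to JSON, send to the Leader, replicate to Followers) can be illustrated with a heavily simplified, in-memory sketch of Raft-style log replication: the Leader appends the entry, copies its log to reachable Followers, and treats entries as committed once a majority of the cluster holds them. Terms, elections, and real network failures are deliberately omitted, so this is not a Raft implementation, and all names (`Node`, `Leader`, `propose`, `recover`) are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    up: bool = True
    log: list[str] = field(default_factory=list)  # replicated JSON entries


@dataclass
class Leader:
    node: Node
    followers: list[Node]
    commit_index: int = 0  # how many log entries are known to be committed

    def _replicate(self) -> bool:
        acks = 1  # the leader's own copy counts
        for follower in self.followers:
            if follower.up:
                # Simplification: ship the whole log instead of incremental appends.
                follower.log = list(self.node.log)
                acks += 1
        if acks > (len(self.followers) + 1) // 2:  # strict majority of the cluster
            self.commit_index = len(self.node.log)
            return True
        return False  # entries stay uncommitted until enough nodes return

    def propose(self, action: dict) -> bool:
        """Append a serialized analysis operation and try to commit it."""
        self.node.log.append(json.dumps(action, sort_keys=True))
        return self._replicate()

    def recover(self, follower: Node) -> None:
        """Re-sync a restarted follower from an up-to-date copy (here, the leader's)."""
        follower.up = True
        self._replicate()


f1, f2 = Node("f1"), Node("f2")
leader = Leader(Node("leader"), [f1, f2])
print(leader.propose({"type": "move_window", "x": 1}))          # True (3 of 3)
f2.up = False
print(leader.propose({"type": "set_chart", "chart": "line"}))   # True (2 of 3)
f1.up = False
print(leader.propose({"type": "add_fit"}))                      # False (1 of 3)
leader.recover(f2)
print(leader.commit_index, len(f2.log))                         # 3 3
```

The majority rule is what lets a three-machine cluster keep committing with one machine down while refusing to commit with two down.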

A query computation layer is also provided in the core layer, part of which uses a serverless architecture. Serverless computing, also known as Function-as-a-Service (FaaS), is a cloud computing model. In this embodiment, AI is introduced into the query computation layer for resource scheduling: first, information about the database, such as data volume and data types, is read to estimate resource consumption; then status information of the machines providing computation, such as CPU, memory, disk, and bandwidth, is collected; finally, the AI selects a suitable machine to perform the computation, so the scheduling policy can be adjusted dynamically. Specifically, the AI balances resource scheduling automatically according to each server's CPU, memory, disk, and bandwidth utilization combined with the resource consumption of program tasks, achieving intelligent load balancing and keeping resource utilization at its optimum.
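The text does not disclose which AI model performs the scheduling, so the sketch below substitutes a plain scoring heuristic to show the data flow only: estimate the task's resource demand from table metadata (row count, column types), read each machine's current utilization, and pick the machine with the most headroom that can absorb the task. The byte widths, the 16 GiB machine size, the weights, and all names (`Machine`, `estimate_cost`, `pick_machine`) are assumptions, not the patented method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    cpu: float        # current utilization, 0.0 to 1.0
    memory: float
    disk: float
    bandwidth: float


# Assumed bytes per value for each column type (illustrative only).
TYPE_WIDTH = {"int": 8, "float": 8, "date": 8, "string": 32}
MACHINE_MEMORY_BYTES = 16 * 2**30  # assumed 16 GiB per worker


def estimate_cost(row_count: int, column_types: list[str]) -> float:
    """Estimate the task's memory demand as a fraction of one machine."""
    row_bytes = sum(TYPE_WIDTH.get(t, 16) for t in column_types)
    return row_count * row_bytes / MACHINE_MEMORY_BYTES


def headroom(m: Machine) -> float:
    # Weighted free capacity; these fixed weights stand in for a learned model.
    return (0.4 * (1 - m.cpu) + 0.3 * (1 - m.memory)
            + 0.15 * (1 - m.disk) + 0.15 * (1 - m.bandwidth))


def pick_machine(machines: list[Machine], cost: float) -> Optional[Machine]:
    """Pick the machine with the most headroom whose free memory fits the task."""
    fits = [m for m in machines if m.memory + cost <= 1.0]
    return max(fits, key=headroom, default=None)


machines = [
    Machine("a", cpu=0.9, memory=0.5, disk=0.2, bandwidth=0.3),
    Machine("b", cpu=0.2, memory=0.6, disk=0.4, bandwidth=0.1),
    Machine("c", cpu=0.1, memory=0.99, disk=0.1, bandwidth=0.1),
]
cost = estimate_cost(10_000_000, ["int", "string", "float"])
print(pick_machine(machines, cost).name)  # b  ("c" is too full to fit the task)
```

A learned model would replace `headroom` (and possibly `estimate_cost`) while the surrounding flow of estimate, observe, and select stays the same.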

To drive the whole system, every local state change is triggered by an event. Each event can be configured either to flow only within the server or to be sent to the front end as well, and some events can be sent directly to the front end. In addition, lifecycle hooks are added to the computation process, for example before and after adding an analysis node, before and after starting a computation, and before and after completing a computation. For example, a data masking (desensitization) plug-in runs on the after-computation hook: when data has been computed and is about to be sent to the front end, and masking is required, the plug-in first intercepts the event, then masks the data, and finally an event notifies the front end to fetch the data and render the interface.
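The event flow and lifecycle hooks can be sketched with a minimal in-process event bus. The sketch assumes a hypothetical hook name (`after_compute`), a per-emit `to_frontend` flag, and a phone-number masking rule: the masking plug-in mounts on the after-compute hook, rewrites the result before it leaves the server, and the bus then queues a front-end event telling the client to fetch the data.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], dict]


class EventBus:
    """Kernel-side event bus with ordered lifecycle hooks."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[Handler]] = defaultdict(list)
        self.frontend_outbox: list[dict] = []  # events forwarded to the client

    def on(self, hook: str, handler: Handler) -> None:
        self._hooks[hook].append(handler)

    def emit(self, hook: str, payload: dict, to_frontend: bool = False) -> dict:
        # Each handler may transform the payload before the next one runs.
        for handler in self._hooks[hook]:
            payload = handler(payload)
        if to_frontend:
            self.frontend_outbox.append({"event": hook, "node": payload["node"]})
        return payload


def mask_phone_numbers(payload: dict) -> dict:
    # Hypothetical desensitization rule: keep only the last four digits.
    rows = [{**r, "phone": "*" * 7 + r["phone"][-4:]} for r in payload["rows"]]
    return {**payload, "rows": rows}


bus = EventBus()
bus.on("after_compute", mask_phone_numbers)  # the plug-in mounts on the hook
result = bus.emit(
    "after_compute",
    {"node": "n1", "rows": [{"name": "A", "phone": "13800001234"}]},
    to_frontend=True,
)
print(result["rows"])       # [{'name': 'A', 'phone': '*******1234'}]
print(bus.frontend_outbox)  # [{'event': 'after_compute', 'node': 'n1'}]
```

Because the plug-in only registers a handler, the kernel code that runs the computation and emits the hook does not change when masking is added or removed.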

S3. Define extensible interfaces. Based on the analysis capabilities provided by the core layer, each business function of the extension layer (also called the interface layer) is exposed in a standard interface form. The functions supported by the extension layer include custom extension of the data platform, extension of analysis models, extension of analysis algorithms, extension of analysis operations, extension of graphics rendering, and so on.

Support for extension development is essential to a microkernel architecture, so this embodiment supports extension development of analysis models, plug-ins, analysis operations, and API interfaces, as follows:

Extending analysis models. Developing an analysis model involves two parts: the model parameter configuration (Analysis Entity) and the model algorithm implementation (Analysis Method). The model is split in two because in traditional object-oriented programming an object's attributes and methods are kept together, which seems fine, but once there are many attributes and much business logic the object becomes too bulky to maintain; composition is therefore used instead. The model parameters must be persisted while the algorithm's logic code does not, and separating the parameter configuration from the algorithm makes it easier to see which data is stored and which data needs attention in later upgrades, improving maintainability and reducing elementary mistakes.
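The split between a persisted parameter configuration (Analysis Entity) and the algorithm code (Analysis Method) can be sketched as two classes combined by composition. The names (`MovingAverageEntity`, `MovingAverageMethod`, `window`) and the choice of a moving average as the algorithm are illustrative assumptions; the point is that only the entity is written to storage, so an upgrade only has to care about its fields.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MovingAverageEntity:
    # Persisted part: plain parameters, serializable to JSON.
    column: str
    window: int


class MovingAverageMethod:
    # Logic part: never persisted; reads its settings from the entity.
    def __init__(self, entity: MovingAverageEntity) -> None:
        self.entity = entity

    def run(self, rows: list[dict]) -> list[float]:
        values = [float(r[self.entity.column]) for r in rows]
        w = self.entity.window
        return [sum(values[i - w + 1:i + 1]) / w for i in range(w - 1, len(values))]


entity = MovingAverageEntity(column="sales", window=3)
stored = json.dumps(asdict(entity))  # only this string goes to the database
restored = MovingAverageEntity(**json.loads(stored))
rows = [{"sales": v} for v in (3, 6, 9, 12)]
print(MovingAverageMethod(restored).run(rows))  # [6.0, 9.0]
```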

Extending analysis plug-ins. Besides the plug-ins already built into the kernel, new plug-ins can be developed to enhance analysis functions. Plug-ins run in three ways: 1) interface calls; 2) lifecycle hook calls; 3) analysis event triggers.

Extending analysis operations. An analysis operation is a modification action executed on the analysis board; all operation classes inherit from the Analysis Action class so that every change can be recorded. Analysis operations have a mandatory requirement: each forward operation must come with a corresponding inverse operation, so that any operation can be undone. Because analysis objects serialize to JSON, as described above, the kernel's test suite also checks extension-developed analysis operations to determine whether they meet this requirement.
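The requirement that every analysis operation ship with its inverse, and a kernel check for it, can be sketched as follows. Assumptions: operations mutate a JSON-serializable board state, `SetChartType` is a hypothetical example operation, and the check compares JSON snapshots taken before `apply` and after `apply` followed by `invert`. The same inverses then give undo and redo for free.

```python
import copy
import json
from abc import ABC, abstractmethod
from typing import Optional


class AnalysisAction(ABC):
    """Base class every extension operation must inherit from."""

    @abstractmethod
    def apply(self, state: dict) -> None: ...

    @abstractmethod
    def invert(self, state: dict) -> None: ...


class SetChartType(AnalysisAction):
    def __init__(self, window: str, chart: str) -> None:
        self.window, self.chart = window, chart
        self.previous: Optional[str] = None  # captured on apply, used by invert

    def apply(self, state: dict) -> None:
        self.previous = state[self.window]["chart"]
        state[self.window]["chart"] = self.chart

    def invert(self, state: dict) -> None:
        state[self.window]["chart"] = self.previous


def is_reversible(action: AnalysisAction, state: dict) -> bool:
    """Kernel-style check: apply then invert must restore the JSON snapshot."""
    before = json.dumps(state, sort_keys=True)
    trial = copy.deepcopy(state)
    action.apply(trial)
    action.invert(trial)
    return json.dumps(trial, sort_keys=True) == before


class History:
    """Undo/redo stacks built on the mandatory inverse operations."""

    def __init__(self, state: dict) -> None:
        self.state = state
        self.done: list[AnalysisAction] = []
        self.undone: list[AnalysisAction] = []

    def execute(self, action: AnalysisAction) -> None:
        action.apply(self.state)
        self.done.append(action)
        self.undone.clear()

    def undo(self) -> None:
        action = self.done.pop()
        action.invert(self.state)
        self.undone.append(action)

    def redo(self) -> None:
        action = self.undone.pop()
        action.apply(self.state)
        self.done.append(action)


state = {"w1": {"chart": "bar"}}
action = SetChartType("w1", "line")
print(is_reversible(action, state))  # True
history = History(state)
history.execute(action)
history.undo()
print(state["w1"]["chart"])  # bar
history.redo()
print(state["w1"]["chart"])  # line
```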

Extending API interfaces. API interfaces are of two types, GraphQL and RESTful, through which any required data can be retrieved. Notably, both types are read-only, idempotent, and free of side effects. If interface code attempts to modify analysis board data, the kernel detects the illegal modification, so interface code cannot modify any data that needs to be persisted; any modification must be implemented as an analysis operation.
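One way to keep read-only query code from changing persisted board data is to hand it an immutable view of the state. A minimal sketch, assuming a hypothetical `frozen_view` helper and example query functions; it uses Python's `types.MappingProxyType` and tuples so that any write attempt raises `TypeError`, which a kernel could report as an illegal modification. This guards against accidental writes; it is not a security boundary.

```python
from types import MappingProxyType
from typing import Any


def frozen_view(value: Any) -> Any:
    """Recursively wrap dicts and lists so query code cannot mutate them."""
    if isinstance(value, dict):
        return MappingProxyType({k: frozen_view(v) for k, v in value.items()})
    if isinstance(value, list):
        return tuple(frozen_view(v) for v in value)
    return value


def get_window_titles(board: MappingProxyType) -> list[str]:
    # A well-behaved, side-effect-free query: it only reads.
    return [w["title"] for w in board["windows"]]


def rename_first_window(board: MappingProxyType) -> None:
    # An illegal query that tries to write; the view rejects it.
    board["windows"][0]["title"] = "changed"


state = {"windows": [{"title": "Sales"}, {"title": "Funnel"}]}
view = frozen_view(state)
print(get_window_titles(view))  # ['Sales', 'Funnel']
try:
    rename_first_window(view)
except TypeError as exc:
    print("illegal modification rejected:", type(exc).__name__)
```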

S4. Implement the analysis objects defined in step S1, and the relationships among them, as a business layer in the form of plug-ins; develop the corresponding analysis models in the analysis objects according to the actual business calculation logic; the core layer validates and manages them and drives the plug-ins through events. For example, when a condition is updated, the intelligent title plug-in is triggered automatically and generates a matching title from the condition; if the connections between analysis nodes are not locked, the plug-in cascades the title update to each downstream analysis node and notifies the front end through an event to update the interface in real time.
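The intelligent-title behavior can be sketched as an event handler that applies the updated condition to a node, regenerates its title, and then cascades breadth-first along unlocked connections to downstream nodes, emitting one front-end event per retitled node. The title format, the `locked` flag, the assumption that downstream nodes take on the same condition, and all names are illustrative choices, not details from the patent.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    conditions: dict[str, str] = field(default_factory=dict)
    title: str = ""


@dataclass
class Connection:
    source: str
    target: str
    locked: bool = False  # a locked link stops the cascade


def make_title(node: Node) -> str:
    # Assumed title format: "<node name> (<key>=<value>, ...)".
    if not node.conditions:
        return node.name
    parts = ", ".join(f"{k}={v}" for k, v in sorted(node.conditions.items()))
    return f"{node.name} ({parts})"


def on_condition_updated(nodes: dict[str, Node], links: list[Connection],
                         start: str, conditions: dict[str, str],
                         events: list[dict]) -> None:
    """Retitle `start`, then cascade downstream over unlocked links (BFS)."""
    queue, seen = deque([start]), set()
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        node = nodes[name]
        node.conditions.update(conditions)
        node.title = make_title(node)
        # Event telling the front end to refresh this node's title.
        events.append({"event": "title_changed", "node": name, "title": node.title})
        queue.extend(link.target for link in links
                     if link.source == name and not link.locked)


nodes = {n: Node(n) for n in ("orders", "funnel", "forecast")}
links = [Connection("orders", "funnel"), Connection("funnel", "forecast", locked=True)]
events: list[dict] = []
on_condition_updated(nodes, links, "orders", {"region": "East"}, events)
print([e["title"] for e in events])  # ['orders (region=East)', 'funnel (region=East)']
```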

In this embodiment, the application architecture of the microkernel system has three layers: an interface layer, a business layer, and a core layer, as shown in FIG. 3. The interface layer interacts with the client; the business layer contains the specific analysis business logic, including the various analysis models, analysis operations, and business extension plug-ins; the core layer is the foundation on which the whole system runs and handles general functions unrelated to specific business, such as module loading, event flow control, execution scheduling, and operation history management.

As shown in FIG. 4, when a user initiates an analysis operation, the core layer handles the interface call for the operation in the following steps: (1) the user (the front end) selects an analysis model and parameters as required; (2) the analysis model and its parameters are packaged into an analysis operation and sent to the server (the back end); (3) the Master node schedules the operation, sending it to the corresponding analysis engine, which applies the operation's modification to the analysis board.

If data computation is involved during Master node scheduling, it is moved to the background: the current interface request immediately returns a "computing in background" status, and the computation is packaged and sent to the calculation scheduling center, which uses AI to predict in real time the fastest machine for the computation. Finally, the result is returned to the analysis engine as an event; the analysis engine performs follow-up processing (for example, serializing the result into binary in advance, which speeds up data retrieval by the front end, shortens each individual HTTP request, and improves the browser's concurrency), then notifies the front end through an event to fetch the result and render the interface.
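The separation between accepting an analysis operation and executing its computation can be sketched with a thread pool: the request handler returns a "computing" status immediately, and a completion callback later stores the result and queues an event telling the front end to fetch it. The names (`submit_action`, `status`, `results`, `events`) and the use of `concurrent.futures` as a stand-in for the calculation scheduling center are assumptions.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for the calculation scheduling center and its worker machines.
executor = ThreadPoolExecutor(max_workers=2)
lock = threading.Lock()
status: dict[str, str] = {}
results: dict[str, int] = {}
events: list[dict] = []  # events that would be pushed to the front end


def heavy_query(n: int) -> int:
    time.sleep(0.1)  # simulated slow database computation
    return sum(range(n))


def submit_action(action_id: str, n: int) -> dict:
    """Accept the operation and reply at once; the computation runs elsewhere."""
    with lock:
        status[action_id] = "computing"

    def on_done(future: Future) -> None:
        # Runs when the background computation finishes.
        with lock:
            results[action_id] = future.result()
            status[action_id] = "done"
            events.append({"event": "result_ready", "action": action_id})

    executor.submit(heavy_query, n).add_done_callback(on_done)
    return {"action": action_id, "status": "computing"}


reply = submit_action("a1", 1000)
print(reply)  # {'action': 'a1', 'status': 'computing'}
executor.shutdown(wait=True)  # a real server would keep the pool running
print(status["a1"], results["a1"])  # done 499500
```

The reply time of `submit_action` no longer depends on how long the query takes, which is the stability benefit described for decoupling requests from execution.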

Example 2

Based on the same inventive concept as Example 1, this example provides an extensible self-service data analysis system, which specifically includes the following:

The underlying structure definition module abstracts the entity objects from the overall business, and the defined analysis objects include: the analysis board, analysis graph, analysis window, analysis node, analysis path, analysis model, and analysis method. The analysis board manages the analysis windows and lays out their positions; the analysis path records the computation order and dependency relationships between analysis nodes; the analysis window displays the data and chart configuration provided by its analysis node; each analysis node carries an extensible analysis model and analysis method. The analysis model is the parameter configuration of an analysis algorithm and can be persisted; the analysis method is the business logic code of the algorithm and performs the actual analysis calculation according to the parameter configuration defined by the analysis model;

the data of an analysis object is modified through analysis operations and analysis operation parameters; every modification of the analysis board produces one analysis operation, and every analysis operation is serializable. When an analysis operation is initiated, the core layer handles the interface call for the operation in the following steps: the front end selects an analysis model and parameters as required; the analysis model and its parameters are packaged into an analysis operation and sent to the server; the Master node schedules the operation, sending it to the corresponding analysis engine, which applies the operation's modification to the analysis board.

The modules in this example execute the corresponding steps of Example 1, which are not described again here.

The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims

1. An extensible self-service data analysis method, characterized by comprising the following steps:

S4. implementing the analysis objects defined in step S1, and the relationships among them, as a business layer in the form of plug-ins; developing the corresponding analysis models in the analysis objects according to actual business calculation logic, the core layer performing validation and management and driving the plug-ins through events;

wherein, in step S1, entity objects are abstracted from the overall business, and the defined analysis objects include: the analysis board, analysis graph, analysis window, analysis node, analysis path, analysis model, and analysis method; the analysis board manages the analysis windows and lays out their positions; the analysis path records the computation order and dependency relationships between analysis nodes; the analysis window displays the data and chart configuration provided by its analysis node; each analysis node carries an extensible analysis model and analysis method; the analysis model is the parameter configuration of an analysis algorithm and can be persisted; the analysis method is the business logic code of the algorithm and performs the actual analysis calculation according to the parameter configuration defined by the analysis model;

the data of an analysis object is modified through analysis operations and analysis operation parameters; every modification of the analysis board produces one analysis operation, and every analysis operation is serializable.

2. The self-service data analysis method according to claim 1, wherein every analysis object supports two serialization methods: Protobuf-based binary serialization and JSON-based serialization.

3. The self-service data analysis method according to claim 2, wherein the core layer of step S2 uses the Raft algorithm; when an analysis operation is initiated, the operation is first serialized into JSON text, forwarded uniformly to the Leader, and then distributed by the Leader to the Follower machines.

4. The self-service data analysis method according to claim 1, wherein, when an analysis operation is initiated, the core layer handles the interface call for the analysis operation in the following steps:

the front end selects an analysis model and parameters as required;

the analysis model and its parameters are packaged into an analysis operation and sent to the server;

the Master node schedules the analysis operation, sending it to the corresponding analysis engine, which applies the operation's modification to the analysis board.

5. The self-service data analysis method according to claim 1, wherein the core layer of step S2 further comprises a query computation layer into which AI is introduced for resource scheduling: information about the database is read first, status information of the machines providing computation is then collected, and the AI selects a suitable machine for the computation, dynamically adjusting the scheduling policy.

6. The self-service data analysis method according to claim 1, wherein the functions supported by the expansion layer in step S3 include custom expansion of the data platform, expansion of the analysis model, expansion of the analysis algorithm, expansion of the analysis operation, and expansion of the graphics rendering.

7. An extensible self-service data analysis system, comprising:

a business layer setting module, configured to implement the analysis objects defined by the underlying structure definition module, and the relationships among them, as a business layer in the form of plug-ins; the corresponding analysis models in the analysis objects are developed according to actual business calculation logic, validated and managed by the core layer, and driven by the core layer through events;

8. The self-service data analysis system of claim 7, wherein,

when an analysis operation is initiated, the core layer handles the interface call for the analysis operation in the following steps: the front end selects an analysis model and parameters as required; the analysis model and its parameters are packaged into an analysis operation and sent to the server; the Master node schedules the analysis operation, sending it to the corresponding analysis engine, which applies the operation's modification to the analysis board.

9. The self-service data analysis system of claim 7, wherein the core layer uses the Raft algorithm combined with analysis operation logging; when an analysis operation is initiated, it is first serialized into JSON text, forwarded uniformly to the Leader, and then distributed by the Leader to the Follower machines.

10. The self-service data analysis system according to claim 7, wherein the core layer is further provided with a query computation layer into which AI is introduced for resource scheduling: information about the database is read first, status information of the machines providing computation is then collected, and the AI selects a suitable machine for the computation, so that the scheduling policy is adjusted dynamically.