CN117931776B

CN117931776B - Big data storage analysis system platform and method based on virtualization technology

Info

Publication number: CN117931776B
Application number: CN202410324015.7A
Authority: CN
Inventors: 邓练兵; 巩志国; 官全龙
Original assignee: Guangdong Qinzhi Technology Research Institute Co ltd
Current assignee: Guangdong Qinzhi Technology Research Institute Co ltd
Priority date: 2024-03-21
Filing date: 2024-03-21
Publication date: 2024-06-07
Anticipated expiration: 2044-03-21
Also published as: CN117931776A

Abstract

The invention discloses a big data storage analysis system platform and method based on virtualization technology. The platform comprises a front-end processor, a service node and a metadata service node. The front-end processor comprises a data source service and a data synchronization module, which are used to collect data resources from the databases of the business systems. The service node comprises a physical controller manager, a virtual database manager, a virtual table manager, a distributed query engine, an interface container, a database management service and a data service; the physical controller manager comprises a plurality of physical controllers, which collect data resources from the front-end processor, manage and query them, and then transmit the data to the metadata service node. The metadata service node comprises a metadata storage unit, a system management service, a system monitoring service, a data directory service, a user log service and a system interface service; the metadata storage unit stores the data uploaded by the service node and provides it to each service.

Description

Big data storage analysis system platform and method based on virtualization technology

Technical Field

The invention relates to the technical field of big data storage analysis, in particular to a big data storage analysis system platform and method based on a virtualization technology.

Background

For enterprises, using big data technology to build a comprehensive data storage and analysis platform helps managers better grasp the actual situation of the enterprise, effectively control operating costs, improve economic returns, raise the overall level of management, and promote enterprise development. Big data technology should therefore be applied sensibly at the present stage, and a big data storage and analysis platform should be built around the enterprise's operating model and development status, so that more accurate data can support the enterprise's development.

Big data technology mainly refers to a modern technology that combines predictive analysis, statistical analysis, data analysis, artificial intelligence, computer technology, language processing and similar techniques. A big data technology system covers five aspects: infrastructure, data acquisition and basic processing, data storage, data computation and analysis, and data presentation. Building a comprehensive data analysis platform for an enterprise with big data technology lays a solid foundation for its development and makes its operating projects and management work more visible.

With rapid economic development, enterprises are undergoing deep innovation and reform, and the volume and variety of enterprise-related information keep growing. In practice, however, limitations in management thinking mean that enterprise data cannot be comprehensively supervised and managed, and there is no scientific verification mechanism, so information is hard to share and exchange quickly. At the same time, much information remains isolated within enterprises, which makes it difficult for data analysis to reach into actual operations. In addition, because enterprises lack an effective big data storage and analysis platform, much information is underused and much data fails to deliver its value.

As enterprises continue to develop, higher demands are placed on data processing. Enterprise data volumes are very large and frequently suffer from missing, inconsistent, abnormal and duplicate data, which further increases the difficulty of data processing and hinders enterprise data management.

How to use big data technology sensibly to build a more modern and complete comprehensive data storage and analysis platform is therefore particularly important for making effective use of enterprise data.

Disclosure of Invention

To overcome the shortcomings of the prior art, the invention provides a big data storage analysis system platform and method based on virtualization technology, which solve the technical problem that enterprises lack an effective big data storage and analysis platform, so that enterprise information is fully utilized and the value of the data is fully realized.

To solve the above problems, the invention adopts the following technical solution:

A big data storage analysis system platform based on virtualization technology, comprising:

The front-end processor, used to collect large-scale, physically distributed, heterogeneous data resources from the database of each business system;

The service node, used to collect the data resources from the front-end processor, manage and query them, and transmit data to the metadata service node;

The metadata service node, used to store the data uploaded by the service node and provide it to each service;

The service node uses virtualization technology to interconnect large-scale, physically distributed, heterogeneous data resources into a virtual data resource center, provides unified data standards and access interfaces, supports transparent access by the metadata service node to the data sources, and enables data resources to be shared and managed uniformly across multiple independent business systems;

The service node completes one-click production of database instances using a distributed remote creation technique that covers the database instance life cycle;

The metadata service node converts data into visual charts using an automatic adaptation technique based on the principal-component visual views a user employs, and the charts directly present the information hidden in the data;

The virtualization technologies adopted by the service node comprise: data virtualization technology, heterogeneous data source virtual access technology, and database virtualization microkernel data source registration and encapsulation technology;

The data virtualization technology comprises: the service node uses the EVP model to provide a unified data view layer, accesses multi-source, distributed, heterogeneous data resources through a unified access interface, integrates the heterogeneous data resources, and shields the physical details of underlying data access.
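The unified data view can be illustrated with a minimal Python sketch. It assumes two kinds of physical source (an SQLite table and a CSV file) behind one hypothetical `DataSource` interface; `UnifiedView`, `register` and the column-mapping dictionary are illustrative names, not an interface defined by the patent. Renaming physical columns to standard names stands in for shielding differences in data structure; type normalization is deliberately left out.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, object]


class DataSource(ABC):
    """Hypothetical uniform access interface implemented by every physical source."""

    @abstractmethod
    def rows(self, table: str) -> Iterator[Row]:
        """Yield the rows of one physical table as column-name -> value dicts."""


class SQLiteSource(DataSource):
    """Relational source; the table name is assumed to come from trusted configuration."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def rows(self, table: str) -> Iterator[Row]:
        cur = self.conn.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        for rec in cur:
            yield dict(zip(cols, rec))


class CSVSource(DataSource):
    """File source: each 'table' is the text of a CSV file with a header row."""

    def __init__(self, files: Dict[str, str]) -> None:
        self.files = files

    def rows(self, table: str) -> Iterator[Row]:
        yield from csv.DictReader(io.StringIO(self.files[table]))


class UnifiedView:
    """Virtual tables backed by physical tables with different locations, types and schemas."""

    def __init__(self) -> None:
        self._bindings: Dict[str, List[Tuple[DataSource, str, Dict[str, str]]]] = {}

    def register(self, virtual: str, source: DataSource, physical: str,
                 columns: Dict[str, str]) -> None:
        # columns maps physical column names to the standard (virtual) column names
        self._bindings.setdefault(virtual, []).append((source, physical, columns))

    def query(self, virtual: str) -> List[Row]:
        result: List[Row] = []
        for source, physical, columns in self._bindings[virtual]:
            for row in source.rows(physical):
                result.append({columns[k]: v for k, v in row.items() if k in columns})
        return result
```

A caller only ever names the virtual table (for example `view.query("customer")`); where each row physically lives, and what its columns were originally called, stays hidden behind `register`.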

As a preferred embodiment of the present invention, the front-end processor includes a data source service and a data synchronization module, and uses the data source service and the data synchronization module to collect data resources;

the service node comprises: physical controller manager, virtual database management, virtual table management, distributed query engine, interface container, database management service, and data service;

The physical controller manager includes a plurality of physical controllers, which collect the data resources from the front-end processor, manage and query them through virtual database management, virtual table management and the distributed query engine, and then transmit the data to the metadata service node through the interface container based on the database management service and the data service;

The metadata service node comprises a metadata storage unit, a system management service, a system monitoring service, a data directory service, a user log service and a system interface service; the metadata storage unit stores the data uploaded by the service node and provides it to the system management service, the system monitoring service, the data directory service, the user log service and the system interface service.

As a preferred embodiment of the present invention, the physical details include differences among the data sources in physical location, specific type and data structure;

the EVP model includes: a physical resource layer, a virtual resource layer, and an effective resource layer;

The physical resource layer comprises a plurality of nodes, and each node is used for deploying a plurality of physical resources; the virtual resource layer comprises a plurality of virtual organizations, each virtual organization is provided with a plurality of virtual resources, and the plurality of virtual resources form a virtual resource space of each virtual organization; the effective resource layer comprises a plurality of applications, each application uses a plurality of effective resources, and the plurality of effective resources form an effective resource space of each application;

If n nodes exist in the physical resource layer, the application in the effective resource layer can access the physical resources from node 1 to node n through a virtual organization in the virtual resource layer;

The physical resources include: hard disk storage resources, computing resources, and software resources.
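One way to picture the three EVP layers is the sketch below. It assumes that a virtual organization's virtual resource space is simply a mapping from virtual resource names to (node, physical resource) pairs, and that an application's effective resource space is the list of virtual resources it uses. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Physical resource layer: one node and the physical resources deployed on it."""
    node_id: int
    physical: Dict[str, str]  # resource name -> kind: "storage", "compute" or "software"


@dataclass
class VirtualOrganization:
    """Virtual resource layer: the organization's virtual resource space."""
    name: str
    space: Dict[str, Tuple[int, str]] = field(default_factory=dict)  # virtual name -> (node_id, physical name)


@dataclass
class Application:
    """Effective resource layer: the virtual resources this application actually uses."""
    name: str
    vo: VirtualOrganization
    effective: List[str] = field(default_factory=list)


class EVPModel:
    def __init__(self, nodes: List[Node]) -> None:
        self.nodes = {n.node_id: n for n in nodes}

    def bind(self, vo: VirtualOrganization, virtual: str, node_id: int, physical: str) -> None:
        """Add a virtual resource to a VO, backed by a physical resource on node node_id."""
        if physical not in self.nodes[node_id].physical:
            raise KeyError(f"node {node_id} has no physical resource {physical!r}")
        vo.space[virtual] = (node_id, physical)

    def resolve(self, app: Application) -> List[Tuple[int, str, str]]:
        """Map each effective resource of an application to (node_id, physical name, kind)."""
        resolved = []
        for virtual in app.effective:
            node_id, physical = app.vo.space[virtual]
            resolved.append((node_id, physical, self.nodes[node_id].physical[physical]))
        return resolved
```

The application never addresses a node directly; it reaches nodes 1 to n only through the virtual organization's bindings, which is the indirection the EVP model describes.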

As a preferred embodiment of the present invention, the virtual resource layer can create one virtual database on a physical database, or several virtual databases on the same physical database, and virtualizes the data through these virtual databases;

The virtual databases reside in a data resource pool built by a big data engine; a data isolation mechanism isolates the virtual databases from the physical databases, and a data aggregation mechanism aggregates data from the physical databases into one virtual database;
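The isolation and aggregation mechanisms might look roughly like the following Python sketch, which uses in-memory SQLite databases as stand-ins for physical databases. As an assumption, isolation is modeled by letting each virtual database see only the tables and tenant rows explicitly bound to it, and aggregation by binding one virtual table to tables in several physical databases. `VirtualDatabase`, `expose` and the tenant column are illustrative names.

```python
import sqlite3
from typing import Dict, List, Optional, Tuple

Binding = Tuple[sqlite3.Connection, str, Optional[str], Optional[str]]


class VirtualDatabase:
    """A virtual database that only sees the physical tables (and rows) explicitly bound to it."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._bindings: Dict[str, List[Binding]] = {}

    def expose(self, vtable: str, conn: sqlite3.Connection, ptable: str,
               tenant_col: Optional[str] = None, tenant: Optional[str] = None) -> None:
        # Binding one vtable to several physical databases aggregates them;
        # a tenant filter isolates virtual databases that share one physical table.
        self._bindings.setdefault(vtable, []).append((conn, ptable, tenant_col, tenant))

    def select(self, vtable: str) -> List[tuple]:
        if vtable not in self._bindings:
            raise PermissionError(f"{vtable!r} is not visible in virtual database {self.name!r}")
        rows: List[tuple] = []
        for conn, ptable, tenant_col, tenant in self._bindings[vtable]:
            if tenant_col is None:
                cur = conn.execute(f"SELECT * FROM {ptable}")
            else:
                cur = conn.execute(f"SELECT * FROM {ptable} WHERE {tenant_col} = ?", (tenant,))
            rows.extend(cur.fetchall())
        return rows
```

Table and column names are interpolated into SQL here, so the sketch assumes they come from administrator configuration rather than user input; the tenant value itself is passed as a bound parameter.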

Using data multiplexing technology, the virtual database preloads frequently accessed data from the physical database into a buffer; requested data is read from the buffer, and if it is not there it is fetched from the physical database; data that has not been accessed for a long time is swapped out to the physical database.
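The data multiplexing behaviour (preload hot data, read through on a miss, swap idle data back out) resembles a read-through, write-back cache. The sketch assumes the physical database can be modeled as a dictionary, that "high access frequency" means the highest access counts, and that "not accessed for a long time" means idle for longer than a fixed threshold. The clock is injectable so the behaviour can be tested deterministically; all names are hypothetical.

```python
import time
from typing import Callable, Dict, Hashable, List


class MultiplexBuffer:
    """Read-through, write-back buffer in front of a physical store (modeled as a dict)."""

    def __init__(self, physical: Dict[Hashable, object], idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.physical = physical
        self.idle_seconds = idle_seconds
        self.clock = clock
        self.buffer: Dict[Hashable, object] = {}
        self.last_access: Dict[Hashable, float] = {}
        self.hits = 0
        self.misses = 0

    def preload(self, access_counts: Dict[Hashable, int], top_k: int) -> None:
        """Copy the top_k most frequently accessed keys into the buffer ahead of time."""
        now = self.clock()
        for key in sorted(access_counts, key=access_counts.get, reverse=True)[:top_k]:
            if key in self.physical:
                self.buffer[key] = self.physical[key]
                self.last_access[key] = now

    def get(self, key: Hashable) -> object:
        """Serve from the buffer; on a miss, fetch from the physical store and keep a copy."""
        if key in self.buffer:
            self.hits += 1
        else:
            self.misses += 1
            self.buffer[key] = self.physical[key]  # raises KeyError if absent everywhere
        self.last_access[key] = self.clock()
        return self.buffer[key]

    def put(self, key: Hashable, value: object) -> None:
        """Write into the buffer; the value reaches the physical store when swapped out."""
        self.buffer[key] = value
        self.last_access[key] = self.clock()

    def swap_out_idle(self) -> List[Hashable]:
        """Write back and evict entries not accessed for longer than idle_seconds."""
        now = self.clock()
        idle = [k for k, t in self.last_access.items() if now - t > self.idle_seconds]
        for key in idle:
            self.physical[key] = self.buffer.pop(key)
            del self.last_access[key]
        return idle
```

Writing back on swap-out (rather than on every `put`) is one possible design choice; it reduces physical writes at the cost of a window in which the buffer holds the only up-to-date copy.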

As a preferred embodiment of the present invention, when the service node adopts the heterogeneous data source virtual access technology, the service node includes:

The service node uses a client, front end and server data control processing architecture together with process-oriented, component-based functional components to provide component-level access control management of heterogeneous database systems based on users, roles and permissions. Through the platform, resource and channel workflows it implements heterogeneous data source registration, virtual database object registration and virtual resource access, and completes shared access management of the heterogeneous databases; it hides database access details and the multiple sources of the heterogeneous systems, provides unified data standards and access interfaces, and supports transparent access to the data sources;

The client comprises a heterogeneous data source management component, which manages heterogeneous data from enterprises A, B, C, D and E and grants the front end access to it;

The front end comprises a DB synchronization management component, which manages MySQL, Oracle, DB and file sources and provides data access for the server side;

The server side comprises a data preprocessing component, a data persistence storage component and a unified data access management component;

The data preprocessing component performs data extraction, cleaning, conversion, loading and auditing on the data obtained from the DB synchronization management component and outputs the result to the data persistence storage component;

The data persistence storage component provides relational data storage, file directory services, cluster nodes and parallel computation for the data output by the data preprocessing component and outputs the result to the unified data access management component;
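The five preprocessing steps can be chained as a small pipeline. This is a sketch under assumptions that do not come from the source: cleaning is reduced to dropping rows with missing required fields and exact duplicates, conversion to per-column type casts, loading to appending into a list standing in for persistent storage, and auditing to checking that every extracted row was either loaded or rejected.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def _identity(value: object) -> object:
    return value


def extract(source: Iterable[Row]) -> List[Row]:
    """Copy rows so later stages never mutate the synchronized source data."""
    return [dict(r) for r in source]


def clean(rows: List[Row], required: List[str]) -> Tuple[List[Row], List[Row]]:
    """Reject rows with missing required fields and exact duplicates."""
    kept: List[Row] = []
    rejected: List[Row] = []
    seen = set()
    for r in rows:
        fingerprint = tuple(sorted((k, str(v)) for k, v in r.items()))
        if any(r.get(f) in (None, "") for f in required) or fingerprint in seen:
            rejected.append(r)
            continue
        seen.add(fingerprint)
        kept.append(r)
    return kept, rejected


def convert(rows: List[Row], casts: Dict[str, Callable[[object], object]]) -> Tuple[List[Row], List[Row]]:
    """Apply per-column type conversions; rows with abnormal values are rejected."""
    converted: List[Row] = []
    rejected: List[Row] = []
    for r in rows:
        try:
            converted.append({k: casts.get(k, _identity)(v) for k, v in r.items()})
        except (TypeError, ValueError):
            rejected.append(r)
    return converted, rejected


def preprocess(source: Iterable[Row], required: List[str],
               casts: Dict[str, Callable[[object], object]], store: List[Row]) -> Dict[str, int]:
    """Extract, clean, convert, load, then audit; returns the audit report."""
    rows = extract(source)
    cleaned, dropped = clean(rows, required)
    converted, bad = convert(cleaned, casts)
    store.extend(converted)  # load into the stand-in persistence store
    report = {"extracted": len(rows), "loaded": len(converted), "rejected": len(dropped) + len(bad)}
    if report["extracted"] != report["loaded"] + report["rejected"]:  # audit: nothing silently lost
        raise RuntimeError(f"audit failed: {report}")
    return report
```

Keeping rejected rows (instead of discarding them) lets the audit step account for every extracted record, which is how missing, duplicate and abnormal data would surface in a report.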

The unified data access management component comprises: a system component and a business component; the system component is used for managing users, roles, domains and rights; the business component is used for managing a data platform, data resources, data channels, virtual data access and data analysis;

the functional component comprises: the heterogeneous data source management component, the DB synchronization management component, the data preprocessing component, the data persistence storage component, the unified data access management component, the system component, and the business component.
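The user, role and permission control handled by the system component can be sketched as plain role-based access control. Here a permission is assumed to be a (domain, resource, action) triple with `*` as a resource wildcard, and `fetch` stands for any virtual-resource read; the class and its methods are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple, TypeVar

T = TypeVar("T")
Permission = Tuple[str, str, str]  # (domain, resource, action); resource "*" matches any resource


@dataclass
class AccessControl:
    user_roles: Dict[str, Set[str]] = field(default_factory=dict)
    role_perms: Dict[str, Set[Permission]] = field(default_factory=dict)

    def assign(self, user: str, role: str) -> None:
        self.user_roles.setdefault(user, set()).add(role)

    def grant(self, role: str, domain: str, resource: str, action: str) -> None:
        self.role_perms.setdefault(role, set()).add((domain, resource, action))

    def allowed(self, user: str, domain: str, resource: str, action: str) -> bool:
        """A user is allowed if any of their roles holds an exact or wildcard permission."""
        for role in self.user_roles.get(user, set()):
            perms = self.role_perms.get(role, set())
            if (domain, resource, action) in perms or (domain, "*", action) in perms:
                return True
        return False

    def access(self, user: str, domain: str, resource: str, action: str, fetch: Callable[[], T]) -> T:
        """Run fetch() (for example a virtual-table read) only if the user's roles permit it."""
        if not self.allowed(user, domain, resource, action):
            raise PermissionError(f"{user} may not {action} {domain}/{resource}")
        return fetch()
```

Treating each enterprise as a domain lets one role be broad in one enterprise's data and narrow in another's, which is one plausible reading of managing "users, roles, domains and rights" together.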

As a preferred embodiment of the present invention, when the service node adopts the heterogeneous data source virtual access technology, the service node further includes:

The client grants data access permissions and periodically synchronizes production data to the front end's intermediate data service cluster warehouse through the heterogeneous data source management component; the data preprocessing component persistently stores the front-end processor data on a data cluster server node and performs non-business-related data cleaning rule configuration, data cleaning, data auditing and security verification control, producing stable and efficient data resources;

The data preprocessing component extracts data through a data extraction mechanism that uses a time- and event-based intelligent scheduling algorithm; the synchronization mode supports both incremental extraction and full comparison extraction;
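The two synchronization modes and the time/event trigger can be sketched as follows. As a simplifying assumption, the "intelligent scheduling algorithm" is reduced to "run when the interval has elapsed or change events are pending"; incremental extraction assumes each row carries an `updated_at` value, and full comparison assumes both snapshots are keyed by primary key. Function names are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple

Row = Dict[str, object]


def should_extract(now: float, last_run: float, interval: float, pending_events: int) -> bool:
    """Time-based trigger (interval elapsed) OR event-based trigger (change notifications queued)."""
    return now - last_run >= interval or pending_events > 0


def incremental_extract(rows: List[Row], watermark: float) -> Tuple[List[Row], float]:
    """Return rows changed after the watermark, plus the new watermark."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    new_mark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_mark


def full_compare_extract(source: Dict[Hashable, Row],
                         target: Dict[Hashable, Row]) -> Dict[str, List[Hashable]]:
    """Compare full snapshots keyed by primary key and report inserts, updates and deletes."""
    return {
        "insert": [k for k in source if k not in target],
        "update": [k for k in source if k in target and source[k] != target[k]],
        "delete": [k for k in target if k not in source],
    }
```

Incremental extraction is cheap but cannot see deletions or rows whose timestamp was not updated; the full comparison catches both, which is why a scheduler would typically run it less often as a periodic reconciliation.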

The data preprocessing component is also used to create database instances and permissions and to produce running-state and fault analysis reports;

The data persistence storage component completes business related data access management functions and provides a transparent unified data access interface for loading different databases.

As a preferred embodiment of the present invention, when the service node adopts the database virtualization microkernel data source registration and encapsulation technology, the service node comprises:

When a host storing virtual databases and virtual tables starts, it reports its virtual data information to the corresponding virtual data federation according to its virtual storage pool ID and is registered automatically;

The encapsulation relationship reflects a data abstraction mapping between virtual databases, virtual tables and views on the one hand and physical application servers, databases, data tables or files on the other;

The service node performs operations on virtual databases and virtual tables at the logical level and ultimately maps them to operations on one or more corresponding physical entities.
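Automatic registration by virtual storage pool ID, and the mapping from one logical operation to operations on physical entities, might be sketched like this. `FederationRouter.on_host_start`, `PhysicalEntity` and the representation of an operation as a plain string are hypothetical; a real system would also dispatch the translated operations to each physical server.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class PhysicalEntity:
    server: str
    database: str
    table: str  # a data table, or a file path for file-based sources


class VirtualDataFederation:
    """Catalog of virtual tables for one virtual storage pool."""

    def __init__(self, pool_id: str) -> None:
        self.pool_id = pool_id
        self.hosts: Set[str] = set()
        self.catalog: Dict[str, List[PhysicalEntity]] = {}

    def register(self, host: str, tables: Dict[str, List[PhysicalEntity]]) -> None:
        self.hosts.add(host)
        for vtable, entities in tables.items():
            known = self.catalog.setdefault(vtable, [])
            for entity in entities:
                if entity not in known:  # re-registration after a restart is idempotent
                    known.append(entity)

    def map_operation(self, vtable: str, operation: str) -> List[Tuple[PhysicalEntity, str]]:
        """Translate one logical operation on a virtual table into per-entity physical operations."""
        return [(entity, operation) for entity in self.catalog[vtable]]


class FederationRouter:
    def __init__(self) -> None:
        self.federations: Dict[str, VirtualDataFederation] = {}

    def on_host_start(self, host: str, pool_id: str,
                      tables: Dict[str, List[PhysicalEntity]]) -> VirtualDataFederation:
        """Called when a host holding virtual databases/tables starts: auto-register by pool ID."""
        fed = self.federations.setdefault(pool_id, VirtualDataFederation(pool_id))
        fed.register(host, tables)
        return fed
```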

As a preferred embodiment of the present invention, one-click production of a database instance comprises:

The instance production request is first routed into an administrative approval process; once approval is granted, the system generates a globally unique instance production batch number, which is used to complete one-click production of the instance;
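The approval gate and batch number could be sketched as below. It assumes that a random UUID (Python's `uuid.uuid4`) is acceptable as a globally unique batch number and that `run` is a hypothetical callable standing in for the actual instance creation pipeline.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class InstanceRequest:
    requester: str
    sid: str
    status: Status = Status.PENDING
    batch_no: Optional[str] = None


def decide(req: InstanceRequest, approved: bool) -> None:
    """Administrative approval; only an approved request receives a batch number."""
    if req.status is not Status.PENDING:
        raise ValueError("request already decided")
    if approved:
        req.status = Status.APPROVED
        req.batch_no = uuid.uuid4().hex  # random UUID: unique in practice without a central counter
    else:
        req.status = Status.REJECTED


def produce(req: InstanceRequest, run: Callable[[str, str], T]) -> T:
    """One-click production: run the creation pipeline keyed by the batch number."""
    if req.status is not Status.APPROVED or req.batch_no is None:
        raise PermissionError("instance production requires an approved request")
    return run(req.batch_no, req.sid)
```

Keying every downstream step by the batch number makes a production run traceable end to end, and the gate in `produce` ensures nothing runs without approval.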

Completing one-click production of the instance with the instance production batch number comprises:

a collaborative workflow among a WEB platform end, an RMI client, an RMI server and an Oracle server end;

The collaborative workflow of the WEB platform end comprises: processing client requests, responding to business operations, administrative approval, entering business parameters after approval, instance scheduling management, implementing production synchronization management, and waiting for the processing result from the distributed RMI (Remote Method Invocation) side;

The collaborative workflow of the RMI client comprises: receiving the business parameters entered at the WEB platform end, packaging them into a client packet, sending a communication request, and returning a processing flag to the WEB platform end;

The collaborative workflow of the RMI server comprises: receiving and responding to the RMI client's communication request, verifying the validity of the communication, encrypting the channel, verifying the business parameters once communication succeeds, processing the queued task, executing the business operation, and returning a processing flag to the RMI client;

The collaborative workflow of the Oracle server end comprises: receiving the queued task from the RMI server and defining the SID, initializing the ora file, creating the password file (pwd), creating the SQL script file, executing the script file, creating the tablespace, creating the default user and granting privileges, and returning a processing flag to the RMI server.
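The four-tier hand-off can be simulated in-process to show how processing flags travel back up the chain. This is not Java RMI: the RMI ends are plain Python objects, channel encryption is omitted, a shared token is an assumed stand-in for communication validity checking, the required parameter names are assumptions, and the Oracle-side steps are only recorded, not executed.

```python
import queue
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Oracle-side steps in the order given in the text; recorded only, never executed here.
ORACLE_STEPS = [
    "define_sid", "init_ora_file", "create_password_file", "create_sql_script",
    "run_sql_script", "create_tablespace", "create_default_user_and_grant",
]


@dataclass
class Task:
    batch_no: str
    params: Dict[str, str]


class OracleServerEnd:
    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []

    def process(self, task: Task) -> str:
        for step in ORACLE_STEPS:
            self.log.append((task.params["sid"], step))  # a real deployment would run OS/SQL commands here
        return "OK"


class RMIServerEnd:
    REQUIRED = ("sid", "tablespace", "user")  # assumed business parameters

    def __init__(self, token: str, oracle: OracleServerEnd) -> None:
        self.token = token
        self.oracle = oracle
        self.tasks: "queue.Queue[Task]" = queue.Queue()

    def handle(self, token: str, task: Task) -> str:
        if token != self.token:  # communication validity check
            return "REJECTED_AUTH"
        if any(not task.params.get(k) for k in self.REQUIRED):  # business parameter check
            return "REJECTED_PARAMS"
        self.tasks.put(task)  # queue the task, then drain the queue
        flag = "EMPTY"
        while not self.tasks.empty():
            flag = self.oracle.process(self.tasks.get())
        return flag


class RMIClientEnd:
    def __init__(self, server: RMIServerEnd, token: str) -> None:
        self.server = server
        self.token = token

    def send(self, batch_no: str, params: Dict[str, str]) -> str:
        """Package the business parameters and forward them; returns the server's flag."""
        return self.server.handle(self.token, Task(batch_no, dict(params)))


def web_platform_submit(client: RMIClientEnd, batch_no: str, params: Dict[str, str]) -> str:
    """WEB platform end: pass approved parameters on and wait for the processing flag."""
    return client.send(batch_no, params)
```

Each tier returns a flag instead of raising, mirroring the text's "returning a processing flag" at every hop, so the WEB platform end can show a precise failure reason.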

As a preferred embodiment of the present invention, converting data into a visual chart by the metadata service node comprises:

A feature factor extraction module obtains the report data transmitted by the service node, extracts feature factors, matches them against a feature factor library, and derives each factor's importance from its term frequency;

A chart score recommendation module matches the feature factors against a user history algorithm factor library to obtain chart feature factor scores, adds up the feature factor scores belonging to the same chart to obtain that chart's final score, and recommends charts to the user on that basis;

A user history algorithm factor learning module collects the history of charts the user has selected and performs feature factor analysis, keyword processing and feature factor user scoring, continuously refining the feature factor library and the user history algorithm factor library.
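The scoring across the three modules can be sketched as follows. The assumptions are illustrative and not taken from the source: feature factors are lowercase word tokens, importance is each factor's share of matched term occurrences, a chart's final score is the importance-weighted sum of its factor scores from the user history library, and learning adds a small increment to the factors of the chart the user chose.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple


def extract_factors(report_text: str, factor_library: Set[str]) -> Dict[str, float]:
    """Keep tokens found in the feature factor library; importance = term-frequency share."""
    tokens = re.findall(r"[a-z_]+", report_text.lower())
    counts = Counter(t for t in tokens if t in factor_library)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()} if total else {}


def score_charts(importance: Dict[str, float],
                 history: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """history: chart type -> {factor: user score}. A chart's final score sums its factor scores."""
    scores = {
        chart: sum(importance.get(f, 0.0) * s for f, s in factors.items())
        for chart, factors in history.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def learn(history: Dict[str, Dict[str, float]], chosen: str,
          importance: Dict[str, float], rate: float = 0.1) -> None:
    """Reinforce the factors of the chart the user actually picked."""
    factors = history.setdefault(chosen, {})
    for f, w in importance.items():
        factors[f] = factors.get(f, 0.0) + rate * w
```

For a report mentioning "trend" twice and "month", "share" and "category" once each, a history that links "trend" and "month" to line charts ranks the line chart first; each user choice fed to `learn` then shifts future rankings toward that user's habits.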

A big data storage analysis method based on a virtualization technology comprises the following steps:

The front-end processor is used to collect large-scale, physically distributed, heterogeneous data resources from the database of each business system;

The service node is used to collect the data resources from the front-end processor, manage and query them, and then transmit the data to the metadata service node;

The metadata service node is used to store the data uploaded by the service node and provide it to each service;

The service node uses virtualization technology to interconnect large-scale, physically distributed, heterogeneous data resources into a virtual data resource center, provides unified data standards and access interfaces, supports transparent access by the metadata service node to the data sources, and enables data resources to be shared and managed uniformly across multiple independent business systems;

The service node completes one-click production of database instances using a distributed remote creation technique that covers the database instance life cycle; the metadata service node converts data into visual charts using an automatic adaptation technique based on the principal-component visual views a user employs, and the charts directly present the information hidden in the data;

The data virtualization technology comprises: the service node uses the EVP model to provide a unified data view layer, accesses multi-source, distributed, heterogeneous data resources through a unified access interface, integrates the heterogeneous data resources, and shields the physical details of underlying data access.

Compared with the prior art, the invention has the beneficial effects that:

(1) The big data storage analysis system platform provided by the invention is centered on a data information resource library. Through data information resource integration, it comprehensively integrates independent, scattered application system data and business management resource data, establishes unified information resource management and a long-term update mechanism, builds an operation support environment that meets the requirements of information resource storage, transmission, exchange, service, application and security management, and promotes full sharing and utilization of information resources;

(2) The big data storage analysis system platform provided by the invention offers massive elastic storage, a strong and flexible security mechanism, a reliable and efficient computing environment, high-speed full-text search over massive data, flexible and diverse access modes, and friendly, easy-to-use operation and maintenance management; combined with the open-source Apache Hadoop and Spark, it provides a means of rapidly analyzing and processing big data and a model library for industry-oriented processing, analysis and mining;

(3) Through data virtualization technology, the invention uses the EVP model to provide a unified data view layer, accesses multi-source, distributed, heterogeneous data resources through a unified access interface, integrates the heterogeneous data resources, shields the physical details of underlying data access, provides transparent access to the data sources, enables data resources to be shared and managed uniformly across multiple independent business systems, and greatly improves data access efficiency through data multiplexing technology;

(4) Through database virtualization microkernel data source registration and encapsulation technology, the invention hides the complex physical-operation details that previously had to be handled explicitly, making the internet platform convenient to use and keeping applications extensible, portable and loosely coupled;

(5) The invention constructs a concise database virtualization microkernel data source registration and encapsulation method, which can be applied to the acquisition of cloud database data information;

(6) Using the automatic adaptation technique based on principal-component visual views, the invention converts data into suitable visual charts and presents the information hidden in the data directly to users, making the data more objective and convincing.

The invention is described in further detail below with reference to the drawings and the detailed description.

Drawings

FIG. 1 is a logical framework diagram of a big data storage analysis system platform provided by the present invention;

FIG. 2 is a physical architecture diagram of a big data storage analysis system platform provided by the present invention;

FIG. 3 is a logical architecture diagram of an EVP model provided by the present invention;

FIG. 4 is a logical architecture diagram of a multi-tenant model provided by the present invention;

FIG. 5 is a schematic diagram of a data multiplexing technique provided by the present invention;

FIG. 6 is a flow chart of access management provided by the present invention;

FIG. 7 is a diagram of unified shared access to heterogeneous data sources provided by the present invention;

FIG. 8 is a flowchart of the collaborative processing of the WEB platform end, the distributed RMI end and the Oracle server end provided by the invention;

FIG. 9 is a flowchart of the automatic adaptation technique based on the principal-component visual views used by a user, provided by the present invention.

Description of reference numerals: 1. WEB platform end; 2. RMI client; 3. RMI server; 4. Oracle server end; 5. physical resource layer; 6. virtual resource layer; 7. effective resource layer; 8. front-end processor; 9. service node; 10. metadata service node; 11. client; 12. front end; 13. server; 14. business system.

Detailed Description

The big data storage analysis system platform based on the virtualization technology provided by the invention, as shown in fig. 1 and fig. 2, comprises: a front end processor 8, a service node 9 and a metadata service node 10.

The front-end processor 8 includes a data source service and a data synchronization module, which are used to collect large-scale, physically distributed, heterogeneous data resources from the databases of the respective business systems 14.

The service node 9 includes a physical controller manager, virtual database management, virtual table management, a distributed query engine, an interface container, a database management service and a data service; the physical controller manager includes a plurality of physical controllers, which collect data resources from the front-end processor 8, manage and query them through virtual database management, virtual table management and the distributed query engine, and then transmit the data to the metadata service node 10 through the interface container based on the database management service and the data service.

The metadata service node 10 includes a metadata storage unit, a system management service, a system monitoring service, a data directory service, a user log service, and a system interface service; the data uploaded by the service node 9 is stored by the metadata storage unit and is used by a system management service, a system monitoring service, a data directory service, a user log service and a system interface service.

The service node 9 uses virtualization technology to interconnect large-scale, physically distributed, heterogeneous data resources into a virtual data resource center, provides unified data standards and access interfaces, supports transparent access by the metadata service node 10 to the data sources, and enables data resources to be shared and managed uniformly across multiple independent business systems 14;

Virtual database management completes one-click production of database instances using a distributed remote creation technique that covers the database instance life cycle;

The metadata service node 10 converts data into visual charts using an automatic adaptation technique based on the principal-component visual views a user employs, and the charts directly present the information hidden in the data.

Specifically, as shown in fig. 1, the service node 9 further includes user log and system monitoring functions.

Further, the virtualization technology adopted by the service node 9 includes: data virtualization technology, heterogeneous data source virtual access technology, database virtualization microkernel data source registration and encapsulation technology.

Further, when the data virtualization technology is adopted, the service node 9 includes:

The service node 9 uses the EVP model to provide a unified data view layer, accesses multi-source, distributed, heterogeneous data resources through a unified access interface, integrates the heterogeneous data resources, shields the physical details of underlying data access, provides transparent access to the data sources, and enables data resources to be shared and managed uniformly across multiple independent business systems 14;

The physical details include differences among the data sources in physical location, specific type and data structure;

as shown in fig. 3, the EVP model includes: a physical resource layer 5, a virtual resource layer 6 and an effective resource layer 7;

the physical resource layer 5 comprises a plurality of nodes, and each node is used for deploying a plurality of physical resources; the virtual resource layer 6 comprises a plurality of virtual organizations, wherein a plurality of virtual resources exist in each virtual organization, and the plurality of virtual resources form a virtual resource space of each virtual organization; the effective resource layer 7 comprises a plurality of applications, each application uses a plurality of effective resources, and the plurality of effective resources form an effective resource space of each application;

If n nodes exist in the physical resource layer 5, an application in the effective resource layer 7 can reach the physical resources on nodes 1 to n via a virtual organization in the virtual resource layer 6;

Further, as shown in fig. 4, the virtual resource layer 6 can create a virtual database based on one physical database, or can create multiple virtual databases based on the same physical database, and complete the virtualization of data through the multiple virtual databases;

As shown in fig. 5, using data multiplexing technology, the virtual database preloads frequently accessed data from the physical database into a buffer; requested data is read from the buffer, and if it is not there it is fetched from the physical database; data that has not been accessed for a long time is swapped out to the physical database.

Further, when the service node 9 adopts the heterogeneous data source virtual access technology, the method includes:

As shown in fig. 6, the service node 9 uses the client 11, front end 12 and server 13 data control processing architecture together with process-oriented, component-based functional components to provide component-level access control management of heterogeneous database systems based on users, roles and permissions. Through the platform, resource and channel workflows it implements heterogeneous data source registration, virtual database object registration and virtual resource access, and completes shared access management of the heterogeneous databases; it hides database access details and the multiple sources of the heterogeneous systems, provides unified data standards and access interfaces, and supports transparent access to the data sources;

The client 11 includes a heterogeneous data source management component, which manages heterogeneous data from enterprise A, enterprise B, enterprise C, enterprise D and enterprise E and grants the front end 12 access to it;

The front end 12 comprises a DB synchronization management component, which manages MySQL, Oracle, DB and file sources and provides data access for the server 13;

The server 13 includes a data preprocessing component, a data persistence storage component and a unified data access management component;

The functional components include the heterogeneous data source management component, the DB synchronization management component, the data preprocessing component, the data persistence storage component, the unified data access management component, the system component and the business component.

As shown in fig. 7, when the service node 9 adopts the heterogeneous data source virtual access technology, the method further includes:

the client 11 grants data access rights, and its heterogeneous data source management component periodically synchronizes production data to the intermediate data service cluster warehouse of the front end 12; the data preprocessing component persistently stores the front-end processor 8 data on a data cluster server node and performs cleaning-rule configuration for non-business-related data, data cleaning, data auditing and security verification control, forming stable and efficient data resources;

the data preprocessing component extracts data through a data extraction mechanism that uses a time- and event-based intelligent scheduling algorithm, and synchronization supports both incremental extraction and full comparison extraction;

the data persistence storage component performs business-related data access management and provides a transparent, unified data access interface for loading different databases.
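The two synchronization modes and the time-or-event trigger can be sketched as plain functions. This is an assumption-laden illustration: it assumes each row carries a monotonically increasing `updated_at` value and a primary key, and the function names are hypothetical; the patent does not disclose its actual intelligent scheduling algorithm.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def incremental_extract(rows: List[Row], watermark: int) -> Tuple[List[Row], int]:
    """Return rows changed after the watermark, plus the advanced watermark."""
    changed = [r for r in rows if r["updated_at"] > watermark]
    new_mark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_mark


def full_compare_extract(source: Dict[int, Row], target: Dict[int, Row]) -> Dict[str, List[int]]:
    """Diff two full snapshots keyed by primary key."""
    inserts = [k for k in source if k not in target]
    deletes = [k for k in target if k not in source]
    updates = [k for k in source if k in target and source[k] != target[k]]
    return {"insert": sorted(inserts), "update": sorted(updates), "delete": sorted(deletes)}


def should_run(now: float, last_run: float, interval: float, pending_events: int) -> bool:
    """Trigger on a fixed interval (time-based) or when change events are queued (event-based)."""
    return pending_events > 0 or now - last_run >= interval
```

Incremental extraction is cheap but relies on a reliable change marker; full comparison catches deletes and untracked edits at the cost of reading both snapshots, which is why a scheduler might alternate between them.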

Further, the service node 9, when adopting the database virtualization microkernel data source registration and encapsulation technology, includes:

The service node 9 ultimately maps logical-level operations on the virtual database and virtual tables to operations on the corresponding one or more physical conceptual entities.
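One way to picture this mapping is a virtual table backed by several physical tables: a logical insert is routed to one physical table, and a logical select becomes a `UNION ALL` over all of them. The sketch below uses Python's built-in `sqlite3`; the year-based partitioning, the `VirtualTable` class and the routing function are hypothetical choices for illustration only.

```python
import sqlite3
from typing import Callable, List, Tuple


class VirtualTable:
    """Hypothetical logical table mapped onto several physical tables."""

    def __init__(self, conn: sqlite3.Connection, columns: List[str],
                 physical_tables: List[str], route: Callable[[tuple], str]) -> None:
        self.conn = conn
        self.columns = columns
        self.physical_tables = physical_tables  # trusted names, interpolated into SQL
        self.route = route                      # picks the physical table for a new row

    def insert(self, row: tuple) -> None:
        table = self.route(row)
        if table not in self.physical_tables:
            raise ValueError(f"no physical table for {row!r}")
        placeholders = ", ".join("?" for _ in self.columns)
        self.conn.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)

    def select_all(self) -> List[Tuple]:
        # One logical read fans out to every physical table behind the virtual one.
        cols = ", ".join(self.columns)
        union = " UNION ALL ".join(f"SELECT {cols} FROM {t}" for t in self.physical_tables)
        return self.conn.execute(f"{union} ORDER BY {self.columns[0]}").fetchall()
```

Callers of `VirtualTable` never name `orders_2023` or `orders_2024`; the encapsulation decides which physical entity each logical operation touches.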

As shown in fig. 8, one-click production of a database instance further includes:

Completing one-click production of the instance using the instance production batch number includes the following:

The WEB platform end 1, the RMI client end 2, the RMI server end 3 and the Oracle server end 4 cooperate in the following workflow:

the collaborative workflow of the WEB platform end 1 includes: processing the client request, responding to the business operation, administrative approval, entering business parameters after approval, instance scheduling management, carrying out production synchronization management, and waiting for the processing result from the distributed RMI (Remote Method Invocation) end;

the collaborative workflow of the RMI client 2 includes: receiving the business parameters entered on the WEB platform end 1, encapsulating them in a client request packet, requesting communication, and returning a processing flag to the WEB platform end 1;

the collaborative workflow of the RMI server 3 includes: accepting and responding to the communication request from the RMI client 2, verifying that the communication is valid, encrypting the channel, verifying the business parameters once communication succeeds, processing queued tasks, executing the service, and returning a processing flag to the RMI client 2;

the collaborative workflow of the Oracle server 4 includes: accepting the queued task from the RMI server 3, defining the SID, initializing the .ora parameter file, creating the password file (pwd), creating and executing the .sql script file, creating the tablespace, creating the default user and granting privileges, and returning a processing flag to the RMI server 3.
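The Oracle server end steps can be illustrated as a plan generator plus an executor that returns a processing flag. This is a hypothetical sketch: the file-naming patterns (`init<SID>.ora`, `orapw<SID>`), the tablespace size, the granted privileges and the `&user_password` SQL*Plus substitution variable are assumptions, and nothing is actually sent to an Oracle server; the `run` callback stands in for the real executor.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InstanceRequest:
    batch_no: str       # globally unique production batch number
    sid: str            # Oracle system identifier of the new instance
    tablespace: str
    default_user: str
    datafile_dir: str   # hypothetical datafile directory on the Oracle server


@dataclass
class Plan:
    init_file: str
    init_content: str
    password_file: str
    script_file: str
    sql: List[str]


def build_plan(req: InstanceRequest) -> Plan:
    """Derive the files and SQL statements for one instance (nothing is executed here)."""
    if not req.sid.isalnum():
        raise ValueError("SID must be alphanumeric")
    sql = [
        f"CREATE TABLESPACE {req.tablespace} "
        f"DATAFILE '{req.datafile_dir}/{req.tablespace.lower()}01.dbf' SIZE 100M AUTOEXTEND ON;",
        f"CREATE USER {req.default_user} IDENTIFIED BY &user_password "
        f"DEFAULT TABLESPACE {req.tablespace} QUOTA UNLIMITED ON {req.tablespace};",
        f"GRANT CREATE SESSION, CREATE TABLE TO {req.default_user};",
    ]
    return Plan(
        init_file=f"init{req.sid}.ora",
        init_content=f"db_name={req.sid}\n",
        password_file=f"orapw{req.sid}",
        script_file=f"create_{req.sid.lower()}_{req.batch_no}.sql",
        sql=sql,
    )


def execute(plan: Plan, run: Callable[[str], None]) -> str:
    """Run each statement in order and return a processing flag for the RMI server."""
    for step, statement in enumerate(plan.sql, start=1):
        try:
            run(statement)
        except Exception:
            return f"FAILED:{step}"
    return "SUCCESS"
```

Returning a step-numbered flag instead of raising mirrors the workflow above, in which each end reports a processing flag back to the end that called it.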

As shown in fig. 9, when the metadata service node 10 converts data into a visualization chart, the method further includes:

a feature factor extraction module obtains the report data transmitted by the service node 9 and extracts feature factors; the extracted feature factors are matched against a feature factor library, and the importance of each factor is computed from its word frequency.
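A minimal reading of this step, matching words of the report against a factor library and using relative word frequency as importance, might look like the Python below. Treating the report data as text and normalizing the counts so they sum to 1 are assumptions; the function name and the sample library are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Set


def extract_feature_factors(report_text: str, factor_library: Set[str]) -> Dict[str, float]:
    """Return {factor: importance}, importance being the factor's share of matched words."""
    tokens = re.findall(r"[a-z_]+", report_text.lower())
    counts = Counter(token for token in tokens if token in factor_library)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {factor: count / total for factor, count in counts.items()}
```

For a report such as "Date, Region, Sales; sales by date and sales ratio" with the library `{"date", "region", "sales", "ratio"}`, `sales` appears three times out of seven matched words and therefore receives the highest importance.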

The invention provides a big data storage analysis method based on a virtualization technology, which comprises the following steps:

Step S1: collecting large-scale, physically distributed, heterogeneous data resources from the databases of each business system 14 using the data source service and the data synchronization module in the front-end processor 8;

Step S2: collecting the data resources from the front-end processor 8 using a plurality of physical controllers of the service node 9, managing and querying them through the virtual database management, virtual table management and distributed query engine of the service node 9, and then transmitting the data to the metadata service node 10 through the interface container, based on the database management service and the data service;

Step S3: storing the data uploaded by the service node 9 in the metadata storage unit of the metadata service node 10 and making it available to the system management service, system monitoring service, data directory service, user log service and system interface service of the metadata service node 10.

Virtual database management completes one-click production of a database instance based on a distributed remote creation technology covering the database instance life cycle. The metadata service node 10 converts data into a visualization chart based on an automatic visual view adaptation technology driven by the principal components used by the user, so that information hidden in the data is presented directly through the chart.

The following examples are further illustrative of the present invention, but the scope of the present invention is not limited thereto.

The big data storage analysis system platform of this embodiment adopts five key technologies: data virtualization; heterogeneous data source virtual access; database virtualization microkernel data source registration and encapsulation; distributed remote creation of the database instance life cycle; and automatic visual view adaptation based on the principal components used by the user.

1) Data virtualization technology

The data virtualization technology uses an EVP model to provide a unified data view layer. It accesses multi-source, distributed and heterogeneous data resources through a unified access interface, integrates them, and shields the physical details of underlying data access, such as the physical location, specific type and data structure differences of the data sources. This enables transparent access to the data sources and the sharing and unified management of data resources among multiple independent business systems 14.

The logical architecture of the EVP model described above is shown in fig. 3.
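The EVP model itself is not published in the text, so the following Python sketch only illustrates the general idea of a unified data view: two heterogeneous sources (a SQLite table and CSV text) with different column names and value types are mapped onto one common schema and queried through a single interface. `SqliteSource`, `CsvSource`, `UnifiedView` and the column mappings are hypothetical.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterator, List, Protocol

Row = Dict[str, object]


class Source(Protocol):
    def rows(self) -> Iterator[Row]: ...


class SqliteSource:
    def __init__(self, conn: sqlite3.Connection, query: str, mapping: Dict[str, str]) -> None:
        self.conn, self.query, self.mapping = conn, query, mapping  # mapping: unified -> native

    def rows(self) -> Iterator[Row]:
        cursor = self.conn.execute(self.query)
        names = [d[0] for d in cursor.description]
        for record in cursor:
            native = dict(zip(names, record))
            yield {unified: native[col] for unified, col in self.mapping.items()}


class CsvSource:
    def __init__(self, text: str, mapping: Dict[str, str],
                 casts: Dict[str, Callable[[str], object]]) -> None:
        self.text, self.mapping, self.casts = text, mapping, casts

    def rows(self) -> Iterator[Row]:
        for native in csv.DictReader(io.StringIO(self.text)):
            # CSV values arrive as strings; cast them so every source yields the same types.
            yield {u: self.casts.get(u, str)(native[col]) for u, col in self.mapping.items()}


class UnifiedView:
    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def select(self, predicate: Callable[[Row], bool] = lambda r: True) -> List[Row]:
        return [row for source in self.sources for row in source.rows() if predicate(row)]
```

The caller sees only the unified column names `id` and `name`; where each row lives and how its native schema is spelled stay inside the source adapters.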

Multiple users share the same physical database through virtual databases, and no user perceives the existence of the others while using a virtual database. Specifically, a user can create a virtual database on top of one physical database, and multiple users can create multiple virtual databases on the same physical database, which realizes multi-tenancy. The system also supports creating a virtual database that virtualizes data from multiple physical databases.

The logical architecture of the multi-tenant model is shown in fig. 4.
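The multi-tenant behaviour (several virtual databases sharing one physical database without seeing each other, and one virtual database spanning several physical databases) can be sketched with in-memory SQLite connections. Prefixing physical table names with the tenant name is an assumed isolation mechanism chosen for illustration; `VirtualDatabase` and its methods are hypothetical.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


class VirtualDatabase:
    """One tenant's virtual database over one or more physical SQLite databases."""

    def __init__(self, tenant: str, physical: Dict[str, sqlite3.Connection]) -> None:
        self.tenant = tenant
        self.physical = physical                       # physical db name -> connection
        self.catalog: Dict[str, Tuple[str, str]] = {}  # virtual table -> (physical db, table)

    def create_table(self, table: str, columns_ddl: str, on: str) -> None:
        physical_name = f"{self.tenant}__{table}"      # tenant prefix keeps tenants apart
        self.physical[on].execute(f"CREATE TABLE {physical_name} ({columns_ddl})")
        self.catalog[table] = (on, physical_name)

    def execute(self, table: str, sql_template: str, params: Sequence = ()) -> List[tuple]:
        # Only tables in this tenant's own catalog resolve; anything else raises KeyError.
        db, physical_name = self.catalog[table]
        cursor = self.physical[db].execute(sql_template.format(table=physical_name), params)
        return cursor.fetchall()

    def tables(self) -> List[str]:
        return sorted(self.catalog)
```

Two tenants can both own a virtual table called `orders` on the same physical database, while one tenant's catalog may also point at a second physical database, which is the aggregation case.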

To improve data access efficiency, the system uses the data multiplexing technology to preload frequently accessed data from the database into the buffer. Requested data is read from the buffer; if it is not there, it is fetched from the database, and data that has not been accessed for a long time is swapped back out to the database. Data multiplexing greatly improves data access efficiency; fig. 5 shows its principle.

2) Heterogeneous data source virtual access technology

The heterogeneous data source virtual access technology of this embodiment is the core technology of the heterogeneous-data-source cloud database; it manages virtual access to heterogeneous data sources.

The heterogeneous data source virtual access technology of this embodiment uses a client 11, front end 12, server 13 data control and processing architecture, together with flow-oriented, component-based functional components, to introduce user-, role- and rights-based access control over the components of the heterogeneous database system. It completes the platform, resource, channel and related flows to implement heterogeneous data source registration, virtual database object registration and virtual resource access, and completes shared access management of the heterogeneous databases. It shields database access details and the multi-source nature of heterogeneous systems, provides a uniform data standard and access interface, supports transparent access to the data sources, and enables sharing and unified management of data resources among multiple independent business systems 14. Figs. 6 and 7 show, respectively, the access management flowchart and the unified shared access diagram for heterogeneous data sources.

The heterogeneous data source client grants data access rights and periodically synchronizes production data to the intermediate data service cluster warehouse of the front end 12. The heterogeneous database system uses a data preprocessing flow to persistently store the front-end processor 8 data on a data cluster server node, performing cleaning-rule configuration for non-business-related data, data cleaning, data auditing, security verification control and similar operations to form stable and efficient data resources. The data extraction mechanism uses a time- and event-based intelligent scheduling algorithm, and synchronization supports both incremental and full comparison extraction. The data management layer creates database instances and rights and produces running-state and fault analysis reports. The data access layer performs business-related data access management and provides a transparent, unified data access interface for loading different databases.
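The cleaning-rule configuration and auditing steps could be modelled as an ordered list of rule functions with an audit trail, as in the Python sketch below. The rule signature (return the cleaned row, or `None` to reject it) and the example rules are assumptions for illustration; the patent does not define its rule format, and security verification is not modelled here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Rule = Callable[[Row], Optional[Row]]   # returns the cleaned row, or None to reject it


def strip_strings(row: Row) -> Optional[Row]:
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}


def require(*columns: str) -> Rule:
    def rule(row: Row) -> Optional[Row]:
        return row if all(row.get(c) not in (None, "") for c in columns) else None
    rule.__name__ = f"require({', '.join(columns)})"
    return rule


@dataclass
class Preprocessor:
    rules: List[Rule]
    audit: List[str] = field(default_factory=list)

    def run(self, rows: List[Row]) -> List[Row]:
        kept: List[Row] = []
        for index, row in enumerate(rows):
            current: Optional[Row] = row
            for rule in self.rules:
                current = rule(current)
                if current is None:
                    # Record which rule rejected the row so the run can be audited later.
                    self.audit.append(f"row {index} rejected by {rule.__name__}")
                    break
            if current is not None:
                kept.append(current)
        return kept
```

Because rules are ordinary functions, configuring cleaning for a new source amounts to choosing and ordering the list passed to `Preprocessor`.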

3) Database virtualization microkernel data source registration and encapsulation technology

The database virtualization microkernel data source registration and encapsulation technology shields the complex physical-operation details that previously had to be handled directly, which makes the internet platform convenient to use and gives applications extensible, portable and loosely coupled characteristics. When a host storing virtual databases and virtual tables starts, it reports its virtual data information to the corresponding virtual data Federation according to its virtual storage pool id, and the information is registered automatically. The encapsulation relationship reflects the data abstraction mapping between the virtual databases, virtual tables and virtual views and the physical application servers, databases, data tables or files; logical-level operations on the virtual databases and virtual tables ultimately map to operations on the corresponding one or more physical conceptual entities.
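The startup-time automatic registration can be sketched as an in-memory directory of federations keyed by virtual storage pool id. This is a hypothetical illustration: `FederationDirectory`, `VirtualObjectInfo`, `on_host_startup` and the `"server/db.table"` target notation are invented names, and a real system would report over the network rather than through a function call.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VirtualObjectInfo:
    kind: str             # "database", "table" or "view"
    name: str
    physical_target: str  # encapsulated physical entity, e.g. "server/db.table" or a file path


@dataclass
class Federation:
    pool_id: str
    registry: Dict[str, VirtualObjectInfo] = field(default_factory=dict)
    hosts: Dict[str, List[str]] = field(default_factory=dict)  # host -> names it reported

    def register(self, host: str, objects: List[VirtualObjectInfo]) -> None:
        self.hosts[host] = [o.name for o in objects]
        for obj in objects:
            self.registry[obj.name] = obj

    def resolve(self, name: str) -> str:
        """Map a virtual object name to the physical entity it encapsulates."""
        return self.registry[name].physical_target


class FederationDirectory:
    def __init__(self) -> None:
        self.federations: Dict[str, Federation] = {}

    def for_pool(self, pool_id: str) -> Federation:
        return self.federations.setdefault(pool_id, Federation(pool_id))


def on_host_startup(host: str, pool_id: str, objects: List[VirtualObjectInfo],
                    directory: FederationDirectory) -> None:
    """Called when a host storing virtual databases/tables starts: report and auto-register."""
    directory.for_pool(pool_id).register(host, objects)
```

Keying federations by pool id means a restarted host simply re-reports its objects and overwrites its previous entries, with no manual registration step.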

The database virtualization microkernel data source registration and encapsulation technology of this embodiment provides a concise data source registration and encapsulation method, which can be applied to collecting data information from the cloud database.

4) Distributed remote creation technique for database instance lifecycle

In the distributed remote creation technology for the database instance life cycle, an instance production request is first routed into an administrative approval process. Once the request is approved, the system generates a globally unique instance production batch number, which is then used to complete one-click production of the instance. The process involves a cooperative workflow among the WEB platform end 1, the distributed RMI end and the Oracle server end 4.
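The approval gate and the globally unique batch number can be sketched as a small state machine. Using `uuid.uuid4().hex` as the batch number and making a repeated one-click request a no-op are assumptions chosen for illustration; `InstanceProduction` and the `produce` callback (standing in for the hand-off to the distributed RMI end) are hypothetical.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ProductionRequest:
    requester: str
    sid: str
    status: str = "PENDING"          # PENDING -> APPROVED or REJECTED; APPROVED -> PRODUCED
    batch_no: Optional[str] = None


class InstanceProduction:
    def __init__(self, produce: Callable[[str, str], None]) -> None:
        self.produce = produce        # hands (batch_no, sid) to the RMI tier
        self.by_batch: Dict[str, ProductionRequest] = {}

    def approve(self, req: ProductionRequest, approved: bool) -> None:
        if req.status != "PENDING":
            raise ValueError("request already decided")
        if not approved:
            req.status = "REJECTED"
            return
        req.batch_no = uuid.uuid4().hex   # globally unique batch number, issued only on approval
        req.status = "APPROVED"
        self.by_batch[req.batch_no] = req

    def one_click(self, batch_no: str) -> str:
        req = self.by_batch[batch_no]     # unknown or unapproved batches raise KeyError
        if req.status == "PRODUCED":
            return "ALREADY_PRODUCED"     # clicking twice does not create a second instance
        self.produce(batch_no, req.sid)
        req.status = "PRODUCED"
        return "PRODUCED"
```

Since only approved requests ever receive a batch number, the one-click step cannot bypass administrative approval.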

The distributed remote creation technology of this embodiment supports producing both Oracle 11g and Oracle 12c instances. Fig. 8 is a flowchart of the cooperative processing among the WEB platform end 1, the distributed RMI end and the Oracle server end 4.

The distributed RMI end comprises the RMI client 2 and the RMI server 3.

5) Automatic visual view adaptation technology based on the principal components used by the user

Data visualization converts data into a suitable visual chart so that the information hidden in the data is presented directly, making the data more objective and convincing. However, there are many kinds of visual charts, and different chart types serve different presentation and analysis needs; a user who is unfamiliar with them finds it hard to choose an appropriate chart. The business scenarios in which users use charts are often similar, and a user's history of chart selections carries important information such as usage habits, business habits and data characteristics. This information is usually not fully exploited, so the same business data must be analyzed and a chart selected again each time. Fig. 9 is a flowchart of the automatic visual view adaptation technology of this embodiment.

The automatic visual view adaptation technology provides three modules: a feature factor extraction module, a chart scoring recommendation module and a user history algorithm factor learning module.

The feature factor extraction module extracts feature factors by analyzing the features of the report data, matches them against the feature factor library, and computes each factor's importance from its word frequency.

The chart scoring recommendation module matches the feature factors against the user history algorithm factor library, computes the feature factor scores for each chart, sums the feature factor scores of the same chart to obtain the chart's final score, and evaluates and recommends charts to the user accordingly.
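The scoring rule, in which a chart's final score is the sum of its feature factor scores, can be sketched as follows. Computing each feature factor score as the factor's importance multiplied by its learned per-chart weight is an assumption, as are the function name and the nested-dict layout of the user history algorithm factor library.

```python
from typing import Dict, List, Tuple


def recommend_charts(importance: Dict[str, float],
                     history_library: Dict[str, Dict[str, float]],
                     top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank charts by the summed scores of the feature factors they share with the report."""
    scores: Dict[str, float] = {}
    for chart, factor_weights in history_library.items():
        # Feature factor score = report importance x learned weight of that factor for the chart.
        total = sum(importance[f] * w for f, w in factor_weights.items() if f in importance)
        if total > 0:
            scores[chart] = total
    # Highest score first; ties broken alphabetically for a stable recommendation order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

With importances `{"date": 0.5, "sales": 0.3, "region": 0.2}`, a line chart weighted on `date` and `sales` outranks a bar chart weighted on `region` and `sales`, and a chart sharing no factors with the report is not recommended at all.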

The user history algorithm factor learning module collects the history of the user's chart selections, performs feature factor analysis, keyword processing and per-user feature factor scoring, and continuously refines the feature factor library and the user history algorithm factor library.
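A simple way to realise this learning loop is additive reinforcement: when a user picks a chart, each of the report's normalized factors gains weight for that chart in proportion to its importance. This update rule, the learning rate `rate` and the lowercase-and-strip keyword processing are assumptions, not the disclosed algorithm; `learn_from_selection` is a hypothetical name.

```python
from typing import Dict, Set


def learn_from_selection(chosen_chart: str,
                         importance: Dict[str, float],
                         factor_library: Set[str],
                         history_library: Dict[str, Dict[str, float]],
                         rate: float = 0.1) -> None:
    """Update both libraries in place from one recorded chart selection."""
    weights = history_library.setdefault(chosen_chart, {})
    for raw_factor, imp in importance.items():
        factor = raw_factor.strip().lower()   # simple keyword normalization
        factor_library.add(factor)            # grow the feature factor library
        weights[factor] = weights.get(factor, 0.0) + rate * imp
```

Repeated selections of the same chart for similar reports steadily raise the weights that the scoring step multiplies by importance, so the recommendations drift toward each user's habits.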

The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, but any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention are intended to be within the scope of the present invention as claimed.

Claims

1. A big data storage analysis system platform based on a virtualization technology, comprising:

Wherein the data virtualization technology comprises: the service node uses the EVP model to provide a unified data view layer, accesses multi-source, distributed and heterogeneous data resources through a unified access interface, integrates the heterogeneous data resources, and shields the physical details of underlying data access;

the service node when adopting heterogeneous data source virtual access technology comprises the following steps:

2. The virtualization technology-based big data storage analysis system platform of claim 1, wherein the front-end processor comprises a data source service and a data synchronization module, and utilizes the data source service and the data synchronization module to collect data resources;

3. The virtualization technology-based big data storage analysis system platform of claim 1, wherein the physical details include: the physical location, specific type and data structure differences of the data sources;

4. The virtualization technology-based big data storage analysis system platform of claim 3, wherein the virtual resource layer can create one virtual database based on one physical database, or a plurality of virtual databases based on the same physical database, and completes data virtualization through the plurality of virtual databases;

The virtual databases are positioned in a data resource pool constructed by a big data engine, data isolation is carried out between the virtual databases and the physical databases by a data isolation mechanism, and data of the virtual databases are aggregated into one virtual database by a data aggregation mechanism;

5. The virtualization technology-based big data storage analysis system platform of claim 1, wherein the service node, when employing heterogeneous data source virtual access technology, further comprises:

6. The virtualization technology-based big data storage analysis system platform of claim 3, wherein the service node, when employing database virtualization microkernel data source registration and encapsulation technology, comprises:

7. The virtualization technology-based big data storage analysis system platform of claim 1, comprising, upon completion of one-click production of a database instance:

8. The virtualization technology-based big data storage analysis system platform of claim 1, wherein, when the metadata service node converts data into a visualization chart, it comprises:

9. A big data storage analysis method based on a virtualization technology is characterized by comprising the following steps:

Wherein the data virtualization technology comprises: using the service node to provide a unified data view layer with the EVP model, accessing multi-source, distributed and heterogeneous data resources through a unified access interface, integrating the heterogeneous data resources, and shielding the physical details of underlying data access;