CN111881126A - Big data management system - Google Patents

Info

Publication number
CN111881126A
Authority
CN
China
Prior art keywords
data
module
metadata
submodule
updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010774634.8A
Other languages
Chinese (zh)
Inventor
林立磐
潘仲毅
彭子非
陈朝晖
刘智国
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Information & Engineering Co ltd
Original Assignee
Guangdong Information & Engineering Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Information & Engineering Co ltd
Priority to CN202010774634.8A
Publication of CN111881126A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/23 Updating
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data management system comprising a data connection module and a data fusion module. The data connection module comprises a database connection submodule and a file data source connection submodule. The database connection submodule connects to different types of databases and extracts data from them; the file data source connection submodule receives data files in different file formats and extracts data from them. The data fusion module fuses the data by standardizing the data extracted from each data source according to a standard data rule. Embodiments of the invention can improve the data utilization rate.

Description

Big data management system
Technical Field
The invention relates to the technical field of data processing, in particular to a big data management system.
Background
With the advent of the cloud era, big data has attracted more and more attention. For many industries, how to make use of such large-scale data is key to staying competitive. Enterprises that offer products or services to large numbers of consumers can use big data for precision marketing.
However, big data sources are scattered and the data are heterogeneous, so the data utilization rate is low in practical data applications.
Disclosure of Invention
The embodiment of the invention provides a big data management system which can be used for collecting data of each data source and carrying out data standardization processing on heterogeneous data so as to improve the data utilization rate.
An embodiment of the present invention provides a big data management system comprising a data connection module and a data fusion module, the data fusion module being used for fusing data; the data connection module comprises a database connection submodule and a file data source connection submodule;
the database connection submodule is used for connecting to different types of databases and extracting data from the databases;
the file data source connection submodule is used for receiving data files in different file formats and extracting data from the data files;
and the data fusion module is used for standardizing the data extracted from each data source according to a standard data rule.
Further, the system also comprises a data modeling module; and the data modeling module is used for establishing different types of data models.
Furthermore, the system also comprises an updating strategy configuration module, a data updating module and an updating progress monitoring module;
the updating strategy configuration module is used for generating a data updating strategy in response to a data updating configuration operation by the user; the data updating module is used for updating data according to the data updating strategy; and the updating progress monitoring module is used for monitoring the execution progress of data updating in real time.
Further, the system also comprises a service package management module and a service package authority management module;
the business package management module is used for responding to data classification operation of a user, storing each data in folders with different names in a classification manner, and displaying the name and the pattern identification of each folder on a user interface;
and the service pack authority management module is used for configuring the operation authority of different users on each folder and the data in each folder.
Further, the system also comprises a metadata management module; the metadata management module comprises a metadata setting submodule, a metadata checking submodule and a metadata viewing submodule;
the metadata setting submodule is used for setting the name of metadata, metadata description information and metadata binding data standard;
the metadata verification submodule is used for verifying the metadata according to the data rule of the metadata;
and the metadata viewing sub-module is used for responding to the metadata query operation of the user and displaying the data details of the selected metadata.
Furthermore, the system also comprises a data source tracing module; and the data source tracing module is used for responding to the data source tracing operation of the user and displaying the data source of the selected data in a graphical mode.
Further, the system also comprises a data quality monitoring module; the data quality monitoring module is used for verifying data at a preset time node according to the standard data rule.
The embodiment of the invention has the following beneficial effects:
the embodiment of the invention provides a big data management system comprising a data connection module and a data fusion module. The data connection module extracts data from each data source, which achieves the collection of multi-source data; the data fusion module then standardizes the multi-source data, which resolves the data heterogeneity problem and improves the efficiency with which the data can subsequently be used.
Drawings
Fig. 1 is a schematic flowchart of a big data management system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a big data management system, which includes a data connection module and a data fusion module; the data connection module comprises a database connection submodule and a file data source connection submodule;
the database connection submodule is used for connecting to different types of databases and extracting data from the databases;
the file data source connection submodule is used for receiving data files in different file formats and extracting data from the data files;
and the data fusion module is used for standardizing the data extracted from each data source according to a standard data rule.
Specifically, the different types of databases include, but are not limited to, any combination of the following: mainstream relational databases such as Oracle, DB2, MySQL, SQL Server, KingbaseES, and Dameng; big data platforms such as Hadoop Hive, Spark, and Huawei FusionInsight; NoSQL databases such as MongoDB and Redis; MPP databases such as Greenplum and Teradata; and others such as BW and InterSystems Caché. The database connection submodule provides a database interface for each database type; when a user performs a connection operation for a given database, the connection is made through the corresponding database interface and data is extracted from that database.
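The per-type database interface described above can be pictured as a registry that maps a database type to a connection function. The following is a minimal sketch under stated assumptions, not the patented implementation: the names `CONNECTORS`, `register`, and `extract` are hypothetical, and SQLite is used only because it ships with Python (it is not one of the databases listed in the source).

```python
import sqlite3

# Hypothetical registry: database type -> function that opens a connection.
CONNECTORS = {}

def register(db_type):
    """Decorator that registers a connection function for one database type."""
    def decorator(func):
        CONNECTORS[db_type] = func
        return func
    return decorator

@register("sqlite")
def _connect_sqlite(**params):
    # Assumption: an in-memory database is used when no path is given.
    return sqlite3.connect(params.get("path", ":memory:"))

def extract(db_type, query, **params):
    """Connect through the interface registered for db_type and run a query."""
    try:
        connect = CONNECTORS[db_type]
    except KeyError:
        raise ValueError(f"no interface registered for {db_type!r}") from None
    conn = connect(**params)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()
```

Supporting another database type in this sketch would only require registering one more connection function; the extraction call itself stays unchanged.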
The data files of different file formats include, but are not limited to, any combination of the following: Excel files, TXT files, CSV files, and the like. The file data source connection submodule receives data files in various formats imported by a user, reads each file's extension to identify the file type, and extracts the file data according to that type.
In a preferred embodiment, the data connection module further comprises a service data source connection submodule, which is used for connecting to service data sources such as WebService and JSON services and extracting data from them.
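The extension-based file handling of the file data connection submodule can be sketched as a lookup from file extension to parser. This is an illustration only: the `PARSERS` table and function names are hypothetical, the tab-separated layout for TXT files is an assumption, and Excel parsing is omitted because it needs a third-party library.

```python
import csv
import io
from pathlib import Path

def _parse_csv(text):
    # Standard comma-separated parsing from the csv module.
    return [row for row in csv.reader(io.StringIO(text))]

def _parse_txt(text):
    # Assumption: TXT files hold one tab-separated record per line.
    return [line.split("\t") for line in text.splitlines() if line.strip()]

# Hypothetical mapping from file extension to parser.
PARSERS = {".csv": _parse_csv, ".txt": _parse_txt}

def extract_file_data(filename, text):
    """Identify the file type from its extension and return rows of strings."""
    suffix = Path(filename).suffix.lower()
    parser = PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    return parser(text)
```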
For the data fusion module, the big data management system provides a standard data rule configuration interface on which a user can configure the standard data rule by manual input or by clicking. The configured standard data rule can comprise any one or more of the following: data format, expression, data range (data precision range), data consistency, data integrity, and data uniqueness.
After the user completes the configuration of the standard data rule, the data fusion module performs data format conversion, data cleaning, and other operations on the multi-source data according to the rule, thereby resolving the heterogeneity of the multi-source data.
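As an illustration of rule-driven format conversion and cleaning, the sketch below normalizes date strings from several source formats into one target format and drops records that cannot be converted or that repeat a unique key. The source names the kinds of rules but not their encoding, so the `RULES` dictionary, its keys, and the record layout are all assumptions.

```python
from datetime import datetime

# Hypothetical encoding of a standard data rule.
RULES = {
    "date_format_in": ["%Y/%m/%d", "%d-%m-%Y", "%Y-%m-%d"],
    "date_format_out": "%Y-%m-%d",
    "unique_key": "id",
}

def normalize_date(value, rules):
    """Try each accepted input format; return the standardized date or None."""
    for fmt in rules["date_format_in"]:
        try:
            parsed = datetime.strptime(value.strip(), fmt)
        except ValueError:
            continue
        return parsed.strftime(rules["date_format_out"])
    return None

def standardize(records, rules):
    """Format conversion plus cleaning of invalid and duplicate records."""
    seen, out = set(), []
    for rec in records:
        date = normalize_date(rec["date"], rules)
        key = rec[rules["unique_key"]]
        if date is None or key in seen:
            continue  # cleaning step: skip unparseable or duplicate records
        seen.add(key)
        out.append({**rec, "date": date})
    return out
```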
In a preferred embodiment, the big data management system further comprises a data modeling module; and the data modeling module is used for establishing different types of data models.
Specifically, the different types of data models include any combination of the following data set models: a database table data set model, a file data set model, an SQL data set model, and a user-defined data set model.
For the database table data set model, the data modeling module collects the multi-source data and generates database tables according to the table structure requirements of the different databases. In a relational database, for example, a database table is a two-dimensional structure used to represent and store data objects and the relationships between them. It consists of vertical columns and horizontal rows: in a table named authors holding information about authors, each column contains one type of information for all authors, such as "last name", and each row contains all the information for one author: last name, first name, address, etc. When the database table data set model is constructed, the data modeling module integrates the data into database table data that conforms to the table structure of the relational database. The data modeling module can also copy one or more database tables in response to a data copying operation by the user.
For the file data set model, the data modeling module generates data files according to the preset file format of each data item, and the resulting data files form a file data set, which completes the construction of the file data set model. While constructing the file data set model, the data modeling module can respond to data import and data append operations by the user and append data to the file data already formed.
For the SQL data set model, the data modeling module extracts stored data in response to SQL statements entered by a user and forms a corresponding data table, thereby completing the construction of the SQL data set model.
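The SQL data set model can be illustrated with Python's built-in `sqlite3` module: a user-supplied query is executed against stored data and the result forms a new data table. This is a sketch under assumptions; the function name `build_sql_dataset`, the restriction to SELECT statements, and the sample rows are not from the source, and the `authors` table merely follows the example given for database tables above.

```python
import sqlite3

def build_sql_dataset(conn, user_sql):
    """Run a user-supplied SELECT and return (column names, rows)."""
    # Assumption: only read-only SELECT statements are accepted.
    if not user_sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    cur = conn.execute(user_sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()

# Sample stored data (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (last_name TEXT, first_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?)",
    [("Smith", "Ann", "Leeds"), ("Jones", "Bo", "York")],
)

columns, rows = build_sql_dataset(
    conn, "SELECT last_name, city FROM authors ORDER BY last_name"
)
```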
For the user-defined data set model, the data modeling module generates a user-defined data model creation interface comprising a plurality of graphical components and an editing area, where each graphical component represents a corresponding data processing operation. After a user drags graphical components into the editing area, the data modeling module processes and outputs data according to the operations those components represent, which generates the user-defined data and completes the construction of the user-defined data set model. Preferably, the graphical components include any one or more of the following: a field selection component, a data filtering component, a string cutting component, a row/column conversion component, a duplicate record removal component, a value mapping component, a calculator component, a string replacement component, and a record merging component.
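One way to picture the drag-and-drop components is as small functions chained into a pipeline, each taking and returning a list of records. The sketch below covers four of the listed components; the function names and the record-as-dictionary layout are assumptions made for illustration and do not describe the actual interface.

```python
from functools import reduce

def select_fields(*names):
    """Field selection component: keep only the named fields."""
    return lambda rows: [{n: r[n] for n in names} for r in rows]

def filter_rows(predicate):
    """Data filtering component: keep rows for which predicate is true."""
    return lambda rows: [r for r in rows if predicate(r)]

def map_values(field, mapping):
    """Value mapping component: replace values of one field via a lookup."""
    return lambda rows: [
        {**r, field: mapping.get(r[field], r[field])} for r in rows
    ]

def remove_duplicates():
    """Duplicate record removal component: keep the first of identical rows."""
    def step(rows):
        seen, out = set(), []
        for r in rows:
            key = tuple(sorted(r.items()))
            if key not in seen:
                seen.add(key)
                out.append(r)
        return out
    return step

def run_pipeline(rows, components):
    """Apply the components in the order the user arranged them."""
    return reduce(lambda data, comp: comp(data), components, rows)
```

In this sketch the order of the list passed to `run_pipeline` plays the role of the order in which components are placed in the editing area.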
In a preferred embodiment, the big data management system further includes an update policy configuration module, a data update module, and an update progress monitoring module;
In the invention, the update policy configuration module generates a policy configuration interface on which a user configures a policy for updating data. The update policy comprises the update time and the update range of the data. The update time supports three types: timed, manual (at any time), and delayed updating; timed updating can be scheduled by month, week, day, or hour/minute/second. The update range supports full and incremental updating; incremental updating appends new records by default, and whether modifications and deletions are also synchronized can be configured. The user enters or clicks to select the update time and the update range on the policy configuration interface, and the update policy configuration module then generates a data update policy from those selections.
When the update time arrives, the data updating module updates the data within the update range, according to the update time and update range in the update policy.
During updating, the update progress monitoring module monitors the execution progress of each update policy and raises an alarm as soon as an update error occurs.
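The update-range options can be made concrete with a small sketch: full updating replaces the target with the source, while incremental updating appends new records by default and optionally synchronizes modifications and deletions. The `UpdatePolicy` fields and the dictionary-keyed record layout are assumptions, and scheduling (timed, manual, delayed) is deliberately left out.

```python
from dataclasses import dataclass

@dataclass
class UpdatePolicy:
    mode: str = "incremental"    # "full" or "incremental"
    sync_modified: bool = False  # also overwrite records changed at the source
    sync_deleted: bool = False   # also drop records removed at the source

def apply_update(target, source, policy):
    """Return the updated copy of target (dict keyed by record id)."""
    if policy.mode == "full":
        return dict(source)
    if policy.mode != "incremental":
        raise ValueError(f"unknown update mode: {policy.mode!r}")
    result = dict(target)
    for key, value in source.items():
        if key not in result:
            result[key] = value   # default behaviour: append new records
        elif policy.sync_modified:
            result[key] = value   # optional: synchronize modifications
    if policy.sync_deleted:
        for key in list(result):
            if key not in source:
                del result[key]   # optional: synchronize deletions
    return result
```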
In a preferred embodiment, the big data management system further includes a service package management module and a service package authority management module;
the business package management module is used for responding to data classification operation of a user, storing each data in folders with different names in a classification manner, and displaying the name and the pattern identification of each folder on a user interface;
and the service pack authority management module is used for configuring the operation authority of different users on each folder and the data in each folder.
These modules are used for grouping and classifying the various data sets based on business analysis requirements. The service package management module responds to the user's operation of creating service folders by creating folders with the corresponding service names on the display interface; it then responds to the user's operation of assigning each data item to a service type by storing that data in the folder with the corresponding service name.
The service package authority management module configures, per user role, the operation authority over each folder and the data in it, thereby ensuring data security. For example, a user configured at the master level may delete a folder named A and may add, delete, modify, and query data in folder A.
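The role-based authority described above can be sketched as a lookup table from (role, folder) to the set of permitted operations. The table layout, the operation names, and the `viewer` role are hypothetical; only the master-level permissions on folder A come from the example in the text.

```python
# Hypothetical permission table: (role, folder) -> allowed operations.
PERMISSIONS = {
    ("master", "A"): {"delete_folder", "add", "delete", "modify", "query"},
    ("viewer", "A"): {"query"},
}

def is_allowed(role, folder, operation):
    """Return True if the role may perform the operation on the folder."""
    return operation in PERMISSIONS.get((role, folder), set())
```

A role or folder that has no entry in the table is denied by default in this sketch.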
In a preferred embodiment, the system further comprises a metadata management module; the metadata management module comprises a metadata setting submodule, a metadata checking submodule and a metadata viewing submodule;
the metadata setting submodule is used for setting the name of metadata, metadata description information and metadata binding data standard;
the metadata verification submodule is used for verifying the metadata according to the data rule of the metadata;
and the metadata viewing sub-module is used for responding to the metadata query operation of the user and displaying the data details of the selected metadata.
Metadata, also called intermediate data or relay data, is data that describes data.
The metadata setting submodule generates a metadata information creation interface in which a user can enter the name and description information of the metadata and set the data standard to which the metadata is bound. The metadata setting submodule then stores this information, completing the basic setting of the metadata.
For the metadata verification submodule, verification of the metadata may include a data format check, a data consistency check, a data integrity check, and the like.
For the metadata viewing submodule, the data details of the metadata include any one or a combination of the following: metadata description information, the table to which the metadata belongs, data type, the data standard bound to the metadata, data quality, and the like.
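A metadata entry and its verification might look like the following sketch. The source does not specify how a bound data standard is expressed, so representing it as a Python type plus a regular expression is an assumption, as are the `Metadata` field names and the postal-code example.

```python
import re
from dataclasses import dataclass

@dataclass
class Metadata:
    name: str
    description: str
    data_type: type
    pattern: str = ""  # hypothetical bound data standard, as a regex

def verify(meta, values):
    """Return the values that violate the metadata's bound standard."""
    problems = []
    for v in values:
        if v is None:                                   # integrity check
            problems.append(v)
        elif not isinstance(v, meta.data_type):         # data type check
            problems.append(v)
        elif meta.pattern and not re.fullmatch(meta.pattern, str(v)):
            problems.append(v)                          # data format check
    return problems
```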
In a preferred embodiment, the big data management system further includes a data source tracing module, which is used for graphically displaying the data source of selected data in response to a data source tracing operation by the user.
After responding to the user's tracing operation, the data source tracing module displays the source of the data graphically at three levels: field, table, and service package. The first level shows the field in which the data is located, the second level shows the data table containing that field, and the third level shows the service package containing that data table (that is, the folder in which the data is stored), thereby completing the source tracing of the data.
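The three-level trace (field, table, service package) can be sketched as a lookup over a nested catalog. The `CATALOG` contents and the `trace` function are hypothetical, and the result is returned as plain tuples here rather than rendered graphically as the module does.

```python
# Hypothetical catalog: service package (folder) -> table -> fields.
CATALOG = {
    "sales": {
        "orders": ["order_id", "amount"],
        "customers": ["customer_id", "name"],
    },
    "hr": {"staff": ["staff_id", "name"]},
}

def trace(field):
    """Return every (field, table, service package) path for a field name."""
    return [
        (field, table, package)
        for package, tables in CATALOG.items()
        for table, fields in tables.items()
        if field in fields
    ]
```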
In a preferred embodiment, the big data management system further includes a data quality monitoring module, which is configured to verify data according to the standard data rule at a preset time node.
When the preset time node arrives, the data quality monitoring module checks the quality of the data tables against the standard data rule and identifies problem data. The monitored items include data format verification, data range verification, expression verification, integrity verification, uniqueness verification, consistency verification, and the like.
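Three of the monitored items (integrity, range, and uniqueness) can be sketched as a single pass over the rows of a table. The rule dictionary layout and the function name `check_quality` are assumptions, and the trigger at a preset time node is outside the scope of this sketch.

```python
def check_quality(rows, rules):
    """Return a list of (row index, field, failed check) tuples."""
    problems, seen = [], {}
    for i, row in enumerate(rows):
        for field, rule in rules.items():
            value = row.get(field)
            if value is None:
                if rule.get("required"):
                    problems.append((i, field, "integrity"))
                continue
            lo, hi = rule.get("range", (None, None))
            if lo is not None and not (lo <= value <= hi):
                problems.append((i, field, "range"))
            if rule.get("unique"):
                # Track values already seen for this field across rows.
                if value in seen.setdefault(field, set()):
                    problems.append((i, field, "uniqueness"))
                seen[field].add(value)
    return problems
```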
In a preferred embodiment, the big data management system further comprises a data sharing module, which supports three data sharing modes: service, database pushing, and file pushing.
For database pushing, the data sharing module can set up shared data in three ways: custom configuration, table copying, and SQL. Data sharing policies can be set flexibly, and the execution progress of data sharing tasks can be monitored in real time, with historical task query and error alarms. The service mode shares data through WebService and JSON services and supports visual configuration of the shared data range, active and passive sharing mechanisms, and registration and authorization management for users who access the service interface. For file pushing, shared data can be pushed as files in formats such as Excel, TXT, and XML.
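As a small illustration of the service and file sharing modes, the sketch below serializes the same records as a JSON payload and as an XML document using only the Python standard library. The function names and the element names `rows` and `row` are assumptions; the source does not define the payload layout, and field names are assumed to be valid XML tag names.

```python
import json
import xml.etree.ElementTree as ET

def to_json(rows):
    """Serialize records for a JSON service response."""
    return json.dumps(rows, ensure_ascii=False)

def to_xml(rows, root_tag="rows", row_tag="row"):
    """Serialize records as an XML document for file pushing."""
    root = ET.Element(root_tag)
    for row in rows:
        elem = ET.SubElement(root, row_tag)
        for key, value in row.items():
            # Each field becomes a child element holding the value as text.
            ET.SubElement(elem, key).text = str(value)
    return ET.tostring(root, encoding="unicode")
```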
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (7)

1. A big data management system, comprising a data connection module and a data fusion module, the data fusion module being used for fusing data; the data connection module comprises a database connection submodule and a file data source connection submodule;
the database connection submodule is used for connecting to different types of databases and extracting data from the databases;
the file data source connection submodule is used for receiving data files in different file formats and extracting data from the data files;
and the data fusion module is used for standardizing the data extracted from each data source according to a standard data rule.
2. The big data management system of claim 1, further comprising a data modeling module; and the data modeling module is used for establishing different types of data models.
3. The big data management system of claim 1, further comprising an update policy configuration module, a data update module, and an update progress monitoring module;
the updating strategy configuration module is used for responding to the data updating configuration operation of the user and generating a data updating strategy;
the data updating module is used for updating data according to the data updating strategy;
and the updating progress monitoring module is used for monitoring the execution progress of data updating in real time.
4. The big data management system of claim 1, further comprising a service pack management module and a service pack rights management module;
the business package management module is used for responding to data classification operation of a user, storing each data in folders with different names in a classification manner, and displaying the name and the pattern identification of each folder on a user interface;
and the service pack authority management module is used for configuring the operation authority of different users on each folder and the data in each folder.
5. The big data management system of claim 1, further comprising a metadata management module; the metadata management module comprises a metadata setting submodule, a metadata checking submodule and a metadata viewing submodule;
the metadata setting submodule is used for setting the name of metadata, metadata description information and metadata binding data standard;
the metadata verification submodule is used for verifying the metadata according to the data rule of the metadata;
and the metadata viewing sub-module is used for responding to the metadata query operation of the user and displaying the data details of the selected metadata.
6. The big data management system of claim 1, further comprising a data source tracing module; the data source tracing module is used for responding to the data source tracing operation of the user and displaying the data source of the selected data in a graphical mode.
7. The big data management system according to claim 1, further comprising a data quality monitoring module, wherein the data quality monitoring module is configured to verify data according to the standard data rule at a predetermined time node.
CN202010774634.8A 2020-08-04 2020-08-04 Big data management system Pending CN111881126A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010774634.8A CN111881126A (en) 2020-08-04 2020-08-04 Big data management system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010774634.8A CN111881126A (en) 2020-08-04 2020-08-04 Big data management system

Publications (1)

Publication Number Publication Date
CN111881126A true CN111881126A (en) 2020-11-03

Family

ID=73210174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010774634.8A Pending CN111881126A (en) 2020-08-04 2020-08-04 Big data management system

Country Status (1)

Country Link
CN (1) CN111881126A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112596851A (en) * 2020-12-02 2021-04-02 中国人民解放军63921部队 Multi-source heterogeneous data batch extraction method and analysis method of simulation platform
CN112650850A (en) * 2020-12-25 2021-04-13 胡友彬 Wind and cloud satellite remote sensing mapping data management system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832392A (en) * 2017-10-31 2018-03-23 链家网(北京)科技有限公司 A kind of metadata management system
CN108090205A (en) * 2017-12-27 2018-05-29 南京熊猫电子股份有限公司 A kind of army rear service data system for unified management based on J2EE
CN109241194A (en) * 2018-09-29 2019-01-18 广东省信息工程有限公司 The load-balancing method and device of Database Systems based on High-Performance Computing Cluster distribution
CN109766378A (en) * 2018-12-26 2019-05-17 吕杨 A kind of multi-source heterogeneous water conservancy hydrographic data shared system

Similar Documents

Publication Publication Date Title
CN111159191B (en) Data processing method, device and interface
US10339038B1 (en) Method and system for generating production data pattern driven test data
CN110781236A (en) Method for constructing government affair big data management system
CN105373469A (en) Interface based software automation test method
CN102917009B (en) A kind of stock certificate data collection based on cloud computing technology and storage means and system
CN104200402A (en) Publishing method and system of source data of multiple data sources in power grid
CN102722584B (en) Data storage system and method
CN104298779A (en) Processing method and system for massive data processing
CN111881126A (en) Big data management system
CN112163017B (en) Knowledge mining system and method
CN112988919A (en) Power grid data market construction method and system, terminal device and storage medium
CN104036034A (en) Log analysis method and device for data warehouse
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN113742325A (en) Data warehouse construction method, device and system, electronic equipment and storage medium
CN112579578A (en) Metadata-based data quality management method, device and system and server
CN114661832A (en) Multi-mode heterogeneous data storage method and system based on data quality
CN115544183A (en) Data visualization method and device, computer equipment and storage medium
Ali et al. A state of art survey for big data processing and nosql database architecture
KR101829198B1 (en) A metadata-based on-line analytical processing system for analyzing importance of reports
CN107704620A (en) A kind of method, apparatus of file administration, equipment and storage medium
EP2691881A2 (en) Finding a data item of a plurality of data items stored in a digital data storage
US11693834B2 (en) Model generation service for data retrieval
CN111125045B (en) Lightweight ETL processing platform
CN117436740A (en) Asset benefit evaluation method, device and storage medium
CN110704635B (en) Method and device for converting triplet data in knowledge graph

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination