CN113886378A - Big data management system - Google Patents

Big data management system

Publication number
CN113886378A
Authority
CN
China
Prior art keywords
data
cleaning
management module
unit
management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111220821.2A
Other languages
Chinese (zh)
Inventor
徐育帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Zhongke Advanced Technology Research Institute Co Ltd
Original Assignee
Suzhou Zhongke Advanced Technology Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-10-20
Filing date
2021-10-20
Publication date
2022-01-04
Application filed by Suzhou Zhongke Advanced Technology Research Institute Co Ltd
Priority to CN202111220821.2A
Publication of CN113886378A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Abstract

The invention relates to the field of data management systems, and in particular to a big data management system comprising a data cleaning management module for managing data. The data cleaning management module comprises: a data cleaning rule definition unit for defining the rules for cleaning data; a data cleaning query unit for querying data; a data cleaning rule display unit for displaying the data cleaning rules; and a data modification and deletion unit for modifying or deleting data. By defining data cleaning rules, the system manages and checks the data, prevents key and necessary data from being deleted or cleaned away, and ensures data quality.

Description

Big data management system
Technical Field
The invention relates to the field of data management systems, in particular to a big data management system.
Background
Big data, or massive data, refers to data sets so large that current mainstream software tools cannot capture, manage, process and organize them within a reasonable time into information that actively supports enterprise business decisions. With the development of science and technology, modern equipment brings great convenience to daily life, but more advanced products require a large amount of supporting data, so the collection, processing and management of data are particularly important, and large volumes of data require an efficient and convenient management system. Existing data management systems can clean redundant data, erroneous data and the like from a big data platform according to defined rules, but in doing so they can easily delete or clean away key and necessary data.
Therefore, the prior art has defects and needs to be improved.
Disclosure of Invention
The embodiment of the invention provides a big data management system, which can detect data by defining a data cleaning rule, avoid deleting or cleaning key and necessary data and ensure the quality of the data.
According to an embodiment of the present invention, there is provided a big data governance system, including: a data cleaning management module for managing data; the data cleaning management module comprises:
the data cleaning rule definition unit is used for defining the rule for cleaning the data;
the data cleaning and inquiring unit is used for inquiring data;
the data cleaning rule display unit is used for displaying the rule of data cleaning;
and the data modifying and deleting unit is used for modifying or deleting the data.
Further, the data cleansing management module further comprises:
and the task entry unit is used for adding a new task.
Further, the system further comprises:
and the homepage module is used for browsing the big data management system.
Further, the system further comprises:
and the system management module is used for determining the authority management and the user management of the login user.
Further, the system management module comprises:
the authority management unit is used for managing addition authority, editing authority and deletion authority of the data;
and the user management unit is used for managing the login account.
Further, the system further comprises:
and the log recording management module is used for managing historical data records.
Further, the logging management module comprises:
the task log query unit is used for querying the historical data record;
the cleaning rule log query unit is used for querying the rule records of historical data cleaning;
the input table log unit is used for recording input historical data;
and the output table log unit is used for recording output historical data.
Further, the logging management module further comprises:
and the log classification unit is used for classifying the recorded historical data.
Further, the system further comprises:
and the ETL task management module is used for testing and inquiring the tasks, monitoring and retrieving the tasks and inquiring the operation logs.
Furthermore, the system also comprises a circuit board, and the data cleaning management module, the system management module, the log recording management module and the ETL task management module are connected to the circuit board.
The big data management system in the embodiment of the invention comprises a data cleaning management module for managing data; the data cleaning management module comprises: the data cleaning rule definition unit is used for defining the rule for cleaning the data; the data cleaning and inquiring unit is used for inquiring data; the data cleaning rule display unit is used for displaying the rule of data cleaning; and the data modifying and deleting unit is used for modifying or deleting the data. Through the definition of the data cleaning rule, the data are managed, the data are detected, the key and necessary data are prevented from being deleted or cleaned, and the quality of the data is ensured.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of a big data governance system of the present invention;
FIG. 2 is a preferred flow diagram of the big data governance system of the present invention;
FIG. 3 is a detailed schematic diagram of the big data governance system of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, system, article, or apparatus.
According to an embodiment of the present invention, there is provided a big data governance system, referring to fig. 1 and 2, including: a data cleaning management module 100 for managing data; the data cleansing management module 100 includes:
a data cleansing rule defining unit 101 for defining a rule for cleansing data;
a data cleaning and querying unit 102, configured to query data;
a data cleaning rule display unit 103, configured to display a rule for cleaning data;
and a data modification and deletion unit 104, configured to modify or delete data.
The invention manages the data by defining the data cleaning rule, realizes the detection of the data, avoids deleting or cleaning key and necessary data, and ensures the quality of the data.
Data quality determines whether the data faithfully reflects the underlying information, and high-quality data is a prerequisite for day-to-day business processing and for sound management decisions. The data governance system checks the data and ensures its quality through data quality detection, data error correction and notification, data correction processing, and the like.
The data governance system can visualize the data quality workflow and evaluate the quality of the data in the central database. According to the rules and characteristics of intranet service data, it judges data integrity, conformance to standards, correctness of calculation logic, consistency and the like, assesses data quality, and produces a data quality evaluation report that provides a basis for data acquisition and conversion, statistical analysis and data correction.
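As a minimal sketch of the kind of quality evaluation described above, the following Python code checks a batch of records for completeness and validity and collects the findings in a report. The field names, the `REQUIRED_FIELDS` tuple, the validity rule and the `evaluate_quality` function are illustrative assumptions; the source does not specify how the checks are implemented.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the source does not name any schema.
REQUIRED_FIELDS = ("id", "name", "amount")


@dataclass
class QualityReport:
    total: int = 0
    incomplete: list = field(default_factory=list)  # indexes of records missing a field
    invalid: list = field(default_factory=list)     # indexes of records failing validation


def evaluate_quality(records):
    """Check each record for completeness and one simple validity rule."""
    report = QualityReport(total=len(records))
    for index, record in enumerate(records):
        # Completeness: every required field must be present and non-empty.
        if any(record.get(name) in (None, "") for name in REQUIRED_FIELDS):
            report.incomplete.append(index)
            continue
        # Validity (assumed rule): amount must be a non-negative number.
        amount = record["amount"]
        if not isinstance(amount, (int, float)) or amount < 0:
            report.invalid.append(index)
    return report
```

A report built this way only flags records; deciding whether to correct or clean them is left to the cleaning rules.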
The data governance system mainly provides data cleaning rule definition, data cleaning query, cleaning rule display, modification and deletion functions. Together these functions help a user clean redundant data, erroneous data and the like from the big data platform according to defined rules, while avoiding the deletion or cleaning of key and necessary data.
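One way to picture rule-based cleaning that spares key data is sketched below in Python. The `CleaningRule` class, the `apply_rules` function and the `is_protected` callback are hypothetical names introduced for illustration; the source does not describe how key and necessary data are identified.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CleaningRule:
    name: str
    description: str
    matches: Callable[[dict], bool]  # returns True when a record should be cleaned


def apply_rules(records, rules, is_protected):
    """Split records into kept and removed, never removing protected records."""
    kept, removed = [], []
    for record in records:
        # Find the first rule that flags this record, if any.
        hit = next((rule for rule in rules if rule.matches(record)), None)
        if hit is not None and not is_protected(record):
            removed.append((record, hit.name))
        else:
            kept.append(record)
    return kept, removed
```

Returning the removed records together with the name of the rule that matched them keeps the cleaning step auditable, which fits the log recording functions described later.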
In an embodiment, the data cleansing management module 100 further comprises:
a task entry unit for adding a new task. The data governance system provides cleaning task entry, including the task name and task description.
In an embodiment, the system further comprises:
a homepage module 200 for browsing the big data governance system. The homepage module 200 is used to browse the data cleaning management module 100, the system management module 300, the log recording management module 400 and the ETL task management module 500. Through the homepage module 200, the user can select which module of the system to browse in order to use the corresponding functions.
In an embodiment, the system further comprises:
a system management module 300 for determining the authority management and user management of the logged-in user.
The authority management unit 301 supports adding, editing and deleting authorities; the user management unit 302 manages login accounts and supports adding, editing and deleting users. Authority management also includes authority assignment, specifically adding, modifying and deleting roles; a user is added after authority has been assigned.
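The role-based authority assignment described above could look roughly like the following Python sketch. The `AuthorityManager` class, its method names and the permission strings are assumptions made for illustration, not an interface defined by the source.

```python
class AuthorityManager:
    """Hypothetical authority store: roles hold permissions, users hold one role."""

    def __init__(self):
        self.roles = {}  # role name -> set of permission names
        self.users = {}  # user name -> role name

    def add_role(self, role, permissions):
        self.roles[role] = set(permissions)

    def modify_role(self, role, permissions):
        if role not in self.roles:
            raise KeyError(f"unknown role: {role}")
        self.roles[role] = set(permissions)

    def delete_role(self, role):
        # Refuse to delete a role that is still assigned to a user.
        if role in self.users.values():
            raise ValueError(f"role still assigned: {role}")
        del self.roles[role]

    def add_user(self, user, role):
        # A user is added only after the authority (role) has been defined.
        if role not in self.roles:
            raise KeyError(f"unknown role: {role}")
        self.users[user] = role

    def is_allowed(self, user, permission):
        role = self.users.get(user)
        return role is not None and permission in self.roles.get(role, set())
```

Checking `is_allowed` before opening a module mirrors the login flow in which the system first determines whether the user has the corresponding authority.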
In an embodiment, the system further comprises:
a log recording management module 400 for managing historical data records. During operation, the system generates a large amount of historical data, writes it to log records, and manages the log data through the log recording management module 400.
The log recording management module 400 includes:
a task log query unit 401, configured to query historical data records;
a cleaning rule log query unit 402, configured to query the rule records of historical data cleaning;
an input table log unit 403, configured to record input historical data;
and an output table log unit 404, configured to record output historical data.
The data governance system supports adding a job with a job name and a job description, where a job comprises an input data table, an output data table and a cleaning rule. It provides cleaning task query, supporting queries by task name, task description, task status and entry time. It also supports viewing task details, deleting tasks and modifying task content, where the task content comprises the input and output tables of the job and the cleaning rule.
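The multi-condition task query described above can be sketched in Python as follows. The `CleaningTask` fields and the `query_tasks` signature are assumptions chosen to mirror the four query conditions named in the text (task name, task description, task status and entry time); substring matching on name and description is likewise an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CleaningTask:
    name: str
    description: str
    status: str
    entered_at: datetime


def query_tasks(tasks, name=None, description=None, status=None,
                entered_from=None, entered_to=None):
    """Return the tasks that satisfy every condition that was supplied."""
    matches = []
    for task in tasks:
        if name is not None and name not in task.name:
            continue
        if description is not None and description not in task.description:
            continue
        if status is not None and task.status != status:
            continue
        if entered_from is not None and task.entered_at < entered_from:
            continue
        if entered_to is not None and task.entered_at > entered_to:
            continue
        matches.append(task)
    return matches
```

Conditions left as `None` are ignored, so calling `query_tasks(tasks)` with no conditions returns every task.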
In an embodiment, the logging management module 400 further comprises:
a log classification unit for classifying the recorded historical data. After classification, the historical data is managed by the corresponding units; for example, logs are classified into cleaning rule logs, job logs, task logs, input table logs, output table logs, and the like. The system then provides modification log queries for tasks, jobs, input tables, output tables and cleaning rule tables, supports queries by multiple conditions and by entry time, and displays the query results in lists.
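A minimal Python sketch of classifying logs into the five categories named above and then querying within one category is given below. The `LogCategory` enum values, the dictionary keys `category` and `message`, and both function names are illustrative assumptions rather than details from the source.

```python
from collections import defaultdict
from enum import Enum


class LogCategory(Enum):
    CLEANING_RULE = "cleaning_rule"
    JOB = "job"
    TASK = "task"
    INPUT_TABLE = "input_table"
    OUTPUT_TABLE = "output_table"


def classify_logs(entries):
    """Group log entries by category; an unknown category raises ValueError."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[LogCategory(entry["category"])].append(entry)
    return grouped


def query_logs(grouped, category, keyword):
    """Return the entries of one category whose message contains the keyword."""
    return [e for e in grouped.get(category, []) if keyword in e["message"]]
```

Classifying first and querying second keeps each query confined to one log category, which matches the order of operations the description gives for log management.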
In an embodiment, the system further comprises:
an ETL task management module 500 for testing and querying tasks, monitoring and retrieving tasks, and querying operation logs. ETL task management includes task debugging query, task debugging monitoring query, operation log query and the like.
In an embodiment, in any one of the big data governance systems described above, the system further includes a circuit board, and the data cleaning management module 100, the system management module 300, the log recording management module 400 and the ETL task management module 500 are connected to the circuit board. Each module is electrically connected to the circuit board.
Referring to FIG. 3, after a user logs in, the system first determines whether the user has the corresponding authority. If so, the user can, depending on the authority granted, enter homepage browsing, system management, data cleaning management, ETL task management, log recording management, and the like.
The system management can perform operations including adding authority, editing authority and deleting authority; or operations of adding, editing and deleting users can be performed. The authority management also comprises authority distribution, specifically comprising role adding, role modifying and role deleting; and adding the user after the authority is distributed.
Data cleaning management comprises cleaning job entry, with task addition, task editing and a decision on whether to delete a task. It further comprises cleaning job query and detail viewing, a decision on whether to delete rules and input and output tables, querying rules and output tables, and modifying rules and input and output tables. It also comprises cleaning rule statistics and display of the statistical results.
The ETL task management comprises ETL task classification, task debugging query/task debugging monitoring retrieval/operation log query, query condition setting and query result display.
Log recording management first classifies logs into cleaning rule logs, job logs, task logs, input table logs and output table logs, then queries the logs within a category using the entered query keywords, and displays the matching logs.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims (10)

1. A big data governance system, comprising: a data cleaning management module for managing data; the data cleaning management module comprises:
the data cleaning rule definition unit is used for defining the rule for cleaning the data;
the data cleaning and inquiring unit is used for inquiring data;
the data cleaning rule display unit is used for displaying the rule of data cleaning;
and the data modifying and deleting unit is used for modifying or deleting the data.
2. The big data governance system of claim 1, wherein the data cleansing management module further comprises:
and the task entry unit is used for adding a new task.
3. The big data governance system according to claim 1, wherein the system further comprises:
and the homepage module is used for browsing the big data management system.
4. The big data governance system according to claim 1, wherein the system further comprises:
and the system management module is used for determining the authority management and the user management of the login user.
5. The big data governance system according to claim 4, wherein the system management module comprises:
the authority management unit is used for managing addition authority, editing authority and deletion authority of the data;
and the user management unit is used for managing the login account.
6. The big data governance system according to claim 1, wherein the system further comprises:
and the log recording management module is used for managing historical data records.
7. The big data governance system according to claim 6, wherein the logging management module comprises:
the task log query unit is used for querying the historical data record;
the cleaning rule log query unit is used for querying the rule records of historical data cleaning;
the input table log unit is used for recording input historical data;
and the output table log unit is used for recording output historical data.
8. The big data governance system of claim 7, wherein the logging management module further comprises:
and the log classification unit is used for classifying the recorded historical data.
9. The big data governance system according to claim 1, wherein the system further comprises:
and the ETL task management module is used for testing and inquiring the tasks, monitoring and retrieving the tasks and inquiring the operation logs.
10. The big data governance system according to any one of claims 1 to 9, wherein the system further comprises a circuit board, and the data cleaning management module, the system management module, the logging management module, and the ETL task management module are connected to the circuit board.
CN202111220821.2A 2021-10-20 2021-10-20 Big data management system Pending CN113886378A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111220821.2A CN113886378A (en) 2021-10-20 2021-10-20 Big data management system

Publications (1)

Publication Number Publication Date
CN113886378A true CN113886378A (en) 2022-01-04

Family

ID=79003656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111220821.2A Pending CN113886378A (en) 2021-10-20 2021-10-20 Big data management system

Country Status (1)

Country Link
CN (1) CN113886378A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115688054A (en) * 2023-01-04 2023-02-03 成都中轨轨道设备有限公司 Task classification processing method based on big data processing framework

Similar Documents

Publication Publication Date Title
CN108959564B (en) Data warehouse metadata management method, readable storage medium and computer device
US11195136B2 (en) Business performance bookmarks
CN103246595B (en) Application management method, device, server and terminating unit
CN110704277B (en) Method for monitoring application performance, related equipment and storage medium
CN108509326B (en) Service state statistical method and system based on nginx log
CN111158983A (en) Integrated operation and maintenance management system
JP2009075655A (en) File management system, file management method, and file management program
CN110209518A (en) A kind of multi-data source daily record data, which is concentrated, collects storage method and device
CN113051147A (en) Database cluster monitoring method, device, system and equipment
CN111612341A (en) IT equipment allocation management and control system and allocation method thereof
CN112162960A (en) Health government affair information sharing method, device and system
CN108073720B (en) Data quality management system and method applied to big data system
CN112817958A (en) Electric power planning data acquisition method and device and intelligent terminal
CN113886378A (en) Big data management system
CN203492034U (en) Data center server and asset management system, and server management device
CN111428139A (en) Safety supervision and inspection method and system
CN111342994A (en) Network management system and method
KR100956142B1 (en) System and method for managing intellectual property based on indicated diagram type
CN110210761A (en) Structure adjusting is matched and information management system
CN111371574A (en) Operation and maintenance management system platform for monitoring machine room
CN112433888B (en) Data processing method and device, storage medium and electronic equipment
CN109412861B (en) Method for establishing security association display of terminal network
CN114328159A (en) Abnormal statement determination method, device, equipment and computer readable storage medium
JP2018147350A (en) Apparatus for analyzing actual use of information processing system, and method for analyzing actual use
CN109933798B (en) Audit log analysis method and audit log analysis device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination