Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is to provide a data management system and method based on big data technology that help enterprises build data acquisition platform applications, improve development efficiency, and resolve quality problems in the data production process.
In a first aspect, the present application provides a data management system based on big data technology, the system comprising:
a data acquisition module, configured to read the background log of a database, the environment variable information of the database, and the system view; to parse the background log and the system view of the database through a data analysis program as the service data changes in real time; and to parse out real-time incremental data by means of a data acquisition requirement service table;
a data transposition module, configured to access the real-time incremental data into a big data environment according to the template configured by the data acquisition module;
a source metadata sensing module, configured to automatically sense a change in the table structure of the real-time incremental data when such a change occurs, adjust the version number of the data message, and notify a downstream system; and
a monitoring and early warning module, configured to check the real-time traffic volume and the data delay of the service data stream, and to set an early warning policy for cases in which the data traffic is abnormal or the data delay is large.
In an optional implementation manner, the system further includes a login module configured to provide multi-tenant management: an administrator has the right to add accounts and passwords, and service personnel log in to the data management system with an added account and password.
In an optional implementation manner, the system further includes a Web management module configured to manage the login module, the data acquisition module, the data transposition module, the source metadata sensing module, the data desensitization module, and the monitoring and early warning module in a unified manner in the form of a Web system.
In an optional implementation manner, the data acquisition module is further configured to encrypt data to form an encrypted data file and, according to the service volume requirement of the data acquisition requirement service table, if the service needs full data acquisition, to enable the program's full acquisition configuration in the encrypted data file, perform full data acquisition, and automatically switch to incremental data acquisition after the full acquisition is completed.
In an optional implementation manner, the data transposition module imports the real-time incremental data into the big data environment based on a highly available real-time streaming architecture.
In an optional implementation manner, the system further includes a data desensitization module configured to desensitize privacy-sensitive data in the real-time incremental data in real time, where the privacy-sensitive data includes, but is not limited to, identity card numbers, system amounts, names, and account numbers.
In an optional implementation manner, the desensitization methods include direct replacement, salted desensitization, regular expressions, and personalized desensitization through JAR packages custom-developed by developers.
In an optional implementation manner, the monitoring and early warning module is further configured to send an email or a short message, according to the early warning policy, to alert and promptly notify the person responsible for the order when the data traffic is abnormal or the data delay is large.
In a second aspect, the present application provides a data management method based on big data technology, including the steps of:
reading a database background log, database environment variable information, and a system view; parsing the background log and the database system view through a data analysis program as the service data changes in real time; and parsing out real-time incremental data by means of a data acquisition requirement service table;
accessing the real-time incremental data into a big data environment according to a template configured by the data acquisition module, and, when the data message changes, adjusting the version number of the data message and notifying a downstream system; and
checking the real-time traffic volume and the data delay of the service data stream, and setting an early warning policy for cases in which the data traffic is abnormal or the data delay is large.
In a third aspect, an embodiment of the present application provides a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the data management method according to the second aspect when executing the computer program.
The data management system based on big data technology helps enterprises build data acquisition platform applications and improves development efficiency. It helps enterprises solve the problem of putting big data into practice in a one-stop manner, supports their digital innovation, promotes industrial digital upgrading, and resolves quality problems in the data production process.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a schematic diagram of a data management system 100 based on big data technology according to an embodiment of the present invention. The data management system includes a login module 101, a data acquisition module 102, a data transposition module 103, a source metadata sensing module 104, a data desensitization module 105, a monitoring and early warning module 106, and a Web management module 107.
Specifically, the login module 101 is used for a system administrator to log in to the data acquisition platform through a user name and a password.
Because the application reuses the login process of the service, the user does not need to perform a separate, application-specific login in addition to the service login. Once the user has logged in to the application, the application may perform one or more operations on the service on the user's behalf. These operations are performed under the identity with which the user was associated in the application through the "shared" login process.
The data acquisition module 102 is configured to read a database background log, database environment variable information, and a system view; to parse the background log and the database system view through a data analysis program as the service data changes in real time; and to parse out real-time incremental data by means of a data acquisition requirement service table. The module supports relational databases.
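The selection of incremental data described above can be illustrated with a minimal sketch. It assumes that the background log has already been parsed into change events and that the data acquisition requirement service table has been reduced to a set of table names; the `ChangeEvent` shape, the `position` field, and the function name are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass(frozen=True)
class ChangeEvent:
    """One parsed entry of the database background log (hypothetical shape)."""
    position: int   # monotonically increasing log position
    table: str      # table the change belongs to
    op: str         # "INSERT", "UPDATE" or "DELETE"
    row: dict       # column values carried by the log entry


def incremental_changes(events: Iterable[ChangeEvent],
                        required_tables: Set[str],
                        last_position: int) -> Iterator[ChangeEvent]:
    """Yield only the events that are new and that the requirement table asks for."""
    for event in events:
        is_new = event.position > last_position
        is_required = event.table in required_tables
        if is_new and is_required:
            yield event
```

In this sketch, `last_position` plays the role of a checkpoint: only log entries after it count as incremental data, and entries for tables that are not listed in the requirement table are skipped.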
The data transposition module 103 uses the stream-oriented computing method of a big data processing framework and can synchronize both incremental data and full data. It also accesses the real-time incremental data into the big data environment.
The source metadata sensing module 104 is used for automatically sensing a change in the table structure of the data source when such a change occurs, adjusting the version number of the data message, and notifying a downstream system through a Kafka message and an email.
Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. It is a high-throughput distributed publish-subscribe messaging system that can handle all the action stream data of consumers on a website. Such actions (web browsing, searching, and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, these data are typically handled through log processing and log aggregation. For systems that, like Hadoop, work with log data and offline analysis but also require real-time processing, Kafka is a viable solution. The purpose of Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism and to provide real-time messages across a cluster.
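The behaviour of the source metadata sensing module 104 (sense a table structure change, adjust the version number, notify downstream) can be sketched as follows. This is only an illustration under stated assumptions: the table structure is represented as a mapping from column name to type, the version number is a simple counter, and the `notify` callback is a hypothetical stand-in for the Kafka message and email mentioned above; no real Kafka client is used.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


def schema_fingerprint(columns: Dict[str, str]) -> str:
    """Hash a column-name -> type mapping, independent of column order."""
    canonical = json.dumps(sorted(columns.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaTracker:
    """Tracks one version number per table and bumps it on structure changes."""

    def __init__(self, notify: Callable[[dict], None]) -> None:
        self._notify = notify  # hypothetical hook for the downstream notification
        self._state: Dict[str, Tuple[str, int]] = {}  # table -> (fingerprint, version)

    def observe(self, table: str, columns: Dict[str, str]) -> int:
        """Record the current structure of a table and return its message version."""
        fingerprint = schema_fingerprint(columns)
        known = self._state.get(table)
        if known is None:
            # First sighting of the table: start at version 1, nothing to announce.
            self._state[table] = (fingerprint, 1)
            return 1
        old_fingerprint, version = known
        if fingerprint != old_fingerprint:
            version += 1
            self._state[table] = (fingerprint, version)
            self._notify({"table": table, "version": version, "columns": columns})
        return version
```

A tracker created with `SchemaTracker(notify=print)` would print one notification each time `observe` sees a structure that differs from the last one recorded for the same table.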
The data desensitization module 105 is used for desensitizing, in real time, the relatively privacy-sensitive data in the data message, such as identity card numbers, system amounts, names, and account numbers. Data desensitization refers to transforming sensitive information according to desensitization rules so that sensitive private data is reliably protected. Sensitive information is masked by the desensitization technique while the original data format and attributes are retained, so that systems and applications can run normally on the desensitized data during development and testing, data analysis, and trial calculation. Supported desensitization methods are direct replacement, salted desensitization, regular expressions, and personalized desensitization through JAR packages custom-developed by developers.
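Three of the desensitization methods named above (direct replacement, salting, and regular expressions) can be sketched as follows. The function names, the choice of SHA-256 for the salted variant, and the 18-character identity number pattern are assumptions made for illustration only; the embodiment does not specify them.

```python
import hashlib
import re


def mask_replace(value: str, keep_head: int = 3, keep_tail: int = 4,
                 fill: str = "*") -> str:
    """Direct replacement: keep the ends, replace the middle, preserve the length."""
    if len(value) <= keep_head + keep_tail:
        return fill * len(value)
    middle = len(value) - keep_head - keep_tail
    return value[:keep_head] + fill * middle + value[len(value) - keep_tail:]


def salted_hash(value: str, salt: str) -> str:
    """Salted desensitization: one-way SHA-256 digest of salt + value."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


# Assumed format: 17 digits followed by a digit or X (18 characters in total).
ID_PATTERN = re.compile(r"\b(\d{6})\d{8}(\d{3}[\dXx])\b")


def mask_ids_in_text(text: str) -> str:
    """Regular-expression desensitization: hide the middle 8 digits of each match."""
    return ID_PATTERN.sub(lambda m: m.group(1) + "*" * 8 + m.group(2), text)
```

Direct replacement and the regular-expression variant keep the original length and format, which matches the requirement that desensitized data remain usable in testing and analysis; the salted hash does not preserve format but yields the same output for the same input and salt, so it can still be used as a join key.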
The monitoring and early warning module 106 is configured to check the real-time traffic volume and the data delay of the data stream. Rules can be configured to set an early warning policy so that, when the data traffic is abnormal or the data delay is large, an email or a short message is sent to alert and promptly notify the person responsible for the order.
The Web management module 107 is used for managing the login module, the data acquisition module, the data transposition module, the source metadata sensing module, the data desensitization module, and the monitoring and early warning module in the form of a Web system.
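A rule-based early warning policy of the kind described for module 106 might look like the sketch below. The thresholds, the field names, and the idea of returning alert texts (rather than sending an email or short message directly) are assumptions for illustration; the embodiment does not define what counts as "abnormal" traffic or "large" delay.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlertPolicy:
    """Hypothetical early warning policy with configurable thresholds."""
    min_rate: float     # records per second below which traffic is abnormal
    max_rate: float     # records per second above which traffic is abnormal
    max_delay_s: float  # delay in seconds above which the delay is too large


def evaluate(policy: AlertPolicy, rate: float, delay_s: float) -> List[str]:
    """Return the alert messages triggered by the current traffic and delay."""
    alerts = []
    if rate < policy.min_rate or rate > policy.max_rate:
        alerts.append(f"abnormal traffic: {rate:.1f} records/s")
    if delay_s > policy.max_delay_s:
        alerts.append(f"high delay: {delay_s:.1f} s")
    return alerts
```

The returned messages would then be handed to whatever email or short message channel the deployment uses; an empty list means that no alert is due.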
Referring to fig. 2, fig. 2 is a schematic diagram of a login method of the data management system based on big data technology according to an embodiment of the present invention. As shown in fig. 2, a user or an administrator may log in to the big data management system platform with a user name 201 and a password 202, and a new user may first register 203 a user name and password and then log in 204 to the platform.
Optionally, the multi-tenant management provided not only manages the resources of each tenant but also provides various services to the tenants. A tenant first needs to apply for resources, for example storage resources and computing resources, and can use the services provided by tenant management only after the application succeeds. Accordingly, the resource information may be the amounts of the resources requested by the target tenant, such as the amount of storage resources, computing resources, or database resources.
Referring to fig. 3, fig. 3 is a flowchart of a data management method based on big data technology according to an embodiment of the present invention. As shown in fig. 3, the method includes the following steps:
Step 301: reading a background log of a database, environment variable information of the database, and a system view; parsing the background log and the system view of the database through a data analysis program as the service data changes in real time; and parsing out real-time incremental data by means of a data acquisition requirement service table. If the service needs full data acquisition, the program's full acquisition configuration in the encrypted data file is enabled, full data acquisition is performed, and incremental data acquisition starts automatically after the full acquisition is completed.
Step 302: accessing the real-time incremental data into a big data environment according to a template configured by the data acquisition module; when the data message changes, adjusting the version number of the data message and notifying a downstream system.
Step 303: checking the real-time traffic volume and the data delay of the service data stream, and setting an early warning policy for cases in which the data traffic is abnormal or the data delay is large. When the data traffic is abnormal or the data delay is large, an email or a short message is sent according to the early warning policy to alert and promptly notify the person responsible for the order.
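The switch from full acquisition to incremental acquisition described in step 301 can be sketched as a single generator. The sketch assumes that the acquisition configuration has already been decrypted into a dictionary, and the `full_acquisition` key and the two reader callbacks are hypothetical names introduced here for illustration.

```python
from typing import Callable, Iterable, Iterator, Tuple


def acquire(config: dict,
            read_snapshot: Callable[[], Iterable[dict]],
            read_log: Callable[[], Iterable[dict]]) -> Iterator[Tuple[str, dict]]:
    """Emit full data first when configured, then continue with incremental data."""
    if config.get("full_acquisition", False):
        # Full acquisition: every existing row of the source is emitted once.
        for row in read_snapshot():
            yield ("full", row)
    # Incremental acquisition starts automatically once the full pass is done.
    for change in read_log():
        yield ("incremental", change)
```

Each emitted pair is tagged with its origin, so that a downstream consumer can tell initial-load rows from later changes.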
An electronic device 400 according to this embodiment of the invention is described below with reference to fig. 4. The electronic device 400 shown in fig. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention. As shown in fig. 4, the electronic device 400 takes the form of a general-purpose computing device. The components of the electronic device 400 may include, but are not limited to: at least one processing unit 410, at least one storage unit 420, and a bus 430 that couples the various system components, including the storage unit 420 and the processing unit 410. The storage unit stores program code that can be executed by the processing unit 410, so that the processing unit 410 performs the steps according to various exemplary embodiments of the present invention as described in the above section "example methods" of the present specification. The storage unit 420 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 421 and/or a cache memory unit 422, and may further include a read-only memory unit (ROM) 423. The storage unit 420 may also include a program/utility 424 having a set (at least one) of program modules 425, such program modules 425 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The bus 430 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures. The electronic device 400 may also communicate with one or more external devices 500 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 400, and/or with any devices (e.g., a router, a modem, etc.) that enable the electronic device 400 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 450. Also, the electronic device 400 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 460. As shown, the network adapter 460 communicates with the other modules of the electronic device 400 over the bus 430. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.
As shown in fig. 5, a program product 500 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
The big data management platform focuses on big data management, application development, intelligent visual analysis, and similar areas. It helps enterprises solve the problem of putting big data into practice in a one-stop manner, supports their digital innovation, and promotes industrial digital upgrading.
The workstation environment of the big data management platform is divided into a hardware environment and a software environment. For the hardware environment, an ordinary PC can be used, with the following basic configuration: 1) CPU: Intel Core i3 or better; 2) memory: more than 1 GB; 3) hard disk: more than 100 GB; 4) network card: modem or 10M/100M network card. For the software environment, the following software must be installed on the workstation computer: 1) Windows 2003/XP/7/8/8.1; 2) IE 8.0 or higher (the Google browser is recommended; it is needed only to open server operations).
In summary, the data management system based on big data technology in this embodiment is provided with a login module, a data acquisition module, a data transposition module, a source metadata sensing module, a data desensitization module, a monitoring and early warning module, and a Web management module. The embodiment of the invention helps enterprises build data acquisition platform applications and improves development efficiency. It helps enterprises solve the problem of putting big data into practice in a one-stop manner, supports their digital innovation, promotes industrial digital upgrading, and resolves quality problems in the data production process.
The foregoing is merely an embodiment of the present invention. Common general knowledge, such as specific structures and characteristics well known in the art, is not described here in detail. A person skilled in the art knows the common technical knowledge in the field as of the filing date or the priority date, has access to the prior art in the field, and is able to apply routine experimentation as of that date; such a person can therefore, in light of the teaching of the present application, complete and implement the present invention, and certain typical known structures or known methods should not be an obstacle to that implementation. It should be noted that a person skilled in the art may make several changes and modifications without departing from the structure of the present invention; these also fall within the protection scope of the present invention and do not affect the effect of implementing the invention or the practicability of the patent. The protection scope of the present application is determined by the content of the claims, and the description of the embodiments in the specification may be used to interpret the claims.