WO2021013057A1 - Data management method and apparatus, and device and computer-readable storage medium - Google Patents

Data management method and apparatus, and device and computer-readable storage medium

Info

Publication number
WO2021013057A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
type
file
source
import
Prior art date
Application number
PCT/CN2020/102540
Other languages
French (fr)
Chinese (zh)
Inventor
王和平
尹强
刘有
黄山
杨峙岳
邸帅
卢道和
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021013057A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/11: File system administration, e.g. details of archiving or snapshots
    • G06F16/116: Details of conversion of file system types or formats
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/172: Caching, prefetching or hoarding of files
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the technical field of financial technology (Fintech), in particular to data management methods, devices, equipment and computer-readable storage media.
  • existing data management methods are not associated with the individual databases, and each database uses a different logic language. They generally target a single database and can only export data from the database to local storage or import local data into the database, so the import and export options are relatively limited. The data also cannot be processed during import or export, which makes import and export rigid and prevents the data from being managed intelligently.
  • the main purpose of this application is to propose a data management method, device, equipment and computer-readable storage medium, aiming to realize intelligent management of data.
  • the data management method includes the following steps:
  • the data content of the first data corresponding to the data import request is read, and the first data type corresponding to the first data is determined based on the data content
  • the steps include:
  • the number of occurrences of each data type in the second data type is counted, and the data type with the most frequency is determined as the first data type corresponding to the first data.
  • the step of determining a first data source corresponding to the first data based on the first data type includes:
  • the second data source is used as the first data source corresponding to the first data.
  • the step of importing the first data set into the first data source includes:
  • the data management method further includes:
  • when a data export request is detected, configuration information of the data export request is obtained, where the configuration information includes a third data source, a query statement, a file format, and an output path;
  • the second data set is written into the file write-out object, and the file write-out object is exported to a terminal corresponding to the output path.
  • the file format includes second column information, a second conversion format corresponding to the second column information, and a file format type, and the step of generating a second data set from the second data based on the file format and determining the file write-out object corresponding to the second data set includes:
  • the step of writing the second data set into the file write-out object includes:
  • this application also provides a data management device, the data management device including:
  • a reading module configured to read the data content of the first data corresponding to the data import request when a data import request is detected, and determine the first data type corresponding to the first data based on the data content;
  • a determining module configured to determine a first data source corresponding to the first data based on the first data type
  • a generating module configured to determine a first conversion format and first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information;
  • the import module is used to import the first data set into the first data source.
  • the reading module is further used for:
  • the number of occurrences of each data type in the second data type is counted, and the data type with the most frequency is determined as the first data type corresponding to the first data.
  • the determining module is further used for:
  • the second data source is used as the first data source corresponding to the first data.
  • the import module is also used to:
  • the data management device further includes:
  • the obtaining module is configured to obtain configuration information of the data export request when the data export request is detected, the configuration information including the third data source, query statement, file format and output path;
  • the obtaining module is further configured to obtain second data corresponding to the data export request based on the third data source and the query sentence;
  • the generating module is further configured to generate a second data set from the second data based on the file format, and determine a file writing object corresponding to the second data set;
  • the export module is configured to write the second data set into the file write-out object, and export the file write-out object to the terminal corresponding to the output path.
  • the file format includes a second column of information, a second conversion format and file format type corresponding to the second column of information, and the generating module is further configured to:
  • the export module is further used to:
  • this application also provides a data management device, the data management device including: a memory, a processor, and a data management program that is stored on the memory and can run on the processor; when the data management program is executed by the processor, the steps of the data management method described above are implemented.
  • the present application also provides a computer-readable storage medium having a data management program stored thereon; when the data management program is executed by a processor, the steps of the data management method described above are implemented.
  • in the data management method proposed in this application, when a data import request is detected, the data content of the first data corresponding to the data import request is read, and the first data type corresponding to the first data is determined based on the data content; based on the first data type, the first data source corresponding to the first data is determined; the first conversion format and first column information of the first data are determined, and a first data set is generated from the first data based on the first conversion format and the first column information; and the first data set is imported into the first data source.
  • this application processes the data corresponding to the data import request, and imports the processed data into the data source by determining the corresponding data source to realize intelligent data management.
  • FIG. 1 is a schematic diagram of a device structure of a hardware operating environment involved in a solution of an embodiment of the present application
  • FIG. 3 is a schematic flowchart of a second embodiment of the data management method of this application.
  • FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.
  • the device in the embodiment of this application may be a PC or a server device.
  • the device may include a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the foregoing processor 1001.
  • the device structure shown in FIG. 1 does not constitute a limitation on the device, which may include more or fewer components than those shown in the figure, a combination of certain components, or a different arrangement of components.
  • a memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a data management program.
  • the operating system is a program that manages and controls data management equipment and software resources, and supports the operation of network communication modules, user interface modules, data management programs, and other programs or software;
  • the network communication module is used to manage and control the network interface 1004; the user interface module is used to manage and control the user interface 1003.
  • the data management device calls the data management program stored in the memory 1005 through the processor 1001, and executes the operations in the following embodiments of the data management method.
  • FIG. 2 is a schematic flowchart of a first embodiment of a data management method according to this application, and the method includes:
  • Step S10 when a data import request is detected, read the data content of the first data corresponding to the data import request, and determine the first data type corresponding to the first data based on the data content;
  • Step S20 Determine a first data source corresponding to the first data based on the first data type
  • Step S30 Determine a first conversion format and first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information;
  • Step S40 Import the first data set into the first data source.
  • the data management method of this embodiment is applied to the data management equipment of organizations such as financial institutions or banking systems.
  • the data management equipment is hereinafter referred to as the management equipment, and the management equipment may be a terminal or a PC equipment.
  • Spark is a fast, general-purpose computing engine designed for large-scale data processing.
  • the management device supports importing to and exporting from various types of data storage components, such as Hive, Mysql, Oracle, HDFS, Hbase, and Mongodb.
  • support for additional data storage component classes is added through the DataSource API provided by Spark; the specific program segments are edited according to actual needs, so that the DataSource API supports connecting to multiple data sources.
  • the implementation of this embodiment relies on Spark's distributed computing capabilities and on the DataSource API (data source call interface), which supports connecting to multiple data sources.
  • Spark's native DataSource is a framework that connects external data sources to the Spark engine. It mainly gives the Spark framework the ability to read external data quickly, and it allows different data formats to be registered as Spark tables through the DataSource API (call interface). It already supports file formats such as JSON, ORC, and Parquet, but the set of supported formats is limited and does not meet actual needs.
  • the embodiment of this application adds support for file formats such as Excel (for example, xls files from the 2003 version and xlsx files from the 2007 version onward), CSV, and TXT through the DataSource API provided by Spark. The specific program segments are edited according to the file format; that is, by adding supported file formats to the management device, the management device can import and export multiple types of files.
  • when the management device of this embodiment detects a data import request, it processes the data corresponding to the data import request, determines the corresponding data source, and imports the processed data into that data source, thereby realizing intelligent data management.
  • Step S10 when a data import request is detected, read the data content of the first data corresponding to the data import request, and determine the first data type corresponding to the first data based on the data content;
  • the management device can complete the data import.
  • when the management device detects the data import request, it reads the data content of the first data corresponding to the data import request and identifies the first data type corresponding to the first data according to the data content; that is, before the first data is imported into the corresponding data source, the first data type of the first data is determined, so that the first data can subsequently be imported into the correct data source.
  • the management device supports import to and export from data sources such as Hive, Mysql, Oracle, HDFS, Hbase, and Mongodb; to import the first data accurately, the first data type of the first data needs to be determined first.
  • step S10 includes:
  • Step a When a data import request is detected, read the data content of the preset number of rows of first data corresponding to the data import request, and determine the second data type to which the column information of each column in the data content belongs;
  • the file reading object (Reader) in the management device will read the first data corresponding to the data import request.
  • the number of rows to read can be preset; that is, the Reader only needs to read the data content of the preset number of rows, and the first data type of the first data is judged from the data content that was read.
  • the preset number of rows may refer to the first preset number of rows of the first data. The second data type to which the column information of each column in the read data content belongs is then determined: if the column information of the current column is a number, the data type of the current column is determined to be a number type; if the column information of the current column is a character, the data type of the current column is determined to be a character type; and so on.
  • Step b Count the number of occurrences of each data type in the second data type, and determine the data type with the largest number of times as the first data type corresponding to the first data.
  • after the second data type to which the column information of each column in the read data content belongs is determined, the number of occurrences of each data type among the second data types is counted, and the data type with the most occurrences is determined as the first data type corresponding to the first data: if the number type appears most often, the first data is determined to be a number type; if the character type appears most often, the first data is determined to be a character type; and so on.
  • the preset number of rows is preferably 10; that is, the Reader reads the data content of the first 10 rows of the first data, and the data type inferrer of the management device judges the data types of that content. It determines the data type of each column in the data content and infers the type by judging which type appears most often, for example: user: String, orderId: Int.
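  • Steps a and b can be illustrated with a minimal sketch in plain Python rather than Spark. It assumes one reading of the text, namely that the vote is taken per column over the sampled rows; the type names `Int`, `Double`, and `String` and the function names are hypothetical and do not come from the application.

```python
from collections import Counter

PRESET_ROWS = 10  # the embodiment's preferred preset number of rows


def infer_cell_type(value: str) -> str:
    """Classify one cell as Int, Double, or String (hypothetical type names)."""
    text = value.strip()
    try:
        int(text)
        return "Int"
    except ValueError:
        pass
    try:
        float(text)
        return "Double"
    except ValueError:
        return "String"


def infer_column_types(header, rows, preset_rows=PRESET_ROWS):
    """Vote on each column's type using only the first `preset_rows` rows."""
    sample = rows[:preset_rows]
    inferred = {}
    for index, name in enumerate(header):
        votes = Counter(infer_cell_type(row[index]) for row in sample)
        # Fall back to String when there is no sampled row to vote with.
        inferred[name] = votes.most_common(1)[0][0] if votes else "String"
    return inferred


header = ["user", "orderId"]
rows = [["alice", "1001"], ["bob", "1002"], ["carol", "n/a"]]
column_types = infer_column_types(header, rows)
```

  • In this sample the `orderId` column still resolves to `Int` despite one non-numeric cell, because the majority of sampled cells are integers. That is also why the inferred result is worth returning to the user for confirmation.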
  • the judgment result that is, the first data type
  • the management device changes the data type of the first data according to the user's modification wishes.
  • Step S20 Determine a first data source corresponding to the first data based on the first data type.
  • the management device determines the first data source corresponding to the first data based on the determined first data type, that is, determines the first data source into which the first data is to be imported. Specifically, the data type and the data source are mapped in advance to obtain the data type-data source mapping table. When the first data type of the first data is determined, the data type-data source mapping table can be used to determine the first data corresponding The first data source.
  • step S20 includes:
  • Step c Determine the first data source corresponding to the first data based on the first data type, and return the first data source to the client corresponding to the data import request;
  • the management device determines the first data source corresponding to the first data type based on the first data type, and returns the first data source to the client corresponding to the data import request for confirmation by the user of the client.
  • Step d If a second data source sent by the user terminal based on the first data source is received, the second data source is used as the first data source corresponding to the first data.
  • the user can confirm through the user terminal whether the inference of the management device is correct, and if the inference is incorrect, the user terminal can send a corresponding modification instruction for the management device to modify the data type of the first data.
  • if the management device receives the second data source sent by the user terminal based on the first data source, it uses the second data source as the first data source corresponding to the first data; if it does not receive a second data source sent by the user terminal based on the first data source, or it receives a confirmation instruction based on the first data source, the first data source is determined to be the data source corresponding to the first data.
  • after the management device determines the first data type of the first data, it returns the first data type to the user for confirmation, which improves the accuracy of judging the data type of the first data. When the management device modifies the data type of the first data, the modified data type is saved, so that the next time it encounters the same data content as the first data, the data type can be obtained accurately.
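  • The mapping lookup in step S20 and the confirmation in steps c and d might look like the following sketch. The entries of `TYPE_TO_SOURCE` are placeholders, since the application does not list the actual data type to data source pairs, and `resolve_data_source` is a hypothetical name.

```python
from typing import Optional

# Hypothetical data type -> data source mapping table; the application does
# not list the real pairs, so these entries are placeholders.
TYPE_TO_SOURCE = {"Int": "Mysql", "String": "Hive"}


def resolve_data_source(first_data_type: str,
                        user_choice: Optional[str] = None) -> str:
    """Return the data source that the first data will be imported into.

    `user_choice` models the second data source that the user terminal may
    send back after reviewing the suggestion; None models a confirmation
    instruction or no reply at all.
    """
    suggested = TYPE_TO_SOURCE[first_data_type]
    if user_choice is not None:
        return user_choice  # step d: the user's data source wins
    return suggested


confirmed = resolve_data_source("Int")
overridden = resolve_data_source("Int", user_choice="Oracle")
```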
  • Step S30 Determine a first conversion format and first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information.
  • the management device determines the first conversion format of the first data and the first column information, so as to perform conversion processing on the first column information according to the first conversion format, wherein the conversion processing includes data desensitization processing and data type
  • Data desensitization refers to the transformation of certain sensitive information through desensitization rules to achieve reliable protection of sensitive private data.
  • the real data should be modified and used for testing without violating system rules, such as personal information such as ID card number, mobile phone number, card number, customer number, etc.
  • the first conversion format and the first column information of the first data can be user-defined; that is, when the user initiates a data import request, the user defines the first conversion format and the first column information of the first data, for example to decrypt user information in the first data.
  • the first column information of the first data is processed according to the first conversion format, so as to generate the first data set from the first data.
  • the first data may be data in multiple files. If the data that the user wants to import is the data in import file A, import file B, and import file C, then the first data in this embodiment is import file A, import file B, and import file C.
  • the first data is converted, that is, import files A, B, and C are processed, and the results are finally merged into the first data set.
  • the first data set is specifically a DataFrame (a tabular data structure that contains a group of ordered columns, where each column can have a different value type; it is a distributed data set organized in named columns).
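  • As a rough illustration of step S30, the sketch below applies a per-column conversion to rows from several import files and merges the results. It uses a list of dictionaries in place of a Spark DataFrame, and the masking rule in `mask_middle` is an assumed desensitization rule chosen for the example, not one stated in the application.

```python
def identity(value):
    """Default conversion: leave the value unchanged."""
    return value


def mask_middle(value: str, keep_start: int = 3, keep_end: int = 4) -> str:
    """Assumed desensitization rule: hide the middle of a sensitive string."""
    if len(value) <= keep_start + keep_end:
        return "*" * len(value)
    hidden = len(value) - keep_start - keep_end
    return value[:keep_start] + "*" * hidden + value[len(value) - keep_end:]


def build_data_set(files, conversions):
    """Convert the configured columns of every row, then merge all files."""
    data_set = []
    for rows in files:
        for row in rows:
            converted = {
                column: conversions.get(column, identity)(value)
                for column, value in row.items()
            }
            data_set.append(converted)
    return data_set


file_a = [{"user": "alice", "phone": "13800001234"}]
file_b = [{"user": "bob", "phone": "13900005678"}]
# The "phone" column plays the role of the first column information, and
# mask_middle plays the role of the first conversion format.
first_data_set = build_data_set([file_a, file_b], {"phone": mask_middle})
```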
  • Step S40 Import the first data set into the first data source.
  • the calling interface is an interface reserved by the data management device based on Spark technology, through which data transmission of distributed data sources can be realized.
  • each data source corresponds to a dedicated calling interface; that is, after the first data source corresponding to the first data set is determined, the calling interface corresponding to the first data source needs to be determined, and the first data set is imported into the first data source through that calling interface.
  • the calling interfaces of different data sources can be integrated into a general calling interface.
  • the specific program segments can be edited according to actual needs.
  • through the general calling interface, data transmission for different data sources can be realized; that is, no matter which data source the data is imported into, it is imported into the corresponding data source through the common calling interface.
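  • One way to picture a general calling interface is a registry that maps each data source name to its dedicated importer and exposes a single entry point. The sketch below is an assumption about how such an interface could be organized; the in-memory `STORES` dictionary stands in for real Hive or Mysql connections, and all names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Importer = Callable[[List[Row]], int]

_IMPORTERS: Dict[str, Importer] = {}
STORES: Dict[str, List[Row]] = {"Hive": [], "Mysql": []}  # in-memory stand-ins


def register_source(name: str):
    """Register the dedicated calling interface of one data source."""
    def decorator(func: Importer) -> Importer:
        _IMPORTERS[name] = func
        return func
    return decorator


@register_source("Hive")
def import_to_hive(rows: List[Row]) -> int:
    STORES["Hive"].extend(rows)
    return len(rows)


@register_source("Mysql")
def import_to_mysql(rows: List[Row]) -> int:
    STORES["Mysql"].extend(rows)
    return len(rows)


def import_data_set(source_name: str, rows: List[Row]) -> int:
    """General calling interface: one entry point for every data source."""
    if source_name not in _IMPORTERS:
        raise ValueError(f"no calling interface registered for {source_name!r}")
    return _IMPORTERS[source_name](rows)


imported = import_data_set("Hive", [{"user": "alice"}])
```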
  • step S40 includes:
  • Step e determining the writing type of the first data set
  • the user can also customize the write type of the first data set, where the write type includes creating new data, overwriting data, and appending data. For example, the user selects the user order table and chooses to append data.
  • the management device can determine the writing type of the first data set, so as to subsequently write the first data set.
  • Step f Import the first data set into the first data source according to the write type.
  • the management device calls the corresponding calling interface based on the determined first data source, and imports the first data set into the first data source through the calling interface according to the determined writing type.
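  • The three write types in steps e and f can be sketched as follows. The exact semantics are an assumption: here "new" creates a table and fails if it already exists, "overwrite" replaces its rows, and "append" adds to them. The `WriteType` enum and `write_table` function are hypothetical names, and a dictionary stands in for the data source.

```python
from enum import Enum


class WriteType(Enum):
    NEW = "new"
    OVERWRITE = "overwrite"
    APPEND = "append"


def write_table(tables, table_name, rows, write_type):
    """Write rows into an in-memory table according to the write type."""
    if write_type is WriteType.NEW:
        if table_name in tables:
            raise ValueError(f"table {table_name!r} already exists")
        tables[table_name] = list(rows)
    elif write_type is WriteType.OVERWRITE:
        tables[table_name] = list(rows)
    elif write_type is WriteType.APPEND:
        tables.setdefault(table_name, []).extend(rows)
    else:
        raise ValueError(f"unsupported write type: {write_type!r}")
    return len(tables[table_name])


tables = {}
write_table(tables, "user_order", [{"orderId": 1}], WriteType.NEW)
write_table(tables, "user_order", [{"orderId": 2}], WriteType.APPEND)
after_append = list(tables["user_order"])
write_table(tables, "user_order", [{"orderId": 3}], WriteType.OVERWRITE)
after_overwrite = list(tables["user_order"])
```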
  • the data content of the first data corresponding to the data import request is read, and the first data type corresponding to the first data is determined based on the data content; based on the first data type, the first data source corresponding to the first data is determined; the first conversion format and first column information of the first data are determined, and a first data set is generated from the first data based on the first conversion format and the first column information; and the first data set is imported into the first data source.
  • this application processes the data corresponding to the data import request, and imports the processed data into the data source by determining the corresponding data source to realize intelligent data management.
  • the data management method further includes:
  • Step S50 When a data export request is detected, obtain configuration information of the data export request, where the configuration information includes a third data source, a query statement, a file format, and an output path;
  • Step S60 Acquire second data corresponding to the data export request based on the third data source and the query statement;
  • Step S70 Generate a second data set from the second data based on the file format, and determine a file writing object corresponding to the second data set;
  • Step S80 Write the second data set into the file write-out object, and export the file write-out object to a terminal corresponding to the output path.
  • the corresponding second data is determined, the second data is processed into a second data set, and the second data set is written into the corresponding file write-out object for export, thereby realizing intelligent management of the data.
  • Step S50 When a data export request is detected, configuration information of the data export request is obtained, where the configuration information includes a third data source, a query statement, a file format, and an output path.
  • when the management device detects the data export request, it obtains the configuration information of the data export request.
  • the configuration information is configured by the user.
  • the configuration information includes the third data source, query statement, file format, and output path. That is, when exporting data, the user can select the corresponding data source and the data table that needs to be exported, such as the user order table in the Mysql library, define the query statement for the data to be exported from that data table, and define the data conversion for the column information of the specified columns.
  • the management device can determine the corresponding parameters according to the user's configuration information.
  • Step S60 Acquire second data corresponding to the data export request based on the third data source and the query statement.
  • the management device obtains the second data corresponding to the data export request based on the third data source and the corresponding query statement. Specifically, it obtains the corresponding data table from the third data source and uses the query statement to extract from that table the second data corresponding to the data export request, where the second data may be data of a single file or data of multiple files.
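  • Steps S50 and S60 can be sketched with an in-memory SQLite database standing in for the third data source. The `ExportConfig` class, its field names, and the table layout are assumptions made for this example; only the four kinds of configuration information and the sample output path come from the text.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class ExportConfig:
    data_source: str   # the third data source
    query: str         # the query statement
    file_format: str   # the file format
    output_path: str   # the output path


def fetch_second_data(connection, config):
    """Run the configured query and return the rows as dictionaries."""
    cursor = connection.execute(config.query)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# In-memory SQLite database used as a stand-in for the real data source.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE user_order (user_name TEXT, order_id INTEGER)")
connection.executemany(
    "INSERT INTO user_order VALUES (?, ?)",
    [("alice", 1001), ("bob", 1002)],
)

config = ExportConfig(
    data_source="sqlite-stand-in",
    query="SELECT user_name, order_id FROM user_order WHERE order_id > 1001",
    file_format="xlsx",
    output_path="/home/username/orders.xlsx",
)
second_data = fetch_second_data(connection, config)
```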
  • Step S70 Based on the file format, generate a second data set from the second data, and determine a file writing object corresponding to the second data set.
  • the management device processes the determined second data according to the file format configured by the user, for example by desensitization, to generate a second data set, which is likewise a DataFrame, and determines the file write-out object corresponding to the second data set.
  • step S70 includes:
  • Step g generating a second data set from the second data based on the second conversion format and the second column information
  • the management device generates a second data set from the second data according to the second column information and the second conversion format corresponding to the second column information. Specifically, the second column information is extracted from the second data and converted according to the second conversion format, for example by decryption, and the second data set is generated from the second data.
  • Step h Determine a file writing object corresponding to the second data set based on the file format type.
  • the management device determines the file write-out object corresponding to the second data set based on the file format type. Specifically, a mapping table between file format types and file write-out objects can be established in advance; after the file format type selected by the user is determined, the corresponding file write-out object can be determined, for example a file write-out object that supports writing Spark data to Excel files.
  • the write module (Writer) in the management device supports multiple file format types such as Excel, csv, Json, etc.
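  • The mapping between file format types and file write-out objects in step h could be sketched as a lookup table of writer functions. This example covers only CSV and JSON because both are in the Python standard library; Excel is left out of the sketch. `WRITERS`, `get_writer`, and the one-JSON-object-per-line layout are assumptions.

```python
import csv
import io
import json


def write_csv(rows, stream):
    """Write rows as CSV with a header taken from the first row."""
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def write_json(rows, stream):
    """Write rows as one JSON object per line."""
    for row in rows:
        stream.write(json.dumps(row) + "\n")


# Hypothetical mapping table: file format type -> file write-out object.
WRITERS = {"csv": write_csv, "json": write_json}


def get_writer(file_format_type: str):
    """Look up the write-out object for the user's file format type."""
    key = file_format_type.lower()
    if key not in WRITERS:
        raise ValueError(f"unsupported file format type: {file_format_type!r}")
    return WRITERS[key]


buffer = io.StringIO()
get_writer("CSV")([{"user": "bob", "orderId": 1002}], buffer)
csv_text = buffer.getvalue()
```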
  • Step S80 Write the second data set into the file write-out object, and export the file write-out object to a terminal corresponding to the output path.
  • after determining the file write-out object of the second data set, the management device writes the second data set into the file write-out object and exports the file write-out object to the terminal corresponding to the output path configured by the user, for example the output path /home/username/orders.xlsx.
  • the step of writing the second data set into the file write-out object includes:
  • Step i Traverse the partitions of the second data set, and write the second data set into the file write-out object according to a write mode of one partition at a time.
  • the management device traverses the partitions of the second data set.
  • the second data set, that is, the DataFrame, is divided into partitions, and each partition stores data.
  • the partitions are defined by the user in advance, for example based on a hash rule that divides the DataFrame into multiple areas: data with a hash value of A is placed in area a, and data with a hash value of B is placed in area b.
  • the management device writes the second data set into the file write-out object one partition at a time.
  • this embodiment aims to prevent memory overflow during writing. The Writer part is therefore modified to call Spark's toLocalIterator to traverse the partitions of the DataFrame and collect the data one partition at a time, which provides a general writing scheme that can write to both HDFS (Hadoop Distributed File System, an implementation of the Hadoop abstract file system; a distributed file system) and the local file system.
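  • The partition-at-a-time write in step i can be imitated without Spark by using a generator, so that only one partition is handled at any moment. The sketch below is an analogy only: `hash_partition` and `write_by_partition` are hypothetical helpers, CRC32 is an assumed hash rule, and a plain generator takes the place of Spark's `toLocalIterator`.

```python
import io
import zlib


def hash_partition(rows, key, partition_count):
    """Assign each row to a partition by hashing one column (assumed rule)."""
    partitions = [[] for _ in range(partition_count)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        partitions[digest % partition_count].append(row)
    return partitions


def iter_partitions(partitions):
    """Yield one partition at a time instead of collecting all rows at once."""
    for partition in partitions:
        yield partition


def write_by_partition(partitions, stream):
    """Write every partition to the stream, one partition per iteration."""
    written = 0
    for partition in iter_partitions(partitions):
        for row in partition:
            stream.write(",".join(str(value) for value in row.values()) + "\n")
        written += len(partition)
    return written


rows = [{"user": f"u{i}", "orderId": i} for i in range(1, 6)]
partitions = hash_partition(rows, "orderId", 2)
out = io.StringIO()
written = write_by_partition(partitions, out)
```

  • The design point is that peak memory is bounded by the largest partition rather than by the whole data set, which is the overflow risk the embodiment describes.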
  • configuration information of the data export request is acquired.
  • the configuration information includes a third data source, a query statement, a file format, and an output path; based on the third data source and the query statement, the second data corresponding to the data export request is obtained; a second data set is generated from the second data based on the file format, and the file write-out object corresponding to the second data set is determined; the second data set is written into the file write-out object, and the file write-out object is exported to the terminal corresponding to the output path.
  • the corresponding second data is determined, the second data is processed into a second data set, and the second data set is written into the corresponding file write-out object for export, thereby realizing intelligent management of the data.
  • the application also provides a data management device.
  • the data management device of this application includes:
  • a reading module configured to read the data content of the first data corresponding to the data import request when a data import request is detected, and determine the first data type corresponding to the first data based on the data content;
  • a determining module configured to determine a first data source corresponding to the first data based on the first data type
  • a generating module configured to determine a first conversion format and first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information;
  • the import module is used to import the first data set into the first data source.
  • reading module is also used for:
  • the number of occurrences of each data type among the second data types is counted, and the most frequent data type is determined as the first data type corresponding to the first data.
  • the determining module is also used for:
  • the second data source is used as the first data source corresponding to the first data.
  • import module is also used for:
  • the data management device further includes:
  • the obtaining module is configured to obtain configuration information of the data export request when the data export request is detected, the configuration information including a third data source, a query statement, a file format, and an output path;
  • the obtaining module is further configured to obtain second data corresponding to the data export request based on the third data source and the query statement;
  • the generating module is further configured to generate a second data set from the second data based on the file format, and determine a file write-out object corresponding to the second data set;
  • the export module is configured to write the second data set into the file write-out object, and export the file write-out object to the terminal corresponding to the output path.
  • the file format includes a second column of information, a second conversion format and file format type corresponding to the second column of information, and the generating module is further configured to:
  • export module is also used for:
  • the application also provides a computer-readable storage medium.
  • the computer-readable storage medium of the present application stores a data management program, and when the data management program is executed by a processor, the steps of the data management method described above are realized.

Abstract

A data management method, a data management apparatus, a device and a computer-readable storage medium. The method comprises: when a data importing request is detected, reading data content of first data corresponding to the data importing request, and determining, on the basis of the data content, a first data type corresponding to the first data (S10); determining, on the basis of the first data type, a first data source corresponding to the first data (S20); determining a first conversion format and a first column of information of the first data, and generating, on the basis of the first conversion format and the first column of information, a first data set from the first data (S30); and importing the first data set into the first data source (S40).

Description

Data management method, apparatus, device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201910655646.6, filed on July 19, 2019 and titled "Data management method, apparatus, device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the technical field of financial technology (Fintech), and in particular to a data management method, apparatus, device, and computer-readable storage medium.
Background
In recent years, with the continuous development of financial technology (Fintech), especially Internet finance, big data technology has been introduced into the daily business of banks and other financial institutions. In the course of their daily services, staff in data analysis or data warehouse roles need to export data from databases for analysis; business staff may need to export data to a file in order to send that file to a customer on request; or business staff who have obtained data need to import it into a database for storage. Importing and exporting data is clearly an essential data management task for banks and other financial institutions.
However, because the databases are not linked to one another and each database uses a different logic language, existing data management approaches generally target a single database: they only export data from the database to local storage, or import local data into the database. The import and export modes are therefore limited, and the data cannot be processed during import or export, which makes the process rigid and prevents intelligent management of the data.
Technical solution
The main purpose of this application is to provide a data management method, apparatus, device, and computer-readable storage medium that realize intelligent management of data.
To achieve the above purpose, this application provides a data management method comprising the following steps:
when a data import request is detected, reading the data content of first data corresponding to the data import request, and determining a first data type corresponding to the first data based on the data content;
determining a first data source corresponding to the first data based on the first data type;
determining a first conversion format and first column information of the first data, and generating a first data set from the first data based on the first conversion format and the first column information;
importing the first data set into the first data source.
In an embodiment, the step of reading, when a data import request is detected, the data content of the first data corresponding to the data import request and determining the first data type corresponding to the first data based on the data content includes:
when a data import request is detected, reading the data content of a preset number of rows of the first data corresponding to the data import request, and determining the second data type to which the column information of each column in the data content belongs;
counting the number of occurrences of each data type among the second data types, and determining the most frequent data type as the first data type corresponding to the first data.
In an embodiment, the step of determining the first data source corresponding to the first data based on the first data type includes:
determining the first data source corresponding to the first data based on the first data type, and returning the first data source to the client corresponding to the data import request;
if a second data source sent by the client based on the first data source is received, using the second data source as the first data source corresponding to the first data.
In an embodiment, the step of importing the first data set into the first data source includes:
determining the write type of the first data set;
importing the first data set into the first data source according to the write type.
In an embodiment, the data management method further includes:
when a data export request is detected, obtaining configuration information of the data export request, the configuration information including a third data source, a query statement, a file format, and an output path;
obtaining second data corresponding to the data export request based on the third data source and the query statement;
generating a second data set from the second data based on the file format, and determining a file write-out object corresponding to the second data set;
writing the second data set into the file write-out object, and exporting the file write-out object to the terminal corresponding to the output path.
In an embodiment, the file format includes second column information, a second conversion format corresponding to the second column information, and a file format type, and the step of generating the second data set from the second data based on the file format and determining the file write-out object corresponding to the second data set includes:
generating the second data set from the second data based on the second conversion format and the second column information;
determining the file write-out object corresponding to the second data set based on the file format type.
In an embodiment, the step of writing the second data set into the file write-out object includes:
traversing the partitions of the second data set, and writing the second data set into the file write-out object one partition at a time.
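The partition-by-partition write described above can be illustrated with a minimal sketch. It is plain Python rather than Spark code: the partitions are modelled as an iterable of row lists, the file write-out object is modelled as any text stream, and the name `write_by_partition` and the CSV output format are assumptions made only for illustration.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO

Row = Sequence[str]


def write_by_partition(partitions: Iterable[Iterable[Row]], out: TextIO) -> int:
    """Write rows to `out` one partition at a time and return the row count.

    Hypothetical helper: only the partition currently being written is
    materialized, so memory use is bounded by the largest partition
    rather than by the whole data set.
    """
    writer = csv.writer(out, lineterminator="\n")
    written = 0
    for partition in partitions:
        batch = list(partition)  # collect just this one partition
        writer.writerows(batch)
        written += len(batch)
    return written


# Usage: two partitions supplied lazily by a generator.
buffer = io.StringIO()
count = write_by_partition(iter([[("a", "1")], [("b", "2"), ("c", "3")]]), buffer)
```

Because the target is any writable text stream, the same loop could in principle serve a local file or a handle to a distributed file system; how such a handle is obtained is outside this sketch.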
In addition, to achieve the above purpose, this application also provides a data management apparatus, the data management apparatus including:
a reading module, configured to read, when a data import request is detected, the data content of first data corresponding to the data import request, and determine a first data type corresponding to the first data based on the data content;
a determining module, configured to determine a first data source corresponding to the first data based on the first data type;
a generating module, configured to determine a first conversion format and first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information;
an import module, configured to import the first data set into the first data source.
In an embodiment, the reading module is further configured to:
when a data import request is detected, read the data content of a preset number of rows of the first data corresponding to the data import request, and determine the second data type to which the column information of each column in the data content belongs;
count the number of occurrences of each data type among the second data types, and determine the most frequent data type as the first data type corresponding to the first data.
In an embodiment, the determining module is further configured to:
determine the first data source corresponding to the first data based on the first data type, and return the first data source to the client corresponding to the data import request;
if a second data source sent by the client based on the first data source is received, use the second data source as the first data source corresponding to the first data.
In an embodiment, the import module is further configured to:
determine the write type of the first data set;
import the first data set into the first data source according to the write type.
In an embodiment, the data management apparatus further includes:
an obtaining module, configured to obtain, when a data export request is detected, configuration information of the data export request, the configuration information including a third data source, a query statement, a file format, and an output path;
the obtaining module being further configured to obtain second data corresponding to the data export request based on the third data source and the query statement;
the generating module being further configured to generate a second data set from the second data based on the file format, and determine a file write-out object corresponding to the second data set;
an export module, configured to write the second data set into the file write-out object, and export the file write-out object to the terminal corresponding to the output path.
In an embodiment, the file format includes second column information, a second conversion format corresponding to the second column information, and a file format type, and the generating module is further configured to:
generate the second data set from the second data based on the second conversion format and the second column information;
determine the file write-out object corresponding to the second data set based on the file format type.
In an embodiment, the export module is further configured to:
traverse the partitions of the second data set, and write the second data set into the file write-out object one partition at a time.
In addition, to achieve the above purpose, this application also provides a data management device, the data management device including a memory, a processor, and a data management program stored in the memory and executable on the processor, wherein the data management program, when executed by the processor, implements the steps of the data management method described above.
In addition, to achieve the above purpose, this application also provides a computer-readable storage medium storing a data management program, wherein the data management program, when executed by a processor, implements the steps of the data management method described above.
In the data management method proposed in this application, when a data import request is detected, the data content of the first data corresponding to the data import request is read, and the first data type corresponding to the first data is determined based on the data content; the first data source corresponding to the first data is determined based on the first data type; the first conversion format and first column information of the first data are determined, and a first data set is generated from the first data based on the first conversion format and the first column information; and the first data set is imported into the first data source. When a data import request is detected, this application processes the data corresponding to the request, determines the corresponding data source, and imports the processed data into that data source, thereby realizing intelligent management of data.
Description of the drawings
FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application;
FIG. 2 is a schematic flowchart of the first embodiment of the data management method of this application;
FIG. 3 is a schematic flowchart of the second embodiment of the data management method of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described below in conjunction with the embodiments and with reference to the accompanying drawings.
Embodiments of the invention
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.
FIG. 1 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application.
The device in the embodiments of this application may be a PC or a server device.
As shown in FIG. 1, the device may include a processor 1001 (for example, a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. The memory 1005 may optionally also be a storage device independent of the processor 1001.
Those skilled in the art will understand that the device structure shown in FIG. 1 does not limit the device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a data management program.
The operating system is a program that manages and controls the hardware and software resources of the data management device and supports the running of the network communication module, the user interface module, the data management program, and other programs or software; the network communication module manages and controls the network interface 1004; and the user interface module manages and controls the user interface 1003.
In the data management device shown in FIG. 1, the data management device uses the processor 1001 to call the data management program stored in the memory 1005 and performs the operations in the embodiments of the data management method described below.
Based on the above hardware structure, embodiments of the data management method of this application are proposed.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the data management method of this application. The method includes:
Step S10: when a data import request is detected, read the data content of the first data corresponding to the data import request, and determine the first data type corresponding to the first data based on the data content;
Step S20: determine the first data source corresponding to the first data based on the first data type;
Step S30: determine the first conversion format and first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information;
Step S40: import the first data set into the first data source.
The data management method of this embodiment is applied in the data management device of a financial institution such as a wealth management institution or a banking system. For ease of description, the data management device is hereinafter referred to as the management device; it may be a terminal or a PC. In the embodiments of this application, Spark (a fast, general-purpose computing engine designed for large-scale data processing) is built into the management device, so that the management device can, based on Spark, import and export files of multiple types, such as Excel, CSV, and JSON, to and from data storage components. Also based on Spark, the management device supports importing to and exporting from multiple types of data storage components, such as Hive, Mysql, Oracle, HDFS, Hbase, and Mongodb. Specifically, a class for each data storage component is added through the DataSource API provided by Spark, and the program code of that class is written according to actual needs, so that the DataSource API supports connecting to multiple data sources.
The implementation of this embodiment relies on Spark's distributed computing capability and on the DataSource API (the data source call interface), which supports connecting to multiple data sources. It should be noted that Spark's native DataSource is a framework that connects external data sources to the Spark engine; its main role is to give Spark the ability to read external data quickly, and it makes it easy to register different data formats as Spark tables through the DataSource API. The native DataSource already supports file formats such as JSON, ORC, and Parquet, but the set of supported formats is limited and does not meet actual needs. On this basis, the embodiments of this application use the DataSource API provided by Spark to add support for file formats such as Excel (for example, the xls format of the 03 version and the xlsx format of the 07 and later versions), CSV, and TXT, with the program code written according to each file format. In other words, file format support can be added to the management device so that it can import and export files of multiple types.
When the management device of this embodiment detects a data import request, it processes the data corresponding to the request, determines the corresponding data source, and imports the processed data into that data source, thereby realizing intelligent management of data.
Each step is described in detail below:
Step S10: when a data import request is detected, read the data content of the first data corresponding to the data import request, and determine the first data type corresponding to the first data based on the data content;
In this embodiment, when the relevant business staff of a financial institution such as a wealth management institution or a bank (that is, the users) obtain data through various channels and the data needs to be imported into the data source corresponding to the institution, they only need to pass the data to the management device, and the management device completes the import.
Specifically, when the management device detects a data import request, it reads the data content of the first data corresponding to the request and identifies the first data type corresponding to the first data from that content. In other words, before the first data is imported into a data source, its first data type is determined so that the first data can subsequently be imported into the correct data source.
It can be understood that this embodiment has multiple data sources, that is, the management device supports import to and export from multiple data storage components, such as Hive, Mysql, Oracle, HDFS, Hbase, and Mongodb. To import the first data precisely into the data source the user wants, the first data type of the first data must be determined first.
Further, step S10 includes:
Step a: when a data import request is detected, read the data content of a preset number of rows of the first data corresponding to the data import request, and determine the second data type to which the column information of each column in the data content belongs;
In this step, when the management device detects a data import request, the file read-in object (Reader) in the management device reads the first data corresponding to the request. To determine the first data type of the first data quickly, the number of rows to read can be preset: the Reader only needs to read the data content of the preset number of rows, and the first data type of the first data is judged from that content. The preset number of rows may refer to the first preset number of rows of the first data. The second data type to which the column information of each column in the read data content belongs is then determined. For example, if the column information of the current column is numeric, the data type of the current column is determined to be a numeric type; if the column information of the current column consists of characters, the data type of the current column is determined to be a character type; and so on.
Step b: count the number of occurrences of each data type among the second data types, and determine the most frequent data type as the first data type corresponding to the first data.
In this step, based on the second data type to which the column information of each column in the read data content belongs, the number of occurrences of each data type among the second data types is counted, and the data type that occurs most often is determined as the first data type corresponding to the first data. For example, if the numeric type occurs most often, the first data is determined to be of numeric type; if the character type occurs most often, the first data is determined to be of character type; and so on.
In a specific implementation, the preset number of rows is preferably 10: the Reader reads the data content of the first 10 rows of the first data, and the data type inferrer of the management device judges the data type of that content. Specifically, it judges the data type of each column in the data content and infers the type by determining which type occurs most often, for example: user: String, orderId: Int. To improve the accuracy of the data type judgment, the result, that is, the first data type, can be returned to the client corresponding to the data import request for the user to review and confirm. During this process, if a modification instruction sent by the user through the client based on the first data type is received, the management device changes the data type of the first data according to the user's modification.
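The sampling and majority-vote inference described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the data type inferrer: it takes the per-column reading suggested by the example `user: String, orderId: Int`, the type labels `Int`, `Double`, and `String` are illustrative, and the function names `classify` and `infer_column_types` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

PRESET_ROWS = 10  # the preferred preset row count mentioned in the text


def classify(value: str) -> str:
    """Return a coarse type label for one cell (labels are illustrative)."""
    text = value.strip()
    try:
        int(text)
        return "Int"
    except ValueError:
        pass
    try:
        float(text)
        return "Double"
    except ValueError:
        return "String"


def infer_column_types(
    header: Sequence[str],
    rows: List[Sequence[str]],
    preset_rows: int = PRESET_ROWS,
) -> Dict[str, str]:
    """Infer one type per column from the first `preset_rows` rows.

    Each sampled cell votes for a type; the most frequent type wins.
    Columns with no sampled cells fall back to "String" (an assumption).
    """
    sample = rows[:preset_rows]
    inferred: Dict[str, str] = {}
    for index, name in enumerate(header):
        votes = Counter(classify(row[index]) for row in sample if index < len(row))
        inferred[name] = votes.most_common(1)[0][0] if votes else "String"
    return inferred


# Usage: one noisy cell in orderId does not change the majority result.
types = infer_column_types(
    ["user", "orderId"],
    [["alice", "1"], ["bob", "2"], ["carol", "x"]],
)
```

Because the result comes from a sample only, returning it to the user for confirmation, as the text describes, remains the safeguard against a misleading first few rows.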
Step S20: determine a first data source corresponding to the first data based on the first data type.
In this embodiment, the management device determines, based on the determined first data type, the first data source corresponding to the first data, that is, the first data source into which the first data is to be imported. Specifically, data types are mapped to data sources in advance to obtain a data type-data source mapping table; once the first data type of the first data is determined, the first data source corresponding to the first data can be looked up in that table.
Further, step S20 includes:
Step c: determine the first data source corresponding to the first data based on the first data type, and return the first data source to the client corresponding to the data import request;
In this step, the management device determines the first data source corresponding to the first data type and returns it to the client corresponding to the data import request for confirmation by the user of the client.
Step d: if a second data source sent by the client based on the first data source is received, use the second data source as the first data source corresponding to the first data.
In this step, the user can confirm through the client whether the inference made by the management device is correct. If it is not, the user can send a corresponding modification instruction through the client so that the management device modifies the data type of the first data. Specifically, if the management device receives a second data source sent by the client based on the first data source, it uses the second data source as the first data source corresponding to the first data; if it does not receive a second data source sent by the client based on the first data source, or it receives a confirmation instruction based on the first data source, it determines that the first data source is the data source corresponding to the first data.
It can be understood that, because the data type of the first data is inferred from the data content of only a preset number of rows, the inference is not guaranteed to be accurate. To improve the accuracy, after determining the first data type of the first data, the management device returns the first data type to the user for confirmation. In addition, when the management device modifies the data type of the first data, it saves the modified data type so that the data type can be obtained accurately the next time data content identical to the first data is encountered.
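The lookup, confirmation and saving of corrections described above might be sketched as follows. This is a hedged illustration only: the class `DataSourceResolver`, the content fingerprint used as a key, and the source names `mysql_orders` and `hive_logs` are hypothetical and do not come from the present application.

```python
class DataSourceResolver:
    """Maps data types to data sources and remembers user corrections (illustrative)."""

    def __init__(self, type_to_source):
        # Data type-data source mapping table, established in advance.
        self._type_to_source = dict(type_to_source)
        # Hypothetical store: content fingerprint -> user-corrected data type.
        self._saved_types = {}

    def propose(self, fingerprint, inferred_type):
        """Return the (data type, data source) pair to send to the client."""
        data_type = self._saved_types.get(fingerprint, inferred_type)
        return data_type, self._type_to_source[data_type]

    def correct_type(self, fingerprint, corrected_type):
        """Save a user's correction so the same content resolves correctly next time."""
        self._saved_types[fingerprint] = corrected_type

    @staticmethod
    def confirm(proposed_source, client_source=None):
        """Step d: a second data source from the client replaces the proposal."""
        return client_source if client_source is not None else proposed_source


resolver = DataSourceResolver({"Int": "mysql_orders", "String": "hive_logs"})
_, proposed = resolver.propose("file-a", "Int")
final_source = resolver.confirm(proposed, client_source="hive_logs")
resolver.correct_type("file-a", "String")
print(final_source, resolver.propose("file-a", "Int"))
```

Passing no `client_source` models both the case where the client sends nothing and the case where it sends a confirmation instruction: the proposed data source is kept.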
Step S30: determine a first conversion format and first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information.
In this embodiment, the management device determines the first conversion format and the first column information of the first data, and converts the first column information according to the first conversion format. The conversion includes data desensitization (masking), data type conversion, and the like. Data desensitization means transforming certain sensitive information according to desensitization rules so that sensitive private data is reliably protected. Where customer security data or commercially sensitive data is involved, the real data is transformed, without violating system rules, and made available for testing; personal information such as ID card numbers, mobile phone numbers, card numbers and customer numbers all needs to be desensitized. Data type conversion is, for example, converting a Word file into a PDF file.
The first conversion format and the first column information of the first data may be user-defined, that is, defined when the user initiates the data import request, for example, decrypting the user information in the first data.
Specifically, the first column information of the first data is processed according to the first conversion format so that a first data set is generated from the first data. Note that the first data set may contain data from multiple files. For example, if the user wants to import the data in import file A, import file B and import file C, the first data in this embodiment is import files A, B and C; converting the first data means processing import files A, B and C and finally merging the results into the first data set. The first data set is specifically a DataFrame (a tabular data structure containing an ordered set of columns, each of which may hold values of a different type; it is a distributed data set organized into named columns).
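As a rough illustration of the conversion and merge described above, the following Python sketch applies per-column conversions to the rows of several import files and merges them into one data set. A plain list of dictionaries stands in for the DataFrame, and the masking rule (keep the first three and last four characters) and all function and column names are assumptions, not rules stated in the present application.

```python
def mask_middle(value, keep_head=3, keep_tail=4, fill="*"):
    """Desensitize a string by hiding everything except its head and tail."""
    hidden = len(value) - keep_head - keep_tail
    if hidden <= 0:
        # Too short to keep any part safely: hide the whole value.
        return fill * len(value)
    return value[:keep_head] + fill * hidden + value[len(value) - keep_tail:]


def build_data_set(files, conversions):
    """Convert the named columns of every row in every file, then merge the rows."""
    data_set = []
    for rows in files:
        for row in rows:
            converted = dict(row)
            for column, convert in conversions.items():
                if column in converted:
                    converted[column] = convert(converted[column])
            data_set.append(converted)
    return data_set


file_a = [{"user": "alice", "phone": "13812345678", "orderId": "1001"}]
file_b = [{"user": "bob", "phone": "13987654321", "orderId": "1002"}]
first_data_set = build_data_set(
    [file_a, file_b],
    {"phone": mask_middle, "orderId": int},  # column information -> conversion format
)
print(first_data_set)
```

Here the dictionary passed as `conversions` plays the role of the first column information (its keys) together with the first conversion format (its values).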
Step S40: import the first data set into the first data source.
In this embodiment, the calling interface (DatasourceAPI) corresponding to the determined first data source is invoked, and the first data set (DataFrame) is imported into the first data source through that interface, for example into the user order table in a MySQL database. The calling interface is an interface reserved by the data management device based on Spark technology, through which data can be transferred to and from distributed data sources.
Note that each data source has its own dedicated calling interface. That is, after the first data source corresponding to the first data set is determined, the calling interface corresponding to the first data source must be determined, and the first data set is imported into the first data source by invoking it; in other words, data imported into the first data source must pass through the calling interface corresponding to the first data source.
Of course, to speed up import and reduce the time spent invoking calling interfaces, the calling interfaces of different data sources may be integrated into one general calling interface, whose specific program segments can be written according to actual needs. Data transfer for different data sources can then be carried out through the general calling interface; that is, whichever data source the data is imported into, it is imported through the general calling interface.
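One way to picture such a general calling interface is a registry that dispatches to a per-source importer. The sketch below is an assumption-laden illustration in plain Python, not the Spark-based DatasourceAPI of the present application; the class name, the method names and the in-memory "tables" are all hypothetical.

```python
class GeneralDataSourceApi:
    """Single entry point that forwards a data set to the importer of one data source."""

    def __init__(self):
        self._importers = {}

    def register(self, source_name, importer):
        """Attach the dedicated calling interface of one data source."""
        self._importers[source_name] = importer

    def import_data_set(self, source_name, data_set):
        """Import through the general interface, whatever the target data source."""
        try:
            importer = self._importers[source_name]
        except KeyError:
            raise ValueError(f"no calling interface registered for {source_name!r}") from None
        return importer(data_set)


mysql_table = []   # stand-in for the user order table in a MySQL database
hive_table = []    # stand-in for a second data source


def import_to_mysql(data_set):
    mysql_table.extend(data_set)
    return len(data_set)


def import_to_hive(data_set):
    hive_table.extend(data_set)
    return len(data_set)


api = GeneralDataSourceApi()
api.register("mysql", import_to_mysql)
api.register("hive", import_to_hive)
imported = api.import_data_set("mysql", [{"orderId": 1001}, {"orderId": 1002}])
print(imported, mysql_table)
```

The caller only needs the name of the determined data source; the source-specific logic stays inside each registered importer.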
Further, step S40 includes:
Step e: determine the write type of the first data set;
In this step, the user can also define the write type of the first data set, where the write type includes creating new data, overwriting data, appending data, and the like. For example, the user selects the user order table and chooses to append data. The management device can then determine the write type of the first data set for the subsequent write.
Step f: import the first data set into the first data source according to the write type.
In this step, the management device invokes the calling interface corresponding to the determined first data source and, through that interface, imports the first data set into the first data source according to the determined write type.
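The three write types can be sketched as follows, with a dictionary of lists standing in for the tables of a data source. The exact semantics are assumptions made for illustration: here, creating fails if the table already exists, and appending creates the table if it is missing; the present application does not specify either behaviour.

```python
from enum import Enum


class WriteType(Enum):
    CREATE = "create"        # new data
    OVERWRITE = "overwrite"  # replace existing data
    APPEND = "append"        # add to existing data


def write_data_set(tables, table_name, data_set, write_type):
    """Write `data_set` into `tables[table_name]` and return the resulting row count."""
    if write_type is WriteType.CREATE:
        if table_name in tables:
            raise ValueError(f"table {table_name!r} already exists")
        tables[table_name] = list(data_set)
    elif write_type is WriteType.OVERWRITE:
        tables[table_name] = list(data_set)
    elif write_type is WriteType.APPEND:
        tables.setdefault(table_name, []).extend(data_set)
    else:
        raise ValueError(f"unsupported write type: {write_type!r}")
    return len(tables[table_name])


tables = {"user_orders": [{"orderId": 1001}]}
after_append = write_data_set(tables, "user_orders", [{"orderId": 1002}], WriteType.APPEND)
after_overwrite = write_data_set(tables, "user_orders", [{"orderId": 2001}], WriteType.OVERWRITE)
print(after_append, after_overwrite, tables)
```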
In this embodiment, when a data import request is detected, the data content of the first data corresponding to the data import request is read, and the first data type corresponding to the first data is determined based on the data content; the first data source corresponding to the first data is determined based on the first data type; the first conversion format and the first column information of the first data are determined, and a first data set is generated from the first data based on the first conversion format and the first column information; and the first data set is imported into the first data source. When a data import request is detected, the present application processes the data corresponding to the request, determines the corresponding data source, and imports the processed data into that data source, thereby realizing intelligent data management.
Further, a second embodiment of the data management method of the present application is proposed based on the first embodiment.
The second embodiment of the data management method differs from the first embodiment in that, referring to FIG. 3, the data management method further includes:
Step S50: when a data export request is detected, obtain configuration information of the data export request, the configuration information including a third data source, a query statement, a file format and an output path;
Step S60: obtain second data corresponding to the data export request based on the third data source and the query statement;
Step S70: generate a second data set from the second data based on the file format, and determine a file write-out object corresponding to the second data set;
Step S80: write the second data set into the file write-out object, and export the file write-out object to the terminal corresponding to the output path.
In this embodiment, when a data export request is detected, the corresponding second data is determined and processed into a second data set, and the second data set is then written into the corresponding file write-out object and exported, thereby realizing intelligent data management.
Each step is described in detail below:
Step S50: when a data export request is detected, obtain configuration information of the data export request, the configuration information including a third data source, a query statement, a file format and an output path.
In this embodiment, when the management device detects a data export request, it obtains the configuration information of the request. The configuration information is configured by the user and includes the third data source, the query statement, the file format, the output path, and the like. That is, when exporting data, the user can select the data source and the data table to be exported, such as the user order table in a MySQL database, define a query statement for the data to be exported from that table, and define the data conversion to be applied to the column information of specified columns, for example, exporting the order table for the last six months and desensitizing the user information. The user then selects the file format and output path for the export, for example, exporting the user order table as Excel to the path /home/username/orders.xlsx. The management device can determine the corresponding parameters from the user's configuration information.
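For illustration, the configuration information of an export request could be modelled as a small immutable record. The field names, the `FileFormat` helper, the conversion label `"mask"` and the SQL text are hypothetical; only the four kinds of configuration item and the example path /home/username/orders.xlsx come from the description above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileFormat:
    format_type: str                                         # e.g. "excel", "csv", "json"
    column_conversions: dict = field(default_factory=dict)   # column -> conversion name


@dataclass(frozen=True)
class ExportConfig:
    data_source: str        # the third data source
    query: str              # the query statement
    file_format: FileFormat
    output_path: str

    def __post_init__(self):
        # Reject an incomplete configuration before any data is read.
        for name in ("data_source", "query", "output_path"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")


config = ExportConfig(
    data_source="mysql",
    query="SELECT user, orderId FROM user_orders",  # hypothetical query statement
    file_format=FileFormat("excel", {"user": "mask"}),
    output_path="/home/username/orders.xlsx",
)
print(config)
```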
Step S60: obtain second data corresponding to the data export request based on the third data source and the query statement.
In this embodiment, the management device obtains the second data corresponding to the data export request based on the third data source and the corresponding query statement. Specifically, it obtains the corresponding data table from the third data source and uses the query statement to extract from that table the second data corresponding to the data export request. The second data may be the data of a single file or the data of multiple files.
Step S70: generate a second data set from the second data based on the file format, and determine a file write-out object corresponding to the second data set.
In this embodiment, the management device processes the determined second data according to the file format configured by the user, for example by desensitizing it, to generate the second data set, which is likewise a DataFrame, and determines the file write-out object corresponding to the second data set.
Specifically, the file format includes second column information, a second conversion format corresponding to the second column information, and a file format type, and step S70 includes:
Step g: generate a second data set from the second data based on the second conversion format and the second column information;
In this step, the management device generates the second data set from the second data according to the second column information and the second conversion format corresponding to it. Specifically, the second column information is extracted from the second data and converted according to the second conversion format, for example decrypted, so that the second data set is generated from the second data.
Step h: determine the file write-out object corresponding to the second data set based on the file format type.
In this step, the management device determines the file write-out object corresponding to the second data set based on the file format type. Specifically, a mapping table between file format types and file write-out objects can be established in advance; once the file format type selected by the user is determined, the corresponding file write-out object can be determined, for example an Excel file write-out object that supports Spark. The write module (Writer) in the management device supports multiple file format types, such as Excel, CSV and JSON.
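The mapping from file format type to file write-out object might look like the following sketch, which uses only the Python standard library. The writer classes and the `writer_for` helper are hypothetical, and Excel is left out because writing it would require a third-party library; this is not the Writer of the present application.

```python
import csv
import io
import json


class CsvWriter:
    """File write-out object for the CSV format type."""

    def __init__(self, stream, columns):
        self._writer = csv.DictWriter(stream, fieldnames=columns, lineterminator="\n")
        self._writer.writeheader()

    def write_rows(self, rows):
        self._writer.writerows(rows)


class JsonLinesWriter:
    """File write-out object that emits one JSON object per line."""

    def __init__(self, stream, columns):
        self._stream = stream
        self._columns = columns

    def write_rows(self, rows):
        for row in rows:
            record = {column: row[column] for column in self._columns}
            self._stream.write(json.dumps(record) + "\n")


# Mapping table between file format types and write-out objects, built in advance.
WRITERS = {"csv": CsvWriter, "json": JsonLinesWriter}


def writer_for(format_type, stream, columns):
    try:
        writer_class = WRITERS[format_type.lower()]
    except KeyError:
        raise ValueError(f"unsupported file format type: {format_type!r}") from None
    return writer_class(stream, columns)


csv_stream = io.StringIO()
writer_for("csv", csv_stream, ["user", "orderId"]).write_rows(
    [{"user": "alice", "orderId": 1001}]
)
print(csv_stream.getvalue())
```

Supporting a further format then only requires adding one entry to the mapping table.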
Step S80: write the second data set into the file write-out object, and export the file write-out object to the terminal corresponding to the output path.
In this embodiment, after the file write-out object of the second data set is determined, the management device writes the second data set into the file write-out object and exports the file write-out object to the terminal corresponding to the output path configured by the user, for example the output path /home/username/orders.xlsx.
Further, the step of writing the second data set into the file write-out object includes:
Step i: traverse the partitions of the second data set, and write the second data set into the file write-out object one partition at a time.
In this step, the management device traverses the partitions of the second data set. It can be understood that the second data set, that is, the DataFrame, has multiple partitions, each of which stores data. The partitions are defined by the user in advance; for example, based on a hash rule, the DataFrame is divided into multiple partitions, with data whose hash value is A placed in partition a, data whose hash value is B placed in partition b, and so on. The management device writes the second data set into the file write-out object one partition at a time.
The purpose of this embodiment is to prevent memory overflow during writing. The Writer is therefore modified to call Spark's toLocalIterator to traverse the partitions of the DataFrame and collect the data one partition at a time, and a general writing scheme is provided that can write to both HDFS (Hadoop Distributed File System, an implementation of the Hadoop abstract file system; a distributed file system) and the local file system.
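The memory argument can be illustrated without Spark. In the sketch below, a generator of row lists stands in for the partitions of a DataFrame, so only one partition is held in memory at a time; in PySpark, `DataFrame.toLocalIterator()` offers comparable behaviour by bringing rows to the driver partition by partition. The helper names and the fixed-size chunking are assumptions for illustration (the description above mentions hash-based partitioning instead).

```python
import io


def iter_partitions(rows, partition_size):
    """Stand-in for a partitioned data set: yield lists of at most `partition_size` rows."""
    partition = []
    for row in rows:
        partition.append(row)
        if len(partition) == partition_size:
            yield partition
            partition = []
    if partition:
        yield partition


def write_by_partition(partitions, stream):
    """Write one partition at a time; return (rows written, largest partition held)."""
    written = 0
    peak = 0
    for partition in partitions:
        peak = max(peak, len(partition))
        for row in partition:
            stream.write(",".join(str(value) for value in row) + "\n")
        written += len(partition)
    return written, peak


out = io.StringIO()  # could equally be a file opened on a local file system
source_rows = ([i, f"user{i}"] for i in range(10))
written, peak = write_by_partition(iter_partitions(source_rows, 4), out)
print(written, peak)
```

The peak memory use is bounded by the largest partition rather than by the whole data set, which is the property the modified Writer relies on.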
In this embodiment, when a data export request is detected, the configuration information of the data export request is obtained, the configuration information including a third data source, a query statement, a file format and an output path; the second data corresponding to the data export request is obtained based on the third data source and the query statement; a second data set is generated from the second data based on the file format, and the file write-out object corresponding to the second data set is determined; and the second data set is written into the file write-out object, which is exported to the terminal corresponding to the output path. By determining the corresponding second data when a data export request is detected, processing it into a second data set, and then writing the second data set into the corresponding file write-out object for export, intelligent data management is realized.
The present application further provides a data management apparatus. The data management apparatus of the present application includes:
a reading module, configured to, when a data import request is detected, read the data content of the first data corresponding to the data import request, and determine the first data type corresponding to the first data based on the data content;
a determining module, configured to determine the first data source corresponding to the first data based on the first data type;
a generating module, configured to determine the first conversion format and the first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information;
an import module, configured to import the first data set into the first data source.
Further, the reading module is further configured to:
when a data import request is detected, read the data content of a preset number of rows of the first data corresponding to the data import request, and determine the second data type to which the column information of each column in the data content belongs;
count the number of occurrences of each data type among the second data types, and determine the data type that occurs most often as the first data type corresponding to the first data.
Further, the determining module is further configured to:
determine the first data source corresponding to the first data based on the first data type, and return the first data source to the client corresponding to the data import request;
if a second data source sent by the client based on the first data source is received, use the second data source as the first data source corresponding to the first data.
Further, the import module is further configured to:
determine the write type of the first data set;
import the first data set into the first data source according to the write type.
Further, the data management apparatus further includes:
an obtaining module, configured to, when a data export request is detected, obtain configuration information of the data export request, the configuration information including a third data source, a query statement, a file format and an output path;
the obtaining module is further configured to obtain second data corresponding to the data export request based on the third data source and the query statement;
the generating module is further configured to generate a second data set from the second data based on the file format, and determine a file write-out object corresponding to the second data set;
an export module, configured to write the second data set into the file write-out object, and export the file write-out object to the terminal corresponding to the output path.
Further, the file format includes second column information, a second conversion format corresponding to the second column information, and a file format type, and the generating module is further configured to:
generate a second data set from the second data based on the second conversion format and the second column information;
determine the file write-out object corresponding to the second data set based on the file format type.
Further, the export module is further configured to:
traverse the partitions of the second data set, and write the second data set into the file write-out object one partition at a time.
The present application further provides a computer-readable storage medium.
A data management program is stored on the computer-readable storage medium of the present application, and when the data management program is executed by a processor, the steps of the data management method described above are implemented.
For the method implemented when the data management program running on the processor is executed, reference may be made to the embodiments of the data management method of the present application, which are not repeated here.
It should be noted that, in this document, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or system that includes that element.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (10)

  1. A data management method, wherein the data management method comprises the following steps:
    when a data import request is detected, reading the data content of first data corresponding to the data import request, and determining a first data type corresponding to the first data based on the data content;
    determining a first data source corresponding to the first data based on the first data type;
    determining a first conversion format and first column information of the first data, and generating a first data set from the first data based on the first conversion format and the first column information;
    importing the first data set into the first data source.
  2. The data management method according to claim 1, wherein the step of, when a data import request is detected, reading the data content of the first data corresponding to the data import request and determining the first data type corresponding to the first data based on the data content comprises:
    when a data import request is detected, reading the data content of a preset number of rows of the first data corresponding to the data import request, and determining the second data type to which the column information of each column in the data content belongs;
    counting the number of occurrences of each data type among the second data types, and determining the data type that occurs most often as the first data type corresponding to the first data.
  3. The data management method according to claim 1, wherein the step of determining the first data source corresponding to the first data based on the first data type comprises:
    determining the first data source corresponding to the first data based on the first data type, and returning the first data source to the client corresponding to the data import request;
    if a second data source sent by the client based on the first data source is received, using the second data source as the first data source corresponding to the first data.
  4. The data management method according to claim 1, wherein the step of importing the first data set into the first data source comprises:
    determining the write type of the first data set;
    importing the first data set into the first data source according to the write type.
  5. The data management method according to any one of claims 1 to 4, wherein the data management method further comprises:
    when a data export request is detected, obtaining configuration information of the data export request, the configuration information comprising a third data source, a query statement, a file format and an output path;
    obtaining second data corresponding to the data export request based on the third data source and the query statement;
    generating a second data set from the second data based on the file format, and determining a file write-out object corresponding to the second data set;
    writing the second data set into the file write-out object, and exporting the file write-out object to the terminal corresponding to the output path.
  6. The data management method according to claim 5, wherein the file format comprises second column information, a second conversion format corresponding to the second column information, and a file format type, and the step of generating a second data set from the second data based on the file format and determining the file write-out object corresponding to the second data set comprises:
    generating a second data set from the second data based on the second conversion format and the second column information;
    determining the file write-out object corresponding to the second data set based on the file format type.
  7. The data management method according to claim 5, wherein the step of writing the second data set into the file write-out object comprises:
    traversing the partitions of the second data set, and writing the second data set into the file write-out object one partition at a time.
  8. A data management apparatus, wherein the data management apparatus comprises:
    a reading module, configured to, when a data import request is detected, read data content of first data corresponding to the data import request, and determine, based on the data content, a first data type corresponding to the first data;
    a determining module, configured to determine, based on the first data type, a first data source corresponding to the first data;
    a generating module, configured to determine a first conversion format and first column information of the first data, and generate a first data set from the first data based on the first conversion format and the first column information; and
    an import module, configured to import the first data set into the first data source.
  9. A data management device, wherein the data management device comprises: a memory, a processor, and a data management program stored in the memory and executable on the processor, wherein the data management program, when executed by the processor, implements the steps of the data management method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, wherein a data management program is stored on the computer-readable storage medium, and the data management program, when executed by a processor, implements the steps of the data management method according to any one of claims 1 to 7.
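
As a non-limiting illustration of the export flow recited in claims 5 to 7, the following Python sketch runs a query statement against a data source, exposes the result as partitions, selects a file write-out object from the file format type, and writes one partition at a time. It is a minimal sketch under stated assumptions, not the claimed implementation: an in-memory SQLite database stands in for the third data source, a file-like object stands in for the output path, and the names `query_partitions`, `CsvWriter`, `JsonLinesWriter`, `WRITERS`, `export_data`, and the `partition_size` parameter are hypothetical.

```python
import csv
import io
import json
import sqlite3
from typing import Iterator, List, Tuple

Partition = List[tuple]  # hypothetical: one partition is a list of rows


def query_partitions(
    conn: sqlite3.Connection, sql: str, partition_size: int
) -> Tuple[List[str], Iterator[Partition]]:
    """Run the query statement and expose the result as fixed-size partitions."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]

    def partitions() -> Iterator[Partition]:
        while True:
            chunk = cursor.fetchmany(partition_size)
            if not chunk:
                return
            yield chunk

    return columns, partitions()


class CsvWriter:
    """Hypothetical file write-out object for the 'csv' file format type."""

    def __init__(self, out, columns: List[str]) -> None:
        self._writer = csv.writer(out, lineterminator="\n")
        self._writer.writerow(columns)  # header row taken from the column information

    def write_partition(self, rows: Partition) -> None:
        self._writer.writerows(rows)


class JsonLinesWriter:
    """Hypothetical file write-out object for the 'jsonl' file format type."""

    def __init__(self, out, columns: List[str]) -> None:
        self._out = out
        self._columns = columns

    def write_partition(self, rows: Partition) -> None:
        for row in rows:
            self._out.write(json.dumps(dict(zip(self._columns, row))) + "\n")


# Assumed mapping from file format type to file write-out object.
WRITERS = {"csv": CsvWriter, "jsonl": JsonLinesWriter}


def export_data(conn, sql: str, file_format_type: str, out, partition_size: int = 1000) -> int:
    """Write the query result to `out` one partition at a time; return the partition count."""
    columns, partitions = query_partitions(conn, sql, partition_size)
    writer = WRITERS[file_format_type](out, columns)
    written = 0
    for partition in partitions:  # traverse the partitions of the data set
        writer.write_partition(partition)
        written += 1
    return written
```

Writing one partition at a time keeps at most `partition_size` rows in memory, which is one plausible motivation for the partition-wise write of claim 7; the sketch does not model exporting the written file to a remote terminal.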
PCT/CN2020/102540 2019-07-19 2020-07-17 Data management method and apparatus, and device and computer-readable storage medium WO2021013057A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910655646.6A CN110362630B (en) 2019-07-19 2019-07-19 Data management method, device, equipment and computer readable storage medium
CN201910655646.6 2019-07-19

Publications (1)

Publication Number Publication Date
WO2021013057A1 (en) 2021-01-28

Family

ID=68220364

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/102540 WO2021013057A1 (en) 2019-07-19 2020-07-17 Data management method and apparatus, and device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110362630B (en)
WO (1) WO2021013057A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362630B (en) * 2019-07-19 2023-11-28 深圳前海微众银行股份有限公司 Data management method, device, equipment and computer readable storage medium
CN110990476B (en) * 2019-12-17 2023-04-07 腾讯科技(深圳)有限公司 Data importing method, device, server and storage medium
CN113434606A (en) * 2021-06-30 2021-09-24 青岛海尔科技有限公司 Data import method, device, equipment and medium

Citations (8)

Publication number Priority date Publication date Assignee Title
CN105824849A (en) * 2015-01-08 2016-08-03 中国移动通信集团河南有限公司 Data import method and adapter
CN106951536A (en) * 2017-03-22 2017-07-14 努比亚技术有限公司 Data method for transformation and system
CN107247767A (en) * 2017-06-05 2017-10-13 山东浪潮通软信息科技有限公司 A kind of method and device that database is imported by formatted data files
CN108563768A (en) * 2018-04-19 2018-09-21 中国平安财产保险股份有限公司 Data transfer device, device, equipment and the storage medium of different data model
CN109213756A (en) * 2018-10-22 2019-01-15 北京锐安科技有限公司 Data storage, search method, device, server and storage medium
CN109740359A (en) * 2018-12-28 2019-05-10 上海点融信息科技有限责任公司 Method, apparatus and storage medium for data desensitization
US20190197174A1 (en) * 2017-12-22 2019-06-27 Warevalley Co., Ltd. Method and system for replicating data to heterogeneous database and detecting synchronization error of heterogeneous database through sql packet analysis
CN110362630A (en) * 2019-07-19 2019-10-22 深圳前海微众银行股份有限公司 Data managing method, device, equipment and computer readable storage medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US9268802B2 (en) * 2012-06-26 2016-02-23 Google Inc. System and method for end-to-end exposure of exported representations of native data types to third-party applications
US9696967B2 (en) * 2015-11-09 2017-07-04 Microsoft Technology Licensing, Llc Generation of an application from data
CN108228560A (en) * 2016-12-22 2018-06-29 北京国双科技有限公司 A kind of determining method and device of data type
CN108694241B (en) * 2018-05-14 2023-04-18 平安科技(深圳)有限公司 Data storage method and device
CN108664665A (en) * 2018-05-22 2018-10-16 深圳壹账通智能科技有限公司 Data format method for transformation, device, equipment and readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN110362630A (en) 2019-10-22
CN110362630B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US11442939B2 (en) Configurable and incremental database migration framework for heterogeneous databases
US8601438B2 (en) Data transformation based on a technical design document
WO2021013057A1 (en) Data management method and apparatus, and device and computer-readable storage medium
CN111767095A (en) Micro-service generation method and device, terminal equipment and storage medium
US11294973B2 (en) Codeless information service for abstract retrieval of disparate data
WO2019134340A1 (en) Salary calculation method, application server, and computer readable storage medium
CN109918394B (en) Data query method, system, computer device and computer readable storage medium
WO2019169725A1 (en) Test data generation method, device, and apparatus, and computer readable storage medium
US9262185B2 (en) Scripted dynamic document generation using dynamic document template scripts
WO2023124217A1 (en) Method and device for acquiring comprehensively sorted data of multi-column data
US20180302393A1 (en) Parameterized data delivery system for a spreadsheet application
US20220188336A1 (en) Systems and methods for managing connections in scalable clusters
US20230334238A1 (en) Augmented Natural Language Generation Platform
CN112861182A (en) Database query method and system, computer equipment and storage medium
CN107392560A (en) A kind of Excel list datas issue acquisition method and system based on internet
CN110704635B (en) Method and device for converting triplet data in knowledge graph
US20230153455A1 (en) Query-based database redaction
CN115310127A (en) Data desensitization method and device
US20210182235A1 (en) Metadata-driven distributed dynamic reader and writer
CN110442812A (en) The authority control method and system of front page layout
US20210216886A1 (en) Information providing system and data structure
CN112231377A (en) Data mapping method, system, device, server and storage medium
JP6338909B2 (en) Content control system
US11954224B1 (en) Database redaction for semi-structured and unstructured data
US20230401181A1 (en) Data Management Ecosystem for Databases

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20843937

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20843937

Country of ref document: EP

Kind code of ref document: A1