CN112818070A

CN112818070A - Data query method and device based on global data dictionary and electronic equipment

Info

Publication number: CN112818070A
Application number: CN202110131307.5A
Authority: CN
Inventors: 高飞龙; 赵卓辉; 扶明明
Original assignee: Beijing Qibao Xinan Technology Co ltd
Current assignee: Beijing Qibao Xinan Technology Co ltd
Priority date: 2021-01-30
Filing date: 2021-01-30
Publication date: 2021-05-18

Abstract

The invention discloses a data query method and apparatus based on a global data dictionary, and an electronic device. The method comprises: periodically capturing data table information and field information from each heterogeneous data source; overwriting a first data table with the data table information and a second data table with the field information; acquiring remark information and priority information of the data tables and fields in each heterogeneous data source; storing the remark information and priority information of the data tables and of the fields into a third data table and a fourth data table, respectively; and querying the global data dictionary according to query information input by a user and displaying target data according to the priority information in the global data dictionary. With this method, data tables and fields in different heterogeneous data sources can be queried quickly through the global data dictionary, and queries on multiple attributes of data tables and fields, such as names, comments, types, and remarks, are supported.

Description

Data query method and device based on global data dictionary and electronic equipment

Technical Field

The invention relates to the technical field of data development, and in particular to a data query method and apparatus based on a global data dictionary, an electronic device, and a computer-readable medium.

Background

Driven by technological innovations such as cloud computing, data now permeate every industry and business function and have become an important factor of production, and big data development is well under way.

During big data development, data are scattered across different heterogeneous data systems (such as hive, mysql, and the like). A single data requirement, for example data about an "order", may have to be searched for in several heterogeneous systems, and as the data volume grows, the number of data tables reaches the order of 3,000 and the number of table fields the order of 43,000. As a result, querying data in heterogeneous data systems is complex and inefficient.

Disclosure of Invention

The invention aims to solve the technical problem that, during data development, querying data across heterogeneous data systems is complex and inefficient.

In order to solve the above technical problem, a first aspect of the present invention provides a data query method based on a global data dictionary, where the global data dictionary includes a first data table, a second data table, a third data table, and a fourth data table, and the method includes:

periodically capturing data table information and field information from each heterogeneous data source; overwriting the first data table with the data table information, and overwriting the second data table with the field information;

acquiring remark information and priority information of the data tables in each heterogeneous data source, and remark information and priority information of the fields in each heterogeneous data source;

storing remark information and priority information of the data tables in each heterogeneous data source into the third data table; storing remark information and priority information of fields in each heterogeneous data source into the fourth data table;

and querying the global data dictionary according to query information input by a user, and displaying target data according to the priority information in the global data dictionary.

According to a preferred embodiment of the invention, the data table information and the field information are periodically captured from the various heterogeneous data sources by a crawler.

According to a preferred embodiment of the present invention, acquiring the remark information of a data table or field in each heterogeneous data source includes:

obtaining remark information of a data table or a field from the heterogeneous data source;

and/or receiving remark information of a data table or a field input by a user.

According to a preferred embodiment of the present invention, the priority information of the data table is generated according to the frequency of use of the data table by the user, and the priority information of the field is generated according to the frequency of use of the field by the user.

According to a preferred embodiment of the present invention, the data table information includes at least one of a data table name and a data table comment, and the field information includes at least one of a field name, a field type, and a field comment.

According to a preferred embodiment of the present invention, the heterogeneous data source includes: at least one of a relational database management system mysql, a data warehouse tool hive, and an open data processing service ODPS.

In order to solve the above technical problem, a second aspect of the present invention provides a data query apparatus based on a global data dictionary, the global data dictionary including a first data table, a second data table, a third data table and a fourth data table, the apparatus comprising:

the overwriting module is configured to periodically capture data table information and field information from each heterogeneous data source, overwrite the first data table with the data table information, and overwrite the second data table with the field information;

the obtaining module is configured to obtain remark information and priority information of the data tables in each heterogeneous data source, and remark information and priority information of the fields in each heterogeneous data source;

the storage module is used for storing remark information and priority information of the data tables in each heterogeneous data source into the third data table; storing remark information and priority information of fields in each heterogeneous data source into the fourth data table;

and the query display module is configured to query the global data dictionary according to query information input by a user and display target data according to the priority information in the global data dictionary.

According to a preferred embodiment of the present invention, the overwriting module periodically captures the data table information and the field information from the heterogeneous data sources by means of a crawler.

According to a preferred embodiment of the present invention, the obtaining module is specifically configured to obtain remark information of a data table or field from the heterogeneous data source;

and/or to receive remark information of a data table or field entered by a user.

According to a preferred embodiment of the present invention, the obtaining module specifically generates the priority information of the data table according to the frequency of using the data table by the user, and generates the priority information of the field according to the frequency of using the field by the user.

To solve the above technical problem, a third aspect of the present invention provides an electronic device, comprising:

a processor; and

a memory storing computer executable instructions that, when executed, cause the processor to perform the method described above.

To solve the above technical problems, a fourth aspect of the present invention provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the above method.

According to the invention, a global data dictionary containing a first data table, a second data table, a third data table, and a fourth data table is created. The data table information and field information periodically captured from each heterogeneous data source overwrite the first data table and the second data table, respectively; the remark information and priority information of the data tables in each heterogeneous data source are stored in the third data table, and those of the fields in the fourth data table. When querying, the user can quickly find data tables and fields in different heterogeneous data sources through the global data dictionary, without switching between heterogeneous data sources. The invention supports querying data tables and fields by name, comment, type, remark, and so on; it allows remarks to be added to data tables and fields, automatically assigns priorities to data tables and fields, and displays results in priority order, making it easier for the user to find the target data.

Drawings

In order to make the technical problems solved by the present invention, the technical means adopted, and the technical effects obtained clearer, embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below illustrate only exemplary embodiments of the invention, from which those skilled in the art can derive other embodiments without inventive effort.

FIG. 1 is a flow chart of a data query method based on a global data dictionary according to the present invention;

FIG. 2a is a schematic diagram of the input of query information and the display of target data in accordance with the present invention;

FIG. 2b is a schematic diagram of another embodiment of the present invention for inputting query information and displaying target data;

FIG. 3 is a block diagram of a data query apparatus based on a global data dictionary according to the present invention;

FIG. 4 is a block diagram of an exemplary embodiment of an electronic device in accordance with the present invention;

FIG. 5 is a schematic diagram of one embodiment of a computer-readable medium of the present invention.

Detailed Description

Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The exemplary embodiments may, however, be embodied in many specific forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the invention to those skilled in the art.

The structures, properties, effects or other characteristics described in a certain embodiment may be combined in any suitable manner in one or more other embodiments, while still complying with the technical idea of the invention.

In describing particular embodiments, specific details of structures, properties, effects, or other features are set forth to give those skilled in the art a thorough understanding of the embodiments. However, this does not exclude that a person skilled in the art may, in a specific case, implement the invention without these structures, properties, effects, or other features.

The flow chart in the drawings is only an exemplary flow demonstration, and does not represent that all the contents, operations and steps in the flow chart are necessarily included in the scheme of the invention, nor does it represent that the execution is necessarily performed in the order shown in the drawings. For example, some operations/steps in the flowcharts may be divided, some operations/steps may be combined or partially combined, and the like, and the execution order shown in the flowcharts may be changed according to actual situations without departing from the gist of the present invention.

The block diagrams in the figures generally represent functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The same reference numerals denote the same or similar elements, components, or parts throughout the drawings, and a repeated description of them may therefore be omitted hereinafter. It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these elements, components, or sections should not be limited by these terms; the terms are used only to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention. Furthermore, the term "and/or" is intended to include any and all combinations of one or more of the listed items.

Referring to FIG. 1, FIG. 1 is a flowchart of a data query method based on a global data dictionary. The global data dictionary includes a first data table, a second data table, a third data table, and a fourth data table: the first data table stores data table information periodically captured from each heterogeneous data source; the second data table stores field information periodically captured from each heterogeneous data source; the third data table stores remark information and priority information of the data tables in each heterogeneous data source; and the fourth data table stores remark information and priority information of the fields in each heterogeneous data source. As shown in FIG. 1, the method includes:

S1. Periodically capturing data table information and field information from each heterogeneous data source; overwriting the first data table with the data table information, and overwriting the second data table with the field information;

In the present invention, the heterogeneous data sources correspond to different data development systems and may specifically include the relational database management system mysql, the data warehouse tool hive, the open data processing service ODPS, and the like. The data table information identifies a data table and may specifically include the data table name, the data table comment, and so on; the field information identifies a field and may specifically include the field name, the field type, the field comment, and so on. The field types may include: binary data types (Binary, Varbinary, Image); character data types (Char, Varchar, Text); Unicode data types (Nchar, Nvarchar, Ntext); date and time data types (Datetime, Smalldatetime, Date, TimeStamp); numeric data types, covering positive and negative numbers, decimals, and integers; the currency data type, which represents positive or negative monetary amounts; and special data types (Timestamp, Bit, Uniqueidentifier).
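As a minimal sketch of what one capture pass collects, the Python below reads table names and column names and types from a SQLite database standing in for a heterogeneous data source. This swaps in a direct metadata query for the crawler the patent describes, and the function name `capture_metadata` and the row layout are hypothetical. For a mysql source, the equivalent information would typically come from `information_schema.TABLES` (`TABLE_NAME`, `TABLE_COMMENT`) and `information_schema.COLUMNS` (`COLUMN_NAME`, `COLUMN_TYPE`, `COLUMN_COMMENT`); hive and ODPS are not covered here.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical row layouts for the first and second data tables.
TableRow = Tuple[str, str, Optional[str]]            # (source, table_name, table_comment)
FieldRow = Tuple[str, str, str, str, Optional[str]]  # (source, table_name, field_name, field_type, field_comment)


def capture_metadata(source: str, conn: sqlite3.Connection) -> Tuple[List[TableRow], List[FieldRow]]:
    """Collect data table information and field information from one source.

    SQLite stores no table or column comments, so the comment slots are None.
    """
    tables: List[TableRow] = []
    fields: List[FieldRow] = []
    names = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table_name,) in names:
        tables.append((source, table_name, None))
        # Quote the identifier; PRAGMA table_info rows are
        # (cid, name, type, notnull, dflt_value, pk), in column order.
        quoted = '"' + table_name.replace('"', '""') + '"'
        for _cid, field_name, field_type, *_ in conn.execute(f"PRAGMA table_info({quoted})"):
            fields.append((source, table_name, field_name, field_type, None))
    return tables, fields
```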

In one example, the data table information and field information are captured periodically from the heterogeneous data sources by a crawler. A crawler is a program that automatically captures information from the Internet; it mainly consists of a scheduler, a URL manager, a web page downloader, a web page parser, and an application that uses the valuable data captured (in the present invention, the data table information and field information). Specifically, a fixed interval can be set for the periodic capture task; for example, with the interval set to one day, the crawler captures data table information and field information from each heterogeneous data source at a fixed time every day.
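The one-day capture interval at a fixed time point can be sketched as follows. The `capture` callback (one capture pass over all sources) is hypothetical, as is any concrete capture time passed in; in practice a cron entry or job scheduler would usually play this role.

```python
import time
from datetime import datetime, timedelta
from typing import Callable


def next_daily_run(now: datetime, hour: int, minute: int = 0) -> datetime:
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


def run_daily(capture: Callable[[], None], hour: int, minute: int = 0) -> None:
    """Call `capture` once a day at hour:minute (blocks forever)."""
    while True:
        now = datetime.now()
        time.sleep((next_daily_run(now, hour, minute) - now).total_seconds())
        capture()
```

For example, `run_daily(capture_all_sources, hour=2)` would run a hypothetical `capture_all_sources` every day at 02:00.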

In the embodiment of the invention, data tables or fields may be added, modified, or deleted in any heterogeneous data source. The periodically captured data table information therefore overwrites the previous contents of the first data table, and the periodically captured field information overwrites the previous contents of the second data table, ensuring that the first data table and the second data table reflect all data changes in each heterogeneous data source.
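One way to realize this overwrite is to replace, per data source and inside a single transaction, everything previously captured from that source, so that added tables appear, modified ones are refreshed, and deleted ones disappear. The sketch assumes SQLite and the hypothetical table names `dict_table` (first data table) and `dict_field` (second data table); the patent specifies neither the storage engine nor the schema.

```python
import sqlite3
from typing import Iterable, Sequence

# Hypothetical layout of the first and second data tables.
SNAPSHOT_SCHEMA = """
CREATE TABLE IF NOT EXISTS dict_table (
    source TEXT NOT NULL, table_name TEXT NOT NULL, table_comment TEXT
);
CREATE TABLE IF NOT EXISTS dict_field (
    source TEXT NOT NULL, table_name TEXT NOT NULL, field_name TEXT NOT NULL,
    field_type TEXT, field_comment TEXT
);
"""


def overwrite_snapshot(conn: sqlite3.Connection, source: str,
                       table_rows: Iterable[Sequence], field_rows: Iterable[Sequence]) -> None:
    """Replace everything previously captured from `source` with the latest capture."""
    conn.executescript(SNAPSHOT_SCHEMA)
    with conn:  # one transaction: commit on success, roll back on any error
        conn.execute("DELETE FROM dict_table WHERE source = ?", (source,))
        conn.execute("DELETE FROM dict_field WHERE source = ?", (source,))
        conn.executemany("INSERT INTO dict_table VALUES (?, ?, ?)", table_rows)
        conn.executemany("INSERT INTO dict_field VALUES (?, ?, ?, ?, ?)", field_rows)
```

Deleting by source rather than truncating the whole table keeps a failed or delayed capture of one source from wiping the others.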

S2. Obtaining remark information and priority information of the data tables in each heterogeneous data source, and remark information and priority information of the fields in each heterogeneous data source;

In one example, users can add remarks to data tables and fields in the heterogeneous data sources through a front-end user interface, which makes the data easier to understand and query. The remark information of a data table or field entered by a user can therefore be received through the front-end page, thereby obtaining the remark information of the data tables or fields in each heterogeneous data source. In another example, the heterogeneous data source itself stores remarks for its data tables or fields, so the remark information can be obtained directly from the heterogeneous data source; in particular, it can be captured periodically by the crawler. The invention can also obtain the remark information of the data tables or fields in each heterogeneous data source by both methods and combine the results into the final remark information. The remark information includes, but is not limited to, the data source city, the data application scenario, the data source date, and the like.
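The patent does not say how remarks from the two channels are combined. A minimal sketch, assuming remarks are plain strings keyed by a hypothetical `(source, table)` or `(source, table, field)` tuple and that combining means concatenating the distinct remarks, with the data source's own remark first:

```python
from typing import Dict, Hashable, List


def merge_remarks(source_remarks: Dict[Hashable, str],
                  user_remarks: Dict[Hashable, str],
                  sep: str = "; ") -> Dict[Hashable, str]:
    """Combine remarks read from the data sources with remarks entered by users."""
    merged: Dict[Hashable, str] = {}
    for key in source_remarks.keys() | user_remarks.keys():
        parts: List[str] = []
        for remark in (source_remarks.get(key), user_remarks.get(key)):
            # Skip missing or empty remarks and exact duplicates.
            if remark and remark not in parts:
                parts.append(remark)
        merged[key] = sep.join(parts)
    return merged
```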

The priority information of a data table or field can be generated according to how frequently users use it. The usage frequency can be determined from the interaction logs recorded when users open a data table or click a field: specifically, the name of the queried data table or field can be obtained by parsing the parameters of the user's GET query request, and the usage frequency is the number of interaction logs that contain that data table name or field name. Sorting the data tables by usage frequency in descending order yields the priority information of the data tables, and sorting the fields by usage frequency in descending order yields the priority information of the fields.
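The log-based ranking described above can be sketched as follows, assuming each interaction log entry is the URL of a GET request and that the data table name travels in a hypothetical `table` query parameter (and the field name in `field`). Ties are broken alphabetically, which is an assumption; the patent only specifies descending order of frequency.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import parse_qs, urlsplit


def rank_by_usage(request_urls: Iterable[str], param: str = "table") -> Dict[str, int]:
    """Map each name to its priority rank (1 = most frequently used)."""
    counts: Counter = Counter()
    for url in request_urls:
        # Count each log entry at most once per name, i.e. "number of logs containing the name".
        names = set(parse_qs(urlsplit(url).query).get(param, []))
        counts.update(names)
    # Most used first; ties broken alphabetically.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {name: rank for rank, (name, _) in enumerate(ordered, start=1)}
```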

S3. Storing the remark information and priority information of the data tables in each heterogeneous data source into the third data table; storing the remark information and priority information of the fields in each heterogeneous data source into the fourth data table;
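The patent leaves the layout of the third and fourth data tables open. A minimal SQLite sketch, with the hypothetical table names `table_meta` and `field_meta`, in which storing a record again for the same data table or field replaces its remark and priority:

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical layout of the third (table_meta) and fourth (field_meta) data tables.
META_SCHEMA = """
CREATE TABLE IF NOT EXISTS table_meta (
    source     TEXT NOT NULL,
    table_name TEXT NOT NULL,
    remark     TEXT,
    priority   INTEGER,           -- 1 = most frequently used
    PRIMARY KEY (source, table_name)
);
CREATE TABLE IF NOT EXISTS field_meta (
    source     TEXT NOT NULL,
    table_name TEXT NOT NULL,
    field_name TEXT NOT NULL,
    remark     TEXT,
    priority   INTEGER,
    PRIMARY KEY (source, table_name, field_name)
);
"""


def create_meta_tables(conn: sqlite3.Connection) -> None:
    conn.executescript(META_SCHEMA)


def store_table_meta(conn: sqlite3.Connection,
                     rows: Iterable[Tuple[str, str, Optional[str], Optional[int]]]) -> None:
    """Store (source, table_name, remark, priority) rows; existing rows are replaced."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO table_meta VALUES (?, ?, ?, ?)", rows)


def store_field_meta(conn: sqlite3.Connection,
                     rows: Iterable[Tuple[str, str, str, Optional[str], Optional[int]]]) -> None:
    """Store (source, table_name, field_name, remark, priority) rows; existing rows are replaced."""
    with conn:
        conn.executemany("INSERT OR REPLACE INTO field_meta VALUES (?, ?, ?, ?, ?)", rows)
```

Keeping remarks and priorities apart from the first and second data tables means the periodic overwrite in step S1 does not discard user-entered remarks; this is one plausible reason for the split, not one the patent states.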

S4. Querying the global data dictionary according to the query information input by the user, and displaying the target data according to the priority information in the global data dictionary.

Preferably, before this step, a query interface is displayed to the user through the front end; the user enters query information in the query interface, and the system then starts the query.

As shown in FIG. 2a, the query information may be a data table name and/or a data table comment, which the user enters in the "search bar"; the target data displayed in response comprise, for each data table matching the query information: the data table name, the name of the data source where the data table is located, the data table comment, the data table remark, and the name of the business corresponding to the data table.

As shown in FIG. 2b, the query information may also be at least one of a field name, a field comment, or a field type, which the user enters in the "search bar"; the target data displayed in response comprise, for each field matching the query information: the field name, the name of the data source where the field is located, the name of the data table containing the field, the field type, the field comment, the field remark, and the name of the business corresponding to the field.

In this embodiment, when data tables are queried, the data tables matching the query information are displayed in priority order: data tables with higher priority are listed before those with lower priority. Alternatively, the data table with the highest priority is displayed in the most prominent position of the interface, such as its central area, and lower-priority data tables are arranged around it in priority order; or higher-priority data tables are displayed in a larger font and lower-priority data tables in a smaller font.
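Step S4 with the first display option can be sketched as a case-insensitive substring search followed by a sort on priority. The `Entry` record, the `search` function, and the rule that entries without a recorded priority are listed last are assumptions made for illustration; a production system would more likely push the matching into SQL or a search index.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Entry:
    kind: str                       # "table" or "field"
    source: str                     # data source name, e.g. "mysql"
    name: str
    comment: str = ""
    field_type: str = ""            # empty for data tables
    remark: str = ""
    priority: Optional[int] = None  # 1 = most used; None = no usage recorded


def search(entries: Iterable[Entry], keyword: str, kind: str) -> List[Entry]:
    """Return entries of `kind` matching `keyword`, highest priority first."""
    kw = keyword.lower()

    def matches(e: Entry) -> bool:
        # Case-insensitive substring match on name, comment, type, and remark.
        return any(kw in text.lower() for text in (e.name, e.comment, e.field_type, e.remark))

    hits = [e for e in entries if e.kind == kind and matches(e)]
    # Smaller priority number first; entries without a priority go last, then by name.
    return sorted(hits, key=lambda e: (e.priority is None, e.priority or 0, e.name))
```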

Further, the global data dictionary may also include a fifth data table and a sixth data table, and the method further comprises: periodically identifying the data tables and fields that users use frequently, storing the information of frequently used data tables in the fifth data table, and storing the information of frequently used fields in the sixth data table. When a user is detected logging in to the query interface, the data table information in the fifth data table and/or the field information in the sixth data table is displayed directly. A data table or field is considered frequently used if the number of times users use it within a specified time period exceeds a preset frequency. In this way, as soon as users log in to the query interface they see the frequently used data tables and fields and can find the target data without searching, which makes data queries more convenient.
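The selection behind the fifth and sixth data tables reduces to a windowed count compared against the preset frequency. In this sketch the seven-day window and the threshold of 10 are placeholder defaults, and usage events are assumed to be available as hypothetical `(timestamp, name)` pairs:

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def frequently_used(events: Iterable[Tuple[datetime, str]], now: datetime,
                    window: timedelta = timedelta(days=7), min_uses: int = 10) -> List[str]:
    """Return names used more than `min_uses` times in [now - window, now], most used first."""
    start = now - window
    counts = Counter(name for ts, name in events if start <= ts <= now)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    # Strictly greater than the preset frequency, as described above.
    return [name for name, uses in ranked if uses > min_uses]
```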

Fig. 3 is a schematic structural diagram of a data query apparatus based on a global data dictionary, the global data dictionary including a first data table, a second data table, a third data table and a fourth data table, as shown in fig. 3, the apparatus includes:

the overwriting module 31 is configured to periodically capture data table information and field information from each heterogeneous data source, overwrite the first data table with the data table information, and overwrite the second data table with the field information;

an obtaining module 32 is configured to obtain remark information and priority information of the data tables in each heterogeneous data source, and remark information and priority information of the fields in each heterogeneous data source;

the storage module 33 is configured to store the remark information and the priority information of the data table in each heterogeneous data source into the third data table; storing remark information and priority information of fields in each heterogeneous data source into the fourth data table;

and the query display module 34 is configured to query the global data dictionary according to query information input by a user and display target data according to the priority information in the global data dictionary.

In one embodiment, the overwriting module 31 periodically captures the data table information and the field information from the heterogeneous data sources by means of a crawler.

The obtaining module 32 is specifically configured to obtain remark information of a data table or field from the heterogeneous data source;

and/or to receive remark information of a data table or field entered by a user.

The obtaining module 32 is further specifically configured to generate priority information of the data table according to the frequency of using the data table by the user, and generate priority information of the field according to the frequency of using the field by the user.

In the embodiment of the present invention, the data table information includes at least one of a data table name and a data table comment; the field information includes at least one of a field name, a field type, and a field comment. The heterogeneous data sources include at least one of the relational database management system mysql, the data warehouse tool hive, and the open data processing service ODPS.

Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.

In the following, embodiments of the electronic device of the present invention are described, which may be regarded as an implementation in physical form for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.

Fig. 4 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. The electronic device shown in fig. 4 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 4, the electronic device 400 of the exemplary embodiment is represented in the form of a general-purpose data processing device. The components of electronic device 400 may include, but are not limited to: at least one processing unit 410, at least one memory unit 420, a bus 430 connecting different electronic device components (including the memory unit 420 and the processing unit 410), a display unit 440, and the like.

The storage unit 420 stores a computer-readable program, which may be source program code or read-only program code. The program may be executed by the processing unit 410 such that the processing unit 410 performs the steps of various embodiments of the present invention. For example, the processing unit 410 may perform the steps shown in FIG. 1.

The storage unit 420 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 4201 and/or a cache memory unit 4202, and may further include a read-only memory unit (ROM) 4203. The storage unit 420 may also include a program/utility 4204 having a set of (at least one) program modules 4205, including but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.

Bus 430 may represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.

The electronic device 400 may also communicate with one or more external devices 300 (e.g., a keyboard, display, network device, or Bluetooth device), so that a user can interact with the electronic device 400 via the external devices 300, and/or so that the electronic device 400 can communicate with one or more other data processing devices (e.g., a router or modem). Such communication may take place via input/output (I/O) interfaces 450, and the electronic device 400 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via a network adapter 460. The network adapter 460 may communicate with other modules of the electronic device 400 via the bus 430. It should be appreciated that, although not shown in FIG. 4, other hardware and/or software modules may be used in the electronic device 400, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

FIG. 5 is a schematic diagram of one computer-readable medium embodiment of the present invention. As shown in FIG. 5, the computer program may be stored on one or more computer-readable media. The computer-readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. The computer program, when executed by one or more data processing devices, enables the computer-readable medium to implement the above-described method of the invention, namely: periodically capturing data table information and field information from each heterogeneous data source; overwriting the first data table with the data table information, and overwriting the second data table with the field information; acquiring remark information and priority information of the data tables in each heterogeneous data source, and remark information and priority information of the fields in each heterogeneous data source; storing the remark information and priority information of the data tables in each heterogeneous data source into the third data table; storing the remark information and priority information of the fields in each heterogeneous data source into the fourth data table; and querying the global data dictionary according to query information input by a user and displaying target data according to the priority information in the global data dictionary.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a data processing device (such as a personal computer, a server, or a network device) to execute the above-mentioned method according to the present invention.

The computer-readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the Internet using an Internet service provider).

In summary, the present invention can be implemented as a method, an apparatus, an electronic device, or a computer-readable medium executing a computer program. Some or all of the functions of the present invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP).

The foregoing embodiments describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and that various general-purpose machines may be used to implement it. The invention is not limited to the specific embodiments described; all modifications, equivalents and improvements that come within the spirit and scope of the invention are intended to be covered.

Claims

1. A data query method based on a global data dictionary, wherein the global data dictionary comprises a first data table, a second data table, a third data table and a fourth data table, the method comprising:

2. The method of claim 1, wherein the data table information and the field information are periodically crawled from the heterogeneous data sources by a crawler.

3. The method according to any one of claims 1-2, wherein acquiring the remark information of the data tables or fields in each heterogeneous data source comprises:

and/or receiving remark information of a data table or a field input by a user.

4. The method according to any one of claims 1-3, wherein the priority information of a data table is generated based on the frequency with which users use that data table, and the priority information of a field is generated based on the frequency with which users use that field.

5. The method according to any one of claims 1-4, wherein the data table information comprises at least one of a data table name and a data table comment, and the field information comprises at least one of a field name, a field type, and a field comment.

6. The method according to any one of claims 1-5, wherein the heterogeneous data sources comprise at least one of the relational database management system MySQL, the data warehouse tool Hive, and the Open Data Processing Service (ODPS).

7. A global data dictionary based data query apparatus, wherein the global data dictionary comprises a first data table, a second data table, a third data table and a fourth data table, the apparatus comprising:

8. An electronic device, comprising:

a processor; and

a memory storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-6.

9. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-6.
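Claim 4 ties priority to how often users use a data table or field but leaves the exact mapping open. A minimal Python sketch, assuming a hypothetical usage log whose entries use the same `(source, table)` and `(source, table, field)` identifiers as above, and taking the raw use count as the priority. Both the log format and the count-as-priority rule are assumptions, not the patent's stated method.

```python
from collections import Counter


def priorities_from_usage(usage_log):
    """Count uses per data table and per field.

    Assumptions (not from the patent): each log entry is a tuple identifier,
    a 2-tuple (source, table) for a data table or a 3-tuple
    (source, table, field) for a field, and priority == raw use count.
    """
    table_counts = Counter()
    field_counts = Counter()
    for key in usage_log:
        if len(key) == 2:
            table_counts[key] += 1
        elif len(key) == 3:
            field_counts[key] += 1
        else:
            raise ValueError(f"unrecognised identifier: {key!r}")
    return dict(table_counts), dict(field_counts)


if __name__ == "__main__":
    log = [
        ("mysql", "orders"),
        ("mysql", "orders", "order_id"),
        ("mysql", "orders"),
        ("hive", "order_log"),
    ]
    table_prio, field_prio = priorities_from_usage(log)
    print(table_prio)  # {('mysql', 'orders'): 2, ('hive', 'order_log'): 1}
    print(field_prio)  # {('mysql', 'orders', 'order_id'): 1}
```

The two returned mappings correspond to the priority columns of the third and fourth data tables; a ranking or time-decayed count could replace the raw count without changing how the query step consumes the result.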