CN110569317A - metadata collection method and device for data source - Google Patents

Info

Publication number
CN110569317A
Authority
CN
China
Prior art keywords
cluster
metadata
resources
data source
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910866414.5A
Other languages
Chinese (zh)
Inventor
宋柯
张毅然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910866414.5A
Publication of CN110569317A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Abstract

The invention provides a metadata collection method and device for a data source. The method comprises: estimating the required cluster resources according to the metadata scale of the data source to be collected; loading a database table required for running the metadata collection into the memory of the cluster to form a temporary table; and executing, based on the temporary table, the set of SQL statements that collect the metadata. Because the metadata collection SQL for the relational database runs on the cluster, the invention addresses the problem that this SQL cannot be run over a direct JDBC connection to the database when the data volume is large.

Description

Metadata collection method and device for data source
Technical Field
The invention relates to the field of databases, in particular to a metadata acquisition method and device for a data source.
Background
Most existing approaches to collecting metadata from relational data sources such as Hive, MySQL, Oracle, and PostgreSQL connect over JDBC to the database in which each data source's metadata is stored, and then run SQL queries against the database tables that hold the metadata in order to extract the metadata of the data source.
In this approach, the core SQL logic is executed on the server that hosts the metadata database, in the memory of the data source server. This works well when the data volume is small, but when the data volume is large the query may fail to produce a result. Users often lack the permissions to scale up the database server, and scaling up a single machine has limited effect in any case.
Disclosure of Invention
Embodiments of the invention provide a metadata collection method and apparatus for a data source, which at least solve the problem in the related art that the way metadata is collected leaves insufficient computing capability.
According to an embodiment of the present invention, there is provided a metadata collection method for a data source, including: estimating the required cluster resources according to the metadata scale of the data source to be collected; loading a database table required for running the metadata collection into the memory of the cluster to form a temporary table; and executing, based on the temporary table, the set of SQL statements for collecting the metadata.
Preferably, after estimating the required cluster resources according to the metadata scale of the data source to be collected, the method further includes: initializing a cluster session with the estimated cluster resources.
Preferably, the method further includes: when the cluster resources prove insufficient during execution, increasing the cluster resources and reinitializing the cluster session.
Preferably, the cluster resources include memory and CPU resources of the cluster.
According to another embodiment of the present invention, there is provided a metadata collection apparatus for a data source, including: an estimation module, configured to estimate the required cluster resources according to the metadata scale of the data source to be collected; a loading module, configured to load a database table required for running the metadata collection into the memory of the cluster to form a temporary table; and an execution module, configured to execute, based on the temporary table, the set of SQL statements for collecting the metadata.
Preferably, the apparatus further includes: an initialization module, configured to initialize a cluster session with the estimated cluster resources.
Preferably, the apparatus further includes: an adjusting module, configured to increase the cluster resources and reinitialize the cluster session when the cluster resources prove insufficient during execution.
Preferably, the cluster resources include memory and CPU resources of the cluster.
According to a further embodiment of the present invention, there is also provided a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
In the embodiments of the invention, the metadata collection SQL for the relational database runs on the cluster, which solves the problem that this SQL cannot be run over a direct JDBC connection to the database when the data volume is large.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of a computing terminal configured to operate in accordance with a method of an embodiment of the present invention;
FIG. 2 is a flow diagram of a method of metadata collection for a data source according to an embodiment of the invention;
FIG. 3 is a flow diagram of a method for metadata collection of a data source in accordance with an alternative embodiment of the present invention;
FIG. 4 is a schematic diagram of a metadata collection apparatus for a data source according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a metadata collection apparatus of a data source according to an alternative embodiment of the present invention.
Detailed Description
The invention will be described in detail below with reference to the accompanying drawings and in conjunction with embodiments. It should be noted that the embodiments of the present application, and the features of those embodiments, may be combined with each other where they do not conflict.
The method provided by the first embodiment of the present application may be executed on a computer terminal, a server, or a similar computing device. Taking execution on a computer terminal as an example, FIG. 1 is a block diagram of the hardware structure of a computer terminal on which the method of the embodiment of the present invention runs. As shown in FIG. 1, the computer terminal 100 may include one or more processors 102 (only one is shown in FIG. 1), where the processor 102 may include but is not limited to a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA), and a memory 104 for storing data. Optionally, the computer terminal 100 may further include a transmission device 106 for communication functions and an input/output device 108. It will be understood by those skilled in the art that the structure shown in FIG. 1 is only an illustration and is not intended to limit the structure of the computer terminal. For example, the computer terminal 100 may include more or fewer components than shown in FIG. 1, or have a different configuration from that shown in FIG. 1.
The memory 104 can be used for storing computer programs, for example, software programs and modules of application software, such as computer programs corresponding to the methods in the embodiments of the present invention, and the processor 102 executes the computer programs stored in the memory 104 to execute various functional applications and data processing, i.e., to implement the methods described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 100 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. A specific example of such a network is a wireless network provided by the communication provider of the computer terminal 100. In one example, the transmission device 106 includes a network adapter (network interface controller, NIC) through which the terminal can communicate with the internet.
This embodiment provides a metadata collection method for a data source that runs on the computer terminal described above. In the embodiment of the invention, the database tables that store metadata in the different relational databases are loaded into a cluster, and the metadata collection SQL is then run using the computing power of the cluster.
FIG. 2 is a flow chart of the method according to an embodiment of the present invention. As shown in FIG. 2, the flow includes the following steps:
Step S202, estimating the required cluster resources according to the metadata scale of the data source to be collected;
Step S204, loading a database table required for running the metadata collection into the memory of the cluster to form a temporary table;
Step S206, executing, based on the temporary table, the set of SQL statements for collecting the metadata.
After step S202 in this embodiment, the method may further include: initializing a Spark session with the estimated Spark cluster resources.
In step S206 of this embodiment, the method may further include: when the Spark cluster resources prove insufficient during execution, increasing the Spark cluster resources and reinitializing the Spark session.
In this embodiment, the cluster may be a Spark cluster, and the cluster resources may include the memory and the number of CPU cores of the Spark cluster.
In the embodiment of the invention, the metadata collection SQL for relational data sources can be run efficiently to extract the metadata of different relational data sources, so that the metadata of very large relational data sources can be collected efficiently in data management.
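The text does not say how the cluster resources in step S202 are estimated. The following is a minimal sketch under stated assumptions: the metadata scale is given as a row count, and the in-memory footprint is approximated by a fixed bytes-per-row figure multiplied by a safety factor. The class `ClusterResources`, the function `estimate_resources`, and every constant are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterResources:
    """Resources requested for the cluster session (illustrative names)."""
    executor_memory_gb: int
    executor_cores: int
    num_executors: int


def estimate_resources(metadata_rows: int,
                       bytes_per_row: int = 512,
                       overhead_factor: float = 3.0,
                       max_executor_memory_gb: int = 8,
                       cores_per_executor: int = 2) -> ClusterResources:
    """Estimate memory and CPU cores from the metadata scale.

    All defaults are assumptions; the source gives no formula.
    """
    if metadata_rows < 0:
        raise ValueError("metadata_rows must be non-negative")
    # Approximate footprint: raw size times a safety factor for joins and caching.
    needed_gb = metadata_rows * bytes_per_row * overhead_factor / 1024 ** 3
    # Spread the footprint over executors of bounded size, with at least one executor.
    num_executors = max(1, math.ceil(needed_gb / max_executor_memory_gb))
    per_executor_gb = max(1, math.ceil(needed_gb / num_executors))
    return ClusterResources(per_executor_gb, cores_per_executor, num_executors)
```

An estimate of this kind only needs to be roughly right, because the method allows the resources to be increased later if they turn out to be insufficient.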
In order to facilitate an understanding of the technical solutions provided by the embodiments of the present invention, an embodiment of a specific application will be described in detail below.
FIG. 3 shows a metadata collection method for a data source. As shown in FIG. 3, in the present embodiment, the method mainly includes the following steps:
Step S301, estimating the required Spark cluster resources according to the metadata scale of the data source to be collected, and initializing a Spark session with those resources;
Step S302, loading the metadata table into the memory of the Spark cluster to form a temporary table;
Step S303, executing the metadata collection SQL against the temporary table in the Spark cluster;
Step S304, determining whether the execution succeeded; if not, returning to step S301; if so, proceeding to step S305;
Step S305, ending the process.
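The control flow of steps S301 to S305 can be sketched as a retry loop that enlarges the resources after each failed attempt. This is an illustration only: the `run` callable stands in for session initialization, table loading, and SQL execution, and the exception `InsufficientResources`, the doubling factor, and the attempt limit are assumptions that do not appear in the source.

```python
from typing import Callable, Tuple

Resources = Tuple[int, int]  # (memory in GB, CPU cores); illustrative shape


class InsufficientResources(Exception):
    """Raised by the hypothetical run step when the cluster runs out of resources."""


def collect_with_retry(initial: Resources,
                       run: Callable[[Resources], list],
                       growth: int = 2,
                       max_attempts: int = 4) -> list:
    """Follow S301 to S305: run the collection, retrying with more resources."""
    memory_gb, cores = initial
    for _ in range(max_attempts):
        try:
            # S301 to S303 happen inside `run`: it initializes a session with the
            # given resources, loads the temporary table, and executes the SQL.
            return run((memory_gb, cores))  # S304 succeeded, so S305 ends the flow
        except InsufficientResources:
            # S304 failed: go back to S301 with increased resources.
            memory_gb *= growth
            cores *= growth
    raise RuntimeError(f"collection failed after {max_attempts} attempts")
```

The flow chart itself has no upper bound on retries; the attempt limit is added here so that the sketch cannot loop forever.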
In this embodiment, Spark SQL, a component of Spark, may be used to load the contents of each data source's metadata database into the memory of the Spark cluster, so that the analysis of the metadata is performed on a computing cluster with substantial computing power and memory capacity.
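A minimal PySpark sketch of this loading step follows, assuming the data source is Hive and its metastore is kept in a relational database reachable over JDBC. The choice of tables (`DBS`, `TBLS`, `SDS`, `COLUMNS_V2`), the example query, and the function names `session_conf` and `collect_metadata` are assumptions for illustration; the source does not list tables or SQL. Running `collect_metadata` requires PySpark and a suitable JDBC driver on the Spark classpath.

```python
from typing import Dict, Iterable

# Hive metastore tables assumed to hold the metadata (illustrative selection).
METADATA_TABLES = ("DBS", "TBLS", "SDS", "COLUMNS_V2")

# Example collection query over the temporary views: one row per table column.
COLLECT_COLUMNS_SQL = """
SELECT d.NAME AS db_name, t.TBL_NAME AS table_name,
       c.COLUMN_NAME AS column_name, c.TYPE_NAME AS column_type
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID
"""


def session_conf(memory_gb: int, cores: int, executors: int) -> Dict[str, str]:
    """Translate estimated resources into standard Spark settings."""
    return {
        "spark.executor.memory": f"{memory_gb}g",
        "spark.executor.cores": str(cores),
        "spark.executor.instances": str(executors),
    }


def collect_metadata(jdbc_url: str, user: str, password: str,
                     conf: Dict[str, str],
                     tables: Iterable[str] = METADATA_TABLES):
    """Load metadata tables into Spark temporary views and run the collection SQL."""
    from pyspark.sql import SparkSession  # imported lazily; requires PySpark

    builder = SparkSession.builder.appName("metadata-collection")
    for key, value in conf.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    for name in tables:
        df = (spark.read.format("jdbc")
              .option("url", jdbc_url)
              .option("dbtable", name)
              .option("user", user)
              .option("password", password)
              .load())
        df.cache()                        # ask Spark to keep the table in cluster memory
        df.createOrReplaceTempView(name)  # expose it to Spark SQL as a temporary table

    return spark.sql(COLLECT_COLUMNS_SQL).collect()
```

With this arrangement the source database only serves plain table reads, while the joins of the collection SQL run inside the cluster.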
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
This embodiment further provides a metadata collection apparatus for a data source. The apparatus is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 4 is a block diagram of a metadata collection apparatus of a data source according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes an estimation module 10, a loading module 20, and an execution module 30.
The estimation module 10 is configured to estimate the required cluster resources according to the metadata scale of the data source to be collected.
The loading module 20 is configured to load a database table required for running the metadata collection into the memory of the cluster to form a temporary table.
The execution module 30 is configured to execute, based on the temporary table, the set of SQL statements for collecting the metadata.
FIG. 5 is a block diagram of a metadata collection apparatus of a data source according to an alternative embodiment of the present invention. As shown in FIG. 5, the apparatus includes an initialization module 40 and an adjusting module 50 in addition to the estimation module 10, the loading module 20, and the execution module 30 shown in FIG. 4.
The initialization module 40 is configured to initialize a cluster session with the estimated cluster resources.
The adjusting module 50 is configured to increase the cluster resources and reinitialize the cluster session when the cluster resources prove insufficient during execution.
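One way to picture how the five modules of FIG. 5 cooperate in a software implementation is a small class that receives each module as a callable. This is a sketch only: the class name `MetadataCollector`, the dictionary used for resources, the use of `MemoryError` as the signal for insufficient resources, and the attempt limit are all assumptions and are not taken from the source.

```python
from typing import Callable, Dict, List


class MetadataCollector:
    """Wires together modules 10 to 50 (every callable is a placeholder)."""

    def __init__(self,
                 estimate: Callable[[int], Dict[str, int]],           # estimation module 10
                 init_session: Callable[[Dict[str, int]], object],    # initialization module 40
                 load: Callable[[object], None],                      # loading module 20
                 execute: Callable[[object], List[tuple]],            # execution module 30
                 adjust: Callable[[Dict[str, int]], Dict[str, int]],  # adjusting module 50
                 max_attempts: int = 3):
        self.estimate = estimate
        self.init_session = init_session
        self.load = load
        self.execute = execute
        self.adjust = adjust
        self.max_attempts = max_attempts

    def collect(self, metadata_scale: int) -> List[tuple]:
        resources = self.estimate(metadata_scale)
        for _ in range(self.max_attempts):
            session = self.init_session(resources)
            try:
                self.load(session)            # form the temporary table
                return self.execute(session)  # run the collection SQL
            except MemoryError:               # assumed signal for insufficient resources
                resources = self.adjust(resources)
        raise RuntimeError("metadata collection did not succeed")
```

Passing the modules in as callables keeps each one replaceable, which matches the statement that the modules may be realized in software, hardware, or a combination of the two.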
It should be noted that the above modules may be implemented in software or in hardware. In the latter case, this may be done in, but is not limited to, the following ways: the modules are all located in the same processor; or the modules are located in different processors in any combination.
Embodiments of the present invention also provide a storage medium having a computer program stored therein, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing computer programs, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be fabricated separately as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A metadata collection method for a data source, comprising:
estimating the required cluster resources according to the metadata scale of the data source to be collected;
loading a database table required for running the metadata collection into the memory of the cluster to form a temporary table;
and executing, based on the temporary table, the set of SQL statements for collecting the metadata.
2. The method of claim 1, further comprising, after estimating the required cluster resources according to the metadata scale of the data source to be collected:
initializing a cluster session with the estimated cluster resources.
3. The method of claim 1, further comprising:
when the cluster resources prove insufficient during execution, increasing the cluster resources and reinitializing the cluster session.
4. The method of any of claims 1 to 3, wherein the cluster resources comprise memory and CPU resources of the cluster.
5. A metadata collection apparatus for a data source, comprising:
an estimation module, configured to estimate the required cluster resources according to the metadata scale of the data source to be collected;
a loading module, configured to load a database table required for running the metadata collection into the memory of the cluster to form a temporary table;
and an execution module, configured to execute, based on the temporary table, the set of SQL statements for collecting the metadata.
6. The apparatus of claim 5, further comprising:
an initialization module, configured to initialize a cluster session with the estimated cluster resources.
7. The apparatus of claim 5, further comprising:
an adjusting module, configured to increase the cluster resources and reinitialize the cluster session when the cluster resources prove insufficient during execution.
8. The apparatus of any of claims 5 to 7, wherein the cluster resources comprise memory and CPU resources of the cluster.
9. A computer-readable storage medium, in which a computer program is stored, wherein the computer program is arranged to perform the method of any of claims 1 to 4 when executed.
10. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to execute the computer program to perform the method of any of claims 1 to 4.
CN201910866414.5A 2019-09-12 2019-09-12 metadata collection method and device for data source Pending CN110569317A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910866414.5A CN110569317A (en) 2019-09-12 2019-09-12 metadata collection method and device for data source

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910866414.5A CN110569317A (en) 2019-09-12 2019-09-12 metadata collection method and device for data source

Publications (1)

Publication Number Publication Date
CN110569317A (en) 2019-12-13

Family

ID=68779887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910866414.5A Pending CN110569317A (en) 2019-09-12 2019-09-12 metadata collection method and device for data source

Country Status (1)

Country Link
CN (1) CN110569317A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170030853A1 (en) * 2014-11-21 2017-02-02 John William Hodges, JR. Apparatus, Method and System for Distributed Chemical or Biological to Digital Conversion to Digital Information Using Radio Frequencies
CN108681489A (en) * 2018-05-25 2018-10-19 西安交通大学 Method for real-time acquisition and processing of massive data in a supercomputing environment
CN109542867A (en) * 2018-11-26 2019-03-29 成都四方伟业软件股份有限公司 Distributed data collection method and device
CN109800271A (en) * 2019-02-23 2019-05-24 湖北理工学院 Information collection method based on big data
CN110134738A (en) * 2019-05-21 2019-08-16 中国联合网络通信集团有限公司 Resource estimation method and device for a distributed storage system

Similar Documents

Publication Publication Date Title
JP5298117B2 (en) Data merging in distributed computing
CA3109481A1 (en) Identification and application of hyperparameters for machine learning
WO2019148713A1 (en) Sql statement processing method and apparatus, computer device, and storage medium
US8959229B1 (en) Intelligently provisioning cloud information services
CN109344126B (en) Method and device for processing map, storage medium and electronic device
CN108509453B (en) Information processing method and device
CN102957622A (en) Method, device and system for data processing
CN108875035B (en) Data storage method of distributed file system and related equipment
CN106407395A (en) A processing method and device for data query
CN108073641B (en) Method and device for querying data table
CN112182031B (en) Data query method and device, storage medium and electronic device
CN110990381B (en) Processing method and device of server, storage medium and electronic device
US8667008B2 (en) Search request control apparatus and search request control method
CN110609924A (en) Method, device and equipment for calculating total quantity relation based on graph data and storage medium
CN113885971A (en) State management method and device based on self-adaptive platform system
CN109388552B (en) Method and device for determining duration of starting application program and storage medium
CN106940710B (en) Information pushing method and device
CN110569317A (en) metadata collection method and device for data source
CN114466387B (en) Updating method and device of configuration file of base station, storage medium and electronic device
CN110716938A (en) Data aggregation method and device, storage medium and electronic device
CN116226178A (en) Data query method and device, storage medium and electronic device
CN111881086B (en) Big data storage method, query method, electronic device and storage medium
CN115563160A (en) Data processing method, data processing device, computer equipment and computer readable storage medium
CN112100208A (en) Operation request forwarding method and device
CN111008220A (en) Dynamic identification method and device of data source, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191213