CN114490834B

CN114490834B - Method and device for replacing big data calculation operation data source based on Kubernetes

Info

Publication number: CN114490834B
Application number: CN202210357263.2A
Authority: CN
Inventors: 王伟华; 刘井山; 樊宇; 梅进
Original assignee: Gradient Cloud Technology Beijing Co ltd
Current assignee: Gradient Cloud Technology Beijing Co ltd
Priority date: 2022-04-07
Filing date: 2022-04-07
Publication date: 2022-06-21
Anticipated expiration: 2042-04-07
Also published as: CN114490834A

Abstract

The invention provides a method and a device for replacing the data source of a big data computing job based on Kubernetes. It saves the time spent searching for the data source to be connected among many data sources and greatly improves the efficiency of locating that data source.

Description

Method and device for replacing big data calculation operation data source based on Kubernetes

Technical Field

The invention belongs to the field of computer technology and, in particular, relates to a method and a device for replacing the data source of a big data computing job based on Kubernetes.

Background

At present, when a big data computing algorithm connects to a data source, the data source's connection information, such as the host address, port, user name, password, certificate, and path, must be written into the algorithm. When the data source is replaced, the connection information of the new data source must first be looked up among a large number of data sources, and then all of these items (host address, port, user name, password, certificate, path, and so on) must be written into the algorithm again.

In practice, a company's many big data algorithms need to switch data sources frequently. Finding the connection information of a new data source among a large number of data sources takes too much time, and because the host address, port, user name, password, certificate, path, and other connection information must be rewritten in the algorithm every time a data source is replaced, writing this information into the algorithms creates a heavy workload and makes data source updates inefficient.

Disclosure of Invention

The technical problem to be solved by the invention is how to make connecting to data sources more efficient in big data computing; to this end, the invention provides a method and a device for replacing the data source of a big data computing job based on Kubernetes.

To solve the above technical problem, the invention adopts the following technical solution:

A method for replacing the data source of a big data computing job based on Kubernetes comprises the following steps:

Step 1: for the access information of each data source to be connected, create a serialized string of a data source connection object, i.e., a string produced by serializing a data source connection object created in a programming language;

Step 2: edit the serialized string into a configuration resource of the Kubernetes container orchestration software, and create the configuration resource in Kubernetes from the edited definition;

Step 3: according to the name of the data source to be connected, the big data algorithm obtains the corresponding configuration resource from the created configuration resources and extracts the serialized string of the data source connection object from it;

Step 4: deserialize the serialized string of the data source connection object to obtain the data source's connection object, which is then used to connect.
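The four steps can be sketched end to end in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a plain dictionary stands in for the Kubernetes API server, Python's `pickle` stands in for whatever serialization facility the chosen language provides, the base64 step (to keep the pickled bytes text-safe) is an added assumption, and the class `DataSourceConnection` and the data key name `connection` are hypothetical.

```python
import base64
import pickle
from dataclasses import dataclass


@dataclass
class DataSourceConnection:
    """Hypothetical connection object holding a data source's access information."""
    host: str
    port: int
    user: str
    password: str
    path: str
    certificate: str = ""


def serialize(conn: DataSourceConnection) -> str:
    # Step 1: serialize the object; base64 turns the binary pickle into plain text.
    return base64.b64encode(pickle.dumps(conn)).decode("ascii")


def create_config_resource(store: dict, name: str, serialized: str) -> None:
    # Step 2: the resource is named after the data source; its data holds the string.
    store[name] = {"metadata": {"name": name}, "data": {"connection": serialized}}


def fetch_serialized(store: dict, name: str) -> str:
    # Step 3: look the resource up by data source name and extract the string.
    return store[name]["data"]["connection"]


def deserialize(serialized: str) -> DataSourceConnection:
    # Step 4: reverse the serialization to rebuild the connection object in memory.
    return pickle.loads(base64.b64decode(serialized))


store: dict = {}  # stand-in for the Kubernetes API server (assumption)
create_config_resource(
    store, "hdfs-prod",
    serialize(DataSourceConnection("10.0.0.5", 8020, "etl", "s3cret", "/warehouse")))
conn = deserialize(fetch_serialized(store, "hdfs-prod"))
print(conn.host, conn.port)  # 10.0.0.5 8020
```

The algorithm itself only ever handles the name `"hdfs-prod"`; every other connection detail travels inside the stored string.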

Further, the data source access information in step 1 is the information the big data algorithm needs to access the data source.

Further, in step 2, editing the serialized string into a configuration resource of the Kubernetes container orchestration software means:

designating the name key of the configuration resource as the name of the data source corresponding to the serialized string;

designating the data key of the configuration resource as the serialized string.
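As an illustration of this naming convention, a ConfigMap manifest could be built as below, with the resource name set to the data source name and one data entry holding the serialized string. This is a sketch under assumptions: the helper `build_configmap`, the entry key `connection`, and the placeholder value are hypothetical; the name check reflects the Kubernetes rule that object names such as ConfigMap names must be DNS subdomain names. The resulting JSON could then be created with `kubectl apply -f`.

```python
import json
import re

# Kubernetes object names such as ConfigMap names must be DNS subdomain names.
_DNS_SUBDOMAIN = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*")


def build_configmap(data_source_name: str, serialized: str,
                    data_key: str = "connection") -> dict:
    """Build a ConfigMap manifest: name = data source name, data = serialized string."""
    if len(data_source_name) > 253 or not _DNS_SUBDOMAIN.fullmatch(data_source_name):
        raise ValueError(f"not a valid resource name: {data_source_name!r}")
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": data_source_name},  # name key -> data source name
        "data": {data_key: serialized},          # data key -> serialized string
    }


manifest = build_configmap("mysql-orders", "PLACEHOLDER_SERIALIZED_STRING")
print(json.dumps(manifest, indent=2))
```

Validating the name up front matters because the data source name doubles as the lookup key in step 3, so a name Kubernetes would reject cannot be used as a data source name either.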

Further, in step 3, the configuration resource corresponding to the data source is obtained from the name of the data source to be connected as follows:

the configuration resource of the data source to be connected is queried with a query command provided by the Kubernetes container orchestration software, using the name of that data source.
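A name-based query of this kind might be issued from a job as sketched below. The sketch assumes `kubectl` is installed and configured for the cluster; the helper names, the `default` namespace, and the data key `connection` are hypothetical. `kubectl get` with `-o jsonpath` prints only the selected field; for a Secret, the printed value is still base64-encoded by Kubernetes and must be decoded once more.

```python
from __future__ import annotations

import shlex
import subprocess


def kubectl_get_argv(kind: str, data_source_name: str, data_key: str = "connection",
                     namespace: str = "default") -> list[str]:
    """Build a kubectl query that looks a resource up by data source name."""
    # -o jsonpath prints only the chosen entry of the resource's data field.
    return ["kubectl", "get", kind, data_source_name, "-n", namespace,
            "-o", "jsonpath={.data." + data_key + "}"]


def fetch_serialized(kind: str, data_source_name: str) -> str:
    """Run the query; needs kubectl on PATH and access to a cluster."""
    result = subprocess.run(kubectl_get_argv(kind, data_source_name),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Show the command line without contacting a cluster.
print(shlex.join(kubectl_get_argv("configmap", "mysql-orders")))
```

Passing an argument list to `subprocess.run` (rather than a shell string) keeps the data source name from being interpreted by a shell.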

Further, in step 4, the serialized string of the data source connection object is deserialized as follows: the inverse of the programming language's serialization operation is applied to restore the serialized string to the data source connection object in memory.
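With Python's `pickle` as the serialization mechanism (an assumption; the embodiment itself uses Java serialization), the inverse operation is unpickling. Because unpickling can invoke any callable the data refers to, the sketch below only allows the expected connection class to be rebuilt, following the "restricting globals" pattern described in the Python `pickle` documentation. The class `DataSourceConnection` and the base64 wrapping are hypothetical choices.

```python
import base64
import io
import pickle
from dataclasses import dataclass


@dataclass
class DataSourceConnection:
    """Hypothetical connection object restored from the stored string."""
    host: str
    port: int
    user: str
    password: str
    path: str


class ConnectionUnpickler(pickle.Unpickler):
    """Unpickler that only allows DataSourceConnection to be reconstructed."""

    def find_class(self, module, name):
        if (module, name) == (DataSourceConnection.__module__, DataSourceConnection.__name__):
            return DataSourceConnection
        raise pickle.UnpicklingError(f"global {module}.{name} is not allowed")


def deserialize(serialized: str) -> DataSourceConnection:
    # Inverse of base64(pickle.dumps(obj)): decode the text, then unpickle safely.
    raw = base64.b64decode(serialized)
    return ConnectionUnpickler(io.BytesIO(raw)).load()


stored = base64.b64encode(pickle.dumps(
    DataSourceConnection("db.example", 5432, "etl", "s3cret", "/orders"))).decode("ascii")
print(deserialize(stored).host)  # db.example
```

Any stored string that references another class or function is rejected with `pickle.UnpicklingError` instead of being executed.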

Further, the programming language is one that provides object serialization.

Further, the configuration resources of the Kubernetes container orchestration software include Secret resources and ConfigMap resources.

The invention also provides a device for replacing the data source of a big data computing job based on Kubernetes, characterized by comprising the following modules:

a serialized string generation module, used for creating, for the access information of each data source to be connected, a serialized string of a data source connection object, i.e., a string produced by serializing a data source connection object created in a programming language;

a configuration resource creation module, used for editing the serialized strings into configuration resources of the Kubernetes container orchestration software and creating the configuration resources in Kubernetes from the edited definitions;

a serialized string extraction module, used by the big data algorithm to obtain, according to the name of the data source to be connected, the corresponding configuration resource from the created configuration resources and to extract the serialized string of the data source connection object from it;

a connection module, used for deserializing the serialized string extracted by the serialized string extraction module to obtain the data source's connection object, which is then used to connect.

Further, in the connection module, the serialized string is deserialized by applying the inverse of the programming language's serialization operation, which restores the serialized string to the data source connection object in memory.

Adopting the above technical solution gives the invention the following beneficial effects:

In the method and device of the invention, the connection objects of different data sources are stored as strings in configuration resources of the Kubernetes container orchestration software, so that even as the number of data sources grows, the data source to be connected can be found quickly with the query commands Kubernetes provides. This saves the time spent searching for the data source to be connected among many data sources and greatly improves the efficiency of locating it.

In addition, when the big data algorithm switches to a new data source, the host address, port, user name, password, certificate, path, and other connection information of the new data source no longer need to be rewritten in the algorithm; only the name of the new data source needs to be written. The serialized string of the data source connection object is obtained through the Kubernetes container orchestration software and deserialized into the connection object of the new data source, through which the new data source can be accessed directly. This removes the work of rewriting all of that connection information in the algorithm, makes data source updates in big data algorithms more efficient, and benefits the company.

Drawings

FIG. 1 is a flow chart of the system of the present invention.

Detailed Description

The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

FIG. 1 shows an embodiment of the method of the invention for replacing the data source of a big data computing job based on Kubernetes, which comprises the following steps:

Step 1: for the access information of each data source, create a serialized string of a data source connection object, i.e., a string produced by serializing a data source connection object created in a programming language. In this embodiment, the data source access information is the information the big data algorithm needs to access the data source, including the host address, port, user name, password, certificate, and path, and the programming language is one that provides object serialization, such as Java, Python, or Go.

Step 2: edit the serialized string into a configuration resource of the Kubernetes container orchestration software, and create the configuration resource in Kubernetes from the edited definition.

In this embodiment, the configuration resources of the Kubernetes container orchestration software include Secret resources and ConfigMap resources. Editing the serialized string into a configuration resource means designating the name key of the configuration resource as the name of the data source corresponding to the serialized string, and designating the data key of the configuration resource as the serialized string. The content of the data key has a key-value pair structure. Configuration resources can be created in Kubernetes with commands such as kubectl apply.

For example, for several HDFS file system data sources, a Secret resource is edited with a text editor for each one: the name key of the Secret resource is set to the name of the HDFS data source, and the data key of the Secret resource is set to the serialized string of the connection object created for the HDFS data source named by that name key. In this embodiment, a serialized string of a data source connection object is created from the access information of each data source, and the different data source connection objects are stored as strings in configuration resources of the Kubernetes container orchestration software. As the number of data sources grows, the configuration resource of the data source to be connected can be found quickly with the query commands Kubernetes provides, and the access information of that data source can then be read from the configuration resource. This saves the time spent searching for the data source to be connected among many data sources and greatly improves the efficiency of locating it.
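Instead of editing each Secret by hand, the Secrets for several HDFS data sources could be generated as one manifest, as sketched below. The helper names, the data key `connection`, and the placeholder strings are hypothetical. Kubernetes requires values under a Secret's `data` field to be base64-encoded (plain strings can go under `stringData` instead), and a `v1` `List` lets a single `kubectl apply -f -` create every Secret in it.

```python
from __future__ import annotations

import base64
import json


def build_secret(data_source_name: str, serialized: str,
                 data_key: str = "connection") -> dict:
    """Secret named after the HDFS data source, holding its serialized connection object."""
    # Values under a Secret's data field must be base64-encoded.
    encoded = base64.b64encode(serialized.encode("utf-8")).decode("ascii")
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "type": "Opaque",
        "metadata": {"name": data_source_name},
        "data": {data_key: encoded},
    }


def build_secret_list(sources: dict[str, str]) -> dict:
    """Wrap one Secret per data source in a v1 List so a single apply creates them all."""
    return {"apiVersion": "v1", "kind": "List",
            "items": [build_secret(name, s) for name, s in sorted(sources.items())]}


hdfs_sources = {"hdfs-a": "SERIALIZED_A", "hdfs-b": "SERIALIZED_B"}
# The printed JSON could be piped into: kubectl apply -f -
print(json.dumps(build_secret_list(hdfs_sources), indent=2))
```

Sorting by name only makes the output deterministic; Kubernetes does not care about item order.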

Step 3: according to the name of the data source to be connected, the big data algorithm obtains the corresponding configuration resource from the created configuration resources and extracts the serialized string of the data source connection object from it. In this embodiment, the configuration resource is obtained from the Kubernetes container orchestration software with commands such as kubectl get, and the serialized string containing the data source connection object is extracted from the data key of the configuration resource. Because the name key of each configuration resource is the name of a data source, the configuration resource of the data source to be connected, and hence its connection information, can be found quickly with the query commands Kubernetes provides.
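If the whole resource is fetched with `kubectl get <kind> <name> -o json`, the serialized string can be pulled out of its `data` field as sketched below. The function name and the data key `connection` are hypothetical, and the sample JSON is made up and trimmed to the fields used. Secret values come back base64-encoded and are decoded once; ConfigMap values are returned as stored.

```python
import base64
import json


def extract_serialized(resource_json: str, data_key: str = "connection") -> str:
    """Return the serialized connection string from `kubectl get <kind> <name> -o json` output."""
    resource = json.loads(resource_json)
    value = resource["data"][data_key]
    if resource.get("kind") == "Secret":
        # Kubernetes returns Secret data base64-encoded; ConfigMap data comes back as plain text.
        value = base64.b64decode(value).decode("utf-8")
    return value


# Sample output shaped like `kubectl get secret hdfs-a -o json` (trimmed, made-up values).
sample = json.dumps({
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "hdfs-a"},
    "data": {"connection": base64.b64encode(b"SERIALIZED_A").decode("ascii")},
})
print(extract_serialized(sample))  # SERIALIZED_A
```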

In this embodiment, the serialized string of the data source connection object is deserialized by applying the inverse of the programming language's serialization operation, which restores the serialized string to the data source connection object in memory. When the program runs, it can then connect to the data source directly without the connection information being rewritten into the algorithm, which removes much of the work of editing the algorithm and makes data source updates more efficient.

In this embodiment, when the big data algorithm switches to a new HDFS file system data source, the Secret resource of that data source is obtained by its name with a kubectl command provided by the Kubernetes container orchestration software, and the serialized string of the data source's connection object is read from the data key of the Secret resource. The serialized string is then deserialized with the Java serialization module to obtain the HDFS data source connection object, through which the new HDFS data source is connected.
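The effect of this embodiment, where switching HDFS data sources only changes a name, can be sketched as follows. The embodiment uses Java serialization; this sketch swaps in Python's `pickle`, and the class `HdfsConnection`, its `uri()` method, and the data source names are hypothetical. A dictionary stands in for Secrets already created in the cluster. Strings from untrusted sources should not be passed to `pickle.loads` directly.

```python
import base64
import pickle
from dataclasses import dataclass


@dataclass
class HdfsConnection:
    """Hypothetical HDFS connection object; a real job would open an HDFS client from it."""
    namenode_host: str
    port: int
    user: str
    path: str

    def uri(self) -> str:
        return f"hdfs://{self.namenode_host}:{self.port}{self.path}"


def serialize(conn: HdfsConnection) -> str:
    return base64.b64encode(pickle.dumps(conn)).decode("ascii")


def run_job(fetch, data_source_name: str) -> str:
    """The job's only data source setting is the name; fetch(name) returns the stored string."""
    conn = pickle.loads(base64.b64decode(fetch(data_source_name)))
    return conn.uri()


# Dictionary standing in for Secrets already created in the cluster (assumption).
stored_secrets = {
    "hdfs-old": serialize(HdfsConnection("nn-old.example", 8020, "etl", "/data")),
    "hdfs-new": serialize(HdfsConnection("nn-new.example", 9000, "etl", "/data")),
}
print(run_job(lambda name: stored_secrets[name], "hdfs-old"))  # hdfs://nn-old.example:8020/data
print(run_job(lambda name: stored_secrets[name], "hdfs-new"))  # hdfs://nn-new.example:9000/data
```

Replacing the data source is the one-argument change from `"hdfs-old"` to `"hdfs-new"`; the host and port differences are carried entirely by the stored strings.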

The invention also provides a device for replacing the data source of a big data computing job based on Kubernetes, which comprises the following modules:

a connection module, used for deserializing the serialized string extracted by the serialized string extraction module to obtain the data source's connection object, which is then used to connect. In the connection module of this embodiment, the serialized string is deserialized by applying the inverse of the programming language's serialization operation, which restores the serialized string to the data source connection object in memory.

When the big data algorithm switches to a new data source, the host address, port, user name, password, certificate, path, and other connection information of the new data source do not need to be rewritten in the algorithm; only the name of the new data source to be connected needs to be written. The serialized string of the data source connection object is obtained through the Kubernetes container orchestration software and deserialized into the connection object of the new data source, through which the new data source can be accessed directly. This removes the work of rewriting all of that connection information, makes data source updates in big data algorithms more efficient, and benefits the company.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for replacing the data source of a big data computing job based on Kubernetes, characterized by comprising the following steps:

step 2: editing the serialized strings into configuration resources of the Kubernetes container orchestration software, and creating the configuration resources in Kubernetes from the edited definitions;

step 4: deserializing the serialized string of the data source connection object to obtain the data source's connection object, which is used to connect;

wherein, in step 2, editing the serialized string into a configuration resource of the Kubernetes container orchestration software means:

designating the data key of the configuration resource as the serialized string.

2. The method according to claim 1, wherein the data source access information in step 1 is information required by a big data algorithm to access the data source.

3. The method according to claim 2, wherein in step 3, the method for obtaining the configuration resource corresponding to the data source according to the name of the data source to be connected is:

4. The method according to claim 1, wherein, in step 4, the serialized string of the data source connection object is deserialized by applying the inverse of the programming language's serialization operation, which restores the serialized string to the data source connection object in memory.

5. The method according to any one of claims 1 to 4, wherein the programming language is one that provides object serialization.

6. The method according to any one of claims 1 to 4, wherein the configuration resources of the Kubernetes container orchestration software comprise Secret resources and ConfigMap resources.

7. An apparatus for replacing the data source of a big data computing job based on Kubernetes, characterized by comprising the following modules:

wherein editing the serialized string into a configuration resource of the Kubernetes container orchestration software means:

designating the data key of the configuration resource as a serialized character string;

8. The apparatus according to claim 7, wherein the connection module deserializes the serialized string by applying the inverse of the programming language's serialization operation, which restores the serialized string to the data source connection object in memory.