CN116628066B

CN116628066B - Data transmission method, device, computer equipment and storage medium

Info

Publication number: CN116628066B
Application number: CN202310892885.XA
Authority: CN
Inventors: 石志林
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2023-07-20
Filing date: 2023-07-20
Publication date: 2024-01-09
Anticipated expiration: 2043-07-20
Also published as: CN116628066A

Abstract

The present application relates to a data transmission method, apparatus, computer device, storage medium, and computer program product, and involves artificial intelligence techniques. The method includes: determining a target database that is to receive data from a source database, and creating a data pipeline between the source database and the target database; determining a data format optimization strategy matched with the source database according to a data conversion mode that the source database supports during data transmission; updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and converting the data to be transmitted in the source database according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion; and transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline. The method can improve data transmission efficiency and thereby improve the processing efficiency of database resource scheduling.

Description

Data transmission method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technology, and in particular, to a data transmission method, apparatus, computer device, storage medium, and computer program product.

Background

With the development of computer technology, the internet world is being filled with a vast amount of data, including various types of data such as text, images, music, sound, video, etc., from sources such as advertising records, shopping consumption records, browsed web pages, transmitted messages, game records, medical records, traffic records, etc.

Data on the internet is typically stored and managed in databases, i.e., repositories that organize, store, and manage data according to a data structure. Data from different business scenarios or platforms is often stored in different databases, and these databases frequently need to interact, which generates a large amount of data movement between them. At present, however, the efficiency of data transmission between different databases is low.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a data transmission method, apparatus, computer device, computer-readable storage medium, and computer program product capable of improving data transmission efficiency.

In a first aspect, the present application provides a data transmission method. The method comprises the following steps:

determining a target database to which data in a source database is to be transmitted, and creating a data pipeline between the source database and the target database;

determining a data format optimization strategy matched with the source database according to a data conversion mode supported by the source database in a data transmission process;

updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and converting the data to be transmitted in the source database according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion; the data to be transmitted after the data format conversion comprises data which is obtained based on the character string conversion operation in the updated data conversion mode and belongs to the object array type;

and transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.
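
The four steps of the first aspect can be pictured end to end with a minimal in-memory sketch. Everything here is an assumption for illustration: `Database`, `DataPipeline`, `STRATEGIES`, and the mode labels `"direct"` and `"external_library"` are hypothetical, and each "strategy" is reduced to the string conversion function it selects; the "object array" is modelled as a Python list of strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Database:
    """Hypothetical in-memory stand-in for a database."""
    name: str
    conversion_mode: str                      # assumed label, e.g. "direct"
    rows: List[tuple] = field(default_factory=list)


@dataclass
class DataPipeline:
    """Hypothetical direct channel from a source to a target database."""
    source: Database
    target: Database

    def transmit(self, batch: List[List[str]]) -> None:
        # Deliver converted rows straight to the target (no intermediate store).
        self.target.rows.extend(tuple(row) for row in batch)


# Step 1: create the pipeline between source and target.
def create_pipeline(source: Database, target: Database) -> DataPipeline:
    return DataPipeline(source, target)


# Step 2: assumed mapping from conversion mode to an optimization strategy;
# here a strategy is simply the string conversion operation it installs.
STRATEGIES: Dict[str, Callable[[object], str]] = {
    "direct": str,
    "external_library": repr,
}


def select_strategy(source: Database) -> Callable[[object], str]:
    return STRATEGIES[source.conversion_mode]


# Step 3: apply the updated string conversion; each row becomes a list of
# string objects rather than one concatenated text record.
def convert(rows: List[tuple], to_str: Callable[[object], str]) -> List[List[str]]:
    return [[to_str(v) for v in row] for row in rows]


# Step 4: transmit the converted rows through the pipeline.
def transfer(source: Database, target: Database) -> None:
    pipeline = create_pipeline(source, target)
    to_str = select_strategy(source)
    pipeline.transmit(convert(source.rows, to_str))
```

For example, transferring rows `(1, "x")` and `(2.5, None)` from a source whose mode is `"direct"` leaves the target holding `("1", "x")` and `("2.5", "None")`.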

In a second aspect, the present application further provides a data transmission device. The device comprises:

the data pipeline creation module is used for determining a target database to be subjected to data transmission with the source database and creating a data pipeline between the source database and the target database;

the format optimization strategy determining module is used for determining a data format optimization strategy matched with the source database according to a data conversion mode supported by the source database in a data transmission process;

the data format conversion module is used for updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and performing data format conversion on the data to be transmitted in the source database according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion; the data to be transmitted after the data format conversion comprises data which is obtained based on the character string conversion operation in the updated data conversion mode and belongs to the object array type;

and the data pipeline transmission module is used for transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the data transmission method described above when the processor executes the computer program.

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data transmission method described above.

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when being executed by a processor, implements the steps of the data transmission method described above.

With the above data transmission method, apparatus, computer device, storage medium, and computer program product, for a source database and a target database between which data is to be transmitted, a data pipeline between the two databases is created; a data format optimization strategy is determined according to the data conversion mode that the source database supports during data transmission; the character string conversion operation in that data conversion mode is updated based on the strategy; the data to be transmitted is format-converted according to the updated data conversion mode, the result including data of the object array type obtained through the updated character string conversion operation; and the data to be transmitted after the data format conversion is transmitted from the source database to the target database through the created data pipeline.
Because the format-converted data travels directly between the source database and the target database through the constructed data pipeline, no intermediate storage needs to be introduced for relaying it. In addition, the character string conversion operation yields data of the object array type, and the data format optimization strategy, being determined from the data conversion mode that the source database supports, applies a format conversion targeted at the source database. Together these simplify the format conversion of data transmitted between different databases and improve the data transmission efficiency between them.

In a sixth aspect, the present application provides a data transmission method. The method comprises the following steps:

determining a source database from which data is to be transmitted to a target database, and creating a data pipeline between the target database and the source database;

receiving, through the data pipeline, the data to be transmitted after data format conversion that is sent from the source database; the data to be transmitted after the data format conversion is obtained by updating, based on the data format optimization strategy, the character string conversion operation in the data conversion mode supported by the source database in the data transmission process, and performing data format conversion on the data to be transmitted in the source database according to the updated data conversion mode; the data to be transmitted after the data format conversion comprises data which is obtained based on the character string conversion operation in the updated data conversion mode and belongs to the object array type;

and converting the data to be transmitted after the data format conversion into a target data format supported by a target database, and storing the data to be transmitted belonging to the target data format into the target database.
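
On the receiving side, the last step converts the transmitted string arrays into the target database's own format before storing them. The sketch below assumes, purely for illustration, that the target format is described by a hypothetical per-column parser table (`TargetSchema`) and that the target table is a plain list of dicts.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical target schema: column name -> parser from the transmitted
# string form into the target database's native value type.
TargetSchema = Dict[str, Callable[[str], object]]


def to_target_format(batch: Sequence[Sequence[str]], schema: TargetSchema) -> List[dict]:
    """Convert received string arrays into rows in the target's format."""
    columns = list(schema)
    rows = []
    for values in batch:
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(values)}")
        rows.append({col: schema[col](raw) for col, raw in zip(columns, values)})
    return rows


def receive_and_store(batch: Sequence[Sequence[str]], schema: TargetSchema,
                      target_table: List[dict]) -> int:
    """Receive a batch from the pipeline, convert it, and store it."""
    rows = to_target_format(batch, schema)
    target_table.extend(rows)
    return len(rows)
```

Validating the row width before conversion keeps a malformed batch from being partially stored.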

In a seventh aspect, the present application further provides a data transmission apparatus. The device comprises:

the data pipeline creation module is used for determining a source database to be subjected to data transmission with the target database and creating a data pipeline between the target database and the source database;

the data pipeline transmission module is used for receiving, through the data pipeline, the data to be transmitted after data format conversion that is sent from the source database; the data to be transmitted after the data format conversion is obtained by updating, based on the data format optimization strategy, the character string conversion operation in the data conversion mode supported by the source database in the data transmission process, and performing data format conversion on the data to be transmitted in the source database according to the updated data conversion mode; the data to be transmitted after the data format conversion comprises data which is obtained based on the character string conversion operation in the updated data conversion mode and belongs to the object array type;

the data storage module is used for converting the data to be transmitted after the data format conversion into a target data format supported by the target database and storing the data to be transmitted belonging to the target data format into the target database.

In an eighth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the data transmission method described above when the processor executes the computer program.

In a ninth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the data transmission method described above.

In a tenth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when being executed by a processor, implements the steps of the data transmission method described above.

With the above data transmission method, apparatus, computer device, storage medium, and computer program product, for a source database and a target database between which data is to be transmitted, a data pipeline between the two databases is created, and the data to be transmitted after data format conversion is received from the source database through the data pipeline. That data is obtained by updating, based on a data format optimization strategy, the character string conversion operation in the data conversion mode that the source database supports during data transmission, and converting the data to be transmitted in the source database according to the updated data conversion mode. The received data is then converted into a target data format supported by the target database and stored in the target database. Because the format-converted data travels directly between the source database and the target database through the constructed data pipeline, no intermediate storage needs to be introduced for relaying it. In addition, the character string conversion operation yields data of the object array type, and the data format optimization strategy applies a format conversion targeted at the source database. Together these simplify the format conversion of data transmitted between different databases and improve the data transmission efficiency between them.

Drawings

FIG. 1 is a diagram of an application environment for a data transmission method in one embodiment;

FIG. 2 is a flow chart of a data transmission method in one embodiment;

FIG. 3 is a timing diagram of a data transmission method in one embodiment;

FIG. 4 is a flow diagram of determining a process communication interface in one embodiment;

FIG. 5 is a flow chart of a data transmission method according to another embodiment;

FIG. 6 is a timing diagram of a data transmission method according to another embodiment;

FIG. 7 is a flow diagram of coordinating data transfer between worker threads via a worker directory in one embodiment;

FIG. 8 is a flow diagram of a data pipeline generation in one embodiment;

FIG. 9 is a flow diagram of data transfer via an add-in in one embodiment;

FIG. 10 is a block diagram of a data transmission device in one embodiment;

FIG. 11 is a block diagram of a data transmission device in another embodiment;

FIG. 12 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The data transmission method provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. The terminal 102 communicates with the source database management server 104 and the target database management server 106 via a network. The source database management server 104 manages the source database, and the target database management server 106 manages the target database. The source database stores the data that is to be transferred to the target database. The source database may be integrated on the source database management server 104 or located in the cloud or on another server; likewise, the target database may be integrated on the target database management server 106 or located in the cloud or on another server.

The user may select, on the terminal 102, a source database and a target database between which data is to be transmitted, and send a data transmission request through the terminal 102 to the source database management server 104 and the target database management server 106 to instruct them to transmit the data. During data transmission, the source database management server 104 may create a data pipeline between the source database and the target database; specifically, the source database management server 104 may construct the data pipeline together with the target database management server 106. The source database management server 104 determines a data format optimization strategy according to the data conversion mode that the source database supports during data transmission, updates the character string conversion operation in that data conversion mode based on the strategy, and performs data format conversion on the data to be transmitted according to the updated data conversion mode. The resulting data to be transmitted after the data format conversion includes data of the object array type obtained through the updated character string conversion operation. The source database management server 104 then transmits this data from the source database to the target database through the created data pipeline; specifically, the data may be sent through the data pipeline from the source database management server 104 to the target database management server 106, which stores it in the target database.

Further, during data transmission, the target database management server 106 may create the data pipeline between the source database and the target database; specifically, the target database management server 106 may construct the data pipeline together with the source database management server 104. The target database management server 106 receives, through the data pipeline, the data to be transmitted after data format conversion sent from the source database. This data is obtained by the source database management server 104 updating, based on a data format optimization policy, the character string conversion operation in the data conversion mode that the source database supports during data transmission, and converting the data to be transmitted in the source database according to the updated data conversion mode; it includes data of the object array type obtained through the updated character string conversion operation. The target database management server 106 converts the received data into a target data format supported by the target database and stores it in the target database.

The terminal 102 may be, but is not limited to, a desktop computer, notebook computer, smartphone, tablet computer, Internet of Things device, or portable wearable device. The Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart in-vehicle device, or the like, and the portable wearable device may be a smart watch, smart bracelet, headset, or the like. The source database management server 104 and the target database management server 106 may each be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal 102 may be connected to the source database management server 104 and to the target database management server 106 directly or indirectly through wired or wireless communication.

Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks within a wide area network or local area network to realize the computation, storage, processing, and sharing of data. It is a general term for the network, information, integration, management platform, and application technologies applied under the cloud computing business model; it can form resource pools that are used on demand, flexibly and conveniently, and cloud computing technology will become an important support for it. The back-end services of technical network systems, such as video websites, image websites, and other portals, require large amounts of computing and storage resources. With the rapid development and application of the internet industry, each item may in future carry its own identifier, which must be transmitted to a back-end system for logical processing; data of different levels will be processed separately, and all kinds of industry data will need strong back-end system support, which can only be provided through cloud computing.

Specifically, cloud computing refers to a delivery and usage model for IT infrastructure, in which required resources are obtained over a network in an on-demand, easily scalable manner; in a broader sense, cloud computing refers to a delivery and usage model for services, in which required services are obtained over a network in an on-demand, easily scalable manner. Such services may be IT, software, or internet-related services, or other services. Cloud computing is the product of the convergence of traditional computing and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage technologies, virtualization, and load balancing. Driven by the development of the internet, real-time data streams, the diversification of connected devices, and growing demand for search services, social networks, mobile commerce, and open collaboration, cloud computing has developed rapidly. Unlike earlier parallel distributed computing, cloud computing is expected to bring about a revolutionary conceptual transformation of the entire internet model and of enterprise management models.

The data transmission method provided by the embodiments of the present application may be implemented based on artificial intelligence (AI) technology. Artificial intelligence is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making. Artificial intelligence is a comprehensive discipline covering a wide range of fields, with both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. The data transmission method may rely on artificial intelligence technology for processes such as determining the data format optimization strategy and performing data format conversion, further improving data transmission efficiency.

In one embodiment, as shown in FIG. 2, a data transmission method is provided. The method is executed by a computer device, which may be a terminal or a server, or may be executed by the terminal and the server together. In this embodiment, the method is described, by way of example, as applied to the server connected to the source database in FIG. 1, namely the source database management server 104, and includes the following steps:

Step 202, determining a target database to which data in the source database is to be transmitted, and creating a data pipeline between the source database and the target database.

A database is a repository for storing data; it has a large storage capacity and can hold millions, tens of millions, or even hundreds of millions of records. Data can be transmitted between different databases; for example, data in database A can be transmitted to database B. A database can be managed by a database management system (DBMS), which handles storage, retrieval, security, backup, and similar processing. Data transmission between different databases can be implemented through their respective database management systems. For example, suppose database A is managed by database management system 1 and database B by database management system 2. When data in database A is transmitted to database B, database management system 1 reads the data to be transmitted from database A and sends it to database management system 2, which writes the received data into database B, thereby completing the data transmission between database A and database B.

The source database is the database from which data is exported during data transmission, and the target database is the database into which data is imported. For example, when data in database A is transferred into database B, database A acts as the source database and database B as the target database; when data in database B is transferred into database A, database B acts as the source database and database A as the target database. A data pipeline is a channel through which data is transferred between different databases, i.e., data flows from the source database to the target database within the data pipeline. The data pipeline can be created based on the communication interfaces of the source database and the target database, so that data is transmitted directly between them without introducing intermediate storage for relaying. Data in various formats, such as text or binary data, may be transmitted in the data pipeline according to actual needs.
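
The property that matters here, rows flowing from the exporting side to the importing side without being staged in a file or third database, can be illustrated in a single process. This is only a sketch under a stated assumption: a bounded `queue.Queue` between two threads stands in for the pipeline, whereas the pipeline described here connects separate database servers through their communication interfaces. `run_pipeline` and the sentinel `_END` are hypothetical names.

```python
import queue
import threading
from typing import Iterable, List

_END = object()  # sentinel marking the end of the stream


def run_pipeline(source_rows: Iterable[str], target: List[str], capacity: int = 4) -> None:
    """Stream rows from a source iterator into a target list through a
    bounded in-memory channel; nothing is written to intermediate storage."""
    channel: "queue.Queue[object]" = queue.Queue(maxsize=capacity)

    def export_side() -> None:
        try:
            for row in source_rows:
                channel.put(row)  # blocks while the channel is full (backpressure)
        finally:
            channel.put(_END)     # always signal the end, even on error

    def import_side() -> None:
        while True:
            item = channel.get()
            if item is _END:
                break
            target.append(item)

    producer = threading.Thread(target=export_side)
    consumer = threading.Thread(target=import_side)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
```

With a single producer and a single consumer the FIFO queue preserves row order, and the bounded capacity means at most `capacity` rows are buffered in flight at any time.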

Specifically, when data in the source database needs to be transmitted to another database, the server connected to the source database determines the target database for the transmission, that is, the database into which the data exported from the source database is to be imported. The server can then create a data pipeline between the source database and the target database; specifically, it can create a pipeline that directly connects the two according to their respective communication interfaces, so that data can be transmitted directly between them through the pipeline. In a specific application, the server may receive a data transmission request sent by a user client, determine the target database based on the request, determine the communication interface of the target database, and create the data pipeline between the source database and the target database in combination with the communication interface of the source database.
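
The request-driven pipeline creation in step 202 might be organised as in the sketch below. The `Endpoint`, `TransferRequest`, and `PipelineSpec` types, the registry of communication interfaces, and the protocol-compatibility check are all hypothetical assumptions; the text only states that the target is resolved from the request and that the pipeline is built from both databases' communication interfaces.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical communication interface of a database."""
    host: str
    port: int
    protocol: str  # assumed, e.g. "tcp"


@dataclass(frozen=True)
class TransferRequest:
    """Hypothetical data transmission request sent by a user client."""
    source_db: str
    target_db: str


@dataclass(frozen=True)
class PipelineSpec:
    """Description of a direct pipeline between two interfaces."""
    source: Endpoint
    target: Endpoint


def create_pipeline(request: TransferRequest, registry: Dict[str, Endpoint]) -> PipelineSpec:
    """Resolve both databases' interfaces and describe a direct pipeline."""
    try:
        source = registry[request.source_db]
        target = registry[request.target_db]
    except KeyError as missing:
        raise LookupError(f"unknown database: {missing.args[0]}") from None
    # Assumed rule: both ends must speak the same transport protocol.
    if source.protocol != target.protocol:
        raise ValueError("source and target interfaces use different protocols")
    return PipelineSpec(source, target)
```

Returning a plain description rather than an open connection keeps the sketch testable; a real implementation would open the channel from this specification.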

Step 204, determining a data format optimization strategy matched with the source database according to the data conversion mode supported by the source database in the data transmission process.

A data conversion mode is a manner of converting the format of the data to be transmitted. Data stored in different databases may have different data formats, and during data transmission the data often needs to be converted into a format suitable for transmission, such as a text format or a binary format. Different databases support different data conversion modes for transmission, such as performing the format conversion directly or calling an external format library to perform it. For each data conversion mode supported by the source database, format conversion can be optimized with a corresponding optimization strategy. The data format optimization strategy is determined based on the data conversion mode supported by the source database and is used to optimize the format conversion that the source database performs when transmitting exported data, so as to simplify format conversion processing. For example, the data format optimization strategy may convert a specific type of data in a specific manner, remove redundancy from the data, or simplify the data encoding process.
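
One of the example strategies, converting specific data types in specific ways to get a more compact encoding, could look like the table-driven sketch below. The particular converters (`"1"`/`"0"` for booleans, hex for bytes, an empty string for nulls) are illustrative assumptions, not choices stated in the text.

```python
from datetime import date
from typing import Callable, Dict

Converter = Callable[[object], str]

# Hypothetical optimization strategy: type-specific converters that each
# choose a compact text form for one data type.
COMPACT_TEXT_STRATEGY: Dict[type, Converter] = {
    bool: lambda v: "1" if v else "0",  # shorter than "True"/"False"
    int: str,
    float: repr,                        # shortest round-trippable form
    bytes: lambda v: v.hex(),
    date: lambda v: v.isoformat(),
    type(None): lambda v: "",           # drop the redundant "None" text
}


def convert_value(value: object, strategy: Dict[type, Converter]) -> str:
    # Exact-type lookup, so bool is not treated as int (bool subclasses int).
    converter = strategy.get(type(value))
    if converter is None:
        return str(value)  # fallback: default string conversion
    return converter(value)
```

Keeping the strategy as data (a dict) rather than code makes it straightforward to hold one strategy per source database and swap them at runtime.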

Specifically, the server may determine the data conversion mode that the source database supports during data transmission, for example, the manner in which the source database converts exported data into a text format, which may be direct text format conversion or text format conversion with the aid of an external format library. Based on this data conversion mode, the server determines a data format optimization strategy matched with the source database. The strategy can optimize the format conversion of the data exported from the source database, for example by converting string literals into strings of the object array type. A mapping between data conversion modes and data format optimization strategies can be constructed in advance, so that different data conversion modes correspond to different data format optimization strategies; the strategy for the source database is then determined from this mapping and the identified data conversion mode.
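
The pre-built mapping from data conversion modes to optimization strategies is essentially a lookup table. In the sketch below the enum member names are hypothetical labels: `DIRECT_TEXT` and `EXTERNAL_FORMAT_LIBRARY` stand for the two modes named above, `STRING_LITERALS_TO_OBJECT_ARRAY` for the example strategy in the text, and `BATCH_LIBRARY_CALLS` is an invented placeholder for whatever strategy would pair with the external-library mode.

```python
from enum import Enum
from typing import Dict


class ConversionMode(Enum):
    """Assumed labels for the conversion modes named in the text."""
    DIRECT_TEXT = "direct_text"
    EXTERNAL_FORMAT_LIBRARY = "external_format_library"


class Strategy(Enum):
    """Hypothetical data format optimization strategies."""
    STRING_LITERALS_TO_OBJECT_ARRAY = "string_literals_to_object_array"
    BATCH_LIBRARY_CALLS = "batch_library_calls"


# Mapping constructed in advance: each supported conversion mode -> strategy.
STRATEGY_FOR_MODE: Dict[ConversionMode, Strategy] = {
    ConversionMode.DIRECT_TEXT: Strategy.STRING_LITERALS_TO_OBJECT_ARRAY,
    ConversionMode.EXTERNAL_FORMAT_LIBRARY: Strategy.BATCH_LIBRARY_CALLS,
}


def determine_strategy(mode_name: str) -> Strategy:
    """Look up the strategy for the source database's conversion mode."""
    mode = ConversionMode(mode_name)  # raises ValueError for an unknown mode
    return STRATEGY_FOR_MODE[mode]
```

Using enums rather than raw strings means an unsupported mode fails loudly at lookup time instead of silently selecting no strategy.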

Step 206, updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and converting the data to be transmitted in the source database according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion; the data to be transmitted after the data format conversion comprises data which is obtained based on the character string conversion operation in the updated data conversion mode and belongs to the object array type.

The string conversion operation is the operation in the data conversion mode that converts string-type data, specifically an operation that converts data into strings in a text format; it may include a string literal conversion operation, a constructor operation, a numeric conversion operation, and the like. The data to be transmitted is the data that needs to be transmitted from the source database to the target database; it is stored in the source database, and its data format matches the source database. That format may not be suitable for transmission through the data pipeline, in which case the data must first be converted into a format that is, such as a text format or a binary format. The converted data to be transmitted includes data of the object array type obtained through the updated string conversion operation. The object array type refers to a data type in which data is stored through an object array; AStrings data, for example, may be of the object array type. Storing data through an object array simplifies the string conversion processing and reduces the amount of data it handles, which improves data transmission efficiency.

Specifically, the server may determine the data to be transmitted, for example by reading it from the source database. The server updates the string conversion operation in the data conversion mode based on the determined data format optimization strategy, for example by replacing it with a string conversion operation of the object array type, to obtain an updated data conversion mode. The server then converts the format of the read data according to the updated data conversion mode. The converted data includes data of the object array type obtained through the string conversion operation in the updated data conversion mode, so that the data to be transmitted is turned into an intermediate format suitable for transmission in the created data pipeline. In a specific application, the data conversion mode may convert the data to be transmitted into a text format; that is, the data read from the source database is first converted into text format and then transmitted in the created data pipeline.

And step 208, transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.

Specifically, the server transmits the converted data to be transmitted from the source database to the target database through the data pipeline created between them. Because the data pipeline connects the source database and the target database directly, data can be transmitted between them without introducing intermediate storage for relaying.

In one specific application, as shown in FIG. 3, the source database is connected to a source DBMS, through which it is managed, and the target database is connected to a target DBMS, through which it is managed. When data needs to be transferred from the source database to the target database, the transfer is carried out by the source DBMS and the target DBMS. The user side may send a data transmission request to each of them: the request sent to the source DBMS may be a data export request instructing the source DBMS to export data from the source database, and the request sent to the target DBMS may be a data import request instructing the target DBMS to import data into the target database. The source DBMS determines the target database and the target DBMS connected to it, and creates a data pipeline with the target DBMS so that data can be transferred directly between the two DBMSs. The source DBMS determines the data conversion mode supported by the source database and, according to that mode, the data format optimization strategy matched with the source database. The source DBMS then determines the data to be transmitted from the source database, updates the string conversion operation in the data conversion mode based on the data format optimization strategy, and converts the format of the data to be transmitted according to the updated data conversion mode; the converted data includes data of the object array type obtained through the string conversion operation in the updated data conversion mode. Finally, the source DBMS sends the converted data through the data pipeline to the target DBMS, which writes the received data into the target database.
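The FIG. 3 flow, in which converted data moves from the source DBMS straight to the target DBMS without intermediate storage, can be illustrated with a small Python sketch. Everything here is an assumption for illustration: a connected `socket.socketpair()` stands in for the data pipeline, newline-delimited JSON stands in for the updated data conversion mode, a Python list stands in for the target database, and the function names are hypothetical.

```python
import json
import socket
import threading


def source_dbms_export(rows, conn):
    """Hypothetical source DBMS: convert each row and stream it into the pipeline."""
    with conn:  # closing the socket afterwards signals end-of-stream to the target
        for row in rows:
            # Stand-in for the updated data conversion mode: serialize the row to text.
            line = json.dumps(row) + "\n"
            conn.sendall(line.encode("utf-8"))


def target_dbms_import(conn, target_table):
    """Hypothetical target DBMS: read converted rows and write them to the target table."""
    with conn, conn.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            target_table.append(json.loads(line))


def transfer(rows):
    target_table = []  # stands in for the target database
    # A connected socket pair plays the role of the data pipeline; no file or
    # staging table sits between the two sides.
    src_end, dst_end = socket.socketpair()
    importer = threading.Thread(target=target_dbms_import, args=(dst_end, target_table))
    importer.start()
    source_dbms_export(rows, src_end)
    importer.join()
    return target_table
```

The importer thread is started before the export begins, so the target side consumes rows while the source side is still producing them.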

In the data transmission method above, for a source database and a target database between which data is to be transmitted, a data pipeline is created between the two databases; a data format optimization strategy is determined according to the data conversion mode supported by the source database in the data transmission process; the string conversion operation in the data conversion mode is updated based on that strategy; the data to be transmitted is converted according to the updated data conversion mode, the converted data including data of the object array type obtained through the updated string conversion operation; and the converted data is transmitted from the source database to the target database through the created data pipeline. Because the converted data travels directly between the two databases through the constructed data pipeline, no intermediate storage needs to be introduced for relaying. Because the string conversion operation produces data of the object array type, and the data format optimization strategy applies targeted format conversion to the data in the source database, the format conversion processing of data in different databases is simplified and the data transmission efficiency between different databases is improved.

In one embodiment, the data conversion mode includes a direct text conversion mode, and the data format optimization strategy includes a string optimization strategy. Updating the string conversion operation in the data conversion mode based on the data format optimization strategy, and converting the data to be transmitted in the source database according to the updated data conversion mode to obtain the converted data to be transmitted, includes: updating the string conversion operation in the direct text conversion mode based on the string optimization strategy to obtain an updated text conversion mode; and converting the data to be transmitted in the source database into a text format according to the updated text conversion mode to obtain the converted data to be transmitted.

The direct text conversion mode means that the source database supports direct text format conversion locally, without calling an external format conversion tool. When a source database supporting this mode exports the data to be transmitted, it converts the data itself and obtains text format data suitable for transmission in the data pipeline. The string optimization strategy is a format conversion optimization strategy for string-type data when the data to be transmitted is converted into a text format; that is, a strategy for updating the string conversion operation that turns the data to be transmitted into strings in the direct text conversion mode. It may modify the strings, for example by optimizing how string data is organized: string concatenation can be replaced by joining based on an object array, which simplifies the string format conversion processing.

Specifically, when the server determines that the source database supports the direct text conversion mode, the source database can convert data into text format locally, that is, it can export text format data directly, and the determined data format optimization strategy includes the string optimization strategy, which optimizes the format of the string data involved in the text format conversion. The server updates the string conversion operation in the direct text conversion mode based on the string optimization strategy; that is, the source database's original direct text conversion mode is optimized by modifying how the string conversion operation processes string-type data, for example by converting string data into the object array form to obtain string data of the object array type. The server then converts the data to be transmitted in the source database into text format according to the updated text conversion mode, obtaining converted data suitable for transmission in the data pipeline.

In this embodiment, when the source database supports the direct text conversion mode, the server updates the string conversion operation in that mode based on the string optimization strategy and converts the data to be transmitted in the source database into text format according to the updated text conversion mode. This simplifies the format conversion of the strings in the data to be transmitted and helps improve data transmission efficiency between different databases.

In one embodiment, updating the string conversion operation in the direct text conversion mode based on the string optimization strategy to obtain the updated text conversion mode includes: constructing a text conversion dataflow graph according to the direct text conversion mode; determining a specified string conversion operation from the text conversion dataflow graph; and, based on the string optimization strategy, updating the specified string conversion operation in the direct text conversion mode to a string conversion operation of the object array type to obtain the updated text conversion mode.

The direct text conversion mode is the mode in which the source database converts the data to be transmitted into text format locally; it may include text format conversion operations for various types of data and the processing logic of those operations. The text conversion dataflow graph is constructed from the text format conversion operations included in the direct text conversion mode and describes them visually: nodes in the graph represent text format conversion operations, and edges connecting the nodes represent the flow of data. The specified string conversion operation is a string conversion operation that needs to be optimized, for example a string literal conversion operation, a constructor operation, or a numeric conversion operation. A string conversion operation of the object array type converts data into string data of the object array type when converting it into string format. Compared with the original specified string conversion operation in the direct text conversion mode, it produces string data of the object array type, such as AStrings data, so that data can be stored through an object array; this simplifies the string conversion processing, reduces the amount of data it handles, and improves data transmission efficiency.

Specifically, the server may construct the text conversion dataflow graph from the direct text conversion mode. It first determines the text format conversion operations involved in the mode, such as string encoding operations for numeric data, separation operations, and wrapping operations, and then builds the graph according to the processing logic of these operations, such as their processing order. In the graph, nodes represent text format conversion operations and edges represent the data flow, that is, the order in which the operations process the data during format conversion. From the graph, the server identifies the operations that convert data into strings and selects among them a specified string conversion operation to optimize. In a specific application, the server may make this selection according to the data type each string conversion operation handles and its processing logic, for example choosing the string conversion operation for numeric data, or the string concatenation operation, so that the data format conversion is optimized in a targeted manner for the source database. The server then optimizes the specified string conversion operation based on the string optimization strategy, updating it in the direct text conversion mode to a string conversion operation of the object array type to obtain the updated text conversion mode. When text format conversion is performed with a string conversion operation of the object array type, the data is converted into string data of the object array type, so that each piece of data can be stored in an object array.
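The dataflow-graph step can be sketched as follows. This minimal Python example assumes a linear chain of operations in processing order; the `OpNode` class, the operation kinds, and the set of kinds treated as specified string conversion operations are all hypothetical. Only each node's kind is rewritten, so the edges, and therefore the processing order, stay unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class OpNode:
    """A node in the text conversion dataflow graph: one text format conversion operation."""
    name: str
    kind: str  # e.g. "string_literal", "numeric_to_string", "separator"
    successors: list = field(default_factory=list)  # edges: where the data flows next


# Operation kinds treated as "specified string conversion operations" (an assumption).
SPECIFIED_KINDS = {"string_literal", "constructor", "numeric_to_string"}


def build_graph(op_sequence):
    """Build a linear dataflow graph from the ordered operations of the direct text mode."""
    nodes = [OpNode(name, kind) for name, kind in op_sequence]
    for upstream, downstream in zip(nodes, nodes[1:]):
        upstream.successors.append(downstream)
    return nodes


def apply_string_optimization(nodes):
    """Rewrite each specified string conversion node into an object-array string operation."""
    for node in nodes:
        if node.kind in SPECIFIED_KINDS:
            node.kind = "object_array_string"
    return nodes
```

For example, a chain of a numeric-to-string encoder, a separator, and a string-literal wrapper keeps the separator as-is and turns the other two into object-array string operations.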

In this embodiment, the server constructs the text conversion dataflow graph, determines the specified string conversion operation from it, and updates that operation to a string conversion operation of the object array type based on the string optimization strategy, thereby updating the direct text conversion mode. Performing text format conversion with the updated text conversion mode simplifies the format conversion of the strings in the data to be transmitted and improves data transmission efficiency between different databases.

In one embodiment, converting the data to be transmitted in the source database into text format according to the updated text conversion mode to obtain the converted data to be transmitted includes: determining, from the data to be transmitted in the source database, string data of the string type; converting the string literals of the string data into string optimization data of the object array type through the string conversion operation of the object array type; converting the data other than the string data into text format in the direct text conversion mode to obtain text format data; and obtaining the converted data to be transmitted from the text format data and the string optimization data.

The string data is the data of the string type within the data to be transmitted, that is, data that must be converted into string format for transmission in the data pipeline; it may include string literals, constructors, numeric data, and the like. The string optimization data is obtained by converting the string data into text format with the string conversion operation of the object array type; specifically, it is data of the object array type obtained by converting the string literals of the string data. A string literal is a sequence of characters enclosed in double quotation marks in the string data, and may contain integers, floating-point numbers, symbols, strings, and other characters. Because the string optimization data is of the object array type, it can be recorded in an object array, and a single object array can record many pieces of string optimization data, which effectively simplifies the format conversion of the string data. For example, the four strings "x", "a", "c", and "string" can be stored in the object array ["x", "a", "c", "string"]. The text format data is obtained by converting the data other than the string data into text format, for example JSON (JavaScript Object Notation) or CSV (comma-separated values).

Specifically, the server may determine, from the data to be transmitted in the source database, the string data of the string type, that is, the data whose format is to be converted by the string conversion operation of the object array type; the determination can be made according to the data type of each item. The server identifies the string literals of the string data, for example by their double quotation marks, and converts them through the string conversion operation of the object array type into string optimization data of the object array type. The string optimization data can be recorded in an object array, for example an array of AStrings, so that the string literals of every piece of string data are recorded. For the remaining data, the server performs text format conversion in the direct text conversion mode to obtain text format data, for example converting each item into CSV format. Finally, the server combines the text format data and the string optimization data to obtain the data to be transmitted in text format, that is, the converted data to be transmitted.
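Splitting the data to be transmitted into string data and other data can be sketched in Python as below. The sketch is hypothetical: a Python list plays the role of the object array holding the string literals, the standard `csv` module stands in for the direct text conversion mode, and the combined result is a plain dictionary rather than any specific wire format.

```python
import csv
import io


def convert_record(record):
    """Split a record into string data and other data, converting each path separately.

    Hypothetical sketch: string literals are gathered into one object array,
    while all other fields go through an ordinary CSV text conversion.
    """
    string_optimized = []  # object array holding the string literals
    other_fields = []      # fields left to the direct text conversion mode
    for value in record:
        if isinstance(value, str):
            string_optimized.append(value)
        else:
            other_fields.append(value)

    buffer = io.StringIO()
    csv.writer(buffer).writerow(other_fields)  # direct text conversion -> one CSV row
    text_format_data = buffer.getvalue().rstrip("\r\n")

    # Combine both parts into the converted data to be transmitted.
    return {"text": text_format_data, "strings": string_optimized}
```

For example, `convert_record(["x", 1, "a", 2.5, "c", "string"])` returns `{"text": "1,2.5", "strings": ["x", "a", "c", "string"]}`, matching the object array example given above.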

In this embodiment, the server converts the string literals of the string data into string optimization data of the object array type through the string conversion operation of the object array type, converts the remaining data in the direct text conversion mode to obtain text format data, and combines the two to obtain the converted data to be transmitted. Applying different format conversions to different parts of the data simplifies the format conversion of the strings in the data to be transmitted and helps improve data transmission efficiency between different databases.

In one embodiment, the data conversion mode includes an external text conversion mode, and the data format optimization strategy includes a data type optimization strategy. Updating the string conversion operation in the data conversion mode based on the data format optimization strategy, and converting the format of the data to be transmitted in the source database according to the updated data conversion mode to obtain the converted data to be transmitted, includes: updating the string conversion operation of the external text conversion mode based on the data type optimization strategy; generating binary state data for the data to be transmitted in the source database; and converting the binary state data into text format through the updated string conversion operation to obtain the converted data to be transmitted, which is of the object array type.

The external text conversion mode means that the source database supports calling an external format conversion tool, for example an external format library, to perform text format conversion on the data to be transmitted. When a source database supporting this mode exports the data to be transmitted, it must call the external format library to convert the data into text format data suitable for transmission in the data pipeline. The data type optimization strategy is an optimization strategy that converts the data to be transmitted into a single specified data type when the data is converted into text format, for example uniformly converting it into data of the object array type. The binary state data is a binary representation of the data to be transmitted: it expresses the data as a sequence of 0s and 1s, a format that many types of databases support, so each of them can convert the binary state data into its own format for storage as needed.

Specifically, when the source database supports the external text conversion mode, that is, it must call an external format library to assist with text conversion, the correspondingly determined data format optimization strategy includes the data type optimization strategy, meaning that the data type of the data to be transmitted is to be optimized. During format conversion, the server determines the string conversion operation of the external text conversion mode based on the data type optimization strategy, for example by directly replacing the conversion operation in the external text conversion mode with the one specified in the data type optimization strategy, and then updates it, for example by modifying the data format it outputs, so that the updated string conversion operation converts the data to be transmitted into string-type data. The server generates binary state data for the data to be transmitted in the source database, that is, it constructs binary state data that represents the data to be transmitted. The server then converts the binary state data into text format data through the updated string conversion operation, obtaining the converted data to be transmitted. This converted data is of the object array type and can be recorded as strings of the object array type, for example by an AString that records the binary state data of the data to be transmitted. Within the AString, each piece of binary state data serves as internal state of the AString and is recorded in one or more object arrays, so each piece of binary state data can be read from the object array corresponding to the AString.
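The external text conversion path can be illustrated with a hypothetical `AString` class that keeps each piece of binary state data in an object array. The encodings are assumptions, not the method of this application: UTF-8 encoded JSON serves as the binary state data, and Base64 stands in for the updated string conversion operation that turns binary state data into text.

```python
import base64
import json


class AString:
    """Hypothetical AString: keeps each binary state as text inside an object array."""

    def __init__(self):
        self.states = []  # the object array backing this AString

    def append_state(self, binary_state: bytes):
        # Assumed updated string conversion operation: binary -> Base64 text.
        self.states.append(base64.b64encode(binary_state).decode("ascii"))

    def read_state(self, index: int) -> bytes:
        # Each binary state can be read back from its slot in the object array.
        return base64.b64decode(self.states[index])


def to_binary_state(record) -> bytes:
    """Assumed binary representation of one record: UTF-8 encoded JSON."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def convert_external(records):
    """Generate binary state data per record and record it in one AString."""
    astring = AString()
    for record in records:
        astring.append_state(to_binary_state(record))
    return astring
```

Because every record ends up as text in the same object array, no further conversion into another text format is needed before transmission.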

In this embodiment, when the source database supports the external text conversion mode, the server generates binary state data for the data to be transmitted and converts it into text format through the string conversion operation updated based on the data type optimization strategy. The converted data is of the object array type, so it can be recorded in an object array without conversion into other text formats. This simplifies converting the data to be transmitted into text and improves data transmission efficiency between different databases.

In one embodiment, creating the data pipeline between the source database and the target database includes: determining a first process communication interface of the source database and a second process communication interface of the target database; and establishing a communication connection between the first process communication interface and the second process communication interface to obtain the data pipeline between the source database and the target database.

A process communication interface is an interface for communicating with a process of a database; specifically, it may be a socket, identified by an IP (Internet Protocol) address and a port. A socket is an abstraction of an endpoint for bidirectional communication between application processes on different hosts in a network: one socket is one end of a process communication over the network and provides a mechanism for application layer processes to exchange data using network protocols. A database's data transmission is carried out by a process through its corresponding communication interface.

Specifically, the server may determine the process communication interfaces of the source database and the target database, namely the first process communication interface of the source database and the second process communication interface of the target database. A process communication interface, such as a socket, may be configured by a process after the process is created. When two network applications communicate, a socket is the logical endpoint at each end of the connection. A socket is written as (IP address:port number), that is, the port number is appended to the dotted-decimal IP address, separated by a colon or comma, and each transport layer connection is uniquely identified by the two sockets at its two ends. For example, if the IP address is 210.37.145.1 and the port number is 23, the socket is (210.37.145.1:23). The server establishes a communication connection between the first and second process communication interfaces, for example between the sockets of the source database and the target database, to obtain the data pipeline between the two databases. Because the data pipeline is formed by connecting the process communication interfaces of the source and target databases directly, data can be transmitted between them without introducing intermediate storage for relaying.
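Establishing the data pipeline by connecting two sockets can be sketched with Python's standard `socket` module. The loopback address 127.0.0.1 and the OS-assigned port are assumptions standing in for the real process communication interfaces of the source and target databases; `format_socket` and `create_data_pipeline` are hypothetical names.

```python
import socket


def format_socket(ip: str, port: int) -> str:
    """Write a socket as described above: dotted-decimal IP, a colon, then the port."""
    return f"{ip}:{port}"


def create_data_pipeline():
    """Connect a source-side socket to a target-side socket over TCP on localhost.

    Hypothetical sketch: 127.0.0.1 and an OS-assigned port stand in for the
    real process communication interfaces of the two databases.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # second interface (target side), port chosen by the OS
    listener.listen(1)
    # First interface (source side) connects to the target's address.
    source_end = socket.create_connection(listener.getsockname())
    target_end, _ = listener.accept()
    listener.close()  # the accepted connection stays open after the listener closes
    return source_end, target_end
```

`format_socket("210.37.145.1", 23)` returns `"210.37.145.1:23"`, matching the example above. The source end's local address equals the target end's peer address, reflecting that the pair of sockets identifies the connection.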

In this embodiment, the server creates a data pipeline between the source database and the target database from their respective process communication interfaces, so no intermediate storage is needed as a relay, which helps improve data transmission efficiency between different databases.

In one embodiment, as shown in fig. 4, the process of determining the process communication interface, that is, determining the first process communication interface of the source database and the second process communication interface of the target database, respectively, includes the steps of:

Step 402, when a data export request is received, creating a data export worker thread associated with the data export request and determining a first process communication interface of the data export worker thread.

The data export request may be sent by a user to the source database through a client to request that the source database export data. A data export worker thread is a thread created on the source database side to execute a data export task. The database may create a separate data export worker thread for each data export request, so that data transmission can be processed concurrently by multiple threads. To establish a communication connection with the target database, a process communication interface may be configured for each data export worker thread; specifically, it may be a socket comprising an IP address and a port.

Specifically, the server may receive a data export request sent by a user through a terminal and, upon receiving it, create a data export worker thread associated with the request to execute the corresponding data export task. A process communication interface may be configured for the created data export worker thread, and the server may determine this interface as the first process communication interface, i.e., the process communication interface of the source database.

Step 404, querying a data import worker thread from a data import thread directory; the data import thread directory includes candidate data import worker threads created by the target database based on data import requests.

The data import thread directory records the candidate data import worker threads, which are created by the target database based on data import requests. When the target database receives a data import request sent by a user through a terminal, it may create a data import worker thread associated with the request and record that thread in the data import thread directory as a candidate data import worker thread for the source database to select.

Specifically, the server may acquire the data import thread directory, which contains each candidate data import worker thread created by the target database, and query a data import worker thread from it, so that data can be transmitted through the queried data import worker thread. In a specific application, the server may query the directory in sequence, for example taking an idle candidate data import worker thread as the query result.

Step 406, determining a second process communication interface for the data import worker thread.

A corresponding process communication interface may likewise be configured for each data import worker thread. Specifically, for the queried data import worker thread, the server determines its second process communication interface, so that a data pipeline between the source database and the target database can be established based on that interface.
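The thread-directory lookup in steps 402 to 406 can be sketched as follows. This is an illustration under assumptions: the `ImportThreadDirectory` class, the `State` values and the rule "take the first idle worker and mark it busy" are hypothetical choices made here, and endpoints are modelled as plain `(IP address, port)` tuples.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple

Endpoint = Tuple[str, int]  # (IP address, port number)


class State(Enum):
    IDLE = "idle"
    BUSY = "busy"
    WAITING = "waiting"


@dataclass
class ImportWorker:
    worker_id: int
    endpoint: Endpoint  # the second process communication interface
    state: State = State.IDLE


@dataclass
class ImportThreadDirectory:
    workers: List[ImportWorker] = field(default_factory=list)

    def register(self, worker: ImportWorker) -> None:
        """Called on the target side when a data import request arrives."""
        self.workers.append(worker)

    def query_idle(self) -> Optional[ImportWorker]:
        """Scan in registration order and return the first idle candidate."""
        for worker in self.workers:
            if worker.state is State.IDLE:
                return worker
        return None


def pair_export_with_import(export_endpoint: Endpoint,
                            directory: ImportThreadDirectory):
    """Return (first interface, second interface) for a new pipeline, or None."""
    worker = directory.query_idle()
    if worker is None:
        return None
    worker.state = State.BUSY  # claim it so no other export worker picks it
    return export_endpoint, worker.endpoint
```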

In this embodiment, upon receiving a data export request, the server creates an associated data export worker thread, queries a data import worker thread from the data import thread directory, and determines the process communication interfaces of both threads. A data pipeline between the source database and the target database can then be created from these interfaces without introducing intermediate storage as a relay, which helps improve data transmission efficiency between different databases.

In one embodiment, the data transmission method further comprises: configuring a source database to perform data reading and data transmission through a first process communication interface; and reading data to be transmitted from the source database through the first process communication interface.

Data reading refers to reading the data to be transmitted from the source database, and data transmission refers to transmitting that data from the source database to the target database. Specifically, the server may configure the source database to perform data reading and data transmission through the first process communication interface, for example through a socket, so that the source database can transmit data directly through the created data pipeline without introducing intermediate storage as a relay. The server reads the data to be transmitted from the source database through the first process communication interface; specifically, it may invoke the data export worker thread to do so.

Further, transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline comprises: transmitting, through the first process communication interface, the data to be transmitted after the data format conversion to the target database in the data pipeline; the data to be transmitted after the data format conversion instructs the target database to convert it into a data format supported by the target database and then store it.

Specifically, when transmitting data to the target database, the server may transmit the data to be transmitted after the data format conversion to the target database in the data pipeline through the first process communication interface; specifically, it may invoke the data export worker thread to perform this transmission. After receiving this data through the data pipeline, the target database may convert it into data of a format supported by the target database and store the result, thereby completing the data transmission between the source database and the target database.

In this embodiment, the server configures the source database to perform data reading and data transmission through the first process communication interface, reads data to be transmitted through the first process communication interface, and transmits the data to be transmitted after the data format conversion in the data pipeline through the first process communication interface, so that direct data transmission between the source database and the target database is realized, and data transmission efficiency between different databases can be improved.

In one embodiment, reading data to be transmitted from a source database through a first process communication interface includes: determining a file name format of data to be transmitted; when the file name format is of an active type, the data to be transmitted is read from the source database through the first process communication interface.

The file name format refers to the format of the name of the file to which the data belongs in the source database. The transmission of data can be controlled through the file name format; for example, the data transmission mode to be used can be selected based on it. The activation type refers to file name formats that activate data transmission through the data pipeline. File name formats may be divided into an activation type and a deactivation type, which determines whether data reading through the first process communication interface, i.e., data transmission through the created data pipeline, is activated. Which file name formats belong to the activation type may be set according to actual needs.

Specifically, the server may determine the file to which the data to be transmitted belongs and then determine the file name format of that file. The server determines whether the file name format belongs to the activation type; file name formats of the activation type may be called reserved file names, i.e., specific file name formats used for data transmission. If the file name format of the data to be transmitted belongs to the activation type, the data needs to be transmitted through the data pipeline, and the server may read it from the source database through the first process communication interface of the data pipeline. If the file name format does not belong to the activation type, the data is not transmitted through the data pipeline, and another data reading mode may be used, such as reading the data from the source database through another communication interface.
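One possible way to implement the activation-type check is to match the file's base name against a set of reserved patterns. The patterns in `ACTIVATION_PATTERNS` and the two return labels are invented for illustration; the description leaves the reserved file name formats to be set according to actual needs.

```python
import fnmatch
import os
from typing import Sequence

# Hypothetical reserved file-name patterns that activate pipeline transfer.
ACTIVATION_PATTERNS = ("pipe_*.dat", "*.pipe")


def is_activation_type(file_name: str,
                       patterns: Sequence[str] = ACTIVATION_PATTERNS) -> bool:
    """True if the file's base name matches any reserved pattern."""
    base = os.path.basename(file_name)
    return any(fnmatch.fnmatchcase(base, p) for p in patterns)


def choose_read_path(file_name: str) -> str:
    """Pick the read path: the pipeline's first interface or an ordinary disk read."""
    if is_activation_type(file_name):
        return "first_process_interface"
    return "disk_file"
```

Falling back to a disk read for every other name is what keeps the source database compatible with its existing export paths.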

In this embodiment, when the file name format of the data to be transmitted belongs to the activation type, the data can be transmitted through the created data pipeline, and the server reads it through the first process communication interface. Use of the data pipeline can therefore be controlled by the file name format, which keeps the source database compatible with other data transmission modes, such as transmission by reading disk files, and broadens the scenarios to which the data transmission applies.

In one embodiment, determining a data format optimization strategy matched with the source database according to a data conversion mode supported by the source database in a data transmission process comprises: determining the character string conversion operation of a source database in the data transmission process; determining a data conversion mode supported by a source database according to the character string conversion operation; based on the data conversion mode, a data format optimization strategy matched with the source database is determined.

The character string conversion operation is the operation by which the source database converts exported data into character strings during data transmission. Specifically, the server may determine the character string conversion operation of the source database in the data transmission process and analyze it to determine the data conversion mode supported by the source database, specifically whether the source database supports a direct text conversion mode or an external text conversion mode. The server then determines the data format optimization strategy matched with the source database according to the data conversion mode. A mapping between data conversion modes and data format optimization strategies may be constructed in advance, so that the matching strategy can be determined from the data conversion mode and this mapping.
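The pre-constructed mapping from data conversion mode to data format optimization strategy might look like the lookup table below. The two enum members follow the two modes named in the text, but the strategy names and the `external:` prefix used here to recognize an external text conversion are purely hypothetical.

```python
from enum import Enum


class ConversionMode(Enum):
    DIRECT_TEXT = "direct_text"      # the source converts values to text itself
    EXTERNAL_TEXT = "external_text"  # text conversion done by an external routine


# Hypothetical pre-built mapping: conversion mode -> optimization strategy name.
STRATEGY_BY_MODE = {
    ConversionMode.DIRECT_TEXT: "emit_object_array",
    ConversionMode.EXTERNAL_TEXT: "hook_external_to_object_array",
}


def detect_mode(string_conversion_op: str) -> ConversionMode:
    """Assumption: an external routine is signalled by an 'external:' prefix."""
    if string_conversion_op.startswith("external:"):
        return ConversionMode.EXTERNAL_TEXT
    return ConversionMode.DIRECT_TEXT


def match_strategy(string_conversion_op: str) -> str:
    """Look up the optimization strategy for the detected conversion mode."""
    return STRATEGY_BY_MODE[detect_mode(string_conversion_op)]
```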

In this embodiment, the server determines the data conversion mode supported by the source database based on the character string conversion operation of the source database in the data transmission process, and further determines the corresponding data format optimization policy, so that the data to be transmitted in the source database can be subjected to targeted format conversion through the data format optimization policy, format conversion processing of data transmitted by different databases can be simplified, and data transmission efficiency among different databases is improved.

In one embodiment, updating a string conversion operation in a data conversion mode based on a data format optimization strategy, performing data format conversion on data to be transmitted in a source database according to the updated data conversion mode, and obtaining the data to be transmitted after the data format conversion, including: deleting the data separator in the data to be transmitted of the source database, updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and carrying out data format conversion on the data to be transmitted with the deleted data separator according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion.

The data separator is marker data indicating the beginning and end of composite-type data, and it effectively separates composite-type data; for example, the '|' symbol in composite-type data can serve as a data separator. Specifically, the server may delete the data separators in the data to be transmitted of the source database, by determining the data separators in the data to be transmitted and deleting them. For the data to be transmitted with the data separators deleted, the server can update the character string conversion operation in the data conversion mode based on the data format optimization strategy, and perform data format conversion on that data according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion.
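Deleting separators from a composite value before conversion could be sketched as follows, assuming (hypothetically) that a composite value is serialized as text with `{` and `}` marking its beginning and end and `|` between elements. The result is a plain Python list standing in for an object array. Escaped or nested separators are deliberately not handled in this sketch.

```python
from typing import List

# Assumed textual encoding of a composite value, e.g. "{1|abc|2.5}".
BEGIN, END, SEP = "{", "}", "|"


def strip_separators(composite: str) -> List[str]:
    """Drop the begin/end markers and element separators, keeping the elements."""
    if not (composite.startswith(BEGIN) and composite.endswith(END)):
        raise ValueError(f"not a composite value: {composite!r}")
    body = composite[len(BEGIN):-len(END)]
    # An empty body means an empty composite, not one empty element.
    return body.split(SEP) if body else []
```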

In this embodiment, the server deletes the data separators from the data to be transmitted before performing data format conversion, so no format conversion is spent on the separators; this further simplifies format conversion of data transmitted between different databases and improves data transmission efficiency between them.

In one embodiment, updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and performing data format conversion on the data to be transmitted in the source database according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion, comprises: performing redundancy elimination on multiplexed attribute information in the data to be transmitted of the source database, updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and performing data format conversion on the data to be transmitted after redundancy elimination according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion.

The multiplexed attribute information is attribute information reused across the serialized data, such as shared header information; performing format conversion on it separately for every data item would add a large amount of repeated work. Specifically, the server may eliminate redundancy in the multiplexed attribute information of the data to be transmitted of the source database: it determines the multiplexed attribute information in the data to be transmitted and removes the redundant copies, for example retaining only one. For the data to be transmitted after this redundancy elimination, the server can update the character string conversion operation in the data conversion mode based on the data format optimization strategy, and perform data format conversion on that data according to the updated data conversion mode to obtain the data to be transmitted after the data format conversion.
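Redundancy elimination for shared header information can be illustrated by keeping the field names once and sending each record as a bare tuple of values. The function name and the dictionary-per-record input are assumptions made for the sake of the example.

```python
from typing import Any, Dict, List, Tuple


def dedupe_header(records: List[Dict[str, Any]]) -> Tuple[List[str], List[tuple]]:
    """Keep the shared field names once; emit each record as a value tuple."""
    if not records:
        return [], []
    header = list(records[0].keys())
    rows = []
    for rec in records:
        # The sketch only applies when every record shares the same layout.
        if list(rec.keys()) != header:
            raise ValueError("records do not share the same attribute layout")
        rows.append(tuple(rec[name] for name in header))
    return header, rows
```

The header is then converted once per batch instead of once per record.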

In this embodiment, the server performs redundancy elimination on the multiplexing attribute information in the data to be transmitted, so as to avoid repeated format conversion processing on the multiplexing attribute information, and further simplify format conversion processing on the data transmitted by different databases, thereby improving data transmission efficiency between different databases.

In one embodiment, transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline comprises: when the data to be transmitted after the data format conversion is of a row-format storage type, converting it into a column-format storage type; and transmitting the column-format data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.

The row-format storage type stores the data to be transmitted after the data format conversion in a row-oriented manner, i.e., row by row. The column-format storage type stores data in a column-oriented manner, i.e., column by column. Compared with the row format, the column format can achieve higher compression efficiency, which helps reduce the volume of data transmitted.

Specifically, the server may determine how the data to be transmitted after the data format conversion is stored and whether it is of the row-format storage type; if so, the server converts it into the column-format storage type so that it is stored in column format. The server then transmits the column-format data to be transmitted after the data format conversion from the source database to the target database through the data pipeline. For example, the server may construct Apache Arrow arrays and write the data to be transmitted after the data format conversion into them sequentially, thereby converting it into columnar form. In a specific application, the server may compress the column-format data and transmit the compressed data to the target database through the data pipeline.
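The row-to-column transposition and compression step can be sketched with the standard library alone. JSON serialization plus `zlib` stands in here for a real columnar format such as Apache Arrow; the function names are hypothetical, and the sketch only shows the layout change and a lossless round trip, not any particular compression ratio.

```python
import json
import zlib
from typing import Any, Dict, List, Sequence


def rows_to_columns(header: Sequence[str],
                    rows: List[Sequence[Any]]) -> Dict[str, List[Any]]:
    """Transpose row-oriented records into one list per column."""
    columns: Dict[str, List[Any]] = {name: [] for name in header}
    for row in rows:
        if len(row) != len(header):
            raise ValueError("row width does not match header")
        for name, value in zip(header, row):
            columns[name].append(value)
    return columns


def compress_columns(columns: Dict[str, List[Any]]) -> bytes:
    """Serialize (JSON as a stand-in) and compress before sending on the pipe."""
    return zlib.compress(json.dumps(columns).encode("utf-8"))


def decompress_columns(blob: bytes) -> Dict[str, List[Any]]:
    """Inverse of compress_columns, as the receiving side would apply it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Grouping same-typed, often repetitive values together is what gives a general-purpose compressor more redundancy to exploit.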

In this embodiment, the server converts the data to be transmitted after the data format conversion into the storage type of the column format, and can obtain higher compression efficiency when compressing the data to be transmitted after the data format conversion, which is beneficial to reducing the data volume of data transmission, thereby being capable of improving the data transmission efficiency.

In one embodiment, the data transmission method further comprises: when the source database and the target database are associated to the same operation node, storing the sampling data in the data to be transmitted into a cache of the operation node; obtaining pipeline transmission data matched with the sampling data from the data to be transmitted after the data format conversion; and carrying out data verification on the sampling data in the cache and the pipeline transmission data to obtain a verification result aiming at the data pipeline.

The operation node may be a node that operates on a database, specifically a computer device such as a terminal or a server. When a database is operated through a DBMS, the operation node may be the node where the DBMS resides. The sampled data is data extracted from the data to be transmitted for verifying the data pipeline, for example the first N records transmitted. The pipeline transmission data is the data actually received after the sampled data in the data to be transmitted is transmitted through the data pipeline.

Specifically, the server may determine the operation nodes associated with the source database and the target database. If both are associated with the same operation node, meaning that the DBMSs of the source database and the target database reside on the same computer device, the server may sample the data to be transmitted to obtain sampled data and store it in the cache of the operation node. From the data to be transmitted after the data format conversion that is transmitted through the data pipeline, the server obtains the pipeline transmission data matched with the sampled data, i.e., the data actually obtained after the sampled data passes through the data pipeline. The server can then verify the sampled data in the cache against the pipeline transmission data, for example by matching them, to obtain a verification result for the data pipeline. If the cached sampled data is consistent with the pipeline transmission data, the data transmitted through the data pipeline is identical to the data written through disk, and the data pipeline can be determined to be valid and reliable. If they are inconsistent, the data pipeline can be determined to be unreliable, and it may be recreated or the data transmission mode changed to ensure accurate transmission of the data.
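The sampling check might be implemented by caching the first N records and comparing them position by position with what arrives through the pipeline. This sketch assumes both sides hold the records in the same representation (for example, both after data format conversion); `take_sample` and `verify_pipeline` are illustrative names.

```python
from typing import Iterable, List, Tuple


def take_sample(records: Iterable[bytes], n: int) -> List[bytes]:
    """Cache the first n records as the sampled data on the shared node."""
    sample: List[bytes] = []
    for rec in records:
        if len(sample) >= n:
            break
        sample.append(rec)
    return sample


def verify_pipeline(cached_sample: List[bytes],
                    received: List[bytes]) -> Tuple[bool, List[int]]:
    """Compare cached samples with pipeline data; return (ok, bad positions)."""
    window = received[:len(cached_sample)]
    bad = [i for i, (a, b) in enumerate(zip(cached_sample, window)) if a != b]
    # Records that never arrived through the pipeline also count as mismatches.
    bad.extend(range(len(window), len(cached_sample)))
    return not bad, bad
```

A failed check would then trigger recreating the pipeline or switching transmission mode, as described above.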

In this embodiment, when the source database and the target database are associated with the same operation node, the operation node caches the sampled data and verifies it against the pipeline transmission data received through the data pipeline, so the validity of the data pipeline can be accurately verified and the reliability of the data transmission ensured.

In one embodiment, the data transmission method further comprises: when the source database and the target database support the same binary data format in the data transmission process, obtaining data to be transmitted from the source database; converting the data to be transmitted according to the binary data format to obtain the binary data to be transmitted; the data to be transmitted in binary format is transmitted from the source database to the target database through the data pipeline.

The binary data format refers to a binary data transmission format, such as the Apache Arrow format. Specifically, the server may determine the binary data formats supported by the source database and the target database in the data transmission process; if both support the same binary data format, the server may transmit data directly in that binary format and thereby avoid text-format conversion. The server converts the data to be transmitted into the binary data format to obtain data to be transmitted in binary format, and transmits it from the source database to the target database through the constructed data pipeline.
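Choosing between binary and text transfer amounts to finding a binary format both databases support. The sketch below negotiates a format name (names such as `arrow_ipc` are placeholders) and shows, with Python's `struct` module, how a fixed-width binary encoding of an integer column avoids per-value text formatting and parsing. It is not the Apache Arrow format itself.

```python
import struct
from typing import List, Optional, Sequence


def negotiate_binary_format(source_formats: Sequence[str],
                            target_formats: Sequence[str]) -> Optional[str]:
    """First binary format, in source preference order, that both support."""
    target = set(target_formats)
    for fmt in source_formats:
        if fmt in target:
            return fmt
    return None  # no common format: fall back to text conversion


def encode_int64_column(values: Sequence[int]) -> bytes:
    """Little-endian fixed-width int64: no per-value text formatting."""
    return struct.pack(f"<{len(values)}q", *values)


def decode_int64_column(blob: bytes) -> List[int]:
    """Inverse of encode_int64_column; each value occupies 8 bytes."""
    return list(struct.unpack(f"<{len(blob) // 8}q", blob))
```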

In this embodiment, when the source database and the target database support the same binary data format in the data transmission process, the server may perform data transmission in the data pipeline based on the binary data format, so that conversion processing of the text format may be avoided, format conversion processing of data transmitted by different databases is further simplified, and thus data transmission efficiency between different databases is improved.

In one embodiment, as shown in fig. 5, a data transmission method is provided. The method is executed by a computer device, specifically a terminal or a server, or jointly by the terminal and the server. In this embodiment of the present application, the method is described, by way of example, as applied to the server connected to the target database in fig. 1, i.e., the target database management server 106, and includes the following steps:

step 502, determining a source database to be data-transmitted with a target database, and creating a data pipeline between the target database and the source database.

The source database is a database for exporting data in the data transmission process, and the target database is a database for importing data in the data transmission process. The data pipeline can be created based on the respective communication interfaces of the source database and the target database, and the direct data transmission between the source database and the target database can be realized through the data pipeline.

Specifically, when data in the source database needs to be transmitted to another database, the server connected to the target database determines the source database from which the target database is to receive data. The server may create a data pipeline between the source database and the target database; specifically, it may create a data pipeline directly connecting them according to their respective communication interfaces.

Step 504, receiving the data to be transmitted after the data format conversion transmitted from the source database through the data pipeline; the data to be transmitted after the data format conversion is obtained by updating the character string conversion operation in the data conversion mode supported by the source database in the data transmission process based on the data format optimization strategy and performing data format conversion on the data to be transmitted in the source database according to the updated data conversion mode; the data to be transmitted after the data format conversion comprises data which is obtained based on the character string conversion operation in the updated data conversion mode and belongs to the object array type.

The data conversion mode is the mode in which the format of the data to be transmitted is converted. The data format optimization strategy is determined based on the data conversion mode supported by the source database and is used to optimize the data format conversion that the source database performs when transmitting exported data, so as to simplify format conversion processing. The data to be transmitted is the data that needs to be transmitted between the source database and the target database.

Specifically, on the source database side, the data conversion mode supported by the source database in the data transmission process can be determined, and a data format optimization strategy matched with the source database is determined based on that mode. Further, the data to be transmitted is determined from the source database, the character string conversion operation in the data conversion mode is updated based on the determined data format optimization strategy, and the read data to be transmitted is converted according to the updated data conversion mode into an intermediate data format suitable for transmission in the created data pipeline. The resulting data to be transmitted after the data format conversion includes data of the object array type obtained through the character string conversion operation in the updated data conversion mode, and it is transmitted to the target database through the data pipeline. The server can receive this data from the source database through the data pipeline.

Step 506, converting the data to be transmitted after the data format conversion into a target data format supported by the target database, and storing the data to be transmitted belonging to the target data format in the target database.

The data format of the data to be transmitted after the data format conversion may be different from the data format of the data stored in the target database, and then the data to be transmitted after the data format conversion needs to be converted into the target data format supported by the target database, so that the target database stores the data to be transmitted belonging to the target data format.

Specifically, the server may determine a target data format supported by the target database, compare the data format of the data to be transmitted after the data format conversion with the target data format, and if the data format of the data to be transmitted after the data format conversion does not match the target data format, the server may convert the data to be transmitted after the data format conversion into the target data format, and store the data to be transmitted belonging to the target data format in the target database.
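On the target side, the compare-then-convert step could look like this, assuming a hypothetical target schema that maps each column to a Python type. Columns whose values already have the target type are passed through unchanged; the others are converted value by value.

```python
from typing import Any, Dict, List

# Hypothetical target schema: column name -> Python type used by the target.
TARGET_SCHEMA: Dict[str, type] = {"id": int, "price": float, "name": str}


def to_target_format(columns: Dict[str, List[Any]],
                     schema: Dict[str, type] = TARGET_SCHEMA) -> Dict[str, List[Any]]:
    """Convert only the columns whose values do not already match the schema."""
    converted: Dict[str, List[Any]] = {}
    for name, values in columns.items():
        target_type = schema[name]
        if all(isinstance(v, target_type) for v in values):
            converted[name] = list(values)  # already in the target data format
        else:
            converted[name] = [target_type(v) for v in values]
    return converted
```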

In one specific application, as shown in FIG. 6, the source database is connected to a source DBMS, through which it can be managed; the target database is connected to a target DBMS, through which it can be managed. When data needs to be transferred from the source database to the target database, the transfer can be carried out by the source DBMS and the target DBMS. Specifically, the user side may send data transmission requests to the source DBMS and the target DBMS respectively: the request sent to the source DBMS may be a data export request instructing the source DBMS to export data from the source database, and the request sent to the target DBMS may be a data import request instructing the target DBMS to import data into the target database. The target DBMS may determine the source database and then the source DBMS connected to it, and create a data pipeline with the source DBMS so that data can be transmitted directly between the source DBMS and the target DBMS through the data pipeline. The source DBMS may determine the data to be transmitted from the source database and transmit the data to be transmitted after the data format conversion to the target DBMS through the data pipeline. The target DBMS converts the received data into the target data format and writes the resulting data into the target database.

In the above data transmission method, for a source database and a target database between which data is to be transmitted, a data pipeline between them is created, and the data to be transmitted after data format conversion is received from the source database through the data pipeline. That data is obtained by updating, based on the data format optimization strategy, the character string conversion operation in the data conversion mode supported by the source database in the data transmission process, and converting the data to be transmitted in the source database according to the updated data conversion mode; it is then converted into the target data format supported by the target database and stored in the target database. When data is transmitted between different databases, the data pipeline constructed between the source database and the target database carries the converted data directly, avoiding intermediate storage as a relay. The character string conversion operation yields data of the object array type, and the data format optimization strategy performs targeted format conversion on the data to be transmitted in the source database. Format conversion of data transmitted between different databases is thereby simplified, and the data transmission efficiency between different databases is improved.

In one embodiment, the data pipe is created based on a first process communication interface of the source database and a second process communication interface of the target database. The data transmission method further includes: when a data import request is received, creating a data import worker thread associated with the data import request; recording the second process communication interface of the data import worker thread in a data import thread directory; and, when the import thread number of the data import worker threads in the data import thread directory exceeds the export thread number, configuring idle data import worker threads into a waiting state. The export thread number is the number of data export worker threads created by the source database based on the data export request.

The process communication interface is an interface for communicating with a process of the database; specifically, it may be a socket, identified by an IP address, a port, and the like. The data import request may be sent by a user to the target database through a client to request that the target database import data. A data import worker thread is a thread created on the target database side to execute a data import task. The database can create a separate data import worker thread for each data import request, so that data transmission is processed concurrently by multiple threads. The data import thread directory records the data import worker threads that the target database has created based on data import requests. The import thread number is the number of data import worker threads the target database has created, and the export thread number is the number of data export worker threads the source database has already created based on the data export request. A data export request may be sent by a user to the source database through a client to request that the source database export data, and a data export worker thread is a thread created on the source database side to execute a data export task.

Specifically, the server may receive a data import request sent by a user through a terminal. On receiving it, the server creates a data import worker thread associated with the request, which executes the data import task of that request. A process communication interface is configured for the created data import worker thread, and the server determines this second process communication interface, that is, the process communication interface on the target database side. The server obtains the data import thread directory and records the second process communication interface of the worker thread in it; specifically, it may record the IP address and port number of the interface. The server then determines the import thread number of the data import worker threads in the directory and compares it with the export thread number, that is, the number of data export worker threads the source database has created based on the data export request. When the import thread number exceeds the export thread number, there are more import worker threads than export worker threads, so the server identifies the idle data import worker threads in the directory and configures them into a waiting state, in which they wait for the data transmission to end or for a data export worker thread to establish a communication connection.
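The directory bookkeeping in this embodiment can be pictured with a minimal in-memory sketch. The class name `ImportThreadDirectory`, its methods, and the READY/CONNECTED/WAITING states are hypothetical; "idle" is assumed to mean an import worker that no export worker has connected to yet, and how the export thread number reaches the target side is left open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical target-side directory of data import worker threads.
class ImportThreadDirectory {
    enum State { READY, CONNECTED, WAITING }

    // Second process communication interface (IP address and port) of one import worker.
    static final class Endpoint {
        final String ip;
        final int port;
        State state = State.READY;

        Endpoint(String ip, int port) {
            this.ip = ip;
            this.port = port;
        }
    }

    private final Map<String, Endpoint> entries = new LinkedHashMap<>();

    // Called after a data import request has created a new import worker thread.
    synchronized void register(String workerId, String ip, int port) {
        entries.put(workerId, new Endpoint(ip, port));
    }

    // Called when an export worker has connected to this import worker.
    synchronized void markConnected(String workerId) {
        entries.get(workerId).state = State.CONNECTED;
    }

    synchronized int importThreadCount() {
        return entries.size();
    }

    synchronized State stateOf(String workerId) {
        return entries.get(workerId).state;
    }

    // If import workers outnumber the export workers created by the source database,
    // move surplus idle (READY) import workers into WAITING, newest first.
    synchronized List<String> applyWaitingPolicy(int exportThreadCount) {
        int alreadyWaiting = 0;
        for (Endpoint e : entries.values()) {
            if (e.state == State.WAITING) alreadyWaiting++;
        }
        int surplus = entries.size() - exportThreadCount - alreadyWaiting;
        List<String> parked = new ArrayList<>();
        List<Map.Entry<String, Endpoint>> all = new ArrayList<>(entries.entrySet());
        for (int i = all.size() - 1; i >= 0 && surplus > 0; i--) {
            Endpoint e = all.get(i).getValue();
            if (e.state == State.READY) {
                e.state = State.WAITING;
                parked.add(all.get(i).getKey());
                surplus--;
            }
        }
        return parked;
    }
}
```

For example, with three registered import workers and two export workers, one unconnected import worker is parked; calling the policy again does not park a second one, because already-waiting workers are counted.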

In this embodiment, when the server receives a data import request, it creates an associated data import worker thread and records the thread's second process communication interface in the data import thread directory. When the number of data import worker threads in the directory exceeds the export thread number, the idle data import worker threads are configured into a waiting state. Idle import worker threads are thus handled promptly, which schedules the data import worker threads sensibly and reduces wasted resources.

The present application further provides an application scenario to which the above data transmission method is applied. Specifically, the method is applied in this scenario as follows:

In actual business, different data systems need to interact frequently; for example, data is often transferred between file systems, databases, different DBMSs (Database Management Systems), and compute engines. A significant amount of data moves between the DBMSs of different data systems, and each DBMS may have its own internal data format. A common industry approach is an intermediate Extract-Transform-Load (ETL) process that issues queries to the source DBMS through protocols such as JDBC (Java DataBase Connectivity), or extracts data through a dedicated extractor, and then inserts the data into the target DBMS. However, this approach does not exploit the fact that the source and target systems may both be shared-nothing architectures and are likely to run in the same cluster, so it is not suitable for moving large data sets. Alternatively, many DBMSs let a user export stored data to external storage in a particular format and re-import data in the same format. A user can use this capability to transfer data between two DBMSs by first exporting the data from the source system and then importing it into the target DBMS. This works best when the source and target DBMSs share a common data format, such as a text-based TXT, CSV, or JSON format. However, moving data in text-based formats such as TXT, CSV, or JSON is very expensive: the data must be serialized from the internal format of the source DBMS into a generic text format, the serialized data must be stored in physical storage, and the serialized data must then be imported from storage into the internal format of the target DBMS.

A DBMS (database management system) is a software system that helps a user interact with one or more databases to create, access, manage, and update data. DBMSs are used for many types of data management, such as in business and scientific fields. In this context, data transfer typically means moving data from one DBMS to another for tasks such as hybrid analysis. Data transfer can be accomplished in several ways: using an intermediate ETL process, converting data into a common format (e.g., CSV or JSON) for export and import, or building a dedicated data transfer program (a data pipe). This embodiment improves data transfer efficiency by automatically constructing and embedding data pipes. ETL refers to the extraction, transformation, and loading of data as it is moved from one database to another, and is commonly used in data integration and data warehouse construction. CSV (comma-separated values) is a common data format for storing data and transferring it between different database management systems. A CSV file consists of multiple lines, each representing a record, and each record contains multiple fields separated by commas. Because a CSV file is plain text, it is easy to read and edit, but it can be inefficient for transferring large amounts of data. JSON (JavaScript Object Notation) is a common data exchange format on the internet. Derived from JavaScript object notation, it is easy to read and write, can encode structured data such as lists and dictionaries as plain text for transmission over a network, and can be parsed and generated quickly.

Based on this, the present embodiment provides a data transmission method that uses several optimization techniques to improve data transmission performance. Specifically, it adds code to existing implementations of text-oriented data formats so as to intercept the raw data, identify and eliminate delimiters before the data is text-encoded, and identify and eliminate redundant metadata. Having captured the raw data, and taking the existing CSV or JSON export and import functions as the specification, it can automatically construct an efficient binary data pipe that transmits data in an efficient binary format, such as the Arrow format. Arrow is an efficient binary data format that can be used to transfer data between different database management systems; it performs well for data transmission, can offer higher performance than text-oriented formats such as CSV and JSON, and simplifies the evolution of binary transfer formats. The data transmission method of this embodiment can provide efficient data transmission between the different data systems and data stores of a business system. It automatically constructs data transmission pipes from the existing data export capabilities of various DBMSs and achieves efficient binary data transmission by eliminating unnecessary computation inherent in text-oriented formats. In addition, it improves performance through techniques such as external libraries, intermediate formats, column orientation, and compression, which address the computational overhead inherent in text formats and extend the data import and export functions. Tests on business data show that the approach can significantly accelerate data transfer between different systems.
The data transmission method of this embodiment not only achieves efficient binary data transmission but also simplifies the evolution of binary data transmission formats: new file formats can be supported automatically, without implementing support for each new format separately in every system.

The data transmission method of this embodiment can be applied to data analysis platforms serving important business scenarios such as advertisement recommendation and data analysis, supporting efficient data transmission among different data storage systems. Modern data analysis typically requires retrieving stored data from multiple DBMSs. For example, business personnel may collect network data in a graph database while storing other metadata in a relational database system. To analyze the data, they must retrieve data from the relational store and combine it with the data stored in the graph database to produce the final result. They may also need to temporarily move the retrieved data to other DBMSs to apply machine learning algorithms, array processing operators, and the like. Many DBMSs let a user export stored data to an external storage system and re-import data in the same format, and users can use these export and import functions to transfer data between multiple DBMSs, but this approach is very inefficient. The data transmission method of this embodiment can greatly improve data transmission efficiency and thereby speed up data analysis in actual business. Specifically, it automatically creates data pipes using program analysis, enabling direct data transfer between DBMSs. It also improves the transmission performance of the data pipes by eliminating unnecessary computation inherent in text-oriented formats. Furthermore, replacing inefficient text-based data pipes with efficient binary data pipes makes a fast data transmission format available to data analysis engines without requiring developers to modify the source code of these engines manually.

Specifically, the data transmission method of this embodiment enables a user to transmit data efficiently between different DBMSs. To use a data pipe, the code that generates the data pipe is first invoked; after compilation, query statements can be written that use the generated data pipe to transmit data.

To build a data pipe, a user may invoke the generator from a terminal with several input parameters, which may include: scripts for building the DBMS and its source code, such as the build script of a MySQL or PostgreSQL database; scripts that run the unit tests associated with the import and export functions of a particular data format, such as CSV, JSON, or a binary format (for example, a test that exports data from a MySQL table to CSV); the names of the specific data formats produced during import and export, and the names of the data formats used for library replacement; and other configuration metadata, such as the path of the data source, the data format, and the DBMS configuration. Given these input parameters, the system analyzes the DBMS source code and generates a data pipe that exports data through a network socket (Socket) instantiated at run time. A network socket is a mechanism by which data is transferred between different hosts in a computer network; sockets enable communication among hosts and, combined with network communication, support functions such as inter-process communication and data transmission. Specifically, when a data pipe is generated over a network socket, data transmission and interaction proceed as follows: on the source DBMS side, a Socket object is created and bound to a designated IP address and port number; on the target DBMS side, a Socket object is created and connected to the IP address and port number designated by the server side; and the source DBMS uses the accept() method to receive the target DBMS's connection request, which returns a new Socket object for communicating with the target DBMS.
Between the source DBMS and the target DBMS, data can be read and written through the InputStream and OutputStream classes, and buffers can be used during transmission to improve efficiency. After the transmission completes, the Socket objects and stream objects must be closed to release their resources.
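The socket steps above can be sketched on the loopback interface as follows. This is a hypothetical, self-contained demo rather than code from any DBMS: the source side binds a `ServerSocket` and calls `accept()`, the target side connects with a `Socket`, buffered streams carry the bytes, and try-with-resources closes every socket and stream when the transfer ends.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical loopback demo of a socket-based data pipe (not taken from any DBMS).
final class SocketPipeDemo {
    static byte[] transfer(byte[] payload) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Source side: bind a server socket; port 0 lets the OS pick a free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread source = new Thread(() -> {
                // accept() returns a new Socket dedicated to the target's connection.
                try (Socket conn = server.accept();
                     OutputStream out = new BufferedOutputStream(conn.getOutputStream())) {
                    out.write(payload); // flushed and closed by try-with-resources
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            source.start();
            // Target side: connect to the source's address and port, read until end of stream.
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 InputStream in = new BufferedInputStream(client.getInputStream())) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    received.write(buf, 0, n);
                }
            }
            source.join();
            return received.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Closing the source's output stream also closes its socket, which is what lets the target's read loop observe end of stream.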

Compared with going through a file system, transmitting data over network sockets has the following advantages: it avoids storing the data on a physical medium, which improves transmission efficiency; parallel export and import functions can be used to increase transmission speed; and an optimized binary format avoids the unnecessary computation and performance overhead of text formats.

When transferring data over a generated data pipe, two queries are executed: one exports data from the source DBMS and the other imports data into the target DBMS. The queries can be issued in either order, and the system blocks automatically: the upstream side waits if no destination is available and does not write data downstream until both DBMSs are ready. Each query is written in the same form as if the data were being moved from the source DBMS to the target DBMS through physical storage, except that the user specifies, using a special "db://X" syntax, the name of the target DBMS and the data to be exported from the source. On receiving the queries, the system connects the data pipes generated in the source DBMS and the target DBMS by passing them a network socket over which the data is transferred. When both the source and target DBMSs support multiple worker threads, the threads are coordinated and matched through a work directory maintained by the system.

As shown in FIG. 7: in step 1, a user submits a query to the source database application (the source DBMS, e.g., Oracle) that computes over the raw data, and exports the result to the target database application (the target DBMS) through the data pipe; in step 2, the user issues an import query on the target DBMS (e.g., Spark) to compute the result; in step 3, data is transmitted through the generated data pipe, specifically by the respective worker threads in the source DBMS and the target DBMS; and in step 4, the connection process is coordinated through the work directory. The work directory coordinates the connections of the data pipes between the source DBMS and the target DBMS. When a user exports data from a source DBMS A to a target DBMS B, DBMS B prepares to import, and each of its worker threads b1, …, bn registers with the work directory to receive data from DBMS A. When data is exported from DBMS A, each worker thread a1, …, an queries the work directory to obtain the address and port of the receiving worker bi. Each export worker ai blocks until a matching entry is available in the directory and then connects to the corresponding import worker bi over a network socket. To allow multiple concurrent transfers between the same pair of DBMSs, each import and export query is also assigned a unique identifier that disambiguates concurrently executing queries. When the import workers outnumber the export workers, the work directory opens a "stub" socket for each excess import worker; the stub immediately signals end of file, and that import worker waits until the data transfer completes.

Regarding the work directory: a distributed DBMS supports importing or exporting data in parallel using multiple worker threads, and the present apparatus uses a work directory to match the worker threads of the source and target DBMSs. The directory is instantiated when the DBMS starts, so that all worker processes of the DBMS can access it. When a user exports data from DBMS A to DBMS B and DBMS B is ready to import, each of its worker threads b1, …, bn registers with the directory to receive data from DBMS A. When data is exported from DBMS A, each worker thread a1, …, an queries the directory to obtain the address and port of the receiving worker thread bi. Each export worker thread ai blocks until a matching entry is available in the directory and then connects to the corresponding worker thread over a network socket. To allow multiple concurrent transfers between the same pair of DBMSs, each import and export query is also assigned a unique identifier that disambiguates concurrently executing queries. The data pipe class performs the directory registration or lookup. Data transfer is initiated through the work directory with the following overall workflow: each import worker bj of query Q1 registers with the directory and waits for a connection; each export worker ai queries the directory and blocks until the entry it needs is available; and a connection is then established between the source DBMS and the target DBMS. The work directory normally assumes that the source and target DBMSs use the same number of worker threads, but the user may embed metadata that explicitly specifies the numbers of export and import processes. When the import worker threads outnumber the export worker threads, the work directory opens a "stub" socket for each unmatched import worker thread, which immediately signals end of file; the extra import worker threads thus stay idle until the data transfer completes.
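A minimal sketch of the work directory's matching logic follows, assuming a single in-process directory; in the described system the directory is shared by all worker processes of the DBMS. The class name, the `queryId#index` key, and the `host:port` string are hypothetical. Keys include the query identifier so that concurrent transfers between the same pair of DBMSs do not collide, and an export worker blocks (here with a timeout) until the matching import worker has registered.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical work directory matching export worker a_i with import worker b_i.
final class WorkDirectory {
    // One slot per (query id, worker index); completed when the importer registers.
    private final ConcurrentMap<String, CompletableFuture<String>> slots = new ConcurrentHashMap<>();

    private CompletableFuture<String> slot(String queryId, int workerIndex) {
        return slots.computeIfAbsent(queryId + "#" + workerIndex, k -> new CompletableFuture<>());
    }

    // Import worker b_i publishes its "host:port" and then waits for a connection.
    void registerImporter(String queryId, int workerIndex, String hostPort) {
        slot(queryId, workerIndex).complete(hostPort);
    }

    // Export worker a_i blocks until b_i has registered; returns null on timeout.
    String lookupImporter(String queryId, int workerIndex, long timeoutMs) {
        try {
            return slot(queryId, workerIndex).get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    // Importers without a matching exporter would be given a stub socket that
    // signals end of file at once; this sketch only identifies them.
    static boolean needsStub(int importerIndex, int exportCount) {
        return importerIndex >= exportCount;
    }
}
```

Because the lookup waits on the same slot the importer completes, the export and import queries can be started in either order.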

To generate a data pipe, the source code of the DBMS is taken as input, the unit tests for the import and export functions of a given format are run, and the DBMS source code is modified to produce a data pipe that transfers data between DBMSs using a common data format; when the common data format is a text format, the format of the transferred data is further optimized to improve performance. As shown in FIG. 8, data pipe generation is implemented by two modules: an input/output (IO) redirector and a format optimizer. The IO redirector generates, from the source code of the database application (specifically, the DBMS source code), a data pipe that transmits data through a network socket; the format optimizer improves the transmission efficiency of text-oriented formats. Both modules are driven by the import and export unit tests, and the optimized data pipe is obtained by recompiling the database application after its source code has been processed. The format optimizer can replace instances of a given format library with a corresponding subtype that avoids the overhead associated with strings and delimiters and generates an AString holding the internal binary representation, which allows a variety of file formats to be supported.

In the IO redirector, an import/export unit test is a test that exercises specific code segments of the software system; here, these tests are used to analyze the DBMS source code and create a data pipe that transfers data directly. The process monitors code execution at run time and modifies the code so that its I/O operations use a network socket instead of a disk file, which improves transmission efficiency and avoids the performance bottleneck caused by disk I/O. File-open expressions are the code expressions that open the import or export file; they are identified by instrumenting the file-open calls in the source code, running the import and export unit tests, and capturing the file name passed to each call, which reveals the relevant call sites in the DBMS. All file calls unrelated to the target import or export are then discarded, and code is added that redirects the remaining I/O operations to a network socket at run time, so that data of any size can be transferred, including on file systems that do not support named pipes, such as HDFS (Hadoop Distributed File System). In short, the IO redirector locates the file system calls in the DBMS source code and adds code that redirects their I/O to a network socket supplied at run time, so that data flows directly from the source DBMS to the target DBMS, bypassing the disk. Conditional redirection means redirecting a data stream only when a specific condition holds: the IO redirector redirects file system calls in the DBMS source code to the network socket using conditional operations, and this is how it creates the data pipe.
The condition is the use of a reserved file name format, which activates the data pipe so that the I/O operation becomes a network socket operation rather than a file system operation; making the redirection conditional avoids unnecessary I/O operations and improves data transmission performance. A reserved file name is a specific file name format that must be used for import and export when efficient data transmission between different database management systems is desired; the system recognizes this format and automatically switches to binary data transmission, avoiding the problems of serializing data through physical storage.
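The conditional redirection can be illustrated by what a rewritten export call site might look like. `RedirectingOpener` and its `pipeFactory` hook are hypothetical; the `db://X` prefix follows the reserved syntax described for the export and import queries, and every other file name still goes to disk, which preserves ordinary file export.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Hypothetical sketch of an export call site after conditional redirection.
final class RedirectingOpener {
    static final String RESERVED_PREFIX = "db://";

    // Assumed hook that returns the socket output stream for a target DBMS name.
    private final Function<String, OutputStream> pipeFactory;

    RedirectingOpener(Function<String, OutputStream> pipeFactory) {
        this.pipeFactory = pipeFactory;
    }

    static boolean isReserved(String fileName) {
        return fileName.startsWith(RESERVED_PREFIX);
    }

    // Stands in for the original `new FileOutputStream(fileName)` expression.
    OutputStream open(String fileName) {
        if (isReserved(fileName)) {
            // Reserved name: activate the data pipe to DBMS "X" in "db://X".
            return pipeFactory.apply(fileName.substring(RESERVED_PREFIX.length()));
        }
        try {
            return new FileOutputStream(fileName); // unchanged disk export path
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```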

Further, the IO redirector creates the data pipe from the DBMS's existing serialization code for exporting data, modifying that code so that it no longer uses the file system but exports or imports data at run time through a network socket provided by the system. When the source DBMS and the target DBMS run on the same computer, the socket is a local loopback socket; otherwise, it connects to the remote computer hosting the target DBMS. To modify the DBMS source code in this way, the IO redirector first identifies the file system call sites that open the disk file used for import or export. It disambiguates them by instrumenting every file-open call in the source code, running the import and export unit tests, and capturing the file name passed to each call; every call site whose file name is not the import or export file is discarded. The IO redirector then modifies each remaining call site by adding code that redirects the I/O operation to the network socket. The new code is executed only when the user specifies a file name reserved for import or export. This conditional redirection is important because it preserves the DBMS's ability to import from and export to disk files.
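The disambiguation step can be sketched as a recorder that an instrumentation hook calls before each file-open call while the unit tests run. `CallSiteFilter` and the `File.java:line` call-site labels are hypothetical; only call sites that opened the test's import or export file are kept for rewriting.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical recorder fed by instrumented file-open calls during unit tests.
final class CallSiteFilter {
    private final Map<String, Set<String>> namesBySite = new LinkedHashMap<>();

    // Invoked by the hook injected before each file-open call.
    void record(String callSite, String fileName) {
        namesBySite.computeIfAbsent(callSite, k -> new LinkedHashSet<>()).add(fileName);
    }

    // Keep call sites that opened the unit test's import/export file; discard the
    // rest (logs, configuration files, temporary files, and so on).
    List<String> candidateSites(String importExportFile) {
        List<String> keep = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : namesBySite.entrySet()) {
            if (e.getValue().contains(importExportFile)) {
                keep.add(e.getKey());
            }
        }
        return keep;
    }
}
```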

As shown in FIG. 9, a data pipe is created between the source database application (the source DBMS) and the target database application (the target DBMS), and data is transmitted through a network socket. The user adds an export plug-in to the source DBMS and an import plug-in to the target DBMS in order to transfer data from a data table of the source DBMS to a data table of the target DBMS, and activates the data pipe automatically generated by the system by specifying the reserved file name as the import and export destination. Data is thus transmitted directly between the two database management systems, avoiding the drawbacks of serializing it through physical storage.

When adding the conditional logic to each relevant call site, the IO redirector introduces into the DBMS source code a special data pipe class that reads from or writes to the remote DBMS through the network socket passed to it instead of through the file system, but is otherwise a subtype of the file-system-oriented class it replaces. This allows the system to replace instances of the file IO class with data pipe instances. For example, in languages such as C or C++, a file descriptor can be replaced with a Berkeley socket descriptor, while in Java or .NET a file stream can be exchanged for a network socket stream. When the reserved file name is specified, the modified code uses the generated data pipe. The correctness of the generated data pipe can be verified as follows: first, a verification agent is started that imports and exports data as if it were the remote DBMS and redirects all data received through the data pipe to the file system, so that data transferred through the pipe ends up on a physical storage device; next, the unit tests of the modified DBMS are run using the reserved file name format, which activates the data pipe for all files; while data is imported and exported over the data pipe, the agent reads from and writes to disk; finally, the existing unit test logic verifies that the content read from or written to disk is correct.
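The verification agent's job of landing pipe data on disk, so that the unchanged unit-test assertions can inspect it, might look like the following minimal sketch. `VerificationAgent` is a hypothetical name, and a real agent would take the `InputStream` from an accepted socket rather than from memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical agent that plays the remote DBMS and writes everything received
// over the data pipe to a file for the existing unit-test checks.
final class VerificationAgent {
    // Returns the number of bytes written to the file.
    static long drainToFile(InputStream pipe, Path file) {
        try (InputStream in = pipe) {
            return Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```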

In addition, the IO redirector may provide a dynamic debug mode for when the source DBMS and the target DBMS run on the same computer, in which the correctness of the generated data pipe is checked by sampling at run time. In this mode, the first n transmitted records are both written to disk using the existing serialization logic and transmitted through the data pipe. The receiving DBMS reads these records from the file system and compares them with the values received through the pipe; any deviation triggers a check failure.
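The receiver-side comparison in the dynamic debug mode might be sketched as follows, assuming the sampled records have already been deserialized into comparable values. `SampleChecker` is a hypothetical name.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical receiver-side check for the dynamic debug mode.
final class SampleChecker {
    // fromDisk: records written by the existing serialization logic;
    // fromPipe: records received over the data pipe. Compares the first n.
    static <T> void check(List<T> fromDisk, List<T> fromPipe, int n) {
        int k = Math.min(n, fromDisk.size());
        if (fromPipe.size() < k) {
            throw new IllegalStateException(
                    "pipe delivered " + fromPipe.size() + " of " + k + " sampled records");
        }
        for (int i = 0; i < k; i++) {
            if (!Objects.equals(fromDisk.get(i), fromPipe.get(i))) {
                throw new IllegalStateException(
                        "record " + i + " differs: " + fromDisk.get(i) + " vs " + fromPipe.get(i));
            }
        }
    }
}
```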

In the format optimizer, the import/export unit tests play the same role as in the IO redirector, namely exercising specific code segments of the software system. Data flow analysis identifies how data flows through a program, from the definition of a variable to its uses. The data pipes are created and embedded automatically using static and dynamic code analysis, which simplifies data transfer between different database management systems and improves its efficiency; accurate data flow analysis greatly helps both the efficiency and the accuracy of this program analysis. Type replacement is the process of replacing one data type in a program with another. Here, type replacement is used to address the serialization and conversion of data: the java.lang.String class is replaced with an AString class, a data type backed by an object array that avoids some of the problems of java.lang.String. Other data types that a DBMS uses to serialize data through external libraries can likewise be replaced to improve data transmission efficiency.
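A minimal sketch of an AString backed by an object array is shown below. Because `java.lang.String` is final, this sketch implements `CharSequence` instead of subclassing it; the source rewriting is assumed to change the declared types at the affected call sites. The class shape and method names are assumptions for illustration.

```java
// Hypothetical AString: keeps references to the raw values in an object array and
// renders characters only if some caller actually asks for text.
final class AString implements CharSequence {
    private final Object[] values;  // raw value references, no characters yet
    private final String delimiter; // used only when text is materialized
    private String text;            // lazily built textual form

    AString(String delimiter, Object... values) {
        this.delimiter = delimiter;
        this.values = values.clone();
    }

    // Binary-aware consumers (e.g. a pipe writer) read the raw values directly.
    Object[] rawValues() {
        return values.clone();
    }

    private String text() {
        if (text == null) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < values.length; i++) {
                if (i > 0) sb.append(delimiter);
                sb.append(values[i]);
            }
            text = sb.toString();
        }
        return text;
    }

    @Override public int length() { return text().length(); }
    @Override public char charAt(int index) { return text().charAt(index); }
    @Override public CharSequence subSequence(int start, int end) { return text().subSequence(start, end); }
    @Override public String toString() { return text(); }
}
```

A binary pipe writer can call `rawValues()` and never pay for string conversion, while unmodified code that treats the value as text still sees the familiar delimited form.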

Specifically, the code segments that import data from and export data to disk files are first located in the DBMS source code. For each segment, file system operations such as opening and closing files are identified and replaced with equivalent operations on a network socket. The expressions that open the import or export file are identified by instrumenting the code and running the provided unit tests, and are then modified so that data is sent and received directly through the network socket rather than through the file system. This approach can transfer data sets of any size and works on file systems that do not support named pipes, such as HDFS. To further improve the performance of the generated data pipe, the format optimizer eliminates unnecessary text-format-oriented operations.

The method assumes that the data import and export implementation of the given DBMS is well behaved: no large seeks are performed during file IO, and the import or export file is opened only once, since otherwise multiple sockets may be created during pipe generation. For text-based formats, it is assumed that data is serialized as character strings rather than byte arrays, and that values are built by string concatenation rather than random access. For a given transfer, both DBMSs support the same data format. For export tasks, the data is read and serialized along the code path that writes it to disk through system calls; for import tasks, the serialized data is read from disk and deserialized.

Further, when a pair of DBMSs shares an optimized binary format, the data pipe generated by the IO redirector can be used immediately to transfer data between the two systems. For example, a single data pipe can transfer data between Spark and Oracle as Hadoop sequence files, because both systems support this format. However, it is rare for two systems to support the same efficient binary format. For example, the Parquet format is supported natively only by Spark and Hadoop, Oracle requires third-party extensions to support Parquet, and Derby does not support importing or exporting any binary format. Derby is a relational database management system (DBMS) suited to single-node environments that supports SQL (Structured Query Language).

In contrast, text-oriented formats are much more widely supported: almost all DBMSs in the industry support bulk import and export in CSV format, so text-based formats can be used to transfer data between these systems. Text-based transfer, however, is inefficient and incurs significant performance overhead, for the following main reasons. First, numeric values encoded as strings are typically much larger than their native representation, which increases transmission time; for example, a 4-byte floating-point value may require a 24-byte string encoding. Second, text-oriented export and import logic spends a significant amount of time converting primitive values into their string representations and parsing them back out. Third, for attributes with a fixed-width representation, many data formats include unnecessary delimiters, such as value separators and line breaks. Moreover, the CSV and JSON output of DBMSs is typically row-oriented, which makes it difficult to benefit from column-oriented layout and compression techniques; actual data indicates that converting the intermediate data into column-oriented form yields some performance improvement.
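The size overhead of string-encoded numbers can be seen by encoding one row of floats both ways. This is an illustrative comparison, not a benchmark, and the class and method names are hypothetical. The binary form uses exactly four bytes per value and no separators, whereas the text form spends one byte per decimal character and adds a separator per value and a line terminator per record.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical comparison of one float row encoded as CSV text and as fixed-width binary.
final class EncodingSize {
    static byte[] asCsv(float[] row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) sb.append(',');         // value separator
            sb.append(Float.toString(row[i])); // string conversion per value
        }
        sb.append('\n');                       // record terminator
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    static byte[] asBinary(float[] row) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (float v : row) out.writeFloat(v); // 4 bytes each, big-endian, no separators
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static float[] fromBinary(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian, matching DataOutputStream
        float[] row = new float[data.length / Float.BYTES];
        for (int i = 0; i < row.length; i++) row[i] = buf.getFloat();
        return row;
    }
}
```

For the row {1234.5f, -0.125f, 98765.4f}, the binary encoding is 12 bytes and decodes without any parsing, while the CSV line is larger. Column-oriented layout and compression are not shown here.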

To improve the performance of text-based formats, the format optimizer in this embodiment addresses the drawbacks described above. It first analyzes the source code of the source DBMS, using a string detection operation, to determine whether the export or import logic uses an external library to serialize data into JSON or CSV. If so, the format optimizer replaces the library usage with a pipe-aware subtype that is directly responsible for performing the optimization: whenever data is exported or imported, the library implementation is replaced by this subtype, which avoids the overhead of strings and separators. When a text format is being built or parsed, the subtype directly generates, or internally builds, a binary representation; when the generated text fragment is converted into string form, the subtype produces an AString containing that binary representation, and on import the intermediate binary AString representation is consumed directly. If the DBMS instead implements its own serialization rather than using an external library (for example, Myria implements its own JSON export), the format optimizer uses a different mode for that DBMS, namely optimizing data transfer through string modification.

The subtype replacement and the string modifier differ as follows. Subtype replacement mainly replaces the original data type with the AString type, whereas the string modifier replaces the original string type with the AString type. In subtype replacement, an array of AString type is created to replace the original object array, and optimization is achieved by replacing the original object type with this AString subtype, which stores references to the objects rather than characters. In string modification, data-flow analysis is performed to identify the strings involved in data transmission, and all string literals are replaced with AString instances. Thus the replacement subtype holds object references in AString, while the string modifier holds the corresponding string values in AString. String modification is a technique for optimizing string processing during data transmission: by modifying the source code of the DBMS, fixed-width primitive values are converted into a compact binary representation, while an enhanced string type stores the values and avoids transmitting trailing strings and metadata, increasing transfer speed and efficiency.

If the DBMS does not use an external library, it serializes data into CSV or JSON text using string operations: it converts objects and primitive values into strings, concatenates the strings, adds separators and metadata such as attribute names, and then writes out the result. To optimize these steps, the format optimizer modifies the DBMS source code so that, when the modified DBMS attempts to write a string to the stream, primitive values are written in a compact binary representation instead. Ordinary strings and other non-primitive values are transmitted in their original form. This eliminates the transmission of unnecessary values such as delimiters and attribute names. The main challenge, however, is that many primitive values may already be embedded in an arbitrary string by the time it is passed to the data pipe type. For example, a DBMS may concatenate all attributes together before writing them to the output stream; by the time the data pipe type receives the concatenated value it is too late, because the attributes have already been converted to strings. This embodiment therefore introduces a new enhanced string type named AString, a subtype of java.lang.String. AString is backed by an array of objects rather than an array of characters. Replacing java.lang.String instances with AString instances at the correct locations avoids the problem above, because an AString stores references to the objects to be serialized instead of their string representations.
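
For reference, the baseline export path described above (convert, concatenate, add separators and attribute names, then write) might look like the following in a Java DBMS. This is a hypothetical sketch: NaiveCsvExport and its signature are not taken from any real system. It shows why intervening at the writer is too late, since write() only ever receives finished strings.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

/** Hypothetical baseline export path: every value becomes a string before it reaches the writer. */
final class NaiveCsvExport {
    static void export(List<String> attributeNames, List<Object[]> rows, Writer out) {
        try {
            out.write(String.join(",", attributeNames) + "\n");   // metadata: attribute names
            for (Object[] row : rows) {
                String line = "";
                for (int i = 0; i < row.length; i++) {
                    // Values are converted and concatenated here, so when write() is called
                    // the original int/float values are no longer visible to the output stream.
                    line = line + (i > 0 ? "," : "") + row[i];
                }
                out.write(line + "\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        export(List.of("id", "name"), List.<Object[]>of(new Object[]{1, "a"}, new Object[]{2, "b"}), sink);
        System.out.print(sink);   // id,name / 1,a / 2,b on three lines
    }
}
```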

Consider, for example, a common string concatenation operation:

The format optimizer converts the statement into one that uses AStrings:

Each of the three resulting instances holds its associated value (1, "," and "a", respectively) as an internal field, while the concatenation result is an AString instance whose internal state is [1, ",", "a"]. Note that the final AString instance need not contain the concatenated string "1,a" in its internal state, because that string can be regenerated and cached on demand. More complex types are converted to strings immediately during concatenation, so that later changes to their state do not affect the internal state of the AString instance. Converting a complex object into a string, for example by calling toString, may itself produce an AString instance, which allows nested conversions in supported formats.
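
A minimal sketch of the AString behaviour described above follows. Because java.lang.String is final, this stand-in does not extend it (the embodiment relies on dynamic code loading for that); it only implements CharSequence. The method names concat and parts, and the rule for which types are kept unconverted, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Stand-in for the enhanced string: backed by an object list, NOT a subclass of java.lang.String. */
final class AString implements CharSequence {
    private final List<Object> parts;   // unconverted values, e.g. [1, ",", "a"]
    private String cached;              // materialized text, built lazily

    AString(Object value) {
        this.parts = List.of(freeze(value));
    }

    private AString(List<Object> parts) {
        this.parts = parts;
    }

    // Keep boxed primitives and strings as-is; turn other objects into text immediately so that
    // later mutation of the object cannot change this AString's state.
    private static Object freeze(Object v) {
        if (v instanceof Number || v instanceof Boolean || v instanceof Character || v instanceof String) {
            return v;
        }
        return String.valueOf(v);
    }

    /** Concatenation keeps references to the operands' values instead of building characters. */
    AString concat(Object other) {
        List<Object> joined = new ArrayList<>(parts);
        if (other instanceof AString a) {
            joined.addAll(a.parts);
        } else {
            joined.add(freeze(other));
        }
        return new AString(Collections.unmodifiableList(joined));
    }

    /** Unconverted values, e.g. for a data pipe that writes them in binary. */
    List<Object> parts() {
        return parts;
    }

    @Override
    public String toString() {
        if (cached == null) {
            StringBuilder sb = new StringBuilder();
            for (Object p : parts) sb.append(p);
            cached = sb.toString();   // regenerated only on demand, then cached
        }
        return cached;
    }

    @Override public int length() { return toString().length(); }
    @Override public char charAt(int i) { return toString().charAt(i); }
    @Override public CharSequence subSequence(int s, int e) { return toString().subSequence(s, e); }

    public static void main(String[] args) {
        AString s = new AString(1).concat(",").concat("a");   // Java form of 1 + "," + "a"
        System.out.println(s.parts());   // [1, ,, a]
        System.out.println(s);           // 1,a
    }
}
```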

When an IO method is invoked on the data pipe type, it checks whether any string argument is an AString, and methods of the data pipe type that return strings return AStrings. During export, the data pipe type directly uses the unconverted values held in the AString's internal state; during import, AString splits the input on separators and converts the fields to numeric values without materializing them as strings.
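
On the import side, the key idea is to turn delimited characters into numbers without allocating a String per field. The sketch below assumes a Java 9+ runtime, where Integer.parseInt(CharSequence, int, int, int) parses a character range directly; the class name, and the assumption of non-empty integer fields with no trailing separator, are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: split a delimited record and parse its int fields in place, creating no substrings. */
final class InPlaceFieldParser {
    /** Parses each field of record (separated by sep, no trailing separator) as a decimal int. */
    static int[] parseInts(CharSequence record, char sep) {
        List<Integer> values = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= record.length(); i++) {
            if (i == record.length() || record.charAt(i) == sep) {
                // Java 9+ overload: parses the character range [start, i) directly.
                values.add(Integer.parseInt(record, start, i, 10));
                start = i + 1;
            }
        }
        return values.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseInts("12,-7,300", ',')));   // [12, -7, 300]
    }
}
```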

This solves the problem described above, but leaves open the question of which ordinary string instances should be replaced with AString. Intuitively, rather than replacing every string instance in the source code, only values directly or indirectly related to data pipe operations need to be replaced. To find this subset, the device runs the provided unit tests and marks every call site where data is written to or read from the data pipe, and then performs a data-flow analysis to determine where the written values originate (for export) and where the read values are converted to primitive values (for import).

A data-flow graph is a graphical representation of how data flows through a computation, expressed as nodes and edges: each node represents a computation operation and each edge represents a flow of data. This embodiment uses a data-flow graph to determine candidate expressions for replacement. Using the generated data-flow graph, the format optimizer replaces three types of string expressions: string literals and constructors; conversions or casts of any value to a string (for export); and conversions or casts of any string to a primitive type (for import). The string modification process may specifically be as follows:

function TRANSFORM(T: tests)
1:  for each t ∈ T do
2:      Find relevant IO call sites C
3:      for each c ∈ C do
4:          Construct data-flow graph G
5:          for each expression e ∈ G do
6:              if e is literal v or
7:                 e is instantiation String(v) or
8:                 e is v.toString() then
9:                  Replace v with AString(v)
10:             else if e is Integer.parseInt(v) then
11:                 Replace v with AString.parseInt(v)
12:             else if e is Float.parseFloat(v) then
13:                 Replace v with AString.parseFloat(v)
14:             Similarly for other string operations

Lines 1-2 run each test and identify the relevant file IO call sites, i.e., for each test t in the test set T, the set C of call sites related to IO is found. Line 4 builds a data-flow graph from these call sites. Each string conversion operation e in the graph is replaced with the corresponding AString operation (lines 5-14). Lines 6-9 handle the expressions from which the transmitted data is derived, replacing the string value v with the enhanced instance AString(v). Lines 10-14 support efficient import by applying similar replacements to conversions from strings to primitive types such as integers and floating-point numbers.
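
The rule table in lines 5-14, together with the restriction to values that reach an IO call site, can be sketched as a toy Java program. This is a simplification under stated assumptions: expressions are modelled as hypothetical (kind, operand) records, the data-flow graph as a map of edges, and the output is replacement source text rather than a real AST or bytecode rewrite.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy version of TRANSFORM: only expressions whose values reach an IO call site are rewritten. */
final class TransformSketch {
    /** A data-flow node: the expression form ("literal", "toString", ...) and its operand v. */
    record Expr(String kind, String operand) {}

    /** Backward search from the IO call sites over data-flow edges (node -> nodes it flows into). */
    static Set<String> relevantNodes(Map<String, List<String>> flowsInto, Set<String> ioSites) {
        Map<String, List<String>> reverse = new HashMap<>();
        flowsInto.forEach((from, targets) -> {
            for (String to : targets) reverse.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
        });
        Set<String> seen = new HashSet<>(ioSites);
        Deque<String> work = new ArrayDeque<>(ioSites);
        while (!work.isEmpty()) {
            for (String pred : reverse.getOrDefault(work.pop(), List.of())) {
                if (seen.add(pred)) work.push(pred);
            }
        }
        return seen;
    }

    /** Rule table of lines 6-13; returns replacement text, or null when e is not a candidate. */
    static String rewrite(Expr e) {
        String v = e.operand();
        return switch (e.kind()) {
            case "literal", "newString", "toString" -> "new AString(" + v + ")";   // lines 6-9
            case "parseInt" -> "AString.parseInt(" + v + ")";                      // lines 10-11
            case "parseFloat" -> "AString.parseFloat(" + v + ")";                  // lines 12-13
            default -> null;                                   // line 14 would add further rules
        };
    }

    static Map<String, String> transform(Map<String, Expr> exprs,
                                         Map<String, List<String>> flowsInto, Set<String> ioSites) {
        Map<String, String> replacements = new TreeMap<>();
        for (String id : relevantNodes(flowsInto, ioSites)) {
            Expr e = exprs.get(id);
            String replacement = e == null ? null : rewrite(e);
            if (replacement != null) replacements.put(id, replacement);
        }
        return replacements;
    }

    public static void main(String[] args) {
        Map<String, Expr> exprs = Map.of(
                "sep", new Expr("literal", "\",\""),
                "id", new Expr("toString", "id"),
                "dbg", new Expr("literal", "\"debug\""));
        Map<String, List<String>> flowsInto = Map.of(
                "sep", List.of("line"), "id", List.of("line"),
                "line", List.of("write"), "dbg", List.of("log"));
        System.out.println(transform(exprs, flowsInto, Set.of("write")));
        // {id=new AString(id), sep=new AString(",")}
    }
}
```

Running main rewrites only "sep" and "id": "dbg" flows into a logging call rather than into the IO call site "write", so it is left untouched.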

Many DBMSs use external libraries to serialize data. Handling these engines requires a custom subtype for each external library (in practice, a few commonly used libraries). In this mode, the format optimizer replaces each instantiation of a given formatting library with a pipe-aware subtype that tries to avoid the overhead associated with strings and separators. For example, whenever the DBMS invokes a method that builds or parses a text format, the subtype may build or generate a binary representation internally. When the generated text fragment is converted to string form, the pipe-aware subtype produces an AString whose internal state contains the binary representation. During import, the library subtype recognizes that it is interacting with the data pipe type and consumes the intermediate binary representation directly, which allows it to build an efficient internal input representation.
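
The pipe-aware subtype idea can be illustrated with a hypothetical formatting-library class and a subclass that keeps values in a tagged binary form, materializing text only when build() is called. CsvRowBuilder, its methods and the one-byte type tags are assumptions standing in for a real external library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Stand-in for an external text-format library; this API is hypothetical. */
class CsvRowBuilder {
    private final StringBuilder text = new StringBuilder();
    private int count;

    CsvRowBuilder add(Object value) {
        if (count++ > 0) text.append(',');   // separator between values
        text.append(value);                   // value converted to characters
        return this;
    }

    String build() {
        return text.toString();
    }
}

/** Pipe-aware subtype: same calls, but values are kept in a tagged binary form internally. */
class PipeAwareCsvRowBuilder extends CsvRowBuilder {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private final DataOutputStream out = new DataOutputStream(bytes);

    @Override
    PipeAwareCsvRowBuilder add(Object value) {
        try {
            if (value instanceof Integer i) {
                out.writeByte('I');   // assumed type tag for ints
                out.writeInt(i);      // fixed 4 bytes, no separator
            } else {
                out.writeByte('S');   // assumed type tag for everything else
                out.writeUTF(String.valueOf(value));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return this;
    }

    /** The binary representation a data pipe could consume directly. */
    byte[] binary() {
        return bytes.toByteArray();
    }

    /** Text is materialized only if the DBMS actually asks for a string. */
    @Override
    String build() {
        StringBuilder text = new StringBuilder();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(binary()))) {
            for (int n = 0; in.available() > 0; n++) {
                if (n > 0) text.append(',');
                text.append(in.readByte() == 'I' ? String.valueOf(in.readInt()) : in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        PipeAwareCsvRowBuilder row = new PipeAwareCsvRowBuilder().add(7).add("x");
        System.out.println(row.build() + " / " + row.binary().length + " bytes");   // 7,x / 9 bytes
    }
}
```

Because the subtype is a drop-in replacement, existing call sites such as add(...).build() keep working after the instantiation is swapped, while a data pipe can read binary() instead.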

In addition, the format optimizer may replace instantiations of the library with an extended version of the library, along with any writer or stream interfaces that the library exposes. The extended library is intended to improve the performance of text formats, particularly for large-scale data transfers: it tries to avoid the overhead associated with strings and separators and uses the intermediate binary representation directly. As with string modification, if the generated code does not pass all unit test cases, the format optimizer disables the library replacement, because the generated code must preserve the correctness and consistency of the transferred data, and a failed test means this can no longer be guaranteed. If string modification also fails the tests, the device generates only a basic data pipe for data transfer, which still guarantees correct and consistent data.
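
The fallback order described in this paragraph reduces to a small decision function. The sketch below is one reading of the text, with the enum names and boolean inputs as assumptions; in practice the test outcomes would come from running the provided unit tests against the generated code.

```java
/** One reading of the fallback order: library subtype, then string decoration, then a basic pipe. */
final class PipeStrategySelector {
    enum Strategy { LIBRARY_SUBTYPE, STRING_DECORATION, BASIC_PIPE }

    static Strategy select(boolean usesExternalLibrary,
                           boolean librarySubtypePassesTests,
                           boolean stringDecorationPassesTests) {
        if (usesExternalLibrary && librarySubtypePassesTests) {
            return Strategy.LIBRARY_SUBTYPE;     // all unit tests pass with the replaced library
        }
        if (stringDecorationPassesTests) {
            return Strategy.STRING_DECORATION;   // library replacement disabled or not applicable
        }
        return Strategy.BASIC_PIPE;              // no format optimization, data stays correct
    }

    public static void main(String[] args) {
        System.out.println(select(true, false, true));   // STRING_DECORATION
    }
}
```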

Using AStrings solves the encoding and conversion/parsing overhead of numeric types. Further intermediate-format optimizations, namely delimiter inference and deletion and redundant metadata removal, eliminate separators and avoid redundant metadata; these are implemented in the data pipe type.

Delimiter inference and deletion. Text-oriented formats such as CSV and JSON include separators that delimit attributes and mark the beginning and end of compound types. In some cases these separators are fixed in advance; for example, JSON uses brackets to denote arrays. However, default separators often differ from system to system. This is common for CSV: some systems default to a non-comma delimiter (for example, Hadoop defaults to tab), and others let the user specify the delimiter (for example, Derby). To eliminate separators, the format optimizer must first infer them, which it does by running the provided unit tests. During each test, the format optimizer counts the length-one strings in each array and determines the character most likely to be the separator. For example, the array [1, "|", "a, b", "\n"] contains exactly one length-one string ("|") besides the line terminator "\n", so "|" is considered the most likely separator. By contrast, the input [1, "|", "a", "\n"] is ambiguous, because "|" and "a" occur with the same frequency.

In this case, the following heuristics are applied in order: first, prefer non-alphanumeric separators; second, prefer separators that occur earlier. Under both heuristics, "|" is chosen as the final separator. Note that if the wrong separator is inferred, invalid data will be transferred to the remote DBMS. In the previous example, if the format optimizer selected "|" as the separator when the character "a" was in fact the correct separator, it would erroneously transmit the tuple (1, "a") instead of the correct values ("1|", ""). More importantly, this can cause the unit tests to fail, which may lead the format optimizer to disable the optimization until the unit tests are expanded enough to fully disambiguate the separator.
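
Delimiter inference with both tie-breaking heuristics can be sketched as follows. Assumptions: the observed value arrays are given as lists, the line terminator "\n" is excluded from the candidates, and "earlier" means the first position at which a character was observed across all arrays. The class name is illustrative.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of delimiter inference over the value arrays observed while running unit tests. */
final class DelimiterInference {
    /** Most frequent length-one string; ties prefer non-alphanumeric, then earliest. Null if none. */
    static Character infer(List<? extends List<?>> observedArrays) {
        Map<Character, Integer> counts = new HashMap<>();
        Map<Character, Integer> firstSeen = new HashMap<>();
        int position = 0;
        for (List<?> array : observedArrays) {
            for (Object item : array) {
                if (item instanceof String s && s.length() == 1 && !s.equals("\n")) {
                    char c = s.charAt(0);
                    counts.merge(c, 1, Integer::sum);
                    firstSeen.putIfAbsent(c, position);
                }
                position++;
            }
        }
        Character best = null;
        for (Character c : counts.keySet()) {
            if (best == null || isBetter(c, best, counts, firstSeen)) best = c;
        }
        return best;
    }

    private static boolean isBetter(char c, char best, Map<Character, Integer> counts,
                                    Map<Character, Integer> firstSeen) {
        int byCount = Integer.compare(counts.get(c), counts.get(best));
        if (byCount != 0) return byCount > 0;                    // more frequent wins
        boolean cSymbol = !Character.isLetterOrDigit(c);
        boolean bestSymbol = !Character.isLetterOrDigit(best);
        if (cSymbol != bestSymbol) return cSymbol;               // heuristic 1: non-alphanumeric
        return firstSeen.get(c) < firstSeen.get(best);           // heuristic 2: seen earlier
    }

    public static void main(String[] args) {
        System.out.println(infer(List.of(List.of(1, "|", "a,b", "\n"))));   // |
        System.out.println(infer(List.of(List.of(1, "|", "a", "\n"))));     // | (tie broken by heuristic 1)
    }
}
```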

Redundant metadata removal. More complex text formats such as JSON may not require the delimiter inference described above, but they serialize compound types such as arrays and dictionaries. When generating or consuming JSON or similar text formats, the compound types produced by a DBMS often contain highly redundant values. Consider, for example, the following document generated by the Spark toJSON method:

{"column1": 1, "column2": "value1"}
{"column1": 2, "column2": "value2"}
{"column1": 3, "column2": "value3"}

When such JSON documents are moved between systems, the repeated column names greatly increase the size of the intermediate transfer. To avoid this overhead, the format optimizer modifies the format of the intermediate data so that the set of keys shared by the dictionaries in an array is transmitted only once. In the example above, the format optimizer transmits the column names ["column1", "column2"] as key header information, followed by the values [(1, "value1"), …] as a sequence of pairs. On import, the format optimizer reverses this process to regenerate the original JSON document.

The logic of this translation is embedded in the JSON state machine, specifically in a subcomponent of the data pipe type that consumes the AString array. While the format optimizer processes the keys of the first dictionary in the array, it accumulates them into the key header information. Once that dictionary has been fully processed, the key header information is transmitted to the remote DBMS. Subsequent dictionaries in the array are transferred without their keys unless their keys differ from those of the initial dictionary. This approach extends to nested JSON documents.

If a subsequent dictionary contains a key not present in the transmitted key header information, the format optimizer applies one of two strategies. First, if the keys of the new dictionary are a superset of the keys in the key header, the format optimizer appends the new keys to the existing key header. This handles cases where the initially derived key set is incomplete, for example because of missing values in the first dictionary. The second case occurs when the keys of a dictionary do not intersect with the key header, which can happen when a DBMS exports elements with widely varying formats. In this case, the format optimizer disables the optimization for the current dictionary and transmits it with its keys.
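
The key header scheme, including the superset rule and the fallback for diverging keys, can be sketched in Java. The frame types are hypothetical, and one detail goes beyond the text: a dictionary whose keys neither contain the header nor are disjoint from it (for example, one with a missing key) is also sent with its keys, which is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: send dictionary keys once as a header, then only value tuples. */
final class KeyHeaderCodec {
    interface Frame {}
    record Header(List<String> keys) implements Frame {}
    record Row(List<Object> values) implements Frame {}
    record Full(Map<String, ?> dict) implements Frame {}   // optimization disabled for this dict

    static List<Frame> encode(List<? extends Map<String, ?>> dicts) {
        List<Frame> frames = new ArrayList<>();
        List<String> header = new ArrayList<>();
        for (Map<String, ?> dict : dicts) {
            if (header.isEmpty()) {
                header.addAll(dict.keySet());                 // keys of the first dictionary
                frames.add(new Header(List.copyOf(header)));
            } else if (!dict.keySet().containsAll(header)) {
                frames.add(new Full(dict));                   // keys diverge: send them as-is
                continue;
            } else if (dict.size() > header.size()) {
                for (String key : dict.keySet()) {
                    if (!header.contains(key)) header.add(key);   // superset: extend the header
                }
                frames.add(new Header(List.copyOf(header)));
            }
            List<Object> values = new ArrayList<>();
            for (String key : header) values.add(dict.get(key));
            frames.add(new Row(values));
        }
        return frames;
    }

    static List<Map<String, Object>> decode(List<Frame> frames) {
        List<Map<String, Object>> dicts = new ArrayList<>();
        List<String> header = List.of();
        for (Frame frame : frames) {
            if (frame instanceof Header h) {
                header = h.keys();
            } else if (frame instanceof Row r) {
                Map<String, Object> dict = new LinkedHashMap<>();
                for (int i = 0; i < header.size(); i++) dict.put(header.get(i), r.values().get(i));
                dicts.add(dict);
            } else if (frame instanceof Full f) {
                dicts.add(new LinkedHashMap<>(f.dict()));
            }
        }
        return dicts;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.of("column1", 1, "column2", "value1"),
                Map.of("column1", 2, "column2", "value2"),
                Map.of("column1", 3, "column2", "value3"));
        System.out.println(encode(rows).size() + " frames");   // 4 frames: one header, three rows
    }
}
```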

In addition, the transmitted data can be converted to a column-oriented layout and compressed. DBMSs that use text-oriented formats typically output data in row order. For example, exporting a Spark RDD containing n elements as CSV or JSON produces n rows, each containing one element of the RDD; the same holds for other systems and for both JSON and CSV. Once the format optimizer has produced an efficient representation of the data, there is no longer any need to transmit it in row-major form. For example, the data pipe type may accumulate exported data blocks and convert them to column-major form to improve transfer performance. Because DBMSs that output CSV and JSON are generally row-oriented, and data in column-oriented form can usually be compressed more effectively, this can improve IO efficiency and data transfer performance between DBMSs. This embodiment uses Arrow as the data structure for transmission because it performs best. To maximize performance, blocks of row data are accumulated in memory, loaded into a set of Arrow buffers in columnar form, and then transferred to the target DBMS; the receiving DBMS reverses this process.
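
The row-to-column conversion of an accumulated block can be sketched with plain Java arrays standing in for Arrow buffers (the embodiment uses Arrow; no Arrow API is shown here). The class name and the restriction to int columns are assumptions.

```java
import java.util.Arrays;

/** Sketch: transpose a block of fixed-width int rows into columns and back. */
final class ColumnBlock {
    static int[][] toColumns(int[][] rows, int columnCount) {
        int[][] columns = new int[columnCount][rows.length];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < columnCount; c++) {
                columns[c][r] = rows[r][c];   // value of column c in row r
            }
        }
        return columns;
    }

    /** Receiver side: rebuild the rows from the column arrays. */
    static int[][] toRows(int[][] columns) {
        int rowCount = columns.length == 0 ? 0 : columns[0].length;
        int[][] rows = new int[rowCount][columns.length];
        for (int c = 0; c < columns.length; c++) {
            for (int r = 0; r < rowCount; r++) {
                rows[r][c] = columns[c][r];
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 10}, {2, 20}, {3, 30}};
        System.out.println(Arrays.deepToString(toColumns(rows, 2)));   // [[1, 2, 3], [10, 20, 30]]
    }
}
```

Storing each column contiguously places values of the same type and similar magnitude next to each other, which is what makes the columnar block easier to compress than the original rows.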

In a specific implementation, the data transmission method provided in this embodiment may be implemented in Java, and the generated data pipes currently support DBMSs that use the local file system or HDFS for data import and export. Specifically, for the file IO redirector, FileInputStream and FileOutputStream may be targeted as the file system calls to be modified. In addition to the concrete data pipe classes (data pipe input/output streams), enhanced versions of the following Java classes may be created: StringBuilder/StringBuffer, which are given an object array similar to the one in AString; OutputStreamWriter/InputStreamReader, which are modified to interact with the DataInput/Output stream classes and contain overloads for string IO; BufferedOutputStream/BufferedInputStream, which are enhanced to detect whether the underlying stream is a data pipe; org.apache.hadoop.io.Text, whose object array is enhanced in the same way as the AString class; org.apache.hadoop.hdfs.DFSInputStream/DFSOutputStream, for which a specialized HDFS data pipe is created; and java.sql.ResultSet, whose getString method is replaced with a version that returns AString.
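
The enhancement to the buffered stream classes (detecting whether the underlying stream is a data pipe) can be sketched using the protected out field that java.io.FilterOutputStream exposes to subclasses. DataPipeOutputStream here is a hypothetical placeholder that collects bytes in memory, and the class names are assumptions.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical placeholder for the data pipe's output stream; it just collects bytes. */
class DataPipeOutputStream extends OutputStream {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    @Override
    public void write(int b) {
        sink.write(b);
    }

    byte[] written() {
        return sink.toByteArray();
    }
}

/** Buffered stream that can tell whether it ultimately writes into a data pipe. */
class PipeAwareBufferedOutputStream extends BufferedOutputStream {
    PipeAwareBufferedOutputStream(OutputStream out) {
        super(out);
    }

    /** Follows the protected 'out' field (from FilterOutputStream) through nested wrappers. */
    boolean writesToDataPipe() {
        OutputStream inner = out;
        while (inner instanceof PipeAwareBufferedOutputStream p) inner = p.out;
        return inner instanceof DataPipeOutputStream;
    }

    public static void main(String[] args) throws IOException {
        DataPipeOutputStream pipe = new DataPipeOutputStream();
        try (PipeAwareBufferedOutputStream out = new PipeAwareBufferedOutputStream(pipe)) {
            System.out.println(out.writesToDataPipe());   // true
            out.write('A');
        }
        System.out.println(pipe.written().length);         // 1 (flushed on close)
    }
}
```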

To support HDFS, the HDFS data pipe input stream implementation supports seeking within a data pipe. When an HDFS client opens a file, it performs a small read operation to determine whether the file being read is a Hadoop sequence file.
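
The small probe read can be sketched with mark/reset on a java.io.BufferedInputStream, which plays the role of seeking back to the start of the pipe. It checks for the three-byte magic "SEQ" that begins a Hadoop SequenceFile header. The class name is an assumption, and a real HDFS data pipe stream would implement seeking itself rather than rely on BufferedInputStream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Sketch: peek at the first bytes of a stream, then rewind so the reader still sees them. */
final class SequenceFileProbe {
    private static final byte[] MAGIC = "SEQ".getBytes(StandardCharsets.US_ASCII);

    /** True if the stream starts with "SEQ"; the stream is rewound to its start afterwards. */
    static boolean looksLikeSequenceFile(BufferedInputStream in) {
        try {
            in.mark(MAGIC.length);
            byte[] head = in.readNBytes(MAGIC.length);   // the small probe read (Java 11+)
            in.reset();                                   // "seek" back for the real consumer
            return Arrays.equals(head, MAGIC);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {'S', 'E', 'Q', 6, 1, 2};   // magic, version byte, then payload
        BufferedInputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        System.out.println(looksLikeSequenceFile(in));   // true
        System.out.println(looksLikeSequenceFile(in));   // true again: the probe bytes were rewound
    }
}
```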

For the enhanced string implementation, AString is defined as a subclass of java.lang.String; because java.lang.String is declared final, this is made possible by replacing the standard String class through dynamic code loading. For performance reasons, the Java implementation of AString stores the value array in a flat byte array that is pre-allocated at startup and managed internally by AString. For operations that cannot be performed on this primitive array, such as substring, AString may fall back to a materialized string representation. Finally, because Java does not support operator overloading, all string concatenations in the source code are rewritten in functional form; for example, an expression a+b in which a and b are both java.lang.String instances is rewritten as a function call.
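
A pre-allocated flat byte array addressed by offsets can be sketched with java.nio.ByteBuffer. The class name, the int-only API and the -1 "full" signal (after which a caller would fall back to a materialized string) are assumptions, not the embodiment's design.

```java
import java.nio.ByteBuffer;

/** Sketch: values kept in one pre-allocated flat byte array, addressed by offset. */
final class FlatValueArena {
    private final ByteBuffer buffer;

    FlatValueArena(int capacityBytes) {
        buffer = ByteBuffer.allocate(capacityBytes);   // allocated once, up front
    }

    /** Appends an int and returns its offset, or -1 when the arena is full. */
    int putInt(int value) {
        if (buffer.remaining() < Integer.BYTES) return -1;
        int offset = buffer.position();
        buffer.putInt(value);
        return offset;
    }

    /** Absolute read; does not move the write position. */
    int getInt(int offset) {
        return buffer.getInt(offset);
    }

    int usedBytes() {
        return buffer.position();
    }

    public static void main(String[] args) {
        FlatValueArena arena = new FlatValueArena(8);   // room for two ints
        int a = arena.putInt(42), b = arena.putInt(-1), c = arena.putInt(7);
        System.out.println(a + " " + b + " " + c);      // 0 4 -1
        System.out.println(arena.getInt(a) + arena.getInt(b));   // 41
    }
}
```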

With this data transmission method, data transmission efficiency is improved by automatically constructing and embedding data pipes. Using an efficient binary format, and treating the existing text-based export and import functions as specifications, efficient binary data pipes are constructed automatically. This not only enables efficient binary data transfer but also simplifies the binary transfer format, and it removes the need to implement support for a new format separately in each system.

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in those flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times; nor are they necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the present application also provide a data transmission apparatus for implementing the above-mentioned related data transmission method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in one or more embodiments of the data transmission device provided below may refer to the limitation of the data transmission method hereinabove, and will not be repeated herein.

In one embodiment, as shown in fig. 10, there is provided a data transmission apparatus 1000 including: a data pipeline creation module 1002, a format optimization policy determination module 1004, a data format conversion module 1006, and a data pipeline transmission module 1008, wherein:

a data pipe creation module 1002, configured to determine a target database to be data-transmitted with the source database, and create a data pipe between the source database and the target database;

a format optimization strategy determining module 1004, configured to determine a data format optimization strategy matched with the source database according to a data conversion mode supported by the source database in a data transmission process;

the data format conversion module 1006 is configured to update a string conversion operation in the data conversion mode based on a data format optimization policy, and perform data format conversion on data to be transmitted in the source database according to the updated data conversion mode, so as to obtain data to be transmitted after data format conversion; the data to be transmitted after the data format conversion comprises data which is obtained based on the character string conversion operation in the updated data conversion mode and belongs to the object array type;

and the data pipeline transmission module 1008 is used for transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.

In one embodiment, the data conversion mode includes a direct text conversion mode, and the data format optimization strategy includes a string optimization strategy; the data format conversion module 1006 is further configured to update the string conversion operation in the direct text conversion mode based on the string optimization strategy to obtain an updated text conversion mode, and to perform text format conversion on the data to be transmitted in the source database according to the updated text conversion mode to obtain the data to be transmitted after data format conversion.

In one embodiment, the data format conversion module 1006 is further configured to construct a text conversion data-flow graph according to the direct text conversion mode; determine a specified string conversion operation from the text conversion data-flow graph; and, based on the string optimization strategy, update the specified string conversion operation in the direct text conversion mode to a string conversion operation of the object array type, obtaining the updated text conversion mode.

In one embodiment, the data format conversion module 1006 is further configured to determine, from the data to be transmitted in the source database, string data belonging to the string type; convert the string literals of the string data into string-optimized data of the object array type through the string conversion operation of the object array type; perform text format conversion, in the direct text conversion mode, on the data other than the string data to obtain text format data; and obtain the data to be transmitted after data format conversion from the text format data and the string-optimized data.

In one embodiment, the data conversion mode includes an external text conversion mode, and the data format optimization strategy includes a data type optimization strategy; the data format conversion module 1006 is further configured to update the string conversion operation of the external text conversion mode based on the data type optimization strategy; generate binary state data for the data to be transmitted in the source database; and perform text format conversion on the binary state data through the updated string conversion operation to obtain the data to be transmitted after data format conversion, which belongs to the object array type.

In one embodiment, the data pipeline creation module 1002 is further configured to determine a first process communication interface of the source database and a second process communication interface of the target database, respectively; and establishing communication connection between the first process communication interface and the second process communication interface to obtain a data pipeline between the source database and the target database.
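
Connecting a source-side write end to a target-side read end can be sketched in a single JVM with java.nio.channels.Pipe, used here as an in-process stand-in for the two process communication interfaces. The class name is an assumption, and because one thread writes and then reads, the sketch is only suitable for payloads small enough to fit in the pipe's buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.util.Arrays;

/** Sketch: the pipe's sink plays the source-database side, its source the target-database side. */
final class ChannelPipeSketch {
    static int[] transfer(int[] values) {
        try {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sourceSide = pipe.sink();
                 Pipe.SourceChannel targetSide = pipe.source()) {
                ByteBuffer out = ByteBuffer.allocate(values.length * Integer.BYTES);
                for (int v : values) out.putInt(v);
                out.flip();
                while (out.hasRemaining()) sourceSide.write(out);   // first interface writes

                ByteBuffer in = ByteBuffer.allocate(values.length * Integer.BYTES);
                while (in.hasRemaining()) targetSide.read(in);      // second interface reads
                in.flip();
                int[] received = new int[values.length];
                for (int i = 0; i < received.length; i++) received[i] = in.getInt();
                return received;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(transfer(new int[]{1, -2, 300})));   // [1, -2, 300]
    }
}
```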

In one embodiment, the data pipeline creation module 1002 is further configured to, upon receiving a data export request, create a data export worker thread associated with the data export request and determine a first process communication interface for the data export worker thread; query a data import worker thread from a data import thread catalog, where the data import thread catalog contains candidate data import worker threads created by the target database based on data import requests; and determine a second process communication interface of the data import worker thread.

In one embodiment, the apparatus further includes a database configuration module configured to configure the source database to perform data reading and data transmission through the first process communication interface, and to read the data to be transmitted from the source database through the first process communication interface; the data pipeline transmission module 1008 is further configured to transmit, through the first process communication interface, the data to be transmitted after data format conversion to the target database via the data pipeline. The data to be transmitted after data format conversion instructs the target database to convert it into a data format supported by the target database and then store it.

In one embodiment, the database configuration module is further configured to determine a file name format of the data to be transmitted; when the file name format is of an active type, the data to be transmitted is read from the source database through the first process communication interface.

In one embodiment, the format optimization strategy determining module 1004 is further configured to determine a string conversion operation of the source database during the data transmission process; determining a data conversion mode supported by a source database according to the character string conversion operation; based on the data conversion mode, a data format optimization strategy matched with the source database is determined.

In one embodiment, the data format conversion module 1006 is further configured to delete the data delimiter in the data to be transmitted of the source database, update the string conversion operation in the data conversion mode based on the data format optimization policy, and perform data format conversion on the data to be transmitted with the data delimiter deleted according to the updated data conversion mode, so as to obtain the data to be transmitted after data format conversion.

In one embodiment, the data format conversion module 1006 is further configured to eliminate redundancy in repeated attribute information in the data to be transmitted of the source database, update the string conversion operation in the data conversion mode based on the data format optimization strategy, and perform data format conversion on the redundancy-eliminated data to be transmitted according to the updated data conversion mode, so as to obtain the data to be transmitted after data format conversion.

In one embodiment, the data pipeline transmission module 1008 is further configured to, when the data to be transmitted after the data format conversion belongs to a storage type of a row format, convert the data to be transmitted after the data format conversion into a storage type of a column format; and transmitting the data to be transmitted after the data format conversion of the column format from the source database to the target database through the data pipeline.

In one embodiment, the apparatus further includes a verification module configured to: when the source database and the target database are associated with the same operation node, store sampled data from the data to be transmitted in a cache of that operation node; obtain, from the data to be transmitted after data format conversion, the pipeline transmission data matching the sampled data; and verify the sampled data in the cache against the pipeline transmission data to obtain a verification result for the data pipeline.
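
The sampling check can be sketched as follows, assuming that sampled records are cached by position and that the pipeline output has already been decoded back into the same textual form. SampleVerifier, the every-k-th sampling rule and the record representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: cache sampled records before transfer and compare them with the pipe's output. */
final class SampleVerifier {
    /** One cached sample: its position in the transfer and the record itself. */
    record Sample(int index, String value) {}

    /** Caches every k-th record on the shared operation node before transfer (k must be positive). */
    static List<Sample> sample(List<String> toTransmit, int every) {
        List<Sample> cache = new ArrayList<>();
        for (int i = 0; i < toTransmit.size(); i += every) {
            cache.add(new Sample(i, toTransmit.get(i)));
        }
        return cache;
    }

    /** True when every cached sample matches the record at the same position after the pipe. */
    static boolean verify(List<Sample> cache, List<String> received) {
        for (Sample s : cache) {
            if (s.index() >= received.size() || !s.value().equals(received.get(s.index()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> sent = List.of("1,a", "2,b", "3,c", "4,d");
        List<Sample> cache = sample(sent, 2);
        System.out.println(verify(cache, sent));                                  // true
        System.out.println(verify(cache, List.of("1,a", "2,b", "3,X", "4,d")));   // false
    }
}
```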

In one embodiment, the apparatus further includes a binary transmission module configured to: when the source database and the target database support the same binary data format during data transmission, obtain the data to be transmitted from the source database; convert the data to be transmitted according to that binary data format to obtain binary data to be transmitted; and transmit the binary data from the source database to the target database through the data pipeline.

In one embodiment, as shown in fig. 11, there is provided a data transmission apparatus 1100, comprising: a data pipe creation module 1102, a data pipe transmission module 1104, and a data storage module 1106, wherein:

a data pipe creation module 1102, configured to determine a source database to be data-transmitted with a target database, and create a data pipe between the target database and the source database;

A data pipeline transmission module 1104, configured to receive, through a data pipeline, data to be transmitted after data format conversion transmitted from a source database; the data to be transmitted after the data format conversion is obtained by updating the character string conversion operation in the data conversion mode supported by the source database in the data transmission process based on the data format optimization strategy and performing data format conversion on the data to be transmitted in the source database according to the updated data conversion mode; the data to be transmitted after the data format conversion comprises data which is obtained based on the character string conversion operation in the updated data conversion mode and belongs to the object array type;

the data storage module 1106 is configured to convert the data to be transmitted after the data format conversion into a target data format supported by the target database, and store the data to be transmitted belonging to the target data format in the target database.

In one embodiment, the data pipe is created based on a first process communication interface of the source database and a second process communication interface of the target database; the apparatus further includes a thread configuration module configured to: when a data import request is received, create a data import worker thread associated with the data import request; record the second process communication interface of the data import worker thread in a data import thread catalog; and, when the number of data import worker threads in the data import thread catalog exceeds the export thread number, configure idle data import worker threads into a waiting state, where the export thread number is the number of data export worker threads created by the source database based on data export requests.
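
The catalog and the waiting rule can be sketched as follows. Endpoints are modelled as hypothetical int port numbers, and the number of workers to park is bounded both by the excess over the export thread count and by how many workers are still idle; both choices are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

/** Sketch of a data import thread catalog keyed by each worker's communication endpoint. */
final class ImportThreadCatalog {
    private final List<Integer> endpoints = new ArrayList<>();   // e.g. port numbers (assumed)
    private int claimed;                                         // handed out in registration order

    /** Target side: record the second process communication interface of a new import worker. */
    synchronized void register(int endpoint) {
        endpoints.add(endpoint);
    }

    /** Source side: an export worker claims one import worker's endpoint to build its pipe. */
    synchronized OptionalInt claim() {
        return claimed < endpoints.size() ? OptionalInt.of(endpoints.get(claimed++)) : OptionalInt.empty();
    }

    /** How many idle import workers to put into the waiting state. */
    synchronized int idleWorkersToPark(int exportThreadCount) {
        int excess = Math.max(0, endpoints.size() - exportThreadCount);
        int idle = endpoints.size() - claimed;
        return Math.min(excess, idle);
    }

    public static void main(String[] args) {
        ImportThreadCatalog catalog = new ImportThreadCatalog();
        for (int port = 5001; port <= 5004; port++) catalog.register(port);
        System.out.println(catalog.claim().getAsInt());    // 5001
        System.out.println(catalog.idleWorkersToPark(2));  // 2
    }
}
```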

Each module of the above data transmission apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server or a terminal and whose internal structure may be as shown in fig. 12. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the I/O interface are connected through a system bus, and the communication interface is connected to the system bus through the I/O interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database; the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data related to the data transmission method. The I/O interface exchanges information between the processor and external devices, and the communication interface communicates with external terminals over a network connection. When executed by the processor, the computer program implements a data transmission method.

It will be appreciated by those skilled in the art that the structure shown in fig. 12 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or arrange components differently.

In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

It should be noted that the user information (including but not limited to user equipment information and user personal information) and data (including but not limited to data used for analysis, stored data, and displayed data) involved in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.

Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored in a non-transitory computer-readable storage medium which, when executed, may include the steps of the above method embodiments. Any reference to memory, database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.

The above embodiments represent only a few implementations of the present application. Although they are described in some detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method of data transmission, the method comprising:

determining a target database to which data is to be transmitted from a source database, and creating a data pipeline between the source database and the target database based on respective process communication interfaces of the source database and the target database;

when the data conversion mode supported by the source database in the data transmission process comprises a direct text conversion mode, determining that the data format optimization strategy matched with the source database comprises a character string optimization strategy;

updating, based on the character string optimization strategy, the specified character string conversion operation in the direct text conversion mode to a character string conversion operation of the object array type, to obtain an updated text conversion mode;

determining, from the data to be transmitted in the source database, character string data belonging to a character string type;

converting the character string literals of the character string data into character string optimization data of the object array type through the object-array-type character string conversion operation in the updated text conversion mode, the character string optimization data being stored in object array form;

performing text format conversion on the data other than the character string data in the data to be transmitted by using the direct text conversion mode in the updated text conversion mode, to obtain text format data;

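obtaining the data to be transmitted after the data format conversion from the text format data and the character string optimization data; and

transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.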
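The conversion steps of claim 1 can be illustrated with a short Python sketch. It assumes that records are dictionaries, that an "object array" can be approximated by a per-column Python list holding the original string objects, and that the direct text conversion mode produces JSON; `convert_for_transfer` is a hypothetical name. String values are moved out of the JSON text, so they are not rendered as quoted, escaped literals, while every other field goes through ordinary JSON conversion.

```python
import json
from typing import Any


def convert_for_transfer(records: list[dict[str, Any]]) -> tuple[str, dict[str, list[Any]]]:
    """Split records into JSON text for non-string fields and object arrays for string fields."""
    # Any key that holds a string in some record is treated as a string column.
    string_keys = sorted({k for r in records for k, v in r.items() if isinstance(v, str)})
    # One list per string column, aligned with record order (None where absent).
    string_columns: dict[str, list[Any]] = {k: [] for k in string_keys}
    text_rows = []
    for record in records:
        for key in string_keys:
            string_columns[key].append(record.get(key))
        rest = {k: v for k, v in record.items() if k not in string_columns}
        # Direct text conversion (compact JSON) for everything else.
        text_rows.append(json.dumps(rest, separators=(",", ":")))
    return "\n".join(text_rows), string_columns
```

The two returned parts together correspond to the text format data and the character string optimization data of the claim.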

2. The method of claim 1, wherein the text format data comprises JSON format data or CSV format data.

3. The method according to claim 1, wherein the method further comprises:

constructing a text conversion data flow diagram according to the direct text conversion mode;

and determining a specified character string conversion operation from the text conversion data flow diagram.
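One possible reading of claim 3, sketched in Python: the direct text conversion mode is represented as a data flow graph whose nodes are operations, and the specified string conversion operations are the nodes whose operation kind is a string conversion. The node names, the `op_kind` labels, and the default `"to_string"` kind are hypothetical. The standard-library `graphlib.TopologicalSorter` (Python 3.9+) yields the nodes in an order in which every operation follows its inputs.

```python
from graphlib import TopologicalSorter


def find_string_conversion_ops(
    edges: dict[str, set[str]],
    op_kind: dict[str, str],
    string_kinds: frozenset[str] = frozenset({"to_string"}),
) -> list[str]:
    """Return the string conversion nodes of a conversion data flow graph, in execution order.

    edges maps each node to the set of nodes it consumes input from.
    op_kind maps each node to a hypothetical operation label.
    """
    order = TopologicalSorter(edges).static_order()
    return [node for node in order if op_kind.get(node) in string_kinds]
```

Returning the nodes in execution order makes it straightforward to replace each one in place when building the updated text conversion mode.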

4. The method of claim 1, wherein the character string data comprises at least one of a string literal, a constructor, or numerical data.

5. The method according to claim 1, wherein the method further comprises:

when the data conversion mode supported by the source database in the data transmission process comprises an external text conversion mode, determining that the data format optimization strategy matched with the source database comprises a data type optimization strategy;

updating the character string conversion operation of the external text conversion mode based on the data type optimization strategy;

generating binary state data for data to be transmitted in the source database;

performing text format conversion on the binary state data through the updated character string conversion operation to obtain data to be transmitted after data format conversion; the data to be transmitted after the data format conversion belongs to the object array type.

6. The method of claim 1, wherein the creating a data pipeline between the source database and the target database based on the respective process communication interfaces of the source database and the target database comprises:

determining a first process communication interface of the source database and a second process communication interface of the target database respectively;

and establishing a communication connection between the first process communication interface and the second process communication interface to obtain the data pipeline between the source database and the target database.
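The pipeline of claim 6 can be approximated in Python with `multiprocessing.Pipe`, whose two `Connection` endpoints stand in for the first and second process communication interfaces. This is a sketch under assumptions: a thread plays the role of the source-side exporter so the example runs in one process, and a `None` sentinel (an assumption, not part of the claim) marks the end of the transfer. All function names are hypothetical.

```python
import threading
from multiprocessing import Pipe
from multiprocessing.connection import Connection


def create_data_pipeline() -> tuple[Connection, Connection]:
    """Return (source_end, target_end) of a one-way pipe."""
    # With duplex=False, the first connection can only receive and the second can only send.
    target_end, source_end = Pipe(duplex=False)
    return source_end, target_end


def export_batches(source_end: Connection, batches: list) -> None:
    """Source side: write every batch, then a None sentinel marking end of transfer."""
    for batch in batches:
        source_end.send(batch)
    source_end.send(None)
    source_end.close()


def import_batches(target_end: Connection) -> list:
    """Target side: read batches until the sentinel arrives."""
    received = []
    while (batch := target_end.recv()) is not None:
        received.append(batch)
    target_end.close()
    return received


def transfer(batches: list) -> list:
    source_end, target_end = create_data_pipeline()
    # A thread stands in for the source database's export process in this sketch.
    exporter = threading.Thread(target=export_batches, args=(source_end, batches))
    exporter.start()
    received = import_batches(target_end)
    exporter.join()
    return received
```

In a real deployment the two endpoints would live in different processes; the send and receive logic would stay the same.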

7. The method of claim 6, wherein the determining the first process communication interface of the source database and the second process communication interface of the target database, respectively, comprises:

when a data export request is received, creating a data export worker thread associated with the data export request, and determining a first process communication interface of the data export worker thread;

querying a data import worker thread from a data import thread directory, the data import thread directory comprising candidate data import worker threads created by the target database based on data import requests; and

determining a second process communication interface of the data import worker thread.

8. The method of claim 6, wherein the method further comprises:

configuring the source database to perform data reading and data transmission through the first process communication interface;

reading the data to be transmitted from the source database through the first process communication interface;

wherein the transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline comprises:

transmitting, through the first process communication interface, the data to be transmitted after the data format conversion to the target database in the data pipeline;

wherein the data to be transmitted after the data format conversion instructs the target database to convert it into a data format supported by the target database before storing it.

9. The method of claim 8, wherein the reading the data to be transmitted from the source database via the first process communication interface comprises:

determining a file name format of the data to be transmitted; and

when the file name format is of the activation type, reading the data to be transmitted from the source database through the first process communication interface.

10. The method according to claim 1, wherein the method further comprises:

determining a character string conversion operation used by the source database during data transmission;

determining a data conversion mode supported by the source database according to the character string conversion operation;

and determining a data format optimization strategy matched with the source database based on the data conversion mode.
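Claim 10 can be read as a two-step lookup, from observed string conversion operations to supported conversion modes, and from modes to strategies. The Python sketch below encodes that reading. The operation names in `OP_TO_MODE` are hypothetical; only the mode-to-strategy pairs (direct text to string optimization, external text to data type optimization) come from claims 1 and 5.

```python
# Hypothetical mapping from observed string conversion operations to conversion modes.
OP_TO_MODE = {
    "to_json": "direct_text",
    "to_csv": "direct_text",
    "external_format": "external_text",
}

# Mode -> strategy pairs taken from claims 1 and 5.
STRATEGY_BY_MODE = {
    "direct_text": "string_optimization",
    "external_text": "data_type_optimization",
}


def supported_modes(string_ops: set[str]) -> set[str]:
    """Infer the conversion modes a source database supports from its string operations."""
    return {OP_TO_MODE[op] for op in string_ops if op in OP_TO_MODE}


def select_strategies(string_ops: set[str]) -> list[str]:
    """Pick the data format optimization strategies matched with those modes."""
    return sorted(STRATEGY_BY_MODE[mode] for mode in supported_modes(string_ops))
```

Unrecognized operations are simply ignored, so a source database with no known text conversion mode gets no strategy.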

11. The method of claim 1, further comprising at least one of:

deleting the data separators in the data to be transmitted of the source database, updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and performing data format conversion on the data to be transmitted from which the data separators have been deleted according to the updated data conversion mode, to obtain the data to be transmitted after the data format conversion; and

performing redundancy elimination on reused attribute information in the data to be transmitted of the source database, updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and performing data format conversion on the redundancy-eliminated data to be transmitted according to the updated data conversion mode, to obtain the data to be transmitted after the data format conversion.
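The two preprocessing options of claim 11 might look like the following Python sketch. It assumes that deleting data separators means turning delimited text lines into value lists (done here with the standard `csv` module so that quoted fields survive) and that the reused attribute information is the set of field names repeated in every record, which is then sent once as a header. Both interpretations and the function names are assumptions.

```python
import csv


def strip_separators(lines: list[str], sep: str = ",") -> list[list[str]]:
    """Replace delimited text lines with value lists, so separators are not transmitted."""
    # csv.reader handles quoted fields that themselves contain the separator.
    return list(csv.reader(lines, delimiter=sep))


def factor_out_attributes(records: list[dict]) -> tuple[list[str], list[list]]:
    """Send each attribute name once as a header instead of once per record."""
    header = list(dict.fromkeys(key for record in records for key in record))
    rows = [[record.get(key) for key in header] for record in records]
    return header, rows
```

Records that lack an attribute get `None` in that position, so every row stays aligned with the shared header.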

12. The method according to claim 1, wherein the transmitting the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline comprises:

when the data to be transmitted after the data format conversion is stored in a row-oriented format, converting it into a column-oriented storage format; and

transmitting the column-format data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.
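A minimal Python sketch of the row-to-column conversion in claim 12, assuming the row-format data is a header plus a list of equal-length rows; `rows_to_columns` is a hypothetical name. Grouping values by column places values of the same type next to each other, which tends to suit compact encodings.

```python
def rows_to_columns(header: list[str], rows: list[list]) -> dict[str, list]:
    """Convert row-format data (one list per record) into column-format data."""
    for i, row in enumerate(rows):
        if len(row) != len(header):
            raise ValueError(f"row {i} has {len(row)} values, expected {len(header)}")
    return {name: [row[i] for row in rows] for i, name in enumerate(header)}
```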

13. The method according to any one of claims 1 to 12, further comprising:

when the source database and the target database are associated with the same operation node, storing sample data from the data to be transmitted in a cache of the operation node;

obtaining, from the data to be transmitted after the data format conversion, pipeline transmission data matching the sample data; and

performing data verification on the sample data in the cache and the pipeline transmission data to obtain a verification result for the data pipeline.
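The verification of claim 13 could be sketched in Python as below, assuming that rows are dictionaries keyed by a unique field and that the transmitted data has already been decoded back into dictionaries before comparison. The sample size, the fixed random seed, and the function names are hypothetical choices for illustration.

```python
import random


def sample_for_verification(rows: list[dict], key: str, k: int, seed: int = 0) -> dict:
    """Pick up to k rows and keep copies in a node-local cache keyed by `key`."""
    rng = random.Random(seed)
    picked = rng.sample(rows, min(k, len(rows)))
    return {row[key]: dict(row) for row in picked}


def verify_pipeline(cache: dict, transmitted: list[dict], key: str) -> dict:
    """Compare cached samples against the matching rows that went through the pipeline."""
    received = {row[key]: row for row in transmitted if row[key] in cache}
    missing = [k for k in cache if k not in received]
    mismatched = [k for k in cache if k in received and received[k] != cache[k]]
    return {
        "checked": len(cache),
        "missing": missing,
        "mismatched": mismatched,
        "ok": not missing and not mismatched,
    }
```

Storing copies in the cache keeps the samples stable even if the source rows are modified later in the same process.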

14. The method according to any one of claims 1 to 12, further comprising:

when the source database and the target database support the same binary data format in the data transmission process, obtaining data to be transmitted from the source database;

converting the data to be transmitted according to the binary data format to obtain binary data to be transmitted;

and transmitting the data to be transmitted in the binary format from the source database to the target database through the data pipeline.
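When both databases share a binary layout, the text conversion step can be skipped. The Python sketch below uses the standard `struct` module with a hypothetical fixed layout (a little-endian 64-bit integer id followed by a 64-bit float amount); a real system would use whatever binary format the two databases actually have in common.

```python
import struct

# Hypothetical shared layout: little-endian int64 id followed by a float64 amount.
RECORD = struct.Struct("<qd")


def encode_binary(rows: list[tuple[int, float]]) -> bytes:
    """Pack rows back to back in the shared binary layout; no text conversion."""
    return b"".join(RECORD.pack(*row) for row in rows)


def decode_binary(payload: bytes) -> list[tuple[int, float]]:
    """Unpack a payload produced by encode_binary; raises struct.error on a partial record."""
    return list(RECORD.iter_unpack(payload))
```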

15. A method of data transmission, the method comprising:

determining a source database from which data is to be transmitted to a target database, and creating a data pipeline between the target database and the source database based on respective process communication interfaces of the target database and the source database;

receiving, through the data pipeline, data to be transmitted after data format conversion sent from the source database, wherein: the data to be transmitted after the data format conversion is obtained from text format data and character string optimization data; the text format data is obtained by performing text format conversion on the data other than the character string data in the data to be transmitted by using the direct text conversion mode in the updated text conversion mode; the character string optimization data is obtained by converting the character string literals of the character string data into character string optimization data of the object array type through the object-array-type character string conversion operation in the updated text conversion mode, and is stored in object array form; the character string data is determined from the data to be transmitted in the source database and belongs to a character string type; the updated text conversion mode is obtained by updating, based on a character string optimization strategy, the specified character string conversion operation in the direct text conversion mode to the object-array-type character string conversion operation; and the character string optimization strategy matches the source database and is determined when the data conversion mode supported by the source database during data transmission includes a direct text conversion mode; and

converting the data to be transmitted after the data format conversion into a target data format supported by the target database, and storing the data in the target data format in the target database.

16. The method of claim 15, wherein the data pipeline is created based on a first process communication interface of the source database and a second process communication interface of the target database, and the method further comprises:

when a data import request is received, creating a data import worker thread associated with the data import request;

recording a second process communication interface of the data import worker thread in a data import thread directory; and

when the number of data import worker threads recorded in the data import thread directory exceeds the number of export threads, placing idle data import worker threads in a waiting state,

wherein the number of export threads is the number of data export worker threads created by the source database based on data export requests.

17. A data transmission apparatus, the apparatus comprising:

a data pipeline creation module, configured to determine a target database to which data is to be transmitted from a source database, and to create a data pipeline between the source database and the target database based on respective process communication interfaces of the source database and the target database;

a format optimization strategy determining module, configured to determine, when the data conversion mode supported by the source database during data transmission includes a direct text conversion mode, that the data format optimization strategy matched with the source database includes a character string optimization strategy;

a data format conversion module, configured to: update, based on the character string optimization strategy, the specified character string conversion operation in the direct text conversion mode to a character string conversion operation of the object array type, to obtain an updated text conversion mode; determine, from the data to be transmitted in the source database, character string data belonging to a character string type; convert the character string literals of the character string data into character string optimization data of the object array type through the object-array-type character string conversion operation in the updated text conversion mode, the character string optimization data being stored in object array form; perform text format conversion on the data other than the character string data in the data to be transmitted by using the direct text conversion mode in the updated text conversion mode, to obtain text format data; and obtain the data to be transmitted after the data format conversion from the text format data and the character string optimization data; and

a data pipeline transmission module, configured to transmit the data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.

18. The apparatus of claim 17, wherein the text format data comprises JSON format data or CSV format data.

19. The apparatus of claim 17, wherein

the data format conversion module is further configured to construct a text conversion data flow diagram according to the direct text conversion mode, and to determine the specified character string conversion operation from the text conversion data flow diagram.

20. The apparatus of claim 17, wherein the character string data comprises at least one of a string literal, a constructor, or numerical data.

21. The apparatus of claim 17, wherein

the data format conversion module is further configured to: determine, when the data conversion mode supported by the source database during data transmission includes an external text conversion mode, that the data format optimization strategy matched with the source database includes a data type optimization strategy; update the character string conversion operation of the external text conversion mode based on the data type optimization strategy; generate binary state data for the data to be transmitted in the source database; and perform text format conversion on the binary state data through the updated character string conversion operation to obtain the data to be transmitted after the data format conversion, which belongs to the object array type.

22. The apparatus of claim 17, wherein

the data pipeline creation module is further configured to determine a first process communication interface of the source database and a second process communication interface of the target database, respectively, and to establish a communication connection between the first process communication interface and the second process communication interface to obtain the data pipeline between the source database and the target database.

23. The apparatus of claim 22, wherein

the data pipeline creation module is further configured to: create, when a data export request is received, a data export worker thread associated with the data export request, and determine a first process communication interface of the data export worker thread; query a data import worker thread from a data import thread directory, the data import thread directory comprising candidate data import worker threads created by the target database based on data import requests; and determine a second process communication interface of the data import worker thread.

24. The apparatus as recited in claim 22, further comprising:

a database configuration module, configured to configure the source database to perform data reading and data transmission through the first process communication interface, and to read the data to be transmitted from the source database through the first process communication interface;

wherein the data pipeline creation module is further configured to transmit, through the first process communication interface, the data to be transmitted after the data format conversion to the target database in the data pipeline, the data to be transmitted after the data format conversion instructing the target database to convert it into a data format supported by the target database before storing it.

25. The apparatus of claim 24, wherein

the database configuration module is further configured to determine a file name format of the data to be transmitted, and to read the data to be transmitted from the source database through the first process communication interface when the file name format is of the activation type.

26. The apparatus of claim 17, wherein

the format optimization strategy determining module is further configured to: determine a character string conversion operation used by the source database during data transmission; determine a data conversion mode supported by the source database according to the character string conversion operation; and determine a data format optimization strategy matched with the source database based on the data conversion mode.

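27. The apparatus of claim 17, wherein the data format conversion module is further configured to perform at least one of:

deleting the data separators in the data to be transmitted of the source database, updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and performing data format conversion on the data to be transmitted from which the data separators have been deleted according to the updated data conversion mode, to obtain the data to be transmitted after the data format conversion; and

performing redundancy elimination on reused attribute information in the data to be transmitted of the source database, updating the character string conversion operation in the data conversion mode based on the data format optimization strategy, and performing data format conversion on the redundancy-eliminated data to be transmitted according to the updated data conversion mode, to obtain the data to be transmitted after the data format conversion.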

28. The apparatus of claim 17, wherein

the data pipeline transmission module is further configured to: convert, when the data to be transmitted after the data format conversion is stored in a row-oriented format, that data into a column-oriented storage format; and transmit the column-format data to be transmitted after the data format conversion from the source database to the target database through the data pipeline.

29. The apparatus according to any one of claims 17 to 28, further comprising:

a verification module, configured to: store, when the source database and the target database are associated with the same operation node, sample data from the data to be transmitted in a cache of the operation node; obtain, from the data to be transmitted after the data format conversion, pipeline transmission data matching the sample data; and perform data verification on the sample data in the cache and the pipeline transmission data to obtain a verification result for the data pipeline.

30. The apparatus according to any one of claims 17 to 28, further comprising:

a binary transmission module, configured to: obtain data to be transmitted from the source database when the source database and the target database support the same binary data format during data transmission; convert the data to be transmitted according to the binary data format to obtain binary data to be transmitted; and transmit the binary data to be transmitted from the source database to the target database through the data pipeline.

31. A data transmission apparatus, the apparatus comprising:

a data pipeline creation module, configured to determine a source database from which data is to be transmitted to a target database, and to create a data pipeline between the target database and the source database based on respective process communication interfaces of the target database and the source database;

a data pipeline transmission module, configured to receive, through the data pipeline, data to be transmitted after data format conversion sent from the source database, wherein: the data to be transmitted after the data format conversion is obtained from text format data and character string optimization data; the text format data is obtained by performing text format conversion on the data other than the character string data in the data to be transmitted by using the direct text conversion mode in the updated text conversion mode; the character string optimization data is obtained by converting the character string literals of the character string data into character string optimization data of the object array type through the object-array-type character string conversion operation in the updated text conversion mode, and is stored in object array form; the character string data is determined from the data to be transmitted in the source database and belongs to a character string type; the updated text conversion mode is obtained by updating, based on a character string optimization strategy, the specified character string conversion operation in the direct text conversion mode to the object-array-type character string conversion operation; and the character string optimization strategy matches the source database and is determined when the data conversion mode supported by the source database during data transmission includes a direct text conversion mode; and

a data storage module, configured to convert the data to be transmitted after the data format conversion into a target data format supported by the target database, and to store the data in the target data format in the target database.

32. The apparatus of claim 31, wherein the data pipeline is created based on a first process communication interface of the source database and a second process communication interface of the target database, the apparatus further comprising:

a thread configuration module, configured to: create, when a data import request is received, a data import worker thread associated with the data import request; record a second process communication interface of the data import worker thread in a data import thread directory; and place idle data import worker threads in a waiting state when the number of data import worker threads recorded in the data import thread directory exceeds the number of export threads, the number of export threads being the number of data export worker threads created by the source database based on data export requests.

33. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 16 when the computer program is executed.

34. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 16.

35. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 16.