WO2021121041A1 - Data transmission optimization method, device, and readable storage medium - Google Patents

Data transmission optimization method, device, and readable storage medium

Info

Publication number
WO2021121041A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
transmitted
data transmission
receiving end
columnar
Prior art date
Application number
PCT/CN2020/133428
Other languages
English (en)
French (fr)
Inventor
黄启军
黄铭毅
李诗琦
刘玉德
陈天健
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021121041A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9057 Arrangements for supporting packet reassembly or resequencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/166 IP fragmentation; TCP segmentation

Definitions

  • This application relates to the field of big data technology of financial technology (Fintech), and in particular to a data transmission optimization method, device, and readable storage medium.
  • the main purpose of this application is to provide a data transmission optimization method, device, and readable storage medium, aiming to solve the technical problem of low efficiency of column data transmission in the prior art.
  • an embodiment of the present application provides a data transmission optimization method.
  • the data transmission optimization method is applied to a data sending end, and the data transmission optimization method includes:
  • the metadata is transmitted to the data receiving end associated with the data transmission task, so as to transmit the memory address information of the sending end corresponding to the columnar data to be transmitted to the data receiving end, and the negotiation information fed back by the data receiving end based on the metadata is received;
  • the columnar data to be transmitted is transmitted to the data receiving end.
  • this application also provides a data transmission optimization method, which is applied to a data receiving end, and the data transmission optimization method includes:
  • the memory address information of the sending end is updated.
  • the data transmission optimization device is a physical device.
  • the data transmission optimization device includes a memory, a processor, and a program of the data transmission optimization method that is stored in the memory and can run on the processor.
  • when the program of the data transmission optimization method is executed by the processor, the steps of the above-mentioned data transmission optimization method can be realized.
  • the present application also provides a readable storage medium, the readable storage medium stores a program for implementing the data transmission optimization method, and when the program of the data transmission optimization method is executed by a processor, the steps of the data transmission optimization method described above are implemented.
  • This application provides an efficient columnar data transmission method.
  • by constructing the metadata and transmitting it to the data receiving end, the columnar data to be transmitted and the memory address information of the sending end corresponding to that columnar data are transmitted directly to the data receiving end, and the data receiving end can then directly obtain the columnar data to be transmitted based on the memory address information of the sending end.
  • this saves the serialization and deserialization of the columnar data to be transmitted during data transmission; that is, the calculation process during columnar data transmission is reduced, the transmission efficiency of data transmission is improved, and the technical problem of low efficiency of columnar data transmission in the prior art is solved.
  • FIG. 1 is a schematic flowchart of a first embodiment of a data transmission optimization method according to this application;
  • FIG. 2 is a schematic diagram of the data transmission process when the preset transmission protocol is RDMA in the first embodiment of the data transmission optimization method of this application;
  • FIG. 3 is a schematic flowchart of a second embodiment of a data transmission optimization method according to this application.
  • FIG. 4 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the application.
  • the embodiment of the present application provides a data transmission optimization method, which is applied to a data sending end.
  • the data transmission optimization method includes:
  • Step S10 When a data transmission task is received, determine the columnar data to be transmitted corresponding to the data transmission task and the metadata corresponding to the columnar data to be transmitted;
  • the format information can be information used to describe the organization of the data to be cached in a column format, for example, how many columns are included, the index of each column, the nesting information of columns and sub-columns, the number of cells in each column, and each column
  • the data transmission task includes the first data transmission task corresponding to the distributed synchronous computing task
  • the first data transmission task is associated with a distributed synchronous computing task, where the distributed synchronous computing task refers to a distributed computing task in which multiple processes run at the same time step to complete different computing subtasks, and each process at this time is only used to perform the distributed synchronous computing task.
  • if the data transmission task is the first data transmission task, a preset multi-process polling mode is activated to match a different transmission task calculation process to each memory block, wherein each transmission task calculation process is responsible for the transmission status check and event processing of its corresponding memory block, and other computing processes are in a paused or low-load computing state, so that the CPU (central processing unit), memory, transmission equipment, etc. reach the maximum utilization rate.
  • the data transmission task includes a second data transmission task corresponding to the distributed asynchronous computing task
  • before the step of transmitting the metadata to the data receiving end associated with the data transmission task, the method includes:
  • the second data transmission task is associated with the distributed asynchronous computing task, where the distributed asynchronous computing task refers to a distributed computing task that can be performed by at least one process, and each process has multiple control rights; that is, each process can perform other computing tasks when it is idle.
  • if the data transmission task is the second data transmission task, the preset interrupt mode is activated.
  • the interrupt mode dynamically matches a transmission task calculation process to each of the memory blocks; that is, when the second data transmission task is performed, at least one transmission task calculation process is used for the second data transmission task.
  • the transmission task calculation process is responsible for the transmission status check and event processing of the memory block. When there is a transmission status change or a new event, the corresponding transmission task calculation process is awakened to perform the second data transmission task; when there is no transmission status change or new event, the corresponding transmission task calculation process is in a sleep state, so as to give up computing resources to other computing processes.
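  • The difference between the two modes can be sketched as follows. This is a minimal illustration only, assuming Python threads stand in for the transmission task calculation processes; `MemoryBlock`, `poll_block`, `wait_block`, and `run_demo` are hypothetical names that do not appear in the application.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class MemoryBlock:
    """Hypothetical stand-in for one memory block awaiting transmission."""
    block_id: int
    done: bool = False
    # Event used only by the interrupt-style worker.
    changed: threading.Event = field(default_factory=threading.Event)


def poll_block(block, handled):
    """Polling mode: a dedicated worker repeatedly checks its own block."""
    while not block.done:
        time.sleep(0.001)  # repeated check of the transmission status
    handled.append(("poll", block.block_id))


def wait_block(block, handled):
    """Interrupt mode: the worker sleeps until a status change wakes it."""
    block.changed.wait()  # gives up the CPU to other work while idle
    handled.append(("interrupt", block.block_id))


def run_demo():
    handled = []
    polled = [MemoryBlock(i) for i in range(2)]
    waited = [MemoryBlock(i) for i in range(2, 4)]
    workers = [threading.Thread(target=poll_block, args=(b, handled)) for b in polled]
    workers += [threading.Thread(target=wait_block, args=(b, handled)) for b in waited]
    for w in workers:
        w.start()
    for b in polled:
        b.done = True      # status change seen by the polling worker
    for b in waited:
        b.changed.set()    # event that wakes the sleeping worker
    for w in workers:
        w.join()
    return sorted(handled)
```

  • In this sketch the polling workers keep checking even when nothing has changed, which suits a synchronous task that has nothing else to run, while the event-driven workers consume no CPU until they are woken.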
  • Step S20 Transmit the metadata to the data receiving end associated with the data transmission task, so as to transmit the memory address information of the sending end corresponding to the columnar data to be transmitted to the data receiving end, and receive the negotiation information fed back by the data receiving end based on the metadata;
  • specifically, the metadata is transmitted to the data receiving end associated with the data transmission task, so as to transmit the memory address information of the sending end corresponding to the columnar data to be transmitted to the data receiving end, wherein the memory address information of the sending end can be stored in the metadata so as to be transmitted directly to the data receiving end along with the metadata.
  • the data receiving end determines the single transmission data amount based on the memory length information in the metadata and the maximum amount of data that it can receive in one transmission.
  • for example, if the maximum amount of data that the data receiving end can receive in each transmission is 2 bytes, then the single transmission data amount should be less than or equal to 2 bytes; the single transmission data amount fed back by the data receiving end is then received.
  • the step of transmitting the metadata to the data receiving end associated with the data transmission task, so as to transmit the memory address information of the sending end corresponding to the columnar data to be transmitted to the data receiving end, and receiving the negotiation information fed back by the data receiving end based on the metadata includes:
  • Step S21 Transmit the memory length information and the memory address information of the sending end to the data receiving end, so that the data receiving end can determine the single transmission data amount corresponding to the memory length information, and update the memory address information of the receiving end of the columnar data to be transmitted based on the memory address information of the sending end;
  • specifically, the memory length information is transmitted to the data receiving end so that the data receiving end can determine the single transmission data amount based on the memory length information and the preset maximum data reception quantity, wherein the preset maximum data reception quantity is the maximum amount of data that the data receiving end can receive during one data transmission process.
  • Step S22 Receive the single transmission data amount fed back by the data receiving end.
  • the single transmission data amount is used to split the columnar data to be transmitted to obtain a single or multiple virtual columnar storage batches, wherein the data size corresponding to each virtual columnar storage batch should be less than or equal to the single transmission data amount.
  • Step S30 Based on the negotiation information, transmit the columnar data to be transmitted to the data receiving end.
  • specifically, the columnar data to be transmitted is split based on the single transmission data amount in the negotiation information to obtain multiple columnar data segments to be transmitted, and the memory blocks corresponding to each columnar data segment to be transmitted are transmitted to the data receiving end one by one.
  • the negotiation information includes a single transmission data amount
  • the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information includes:
  • Step S31 splitting the columnar data to be transferred based on the single transmission data volume to obtain multiple virtual columnar storage batches
  • the virtual columnar storage batch includes a portion of the columnar data to be transmitted with a preset data amount, wherein the preset data amount is less than or equal to the single transmission data amount.
  • specifically, based on the single transmission data amount, it is determined whether to split the columnar data to be transmitted. If the single transmission data amount is less than the total data amount of the columnar data to be transmitted, it is determined to split the columnar data to be transmitted, and the columnar data to be transmitted is split into a plurality of the virtual columnar storage batches. If the single transmission data amount is greater than or equal to the total data amount of the columnar data to be transmitted, it is determined not to split the columnar data to be transmitted, and the columnar data to be transmitted is used as a single virtual columnar storage batch so that it is transmitted directly to the data receiving end.
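  • The split decision above can be sketched as follows, assuming the columnar data is available as one contiguous byte buffer. `split_into_batches` is a hypothetical helper; `memoryview` slices are used so that the "virtual" batches reference the original buffer rather than copying it.

```python
def split_into_batches(data: bytes, single_amount: int) -> list:
    """Split a buffer into virtual batches of at most single_amount bytes.

    Each batch is a memoryview slice, so no payload bytes are copied.
    """
    if single_amount <= 0:
        raise ValueError("single_amount must be positive")
    view = memoryview(data)
    if single_amount >= len(view):
        # No split needed: the whole buffer is a single virtual batch.
        return [view]
    return [view[i:i + single_amount] for i in range(0, len(view), single_amount)]
```

  • For example, a 7-byte buffer with a single transmission data amount of 3 bytes yields three batches of 3, 3, and 1 bytes, while a 2-byte buffer with an amount of 5 bytes stays a single batch.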
  • Step S32 query the to-be-transmitted memory block corresponding to each virtual columnar storage batch
  • specifically, based on the memory address information of the sending end in the metadata, the virtual memory corresponding to each virtual columnar storage batch is queried in a preset local storage database, and the to-be-transmitted memory block is then obtained based on the mapping relationship between the virtual memory and the memory block, wherein the preset local storage database includes shared memory.
  • Step S33 Transmit each of the memory blocks to be transmitted to the data receiving end based on a preset transmission protocol.
  • the preset transmission protocol includes protocols such as RDMA or ordinary TCP (Transmission Control Protocol).
  • specifically, based on the data transmission rules of the preset transmission protocol, each of the memory blocks to be transmitted is transmitted to the data receiving end. After the data transmission is completed, the lock on each memory block is released, that is, the transmission lock is released. As shown in FIG.
  • the RDMA is the data transmission device corresponding to the RDMA
  • the control channel is the channel through which control data is transmitted, and is used to transmit the metadata
  • the mem block 1 and the mem block1 is a transmission lock
  • the data receiving end stores each of the memory blocks to be transmitted and obtains the receiving-end memory address information of each memory block to be transmitted; the data receiving end then updates the metadata based on the receiving-end memory address information, that is, it updates the sending-end memory address information in the metadata to the receiving-end memory address information.
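  • The sending-end loop with a per-block transmission lock can be sketched as below. This is an illustration under stated assumptions: `LockedBlock` and `send_blocks` are hypothetical names, and `send` is any callable standing in for the RDMA or TCP transport, which is not modelled here.

```python
import threading


class LockedBlock:
    """Hypothetical memory block guarded by a transmission lock."""

    def __init__(self, payload: bytes):
        self.payload = payload
        self.lock = threading.Lock()


def send_blocks(blocks, send):
    """Send each block in order while holding its transmission lock.

    `send` is a stand-in for the real transport (an assumption of this sketch).
    """
    for block in blocks:
        with block.lock:         # block cannot be modified while in flight
            send(block.payload)
        # the transmission lock is released once the block has been sent
```

  • Passing `received.append` as `send` for a list `received` is enough to observe that blocks arrive in order and that every lock is free again afterwards.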
  • the negotiation information includes a single transmission data amount
  • the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information includes:
  • Step C10 based on the single transmission data volume, taking the columnar data to be transferred as a single virtual columnar storage batch;
  • specifically, based on the single transmission data amount, it is determined whether to split the columnar data to be transmitted. If the single transmission data amount is less than the total data amount of the columnar data to be transmitted, it is determined to split the columnar data to be transmitted, and the columnar data to be transmitted is split into a plurality of the virtual columnar storage batches.
  • if the single transmission data amount is greater than or equal to the total data amount of the columnar data to be transmitted, it is determined not to split the columnar data to be transmitted, and the columnar data to be transmitted is regarded as a single virtual columnar storage batch so that it is transmitted directly to the data receiving end.
  • Step C20 query the memory block to be transferred corresponding to the single virtual columnar storage batch
  • specifically, based on the memory address information of the sending end in the metadata, the virtual memory corresponding to the virtual columnar storage batch is queried in a preset local storage database, and the to-be-transmitted memory block is then obtained based on the mapping relationship between the virtual memory and the memory block, wherein the preset local storage database includes shared memory.
  • Step C30 Based on a preset transmission protocol, the memory block to be transmitted is transmitted to the data receiving end.
  • the preset transmission protocol includes protocols such as RDMA or ordinary TCP (Transmission Control Protocol).
  • specifically, based on the data transmission rules of the preset transmission protocol, each of the memory blocks to be transmitted is transmitted to the data receiving end.
  • after the data transmission is completed, the lock on each of the memory blocks is released, that is, the transmission lock is released. After the data receiving end receives each of the memory blocks to be transmitted, it stores them and obtains the receiving-end memory address information of each memory block to be transmitted. The data receiving end then updates the metadata based on the receiving-end memory address information, that is, the sending-end memory address information in the metadata is updated to the receiving-end memory address information.
  • the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information includes:
  • the negotiation information includes a single transmission data amount.
  • the columnar data to be transmitted is transmitted to the data receiving end. Specifically, based on the single transmission data amount, the columnar data to be transmitted is split into a single or multiple virtual columnar storage batches, and the to-be-transmitted memory block corresponding to each virtual columnar storage batch is determined. Then, based on the preset multi-process polling mode, each of the to-be-transmitted memory blocks is matched with a different transmission task calculation process, so that each to-be-transmitted memory block is sent to the data receiving end by executing the corresponding transmission task calculation process.
  • after receiving the to-be-transmitted memory blocks, the data receiving end stores each of them and obtains the corresponding receiving-end memory address information, and then updates the sending-end memory address information based on the receiving-end memory address information to obtain new metadata.
  • when the distributed synchronous computing task requires the columnar data to be transmitted, the data receiving end can directly perform operations such as data extraction or data transmission on the columnar data to be transmitted based on the new metadata.
  • the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information includes:
  • Step E10 based on the preset interrupt mode and the negotiation information, transmit the column data to be transmitted to the data receiving end.
  • specifically, based on the single transmission data amount, the columnar data to be transmitted is split into a single or multiple virtual columnar storage batches, and the to-be-transmitted memory block corresponding to each virtual columnar storage batch is determined. Then, based on the preset interrupt mode, a transmission task calculation process is dynamically matched to each of the to-be-transmitted memory blocks, so that each to-be-transmitted memory block is sent to the data receiving end by executing the corresponding transmission task calculation process. After receiving the to-be-transmitted memory blocks, the data receiving end stores each of them and obtains the corresponding receiving-end memory address information, and then updates the sending-end memory address information based on the receiving-end memory address information to obtain new metadata. When the distributed synchronous computing task requires the columnar data to be transmitted, the data receiving end can directly perform operations such as data extraction or data transmission on the columnar data to be transmitted based on the new metadata.
  • in this embodiment, when a data transmission task is received, the columnar data to be transmitted corresponding to the data transmission task and the metadata corresponding to the columnar data to be transmitted are determined. The metadata is then transmitted to the data receiving end associated with the data transmission task, so as to transmit the memory address information of the sending end corresponding to the columnar data to be transmitted to the data receiving end, and the negotiation information fed back by the data receiving end based on the metadata is received. Based on the negotiation information, the columnar data to be transmitted is transmitted to the data receiving end.
  • that is, this embodiment provides an efficient columnar data transmission method. By constructing the metadata and transmitting the metadata to the data receiving end, the columnar data to be transmitted and the corresponding memory address information of the sending end are transmitted directly to the data receiving end. This saves the serialization and deserialization of the columnar data to be transmitted during data transmission; that is, the calculation process during columnar data transmission is reduced, and the transmission efficiency of data transmission is improved. Therefore, the technical problem of low efficiency of columnar data transmission in the prior art is solved.
  • the data transmission optimization method is applied to the data receiving end, and the data transmission optimization method includes:
  • Step C10 receiving the memory length information corresponding to the metadata sent by the data sending end associated with the data receiving end and the memory address information of the sending end, and feeding back the negotiation information corresponding to the memory length information to the data sending end;
  • the sender memory address information may be stored in the metadata, that is, the metadata includes sender memory address information and memory length information.
  • Step C20 Receive the columnar data to be transmitted sent by the data sending end based on the negotiation information, and store the columnar data to be transmitted in a preset storage database to obtain the receiving-end memory address information of the columnar data to be transmitted;
  • the columnar data to be transmitted corresponds to one or more data memory blocks, that is, the columnar data to be transmitted is stored and transmitted in the form of a memory block.
  • each of the to-be-transmitted memory blocks corresponding to the columnar data to be transmitted sent by the data sending end is received, wherein the to-be-transmitted memory blocks are obtained by the data sending end splitting the columnar data to be transmitted based on the negotiation information. Each of the to-be-transmitted memory blocks is then stored in the preset local database of the receiving end, and the receiving-end memory address information of each to-be-transmitted memory block in the preset local database of the receiving end is obtained, wherein the preset local database of the receiving end includes the receiving-end shared memory.
  • Step C30 Based on the memory address information of the receiving end, update the memory address information of the sending end.
  • specifically, the memory address information of the sending end in the metadata is updated to the memory address information of the receiving end. When the columnar data to be transmitted needs to be extracted for calculation tasks or data transmission tasks, it can then be extracted directly based on the memory address information of the receiving end; that is, the columnar data to be transmitted can be extracted directly without being serialized and deserialized.
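  • The receiving-end steps (store the blocks, obtain the local address, overwrite the sending-end address in the metadata) can be sketched as follows. This is a simplified model under stated assumptions: a single `bytearray` stands in for the receiving-end shared memory, an "address" is an offset into it, blocks are stored contiguously, and `Metadata`, `ReceiverStore`, and `receive` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Metadata:
    address: int  # sending-end address on arrival, receiving-end offset after update
    length: int   # total length of the columnar data in bytes


class ReceiverStore:
    """Hypothetical receiving-end store backed by one bytearray."""

    def __init__(self, capacity: int):
        self.buffer = bytearray(capacity)
        self.next_free = 0

    def store(self, block: bytes) -> int:
        """Copy one block into the store and return the offset it was placed at."""
        start, end = self.next_free, self.next_free + len(block)
        if end > len(self.buffer):
            raise MemoryError("store is full")
        self.buffer[start:end] = block
        self.next_free = end
        return start


def receive(meta: Metadata, blocks, store: ReceiverStore) -> memoryview:
    """Store all blocks, update the metadata address, and return a direct view."""
    if not blocks:
        raise ValueError("at least one block is required")
    offsets = [store.store(b) for b in blocks]
    meta.address = offsets[0]  # sending-end address replaced by the local one
    return memoryview(store.buffer)[meta.address:meta.address + meta.length]
```

  • The returned `memoryview` reads the stored bytes in place, which mirrors the point of the method: once the metadata holds the receiving-end address, later consumers can reach the columnar data without a serialization or deserialization pass.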
  • the preset storage database includes shared memory
  • step of updating the memory address information of the sending end based on the memory address information of the receiving end includes:
  • the preset non-virtual machine language types include C/C++ language types, etc.
  • the shared memory is the shared memory of the data receiving end, and it is shared by multiple processors; that is, there are multiple processors that can access the shared memory.
  • Step C50 If the columnar data to be transferred is the preset non-virtual machine language type, directly access the shared memory to extract the columnar data to be transferred;
  • specifically, if the columnar data to be transmitted is of the preset non-virtual machine language type, the shared memory can be accessed directly, and the shared memory is then accessed directly through the preset columnar data read-write interface to extract the columnar data to be transmitted. This avoids copying the data, and the columnar data to be transmitted can directly participate in the calculation task.
  • Step C60 If the columnar data to be transferred is not the preset non-virtual machine language type, indirectly access the shared memory to extract the columnar data to be transferred through a preset data interface;
  • specifically, if the columnar data to be transmitted is not of the preset non-virtual machine language type, the columnar data to be transmitted cannot be retrieved by directly accessing the shared memory, and the shared memory is instead accessed indirectly through JNI (Java Native Interface) or a C extension to extract the columnar data to be transmitted.
  • this embodiment first obtains the memory address information and memory length information of the sending end by receiving the metadata, and feeds back negotiation information to the data sending end based on the memory length information, so as to receive the columnar data to be transmitted. It then stores the columnar data to be transmitted to obtain the memory address information of the receiving end, and updates the memory address information of the sending end based on the memory address information of the receiving end. That is, this embodiment provides a method for directly obtaining the columnar data to be transmitted and the corresponding storage address information during data transmission, realizes the direct transmission of the columnar data to be transmitted, and avoids serialization and deserialization of the corresponding data. The calculation process during data transmission is reduced and the data transmission efficiency is improved. Therefore, the technical problem of low efficiency of columnar data transmission in the prior art is solved.
  • FIG. 4 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.
  • the data transmission optimization device may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between the processor 1001 and the memory 1005.
  • the memory 1005 may be a high-speed RAM memory, or a stable memory (non-volatile memory), such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • the data transmission optimization device may also include a rectangular user interface, a network interface, a camera, RF (Radio Frequency) circuits, sensors, audio circuits, WiFi modules, etc.
  • the rectangular user interface may include a display screen (Display) and an input sub-module such as a keyboard (Keyboard); optionally, the rectangular user interface may also include a standard wired interface and a wireless interface.
  • optionally, the network interface may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the structure of the data transmission optimization device shown in FIG. 4 does not constitute a limitation on the data transmission optimization device, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
  • the processor 1001 is configured to execute the data transmission optimization program stored in the memory 1005 to implement the steps of the data transmission optimization method described in any one of the above.
  • the embodiments of the present application provide a readable storage medium that stores one or more programs, and the one or more programs may be executed by one or more processors to implement the steps of the data transmission optimization method described in any one of the above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

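This application discloses a data transmission optimization method, a device, and a readable storage medium. The data transmission optimization method includes: when a data transmission task is received, determining the columnar data to be transmitted that corresponds to the data transmission task and the metadata corresponding to the columnar data to be transmitted; transmitting the metadata to the data receiving end associated with the data transmission task, so as to transmit the sender memory address information corresponding to the columnar data to be transmitted to the data receiving end, and receiving negotiation information fed back by the data receiving end based on the metadata; and transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information.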

Description

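Data transmission optimization method, device, and readable storage medium
This application claims priority to Chinese patent application No. 201911325502.0, filed on December 20, 2019 and entitled "Data transmission optimization method, device, and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of big data technology in financial technology (Fintech), and in particular to a data transmission optimization method, a device, and a readable storage medium.
Background
With the continuous development of financial technology, especially Internet-based financial technology, more and more technologies (such as distributed computing, blockchain, and artificial intelligence) are being applied in the financial field. The financial industry in turn places higher demands on these technologies, for example on the distribution of its to-do items.
With the continuous development of Internet technology, more and more computing tasks are distributed rather than confined to a single machine, and data transmission is an indispensable part of any distributed computing task. At present, cross-machine data transmission usually requires converting the data into machine code, that is, serializing and deserializing it, which wastes a large amount of computing resources and lowers transmission efficiency. Columnar data supports rich data structures and can participate directly in some computing tasks. However, existing columnar data cannot make use of efficient transmission technologies such as RDMA (Remote Direct Memory Access) and DPDK (Data Plane Development Kit), so its transmission efficiency is very low. The prior art therefore has the technical problem of inefficient columnar data transmission.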
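Technical Solution
The main purpose of this application is to provide a data transmission optimization method, a device, and a readable storage medium, aiming to solve the technical problem of inefficient columnar data transmission in the prior art.
To achieve the above purpose, an embodiment of this application provides a data transmission optimization method applied to a data sending end. The method includes:
when a data transmission task is received, determining the columnar data to be transmitted that corresponds to the data transmission task and the metadata corresponding to the columnar data to be transmitted;
transmitting the metadata to the data receiving end associated with the data transmission task, so as to transmit the sender memory address information corresponding to the columnar data to be transmitted to the data receiving end, and receiving negotiation information fed back by the data receiving end based on the metadata;
transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information.
To achieve the above purpose, this application also provides a data transmission optimization method applied to a data receiving end. The method includes:
receiving the memory length information and the sender memory address information corresponding to the metadata sent by the data sending end associated with the data receiving end, and feeding back to the data sending end the negotiation information corresponding to the memory length information;
receiving the columnar data to be transmitted that the data sending end sends based on the negotiation information, and storing it in a preset storage database to obtain the receiver memory address information of the columnar data to be transmitted;
updating the sender memory address information based on the receiver memory address information.
This application also provides a data transmission optimization device, which is a physical device and includes a memory, a processor, and a program of the data transmission optimization method that is stored in the memory and executable on the processor. When executed by the processor, the program implements the steps of the data transmission optimization method described above.
This application also provides a readable storage medium storing a program that implements the data transmission optimization method. When executed by a processor, the program implements the steps of the data transmission optimization method described above.
This application provides an efficient columnar data transmission method. By constructing the metadata and transmitting it to the data receiving end, the columnar data to be transmitted and its sender memory address information are transmitted directly to the data receiving end, which can then obtain the columnar data directly based on the sender memory address information. This saves the serialization and deserialization of the columnar data during transmission, that is, it reduces the computation involved in columnar data transmission, improves transmission efficiency, and solves the technical problem of inefficient columnar data transmission in the prior art.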
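Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the first embodiment of the data transmission optimization method of this application;
FIG. 2 is a schematic diagram of the data transmission process when the preset transmission protocol is RDMA in the first embodiment of the data transmission optimization method of this application;
FIG. 3 is a schematic flowchart of the second embodiment of the data transmission optimization method of this application;
FIG. 4 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application.
Embodiments of the Invention
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.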
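An embodiment of this application provides a data transmission optimization method applied to a data sending end. In the first embodiment of the data transmission optimization method of this application, referring to FIG. 1, the method includes:
Step S10: when a data transmission task is received, determine the columnar data to be transmitted that corresponds to the data transmission task and the metadata corresponding to the columnar data to be transmitted;
In this embodiment, it should be noted that the metadata includes the sender memory address information and the memory length information of the columnar data to be transmitted, and the data transmission task is associated with the columnar data to be transmitted. The sender memory address information is the address information of each memory block corresponding to the columnar data to be transmitted, the columnar data being stored in these memory blocks. The memory length information includes the amount of data in each memory block; the larger the amount of data, the longer the memory length.
Specifically, when a data transmission task is received, it is determined whether the required data corresponding to the task is of the columnar data type. If it is, the columnar data to be transmitted and its metadata, that is, the required data, are extracted from the sender's local database. If it is not, columnar format information is generated for the required data. The columnar format information describes how the data to be cached is organized in columnar format, for example the number of columns, the index of each column, the nesting of columns and sub-columns, the number of cells in each column, the data type of each column, the number of storage batches, and which columns each storage batch contains.
Based on the columnar format information, the required data is written into one or more memory blocks, and the memory blocks are added to a contiguous virtual memory, so that the required data is stored as the columnar data type. This yields the columnar data to be transmitted, for which metadata is then generated. The metadata includes the mapping between each memory block and the virtual memory and the correspondence between the columnar format information and the virtual memory. Once the columnar data to be transmitted and its metadata have been determined, the memory blocks holding the columnar data are locked so that no other data can be written into them.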
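The metadata described in step S10 can be pictured with a small data structure. The following is only a sketch under stated assumptions: the class names `MemoryBlock` and `ColumnarMetadata`, the integer addresses, and the `columns` layout are hypothetical and are not defined by the application. It shows how per-block addresses and lengths can be mapped onto one contiguous virtual memory and how the blocks can be locked before transfer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MemoryBlock:
    address: int          # sender-side address of the block (illustrative integer)
    length: int           # number of bytes of columnar data held in the block
    locked: bool = False  # set once the block is reserved for transfer


@dataclass
class ColumnarMetadata:
    blocks: List[MemoryBlock]
    # Hypothetical columnar format info: column name -> (virtual offset, byte length).
    columns: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def total_length(self) -> int:
        """Memory length of the whole data set: the sum of all block lengths."""
        return sum(block.length for block in self.blocks)

    def locate(self, virtual_offset: int) -> Tuple[int, int]:
        """Map an offset in the contiguous virtual memory to (block index, offset in block)."""
        if virtual_offset < 0:
            raise IndexError("negative virtual offset")
        remaining = virtual_offset
        for index, block in enumerate(self.blocks):
            if remaining < block.length:
                return index, remaining
            remaining -= block.length
        raise IndexError("offset lies beyond the mapped virtual memory")

    def lock_blocks(self) -> None:
        """Lock every block so no other data is written before the transfer."""
        for block in self.blocks:
            block.locked = True
```

With two blocks of 4 and 6 bytes, virtual offset 5 falls in the second block at offset 1, which is the kind of lookup the block-to-virtual-memory mapping is meant to support.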
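In addition, it should be noted that the data transmission task can be bound to a preset first callback function and a preset second callback function. The preset first callback function is used, before data transmission, to compute statistics such as the first data length, the transmission duration, and data features, where the first data length is the length of the columnar data to be transmitted that has not yet been transmitted to the data receiving end, and the transmission duration is the cumulative time needed to complete this data transmission task. The preset second callback function is used, after data transmission, to merge the data into another area, to discard the current data, or to compute statistics such as the second data length, the transmission duration, and data features, where the second data length is the length of the columnar data to be transmitted that has already been transmitted to the data receiving end, and the current data includes the columnar data to be transmitted.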
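A minimal sketch of the two callbacks follows, assuming a hypothetical `TransferTask` class and a caller-supplied `send` function; none of these names come from the application. The first callback reports the bytes not yet sent before the transfer starts, and the second reports the bytes already sent after it finishes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Stats = Dict[str, int]


@dataclass
class TransferTask:
    chunks: List[bytes]
    before: Optional[Callable[["TransferTask"], Stats]] = None  # first callback
    after: Optional[Callable[["TransferTask"], Stats]] = None   # second callback
    sent_bytes: int = 0
    reports: List[Stats] = field(default_factory=list)

    def pending_bytes(self) -> int:
        """Length of the data that has not yet reached the receiver."""
        return sum(len(chunk) for chunk in self.chunks) - self.sent_bytes

    def run(self, send: Callable[[bytes], None]) -> None:
        """Invoke the first callback, send every chunk, then invoke the second callback."""
        if self.before is not None:
            self.reports.append(self.before(self))
        for chunk in self.chunks:
            send(chunk)
            self.sent_bytes += len(chunk)
        if self.after is not None:
            self.reports.append(self.after(self))


def report_pending(task: TransferTask) -> Stats:
    # Stand-in for the "first data length" statistic.
    return {"pending_bytes": task.pending_bytes()}


def report_sent(task: TransferTask) -> Stats:
    # Stand-in for the "second data length" statistic.
    return {"sent_bytes": task.sent_bytes}
```

Merging or discarding the data after transfer, which the second callback may also do, would fit the same hook; it is left out here to keep the sketch short.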
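The data transmission task includes a first data transmission task corresponding to a distributed synchronous computing task,
and before the step of transmitting the metadata to the data receiving end associated with the data transmission task, the method includes:
Step A10: determine whether the data transmission task is the first data transmission task;
In this embodiment, it should be noted that the first data transmission task is associated with a distributed synchronous computing task, that is, a distributed computing task in which multiple processes run in the same time step to complete different computing subtasks, each process being used only for the distributed synchronous computing task at that time.
Step A20: if the data transmission task is the first data transmission task, start the preset multi-process polling mode;
In this embodiment, if the data transmission task is the first data transmission task, the preset multi-process polling mode is started and a different transfer-task computing process is matched to each memory block. Each transfer-task computing process is responsible for checking the transfer status of, and handling events for, its memory block. Other computing processes are suspended or kept at low load during this time, so that the CPU (central processing unit), memory, transmission devices, and so on reach maximum utilization.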
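The data transmission task includes a second data transmission task corresponding to a distributed asynchronous computing task,
and before the step of transmitting the metadata to the data receiving end associated with the data transmission task, the method includes:
Step B10: determine whether the data transmission task is the second data transmission task;
In this embodiment, it should be noted that the second data transmission task is associated with the distributed asynchronous computing task, that is, a distributed computing task that may run with as few as one process and in which each process has multiple control rights, so that each process can execute other computing tasks when idle.
Step B20: if the data transmission task is the second data transmission task, start the preset interrupt mode;
In this embodiment, if the data transmission task is the second data transmission task, the preset interrupt mode is started so that transfer-task computing processes are matched to the memory blocks dynamically. That is, while the second data transmission task is being performed, at least one transfer-task computing process serves it and is responsible for checking the transfer status of the memory blocks and handling their events. When a transfer status changes or a new event is raised, the corresponding transfer-task computing process is woken up to carry out the second data transmission task; otherwise that process sleeps, yielding computing resources to other computing processes.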
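The difference between the polling mode and the interrupt mode can be illustrated with a single worker that handles transfer events. This is a simplified sketch, not the application's implementation: it uses one Python thread as a stand-in for a transfer-task computing process, and the function names are hypothetical. The polling worker checks for events repeatedly without blocking, while the interrupt-style worker sleeps inside a blocking `get()` until an event wakes it.

```python
import queue
import threading
import time
from typing import Callable, List


def polling_worker(events: "queue.Queue[str]", handled: List[str], expected: int) -> None:
    """Polling mode: keep checking for transfer events without ever blocking."""
    while len(handled) < expected:
        try:
            handled.append(events.get_nowait())
        except queue.Empty:
            time.sleep(0)  # yield briefly; a dedicated poller would simply keep checking


def interrupt_worker(events: "queue.Queue[str]", handled: List[str], expected: int) -> None:
    """Interrupt mode: sleep inside get() until a new event wakes the worker."""
    while len(handled) < expected:
        handled.append(events.get())


def run_transfer(worker: Callable[..., None], block_names: List[str]) -> List[str]:
    """Start one worker, publish one event per memory block, and collect what it handled."""
    events: "queue.Queue[str]" = queue.Queue()
    handled: List[str] = []
    thread = threading.Thread(target=worker, args=(events, handled, len(block_names)))
    thread.start()
    for name in block_names:
        events.put(name)
    thread.join()
    return handled
```

Both workers process the same events in the same order; the trade-off is that polling keeps a core busy for the lowest latency, whereas the blocking variant leaves the core free for other work between events.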
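Step S20: transmit the metadata to the data receiving end associated with the data transmission task, so as to transmit the sender memory address information corresponding to the columnar data to be transmitted to the data receiving end, and receive the negotiation information fed back by the data receiving end based on the metadata;
In this embodiment, it should be noted that the negotiation information includes the single-transfer data amount.
Specifically, the metadata is transmitted to the data receiving end associated with the data transmission task, so as to transmit the sender memory address information corresponding to the columnar data to be transmitted to the data receiving end. The sender memory address information can be stored in the metadata and thus travels with it directly to the data receiving end. The data receiving end determines the single-transfer data amount based on the memory length information in the metadata and the maximum amount of data it can receive in one transfer. For example, if the memory length information is 10 bytes and the data receiving end can receive at most 2 bytes per transfer, the single-transfer data amount should be less than or equal to 2 bytes. The single-transfer data amount fed back by the data receiving end is then received.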
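The negotiation on the receiving side can be sketched as a one-line rule. The text only requires the single-transfer data amount to stay within the receiver's per-transfer maximum; picking the smaller of the two values, as below, is an assumption made for illustration, and the function name is hypothetical.

```python
def negotiate_single_transfer_amount(memory_length: int, max_receivable: int) -> int:
    """Return a single-transfer data amount that never exceeds the receiver's limit.

    Assumption: the receiver picks the largest amount it can accept, capped by the
    total memory length advertised in the metadata.
    """
    if memory_length <= 0 or max_receivable <= 0:
        raise ValueError("lengths must be positive")
    return min(memory_length, max_receivable)
```

For the example in the text, a memory length of 10 bytes and a 2-byte limit give a single-transfer data amount of 2 bytes.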
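The metadata includes the memory length information corresponding to the columnar data to be transmitted and the sender memory address information, and the negotiation information includes the single-transfer data amount,
and the step of transmitting the metadata to the data receiving end associated with the data transmission task, so as to transmit the sender memory address information corresponding to the columnar data to be transmitted to the data receiving end, and receiving the negotiation information fed back by the data receiving end based on the metadata includes:
Step S21: transmit the memory length information and the sender memory address information to the data receiving end, so that the data receiving end determines the single-transfer data amount corresponding to the memory length information and updates the receiver memory address information of the columnar data to be transmitted based on the sender memory address information;
In this embodiment, specifically, the memory length information is transmitted to the data receiving end so that the data receiving end determines the single-transfer data amount based on the memory length information and a preset maximum data receiving amount, which is the largest amount of data the data receiving end can receive in one data transfer. The sender memory address information is also sent to the data receiving end so that, after obtaining the receiver memory address information of the columnar data to be transmitted, the data receiving end updates the sender memory address information based on the receiver memory address information.
Step S22: receive the single-transfer data amount fed back by the data receiving end.
In this embodiment, it should be noted that the single-transfer data amount is used to split the columnar data to be transmitted into one or more virtual columnar storage batches, where the data size of each virtual columnar storage batch should be less than or equal to the single-transfer data amount.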
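Step S30: transmit the columnar data to be transmitted to the data receiving end based on the negotiation information.
In this embodiment, specifically, the columnar data to be transmitted is split based on the single-transfer data amount in the negotiation information to obtain multiple segments of columnar data to be transmitted, and the memory blocks corresponding to these segments are transmitted to the data receiving end one by one.
The negotiation information includes the single-transfer data amount,
and the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information includes:
Step S31: split the columnar data to be transmitted based on the single-transfer data amount to obtain multiple virtual columnar storage batches;
In this embodiment, it should be noted that a virtual columnar storage batch contains a portion of the columnar data to be transmitted of a preset data amount, where the preset data amount is less than or equal to the single-transfer data amount.
Specifically, whether to split the columnar data to be transmitted is decided based on the single-transfer data amount. If the single-transfer data amount is smaller than the total amount of the columnar data to be transmitted, the columnar data is split into multiple virtual columnar storage batches. If the single-transfer data amount is greater than or equal to the total amount of the columnar data to be transmitted, the columnar data is not split and is treated as a single virtual columnar storage batch, so that it is transmitted directly to the data receiving end.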
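The split decision in step S31 can be sketched as follows. The function name and the representation of a virtual columnar storage batch as an `(offset, length)` range over the virtual memory are assumptions for illustration; the application does not prescribe how batches are represented.

```python
from typing import List, Tuple


def split_into_batches(total_length: int, single_transfer_amount: int) -> List[Tuple[int, int]]:
    """Return (offset, length) ranges, each no larger than the single-transfer data amount."""
    if total_length < 0 or single_transfer_amount <= 0:
        raise ValueError("invalid lengths")
    if single_transfer_amount >= total_length:
        # No split needed: the whole data set is one virtual batch.
        return [(0, total_length)]
    return [
        (offset, min(single_transfer_amount, total_length - offset))
        for offset in range(0, total_length, single_transfer_amount)
    ]
```

Splitting 10 bytes with a single-transfer data amount of 4 yields batches of 4, 4, and 2 bytes; with an amount of 16 the data stays in one batch.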
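Step S32: look up the memory blocks to be transmitted that correspond to each virtual columnar storage batch;
In this embodiment, specifically, the virtual memory corresponding to each virtual columnar storage batch is looked up in a preset local storage database based on the sender memory address information in the metadata, and the memory blocks to be transmitted are then obtained from the mapping between the virtual memory and the memory blocks. The preset local storage database includes shared memory.
Step S33: transmit each memory block to be transmitted to the data receiving end based on a preset transmission protocol.
In this embodiment, it should be noted that the preset transmission protocol includes protocols such as RDMA or ordinary TCP (Transmission Control Protocol).
Specifically, each memory block to be transmitted is transmitted to the data receiving end according to the data transmission rules of the preset transmission protocol, and after the transmission is complete the lock on each memory block, that is, the transfer lock, is released. FIG. 2 shows the data transmission process when the preset transmission protocol is RDMA. In FIG. 2, the RDMA device is the data transmission device corresponding to RDMA, the control channel is the channel that controls data transmission and is used to transmit the metadata, and the elements labeled mem block1 are transfer locks. After receiving the memory blocks to be transmitted, the data receiving end stores them and obtains their receiver memory address information. It then updates the metadata based on the receiver memory address information, that is, it replaces the sender memory address information in the metadata with the receiver memory address information.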
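The receiver-side bookkeeping described above, storing each received block and replacing the sender addresses in the metadata with receiver addresses, can be sketched as follows. The `ReceiverStore` class is a toy stand-in for the receiver's shared memory, and the dictionary layout of the metadata is hypothetical; a real RDMA or TCP transport is outside the scope of this sketch.

```python
from typing import Dict, List


class ReceiverStore:
    """Toy stand-in for receiver-side shared memory: one append-only bytearray."""

    def __init__(self) -> None:
        self.memory = bytearray()

    def store(self, block: bytes) -> int:
        """Append a block and return the address (offset) where it was stored."""
        address = len(self.memory)
        self.memory.extend(block)
        return address


def receive_and_update(
    metadata: Dict[str, List[int]], blocks: List[bytes], store: ReceiverStore
) -> Dict[str, List[int]]:
    """Store the received blocks and return metadata that points at receiver addresses."""
    if [len(block) for block in blocks] != metadata["lengths"]:
        raise ValueError("received blocks do not match the advertised lengths")
    receiver_addresses = [store.store(block) for block in blocks]
    # The lengths are unchanged; only the address information is replaced.
    return {"addresses": receiver_addresses, "lengths": list(metadata["lengths"])}
```

Because the updated metadata already points at the stored bytes, later reads can slice the receiver memory directly without decoding any serialized form.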
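The negotiation information includes the single-transfer data amount;
the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information includes:
Step C10: treat the columnar data to be transmitted as a single virtual columnar storage batch based on the single-transfer data amount;
In this embodiment, specifically, whether to split the columnar data to be transmitted is decided based on the single-transfer data amount. If the single-transfer data amount is smaller than the total amount of the columnar data to be transmitted, the columnar data is split into multiple virtual columnar storage batches. If the single-transfer data amount is greater than or equal to the total amount of the columnar data to be transmitted, the columnar data is not split and is treated as a single virtual columnar storage batch, so that it is transmitted directly to the data receiving end.
Step C20: look up the memory block to be transmitted that corresponds to the single virtual columnar storage batch;
In this embodiment, specifically, the virtual memory corresponding to the virtual columnar storage batch is looked up in a preset local storage database based on the sender memory address information in the metadata, and the memory block to be transmitted is then obtained from the mapping between the virtual memory and the memory blocks. The preset local storage database includes shared memory.
Step C30: transmit the memory block to be transmitted to the data receiving end based on a preset transmission protocol.
In this embodiment, it should be noted that the preset transmission protocol includes protocols such as RDMA or ordinary TCP (Transmission Control Protocol).
Specifically, each memory block to be transmitted is transmitted to the data receiving end according to the data transmission rules of the preset transmission protocol, and after the transmission is complete the lock on each memory block, that is, the transfer lock, is released. After receiving the memory blocks to be transmitted, the data receiving end stores them and obtains their receiver memory address information. It then updates the metadata based on the receiver memory address information, that is, it replaces the sender memory address information in the metadata with the receiver memory address information.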
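The step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information includes:
Step D10: transmit the columnar data to be transmitted to the data receiving end based on the preset multi-process polling mode and the negotiation information.
In this embodiment, it should be noted that the negotiation information includes the single-transfer data amount.
Specifically, the columnar data to be transmitted is split into one or more virtual columnar storage batches based on the single-transfer data amount, and the memory blocks to be transmitted that correspond to each batch are determined. Then, based on the preset multi-process polling mode, a different transfer-task computing process is matched to each memory block to be transmitted, and the memory blocks are sent to the data receiving end by executing these processes. After receiving the memory blocks, the data receiving end stores them and obtains the corresponding receiver memory address information, then updates the sender memory address information based on it to obtain new metadata. When the distributed synchronous computing task later needs the columnar data, the data receiving end can perform operations such as data extraction or data transmission on it directly based on the new metadata.
The step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information includes:
Step E10: transmit the columnar data to be transmitted to the data receiving end based on the preset interrupt mode and the negotiation information.
In this embodiment, it should be noted that the negotiation information includes the single-transfer data amount.
Specifically, the columnar data to be transmitted is split into one or more virtual columnar storage batches based on the single-transfer data amount, and the memory blocks to be transmitted that correspond to each batch are determined. Then, based on the preset interrupt mode, transfer-task computing processes are matched to the memory blocks to be transmitted dynamically, and the memory blocks are sent to the data receiving end by executing these processes. After receiving the memory blocks, the data receiving end stores them and obtains the corresponding receiver memory address information, then updates the sender memory address information based on it to obtain new metadata. When the distributed asynchronous computing task later needs the columnar data, the data receiving end can perform operations such as data extraction or data transmission on it directly based on the new metadata.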
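In this embodiment, when a data transmission task is received, the columnar data to be transmitted that corresponds to the task and the metadata corresponding to that columnar data are determined. The metadata is then transmitted to the data receiving end associated with the data transmission task, so as to transmit the sender memory address information corresponding to the columnar data to the data receiving end, and the negotiation information fed back by the data receiving end based on the metadata is received. The columnar data is then transmitted to the data receiving end based on the negotiation information. That is, this embodiment provides an efficient columnar data transmission method. By constructing the metadata and transmitting it to the data receiving end, the columnar data to be transmitted and its sender memory address information are transmitted directly to the data receiving end, which can then obtain the columnar data directly based on the sender memory address information. This saves the serialization and deserialization of the columnar data during transmission, that is, it reduces the computation involved in columnar data transmission and improves transmission efficiency, thereby solving the technical problem of inefficient columnar data transmission in the prior art.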
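Further, referring to FIG. 3, based on the first embodiment of this application, in another embodiment of the data transmission optimization method, the method is applied to a data receiving end and includes:
Step C10: receive the memory length information and the sender memory address information corresponding to the metadata sent by the data sending end associated with the data receiving end, and feed back to the data sending end the negotiation information corresponding to the memory length information;
In this embodiment, it should be noted that the sender memory address information can be stored in the metadata, that is, the metadata includes the sender memory address information and the memory length information.
Specifically, the metadata sent by the data sending end associated with the data receiving end is received. Based on the memory length information in the metadata and a preset maximum single-transfer data receiving amount, the single-transfer data amount is determined and fed back to the data sending end. The preset maximum data receiving amount is the largest amount of data the data receiving end can receive in a single data transfer.
Step C20: receive the columnar data to be transmitted that the data sending end sends based on the negotiation information, and store it in a preset storage database to obtain the receiver memory address information of the columnar data to be transmitted;
In this embodiment, it should be noted that the columnar data to be transmitted corresponds to one or more data memory blocks, that is, it is stored and transmitted in the form of memory blocks.
Specifically, the memory blocks to be transmitted that correspond to the columnar data and are sent by the data sending end are received; these blocks are obtained by the data sending end splitting the columnar data based on the negotiation information. The memory blocks are then stored in a preset receiver local database, yielding the receiver memory address information of each block in that database. The preset receiver local database includes the receiver's shared memory.
Step C30: update the sender memory address information based on the receiver memory address information.
In this embodiment, specifically, the sender memory address information in the metadata is replaced with the receiver memory address information. When the columnar data later needs to be extracted for a computing task or a data transmission task, it can be extracted directly based on the receiver memory address information, that is, without serializing or deserializing it.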
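The preset storage database includes shared memory,
and after the step of updating the sender memory address information based on the receiver memory address information, the method includes:
Step C40: determine whether the columnar data to be transmitted is of a preset non-virtual-machine language type;
In this embodiment, it should be noted that the preset non-virtual-machine language type includes C/C++ language types and the like, and the shared memory is the shared memory of the data receiving end, that is, memory common to multiple processors, so that multiple processors can access it.
Step C50: if the columnar data to be transmitted is of the preset non-virtual-machine language type, access the shared memory directly to extract the columnar data to be transmitted;
In this embodiment, specifically, if the columnar data to be transmitted is of the preset non-virtual-machine language type, the shared memory can be accessed directly. The shared memory is then accessed directly through a preset columnar data read/write interface to extract the columnar data, which saves the data copying step and lets the columnar data participate directly in computing tasks.
Step C60: if the columnar data to be transmitted is not of the preset non-virtual-machine language type, access the shared memory indirectly to extract the columnar data to be transmitted through a preset data interface;
In this embodiment, specifically, if the columnar data to be transmitted is not of the preset non-virtual-machine language type, the columnar data cannot be extracted by accessing the shared memory directly, so the shared memory is accessed indirectly through JNI (Java Native Interface) or a C extension to extract the columnar data to be transmitted.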
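The choice between direct and indirect access in steps C40 to C60 can be sketched as a small dispatch. Everything here is illustrative: the language set, the `extract` function, and the `copying_bridge` helper are hypothetical, a `bytearray` stands in for the shared memory, and the bridge merely imitates what a JNI or C-extension layer would do by returning a copy.

```python
from typing import Callable, Union

# Assumption: language types treated as "non-virtual-machine" for this sketch.
NON_VM_LANGUAGES = {"c", "c++"}


def copying_bridge(shared: bytearray, address: int, length: int) -> bytes:
    """Stand-in for a JNI / C-extension bridge: hands the caller a copy of the data."""
    return bytes(shared[address:address + length])


def extract(
    shared: bytearray,
    address: int,
    length: int,
    language: str,
    bridge: Callable[[bytearray, int, int], bytes] = copying_bridge,
) -> Union[memoryview, bytes]:
    """Return a zero-copy view for non-VM languages, otherwise go through the bridge."""
    if language.lower() in NON_VM_LANGUAGES:
        return memoryview(shared)[address:address + length]  # direct access, no copy
    return bridge(shared, address, length)                   # indirect access
```

The direct path returns a view onto the shared bytes, so later changes in shared memory are visible through it, whereas the bridged path works on its own copy.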
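In this embodiment, the memory length information and the sender memory address information corresponding to the metadata sent by the data sending end associated with the data receiving end are received, and the negotiation information corresponding to the memory length information is fed back to the data sending end. The columnar data to be transmitted that the data sending end sends based on the negotiation information is then received and stored in the preset storage database to obtain its receiver memory address information, and the sender memory address information is updated based on the receiver memory address information. That is, this embodiment provides a method for directly obtaining the columnar data to be transmitted and its storage address information during data transmission. The columnar data is transmitted directly, without serialization or deserialization, which reduces the computation involved in data transmission and improves transmission efficiency, thereby solving the technical problem of inefficient columnar data transmission in the prior art.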
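Referring to FIG. 4, FIG. 4 is a schematic diagram of the device structure of the hardware operating environment involved in the embodiments of this application.
As shown in FIG. 4, the data transmission optimization device may include a processor 1001 (for example a CPU), a memory 1005, and a communication bus 1002. The communication bus 1002 implements connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be high-speed RAM or stable memory (non-volatile memory), such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
In one embodiment, the data transmission optimization device may also include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and so on. The rectangular user interface may include a display screen (Display) and an input sub-module such as a keyboard (Keyboard); optionally, it may also include standard wired and wireless interfaces. The network interface may optionally include standard wired and wireless interfaces (such as a WI-FI interface).
Those skilled in the art will understand that the structure shown in FIG. 4 does not limit the data transmission optimization device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
As shown in FIG. 4, the memory 1005, as a computer-readable storage medium, may contain an operating system, a network communication module, and a data transmission optimization program. The operating system is a program that manages and controls the hardware and software resources of the data transmission optimization device and supports the running of the data transmission optimization program and other software and/or programs. The network communication module implements communication among the components inside the memory 1005 and with other hardware and software in the data transmission optimization system.
In the data transmission optimization device shown in FIG. 4, the processor 1001 is used to execute the data transmission optimization program stored in the memory 1005 to implement the steps of the data transmission optimization method described in any one of the above.
The specific implementation of the data transmission optimization device of this application is basically the same as the embodiments of the data transmission optimization method described above and is not repeated here.
An embodiment of this application provides a readable storage medium that stores one or more programs, which can be executed by one or more processors to implement the steps of the data transmission optimization method described in any one of the above.
The specific implementation of the readable storage medium of this application is basically the same as the embodiments of the data transmission optimization method described above and is not repeated here.
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of this application.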

Claims (20)

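  1. A data transmission optimization method, wherein the data transmission optimization method is applied to a data sending end and comprises:
    when a data transmission task is received, determining the columnar data to be transmitted that corresponds to the data transmission task and the metadata corresponding to the columnar data to be transmitted;
    transmitting the metadata to the data receiving end associated with the data transmission task, so as to transmit the sender memory address information corresponding to the columnar data to be transmitted to the data receiving end, and receiving negotiation information fed back by the data receiving end based on the metadata;
    transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information.
  2. The data transmission optimization method according to claim 1, wherein the negotiation information comprises a single-transfer data amount;
    the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information comprises:
    splitting the columnar data to be transmitted based on the single-transfer data amount to obtain multiple virtual columnar storage batches;
    looking up the memory blocks to be transmitted that correspond to each virtual columnar storage batch;
    transmitting each memory block to be transmitted to the data receiving end based on a preset transmission protocol.
  3. The data transmission optimization method according to claim 1, wherein the negotiation information comprises a single-transfer data amount;
    the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information comprises:
    treating the columnar data to be transmitted as a single virtual columnar storage batch based on the single-transfer data amount;
    looking up the memory block to be transmitted that corresponds to the single virtual columnar storage batch;
    transmitting the memory block to be transmitted to the data receiving end based on a preset transmission protocol.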
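  4. The data transmission optimization method according to claim 1, wherein the data transmission task comprises a first data transmission task corresponding to a distributed synchronous computing task,
    before the step of transmitting the metadata to the data receiving end associated with the data transmission task, the method comprises:
    determining whether the data transmission task is the first data transmission task;
    if the data transmission task is the first data transmission task, starting a preset multi-process polling mode;
    the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information comprises:
    transmitting the columnar data to be transmitted to the data receiving end based on the preset multi-process polling mode and the negotiation information.
  5. The data transmission optimization method according to claim 1, wherein the data transmission task comprises a second data transmission task corresponding to a distributed asynchronous computing task,
    before the step of transmitting the metadata to the data receiving end associated with the data transmission task, the method comprises:
    determining whether the data transmission task is the second data transmission task;
    if the data transmission task is the second data transmission task, starting a preset interrupt mode;
    the step of transmitting the columnar data to be transmitted to the data receiving end based on the negotiation information comprises:
    transmitting the columnar data to be transmitted to the data receiving end based on the preset interrupt mode and the negotiation information.
  6. The data transmission optimization method according to claim 1, wherein the metadata comprises the memory length information corresponding to the columnar data to be transmitted and the sender memory address information,
    the step of transmitting the metadata to the data receiving end associated with the data transmission task, so as to transmit the sender memory address information corresponding to the columnar data to be transmitted to the data receiving end, and receiving the negotiation information fed back by the data receiving end based on the metadata comprises:
    transmitting the memory length information and the sender memory address information to the data receiving end, so that the data receiving end determines the single-transfer data amount corresponding to the memory length information and updates the receiver memory address information of the columnar data to be transmitted based on the sender memory address information;
    receiving the single-transfer data amount fed back by the data receiving end.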
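  7. The data transmission optimization method according to claim 1, wherein the metadata comprises the sender memory address information and the memory length information of the columnar data to be transmitted.
  8. The data transmission optimization method according to claim 7, wherein the sender memory address information is the address information of each memory block corresponding to the columnar data to be transmitted.
  9. The data transmission optimization method according to claim 1, wherein the data transmission task is bound to a preset first callback function and a preset second callback function.
  10. The data transmission optimization method according to claim 9, wherein the preset first callback function is used, before data transmission, to compute statistics on a first data length, a transmission duration, and data features.
  11. The data transmission optimization method according to claim 10, wherein the first data length is the length of the columnar data to be transmitted that has not yet been transmitted to the data receiving end.
  12. The data transmission optimization method according to claim 10, wherein the transmission duration is the cumulative time needed to complete this data transmission task.
  13. The data transmission optimization method according to claim 10, wherein the preset second callback function is used, after data transmission, to merge the data into another area, to discard the current data, or to compute statistics on a second data length, the transmission duration, and data features.
  14. The data transmission optimization method according to claim 10, wherein the second data length is the length of the columnar data to be transmitted that has already been transmitted to the data receiving end, and the current data comprises the columnar data to be transmitted.
  15. The data transmission optimization method according to claim 4, wherein the first data transmission task is associated with the distributed synchronous computing task.
  16. The data transmission optimization method according to claim 15, wherein the distributed synchronous computing task is a distributed computing task in which multiple processes run in the same time step to complete different computing subtasks.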
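  17. A data transmission optimization method, wherein the data transmission optimization method is applied to a data receiving end and comprises:
    receiving the memory length information and the sender memory address information corresponding to the metadata sent by the data sending end associated with the data receiving end, and feeding back to the data sending end the negotiation information corresponding to the memory length information;
    receiving the columnar data to be transmitted that the data sending end sends based on the negotiation information, and storing the columnar data to be transmitted in a preset storage database to obtain the receiver memory address information of the columnar data to be transmitted;
    updating the sender memory address information based on the receiver memory address information.
  18. The data transmission optimization method according to claim 17, wherein the preset storage database comprises shared memory,
    after the step of updating the sender memory address information based on the receiver memory address information, the method comprises:
    determining whether the columnar data to be transmitted is of a preset non-virtual-machine language type;
    if the columnar data to be transmitted is of the preset non-virtual-machine language type, accessing the shared memory directly to extract the columnar data to be transmitted;
    if the columnar data to be transmitted is not of the preset non-virtual-machine language type, accessing the shared memory indirectly to extract the columnar data to be transmitted through a preset data interface.
  19. A data transmission optimization device, wherein the data transmission optimization device comprises a memory, a processor, and a program stored in the memory for implementing the data transmission optimization method,
    the memory is used to store the program implementing the data transmission optimization method;
    the processor is used to execute the program implementing the data transmission optimization method, so as to implement the steps of the data transmission optimization method according to any one of claims 1 to 16 or 17 to 18.
  20. A readable storage medium, wherein the readable storage medium stores a program implementing the data transmission optimization method, and the program is executed by a processor to implement the steps of the data transmission optimization method according to any one of claims 1 to 16 or 17 to 18.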
PCT/CN2020/133428 2019-12-20 2020-12-02 Data transmission optimization method, device, and readable storage medium WO2021121041A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911325502.0A CN111107022B (zh) 2019-12-20 2019-12-20 Data transmission optimization method, device, and readable storage medium
CN201911325502.0 2019-12-20

Publications (1)

Publication Number Publication Date
WO2021121041A1 true WO2021121041A1 (zh) 2021-06-24

Family

ID=70423022

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/133428 WO2021121041A1 (zh) 2019-12-20 2020-12-02 Data transmission optimization method, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN111107022B (zh)
WO (1) WO2021121041A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
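CN111107022B (zh) * 2019-12-20 2021-08-27 深圳前海微众银行股份有限公司 Data transmission optimization method, device, and readable storage medium
CN112565402B (zh) * 2020-12-02 2022-10-28 浙江强脑科技有限公司 Data transmission method, apparatus, device, and computer-readable storage medium
CN112714181A (zh) * 2020-12-25 2021-04-27 北京四维纵横数据技术有限公司 Data transmission method and apparatus
CN115022860A (zh) * 2022-07-04 2022-09-06 北京展跃芯智科技有限公司 Upgrade method and apparatus for a Bluetooth Low Energy device, electronic device, and storage medium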

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715039A (zh) * 2015-03-23 2015-06-17 星环信息科技(上海)有限公司 Columnar storage and query method and device based on hard disk and memory
US20160147447A1 (en) * 2014-11-25 2016-05-26 Rolando Blanco N-bit Compressed Versioned Column Data Array for In-Memory Columnar Stores
CN109379398A (zh) * 2018-08-31 2019-02-22 北京奇艺世纪科技有限公司 Data synchronization method and apparatus
CN109558344A (zh) * 2018-12-03 2019-04-02 郑州云海信息技术有限公司 DMA transmission method and DMA controller suitable for network transmission
CN109933631A (zh) * 2019-03-20 2019-06-25 江苏瑞中数据股份有限公司 Distributed parallel database system and data processing method based on an InfiniBand network
CN111107022A (zh) * 2019-12-20 2020-05-05 深圳前海微众银行股份有限公司 Data transmission optimization method, device, and readable storage medium


Also Published As

Publication number Publication date
CN111107022A (zh) 2020-05-05
CN111107022B (zh) 2021-08-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20900893

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20900893

Country of ref document: EP

Kind code of ref document: A1