WO2024103752A1

WO2024103752A1 - File transmission method, apparatus and system, electronic device, and storage medium

Info

Publication number: WO2024103752A1
Application number: PCT/CN2023/103618
Authority: WO
Inventors: 官祥臻; 王钰涵; 赵天武; 桂林
Original assignee: 工赋(青岛)科技有限公司; 卡奥斯工业智能研究院(青岛)有限公司
Priority date: 2022-11-16
Filing date: 2023-06-29
Publication date: 2024-05-23
Also published as: CN115801765A

Abstract

A file transmission method, apparatus and system, an electronic device, and a storage medium. The method is applied to a data consumer, and comprises: acquiring metadata of each of a plurality of file slices from a distributed database, the plurality of file slices and the metadata of each file slice being obtained by splitting an original file by a data producer (101); according to the metadata of each file slice, acquiring matched file slices from a topic partition of a message publishing and subscribing system to obtain a plurality of file slices (102); and creating a file processing stream, and, according to the metadata of each file slice, merging the plurality of file slices by using the file processing stream to obtain a target file (103).

Description

File transmission method, device, system, electronic device and storage medium

This application claims priority to the Chinese patent application filed with the China Patent Office on November 16, 2022, with application number 202211434252.6, the entire contents of which are incorporated by reference into this application.

Technical Field

The present application relates to data transmission technology, for example, to a file transmission method, device, system, electronic device and storage medium.

Background

Kafka is a distributed message publishing and subscription system with the advantages of high throughput, low latency, and high availability. Kafka generally transmits structured data such as logs. By default, each message transmitted by Kafka does not exceed 1MB in size, so relatively large binary files (such as videos, pictures, and compressed packages) cannot be transmitted through Kafka directly.

Summary of the invention

The present application provides a file transmission method, device, system, electronic device and storage medium, which can realize the transmission of large files by using a message publishing and subscription system, thereby improving the file transmission efficiency.

In a first aspect, the present application provides a file transmission method, which is applied to a data consumption end, and the method comprises:

Obtaining metadata of each file slice in a plurality of file slices from a distributed database, wherein the plurality of file slices and the metadata of each file slice are obtained by slicing the original file at the data production end;

Obtain the multiple file slices from the topic partition of the message publishing and subscription system according to the metadata of each file slice;

A file processing stream is created, and the plurality of file slices are merged according to the metadata of each file slice by using the file processing stream to obtain a target file.

In a second aspect, the present application provides a file transmission method, which is applied to a data production end, and the method includes:

Acquire an original file, and slice the original file to obtain multiple file slices and metadata of each file slice;

Send the multiple file slices to the topic partition of the message publishing and subscription system, and store the metadata of each file slice in a distributed database, so that after the data consumption end obtains the metadata of each file slice from the distributed database and obtains the multiple file slices from the topic partition, the data consumption end merges the multiple file slices according to the metadata of each file slice to obtain the target file.

In a third aspect, the present application provides a file transmission device, which is applied to a data consumption end, and the device includes:

A first acquisition module is configured to acquire metadata of each file slice in a plurality of file slices from a distributed database, wherein the plurality of file slices and the metadata of each file slice are obtained by segmenting the original file at the data production end;

A second acquisition module is configured to acquire the multiple file slices from a topic partition of a message publishing and subscription system according to the metadata of each file slice;

The merging module is configured to create a file processing stream, and use the file processing stream to merge the multiple file slices according to the metadata of each file slice to obtain a target file.

In a fourth aspect, the present application provides a file transmission device, which is applied to a data production end, and the device includes:

A segmentation module, configured to obtain an original file and segment the original file to obtain a plurality of file slices and metadata of each file slice;

A sending module, configured to send the plurality of file slices to a topic partition of a message publishing and subscription system;

The storage module is configured to store the metadata of each file slice into a distributed database, so that after the data consumption end obtains the metadata of each file slice from the distributed database and obtains the multiple file slices from the topic partition, the data consumption end merges the multiple file slices according to the metadata of each file slice to obtain the target file.

In a fifth aspect, the present application provides a file transfer system, comprising a data consumption end and a data production end for executing the file transfer method described in any embodiment of the present application.

In a sixth aspect, the present application provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the file transfer method as described in any embodiment of the present application is implemented.

In a seventh aspect, the present application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the file transfer method as described in any embodiment of the present application.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to more clearly illustrate the technical solution of the present application, the drawings required for use in the embodiments are briefly introduced below.

FIG. 1 is a schematic flow chart of a file transmission method provided by the present application;

FIG. 2 is another schematic flow chart of the file transmission method provided by the present application;

FIG. 3 is another schematic flow chart of the file transmission method provided by the present application;

FIG. 4 is an exemplary flow chart of a file transmission method provided by the present application;

FIG. 5 is a schematic diagram of a structure of a file transmission device provided by the present application;

FIG. 6 is another schematic diagram of the structure of the file transmission device provided by the present application;

FIG. 7 is a schematic diagram of the structure of the electronic device provided by the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below in conjunction with the drawings in the embodiments of the present application.

The terms "first", "second", etc. in the specification and claims of this application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units clearly listed, but may include other steps or units that are not clearly listed or inherent to these processes, methods, products or devices.

FIG1 is a flowchart of a file transfer method provided by the present application, which can be performed by a file transfer device provided by the present application, and the device can be implemented in software and/or hardware. In an exemplary embodiment, the device can be integrated into a data consumption end, for example, it can be integrated into an electronic device at the data consumption end, and the electronic device can be a computer. The following embodiments will be described by taking the device integrated into an electronic device at the data consumption end as an example.

Before the processing on the data consumption end is described, the processing on the data production end is introduced first, as follows:

In this embodiment, the files to be transmitted at the data production end may be large files such as videos, pictures, compressed packages, etc. There may be multiple files to be transmitted, and these files are usually larger than 1MB, while the size of each piece of data transmitted by the message publishing and subscription system Kafka is generally no more than 1MB, so it is impossible to directly use the message publishing and subscription system to transmit these files to the demand side. In view of this situation, in this embodiment, before these files are transmitted using the message publishing and subscription system, each file can be sliced according to the message publishing and subscription system's limit on the size of the transmitted data, such as cutting each file into slices smaller than 1MB, thereby obtaining multiple file slices for each file, and each file slice can have metadata, which is descriptive data of the file slice.
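The slicing step described above can be sketched as follows. This is a minimal illustration, not the implementation of the present application: the function name `slice_bytes` and the constant `MAX_SLICE_BYTES` are hypothetical, and the 1MB value simply mirrors the limit mentioned in the text. A real deployment would probably choose a somewhat smaller slice size to leave room for per-message overhead.

```python
from typing import List

# Assumed slice limit, mirroring the roughly 1MB message size mentioned in the text.
MAX_SLICE_BYTES = 1024 * 1024


def slice_bytes(data: bytes, slice_size: int = MAX_SLICE_BYTES) -> List[bytes]:
    """Split data into consecutive chunks of at most slice_size bytes."""
    if slice_size <= 0:
        raise ValueError("slice_size must be positive")
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]


# A payload slightly larger than 2MB yields two full slices and one short tail.
payload = b"x" * (2 * MAX_SLICE_BYTES + 10)
slices = slice_bytes(payload)
```

Concatenating the slices in order reproduces the original bytes, which is the property the later merging step relies on.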

After each file is cut into multiple file slices, the multiple file slices can be uploaded to the message publishing and subscription system. The message publishing and subscription system stores data by category, that is, each file slice belongs to a category in the message publishing and subscription system, which is called a topic. Physically, data of different topics are stored separately. Each topic includes one or more partitions, so after a file slice is uploaded to the message publishing and subscription system, it is stored in the corresponding topic partition. After storage, the offset of the file slice in the topic partition is obtained. The topic partition stores file slices as a queue, and the offset indicates the position of the file slice relative to the head of the queue. In addition, when a file is sliced, the serial number of each file slice can also be recorded, and the serial number of each file slice and its offset in the topic partition of the message publishing and subscription system are used as the metadata of the corresponding file slice. Of course, the metadata of the file slice can also include other information, such as the file name of the file and the information summary code of the file, which are not limited here.
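To make the relationship between serial numbers and offsets concrete, the sketch below models a topic partition as an append-only Python list. `InMemoryPartition`, `publish_slices`, and the dictionary keys are hypothetical names introduced only for illustration; they are not part of Kafka or of the present application, and a real producer would receive the offset from the broker after sending.

```python
from typing import Dict, List


class InMemoryPartition:
    """Hypothetical stand-in for one topic partition: an append-only queue."""

    def __init__(self) -> None:
        self._log: List[bytes] = []

    def append(self, payload: bytes) -> int:
        """Store a payload and return its offset (its index from the queue head)."""
        self._log.append(payload)
        return len(self._log) - 1

    def read(self, offset: int) -> bytes:
        """Return the payload stored at the given offset."""
        return self._log[offset]


def publish_slices(partition: InMemoryPartition, slices: List[bytes]) -> List[Dict[str, int]]:
    """Append each slice and record its serial number (from 1) with its offset."""
    metadata = []
    for serial, payload in enumerate(slices, start=1):
        offset = partition.append(payload)
        metadata.append({"serial": serial, "offset": offset})
    return metadata


partition = InMemoryPartition()
partition.append(b"unrelated message")  # a file's slices need not start at offset 0
meta = publish_slices(partition, [b"aa", b"bb", b"cc"])
```

The example shows why both values are recorded: the serial number gives the position inside the file, while the offset gives the position inside the partition, and the two generally differ.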

The message publishing and subscription system is generally not used for long-term data storage, its storage space is limited, and it has no concept of metadata storage. Therefore, in this embodiment, the file slices and the metadata of the file slices can be processed separately: the file slices are pushed to the message publishing and subscription system, and the metadata of the file slices is uploaded to the distributed database. Using the distributed database to store metadata facilitates subsequent fast and efficient queries of the metadata of the file slices. The distributed database can be a document database based on distributed file storage. For example, the distributed database can be MongoDB. MongoDB is a product between relational databases and non-relational databases; among non-relational databases, it is the most feature-rich and the most similar to a relational database. The data structures it supports are very loose, so it can store relatively complex data types.

The following describes the processing process of the data consumer end. Continuing with reference to Figure 1, it may include the following steps:

Step 101, obtain metadata of each file slice in multiple file slices from a distributed database, where the multiple file slices and the metadata of each file slice are obtained by the data production end by segmenting the original file.

For example, the original file can be any one of the multiple files transmitted by the data production end, depending on the consumption needs of the data consumption end, that is, the original file is also a large file such as video, picture, compressed package, etc. Since the data production end stores the metadata of each file slice in the distributed database after splitting the file, the metadata of each file slice of the required file can be obtained from the distributed database first during actual consumption.

Step 102: According to the metadata of each file slice, a matching file slice is obtained from a topic partition of a message publishing and subscription system to obtain a plurality of file slices.

The metadata includes the serial number of the file slice and the offset of the file slice in the topic partition. The serial number of the file slice refers to the position of the file slice in the original binary file. For example, if a 1GB file is cut into 1024 file slices, the serial numbers of the file slices can be 1 to 1024. In the message publishing and subscription system, each piece of data is stored in two parts, an index and a log: the index records the offset information of the data, and the log stores the data itself. The data consumption end looks up the offset of a file slice in the topic partition according to the serial number of the file slice, and obtains the matching file slice from the topic partition according to that offset for consumption. For example, when the data consumption end consumes for the first time, it starts from the file slice with offset 0 and consumes up to offset 8, and offset 8 is recorded. The next time it consumes, it can start from the beginning or from the last recorded position. The largest offset that the data consumption end can consume is the largest offset written by the data production end. When that maximum is reached, the multiple file slices have been obtained.
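The lookup from serial number to offset can be sketched as below. The partition is again modeled as a plain list whose index plays the role of the offset, and `fetch_slices` is a hypothetical helper; a real consumer would seek to the offset through its client library instead of indexing a list.

```python
from typing import Dict, List


def fetch_slices(partition_log: List[bytes], metadata: List[Dict[str, int]]) -> List[bytes]:
    """Return slice payloads in serial-number order, each read by its offset."""
    offset_by_serial = {m["serial"]: m["offset"] for m in metadata}
    return [partition_log[offset_by_serial[serial]] for serial in sorted(offset_by_serial)]


# Slices may sit in the partition in any order and next to unrelated messages.
partition_log = [b"other", b"C", b"A", b"B"]
metadata = [
    {"serial": 3, "offset": 1},
    {"serial": 1, "offset": 2},
    {"serial": 2, "offset": 3},
]
ordered = fetch_slices(partition_log, metadata)
```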

Step 103: Create a file processing stream, and use the file processing stream to merge multiple file slices according to the metadata of each file slice to obtain a target file.

The file processing stream may include a file reading stream and a file writing stream. The file reading stream may be used to read data from multiple file slices in sequence according to the serial number, and the file writing stream may be used to write the read data into a specified file, thereby merging multiple file slices by encoding the file stream to obtain the target file.
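A minimal sketch of the read-stream and write-stream merge is shown below, using in-memory streams from the Python standard library. The function name `merge_slices` and the `buffer_size` parameter are assumptions made for illustration; in practice the target would typically be a file opened in binary write mode.

```python
import io
from typing import BinaryIO, Dict


def merge_slices(slices_by_serial: Dict[int, bytes], target: BinaryIO,
                 buffer_size: int = 64 * 1024) -> int:
    """Write slices to target in serial-number order; return total bytes written."""
    written = 0
    for serial in sorted(slices_by_serial):
        reader = io.BytesIO(slices_by_serial[serial])  # file reading stream
        while True:
            chunk = reader.read(buffer_size)
            if not chunk:
                break
            written += target.write(chunk)  # file writing stream
    return written


target = io.BytesIO()
total = merge_slices({2: b"world", 1: b"hello "}, target)
```

Sorting by serial number before writing is what restores the original byte order, regardless of the order in which the slices were pulled.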

In the solution of this application, the data production end slices the original file to obtain multiple file slices, pushes the multiple file slices to the message publishing and subscription system, and saves the metadata corresponding to each file slice to the distributed database. The data consumption end obtains the metadata corresponding to each file slice from the distributed database; according to the metadata of each file slice, it obtains the multiple file slices matching the metadata from the topic partition of the message publishing and subscription system; it then creates a file processing stream and uses the file processing stream to merge the multiple file slices according to the metadata of each file slice to obtain the target file. That is, the present application can slice large files, transmit the file slices through the message publishing and subscription system, and store the metadata of the file slices in a distributed database. With the help of file segmentation technology and the distributed database, large files can be transmitted using the message publishing and subscription system, thereby improving the efficiency of file transmission.

FIG. 2 is another flowchart of the file transmission method provided by the present application. The method can be performed by a device integrated in an electronic device at a data consumption end, and the electronic device can be a computer. The following embodiments are described by taking the device integrated in an electronic device at the data consumption end as an example. As shown in FIG. 2, the method can include the following steps:

Step 201, obtaining information summary codes of multiple files by querying a distributed database.

The information summary code (also called a message digest code) refers to a 128-bit feature code obtained by digitally transforming the original information according to a public message digest algorithm. This feature code is irreversible and highly discrete, which can ensure the uniqueness of the file or file slice. For example, the information summary code can be an md5 code, and each file has a unique md5 code (i.e., file md5, abbreviated as fmd5).
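For illustration, an md5 code can be computed with the `hashlib` module of the Python standard library. The helper name `md5_hex` is hypothetical; the 128-bit digest is conventionally written as 32 hexadecimal characters.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the md5 digest of data as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()


digest = md5_hex(b"slice payload")
digest_bits = hashlib.md5().digest_size * 8  # digest_size is in bytes
```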

Step 202, write the information summary codes of multiple files into a preset set to obtain an original summary code set.

An empty data set can be constructed, and the md5 codes of each file queried from the distributed database can be written into the empty data set to obtain an original summary code set, that is, the original summary code set includes multiple fmd5s.

Step 203: perform deduplication processing on the original summary code set to obtain a target summary code set.

Deduplication can avoid saving duplicate data in the database to cause a large amount of redundant data. After deduplication, a new data set can be obtained, that is, the set of target fmd5 codes. The number of fmd5 codes contained in the set is the number of files transmitted through Kafka. Due to the uniqueness of the target summary code, the data consumer can obtain multiple file slices of a file in the topic partition through the target summary code.
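Steps 202 and 203 can be sketched as follows, assuming (this is an assumption, not stated in the text) that the query returns one fmd5 per slice record, so that the same fmd5 appears once for every slice of a file. The helper `dedupe_digests` is a hypothetical name.

```python
from typing import Iterable, List


def dedupe_digests(digests: Iterable[str]) -> List[str]:
    """Remove duplicate digests while preserving first-seen order."""
    return list(dict.fromkeys(digests))


# Original summary code set: placeholder fmd5 values, repeated per slice.
original_codes = ["f1", "f1", "f2", "f1", "f3", "f2"]
target_codes = dedupe_digests(original_codes)
file_count = len(target_codes)
```

A plain `set` would deduplicate equally well; `dict.fromkeys` is used here only so that the result keeps a stable order.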

Step 204: Identify the information summary code of the original file from the target summary code set.

For example, the correspondence information between the information summary code of a file and the file identifier can be pre-stored. The file identifier can be the file name, and the information summary code can be the md5 code (i.e., fmd5 code) of the file. The information summary code of the original file is identified from the target summary code set according to the correspondence information. Here, the original file refers to the file currently needed by the data consumption end. The information summary code of the original file can also be determined directly from the pre-stored correspondence information between the information summary codes of files and the file identifiers.

Step 205: Obtain metadata of each file slice in the plurality of file slices from a distributed database according to the information summary code of the original file.

That is, according to the information summary code of the original file, the metadata of all file slices of the original file are obtained from the distributed database at one time. For example, the metadata of each file slice may also include the following content:

Serial number: indicates the position of the file slice in the binary original file. For example, if the original file size is 1GB and the file is cut into 1024 file slices, the serial numbers of the 1024 file slices can be 1-1024.

Information summary code: includes the original file information summary code (fmd5 code) and each file slice information summary code (md5 code).

Offset: Indicates the location of the file slice in the topic partition of the message publishing and subscription system.

End status: Indicates whether the current slice is the last slice.

File name: The name and extension of the file.
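The metadata fields listed above could be represented as a small record type, sketched below. The class name `SliceMetadata`, the field names, and the sample values are all hypothetical; the text does not prescribe a schema. Converting the record to a dictionary gives a shape that a document database could store directly.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SliceMetadata:
    serial: int      # position of the slice in the original file, starting at 1
    fmd5: str        # information summary code of the original file
    md5: str         # information summary code of this slice
    offset: int      # location of the slice in the topic partition
    is_last: bool    # end status: whether this is the last slice
    file_name: str   # name and extension of the original file


# Placeholder values for the last of 1024 slices.
record = SliceMetadata(serial=1024, fmd5="<file md5>", md5="<slice md5>",
                       offset=2047, is_last=True, file_name="video.mp4")
document = asdict(record)
```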

Step 206, find the offset of the file slice in the topic partition according to the serial number of the file slice.

In a distributed database, different file slices can be distinguished by serial numbers, and in a topic partition, different file slices are distinguished by offsets. Therefore, after obtaining the metadata of a file slice, the offset of the file slice in the topic partition can be found according to the serial number of the file slice in the metadata. Each message in a topic partition has its own unique offset, which is used to indicate the location information of the message in the partition.

Step 207, obtaining the file slice from the topic partition according to the offset of the file slice in the topic partition.

Step 208: Verify each file slice in the plurality of file slices according to the information summary code of the file slice included in the metadata.

The obtained file slice can first be verified according to the serial number, that is, it can be determined whether the serial number of the obtained file slice is the same as the serial number of the file slice to be obtained. If the serial numbers are the same, the serial number verification passes. Next, the file slice can be verified according to the information summary code: the information summary code of the file slice obtained from the topic partition is calculated and compared with the information summary code of the file slice in the metadata. If the two are the same, the verification passes; if they are different, the verification fails. This ensures that the file slice transmitted through the topic partition is correct and has not been tampered with.

Step 209, determine whether the verification is passed. If the verification is passed, execute step 210. If the verification is not passed, return to execute step 207.

That is, if the verification passes, continue to obtain the next file slice. If the verification fails, re-pull the corresponding file slice and verify it until all file slices of the original file are obtained.
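The verify-and-re-pull loop of steps 207 to 209 can be sketched as below. `fetch_verified` and `flaky_fetch` are hypothetical names, and the `max_attempts` bound is an assumption added so that the sketch terminates; the text itself does not state a retry limit.

```python
import hashlib
from typing import Callable, List


def fetch_verified(fetch: Callable[[int], bytes], offset: int,
                   expected_md5: str, max_attempts: int = 3) -> bytes:
    """Pull the slice at offset until its md5 matches the metadata, or give up."""
    for _ in range(max_attempts):
        payload = fetch(offset)
        if hashlib.md5(payload).hexdigest() == expected_md5:
            return payload
    raise IOError(f"slice at offset {offset} failed verification")


good = b"slice-1"
attempts: List[int] = []


def flaky_fetch(offset: int) -> bytes:
    """Simulated pull that returns corrupt data on the first attempt only."""
    attempts.append(offset)
    return b"corrupt" if len(attempts) == 1 else good


payload = fetch_verified(flaky_fetch, 5, hashlib.md5(good).hexdigest())
```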

Step 210, obtaining the next file slice, and executing steps 208 and 209, until multiple file slices are obtained.

Step 211, create a file processing stream, and use the file processing stream to write the data in multiple file slices into a designated file according to the serial number of each file slice to obtain a target file.

When all of the file slices pass verification, the data transmission is complete and no errors occurred during transmission. When the end status is detected in the metadata, all file slices have been acquired. Next, the step of merging the multiple file slices according to the metadata of each file slice using the file processing stream is triggered.

When merging multiple file slices, a file processing stream can be created. The file processing stream may include a file reading stream and a file writing stream. The file slices may be sorted according to the serial numbers so that they are the same as the order of the files before data transmission. The file reading stream is used to read data from multiple file slices in sequence according to the sorting, and the file writing stream is used to write the read data into a specified file, thereby merging multiple file slices by encoding the file stream to obtain the target file.

Step 212: verify the target file according to the information summary code of the original file.

The information summary code of the target file can be calculated, and the information summary code of the original file can be compared with the information summary code of the target file. If the information summary codes are consistent, the target file verification passes and the file transfer process is completed.
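The final check can be sketched as a streaming md5 computation over the target file, so that a large file does not have to be loaded into memory at once. `stream_md5`, `verify_target`, and the chunk size are assumptions for illustration; in-memory streams stand in for real files here.

```python
import hashlib
import io
from typing import BinaryIO


def stream_md5(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Compute the md5 of a binary stream by reading it chunk by chunk."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_target(stream: BinaryIO, fmd5: str) -> bool:
    """Return True if the stream's md5 equals the original file's fmd5."""
    return stream_md5(stream) == fmd5


original = b"abc" * 1000
fmd5 = hashlib.md5(original).hexdigest()
ok = verify_target(io.BytesIO(original), fmd5)
truncated = verify_target(io.BytesIO(original[:-1]), fmd5)
```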

In the scheme of the present application, the data consumption end obtains the metadata of each file slice in multiple file slices from a distributed database, where the multiple file slices and the metadata of each file slice are obtained by the data production end by segmenting the original file; multiple file slices are obtained from the topic partition of the message publishing and subscription system according to the metadata of each file slice; a file processing stream is created, and the file processing stream is used to merge the multiple file slices according to the metadata of each file slice to obtain the target file. That is, the present application can slice large files, transmit the file slices through the message publishing and subscription system, and store the metadata of the file slices in a distributed database. With the help of file segmentation technology and the distributed database, large files can be transmitted using the message publishing and subscription system, which improves the efficiency of file transmission.

FIG3 is another flowchart of the file transmission method provided by the present application. The method can be integrated in an electronic device at the data production end, and the electronic device can be, for example, a computer. The following embodiments will be described by taking the file transmission device integrated in the electronic device at the data production end as an example. As shown in FIG3, the method can include the following steps:

Step 301, obtain an original file, and segment the original file to obtain multiple file slices and metadata of each file slice.

On the data production side, you can use the log collection system (Flume) to pull binary original files. Flume is a highly available, highly reliable, distributed system for collecting, aggregating and transmitting massive logs. It is a tool that can collect data resources such as logs and events, and centralize and store these huge amounts of data from various data resources. By pulling the binary file through Flume, you can customize the shard size to adapt to Kafka's message size and performance tuning. For example, you can split the original file into multiple file slices no larger than 1MB to meet Kafka's file transmission conditions, and get the metadata of each file slice.

Step 302: Send multiple file slices to the topic partition of the message publishing and subscription system, and store the metadata of each file slice in a distributed database.

The metadata includes the information summary code of the file slice, the offset of the file slice in the topic partition, the serial number of the file slice, the end status of the file slice, the full name of the original file, and the information summary code of the original file. The end status of the file slice is a flag that appears only in the metadata of the last file slice; the full name of the original file and the information summary code of the original file are used to identify the file and ensure the uniqueness of the original file. Next, the file slices are sent to the topic partition of the message publishing and subscription system. Since the file slices are no larger than 1MB, they can be stably transmitted in the topic partition, and the metadata of each file slice is stored in the distributed database. For subsequent data processing, please refer to the previous embodiment, which will not be repeated here.
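A sketch of how the data production end might assemble these records is given below. `build_metadata` and the dictionary keys are hypothetical, and the offset is deliberately left out because it is only known once a slice has been stored in the topic partition. Following the description above, the end flag is written only into the last record.

```python
import hashlib
from typing import Dict, List


def build_metadata(file_name: str, data: bytes, slice_size: int) -> List[Dict[str, object]]:
    """Build one metadata record per slice; only the last record carries 'end'."""
    fmd5 = hashlib.md5(data).hexdigest()
    slices = [data[i:i + slice_size] for i in range(0, len(data), slice_size)]
    records: List[Dict[str, object]] = []
    for serial, payload in enumerate(slices, start=1):
        record: Dict[str, object] = {
            "serial": serial,
            "md5": hashlib.md5(payload).hexdigest(),
            "fmd5": fmd5,
            "file_name": file_name,
        }
        if serial == len(slices):
            record["end"] = True  # end status flag, present on the last slice only
        records.append(record)
    return records


# A tiny slice size is used so the example stays readable.
records = build_metadata("a.bin", b"0123456789", slice_size=4)
```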

In the scheme of the present application, the data production end obtains the original file and segments the original file to obtain multiple file slices and metadata of each file slice; the multiple file slices are sent to the topic partition of the message publishing and subscription system, and the metadata of each file slice is stored in the distributed database, so that the data consumption end obtains the metadata of each file slice from the distributed database, obtains the multiple file slices from the topic partition, and then merges the multiple file slices according to the metadata of each file slice to obtain the target file. That is, the present application can slice large files, transmit the file slices through the message publishing and subscription system, store the metadata of the file slices in a distributed database, and realize the transmission of large files using the message publishing and subscription system with the help of file segmentation technology and the distributed database, thereby improving the efficiency of file transmission.

FIG. 4 is an exemplary flow chart of the file transmission method provided by the present application. The flow is as follows: the data production end first slices the original file so that each file slice does not exceed 1MB, saves the metadata of each file slice in a distributed database, and then sends the file slices to the topic partition of the message publishing and subscription system for transmission. When a data consumption end needs to consume data, it can query the distributed database to obtain the information summary code of each file, write the obtained information summary codes into a preset set, and deduplicate the set automatically; the number of information summary codes in the deduplicated set (i.e., the target summary code set) is equal to the number of files. According to the pre-stored correspondence information between the information summary codes of files and the file identifiers, the information summary code of the original file that currently needs to be downloaded is identified from the target summary code set, and the metadata of each file slice of the original file is obtained from the distributed database according to that information summary code. The offset of the corresponding file slice is found according to the serial number of the file slice in the metadata, and the data consumption end consumes according to the queried offset and pulls the file slice from the message publishing and subscription system. When the pulled file slice passes verification, the file processing stream is used to write the data in the pulled file slice into the preset file, and the next file slice is then pulled. After all the file slices of the original file have been pulled and written into the preset file, the file processing stream is closed to obtain the target file. The information summary code of the original file can then be compared with the information summary code calculated for the target file to verify the target file. If the verification passes, the file transfer is complete.

FIG5 is a schematic diagram of a structure of a file transmission device provided by the present application, which is suitable for executing the file transmission method provided by the present application and is applied to a data consumer. As shown in FIG5 , the device may include:

The first acquisition module 501 is configured to obtain metadata of each file slice in a plurality of file slices from a distributed database, wherein the plurality of file slices and the metadata of each file slice are obtained by segmenting the original file at the data production end; the second acquisition module 502 is configured to obtain matching file slices from the topic partition of the message publishing and subscription system according to the metadata of each file slice, thereby obtaining the plurality of file slices; the merging module 503 is configured to create a file processing stream, and use the file processing stream to merge the plurality of file slices according to the metadata of each file slice to obtain a target file.

In one embodiment, the device also includes a set acquisition module, which is configured to: obtain information summary codes of multiple files by querying the distributed database, write the information summary codes of the multiple files into a preset set to obtain an original summary code set; and deduplicate the original summary code set to obtain a target summary code set.

In one embodiment, the first acquisition module 501 is configured to: identify the information summary code of the original file from the target summary code set; and acquire metadata of each file slice in multiple file slices from the distributed database according to the information summary code of the original file.
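The two embodiments above (building the target summary code set, then identifying the original file's code in it) might look like the following sketch. It assumes, hypothetically, that every metadata row returned by the database query carries the summary code of the file it belongs to under the key `file_digest`, which is why the raw query result contains duplicates; the mapping `digest_by_file_id` stands in for the pre-stored correspondence between summary codes and file identifiers.

```python
def target_digest_set(rows):
    """Write every file-level summary code into a set.

    A Python set discards duplicates on insertion, so the result holds
    exactly one code per distinct file.
    """
    return {row["file_digest"] for row in rows}


def find_original_digest(target_set, digest_by_file_id, file_id):
    """Identify the summary code of the file to download.

    digest_by_file_id is a hypothetical stand-in for the pre-stored
    correspondence between summary codes and file identifiers.
    """
    digest = digest_by_file_id[file_id]  # KeyError if the file is unknown
    if digest not in target_set:
        raise LookupError(f"no slices stored for file {file_id!r}")
    return digest
```

The returned code can then serve as the query key for fetching the per-slice metadata of that one file.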

In one embodiment, the metadata includes the serial number of the file slice and the offset of the file slice in the topic partition, and the second acquisition module 502 is configured to: search for the offset of the file slice in the topic partition according to the serial number of the file slice; and obtain the file slice from the topic partition according to the offset of the file slice in the topic partition until the multiple file slices are obtained.

In one embodiment, the metadata also includes an information summary code of the corresponding file slice, and the device also includes: a file slice verification module, configured to verify each of the multiple file slices according to the information summary code of the corresponding file slice included in the metadata, before the file processing stream is used to merge the multiple file slices according to the metadata of each file slice; and a trigger module, configured to trigger the merging module 503 to execute the step of merging the multiple file slices according to the metadata of each file slice using the file processing stream when all of the multiple file slices have passed the verification.

In one embodiment, the merging module 503 is configured to: use the file processing stream to write the data in the multiple file slices into a designated file in the order of the serial numbers of the file slices to obtain the target file.
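As an illustration of merging by serial number, the sketch below writes slices into a designated file in ascending serial order even when they were collected out of order. The function name `merge_slices` and the dictionary layout (serial number mapped to slice bytes) are assumptions made for this example; an ordinary binary file handle plays the role of the file processing stream.

```python
def merge_slices(slices_by_serial, target_path):
    """Write slices to target_path in ascending serial-number order.

    slices_by_serial: dict mapping serial number -> slice bytes
        (hypothetical layout for this sketch).
    Returns the total number of bytes written.
    """
    written = 0
    # The binary file handle acts as the file processing stream; leaving
    # the "with" block closes it and completes the target file.
    with open(target_path, "wb") as stream:
        for serial in sorted(slices_by_serial):
            written += stream.write(slices_by_serial[serial])
    return written
```

A stricter variant could also check that the serial numbers are contiguous before writing, so that a missing slice is detected before a truncated target file is produced.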

In one embodiment, the device further includes: a file verification module, configured to verify the target file according to the information summary code of the original file.
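Verifying the target file amounts to recomputing its summary code and comparing it with the code recorded for the original file. The sketch below assumes MD5 (the text does not name the algorithm) and reads the file in fixed-size chunks so that a large target file need not be held in memory at once; `file_digest` and `verify_target` are illustrative names.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Compute the digest of a file by reading it chunk by chunk."""
    digest = hashlib.md5()  # assumption: MD5 is the digest algorithm
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_target(path, original_digest):
    """Return True when the target file matches the original file's code."""
    return file_digest(path) == original_digest
```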

The device of the present application obtains the metadata of each file slice in multiple file slices from a distributed database, the multiple file slices and the metadata of each file slice being obtained by slicing the original file at the data production end; according to the metadata of each file slice, the matching file slices are obtained from the topic partition of the message publishing and subscription system; a file processing stream is created, and the file processing stream is used to merge the multiple file slices according to the metadata of each file slice to obtain the target file. That is, the present application can slice a large file, transmit the file slices through the message publishing and subscription system, and store the metadata of the file slices in the distributed database. With the help of file segmentation technology and the distributed database, large files can be transmitted using the message publishing and subscription system, which improves the efficiency of file transmission.

FIG. 6 is another schematic structural diagram of the file transmission device provided by the present application, which is suitable for executing the file transmission method provided by the present application and is applied to the data production end. As shown in FIG. 6, the device may include:

The segmentation module 601 is configured to obtain an original file and segment the original file to obtain multiple file slices and metadata of each file slice; the sending module 602 is configured to send the multiple file slices to the topic partition of the message publishing and subscription system; the storage module 603 is configured to store the metadata of each file slice in a distributed database, so that after the data consumer obtains the metadata of each file slice from the distributed database and obtains the multiple file slices from the topic partition, the multiple file slices are merged according to the metadata of each file slice to obtain the target file.
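The producer-side steps (segment, send, store metadata) can be sketched as below. As before, this is an illustration under stated assumptions rather than the implementation of the present application: the partition and the database are plain lists, MD5 is assumed for the summary codes, and the metadata keys are hypothetical. The offset is recorded after the slice has been appended, since in this model it is only known once the slice is in the partition.

```python
import hashlib

SLICE_SIZE = 1024 * 1024  # upper bound per slice; 1 MB follows the example flow


def slice_file(data: bytes, slice_size: int = SLICE_SIZE):
    """Split data into slices and build one metadata record per slice."""
    file_digest = hashlib.md5(data).hexdigest()  # assumption: MD5
    slices, metadata = [], []
    for serial, start in enumerate(range(0, len(data), slice_size)):
        chunk = data[start:start + slice_size]
        slices.append(chunk)
        metadata.append({
            "file_digest": file_digest,               # code of the whole file
            "serial": serial,                         # position of the slice
            "digest": hashlib.md5(chunk).hexdigest(), # code of this slice
        })
    return slices, metadata


def publish(slices, metadata, partition, database):
    """Append slices to the partition and store metadata with their offsets."""
    for chunk, meta in zip(slices, metadata):
        partition.append(chunk)
        # In this in-memory model the offset is the index of the new entry.
        database.append(dict(meta, offset=len(partition) - 1))
```

In a real deployment the slice size would likely need to stay somewhat below the broker's message size limit to leave room for message overhead.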

The device of the present application obtains the original file, and divides the original file into multiple file slices and metadata of each file slice; sends the multiple file slices to the topic partition of the message publishing and subscription system, and stores the metadata of each file slice in a distributed database, so that after the data consumer obtains the metadata of each file slice from the distributed database and obtains the multiple file slices from the topic partition, the multiple file slices are merged according to the metadata of each file slice to obtain the target file. That is, the present application can slice large files, transmit the file slices through the message publishing and subscription system, and store the metadata of the file slices in the distributed database. With the help of file segmentation technology and distributed database, it realizes the transmission of large files using the message publishing and subscription system, and improves the efficiency of file transmission.

The present application also provides a file transfer system, including a data consumption end and a data production end for executing the file transfer method described in any embodiment of the present application.

The present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the file transfer method provided in any of the above embodiments when executing the program.

The present application also provides a computer-readable medium having a computer program stored thereon, and when the program is executed by a processor, the file transmission method provided in any of the above embodiments is implemented.

Referring to Figure 7, a schematic diagram of a computer system 700 suitable for implementing the electronic device of the present application is shown. The electronic device shown in Figure 7 is only an example and should not bring any limitation to the function and scope of use of the present application.

As shown in FIG. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to the program stored in the read-only memory (ROM) 702 or the program loaded from the storage part 708 to the random access memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. The input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, etc.; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage section 708 including a hard disk, etc.; and a communication section 709 including a network interface card such as a local area network (LAN) card, a modem, etc. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 710 as needed so that a computer program read therefrom is installed into the storage section 708 as needed.

According to the embodiments disclosed in the present application, the process described above with reference to the flowchart can be implemented as a computer software program. For example, the embodiments disclosed in the present application include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program includes a program code for executing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from the network through the communication part 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above-mentioned functions of the device in the system of the present application are executed.

The computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. The computer-readable storage medium may include, but is not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, RAM, ROM, an erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium containing or storing a program, which can be used by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, which carries computer-readable program code. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.

The flow chart and block diagram in the accompanying drawings illustrate the possible architecture, function and operation of the system, method and computer program product according to various embodiments of the present application. In this regard, each box in the flow chart or block diagram can represent a module, a program segment or a part of a code, and the above-mentioned module, program segment or a part of a code contains one or more executable instructions for realizing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the box can also occur in a different order from the order marked in the accompanying drawings. For example, two boxes represented in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved. It should also be noted that each box in the block diagram or flow chart, and the combination of the boxes in the block diagram or flow chart can be implemented with a dedicated hardware-based system that performs a specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.

The modules and/or units described in this application may be implemented in software or hardware. The modules and/or units described may also be provided in a processor, for example, may be described as: a processor applied to a data consumption end, including a first acquisition module, a second acquisition module, and a merging module. Alternatively, it may be described as: a processor applied to a data production end, including a segmentation module, a sending module, and a storage module. Among them, the names of these modules do not constitute limitations on the modules themselves in some cases.

As another aspect, the present application further provides a computer-readable medium, which may be included in the device described in the above embodiment, or may exist independently without being assembled into the device. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, the device is caused to perform the following:

The metadata of each file slice in the multiple file slices is obtained from the distributed database. The multiple file slices and the metadata of each file slice are obtained by slicing the original file at the data production end; according to the metadata of each file slice, the matching file slice is obtained from the topic partition of the message publishing and subscription system to obtain multiple file slices; a file processing stream is created, and the file processing stream is used to merge the multiple file slices according to the metadata of each file slice to obtain the target file.

Alternatively, the computer-readable medium carries one or more programs, and when the one or more programs are executed by a device, the device is caused to perform the following:

An original file is obtained, and the original file is sliced to obtain multiple file slices and metadata of each file slice; the multiple file slices are sent to a topic partition of a message publishing and subscription system, and the metadata of each file slice is stored in a distributed database, so that after a data consumer obtains the metadata of each file slice from the distributed database and obtains the multiple file slices from the topic partition, the multiple file slices are merged according to the metadata of each file slice to obtain a target file.

According to the technical solution of the present application, the metadata of each file slice in the multiple file slices can be obtained from the distributed database, and the multiple file slices and the metadata of each file slice are obtained by slicing the original file at the data production end; according to the metadata of each file slice, the matching multiple file slices are obtained from the topic partition of the message publishing and subscription system; a file processing stream is created, and the file processing stream is used to merge the multiple file slices according to the metadata of each file slice to obtain the target file. That is, the present application can slice large files, transmit the file slices through the message publishing and subscription system, and store the metadata of the file slices in the distributed database. With the help of file slicing technology and distributed database, the transmission of large files using the message publishing and subscription system is realized, thereby improving the efficiency of file transmission.

It should be understood that steps may be reordered, added or deleted using the various forms of processes shown above. For example, the multiple steps recorded in this application can be executed in parallel, sequentially or in different orders, as long as the expected results of the technical solution of this application can be achieved, which is not limited herein.

The above specific implementations do not constitute limitations on the protection scope of this application.

Claims

A file transmission method, applied to a data consumer, comprising:

Obtaining metadata of each file slice in a plurality of file slices from a distributed database, wherein the plurality of file slices and the metadata of each file slice are obtained by slicing the original file at the data production end;

According to the metadata of each file slice, a matching file slice is obtained from the topic partition of the message publishing and subscription system to obtain the multiple file slices;

A file processing stream is created, and the plurality of file slices are merged according to the metadata of each file slice by using the file processing stream to obtain a target file.
The file transmission method according to claim 1, wherein before obtaining the metadata of each file slice in the plurality of file slices from the distributed database, the method further comprises:

Obtaining information summary codes of multiple files by querying the distributed database;

Writing the information summary codes of the multiple files into a preset set to obtain an original summary code set;

The original summary code set is deduplicated to obtain a target summary code set.
The file transmission method according to claim 2, wherein the step of obtaining metadata of each file slice in the plurality of file slices from a distributed database comprises:

Identifying the information summary code of the original file from the target summary code set;

The metadata of each file slice in the plurality of file slices is obtained from the distributed database according to the information summary code of the original file.
The file transmission method according to claim 1, wherein the metadata includes a serial number of the file slice and an offset of the file slice in the topic partition;

The step of acquiring a matching file slice from a topic partition of a message publishing and subscription system according to the metadata of each file slice to obtain the multiple file slices includes:

Finding the offset of the file slice in the topic partition according to the serial number of the file slice;

Obtaining the file slice from the topic partition according to the offset of the file slice in the topic partition until the plurality of file slices are obtained.
The file transmission method according to claim 4, wherein the metadata further includes an information summary code of the file slice;

Before merging the multiple file slices according to the metadata of each file slice by using the file processing stream, the method further includes:

Verifying each file slice of the plurality of file slices according to the information summary code of the file slice included in the metadata;

In the case that the multiple file slices all pass the verification, triggering the step of merging the multiple file slices according to the metadata of each file slice by using the file processing stream.
The file transmission method according to claim 4, wherein the step of merging the plurality of file slices according to the metadata of each file slice using the file processing stream to obtain a target file comprises:

Using the file processing stream to write the data in the multiple file slices into a designated file according to the serial number of each file slice to obtain the target file.
The file transmission method according to claim 3, further comprising:

The target file is verified according to the information summary code of the original file.
A file transmission method, applied to a data production end, comprising:

Acquire an original file, and slice the original file to obtain multiple file slices and metadata of each file slice;

The multiple file slices are sent to the topic partition of the message publishing and subscription system, and the metadata of each file slice is stored in the distributed database, so that after the data consumer obtains the metadata of each file slice from the distributed database and obtains the multiple file slices from the topic partition, the multiple file slices are merged according to the metadata of each file slice to obtain the target file.
A file transmission device, applied to a data consumption end, comprising:

A first acquisition module is configured to acquire metadata of each file slice in a plurality of file slices from a distributed database, wherein the plurality of file slices and the metadata of each file slice are obtained by segmenting the original file at the data production end;

A second acquisition module is configured to acquire matching file slices from a topic partition of a message publishing and subscription system according to the metadata of each file slice to obtain the multiple file slices;

The merging module is configured to create a file processing stream, and use the file processing stream to merge the multiple file slices according to the metadata of each file slice to obtain a target file.
A file transmission device, applied to a data production end, comprising:

A segmentation module, configured to obtain an original file and segment the original file to obtain a plurality of file slices and metadata of each file slice;

A sending module, configured to send the plurality of file slices to a topic partition of a message publishing and subscription system;

The storage module is configured to store the metadata of each file slice into a distributed database, so that after the data consumer obtains the metadata of each file slice from the distributed database and obtains the multiple file slices from the topic partition, the multiple file slices are merged according to the metadata of each file slice to obtain the target file.
A file transmission system comprises a data consumption end for executing the file transmission method as claimed in any one of claims 1 to 7 and a data production end for executing the file transmission method as claimed in claim 8.
An electronic device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein when the processor executes the program, the file transfer method as described in any one of claims 1 to 7 is implemented, or when the processor executes the program, the file transfer method as described in claim 8 is implemented.
A computer-readable storage medium having a computer program stored thereon, wherein when the program is executed by a processor, the file transmission method according to any one of claims 1 to 7 is implemented, or when the program is executed by a processor, the file transmission method according to claim 8 is implemented.