CN115801765A - File transmission method, device, system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115801765A
Authority
CN
China
Prior art keywords
file
slices
slice
metadata
distributed database
Prior art date
Legal status
Pending
Application number
CN202211434252.6A
Other languages
Chinese (zh)
Inventor
官祥臻
王钰涵
赵天武
桂林
Current Assignee
Gongfu Qingdao Technology Co ltd
Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Original Assignee
Gongfu Qingdao Technology Co ltd
Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Priority date
Filing date
Publication date
Application filed by Gongfu Qingdao Technology Co ltd and Cosmoplat Industrial Intelligent Research Institute Qingdao Co Ltd
Priority to CN202211434252.6A
Publication of CN115801765A
Priority to PCT/CN2023/103618 (published as WO2024103752A1)
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file transmission method, device, system, electronic equipment and storage medium. The method is applied to a data consumption end and comprises: acquiring metadata of each of a plurality of file slices from a distributed database, where the file slices and their metadata are obtained by a data production end segmenting an original file; acquiring the matching file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice; and creating a file processing stream and using it to merge the file slices according to the metadata of each file slice to obtain a target file. By slicing large files, transmitting the slices through the message publishing and subscribing system and storing the slice metadata in the distributed database, the invention combines file splitting with a distributed database so that large files can be transmitted through a message publishing and subscribing system, improving file transmission efficiency.

Description

File transmission method, device, system, electronic equipment and storage medium
Technical Field
The present invention relates to data transmission technologies, and in particular, to a file transmission method, device, system, electronic device, and storage medium.
Background
Kafka is a distributed message publish-subscribe system with high throughput, low latency and high availability. It typically carries structured, log-type data, and by default a single Kafka message may not exceed 1MB, so larger binary files (such as videos, pictures and compressed packages) cannot be transmitted through Kafka.
Disclosure of Invention
The invention provides a file transmission method, a file transmission device, a file transmission system, electronic equipment and a storage medium, which enable large files to be transmitted through a message publishing and subscribing system and improve file transmission efficiency.
In a first aspect, the present invention provides a file transmission method applied to a data consumption end, the method including:
acquiring metadata of each of a plurality of file slices from a distributed database, where the file slices and their metadata are obtained by a data production end segmenting an original file;
acquiring the plurality of file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice;
and creating a file processing stream, and using the file processing stream to merge the plurality of file slices according to the metadata of each file slice to obtain a target file.
In a second aspect, the present invention provides a file transmission method applied to a data production end, the method including:
acquiring an original file, and segmenting the original file to obtain a plurality of file slices and metadata of each file slice;
and sending the file slices to a topic partition of a message publishing and subscribing system, and storing the metadata of each file slice in a distributed database, so that a data consumption end acquires the metadata of each file slice from the distributed database, acquires the file slices from the topic partition, and merges them according to the metadata of each file slice to obtain a target file.
In a third aspect, the present invention provides a file transmission device applied to a data consumption end, the device comprising a first acquisition module, a second acquisition module and a merging module, wherein:
the first acquisition module is used for acquiring metadata of each of a plurality of file slices from a distributed database, where the file slices and their metadata are obtained by a data production end segmenting an original file;
the second acquisition module is used for acquiring the plurality of file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice;
and the merging module is used for creating a file processing stream and using it to merge the file slices according to the metadata of each file slice to obtain a target file.
In a fourth aspect, the present invention provides a file transmission device applied to a data production end, the device comprising a segmentation module, a sending module and a storage module, wherein:
the segmentation module is used for acquiring an original file and segmenting it to obtain a plurality of file slices and metadata of each file slice;
the sending module is used for sending the plurality of file slices to a topic partition of a message publishing and subscribing system;
and the storage module is used for storing the metadata of each file slice in a distributed database, so that a data consumption end, after acquiring the metadata of each file slice from the distributed database and the file slices from the topic partition, merges the file slices according to that metadata to obtain a target file.
In a fifth aspect, the present invention provides a file transfer system, which includes a data consuming side and a data producing side for executing the file transfer method according to any embodiment of the present invention.
In a sixth aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the file transfer method according to any embodiment of the present invention when executing the computer program.
In a seventh aspect, the present invention provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements a file transfer method according to any one of the embodiments of the present invention.
In the scheme of the invention, the metadata of each of a plurality of file slices is obtained from a distributed database, the file slices and their metadata having been produced by a data production end segmenting the original file; the file slices are acquired from a topic partition of a message publishing and subscribing system according to that metadata; and a file processing stream is created and used to merge the file slices according to the metadata to obtain the target file. Large files are thus sliced, the slices are transmitted through the message publishing and subscribing system, and the slice metadata is stored in the distributed database; the data consumption end uses the metadata obtained from the distributed database to fetch the slices from the message publishing and subscribing system and merges them into the required target file. Combining file splitting with a distributed database in this way allows large files to be transmitted through a message publishing and subscribing system and improves file transmission efficiency.
Drawings
In order to more clearly illustrate the technical solution of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to these drawings without inventive effort.
FIG. 1 is a schematic flow chart of a file transmission method provided by the present invention;
FIG. 2 is another schematic flow chart of a file transmission method provided by the present invention;
FIG. 3 is another schematic flow chart of a file transmission method provided by the present invention;
FIG. 4 is an exemplary flow chart of a file transmission method provided by the present invention;
FIG. 5 is a schematic structural diagram of a file transmission device provided by the present invention;
FIG. 6 is another schematic structural diagram of a file transmission device provided by the present invention;
FIG. 7 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic flow chart of a file transfer method provided by the present invention, which can be executed by a file transfer apparatus provided by the present invention, and the apparatus can be implemented by software and/or hardware. In a specific embodiment, the apparatus may be integrated in the data consuming side, in particular, may be integrated in an electronic device of the data consuming side, which may be, for example, a computer. The following embodiments will be described by taking as an example that the apparatus is integrated in an electronic device at a data consumer side.
Before introducing the processing procedure of the data consuming side, the processing procedure of the data producing side is introduced, which may specifically be as follows:
In this embodiment, the files to be transmitted by the data production end may be large files such as videos, pictures and compressed packages, and there may be several of them. Such files usually exceed 1MB, while each message transmitted by the message publishing and subscribing system Kafka may not exceed 1MB, so they cannot be delivered to the consumer by the message publishing and subscribing system directly. To handle this, before the files are transmitted through the message publishing and subscribing system, each file may be sliced according to the system's limit on message size, for example into slices smaller than 1MB, to obtain a plurality of file slices per file. Each file slice has metadata, that is, data describing the file slice.
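The slicing step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `FileSlice` and `split_stream` and the safety margin below 1MB are hypothetical. Reading the source with a bounded `read()` keeps only one slice in memory at a time, which matters for the large files the method targets.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Iterator

# Assumption: keep each slice a little under 1MB (1024 * 1024 bytes) so that
# per-record overhead does not push a message past the broker's limit.
MAX_SLICE_BYTES = 1024 * 1024 - 4096


@dataclass
class FileSlice:
    seq: int     # 1-based position of the slice in the original file
    data: bytes  # raw bytes of the slice


def split_stream(stream: BinaryIO, max_size: int = MAX_SLICE_BYTES) -> Iterator[FileSlice]:
    """Yield consecutive slices of at most max_size bytes from a binary stream."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    seq = 1
    while chunk := stream.read(max_size):
        yield FileSlice(seq, chunk)
        seq += 1


# Usage: a 2500-byte payload cut into slices of at most 1000 bytes.
demo = list(split_stream(io.BytesIO(b"x" * 2500), max_size=1000))
print([(s.seq, len(s.data)) for s in demo])  # [(1, 1000), (2, 1000), (3, 500)]
```

For a file on disk, the object returned by `open(path, "rb")` would be passed instead of `io.BytesIO`.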
After each file is divided into file slices, the slices can be uploaded to the message publishing and subscribing system, which stores data by category: each file slice belongs to a category in the system, called a Topic, and data of different topics are stored physically separately. Each Topic comprises one or more partitions, so each uploaded file slice is stored in the corresponding topic partition, and once it is stored, the offset of the file slice in the topic partition is obtained. The topic partition stores file slices as a queue, and the offset indicates the position of a file slice relative to the head of the queue. In addition, when the file is sliced, the sequence number of each file slice can be recorded, and the sequence number and the offset of each file slice in the topic partition can be used as the metadata of that file slice. The metadata may also include other information, such as the file name and the information digest code of the file, which is not limited herein.
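A toy model may make the offset bookkeeping concrete. The sketch below is an in-memory stand-in, not a Kafka client: `ToyPartition` is a hypothetical class and the partition name is made up. With a real producer the offset comes back in the broker's acknowledgement (for example `RecordMetadata.offset()` in the Java client), and records from other producers may sit between the slices of one file, so the offsets of consecutive slices need not be consecutive.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """In-memory, append-only log standing in for one topic partition."""
    name: str
    log: list = field(default_factory=list)

    def append(self, record: bytes) -> int:
        """Append a record at the tail and return its offset (position in the log)."""
        self.log.append(record)
        return len(self.log) - 1

    def read(self, offset: int) -> bytes:
        return self.log[offset]


partition = ToyPartition("file-slices-0")  # hypothetical topic-partition name
metadata = []
for seq, chunk in enumerate([b"aa", b"bb", b"cc"], start=1):
    offset = partition.append(chunk)
    # Sequence number (position in the file) + offset (position in the partition).
    metadata.append({"seq": seq, "offset": offset})

print(metadata)  # [{'seq': 1, 'offset': 0}, {'seq': 2, 'offset': 1}, {'seq': 3, 'offset': 2}]
```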
Because the message publishing and subscribing system is generally not used for long-term storage, has limited storage space and has no notion of metadata storage, in this embodiment the file slices and their metadata are handled separately: the file slices are pushed to the message publishing and subscribing system, and their metadata is uploaded to a distributed database, so that the metadata can later be queried conveniently, quickly and efficiently. Specifically, the distributed database may be a document database based on distributed file storage, such as MongoDB. MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature-rich and the most like a relational database, and it supports very loose data structures, so it can store fairly complex data types.
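The separation of slices and metadata can be sketched with an in-memory stand-in for the metadata collection. `MetadataStore` and the field names `fmd5`, `seq` and `offset` are assumptions for illustration; with the pymongo driver, the equivalent calls would be roughly `collection.insert_many(docs)` and `collection.find({"fmd5": fmd5}).sort("seq", 1)`. An index on the file digest would keep that lookup fast as the collection grows.

```python
class MetadataStore:
    """In-memory stand-in for a document collection holding slice metadata."""

    def __init__(self) -> None:
        self._docs: list = []

    def insert_many(self, docs: list) -> None:
        # Copy each document so later changes by the caller do not leak in.
        self._docs.extend(dict(d) for d in docs)

    def find_by_file(self, fmd5: str) -> list:
        """Return all slice documents of one file, ordered by sequence number."""
        matches = (d for d in self._docs if d["fmd5"] == fmd5)
        return sorted(matches, key=lambda d: d["seq"])


store = MetadataStore()
store.insert_many([
    {"fmd5": "file-a", "seq": 2, "offset": 7},
    {"fmd5": "file-b", "seq": 1, "offset": 5},
    {"fmd5": "file-a", "seq": 1, "offset": 4},
])
print([d["offset"] for d in store.find_by_file("file-a")])  # [4, 7]
```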
The following describes a processing procedure of the data consuming side, and with continued reference to fig. 1, specifically, the following steps may be included:
Step 101, obtaining metadata of each of a plurality of file slices from a distributed database, where the file slices and their metadata are obtained by the data production end segmenting an original file.
For example, the original file may be any one of the files transmitted by the data production end, depending on the specific consumption requirement of the data consumption end; that is, the original file is likewise a large file such as a video, a picture or a compressed package. Since the data production end stores the metadata of each file slice in the distributed database after segmenting the file, at consumption time the metadata of each slice of the required file can first be obtained from the distributed database.
Step 102, acquiring the matching file slices from the topic partition of the message publishing and subscribing system according to the metadata of each file slice, to obtain the plurality of file slices.
The metadata of each file slice comprises the sequence number of the file slice and its offset in the topic partition. The sequence number of a file slice is its position in the original binary file; for example, a 1GB file cut into 1024 slices may have slices numbered 1 to 1024. In the message publishing and subscribing system, partition data is stored as an index and a log: the index records offset information, and the log stores the data itself. The data consumption end looks up the offset of each file slice in the topic partition by its sequence number and fetches the matching file slice from the topic partition at that offset for consumption. For example, on first consumption the data consumption end starts from the file slice at offset 0, consumes up to offset 8, and records its position as 8. The data consumption end can consume at most up to the largest offset written by the data production end; once consumption reaches that offset, acquisition of the file slices is complete.
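The consumer-side lookup in step 102 can be sketched as: map each sequence number to its recorded offset, then read the slice at that offset, refusing offsets the producer has not written yet. The partition is modelled as a plain list and `fetch_slices` is a hypothetical helper; a real Kafka consumer would instead assign itself the partition and seek to the offset (for example `KafkaConsumer.assign()` and `KafkaConsumer.seek()` in kafka-python) before polling.

```python
def fetch_slices(metadata: list, partition_log: list) -> dict:
    """Return {sequence number: slice bytes} read at each slice's recorded offset."""
    offset_by_seq = {m["seq"]: m["offset"] for m in metadata}
    # One past the largest offset the producer has written so far.
    end_offset = len(partition_log)
    slices = {}
    for seq in sorted(offset_by_seq):
        offset = offset_by_seq[seq]
        if not 0 <= offset < end_offset:
            raise LookupError(f"slice {seq}: offset {offset} has not been written")
        slices[seq] = partition_log[offset]
    return slices


log = [b"unrelated record", b"part-1", b"part-2"]
meta = [{"seq": 2, "offset": 2}, {"seq": 1, "offset": 1}]
print(fetch_slices(meta, log))  # {1: b'part-1', 2: b'part-2'}
```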
Step 103, creating a file processing stream, and merging the multiple file slices according to the metadata of each file slice by using the file processing stream to obtain a target file.
In a specific implementation, the file processing stream may include a file read stream and a file write stream: the read stream reads data from the file slices one after another in sequence-number order, and the write stream writes the data read into a designated file. The file slices are thereby merged through file streams, and the target file is restored.
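Step 103 can be sketched with Python file objects playing the role of the read and write streams. `merge_slices` is a hypothetical name; in-memory `io.BytesIO` objects are used so the example runs without touching the disk, whereas with real files the slices would be opened with `open(path, "rb")` and the target with `open(path, "wb")`. `shutil.copyfileobj` copies in bounded chunks, so no slice needs to be held in memory whole.

```python
import io
import shutil
from typing import BinaryIO


def merge_slices(slices: dict, out: BinaryIO) -> None:
    """Copy each slice's read stream into the write stream in sequence order."""
    for seq in sorted(slices):
        shutil.copyfileobj(slices[seq], out)


parts = {2: io.BytesIO(b"world"), 1: io.BytesIO(b"hello ")}
target = io.BytesIO()
merge_slices(parts, target)
print(target.getvalue())  # b'hello world'
```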
According to this scheme, the data production end slices the original file into a plurality of file slices, pushes the slices to the message publishing and subscribing system, and stores the metadata of each slice in the distributed database; the data consumption end obtains the metadata of each slice from the distributed database, fetches the matching slices from the topic partition of the message publishing and subscribing system according to that metadata, and creates a file processing stream to merge the slices according to the metadata into the target file. File splitting and a distributed database together thus allow large files to be transmitted through a message publishing and subscribing system and improve file transmission efficiency.
Fig. 2 is another schematic flow chart of the file transmission method provided by the present invention, further illustrating the method. The method can be executed by a file transmission apparatus integrated in an electronic device at the data consumption end, such as a computer. The following embodiment takes the apparatus integrated in an electronic device at the data consumption end as an example; as shown in fig. 2, the method may include the following steps:
Step 201, querying the distributed database to obtain the information digest codes of a plurality of files.
The information digest code is a 128-bit feature code obtained by transforming the original information with a public message digest algorithm. The code is irreversible and highly discrete, which makes it practically unique to a file or file slice. For example, the information digest code may be an MD5 code, and each file has its own MD5 code (the file MD5, fmd5 for short).
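For MD5 specifically, Python's standard `hashlib` module produces the 128-bit digest described above (16 bytes, usually written as 32 hexadecimal characters). Note that MD5 reliably detects accidental corruption, but it is no longer collision-resistant, so it offers weak protection against deliberate tampering; a SHA-256 digest could be substituted if that matters.

```python
import hashlib

# Digest of a whole (tiny) example file; real files would be hashed in chunks.
fmd5 = hashlib.md5(b"contents of the original file").hexdigest()
print(len(fmd5), hashlib.md5().digest_size * 8)  # 32 128

# Known test vector from RFC 1321.
print(hashlib.md5(b"abc").hexdigest())  # 900150983cd24fb0d6963f7d28e17f72
```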
Step 202, writing the information digest codes of the plurality of files into a preset set to obtain an original digest code set.
Specifically, an empty data set may be constructed, and the MD5 codes of the files queried from the distributed database are written into it to obtain an original digest code set containing multiple fmd5 codes.
Step 203, performing deduplication on the original digest code set to obtain a target digest code set.
Specifically, deduplication avoids keeping repeated entries, which would otherwise produce a large amount of redundant data. After deduplication a new data set is obtained, namely the set of target fmd5 codes, and the number of fmd5 codes in this set equals the number of files transmitted through Kafka. Because each target digest code is unique, the data consumption end can use it to acquire all the file slices of a given file from the topic partition.
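Steps 202 and 203 amount to collecting the file digest from every metadata record and removing repeats, which a Python set does directly. The field name `fmd5` and the helper name are assumptions; repeats arise because every slice's metadata record carries the digest of its original file.

```python
def distinct_file_digests(metadata_docs: list) -> set:
    """Build the original digest set (with repeats) and return its de-duplicated form."""
    original_set = [doc["fmd5"] for doc in metadata_docs]  # one entry per slice
    return set(original_set)                               # one entry per file


docs = [{"fmd5": "f1", "seq": 1}, {"fmd5": "f1", "seq": 2}, {"fmd5": "f2", "seq": 1}]
target_set = distinct_file_digests(docs)
print(sorted(target_set), len(target_set))  # ['f1', 'f2'] 2
```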
Step 204, identifying the information digest code of the original file from the target digest code set.
For example, the correspondence between each file's information digest code and its file identifier may be stored in advance; the file identifier may be the file name, and the information digest code may be the file's MD5 code (the fmd5 code). Using this pre-stored correspondence, the information digest code of the original file, i.e. the file the consumer currently needs, is identified from the target digest code set.
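Step 204 can then be a dictionary lookup against the pre-stored correspondence between file identifiers and digests, confirmed against the de-duplicated set. The mapping, the file names and the helper name below are hypothetical.

```python
def digest_for_file(file_name: str, name_to_fmd5: dict, target_set: set) -> str:
    """Return the digest of the wanted file, if its slices are known to the system."""
    fmd5 = name_to_fmd5.get(file_name)
    if fmd5 is None or fmd5 not in target_set:
        raise LookupError(f"no transmitted file named {file_name!r}")
    return fmd5


mapping = {"demo.mp4": "f1", "photo.png": "f2"}
print(digest_for_file("demo.mp4", mapping, {"f1", "f2"}))  # f1
```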
Step 205, obtaining metadata of each file slice in the plurality of file slices from the distributed database according to the information digest code of the original file.
That is, using the information digest code of the original file, the metadata of all file slices of the original file is acquired from the distributed database in one query. Illustratively, the metadata of each file slice may include the following:
Sequence number: the position of the file slice in the binary original file; for example, an original file of 1GB cut into 1024 file slices may number them 1 to 1024.
Information digest code: includes the information digest code of the original file (fmd5 code) and that of each file slice (md5 code).
Offset: the position of the file slice in the topic partition of the message publishing and subscribing system.
End state: indicates whether the current slice is the last slice.
File name: the name of the file, including its suffix.
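One possible shape for such a metadata record, written as a Python dataclass; the field names are assumptions, and `asdict` turns an instance into a plain dictionary suitable for a document database such as MongoDB.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SliceMetadata:
    seq: int        # sequence number: position of the slice in the original file (1..n)
    md5: str        # information digest code of this slice
    fmd5: str       # information digest code of the whole original file
    offset: int     # offset of the slice in the topic partition
    is_last: bool   # end state: True only for the final slice
    file_name: str  # file name with suffix, e.g. "demo.mp4"


doc = asdict(SliceMetadata(1024, "slice-md5", "file-md5", 42, True, "demo.mp4"))
print(doc["seq"], doc["is_last"])  # 1024 True
```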
Step 206, looking up the offset of each file slice in the topic partition according to its sequence number.
In the distributed database, file slices are distinguished by sequence number, while in the topic partition they are distinguished by offset; so once the metadata of each file slice is obtained, the offset of each slice in the topic partition can be looked up from the sequence number in the metadata. Each message in the topic partition has its own unique offset, which indicates the position of the message in the partition.
Step 207, acquiring the matching file slices from the topic partition according to the offset of each file slice in the topic partition.
Step 208, verifying the acquired file slices according to the information digest codes of the file slices included in the metadata.
Specifically, an acquired file slice may first be checked by sequence number: if its sequence number is the same as that of the file slice to be acquired, the sequence-number check passes. The slice is then checked by information digest code: the digest of the file slice acquired from the topic partition is calculated and compared with the slice's digest in the metadata; if they are the same, the check passes, and if they differ, the check fails. This ensures that the file slices transmitted through the topic partition are correct and have not been tampered with.
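The two-stage check of step 208 might look like this. The assumption here is that the consumer knows each received slice's sequence number (for example from the record key or a header, which the source does not specify); `verify_slice` is a hypothetical name.

```python
import hashlib


def verify_slice(expected: dict, received_seq: int, data: bytes) -> bool:
    """Sequence-number check first, then compare the recomputed MD5 with the metadata."""
    if received_seq != expected["seq"]:
        return False
    return hashlib.md5(data).hexdigest() == expected["md5"]


meta = {"seq": 1, "md5": hashlib.md5(b"slice bytes").hexdigest()}
print(verify_slice(meta, 1, b"slice bytes"), verify_slice(meta, 1, b"slice bytez"))  # True False
```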
Step 209, determining whether the verification passes; if it passes, step 210 is executed, and if not, the process returns to step 207.
If the verification fails, the corresponding file slice is pulled again and re-verified, until all file slices of the original file have been obtained.
Step 210, acquiring the next file slice, and so on until all of the file slices have been acquired.
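Steps 207 to 210 form a fetch-verify-retry loop. The sketch below bounds the number of re-pulls with a hypothetical `max_attempts` parameter, whereas the text above simply repeats until the slice verifies; an unbounded loop would hang on a slice that is permanently corrupt. `fetch` stands in for reading one record at a given offset, and `fetch_verified` is a hypothetical name.

```python
import hashlib
from typing import Callable


def fetch_verified(metadata: list, fetch: Callable[[int], bytes],
                   max_attempts: int = 3) -> dict:
    """Fetch every slice by offset, re-pulling any slice whose MD5 does not match."""
    result = {}
    for meta in sorted(metadata, key=lambda m: m["seq"]):
        for _ in range(max_attempts):
            data = fetch(meta["offset"])
            if hashlib.md5(data).hexdigest() == meta["md5"]:
                result[meta["seq"]] = data
                break  # verified: move on to the next slice
        else:
            raise RuntimeError(f"slice {meta['seq']} failed verification {max_attempts} times")
    return result


# Usage: the second slice arrives corrupted once and is pulled again.
responses = iter([b"a", b"corrupt", b"b"])
meta = [{"seq": 1, "offset": 0, "md5": hashlib.md5(b"a").hexdigest()},
        {"seq": 2, "offset": 1, "md5": hashlib.md5(b"b").hexdigest()}]
print(fetch_verified(meta, lambda offset: next(responses)))  # {1: b'a', 2: b'b'}
```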
Step 211, creating a file processing stream, and using it to write the data in the file slices into a designated file in sequence-number order to obtain the target file.
Specifically, when all of the file slices pass verification, the data transmission is complete and no error or omission occurred in transit. When the end state in the metadata is detected, all file slices have been acquired, which triggers the step of merging the file slices according to the metadata of each file slice using the file processing stream.
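The merge trigger can be sketched as a small predicate. Besides the end-state flag named in the text, it also checks that no sequence number is missing before the last slice; that gap check is an added assumption for robustness, and `ready_to_merge` is a hypothetical name.

```python
def ready_to_merge(received: dict) -> bool:
    """received maps the sequence number of each verified slice to its end-state flag."""
    last = [seq for seq, is_last in received.items() if is_last]
    if len(last) != 1:
        return False  # end slice not seen yet (or flagged more than once)
    # Every sequence number from 1 up to the last slice must be present.
    return set(received) == set(range(1, last[0] + 1))


print(ready_to_merge({1: False, 2: False, 3: True}))  # True
print(ready_to_merge({1: False, 3: True}))            # False: slice 2 missing
```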
To merge the file slices, a file processing stream can be created, comprising a file read stream and a file write stream. The file slices are sorted by sequence number so that they are in the same order as in the file before transmission; the read stream reads data from the slices in that order, and the write stream writes the data read into the designated file. The file slices are thereby merged through file streams, and the target file is restored.
Step 212, verifying the target file according to the information digest code of the original file.
In a specific implementation, the information digest code of the target file can be calculated and compared with that of the original file; if they match, the target file passes verification and the file transmission process is complete.
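Because the target file is large, its digest is best computed incrementally rather than by loading it whole. A minimal sketch with `hashlib`, where `md5_of_stream` and `verify_target` are hypothetical names and the 1MB chunk size is an arbitrary choice:

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a stream chunk by chunk so memory use stays bounded."""
    h = hashlib.md5()
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()


def verify_target(stream: BinaryIO, fmd5: str) -> bool:
    """Compare the merged file's digest with the original file's digest."""
    return md5_of_stream(stream) == fmd5


merged = io.BytesIO(b"abc")
print(verify_target(merged, "900150983cd24fb0d6963f7d28e17f72"))  # True
```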
According to the scheme, a data consumption end obtains metadata of each file slice in a plurality of file slices from a distributed database, and the plurality of file slices and the metadata of each file slice are obtained by segmenting an original file through a data production end; acquiring a plurality of file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice; and creating a file processing stream, and combining a plurality of file slices according to the metadata of each file slice by using the file processing stream to obtain the target file. The invention can slice the large file, transmit the file slices through the message publishing and subscribing system, store the metadata of the file slices in the distributed database, realize the transmission of the large file by using the message publishing and subscribing system by virtue of the file splitting technology and the distributed database, and improve the file transmission efficiency.
Fig. 3 is another schematic flowchart of the file transmission method provided by the present invention. The method can be executed by a file transmission apparatus integrated in an electronic device at the data production end, such as a computer. The following embodiment takes as its example the case where the apparatus is integrated in such an electronic device. As shown in Fig. 3, the method may include the following steps:
Step 301: obtain an original file and split it to obtain a plurality of file slices and the metadata of each file slice.
Specifically, at the data production end, the binary original file can be pulled using Flume. Flume is a highly available, highly reliable, distributed system for collecting, aggregating, and transmitting massive volumes of logs. It can collect data from sources such as logs and events and store the large amounts of data from each source centrally. When the binary file is pulled through Flume, the slice size can be customized to fit Kafka's message size limit and performance tuning. For example, the original file is divided into file slices of no more than 1 MB each so as to satisfy Kafka's transmission conditions, and the metadata of each file slice is obtained at the same time.
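The slicing rule can be sketched without Flume as a plain chunked reader. This is an illustrative assumption, not the Flume configuration the patent refers to. The names split_stream and MAX_SLICE_BYTES are hypothetical, and the 1 MB cap mirrors the example above. Because Kafka's size limits also count record and batch overhead, a real deployment may need a payload cap slightly below the configured limit.

```python
import io
from typing import BinaryIO, Iterator, Tuple

# Hypothetical cap mirroring the "no more than 1 MB" example in the text.
MAX_SLICE_BYTES = 1024 * 1024


def split_stream(stream: BinaryIO,
                 max_slice_bytes: int = MAX_SLICE_BYTES) -> Iterator[Tuple[int, bytes]]:
    """Yield (sequence_number, payload) pairs; no payload exceeds max_slice_bytes."""
    if max_slice_bytes <= 0:
        raise ValueError("max_slice_bytes must be positive")
    seq = 0
    while True:
        payload = stream.read(max_slice_bytes)  # returns fewer bytes only at end of file
        if not payload:
            break
        yield seq, payload
        seq += 1


if __name__ == "__main__":
    sizes = [len(p) for _, p in split_stream(io.BytesIO(b"\0" * 2_500_000))]
    print(sizes)
```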
Step 302: send the file slices to a topic partition of the message publishing and subscribing system, and store the metadata of each file slice in the distributed database.
The metadata comprises the message digest of the corresponding file slice, the offset of that file slice in the topic partition, the sequence number of that file slice, a slice end flag, the full name of the original file, and the message digest of the original file. Specifically, the slice end flag appears only in the metadata of the last file slice. The full name and the message digest of the original file identify the file and ensure its uniqueness. The file slices are then sent to the topic partition of the message publishing and subscribing system. Because no slice exceeds 1 MB, the slices can be transmitted stably through the topic partition, and the metadata of each file slice is stored in the distributed database. For the subsequent data processing, refer to the foregoing embodiments; the details are not repeated here.
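The metadata record and the production-side flow of Steps 301 and 302 can be sketched as follows. Everything here is a stand-in under stated assumptions. A Python list models the topic partition, with the list index serving as the offset. A dict keyed by file digest models the distributed database. SHA-256 is assumed as the digest, and the class and field names are hypothetical. With a real Kafka producer, the offset would come from the broker's acknowledgement of each send (for example, RecordMetadata in the Java client). The sketch writes the metadata after the slices have been sent, because the offsets and the digest of the whole file are only known at that point.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, Dict, List


@dataclass(frozen=True)
class SliceMetadata:
    slice_digest: str  # digest of this slice's payload
    offset: int        # position of the slice in the topic partition
    seq: int           # sequence number of the slice within the file
    is_last: bool      # slice end flag: True only for the final slice
    file_name: str     # full name of the original file
    file_digest: str   # digest of the whole original file


def publish_file(file_name: str, stream: BinaryIO, partition: List[bytes],
                 metadata_db: Dict[str, List[SliceMetadata]],
                 max_slice_bytes: int = 1024 * 1024) -> str:
    """Slice a file, 'send' each slice, then store one metadata record per slice."""
    file_hash = hashlib.sha256()
    sent = []  # (seq, offset, slice_digest) for each slice already sent
    while True:
        payload = stream.read(max_slice_bytes)
        if not payload:
            break
        file_hash.update(payload)
        partition.append(payload)    # stand-in for producing to the topic partition
        offset = len(partition) - 1  # stand-in for the offset acknowledged by the broker
        sent.append((len(sent), offset, hashlib.sha256(payload).hexdigest()))
    file_digest = file_hash.hexdigest()
    metadata_db[file_digest] = [
        SliceMetadata(digest, offset, seq, seq == len(sent) - 1, file_name, file_digest)
        for seq, offset, digest in sent
    ]
    return file_digest


if __name__ == "__main__":
    partition: List[bytes] = []
    db: Dict[str, List[SliceMetadata]] = {}
    key = publish_file("report.bin", io.BytesIO(b"x" * 2500), partition, db, max_slice_bytes=1000)
    print(len(partition), [m.is_last for m in db[key]])
```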
In this solution, the data production end obtains an original file and splits it to obtain a plurality of file slices and the metadata of each file slice. It sends the file slices to a topic partition of the message publishing and subscribing system and stores the metadata of each file slice in the distributed database. The data consumption end obtains the metadata from the distributed database and the file slices from the topic partition, and then merges the file slices according to the metadata to obtain the target file. By slicing large files, transmitting the slices through the message publishing and subscribing system, and storing the slice metadata in the distributed database, the invention enables large files to be transferred over a message publishing and subscribing system and improves file transmission efficiency.
Fig. 4 is an exemplary flowchart of the file transmission method provided by the present invention. The flow is as follows.

The data production end first slices the original file so that no file slice is larger than 1 MB. It stores the metadata of each slice in the distributed database and then sends the file slices to a topic partition of the message publishing and subscribing system for transmission.

When the data consumption end needs to consume data, it queries the distributed database to obtain the message digests of all files. It writes these digests into a preset set, which removes duplicates automatically. After deduplication, the number of message digests in the set (the target digest set) equals the number of files. Using pre-stored correspondence information between file message digests and file identifiers, the data consumption end identifies, in the target digest set, the message digest of the original file that is currently to be downloaded. It uses that digest to obtain the metadata of each file slice of the original file from the distributed database.

For each file slice, the data consumption end looks up the offset using the sequence number in the metadata. It consumes from that offset to pull the file slice from the message publishing and subscribing system. Once the pulled slice passes verification, the data consumption end uses the file processing stream to write the slice's data into a preset file, and then pulls the next slice.

After all file slices of the original file have been pulled and written into the preset file, the file processing stream is closed, which yields the target file. The target file is then verified against the message digest of the original file. If it passes verification, the file has been pulled successfully.
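The consumer-side flow of Fig. 4 can be sketched end to end with the same kind of in-memory stand-ins. A flat list of metadata rows models the distributed database; because there is one row per slice, each file digest appears several times. A list models the topic partition. The names SliceMetadata, list_file_digests and pull_file are hypothetical, SHA-256 is assumed, and the lookup from file identifier to digest is not modelled. In a Kafka deployment, the partition[meta.offset] read would correspond to seeking a consumer to that offset (KafkaConsumer.seek in the Java client) and polling.

```python
import hashlib
import io
from dataclasses import dataclass
from typing import BinaryIO, List, Set


@dataclass(frozen=True)
class SliceMetadata:
    slice_digest: str
    offset: int
    seq: int
    is_last: bool
    file_name: str
    file_digest: str


def list_file_digests(rows: List[SliceMetadata]) -> Set[str]:
    """Each slice row repeats its file's digest; a set keeps one entry per file."""
    return {row.file_digest for row in rows}


def pull_file(file_digest: str, rows: List[SliceMetadata],
              partition: List[bytes], out: BinaryIO) -> bool:
    """Fetch, verify and write one file's slices; True if the whole-file digest matches."""
    slices = sorted((r for r in rows if r.file_digest == file_digest), key=lambda r: r.seq)
    if (not slices or [r.seq for r in slices] != list(range(len(slices)))
            or not slices[-1].is_last):
        raise ValueError("slice metadata is incomplete")
    file_hash = hashlib.sha256()
    for meta in slices:
        payload = partition[meta.offset]  # stand-in for seek(offset) + poll on a consumer
        if hashlib.sha256(payload).hexdigest() != meta.slice_digest:
            raise ValueError(f"slice {meta.seq} failed digest verification")
        out.write(payload)            # file processing stream into the preset file
        file_hash.update(payload)     # hash the bytes as they are written
    return file_hash.hexdigest() == file_digest


if __name__ == "__main__":
    data = b"0123456789" * 3
    partition = [data[:16], data[16:]]
    fd = hashlib.sha256(data).hexdigest()
    rows = [SliceMetadata(hashlib.sha256(p).hexdigest(), i, i, i == 1, "demo.bin", fd)
            for i, p in enumerate(partition)]
    target = io.BytesIO()
    ok = pull_file(next(iter(list_file_digests(rows))), rows, partition, target)
    print(ok, target.getvalue() == data)
```

Because each slice is verified as it is pulled, as in Fig. 4, a failure part-way through leaves a partially written target, which a caller would discard before retrying. Hashing the bytes as they are written gives the same digest as re-reading the finished target file.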
Fig. 5 is a schematic structural diagram of a file transmission apparatus provided by the present invention. The apparatus is suitable for executing the file transmission method provided by the present invention and is applied to the data consumption end. As shown in Fig. 5, the apparatus may specifically include:
a first obtaining module 501, configured to obtain the metadata of each of a plurality of file slices from a distributed database, where the file slices and the metadata of each file slice are obtained by the data production end splitting an original file;
a second obtaining module 502, configured to obtain the matching file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice, to obtain the plurality of file slices;
a merging module 503, configured to create a file processing stream and use it to merge the plurality of file slices according to the metadata of each file slice, to obtain a target file.
In an embodiment, the apparatus further includes a set obtaining module, configured to:
query the distributed database to obtain the message digests of a plurality of files;
write the message digests of the files into a preset set to obtain an original digest set;
and deduplicate the original digest set to obtain a target digest set.
In an embodiment, the first obtaining module 501 is specifically configured to:
identify the message digest of the original file in the target digest set;
and obtain the metadata of each of the plurality of file slices from the distributed database according to the message digest of the original file.
In an embodiment, the metadata includes the sequence number of the corresponding file slice and the offset of that file slice in the topic partition, and the second obtaining module 502 is specifically configured to:
look up the offset of the corresponding file slice in the topic partition according to the sequence number of each file slice;
and obtain the matching file slices from the topic partition according to the offset of each file slice in the topic partition, to obtain the plurality of file slices.
In an embodiment, the metadata further includes the message digest of the corresponding file slice, and the apparatus further includes:
a file slice verification module, configured to verify the file slices against the message digests of the corresponding file slices included in the metadata, before the file slices are merged according to the metadata of each file slice using the file processing stream;
a triggering module, configured to trigger the merging module 503 to merge the plurality of file slices according to the metadata of each file slice using the file processing stream when all of the file slices pass verification.
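The verify-first behaviour of the triggering module, where merging starts only after every slice has passed, can be sketched as an all-or-nothing gate. As in the other sketches, SHA-256 is assumed, and the function names failed_slices and verify_then_merge are hypothetical. Slices and expected digests are keyed by sequence number for illustration.

```python
import hashlib
import io
from typing import BinaryIO, Dict, List


def failed_slices(slices: Dict[int, bytes], expected: Dict[int, str]) -> List[int]:
    """Sequence numbers whose payload is missing or whose digest differs from the metadata."""
    return sorted(
        seq for seq, digest in expected.items()
        if seq not in slices or hashlib.sha256(slices[seq]).hexdigest() != digest
    )


def verify_then_merge(slices: Dict[int, bytes], expected: Dict[int, str],
                      out: BinaryIO) -> List[int]:
    """Merge only if every slice passes; otherwise write nothing and report failures."""
    bad = failed_slices(slices, expected)
    if not bad:
        for seq in sorted(expected):  # write in sequence-number order
            out.write(slices[seq])
    return bad


if __name__ == "__main__":
    parts = {0: b"ab", 1: b"cd"}
    meta = {seq: hashlib.sha256(p).hexdigest() for seq, p in parts.items()}
    target = io.BytesIO()
    print(verify_then_merge(parts, meta, target), target.getvalue())
```

Unlike verifying each slice as it is pulled, this variant never leaves a partially written target file. The cost is that all slices must be held (or re-read) before the merge begins.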
In an embodiment, the merging module 503 is specifically configured to:
write the data in the file slices into a specified file in order of the sequence number of each file slice, using the file processing stream, to obtain the target file.
In an embodiment, the apparatus further includes:
a file verification module, configured to verify the target file against the message digest of the original file.
The apparatus of the invention obtains the metadata of each of a plurality of file slices from a distributed database, the file slices and their metadata having been obtained by the data production end splitting an original file. It obtains the matching file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice. It then creates a file processing stream and uses it to merge the file slices according to their metadata, which yields the target file. By slicing large files, transmitting the slices through the message publishing and subscribing system, and storing the slice metadata in the distributed database, the invention enables large files to be transferred over a message publishing and subscribing system and improves file transmission efficiency.
Fig. 6 is another schematic structural diagram of the file transmission apparatus provided by the present invention. This apparatus is suitable for executing the file transmission method provided by the present invention and is applied to the data production end. As shown in Fig. 6, the apparatus may specifically include:
a segmentation module 601, configured to obtain an original file and split it to obtain a plurality of file slices and the metadata of each file slice;
a sending module 602, configured to send the plurality of file slices to a topic partition of a message publishing and subscribing system;
a storage module 603, configured to store the metadata of each file slice in a distributed database. After the data consumption end obtains the metadata of each file slice from the distributed database and obtains the file slices from the topic partition, it merges the file slices according to the metadata of each file slice to obtain a target file.
The apparatus of the invention obtains an original file and splits it to obtain a plurality of file slices and the metadata of each file slice. It sends the file slices to a topic partition of a message publishing and subscribing system and stores the metadata of each file slice in a distributed database. The data consumption end obtains the metadata from the distributed database and the file slices from the topic partition, and then merges the file slices according to the metadata to obtain the target file. By slicing large files, transmitting the slices through the message publishing and subscribing system, and storing the slice metadata in the distributed database, the invention enables large files to be transferred over a message publishing and subscribing system and improves file transmission efficiency.
The invention also provides a file transmission system comprising a data consumption end and a data production end, each configured to execute the file transmission method of any embodiment of the invention.
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, it implements the file transmission method provided by any of the foregoing embodiments.
The present invention also provides a computer-readable medium storing a computer program that, when executed by a processor, implements the file transmission method provided by any of the above embodiments.
Referring now to Fig. 7, a block diagram of a computer system 700 suitable for implementing the electronic device of the present invention is shown. The electronic device shown in Fig. 7 is only an example and does not limit the functions or scope of use of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules and/or units described in the present invention may be implemented in software or in hardware. They may also be provided in a processor, which may be described as a processor applied to a data consumption end that comprises a first obtaining module, a second obtaining module, and a merging module. Alternatively, it may be described as a processor applied to a data production end that comprises a segmentation module, a sending module, and a storage module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves.
As another aspect, the present invention also provides a computer-readable medium. The medium may be included in the apparatus described in the above embodiments, or it may exist separately without being assembled into that apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform:
acquiring the metadata of each of a plurality of file slices from a distributed database, where the file slices and the metadata of each file slice are obtained by a data production end splitting an original file; acquiring the matching file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice, to obtain the plurality of file slices; and creating a file processing stream and using it to merge the file slices according to the metadata of each file slice, to obtain a target file.
Alternatively, the computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform:
acquiring an original file, and segmenting the original file to obtain a plurality of file slices and metadata of each file slice; and sending the plurality of file slices to a topic partition of a message publishing and subscribing system, and storing the metadata of each file slice into a distributed database, so that a data consumption end combines the plurality of file slices according to the metadata of each file slice after acquiring the metadata of each file slice from the distributed database and acquiring the plurality of file slices from the topic partition, thereby obtaining a target file.
According to the technical solution of the invention, the metadata of each of a plurality of file slices can be obtained from a distributed database, the file slices and their metadata having been obtained by a data production end splitting an original file. The matching file slices are obtained from a topic partition of a message publishing and subscribing system according to the metadata of each file slice. A file processing stream is then created and used to merge the file slices according to their metadata, which yields the target file. By slicing large files, transmitting the slices through the message publishing and subscribing system, and storing the slice metadata in the distributed database, the invention enables large files to be transferred over a message publishing and subscribing system and improves file transmission efficiency.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired result of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A file transmission method, applied to a data consumption end, comprising the following steps:
acquiring metadata of each file slice in a plurality of file slices from a distributed database, wherein the plurality of file slices and the metadata of each file slice are obtained by segmenting an original file by a data production end;
acquiring matched file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice to obtain a plurality of file slices;
and creating a file processing stream, and combining the multiple file slices according to the metadata of each file slice by using the file processing stream to obtain a target file.
2. The file transfer method according to claim 1, further comprising, before obtaining the metadata of each of the plurality of file slices from the distributed database:
querying the distributed database to obtain message digests of a plurality of files;
writing the message digests of the files into a preset set to obtain an original digest set;
and deduplicating the original digest set to obtain a target digest set.
3. The file transfer method according to claim 2, wherein the obtaining metadata of each file slice in the plurality of file slices from the distributed database comprises:
identifying the message digest of the original file in the target digest set;
and acquiring the metadata of each of the plurality of file slices from the distributed database according to the message digest of the original file.
4. The file transmission method according to claim 1, wherein the metadata includes a sequence number of a corresponding file slice and an offset of the corresponding file slice in the topic partition, and the obtaining the matched file slice from the topic partition of the message publishing and subscribing system according to the metadata of each file slice to obtain the plurality of file slices includes:
looking up the offset of the corresponding file slice in the topic partition according to the sequence number of each file slice;
and acquiring the matching file slices from the topic partition according to the offset of each file slice in the topic partition, to obtain the plurality of file slices.
5. The file transmission method according to claim 4, wherein the metadata further includes the message digest of the corresponding file slice, and before the plurality of file slices are merged according to the metadata of each file slice using the file processing stream, the method further includes:
verifying the plurality of file slices against the message digests of the corresponding file slices included in the metadata;
and when all of the plurality of file slices pass verification, triggering execution of the step of merging the plurality of file slices according to the metadata of each file slice using the file processing stream.
6. The file transfer method according to claim 4, wherein said merging the plurality of file slices according to the metadata of each file slice by using the file processing stream to obtain the target file comprises:
and writing the data in the multiple file slices into a specified file by using the file processing stream according to the serial number of each file slice to obtain the target file.
7. The file transfer method according to claim 3, characterized in that the method further comprises:
and verifying the target file against the message digest of the original file.
8. A file transmission method, applied to a data production end, comprising the following steps:
acquiring an original file, and segmenting the original file to obtain a plurality of file slices and metadata of each file slice;
and sending the plurality of file slices to a topic partition of a message publishing and subscribing system, and storing the metadata of each file slice into a distributed database, so that a data consumption end combines the plurality of file slices according to the metadata of each file slice after acquiring the metadata of each file slice from the distributed database and acquiring the plurality of file slices from the topic partition, thereby obtaining a target file.
9. A file transmission apparatus, applied to a data consumption end, the apparatus comprising:
a first acquisition module, used for acquiring the metadata of each of a plurality of file slices from a distributed database, wherein the plurality of file slices and the metadata of each file slice are obtained by a data production end segmenting an original file;
the second acquisition module is used for acquiring matched file slices from a topic partition of a message publishing and subscribing system according to the metadata of each file slice to obtain the plurality of file slices;
and the merging module is used for creating a file processing stream and merging the file slices according to the metadata of each file slice by using the file processing stream to obtain a target file.
10. A file transmission apparatus, applied to a data production end, the apparatus comprising:
a segmentation module, used for acquiring an original file and segmenting the original file to obtain a plurality of file slices and the metadata of each file slice;
the sending module is used for sending the plurality of file slices to a topic partition of a message publishing and subscribing system;
and a storage module, used for storing the metadata of each file slice in a distributed database, so that after the data consumption end acquires the metadata of each file slice from the distributed database and acquires the plurality of file slices from the topic partition, the plurality of file slices are merged according to the metadata of each file slice to obtain a target file.
11. A file transmission system, comprising a data consumption end for performing the file transmission method according to any one of claims 1 to 7 and a data production end for performing the file transmission method according to claim 8.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the file transfer method according to any one of claims 1 to 7 when executing the program, or wherein the processor implements the file transfer method according to claim 8 when executing the program.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the file transfer method according to any one of claims 1 to 7, or which, when being executed by a processor, implements the file transfer method according to claim 8.
CN202211434252.6A 2022-11-16 2022-11-16 File transmission method, device, system, electronic equipment and storage medium Pending CN115801765A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211434252.6A CN115801765A (en) 2022-11-16 2022-11-16 File transmission method, device, system, electronic equipment and storage medium
PCT/CN2023/103618 WO2024103752A1 (en) 2022-11-16 2023-06-29 File transmission method, apparatus and system, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211434252.6A CN115801765A (en) 2022-11-16 2022-11-16 File transmission method, device, system, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115801765A (en) 2023-03-14

Family

ID=85438160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211434252.6A Pending CN115801765A (en) 2022-11-16 2022-11-16 File transmission method, device, system, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115801765A (en)
WO (1) WO2024103752A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024103752A1 (en) * 2022-11-16 2024-05-23 工赋(青岛)科技有限公司 File transmission method, apparatus and system, electronic device, and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10887253B1 (en) * 2014-12-04 2021-01-05 Amazon Technologies, Inc. Message queuing with fan out
CN109361629B (en) * 2018-10-26 2020-11-03 江苏大学 Kafka-based large message reliable transmission method
CN113835870A (en) * 2020-06-23 2021-12-24 华为技术有限公司 Data processing method and system
CN114077518B (en) * 2020-08-21 2024-10-01 湖南微步信息科技有限责任公司 Data snapshot method, device, equipment and storage medium
CN115250181A (en) * 2022-07-22 2022-10-28 中国电信股份有限公司 Kafka-based file verification transmission method, device, equipment and storage
CN115801765A (en) * 2022-11-16 2023-03-14 工赋(青岛)科技有限公司 File transmission method, device, system, electronic equipment and storage medium


Also Published As

Publication number Publication date
WO2024103752A1 (en) 2024-05-23

Similar Documents

Publication Title
CN110764706B (en) Storage system, data management method, and storage medium
CN110554930B (en) Data storage method and related equipment
CN109522316B (en) Log processing method, device, equipment and storage medium
CN104584524A (en) Aggregating data in a mediation system
CN112632008B (en) Data slicing transmission method and device and computer equipment
CN110928853A (en) Method and device for identifying log
CN112925661A (en) Message processing method and device, computer equipment and storage medium
CN106990914B (en) Data deleting method and device
CN112486915B (en) Data storage method and device
CN112311902B (en) File sending method and device based on micro-service
CN113485962A (en) Log file storage method, device, equipment and storage medium
CN112019605A (en) Data distribution method and system of data stream
CN108039960B (en) Configuration information issuing method and server
CN115801765A (en) File transmission method, device, system, electronic equipment and storage medium
US10048991B2 (en) System and method for parallel processing data blocks containing sequential label ranges of series data
KR101666440B1 (en) Data processing method in In-memory Database System based on Circle-Queue
CN106919574B (en) Method for processing remote synchronous file in real time
CN111427917A (en) Search data processing method and related product
US20150088958A1 (en) Information Processing System and Distributed Processing Method
CN113761052A (en) Database synchronization method and device
CN117033311A (en) File merging method and system based on slice uploading
US7127446B1 (en) File system based task queue management
CN112988429B (en) Data processing method and device, electronic equipment and computer readable storage medium
CN112597119A (en) Method and device for generating processing log and storage medium
CN110597802B (en) Message processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination