CN111586094A - File uploading method and device and computer equipment - Google Patents

Publication number
CN111586094A
Authority
CN
China
Prior art keywords
file
blocks
block
uploading
file block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010222822.XA
Other languages
Chinese (zh)
Inventor
鄢伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Pension Insurance Corp
Original Assignee
Ping An Pension Insurance Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Pension Insurance Corp filed Critical Ping An Pension Insurance Corp
Priority to CN202010222822.XA priority Critical patent/CN111586094A/en
Publication of CN111586094A publication Critical patent/CN111586094A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/06Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/90Buffering arrangements
    • H04L49/9057Arrangements for supporting packet reassembly or resequencing

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of big data storage, and particularly relates to a file uploading method and device and computer equipment. The method comprises the following steps: determining the number of file blocks to be divided according to the size of a file to be uploaded; allocating the number of file blocks into at least two file block tables; setting a corresponding hash function for each file block table; sequentially calculating, through the hash functions, the file block to which each file content belongs in each file block table as a candidate file block; selecting, from the candidate file blocks, the file block containing the least file content as the target file block corresponding to the file content; segmenting the file to be uploaded into a plurality of corresponding file blocks according to the target file block corresponding to each file content; and uploading at least two file blocks simultaneously in a multithreading mode. The method uses a plurality of hash functions to segment the file blocks jointly so that the sizes of the file blocks are relatively consistent, which avoids the problems of uneven file block distribution and low uploading efficiency.

Description

File uploading method and device and computer equipment
Technical Field
The invention belongs to the technical field of big data storage, and particularly relates to a file uploading method and device and computer equipment.
Background
In current web application systems (e.g., mailbox application), file uploading is a very common function. However, due to the influence of the server and the stability of the data transmission channel, many web application systems cannot support the file uploading at the GB level, which causes a great limitation to the daily use of the user.
To overcome the limit on upload size, some prior art schemes propose uploading files in blocks. That is, one large file is divided into a plurality of small files for uploading, which achieves technical effects such as stable uploading and resumable (breakpoint) transfer. However, the existing blocking methods are too simple, for example slicing directly by data entries, and do not make good use of the relevance of the content information of the file. This negatively affects subsequent parsing, verification and uploading, so file uploading efficiency still needs to be improved.
Disclosure of Invention
The embodiment of the invention provides a file uploading method and device, computer equipment and a storage medium, and aims to solve the technical problem of low uploading efficiency of large files in the prior art.
In a first aspect, an embodiment of the present invention provides a file uploading method, including: determining the number of file blocks of a file to be uploaded, which need to be divided, according to the size of the file to be uploaded, wherein the file to be uploaded consists of a plurality of file contents;
allocating the number of file blocks into at least two file block tables; different file blocks are identified in each file block table through different key values;
setting a corresponding hash function for each file block table;
sequentially calculating the file block of the file content in each file block table as a candidate file block through the hash function;
selecting a file block with the minimum file content from the candidate file blocks as a target file block corresponding to the file content;
according to a target file block corresponding to the file content, the file to be uploaded is segmented into a plurality of corresponding file blocks;
and uploading at least two file blocks simultaneously in a multithreading mode.
Optionally, the determining, according to the size of the file to be uploaded, the number of file blocks into which the file to be uploaded needs to be divided includes:
pre-defining a plurality of different file size ranges, wherein each file size range has a corresponding number of file blocks needing to be divided;
determining the size range of the file where the file to be uploaded is located;
and determining the number of file blocks of the file to be uploaded, which need to be divided, according to the size range of the file.
Optionally, the uploading at least two of the file blocks simultaneously in a multi-thread manner includes:
estimating the uploading time required by each file block;
placing file blocks with uploading time differences within a preset range in the same task list to be uploaded;
starting a plurality of uploading threads, and simultaneously transmitting file blocks in the same task list to be uploaded.
Optionally, the estimating of the upload time required for each file block includes:
and calculating the uploading time required by each file block through a breadth-first search algorithm.
Optionally, the placing the file blocks with the difference of the uploading time within the preset range in the same task list to be uploaded includes:
arranging all file blocks to form an unordered sequence;
selecting one file block in the unordered sequence as a reference file block;
sequentially comparing the uploading time between the reference file block and other file blocks in the unordered sequence along the forward direction of the unordered sequence;
when the uploading time of the reference file block is less than that of other file blocks, exchanging the positions of the reference file block and the other file blocks in the sequence, so that all the file blocks with the uploading time greater than that of the reference file block are moved to the right side of the reference file block;
sequentially comparing the uploading time between the reference file block and other file blocks in the unordered sequence along the reverse direction of the unordered sequence;
when the uploading time of the reference file block is longer than that of other file blocks, exchanging the positions of the reference file block and the other file blocks in the sequence, so that all the file blocks with the uploading time shorter than that of the reference file block are moved to the left side of the reference file block;
re-selecting a new reference file block until the unordered sequence is updated to an ordered sequence arranged according to the size of uploading time;
and segmenting the file blocks in the ordered sequence into a corresponding list to be uploaded according to the preset range.
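The ordering and segmentation steps above can be sketched roughly as follows. This is a minimal sketch, not the patented procedure itself: it uses a conventional quicksort partition around a reference (pivot) block, which may differ in detail from the swap order described above. The `(block id, seconds)` tuple representation, the function names, and the rule of measuring the time difference against the fastest block in each list are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (block id, estimated upload time in seconds).
Block = Tuple[str, float]


def sort_by_upload_time(blocks: List[Block], low: int = 0,
                        high: Optional[int] = None) -> None:
    """Sort blocks in place by upload time using a reference (pivot) block."""
    if high is None:
        high = len(blocks) - 1
    if low >= high:
        return
    pivot_time = blocks[low][1]  # the reference file block
    i, j = low, high
    while i < j:
        # Scan from the right for a block faster than the reference block.
        while i < j and blocks[j][1] >= pivot_time:
            j -= 1
        # Scan from the left for a block slower than the reference block.
        while i < j and blocks[i][1] <= pivot_time:
            i += 1
        if i < j:
            blocks[i], blocks[j] = blocks[j], blocks[i]
    # Put the reference block between the faster and the slower blocks.
    blocks[low], blocks[i] = blocks[i], blocks[low]
    sort_by_upload_time(blocks, low, i - 1)
    sort_by_upload_time(blocks, i + 1, high)


def split_into_task_lists(sorted_blocks: List[Block],
                          preset_range: float) -> List[List[Block]]:
    """Group consecutive blocks whose upload time is within preset_range
    of the fastest block in the current list (an assumed grouping rule)."""
    task_lists: List[List[Block]] = []
    for block in sorted_blocks:
        if task_lists and block[1] - task_lists[-1][0][1] <= preset_range:
            task_lists[-1].append(block)
        else:
            task_lists.append([block])
    return task_lists
```

For a real system, Python's built-in `sorted` with a key function would normally replace the hand-written partition; the explicit version is shown only because the steps above describe the comparison and exchange process.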
Optionally, the allocating the file block number into at least two file block tables includes:
the file block number is evenly distributed into a first file block table and a second file block table, the first file block table and the second file block table contain N/2 file blocks, and N is the number of the file blocks needing to be divided;
in the first file block table, identifying N/2 file blocks by sequentially increasing first key values;
and identifying N/2 file blocks in the second file block table through sequentially increasing second key values.
Optionally, sequentially calculating, by the hash function, a file block to which the file content belongs in each of the file block tables as a candidate file block includes:
calculating a first key value and a second key value corresponding to the file content through a first hash function and a second hash function respectively, wherein the first hash function corresponds to the first file block table, and the second hash function corresponds to the second file block table;
determining the file block to which the file content belongs in the first file block table according to the first key value corresponding to the file content, and determining the file block to which the file content belongs in the second file block table according to the second key value corresponding to the file content.
In a second aspect, an embodiment of the present invention provides a file uploading apparatus, including: the file block quantity setting module is used for determining the quantity of file blocks of the file to be uploaded, which need to be divided, according to the size of the file to be uploaded, and the file to be uploaded consists of a plurality of file contents;
the file block table setting module is used for distributing the number of the file blocks to at least two file block tables; different file blocks are identified in each file block table through different key values;
the hash function setting module is used for setting a corresponding hash function for each file block table;
the hash mapping module is used for sequentially calculating the file blocks of the file contents in each file block table as candidate file blocks through the hash function;
the selecting module is used for selecting a file block with the minimum file content from the candidate file blocks as a target file block corresponding to the file content;
the file segmentation module is used for segmenting the file to be uploaded into a plurality of corresponding file blocks according to the target file blocks corresponding to the file contents;
and the uploading module is used for simultaneously uploading at least two file blocks in a multithreading mode.
In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the file uploading method as described above when executing the computer program.
In a fourth aspect, the present invention also provides a computer-readable storage medium, where a computer program is stored, and when executed by a processor, the computer program causes the processor to execute the file uploading method described above.
According to the file uploading method, the file uploading device, the computer equipment and the storage medium, at least two different hash functions are used to segment the file blocks jointly, so that the sizes of the file blocks are relatively consistent. This solves the problem that, when a single hash function is used for segmenting the file blocks, the uneven sizes of different file blocks lead to low file uploading efficiency, and thereby improves the file uploading efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a computer device according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of a file uploading method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a file block table after a single hash function is split;
fig. 4 is a schematic diagram of a process of splitting a file by two hash functions according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of step 27 in FIG. 2;
fig. 6 is a schematic diagram of a method for acquiring distribution of file block upload time according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a file uploading apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
The embodiment of the invention provides a file uploading method, which can map divided file blocks through at least two different hash functions, and the file blocks in each file block list after being divided are uniform in size, so that the file uploading efficiency is improved.
First, a hardware environment of the file uploading method is introduced below, referring to fig. 1, where fig. 1 is a schematic structural diagram of a computer device 100 according to an embodiment of the present invention. The computer device 100 may be a computer, a cluster of computers, a mainframe computer, a computing device dedicated to providing online content, or a computer network comprising a set of computers operating in a centralized or distributed manner.
As shown in fig. 1, the computer apparatus 100 includes: a processor 102, memory and network interface 105 connected by a system bus 101; the memory may include, among other things, a non-volatile storage medium 103 and an internal memory 104.
In the embodiment of the present invention, the Processor 102 may be a Central Processing Unit (CPU), and the Processor 102 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. according to the type of hardware used. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The number of processors 102 may be one or more, and one or more of the processors 102 may execute sequences of computer program instructions to perform various file upload methods that will be described in more detail below.
The computer program instructions are stored in, accessed from, and read from the non-volatile storage medium 103 to be executed by the processor 102, thereby implementing the file uploading method disclosed in the following embodiments of the present invention. For example, the non-volatile storage medium 103 stores a software application that executes the file uploading method described below. Further, the non-volatile storage medium 103 may store the entire software application or only a portion of the software application that may be executed by the processor 102. It should be noted that although only one block is shown in fig. 1, the non-volatile storage medium 103 may comprise a plurality of physical devices installed on a central processing device or different computing devices.
The network interface 105 is used for network communication, such as providing transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 1 is a block diagram of only a portion of the configuration associated with aspects of the present invention and is not intended to limit the computing device 100 to which aspects of the present invention may be applied, and that a particular computing device 100 may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 1 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 1, and are not described herein again.
The embodiment of the invention also provides a computer readable storage medium. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the file uploading method disclosed by the embodiments of the present invention. The computer program product is embodied on one or more computer readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer program code.
In the case of implementing the computer device 100 in software, fig. 2 shows a schematic diagram of a file uploading method of an embodiment, and the method in fig. 2 is described in detail below. Referring to fig. 2, the method includes the following steps:
step 21, determining the number of file blocks of the file to be uploaded, which need to be divided, according to the size of the file to be uploaded.
The file to be uploaded consists of a plurality of file contents. The file to be uploaded is usually a single large file, which may be data information in any form or format. Of course, the file to be uploaded is not indivisible; it contains many different file contents. For example, a compressed package may contain many photos, or a document file may contain multiple pages, and each page or each chapter may be considered as a file content. In the present embodiment, the term "file content" is used to indicate the smallest divisible unit of data information in a file to be uploaded.
The size of the file block obtained by division is required to be kept within a certain range, so that stable uploading is facilitated. Therefore, it is generally necessary to divide them into a suitable number of file blocks according to different file sizes.
In some embodiments, the relationship between file size and number of file blocks is expressed in the form of a piecewise function. The step 21 may specifically include the following steps:
first, a plurality of different file size ranges are pre-defined, and each file size range has a number of file blocks that need to be divided. And then, determining the size range of the file where the file to be uploaded is located. And finally, determining the number of file blocks of the file to be uploaded, which need to be divided, according to the size range of the file.
For example, when the file size is 1GB-2GB, the corresponding number of file blocks may be determined to be 1024, and when the file size exceeds 2GB, the number of file blocks may be determined to be 2048. If the file size increases further, the number of file blocks increases accordingly.
Of course, the file size range and the corresponding number of file blocks may be adjusted according to the actual requirement, which is an empirical value.
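The piecewise mapping from file size to number of file blocks might be sketched as follows. Only the 1024 and 2048 values come from the example above; the value for files under 1 GB, the constant names, and the function name are hypothetical placeholders.

```python
GIB = 1024 ** 3  # bytes in one gigabyte (binary)

# Assumed placeholder: the text gives no block count for files under 1 GB.
BLOCKS_UNDER_1GB = 256
BLOCKS_1GB_TO_2GB = 1024   # from the example in the text
BLOCKS_OVER_2GB = 2048     # from the example in the text


def block_count_for(file_size: int) -> int:
    """Return the number of file blocks for a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file size must be non-negative")
    if file_size < 1 * GIB:
        return BLOCKS_UNDER_1GB
    if file_size <= 2 * GIB:
        return BLOCKS_1GB_TO_2GB
    return BLOCKS_OVER_2GB
```

Because the ranges are empirical, a deployment would likely load them from configuration rather than hard-code them.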
Step 22, distributing the file block number to at least two file block tables.
Wherein the file block table is a table composed of a plurality of consecutive file blocks. At this time, all of the file blocks are in an initial, empty state, that is, no file content corresponds to any file block yet.
Different file blocks are identified by different key values in each file block table. For example, the file blocks in the file block table have sequentially increasing key values, and the file block table is composed of file block 1, file block 2, and so on up to file block N.
In some embodiments, file blocks 1, 2 through N have content and temporal continuity, and the sizes of file blocks 1, 2 through N may be divided according to their continuity in content.
Step 23, setting a corresponding hash function for each file block table.
A hash function is a function that maps data from a large range into a smaller range. Through the hash function, the file contents in the file to be uploaded can be mapped to each file block in the file block table respectively. In this embodiment, different hash functions may be set correspondingly based on each file block table.
Step 24, sequentially calculating, through the hash function, the file block to which the file content belongs in each file block table as a candidate file block.
Conventional hash functions are typically implemented by division with remainder. For example, the content of the file to be uploaded is converted into a numerical value as an input value by a regular expression or the like. Then, the input value is divided by a suitable number (such as the number N of file blocks), and the remainder is taken as the key value, so that the file content is allocated to the file block with the same key value.
In the process of segmenting the file to be uploaded, the operation of calculating the hash function to obtain the corresponding key value needs to be executed for multiple times. It is inefficient for the processor to perform a division operation.
Preferably, the division can be replaced by a multiplication and a right shift when calculating the corresponding key value. For example, the hash function is set to: hash(key) = (key * key) >> x. That is, the number obtained by squaring the input value is shifted to the right by x bits, thereby obtaining the corresponding key value.
By changing the division operation into the multiplication and the right shift operation, the method is beneficial to reducing the operation load of computers such as a processor and the like, and increases the calculation efficiency when the file blocks are cut.
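A minimal sketch of such a square-and-shift hash is given below. The 32-bit word width, the use of CRC-32 to turn file content into an integer, the power-of-two table size, and all names are assumptions for illustration; the text does not specify them.

```python
import zlib

WORD_BITS = 32                      # assumed word width
WORD_MASK = (1 << WORD_BITS) - 1


def content_to_int(content: bytes) -> int:
    """Turn a piece of file content into a 32-bit integer (assumed: CRC-32)."""
    return zlib.crc32(content) & WORD_MASK


def square_shift_key(value: int, key_bits: int) -> int:
    """Square the input, keep the low 32 bits, then shift right so that only
    the top key_bits bits remain. The key lies in [0, 2**key_bits)."""
    if not 0 < key_bits <= WORD_BITS:
        raise ValueError("key_bits must be between 1 and 32")
    squared = (value * value) & WORD_MASK
    return squared >> (WORD_BITS - key_bits)


def modulo_key(value: int, num_blocks: int) -> int:
    """The conventional division-based variant, shown for comparison."""
    return value % num_blocks
```

With `key_bits = 10`, for instance, keys fall in the range 0 to 1023, which would match a table of 1024 file blocks.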
In this embodiment, there are a plurality of hash functions. Therefore, the file blocks to which the file contents correspond or map in a plurality of different file block tables can be calculated.
Step 25, selecting the file block with the least file content from the candidate file blocks as the target file block corresponding to the file content.
In the conventional technology, the content or the file name of a file to be uploaded is used as an input value and is mapped into a corresponding file block through a hash function. Therefore, the same or similar contents in the file to be uploaded are distributed in the same file block, and compared with other simple segmentation modes according to data entry modes and the like, the method has a better segmentation effect and is more convenient for subsequent uploading operation.
However, when only a single hash function is used for file block splitting, the split may be uneven. For example, a large part of the file contents may be mapped into a few file blocks, which causes uneven sizes between different file blocks and seriously affects the uploading efficiency. This is shown in fig. 3, where the length of each file block indicates its current size. As can be seen from the file block table shown in fig. 3, the file blocks with key values of 2, 3, and 4 are much larger than the file blocks with key values of 1 and 6, so that the file blocks are unevenly distributed.
Therefore, by selecting the mode with the least file content in step 25, the sizes of the file blocks can be ensured to be consistent, and the phenomenon that a certain file block is accumulated as shown in fig. 3 is avoided.
Step 26, segmenting the file to be uploaded into a plurality of corresponding file blocks according to the target file blocks corresponding to the file contents.
After the target file blocks corresponding to all the file contents are determined, the file to be uploaded can be correspondingly split into a plurality of file blocks. Each file block contains corresponding file content for uploading.
Step 27, uploading at least two file blocks simultaneously in a multithreading mode.
After the file is split into a plurality of file blocks, a multithreading uploading mode can be supported, and two or more file blocks are uploaded at the same time, so that the uploading capacity and speed of large files are well improved. Of course, the number of simultaneously uploaded file blocks (i.e., the total number of upload threads that can be turned on) that are specifically supported depends on the performance of the system.
The following takes two different file block tables and their corresponding hash functions as an example to describe in detail the specific steps of determining file blocks during the file uploading process. As shown in fig. 4, the specific cutting process is as follows:
1. the file block number is first evenly distributed into a first file block table and a second file block table. The first and second file block tables contain N/2 file blocks, where N is the number of file blocks that need to be divided.
As shown in fig. 4, in the first file block table, N/2 file blocks are identified by sequentially increasing first key values, and in the second file block table, N/2 file blocks are identified by sequentially increasing second key values.
2. Two different first hash functions and second hash functions are then set for the two file block tables. The two hash functions may specifically adopt hash functions in any suitable form, and only the use requirement of mapping the file content to the key value in the file block table needs to be met.
For example, the first hash function and the second hash function may each use different multipliers or be shifted to the right by different numbers of bits.
3. And calculating a first key value and a second key value corresponding to the file content through a first hash function and a second hash function respectively.
4. And determining the file block which belongs to the first file block table according to a first key value corresponding to the file content, and determining the file block which belongs to the second file block table according to a second key value corresponding to the file content.
5. And judging whether the file block corresponding to the first key value currently contains more file content than the file block corresponding to the second key value. If so, the file content is assigned to the file block corresponding to the second key value. If not, the file content is assigned to the file block corresponding to the first key value.
As can also be seen from fig. 4, the sizes of the file blocks in the file list obtained by jointly performing the file block segmentation through two different hash functions are relatively consistent. In other embodiments, 3, 4 or more hash functions may be used for partitioning, so as to satisfy the requirement that the sizes of the file blocks in the file list are relatively consistent.
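The five-step splitting process with two file block tables can be sketched as below. This is an illustration under stated assumptions: the two hash functions are stand-ins (CRC-32 and Adler-32 reduced by modulo, chosen only because they are readily available), block fullness is measured in bytes, and the function names are hypothetical.

```python
import zlib
from typing import List, Tuple

BlockTable = List[List[bytes]]


def first_hash(content: bytes, table_size: int) -> int:
    """Stand-in for the first hash function (assumed: CRC-32 modulo size)."""
    return zlib.crc32(content) % table_size


def second_hash(content: bytes, table_size: int) -> int:
    """Stand-in for the second hash function (assumed: Adler-32 modulo size)."""
    return zlib.adler32(content) % table_size


def split_into_blocks(contents: List[bytes],
                      num_blocks: int) -> Tuple[BlockTable, BlockTable]:
    """Assign each file content to the emptier of its two candidate blocks."""
    half = num_blocks // 2                      # step 1: N/2 blocks per table
    table_one: BlockTable = [[] for _ in range(half)]
    table_two: BlockTable = [[] for _ in range(half)]
    bytes_one = [0] * half                      # bytes held by each block
    bytes_two = [0] * half
    for content in contents:
        key_one = first_hash(content, half)     # steps 3 and 4: candidates
        key_two = second_hash(content, half)
        # Step 5: use the second table only if the first candidate is fuller.
        if bytes_one[key_one] > bytes_two[key_two]:
            table_two[key_two].append(content)
            bytes_two[key_two] += len(content)
        else:
            table_one[key_one].append(content)
            bytes_one[key_one] += len(content)
    return table_one, table_two
```

Even when many contents hash to the same pair of keys, they alternate between the two candidate blocks instead of piling up in one, which is the balancing effect the text describes.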
In some embodiments, the file blocks can be sequentially ordered according to the uploading time required by each file block, so as to further improve the file uploading efficiency. As shown in fig. 5, step 27 includes:
step 271, estimating the uploading time required by each file block.
After the file segmentation is completed, the uploading condition of the file blocks can be estimated according to a series of actual conditions such as the content and the size of each segmented file block, the current uploading channel and the like, and the processing time required by the file blocks is predicted. The uploading time required by each file block is estimated, so that the uploading time of the files can be sequenced in sequence, and the uploading efficiency of the files is improved.
Considering that the processing time has more influence factors, a breadth first search algorithm can be specifically used for weighing the relation and influence among the factors, and finally the required processing time is determined.
For example, the bandwidth of the current upload channel is acquired, and the processing time that may be required is calculated according to the bandwidth and the size of the file block, but the analysis verification time that needs to be consumed after the file block is uploaded may also be further combined as a part of the predicted value of the processing time of the file block.
The specific processing time estimation method can be set by those skilled in the art according to the needs of the actual situation. The processing time only provides a reference, so an approximate estimate is sufficient and a highly precise value does not need to be obtained.
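As a rough illustration of the bandwidth-based example, the estimate might be computed as transfer time plus verification time. This sketch does not implement the breadth-first search weighting mentioned above; the linear per-byte verification cost, the parameter names, and the function name are assumptions.

```python
def estimate_upload_seconds(block_size_bytes: int,
                            bandwidth_bytes_per_sec: float,
                            verify_seconds_per_byte: float = 0.0) -> float:
    """Approximate processing time of one file block in seconds.

    transfer time     = block size / channel bandwidth
    verification time = block size * assumed per-byte verification cost
    """
    if bandwidth_bytes_per_sec <= 0:
        raise ValueError("bandwidth must be positive")
    transfer = block_size_bytes / bandwidth_bytes_per_sec
    verification = block_size_bytes * verify_seconds_per_byte
    return transfer + verification
```

Since the value is only used to order and group file blocks, a coarse model like this may be enough in practice.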
Step 272, placing the file blocks with the difference of the uploading time within the preset range in the same task list to be uploaded.
The preset range is an empirical value, and can be set according to actual conditions (for example, the distribution of file block uploading time). Each file block has different uploading time according to the specific content, characteristics and the like. In actual operation, the file blocks can be correspondingly grouped according to different uploading times, and the file blocks with similar uploading times are placed in the same task list to be uploaded, so that subsequent multithread operation is facilitated, and the uploading efficiency of the file blocks is improved.
Step 273, starting a plurality of uploading threads, and simultaneously transmitting the file blocks in the same task list to be uploaded.
During multi-threaded uploading, the uploaded file blocks also need to be verified and otherwise processed. It is therefore desirable for the upload threads to finish their upload tasks at as nearly the same time as possible, so that no upload thread sits idle.
Once the task lists to be uploaded have been formed, the file blocks in the same task list can be uploaded simultaneously according to their priority, which improves upload efficiency.
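A minimal Python sketch of step 273 is given below, using the standard-library thread pool. The `upload_block` callable is a hypothetical placeholder for whatever actually transmits one block, and the default thread count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_task_list(task_list, upload_block, max_threads=4):
    """Upload all blocks of one task list concurrently.

    task_list: sequence of file blocks with similar estimated upload times.
    upload_block: caller-supplied function that uploads a single block.
    Returns the per-block results in the same order as task_list.
    """
    if not task_list:
        return []
    workers = min(max_threads, len(task_list))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order and re-raises any upload exception.
        return list(pool.map(upload_block, task_list))
```

Because the blocks in one task list have similar estimated times, the threads in the pool would be expected to finish at roughly the same moment, which is the stated goal of the grouping.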
The implementation of step 272 is described in detail below. Fig. 6 is a schematic diagram of a method for quickly finding file blocks with similar upload times according to an embodiment of the present invention. As shown in fig. 6, the method may include the following steps:
First, all file blocks are arranged to form an unordered sequence. Each box in the figure represents one file block. The number in each box indicates the time required to upload that file block (for simplicity, only the values 1 to 7 are used to indicate relative duration; they do not represent actual upload times).
Then, one file block is selected as the reference file block, and the upload time of the reference file block is compared in turn with that of the other file blocks in the unordered sequence along the forward direction of the sequence. In fig. 6, the first file block, with time 5, is taken as an example, and the forward direction is the right-to-left direction in fig. 6.
When the upload time of the reference file block is less than that of another file block, the positions of the reference file block and that file block in the sequence are exchanged.
Next, the upload time of the reference file block is compared in turn with that of the other file blocks in the unordered sequence along the reverse direction of the sequence (i.e., the left-to-right direction shown in fig. 6).
The above operations are repeated until the reference file block no longer moves. At this point, all file blocks whose processing time is longer than that of the reference file block have been moved to its right, and all file blocks whose processing time is shorter have been moved to its left.
Finally, a new reference file block is selected and the above operations are repeated, eventually yielding an ordered sequence sorted by upload time.
Based on the resulting ordered sequence, the overall distribution of upload times can readily be obtained, and file blocks with similar upload times can be divided into the corresponding lists to be uploaded.
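The steps above resemble a quicksort-style partition around a reference element. The Python sketch below illustrates that general technique on a list of upload times; the function names are illustrative, and the scan order follows the textbook partition scheme rather than reproducing fig. 6 exactly.

```python
def partition(seq, lo, hi):
    """Place the reference value seq[lo] at its final position within seq[lo..hi].

    On return, smaller values lie to its left and larger values to its right.
    """
    ref = seq[lo]
    left, right = lo, hi
    while left < right:
        # Scan from the right end for a value smaller than the reference.
        while left < right and seq[right] >= ref:
            right -= 1
        seq[left] = seq[right]
        # Scan from the left end for a value larger than the reference.
        while left < right and seq[left] <= ref:
            left += 1
        seq[right] = seq[left]
    seq[left] = ref
    return left


def sort_by_upload_time(times):
    """Return a new list of upload times in ascending order."""
    seq = list(times)

    def sort_range(lo, hi):
        if lo >= hi:
            return
        p = partition(seq, lo, hi)
        sort_range(lo, p - 1)   # new reference block on the left part
        sort_range(p + 1, hi)   # new reference block on the right part

    sort_range(0, len(seq) - 1)
    return seq
```

Applied to the values 5, 3, 7, 1, 6, 2, 4, the first partition around 5 leaves 4, 3, 2, 1 on its left and 6, 7 on its right before the two sides are sorted in turn.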
As can be seen from the above description, when this sorting method is used to analyze the processing times of the file blocks and the number of file blocks is large, a large number of sorting iterations may be needed before the sequence converges, which takes a long time.
In a preferred embodiment, to further improve sorting efficiency and with continued reference to fig. 6, after the sorting for the first reference file block (the reference file block with time 5 in fig. 6) is finished, reference file block 1 and reference file block 2 may be selected on the left side and the right side of that reference file block respectively, and the sorting operation performed separately on the part smaller than the reference file block and the part larger than it. For example, in fig. 6, all file blocks to the left of the reference file block with time 5 are sorted using the reference file block with time 4 (reference 1 in fig. 6), and all file blocks to its right are sorted using the reference file block with time 6 (reference 2 in fig. 6), with the left-side and right-side sorting performed simultaneously. After reference file block 1 and reference file block 2 finish their sorting operations, further reference file blocks can be selected to perform sorting operations simultaneously.
In this way, the number of reference file blocks that can be used simultaneously grows rapidly as powers of 2 (2^N), which effectively shortens the time required for sorting and analysis.
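The simultaneous use of several reference file blocks can be sketched in Python as a round-by-round scheme: every round partitions all pending sub-ranges concurrently, so round N handles at most 2^N reference blocks. The thread pool, the function names, and the round structure are assumptions made for illustration. In CPython the global interpreter lock would limit any real speed-up for this pure-Python code, so the sketch shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(seq, lo, hi):
    """Place seq[lo] at its final position in seq[lo..hi] and return that index."""
    ref = seq[lo]
    left, right = lo, hi
    while left < right:
        while left < right and seq[right] >= ref:
            right -= 1
        seq[left] = seq[right]
        while left < right and seq[left] <= ref:
            left += 1
        seq[right] = seq[left]
    seq[left] = ref
    return left


def parallel_sort(times, max_threads=4):
    """Sort upload times, partitioning all pending sub-ranges of a round at once."""
    seq = list(times)
    if len(seq) < 2:
        return seq
    ranges = [(0, len(seq) - 1)]
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        while ranges:
            # Sub-ranges are disjoint, so the concurrent partitions never overlap.
            pivots = list(pool.map(lambda r: partition(seq, r[0], r[1]), ranges))
            next_ranges = []
            for (lo, hi), p in zip(ranges, pivots):
                if lo < p - 1:
                    next_ranges.append((lo, p - 1))
                if p + 1 < hi:
                    next_ranges.append((p + 1, hi))
            ranges = next_ranges
    return seq
```

Processing one round at a time, rather than submitting recursive tasks from inside worker threads, avoids the situation where every thread in a bounded pool is waiting on a task that cannot start.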
The file uploading method provided by the embodiment of the invention uses at least two different hash functions jointly to segment the file into file blocks, so that the sizes of the file blocks remain relatively uniform. This avoids the low file uploading efficiency that results from uneven file block sizes when a single hash function is used for segmentation, and thereby improves file uploading efficiency.
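The two-table, two-hash assignment summarized above can be illustrated with the Python sketch below. It assumes the file is already available as a list of byte-string pieces of content, that N is even, that SHA-256 and BLAKE2b serve as the two hash functions, and that "minimum file content" means the fewest bytes assigned so far; none of these choices is specified by the description.

```python
import hashlib

# Two different hash functions, one per file block table (an illustrative choice).
HASH_FUNCTIONS = (hashlib.sha256, hashlib.blake2b)


def key_for(content, hash_function, table_size):
    """Map one piece of file content to a key value in range(table_size)."""
    digest = hash_function(content).digest()
    return int.from_bytes(digest[:8], "big") % table_size


def split_into_blocks(contents, num_blocks):
    """Assign each piece of content to the less-filled of its two candidate blocks."""
    if num_blocks < 2 or num_blocks % 2:
        raise ValueError("num_blocks must be an even number >= 2")
    half = num_blocks // 2
    tables = [[[] for _ in range(half)] for _ in range(2)]  # N/2 blocks per table
    sizes = [[0] * half for _ in range(2)]                  # bytes held per block
    for content in contents:
        # One candidate block per table, located by that table's hash function.
        candidates = [(t, key_for(content, HASH_FUNCTIONS[t], half)) for t in range(2)]
        table, key = min(candidates, key=lambda c: sizes[c[0]][c[1]])
        tables[table][key].append(content)
        sizes[table][key] += len(content)
    return tables[0] + tables[1]
```

Choosing the emptier of two hashed candidates is a well-known load-balancing idea; compared with a single hash function it tends to keep the fullest block much closer to the average.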
An embodiment of the present invention further provides a file uploading apparatus corresponding to the file uploading method of the foregoing embodiment. Fig. 7 is a structural block diagram of the file uploading apparatus provided by the embodiment of the present invention. As shown in fig. 7, the file uploading apparatus 700 includes: a file block number setting module 701, a file block table setting module 702, a hash function setting module 703, a hash mapping module 704, a selection module 705, a file splitting module 706, and an uploading module 707.
The file block number setting module 701 is configured to determine, according to the size of a file to be uploaded, the number of file blocks into which the file to be uploaded needs to be divided, where the file to be uploaded is composed of a plurality of file contents. The file block table setting module 702 is configured to allocate the number of file blocks to at least two file block tables; different file blocks are identified in each file block table by different key values. The hash function setting module 703 is configured to set a corresponding hash function for each file block table. The hash mapping module 704 is configured to sequentially calculate, through the hash function, the file block to which the file content belongs in each of the file block tables as a candidate file block. The selection module 705 is configured to select, from the candidate file blocks, the file block containing the least file content as the target file block corresponding to the file content. The file splitting module 706 is configured to split the file to be uploaded into a plurality of corresponding file blocks according to the target file block corresponding to the file content. The uploading module 707 is configured to upload at least two of the file blocks simultaneously in a multi-threaded manner.
The file uploading apparatus provided by the embodiment of the invention uses at least two different hash functions jointly to segment the file into file blocks, so that the sizes of the file blocks remain relatively uniform. This avoids the low file uploading efficiency that results from uneven file block sizes when a single hash function is used for segmentation, and thereby improves file uploading efficiency.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and that, to illustrate clearly the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative. The division of the units is only a logical division, and other divisions are possible in an actual implementation: units having the same function may be grouped into one unit, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electrical, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A file uploading method is characterized by comprising the following steps:
determining the number of file blocks of a file to be uploaded, which need to be divided, according to the size of the file to be uploaded, wherein the file to be uploaded consists of a plurality of file contents;
allocating the number of file blocks into at least two file block tables; different file blocks are identified in each file block table through different key values;
setting a corresponding hash function for each file block table;
sequentially calculating the file block of the file content in each file block table as a candidate file block through the hash function;
selecting a file block with the minimum file content from the candidate file blocks as a target file block corresponding to the file content;
segmenting the file to be uploaded into a plurality of corresponding file blocks according to the target file block corresponding to the file content;
and uploading at least two file blocks simultaneously in a multithreading mode.
2. The file uploading method according to claim 1, wherein the determining the number of file blocks into which the file to be uploaded needs to be divided according to the size of the file to be uploaded comprises:
pre-defining a plurality of different file size ranges, wherein each file size range has a corresponding number of file blocks into which a file needs to be divided;
determining the size range of the file where the file to be uploaded is located;
and determining the number of file blocks of the file to be uploaded, which need to be divided, according to the size range of the file.
3. The file uploading method according to claim 2, wherein the uploading at least two of the file blocks simultaneously in a multi-threaded manner comprises:
estimating the uploading time required by each file block;
placing file blocks with uploading time differences within a preset range in the same task list to be uploaded;
starting a plurality of uploading threads, and simultaneously transmitting file blocks in the same task list to be uploaded.
4. The file uploading method according to claim 3, wherein the estimating of the uploading time required for each file block comprises:
and calculating the uploading time required by each file block through a breadth-first search algorithm.
5. The file uploading method according to claim 3, wherein the placing of the file blocks with the difference of the uploading time within the preset range in the same task list to be uploaded comprises:
arranging all file blocks to form an unordered sequence;
selecting one file block in the unordered sequence as a reference file block;
sequentially comparing the uploading time between the reference file block and other file blocks in the unordered sequence along the forward direction of the unordered sequence;
when the uploading time of the reference file block is less than that of other file blocks, exchanging the positions of the reference file block and the other file blocks in the sequence, so that all the file blocks with the uploading time greater than that of the reference file block are moved to the right side of the reference file block;
sequentially comparing the uploading time between the reference file block and other file blocks in the unordered sequence along the reverse direction of the unordered sequence;
when the uploading time of the reference file block is longer than that of other file blocks, exchanging the positions of the reference file block and the other file blocks in the sequence, so that all the file blocks with the uploading time shorter than that of the reference file block are moved to the left side of the reference file block;
re-selecting a new reference file block until the unordered sequence is updated to an ordered sequence arranged according to the size of uploading time;
and segmenting the file blocks in the ordered sequence into a corresponding list to be uploaded according to the preset range.
6. The file uploading method according to claim 1, wherein the allocating the file block number into at least two file block tables comprises:
evenly allocating the number of file blocks into a first file block table and a second file block table, wherein the first file block table and the second file block table each contain N/2 file blocks, and N is the number of file blocks into which the file needs to be divided;
in the first file block table, identifying N/2 file blocks by sequentially increasing first key values;
and identifying N/2 file blocks in the second file block table through sequentially increasing second key values.
7. The file uploading method according to claim 6, wherein said sequentially calculating, by the hash function, the file block to which the file content belongs in each of the file block tables as the candidate file block comprises:
calculating a first key value and a second key value corresponding to the file content through a first hash function and a second hash function respectively, wherein the first hash function corresponds to the first file block table, and the second hash function corresponds to the second file block table;
determining the file block to which the file content belongs in the first file block table according to the first key value corresponding to the file content; and
determining the file block to which the file content belongs in the second file block table according to the second key value corresponding to the file content.
8. A file uploading apparatus, comprising:
the file block quantity setting module is used for determining the quantity of file blocks of the file to be uploaded, which need to be divided, according to the size of the file to be uploaded, and the file to be uploaded consists of a plurality of file contents;
the file block table setting module is used for distributing the number of the file blocks to at least two file block tables; different file blocks are identified in each file block table through different key values;
the hash function setting module is used for setting a corresponding hash function for each file block table;
the hash mapping module is used for sequentially calculating the file blocks of the file contents in each file block table as candidate file blocks through the hash function;
the selecting module is used for selecting a file block with the minimum file content from the candidate file blocks as a target file block corresponding to the file content;
the file segmentation module is used for segmenting the file to be uploaded into a plurality of corresponding file blocks according to the target file blocks corresponding to the file contents;
and the uploading module is used for simultaneously uploading at least two file blocks in a multithreading mode.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the file upload method of any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the file upload method according to any one of claims 1 to 7.
CN202010222822.XA 2020-03-26 2020-03-26 File uploading method and device and computer equipment Pending CN111586094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010222822.XA CN111586094A (en) 2020-03-26 2020-03-26 File uploading method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010222822.XA CN111586094A (en) 2020-03-26 2020-03-26 File uploading method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN111586094A true CN111586094A (en) 2020-08-25

Family

ID=72113532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010222822.XA Pending CN111586094A (en) 2020-03-26 2020-03-26 File uploading method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN111586094A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7797323B1 (en) * 2006-10-11 2010-09-14 Hewlett-Packard Development Company, L.P. Producing representative hashes for segments of a file
CN103873507A (en) * 2012-12-12 2014-06-18 鸿富锦精密工业(深圳)有限公司 Data block uploading and storing system and method
US20150058301A1 (en) * 2013-08-20 2015-02-26 International Business Machines Corporation Efficient data deduplication in a data storage network
CN105868305A (en) * 2016-03-25 2016-08-17 西安电子科技大学 A fuzzy matching-supporting cloud storage data dereplication method
CN107426331A (en) * 2017-08-09 2017-12-01 北京天信瑞安信息技术有限公司 A kind of file uploading method and device based on JavaScript
CN109359099A (en) * 2018-08-21 2019-02-19 中国平安人寿保险股份有限公司 Distributed document method for uploading, device, computer equipment and storage medium
CN109756568A (en) * 2018-12-29 2019-05-14 上海掌门科技有限公司 Processing method, equipment and the computer readable storage medium of file
CN110569213A (en) * 2018-05-18 2019-12-13 北京果仁宝软件技术有限责任公司 File access method, device and equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112637341A (en) * 2020-12-22 2021-04-09 平安银行股份有限公司 File uploading method and device, electronic equipment and storage medium
CN112637341B (en) * 2020-12-22 2022-12-13 平安银行股份有限公司 File uploading method and device, electronic equipment and storage medium
CN113411393A (en) * 2021-06-17 2021-09-17 中国工商银行股份有限公司 File pushing method and device
CN117155922A (en) * 2023-10-31 2023-12-01 国家超级计算天津中心 File transmission method and device
CN117155922B (en) * 2023-10-31 2024-01-30 国家超级计算天津中心 File transmission method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination