CN117667517A - Distributed file processing method, device, server, equipment and storage medium

Info

Publication number: CN117667517A
Application number: CN202311699821.4A
Authority: CN
Inventors: 刘洋; 李华庆; 瞿盛辉
Original assignee: Beijing Hexin Digital Technology Co ltd; Hexin Technology Co ltd
Current assignee: Beijing Hexin Digital Technology Co ltd; Hexin Technology Co ltd
Priority date: 2023-12-11
Filing date: 2023-12-11
Publication date: 2024-03-08

Abstract

The invention provides a distributed file processing method, device, server, equipment and storage medium. The method comprises the following steps: acquiring a first characteristic value of a file to be stored, and obtaining, according to the first characteristic value, a first characteristic value index for identifying the file to be stored; sequentially extracting first data streams of a preset size from the file to be stored, arranging each first data stream in a data frame with the same size as a preset mask, and performing a mask operation on the data frame according to the preset mask to obtain m first fragmented files, wherein m is a positive integer and the preset mask is a rectangular mask divided according to the number of the first fragmented files; and sequentially transmitting the m first fragmented files and the first characteristic value index to a preset server, so that the preset server stores the m first fragmented files under the first characteristic value index. The invention can improve file storage efficiency.

Description

Distributed file processing method, device, server, equipment and storage medium

Technical Field

The present invention relates to the field of data processing technologies, and in particular to a distributed file processing method, device, server, equipment, and storage medium.

Background

In the prior art for storing and recovering files, either a block of space is reserved on the local machine for hidden storage, or the file to be stored is divided into at least two object files and each object file is mapped to one virtual storage node to avoid occupying storage space. Another, improved prior art stores files in sequential segments. It is capable of recovering files, but because it reads and divides the file in sequential segments, its time complexity is that of the file length; that is, when a file of length m² is stored, the time complexity is O(m²), and the file storage efficiency is low.

Disclosure of Invention

The invention aims to overcome the defects in the prior art and provide a distributed file processing method, a device, a server, equipment and a storage medium, which can improve the file storage efficiency.

In a first aspect, the present invention provides a distributed file processing method, including:

acquiring a first characteristic value of a file to be stored, and acquiring a first characteristic value index for identifying the file to be stored according to the first characteristic value;

sequentially extracting first data streams of a preset size from the file to be stored, arranging each first data stream in a data frame with the same size as a preset mask, and performing a mask operation on the data frame according to the preset mask to obtain m first fragmented files; wherein m is a positive integer; the preset mask is a rectangular mask divided according to the number of the first fragmented files;

and sequentially transmitting the m first fragmented files and the first characteristic value index to a preset server, so that the preset server stores the m first fragmented files under the first characteristic value index.

According to the method, the data stream of the file to be stored is arranged in a data frame with the same size as the preset mask and is segmented with a rectangular mask divided according to the number of the first fragmented files, so that one data stream is divided into a plurality of fragmented files at the same time. This improves the efficiency of storing the file in fragments on the preset server and greatly improves the user's experience of processing files.

With reference to the first aspect, in one possible implementation manner, the arranging the first data stream in a data frame with the same size as a preset mask, performing a masking operation on the data frame according to the preset mask, to obtain m first fragment files includes:

arranging the first data stream in rows of a plurality of bits each, and arranging a plurality of such rows to obtain a rectangular data frame with the same size as the preset mask;

and performing a mask operation on the rectangular data frame according to a preset mask divided into m areas, obtaining m bit data partitions at the same time, and taking the bit data of the first data stream corresponding to each bit data partition as one first fragmented file, so as to obtain m first fragmented files in total.

The invention arranges the data stream in a rectangular data frame with the same size as a preset mask and performs a mask operation on the rectangular data frame with the preset mask to obtain a plurality of fragmented files at the same time. Processing a data stream of length m² in this way has a lower time complexity than the O(m²) of the prior art, which reads the data stream in sequence and divides it into m consecutive segments of data. By constructing the rectangular data frame and the preset mask, the invention trades space consumption for time consumption, so that the file slicing time can be greatly reduced, the file slicing efficiency is improved, and the user's experience of storing files is further improved.

With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, setting the preset mask includes:

constructing a mask which takes bits as a unit and has a size of m rows and m columns according to the byte length of the first data stream, and equally dividing the mask into m areas in a diagonal direction to obtain the preset mask; wherein the size of the preset mask is not smaller than the byte length.

The method adopts the preset mask which equally divides the mask into m areas in the diagonal direction, so that after the mask operation is carried out on the preset mask and the data frame, a plurality of bit data partitions are obtained simultaneously, and each bit data partition is used as a slicing file, thereby improving the efficiency of file slicing; and when a segmented file is stolen, the internal data of the segmented file divided by adopting diagonal lines is discontinuous and has higher randomness, so that even if the stolen segmented file is cracked, a section of complete data cannot be obtained from the segmented file, and the safety of file storage is improved.

With reference to the first aspect and the foregoing possible implementation manner, in another possible implementation manner, after the preset server stores the m first fragmented files under the first characteristic value index, searching, according to a second characteristic value index of a file to be processed, for the corresponding file to be downloaded so as to perform file restoration or file transmission includes:

after receiving the first characteristic value index, a preselected master server establishes a file index table that records the storage status of the master server and the slave servers, according to the first characteristic value index, the at least one first fragmented file it has received, and the at least one preselected slave server that receives the first characteristic value index and the remaining first fragmented files; the master server and the slave servers are selected in real time according to an ant colony algorithm with heuristic marking and positive feedback;

when a file to be restored needs to be restored or a file to be transmitted needs to be transmitted, searching the file index table for the corresponding file to be downloaded according to the second characteristic value index of the file to be processed, and carrying out file restoration or file transmission.

The invention uses the master server and the slave servers to store the fragmented files, and the master server establishes the file index table according to the first characteristic value index and the fragmented files, thereby completing the file recovery mechanism on the basis of encrypted transmission based on file fragments and making the file processing applicable to more scenes.

With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, selecting the master server and the slave servers in real time according to an ant colony algorithm with heuristic marking and positive feedback includes:

each time a file to be stored is uploaded, sending the same test data to a plurality of servers, and tracking, for each server, the average delay per unit of data volume with which the server receives the test data, processes it and returns a processing result;

the next time test data is sent, making the first sub-test data volume sent to the server with the minimum average delay larger than the second sub-test data volume sent to the remaining servers, and tracking the average delay again;

and after repeated tests, marking the server to which all the test data is sent as the master server, the remaining servers being slave servers.

The method adopts the division of the master server and the slave servers, and can select the optimal server for storing the fragmented files, thereby improving the file transmission efficiency; in addition, as the master server and the slave server which are selected in real time are used for storing the files each time, the master server and the slave server which are used for storing the files each time are different, the randomness of storing the files is improved, and the safety of storing the files can be further improved through the selection of different master and slave servers.
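The selection loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented algorithm: `measure_delay` is a hypothetical probe supplied by the caller, the "half of the share" reinforcement rule and the fixed number of rounds are invented for the example, and the master is taken to be the server that ends up with the largest share of test data.

```python
def select_master(servers, measure_delay, rounds=5, total_units=100.0):
    """Return (master, slaves) after repeated delay tests.

    measure_delay(server, units) is a hypothetical probe that returns the
    average delay per unit of test data observed for that server.
    """
    # First round: every server receives the same amount of test data.
    share = {s: total_units / len(servers) for s in servers}
    for _ in range(rounds):
        delays = {s: measure_delay(s, share[s]) for s in servers}
        best = min(delays, key=delays.get)
        # Positive feedback (assumed rule): move half of every other
        # server's share of test data to the currently fastest server.
        for s in servers:
            if s != best:
                moved = share[s] / 2
                share[s] -= moved
                share[best] += moved
    master = max(share, key=share.get)
    slaves = [s for s in servers if s != master]
    return master, slaves
```

With fixed delays, the test data converges on the fastest server, which is then treated as the master; a real deployment would re-run the tests at every upload, as the text describes.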

With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the searching the file to be downloaded corresponding to the file index table according to the second eigenvalue index of the file to be processed to restore the file includes:

obtaining a second characteristic value of a file to be processed, carrying out hash calculation on the second characteristic value to obtain a second characteristic value index, and inquiring the second characteristic value index in the file index table to obtain a first downloading fragmented file in the main server;

meanwhile, the master server initiating global matching of the second characteristic value index on all slave servers to obtain the remaining second download fragmented files;

and sending the first download fragmented file and the second download fragmented file to a user terminal in batches, so that the user terminal reorganizes the first download fragmented file and the second download fragmented file according to the preset mask to obtain the file to be recovered or the transmission file.

The invention obtains the corresponding download fragmented files from the main server and the slave server through the characteristic values of the files to be processed, and provides a file recovery mechanism on the basis of the file storage mechanism, thereby being capable of being applied to the comprehensive scene of file transmission, file encryption and file recovery and having higher applicability and practicability.
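The reorganization step performed by the user terminal can be sketched as follows. The sketch assumes one particular reading of the diagonal mask that the text does not spell out: cell (row, col) of an m x m frame belongs to fragment (row + col) mod m, and bits inside a fragment keep the row-major order of the frame. The function name is hypothetical.

```python
def reassemble(shards, m):
    """Rebuild one frame of m*m bits from m fragments of m bits each.

    Assumption: cell (row, col) belongs to fragment (row + col) % m and each
    fragment stores its bits in the row-major order of the frame.
    """
    cursor = [0] * m          # next unread position in each fragment
    frame = []
    for row in range(m):
        for col in range(m):
            k = (row + col) % m
            frame.append(shards[k][cursor[k]])
            cursor[k] += 1
    return frame
```

Because the same mask rule is used for splitting and for reassembly, the terminal needs only the fragments and the value of m to restore the frame.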

With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the distributing the first download fragmented file and the second download fragmented file to the user terminal includes:

and when the second characteristic value of the first download fragmented file is consistent with the second characteristic value of the second download fragmented file, erasing the second characteristic value of the second download fragmented file to obtain a sub-file to be processed, so that the first download fragmented file and the sub-file to be processed are sent to a user terminal in batches.

With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the sequentially transmitting the m first fragmented files and the first characteristic value index to a preset server includes:

transmitting the m first fragmented files and the first characteristic value index to m-1 preset slave servers and 1 master server, respectively; wherein each slave server receives one first fragmented file and the characteristic value index, and the master server receives one first fragmented file and the characteristic value index; the number of the first fragmented files is determined by the total number of the slave servers and the master server.

According to the method, the number of the first fragmented files is determined by the total number of servers, and each server receives one first fragmented file each time. If data is stolen from one server, complete data cannot be obtained even after decoding; the complete data can be obtained only from the data of all servers, which improves the security of file storage. In addition, because the number of the first fragmented files is determined by the total number of the slave servers and the master server, idle servers are avoided and the utilization rate of the servers is improved.
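A minimal sketch of this one-fragment-per-server assignment is shown below. The function name and the use of plain strings as server identifiers are assumptions made for illustration; actual network transmission is omitted.

```python
def assign_shards(shards, index, master, slaves):
    """Pair each of the m fragments with one server: 1 master plus m-1 slaves."""
    servers = [master] + list(slaves)
    if len(shards) != len(servers):
        raise ValueError("number of fragments must equal total number of servers")
    # Every server receives exactly one fragment together with the index.
    return {server: (index, shard) for server, shard in zip(servers, shards)}
```

For example, `assign_shards([b"a", b"b", b"c"], "idx", "m0", ["s1", "s2"])` gives each of the three servers one fragment and the same index, so no single server holds the whole file.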

With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the sequentially extracting a first data stream of a preset size from the file to be stored includes:

arranging binary data streams of the file to be stored into square data blocks of M x M units according to bytes, and extracting binary data streams of L units from the square data blocks each time as a first data stream; wherein M and L are both positive integers.

With reference to the first aspect and the foregoing possible implementation manners, in another possible implementation manner, the obtaining a first feature value of a file to be stored, and according to the first feature value, obtaining a first feature value index that identifies the file to be stored includes:

acquiring an absolute path file name of the file to be stored on a user terminal, and taking the absolute path file name as a first sub-characteristic value; or, performing hash calculation on file contents at a preset position of the file to be stored to obtain a second sub-characteristic value, so that a relation table is established according to the second sub-characteristic value and the name of the file to be stored; wherein the first characteristic value includes: the first sub-characteristic value or the second sub-characteristic value;

and carrying out hash operation on the first sub-characteristic value or the second sub-characteristic value to obtain a first characteristic value index for identifying the file to be stored.

The invention adopts the absolute path file name on the user terminal and the Hash value of part of file content to respectively obtain the unique characteristic value, thereby reducing Hash (Hash) conflict and increasing file confidentiality.

In a second aspect, the present invention provides a distributed file processing apparatus, including: the device comprises a first characteristic value acquisition module, a segmentation module and a storage module; wherein,

the first characteristic value acquisition module is used for acquiring a first characteristic value of a file to be stored and acquiring a first characteristic value index for identifying the file to be stored according to the first characteristic value;

the segmentation module is used for sequentially extracting first data streams of a preset size from the file to be stored, arranging each first data stream in a data frame with the same size as a preset mask, and performing a mask operation on the data frame according to the preset mask to obtain m first fragmented files; wherein m is a positive integer; the preset mask is a rectangular mask divided according to the number of the first fragmented files;

the storage module is used for sequentially transmitting the m first fragmented files and the first characteristic value indexes to a preset server, so that the preset server stores the m first fragmented files under the first characteristic value indexes.

In a third aspect, the present invention provides a server comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the distributed file processing method according to the first aspect when the computer program is executed.

In a fourth aspect, the present invention provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the distributed file processing method according to the first aspect when the computer program is executed.

In a fifth aspect, the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the distributed file processing method according to the first aspect.

Drawings

FIG. 1 is a schematic flow chart of a distributed file processing method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a preset mask with 3-fold symmetry according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a preset mask with 5-fold symmetry according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a preset mask with 7-fold symmetry according to an embodiment of the present application;

FIG. 5 is a schematic flow chart of an ant colony algorithm with heuristic marking and positive feedback according to an embodiment of the present application;

FIG. 6 is a type I storage flowchart of a file to be stored according to an embodiment of the present disclosure;

FIG. 7 is a type II storage flowchart of a file to be stored according to an embodiment of the present disclosure;

FIG. 8 is a flowchart of restoring a file to be restored according to an embodiment of the present application;

FIG. 9 is a schematic flow chart of file recovery for a type II storage file according to an embodiment of the present application;

FIG. 10 is a schematic flow chart of file transfer according to an embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of a distributed file processing apparatus according to an embodiment of the present disclosure;

FIG. 12 is a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the present disclosure without creative effort fall within the scope of the present disclosure.

It should be noted that the file processing method may be deployed on a remote physical server cluster for external access. The cluster consists of a plurality of remote physical servers that together form a cluster system providing services externally. The served user terminal does not need to know where the background service is deployed; it only needs to grant the service management authority over the file or file fragments. That is, once the user terminal uploads a file or file fragment to the server for hosting, the server holds the storage right, the use right, and so on, of the file. However, because of this authorization requirement, there is a risk of accidentally leaking the user's file, so the processing method needs to apply certain encryption processing to the file to be stored when the user terminal uploads it.

If the user does not want to authorize the service to access the content of the file, the user may pre-encrypt the file to be stored, for example with a symmetric key known only to the user, and then upload the encrypted file to the server. In that case, even if the service combines the data held on all physical servers, it can obtain only the ciphertext of the file, because it does not have the decryption key.

The present application provides a distributed file processing method in which the user's file to be stored is extracted through masks in a uniform and discontinuous manner, which improves the security of file storage and the efficiency of file access, and in which the file is uploaded to a remote server through a matching algorithm, which ensures the integrity and confidentiality of the file. To better illustrate the technical solutions of the present application, specific embodiments are described below.

Example 1

Referring to fig. 1, a flow chart of a distributed file processing method provided in an embodiment of the present application includes steps S11 to S13, specifically:

Step S11, obtaining a first characteristic value of a file to be stored, and obtaining a first characteristic value index for identifying the file to be stored according to the first characteristic value.

In some embodiments of the present application, obtaining a first feature value of a file to be stored, and obtaining a first feature value index identifying the file to be stored according to the first feature value includes: acquiring an absolute path file name of a file to be stored on a user terminal, and taking the absolute path file name as a first sub-characteristic value; or, performing hash calculation on file contents of a preset position of the file to be stored to obtain a second sub-characteristic value, so that a relation table is established according to the second sub-characteristic value and the file name to be stored; wherein the first feature value includes: the first sub-feature value or the second sub-feature value; and carrying out hash operation on the first sub-characteristic value or the second sub-characteristic value to obtain a first characteristic value index for identifying the file to be stored.

It should be noted that, when a file to be stored is uploaded, a feature value unique_value (uv) needs to be extracted for it. After a Hash operation is performed on this unique identifier, the result serves as the identifier under which the file is stored on the preset server and is called the feature value index Hash(uv). When the preset server stores a large number of files, a file index table StrTbl of (key, value) = (feature value index, file content) is constructed based on the feature value uv, so that the relevant content of a file to be processed can be extracted quickly during file restoration or file transmission.

As an optional embodiment of the application, an absolute path file name of a file to be stored on a user terminal is obtained, the absolute path file name is used as a first sub-characteristic value, and after hash operation is performed on the first sub-characteristic value, a first characteristic value index for identifying the file to be stored is obtained.

It should be noted that, owing to the characteristics of the Hash operation as used in current commercial cryptography applications, the results obtained by performing the Hash operation on different contents are practically unique, so the characteristic value index of the file to be stored can be guaranteed to be unique. Compared with the original characteristic value (such as a file name or file content), the characteristic value index after the Hash operation is also more uniform and easier to manage.

As another optional embodiment of the present application, hash calculation is performed on file contents of a preset position of the file to be stored to obtain a second sub-feature value, so that a relationship table is established according to the second sub-feature value and the file name to be stored; and carrying out hash operation on the second sub-characteristic value to obtain a first characteristic value index for identifying the file to be stored.

It should be noted that, to reduce Hash (Hash) conflicts and increase file confidentiality, the file content of the preset location may be the beginning of a specific length of the file to be stored.

As a preferred embodiment of the application, the first 1K Bits of the file to be stored are selected for the Hash operation; if the file to be stored is shorter than 1K Bits, it is automatically padded with 0 before the Hash operation is performed, to obtain the second sub-characteristic value.
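The preferred embodiment above can be sketched as follows. The text does not name a hash algorithm, so SHA-256 from the Python standard library is used here purely as an assumption, "1K Bits" is read as 1024 bits (128 bytes), and the function names are hypothetical.

```python
import hashlib

PREFIX_BITS = 1024                 # "first 1K Bits", read here as 1024 bits
PREFIX_BYTES = PREFIX_BITS // 8

def second_sub_feature(content: bytes) -> bytes:
    """Hash the first 1K bits of the file, zero-padding shorter files."""
    prefix = content[:PREFIX_BYTES].ljust(PREFIX_BYTES, b"\x00")
    return hashlib.sha256(prefix).digest()   # SHA-256 is an assumed choice

def feature_index(sub_feature: bytes) -> str:
    """Hash the sub-characteristic value again to get the characteristic value index."""
    return hashlib.sha256(sub_feature).hexdigest()
```

Note that under this reading two files sharing the same first 1K bits yield the same second sub-characteristic value, and a short file is indistinguishable from the same file with trailing zero bytes up to the prefix length.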

As another alternative embodiment of the present application, the first characteristic value is extracted from a file underlying storage identifier, such as an inode or a tail portion byte.

It should be noted that, when the Hash calculation is performed according to the second sub-feature value to obtain the first feature value index, based on the self-characteristics of the Hash algorithm and the controllability of the file length, it can be ensured that the second sub-feature value is unique at the server side. Because the user only provides the file name of the file to be stored when using the processing method, the remote service is required to additionally store a relationship table delatbl of (key, value) = (file name, second sub-feature value), and the relationship table records the corresponding relationship between the file name of the file to be stored and the second sub-feature value, so that when the file is restored or transmitted, the related content of the file to be processed is quickly extracted.
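The two tables mentioned so far, the file index table StrTbl and the relationship table keyed by file name, can be pictured with in-memory dictionaries. This is only a sketch: the variable and function names are hypothetical, SHA-256 is an assumed hash, and a real service would persist the tables on the servers.

```python
import hashlib

str_tbl = {}    # characteristic value index -> stored fragment content (StrTbl)
rela_tbl = {}   # file name -> second sub-characteristic value (relationship table)

def store(file_name, sub_feature, shard):
    """Record a fragment under the index derived from the sub-characteristic value."""
    index = hashlib.sha256(sub_feature).hexdigest()
    rela_tbl[file_name] = sub_feature
    str_tbl.setdefault(index, []).append(shard)
    return index

def lookup(file_name):
    """Resolve a file name to its stored fragments via the two tables."""
    sub_feature = rela_tbl[file_name]
    index = hashlib.sha256(sub_feature).hexdigest()
    return str_tbl.get(index, [])
```

The user supplies only the file name; the relationship table recovers the second sub-characteristic value, and hashing it again gives the key into StrTbl.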

In this method, the characteristic value may be formed by combining and splicing the user space and the file path name into a single character string; alternatively, the 1K Bits at the beginning of the file to be processed can be extracted as the characteristic value of the file, so that Hash conflicts of the characteristic value can be reduced.

It is worth to say that, when uploading the file to be stored, the processing method also needs to perform certain encryption processing on the file to be stored. Because the user's files are to be hosted by the server, the processing method performs certain separation, segmentation, fragmentation or encryption on the files to be stored in order to prevent the user's files from being accidentally leaked during the transmission process or in the storage of the server side. The method ensures that the complete original file of the user is not reserved on any single remote physical server, and continuous file data is not stored on one fragment.

Step S12, sequentially extracting first data streams of a preset size from the file to be stored, arranging each first data stream in a data frame with the same size as a preset mask, and performing a mask operation on the data frame according to the preset mask to obtain m first fragmented files; wherein m is a positive integer; the preset mask is a rectangular mask divided according to the number of the first fragmented files.

In some embodiments of the present application, sequentially extracting the first data stream with the preset size from the file to be stored includes: arranging binary data streams of the file to be stored into square data blocks of M x M units according to bytes, and extracting binary data streams of L units from the square data blocks each time as a first data stream; wherein M and L are both positive integers.

In an alternative embodiment of the present application, the square data block may also be a rectangular data block, a circular data block, an elliptical data block, or other irregular data block, where the rectangular mask also changes accordingly based on the shape of the corresponding data block to accommodate the different shape data block.

It should be noted that the data block is shaped to facilitate rapid extraction of data. When a data amount of L = 1 unit is extracted from the square data block, the data stream in that unit is arranged into a pattern with the same size as the preset mask.

In an optional embodiment of the present application, the file to be stored includes: binary files or text files, etc.

As a preferred embodiment of the present application, the file to be stored in this embodiment is a binary file.

In some embodiments of the present application, arranging the first data stream in a data frame with the same size as a preset mask and performing a mask operation on the data frame according to the preset mask to obtain m first fragmented files includes: arranging the first data stream in rows of a plurality of bits each, and arranging a plurality of such rows to obtain a rectangular data frame with the same size as the preset mask; and performing a mask operation on the rectangular data frame according to a preset mask divided into m areas, obtaining m bit data partitions at the same time, and taking the bit data of the first data stream corresponding to each bit data partition as one first fragmented file, so as to obtain m first fragmented files in total.

In one embodiment of the present application, arranging the first data stream in a data frame with the same size as a preset mask and performing a mask operation on the data frame according to the preset mask to obtain m first fragmented files includes: arranging the first data stream in rows of m bits each, and arranging m rows to obtain a rectangular data frame with the same size as the preset mask; and performing a mask operation on the rectangular data frame according to a preset mask divided into m areas, obtaining m bit data partitions at the same time, and taking the bit data of the first data stream corresponding to each bit data partition as one first fragmented file, so as to obtain m first fragmented files in total.

In an optional embodiment of the present application, the first data stream is arranged in columns of m bits each, and m columns are arranged to obtain a rectangular data frame with the same size as the preset mask; a mask operation is then performed on the rectangular data frame according to a preset mask divided into m areas, m bit data partitions are obtained at the same time, and the bit data of the first data stream corresponding to each bit data partition is taken as one first fragmented file, so that m first fragmented files are obtained in total.

In this method, the data stream is arranged in a rectangular data frame with the same size as a preset mask, and a mask operation is performed on the rectangular data frame with the preset mask to obtain a plurality of fragmented files at the same time. Processing a data stream of length m² in this way has a lower time complexity than the O(m²) of the prior art, which reads the data stream in sequence and divides it into m consecutive segments of data. By constructing the rectangular data frame and the preset mask, the method trades space consumption for time consumption, so that the file slicing time can be greatly reduced, the file slicing efficiency is improved, and the user's experience of storing files is improved.
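The mask operation itself can be illustrated as follows. The sketch assumes the mask is given as an m x m table of region labels 0..m-1 and that the frame is filled row by row; the function name is hypothetical. The explicit Python loop is for readability only and says nothing about the time complexity claimed in the text, which would depend on how the m partitions are extracted in parallel.

```python
def split_frame(bits, mask):
    """Split one frame of bits into fragments using a label mask.

    bits: flat list of m*m bits in row-major order.
    mask: m x m list of lists; mask[r][c] is the fragment number of cell (r, c).
    """
    m = len(mask)
    if len(bits) != m * m:
        raise ValueError("frame must hold exactly m*m bits")
    shards = [[] for _ in range(m)]
    for r in range(m):
        for c in range(m):
            shards[mask[r][c]].append(bits[r * m + c])
    return shards
```

Each output list corresponds to one bit data partition and would be stored as one first fragmented file.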

In some embodiments of the present application, setting the preset mask includes: constructing a mask which takes bits as a unit and has a size of m rows and m columns according to the byte length of the first data stream, and equally dividing the mask into m areas in a diagonal direction to obtain the preset mask; wherein the size of the preset mask is not smaller than the byte length.

It should be noted that the size of the preset mask is larger than the byte length in the following case: when data is extracted from the first data stream for the last time, the remaining data may be insufficient to fill the m rows and m columns of mask bits, in which case the remaining positions of the mask square are padded with 0s. Before the last extraction from the data stream, the size of the preset mask is equal to the byte length.
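The zero padding of the last extraction can be illustrated with a minimal sketch. The function name `frames_from_bits` and the representation of the data stream as a list of 0/1 integers are assumptions made only for illustration; they do not appear in the application.

```python
def frames_from_bits(bits, m):
    """Split a list of 0/1 bits into m x m frames, zero-padding the last one."""
    size = m * m
    frames = []
    for start in range(0, len(bits), size):
        chunk = bits[start:start + size]
        # Only the final extraction can be short; pad it with 0s.
        chunk = chunk + [0] * (size - len(chunk))
        # Arrange the chunk by rows: m rows of m bits each.
        frames.append([chunk[r * m:(r + 1) * m] for r in range(m)])
    return frames


bits = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]  # 11 bits with m = 3
frames = frames_from_bits(bits, 3)
```

Here the first frame is completely filled and the second frame holds the last two bits followed by seven padding 0s.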

The mask is equally divided into m areas by the setting of the direction of the diagonal line, so that after the mask operation is carried out on the preset mask and the data frame, a plurality of bit data partitions are obtained simultaneously, each bit data partition is used as a slicing file, and the file slicing efficiency is improved. In addition, the mask is equally divided into m areas, so that the data stream can be fairly divided into the fragmented files with the same length, and when the files are recovered, the divided areas can be used for quickly assembling and then recovering the plurality of received fragmented files, so that the file recovery efficiency is improved.

It should be noted that, in the prior art of encrypted transmission, a file to be transmitted is segmented so that the data in each segment file is continuous, and then either a digital envelope in a Public Key Infrastructure (PKI) encryption mechanism is adopted or the encryption key is updated periodically according to file information. However, once an entire segment file is stolen, a complete piece of data can be obtained by cracking that segment file, so the security of the stored file is low. In the present embodiment, the data inside each fragment file divided along the diagonal is discontinuous and highly random, so that even if a stolen fragment file is cracked, a complete data segment cannot be obtained, and the security of file storage is improved.

In an optional embodiment of the present application, the preset mask symmetrically divides the m equally divided regions along a diagonal line.

In a preferred embodiment of the present application, when the preset mask is divided into an odd number of regions, the region formed by the units on the diagonal is taken as the starting region, and (m-1)/2 regions of m units each are found upward, above the diagonal, along the region adjacent to the edge of the starting region; when the number of units found along the starting region is less than m, the region is expanded with units in the other diagonal direction that are adjacent to the units already found, to obtain a region of m units, until the two sides of the rectangle are reached, and the units corresponding to the two rectangular sides form the last region of the upper triangle; similarly, the same operation is performed in the lower triangle to divide (m-1)/2 regions of m units each, resulting in a preset mask equally divided into m regions.

Since the (m-1) regions are divided into regions adjacent along the diagonal line, the regions corresponding to the (m-1) regions are parallel along the diagonal line.

In an alternative embodiment of the present application, the mask may be divided by a left diagonal or a right diagonal to obtain the preset mask.

In an alternative embodiment of the present application, referring to fig. 2, a schematic diagram of a symmetric preset mask with 3 equal divisions is provided in the embodiment of the present application. In the figure, m=3 regions are divided along the left diagonal; that is, the preset mask is a 3*3 rectangular mask divided along the diagonal: the units on the diagonal form one region (region A), and the areas above and below the diagonal are each an independent region (region C and region B, respectively). Each region is a bit data partition. The preset mask divided into three regions and the square data block are subjected to a bitwise mask operation, and the data stream of each bit data partition of the square data block is obtained respectively.
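The 3*3 division described for fig. 2 can be sketched as follows. This is a minimal illustration, assuming the mask is represented as a grid of region labels rather than as separate bit masks combined by logical AND; the names `MASK_3`, `split_frame` and `merge_parts` are hypothetical. Gathering the bits under each label yields the same partitions as applying one bit mask per region.

```python
# Region labels for the 3x3 mask described for fig. 2:
# region A on the left diagonal, region C above it, region B below it.
MASK_3 = [
    ["A", "C", "C"],
    ["B", "A", "C"],
    ["B", "B", "A"],
]


def split_frame(frame, mask):
    """Collect the bits of each region, scanning the frame row by row."""
    parts = {}
    for frame_row, mask_row in zip(frame, mask):
        for bit, label in zip(frame_row, mask_row):
            parts.setdefault(label, []).append(bit)
    return parts


def merge_parts(parts, mask):
    """Inverse of split_frame: put each region's bits back in place."""
    cursors = {label: iter(bits) for label, bits in parts.items()}
    return [[next(cursors[label]) for label in mask_row] for mask_row in mask]


frame = [[1, 0, 1],
         [1, 0, 1],
         [1, 1, 0]]
parts = split_frame(frame, MASK_3)
restored = merge_parts(parts, MASK_3)
```

Each of the three partitions holds 3 bits that are not adjacent in the original row order, and `merge_parts` restores the frame when all three partitions and the same mask are available.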

In another alternative embodiment of the present application, referring to fig. 3, a schematic diagram of a symmetric preset mask with 5 equal divisions is provided in the embodiment of the present application. In the figure, m=5 regions are divided along the left diagonal; that is, the preset mask is a 5*5 rectangular mask in which the units on the diagonal form one region (region A), the two areas adjacent to the diagonal are each an independent region (region E and region D), and the remaining two regions (region C and region B) lie at the upper right and lower left. Region E consists of 4 E units and 1 E′ unit, where E′ is the unit adjacent to the E units in the right diagonal direction; similarly, region D consists of 4 D units and 1 D′ unit, where D′ is the unit adjacent to the D units in the right diagonal direction; region B and region C are each formed, as regions of m units, from the units remaining on the two sides of the rectangular mask. The preset mask divided into five regions and the square data block are subjected to a bitwise mask operation, and the data stream of each bit data partition of the square data block is obtained respectively.

In another alternative embodiment of the present application, referring to fig. 4, a schematic diagram of a symmetric preset mask with 7 equal divisions is provided in the embodiment of the present application. In the figure, m=7 regions are divided along the left diagonal; that is, the preset mask is a 7*7 rectangular mask in which the units on the diagonal form one region (region A), the two areas adjacent to the diagonal are each an independent region (region G and region F), two regions lie at the upper right and lower left (region C and region B), one region is sandwiched between the upper right region and the region above the diagonal (region E), and one region is sandwiched between the lower left region and the region below the diagonal (region D). The preset mask divided into seven regions and the square data block are subjected to a bitwise mask operation, and the data stream of each bit data partition of the square data block is obtained respectively. Symmetric masks with more equal divisions can be obtained in the same manner.

In another simple embodiment of the present application, the data stream is arranged on the mask bit by bit, from left to right, from right to left, from top to bottom, or from bottom to top, until m rows and m columns are arranged, and each column/row is taken as a fragment file.

It is worth noting that, for a preset mask constructed by arranging m rows and m columns from left to right / right to left / top to bottom / bottom to top and taking each column/row as a fragment file, the regularity among the fragment files is stronger, and the security is lower because the slicing rule is too simple. In contrast, in the diagonal slicing manner, slicing is performed along directions parallel to the diagonal, so the correlation of the data within a fragment file is lower and the randomness is stronger, and thus the security is higher. Notably, slicing the data stream with a mask generally has a lower time complexity than conventional slicing methods that traverse the file to obtain continuous data segments, because a mask can process a certain amount of the data stream in a batch.

In one embodiment of the present application, the first data stream is arranged in a data frame with the same size as a preset mask, and masking operation is performed on the data frame according to the preset mask, so as to obtain m first fragment files, including: arranging the first data stream by rows into bit data of m₁ bits per row, and arranging m₂ rows, to obtain a rectangular data frame with the same size as the preset mask; and performing masking operation on the rectangular data frame according to a preset mask dividing m areas, obtaining m bit data partitions at the same time, taking bit data corresponding to each bit data partition on the first data stream as 1 first fragment file, and obtaining m first fragment files in total.

It is worth noting that, for m₁ and m₂, the m bit data partitions divided by the rectangular mask divide the m₁*m₂ bits evenly, thereby obtaining m uniformly divided first fragment files, where each first fragment file contains (m₁*m₂)/m data bits, and (m₁*m₂)/m is a positive integer.
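The divisibility condition on the rectangular frame can be checked with a short sketch. The function name `bits_per_fragment` and the example dimensions are assumptions chosen only for illustration.

```python
def bits_per_fragment(m1, m2, m):
    """Return (m1 * m2) / m, requiring it to be a positive integer."""
    total = m1 * m2
    if m <= 0 or total % m != 0:
        raise ValueError("m1 * m2 must be evenly divisible by m")
    return total // m


# Illustrative values: a frame with 4 bits per row and 6 rows, split into 3 regions.
per_fragment = bits_per_fragment(4, 6, 3)
```

With these illustrative values the frame holds 24 bits, so each of the 3 first fragment files receives 8 bits per frame.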

As an optional embodiment of the present application, when performing mask slicing, a square is first determined based on the size of the binary form of the file to be stored; that is, the binary stream of the file to be stored is arranged sequentially in a square data block of size n=m×m=m². When extracting data streams from the square data block, n=m×m data are taken each time and placed sequentially on a preset mask in m rows and m columns. The preset mask is a numerical filter layer set in advance, which screens the data stream of the file to be stored bit by bit. After all contents of the file to be stored have been processed cyclically, the file is segmented into a plurality of data stacks of different classifications, and each data stack (that is, a fragment file after file masking) is stored in a different server, thereby completing the encrypted storage of the file.

As an alternative embodiment of the present application, when the size of the file to be stored is 345 KByte, since 18² = 324 < 345 < 19² = 361, m=19 is taken; that is, the binary stream of the file to be stored can be arranged into a square data block of 19 rows and 19 columns, each row-column unit stores 1 KB of data, and the tail units beyond 345 KByte are directly set to 0. During segmentation, a data stream of L=1 unit is taken from the square data block each time, that is, 1 KByte = 32×32 B is taken, so m=32 is used for the mask, and one bit is stored in each mask unit; the mask operation yields the classified data of this round, and after all row-column units have been processed, the classified data is stored in different servers to complete the segmentation.
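The choice of m=19 for a 345 KByte file can be reproduced with a small calculation. The helper name `square_side` is hypothetical; `math.isqrt` is the Python standard-library integer square root.

```python
import math


def square_side(units):
    """Smallest m such that m * m >= units."""
    m = math.isqrt(units)
    return m if m * m == units else m + 1


file_units = 345                        # file size in 1 KB units
m = square_side(file_units)             # side of the square data block
padding_units = m * m - file_units      # tail units that are set to 0
```

For 345 units this gives a 19×19 block with 16 tail units set to 0.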

As another optional embodiment of the present application, each time slicing is performed, a data stream of L=2 units is taken from the square data block, that is, 2 KByte = 2×32×32 B is taken, so m=32, and two bits are stored in each mask unit; the masking operation yields the classified data of this round, and after all row-column units have been processed, the classified data is stored in different servers to complete the segmentation.

It should be noted that, if a data stream of L=2 units is taken from the square data block, its data density is higher than that of a data stream of L=1 unit, so that a large file can be segmented quickly and the storage efficiency of large files is improved; if multiple threads are used for batch processing, with each thread using one region of the preset mask to extract a bit data partition from the data stream, the time complexity of processing one data stream is reduced to the constant level O(1), which can greatly improve the file slicing efficiency.

And S13, sequentially transmitting the m first fragmented files and the first eigenvalue indexes to a preset server, so that the preset server stores the m first fragmented files under the first eigenvalue indexes.

In some embodiments of the present application, transmitting the m first fragment files and the first eigenvalue index to a preset server sequentially includes: m first fragment files and the first characteristic value indexes are respectively transmitted to a preset m-1 slave servers and 1 master server; wherein each slave server receives a first fragmented file and one of the characteristic value indexes, and the master server receives a fragmented file and one of the characteristic value indexes; the number of the divided first fragmented files is determined by the total number of the slave servers and the master server.

In some embodiments of the present application, the preset server includes: 1 preselected master server and at least 1 preselected slave server.

In this embodiment, the number of first fragment files is determined by the total number of servers, and each server receives 1 first fragment file each time. When data is stolen from one server, complete data cannot be obtained even after cracking; complete data can be obtained only if the data of all servers is obtained, which improves the security of file storage. In addition, because the number of first fragment files is determined by the total number of slave servers and the master server, idle servers are avoided, which improves the utilization rate of the servers.

In some embodiments of the present application, after the preset server stores the m first fragment files under the first characteristic value index, searching for the corresponding file to be downloaded according to a second characteristic value index of the file to be processed, so as to perform file restoration or file transmission, includes: after the preselected master server receives the first characteristic value index and at least one first fragment file, and the preselected at least one slave server receives the first characteristic value index and the remaining first fragment files, the master server establishes a file index table according to the first characteristic value index to record the storage status of the master server and the slave servers; the master server and the slave servers are selected in real time according to an ant colony algorithm with positive feedback of heuristic marks; when a file to be restored needs to be restored or a file to be transmitted needs to be transmitted, the corresponding file to be downloaded is searched for in the file index table according to the second characteristic value index of the file to be processed, and file restoration or file transmission is carried out.

In some embodiments of the present application, the first feature value indexes are sequentially transmitted to a preset server, so that the master server and the slave server both receive the first feature value indexes, and the master server and the slave server respectively store the received first fragment files under the corresponding first feature indexes.

In some embodiments of the present application, after the first eigenvalue indexes are respectively combined with the m first fragmented files, the obtained m fragmented files to be transmitted are transmitted to the master server, so that the master server and the slave server respectively store the received first fragmented files under the first eigenvalue indexes.

According to the method and the device, the master server and the slave server are divided, the optimal server can be selected for storing the fragmented files, and therefore file transmission efficiency is improved; in addition, as the master server and the slave server which are selected in real time are used for storing the files each time, the master server and the slave server which are used for storing the files each time are different, the randomness of storing the files is improved, and the safety of storing the files can be further improved through the selection of different master and slave servers.

In some embodiments of the present application, the master server and the slave server are selected in real time according to the ant colony algorithm with the heuristic mark forward feedback, referring to fig. 5, which is a schematic flow chart of the ant colony algorithm with the heuristic mark forward feedback provided in the embodiments of the present application, including sub-steps S131 to S133, specifically:

Step S131, each time a file to be stored is uploaded, the same test data is sent to a plurality of servers, and the process of each server receiving the test data, processing it and returning the processing result is tracked to obtain the average delay per unit data amount.

In some embodiments of the present application, the test data may be obtained by performing a masking operation on a test data stream in a file to be stored according to a test mask smaller than a preset mask size, or may be preset data unrelated to the file to be stored.

Step S132, when the test data is sent the next time, the first sub-test data amount sent to the server with the smallest average delay is larger than the second sub-test data amount sent to the remaining servers, and the average delay is tracked again.

In some embodiments of the present application, the first sub-test data amount sent to the server with the smallest average delay per unit data amount is larger than the second sub-test data amount sent to the remaining servers; where the average delay per unit data amount is specifically: the total delay of receiving the test data, processing it and returning the result, divided by the data amount.

Step S133, the test is repeated a plurality of times until all the test data is concentrated on the same server, which is then marked as the master server; the remaining servers are slave servers.
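Steps S131 to S133 can be sketched as a simple positive-feedback loop. This is a simplified greedy illustration under stated assumptions, not a full ant colony optimization and not necessarily the selection procedure of the application: the function `select_master`, the callback `probe`, the fraction `shift` of data moved per round, and the simulated delays are all hypothetical.

```python
def select_master(servers, probe, total_units=120.0, shift=0.5, max_rounds=50):
    """Repeat probing until all test data is concentrated on one server.

    probe(server, units) must return the total delay for sending `units`
    of test data to `server` and receiving the processing result.
    """
    # S131: start by sending the same amount of test data to every server.
    share = {s: total_units / len(servers) for s in servers}
    best = None
    for _ in range(max_rounds):
        active = [s for s in servers if share[s] > 0]
        # Average delay per unit data amount = total delay / data amount.
        avg = {s: probe(s, share[s]) / share[s] for s in active}
        best = min(avg, key=avg.get)
        if len(active) == 1:
            break  # S133: all test data sits on one server, the master.
        # S132: next round, send more data to the lowest-delay server.
        for s in active:
            if s == best:
                continue
            moved = share[s] if share[s] * (1 - shift) < 1 else share[s] * shift
            share[s] -= moved
            share[best] += moved
    return best, share


# Simulated per-unit delays (assumed constant for the illustration).
delays = {"s1": 0.05, "s2": 0.02, "s3": 0.08}
master, final_share = select_master(list(delays), lambda s, u: delays[s] * u)
```

In this simulation the server with the lowest per-unit delay ends up holding all of the test data and is returned as the master; in a real deployment `probe` would measure actual network and processing delay.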

It should be noted that, when uploading the file to be stored, the connection between the location of the user terminal and the remote multiple servers may be affected by the region distance and the network bandwidth, so that the transmission performance of the remote multiple servers is uneven, so that the server with higher transmission performance may be used as the master server, and thus, based on the high performance of the master server, the matching search of the Hash (uv) may be performed first, and then the master server initiates the matching with the Hash (uv) value to all the slave servers, so that the previous task of initiating query matching with multiple servers is transferred to the master server with higher performance, and further physical resources are utilized.

It is worth noting that, since the master server receives the most data in the probing phase, its total delay is necessarily greater than that of the slave servers; therefore, the quantity to be compared is the average delay per unit data amount, that is, total delay / total data amount = average delay per unit data amount. Since the network performance of the master server is good, its average delay is the lowest.

In some embodiments of the present application, a redundant remote server is additionally added or/and an original master-slave server is additionally added, and the same bit portion is reserved, so that redundant backup of the processing method can be implemented, and the data disaster recovery capability and anti-malicious attack capability of the file processing method are improved, but at the cost of losing part of the storage capacity.

In an alternative embodiment of the present application, when the main server is selected by the file to be stored, the same test data test_data may be sent to all remote servers first, the average delay of the unit data volume obtained by each remote service receiving process and returning according to the total delay_time_all is tracked and recorded, some data is sent to the server with a tendency to be smaller delay time when the test data is sent next time, and the average delay_time of the unit data volume is repeatedly recorded (the overall time consumption for processing all the data is used for consuming the delay_time_all/the data volume test_bits), and each test tends to send more data to the server with the minimum average delay_time of the unit data volume, so that the data volumes test_bits of other servers become smaller gradually, and under the condition that the network performance is not enough, when all the final test data will be concentrated to the only one server, the main server is determined. When the file is uploaded to the master server, a mark is added to the master server, and query matching when transmission or recovery operation is performed is all forwarded to the slave server by the master server.

In another optional embodiment of the present application, when a main server is selected from a file to be stored, m first sharded files are used as test data test_data of m corresponding servers, each server processes 1 first sharded file and sends the first sharded file to all remote servers at the same time, each remote service receiving process is tracked and recorded, and an average time delay of the obtained unit data volume is returned according to the total time delay_time_all; when the next m first fragmented files are sent, the next m first fragmented files have tendency to send 1 more first fragmented file to the server with small delay time, and repeatedly record the average time delay_time of the unit data volume (the whole time consumption delay_time_all/data volume test_bits used for processing all data), each test sends more data to the server with the minimum average time delay delay_time of the unit data volume in a tendency way, so that the data volume test_bits of other servers become smaller gradually, and under the condition of poor network performance, finally, all test data are concentrated to only one server, and then the main server is determined. And when the file to be stored is not stored, adding a mark on the master server when the residual file of the file to be stored is uploaded to the master server, uploading the mark to the slave server without the mark, and forwarding all the inquiry matching in the process of executing the transmission or recovery operation to the slave server by the master server.

It should be noted that, when the master server is screened by using the file to be stored as test data, the mask size used may be smaller than the preset mask size. The master server and the slave servers are determined by using a small-size mask, while transmitting large batches of data with the large-size preset mask gives higher storage efficiency.

It should be noted that the primary server is selected according to multiple factors such as network and region, so that for the same file to be stored, once the primary server is completely uploaded, it is determined synchronously, that is, only one primary server is fixed for one file, but the primary servers may not be the same for different files.

In some embodiments of the present application, when a file to be restored needs to be restored or a file to be transmitted needs to be transferred, searching a file to be downloaded corresponding to the file index table according to a second eigenvalue index of the file to be processed to restore or transfer the file, including: obtaining a second characteristic value of a processing file, carrying out hash calculation on the second characteristic value to obtain a second characteristic value index, and inquiring the second characteristic value index in the file index table to obtain a first downloading fragmented file in the main server; meanwhile, initiating global matching of the second characteristic value indexes to all slave servers according to the master server to obtain the rest second download fragmented files; and sending the first download fragmented file and the second download fragmented file to a user terminal in batches, so that the user terminal reorganizes the first download fragmented file and the second download fragmented file according to the preset mask to obtain the file to be restored or the transmission file.
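The lookup flow above can be sketched with in-memory stand-ins for the servers. The sketch assumes SHA-256 as the hash function and uses a dictionary in place of the file index table; the class `Server` and the functions `index_of` and `download` are hypothetical names, and reassembly with the preset mask at the user terminal is not shown.

```python
import hashlib


def index_of(feature_value: bytes) -> str:
    """Hash the characteristic value to obtain its index (SHA-256 assumed)."""
    return hashlib.sha256(feature_value).hexdigest()


class Server:
    """In-memory stand-in for a master or slave server."""

    def __init__(self):
        self.store = {}  # index -> {region label: fragment bits}

    def put(self, index, region, fragment):
        self.store.setdefault(index, {})[region] = fragment

    def get(self, index):
        return self.store.get(index, {})


def download(master, slaves, feature_value):
    """Query the master first, then let it match the index on all slaves."""
    index = index_of(feature_value)
    fragments = dict(master.get(index))   # first download fragment file(s)
    for slave in slaves:                  # global matching on the slaves
        fragments.update(slave.get(index))
    return fragments


uv = b"example characteristic value"
master, slave_b, slave_c = Server(), Server(), Server()
master.put(index_of(uv), "A", [1, 0, 0])
slave_b.put(index_of(uv), "B", [1, 1, 1])
slave_c.put(index_of(uv), "C", [0, 1, 1])
fragments = download(master, [slave_b, slave_c], uv)
```

The user terminal would then reorganize the returned fragments according to the preset mask; a query with an unknown characteristic value simply returns no fragments in this sketch.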

In some embodiments of the present application, the step of distributing the first download fragment file and the second download fragment file to the user terminal in batches includes: and when the second characteristic value of the first download fragmented file is consistent with the second characteristic value of the second download fragmented file, erasing the second characteristic value of the second download fragmented file to obtain a sub-file to be processed, so that the first download fragmented file and the sub-file to be processed are sent to a user terminal in batches.

According to the method and the device, the corresponding downloading fragmented files are obtained from the main server and the slave server through the characteristic values of the files to be processed, and the file recovery mechanism is provided on the basis of the file storage mechanism, so that the method and the device can be applied to comprehensive scenes of file transmission, file encryption and file recovery, and have higher applicability and practicability.

Illustratively, the second characteristic value includes: a third sub-characteristic value that uses the absolute path file name, or a fourth sub-characteristic value obtained by performing hash calculation on the file content at a preset position of the file to be stored.

In an optional embodiment of the present application, a third sub-feature value using the absolute path file name is obtained, hash calculation is performed on the third sub-feature value to obtain a second feature value index, the second feature value index is queried in the file index table to obtain a first download fragment file in the master server, and global matching is initiated on the second feature value index in all slave servers to obtain a remaining second download fragment file; and sending the first download fragmented file and the second download fragmented file to a user terminal in batches, so that the user terminal reorganizes the first download fragmented file and the second download fragmented file according to the preset mask to obtain the file to be restored or the transmission file.

In another optional embodiment of the present application, a fourth sub-feature value obtained by performing hash computation according to file content of a preset position of the file to be stored is obtained, hash computation is performed on the fourth sub-feature value to obtain a second feature value index, the second feature value index is queried in the file index table to obtain a first download fragmented file in the master server, and global matching is initiated on the second feature value index in all slave servers to obtain a remaining second download fragmented file; and sending the first download fragmented file and the second download fragmented file to a user terminal in batches, so that the user terminal reorganizes the first download fragmented file and the second download fragmented file according to the preset mask to obtain the file to be restored or the transmission file.

It is worth to describe that the file processing method is mainly characterized in that the file is logically extracted through a preset mask diagram so as to be encrypted in a segmented mode, compared with a general segmented encryption method, the segmentation efficiency is higher, and the data of the obtained segmented file is not continuous due to the fact that the data flow is segmented along the diagonal direction, so that the security is higher. The mask layer of the file slicing can be preset according to a specific scene, bits are extracted through logical AND operation, and meanwhile, the unique characteristic value can be stored for the master-slave server matched with the slicing file by taking a user name and a file name or a user name and a 1kb content character of the file as references, so that the unique characteristic value is extracted according to the file content and is matched through an algorithm, and the file processing method integrating the operations of storage, transmission, recovery and the like is deployed remotely, so that the safety is higher, the applicable scene is wider, the slicing efficiency is higher, and the resource utilization rate of the server is higher.

In the present application, the data stream of the file to be stored is arranged in a data frame with the same size as the preset mask, and a rectangular mask equally divided into m regions along directions parallel to the diagonal is used to divide the data stream into discontinuous fragment files, so that even if a stolen fragment file is cracked, a complete data segment cannot be obtained, and the security of file storage is improved.

Example 2

Referring to fig. 6, an I-type storage flowchart of a file to be stored is provided in an embodiment of the present application. The figure comprises the processing flows of feature value extraction, mask circulation slicing and storage to a server.

Extracting the file content of the front 1Kbits of the binary file of the file to be stored as a characteristic value uv, performing Hash calculation on the characteristic value uv, and transmitting the obtained characteristic value index Hash (uv) to all master-slave servers.
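The extraction of the characteristic value uv and its index Hash(uv) can be sketched as follows. The application does not name the hash function, so SHA-256 is an assumption here, as is the function name `feature_index`; 1 Kbits is taken as 1024 bits, that is, 128 bytes.

```python
import hashlib


def feature_index(data: bytes, prefix_bits: int = 1024) -> str:
    """Hash the first `prefix_bits` bits of the file as its index Hash(uv)."""
    uv = data[:prefix_bits // 8]   # characteristic value: first 1 Kbits
    return hashlib.sha256(uv).hexdigest()


file_a = bytes(range(256)) * 4
file_b = file_a[:128] + b"different tail"
index_a = feature_index(file_a)
index_b = feature_index(file_b)
```

In this sketch the index depends only on the first 1 Kbits, so the two example files, which share that prefix, map to the same index.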

The binary file of the file to be stored is stored, in units of bytes, in a square data block of N=M×M. The cyclically extracted data stream is arranged by rows, in units of bits, into a rectangular data frame with the same size as the preset mask, where the preset mask is a square mask, in units of bits, with 5 equally divided regions. A bitwise mask operation is performed on the rectangular data frame with the preset mask to obtain the region data streams of the 5 bit data partitions in the rectangular data frame; the region data streams are read row by row and transmitted to the preset servers, of which there are 5 in total.

In order to improve the utilization of servers, the present embodiment employs the number of servers equal to the number of fragmented files.

Specifically, a server A is obtained as a master server A according to the test data, the other servers are slave servers, the region data stream corresponding to the region A is transmitted to the master server A, and the region data corresponding to the region B, the region data corresponding to the region C, the region data corresponding to the region D and the region data corresponding to the region E are respectively transmitted to the slave servers B, C, D and E.

It is worth noting that, because the bits of the fragment files of types A, B, C, D and E are each only a part of the forcibly split file to be stored, leakage of any one of them will not affect the user; meanwhile, because the characteristic value index Hash(uv) is stored on each server, the various file fragments remain associated with one another as a whole.

The data flow is segmented along the diagonal direction, and the obtained data of the segmented file is not continuous, so that the security is higher; and the optimal master-slave server is matched for storing the fragmented files, so that the security of storing the files on the remote server is higher, and the resource utilization rate of the server is higher.

Example 3

Compared with embodiment 2, the difference is that the number of servers of this embodiment is smaller than the number of fragmented files, and 5 equal-divided fragmented files are stored in 3 servers.

As a more general case, when the number of fragments of the file to be stored is not equal to the actual number of physical servers, referring to fig. 7, a type II storage flowchart of the file to be stored is provided in the embodiment of the present application. In the figure, when 5 equal-divided fragmented files are stored in 3 servers, the area a stream storage servers a, B, and C area stream storage servers B, D, and E area stream storage server C are used.
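The 5-to-3 mapping of fig. 7 can be reproduced with a short sketch. The application only gives this one example, so the general rule used here (the first region goes to the master, and the remaining regions are split into contiguous groups over the slaves) is an assumption, as is the function name `assign_regions`; at least one slave server is required.

```python
def assign_regions(regions, servers):
    """Map region streams to servers; servers[0] is treated as the master."""
    master, slaves = servers[0], servers[1:]
    plan = {master: [regions[0]]}
    rest = regions[1:]
    per_slave = -(-len(rest) // len(slaves))  # ceiling division
    for i, slave in enumerate(slaves):
        plan[slave] = rest[i * per_slave:(i + 1) * per_slave]
    return plan


plan = assign_regions(["A", "B", "C", "D", "E"],
                      ["server A", "server B", "server C"])
```

For 5 regions and 3 servers this yields region A on server A, regions B and C on server B, and regions D and E on server C, matching the example of fig. 7.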

In this way the more general situation, in which the number of fragment classes is not equal to the number of servers, can be handled. For performance optimization and similar reasons, the remote servers are divided into a master server (Master) and slave servers (Slave); there is only one master server (server A), and the remaining servers (server B and server C) are slave servers.

Because the connection between the place where the method runs and the multiple remote servers may be affected during file upload by region, distance, network bandwidth and other factors, the transmission performance of the remote servers is uneven. The server with the higher transmission performance can therefore be used as the master server: the matching search for Hash(uv) is first performed on the master server, taking advantage of its high performance, and the master server then initiates matching with the Hash(uv) value to all slave servers. The task of initiating query matching to multiple servers is thus transferred to the higher-performance master server, making fuller use of physical resources.

The master server may be selected by an ant colony algorithm with heuristic-mark positive feedback. That is, when a master server is to be selected for the file to be uploaded, the same test data test_data is first sent to all remote servers. For each remote server, the total delay delay_time_all taken to receive, process and return the data is tracked, and the average delay per unit of data, delay_time = delay_time_all / test_bits (where test_bits is the amount of test data processed), is recorded. In the next round, more data is sent to the servers with a small average delay per unit of data, and delay_time is recorded again. Each round sends more data to the server with the smallest delay_time, so the data amount test_bits sent to the other servers, whose network performance is poorer, gradually decreases; when all the test data is finally concentrated on one server, that server is determined to be the master server. When the file is uploaded to the master server, a mark is added to the master server, and all query matching during transmission or recovery operations is forwarded by the master server to the slave servers.
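The feedback loop described above can be sketched as follows. This is a simplified positive-feedback loop, not a full ant colony algorithm, and it rests on assumptions that the source does not state: the `measure` callback, the fixed `shift` fraction moved per round, and the rule that a share below one unit is moved entirely are all hypothetical.

```python
from typing import Callable, Dict, List


def select_master(
    servers: List[str],
    measure: Callable[[str, float], float],
    total_bits: float = 1000.0,
    shift: float = 0.5,
    max_rounds: int = 50,
) -> str:
    """Pick the server on which the test data ends up concentrated.

    measure(server, test_bits) is a hypothetical callback that returns the
    total delay (delay_time_all) for sending test_bits to that server.
    """
    # Start by sending the same amount of test data to every server.
    share: Dict[str, float] = {s: total_bits / len(servers) for s in servers}
    for _ in range(max_rounds):
        active = [s for s in servers if share[s] > 0]
        if len(active) == 1:
            return active[0]  # all test data is concentrated on one server
        # delay_time = delay_time_all / test_bits for each active server
        delay_time = {s: measure(s, share[s]) / share[s] for s in active}
        best = min(active, key=lambda s: delay_time[s])
        for s in active:
            if s == best:
                continue
            moved = share[s] * shift
            if share[s] - moved < 1:  # less than one unit would remain
                moved = share[s]
            share[s] -= moved
            share[best] += moved
    # Fallback if the rounds run out: the server holding the most test data.
    return max(servers, key=lambda s: share[s])


if __name__ == "__main__":
    latency = {"A": 0.2, "B": 0.5, "C": 0.9}  # made-up per-bit delays
    print(select_master(list(latency), lambda s, bits: latency[s] * bits))
```

In a real deployment `measure` would time an actual round trip; here it is simulated so that the selection logic can be followed in isolation.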

In this embodiment, the data stream in the area a is sent to the Master server (server a), the data streams in the areas B and C are sent to the Slave server Slave1 (server B), and the data streams in the areas D and E are sent to the server Slave2 (server C), so that preferential matching between the file fragments and the remote server is achieved.
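The assignment above (region A to the master, B and C to Slave1, D and E to Slave2) can be reproduced by a small helper. The source gives only the resulting mapping, so the rule used here is an assumption: the master takes the first region and the remaining regions are split into contiguous, near-equal groups across the slaves. The function name `assign_regions` is hypothetical.

```python
from typing import Dict, List


def assign_regions(regions: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Map mask regions to servers; servers[0] is treated as the master."""
    if not regions or not servers:
        raise ValueError("regions and servers must be non-empty")
    master, slaves = servers[0], servers[1:]
    plan: Dict[str, List[str]] = {s: [] for s in servers}
    if not slaves:
        plan[master] = list(regions)  # a single server stores everything
        return plan
    plan[master] = [regions[0]]
    rest = regions[1:]
    base, extra = divmod(len(rest), len(slaves))
    pos = 0
    for i, slave in enumerate(slaves):
        take = base + (1 if i < extra else 0)  # earlier slaves absorb the remainder
        plan[slave] = rest[pos:pos + take]
        pos += take
    return plan


if __name__ == "__main__":
    print(assign_regions(["A", "B", "C", "D", "E"], ["server A", "server B", "server C"]))
```

With five servers the same helper yields the one-region-per-server layout of embodiment 2.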

The mask slicing encryption method can divide the servers into a master and slaves based on the difference between the number of file classes and the number of remote servers; the separation of master data and slave data is completed when the mask layer is planned. The file fragments obtained by mask segmentation are not complete in themselves, but each fragment carries the file characteristic value that links it to the others, and the fragments correspond one to one with regions of the mask layer, so they automatically carry an order mark for later file assembly. After the file content is split, the ant colony algorithm with heuristic-mark positive feedback confirms the higher-performance server as the master server according to the current network bandwidth, task delay and similar measures; this server initiates data queries to the other slave servers, which improves overall resource utilization.

Example 4

Referring to fig. 8, a flowchart of restoring a file to be restored is provided in an embodiment of the present application. In the figure, the processing flow, which includes characteristic value extraction, characteristic value lookup, file download and file restoration, is the inverse operation of the file storage process in fig. 6.

When a user wants to restore a file, the characteristic value uv associated with the file is first looked up by file name, and the characteristic value index Hash(uv) is then calculated. A query based on this index is sent to the remote service to match all fragments to be downloaded from the remote servers; the fragments are then assembled and merged according to the mask layer, which restores the binary file of the file to be restored.
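The lookup key used in this flow can be illustrated as follows. The source does not name the hash function or the way the user space name and file path are combined, so SHA-256 and the `:` separator are assumptions, and `feature_value` and `feature_index` are hypothetical names.

```python
import hashlib


def feature_value(user_space: str, file_path: str) -> str:
    """Build uv from the user space name and the file path name.

    The ':' separator is an assumption made for this sketch.
    """
    return f"{user_space}:{file_path}"


def feature_index(uv: str) -> str:
    """Compute Hash(uv); SHA-256 stands in for the unspecified hash."""
    return hashlib.sha256(uv.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    uv = feature_value("alice", "/docs/report.bin")
    print(feature_index(uv))
```

Because the index is derived only from uv, every server can store its fragments under the same key without knowing anything else about the file.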

Through mask slicing, the fragmented files are not complete in themselves, so malicious attacks on the fragments stored on a remote server are of little concern; because each fragmented file carries the file characteristic value that links it to the others, malicious tampering during network transmission can be detected; and the final integrity of the user file is ensured by the processing on the remote servers: even if a remote server is attacked, or data is tampered with on the network while the remote service issues it, the final consistency of the user file can be ensured by comparing characteristic values and issuing the data again.

Example 5

This embodiment is the file recovery process for the type II storage scenario of fig. 7. Referring to fig. 9, a flowchart of file recovery for a type II stored file is provided in the embodiment of the present application. After a file is uploaded to the remote servers, the user can apply the file processing method in various scenarios; its principle steps are explained through a file recovery process and a file transmission process. For convenience of explanation, the file is divided by a mask of five equal regions, and the resulting data is stored on three servers. As before, the region A data stream (bitA) is stored under the characteristic value (uv) of the master server (server A), the region B and C data streams (bitB+bitC) are stored under the characteristic value (uv) of slave server Slave1 (server B), and the region D and E data streams (bitD+bitE) are stored under the characteristic value (uv) of slave server Slave2 (server C).

When a user deletes a local file by mistake and turns to the processing method to recover it, the file name of the file to be restored must be submitted; the file name can be taken directly from the submission history. As discussed above, the contents stored by the 3 remote services are [uv+bitA], [uv+bitB+bitC] and [uv+bitD+bitE], respectively, and the file recovery service runs on the master server (Master). After the file name is obtained, the processing method performs steps S51 to S58, specifically:

Step S51, first, the characteristic value uv of the file to be restored is obtained from its file name. If the characteristic value uses the relatively simple absolute path identifier, that is, the user space name plus the file path name, which the remote server must store and look up, uv is obtained by combining the user space name and the file path name; if the characteristic value is the first 1K bits of the file to be restored, the Relatebl relation table is searched by file name to obtain the 1K-bit content, which gives the characteristic value uv.

Step S52, a Hash operation is performed on the characteristic value uv to obtain the characteristic value index Hash(uv), and a StrTbl table lookup is then performed to obtain the bitA part of the file to be restored that is stored on the master server, where bitA denotes the content fragment extracted at mask position A during mask slicing.

Step S53, the master server initiates global matching to all the slave servers with the characteristic value uv, and obtains the parts corresponding to the characteristic value that are distributed on the slave servers, namely [uv+bitB+bitC] and [uv+bitD+bitE]. These content parts are fragments of the file to be stored; even if all of them are stolen over the network, they cannot be recovered into the final file, because the bitA part stored on the master server is missing. Moreover, the fragments are non-contiguous data, so the file cannot be restored from them even if they are stolen.

Step S54, the uv parts in the received data are compared; if all uv parts are consistent, step S55 is entered; otherwise, the inconsistency indicates that network data has been lost or maliciously tampered with, and step S541 is entered.

Step S541, matching is re-initiated for the inconsistent uv, and step S53 is executed again.

Step S55, the uv parts received from the slave servers are erased according to the characteristic value uv, ensuring that the rest contains only bit parts; after the erasure, the parts are bitB, bitC, bitD and bitE.

Step S56, the uv+bitA inquired by the main server, bitB, bitC, bitD and bitE acquired through the network are issued to the user terminal in batches;

Step S57, uv+bitA+bitB+bitC+bitD+bitE is reassembled on the user terminal according to the preset mask, which restores the original binary file.

Step S58, the user terminal checks whether the file recovery succeeded; if so, the process ends, and if not, the file recovery operation can be executed again from step S51.
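Steps S54 to S57 can be sketched as a round trip over bit lists. The diagonal mask layout, the in-memory fragment format `(uv, {region: bits})` and all function names are assumptions made for this example; the source specifies only that fragments are compared by uv, stripped of uv and reassembled according to the preset mask.

```python
from typing import Dict, List, Tuple

Streams = Dict[int, List[int]]
Fragment = Tuple[str, Streams]  # (uv received with the data, {region: bits})


def region_of(i: int, side: int, regions: int) -> int:
    """Hypothetical diagonal mask: region covering the i-th bit of the file."""
    r, c = divmod(i % (side * side), side)
    return (r + c) % regions


def slice_bits(bits: List[int], side: int, regions: int) -> Streams:
    """Storage side: split a bit list into one stream per mask region."""
    streams: Streams = {k: [] for k in range(regions)}
    for i, bit in enumerate(bits):
        streams[region_of(i, side, regions)].append(bit)
    return streams


def uv_consistent(uv: str, fragments: List[Fragment]) -> bool:
    """Step S54: every received fragment must carry the expected uv."""
    return all(got == uv for got, _ in fragments)


def erase_uv(fragments: List[Fragment]) -> Streams:
    """Step S55: drop the uv parts and keep only the bit streams."""
    merged: Streams = {}
    for _, streams in fragments:
        merged.update(streams)
    return merged


def reassemble(streams: Streams, side: int, regions: int) -> List[int]:
    """Step S57: walk the mask in order and pull the next bit of each region."""
    total = sum(len(s) for s in streams.values())
    cursor = {k: 0 for k in streams}
    out: List[int] = []
    for i in range(total):
        k = region_of(i, side, regions)
        out.append(streams[k][cursor[k]])
        cursor[k] += 1
    return out


if __name__ == "__main__":
    original = [(i * i + i // 3) % 2 for i in range(60)]
    s = slice_bits(original, side=5, regions=5)
    fragments = [("uv", {0: s[0]}), ("uv", {1: s[1], 2: s[2]}), ("uv", {3: s[3], 4: s[4]})]
    assert uv_consistent("uv", fragments)
    assert reassemble(erase_uv(fragments), 5, 5) == original
```

Reassembly needs every region stream; if the master's region is absent, the lookup for that region fails, which mirrors the observation in step S53 that the slave fragments alone cannot yield the file.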

In this embodiment, data is extracted from the file bit by bit and masked sequentially and cyclically, the fragmented files are stored on remote servers according to a matching rule, and the fragments remain linked through the unique characteristic value matched to the file information. A single method thus integrates encryption, transmission, storage and recovery of files, so it suits more scenarios and has higher applicability and practicability.

Example 6

When a user terminal needs to transmit a file to another user terminal, the sender does not need to download the file from the server and send it again; file sharing can be completed directly through the processing method, while the encrypted transmission, reliability and integrity of the file to be transmitted are ensured.

Likewise, when transmitting a certain file, the user terminal needs to submit the file name of the file to be transmitted, which can also be directly obtained in the submitted history.

The present embodiment assumes that the first user terminal transmits a file to the second user terminal, that is, the first user terminal is a sender and the second user terminal is a receiver. Notably, the first user terminal and the second user terminal include: cell phones, computers, tablets, smart watches, and other smart products; in addition, the remaining assumptions are the same as the file restoration scenario of the above embodiment 5, and referring to fig. 10, which is a schematic flow chart of file transmission provided in the embodiment of the present application, including steps S61 to S68, specifically:

Step S61, first, the characteristic value uv is obtained from the file name of the file to be transmitted, in the same way as in embodiment 5 above. Specifically: if the characteristic value uses the relatively simple absolute path identifier, that is, the user space name plus the file path name, which the remote server must store and look up, uv is obtained by combining the user space name and the file path name; if the characteristic value is the first 1K bits of the file to be transmitted, the Relatbl relation table is searched by file name to obtain the 1K-bit content, which gives the characteristic value uv.

Step S62, obtaining a characteristic value index Hash (uv) through a characteristic value uv, and then performing StrTbl table lookup operation to obtain a bitA part of the file to be transmitted, which is reserved on the main server;

Step S63, information about the second user terminal is obtained from the first user terminal, and a network connection to the second user terminal is established; the second user terminal must also have authorized use of the remote service.

Step S64, the master server initiates global matching to all the slave servers according to the characteristic value uv, and uv+bitA, together with uv+bitB+bitC and uv+bitD+bitE distributed on the slave servers, is issued to the second user terminal as a temporary file tmp.

Step S65, comparing uv parts in each data, and if all uv parts are consistent, entering step S66; if not, the process proceeds to step S651.

Step S651, the corresponding master-slave server re-executes the above step S64 for the inconsistent portion.

Step S66, the uv parts received from the other remote services are erased according to the characteristic value uv, ensuring that the rest contains only bit parts; after the erasure, the parts are uv+bitA, bitB, bitC, bitD and bitE.

Step S67, reassemble uv+bitA+bitB+bitC+bitD+bitE on the second user terminal according to the preset mask, that is, restore to the most original binary file, and delete the temporary file tmp after the assembly is completed.

Step S68, the second user terminal checks the file and notifies the first user terminal that the reception of the file is completed.
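The check-and-reissue loop of steps S64, S65 and S651 can be sketched as follows. The `fetch` callback, the retry limit and the function name `fetch_consistent` are hypothetical; the source states only that inconsistent parts are issued again by the corresponding server.

```python
from typing import Callable, Dict, List, Tuple


def fetch_consistent(
    uv: str,
    servers: List[str],
    fetch: Callable[[str, str], Tuple[str, bytes]],
    max_retries: int = 3,
) -> Dict[str, bytes]:
    """Collect one payload per server, re-requesting any whose uv differs.

    fetch(server, uv) is a hypothetical callback returning (uv_received, payload).
    """
    result: Dict[str, bytes] = {}
    pending = list(servers)
    for _ in range(max_retries + 1):
        inconsistent: List[str] = []
        for server in pending:
            got_uv, payload = fetch(server, uv)
            if got_uv == uv:  # step S65: uv parts agree
                result[server] = payload
            else:  # step S651: ask this server again
                inconsistent.append(server)
        if not inconsistent:
            return result
        pending = inconsistent
    raise RuntimeError(f"uv still inconsistent for: {pending}")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky(server: str, uv: str) -> Tuple[str, bytes]:
        # Simulated network: server B returns a tampered uv on its first reply.
        attempts["count"] += 1
        if server == "B" and attempts["count"] <= 2:
            return ("tampered", b"")
        return (uv, server.encode())

    print(fetch_consistent("uv1", ["A", "B", "C"], flaky))
```

Only the servers whose data failed the comparison are queried again, so consistent fragments are not transferred twice.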

The method can be applied to various practical scenarios, including but not limited to personal file recovery and two-party or multi-party file transmission, and has higher applicability and practicability; once deployed to the remote service, it can take full advantage of the characteristics of a distributed application, so that the various operation flows on files are greatly optimized and production efficiency is improved.

Example 7

Referring to fig. 11, a schematic structural diagram of a distributed file processing apparatus according to an embodiment of the present application includes: a first characteristic value acquisition module 71, a segmentation module 72, and a storage module 73.

The first eigenvalue obtaining module 71 is mainly configured to obtain a first eigenvalue of a file to be stored, and to transmit a first eigenvalue index corresponding to the first eigenvalue to the storage module 73; the slicing module 72 is mainly configured to slice the file to be stored, according to a preset mask, into first fragmented files whose data is non-contiguous, and to transmit the first fragmented files to the storage module 73; after receiving the first eigenvalue index and the first fragmented files, the storage module 73 stores the first fragmented files under the first eigenvalue index, so that the corresponding file to be restored can be found in the preset server according to the second eigenvalue index of the file to be processed, in order to restore or transmit the file.

The first eigenvalue obtaining module 71 is configured to obtain a first eigenvalue of a file to be stored, and obtain a first eigenvalue index identifying the file to be stored according to the first eigenvalue.

The slicing module 72 is configured to sequentially extract first data streams with a preset size from the file to be stored, arrange the first data streams in a data frame with the same size as a preset mask, and perform a masking operation on the data frame according to the preset mask to obtain m first fragmented files; wherein m is a positive integer; the preset mask is a rectangular mask divided into as many regions as there are first fragmented files.

In some embodiments of the present application, sequentially extracting the first data stream with the preset size from the file to be stored includes: arranging the binary data stream of the file to be stored, byte by byte, into square data blocks of M x M units, and extracting a binary data stream of L units from the square data block each time as a first data stream; wherein M and L are both positive integers.
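The block-and-extract step can be illustrated with a short sketch. Zero-padding of the last block and the names `square_blocks` and `extract_streams` are assumptions; the source does not say how a file whose length is not a multiple of M x M is handled.

```python
from typing import Iterator


def square_blocks(data: bytes, m: int, pad: bytes = b"\x00") -> Iterator[bytes]:
    """Yield consecutive blocks of m * m bytes; the last one is padded (assumption)."""
    block_size = m * m
    for start in range(0, len(data), block_size):
        yield data[start:start + block_size].ljust(block_size, pad)


def extract_streams(block: bytes, l: int) -> Iterator[bytes]:
    """Yield first data streams of l units each from one square block."""
    for start in range(0, len(block), l):
        yield block[start:start + l]


if __name__ == "__main__":
    for block in square_blocks(b"abcdefghij", m=3):
        print(list(extract_streams(block, l=4)))
```

Each extracted stream would then be laid out in a data frame of the same size as the preset mask before the masking operation.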

And the storage module 73 is configured to sequentially transmit the m first fragmented files and the first eigenvalue index to a preset server, so that the preset server stores the m first fragmented files under the first eigenvalue index.

In some embodiments of the present application, after the preset server stores the m first fragmented files under the first eigenvalue index, searching for the corresponding file to be downloaded according to the second eigenvalue index of the file to be processed, to perform file restoration or file transmission, includes: after receiving the first eigenvalue index and at least one first fragmented file, the preselected master server establishes a file index table according to the first eigenvalue index, and the at least one preselected slave server receives the first eigenvalue index and the remaining first fragmented files; the file index table records the storage status of the master server and the slave servers; the master server and the slave servers are selected in real time according to the ant colony algorithm with heuristic-mark positive feedback; when a file to be restored needs to be restored or a file to be transmitted needs to be transmitted, the corresponding file to be downloaded is looked up in the file index table according to the second eigenvalue index of the file to be processed, and file restoration or file transmission is carried out.

In some embodiments of the present application, searching the file index table for the file to be downloaded according to the second eigenvalue index of the file to be processed, to perform file restoration or file transmission, includes: obtaining a second eigenvalue of the file to be processed, performing a hash calculation on the second eigenvalue to obtain the second eigenvalue index, and querying the file index table with the second eigenvalue index to obtain the first download fragmented file on the master server; meanwhile, the master server initiates global matching of the second eigenvalue index to all slave servers to obtain the remaining second download fragmented files; and the first download fragmented file and the second download fragmented files are sent to a user terminal in batches, so that the user terminal reassembles them according to the preset mask to obtain the file to be restored or the transmitted file.
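The index-table lookup and global matching can be sketched with in-memory stand-ins for the servers. The classes `Server` and `Master`, their methods, and the use of a dictionary as the file index table are assumptions for illustration; the real storage format is not given in the source.

```python
from typing import Dict, List


class Server:
    """In-memory stand-in for a remote server holding fragments per index."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index_table: Dict[str, List[bytes]] = {}

    def store(self, index: str, fragment: bytes) -> None:
        self.index_table.setdefault(index, []).append(fragment)

    def lookup(self, index: str) -> List[bytes]:
        return list(self.index_table.get(index, []))


class Master(Server):
    """Master server: answers from its own table, then queries every slave."""

    def __init__(self, name: str, slaves: List[Server]) -> None:
        super().__init__(name)
        self.slaves = slaves

    def collect(self, index: str) -> List[bytes]:
        first = self.lookup(index)  # first download fragmented file(s)
        rest = [f for s in self.slaves for f in s.lookup(index)]  # global matching
        return first + rest


if __name__ == "__main__":
    slave1, slave2 = Server("Slave1"), Server("Slave2")
    master = Master("Master", [slave1, slave2])
    master.store("idx", b"bitA")
    slave1.store("idx", b"bitB")
    slave1.store("idx", b"bitC")
    slave2.store("idx", b"bitD")
    slave2.store("idx", b"bitE")
    print(master.collect("idx"))
```

The client talks only to the master, which matches the stated aim of moving the multi-server query work onto the higher-performance server.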

Compared with a general segmentation encryption method, slicing efficiency is higher, and because the data stream is segmented along the diagonal direction, the data in each fragmented file is not contiguous. The mask layer for file slicing can be preset for a specific scenario, and bits are extracted by a logical AND operation. The unique characteristic value, based either on the user name and file name or on the user name and 1kb of file content characters, is stored on the master and slave servers matched to the fragmented files, so that the unique characteristic value is extracted according to the file content and matched by an algorithm. A file processing method that integrates storage, transmission, recovery and other operations is thus deployed remotely, with higher security, wider applicability, higher slicing efficiency and higher server resource utilization.

Example 8

The application provides a server, the server includes: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the distributed file processing method when the computer program is executed.

In this embodiment, the file is logically extracted through the preset mask map so that it is encrypted in fragments, and the best-matched master and slave servers then store the fragmented files. The unique characteristic value is extracted from the file content and the fragments are matched by the algorithm to the optimal remote servers for deployment, so a file processing method integrating storage, transmission, recovery and other operations can be applied on the server, with higher security, wider applicability, higher slicing efficiency and higher server resource utilization.

Example 9

Referring to fig. 12, a schematic diagram of an electronic device according to an embodiment of the present application is provided, where the electronic device includes: a memory 91, a processor 92 and a computer program stored on the memory 91 and executable on the processor, the processor 92 implementing the steps of the distributed file processing method when the computer program is executed.

In some embodiments of the present application, the electronic device further includes: a communication interface 93 and a communication bus 94; wherein the processor 92, the communication interface 93 and the memory 91 perform communication with each other via a communication bus 94.

In this embodiment, the file is logically extracted through the preset mask map so that it is encrypted in fragments, and the best-matched master and slave servers then store the fragmented files. When the resulting file processing method is integrated on the electronic device, file storage, transmission, recovery and other operations can be performed on a variety of electronic devices, so the application scenarios are wider and the expandability is stronger.

Example 10

The present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the distributed file processing method according to the first aspect.

In this embodiment, the file is logically extracted through the preset mask map so that it is encrypted in fragments, and the best-matched master and slave servers then store the fragmented files. When the resulting file processing program is stored in the storage medium, running or reading the program from the storage medium allows files to be stored, transmitted, restored and so on, so the application scenarios are wider and the expandability is stronger.

It will be appreciated by those skilled in the art that embodiments of the present application may also provide a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims

1. A distributed file processing method, comprising:

2. The method for processing a distributed file according to claim 1, wherein the arranging the first data stream in a data frame with the same size as a preset mask, performing a masking operation on the data frame according to the preset mask, to obtain m first fragmented files, includes:

3. The distributed file processing method according to any one of claims 1 to 2, wherein setting the preset mask includes:

4. The distributed file processing method as claimed in claim 1, wherein after the preset server stores the m first fragmented files under the first eigenvalue index, searching the corresponding file to be downloaded according to the second eigenvalue index of the file to be processed for file restoration or file transmission, including:

5. The distributed file processing method of claim 4, wherein the master server and the slave server are selected in real time according to an ant colony algorithm with heuristic marking feed forward, comprising:

6. The method of claim 4, wherein searching for the file to be downloaded corresponding to the file index table according to the second eigenvalue index of the file to be processed to perform file restoration or file transmission includes:

and sending the first download fragmented file and the second download fragmented file to a user terminal in batches, so that the user terminal reorganizes the first download fragmented file and the second download fragmented file according to the preset mask to obtain the file to be restored or the transmission file.

7. The distributed file processing method as claimed in claim 6, wherein the step of distributing the first download fragmented file and the second download fragmented file to the user terminal in batches includes:

8. The distributed file processing method as claimed in claim 1, wherein the sequentially transmitting the m first fragmented files and the first eigenvalue index to a preset server includes:

9. The distributed file processing method as claimed in claim 1, wherein sequentially extracting the first data stream of the preset size from the file to be stored comprises:

10. The method of claim 1, wherein the obtaining a first eigenvalue of a file to be stored and obtaining a first eigenvalue index identifying the file to be stored according to the first eigenvalue comprises:

11. A distributed document processing apparatus, comprising: the device comprises a first characteristic value acquisition module, a segmentation module and a storage module; wherein,

12. A server, the server comprising: memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the distributed file processing method according to any of claims 1 to 10 when the computer program is executed.

13. An electronic device, the electronic device comprising: memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the distributed file processing method according to any of claims 1 to 10 when the computer program is executed.

14. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the distributed file processing method according to any of claims 1 to 10.