CN115292247B - File reading method and device, electronic equipment and storage medium - Google Patents

File reading method and device, electronic equipment and storage medium

Info

Publication number
CN115292247B
Authority
CN
China
Prior art keywords
file
read
group
target storage
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211186653.4A
Other languages
Chinese (zh)
Other versions
CN115292247A (en)
Inventor
邵金生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dingxuan Tech Co ltd
Original Assignee
Beijing Dingxuan Tech Co ltd
Application filed by Beijing Dingxuan Tech Co ltd
Priority to CN202211186653.4A
Publication of CN115292247A
Application granted
Publication of CN115292247B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/13: File access structures, e.g. distributed indices
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10: File systems; File servers
    • G06F16/17: Details of further file system functions
    • G06F16/174: Redundancy elimination performed by the file system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures

Abstract

The application provides a file reading method, a file reading device, electronic equipment and a storage medium. By means of an index table, the redundant space in the last storage block of each obtained group is removed; the redundant space between consecutive files to be read is then removed in a memory space. Because consecutive files to be read are copied into the memory together, the method and the device can, compared with the prior art, reduce the number of state switches and the frequency of hard disk head movements during reading, which helps improve reading efficiency.

Description

File reading method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a file reading method and apparatus, an electronic device, and a storage medium.
Background
With the progress of digitization, business systems need to store more and more files, and the stored files often need to be copied and exchanged in large quantities, for example by transferring files stored in the hard disk of a business system to other storage locations or other devices for further processing. Transferring files from the hard disk requires reading them. When two files that must be read one after another are stored far apart on the hard disk, the hard disk head has to jump to complete addressing; when many files must be read in succession, the head jumps frequently and the total addressing time becomes long. In addition, because the storage positions of two such files are not continuous, they cannot be read in a single operation, so reading each file requires a switch from user mode to kernel mode. Each switch takes a certain amount of time, and when many files must be read in succession the total time spent on state switching becomes long. Either effect lowers the efficiency of reading files, so transferring the files takes a long time.
Disclosure of Invention
In view of this, embodiments of the present application provide a file reading method and apparatus, an electronic device, and a storage medium, so as to improve reading efficiency.
In a first aspect, an embodiment of the present application provides a file reading method, where the method is applied in a service system, a hard disk of the service system stores multiple files to be read, and when the service system is in a kernel state, the method includes:
traversing a plurality of files to be read, and storing an obtained first index table containing file information corresponding to each file to be read into a first memory area of a memory of the service system, wherein for each file information, the file information comprises a storage position of a target storage block occupied by the file to be read corresponding to the file information, a file size of the file to be read corresponding to the file information, a tail block position of a last target storage block occupied by the file to be read corresponding to the file information, and continuity between the target storage blocks occupied by the file to be read corresponding to the file information;
sorting the file information in the first index table, in order of storage position, according to the storage position of the first target storage block occupied by the file to be read corresponding to each piece of file information, to obtain a second index table;
performing first group division on the file information, in the order of the file information in the second index table, according to the continuity between the target storage blocks occupied by the file to be read corresponding to each piece of file information, the storage position of the target storage blocks occupied by that file, and the tail block position of the last target storage block occupied by that file, to obtain a plurality of first file groups, wherein: for each file to be read whose occupied target storage blocks are discontinuous, the file information corresponding to the file is taken as a first discontinuous file group; for each file to be read whose occupied target storage blocks are continuous and for which the tail block position of its last target storage block is not continuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to the file is taken as a second discontinuous file group; and for each file to be read whose occupied target storage blocks are continuous and for which the tail block position of its last target storage block is continuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to the file and the file information corresponding to the next file to be read are taken as a continuous file group;
performing second group division on the continuous file groups among the plurality of first file groups, in the order of the file information in the second index table, according to the file size of the file to be read corresponding to each piece of file information, to obtain a plurality of second file groups, wherein, for each continuous file group: if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is taken as one second file group; if that sum is greater than the preset threshold, the continuous file group is divided into N second file groups, where, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information included in the group is greater than or equal to the preset threshold, the value obtained by subtracting from that sum the file size of the file to be read corresponding to the last file information included in the group is greater than the preset threshold, and N is a positive integer greater than 1;
performing first redundancy removal on the first discontinuous file groups, the second discontinuous file groups and the second file groups, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by that file, to obtain third file groups, wherein: for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information included in the discontinuous file group, and the file information corresponding to the first reading range is taken as a third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by that file and the file size of that file; and for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information included in the second file group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group, and the file information corresponding to the second reading range is taken as a third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group and the file size of that file;
copying files to be read corresponding to the third file group in the hard disk to a second memory area of a memory of the service system according to the reading range corresponding to each third file group, so as to take the copy content corresponding to the third file group in the second memory area as a fourth file group;
for the files to be read corresponding to each second file group in the fourth file group, performing second redundancy removal on a redundancy space between the files to be read corresponding to the second file group according to the starting position of the target storage block occupied by each file to be read corresponding to the second file group and the third ending position of the target storage block occupied by each file to be read corresponding to the second file group to obtain a fifth file group, wherein the third ending position is determined according to the difference between the sum of the storage sizes of the target storage blocks except the last target storage block in the target storage blocks occupied by each file to be read corresponding to the second file group and the file size of each file to be read corresponding to the second file group;
and reading the fifth file group and the file groups except the files to be read corresponding to the second file group in the fourth file group.
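As an illustration of the two redundancy-removal steps for one second file group, the following Python sketch computes a reading range and then strips the slack left between files in memory. It is a minimal sketch under stated assumptions and not the implementation of the application: the disk is modelled as a `bytes` object, blocks are numbered from 0 and share one size, and the names `read_range` and `strip_slack` are hypothetical.

```python
from __future__ import annotations

BLOCK_SIZE = 4  # assumed, deliberately tiny block size in bytes

def read_range(group: list[tuple[list[int], int]], block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Return (start, end) byte offsets for a group of adjacent, continuous files.

    Each entry of `group` is (block_numbers, file_size). The end offset stops at
    the used part of the last block, so the tail slack is never read.
    """
    first_blocks, _ = group[0]
    last_blocks, last_size = group[-1]
    start = first_blocks[0] * block_size
    # Bytes of the last file that spill into its last block.
    used_in_last = last_size - (len(last_blocks) - 1) * block_size
    end = last_blocks[-1] * block_size + used_in_last
    return start, end

def strip_slack(buffer: bytes, range_start: int,
                group: list[tuple[list[int], int]],
                block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut each file out of the copied buffer, dropping slack between files."""
    files = []
    for blocks, size in group:
        offset = blocks[0] * block_size - range_start
        files.append(buffer[offset:offset + size])
    return files

# File A: blocks 0-1, 6 bytes (2 bytes of slack); file B: blocks 2-3, 5 bytes.
disk = b"AAAAAA--BBBBB---"
group = [([0, 1], 6), ([2, 3], 5)]
start, end = read_range(group)
copied = disk[start:end]               # one read covers both files
contents = strip_slack(copied, start, group)
print(start, end, contents)
```

In this toy layout a single read of 13 bytes replaces two separate reads, and the 3 bytes of slack after file B are never transferred.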
Optionally, the method further comprises:
and responding to the selection operation of the user on the file type of the file stored in the hard disk, and determining the file corresponding to the file type selected by the user as the file to be read.
Optionally, the method further comprises:
and sending the read files in a group form in parallel, wherein the sum of the group files sent in parallel is less than the sending bandwidth.
Optionally, the method further comprises:
storing the files sent in parallel by asynchronous disk-dropping, that is, writing them to disk asynchronously.
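The bandwidth constraint on parallel sending can be pictured as a simple batching rule. The sketch below is only one possible reading: it assumes that group sizes and the sending bandwidth are expressed in the same unit, and that a group which alone reaches the bandwidth is sent by itself. The function name `batch_for_sending` is hypothetical.

```python
from __future__ import annotations

def batch_for_sending(group_sizes: list[int], bandwidth: int) -> list[list[int]]:
    """Greedily pack groups into batches whose total size stays below `bandwidth`."""
    batches: list[list[int]] = []
    current: list[int] = []
    total = 0
    for size in group_sizes:
        # Close the current batch if adding this group would reach the bandwidth.
        if current and total + size >= bandwidth:
            batches.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        batches.append(current)
    return batches

print(batch_for_sending([40, 30, 50, 20, 90], bandwidth=100))
# prints [[40, 30], [50, 20], [90]]
```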
In a second aspect, an embodiment of the present application provides a file reading apparatus, where the apparatus is in a service system, a hard disk of the service system stores multiple files to be read, and when the service system is in a kernel state, the apparatus includes:
a traversing unit, configured to traverse the plurality of files to be read and to store an obtained first index table containing file information corresponding to each file to be read in a first memory area of a memory of the service system, wherein each piece of file information comprises the storage position of the target storage blocks occupied by the file to be read corresponding to the file information, the file size of that file, the tail block position of the last target storage block occupied by that file, and the continuity between the target storage blocks occupied by that file;
a sorting unit, configured to sort the file information in the first index table, in order of storage position, according to the storage position of the first target storage block occupied by the file to be read corresponding to each piece of file information, to obtain a second index table;
a first grouping unit, configured to perform first grouping division on file information according to a sequence of the file information included in the second index table according to continuity between target storage blocks occupied by files to be read corresponding to the file information, where each file information includes a storage location of a target storage block occupied by a file to be read corresponding to the file information, and a tail block location of a last target storage block occupied by a file to be read corresponding to the file information, so as to obtain a plurality of first file groups, where, for a file to be read for which each occupied target storage block is a discontinuous file, the file information corresponding to the file to be read is taken as a first discontinuous file group, when each occupied target storage block is a continuous file to be read and the tail block position of the last target storage block occupied by the file to be read is not continuous with the storage position of the target storage block occupied by the next file to be read, taking file information corresponding to the file to be read as a second non-continuous file group, and when each occupied target storage block is a continuous file to be read and the tail block position of the last target storage block occupied by the file to be read is continuous with the storage position of the target storage block occupied by the next file to be read, taking file information corresponding to the file to be read and file information corresponding to the next file to be read as a continuous file group;
a second grouping unit, configured to perform second group division on consecutive file groups in the multiple first file groups according to file sizes of files to be read corresponding to the file information and according to a sequence of the file information included in the second index table, so as to obtain multiple second file groups, where for each of the consecutive file groups, if a sum of the file sizes of the files to be read corresponding to the consecutive file groups is less than or equal to a preset threshold, the consecutive file group is used as one second file group, and if the sum of the file sizes of the files to be read corresponding to the consecutive file groups is greater than the preset threshold, the consecutive file groups are divided into N second file groups, where in the first N-1 second file groups, a sum of the file sizes of the files to be read corresponding to the file information included in each second file group is greater than or equal to the preset threshold, a sum of the file sizes of the files to be read corresponding to the file information included in each second file group minus a size of a file corresponding to a last file information included in the second file groups is greater than the preset threshold, and a value of N is an integer greater than 1;
a first processing unit, configured to perform first redundancy removal on the first discontinuous file groups, the second discontinuous file groups and the second file groups, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by that file, so as to obtain third file groups, where: for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information included in the discontinuous file group, and the file information corresponding to the first reading range is used as a third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by that file and the file size of that file; and for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information included in the second file group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group, and the file information corresponding to the second reading range is used as a third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group and the file size of that file;
a copying unit, configured to copy, according to a reading range corresponding to each third file group, a file to be read, corresponding to the third file group in the hard disk, to a second memory area of a memory of the service system, so as to use a copy content corresponding to the third file group in the second memory area as a fourth file group;
a second processing unit, configured to, for a to-be-read file corresponding to each second file group in the fourth file group, perform second redundancy removal on a redundancy space between the to-be-read files corresponding to the second file group according to a starting position of a target storage block occupied by each to-be-read file corresponding to the second file group and a third ending position of the target storage block occupied by each to-be-read file corresponding to the second file group, so as to obtain a fifth file group, where the third ending position is determined according to a difference between a sum of storage sizes of target storage blocks, excluding a last target storage block, in the target storage blocks occupied by each to-be-read file corresponding to the second file group and a file size of each to-be-read file corresponding to the second file group;
a reading unit, configured to read the fifth file group and the file groups in the fourth file group other than the files to be read corresponding to the second file group.
Optionally, the apparatus further comprises:
a selection unit, configured to determine, in response to a user's selection of a file type of the files stored in the hard disk, the files of the selected file type as the files to be read.
Optionally, the apparatus further comprises:
a sending unit, configured to send the read files in parallel in the form of groups, wherein the sum of the group files sent in parallel is less than the sending bandwidth.
Optionally, the files sent in parallel are stored by asynchronous disk-dropping, that is, they are written to disk asynchronously.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the file reading method according to any one of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the file reading method according to any one of the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
in the method, the files to be read are divided, by means of an index table, into continuous files to be read and discontinuous files to be read, and the larger continuous file groups are then divided into a plurality of groups according to a preset division rule. After the division, the redundant space in the last storage block of each group is removed to obtain the reading range of each group, and the files in the hard disk are read into the memory according to that reading range. Because the redundant space of the last storage block of each group has been removed, it is not read, which improves reading efficiency.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a file reading method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a file reading apparatus according to a second embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted in advance that the storage block is the basic storage unit of the hard disk, and one hard disk can be divided into 2^n storage blocks. A file usually occupies one or more storage blocks, and a storage block stores only one file. For example, if a file has a size of 2.5M and a storage block has a size of 1M, the file occupies 3 storage blocks. The third storage block is not full, but its remaining space is not used for storing other files; that is, two or more files cannot be stored in the same storage block. The storage positions of the storage blocks occupied by a file may be continuous or discontinuous. For example, suppose the hard disk comprises 5 storage blocks whose storage positions are, in order, storage block 1, storage block 2, storage block 3, storage block 4 and storage block 5. If the storage blocks occupied by a file are storage block 1, storage block 2 and storage block 3, the storage positions of the occupied storage blocks are continuous, and the file may be called a continuous file. If the occupied storage blocks are storage block 1, storage block 2 and storage block 4, or storage block 1, storage block 3 and storage block 5, the storage positions are discontinuous, and the file may be called a discontinuous file. In other words, when a file occupies a plurality of storage blocks, the file is a discontinuous file as long as any two sequentially occupied storage blocks are discontinuous. Sequentially occupied means the following: when the storage blocks occupied by the file are storage block 1, storage block 2 and storage block 4, the sequentially occupied storage blocks are, in order, storage block 1, storage block 2 and storage block 4; storage block 2 and storage block 4 are not continuous, so the file is discontinuous.
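The block arithmetic and the continuity rule in this note can be checked with a few lines of Python. This is a minimal sketch for illustration only: the function names `blocks_needed` and `is_continuous`, and the use of integer block numbers and sizes in kilobytes, are assumptions rather than anything specified in the application.

```python
from __future__ import annotations

def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of storage blocks a file occupies; a block is never shared."""
    return (file_size + block_size - 1) // block_size  # ceiling division

def is_continuous(block_numbers: list[int]) -> bool:
    """True if every sequentially occupied block directly follows the previous one."""
    return all(b - a == 1 for a, b in zip(block_numbers, block_numbers[1:]))

print(blocks_needed(2560, 1024))   # a 2.5M file with 1M blocks, in KB: prints 3
print(is_continuous([1, 2, 3]))    # prints True
print(is_continuous([1, 2, 4]))    # prints False
```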
When a file is stored in a hard disk, the storage blocks it occupies may be discontinuous, and the storage blocks occupied by two different files may also be discontinuous with each other. When a plurality of files are read, the hard disk head therefore jumps frequently between storage positions; and when the file to be read next is not continuous with the file just read, a switch between user mode and kernel mode is needed for each file. Both effects lower the overall reading efficiency of the files.
In order to solve the above problems, the present application provides a file reading method, a file reading apparatus, an electronic device, and a storage medium, so as to improve the overall reading efficiency of a file.
Example one
Fig. 1 is a schematic flowchart of a file reading method provided in an embodiment of the present application, where the method is applied to a service system, a hard disk of the service system stores multiple files to be read, and when the service system is in a kernel state, as shown in fig. 1, the method includes the following steps:
step 101, traversing a plurality of files to be read, and storing an obtained first index table containing file information corresponding to each file to be read into a first memory area of a memory of the service system, wherein for each file information, the file information includes a storage position of a target storage block occupied by the file to be read corresponding to the file information, a file size of the file to be read corresponding to the file information, a tail block position of a last target storage block occupied by the file to be read corresponding to the file information, and continuity between the target storage blocks occupied by the file to be read corresponding to the file information.
Specifically, after a file is generated by a service system, the file is stored in a hard disk, at this time, the file occupies a certain number of storage blocks, and after the file to be read is determined, in order to improve the reading efficiency of the file to be read, it is necessary to determine relevant file information of the file to be read, for example, a storage position of a target storage block occupied by the file to be read, and which storage blocks are occupied by the file to be read can be determined through the file information; and the file size of the file to be read; and the tail block position of the last target storage block occupied by the file to be read, for example: a target storage block occupied by a file to be read is a storage block 1, a storage block 2 and a storage block 3, and the file to be read is stored according to the sequence of the storage block 1, the storage block 2 and the storage block 3, at this time, the storage space in the storage block 1 and the storage block 2 is occupied by the file to be read, if the size of the file to be read is smaller than the total size of the storage block 1, the storage block 2 and the storage block 3, the storage block 3 is not occupied, if the size of the file to be read is equal to the total size of the storage block 1, the storage block 2 and the storage block 3, the storage block 3 is occupied, no matter whether the storage block 3 is occupied, the storage block 3 is the last target storage block occupied by the file to be read, the tail block position refers to the position of the tail end position of the storage block 3 in the hard disk, or can be understood as a boundary in the range of the storage space of the storage block 3; and also obtaining the continuity between the target storage blocks occupied by the file to be read, namely: and if the file to be read occupies a plurality of target storage blocks, judging whether the target storage blocks are continuous or not.
After these four kinds of information are obtained, a file to be read can be described in detail by them, that is, they form the file information of the file to be read. The file information of the files to be read is then stored in a first memory area of the memory of the service system in the form of a first index table, so that the index table keeps only the information needed for reading and leaves out the other information of the files to be read.
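The four kinds of file information can be pictured as one record per file. The following is only a minimal sketch: the names `FileInfo`, `make_file_info` and `build_first_index`, the fixed `BLOCK_SIZE`, and the convention that storage block b covers bytes b * BLOCK_SIZE up to (b + 1) * BLOCK_SIZE are assumptions made for illustration and are not taken from the application.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK_SIZE = 4096  # assumed fixed size of a storage block, in bytes


@dataclass
class FileInfo:
    name: str
    blocks: List[int]     # storage positions of the occupied target storage blocks
    size: int             # file size in bytes
    tail_block_end: int   # byte offset of the end of the last target storage block
    contiguous: bool      # continuity between the occupied target storage blocks


def make_file_info(name: str, blocks: List[int], size: int) -> FileInfo:
    # Blocks are continuous when every block number follows the previous one.
    contiguous = all(a + 1 == b for a, b in zip(blocks, blocks[1:]))
    # Assumed layout: block b ends at byte (b + 1) * BLOCK_SIZE.
    tail_block_end = (blocks[-1] + 1) * BLOCK_SIZE
    return FileInfo(name, list(blocks), size, tail_block_end, contiguous)


def build_first_index(files: List[Tuple[str, List[int], int]]) -> List[FileInfo]:
    # One record per file to be read; nothing else about the file is kept.
    return [make_file_info(name, blocks, size) for name, blocks, size in files]


first_index = build_first_index([
    ("a", [1, 2, 3], 10000),
    ("b", [1, 3], 5000),
])
```

In this sketch the first index table is an ordinary in-memory list, standing in for the first memory area.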
Step 102, sorting each piece of file information in the first index table according to the order of the storage positions of the first target storage block occupied by the file to be read corresponding to each piece of file information, to obtain a second index table.
Specifically, when reading a file to be read in a hard disk, it is fastest to read according to the order of the storage blocks in the hard disk. For example, after a hard disk is divided in the order of storage block 1, storage block 2, storage block 3, storage block 4 and storage block 5, files in the five storage blocks are read fastest in the order of storage block 1, storage block 2, storage block 3, storage block 4 and storage block 5. However, since one file may occupy multiple storage blocks, the file information may be sorted according to the storage position of the first target storage block occupied by each file to be read. For example, when a first file occupies storage block 1, storage block 3 and storage block 5, and a second file occupies storage block 2 and storage block 4, the first target storage block occupied by the first file is storage block 1, and the first target storage block occupied by the second file is storage block 2, so that the file information corresponding to the first file is arranged before the file information corresponding to the second file.
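The sorting in step 102 can be sketched as an ordinary sort keyed on the first occupied block. The dictionary layout and the name `sort_index` below are hypothetical and serve only to illustrate the ordering rule.

```python
def sort_index(first_index):
    """Return a new list ordered by the first target storage block of each file."""
    return sorted(first_index, key=lambda info: info["blocks"][0])


# The two files from the example above, deliberately listed out of order.
first_index = [
    {"name": "second file", "blocks": [2, 4]},
    {"name": "first file", "blocks": [1, 3, 5]},
]
second_index = sort_index(first_index)
order = [info["name"] for info in second_index]
```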
Step 103, performing first group division on the file information, in the order of the file information included in the second index table, according to the continuity between the target storage blocks occupied by the file to be read corresponding to each piece of file information, the storage position of the target storage blocks occupied by that file, and the tail block position of the last target storage block occupied by that file, to obtain a plurality of first file groups, wherein: for each file to be read whose occupied target storage blocks are discontinuous, the file information corresponding to the file to be read is taken as a first discontinuous file group; for each file to be read whose occupied target storage blocks are continuous and for which the tail block position of its last target storage block is discontinuous with the storage position of the target storage blocks occupied by the next file to be read, the file information corresponding to the file to be read is taken as a second discontinuous file group; and for files to be read whose occupied target storage blocks are continuous and for which the tail block position of the last target storage block is continuous with the storage position of the target storage blocks occupied by the next file to be read, the file information corresponding to those files to be read is taken as a continuous file group.
Specifically, after the second index table is obtained, the order of the files to be read is determined. To facilitate subsequent operations, it is necessary to distinguish whether each file to be read is continuous with other files. If the target storage blocks occupied by one file to be read are discontinuous, the file to be read is taken as a first discontinuous file group, which then includes only that file. If the target storage blocks occupied by one file to be read are continuous, but the storage blocks it occupies in the hard disk are discontinuous with the storage blocks occupied by the other files to be read, the file to be read is taken as a second discontinuous file group, which likewise includes only that file. For example, the hard disk is divided in the order of storage block 1, storage block 2, storage block 3, storage block 4 and storage block 5, and the first file to be read occupies storage block 1 and storage block 2. If the second file to be read occupies storage block 4 and storage block 5, the first file to be read is taken as a second discontinuous file group; if the second file to be read occupies storage block 3 and storage block 4, the two files are taken as a continuous file group.
Another example is: the hard disk is divided according to the sequence of a storage block 1, a storage block 2, a storage block 3, a storage block 4 and a storage block 5, a first file to be read occupies the storage block 1, the storage block 2 and the storage block 4, a second file to be read occupies the storage block 5, the first file to be read is used as a first discontinuous file group, and the second file to be read is used as a second discontinuous file group.
Further, when determining a file group, it is first determined whether the storage blocks occupied by the file to be read are continuous. If not, the file is determined as a first discontinuous file group. If so, it is determined whether the storage blocks occupied by the next adjacent file to be read are continuous with the storage blocks occupied by this file. If not, the file is determined as a second discontinuous file group; if so, the two files to be read are determined as a continuous file group. After a continuous file group is determined, it is further determined whether the next file to be read is continuous with the continuous file group. If so, the three files to be read are determined as a continuous file group; if not, the two preceding continuous files to be read remain a continuous file group, and the same determination starts again from that next file to be read and the file adjacent to it. This continues until the file information of all files in the second index table has been divided, in order, into first discontinuous file groups, second discontinuous file groups and continuous file groups.
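The decision procedure for the first group division can be sketched as a single pass over the sorted index. This is an illustrative sketch only: the function names, the group labels and the use of integer block numbers (where block b + 1 directly follows block b) are assumptions, and the sketch treats a contiguous file as a second discontinuous file group when it is adjacent to neither its predecessor nor its successor.

```python
def is_contiguous(blocks):
    # A single block, or blocks numbered consecutively, count as continuous.
    return all(a + 1 == b for a, b in zip(blocks, blocks[1:]))


def first_grouping(sorted_files):
    """sorted_files: list of (name, blocks) in second-index order."""
    groups = []
    run = []  # current run of internally continuous, mutually adjacent files

    def flush():
        if run:
            kind = "continuous" if len(run) > 1 else "second_discontinuous"
            groups.append((kind, [name for name, _ in run]))
            run.clear()

    for name, blocks in sorted_files:
        if not is_contiguous(blocks):
            flush()
            groups.append(("first_discontinuous", [name]))
            continue
        # Close the run when this file does not start right after the
        # last block of the previous file in the run.
        if run and run[-1][1][-1] + 1 != blocks[0]:
            flush()
        run.append((name, blocks))
    flush()
    return groups


separate = first_grouping([("f1", [1, 2]), ("f2", [4, 5])])
adjacent = first_grouping([("f1", [1, 2]), ("f2", [3, 4])])
mixed = first_grouping([("f1", [1, 2, 4]), ("f2", [5])])
```

The three calls reproduce the storage block examples given in the text for step 103.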
Step 104, according to the file size of the file to be read corresponding to each piece of file information, performing second group division on the continuous file groups among the plurality of first file groups, in the order of the file information included in the second index table, to obtain a plurality of second file groups, wherein, for each continuous file group: if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is used as one second file group; and if that sum is greater than the preset threshold, the continuous file group is divided into N second file groups, where, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information included in the second file group is greater than or equal to the preset threshold, and that sum minus the file size corresponding to the last file information included in the second file group is less than the preset threshold, N being an integer greater than 1.
Specifically, after the first file group is determined, the size of some continuous file groups may be too large, so that data transmission after reading is affected, and therefore the continuous file groups need to be divided, because this division is to avoid occurrence of a larger continuous file group, if the size of one continuous file group is smaller than or equal to a preset threshold, the continuous file group is not divided, if the size of one continuous file group is larger than the preset threshold, the continuous file group needs to be divided, and when the continuous file group is divided, the division is performed with the size just larger than or equal to the preset threshold as a criterion, for example: the preset threshold value is 1M, the continuous file group comprises 3 files to be read, if the first file to be read is 0.9M, the second file to be read is 0.8M and the third file to be read is 0.7M, the first file to be read and the second file to be read are divided into a second file group, and the third file to be read is divided into another second file group; if the first file to be read is 1.1M, the second file to be read is 0.8M and the third file to be read is 0.7M, dividing the first file to be read into a second file group, and dividing the second file to be read and the third file to be read into another second file group; and if the first file to be read is 0.3M, the second file to be read is 0.5M and the third file to be read is 0.6M, taking the three files to be read as a second file group, wherein when the three files to be read are divided, the three files to be read are divided according to the sequence of the three files to be read in a second index table.
The non-continuous file groups are not divided in this step, so a non-continuous file group may be larger than the preset threshold.
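The three worked examples for the second group division are consistent with a greedy pass that closes a group as soon as its running total reaches the preset threshold. The sketch below assumes that reading; the name `split_by_threshold` is hypothetical, and sizes are expressed in tenths of 1M (so the threshold of 1M is 10) to keep the arithmetic in integers.

```python
def split_by_threshold(sizes, threshold):
    """Split one continuous file group, given as file sizes in index order.

    Returns lists of file indices. A group is closed as soon as its total
    size is greater than or equal to the threshold.
    """
    groups, current, total = [], [], 0
    for index, size in enumerate(sizes):
        current.append(index)
        total += size
        if total >= threshold:
            groups.append(current)
            current, total = [], 0
    if current:  # remaining files form the last second file group
        groups.append(current)
    return groups


THRESHOLD = 10  # 1M, in tenths of 1M
case_a = split_by_threshold([9, 8, 7], THRESHOLD)   # 0.9M, 0.8M, 0.7M
case_b = split_by_threshold([11, 8, 7], THRESHOLD)  # 1.1M, 0.8M, 0.7M
case_c = split_by_threshold([3, 5, 6], THRESHOLD)   # 0.3M, 0.5M, 0.6M
```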
Step 105, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by that file, performing first redundancy removal on the first discontinuous file groups, the second discontinuous file groups and the second file groups to obtain third file groups, wherein: for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information included in the discontinuous file group, so that the file information corresponding to the first reading range is used as a third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by that file and the file size of that file; and for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information included in the second file group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group, so that the file information corresponding to the second reading range is used as a third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group and the file size of that file.
Specifically, after the division in step 104, the obtained file groups include first discontinuous file groups, second discontinuous file groups and second file groups. Each first discontinuous file group and each second discontinuous file group includes only one file to be read, whereas a second file group may be composed of a plurality of continuous files to be read. For a first or second discontinuous file group, redundant space exists only in the last target storage block among the target storage blocks occupied by the file to be read, so only the storage position of the file within its target storage blocks needs to be determined. That is, when the first reading range is determined, the first ending position of the file to be read in the last target storage block is determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block and the file size of the file to be read, and the first reading range is then determined according to the starting position of the target storage blocks occupied by the file and the first ending position. For each second file group, the files to be read included in the group are continuous: if the second file group includes only one file to be read, the target storage blocks occupied by that file are continuous, and if it includes a plurality of files to be read, the target storage blocks occupied by those files are continuous. All files to be read included in the second file group can therefore be regarded as a whole, and the storage position of the whole in the target storage blocks is determined. That is, when the second reading range of the file information corresponding to the whole is determined, the second ending position of the last file to be read of the whole in its last target storage block is determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by the last file to be read and the file size of that file, and the second reading range is then determined according to the starting position of the target storage blocks occupied by the whole and the second ending position.
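The ending position used in the first redundancy removal amounts to simple arithmetic on block sizes. The sketch below is illustrative only and rests on assumptions not stated in the text: a fixed `BLOCK_SIZE`, storage block b starting at byte b * BLOCK_SIZE, and groups whose blocks are contiguous. The function names are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed fixed size of a storage block, in bytes


def end_position(blocks, file_size):
    """Byte offset just past the valid data in the file's last block."""
    # Bytes stored in the last block = file size minus the sizes of all
    # occupied blocks other than the last one.
    used_in_last = file_size - (len(blocks) - 1) * BLOCK_SIZE
    return blocks[-1] * BLOCK_SIZE + used_in_last


def group_read_range(files):
    """files: list of (blocks, file_size) in index order, blocks contiguous.

    Returns (start, end) byte offsets: from the start of the first block of
    the first file to the end of valid data in the last file.
    """
    first_blocks, _ = files[0]
    last_blocks, last_size = files[-1]
    return first_blocks[0] * BLOCK_SIZE, end_position(last_blocks, last_size)


# One file occupying blocks 1 to 3 with 10000 bytes of data.
single = group_read_range([([1, 2, 3], 10000)])
# A second file group of two adjacent files.
pair = group_read_range([([1, 2], 5000), ([3, 4], 6000)])
```

Note that for the two-file group the range still contains the unused tail of the first file's last block, which is what the later second redundancy removal addresses.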
Step 106, copying the file to be read corresponding to the third file group in the hard disk to a second memory area of the memory of the service system according to the reading range corresponding to each third file group, so as to use the copy content corresponding to the third file group in the second memory area as a fourth file group.
Specifically, after the first reading range and the second reading range are determined, when a file to be read stored in the hard disk is copied, the redundant space in the last target storage block in the file group is not read, and when the file to be read is read, due to the existence of continuous files to be read, compared with the prior art, the method and the device for reading the file to be read can reduce the number of state switching and the frequency of switching a magnetic head of the hard disk, so that the reading efficiency is improved.
Step 107, for the files to be read corresponding to each second file group in the fourth file group, according to the starting position of the target storage block occupied by each file to be read corresponding to the second file group and the third ending position of the target storage block occupied by each file to be read corresponding to the second file group, performing second redundancy removal on the redundant space between the files to be read corresponding to the second file group to obtain a fifth file group, wherein the third ending position is determined according to the difference between the sum of the storage sizes of the target storage blocks except the last target storage block in the target storage blocks occupied by each file to be read corresponding to the second file group and the file size of each file to be read corresponding to the second file group.
Specifically, after the first redundancy removal, no redundant space remains in the first discontinuous file groups and the second discontinuous file groups. In a second file group, however, only the redundant space in the last target storage block of the group has been removed, so redundant space may still exist between two continuous files to be read. For example, when a second file group is composed of a first file to be read and a second file to be read, the first redundancy removal removes only the redundant space in the last target storage block of the second file to be read; the redundant space in the last target storage block of the first file to be read remains and needs to be removed. At this time, each file to be read in the second file group is treated as a unit of its own: the space occupied by the file in its target storage blocks is determined from the starting position and the third ending position of the target storage blocks occupied by the file, so that the redundant space in the last target storage block occupied by that file is removed. In this way the redundant space between the files to be read is removed.
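The second redundancy removal can be sketched as slicing each file's valid bytes out of the buffer that was copied for a second file group. This is a minimal sketch under stated assumptions: a fixed block size (set to 8 bytes here only to keep the demonstration small), contiguous blocks within the group, and a buffer that starts at the first block of the group. The name `strip_slack` is hypothetical.

```python
BLOCK_SIZE = 8  # deliberately tiny assumed block size, for demonstration


def strip_slack(buffer, group_start_block, files):
    """Return the valid bytes of each file in a copied second file group.

    buffer: bytes copied from the group's reading range.
    files:  list of (first_block, file_size) in index order.
    """
    result = []
    for first_block, size in files:
        # Offset of the file inside the buffer, then exactly `size` bytes,
        # which skips the unused tail of the file's last block.
        offset = (first_block - group_start_block) * BLOCK_SIZE
        result.append(buffer[offset:offset + size])
    return result


# File A: block 1, 5 bytes (3 bytes of slack). File B: block 2, 3 bytes.
copied = b"AAAAA\x00\x00\x00BBB"
fifth_group = strip_slack(copied, 1, [(1, 5), (2, 3)])
```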
Step 108, reading the fifth file group and the file groups in the fourth file group other than the files to be read corresponding to the second file group.
Specifically, all redundant spaces in one file group can be removed through the above operations, and since the second memory space includes all valid data of the file to be read, the reading efficiency is improved when the data in the second memory space is read.
In a possible embodiment, before performing step 101, a file corresponding to the file type selected by the user may be determined as the file to be read in response to a user selecting the file type of the file stored in the hard disk.
Specifically, when transferring a file, an option of a file type of the file stored in the hard disk may be displayed to the user, the user may select the option, and after the user finishes selecting, the file included in the file type selected by the user is used as the file to be read, and the processing shown in fig. 1 is performed.
In one possible embodiment, after step 108 is performed, the read files are sent in parallel in groups, wherein the total size of the file groups sent in parallel is less than the sending bandwidth.
Specifically, when data is transmitted, a plurality of data can be transmitted as long as the bandwidth is not exceeded, and in order to improve the data transmission efficiency, a plurality of file groups can be read at one time and then the read file groups are transmitted in parallel, so that the data transmission efficiency is improved.
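One way to picture the bandwidth constraint is to pack the read file groups into batches whose total size stays below the bandwidth, and send each batch in parallel. The sketch below only selects the batches; the name `batches_under_bandwidth` and the greedy packing strategy are assumptions made for illustration, not a method stated in the text.

```python
def batches_under_bandwidth(group_sizes, bandwidth):
    """Greedily pack file groups (by size, in order) into parallel batches.

    Each batch's total stays below `bandwidth`, except that a single group
    that is itself not below the bandwidth is placed in a batch of its own.
    """
    batches, current, total = [], [], 0
    for index, size in enumerate(group_sizes):
        if current and total + size >= bandwidth:
            batches.append(current)
            current, total = [], 0
        current.append(index)
        total += size
    if current:
        batches.append(current)
    return batches


batches = batches_under_bandwidth([4, 3, 5, 2], bandwidth=8)
```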
In one possible embodiment, the files sent in parallel are stored by asynchronously writing them to disk.
Specifically, after receiving the transmitted file, the received file may be stored while continuing to receive other files, instead of being stored together after receiving all files, which is beneficial to improving the storage efficiency.
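Storing while still receiving can be sketched with a queue and a background writer thread. The sketch is illustrative only: `receive_and_store`, the `store` callback and the use of `None` as an end marker are assumptions, and a real receiver would read from a network connection rather than iterate over a list.

```python
import queue
import threading


def receive_and_store(incoming, store):
    """Hand each received item to a writer thread and keep receiving."""
    pending = queue.Queue()

    def writer():
        while True:
            item = pending.get()
            if item is None:  # end marker: nothing more to store
                break
            store(item)

    worker = threading.Thread(target=writer)
    worker.start()
    for item in incoming:
        # Receiving continues here while the writer stores earlier items.
        pending.put(item)
    pending.put(None)
    worker.join()


stored = []
receive_and_store([b"group-1", b"group-2", b"group-3"], stored.append)
```

A single writer thread preserves the order in which the file groups were received.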
Example two
Fig. 2 is a schematic structural diagram of a file reading apparatus provided in the second embodiment of the present application, where the apparatus is in a service system, a hard disk of the service system stores a plurality of files to be read, and when the service system is in a kernel state, as shown in fig. 2, the apparatus includes:
the traversal unit 201 is configured to traverse a plurality of files to be read, and store an obtained first index table that includes file information corresponding to each of the files to be read into a first memory area of a memory of the service system, where for each piece of file information, the piece of file information includes a storage location of a target storage block occupied by the file to be read corresponding to the piece of file information, a file size of the file to be read corresponding to the piece of file information, a tail block location of a last target storage block occupied by the file to be read corresponding to the piece of file information, and continuity between the target storage blocks occupied by the file to be read corresponding to the piece of file information;
a sorting unit 202, configured to sort, according to a storage position of a first target storage block occupied by a file to be read corresponding to each piece of file information and according to a sequence of the storage positions, each piece of file information in the first index table to obtain a second index table;
a first grouping unit 203, configured to perform first group division on the file information, in the order of the file information included in the second index table, according to the continuity between the target storage blocks occupied by the file to be read corresponding to each piece of file information, the storage position of the target storage blocks occupied by that file, and the tail block position of the last target storage block occupied by that file, so as to obtain a plurality of first file groups, where: for each file to be read whose occupied target storage blocks are discontinuous, the file information corresponding to the file to be read is used as a first discontinuous file group; for each file to be read whose occupied target storage blocks are continuous and for which the tail block position of its last target storage block is discontinuous with the storage position of the target storage blocks occupied by the next file to be read, the file information corresponding to the file to be read is used as a second discontinuous file group; and for files to be read whose occupied target storage blocks are continuous and for which the tail block position of the last target storage block is continuous with the storage position of the target storage blocks occupied by the next file to be read, the file information corresponding to those files to be read is used as a continuous file group;
a second grouping unit 204, configured to perform second group division on the continuous file groups among the plurality of first file groups according to the file size of the file to be read corresponding to each piece of file information and in the order of the file information included in the second index table, so as to obtain a plurality of second file groups, where, for each continuous file group: if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is used as one second file group; and if that sum is greater than the preset threshold, the continuous file group is divided into N second file groups, where, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information included in the second file group is greater than or equal to the preset threshold, and that sum minus the file size corresponding to the last file information included in the second file group is less than the preset threshold, N being an integer greater than 1;
a first processing unit 205, configured to perform, according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage blocks occupied by that file, first redundancy removal on the first discontinuous file groups, the second discontinuous file groups and the second file groups to obtain third file groups, where: for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage blocks occupied by the file to be read corresponding to the file information included in the discontinuous file group, so that the file information corresponding to the first reading range is used as a third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by that file and the file size of that file; and for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage blocks occupied by the files to be read corresponding to the file information included in the second file group and a second ending position of the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group, so that the file information corresponding to the second reading range is used as a third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks other than the last target storage block among the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group and the file size of that file;
a copying unit 206, configured to copy, according to the reading range corresponding to each third file group, files to be read, corresponding to the third file group in the hard disk, to a second memory area of the memory of the service system, so as to use copy content corresponding to the third file group in the second memory area as a fourth file group;
a second processing unit 207, configured to, for the to-be-read files corresponding to each second file group in the fourth file group, perform second redundancy removal on a redundancy space between the to-be-read files corresponding to the second file group according to a starting position of a target storage block occupied by each to-be-read file corresponding to the second file group and a third ending position of a target storage block occupied by each to-be-read file corresponding to the second file group, so as to obtain a fifth file group, where the third ending position is determined according to a difference between a sum of storage sizes of target storage blocks, excluding a last target storage block, in the target storage blocks occupied by each to-be-read file corresponding to the second file group and a file size of each to-be-read file corresponding to the second file group;
the reading unit 208 is configured to read the fifth file group and a file group in the fourth file group except for the file to be read corresponding to the second file group.
In one possible embodiment, the apparatus further comprises:
and the selection unit is used for responding to the selection operation of the user on the file type of the file stored in the hard disk and determining the file corresponding to the file type selected by the user as the file to be read.
In one possible embodiment, the apparatus further comprises:
and the sending unit is used for sending the read files in parallel in groups, wherein the total size of the file groups sent in parallel is less than the sending bandwidth.
In one possible embodiment, the files sent in parallel are stored by asynchronously writing them to disk.
For the principle description of the second embodiment, reference is made to the detailed description of the first embodiment, and the detailed description is omitted here.
Example three
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present application, including: a processor 301, a storage medium 302 and a bus 303, wherein the storage medium 302 stores machine-readable instructions executable by the processor 301, when the electronic device runs the file reading method, the processor 301 and the storage medium 302 communicate with each other through the bus 303, and the processor 301 executes the machine-readable instructions to perform the method according to the first embodiment.
Example four
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the method described in the first embodiment.
The apparatus provided in the embodiments of the present application may be specific hardware on a device, or software or firmware installed on a device, or the like. The apparatus provided by the embodiments of the present application has the same implementation principle and technical effect as the foregoing method embodiments, and for the sake of brevity, for any part of the apparatus embodiments that is not mentioned, reference may be made to the corresponding contents in the foregoing method embodiments. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the foregoing systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program code.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures, and moreover, the terms "first," "second," "third," etc. are used merely to distinguish one description from another, and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present application, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A file reading method, applied to a service system, wherein a hard disk of the service system stores a plurality of files to be read, and when the service system is in kernel mode, the method comprises the following steps:
traversing a plurality of files to be read, and storing an obtained first index table containing file information corresponding to each file to be read into a first memory area of a memory of the service system, wherein for each file information, the file information comprises a storage position of a target storage block occupied by the file to be read corresponding to the file information, a file size of the file to be read corresponding to the file information, a tail block position of a last target storage block occupied by the file to be read corresponding to the file information, and continuity between the target storage blocks occupied by the file to be read corresponding to the file information;
sorting the file information in the first index table according to the storage positions of a first target storage block occupied by the file to be read corresponding to the file information and the sequence of the storage positions to obtain a second index table;
according to the continuity between the target storage blocks occupied by the file to be read corresponding to each piece of file information, the storage position of the target storage block occupied by that file to be read, and the tail block position of the last target storage block occupied by that file to be read, performing first group division on the file information according to the sequence of the file information included in the second index table to obtain a plurality of first file groups, wherein, for each file to be read whose occupied target storage blocks are discontinuous, the file information corresponding to the file to be read is taken as a first discontinuous file group; for each file to be read whose occupied target storage blocks are continuous and for which the tail block position of the last occupied target storage block is not continuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to the file to be read is taken as a second discontinuous file group; and for each file to be read whose occupied target storage blocks are continuous and for which the tail block position of the last occupied target storage block is continuous with the storage position of the target storage block occupied by the next file to be read, the file information corresponding to the file to be read and the file information corresponding to the next file to be read are taken as a continuous file group;
according to the file size of the file to be read corresponding to each piece of file information, performing second group division on the continuous file groups in the plurality of first file groups according to the sequence of the file information included in the second index table to obtain a plurality of second file groups, wherein, for each continuous file group, if the sum of the file sizes of the files to be read corresponding to the continuous file group is smaller than or equal to a preset threshold, the continuous file group is taken as one second file group, and if the sum of the file sizes of the files to be read corresponding to the continuous file group is larger than the preset threshold, the continuous file group is divided into N second file groups, wherein, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information included in the second file group is larger than or equal to the preset threshold, and that sum minus the file size of the file to be read corresponding to the last file information included in the second file group is smaller than the preset threshold, and N is a positive integer larger than 1;
according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage block occupied by the file to be read corresponding to the file information, performing first redundancy removal on the first discontinuous file group, the second discontinuous file group and the second file group to obtain a third file group, wherein, for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage block occupied by the file to be read corresponding to the file information included in the discontinuous file group, so as to take the file information corresponding to the first reading range as the third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks, other than the last target storage block, among the target storage blocks occupied by the file to be read corresponding to the file information included in the discontinuous file group and the file size of that file to be read; and, for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage block occupied by the files to be read corresponding to the file information included in the second file group and a second ending position of the target storage block occupied by the file to be read corresponding to the last file information included in the second file group, so as to take the file information corresponding to the second reading range as the third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks, other than the last target storage block, among the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group and the file size of that file to be read;
copying files to be read corresponding to the third file group in the hard disk to a second memory area of a memory of the service system according to the reading range corresponding to each third file group, so as to take the copy content corresponding to the third file group in the second memory area as a fourth file group;
for the files to be read corresponding to each second file group in the fourth file group, performing second redundancy removal on a redundancy space between the files to be read corresponding to the second file group according to the starting position of the target storage block occupied by each file to be read corresponding to the second file group and the third ending position of the target storage block occupied by each file to be read corresponding to the second file group to obtain a fifth file group, wherein the third ending position is determined according to the difference between the sum of the storage sizes of the target storage blocks except the last target storage block in the target storage blocks occupied by each file to be read corresponding to the second file group and the file size of each file to be read corresponding to the second file group;
and reading the fifth file group and the file groups except the files to be read corresponding to the second file group in the fourth file group.
2. The method of claim 1, wherein the method further comprises:
and responding to the selection operation of the user on the file type of the file stored in the hard disk, and determining the file corresponding to the file type selected by the user as the file to be read.
3. The method of claim 1, wherein the method further comprises:
sending the read files in parallel in the form of groups, wherein the total size of the groups sent in parallel is less than the sending bandwidth.
4. The method of claim 3, wherein the method further comprises:
storing the files sent in parallel by asynchronously flushing them to disk.
5. A file reading apparatus, where the apparatus is in a service system, a hard disk of the service system stores a plurality of files to be read, and when the service system is in kernel mode, the apparatus includes:
a traversing unit, configured to traverse the plurality of files to be read and store an obtained first index table containing file information corresponding to each file to be read into a first memory area of a memory of the service system, where each piece of file information comprises the storage position of a target storage block occupied by the file to be read corresponding to the file information, the file size of the file to be read corresponding to the file information, the tail block position of the last target storage block occupied by the file to be read corresponding to the file information, and the continuity between the target storage blocks occupied by the file to be read corresponding to the file information;
a sorting unit, configured to sort the file information in the first index table according to the storage position of the first target storage block occupied by the file to be read corresponding to each piece of file information, in the order of the storage positions, to obtain a second index table;
a first grouping unit, configured to perform first grouping division on file information according to a sequence of the file information included in the second index table according to continuity between target storage blocks occupied by files to be read corresponding to the file information, a storage position of a target storage block occupied by a file to be read corresponding to the file information, and a tail block position of a last target storage block occupied by a file to be read corresponding to the file information, so as to obtain a plurality of first file groups, where, for each occupied target storage block that is a discontinuous file to be read, the file information corresponding to the file to be read is taken as a first discontinuous file group, when each occupied target storage block is a continuous file to be read and the tail block position of the last target storage block occupied by the file to be read is not continuous with the storage position of the target storage block occupied by the next file to be read, taking file information corresponding to the file to be read as a second non-continuous file group, and when each occupied target storage block is a continuous file to be read and the tail block position of the last target storage block occupied by the file to be read is continuous with the storage position of the target storage block occupied by the next file to be read, taking file information corresponding to the file to be read and file information corresponding to the next file to be read as a continuous file group;
a second grouping unit, configured to perform second group division on the continuous file groups in the plurality of first file groups according to the file sizes of the files to be read corresponding to the file information and according to the sequence of the file information included in the second index table, so as to obtain a plurality of second file groups, where, for each continuous file group, if the sum of the file sizes of the files to be read corresponding to the continuous file group is less than or equal to a preset threshold, the continuous file group is used as one second file group, and if the sum of the file sizes of the files to be read corresponding to the continuous file group is greater than the preset threshold, the continuous file group is divided into N second file groups, where, in each of the first N-1 second file groups, the sum of the file sizes of the files to be read corresponding to the file information included in the second file group is greater than or equal to the preset threshold, and that sum minus the file size of the file to be read corresponding to the last file information included in the second file group is less than the preset threshold, and N is a positive integer greater than 1;
a first processing unit, configured to perform first redundancy removal on the first discontinuous file group, the second discontinuous file group, and the second file group according to the file size of the file to be read corresponding to each piece of file information and the storage position of the target storage block occupied by the file to be read corresponding to the file information, so as to obtain a third file group, where, for each first discontinuous file group and each second discontinuous file group, a first reading range of the file information in the discontinuous file group is determined according to the starting position and a first ending position of the target storage block occupied by the file to be read corresponding to the file information included in the discontinuous file group, so as to use the file information corresponding to the first reading range as the third file group, the first ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks, other than the last target storage block, among the target storage blocks occupied by the file to be read corresponding to the file information included in the discontinuous file group and the file size of that file to be read; and, for each second file group, a second reading range of the file information of the second file group is determined according to the starting position of the target storage block occupied by the files to be read corresponding to the file information included in the second file group and a second ending position of the target storage block occupied by the file to be read corresponding to the last file information included in the second file group, so as to use the file information corresponding to the second reading range as the third file group, the second ending position being determined according to the difference between the sum of the storage sizes of the target storage blocks, other than the last target storage block, among the target storage blocks occupied by the file to be read corresponding to the last file information included in the second file group and the file size of that file to be read;
a copying unit, configured to copy, according to a reading range corresponding to each third file group, a file to be read, corresponding to the third file group in the hard disk, to a second memory area of a memory of the service system, so as to use copy content corresponding to the third file group in the second memory area as a fourth file group;
a second processing unit, configured to, for a to-be-read file corresponding to each second file group in the fourth file group, perform second redundancy removal on a redundancy space between the to-be-read files corresponding to the second file group according to a starting position of a target storage block occupied by each to-be-read file corresponding to the second file group and a third ending position of the target storage block occupied by each to-be-read file corresponding to the second file group, so as to obtain a fifth file group, where the third ending position is determined according to a difference between a sum of storage sizes of target storage blocks, excluding a last target storage block, in the target storage blocks occupied by each to-be-read file corresponding to the second file group and a file size of each to-be-read file corresponding to the second file group;
and the reading unit is used for reading the fifth file group and the file groups except the files to be read corresponding to the second file group in the fourth file group.
6. The apparatus of claim 5, wherein the apparatus further comprises:
and the selection unit is used for responding to the selection operation of the user on the file type of the file stored in the hard disk and determining the file corresponding to the file type selected by the user as the file to be read.
7. The apparatus of claim 5, wherein the apparatus further comprises:
a sending unit, configured to send the read files in parallel in the form of groups, wherein the total size of the groups sent in parallel is less than the sending bandwidth.
8. The apparatus of claim 7, wherein the files sent in parallel are stored by asynchronously flushing them to disk.
9. An electronic device, comprising: processor, memory and bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the file reading method of any of claims 1 to 4.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of a file reading method as claimed in any one of the claims 1 to 4.
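
The grouping and redundancy-removal steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `FileInfo`, `BLOCK`, `first_grouping`, `second_grouping`, `read_range` and `strip_slack` are hypothetical, the 4096-byte block size is an assumption, and blocks are modelled as plain integers rather than real disk addresses. The claims do not spell out how the reading range of a fragmented file is resolved, so `read_range` and `strip_slack` are written only for groups whose blocks form one continuous run.

```python
from dataclasses import dataclass
from typing import List, Tuple

BLOCK = 4096  # assumed storage block size in bytes (not stated in the claims)


@dataclass
class FileInfo:
    """One row of the index table (hypothetical layout)."""
    name: str
    blocks: List[int]  # block numbers occupied by the file, in file order
    size: int          # file size in bytes

    @property
    def contiguous(self) -> bool:
        # True when every block directly follows the previous one
        return all(a + 1 == b for a, b in zip(self.blocks, self.blocks[1:]))

    @property
    def tail(self) -> int:
        return self.blocks[-1]


def first_grouping(index: List[FileInfo]) -> List[List[FileInfo]]:
    """Sort by first block, then chain contiguous files whose blocks are adjacent."""
    groups: List[List[FileInfo]] = []
    for f in sorted(index, key=lambda x: x.blocks[0]):
        prev = groups[-1][-1] if groups else None
        if (prev is not None and prev.contiguous and f.contiguous
                and prev.tail + 1 == f.blocks[0]):
            groups[-1].append(f)   # extends a continuous file group
        else:
            groups.append([f])     # fragmented or isolated file: its own group
    return groups


def second_grouping(group: List[FileInfo], threshold: int) -> List[List[FileInfo]]:
    """Greedy split: close a sub-group as soon as its total size reaches the threshold."""
    out, cur, total = [], [], 0
    for f in group:
        cur.append(f)
        total += f.size
        if total >= threshold:
            out.append(cur)
            cur, total = [], 0
    if cur:
        out.append(cur)
    return out


def read_range(group: List[FileInfo]) -> Tuple[int, int]:
    """Byte range [start, end) that skips the unused tail of the last block."""
    first, last = group[0], group[-1]
    used_in_tail = last.size - (len(last.blocks) - 1) * BLOCK
    return first.blocks[0] * BLOCK, last.tail * BLOCK + used_in_tail


def strip_slack(buf: bytes, group: List[FileInfo]) -> bytes:
    """Drop the slack between files in a buffer read with read_range(group)."""
    base = group[0].blocks[0] * BLOCK
    return b"".join(
        buf[f.blocks[0] * BLOCK - base: f.blocks[0] * BLOCK - base + f.size]
        for f in group
    )
```

In this sketch a single sequential read of `read_range(group)` replaces one read per file, and `strip_slack` then removes the padding left in the last block of every file except the final one. Note that when the threshold is first reached only at the last file of a group, the greedy split returns a single sub-group.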
CN202211186653.4A 2022-09-28 2022-09-28 File reading method and device, electronic equipment and storage medium Active CN115292247B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211186653.4A CN115292247B (en) 2022-09-28 2022-09-28 File reading method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115292247A CN115292247A (en) 2022-11-04
CN115292247B (en) 2022-12-06

Family

ID=83833365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211186653.4A Active CN115292247B (en) 2022-09-28 2022-09-28 File reading method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115292247B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101233479A (en) * 2005-08-03 2008-07-30 桑迪士克股份有限公司 Management of memory blocks that directly store data files
CN101968791A (en) * 2010-08-10 2011-02-09 深圳市飘移网络技术有限公司 Data storage method and device
CN109726177A (en) * 2018-12-29 2019-05-07 北京赛思信安技术股份有限公司 A kind of mass file subregion indexing means based on HBase
CN110647497A (en) * 2019-07-19 2020-01-03 广东工业大学 HDFS-based high-performance file storage and management system
WO2021073111A1 (en) * 2019-10-15 2021-04-22 平安科技(深圳)有限公司 Distributed storage file reading and writing method, device and platform, and readable storage medium
WO2021169113A1 (en) * 2020-02-26 2021-09-02 平安科技(深圳)有限公司 Data management method and apparatus, and computer device and storage medium

Also Published As

Publication number Publication date
CN115292247A (en) 2022-11-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant