CN116112513A - File processing method and distributed file storage system

Info

Publication number: CN116112513A
Application number: CN202310015936.0A
Authority: CN
Inventors: 卢新文
Original assignee: New H3C Technologies Co Ltd
Current assignee: New H3C Technologies Co Ltd
Priority date: 2023-01-06
Filing date: 2023-01-06
Publication date: 2023-05-12

Abstract

The present disclosure relates to the field of distributed storage technologies, and in particular, to a file processing method and a distributed file storage system. The distributed file storage system comprises a file management service component and a back-end storage, wherein the file management service component maintains a first mapping relation between each storage node and a topic corresponding to the storage node, and a second mapping relation between the name of each file and a topic corresponding to the storage node for storing the file; the method comprises the following steps: the file management service component receives a file downloading request sent by the front end, wherein the file downloading request carries the name of a target file to be downloaded; determining a target topic corresponding to the name of the target file based on the name of the target file and the second mapping relation; and determining a target storage node for storing the target file based on the target topic and the first mapping relation, and sending a file downloading request to the target storage node.

Description

File processing method and distributed file storage system

Technical Field

The present disclosure relates to the field of distributed storage technologies, and in particular, to a file processing method and a distributed file storage system.

Background

Centralized file storage keeps all files on a single server. It is convenient to use but is limited by the storage capacity of that server, so it cannot meet current storage requirements. Distributed file storage has therefore become the dominant approach. Although distributed file storage removes the single-server capacity limit, it either takes a long time to query a file at download time or requires the files in the cluster to be strongly consistent.

For example, in a first mode, the files on the storage nodes of the distributed cluster are not required to be strongly consistent, so at download time the download service polls the storage nodes one by one until the file to be downloaded is found. Once the distributed cluster reaches a certain scale, for example thousands of storage nodes, polling node by node is very time-consuming.

To avoid the long query and download time of the first mode, a second mode requires the files on all storage nodes in the distributed cluster to be strongly consistent, so that each download can be served by a randomly selected storage node. However, keeping every storage node in the distributed cluster strongly consistent wastes a great deal of storage space.

Therefore, how to improve the file downloading efficiency without requiring strong consistency while saving the storage space is a current urgent problem to be solved.

Disclosure of Invention

The application provides a file processing method and a distributed file storage system.

In a first aspect, the present application provides a file processing method, applied to a distributed file storage system, where the distributed file storage system includes a file management service component and a back-end storage, where the file management service component maintains a first mapping relationship between each storage node and its corresponding topic, and a second mapping relationship between a name of each file and a topic corresponding to the storage node storing the file, and the back-end storage includes a plurality of storage nodes for storing files; the method comprises the following steps:

the file management service component receives a file downloading request sent by a front end, wherein the file downloading request carries the name of a target file to be downloaded;

the file management service component determines a target topic corresponding to the name of the target file based on the name of the target file and the second mapping relation;

the file management service component determines a target storage node for storing the target file based on the target topic and the first mapping relation, and sends the file downloading request to the target storage node.
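The three steps above amount to a two-stage lookup: file name to topic, then topic to storage nodes. A minimal sketch follows, assuming both mapping relations are held as in-memory dictionaries; the node names, file names, and function names are hypothetical and do not come from the application.

```python
from typing import Dict, List, Optional

# Hypothetical contents of the two mapping relations (illustrative only).
# First mapping: storage node -> topic corresponding to that node.
FIRST_MAPPING: Dict[str, str] = {
    "storage-node-1": "topic1",
    "storage-node-2": "topic1",
    "storage-node-3": "topic1",
    "storage-node-4": "topic2",
}
# Second mapping: file name -> topic of the storage node(s) holding the file.
SECOND_MAPPING: Dict[str, str] = {"file1": "topic1", "file3": "topic2"}


def find_target_topic(file_name: str) -> Optional[str]:
    """Look up the target topic by file name using the second mapping."""
    return SECOND_MAPPING.get(file_name)


def find_target_nodes(topic: str) -> List[str]:
    """List the storage nodes whose topic is `topic` using the first mapping."""
    return sorted(node for node, t in FIRST_MAPPING.items() if t == topic)


def route_download_request(file_name: str) -> List[str]:
    """Return the nodes that should receive the request; empty if not indexed."""
    topic = find_target_topic(file_name)
    if topic is None:
        return []
    return find_target_nodes(topic)
```

In this sketch only the nodes of one topic are contacted, rather than every node in the system, which is the point of maintaining the two mappings.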

Optionally, the distributed file storage system further includes a kafka component disposed between the file management service component and the back-end storage, wherein two types of topics are provided for each topic in the kafka component: the producer of the first type of topic is the file management service component and its consumer is the back-end storage, while the producer of the second type of topic is the back-end storage and its consumer is the file management service component.

Optionally, the file management service component determines a first type of target topic corresponding to the name of the target file based on the name of the target file and the second mapping relationship;

the file management service component issues the file download request to the first type of target topic of the kafka component such that a target storage node that is a consumer of the first type of target topic obtains the file download request from the first type of target topic in the kafka component.

Optionally, the method further comprises:

after the target storage node acquires the file downloading request, it queries whether the target file is stored locally; if so, it generates a response message carrying the download address of the target file, and if not, it generates a response message indicating that the target file was not found;

and the target storage node issues a response message to the second-class target topic in the kafka component, so that the file management service component serving as a consumer of the second-class target topic acquires the response message from the second-class target topic in the kafka component and sends the response message to the front end, and if the response message received by the front end carries the download address of the target file, the target file is downloaded based on the download address.

Optionally, the plurality of storage nodes included in the back-end storage form a plurality of storage clusters, one topic corresponds to at least one storage cluster, and files stored in the plurality of storage nodes in one storage cluster keep strong consistency.

In a second aspect, the present application provides a distributed file storage system, where the distributed file storage system includes a file management service component and a back-end store, where the file management service component maintains a first mapping relationship between each storage node and its corresponding topic, a second mapping relationship between a name of each file and the topic corresponding to the storage node storing the file, and the back-end store includes a plurality of storage nodes for storing files, where,

Optionally, the distributed file storage system further comprises a kafka component disposed between the file management service component and the back-end store, wherein,

two types of topics are provided for each topic in the kafka component: the producer of the first type of topic is the file management service component and its consumer is the back-end storage, while the producer of the second type of topic is the back-end storage and its consumer is the file management service component.

Optionally, after the target storage node obtains the file downloading request, it queries whether the target file is stored locally; if so, it generates a response message carrying the download address of the target file, and if not, it generates a response message indicating that the target file was not found;

As can be seen from the foregoing, the file processing method provided in the embodiments of the present application is applied to a distributed file storage system, where the distributed file storage system includes a file management service component and a back-end storage, where the file management service component maintains a first mapping relationship between each storage node and its corresponding topic, and a second mapping relationship between a name of each file and a topic corresponding to the storage node storing the file, and the back-end storage includes a plurality of storage nodes for storing files; the method comprises the following steps: the file management service component receives a file downloading request sent by a front end, wherein the file downloading request carries the name of a target file to be downloaded; the file management service component determines a target topic corresponding to the name of the target file based on the name of the target file and the second mapping relation; the file management service component determines a target storage node for storing the target file based on the target topic and the first mapping relation, and sends the file downloading request to the target storage node.

By adopting the file processing method provided by the embodiment of the application, under the condition that the files of all storage nodes in the distributed file storage system do not require strong consistency, the files can be quickly found and downloaded, the storage space is saved, and the efficiency of inquiring/downloading the files is not influenced.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the following description will briefly describe the drawings that are required to be used in the embodiments of the present application or the description in the prior art, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may also be obtained according to these drawings of the embodiments of the present application for a person having ordinary skill in the art.

FIG. 1 is a detailed flowchart of a file processing method according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a file download request sending process according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a file download response return process according to an embodiment of the present application;

FIG. 4 is a schematic architecture diagram of a distributed file storage system according to an embodiment of the present application.

Detailed Description

The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to any or all possible combinations including one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message, without departing from the scope of the present application. Furthermore, depending on the context, the word "if" as used herein may be interpreted as "when ..." or "upon ..." or "in response to a determination".

By way of example, FIG. 1 is a detailed flowchart of a file processing method provided in an embodiment of the present application. The method is applied to a distributed file storage system, where the distributed file storage system includes a file management service component and a back-end storage, where the file management service component maintains a first mapping relationship between each storage node and its corresponding topic, and a second mapping relationship between a name of each file and a topic corresponding to a storage node storing the file, and the back-end storage includes a plurality of storage nodes for storing files; the method comprises the following steps:

step 100: the file management service component receives a file downloading request sent by the front end, wherein the file downloading request carries the name of a target file to be downloaded.

Further, in the embodiment of the present application, the distributed file storage system further includes a kafka component disposed between the file management service component and the back-end storage. Two types of topics are provided for each topic in the kafka component: the producer of the first type of topic is the file management service component and its consumer is the back-end storage, while the producer of the second type of topic is the back-end storage and its consumer is the file management service component.

Specifically, the distributed file storage system provided by the embodiment of the application can be composed of a file management service component, a kafka component and a back-end storage. The file management service component can expose an API for querying files to the outside (the front end) and also maintains a file index. The index records the correspondence between file names and topics, so that the topic corresponding to a file name can be found quickly. The file management service component also maintains the correspondence between each storage node and its corresponding topic.

It should be noted that a topic may correspond to one or more storage nodes, and the files stored in the one or more storage nodes correspond to the topic. Assuming that topic1 corresponds to storage node 1, storage node 2 and storage node 3, when a file (e.g., file 1) is written into the distributed file storage system and it is determined that the file is written into storage node 1, storage node 2 or storage node 3, the name of the file (file 1) needs to be added to the file names corresponding to topic1.
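The index update on write can be sketched as follows. This is an illustration only: the application does not specify how the index is stored, so an in-memory dictionary of sets is assumed, and `record_file_write` and the node names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Set

# Hypothetical node -> topic correspondence kept by the file management service.
NODE_TO_TOPIC: Dict[str, str] = {
    "storage-node-1": "topic1",
    "storage-node-2": "topic1",
    "storage-node-3": "topic1",
}
# File index: topic -> names of the files stored on that topic's nodes.
FILE_INDEX: DefaultDict[str, Set[str]] = defaultdict(set)


def record_file_write(file_name: str, node: str) -> str:
    """Add `file_name` to the file names of the topic that `node` belongs to."""
    topic = NODE_TO_TOPIC[node]  # raises KeyError for an unknown node
    FILE_INDEX[topic].add(file_name)
    return topic
```

Writing file 1 to any of the three nodes leaves the same index entry, because the index is keyed by topic rather than by node.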

Further, in the embodiment of the present application, two types of topics are provided for each topic in the kafka component. For example, if the first type of topic is query-topic and the second type of topic is result-topic, then topic1, which corresponds to storage node 1, storage node 2 and storage node 3, includes query-topic1 and result-topic1. For a topic of the query-topic class, the message producer is the file management service component, and the message consumers are the storage nodes corresponding to the topic (storage node 1, storage node 2 and storage node 3). For a topic of the result-topic class, the message producers are the storage nodes corresponding to the topic (storage node 1, storage node 2 and storage node 3), and the message consumer is the file management service component.
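The producer and consumer roles of the two topic types can be written down as data. The sketch below is only an illustration; `TopicRoles`, `make_topic_pair`, and the service identifier are hypothetical names, and the naming scheme simply follows the query-topic1/result-topic1 example above.

```python
from dataclasses import dataclass
from typing import List, Tuple

FILE_MANAGEMENT_SERVICE = "file-management-service"  # hypothetical identifier


@dataclass(frozen=True)
class TopicRoles:
    name: str
    producers: Tuple[str, ...]
    consumers: Tuple[str, ...]


def make_topic_pair(suffix: str, storage_nodes: List[str]) -> Tuple[TopicRoles, TopicRoles]:
    """Build the query/result topic pair for one topic, with mirrored roles."""
    nodes = tuple(storage_nodes)
    query = TopicRoles(f"query-topic{suffix}", (FILE_MANAGEMENT_SERVICE,), nodes)
    result = TopicRoles(f"result-topic{suffix}", nodes, (FILE_MANAGEMENT_SERVICE,))
    return query, result


query_topic, result_topic = make_topic_pair(
    "1", ["storage-node-1", "storage-node-2", "storage-node-3"]
)
```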

Step 110: and the file management service component determines a target topic corresponding to the name of the target file based on the name of the target file and the second mapping relation.

In this embodiment of the present application, after the file management service component parses the received file download request and determines that the file to be downloaded is the target file, it can determine the target topic corresponding to the name of the target file according to the locally maintained second mapping relationship between the name of each file and the topic corresponding to the storage node storing the file, where the target topic includes a first type of target topic (target query-topic) and a second type of target topic (target result-topic). That is, in the embodiment of the present application, the file management service component determines, based on the name of the target file and the second mapping relationship, the first type of target topic corresponding to the name of the target file.

In practice, the kafka component provides messaging services. Wherein two types of topic are set in the kafka component, the details of which are shown in Table 1:

TABLE 1

Step 120: the file management service component determines a target storage node for storing the target file based on the target topic and the first mapping relation, and sends the file downloading request to the target storage node.

As can be seen from the above, since the distributed file storage system further includes the kafka component, the file download request is handled by the kafka component in a publish/subscribe manner.

Specifically, the file management service component is the message publisher of the first type of target topic, so it publishes the file download request to the first type of target topic of the kafka component. The target storage node is a message consumer of the first type of target topic, so it obtains the file download request from the first type of target topic in the kafka component.

For example, FIG. 2 is a schematic diagram of a file download request sending process provided in an embodiment of the present application. When downloading a file, the front end informs the file download service, through an API, of the name of the file to be downloaded, and the file download service searches for the file in the index. If the file is not found, the service directly returns that no such file exists. If the file is found, the message { "fileName": "fileName" } is sent to the corresponding query-topic through the Producer. The file storage node corresponding to the query-topic obtains the file information by listening to the query-topic.

For example, after the file management service determines that the query-topic corresponding to the file 1 is the query-topic1, the file management service issues the file download request to the query-topic1 of the kafka component, and since the message consumer of the query-topic1 is the file storage node 1, the file storage node 2 and the file storage node 3, the file storage node 1, the file storage node 2 and the file storage node 3 can acquire the file download request from the query-topic1 of the kafka component and process the file download request.
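The fan-out of one request to all three storage nodes can be simulated without a running broker. The sketch below uses a toy in-memory publish/subscribe class as a stand-in for the kafka component; `InMemoryBroker` is hypothetical and is not a Kafka client API. In a real Kafka deployment, each storage node would typically need its own consumer group for every node to receive every message, since consumers within one group share the messages of a topic.

```python
import json
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[str], None]


class InMemoryBroker:
    """Toy stand-in for the kafka component; not a Kafka client."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver `message` to every subscriber; return the delivery count."""
        for handler in self._subscribers[topic]:
            handler(message)
        return len(self._subscribers[topic])


broker = InMemoryBroker()
received: Dict[str, List[str]] = defaultdict(list)

# Each storage node listens on query-topic1 and records the requested name.
for node in ("storage-node-1", "storage-node-2", "storage-node-3"):
    broker.subscribe(
        "query-topic1",
        lambda msg, node=node: received[node].append(json.loads(msg)["fileName"]),
    )

# The file management service publishes the download request for file1.
delivered = broker.publish("query-topic1", json.dumps({"fileName": "file1"}))
```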

Further, after the target storage node obtains the file downloading request, it queries whether the target file is stored locally; if so, it generates a response message carrying the download address of the target file, and if not, it generates a response message indicating that the target file was not found.

Further, the target storage node issues a response message to the second type of target topic in the kafka component, so that the file management service component serving as a consumer of the second type of target topic obtains the response message from the second type of target topic in the kafka component, sends the response message to the front end, and downloads the target file based on the download address if the download address of the target file is carried in the response message received by the front end.

For example, FIG. 3 is a schematic diagram of a file download response return process provided in the embodiment of the present application. After a file storage node receives a file query message from the query-topic through the Consumer, it starts to search for the file locally. If the file is found, the file storage node sends the message { "fileName": "file name", "filePath": "download URL", "findResult": true } to the result-topic through the Producer; if the file is not found, it sends the message { "fileName": "", "filePath": "", "findResult": false } to the result-topic through the Producer. The file management service receives the message returned by the file storage node through the result-topic and judges whether the file was found according to the findResult field. If the file was not found, it returns to the front end that the file was not found; if the file was found, it returns the download URL of the file to the front end, and the front end downloads the file through the URL.
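The storage-node side of this exchange can be sketched as a function that turns a query message into a response message. The field names follow the messages quoted above; the function name, the dictionary standing in for the node's local file lookup, and the example URL are assumptions made for illustration.

```python
import json
from typing import Dict


def build_response(query_message: str, local_files: Dict[str, str]) -> str:
    """Build the result-topic message for one query-topic message.

    `local_files` maps a locally stored file name to its download URL and
    stands in for whatever local lookup a real storage node performs.
    """
    file_name = json.loads(query_message)["fileName"]
    url = local_files.get(file_name)
    if url is None:
        # Not stored locally: empty name and path, findResult false.
        return json.dumps({"fileName": "", "filePath": "", "findResult": False})
    return json.dumps({"fileName": file_name, "filePath": url, "findResult": True})


# Hypothetical local state of one storage node.
node3_files = {"file1": "http://storage-node-3.example/files/file1"}
found = json.loads(build_response('{"fileName": "file1"}', node3_files))
missing = json.loads(build_response('{"fileName": "file9"}', node3_files))
```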

For example, file storage node 1, file storage node 2 and file storage node 3 are message producers of result-topic1, and the file management service is the message consumer of result-topic1. Therefore, after file storage node 1, file storage node 2 and file storage node 3 process the file download request, each of them generates a response message regardless of whether file 1 was found locally and publishes it to result-topic1 of the kafka component, and the file management service obtains the response messages from result-topic1 and returns them to the front end. If, based on the response messages returned by file storage node 1, file storage node 2 and file storage node 3, the front end determines that the response message returned by file storage node 3 carries the download address of file 1, the front end downloads file 1 from file storage node 3 based on that download address.
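Because every node of the topic replies, the receiver has to pick the reply that actually carries a download address. A minimal sketch of that selection follows, assuming the replies have already been decoded into dictionaries; `first_download_url` and the example URL are hypothetical.

```python
from typing import Iterable, Mapping, Optional


def first_download_url(responses: Iterable[Mapping[str, object]]) -> Optional[str]:
    """Return the download address from the first response that found the file."""
    for response in responses:
        if response.get("findResult") is True:
            return str(response["filePath"])
    return None


# Replies from storage nodes 1, 2 and 3; only node 3 holds file1.
replies = [
    {"fileName": "", "filePath": "", "findResult": False},
    {"fileName": "", "filePath": "", "findResult": False},
    {
        "fileName": "file1",
        "filePath": "http://storage-node-3.example/files/file1",
        "findResult": True,
    },
]
url = first_download_url(replies)
```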

In the embodiment of the application, a plurality of storage nodes included in the back-end storage form a plurality of storage clusters, one topic corresponds to at least one storage cluster, and files stored in the plurality of storage nodes in one storage cluster keep strong consistency.
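One way to picture strong consistency inside a single storage cluster is a write that is applied to every node of the cluster, so that any node can answer a query for the file. The application does not describe the replication mechanism; the sketch below simply assumes synchronous replication in memory, and `StorageCluster` is a hypothetical class.

```python
from typing import Dict, List


class StorageCluster:
    """Toy cluster whose nodes all hold the same files (assumed replication)."""

    def __init__(self, topic: str, node_names: List[str]) -> None:
        self.topic = topic
        # Per-node file store: node name -> {file name -> content}.
        self.nodes: Dict[str, Dict[str, bytes]] = {name: {} for name in node_names}

    def write(self, file_name: str, data: bytes) -> None:
        """Apply the write to every node so the cluster stays consistent."""
        for files in self.nodes.values():
            files[file_name] = data

    def has_file(self, node_name: str, file_name: str) -> bool:
        return file_name in self.nodes[node_name]


cluster1 = StorageCluster("topic1", ["storage-node-1", "storage-node-2"])
cluster1.write("file1", b"hello")
```

Consistency is kept only within a cluster here, so storage is duplicated across the nodes of one cluster rather than across the whole system.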

The following describes the structure of the distributed file storage system provided in the embodiment of the present application in detail in connection with a specific application scenario. By way of example, FIG. 4 is a schematic architecture diagram of a distributed file storage system according to an embodiment of the present application.

By way of example, the distributed file storage system includes a file management service, kafka, and a storage service. The file management service maintains a file index, that is, a mapping relationship between the names of the files stored in the distributed file storage system and their corresponding topics. For example, the file names corresponding to topic1 (including query-topic1 and result-topic1) are fileName1 and fileName2, the file names corresponding to topic2 (including query-topic2 and result-topic2) are fileName3 and fileName4, and so on. In addition, the file management service also maintains a mapping relationship between each storage cluster and its corresponding topic, so that after a file is written into a storage cluster, the name of the file can be added to the mapping relationship between the topic corresponding to that storage cluster and the file names.

Each topic (e.g., topic1, topic2, ...) is created in Kafka, and each topic includes two types of topics (e.g., query-topic and result-topic). Taking topic1 as an example: for the query-topic1 included in topic1, the message producer (publisher) is the file management service and the message consumer (subscriber) is the storage cluster corresponding to topic1 (e.g., storage cluster 1); for the result-topic1 included in topic1, the message producer (publisher) is the storage cluster corresponding to topic1 (e.g., storage cluster 1) and the message consumer (subscriber) is the file management service.

The figure shows only an example in which storage cluster 1 corresponds to topic1, storage cluster 2 corresponds to topic2, and storage cluster 3 corresponds to topic3; in practical applications, one topic may correspond to one or more storage clusters.

The above units may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more digital signal processors (Digital Signal Processor, abbreviated as DSP), or one or more field-programmable gate arrays (Field Programmable Gate Array, abbreviated as FPGA), or the like. For another example, when a unit is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or another processor that can invoke the program code. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SOC).

The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Moreover, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims

1. The file processing method is characterized by being applied to a distributed file storage system, wherein the distributed file storage system comprises a file management service component and a back-end storage, the file management service component maintains a first mapping relation between each storage node and a topic corresponding to the storage node, a second mapping relation between the name of each file and the topic corresponding to the storage node for storing the file, and the back-end storage comprises a plurality of storage nodes for storing the file; the method comprises the following steps:

2. The method of claim 1, wherein the distributed file storage system further comprises a kafka component disposed between the file management service component and the back-end store, wherein each topic of the kafka component is provided with two types of topics, a producer of a first type of topic being the file management service component, a consumer being the back-end store, a producer of a second type of topic being the back-end store, a consumer being the file management service component.

3. The method of claim 2, wherein the file management service component determines a first type of target topic corresponding to the name of the target file based on the name of the target file and the second mapping relationship;

4. A method as claimed in claim 3, wherein the method further comprises:

5. The method of claim 1, wherein the plurality of storage nodes included in the back-end storage form a plurality of storage clusters, one topic corresponds to at least one storage cluster, and files stored in the plurality of storage nodes in one storage cluster maintain strong consistency.

6. A distributed file storage system, comprising a file management service component and a back-end store, wherein the file management service component maintains a first mapping relationship between each storage node and its corresponding topic, a second mapping relationship between the name of each file and the topic corresponding to the storage node storing the file, the back-end store comprises a plurality of storage nodes for storing files, wherein,

7. The system of claim 6, wherein the distributed file storage system further comprises a kafka component disposed between the file management service component and the back-end store, wherein,

8. The system of claim 7, wherein the file management service component determines a first type of target topic corresponding to the name of the target file based on the name of the target file and the second mapping relationship;

9. The system of claim 8, wherein,

10. The system of claim 6, wherein the plurality of storage nodes included in the back-end storage form a plurality of storage clusters, one topic corresponds to at least one storage cluster, and files stored in the plurality of storage nodes in one storage cluster maintain strong consistency.