CN118193468A - File reading method and related equipment - Google Patents

File reading method and related equipment

Info

Publication number
CN118193468A
Authority
CN
China
Prior art keywords
file
storage node
identifier
node cluster
electronic device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310477707.0A
Other languages
Chinese (zh)
Inventor
冯锐
王淇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a file reading method. In the method, an electronic device sends a first read request for a first file among a plurality of files to a storage node cluster, where the first read request is obtained according to a virtual file identifier and a first identifier of the first file in the electronic device, and the virtual file identifier identifies the plurality of files as a whole. The electronic device receives the first file from the storage node cluster, where the first file is obtained by the storage node cluster according to a second identifier of the first file in the storage node cluster, and the second identifier is obtained by the storage node cluster according to the first read request. With this method, once the electronic device has obtained the virtual file identifier, it can send a read request for each file to the storage node cluster according to the virtual file identifier and the first identifier of that file in the electronic device, and receive the corresponding file. Frequent interaction operations are thereby avoided, and the latency they cause is reduced.

Description

File reading method and related equipment
The present application claims priority to Chinese patent application No. 202211608313.6, entitled "an AI training acceleration method", filed on December 14, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The application relates to the technical field of data reading, in particular to a file reading method and related equipment.
Background
With the continued development of computer technology, access to massive files by electronic devices is often involved in a scenario such as training a machine learning model.
For example, in training a machine learning model, an electronic device performing training typically needs to read a massive number of files stored in a storage device as training samples. In the process of reading mass files by the electronic device, frequent interaction operation needs to be executed between the electronic device and the storage device for storing the files.
Specifically, for each file to be read, the electronic device first needs to send an open instruction to the storage device to request the handle of the file in the storage device. After obtaining the handle, the electronic device may request to read the file from the storage device according to the handle. After the electronic device has read the file from the storage device, it needs to send a close instruction to the storage device to instruct the storage device to close the opened file.
It can be seen that, for each file to be read, the above interactions between the electronic device and the storage device must all be performed, and such frequent interaction makes the file reading process time-consuming and inefficient.
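The per-file cost described above can be made concrete with a minimal sketch. The `StorageDevice` class, its method names, and the `round_trips` counter are hypothetical stand-ins introduced only for illustration; the sketch assumes that each open, read, or close instruction costs exactly one request/response exchange.

```python
class StorageDevice:
    """Toy in-memory stand-in for the storage device (hypothetical)."""

    def __init__(self, files):
        self._files = dict(files)   # path -> file content
        self._open = {}             # handle -> path
        self._next_handle = 1
        self.round_trips = 0        # one per instruction/response pair (assumption)

    def open(self, path):
        self.round_trips += 1
        handle = self._next_handle
        self._next_handle += 1
        self._open[handle] = path
        return handle

    def read(self, handle):
        self.round_trips += 1
        return self._files[self._open[handle]]

    def close(self, handle):
        self.round_trips += 1
        del self._open[handle]


def read_all_conventional(device, paths):
    """Read every file with one open, one read, and one close per file."""
    contents = []
    for path in paths:
        handle = device.open(path)
        try:
            contents.append(device.read(handle))
        finally:
            device.close(handle)
    return contents


files = {f"sample_{i}.bin": bytes([i]) for i in range(4)}
device = StorageDevice(files)
data = read_all_conventional(device, list(files))
```

Under these assumptions, reading four files costs twelve exchanges, of which only four carry file data.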
Disclosure of Invention
The application provides a file reading method, which aims to solve the problems of long time consumption and low file reading efficiency in the whole data reading process caused by frequent interaction operation when an electronic device reads a plurality of files from a storage device. The application also provides a corresponding apparatus, device, computer readable storage medium, computer program product, etc.
The first aspect of the present application provides a file reading method, which includes: a storage node cluster receives, from an electronic device, a first read request for a first file among a plurality of files, where the first read request is obtained by the electronic device according to a virtual file identifier and a first identifier of the first file in the electronic device, and the virtual file identifier identifies the plurality of files as a whole; the storage node cluster obtains a second identifier of the first file in the storage node cluster according to the first read request; and the storage node cluster sends the first file to the electronic device according to the second identifier of the first file.
In the first aspect, the virtual file identifier represents the plurality of files as a whole, so that the plurality of files can be presented to the electronic device in the form of a single virtual file identifier.
Thus, the electronic device can request to read any one of the plurality of files according to the virtual file identification, and the storage node cluster can provide any one of the plurality of files to the electronic device according to the virtual file identification. For example, when a first file of the plurality of files needs to be read, the electronic device may obtain a first read request according to the virtual file identifier and a first identifier of the first file in the electronic device, and send the first read request to the storage node cluster. The storage node cluster may send the first file to the electronic device after receiving the first read request.
In the conventional technology, for each file, the storage node cluster can send feedback information to the electronic device only after receiving an open instruction for that file from the electronic device; only then can the electronic device send a read instruction for the file so that the storage node cluster provides the file. In the present scheme, the electronic device can generate and send the first read request based on the virtual file identifier and the first identifier of each file in the electronic device, and the storage node cluster can send the corresponding file to the electronic device according to the first read request without multiple interaction operations. The latency caused by those interactions is thereby reduced, and data reading efficiency is improved.
In a possible implementation manner of the first aspect, the first read request includes a second identifier of the first file, where the second identifier of the first file is queried by the electronic device according to the first identifier of the first file from index information corresponding to the virtual file identifier, and the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
In this possible implementation manner, in the virtual file creation stage, the storage node cluster may send the virtual file identifier and index information of the virtual file identifier to the electronic device, so that the electronic device may query the second identifier of the first file in the storage node cluster through the index information and the first identifier of the first file in the electronic device.
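A minimal sketch of this client-side lookup follows. It assumes that the index information is held as a plain dictionary from first identifiers to second identifiers and that the request also carries the virtual file identifier; `ReadRequest`, `build_read_request`, and all identifier values are hypothetical names chosen for illustration, not part of the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    virtual_file_id: str  # carried along here as an assumption
    second_id: str        # identifier of the file inside the storage node cluster


def build_read_request(virtual_file_id, index_info, first_id):
    """Translate a first identifier into a read request carrying the second identifier."""
    if first_id not in index_info:
        raise KeyError(f"{first_id!r} is not part of virtual file {virtual_file_id!r}")
    return ReadRequest(virtual_file_id, index_info[first_id])


# Index information as it might be returned in the virtual file creation stage.
index_info = {
    "train/0001.jpg": "obj-7f3a",
    "train/0002.jpg": "obj-91c2",
}
request = build_read_request("vfile-1", index_info, "train/0002.jpg")
```

Because the lookup is local, the electronic device needs no extra exchange with the cluster before sending the read request.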
In a possible implementation manner of the first aspect, the storage node cluster includes a plurality of storage nodes; the storage node cluster receives a first read request from an electronic device for a first file of a plurality of files, comprising: a target storage node in the plurality of storage nodes receives a first reading request carrying a second identifier of a first file, wherein the target storage node is a storage node storing the first file; the storage node cluster sends the first file to the electronic device based on the second identification of the first file, including: the target storage node sends the first file to the electronic device based on the second identification of the first file.
In the conventional technology, the storage node cluster typically communicates with the electronic device through one fixed storage node. The fixed storage node receives the first read request from the electronic device and then looks up, through the index information, the storage node where the first file is located. In many cases that storage node is not the fixed storage node, so the request must be forwarded inside the storage node cluster to obtain the first file from the storage node where it is located; the first file is then forwarded back inside the cluster and finally sent to the electronic device by the fixed storage node.
This conventional transmission process requires a series of data forwarding operations inside the storage node cluster, which takes longer and lowers processing efficiency.
In this possible implementation manner, after determining the storage node where the first file is located, the electronic device may determine the storage node where the first file is located as the target storage node, and directly send the first read request to the target storage node. In this way, the target storage node can directly send the first file stored in the target storage node to the electronic device without executing complex forwarding operation inside the storage node cluster, so that the interaction efficiency is improved, and the processing time is reduced.
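The direct routing described above can be sketched as follows. How the electronic device determines the node holding the file is not specified at this point in the text, so the sketch assumes, purely for illustration, that the second identifier is a `(node_id, object_id)` pair; `StorageNode`, `read_direct`, and the node names are hypothetical.

```python
class StorageNode:
    """Toy storage node that counts the requests it serves (hypothetical)."""

    def __init__(self, node_id, objects):
        self.node_id = node_id
        self._objects = dict(objects)  # object_id -> file content
        self.requests_served = 0

    def read(self, object_id):
        self.requests_served += 1
        return self._objects[object_id]


def read_direct(nodes, second_id):
    """Send the read request straight to the node that stores the file."""
    node_id, object_id = second_id     # assumed layout of the second identifier
    target_node = nodes[node_id]
    return target_node.read(object_id)


nodes = {
    "node-a": StorageNode("node-a", {"obj-1": b"cat"}),
    "node-b": StorageNode("node-b", {"obj-2": b"dog"}),
}
payload = read_direct(nodes, ("node-b", "obj-2"))
```

In this sketch only the target node handles the request; no other node is involved in forwarding.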
In a possible implementation manner of the first aspect, the first read request carries the virtual file identifier and the first identifier of the first file; the storage node cluster obtains a second identifier of the first file in the storage node cluster based on the first read request, comprising: the storage node cluster queries the second identifier of the first file from index information corresponding to the virtual file identifier based on the first identifier of the first file, wherein the index information comprises an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
In this possible implementation manner, the first read request carries the virtual file identifier and the first identifier of the first file, so that after the storage node cluster receives the first read request, index information corresponding to the virtual file identifier is determined according to the virtual file identifier, and the second identifier of the first file in the storage node cluster is queried from the index information according to the first identifier of the first file.
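The cluster-side variant can be sketched in the same spirit. `ClusterSideResolver`, `register_index`, and `handle_read` are hypothetical names, and the two dictionaries are assumed in-memory stand-ins for whatever index and object store a real cluster would use.

```python
class ClusterSideResolver:
    """Toy cluster that resolves first identifiers on the storage side (hypothetical)."""

    def __init__(self, objects):
        self._objects = dict(objects)  # second identifier -> file content
        self._indexes = {}             # virtual file identifier -> index information

    def register_index(self, virtual_file_id, index_info):
        self._indexes[virtual_file_id] = dict(index_info)

    def handle_read(self, virtual_file_id, first_id):
        # Step 1: select the index information by the virtual file identifier.
        index_info = self._indexes[virtual_file_id]
        # Step 2: map the first identifier to the second identifier.
        second_id = index_info[first_id]
        # Step 3: fetch the file by its second identifier.
        return self._objects[second_id]


cluster = ClusterSideResolver({"obj-7f3a": b"first", "obj-91c2": b"second"})
cluster.register_index("vfile-1", {"a.jpg": "obj-7f3a", "b.jpg": "obj-91c2"})
content = cluster.handle_read("vfile-1", "b.jpg")
```

Compared with the client-side lookup, this keeps the index information on the cluster, so the electronic device only needs the virtual file identifier and its own first identifiers.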
In a possible implementation manner of the first aspect, before the storage node cluster receives the first read request from the electronic device for the first file of the plurality of files, the method further includes: the storage node cluster receives a creation instruction from the electronic device; the storage node cluster creates, according to the creation instruction, a virtual file identifier that identifies the plurality of files as a whole; and the storage node cluster sends the virtual file identifier to the electronic device.
In this possible implementation, the virtual file identifier identifies the plurality of files as a whole, but the files need not be physically combined into one large file in the storage node cluster, so the storage node cluster does not have to move or merge any data when creating the virtual file identifier. From the perspective of the electronic device, the virtual file identifier describes the plurality of files as a whole; that is, the electronic device may treat it as the identifier of a single file that contains all of the plurality of files, even though no such merged file exists in the storage node cluster. The virtual file identifier can therefore be regarded as describing a virtual large file, rather than a large file that actually exists in the storage node cluster as the result of moving and merging the individual files.
In a possible implementation manner of the first aspect, the creation instruction carries a file list, and the file list includes the first identifier of each of the plurality of files in the electronic device; the method further includes: the storage node cluster generates, according to the creation instruction, index information corresponding to the virtual file identifier, so that the electronic device or the storage node cluster can obtain the second identifier of the first file according to the index information and the first identifier of the first file, where the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
In this possible implementation manner, according to the creation instruction, the storage node cluster may obtain and store the virtual file identifier and index information corresponding to the virtual file identifier, and the electronic device may obtain the virtual file identifier, and in some examples, may also obtain index information corresponding to the virtual file identifier. Thus, the file reading operation of at least one file in the plurality of files can be realized according to the virtual file identifier and the index information corresponding to the virtual file identifier.
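The creation stage can be sketched as below. The sketch assumes that the cluster can already resolve each first identifier to a second identifier through some namespace (modelled here as a dictionary), and that virtual file identifiers are simple sequential strings; `VirtualFileCreator`, `create_virtual_file`, and the `return_index` flag are hypothetical.

```python
import itertools


class VirtualFileCreator:
    """Toy cluster front end that handles creation instructions (hypothetical)."""

    def __init__(self, namespace):
        self._namespace = dict(namespace)  # first identifier -> second identifier (assumed)
        self._indexes = {}                 # virtual file identifier -> index information
        self._counter = itertools.count(1)

    def create_virtual_file(self, file_list, return_index=False):
        # Build the index only; no file data is moved or merged.
        index_info = {first_id: self._namespace[first_id] for first_id in file_list}
        virtual_file_id = f"vfile-{next(self._counter)}"
        self._indexes[virtual_file_id] = index_info
        feedback = {"virtual_file_id": virtual_file_id}
        if return_index:
            # Optionally hand the index to the electronic device as well.
            feedback["index_info"] = dict(index_info)
        return feedback


creator = VirtualFileCreator({"a.jpg": "obj-1", "b.jpg": "obj-2", "c.jpg": "obj-3"})
feedback = creator.create_virtual_file(["a.jpg", "c.jpg"], return_index=True)
```

The `return_index` flag mirrors the two options in the text: the electronic device always receives the virtual file identifier and, in some examples, the index information too.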
In a possible implementation manner of the first aspect, the file list further includes a size of each of the plurality of files, and the first read request includes the size of the first file.
In a possible implementation manner of the first aspect, after the storage node cluster sends the first file to the electronic device based on the second identifier of the first file, the method further includes: the storage node cluster receives, from the electronic device, a second read request for a second file of the plurality of files, where the second read request is obtained by the electronic device according to the virtual file identifier and the first identifier of the second file in the electronic device; the storage node cluster obtains a second identifier of the second file in the storage node cluster according to the second read request; the storage node cluster sends the second file to the electronic device according to the second identifier of the second file; after the second file is sent, the storage node cluster receives a read stop instruction from the electronic device; and the storage node cluster stops acquiring any of the plurality of files according to the read stop instruction.
In this possible implementation manner, the interaction for creating the virtual file identifier needs to be performed only once between the electronic device and the storage node cluster before at least two of the plurality of files are read, and the interaction for ending the reading needs to be performed only once after those files have been read. The number of interactions in the whole process is thereby greatly reduced, which shortens the overall time and improves file reading efficiency.
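The saving can be illustrated with a simple count. The sketch assumes that every instruction together with its response counts as one interaction, that the conventional flow needs open, read, and close for each file, and that the virtual file flow needs one creation, one read per file, and one read stop; the function names are hypothetical.

```python
def conventional_interactions(num_files):
    """Open + read + close for every file."""
    return 3 * num_files


def virtual_file_interactions(num_files):
    """One creation, one read per file, one read stop."""
    return 1 + num_files + 1


num_files = 1000
saved = conventional_interactions(num_files) - virtual_file_interactions(num_files)
```

Under these assumptions the difference is 2N - 2 interactions for N files, so the benefit appears as soon as at least two files are read and grows linearly with the number of files.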
The second aspect of the present application provides a file reading method, which includes: an electronic device sends a first read request for a first file among a plurality of files to a storage node cluster, where the first read request is obtained according to a virtual file identifier and a first identifier of the first file in the electronic device, and the virtual file identifier identifies the plurality of files as a whole; the electronic device receives the first file from the storage node cluster, where the first file is obtained by the storage node cluster according to a second identifier of the first file in the storage node cluster, and the second identifier of the first file is obtained by the storage node cluster according to the first read request.
In a possible implementation manner of the second aspect, the method further includes: the electronic device queries the second identifier of the first file from index information corresponding to the virtual file identifier according to the first identifier of the first file, where the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster; the electronic device sending a first read request for a first file of the plurality of files to the storage node cluster comprises: the electronic device sends a first read request carrying the second identifier of the first file to the storage node cluster.
In a possible implementation manner of the second aspect, the storage node cluster includes a plurality of storage nodes; the electronic device sending a first read request carrying the second identifier of the first file to the storage node cluster comprises: the electronic device sends the first read request carrying the second identifier of the first file to a target storage node among the plurality of storage nodes, where the target storage node is the storage node storing the first file; the electronic device receiving the first file from the storage node cluster comprises: the electronic device receives the first file from the target storage node.
In one possible implementation manner of the second aspect, the electronic device sending, to the storage node cluster, a first read request for a first file of the plurality of files includes: the electronic device sends a first read request carrying the virtual file identifier and the first identifier of the first file to the storage node cluster, so as to instruct the storage node cluster to query the second identifier of the first file from index information corresponding to the virtual file identifier based on the first identifier of the first file, where the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
In a possible implementation manner of the second aspect, before the electronic device sends the first read request for the first file of the plurality of files to the storage node cluster, the method further includes: the electronic device sends a creation instruction to the storage node cluster, where the creation instruction instructs the storage node cluster to create a virtual file identifier that identifies the plurality of files as a whole; the electronic device receives feedback information from the storage node cluster, where the feedback information includes the virtual file identifier.
In a possible implementation manner of the second aspect, the creation instruction carries information of a file list, and the file list includes the first identifier of each of the plurality of files in the electronic device; the creation instruction further instructs the storage node cluster to generate index information corresponding to the virtual file identifier, so that the electronic device or the storage node cluster can obtain the second identifier of the first file according to the index information and the first identifier of the first file, where the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
In a possible implementation manner of the second aspect, after the electronic device receives the first file from the storage node cluster, the method further includes: the electronic device sends a second read request for a second file of the plurality of files to the storage node cluster, where the second read request is obtained according to the virtual file identifier and the first identifier of the second file in the electronic device; the electronic device receives the second file from the storage node cluster, where the second file is obtained by the storage node cluster according to a second identifier of the second file in the storage node cluster, and the second identifier of the second file is obtained by the storage node cluster according to the second read request; after receiving the second file, the electronic device sends a read stop instruction to the storage node cluster, where the read stop instruction indicates that the electronic device stops reading any of the plurality of files.
A third aspect of the present application provides a file reading apparatus having the functionality to implement the method of the first aspect or any one of the possible implementations of the first aspect. The functionality may be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, for example a receiving module, a processing module and a sending module.
A fourth aspect of the application provides a storage node cluster comprising one or more storage nodes, any storage node comprising at least one processor, memory and computer-executable instructions stored in the memory and executable on the processor, the processor performing the method as described above in the first aspect or any one of the possible implementations of the first aspect when the computer-executable instructions are executed by the processor.
A fifth aspect of the application provides a computer readable storage medium storing one or more computer executable instructions which, when executed by a processor, perform the method of the first aspect or any one of the possible implementations of the first aspect.
A sixth aspect of the application provides a computer program product storing one or more computer-executable instructions which, when executed by a processor, perform the method of the first aspect or any one of the possible implementations of the first aspect.
A seventh aspect of the application provides a chip system comprising a processor for supporting a cluster of storage nodes to implement the functionality referred to in the first aspect or any one of the possible implementations of the first aspect. In one possible design, the chip system may further include a memory to hold the necessary program instructions and data. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
An eighth aspect of the present application provides a file reading apparatus having the functionality to implement the method of the second aspect or any one of the possible implementations of the second aspect. The functionality may be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above, for example a sending module, a receiving module and a query module.
A ninth aspect of the application provides an electronic device comprising at least one processor, a memory and computer-executable instructions stored in the memory and executable on the processor, the processor performing the method as described above for the second aspect or any one of the possible implementations of the second aspect when the computer-executable instructions are executed by the processor.
A tenth aspect of the application provides a computer readable storage medium storing one or more computer executable instructions which, when executed by a processor, perform a method as described above in the second aspect or any one of the possible implementations of the second aspect.
An eleventh aspect of the application provides a computer program product storing one or more computer-executable instructions which, when executed by a processor, perform a method as described above in the second aspect or any one of the possible implementations of the second aspect.
A twelfth aspect of the application provides a chip system comprising a processor for supporting an electronic device to implement the functionality referred to in the second aspect or any one of the possible implementations of the second aspect. In one possible design, the chip system may further include a memory to hold the necessary program instructions and data. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
For the technical effects of the third aspect to the twelfth aspect and of any of their possible implementations, reference may be made to the technical effects of the first aspect or of the corresponding possible implementations of the first aspect; they are not repeated here.
Drawings
FIG. 1 is a schematic diagram of an interaction between an electronic device and a storage device during mass file reading in the conventional art;
FIG. 2 is an exemplary schematic diagram of a communication system provided by an embodiment of the present application;
FIG. 3 is an exemplary schematic diagram illustrating a processing stage of a file reading method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an embodiment of a file reading method according to the present application;
FIG. 5 is a schematic diagram of an embodiment of a file reading method according to the present application;
FIG. 6 is a schematic diagram of an embodiment of a file reading method according to the present application;
FIG. 7a is an exemplary diagram illustrating interactions between an electronic device and a cluster of storage nodes provided by an embodiment of the application;
FIG. 7b is another exemplary diagram illustrating interactions between an electronic device and a cluster of storage nodes provided by an embodiment of the application;
FIG. 8 is a schematic diagram of an embodiment of a file reading apparatus according to the present application;
FIG. 9 is a schematic diagram of another embodiment of a file reading apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a storage node cluster according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings. As a person of ordinary skill in the art will appreciate, with the development of technology and the emergence of new scenarios, the technical solutions provided by the embodiments of the application are also applicable to similar technical problems.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of single or plural items. The terms "first", "second" and the like in the description, in the claims and in the above-described figures are used to distinguish between similar objects and do not necessarily describe a particular order or sequence. It is to be understood that terms so used are interchangeable under appropriate circumstances; they are merely a way of distinguishing objects having the same attributes when the embodiments of the application are described. Furthermore, the terms "comprises", "comprising" and "having", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
With the continued development of computer technology, access to massive files by electronic devices is often involved in a scenario such as training a machine learning model.
For example, in training a machine learning model, an electronic device performing training typically needs to read a massive number of files stored in a storage device as training samples. In the process of reading mass files by the electronic device, frequent interaction operation needs to be executed between the electronic device and the storage device for storing the files.
For example, as shown in FIG. 1, for each file to be read (e.g., File 1 to File N), the electronic device first needs to send an open instruction to the storage device to request the real handle of the file in the storage device. After obtaining the real handle fed back by the storage device, the electronic device may send a read instruction to the storage device according to that handle, so as to request to read the file. After the electronic device has read the file from the storage device, it needs to send a close instruction to the storage device to instruct the storage device to close the opened file.
It can be seen that for each file to be read, the above-described multiple interactions between the electronic device and the storage device need to be performed.
In many scenarios, each of the massive files is small, so the read instruction itself usually completes quickly. Over the whole interaction process, however, a long time is spent on the accompanying interactions for each file, such as the open instruction and the close instruction. As a result, the overall process is time-consuming, file reading efficiency is low, and the resource overhead of the frequent interactions is high.
The embodiment of the application provides a file reading method, which can solve the problems of long time consumption and low file reading efficiency in the whole data reading process caused by frequent interaction operation when the electronic equipment reads a plurality of files from the storage equipment.
The file reading method of the embodiment of the application can be applied to a communication system.
As shown in fig. 2, the communication system may include an electronic device and a cluster of storage nodes.
The electronic device may be a terminal device, a single server or a server cluster, or may be a Virtual Machine (VM) or a container (container).
The type of terminal device is not limited herein.
By way of example, the terminal device may be one or a combination of a mobile phone, a tablet (pad), a computer with a wireless transceiving function, a virtual reality (VR) terminal, an augmented reality (AR) terminal, a terminal in industrial control, a terminal in self driving, a terminal in remote medical, a terminal in a smart grid, a terminal in transportation safety, a terminal in a smart city, a terminal in a smart home, a terminal in the internet of things (IoT), a wearable device, a robot, etc.
One or more storage nodes may be included in a storage node cluster.
The particular type and structure of the storage node cluster is also not limited herein.
In one example, the storage node cluster is a distributed file storage system. In another example, the storage node cluster is a file storage array, e.g., the storage node cluster may be a type of storage device such as a network attached storage (network attached storage, NAS) device.
The specific type of each storage node may also be varied and is not limited in this regard. Each storage node includes, but is not limited to, a storage function. The function of each storage node may be the same or may be different. By way of example, the plurality of storage nodes may include backup nodes, so as to implement backup redundancy of some or all files in the storage node cluster, and improve the capability of continuously providing file services in the node failure scenario.
Also, multiple storage nodes in a storage node cluster may be located at the same physical location, e.g., in the same data center; or may be located at different physical locations, e.g., there are any two storage nodes located in different data centers.
The electronic device is capable of communicating with the storage node cluster to read files from the storage node cluster.
The storage node cluster may store, but is not limited to storing, a plurality of files. Any one of the plurality of files may also be referred to as a computer file, which is a segment of a data stream stored in the storage node cluster. The type of each of the plurality of files is not limited herein. The type of any of the plurality of files may be, for example, a document, an image, video, text, or voice, although other types of data are also possible. The plurality of files may be only some of the files in the storage node cluster and need not include all the files in the storage node cluster.
As shown in fig. 3, based on the above communication system, the file reading method in the embodiment of the present application mainly includes the following three stages: virtual file identification creation phase, file reading phase and reading end phase.
The above three phases are each described exemplarily below.
1. Virtual file identification creation phase
In the embodiment of the application, a virtual file identifier can be created to describe the plurality of files as a whole. The electronic device can subsequently read any one of the plurality of files according to the virtual file identifier, without interacting with the storage node cluster multiple times to separately request information, such as the handle of each file in the storage node cluster, before reading the corresponding file.
Specifically, as shown in FIG. 4, in some embodiments, the file reading method may include steps 401-405.
In step 401, the electronic device sends a creation instruction to a storage node cluster.
The creation instruction is for instructing the storage node cluster to create a virtual file identification identifying an entirety of the plurality of files.
In the embodiment of the application, the electronic equipment can instruct the storage node cluster to create the virtual file identifier for identifying the whole of the plurality of files through the creation instruction.
The creation instruction may not include the virtual file identifier, but may include information of the plurality of files, so that the storage node cluster can determine which files are to be identified as a whole and generate the virtual file identifier according to the creation instruction.
In one embodiment, the creation instruction carries information of a file list, wherein the file list comprises a first identification of each file in the plurality of files in the electronic device;
the creation instruction is further used for instructing the storage node cluster to generate index information corresponding to the virtual file identifications, wherein the index information comprises an index from a first identification of each file in the electronic device to a second identification of the corresponding file in the storage node cluster.
In this way, the electronic device may instruct the storage node cluster to create the virtual file identifier and index information corresponding to the virtual file identifier through the creation instruction.
Wherein the index information includes an index from a first identification of each file in the electronic device to a second identification of the corresponding file in the cluster of storage nodes. Thus, through the index information, the second identifier of each file in the storage node cluster can be queried according to the first identifier of the corresponding file in the electronic device. The indexing algorithm used for the indexing information is not limited herein, and the indexing information is illustratively created by means of a key-value (key-value) index, where key is a first identifier of a file in the electronic device and value is a second identifier of the file in the storage node cluster.
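The key-value index described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the application: the class name `IndexInfo`, the path strings used as first identifiers, and the handle strings used as second identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IndexInfo:
    """Index information tied to one virtual file identifier (hypothetical layout)."""
    virtual_file_id: str
    # key: first identifier (e.g. path of the file in the electronic device)
    # value: second identifier (e.g. handle of the file in the storage node cluster)
    entries: Dict[str, str] = field(default_factory=dict)

    def add(self, first_id: str, second_id: str) -> None:
        self.entries[first_id] = second_id

    def lookup(self, first_id: str) -> Optional[str]:
        """Return the second identifier for a first identifier, or None if absent."""
        return self.entries.get(first_id)


# Hypothetical identifiers, for illustration only.
index = IndexInfo(virtual_file_id="vfile-0001")
index.add("/dataset/cat/0001.jpg", "handle-a1")
index.add("/dataset/dog/0002.jpg", "handle-b7")
```

A dictionary is used here only because it is the simplest key-value structure; the text leaves the indexing algorithm open.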
In step 402, a storage node cluster receives a create instruction from an electronic device.
In step 403, the storage node cluster creates a virtual file identifier that identifies the entirety of the plurality of files according to the creation instruction.
In the embodiment of the application, the virtual file identifier identifies the plurality of files as a whole, but the files do not need to be actually merged into a real large file, so the storage node cluster does not need to perform operations such as data movement and merging on the files when creating the virtual file identifier. On the electronic device side, the virtual file identifier describes the plurality of files as a whole. That is, in the electronic device, the virtual file identifier may be regarded as the identifier of one file that contains all of the plurality of files, whereas in the storage node cluster the plurality of files do not need to be actually merged. Thus, the virtual file identifier may be regarded as describing a virtual large file, rather than a large file that really exists in the storage node cluster and is obtained by moving and merging the individual files.
In addition, in some examples, the creation instruction may further include information of the plurality of files, and the storage node cluster may generate index information corresponding to the virtual file identifier according to the creation instruction, so that subsequent file reading operations can be implemented through the index information.
Specifically, in some embodiments, the creation instruction carries information of a file list, where the file list includes a first identification of each of the plurality of files in the electronic device;
The creation instruction is further used for instructing the storage node cluster to generate index information corresponding to the virtual file identifications, so that the electronic device or the storage node cluster obtains a second identification of the first file according to the index information and the first identification of the first file, and the index information comprises an index from the first identification of each file in the electronic device to the second identification of the corresponding file in the storage node cluster.
Wherein the first identification of the first file in the electronic device may uniquely identify the first file in the electronic device. For example, the first identification of the first file in the electronic device may include a file name of the first file in the electronic device and path information of the first file.
The second identification of the first file in the storage node cluster may uniquely identify the first file in the storage node cluster. For example, the second identifier of the first file in the storage node cluster may be a handle of the first file in the storage node cluster, through which the storage node cluster may find the first file in the storage node cluster.
In some embodiments, the file list further includes a size of each of the plurality of files.
In this way, the storage node cluster can determine the size of each file, so that in the subsequent file reading process, the file is read according to the size of the file.
The specific creation process of the index information and the virtual file identification is described below by way of a specific example.
In this example, a training process for a machine learning model is involved.
In particular, training of the machine learning model may be performed on the electronic device.
In the training process, the machine learning model needs to be trained through the training data set. The training dataset may include a large number of training samples and a label corresponding to each training sample. At this time, all training samples in the training data set for training may be used as a plurality of files in the embodiment of the present application, where any training sample may be used as any file in the plurality of files.
In many scenarios, to facilitate training, the electronic device may obtain a file list that includes information about all training samples in the training data set, for example, one or more of the name, serial number, path (path), label (label), and size (size) of each training sample.
The format of the file list may be a preset format, so that the storage node cluster can extract information in the file list. The specific generation mode of the file list can be various.
For example, in one example, the electronic device may obtain, from a source such as the provider of the training data set, a file manifest covering all training samples in the training data set. The file manifest may include the path, label (label), and size (size) of each training sample. The path may be a path generated to distinguish training samples under different labels in the training data set.
For example, the content of the file list of the training data set acquired in the electronic device is as follows:
After the electronic device obtains the file manifest, the electronic device may read the file manifest through an initialization (init) function, so as to parse the content of the file manifest and generate a file list in a preset format.
At this time, the file list may include one or more of information such as the name, serial number, path (path), label (label), and size (size) of each training sample.
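As a rough sketch of how an initialization function might turn a file manifest into a file list, consider the following. The manifest layout is an assumption made purely for illustration (one sample per line, tab-separated path, label, and size in bytes); the function name `init_file_list`, the record keys, and the sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List


def init_file_list(manifest_text: str) -> List[Dict[str, object]]:
    """Parse a manifest into a file list with one record per training sample.

    Assumed (hypothetical) manifest layout: one sample per line,
    tab-separated columns: path, label, size in bytes.
    """
    file_list: List[Dict[str, object]] = []
    reader = csv.reader(io.StringIO(manifest_text), delimiter="\t")
    for row in reader:
        if not row:
            continue  # skip blank lines
        path, label, size = row
        file_list.append({
            "serial": len(file_list),           # serial number within the list
            "name": path.rsplit("/", 1)[-1],    # file name taken from the path
            "path": path,
            "label": label,
            "size": int(size),
        })
    return file_list


# Hypothetical manifest content, for illustration only.
sample_manifest = "train/cat/0001.jpg\tcat\t20480\ntrain/dog/0002.jpg\tdog\t18432\n"
file_list = init_file_list(sample_manifest)
```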
In another example, the file list may be generated by another device and sent to the storage node cluster, and information capable of identifying the file list, such as its name or number, may also be sent to the electronic device.
In this way, after receiving a creation instruction that is sent by the electronic device and carries a file list, the storage node cluster may create, according to the creation instruction, index information and a virtual file identifier that identifies the plurality of files as a whole, and may associate the virtual file identifier with the index information. For example, the virtual file identifier may be added to a header of the index information to indicate that the index information corresponds to the virtual file identifier.
The index information comprises an index from a first identifier such as a path or a name of each file in the electronic device to a second identifier such as a handle of the corresponding file in the storage node cluster. Thus, through the index information, the handle of the corresponding file in the storage node cluster can be queried according to the path or name and other information of each file in the electronic equipment, so that the corresponding file can be acquired from the storage node cluster according to the handle.
After creating the virtual file identification and index information, the storage node cluster may store the virtual file identification and index information.
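The cluster-side handling of a creation instruction can be sketched as follows. This is a toy in-memory model, not the application's implementation: the class `StorageNodeCluster`, the method names, the use of a random UUID as the virtual file identifier, and the assumption that the cluster can resolve each first identifier to a handle through a lookup table are all hypothetical.

```python
import uuid
from typing import Dict, List


class StorageNodeCluster:
    """Toy in-memory stand-in for a storage node cluster (hypothetical)."""

    def __init__(self, handles_by_path: Dict[str, str]):
        # Assumption: the cluster can map each first identifier (a path) to a handle.
        self._handles_by_path = handles_by_path
        # virtual file identifier -> index information
        self._index_by_vfid: Dict[str, Dict[str, str]] = {}

    def handle_create(self, file_list: List[str]) -> str:
        """Create a virtual file identifier and its index information.

        Only identifiers are recorded; no file data is moved or merged.
        """
        vfid = uuid.uuid4().hex
        index = {path: self._handles_by_path[path] for path in file_list}
        self._index_by_vfid[vfid] = index  # associate and store
        return vfid  # sent back to the electronic device as feedback information

    def index_for(self, vfid: str) -> Dict[str, str]:
        return self._index_by_vfid[vfid]


# Hypothetical files and handles, for illustration only.
cluster = StorageNodeCluster({"a.jpg": "h1", "b.jpg": "h2", "c.jpg": "h3"})
vfid = cluster.handle_create(["a.jpg", "b.jpg"])
```

Here the file list covers only two of the three stored files, reflecting that the plurality of files may be a subset of the files in the cluster.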
In the subsequent file reading phase, the index information enables the electronic device or the storage node cluster to obtain the second identifier of the first file according to the index information and the first identifier of the first file.
The specific manner in which the subsequent file reading is achieved by means of the virtual file identification and the index information may be referred to in the specific description of the relevant embodiments of the file reading phase.
Or in another example, the storage node cluster may obtain the file list through other means (e.g., receive the file list from other devices), and the creation instruction may include a name of the file list, so that the storage node cluster may create a virtual file identification for identifying the entirety of the plurality of files.
In step 404, the storage node cluster sends a virtual file identification to the electronic device.
At step 405, the electronic device receives feedback information from the storage node cluster.
The feedback information includes a virtual file identification.
In some examples, the feedback information further includes index information corresponding to the virtual file identification.
It can be seen that, through the virtual file identifier creation phase, the storage node cluster may obtain and store the virtual file identifier and index information corresponding to the virtual file identifier, while the electronic device may obtain the virtual file identifier, and in some examples, may also obtain index information corresponding to the virtual file identifier.
Then, according to the virtual file identifier and index information corresponding to the virtual file identifier, file reading operation on at least one file in the plurality of files can be realized.
By way of example, the implementation of the file reading phase is described below.
2. File reading phase
Specifically, as shown in FIG. 5, in some embodiments, the file reading method may include steps 406-4010.
In step 406, the electronic device sends a first read request for a first file of the plurality of files to the storage node cluster.
The first read request is obtained from a virtual file identifier, which is an identifier of the entirety of the plurality of files, and a first identifier of the first file in the electronic device.
In the embodiment of the present application, the plurality of files may be stored in the storage node cluster, so the electronic device may request, through the first read request, to read a first file in the plurality of files from the storage node cluster.
In the embodiment of the present application, there may be multiple cases of information carried in the first read request.
In the following, an exemplary description will be given of a processing procedure corresponding to different cases of information carried in the first read request.
(1) The first read request carries a second identification of the first file.
Specifically, in some embodiments, the method further comprises:
The electronic equipment inquires a second identifier of the first file from index information corresponding to the virtual file identifier according to the first identifier of the first file, wherein the index information comprises an index from the first identifier of each file in the electronic equipment to the second identifier of the corresponding file in the storage node cluster;
The electronic device sending a first read request for a first file of the plurality of files to the storage node cluster, comprising:
the electronic device sends a first read request carrying a second identification of the first file to the cluster of storage nodes.
At this time, in the virtual file identification creation phase, the storage node cluster may send the virtual file identifier and the index information corresponding to the virtual file identifier to the electronic device, so that the electronic device may query the second identifier of the first file in the storage node cluster through the index information and the first identifier of the first file in the electronic device.
(2) The first read request may carry the virtual file identifier and a first identifier of the first file in the electronic device.
Specifically, in some embodiments, the electronic device sending a first read request for a first file of the plurality of files to the cluster of storage nodes includes:
The electronic device sends a first reading request carrying a virtual file identifier and a first identifier of a first file to the storage node cluster, so that the storage node cluster is instructed to inquire a second identifier of the first file from index information corresponding to the virtual file identifier based on the first identifier of the first file, and the index information comprises an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
The first read request carries the virtual file identifier and the first identifier of the first file, so that after the storage node cluster receives the first read request, index information corresponding to the virtual file identifier is determined according to the virtual file identifier, and the second identifier of the first file in the storage node cluster is queried from the index information according to the first identifier of the first file.
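The two cases of information carried in the first read request can be sketched on the electronic device side as follows. The message class `ReadRequest`, its field names, and the function `build_read_request` are hypothetical; the application does not define a message format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ReadRequest:
    """Hypothetical read-request message; one of the two variants is filled in."""
    second_id: Optional[str] = None        # case (1)
    virtual_file_id: Optional[str] = None  # case (2)
    first_id: Optional[str] = None         # case (2)


def build_read_request(virtual_file_id: str, first_id: str,
                       local_index: Optional[Dict[str, str]] = None) -> ReadRequest:
    """Build a read request from the virtual file identifier and a first identifier."""
    if local_index is not None:
        # Case (1): the device holds the index information for this virtual file
        # identifier and resolves the second identifier itself.
        return ReadRequest(second_id=local_index[first_id])
    # Case (2): the device holds only the virtual file identifier;
    # the storage node cluster resolves the second identifier.
    return ReadRequest(virtual_file_id=virtual_file_id, first_id=first_id)


req1 = build_read_request("vf-1", "a.jpg", local_index={"a.jpg": "h1"})
req2 = build_read_request("vf-1", "a.jpg")
```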
In step 407, the storage node cluster receives a first read request from the electronic device for a first file of the plurality of files.
In step 408, the storage node cluster obtains a second identifier of the first file in the storage node cluster according to the first read request.
In some examples, the first read request carries the second identifier of the first file in the storage node cluster, and thus the storage node cluster may directly obtain the second identifier of the first file in the storage node cluster from the first read request.
In yet other examples, the first read request carries the virtual file identification and the first identification of the first file. At this time, after the storage node cluster receives the first read request, index information corresponding to the virtual file identifier is determined according to the virtual file identifier, and according to the first identifier of the first file, a second identifier of the first file in the storage node cluster is queried from the index information.
Step 409, the storage node cluster sends the first file to the electronic device according to the second identifier of the first file.
Because the second identifier of the first file in the storage node cluster may uniquely identify the first file in the storage node cluster, for example, the second identifier of the first file may be a handle of the first file in the storage node cluster, the storage node cluster may determine a storage location of the first file in the storage node cluster according to the second identifier of the first file, thereby acquiring the first file and transmitting the first file to the electronic device.
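Steps 407 to 409 on the storage node cluster side can be sketched as follows. The dictionary-based request format, the function names, and the in-memory tables are hypothetical stand-ins chosen for brevity.

```python
from typing import Dict


def resolve_second_id(request: Dict[str, str],
                      index_by_vfid: Dict[str, Dict[str, str]]) -> str:
    """Step 408: obtain the second identifier from a read request."""
    if "second_id" in request:
        # The request already carries the second identifier.
        return request["second_id"]
    # Otherwise find the index information by virtual file identifier,
    # then look up the second identifier by first identifier.
    index = index_by_vfid[request["virtual_file_id"]]
    return index[request["first_id"]]


def handle_read(request: Dict[str, str],
                index_by_vfid: Dict[str, Dict[str, str]],
                data_by_handle: Dict[str, bytes]) -> bytes:
    """Steps 407-409: resolve the handle, then fetch the file content by handle."""
    return data_by_handle[resolve_second_id(request, index_by_vfid)]


# Hypothetical cluster state, for illustration only.
index_by_vfid = {"vf-1": {"a.jpg": "h1", "b.jpg": "h2"}}
data_by_handle = {"h1": b"AAA", "h2": b"BBB"}

direct = handle_read({"second_id": "h2"}, index_by_vfid, data_by_handle)
indexed = handle_read({"virtual_file_id": "vf-1", "first_id": "a.jpg"},
                      index_by_vfid, data_by_handle)
```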
In step 4010, the electronic device receives a first file from a cluster of storage nodes.
In the embodiment of the application, the electronic device obtains the virtual file identifier in advance and also holds the first identifier of each of the plurality of files (for example, the first file). Therefore, when any one of the plurality of files needs to be read, the electronic device can send a read instruction for that file to the storage node cluster according to the virtual file identifier and the first identifier of the file in the electronic device, and receive the corresponding file, without performing interaction operations such as sending an open instruction to the storage node cluster for each file.
It can be seen that in the embodiment of the present application, not only the first file of the plurality of files but also a second file of the plurality of files may be read from the storage node cluster according to the virtual file identifier and the first identifier of each file in the electronic device.
Specifically, in some embodiments, as shown in FIG. 6, after the electronic device receives the first file from the storage node cluster, the method further comprises steps 4011-4015.
In step 4011, the electronic device sends a second read request for a second file of the plurality of files to the storage node cluster.
The second read request is derived from the virtual file identification and the first identification of the second file in the electronic device.
Wherein the first identification of the second file in the electronic device may uniquely identify the second file in the electronic device. For example, the first identification of the second file in the electronic device may include a file name of the second file in the electronic device and path information of the second file.
The second identification of the second file in the storage node cluster may uniquely identify the second file in the storage node cluster. For example, the second identifier of the second file in the storage node cluster may be a handle of the second file in the storage node cluster, through which the storage node cluster may find the second file in the storage node cluster.
In step 4012, the storage node cluster receives a second read request from the electronic device for a second file of the plurality of files.
Step 4013, the storage node cluster obtains a second identifier of the second file in the storage node cluster according to the second read request.
In some examples, the second read request carries a second identifier of the second file in the storage node cluster, and thus the storage node cluster may directly obtain the second identifier of the second file in the storage node cluster from the second read request.
In yet other examples, the second read request carries the virtual file identification and the first identification of the second file. And because the second file is contained in the plurality of files, the second identification of the second file can be queried from index information corresponding to the virtual file identification according to the first identification of the second file.
At this time, after the storage node cluster receives the second read request, index information corresponding to the virtual file identifier is determined according to the virtual file identifier, and according to the first identifier of the second file, the second identifier of the second file in the storage node cluster is queried from the index information.
In step 4014, the storage node cluster sends the second file to the electronic device according to the second identification of the second file.
In step 4015, the electronic device receives a second file from the storage node cluster.
In the embodiment of the application, the electronic device can send the read instruction for the second file to the storage node cluster and receive the second file according to the virtual file identifier and the first identifier of the second file in the electronic device. Interaction operations such as sending an open instruction to the storage node cluster for each file are therefore not required, which reduces the time delay caused by frequent interaction operations and improves the data reading efficiency.
The number of second files is not limited herein. Based on the same or similar steps as the file reading operation described above, the electronic device may send the related reading instruction of any one of the plurality of files to the storage node cluster multiple times, and read the corresponding file from the storage node cluster.
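Reading several files with the same virtual file identifier can be sketched as a simple loop on the electronic device. The function `read_files`, the dictionary request format, and the `fake_send_read` stand-in for the network call are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def read_files(send_read: Callable[[Dict[str, str]], bytes],
               virtual_file_id: str,
               first_ids: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (first identifier, content) for each requested file.

    One read request is sent per file; no per-file open or close
    instruction is sent.
    """
    for first_id in first_ids:
        request = {"virtual_file_id": virtual_file_id, "first_id": first_id}
        yield first_id, send_read(request)


# Stand-in for the call to the storage node cluster (hypothetical).
sent: List[Dict[str, str]] = []


def fake_send_read(request: Dict[str, str]) -> bytes:
    sent.append(request)
    return ("data:" + request["first_id"]).encode()


results = dict(read_files(fake_send_read, "vf-1", ["a.jpg", "b.jpg", "c.jpg"]))
```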
3. Reading end phase
In some embodiments, as shown in fig. 6, after receiving the second file, the method further includes steps 4016-4017.
In step 4016, the electronic device sends a stop read command to the storage node cluster.
The stop reading instruction is used to indicate that the electronic device stops reading any file of the plurality of files.
In step 4017, the storage node cluster stops acquiring any one of the plurality of files according to the stop reading instruction.
In the embodiment of the present application, the specific content of the stop reading instruction may have various situations.
In one example, the stop read instruction may be specifically a delete instruction, where the delete instruction is used to instruct the storage node cluster to delete the virtual file identifier, and may also delete related resources such as a file list and/or index information. In this way, the storage node cluster may stop acquiring any one of the plurality of files according to the virtual file identifier after deleting the virtual file identifier and the related resource based on the deletion instruction.
In other examples, the stop reading instruction may be another form of instruction, such as a close instruction. By sending the close instruction, the electronic device instructs the storage node cluster to close the virtual file that describes the plurality of files as a whole. Although the storage node cluster does not actually perform operations such as data movement and merging on the plurality of files and does not generate a real large file containing the plurality of files, after receiving the close instruction the storage node cluster may stop acquiring any file of the plurality of files according to the virtual file identifier.
In other examples, the stop-reading instruction may be another form of instruction, and the specific content of the stop-reading instruction is not limited herein.
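The delete and close variants of the stop reading instruction can be sketched as cluster-side bookkeeping. The class `VirtualFileRegistry`, the mode strings, and the choice to keep the index information after a close instruction are assumptions made for illustration; the text does not specify what the cluster retains after a close.

```python
from typing import Dict, Set


class VirtualFileRegistry:
    """Hypothetical cluster-side bookkeeping for virtual file identifiers."""

    def __init__(self) -> None:
        self.index_by_vfid: Dict[str, Dict[str, str]] = {}
        self.closed: Set[str] = set()

    def stop_reading(self, vfid: str, mode: str) -> None:
        if mode == "delete":
            # Delete the identifier together with its related resources.
            self.index_by_vfid.pop(vfid, None)
            self.closed.discard(vfid)
        elif mode == "close":
            # Assumption: resources are kept, but further reads are refused.
            self.closed.add(vfid)
        else:
            raise ValueError(f"unsupported stop mode: {mode}")

    def can_read(self, vfid: str) -> bool:
        return vfid in self.index_by_vfid and vfid not in self.closed


registry = VirtualFileRegistry()
registry.index_by_vfid["vf-1"] = {"a.jpg": "h1"}
registry.index_by_vfid["vf-2"] = {"b.jpg": "h2"}
registry.stop_reading("vf-1", "close")
registry.stop_reading("vf-2", "delete")
```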
In the embodiment of the application, before at least two files in the plurality of files are read, the interaction operation for creating the virtual file identifier is only required to be executed once between the electronic equipment and the storage node cluster, and after the at least two files in the plurality of files are read, the interaction operation for ending the reading is only required to be executed once between the electronic equipment and the storage node cluster, so that the interaction times of the whole process are greatly reduced, the time consumption of the whole process is reduced, and the file reading efficiency is improved.
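The reduction in interactions can be made concrete with a small calculation. The sketch assumes that each instruction together with its response counts as one interaction, and the function names are hypothetical.

```python
def conventional_interactions(num_files: int) -> int:
    """One open, one read, and one close instruction for every file."""
    return 3 * num_files


def virtual_id_interactions(num_files: int) -> int:
    """One creation instruction, one read request per file, one stop reading instruction."""
    return 1 + num_files + 1


n = 1_000_000
saved = conventional_interactions(n) - virtual_id_interactions(n)
```

Under this counting, the two approaches need the same number of interactions for a single file, and the virtual file identifier approach needs fewer as soon as at least two files are read.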
In addition, in the embodiment of the present application, for different situations of information carried in the first read request, storage nodes connected to the electronic device in the storage node cluster may be different, and the following description is given by way of example.
(1) In one example, the first read request carries a second identification of the first file.
At this point, in some embodiments, the storage node cluster includes a plurality of storage nodes;
The electronic device sending a first read request carrying a second identification of a first file to the storage node cluster, comprising:
the electronic equipment sends a first reading request carrying a second identifier of a first file to a target storage node in a plurality of storage nodes, wherein the target storage node is a storage node storing the first file;
The electronic device receives a first file from a storage node cluster, comprising:
the electronic device receives a first file from a target storage node.
Fig. 7a is an exemplary schematic diagram illustrating interaction between an electronic device and a storage node cluster in an embodiment of the application.
In a practical scenario, when a plurality of storage nodes are included in a storage node cluster (e.g., the storage node cluster is a distributed storage cluster), any one of a plurality of files may be stored on any one of the storage nodes in the storage node cluster, and even different portions of the same file may be stored on different storage nodes.
In conventional technology, the storage node cluster typically communicates with the electronic device through a fixed storage node. The fixed storage node receives the first read request from the electronic device and then queries, through the index information, the storage node where the first file is located. In many cases, the storage node where the first file is located is different from the fixed storage node. Information therefore needs to be forwarded inside the storage node cluster so that the first file is obtained from the storage node where it is located, forwarded inside the storage node cluster, and then sent to the electronic device by the fixed storage node.
In this conventional information transmission process, a series of data forwarding operations need to be performed inside the storage node cluster, which results in longer time consumption and lower information processing efficiency.
In this example, however, the electronic device may receive the virtual file identification and index information from the storage node cluster after sending the creation instruction.
Then, the electronic device can query a second identifier of the first file in the storage node cluster from index information corresponding to the virtual file identifier according to the first identifier of the first file, so that a storage node where the first file is located can be determined according to the second identifier. For example, the second identifier of the first file in the storage node cluster is a handle of the first file in the storage node cluster, where the handle includes information of a storage node where the first file is located, so that the electronic device can determine the storage node where the first file is located.
After determining the storage node where the first file is located, the electronic device may determine the storage node where the first file is located as a target storage node, and directly send a first read request to the target storage node. In this way, the target storage node can directly send the first file stored in the target storage node to the electronic device without executing complex forwarding operation inside the storage node cluster, so that the interaction efficiency is improved, and the processing time is reduced.
Here, the electronic device sending the first read request directly to the target storage node means that, among the plurality of storage nodes of the storage node cluster, the electronic device communicates only with the target storage node to send the first read request and does not communicate with the other storage nodes. Other network devices, such as routers and switches, may still exist between the electronic device and the target storage node when the two communicate.
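The client-side lookup and direct send described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the application: the `Handle` structure, its `node_id` and `object_key` fields, and the in-memory `StorageNode` objects standing in for network-reachable storage nodes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handle:
    """Hypothetical second identifier: records which node stores the file."""
    node_id: str
    object_key: str


class StorageNode:
    """In-memory stand-in for one storage node (assumption for illustration)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.objects = {}

    def read(self, handle):
        # The node serves only what it stores locally; nothing is forwarded.
        return self.objects[handle.object_key]


def read_direct(index_info, nodes, first_id):
    """Resolve first identifier -> handle -> target node, then read from it."""
    handle = index_info[first_id]      # second identifier in the cluster
    target = nodes[handle.node_id]     # target storage node named in the handle
    return target.read(handle)


nodes = {"node-a": StorageNode("node-a"), "node-b": StorageNode("node-b")}
nodes["node-b"].objects["obj-17"] = b"cat image bytes"
index_info = {"cat/p01440764": Handle("node-b", "obj-17")}

data = read_direct(index_info, nodes, "cat/p01440764")
```

In this sketch only `node-b` is contacted, which mirrors the point that no forwarding inside the cluster is needed once the handle identifies the target storage node.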
(2) In one example, the first read request carries a virtual file identification and a first identification of the first file.
Fig. 7b shows an exemplary interaction diagram between the electronic device and the storage node cluster in this example.
In this example, after sending the creation instruction, the electronic device may receive a virtual file identification from the storage node cluster.
Then, the electronic device sends a first read request carrying the virtual file identifier and the first identifier of the first file to the storage node cluster. After receiving the first read request, the storage node cluster determines the index information corresponding to the virtual file identifier, queries the second identifier of the first file in the storage node cluster from the index information according to the first identifier of the first file, acquires the first file according to the second identifier, and sends the first file to the electronic device.
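The cluster-side resolution in this example can be sketched as follows. The class name `StorageCluster`, the `namespace` and `objects` dictionaries, and the integer virtual file identifiers are assumptions made for illustration; the application does not specify these structures.

```python
import itertools


class StorageCluster:
    """In-memory stand-in for a storage node cluster (illustrative only)."""

    def __init__(self, namespace, objects):
        self.namespace = namespace    # first identifier (path) -> second identifier
        self.objects = objects        # second identifier -> file content
        self.indexes = {}             # virtual file identifier -> index information
        self._ids = itertools.count(1)

    def create(self, file_list):
        """Creation instruction: build the index information once for the list."""
        vfid = next(self._ids)
        self.indexes[vfid] = {path: self.namespace[path] for path in file_list}
        return vfid

    def read(self, vfid, first_id):
        """Read request carrying the virtual file identifier and first identifier."""
        second_id = self.indexes[vfid][first_id]
        return self.objects[second_id]


cluster = StorageCluster(
    namespace={"cat/p01440764": "h-1", "dog/p01450111": "h-2"},
    objects={"h-1": b"cat", "h-2": b"dog"},
)
vfid = cluster.create(["cat/p01440764", "dog/p01450111"])
first = cluster.read(vfid, "cat/p01440764")
second = cluster.read(vfid, "dog/p01450111")
```

The electronic device holds only `vfid` and the paths here; each read is a single request with no per-file open or close step.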
Which storage node in the storage node cluster communicates with the electronic device is not limited here. In different interaction processes, for example, the interaction process related to the creation instruction, the interaction process related to the read request, and the interaction process related to the stop reading instruction, the storage node communicating with the electronic device may be the same storage node or may be different storage nodes.
The specific content and the specific application scenario of the plurality of files in any of the embodiments of the present application may have various situations, and in different application scenarios, the specific sources of the plurality of files may be different.
In the following, a process of reading a plurality of training samples in a training process of a machine learning model is taken as an example for description. It should be noted that any embodiment of the present application is not limited to the training scenario of the machine learning model, and may also be applied to other scenarios, and this example is merely illustrative, and not limiting of the embodiments of the present application.
In this example, training of the machine learning model needs to be performed in the electronic device.
The machine learning model in the electronic device is typically trained under an open-source machine learning framework such as PyTorch or TensorFlow.
During the training process, the electronic device needs to obtain training samples in the training data set from the storage node cluster in batches.
In particular, the training process may involve multiple rounds (epochs) of training, each round traversing the training data set once. Each round may include multiple batches, and the training of each batch requires acquiring a portion of the training samples in the training data set. For example, if the training data set includes ten thousand training samples and the label corresponding to each training sample, and each round includes 100 batches, then each batch needs to obtain 100 of the ten thousand training samples in the training data set for training.
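The batch arithmetic in this example can be checked with a few lines of Python. The variable and function names are chosen here for illustration, and contiguous batches are assumed only to keep the sketch simple; a real framework would usually shuffle the sample order.

```python
num_samples = 10_000          # training samples in the data set
batches_per_epoch = 100       # batches in one round (epoch)
batch_size = num_samples // batches_per_epoch


def batch_indices(batch_number):
    """Sample numbers read by one batch, assuming contiguous batches."""
    start = batch_number * batch_size
    return range(start, start + batch_size)


# One round reads every sample exactly once across all of its batches.
reads_per_epoch = sum(len(batch_indices(b)) for b in range(batches_per_epoch))
```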
Thus, during the training process, the machine learning framework may acquire training samples in batches and perform the training. In this example, a specified Vfile interface may be provided for the machine learning framework while the data reading manner of the machine learning framework is maintained, so that the machine learning framework may implement batch reading of multiple training samples through the Vfile interface, using any of the embodiments described above.
Specifically, in this example, all training samples in the training data set may be stored in the storage node cluster, and each training sample may be assigned to a different directory in the storage node cluster according to a label, and an exemplary storage manner of each training sample in the storage node cluster is as follows:
|--cat directory
|  |--p01440764
|  |--p01443537
|  |--p01444580
|  ……
|--dog directory
|  |--p01450111
|  |--p01450312
|  |--p01448912
|  ……
The electronic device may also obtain a file list covering all the training samples in the training data set.
The content of the file list of the training data set obtained in the electronic device is as follows:
After obtaining the file list, the electronic device may read it through an initialization (init) function, parse its content, and generate a file list in a preset format. The initialization function may also call the Vfile interface provided by this example to obtain the virtual file identifier according to the generated file list. Internally, the Vfile interface may cause the electronic device to send a creation instruction to the storage node cluster, instructing the storage node cluster to create a virtual file identifier that identifies the whole of the plurality of files, together with the index information corresponding to the virtual file identifier, and to feed the virtual file identifier back to the electronic device. Further, in some examples, the storage node cluster may also send the index information to the electronic device.
Exemplary codes for the Init function are as follows:
Here, a Vfile object can be created in the Init function through self.vfile = Vfile(img_paths) to obtain the virtual file identifier; the index information can also be obtained in this way.
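The listing for the Init function is not reproduced in this text. The following is a minimal sketch of what such an initializer might look like, under loud assumptions: the `Vfile` class below is only a placeholder (its real API is not given here), the file list line format `<path> <size> <label>` is invented for illustration, and `TrainingSet` and `parse_file_list` are hypothetical names.

```python
class Vfile:
    """Placeholder for the Vfile interface; not the real implementation."""

    def __init__(self, paths):
        # A real implementation would send the creation instruction here and
        # keep the virtual file identifier returned by the storage node cluster.
        self.paths = list(paths)
        self.virtual_file_id = "vfile-0"   # made-up value for illustration


def parse_file_list(lines):
    """Parse lines of the assumed form '<path> <size> <label>'."""
    entries = []
    for line in lines:
        path, size, label = line.split()
        entries.append((path, int(size), label))
    return entries


class TrainingSet:
    def __init__(self, file_list_lines):
        entries = parse_file_list(file_list_lines)
        self.img_paths = [path for path, _, _ in entries]
        self.sizes = [size for _, size, _ in entries]
        self.labels = [label for _, _, label in entries]
        # One creation call covers the whole training data set.
        self.vfile = Vfile(self.img_paths)

    def __len__(self):
        return len(self.img_paths)


dataset = TrainingSet(["cat/p01440764 2048 cat", "dog/p01450111 4096 dog"])
```

The point of the sketch is that the per-data-set setup happens once, in the initializer, rather than once per file.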
Then, during the training process, the electronic device may read the training samples through the getitem function, as required by the training.
When a training sample is read through the getitem function, the Vfile interface can be used to read the corresponding training sample according to the virtual file handle and the first identifier, such as the path or the file name, of the training sample to be read.
Exemplary code for the getitem function is as follows:
Here, index is the number of the training sample to be read, by which the electronic device can determine that training sample. According to the number, the path and the size of the corresponding training sample are obtained from the file list, and the training sample and its label are then read from the storage node cluster through the Vfile interface according to the path, the size, and the virtual file identifier.
For each training sample to be read, the getitem function may be called once, so that the training samples and their corresponding labels can be read in succession by calling the getitem function multiple times.
For example, if the training data set includes ten thousand training samples and the label corresponding to each training sample, and each round includes 100 batches, then each batch needs to obtain 100 of the ten thousand training samples for training. Each batch may then call the getitem function 100 times to read 100 training samples and their labels in turn, so that all ten thousand training samples and their labels are read through the getitem function in one round to complete one round of training.
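The listing for the getitem function is not reproduced in this text. A minimal sketch of the read path it describes might look as follows, where the `Vfile.read(path, size)` method, the `store` dictionary standing in for the storage node cluster, and the `TrainingSet` class are all assumptions made for illustration.

```python
class Vfile:
    """Placeholder for the Vfile interface; `read` is a hypothetical method."""

    def __init__(self, paths, store):
        self.virtual_file_id = "vfile-0"   # made-up value for illustration
        self._store = store                # in-memory stand-in for the cluster
        self.requests = 0

    def read(self, path, size):
        # One read request per sample; no per-file open or close.
        self.requests += 1
        return self._store[path][:size]


class TrainingSet:
    def __init__(self, entries, store):
        # entries: (path, size, label) tuples taken from the file list
        self.entries = entries
        self.img_paths = [entry[0] for entry in entries]
        self.vfile = Vfile(self.img_paths, store)

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, index):
        path, size, label = self.entries[index]
        sample = self.vfile.read(path, size)
        return sample, label


store = {"cat/p01440764": b"meow", "dog/p01450111": b"woof"}
entries = [("cat/p01440764", 4, "cat"), ("dog/p01450111", 4, "dog")]
dataset = TrainingSet(entries, store)
batch = [dataset[i] for i in range(len(dataset))]
```

Because the class exposes `__len__` and `__getitem__`, it follows the map-style data set pattern that frameworks such as PyTorch expect, which is consistent with the statement that the framework's own reading manner is kept.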
After the training is finished, the electronic device may delete the virtual file identifier by a delete (del) function, so that the electronic device and the storage node cluster may delete the virtual file identifier and the corresponding index information, and so on, thereby releasing the related resources.
Exemplary codes for del functions are as follows:
def __len__(self):
    return len(self.img_paths)
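The body of the delete function itself does not appear in this text. As a hedged sketch, release of the virtual file identifier could be arranged as follows; the `close` method, the `released` list used only to observe the release, and the class names are hypothetical. Since the moment at which Python runs `__del__` depends on the interpreter, the sketch also allows an explicit, idempotent `close`.

```python
class Vfile:
    """Placeholder for the Vfile interface; `close` is a hypothetical method."""

    def __init__(self, paths, released):
        self.paths = list(paths)
        self.virtual_file_id = "vfile-0"   # made-up value for illustration
        self._released = released          # records releases for this demo only
        self._closed = False

    def close(self):
        # A real implementation would notify the storage node cluster here so
        # that the virtual file identifier and index information are deleted.
        if not self._closed:
            self._closed = True
            self._released.append(self.virtual_file_id)


class TrainingSet:
    def __init__(self, img_paths, released):
        self.img_paths = img_paths
        self.vfile = Vfile(img_paths, released)

    def __del__(self):
        # Release the virtual file when the data set object is destroyed.
        self.vfile.close()


released = []
dataset = TrainingSet(["cat/p01440764"], released)
dataset.vfile.close()   # explicit release, independent of garbage collection
dataset.vfile.close()   # a second call is a no-op
```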
In this example, in the training scenario of the machine learning model, where a large number of training samples need to be read, only one Init function needs to be executed at the beginning to create a virtual file identifier corresponding to the training data set and corresponding index information, and then the training samples can be continuously read through getitem functions, so that frequent opening and closing operations on each file in the conventional technology are avoided, the time period required for reading each training sample in the training process is shortened, and the training efficiency of the machine learning model is greatly improved. In addition, in the process of reading the training samples, each training sample is not moved in the storage node cluster, but the whole mass training samples are accessed as a virtual large file, so that extra calculation and storage cost are not increased, the training sample reading mode of the machine learning framework is not changed, and the usability and the practicability are good.
In the above, the embodiment of the application introduces a file reading method from multiple aspects, and in the following, a file reading device applied to an electronic device and a file reading device applied to a storage node cluster in the embodiment of the application are described with reference to the accompanying drawings.
As shown in fig. 8, a file reading apparatus 80 applied to an electronic device includes:
A sending module 801, configured to send a first read request for a first file in the plurality of files to the storage node cluster, where the first read request is obtained according to a virtual file identifier and a first identifier of the first file in the electronic device, and the virtual file identifier is an identifier of the whole of the plurality of files;
The receiving module 802 is configured to receive a first file from a storage node cluster, where the first file is obtained by the storage node cluster according to a second identifier of the first file in the storage node cluster, and the second identifier of the first file is obtained by the storage node cluster according to a first read request.
Optionally, the apparatus 80 further comprises a query module 803;
The querying module 803 is configured to query, according to a first identifier of a first file, a second identifier of the first file from index information corresponding to the virtual file identifier, where the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster;
The sending module 801 is configured to send a first read request carrying a second identifier of a first file to a storage node cluster.
Optionally, the storage node cluster comprises a plurality of storage nodes;
The sending module 801 is configured to send a first read request carrying a second identifier of a first file to a target storage node of the plurality of storage nodes, where the target storage node is a storage node storing the first file;
The receiving module 802 is configured to receive a first file from a target storage node.
Optionally, the sending module 801 is configured to send a first read request carrying the virtual file identifier and the first identifier of the first file to the storage node cluster, so as to instruct the storage node cluster to query, based on the first identifier of the first file, the second identifier of the first file from index information corresponding to the virtual file identifier, where the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
Optionally, the sending module 801 is configured to send a creation instruction to the storage node cluster, where the creation instruction is configured to instruct the storage node cluster to create a virtual file identifier that identifies the whole plurality of files;
The receiving module 802 is configured to receive feedback information from the storage node cluster, where the feedback information includes a virtual file identifier.
Optionally, the creating instruction carries information of a file list, and the file list includes a first identifier of each file in the plurality of files in the electronic device;
The creation instruction is further used for instructing the storage node cluster to generate index information corresponding to the virtual file identifier, so that the electronic device or the storage node cluster obtains the second identifier of the first file according to the index information and the first identifier of the first file, and the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
Optionally, the file list further includes a size of each of the plurality of files, and the first read request includes the size of the first file.
Optionally, the sending module 801 is configured to send, to the storage node cluster, a second read request for a second file of the plurality of files, where the second read request is obtained according to the virtual file identifier and a first identifier of the second file in the electronic device;
The receiving module 802 is configured to receive a second file from the storage node cluster, where the second file is obtained by the storage node cluster according to a second identifier of the second file in the storage node cluster, and the second identifier of the second file is obtained by the storage node cluster according to a second read request;
The sending module 801 is configured to send a stop reading instruction to the storage node cluster after the second file is received, where the stop reading instruction is used to indicate that the electronic device stops reading any file of the plurality of files.
As shown in fig. 9, a file reading apparatus 90 applied to a storage node cluster includes:
The receiving module 901 is configured to receive a first read request from an electronic device for a first file in a plurality of files, where the first read request is obtained by the electronic device according to a virtual file identifier and a first identifier of the first file in the electronic device, and the virtual file identifier is an identifier of the whole of the plurality of files;
A processing module 902, configured to obtain, according to the first read request, a second identifier of the first file in the storage node cluster;
The sending module 903 is configured to send the first file to the electronic device according to the second identifier of the first file.
Optionally, the first read request includes a second identifier of the first file, where the second identifier of the first file is queried by the electronic device from index information corresponding to the virtual file identifier according to the first identifier of the first file, and the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
Optionally, the storage node cluster comprises a plurality of storage nodes;
The receiving module 901 is configured to receive, through a target storage node of the plurality of storage nodes, a first read request carrying a second identifier of a first file, where the target storage node is a storage node storing the first file;
the sending module 903 is configured to send, by the target storage node, the first file to the electronic device based on the second identifier of the first file.
Optionally, the first read request carries the virtual file identifier and the first identifier of the first file;
The processing module 902 is configured to query, based on the first identifier of the first file, the second identifier of the first file from index information corresponding to the virtual file identifier, where the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
Optionally, the receiving module 901 is configured to receive a creation instruction from an electronic device;
the processing module 902 is configured to create a virtual file identifier that identifies the entirety of the plurality of files according to the creation instruction;
The sending module 903 is configured to send the virtual file identifier to the electronic device.
Optionally, the creating instruction carries a file list, and the file list includes a first identifier of each file in the plurality of files in the electronic device;
The processing module 902 is configured to generate index information corresponding to the virtual file identifier according to the creation instruction, so that the electronic device or the storage node cluster obtains the second identifier of the first file according to the index information and the first identifier of the first file, where the index information includes an index from the first identifier of each file in the electronic device to the second identifier of the corresponding file in the storage node cluster.
Optionally, the file list further includes a size of each of the plurality of files, and the first read request includes the size of the first file.
Optionally, the receiving module 901 is configured to receive a second read request from the electronic device for a second file of the plurality of files, where the second read request is obtained by the electronic device according to the virtual file identifier and the first identifier of the second file in the electronic device;
The processing module 902 is configured to obtain, according to the second read request, a second identifier of the second file in the storage node cluster;
the sending module 903 is configured to send the second file to the electronic device according to the second identifier of the second file;
The receiving module 901 is configured to receive a stop reading instruction from the electronic device after sending the second file;
the processing module 902 is configured to stop acquiring any file of the plurality of files according to the stop reading instruction.
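The division of work among the receiving, processing, and sending modules, including the handling of the stop reading instruction, can be illustrated roughly as follows. The class `ClusterReader`, the dictionary-shaped requests, and the choice to discard the index information on stop are assumptions for this sketch, not details given by the application.

```python
class ClusterReader:
    """Illustrative stand-in for apparatus 90; method names are assumptions."""

    def __init__(self, index_info, objects):
        self.index_info = index_info   # first identifier -> second identifier
        self.objects = objects         # second identifier -> file content
        self.active = True

    def receive(self, request):        # role of the receiving module 901
        if request["type"] == "stop":
            return self.process_stop()
        if not self.active:
            raise RuntimeError("reading was stopped for this virtual file")
        second_id = self.process(request["first_id"])
        return self.send(second_id)

    def process(self, first_id):       # role of the processing module 902
        return self.index_info[first_id]

    def process_stop(self):
        # Stop acquiring any of the files and release the index information.
        self.active = False
        self.index_info = {}
        return None

    def send(self, second_id):         # role of the sending module 903
        return self.objects[second_id]


reader = ClusterReader({"a.jpg": "h-1", "b.jpg": "h-2"}, {"h-1": b"A", "h-2": b"B"})
first = reader.receive({"type": "read", "first_id": "a.jpg"})
second = reader.receive({"type": "read", "first_id": "b.jpg"})
reader.receive({"type": "stop"})
```

After the stop request, any further read request in this sketch raises an error, which is one simple way to model "stop acquiring any file of the plurality of files".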
Fig. 10 is a schematic diagram of a possible logic structure of an electronic device 100 according to an embodiment of the present application. The electronic device 100 is for implementing the functions of the electronic device as referred to in any of the above embodiments. The electronic device 100 includes: memory 1001, processor 1002, communication interface 1003, and bus 1004. The memory 1001, the processor 1002, and the communication interface 1003 are connected to each other by a bus 1004.
The memory 1001 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 1001 may store a program, and when the program stored in the memory 1001 is executed by the processor 1002, the processor 1002 and the communication interface 1003 are configured to perform one or more steps of the above-described file reading method embodiment.
The processor 1002 may employ a central processing unit (central processing unit, CPU), a microprocessor, an application-specific integrated circuit (application-specific integrated circuit, ASIC), a graphics processing unit (graphics processing unit, GPU), a digital signal processor (digital signal processor, DSP), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, for executing associated programs to perform the functions required by the sending module, the receiving module, the querying module, and the like in the file reading apparatus applied to the electronic device in the above embodiments, or to perform one or more steps of the method embodiments of the present application. The steps of a method disclosed in connection with an embodiment of the present application may be performed by a hardware decoding processor or by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1001, and the processor 1002 reads the information in the memory 1001 and, in combination with its hardware, performs one or more steps of the file reading method embodiments described above.
Communication interface 1003 enables communication between electronic device 100 and other devices or communication networks using transceiving means such as, but not limited to, a transceiver.
Bus 1004 may include a pathway for transferring information among the various components of electronic device 100 (e.g., memory 1001, processor 1002, and communication interface 1003). Bus 1004 may be a peripheral component interconnect (peripheral component interconnect, PCI) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 10, but this does not mean that there is only one bus or one type of bus.
In another embodiment of the present application, there is also provided a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor of a device, perform the steps performed by the processor of fig. 10 described above.
In another embodiment of the present application, there is also provided a computer program product comprising computer-executable instructions stored in a computer-readable storage medium; the steps performed by the processor in fig. 10 described above are performed by the device when the computer-executable instructions are executed by the device's processor.
In another embodiment of the present application, there is also provided a chip system including a processor for implementing the steps performed by the processor of fig. 10. In one possible design, the chip system may further include memory to hold the necessary program instructions and data. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
Fig. 11 is a schematic diagram of a possible logic structure of a storage node cluster 11 according to an embodiment of the present application. The storage node cluster 11 is used to implement the functions of the storage node cluster as referred to in any of the embodiments described above. The storage node cluster 11 includes one or more storage nodes 110, any of which includes a memory 1101, a processor 1102, a communication interface 1103, and a bus 1104. The memory 1101, the processor 1102, and the communication interface 1103 are communicatively connected to each other through the bus 1104.
The memory 1101 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 1101 may store a program, and the processor 1102 and the communication interface 1103 are configured to perform one or more steps of the above-described file reading method embodiment when the program stored in the memory 1101 is executed by the processor 1102.
The processor 1102 may employ a central processing unit (central processing unit, CPU), a microprocessor, an application-specific integrated circuit (application-specific integrated circuit, ASIC), a graphics processing unit (graphics processing unit, GPU), a digital signal processor (digital signal processor, DSP), a field-programmable gate array (field-programmable gate array, FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, for executing associated programs to perform the functions required by the sending module, the receiving module, the processing module, and the like in the file reading apparatus applied to a storage node cluster in the above embodiments, or to perform one or more steps of the method embodiments of the present application. The steps of a method disclosed in connection with an embodiment of the present application may be performed by a hardware decoding processor or by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1101, and the processor 1102 reads the information in the memory 1101 and, in combination with its hardware, performs one or more steps of the file reading method embodiments described above.
The communication interface 1103 enables communication between the storage node cluster 11 and other devices or communication networks using transceiving means, such as, but not limited to, a transceiver.
The bus 1104 may include a pathway for transferring information among the various components of the storage node cluster 11 (e.g., the memory 1101, the processor 1102, and the communication interface 1103). Bus 1104 may be a peripheral component interconnect (peripheral component interconnect, PCI) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like. The buses may be divided into address buses, data buses, control buses, etc. For ease of illustration, only one thick line is shown in fig. 11, but this does not mean that there is only one bus or one type of bus.
In another embodiment of the present application, there is also provided a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor of a device, perform the steps performed by the processor of fig. 11 described above.
In another embodiment of the present application, there is also provided a computer program product comprising computer-executable instructions stored in a computer-readable storage medium; when the processor of the device executes the computer-executable instructions, the device performs the steps described above for the processor of fig. 11.
In another embodiment of the present application, there is also provided a chip system including a processor for implementing the steps performed by the processor of fig. 11. In one possible design, the chip system may further include memory to hold the necessary program instructions and data. The chip system can be composed of chips, and can also comprise chips and other discrete devices.
An embodiment of the present application provides a communication system including an electronic device 100 as shown in fig. 10 and a storage node cluster 11 as shown in fig. 11.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of elements is merely a logical functional division, and there may be additional divisions of actual implementation, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
The above is merely a specific implementation of the embodiment of the present application, but the protection scope of the embodiment of the present application is not limited thereto.

Claims (19)

1. A file reading method, the method comprising:
a storage node cluster receives a first read request from an electronic device for a first file in a plurality of files, wherein the first read request is obtained by the electronic device according to a virtual file identifier and a first identifier of the first file in the electronic device, and the virtual file identifier is an identifier of the whole of the plurality of files;
the storage node cluster obtains a second identifier of the first file in the storage node cluster according to the first read request;
and the storage node cluster sends the first file to the electronic device according to the second identifier of the first file.
2. The method of claim 1, wherein the first read request includes the second identifier of the first file, the second identifier of the first file being queried by the electronic device, based on the first identifier of the first file, from index information corresponding to the virtual file identifier, the index information including an index from the first identifier of each of the plurality of files in the electronic device to the second identifier of the corresponding file in the storage node cluster.
3. The method of claim 2, wherein the storage node cluster comprises a plurality of storage nodes;
the storage node cluster receiving, from the electronic device, the first read request for the first file among the plurality of files comprises:
a target storage node among the plurality of storage nodes receives the first read request carrying the second identifier of the first file, wherein the target storage node is the storage node storing the first file;
the storage node cluster sending the first file to the electronic device according to the second identifier of the first file comprises:
the target storage node sends the first file to the electronic device according to the second identifier of the first file.
4. The method of claim 1, wherein the first read request carries the virtual file identifier and the first identifier of the first file;
the storage node cluster obtaining the second identifier of the first file in the storage node cluster according to the first read request comprises:
the storage node cluster queries, based on the first identifier of the first file, the second identifier of the first file from index information corresponding to the virtual file identifier, wherein the index information comprises an index from the first identifier of each of the plurality of files in the electronic device to the second identifier of the corresponding file in the storage node cluster.
5. The method of any of claims 1-4, wherein before the storage node cluster receives, from the electronic device, the first read request for the first file among the plurality of files, the method further comprises:
the storage node cluster receives a creation instruction from the electronic device;
the storage node cluster creates, according to the creation instruction, the virtual file identifier for identifying the plurality of files as a whole;
and the storage node cluster sends the virtual file identifier to the electronic device.
6. The method of claim 5, wherein the creation instruction carries a file list, the file list including the first identifier of each of the plurality of files in the electronic device;
the method further comprises:
the storage node cluster generates, according to the creation instruction, index information corresponding to the virtual file identifier, so that the electronic device or the storage node cluster obtains the second identifier of the first file according to the index information and the first identifier of the first file, wherein the index information comprises an index from the first identifier of each of the plurality of files in the electronic device to the second identifier of the corresponding file in the storage node cluster.
7. The method of claim 6, wherein the file list further comprises a size of each of the plurality of files, and wherein the first read request comprises the size of the first file.
8. The method of any of claims 1-7, wherein after the storage node cluster sends the first file to the electronic device according to the second identifier of the first file, the method further comprises:
the storage node cluster receives, from the electronic device, a second read request for a second file among the plurality of files, wherein the second read request is obtained by the electronic device according to the virtual file identifier and a first identifier of the second file in the electronic device;
the storage node cluster obtains a second identifier of the second file in the storage node cluster according to the second read request;
the storage node cluster sends the second file to the electronic device according to the second identifier of the second file;
after the second file is sent, the storage node cluster receives a read-stop instruction from the electronic device;
and the storage node cluster stops acquiring any of the plurality of files according to the read-stop instruction.
9. A file reading method, the method comprising:
an electronic device sends, to a storage node cluster, a first read request for a first file among a plurality of files, wherein the first read request is obtained according to a virtual file identifier and a first identifier of the first file in the electronic device, and the virtual file identifier is an identifier of the plurality of files as a whole;
the electronic device receives the first file from the storage node cluster, wherein the first file is acquired by the storage node cluster according to a second identifier of the first file in the storage node cluster, and the second identifier of the first file is obtained by the storage node cluster according to the first read request.
10. The method of claim 9, wherein before the electronic device sends the first read request for the first file among the plurality of files to the storage node cluster, the method further comprises:
the electronic device sends a creation instruction to the storage node cluster, wherein the creation instruction is used for instructing the storage node cluster to create the virtual file identifier for identifying the plurality of files as a whole;
and the electronic device receives feedback information from the storage node cluster, wherein the feedback information comprises the virtual file identifier.
11. The method of claim 10, wherein the creation instruction carries a file list, the file list including the first identifier of each of the plurality of files in the electronic device;
the creation instruction is further used for instructing the storage node cluster to generate index information corresponding to the virtual file identifier, so that the electronic device or the storage node cluster obtains the second identifier of the first file according to the index information and the first identifier of the first file, wherein the index information comprises an index from the first identifier of each of the plurality of files in the electronic device to the second identifier of the corresponding file in the storage node cluster.
12. A file reading apparatus, comprising:
a receiving module, configured to receive, from an electronic device, a first read request for a first file among a plurality of files, wherein the first read request is obtained by the electronic device according to a virtual file identifier and a first identifier of the first file in the electronic device, and the virtual file identifier is an identifier of the plurality of files as a whole;
a processing module, configured to obtain a second identifier of the first file in a storage node cluster according to the first read request;
and a sending module, configured to send the first file to the electronic device according to the second identifier of the first file.
13. The apparatus of claim 12, wherein:
the receiving module is further configured to receive a creation instruction from the electronic device;
the processing module is further configured to create, according to the creation instruction, the virtual file identifier for identifying the plurality of files as a whole;
the sending module is further configured to send the virtual file identifier to the electronic device.
14. The apparatus of claim 13, wherein the creation instruction carries a file list comprising the first identifier of each of the plurality of files in the electronic device;
the processing module is further configured to generate, according to the creation instruction, index information corresponding to the virtual file identifier, so that the electronic device or the storage node cluster obtains the second identifier of the first file according to the index information and the first identifier of the first file, wherein the index information comprises an index from the first identifier of each of the plurality of files in the electronic device to the second identifier of the corresponding file in the storage node cluster.
15. A file reading apparatus, comprising:
a sending module, configured to send, to a storage node cluster, a first read request for a first file among a plurality of files, wherein the first read request is obtained according to a virtual file identifier and a first identifier of the first file in an electronic device, and the virtual file identifier is an identifier of the plurality of files as a whole;
a receiving module, configured to receive the first file from the storage node cluster, wherein the first file is acquired by the storage node cluster according to a second identifier of the first file in the storage node cluster, and the second identifier of the first file is obtained by the storage node cluster according to the first read request.
16. A storage node cluster comprising at least one processor, a memory, and instructions stored on the memory and executable by the at least one processor, the at least one processor executing the instructions to implement the steps of the method of any one of claims 1-8.
17. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-8.
18. An electronic device comprising at least one processor, a memory, and instructions stored on the memory and executable by the at least one processor, the at least one processor executing the instructions to perform the steps of the method of any one of claims 9-11.
19. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 9-11.
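For illustration only, the interaction recited in claims 1, 4 to 6 and 8 might be sketched as follows. This is a minimal in-memory sketch under stated assumptions, not the implementation of the application: the class `StorageNodeCluster`, its methods `put`, `create`, `read` and `stop`, the path-like first identifiers and the `obj-...` second identifiers are all hypothetical names chosen for the example, and the read-stop instruction is modeled here simply as discarding the index. The variant of claim 2, in which the electronic device itself queries the index, is not shown.

```python
import itertools


class StorageNodeCluster:
    """Toy in-memory stand-in for the storage node cluster (hypothetical)."""

    def __init__(self):
        self._blobs = {}    # second identifier (cluster side) -> file contents
        self._catalog = {}  # first identifier (device side) -> second identifier
        self._indexes = {}  # virtual file identifier -> {first id: second id}
        self._ids = itertools.count(1)

    def put(self, first_id, second_id, data):
        """Helper for the example: store a file under its cluster-side identifier."""
        self._catalog[first_id] = second_id
        self._blobs[second_id] = data

    def create(self, file_list):
        """Creation instruction: build the index once and return the virtual file identifier."""
        virtual_id = next(self._ids)
        self._indexes[virtual_id] = {f: self._catalog[f] for f in file_list}
        return virtual_id

    def read(self, virtual_id, first_id):
        """Read request carrying the virtual file identifier and a first identifier."""
        second_id = self._indexes[virtual_id][first_id]  # index lookup, as in claim 4
        return self._blobs[second_id]

    def stop(self, virtual_id):
        """Read-stop instruction, modeled (as an assumption) by discarding the index."""
        self._indexes.pop(virtual_id, None)


# Electronic-device side: one creation instruction, then one request per file.
cluster = StorageNodeCluster()
cluster.put("/data/a.txt", "obj-001", b"alpha")
cluster.put("/data/b.txt", "obj-002", b"beta")

vfid = cluster.create(["/data/a.txt", "/data/b.txt"])
first = cluster.read(vfid, "/data/a.txt")
second = cluster.read(vfid, "/data/b.txt")
cluster.stop(vfid)
```

In this sketch the per-file open and close exchange is replaced by a single `create` call followed by direct reads, which corresponds to the reduction in interactions described in the abstract.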
CN202310477707.0A 2022-12-14 2023-04-27 File reading method and related equipment Pending CN118193468A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211608313 2022-12-14
CN2022116083136 2022-12-14

Publications (1)

Publication Number Publication Date
CN118193468A true CN118193468A (en) 2024-06-14

Family

ID=91393601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310477707.0A Pending CN118193468A (en) 2022-12-14 2023-04-27 File reading method and related equipment

Country Status (1)

Country Link
CN (1) CN118193468A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination