CN112783443A - Data reading method and device and electronic equipment - Google Patents

Data reading method and device and electronic equipment

Info

Publication number
CN112783443A
Authority
CN
China
Prior art keywords
data
path
target
identifier
file system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110063847.4A
Other languages
Chinese (zh)
Inventor
余虹建
李锦丰
朱军
李秋庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Juyun Technology Co ltd
Original Assignee
Beijing Juyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Juyun Technology Co ltd filed Critical Beijing Juyun Technology Co ltd
Priority to CN202110063847.4A priority Critical patent/CN112783443A/en
Publication of CN112783443A publication Critical patent/CN112783443A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0635 Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention provide a data reading method, a data reading device, and electronic equipment. The method comprises the following steps: receiving a data request sent by a file system, the data request carrying a target path; determining a target data identifier corresponding to the target path based on a pre-established correspondence between paths and data identifiers; and reading target data from the object storage system based on the target data identifier and feeding the target data back to the file system, so that the file system feeds the target data back to the task end after receiving it. This scheme can improve the efficiency of data reading.

Description

Data reading method and device and electronic equipment
Technical Field
The present invention relates to the field of data storage technologies, and in particular, to a data reading method and apparatus, and an electronic device.
Background
A data storage system is a commonly used system for providing services such as data storage, data reading and writing, and the like. In a conventional data storage system, a file system is used as an interface layer, and an object storage system is used as a data storage layer, so that a request for the file system can be converted into a request for object storage by a mount tool of the file system.
In the prior art, a file system accesses data hierarchically, whereas an object storage system has no hierarchical structure, so the hierarchy of the file system has to be simulated in the object storage system through LIST requests and Delimiter segmentation. However, reading data through LIST requests is very time-consuming, which results in low data reading efficiency.
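The cost of this prior-art approach can be illustrated with a minimal Python sketch. It assumes a flat store whose object keys mirror file paths; the key names and the functions `list_objects` and `walk` are hypothetical and only mimic the general prefix-plus-delimiter behaviour of a LIST request, not any particular storage product.

```python
def list_objects(keys, prefix="", delimiter="/"):
    """Simulate one LIST request with a prefix and a delimiter over flat keys."""
    contents, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # Everything up to the next delimiter is reported as a "directory".
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common_prefixes)


def walk(keys, prefix=""):
    """Descend level by level; return the keys found and the LIST requests used."""
    contents, prefixes = list_objects(keys, prefix)
    found, requests = list(contents), 1
    for sub_prefix in prefixes:
        sub_found, sub_requests = walk(keys, sub_prefix)
        found += sub_found
        requests += sub_requests
    return found, requests


# Hypothetical flat keys that imitate a two-level directory layout.
keys = ["C/set 1/data 1", "C/set 1/data 2", "C/set 2/data 3"]
files, n_requests = walk(keys, "C/")
```

In this sketch one LIST request is issued per simulated directory level visited, so the number of requests grows with the number of directories rather than staying constant per read.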
Disclosure of Invention
The embodiment of the invention aims to provide a data reading method, a data reading device and electronic equipment so as to improve the data reading efficiency. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a data reading method applied to a management apparatus, where the management apparatus communicates with a file system and an object storage system in a data storage system, and the object storage system stores data used for deep learning network model training, and the method includes:
receiving a data request sent by the file system, where the data request is sent by the file system upon receiving a training data acquisition request from a training task end; the data request carries a target path, in the file system, of the data requested by the training data acquisition request; and the training task end is a task end used for training the deep learning network model;
determining a target data identifier corresponding to the target path based on a pre-established correspondence between paths and data identifiers, where the data identifier corresponding to each path is the identifier, in the object storage system, of the data located under that path in the file system;
and reading target data from the object storage system based on the target data identifier, and feeding the target data back to the file system, so that the file system feeds the target data back to the training task end after receiving it.
Further, the management device further includes a name manager, where the name manager is used to store a correspondence between a pre-established path and a data identifier;
the determining a target data identifier corresponding to the target path based on a correspondence between a pre-established path and a data identifier includes:
sending an identifier acquisition request carrying the target path to the name manager, so that the name manager feeds back the target data identifier after determining the target data identifier corresponding to the target path based on the correspondence;
and receiving the target data identification fed back by the name manager.
Further, the method further comprises:
when the object storage system is detected to store data, acquiring a data identifier of the data stored by the object storage system and a path aiming at the stored data;
and establishing a corresponding relation between the acquired path and the acquired data identification.
Further, obtaining a path for the stored data includes:
acquiring a directory path of each layer of data directory where the stored data are located and a data path of the stored data;
the establishing of the corresponding relationship between the acquired path and the acquired data identifier includes:
and establishing, for each path among the acquired directory paths and the acquired data path, a correspondence between that path and the acquired data identifier.
Further, the establishing, for each of the obtained directory path and data path, a corresponding relationship between the path and the obtained data identifier includes:
and establishing, for each path among the acquired directory paths and the acquired data path, a correspondence between that path and the acquired data identifier in a multi-way tree structure.
Further, the type of the target path is a directory path or a data path.
In a second aspect, an embodiment of the present invention provides a data reading apparatus applied to a management apparatus, where the management apparatus communicates with a file system and an object storage system in a data storage system, and the object storage system stores data used for deep learning network model training, and the apparatus includes:
the request receiving module is used for receiving a data request sent by the file system, where the data request is sent by the file system upon receiving a training data acquisition request from a training task end; the data request carries a target path, in the file system, of the data requested by the training data acquisition request; and the training task end is a task end used for training the deep learning network model;
the identifier determining module is used for determining a target data identifier corresponding to the target path based on a pre-established correspondence between paths and data identifiers, where the data identifier corresponding to each path is the identifier, in the object storage system, of the data located under that path in the file system;
and the data reading module is used for reading target data from the object storage system based on the target data identifier and feeding the target data back to the file system, so that the file system feeds the target data back to the training task end after receiving it.
Further, the management device further includes a name manager, where the name manager is used to store a correspondence between a pre-established path and a data identifier;
the identifier determining module is specifically configured to send an identifier acquisition request carrying the target path to the name manager, so that the name manager feeds back the target data identifier after determining the target data identifier corresponding to the target path based on the correspondence; and to receive the target data identifier fed back by the name manager.
Further, the apparatus further comprises:
the path acquisition module is used for acquiring a data identifier of the data stored by the object storage system and a path aiming at the stored data when the object storage system is detected to store the data;
and the relation establishing module is used for establishing the corresponding relation between the acquired path and the acquired data identifier.
Further, the path obtaining module is specifically configured to obtain a directory path of each layer of the data directory where the stored data is located and a data path of the stored data;
the relationship establishing module is specifically configured to establish, for each of the obtained directory path and data path, a corresponding relationship between the path and the obtained data identifier.
Further, the relationship establishing module is specifically configured to establish, for each of the obtained directory path and data path, a corresponding relationship between the path and the obtained data identifier in a multi-way tree structure.
Further, the type of the target path is a directory path or a data path.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of the first aspect when executing the program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps of the first aspect.
The embodiment of the invention has the following beneficial effects:
In the data reading method provided by the embodiment of the invention, because the correspondence between paths and data identifiers is established in advance, when data is read from the object storage system, the target data identifier of the requested data in the object storage system can be quickly determined from the target path carried in the request, and the target data can then be read from the object storage system according to the target data identifier without relying on LIST requests, so that the data reading efficiency is improved.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
FIG. 1 is a schematic diagram of a data reading system according to an embodiment of the present invention;
FIG. 2 is a flowchart of a data reading method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for establishing a corresponding relationship according to an embodiment of the present invention;
FIG. 4a is a schematic diagram of a multi-way tree according to an embodiment of the present invention;
FIG. 4b is a schematic diagram of another multi-way tree according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a data reading system incorporating a specific scenario according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data reading apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to improve the efficiency of reading data, embodiments of the present invention provide a data reading method and apparatus, and an electronic device.
First, a data reading method provided by an embodiment of the present invention will be described from the perspective of a management device.
The management device may be any electronic device with data processing capability, or a functional module in such a device, for example an independently deployed cache server or a cache manager in a data storage system. Moreover, the data reading method provided by the embodiment of the invention can be implemented in software, hardware, or a combination of the two.
In another implementation, the management apparatus may include a cache manager and a name manager, where the embodiment of the present invention may be applied to the cache manager, and the name manager is configured to store a correspondence between a pre-established path and a data identifier.
FIG. 1 is a schematic structural diagram of a data storage system according to an embodiment of the present invention. The system includes: a file system 101, a management apparatus 102, and an object storage system 103. The object storage system 103 is a database that stores data on disk in the form of objects. The file system 101 may be a Filesystem in Userspace (FUSE). The data reading method provided by the embodiment of the invention can be applied to the management apparatus 102. The object storage system 103 is responsible for storing data, while the file system 101 may not store data at all and may only interact with users or task processes. The management apparatus 102 may also execute technical solutions other than the one provided by the embodiment of the present invention. For example, the management apparatus 102 may also act as a cache server and perform data caching.
In the embodiment of the present invention, the requesting party may be the file system 101. Because the file system 101 does not store data itself and the data is stored in the object storage system 103, when the file system 101 receives a training data reading request from the training task end, it has to obtain the requested data from the object storage system 103. To improve the efficiency of data reading, the file system 101 may send a data request to the management apparatus 102 and read the data from the object storage system 103 through the management apparatus 102. After receiving the data request sent by the file system 101, the management apparatus 102 may execute the data reading method provided in the embodiment of the present invention: it reads the target data from the object storage system 103 and feeds the target data back to the file system, so that the file system feeds the target data back to the task end after receiving it.
It should be noted that, as can be seen from the foregoing, the embodiment of the present invention may be applied to various electronic devices. For brevity, the technical solution is described below only for the case in which it is applied to a management apparatus.
As shown in fig. 2, a data reading method provided in an embodiment of the present invention is applied to a management device, where the management device communicates with a file system and an object storage system in a data storage system, and the object storage system stores data for deep learning network model training, and the method may include the following steps:
S201, receiving a data request sent by a file system, where the data request is sent by the file system upon receiving a training data acquisition request from a training task end; the data request carries a target path, in the file system, of the data requested by the training data acquisition request; and the training task end is a task end used for training the deep learning network model;
when a deep learning network model is trained on a training task end, in an iterative process, training data for training needs to be acquired from a target data set, the training task end can send a training data acquisition request to a file system, and the request can be a request for acquiring data under a specified data path for training. In a data system consisting of a file system and a storage system, the file system does not store data and is only used as an interaction layer for a user or a task end, so that the user or the task end can manage the stored data conveniently.
When the file system receives a data reading request from a user or a task terminal, a data request may be sent to the management device to request the data requested to be read by the data reading request.
It should be noted that the file system in the embodiment of the present invention does not store data, but only serves as a data management system, so that a user or a task end can conveniently manage the stored data. And the location where the data is actually stored is the object storage system.
The file system is a system for managing data in the form of directories. Different directories can be created according to different attributes of the data, and the data managed under each directory can be the data that shares the attribute corresponding to that directory. Illustratively, according to the time attribute, data generated in 2019 may be stored under directory 2019 and data generated in 2020 may be stored under directory 2020. As another example, according to the application, data in the training set used to train neural network model 1 may be stored under directory model 1, and data in the training set used to train neural network model 2 may be stored under directory model 2.
In the object storage system, data is stored in the form of objects and there is no file-system-style hierarchy. This makes the stored data less convenient to manage, but the availability and data reading speed of the object storage system are better than those of the file system. Illustratively, picture 1 and picture 2 are stored in the object storage system, and the key-value pairs {Photo1, picture 1} and {Photo2, picture 2} are constructed for them, where Photo1 is the data identifier of picture 1 in the object storage system, so that picture 1 can be found quickly based on this identifier. Correspondingly, Photo2 is the data identifier of picture 2 in the object storage system.
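The flat, identifier-keyed layout described here can be pictured with a minimal Python sketch. The class `ObjectStore` and its `put`/`get` methods are hypothetical names chosen for illustration, and an in-memory dictionary stands in for real object storage.

```python
class ObjectStore:
    """Toy object store: a flat mapping from data identifier to object content."""

    def __init__(self):
        self._objects = {}

    def put(self, data_id, content):
        # There is no directory hierarchy, only identifier -> content.
        self._objects[data_id] = content

    def get(self, data_id):
        # A single keyed lookup is enough to find an object.
        return self._objects[data_id]


store = ObjectStore()
store.put("Photo1", b"picture 1")
store.put("Photo2", b"picture 2")
picture_1 = store.get("Photo1")
```

The point of the sketch is only that an object is reachable in one step once its identifier is known, which is what the path-to-identifier correspondence is meant to exploit.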
Optionally, the target path may be a data path. Illustratively, the data paths of picture 1, picture 2, and video 1 within the file system are "C/picture 1", "C/picture 2", and "C/video 1", respectively.
Optionally, the target path may also be a directory path of a directory where the data is located. As for the picture 1 in the above example, the directory where the picture 1 is located is "picture", and the directory path of the directory where the picture 1 is located is "C/picture".
S202, determining a target data identifier corresponding to the target path based on a pre-established correspondence between paths and data identifiers, where the data identifier corresponding to each path is the identifier, in the object storage system, of the data located under that path in the file system;
The data identifier of a piece of data serves as its index in the object storage system; that is, the data can be looked up in the object storage system by its data identifier. In the file system, by contrast, data is located by path, with the corresponding data searched for layer by layer along the path. Therefore, based on the pre-established correspondence between paths and data identifiers, the data identifier in the object storage system of data managed by the file system can be determined quickly, which makes it convenient to read that data.
Optionally, in an implementation manner, the data identifier corresponding to each path may be recorded by a table. When the target data identifier corresponding to the target path needs to be determined, the target path may be used for searching in the table for recording the corresponding relationship, and after the target path is searched, the target data identifier corresponding to the target path recorded in the table is determined.
Optionally, in an implementation manner, when the target path is a data path, a unique target data identifier corresponds to the target path. Optionally, in another implementation, when the target path is a directory path, the target path may correspond to multiple target data identifiers.
Illustratively, data 1, data 2, and data 3 are stored in the object storage system. The data identifiers of data 1, data 2, and data 3 in the object storage system are File1, File2, and File3, respectively, and the data paths of data 1, data 2, and data 3 in the file system are "C/set 1/data 1", "C/set 1/data 2", and "C/set 2/data 3".
In the case where the target path is a data path, there are several possibilities: if the target path is "C/set 1/data 1", the target data identifier is File1; if the target path is "C/set 1/data 2", the target data identifier is File2; if the target path is "C/set 2/data 3", the target data identifier is File3.
In the case where the target path is a directory path, there are several possibilities: if the target path is the directory path "C/set 1" of directory "set 1", the target data identifiers are File1 and File2; if the target path is the directory path "C/set 2" of directory "set 2", the target data identifier is File3; if the target path is the directory path "C" of directory C, the target data identifiers are File1, File2, and File3.
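The data-path and directory-path cases can be checked with a small Python sketch. It assumes the correspondence is held as a plain dictionary from path to a list of identifiers; the names `correspondence` and `resolve` are hypothetical. Since data 3 is the only data under "set 2", the directory path "C/set 2" maps to File3.

```python
# Hypothetical correspondence table for the three-item example:
# each path maps to the identifiers of all data located under it.
correspondence = {
    "C": ["File1", "File2", "File3"],
    "C/set 1": ["File1", "File2"],
    "C/set 2": ["File3"],
    "C/set 1/data 1": ["File1"],
    "C/set 1/data 2": ["File2"],
    "C/set 2/data 3": ["File3"],
}


def resolve(table, target_path):
    """Return the target data identifiers for a data path or a directory path."""
    # An unknown path yields no identifiers in this sketch.
    return list(table.get(target_path, []))


data_path_ids = resolve(correspondence, "C/set 1/data 1")
directory_ids = resolve(correspondence, "C/set 1")
```

A data path resolves to exactly one identifier, while a directory path may resolve to several, and in both cases the lookup is a single table access.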
S203, reading the target data from the object storage system based on the target data identification, and feeding back the target data to the file system, so that the file system feeds back the target data to the task side after receiving the target data.
Wherein the target data identification can be used as an object storage index to read the target data from the object storage system.
Optionally, in another embodiment of the present invention, after the target data is read from the object storage system, the target data may be cached or a data request of a requester may be responded based on the target data.
According to the technical scheme provided by this embodiment, because the correspondence between paths and data identifiers is established in advance, when data is read from the object storage system, the target data identifier of the requested data in the object storage system can be determined quickly from the target path carried in the request, and the target data can then be read from the object storage system according to the target data identifier without relying on LIST requests, so that the data reading efficiency is improved.
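Steps S201 to S203 can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the classes `ManagementDevice` and `FileSystem`, their method names, and the use of in-memory dictionaries for the correspondence and for the object storage system are all hypothetical.

```python
class ManagementDevice:
    """Sketch of the management device handling a data request."""

    def __init__(self, correspondence, object_store):
        self.correspondence = correspondence  # path -> list of data identifiers
        self.object_store = object_store      # data identifier -> content

    def handle_data_request(self, target_path):
        # S201: the data request carries the target path.
        # S202: determine the target data identifiers from the correspondence.
        target_ids = self.correspondence.get(target_path, [])
        # S203: read each object directly by identifier, with no LIST request.
        return {data_id: self.object_store[data_id] for data_id in target_ids}


class FileSystem:
    """Interface layer only: it stores no data and forwards requests."""

    def __init__(self, management_device):
        self.management_device = management_device

    def on_training_data_request(self, target_path):
        # Forward the path and hand the result back to the training task end.
        return self.management_device.handle_data_request(target_path)


object_store = {"File1": b"data 1", "File2": b"data 2", "File3": b"data 3"}
correspondence = {
    "C/set 1": ["File1", "File2"],
    "C/set 2/data 3": ["File3"],
}
fs = FileSystem(ManagementDevice(correspondence, object_store))
batch = fs.on_training_data_request("C/set 1")
```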
Optionally, in another embodiment of the present invention, the data reading method provided in the embodiment of the present invention may specifically apply to a cache manager of a management device, where the management device further includes a name manager, and the name manager is configured to store a correspondence between a pre-established path and a data identifier, then S202 may include:
sending an identifier acquisition request carrying the target path to the name manager, so that the name manager feeds back the target data identifier after determining the target data identifier corresponding to the target path based on the correspondence; and receiving the target data identifier fed back by the name manager.
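The division of labour between the cache manager and the name manager might look like the following Python sketch. The class and method names are hypothetical, the request and response are modelled as a direct method call, and the in-memory cache is an assumption based on the statement that the target data may be cached after it is read.

```python
class NameManager:
    """Stores the pre-established correspondence between paths and identifiers."""

    def __init__(self):
        self._table = {}

    def add(self, path, data_id):
        self._table.setdefault(path, []).append(data_id)

    def get_identifiers(self, target_path):
        # Answers an identifier acquisition request carrying the target path.
        return list(self._table.get(target_path, []))


class CacheManager:
    """Executes the read: asks the name manager, then reads by identifier."""

    def __init__(self, name_manager, object_store):
        self.name_manager = name_manager
        self.object_store = object_store  # data identifier -> content
        self._cache = {}
        self.store_reads = 0              # counts reads that reached the store

    def read(self, target_path):
        result = {}
        for data_id in self.name_manager.get_identifiers(target_path):
            if data_id not in self._cache:
                self._cache[data_id] = self.object_store[data_id]
                self.store_reads += 1
            result[data_id] = self._cache[data_id]
        return result


names = NameManager()
names.add("C/set 1", "File1")
names.add("C/set 1", "File2")
names.add("C/set 1/data 1", "File1")
manager = CacheManager(names, {"File1": b"data 1", "File2": b"data 2"})
first = manager.read("C/set 1")
second = manager.read("C/set 1/data 1")
```

In this sketch the second read is served from the cache, so only two reads reach the object store in total.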
Optionally, as shown in FIG. 3, a method for establishing a corresponding relationship provided in the embodiment of the present invention is used to establish the correspondence between paths and data identifiers that is used in the embodiment of FIG. 2, where the method includes:
S301, when detecting that the object storage system stores data, acquiring a data identifier of the data stored in the object storage system and a path aiming at the stored data;
when data is stored in the object storage, the corresponding relationship between the data identifier of the stored data in the object storage system and the path for the stored data needs to be established.
Optionally, the object storage system may be monitored in real time or periodically. Alternatively, a message requesting real-time reporting may be sent to the object storage system, so that the object storage system sends a notification message whenever it stores data; when the notification message sent by the object storage system is received, the object storage system is considered to have stored data.
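The notification-based variant can be sketched in Python with a simple callback registration. `NotifyingObjectStore`, `subscribe`, and `put` are hypothetical names; a real object storage system would expose its own event mechanism, which is not specified here.

```python
class NotifyingObjectStore:
    """Toy object store that sends a notification each time it stores data."""

    def __init__(self):
        self._objects = {}
        self._listeners = []

    def subscribe(self, callback):
        # Stands in for the message that asks the store to report in real time.
        self._listeners.append(callback)

    def put(self, data_id, content):
        self._objects[data_id] = content
        for callback in self._listeners:
            callback(data_id)  # the notification message


detected = []
store = NotifyingObjectStore()
store.subscribe(detected.append)  # the management device records each event
store.put("File4", b"data 4")
```

Each recorded identifier is the trigger for acquiring the corresponding path and establishing the correspondence.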
Optionally, a message for acquiring a path for the stored data may be sent to the file system, and a message for acquiring a data identifier of the stored data may be sent to the object storage system to acquire the data identifier of the data stored in the object storage system and the path for the stored data, respectively.
Illustratively, when it is detected that data 4 is stored in the object storage system, the data identifier File4 of data 4 in the object storage system and the path "C/set 2/data 4" for data 4 are acquired respectively.
Optionally, in an implementation manner, the path for the stored data may include the directory path of each layer of data directory where the stored data is located and the data path of the stored data. In this case, the path for the stored data may be obtained as follows:
acquiring a directory path of each layer of data directory where the stored data are located and a data path of the stored data;
By way of example, the data path of data 4 is "C/set 2/data 4", and the data directories in which data 4 is located are "C" and "set 2", so the directory paths of the layers of data directory in which data 4 is located are the directory path "C" of data directory C and the directory path "C/set 2" of data directory set 2.
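Deriving the directory path of each layer from a data path is a short string operation, sketched below in Python. The function name `directory_paths` is hypothetical, and "/" is assumed to be the path separator, as in the examples.

```python
def directory_paths(data_path, sep="/"):
    """Return the directory path of every directory layer above the data."""
    parts = data_path.split(sep)
    # Join the first 1, 2, ... components, stopping before the data itself.
    return [sep.join(parts[:depth]) for depth in range(1, len(parts))]


layers = directory_paths("C/set 2/data 4")
```

For the data path "C/set 2/data 4" this yields the two directory paths "C" and "C/set 2".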
S302, establishing a corresponding relation between the acquired path and the acquired data identification.
A correspondence table may be established, and a correspondence between the path and the data identifier may be recorded in the correspondence table. At this time, the correspondence between the acquired path and the acquired data identifier may be added to the correspondence table to realize establishment of the correspondence between the acquired path and the acquired data identifier.
Optionally, in an implementation manner, when S301 includes acquiring the directory path of each layer of data directory where the stored data is located and the data path of the stored data, S302 may include:
for each path among the acquired directory paths and the acquired data path, establishing a corresponding relation between that path and the acquired data identifier.
That is, every acquired directory path and the acquired data path are each associated with the acquired data identifier.
By way of example, if the data path of data 4 is "C/set 2/data 4", the directory paths of data 4 are "C" and "C/set 2", and the data identifier of data 4 in the object storage system is File4, then a correspondence between the directory path "C" and the data identifier File4, a correspondence between the directory path "C/set 2" and the data identifier File4, and a correspondence between the data path "C/set 2/data 4" and the data identifier File4 are established.
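A minimal sketch of such a correspondence table is given below. It assumes the table is an in-memory dictionary and that a directory path accumulates the identifiers of all data stored beneath it; the class name `CorrespondenceTable` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class CorrespondenceTable:
    """Hypothetical in-memory table: path -> identifiers of the data under that path."""

    def __init__(self) -> None:
        self._table: Dict[str, List[str]] = defaultdict(list)

    def register(self, data_path: str, data_id: str) -> None:
        parts = data_path.split("/")
        # Record the identifier under every directory layer and under the data path itself.
        for depth in range(1, len(parts) + 1):
            path = "/".join(parts[:depth])
            if data_id not in self._table[path]:
                self._table[path].append(data_id)

    def lookup(self, target_path: str) -> List[str]:
        # Return a copy so that callers cannot mutate the table.
        return list(self._table.get(target_path, []))


table = CorrespondenceTable()
table.register("C/set 2/data 3", "File3")
table.register("C/set 2/data 4", "File4")
print(table.lookup("C/set 2/data 4"))  # ['File4']
print(table.lookup("C/set 2"))         # ['File3', 'File4']
```

With this layout a request whose target path is a directory path resolves to all identifiers under that directory in a single lookup, at the cost of storing each identifier once per layer.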
Optionally, for each of the obtained directory path and data path, a corresponding relationship between the path and the obtained data identifier may be established in a multi-way tree structure.
Illustratively, fig. 4a is a schematic diagram of a multi-way tree provided by the embodiment of the present invention, in which data 1, data 2, and data 3 are stored in the object storage system. The data identifiers of data 1, data 2, and data 3 in the object storage system are File1, File2, and File3 respectively, and the data paths of data 1, data 2, and data 3 in the file system are "C/set 1/data 1", "C/set 1/data 2", and "C/set 2/data 3" respectively.
When data 4 is subsequently stored in the object storage system, with the identifier File4 in the object storage system and the data path "C/set 2/data 4" in the file system, the multi-way tree of fig. 4a is extended as shown in fig. 4b, which is another schematic diagram of a multi-way tree provided by the embodiment of the present invention.
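The multi-way tree of figs. 4a and 4b can be approximated by the sketch below. The node layout (one child per path component, with the data identifier held on the node that represents the stored data) is an assumption made for illustration, as are the names `PathNode`, `PathTree`, `insert`, and `identifiers_under`.

```python
from typing import Dict, List, Optional


class PathNode:
    """One node of the multi-way tree; each child is keyed by one path component."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: Dict[str, "PathNode"] = {}
        self.data_id: Optional[str] = None  # set only on nodes that represent stored data


class PathTree:
    def __init__(self) -> None:
        self.root = PathNode("")

    def insert(self, data_path: str, data_id: str) -> None:
        node = self.root
        for part in data_path.split("/"):
            if part not in node.children:
                node.children[part] = PathNode(part)
            node = node.children[part]
        node.data_id = data_id

    def identifiers_under(self, target_path: str) -> List[str]:
        node = self.root
        for part in target_path.split("/"):
            if part not in node.children:
                return []
            node = node.children[part]
        # Depth-first walk collecting every identifier in the subtree.
        found: List[str] = []
        stack = [node]
        while stack:
            current = stack.pop()
            if current.data_id is not None:
                found.append(current.data_id)
            stack.extend(current.children.values())
        return sorted(found)


tree = PathTree()
# State corresponding to fig. 4a.
tree.insert("C/set 1/data 1", "File1")
tree.insert("C/set 1/data 2", "File2")
tree.insert("C/set 2/data 3", "File3")
# Storing data 4 extends the tree as in fig. 4b.
tree.insert("C/set 2/data 4", "File4")
print(tree.identifiers_under("C/set 2"))         # ['File3', 'File4']
print(tree.identifiers_under("C/set 1/data 2"))  # ['File2']
```

Compared with a flat table, the tree stores each identifier once and answers a directory-path request by walking the subtree below the matched node.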
According to the technical scheme provided by the embodiment, the corresponding relation between the path aiming at the stored data and the data identifier of the stored data can be established when the data is stored in the object storage system, so that the data identifier of the stored data in the object storage system can be quickly found based on the corresponding relation when the stored data is read, and the efficiency of reading the data is improved.
Optionally, to more clearly illustrate the technical solution of the embodiment of the present invention, as shown in fig. 5, the embodiment of the present invention further provides a schematic diagram of a data reading method combined with an actual service scenario.
In FIG. 5, the deep learning training task manager generates a file system for each deep learning training task. For example, the deep learning training task 1 corresponds to the file system 1, the deep learning training task 2 corresponds to the file system 2, the deep learning training task 3 corresponds to the file system 3, and the deep learning training task 4 corresponds to the file system 4.
When the deep learning task needs to acquire training data in the task training process, a training data acquisition request can be sent to the file system. After receiving the training data acquisition request, the file system may first send a data request to the cache manager, and after receiving the data request, the cache manager may acquire a target data identifier from a name node (NameSpace), and read target data from the object storage system based on the acquired target data identifier.
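The request flow of fig. 5 can be sketched end to end as follows. All class and method names (`ObjectStorage.get`, `NameManager.resolve`, `CacheManager.handle_data_request`, `FileSystem.read`) are hypothetical, and the components are modelled as in-process objects purely for illustration; the embodiment does not prescribe these interfaces.

```python
from typing import Dict, List


class ObjectStorage:
    """Hypothetical object store addressed only by data identifier."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self._objects = objects

    def get(self, data_id: str) -> bytes:
        return self._objects[data_id]


class NameManager:
    """Holds the pre-established correspondence between paths and data identifiers."""

    def __init__(self, path_to_ids: Dict[str, List[str]]) -> None:
        self._path_to_ids = path_to_ids

    def resolve(self, target_path: str) -> List[str]:
        return list(self._path_to_ids.get(target_path, []))


class CacheManager:
    def __init__(self, names: NameManager, storage: ObjectStorage) -> None:
        self._names = names
        self._storage = storage

    def handle_data_request(self, target_path: str) -> Dict[str, bytes]:
        # Resolve the target path to identifiers, then read each object directly by identifier.
        data_ids = self._names.resolve(target_path)
        return {data_id: self._storage.get(data_id) for data_id in data_ids}


class FileSystem:
    """File system generated for one training task; forwards requests to the cache manager."""

    def __init__(self, cache_manager: CacheManager) -> None:
        self._cache_manager = cache_manager

    def read(self, target_path: str) -> Dict[str, bytes]:
        return self._cache_manager.handle_data_request(target_path)


storage = ObjectStorage({"File3": b"sample 3", "File4": b"sample 4"})
names = NameManager({
    "C/set 2": ["File3", "File4"],
    "C/set 2/data 3": ["File3"],
    "C/set 2/data 4": ["File4"],
})
file_system = FileSystem(CacheManager(names, storage))

# A training task asks its file system for everything under the directory path "C/set 2".
print(file_system.read("C/set 2"))  # {'File3': b'sample 3', 'File4': b'sample 4'}
```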
Corresponding to the method provided above, as shown in fig. 6, an embodiment of the present invention further provides a data reading apparatus applied to a management apparatus, where the management apparatus communicates with a file system and an object storage system in a data storage system, and the object storage system stores data used for deep learning network model training, and the apparatus includes:
a request receiving module 601, configured to receive a data request sent by a file system; the data request is a request sent by a file system when receiving a training data acquisition request sent by a training task end; the data request carries a target path of data requested by the training data acquisition request in the file system, and the training task end comprises: the task end is used for training the deep learning network model;
an identifier determining module 602, configured to determine, based on a correspondence between a pre-established path and a data identifier, a target data identifier corresponding to a target path; wherein, the data identifier corresponding to each path is: data identification of data under the path in the file system in the object storage system;
and the data reading module 603 is configured to read the target data from the object storage system based on the target data identifier, and feed back the target data to the file system, so that the file system feeds back the target data to the task side after receiving the target data.
Further, the management device further comprises a name manager, wherein the name manager is used for storing a corresponding relation between a pre-established path and the data identifier;
the identification determining module is specifically used for sending an identification obtaining request carrying the target path to the name manager, so that the name manager feeds back the target data identification after determining the target data identification corresponding to the target path based on the corresponding relation; and receiving the target data identification fed back by the name manager.
Further, the apparatus further comprises:
the path acquisition module is used for acquiring a data identifier of the data stored by the object storage system and a path aiming at the stored data when the object storage system is detected to store the data;
and the relation establishing module is used for establishing the corresponding relation between the acquired path and the acquired data identifier.
Further, the path obtaining module is specifically configured to obtain a directory path of each layer of the data directory where the stored data is located and a data path of the stored data;
and the relationship establishing module is specifically used for establishing, for each path among the acquired directory paths and the acquired data path, a corresponding relationship between that path and the acquired data identifier.
Further, the relationship establishing module is specifically configured to establish, for each of the obtained directory path and data path, a corresponding relationship between the path and the obtained data identifier in a multi-way tree structure.
Further, the type of the target path is a directory path or a data path.
According to the technical scheme provided by the embodiment, because the corresponding relationship between paths and data identifiers is established in advance, when data is read from the object storage system, the target data identifier of the requested data in the object storage system can be determined rapidly from the target path carried in the request, and the target data is then read from the object storage system according to the target data identifier without depending on a LIST request, so that the data reading efficiency is improved.
An embodiment of the present invention further provides an electronic device, as shown in fig. 7, including a processor 701, a communication interface 702, a memory 703 and a communication bus 704, where the processor 701, the communication interface 702, and the memory 703 complete mutual communication through the communication bus 704,
a memory 703 for storing a computer program;
the processor 701 is configured to implement the method steps provided in the above embodiments when executing the program stored in the memory 703.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program realizes the steps of any one of the above data reading methods when executed by a processor.
In yet another embodiment, a computer program product containing instructions is provided, which when run on a computer, causes the computer to perform any of the data reading methods of the above embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, the electronic device, the computer-readable storage medium, and the computer program, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to them, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A data reading method applied to a management device, wherein the management device is in communication with a file system and an object storage system in a data storage system, and the object storage system stores data for deep learning network model training, the method comprising:
receiving a data request sent by a file system; the data request is a request sent by the file system when receiving a training data acquisition request sent by a training task end; the data request carries a target path of the data requested by the training data acquisition request in the file system, and the training task end is as follows: the task end is used for training the deep learning network model;
determining a target data identifier corresponding to the target path based on a corresponding relation between the pre-established path and the data identifier; wherein, the data identifier corresponding to each path is: data identification of data under the path in the file system in the object storage system;
and reading target data from the object storage system based on the target data identification, and feeding back the target data to the file system, so that the file system feeds back the target data to the task side after receiving the target data.
2. The method according to claim 1, wherein the management apparatus further comprises a name manager for storing a correspondence between a pre-established path and a data identifier;
the determining a target data identifier corresponding to the target path based on a correspondence between a pre-established path and a data identifier includes:
sending an identifier acquisition request carrying the target path to the name manager, so that the name manager feeds back the target data identifier after determining the target data identifier corresponding to the target path based on the corresponding relationship;
and receiving the target data identification fed back by the name manager.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
when the object storage system is detected to store data, acquiring a data identifier of the data stored by the object storage system and a path aiming at the stored data;
and establishing a corresponding relation between the acquired path and the acquired data identification.
4. The method of claim 3, wherein obtaining a path for the stored data comprises:
acquiring a directory path of each layer of data directory where the stored data are located and a data path of the stored data;
the establishing of the corresponding relationship between the acquired path and the acquired data identifier includes:
and establishing a corresponding relation between the acquired directory path and the acquired data identifier for each path in the acquired directory path and the acquired data path.
5. The method of claim 4,
establishing a corresponding relationship between the obtained directory path and the obtained data identifier for each of the obtained directory path and data path, including:
and establishing, in a multi-way tree structure, a corresponding relation between the path and the acquired data identification for each path in the acquired directory path and data path.
6. The method of claim 4, wherein the type of the target path is a directory path or a data path.
7. A data reading apparatus applied to a management apparatus, the management apparatus communicating with a file system and an object storage system in a data storage system, and the object storage system storing therein data for deep learning network model training, the apparatus comprising:
the request receiving module is used for receiving a data request sent by the file system; the data request is a request sent by the file system when receiving a training data acquisition request sent by a training task end; the data request carries a target path of the data requested by the training data acquisition request in the file system, and the training task end is as follows: the task end is used for training the deep learning network model;
the identification determining module is used for determining a target data identification corresponding to the target path based on the corresponding relation between the pre-established path and the data identification; wherein, the data identifier corresponding to each path is: data identification of data under the path in the file system in the object storage system;
and the data reading module is used for reading target data from the object storage system based on the target data identification and feeding back the target data to the file system, so that the file system feeds back the target data to the task terminal after receiving the target data.
8. The apparatus according to claim 7, wherein the apparatus is specifically applied to a cache manager of the management apparatus, and the management apparatus further includes a name manager, and the name manager is configured to store a correspondence between a pre-established path and a data identifier;
the identifier determining module is specifically configured to send an identifier obtaining request carrying the target path to the name manager, so that the name manager feeds back the target data identifier after determining the target data identifier corresponding to the target path based on the correspondence; and receiving the target data identification fed back by the name manager.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-6 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-6.
CN202110063847.4A 2021-01-18 2021-01-18 Data reading method and device and electronic equipment Pending CN112783443A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110063847.4A CN112783443A (en) 2021-01-18 2021-01-18 Data reading method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110063847.4A CN112783443A (en) 2021-01-18 2021-01-18 Data reading method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN112783443A true CN112783443A (en) 2021-05-11

Family

ID=75757164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110063847.4A Pending CN112783443A (en) 2021-01-18 2021-01-18 Data reading method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112783443A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102243660A (en) * 2011-07-18 2011-11-16 中兴通讯股份有限公司 Data access method and device
CN105550252A (en) * 2015-12-09 2016-05-04 北京金山安全软件有限公司 File positioning method and device and electronic equipment
CN110780819A (en) * 2019-10-25 2020-02-11 浪潮电子信息产业股份有限公司 Data read-write method of distributed storage system
CN111258958A (en) * 2020-01-10 2020-06-09 北京猎豹移动科技有限公司 Data acquisition method, data providing method and device
CN112015696A (en) * 2020-08-21 2020-12-01 北京奇艺世纪科技有限公司 Data access method, data relationship setting method, data access device, data relationship setting device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210511