CN111258958A - Data acquisition method, data providing method and device - Google Patents

Data acquisition method, data providing method and device

Info

Publication number
CN111258958A
Authority
CN
China
Prior art keywords
file
bucket
directory
server
directory information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010030125.4A
Other languages
Chinese (zh)
Inventor
余虹建
李锦丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shell Internet Beijing Security Technology Co Ltd
Beijing Cheetah Mobile Technology Co Ltd
Original Assignee
Shell Internet Beijing Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shell Internet Beijing Security Technology Co Ltd filed Critical Shell Internet Beijing Security Technology Co Ltd
Priority to CN202010030125.4A priority Critical patent/CN111258958A/en
Publication of CN111258958A publication Critical patent/CN111258958A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Abstract

The embodiment of the invention discloses a data acquisition method, a data providing method and a data providing device, relates to the technical field of computers, and can effectively improve the acquisition speed of training data in model training. The data acquisition method comprises the following steps: sending a directory information mounting request of a file to an object storage server, wherein the mounting request carries a bucket name of a file bucket where the file is located in the object storage server; according to a response message returned by the object storage server, mounting a directory bucket corresponding to the file bucket to a local preset mounting point, wherein the directory bucket stores directory information of the file; the directory information indicates a storage path of the file in the object storage server; and acquiring the file from the object storage server according to the directory information. The method can be applied to model training of machine learning.

Description

Data acquisition method, data providing method and device
Technical Field
The invention relates to the technical field of computers, in particular to a data acquisition method, a data providing method and a data providing device.
Background
In recent years, artificial intelligence technology has become more and more widely used in industry and life. Machine learning is an important branch in the field of artificial intelligence, and can obtain a relatively ideal mathematical model through a large amount of training data, so that human thinking is simulated.
However, the amount of data required for model training is huge, often on the order of tens of millions of files, so the reading speed of the training data becomes an important factor affecting the efficiency of model training.
No effective solution is currently available in the related art for the slow reading of training data during model training.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data acquisition method, a data providing method, an apparatus, an electronic device, and a storage medium, which can effectively improve the acquisition speed of training data in model training.
In a first aspect, an embodiment of the present invention provides a data acquisition method, including:
sending a directory information mounting request of a file to an object storage server, wherein the mounting request carries a bucket name of a file bucket where the file is located in the object storage server;
according to a response message returned by the object storage server, mounting a directory bucket corresponding to the file bucket to a local preset mounting point, wherein the directory bucket stores directory information of the file; the directory information indicates a storage path of the file in the object storage server;
and acquiring the file from the object storage server according to the directory information.
Optionally, before sending the request for mounting the directory information of the file to the object storage server, the method further includes:
sending an authentication request to the object storage server;
the sending of the directory information mount request of the file to the object storage server includes:
and after receiving the message of successful authentication sent by the object storage server, sending a directory information mounting request of the file to the object storage server.
Optionally, before acquiring the file from the object storage server according to the directory information, the method further includes:
acquiring file screening rule information;
the obtaining the file from the object storage server according to the directory information includes:
reading the directory information in the directory bucket, and constructing a hierarchical directory structure according to the directory information;
selecting a target file to be read from the hierarchical directory structure according to the file screening rule information;
and acquiring the target file from the object storage server according to the storage path of the target file indicated in the hierarchical directory structure.
Optionally, the file screening rule information includes at least one of a number, a distribution, and a size of the target file.
Optionally, the directory information is a directory file with a preset structure, or a file name list.
Optionally, the mount request further carries an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, where the load balancing server is configured to distribute the network access request to the object storage server in a balanced manner according to a load balancing algorithm.
In a second aspect, an embodiment of the present invention further provides a data providing method, including:
receiving a directory information mounting request sent by a model training server, wherein the mounting request carries the bucket name of a file bucket where a file is located;
returning a response message to the model training server so that the model training server mounts the directory bucket corresponding to the file bucket to a preset mounting point of the model training server, wherein the directory bucket stores directory information of the files; the directory information indicates a storage path of the file;
and providing the file to the model training server according to the directory information.
Optionally, before receiving the directory information mount request sent by the model training server, the method further includes:
receiving an authentication request sent by the model training server;
the receiving of the directory information mount request sent by the model training server includes:
and receiving a directory information mounting request sent by the model training server after the model training server is successfully authenticated.
Optionally, before the directory bucket corresponding to the file bucket is mounted to the preset mounting point of the model training server according to the bucket name of the file bucket, the method further includes:
generating directory information of a file system corresponding to the storage object according to the object name of the storage object in the file bucket;
and storing the directory information in a pre-established directory bucket corresponding to the file bucket.
Optionally, the generating directory information of the file system corresponding to the storage object according to the object name of the storage object in the file bucket includes:
scanning the object name of an object stored in the file bucket, and splitting the object name to form the directory information;
and/or
receiving registration information of the storage object in the file bucket, and forming the directory information according to the registration information.
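The scanning-and-splitting option above can be sketched as follows. This is only an illustrative sketch: the function name generate_directory_info, the use of "/" as the separator in object names, and the grouping of file names by parent directory are assumptions, since the specification does not fix a layout for the directory information.

```python
import posixpath
from collections import defaultdict


def generate_directory_info(object_names):
    """Split each object name into (directory, file name) and group by directory."""
    info = defaultdict(list)
    for name in object_names:
        directory, filename = posixpath.split(name)
        if filename:  # skip names ending in "/" (pure directory placeholders)
            info[directory].append(filename)
    # Sort for a stable, reproducible directory listing.
    return {d: sorted(files) for d, files in sorted(info.items())}


# Hypothetical object names as they might appear in a file bucket.
directory_info = generate_directory_info(
    ["train/cat/1.jpg", "train/dog/2.jpg", "train/cat/3.jpg", "readme.txt"]
)
```

Objects stored at the top level of the bucket are grouped under the empty directory name in this sketch.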
In a third aspect, an embodiment of the present invention further provides a data acquisition apparatus, including:
a request sending unit, configured to send a directory information mounting request of a file to an object storage server, wherein the mounting request carries a bucket name of a file bucket where the file is located in the object storage server;
the directory mounting unit is used for mounting a directory bucket corresponding to the file bucket to a local preset mounting point according to a response message returned by the object storage server, wherein the directory bucket stores directory information of the file; the directory information indicates a storage path of the file in the object storage server;
and the file acquisition unit is used for acquiring the file from the object storage server according to the directory information.
Optionally, the request sending unit is further configured to:
before sending a directory information mounting request of a file to an object storage server, sending an authentication request to the object storage server;
and after receiving the message of successful authentication sent by the object storage server, sending a directory information mounting request of the file to the object storage server.
Optionally, the data acquiring apparatus further includes:
a rule obtaining unit, configured to obtain file screening rule information before obtaining the file from an object storage server according to the directory information;
the file acquiring unit includes:
the construction module is used for reading the directory information in the directory bucket and constructing a hierarchical directory structure according to the directory information;
the selection module is used for selecting a target file to be read from the hierarchical directory structure according to the file screening rule information;
and the acquisition module is used for acquiring the target file from the object storage server according to the storage path of the target file indicated in the hierarchical directory structure.
Optionally, the file screening rule information includes at least one of a number, a distribution, and a size of the target file.
Optionally, the directory information is a directory file with a preset structure, or a file name list.
Optionally, the mount request further carries an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, where the load balancing server is configured to distribute the network access request to the object storage server in a balanced manner according to a load balancing algorithm.
In a fourth aspect, an embodiment of the present invention further provides a data providing apparatus, including:
a request receiving unit, configured to receive a directory information mounting request sent by a model training server, wherein the mounting request carries the bucket name of a file bucket where a file is located;
a response returning unit, configured to return a response message to the model training server, so that the model training server mounts a directory bucket corresponding to the file bucket to a preset mounting point of the model training server, where directory information of the file is stored in the directory bucket; the directory information indicates a storage path of the file;
and the file providing unit is used for providing the file for the model training server according to the directory information.
Optionally, the request receiving unit is further configured to:
receiving an authentication request sent by the model training server;
and receiving a directory information mounting request sent by the model training server after the model training server is successfully authenticated.
Optionally, the data providing apparatus further includes:
the directory generation unit is used for generating directory information of a file system corresponding to a storage object according to the object name of the storage object in the file bucket before the directory bucket corresponding to the file bucket is mounted to a preset mounting point of the model training server;
and the directory storage unit is used for storing the directory information in a directory bucket which is established in advance and corresponds to the file bucket.
Optionally, the directory generation unit includes:
the first generation module is used for scanning the object name of an object stored in the file bucket and splitting the object name to form the directory information;
and/or
the second generation module is used for receiving the registration information of the storage object in the file bucket and forming the directory information according to the registration information.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space enclosed by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to each circuit or device of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory, for executing any one of the data acquisition methods or the data providing methods provided by the embodiments of the present invention.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs are executable by one or more processors to implement any one of the data acquisition methods or the data providing methods provided by the embodiments of the present invention.
According to the data acquisition method, the data providing device, the electronic equipment and the storage medium provided by the embodiment of the invention, when the model training server needs to read the training data, the directory information of each file in the training data can be acquired from the hierarchical directory server, and the storage path of the file in the object storage server is acquired according to the directory information, so that the file can be quickly acquired through the storage path, the situation that the file is directly searched in mass data of the object storage server is avoided, and the data acquisition speed can be effectively improved.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data interaction application scenario in an embodiment of the present invention;
FIG. 2 is a flow chart of a data acquisition method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data providing method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data providing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In machine learning, on the one hand, a computer with powerful computing power is required for model training, and on the other hand, sufficient data samples are also required for the computer to learn from. These two tasks may be performed by different servers. For example, model training may be performed by a model training server, while training data may be provided by an object storage server. The model training server may interact with the object storage server to obtain data from it and perform model training using the data. An interaction diagram of the model training server and the object storage server is shown in FIG. 1.
In a first aspect, embodiments of the present invention provide a data acquisition method, which can effectively improve the acquisition speed of training data in model training.
As shown in FIG. 2, a data acquisition method provided in an embodiment of the present invention, applied to a model training server, may include:
S11, sending a directory information mounting request of a file to an object storage server, wherein the mounting request carries the bucket name of a file bucket where the file is located in the object storage server;
In an embodiment of the present invention, the model training server may read the training data by reading files. Each file may be stored in the object storage server in the form of an object. All training data required for a model training task may form a data set, and the data in the data set may be read by the model training server in the form of files. Each data set may be stored in a bucket of the object storage server. A bucket that holds a data set may also be referred to as a file bucket, and each file bucket has its own bucket name. Because model training requires a large amount of data, a data set may often include tens of millions of files.
When the model training server needs to read data for model training, a directory information mount request may be sent to the object storage server first, and the bucket name of the file bucket where the data set required for this training is located is carried in the mount request, so that the object storage server returns the directory information corresponding to the bucket name.
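As a minimal sketch of such a request, the following Python fragment builds a mount request that carries the bucket name. The field names (bucket, action), the class name MountRequest and the JSON encoding are assumptions made for illustration; the embodiment does not define a message format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MountRequest:
    """Hypothetical directory information mount request."""
    bucket: str                      # bucket name of the file bucket holding the data set
    action: str = "mount_directory"  # assumed label for the request type


def encode_mount_request(bucket: str) -> str:
    """Serialize the request so it could be sent to the object storage server."""
    if not bucket:
        raise ValueError("bucket name must not be empty")
    return json.dumps(asdict(MountRequest(bucket=bucket)), sort_keys=True)


request_body = encode_mount_request("abc")
```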
S12, according to the response message returned by the object storage server, mounting the directory bucket corresponding to the file bucket to a local preset mounting point, wherein the directory bucket stores directory information of the file; the directory information indicates a storage path of the file in the object storage server;
After sending the directory information mount request to the object storage server, in this step, the model training server may receive the response message returned by the object storage server, and mount the directory bucket corresponding to the file bucket to a local preset mount point according to the response message.
Optionally, in a Linux system, the mount operation may be implemented by an mnt command. For example, the command /root/batfs --endpoint http://192.168.10.244:8080 abc /mnt/antfs/ can indicate that the file bucket with the bucket name abc is mounted under the antfs directory. In embodiments of the present invention, each file bucket has a corresponding directory bucket, e.g., file bucket aaa may have a corresponding directory bucket aaa-c. When the file bucket needs to be mounted, the directory bucket corresponding to the file bucket is mounted first. The directory bucket stores the directory information of the files in the file bucket, and the directory information records only the file names and storage paths. The data volume of the directory bucket is therefore very small, while the directory information still gives clear guidance for acquiring the files, which greatly shortens the time required to acquire them.
Alternatively, the directory information may be a hierarchical directory structure of a multi-way tree, which indicates storage paths of the files.
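A multi-way tree of this kind could be represented, for illustration, as nested dictionaries in which each directory maps to its children and each file maps to its storage path. The function name build_tree and the "/" separator are assumptions, and the sketch further assumes that no name is used both as a file and as a directory.

```python
def build_tree(paths):
    """Build a nested dict: directories map to dicts, files map to their storage path."""
    root = {}
    for path in paths:
        parts = [p for p in path.split("/") if p]
        if not parts:
            continue
        node = root
        for name in parts[:-1]:
            # Descend into the directory, creating it on first use.
            node = node.setdefault(name, {})
        node[parts[-1]] = path  # leaf: the file and where it is stored
    return root


tree = build_tree(["c1/a.jpg", "c1/b.jpg", "c2/sub/c.jpg"])
```

Looking up tree["c2"]["sub"]["c.jpg"] then yields the storage path of that file without scanning the other entries.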
And S13, acquiring the file from the object storage server according to the directory information.
In this step, since the directory bucket of the file is mounted locally, when the file needs to be read, the specific location of the file in the object storage server can be quickly obtained from the directory information in the directory bucket, and the corresponding file can then be quickly acquired from that location.
According to the data acquisition method provided by the embodiment of the invention, the model training server can send a directory information mounting request of a file to the object storage server, the mounting request carries the bucket name of the file bucket where the file is located in the object storage server, the directory bucket corresponding to the file bucket is mounted to a local preset mounting point according to a response message returned by the object storage server, and the file is acquired from the object storage server according to the directory information. Therefore, when the model training server needs to read the training data, the directory information of each file in the training data can be obtained from the hierarchical directory server, and the storage path of the file in the object storage server is obtained according to the directory information, so that the file can be quickly obtained through the storage path, the situation that the file is directly searched in the mass data of the object storage server is avoided, and the data obtaining speed can be effectively improved.
Further, to enhance data security, in an embodiment of the present invention, before sending a request for mounting directory information of a file to an object storage server, a data obtaining method provided in an embodiment of the present invention may further include: sending an authentication request to the object storage server; based on this, the sending of the directory information mount request of the file to the object storage server includes: and after receiving the message of successful authentication sent by the object storage server, sending a directory information mounting request of the file to the object storage server.
Optionally, the authentication request may carry information such as an identity, an IP address, and a security key of the model training server. After receiving the authentication request, the object storage server can verify the information, and after the verification is passed, the object storage server sends an authentication success message to the model training server. And after receiving the message of successful authentication, the model training server sends a directory information mounting request of the file to the object storage server.
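The order of operations (authenticate first, mount only after success) can be sketched as follows. The registry TRUSTED_SERVERS, the function names, the sample identity, address and key, and the response fields are all hypothetical; the "-c" suffix for the directory bucket simply follows the aaa / aaa-c example given earlier and is not stated to be a fixed rule.

```python
import hmac

# Hypothetical registry kept by the object storage server:
# identity -> (expected IP address, security key)
TRUSTED_SERVERS = {"trainer-01": ("192.168.10.20", "s3cr3t")}


def authenticate(identity, ip_address, security_key):
    """Verify the identity, IP address and security key carried in the request."""
    record = TRUSTED_SERVERS.get(identity)
    if record is None:
        return False
    expected_ip, expected_key = record
    # compare_digest avoids leaking key contents through timing differences.
    keys_match = hmac.compare_digest(security_key.encode(), expected_key.encode())
    return ip_address == expected_ip and keys_match


def handle_mount(identity, ip_address, security_key, bucket):
    """Answer a mount request only for an authenticated model training server."""
    if not authenticate(identity, ip_address, security_key):
        return {"status": "denied"}
    return {"status": "ok", "directory_bucket": bucket + "-c"}
```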
It can be understood that model training is a process of training on a large amount of data with continuous iteration and optimization. In this process, in the embodiment of the present invention, the server may read data from the data set in batches according to a certain rule to perform model training; after all data in the data set have been read for one round, the model parameters may be adjusted, and the data set is then read for a second round. This cycle may repeat dozens or hundreds of times, or even more.
In order to further improve the data acquisition efficiency, in an embodiment of the present invention, the model training server may first acquire the file screening rule information when reading the files from the data set, so that a batch of corresponding files can be acquired in the data set according to the file screening rule information, instead of requesting and providing the files one by one, thereby effectively reducing the communication time.
Specifically, in an embodiment of the present invention, before the step S13 of acquiring the file from the object storage server according to the directory information, the data acquisition method provided in an embodiment of the present invention may further include: acquiring file screening rule information; based on this, the acquiring the file from the object storage server according to the directory information in step S13 may include:
reading the directory information in the directory bucket, and constructing a hierarchical directory structure according to the directory information;
selecting a target file to be read from the hierarchical directory structure according to the file screening rule information;
and acquiring the target file from the object storage server according to the storage path of the target file indicated in the hierarchical directory structure.
The file screening rule information may refer to the policy by which the model training server organizes files, that is, how the files are read in each batch so that model training can be performed using that batch. Optionally, in an embodiment of the present invention, the file screening rule information may include one or more of the number, distribution, and size of the target files that need to be read. The number of the target files may refer to the number of files read in each batch; the distribution of the target files may refer to the storage paths of the files read in each batch and the number of files read under each storage path; and the size of the target files may refer to the data size of each file read.
Illustratively, in one embodiment of the present invention, the file screening rule information may include: reading 15 files under directory c1, 10 files under directory c2, 15 files under directory c3, and 30 files under directory c4, wherein each file is smaller than 200M. Then, according to the file screening rule information, corresponding target files may be selected from the directory information (optionally, the directory information in the directory bucket may also record the file size of each file), and the target files are then obtained from the object storage server according to the storage paths provided by the directory information. After the order of the target files is shuffled, model training is carried out.
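The example rule above can be sketched in Python as follows. The function name select_targets, the layout of the directory information (directory name mapped to a list of path and size pairs), the reading of "200M" as 200 MiB, and the use of random sampling to choose files within a directory are all assumptions made for illustration.

```python
import random

MAX_SIZE = 200 * 1024 * 1024  # "200M" interpreted as 200 MiB (an assumption)


def select_targets(directory_info, rule, max_size, seed=None):
    """Pick the requested number of small-enough files per directory, then shuffle."""
    rng = random.Random(seed)
    selected = []
    for directory, count in rule.items():
        candidates = [
            path for path, size in directory_info.get(directory, []) if size < max_size
        ]
        # Take at most `count` files; fewer if the directory does not hold enough.
        selected.extend(rng.sample(candidates, min(count, len(candidates))))
    rng.shuffle(selected)  # break up the per-directory order before training
    return selected


# Hypothetical directory information: 40 small files under each directory.
directory_info = {
    d: [(f"{d}/file_{i}.bin", 1024 * (i + 1)) for i in range(40)]
    for d in ("c1", "c2", "c3", "c4")
}
rule = {"c1": 15, "c2": 10, "c3": 15, "c4": 30}
batch = select_targets(directory_info, rule, MAX_SIZE, seed=0)
```

The resulting batch holds storage paths only; the files themselves would still be fetched from the object storage server by those paths.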
Optionally, the specific form of the directory information in the directory bucket is not limited. The directory information may be a directory file with a preset structure, such as a structured file in JSON or YAML, or it may be a simple file name list. A structured directory file is easier to load, while a directory file in which the file names are laid out directly as a flat list is easier to maintain.
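Both forms could be handled by one loader, as in the sketch below. The JSON schema used here (a top-level "files" array whose entries have "path" and "size" fields) and the one-path-per-line list format are hypothetical; the embodiment does not specify either layout.

```python
import json


def load_directory_info(text):
    """Return the storage paths from a JSON directory file or a plain file name list."""
    stripped = text.strip()
    if stripped.startswith("{"):
        # Structured form: assumed schema {"files": [{"path": ..., "size": ...}, ...]}
        document = json.loads(stripped)
        return [entry["path"] for entry in document["files"]]
    # Flat form: one file name (storage path) per line, blank lines ignored.
    return [line.strip() for line in stripped.splitlines() if line.strip()]


json_form = '{"files": [{"path": "c1/a.jpg", "size": 10}, {"path": "c2/b.jpg", "size": 20}]}'
list_form = "c1/a.jpg\nc2/b.jpg\n"
```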
In order to further increase the data acquisition speed and reduce the network request delay, in an embodiment of the present invention, in step S11, the directory information mount request sent by the model training server to the object storage server may further carry an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, where the load balancing server is configured to distribute the network access request to the object storage server in a balanced manner according to a load balancing algorithm.
Therefore, each node in the object storage cluster is independent, the access load can be evenly distributed to all the nodes in the cluster, on one hand, the problem that common resources in an NAS and a cluster file system are unreasonably utilized can be avoided, on the other hand, reasonable nodes can be automatically selected for data reading, and the performance of the system is guaranteed to be maximized.
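The embodiment does not name a particular load balancing algorithm. As one possible choice, the sketch below uses simple round-robin distribution; the class name RoundRobinBalancer and the node names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hand out storage nodes in a fixed rotation (round-robin)."""

    def __init__(self, nodes):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(nodes)

    def pick(self):
        """Return the node that should serve the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["node-a", "node-b", "node-c"])
assignments = [balancer.pick() for _ in range(6)]
```

Round-robin ignores the actual load on each node; a weighted or least-connections policy could be substituted behind the same pick method.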
Correspondingly, in a second aspect, an embodiment of the present invention further provides a data providing method, which can effectively improve the acquisition speed of training data in model training.
As shown in FIG. 3, a data providing method provided by an embodiment of the present invention, applied to an object storage server, may include:
S21, receiving a directory information mounting request sent by a model training server, wherein the mounting request carries the bucket name of a file bucket where a file is located;
In an embodiment of the present invention, the model training server may read the training data by reading files. Each file may be stored in the object storage server in the form of an object. All training data required for a model training task may form a data set, and the data in the data set may be read by the model training server in the form of files. Each data set may be stored in a bucket of the object storage server. A bucket that holds a data set may also be referred to as a file bucket, and each file bucket has its own bucket name. Because model training requires a large amount of data, a data set may often include tens of millions of files.
When the model training server needs to read data for model training, a directory information mount request may be sent to the object storage server first, and the bucket name of the file bucket where the data set required for this training is located is carried in the mount request, so that the object storage server returns the directory information corresponding to the bucket name.
S22, returning a response message to the model training server to enable the model training server to mount the directory bucket corresponding to the file bucket to a preset mounting point of the model training server, wherein the directory bucket stores directory information of the files; the directory information indicates a storage path of the file;
In this step, the directory bucket corresponding to the file bucket may be mounted to a preset mounting point of the model training server by returning a response message to the model training server.
And S23, providing the file to the model training server according to the directory information.
In the data providing method provided by the embodiment of the present invention, the object storage server receives a directory information mounting request sent by the model training server, where the mounting request carries the bucket name of the file bucket where the file is located; returns a response message to the model training server, so that the model training server mounts the directory bucket corresponding to the file bucket to a preset mounting point of the model training server; and provides the file to the model training server according to the directory information. Therefore, when the model training server needs to read training data, the directory information of each file in the training data can be provided to the model training server, and the model training server can obtain the storage path of each file in the object storage server from the directory information. The file can then be fetched quickly through its storage path, which avoids searching for the file directly in the mass data of the object storage server and can effectively improve the data acquisition speed.
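Steps S21 to S23 can be sketched with a small in-memory model. This is only an illustrative sketch under stated assumptions: the class name `ObjectStorageServer`, the `-dir` suffix used to name a directory bucket, and the dictionary-based message format are hypothetical and are not specified by the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStorageServer:
    """Toy in-memory stand-in for the object storage server (hypothetical)."""

    # file bucket name -> {object name -> object bytes}
    file_buckets: dict = field(default_factory=dict)
    # directory bucket name -> list of storage paths (the directory information)
    directory_buckets: dict = field(default_factory=dict)

    def handle_mount_request(self, request: dict) -> dict:
        """S21 + S22: receive the mount request and return a response message."""
        bucket_name = request["bucket_name"]
        dir_bucket = bucket_name + "-dir"  # assumed naming convention
        if dir_bucket not in self.directory_buckets:
            return {"ok": False, "error": "no directory bucket for " + bucket_name}
        return {
            "ok": True,
            "directory_bucket": dir_bucket,
            "directory_info": list(self.directory_buckets[dir_bucket]),
        }

    def provide_file(self, bucket_name: str, storage_path: str) -> bytes:
        """S23: provide the file addressed by a path from the directory information."""
        return self.file_buckets[bucket_name][storage_path]


server = ObjectStorageServer(
    file_buckets={"imagenet": {"train/a.JPEG": b"\xff\xd8"}},
    directory_buckets={"imagenet-dir": ["train/a.JPEG"]},
)
response = server.handle_mount_request({"bucket_name": "imagenet"})
first_path = response["directory_info"][0]
data = server.provide_file("imagenet", first_path)
```

In this sketch the file is looked up directly by the storage path taken from the directory information, which mirrors the idea of avoiding a search over all stored objects.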
Further, to enhance data security, in an embodiment of the present invention, before receiving the directory information mounting request sent by the model training server, the data providing method provided by the embodiment of the present invention may further include: receiving an authentication request sent by the model training server. Based on this, receiving the directory information mounting request sent by the model training server may include: receiving the directory information mounting request sent by the model training server after the model training server is successfully authenticated.
Optionally, the authentication request may carry information such as an identity, an IP address, and a security key of the model training server. After receiving the authentication request, the object storage server can verify the information, and after the verification is passed, the object storage server sends an authentication success message to the model training server.
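The verification described above might look like the following minimal sketch. The registry `TRUSTED_SERVERS`, the field names `identity`, `ip`, and `security_key`, and the sample values are all assumptions made for illustration; the embodiment does not prescribe a message format or a verification procedure.

```python
import hmac

# Hypothetical registry of trusted model training servers:
# identity -> (allowed IP address, security key)
TRUSTED_SERVERS = {
    "trainer-01": ("10.0.0.5", "s3cr3t-key"),
}


def authenticate(request: dict) -> bool:
    """Return True only if identity, IP address and security key all match."""
    record = TRUSTED_SERVERS.get(request.get("identity"))
    if record is None:
        return False
    allowed_ip, expected_key = record
    if request.get("ip") != allowed_ip:
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(
        str(request.get("security_key", "")).encode(), expected_key.encode()
    )


good = authenticate(
    {"identity": "trainer-01", "ip": "10.0.0.5", "security_key": "s3cr3t-key"}
)
bad_key = authenticate(
    {"identity": "trainer-01", "ip": "10.0.0.5", "security_key": "wrong"}
)
```

Only after `authenticate` returns True would the server go on to accept a directory information mounting request from that model training server.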
Optionally, in order to quickly give feedback to the model training server when a mount request of the directory information is received, in an embodiment of the present invention, corresponding directory information may be generated for a file in the object storage server in advance.
Specifically, in an embodiment of the present invention, before the mounting the directory bucket corresponding to the file bucket to the preset mounting point of the model training server according to the bucket name of the file bucket in step S22, the data providing method provided in the embodiment of the present invention may further include:
generating directory information of a file system corresponding to the storage object according to the object name of the storage object in the file bucket;
and storing the directory information in a pre-established directory bucket corresponding to the file bucket.
Optionally, in an embodiment of the present invention, generating directory information of a file system corresponding to a storage object according to an object name of the storage object in the file bucket may specifically include: and scanning the object name of the object stored in the bucket, and splitting the object name to form the directory information. For example, in one embodiment of the invention, the object name of one storage object is:
ImageNet-large/train/n01782516_10048.JPEG
The object name can be split into ImageNet-large, train, and n01782516_10048.JPEG, and the directory information so formed may be ImageNet-large\train\n01782516_10048.JPEG, i.e., the n01782516_10048.JPEG file in the train subfolder of the ImageNet-large folder. After all the files in the bucket are scanned, directory information containing the storage paths of all the files can be formed.
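The splitting step can be sketched as follows. The function names and the choice of a folder-to-file-names mapping as the resulting directory information are assumptions for illustration; the embodiment only states that object names are scanned and split.

```python
def split_object_name(object_name: str) -> list:
    """Split an object name on "/" into its path components."""
    return [part for part in object_name.split("/") if part]


def build_directory_info(object_names) -> dict:
    """Map each parent folder path to the sorted file names found in it."""
    directory_info = {}
    for name in object_names:
        parts = split_object_name(name)
        if not parts:
            continue  # skip empty object names
        *folders, file_name = parts
        directory_info.setdefault("/".join(folders), []).append(file_name)
    return {folder: sorted(files) for folder, files in directory_info.items()}


parts = split_object_name("ImageNet-large/train/n01782516_10048.JPEG")
info = build_directory_info(["a/b/x.txt", "a/b/y.txt", "a/z.txt"])
```

Here `parts` holds the three components from the example above, and `info` groups the scanned objects by the folder that contains them.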
Optionally, in another embodiment of the present invention, generating directory information of a file system corresponding to a storage object according to an object name of the storage object in the file bucket may also include: and receiving the registration information of the storage object in the bucket, and forming the directory information according to the registration information. That is, when an object is newly stored in the object storage server, a registration request may be actively initiated to the object storage server, and the object storage server may generate directory information according to the received registration information, such as an object name.
Optionally, the two ways of generating directory information may be used separately or in combination, for example, all objects already stored in the current bucket may be scanned to generate directory information, and when a new object is added in a later period, the directory information of the object may be generated through the registration information of the object and added to the original directory information.
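Combining the two generation modes could be sketched like this. The class `DirectoryIndex` and its method names are hypothetical; the sketch only assumes that the directory information can be represented as a set of object names.

```python
class DirectoryIndex:
    """Hypothetical holder for the directory information of one file bucket."""

    def __init__(self):
        self._paths = set()

    def scan(self, object_names):
        """Bulk mode: scan the names of all objects already in the bucket."""
        self._paths.update(object_names)

    def register(self, object_name):
        """Incremental mode: add one newly stored object from its registration."""
        self._paths.add(object_name)

    def listing(self):
        """Return the directory information as a sorted file name list."""
        return sorted(self._paths)


index = DirectoryIndex()
index.scan(["train/a.JPEG", "train/b.JPEG"])
index.register("train/c.JPEG")
index.register("train/a.JPEG")  # registering an already indexed object is a no-op
```

Using a set means that an object picked up by the initial scan and later registered again does not appear twice in the directory information.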
In a third aspect, an embodiment of the present invention further provides a data acquisition apparatus, which can effectively improve the acquisition speed of training data in model training.
As shown in fig. 4, an embodiment of the present invention provides a data acquisition apparatus, which may include:
a request sending unit 31, configured to send a directory information mount request of a file to an object storage server, where the mount request carries a bucket name of a file bucket where the file is located in the object storage server;
a directory mount unit 32, configured to mount, according to a response message returned by the object storage server, a directory bucket corresponding to the file bucket to a local preset mount point, where the directory bucket stores directory information of the file; the directory information indicates a storage path of the file in the object storage server;
a file obtaining unit 33, configured to obtain the file from the object storage server according to the directory information.
The data acquisition device provided by the embodiment of the present invention can send a directory information mounting request of a file to an object storage server, where the mounting request carries the bucket name of the file bucket where the file is located in the object storage server; mount the directory bucket corresponding to the file bucket to a local preset mounting point according to a response message returned by the object storage server; and acquire the file from the object storage server according to the directory information. Therefore, when the model training server needs to read training data, the directory information of each file in the training data can be obtained from the object storage server, and the storage path of each file in the object storage server can be obtained from the directory information. The file can then be fetched quickly through its storage path, which avoids searching for the file directly in the mass data of the object storage server and can effectively improve the data acquisition speed.
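The behaviour of the directory mount unit 32 can be approximated by the sketch below. It is a deliberate simplification: "mounting" is modeled by writing the directory information as a JSON file under a local directory, and an actual file system mount mechanism is not modeled. The function names, the JSON layout, and the response fields `directory_bucket` and `directory_info` are assumptions.

```python
import json
import os
import tempfile


def mount_directory_info(response: dict, mount_point: str) -> str:
    """Write the directory information from the response under the mount point."""
    os.makedirs(mount_point, exist_ok=True)
    listing_path = os.path.join(mount_point, response["directory_bucket"] + ".json")
    with open(listing_path, "w", encoding="utf-8") as handle:
        json.dump(response["directory_info"], handle)
    return listing_path


def read_directory_info(listing_path: str) -> list:
    """Read the locally mounted directory information back as a list of paths."""
    with open(listing_path, encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    path = mount_directory_info(
        {"directory_bucket": "imagenet-dir", "directory_info": ["train/a.JPEG"]},
        os.path.join(tmp, "mnt"),
    )
    mounted = read_directory_info(path)
```

Once the listing is available locally, the file obtaining unit 33 can look up storage paths without querying the object storage server for each file.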
Optionally, the request sending unit 31 may be further configured to:
before sending a directory information mounting request of a file to an object storage server, sending an authentication request to the object storage server;
and after receiving the message of successful authentication sent by the object storage server, sending a directory information mounting request of the file to the object storage server.
Optionally, the data acquiring apparatus may further include: a rule obtaining unit, configured to obtain file screening rule information before obtaining the file from an object storage server according to the directory information;
based on this, the file acquiring unit 33 may include:
the construction module is used for reading the directory information in the directory bucket and constructing a hierarchical directory structure according to the directory information;
the selection module is used for selecting a target file to be read from the hierarchical directory structure according to the file screening rule information;
and the acquisition module is used for acquiring the target file from the object storage server according to the storage path of the target file indicated in the hierarchical directory structure.
Optionally, the file screening rule information includes at least one of a number, a distribution, and a size of the target file.
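The construction and selection modules can be sketched as follows. The nested-dictionary tree and the rule "take up to N files from each top-level folder" are assumptions chosen to illustrate a screening rule based on number and distribution; the embodiment does not fix a concrete rule format.

```python
import random


def build_tree(paths):
    """Build a nested dict: folders map to dicts, files map to their full path."""
    root = {}
    for path in paths:
        *folders, file_name = path.split("/")
        node = root
        for folder in folders:
            node = node.setdefault(folder, {})
        node[file_name] = path
    return root


def iter_files(node):
    """Yield the storage path of every file below a tree node."""
    for child in node.values():
        if isinstance(child, dict):
            yield from iter_files(child)
        else:
            yield child


def select_targets(tree, per_folder, seed=0):
    """Pick up to `per_folder` files from each top-level folder."""
    rng = random.Random(seed)
    selected = []
    for name in sorted(tree):
        child = tree[name]
        if not isinstance(child, dict):
            continue  # skip files that sit directly at the root
        files = sorted(iter_files(child))
        selected.extend(rng.sample(files, min(per_folder, len(files))))
    return selected


tree = build_tree(["train/a.jpg", "train/b.jpg", "val/c.jpg"])
targets = select_targets(tree, per_folder=1)
```

Each selected entry is a storage path, so the acquisition module can request exactly those objects from the object storage server.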
Optionally, the directory information is a directory file with a preset structure, or a file name list.
Optionally, the mount request further carries an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, where the load balancing server is configured to distribute the network access request to the object storage server in a balanced manner according to a load balancing algorithm.
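As one possible illustration of balanced distribution, the sketch below uses round robin. The embodiment does not name a specific load balancing algorithm, so round robin, the class name `RoundRobinBalancer`, and the back-end names are assumptions.

```python
import itertools


class RoundRobinBalancer:
    """Hypothetical load balancer that cycles through storage back ends."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one back end is required")
        self._cycle = itertools.cycle(backends)

    def route(self, request):
        """Return (backend, request): the back end chosen for this request."""
        return next(self._cycle), request


balancer = RoundRobinBalancer(["oss-1", "oss-2"])
chosen = [balancer.route({"id": i})[0] for i in range(4)]
```

Successive requests alternate between the configured back ends, so no single object storage node receives every request.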
In a fourth aspect, an embodiment of the present invention provides a data providing apparatus, which can effectively improve the acquisition speed of training data in model training.
As shown in fig. 5, the data providing apparatus provided by the embodiment of the present invention may include:
a request receiving unit 41, configured to receive a directory information mount request sent by a model training server, where the mount request carries a bucket name of a file bucket where a file is located;
a response returning unit 42, configured to return a response message to the model training server, so that the model training server mounts a directory bucket corresponding to the file bucket to a preset mounting point of the model training server, where the directory bucket stores directory information of the file; the directory information indicates a storage path of the file;
a file providing unit 43, configured to provide the file to the model training server according to the directory information.
The data providing device provided by the embodiment of the present invention can receive a directory information mounting request sent by a model training server, where the mounting request carries the bucket name of the file bucket where a file is located; return a response message to the model training server, so that the model training server mounts the directory bucket corresponding to the file bucket to a preset mounting point of the model training server; and provide the file to the model training server according to the directory information. Therefore, when the model training server needs to read training data, the directory information of each file in the training data can be provided to the model training server, and the model training server can obtain the storage path of each file in the object storage server from the directory information. The file can then be fetched quickly through its storage path, which avoids searching for the file directly in the mass data of the object storage server and can effectively improve the data acquisition speed.
Optionally, the request receiving unit 41 may further be configured to:
receiving an authentication request sent by the model training server;
and receiving a directory information mounting request sent by the model training server after the model training server is successfully authenticated.
Optionally, the data providing apparatus may further include:
the directory generation unit is used for generating directory information of a file system corresponding to a storage object according to the object name of the storage object in the file bucket before the directory bucket corresponding to the file bucket is mounted to a preset mounting point of the model training server;
and the directory storage unit is used for storing the directory information in a directory bucket which is established in advance and corresponds to the file bucket.
Optionally, the directory generation unit may include:
the first generation module is used for scanning the object name of an object stored in the file bucket and splitting the object name to form the directory information;
and/or
the second generation module is used for receiving the registration information of the storage object in the file bucket and forming the directory information according to the registration information.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, which can effectively improve an acquisition speed of training data in model training.
As shown in fig. 6, an electronic device provided in an embodiment of the present invention may include: a housing 51, a processor 52, a memory 53, a circuit board 54, and a power supply circuit 55, wherein the circuit board 54 is arranged inside a space enclosed by the housing 51, and the processor 52 and the memory 53 are arranged on the circuit board 54; the power supply circuit 55 is configured to supply power to each circuit or device of the electronic device; the memory 53 is configured to store executable program code; and the processor 52 reads the executable program code stored in the memory 53 and runs a program corresponding to the executable program code, so as to execute the data acquisition method or the data providing method provided in any of the foregoing embodiments.
For specific execution processes of the above steps by the processor 52 and further steps executed by the processor 52 by running the executable program code, reference may be made to the description of the foregoing embodiments, and details are not described herein again.
The above electronic devices exist in a variety of forms, including but not limited to:
(1) A mobile communication device: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.
(2) An ultra-mobile personal computer device: such devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) A portable entertainment device: such devices can display and play multimedia content. Devices of this type include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) A server: a device that provides computing services, comprising a processor, a hard disk, a memory, a system bus, and the like. A server is similar in architecture to a general-purpose computer, but has higher requirements for processing capacity, stability, reliability, security, scalability, manageability, and the like because it must provide highly reliable services.
(5) Other electronic devices with data interaction functions.
Accordingly, an embodiment of the present invention further provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs can be executed by one or more processors to implement any one of the data acquisition methods or the data providing methods provided in the foregoing embodiments, so that corresponding technical effects can also be achieved, which have been described in detail above and are not described herein again.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
For convenience of description, the above devices are described separately in terms of functional division into various units/modules. Of course, when implementing the present invention, the functionality of the units/modules may be implemented in one or more pieces of software and/or hardware.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of data acquisition, comprising:
sending a directory information mounting request of a file to an object storage server, wherein the mounting request carries a bucket name of a file bucket where the file is located in the object storage server;
according to a response message returned by the object storage server, mounting a directory bucket corresponding to the file bucket to a local preset mounting point, wherein the directory bucket stores directory information of the file; the directory information indicates a storage path of the file in the object storage server;
and acquiring the file from the object storage server according to the directory information.
2. The method of claim 1, wherein before sending the request for the mount of the directory information of the file to the object storage server, the method further comprises:
sending an authentication request to the object storage server;
the sending of the directory information mount request of the file to the object storage server includes:
and after receiving the message of successful authentication sent by the object storage server, sending a directory information mounting request of the file to the object storage server.
3. The method of claim 1, wherein before retrieving the file from the object storage server according to the directory information, the method further comprises:
acquiring file screening rule information;
the obtaining the file from the object storage server according to the directory information includes:
reading the directory information in the directory bucket, and constructing a hierarchical directory structure according to the directory information;
selecting a target file to be read from the hierarchical directory structure according to the file screening rule information;
and acquiring the target file from the object storage server according to the storage path of the target file indicated in the hierarchical directory structure.
4. The method of claim 3, wherein the file screening rule information comprises at least one of a number, a distribution, and a size of the target files.
5. The method according to claim 1, wherein the directory information is a directory file having a preset structure or a file name list.
6. The method according to any one of claims 1 to 5, wherein the mount request further carries an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, wherein the load balancing server is configured to distribute network access requests to the object storage server in a balanced manner according to a load balancing algorithm.
7. A data providing method, comprising:
receiving a directory information mounting request sent by a model training server, wherein the mounting request carries the bucket name of a file bucket where a file is located;
returning a response message to the model training server so that the model training server mounts the directory bucket corresponding to the file bucket to a preset mounting point of the model training server, wherein the directory bucket stores directory information of the files; the directory information indicates a storage path of the file;
and providing the file to the model training server according to the directory information.
8. The method of claim 7, wherein before receiving the request for the mount of directory information sent by the model training server, the method further comprises:
receiving an authentication request sent by the model training server;
the receiving of the directory information mount request sent by the model training server includes:
and receiving a directory information mounting request sent by the model training server after the model training server is successfully authenticated.
9. The method according to claim 7, wherein before mounting a directory bucket corresponding to the file bucket to a preset mounting point of the model training server according to the bucket name of the file bucket, the method further comprises:
generating directory information of a file system corresponding to the storage object according to the object name of the storage object in the file bucket;
and storing the directory information in a pre-established directory bucket corresponding to the file bucket.
10. The method according to claim 9, wherein the generating directory information of the file system corresponding to the storage object according to the object name of the storage object in the file bucket comprises:
scanning the object name of an object stored in the file bucket, and splitting the object name to form the directory information;
and/or
receiving registration information of the storage object in the file bucket, and forming the directory information according to the registration information.
CN202010030125.4A 2020-01-10 2020-01-10 Data acquisition method, data providing method and device Pending CN111258958A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030125.4A CN111258958A (en) 2020-01-10 2020-01-10 Data acquisition method, data providing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010030125.4A CN111258958A (en) 2020-01-10 2020-01-10 Data acquisition method, data providing method and device

Publications (1)

Publication Number Publication Date
CN111258958A true CN111258958A (en) 2020-06-09

Family

ID=70948671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030125.4A Pending CN111258958A (en) 2020-01-10 2020-01-10 Data acquisition method, data providing method and device

Country Status (1)

Country Link
CN (1) CN111258958A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650721A (en) * 2020-12-29 2021-04-13 杭州趣链科技有限公司 File storage method, device, system and equipment
CN112733892A (en) * 2020-12-28 2021-04-30 北京聚云科技有限公司 Data interaction method and device for model training
CN112783443A (en) * 2021-01-18 2021-05-11 北京聚云科技有限公司 Data reading method and device and electronic equipment
CN113918519A (en) * 2021-09-06 2022-01-11 中国长城科技集团股份有限公司 Folder loading method and device and terminal equipment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475682A (en) * 2012-06-07 2013-12-25 华为技术有限公司 File transfer method and file transfer equipment
CN104639658A (en) * 2015-03-12 2015-05-20 浪潮集团有限公司 Realization method for accessing object storage by file system mounting
CN106156289A (en) * 2016-06-28 2016-11-23 北京百迈客云科技有限公司 The method of the data in a kind of read-write object storage system and device
CN106878457A (en) * 2017-03-24 2017-06-20 网宿科技股份有限公司 The attached storage method of distributed network and system
CN107045530A (en) * 2017-01-20 2017-08-15 华中科技大学 A kind of method that object storage system is embodied as to local file system
CN107451486A (en) * 2017-06-30 2017-12-08 华为技术有限公司 The authority setting method and device of a kind of file system
CN108089818A (en) * 2017-12-12 2018-05-29 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN109002730A (en) * 2018-07-26 2018-12-14 郑州云海信息技术有限公司 A kind of file system directories right management method, device, equipment and storage medium
CN109151028A (en) * 2018-08-23 2019-01-04 郑州云海信息技术有限公司 A kind of distributed memory system disaster recovery method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201022

Address after: Room 91, 5 / F, building 5, yard 30, Shixing street, Shijingshan District, Beijing 100041

Applicant after: Beijing juyuncube Technology Co., Ltd

Address before: 100041 Beijing, Shijingshan District Xing Xing street, building 30, No. 3, building 2, A-0071

Applicant before: Beijing Cheetah Mobile Technology Co.,Ltd.