CN111258959A - Data acquisition method, data providing method and device - Google Patents


Info

Publication number
CN111258959A
Authority
CN
China
Prior art keywords
file
server
directory information
directory
bucket
Legal status
Pending
Application number
CN202010030631.3A
Other languages
Chinese (zh)
Inventor
余虹建
李锦丰
Current Assignee
Shell Internet Beijing Security Technology Co Ltd
Beijing Cheetah Mobile Technology Co Ltd
Original Assignee
Shell Internet Beijing Security Technology Co Ltd
Application filed by Shell Internet Beijing Security Technology Co Ltd
Priority to CN202010030631.3A
Publication of CN111258959A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the invention disclose a data acquisition method, a data providing method, and corresponding devices in the technical field of computers, which can effectively increase the speed at which training data is acquired during model training. The data acquisition method comprises the following steps: sending a directory information mounting request of a file to a hierarchical directory server, wherein the mounting request carries the bucket name of the bucket where the file is located in an object storage server; loading the directory information that is returned by the hierarchical directory server and contains the bucket name into memory, wherein the directory information indicates the storage path of the file in the object storage server; and acquiring the file from the object storage server according to the directory information. The invention can be applied to machine learning.

Description

Data acquisition method, data providing method and device
Technical Field
The invention relates to the technical field of computers, in particular to a data acquisition method, a data providing method and a data providing device.
Background
In recent years, artificial intelligence technology has been used ever more widely in industry and daily life. Machine learning is an important branch of artificial intelligence: by training on a large amount of data, an ideal mathematical model can be obtained that simulates human thinking.
However, because model training requires a huge amount of data, often on the order of tens of millions of files, the speed at which training data is read becomes an important factor in the efficiency of model training.
The related art offers no effective solution to the problem of slow training-data reads during model training.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data acquisition method, a data providing method, and a data providing device, which can effectively improve the acquisition speed of training data in model training.
In a first aspect, an embodiment of the present invention provides a data acquisition method, including:
sending a directory information mounting request of a file to a hierarchical directory server, wherein the mounting request carries a bucket name of a bucket where the file is located in an object storage server;
loading directory information which is returned by the hierarchical directory server and contains the bucket name into a memory, wherein the directory information indicates a storage path of the file in the object storage server;
acquiring the file from the object storage server according to the directory information.
Optionally, before sending the request for mounting the directory information of the file to the hierarchical directory server, the method further includes:
sending an authentication request to the object storage server;
the sending of the directory information mounting request of the file to the hierarchical directory server includes:
sending the directory information mounting request of the file to the hierarchical directory server after an authentication success message is received from the object storage server.
Optionally, before acquiring the file from the object storage server according to the directory information, the method further includes:
acquiring file screening rule information;
the obtaining the file from the object storage server according to the directory information includes:
selecting a target file to be read from the directory information according to the file screening rule information;
acquiring the target file from the object storage server according to the directory information of the target file.
Optionally, the file screening rule information includes at least one of the number, distribution, and size of the target files that need to be read.
Optionally, the mount request further carries an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, where the load balancing server is configured to distribute the network access request to the object storage server in a balanced manner according to a load balancing algorithm.
In a second aspect, an embodiment of the present invention further provides a data providing method, including:
receiving a directory information mounting request of a file sent by a model training server, wherein the mounting request carries a bucket name of a bucket where the file is located in an object storage server;
and sending directory information containing the bucket name to the model training server, wherein the directory information indicates the storage path of the file in the object storage server.
Optionally, before the directory information containing the bucket name is sent to the model training server, the method further includes: generating, according to the object name of each storage object in the bucket, the directory information of the file system corresponding to that storage object.
Optionally, the generating of the directory information of the file system corresponding to the storage object according to the object name of the storage object in the bucket includes:
scanning the object names of the objects stored in the bucket and splitting each object name to form the directory information;
and/or
receiving registration information of a storage object in the bucket and forming the directory information according to the registration information.
In a third aspect, an embodiment of the present invention further provides a data acquisition apparatus, including:
a request sending unit, configured to send a directory information mounting request of a file to a hierarchical directory server, where the mounting request carries the bucket name of the bucket where the file is located in an object storage server;
a directory loading unit, configured to load into memory the directory information that is returned by the hierarchical directory server and contains the bucket name, where the directory information indicates the storage path of the file in the object storage server;
a file acquisition unit, configured to acquire the file from the object storage server according to the directory information.
Optionally, the request sending unit is further configured to:
send an authentication request to the object storage server before sending the directory information mounting request of the file to the hierarchical directory server;
and send the directory information mounting request of the file to the hierarchical directory server after receiving an authentication success message from the object storage server.
Optionally, the data acquiring apparatus further includes:
a rule obtaining unit, configured to obtain file screening rule information before obtaining the file from an object storage server according to the directory information;
the file acquiring unit includes:
a selection module, configured to select a target file to be read from the directory information according to the file screening rule information;
an acquisition module, configured to acquire the target file from the object storage server according to the directory information of the target file.
Optionally, the file screening rule information includes at least one of the number, distribution, and size of the target files that need to be read.
Optionally, the mount request further carries an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, where the load balancing server is configured to distribute the network access request to the object storage server in a balanced manner according to a load balancing algorithm.
In a fourth aspect, an embodiment of the present invention further provides a data providing apparatus, including:
a request receiving unit, configured to receive a directory information mounting request of a file sent by a model training server, where the mounting request carries the bucket name of the bucket where the file is located in an object storage server;
a directory sending unit, configured to send directory information containing the bucket name to the model training server, where the directory information indicates the storage path of the file in the object storage server.
Optionally, the data providing apparatus further includes: a directory generation unit, configured to generate, according to the object name of each storage object in the bucket, the directory information of the file system corresponding to that storage object before the directory information containing the bucket name is sent to the model training server.
Optionally, the directory generation unit includes:
a first generation module, configured to scan the object names of the objects stored in the bucket and split each object name to form the directory information;
and/or
a second generation module, configured to receive registration information of a storage object in the bucket and form the directory information according to the registration information.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including a housing, a processor, a memory, a circuit board, and a power supply circuit. The circuit board is arranged in the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit supplies power to each circuit or component of the electronic device; the memory stores executable program code; and the processor reads the executable program code stored in the memory and runs the corresponding program to perform any of the data acquisition methods or data providing methods provided by the embodiments of the present invention.
In a sixth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing one or more programs, where the one or more programs are executable by one or more processors to implement any of the data acquisition methods or data providing methods provided by the embodiments of the present invention.
With the data acquisition method, data providing method, devices, electronic device, and storage medium provided by the embodiments of the invention, when the model training server needs to read training data, it can obtain the directory information of each file in the training data from the hierarchical directory server and determine from that information the storage path of each file in the object storage server. The file can then be fetched directly by its storage path instead of being searched for among the massive data held by the object storage server, which effectively increases the data acquisition speed.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data interaction application scenario in an embodiment of the present invention;
FIG. 2 is a flowchart of a data acquisition method according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data providing method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a data acquisition apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a data providing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In machine learning, model training requires, on the one hand, a computer with powerful computing capability and, on the other hand, sufficient data samples for the computer to learn from. These two tasks may be performed by different servers. For example, model training may be performed by a model training server, while training data may be provided by an object storage server. The model training server may interact with the object storage server to obtain data from it and then perform model training with that data. An interaction diagram of the model training server and the object storage server is shown in FIG. 1.
In a first aspect, embodiments of the present invention provide a data acquisition method, which can effectively improve the acquisition speed of training data in model training.
As shown in FIG. 2, a data acquisition method provided in an embodiment of the present invention, executed by the model training server, may include:
S11, sending a directory information mounting request of the file to the hierarchical directory server, wherein the mounting request carries the bucket name of the bucket where the file is located in the object storage server;
The model training server may read the training data by reading files. Each file may be stored in the object storage server as an object. All training data required for one model training task may form a data set, and each data set may be stored in a bucket of the object storage server. Because model training requires a large amount of data, a data set may often include tens of millions of files.
The hierarchical directory server may provide the directory information of each file in the object storage server to external parties. Optionally, the directory information may be a hierarchical directory structure in the form of a multi-way tree that indicates the storage path of each file. It should be noted that the hierarchical directory server is a logical or functional division; physically, it may be independent of the object storage server or integrated into it, which is not limited in the embodiments of the present invention.
When the model training server needs to read data for model training, a directory information mounting request can be sent to the hierarchical directory server, and the bucket name of the bucket where the data set required by the training is located is carried in the mounting request, so that the hierarchical directory server returns the directory information corresponding to the bucket name.
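As a rough illustration of what such a mount request might carry, the sketch below encodes it as JSON. The text does not define a wire format, so the JSON encoding, the class name MountRequest, and the field names bucket_name and lb_address are all hypothetical; the optional load balancer address follows the optional variant described in the first aspect.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class MountRequest:
    """Hypothetical mount request sent by the model training server."""
    bucket_name: str                  # bucket in which the training data set is stored
    lb_address: Optional[str] = None  # optional load balancer in front of the object storage server

    def to_json(self) -> str:
        # Drop unset fields so a plain request carries only the bucket name.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})
```

For example, MountRequest("abc").to_json() yields a body containing only the bucket name, while passing a load balancer address (here a made-up http://lb.example:8080) adds a second field.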
S12, loading directory information which is returned by the hierarchical directory server and contains the bucket name into a memory, wherein the directory information indicates a storage path of the file in the object storage server;
After the directory information mounting request has been sent to the hierarchical directory server, in this step the model training server receives the directory information returned by the hierarchical directory server and loads it into memory, thereby mounting the directory information.
Optionally, in a Linux system, the mounting may be performed with a mount command. For example, the command /root/batfs --endpoint http://192.168.10.244:8080 abc /mnt/antfs/ mounts the directory information of the bucket named abc under the /mnt/antfs/ directory. Because the directory information records only file names and storage paths, its data volume is very small; it nevertheless gives clear guidance for locating each file, which greatly shortens the time required to acquire files.
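Once the directory information of bucket abc is mounted at /mnt/antfs/, a file path seen by the training code has to be turned back into a storage path in the bucket. The sketch below assumes a one-to-one mapping in which the path below the mount point equals the object key; that rule and the helper names to_object_key and to_local_path are hypothetical, not taken from the text.

```python
import posixpath

MOUNT_POINT = "/mnt/antfs/"  # mount point from the example command


def to_object_key(local_path: str, mount_point: str = MOUNT_POINT) -> str:
    """Assumed mapping: the path below the mount point is the object key in the bucket."""
    root = posixpath.normpath(mount_point)
    rel = posixpath.relpath(posixpath.normpath(local_path), root)
    if rel == "." or rel == ".." or rel.startswith("../"):
        raise ValueError(f"{local_path!r} is not a file under {mount_point!r}")
    return rel


def to_local_path(object_key: str, mount_point: str = MOUNT_POINT) -> str:
    """Inverse mapping: where an object appears once the directory information is mounted."""
    return posixpath.join(mount_point, object_key)
```

Under this assumption, /mnt/antfs/ImageNet-large/train/n01782516_10048.JPEG resolves to the object key ImageNet-large/train/n01782516_10048.JPEG without any query to the object storage server.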
S13, acquiring the file from the object storage server according to the directory information.
In this step, because the directory information of the file has been loaded into memory, whenever the file needs to be read its specific location in the object storage server can be looked up quickly in the directory information, and the file can then be fetched directly from that location.
In the data acquisition method provided by the embodiment of the invention, the model training server sends a directory information mounting request of a file to the hierarchical directory server, the mounting request carrying the bucket name of the bucket where the file is located in the object storage server; loads the directory information that is returned by the hierarchical directory server and contains the bucket name into memory; and acquires the file from the object storage server according to the directory information. Thus, when the model training server needs to read training data, it can obtain the directory information of each file in the training data from the hierarchical directory server and determine from it the storage path of each file in the object storage server. The file can then be fetched directly by its storage path instead of being searched for among the massive data held by the object storage server, which effectively increases the data acquisition speed.
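Steps S11 to S13 can be sketched end to end with in-memory stand-ins for the two servers. Everything here is illustrative: the class TrainingDataClient, the two callable types, and the use of a Python set as the in-memory directory are assumptions made for the sketch, not the structures used by the embodiment.

```python
from typing import Callable, Dict, List, Set

# Hypothetical stand-ins for the two remote servers.
DirectoryServer = Callable[[str], List[str]]  # bucket name -> object keys in that bucket
ObjectStore = Callable[[str, str], bytes]     # (bucket name, object key) -> object data


class TrainingDataClient:
    """Sketch of steps S11-S13 as run on the model training server."""

    def __init__(self, directory_server: DirectoryServer, object_store: ObjectStore) -> None:
        self._dir_server = directory_server
        self._store = object_store
        self._mounted: Dict[str, Set[str]] = {}  # bucket name -> in-memory directory information

    def mount(self, bucket: str) -> None:
        # S11: send the mount request carrying the bucket name.
        # S12: keep the returned directory information in memory.
        self._mounted[bucket] = set(self._dir_server(bucket))

    def list_files(self, bucket: str) -> List[str]:
        # Answered entirely from memory; the object storage server is not contacted.
        return sorted(self._mounted[bucket])

    def read(self, bucket: str, key: str) -> bytes:
        # S13: check the in-memory directory, then fetch the object directly by its key.
        if key not in self._mounted.get(bucket, set()):
            raise FileNotFoundError(f"{key!r} is not in the mounted directory of {bucket!r}")
        return self._store(bucket, key)
```

The design point the sketch tries to capture is that listing and lookup never touch the object storage server; only the final read does.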
Further, to enhance data security, in an embodiment of the present invention, the data acquisition method may further include, before the directory information mounting request of the file is sent to the hierarchical directory server: sending an authentication request to the object storage server. On this basis, sending the directory information mounting request of the file to the hierarchical directory server may specifically include: sending the directory information mounting request of the file to the hierarchical directory server after an authentication success message is received from the object storage server.
Optionally, the authentication request may carry information such as an identity, an IP address, and a security key of the model training server. After receiving the authentication request, the object storage server can verify the information, and after the verification is passed, the object storage server sends an authentication success message to the model training server. And after receiving the message of successful authentication, the model training server sends a directory information mounting request of the file to the hierarchical directory server.
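A minimal sketch of this authentication gate, assuming the object storage server keeps a registry of known training servers with an allowed IP address and a security key. The registry contents, the identifier trainer-01, the address 10.0.0.17, and the key value are made up for illustration; real deployments would not hold keys in plain text like this.

```python
import hmac
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class AuthRequest:
    server_id: str   # identity of the model training server
    ip_address: str  # its IP address
    secret_key: str  # its security key


# Hypothetical registry held by the object storage server: id -> (allowed IP, key).
REGISTERED: Dict[str, Tuple[str, str]] = {"trainer-01": ("10.0.0.17", "s3cr3t")}


def authenticate(req: AuthRequest) -> bool:
    entry = REGISTERED.get(req.server_id)
    if entry is None:
        return False
    allowed_ip, key = entry
    # compare_digest avoids leaking how much of the key matched through timing.
    return req.ip_address == allowed_ip and hmac.compare_digest(req.secret_key.encode(), key.encode())


def mount_after_auth(req: AuthRequest, send_mount_request: Callable[[], object]) -> object:
    # The mount request is sent only after authentication succeeds.
    if not authenticate(req):
        raise PermissionError("authentication failed; mount request not sent")
    return send_mount_request()
```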
It can be understood that model training is a process of continuous iteration and optimization over a large amount of data. In the embodiments of the present invention, the model training server may read data from the data set in batches according to a certain rule; after all data in the data set have been read once, the model parameters may be adjusted and the data set read again in a second round. This may be repeated dozens or hundreds of times, or even more.
In order to further improve the data acquisition efficiency, in an embodiment of the present invention, the model training server may first acquire the file screening rule information when reading the files from the data set, so that a batch of corresponding files can be acquired in the data set according to the file screening rule information, instead of requesting and providing the files one by one, thereby effectively reducing the communication time.
Specifically, in an embodiment of the present invention, before the file is acquired from the object storage server according to the directory information in step S13, the data acquisition method may further include: acquiring file screening rule information. On this basis, acquiring the file from the object storage server according to the directory information in step S13 may specifically include:
selecting a target file to be read from the directory information according to the file screening rule information;
acquiring the target file from the object storage server according to the directory information of the target file.
The file screening rule information may refer to the policy by which the model training server organizes files, that is, which files to read in each batch so that the batch can be used for model training. Optionally, in an embodiment of the present invention, the file screening rule information may include one or more of the number, distribution, and size of the target files to be read. The number of target files may refer to the number of files read in each batch; the distribution may refer to the storage paths of the files read in each batch and the number of files read under each storage path; and the size may refer to the data size of each file read.
For example, in one embodiment of the present invention, the file screening rule information may specify: read 15 files under directory c1, 10 files under directory c2, 15 files under directory c3, and 30 files under directory c4, with each file smaller than 200 MB. According to this rule information, the corresponding target files can be selected from the directory information (optionally, the directory information may also record the size of each file), and the target files can then be obtained from the object storage server via the storage paths that the directory information provides. The order of the target files is shuffled before they are used for model training.
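The example rule above can be expressed as a short selection routine over the in-memory directory information. The text does not say how files are chosen within a directory, so random sampling is an assumption here, as are the function name select_batch, the (directory, name, size) entry layout, and reading "200M" as 200 MiB.

```python
import random
from typing import Dict, List, Tuple

Entry = Tuple[str, str, int]  # (directory, file name, size in bytes), as recorded in the directory info


def select_batch(entries: List[Entry], per_dir: Dict[str, int], max_size: int,
                 rng: random.Random) -> List[str]:
    """Pick per_dir[d] files under each directory d, each smaller than max_size, then shuffle."""
    picked: List[str] = []
    for directory, count in per_dir.items():
        candidates = [f"{d}/{name}" for d, name, size in entries
                      if d == directory and size < max_size]
        if len(candidates) < count:
            raise ValueError(f"only {len(candidates)} eligible files under {directory}")
        picked.extend(rng.sample(candidates, count))  # assumed: random choice within a directory
    rng.shuffle(picked)  # break up the order before training
    return picked


RULE = {"c1": 15, "c2": 10, "c3": 15, "c4": 30}  # counts from the example above
MAX_SIZE = 200 * 1024 * 1024                     # "200M", read here as 200 MiB
```

The whole batch of 70 paths is decided locally before any request is made, which is what lets the files be fetched as a batch rather than negotiated one by one.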
In order to further increase the data acquisition speed and reduce the network request delay, in an embodiment of the present invention, the mount request sent by the model training server to the hierarchical directory server in step S11 may further carry an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, where the load balancing server is configured to distribute the network access request to the object storage server in a balanced manner according to a load balancing algorithm.
In this way, the nodes in the object storage cluster are independent of one another, and the access load can be distributed evenly across all nodes in the cluster. On the one hand, this avoids the unreasonable use of shared resources found in NAS (network attached storage) and cluster file systems; on the other hand, suitable nodes can be selected automatically for data reads, which helps maximize system performance.
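The text leaves the load balancing algorithm open. As one common choice, the sketch below uses least connections: each read goes to the storage node currently serving the fewest requests. The algorithm choice, the class names, and the node addresses are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StorageNode:
    address: str
    active: int = 0  # requests this node is currently serving


@dataclass
class LeastConnectionsBalancer:
    nodes: List[StorageNode]

    def acquire(self) -> StorageNode:
        # min() returns the first least-loaded node, so ties go to the earlier node.
        node = min(self.nodes, key=lambda n: n.active)
        node.active += 1
        return node

    def release(self, node: StorageNode) -> None:
        # Call when the read finishes so the node's load drops again.
        node.active -= 1
```

Because each node tracks only its own count, nodes stay independent, and a slow node naturally receives fewer new requests while its reads are still in flight.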
Correspondingly, in a second aspect, an embodiment of the present invention further provides a data providing method, which can effectively improve the acquisition speed of training data in model training.
As shown in FIG. 3, the data providing method provided by the embodiment of the present invention, executed by the hierarchical directory server, may include:
S21, receiving a directory information mounting request of a file sent by a model training server, wherein the mounting request carries the bucket name of the bucket where the file is located in an object storage server;
The model training server may read the training data by reading files. Each file may be stored in the object storage server as an object. All training data required for one model training task may form a data set, and each data set may be stored in one bucket of the object storage server. Because model training requires a large amount of data, a data set may often include tens of millions of files.
The hierarchical directory server may provide the directory information of each file in the object storage server to external parties. Optionally, the directory information may be a hierarchical directory structure in the form of a multi-way tree that indicates the storage path of each file. It should be noted that the hierarchical directory server is a logical or functional division; physically, it may be independent of the object storage server or integrated into it, which is not limited in the embodiments of the present invention.
When the model training server needs to read data for model training, a directory information mounting request can be sent to the hierarchical directory server, and the bucket name of the bucket where the data set required by the training is located is carried in the mounting request, so that the hierarchical directory server returns the directory information corresponding to the bucket name.
S22, directory information containing the bucket name is sent to the model training server, and the directory information indicates the storage path of the file in the object storage server.
In this step, the hierarchical directory server may return directory information under the bucket name to the model training server according to the bucket name carried in the mount request.
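On the hierarchical directory server side, S21 and S22 reduce to a lookup keyed by bucket name. The sketch assumes the directory information has already been generated and is held as a list of paths per bucket; the dictionary layout, the response fields, and the function name handle_mount_request are hypothetical.

```python
from typing import Dict, List

# Hypothetical pre-generated directory information, keyed by bucket name.
DIRECTORY_INFO: Dict[str, List[str]] = {
    "abc": ["ImageNet-large/train/n01782516_10048.JPEG"],
}


def handle_mount_request(request: Dict[str, str]) -> Dict[str, object]:
    """S21: read the bucket name from the request; S22: return that bucket's directory info."""
    bucket = request.get("bucket_name")
    if bucket not in DIRECTORY_INFO:
        return {"status": "error", "message": f"unknown bucket {bucket!r}"}
    return {"status": "ok", "bucket_name": bucket, "paths": DIRECTORY_INFO[bucket]}
```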
In the data providing method provided by the embodiment of the present invention, the hierarchical directory server receives a directory information mounting request of a file sent by the model training server, the mounting request carrying the bucket name of the bucket where the file is located, and sends directory information containing the bucket name to the model training server, the directory information indicating the storage path of the file in the object storage server. Thus, when the model training server needs to read training data, the directory information of each file in the training data can be provided to it, so that the model training server can determine the storage path of each file in the object storage server from that information. The file can then be fetched directly by its storage path instead of being searched for among the massive data held by the object storage server, which effectively increases the data acquisition speed.
Optionally, in order to quickly give feedback to the model training server when a mount request of the directory information is received, in an embodiment of the present invention, corresponding directory information may be generated for data in the object storage server in advance.
Specifically, in an embodiment of the present invention, before the directory information containing the bucket name is sent to the model training server, the data providing method provided in the embodiment of the present invention may further include: generating, according to the object name of each storage object in the bucket, the directory information of the file system corresponding to that storage object.
Optionally, in an embodiment of the present invention, generating the directory information of the file system corresponding to a storage object according to the object name of the storage object in the bucket may specifically include: scanning the object names of the objects stored in the bucket and splitting each object name to form the directory information. For example, in one embodiment of the invention, the object name of one storage object is:
ImageNet-large/train/n01782516_10048.JPEG
This object name can be split into ImageNet-large, train, and n01782516_10048.JPEG, and the directory information so formed may be ImageNet-large\train\n01782516_10048.JPEG, i.e. the file n01782516_10048.JPEG in the train subfolder of the ImageNet-large folder. After all objects in the bucket have been scanned, directory information containing the storage paths of all files is formed.
Optionally, in another embodiment of the present invention, generating the directory information of the file system corresponding to a storage object according to the object name of the storage object in the bucket may instead include: receiving registration information of a storage object in the bucket and forming the directory information according to the registration information. That is, when an object is newly stored in the object storage server, a registration request may be actively initiated to the hierarchical directory server, and the hierarchical directory server generates directory information from the received registration information, such as the object name.
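The scan-and-split step can be sketched as building a multi-way tree from object names. Here the tree is a nested dict in which directories map to sub-dicts and files map to None; this representation and the function name build_tree are assumptions, and the sketch assumes object names use "/" as the separator, as in the example object name.

```python
from typing import Dict, Iterable, Optional


def build_tree(object_names: Iterable[str]) -> Dict[str, Optional[dict]]:
    """Split each object name on '/' and merge the pieces into a nested dict (a multi-way tree)."""
    tree: Dict[str, Optional[dict]] = {}
    for name in object_names:
        *dirs, file_name = name.split("/")
        node = tree
        for d in dirs:
            node = node.setdefault(d, {})  # create the sub-directory on first sight
        node[file_name] = None             # leaves are files
    return tree
```

For the example object, build_tree(["ImageNet-large/train/n01782516_10048.JPEG"]) produces a tree with ImageNet-large at the root, train beneath it, and the JPEG file as a leaf.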
Optionally, the two ways of generating directory information may be used separately or in combination, for example, all objects already stored in the current bucket may be scanned to generate directory information, and when a new object is added in a later period, the directory information of the object may be generated through the registration information of the object and added to the original directory information.
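The combined use of scanning and registration can be sketched as a small per-bucket index on the hierarchical directory server. This is an illustration under stated assumptions, not the patented implementation: the class `DirectoryIndex`, its methods, and the bucket name `datasets` are hypothetical, and object names are again assumed to use "/" as the delimiter.

```python
from typing import Dict, Iterable, List, Set


class DirectoryIndex:
    """Hypothetical per-bucket directory index.

    Seeded once by scanning existing object names, then kept current by
    registration messages for newly stored objects.
    """

    def __init__(self) -> None:
        self._objects: Dict[str, Set[str]] = {}  # bucket -> object names

    def scan(self, bucket: str, object_names: Iterable[str]) -> None:
        """Bulk-load every object name already present in the bucket."""
        self._objects.setdefault(bucket, set()).update(object_names)

    def register(self, bucket: str, object_name: str) -> None:
        """Add one newly stored object reported via a registration request."""
        self._objects.setdefault(bucket, set()).add(object_name)

    def folders(self, bucket: str) -> List[str]:
        """Return every folder path implied by splitting the object names."""
        result: Set[str] = set()
        for name in self._objects.get(bucket, set()):
            parts = name.split("/")[:-1]  # drop the file name
            for i in range(1, len(parts) + 1):
                result.add("/".join(parts[:i]))
        return sorted(result)

    def files(self, bucket: str, folder: str = "") -> List[str]:
        """List files directly inside `folder` ("" means the bucket root)."""
        prefix = f"{folder}/" if folder else ""
        return sorted(
            name[len(prefix):]
            for name in self._objects.get(bucket, set())
            if name.startswith(prefix) and "/" not in name[len(prefix):]
        )


index = DirectoryIndex()
index.scan("datasets", ["ImageNet-large/train/a.JPEG", "ImageNet-large/train/b.JPEG"])
index.register("datasets", "ImageNet-large/val/c.JPEG")  # added later
print(index.folders("datasets"))
# ['ImageNet-large', 'ImageNet-large/train', 'ImageNet-large/val']
```

Because both paths write into the same set, a later registration simply extends the directory information produced by the initial scan.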
In a third aspect, an embodiment of the present invention provides a data acquisition apparatus, which can effectively improve the acquisition speed of training data in model training.
As shown in fig. 4, an embodiment of the present invention provides a data acquisition apparatus, including:
a request sending unit 31, configured to send a directory information mount request for a file to a hierarchical directory server, where the mount request carries a bucket name of a bucket where the file is located in an object storage server;
a directory loading unit 32, configured to load, to a memory, directory information including the bucket name and returned by the hierarchical directory server, where the directory information indicates a storage path of the file in the object storage server;
a file obtaining unit 33, configured to obtain the file from the object storage server according to the directory information.
The data acquisition apparatus provided by the embodiment of the present invention sends a directory information mount request for a file to a hierarchical directory server, the mount request carrying the bucket name of the bucket in which the file is located in the object storage server; loads the directory information containing the bucket name returned by the hierarchical directory server into memory; and obtains the file from the object storage server according to the directory information. Therefore, when the model training server needs to read training data, it can obtain the directory information of each file in the training data from the hierarchical directory server and derive from it the storage path of each file in the object storage server. The file can then be fetched directly through its storage path instead of being searched for among the mass data in the object storage server, which effectively improves the data acquisition speed.
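The interplay of units 31 to 33 can be sketched with in-memory stand-ins for the two servers. All class names, the bucket name, and the shape of the returned directory information are hypothetical; a real deployment would talk to the servers over a network protocol the source does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FakeObjectStore:
    """Hypothetical stand-in for the object storage server."""
    buckets: Dict[str, Dict[str, bytes]]  # bucket -> {object name: content}

    def get_object(self, bucket: str, key: str) -> bytes:
        return self.buckets[bucket][key]


@dataclass
class FakeDirectoryServer:
    """Hypothetical stand-in for the hierarchical directory server."""
    directories: Dict[str, List[str]]  # bucket -> storage paths

    def mount(self, bucket: str) -> Dict[str, object]:
        # Directory information: the bucket name plus every storage path in it.
        return {"bucket": bucket, "paths": list(self.directories[bucket])}


@dataclass
class DataAcquisitionClient:
    directory_server: FakeDirectoryServer
    object_store: FakeObjectStore
    # Directory information held in memory after mounting: bucket -> paths
    mounted: Dict[str, Set[str]] = field(default_factory=dict)

    def mount(self, bucket: str) -> None:
        """Request sending unit 31 + directory loading unit 32."""
        info = self.directory_server.mount(bucket)
        self.mounted[info["bucket"]] = set(info["paths"])

    def read(self, bucket: str, path: str) -> bytes:
        """File obtaining unit 33: check the in-memory directory, then fetch by path."""
        if path not in self.mounted.get(bucket, set()):
            raise FileNotFoundError(f"{bucket}/{path} is not in the mounted directory")
        return self.object_store.get_object(bucket, path)


store = FakeObjectStore({"datasets": {"ImageNet-large/train/a.JPEG": b"jpeg-bytes"}})
dirs = FakeDirectoryServer({"datasets": ["ImageNet-large/train/a.JPEG"]})
client = DataAcquisitionClient(dirs, store)
client.mount("datasets")
print(client.read("datasets", "ImageNet-large/train/a.JPEG"))  # b'jpeg-bytes'
```

The in-memory set answers "does this path exist?" locally, so the object storage server only receives direct fetches by key rather than searches.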
Optionally, the request sending unit 31 may be further configured to:
before sending the directory information mount request for a file to the hierarchical directory server, send an authentication request to the object storage server; and
after receiving an authentication success message from the object storage server, send the directory information mount request for the file to the hierarchical directory server.
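The ordering constraint (authenticate first, mount only on success) can be sketched as follows. The access-key/secret-key credential format and every name here (`mount_with_auth`, `AuthenticationError`, the fake callables) are assumptions for illustration; the source does not describe the authentication protocol.

```python
from typing import Callable, List


class AuthenticationError(Exception):
    """Raised when the object storage server rejects the credentials."""


def mount_with_auth(
    authenticate: Callable[[str, str], bool],
    mount: Callable[[str], List[str]],
    access_key: str,
    secret_key: str,
    bucket: str,
) -> List[str]:
    """Send the mount request only after authentication has succeeded."""
    if not authenticate(access_key, secret_key):
        raise AuthenticationError("authentication with the object storage server failed")
    return mount(bucket)


mount_calls: List[str] = []  # records which mount requests were actually sent


def fake_authenticate(access_key: str, secret_key: str) -> bool:
    return (access_key, secret_key) == ("demo-ak", "demo-sk")


def fake_mount(bucket: str) -> List[str]:
    mount_calls.append(bucket)
    return ["ImageNet-large/train/a.JPEG"]


print(mount_with_auth(fake_authenticate, fake_mount, "demo-ak", "demo-sk", "datasets"))
# ['ImageNet-large/train/a.JPEG']
```

Passing the two server interactions in as callables keeps the sketch independent of any particular client library.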
Optionally, the data acquisition apparatus may further include a rule obtaining unit, configured to obtain file screening rule information before the file is obtained from the object storage server according to the directory information. In that case, the file obtaining unit 33 may specifically include:
a selection module, configured to select the target files to be read from the directory information according to the file screening rule information; and
an acquisition module, configured to obtain the target files from the object storage server according to the directory information of the target files.
Optionally, the file screening rule information includes at least one of the number, distribution, and size of the target files that need to be read.
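One possible reading of the screening rules is sketched below. The source names the criteria (number, distribution, size) but not their encoding, so the following are assumptions: the directory information is taken to include file sizes, "size" is treated as a maximum size in bytes, and "distribution" is treated as the share of the requested number drawn from each parent folder. `ScreeningRule` and `select_targets` are hypothetical names.

```python
import random
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ScreeningRule:
    """Hypothetical encoding of the file screening rule information."""
    number: Optional[int] = None                      # how many files to read
    max_size: Optional[int] = None                    # largest acceptable file, in bytes
    distribution: Optional[Dict[str, float]] = None   # folder -> share of `number`


def select_targets(entries: Dict[str, int], rule: ScreeningRule, seed: int = 0) -> List[str]:
    """Select target files from directory info given as {storage path: size in bytes}."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    candidates = sorted(p for p, size in entries.items()
                        if rule.max_size is None or size <= rule.max_size)
    if rule.number is None:
        return candidates  # no count limit: every file passing the size rule
    if rule.distribution:
        # Group candidates by parent folder and draw each folder's share.
        by_folder: Dict[str, List[str]] = defaultdict(list)
        for path in candidates:
            by_folder[path.rsplit("/", 1)[0]].append(path)
        chosen: List[str] = []
        for folder, share in rule.distribution.items():
            pool = by_folder.get(folder, [])
            chosen.extend(rng.sample(pool, min(len(pool), round(rule.number * share))))
        return sorted(chosen)
    return sorted(rng.sample(candidates, min(rule.number, len(candidates))))


entries = {"d/train/a.JPEG": 100, "d/train/b.JPEG": 5000, "d/val/c.JPEG": 200, "d/val/e.JPEG": 300}
print(select_targets(entries, ScreeningRule(max_size=1000)))
# ['d/train/a.JPEG', 'd/val/c.JPEG', 'd/val/e.JPEG']
```

Selection runs entirely on the in-memory directory information, so only the chosen target files are ever requested from the object storage server.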
Optionally, the mount request may further carry an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, where the load balancing server is configured to distribute the network access request to the object storage server in a balanced manner according to a load balancing algorithm.
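The text does not name a particular load balancing algorithm. Round robin, sketched below, is one common choice and is used here only as an example. The node addresses and the `http://node/bucket/key` URL layout are hypothetical.

```python
import itertools
from typing import Iterable, List


class RoundRobinBalancer:
    """Minimal round-robin balancer in front of several object storage nodes."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes: List[str] = list(nodes)
        if not self._nodes:
            raise ValueError("at least one object storage node is required")
        self._cycle = itertools.cycle(self._nodes)  # endless node rotation

    def pick(self) -> str:
        """Return the node that should serve the next network access request."""
        return next(self._cycle)

    def route(self, bucket: str, key: str) -> str:
        """Build the URL the next request would be forwarded to."""
        return f"http://{self.pick()}/{bucket}/{key}"


lb = RoundRobinBalancer(["oss-node-1:9000", "oss-node-2:9000"])
print([lb.pick() for _ in range(3)])
# ['oss-node-1:9000', 'oss-node-2:9000', 'oss-node-1:9000']
```

Weighted or least-connections strategies could replace `pick` without changing how the model training server addresses the load balancing server.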
In a fourth aspect, an embodiment of the present invention provides a data providing apparatus, which can effectively improve the acquisition speed of training data in model training.
As shown in fig. 5, an embodiment of the present invention provides a data providing apparatus, including:
a request receiving unit 41, configured to receive a directory information mount request of a file sent by a model training server, where the mount request carries a bucket name of a bucket where the file is located in an object storage server;
a directory sending unit 42, configured to send, to the model training server, directory information including the bucket name, where the directory information indicates a storage path of the file in the object storage server.
The data providing apparatus provided by the embodiment of the present invention receives a directory information mount request for a file sent by a model training server, the mount request carrying the bucket name of the bucket in which the file is located, and sends directory information containing the bucket name to the model training server, the directory information indicating the storage path of the file in the object storage server. Therefore, when the model training server needs to read training data, it can be given the directory information of each file in the training data and derive from it the storage path of each file in the object storage server. The file can then be fetched directly through its storage path instead of being searched for among the mass data in the object storage server, which effectively improves the data acquisition speed.
Optionally, the data providing apparatus may further include a directory generation unit, configured to generate the directory information of the file system corresponding to the storage objects in the bucket according to their object names, before the directory information containing the bucket name is sent to the model training server.
Optionally, the directory generation unit may include:
a first generation module, configured to scan the object names of the objects stored in the bucket and split the object names to form the directory information;
and/or
a second generation module, configured to receive registration information of the storage objects in the bucket and form the directory information according to the registration information.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, which can effectively improve the acquisition speed of training data in model training.
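The providing side can be sketched as a handler that answers a mount request from pre-generated, per-bucket directory information. The request and reply field names (`bucket`, `ok`, `paths`, `error`) and the class name are hypothetical; the source does not define a wire format.

```python
from typing import Dict, List


class HierarchicalDirectoryServer:
    """Hypothetical data providing side: answer mount requests by bucket name.

    Directory information is assumed to be generated in advance and kept in
    memory, so a mount request never triggers a listing of object storage.
    """

    def __init__(self, directories: Dict[str, List[str]]) -> None:
        self._directories = directories  # bucket -> storage paths

    def handle_mount(self, request: Dict[str, str]) -> Dict[str, object]:
        """Request receiving unit 41 + directory sending unit 42."""
        bucket = request.get("bucket")
        if bucket not in self._directories:
            return {"ok": False, "error": f"unknown bucket: {bucket}"}
        # The reply carries the bucket name plus every storage path in it.
        return {"ok": True, "bucket": bucket, "paths": sorted(self._directories[bucket])}


server = HierarchicalDirectoryServer({"datasets": ["ImageNet-large/val/c.JPEG", "ImageNet-large/train/a.JPEG"]})
print(server.handle_mount({"bucket": "datasets"})["paths"])
# ['ImageNet-large/train/a.JPEG', 'ImageNet-large/val/c.JPEG']
```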
As shown in fig. 6, an electronic device provided in an embodiment of the present invention may include a housing 51, a processor 52, a memory 53, a circuit board 54, and a power supply circuit 55. The circuit board 54 is arranged inside the space enclosed by the housing 51, and the processor 52 and the memory 53 are arranged on the circuit board 54; the power supply circuit 55 supplies power to each circuit or component of the electronic device; the memory 53 stores executable program code; and the processor 52 reads the executable program code stored in the memory 53 and runs the corresponding program, so as to execute the data acquisition method or the data providing method provided in any of the foregoing embodiments.
For the specific process by which the processor 52 executes the above steps, and for the further steps it executes by running the executable program code, reference may be made to the description of the foregoing embodiments; details are not repeated here.
The above electronic devices exist in a variety of forms, including but not limited to:
(1) Mobile communication devices: such devices have mobile communication capabilities and are mainly intended to provide voice and data communication. Such terminals include smartphones (e.g., iPhones), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as iPads.
(3) Portable entertainment devices: such devices can display and play multimedia content. They include audio and video players (e.g., iPods), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus, and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capacity, stability, reliability, security, scalability, manageability, and the like.
(5) Other electronic devices with data interaction functions.
Accordingly, an embodiment of the present invention further provides a computer-readable storage medium, where one or more programs are stored, and the one or more programs can be executed by one or more processors to implement any one of the data acquisition methods or the data providing methods provided in the foregoing embodiments, so that corresponding technical effects can also be achieved, which have been described in detail above and are not described herein again.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or action from another, and do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises that element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
In particular, the apparatus embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
For convenience of description, the above apparatus is described in terms of functionally divided units/modules. Of course, when the present invention is implemented, the functions of the units/modules may be realized in one or more pieces of software and/or hardware.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of data acquisition, comprising:
sending a directory information mounting request of a file to a hierarchical directory server, wherein the mounting request carries a bucket name of a bucket where the file is located in an object storage server;
loading directory information which is returned by the hierarchical directory server and contains the bucket name into a memory, wherein the directory information indicates a storage path of the file in the object storage server;
and acquiring the file from the object storage server according to the directory information.
2. The method of claim 1, wherein prior to sending a request to mount directory information for a file to a hierarchical directory server, the method further comprises:
sending an authentication request to the object storage server;
the sending of the directory information mount request of the file to the hierarchical directory server includes:
and after receiving the message of successful authentication sent by the object storage server, sending a directory information mounting request of the file to the hierarchical directory server.
3. The method of claim 1, wherein before retrieving the file from the object storage server according to the directory information, the method further comprises:
acquiring file screening rule information;
the obtaining the file from the object storage server according to the directory information includes:
selecting a target file to be read from the directory information according to the file screening rule information;
and acquiring the target file from the object storage server according to the directory information of the target file.
4. The method according to claim 3, wherein the file screening rule information includes at least one of the number, distribution and size of the target files to be read.
5. The method according to any one of claims 1 to 4, wherein the mount request further carries an address of a load balancing server corresponding to the object storage server, so as to access the object storage server through the load balancing server, wherein the load balancing server is configured to distribute network access requests to the object storage server in a balanced manner according to a load balancing algorithm.
6. A data providing method, comprising:
receiving a directory information mounting request of a file sent by a model training server, wherein the mounting request carries a bucket name of a bucket where the file is located in an object storage server;
and sending directory information containing the bucket name to the model training server, wherein the directory information indicates the storage path of the file in the object storage server.
7. The method of claim 6, wherein prior to said sending directory information containing the bucket name to the model training server, the method further comprises:
and generating the directory information of the file system corresponding to the storage object according to the object name of the storage object in the bucket.
8. The method of claim 7, wherein the generating directory information of the file system corresponding to the storage object according to the object name of the storage object in the bucket comprises:
scanning the object name of an object stored in the bucket, and splitting the object name to form the directory information;
and/or
receiving the registration information of the storage object in the bucket, and forming the directory information according to the registration information.
9. A data acquisition apparatus, comprising:
a request sending unit, configured to send a directory information mounting request of a file to a hierarchical directory server, wherein the mounting request carries a bucket name of a bucket where the file is located in an object storage server;
a directory loading unit, configured to load, to a memory, directory information including the bucket name, where the directory information is returned by the hierarchical directory server, and indicates a storage path of the file in the object storage server;
and the file acquisition unit is used for acquiring the file from the object storage server according to the directory information.
10. The apparatus of claim 9, wherein the request sending unit is further configured to:
before sending a directory information mounting request of a file to a hierarchical directory server, sending an authentication request to the object storage server;
and after receiving the message of successful authentication sent by the object storage server, sending a directory information mounting request of the file to the hierarchical directory server.
CN202010030631.3A 2020-01-10 2020-01-10 Data acquisition method, data providing method and device Pending CN111258959A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030631.3A CN111258959A (en) 2020-01-10 2020-01-10 Data acquisition method, data providing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010030631.3A CN111258959A (en) 2020-01-10 2020-01-10 Data acquisition method, data providing method and device

Publications (1)

Publication Number Publication Date
CN111258959A true CN111258959A (en) 2020-06-09

Family

ID=70946886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030631.3A Pending CN111258959A (en) 2020-01-10 2020-01-10 Data acquisition method, data providing method and device

Country Status (1)

Country Link
CN (1) CN111258959A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733892A (en) * 2020-12-28 2021-04-30 北京聚云科技有限公司 Data interaction method and device for model training
CN112749127A (en) * 2020-12-28 2021-05-04 北京聚云科技有限公司 Data providing method and system for model training

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475682A (en) * 2012-06-07 2013-12-25 华为技术有限公司 File transfer method and file transfer equipment
CN104283941A (en) * 2014-09-16 2015-01-14 深圳市同洲电子股份有限公司 Data access method, device and system
CN104639658A (en) * 2015-03-12 2015-05-20 浪潮集团有限公司 Realization method for accessing object storage by file system mounting
CN108089818A (en) * 2017-12-12 2018-05-29 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN109002730A (en) * 2018-07-26 2018-12-14 郑州云海信息技术有限公司 A kind of file system directories right management method, device, equipment and storage medium
US20190102341A1 (en) * 2016-07-07 2019-04-04 Tencent Technology (Shenzhen) Company Limited Object information processing method and apparatus, and storage medium

Similar Documents

Publication Publication Date Title
CN111258958A (en) Data acquisition method, data providing method and device
CN103686198A (en) Video data processing method, device and system
CN112087487B (en) Scheduling method and device of model training task, electronic equipment and storage medium
CN110474820B (en) Flow playback method and device and electronic equipment
CN104348919A (en) Method and device for downloading file and browser
CN111258965B (en) Data acquisition method and device, electronic equipment and storage medium
CN108241797A (en) Mirror image warehouse user right management method, device, system and readable storage medium storing program for executing
CN112036125B (en) Document management method and device and computer equipment
US20110231813A1 (en) Apparatus and method for on-demand optimization of applications
CN111258959A (en) Data acquisition method, data providing method and device
WO2014146441A1 (en) Method, server and system for processing task data
CN112084017B (en) Memory management method and device, electronic equipment and storage medium
CN111158750A (en) Unity-based game installation package packaging method and device
CN110652728A (en) Game resource management method and device, electronic equipment and storage medium
CN113965402A (en) Configuration method and device of firewall security policy and electronic equipment
CN112749127A (en) Data providing method and system for model training
CN110580212B (en) Data export method and device of application program, electronic equipment and storage medium
CN113971163A (en) Small file merging storage method, small file reading method and server
CN112085208A (en) Method and device for model training by using cloud
CN111698210A (en) Cloud mobile phone handle data processing method and system and storage medium
CN111444542A (en) Data processing method, device and storage medium for copyright file
CN114338102B (en) Security detection method, security detection device, electronic equipment and storage medium
CN103220327B (en) user information storage method and device
CN108920658B (en) Mobile device desktop moving method and device and electronic device
CN111080750B (en) Robot animation configuration method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201022

Address after: Room 91, 5 / F, building 5, yard 30, Shixing street, Shijingshan District, Beijing 100041

Applicant after: Beijing juyuncube Technology Co., Ltd

Address before: 100041 Beijing, Shijingshan District Xing Xing street, building 30, No. 3, building 2, A-0071

Applicant before: Beijing Cheetah Mobile Technology Co.,Ltd.
