CN112839071A - Training system, training data access method and device, electronic device and medium - Google Patents


Info

Publication number
CN112839071A
CN112839071A (application CN201911167520.0A)
Authority
CN
China
Prior art keywords
client
training data
target
node
server
Prior art date
Legal status
Granted
Application number
CN201911167520.0A
Other languages
Chinese (zh)
Other versions
CN112839071B (en)
Inventor
王立鹏
杨柏辰
叶松高
颜深根
Current Assignee
Sensetime Group Ltd
Original Assignee
Sensetime Group Ltd
Priority date
Filing date
Publication date
Application filed by Sensetime Group Ltd filed Critical Sensetime Group Ltd
Priority to CN201911167520.0A
Publication of CN112839071A
Application granted
Publication of CN112839071B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/568: Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5683: Storage of data provided by user terminals, i.e. reverse caching
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L67/63: Routing a service request depending on the request content or context

Abstract

The present disclosure relates to a training system, a training data access method and apparatus, an electronic device, and a medium. The system includes a server and a plurality of nodes, wherein the server is configured to allocate caching tasks to the plurality of nodes; each node is configured to cache training data blocks based on the caching task allocated by the server, and each node includes at least one client. A first client of the at least one client is configured to obtain a first data access request for target training data; in response to the first data access request, the first client determines a target client that caches the target training data, and obtains the target training data from the node where the target client is located.

Description

Training system, training data access method and device, electronic device and medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a training system, a training data access method and apparatus, an electronic device, and a medium.
Background
Distributed storage is a data storage technology in which data is stored across multiple servers in a scattered manner, and the scattered storage resources together form a virtual storage system. The growth of cloud computing and the Internet has produced massive amounts of data, for which distributed storage provides an efficient storage mode. Distributed storage uses multiple storage servers to share the storage load and a location server to locate stored information, effectively improving the reliability, availability, and access efficiency of the system.
Typically, a client reads or writes a file by requesting it from a server, so every read or write of a file must go through the server. Taking reading a file as an example, the client sends a file read request to the server, and the server returns the requested file to the client according to that request. However, one server may be connected to multiple clients, and a large number of clients accessing the server at the same time occupies substantial network resources of the server and degrades its performance.
Disclosure of Invention
The disclosure provides a training system, a training data access method and device, electronic equipment and a medium.
According to an aspect of the present disclosure, there is provided a training system, the system comprising: a server and a plurality of nodes, wherein,
the server is used for distributing caching tasks for the nodes;
the node is used for caching the training data block based on the caching task distributed by the server;
the node comprises at least one client;
a first client of the at least one client is used for acquiring a first data access request aiming at target training data;
the first client is used for responding to the first data access request, determining a target client for caching the target training data, and acquiring the target training data from a node where the target client is located.
In a possible implementation manner, the first client is further configured to determine, in response to the first data access request, the target client used for caching a target training data block in the multiple clients according to registration information and meta information of the multiple clients in the training system, where the meta information includes information of each piece of training data.
In a possible implementation manner, when the target client is the first client, the first client is configured to obtain a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In a possible implementation manner, in a case that the target client is a second client, the first client is configured to send a second data access request for the target training data to the second client, where the first client is different from the second client, and the first client and the second client belong to different nodes;
the second client is used for responding to the second data access request, acquiring the target training data block from the node where the second client is located, and sending the target training data block to the first client;
the first client is further configured to obtain the target training data from the target training data block sent by the second client.
In a possible implementation manner, the target client is configured to obtain a cache task that is allocated by the server to the target client;
the target client is configured to cache the training data block to be cached, indicated by the caching task, to the node where the target client is located, where the training data block to be cached includes the target training data block.
In a possible implementation manner, the target client is further configured to cache the training data block to be cached, which is indicated by the caching task, to the node where the target client is located when the plurality of clients in the training system complete registration.
In a possible implementation manner, the target client is further configured to, when the target client receives a second data access request, determine a target training data block in which target training data indicated by the second data access request is located, and obtain the target training data block from the server, so as to cache the target training data block in a node in which the target client is located.
In a possible implementation manner, the target client is configured to obtain meta information from the server, and send a registration request to the server, where the meta information includes information of each piece of training data;
the server is further used for sending registration information of a plurality of clients in the training system to the target client according to the registration request;
and the target client is further used for determining a training data block indicated by the cache task allocated to each client in the plurality of clients according to the registration information of the plurality of clients and the meta information.
In a possible implementation manner, the target client is further configured to send the available memory of the node where the target client is located to the server;
the server is further configured to determine a cache task allocated to the target client according to the available memory of the plurality of nodes of the training system.
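The allocation step above can be sketched as follows. This is a hedged illustration, not the disclosure's own algorithm: the function name and the proportional-split policy are assumptions, showing one way a server could size each node's caching task according to the available memory its client reported.

```python
# Hypothetical sketch: split the list of training-data-block IDs among
# nodes in proportion to each node's reported available memory.
def allocate_cache_tasks(block_ids, node_memory):
    """block_ids: list of block identifiers.
    node_memory: dict mapping node name -> available memory (bytes)."""
    total_mem = sum(node_memory.values())
    tasks, start = {}, 0
    nodes = list(node_memory)
    for i, node in enumerate(nodes):
        if i == len(nodes) - 1:
            end = len(block_ids)  # last node takes the remainder
        else:
            share = node_memory[node] / total_mem
            end = start + round(share * len(block_ids))
        tasks[node] = block_ids[start:end]
        start = end
    return tasks


# A node with three times the free memory receives roughly three
# times as many blocks to cache.
tasks = allocate_cache_tasks(list(range(10)), {"node_a": 3, "node_b": 1})
```
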
In a possible implementation manner, the target client is the client with the lowest process level in the node, wherein the process level of the client is determined by the server.
According to an aspect of the present disclosure, there is provided a training data access method, which is applied to a training system, the system including: a server and a plurality of nodes, wherein,
the server is used for distributing caching tasks for the nodes;
the node is used for caching the training data block based on the caching task distributed by the server;
the node comprises at least one client;
a first client in the at least one client acquires a first data access request aiming at target training data;
and the first client responds to the first data access request, determines a target client for caching the target training data, and acquires the target training data from a node where the target client is located.
In one possible implementation, the determining, by the first client, a target client for caching the target training data in response to the first data access request includes:
and the first client responds to the first data access request, and determines the target client used for caching a target training data block in the plurality of clients according to registration information and meta information of the plurality of clients in the training system, wherein the meta information comprises information of each piece of training data.
In a possible implementation manner, in a case that the target client is the first client, the determining, by the first client, the target client for caching the target training data in response to the first data access request, so as to obtain the target training data from a node where the target client is located includes:
and the first client acquires a target training data block comprising the target training data from the node where the first client is located, and acquires the target training data from the target training data block.
In a possible implementation manner, in a case that the target client is a second client, the determining, by the first client, the target client for caching the target training data in response to the first data access request, so as to obtain the target training data from a node where the target client is located includes:
the first client sends a second data access request aiming at the target training data to the second client, so that the second client responds to the second data access request, acquires the target training data block from a node where the second client is located, and sends the target training data block to the first client;
the first client acquires the target training data from the target training data block sent by the second client;
the first client is different from the second client, and the first client and the second client belong to different nodes.
In a possible implementation manner, before the first client acquires the target training data from the node where the target client is located, the method further includes:
the first client acquires a cache task distributed by the server for the first client;
and the first client caches the training data block to be cached indicated by the caching task to a node where the first client is located, wherein the training data block to be cached comprises the target training data block.
In a possible implementation manner, before the first client acquires the target training data from the node where the target client is located, the method further includes:
and the first client caches the training data block to be cached indicated by the caching task to a node where the first client is located under the condition that a plurality of clients in the training system finish registration.
In a possible implementation manner, before the first client acquires the target training data from the node where the target client is located, the method further includes:
and under the condition that the first client receives the first data access request, determining a target training data block where target training data indicated by the first data access request is located, and acquiring the target training data block from the server so as to cache the target training data block to a node where the first client is located.
In one possible implementation, the method further includes:
the first client acquires meta information from the server and sends a registration request to the server, so that the server sends registration information of a plurality of clients in the training system to the first client according to the registration request, wherein the meta information comprises information of each piece of training data;
and the first client determines a training data block indicated by the cache task allocated to each client in the plurality of clients according to the registration information of the plurality of clients and the meta information.
In one possible implementation, the method further includes:
and the first client sends the available memory of the node where the first client is located to the server, so that the server determines the cache task distributed to the first client according to the available memories of the nodes of the training system.
In a possible implementation manner, the first client is the client with the lowest process level in the node where the first client is located, wherein the process level of the client is determined by the server.
According to an aspect of the present disclosure, there is provided a training data access apparatus, which is applied to a training system, the system including: the system comprises a server and a plurality of nodes, wherein the server is used for distributing caching tasks for the nodes; the node is used for caching the training data block based on the caching task distributed by the server; the node comprises at least one client;
the apparatus is deployed in a first client of the at least one client, the apparatus comprising:
the acquisition module is used for acquiring a first data access request aiming at target training data;
and the processing module is used for responding to the first data access request, determining a target client for caching the target training data, and acquiring the target training data from a node where the target client is located.
In a possible implementation manner, the processing module is specifically configured to determine, in response to the first data access request, the target client used for caching the target training data block in the multiple clients according to registration information and meta information of the multiple clients in the training system, where the meta information includes information of each piece of training data.
In a possible implementation manner, the processing module is specifically configured to, when the target client is the first client, obtain a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In a possible implementation manner, the processing module is specifically configured to send a second data access request for the target training data to the second client when the target client is the second client, so that the second client obtains the target training data block from a node where the second client is located in response to the second data access request, and sends the target training data block to the first client; acquiring the target training data from the target training data block sent by the second client;
the first client is different from the second client, and the first client and the second client belong to different nodes.
In one possible implementation, the apparatus further includes:
the cache module is used for acquiring a cache task distributed by the server for the first client before the first client acquires the target training data from the node where the target client is located; and caching the training data blocks to be cached indicated by the caching task into the node where the first client is located, wherein the training data blocks to be cached comprise the target training data blocks.
In a possible implementation manner, the caching module is further configured to cache the training data block to be cached, which is indicated by the caching task, in the node where the first client is located, under the condition that the plurality of clients in the training system complete registration before the target training data is obtained from the node where the target client is located.
In a possible implementation manner, the cache module is further configured to, before the target training data is obtained from the node where the target client is located, determine a target training data block where the target training data indicated by the first data access request is located under the condition that the first data access request is received, and obtain the target training data block from the server, so as to cache the target training data block in the node where the first client is located.
In one possible implementation, the apparatus further includes:
the registration module is used for acquiring meta information from the server and sending a registration request to the server so that the server sends registration information of a plurality of clients in the training system to the first client according to the registration request, wherein the meta information comprises information of each piece of training data;
the determining module is further configured to determine, according to the registration information of the plurality of clients and the meta information, a training data block indicated by the cache task allocated to each of the plurality of clients.
In one possible implementation, the apparatus further includes:
and the sending module is used for sending the available memory of the node where the first client is located to the server so that the server determines the cache task allocated to the first client according to the available memories of the nodes of the training system.
In a possible implementation manner, the first client is the client with the lowest process level in the node where the first client is located, wherein the process level of the client is determined by the server.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiment of the disclosure, the training system may include a server and a plurality of nodes, the server may allocate a caching task to the plurality of nodes, and the nodes may cache the training data blocks based on the caching task allocated by the server. The node may include at least one client, a first client of the at least one client may obtain a first data access request for the target training data, and the first client may determine a target client for caching the target training data in response to the first data access request, so as to obtain the target training data from the node where the target client is located. Therefore, the training data blocks can be dispersedly cached in a plurality of nodes, so that the caching resources of the nodes can be fully utilized, the loading speed of the training data is increased and the data transmission pressure between the client and the server is reduced through the data interaction between the client and the client.
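As a rough illustration of this flow, the sketch below shows a server handing out caching tasks and a client then fetching data from the node that caches the block rather than from the server. The class names, the shared registry, and the round-robin assignment are all hypothetical simplifications, not details specified by the disclosure.

```python
class Server:
    """Holds the full data set and hands out caching tasks."""
    def __init__(self, blocks):
        self.blocks = blocks  # block_id -> list of training samples

    def assign_tasks(self, clients):
        # Spread the training data blocks over the clients round-robin.
        for i, block_id in enumerate(self.blocks):
            clients[i % len(clients)].cache_block(block_id, self.blocks[block_id])


class Client:
    """A client on one node, sharing that node's cache unit."""
    def __init__(self, name, registry):
        self.name = name
        self.cache = {}           # this node's cache unit
        self.registry = registry  # shared: block_id -> owning client

    def cache_block(self, block_id, data):
        self.cache[block_id] = data
        self.registry[block_id] = self

    def access(self, block_id, index):
        # Determine the target client for the block, then read from
        # that client's node cache instead of asking the server.
        target = self.registry[block_id]
        return target.cache[block_id][index]


registry = {}
clients = [Client("c0", registry), Client("c1", registry)]
server = Server({"blk0": ["img0", "img1"], "blk1": ["img2"]})
server.assign_tasks(clients)
sample = clients[1].access("blk0", 1)  # served from c0's node cache
```

Once the caching tasks are assigned, reads go client-to-client; the server is only involved in the initial allocation.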
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
FIG. 1 shows a block diagram of a training system according to an embodiment of the present disclosure.
FIG. 2 shows a flow diagram of a training data access method according to an embodiment of the present disclosure.
FIG. 3 illustrates a block diagram of obtaining target training data in accordance with an embodiment of the present disclosure.
Fig. 4 shows a block diagram of an example of a communication connection between clients according to an embodiment of the present disclosure.
Fig. 5 illustrates a block diagram of an example of a client caching training data block in accordance with an embodiment of the disclosure.
Fig. 6 illustrates a block diagram of an example of a client caching training data block in accordance with an embodiment of the disclosure.
Fig. 7 shows a block diagram of a client registering in accordance with an embodiment of the disclosure.
FIG. 8 shows a block diagram of a training system according to an embodiment of the present disclosure.
FIG. 9 shows a block diagram of a training data access device according to an embodiment of the present disclosure.
FIG. 10 shows a block diagram of an electronic device in accordance with an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the group consisting of A, B, and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
The training system provided by the embodiments of the present disclosure may include a server and a plurality of nodes. The server may allocate caching tasks to the plurality of nodes, and each node may cache training data blocks based on the caching task assigned by the server. In this way, a large number of training data blocks can be stored across multiple nodes in a scattered manner, reducing the access load on the server. Here, each node may include at least one client. A first client of the at least one client may obtain a first data access request for target training data; in response to the request, the first client may determine a target client that caches the target training data and obtain the data from the node where the target client is located, thereby increasing the loading speed of the target training data and making full use of the cache resources in the nodes.
In the related art, training data blocks are generally stored on a server, and a client must send an access request to the server whenever it accesses training data. However, the processing resources of the server are limited; when it receives access requests from a large number of clients at the same time, it is difficult to respond to some of them in time, which slows down the clients' acquisition of training data. By caching the training data blocks in the nodes, a client can obtain the target training data from the cache units of the nodes, which increases the loading speed of the target training data and makes full use of the cache resources of the plurality of nodes, so that clients can communicate with each other efficiently and reliably, improving the transfer performance of the training data.
FIG. 1 shows a block diagram of a training system according to an embodiment of the present disclosure. As shown in fig. 1, the training system may include: a server 11 and a plurality of nodes 12.
The server 11 is configured to allocate cache tasks to the plurality of nodes 12;
the node 12 is configured to cache the training data block based on the cache task allocated by the server; the node 12 comprises at least one client 13.
In the embodiment of the present disclosure, the training system may be applied in a scenario of neural network training. A neural network uses a large amount of training data in the training process, and this training data can form a data set. The server 11 may manage the data set. The data set may include a plurality of training data blocks, and each training data block may include a plurality of pieces of training data. Here, the training data may be input data, output data, label data, and the like required in the training process of the neural network, and the training data may be image information, text information, and the like. The server 11 may allocate caching tasks to the plurality of nodes, that is, assign to each node the training data blocks it should cache.
Here, the training system may include a plurality of nodes, where the nodes may be training nodes in a neural network, and may be devices, device clusters (including at least two devices), or programs running on the devices, and the like, which are not limited herein. Each node 12 may have a corresponding cache unit, and may cache a training data block corresponding to a cache task allocated by the server 11, so that a plurality of training data blocks may be stored in a plurality of nodes in a distributed manner. For example, in the case that the node is an equipment cluster, the cache unit may refer to one or more servers belonging to the equipment cluster or other equipment; in the case of a device, a cache unit may refer to an area on the device for storing data.
Here, each node 12 may include at least one client. Each client may provide services for its node, for example, responding to a training process (a training process may refer to a client that the training system deploys on a certain node). The at least one client included in each node 12 may share the cache of the node where it is located; that is, multiple clients deployed on the same node may store the training data blocks indicated by the caching tasks allocated by the server in the same cache unit. In some implementations, different cache units may also be set for different clients.
By caching, in the nodes, the training data blocks indicated by the caching tasks distributed by the server, the training data can be cached in a distributed manner on the client side, improving the loading performance of the training data. It should be noted that the training system may be integrated in one device; in some implementations, the server and the nodes in the training system may instead be deployed on different devices, and the specific arrangement is not limited in the embodiments of the present disclosure.
The following describes a training data access method provided by an embodiment of the present disclosure. The training data access method may be applied to a terminal device or another electronic device, in which case the terminal device or other electronic device serves as a node in the training system. The terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the training data access method may be implemented by a processor calling computer-readable instructions stored in a memory. The training data access method provided by the embodiment of the present disclosure is described below with the first client as the execution subject.
FIG. 2 shows a flow diagram of a training data access method according to an embodiment of the present disclosure. The training data access method may include the steps of:
in step S21, a first client of the at least one client obtains a first data access request for target training data.
Step S22, in response to the first data access request, the first client determines a target client for caching the target training data, so as to obtain the target training data from a node where the target client is located.
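A minimal sketch of steps S21 and S22 might look like the following; all names are hypothetical. It covers both the case where the target client is the first client itself (the block is on its own node) and the case where a second data access request must be sent to a remote client.

```python
class CacheClient:
    """Minimal stand-in for a client sharing its node's cache unit."""
    def __init__(self):
        self.cache = {}

    def serve_block(self, block_id):
        # Answer a second data access request with the cached block.
        return self.cache[block_id]


def handle_access_request(first_client, request, block_owner):
    """S21/S22: resolve the target client, then fetch the target data.

    request: (block_id, item_index); block_owner: block_id -> client."""
    block_id, index = request
    target = block_owner[block_id]            # S22: determine target client
    if target is first_client:
        block = first_client.cache[block_id]  # block is on our own node
    else:
        block = target.serve_block(block_id)  # second data access request
    return block[index]


local, remote = CacheClient(), CacheClient()
local.cache["blk_a"] = [1, 2]
remote.cache["blk_b"] = [10, 20]
owner = {"blk_a": local, "blk_b": remote}
```

Either way, the server is not contacted on the read path; the data comes from a node's cache unit.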
In the embodiment of the present disclosure, the first client may be any one of at least one client included in the node, or one client determined according to a preset rule. For example, the determination may be made according to the order in which the clients register with the server, and/or according to the process level assigned to the clients by the server, and/or according to the data processing capabilities (e.g., computing capabilities, etc.) of the clients in each node, and the like. That is, the server may determine the first client in consideration of one or more of the factors listed above. The first data access request may be an access request to access the target training data, which may be generated by the first client, or sent by a client other than the first client, which may be located at the same node or a different node than the first client. For example, a first client may receive first data access requests of other clients in the same node.
Here, the target training data may be located in a target training data block, and the target client may be a client of the caching task to which the target training data block is assigned. The target client may be the first client or may be a client other than the first client.
In the embodiment of the disclosure, the first client may receive a first data access request for accessing the target training data, and then may determine the target client for caching the target training data according to the relevant information of the target training data carried in the first data access request. Here, the information related to the target training data in the first data access request may include information such as a name, a storage path, and the like of the target training data.
Here, the training data block may be an aggregation of a plurality of training data, and in order to facilitate transmission and storage of the training data, a large amount of training data may be aggregated into a plurality of training data blocks for storage. Correspondingly, the first client may further store a corresponding relationship between the training data block and the client, where the corresponding relationship may represent a corresponding relationship between the client and the cache task of the training data block allocated to the client. The first client may determine, according to the relevant information of the target training data carried in the first data access request, a target training data block in which the target training data is located. Then, according to the corresponding relationship between the training data block and the client, the client corresponding to the target training data block can be determined.
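The lookup described above — from a named piece of training data to its block, then to the client responsible for that block — can be sketched as follows. This is a minimal illustrative sketch; the data names, block identifiers, and function names are hypothetical, not taken from the patent.

```python
# Hypothetical sketch: resolving the target client for a piece of training data.
# The meta information maps each training data item to the block containing it;
# the correspondence maps each block to the client assigned to cache it.

meta_info = {
    "image_001.jpg": "block_1",
    "image_002.jpg": "block_1",
    "image_003.jpg": "block_2",
}

block_to_client = {
    "block_1": "client_A",
    "block_2": "client_B",
}

def resolve_target_client(data_name):
    """Return (target client, target training data block) for the named data."""
    target_block = meta_info[data_name]           # target training data block
    return block_to_client[target_block], target_block

client, block = resolve_target_client("image_003.jpg")  # -> ("client_B", "block_2")
```

In this sketch, both mappings are held locally by the first client, so resolving the target client requires no round trip to the server.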
After determining the target client for caching the target training data, the first client may obtain the target training data from the node where the target client is located according to the address information of the target client. The address information may include an Internet Protocol (IP) address and a port address. The address information of the target client may be obtained by the first client from the server, and the first client may obtain the address information of the plurality of clients from the server.
The first client side of the embodiment of the disclosure can acquire the target training data in the node where the target client side is located under the condition of acquiring the first data access request for accessing the target training data, so that the speed of accessing the training data can be increased, and efficient and reliable communication among a plurality of client sides can be realized.
In one possible implementation manner, the first client may determine, in response to the first data access request, a target client used for caching a target training data block in the multiple clients according to registration information and meta information of the multiple clients in the training system, where the meta information includes information of each training data.
In this implementation, the first client may store, in advance, meta information of a plurality of pieces of training data, where the meta information may include information of the plurality of pieces of training data, for example, information such as a name, a data length, a data offset address, and a located training data block of each piece of training data, and according to the pre-stored meta information, the first client may determine a target training data block where the target training data is located.
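The per-item fields named above (name, data length, data offset address, containing block) could be represented as follows. This is an assumed layout for illustration only; the field names and the idea of slicing by offset and length are my own, not specified in the patent.

```python
# Illustrative sketch of the meta information a client might hold for each
# piece of training data. All field names are assumptions.
from dataclasses import dataclass

@dataclass
class MetaEntry:
    name: str      # name of the training data item
    length: int    # data length in bytes
    offset: int    # data offset address inside its training data block
    block_id: str  # identifier of the training data block containing it

meta = {
    "sample_7": MetaEntry("sample_7", length=4096, offset=8192, block_id="block_2"),
}

def locate(data_name):
    """Return (block, offset, length) needed to extract the item from its block."""
    e = meta[data_name]
    return e.block_id, e.offset, e.length
```

With such entries, a client can both find the target training data block and cut the individual item out of the cached block once the block has been fetched.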
The first client may also pre-store registration information of multiple clients, where the registration information of multiple clients may include address information of each client, a process level, an available memory of a node where the client is located, and the like. According to the registration information of the plurality of clients, the first client can determine the cache task allocated to each client, and further according to the cache task allocated to each client, the client for caching each training data block can be determined, that is, the corresponding relationship between the training data block and the client can be determined, so that the target client for caching the target training data block can be determined according to the corresponding relationship.
Here, the registration information and the meta information previously stored by the first client may be acquired from the server. The first client may determine a correspondence between each client and the cached training data block according to the pre-stored registration information and the meta information, and store the correspondence. Thus, according to the corresponding relation, the first client can quickly determine the target training data block where the target training data is located.
In a possible implementation manner, in a case that the target client is the first client, the first client may obtain a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In this implementation manner, when the target client is the first client, it may be determined that the client for caching the target training data block is the first client, and the target training data block is stored in the node where the first client is located, so that the first client may obtain the target training data in the target training data block in a local cache (i.e., a cache unit of the node where the first client is located), thereby implementing fast reading of the target training data.
Here, the first client may be a client having an access caching capability in a node where the first client is located. In one implementation, the first client may be a client with the lowest process level in the node, where the process level of the client is determined by the server. The server can allocate a unique process level to each client, so that the clients with access capability in each node can be determined according to the process levels of the clients in each node. For example, the server may randomly allocate a unique process level to each client, or the server may allocate a process level to each client according to a certain rule, for example, the process level is set according to the sequence in which each client registers with the server. Certain rules may be preset, and include, but are not limited to, the above-mentioned cases, and in the embodiments of the present application, the setting mode of certain rules is not limited.
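One of the rules mentioned above — assigning unique process levels in registration order, then selecting the lowest-level client per node as the one responsible for cache access — can be sketched as below. This is a toy sketch under those assumptions; all client and node names are hypothetical.

```python
# Minimal sketch: unique process levels by registration order, then the
# lowest-level client in each node is chosen as the cache-access client.

def assign_process_levels(registration_order):
    """Earlier registration -> smaller process level (levels are unique)."""
    return {client: level for level, client in enumerate(registration_order)}

def access_client_per_node(levels, node_members):
    """Per node, pick the member with the lowest process level."""
    return {node: min(members, key=levels.__getitem__)
            for node, members in node_members.items()}

levels = assign_process_levels(["client_B", "client_A", "client_C", "client_E"])
chosen = access_client_per_node(levels, {
    "node_1": ["client_A", "client_B", "client_C"],
    "node_2": ["client_E"],
})
# client_B registered first, so it gets level 0 and serves node_1's cache
```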
In a possible implementation manner, when the target client is the second client, the first client sends a second data access request for the target training data to the second client, so that the second client responds to the second data access request, acquires the target training data block from a node where the second client is located, and sends the target training data block to the first client. The first client acquires target training data from a target training data block sent by the second client. The first client is different from the second client, and the first client and the second client belong to different nodes.
In this implementation, if the target client is the second client and the first client and the second client belong to different nodes, the first client may send a second data access request for accessing the target training data to the second client after acquiring the first data access request. The second client may determine a target training data block including the target training data according to the second data access request, and send the target training data block to the first client. The first client may obtain the target training data in a target training data block. Thus, for a first client, the first client can access a target training data block from another node (different from the node where the first client is located) by means of a second client located on the other node.
The function of the second client is the same as or similar to that of the first client, so that the determining method of the second client may refer to the determining method of the first client, and when the second client receives the first data access request, the above-described related content described for the first client may be used for data storage, data access, and the like, which is not described herein again. For example, the second client may be a client having access caching capability in a node where the second client is located, and the second client is a client having the lowest process level in the node where the second client is located, or the second client may be a client having the strongest computing capability in the node where the second client is located.
The above process of acquiring the target training data is described below by way of an example. FIG. 3 illustrates a block diagram of obtaining target training data in accordance with an embodiment of the present disclosure. Assume that the first node includes a client A, a client B and a client C, where the client B is the client with the smallest process level among the clients in the first node. Assume that the second node includes a client D, a client E and a client F, where the client E is the client with the smallest process level among the clients in the second node.
In one example, any one client in each node may access the training data blocks indicated by that client's assigned caching task. The client A in the first node receives an access request for accessing the target training data and, if it is determined that the target training data block in which the target training data is located is cached on the node where the client C is located, may send the first data access request to the client C. The client C may read the target training data in the cache unit of the node where the client C is located and send the target training data to the client A. That is, for each client deployed in the training system, after receiving a data access request sent by another client, a client having the capability of caching training data blocks may access the target training data block cached in its cache unit and send it to the client initiating the data access request, so that the client initiating the data access request completes the access to the target training data.
In one example, the client with the smallest process level in each node may access the training data blocks cached in the node where it resides. The client A receives the access request for accessing the target training data and, if it is determined that the target training data is stored in the first node, may send a first data access request to the client B. The client B may read the target training data in the local cache and return the target training data to the client A.
In one example, the client with the smallest process level in each node may access the training data blocks cached in the node where it resides. It should be noted that, in this case, the client with the smallest process level in each node has the capability of caching training data blocks. The client A receives the access request for accessing the target training data and, if it is determined that the target training data is stored on the node where the client E is located, may send a second data access request to the client E in the second node. The client E obtains the target training data block containing the target training data in the cache of the second node and returns the target training data block to the client A.
In order to enable efficient communication and data transmission between clients, the communication between clients may be established through a Remote Procedure Call (RPC) framework; for example, the simple and efficient Apache Thrift may be used to implement the communication between clients. Apache Thrift is a simple and friendly RPC framework that uses an interface description language and can generate code for a plurality of computer languages, such as C++, Java and Python. Moreover, Apache Thrift adopts a binary communication protocol, so that the communication between clients can be realized more efficiently.
The communication mode between clients provided by the embodiment of the present disclosure can alleviate the problems of an excessive number of connections and excessive server pressure caused by one server serving a plurality of clients; since the first client in each node is responsible for accessing the training data, the pressure on the server can be reduced. Fig. 4 shows a block diagram of an example of a communication connection between clients according to an embodiment of the present disclosure. The implementation in Fig. 4 may represent the communication connections between clients in the case where the cache unit of a node is accessed by a designated client in that node, for example, where the client with the smallest process level in each node is responsible for the access of training data. Each client communicates with the client with the smallest process level in each node (i.e., a client can perform data interaction with the client with the smallest process level in each node), and the number of connections between clients is n × p, where n represents the number of clients and p represents the number of nodes. In the case of a full connection between clients, each client generally has the capability of caching training data blocks, and the number of connections between clients is n × (n-1). No matter which implementation is adopted, the data interaction between the clients and the server can be effectively reduced, thereby reducing network pressure.
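The two connection counts above can be written out directly; the small sketch below only evaluates the n × p and n × (n-1) expressions from the text, with illustrative numbers of my own.

```python
# Sketch comparing the two client-connection topologies described above.

def designated_connections(n_clients, n_nodes):
    # each client connects to the lowest-process-level client of every node
    return n_clients * n_nodes

def full_mesh_connections(n_clients):
    # every client connects to every other client
    return n_clients * (n_clients - 1)

# e.g. 16 clients spread over 4 nodes (illustrative figures)
designated = designated_connections(16, 4)   # n * p
full_mesh = full_mesh_connections(16)        # n * (n - 1)
```

For 16 clients on 4 nodes, the designated-client mode needs 64 connections versus 240 for a full mesh, which illustrates why routing access through one client per node reduces connection pressure.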
In a possible implementation manner, before the first client acquires the target training data from the node where the target client is located, the first client acquires a cache task allocated by the server for the first client, the first client caches a training data block to be cached, which is indicated by the cache task, to the node where the first client is located, and the training data block to be cached includes the target training data block.
In this implementation manner, the first client may obtain the cache task allocated by the server to the first client before the first client obtains the target training data from the node where the target client is located, and cache the training data block to be cached, which is indicated by the obtained cache task, at the node where the first client is located. For example, the training data blocks to be cached indicated by the obtained caching task may be cached in the node where the training data blocks are located before receiving the first data access request (that is, all the training data blocks to be cached indicated by the caching task are cached in the node where the first client is located at one time), or the training data blocks to be cached indicated by the obtained caching task may be cached in the node where the training data blocks are located after receiving the first data access request (that is, the training data blocks to be obtained from the node where the first client is located this time are cached in the node where the first client is located according to a requirement, that is, according to the first data access request). Here, the training data block to be cached may be all the training data blocks indicated by the caching task, or may be one or more training data blocks in all the training data blocks indicated by the caching task. The training data block to be cached comprises target training data, so that the first client can quickly acquire the target training data in the target training data block in the node where the first client is located. Here, the first client is the target client.
In one example, before the first client acquires the target training data from the node where the target client is located, the first client caches the training data block to be cached, which is indicated by the caching task, into the node where the first client is located when the first client completes registration in the training system at a plurality of clients.
In this example, the first client may cache the training data block indicated by the allocated cache task in the node where the first client is located in advance, that is, the first client may cache the training data block indicated by the allocated cache task in the local in advance when the plurality of clients complete registration, so that the first client may quickly respond to the first data access request and obtain the target training data in the cache of the node where the first client is located when receiving the first data access request.
Fig. 5 illustrates a block diagram of an example of a client caching training data blocks in accordance with an embodiment of the disclosure. In a one-shot mode, after a client in the training system registers with the server, the training data blocks indicated by the caching task allocated by the server are cached in the cache unit of the node where the client is located, and after the client caches these training data blocks, caching tasks for other training data blocks may not be added. Assume that the training system includes 3 clients: client A, client B, and client C. Client A may pre-cache the allocated training data block 1 and training data block 2, client B may pre-cache the allocated training data block 3 and training data block 4, and client C may pre-cache the allocated training data block 5 and training data block 6. In the case where client A receives a request to read training data block 3 and training data block 5, it may initiate requests to read the training data blocks to client B and client C according to the correspondence between the training data blocks and the clients, so as to obtain training data block 3 through client B and obtain training data block 5 through client C.
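The one-shot example above can be sketched as a toy in-process model: each client fetches all of its assigned blocks at construction time, and a read falls back to whichever peer holds the block. This is a simplified single-process stand-in for the distributed behaviour; class names, block contents, and the peer-lookup loop are all my own assumptions.

```python
# Toy sketch of the one-shot mode: every assigned block is cached up front.

class Client:
    def __init__(self, name, cache_task, source):
        self.name = name
        self.cache = {}
        # one-shot: fetch every block of the caching task from the server now
        for block_id in cache_task:
            self.cache[block_id] = source[block_id]

    def read(self, block_id, peers):
        """Read a block from the local cache, or from the peer caching it."""
        if block_id in self.cache:
            return self.cache[block_id]
        for peer in peers:                  # stand-in for an RPC to the peer
            if block_id in peer.cache:
                return peer.cache[block_id]
        raise KeyError(block_id)

server_blocks = {f"block_{i}": f"data_{i}" for i in range(1, 7)}
a = Client("A", ["block_1", "block_2"], server_blocks)
b = Client("B", ["block_3", "block_4"], server_blocks)
c = Client("C", ["block_5", "block_6"], server_blocks)
```

Here `a.read("block_3", [b, c])` goes through client B's cache, mirroring the Fig. 5 flow where client A obtains block 3 through client B.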
In one example, when receiving a first data access request, a first client determines a target training data block where target training data indicated by the first data access request is located, and acquires the target training data block from a server, so as to cache the target training data block into a node where the first client is located.
In this implementation manner, the first client may determine, according to the first data access request, a target training data block of the target training data after receiving the first data access request, and in a case that the client to which the caching task corresponding to the target training data block is allocated is the first client, the first client may obtain the target training data block from the server and cache the target training data block in the node where the target training data block is located. That is, it may be understood that the first client may only cache the target training data block after receiving the first data access request, and does not cache other training data blocks indicated by the caching task. Therefore, the pressure caused by the fact that a plurality of clients simultaneously acquire the training data blocks from the server can be reduced, and the cache resource of the node where the first client is located can be saved.
Fig. 6 illustrates a block diagram of an example of a client caching training data blocks in accordance with an embodiment of the disclosure. In an on-demand passive caching (on demand) mode, after receiving a request for accessing training data, a client in the training system may cache the training data block to be accessed in the cache unit of the node where the client is located according to the request. Assume that the training system includes 3 clients: client A, client B, and client C. The caching task distributed to client A comprises training data block 1 and training data block 2, the caching task distributed to client B comprises training data block 3 and training data block 4, and the caching task distributed to client C comprises training data block 5 and training data block 6. In the case where client A receives the first data access request for accessing training data block 3 and training data block 5, it may initiate a request for accessing training data block 3 to client B and a request for accessing training data block 5 to client C according to the correspondence between the training data blocks and the clients. After client B receives the request to read training data block 3, it may cache training data block 3; after client C receives the request to read training data block 5, it may cache training data block 5. In this way, client A obtains training data block 3 through client B and training data block 5 through client C, while client A may not yet perform its own assigned caching task, i.e., client A may not cache training data block 1 and training data block 2.
In this way, after receiving the request for reading the corresponding training data block, the client can obtain the corresponding training data block from the server and cache the training data block locally, so that local cache can be saved, and the pressure of multiple clients simultaneously requesting the training data block from the server can be reduced.
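The on-demand mode can be sketched with the same kind of toy model: a client starts with an empty cache and only fetches a block from the server when a read request for it actually arrives. Again a simplified single-process sketch; the class and method names are assumptions.

```python
# Toy sketch of the on-demand mode: nothing is pre-cached; a block is fetched
# from the server the first time a read request for it is served.

class LazyClient:
    def __init__(self, name, cache_task, source):
        self.name = name
        self.cache_task = set(cache_task)  # blocks this client is responsible for
        self.cache = {}                    # starts empty
        self.source = source               # stand-in for the server

    def serve(self, block_id):
        """Handle a read request for a block in this client's caching task."""
        if block_id not in self.cache_task:
            raise KeyError(f"{self.name} is not assigned {block_id}")
        if block_id not in self.cache:          # fetch and cache on first access
            self.cache[block_id] = self.source[block_id]
        return self.cache[block_id]

server_blocks = {f"block_{i}": f"data_{i}" for i in range(1, 7)}
b = LazyClient("B", ["block_3", "block_4"], server_blocks)
data = b.serve("block_3")   # block_3 is fetched and cached only now
```

After this single request, client B's cache holds only block 3: block 4, though part of its caching task, is never fetched until someone asks for it, which is what saves local cache and spreads out the load on the server.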
In one possible implementation manner, the first client acquires meta information from the server and sends a registration request to the server, so that the server sends registration information of a plurality of clients in the training system to the first client according to the registration request, wherein the meta information includes information of each piece of training data. And then the first client determines a training data block indicated by a cache task allocated to each client in the plurality of clients according to the registration information and the meta information of the plurality of clients.
In this implementation, the first client may register with the server. Fig. 7 shows a block diagram of a client registering in accordance with an embodiment of the disclosure. The first client may obtain meta information of the training data from the server, where the meta information may include information of a plurality of training data, for example, information of a name, a data length, a data offset address, a located training data block, and the like of each training data, and the meta information may be used by the first client to determine a target training data block where the target training data is located. The first client can also start a registration process, send a registration request to the server, and register with the server. The registration request may carry related information of the first client, for example, the registration request may carry address information of the first client, a task identifier, an available memory of a node where the registration request is located, and the like. Here, the address information of the first client may include an IP address and a port address. For example, the first client may select a port randomly or according to a rule in the currently idle ports and send the port address of the port to the server, so that the occurrence of port collision of multiple clients in a node can be reduced. Here, a task identifier may be used to indicate a task for which the first client requests registration, and the task identifier may be obtained from an environment variable of the training system.
The server may send registration information for a plurality of clients in the training system to the first client after receiving a registration request of the first client. Here, the registration information may include address information of the client, a process level, available memory, and the like. For example, the server may receive registration requests of a plurality of clients in the training system, and record information such as an IP address, a port address, a task identifier, and an available memory of a node where the server is located, which are carried in the registration request of each client. The server may obtain the number of processes in the environment variable of the training system, where the number of processes may be the number of registration processes initiated by the client. Then, under the condition of receiving registration requests of a plurality of clients in the training system, that is, under the condition that the received registration requests are greater than or equal to the number of processes, different process grades can be allocated to different clients according to the number of the clients initiating the registration requests, and registration information such as address information, the process grades, available memory and the like of each client is sent to the plurality of clients.
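The server-side flow described above — collect registration requests, wait until there are at least as many as the configured process number, then assign each client a unique process level and send the registry back — might look like the following. A hypothetical sketch: the field names, the arrival-order level rule, and the `None`-while-waiting convention are my own.

```python
# Hypothetical sketch of the server's registration handling.

def register_all(requests, process_count):
    """requests: list of dicts with ip, port, task_id, available_memory.
    Returns the registry once enough requests have arrived, else None."""
    if len(requests) < process_count:
        return None                          # keep waiting for registrations
    registry = {}
    for level, req in enumerate(requests):   # e.g. level by arrival order
        registry[(req["ip"], req["port"])] = {
            "process_level": level,
            "available_memory": req["available_memory"],
            "task_id": req["task_id"],
        }
    return registry                          # to be sent to all clients

reqs = [
    {"ip": "10.0.0.1", "port": 9090, "task_id": "job_1", "available_memory": 64},
    {"ip": "10.0.0.2", "port": 9091, "task_id": "job_1", "available_memory": 32},
]
registry = register_all(reqs, process_count=2)
```

Keying the registry by (IP, port) reflects the point above that each client reports a free port of its own, so several clients on one node remain distinguishable.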
Further, the first client may determine, according to the registration information of the multiple clients and the obtained meta information, a caching task that the server allocates to each client, that is, may determine a correspondence between each client and a training data block to be cached by the client. In this way, the first client may locally calculate the correspondence between each client and the training data block, thereby reducing the server stress caused by the client accessing the server.
In one possible implementation manner, the first client may send the available memory of the node where the first client is located to the server, so that the server determines the cache task allocated to the first client according to the available memories of the plurality of nodes of the training system.
In this implementation manner, the first client may send the available memory of the node where the first client is located to the server, for example, the first client may carry the available memory of the node where the first client is located in the registration request for registering with the server, or the first client may send the available memory of the node where the first client is located to the server before sending the registration request to the server.
The server may allocate a caching task of training data blocks to each client according to the available memory of each client in the plurality of clients. The server may determine, according to the available memory of each client, the cache size of the node where each client is located (the cache size refers to the cache on the node used for storing training data blocks), and then allocate, according to the cache size of the node where each client is located, a caching task of training data blocks to each client, so that the training data blocks indicated by the caching task allocated to each client match the cache size of the client.
Here, the cache size of the node where each client is located may be calculated according to the following formula (1):
cache size = available memory of the node × (data set size / total memory)   (1)
where the data set may be the data set formed by all the training data blocks, and the total memory may be the sum of the available memories of all the clients.
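Formula (1) can be written out directly as code; the variable names and the example figures below are my own, not from the patent.

```python
# Formula (1): cache size = node available memory * (dataset size / total memory)

def cache_size(node_available_memory, dataset_size, total_memory):
    return node_available_memory * dataset_size / total_memory

# e.g. a node with 64 GB free, a 100 GB dataset, and 200 GB free in total
size = cache_size(64, 100, 200)   # each node caches in proportion to its memory
```

A useful property of this proportional split is that the per-node cache sizes sum to the data set size, so the whole data set is covered exactly once across the nodes.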
Here, when determining the caching task allocated to each client according to the available memory of the node where each client is located, the server may sequentially allocate the ordered training data blocks to the clients according to the process level of each client, for example, allocating the training data blocks ranked at the front to the clients with small process levels. Following the same allocation rule, the first client can determine the caching task of training data blocks allocated to each client according to the available memory of the node where each client is located, the process level, and the meta information of the training data blocks. In this way, the first client can quickly determine the correspondence between each client and the training data blocks to be cached, and can quickly determine the target client for the target training data when receiving the first data access request.
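The allocation rule just described — walk the clients in process-level order and hand each one a contiguous run of the ordered block list sized to its cache — can be sketched as below. The per-client capacities here are expressed as block counts for simplicity, and all names and numbers are illustrative assumptions.

```python
# Sketch of the deterministic allocation rule: clients sorted by process level
# each take the next slice of the ordered training data block list.

def allocate_blocks(clients, blocks):
    """clients: list of (name, process_level, capacity_in_blocks);
    blocks: ordered list of training data block ids."""
    plan, cursor = {}, 0
    for name, _, capacity in sorted(clients, key=lambda c: c[1]):
        plan[name] = blocks[cursor:cursor + capacity]
        cursor += capacity
    return plan

plan = allocate_blocks(
    [("client_B", 0, 2), ("client_A", 1, 3), ("client_C", 2, 1)],
    ["blk1", "blk2", "blk3", "blk4", "blk5", "blk6"],
)
```

Because the rule is deterministic, every client that knows the registration information and the meta information computes the same `plan` locally, which is what lets the first client resolve block-to-client correspondences without querying the server.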
It should be noted that, the client and the server provided in the embodiment of the present disclosure may be configured in the same electronic device, or may be configured in different electronic devices, and the embodiment of the present disclosure does not limit a specific configuration manner. The embodiment of the disclosure provides a scheme of client distributed dynamic cache, which can realize high-efficiency information communication and meet the requirements of different users on cache modes.
The training system and the training data access method provided by the embodiment of the disclosure can cache the training data block in the node, and can make full use of cache resources of a plurality of nodes. The first client can access the training data cached by the first client in the node where the first client is located, data interaction with other clients is achieved, the speed of loading target training data is improved, the clients can communicate with each other efficiently and reliably, and the communication performance of the training data is improved.
It is understood that the above-mentioned method embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principle and logic; due to space limitations, the details are not repeated in the present disclosure. Those skilled in the art will appreciate that, in the above methods of the specific embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
In addition, the present disclosure also provides a training data access apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any one of the training data access methods provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method section, which are not repeated here for brevity.
Fig. 8 shows a block diagram of a training system according to an embodiment of the present disclosure, as shown in fig. 8, the system comprising: a server 31 and a plurality of nodes 32,
the server 31 is configured to allocate cache tasks to the plurality of nodes;
the node 32 is configured to cache the training data block based on the cache task allocated by the server;
the node comprises at least one client 33;
a first client 331 of the at least one client, configured to obtain a first data access request for target training data;
the first client 331 is configured to determine, in response to the first data access request, a target client for caching the target training data, so as to obtain the target training data from a node where the target client is located.
In a possible implementation manner, the first client 331 is further configured to determine, in response to the first data access request, the target client for caching a target training data block among the plurality of clients according to registration information of the plurality of clients in the training system and meta information, where the meta information includes information of each piece of training data.
In a case that the target client is the first client, the first client 331 is configured to obtain a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In a possible implementation manner, in a case that the target client is a second client, the first client 331 is configured to send a second data access request for the target training data to the second client, where the first client is different from the second client, and the first client and the second client belong to different nodes;
the second client is configured to, in response to the second data access request, obtain the target training data block from the node where the second client is located, and send the target training data block to the first client;
the first client 331 is further configured to obtain the target training data from the target training data block sent by the second client.
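The local and remote read paths can be sketched as below. This is a simplified in-process model; in a real system the "second data access request" would travel over the network (e.g. sockets or RPC), and the class and method names here are illustrative assumptions, not from the patent.

```python
# Sketch: the first client serves a sample from its own node's cache when it
# holds the block, and otherwise requests the whole block from the target peer.

class Client:
    def __init__(self, name, node_cache):
        self.name = name
        self.node_cache = node_cache  # blocks cached in this client's node

    def handle_request(self, block_id):
        # Acting as the second client: serve the block from the node's cache.
        return self.node_cache[block_id]

    def fetch_sample(self, sample_id, block_id, target):
        # Acting as the first client: read locally if possible, else ask peer.
        if block_id in self.node_cache:
            block = self.node_cache[block_id]
        else:
            block = target.handle_request(block_id)  # "second data access request"
        return block[sample_id]

second = Client("second", {"blk2": {"img_7": b"jpeg-bytes"}})
first = Client("first", {})
print(first.fetch_sample("img_7", "blk2", second))  # b'jpeg-bytes'
```

Note that the peer returns the whole target training data block, matching the patent's description; the first client then extracts the target training data from it.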
In a possible implementation manner, the target client is configured to obtain a cache task that is allocated by the server to the target client;
the target client is configured to cache the training data block to be cached, indicated by the caching task, to the node where the target client is located, where the training data block to be cached includes the target training data block.
In a possible implementation manner, the target client is further configured to cache the training data block to be cached, which is indicated by the caching task, to the node where the target client is located when the plurality of clients in the training system complete registration.
In a possible implementation manner, the target client is further configured to, when the target client receives a second data access request, determine a target training data block in which target training data indicated by the second data access request is located, and obtain the target training data block from the server, so as to cache the target training data block in a node in which the target client is located.
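The on-demand path above (fetch from the server on a cache miss, then cache) can be sketched as follows; the `Server`/`CachingClient` names and in-memory storage are illustrative assumptions for this sketch.

```python
# Sketch of on-demand caching: if a requested block is not yet in the node's
# cache, the client pulls it from the server, caches it, then serves it.

class Server:
    def __init__(self, storage):
        self.storage = storage  # block_id -> block contents

    def read_block(self, block_id):
        return self.storage[block_id]

class CachingClient:
    def __init__(self, server):
        self.server = server
        self.node_cache = {}

    def get_block(self, block_id):
        if block_id not in self.node_cache:  # cache miss
            self.node_cache[block_id] = self.server.read_block(block_id)
        return self.node_cache[block_id]

srv = Server({"blk9": {"img_3": "sample-bytes"}})
cli = CachingClient(srv)
cli.get_block("blk9")
assert "blk9" in cli.node_cache  # the block is now cached in the node
```

Subsequent requests for the same block are then served from the node's cache without contacting the server again.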
In a possible implementation manner, the target client is configured to obtain meta information from the server, and send a registration request to the server, where the meta information includes information of each piece of training data;
the server is further used for sending registration information of a plurality of clients in the training system to the target client according to the registration request;
and the target client is further used for determining a training data block indicated by the cache task allocated to each client in the plurality of clients according to the registration information of the plurality of clients and the meta information.
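The registration handshake above can be sketched as below: a client pulls the meta information, registers with the server, and uses the returned registration information plus the meta information to derive the block assignment of every client. The class names and the round-robin derivation are illustrative assumptions, not from the patent.

```python
# Sketch of the registration flow between a client and the server.

class RegistryServer:
    def __init__(self, meta):
        self.meta = meta          # {sample_id: block_id}
        self.registered = []

    def register(self, client_id):
        self.registered.append(client_id)
        return list(self.registered)  # registration info of all clients so far

srv = RegistryServer({"img_0": "blk0", "img_1": "blk1"})
meta = srv.meta                   # step 1: obtain the meta information
peers = srv.register("client-A")  # step 2: send the registration request
blocks = sorted(set(meta.values()))
# step 3: derive the block -> client mapping from registration info + meta info
assignment = {b: peers[i % len(peers)] for i, b in enumerate(blocks)}
print(assignment)  # {'blk0': 'client-A', 'blk1': 'client-A'}
```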
In a possible implementation manner, the target client is further configured to send the available memory of the node where the target client is located to the server;
the server is further configured to determine a cache task allocated to the target client according to the available memory of the plurality of nodes of the training system.
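One way the server could size cache tasks from the reported available memory is to split the block list across clients in proportion to each node's free memory; the proportional policy, numbers, and names below are illustrative assumptions, since the patent does not fix a particular allocation rule.

```python
# Sketch: allocate contiguous slices of the block list, sized in proportion
# to each client's reported available memory.

def allocate_cache_tasks(block_ids, available_mem):
    """available_mem: {client: bytes free}; returns {client: blocks to cache}."""
    total = sum(available_mem.values())
    tasks, start = {}, 0
    clients = sorted(available_mem)
    for i, c in enumerate(clients):
        share = round(len(block_ids) * available_mem[c] / total)
        # The last client takes whatever remains, so every block is assigned.
        end = len(block_ids) if i == len(clients) - 1 else start + share
        tasks[c] = block_ids[start:end]
        start = end
    return tasks

tasks = allocate_cache_tasks([f"blk{i}" for i in range(6)], {"A": 2_000, "B": 1_000})
print(tasks)  # A receives roughly twice as many blocks as B
```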
In a possible implementation manner, the target client is the client with the lowest process level in the node, wherein the process level of the client is determined by the server.
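Selecting the caching client by lowest process level resembles picking the local rank-0 process in distributed training; a minimal sketch, assuming the server has already assigned a numeric level to each client process on the node:

```python
# Sketch: among the client processes on one node, the one with the lowest
# server-assigned process level acts as the caching client for that node.

def pick_cache_client(process_levels):
    """process_levels: {client: level assigned by the server}."""
    return min(process_levels, key=process_levels.get)

print(pick_cache_client({"worker-2": 2, "worker-0": 0, "worker-1": 1}))  # worker-0
```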
Fig. 9 shows a block diagram of a training data access device according to an embodiment of the present disclosure, which is applied to a training system, the system comprising: the system comprises a server and a plurality of nodes, wherein the server is used for distributing caching tasks for the nodes; the node is used for caching the training data block based on the caching task distributed by the server; the node comprises at least one client; the apparatus is deployed in a first client of the at least one client, the apparatus comprising:
an obtaining module 41, configured to obtain a first data access request for target training data;
and the processing module 42 is configured to determine, in response to the first data access request, a target client for caching the target training data, so as to obtain the target training data from a node where the target client is located.
In a possible implementation manner, the processing module 42 is specifically configured to determine, in response to the first data access request, the target client for caching the target training data block among the plurality of clients according to registration information of the plurality of clients in the training system and meta information, where the meta information includes information of each piece of training data.
In a possible implementation manner, the processing module 42 is specifically configured to, when the target client is the first client, obtain a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In a possible implementation manner, the processing module 42 is specifically configured to send a second data access request for the target training data to the second client when the target client is the second client, so that the second client responds to the second data access request, obtains the target training data block from a node where the second client is located, and sends the target training data block to the first client; acquiring the target training data from the target training data block sent by the second client;
the first client is different from the second client, and the first client and the second client belong to different nodes.
In one possible implementation, the apparatus further includes:
the cache module is used for acquiring a cache task distributed by the server for the first client before the first client acquires the target training data from the node where the target client is located; and caching the training data blocks to be cached indicated by the caching task into the node where the first client is located, wherein the training data blocks to be cached comprise the target training data blocks.
In a possible implementation manner, the caching module is further configured to, before the target training data is obtained from the node where the target client is located and under the condition that the plurality of clients in the training system have completed registration, cache the training data block to be cached, which is indicated by the caching task, in the node where the first client is located.
In a possible implementation manner, the cache module is further configured to, before the target training data is obtained from the node where the target client is located, determine a target training data block where the target training data indicated by the first data access request is located under the condition that the first data access request is received, and obtain the target training data block from the server, so as to cache the target training data block in the node where the first client is located.
In one possible implementation, the apparatus further includes:
the registration module is used for acquiring meta-information from the server and sending a registration request to the server so that the server sends registration information of a plurality of clients in the training system to the first client according to the registration request, wherein the meta-information comprises information of each training data;
the determining module is further configured to determine, according to the registration information of the plurality of clients and the meta information, a training data block indicated by the cache task allocated to each of the plurality of clients.
In one possible implementation, the apparatus further includes:
and the sending module is used for sending the available memory of the node where the first client is located to the server so that the server determines the cache task allocated to the first client according to the available memories of the nodes of the training system.
In a possible implementation manner, the first client is the client with the lowest process level in the node where the first client is located, wherein the process level of the client is determined by the server.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the method embodiments above; for specific implementations, refer to the descriptions of the method embodiments above, which are not repeated here for brevity.
Embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the above-mentioned method. The computer readable storage medium may be a non-volatile computer readable storage medium.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the memory-stored instructions to perform the above-described method.
The embodiments of the present disclosure also provide a computer program product, which includes computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the training data access method provided in any of the above embodiments.
The embodiments of the present disclosure also provide another computer program product for storing computer-readable instructions, which, when executed, cause a computer to perform the operations of the training data access method provided in any of the above embodiments.
Fig. 10 shows a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, the electronic device 1900 may be provided as a server. Referring to fig. 10, electronic device 1900 includes a processing component 1922 further including one or more processors and memory resources, represented by memory 1932, for storing instructions, e.g., applications, executable by processing component 1922. The application programs stored in memory 1932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1922 is configured to execute instructions to perform the above-described method.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), can be personalized by utilizing the state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, the computer program product is embodied in a software product, such as a software development kit (SDK) or the like.
Having described embodiments of the present disclosure, the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or technical improvements over technologies in the marketplace, and to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A training system, the system comprising: a server and a plurality of nodes, wherein,
the server is used for allocating caching tasks to the plurality of nodes;
the node is used for caching the training data block based on the caching task allocated by the server;
the node comprises at least one client;
a first client of the at least one client is used for acquiring a first data access request aiming at target training data;
the first client is used for responding to the first data access request, determining a target client for caching the target training data, and acquiring the target training data from a node where the target client is located.
2. The training system of claim 1, wherein the first client is further configured to determine, in response to the first data access request, the target client of the plurality of clients for caching a target training data block according to registration information of the plurality of clients in the training system and meta information, the meta information including information of each training data.
3. The training system according to claim 1 or 2, wherein, in a case that the target client is the first client, the first client is configured to obtain a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
4. The training system according to claim 1 or 2, wherein, in a case that the target client is a second client, the first client is configured to send a second data access request for the target training data to the second client, the first client is different from the second client, and the first client and the second client belong to different nodes;
the second client is used for responding to the second data access request, acquiring the target training data block from the node where the second client is located, and sending the target training data block to the first client;
the first client is further configured to obtain the target training data from the target training data block sent by the second client.
5. The training system according to any one of claims 1 to 4, wherein the target client is configured to obtain a caching task allocated by the server to the target client;
the target client is configured to cache the training data block to be cached, indicated by the caching task, to the node where the target client is located, where the training data block to be cached includes the target training data block.
6. The training system of claim 5, wherein the target client is further configured to cache the training data block to be cached, indicated by the caching task, to the node where the target client is located when a plurality of clients in the training system complete registration.
7. A training data access method, wherein the method is applied to a training system, and the system comprises: a server and a plurality of nodes, wherein,
the server is used for allocating caching tasks to the plurality of nodes;
the node is used for caching the training data block based on the caching task allocated by the server;
the node comprises at least one client;
a first client in the at least one client acquires a first data access request aiming at target training data;
and the first client responds to the first data access request, determines a target client for caching the target training data, and acquires the target training data from a node where the target client is located.
8. A training data access apparatus, wherein the apparatus is applied to a training system, the system comprising: the system comprises a server and a plurality of nodes, wherein the server is used for distributing caching tasks for the nodes; the node is used for caching the training data block based on the caching task distributed by the server; the node comprises at least one client;
the apparatus is deployed in a first client of the at least one client, the apparatus comprising:
the acquisition module is used for acquiring a first data access request aiming at target training data;
and the processing module is used for responding to the first data access request, determining a target client for caching the target training data, and acquiring the target training data from a node where the target client is located.
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of claim 7.
10. A computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the method of claim 7.
CN201911167520.0A 2019-11-25 2019-11-25 Training system, training data access method and device, electronic equipment and medium Active CN112839071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911167520.0A CN112839071B (en) 2019-11-25 2019-11-25 Training system, training data access method and device, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN112839071A true CN112839071A (en) 2021-05-25
CN112839071B CN112839071B (en) 2024-01-05

Family

ID=75922991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911167520.0A Active CN112839071B (en) 2019-11-25 2019-11-25 Training system, training data access method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN112839071B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563499A (en) * 2021-12-02 2023-01-03 华为技术有限公司 Method, device and system for training model and computing node

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040054777A1 (en) * 2002-09-16 2004-03-18 Emmanuel Ackaouy Apparatus and method for a proxy cache
US6718372B1 (en) * 2000-01-07 2004-04-06 Emc Corporation Methods and apparatus for providing access by a first computing system to data stored in a shared storage device managed by a second computing system
US20140012936A1 (en) * 2012-07-05 2014-01-09 Hitachi, Ltd. Computer system, cache control method and computer program
CN104618482A (en) * 2015-02-02 2015-05-13 浙江宇视科技有限公司 Cloud data access method, server, traditional storage device and architecture
US9058122B1 (en) * 2012-08-30 2015-06-16 Google Inc. Controlling access in a single-sided distributed storage system
JP2016110175A (en) * 2014-12-02 2016-06-20 三菱電機株式会社 Client device, communication system, and data processing method and program
CN106982245A (en) * 2016-01-15 2017-07-25 Ls 产电株式会社 Supervise the client and server in Control & data acquisition system
US20170374151A1 (en) * 2016-06-28 2017-12-28 Solano Labs, Inc. Systems and methods for efficient distribution of stored data objects
CN110262901A (en) * 2019-06-27 2019-09-20 深圳前海微众银行股份有限公司 A kind of data processing method and data processing system

Also Published As

Publication number Publication date
CN112839071B (en) 2024-01-05

Similar Documents

Publication Publication Date Title
US11146502B2 (en) Method and apparatus for allocating resource
CN111614738B (en) Service access method, device, equipment and storage medium based on Kubernetes cluster
CN109995881B (en) Load balancing method and device of cache server
US9647892B2 (en) Cloud-based service resource provisioning based on network characteristics
US20170192819A1 (en) Method and electronic device for resource allocation
CN112637287B (en) Load balancing method and equipment
US20210337452A1 (en) Sharing geographically concentrated workload among neighboring mec hosts of multiple carriers
CN111163130A (en) Network service system and data transmission method thereof
CN110177047B (en) Message sending method, device, electronic equipment and computer readable storage medium
CN112261094A (en) Message processing method and proxy server
CN111124299A (en) Data storage management method, device, equipment, system and storage medium
US20230216895A1 (en) Network-based media processing (nbmp) workflow management through 5g framework for live uplink streaming (flus) control
US10986066B2 (en) Systems, apparatuses, methods, and non-transitory computer readable media for efficient call processing
CN111694639B (en) Updating method and device of process container address and electronic equipment
US10237233B2 (en) Allocating identifiers with minimal fragmentation
US20170171211A1 (en) Connecting and Retrieving Security Tokens Based on Context
CN112839071B (en) Training system, training data access method and device, electronic equipment and medium
US10785288B2 (en) Deferential support of request driven cloud services
CN112306685A (en) Task isolation method and device, electronic equipment and computer readable medium
US10143022B2 (en) Dynamic generation of geographically bound MANET IDs
CN114885024B (en) Routing method, device, equipment and medium of application instance
CN111953718A (en) Page debugging method and device
CN115562871A (en) Memory allocation management method and device
CA2986758C (en) Systems and methods for server failover and load balancing
CN112040023B (en) Object access method and device, electronic equipment and machine-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant