CN117574978A - Graph model training method, device, equipment and storage medium - Google Patents

Graph model training method, device, equipment and storage medium

Info

Publication number
CN117574978A
Authority
CN
China
Prior art keywords
gpu
sampling
node set
full
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311338658.9A
Other languages
Chinese (zh)
Inventor
焦学武
胡伟
黄正杰
骆新生
胡明清
杨俊超
李伟彬
冯丹蕾
戴斯铭
李淼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202311338658.9A priority Critical patent/CN117574978A/en
Publication of CN117574978A publication Critical patent/CN117574978A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0499Feedforward networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a graph model training method, device, equipment and storage medium, relates to the field of computer technology, and in particular to the fields of deep learning and distributed data processing. The specific implementation scheme is as follows: according to the topological structure of the full-graph data, at least two GPU servers are adopted to perform walk sampling on the full-graph data respectively, so as to obtain at least two target sampling results; attribute and parameter pulling is performed on the target sampling results to obtain target parameter results; and the graph model is trained by adopting the at least two target sampling results and the at least two target parameter results. Through this technical scheme, the training efficiency of the graph model can be improved.

Description

Graph model training method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to the field of deep learning techniques and distributed data processing techniques.
Background
Graph model technology has become a research hotspot in recent years and has been widely deployed in Internet search and promotion scenarios. Internet vendors need to introduce more data and train larger-scale, more complex graph models. Conventional training schemes (including distributed CPU schemes and single-machine GPU schemes) cannot meet these requirements and face challenges in performance, scale, multi-machine coordination, and model complexity.
Disclosure of Invention
The present disclosure provides a graph model training method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided a graph model training method, the method including:
according to the topological structure of the full-graph data, performing walk sampling on the full-graph data with at least two GPU servers respectively, to obtain at least two target sampling results;
performing attribute and parameter pulling on the target sampling results to obtain target parameter results;
and training the graph model by adopting the at least two target sampling results and the at least two target parameter results.
According to another aspect of the present disclosure, there is provided a graph model training apparatus, the apparatus including:
the target sampling result determining module is used for performing walk sampling on the full-graph data with at least two GPU servers respectively according to the topological structure of the full-graph data to obtain at least two target sampling results;
the target parameter result determining module is used for carrying out attribute and parameter pulling on the target sampling results to obtain target parameter results;
and the graph model training module is used for training the graph model by adopting the at least two target sampling results and the at least two target parameter results.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the graph model training method of any one embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the graph model training method of any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a graph model training method according to any embodiment of the present disclosure.
According to the technology of the present disclosure, the efficiency of graph model training can be improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a graph model training method provided in accordance with an embodiment of the present disclosure;
FIG. 2 is a flow chart of another graph model training method provided in accordance with an embodiment of the present disclosure;
FIG. 3 is a flow chart of yet another graph model training method provided in accordance with an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a graphical model training apparatus provided in accordance with an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing the graph model training method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that the terms "first," "second," "target," "candidate," and the like in the description and claims of the present invention and in the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In addition, in the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of the related full-graph data all comply with the requirements of relevant laws and regulations and do not violate public order and good customs.
Fig. 1 is a flowchart of a graph model training method provided in accordance with an embodiment of the present disclosure. The method is applicable to training a graph model in an Internet search and promotion scenario. The method may be performed by a graph model training apparatus, which may be implemented in software and/or hardware, and may be integrated in an electronic device, such as a server, carrying the graph model training function. As shown in fig. 1, the graph model training method of the present embodiment may include:
s101, according to the topological structure of the full-image data, at least two GPU servers are adopted to carry out free sampling on the full-image data respectively, and at least two target sampling results are obtained.
In this embodiment, the full-graph data refers to graph data related to an Internet search and promotion scenario, and may be, for example, a recommendation-system graph or a social-network graph; it contains large-scale nodes and edges, as well as node attributes and edge attributes, typically on the order of hundreds of millions. The target sampling result is the graph data obtained by a GPU server performing walk sampling on the full-graph data, including the relevant nodes and edges together with their node attributes and edge attributes. The topological structure refers to the topological relations among the nodes in the graph data.
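For intuition, the following minimal Python sketch shows what walk sampling over a topology produces: the nodes and edges reached by random walks started from given nodes. The adjacency-list representation, the function name and the toy graph are illustrative assumptions, not part of the disclosed method.

```python
import random

def walk_sample(adjacency, start_nodes, walk_len=3):
    """Perform simple random walks from each start node.

    `adjacency` maps a node id to the list of its neighbour ids; it stands in
    for the topological structure of the full-graph data.
    """
    sampled_edges = []
    for start in start_nodes:
        current = start
        for _ in range(walk_len):
            neighbours = adjacency.get(current)
            if not neighbours:
                break
            nxt = random.choice(neighbours)   # walk one step along the topology
            sampled_edges.append((current, nxt))
            current = nxt
    return sampled_edges

# toy graph: four nodes, undirected edges stored in both directions
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(walk_sample(adj, start_nodes=[0, 3]))
```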
In an alternative mode, the full-graph data can be divided into at least two parts according to its topological structure, each part is sent to a corresponding GPU server, and each GPU server performs walk sampling on its part to obtain the target sampling result corresponding to that GPU server.
S102, carrying out attribute and parameter pulling on the target sampling result to obtain a target parameter result.
In this embodiment, the target parameter result refers to the data obtained by pulling the attributes and parameters of the target sampling result, including the attribute vector data of nodes, the attribute vector data of edges, and the like. Note that the nodes and edges in the full-graph data, together with their attributes, may be stored on a Solid State Disk (SSD) in vector form. Attributes are the vectorized data of nodes; parameters are the vectorized data of the related attributes.
Optionally, after each GPU server obtains its target sampling result, it pulls the attributes and parameters of that result to obtain the target parameter result. Specifically, after the target sampling result is obtained, the attributes and the parameters are pulled from the SSD into CPU memory and then loaded from CPU memory into GPU video memory, yielding the target parameter result.
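A rough sketch of the SSD-to-CPU-to-GPU pull chain described above is given below; the row-wise binary layout of the attribute table and the function name are assumptions made for illustration, not details taken from the disclosure.

```python
import numpy as np
import torch

def pull_attributes(ssd_path, node_ids, dim):
    """Sketch of the attribute/parameter pull chain: SSD -> CPU memory -> GPU memory.

    Assumes node attribute vectors are stored row-wise as float32 in a binary
    file on the SSD; a memory-mapped numpy array stands in for that storage.
    """
    # Stage 1: the full attribute table stays on the SSD, accessed lazily.
    ssd_table = np.memmap(ssd_path, dtype=np.float32, mode="r").reshape(-1, dim)

    # Stage 2: pull only the rows needed by the sampling result into CPU memory.
    cpu_block = np.asarray(ssd_table[np.asarray(node_ids)])

    # Stage 3: load the CPU block into GPU video memory if a GPU is present.
    tensor = torch.from_numpy(cpu_block)
    return tensor.cuda() if torch.cuda.is_available() else tensor
```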
S103, training the graph model by adopting at least two target sampling results and at least two target parameter results.
In this embodiment, the graph model may be a model composed of a graph neural network.
Specifically, at least two target sampling results and at least two target parameter results are used to perform batch iterative training of the graph model until a training stopping condition is met, at which point training stops. The stopping condition may be that the training loss stabilizes within a set range, or that the number of iterations reaches a set number; the set range and the set number can be chosen by those skilled in the art according to the actual situation.
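The batch-iterative training loop with the two stopping conditions mentioned above might look roughly as follows; the model, the batch format and the hyperparameter names are illustrative assumptions.

```python
import torch

def train_graph_model(model, batches, lr=1e-3, max_iters=1000, loss_window=20, tol=1e-4):
    """Batch-iterative training with the two stopping conditions described above:
    the loss stabilises within a set range, or the iteration count reaches a set number.
    `batches` is assumed to yield (features, labels) pairs built from the sampling
    and parameter results.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for step, (x, y) in enumerate(batches):
        if step >= max_iters:                      # stopping condition 2: iteration budget
            break
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        loss.backward()
        opt.step()
        history.append(loss.item())
        if len(history) >= loss_window:            # stopping condition 1: loss stabilised
            recent = history[-loss_window:]
            if max(recent) - min(recent) < tol:
                break
    return model
```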
According to the technical scheme provided by this embodiment of the disclosure, walk sampling is performed on the full-graph data by at least two GPU servers respectively according to the topological structure of the full-graph data to obtain at least two target sampling results, attribute and parameter pulling is then performed on the target sampling results to obtain target parameter results, and the graph model is trained with the at least two target sampling results and the at least two target parameter results. Graph walk sampling is a performance bottleneck of graph model training; the related art performs walk sampling on a single-machine GPU, but because GPU video memory is small relative to the large graph scale, the walk sampling is limited. By distributing the walk sampling across at least two GPU servers, this scheme removes that limitation and improves the efficiency of graph model training.
FIG. 2 is a flow chart of another graph model training method provided in accordance with an embodiment of the present disclosure. Based on the above embodiment, the present embodiment further refines "performing walk sampling on the full-graph data with at least two GPU servers respectively according to the topological structure of the full-graph data, to obtain at least two target sampling results", and provides an alternative embodiment. As shown in fig. 2, the graph model training method of the present embodiment may include:
s201, segmenting the full graph data according to the topological structure of the full graph data to obtain at least two subgraphs.
The number of subgraphs is the same as the number of GPU servers. Note that the subgraphs are stored on the SSD.
In an alternative mode, on the premise that the number of subgraphs equals the number of GPU servers, the full-graph data is split based on its topological structure to obtain at least two subgraphs. Further, still ensuring that the number of subgraphs equals the number of GPU servers, the full-graph data can be split based on both its topological structure and the size of the GPU video memory in each GPU server, to obtain at least two subgraphs.
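One possible, purely illustrative way to split the full graph into as many subgraphs as there are GPU servers, guided by the topology and an optional per-server capacity standing in for the GPU memory budget, is sketched below; the patent does not prescribe a particular partitioning algorithm.

```python
from collections import defaultdict

def split_full_graph(adjacency, num_servers, max_nodes_per_part=None):
    """Greedy, topology-aware split of the full graph into `num_servers` subgraphs.

    Each node goes to the part that already holds most of its neighbours,
    subject to an optional per-part capacity (a crude stand-in for the GPU
    video-memory budget of each server). This heuristic is only an example of
    "splitting according to the topological structure".
    """
    cap = max_nodes_per_part or (len(adjacency) // num_servers + 1)
    assignment = {}
    parts = defaultdict(set)
    for node, neighbours in adjacency.items():
        # count how many neighbours each part already owns
        scores = [sum(1 for n in neighbours if assignment.get(n) == p)
                  for p in range(num_servers)]
        # prefer the highest score among parts that still have room
        order = sorted(range(num_servers), key=lambda p: (-scores[p], len(parts[p])))
        target = next((p for p in order if len(parts[p]) < cap), order[0])
        assignment[node] = target
        parts[target].add(node)
    return parts

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
print(split_full_graph(adj, num_servers=2))
```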
S202, at least two sub-graphs are distributed to at least two GPU servers.
Specifically, the at least two subgraphs are placed in one-to-one correspondence with the at least two GPU servers, so that each GPU server can subsequently perform walk sampling on its corresponding subgraph.
S203, according to the topological structure of the full graph data, adopting the GPU server to carry out the walk sampling on the subgraph corresponding to the GPU server, and obtaining the target sampling result corresponding to the GPU server.
Optionally, for each GPU server, the GPU server performs multiple rounds of walk sampling on its corresponding subgraph to obtain the target sampling result corresponding to that GPU server. It should be noted that the at least two GPU servers may perform walk sampling on their corresponding subgraphs in parallel.
S204, carrying out attribute and parameter pulling on the target sampling result to obtain a target parameter result.
S205, training the graph model by adopting at least two target sampling results and at least two target parameter results.
According to the technical scheme provided by this embodiment of the disclosure, the full-graph data is split according to its topological structure to obtain at least two subgraphs, the at least two subgraphs are distributed to at least two GPU servers, each GPU server then performs walk sampling on its corresponding subgraph according to the topological structure of the full-graph data to obtain its target sampling result, attribute and parameter pulling is further performed on the target sampling results to obtain target parameter results, and finally the graph model is trained with the at least two target sampling results and the at least two target parameter results. In this technical scheme, splitting the full-graph data and walk-sampling the subgraphs reduces the proportion of cross-machine node accesses during walk sampling.
On the basis of the above embodiment, as an optional manner of the present disclosure, performing walk sampling on the subgraph corresponding to the GPU server with the GPU server according to the topological structure of the full-graph data, to obtain the target sampling result corresponding to the GPU server, includes: evenly dividing the subgraph into at least two initial node sets; obtaining starting node sets according to the topological structure of the full-graph data and the initial node sets; and performing multiple rounds of walk sampling on the at least two starting node sets with the GPU server according to the topological structure of the full-graph data, to obtain the target sampling result corresponding to the GPU server.
The initial node set refers to a set formed by nodes in the subgraph. The starting node set refers to a set comprising nodes in the subgraph and the neighbor nodes of those nodes.
Specifically, the subgraph can be randomly divided into at least two initial node sets, with the same number of nodes in each initial node set. According to the topological structure of the full-graph data, some neighbor nodes of the nodes in each initial node set are selected from the full-graph data, thereby obtaining the starting node sets. Then, according to the topological structure of the full-graph data, with the nodes in the at least two starting node sets respectively taken as sampling start nodes, the GPU server serially performs walk sampling on the at least two starting node sets, obtaining the target sampling result corresponding to the GPU server.
It can be understood that by further dividing the subgraph and having the GPU server serially perform walk sampling on the multiple starting node sets, the problem of limited video memory can be avoided and sampling efficiency improved.
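A minimal sketch of this further division and the serial walk sampling over the resulting starting node sets follows; for brevity it expands only 1-hop neighbors and takes the walk-sampling routine as a parameter, both of which are simplifying assumptions.

```python
def make_initial_node_sets(subgraph_nodes, num_sets):
    """Evenly divide the nodes of a subgraph into `num_sets` initial node sets."""
    nodes = sorted(subgraph_nodes)
    return [nodes[i::num_sets] for i in range(num_sets)]

def serial_walk_sampling(adjacency, initial_sets, sampler, walk_len=3):
    """Serially walk-sample each starting node set on one GPU server.

    `sampler(adjacency, start_nodes, walk_len)` is any walk-sampling routine
    (for example, the sketch shown earlier). Serial processing keeps only one
    starting node set resident in GPU memory at a time, which is the point of
    the further division described above.
    """
    target_result = []
    for init_set in initial_sets:
        # expand with neighbours to form the starting node set (1-hop here for brevity)
        starting_set = set(init_set)
        for node in init_set:
            starting_set.update(adjacency.get(node, []))
        target_result.extend(sampler(adjacency, sorted(starting_set), walk_len))
    return target_result
```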
It should be noted that the full-graph data, the subgraphs and the initial node sets are all stored on the solid state disk (SSD). It can be understood that by storing the full-graph data, the subgraphs, the initial node sets and other related data on the SSD, the full-graph data is stored in a heterogeneous, multi-level manner: the full-graph data is divided into subgraphs, and each subgraph is divided into initial node sets for hierarchical storage.
Further, by way of example, obtaining the starting node set according to the topological structure of the full-graph data and the initial node set includes: obtaining the n-order neighbor nodes of the initial nodes in the initial node set from the full-graph data, and adding the n-order neighbor nodes to the initial node set to obtain a first node set, where n is a natural number greater than 1; pulling the first node set from the SSD into CPU memory to obtain a second node set; and loading the second node set from CPU memory to the GPU server to obtain the starting node set.
Wherein the first node set refers to a set of starting nodes stored in the SSD. The second node set refers to a set of starting nodes stored in the CPU memory.
Specifically, according to the topological structure of the full-graph data, the n-order neighbor nodes of each initial node in the initial node set are obtained from the full-graph data, and these neighbor nodes are added to the initial node set to obtain a first node set, where n is a natural number greater than 1; the first node set is then pulled from the SSD into CPU memory to obtain a second node set; and the second node set is loaded from CPU memory to the GPU server to obtain the starting node set.
It can be understood that decomposing the subgraphs and storing them hierarchically according to the available resources exploits both the capacity advantage of SSD storage and the speed advantage of the GPU, which improves the rate of large-scale graph walk sampling.
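The hierarchical staging just described, first node set on SSD, second node set in CPU memory, starting node set in GPU memory, might be sketched as follows; integer node ids and the helper names are assumptions, and the SSD stage is simulated in ordinary Python structures.

```python
import numpy as np
import torch

def expand_n_order(adjacency, initial_set, n=2):
    """Add the n-order neighbours of every initial node to form the first node set."""
    frontier, first_node_set = set(initial_set), set(initial_set)
    for _ in range(n):
        frontier = {nb for node in frontier for nb in adjacency.get(node, [])} - first_node_set
        first_node_set |= frontier
    return first_node_set

def stage_node_set(first_node_set):
    """First node set (SSD) -> second node set (CPU memory) -> starting node set (GPU).

    Here the SSD stage is just the id collection built above; a real system
    would read the ids and their adjacency from flash storage.
    """
    second_node_set = np.fromiter(sorted(first_node_set), dtype=np.int64)   # CPU memory
    starting = torch.from_numpy(second_node_set)
    return starting.cuda() if torch.cuda.is_available() else starting       # GPU video memory
```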
FIG. 3 is a flow chart of yet another graph model training method provided in accordance with an embodiment of the present disclosure. Based on the above embodiment, the present embodiment provides an alternative implementation in which the GPU server includes at least two source GPU cards and, accordingly, refines "performing multiple rounds of walk sampling on at least two starting node sets with the GPU server according to the topological structure of the full-graph data, to obtain the target sampling result corresponding to the GPU server". As shown in fig. 3, the graph model training method of the present embodiment may include:
S301, segmenting the full graph data according to the topological structure of the full graph data to obtain at least two subgraphs.
S302, at least two sub-graphs are distributed to at least two GPU servers.
S303, dividing the subgraph into at least two initial node sets.
S304, obtaining starting node sets according to the topological structure of the full graph data and the initial node sets.
S305, according to the topological structure of the full-graph data, performing multiple rounds of walk sampling on the at least two starting node sets with the GPU server, to obtain the target sampling result corresponding to the GPU server.
Optionally, for each round of walk sampling, the at least two starting node sets corresponding to that round are respectively sent to the at least two source GPU cards in the GPU server; each source GPU card performs walk sampling on its starting node set according to the topological structure of the full-graph data to obtain an intermediate sampling result; nodes in other source GPU cards that belong to this source GPU card are sent to it, yielding a candidate sampling node set; and the starting node set is updated with the candidate sampling node set to obtain the starting node set for the next round of walk sampling, until the last round of walk sampling finishes and the target sampling result corresponding to the GPU server is obtained. Further, nodes in the intermediate sampling result that belong to other source GPU cards are sent to those source GPU cards.
In this embodiment, the GPU server includes at least two source GPU cards. It should be noted that the number of source GPU cards is the same as the number of starting node sets.
The intermediate sampling result refers to the sampled nodes obtained after each round of walk sampling other than the last round; optionally, it includes nodes belonging to this source GPU card and nodes belonging to other source GPU cards. The candidate sampling node set refers to the set of nodes obtained after each round of sampling that belong to this source GPU card.
Specifically, for each round of walk sampling, the at least two starting node sets corresponding to that round are respectively sent to the corresponding source GPU cards in the GPU server. Then, for each source GPU card, that card starts walk sampling based on the full-graph data, taking the nodes in its corresponding starting node set as the start nodes. During the walk sampling, the source GPU card may obtain from other source GPU cards, based on the topological structure of the full-graph data, other nodes related to its local start nodes, and may likewise send local nodes related to the start nodes of other source GPU cards to those cards; after the walk sampling, the intermediate sampling result is obtained. Then, nodes in the intermediate sampling result that belong to other source GPU cards are sent to the corresponding cards, and nodes in other source GPU cards that belong to this source GPU card are sent to it, yielding the candidate sampling node set corresponding to this source GPU card. Finally, the candidate sampling node set corresponding to the source GPU card is taken as the new starting node set and the next round of walk sampling begins, until the last round of walk sampling finishes, thereby obtaining the target sampling result corresponding to the GPU server.
It should be noted that, assuming a certain GPU server includes three source GPU cards GPU0, GPU1 and GPU2, when the first round of walk sampling starts, starting node sets a{a0, a1, a2, a3, …, a10}, b{b0, b1, b2, b3, …, b10} and c{c0, c1, c2, c3, …, c10} correspond to GPU0, GPU1 and GPU2 respectively. Suppose GPU0's walk sampling yields an intermediate sampling result containing b2, b4 and c2 in addition to its own nodes; GPU0 then sends b2 and b4 to GPU1 and sends c2 to GPU2, obtaining the candidate sampling node set z{a0, a3, a5, a7, a8, a10}, and the candidate sampling node set z{a0, a3, a5, a7, a8, a10} is used as the starting node set for GPU0's next round of walk sampling. That is, each time a round of walk sampling begins, the starting node set on each source GPU card is a subset of the starting node set that was on that card at the first round of walk sampling.
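The per-round interaction between source GPU cards can be illustrated with the plain-Python simulation below; the round-robin ownership rule, the deterministic walk step and the in-process "exchange" are illustrative stand-ins for the real GPU-side logic, not details taken from the disclosure.

```python
def owner_of(node, num_cards):
    """Illustrative ownership rule: node ids are assigned to cards round-robin."""
    return node % num_cards

def multi_card_walk_rounds(adjacency, starting_sets, num_rounds, walk_len=2):
    """Simulate the per-round exchange between source GPU cards described above.

    Each card walk-samples its starting node set, keeps the sampled nodes it
    owns, ships foreign nodes to their owning cards, and uses the resulting
    candidate sampling node set as its starting node set for the next round.
    """
    num_cards = len(starting_sets)
    target_results = [set() for _ in range(num_cards)]
    for _ in range(num_rounds):
        outgoing = [[set() for _ in range(num_cards)] for _ in range(num_cards)]
        for card, start_set in enumerate(starting_sets):
            sampled = set()
            for node in start_set:
                cur = node
                for _ in range(walk_len):
                    nbrs = adjacency.get(cur)
                    if not nbrs:
                        break
                    cur = nbrs[0]          # deterministic step; random.choice in practice
                    sampled.add(cur)
            target_results[card] |= sampled
            for node in sampled:            # route each sampled node to its owning card
                outgoing[card][owner_of(node, num_cards)].add(node)
        # "all-to-all": every card collects the nodes that belong to it
        starting_sets = [set().union(*(outgoing[src][dst] for src in range(num_cards)))
                         for dst in range(num_cards)]
    return target_results

adj = {0: [1], 1: [2], 2: [3], 3: [4], 4: [5], 5: [0]}
print(multi_card_walk_rounds(adj, [{0}, {1}, {2}], num_rounds=2))
```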
Further, it should be noted that, communication between the source GPU cards may be performed through NCCL. NCCL (NVIDIA Collective Communications Library) is a high-performance multi-GPU communication library developed by NVIDIA for enabling fast data transfer and collaborative computation among multiple NVIDIA GPUs. It can provide support for distributed training and data parallel acceleration in the field of deep learning and high performance computing.
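If the exchange is implemented with NCCL through torch.distributed, a heavily simplified sketch could look like the following; it assumes one process per GPU launched by a tool such as torchrun, and per-destination tensors padded to a common length so the all_to_all shapes match.

```python
import torch
import torch.distributed as dist

def exchange_nodes(per_card_node_ids):
    """Exchange node ids between GPU cards over NCCL via torch.distributed.

    `per_card_node_ids` is a list with one int64 CUDA tensor per destination
    card (padded to a common length so all_to_all tensor shapes match).
    """
    if not dist.is_initialized():
        dist.init_process_group(backend="nccl")     # NCCL handles the GPU-to-GPU transfers
    recv = [torch.empty_like(t) for t in per_card_node_ids]
    dist.all_to_all(recv, per_card_node_ids)        # send entry i to rank i, receive from every rank
    return recv
```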
According to the technical scheme provided by this embodiment, graph walk sampling is performed with multiple GPU cards across multiple GPU servers, which increases the rate of graph sampling.
On the basis of the foregoing embodiment, as an optional manner of the present disclosure, after performing walk sampling on the starting node set with the source GPU card according to the topological structure of the full-graph data to obtain the intermediate sampling result, the method further includes: performing deduplication processing on the intermediate sampling result with the source GPU card.
Specifically, after the starting node set is walk-sampled, the resulting intermediate sampling result may contain repeated nodes, so the source GPU card performs deduplication processing on the intermediate sampling result. It can be appreciated that in-card node deduplication avoids repeated sampling in the GPU card's next round of walk sampling, thereby reducing the processing and transmission of repeated data.
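In-card deduplication on the GPU can be as simple as the sketch below; the 1-D tensor layout of the intermediate sampling result is an assumption.

```python
import torch

def dedup_in_card(intermediate_result):
    """Deduplicate the intermediate sampling result inside one source GPU card.

    `intermediate_result` is assumed to be a 1-D tensor of sampled node ids in
    that card's video memory; torch.unique removes repeats so the next walk
    round does not resample or retransmit them.
    """
    return torch.unique(intermediate_result)

# e.g. dedup_in_card(torch.tensor([3, 5, 3, 7, 5])) -> tensor([3, 5, 7])
```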
On the basis of the foregoing embodiment, as an optional manner of the disclosure, after updating the starting node set with the candidate sampling node set, the method further includes: performing deduplication processing, with the GPU server, on the updated starting node sets in the at least two source GPU cards.
Specifically, for each GPU server, after the at least two source GPU cards in the GPU server update their starting node sets, deduplication processing is performed on the updated starting node sets across the at least two source GPU cards, that is, it is ensured that there are no repeated starting nodes within the GPU server. It can be understood that deduplication within the machine, i.e. the GPU server, avoids repeated sampling by the GPU server in the next round of walk sampling, further reducing the processing and transmission of repeated data.
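One possible server-level deduplication across the updated starting node sets is sketched below; the "first card keeps the node" tie-break is an illustrative choice, not something specified by the disclosure, which only requires that duplicates within one GPU server are removed.

```python
def dedup_across_cards(starting_sets):
    """Ensure no node appears in more than one card's updated starting node set."""
    seen = set()
    deduped = []
    for card_set in starting_sets:
        kept = {n for n in card_set if n not in seen}   # drop nodes already kept by an earlier card
        seen |= kept
        deduped.append(kept)
    return deduped

# e.g. dedup_across_cards([{1, 2}, {2, 3}]) -> [{1, 2}, {3}]
```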
Fig. 4 is a schematic structural diagram of a graph model training apparatus according to an embodiment of the present disclosure. The embodiment of the disclosure is applicable to training a graph model in an Internet search and promotion scenario. The apparatus may be implemented in software and/or hardware and may be integrated into an electronic device, such as a server, that carries the graph model training function. As shown in fig. 4, the graph model training apparatus 400 of the present embodiment may include:
the target sampling result determining module 401 is configured to perform, according to the topological structure of the full-graph data, walk sampling on the full-graph data with at least two GPU servers respectively, to obtain at least two target sampling results;
the target parameter result determining module 402 is configured to perform attribute and parameter pulling on the target sampling result to obtain a target parameter result;
the graph model training module 403 is configured to train the graph model using at least two target sampling results and at least two target parameter results.
According to the technical scheme provided by this embodiment of the disclosure, walk sampling is performed on the full-graph data by at least two GPU servers respectively according to the topological structure of the full-graph data to obtain at least two target sampling results, attribute and parameter pulling is then performed on the target sampling results to obtain target parameter results, and the graph model is trained with the at least two target sampling results and the at least two target parameter results. Graph walk sampling is a performance bottleneck of graph model training; the related art performs walk sampling on a single-machine GPU, but because GPU video memory is small relative to the large graph scale, the walk sampling is limited. By distributing the walk sampling across at least two GPU servers, this scheme removes that limitation and improves the efficiency of graph model training.
Further, the target sampling result determining module 401 includes:
the sub-graph determining sub-module is used for splitting the full graph data according to the topological structure of the full graph data to obtain at least two subgraphs; wherein the number of subgraphs is the same as the number of GPU servers;
a sub-graph allocation sub-module for allocating at least two sub-graphs to at least two GPU servers;
and the target sampling result determining sub-module is used for performing walk sampling on the subgraph corresponding to the GPU server by adopting the GPU server according to the topological structure of the full graph data to obtain a target sampling result corresponding to the GPU server.
Further, the target sampling result determination submodule includes:
an initial node set determining unit, configured to divide the sub-graph into at least two initial node sets;
the starting node set determining unit is used for obtaining a starting node set according to the topological structure of the full-graph data and the initial node set;
and the target sampling result determining unit is used for performing multiple rounds of walk sampling on at least two starting node sets with the GPU server according to the topological structure of the full-graph data, to obtain a target sampling result corresponding to the GPU server.
Further, the full graph data, the subgraph and the initial node set are all stored to the solid state disk SSD.
Further, the starting node set determining unit is specifically configured to:
obtaining n-order neighbor nodes of initial nodes in an initial node set from the full graph data, and adding the n-order neighbor nodes into the initial node set to obtain a first node set; n is a natural number greater than 1;
pulling the first node set from the SSD to the CPU memory to obtain a second node set;
and loading the second node set from the CPU memory to the GPU server to obtain a starting node set.
Further, the GPU server includes at least two source GPU cards.
Further, the target sampling result determining unit is configured to:
for each round of walk sampling, respectively sending at least two starting node sets corresponding to that round of walk sampling to at least two source GPU cards in the GPU server;
performing walk sampling on the starting node set with the source GPU card according to the topological structure of the full-graph data to obtain an intermediate sampling result; the intermediate sampling result comprises nodes of the source GPU card and nodes of other source GPU cards;
sending nodes belonging to the source GPU card in other source GPU cards to the source GPU card to obtain a candidate sampling node set;
and updating the starting node set with the candidate sampling node set to obtain the starting node set for the next round of walk sampling, until the last round of walk sampling finishes, obtaining a target sampling result corresponding to the GPU server.
Further, the target sampling result determining unit is further configured to:
and after performing walk sampling on the starting node set with the source GPU card according to the topological structure of the full-graph data to obtain the intermediate sampling result, perform deduplication processing on the intermediate sampling result with the source GPU card.
Further, the target sampling result determining unit is further configured to:
and after updating the starting node set with the candidate sampling node set, perform deduplication processing on the updated starting node sets in the at least two source GPU cards with the GPU server.
Further, the target sampling result determining unit is further configured to:
and transmitting nodes belonging to other source GPU cards in the intermediate sampling result to the other source GPU cards.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 5 is a block diagram of an electronic device for implementing the graph model training method of an embodiment of the present disclosure. Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 includes a computing unit 501 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic device 500 may also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in electronic device 500 are connected to I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the electronic device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as the graph model training method. For example, in some embodiments, the graph model training method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the graph model training method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the graph model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, a server of a distributed system, or a server with a blockchain node.
Artificial intelligence is the discipline that studies how to make a computer mimic certain human thought processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning), and it covers both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning technology, big data processing technology, knowledge graph technology, and the like.
Cloud computing refers to a technical system in which an elastically extensible pool of shared physical or virtual resources is accessed over a network; the resources may include servers, operating systems, networks, software, applications and storage devices, and can be deployed and managed in an on-demand, self-service manner. Cloud computing technology can provide efficient and powerful data processing capability for technical applications such as artificial intelligence and blockchain, as well as for model training.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (23)

1. A graph model training method, comprising:
according to the topological structure of the full-graph data, performing walk sampling on the full-graph data with at least two GPU servers respectively, to obtain at least two target sampling results;
carrying out attribute and parameter pulling on the target sampling result to obtain a target parameter result;
and training the graph model by adopting at least two target sampling results and at least two target parameter results.
2. The method of claim 1, wherein the performing, according to the topological structure of the full-graph data, walk sampling on the full-graph data with at least two GPU servers respectively to obtain at least two target sampling results comprises:
splitting the full graph data according to the topological structure of the full graph data to obtain at least two subgraphs; the number of the subgraphs is the same as the number of GPU servers;
assigning the at least two sub-graphs to at least two GPU servers;
and according to the topological structure of the full graph data, adopting the GPU server to carry out the walk sampling on the subgraph corresponding to the GPU server to obtain the target sampling result corresponding to the GPU server.
3. The method of claim 2, wherein the performing, by using the GPU server, the walk sampling on the sub-graph corresponding to the GPU server according to the topology structure of the full graph data to obtain the target sampling result corresponding to the GPU server, includes:
evenly dividing the subgraph into at least two initial node sets;
obtaining a starting node set according to the topological structure of the full-graph data and the initial node set;
and according to the topological structure of the full graph data, performing multiple rounds of walk sampling on at least two starting node sets with the GPU server to obtain a target sampling result corresponding to the GPU server.
4. The method of claim 3, wherein the full graph data, the subgraph, and the initial set of nodes are all stored to a solid state disk SSD.
5. The method of claim 4, wherein the obtaining the starting node set according to the topological structure of the full graph data and the initial node set comprises:
obtaining n-order neighbor nodes of initial nodes in the initial node set from the full graph data, and adding the n-order neighbor nodes into the initial node set to obtain a first node set; n is a natural number greater than 1;
pulling the first node set from the SSD to a CPU memory to obtain a second node set;
and loading the second node set from the CPU memory to the GPU server to obtain a starting node set.
6. A method according to claim 3, wherein the GPU server comprises at least two source GPU cards.
7. The method of claim 6, wherein the performing, with the GPU server, multiple rounds of walk sampling on at least two starting node sets according to the topological structure of the full graph data to obtain a target sampling result corresponding to the GPU server comprises:
for each round of walk sampling, respectively sending at least two starting node sets corresponding to that round of walk sampling to at least two source GPU cards in the GPU server;
performing walk sampling on the starting node set with the source GPU card according to the topological structure of the full-graph data to obtain an intermediate sampling result; the intermediate sampling result comprises nodes of the source GPU card and nodes of other source GPU cards;
transmitting nodes belonging to the source GPU card in other source GPU cards to the source GPU card to obtain a candidate sampling node set;
and updating the starting node set with the candidate sampling node set to obtain the starting node set for the next round of walk sampling, until the last round of walk sampling finishes, obtaining a target sampling result corresponding to the GPU server.
8. The method of claim 7, wherein after performing walk sampling on the starting node set with the source GPU card according to the topological structure of the full graph data to obtain an intermediate sampling result, the method further comprises:
and carrying out de-duplication processing on the intermediate sampling result by adopting the source GPU card.
9. The method of claim 7, wherein after updating the set of starting nodes with the set of candidate sampling nodes, further comprising:
and performing deduplication processing, with the GPU server, on the updated starting node sets in the at least two source GPU cards.
10. The method of claim 7, further comprising:
and transmitting nodes belonging to other source GPU cards in the intermediate sampling result to the other source GPU cards.
11. A graph model training apparatus comprising:
the target sampling result determining module is used for performing walk sampling on the full-graph data with at least two GPU servers respectively according to the topological structure of the full-graph data to obtain at least two target sampling results;
the target parameter result determining module is used for carrying out attribute and parameter pulling on the target sampling result to obtain a target parameter result;
and the graph model training module is used for training the graph model by adopting at least two target sampling results and at least two target parameter results.
12. The apparatus of claim 11, wherein the target sampling result determination module comprises:
the sub-graph determining sub-module is used for dividing the full graph data according to the topological structure of the full graph data to obtain at least two sub-graphs; the number of the subgraphs is the same as the number of GPU servers;
a sub-graph allocation sub-module for allocating the at least two sub-graphs to at least two GPU servers;
and the target sampling result determining sub-module is used for performing walk sampling on the subgraph corresponding to the GPU server by adopting the GPU server according to the topological structure of the full graph data to obtain a target sampling result corresponding to the GPU server.
13. The apparatus of claim 12, wherein the target sampling result determination submodule comprises:
an initial node set determining unit, configured to divide the sub-graph into at least two initial node sets;
the starting node set determining unit is used for obtaining a starting node set according to the topological structure of the full-graph data and the initial node set;
and the target sampling result determining unit is used for performing multiple rounds of walk sampling on at least two starting node sets with the GPU server according to the topological structure of the full graph data to obtain a target sampling result corresponding to the GPU server.
14. The apparatus of claim 13, wherein the full graph data, the subgraph, and the initial set of nodes are all stored to a solid state disk, SSD.
15. The apparatus of claim 14, wherein the starting node set determining unit is specifically configured to:
obtaining n-order neighbor nodes of initial nodes in the initial node set from the full graph data, and adding the n-order neighbor nodes into the initial node set to obtain a first node set; n is a natural number greater than 1;
pulling the first node set from the SSD to a CPU memory to obtain a second node set;
and loading the second node set from the CPU memory to the GPU server to obtain a starting node set.
16. The apparatus of claim 13, wherein the GPU server comprises at least two source GPU cards.
17. The apparatus of claim 16, wherein the target sampling result determination unit is configured to:
for each round of walk sampling, respectively sending at least two starting node sets corresponding to that round of walk sampling to at least two source GPU cards in the GPU server;
performing walk sampling on the starting node set with the source GPU card according to the topological structure of the full-graph data to obtain an intermediate sampling result; the intermediate sampling result comprises nodes of the source GPU card and nodes of other source GPU cards;
transmitting nodes belonging to the source GPU card in other source GPU cards to the source GPU card to obtain a candidate sampling node set;
and updating the starting node set with the candidate sampling node set to obtain the starting node set for the next round of walk sampling, until the last round of walk sampling finishes, obtaining a target sampling result corresponding to the GPU server.
18. The apparatus of claim 17, wherein the target sampling result determination unit is further configured to:
and after performing walk sampling on the starting node set with the source GPU card according to the topological structure of the full-graph data to obtain an intermediate sampling result, perform deduplication processing on the intermediate sampling result with the source GPU card.
19. The apparatus of claim 17, wherein the target sampling result determination unit is further configured to:
and after updating the starting node set with the candidate sampling node set, perform deduplication processing on the updated starting node sets in the at least two source GPU cards with the GPU server.
20. The apparatus of claim 17, wherein the target sampling result determination unit is further configured to:
and transmitting nodes belonging to other source GPU cards in the intermediate sampling result to the other source GPU cards.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the graph model training method of any one of claims 1-10.
22. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the graph model training method according to any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements a graph model training method according to any of claims 1-10.
CN202311338658.9A 2023-10-16 2023-10-16 Graph model training method, device, equipment and storage medium Pending CN117574978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311338658.9A CN117574978A (en) 2023-10-16 2023-10-16 Graph model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311338658.9A CN117574978A (en) 2023-10-16 2023-10-16 Graph model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117574978A true CN117574978A (en) 2024-02-20

Family

ID=89859570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311338658.9A Pending CN117574978A (en) 2023-10-16 2023-10-16 Graph model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117574978A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119129683A (en) * 2024-11-14 2024-12-13 浙江大学 Graph neural network training acceleration method and system based on multi-GPU and multi-SSD
CN119129683B (en) * 2024-11-14 2025-02-11 浙江大学 Graph neural network training acceleration method and system based on multi-GPU and multi-SSD


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination