WO2020248342A1 - Hyper-parameter optimization method and apparatus for large-scale network representation learning - Google Patents

Hyper-parameter optimization method and apparatus for large-scale network representation learning Download PDF

Info

Publication number
WO2020248342A1
WO2020248342A1 (PCT/CN2019/098235, CN2019098235W)
Authority
WO
WIPO (PCT)
Prior art keywords
network
sub-networks
original network
Prior art date
Application number
PCT/CN2019/098235
Other languages
French (fr)
Chinese (zh)
Inventor
朱文武 (Zhu Wenwu)
涂珂 (Tu Ke)
崔鹏 (Cui Peng)
Original Assignee
清华大学 (Tsinghua University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University (清华大学)
Publication of WO2020248342A1 publication Critical patent/WO2020248342A1/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • This application relates to the field of network learning technology, and in particular to a method and device for optimizing hyperparameters of large-scale network representation learning.
  • Network representation learning is an effective way to process network data. To achieve good results, it usually requires careful manual tuning of parameters. However, the large scale of real-world networks makes it difficult to apply automatic machine learning to network representation learning methods.
  • This application aims to solve, at least to a certain extent, one of the technical problems in the related art.
  • This application proposes a hyperparameter optimization method for large-scale network representation learning to solve the technical problem of low efficiency in optimizing hyperparameters for large-scale network representation learning in the prior art.
  • An embodiment of the present application proposes a hyperparameter optimization method for large-scale network representation learning, including: sampling an original network to obtain multiple sub-networks; extracting a first image feature of the original network and a second image feature of each of the multiple sub-networks according to a preset algorithm; fitting, by Gaussian process regression, the mapping from the second image feature and hyperparameters of each sub-network to a final effect; evaluating a similarity function on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network; and, according to that network similarity, learning the mapping from the second image feature and hyperparameters of each sub-network to the final effect to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network.
  • In the hyperparameter optimization method of the embodiments of the present application, multiple sub-networks are obtained by sampling the original network; the first image feature of the original network and the second image feature of each of the multiple sub-networks are extracted according to a preset algorithm; the mapping from each sub-network's second image feature and hyperparameters to the final effect is fitted by Gaussian process regression; the similarity function is evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network; and, according to that network similarity, the mapping from each sub-network's second image feature and hyperparameters to the final effect is learned to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network.
  • By learning the mapping from the hyperparameters and second image features of multiple sub-networks to the final effect, the method optimizes the hyperparameters of the original network and can adjust them quickly, effectively, and automatically.
  • Another embodiment of the present application proposes a hyperparameter optimization device for large-scale network representation learning, including:
  • the sampling module is used to sample the original network to obtain multiple sub-networks
  • An extraction module configured to extract the first image feature of the original network and the second image feature of each of the multiple sub-networks according to a preset algorithm
  • a fitting module, used to fit, by Gaussian process regression, the mapping from the second image features and hyperparameters of each of the multiple sub-networks to the final effect;
  • a calculation module, used to evaluate the similarity function on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network;
  • a generating module, configured to learn, according to the network similarity between the original network and each sub-network, the mapping from the second image features and hyperparameters of each of the multiple sub-networks to the final effect, so as to generate the optimal hyperparameters of the original network for information identification through the original network.
  • In the hyperparameter optimization device for large-scale network representation learning of the embodiments of the present application, multiple sub-networks are obtained by sampling the original network; the first image feature of the original network and the second image feature of each of the multiple sub-networks are extracted according to a preset algorithm; the mapping from each sub-network's second image feature and hyperparameters to the final effect is fitted by Gaussian process regression; the similarity function is evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network; and, according to that network similarity, the mapping from each sub-network's second image feature and hyperparameters to the final effect is learned to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network.
  • By learning the mapping from the hyperparameters and second image features of multiple sub-networks to the final effect, the device optimizes the hyperparameters of the original network and can adjust them quickly, effectively, and automatically.
  • FIG. 1 is a schematic flowchart of a hyperparameter optimization method for large-scale network representation learning provided by an embodiment of this application;
  • FIG. 2 is a schematic structural diagram of a hyperparameter optimization device for large-scale network representation learning provided by an embodiment of this application.
  • the embodiments of the present application provide a hyperparameter optimization method for large-scale network representation learning. By sampling the original network, multiple sub-networks are obtained; the first image feature of the original network and the second image feature of each of the multiple sub-networks are extracted according to a preset algorithm; the mapping from each sub-network's second image feature and hyperparameters to the final effect is fitted by Gaussian process regression; the similarity function is evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network; and, according to that network similarity, the mapping from each sub-network's second image feature and hyperparameters to the final effect is learned to generate the optimal hyperparameters of the original network for information identification through the original network.
  • FIG. 1 is a schematic flowchart of a hyperparameter optimization method for large-scale network representation learning provided by an embodiment of this application.
  • the method includes the following steps:
  • Step 101 Sample the original network to obtain multiple sub-networks.
  • the original network refers to a large-scale network used for network representation learning.
  • Network representation learning aims to represent the nodes in a network as low-dimensional, real-valued, dense vectors, so that the resulting vectors support representation and reasoning in the vector space and can be applied more flexibly to different data mining tasks.
  • the representation of a node can be used as a feature and sent to a classifier like a support vector machine.
  • the node representation can also be transformed into spatial coordinates for visualization tasks.
  • In this embodiment, a multi-source random walk sampling algorithm is adopted to sample the original network into multiple sub-networks. Specifically, starting from multiple nodes of the original network, the walk randomly moves to a neighboring node and then continues randomly from that neighbor until a preset number of steps is reached; finally, the subgraph composed of all visited nodes is taken as a sampled sub-network, thereby generating multiple sub-networks.
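As a concrete illustration of this sampling step, the sketch below implements a multi-source random walk over an adjacency-list graph. The function and parameter names (`sample_subnetworks`, `num_sources`, `walk_length`) are illustrative, not taken from the patent.

```python
import random

def sample_subnetworks(adj, num_sources=3, walk_length=50, num_subnets=2, seed=0):
    """Multi-source random-walk sampling over an adjacency-list graph.
    Names and defaults are illustrative, not from the patent."""
    rng = random.Random(seed)
    nodes = list(adj)
    subnets = []
    for _ in range(num_subnets):
        visited = set()
        for start in rng.sample(nodes, num_sources):   # multiple starting nodes
            node = start
            visited.add(node)
            for _ in range(walk_length):               # walk a preset number of steps
                neighbors = adj[node]
                if not neighbors:
                    break
                node = rng.choice(neighbors)           # hop to a random neighbor
                visited.add(node)
        # the subgraph induced by all visited nodes is one sampled sub-network
        subnets.append({v: [u for u in adj[v] if u in visited] for v in visited})
    return subnets
```

On a small ring graph, for example, each returned sub-network's nodes and edges are subsets of the original graph's.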
  • Step 102 Extract the first image feature of the original network and the second image feature of each of the multiple sub-networks according to a preset algorithm.
  • In this embodiment, a preset signal extraction algorithm is used to extract signals from the original network and the multiple sub-networks to obtain the first image feature of the original network and the second image feature of each of the multiple sub-networks. Specifically, the first candidate feature vector of the original network and the second candidate feature vector of each sub-network are computed under the Laplacian matrix. Low-pass filtering is then performed on the first candidate feature vector and each second candidate feature vector to obtain the first image feature of the original network and the second image feature of each sub-network.
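The patent names only the Laplacian matrix and low-pass filtering; one plausible concrete reading, sketched below, takes the smallest eigenvalues of the normalized Laplacian as a network's low-frequency "image" feature. This recipe and its names are assumptions, not the patent's exact algorithm.

```python
import numpy as np

def spectral_features(adj_matrix, k=4):
    """One possible 'image feature': the k smallest eigenvalues of the
    normalized Laplacian.  Keeping only the smallest eigenvalues acts as a
    low-pass filter, retaining the smooth spectral components.  This concrete
    recipe is an assumption."""
    A = np.asarray(adj_matrix, dtype=float)
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    # normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(L)   # sorted ascending
    return eigvals[:k]                # low-pass: keep the k smallest
```

For a connected graph the smallest eigenvalue is 0, so the leading feature entries describe how close the graph is to being disconnected (its large-scale cluster structure).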
  • Step 103 Fit, by Gaussian process regression, the mapping from the second image features and hyperparameters of each of the multiple sub-networks to the final effect.
  • Gaussian process regression models the relationship between independent and dependent variables: it fits a regression function to the observed data as closely as possible while achieving the smallest mean square error without overfitting.
  • In this embodiment, a Gaussian process regression algorithm is used to fit the mapping from the second image features and hyperparameters of each sampled sub-network to the final effect.
  • Step 104 Calculate the first image feature and each second image feature according to the similarity function to obtain the network similarity between the original network and each sub-network.
  • the network similarity comprises the network structure similarity and the hyperparameter similarity between the original network and a sub-network.
  • In this embodiment, the similarity function is evaluated on the first image feature and each second image feature to obtain the network structure similarity and the hyperparameter similarity between the original network and each sub-network.
  • the similarity function can be used as the kernel function of the Gaussian process, ensuring that the more similar a sub-network is to the original network, the greater its contribution to the finally predicted optimal hyperparameters of the original network.
  • here the kernel function is a Radial Basis Function (RBF), a scalar function that is symmetric along the radial direction.
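An RBF similarity of this kind can be sketched as follows; used as a kernel entry, it gives a sub-network whose features are close to the original network's more weight in the prediction. The feature vectors and length scale here are illustrative.

```python
import numpy as np

def network_similarity(feat_a, feat_b, length_scale=1.0):
    """RBF similarity between two networks' feature vectors:
    exp(-||a - b||^2 / (2 * l^2)), equal to 1 for identical features and
    decaying toward 0 as the features grow apart."""
    d2 = float(np.sum((np.asarray(feat_a) - np.asarray(feat_b)) ** 2))
    return float(np.exp(-d2 / (2.0 * length_scale ** 2)))
```

Because the value decays monotonically with feature distance, sub-networks that resemble the original network dominate the kernel-weighted prediction.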
  • Step 105 According to the network similarity between the original network and each sub-network, learn the mapping from the second image features and hyperparameters of each of the multiple sub-networks to the final effect to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network.
  • In this embodiment, the similarity function is evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network. Then, according to that network similarity, the mapping from the second image features and hyperparameters of each sub-network to the final effect is learned to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network.
  • This method can optimize the hyperparameters of the original network more quickly and obtain its optimal hyperparameters. The optimized original network can then be used for tasks such as face recognition and detection, anomaly detection, and speech recognition.
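Putting the steps together, the toy end-to-end sketch below (1-D features, a 1-D hyperparameter, and all names illustrative) fits a GP on sub-network observations and then scores candidate hyperparameters paired with the original network's feature, returning the best-scoring candidate.

```python
import numpy as np

def predict_best_hyperparam(sub_feats, sub_hypers, sub_scores, orig_feat,
                            candidates, length_scale=0.2, noise=1e-6):
    """Toy end-to-end sketch: fit a GP on (sub-network feature, hyperparameter)
    -> score pairs, then evaluate each candidate hyperparameter paired with
    the ORIGINAL network's feature and return the best candidate.
    All names, and the 1-D feature/hyperparameter, are assumptions."""
    X = np.column_stack([np.asarray(sub_feats, float),
                         np.asarray(sub_hypers, float)])
    y = np.asarray(sub_scores, float)

    def k(A, B):  # RBF kernel matrix between row sets A and B
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * length_scale ** 2))

    alpha = np.linalg.solve(k(X, X) + noise * np.eye(len(X)), y)
    Xc = np.column_stack([np.full(len(candidates), float(orig_feat)),
                          np.asarray(candidates, float)])
    preds = k(Xc, X) @ alpha          # GP posterior mean for each candidate
    return candidates[int(np.argmax(preds))]
```

Because the kernel compares the candidate's network feature against each sub-network's feature, sub-networks that resemble the original network weigh most heavily in the choice, which is the intuition behind steps 103 to 105.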
  • In the hyperparameter optimization method for large-scale network representation learning of this embodiment, multiple sub-networks are obtained by sampling the original network; the first image feature of the original network and the second image feature of each of the multiple sub-networks are extracted according to a preset algorithm; the mapping from each sub-network's second image feature and hyperparameters to the final effect is fitted by Gaussian process regression; the similarity function is evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network; and, according to that network similarity, the mapping from each sub-network's second image feature and hyperparameters to the final effect is learned to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network.
  • By learning the mapping from the hyperparameters and second image features of multiple sub-networks to the final effect, the method optimizes the hyperparameters of the original network and can adjust them quickly, effectively, and automatically.
  • an embodiment of the present application also proposes a hyperparameter optimization device for large-scale network representation learning.
  • FIG. 2 is a schematic structural diagram of a hyperparameter optimization device for large-scale network representation learning provided by an embodiment of this application.
  • the hyperparameter optimization device for large-scale network representation learning includes: a sampling module 110, an extraction module 120, a fitting module 130, a calculation module 140, and a generation module 150.
  • the sampling module 110 is used to sample the original network to obtain multiple sub-networks.
  • the extraction module 120 is configured to extract the first image feature of the original network and the second image feature of each of the multiple sub-networks according to a preset algorithm.
  • the fitting module 130 is used to regressively fit the mapping of the second image features and hyperparameters of each of the multiple sub-networks to the final effect according to the Gaussian process.
  • the calculation module 140 is configured to calculate the first image feature and each second image feature according to the similarity function to obtain the network similarity of the original network and each sub-network.
  • the generating module 150 is used to learn, according to the network similarity between the original network and each sub-network, the mapping from the second image features and hyperparameters of each of the multiple sub-networks to the final effect, so as to generate the optimal hyperparameters of the original network for information identification through the original network.
  • the sampling module 110 is specifically used for:
  • multiple nodes are randomly selected as the starting point from the nodes of the original network
  • fitting module 130 is specifically used for:
  • calculation module 140 is specifically configured to:
  • the extraction module 120 is specifically used for:
  • In the hyperparameter optimization device for large-scale network representation learning of this embodiment, multiple sub-networks are obtained by sampling the original network; the first image feature of the original network and the second image feature of each of the multiple sub-networks are extracted according to a preset algorithm; the mapping from each sub-network's second image feature and hyperparameters to the final effect is fitted by Gaussian process regression; the similarity function is evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network; and, according to that network similarity, the mapping from each sub-network's second image feature and hyperparameters to the final effect is learned to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network.
  • By learning the mapping from the hyperparameters and second image features of multiple sub-networks to the final effect, the device optimizes the hyperparameters of the original network and can adjust them quickly, effectively, and automatically.
  • The terms "first" and "second" are only used for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Therefore, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, such as two, three, etc., unless specifically defined otherwise.
  • a "computer-readable medium" can be any device that can contain, store, communicate, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
  • computer-readable media include the following: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and portable compact disc read-only memory (CD-ROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium, then edited, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in computer memory.
  • each part of this application can be implemented by hardware, software, firmware, or a combination thereof.
  • multiple steps or methods can be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example: discrete logic circuits with logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), etc.
  • the functional units in the various embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer readable storage medium.
  • the aforementioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

A hyper-parameter optimization method and apparatus for large-scale network representation learning. The method comprises: sampling an original network to obtain a plurality of sub-networks (101); extracting, according to a pre-set algorithm, a first image feature of the original network and a second image feature of each sub-network in the plurality of sub-networks (102); fitting, according to Gaussian process regression, mapping from the second image feature and a hyper-parameter of each sub-network to a final effect (103); calculating the first image feature and each second image feature according to a similarity function to acquire the network similarity between the original network and each sub-network (104); and learning the mapping from the second image feature and the hyper-parameter of each sub-network in the plurality of sub-networks to the final effect in order to generate an optimal hyper-parameter of the original network, so as to perform information identification by means of the original network (105). By means of the method, an optimal hyper-parameter of an original network is optimized by means of learning mapping from a hyper-parameter and a second image feature in a plurality of sub-networks to a final effect, such that the hyper-parameter of the original network can be quickly, effectively and automatically adjusted.

Description

Hyperparameter optimization method and device for large-scale network representation learning
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201910515890.2, entitled "Hyperparameter Optimization Method and Device for Large-scale Network Representation Learning", filed by Tsinghua University on June 14, 2019.
Technical field
This application relates to the field of network learning technology, and in particular to a method and device for optimizing hyperparameters of large-scale network representation learning.
Background
Network representation learning is an effective way to process network data. To achieve good results, it usually requires careful manual tuning of parameters. However, the large scale of real-world networks makes it difficult to apply automatic machine learning to network representation learning methods.
Summary of the invention
This application aims to solve, at least to a certain extent, one of the technical problems in the related art.
This application proposes a hyperparameter optimization method for large-scale network representation learning, to solve the technical problem of low efficiency in optimizing hyperparameters of large-scale network representation learning in the prior art.
An embodiment of one aspect of the present application proposes a hyperparameter optimization method for large-scale network representation learning, including:
sampling an original network to obtain multiple sub-networks;
extracting a first image feature of the original network and a second image feature of each of the multiple sub-networks according to a preset algorithm;
fitting, by Gaussian process regression, a mapping from the second image feature and hyperparameters of each of the multiple sub-networks to a final effect;
evaluating a similarity function on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network;
according to the network similarity between the original network and each sub-network, learning the mapping from the second image feature and hyperparameters of each of the multiple sub-networks to the final effect, to generate optimal hyperparameters of the original network, so as to perform information identification through the original network.
In the hyperparameter optimization method for large-scale network representation learning of the embodiments of the present application, multiple sub-networks are obtained by sampling the original network; the first image feature of the original network and the second image feature of each of the multiple sub-networks are extracted according to a preset algorithm; the mapping from each sub-network's second image feature and hyperparameters to the final effect is fitted by Gaussian process regression; the similarity function is evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network; and, according to that network similarity, the mapping from each sub-network's second image feature and hyperparameters to the final effect is learned to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network. By learning the mapping from the hyperparameters and second image features of multiple sub-networks to the final effect, the method optimizes the hyperparameters of the original network and can adjust them quickly, effectively, and automatically.
An embodiment of another aspect of the present application proposes a hyperparameter optimization device for large-scale network representation learning, including:
a sampling module, used to sample an original network to obtain multiple sub-networks;
an extraction module, configured to extract a first image feature of the original network and a second image feature of each of the multiple sub-networks according to a preset algorithm;
a fitting module, used to fit, by Gaussian process regression, a mapping from the second image feature and hyperparameters of each of the multiple sub-networks to a final effect; a calculation module, used to evaluate a similarity function on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network;
a generating module, configured to learn, according to the network similarity between the original network and each sub-network, the mapping from the second image feature and hyperparameters of each of the multiple sub-networks to the final effect, to generate optimal hyperparameters of the original network, so as to perform information identification through the original network.
In the hyperparameter optimization device for large-scale network representation learning of the embodiments of the present application, multiple sub-networks are obtained by sampling the original network; the first image feature of the original network and the second image feature of each of the multiple sub-networks are extracted according to a preset algorithm; the mapping from each sub-network's second image feature and hyperparameters to the final effect is fitted by Gaussian process regression; the similarity function is evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network; and, according to that network similarity, the mapping from each sub-network's second image feature and hyperparameters to the final effect is learned to generate the optimal hyperparameters of the original network, so as to perform information identification through the original network. By learning the mapping from the hyperparameters and second image features of multiple sub-networks to the final effect, the device optimizes the hyperparameters of the original network and can adjust them quickly, effectively, and automatically.
本申请附加的方面和优点将在下面的描述中部分给出,部分将从下面的描述中变得明显,或通过本申请的实践了解到。The additional aspects and advantages of this application will be partly given in the following description, and some will become obvious from the following description, or be understood through the practice of this application.
附图说明Description of the drawings
本申请上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显 和容易理解,其中:The above and/or additional aspects and advantages of the present application will become obvious and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
图1为本申请实施例提供的一种大规模网络表征学习的超参数优化方法的流程示意图;FIG. 1 is a schematic flowchart of a hyperparameter optimization method for large-scale network representation learning provided by an embodiment of this application;
图2为本申请实施例提供的一种大规模网络表征学习的超参数优化装置的结构示意图。FIG. 2 is a schematic structural diagram of a hyperparameter optimization device for large-scale network representation learning provided by an embodiment of this application.
具体实施方式Detailed ways
下面详细描述本申请的实施例,所述实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的,旨在用于解释本申请,而不能理解为对本申请的限制。The embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, wherein the same or similar reference numerals indicate the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary, and are intended to explain the application, but should not be understood as a limitation to the application.
In the prior art, hyperparameters for large-scale network representation learning are optimized by tuning directly on a small graph sampled from the network. However, sampling the small graph destroys the connections between network nodes, so the optimal solution on the sampled small graph is not the optimal solution for the large graph. Moreover, real network data is usually composed of many different heterogeneous units, and sampling may lose some of these units and thus affect the selection of the optimal solution.
To address the above technical problem, embodiments of the present application provide a hyperparameter optimization method for large-scale network representation learning: sample the original network to obtain multiple sub-networks; extract, according to a preset algorithm, a first image feature of the original network and a second image feature of each of the sub-networks; fit, by Gaussian process regression, the mapping from each sub-network's second image feature and hyperparameters to its final performance; compute the first image feature against each second image feature with a similarity function to obtain the network similarity between the original network and each sub-network; and, based on these similarities, learn from the fitted mappings of the sub-networks to generate the optimal hyperparameters of the original network, so that information recognition can be performed with the original network.
The hyperparameter optimization method and apparatus for large-scale network representation learning according to embodiments of the present application are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a hyperparameter optimization method for large-scale network representation learning according to an embodiment of the present application.
As shown in FIG. 1, the method includes the following steps.
Step 101: sample the original network to obtain multiple sub-networks.
Here, the original network is the large-scale network used for network representation learning. Network representation learning aims to represent the nodes of a network as low-dimensional, real-valued, dense vectors, so that the resulting vectors support representation and reasoning in the vector space and can be applied flexibly to different data mining tasks.
For example, a node's representation can be fed as a feature into a classifier such as a support vector machine, or converted into spatial coordinates for visualization tasks.
In an embodiment of the present application, a multi-source random walk sampling algorithm is used to sample the original network and obtain multiple sub-networks. Specifically, starting from multiple nodes of the original network, the walk moves randomly to a neighboring node and then continues moving randomly from neighbor to neighbor until a preset number of steps is reached; the subgraph formed by all visited nodes is taken as a sampled sub-network, and repeating this process yields multiple sub-networks.
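The multi-source random walk of step 101 can be sketched as follows. This is an illustrative Python implementation under assumed names (`multi_source_random_walk`, an adjacency-dict graph, the `seed` and length parameters), not the patent's exact procedure:

```python
import random

def multi_source_random_walk(adj, num_sources=3, walk_length=10, seed=0):
    """Sample a sub-network by random walks from several starting nodes.

    `adj` maps each node to its list of neighbors; the function and
    parameter names are illustrative, not taken from the patent.
    """
    rng = random.Random(seed)
    visited = set()
    for start in rng.sample(list(adj), min(num_sources, len(adj))):
        cur = start
        visited.add(cur)
        for _ in range(walk_length):      # stop after the preset number of steps
            neighbors = adj[cur]
            if not neighbors:
                break
            cur = rng.choice(neighbors)   # random walk to a neighboring node
            visited.add(cur)
    # the subgraph induced by all visited nodes is the sampled sub-network
    return {n: [m for m in adj[n] if m in visited] for n in visited}
```

Calling this several times with different seeds or source sets would yield the multiple sub-networks used in the later steps.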
Step 102: extract, according to a preset algorithm, the first image feature of the original network and the second image feature of each of the multiple sub-networks.
In this embodiment, a preset signal extraction algorithm is applied to the original network and the sub-networks to obtain the first image feature of the original network and the second image feature of each sub-network. Specifically, a first candidate eigenvector of the original network and a second candidate eigenvector of each sub-network are computed under the Laplacian matrix. Low-pass filtering is then applied to these candidate eigenvectors to obtain the first feature vector of the original network and the second feature vector of each sub-network.
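A minimal sketch of this spectral feature extraction, assuming an unnormalized Laplacian and implementing the low-pass filter by keeping the `k` eigenvectors with the smallest eigenvalues (the cutoff `k` is an assumption, not specified in the patent text):

```python
import numpy as np

def low_pass_graph_features(adj_matrix, k=2):
    """Eigenvectors of the graph Laplacian, low-pass filtered.

    Keeping the k eigenvectors with the smallest eigenvalues retains the
    low-frequency (smooth) graph signals, which acts as a low-pass filter.
    """
    a = np.asarray(adj_matrix, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a      # unnormalized Laplacian L = D - A
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    order = np.argsort(eigvals)                 # low frequencies first
    return eigvecs[:, order[:k]]
```

For a connected graph the first retained eigenvector (eigenvalue 0) is constant; the following ones capture the coarse community structure of the network.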
Step 103: fit, by Gaussian process regression, the mapping from each sub-network's second image feature and hyperparameters to its final performance.
Gaussian process regression studies the relationship between variables: by modeling the dependent variable as a function of the independent variables, it seeks a regression function that achieves the smallest mean squared error without overfitting.
In this embodiment, a Gaussian process regression algorithm is used to fit, for each of the sampled sub-networks, the mapping from its second image feature and hyperparameters to its final performance.
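Gaussian process regression reduces to a few lines of linear algebra. The sketch below computes only the posterior mean with an RBF kernel; the kernel choice and noise level are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=1.0):
    """RBF (squared-exponential) kernel matrix between two point sets."""
    d2 = ((x1[:, None, :] - x2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-6):
    """GP regression posterior mean: k(x*, X) (K + sigma^2 I)^-1 y."""
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    return rbf_kernel(x_test, x_train) @ np.linalg.solve(k, y_train)
```

In the patent's setting, each training input would concatenate a sub-network's second image feature with one hyperparameter setting, and the training target would be the measured final performance.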
Step 104: compute the first image feature against each second image feature with a similarity function to obtain the network similarity between the original network and each sub-network.
Here, the network similarity comprises the network structure similarity and the hyperparameter similarity between the original network and a sub-network.
Specifically, the similarity function is evaluated on the first image feature and each second image feature, yielding the network structure similarity and the hyperparameter similarity between the original network and each sub-network.
It should be noted that the similarity function can be used as the kernel function of the Gaussian process, which ensures that the more similar a sub-network is to the original network, the more it influences the prediction of the original network's optimal hyperparameters. The kernel function here is the radial basis function (RBF), a scalar function that is radially symmetric.
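A hedged sketch of such a similarity, using an RBF over the concatenated structure-feature and hyperparameter differences; the concatenation and the single bandwidth `gamma` are assumptions for illustration, not the patent's exact formula:

```python
import numpy as np

def network_similarity(feat_a, feat_b, hp_a, hp_b, gamma=1.0):
    """RBF similarity combining structure and hyperparameter closeness.

    Returns 1.0 for identical inputs and decays toward 0 as the
    networks or hyperparameter settings grow apart.
    """
    diff = np.concatenate([feat_a - feat_b, hp_a - hp_b])
    return float(np.exp(-gamma * np.dot(diff, diff)))
```

Because an RBF on the concatenated space is positive definite, it can serve directly as the Gaussian process kernel of step 103.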
Step 105: based on the network similarity between the original network and each sub-network, learn from the fitted mappings of the sub-networks' second image features and hyperparameters to their final performance, and generate the optimal hyperparameters of the original network, so that information recognition can be performed with the original network.
In an embodiment of the present application, after the similarity function has been evaluated on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network, the optimal hyperparameters of the original network are generated by learning, weighted by these similarities, from the fitted mappings of the sub-networks.
In other words, the mappings from the sub-networks' hyperparameters and second image features to their final performance are used to estimate the optimal hyperparameters of the original network. This approach optimizes the hyperparameters of the original network faster. The optimized original network can then be used for tasks such as face recognition and detection, anomaly detection, and speech recognition.
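How the similarities of step 104 might weight the sub-networks' results into an estimate for the original network can be sketched as below. The similarity-weighted average is an illustrative stand-in for the patent's learned mapping, and all names are assumptions:

```python
import numpy as np

def estimate_optimal_hyperparams(orig_feat, sub_feats, sub_best_hps, gamma=1.0):
    """Similarity-weighted estimate of the original network's optimum.

    Sub-networks more similar to the original network receive larger
    weight, mirroring the role of the similarity kernel in step 104.
    """
    sims = np.array([np.exp(-gamma * np.sum((orig_feat - f) ** 2))
                     for f in sub_feats])
    weights = sims / sims.sum()          # normalize similarities to weights
    return weights @ np.asarray(sub_best_hps)
```

With this weighting, a sub-network whose spectral features match the original network dominates the estimate, which is the intended behavior of the similarity kernel.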
The hyperparameter optimization method for large-scale network representation learning according to the embodiments of the present application samples the original network to obtain multiple sub-networks; extracts, according to a preset algorithm, the first image feature of the original network and the second image feature of each sub-network; fits, by Gaussian process regression, the mapping from each sub-network's second image feature and hyperparameters to its final performance; computes the first image feature against each second image feature with a similarity function to obtain the network similarity between the original network and each sub-network; and, based on these similarities, learns from the fitted mappings to generate the optimal hyperparameters of the original network for information recognition. By learning the mappings from the sub-networks' hyperparameters and second image features to their final performance, the method can tune the hyperparameters of the original network automatically, quickly, and effectively.
To implement the foregoing embodiments, an embodiment of the present application further provides a hyperparameter optimization apparatus for large-scale network representation learning.
FIG. 2 is a schematic structural diagram of a hyperparameter optimization apparatus for large-scale network representation learning according to an embodiment of the present application.
As shown in FIG. 2, the apparatus includes a sampling module 110, an extraction module 120, a fitting module 130, a calculation module 140, and a generation module 150.
The sampling module 110 is configured to sample the original network to obtain multiple sub-networks.
The extraction module 120 is configured to extract, according to a preset algorithm, the first image feature of the original network and the second image feature of each of the multiple sub-networks.
The fitting module 130 is configured to fit, by Gaussian process regression, the mapping from each sub-network's second image feature and hyperparameters to its final performance.
The calculation module 140 is configured to compute the first image feature against each second image feature with a similarity function to obtain the network similarity between the original network and each sub-network.
The generation module 150 is configured to learn, based on the network similarity between the original network and each sub-network, from the fitted mappings of the sub-networks' second image features and hyperparameters to their final performance, and to generate the optimal hyperparameters of the original network, so that information recognition can be performed with the original network.
In one possible implementation, the sampling module 110 is specifically configured to:
randomly select multiple nodes of the original network as starting points according to a multi-source random walk sampling algorithm; and
randomly walk to neighbors of the selected nodes according to a preset probability, and continue moving randomly from those neighbors until a preset number of steps is reached, thereby generating the multiple sub-networks.
In another possible implementation, the fitting module 130 is specifically configured to:
use the similarity function as the kernel function of the Gaussian process, and evaluate it on the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network.
In another possible implementation, the calculation module 140 is specifically configured to:
obtain the network structure similarity and the hyperparameter similarity between the original network and each sub-network.
In another possible implementation, the extraction module 120 is specifically configured to:
compute the first candidate eigenvector of the original network and the second candidate eigenvector of each sub-network under the Laplacian matrix; and
perform low-pass filtering on the candidate eigenvectors to obtain the first feature vector of the original network and the second feature vector of each sub-network.
The hyperparameter optimization apparatus for large-scale network representation learning according to the embodiments of the present application thus samples the original network into multiple sub-networks, extracts the first and second image features, fits the sub-networks' feature-and-hyperparameter-to-performance mappings by Gaussian process regression, computes the network similarity between the original network and each sub-network with a similarity function, and learns from the fitted mappings, weighted by these similarities, to generate the optimal hyperparameters of the original network for information recognition. By learning the mappings from the sub-networks' hyperparameters and second image features to their final performance, the apparatus can tune the hyperparameters of the original network automatically, quickly, and effectively.
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials, or characteristics may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and features thereof, provided they are not mutually contradictory.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code comprising one or more executable instructions for implementing custom logic functions or steps of the process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logic functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) with one or more wires, a portable computer disk case (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it as necessary, and then stored in a computer memory.
It should be understood that parts of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any of the following technologies known in the art, or a combination thereof: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
A person of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If implemented as a software functional module and sold or used as an independent product, the integrated module may also be stored in a computer-readable storage medium.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and should not be construed as limiting the present application; a person of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.

Claims (10)

  1. A hyperparameter optimization method for large-scale network representation learning, characterized in that the method comprises the following steps:
    sampling an original network to obtain multiple sub-networks;
    extracting, according to a preset algorithm, a first image feature of the original network and a second image feature of each of the multiple sub-networks;
    fitting, by Gaussian process regression, a mapping from the second image feature and hyperparameters of each of the multiple sub-networks to a final performance;
    computing the first image feature and each second image feature according to a similarity function to obtain a network similarity between the original network and each sub-network; and
    learning, according to the network similarity between the original network and each sub-network, from the mappings of the sub-networks' second image features and hyperparameters to the final performance, to generate optimal hyperparameters of the original network, so as to perform information recognition with the original network.
  2. The method according to claim 1, characterized in that sampling the original network to obtain multiple sub-networks comprises:
    randomly selecting multiple nodes of the original network as starting points according to a multi-source random walk sampling algorithm; and
    randomly walking to neighboring nodes of the multiple nodes according to a preset probability, and then moving randomly from the neighboring nodes until a preset number of steps is reached, to generate the multiple sub-networks.
  3. The method according to claim 1 or 2, characterized in that fitting, by Gaussian process regression, the mapping from the second image feature and hyperparameters of each of the multiple sub-networks to the final performance comprises:
    using the similarity function as a kernel function of the Gaussian process, and computing the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network.
  4. The method according to any one of claims 1 to 3, characterized in that obtaining the network similarity between the original network and each sub-network comprises:
    obtaining a network structure similarity and a hyperparameter similarity between the original network and each sub-network.
  5. The method according to any one of claims 1 to 4, characterized in that extracting, according to the preset algorithm, the first image feature of the original network and the second image feature of each of the multiple sub-networks comprises:
    computing a first candidate eigenvector of the original network and a second candidate eigenvector of each sub-network under a Laplacian matrix; and
    performing low-pass filtering on the first candidate eigenvector and the second candidate eigenvector to obtain a first feature vector of the original network and a second feature vector of each sub-network.
  6. A hyperparameter optimization apparatus for large-scale network representation learning, characterized in that the apparatus comprises:
    a sampling module configured to sample an original network to obtain multiple sub-networks;
    an extraction module configured to extract, according to a preset algorithm, a first image feature of the original network and a second image feature of each of the multiple sub-networks;
    a fitting module configured to fit, by Gaussian process regression, a mapping from the second image feature and hyperparameters of each of the multiple sub-networks to a final performance;
    a calculation module configured to compute the first image feature and each second image feature according to a similarity function to obtain a network similarity between the original network and each sub-network; and
    a generation module configured to learn, according to the network similarity between the original network and each sub-network, from the mappings of the sub-networks' second image features and hyperparameters to the final performance, to generate optimal hyperparameters of the original network, so as to perform information recognition with the original network.
  7. The apparatus according to claim 6, characterized in that the sampling module is specifically configured to:
    randomly select multiple nodes of the original network as starting points according to a multi-source random walk sampling algorithm; and
    randomly walk to neighboring nodes of the multiple nodes according to a preset probability, and then move randomly from the neighboring nodes until a preset number of steps is reached, to generate the multiple sub-networks.
  8. The apparatus according to claim 6 or 7, characterized in that the fitting module is specifically configured to:
    use the similarity function as a kernel function of the Gaussian process, and compute the first image feature and each second image feature to obtain the network similarity between the original network and each sub-network.
  9. The apparatus according to any one of claims 6 to 8, characterized in that the calculation module is specifically configured to:
    obtain a network structure similarity and a hyperparameter similarity between the original network and each sub-network.
  10. The apparatus according to any one of claims 6 to 9, characterized in that the extraction module is specifically configured to:
    compute a first candidate eigenvector of the original network and a second candidate eigenvector of each sub-network under a Laplacian matrix; and
    perform low-pass filtering on the first candidate eigenvector and the second candidate eigenvector to obtain a first feature vector of the original network and a second feature vector of each sub-network.
Application: PCT/CN2019/098235, filed 2019-07-29 — Hyper-parameter optimization method and apparatus for large-scale network representation learning
Application Claiming Priority: CN201910515890.2, filed 2019-06-14 (Chinese publication CN110322021B)
Publication: WO2020248342A1, published 2020-12-17

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447840A (en) * 2015-12-09 2016-03-30 西安电子科技大学 Image super-resolution method based on active sampling and Gaussian process regression
CN108228728A (en) * 2017-12-11 2018-06-29 北京航空航天大学 A kind of paper network node of parametrization represents learning method
CN108257093A (en) * 2018-01-18 2018-07-06 洛阳理工学院 The single-frame images ultra-resolution method returned based on controllable core and Gaussian process
CN109242105A (en) * 2018-08-17 2019-01-18 第四范式(北京)技术有限公司 Tuning method, apparatus, equipment and the medium of hyper parameter in machine learning model

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9396651B2 (en) * 2014-03-19 2016-07-19 International Business Machines Corporation Auto-calibration for road traffic prediction
CN104794501B (en) * 2015-05-14 2021-01-05 清华大学 Pattern recognition method and device
CN106096727B (en) * 2016-06-02 2018-12-07 腾讯科技(深圳)有限公司 Machine-learning-based network model construction method and device
CN107341549A (en) * 2017-07-26 2017-11-10 成都快眼科技有限公司 Parameter optimization method based on multi-channel competitive convolutional neural networks
CN108710904A (en) * 2018-05-10 2018-10-26 上海交通大学 Image matching method and system based on recurrent neural network
CN108764308B (en) * 2018-05-16 2021-09-14 中国人民解放军陆军工程大学 Pedestrian re-identification method based on convolution cycle network
CN109086811B (en) * 2018-07-19 2021-06-22 南京旷云科技有限公司 Multi-label image classification method and device and electronic equipment
CN109344855B (en) * 2018-08-10 2021-09-24 华南理工大学 Depth model face beauty evaluation method based on sequencing guided regression
CN109858631B (en) * 2019-02-02 2021-04-27 清华大学 Automatic machine learning system and method for streaming data analysis under concept drift

Also Published As

Publication number Publication date
CN110322021B (en) 2021-03-30
CN110322021A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
JP2020119558A (en) Learning method and learning device of pedestrian detector for robust surveillance based on image analysis, and testing method and testing device using the same
Wang et al. A novel density-based clustering framework by using level set method
US20200242013A1 (en) Champion test case generation
CN109194707B (en) Distributed graph embedding method and device
JP2018200685A (en) Forming of data set for fully supervised learning
JP6610278B2 (en) Machine learning apparatus, machine learning method, and machine learning program
US9904844B1 (en) Clustering large database of images using multilevel clustering approach for optimized face recognition process
US11468261B2 (en) Information processing apparatus, image processing method, and computer-readable recording medium recording image processing program
KR102119057B1 (en) Method for learning fracture diagnosis model and apparatus for the same
JP2019003299A (en) Image recognition device and image recognition method
US20210182727A1 (en) Qubit detection system and detection method
EP3975117A1 (en) Image segmentation method and apparatus, and training method and apparatus for image segmentation model
WO2021032062A1 (en) Image processing model generation method, image processing method, apparatus, and electronic device
JP2019149681A (en) Traffic abnormality sensing device, traffic abnormality sensing method, and traffic abnormality sensing program
JP2021174529A (en) Method and device for biometric detection
US9436912B1 (en) Symmetric schema instantiation method for use in a case-based reasoning system
TW202011266A (en) Neural network system for image matching and location determination, method, and device
CN116778935A (en) Watermark generation, information processing and audio watermark generation model training method and device
KR20210044080A (en) Apparatus and method of defect classification based on machine-learning
WO2020248342A1 (en) Hyper-parameter optimization method and apparatus for large-scale network representation learning
JP6177649B2 (en) Data processing apparatus, length measurement system, defect inspection system, image tracking system, and data processing method
CN115984633B (en) Gate level circuit assembly identification method, system, storage medium and equipment
CN116127319A (en) Multi-mode negative sample construction and model pre-training method, device, equipment and medium
US20230062313A1 (en) Generating 2d mapping using 3d data
US11881016B2 (en) Method and system for processing an image and performing instance segmentation using affinity graphs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933040

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933040

Country of ref document: EP

Kind code of ref document: A1