CN114912585A - Neural network generation method and device - Google Patents

Neural network generation method and device Download PDF

Info

Publication number
CN114912585A
CN114912585A (application CN202210591391.3A)
Authority
CN
China
Prior art keywords
network
neural network
sample
algorithm
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210591391.3A
Other languages
Chinese (zh)
Inventor
Shen Li (沈力)
Tao Dacheng (陶大程)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202210591391.3A priority Critical patent/CN114912585A/en
Publication of CN114912585A publication Critical patent/CN114912585A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the disclosure provide a method and an apparatus for generating a neural network. One embodiment of the method comprises: acquiring a neural network to be trained, wherein the neural network comprises a feature extraction network, a screening network and an output network, the feature extraction network is used for generating a probability vector, the screening network is used for reparameterizing the input probability vector and applying an optimal-transport-based Top-K algorithm to the reparameterized result, and the output network is used for generating a processing result according to the output of the screening network; acquiring a training data set; and taking a sample to be processed in the training data set as the input of the neural network, taking the processing result sample corresponding to the input sample as the expected output of the neural network, and training the neural network with a back propagation algorithm. This implementation uses the optimal-transport-based Top-K algorithm to overcome the fact that the ordinary Top-K operation has no computable gradient, and also alleviates the inaccuracy of solving the optimal-transport-based Top-K problem.

Description

Neural network generation method and device
Technical Field
The embodiments of the disclosure relate to the field of computer technology, and in particular to a method and an apparatus for generating a neural network.
Background
Top-K Coreset is a general method for ranking samples that approximates an original set with a smaller set. In some cases, Top-K Coreset is applied inside neural networks. To train a Top-K Coreset-based neural network, the loss function must be differentiable with respect to the input at every update step. However, the usual implementations of the Top-K operation involve index swapping and similar steps for which no gradient can be computed, so the operation is difficult to integrate into the training process of a neural network.
Because of this, the common existing training scheme is two-stage: the feature extraction part of the neural network is first trained with a proxy loss, the trained feature extraction part is then used to extract features, and Top-K Coreset (or a similar procedure) performs the subsequent processing on the extracted features. This approach completely bypasses the Top-K operation during training, but it can make the training objective and the final processing result inconsistent.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for generating a neural network.
In a first aspect, an embodiment of the present disclosure provides a method for generating a neural network, the method including: acquiring a neural network to be trained, wherein the neural network comprises a feature extraction network, a screening network and an output network, the feature extraction network is used for extracting sample features and generating a probability vector according to the sample features, the screening network is used for reparameterizing the input probability vector and applying an optimal-transport-based Top-K algorithm to the reparameterized result, and the output network is used for generating a processing result according to the output of the screening network; acquiring a training data set, wherein the training data in the training data set comprises a sample to be processed and label data corresponding to the sample to be processed, and the type of the sample to be processed comprises at least one of the following: text, images, audio, and video; and taking the sample to be processed in the training data set as the input of the neural network, taking the label data corresponding to the input sample to be processed as the expected output of the neural network, and training the neural network with a back propagation algorithm to obtain the trained neural network.
In some embodiments, the screening network reparameterizes the input probability vector using the Gumbel Trick.
In some embodiments, the screening network implements the optimal-transport-based Top-K algorithm using the Sinkhorn algorithm.
In some embodiments, each element in the probability vector corresponds to a piece of information in a preset information set, and the processing result generated by the output network indicates the information selected from the preset information set.
In a second aspect, an embodiment of the present disclosure provides an information push method, the method including: acquiring a candidate push information set; selecting information from the candidate push information set by using a pre-trained neural network, wherein the neural network is trained with the method described in the last implementation of the first aspect; and pushing the information selected from the candidate push information set.
In a third aspect, an embodiment of the present disclosure provides an apparatus for generating a neural network, the apparatus including: a first obtaining unit configured to acquire a neural network to be trained, wherein the neural network comprises a feature extraction network, a screening network and an output network, the feature extraction network is used for extracting sample features and generating a probability vector according to the sample features, the screening network is used for reparameterizing the input probability vector and applying an optimal-transport-based Top-K algorithm to the reparameterized result, and the output network is used for generating a processing result according to the output of the screening network; a second obtaining unit configured to acquire a training data set, wherein the training data in the training data set comprises a to-be-processed sample and label data corresponding to the to-be-processed sample, and the type of the to-be-processed sample comprises at least one of the following: text, images, audio, and video; and a training unit configured to take the to-be-processed sample in the training data set as the input of the neural network, take the label data corresponding to the input to-be-processed sample as the expected output of the neural network, and train the neural network with a back propagation algorithm to obtain the trained neural network.
In some embodiments, the screening network reparameterizes the input probability vector using the Gumbel Trick.
In some embodiments, the screening network implements the optimal-transport-based Top-K algorithm using the Sinkhorn algorithm.
In some embodiments, each element in the probability vector corresponds to a piece of information in a preset information set, and the processing result generated by the output network indicates the information selected from the preset information set.
In a fourth aspect, an embodiment of the present disclosure provides an information push apparatus, including: a third obtaining unit configured to acquire a candidate push information set; a selecting unit configured to select information from the candidate push information set by using a pre-trained neural network, wherein the neural network is trained as described in the last implementation of the first aspect; and a pushing unit configured to push the information selected from the candidate push information set.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method as described in any implementation of the first aspect.
In a sixth aspect, embodiments of the present disclosure provide a computer-readable medium, on which a computer program is stored, which computer program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the neural network generation method and apparatus provided by the embodiments, the optimal-transport-based Top-K algorithm removes the obstacle that the ordinary Top-K operation has no computable gradient, and introducing reparameterization into the optimal-transport-based Top-K network simulates the sampling process from a continuous probability distribution to near-discrete decision variables while preserving gradients. This alleviates the inaccuracy of solving the optimal-transport-based Top-K problem and narrows the gap between the Top-K algorithm and mapping inference. Meanwhile, because the Top-K algorithm reduces the amount of data involved in subsequent computation, the network helps improve computational efficiency and reduce resource consumption.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of generating a neural network according to the present disclosure;
FIG. 3 is a flow diagram of one embodiment of an information push method according to the present disclosure;
FIG. 4 is a schematic structural diagram of one embodiment of a generation apparatus of a neural network according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of an information pushing device according to the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary architecture 100 to which an embodiment of a neural network generation method or a neural network generation apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 interact with a server 105 via a network 104 to receive or send messages or the like. Various client applications may be installed on the terminal devices 101, 102, 103. Such as search-class applications, browser-class applications, social platform software, deep-learning-class applications, information-flow-class applications, and so forth.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, laptop portable computers, desktop computers, and the like. When they are software, they can be installed in the electronic devices listed above and may be implemented as multiple pieces of software or software modules (e.g., for providing a distributed service) or as a single piece of software or software module, which is not specifically limited herein.
The server 105 may be a server that provides various services, such as a backend server that provides service support for client applications installed on the terminal devices 101, 102, 103. The server can obtain the neural network to be trained and a training data set, and end-to-end training is carried out on the neural network to be trained by utilizing the training data set.
The neural network to be trained and the training data set may be stored in the terminal devices 101, 102, 103, in which case the server may obtain them from the terminal devices 101, 102, 103. They may also be stored directly locally on the server 105, in which case the server 105 may extract the locally stored neural network to be trained and training data set directly, and the terminal devices 101, 102, 103 and the network 104 may be absent.
It should be noted that the neural network generation method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the neural network generation device is generally disposed in the server 105.
It should also be noted that the terminal devices 101, 102, 103 may also have model training applications installed therein. The terminal devices 101, 102, 103 may also train the neural network to be trained end-to-end using the training data set based on the model training class application. In this case, the method of generating the neural network may be executed by the terminal devices 101, 102, and 103, and accordingly, the neural network generating device may be provided in the terminal devices 101, 102, and 103. At this point, the exemplary system architecture 100 may not have the server 105 and the network 104.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., for providing a distributed service) or as a single piece of software or software module, which is not specifically limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a method of generating a neural network in accordance with the present disclosure is shown. The generation method of the neural network comprises the following steps:
step 201, obtaining a neural network to be trained.
In the present embodiment, the execution body of the neural network generation method (such as the server 105 shown in fig. 1) may acquire the neural network to be trained from local storage, another storage device, or a third-party data platform. The neural network may include a feature extraction network, a screening network, and an output network.
The feature extraction network may be configured to extract sample features and generate probability vectors according to the extracted sample features. The elements in the probability vector may represent probabilities. In different neural networks, the elements in the probability vector may represent different probabilities.
For example, in some cases, each element in the probability vector may correspond to a different category of the target object, and in this case, each element may represent a probability that the target object belongs to the corresponding category. For another example, in some cases, each element in the probability vector may correspond to different information (such as an image or a video), and in this case, each element may represent a probability of selecting corresponding information.
In particular, the feature extraction network may generate probability vectors from inputs to the neural network. Wherein the inputs of different types of neural networks may be different. For example, the input to the neural network may be text and/or images and/or audio and/or video, or may be a sequence of features or a feature vector, etc. The specific processing procedure of the feature extraction network can be set according to the specific application requirements of the neural network. For example, the feature extraction network may perform feature extraction on the input of the neural network, and then generate a probability vector according to the feature extraction result.
The screening network can be used to reparameterize (Re-Parameterization) the input probability vector and to apply the optimal-transport-based Top-K algorithm to the reparameterized result. The probability vector generated by the feature extraction network serves as the input of the screening network. Reparameterization is a way of transforming an objective function that requires sampling from a parameterized distribution: sampling directly from the distribution loses the gradient with respect to its parameters, whereas after the transformation the sampling can still be carried out while the gradient with respect to the parameters is preserved. Reparameterizing the input probability vector here means perturbing each element of the probability vector with a reparameterization method, so that the loss function is computed on the perturbed probability vector and the gradients of the network parameters can still be computed by the back propagation algorithm.
The Top-K algorithm refers to finding the largest or smallest K elements from a set. The Top-K algorithm is an important component of some neural networks, and is widely applied to the fields of information retrieval, data mining and the like. For example, the Top-K algorithm may be used for the construction of a core set (Coreset).
A coreset is an optimization technique based on re-representing the samples: a larger set is approximated by a smaller set while some desired characteristics (such as the average pairwise distance or the diameter of the point set) are preserved. Because a coreset approximates the original set with a smaller set and improves the scalability of an algorithm, it is used in algorithms whose computational cost grows quickly with the data scale, for example pruning the parameters of large models, speeding up large-scale recommendation systems, and reducing the complexity of K-median/clustering, so as to improve their efficiency and performance.
Top-K-based coreset construction samples a set of elements "C" from an ordered domain "D". If an object e ∈ D is the i-th largest element in "D", its rank is i. Sampling each element of "D" independently with probability p yields the set "C". The object ranked k·p in "C" then has rank approximately k in "D". A purely illustrative sketch of this sampling-based construction follows below (the domain, the sampling probability p and the function name are assumptions, not part of the disclosure):
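import random

def topk_coreset_sample(D, p):
    # Sample each element of the ordered domain D independently with probability p.
    # The element ranked about k*p in the sampled set C then has rank about k in D.
    D_sorted = sorted(D, reverse=True)            # rank 1 = largest element of D
    return [e for e in D_sorted if random.random() < p]

# Hypothetical usage: approximate the element of rank k = 100 in D.
D = list(range(100000))
p, k = 0.05, 100
C = topk_coreset_sample(D, p)
approx_rank_k = C[max(round(k * p) - 1, 0)]       # item ranked ~k*p in C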
The purpose of Optimal Transport is to find a transport plan with the least overall cost; on top of this, a set of geometric tools oriented to probability distributions can be built, so that probability distributions are modeled and the distances between them are measured geometrically. Optimal transport is widely used in fields such as computer vision and computer graphics.
Because the Top-K algorithm is usually implemented with procedures such as bubble sort, which involve swapping indices, its gradient cannot be computed: the mapping from the input data to the vector indicating whether each element belongs to the first K selected elements is discontinuous. For neural networks that use the Top-K algorithm, end-to-end training with back propagation and gradient descent is therefore not possible.
To solve this problem, the optimal-transport-based Top-K algorithm treats differentiable Top-K as an optimal transport problem: the elements belonging to the Top-K are transported to one destination, and the elements not belonging to the Top-K are transported to another. Entropy Regularization is added on top of the optimal transport formulation so that the optimal solution becomes an interior point of the feasible region; when the input changes, the optimal solution changes smoothly, which makes the Top-K problem differentiable.
The output network is used for generating a processing result according to the output of the screening network. I.e. the output results of the screening network may be used as input to the output network. The neural network under different application scenarios may include output networks that generate different processing results. For example, the processing result may be an attribute (such as quality information, score, etc.) corresponding to K elements selected by the screening network using the Top-K algorithm. The specific processing procedure of the output network can be set according to the specific application requirements of the neural network.
Step 202, a training data set is obtained.
In this embodiment, the execution body may obtain the training data set from local storage, another storage device, or a third-party data platform. The training data in the training data set includes a sample to be processed and label data corresponding to the sample to be processed. The sample to be processed may be of various types, including but not limited to at least one of: text, images, audio, and video. The label data may be the ground-truth label corresponding to the sample to be processed.
Step 203, taking the sample to be processed in the training data set as the input of the neural network, taking the label data corresponding to the input sample to be processed as the expected output of the neural network, and training the neural network with a back propagation algorithm.
In this embodiment, a machine learning method may be used: the sample to be processed is taken as the input of the neural network, the label data corresponding to the input sample is taken as the expected output of the neural network, and the neural network is trained with a back propagation algorithm to obtain the trained neural network. As an illustrative PyTorch-style sketch only (the module sizes, the stand-in screening operation and the loss are assumptions rather than the disclosed implementation), steps 201-203 could be wired together roughly as follows, with the actual screening network built from the Gumbel and Sinkhorn operations described below:
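import torch
import torch.nn as nn

FEAT_DIM, M = 32, 16          # hypothetical input feature size and probability-vector length

# Feature extraction network: extracts sample features and produces the probability vector.
feature_net = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(),
                            nn.Linear(64, M), nn.Softmax(dim=-1))

# Screening network: stand-in for "reparameterize + optimal-transport Top-K";
# a temperature-scaled softmax keeps this sketch differentiable and runnable.
class ScreeningNet(nn.Module):
    def forward(self, s):
        return torch.softmax(s / 0.1, dim=-1)

# Output network: generates the processing result from the screening output.
output_net = nn.Linear(M, 1)

model = nn.Sequential(feature_net, ScreeningNet(), output_net)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Hypothetical training data: (to-be-processed sample, label data).
x, y = torch.randn(256, FEAT_DIM), torch.randn(256, 1)

for _ in range(10):                 # end-to-end training with back propagation
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)     # label data serves as the expected output
    loss.backward()
    optimizer.step()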
In some optional implementations of this embodiment, the screening network may reparameterize the input probability vector using the Gumbel Trick. The Gumbel Trick is a reparameterization method: Gumbel noise is added to a discrete random variable so that sampling from a discrete probability distribution can be simulated and then carried out.
As an example, for the probability vector "S", the introduction of noise into "S" using the Gumbel trigk can be expressed as:
Figure BDA0003665306370000081
wherein the content of the first and second substances,
Figure BDA0003665306370000082
representing the vector after the introduction of noise. "g σ (u i ) "is Gumbel distribution. For example, "g σ (u i ) "can be a standard Gumbel distribution as follows:
g σ (u)=-σloh(-log(u))
by introducing Gumbel-weighted parameterization processing into the differentiable Top-k algorithm based on optimal transmission, a relatively proper gap can be formed between the differentiable Top-k algorithm and mapping inference, and the problem of inaccurate estimation existing when the probability that the k-th maximum (or minimum) item is equal to the probability that the k + 1-th maximum (or minimum) item corresponds to is relieved.
In some optional implementations of this embodiment, the screening network may implement the optimal-transport-based Top-K algorithm with the Sinkhorn algorithm. The Sinkhorn algorithm is a method for solving optimal transport problems.
As an example, for the probability vector "s", the Top-K problem is expressed as an optimal transmission problem. Specifically, k maximum (max) or minimum (min) terms (k < m) are selected from a probability vector "s" of size m, k terms are moved to one destination, and the other (m-k) terms are moved to another destination. Assuming the destination is the maximum value of "s", the part of Top-k may be moved to "max(s)", and the other items to "min(s)", at which time the optimal transmission problem may be expressed as:
c=[1 1 ... 1]
Figure BDA0003665306370000091
Figure BDA0003665306370000092
where "c" is the cost matrix. "r" indicates two transmission destinations. "D" is a distance matrix.
After adding noise for "s" with the Gumbel distribution, the distance matrix can be updated as follows:
Figure BDA0003665306370000093
then, by introducing an entropy regularization term, the optimal transmission problem can be represented as an integer linear programming problem with Gumbel noise as follows:
Figure BDA0003665306370000094
Figure BDA0003665306370000095
wherein the content of the first and second substances,
Figure BDA0003665306370000096
representing the transmission scheme, i.e. the solution to the Top-K problem. "1" is a vector with each element being 1.
Figure BDA0003665306370000097
An entropy regularization term. "τ" is the regularization discounting factor.
Figure BDA0003665306370000098
Indicating correlation with regularized discount factors
Figure BDA0003665306370000099
"i" and "j" denote a row label and a column label, respectively.
For the above problem, the solution can be performed by using the Sinkhorn algorithm by giving an arbitrary real number distance matrix "D". Specifically, firstly make τ T=exp(-D/τ)Then, according to the following formula " τ T"alternately normalized by rows and columns to find the final convergence solution:
Figure BDA00036653063700000910
wherein the content of the first and second substances,
Figure BDA00036653063700000911
meaning divided by element.
On top of the optimal transport problem, adding the entropy regularization makes the solution change smoothly and the mapping differentiable, so that the differentiable Top-K operator can be applied in the training process of the neural network model, i.e. a differentiable Top-K operation is realized. In addition, using the Sinkhorn algorithm to solve the optimal transport problem helps achieve a relatively efficient Top-K Coreset construction and end-to-end training of the neural network. The formulation above can be sketched in PyTorch as follows; this is only an illustrative implementation under the assumed marginals (all-ones source, destinations receiving m − k and k units) and squared distances, not the patent's reference code:
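import torch

def sinkhorn_topk(s: torch.Tensor, k: int, tau: float = 0.1, n_iter: int = 200) -> torch.Tensor:
    # Differentiable Top-K of a length-m vector s via entropy-regularized optimal transport.
    m = s.shape[0]
    targets = torch.stack([s.min(), s.max()]).detach()     # two destinations: min(s), max(s)
    D = (s.unsqueeze(1) - targets.unsqueeze(0)) ** 2        # m x 2 distance matrix
    c = torch.ones(m)                                       # source marginal (one unit per item)
    r = torch.tensor([float(m - k), float(k)])              # destination marginal
    T = torch.exp(-D / tau)                                 # T = exp(-D / tau)
    for _ in range(n_iter):                                 # alternate row / column normalization
        T = T * (c / T.sum(dim=1)).unsqueeze(1)             # rows sum to c
        T = T * (r / T.sum(dim=0)).unsqueeze(0)             # columns sum to r
    return T[:, 1]                                          # soft indicator of Top-K membership

s = torch.tensor([0.05, 0.30, 0.10, 0.25, 0.20, 0.10], requires_grad=True)
mask = sinkhorn_topk(s, k=3, tau=0.01)    # soft membership, close to 1 for the 3 largest entries
loss = (mask * s).sum()                   # e.g. soft sum of the selected items
loss.backward()                           # gradients flow back to s, unlike a hard Top-K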
In some optional implementations of this embodiment, each element in the probability vector may correspond to a piece of information in a preset information set. In this case, the processing result generated by the output network may be used to indicate the information selected from the preset information set.
The preset information set can be configured in advance according to the actual application scenario. In general, the elements in the probability vector correspond one to one to the pieces of information in the preset information set, i.e. each element represents the probability associated with its corresponding information. For example, in an information push scenario, the preset information set may consist of several pieces of information to be pushed, and each element in the probability vector represents the probability that the corresponding information is selected for pushing.
In this case, the processing result generated by the output network may indicate the information selected from the preset information set. Taking the information push scenario as an example, the processing result generated by the output network may be the information selected from the preset information set for pushing.
By adopting the method provided by the embodiment in scenes such as information screening and pushing, the end-to-end training of the neural network integrated with the Top-K algorithm can be realized, and the accuracy of information screening and prediction can be improved.
It should be noted that the design of the output network can be flexibly set according to the actual application requirements. Any output network can adopt the method provided by the embodiment to realize the end-to-end training of the neural network integrated with the Top-K algorithm.
The method provided by the above embodiments of the present disclosure is directed at existing approaches to applying a differentiable Top-K operator in the training of neural networks. It not only uses the optimal-transport-based Top-K algorithm to remove the obstacle that the ordinary Top-K operation has no computable gradient, but also introduces reparameterization to simulate the sampling process from a continuous probability distribution to near-discrete decision variables while still preserving the gradients of the parameters. The Top-K algorithm also improves computational efficiency, leaves a better gap between the Top-K algorithm and mapping inference, and admits a better theoretical upper bound in practical applications, thereby resolving the inaccurate estimation that the optimal-transport-based Top-K algorithm suffers when the probabilities of the k-th largest (or smallest) item and the (k+1)-th largest (or smallest) item are equal.
With further reference to fig. 3, a flow 300 of one embodiment of an information push method according to the present disclosure is shown. The process 300 of the information push method includes the following steps:
step 301, a candidate push information set is obtained.
In this embodiment, the executing entity of the information push method may obtain the candidate push information set from a local device, a connected storage device, a third-party data platform, or the like. Wherein the candidate push information set may be composed of several information. The candidate push information set may be preset according to actual application requirements or application scenarios. As an example, in a personalized information push scenario for a user, the candidate push information set may be a personalized candidate push information set corresponding to the user.
Step 302, selecting information from the candidate pushed information set by using a pre-trained neural network.
In this embodiment, the neural network may be trained with the method of the optional implementations described in the embodiment of fig. 2 above. Specifically, each element in the probability vector may correspond one to one to a piece of information in the candidate push information set and represent the probability of selecting that information. The screening network screens K elements from the probability vector based on the probability vector, the reparameterization, and the optimal-transport-based Top-K algorithm; the output network then generates a processing result from the output of the screening network, and the processing result may indicate the pieces of information in the candidate push information set that correspond to the K screened elements. Purely as an illustration of this step (the candidate items, feature size, value of K and the stand-in model below are hypothetical placeholders, not the trained network of the disclosure):
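import torch
import torch.nn as nn

K = 2
candidates = ["news_a", "video_b", "article_c", "image_d", "clip_e"]   # hypothetical candidate push information set

# Placeholder for the pre-trained network of fig. 2: it maps the sample features to a
# score/soft-membership vector with one entry per candidate push information item.
model = nn.Sequential(nn.Linear(8, len(candidates)), nn.Softmax(dim=-1))

features = torch.randn(8)                      # placeholder features of the to-be-processed sample
with torch.no_grad():
    scores = model(features)                   # one score per candidate information item

selected_idx = torch.topk(scores, K).indices   # at inference time a hard Top-K over the scores is one option
selected = [candidates[i] for i in selected_idx.tolist()]
# `selected` is then pushed, e.g. to the corresponding terminal device (step 303).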
Step 303, pushing the information selected from the candidate pushed information set.
In this embodiment, the executing entity may push the information selected from the candidate push information set according to actual requirements. For example, information selected from the candidate pushed information set may be pushed to a terminal device used by the corresponding user.
The execution subject of the information push method of the present embodiment may be the same as or different from the execution subject of the neural network generation method.
According to the method provided by this embodiment of the disclosure, using the optimal-transport-based Top-K neural network trained with the introduced reparameterization for information screening and pushing can improve the precision of the screened information and thus the accuracy of information pushing.
With further reference to fig. 4, as an implementation of the method shown in fig. 2, the present disclosure provides an embodiment of a generation apparatus of a neural network, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 4, the neural network generation apparatus 400 provided by this embodiment includes a first obtaining unit 401, a second obtaining unit 402 and a training unit 403. The first obtaining unit 401 is configured to acquire a neural network to be trained, where the neural network includes a feature extraction network, a screening network and an output network, the feature extraction network is configured to extract sample features and generate a probability vector from the sample features, the screening network is configured to reparameterize the input probability vector and apply the optimal-transport-based Top-K algorithm to the reparameterized result, and the output network is configured to generate a processing result from the output of the screening network. The second obtaining unit 402 is configured to acquire a training data set, where the training data in the training data set includes a to-be-processed sample and label data corresponding to the to-be-processed sample, and the type of the to-be-processed sample includes at least one of: text, images, audio, and video. The training unit 403 is configured to take the to-be-processed sample in the training data set as the input of the neural network, take the label data corresponding to the input to-be-processed sample as the expected output of the neural network, and train the neural network with a back propagation algorithm to obtain the trained neural network.
In this embodiment, for the specific processing of the first obtaining unit 401, the second obtaining unit 402 and the training unit 403 of the neural network generation apparatus 400 and the technical effects thereof, reference may be made to the descriptions of step 201, step 202 and step 203 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the screening network reparameterizes the input probability vector using the Gumbel Trick.
In some optional implementations of this embodiment, the screening network implements the optimal-transport-based Top-K algorithm using the Sinkhorn algorithm.
In some optional implementations of this embodiment, each element in the probability vector corresponds to a piece of information in a preset information set, and the processing result generated by the output network indicates the information selected from the preset information set.
In the apparatus provided by the above embodiment of the present disclosure, the first obtaining unit acquires a neural network to be trained, where the neural network includes a feature extraction network, a screening network and an output network: the feature extraction network generates a probability vector, the screening network reparameterizes the input probability vector and applies the optimal-transport-based Top-K algorithm to the reparameterized result, and the output network generates a processing result from the output of the screening network. The second obtaining unit acquires a training data set, where the training data includes a to-be-processed sample and a corresponding processing result sample, and the type of the to-be-processed sample includes at least one of: text, images, audio, and video. The training unit takes the to-be-processed sample as the input of the neural network, takes the corresponding processing result sample as the expected output, and trains the neural network with a back propagation algorithm. By introducing reparameterization, the sampling process from a continuous probability distribution to near-discrete decision variables is simulated while the gradients of the parameters are preserved; the Top-K algorithm improves computational efficiency, leaves a better gap between the Top-K algorithm and mapping inference, and admits a better theoretical upper bound in practice, thereby resolving the inaccurate estimation that occurs when the probabilities of the k-th largest (or smallest) item and the (k+1)-th largest (or smallest) item are equal in the optimal-transport-based Top-K algorithm.
With further reference to fig. 5, as an implementation of the method shown in fig. 3, the present disclosure provides an embodiment of an information pushing apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 3, and the apparatus may be applied to various electronic devices in particular.
As shown in fig. 5, the information push apparatus 500 provided by the present embodiment includes a third obtaining unit 501, a selecting unit 502 and a pushing unit 503. Wherein the third obtaining unit 501 is configured to obtain a candidate push information set; the selecting unit 502 is configured to select information from the candidate pushed information set by using a pre-trained neural network, wherein the neural network is trained by using the method described in the embodiment of fig. 2; the pushing unit 503 is configured to push information selected from the set of candidate push information.
In the present embodiment, in the information push apparatus 500: the specific processing of the third obtaining unit 501, the selecting unit 502, and the pushing unit 503 and the technical effects thereof can refer to the related descriptions of step 301, step 302, and step 303 in the corresponding embodiment of fig. 3, which are not described herein again.
In the apparatus provided by the above embodiment of the present disclosure, the third obtaining unit acquires a candidate push information set; the selecting unit selects information from the candidate push information set with the optimal-transport-based Top-K neural network trained with the introduced reparameterization; and the pushing unit pushes the information selected from the candidate push information set, which improves the precision of the screened information and thus the accuracy of information pushing.
Referring now to FIG. 6, a schematic diagram of an electronic device (e.g., the server of FIG. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device/server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 602 or a program loaded from a storage means 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, magnetic tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device, or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a neural network to be trained, wherein the neural network comprises a feature extraction network, a screening network and an output network, the feature extraction network is used for extracting sample features and generating a probability vector according to the sample features, the screening network is used for reparameterizing the input probability vector and applying an optimal-transport-based Top-K algorithm to the reparameterized result, and the output network is used for generating a processing result according to the output of the screening network; acquire a training data set, wherein the training data in the training data set comprises a sample to be processed and label data corresponding to the sample to be processed, and the type of the sample to be processed comprises at least one of the following: text, images, audio, and video; and take the sample to be processed in the training data set as the input of the neural network, take the label data corresponding to the input sample to be processed as the expected output of the neural network, and train the neural network with a back propagation algorithm to obtain the trained neural network.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including a first obtaining unit, a second obtaining unit and a training unit. For example, the training unit may also be described as "a unit that takes the sample to be processed in the training data set as the input of the neural network, takes the label data corresponding to the input sample to be processed as the expected output of the neural network, and trains the neural network with a back propagation algorithm to obtain the trained neural network".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. For example, the above features and (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure are mutually replaced to form the technical solution.

Claims (12)

1. A method of generating a neural network, comprising:
acquiring a neural network to be trained, wherein the neural network comprises a feature extraction network, a screening network and an output network, the feature extraction network is used for extracting sample features and generating a probability vector according to the sample features, the screening network is used for reparameterizing the input probability vector and applying an optimal-transport-based Top-K algorithm to the reparameterized result, and the output network is used for generating a processing result according to the output of the screening network;
acquiring a training data set, wherein training data in the training data set comprises a sample to be processed and label data corresponding to the sample to be processed, and the type of the sample to be processed comprises at least one of the following items: text, images, audio, and video;
and taking the sample to be processed in the training data set as the input of the neural network, taking the label data corresponding to the input sample to be processed as the expected output of the neural network, and training the neural network by utilizing a back propagation algorithm to obtain the trained neural network.
2. The method of claim 1, wherein the screening network reparameterizes the probability vector using a Gumbel Trick.
3. The method of claim 1, wherein the screening network implements the optimal-transport-based Top-K algorithm using a Sinkhorn algorithm.
4. The method according to one of claims 1 to 3, wherein each element in the probability vector corresponds to information in a preset information set; and
the processing result generated by the output network is used to indicate the information selected from the preset information set.
5. An information push method, comprising:
acquiring a candidate push information set;
selecting information from the candidate pushed information set using a pre-generated neural network, wherein the neural network is trained using the method of claim 4;
and pushing the information selected from the candidate pushed information set.
6. An apparatus for generating a neural network, comprising:
a first obtaining unit configured to acquire a neural network to be trained, wherein the neural network comprises a feature extraction network, a screening network and an output network, the feature extraction network is used for extracting sample features and generating a probability vector according to the sample features, the screening network is used for reparameterizing the input probability vector and applying an optimal-transport-based Top-K algorithm to the reparameterized result, and the output network is used for generating a processing result according to the output of the screening network;
a second obtaining unit configured to obtain a training data set, where training data in the training data set includes a to-be-processed sample and label data corresponding to the to-be-processed sample, and a type of the to-be-processed sample includes at least one of: text, images, audio, and video;
and a training unit configured to take the sample to be processed in the training data set as the input of the neural network, take the label data corresponding to the input sample to be processed as the expected output of the neural network, and train the neural network with a back propagation algorithm to obtain the trained neural network.
7. The apparatus of claim 6, wherein the screening network re-parameterizes the probability vectors using a Gumbel Trick.
8. The apparatus of claim 6, wherein the screening network implements the optimal-transport-based Top-K algorithm using a Sinkhorn algorithm.
9. The apparatus according to one of claims 6 to 8, wherein each element in the probability vector corresponds to information in a preset information set; and
the processing result generated by the output network is used to indicate the information selected from the preset information set.
10. An information pushing apparatus comprising:
a third obtaining unit configured to obtain a set of candidate push information;
a selecting unit configured to select information from the candidate push information set by using a pre-trained neural network, wherein the neural network is trained using the method of claim 4;
a pushing unit configured to push information selected from the candidate pushed information set.
11. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
12. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN202210591391.3A 2022-05-27 2022-05-27 Neural network generation method and device Pending CN114912585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210591391.3A CN114912585A (en) 2022-05-27 2022-05-27 Neural network generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210591391.3A CN114912585A (en) 2022-05-27 2022-05-27 Neural network generation method and device

Publications (1)

Publication Number Publication Date
CN114912585A true CN114912585A (en) 2022-08-16

Family

ID=82769026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210591391.3A Pending CN114912585A (en) 2022-05-27 2022-05-27 Neural network generation method and device

Country Status (1)

Country Link
CN (1) CN114912585A (en)

Similar Documents

Publication Publication Date Title
KR102342604B1 (en) Method and apparatus for generating neural network
CN109460514B (en) Method and device for pushing information
CN109800732B (en) Method and device for generating cartoon head portrait generation model
CN109740018B (en) Method and device for generating video label model
CN110766142A (en) Model generation method and device
CN110555714A (en) method and apparatus for outputting information
CN109981787B (en) Method and device for displaying information
CN112650841A (en) Information processing method and device and electronic equipment
CN111368973B (en) Method and apparatus for training a super network
CN110688528A (en) Method, apparatus, electronic device, and medium for generating classification information of video
WO2020093724A1 (en) Method and device for generating information
CN111340221A (en) Method and device for sampling neural network structure
CN111738010B (en) Method and device for generating semantic matching model
CN111340220A (en) Method and apparatus for training a predictive model
CN111897950A (en) Method and apparatus for generating information
US20230367972A1 (en) Method and apparatus for processing model data, electronic device, and computer readable medium
WO2022001887A1 (en) Method and apparatus for training item coding model
CN112307243B (en) Method and apparatus for retrieving images
CN113255819B (en) Method and device for identifying information
CN111949860B (en) Method and apparatus for generating a relevance determination model
CN114912585A (en) Neural network generation method and device
CN109857838B (en) Method and apparatus for generating information
CN111353585A (en) Structure searching method and device of neural network model
CN113283115B (en) Image model generation method and device and electronic equipment
JP7504192B2 (en) Method and apparatus for searching images - Patents.com

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination