CN107609566A - Dual network deep learning method based on a small amount of personalized sample - Google Patents
- Publication number
- CN107609566A (application CN201610546022.7A)
- Authority
- CN
- China
- Prior art keywords
- network
- sample
- personalized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
The present invention relates to a dual-network deep learning method based on a small number of personalized samples, the dual network comprising a reconstruction network and a depth network. The method includes: collecting a small number of personalized samples, each comprising sample data and its label; training the reconstruction network with the sample data of the personalized samples, then inputting the labels of the personalized samples into the trained reconstruction network to generate new reconstructed data and labels; training the depth network on the new reconstructed data and labels from the reconstruction network; and inputting the data to be tested into the trained depth network, obtaining the sigmoid outputs, and selecting one result from those outputs by softmax as the personalized response.
Description
Technical field
The present invention relates to the field of communications and data processing, and in particular to a dual-network deep learning method based on a small number of personalized samples.
Background art
With the rapid development of network technology, the capacity, personalization and diversity of data are growing quickly, while the complexity that data-processing algorithms can handle is difficult to improve accordingly. Methods that rely on personal experience and manual operations to describe data, label data, select features, extract features and process data can hardly meet the demands of rapidly growing personalized data; how to process personalized data efficiently has become an urgent problem.
Breakthroughs in deep learning research point to a direction worth exploring for solving data-processing problems. Deep learning can automatically extract features from big data and achieve good processing results by training on massive samples. In fact, the rapid growth of big data and the research of deep learning are complementary: on the one hand, the rapid growth of big data requires methods for processing massive data efficiently; on the other hand, training a deep learning system requires massive sample data.
However, personalized data is mainly produced by users at the network edge. Current methods first collect these data and then analyze and process them on servers in the cloud, which produces a large amount of data transfer. Meanwhile, for an end user, the response obtained from the server is produced by a model trained on global data, so its accuracy for a personalized user is relatively low. If the deep learning network were instead built at the terminal, the data volume of a single personalized user would be far too small to train it. This is the awkward big-data antinomy present in current networks.
The Hadoop framework developed by Apache, the parameter-server framework developed by Baidu, the Mariana framework developed by Tencent, and the like are all attempts at distributed, large-scale deep learning systems: by splitting the model and partitioning the data, these systems solve the problem of processing large-scale data to a certain extent. However, these distributed systems process large models and data across multiple machines, between which a large amount of data must be communicated, so at present they can only be applied within a local area network. Moreover, an end user who wants a response must first transfer data to a central cluster and then have the generated response sent back by the cluster, so the response latency at the user terminal is very high.
In a wide area network there are problems such as limited network bandwidth, the big-data antinomy, and personalized demand. If the high communication cost of distributed big-data real-time processing systems and the scarcity of edge-user data could be solved, such systems could be applied effectively to wide area networks, increasing network utilization and improving user experience. Therefore, for a small number of personalized samples, it is necessary to provide a distributed data processing method better suited to wide area networks, to solve the problems that the transfer volume of training data and model parameters is large, and that processing accuracy is hard to improve when distributed data carries personalized information.
Summary of the invention
The object of the present invention is to overcome the defect of existing distributed data processing methods that, because edge users have few sample data, processing accuracy is difficult to improve, and thereby to provide a dual-network deep learning method based on a small number of personalized samples.
To achieve this object, the invention provides a dual-network deep learning method based on a small number of personalized samples, the dual network comprising a reconstruction network and a depth network, the method including:
Step 1) collect a small number of personalized samples; each personalized sample comprises sample data and its label;
Step 2) train the reconstruction network with the sample data of the personalized samples, then input the labels of the personalized samples into the trained reconstruction network, generating new reconstructed data and labels;
Step 3) train the depth network based on the new reconstructed data and labels obtained from the reconstruction network in step 2);
Step 4) input the data to be tested into the trained depth network, obtain the sigmoid outputs, and select one result from those outputs by softmax as the personalized response.
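As an illustrative sketch only (not the patented implementation), the four steps can be outlined in Python; here a trivial noise-jitter stands in for the reconstruction network, training of the depth network is omitted, and all function names are hypothetical.

```python
import random

def reconstruct_samples(personalized, J, noise=0.01):
    """Step 2 (sketch): expand each of the m personalized samples into J
    reconstructed samples by jittering the sample data; labels are kept."""
    reconstructed = []
    for x, y in personalized:
        for _ in range(J):
            reconstructed.append(([xi + random.gauss(0.0, noise) for xi in x], y))
    return reconstructed

# Step 1: m = 2 personalized samples, each a (data vector, label) pair.
personalized = [([0.1, 0.2], 0), ([0.9, 0.8], 1)]

# Step 2: with J = 100, m * J = 200 reconstructed samples become available
# for step 3 (training the depth network, not shown here).
data = reconstruct_samples(personalized, J=100)
assert len(data) == len(personalized) * 100
```

This makes the m versus M/J trade-off concrete: the fewer personalized samples m there are, the larger the expansion factor J must be to reach a training-set size comparable to M.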
In the above technical solution, the number of personalized samples is denoted m, with m ≈ M/J, where M is the total number of samples needed to train a depth network of the same accuracy with big data, and J is the number of samples reconstructed from a single personalized sample.
In the above technical solution, in step 2), the set of generated new reconstructed data and labels is denoted {x_k^r, y_k^r}, where x_k^r is any reconstructed datum, y_k^r is its corresponding label, 1 ≤ k ≤ m × J, m is the number of personalized samples, and J is the number of samples reconstructed from any one personalized sample. J and m are related as follows: the smaller m is, the larger J must be, so that there are enough reconstructed samples m × J to train the depth network.
In the above technical solution, the reconstruction network is any learning network with the ability to randomly reconstruct a large number of samples, including Bayesian networks and non-hierarchical neural networks.
In the above technical solution, the reconstruction network is a Bayesian program learning network with one-shot learning capability.
In the above technical solution, the depth network is any high-accuracy deep learning network, including a deep feedforward network, a deep belief network, a deep Boltzmann machine, or a deep convolutional network.
In the above technical solution, step 4) further comprises:
Step 4-1) compute the output of each neuron through the hidden layers of the depth network, according to:
y_j = f(Σ_i w_ij · x_i + b_j)
where y_j is the output of the j-th neuron, f is a given activation function, w_ij is the connection weight from the i-th neuron to the j-th neuron, x_i is the input from the i-th neuron, and b_j is a bias weight;
Step 4-2) compute the sigmoid outputs; the sigmoid function is:
sigmoid(y_j) = 1 / (1 + exp(−y_j))
The resulting sigmoid output is a P-dimensional vector (o_1, o_2, …, o_j, …, o_P); each element o_j = sigmoid(y_j) is a real number between 0 and 1, where P is the total number of personalized labels;
Step 4-3) select one result from the sigmoid outputs by softmax as the personalized response; the softmax formula is:
softmax(o_j) = exp(o_j) / Σ_i exp(o_i)
That is, an o_j is chosen from the sigmoid outputs (o_1, o_2, …, o_j, …, o_P) with probabilities (softmax(o_1), softmax(o_2), …, softmax(o_j), …, softmax(o_P)) as the personalized response.
The advantages of the invention are: by introducing an auxiliary reconstruction network that supports one-shot learning, the present invention converts a small number of personalized samples into massive data, thereby solving the problem that a small number of personalized samples cannot directly train a high-accuracy depth network. At the same time, the method effectively avoids data transfer between the center and the edge, and it extends existing deep learning systems so that they can be applied in wide area networks; it not only solves the problem that processing accuracy is hard to improve when distributed data carries personalized information, but also reduces the transmission cost of existing distributed big-data real-time processing systems.
Brief description of the drawings
Fig. 1 is a flow chart of the dual-network deep learning method based on a small number of personalized samples according to the present invention;
Fig. 2 is a structural schematic diagram of the dual network involved in the present invention.
Detailed description of the embodiments
The invention will be further described below with reference to the accompanying drawings.
As shown in Fig. 1 and Fig. 2, the dual-network deep learning method of the invention based on a small number of personalized samples includes:
Step 1) collect a small number of personalized samples as the input of the dual-network system. The total number of personalized samples is denoted m, with m ≥ 1; preferably m ≈ M/J, where M is the total number of samples needed to train a depth network of the same accuracy with big data, and J is the number of samples reconstructed from a single personalized sample. The set of sample data in the collected personalized samples may be denoted {x_i^p}, where 1 ≤ i ≤ m and the superscript p stands for "personalized".
In this step, each collected personalized sample includes, besides the sample data itself, the label corresponding to that sample. A personalized sample label is the correct output y_i^p for an arbitrary datum x_i^p; the set of personalized sample labels may be denoted {y_i^p}. The personalized samples collected in this step are thus the set of data-label pairs, denoted {x_i^p, y_i^p}.
Step 2) use the sample data of the collected personalized samples to train the reconstruction network in the dual network (denoted E_a^(i)), completing one-shot learning from a small number of samples, and input the labels of the personalized samples ({y_i^p}) into the trained reconstruction network to generate a large number of new reconstructed data and labels, whose set is denoted {x_k^r, y_k^r}. Here x_k^r is any reconstructed datum, y_k^r is its corresponding label, 1 ≤ k ≤ m × J, m is the number of personalized samples, and J is the number of samples reconstructed from any one personalized sample. J and m are related as follows: the smaller m is, the larger J must be, so that there are enough reconstructed samples m × J to train the depth network.
The reconstruction network E_a^(i) in the dual network can be any learning network with the ability to randomly reconstruct a large number of samples, including Bayesian networks and non-hierarchical neural networks; preferably, E_a^(i) is a Bayesian program learning network with one-shot learning capability. Such a Bayesian program learning network first splits the input data into small parts and tuples according to prior knowledge obtained from a background knowledge base, then recombines these small parts and tuples to generate new reconstructed data, and finally chooses, from the multiple reconstructed data generated, the J data most similar to the original data as the output of the module; J is a positive integer representing the number of samples reconstructed from a single personalized sample.
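A minimal sketch of the split-recombine-rank behaviour described above, using character strings as stand-in data and a crude positional similarity; the real Bayesian program learning network, its background knowledge base, and its part decomposition are far richer, so every detail here is an assumption for illustration.

```python
import random

def split_into_parts(datum, part_len=2):
    # Split the input datum into small parts (fixed-length chunks here).
    return [datum[i:i + part_len] for i in range(0, len(datum), part_len)]

def similarity(a, b):
    # Stand-in similarity: fraction of positions where the data agree.
    return sum(c1 == c2 for c1, c2 in zip(a, b)) / max(len(a), len(b))

def reconstruct(datum, J, candidates=50, seed=0):
    """Recombine shuffled parts into candidate reconstructed data, then
    keep the J candidates most similar to the original datum."""
    rng = random.Random(seed)
    parts = split_into_parts(datum)
    pool = []
    for _ in range(candidates):
        shuffled = parts[:]
        rng.shuffle(shuffled)
        pool.append("".join(shuffled))
    pool.sort(key=lambda c: similarity(c, datum), reverse=True)
    return pool[:J]

outputs = reconstruct("abcdefgh", J=5)
assert len(outputs) == 5
```

Each output reuses exactly the parts of the original datum, mirroring the recombination idea: novelty comes from rearrangement, and the similarity ranking keeps the J reconstructions closest to the source.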
Step 3) train the depth network in the dual network (denoted E_d^(i)) with existing techniques, based on the set {x_k^r, y_k^r} of new reconstructed data and labels from the reconstruction network.
The depth network E_d^(i) can be any high-accuracy deep learning network, mainly including deep feedforward networks, deep belief networks, deep Boltzmann machines, and deep convolutional networks.
Step 4) input the personalized samples reserved for testing, or new data from outside, into the trained depth network E_d^(i), obtain the sigmoid outputs, and then select one result from those outputs by softmax as the personalized response.
The sigmoid function is:
sigmoid(y_j) = 1 / (1 + exp(−y_j))
where y_j is the output of the j-th neuron, computed through the hidden layers of the depth network as:
y_j = f(Σ_i w_ij · x_i + b_j)
where f is a given activation function, w_ij is the connection weight from the i-th neuron to the j-th neuron, x_i is the input from the i-th neuron, and b_j is a bias weight.
The sigmoid output is a P-dimensional vector (o_1, o_2, …, o_j, …, o_P); each element o_j = sigmoid(y_j) is a real number between 0 and 1, where P is the total number of personalized labels.
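The hidden-layer and sigmoid computations above translate directly into code; the tanh chosen for the activation f, and the toy weights, are illustrative assumptions only.

```python
import math

def neuron_output(x, w_j, b_j, f=math.tanh):
    # y_j = f(sum_i w_ij * x_i + b_j)
    return f(sum(w_ij * x_i for w_ij, x_i in zip(w_j, x)) + b_j)

def sigmoid(y_j):
    # sigmoid(y_j) = 1 / (1 + exp(-y_j)), always strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-y_j))

x = [0.5, -0.2]                # inputs x_i
w = [[0.3, 0.7], [-0.1, 0.4]]  # w[j][i]: weight from neuron i to neuron j
b = [0.1, 0.0]                 # bias weights b_j

# P = 2 sigmoid outputs (o_1, o_2), each in (0, 1).
o = [sigmoid(neuron_output(x, w[j], b[j])) for j in range(2)]
assert all(0.0 < oj < 1.0 for oj in o)
```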
The softmax formula is:
softmax(o_j) = exp(o_j) / Σ_i exp(o_i)
That is, an o_j is chosen from (o_1, o_2, …, o_j, …, o_P) with probabilities (softmax(o_1), softmax(o_2), …, softmax(o_j), …, softmax(o_P)) as the personalized response.
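The softmax selection can be sketched as follows: the sigmoid outputs o are normalized into probabilities and one index is drawn accordingly. Using random.choices for the probabilistic draw is an implementation assumption, one way of realizing the choice described above.

```python
import math
import random

def softmax(o):
    # softmax(o_j) = exp(o_j) / sum_i exp(o_i)
    exps = [math.exp(oj) for oj in o]
    total = sum(exps)
    return [e / total for e in exps]

def personalized_response(o, rng=random):
    # Draw one index j with probability softmax(o_j); the chosen o_j
    # serves as the personalized response.
    probs = softmax(o)
    return rng.choices(range(len(o)), weights=probs, k=1)[0]

o = [0.9, 0.2, 0.6]  # sigmoid outputs, P = 3
probs = softmax(o)
assert abs(sum(probs) - 1.0) < 1e-12
assert personalized_response(o) in (0, 1, 2)
```

A deterministic variant would simply return the index of the largest softmax probability; the probabilistic draw matches the "with probabilities" wording above.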
Finally, it should be noted that the above embodiments are merely illustrative of the technical solution of the present invention and are not restrictive. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art will understand that modifications or equivalent substitutions of the technical solution of the present invention, made without departing from its spirit and scope, shall all be covered by the claims of the present invention.
Claims (7)
1. A dual-network deep learning method based on a small number of personalized samples, the dual network comprising a reconstruction network and a depth network, the method comprising:
Step 1) collecting a small number of personalized samples, each comprising sample data and its label;
Step 2) training the reconstruction network with the sample data of the personalized samples, then inputting the labels of the personalized samples into the trained reconstruction network, generating new reconstructed data and labels;
Step 3) training the depth network based on the new reconstructed data and labels obtained from the reconstruction network in step 2);
Step 4) inputting the data to be tested into the trained depth network, obtaining the sigmoid outputs, and selecting one result from those outputs by softmax as the personalized response.
2. The dual-network deep learning method based on a small number of personalized samples according to claim 1, wherein the number of personalized samples is denoted m, with m ≈ M/J, where M is the total number of samples when training a depth network of the same accuracy with big data, and J is the number of samples reconstructed from a single personalized sample.
3. The dual-network deep learning method based on a small number of personalized samples according to claim 1, wherein, in step 2), the set of generated new reconstructed data and labels is denoted {x_k^r, y_k^r}; wherein x_k^r is any reconstructed datum, y_k^r is its corresponding label, 1 ≤ k ≤ m × J, m is the number of personalized samples, and J is the number of samples reconstructed from any one personalized sample; J and m are related as follows: the smaller m is, the larger J must be, so that there are enough reconstructed samples m × J to train the depth network.
4. The dual-network deep learning method based on a small number of personalized samples according to claim 1, wherein the reconstruction network is any learning network with the ability to randomly reconstruct a large number of samples, including Bayesian networks and non-hierarchical neural networks.
5. The dual-network deep learning method based on a small number of personalized samples according to claim 4, wherein the reconstruction network is a Bayesian program learning network with one-shot learning capability.
6. The dual-network deep learning method based on a small number of personalized samples according to claim 1, wherein the depth network is any high-accuracy deep learning network, including a deep feedforward network, a deep belief network, a deep Boltzmann machine, or a deep convolutional network.
7. The dual-network deep learning method based on a small number of personalized samples according to claim 1, wherein step 4) further comprises:
Step 4-1) computing the output of each neuron through the hidden layers of the depth network, according to:
y_j = f(Σ_i w_ij · x_i + b_j)
wherein y_j is the output of the j-th neuron, f is a given activation function, w_ij is the connection weight from the i-th neuron to the j-th neuron, x_i is the input from the i-th neuron, and b_j is a bias weight;
Step 4-2) computing the sigmoid outputs; wherein the sigmoid function is:
sigmoid(y_j) = 1 / (1 + exp(−y_j));
The resulting sigmoid output is a P-dimensional vector (o_1, o_2, …, o_j, …, o_P); each element o_j = sigmoid(y_j) is a real number between 0 and 1, where P is the total number of personalized labels;
Step 4-3) selecting one result from the sigmoid outputs by softmax as the personalized response; wherein the softmax formula is:
softmax(o_j) = exp(o_j) / Σ_i exp(o_i)
indicating that an o_j is chosen from the sigmoid outputs (o_1, o_2, …, o_j, …, o_P) with probabilities (softmax(o_1), softmax(o_2), …, softmax(o_j), …, softmax(o_P)) as the personalized response.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610546022.7A CN107609566A (en) | 2016-07-12 | 2016-07-12 | Dual network deep learning method based on a small amount of personalized sample |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107609566A true CN107609566A (en) | 2018-01-19 |
Family
ID=61055027
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610546022.7A Pending CN107609566A (en) | 2016-07-12 | 2016-07-12 | Dual network deep learning method based on a small amount of personalized sample |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107609566A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110223785A (en) * | 2019-05-28 | 2019-09-10 | 北京师范大学 | A kind of infectious disease transmission network reconstruction method based on deep learning |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20180119 |