CN108304359B - Method for constructing an unsupervised learning unified feature extractor - Google Patents

Method for constructing an unsupervised learning unified feature extractor

Info

Publication number
CN108304359B
CN108304359B (application CN201810117102.XA)
Authority
CN
China
Prior art keywords
training
news
data
encoder
layer
Prior art date
Legal status
Active
Application number
CN201810117102.XA
Other languages
Chinese (zh)
Other versions
CN108304359A (en)
Inventor
杨楠
曹三省
Current Assignee
Communication University of China
Original Assignee
Communication University of China
Priority date
Filing date
Publication date
Application filed by Communication University of China
Priority to CN201810117102.XA
Publication of CN108304359A
Application granted
Publication of CN108304359B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/12 Use of codes for handling textual entities
    • G06F 40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q 30/00 Commerce
    • G06Q 30/06 Buying, selling or leasing transactions
    • G06Q 30/0601 Electronic shopping [e-shopping]
    • G06Q 30/0631 Item recommendations


Abstract

The application provides a method for constructing an unsupervised learning unified feature extractor, characterized in that: actual news text data are obtained from the server side and a news feature training data set is generated; the data in the news feature training data set are preprocessed and vectorized to obtain a news feature training vector set; the news data set is classified according to user access data to form a user feature training data set; a stacked asymmetric contractive denoising autoencoder with multiple hidden layers is constructed, and the deep autoencoder is trained with a purpose-built objective function; after the deep autoencoder completes training, the decoder part is deleted and a binarization generation layer is added, completing the construction of the unsupervised learning unified feature extractor. The unsupervised learning unified feature extractor provided by the application can realize the unification of news features and user features, the unification of content-based recommendation and collaborative filtering recommendation, and improve the efficiency of real-time recommendation.

Description

Method for constructing an unsupervised learning unified feature extractor
Technical Field
The invention belongs to the field of artificial intelligence, and particularly relates to a method for constructing an unsupervised learning unified feature extractor.
Background
Current recommendation systems or recommendation engines are generally classified into content-based recommendation, collaborative filtering recommendation, hybrid recommendation and other types. They are information tools as important to today's society as search engines and are widely applied in fields such as e-commerce and media recommendation. The currently popular collaborative filtering methods are mainly based on commonality: the similarity between users and the similarity between items are calculated from some users' ratings of commodities or media contents (collectively called "items"); the ratings of other users with similar interests are then used to infer a user's rating of a new item, or the rating is predicted from the similarity between the new item and items the user has shown interest in. The approach is therefore also called rating prediction, but it suffers from insufficient personalization and is hard to apply when rating data are sparse.
Content-based recommendation mainly models a user's preferences and the attributes of items and recommends by matching the two. It is strongly personalized, but modeling and matching user preferences and item attributes is difficult. Past user preference modeling required direct features such as demographics and was prone to invading personal privacy.
Deep learning is a machine learning method that has emerged in recent years and can be divided into supervised learning and unsupervised learning. The autoencoder (AE) is at the leading edge of current unsupervised learning research, but most current deep autoencoding systems have drawbacks, such as susceptibility to overfitting; most do not realize unsupervised learning in the complete sense, which greatly restricts the capability of the deep autoencoder.
With technologies such as artificial intelligence, deep learning and unsupervised learning developing rapidly, new techniques and methods need to be researched to renew the technical basis of recommendation systems, realize effective hybrid recommendation, and greatly improve online recommendation efficiency.
Disclosure of Invention
Aiming at problems in applications such as current converged-media news recommendation (insufficient personalization, difficulty of user feature extraction, the need to unify different methods into an effective hybrid recommendation method, privacy violations during user feature extraction, and real-time recommendation efficiency in need of improvement), and drawing on current novel artificial intelligence technology, the application discloses a method for constructing an Unsupervised Learning Unified Feature Extractor (ULUFE) that extracts a Unified Representation Based on Content (URBC). The construction method comprises the following steps:
S1, acquiring actual news text data and user access data from a server, and generating a news feature training data set after sorting and randomizing;
S2, preprocessing the data in the news feature training data set with a current Chinese word segmentation tool to obtain a preprocessed news feature training data set;
S3, obtaining a news feature training vector set from the preprocessed news feature training data set through the TF-IDF method;
S4, classifying the news feature training vector set according to the user access data to form a user feature training data set;
S5, constructing a stacked asymmetric contractive denoising autoencoder with a plurality of hidden layers, using $J_{SA\text{-}CDAE}$ as the objective function:

$$J_{SA\text{-}CDAE} = \min_{\theta} \sum_{x_i \in t} \left[ L_{MC}\big(x_i,\, g_{\theta}(f_{\theta}(x_i))\big) + \lambda \,\| J(x_i) \|_F^2 \right]$$

wherein

$$L_{MC}\big(x_i,\, \hat{x}_i\big) = -\sum_i k_{\sigma}\big(\hat{x}_i - x_i\big)$$

where $k_{\sigma}$ is a Gaussian kernel with standard deviation $\sigma = 1.0$:

$$k_{\sigma}(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{z^2}{2\sigma^2}\right)$$

where $x$ denotes the input of the encoder, $f_{\theta}(\cdot)$ the encoder output and $g_{\theta}(\cdot)$ the decoder output; $L_{MC}(\cdot)$ is the cost function for a single input, $\lambda$ is the regularization parameter of the contractive autoencoder, $\|\cdot\|_F$ is the Frobenius norm, $J(x)$ is the encoder Jacobian matrix, $\theta$ is the parameter set of the deep autoencoder, $x_i$ is the input of the encoder in one training pass, $\hat{x}_i$ is the output reconstructed by the decoder, $t$ is the training set, and $z$ is the argument of the Gaussian kernel;
S6, training the deep autoencoder, the training steps being as follows:
S61, taking the news feature training vector set as the training data of the deep autoencoder;
S62, adding Gaussian white noise to the training data to generate noisy input data;
S63, taking the noisy input data as the input of the deep autoencoder; during training, a mini-batch gradient descent method is adopted, and unsupervised layer-by-layer pre-training is performed first to obtain the initial parameters of each hidden layer and the output data of the output layer;
S64, comparing the input training data with the output data in the objective function to realize back-propagation of the gradient and adjust the initial parameters of each hidden layer;
S65, obtaining the parameter set of the deep autoencoder after the training is finished;
S7, deleting the decoder part of the deep autoencoder and adding a binarization generation layer after the output of the last hidden layer, completing the construction of the unsupervised learning unified feature extractor.
Preferably, the step S1 of acquiring actual news text data and user access data from the server and generating the news feature training data set after sorting and randomizing specifically includes the following steps:
S11, collecting news data and user access data within a certain time period on the server;
S12, removing pictures and videos from the news data, uniformly encoding the text as UTF-8, and setting a sequence number for each piece of news to form a news data set;
S13, randomizing and reordering the news in the news data set according to the sequence numbers, then using the news, in a certain proportion, as the news feature training data sets of the layer-by-layer unsupervised pre-training stage and the global training stage respectively.
Preferably, the stacked asymmetric contractive denoising autoencoder with multiple hidden layers constructed in step S5 comprises 2 hidden layers.
Preferably, the coding function of the first hidden layer is $h_1(x_i) = S(w_1 x_i + b_1)$ and its pre-training decoding function is $\hat{g}_1(h_1) = S(w_1' h_1 + b_1')$; the coding function of the second hidden layer is $h_2(h_1) = S(w_2 h_1 + b_2)$ and its pre-training decoding function is $\hat{g}_2(h_2) = S(w_2' h_2 + b_2')$;
the global-training decoding function from the second hidden layer to the output layer is $g_o(h_2) = S(w_o h_2 + b_o)$;
the initial parameters of each layer are random numbers in the open interval $(0, 1)$, and the nonlinear activation function $S(\cdot)$ is uniformly the Sigmoid function $S(z) = 1/(1 + e^{-z})$, where $e$ is Euler's number, $h$ denotes the coding function of a hidden layer, $g$ the decoding function, $b$ the bias, $x$ the input of the encoder, and $w_1$, $w_2$ the weight parameters of the first and second hidden layers respectively.
Preferably, the dimension of the binarization generation layer in step S7 is the same as that of the last hidden layer of the deep autoencoder, with a one-to-one connection to each neuron of the last hidden layer; the binarization generation layer is provided with a weight regulator that realizes threshold adjustment according to the output of the last hidden layer, the threshold T in the weight regulator being selected so that the output of one complete training pass is divided into two classes whose between-class variance is maximal.
Preferably, the method further includes step S8: inputting the user feature training vector set into the unsupervised learning unified feature extractor to obtain a user preference model, and generating a unified user neighbor table through similarity comparison of the user preference models of all users.
The advantages of the application are:
1. Aiming at the problems that the manually labeled data required by supervised learning are hard to obtain in real time for rapid network-media recommendation, and that conventional deep autoencoders still need supervised fine-tuning after unsupervised layer-by-layer pre-training, the deep autoencoder of the application realizes whole-process unsupervised learning;
2. A deep structure replaces the single-hidden-layer structure, further improving the ability to learn high-order latent explanatory factors of the content;
3. The encoder and decoder are asymmetric and the hidden-layer dimension is lower than the input-layer dimension, so the nonlinear manifold of the data can be learned and dimensionality reduction is achieved while features are extracted, which is superior to linear manifold methods such as PCA; the asymmetry also serves as a remedy for the autoencoder's tendency to overfit;
4. The features output by the autoencoder lend themselves to binarization; after the binarization generation layer is added, binary features can be generated, so that rapid similarity comparison of users and news in converged media can be handled in recommendation by cosine similarity, Hamming distance, hashing and other methods, with a pronounced effect on rapid recommendation of short news in mobile media;
5. In application, the features extracted from the news data (the unified representation based on content) are used as the features of both the news to be recommended and the user, realizing the unification of the two kinds of features and the unification of content-based recommendation and collaborative filtering recommendation; the recommendation method is thus innovated while user privacy is effectively protected and recommendation efficiency is improved.
Drawings
FIG. 1 is a schematic design of an SA-CDAE according to the present invention;
FIG. 2 is a schematic diagram of the training of the present invention;
FIG. 3 is an unsupervised learning feature extractor of the present invention;
FIG. 4 is a schematic diagram of an online recommendation of the present invention;
FIG. 5 is a graph comparing accuracy rates of the present invention;
FIG. 6 is a chart comparing recall rates of the present invention.
Detailed Description
The specific implementation and detailed steps of the method for constructing the unsupervised learning unified feature extractor of the present invention are further described below:
the method comprises the following steps: data acquisition and preparation
The invention is mainly aimed at website text news and mobile-phone news-client text news in current converged media. Both the news text data and the user access data reside at the server side; in this step the "news feature training data set" is generated, the specific process being as follows:
1) collecting news data and user access data within a certain time period on the server, the news data comprising the historical news on the server and the user access data comprising the list of news IDs read by each user within the period;
2) removing irrelevant content such as pictures and videos from the news data, uniformly encoding the text as UTF-8, and setting a sequence number for each piece of news to form a news data set;
3) randomizing and reordering the news in the news data set according to the sequence numbers, then using the news, in a certain proportion, as the "news feature training data sets" of the layer-by-layer unsupervised pre-training stage and the global training stage respectively (see the sketch below).
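A minimal Python sketch of this randomization and split; the 80/20 proportion between the pre-training and global-training subsets is an assumption, since the text only specifies "a certain proportion":

```python
import random

# hypothetical news data set: (sequence number, UTF-8 news text) pairs
news_dataset = [(i, f"news text {i}") for i in range(1000)]

random.shuffle(news_dataset)              # randomize and reorder
split = int(0.8 * len(news_dataset))      # assumed 80/20 proportion
pretrain_set = news_dataset[:split]       # layer-by-layer unsupervised pre-training stage
global_set = news_dataset[split:]         # global training stage
```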
Step two: text data preprocessing
Chinese word segmentation, stop-word removal and other processing are performed on the data in the news feature training data set with a current Chinese word segmentation tool, giving the preprocessed news feature training data set; a minimal sketch follows.
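A sketch of this preprocessing, assuming the jieba segmenter as the "current Chinese word segmentation tool" and a tiny illustrative stop-word list (a real list would be much larger):

```python
import jieba

STOPWORDS = {"的", "了", "是"}  # illustrative stop-word list, an assumption of this sketch

def preprocess(text):
    # segment Chinese text, then drop stop words and whitespace-only tokens
    return [w for w in jieba.lcut(text) if w.strip() and w not in STOPWORDS]

corpus = [preprocess("人工智能推动了媒体融合的发展")]
```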
Step three: news text data vectorization
The preprocessed news feature training data set is vectorized with the TF-IDF method to obtain the news feature training vector set corresponding to the news feature training data set; TF-IDF is the abbreviation of "term frequency-inverse document frequency".
TF means term frequency; for a term $w$ in a document $d$ it is calculated as:

$$TF(w, d) = \frac{n_{w,d}}{\sum_{k} n_{k,d}}$$

where $n_{w,d}$ is the number of occurrences of $w$ in $d$ and the denominator is the total number of terms in $d$.
IDF means inverse document frequency; over a corpus $D$ it is calculated as:

$$IDF(w) = \log \frac{|D|}{1 + |\{d \in D : w \in d\}|}$$

where the denominator counts the documents containing $w$ (conventionally smoothed by adding 1).
On the premise of keeping the relative positions of terms in the news feature training data set, the initial feature vectors of the data in the news feature training data set are obtained by the TF-IDF method to form the news feature training vector set, the TF-IDF value being calculated as:
TF-IDF = TF * IDF
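A minimal sketch of this vectorization using scikit-learn's TfidfVectorizer; the whitespace-joined hand-off from the segmented corpus and the custom token pattern are assumptions of the illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# each document is the whitespace-joined token list produced in step two
segmented = ["人工智能 推动 媒体 融合 发展", "媒体 新闻 推荐 发展"]

vectorizer = TfidfVectorizer(token_pattern=r"(?u)\S+")  # keep pre-segmented tokens intact
news_vectors = vectorizer.fit_transform(segmented)      # sparse (n_docs, n_terms) TF*IDF matrix
```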
step four: obtaining a user feature training dataset
The news feature training vector set is classified according to the user access data to obtain the user feature training vector set; a grouping sketch follows.
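A minimal sketch of this grouping; the access-log format (user mapped to row indices of read news) is an assumption of the illustration:

```python
import numpy as np

news_vectors = np.random.rand(4, 50)                  # stand-in for the TF-IDF rows of step three
access_log = {"user_A1": [0, 2], "user_B7": [1, 3]}   # hypothetical user -> indices of read news

# per-user feature training set: the vectors of the news each user has read
user_feature_sets = {user: news_vectors[rows] for user, rows in access_log.items()}
```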
step five: constructing a stacked asymmetric noise reduction contraction self-encoder
The core component of the unsupervised learning unified feature extractor of the application is a specially designed deep autoencoder, whose role in the invention is mainly embodied in two aspects: feature extraction and dimensionality reduction. Around the application target of converged-media intelligent recommendation, the invention combines the advantages of the contractive autoencoder and the denoising autoencoder and designs a stacked (deep) asymmetric contractive denoising autoencoder (SA-CDAE) with 2 or 3 hidden layers, as shown in FIG. 1. Structurally, multiple hidden layers are adopted to improve on the feature extraction capability of a single hidden layer; the input layer and output layer have the same dimension, the hidden-layer dimension is smaller than the input-layer dimension and decreases proportionally layer by layer, and the encoding and decoding structure is asymmetric, which improves resistance to overfitting. The initial news training vector set obtained after the preceding preparation and preprocessing is on the whole independent and identically distributed, but contains a certain amount of disturbance whose specific distribution is unknown. Denote the vector set as $D = \{x_1, x_2, \ldots, x_n\}$, $x_i \in \mathbb{R}^d$, $n \in \mathbb{N}$; then:
the coding function of the first hidden layer is $h_1(x_i) = S(w_1 x_i + b_1)$,
the pre-training decoding function of the first hidden layer is $\hat{g}_1(h_1) = S(w_1' h_1 + b_1')$,
the coding function of the second hidden layer is $h_2(h_1) = S(w_2 h_1 + b_2)$,
the pre-training decoding function of the second hidden layer is $\hat{g}_2(h_2) = S(w_2' h_2 + b_2')$,
the global-training decoding function from the second hidden layer to the output layer is $g_o(h_2) = S(w_o h_2 + b_o)$;
the initial parameters of each layer are random numbers in the open interval $(0, 1)$, and the nonlinear activation function $S(\cdot)$ is uniformly the Sigmoid function $S(z) = 1/(1 + e^{-z})$,
where $D$ denotes the initial news training vector set, $\mathbb{R}$ the set of real numbers, $\mathbb{N}$ the set of natural numbers, $h$ the coding function of a hidden layer, $g$ the decoding function, $b$ the bias, $e$ Euler's number, $x$ the input of the encoder, $w_1$, $w_2$ the weight parameters of the first and second hidden layers respectively, and $x_i$ the input of the encoder in one training pass.
The principle of the autoencoder is to make the input of the encoder reappear at the output of the decoder by training an encoding-decoding mechanism; the encoder part is also called the hidden layer and the decoder part the output layer. Reconstructing the input perfectly at the output is neither easy nor useful. Instead, by designing a special structure, adding suitable constraints to the copying, and using special cost functions and training methods, only approximate replication is achieved; the model is forced to copy the input selectively according to the weights, so that useful distributed features of the data are constructed in the encoder. Autoencoders have become a research frontier of generative models in recent years. The prototype autoencoder exhibits good feature extraction capability but easily overfits in use and loses generalization over real data; derivative autoencoders improving and optimizing the prototype were subsequently developed.
The deep autoencoder of the present invention is designed to consider both adding noise and reducing noise (disturbances). Adding noise means, following the idea of denoising autoencoders, adding Gaussian white noise to the input X so that the decoder is forced to remove the interference of the noise when producing the output, improving the system's resistance to overfitting; adding Gaussian white noise to the input during training confers the denoising-autoencoder property and further reduces the overfitting risk. The parameter set θ of the neural network is trained by back-propagation and stochastic gradient descent (SGD).
Reducing noise (disturbances) refers to improving the system's resistance to non-Gaussian noise and disturbances during training. To further reduce the influence of outliers in the news feature data set and the user feature data set, and to lay a basis for the binarization generation adopted later in the scheme, characteristics of the contractive autoencoder are partially adopted in the design. The contractive autoencoder adds an analytic contraction penalty term to the cost function of the prototype autoencoder to reduce the degrees of freedom of the feature representation, so that hidden-layer neurons approach saturation and output data are confined to a certain region of the parameter space. The penalty term is in fact the Frobenius norm of the encoder's Jacobian matrix; it reduces the influence of outliers on the encoder, suppresses disturbances of training samples (on the low-dimensional manifold surface) in all directions, and helps the encoder learn useful data features. Moreover, the distributed representation learned by the contractive autoencoder has the "saturation" property: most hidden-layer units take values close to the two ends (0 or 1) and have partial derivatives with respect to the input close to 0.
In ordinary autoencoder training, the mean squared error (MSE) is often used as the cost function and shows a certain tolerance to Gaussian noise; in this example, however, considering the existence of non-Gaussian disturbances such as occasional reads outside a user's preference, the present embodiment uses maximum correntropy (MC) as the cost function in order to improve robustness:

$$L_{MC}\big(x_i,\, \hat{x}_i\big) = -\sum_i k_{\sigma}\big(\hat{x}_i - x_i\big)$$

where $k_{\sigma}$ is a Gaussian kernel with standard deviation $\sigma = 1.0$:

$$k_{\sigma}(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{z^2}{2\sigma^2}\right)$$

The overall objective function of the deep autoencoder in the invention is:

$$J_{SA\text{-}CDAE} = \min_{\theta} \sum_{x_i \in t} \left[ L_{MC}\big(x_i,\, g_{\theta}(f_{\theta}(x_i))\big) + \lambda \,\| J(x_i) \|_F^2 \right]$$

In the above formulas, $f_{\theta}(\cdot)$ denotes the encoder output and $g_{\theta}(\cdot)$ the decoder output; $L_{MC}(\cdot)$ is the cost function for a single input, $\lambda$ is the regularization parameter of the contractive autoencoder, $\|\cdot\|_F$ is the Frobenius norm, $J(x)$ is the encoder Jacobian matrix, $\theta$ is the parameter set of the deep autoencoder, $x_i$ is the input of the encoder in one training pass, $\hat{x}_i$ is the output reconstructed by the decoder, $t$ is the training set, and $z$ is the argument of the Gaussian kernel.
Step six: training depth autoencoder
Training a neural network means taking cleaned and sorted data as input and letting the parameters of the network's objective function converge gradually through the two links of forward propagation and back-propagation, thereby learning high-order statistical features. As shown in FIG. 2, the deep autoencoder is trained offline; the main training steps are as follows:
1) the news feature training vector set is taken as the training data of the deep autoencoder, denoted X; the training data of the application are thus news data themselves, needing neither manual labeling nor a third-party corpus;
2) Gaussian white noise is added to the training data X to generate noisy input data X₁;
3) X₁ is taken as the input of the deep autoencoder; during training, a mini-batch gradient descent method is adopted, and unsupervised layer-by-layer pre-training is performed first to obtain the initial parameters of each hidden layer and the output X̂ of the output layer;
4) X and X̂ are compared in the objective function to realize global back-propagation of the gradient and adjust the initial parameters of all hidden layers;
5) after the training is finished, the parameter set of the deep autoencoder is obtained and used in the next step to construct the unsupervised learning unified feature extractor. A sketch of this schedule follows the next paragraph.
The deep autoencoder of the application takes a brand-new approach to structure, cost function and training mode; it can reduce dimensionality while extracting features, can learn nonlinear manifolds, and in dimensionality reduction is greatly superior to linear manifold methods such as PCA (principal component analysis). In addition, exploiting the parallel nature of neural networks, GPU parallel computing is used to accelerate the main training steps of the deep autoencoder, greatly improving its training efficiency and hence the practical efficiency of the recommendation system.
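A hedged sketch of the schedule in steps 1)-5), reusing SACDAE and correntropy_loss from the previous sketch; the noise level, epoch count and learning rate are assumed values:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def add_noise(x, std=0.1):
    return x + std * torch.randn_like(x)         # step 2): Gaussian white noise

def pretrain_layer(enc, loader, epochs=5, lr=0.1):
    """Step 3): pre-train one hidden layer as a stand-alone denoising
    autoencoder; the throwaway decoder is discarded afterwards."""
    act = torch.nn.Sigmoid()
    dec = torch.nn.Linear(enc.out_features, enc.in_features)
    opt = torch.optim.SGD(list(enc.parameters()) + list(dec.parameters()), lr=lr)
    for _ in range(epochs):
        for (x,) in loader:
            loss = correntropy_loss(x, act(dec(act(enc(add_noise(x))))))
            opt.zero_grad(); loss.backward(); opt.step()

model = SACDAE()                                 # from the previous sketch
X = torch.rand(1024, 2000)                       # news feature training vectors
pretrain_layer(model.enc1, DataLoader(TensorDataset(X), batch_size=32, shuffle=True))
with torch.no_grad():                            # layer-2 pre-training inputs are the layer-1 codes
    H1 = torch.sigmoid(model.enc1(X))
pretrain_layer(model.enc2, DataLoader(TensorDataset(H1), batch_size=32, shuffle=True))
# Steps 4)-5): global fine-tuning then back-propagates the full J_SA-CDAE
# objective through all layers, exactly as in the single step shown earlier.
```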
Step seven: constructing an unsupervised learning unified feature extractor
The trained output of the deep autoencoder is easy to binarize. For this reason, after the deep autoencoder finishes training, the decoder part is deleted and a binarization generation layer is added after the output of the last hidden layer to perform the binarization; as shown in FIG. 3, this completes the construction of the unsupervised learning unified feature extractor.
In this embodiment, about 70% of the outputs of the deep autoencoder are close to 0 or 1 and easy to binarize, but how the remaining 30% are handled directly affects the overall binarization effect and the accuracy of subsequent similarity comparison. Structurally, therefore, the binarization generation layer is designed with the same dimension as the last hidden layer of the deep autoencoder and connected one-to-one to each of its neurons. Internally, instead of a common fixed threshold, a weight regulator realizes threshold adjustment according to the actual distribution of the last hidden layer's output; the selection principle for the threshold T in the weight regulator is that the output of one complete training pass can be divided into two classes whose between-class variance is maximal.
After one complete training pass, let K be the total output set of all hidden-layer units, containing N distinct values. Sorting K from small to large gives the sequence $(k_1, k_2, \ldots, k_N)$, each $k_i$ occurring $n_i$ times, $i \in [1, N]$, so that $k_i$ occurs with probability $p_i = n_i / \sum_j n_j$. A candidate threshold divides the sequence into two groups $K_1$ and $K_2$ of sizes $t$ and $N - t$, whose probabilities of occurrence in the whole are $\varepsilon_1 = \sum_{i \le t} p_i$ and $\varepsilon_2 = 1 - \varepsilon_1$, and whose means are $\beta_1 = \frac{1}{\varepsilon_1}\sum_{i \le t} p_i k_i$ and $\beta_2 = \frac{1}{\varepsilon_2}\sum_{i > t} p_i k_i$. The mean of the data set K is $\beta = \varepsilon_1 \beta_1 + \varepsilon_2 \beta_2$, and the between-class variance of the two groups is defined as $\delta(t) = \varepsilon_1(\beta_1 - \beta)^2 + \varepsilon_2(\beta_2 - \beta)^2$. Taking $T = \arg\max_t \delta(t)$, i.e. the value of K at the position where $\delta(t)$ is maximal, as the threshold, values less than or equal to T are set to 0 and the rest to 1, realizing the binarization of the hidden-layer output.
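A minimal NumPy sketch of this between-class-variance (Otsu-style) threshold search; the 256-point candidate grid and the random stand-in for pooled activations are assumptions of the illustration:

```python
import numpy as np

def best_threshold(outputs, candidates=256):
    """Pick T maximizing the between-class variance delta(t)."""
    vals = np.sort(outputs.ravel())
    best_t, best_var = vals[0], -1.0
    for t in np.linspace(vals[0], vals[-1], candidates):
        g1, g2 = vals[vals <= t], vals[vals > t]
        if g1.size == 0 or g2.size == 0:
            continue
        e1, e2 = g1.size / vals.size, g2.size / vals.size   # group probabilities
        b1, b2 = g1.mean(), g2.mean()                       # group means
        b = e1 * b1 + e2 * b2                               # overall mean
        var = e1 * (b1 - b) ** 2 + e2 * (b2 - b) ** 2       # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

outputs = np.random.rand(10000)          # stand-in for pooled hidden-layer activations
T = best_threshold(outputs)
binary = (outputs > T).astype(np.uint8)  # values <= T -> 0, the rest -> 1
```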
Step eight: obtaining a user preference model and a user neighbor table
After the construction of the unsupervised learning unified feature extractor is completed, the user feature training vector set is input into it to obtain each user's preference model, and a unified user neighbor table is generated through similarity comparison of the user preference models of all users.
FIG. 4 shows an example of personalized recommendation using the unsupervised learning unified feature extractor. All news texts to be recommended are preprocessed and vectorized, then input into the unsupervised learning unified feature extractor to obtain the feature vectors of the news to be recommended, expressed as unified representations based on content. The similarity between each news feature vector and a user's preference model is compared to generate a content-based recommendation list; the user neighbor table is used to generate a collaborative filtering recommendation list from the news read by users similar to user A1; after weighted mixing, the Top-N recommendation list of the hybrid recommendation is obtained. A similarity-comparison sketch follows.
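A minimal sketch of the online similarity comparison over the binary features; the vector sizes, the Hamming metric on the content side, the stand-in collaborative scores and the equal mixing weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
user_model = rng.integers(0, 2, 128)            # binary preference model of user A1
news_feats = rng.integers(0, 2, (500, 128))     # binary features of candidate news

# content-based list: news with the smallest Hamming distance to the preference model
hamming = (news_feats != user_model).sum(axis=1)
content_list = np.argsort(hamming)[:10]

# collaborative list: stand-in scores that would come from the user neighbor table
collab_scores = rng.random(500)

# weighted mixing of the two signals into the final Top-N (assumed equal weights)
mixed = 0.5 * (1 - hamming / 128) + 0.5 * collab_scores
top_n = np.argsort(mixed)[::-1][:10]
```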
The unsupervised learning unified feature extractor disclosed by the invention is innovative in overall design and in mode of application:
1. Innovation in design: the design of the deep autoencoder integrates the characteristics of the contractive autoencoder and the denoising autoencoder and introduces a new objective function; the deep structure (2-3 hidden layers) structurally improves the extraction of high-order statistical information, and the number of neurons decreases from hidden layer to hidden layer, making the encoding and decoding of the deep autoencoder asymmetric. This helps remedy the overfitting an autoencoder is prone to, improves the robustness of feature extraction, and achieves dimensionality reduction while extracting features. After training, a binarization generation layer replaces the output layer to obtain the unsupervised learning feature extractor, which can generate binary features convenient for Hamming-distance comparison, hash comparison and similar operations.
2. Innovation in training mode: in the past, single-hidden-layer autoencoders used the input and output wholly as comparison data and updated the network parameters by back-propagating the resulting error; multi-layer autoencoders generally added, after unsupervised layer-by-layer pre-training, a classifier such as softmax behind the last hidden layer for supervised learning against class labels, so the whole remained semi-supervised. The deep autoencoder of the application comprehensively considers network depth and computational efficiency: the input data are likewise used for comparison at the output end and the resulting error is back-propagated, realizing complete unsupervised learning.
3. Innovation in application: the efficiency of applications such as recommendation systems is improved and personal privacy is effectively protected. For recommendation efficiency, the features extracted from news texts are used as the user's likes and preferences for news, realizing unified extraction of features (user features and item features) and the unification of the hybrid recommendation method on a common technical basis. Data such as user demographics are avoided thanks to the high-order statistical features: the extracted vectors are abstract information containing no explicit user data, so user information cannot be leaked even if the vectors are obtained illegally, realizing privacy protection and meeting the increasingly strict national requirements for protecting personal private information.
4. Innovation in training data: existing methods such as collaborative filtering compute user-user and item-item similarity from users' ratings of commodities or media content, but today's users rarely rate the news they read, so rating data are scarce and training data insufficient. The application uses news data and user access data directly as the training data of the deep autoencoder, which firstly avoids the lack of training data and secondly uses no third-party corpus, making the method more practical.
In practical applications, precision and recall are the two most important indicators used in recommendation-system evaluation. Practical tests show that the features extracted by the unsupervised learning unified feature extractor constructed by the method match the recommendation method well. Compared with currently popular methods, the novel personalized recommendation method performs well in both precision and recall, as shown in FIGS. 5 and 6.
Finally, it should be noted that: the above-mentioned embodiments are only used for illustrating the technical solution of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for constructing an unsupervised learning unified feature extractor, characterized by comprising the following steps:
S1, acquiring actual news text data and user access data from a server, and generating a news feature training data set after sorting and randomizing;
S2, preprocessing the data in the news feature training data set with a current Chinese word segmentation tool to obtain a preprocessed news feature training data set;
S3, obtaining a news feature training vector set from the preprocessed news feature training data set through the TF-IDF method;
S4, classifying the news feature training vector set according to the user access data to form a user feature training data set;
S5, constructing a stacked asymmetric contractive denoising autoencoder with a plurality of hidden layers, using $J_{SA\text{-}CDAE}$ as the objective function:

$$J_{SA\text{-}CDAE} = \min_{\theta} \sum_{x_i \in t} \left[ L_{MC}\big(x_i,\, g_{\theta}(f_{\theta}(x_i))\big) + \lambda \,\| J(x_i) \|_F^2 \right]$$

wherein

$$L_{MC}\big(x_i,\, \hat{x}_i\big) = -\sum_i k_{\sigma}\big(\hat{x}_i - x_i\big)$$

where $k_{\sigma}$ is a Gaussian kernel with standard deviation $\sigma = 1.0$:

$$k_{\sigma}(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{z^2}{2\sigma^2}\right)$$

where $x$ denotes the input of the encoder, $f_{\theta}(\cdot)$ the encoder output and $g_{\theta}(\cdot)$ the decoder output; $L_{MC}(\cdot)$ is the cost function for a single input, $\lambda$ is the regularization parameter of the contractive autoencoder, $\|\cdot\|_F$ is the Frobenius norm, $J(x)$ is the encoder Jacobian matrix, $\theta$ is the parameter set of the deep autoencoder, $x_i$ is the input of the encoder in one training pass, $\hat{x}_i$ is the output reconstructed by the decoder, $t$ is the training set, and $z$ is the argument of the Gaussian kernel;
S6, training the deep autoencoder, the training steps being as follows:
S61, taking the news feature training vector set as the training data of the deep autoencoder;
S62, adding Gaussian white noise to the training data to generate noisy input data;
S63, taking the noisy input data as the input of the deep autoencoder; during training, a mini-batch gradient descent method is adopted, and unsupervised layer-by-layer pre-training is performed first to obtain the initial parameters of each hidden layer and the output data of the output layer;
S64, comparing the input training data with the output data in the objective function to realize back-propagation of the gradient and adjust the initial parameters of each hidden layer;
S65, obtaining the parameter set of the deep autoencoder after the training is finished;
S7, removing the decoder part of the deep autoencoder and adding a binarization generation layer after the output of the last hidden layer to complete the construction of the unsupervised learning unified feature extractor.
2. The method for constructing an unsupervised learning unified feature extractor according to claim 1, characterized in that:
the step S1 of acquiring actual news text data and user access data from the server and generating the news feature training data set after sorting and randomizing specifically includes the following steps:
S11, collecting news data and user access data within a certain time period on the server;
S12, removing pictures and videos from the news data, uniformly encoding the text as UTF-8, and setting a sequence number for each piece of news to form a news data set;
S13, randomizing and reordering the news in the news data set according to the sequence numbers, then using the news, in a certain proportion, as the news feature training data sets of the layer-by-layer unsupervised pre-training stage and the global training stage respectively.
3. The method for constructing an unsupervised learning unified feature extractor according to claim 1, characterized in that:
the stacked asymmetric contractive denoising autoencoder with multiple hidden layers constructed in step S5 comprises 2 hidden layers.
4. The method for constructing an unsupervised learning unified feature extractor according to claim 3, characterized in that:
the coding function of the first hidden layer is $h_1(x_i) = S(w_1 x_i + b_1)$ and its pre-training decoding function is $\hat{g}_1(h_1) = S(w_1' h_1 + b_1')$;
the coding function of the second hidden layer is $h_2(h_1) = S(w_2 h_1 + b_2)$ and its pre-training decoding function is $\hat{g}_2(h_2) = S(w_2' h_2 + b_2')$;
the global-training decoding function from the second hidden layer to the output layer is $g_o(h_2) = S(w_o h_2 + b_o)$;
the initial parameters of each layer are random numbers in the open interval $(0, 1)$, and the nonlinear activation function $S(\cdot)$ is uniformly the Sigmoid function $S(z) = 1/(1 + e^{-z})$, where $e$ is Euler's number, $h$ denotes the coding function of a hidden layer, $g$ the decoding function, $b$ the bias, $x$ the input of the encoder, and $w_1$, $w_2$ the weight parameters of the first and second hidden layers respectively.
5. The method for constructing an unsupervised learning unified feature extractor according to claim 1, characterized in that:
the dimension of the binarization generation layer in step S7 is the same as that of the last hidden layer of the deep autoencoder, with a one-to-one connection to each neuron of the last hidden layer; the binarization generation layer is provided with a weight regulator that realizes threshold adjustment according to the output of the last hidden layer, the threshold T in the weight regulator being selected so that the output of one complete training pass is divided into two classes whose between-class variance is maximal.
6. The method for constructing an unsupervised learning unified feature extractor according to claim 1, further comprising:
S8, inputting the user feature training vector set into the unsupervised learning unified feature extractor to obtain a user preference model, and generating a unified user neighbor table through similarity comparison of the user preference models of all users.
CN201810117102.XA 2018-02-06 2018-02-06 Method for constructing an unsupervised learning unified feature extractor Active CN108304359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810117102.XA CN108304359B (en) 2018-02-06 2018-02-06 Method for constructing an unsupervised learning unified feature extractor


Publications (2)

Publication Number Publication Date
CN108304359A CN108304359A (en) 2018-07-20
CN108304359B (granted) 2019-06-14

Family

ID=62864632

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810117102.XA Active Method for constructing an unsupervised learning unified feature extractor CN108304359B (en)

Country Status (1)

Country Link
CN (1) CN108304359B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344391B (en) * 2018-08-23 2022-10-21 昆明理工大学 Multi-feature fusion Chinese news text abstract generation method based on neural network
CN109614984A (en) * 2018-10-29 2019-04-12 深圳北斗应用技术研究院有限公司 A kind of homologous image detecting method and system
CN109598336A (en) * 2018-12-05 2019-04-09 国网江西省电力有限公司信息通信分公司 A kind of Data Reduction method encoding neural network certainly based on stack noise reduction
CN109635303B (en) * 2018-12-19 2020-08-25 中国科学技术大学 Method for recognizing meaning-changing words in specific field
CN110022313B (en) * 2019-03-25 2021-09-17 河北师范大学 Polymorphic worm feature extraction and polymorphic worm identification method based on machine learning
CN110136226B (en) * 2019-04-08 2023-12-22 华南理工大学 News automatic image distribution method based on image group collaborative description generation
KR20210011844A (en) * 2019-07-23 2021-02-02 삼성전자주식회사 Electronic apparatus and method for controlling thereof
CN110442804A (en) * 2019-08-13 2019-11-12 北京市商汤科技开发有限公司 A kind of training method, device, equipment and the storage medium of object recommendation network
CN110648282B (en) * 2019-09-29 2021-03-23 燕山大学 Image super-resolution reconstruction method and system based on width neural network
CN112651221A (en) * 2019-10-10 2021-04-13 北京搜狗科技发展有限公司 Data processing method and device and data processing device
CN111368205B (en) * 2020-03-09 2021-04-06 腾讯科技(深圳)有限公司 Data recommendation method and device, computer equipment and storage medium
CN113497938A (en) * 2020-03-19 2021-10-12 华为技术有限公司 Method and device for compressing and decompressing image based on variational self-encoder
CN112116029A (en) * 2020-09-25 2020-12-22 天津工业大学 Intelligent fault diagnosis method for gearbox with multi-scale structure and characteristic fusion
CN115146689A (en) * 2021-03-16 2022-10-04 天津大学 Deep learning-based power system high-dimensional measurement data dimension reduction method
CN113441421B (en) * 2021-07-22 2022-12-13 北京信息科技大学 Automatic garbage classification system and method
CN114417427B (en) * 2022-03-30 2022-08-02 浙江大学 Deep learning-oriented data sensitivity attribute desensitization system and method


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9668699B2 (en) * 2013-10-17 2017-06-06 Siemens Healthcare Gmbh Method and system for anatomical object detection using marginal space deep neural networks
CN106295245B (en) * 2016-07-27 2019-08-30 广州麦仑信息科技有限公司 Method of the storehouse noise reduction based on Caffe from coding gene information feature extraction
CN106803062A (en) * 2016-12-20 2017-06-06 陕西师范大学 The recognition methods of stack noise reduction own coding neutral net images of gestures

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9263036B1 (en) * 2012-11-29 2016-02-16 Google Inc. System and method for speech recognition using deep recurrent neural networks
CN105550677A (en) * 2016-02-02 2016-05-04 河北大学 3D palm print identification method
CN107545903A (en) * 2017-07-19 2018-01-05 南京邮电大学 A kind of phonetics transfer method based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Radar target recognition method based on a stacked denoising sparse autoencoder; Zhao Feixiang; Journal of Radars; 2017-04-30; vol. 6, no. 2; full text
Chinese short text classification based on a stacked denoising autoencoder; Qiu Shuang et al.; Journal of Inner Mongolia University for Nationalities; 2017-09-30; vol. 32, no. 5; full text

Also Published As

Publication number Publication date
CN108304359A (en) 2018-07-20

Similar Documents

Publication Publication Date Title
CN108304359B (en) Method for constructing an unsupervised learning unified feature extractor
CN108363804B (en) Local model weighted fusion Top-N movie recommendation method based on user clustering
CN109145112B (en) Commodity comment classification method based on global information attention mechanism
CN107832663B (en) Multi-modal emotion analysis method based on quantum theory
CN107608956B (en) Reader emotion distribution prediction algorithm based on CNN-GRNN
Zhang et al. Face sketch synthesis via sparse representation-based greedy search
Zhang et al. Preference preserving hashing for efficient recommendation
CN109886020A (en) Software vulnerability automatic classification method based on deep neural network
CN112417306B (en) Method for optimizing performance of recommendation algorithm based on knowledge graph
CN111127146B (en) Information recommendation method and system based on convolutional neural network and noise reduction self-encoder
CN107895303B (en) Personalized recommendation method based on OCEAN model
CN113239159B (en) Cross-modal retrieval method for video and text based on relational inference network
CN110781401A (en) Top-n project recommendation method based on collaborative autoregressive flow
CN113343125A (en) Academic-precision-recommendation-oriented heterogeneous scientific research information integration method and system
CN117333037A (en) Industrial brain construction method and device for publishing big data
CN113204522A (en) Large-scale data retrieval method based on Hash algorithm combined with generation countermeasure network
CN116595975A (en) Aspect-level emotion analysis method for word information enhancement based on sentence information
CN116680363A (en) Emotion analysis method based on multi-mode comment data
CN115408605A (en) Neural network recommendation method and system based on side information and attention mechanism
CN112085158A (en) Book recommendation method based on stack noise reduction self-encoder
Lu et al. Recommender system based on scarce information mining
Chen et al. Exploiting visual contents in posters and still frames for movie recommendation
CN115481236A (en) News recommendation method based on user interest modeling
Li et al. Otcmr: Bridging heterogeneity gap with optimal transport for cross-modal retrieval
CN114817566A (en) Emotion reason pair extraction method based on emotion embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant