CN113220936B - Video intelligent recommendation method, device and storage medium based on random matrix coding and simplified convolutional network - Google Patents


Info

Publication number
CN113220936B
Authority
CN
China
Prior art keywords
user
video
data
matrix
vector
Prior art date
Legal status
Active
Application number
CN202110623686.XA
Other languages
Chinese (zh)
Other versions
CN113220936A (en)
Inventor
马晓波
岳晓光
高鹏
武跃
史建焘
侯云峰
李岩泽
廉士勇
Current Assignee
Heilongjiang Radio And Television Station
Original Assignee
Heilongjiang Radio And Television Station
Priority date
Filing date
Publication date
Application filed by Heilongjiang Radio And Television Station filed Critical Heilongjiang Radio And Television Station
Priority to CN202110623686.XA priority Critical patent/CN113220936B/en
Publication of CN113220936A publication Critical patent/CN113220936A/en
Application granted granted Critical
Publication of CN113220936B publication Critical patent/CN113220936B/en

Classifications

    • G06F 16/735 — Information retrieval of video data; Querying; Filtering based on additional data, e.g. user or group profiles
    • G06F 16/75 — Information retrieval of video data; Clustering; Classification
    • G06F 18/22 — Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06N 3/045 — Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N 3/08 — Neural networks; Learning methods
    • Y02D 10/00 — Climate change mitigation technologies in information and communication technologies [ICT]; Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a video recommendation method, and in particular to an intelligent video recommendation method, device and storage medium based on random matrix coding and a simplified convolutional network, belonging to the technical field of network information processing. Specifically, the data set is first preprocessed; random matrix codes are then generated for the user features from random vectors; the resulting data coding matrix is passed to a first fully connected layer to obtain the user feature vector; the video features are passed to a simplified convolutional network to generate a simplified text convolutional network code and produce the video feature matrix; the video feature matrix is passed to a second fully connected layer to obtain the video feature vector; a prediction score is then calculated from the user feature vector and the video feature vector and fit against the real score during training; finally, videos are recommended to the user according to the prediction scores. The technical problems of heavy computation and cumbersome information encoding in prior-art video recommendation methods are thereby solved.

Description

Video intelligent recommendation method, device and storage medium based on random matrix coding and simplified convolutional network
Technical Field
The application relates to a video recommendation method, in particular to an intelligent video recommendation method, device and storage medium based on random matrix coding and a simplified convolution network, and belongs to the technical field of network information processing.
Background
As Internet information grows in volume and structural complexity, the value of recommendation algorithms becomes increasingly apparent. Deep learning is developing rapidly and has achieved notable success in many areas, yet relatively few studies so far apply it to video recommendation algorithms.
Deep learning learns the intrinsic regularities and hierarchical representations of sample data, and the information obtained during such learning helps in interpreting data such as text, images and sound. Its ultimate goal is to give machines the ability to analyze and learn like a person and to recognize text, image and sound data. Deep learning is a complex machine learning approach whose results in speech and image recognition far exceed those of earlier techniques. It has produced many results in search, data mining, machine learning, machine translation, natural language processing, multimedia learning, speech, recommendation and personalization, and other related fields. By enabling machines to imitate human activities such as seeing, hearing and thinking, deep learning solves many complex pattern recognition problems and has greatly advanced artificial intelligence.
Existing solutions that combine deep learning with recommendation systems typically one-hot encode the raw data. Most algorithms compute with distance metrics in vector space, and one-hot encoding removes any spurious ordering among categorical values that have no natural partial order by placing every category equidistant from the others. One-hot encoding expands each discrete feature value into Euclidean space, where every value corresponds to its own point, which makes distance calculations between features more reasonable. After one-hot encoding, each dimension of a discrete feature can be treated as a continuous feature, and the encoded features can then be used in machine learning algorithms such as regression, classification and clustering.
However, when the number of categories is large, one-hot encoding makes the feature space very large. In that case PCA is generally used to reduce the dimensionality, and the combination of one-hot encoding plus PCA is indeed useful in practice. But this adds considerable extra computational burden and makes the system design more complex; in practical application scenarios it is often unnecessary to apply extra encoding to fields that are high-dimensional yet carry little meaning for the result, such as the user ID, at the price of increased design difficulty and complexity.
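As a rough, hypothetical illustration of this dimensionality gap (the numbers below are placeholders, not values taken from the patent), compare the storage needed to one-hot encode a high-cardinality ID field with a fixed-width random code:

    # A hypothetical comparison (numbers are placeholders): storage for one-hot
    # encoding of a high-cardinality ID field versus a fixed-width random code.
    import numpy as np

    num_users = 6000   # roughly the MovieLens 1M user count mentioned in the embodiment
    code_dim = 8       # input dimension expected by the downstream fully connected layer

    one_hot_cells = num_users * num_users                  # one num_users-wide row per user
    random_code = np.random.default_rng(0).random((num_users, code_dim), dtype=np.float32)

    print(one_hot_cells)        # 36000000 cells for plain one-hot encoding
    print(random_code.shape)    # (6000, 8) for the random matrix code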
Therefore, the application proposes encoding this information with random matrix coding and a simplified convolutional network, which reduces the amount of computation and simplifies the information encoding process while keeping the overall deep-learning model architecture unchanged.
Disclosure of Invention
The following presents a simplified summary of the application in order to provide a basic understanding of some aspects of the application. It should be understood that this summary is not an exhaustive overview of the application. It is not intended to identify key or critical elements of the application or to delineate the scope of the application. Its purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.
In view of this, to solve the technical problems of heavy computation and cumbersome information encoding in prior-art video recommendation methods, the application provides a video intelligent recommendation method, device and storage medium based on random matrix coding and a simplified convolutional network. During data preprocessing, category-like fields are converted into numbers and used directly as input, so the dimension of the embedding layer is reduced and the computation time and power consumption are small. The technical problems of heavy computation and cumbersome information encoding in the prior art are thereby solved.
A video intelligent recommendation method based on random matrix coding and a simplified convolution network comprises the following steps:
s1, preprocessing a data set;
s2, generating random matrix codes for the user characteristics through the random vectors, and generating a data coding matrix;
s3, transmitting the data coding matrix in the step S2 to a first full-connection layer to obtain a user feature vector;
s4, transmitting the video features to a simplified convolutional network to generate a simplified text convolutional network code, and generating a video feature matrix;
s5, transmitting the video feature matrix in the step S4 to a second full-connection layer to obtain video feature vectors;
s6, calculating a prediction score through the user feature vector and the video feature vector, and performing fitting training on the prediction score and the real score;
s7, video recommendation is conducted on the user through prediction scoring.
Preferably, the specific method for preprocessing the data set in step S1 is: dividing the data into user features and video features; the user features specifically include: user occupation and video type, user age, user gender, and user video ID; the video features specifically include: video name and user name.
Preferably, the specific method for generating the random matrix code for the user features by using random vectors in step S2 is: converting F and M of the Gender field to 0 and 1; converting the Age field into seven consecutive numbers 0-6; converting the categories in the Genres field into a dictionary of character strings to numbers, and converting the Genres field of each movie into a list of numbers; the Zip-code field, Timestamps field, UserID field, OccupationID field, and MovieID field remain unchanged.
Preferably, the specific method for transmitting the video features to the simplified convolution network in step S4 includes the following steps:
s4.1, inputting video features in an embedding layer of a simplified convolution network, and generating an embedding matrix by embedding vectors of each word in the video features;
s4.2, convolving on the embedded matrix in a convolution layer of the simplified convolution network by using a plurality of convolution kernels with different sizes, wherein each convolution covers a plurality of words;
s4.3, reducing the dimension of the vector of the convolution layer in the max pooling layer of the simplified convolution network and splicing to obtain a long vector.
Preferably, the specific method for obtaining the user feature vector in step S3 is: the ReLU activation function built into TensorFlow is used.
Preferably, the specific method for obtaining the video feature vector in step S5 is: transmitting the long vector from step S4.3 to the second full-connection layer with dropout applied for regularization to obtain the video feature vector; the hyperbolic tangent activation function f(x) = tanh(x) is used to obtain the video feature vector.
Preferably, in step S6, the specific method for calculating the prediction score by using the user feature vector and the video feature vector is as follows: the predictive score is calculated using vector multiplication.
Preferably, the fitting training against the true scores in step S6 is calculated by the following formula:
loss = (1/M) · Σ_{m=1}^{M} (y_m − ȳ_m)²
where M is the total number of samples, y_m is the predicted score, and ȳ_m is the true score.
A computer device comprising a memory and a processor, the memory storing a computer program, said processor implementing the steps of a video intelligent recommendation method based on random matrix coding and a reduced convolutional network when executing said computer program.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements a video intelligent recommendation method based on random matrix coding and a reduced convolutional network.
The beneficial effects of the application are as follows: the video intelligent recommendation method based on random matrix coding and a simplified convolutional network converts category fields into numbers during data preprocessing and uses them as input, so the dimension of the embedding layer is lower than with traditional one-hot encoded input, and the computation time and power consumption are smaller. On the other hand, once the trained model is obtained, the cold-start problem for users and videos can be fundamentally addressed: when a new user or a new video joins the system, the likely scores of the user for different videos can be obtained from a simple calculation against the existing video or user feature matrices and the user and video features. Moreover, clustering of users and videos can be achieved simply by computing the cosine similarity between the corresponding feature vectors and feature matrices, and the result can be used to optimize collaborative filtering algorithms.
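As a minimal sketch of the clustering idea above (array names and sizes are illustrative placeholders, not values from the patent), cosine similarity between one feature vector and a feature matrix can be computed as follows:

    # A minimal cosine-similarity sketch; `V` is a placeholder feature matrix
    # (num_videos x dim) and `v` a single feature vector.
    import numpy as np

    def cosine_similarity(v, V, eps=1e-8):
        """Cosine similarity between one feature vector and every row of a matrix."""
        num = V @ v
        den = np.linalg.norm(V, axis=1) * np.linalg.norm(v) + eps
        return num / den

    V = np.random.default_rng(0).random((100, 32), dtype=np.float32)  # placeholder matrix
    v = V[0]
    sims = cosine_similarity(v, V)
    print(sims.argsort()[::-1][:5])   # indices of the five most similar videos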
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of a recommendation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a model structure of a recommendation method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of an algorithm of a recommendation method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a UserID coding matrix according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of exemplary embodiments of the present application is provided in conjunction with the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application and not exhaustive of all embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
Referring to FIGS. 1-4, the video intelligent recommendation method based on random matrix coding and a simplified convolutional network comprises the following steps:
step one, preprocessing a data set; the data set is data which can be collected by a video website under a general application scene, in this embodiment, a Movielens data set is taken as an example, and the Movielens data set includes rating data of a plurality of users for a plurality of movies, and also includes movie metadata information and user attribute information. This dataset is often used as a test dataset for a recommendation system, machine learning algorithm. Especially in the field of recommendation systems. The abstract level of the data set is high, wherein the 1M version of the data set contains 1 hundred million comments of 6000 users on nearly 4000 movies, wherein the user information comprises fields such as user ID, gender, age, occupation ID, postal code and the like, and the postal code field is not used in the training process because the field is difficult to acquire in a practical application environment. The movie data includes fields such as movie ID, movie name, and movie style, and the scoring data includes fields such as user ID, movie ID, score, and time stamp, respectively.
The specific method for preprocessing the data set in this embodiment is: dividing the data into user features and video features; the user features specifically include: user occupation and video type, user age, user gender, and user video ID; the video features specifically include: video name and user name.
Refer specifically to Tables 1 and 2. Table 1 is a MovieLens user data sample table and Table 2 is a MovieLens user scoring data sample table.
Table 1 MovieLens user data sample
Table 2 MovieLens user scoring data sample
UserID   MovieID   Rating   Timestamps
0        1193      5        978300760
1        661       3        978302109
2        914       3        978301968
3        3408      4        978300275
4        2355      5        978824291
The Rating field in Table 2 is the learning target: the actual score given by the corresponding user to the movie.
Step two, generating random matrix codes for the user features from random vectors and producing the data coding matrix. The specific method is as follows: convert F and M of the Gender (user gender) field into 0 and 1; convert the Age field into seven consecutive numbers 0-6, i.e. divide the users' ages into seven age groups, for example mapping ages 0-15 to 1, ages 15-25 to 2, and so on. The Genres (video type) field is a categorical field, so its categories are first converted into a dictionary of character strings to numbers, and because some movies combine several genres, the Genres field of each movie is converted into a list of numbers. The Zip-code field, the Timestamps field, the UserID field, the OccupationID field, and the MovieID field remain unchanged.
In this embodiment, taking the UserID as an example, a random coding matrix is created for the five user IDs in Table 2. Assuming the subsequent fully connected layer requires input feature vectors of dimension 8, the five UserIDs are encoded with random numbers drawn from the interval [0,1]; the resulting matrix is shown in FIG. 4. Each row is the feature vector corresponding to one ID, and this matrix is passed to the next layer as the coding matrix for the user IDs.
The specific idea about the random matrix coding algorithm is as follows:
1. Comb through the information to be processed and screen out features such as the user ID: features that are indispensable and numerous but carry little meaning for the training result. The user ID uniquely identifies a user, yet it is only a number, contributes little to inferring user preference, and has a large number of distinct values, so it meets the conditions under which the algorithm applies.
2. Define the size of the vectors generated by the algorithm according to the data volume and the architecture of the deep learning model used. Let the data volume of the user be x, the vector dimension expected by the user be y, the random number interval defined by the user be [a, b], and the minimum division unit within the interval be c; then, for any single item of the user data, the probability of an identical vector appearing is: x / ((b − a) / c)^y.
For example, if the data set has 20,000 records and the neural network needs a 16-dimensional vector as input, it is reasonable to generate floating-point codes in the interval [0, 1], because the probability that two vectors coincide is already extremely close to 0.
3. After the two steps above, preprocess the data. First judge, according to the experimental requirements, how fine the encoding of the experimental data needs to be. For example, if the data include a user age field and age is important for the experimental result, a mapping from each individual age to a feature vector can be established; if age matters little, a more reasonable rule is to divide ages into several age groups so as to reduce the amount of feature computation and thereby lower the overall computational requirements of the algorithm.
4. The random numbers in each random vector can be generated by calling the random number function of the programming language used; the recommended values are floating-point numbers in the interval [−1, 1]. After all vectors are generated and combined, a mapping between the raw data and the matrix is established for use in the subsequent deep learning algorithm (a minimal numerical sketch follows below).
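The following sketch illustrates points 2 and 4 under the FIG. 4 setting (five UserIDs, target dimension 8, random numbers drawn from [0, 1]); the function name and the value of the minimum division unit c are assumptions for illustration:

    # Random matrix coding under the FIG. 4 setting: five UserIDs, dimension 8,
    # random numbers drawn from [0, 1]. Function name and the value of c are assumptions.
    import numpy as np

    def random_matrix_code(ids, dim, low=0.0, high=1.0, seed=None):
        """Map each distinct ID to one random row vector of length `dim`."""
        rng = np.random.default_rng(seed)
        matrix = rng.uniform(low, high, size=(len(ids), dim)).astype(np.float32)
        return {i: matrix[k] for k, i in enumerate(ids)}, matrix

    user_ids = [0, 1, 2, 3, 4]
    id2vec, coding_matrix = random_matrix_code(user_ids, dim=8, seed=42)
    print(coding_matrix.shape)                  # (5, 8)

    # Collision probability from point 2: x / ((b - a) / c) ** y.
    x, y, a, b, c = 20000, 16, 0.0, 1.0, 1e-7   # c: assumed smallest representable step
    print(x / ((b - a) / c) ** y)               # effectively zero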
Step three, transmitting the data coding matrix from step two to the first fully connected layer to obtain the user feature vector; the specific method is: the ReLU activation function built into TensorFlow is used.
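A minimal sketch of this step, assuming TensorFlow 2 / Keras; the 8-dimensional coding vector and the 32-dimensional user feature vector are illustrative sizes, not values fixed by the patent:

    # User-feature branch of step three, assuming TensorFlow 2 / Keras
    # (layer sizes are illustrative).
    import tensorflow as tf

    user_code = tf.keras.Input(shape=(8,), name="user_coding_vector")
    user_feature = tf.keras.layers.Dense(32, activation="relu", name="user_fc")(user_code)
    user_branch = tf.keras.Model(user_code, user_feature, name="user_branch")
    user_branch.summary()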
Step four, transmitting the video features to a simplified convolutional network to generate a simplified text convolutional network code and generate the video feature matrix; the specific method comprises the following steps:
Step 4.1, inputting the video features into the embedding layer of the simplified convolutional network, and forming an embedding matrix from the embedding vector of each word in the video features;
Step 4.2, convolving over the embedding matrix in the convolution layer of the simplified convolutional network with a plurality of convolution kernels of different sizes, each convolution covering a plurality of words;
Step 4.3, reducing the dimension of the convolution-layer vectors in the max pooling layer of the simplified convolutional network and splicing them to obtain a long vector.
Step five, transmitting the video feature matrix from step four to the second fully connected layer to obtain the video feature vector;
The specific method for obtaining the video feature vector is: the long vector from step 4.3 is passed to the second fully connected layer with dropout applied for regularization to obtain the video feature vector; the hyperbolic tangent activation function f(x) = tanh(x) is used.
With respect to the embedding layer, this embodiment takes the sentence "I play football with …" as an example: each word is represented by a 1×6 row vector, and the word vectors are stacked vertically into a two-dimensional matrix with one row per word and one column per embedding dimension.
Regarding the convolution layer, this embodiment uses convolution kernels of three different sizes as an example: two 4×5 kernels, two 3×5 kernels and two 2×5 kernels. One dimension of each kernel is fixed and equal to the dimension d of the word vector. Because the data here are sequence data, the convolution is no longer the two-dimensional convolution used on images but a one-dimensional convolution: the kernel slides only along the height (word) dimension.
Regarding the max pooling layer, max pooling is performed after the convolution, and the pooled results of the individual feature maps are again stacked vertically.
With respect to the second fully connected layer, after the pooling operation the fully connected computation is performed and the result is passed through the activation function.
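A sketch of the simplified text convolutional network of steps 4.1-4.3 together with the second fully connected layer of step five, assuming TensorFlow 2 / Keras; the vocabulary size, title length, embedding dimension, filter counts, window sizes and dropout rate are illustrative assumptions:

    # Simplified text convolutional network (steps 4.1-4.3) plus the second fully
    # connected layer (step five), assuming TensorFlow 2 / Keras; all sizes are
    # illustrative assumptions.
    import tensorflow as tf

    vocab_size, title_len, embed_dim = 5000, 15, 32
    num_filters, window_sizes = 8, (2, 3, 4)

    title_ids = tf.keras.Input(shape=(title_len,), dtype="int32", name="video_title")
    # Embedding layer: one embedding vector per word -> (title_len, embed_dim) matrix.
    emb = tf.keras.layers.Embedding(vocab_size, embed_dim)(title_ids)

    pooled = []
    for w in window_sizes:
        # One-dimensional convolution: each kernel covers `w` words across the
        # full embedding width, sliding only along the word dimension.
        conv = tf.keras.layers.Conv1D(num_filters, w, activation="relu")(emb)
        # Max pooling reduces each feature map to a single value.
        pooled.append(tf.keras.layers.GlobalMaxPooling1D()(conv))

    # Concatenate the pooled outputs into one long vector (step 4.3).
    long_vec = tf.keras.layers.Concatenate()(pooled)
    # Second fully connected layer with dropout and tanh activation (step five).
    long_vec = tf.keras.layers.Dropout(0.5)(long_vec)
    video_feature = tf.keras.layers.Dense(32, activation="tanh", name="video_feature")(long_vec)
    video_branch = tf.keras.Model(title_ids, video_feature, name="video_branch")
    video_branch.summary()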
Step six, calculating a prediction score from the user feature vector and the video feature vector, and fitting the prediction score to the real score during training; training optimizes a mean squared error loss, and monitoring with TensorBoard shows the loss settling at around 1.
The specific method for calculating the prediction score is as follows: calculating a prediction score using vector multiplication;
the specific method for fitting training with the true scores is that the following formula is adopted for calculation:
wherein M is the total number of samples, y m In order to predict the score of a score,true scores.
In this embodiment, M is the total number of entries in the user scoring table. The learned user scores for videos are finally fit to the Rating field in Table 2 using the minimum mean squared error; of the full data set, 90% is used as the training set and 10% as the validation set.
The training process uses mini-batch gradient descent as the optimization algorithm: the data are divided into several batches and the parameters are updated batch by batch, so that the samples in a batch jointly determine the gradient direction, the descent is less prone to deviation, and randomness is reduced. On the other hand, because the number of samples in a batch is much smaller than the whole data set, the computation per update is not as heavy.
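A sketch of step six, assuming TensorFlow 2 / Keras and the user_branch and video_branch models from the earlier sketches; the optimizer, batch size and epoch count are assumptions, since the patent only specifies vector multiplication, the mean squared error loss, mini-batch gradient descent and a 90%/10% split:

    # Step six: prediction by vector multiplication and mini-batch MSE training,
    # assuming TensorFlow 2 / Keras and the user_branch / video_branch sketches
    # above; optimizer, batch size and epochs are assumptions.
    import tensorflow as tf

    # Predicted score: vector multiplication (inner product) of the two feature vectors.
    score = tf.keras.layers.Dot(axes=1)([user_branch.output, video_branch.output])
    model = tf.keras.Model([user_branch.input, video_branch.input], score)

    # loss = (1/M) * sum((y_m - y_bar_m) ** 2), i.e. the mean squared error.
    model.compile(optimizer="adam", loss="mse")
    # model.fit([user_codes, video_titles], ratings,
    #           batch_size=256, epochs=5, validation_split=0.1)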
And step seven, video recommendation is carried out on the user through the prediction scores.
The computer device of the present application may be a device including a processor and a memory, such as a single-chip microcomputer containing a central processing unit. The processor implements the steps of the above video intelligent recommendation method based on random matrix coding and a simplified convolutional network when executing the computer program stored in the memory.
The processor may be a central processing unit (Central Processing Unit, CPU), another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required for at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the device (such as audio data or a phone book). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, Flash Card, at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
Computer-readable storage medium embodiments
The computer readable storage medium of the present application may be any form of storage medium readable by the processor of a computer device, including but not limited to non-volatile memory, volatile memory, ferroelectric memory and the like, on which a computer program is stored; when the computer program is read and executed by the processor of the computer device, the steps of the above video intelligent recommendation method based on random matrix coding and a simplified convolutional network can be implemented.
The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately adjusted according to the requirements of legislation and patent practice in each jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
While the application has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the application as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The disclosure of the present application is intended to be illustrative, but not limiting, of the scope of the application, which is defined by the appended claims.

Claims (3)

1. The intelligent video recommendation method based on random matrix coding and simplified convolution network is characterized by comprising the following steps:
s1, preprocessing a data set, wherein the data set comprises rating data of a plurality of users on a plurality of movies, and also comprises movie metadata information and user attribute information; dividing the data into user features and video features; the user features specifically include: user occupation and video type, user age, user gender, and user video ID; the video features specifically include: video name and user name;
s2, generating random matrix codes for user characteristics through random vectors, and generating data coding matrixes, wherein the specific method comprises the following steps of: converting F and M of the Gender fields to O and 1; converting the Age field into seven consecutive numbers of 0-6; the categories in the Genres field are converted into a dictionary of character strings to numbers, and the Genres field of each video is converted into a digital list; the Zip-code field, timetables field, userID field, occupancy ID field, and movieID field remain unchanged;
combing through the information to be processed, and screening out the data dimensions which have little meaning for the training results but are numerous and indispensable;
for the data volume and the architecture of the deep learning model used, defining the size of the vectors generated by the algorithm; let the data volume of the user be x, the vector dimension expected by the user be y, the random number interval defined by the user be [a, b], and the minimum division unit within the interval defined by the user be c; then, for any one item of the user data, the probability of an identical vector appearing is given by the formula: x / ((b − a) / c)^y;
preprocessing the data, and judging the fineness of the experimental data to be encoded according to the experimental requirement;
the generation of random numbers in each random vector is realized by calling a specific random number function according to a programming language used by the random vector, and the recommended value is a floating point number with a range of [ -1,1 ]; combining all vectors after they are generated to create a mapping between the raw data and the matrix;
s3, transmitting the data coding matrix in the step S2 to a first full-connection layer to obtain a user feature vector, wherein the specific method is as follows: using the relu function of the activation function TensorFlow;
s4, transmitting the video features to a simplified convolutional network to generate a simplified text convolutional network code and generate a video feature matrix, wherein the specific method comprises the following steps of:
s4.1, inputting video features in an embedding layer of a simplified convolution network, and generating an embedding matrix by embedding vectors of each word in the video features;
s4.2, convolving on the embedded matrix in a convolution layer of the simplified convolution network by using a plurality of convolution kernels with different sizes, wherein each convolution covers a plurality of words;
s4.3, reducing the dimension of the vector of the convolution layer in the maxpooling layer of the simplified convolution network and splicing to obtain a long vector;
s5, transmitting the video feature matrix in the step S4 to a second full-connection layer to obtain video feature vectors, wherein the specific method comprises the following steps: transmitting the long vector in the step S4.3 to a second full-connection layer for normalization by using dropout to obtain a video feature vector; obtaining a video feature vector using a hyperbolic tangent function of an activation function f (x) =tanh (x);
s6, calculating a prediction score through the user feature vector and the video feature vector, and fitting and training the prediction score and the real score, wherein the specific method is as follows: calculating a prediction score using vector multiplication; the specific method for fitting training is calculated by the following formula:
wherein M is the total number of samples, y m In order to predict the score of a score,true scoring;
s7, video recommendation is conducted on the user through prediction scoring.
2. A computer apparatus, characterized in that: comprising a memory storing a computer program and a processor implementing the steps of the video intelligent recommendation method based on random matrix coding and a reduced convolutional network according to claim 1 when said computer program is executed.
3. A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program, when executed by a processor, implements the video intelligent recommendation method based on random matrix coding and a reduced convolutional network of claim 1.
CN202110623686.XA 2021-06-04 2021-06-04 Video intelligent recommendation method, device and storage medium based on random matrix coding and simplified convolutional network Active CN113220936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110623686.XA CN113220936B (en) 2021-06-04 2021-06-04 Video intelligent recommendation method, device and storage medium based on random matrix coding and simplified convolutional network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110623686.XA CN113220936B (en) 2021-06-04 2021-06-04 Video intelligent recommendation method, device and storage medium based on random matrix coding and simplified convolutional network

Publications (2)

Publication Number Publication Date
CN113220936A CN113220936A (en) 2021-08-06
CN113220936B true CN113220936B (en) 2023-08-15

Family

ID=77082764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110623686.XA Active CN113220936B (en) 2021-06-04 2021-06-04 Video intelligent recommendation method, device and storage medium based on random matrix coding and simplified convolutional network

Country Status (1)

Country Link
CN (1) CN113220936B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114266422B (en) * 2022-03-02 2022-06-21 北京唯捷创芯精测科技有限责任公司 Reel quantity prediction method and device for integrated circuit sealing measurement

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008092301A1 (en) * 2007-01-26 2008-08-07 Daoben Li A code division multiplexing method and system
CN101324926A (en) * 2008-07-25 2008-12-17 重庆大学 Method for selecting characteristic facing to complicated mode classification
CN102970707A (en) * 2012-12-20 2013-03-13 李惠 Wireless transmission data loss recovery method on basis of compressed sensing
CN110012356A (en) * 2019-04-16 2019-07-12 腾讯科技(深圳)有限公司 Video recommendation method, device and equipment and computer storage medium
CN110019943A (en) * 2017-09-11 2019-07-16 中国移动通信集团浙江有限公司 Video recommendation method, device, electronic equipment and storage medium
CN110555132A (en) * 2019-08-13 2019-12-10 北京工业大学 Noise reduction self-encoder recommendation method based on attention model
CN110650340A (en) * 2019-04-25 2020-01-03 长沙理工大学 Space-time multiplexing compressed video imaging method
CN111382307A (en) * 2018-12-27 2020-07-07 深圳Tcl新技术有限公司 Video recommendation method, system and storage medium based on deep neural network
CN111782934A (en) * 2020-05-11 2020-10-16 中山大学新华学院 Movie recommendation system and method for relieving data sparsity
WO2020224219A1 (en) * 2019-05-06 2020-11-12 平安科技(深圳)有限公司 Chinese word segmentation method and apparatus, electronic device and readable storage medium
WO2021042826A1 (en) * 2019-09-05 2021-03-11 苏宁云计算有限公司 Video playback completeness prediction method and apparatus
CN112883070A (en) * 2021-01-22 2021-06-01 东北大学 Generation type countermeasure network recommendation method with differential privacy

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190385063A1 (en) * 2018-06-15 2019-12-19 Google Llc Sparse recovery autoencoder

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008092301A1 (en) * 2007-01-26 2008-08-07 Daoben Li A code division multiplexing method and system
CN101324926A (en) * 2008-07-25 2008-12-17 重庆大学 Method for selecting characteristic facing to complicated mode classification
CN102970707A (en) * 2012-12-20 2013-03-13 李惠 Wireless transmission data loss recovery method on basis of compressed sensing
CN110019943A (en) * 2017-09-11 2019-07-16 中国移动通信集团浙江有限公司 Video recommendation method, device, electronic equipment and storage medium
CN111382307A (en) * 2018-12-27 2020-07-07 深圳Tcl新技术有限公司 Video recommendation method, system and storage medium based on deep neural network
CN110012356A (en) * 2019-04-16 2019-07-12 腾讯科技(深圳)有限公司 Video recommendation method, device and equipment and computer storage medium
CN110650340A (en) * 2019-04-25 2020-01-03 长沙理工大学 Space-time multiplexing compressed video imaging method
WO2020224219A1 (en) * 2019-05-06 2020-11-12 平安科技(深圳)有限公司 Chinese word segmentation method and apparatus, electronic device and readable storage medium
CN110555132A (en) * 2019-08-13 2019-12-10 北京工业大学 Noise reduction self-encoder recommendation method based on attention model
WO2021042826A1 (en) * 2019-09-05 2021-03-11 苏宁云计算有限公司 Video playback completeness prediction method and apparatus
CN111782934A (en) * 2020-05-11 2020-10-16 中山大学新华学院 Movie recommendation system and method for relieving data sparsity
CN112883070A (en) * 2021-01-22 2021-06-01 东北大学 Generation type countermeasure network recommendation method with differential privacy

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Recommendation System Based on Deep Learning; Tianhan Gao et al.; International Conference on Broadband and Wireless Computing, Communication and Applications; 535-543 *

Also Published As

Publication number Publication date
CN113220936A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
Wen et al. Generalized incomplete multiview clustering with flexible locality structure diffusion
Wang et al. Weakly supervised patchnets: Describing and aggregating local patches for scene recognition
Zhang et al. Sparse codes auto-extractor for classification: A joint embedding and dictionary learning framework for representation
Zhang et al. Learning object-to-class kernels for scene classification
US11288324B2 (en) Chart question answering
Yu et al. Learning image representations from the pixel level via hierarchical sparse coding
CN111133453B (en) Artificial neural network
Huang et al. Object-location-aware hashing for multi-label image retrieval via automatic mask learning
Champ et al. A comparative study of fine-grained classification methods in the context of the LifeCLEF plant identification challenge 2015
WO2022042043A1 (en) Machine learning model training method and apparatus, and electronic device
CN111626340B (en) Classification method, device, terminal and computer storage medium
Chadha et al. Voronoi-based compact image descriptors: Efficient region-of-interest retrieval with VLAD and deep-learning-based descriptors
Meng et al. Few-shot image classification algorithm based on attention mechanism and weight fusion
CN117529755A (en) Transfer learning in image recognition systems
CN112836702B (en) Text recognition method based on multi-scale feature extraction
Rahman et al. Dtcth: a discriminative local pattern descriptor for image classification
CN112749274A (en) Chinese text classification method based on attention mechanism and interference word deletion
CN113220936B (en) Video intelligent recommendation method, device and storage medium based on random matrix coding and simplified convolutional network
CN111444390A (en) Spark and depth hash based video parallel retrieval method
Mo et al. Class-incremental grouping network for continual audio-visual learning
Peng et al. Swin transformer-based supervised hashing
Jia et al. Theme-aware aesthetic distribution prediction with full-resolution photographs
Hayat et al. Entropy information‐based heterogeneous deep selective fused features using deep convolutional neural network for sketch recognition
Bibi et al. Deep features optimization based on a transfer learning, genetic algorithm, and extreme learning machine for robust content-based image retrieval
Ke et al. Quasi non-negative quaternion matrix factorization with application to color face recognition

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant