CN115757932A - Short video recommendation method and device, computer equipment and storage medium - Google Patents

Short video recommendation method and device, computer equipment and storage medium

Info

Publication number
CN115757932A
Authority
CN
China
Prior art keywords
feature
matrix
features
network
short video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210709827.4A
Other languages
Chinese (zh)
Inventor
Wang Zhifeng (王志峰)
Ma Hushuang (马胡双)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dongguan Bubugao Education Software Co ltd
Original Assignee
Dongguan Bubugao Education Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dongguan Bubugao Education Software Co ltd filed Critical Dongguan Bubugao Education Software Co ltd
Priority to CN202210709827.4A priority Critical patent/CN115757932A/en
Publication of CN115757932A publication Critical patent/CN115757932A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a short video recommendation method and device, a computer device and a storage medium. The short video recommendation method comprises the following steps: acquiring user features and video features of short videos; inputting the acquired features into a pre-trained first neural network model to obtain a first feature matrix, wherein the features in the first feature matrix have the same weight; inputting the acquired features into a pre-trained second neural network model to obtain a second feature matrix, wherein different weights are assigned within the second feature matrix according to the importance of each feature; scoring each short video based on the first feature matrix and the second feature matrix; and recommending short videos according to the scores of the short videos. In this process, in addition to operations such as high-order feature extraction and feature crossing, the contribution of different features to the overall effect is considered: important features are given high weights and secondary features low weights, which improves the accuracy of capturing changes in user preference and makes short video recommendation more targeted.

Description

Short video recommendation method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a short video recommendation method and apparatus, a computer device, and a storage medium.
Background
In recent years, with the rapid development of the internet, short video platforms have achieved great success, and both the number of users and the volume of short video content have grown explosively. Against this background, it is very important to help users find interesting content more quickly and to make personalized recommendations, that is, to pick out from massive video resources the video content a user may be interested in, so as to improve user stickiness.
At present, a conventional video recommendation algorithm generally includes a video screening stage and a video scoring stage: the screening stage filters, from a large number of videos, a candidate set the user may be interested in, and the scoring stage scores and ranks the screened videos according to the user's preferences. However, most video recommendation algorithms do not separately consider the importance of individual video features, that is, the contribution of each feature to the overall effect, which limits the accuracy of video recommendation.
Disclosure of Invention
The invention aims to provide a short video recommendation method and device, a computer device and a storage medium, which effectively address the technical problem that existing short video recommendation methods are low in accuracy.
The technical scheme provided by the invention is as follows:
in one aspect, the present invention provides a short video recommendation method, including:
acquiring user characteristics and video characteristics of a short video;
inputting the obtained features into a pre-trained first neural network model to obtain a first feature matrix, wherein the weights of the features in the first feature matrix are the same;
inputting the obtained features into a pre-trained second neural network model to obtain a second feature matrix, wherein different weights are given to the second feature matrix according to the importance degree of each feature;
scoring each short video based on the first feature matrix and the second feature matrix;
and recommending the short videos according to the scores of the short videos.
Further preferably, the first neural network model is composed of a plurality of serially connected feature extraction blocks, each feature extraction block comprising a multi-head neural network, a first normalized residual network, a feature enhancement network and a second normalized residual network connected in sequence, the multi-head neural network comprising a plurality of self-attention networks;
inputting the acquired features into the pre-trained first neural network model to obtain the first feature matrix comprises the following feature extraction steps in each feature extraction block:
inputting the input features into the multi-head neural network for linear transformation to obtain a first matrix, wherein for the first feature extraction block the input features are the acquired user features and video features of the short videos, and for the other feature extraction blocks the input features are the output of the preceding feature extraction block;
inputting the input features and the first matrix into the first normalized residual network for residual and normalization operations to obtain a second matrix;
inputting the second matrix into the feature enhancement network for nonlinear transformation to obtain a third matrix;
and inputting the second matrix and the third matrix into the second normalized residual network for residual and normalization operations.
Further preferably, the second neural network model comprises a feature compression network, a feature importance prediction network and a feature calibration network which are connected in sequence;
inputting the obtained features into a pre-trained second neural network model to obtain a second feature matrix comprises:
inputting the obtained features into a feature compression network to perform average pooling operation so as to compress the input features;
inputting the compressed features into a feature importance prediction network to predict the weights of different features, wherein the feature importance prediction network comprises two full-connection layers connected behind the feature compression network;
and inputting the predicted weights of different characteristics into a characteristic calibration network, weighting the weights to the corresponding characteristics, and completing the calibration of the acquired user characteristics and video characteristics.
Further preferably, scoring each short video based on the first feature matrix and the second feature matrix comprises:
transversely splicing the first feature matrix and the second feature matrix to obtain a spliced feature matrix;
and inputting the splicing feature matrix to a full-connection layer for scoring calculation to obtain a score for each short video.
Further preferably, the recommending short videos according to the scores of the short videos comprises:
sequencing the short videos from large to small according to the scores, and determining a sequencing queue of the short videos;
recommending short videos according to the sorting queue; or
After the user characteristics and the video characteristics of the short video are obtained, the method further comprises the following steps:
and performing dimension reduction operation on the user characteristics and the video characteristics.
In another aspect, the present invention provides a short video recommendation apparatus, including:
the characteristic acquisition module is used for acquiring user characteristics and video characteristics of the short video;
the first feature extraction module is used for inputting the acquired features into a pre-trained first neural network model to obtain a first feature matrix, and the weights of the features in the first feature matrix are the same;
the second feature extraction module is used for inputting the acquired features into a pre-trained second neural network model to obtain a second feature matrix, and different weights are given to the second feature matrix according to the importance degree of each feature;
the scoring module is used for scoring each short video based on the first feature matrix and the second feature matrix;
and the video recommending module is used for recommending the short videos according to the scores of the short videos.
Further preferably, in the first feature extraction module, the first neural network model is composed of a plurality of serially connected feature extraction blocks, each feature extraction block comprising a multi-head neural network, a first normalized residual network, a feature enhancement network and a second normalized residual network connected in sequence, wherein
the multi-head neural network comprises a plurality of self-attention networks and is used for performing linear transformation on the input features to obtain a first matrix, wherein for the first feature extraction block the input features are the acquired user features and video features of the short videos, and for the other feature extraction blocks the input features are the output of the preceding feature extraction block;
the first normalized residual network is used for performing residual and normalization operations on the input features and the first matrix to obtain a second matrix;
the feature enhancement network is used for performing nonlinear transformation on the second matrix to obtain a third matrix;
and the second normalized residual network is used for performing residual and normalization operations on the second matrix and the third matrix.
Further preferably, in the second feature extraction module, the second neural network model includes a feature compression network, a feature importance prediction network and a feature calibration network, which are connected in sequence; wherein,
the characteristic compression network is used for performing average pooling operation on the acquired characteristics so as to compress the input characteristics;
the feature importance prediction network is used for predicting the weight of the compressed features and comprises two full connection layers connected behind the feature compression network;
the feature calibration network is used for weighting the predicted weights of different features to corresponding features to finish the re-calibration of the obtained user features and the video features.
Further preferably, the scoring module comprises a feature splicing network and a scoring network connected with each other, wherein,
the characteristic splicing network is used for transversely splicing the first characteristic matrix and the second characteristic matrix to obtain a spliced characteristic matrix;
and the scoring network is used for carrying out scoring calculation on the splicing feature matrix to obtain a score for each short video.
Further preferably, the video recommendation module comprises a sorting unit and a recommendation unit connected to each other, wherein,
the sorting unit is used for sorting the short videos from large to small according to the scores and determining a sorting queue of the short videos;
the recommendation unit is used for recommending the short video according to the sorting queue; or
The short video recommendation device further comprises a dimension reduction module used for performing dimension reduction operation on the user characteristics and the video characteristics.
In another aspect, the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the short video recommendation method steps described above.
In another aspect, the present invention provides a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the short video recommendation method steps described above.
According to the short video recommendation method and device, the computer device and the storage medium, the first neural network model and the second neural network model are combined to extract features from the acquired user features and video features simultaneously. In this process, in addition to operations such as high-order feature extraction and feature crossing, the contribution of different features to the overall effect is considered: important features are given high weights and secondary features low weights. In particular, the contribution of some low-frequency features to the overall effect can be fully mined, which improves the accuracy of capturing changes in user preference, makes short video recommendation more targeted, and enhances the user experience.
Drawings
The foregoing features, technical features, advantages and implementations of the present invention will be further described in the following detailed description of preferred embodiments, in a clearly understandable manner, in conjunction with the accompanying drawings.
FIG. 1 is a flowchart illustrating a short video recommendation method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a short video recommendation method according to another embodiment of the present invention;
FIG. 3 is a diagram of a first neural network model architecture according to the present invention;
FIG. 4 is a block diagram illustrating the structure of a feature extraction block according to the present invention;
FIG. 5 is a diagram of a second neural network model architecture according to the present invention;
FIG. 6 is a schematic structural diagram of an embodiment of a short video recommendation apparatus according to the present invention;
FIG. 7 is a schematic structural diagram of another embodiment of a short video recommendation apparatus according to the present invention;
FIG. 8 is a schematic diagram of a computer device according to the present invention.
The reference numbers illustrate:
100-a first neural network model, 110-a feature extraction block, 111-a multi-head neural network, 112-a first standardized residual error network, 113-a feature enhancement network, 114-a second standardized residual error network, 200-a second neural network model, 210-a feature compression network, 220-a feature importance prediction network, 230-a feature calibration network, 300-a short video recommendation device, 310-a feature acquisition module, 320-a first feature extraction module, 330-a second feature extraction module, 340-a scoring module, 350-a video recommendation module and 360-a dimensionality reduction module.
Detailed Description
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, specific embodiments of the present invention will be described below with reference to the accompanying drawings. It is to be understood that the drawings in the following description are merely exemplary of the invention and that other drawings and embodiments may be devised by those skilled in the art without the use of inventive faculty.
An embodiment of the present invention, as shown in fig. 1, is a short video recommendation method, including:
s10, acquiring user characteristics and video characteristics of a short video;
s20, inputting the acquired features into a pre-trained first neural network model to obtain a first feature matrix, wherein the weights of the features in the first feature matrix are the same;
s30, inputting the acquired features into a pre-trained second neural network model to obtain a second feature matrix, and giving different weights to the second feature matrix according to the importance degree of each feature;
s40, scoring each short video based on the first feature matrix and the second feature matrix;
and S50, recommending the short videos according to the scores of the short videos.
In this embodiment, because each user's interests and daily needs differ, when the terminal software recommends short videos to users, in order to make the recommendations more targeted and better match each user's needs and preferences, the terminal software computes scores from the user features and the video features of the short videos, and then makes the corresponding short video recommendations according to the scores. The terminal software is software in the terminal that provides short videos, such as Yuanfudao; the user features may be user age, user gender, user grade, a watch list and the like, and the video features of the short videos may be cover information, teacher information, knowledge point information and the like corresponding to the videos. These are not specifically limited here and may be adjusted for the actual application.
In practical applications, the acquired user features and video features of the short videos are generally high-dimensional raw data. Performing subsequent calculations directly on such high-dimensional raw data is computationally complex and not very accurate. Therefore, in other embodiments, to improve computational efficiency, a step S60 of performing a dimension reduction operation on the acquired features is further included after the user features and video features are acquired, as shown in fig. 2, reducing the high-dimensional raw data to low-dimensional data. Specifically, the user features and video features are input into an embedding layer for linear projection, yielding low-dimensional user features and video features.
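As an illustrative sketch (not part of the patent text), the embedding-layer dimension reduction described above can be modeled as a per-field lookup table that linearly projects each one-hot categorical feature to a dense K-dimensional vector; the field names, vocabulary sizes and dimension K below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 8  # target low dimension (illustrative choice)

# One embedding table per feature field (e.g. user age bucket, user grade,
# video chapter). Multiplying a one-hot vector by a weight matrix is
# equivalent to looking up one row of the table.
vocab_sizes = {"user_age": 100, "user_grade": 12, "video_chapter": 50}
tables = {name: rng.normal(0, 0.1, (size, K)) for name, size in vocab_sizes.items()}

def embed(feature_ids):
    """Map raw categorical feature IDs to an (n_fields, K) dense matrix."""
    return np.stack([tables[name][idx] for name, idx in feature_ids.items()])

E = embed({"user_age": 9, "user_grade": 3, "video_chapter": 17})
print(E.shape)  # (3, 8): three one-hot fields reduced to three K-dim vectors
```

The resulting low-dimensional matrix is what the two neural network models consume as input.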
After the user features and video features are acquired, they are input into the first neural network model and the second neural network model respectively for feature extraction. Since two different methods are used for feature extraction, the richness of the features available in the subsequent short video scoring process is greatly improved, and the features are weighted differently from different perspectives, mining the contribution of different features to video recommendation. In particular, the low-frequency features that may be ignored by conventional recommendation methods are fully mined in this embodiment, improving the accuracy of short video recommendation and the user's satisfaction with the recommended short videos.
The first neural network model and the second neural network model are described below. After the two models are created, they are trained based on a training set and a verification set formed from a large number of user features and video features; after training is completed, they are applied in practice to recommend short videos.
As shown in fig. 3 and 4, the first neural network model 100 is composed of a plurality of serially connected feature extraction blocks 110 (feature extraction block 1, feature extraction block 2, …, feature extraction block n, as shown in the figure), and each feature extraction block 110 includes a multi-head neural network 111 (comprising a plurality of self-attention networks), a first normalized residual network 112, a feature enhancement network 113, and a second normalized residual network 114, connected in sequence.
Based on this, in step S20, the acquired features are input into the pre-trained first neural network model 100 to obtain the first feature matrix, and the feature extraction steps in each feature extraction block are as follows:
s21, inputting the input features into the multi-head neural network 111 to perform linear transformation to obtain a first matrix, wherein for a first feature extraction block, the input features are the acquired user features and the video features of the short video, and for other feature extraction blocks, the input features are the output of a previous feature extraction block.
The first matrix MultiHead(x) obtained by the transformation of the multi-head neural network (comprising a plurality of self-attention networks) in the jth feature extraction block is given by equations (1) and (2):

$\mathrm{MultiHead}(x)=\mathrm{Concat}(\mathrm{head}_{j1},\dots,\mathrm{head}_{jh})W_j^O$ (1)

$\mathrm{head}_{ji}=\mathrm{Attention}(Q_jW_{ji}^Q,\,K_jW_{ji}^K,\,V_jW_{ji}^V)$ (2)

where j is the index of the feature extraction block; Concat(·) denotes the concatenation function; $W_{ji}^Q$, $W_{ji}^K$ and $W_{ji}^V$ are the weight matrices of the ith self-attention network in the jth multi-head neural network (i.e. the multi-head neural network in the jth feature extraction block), obtained by initialization; $\mathrm{head}_{ji}$ is the output of the ith self-attention network in the jth multi-head neural network, i = 1, 2, …, h; $Q_j$, $K_j$ and $V_j$ are the Query, Key and Value matrices of the jth multi-head neural network, and in this embodiment $Q_j=K_j=V_j=x$, where x denotes the input features; and $W_j^O$ is the output weight matrix of the jth multi-head neural network, obtained by initialization.

For any self-attention network in the multi-head neural network 111, the output is given by equation (3):

$\mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\dfrac{QK^T}{\sqrt{d_k}}\right)V$ (3)

where Q, K and V denote the Query, Key and Value matrices of the self-attention network, and $d_k$ is the number of columns of the Q and K matrices, i.e. the vector dimension.
S22, inputting the input features and the first matrix into the first normalized residual network 112 for residual and normalization operations to obtain a second matrix $L_1$, as in equation (4):

$L_1=\mathrm{LayerNorm}(x+\mathrm{MultiHead}(x))$ (4)
s23, the second matrix is input into the feature enhancement network 113 to be subjected to nonlinear transformation to obtain a third matrix.
In an example where the feature enhancement network is a two-layer feed-forward neural network, with a ReLU activation in the first layer and a linear activation in the second layer, the resulting third matrix $\mathrm{FFN}(L_1)$ is given by equation (5):

$\mathrm{FFN}(L_1)=\mathrm{ReLU}(L_1W_{j1}+b_{j1})W_{j2}+b_{j2}$ (5)

where $W_{j1}$ and $W_{j2}$ are the weight matrices of the first and second layers, respectively, and $b_{j1}$ and $b_{j2}$ are the corresponding bias terms.

S24, inputting the second matrix and the third matrix into the second normalized residual network 114 for residual and normalization operations; the output result $L_2$ is given by equation (6):

$L_2=\mathrm{LayerNorm}(L_1+\mathrm{FFN}(L_1))$ (6)
when the feature extraction block is not the last feature extraction block connected in series in the first neural network model, the output result L is 2 Will be the input of the next feature extraction block; correspondingly, when the feature extraction block is the last feature extraction block connected in series in the first neural network model, the output result L is output 2 I.e. the output of the first neural network model, i.e. the first feature matrix. In practical applications, the number of the feature extraction blocks connected in series in the first neural network model may be determined according to actual requirements, such as setting 2, 3, or even more.
As shown in fig. 5, the second neural network model 200 includes a feature compression network 210, a feature importance prediction network 220, and a feature scaling network 230, which are connected in sequence.
Based on this, the step S30 of inputting the obtained features into the pre-trained second neural network model 200 to obtain a second feature matrix includes:
s31, inputting the obtained features into the feature compression network 210 to perform average pooling operation to compress the input features, and compressing the result F sq (e m ) As shown in formula (7):
Figure BDA0003706461110000081
wherein, F sq () represents a compression function; the input feature matrix is E = [ E = [ ] 1 ,…e m ,…,e n ]M =1,2, \8230;, n; k' represents a vector e m Dimension of (i.e. e) m Is a vector of dimension K', t represents a vector e m T =1,2, \8230;, K' of the t-th cycle.
The feature compression process compresses two-dimensional features into a real number, namely a compression result F sq (e i ) The dimension of the output is matched with the number of the input features, and the global information of the dimension features is represented.
S32, inputting the compressed features into the feature importance prediction network 220 to predict the weights of the different features, wherein the feature importance prediction network 220 comprises two fully connected layers (a first fully connected layer and a second fully connected layer) following the feature compression network 210; that is, the two fully connected layers learn the importance of the features output in step S31, and the prediction result A is given by equation (8):

$A=F_{ex}(F_{sq}(E))=\sigma_2(W_2\,\sigma_1(W_1\,F_{sq}(E)))$ (8)

where $F_{ex}(\cdot)$ denotes the prediction function; $F_{sq}(E)=[F_{sq}(e_1),\dots,F_{sq}(e_n)]$ is the compressed feature vector; $\sigma_1$ and $\sigma_2$ are the activation functions of the first and second fully connected layers; and $W_1$ and $W_2$ are the weight matrices of the first and second fully connected layers.
S33, inputting the predicted weights of the different features into the feature calibration network 230 and weighting them onto the corresponding features (i.e. rescaling the importance of the originally acquired features), thereby completing the recalibration of the acquired user features and video features; the result $F_{scale}(A,E)$ is given by equation (9):

$F_{scale}(A,E)=[a_1\cdot e_1,\dots,a_n\cdot e_n]=[v_1,\dots,v_n]$ (9)

where $E=[e_1,e_2,\dots,e_n]$ is the input matrix of the feature compression network and $A=[a_1,a_2,\dots,a_n]$ is the predicted weight matrix, in which $a_1$ is the predicted weight of feature $e_1$, so that $v_1=a_1\cdot e_1$, and so on.
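The squeeze, excitation and recalibration steps S31 to S33 (equations (7) to (9)) can be sketched as follows; the layer sizes and the choice of ReLU and sigmoid activations for σ1 and σ2 are illustrative assumptions, since the patent does not fix them:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def senet_recalibrate(E, W1, W2):
    """SENet-style recalibration of an (n, K') field-embedding matrix E."""
    # Squeeze: average-pool each K'-dim feature vector to one real, eq. (7)
    z = E.mean(axis=1)                       # shape (n,)
    # Excitation: two fully connected layers predict per-feature weights,
    # eq. (8); ReLU and sigmoid are assumed activations
    a = sigmoid(np.maximum(z @ W1, 0) @ W2)  # shape (n,), each weight in (0, 1)
    # Recalibration: scale each feature vector by its weight, eq. (9)
    return a[:, None] * E

rng = np.random.default_rng(2)
n, Kp, r = 6, 8, 2                # n feature fields, dimension K', bottleneck r
E = rng.normal(size=(n, Kp))
W1 = rng.normal(size=(n, r))      # first FC layer compresses n -> r
W2 = rng.normal(size=(r, n))      # second FC layer restores r -> n
V = senet_recalibrate(E, W1, W2)
print(V.shape)  # (6, 8): same layout as E, with each row rescaled
```

Since each predicted weight lies in (0, 1) under the sigmoid, important features are attenuated less than secondary ones, which matches the weighting behavior described above.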
After the first feature matrix and the second feature matrix are obtained as described above, the short videos can be scored, which comprises the following steps:
s41, transversely splicing the first feature matrix and the second feature matrix to obtain a spliced feature matrix;
and S42, inputting the splicing characteristic matrix into the full-connection layer for scoring calculation to obtain scores for the short videos.
The spliced feature matrix is formed by transversely splicing the first feature matrix and the second feature matrix. Assuming the first feature matrix is $[y_1,y_2,\dots,y_p]$ and the second feature matrix is $[z_1,z_2,\dots,z_q]$, the spliced feature matrix is $[y_1,y_2,\dots,y_p,z_1,z_2,\dots,z_q]$ or $[z_1,z_2,\dots,z_q,y_1,y_2,\dots,y_p]$. The scoring network is composed of a plurality of fully connected layers for further feature crossing, and the crossed features are activated by a sigmoid and output as the user's score for the video.
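A simplified sketch of the splicing and scoring step: the two feature matrices are flattened and concatenated, then passed through a fully connected layer with sigmoid activation. The patent describes a plurality of fully connected layers; a single layer is used here for brevity, and all shapes and weights are illustrative:

```python
import numpy as np

def score(first_feat, second_feat, W, b):
    """Splice the two feature matrices and score with one FC + sigmoid layer."""
    spliced = np.concatenate([first_feat.ravel(), second_feat.ravel()])
    return 1.0 / (1.0 + np.exp(-(spliced @ W + b)))  # sigmoid -> score in (0, 1)

rng = np.random.default_rng(3)
y = rng.normal(size=(5, 16))          # first feature matrix (illustrative shape)
z = rng.normal(size=(6, 8))           # second feature matrix (illustrative shape)
W = rng.normal(size=(5 * 16 + 6 * 8,))  # FC weights over the spliced vector
s = score(y, z, W, 0.0)
print(0.0 < s < 1.0)  # True: the sigmoid bounds the score
```

In practice one score is computed per candidate short video, and the scores then drive the ranking step described next.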
After the scores are obtained, in step S50, performing short video recommendation according to the score of each short video includes:
s51, sequencing the short videos from large to small according to the scores, and determining a sequencing queue of the short videos;
and S52, short video recommendation is carried out according to the sorting queue.
In this process, the short videos are sorted from highest to lowest score to determine the sorting queue, and recommendations are then made according to the queue. For example, if the scores of short video A, short video B, short video C and short video D are 70, 60, 80 and 65 respectively, the short videos are sorted from highest to lowest into the queue: short video C, short video A, short video D, short video B, and recommendations are made according to this queue. In practical applications, if short videos have the same score, they may be ordered randomly, or a recommendation rule may be set in advance according to the user features, for example preferentially recommending short videos corresponding to the user's grade.
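The sorting step S51 applied to the example scores above reduces to an ordinary descending sort:

```python
# Scores from the example: A=70, B=60, C=80, D=65.
scores = {"short video A": 70, "short video B": 60,
          "short video C": 80, "short video D": 65}

# Sort video IDs by score, highest first, to build the recommendation queue.
queue = sorted(scores, key=scores.get, reverse=True)
print(queue)  # ['short video C', 'short video A', 'short video D', 'short video B']
```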
Another embodiment of the present invention, a short video recommendation apparatus 300, as shown in fig. 6, includes: a feature acquisition module 310 for acquiring user features and video features of short videos; a first feature extraction module 320 for inputting the acquired features into the pre-trained first neural network model 100 to obtain a first feature matrix, in which the features have the same weight; a second feature extraction module 330 for inputting the acquired features into the pre-trained second neural network model 200 to obtain a second feature matrix, in which different weights are assigned according to the importance of each feature; a scoring module 340 for scoring each short video based on the first feature matrix and the second feature matrix; and a video recommendation module 350 for recommending short videos according to the scores of the short videos.
In this embodiment, because each user's interests and daily needs differ, when the terminal software (the short video recommendation apparatus is applied in a terminal) recommends short videos to users, in order to make the recommendations more targeted and better match each user's needs and preferences, the terminal software computes scores from the user features and the video features of the short videos, and then makes the corresponding short video recommendations according to the scores. The terminal software is software in the terminal that provides short videos, such as Yuanfudao; the user features may be user age, user gender, user grade and the like, and the video features of the short videos may be the textbook, chapter, covered knowledge points and the like corresponding to the video.
In practical applications, the obtained user features and video features are generally high-dimensional raw data. Computing directly on this high-dimensional data is complex and not very accurate, so in other embodiments, to improve computational efficiency, a dimension reduction module 360 is further configured in the short video recommendation apparatus 300, as shown in fig. 6, to perform a dimension reduction operation that maps the high-dimensional user features and video features to low-dimensional data. In practice, the dimension reduction module may be an embedding layer that linearly projects the user features and video features into low-dimensional user and video features.
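The embedding-layer projection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the feature-space size, embedding dimension, and example feature ids are all assumptions.

```python
import numpy as np

# Hypothetical sketch of the dimension-reduction (embedding) step: sparse
# high-dimensional feature ids are linearly projected into a low-dimensional
# dense space. All sizes below are illustrative assumptions.
rng = np.random.default_rng(0)

n_features = 1000   # assumed size of the high-dimensional one-hot feature space
embed_dim = 16      # assumed low-dimensional embedding size

# Embedding table: one low-dimensional vector per raw feature id.
embedding_table = rng.normal(scale=0.01, size=(n_features, embed_dim))

def embed(feature_ids):
    """Map sparse feature ids (e.g. user grade, textbook chapter) to dense vectors."""
    return embedding_table[feature_ids]

user_and_video_ids = np.array([3, 42, 917])   # e.g. grade, subject, chapter ids
dense = embed(user_and_video_ids)
print(dense.shape)  # (3, 16): three features, each as a 16-dimensional vector
```

Looking up rows of a weight table is mathematically the same as multiplying a one-hot vector by a projection matrix, which is why embedding layers are the standard way to realize this linear projection efficiently.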
After the user features and video features are obtained, they are input into the first neural network model 100 and the second neural network model 200 respectively for feature extraction. Using two different extraction methods greatly enriches the features available to the subsequent scoring step: the models weight the features from different perspectives and mine each feature's contribution to video recommendation, including the low-frequency features that traditional recommendation methods tend to ignore. Fully mining these features in this embodiment improves the accuracy of short video recommendation and the user's satisfaction with the recommended short videos.
Each network module in the short video recommendation apparatus 300 is further described below. After the network models are created, they are trained on a training set and a verification set formed from a large number of user features and video features; once training is complete, they are deployed in practice to recommend short videos.
As shown in figs. 3 and 4, the first neural network model 100 in the first feature extraction module 320 is composed of a plurality of serially connected feature extraction blocks 110 (as shown in the figures: feature extraction block 1, feature extraction block 2, …, feature extraction block n). Each feature extraction block 110 includes a multi-head neural network 111, a first normalized residual network 112, a feature enhancement network 113, and a second normalized residual network 114, connected in sequence, wherein:
the multi-head neural network 111 includes multiple self-attention networks, and is configured to perform linear transformation on input features to obtain a first matrix, where for a first feature extraction block, the input features are obtained user features and video features of a short video, and for other feature extraction blocks, the input features are output of a previous feature extraction block; a first matrix MultiHead (x) obtained by transforming a multi-head neural network (comprising a plurality of self-attention networks) in the jth feature extraction module is expressed as formulas (1) to (2).
The first normalized residual network 112 performs residual and normalization operations on the input features and the first matrix to obtain a second matrix L1, as shown in formula (4).
The feature enhancement network 113 performs a nonlinear transformation on the second matrix to obtain a third matrix. In this example, the enhancement network is a two-layer feedforward neural network whose first layer uses the ReLU activation function and whose second layer uses a linear activation function; the resulting third matrix FFN(x) is given by formula (5).
The second normalized residual network 114 performs residual and normalization operations on the second matrix and the third matrix, outputting a result L2 as shown in formula (6).
When a feature extraction block is not the last serially connected block in the first neural network model, its output L2 becomes the input of the next feature extraction block; when it is the last block, its output L2 is the output of the first neural network model, i.e. the first feature matrix. In practical applications, the number of serially connected feature extraction blocks can be chosen according to actual requirements, e.g. 2, 3, or more.
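The residual-and-normalize flow of one feature extraction block, formulas (4)-(6) above, can be sketched as follows, under the assumption that the "normalization" is LayerNorm as in a standard Transformer encoder. The attention output MultiHead(x) is stubbed with a random matrix purely to check shapes; all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-6):
    # Normalize each row to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ffn(x, w1, b1, w2, b2):
    """Two-layer feedforward: ReLU then linear, per formula (5)."""
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

d_model, d_ff, seq = 16, 64, 5
x = rng.normal(size=(seq, d_model))
attn_out = rng.normal(size=(seq, d_model))      # stand-in for MultiHead(x)
w1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
w2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)

l1 = layer_norm(x + attn_out)                   # second matrix, formula (4)
l2 = layer_norm(l1 + ffn(l1, w1, b1, w2, b2))   # block output, formula (6)
print(l2.shape)  # (5, 16)
```

Because L2 has the same shape as the block's input, it can be fed directly into the next serially connected feature extraction block, or taken as the first feature matrix if the block is the last one.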
As shown in fig. 5, in the second feature extraction module 330, the second neural network model 200 includes a feature compression network 210, a feature importance prediction network 220, and a feature calibration network 230, connected in sequence, wherein:
the feature compression network 210 is used for performing an average pooling operation on the acquired features to compress the input features, and the compression result F sq (e i ) As shown in formula (7). This process compresses the two-dimensional features into a real number, the compression result F sq (e i ) The dimension of the output is matched with the number of the input features, and the global information of the dimension features is represented.
The feature importance prediction network 220 predicts the weights of the compressed features. It comprises two fully-connected layers (a first fully-connected layer and a second fully-connected layer) connected behind the feature compression network 210, and the prediction result A is shown in formula (8).
The feature calibration network 230 weights the predicted weights of the different features onto the corresponding features, completing the recalibration of the obtained user features and video features; the result F_scale(A, E) is shown in formula (8).
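The compress-predict-recalibrate pipeline above can be sketched as below, assuming it mirrors a standard SENet-style squeeze-and-excitation block; the field count, embedding size, and reduction ratio are assumptions, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_fields, embed_dim, reduced = 6, 8, 3
E = rng.normal(size=(n_fields, embed_dim))   # embedded user + video features

# Squeeze: average-pool each feature's embedding to one real number, F_sq(e_i).
z = E.mean(axis=1)                           # (n_fields,) — one scalar per feature

# Excite: two fully-connected layers predict a weight per feature (result A).
w1 = rng.normal(size=(n_fields, reduced))
w2 = rng.normal(size=(reduced, n_fields))
A = sigmoid(np.maximum(0.0, z @ w1) @ w2)    # (n_fields,) — weights in (0, 1)

# Recalibrate: scale each feature embedding by its predicted importance.
E_scaled = A[:, None] * E                    # F_scale(A, E)
print(E_scaled.shape)  # (6, 8)
```

The bottleneck (reducing to `reduced` units between the two fully-connected layers) is the usual way to force the network to model interactions between feature importances rather than learn one weight per feature independently.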
After the first feature extraction module 320 and the second feature extraction module 330 obtain the first and second feature matrices, each short video is scored by the network structure in the scoring module 340. Specifically, the scoring module 340 includes an interconnected feature splicing network and scoring network: the feature splicing network transversely splices the first feature matrix and the second feature matrix to obtain a spliced feature matrix, and the scoring network performs the scoring calculation on the spliced feature matrix to obtain a score for each short video.
The spliced feature matrix is formed by transversely splicing the first and second feature matrices. Suppose the first feature matrix is [y1, y2, …, yp] and the second feature matrix is [z1, z2, …, zq]; the spliced feature matrix is then [y1, y2, …, yp, z1, z2, …, zq] or [z1, z2, …, zq, y1, y2, …, yp]. The scoring network consists of several fully-connected layers that further cross the features; the crossed features are passed through a sigmoid activation and output as the user's score for the video.
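A minimal sketch of this splice-and-score step: the two feature vectors are concatenated, passed through fully-connected layers, and a sigmoid turns the result into a per-video score in (0, 1). The layer sizes and random weights are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

p, q = 16, 16
y = rng.normal(size=p)                 # first feature matrix  [y1 .. yp]
z = rng.normal(size=q)                 # second feature matrix [z1 .. zq]
spliced = np.concatenate([y, z])       # [y1 .. yp, z1 .. zq]

# Fully-connected scoring layers: ReLU hidden layer, then sigmoid output.
w1 = rng.normal(size=(p + q, 8))
w2 = rng.normal(size=(8, 1))
score = sigmoid(np.maximum(0.0, spliced @ w1) @ w2)
print(score.item())  # a value strictly between 0 and 1
```

Because the sigmoid maps any real input into (0, 1), scores from different short videos are directly comparable and can be sorted into a recommendation queue.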
After the scores are obtained, the video recommendation module 350 recommends videos based on them. Specifically, the video recommendation module 350 includes an interconnected sorting unit and recommendation unit: the sorting unit sorts the short videos in descending order of score to determine a sorting queue, and the recommendation unit recommends the short videos according to that queue.
In this process, the sorting unit sorts the short videos in descending order of score to determine a sorting queue, and the recommendation unit then recommends the short videos according to that queue. For example, if the scores of short video A, short video B, short video C, and short video D are 70, 60, 80, and 65 respectively, the descending sorting queue is short video C, short video A, short video D, short video B, and recommendation proceeds according to this queue. In practical applications, if short videos with equal scores appear, they can be ordered randomly, or a recommendation rule can be set in advance according to the user features, for example preferentially recommending the short video corresponding to the user's grade.
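The worked example above can be sketched as a descending sort with a tie-breaking rule; the alphabetical tie-break here is a hypothetical stand-in for a pre-set rule such as "prefer the video matching the user's grade".

```python
# Scores from the example: A=70, B=60, C=80, D=65.
scores = {"A": 70, "B": 60, "C": 80, "D": 65}

# Sort by descending score; break ties by name (hypothetical pre-set rule).
queue = sorted(scores, key=lambda v: (-scores[v], v))
print(queue)  # ['C', 'A', 'D', 'B']
```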
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of program modules is illustrated, and in practical applications, the above-described distribution of functions may be performed by different program modules, that is, the internal structure of the apparatus may be divided into different program units or modules to perform all or part of the above-described functions. Each program module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one processing unit, and the integrated unit may be implemented in a form of hardware, or may be implemented in a form of software program unit. In addition, the specific names of the program modules are only used for distinguishing one program module from another, and are not used for limiting the protection scope of the present invention.
Fig. 8 is a schematic structural diagram of a computer device provided in an embodiment of the present invention. As shown, the computer device 400 includes a processor 420, a memory 410, and a computer program 411 stored in the memory 410 and executable on the processor 420, such as a short video recommendation program. When executing the computer program 411, the processor 420 implements the steps of the short video recommendation method embodiments, or the functions of the modules in the short video recommendation apparatus embodiments.
The computer device 400 may include, but is not limited to, the processor 420 and the memory 410. Those skilled in the art will appreciate that fig. 8 is merely an example of the computer device 400 and does not limit it; the device may include more or fewer components than illustrated, combine some components, or use different components.
The processor 420 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 410 may be an internal storage unit of the computer device 400, such as a hard disk or memory of the computer device 400. It may also be an external storage device of the computer device 400, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the computer device 400. Further, the memory 410 may include both internal and external storage units of the computer device 400. The memory 410 stores the computer program 411 and other programs and data required by the computer device 400, and may also temporarily store data that has been output or is to be output.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or recited in detail in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed computer device and method can be implemented in other ways. For example, the computer device embodiments described above are merely illustrative: the division into modules or units is only a logical functional division, and an actual implementation may divide them differently; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may be implemented by a computer program 411 instructing the relevant hardware; the computer program 411 may be stored in a computer-readable storage medium, and when executed by the processor 420, it implements the steps of the method embodiments. The computer program 411 comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable storage medium may include: any entity or device capable of carrying the code of the computer program 411, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable storage medium may be adjusted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be construed as the protection scope of the present invention.

Claims (12)

1. A short video recommendation method, comprising:
acquiring user characteristics and video characteristics of a short video;
inputting the obtained features into a pre-trained first neural network model to obtain a first feature matrix, wherein the weights of the features in the first feature matrix are the same;
inputting the obtained features into a pre-trained second neural network model to obtain a second feature matrix, wherein different weights are given to the second feature matrix according to the importance degree of each feature;
scoring each short video based on the first feature matrix and the second feature matrix;
and carrying out short video recommendation according to the scores of the short videos.
2. The short video recommendation method according to claim 1, wherein the first neural network model is composed of a plurality of serially connected feature extraction blocks, each feature extraction block comprises a multi-head neural network, a first normalized residual error network, a feature enhancement network and a second normalized residual error network which are connected in sequence, and the multi-head neural network comprises a plurality of self-attention networks;
the feature extraction step in each feature extraction module comprises the following steps of inputting the obtained features into a pre-trained first neural network model to obtain a first feature matrix:
inputting the input features into a multi-head neural network to perform linear transformation to obtain a first matrix, wherein for a first feature extraction block, the input features are the acquired user features and the video features of the short video, and for other feature extraction blocks, the input features are the output of a previous feature extraction block;
inputting the input features and the first matrix into a first standardized residual error network to perform residual error and standardized operation to obtain a second matrix;
inputting the second matrix into the feature enhancement network for nonlinear transformation to obtain a third matrix;
and inputting the second matrix and the third matrix into a second standardized residual error network for residual error and standardization operation.
3. The short video recommendation method of claim 1, wherein the second neural network model comprises a feature compression network, a feature importance prediction network, and a feature calibration network connected in sequence;
inputting the obtained features into a pre-trained second neural network model to obtain a second feature matrix comprises:
inputting the obtained features into a feature compression network to perform average pooling operation so as to compress the input features;
inputting the compressed features into a feature importance prediction network to predict the weights of different features, wherein the feature importance prediction network comprises two full-connection layers connected behind the feature compression network;
and inputting the predicted weights of different characteristics into a characteristic calibration network, weighting the weights to the corresponding characteristics, and completing the calibration of the acquired user characteristics and video characteristics.
4. The short video recommendation method of any of claims 1-3,
scoring each short video based on the first feature matrix and the second feature matrix, and further performing short video recommendation comprises:
transversely splicing the first feature matrix and the second feature matrix to obtain a spliced feature matrix;
and inputting the splicing feature matrix to a full-connection layer for scoring calculation to obtain a score for each short video.
5. The short video recommendation method of any of claims 1-3,
the short video recommendation according to the score of each short video comprises the following steps:
sequencing the short videos from large to small according to the scores, and determining a sequencing queue of the short videos;
recommending short videos according to the sorting queue; or
After the user characteristics and the video characteristics of the short video are obtained, the method further comprises the following steps:
and performing dimension reduction operation on the user characteristics and the video characteristics.
6. A short video recommendation device, comprising:
the characteristic acquisition module is used for acquiring user characteristics and video characteristics of the short video;
the first feature extraction module is used for inputting the acquired features into a pre-trained first neural network model to obtain a first feature matrix, and the weights of the features in the first feature matrix are the same;
the second feature extraction module is used for inputting the acquired features into a pre-trained second neural network model to obtain a second feature matrix, and different weights are given to the second feature matrix according to the importance degree of each feature;
the scoring module is used for scoring each short video based on the first feature matrix and the second feature matrix;
and the video recommending module is used for recommending the short videos according to the scores of the short videos.
7. The short video recommendation device of claim 6, wherein in the first feature extraction module, the first neural network model is composed of a plurality of serially connected feature extraction blocks, each of said feature extraction blocks comprises a multi-headed neural network, a first normalized residual network, a feature enhancement network and a second normalized residual network connected in sequence, wherein,
the multi-head neural network comprises a plurality of self-attention networks and is used for carrying out linear transformation on input features to obtain a first matrix, wherein for a first feature extraction block, the input features are the acquired user features and the video features of the short video, and for other feature extraction blocks, the input features are the output of a previous feature extraction block;
the first standardized residual error network is used for carrying out residual error and standardized operation on the input features and the first matrix to obtain a second matrix;
the characteristic enhancement network is used for carrying out nonlinear transformation on the second matrix to obtain a third matrix;
the second normalized residual network is used for performing residual and normalization operations on the second matrix and the third matrix.
8. The short video recommendation device of claim 6, wherein in the second feature extraction module, the second neural network model comprises a feature compression network, a feature importance prediction network and a feature calibration network which are connected in sequence; wherein:
the characteristic compression network is used for carrying out average pooling operation on the acquired characteristics so as to compress the input characteristics;
the feature importance prediction network is used for predicting the weight of the compressed features and comprises two fully-connected layers connected behind the feature compression network;
the feature calibration network is used for weighting the predicted weights of different features to corresponding features to finish the re-calibration of the obtained user features and the video features.
9. The short video recommendation device of any one of claims 6-8, wherein the scoring module comprises a feature concatenation network and a scoring network connected to each other, wherein,
the characteristic splicing network is used for transversely splicing the first characteristic matrix and the second characteristic matrix to obtain a spliced characteristic matrix;
and the scoring network is used for carrying out scoring calculation on the splicing characteristic matrix to obtain a score for each short video.
10. The short video recommendation device of any of claims 6-8,
the video recommending module comprises a sequencing unit and a recommending unit which are connected with each other, wherein,
the sorting unit is used for sorting the short videos from large to small according to the scores and determining a sorting queue of the short videos;
the recommendation unit is used for recommending the short video according to the sorting queue; or
The short video recommendation device further comprises a dimension reduction module used for performing dimension reduction operation on the user characteristics and the video characteristics.
11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of the short video recommendation method of any of claims 1-5.
12. A computer device comprising a memory and a processor, characterized in that the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the short video recommendation method of any one of claims 1 to 5.
CN202210709827.4A 2022-06-21 2022-06-21 Short video recommendation method and device, computer equipment and storage medium Pending CN115757932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210709827.4A CN115757932A (en) 2022-06-21 2022-06-21 Short video recommendation method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210709827.4A CN115757932A (en) 2022-06-21 2022-06-21 Short video recommendation method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115757932A true CN115757932A (en) 2023-03-07

Family

ID=85349972

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210709827.4A Pending CN115757932A (en) 2022-06-21 2022-06-21 Short video recommendation method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115757932A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116028727A (en) * 2023-03-30 2023-04-28 南京邮电大学 Video recommendation method based on image data processing
CN116028727B (en) * 2023-03-30 2023-08-18 南京邮电大学 Video recommendation method based on image data processing

Similar Documents

Publication Publication Date Title
US20180204062A1 (en) Systems and methods for image processing
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN112016313B (en) Spoken language element recognition method and device and warning analysis system
CN113127632B (en) Text summarization method and device based on heterogeneous graph, storage medium and terminal
CN111667308A (en) Advertisement recommendation prediction system and method
CN111126481A (en) Training method and device of neural network model
CN113516133B (en) Multi-modal image classification method and system
CN116580257A (en) Feature fusion model training and sample retrieval method and device and computer equipment
CN113361698A (en) Processing method and device of neural network model, and data processing method and device
CN113505193A (en) Data processing method and related equipment
CN109993109A (en) Image character recognition method
CN113656563A (en) Neural network searching method and related equipment
CN117520503A (en) Financial customer service dialogue generation method, device, equipment and medium based on LLM model
CN115757932A (en) Short video recommendation method and device, computer equipment and storage medium
CN110232154B (en) Random forest-based product recommendation method, device and medium
CN108229505A (en) Image classification method based on FISHER multistage dictionary learnings
WO2022063076A1 (en) Adversarial example identification method and apparatus
CN116628263A (en) Video retrieval method and device based on multiple modes, electronic equipment and storage medium
CN116244442A (en) Text classification method and device, storage medium and electronic equipment
CN114329181A (en) Question recommendation method and device and electronic equipment
CN114398980A (en) Cross-modal Hash model training method, encoding method, device and electronic equipment
CN115373697A (en) Data processing method and data processing device
CN113010687A (en) Exercise label prediction method and device, storage medium and computer equipment
CN111611379A (en) Text information classification method, device, equipment and readable storage medium
CN114372205B (en) Training method, device and equipment of characteristic quantization model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination