CN117290540A - Song recommendation model training method, song recommendation method, device and storage medium - Google Patents

Song recommendation model training method, song recommendation method, device and storage medium

Info

Publication number
CN117290540A
CN117290540A (application CN202311299277.4A)
Authority
CN
China
Prior art keywords
song
feature
representations
historical
sample user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311299277.4A
Other languages
Chinese (zh)
Inventor
陈飞
马小栓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202311299277.4A
Publication of CN117290540A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles
    • G06F16/637Administration of user profiles, e.g. generation, initialization, adaptation or distribution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/638Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a song recommendation model training method, a song recommendation method, a computer device and a storage medium, in the technical field of artificial intelligence, and can improve how well the songs recommended by the model suit users. The method comprises the following steps: dividing each sample user's song listening information sequence sample into a historical song listening information sequence and a future song listening information sequence; inputting the historical sequence into a first feature extraction branch of the model to be trained, which extracts a song feature representation for each historical time step; inputting the future sequence into a second feature extraction branch, which extracts a song feature representation for each future time step; training the two branches based on these representations; and obtaining a trained song recommendation model when the similarity between the historical and future feature representations of the same user is greater than or equal to a first threshold and the similarity between the feature representations of different users is less than a second threshold.

Description

Song recommendation model training method, song recommendation method, device and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a song recommendation model training method, a song recommendation method, a computer device, and a storage medium.
Background
With the development of artificial intelligence technology, song recommendation systems have emerged in music scenes that can perform song recommendation by modeling user-related information and extracting features according to the model.
In the prior art, the model used for song recommendation mainly predicts a user's listening behavior at the next moment from the user's historical listening behavior. However, such a model cannot accurately and stably capture the user's long-term listening interest, so the songs it recommends suit the user poorly.
Disclosure of Invention
Based on this, it is necessary to provide a song recommendation model training method, a song recommendation method, a computer device, and a storage medium in view of the above-described technical problems.
In a first aspect, the present application provides a song recommendation model training method. The method comprises the following steps:
acquiring respective song listening information sequence samples of each sample user;
for the song listening information sequence sample of each sample user, dividing the sample into a historical song listening information sequence and a future song listening information sequence, obtaining the historical song listening information sequence and the future song listening information sequence of each sample user;
inputting the historical song listening information sequence of each sample user into a first feature extraction branch of the song recommendation model to be trained, and extracting song feature representations of each historical time step by the first feature extraction branch according to the historical song listening information sequence, to obtain the song feature representations of each historical time step corresponding to each sample user;
inputting the future song listening information sequences of each sample user into a second feature extraction branch of a song recommendation model to be trained, and extracting song feature representations of all future time steps by the second feature extraction branch according to the future song listening information sequences to obtain song feature representations of all the future time steps corresponding to each sample user;
training a first feature extraction branch and a second feature extraction branch of the song recommendation model to be trained based on similarity of song feature representations of each historical time step corresponding to each sample user and song feature representations of each future time step corresponding to each sample user;
a trained song recommendation model is obtained when the similarity of song feature representations corresponding to historical time steps of the same sample user to song feature representations of future time steps is greater than or equal to a first similarity threshold and the similarity of song feature representations corresponding to historical time steps of different sample users to song feature representations of future time steps is less than a second similarity threshold.
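The objective stated above, pulling the historical and future representations of the same sample user together while pushing apart those of different sample users, has the shape of a contrastive objective. The sketch below is purely illustrative, not the patented implementation: an InfoNCE-style loss over one summary representation per user, where the batch size, embedding dimension and temperature are arbitrary assumptions.

```python
import numpy as np

def contrastive_loss(hist_reps, future_reps, temperature=0.1):
    """InfoNCE-style loss over a batch of users.

    hist_reps, future_reps: (num_users, dim) arrays, one summary
    representation per user from the first/second branch respectively.
    Same-user (history, future) pairs are positives; pairs from
    different users are negatives.
    """
    # L2-normalize so dot products are cosine similarities
    h = hist_reps / np.linalg.norm(hist_reps, axis=1, keepdims=True)
    f = future_reps / np.linalg.norm(future_reps, axis=1, keepdims=True)
    logits = h @ f.T / temperature              # (num_users, num_users)
    # Row-wise log-softmax with the diagonal (same user) as the target
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
reps = rng.normal(size=(4, 8))
# Perfectly aligned history/future views score lower than misaligned ones
aligned = contrastive_loss(reps, reps)
shuffled = contrastive_loss(reps, reps[::-1])
```

Minimizing such a loss drives the same-user similarity up and the cross-user similarity down, which is exactly the stopping condition the first aspect describes with its two thresholds.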
In one embodiment, the training the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained based on the similarity of the song feature representation of each historical time step corresponding to each sample user and the song feature representation of each future time step corresponding to each sample user includes:
extracting a plurality of song feature representations from the song feature representations of each historical time step corresponding to each sample user as a first type song feature representation set, and extracting a plurality of song feature representations from the song feature representations of each future time step corresponding to each sample user as a second type song feature representation set; and training the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained according to the similarity of each song feature representation in the first type song feature representation set to each song feature representation in the second type song feature representation set.
In one embodiment, the obtaining the trained song recommendation model when the similarity of the song feature representation corresponding to the historical time step of the same sample user to the song feature representation of the future time step is greater than or equal to the first similarity threshold and the similarity of the song feature representation corresponding to the historical time step of a different sample user to the song feature representation of the future time step is less than the second similarity threshold comprises:
obtaining a trained song recommendation model when the similarity of the song feature representations in the first type song feature representation set to the song feature representations in the second type song feature representation set corresponding to the same sample user is greater than or equal to the first similarity threshold, and the similarity of the song feature representations in the first type song feature representation set to the song feature representations in the second type song feature representation set corresponding to different sample users is less than the second similarity threshold.
In one embodiment, the method further comprises:
for each sample user, performing different mask processing on the historical song listening information sequence of the sample user to obtain at least two groups of mask-processed historical song listening information sequences; inputting the at least two groups of mask-processed historical song listening information sequences of each sample user into the first feature extraction branch of the song recommendation model to be trained, the first feature extraction branch respectively outputting at least two groups of song feature representations of each historical time step according to the at least two groups of mask-processed historical song listening information sequences, to obtain at least two groups of song feature representations of each historical time step corresponding to each sample user; and for each sample user, obtaining at least two song feature representations of the last historical time step from the corresponding at least two groups of song feature representations of the historical time steps;
The training the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained based on the similarity of song feature representations of each historical time step corresponding to each sample user and song feature representations of each future time step corresponding to each sample user comprises the following steps:
and training a first feature extraction branch and a second feature extraction branch of the song recommendation model to be trained based on the similarity of song feature representations of each historical time step corresponding to each sample user and song feature representations of each future time step corresponding to each sample user and the similarity between song feature representations of the at least two last historical time steps.
In one embodiment, the obtaining the trained song recommendation model when the similarity of the song feature representation corresponding to the historical time step of the same sample user to the song feature representation of the future time step is greater than or equal to the first similarity threshold and the similarity of the song feature representation corresponding to the historical time step of a different sample user to the song feature representation of the future time step is less than the second similarity threshold comprises:
A trained song recommendation model is obtained when the similarity of song feature representations corresponding to historical time steps of the same sample user to song feature representations of future time steps is greater than or equal to a first similarity threshold, and the similarity of song feature representations corresponding to historical time steps of different sample users to song feature representations of future time steps is less than a second similarity threshold, and the similarity between song feature representations of the at least two last historical time steps of the same sample user is greater than or equal to a third similarity threshold.
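The mask-based embodiment above derives at least two differently masked copies of one user's historical sequence as augmented views before encoding them. A small sketch of just the masking step; the mask rate, padding id and random seeding are illustrative assumptions, not details from the patent:

```python
import numpy as np

def masked_views(history_ids, mask_rate=0.2, pad_id=0, seed=None):
    """Return two independently masked copies of one user's history
    sequence: each view randomly replaces a fraction of listening
    records with a padding id, giving two augmented views of the
    same user whose last-step representations should stay similar."""
    rng = np.random.default_rng(seed)
    views = []
    for _ in range(2):
        view = np.array(history_ids, copy=True)
        mask = rng.random(len(view)) < mask_rate
        view[mask] = pad_id     # drop the masked listening records
        views.append(view)
    return views

history = np.arange(1, 21)                  # 20 listened-song ids
v1, v2 = masked_views(history, mask_rate=0.3, seed=42)
```

Encoding both views with the first branch and constraining their last-step representations to exceed the third similarity threshold then acts as the extra consistency term in training.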
In one embodiment, the method further comprises: acquiring the respective favorite song sequence of each sample user; and the extracting, by the first feature extraction branch, song feature representations of each historical time step according to the historical song listening information sequence comprises: the first feature extraction branch taking the historical song listening information sequence and the favorite song sequence as query items to perform weighting processing on the historical song listening information sequence, obtaining the song feature representation of each historical time step.
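The weighting in this embodiment, using the history together with the favorite songs as query items over the history, resembles scaled dot-product attention. The following is a hedged sketch only: the shapes and the concatenation of history and favorite embeddings into one query are assumptions, not the patent's exact formulation.

```python
import numpy as np

def weighted_history(hist_emb, fav_emb):
    """Re-weight the history embeddings with an attention query built
    from both the history and the user's favorite songs.

    hist_emb: (T, d) embeddings of the T historical time steps
    fav_emb:  (F, d) embeddings of the favorite song sequence
    Returns (T, d): one weighted representation per historical step.
    """
    d = hist_emb.shape[1]
    query = np.concatenate([hist_emb, fav_emb], axis=0)   # (T+F, d)
    scores = hist_emb @ query.T / np.sqrt(d)              # (T, T+F)
    scores -= scores.max(axis=1, keepdims=True)           # stable softmax
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ query                                 # (T, d)

rng = np.random.default_rng(1)
out = weighted_history(rng.normal(size=(6, 16)), rng.normal(size=(3, 16)))
```

Mixing the favorite songs into the query lets steps that resemble the user's stated preferences receive more weight, which is one plausible reading of the embodiment.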
In one embodiment, the dividing the song listening information sequence sample of each sample user into a historical song listening information sequence and a future song listening information sequence includes: dividing the song listening information sequence sample of each sample user into a historical song listening information sequence and a preliminary future song listening information sequence; and determining the future song listening information sequence according to the song listening information of the sample user's positive listening behavior in the preliminary future song listening information sequence.
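Determining the future sequence from records of positive listening behavior might look like the sketch below; the record fields (`play_ratio`, `liked`) and the thresholds are purely illustrative assumptions, since the patent does not fix them.

```python
def filter_future(preliminary_future):
    """Keep only records that reflect positive listening behavior,
    e.g. a (nearly) completed play or an explicit like, so the future
    sequence contains songs the user demonstrably enjoyed."""
    return [rec for rec in preliminary_future
            if rec.get("play_ratio", 0.0) >= 0.9 or rec.get("liked", False)]

prelim = [
    {"song_id": 1, "play_ratio": 1.0, "liked": False},  # played fully
    {"song_id": 2, "play_ratio": 0.1, "liked": False},  # skipped quickly
    {"song_id": 3, "play_ratio": 0.5, "liked": True},   # explicitly liked
]
future_seq = filter_future(prelim)
```

Filtering the future side this way keeps the contrastive target focused on songs the user actually enjoyed rather than ones merely played.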
In a second aspect, the present application provides a song recommendation method. The method comprises the following steps:
acquiring a historical song listening information sequence of a target user; inputting the historical song listening information sequence of the target user into a trained song recommendation model, and extracting song feature representations of each historical time step corresponding to the target user by a first feature extraction branch in the trained song recommendation model according to the historical song listening information sequence of the target user; obtaining the song feature representation corresponding to each song in a song library, wherein the song feature representation corresponding to each song is extracted by a second feature extraction branch in the trained song recommendation model according to the song information of the song; and determining the songs recommended to the target user based on the similarity of the song feature representations of each historical time step corresponding to the target user to the song feature representation corresponding to each song; wherein the trained song recommendation model is trained according to the method described above.
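The recommendation step of the second aspect, scoring library-song representations against the user's representation, can be sketched as follows. Reducing the user to a single vector, precomputing library embeddings offline with the second branch, and using cosine similarity are all simplifying assumptions of this sketch.

```python
import numpy as np

def recommend(user_rep, library_reps, song_ids, top_k=2):
    """Rank every library song by cosine similarity to the user's
    representation and return the top-k song ids."""
    u = user_rep / np.linalg.norm(user_rep)
    lib = library_reps / np.linalg.norm(library_reps, axis=1, keepdims=True)
    scores = lib @ u                       # one similarity per song
    order = np.argsort(-scores)[:top_k]    # highest similarity first
    return [song_ids[i] for i in order]

user = np.array([1.0, 0.0, 0.0])
library = np.array([[0.9, 0.1, 0.0],   # close to the user's taste
                    [0.0, 1.0, 0.0],   # orthogonal to it
                    [0.8, 0.0, 0.6]])
picks = recommend(user, library, ["songA", "songB", "songC"])
# picks == ["songA", "songC"]
```

Because both branches were trained into the same similarity space, a plain nearest-neighbour search over the library suffices at serving time.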
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring respective song listening information sequence samples of each sample user; dividing a song listening information sequence sample into a historical song listening information sequence and a future song listening information sequence aiming at a song listening information sequence sample of each sample user to obtain the historical song listening information sequence and the future song listening information sequence of each sample user; inputting the historical song listening information sequences of each sample user into a first feature extraction branch of a song recommendation model to be trained, and extracting song feature representations of each historical time step by the first feature extraction branch according to the historical song listening information sequences to obtain song feature representations of each historical time step corresponding to each sample user; inputting the future song listening information sequences of each sample user into a second feature extraction branch of a song recommendation model to be trained, and extracting song feature representations of all future time steps by the second feature extraction branch according to the future song listening information sequences to obtain song feature representations of all the future time steps corresponding to each sample user; training a first feature extraction branch and a second feature extraction branch of the song recommendation model to be trained based on similarity of song feature representations of each historical time step corresponding to each sample user and song feature representations of each future time step corresponding to each sample user; a trained song recommendation model is obtained when the similarity of song feature representations corresponding to historical time steps of the same sample user to song feature representations of future time steps is greater than or equal to a first similarity threshold and the similarity of song feature representations corresponding to historical time steps of different sample users to song feature representations of future time steps is less than a second similarity threshold.
In a fourth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a historical song listening information sequence of a target user; inputting the historical song listening information sequence of the target user into a trained song recommendation model, and extracting song feature representations of each historical time step corresponding to the target user by a first feature extraction branch in the trained song recommendation model according to the historical song listening information sequence of the target user; obtaining the song feature representation corresponding to each song in a song library, wherein the song feature representation corresponding to each song is extracted by a second feature extraction branch in the trained song recommendation model according to the song information of the song; and determining the songs recommended to the target user based on the similarity of the song feature representations of each historical time step corresponding to the target user to the song feature representation corresponding to each song; wherein the trained song recommendation model is trained according to the method described above.
In a fifth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring respective song listening information sequence samples of each sample user; dividing a song listening information sequence sample into a historical song listening information sequence and a future song listening information sequence aiming at a song listening information sequence sample of each sample user to obtain the historical song listening information sequence and the future song listening information sequence of each sample user; inputting the historical song listening information sequences of each sample user into a first feature extraction branch of a song recommendation model to be trained, and extracting song feature representations of each historical time step by the first feature extraction branch according to the historical song listening information sequences to obtain song feature representations of each historical time step corresponding to each sample user; inputting the future song listening information sequences of each sample user into a second feature extraction branch of a song recommendation model to be trained, and extracting song feature representations of all future time steps by the second feature extraction branch according to the future song listening information sequences to obtain song feature representations of all the future time steps corresponding to each sample user; training a first feature extraction branch and a second feature extraction branch of the song recommendation model to be trained based on similarity of song feature representations of each historical time step corresponding to each sample user and song feature representations of each future time step corresponding to each sample user; a trained song recommendation model is obtained when the similarity of song feature representations corresponding to historical time steps of the same sample user to song feature representations of future time steps is greater than or equal to a first similarity threshold and the similarity of song feature representations corresponding to historical time steps of different sample users to song feature representations of future time steps is less than a second similarity threshold.
In a sixth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a historical song listening information sequence of a target user; inputting the historical song listening information sequence of the target user into a trained song recommendation model, and extracting song feature representations of each historical time step corresponding to the target user by a first feature extraction branch in the trained song recommendation model according to the historical song listening information sequence of the target user; obtaining the song feature representation corresponding to each song in a song library, wherein the song feature representation corresponding to each song is extracted by a second feature extraction branch in the trained song recommendation model according to the song information of the song; and determining the songs recommended to the target user based on the similarity of the song feature representations of each historical time step corresponding to the target user to the song feature representation corresponding to each song; wherein the trained song recommendation model is trained according to the method described above.
According to the song recommendation model training method, the song recommendation method, the computer device and the storage medium, the respective song listening information sequence sample of each sample user is obtained and divided into a historical song listening information sequence and a future song listening information sequence; the historical sequence of each sample user is input into the first feature extraction branch of the song recommendation model to be trained, which extracts song feature representations of each historical time step; the future sequence of each sample user is input into the second feature extraction branch, which extracts song feature representations of each future time step; the two branches are trained based on the similarity between these representations; and a trained song recommendation model is obtained when the similarity between the historical and future representations of the same sample user is greater than or equal to a first similarity threshold and the similarity between the historical and future representations of different sample users is less than a second similarity threshold. In this scheme, during training the song listening information sequence sample of a sample user is divided into a historical sequence and a future sequence, which the first and second feature extraction branches of the song recommendation model respectively characterize as song feature representations of the historical time steps and of the future time steps, and the two branches are trained so that the similarity between the historical and future representations of the same sample user is as large as possible while the similarity between representations of different sample users is as small as possible. The trained song recommendation model can therefore accurately and stably capture the long-term listening interest of a user, improving how well the songs it recommends suit the user.
Drawings
FIG. 1 is an application environment diagram of a related method in an embodiment of the present application;
FIG. 2 is a flowchart of a method for training a song recommendation model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a song recommendation model according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating steps for dividing a song listening information sequence according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a song recommendation model according to another embodiment of the present application;
FIG. 6 is a flowchart of a song recommendation method according to an embodiment of the present application;
FIG. 7 (a) is an internal structural diagram of a computer device in an embodiment of the present application;
FIG. 7 (b) is an internal structural diagram of a computer device according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It should be noted that the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in this application are information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions.
The song recommendation model training method and the song recommendation method provided in the embodiments of the present application may be applied to an application environment as shown in fig. 1. The application environment may include a terminal 110 and a server 120, where the terminal 110 may communicate with the server 120 through the internet; it may further include a data storage system that stores data to be processed by the server 120 and that may be integrated on the server 120 or placed on a cloud or other network server. The song recommendation model training method of the present application may be performed by the server 120, and the song recommendation method may be performed by the terminal 110 or by the server 120. Specifically, for the song recommendation model training method, the server 120 may divide the song listening information sequence sample of each sample user into a historical song listening information sequence and a future song listening information sequence, and train the first feature extraction branch and the second feature extraction branch in the song recommendation model to be trained using the historical and future song listening information sequences of each sample user, so as to obtain a trained song recommendation model, which the server 120 may send to the terminal 110. For the song recommendation method, taking the terminal 110 as an example, the terminal 110 can obtain a historical song listening information sequence of the target user, extract song feature representations of the target user through the trained song recommendation model, and determine the songs recommended to the target user according to the song feature representations, so that the long-term listening interest of the user is accurately and stably captured through the trained song recommendation model, and how well the songs recommended by the model suit the user is improved.
In the above application environment, the terminal 110 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, etc. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
The following sequentially describes a song recommendation model training method and a song recommendation method according to the present application with reference to the embodiments and the corresponding drawings based on the application environment shown in fig. 1.
In one embodiment, as shown in fig. 2, the present application provides a song recommendation model training method, which may be performed by the server 120, and may include the steps of:
step S201, acquiring respective song listening information sequence samples of each sample user.
In this step, the server 120 may obtain, from a user information base of a music application program, the respective song listening information sequence of each sample user, recorded as a song listening information sequence sample. Specifically, this step obtains the sample data for model training: a portion of the users of the music application program may be extracted as sample users, and a song listening information sequence sample of each sample user over a preset period may be obtained. The song listening information sequence sample may include song listening information arranged by time step (e.g., per day), and the song listening information may include the identifiers of the songs listened to by the user in the corresponding time step and the singer, language and genre of each song.
Step S202, for the song listening information sequence sample of each sample user, dividing the sample into a historical song listening information sequence and a future song listening information sequence, thereby obtaining the historical song listening information sequence and the future song listening information sequence of each sample user.
In this step, each sample user's song listening information sequence sample is divided (or sliced) into a historical sequence and a future sequence. For a sample containing n time steps, the portion corresponding to time steps 1 to k may be taken as the historical song listening information sequence, and the portion corresponding to time steps k+1 to n as the future song listening information sequence; in this way the server 120 obtains the historical and future song listening information sequences of each sample user. The time steps in the historical song listening information sequence are correspondingly recorded as historical time steps, and those in the future song listening information sequence as future time steps.
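By way of illustration, the division described above can be sketched in Python; the record fields and the split point k used below are illustrative assumptions, not part of the present application:

```python
def split_sequence(seq, k):
    """Split one user's song listening sequence of n time steps into a
    historical part (time steps 1..k) and a future part (time steps k+1..n)."""
    history = seq[:k]   # historical song listening information sequence
    future = seq[k:]    # future song listening information sequence
    return history, future

# Hypothetical sample: 7 daily time steps, split at k = 5.
sample = [{"day": d, "song_id": f"s{d}"} for d in range(1, 8)]
hist, fut = split_sequence(sample, 5)
```

The same slicing is applied to every sample user's sequence sample, yielding one (history, future) pair per user.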
Step S203, the historical song listening information sequence of each sample user is input into a first feature extraction branch of a song recommendation model to be trained, and song feature representations of each historical time step are extracted by the first feature extraction branch according to the historical song listening information sequence, so that song feature representations of each historical time step corresponding to each sample user are obtained.
Step S204, inputting the future song listening information sequence of each sample user into a second feature extraction branch of the song recommendation model to be trained, and extracting song feature representations of all future time steps by the second feature extraction branch according to the future song listening information sequence to obtain song feature representations of all the future time steps corresponding to each sample user.
Steps S203 and S204 input the historical and future song listening information sequences of each sample user into the song recommendation model to be trained for training. Referring to fig. 3, in step S203 the historical song listening information sequence of each sample user is input into the first feature extraction branch of the song recommendation model to be trained, and the branch extracts the song feature representation of each historical time step according to the historical song listening information of each historical time step in the sequence. The first feature extraction branch may specifically adopt a Transformer structure, a model structure for sequence modeling; given a sample user's historical song listening information sequence as input, it produces the song feature representation of each historical time step, so that the branch can extract the representation of each historical time step from the historical song listening information of that time step and the preceding historical time steps, thereby accurately and stably mining the user's long-term song listening interests. In step S204, the future song listening information sequence of each sample user is input into the second feature extraction branch of the model, which extracts the song feature representation of each future time step according to the future song listening information of each future time step in the sequence; the second feature extraction branch may be implemented with a DNN (Deep Neural Network) structure or an MLP (multi-layer perceptron).
Thus, the song feature representations of each historical time step and of each future time step corresponding to each sample user are obtained in steps S203 and S204.
In step S205, the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained are trained based on the similarity of the song feature representation of each historical time step corresponding to each sample user and the song feature representation of each future time step corresponding to each sample user.
Specifically, after the history and future song listening information sequences of each sample user are input into a song recommendation model to be trained, song characteristic representations of each history time step and song characteristic representations of each future time step corresponding to each user are extracted by a first characteristic extraction branch and a second characteristic extraction branch of the song recommendation model to be trained correspondingly.
In step S206, a trained song recommendation model is obtained when the similarity of the song feature representation corresponding to the historical time step of the same sample user to the song feature representation of the future time step is greater than or equal to the first similarity threshold and the similarity of the song feature representation corresponding to the historical time step of a different sample user to the song feature representation of the future time step is less than the second similarity threshold.
In particular, the similarity between the song feature representation of each historical time step and the song feature representation of each future time step may be calculated; the two representations may correspond to the same sample user or to different sample users. In this step, the server 120 obtains a trained song recommendation model when the similarity between the song feature representation of the historical time step and that of the future time step of the same sample user is greater than or equal to the first similarity threshold, and the corresponding similarity for different sample users is less than the second similarity threshold. That is, during training the server 120 makes the foregoing similarity as large as possible for the same sample user (cosine distance as small as possible) and as small as possible for different sample users (cosine distance as large as possible). In a specific implementation, the server 120 may treat the song feature representation of a historical time step and that of a future time step of the same sample user as a positive sample pair, treat those of different sample users as negative sample pairs, and compute the InfoNCE loss so that the foregoing similarity is maximized for the same sample user and minimized for different sample users, thereby obtaining the trained song recommendation model.
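By way of illustration, the InfoNCE objective described above can be sketched with NumPy; the temperature value, batch layout, and toy two-dimensional representations are illustrative assumptions:

```python
import numpy as np

def info_nce_loss(hist_reps, fut_reps, temperature=0.1):
    """InfoNCE over a batch: for user i, (hist_i, fut_i) is the positive
    pair, while (hist_i, fut_j) with j != i are negative pairs."""
    # L2-normalise so dot products are cosine similarities
    h = hist_reps / np.linalg.norm(hist_reps, axis=1, keepdims=True)
    f = fut_reps / np.linalg.norm(fut_reps, axis=1, keepdims=True)
    logits = h @ f.T / temperature                 # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(probs)).mean())   # positives on the diagonal

hist = np.array([[1.0, 0.0], [0.0, 1.0]])
fut_good = hist.copy()        # each user's future matches their own history
fut_bad = hist[::-1].copy()   # futures swapped between the two users
loss_good = info_nce_loss(hist, fut_good)
loss_bad = info_nce_loss(hist, fut_bad)
```

Minimising this loss drives the same-user similarity up and the cross-user similarity down, which is the behaviour the thresholds in step S206 check for.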
The song recommendation model training method of this embodiment obtains a song listening information sequence sample of each sample user and, for each sample, divides it into a historical song listening information sequence and a future song listening information sequence. The historical song listening information sequence of each sample user is input into the first feature extraction branch of the song recommendation model to be trained, which extracts song feature representations of each historical time step; the future song listening information sequence of each sample user is input into the second feature extraction branch, which extracts song feature representations of each future time step. The first and second feature extraction branches of the model are trained based on the similarity between the song feature representations of the historical time steps and those of the future time steps corresponding to each sample user, and the trained song recommendation model is obtained when, for the same sample user, the similarity between the song feature representation of the historical time step and that of the future time step is greater than or equal to a first similarity threshold, and, for different sample users, the corresponding similarity is less than a second similarity threshold.
According to the above scheme, during training a sample user's song listening information sequence sample is divided into a historical song listening information sequence and a future song listening information sequence, from which the first and second feature extraction branches of the song recommendation model extract song feature representations of the historical time steps and of the future time steps, respectively. The two branches are trained so that the similarity between the song feature representations of the historical time steps and those of the future time steps of the same sample user is as large as possible, while that of different sample users is as small as possible, so that the trained song recommendation model can accurately and stably capture users' long-term song listening interests, improving how well the songs recommended by the model suit each user.
In some embodiments, as shown in fig. 4, the step S202 of dividing the listening to song information sequence samples of each sample user into a historical listening to song information sequence and a future listening to song information sequence may include:
step S401, for each sample user' S song listening information sequence sample, divide the song listening information sequence sample into a historical song listening information sequence and a preliminary future song listening information sequence.
Step S402, determining a future song-listening information sequence according to the song-listening information of the forward song-listening behavior of the sample user in the preliminary future song-listening information sequence.
In this embodiment, the future song listening information contained in the divided future song listening information sequence is forward song listening information, i.e., song listening information the user is genuinely interested in; this ensures that model training is carried out accurately and that users' long-term song listening interests can be captured accurately. Specifically, in step S401, for each sample user's song listening information sequence sample, the sample is first divided by time-step order into a historical song listening information sequence and a future song listening information sequence, the latter being recorded as the preliminary future song listening information sequence. Then, in step S402, the song listening information corresponding to the sample user's forward song listening behavior is screened out of the preliminary future song listening information sequence, and the future song listening information sequence is obtained from the screened-out song listening information. Whether a listen constitutes forward song listening behavior may be determined from the sample user's play duration for the song; for example, a song played to completion may be judged to reflect forward song listening behavior.
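By way of illustration, the screening of forward song listening behavior by play duration can be sketched as follows; the field names and the completion threshold are illustrative assumptions:

```python
def filter_forward_listens(preliminary_future_seq, min_play_ratio=0.9):
    """Keep only the records in the preliminary future sequence that reflect
    forward (positive) song listening behaviour, judged here by the fraction
    of the song actually played. The 0.9 cut-off is an assumed heuristic."""
    return [rec for rec in preliminary_future_seq
            if rec["played_sec"] / rec["duration_sec"] >= min_play_ratio]

records = [
    {"song_id": "a", "played_sec": 200, "duration_sec": 200},  # full play
    {"song_id": "b", "played_sec": 15,  "duration_sec": 180},  # skipped early
]
forward = filter_forward_listens(records)  # only the fully played song remains
```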
In some embodiments, training the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained based on the similarity of the song feature representation of each historical time step corresponding to each sample user and the song feature representation of each future time step corresponding to each sample user in step S205 may include:
extracting a plurality of song characteristic representations from song characteristic representations of each historical time step corresponding to each sample user as a first type of song characteristic representation set, and extracting a plurality of song characteristic representations from song characteristic representations of each future time step corresponding to each sample user as a second type of song characteristic representation set; and training a first feature extraction branch and a second feature extraction branch of the song recommendation model to be trained according to the similarity of each song feature representation in the first type song feature representation set and each song feature representation in the second type song feature representation set.
In this embodiment, after obtaining the song feature representation of each historical time step and the song feature representation of each future time step corresponding to each sample user, the song feature representation of each historical time step and the song feature representation of each future time step may be sampled, a plurality of song feature representations are extracted from the song feature representations of each historical time step corresponding to each sample user as a first type song feature representation set, a plurality of song feature representations are extracted from the song feature representations of each future time step corresponding to each sample user as a second type song feature representation set, and then the first and second feature extraction branches are trained according to the similarity between each song feature representation in the first type song feature representation set and each song feature representation in the second type song feature representation set.
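By way of illustration, forming the first type and second type song feature representation sets by sampling can be sketched as follows; the sample sizes, the fixed seed, and the string placeholders standing in for feature representations are illustrative assumptions:

```python
import random

def build_representation_sets(hist_reps, fut_reps, n_samples, seed=0):
    """Sample n_samples representations from the historical side (first type
    set) and from the future side (second type set) for one sample user."""
    rng = random.Random(seed)
    first_type = rng.sample(hist_reps, n_samples)
    second_type = rng.sample(fut_reps, n_samples)
    return first_type, second_type

hist_reps = [f"h{t}" for t in range(1, 6)]  # reps for historical steps 1..5
fut_reps = [f"f{t}" for t in range(6, 9)]   # reps for future steps 6..8
first_set, second_set = build_representation_sets(hist_reps, fut_reps, 2)
```

Training then compares every member of the first type set against every member of the second type set, which is what gives the sample-enhancement effect described below.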
Based on this, in one embodiment, when the similarity of the song feature representation corresponding to the historical time step of the same sample user to the song feature representation of the future time step is greater than or equal to the first similarity threshold and the similarity of the song feature representation corresponding to the historical time step of a different sample user to the song feature representation of the future time step is less than the second similarity threshold, the trained song recommendation model is obtained in step S206, which specifically includes:
and obtaining a trained song recommendation model when the similarity of the song feature representations in the first type of song feature representation set and the song feature representations in the second type of song feature representation set corresponding to the same sample user is greater than or equal to a first similarity threshold, and the similarity of the song feature representations in the first type of song feature representation set and the song feature representations in the second type of song feature representation set corresponding to different sample users is less than a second similarity threshold.
According to the above scheme, model training is conducted using the first type and second type song feature representation sets of each sample user, based on the similarity between each song feature representation in the first type set and each song feature representation in the second type set. Accordingly, the trained song recommendation model is obtained when, for the same sample user, the similarity between song feature representations in the first type set and those in the second type set is greater than or equal to the first similarity threshold, and, for different sample users, the corresponding similarity is less than the second similarity threshold.
The model training mode of this embodiment can act as sample enhancement, allowing users' long-term song listening interests to be captured stably during training. In application, even for users with short song listening information sequences, or when song recommendations must be updated at short time granularity, the model can still mine users' stable long-term song listening interests with high quality; it weakens the influence of occasional off-interest behavior on the model, mitigates the effect of users' short-term song listening interest drift, and thereby mines users' long-term song listening interests more accurately.
In some embodiments, the method of the present application may further comprise the steps of:
For each sample user, applying different mask processing to the sample user's historical song listening information sequence to obtain at least two groups of mask-processed historical song listening information sequences; inputting the at least two groups of mask-processed historical song listening information sequences of each sample user into the first feature extraction branch of the song recommendation model to be trained, the first feature extraction branch outputting song feature representations of at least two corresponding groups of historical time steps from the at least two groups of mask-processed sequences, thereby obtaining song feature representations of at least two groups of historical time steps corresponding to each sample user; and, for each sample user, obtaining song feature representations of at least two last historical time steps from the corresponding at least two groups of song feature representations of historical time steps.
This embodiment mainly constructs an auxiliary portion for training the song recommendation model: mask processing is applied to each sample user's historical song listening information sequence to obtain different mask-processed historical song listening information sequences for the same sample user, forcing the first feature extraction branch to learn the sample user's representation from multiple angles based on these sequences.
Specifically, in connection with fig. 5, in this embodiment the auxiliary portion is processed by the first feature extraction branch; that is, during training the first feature extraction branch processes not only the historical song listening information sequence but also the at least two groups of mask-processed historical song listening information sequences. First, for each sample user, different mask processing is applied to the user's historical song listening information sequence, e.g., masking the sequence at a certain ratio, to obtain at least two groups of mask-processed historical song listening information sequences. Taking two groups per sample user as an example, mask-processed historical song listening information sequence 1 and mask-processed historical song listening information sequence 2 are obtained for each sample user and input into the first feature extraction branch of the song recommendation model to be trained; the first feature extraction branch outputs song feature representations 1 of the historical time steps from mask-processed sequence 1 and song feature representations 2 of the historical time steps from mask-processed sequence 2, thereby obtaining at least two groups of song feature representations of the historical time steps for each sample user. Next, taking the case of two groups of song feature representations per sample user (song feature representation 1 and song feature representation 2 of each historical time step) as an example, for each sample user the two song feature representations corresponding to the last historical time step are taken from the two groups; as described above, the historical time steps may be 1 to k, so the two song feature representations corresponding to the last historical time step k may be obtained.
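By way of illustration, producing two differently mask-processed copies of one historical sequence can be sketched as follows; the mask ratio, the "[MASK]" placeholder, and the random seed are illustrative assumptions:

```python
import random

def masked_views(history_seq, mask_ratio=0.3, n_views=2, seed=42):
    """Produce n_views differently-masked copies of one user's historical
    song listening sequence; each view masks a random subset of positions."""
    rng = random.Random(seed)
    views = []
    for _ in range(n_views):
        view = list(history_seq)
        n_mask = max(1, int(mask_ratio * len(view)))
        for idx in rng.sample(range(len(view)), n_mask):
            view[idx] = "[MASK]"   # assumed placeholder token
        views.append(view)
    return views

view1, view2 = masked_views(["s1", "s2", "s3", "s4", "s5"])
```

Both views are then fed through the same first feature extraction branch, yielding the two groups of song feature representations described above.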
Based on this, the training of the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained based on the similarity of the song feature representation of each historical time step corresponding to each sample user and the song feature representation of each future time step corresponding to each sample user in step S205 further includes:
the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained are trained based on the similarity of the song feature representations of each historical time step corresponding to each sample user to the song feature representations of each future time step corresponding to each sample user, and the similarity between the song feature representations of at least two last historical time steps.
In this embodiment, when training the first and second feature extraction branches of the model, the training is performed in combination with the similarity between the song feature representations of the at least two last historical time steps described above, in addition to the similarity between the song feature representations of each historical time step corresponding to each sample user and the song feature representations of each future time step corresponding to each sample user.
Based on this, in some embodiments, when the similarity of the song feature representation corresponding to the historical time step of the same sample user to the song feature representation of the future time step is greater than or equal to the first similarity threshold and the similarity of the song feature representation corresponding to the historical time step of a different sample user to the song feature representation of the future time step is less than the second similarity threshold, the trained song recommendation model is obtained in step S206, further comprising:
A trained song recommendation model is obtained when the similarity of song feature representations corresponding to historical time steps of the same sample user to song feature representations of future time steps is greater than or equal to a first similarity threshold, and the similarity of song feature representations corresponding to historical time steps of different sample users to song feature representations of future time steps is less than a second similarity threshold, and the similarity between song feature representations of at least two last historical time steps of the same sample user is greater than or equal to a third similarity threshold.
In this embodiment, when training with the similarity between the song feature representations of the at least two last historical time steps, the similarity between the song feature representations of the at least two last historical time steps of the same sample user is required to be greater than or equal to a third similarity threshold. In a specific implementation, for the auxiliary portion, the song feature representations of the at least two last historical time steps of the same sample user can be pulled together through an MSE loss, so that their similarity is greater than or equal to the third similarity threshold, thereby forcing the model to learn the user's representation from multiple angles.
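By way of illustration, the MSE loss that pulls together the two last-historical-time-step representations of the same sample user can be sketched as follows; the toy three-dimensional representations are illustrative assumptions:

```python
import numpy as np

def mse_alignment_loss(rep_view1, rep_view2):
    """Mean squared error between the last-historical-step representations
    produced from two masked views of the same user; minimising it pulls
    the two views together."""
    diff = np.asarray(rep_view1) - np.asarray(rep_view2)
    return float(np.mean(diff ** 2))

same = mse_alignment_loss([0.2, 0.5, 0.1], [0.2, 0.5, 0.1])   # identical views
apart = mse_alignment_loss([0.2, 0.5, 0.1], [1.2, 1.5, 1.1])  # divergent views
```

In practice this auxiliary loss would be added to the InfoNCE loss of the main training objective, e.g. as a weighted sum.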
In some embodiments, the method of the present application may further include the following step: acquiring a favorite song sequence of each sample user. The server 120 may additionally obtain each sample user's favorite song sequence; specifically, it may count the songs each sample user favors based on information such as play counts, obtaining a plurality of favorite songs per sample user and hence a favorite song sequence for each sample user. Based on this, the extraction by the first feature extraction branch of song feature representations of each historical time step according to the historical song listening information sequence in step S203 specifically includes: the first feature extraction branch uses the historical song listening information sequence together with the favorite song sequence as the query to weight the historical song listening information sequence, thereby obtaining song feature representations of each historical time step.
In this embodiment, specifically, the input of the song recommendation model to be trained may include the historical song listening information sequence, the future song listening information sequence, and the favorite song sequence. The historical and future song listening information sequences may each include the identifier of each song and the corresponding timestamp, and the favorite song sequence may include the songs the sample user has favored over time. These sequences may then be encoded and embedded, mapping high-dimensional sparse features to low-dimensional dense feature vectors; the song identifiers in the historical and future song listening information sequences may share an embedding table, which effectively reduces the number of model parameters. Then, for each song in the historical song listening information sequence, its identifier and timestamp are concatenated and input into the first feature extraction branch, which may adopt a Transformer structure. When modeling with the self-attention module in the Transformer, the concatenation of the pooled song identifiers of the historical song listening information sequence and the favorite song sequence is used as the query, and the song identifiers in the historical song listening information sequence are weighted accordingly to obtain the song feature representations of each historical time step.
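By way of illustration, weighting the embedded historical sequence with a query derived from the favorite songs can be sketched as follows; this is a simplified single-head dot-product stand-in for the self-attention described above, and reducing the query to mean-pooled favorite-song embeddings is an illustrative simplification:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def favourite_weighted_history(hist_emb, fav_emb):
    """Weight each historical song embedding by an attention score computed
    against a query built from the pooled favorite-song embeddings."""
    query = fav_emb.mean(axis=0)         # pooled favorites stand in for the query
    weights = softmax(hist_emb @ query)  # one attention weight per history step
    return weights[:, None] * hist_emb   # weighted song feature representations

hist_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # 3 historical steps
fav_emb = np.array([[1.0, 0.0]])      # favorites lean toward the first dimension
weighted = favourite_weighted_history(hist_emb, fav_emb)
```

Steps whose embeddings align with the favorite-song query receive larger weights, which is how the branch emphasises history consistent with the user's long-term taste.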
The scheme of the embodiment can enable the first feature extraction branch in the song recommendation model to be combined with the favorite song sequence to accurately learn the feature representation of the long-term song listening interest of the user, and further strengthen the robustness of the model.
In one embodiment, as shown in fig. 6, a song recommendation method is provided, which may be applied to the terminal 110 or the server 120 of fig. 1, and may include the steps of:
step S601, a historical song listening information sequence of a target user is obtained.
In this step, a user who starts the music application on the terminal 110 may be determined as the target user, and the target user's historical song listening information sequence may be obtained; the historical song listening information sequence may be the song listening information sequence over a recent period of time (such as one week or one month).
Step S602, inputting the historical song listening information sequence of the target user into a trained song recommendation model, and extracting song characteristic representations of each historical time step corresponding to the target user by a first characteristic extraction branch in the trained song recommendation model according to the historical song listening information sequence of the target user.
In this step, the trained song recommendation model may be a song recommendation model obtained by training the server 120 according to the song recommendation model training method provided in any one of the embodiments above, the historical song listening information sequence of the target user may be input into the trained song recommendation model, and the first feature extraction branch in the model extracts the song feature representation of each historical time step corresponding to the target user according to the historical song listening information sequence of the target user. In the model application stage, the input of the model can be just the historical song listening information sequence of the user, and the corresponding song characteristic representation of each historical time step is obtained through the first characteristic extraction branch in the song recommendation model.
Step S603, obtaining respective corresponding song characteristic representations of all songs in the song library.
In this step, the song feature representation corresponding to each song in the song library of the music application may be obtained. Specifically, after the server 120 trains the song recommendation model, the song information of each song in the library (the song's identifier and the corresponding singer, language, genre, etc.) may be input into the song recommendation model, and the second feature extraction branch of the model obtains the song feature representation corresponding to each song according to its song information.
In step S604, songs recommended to the target user are determined based on the similarity of the song feature representation of each historical time step corresponding to the target user and the song feature representation of each song.
In this step, the similarity between the song feature representation of each historical time step corresponding to the target user and the song feature representation of each song can be calculated, and a certain number (e.g., 10) of songs selected in descending order of similarity as the songs recommended to the target user. These songs can be displayed in the song recommendation area of the music application's home page when the target user starts the application, providing recommended songs accurately suited to the target user and avoiding the frequent song-searching operations caused in conventional techniques by recommendations that do not suit the user.
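By way of illustration, ranking library songs by similarity and taking the top-k can be sketched as follows; the two-dimensional representations and the genre-named song identifiers are illustrative assumptions:

```python
import numpy as np

def recommend_top_k(user_rep, library_reps, song_ids, k=10):
    """Rank library songs by cosine similarity to the user's representation
    and return the k most similar song ids."""
    u = user_rep / np.linalg.norm(user_rep)
    lib = library_reps / np.linalg.norm(library_reps, axis=1, keepdims=True)
    sims = lib @ u                   # cosine similarity per library song
    order = np.argsort(-sims)[:k]    # indices in descending similarity
    return [song_ids[i] for i in order]

library = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
top = recommend_top_k(np.array([1.0, 0.1]), library, ["rock", "jazz", "pop"], k=2)
```

In practice the library representations would be precomputed offline by the second feature extraction branch, so that serving only requires the similarity ranking above.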
The scheme of this embodiment is beneficial in the recommendation system of a music scene: it can accurately and stably capture the user's long-term song listening interest and thereby provide a song recommendation service accurately adapted to the user.
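The training criterion described in the claims (song feature representations of the same sample user's historical and future time steps should be similar, those of different sample users dissimilar) resembles a contrastive objective. A minimal InfoNCE-style sketch over a batch of per-user (history, future) embedding pairs follows; the exact loss used by the application is not specified here, and the temperature value is an assumption.

```python
import numpy as np

def contrastive_loss(hist, fut, temperature=0.1):
    """hist, fut: (batch, dim) L2-normalized per-user embeddings.
    Diagonal entries of the similarity matrix are the positive (same-user) pairs;
    off-diagonal entries are negatives drawn from other users in the batch."""
    logits = hist @ fut.T / temperature                              # (batch, batch)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                              # cross-entropy on positives

rng = np.random.default_rng(2)
h = rng.normal(size=(4, 16)); h /= np.linalg.norm(h, axis=1, keepdims=True)
f = rng.normal(size=(4, 16)); f /= np.linalg.norm(f, axis=1, keepdims=True)
loss = contrastive_loss(h, f)
print(loss > 0)  # True
```

Minimizing such a loss pushes same-user history/future similarities above different-user similarities, which is the condition the claims state for obtaining the trained model.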
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose order of execution is not necessarily sequential; they may be performed in turn, or alternately, with at least some of the other steps or with sub-steps or stages of the other steps.
In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 7 (a). The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and the computer programs in the non-volatile storage medium. The database of the computer device is used for storing data such as song listening information sequence samples. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a song recommendation model training method and a song recommendation method.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 7 (b). The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface, the display unit, and the input device are connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through Wi-Fi, a mobile cellular network, near field communication (NFC), or other technologies. The computer program, when executed by the processor, implements a song recommendation method. The display unit of the computer device is used for forming a visual picture, and may be a display screen, a projection device, or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, a key, a trackball, or a touch pad provided on the housing of the computer device, or an external keyboard, touch pad, or mouse.
It will be appreciated by those skilled in the art that the structures shown in fig. 7 (a) and 7 (b) are merely block diagrams of partial structures related to the present application and do not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than those shown in the figures, may combine some components, or may have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program; the processor, when executing the computer program, implements the steps of the method embodiments described above.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may perform the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory can include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments merely represent several implementations of the present application; their descriptions are specific and detailed, but should not therefore be construed as limiting the scope of the application. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method for training a song recommendation model, the method comprising:
acquiring respective song listening information sequence samples of each sample user;
for the song listening information sequence sample of each sample user, dividing the song listening information sequence sample into a historical song listening information sequence and a future song listening information sequence, to obtain the historical song listening information sequence and the future song listening information sequence of each sample user;
inputting the historical song listening information sequences of each sample user into a first feature extraction branch of a song recommendation model to be trained, and extracting song feature representations of each historical time step by the first feature extraction branch according to the historical song listening information sequences to obtain song feature representations of each historical time step corresponding to each sample user;
inputting the future song listening information sequences of each sample user into a second feature extraction branch of a song recommendation model to be trained, and extracting song feature representations of all future time steps by the second feature extraction branch according to the future song listening information sequences to obtain song feature representations of all the future time steps corresponding to each sample user;
training a first feature extraction branch and a second feature extraction branch of the song recommendation model to be trained based on similarity of song feature representations of each historical time step corresponding to each sample user and song feature representations of each future time step corresponding to each sample user;
obtaining a trained song recommendation model when the similarity between song feature representations of historical time steps and song feature representations of future time steps corresponding to the same sample user is greater than or equal to a first similarity threshold, and the similarity between song feature representations of historical time steps and song feature representations of future time steps corresponding to different sample users is less than a second similarity threshold.
2. The method of claim 1, wherein training the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained based on the similarity of song feature representations of historical time steps corresponding to each sample user to song feature representations of future time steps corresponding to each sample user comprises:
extracting a plurality of song characteristic representations from the song characteristic representations of each historical time step corresponding to each sample user as a first type song characteristic representation set, and extracting a plurality of song characteristic representations from the song characteristic representations of each future time step corresponding to each sample user as a second type song characteristic representation set;
and training a first feature extraction branch and a second feature extraction branch of the song recommendation model to be trained according to the similarity of each song feature representation in the first type song feature representation set and each song feature representation in the second type song feature representation set.
3. The method of claim 2, wherein the obtaining a trained song recommendation model when the similarity of the song feature representation corresponding to the historical time step of the same sample user to the song feature representation of the future time step is greater than or equal to a first similarity threshold and the similarity of the song feature representation corresponding to the historical time step of a different sample user to the song feature representation of the future time step is less than a second similarity threshold comprises:
obtaining a trained song recommendation model when the similarity between the song feature representations in the first type song feature representation set and the song feature representations in the second type song feature representation set corresponding to the same sample user is greater than or equal to a first similarity threshold, and the similarity between the song feature representations in the first type song feature representation set and the song feature representations in the second type song feature representation set corresponding to different sample users is less than a second similarity threshold.
4. The method according to claim 1, wherein the method further comprises:
for each sample user, performing different mask processing on the historical song listening information sequence of the sample user to obtain at least two groups of masked historical song listening information sequences;
inputting the at least two groups of masked historical song listening information sequences of each sample user into the first feature extraction branch of the song recommendation model to be trained, the first feature extraction branch respectively outputting, according to the at least two groups of masked historical song listening information sequences, at least two groups of song feature representations of each historical time step, to obtain at least two groups of song feature representations of each historical time step corresponding to each sample user;
for each sample user, obtaining song feature representations of at least two last historical time steps according to the corresponding at least two groups of song feature representations of each historical time step;
the training the first feature extraction branch and the second feature extraction branch of the song recommendation model to be trained based on the similarity of song feature representations of each historical time step corresponding to each sample user and song feature representations of each future time step corresponding to each sample user comprises the following steps:
and training a first feature extraction branch and a second feature extraction branch of the song recommendation model to be trained based on the similarity of song feature representations of each historical time step corresponding to each sample user and song feature representations of each future time step corresponding to each sample user and the similarity between song feature representations of the at least two last historical time steps.
5. The method of claim 4, wherein the obtaining a trained song recommendation model when the similarity of the song feature representation corresponding to the historical time step of the same sample user to the song feature representation of the future time step is greater than or equal to a first similarity threshold and the similarity of the song feature representation corresponding to the historical time step of a different sample user to the song feature representation of the future time step is less than a second similarity threshold comprises:
obtaining a trained song recommendation model when the similarity between song feature representations of historical time steps and song feature representations of future time steps corresponding to the same sample user is greater than or equal to a first similarity threshold, the similarity between song feature representations of historical time steps and song feature representations of future time steps corresponding to different sample users is less than a second similarity threshold, and the similarity between the song feature representations of the at least two last historical time steps of the same sample user is greater than or equal to a third similarity threshold.
6. The method according to claim 1, wherein the method further comprises:
acquiring the favorite song sequence of each sample user;
the step of extracting song feature representations of each historical time step by the first feature extraction branch according to the historical song listening information sequence comprises the following steps:
and the first feature extraction branch takes the historical song listening information sequence and the favorite song sequence as query items to carry out weighting processing on the historical song listening information sequence, so as to obtain song feature representation of each historical time step.
7. The method according to any one of claims 1 to 6, wherein the dividing the sample of the listening information sequence into a historical listening information sequence and a future listening information sequence for each sample user comprises:
dividing the song listening information sequence sample of each sample user into a historical song listening information sequence and a preliminary future song listening information sequence;
and determining the future song-listening information sequence according to the song-listening information of the forward song-listening behavior of the sample user in the preliminary future song-listening information sequence.
8. A song recommendation method, the method comprising:
acquiring a historical song listening information sequence of a target user;
inputting the historical song listening information sequence of the target user into a trained song recommendation model, and extracting song characteristic representations of each historical time step corresponding to the target user according to the historical song listening information sequence of the target user by a first characteristic extraction branch in the trained song recommendation model;
obtaining the song feature representation corresponding to each song in a song library, the song feature representation corresponding to each song being extracted by a second feature extraction branch in the trained song recommendation model according to song information of the song;
determining songs recommended to the target user based on the similarity between the song feature representation of each historical time step corresponding to the target user and the song feature representation corresponding to each song;
Wherein the trained song recommendation model is trained according to the method of any one of claims 1 to 7.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 or of claim 8 when the computer program is executed.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 7 or of claim 8.
CN202311299277.4A 2023-10-09 2023-10-09 Song recommendation model training method, song recommendation method, device and storage medium Pending CN117290540A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311299277.4A CN117290540A (en) 2023-10-09 2023-10-09 Song recommendation model training method, song recommendation method, device and storage medium


Publications (1)

Publication Number Publication Date
CN117290540A true CN117290540A (en) 2023-12-26

Family

ID=89258380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311299277.4A Pending CN117290540A (en) 2023-10-09 2023-10-09 Song recommendation model training method, song recommendation method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117290540A (en)

Similar Documents

Publication Publication Date Title
US9348898B2 (en) Recommendation system with dual collaborative filter usage matrix
KR20170049380A (en) Tag processing method and device
CN108475256A (en) Feature insertion is generated from homologous factors
CN115083435B (en) Audio data processing method and device, computer equipment and storage medium
CN111368141A (en) Video tag expansion method and device, computer equipment and storage medium
CN110855487B (en) Network user similarity management method, device and storage medium
CN116468543A (en) Credit risk assessment method, device, equipment and medium based on federal learning
CN117290540A (en) Song recommendation model training method, song recommendation method, device and storage medium
CN114357244A (en) Video processing method and device and computer equipment
US11797892B1 (en) Systems and methods for customizing user interfaces using artificial intelligence
CN117113302B (en) Text watermark generation method and text verification method
CN115187822B (en) Face image dataset analysis method, live face image processing method and live face image processing device
CN114785771B (en) Automatic driving data uploading method and device, computer equipment and storage medium
CN116597293A (en) Multi-mode scene recognition method, device, computer equipment and storage medium
CN114168787A (en) Music recommendation method and device, computer equipment and storage medium
CN117690449A (en) Speech extraction method, device, computer equipment and storage medium
CN115834953A (en) Special effect resource rendering method and device, live broadcast system, equipment and storage medium
US20140082023A1 (en) Associating an identity to a creator of a set of visual files
CN117012171A (en) Music file generation method, device, equipment and medium
CN114566160A (en) Voice processing method and device, computer equipment and storage medium
CN116910241A (en) Information classification method, apparatus, computer device and storage medium
CN116894123A (en) User characteristic acquisition method for content search, music search method and equipment
CN115203476A (en) Information retrieval method, model training method, device, equipment and storage medium
CN118069044A (en) Chip data storage method, device, equipment, medium and product
CN115393941A (en) Training method of skin region extraction model, live image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination