CN110717064A - Personalized audio play list generation method and device and readable storage medium - Google Patents


Info

Publication number
CN110717064A
CN110717064A
Authority
CN
China
Prior art keywords
user
audio content
audio
generating
personalized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910765781.6A
Other languages
Chinese (zh)
Other versions
CN110717064B (en)
Inventor
朱玉婷
杜睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Li Zhi Network Technology Co Ltd
Original Assignee
Guangzhou Li Zhi Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Li Zhi Network Technology Co Ltd filed Critical Guangzhou Li Zhi Network Technology Co Ltd
Priority to CN201910765781.6A priority Critical patent/CN110717064B/en
Publication of CN110717064A publication Critical patent/CN110717064A/en
Application granted granted Critical
Publication of CN110717064B publication Critical patent/CN110717064B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval of audio data
    • G06F 16/63 Querying
    • G06F 16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/65 Clustering; Classification
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683 Retrieval characterised by using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method and a device for generating a personalized audio playlist, and a readable storage medium, wherein the method comprises the following steps: constructing a user representation system according to the user information; constructing an audio content representation system according to the audio content information; and training an audio playlist generation model according to the user representation system and the audio content representation system to generate an audio playlist. In use, the embodiment of the invention can automatically generate a personalized playlist for each listener according to that listener's interest preferences, so as to meet the listener's needs when listening to audio playlists.

Description

Personalized audio play list generation method and device and readable storage medium
Technical Field
The invention relates to the technical field of intelligent audio data processing, and in particular to a personalized audio playlist generation method and device, and a readable storage medium.
Background
Audio is one of the main ways humans acquire information, and listening to audio content through a playlist is one of the main forms in which users listen to audio.
Traditionally, playlists are generated manually, that is, a person compiles a playlist according to personal preference.
The disadvantages of this way of generating playlists are:
1: because the playlist is compiled according to one person's preference, the most common situation is that a listening user likes only part of its content, and it is difficult to satisfy all of the user's needs;
2: generating playlists requires a large amount of manual work, making efficiency and quality difficult to guarantee;
3: because playlist generation depends on manual work, producing a single playlist usually takes a long time.
It is therefore highly desirable to design a system that can automatically generate a personalized playlist for each listener according to that listener's interest preferences, so as to meet the listener's needs when listening to audio playlists.
Disclosure of Invention
The embodiment of the invention aims to provide a personalized audio playlist generation method and device, and a readable storage medium. In use, the embodiment of the invention can automatically generate a personalized playlist for each listener according to that listener's interest preferences, so as to meet the listener's needs when listening to audio playlists.
In order to solve the technical problem, the embodiment of the invention adopts the following technical scheme:
provided is a personalized audio playlist generation method, comprising the following steps:
constructing a user representation system according to the user information;
constructing an audio content representation system according to the audio content information;
and training an audio playlist generation model according to the user representation system and the audio content representation system to generate an audio playlist.
Optionally, constructing the user representation system according to user information includes: constructing the user representation system through data mining and data statistical algorithms, according to the collected user attribute information and user interest preference information.
Optionally, constructing the audio content representation system according to audio content information includes: constructing audio content feature engineering, collecting users' feedback information on the audio content, and constructing the audio content representation system through data mining and data statistics.
Optionally, the feedback information includes active feedback information and/or passive feedback information.
Optionally, the audio playlist generation model is trained according to the user representation system and the audio content representation system, and the audio playlist is generated as follows:
generating a user feature vector according to the user representation system;
generating an audio content feature vector according to the audio content representation system;
generating a set of similar users for each user;
generating a set of similar audio content for each audio content;
generating a user-personalized audio content set for each user;
generating a prediction personalized audio content recall set according to the user personalized audio content set;
and classifying the audio content in the predicted personalized audio content recall set according to the topic dimension, and generating a personalized audio playlist for each topic.
Optionally, the process of generating the predicted personalized audio content recall set according to the user personalized audio content set includes:
for each audio content in the user personalized audio content set, selecting n audio contents from that audio content's similar-audio-content set and placing them into the predicted personalized audio content recall set;
and selecting n similar users from the user's similar-user set, and placing the audio content in each of those n similar users' personalized audio content sets into the user's predicted personalized audio content recall set.
Optionally, the process of generating the user feature vector according to the user representation system includes:
concatenating, in a fixed order, four categories of features (user age group, user gender, user interest preferences, and user income level) and reducing the dimension, to generate the user short-term feature vector;
concatenating, in a fixed order, five categories of features (users' positive comments, probability distribution of listened audio topics, probability distribution of listened audio categories, probability distribution of listened audio authors, and probability statistics of audio production time periods) and reducing the dimension, to generate the user long-term feature vector;
and generating a user graph embedding with a graph embedding algorithm, based on the overlap in audio content listened to between users.
Optionally, the process of generating the audio content feature vector according to the audio content representation system includes:
concatenating, in a fixed order, and reducing the dimension of: graph embeddings of users giving positive feedback on the audio content, graph embeddings of users giving negative feedback on the audio content, users' positive feedback frequency, users' negative feedback frequency, one-hot codes of the audio content classification, one-hot codes of the audio content labels, sound features, and text features, to generate the audio content feature vector.
An embodiment of the present invention further provides a device for generating a personalized audio playlist, including:
the user representation system generating module is used for constructing a user representation system according to the user information;
the audio content representation system generation module is used for constructing an audio content representation system according to the audio content information;
and the audio playlist generation model training module is used for training an audio playlist generation model according to the user representation system and the audio content representation system to generate an audio playlist.
The embodiment of the invention also provides a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the computer program realizes the steps of the personalized audio playlist generating method.
The embodiment of the invention provides a method and a device for generating a personalized audio playlist, and a readable storage medium. First, a user representation system is constructed so that each user is automatically described in fine detail. Second, a content representation system is constructed, using data mining and statistical analysis techniques to automatically describe every audio item in fine detail. Finally, playlists are constructed automatically and intelligently around each user's interest points through artificial-intelligence algorithms. The method can automatically generate a personalized playlist for each listener according to that listener's interest preferences, thereby meeting the listener's needs when listening to audio playlists, improving overall user satisfaction, and improving playlist production efficiency.
Drawings
In order to more clearly illustrate the technical solution in the present embodiment, the drawings used in the prior art and the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for a person skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for generating a personalized audio playlist according to an embodiment of the present invention.
Fig. 2 is a schematic algorithm flow diagram of a personalized audio playlist generation method according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a personalized audio playlist generating device according to an embodiment of the present invention.
Detailed Description
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.
The following detailed description of embodiments of the invention refers to the accompanying drawings and examples.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for generating a personalized audio playlist according to an embodiment of the present invention.
The method comprises the following steps:
s11: constructing a user representation system according to the user information;
specifically, a user portrait system is constructed according to user information, and content information is described in an all-around manner. The method comprises the following steps: and collecting user information, and constructing a user portrait system in a data mining and data statistics mode. The user representation system is divided into two parts, namely a user long-term characteristic and a user short-term characteristic, wherein the user long-term characteristic is used for storing persistent user attributes which cannot be changed due to time, space and the like; the user short-term characteristics are used to preserve the interest preferences developed by the user in a short time.
The user representation system includes, but is not limited to, the following dimensional information:
(1): the age of the user;
(2): the gender of the user;
(3): the user listens to the audio record historically;
(4): a user listens to a playlist record historically;
(5): the user's listening behaviors on individual audio items, including click behaviors, comment behaviors, like behaviors, and listening duration at the single-audio dimension;
(6): the user's listening behaviors on existing playlists, including click behaviors, comment behaviors, like behaviors, and listening duration at the playlist dimension;
(7): features of the audio the user listens to, including but not limited to audio topic, audio category, audio author, total audio duration, and audio production time;
(8): features of the playlists the user listens to, including but not limited to playlist category, playlist topic, playlist author, total playlist duration, and playlist production time.
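As a minimal sketch of the two-part user profile described above (all field names and values here are illustrative assumptions, not terms defined by the patent):

```python
# A toy user profile record split into long-term (persistent) and
# short-term (recent interest) features, as described in S11.
user_profile = {
    "long_term": {                 # persistent attributes
        "age_group": "25-34",
        "gender": "F",
        "income_level": "mid",
    },
    "short_term": {                # recently formed interest preferences
        "listened_audio_topics": {"talk": 0.6, "music": 0.4},
        "listened_playlists": ["pl_001", "pl_007"],
        "likes": 12,
        "avg_listen_seconds": 240.0,
    },
}

def top_topics(profile):
    """Return (topic, probability) pairs sorted by probability, descending."""
    topics = profile["short_term"]["listened_audio_topics"]
    return sorted(topics.items(), key=lambda kv: -kv[1])

print(top_topics(user_profile))  # [('talk', 0.6), ('music', 0.4)]
```

In a real system these dimensions would be populated from logged listening events rather than hard-coded.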
S12: constructing an audio content portrait system according to the audio content information;
specifically, an audio content representation system is constructed according to the audio content information, and the audio content information is described in an all-around mode. The method comprises the following steps: and constructing audio content-related feature engineering, and collecting listening feedback of each audio of each user, wherein the feedback information comprises active feedback and passive feedback.
The active feedback includes: positive or encouraging words in the user's comments, and likes;
the passive feedback includes: the user's click behavior and the user's listening duration;
and the audio content representation system is then constructed through data mining and statistical algorithms.
It should be noted that the audio content representation system includes, but is not limited to, the following dimensional information:
(1): the category to which the audio content belongs;
(2): individual users' feedback on the audio content (including positive feedback and negative feedback);
(3): a production time of the audio content;
(4): a producer of audio content;
(5): a content tag of the audio content.
S13: and training an audio play list generating model according to the user portrayal system and the audio content portrayal system to generate an audio play list.
Specifically, please refer to fig. 2, where fig. 2 is a schematic diagram of the algorithm flow of a personalized audio playlist generation method according to an embodiment of the present invention. Training the audio playlist generation model according to the user representation system and the audio content representation system, and generating the audio playlist, comprises the following steps:
S131: generating a user feature vector according to the user representation system;
the generation of the user feature vector by the user profile system is to generate the user feature vector from the user profile for each user.
Specifically, the process of generating the user feature vector according to the user portrait system is as follows:
the user feature vector is formed by splicing a user short-term feature vector, a user long-term feature vector and a user map graph according to a fixed sequence. The method comprises the specific steps of carrying out,
s1311: the user short-term feature vector is generated according to four categories of features of the user age, the user gender, the user interest preference and the user income level. Each category feature generates a corresponding onehot code. The Onehot encoding method is to convert all n class features into vectors containing n 0-1, each corresponding to a class name. After onehot codes of four features are spliced in a fixed order, dimension reduction is performed by using an Autoencoder.
The self-encoder training method comprises the following steps: the input is the user short-term feature vector onehot code, and the output is the vector with the same dimension as the input. The training goal is to have the input value equal to the output value. And when the accuracy of the model verification set reaches a certain threshold value, the characteristic vector of the middle layer of the model is a dimension reduction result. The number of layers of the middle layer can be changed according to requirements. Equation 1 is the process of passing the input high-dimensional features through the intermediate layer vector generated by the encoder. Equation 2 is the process of converting the intermediate layer vector into the original vector again through the decoding step, and equation 3 is the loss function.
Equation 1: h = g_θ1(x) = σ(W1·x + b1)
Equation 2: x̂ = g_θ2(h) = σ(W2·h + b2)
Equation 3: L(x, x̂) = ‖x − x̂‖²
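A minimal sketch of the one-hot encoding and the encoder step of Equation 1, in pure Python. The category lists and the toy weights are illustrative assumptions; in the method described above, the weights would be learned by training the autoencoder so that the decoder reproduces its input.

```python
import math

def onehot(value, categories):
    """Encode one categorical feature as an n-dimensional 0/1 vector."""
    return [1.0 if value == c else 0.0 for c in categories]

# Assumed category vocabularies for the four short-term features.
AGE = ["<18", "18-24", "25-34", "35+"]
GENDER = ["M", "F"]
INTEREST = ["music", "talk", "news"]
INCOME = ["low", "mid", "high"]

def short_term_vector(age, gender, interest, income):
    """Concatenate the four one-hot codes in a fixed order (step S1311)."""
    return (onehot(age, AGE) + onehot(gender, GENDER)
            + onehot(interest, INTEREST) + onehot(income, INCOME))

def encoder_forward(x, W, b):
    """Equation 1: h = sigma(W x + b), with sigma the logistic sigmoid."""
    z = [sum(wij * xj for wij, xj in zip(row, x)) + bi
         for row, bi in zip(W, b)]
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]

x = short_term_vector("25-34", "F", "music", "mid")
assert len(x) == 4 + 2 + 3 + 3 and sum(x) == 4.0  # one hot bit per feature

# Toy 2-unit middle layer; h is the reduced-dimension feature vector.
W = [[0.1] * len(x), [-0.1] * len(x)]
b = [0.0, 0.0]
h = encoder_forward(x, W, b)
print(h)
```

The decoder (Equation 2) would apply the same form of transformation to h, and training would minimize the reconstruction loss of Equation 3.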
s1312: the long-term feature vector of the user is formed by splicing features generated by five dimensions of forward comment of the user, probability distribution of listened audio subjects, probability distribution of listened audio categories, probability distribution of listened audio authors and probability statistics of audio generation time periods according to a fixed sequence. The user forward comment feature generation mode is that dictionary word segmentation is carried out on user comment texts. The dictionary used is a manual collection of defined forward words. All the cut forward words are generated into word vectors (word 2vec model can be used). And accumulating all the word vectors cut out to be used as the user forward comment vector. In the formula
Figure BDA0002171888330000081
For the weights of the words, the vector may be generated in such a way that each word weight is equal. S is the number of words cut out in a text.
Equation 4:
Figure BDA0002171888330000082
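A small sketch of the positive-comment vector of Equation 4 with equal weights w_i = 1/S. The positive-word dictionary and the two-dimensional word vectors are toy assumptions; in practice the vectors would come from a trained word2vec model.

```python
POSITIVE_WORDS = {"great", "love", "awesome"}  # stand-in for the curated dictionary

# Toy 2-d word vectors; a real system would use word2vec embeddings.
WORD_VECS = {"great": [0.2, 0.1], "love": [0.0, 0.3], "awesome": [0.4, -0.1]}

def positive_comment_vector(comment_tokens):
    """Equation 4: v = sum_i w_i * e_i over the S positive words in a comment,
    with equal weights w_i = 1/S as the simplest choice."""
    words = [t for t in comment_tokens if t in POSITIVE_WORDS]
    if not words:
        return [0.0, 0.0]
    w = 1.0 / len(words)            # equal per-word weight
    vec = [0.0, 0.0]
    for word in words:
        for d, v in enumerate(WORD_VECS[word]):
            vec[d] += w * v
    return vec

print(positive_comment_vector(["i", "love", "this", "great", "show"]))
# ≈ [0.1, 0.2], the average of the "love" and "great" vectors
```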
s1313: the feature vector of the audio theme probability distribution and the audio category probability distribution is the probability value of each theme and category listened to by the user, and the feature vector of the audio generation time is onehot coding of the time period of audio generation. The spliced vector is used for training the Autoencoder dimensionality reduction in the mode in the step S1311.
S1314: the user map construction method comprises the following steps: the atlas entity is each user, the relationship between the entities is the same audio content listened by two users, the more the number of the audio contents listened by two users is, the larger the weight of the relationship between the two entities is. The graphmembedding training may use the node2vec graph embedding algorithm.
S132: generating an audio content feature vector according to the audio content representation system;
specifically, the process of generating the audio content feature vector according to the audio content representation system is as follows:
the audio content feature vectors are concatenated in a fixed order by the following feature vectors:
the audio content processing method comprises the steps of user graph embedding with positive feedback on the audio content, user graph embedding with negative feedback on the audio content, user positive feedback frequency, user negative feedback frequency, audio content classification onehot coding, audio content label onehot coding, sound characteristic vector and audio content characteristic vector. Specifically, the method comprises the following steps:
s1321: the manner of searching for the positive feedback and the negative feedback is the same as the manner of searching for the positive comment in step S1312, and those skilled in the art can implement the method according to the above description, and details are not repeated here. The method for generating the user graph marking is the same as that in step S1314, and the onehot encoding method is the same as that in step S1311, and those skilled in the art can implement the method according to the above steps, and details are not described here.
S1322: the audio feature vectors are extracted dnn as audio feature vectors using an audio fingerprinting algorithm, such as extracting audio mfcc (mel-frequency cepstral coefficients), using dnn to perform classification supervised training on the mfcc.
S1323: the audio content feature vector generation mode is to convert the audio content into a text through a speech recognition algorithm and convert the text into a feature vector through a natural language processing technology. Such as using text-cnn or sense 2 vec.
S1324: after the vectors of the above-mentioned portions are spliced, the vectors are reduced in dimension by using the method in step S1313, which can be implemented by those skilled in the art according to the above-mentioned steps, and are not described herein again. The final generated vector is the audio content feature vector.
S133: generating a set of similar users for each user;
specifically, the process of generating a similar user set for each user is as follows: for each user, the n most similar users are found out by using a nearest neighbor search method (a classical nearest neighbor search algorithm such as kd-tree can be used) according to the feature vectors of the users among the feature vectors of all other users. Each user generates a set of similar users.
S134: generating a set of similar audio content for each audio content;
specifically, the process of generating a set of similar audio contents for each audio content is as follows: and aiming at each audio content, using a nearest neighbor search method to find n most similar audio contents in all other audio content feature vectors according to the feature vector of the audio content. Each audio content generates a set of most similar audio content.
S135: generating a user-personalized audio content set for each user;
specifically, the process of generating the user personalized audio content set for each user is as follows: for each user, the audio content that the user provided positive feedback is found in its user representation. A set of user favorite audio content is generated. The user positive feedback comprises active positive feedback and passive positive feedback. Active forward feedback: 1. the user portrays that forward words appear in the user's description of the audio content. 2. The user has praise with the audio content. Passive forward-backward-blocking: the user listens to the audio content for a period of time exceeding s minutes. The duration of S can take different values according to the type of the audio content.
S136: generating a prediction personalized audio content recall set according to the user personalized audio content set;
specifically, the process of generating the predicted personalized audio content recall set according to the user personalized audio content set is as follows: for each audio content in the user's favorite audio content set, n audio contents are selected from the audio content similar audio content set to be put into the recall set. The selection method can be adjusted according to the target, and can be random selection (improving the coverage rate of the system recalling the audio content) or the selection of the most similar n (improving the accuracy). The method comprises the following steps:
s1361: for each audio content in the user's favorite audio content set, n audio contents are selected from the audio content similar audio content set to be put into the recall set. The selection method can be adjusted according to the target, and can be random selection (improving the coverage rate of the audio content recalled by the system) or selection of the most similar n (improving the accuracy)
S1362: and selecting n users from the similar user set of the users, wherein the size of n is selected according to requirements. The selection of the N users may be random or the most similar N. For each of n similar users, placing audio content from the user's favorite audio content collection into a predictive personalized audio content recall collection.
S137: classifying the audio content in the predicted personalized audio content recall set according to the topic dimension, and generating a personalized audio playlist for each topic.
It is noted that the topic dimensions include, but are not limited to: music, sadness, happiness, and so on. In use, embodiments of the application can automatically generate a personalized playlist for each listener according to that listener's interest preferences, so as to meet the listener's needs when listening to audio playlists, improving overall user satisfaction and playlist production efficiency.
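The final grouping of step S137 is a simple partition of the recall set by topic; each topic's items form one personalized playlist (topic labels here are illustrative):

```python
from collections import defaultdict

def playlists_by_topic(recall_set, topic_of):
    """Split the recall set by topic dimension (step S137); each topic's
    audio items form one personalized playlist."""
    playlists = defaultdict(list)
    for audio in sorted(recall_set):   # sort for a deterministic order
        playlists[topic_of[audio]].append(audio)
    return dict(playlists)

topic_of = {"a2": "music", "a3": "sad", "a5": "music"}
print(playlists_by_topic({"a2", "a3", "a5"}, topic_of))
# {'music': ['a2', 'a5'], 'sad': ['a3']}
```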
Of course, the embodiment of the present invention is not limited to automatically generating the personalized playlist for the audio content by using the above method, and may also be implemented by using other methods. The embodiment of the present invention is not limited to specific methods.
On the basis of the foregoing embodiments, the present invention provides a personalized audio playlist generating apparatus, which is specifically shown in fig. 3. The device includes:
the user representation system generating module is used for constructing a user representation system according to the user information;
the audio content representation system generation module is used for constructing an audio content representation system according to the audio content information;
and the audio playlist generation model training module is used for training an audio playlist generation model according to the user representation system and the audio content representation system to generate an audio playlist.
It should be noted that the embodiment of the present invention has the same beneficial effects as the personalized audio playlist generating method in the foregoing embodiment, and for the specific description of the personalized audio playlist generating method in the embodiment of the present invention, please refer to the foregoing embodiment, which is not described herein again.
On the basis of the foregoing embodiments, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the personalized audio playlist generating method are implemented.
It should be noted that the embodiment of the present invention has the same beneficial effects as the personalized audio playlist generating method in the foregoing embodiment, and please refer to the foregoing embodiment for specific description of the personalized audio playlist generating method in the foregoing embodiment of the present invention, which is not described herein again.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for generating a personalized audio playlist, comprising:
constructing a user portrait system according to user information;
constructing an audio content portrait system according to audio content information; and
training an audio playlist generation model according to the user portrait system and the audio content portrait system to generate an audio playlist.
2. The method of claim 1, wherein the constructing a user portrait system according to user information comprises: constructing the user portrait system through data mining and data statistics algorithms according to collected user attribute information and user interest preference information.
3. The method of claim 1, wherein the constructing an audio content portrait system according to audio content information comprises: constructing audio content feature engineering, collecting feedback information of users on the audio content, and constructing the audio content portrait system through data mining and data statistics.
4. The method of claim 3, wherein the feedback information comprises active feedback information and/or passive feedback information.
5. The method of claim 1, wherein the training an audio playlist generation model according to the user portrait system and the audio content portrait system to generate an audio playlist comprises:
generating a user feature vector according to the user portrait system;
generating an audio content feature vector according to the audio content portrait system;
generating a set of similar users for each user;
generating a set of similar audio contents for each audio content;
generating a user personalized audio content set for each user;
generating a predicted personalized audio content recall set according to the user personalized audio content set; and
classifying the audio contents in the predicted personalized audio content recall set by topic dimension, and generating a personalized audio playlist for each topic.
6. The method of claim 5, wherein the generating a predicted personalized audio content recall set according to the user personalized audio content set comprises:
for each audio content in the user personalized audio content set, selecting n audio contents from that audio content's similar audio content set and placing them into the predicted personalized audio content recall set; and
selecting n similar users from the user's similar user set, and placing the audio contents in the user personalized audio content set of each of the n similar users into the user's predicted personalized audio content recall set.
7. The method of claim 5, wherein the generating a user feature vector according to the user portrait system comprises:
splicing, in a fixed order, the four categories of features of user age group, user gender, user interest preference and user income level, and reducing the dimensions to generate a short-term user feature vector;
splicing, in a fixed order, the five categories of features of the user's positive comments, the probability distribution of listened audio topics, the probability distribution of listened audio categories, the probability distribution of listened audio authors and the probability statistics of audio generation time periods, and reducing the dimensions to generate a long-term user feature vector; and
generating a user graph embedding through a graph embedding algorithm according to the degree of overlap in audio contents listened to between users.
8. The method of claim 5, wherein the generating an audio content feature vector according to the audio content portrait system comprises:
splicing, in a fixed order, the graph embeddings of users giving positive feedback on the audio content, the graph embeddings of users giving negative feedback on the audio content, the user positive feedback frequency, the user negative feedback frequency, the one-hot encoding of the audio content classification, the one-hot encoding of the audio content labels, sound features and text features, and reducing the dimensions to generate the audio content feature vector.
9. A personalized audio playlist generating apparatus, comprising:
a user portrait system generating module, configured to construct a user portrait system according to user information;
an audio content portrait system generating module, configured to construct an audio content portrait system according to audio content information; and
an audio playlist generation model training module, configured to train an audio playlist generation model according to the user portrait system and the audio content portrait system to generate an audio playlist.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the personalized audio playlist generating method according to any one of claims 1 to 8.
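Claims 7 and 8 build feature vectors by one-hot encoding categorical attributes and splicing feature groups in a fixed order before dimension reduction. A minimal sketch of the encoding and splicing steps (function names and the toy vocabularies are illustrative, not from the patent; the dimension-reduction step, e.g. PCA, is omitted):

```python
def one_hot(value, vocabulary):
    """One-hot encode a categorical value against a fixed vocabulary list."""
    return [1.0 if v == value else 0.0 for v in vocabulary]

def splice_features(feature_groups):
    """Splice feature groups in a fixed order into one flat vector,
    as claims 7 and 8 do before dimension reduction."""
    flat = []
    for group in feature_groups:
        flat.extend(group)
    return flat

# Illustrative audio-content features (claim 8): classification and label
# one-hot codes spliced with positive/negative feedback frequencies.
classification = one_hot("music", ["talk", "music", "news"])
label = one_hot("jazz", ["jazz", "rock"])
vector = splice_features([classification, label, [0.8, 0.1]])
```

Because the splicing order is fixed, the same feature position always carries the same meaning across all users or audio contents, which is what makes the spliced vectors comparable before dimension reduction.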
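The "set of similar users" and "set of similar audio contents" steps of claim 5 are commonly computed as nearest neighbours in the feature-vector space. A hedged sketch using cosine similarity (the patent does not specify the similarity measure; all names are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similar_set(target_id, vectors, k):
    """Return the k ids most similar to target_id by cosine similarity.

    The same routine serves for users and for audio contents, since
    both are represented as feature vectors in claim 5.
    """
    scores = [
        (other, cosine(vectors[target_id], vec))
        for other, vec in vectors.items()
        if other != target_id
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:k]]
```

In practice this brute-force scan would be replaced by an approximate nearest-neighbour index once the user or audio catalogue grows large.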
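The two recall steps of claim 6 (pull n items from each listened content's similar-audio set, then pull the personalized sets of n similar users) can be sketched as follows. The data layout and all names are assumptions for illustration; the patent does not fix how the similar sets are stored:

```python
def build_recall_set(user, n, similar_items, similar_users, personal_items):
    """Build the predicted personalized audio content recall set for `user`.

    similar_items:  audio id -> list of similar audio ids (most similar first)
    similar_users:  user id  -> list of similar user ids (most similar first)
    personal_items: user id  -> that user's personalized audio content set
    """
    recall = []
    seen = set()

    # Step 1 (claim 6, first clause): for each audio content in the user's
    # personalized set, take n items from that content's similar-audio set.
    for item in personal_items.get(user, []):
        for candidate in similar_items.get(item, [])[:n]:
            if candidate not in seen:
                seen.add(candidate)
                recall.append(candidate)

    # Step 2 (claim 6, second clause): take n similar users and place every
    # item of each similar user's personalized set into the recall set.
    for other in similar_users.get(user, [])[:n]:
        for candidate in personal_items.get(other, []):
            if candidate not in seen:
                seen.add(candidate)
                recall.append(candidate)

    return recall
```

Deduplicating with `seen` keeps an audio content that is reachable through both recall paths from appearing twice before the topic-based classification of claim 5 splits the recall set into playlists.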
CN201910765781.6A 2019-08-19 2019-08-19 Personalized audio playlist generation method and device and readable storage medium Active CN110717064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910765781.6A CN110717064B (en) 2019-08-19 2019-08-19 Personalized audio playlist generation method and device and readable storage medium


Publications (2)

Publication Number Publication Date
CN110717064A true CN110717064A (en) 2020-01-21
CN110717064B CN110717064B (en) 2022-11-22

Family

ID=69209440

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910765781.6A Active CN110717064B (en) 2019-08-19 2019-08-19 Personalized audio playlist generation method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN110717064B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130311163A1 (en) * 2012-05-16 2013-11-21 Oren Somekh Media recommendation using internet media stream modeling
US20150248618A1 (en) * 2014-03-03 2015-09-03 Spotify Ab System and method for logistic matrix factorization of implicit feedback data, and application to media environments
CN106326277A (en) * 2015-06-30 2017-01-11 上海证大喜马拉雅网络科技有限公司 User behavior-based personalized audio recommendation method and system
US20190028748A1 (en) * 2017-07-21 2019-01-24 The Directv Group, Inc. System method for audio-video playback recommendations
CN109618229A (en) * 2018-12-21 2019-04-12 广州酷狗计算机科技有限公司 Association playback method, device, server and the storage medium of audio-video


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651456A (en) * 2020-05-28 2020-09-11 支付宝(杭州)信息技术有限公司 Potential user determination method, service pushing method and device
CN111651456B (en) * 2020-05-28 2023-02-28 支付宝(杭州)信息技术有限公司 Potential user determination method, service pushing method and device
CN112287160A (en) * 2020-10-28 2021-01-29 广州欢聊网络科技有限公司 Audio data sorting method and device, computer equipment and storage medium
CN112287160B (en) * 2020-10-28 2023-12-12 广州欢聊网络科技有限公司 Method and device for ordering audio data, computer equipment and storage medium

Also Published As

Publication number Publication date
CN110717064B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
CN106328147B (en) Speech recognition method and device
Jansen et al. Towards spoken term discovery at scale with zero resources.
Kaminskas et al. Location-aware music recommendation using auto-tagging and hybrid matching
Kotsakis et al. Investigation of broadcast-audio semantic analysis scenarios employing radio-programme-adaptive pattern classification
US7809568B2 (en) Indexing and searching speech with text meta-data
CN108416032B (en) Text classification method, device and storage medium
Su et al. Environmental sound classification for scene recognition using local discriminant bases and HMM
Kiktova-Vozarikova et al. Feature selection for acoustic events detection
Hong et al. Cbvmr: content-based video-music retrieval using soft intra-modal structure constraint
CN110188356B (en) Information processing method and device
Thorogood et al. Computationally Created Soundscapes with Audio Metaphor.
CN109271550A (en) A kind of music personalization classification recommended method based on deep learning
Bouguila A model-based approach for discrete data clustering and feature weighting using MAP and stochastic complexity
CN107145509B (en) Information searching method and equipment thereof
CN110717064B (en) Personalized audio playlist generation method and device and readable storage medium
Staš et al. Classification of heterogeneous text data for robust domain-specific language modeling
Vignolo et al. Feature optimisation for stress recognition in speech
Boishakhi et al. Multi-modal hate speech detection using machine learning
Ghandoura et al. Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting
Grzywczak et al. Audio features in music information retrieval
Doulaty et al. Automatic genre and show identification of broadcast media
Whitman et al. Learning word meanings and descriptive parameter spaces from music
Andra et al. Contextual keyword spotting in lecture video with deep convolutional neural network
Schuller et al. New avenues in audio intelligence: towards holistic real-life audio understanding
Kertész et al. Common sounds in bedrooms (CSIBE) corpora for sound event recognition of domestic robots

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant