CN112632318A - Audio recommendation method, device and system and storage medium - Google Patents

Audio recommendation method, device and system and storage medium

Info

Publication number
CN112632318A
Authority
CN
China
Prior art keywords
audio
recommended
user
timbre
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011550325.9A
Other languages
Chinese (zh)
Inventor
喻浩文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anker Innovations Co Ltd
Original Assignee
Anker Innovations Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anker Innovations Co Ltd filed Critical Anker Innovations Co Ltd
Priority to CN202011550325.9A priority Critical patent/CN112632318A/en
Publication of CN112632318A publication Critical patent/CN112632318A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an audio recommendation method, device, system and storage medium. The method comprises: acquiring audio to be recommended; performing feature extraction on the audio to be recommended to obtain a timbre feature coding vector of the audio to be recommended; and obtaining a recommendation result based on the timbre feature coding vector of the audio to be recommended and a set of timbre feature coding vectors preferred by the user, wherein the recommendation result comprises the degree to which the audio to be recommended matches the user's timbre feature preference. According to the method, device, system and storage medium, audio recommendation is completed based on the user's preference for timbre characteristics, without relying on behavior information among other users or on audio labeling information, thereby realizing personalized audio recommendation and improving the user experience.

Description

Audio recommendation method, device and system and storage medium
Technical Field
The present invention relates to the field of audio technology, and more particularly, to audio recommendation.
Background
Audio recommendation means recommending music a user may like based on an estimate of the user's preference for audio. At present, audio recommendation is mainly performed based on manually labeled information such as music style, title and album, or based on calculations obtained through user behavior analysis. Recommendation through user behavior analysis relies on user-based and audio-based collaborative filtering. Taking an audio-based collaborative filtering algorithm as an example, one piece of audio may be liked by many users; if the sets of people who like two pieces of music are A and B, respectively, and A and B highly coincide or are even the same group of people, the two pieces can be considered similar, so if a person likes one of them, the other is recommended to that person according to the collaborative filtering result. The traditional approach is therefore based on manual labeling information and user behavior information, and is unrelated to the timbre characteristics of the audio or to its waveform. However, if a piece of audio has no corresponding manual labels, such as album or title, or has never been listened to and thus has no corresponding user behavior information, it cannot be recommended.
In addition, one audio recommendation method maps the audio waveform to collaborative-filtering latent factor features, so that even if the labels of a piece of audio are unknown and no one has listened to it, its latent factor features can be obtained from the waveform and matched against the known features of audio in other music libraries to recommend the music to the user. Although this method solves the problem of recommending audio that lacks labeling information, it cannot realize recommendation based on the user's timbre preference; for example, it cannot learn the user's preference for timbre characteristics such as bass intensity, and therefore cannot recommend audio according to that preference. Moreover, its output is still a collaborative-filtering latent factor feature, which must be matched against the features of other music in the library using user behavior information, and the library itself still has to learn music features by analyzing the behavior of a large number of users through collaborative filtering. A music portal with many users can obtain such inter-user behavior information, but for other music platforms or for individuals this is not easy to achieve.
Therefore, audio recommendation methods in the prior art still depend on behavior information among users and cannot perform personalized recommendation based on the user's preference for timbre characteristics.
Disclosure of Invention
The present invention has been made in view of the above problems. The invention provides an audio recommendation method, device and system and a computer storage medium, aiming to solve the problem that audio recommendation based on the user's preference for timbre characteristics cannot be realized because user behavior information must be relied on.
According to a first aspect of the present invention, there is provided an audio recommendation method comprising:
acquiring audio to be recommended;
extracting the characteristics of the audio to be recommended to obtain a timbre characteristic coding vector of the audio to be recommended;
and obtaining a recommendation result based on the timbre characteristic coding vector of the audio to be recommended and a timbre characteristic coding vector set preferred by the user, wherein the recommendation result comprises the matching degree of the audio to be recommended and the timbre characteristic preference of the user.
Optionally, performing feature extraction on the audio to be recommended to obtain the timbre feature coding vector of the audio to be recommended includes:
performing feature extraction on the audio to be recommended to obtain the timbre features of the audio to be recommended;
clustering the timbre features of the audio to be recommended to obtain timbre feature distribution vectors of the audio to be recommended;
and coding the timbre characteristic distribution vector of the audio to be recommended to obtain the timbre characteristic coding vector of the audio to be recommended.
Optionally, the timbre feature comprises at least one of an MFCC, a one-frame time-domain waveform, a multi-frame time-domain waveform, or a hand-designed feature.
Optionally, encoding the timbre feature distribution vector of the audio to be recommended includes:
inputting the tone quality characteristic distribution vector of the audio to be recommended into a trained coding network for coding; the training method for obtaining the trained coding network comprises the following steps:
carrying out feature extraction and clustering on the audio preferred by the user to obtain a tone quality feature distribution vector of the audio preferred by the user;
training an unsupervised neural network based on the timbre feature distribution vector of the user preference audio to obtain the trained unsupervised neural network;
and selecting at least one hidden layer in the trained unsupervised neural network as the trained coding network.
Optionally, the method further comprises:
and inputting the timbre characteristic distribution vector of the audio preferred by the user into the trained coding network to obtain a timbre characteristic coding vector set preferred by the user.
Optionally, obtaining the recommendation result based on the timbre feature coding vector of the audio to be recommended and a set of user-preferred timbre feature coding vectors includes:
calculating the matching degree between the timbre characteristic coding vector of the audio to be recommended and a timbre characteristic coding vector set preferred by a user;
and taking the audio to be recommended with the matching degree meeting the preset condition and the corresponding matching degree as the recommendation result.
Optionally, the preset condition includes: the matching degree is greater than or equal to a recommendation threshold.
Optionally, calculating the matching degree between the timbre feature coding vector of the audio to be recommended and a set of user-preferred timbre feature coding vectors includes:
averaging the timbre characteristic coding vectors preferred by the user in the timbre characteristic coding vector set preferred by the user to obtain an average coding vector, and calculating the matching degree of the timbre characteristic coding vector of the audio to be recommended and the average coding vector as the matching degree between the timbre characteristic coding vector of the audio to be recommended and the timbre characteristic coding vector preferred by the user;
or, alternatively,
calculating the sub-matching degree between the timbre characteristic coding vector of the audio to be recommended and each user-preferred timbre characteristic coding vector in the set, and averaging all the sub-matching degrees to obtain the matching degree between the timbre characteristic coding vector of the audio to be recommended and the user's preferred timbre characteristic coding vectors.
Optionally, before obtaining the audio to be recommended, the method further includes: acquiring an audio list to be recommended, and sequentially calculating the matching degree of each audio to be recommended in the audio list to be recommended and the user tone quality characteristic preference; the recommendation result comprises: and the list of the audio to be recommended is sorted according to the matching degree with the user tone quality characteristic preference.
According to a second aspect of the present invention, there is provided an audio recommendation apparatus, the apparatus comprising:
the acquisition module is used for acquiring audio to be recommended;
the characteristic coding module is used for extracting the characteristics of the audio to be recommended to obtain the timbre characteristic coding vector of the audio to be recommended;
and the calculation module is used for obtaining a recommendation result based on the timbre characteristic coding vector of the audio to be recommended and a timbre characteristic coding vector set preferred by the user, wherein the recommendation result comprises the matching degree of the audio to be recommended and the timbre characteristic preference of the user.
According to a third aspect of the present invention, there is provided an audio recommendation system comprising a memory, a processor and a computer program stored on the memory and running on the processor, characterized in that the steps of the method of the first aspect are implemented when the computer program is executed by the processor.
According to a fourth aspect of the present invention, there is provided a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a computer, implements the steps of the method of the first aspect.
According to the audio recommendation method, device and system and the computer storage medium provided by the embodiments of the invention, audio recommendation is completed based on the user's preference for timbre characteristics, without relying on behavior information among other users or on audio labeling information, thereby realizing personalized audio recommendation and improving the user experience.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
FIG. 1 is a schematic flow chart diagram of an audio recommendation method according to an embodiment of the invention;
FIG. 2 is an example of an audio recommendation method according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of an audio recommendation device according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of an audio recommendation system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the invention described herein without inventive step, shall fall within the scope of protection of the invention.
Tone quality (timbre) is a subjective evaluation of audio quality, and each person has different timbre preferences: some people like music with a surging bass, while others prefer sound with more mid-high frequencies or even a sharp, piercing character. These subjective preferences can also change with the environment over time; in a high-noise environment, for example, many people prefer higher loudness, whereas music played before sleep is typically turned down. To implement personalized audio recommendation, audio therefore needs to be recommended to the user according to the user's preference for timbre characteristics.
Next, an audio recommendation method 100 according to an embodiment of the present invention will be described with reference to fig. 1. As shown in fig. 1, an audio recommendation method 100 includes:
firstly, in step S110, an audio to be recommended is acquired;
in step S120, performing feature extraction on the audio to be recommended to obtain a timbre feature coding vector of the audio to be recommended;
in step S130, a recommendation result is obtained based on the timbre feature coding vector of the audio to be recommended and a set of user-preferred timbre feature coding vectors, where the recommendation result includes the matching degree between the audio to be recommended and the user's timbre feature preference.
The method extracts the timbre features of the audio to be recommended, computes their distribution, encodes the distribution vector, matches the result against the encodings of the user's preferred timbre feature distributions, and recommends the audio with a high matching degree to the user. Throughout this process, the user's timbre preference is learned from the user's preferred audio data without requiring behavior information among users; other music with similar characteristics is then found in an existing audio library according to the learned features and recommended to the user, realizing personalized audio recommendation and improving the user experience. The method can be widely applied wherever audio recommendation is needed, improves the effectiveness of recommendation, and requires neither labeling information nor inter-user behavior information, saving considerable time and labor.
Alternatively, the audio recommendation method according to the embodiment of the present invention may be implemented in a device, an apparatus, or a system having a memory and a processor.
The audio recommendation method according to the embodiments of the invention can be deployed, wholly or in a distributed manner, on a personal terminal such as a smart phone, tablet or personal computer, or on a server (or in the cloud).
According to an embodiment of the present invention, in step S110, acquiring the audio to be recommended may include: and acquiring the audio to be recommended from local audio data or acquiring the audio to be recommended from other data sources. The other data sources may include audio data sources that are stored in the device or the cloud and that can exchange data with the user device.
In some embodiments, the way the audio data to be recommended is selected includes, but is not limited to: randomly selecting audio data from an audio library and/or using music data recommended by a recommendation algorithm of a music portal.
According to the embodiment of the present invention, in step S120, performing feature extraction on the audio to be recommended to obtain a timbre feature coding vector of the audio to be recommended, including:
performing feature extraction on the audio to be recommended to obtain the timbre features of the audio to be recommended;
clustering the timbre features of the audio to be recommended to obtain timbre feature distribution vectors of the audio to be recommended;
and coding the timbre characteristic distribution vector of the audio to be recommended to obtain the timbre characteristic coding vector of the audio to be recommended.
Optionally, the method for extracting the features of the audio to be recommended may include: at least one of FFT (Fast Fourier Transform), STFT (Short-Time Fourier Transform), DFT (Discrete Fourier Transform).
Optionally, the timbre features may include MFCCs (Mel-frequency cepstral coefficients), one or more frames of the time-domain waveform, and other hand-designed features. For example, a timbre feature may be a feature vector of a timbre characteristic or the intensity of a timbre characteristic.
In some embodiments, the feature vector of a timbre characteristic may be an m-dimensional vector.
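By way of illustration only, the following is a minimal sketch of frame-level timbre feature extraction using MFCCs; librosa and the parameter values (sample rate, FFT size, hop length, number of coefficients) are assumptions of this sketch rather than requirements of the method.

```python
# Minimal sketch: frame-level timbre features (MFCCs) for one audio file.
# The sample rate, frame/hop sizes and n_mfcc are illustrative assumptions.
import librosa

def extract_timbre_features(path, sr=22050, n_mfcc=20):
    y, _ = librosa.load(path, sr=sr)                       # mono waveform
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                n_fft=2048, hop_length=512)
    return mfcc.T                                          # shape: (num_frames, n_mfcc)
```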
In one embodiment, bass intensity is a timbre characteristic indicating whether the bass part of a piece of music is powerful. It can generally be calculated as the ratio of low-frequency-band energy to full-band energy and is a single numerical value. Suppose the low frequency band contains m frequency bins and the full band contains k frequency bins, with the amplitude of each bin denoted s(i), i = 1, 2, …, k; the full-band amplitude is then
E = s(1) + s(2) + … + s(k).
The feature vector of the bass-intensity timbre characteristic can then be designed as V = [v1, v2, …, vm], where vi = s(i)/E; that is, each element of V is the ratio of one low-band bin's amplitude to the full-band amplitude, and the sum of the elements of V can be taken as the bass intensity. Similarly, for other timbre characteristics, the feature vector may be chosen according to the actual calculation method.
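As an illustration of the bass-intensity feature vector just described, the sketch below computes V and the bass intensity from one time-domain frame; the 250 Hz low-band cutoff is an assumed value, not one fixed by this disclosure.

```python
# Sketch of the bass-intensity feature vector V and the bass intensity (sum of V).
# The 250 Hz low-band cutoff is an illustrative assumption.
import numpy as np

def bass_intensity_vector(frame, sr=22050, low_cutoff_hz=250.0):
    spectrum = np.abs(np.fft.rfft(frame))            # s(i): amplitude of each frequency bin
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    E = spectrum.sum()                                # full-band amplitude E = s(1) + ... + s(k)
    V = spectrum[freqs <= low_cutoff_hz] / E          # vi = s(i) / E over the low band
    return V, float(V.sum())                          # feature vector V and bass intensity
```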
In some embodiments, the timbre feature may be the intensity of a timbre characteristic, which is a numerical value; in this case the feature vector of the timbre characteristic is an intermediate result of computing that intensity.
In some embodiments, performing feature extraction on the audio training data to obtain its timbre features may include framing the audio training data and then performing feature extraction on the frames.
Optionally, clustering the timbre features of the audio training data may use, but is not limited to, the K-means clustering algorithm, with the number of clusters set to N as required.
Optionally, clustering the timbre features of the audio to be recommended to obtain the timbre feature distribution vector of the audio to be recommended includes:
inputting the tone quality characteristics of the audio to be recommended into a trained clustering network for clustering; the training method for obtaining the trained clustering network comprises the following steps:
and performing feature extraction on audio training data to obtain the tone quality features of the audio training data, and clustering the tone quality features of the audio training data to obtain the clustering network.
The audio training data includes, but is not limited to, the audio waveforms used to train the neural network, and should cover the currently known classes of timbre characteristics. The timbre feature distribution vector represents the distribution of timbre features within a piece of audio. For example, suppose a piece of audio has c frames and, after feature extraction, each frame is assigned to one of N classes. A feature vector U = [U1, U2, U3, …, UN] is used to represent the class distribution of the c frames, where Ui is the number of frames whose timbre features belong to the i-th class; for example, for U = [3, 4, 7, 9, …], 3 frames belong to class 1, 4 to class 2, 7 to class 3, and so on, with
U1 + U2 + … + UN = c.
The vector U is referred to as the timbre feature distribution vector of the audio.
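The following sketch shows one way to obtain such a distribution vector: a clusterer is trained on frame-level timbre features of the training audio, each frame of a piece of audio is assigned a class, and the per-class frame counts form U. The use of scikit-learn's K-means and N = 64 are assumptions of the sketch.

```python
# Sketch: train a clusterer on frame-level timbre features, then build the
# timbre feature distribution vector U for one piece of audio. N is illustrative.
import numpy as np
from sklearn.cluster import KMeans

N = 64

def train_clusterer(training_frames):                 # array of shape (num_frames, feat_dim)
    return KMeans(n_clusters=N, random_state=0).fit(training_frames)

def distribution_vector(clusterer, audio_frames):
    labels = clusterer.predict(audio_frames)          # class label of each of the c frames
    U = np.bincount(labels, minlength=N)              # Ui = number of frames in class i
    return U                                          # sum(U) = c
```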
It should be understood that the audio recommendation method according to the embodiments of the present invention is not limited by the feature extraction method or the clustering method used; feature extraction and clustering methods that are already known, as well as those developed in the future, are applicable to the audio recommendation method according to the embodiments of the present invention, and no limitation is made here.
Optionally, encoding the timbre feature distribution vector of the audio to be recommended includes:
inputting the tone quality characteristic distribution vector of the audio to be recommended into a trained coding network for coding; the training method for obtaining the trained coding network comprises the following steps:
carrying out feature extraction and clustering on the audio preferred by the user to obtain a tone quality feature distribution vector of the audio preferred by the user;
training an unsupervised neural network based on the timbre feature distribution vector of the user preference audio to obtain the trained unsupervised neural network;
and selecting at least one hidden layer in the trained unsupervised neural network as the coding network.
The clustering performed after feature extraction on the user's preferred audio can be based on the trained clustering network. The timbre feature distribution of the user's preferred audio is learned from that audio, and an unsupervised neural network is trained on this basis; the unsupervised network compression-encodes its input features, and one or more hidden layers of the trained network are selected as the coding network. With this coding network, the timbre feature distribution of the user's preferred audio is used as the input feature, and the output of the hidden layer can be regarded as the encoding of that input. If a single-hidden-layer unsupervised network is trained, such as a single-hidden-layer autoencoder or a restricted Boltzmann machine, the hidden layer is unique; if a multi-hidden-layer unsupervised network is trained, such as a multi-hidden-layer autoencoder or a deep belief network, at least one of the hidden layers may be selected.
Alternatively, the unsupervised neural network may include, but is not limited to, an autoencoder, a restricted Boltzmann machine, or a deep belief network.
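A minimal sketch of such a coding network is given below, assuming a single-hidden-layer autoencoder built with Keras; the hidden size, optimizer, loss and training length are illustrative choices, and the returned sub-model exposes the hidden-layer activations as the coding vector.

```python
# Sketch: single-hidden-layer autoencoder trained on the timbre feature
# distribution vectors of the user's preferred audio; its hidden layer is used
# as the coding network. All hyperparameters are illustrative assumptions.
import numpy as np
import tensorflow as tf

def build_encoder(user_distribution_vectors, hidden_dim=16, epochs=200):
    X = np.asarray(user_distribution_vectors, dtype="float32")
    inp = tf.keras.Input(shape=(X.shape[1],))
    hidden = tf.keras.layers.Dense(hidden_dim, activation="relu")(inp)
    out = tf.keras.layers.Dense(X.shape[1])(hidden)
    autoencoder = tf.keras.Model(inp, out)
    autoencoder.compile(optimizer="adam", loss="mse")
    autoencoder.fit(X, X, epochs=epochs, verbose=0)    # unsupervised: reconstruct the input
    return tf.keras.Model(inp, hidden)                 # hidden-layer output = coding vector
```

Applying the returned encoder to each user-preferred distribution vector yields the user-preferred timbre feature coding vector set described below.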
In some embodiments, the user's preferred audio is collected in a manner that includes, but is not limited to, selecting audio data frequently played by the user, and/or audio data kept in the user's playlists, and/or audio data the user has liked.
Optionally, obtaining the recommendation result based on the timbre feature coding vector of the audio to be recommended and a set of user-preferred timbre feature coding vectors includes:
calculating the matching degree between the timbre characteristic coding vector of the audio to be recommended and a timbre characteristic coding vector set preferred by a user;
and taking the audio to be recommended with the matching degree meeting the preset condition and the corresponding matching degree as the recommendation result.
In some embodiments, the method further comprises:
and inputting the timbre characteristic distribution vector of the audio preferred by the user into the trained coding network to obtain a timbre characteristic coding vector set preferred by the user.
The user's preferred audio may comprise one or more pieces of audio. The timbre feature distribution vector of each preferred piece corresponds to one coding vector, and the coding vectors of the features of the multiple preferred pieces form the user-preferred timbre feature coding set. These coded feature vectors can be seen as an abstract model of the user's preference for timbre characteristics.
Optionally, the matching degree comprises a cosine similarity or a Euclidean distance.
Optionally, the preset condition includes: the matching degree is greater than or equal to a recommendation threshold.
The recommendation threshold can be set as required. When the matching degree between the timbre feature coding vector of the audio to be recommended and the user-preferred timbre feature coding vector set is greater than or equal to the recommendation threshold, the timbre characteristics of that audio are close to the timbre characteristics preferred by the user, and the audio is suitable for recommendation to the user.
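For illustration, the sketch below shows the two matching measures mentioned above (cosine similarity and a Euclidean-distance-based score) together with the threshold test; mapping the Euclidean distance to a "higher is better" score and the 0.8 threshold are assumptions of the sketch.

```python
# Sketch: matching-degree measures and the recommendation-threshold test.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_score(a, b):
    # Converting distance into a score so that larger means a better match.
    return 1.0 / (1.0 + float(np.linalg.norm(np.asarray(a) - np.asarray(b))))

def meets_preset_condition(matching_degree, recommendation_threshold=0.8):
    return matching_degree >= recommendation_threshold   # threshold set as required
```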
According to the embodiment of the present invention, before acquiring the audio to be recommended in step S110, the method further includes: acquiring an audio list to be recommended, and sequentially calculating the matching degree of each audio to be recommended in the audio list to be recommended and the user tone quality characteristic preference;
the recommendation result comprises: and the list of the audio to be recommended is sorted according to the matching degree with the user tone quality characteristic preference.
In this case the list of audio to be recommended contains multiple candidate audios. Their matching degrees are ranked from high to low, and the first several with the highest matching degree (namely a preset range, for example the first m, where m is a positive integer) are selected to form an audio recommendation list, which represents, among all the acquired candidates, the audios closest to the timbre characteristics preferred by the user and is recommended to the user. It should be understood that all audios in the list to be recommended may also simply be sorted by matching degree and recommended to the user, which is not limited herein.
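As a sketch of this ranking step, the function below scores every candidate with a supplied matching function, sorts by matching degree, and keeps the top m; the dictionary layout and helper names are assumptions of this sketch.

```python
# Sketch: rank candidate audios by matching degree and keep the top m.
def rank_candidates(candidate_codes, user_codes, match_fn, m=10):
    # candidate_codes: {audio_id: timbre feature coding vector}
    scored = [(audio_id, match_fn(code, user_codes))
              for audio_id, code in candidate_codes.items()]
    scored.sort(key=lambda item: item[1], reverse=True)   # highest matching degree first
    return scored[:m]
```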
In some embodiments, calculating the matching degree between the timbre feature coding vector of the audio to be recommended and a set of user-preferred timbre feature coding vectors includes:
averaging the timbre characteristic coding vectors preferred by the user in the timbre characteristic coding vector set preferred by the user to obtain an average coding vector, and calculating the matching degree of the timbre characteristic coding vector of the audio to be recommended and the average coding vector as the matching degree between the timbre characteristic coding vector of the audio to be recommended and the timbre characteristic coding vector preferred by the user;
or, alternatively,
calculating the sub-matching degree between the timbre characteristic coding vector of the audio to be recommended and each user-preferred timbre characteristic coding vector in the set, and averaging these sub-matching degrees to obtain the matching degree between the timbre characteristic coding vector of the audio to be recommended and the user's preferred timbre characteristic coding vectors;
or, alternatively,
calculating the sub-matching degree between the timbre characteristic coding vector of the audio to be recommended and each user-preferred timbre characteristic coding vector in the set, and averaging only the sub-matching degrees that meet a preset requirement to obtain the matching degree between the timbre characteristic coding vector of the audio to be recommended and the user's preferred timbre characteristic coding vectors.
The preset requirement can be set as needed and may include: the first n sub-matching degrees when all sub-matching degrees are sorted from high to low, where n is a positive integer. These averaging strategies are sketched below.
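The sketch below illustrates the strategies above, reusing the cosine_similarity helper from the earlier sketch; treating top_n=None as "average all sub-matching degrees" is an assumption of the sketch.

```python
# Sketch of the matching strategies above, using cosine similarity as the
# matching degree (cosine_similarity is defined in the earlier sketch).
import numpy as np

def match_against_mean(candidate_code, user_codes):
    # Strategy 1: average the user-preferred coding vectors first.
    return cosine_similarity(candidate_code, np.mean(user_codes, axis=0))

def match_average_of_submatches(candidate_code, user_codes, top_n=None):
    # Strategies 2 and 3: per-vector sub-matching degrees, averaged over all of
    # them (top_n=None) or over only the largest top_n of them.
    subs = sorted((cosine_similarity(candidate_code, u) for u in user_codes),
                  reverse=True)
    if top_n is not None:
        subs = subs[:top_n]
    return float(np.mean(subs))
```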
Optionally, the method 100 may further include:
updating the timbre characteristic distribution vector of the user preference audio based on the recommendation result;
updating the unsupervised neural network by using the updated timbre characteristic distribution vector of the user preference audio to obtain an updated unsupervised neural network;
and selecting at least one hidden layer in the updated unsupervised neural network to update the coding network.
Optionally, updating the timbre feature distribution vector of the user preferred audio based on the recommendation result may include:
adding the recommendation result to the user preference audio to obtain an updated user preference audio;
and after feature extraction is carried out on the updated user preference audio, inputting the user preference audio into the clustering network to obtain the updated timbre feature distribution vector of the user preference audio.
Optionally, updating the timbre feature distribution vector of the user preferred audio based on the recommendation result, and further comprising: and updating the timbre characteristic distribution vector of the audio preferred by the user based on the operation information of the user on the recommendation result.
Optionally, the operation information of the user on the recommendation result may include: deleting or not playing at least one of the recommendation results within a preset time range.
Optionally, updating the timbre feature distribution vector of the user preferred audio based on the operation information of the user on the recommendation result may include:
adding the recommendation to the user preference audio;
when the user deletes or does not play at least one of the recommendation results within a preset time range, deleting the at least one recommendation result from the user preference audio to obtain an updated user preference audio;
and after feature extraction is carried out on the updated user preference audio, inputting the user preference audio into the clustering network to obtain the updated timbre feature distribution vector of the user preference audio.
In one embodiment, the method may further comprise:
adding the recommendation result to the user preference audio to obtain updated user preference audio, wherein if at least one audio in the recommendation result is deleted or not played by the user within a preset time range, that audio is removed from the user preference audio to obtain the updated user preference audio;
at intervals of preset time, performing feature extraction based on the updated user preference audio, and inputting the extracted user preference audio into the clustering network to obtain the updated timbre feature distribution vector of the user preference audio;
updating the unsupervised neural network by using the updated timbre characteristic distribution vector of the user preference audio to obtain an updated unsupervised neural network;
and selecting at least one hidden layer in the updated unsupervised neural network to update the coding network.
Alternatively, in the above embodiment, the feature extraction may be performed manually at any time based on the updated user preference audio as needed, and then the extracted feature is input to the clustering network to obtain the updated timbre feature distribution of the user preference audio.
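A minimal sketch of this update loop is given below; the feedback rule (drop items the user deletes or leaves unplayed) is an illustrative assumption, and the retraining step reuses helpers from the earlier sketches.

```python
# Sketch: update the user's preferred-audio set from recommendations and simple
# feedback, then retrain the coding network. distribution_vector,
# extract_timbre_features and build_encoder come from the earlier sketches.
def update_user_preference_audio(preference_audio, recommended, deleted_or_unplayed):
    updated = set(preference_audio) | set(recommended)   # add recommended audio
    return updated - set(deleted_or_unplayed)            # drop disliked audio

def refresh_coding_network(preference_audio_paths, clusterer):
    dists = [distribution_vector(clusterer, extract_timbre_features(p))
             for p in preference_audio_paths]
    return build_encoder(dists)                          # retrained coding network
```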
In one embodiment, referring to fig. 2, fig. 2 illustrates an example of an audio recommendation method according to an embodiment of the present invention. As shown in fig. 2, an audio recommendation method according to an embodiment of the present invention includes:
in step S210, training audio data (200) is obtained from an audio database; after framing, feature extraction is performed to obtain the timbre features (201) of the training audio data, and clustering training is performed on these features to obtain a clusterer that groups the timbre features into N classes (202);
in step S220, user preference audio data (203) is obtained based on the user's preference list; feature extraction is performed on it to obtain the user-preferred timbre features (204), which are input to the clusterer of step S210 to obtain the timbre feature distribution vectors (205) of the user's preferred audio; an unsupervised neural network is trained on these distribution vectors, and a hidden layer of the trained network is selected as the encoder (206); the timbre feature distribution vectors of the user's preferred audio are input to the encoder to obtain the user-preferred timbre feature coding set (207);
in step S230, at least one audio to be recommended is obtained (208); after framing, feature extraction is performed to produce its timbre features (209), which are input to the clusterer of step S210 to obtain the timbre feature distribution vector (210) of the audio to be recommended; this distribution vector is input to the encoder (211) of step S220 to obtain the timbre feature code (212) of the audio to be recommended;
the matching degree (213) between the timbre feature code of the audio to be recommended and the user-preferred timbre feature coding set is calculated;
and the audio to be recommended with the highest matching degree, or with a matching degree greater than the recommendation threshold, is recommended to the user as the recommendation result (214), as sketched in the code following these steps.
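Putting the earlier sketches together, an end-to-end version of this flow might look as follows; the use of the mean-code matching strategy and the 0.8 threshold are assumptions of this sketch, not values fixed by the embodiment.

```python
# End-to-end sketch of the flow in Fig. 2, reusing helpers from the earlier
# sketches; the threshold and matching strategy are illustrative assumptions.
import numpy as np

def recommend(candidate_paths, user_preference_paths, training_frames, threshold=0.8):
    clusterer = train_clusterer(training_frames)                         # step S210
    user_dists = [distribution_vector(clusterer, extract_timbre_features(p))
                  for p in user_preference_paths]                        # step S220
    encoder = build_encoder(user_dists)
    user_codes = encoder.predict(np.asarray(user_dists, dtype="float32"))
    results = []
    for path in candidate_paths:                                         # step S230
        dist = distribution_vector(clusterer, extract_timbre_features(path))
        code = encoder.predict(dist[None, :].astype("float32"))[0]
        match = match_against_mean(code, user_codes)
        if match >= threshold:
            results.append((path, match))
    return sorted(results, key=lambda item: item[1], reverse=True)       # result (214)
```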
In the above embodiment, the method 200 may further include: adding the recommendation result to the user's preferred audio to obtain updated user-preferred audio, wherein if the user dislikes at least one audio in the recommendation result (for example, the user deletes it or does not play it within a certain time), that audio is deleted from the user's preferred audio to obtain the updated user-preferred audio; and retraining a new encoder on the currently updated user-preferred audio as appropriate.
Fig. 3 shows a schematic block diagram of an audio recommendation device 300 according to an embodiment of the present invention. Only the main functions of the respective components of the audio recommendation device 300 will be described below, and the details that have been described above will be omitted.
As shown in fig. 3, the audio recommendation apparatus 300 according to an embodiment of the present invention includes:
an obtaining module 310, configured to obtain an audio to be recommended;
the feature coding module 320 is configured to perform feature extraction on the audio to be recommended to obtain a timbre feature coding vector of the audio to be recommended;
the calculation module 330 is configured to obtain a recommendation result based on the psychoacoustic feature coding vector of the audio to be recommended and a psychoacoustic feature coding vector set of user preferences, where the recommendation result includes a matching degree between the audio to be recommended and the psychoacoustic feature preferences of the user.
After feature extraction, the timbre feature distribution vector of the audio to be recommended is encoded and matched against the encodings of the timbre feature distributions preferred by the user, and audio with a high matching degree is recommended to the user. Throughout this process, the user's timbre preference is learned from the user's preferred audio data without requiring behavior information among users; other music with similar characteristics is then found in an existing audio library according to the learned features and recommended to the user, realizing personalized audio recommendation and improving the user experience. The device can be widely applied wherever audio recommendation is needed, improves the effectiveness of recommendation, and requires neither labeling information nor inter-user behavior information, saving considerable time and labor.
FIG. 4 shows a schematic block diagram of an audio recommendation system 400 according to an embodiment of the present invention. The audio recommendation system 400 includes a storage 410, and a processor 420.
The storage 410 stores program codes for implementing respective steps in the audio recommendation method according to an embodiment of the present invention.
The processor 420 is configured to run the program codes stored in the storage device 410 to perform the corresponding steps of the audio recommendation method according to the embodiment of the present invention, and is configured to implement the corresponding modules in the audio recommendation device according to the embodiment of the present invention.
Furthermore, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored, which when executed by a computer or a processor are used for executing the corresponding steps of the audio recommendation method according to an embodiment of the present invention and for implementing the corresponding modules in the audio recommendation device according to an embodiment of the present invention. The storage medium may include, for example, a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, or any combination of the above storage media. The storage medium can be any combination of one or more computer-readable storage media, e.g., one containing computer-readable program code for randomly generating sequences of action instructions and another containing computer-readable program code for making audio recommendations.
In one embodiment, the computer program instructions may implement the functional modules of the audio recommendation apparatus according to the embodiment of the present invention when executed by a computer and/or may perform the audio recommendation method according to the embodiment of the present invention.
The modules in the audio recommendation system according to the embodiment of the present invention may be implemented by a processor of the electronic device for audio recommendation according to the embodiment of the present invention running computer program instructions stored in a memory, or may be implemented when computer instructions stored in a computer readable storage medium of a computer program product according to the embodiment of the present invention are run by a computer.
According to the audio recommendation method, device and system and the storage medium provided by the embodiments of the invention, audio recommendation is completed based on the user's preference for timbre characteristics, without relying on behavior information among other users or on audio labeling information, thereby realizing personalized audio recommendation and improving the user experience.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where such features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
The above description covers only specific embodiments of the present invention, and the protection scope of the present invention is not limited thereto. Any changes or substitutions that a person skilled in the art can easily conceive within the technical scope disclosed by the present invention shall be covered within its protection scope. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. An audio recommendation method, characterized in that the method comprises:
acquiring audio to be recommended;
extracting the characteristics of the audio to be recommended to obtain a timbre characteristic coding vector of the audio to be recommended;
and obtaining a recommendation result based on the timbre characteristic coding vector of the audio to be recommended and a timbre characteristic coding vector set preferred by the user, wherein the recommendation result comprises the matching degree of the audio to be recommended and the timbre characteristic preference of the user.
2. The method of claim 1, wherein performing feature extraction on the audio to be recommended to obtain a timbre feature coding vector of the audio to be recommended comprises:
performing feature extraction on the audio to be recommended to obtain the timbre features of the audio to be recommended;
clustering the timbre features of the audio to be recommended to obtain timbre feature distribution vectors of the audio to be recommended;
and coding the timbre characteristic distribution vector of the audio to be recommended to obtain the timbre characteristic coding vector of the audio to be recommended.
3. The method of claim 2, wherein the timbre features comprise at least one of MFCCs, one-frame time-domain waveforms, multi-frame time-domain waveforms, or hand-designed features.
4. The method of claim 2, wherein encoding the timbre feature distribution vector of the audio to be recommended comprises:
inputting the tone quality characteristic distribution vector of the audio to be recommended into a trained coding network for coding; the training method for obtaining the trained coding network comprises the following steps:
carrying out feature extraction and clustering on the audio preferred by the user to obtain a tone quality feature distribution vector of the audio preferred by the user;
training an unsupervised neural network based on the timbre feature distribution vector of the user preference audio to obtain the trained unsupervised neural network;
and selecting at least one hidden layer in the trained unsupervised neural network as the trained coding network.
5. The method of claim 4, wherein the method further comprises:
and inputting the timbre characteristic distribution vector of the audio preferred by the user into the trained coding network to obtain a timbre characteristic coding vector set preferred by the user.
6. The method of claim 1, wherein obtaining the recommendation result based on the timbre feature coding vector of the audio to be recommended and a set of user-preferred timbre feature coding vectors comprises:
calculating the matching degree between the timbre characteristic coding vector of the audio to be recommended and a timbre characteristic coding vector set preferred by a user;
and taking the audio to be recommended with the matching degree meeting the preset condition and the corresponding matching degree as the recommendation result.
7. The method of claim 6, wherein the preset conditions include: the matching degree is greater than or equal to a recommendation threshold.
8. The method of claim 6, wherein calculating the matching degree between the timbre feature coding vector of the audio to be recommended and a set of user-preferred timbre feature coding vectors comprises:
averaging the timbre characteristic coding vectors preferred by the user in the timbre characteristic coding vector set preferred by the user to obtain an average coding vector, and calculating the matching degree of the timbre characteristic coding vector of the audio to be recommended and the average coding vector as the matching degree between the timbre characteristic coding vector of the audio to be recommended and the timbre characteristic coding vector preferred by the user;
or, alternatively,
calculating the sub-matching degree between the timbre characteristic coding vector of the audio to be recommended and each user-preferred timbre characteristic coding vector in the set, and averaging all the sub-matching degrees to obtain the matching degree between the timbre characteristic coding vector of the audio to be recommended and the user's preferred timbre characteristic coding vectors.
9. The method of any one of claims 1-7, further comprising, prior to obtaining the audio to be recommended: acquiring an audio list to be recommended, and sequentially calculating the matching degree of each audio to be recommended in the audio list to be recommended and the user tone quality characteristic preference; the recommendation result comprises: and the list of the audio to be recommended is sorted according to the matching degree with the user tone quality characteristic preference.
10. An audio recommendation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring audio to be recommended;
the characteristic coding module is used for extracting the characteristics of the audio to be recommended to obtain the timbre characteristic coding vector of the audio to be recommended;
and the calculation module is used for obtaining a recommendation result based on the timbre characteristic coding vector of the audio to be recommended and a timbre characteristic coding vector set preferred by the user, wherein the recommendation result comprises the matching degree of the audio to be recommended and the timbre characteristic preference of the user.
11. An audio recommendation system comprising a memory, a processor and a computer program stored on said memory and running on said processor, characterized in that said processor implements the steps of the method of any of claims 1 to 9 when executing said computer program.
12. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a computer, implements the steps of the method of any of claims 1 to 9.
CN202011550325.9A 2020-12-24 2020-12-24 Audio recommendation method, device and system and storage medium Pending CN112632318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011550325.9A CN112632318A (en) 2020-12-24 2020-12-24 Audio recommendation method, device and system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011550325.9A CN112632318A (en) 2020-12-24 2020-12-24 Audio recommendation method, device and system and storage medium

Publications (1)

Publication Number Publication Date
CN112632318A true CN112632318A (en) 2021-04-09

Family

ID=75324280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011550325.9A Pending CN112632318A (en) 2020-12-24 2020-12-24 Audio recommendation method, device and system and storage medium

Country Status (1)

Country Link
CN (1) CN112632318A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674063A (en) * 2021-08-27 2021-11-19 卓尔智联(武汉)研究院有限公司 Shopping recommendation method, shopping recommendation device and electronic equipment
CN113779415A (en) * 2021-10-22 2021-12-10 平安科技(深圳)有限公司 Training method, device and equipment of news recommendation model and storage medium
CN116911641A (en) * 2023-09-11 2023-10-20 深圳市华傲数据技术有限公司 Sponsored recommendation method, sponsored recommendation device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595550A (en) * 2018-04-10 2018-09-28 南京邮电大学 A kind of music commending system and recommendation method based on convolutional neural networks
CN109147804A (en) * 2018-06-05 2019-01-04 安克创新科技股份有限公司 A kind of acoustic feature processing method and system based on deep learning
CN109147807A (en) * 2018-06-05 2019-01-04 安克创新科技股份有限公司 A kind of range balance method, apparatus and system based on deep learning
CN109582817A (en) * 2018-10-30 2019-04-05 努比亚技术有限公司 A kind of song recommendations method, terminal and computer readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595550A (en) * 2018-04-10 2018-09-28 南京邮电大学 A kind of music commending system and recommendation method based on convolutional neural networks
CN109147804A (en) * 2018-06-05 2019-01-04 安克创新科技股份有限公司 A kind of acoustic feature processing method and system based on deep learning
CN109147807A (en) * 2018-06-05 2019-01-04 安克创新科技股份有限公司 A kind of range balance method, apparatus and system based on deep learning
CN109582817A (en) * 2018-10-30 2019-04-05 努比亚技术有限公司 A kind of song recommendations method, terminal and computer readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113674063A (en) * 2021-08-27 2021-11-19 卓尔智联(武汉)研究院有限公司 Shopping recommendation method, shopping recommendation device and electronic equipment
CN113674063B (en) * 2021-08-27 2024-01-12 卓尔智联(武汉)研究院有限公司 Shopping recommendation method, shopping recommendation device and electronic equipment
CN113779415A (en) * 2021-10-22 2021-12-10 平安科技(深圳)有限公司 Training method, device and equipment of news recommendation model and storage medium
CN116911641A (en) * 2023-09-11 2023-10-20 深圳市华傲数据技术有限公司 Sponsored recommendation method, sponsored recommendation device, computer equipment and storage medium
CN116911641B (en) * 2023-09-11 2024-02-02 深圳市华傲数据技术有限公司 Sponsored recommendation method, sponsored recommendation device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
JP7137639B2 (en) SOUND QUALITY CHARACTERISTICS PROCESSING METHOD AND SYSTEM BASED ON DEEP LEARNING
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
CN109147807B (en) Voice domain balancing method, device and system based on deep learning
CN112632318A (en) Audio recommendation method, device and system and storage medium
US9401154B2 (en) Systems and methods for recognizing sound and music signals in high noise and distortion
CN108255840B (en) Song recommendation method and system
CN103729368B (en) A kind of robust audio recognition methods based on local spectrum iamge description
CN108549675B (en) Piano teaching method based on big data and neural network
KR101804967B1 (en) Method and system to recommend music contents by database composed of user's context, recommended music and use pattern
RU2427909C2 (en) Method to generate print for sound signal
CN111816170A (en) Training of audio classification model and junk audio recognition method and device
CN111428078A (en) Audio fingerprint coding method and device, computer equipment and storage medium
CN105895079A (en) Voice data processing method and device
US11410706B2 (en) Content pushing method for display device, pushing device and display device
CN111460215B (en) Audio data processing method and device, computer equipment and storage medium
Siddiquee et al. Association rule mining and audio signal processing for music discovery and recommendation
CN108777804B (en) Media playing method and device
CN111859008A (en) Music recommending method and terminal
CN111477248B (en) Audio noise detection method and device
CN113450811B (en) Method and equipment for performing transparent processing on music
Shirali-Shahreza et al. Fast and scalable system for automatic artist identification
Prapcoyo et al. Implementation of Mel Frequency Cepstral Coefficient and Dynamic Time Warping For Bird Sound Classification
CN113744721B (en) Model training method, audio processing method, device and readable storage medium
Doungpaisan et al. Query by Example of Speaker Audio Signals using Power Spectrum and MFCCs
Horsburgh et al. Music-inspired texture representation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210409

RJ01 Rejection of invention patent application after publication