CN109582822A - A kind of music recommended method and device based on user speech - Google Patents

Publication number
CN109582822A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811222418.1A
Other languages
Chinese (zh)
Inventor
赵涛涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811222418.1A
Publication of CN109582822A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G10L17/00 Speaker identification or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces

Abstract

The application provides a music recommendation method and device based on user speech. The method includes: obtaining voice data of a user and extracting the corresponding user category and recognized text from the voice data; obtaining corresponding recommended music according to the user category and the recognized text; and presenting the corresponding recommended music to the user. This avoids the problem in the prior art that the user's age is not considered when recommending music, which makes recommendations for older users inaccurate. The recommendation strategy is more complete and the recommendations are more accurate, which improves user satisfaction.

Description

A music recommendation method and device based on user speech
[Technical field]
This application relates to the field of artificial intelligence applications, and in particular to a music recommendation method and device based on user speech.
[Background]
Artificial intelligence (AI) is a new technical science that researches and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. It is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Research in the field includes robotics, speech recognition, image recognition, natural language processing, and expert systems.
In recent years, artificial intelligence technology has developed considerably and is gradually being commercialized. This is especially true of intelligent voice dialogue products: with the rise abroad of the Amazon Echo and Google Home smart speakers, smart home products that use dialogue as their main mode of interaction, smart speakers in particular, have surged in popularity.
The typical usage scenario for intelligent voice dialogue products, including smart speakers, is the home, where it is very natural for a user to interact with a machine by voice. One of the more frequent applications in such interaction is playing songs in response to the user's speech.
In current smart home products, when a user expresses a non-precise demand such as "play a song", the cloud generally recommends some new songs or popular songs at random. These recommendations do not take the user's age into account. Older users spend relatively little time online, so their searches and downloads make up a small share of the data recorded by the server. As a result, new songs and popular songs often cover only the needs of some younger users, and the needs of older users often go unmet.
It can be seen that in conventional music recommendation methods the content recommendation strategy is neither complete nor accurate, and user satisfaction is low.
[Summary of the invention]
Various aspects of the application provide a music recommendation method and device based on user speech, so as to provide users with personalized service.
One aspect of the application provides a music recommendation method based on user speech, comprising:
obtaining voice data of a user, and extracting the corresponding user category and recognized text from the voice data;
obtaining corresponding recommended music according to the user category and the recognized text;
presenting the corresponding recommended music to the user.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the user category includes user gender and user age range.
In the above aspect and any possible implementation thereof, an implementation is further provided in which extracting the corresponding user category from the voice data comprises:
identifying, from the obtained user speech and by means of voiceprint recognition, the user category of the user who issued the voice command.
In the above aspect and any possible implementation thereof, an implementation is further provided in which, before identifying the user category of the user who issued the voice command from the obtained user speech by means of voiceprint recognition, the method further comprises:
performing model training according to the voice characteristics of different user categories, and establishing voiceprint processing models for the different user categories.
In the above aspect and any possible implementation thereof, an implementation is further provided in which extracting the corresponding recognized text from the voice data comprises:
performing speech recognition on the voice data using a speech recognition model, to obtain the text request corresponding to the voice data; or,
performing speech recognition on the voice data using the speech recognition model of the corresponding user category, to obtain the text request corresponding to the voice data.
In the above aspect and any possible implementation thereof, an implementation is further provided in which obtaining corresponding recommended music according to the user category and the recognized text comprises:
determining the type of the recognized text;
if it is of the precise demand type, searching for the corresponding recommended music according to the recognized text;
if it is of the general demand type, obtaining the corresponding recommended music, according to the user category, from the recommended music library corresponding to each user category.
In the above aspect and any possible implementation thereof, an implementation is further provided in which, before obtaining the corresponding recommended music from the recommended music library corresponding to each user category according to the user category, the method further comprises:
establishing the recommended music library corresponding to each user category, based on the searched music content that is extracted from the sample voice request data of each historical user and that corresponds to the user category of that historical user.
Another aspect of the application provides a music recommendation device based on user speech, comprising:
an extraction module, configured to obtain voice data of a user and extract the corresponding user category and recognized text from the voice data;
a search module, configured to obtain corresponding recommended music according to the user category and the recognized text;
a presentation module, configured to present the corresponding recommended music to the user.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the user category includes user gender and user age range.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the extraction module includes a voiceprint recognition submodule, configured to identify, from the obtained user speech and by means of voiceprint recognition, the user category of the user who issued the voice command.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the extraction module further includes a voiceprint processing model establishing submodule, configured to perform model training according to the voice characteristics of different user categories and establish voiceprint processing models for the different user categories.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the extraction module includes a speech recognition submodule, configured to perform speech recognition on the voice data using a speech recognition model, to obtain the text request corresponding to the voice data; or to perform speech recognition on the voice data using the speech recognition model of the corresponding user category, to obtain the text request corresponding to the voice data.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the search module is specifically configured to:
determine the type of the recognized text;
if it is of the precise demand type, search for the corresponding recommended music according to the recognized text;
if it is of the general demand type, obtain the corresponding recommended music, according to the user category, from the recommended music library corresponding to each user category.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the search module further includes a recommended music library establishing submodule, configured to:
establish the recommended music library corresponding to each user category, based on the searched music content that is extracted from the sample voice request data of each historical user and that corresponds to the user category of that historical user.
Another aspect of the application provides an apparatus, characterized in that the apparatus comprises:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement any of the methods described above.
Another aspect of the application provides a computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements any of the methods described above.
As can be seen from the above, with the scheme of the application the recommendation strategy is more complete and the recommendations are more accurate, which improves user satisfaction.
[Brief description of the drawings]
To explain the technical solutions in the embodiments of the application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the speech-based music recommendation method provided by some embodiments of the application;
Fig. 2 is a structural diagram of the speech-based music recommendation device provided by some embodiments of the application;
Fig. 3 is a block diagram of an exemplary computer system/server suitable for implementing embodiments of the invention.
[Detailed description]
To make the purposes, technical solutions, and advantages of the embodiments of the application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the application. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the application.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a flow diagram of the music recommendation method based on user speech provided by one embodiment of the application. As shown in Fig. 1, the method comprises the following steps:
Step S11: obtain voice data of a user, and extract the corresponding user category and recognized text from the voice data;
Step S12: obtain corresponding recommended music according to the user category and the recognized text;
Step S13: present the corresponding recommended music to the user.
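The three steps can be sketched as a small pipeline. This is a minimal illustration only: the names `UserCategory`, `recommend_from_speech`, and the three callables passed in are hypothetical and do not appear in the application, which does not prescribe any particular interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class UserCategory:
    """Hypothetical container for the two attributes named in the text."""
    gender: str      # e.g. "male" or "female"
    age_range: str   # e.g. "61-70"


def recommend_from_speech(
    voice_data: bytes,
    identify_category: Callable[[bytes], UserCategory],
    recognize_text: Callable[[bytes], str],
    find_music: Callable[[UserCategory, str], List[str]],
) -> List[str]:
    # Step S11: extract the user category and the recognized text.
    category = identify_category(voice_data)
    text = recognize_text(voice_data)
    # Step S12: obtain recommended music from the category and the text.
    music = find_music(category, text)
    # Step S13: return the result so the caller can present it to the user.
    return music
```

Passing the recognizers in as callables keeps the sketch independent of any specific voiceprint or speech recognition engine.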
In a preferred implementation of step S11:
Preferably, the user category is identified from the obtained voice data of the user by means of voiceprint recognition.
Specifically, the user category includes user gender and user age range.
Taking user gender as an example, the attribute values are male and female.
Taking age as an example, the attribute values of the user age range are child, youth, middle-aged, and elderly; or the values are 1-5, 6-9, 10-15, 16-18, 19-25, 26-30, 31-35, 36-40, 41-50, 51-60, 61-70, and 71-80 years old, and over 80 years old.
Because different user categories, that is, user groups of different genders and age ranges, have distinctive voiceprint features, model training can be carried out before voiceprint recognition according to the voice characteristics of the different user categories, establishing a voiceprint processing model for each user category so that voiceprint analysis can be performed for the user group of each category. When a user initiates a voice search, the gender and age range of the user who issued the voice command can be identified from that command by means of voiceprint recognition.
Before voiceprint recognition, the speaker's voiceprint must first be modeled, that is, "trained" or "learned". The model is generally a neural network model from deep learning, such as a deep neural network model or a convolutional neural network model. Specifically, a deep neural network (DNN) voiceprint baseline system is applied to extract a first feature vector for each utterance in the training set; a gender classifier and an age classifier are then trained separately from the first feature vector of each utterance and the pre-annotated gender and age range labels, thereby establishing voiceprint processing models that distinguish gender and age range.
From the obtained voice command, first feature information of the voice command is extracted and sent to the pre-generated gender classifier and age range classifier. The gender classifier and age range classifier analyze the first feature information and obtain its gender label and age range label, that is, the gender label and age range label of the voice command.
For example, where the gender classifier is a Gaussian mixture model, fundamental frequency features and mel-frequency cepstral coefficient (MFCC) features can first be extracted from the voice request; a posterior probability value can then be computed for the fundamental frequency and MFCC features based on the Gaussian mixture model, and the user's gender determined from the result. For instance, assuming the Gaussian mixture model is a male Gaussian mixture model, the user can be determined to be male when the computed posterior probability value is high, for example above a certain threshold, and female when the value is low, for example below a certain threshold.
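The threshold rule in the Gaussian mixture example can be sketched as follows. This is an assumption-laden illustration, not the application's implementation: it scores frames with the average log-likelihood under a diagonal-covariance male model as a stand-in for the posterior probability value, and the function names, the component layout, and the threshold are all hypothetical.

```python
import math
from typing import Sequence, Tuple

# One mixture component: (weight, per-dimension means, per-dimension variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def gmm_log_likelihood(x: Sequence[float], components: Sequence[Component]) -> float:
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    log_terms = []
    for weight, means, variances in components:
        log_p = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


def classify_gender(
    frames: Sequence[Sequence[float]],
    male_gmm: Sequence[Component],
    threshold: float,
) -> str:
    """Return "male" if the average score under the male model exceeds the threshold."""
    score = sum(gmm_log_likelihood(f, male_gmm) for f in frames) / len(frames)
    return "male" if score > threshold else "female"
```

In practice each frame would hold the fundamental frequency and MFCC features named above; the one-dimensional frames used to exercise the sketch are only a simplification.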
Preferably, the voiceprint feature is a d-vector feature, a feature extracted by a deep neural network (DNN); specifically, it is the output of the last hidden layer of the DNN.
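A minimal sketch of reading a d-vector from the last hidden layer is shown below. It assumes a small fully connected network with ReLU activations and averages the per-frame activations into one utterance-level vector; the averaging step, the activation choice, and the names `relu_layer` and `d_vector` are assumptions for illustration and are not stated in the application.

```python
from typing import List, Sequence, Tuple

# One hidden layer: (weight rows, biases). The output layer is deliberately omitted.
Layer = Tuple[Sequence[Sequence[float]], Sequence[float]]


def relu_layer(x: Sequence[float], weights: Sequence[Sequence[float]],
               bias: Sequence[float]) -> List[float]:
    """Apply one fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def d_vector(frames: Sequence[Sequence[float]],
             hidden_layers: Sequence[Layer]) -> List[float]:
    """Average the last hidden layer's activations over all frames."""
    total: List[float] = []
    for frame in frames:
        activation = list(frame)
        for weights, bias in hidden_layers:
            activation = relu_layer(activation, weights, bias)
        total = activation if not total else [t + a for t, a in zip(total, activation)]
    return [t / len(frames) for t in total]
```

The resulting vector would then be fed to the gender and age range classifiers described above.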
Preferably, speech recognition is performed on the obtained voice data of the user according to a preset speech recognition model, to obtain the recognized text corresponding to the voice data.
Speech recognition is performed on the user's voice data using an existing speech recognition method (the specific method is the same as in the prior art and is not specifically limited in this embodiment), yielding the recognized text corresponding to the voice data.
Preferably, the speech recognition step can be carried out at the same time as the voiceprint recognition step described above.
Preferably, in a preferred embodiment of the application:
the voiceprint recognition step described above is carried out first to identify the user category;
then, according to the user category, speech recognition is performed on the voice command using the speech recognition model of the corresponding user category, to obtain the text request corresponding to the voice data.
Specifically, speech samples for each user type are collected to form a corpus, and the corpus is used to train a speech recognition model, yielding the speech recognition model for that user type.
Using the corresponding speech recognition model for each type of user can improve the accuracy of speech recognition.
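The "voiceprint first, then category-specific recognition" ordering can be sketched as a lookup with a generic fallback. The function name, the dictionary of per-category recognizers, and the fallback behaviour are assumptions for illustration; the application only states that either a generic model or a category-specific model may be used.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]


def recognize_for_user(
    voice_data: bytes,
    identify_category: Callable[[bytes], str],
    per_category: Dict[str, Recognizer],
    generic: Recognizer,
) -> str:
    # Voiceprint recognition runs first and yields the user category.
    category = identify_category(voice_data)
    # Use the model trained on that category's corpus, or fall back to
    # the generic model when no category-specific model is available.
    recognizer = per_category.get(category, generic)
    return recognizer(voice_data)
```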
Preferably, the text request is one of two types: precise demand or general demand. For example, a precise demand is "Play the song Super Star"; a general demand is "play a song".
In a preferred implementation of step S12:
corresponding recommended music is obtained according to the user category and the recognized text.
Preferably, the type of the recognized text is determined first. For a precise demand, the user category is not considered; a precise query is performed directly to find the corresponding music list. For a general demand, music is filtered according to the user category, including age and gender.
Preferably, the semantic features of the text corresponding to the voice data are extracted from the recognized text. The semantic features characterize the semantic information of that text and can be represented by its word vectors or sentence vector. The sentence vector can be obtained by summing the word vectors of the words in the recognized text and taking the mean. The word vector extraction method is the same as in the prior art, for example extracting the word vector of each word in the recognized text with Word2Vec; this embodiment does not specifically limit it.
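The sentence vector described above, the mean of the word vectors, can be sketched as follows. The lookup table stands in for vectors produced by a tool such as Word2Vec; the function name and the choice to skip out-of-vocabulary words are assumptions for illustration.

```python
from typing import Dict, List, Sequence


def sentence_vector(tokens: Sequence[str],
                    word_vectors: Dict[str, Sequence[float]]) -> List[float]:
    """Average the word vectors of the tokens that have a known vector."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return []  # No known words: no sentence vector can be formed.
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]
```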
Preferably, word segmentation is performed on the recognized text to determine whether it is a music recommendation instruction and whether it includes a specific music title. If it is a music recommendation instruction that includes a specific music title, it is judged to be a precise demand; if it is a music recommendation instruction that does not include a specific music title, it is judged to be a general demand.
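The decision rule can be sketched as below. The sketch replaces real word segmentation with simple substring matching against a hypothetical list of known titles and instruction words, so it illustrates only the precise/general distinction, not the segmentation itself.

```python
from typing import Optional, Sequence, Tuple


def classify_request(
    text: str,
    known_titles: Sequence[str],
    instruction_words: Sequence[str] = ("play", "song", "music"),
) -> Tuple[str, Optional[str]]:
    """Return ("precise", title), ("general", None), or ("other", None)."""
    lowered = text.lower()
    # Not a music instruction at all.
    if not any(word in lowered for word in instruction_words):
        return ("other", None)
    # A music instruction naming a specific title is a precise demand.
    for title in known_titles:
        if title.lower() in lowered:
            return ("precise", title)
    # A music instruction with no specific title is a general demand.
    return ("general", None)
```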
Preferably, for a precise demand such as "Play the song Super Star", the user has already specified the music to be played, so there is no need to consider the user category; the corresponding music list is looked up directly and presented.
Preferably, the specific music title contained in the recognized text is used as the search term, and a music search list is obtained.
Preferably, for a general demand such as "play a song", the user has not specified the music to be played, so a recommendation must be made according to the user category.
Preferably, the corresponding recommended content is obtained from the recommended music library according to the user category, including age and gender. The recommended music library contains multiple correspondence models, and each correspondence model is built from the user category of each historical user and the corresponding searched music content, both extracted from the historical users' sample voice request data.
Preferably, the recommended music library is built by collecting in advance the interaction information between a large number of historical users and smart home products that implement this scheme. The interaction information includes user voice data, the user category corresponding to the user voice data, and the music titles corresponding to users' precise demands.
Preferably, each correspondence model can also be built from the music search records of different categories of users on other music servers. In that case, users are clustered into multiple user categories according to user attribute information, for example the gender and age information filled in by the user.
Preferably, within each correspondence model, the music is sorted by its search count or play frequency.
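Building the per-category library from historical requests and ordering it by request count can be sketched as follows. The record layout, a pair of user category and requested title, and the function name are assumptions for illustration; the application does not define a storage format.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Category = Tuple[str, str]  # (gender, age range), e.g. ("female", "61-70")


def build_library(history: Iterable[Tuple[Category, str]]) -> Dict[Category, List[str]]:
    """Group requested titles by user category, most requested first."""
    counts: Dict[Category, Counter] = defaultdict(Counter)
    for category, title in history:
        counts[category][title] += 1
    # most_common() orders titles by descending count within each category.
    return {
        category: [title for title, _ in counter.most_common()]
        for category, counter in counts.items()
    }
```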
For example, for the user category with age 61-70 and gender female, the music candidate results are obtained from the corresponding correspondence model.
Preferably, in a preferred implementation of this embodiment, users are clustered by age, and a correspondence model is established for the users of each age range. That is, the music candidate results in a correspondence model depend only on the user's age, and the user's gender is set aside for the moment.
After the corresponding music recommendation results are obtained according to the user's age, they are further filtered according to the user's gender. For example, for a female user, the music whose singer is female is selected from the music recommendation results as the final recommendation result.
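The two-stage selection, age first and then gender, can be sketched as follows. The application gives only the example of selecting female singers for a female user; generalizing this to "singer gender equals user gender" and falling back to the unfiltered list when nothing matches are both assumptions of this sketch, as are the names used.

```python
from typing import Dict, List


def recommend_by_age_then_gender(
    age_range: str,
    gender: str,
    library_by_age: Dict[str, List[str]],
    singer_gender: Dict[str, str],
) -> List[str]:
    # Stage 1: candidates depend only on the user's age range.
    candidates = library_by_age.get(age_range, [])
    # Stage 2: keep titles whose singer matches the user's gender (assumed rule).
    matching = [title for title in candidates if singer_gender.get(title) == gender]
    # Assumed fallback: if the filter removes everything, keep the age-based list.
    return matching if matching else candidates
```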
In a preferred implementation of step S13:
the corresponding recommended music is presented to the user.
Preferably, for a precise demand such as "Play the song Super Star", the top-ranked music in the retrieved "Super Star" music list is played directly, where the ranking is based on search and play counts.
Preferably, for a general demand such as "play a song", the music recommendation results are presented to the user. The results include the top-ranked song, or a list of several top-ranked songs, ordered by search and play counts, for the user to choose from.
With the method of this embodiment, the corresponding user category and recognized text can be extracted from the user's voice data and corresponding recommended music obtained, so that music can be provided in a targeted way to different user categories. Compared with conventional recommendation strategies, which cannot learn the user's age and gender, taking age and gender into account makes the recommendation strategy more complete and the recommendations more accurate, which improves user satisfaction. Because the recommendation relies on implicit recognition, the user will not obviously notice even if an occasional recognition error leads to a wrong recommendation.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is presented as a series of combined actions. Those skilled in the art should understand, however, that the application is not limited by the described order of actions, because according to the application some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, refer to the related descriptions of other embodiments.
Fig. 2 is a structural diagram of the music recommendation device based on user speech provided by one embodiment of the application. As shown in Fig. 2, the device comprises:
an extraction module 21, configured to obtain voice data of a user and extract the corresponding user category and recognized text from the voice data;
a search module 22, configured to obtain corresponding recommended music according to the user category and the recognized text;
a presentation module 23, configured to present the corresponding recommended music to the user.
In a preferred implementation of the extraction module 21:
Preferably, the extraction module 21 includes a voiceprint recognition submodule, configured to identify the user category from the obtained voice data of the user by means of voiceprint recognition.
Specifically, the user category includes user gender and user age range.
Taking user gender as an example, the attribute values are male and female.
Taking age as an example, the attribute values of the user age range are child, youth, middle-aged, and elderly; or the values are 1-5, 6-9, 10-15, 16-18, 19-25, 26-30, 31-35, 36-40, 41-50, 51-60, 61-70, and 71-80 years old, and over 80 years old.
Because different user categories, that is, user groups of different genders and age ranges, have distinctive voiceprint features, the extraction module 21 further includes a voiceprint processing model establishing submodule, configured to carry out model training before voiceprint recognition according to the voice characteristics of the different user categories and to establish a voiceprint processing model for each user category, so that voiceprint analysis can be performed for the user group of each category. When a user initiates a voice search, the gender and age range of the user who issued the voice command can be identified from that command by means of voiceprint recognition.
Before voiceprint recognition, the speaker's voiceprint must first be modeled, that is, "trained" or "learned". The model is generally a neural network model from deep learning, such as a deep neural network model or a convolutional neural network model. Specifically, a deep neural network (DNN) voiceprint baseline system is applied to extract a first feature vector for each utterance in the training set; a gender classifier and an age classifier are then trained separately from the first feature vector of each utterance and the pre-annotated gender and age range labels, thereby establishing voiceprint processing models that distinguish gender and age range.
From the obtained voice command, first feature information of the voice command is extracted and sent to the pre-generated gender classifier and age range classifier. The gender classifier and age range classifier analyze the first feature information and obtain its gender label and age range label, that is, the gender label and age range label of the voice command.
For example, where the gender classifier is a Gaussian mixture model, fundamental frequency features and mel-frequency cepstral coefficient (MFCC) features can first be extracted from the voice request; a posterior probability value can then be computed for the fundamental frequency and MFCC features based on the Gaussian mixture model, and the user's gender determined from the result. For instance, assuming the Gaussian mixture model is a male Gaussian mixture model, the user can be determined to be male when the computed posterior probability value is high, for example above a certain threshold, and female when the value is low, for example below a certain threshold.
Preferably, the voiceprint feature is a d-vector feature, a feature extracted by a deep neural network (DNN); specifically, it is the output of the last hidden layer of the DNN.
Preferably, the extraction module 21 further includes speech recognition submodule, for according to preset speech recognition modeling Speech recognition is carried out to the voice data of accessed user, to obtain the corresponding identification text of the voice data.
Speech recognition is performed on the user's voice data using an existing speech recognition method (the specific method is the same as in the prior art and is not specifically limited in this embodiment) to obtain the recognition text corresponding to the voice data.
Preferably, the speech recognition step may be performed simultaneously with the voiceprint recognition step described above.
Preferably, in a preferred embodiment of the present application, the speech recognition submodule performs speech recognition on the voice data using the speech recognition model of the corresponding user category, so as to obtain the text request corresponding to the voice data.
The voiceprint recognition step described above is first performed by the voiceprint recognition submodule to identify the user category;
the speech recognition submodule then, according to the user category, performs speech recognition on the command voice using the speech recognition model of the corresponding user category, so as to obtain the text request corresponding to the voice data.
Specifically, speech material corresponding to the different user types is collected to form a corpus, and speech recognition model training is performed using the corpus to obtain the speech recognition model of the corresponding user type.
Using the corresponding speech recognition model for each type of user can improve the accuracy of speech recognition.
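The two-stage flow above (identify the user category first, then recognize with that category's model) can be sketched as a simple dispatch. The recognizers here are stand-in callables, and the fallback to a general model when no category-specific model exists is an assumption; all names are hypothetical.

```python
from typing import Callable, Dict, Tuple

UserCategory = Tuple[str, str]        # (gender, age group); labels are illustrative
Recognizer = Callable[[bytes], str]   # maps raw voice data to recognized text


def recognize_for_user(
    voice_data: bytes,
    identify_category: Callable[[bytes], UserCategory],
    category_models: Dict[UserCategory, Recognizer],
    general_model: Recognizer,
) -> Tuple[UserCategory, str]:
    # Step 1: voiceprint recognition yields the user category.
    category = identify_category(voice_data)
    # Step 2: use that category's model; fall back to the general model
    # when no dedicated model exists (the fallback is an assumption).
    model = category_models.get(category, general_model)
    return category, model(voice_data)


if __name__ == "__main__":
    models = {("female", "child"): lambda v: "text from the child-speech model"}
    print(recognize_for_user(b"\x00", lambda v: ("female", "child"), models,
                             lambda v: "text from the general model"))
```

Keeping the models in a dictionary keyed by category means that adding a new user category requires training one more model and adding one entry, without touching the dispatch logic.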
Preferably, the text request is of one of the following two types: precise demand and general demand. For example, a precise demand is "play the song Super Star"; a general demand is "play a song".
In a preferred implementation of the searching module 22,
the searching module 22 is configured to obtain the corresponding recommended music according to the user category and the recognition text.
Preferably, the type of the recognition text is determined first. For a precise demand, the user's category is not considered, and an exact query is performed directly to find the corresponding music list; for a general demand, music is screened according to the user's category, including age and gender.
Preferably, semantic features of the text corresponding to the voice data are extracted from the recognition text. The semantic features characterize the semantic information of that text and may be represented by the word vectors or the sentence vector of the text. The sentence vector may be obtained by summing the word vectors of the words in the recognition text and taking the mean. The word vectors are extracted as in the prior art, for example by using Word2Vec to extract the word vector of each word in the recognition text; this is not specifically limited in this embodiment.
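The sentence vector described above (the sum of the word vectors divided by the number of words) can be sketched as follows. The word vectors are hypothetical two-dimensional values standing in for Word2Vec output, and skipping out-of-vocabulary words is an assumption the source does not address.

```python
def sentence_vector(tokens, word_vectors):
    """Mean of the word vectors of the tokens that have a vector."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None  # no token has a vector, so no sentence vector can be formed
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy embedding table.
WORD_VECTORS = {
    "play": [1.0, 0.0],
    "a": [0.0, 1.0],
    "song": [1.0, 1.0],
}

if __name__ == "__main__":
    print(sentence_vector(["play", "a", "song"], WORD_VECTORS))
```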
Preferably, word segmentation is performed on the recognition text to determine whether the recognition text is a music recommendation instruction and whether it includes a specific music title. A music recommendation instruction that includes a specific music title is judged to be a precise demand; one that does not include a specific music title is judged to be a general demand.
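The judgment above can be sketched as a small rule-based classifier. Several parts are assumptions: whitespace splitting stands in for word segmentation, a fixed set of trigger words stands in for detecting a music recommendation instruction, and the title catalog is hypothetical.

```python
def classify_request(text, known_titles, trigger_words=("play", "song", "music")):
    """Return (demand_type, title) for a recognized text request."""
    lowered = text.lower()
    tokens = lowered.split()  # stand-in for real word segmentation
    if not any(token in trigger_words for token in tokens):
        return ("not_music", None)
    # Check longer titles first so a longer title wins over a shorter one it contains.
    for title in sorted(known_titles, key=len, reverse=True):
        if title.lower() in lowered:
            return ("precise", title)
    return ("general", None)


if __name__ == "__main__":
    catalog = {"Super Star"}
    print(classify_request("Play song Super Star", catalog))
    print(classify_request("play a song", catalog))
```

A production system would more likely use a trained intent classifier and entity recognizer, but the two outcomes it must distinguish are the ones shown here.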
Preferably, for a precise demand such as "play the song Super Star", the user has already specified the music to be played; the user's category therefore need not be considered, and the corresponding music list is searched for directly and presented.
Preferably, the specific music title included in the recognition text is used as the search term to obtain a music search list.
Preferably, for a general demand such as "play a song", the user has not specified the music to be played, so a recommendation needs to be made according to the user category.
Preferably, the corresponding recommended content is obtained from the recommendation music library according to the user category, including age and gender. The searching module 22 further includes a recommendation music library establishing submodule, configured to establish the recommendation music library corresponding to each user category based on the searched music content, corresponding to the user category of each historical user, that is extracted from the sample voice request data of the historical users.
Preferably, the recommendation music library includes multiple correspondence models, each established based on a historical user category extracted from the sample voice request data of the historical users and the searched music content corresponding to that category.
Preferably, the recommendation music library is constructed in advance by collecting interaction information between a large number of historical users and the smart home products that execute the scheme; the interaction information includes user voice data, the user category corresponding to the user voice data, and the music title corresponding to the user's precise demand.
Preferably, the correspondence models may also be established from the music search records of users of different categories on other music servers. In that case, users are clustered into multiple user categories according to their attribute information, for example the gender and age information they have filled in.
Preferably, within a correspondence model, the music is sorted by its search count/play frequency.
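Building the per-category libraries from historical precise-demand requests can be sketched as below. The record layout, the ten-year age brackets (chosen to match a bracket such as "61-70"), and the function names are assumptions made for illustration. Each library is ordered by how often the title was requested within that category.

```python
from collections import Counter, defaultdict


def age_group(age, width=10):
    """Map an age to a bracket such as '61-70'."""
    low = ((age - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def build_libraries(history):
    """history: iterable of (gender, age, title) records from precise-demand requests."""
    counts = defaultdict(Counter)
    for gender, age, title in history:
        counts[(gender, age_group(age))][title] += 1
    # Most frequently requested titles first within each user category.
    return {
        category: [title for title, _ in counter.most_common()]
        for category, counter in counts.items()
    }


if __name__ == "__main__":
    sample_history = [
        ("female", 65, "Song A"),
        ("female", 68, "Song B"),
        ("female", 70, "Song A"),
        ("male", 30, "Song C"),
    ]
    print(build_libraries(sample_history))
```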
For example, for the user category with age 61-70 and gender female, the music candidate results in its correspondence model are obtained.
Preferably, in a preferred implementation of this embodiment, clustering is performed according to user age, and correspondence models are established for users of different ages. That is, the music candidate results in a correspondence model are related only to the user's age, and the user's gender is not considered at this stage.
After the corresponding music recommendation results are obtained according to the user's age, they are further screened according to the user's gender; for example, for a female user, music whose singer is a female singer is selected from the music recommendation results as the final recommendation result.
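The age-first, gender-second screening can be sketched as follows. The source gives only the female-user example; applying the same rule to male users, and falling back to the unfiltered candidates when the gender filter removes everything, are assumptions. The data structures and names are hypothetical.

```python
def recommend(age_bracket, user_gender, age_models, singer_gender):
    """Pick candidates by age bracket, then keep tracks whose singer matches the user's gender."""
    candidates = age_models.get(age_bracket, [])
    filtered = [title for title in candidates if singer_gender.get(title) == user_gender]
    # Assumption: if the filter leaves nothing, return the age-based candidates as they are.
    return filtered or candidates


if __name__ == "__main__":
    age_models = {"61-70": ["Song A", "Song B", "Song C"]}
    singer_gender = {"Song A": "female", "Song B": "male", "Song C": "female"}
    print(recommend("61-70", "female", age_models, singer_gender))
```

Because the candidate list is already ordered inside the age-based model, filtering preserves that order.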
In a preferred implementation of the display module 23,
the display module 23 is configured to present the corresponding recommended music to the user.
Preferably, for a precise demand such as "play the song Super Star", the highest-ranked "Super Star" track in the retrieved music list is played directly, where the ranking is based on search and play counts.
Preferably, for a general demand such as "play a song", the music recommendation results are shown to the user; they include the top-ranked one or more songs in a list ordered by search and play counts, for the user to choose from.
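The two presentation behaviors can be sketched as below. The source says the ranking uses search and play counts but not how they are combined; summing them is an assumption, as are the record fields, the default list length, and the shape of the returned value.

```python
def rank(tracks):
    """Order tracks by combined search and play counts, highest first."""
    ordered = sorted(tracks, key=lambda t: t["searches"] + t["plays"], reverse=True)
    return [t["title"] for t in ordered]


def present(demand_type, ranked_titles, top_n=3):
    """Precise demand: play the top track. General demand: show the top-N list."""
    if not ranked_titles:
        return {"action": "none", "items": []}
    if demand_type == "precise":
        return {"action": "play", "items": ranked_titles[:1]}
    return {"action": "show_list", "items": ranked_titles[:top_n]}


if __name__ == "__main__":
    results = [
        {"title": "Super Star (live)", "searches": 5, "plays": 10},
        {"title": "Super Star", "searches": 50, "plays": 200},
    ]
    print(present("precise", rank(results)))
```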
With the apparatus of this embodiment, the user category and recognition text corresponding to a user's voice data can be extracted from that voice data to obtain the corresponding recommended music, so that music can be provided in a targeted way to different user categories. Traditional recommendation strategies cannot learn the user's age and gender; once age and gender are taken into account, the recommendation strategy becomes more complete and the recommendations more accurate, which improves user satisfaction. Because the recognition is implicit, the user will not obviously perceive an occasional recognition or recommendation error.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the terminal and server described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Fig. 3 shows a block diagram of an exemplary computer system/server 012 suitable for implementing embodiments of the present invention. The computer system/server 012 shown in Fig. 3 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 3, the computer system/server 012 takes the form of a general-purpose computing device. Its components may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 connecting the different system components (including the system memory 028 and the processing unit 016).
Bus 018 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 012 typically includes a variety of computer system readable media. These media may be any available media that can be accessed by the computer system/server 012, including volatile and non-volatile media, and removable and non-removable media.
The system memory 028 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 034 may be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 3, commonly called a "hard disk drive"). Although not shown in Fig. 3, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to the bus 018 through one or more data media interfaces. The memory 028 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.
A program/utility 040 having a set of (at least one) program modules 042 may be stored, for example, in the memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 042 generally perform the functions and/or methods of the embodiments described in the present invention.
The computer system/server 012 may also communicate with one or more external devices 014 (such as a keyboard, a pointing device, a display 024, etc.); in the present invention, the computer system/server 012 communicates with an external radar device. It may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (such as a network card, a modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 022. In addition, the computer system/server 012 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 020. As shown in Fig. 3, the network adapter 020 communicates with the other modules of the computer system/server 012 through the bus 018. It should be understood that, although not shown in Fig. 3, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
The processing unit 016 executes the functions and/or methods of the embodiments described in the present invention by running programs stored in the system memory 028.
The above computer program may be provided in a computer storage medium; that is, the computer storage medium is encoded with a computer program which, when executed by one or more computers, causes the one or more computers to perform the method flows and/or apparatus operations shown in the above embodiments of the present invention.
With the passage of time and the development of technology, the meaning of "medium" has become increasingly broad, and the distribution of computer programs is no longer limited to tangible media; programs may also, for example, be downloaded directly from a network. Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (16)

1. A music recommendation method based on user speech, characterized by comprising:
acquiring voice data of a user, and extracting the user category and recognition text corresponding to the voice data;
obtaining corresponding recommended music according to the user category and the recognition text;
presenting the corresponding recommended music to the user.
2. The method according to claim 1, characterized in that
the user category includes user gender and user age group.
3. The method according to claim 1, characterized in that extracting the user category corresponding to the voice data comprises:
identifying, from the acquired user speech and by means of voiceprint recognition, the user category of the user who issued the command voice.
4. The method according to claim 3, characterized in that, before identifying, from the acquired user speech and by means of voiceprint recognition, the user category of the user who issued the command voice, the method further comprises:
performing model training according to the sound characteristics of different user categories to establish voiceprint processing models for the different user categories.
5. The method according to claim 1, characterized in that extracting the recognition text corresponding to the voice data comprises:
performing speech recognition on the voice data using a speech recognition model to obtain the text request corresponding to the voice data; or
performing speech recognition on the voice data using the speech recognition model of the corresponding user category to obtain the text request corresponding to the voice data.
6. The method according to claim 1, characterized in that obtaining corresponding recommended music according to the user category and the recognition text comprises:
determining the type of the recognition text;
if it is of the precise demand type, searching for the corresponding recommended music according to the recognition text;
if it is of the general demand type, obtaining the corresponding recommended music, according to the user category, from the recommendation music library corresponding to each user category.
7. The method according to claim 6, characterized in that, before obtaining the corresponding recommended music, according to the user category, from the recommendation music library corresponding to each user category, the method further comprises:
establishing the recommendation music library corresponding to each user category based on the searched music content, corresponding to the user category of each historical user, that is extracted from the sample voice request data of the historical users.
8. A music recommendation apparatus based on user speech, characterized by comprising:
an extraction module, configured to acquire voice data of a user and extract the user category and recognition text corresponding to the voice data;
a searching module, configured to obtain corresponding recommended music according to the user category and the recognition text;
a display module, configured to present the corresponding recommended music to the user.
9. The apparatus according to claim 8, characterized in that
the user category includes user gender and user age group.
10. The apparatus according to claim 8, characterized in that the extraction module includes a voiceprint recognition submodule, configured to identify, from the acquired user speech and by means of voiceprint recognition, the user category of the user who issued the command voice.
11. The apparatus according to claim 10, characterized in that the extraction module further includes a voiceprint processing model establishing submodule, configured to perform model training according to the sound characteristics of different user categories and establish voiceprint processing models for the different user categories.
12. The apparatus according to claim 8, characterized in that the extraction module includes a speech recognition submodule, configured to perform speech recognition on the voice data using a speech recognition model to obtain the text request corresponding to the voice data; or to perform speech recognition on the voice data using the speech recognition model of the corresponding user category to obtain the text request corresponding to the voice data.
13. The apparatus according to claim 8, characterized in that the searching module is specifically configured to:
determine the type of the recognition text;
if it is of the precise demand type, search for the corresponding recommended music according to the recognition text;
if it is of the general demand type, obtain the corresponding recommended music, according to the user category, from the recommendation music library corresponding to each user category.
14. The apparatus according to claim 13, characterized in that the searching module further includes a recommendation music library establishing submodule, configured to:
establish the recommendation music library corresponding to each user category based on the searched music content, corresponding to the user category of each historical user, that is extracted from the sample voice request data of the historical users.
15. A device, characterized in that the device comprises:
one or more processors;
a storage apparatus, configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201811222418.1A 2018-10-19 2018-10-19 A kind of music recommended method and device based on user speech Pending CN109582822A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811222418.1A CN109582822A (en) 2018-10-19 2018-10-19 A kind of music recommended method and device based on user speech

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811222418.1A CN109582822A (en) 2018-10-19 2018-10-19 A kind of music recommended method and device based on user speech

Publications (1)

Publication Number Publication Date
CN109582822A true CN109582822A (en) 2019-04-05

Family

ID=65920672

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811222418.1A Pending CN109582822A (en) 2018-10-19 2018-10-19 A kind of music recommended method and device based on user speech

Country Status (1)

Country Link
CN (1) CN109582822A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110223134A (en) * 2019-04-28 2019-09-10 平安科技(深圳)有限公司 Products Show method and relevant device based on speech recognition
CN110598011A (en) * 2019-09-27 2019-12-20 腾讯科技(深圳)有限公司 Data processing method, data processing device, computer equipment and readable storage medium
CN111023470A (en) * 2019-12-06 2020-04-17 厦门快商通科技股份有限公司 Air conditioner temperature adjusting method, medium, equipment and device
CN111371838A (en) * 2020-02-14 2020-07-03 厦门快商通科技股份有限公司 Information pushing method and system based on voiceprint recognition and mobile terminal
CN111414512A (en) * 2020-03-02 2020-07-14 北京声智科技有限公司 Resource recommendation method and device based on voice search and electronic equipment
CN111488485A (en) * 2020-04-16 2020-08-04 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device
CN111638830A (en) * 2020-05-27 2020-09-08 杭州网易云音乐科技有限公司 Multimedia file selection method, device, equipment and computer readable storage medium
CN111694982A (en) * 2019-11-27 2020-09-22 深圳友宝科斯科技有限公司 Song recommendation method and system
CN111782878A (en) * 2020-07-06 2020-10-16 聚好看科技股份有限公司 Server, display equipment and video searching and sorting method thereof
CN111798857A (en) * 2019-04-08 2020-10-20 北京嘀嘀无限科技发展有限公司 Information identification method and device, electronic equipment and storage medium
CN111862991A (en) * 2019-04-30 2020-10-30 杭州海康威视数字技术股份有限公司 Method and system for identifying baby crying
CN111859008A (en) * 2019-04-29 2020-10-30 深圳市冠旭电子股份有限公司 Music recommending method and terminal
CN111951809A (en) * 2019-05-14 2020-11-17 深圳子丸科技有限公司 Multi-person voiceprint identification method and system
CN112230555A (en) * 2020-10-12 2021-01-15 珠海格力电器股份有限公司 Intelligent household equipment, control method and device thereof and storage medium
CN112948662A (en) * 2019-12-10 2021-06-11 北京搜狗科技发展有限公司 Recommendation method and device and recommendation device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106128467A (en) * 2016-06-06 2016-11-16 北京云知声信息技术有限公司 Method of speech processing and device
CN107507612A (en) * 2017-06-30 2017-12-22 百度在线网络技术(北京)有限公司 A kind of method for recognizing sound-groove and device

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798857A (en) * 2019-04-08 2020-10-20 北京嘀嘀无限科技发展有限公司 Information identification method and device, electronic equipment and storage medium
CN110223134A (en) * 2019-04-28 2019-09-10 平安科技(深圳)有限公司 Products Show method and relevant device based on speech recognition
CN111859008A (en) * 2019-04-29 2020-10-30 深圳市冠旭电子股份有限公司 Music recommending method and terminal
CN111859008B (en) * 2019-04-29 2023-11-10 深圳市冠旭电子股份有限公司 Music recommending method and terminal
CN111862991A (en) * 2019-04-30 2020-10-30 杭州海康威视数字技术股份有限公司 Method and system for identifying baby crying
CN111951809A (en) * 2019-05-14 2020-11-17 深圳子丸科技有限公司 Multi-person voiceprint identification method and system
CN110598011A (en) * 2019-09-27 2019-12-20 腾讯科技(深圳)有限公司 Data processing method, data processing device, computer equipment and readable storage medium
CN111694982A (en) * 2019-11-27 2020-09-22 深圳友宝科斯科技有限公司 Song recommendation method and system
CN111023470A (en) * 2019-12-06 2020-04-17 厦门快商通科技股份有限公司 Air conditioner temperature adjusting method, medium, equipment and device
CN112948662A (en) * 2019-12-10 2021-06-11 北京搜狗科技发展有限公司 Recommendation method and device and recommendation device
CN111371838A (en) * 2020-02-14 2020-07-03 厦门快商通科技股份有限公司 Information pushing method and system based on voiceprint recognition and mobile terminal
CN111414512A (en) * 2020-03-02 2020-07-14 北京声智科技有限公司 Resource recommendation method and device based on voice search and electronic equipment
CN111488485A (en) * 2020-04-16 2020-08-04 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device
CN111488485B (en) * 2020-04-16 2023-11-17 北京雷石天地电子技术有限公司 Music recommendation method based on convolutional neural network, storage medium and electronic device
CN111638830A (en) * 2020-05-27 2020-09-08 杭州网易云音乐科技有限公司 Multimedia file selection method, device, equipment and computer readable storage medium
CN111782878A (en) * 2020-07-06 2020-10-16 聚好看科技股份有限公司 Server, display equipment and video searching and sorting method thereof
CN111782878B (en) * 2020-07-06 2023-09-19 聚好看科技股份有限公司 Server, display device and video search ordering method thereof
CN112230555A (en) * 2020-10-12 2021-01-15 珠海格力电器股份有限公司 Intelligent household equipment, control method and device thereof and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190405