WO2020052135A1 - Method, apparatus, computing device and storage medium for music recommendation - Google Patents

Method, apparatus, computing device and storage medium for music recommendation

Info

Publication number
WO2020052135A1
WO2020052135A1 (PCT Application No. PCT/CN2018/121507)
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
music
emotional
sound
Prior art date
Application number
PCT/CN2018/121507
Other languages
English (en)
French (fr)
Inventor
廖海霖
张新
毛跃辉
廖湖锋
王慧君
Original Assignee
珠海格力电器股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 珠海格力电器股份有限公司
Publication of WO2020052135A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63 Querying
    • G06F 16/635 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/725 Cordless telephones

Definitions

  • the present disclosure relates to the field of artificial intelligence technology, and in particular, to a method, a device, a computing device, and a storage medium for music recommendation.
  • the application for measuring the heart rate needs to be opened first, and then the finger is placed at the corresponding position indicated by the application and held there for a period of time to obtain the measurement result. Therefore, the prior-art solutions also suffer from complicated operation and long measurement time.
  • a method for music recommendation including: obtaining a user's voice; obtaining an emotion keyword based on the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and recommending to the user music of a type corresponding to the emotion keyword.
  • the method for music recommendation further includes: collecting sounds made by multiple users, each sound having a corresponding emotion keyword; determining samples of the sounds according to the voiceprint extracted from each collected sound; and training a sound emotion model with the sound samples through a self-learning method based on big data.
  • obtaining the emotion keyword according to the user's voice includes: analyzing the user's voice with the sound emotion model to obtain the emotion keyword.
  • determining a sample of a sound according to the voiceprint extracted from each collected sound includes: converting the signal of the collected sound into a spectrogram and a frequency spectrum; extracting the voiceprint from the spectrogram and the frequency spectrum; and obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients, the sound feature value serving as the sample of the sound.
  • the trained sound emotion model includes a key-value table, where a key represents a sound feature value and the corresponding value represents the emotion keyword for that sound feature value. Obtaining the emotion keyword according to the user's voice includes: converting the signal of the collected sound into a spectrogram and a frequency spectrum; extracting the voiceprint from the spectrogram and the frequency spectrum; obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients; and obtaining the emotion keyword corresponding to the sound feature value according to the key-value table.
  • the method for music recommendation further includes: performing semantic analysis on the user's voice; and determining, according to the semantic analysis result, the search range of the music requested by the user. Recommending to the user music of the type corresponding to the emotion keyword then includes: recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword.
  • recommending to the user music of a type corresponding to the emotion keyword includes: obtaining the music tag for the user's emotion keyword according to a preset correspondence between emotion keywords and music tags, and recommending music for the user according to the obtained music tag within the search range of the music requested by the user.
  • the method for music recommendation further includes: semantically parsing the user's voice; and, when it is determined from the parsing result that the voice contains a wake word, obtaining the emotion keyword based on the user's voice.
  • an apparatus for music recommendation including: an acquisition module for acquiring a user's voice; a training analysis module for obtaining an emotion keyword based on the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and a recommendation module for recommending to the user music of the type corresponding to the emotion keyword.
  • a computing device including at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute any one of the music recommendation methods provided by the embodiments of the present disclosure.
  • a computer-readable storage medium storing executable instructions which, when executed by a processor, implement any one of the music recommendation methods in the embodiments of the present disclosure.
  • FIG. 1 is a schematic flowchart of a music recommendation method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a music recommendation structure in an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
  • the inventors recognized that, in practice, parameters such as heartbeat, pulse, breathing, and heart rate are difficult to obtain with a mobile terminal alone. These parameters usually have to be obtained with external equipment, for example measuring the user's pulse with a smart watch. Therefore, the method in the related art is of low practicality.
  • in addition, extra applications are needed to obtain the corresponding parameters. Taking heart rate measurement as an example, the heart rate measurement application must first be opened, and then the finger placed at the position the application indicates and held there for a period of time before the measurement result is obtained. Therefore, the related-art solutions also suffer from complicated operation and long measurement time.
  • to recommend music to the user accurately in a simpler way, the inventors arrived at the technical solution of the embodiments of the present disclosure after research. Specifically, since the voice function is a basic function of every mobile terminal, music can be recommended according to each person's voice. Further, each person's voice is different, and the music each person wants to hear under different emotions also differs. The inventors of the present disclosure therefore propose a scheme that determines the user's emotion from the user's voice and further recommends music according to that emotion. The recommended music then meets the needs of the user's current emotion, so music can be recommended to the user accurately through a simple voice interaction.
  • the sounds of different users are first collected.
  • the collected sounds may be sounds input by the user on request (for example, simulated voice input under different emotions), or they may be the voices that the user directs at the mobile terminal in daily use, without any deliberate input; the voiceprints of these voices are then extracted, and a self-learning method based on big data is used to obtain the voice characteristics under different emotions.
  • when the mobile terminal receives a sound from the user, it extracts the features of the sound, obtains the emotion corresponding to the sound, and then recommends music according to that emotion.
  • the voice may first be semantically parsed to determine whether it contains a wake word, and the content of the voice identified. If a wake word is included, the features of the voice are extracted to obtain the sound feature value.
  • FIG. 1 is a schematic flowchart of a music recommendation method according to some embodiments of the present disclosure, including steps 101-103.
  • in step 101, a user's voice is acquired.
  • in step 102, an emotion keyword is obtained according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion.
  • in step 103, music of a type corresponding to the emotion keyword is recommended for the user.
  • the correspondence between sounds and emotion keywords may also be found, before step 101, through a self-learning method based on big data.
  • for example, steps A1 to A3 may be included.
  • in step A1, sounds made by a plurality of users are collected.
  • the user need not deliberately record sounds into the mobile terminal; instead, the voices the user sends to the mobile terminal in daily use are collected, such as phone-call voice, call voice from third-party applications, and voice messages sent through the voice interaction features of instant messaging tools.
  • the user can also be invited to input sounds under different emotions for learning.
  • in step A2, a sample of the sound is determined according to the voiceprint extracted from each collected sound.
  • the voiceprint includes timbre, pitch, and loudness.
  • the user's identity can be recognized from the timbre, and the user's emotion can then be judged from the pitch and loudness. That is, during learning, voice characteristics under different emotions can be obtained for different users, so that each user's voice under different emotions is learned.
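  • for illustration only (not part of the patent disclosure): a minimal Python sketch of estimating the pitch and loudness components of a voiceprint, assuming the librosa library is available; the sample rate and pitch bounds are invented defaults.

```python
import numpy as np
import librosa

def pitch_and_loudness(path: str) -> tuple[float, float]:
    """Estimate mean pitch (Hz) and mean loudness (RMS) of a recording."""
    y, sr = librosa.load(path, sr=16000)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr)  # frame-wise fundamental frequency
    rms = librosa.feature.rms(y=y)[0]              # frame-wise energy as a loudness proxy
    return float(np.nanmean(f0)), float(np.mean(rms))
```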
  • in step A3, a self-learning method based on big data is used to train the sound emotion model with the sound samples, so that the sound emotion model can derive from the user's voice the emotion keyword corresponding to that sound.
  • after the sound emotion model obtained through big-data learning is available, the foregoing obtaining of the emotion keyword based on the user's voice can be implemented as: analyzing the user's voice with the sound emotion model to obtain the emotion keyword.
  • in this way, the emotion keyword corresponding to the extracted voiceprint can be obtained through the sound emotion model.
  • the self-learning method based on big data makes the obtained emotion keywords better match the user's emotions; that is, the established sound emotion model can identify the emotions of different users more accurately. The recommended music thus also better matches the user's current emotion.
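  • the patent does not specify the "self-learning method based on big data"; as a hedged stand-in, the following sketch trains an ordinary supervised classifier (scikit-learn, assumed available) on per-sample feature vectors labeled with emotion keywords. The placeholder data is random and purely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder training data: one feature vector per collected sound
# (e.g. averaged MFCCs) and the emotion keyword attached to it.
rng = np.random.default_rng(0)
X = rng.random((200, 13))
y = rng.choice(["happy", "pleased", "sad", "normal"], size=200)

sound_emotion_model = RandomForestClassifier(n_estimators=100).fit(X, y)

def emotion_keyword(features: np.ndarray) -> str:
    """Map one sound feature vector to an emotion keyword."""
    return sound_emotion_model.predict(features.reshape(1, -1))[0]
```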
  • to extract the voiceprint from the sound, the obtained voice signal may first be converted into a spectrogram, the voiceprint extracted from the spectrogram, and the voiceprint further processed to obtain the sound feature value.
  • for example, steps B1 to B3 are included.
  • in step B1, the signal of the collected sound is converted into a spectrogram and a frequency spectrum.
  • in step B2, the voiceprint is extracted from the spectrogram and the frequency spectrum.
  • in step B3, a sound feature value is obtained from the extracted voiceprint based on Mel-frequency cepstral coefficients, and the sound feature value is used as the sample of the sound.
  • extracting sound feature values using Mel-frequency cepstral coefficients (MFCC) may include, for example, the following steps: obtaining the speech; pre-emphasis; framing and windowing; performing an FFT (Fast Fourier Transform); taking absolute or squared values; Mel filtering; taking the logarithm; applying a DCT (Discrete Cosine Transform); obtaining dynamic features (Delta MFCC); and outputting the sound feature values.
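  • a minimal sketch of the MFCC pipeline listed above, in Python with numpy, scipy, and librosa assumed available; the frame sizes, FFT length, and filter counts are common defaults, not values from the patent.

```python
import numpy as np
import librosa
from scipy.fftpack import dct

def sound_feature_values(path: str, n_mfcc: int = 13) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)                 # obtain speech
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])           # pre-emphasis
    frames = librosa.util.frame(y, frame_length=400, hop_length=160)  # 25 ms / 10 ms framing
    frames = frames * np.hamming(400)[:, None]           # windowing
    power = np.abs(np.fft.rfft(frames, n=512, axis=0)) ** 2  # FFT, squared values
    mel_fb = librosa.filters.mel(sr=sr, n_fft=512, n_mels=26)
    log_mel = np.log(mel_fb @ power + 1e-10)             # Mel filtering, logarithm
    mfcc = dct(log_mel, axis=0, norm="ortho")[:n_mfcc]   # DCT -> cepstral coefficients
    delta = librosa.feature.delta(mfcc)                  # dynamic features (Delta MFCC)
    return np.vstack([mfcc, delta])                      # output sound feature values
```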
  • in this way, by extracting sound feature values, concrete numerical data of the voiceprint can be obtained, from which accurate emotion keywords can then be derived.
  • the sound emotion model is responsible for finding the corresponding emotion from a sound feature value.
  • the trained sound emotion model can be embodied as a key-value table, where a key represents a sound feature value and the corresponding value represents the emotion keyword for that feature value. After the key-value table is obtained, the user's emotion keyword can be obtained from the user's voice through steps C1 to C4.
  • in step C1, the signal of the collected sound is converted into a spectrogram and a frequency spectrum.
  • in step C2, the voiceprint is extracted from the spectrogram and the frequency spectrum.
  • in step C3, a sound feature value is obtained from the extracted voiceprint based on Mel-frequency cepstral coefficients.
  • in step C4, the emotion keyword corresponding to the sound feature value is obtained according to the key-value table.
  • an example key-value table is shown in Table 1 of the description. Note that Table 1 only illustrates an exemplary correspondence between keys and values; the correspondence between specific key and value figures still needs to be corrected with collected data.
  • in this way, the key-value table associates sound feature values with emotion keywords, and the emotion keyword can be determined through a simple table lookup, as in the sketch below.
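  • a minimal lookup sketch for such a key-value table; the ranges and keywords mirror the illustrative Table 1 in the description, and real boundaries would have to be corrected with collected data.

```python
EMOTION_TABLE = [
    ((0, 10), "happy"),
    ((10, 30), "pleased"),
    ((30, 50), "sad"),
    ((50, 60), "normal"),
]

def lookup_emotion(sound_feature_value: float) -> str:
    """Return the emotion keyword whose key range contains the feature value."""
    for (low, high), keyword in EMOTION_TABLE:
        if low <= sound_feature_value < high:
            return keyword
    return "unknown"
```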
  • the acquired sound may be interpreted, and the interpreted content further used as a condition for searching for the type of music. This can be implemented as: performing semantic analysis on the user's voice, and determining the search range of the music requested by the user according to the semantic analysis result.
  • on the basis of the search range, recommending music of a type corresponding to the emotion keyword for the user can be implemented as: recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword.
  • an example of finding suitable music according to the music search range and the emotion keyword: after the voice "play a song by Zhang Xueyou" is obtained, the corresponding emotion keyword is determined from the voice. If it is determined that the user was happy when saying this sentence, Zhang Xueyou's cheerful songs are retrieved. In this way, based on the music search range and the emotion keyword, the type of music the user actually wants can be provided more accurately.
  • before recommended songs are obtained from the acquired emotion keyword, a music tag matching the emotion keyword may be found first, and suitable music searched for according to the music tag.
  • this can be implemented as: obtaining the music tag for the user's emotion keyword according to a preset correspondence between emotion keywords and music tags; and, within the search range of the music requested by the user, recommending music for the user according to the obtained music tag.
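  • a hedged sketch of combining the search range with the emotion-to-tag correspondence; the track dictionary layout and the preset mapping are invented for illustration.

```python
EMOTION_TO_TAG = {"happy": "cheerful", "sad": "soothing"}  # illustrative presets

def recommend(tracks: list[dict], artist: str, emotion_keyword: str) -> list[dict]:
    """Filter first by the search range (here, an artist), then by the mapped tag."""
    tag = EMOTION_TO_TAG.get(emotion_keyword)
    in_range = [t for t in tracks if t["artist"] == artist]
    return [t for t in in_range if tag in t["tags"]]
```

  • for example, recommend(tracks, "张学友", "happy") would return the cheerful tracks by that singer, mirroring the example above.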
  • if the user's voice were acquired and processed continuously, the terminal's processing resources and power would be consumed continuously.
  • in view of this, after step 101, the acquired sound needs to be searched.
  • when a sentence containing a wake word is found, the operation of obtaining the emotion keyword from the voice is performed.
  • this can be implemented as: performing semantic analysis on the user's voice; and determining, from the semantic analysis result, that the voice contains a wake word.
  • in that case, the emotion keyword is obtained according to the user's voice.
  • the above semantic parsing process includes searching the sound to find whether it contains a wake word. If a wake word is present, the operation of obtaining the emotion keyword from the voice is performed.
  • the wake word can be an onomatopoeic word such as "hey" or "ha", or it can be "song".
  • in specific implementations, users can also customize their own wake words in the settings.
  • in this way, the subsequent processing is performed only when the wake word appears, which saves processing resources and power; a minimal gating sketch follows.
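  • a minimal gating sketch, assuming wake-word detection on a transcript; the default wake words follow the examples above, and the downstream pipeline is passed in as a callable.

```python
from typing import Callable

WAKE_WORDS = {"hey", "ha", "song"}  # user-customizable in settings

def handle_utterance(text: str, emotion_pipeline: Callable[[str], None]) -> bool:
    """Run the emotion analysis only when a wake word is present."""
    if not any(word in text.lower() for word in WAKE_WORDS):
        return False              # no wake word: skip further processing
    emotion_pipeline(text)        # e.g. feature extraction + emotion lookup
    return True
```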
  • FIG. 2 is a schematic structural diagram of a music recommendation device according to some embodiments of the present disclosure.
  • the music recommendation apparatus includes: an acquisition module 201 for acquiring a user's voice; a training analysis module 202 for obtaining an emotion keyword according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and a recommendation module 203 for recommending to the user music of a type corresponding to the emotion keyword.
  • the apparatus for music recommendation further includes: a collection module for collecting sounds made by multiple users, each sound having a corresponding emotion keyword; a voiceprint extraction module for determining samples of the sounds according to the voiceprint extracted from each collected sound; and a training module for training the sound emotion model with the sound samples through a self-learning method based on big data, so that the sound emotion model can derive from the user's voice the emotion keyword corresponding to that sound.
  • the training analysis module 202 is further specifically configured to analyze the user's voice with the sound emotion model to obtain the emotion keyword.
  • the voiceprint extraction module specifically includes: a first conversion unit configured to convert the signal of the collected sound into a spectrogram and a frequency spectrum; a first voiceprint extraction unit configured to extract the voiceprint from the spectrogram and the frequency spectrum; and a first sound feature value extraction unit configured to obtain a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients and to use the sound feature value as the sample of the sound.
  • the trained sound emotion model includes a key-value table, where a key represents a sound feature value and the corresponding value represents the emotion keyword for that feature value;
  • the training analysis module 202 specifically includes: a second conversion unit for converting the signal of the collected sound into a spectrogram and a frequency spectrum; a second voiceprint extraction unit for extracting the voiceprint from the spectrogram and the frequency spectrum; a second sound feature value extraction unit for obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients; and a lookup unit for obtaining, according to the key-value table, the emotion keyword corresponding to the user's voice.
  • the apparatus for music recommendation further includes: a first semantic analysis module for performing semantic analysis on the user's voice; and a range determination module for determining the search range of the music requested by the user according to the semantic analysis result. The recommendation module 203 specifically includes a recommendation unit for recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword.
  • the recommendation unit specifically includes: a music tag correspondence sub-unit for obtaining the music tag of the user's emotion keyword according to a preset correspondence between emotion keywords and music tags; and a recommendation determination sub-unit for recommending music for the user according to the obtained music tag, within the search range of the music requested by the user.
  • the apparatus for music recommendation further includes: a second semantic analysis module configured to perform semantic analysis on the user's voice; and a search module configured to trigger the training analysis module 202 to obtain the emotion keyword according to the user's voice when it is determined from the semantic analysis result that the voice contains a wake word.
  • a computing device may include at least one processor and at least one memory.
  • the memory stores program code, and when the program code is executed by the processor, the processor is caused to execute the music recommendation method according to various exemplary embodiments of the present disclosure described above in this specification.
  • a computing device 30 according to this embodiment of the present disclosure is described below with reference to FIG. 3.
  • the computing device 30 shown in FIG. 3 is merely an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the computing device may be, for example, a mobile phone, a tablet computer, or the like.
  • the computing device 30 is expressed in the form of a general-purpose computing device.
  • the components of the computing device 30 may include, but are not limited to, the at least one processor 31 described above, the at least one memory 32 described above, and a bus 33 connecting different system components (including the memory 32 and the processor 31).
  • the bus 33 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus structures.
  • the memory 32 may include a readable medium in the form of a volatile memory, such as a random access memory (RAM) 321 and / or a cache memory 322, and may further include a read-only memory (ROM) 323.
  • the memory 32 may also include a program/utility 325 having a set of (at least one) program modules 324.
  • such program modules 324 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • the computing device 30 may also communicate with one or more external devices 34 (such as pointing devices), with one or more devices that enable users to interact with the computing device 30, and/or with any device (such as a router or modem) that enables the computing device 30 to communicate with one or more other computing devices. Such communication can be performed through an input/output (I/O) interface 35.
  • the computing device 30 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 36. As shown, the network adapter 36 communicates with the other modules of the computing device 30 via the bus 33.
  • aspects of the music recommendation method provided by the present disclosure may also be implemented in the form of a program product that includes program code. When the program product runs on a computer device, the program code causes the computer device to perform the steps of the music recommendation method according to the various exemplary embodiments of the present disclosure described above, such as steps 101-103 shown in FIG. 1.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • the readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the music recommendation method of the embodiments of the present disclosure may employ a portable compact disc read-only memory (CD-ROM) including program code and may run on a computing device.
  • the program product of the present disclosure is not limited thereto.
  • the readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the readable signal medium may include a data signal that is borne in baseband or propagated as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • the program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as "C" or similar.
  • the program code can execute entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or a server.
  • the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
  • the embodiments of the present disclosure may be provided as a method, a system, or a computer program product. Therefore, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Child & Adolescent Psychology (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Hospice & Palliative Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Psychiatry (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed are a music recommendation method and apparatus, a computing device, and a storage medium, relating to the field of artificial intelligence. The music recommendation method comprises: acquiring a user's voice; obtaining an emotion keyword according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and recommending to the user music of a type corresponding to the emotion keyword.

Description

Method, apparatus, computing device and storage medium for music recommendation
Cross-reference to related applications
This application is based on and claims priority to Chinese application No. 201811051761.4, filed on September 10, 2018, the disclosure of which is incorporated herein by reference in its entirety.
Technical field
The present disclosure relates to the field of artificial intelligence technology, and in particular to a method, an apparatus, a computing device, and a storage medium for music recommendation.
Background
The smart terminal industry is developing rapidly, and all kinds of smart terminals are becoming increasingly intelligent. Most current mobile terminals have a music playback function, and this function is used heavily. How to determine the type of music to play according to the user's emotion is therefore becoming more and more important. The related art can recommend music for the user based on parameters such as the user's heartbeat, pulse, breathing, and heart rate. In practice, however, these parameters are difficult to obtain with a mobile terminal alone; external equipment must be used, for example a smart watch to measure the user's pulse, so the method has low practicality. In addition, a smart terminal needs an extra application to obtain the corresponding parameters. Taking heart rate measurement as an example, the heart rate measurement application must first be opened, and then the finger placed at the position indicated by the application and held there for a period of time before the measurement result is obtained. The prior-art solutions therefore also suffer from complicated operation and long measurement time.
Summary
According to a first aspect of some embodiments of the present disclosure, a method for music recommendation is provided, including: acquiring a user's voice; obtaining an emotion keyword according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and recommending to the user music of a type corresponding to the emotion keyword.
In some embodiments, the method further includes: collecting sounds made by multiple users, each sound having a corresponding emotion keyword; determining samples of the sounds according to the voiceprint extracted from each collected sound; and training a sound emotion model with the sound samples through a self-learning method based on big data.
In some embodiments, obtaining the emotion keyword according to the user's voice includes: analyzing the user's voice with the sound emotion model to obtain the emotion keyword.
In some embodiments, determining the samples of the sounds according to the voiceprint extracted from each collected sound includes: converting the signal of the collected sound into a spectrogram and a frequency spectrum; extracting the voiceprint from the spectrogram and the frequency spectrum; and obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients, the sound feature value serving as the sample of the sound.
In some embodiments, the trained sound emotion model includes a key-value table, where a key represents a sound feature value and the corresponding value represents the emotion keyword for that sound feature value. Obtaining the emotion keyword according to the user's voice includes: converting the signal of the collected sound into a spectrogram and a frequency spectrum; extracting the voiceprint from the spectrogram and the frequency spectrum; obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients; and obtaining the emotion keyword corresponding to the sound feature value according to the key-value table.
In some embodiments, the method further includes: performing semantic analysis on the user's voice; and determining, according to the semantic analysis result, the search range of the music requested by the user. Recommending to the user music of the type corresponding to the emotion keyword then includes: recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword.
In some embodiments, recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword includes: obtaining the music tag for the user's emotion keyword according to a preset correspondence between emotion keywords and music tags; and recommending music for the user according to the obtained music tag, within the search range of the music requested by the user.
In some embodiments, the method further includes: performing semantic analysis on the user's voice; and, when it is determined from the semantic analysis result that the voice contains a wake word, obtaining the emotion keyword according to the user's voice.
According to a second aspect of some embodiments of the present disclosure, an apparatus for music recommendation is provided, including: an acquisition module for acquiring a user's voice; a training analysis module for obtaining an emotion keyword according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and a recommendation module for recommending to the user music of a type corresponding to the emotion keyword.
According to a third aspect of some embodiments of the present disclosure, a computing device is provided, including at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute any one of the music recommendation methods provided by the embodiments of the present disclosure.
According to a fourth aspect of some embodiments of the present disclosure, a computer-readable storage medium storing executable instructions is provided; when executed by a processor, the executable instructions implement any one of the music recommendation methods in the embodiments of the present disclosure.
Other features and advantages of the present disclosure will be set forth in the following description, and will in part become apparent from the description or be understood by practicing the present disclosure. The objects and other advantages of the present disclosure can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present disclosure and constitute a part of it; the illustrative embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not unduly limit it. In the drawings:
FIG. 1 is a schematic flowchart of a music recommendation method in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a music recommendation structure in an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a computing device according to an embodiment of the present disclosure.
Detailed description
The inventors recognized that, in practice, parameters such as heartbeat, pulse, breathing, and heart rate are difficult to obtain with a mobile terminal alone. These parameters usually have to be obtained with external equipment, for example measuring the user's pulse with a smart watch. The method in the related art is therefore of low practicality. In addition, a smart terminal needs an extra application to obtain the corresponding parameters. Taking heart rate measurement as an example, the heart rate measurement application must first be opened, and then the finger placed at the position indicated by the application and held there for a period of time before the measurement result is obtained. The related-art solutions therefore also suffer from complicated operation and long measurement time.
To make the operation of determining the type of music to play according to the user's emotion simpler and faster, embodiments of the present disclosure provide a music recommendation method and apparatus. For a better understanding of the technical solution provided by the embodiments of the present disclosure, its basic principle is briefly explained here.
To recommend music to the user accurately in a simpler way, the inventors arrived at the technical solution of the embodiments of the present disclosure after research. Specifically, since the voice function is a basic function of every mobile terminal, music can be recommended according to each person's voice. Further, each person's voice is different, and the music each person wants to hear under different emotions also differs. The inventors of the present disclosure therefore propose a scheme that determines the user's emotion from the user's voice and further recommends music according to that emotion. The recommended music then meets the needs of the user's current emotion, so music can be recommended to the user accurately through a simple voice interaction.
In some embodiments, the sounds of different users are first collected. The collected sounds may be sounds input by the user on request (for example, simulated voice input under different emotions), or they may be the voices that the user directs at the mobile terminal in daily use, without any deliberate input. The voiceprints of these voices are then extracted, and a self-learning method based on big data is used to obtain the voice characteristics under different emotions.
When the mobile terminal receives a sound from the user, it extracts the features of the sound, obtains the emotion corresponding to the sound, and then recommends music according to that emotion.
In some embodiments, after the user's voice is received, the voice may first be semantically parsed to determine whether it contains a wake word and to identify its content. If a wake word is included, the features of the voice are extracted to obtain the sound feature value.
A music recommendation method provided by some embodiments of the present disclosure is further described below with reference to the drawings. FIG. 1 is a schematic flowchart of a music recommendation method according to some embodiments of the present disclosure, including steps 101 to 103.
In step 101, a user's voice is acquired.
In step 102, an emotion keyword is obtained according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion.
In step 103, music of a type corresponding to the emotion keyword is recommended for the user.
In some embodiments, to identify the emotions corresponding to different sounds more accurately, the correspondence between sounds and emotion keywords may be found before step 101 through a self-learning method based on big data, for example through steps A1 to A3.
In step A1, sounds made by multiple users are collected.
In some embodiments, the user need not deliberately record sounds into the mobile terminal; instead, the voices the user sends to the mobile terminal in daily use are collected, such as phone-call voice, call voice from third-party applications, and voice messages sent through the voice interaction features of instant messaging tools. Of course, as mentioned above, the user can also be invited to input sounds under different emotions for learning.
In step A2, samples of the sounds are determined according to the voiceprint extracted from each collected sound.
In some embodiments, the voiceprint includes timbre, pitch, and loudness. The user's identity can be recognized from the timbre, and the user's emotion can then be judged from the pitch and loudness. That is, during learning, voice characteristics under different emotions can be obtained for different users, so that each user's voice under different emotions is learned.
In step A3, a self-learning method based on big data is used to train the sound emotion model with the sound samples, so that the sound emotion model can derive from the user's voice the emotion keyword corresponding to that voice.
In some embodiments, after the sound emotion model obtained through big-data learning is available, the aforementioned obtaining of the emotion keyword according to the user's voice can be implemented as: analyzing the user's voice with the sound emotion model to obtain the emotion keyword.
In this way, through the self-learning method based on big data, the emotion keyword corresponding to the extracted voiceprint can be obtained through the sound emotion model. The self-learning method based on big data makes the obtained emotion keywords better match the user's emotions; that is, the established sound emotion model can identify the emotions of different users more accurately. The recommended music thus also better matches the user's current emotion.
In some embodiments, to extract the voiceprint from the sound, the obtained voice signal may first be converted into a spectrogram, the voiceprint extracted from the spectrogram, and the voiceprint further processed to obtain the sound feature value, for example through steps B1 to B3.
In step B1, the signal of the collected sound is converted into a spectrogram and a frequency spectrum.
In step B2, the voiceprint is extracted from the spectrogram and the frequency spectrum.
In step B3, a sound feature value is obtained from the extracted voiceprint based on Mel-frequency cepstral coefficients, and the sound feature value is used as the sample of the sound.
In some embodiments, extracting sound feature values using Mel-frequency cepstral coefficients may include, for example, the following steps: obtaining the speech, pre-emphasis, framing and windowing, performing an FFT (Fast Fourier Transform), taking absolute or squared values, Mel filtering, taking the logarithm, applying a DCT (Discrete Cosine Transform), obtaining dynamic features (Delta MFCC, Mel-frequency cepstral coefficients), and outputting the sound feature values.
In this way, by extracting sound feature values, concrete numerical data of the voiceprint can be obtained, from which accurate emotion keywords can then be derived.
In some embodiments, the sound emotion model is responsible for finding the corresponding emotion from a sound feature value. The trained sound emotion model can be embodied as a key-value table, where a key represents a sound feature value and the corresponding value represents the emotion keyword for that feature value. After the key-value table is obtained, the user's emotion keyword can be obtained from the user's voice through steps C1 to C4.
In step C1, the signal of the collected sound is converted into a spectrogram and a frequency spectrum.
In step C2, the voiceprint is extracted from the spectrogram and the frequency spectrum.
In step C3, a sound feature value is obtained from the extracted voiceprint based on Mel-frequency cepstral coefficients.
In step C4, the emotion keyword corresponding to the sound feature value is obtained according to the key-value table.
In some embodiments, the key-value table is shown in Table 1. Note that Table 1 only illustrates an exemplary correspondence between keys and values; the correspondence between specific key and value figures still needs to be corrected with collected data.
Table 1
key value
0~10 happy
10~30 pleased
30~50 sad
50~60 normal
... ...
In this way, the key-value table associates sound feature values with emotion keywords, and the emotion keyword can be determined through a simple table lookup.
In some embodiments, after step 101, the acquired sound may be interpreted, and the interpreted content further used as a condition for searching for the type of music. This can be implemented as: performing semantic analysis on the user's voice, and determining the search range of the music requested by the user according to the semantic analysis result.
In some embodiments, on the basis of the search range, recommending to the user music of a type corresponding to the emotion keyword can be implemented as: recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword.
The above semantic analysis means interpreting the sound to understand what the sentence means. For example, "放一首张学友的歌" ("play a song by Zhang Xueyou") indicates that the user wants to hear a song by Zhang Xueyou. The search range of the music requested by the user thus becomes songs whose singer is Zhang Xueyou, so the search is performed among Zhang Xueyou's songs.
An example of finding suitable music according to the music search range and the emotion keyword: after the voice "play a song by Zhang Xueyou" is obtained, the corresponding emotion keyword is determined from the voice. If it is determined that the user was happy when saying this sentence, Zhang Xueyou's cheerful songs are retrieved. In this way, based on the music search range and the emotion keyword, the type of music the user actually wants can be provided more accurately.
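For illustration only (the patent does not prescribe a particular implementation), the following toy Python sketch derives such a search range from a request with a simple pattern match; real semantic parsing would be considerably richer, and the pattern below is an invented example.

```python
import re

def retrieval_range(utterance: str) -> dict:
    """Toy parse: '放一首<artist>的歌' ('play a song by <artist>') -> artist filter."""
    match = re.search(r"放一首(.+?)的歌", utterance)
    return {"artist": match.group(1)} if match else {}

assert retrieval_range("放一首张学友的歌") == {"artist": "张学友"}
```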
In some embodiments, before recommended songs are obtained from the acquired emotion keyword, a music tag matching the emotion keyword may be found first and suitable music searched for according to the music tag. This can be implemented as: obtaining the music tag for the user's emotion keyword according to a preset correspondence between emotion keywords and music tags; and recommending music for the user according to the obtained music tag, within the search range of the music requested by the user.
An example of the preset correspondence between emotion keywords and music tags: for the emotion keyword "happy", the corresponding music tag may be "cheerful". Music of the "cheerful" type is then searched for on the Internet and recommended to the user.
In this way, through the preset correspondence between emotion keywords and music tags, a suitable music tag can be found from the emotion keyword, and more accurate music recommended to the user.
In some embodiments, continuously acquiring the user's voice and performing the subsequent processing would keep consuming the terminal's processing resources and power. In view of this, in some embodiments, after step 101 the acquired sound needs to be searched; when a sentence containing a wake word is found, the operation of obtaining the emotion keyword from the voice is performed. This can be implemented as: performing semantic analysis on the user's voice; and, when it is determined from the semantic analysis result that the voice contains a wake word, obtaining the emotion keyword according to the user's voice.
The above semantic analysis process includes searching the sound to find whether it contains a wake word. If a wake word is present, the operation of obtaining the emotion keyword from the voice is performed. The wake word may be an onomatopoeic word such as "嘿" ("hey") or "哈" ("ha"), or it may be "歌曲" ("song"). Of course, in specific implementations, the user can also customize a wake word in the settings.
In this way, entering the voice-based music recommendation operation through a wake word reduces the chance of false triggering. In addition, performing the subsequent processing only when a wake word appears saves processing resources and power.
Based on the same or a similar inventive concept, embodiments of the present disclosure also provide an apparatus for music recommendation. FIG. 2 is a schematic structural diagram of a music recommendation apparatus according to some embodiments of the present disclosure. As shown in FIG. 2, the music recommendation apparatus includes: an acquisition module 201 for acquiring a user's voice; a training analysis module 202 for obtaining an emotion keyword according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and a recommendation module 203 for recommending to the user music of a type corresponding to the emotion keyword.
In some embodiments, the music recommendation apparatus further includes: a collection module for collecting sounds made by multiple users, each sound having a corresponding emotion keyword; a voiceprint extraction module for determining samples of the sounds according to the voiceprint extracted from each collected sound; and a training module for training the sound emotion model with the sound samples through a self-learning method based on big data, so that the sound emotion model can derive from the user's voice the emotion keyword corresponding to that voice.
In some embodiments, the training analysis module 202 is further specifically configured to analyze the user's voice with the sound emotion model to obtain the emotion keyword.
In some embodiments, the voiceprint extraction module specifically includes: a first conversion unit for converting the signal of the collected sound into a spectrogram and a frequency spectrum; a first voiceprint extraction unit for extracting the voiceprint from the spectrogram and the frequency spectrum; and a first sound feature value extraction unit for obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients and using the sound feature value as the sample of the sound.
In some embodiments, the trained sound emotion model includes a key-value table, where a key represents a sound feature value and the corresponding value represents the emotion keyword for that feature value. The training analysis module 202 specifically includes: a second conversion unit for converting the signal of the collected sound into a spectrogram and a frequency spectrum; a second voiceprint extraction unit for extracting the voiceprint from the spectrogram and the frequency spectrum; a second sound feature value extraction unit for obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients; and a lookup unit for obtaining, according to the key-value table, the emotion keyword corresponding to the user's voice.
In some embodiments, the music recommendation apparatus further includes: a first semantic analysis module for performing semantic analysis on the user's voice; and a range determination module for determining the search range of the music requested by the user according to the semantic analysis result. The recommendation module 203 specifically includes a recommendation unit for recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword.
In some embodiments, the recommendation unit specifically includes: a music tag correspondence sub-unit for obtaining the music tag of the user's emotion keyword according to a preset correspondence between emotion keywords and music tags; and a recommendation determination sub-unit for recommending music for the user according to the obtained music tag, within the search range of the music requested by the user.
In some embodiments, the music recommendation apparatus further includes: a second semantic analysis module for performing semantic analysis on the user's voice; and a search module for triggering the training analysis module 202 to obtain the emotion keyword according to the user's voice, when it is determined from the semantic analysis result that the voice contains a wake word.
Having described the music recommendation method and apparatus of the exemplary embodiments of the present disclosure, a computing device according to another exemplary embodiment of the present disclosure is introduced next.
Those skilled in the art will appreciate that the various aspects of the present disclosure can be implemented as a system, a method, or a program product. The various aspects of the present disclosure can therefore be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software, which may be collectively referred to here as a "circuit", "module", or "system".
In some possible implementations, according to embodiments of the present disclosure, a computing device may include at least one processor and at least one memory. The memory stores program code which, when executed by the processor, causes the processor to perform the music recommendation method according to the various exemplary embodiments of the present disclosure described above.
The computing device 30 according to this embodiment of the present disclosure is described below with reference to FIG. 3. The computing device 30 shown in FIG. 3 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure. The computing device may be, for example, a mobile phone or a tablet computer.
As shown in FIG. 3, the computing device 30 is represented in the form of a general-purpose computing device. Its components may include, but are not limited to, the at least one processor 31, the at least one memory 32, and a bus 33 connecting different system components (including the memory 32 and the processor 31).
The bus 33 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus structures.
The memory 32 may include readable media in the form of volatile memory, such as a random access memory (RAM) 321 and/or a cache memory 322, and may further include a read-only memory (ROM) 323.
The memory 32 may also include a program/utility 325 having a set of (at least one) program modules 324; such program modules 324 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each of these examples, or some combination thereof, may include an implementation of a network environment.
The computing device 30 may also communicate with one or more external devices 34 (such as pointing devices), with one or more devices that enable the user to interact with the computing device 30, and/or with any device (such as a router or modem) that enables the computing device 30 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 35. The computing device 30 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 36. As shown, the network adapter 36 communicates with the other modules of the computing device 30 via the bus 33. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computing device 30, including but not limited to microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
In some possible implementations, the various aspects of the music recommendation method provided by the present disclosure may also be implemented in the form of a program product that includes program code. When the program product runs on a computer device, the program code causes the computer device to perform the steps of the music recommendation method according to the various exemplary embodiments of the present disclosure described above, such as steps 101-103 shown in FIG. 1.
The program product may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The music recommendation method of the embodiments of the present disclosure may employ a portable compact disc read-only memory (CD-ROM) including program code, and may run on a computing device. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus, or device.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave that carries readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the above.
The program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ as well as conventional procedural programming languages such as "C" or similar. The program code can execute entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or a server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more units described above may be embodied in one unit; conversely, the features and functions of one unit described above may be further divided and embodied by multiple units.
In addition, although the operations of the method of the present disclosure are described in a particular order in the drawings, this does not require or imply that the operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Those skilled in the art will appreciate that embodiments of the present disclosure may be provided as a method, a system, or a computer program product. The present disclosure may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present disclosure have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the present disclosure.
Obviously, those skilled in the art can make various changes and variations to the present disclosure without departing from its spirit and scope. If these modifications and variations of the present disclosure fall within the scope of the claims of the present disclosure and their technical equivalents, the present disclosure is also intended to include them.

Claims (11)

  1. A method for music recommendation, comprising:
    acquiring a user's voice;
    obtaining an emotion keyword according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and
    recommending to the user music of a type corresponding to the emotion keyword.
  2. The method of claim 1, further comprising:
    collecting sounds made by multiple users, each sound having a corresponding emotion keyword;
    determining samples of the sounds according to the voiceprint extracted from each collected sound; and
    training a sound emotion model with the sound samples through a self-learning method based on big data.
  3. The method of claim 2, wherein obtaining the emotion keyword according to the user's voice comprises:
    analyzing the user's voice with the sound emotion model to obtain the emotion keyword.
  4. The method of claim 2, wherein determining the samples of the sounds according to the voiceprint extracted from each collected sound comprises:
    converting the signal of the collected sound into a spectrogram and a frequency spectrum;
    extracting the voiceprint from the spectrogram and the frequency spectrum; and
    obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients, and using the sound feature value as the sample of the sound.
  5. The method of claim 1, wherein the trained sound emotion model comprises a key-value table, wherein a key represents a sound feature value and the corresponding value represents the emotion keyword for that sound feature value; and
    obtaining the emotion keyword according to the user's voice comprises:
    converting the signal of the collected sound into a spectrogram and a frequency spectrum;
    extracting the voiceprint from the spectrogram and the frequency spectrum;
    obtaining a sound feature value from the extracted voiceprint based on Mel-frequency cepstral coefficients; and
    obtaining the emotion keyword corresponding to the sound feature value according to the key-value table.
  6. The method of claim 1, further comprising:
    performing semantic analysis on the user's voice; and
    determining, according to the semantic analysis result, the search range of the music requested by the user;
    wherein recommending to the user music of the type corresponding to the emotion keyword comprises:
    recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword.
  7. The method of claim 6, wherein recommending, within the search range of the music requested by the user, music of the type corresponding to the emotion keyword comprises:
    obtaining the music tag for the user's emotion keyword according to a preset correspondence between emotion keywords and music tags; and
    recommending music for the user according to the obtained music tag, within the search range of the music requested by the user.
  8. The method of claim 1, further comprising:
    performing semantic analysis on the user's voice; and
    when it is determined from the semantic analysis result that the voice contains a wake word, obtaining the emotion keyword according to the user's voice.
  9. An apparatus for music recommendation, comprising:
    an acquisition module for acquiring a user's voice;
    a training analysis module for obtaining an emotion keyword according to the user's voice, wherein the emotion keyword is used to indicate the user's current emotion; and
    a recommendation module for recommending to the user music of a type corresponding to the emotion keyword.
  10. A computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement the method of any one of claims 1-8.
  11. A computing device, comprising:
    at least one processor; and a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
PCT/CN2018/121507 2018-09-10 2018-12-17 Method, apparatus, computing device and storage medium for music recommendation WO2020052135A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811051761.4A CN110889008B (zh) 2018-09-10 2018-09-10 Music recommendation method and apparatus, computing device and storage medium
CN201811051761.4 2018-09-10

Publications (1)

Publication Number Publication Date
WO2020052135A1 true WO2020052135A1 (zh) 2020-03-19

Family

ID=69745082

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/121507 WO2020052135A1 (zh) 2018-09-10 2018-12-17 Method, apparatus, computing device and storage medium for music recommendation

Country Status (2)

Country Link
CN (1) CN110889008B (zh)
WO (1) WO2020052135A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737414A (zh) * 2020-06-04 2020-10-02 腾讯音乐娱乐科技(深圳)有限公司 Song recommendation method and apparatus, server, and storage medium
CN113643700A (zh) * 2021-07-27 2021-11-12 广州市威士丹利智能科技有限公司 Control method and system for an intelligent voice switch

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115331781B (zh) * 2022-01-06 2023-11-10 中国科学院心理研究所 Depression treatment system based on electrocardiographic signals and music

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616664A (zh) * 2015-02-02 2015-05-13 合肥工业大学 Audio recognition method based on spectrogram saliency detection
CN105095406A (zh) * 2015-07-09 2015-11-25 百度在线网络技术(北京)有限公司 Voice search method and apparatus based on user characteristics
CN106128465A (zh) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 Voiceprint recognition system and method
CN106128467A (zh) * 2016-06-06 2016-11-16 北京云知声信息技术有限公司 Voice processing method and apparatus
CN106302987A (zh) * 2016-07-28 2017-01-04 乐视控股(北京)有限公司 Audio recommendation method and device
US20180124243A1 (en) * 2016-11-02 2018-05-03 International Business Machines Corporation System and Method for Monitoring and Visualizing Emotions in Call Center Dialogs at Call Centers

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108091340B (zh) * 2016-11-22 2020-11-03 北京京东尚科信息技术有限公司 Voiceprint recognition method, voiceprint recognition system, and computer-readable storage medium
CN107562850A (zh) * 2017-08-28 2018-01-09 百度在线网络技术(北京)有限公司 Music recommendation method, apparatus, device, and storage medium
CN108153810A (zh) * 2017-11-24 2018-06-12 广东小天才科技有限公司 Music recommendation method, apparatus, device, and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616664A (zh) * 2015-02-02 2015-05-13 合肥工业大学 Audio recognition method based on spectrogram saliency detection
CN105095406A (zh) * 2015-07-09 2015-11-25 百度在线网络技术(北京)有限公司 Voice search method and apparatus based on user characteristics
CN106128467A (zh) * 2016-06-06 2016-11-16 北京云知声信息技术有限公司 Voice processing method and apparatus
CN106128465A (zh) * 2016-06-23 2016-11-16 成都启英泰伦科技有限公司 Voiceprint recognition system and method
CN106302987A (zh) * 2016-07-28 2017-01-04 乐视控股(北京)有限公司 Audio recommendation method and device
US20180124243A1 (en) * 2016-11-02 2018-05-03 International Business Machines Corporation System and Method for Monitoring and Visualizing Emotions in Call Center Dialogs at Call Centers

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111737414A (zh) * 2020-06-04 2020-10-02 腾讯音乐娱乐科技(深圳)有限公司 Song recommendation method and apparatus, server, and storage medium
CN113643700A (zh) * 2021-07-27 2021-11-12 广州市威士丹利智能科技有限公司 Control method and system for an intelligent voice switch
CN113643700B (zh) * 2021-07-27 2024-02-27 广州市威士丹利智能科技有限公司 Control method and system for an intelligent voice switch

Also Published As

Publication number Publication date
CN110889008A (zh) 2020-03-17
CN110889008B (zh) 2021-11-09

Similar Documents

Publication Publication Date Title
US10403282B2 (en) Method and apparatus for providing voice service
US11475881B2 (en) Deep multi-channel acoustic modeling
US11132172B1 (en) Low latency audio data pipeline
US10977299B2 (en) Systems and methods for consolidating recorded content
US20200126566A1 (en) Method and apparatus for voice interaction
WO2019109787A1 (zh) Audio classification method and apparatus, smart device, and storage medium
WO2019148586A1 (zh) Speaker recognition method and device for multi-speaker speech
WO2021128741A1 (zh) Voice emotion fluctuation analysis method and apparatus, computer device, and storage medium
WO2017084360A1 (zh) Method and system for speech recognition
CN109785859B (zh) Method, apparatus and computer device for managing music based on voice analysis
WO2019096056A1 (zh) Speech recognition method, device and system
WO2022178969A1 (zh) Voice dialogue data processing method and apparatus, computer device, and storage medium
WO2020052135A1 (zh) Method, apparatus, computing device and storage medium for music recommendation
US10573311B1 (en) Generating self-support metrics based on paralinguistic information
US11450306B2 (en) Systems and methods for generating synthesized speech responses to voice inputs by training a neural network model based on the voice input prosodic metrics and training voice inputs
JP6915637B2 (ja) Information processing device, information processing method, and program
CN111199732A (zh) Emotion-based voice interaction method, storage medium, and terminal device
WO2018095167A1 (zh) Voiceprint recognition method and voiceprint recognition system
CN108877779B (zh) Method and apparatus for detecting a speech endpoint
Kumar et al. Machine learning based speech emotions recognition system
WO2020068858A9 (en) Techniques for language model training for a reference language
Han Feature recognition of spoken Japanese input based on support vector machine
CN114678040B (zh) Voice consistency detection method, apparatus, device, and storage medium
KR102389776B1 (ko) Dynamic insertion of supplemental audio content into an audio recording on request
Li et al. Acoustic measures for real-time voice coaching

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18933443

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18933443

Country of ref document: EP

Kind code of ref document: A1