WO2020034849A1 - Method, apparatus, computing device, and medium for music recommendation - Google Patents

Method, apparatus, computing device, and medium for music recommendation

Info

Publication number
WO2020034849A1
Authority
WO
WIPO (PCT)
Prior art keywords
music
user
matching
visual semantic
information
Prior art date
Application number
PCT/CN2019/098861
Other languages
English (en)
French (fr)
Inventor
李岩
王汉杰
叶浩
陈波
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to JP2020549554A priority Critical patent/JP7206288B2/ja
Priority to EP19849335.5A priority patent/EP3757995A4/en
Publication of WO2020034849A1 publication Critical patent/WO2020034849A1/zh
Priority to US17/026,477 priority patent/US11314806B2/en

Classifications

    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/11Indexing; Addressing; Timing or synchronising; Measuring tape travel by using information not detectable on the record carrier
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • G06F16/24578Query processing with adaptation to user needs using ranking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/0008Associated control or indicating means
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/36Accompaniment arrangements
    • G10H1/361Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G10H1/368Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems displaying animated or moving pictures synchronized with the music or audio part
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals
    • G11B27/036Insert-editing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/155User input interfaces for electrophonic musical instruments
    • G10H2220/441Image sensing, i.e. capturing images or optical patterns for musical purposes or musical control purposes
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/075Musical metadata derived from musical analysis or for use in electrophonic musical instruments
    • G10H2240/085Mood, i.e. generation, detection or selection of a particular emotional content or atmosphere in a musical piece
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2240/00Data organisation or data communication aspects, specifically adapted for electrophonic musical tools or instruments
    • G10H2240/121Musical libraries, i.e. musical databases indexed by musical parameters, wavetables, indexing schemes using musical parameters, musical rule bases or knowledge bases, e.g. for automatic composing methods
    • G10H2240/131Library retrieval, i.e. searching a database or selecting a specific musical piece, segment, pattern, rule or parameter set
    • G10H2240/141Library retrieval matching, i.e. any of the steps of matching an inputted segment or phrase with musical database contents, e.g. query by humming, singing or playing; the steps may include, e.g. musical analysis of the input, musical feature extraction, query formulation, or details of the retrieval process
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/311Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method, an apparatus, a computing device, and a medium for music recommendation.
  • The embodiments of the present application provide a method, an apparatus, a computing device, and a medium for music recommendation, which provide personalized recommendation services for different users while using fewer processing resources and bandwidth resources of the computing device when recommending matching music for a user.
  • An embodiment of the present application provides a method for music recommendation, which is executed by a server device and includes:
  • each visual semantic tag is used to describe at least one content of the material
  • the matching music is filtered according to the preset music filtering conditions, and the filtered matching music is recommended as the candidate music of the material.
  • An embodiment of the present application further provides a method for music recommendation, which is executed by a terminal device, and includes:
  • the estimated music appreciation information of the user for each matching music is obtained based on the actual music appreciation information of each candidate music by different users.
  • An embodiment of the present application further provides a device for music recommendation, including:
  • an acquisition unit configured to acquire material that needs a soundtrack;
  • a first determining unit configured to determine at least one visual semantic tag of the material, each visual semantic tag describing at least one content of the material;
  • a search unit configured to search the candidate music library for each matching music that matches the at least one visual semantic tag;
  • a sorting unit configured to sort each matching music according to the user appreciation information, for each matching music, of the user corresponding to the material;
  • a recommendation unit configured to filter the matching music according to a preset music filtering condition based on the sorting result, and to recommend the filtered matching music as candidate music for the material.
  • An embodiment of the present application further provides a device for music recommendation, including:
  • a sending unit configured to send the material that needs a soundtrack to the server device, and to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching the candidate music library for each matching music that matches the at least one visual semantic tag; sorting each matching music according to the estimated music appreciation information, for each matching music, of the user corresponding to the material; and, based on the sorting result, filtering the matching music according to a preset music filtering condition and recommending the filtered matching music as candidate music for the material;
  • a receiving unit configured to receive candidate music returned by the server device
  • the estimated music appreciation information of the user for each matching music is obtained based on the actual music appreciation information of each candidate music by different users.
  • An embodiment of the present application further provides a computing device including at least one processing unit and at least one storage unit, where the storage unit stores a computer program that, when executed by the processing unit, causes the processing unit to execute the steps of any of the foregoing music recommendation methods.
  • An embodiment of the present application further provides a computer-readable medium that stores a computer program executable by a computing device, and when the program runs on a terminal device, causes the computing device to execute the steps of any of the above-mentioned music recommendation methods.
  • With the method, apparatus, computing device, and medium for music recommendation provided in the embodiments of the present application, the visual semantic tags of the material that needs a soundtrack are determined, matching music that matches the visual semantic tags is searched for, each matching music is sorted according to the user's appreciation information for it, and matching music is recommended to the user according to the sorting result.
  • In this way, the reason for a music recommendation can be explained to users through the visual semantic tags, and different users receive differentiated recommendations, which realizes a personalized music recommendation service.
  • This further avoids re-recommendation caused by inappropriate music recommendations, which would waste processing resources of the computing device and occupy bandwidth resources between the terminal device and the server, and so saves those processing and bandwidth resources.
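  • The overall flow summarized above (search by tag, sort by user appreciation, filter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `recommend_music`, the dictionary layout of the library, and the "keep the first `top_n`" filtering condition are all hypothetical.

```python
from typing import Dict, List, Set


def recommend_music(
    material_tags: Set[str],
    library: Dict[str, Set[str]],
    estimated_appreciation: Dict[str, float],
    top_n: int = 3,
) -> List[str]:
    """Return up to top_n candidate tracks for a material.

    library maps a track id to the tags it matches (hypothetical layout).
    estimated_appreciation maps a track id to the user's estimated score.
    """
    # Search: keep tracks that share at least one tag with the material.
    matching = [track for track, tags in library.items() if tags & material_tags]
    # Sort: highest estimated appreciation first; unknown tracks score 0.0.
    ranked = sorted(
        matching,
        key=lambda track: estimated_appreciation.get(track, 0.0),
        reverse=True,
    )
    # Filter: the preset condition assumed here is "keep the first top_n".
    return ranked[:top_n]


# Invented data for illustration only.
library = {
    "track_a": {"sky", "sea"},
    "track_b": {"car"},
    "track_c": {"sky"},
    "track_d": {"mountain", "sky"},
}
scores = {"track_a": 0.4, "track_c": 0.9, "track_d": 0.7}
print(recommend_music({"sky"}, library, scores, top_n=2))  # ['track_c', 'track_d']
```

  • Because the sort key depends on the user's own estimated scores, two users who submit the same material can receive different candidate lists, which is the differentiated behavior the text describes.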
  • FIG. 1 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
  • FIG. 2 is an implementation flowchart of a music recommendation method according to an embodiment of the present application
  • FIG. 3a is an example diagram of an analytic image provided in an embodiment of the present application.
  • FIG. 3b is a schematic diagram of an Inception submodule of Inception V1 provided in an embodiment of the present application.
  • FIG. 3c is an exemplary diagram of a user music review provided in an embodiment of the present application.
  • FIG. 3d is a second example of a user music review provided in an embodiment of the present application.
  • FIG. 3e is a schematic structural diagram of a model of FastText provided in an embodiment of the present application.
  • FIG. 3f is a first schematic diagram of a music recommendation application interface provided in an embodiment of the present application.
  • FIG. 3g is a diagram of a matching music recommendation example of a material provided in an embodiment of the present application.
  • FIG. 3h is a second schematic diagram of a music recommendation application interface provided in an embodiment of the present application.
  • FIG. 3i is an information interaction diagram provided in an embodiment of the present application.
  • FIG. 4a is a first schematic structural diagram of a music recommendation device according to an embodiment of the present application.
  • FIG. 4b is a second schematic structural diagram of a music recommendation device according to an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • the embodiments of the present application provide a method, device, computing device, and medium for music recommendation.
  • Terminal device: an electronic device on which various applications can be installed and which can display the entities provided in the installed applications. The electronic device may be mobile or fixed, for example a mobile phone, a tablet, an in-vehicle device, a personal digital assistant (PDA), or another electronic device capable of implementing the above functions.
  • CNN: Convolutional Neural Network.
  • Visual semantic label vector: represents the probability distribution of a frame of image over the labels and includes the score of the frame for each label. In the embodiments of the present application, a score may be the probability that the frame corresponds to a label.
  • An image can be labeled with multiple labels.
  • Label recognition model: a model for recognizing an input image and determining the labels of the image.
  • Music search model: a model for searching for music according to an input search term and obtaining music that matches the search term.
  • FastText: a word vector computation and text classification tool open-sourced by Facebook in 2016. Its advantages are clear: in text classification tasks, FastText can achieve accuracy comparable to that of deep networks while training many orders of magnitude faster.
  • The embodiments of the present application provide a technical solution for music recommendation that determines the visual semantic labels of the material, searches for matching music that matches those labels, and sorts and recommends the matching music according to the user's appreciation information for it. In this way, differentiated recommendations and personalized services can be provided for different users.
  • the method for music recommendation provided in the embodiments of the present application can be applied to a terminal device.
  • the terminal device may be a mobile phone, a tablet computer, or a PDA (Personal Digital Assistant).
  • FIG. 1 is a schematic structural diagram of a terminal device 100.
  • the terminal device 100 includes: a processor 110, a memory 120, a power source 130, a display unit 140, and an input unit 150.
  • the processor 110 is a control center of the terminal device 100, and uses various interfaces and lines to connect various components.
  • The processor 110 executes the various functions of the terminal device 100 by running or executing software programs and/or data stored in the memory 120, thereby monitoring the terminal device as a whole.
  • the processor 110 may include one or more processing units; the processor 110 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, and an application program.
  • the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 110.
  • In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented on separate chips.
  • the memory 120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, various application programs, and the like; the storage data area may store data created according to the use of the terminal device 100 and the like.
  • the memory 120 may include a high-speed random access memory, and may further include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the terminal device 100 further includes a power source 130 (such as a battery) for supplying power to various components.
  • the power source can be logically connected to the processor 110 through a power management system, so as to implement functions such as management of charging, discharging, and power consumption through the power management system.
  • the display unit 140 may be configured to display information input by the user or information provided to the user and various menus of the terminal device 100.
  • The display unit 140 is mainly used to display the display interface of each application in the terminal device 100 and the entities, such as text and pictures, shown in that display interface.
  • The display unit 140 may include a display panel 141.
  • The display panel 141 may be configured as a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
  • the input unit 150 may be used to receive information such as numbers or characters input by a user.
  • the input unit 150 may include a touch panel 151 and other input devices 152.
  • The touch panel 151, also referred to as a touch screen, can collect touch operations performed by the user on or near it (for example, operations performed by the user on or near the touch panel 151 with a finger, a stylus, or any other suitable object or accessory).
  • the touch panel 151 can detect a user's touch operation, and detect signals brought by the touch operation, convert these signals into contact coordinates, send them to the processor 110, and receive commands from the processor 110 and execute them.
  • the touch panel 151 may be implemented in various types such as a resistive type, a capacitive type, an infrared type, and a surface acoustic wave.
  • the other input devices 152 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, power on / off keys, etc.), a trackball, a mouse, a joystick, and the like.
  • the touch panel 151 may cover the display panel 141.
  • When the touch panel 151 detects a touch operation on or near it, it transmits the touch operation to the processor 110 to determine the type of the touch event, and a corresponding visual output is then provided on the display panel 141.
  • Although the touch panel 151 and the display panel 141 are shown as two independent components that implement the input and output functions of the terminal device 100, in some embodiments the touch panel 151 and the display panel 141 may be integrated to implement the input and output functions of the terminal device 100.
  • the terminal device 100 may further include one or more sensors, such as a pressure sensor, a gravity acceleration sensor, a proximity light sensor, and the like.
  • The terminal device 100 may further include other components such as a camera. Because these components are not used in the embodiments of the present application, they are not shown in FIG. 1 and are not described in detail.
  • FIG. 1 is an example of a terminal device and does not limit the terminal device. The terminal device may include more or fewer components than shown in the figure, combine some components, or use different components.
  • the method for music recommendation may also be applied to a server device. Both the server device and the terminal device can adopt the structure shown in FIG. 1. Server devices and terminal devices are collectively referred to as computing devices.
  • the method for music recommendation provided in the embodiments of the present application can be applied to recommend matching music for various materials, such as an image collection or a video.
  • The image collection may include one or more images. The images or videos may be taken by the user or obtained from other sources.
  • FIG. 2 is an implementation flowchart of a music recommendation method provided by an embodiment of the present application. The method is performed by a server device.
  • the specific implementation process of the method includes steps 200 to 205, as follows:
  • Step 200: The server device acquires the material that needs a soundtrack.
  • When step 200 is performed, the material may be a video or an image collection, and the image collection includes at least one frame of image.
  • The server device may obtain the material in any of the following ways: it receives the material that needs a soundtrack from a terminal device, it directly obtains such material input by the user, or it selects the material itself.
  • The user may be a user of an instant messaging service (such as WeChat).
  • The user can input various types of material through his or her own terminal device, such as short material to be recorded by WeChat friends, and the terminal device sends the short material to the server device through the communication network.
  • As another example, the user uploads the material that needs a soundtrack directly on the application interface provided on the server device side.
  • The server device may also actively search for materials uploaded by the user to a public platform, add soundtracks to these materials, and then send the materials with soundtracks to the user.
  • Step 201: The server device determines a visual semantic tag of the material.
  • Step 201 may be performed in either of the following ways:
  • The first way is to determine at least one visual semantic tag, specified by the user from among alternative visual semantic tags, as the at least one visual semantic tag of the material.
  • For example, the user may be offered some alternative visual semantic tags to choose from; the user specifies and submits at least one desired visual semantic tag, and the tag specified by the user is determined as the at least one visual semantic tag of the material.
  • The second way is to parse the content of the material and determine at least one visual semantic tag of the material. For example, the content of the video or image collection is parsed, and at least one visual semantic tag of the material is determined according to the parsing result.
  • If the material is an image collection, a pre-trained label recognition model is used to perform visual semantic label recognition on the material to obtain the visual semantic label vector of the material, and the visual semantic labels whose scores in that vector meet a preset filtering condition are determined as the visual semantic labels corresponding to the material.
  • Here, the image collection includes at least one frame of image; the visual semantic label vector of the material includes at least one visual semantic label of the content identified from the material and its corresponding score; and the label recognition model is obtained by training on multiple label recognition samples, each of which includes a sample image and the visual semantic label vector of that sample image.
  • If the material is a video, the server device first parses the material at a preset duration interval to obtain frame images.
  • The server device then uses the pre-trained label recognition model to perform visual semantic label recognition on each frame image and obtains the visual semantic label vector of each frame image.
  • Finally, the server device determines the average vector of the visual semantic label vectors of the frame images, and determines the visual semantic labels whose scores in the average vector meet the preset filtering condition as the visual semantic labels corresponding to the material.
  • Here, the visual semantic label vector of a frame image includes at least one visual semantic label of the content identified from that frame image and its corresponding score; the label recognition model is obtained by training on multiple label recognition samples, each of which includes a sample image and the visual semantic label vector of that sample image.
  • The preset duration may be 1 s, that is, one frame image is parsed out every 1 s.
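  • As a rough illustration of parsing one frame per preset duration, the sketch below computes which frame indices would be extracted from a video. The function name `sample_frame_indices` and its parameters are hypothetical, and decoding the frames themselves is left out.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, interval_s: float = 1.0) -> List[int]:
    """Indices of the frames to extract, one per interval_s seconds of video."""
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    step = max(1, round(fps * interval_s))  # frames between two samples
    return list(range(0, total_frames, step))


# A 3-second clip at 30 frames per second yields frames 0, 30 and 60.
print(sample_frame_indices(total_frames=90, fps=30.0))
```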
  • The filtering condition may be to select a specified number of visual semantic labels with the highest scores. The specified number may be one or more.
  • For example, the server device determines that the visual semantic label corresponding to the material is "sky", the label with the highest score.
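  • The averaging and filtering described for video material can be sketched as below. The label set is the one used in the FIG. 3a example of this document; the per-frame scores and the helper names `average_vector` and `top_labels` are invented for illustration.

```python
from typing import List, Sequence

LABELS = ["sky", "mountain", "sea", "plant", "animal", "person", "snow", "light", "car"]


def average_vector(frame_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-frame label vectors."""
    if not frame_vectors:
        raise ValueError("at least one frame vector is required")
    n = len(frame_vectors)
    return [sum(column) / n for column in zip(*frame_vectors)]


def top_labels(vector: Sequence[float], labels: Sequence[str], k: int = 1) -> List[str]:
    """Labels of the k highest scores (the filtering condition described above)."""
    order = sorted(range(len(vector)), key=lambda i: vector[i], reverse=True)
    return [labels[i] for i in order[:k]]


# Per-frame scores are invented for illustration.
frames = [
    [0.7, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0],
    [0.5, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.1, 0.0],
]
print(top_labels(average_vector(frames), LABELS, k=2))  # ['sky', 'mountain']
```

  • Averaging before filtering means a label that appears briefly in one frame is less likely to be selected than a label that scores consistently across the whole video.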
  • the label recognition model is a model for identifying an input image and determining a label of the image.
  • the label recognition model may be a model obtained by training a large number of sample images and corresponding visual semantic label vectors, or may be a model established according to an association relationship between image features and visual semantic labels.
  • the specific acquisition method of the label recognition model is not limited herein.
  • In the embodiments of the present application, the description takes as an example a label recognition model obtained by training on sample images and visual semantic label vectors with a convolutional neural network algorithm.
  • Before executing step 201, the server device uses a convolutional neural network algorithm to train in advance on a large number of sample images in an image database and the visual semantic label vectors of those sample images, thereby obtaining the label recognition model.
  • Image databases usually contain tens of millions of images.
  • The visual semantic label vector represents the probability distribution of a frame image over the labels and includes the score of the frame image for each label.
  • A score may be the probability that a frame image corresponds to a label.
  • An image can be labeled with multiple labels.
  • FIG. 3a is an example diagram of an analytic image. It is assumed that the set of visual semantic labels includes: sky, mountain, sea, plant, animal, person, snow, light, and car. Then, the server device determines that the visual semantic tag vector corresponding to the parsed image shown in FIG. 3a is {0.7, 0.03, 0.1, 0.02, 0, 0, 0, 0.05, 0}.
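  • To make the FIG. 3a example concrete, the short sketch below pairs each label with its score from the vector given above and reads off the highest-scoring label. The variable names are illustrative only.

```python
LABELS = ["sky", "mountain", "sea", "plant", "animal", "person", "snow", "light", "car"]
VECTOR = [0.7, 0.03, 0.1, 0.02, 0, 0, 0, 0.05, 0]

# Pair each label with its score, in the order the label set is listed.
scores = dict(zip(LABELS, VECTOR))

# The label with the highest score.
best_label = max(scores, key=scores.get)
print(best_label, scores[best_label])  # sky 0.7

# Labels the model assigned a non-zero score to.
nonzero = {label: s for label, s in scores.items() if s > 0}
print(list(nonzero))  # ['sky', 'mountain', 'sea', 'plant', 'light']
```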
  • An Inception V1 or Inception V3 model of the convolutional neural network family may be used, and the cross entropy loss function (Cross Entropy Loss) may be used as the loss function to measure the similarity between the visual semantic label vector obtained from recognition and the sample visual semantic label vector.
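  • To illustrate how cross entropy compares a recognized label vector with a sample label vector, here is a small sketch in plain Python. The three-label vectors are made-up values, and the `eps` guard against log(0) is an implementation assumption.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy H(target, predicted) = -sum_k target_k * log(predicted_k)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


sample = [1.0, 0.0, 0.0]    # hypothetical sample label vector
close = [0.9, 0.05, 0.05]   # recognized vector close to the sample
far = [0.2, 0.4, 0.4]       # recognized vector far from the sample

# The closer prediction yields the smaller loss.
print(cross_entropy(sample, close) < cross_entropy(sample, far))
```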
  • FIG. 3b is a schematic diagram of an Inception submodule of Inception V1.
  • the previous layer is used to get the output value of the previous layer.
  • 1x1, 3x3, and 5x5 are all convolution kernels.
  • The Inception sub-module applies convolution with each convolution kernel, and pooling (3x3 max pooling), to the output value of the previous layer, then combines the results through filter concatenation (Filter Concatenation) and outputs them to the next layer.
  • a convolutional neural network algorithm can be used in advance to train a large number of sample images in the image database and the visual semantic label vectors of the sample images, thereby obtaining a label recognition model.
  • The pre-trained label recognition model is used to perform visual semantic label recognition on each frame of image, obtaining the visual semantic label vector of each frame, and the visual semantic labels corresponding to the material are determined according to the probability distribution of the material over the visual semantic labels.
  • Different materials are thereby labeled with different visual semantic labels, so that the reason for a music recommendation can be explained to the user through the visual semantic labels.
  • a label recognition model is directly used to determine the visual semantic label vector of the image, and the visual semantic label of the image is determined according to the visual semantic label vector.
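  • For a material with several frames, one way to go from per-frame vectors to labels for the whole material is sketched below. Averaging the frame vectors and keeping the top-k labels are assumptions made for illustration; the text above only states that the labels are determined from the material's probability distribution over the labels.

```python
def average_label_vector(frame_vectors):
    """Average the per-frame label vectors element-wise (assumed aggregation)."""
    n = len(frame_vectors)
    return [sum(column) / n for column in zip(*frame_vectors)]


def select_labels(labels, scores, top_k=2):
    """Keep the top_k highest-scoring labels (hypothetical filtering condition)."""
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in ranked[:top_k]]


LABELS = ["sky", "snow", "person"]
# Hypothetical label vectors for three frames of one video.
frames = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]

material_vector = average_label_vector(frames)
print(select_labels(LABELS, material_vector))  # ['snow', 'sky']
```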
  • Step 202: The server device searches the candidate music library for each matching music that matches the at least one visual semantic tag.
  • the server device uses the pre-trained music search model based on the at least one visual semantic tag to search for each matching music that matches the at least one visual semantic tag from the candidate music library.
  • For example, the visual semantic label is "missing my old mother".
  • Using the music search model, the server device searches the candidate music library and finds that the music matching "missing my old mother" is "Mother" by Yan Weiwen.
  • the music search model is a model for performing music search according to the input search word, and obtaining music matching the search word.
  • the music search model can be obtained through text classification algorithms or the relationship between text and music.
  • the specific method of obtaining the music search model is not limited here.
  • The following description takes as an example a music search model obtained by training on text and music with a preset text classification algorithm.
  • The server device may obtain the music search model in advance by performing text training with a preset text classification algorithm on each user's music review information for each music.
  • A text classification algorithm is used because the massive music review information of users on each song reflects the theme and mood of that song, and different songs may have completely different review styles.
  • FIG. 3c is an example of a user's music review.
  • FIG. 3d is an example of a user's music review.
  • The three songs in FIG. 3d are Huslan's "Hong Yan", Yan Weiwen's "Mother", and the military song "Green Flowers in the Army".
  • The comments on "Hong Yan" mostly concentrate on homesickness, hometown, Inner Mongolia, and Saibei; those on "Mother" mostly concern the love between children and parents; and those on "Green Flowers in the Army" mostly concern army and military life.
  • the text classification algorithm may adopt FastText.
  • FIG. 3e is a schematic diagram of a model structure of FastText.
  • the input layer (x1, x2, ..., xN) is used to input the user's music review information;
  • the hidden layer is used to generate a hidden layer vector based on the input music review information;
  • the output layer is used to classify based on the hidden layer vector, that is, to classify by music.
  • the optimization objective function of FastText is:
  • $x_n$ is a user's music review information
  • $y_n$ is the music
  • the matrix parameter A is a word-based fast lookup table, that is, the embedding vectors of the words
  • the mathematical meaning of the matrix operation $Ax_n$ is to sum or average the embedding vectors of the words to obtain the hidden layer vector.
  • the matrix parameter B is a parameter of the function f
  • the function f is a multi-class linear function.
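  • For reference, the objective commonly given for FastText, written with the symbols defined above, is the negative log-likelihood below. N, the number of training samples, is not defined in this text and is introduced here as an assumption; f is taken to be the softmax function, as in the FastText literature.

```latex
\min_{A,\,B}\; -\frac{1}{N}\sum_{n=1}^{N} y_n \,\log\!\bigl(f(B A x_n)\bigr)
```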
  • Based on the music review information of each user on each music, a preset text classification algorithm is used to perform text training to obtain a music search model, and the pre-trained music search model is used to search the candidate music library for each matching music that matches the visual semantic tags.
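  • The search step can be pictured as a FastText-style forward pass: average the word embeddings of the query text, apply a linear layer, and pick the music with the highest probability. The sketch below uses toy hand-written parameters (4 words, 2 songs) and a softmax output; all values and function names are illustrative assumptions, not a trained model.

```python
import math


def hidden_vector(word_ids, A):
    """Average the embedding vectors (rows of A) of the words in one text."""
    dim = len(A[0])
    return [sum(A[w][d] for w in word_ids) / len(word_ids) for d in range(dim)]


def softmax(z):
    """Numerically stable softmax."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(word_ids, A, B):
    """Return one probability per music: softmax(B . hidden)."""
    h = hidden_vector(word_ids, A)
    logits = [sum(b * x for b, x in zip(row, h)) for row in B]
    return softmax(logits)


# Hypothetical toy parameters: 4 words with 2-dimensional embeddings, 2 songs.
A = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]
B = [[2.0, -1.0], [-1.0, 2.0]]

probs = predict([0, 1], A, B)  # query text made of words 0 and 1
best_music = max(range(len(probs)), key=lambda j: probs[j])
print(best_music, probs)
```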
  • Step 203: The server device determines the user appreciation information of the user corresponding to the material for each matching music.
  • When step 203 is performed, any of the following methods may be adopted:
  • The first method: based on the music appreciation behavior data of the user providing the material for each matching music, the server device uses one parameter value, or a weighted average of multiple parameter values, of that data as the user appreciation information.
  • The second method: the server device predicts the user's estimated music appreciation information for each matching music based on the actual music appreciation information of the user's similar users for each matching music, and uses the estimated music appreciation information as the user appreciation information.
  • the third method is as follows: the server device obtains a predetermined estimated evaluation matrix, and directly obtains the user's estimated music appreciation information for each matching music in the estimated evaluation matrix, and uses the estimated music appreciation information as the user appreciation information.
  • priorities can be set for various methods.
  • the priority order of the methods is not limited.
  • the server device obtains user attribute information of each user who appreciates each matching music, and filters out similar users whose user attribute information is similar to the user attribute information of the user who inputs the material.
  • the server device separately acquires actual music appreciation information of each similar user for each matching music.
  • the server device averages the actual music appreciation information of each matching music by each similar user, and estimates the user's estimated music appreciation information of each matching music.
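  • The three steps above (find similar users, collect their actual appreciation values, average them) can be sketched as follows. The similarity rule (same gender, age within five years) and all data are hypothetical; the text does not specify how similarity between user attribute information is measured.

```python
def similar_users(target, users, max_age_gap=5):
    """Hypothetical similarity rule: same gender and age within max_age_gap years."""
    return [
        uid
        for uid, attrs in users.items()
        if attrs["gender"] == target["gender"]
        and abs(attrs["age"] - target["age"]) <= max_age_gap
    ]


def estimate_appreciation(similar, actual, music_id):
    """Average the similar users' actual appreciation values for one music."""
    scores = [actual[u][music_id] for u in similar if music_id in actual.get(u, {})]
    return sum(scores) / len(scores) if scores else None


users = {
    "u1": {"gender": "f", "age": 25},
    "u2": {"gender": "f", "age": 28},
    "u3": {"gender": "m", "age": 26},
}
actual = {"u1": {"song_a": 4.0}, "u2": {"song_a": 2.0}, "u3": {"song_a": 5.0}}
target = {"gender": "f", "age": 27}  # the user who provided the material

sim = similar_users(target, users)
est = estimate_appreciation(sim, actual, "song_a")
print(sim, est)  # ['u1', 'u2'] 3.0
```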
  • The server device sorts the matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music; the user's estimated music appreciation information for the matching music is obtained based on the actual music appreciation information of different users for each candidate music.
  • Alternatively, the server device sorts the matching music according to the parameter value of one type of music appreciation behavior data of the user corresponding to the material for the music, or according to a comprehensive value obtained by weighting the parameter values of at least two types of music appreciation behavior data.
  • the user attribute information is used to describe the characteristics of the user.
  • the user attribute information may include: gender, age, education, and job.
  • A user's actual music appreciation information for a piece of music is obtained by weighting each parameter value contained in the user's music appreciation behavior data; the music appreciation behavior data contains any one or any combination of the following parameters: music ratings, click-through rates, favorite behavior, like behavior, and sharing behavior.
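  • A minimal sketch of such a weighting is shown below. The weights, the parameter names, and the assumption that every parameter value is normalized to the range 0 to 1 are all illustrative; the text does not give concrete weights.

```python
# Hypothetical weights for the five behavior parameters named in the text.
WEIGHTS = {"rating": 0.4, "click_rate": 0.2, "favorite": 0.2, "like": 0.1, "share": 0.1}


def actual_appreciation(behavior, weights=WEIGHTS):
    """Weighted average over whichever behavior parameters are present."""
    present = {k: v for k, v in behavior.items() if k in weights}
    total_weight = sum(weights[k] for k in present)
    if total_weight == 0:
        return None
    return sum(weights[k] * v for k, v in present.items()) / total_weight


# A user who rated the song 0.8 (normalized) and liked it.
value = actual_appreciation({"rating": 0.8, "like": 1.0})
print(round(value, 2))  # 0.84
```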
  • the estimated music appreciation information of the matching music by the users can be predicted, so that the matching music can be recommended for the users according to the actual music appreciation information of the similar users.
  • the server device determines an estimated evaluation matrix in advance based on actual user appreciation information of each candidate music in the candidate music library by each user.
  • the server device composes a scoring matrix based on each user's actual music appreciation information for each candidate music.
  • The element $m_{ij}$ in the scoring matrix represents a value corresponding to the appreciation of music j by user i.
  • the server device performs matrix decomposition on the scoring matrix by using a preset matrix decomposition algorithm to obtain a user matrix and a music feature matrix.
  • the product of the transposition of each music feature vector in the music feature matrix and each user vector in the user matrix is determined as the estimated music appreciation information of each user for each music.
  • the matrix decomposition algorithm may use the FunkSVD algorithm, and the specific principle is as follows:
  • M is a scoring matrix
  • P is a user matrix
  • Q is a music feature matrix
  • m is the total number of users
  • n is the total number of music
  • k is a parameter.
  • The estimated music score of user i for music j can be expressed as $q_j^T p_i$.
  • p is the user vector and q is the music feature vector.
  • the mean square error is used as a loss function to determine the final P and Q.
  • p is a user vector
  • q is a music feature vector
  • $\lambda$ is a regularization coefficient
  • i is a user number
  • j is a music number.
  • $p_i = p_i + \alpha\left((m_{ij} - q_j^T p_i)\, q_j - \lambda p_i\right)$;
  • $q_j = q_j + \alpha\left((m_{ij} - q_j^T p_i)\, p_i - \lambda q_j\right)$;
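  • As a worked reconstruction under stated assumptions, the iterative updates of $p_i$ and $q_j$ are what stochastic gradient descent produces for a regularized squared-error loss. The factor 1/2, the learning-rate symbol $\alpha$, the symbol $\lambda$ for the regularization coefficient, and the matrix shapes are assumptions chosen to be consistent with the definitions of M, P, Q, m, n, and k above.

```latex
% Decomposition (assumed shapes: user vectors p_i are columns of P, music vectors q_j are columns of Q)
M_{m \times n} \approx P^{T} Q, \qquad P \in \mathbb{R}^{k \times m},\; Q \in \mathbb{R}^{k \times n},
\qquad \hat{m}_{ij} = q_j^{T} p_i

% Per-sample regularized squared-error loss
J_{ij} = \tfrac{1}{2}\bigl(m_{ij} - q_j^{T} p_i\bigr)^{2}
       + \tfrac{\lambda}{2}\bigl(\lVert p_i \rVert_2^{2} + \lVert q_j \rVert_2^{2}\bigr)

% Gradients
\frac{\partial J_{ij}}{\partial p_i} = -\bigl(m_{ij} - q_j^{T} p_i\bigr)\, q_j + \lambda p_i,
\qquad
\frac{\partial J_{ij}}{\partial q_j} = -\bigl(m_{ij} - q_j^{T} p_i\bigr)\, p_i + \lambda q_j

% Gradient descent step with learning rate \alpha
p_i \leftarrow p_i + \alpha\bigl((m_{ij} - q_j^{T} p_i)\, q_j - \lambda p_i\bigr),
\qquad
q_j \leftarrow q_j + \alpha\bigl((m_{ij} - q_j^{T} p_i)\, p_i - \lambda q_j\bigr)
```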
  • The user matrix and the music feature matrix can be obtained through matrix decomposition; then, based on the user matrix and the music feature matrix, the estimated evaluation matrix of each user for each music is obtained, and the estimated evaluation matrix is determined as each user's estimated music appreciation information for each candidate music.
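  • The decomposition can be sketched in plain Python as a FunkSVD-style stochastic gradient descent over the observed entries of the scoring matrix. The hyperparameters (k, learning rate, regularization coefficient, number of epochs), the random initialization, and the toy ratings are assumptions chosen for illustration only.

```python
import random


def funk_svd(ratings, n_users, n_items, k=2, alpha=0.02, lam=0.01, epochs=1000, seed=0):
    """Factorize observed (user, music, value) triples into user and music vectors."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for i, j, m_ij in ratings:
            # Prediction error for this observed entry.
            err = m_ij - sum(q * p for q, p in zip(Q[j], P[i]))
            for d in range(k):
                p_d, q_d = P[i][d], Q[j][d]
                # Regularized gradient step on both vectors, using the old values.
                P[i][d] = p_d + alpha * (err * q_d - lam * p_d)
                Q[j][d] = q_d + alpha * (err * p_d - lam * q_d)
    return P, Q


def predict(P, Q, i, j):
    """Estimated appreciation of music j by user i: the dot product q_j . p_i."""
    return sum(q * p for q, p in zip(Q[j], P[i]))


# Hypothetical observed appreciation values: (user index, music index, value).
ratings = [(0, 0, 5), (0, 1, 1), (1, 0, 4), (1, 1, 1), (2, 0, 1), (2, 1, 5)]
P, Q = funk_svd(ratings, n_users=3, n_items=2)
for i, j, m_ij in ratings:
    print(i, j, m_ij, round(predict(P, Q, i, j), 2))
```

  • In a real scoring matrix most entries are unobserved; the same `predict` call fills them in, which is what the estimated evaluation matrix stores.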
  • Step 204: The server device sorts each matching music according to the user appreciation information of each matching music by the user corresponding to the material.
  • Step 205: The server device filters each matching music according to a preset music filtering condition based on the ranking result, and recommends the filtered matching music as a candidate music of the material.
  • The server device selects, in sorted order, the matching music that meets the preset music filtering condition, and either displays the selected candidate music directly to the user in that order or sends the information of the candidate music to the terminal device.
  • The music filtering condition may be selecting the matching music whose value in the user appreciation information is higher than a set value, or, according to the sorting result from high to low, selecting the matching music ranked above a set position, or selecting a set number of matching music.
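  • Two of the filtering conditions described above, a score threshold and a fixed number of top-ranked results, might look like this in code. The song names, scores, and function names are hypothetical.

```python
def filter_by_score(ranked, min_score):
    """Keep matching music whose appreciation value is higher than min_score."""
    return [(music, score) for music, score in ranked if score > min_score]


def filter_top_n(ranked, n):
    """Keep the first n entries of an already sorted (high to low) list."""
    return ranked[:n]


# Hypothetical user appreciation values, sorted from high to low.
appreciation = {"song_a": 0.9, "song_b": 0.4, "song_c": 0.7}
ranked = sorted(appreciation.items(), key=lambda kv: kv[1], reverse=True)

print(filter_by_score(ranked, 0.5))  # [('song_a', 0.9), ('song_c', 0.7)]
print(filter_top_n(ranked, 1))       # [('song_a', 0.9)]
```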
  • the user can select his favorite music among the candidate music for material soundtrack.
  • FIG. 3f is a first schematic diagram of a music recommendation application interface.
  • The terminal device asks the user whether to add a soundtrack to the short video.
  • FIG. 3g is an example of matching music recommendation for a material.
  • When the terminal device determines that the user wants to add a soundtrack to the short video, it sends the short video to the server device. The server device parses the short video and determines that the visual semantic tags of the short video are snow and motion. Then, the server device searches the massive music library (the candidate music library) for 5 songs matching snow and 5 songs matching motion. Next, the server device sorts the 10 songs according to the user's estimated music appreciation information for them.
  • FIG. 3h is a second schematic diagram of a music recommendation application interface. In FIG. 3h, the top 5 songs are recommended to the user according to the sorting.
  • The terminal device receives the information of the candidate music returned by the server device and displays it to the user; upon receiving the user's instruction information specifying the soundtrack music from the candidate music, it obtains and outputs the material synthesized with the soundtrack music.
  • The first way is: sending the instruction information to the server device, and receiving from the server device the material synthesized with the soundtrack music.
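  • The snow and motion example can be condensed into a small end-to-end sketch: search per tag, merge the results, sort by the user's estimated appreciation, and keep the top 5. The `search` and `estimated_score` callables, the song names, and the scores are hypothetical stand-ins for the music search model and the estimated evaluation matrix.

```python
def recommend(tags, search, estimated_score, per_tag=5, top_n=5):
    """Search per tag, merge without duplicates, sort by score, keep top_n."""
    candidates = []
    for tag in tags:
        for music in search(tag)[:per_tag]:
            if music not in candidates:
                candidates.append(music)
    ranked = sorted(candidates, key=estimated_score, reverse=True)
    return ranked[:top_n]


# Hypothetical library: 5 songs per tag, with made-up appreciation scores 0..9.
library = {
    "snow": [f"snow_{i}" for i in range(5)],
    "motion": [f"motion_{i}" for i in range(5)],
}
scores = {name: idx for idx, name in enumerate(library["snow"] + library["motion"])}

result = recommend(["snow", "motion"], lambda tag: library[tag], lambda m: scores[m])
print(result)
```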
  • the second method is: sending the instruction information to the server device, and receiving the soundtrack music returned by the server device according to the instruction information, and synthesizing the soundtrack music into the material.
  • the server device receives the instruction information sent from the terminal device to specify the soundtrack music from the candidate music, synthesizes the soundtrack music into the material according to the instruction information, and sends the synthesized material to the terminal device.
  • Several visual semantic tags of the material are determined; the matching music that matches those tags is searched based on a music search model obtained from each user's music review information for each music; the matching music is sorted based on the user appreciation information of the user; and music is recommended to the user based on the sorted result.
  • In this way, it is possible to provide personalized services according to different users' preferences for different music, that is, to make different recommendations to different users, recommending music that not only matches the material but that the user also likes.
  • An embodiment of the present application further provides a method for music recommendation, which is executed by a terminal device and includes:
  • the terminal device sends the material to be soundtracked to the server device, and triggers the server device to perform the following steps: determining at least one visual semantic tag of the material; searching the candidate music library for each matching music that matches the at least one visual semantic tag; sorting the matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music; and, based on the sorting result, filtering the matching music according to the preset music filtering conditions and recommending the filtered matching music as alternative music for the material.
  • the terminal device receives the alternative music returned by the server device.
  • the estimated music appreciation information of the user for each matching music is obtained based on the actual music appreciation information of each candidate music by different users.
  • FIG. 3i is an interactive sequence diagram of a music soundtrack.
  • the specific implementation process of this method is as follows:
  • Step 301: The terminal device sends instruction information for adding a soundtrack to the material to the server device.
  • Step 302: The terminal device receives the candidate music recommended based on the material and returned by the server device.
  • Step 303: The terminal device sends, to the server device, instruction information for using the specified music in the candidate music as the soundtrack.
  • Step 304: The terminal device receives the material synthesized with the music, returned by the server device.
  • An embodiment of the present application further provides an apparatus for music recommendation. Since the principle by which the apparatus and device solve the problem is similar to that of the method for music recommendation described above, the implementation of the apparatus can refer to the implementation of the method, and repeated details are not described again here.
  • FIG. 4a is a first structural schematic diagram of a music recommendation device according to an embodiment of the present application, including:
  • An obtaining unit 400 for obtaining materials to be soundtracked
  • a first determining unit 401 configured to determine at least one visual semantic tag of a material, and each visual semantic tag is used to describe at least one content of the material;
  • a search unit 402 configured to search each matching music that matches at least one visual semantic tag from a candidate music library
  • a sorting unit 403 configured to sort each matching music according to the user appreciation information for each matching music corresponding to the material
  • the recommendation unit 404 is configured to filter matching music according to a preset music filtering condition based on the ranking result, and recommend the filtered matching music as a candidate music of the material.
  • the recommendation unit 404 is further configured to:
  • the first determining unit 401 further includes:
  • a second determining unit configured to determine at least one visual semantic tag specified by the user from the alternative visual semantic tags as at least one visual semantic tag of the material;
  • a parsing unit configured to parse the content of the material and determine at least one visual semantic tag of the material.
  • the parsing unit is specifically configured to:
  • a pre-trained label recognition model is used to perform visual semantic label recognition on the material to obtain the visual semantic label vector of the material, and the visual semantic labels whose scores in the visual semantic label vector meet the preset filtering conditions are determined as the visual semantic labels of the material;
  • the image collection includes at least one frame of image;
  • the visual semantic label vector of the material includes at least one visual semantic label of the content identified from the material and its corresponding score;
  • the label recognition model is obtained after training on multiple label recognition samples, and each label recognition sample includes a sample image and a visual semantic label vector of the sample image.
  • the parsing unit is specifically configured to:
  • the material is a video
  • the material is frame-parsed to obtain each frame image
  • the visual semantic label vector of a frame of image includes at least one visual semantic label of the content identified from the frame of the image and its corresponding score
  • the label recognition model is obtained after training on multiple label recognition samples, and each label recognition sample includes a sample image and a visual semantic label vector of the sample image.
  • the search unit 402 is specifically configured to:
  • the music search model is obtained by text classification training on each user's music review information for each music.
  • the sorting unit 403 is specifically configured to:
  • sort the matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music, where the user's estimated music appreciation information for each matching music is obtained based on the actual music appreciation information of different users for each candidate music;
  • a user's actual music appreciation information for a piece of music is obtained by weighting each parameter value included in the user's music appreciation behavior data;
  • the music appreciation behavior data includes any one or any combination of the following parameters: music ratings, click-through rates, favorite behavior, like behavior, and sharing behavior.
  • the sorting unit 403 is specifically configured to:
  • For the matched music obtain user attribute information of each user who appreciates the matched music, and filter out similar users whose user attribute information is similar to the user attribute information of the user who inputs the material;
  • average the actual music appreciation information of the similar users for each matching music to estimate the user's estimated music appreciation information for each matching music.
  • the sorting unit 403 is specifically configured to:
  • the product of the transpose of each music feature vector in the music feature matrix and each user vector in the user matrix is determined as the estimated music appreciation information of each user for each music.
  • the sorting unit 403 is specifically configured to:
  • the music appreciation behavior data of a user for a piece of music includes any one or any combination of the following parameters: music ratings, click-through rates, favorite behavior, like behavior, and sharing behavior.
  • FIG. 4b is a second structural schematic diagram of a music recommendation device according to an embodiment of the present application, including:
  • the sending unit 410 is configured to send the material to be soundtracked to the server device, and trigger the server device to perform the following steps: determine at least one visual semantic tag of the material; search the candidate music library for each matching music that matches the at least one visual semantic tag; sort the matching music according to the estimated music appreciation information of the user corresponding to the material for each matching music; and, based on the sorting result, filter the matching music according to the preset music filtering conditions and recommend the filtered matching music as alternative music for the material;
  • a receiving unit 411 configured to receive candidate music returned by the server device
  • the estimated music appreciation information of the user for each matching music is obtained based on the actual music appreciation information of each candidate music by different users.
  • An embodiment of the present application further provides a computing device including at least one processing unit and at least one storage unit, where the storage unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of the method described in the above embodiments.
  • the computing device may be a server device or a terminal device, and both the server device and the terminal device may adopt the structure shown in FIG. 5.
  • the terminal device is taken as an example to describe the structure of the computing device.
  • An embodiment of the present application provides a terminal device 500. Referring to FIG. 5, the terminal device 500 is configured to implement the methods described in the foregoing method embodiments.
  • The terminal device 500 may include a memory 501, a processor 502, an input unit 503, and a display panel 504.
  • the memory 501 is configured to store a computer program executed by the processor 502.
  • the memory 501 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the terminal device 500, and the like.
  • the processor 502 may be a central processing unit (CPU), or a digital processing unit.
  • the input unit 503 may be configured to obtain a user instruction input by a user.
  • The display panel 504 is used to display information input by the user or provided to the user. In the embodiment of the present application, the display panel 504 is mainly used to display the display interface of each application in the terminal device and the control entities displayed in each display interface. The display panel 504 may be configured in the form of a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display.
  • the embodiments of the present application are not limited to specific connection media between the memory 501, the processor 502, the input unit 503, and the display panel 504.
  • the memory 501, the processor 502, the input unit 503, and the display panel 504 are connected by a bus 505 in FIG. 5.
  • the bus 505 is indicated by a thick line in FIG. 5. It is for illustrative purposes and is not limited.
  • the bus 505 can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in FIG. 5, but it does not mean that there is only one bus or one type of bus.
  • The memory 501 may be a volatile memory, such as a random-access memory (RAM); the memory 501 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 501 may be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto.
  • the memory 501 may be a combination of the above-mentioned memories.
  • the processor 502 is configured to implement the embodiment shown in FIG. 2 and includes:
  • the processor 502 is configured to call a computer program stored in the memory 501 to execute the embodiment shown in FIG. 2.
  • An embodiment of the present application further provides a computer-readable storage medium that stores the computer-executable instructions to be executed by the above processor, including the program that the processor needs to execute.
  • the storage medium stores a computer program executable by a computing device, and when the program runs on the computing device, causes the computing device to execute the steps of the method described in the foregoing embodiment.
  • aspects of a method for music recommendation provided in the present application may also be implemented in the form of a program product, which includes program code.
  • When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the method for music recommendation according to the various exemplary embodiments of the present application described above in this specification.
  • the terminal device may execute the embodiment shown in FIG. 2.
  • the program product may employ any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • the program product for a music recommendation may adopt a portable compact disc read-only memory (CD-ROM) and include a program code, and may run on a computing device.
  • the program product of the present application is not limited thereto.
  • the readable storage medium may be any tangible medium containing or storing a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the readable signal medium may include a data signal that is borne in baseband or propagated as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take a variety of forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • the program code used to perform the operations of this application can be written in any combination of one or more programming languages.
  • The programming languages include object-oriented programming languages, such as Java and C++, and also include conventional procedural programming languages, such as "C" or similar programming languages.
  • The program code may be executed entirely on the user computing device, partly on the user computing device, as an independent software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server.
  • The remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
  • this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Abstract

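This application discloses a music recommendation method and apparatus, a computing device, and a medium, and belongs to the field of computer technologies. The method includes: determining visual semantic tags of a material to which background music is to be added; searching for matching music that matches the visual semantic tags; sorting the pieces of matching music according to the user's appreciation information for each piece; and recommending matching music to the user according to the sorting result. In this way, the visual semantic tags can be used to explain to the user why a piece of music is recommended, and different users receive differentiated recommendations, which provides a personalized music recommendation service.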

Description

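Music recommendation method and apparatus, computing device, and medium
This application claims priority to Chinese Patent Application No. 201810924409.0, filed with the China Patent Office on August 14, 2018 and entitled "Music recommendation method and apparatus, terminal device, and medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of computer technologies, and in particular, to a music recommendation method and apparatus, a computing device, and a medium.
Background
With the rise of instant messaging applications, sharing has become ubiquitous, and users expect increasingly diverse forms of shared material. Simply sharing pictures or videos no longer meets their needs, and adding background music to a material has become a new requirement. In the existing technology, material features of the material and music features of the music are usually extracted, a matching relationship between materials and music is established from the extracted features, and matching music is then recommended for the user's material through that matching relationship. The user may obtain many kinds of material, such as pictures and videos from the Internet, or videos or image sets that the user has shot.
However, this approach can only make recommendations to different users according to a fixed matching relationship and cannot provide users with a personalized service.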
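Summary
Embodiments of this application provide a music recommendation method and apparatus, a computing device, and a medium, which are used to provide a personalized recommendation service for different users, while using fewer processing resources of the computing device and fewer bandwidth resources, when recommending music that matches a user's material.
An embodiment of this application provides a music recommendation method, performed by a server device, including:
obtaining a material to which background music is to be added;
determining at least one visual semantic tag of the material, each visual semantic tag describing at least one item of content of the material;
searching a candidate music library for pieces of matching music that match the at least one visual semantic tag;
sorting the pieces of matching music according to user appreciation information, for each piece of matching music, of the user corresponding to the material; and
filtering the matching music based on the sorting result and according to a preset music filtering condition, and recommending the matching music that passes the filter as alternative music for the material.
An embodiment of this application further provides a music recommendation method, performed by a terminal device, including:
sending a material to which background music is to be added to a server device, to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching a candidate music library for pieces of matching music that match the at least one visual semantic tag; sorting the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material; and filtering the matching music based on the sorting result and according to a preset music filtering condition, and recommending the matching music that passes the filter as alternative music for the material; and
receiving the alternative music returned by the server device;
where the user's estimated music appreciation information for each piece of matching music is obtained based on actual music appreciation information of different users for each piece of candidate music.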
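An embodiment of this application further provides a music recommendation apparatus, including:
an obtaining unit, configured to obtain a material to which background music is to be added;
a first determining unit, configured to determine at least one visual semantic tag of the material, each visual semantic tag describing at least one item of content of the material;
a search unit, configured to search a candidate music library for pieces of matching music that match the at least one visual semantic tag;
a sorting unit, configured to sort the pieces of matching music according to user appreciation information, for each piece of matching music, of the user corresponding to the material; and
a recommendation unit, configured to filter the matching music based on the sorting result and according to a preset music filtering condition, and recommend the matching music that passes the filter as alternative music for the material.
An embodiment of this application further provides a music recommendation apparatus, including:
a sending unit, configured to send a material to which background music is to be added to a server device, to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching a candidate music library for pieces of matching music that match the at least one visual semantic tag; sorting the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material; and filtering the matching music based on the sorting result and according to a preset music filtering condition, and recommending the matching music that passes the filter as alternative music for the material; and
a receiving unit, configured to receive the alternative music returned by the server device;
where the user's estimated music appreciation information for each piece of matching music is obtained based on actual music appreciation information of different users for each piece of candidate music.
An embodiment of this application further provides a computing device, including at least one processing unit and at least one storage unit, where the storage unit stores a computer program, and when the program is executed by the processing unit, the processing unit is caused to perform the steps of any one of the foregoing music recommendation methods.
An embodiment of this application further provides a computer-readable medium storing a computer program executable by a computing device, where when the program runs on the computing device, the computing device is caused to perform the steps of any one of the foregoing music recommendation methods.
In the music recommendation method and apparatus, computing device, and medium provided in the embodiments of this application, visual semantic tags of a material to which background music is to be added are determined, matching music that matches the visual semantic tags is searched for, the pieces of matching music are sorted according to the user's appreciation information for each piece, and matching music is recommended to the user according to the sorting result. In this way, the visual semantic tags can be used to explain to the user why a piece of music is recommended, and different users receive differentiated recommendations, which provides a personalized music recommendation service. This also avoids the waste of processing resources of the computing device, and the occupation of bandwidth between the terminal device and the server, that would be caused by having to recommend again after an unsuitable recommendation, thereby saving both.
Other features and advantages of this application are set forth in the following specification, and in part become apparent from the specification or are understood by implementing this application. The objectives and other advantages of this application can be realized and obtained through the structures particularly pointed out in the written specification, the claims, and the accompanying drawings.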
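Brief Description of the Drawings
The accompanying drawings described here are used to provide a further understanding of this application and constitute a part of this application. The exemplary embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation on it. In the drawings:
FIG. 1 is a schematic structural diagram of a terminal device provided in an implementation of this application;
FIG. 2 is an implementation flowchart of a music recommendation method in an implementation of this application;
FIG. 3a is an example of a parsed image provided in an implementation of this application;
FIG. 3b is a schematic diagram of an Inception submodule of Inception V1 provided in an implementation of this application;
FIG. 3c is a first example of user music comments provided in an implementation of this application;
FIG. 3d is a second example of user music comments provided in an implementation of this application;
FIG. 3e is a schematic diagram of the model structure of FastText provided in an implementation of this application;
FIG. 3f is a first schematic diagram of a music recommendation application interface provided in an implementation of this application;
FIG. 3g is an example of recommending matching music for a material provided in an implementation of this application;
FIG. 3h is a second schematic diagram of a music recommendation application interface provided in an implementation of this application;
FIG. 3i is an information interaction diagram provided in an implementation of this application;
FIG. 4a is a first schematic structural diagram of a music recommendation apparatus in an implementation of this application;
FIG. 4b is a second schematic structural diagram of a music recommendation apparatus in an implementation of this application;
FIG. 5 is a schematic structural diagram of a terminal device in an implementation of this application.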
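Mode of Carrying Out the Invention
To provide personalized recommendations for different users when recommending music that matches a user's material, embodiments of this application provide a music recommendation method and apparatus, a computing device, and a medium.
First, some terms used in the embodiments of this application are explained to help a person skilled in the art understand them.
1. Terminal device: an electronic device on which various applications can be installed and which can display the entities provided in the installed applications. The electronic device may be mobile or fixed, for example, a mobile phone, a tablet computer, a vehicle-mounted device, a personal digital assistant (PDA), or another electronic device capable of implementing the foregoing functions.
2. Convolutional neural network algorithm: an efficient recognition method that has been developed in recent years and has attracted wide attention. In the 1960s, while studying neurons responsible for local sensitivity and direction selection in the cat cerebral cortex, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, which subsequently led to the proposal of convolutional neural networks (Convolutional Neural Networks, CNN). CNNs have now become a research hotspot in many scientific fields, especially in pattern classification, where they are widely used because they avoid complex image preprocessing and can take the original image directly as input.
3. Visual semantic tag vector: represents the probability distribution of a frame of image over the tags, and includes a score of the frame for each tag. In the embodiments of this application, a score may be the probability that the frame corresponds to a tag. A frame of image may be labeled with multiple tags.
4. Tag recognition model: a model used to recognize an input image and determine the tags of the image.
5. Music search model: a model used to search for music according to an input search term and obtain the music that matches the search term.
6. FastText: a word-vector computation and text classification tool open-sourced by Facebook in 2016. Its advantage is clear: in text classification tasks, FastText can achieve accuracy comparable to that of deep networks while training many orders of magnitude faster.
Because recommending matching music for a user's material through a fixed matching relationship between materials and music cannot provide differentiated services for different users, the embodiments of this application provide a technical solution for music recommendation: visual semantic tags of the material are determined, matching music that matches the visual semantic tags is searched for, and the matching music is sorted and recommended according to the user's appreciation information for the matching music. In this way, different users receive differentiated recommendations and a personalized service.
The music recommendation method provided in the embodiments of this application can be applied to a terminal device, which may be a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), or the like.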
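FIG. 1 is a schematic structural diagram of a terminal device 100. Referring to FIG. 1, the terminal device 100 includes a processor 110, a memory 120, a power supply 130, a display unit 140, and an input unit 150.
The processor 110 is the control center of the terminal device 100. It connects the components through various interfaces and lines, and performs the various functions of the terminal device 100 by running or executing the software programs and/or data stored in the memory 120, thereby monitoring the terminal device as a whole.
In the embodiments of this application, the processor 110 may include one or more processing units. The processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and applications, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 110. In some embodiments, the processor and the memory may be implemented on a single chip; in other embodiments, they may be implemented on separate chips.
The memory 120 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, various applications, and the like, and the data storage area may store data created according to the use of the terminal device 100. In addition, the memory 120 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The terminal device 100 further includes a power supply 130 (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 110 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
The display unit 140 may be used to display information input by the user or provided to the user, as well as the various menus of the terminal device 100. In the embodiments of this application, it is mainly used to display the display interfaces of the applications in the terminal device 100 and the entities, such as text and pictures, shown in those interfaces. The display unit 140 may include a display panel 141, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The input unit 150 may be used to receive information such as digits or characters input by the user. The input unit 150 may include a touch panel 151 and other input devices 152. The touch panel 151, also called a touchscreen, can collect touch operations performed by the user on or near it (for example, operations performed by the user on or near the touch panel 151 with a finger, a stylus, or any other suitable object or accessory).
Specifically, the touch panel 151 can detect the user's touch operation and the signal it produces, convert the signal into contact coordinates, send the coordinates to the processor 110, and receive and execute commands sent by the processor 110. In addition, the touch panel 151 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. The other input devices 152 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, and a joystick.
The touch panel 151 may cover the display panel 141. After detecting a touch operation on or near it, the touch panel 151 transmits the operation to the processor 110 to determine the type of the touch event, and the processor 110 then provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although in FIG. 1 the touch panel 151 and the display panel 141 are two separate components implementing the input and output functions of the terminal device 100, in some embodiments the touch panel 151 and the display panel 141 may be integrated to implement the input and output functions of the terminal device 100.
The terminal device 100 may further include one or more sensors, such as a pressure sensor, a gravity acceleration sensor, and a proximity light sensor. Depending on the needs of a specific application, the terminal device 100 may also include other components such as a camera. Because these components are not the focus of the embodiments of this application, they are not shown in FIG. 1 and are not described in detail.
A person skilled in the art can understand that FIG. 1 is only an example of a terminal device and does not constitute a limitation on it. In other embodiments, the terminal device may include more or fewer components than shown, combine some components, or use different components.
In the embodiments of this application, the music recommendation method can also be applied to a server device. Both the server device and the terminal device may use the structure shown in FIG. 1, and they are collectively referred to as computing devices. The music recommendation method provided in the embodiments of this application can be used to recommend matching music for various materials, such as an image set or a video. An image set may contain one or more images, and the images or video may be shot by the user or obtained in other ways.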
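FIG. 2 is an implementation flowchart of a music recommendation method provided in an embodiment of this application. The method is performed by a server device, and its specific implementation includes steps 200 to 205, as follows:
Step 200: The server device obtains a material to which background music is to be added.
In this embodiment, when step 200 is performed, the material may be a video or an image set, and the image set contains at least one frame of image.
The server device may obtain the material in any of the following ways: the server device receives the material sent by a terminal device; the server device directly obtains the material input by the user; or the server device itself designates the material. The user may be a user of an instant messaging service (such as WeChat). The user may input various materials through the user's own terminal device, for example, a short material shot for WeChat Moments to which background music is to be added, and the terminal device then sends the short material to the server device through a communication network. As another example, the user directly uploads the material on an application interface provided by the server device. As a further example, the server device may actively search for materials that users have uploaded to a public platform, add background music to them, and then send the resulting materials to the users.
Step 201: The server device determines the visual semantic tags of the material.
Specifically, step 201 may be performed in either of the following ways:
The first way: at least one visual semantic tag designated by the user from alternative visual semantic tags is determined as the at least one visual semantic tag of the material. For example, some alternative visual semantic tags may be offered to the user; the user designates at least one desired tag and submits it, and the designated tag is determined as the at least one visual semantic tag of the material.
The second way: the content of the material is parsed to determine at least one visual semantic tag of the material. For example, the content of a video or an image set is parsed, and at least one visual semantic tag of the material is determined according to the parsing result.
If the material is an image set, a pre-trained tag recognition model is used to perform visual semantic tag recognition on the material to obtain a visual semantic tag vector of the material, and the visual semantic tags whose scores in the vector satisfy a preset filtering condition are determined as the visual semantic tags of the material.
Here, the image set contains at least one frame of image; the visual semantic tag vector of the material includes at least one visual semantic tag of the content recognized from the material and its corresponding score; the tag recognition model is obtained by training on multiple tag recognition samples; and each tag recognition sample includes a sample image and the visual semantic tag vector of that sample image.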
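If the material is a video, the following steps are performed:
First, the server device parses frames from the material at a preset interval to obtain the frame images.
Then, the server device uses the pre-trained tag recognition model to perform visual semantic tag recognition on each frame image, obtaining a visual semantic tag vector for each frame.
Finally, the server device determines the average of the visual semantic tag vectors of the frames, and determines the visual semantic tags whose scores in the average vector satisfy the preset filtering condition as the visual semantic tags of the material.
Here, the visual semantic tag vector of a frame includes at least one visual semantic tag of the content recognized from that frame and its corresponding score; the tag recognition model is obtained by training on multiple tag recognition samples; and each tag recognition sample includes a sample image and the visual semantic tag vector of that sample image.
In this embodiment, the preset interval may be 1 s, that is, one frame is parsed per second. The filtering condition may be to select a specified number of visual semantic tags with the highest scores. The specified number may be one or more.
For example, suppose the visual semantic tag set includes sky, mountain, sea, plant, animal, person, snow, light, and car, and the specified number is 1. When the average vector is {0.7, 0.03, 0.1, 0.02, 0, 0, 0, 0.05, 0}, the server device determines that the visual semantic tag of the material is sky, which has the highest score.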
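The averaging and top-scoring selection described above can be illustrated with a minimal Python sketch. The function names `average_vector` and `top_tags` and the English label names are assumptions introduced here for illustration; the per-frame vectors would come from the tag recognition model, which is not modeled.

```python
from typing import List, Sequence

# English renderings of the example tag set used in the text (illustrative).
LABELS = ["sky", "mountain", "sea", "plant", "animal", "person", "snow", "light", "car"]


def average_vector(frame_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of the per-frame visual semantic tag vectors."""
    if not frame_vectors:
        raise ValueError("at least one frame vector is required")
    n = len(frame_vectors)
    return [sum(column) / n for column in zip(*frame_vectors)]


def top_tags(scores: Sequence[float], labels: Sequence[str], k: int = 1) -> List[str]:
    """Return the k labels with the highest scores (ties keep label order)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [labels[i] for i in ranked[:k]]


if __name__ == "__main__":
    avg = [0.7, 0.03, 0.1, 0.02, 0, 0, 0, 0.05, 0]
    print(top_tags(avg, LABELS, k=1))
```

With the average vector from the example, selecting one tag yields the sky tag, in line with the text.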
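The tag recognition model is a model used to recognize an input image and determine the tags of the image. It may be a model obtained by training on a large number of sample images and their visual semantic tag vectors, or a model built from the association between image features and visual semantic tags. The specific way of obtaining the tag recognition model is not limited here.
In the embodiments of this application, the description uses as an example a tag recognition model obtained by training on sample images and visual semantic tag vectors with a convolutional neural network algorithm.
Before step 201 is performed, the server device uses a convolutional neural network algorithm in advance to train on a large number of sample images in an image database and the visual semantic tag vectors of those sample images, thereby obtaining the tag recognition model. The image database usually contains image data on the order of tens of millions of images.
The visual semantic tag vector represents the probability distribution of a frame of image over the tags, and includes a score of the frame for each tag. In the embodiments of this application, a score may be the probability that the frame corresponds to a tag. A frame of image may be labeled with multiple tags.
For example, FIG. 3a shows an example of a parsed image. Suppose the visual semantic tag set includes sky, mountain, sea, plant, animal, person, snow, light, and car. The server device then determines that the visual semantic tag vector of the parsed image shown in FIG. 3a is {0.7, 0.03, 0.1, 0.02, 0, 0, 0, 0.05, 0}.
In the embodiments of this application, when training on the large number of sample images in the image database and their visual semantic tag vectors, the Inception V1 or Inception V3 convolutional neural network model may be used, and the cross-entropy loss (Cross Entropy Loss) may be used as the loss function to measure the similarity between the recognized visual semantic tag vector and the sample visual semantic tag vector. The model parameters can then be adjusted continuously during training according to the measured similarity.
For example, FIG. 3b is a schematic diagram of an Inception submodule of Inception V1. The previous layer (Previous layer) supplies the output values of the preceding layer. 1x1, 3x3, and 5x5 are all convolution kernels (Convolutions). The Inception submodule applies convolution with each kernel and pooling (3x3 max pooling) to the output values of the previous layer, processes the results with filter concatenation (Filter Concatenation), and outputs them to the next layer.
In this way, a convolutional neural network algorithm can be used in advance to train on a large number of sample images in the image database and their visual semantic tag vectors to obtain the tag recognition model. If the material is a video, the pre-trained tag recognition model is used to perform visual semantic tag recognition on each frame image to obtain the visual semantic tag vector of each frame, and the visual semantic tags of the material are determined according to the probability distribution of the material over the visual semantic tags. Different materials are thus given different visual semantic tags, which can be used to explain to the user why a piece of music is recommended. If the object to be matched is an image set, the tag recognition model is used directly to determine the visual semantic tag vector of the image, and the visual semantic tags of the image are determined from that vector.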
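Step 202: The server device searches the candidate music library for pieces of matching music that match the at least one visual semantic tag.
Specifically, based on the at least one visual semantic tag, the server device uses a pre-trained music search model to search the candidate music library for the pieces of matching music that match the at least one visual semantic tag.
For example, if the visual semantic tag is "missing my old mother", the server device uses the music search model to search the candidate music library and finds that the music matching "missing my old mother" is 《母亲》 ("Mother") by 阎维文.
The music search model is a model used to search for music according to an input search term and obtain the music that matches the search term. It may be obtained through a text classification algorithm, through the association between text and music, or in other ways. The specific way of obtaining the music search model is not limited here. In the embodiments of this application, the description uses as an example a music search model obtained by training on text and music with a preset text classification algorithm.
In the embodiments of this application, before step 202 is performed, the server device may obtain the music search model in advance by performing text training with a preset text classification algorithm on the music comment information of users for pieces of music. The text classification algorithm is used to classify text. This works because the massive volume of users' comments on songs can reflect the theme and mood of each song, and different songs may have completely different comment styles.
For example, FIG. 3c is a first example of user music comments, in which users can be seen commenting on music shared in Moments. FIG. 3d is a second example of user music comments. The three songs in FIG. 3d are 《鸿雁》 by 呼斯楞, 《母亲》 by 阎维文, and the military song 《军中绿花》. From the users' music comment information it is clear that the comments on 《鸿雁》 mostly concern homesickness, hometown, Inner Mongolia, and the northern frontier; those on 《母亲》 mostly concern the love of children and gratitude to parents; and those on 《军中绿花》 are mostly nostalgia for army life and military service.
In the embodiments of this application, the text classification algorithm may be FastText. FIG. 3e is a schematic diagram of the model structure of FastText. In FIG. 3e, the input layer (x1, x2, ..., xN) is used to input users' music comment information; the hidden layer is used to generate a hidden-layer vector from the input music comment information; and the output layer is used to classify based on the hidden-layer vector, that is, to classify by music.
The optimization objective is such that the larger the likelihood estimate of f, the higher the music classification accuracy of FastText. The optimization objective function of FastText is: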
Figure PCTCN2019098861-appb-000001
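Here, $x_n$ is a user's music comment information and $y_n$ is the music. The matrix parameter A is a word-based lookup table, that is, the word embedding vectors; mathematically, the matrix operation $Ax_n$ sums or averages the embedding vectors of the words to obtain the hidden-layer vector. The matrix parameter B is the parameter of the function f, which is a multi-class linear function.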
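The formula image referenced above is not reproduced in this text. For orientation only, the objective published for FastText, written with the symbols defined here and with N denoting the number of training comments, is the negative log-likelihood below. Whether the original figure uses exactly this form is an assumption.

```latex
\min_{A,\,B}\; -\frac{1}{N} \sum_{n=1}^{N} y_n \log\bigl( f(B A x_n) \bigr)
```

Minimizing this quantity is equivalent to maximizing the likelihood that f assigns to the correct music class for each comment.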
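In this way, the music search model can be obtained by performing text training with a preset text classification algorithm on users' music comment information, and the pre-trained music search model can then be used to search the candidate music library for the pieces of matching music that match the visual semantic tags.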
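The forward pass of a FastText-style classifier, as described for the input, hidden, and output layers, can be sketched in plain Python. This is a simplified illustration and not the fastText library: the function names, the toy embedding table, and the weight matrix `B` are hypothetical, and the output function f is assumed to be a softmax over music classes.

```python
import math
from typing import List, Mapping, Sequence


def hidden_vector(tokens: Sequence[str],
                  embeddings: Mapping[str, Sequence[float]],
                  dim: int) -> List[float]:
    """Average the embeddings of known tokens (the A x_n step)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(tokens: Sequence[str],
             embeddings: Mapping[str, Sequence[float]],
             B: Sequence[Sequence[float]],
             dim: int) -> List[float]:
    """Return one probability per music class: f(B A x_n)."""
    h = hidden_vector(tokens, embeddings, dim)
    logits = [sum(w * x for w, x in zip(row, h)) for row in B]
    return softmax(logits)
```

At search time, a visual semantic tag would be tokenized and passed to `classify`, and the music classes with the highest probabilities would be taken as matching music. In practice both the embedding table and `B` are learned from the comment data.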
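Step 203: The server device determines the user appreciation information, for each piece of matching music, of the user corresponding to the material.
Specifically, step 203 may be performed in any of the following ways:
The first way: for the music appreciation behavior data of the user who provided the material with respect to each piece of matching music, one parameter value of the behavior data, or a weighted average of several parameter values, is used as the user appreciation information.
The second way: based on the actual music appreciation information of the user's similar users for each piece of matching music, the server device predicts the user's estimated music appreciation information for each piece of matching music, and uses the estimated music appreciation information as the user appreciation information.
The third way: the server device obtains a predetermined estimated evaluation matrix, directly reads from it the user's estimated music appreciation information for each piece of matching music, and uses the estimated music appreciation information as the user appreciation information.
In practice, priorities may be assigned to these ways. The embodiments of this application do not limit their order of priority.
Specifically, the second way may be performed with the following steps:
First, the server device obtains the user attribute information of the users who have appreciated each piece of matching music, and selects the similar users whose user attribute information is similar to that of the user who input the material.
Then, the server device obtains the actual music appreciation information of each similar user for each piece of matching music.
Finally, for each piece of matching music, the server device averages the actual music appreciation information of the similar users to estimate the user's estimated music appreciation information for that piece.
In one embodiment of this application, the server device sorts the pieces of matching music according to the estimated music appreciation information, for each piece, of the user corresponding to the material, where this estimated information is obtained based on the actual music appreciation information of different users for each piece of candidate music. In another embodiment of this application, the server device sorts the pieces of matching music according to the parameter value of one kind of music appreciation behavior data of the user for the music, or according to a composite value obtained by weighting the parameter values of at least two kinds of such behavior data.
The user attribute information describes the characteristics of a user. In the embodiments of this application, it may include gender, age, education, occupation, and the like. A user's actual music appreciation information for a piece of music is obtained by weighting the parameter values contained in the user's music appreciation behavior data. The music appreciation behavior data contains any one or any combination of the following parameters: music rating, click-through rate, favoriting, liking, and sharing.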
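The weighting of behavior parameters into a single appreciation value can be sketched as follows. The parameter names, the weights, and the normalization of each parameter to the range 0 to 1 are assumptions for illustration; the source does not specify them.

```python
from typing import Mapping


def appreciation_score(behavior: Mapping[str, float],
                       weights: Mapping[str, float]) -> float:
    """Weighted average of the behavior parameters that have a weight.

    `behavior` maps a parameter name (for example "rating" or "click_rate")
    to a value assumed to be normalized to [0, 1]; `weights` maps the same
    names to non-negative weights. Both are hypothetical conventions.
    """
    used = [name for name in behavior if name in weights]
    total_weight = sum(weights[name] for name in used)
    if total_weight == 0:
        raise ValueError("no weighted behavior parameter is available")
    return sum(weights[name] * behavior[name] for name in used) / total_weight
```

For instance, a rating of 0.8 with weight 3 and a click-through rate of 0.4 with weight 1 combine to a score of about 0.7. Using a single parameter corresponds to giving only that parameter a weight.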
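In this way, the user's estimated music appreciation information for each piece of matching music can be predicted from the actual music appreciation information of the user's similar users, so that matching music can be recommended to the user based on what similar users actually appreciate.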
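The second way can be sketched as below. The similarity rule used here (sharing at least `min_shared` attribute values) is an assumption, since the source only states that the attribute information must be similar; the function names and data layout are likewise illustrative.

```python
from typing import Dict, List, Mapping, Optional

Profile = Mapping[str, str]


def similar_users(target: str,
                  profiles: Mapping[str, Profile],
                  min_shared: int = 2) -> List[str]:
    """Users sharing at least `min_shared` attribute values with `target`."""
    reference = profiles[target]
    result = []
    for user, attrs in profiles.items():
        if user == target:
            continue
        shared = sum(1 for key, value in reference.items() if attrs.get(key) == value)
        if shared >= min_shared:
            result.append(user)
    return result


def estimate_appreciation(target: str,
                          song: str,
                          profiles: Mapping[str, Profile],
                          actual: Mapping[str, Mapping[str, float]],
                          min_shared: int = 2) -> Optional[float]:
    """Mean actual appreciation of `song` over similar users who appreciated it."""
    values = [actual[user][song]
              for user in similar_users(target, profiles, min_shared)
              if song in actual.get(user, {})]
    if not values:
        return None  # no similar user has appreciated this song
    return sum(values) / len(values)
```

Returning `None` when no similar user has data leaves the fallback (for example, one of the other two ways) to the caller.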
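When the third way is used, before step 203 is performed, the server device determines the estimated evaluation matrix in advance based on the actual music appreciation information of users for the candidate music in the candidate music library.
Specifically, the estimated evaluation matrix may be determined with the following steps:
First, the server device forms a rating matrix from the actual music appreciation information of users for the candidate music, where the element $m_{ij}$ of the rating matrix is the value corresponding to user i's appreciation of music j.
Then, the server device performs matrix factorization on the rating matrix with a preset matrix factorization algorithm to obtain a user matrix and a music feature matrix.
Finally, the product of the transpose of each music feature vector in the music feature matrix and each user vector in the user matrix is determined as the estimated music appreciation information of each user for each piece of music.
In the embodiments of this application, the matrix factorization algorithm may be the FunkSVD algorithm, whose principle is as follows:
When the rating matrix is factorized, it is expected to decompose according to the formula $M_{m \times n} = P^{T}_{m \times k} Q_{k \times n}$, where M is the rating matrix, P is the user matrix, Q is the music feature matrix, m is the total number of users, n is the total number of pieces of music, and k is a parameter (the number of latent factors). Based on the factorized P and Q, the estimated music rating of user i for music j can be expressed as $q_j^{T} p_i$, where $p_i$ is a user vector and $q_j$ is a music feature vector.
To make the residual between the user's actual music rating $m_{ij}$ and the computed estimated rating $q_j^{T} p_i$ as small as possible, the mean squared error is used as the loss function to determine the final P and Q.
That is, as long as the loss function $\sum_{i,j} (m_{ij} - q_j^{T} p_i)^2$ can be minimized and the $p_i$ and $q_j$ corresponding to the extremum found, the matrices P and Q are obtained. Then, for any blank rating position in the matrix M, the predicted music rating can be computed as $q_j^{T} p_i$.
In practice, a regularization term is added to prevent overfitting, so the optimization objective function J(p, q) is: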
Figure PCTCN2019098861-appb-000002
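Here, p is the user vector, q is the music feature vector, λ is the regularization coefficient, i is the user index, and j is the music index.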
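The formula image for J(p, q) is not reproduced in this text. A form consistent with the squared-error loss and the symbols defined above, and matching the usual statement of regularized FunkSVD, is given below as an assumed reconstruction; the sum runs over the observed ratings.

```latex
J(p, q) = \min_{p_i,\, q_j} \sum_{i,j} \Bigl[ \bigl( m_{ij} - q_j^{T} p_i \bigr)^2
          + \lambda \bigl( \lVert p_i \rVert_2^2 + \lVert q_j \rVert_2^2 \bigr) \Bigr]
```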
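The regularization coefficient λ is a hyperparameter that needs to be tuned. The objective is optimized by gradient descent, with the following steps:
First, differentiating the above expression with respect to $p_i$ and $q_j$ gives: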
Figure PCTCN2019098861-appb-000003
Figure PCTCN2019098861-appb-000004
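The two derivative images are not reproduced in this text. Assuming the per-rating term $(m_{ij} - q_j^{T} p_i)^2 + \lambda(\lVert p_i \rVert_2^2 + \lVert q_j \rVert_2^2)$ of a regularized squared-error objective, the partial derivatives would be:

```latex
\frac{\partial J}{\partial p_i} = -2 \bigl( m_{ij} - q_j^{T} p_i \bigr)\, q_j + 2 \lambda\, p_i
\qquad
\frac{\partial J}{\partial q_j} = -2 \bigl( m_{ij} - q_j^{T} p_i \bigr)\, p_i + 2 \lambda\, q_j
```

Stepping against these gradients and absorbing the constant factor 2 into the step size α gives update rules of the form stated in the iteration formulas that follow.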
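Then, in the gradient descent iteration, the update formulas, with step size α, are:
$p_i = p_i + \alpha \bigl( (m_{ij} - q_j^{T} p_i)\, q_j - \lambda p_i \bigr)$;
$q_j = q_j + \alpha \bigl( (m_{ij} - q_j^{T} p_i)\, p_i - \lambda q_j \bigr)$;
Through iteration, the optimized user matrix P and music feature matrix Q are finally obtained, and the estimated evaluation matrix of each user for each piece of music is determined from the product of each $q_j^{T}$ in Q and each $p_i$ in P.
In this way, from the rating matrix built from users' actual music appreciation information for the candidate music, matrix factorization yields the user matrix and the music feature matrix, from which the estimated evaluation matrix of each user for each piece of music is obtained. The estimated evaluation matrix is then taken as the users' estimated music appreciation information for the candidate music.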
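A compact, dependency-free sketch of this procedure is given below. It applies the two update formulas above with stochastic gradient descent over the observed ratings only. The hyperparameter values, the random initialization, and the function names are illustrative assumptions rather than values from the source.

```python
import random
from typing import Dict, List, Tuple


def funk_svd(ratings: Dict[Tuple[int, int], float],
             m: int, n: int, k: int = 2,
             alpha: float = 0.05, lam: float = 0.02,
             epochs: int = 500, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Factorize observed ratings m_ij into user vectors p_i and music vectors q_j."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]  # p_i per user
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]  # q_j per music
    for _ in range(epochs):
        for (i, j), m_ij in ratings.items():
            err = m_ij - sum(pf * qf for pf, qf in zip(P[i], Q[j]))
            for f in range(k):
                p_f, q_f = P[i][f], Q[j][f]  # use the old values for both updates
                P[i][f] += alpha * (err * q_f - lam * p_f)
                Q[j][f] += alpha * (err * p_f - lam * q_f)
    return P, Q


def predict(P: List[List[float]], Q: List[List[float]], i: int, j: int) -> float:
    """Estimated rating q_j^T p_i of user i for music j."""
    return sum(pf * qf for pf, qf in zip(P[i], Q[j]))


if __name__ == "__main__":
    # Observed ratings keyed by (user, music); the pair (1, 1) is left blank.
    observed = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 4.0, (2, 0): 1.0, (2, 1): 5.0}
    P, Q = funk_svd(observed, m=3, n=2)
    print(round(predict(P, Q, 1, 1), 2))  # estimate for the blank position
```

Evaluating `predict` for every user and music pair yields the estimated evaluation matrix. A production system would more likely use a vectorized or library implementation.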
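Step 204: The server device sorts the pieces of matching music according to the user appreciation information, for each piece of matching music, of the user corresponding to the material.
Step 205: Based on the sorting result, the server device filters the matching music according to a preset music filtering condition, and recommends the matching music that passes the filter as alternative music for the material.
Specifically, the server device selects, in sorted order, the matching music that satisfies the preset music filtering condition, and either displays the selected alternative music directly to the user in that order or sends the information about the alternative music to the terminal device.
In the embodiments of this application, the music filtering condition may be to select the matching music whose value in the user appreciation information is higher than a set value; or, from the result sorted from high to low, to select the matching music whose sequence number is higher than a set value, or a set number of pieces of matching music counted from the end.
In this way, the user can choose a preferred piece from the alternative music as the background music for the material.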
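Steps 204 and 205 can be illustrated with a short sketch that sorts the matching music by the user appreciation value and then applies two of the filtering conditions mentioned above: a score threshold and a limit on the number of results. The function name and parameters are assumptions for illustration.

```python
from typing import Iterable, List, Mapping, Optional


def recommend(matches: Iterable[str],
              scores: Mapping[str, float],
              top_n: int = 5,
              min_score: Optional[float] = None) -> List[str]:
    """Sort matching music by score (high to low), then filter.

    Duplicates (a song matching several tags) are removed first; songs
    without a score are treated as having score 0.0.
    """
    unique = list(dict.fromkeys(matches))
    ranked = sorted(unique, key=lambda song: scores.get(song, 0.0), reverse=True)
    if min_score is not None:
        ranked = [song for song in ranked if scores.get(song, 0.0) > min_score]
    return ranked[:top_n]
```

In the example that follows in the text, the ten songs found for the two tags would be passed as `matches` with `top_n=5`.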
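For example, FIG. 3f is a first schematic diagram of a music recommendation application interface, in which the terminal device asks the user whether to add background music to a short video. FIG. 3g is an example of recommending matching music for a material. In FIG. 3g, when the terminal device determines that the user wants background music for the short video, it sends the short video to the server device, and the server device parses the short video and determines that its visual semantic tags are snow and sports. The server device then searches the massive song library (the candidate music library) and finds 5 songs that match snow and 5 songs that match sports. Next, the server device sorts the songs according to the user's estimated music appreciation information for these 10 songs. FIG. 3h is a second schematic diagram of a music recommendation application interface, in which the top 5 songs are recommended to the user in sorted order.
Further, the terminal device receives the information about the alternative music returned by the server device and displays it to the user. When it determines that indication information has been received in which the user designates background music from the alternative music, it obtains and outputs, according to the indication information, the material synthesized with the background music.
The material synthesized with the background music may be obtained according to the indication information in either of the following two ways:
The first way: the indication information is sent to the server device, and the material synthesized with the background music, returned by the server device, is received.
The second way: the indication information is sent to the server device, the background music returned by the server device according to the indication information is received, and the background music is synthesized into the material. For example, the server device receives the indication information, sent by the terminal device, that designates background music from the alternative music, synthesizes the background music into the material according to the indication information, and sends the material synthesized with the music to the terminal device.
In the embodiments of this application, several semantic tags of the material are determined; a music search model obtained from users' music comment information is used to search for several pieces of matching music that match those tags; the pieces of matching music are sorted based on the user's appreciation information; and music is recommended to the user according to the sorting result. In this way, a personalized service can be provided according to different users' preferences for different music, that is, different users receive differentiated recommendations, and the recommended music both matches the material and suits the user's taste.
An embodiment of this application further provides a music recommendation method performed by a terminal device, including:
The terminal device sends a material to which background music is to be added to a server device, to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching a candidate music library for pieces of matching music that match the at least one visual semantic tag; sorting the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material; and filtering the matching music based on the sorting result and according to a preset music filtering condition, and recommending the matching music that passes the filter as alternative music for the material. The terminal device then receives the alternative music returned by the server device. The user's estimated music appreciation information for each piece of matching music is obtained based on the actual music appreciation information of different users for each piece of candidate music.
FIG. 3i is an interaction sequence diagram for adding background music. The specific implementation flow of the method is as follows:
Step 301: The terminal device sends, to the server device, indication information for adding background music to a material.
Step 302: The terminal device receives the alternative music recommended based on the material and returned by the server device.
Step 303: The terminal device sends, to the server device, indication information for adding background music using designated music from the alternative music.
Step 304: The terminal device receives the material synthesized with the music, returned by the server device.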
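Based on the same inventive concept, an embodiment of this application further provides a music recommendation apparatus. Because the principle by which the apparatus and device solve the problem is similar to that of the foregoing music recommendation method, the implementation of the apparatus may refer to the implementation of the method, and repeated content is not described again.
FIG. 4a is a first schematic structural diagram of a music recommendation apparatus provided in an embodiment of this application, including:
an obtaining unit 400, configured to obtain a material to which background music is to be added;
a first determining unit 401, configured to determine at least one visual semantic tag of the material, each visual semantic tag describing at least one item of content of the material;
a search unit 402, configured to search a candidate music library for pieces of matching music that match the at least one visual semantic tag;
a sorting unit 403, configured to sort the pieces of matching music according to user appreciation information, for each piece of matching music, of the user corresponding to the material; and
a recommendation unit 404, configured to filter the matching music based on the sorting result and according to a preset music filtering condition, and recommend the matching music that passes the filter as alternative music for the material.
In the embodiments of this application, the recommendation unit 404 is further configured to:
receive indication information, sent by a terminal device, that designates background music from the alternative music;
synthesize the background music into the material according to the indication information; and
send the material synthesized with the music to the terminal device.
In the embodiments of this application, the first determining unit 401 further includes:
a second determining unit, configured to determine at least one visual semantic tag designated by the user from alternative visual semantic tags as the at least one visual semantic tag of the material; or
a parsing unit, configured to parse the content of the material to determine at least one visual semantic tag of the material.
In the embodiments of this application, the parsing unit is specifically configured to:
when the material is an image set, use a pre-trained tag recognition model to perform visual semantic tag recognition on the material to obtain a visual semantic tag vector of the material, and determine the visual semantic tags whose scores in the vector satisfy a preset filtering condition as the visual semantic tags of the material;
where the image set contains at least one frame of image; the visual semantic tag vector of the material includes at least one visual semantic tag of the content recognized from the material and its corresponding score; the tag recognition model is obtained by training on multiple tag recognition samples; and each tag recognition sample includes a sample image and the visual semantic tag vector of that sample image.
In the embodiments of this application, the parsing unit is specifically configured to:
when the material is a video, parse frames from the material to obtain the frame images;
use a pre-trained tag recognition model to perform visual semantic tag recognition on each frame image to obtain the visual semantic tag vector of each frame; and
determine the average of the visual semantic tag vectors of the frames, and determine the visual semantic tags whose scores in the average vector satisfy a preset filtering condition as the visual semantic tags of the material;
where the visual semantic tag vector of a frame includes at least one visual semantic tag of the content recognized from that frame and its corresponding score; the tag recognition model is obtained by training on multiple tag recognition samples; and each tag recognition sample includes a sample image and the visual semantic tag vector of that sample image.
In the embodiments of this application, the search unit 402 is specifically configured to:
obtain, based on the at least one visual semantic tag and using a pre-trained music search model, the pieces of matching music that match the at least one visual semantic tag;
where the music search model is obtained by performing text classification training on users' music comment information for pieces of music.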
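In the embodiments of this application, the sorting unit 403 is specifically configured to:
sort the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material, where the estimated music appreciation information is obtained based on actual music appreciation information of different users for each piece of candidate music;
where a user's actual music appreciation information for a piece of music is obtained by weighting the parameter values contained in the user's music appreciation behavior data, and the music appreciation behavior data contains any one or any combination of the following parameters: music rating, click-through rate, favoriting, liking, and sharing.
In the embodiments of this application, the sorting unit 403 is specifically configured to:
for the matching music, obtain the user attribute information of the users who have appreciated the matching music, and select the similar users whose user attribute information is similar to that of the user who input the material;
obtain the actual music appreciation information of the similar users for each piece of matching music; and
for each piece of matching music, average the actual music appreciation information of the similar users to estimate the user's estimated music appreciation information for that piece.
In the embodiments of this application, the sorting unit 403 is specifically configured to:
obtain a rating matrix based on the actual music appreciation information of users for the candidate music;
perform matrix factorization and optimization on the rating matrix to obtain a user matrix and a music feature matrix; and
determine the product of the transpose of each music feature vector in the music feature matrix and each user vector in the user matrix as the estimated music appreciation information of each user for each piece of music.
The sorting unit 403 is specifically configured to:
sort the pieces of matching music according to the parameter value of one kind of music appreciation behavior data of the user corresponding to the material for the music, or according to a composite value obtained by weighting the parameter values of at least two kinds of such behavior data;
where a user's music appreciation behavior data for a piece of music contains any one or any combination of the following parameters: music rating, click-through rate, favoriting, liking, and sharing.
FIG. 4b is a second schematic structural diagram of a music recommendation apparatus provided in an embodiment of this application, including:
a sending unit 410, configured to send a material to which background music is to be added to a server device, to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching a candidate music library for pieces of matching music that match the at least one visual semantic tag; sorting the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material; and filtering the matching music based on the sorting result and according to a preset music filtering condition, and recommending the matching music that passes the filter as alternative music for the material; and
a receiving unit 411, configured to receive the alternative music returned by the server device;
where the user's estimated music appreciation information for each piece of matching music is obtained based on actual music appreciation information of different users for each piece of candidate music.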
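Based on the same technical concept, an embodiment of this application further provides a computing device, including at least one processing unit and at least one storage unit, where the storage unit stores a computer program, and when the program is executed by the processing unit, the processing unit is caused to perform the steps of the method described in the foregoing embodiments.
In the embodiments of this application, the computing device may be a server device or a terminal device, and both may use the structure shown in FIG. 5. The structure of the computing device is described below using a terminal device as an example. An embodiment of this application provides a terminal device 500. Referring to FIG. 5, the terminal device 500 is configured to implement the methods described in the foregoing method embodiments, for example, the embodiment shown in FIG. 2. The terminal device 500 may include a memory 501, a processor 502, an input unit 503, and a display panel 504.
The memory 501 is configured to store the computer program executed by the processor 502. The memory 501 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function, and the like, and the data storage area may store data created according to the use of the terminal device 500. The processor 502 may be a central processing unit (CPU), a digital processing unit, or the like. The input unit 503 may be used to obtain user instructions input by the user. The display panel 504 is used to display information input by the user or provided to the user. In the embodiments of this application, the display panel 504 is mainly used to display the display interfaces of the applications in the terminal device and the control entities shown in those interfaces. The display panel 504 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The embodiments of this application do not limit the specific connection medium between the memory 501, the processor 502, the input unit 503, and the display panel 504. In FIG. 5, the memory 501, the processor 502, the input unit 503, and the display panel 504 are connected through a bus 505, which is drawn as a thick line. The way other components are connected is shown only schematically and is not limiting. The bus 505 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The memory 501 may be a volatile memory, such as a random-access memory (RAM). The memory 501 may also be a non-volatile memory, such as a read-only memory, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). Alternatively, the memory 501 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 501 may be a combination of the foregoing memories.
The processor 502 is configured to implement the embodiment shown in FIG. 2, including:
the processor 502 is configured to invoke the computer program stored in the memory 501 to perform the embodiment shown in FIG. 2.
An embodiment of this application further provides a computer-readable storage medium that stores the computer-executable instructions to be executed by the foregoing processor, including the program to be executed by the foregoing processor. For example, the storage medium stores a computer program executable by a computing device, and when the program runs on the computing device, the computing device is caused to perform the steps of the method described in the foregoing embodiments.
In some possible implementations, aspects of the music recommendation method provided in this application may also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to perform the steps of the music recommendation method according to the various exemplary implementations of this application described above in this specification. For example, the terminal device may perform the embodiment shown in FIG. 2.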
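The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The program product for music recommendation according to the implementations of this application may use a portable compact disc read-only memory (CD-ROM), include program code, and run on a computing device. However, the program product of this application is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium, and it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device.
The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
The program code for performing the operations of this application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server device. Where a remote computing device is involved, it may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several units or subunits of the apparatus are mentioned in the detailed description above, this division is only exemplary and not mandatory. In fact, according to the implementations of this application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided among multiple units.
In addition, although the operations of the method of this application are described in a particular order in the drawings, this does not require or imply that the operations must be performed in that order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be split into multiple steps.
A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although several embodiments of this application have been described, a person skilled in the art can make additional changes and modifications to these embodiments once the basic inventive concept is known. Therefore, the appended claims are intended to be interpreted as covering these embodiments and all changes and modifications that fall within the scope of this application.
Obviously, a person skilled in the art can make various changes and variations to this application without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include them.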

Claims (18)

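  1. A music recommendation method, performed by a server device, comprising:
    obtaining a material to which background music is to be added;
    determining at least one visual semantic tag of the material, each visual semantic tag describing at least one item of content of the material;
    searching a candidate music library for pieces of matching music that match the at least one visual semantic tag;
    sorting the pieces of matching music according to user appreciation information, for each piece of matching music, of the user corresponding to the material; and
    filtering the matching music based on the sorting result and according to a preset music filtering condition, and recommending the matching music that passes the filter as alternative music for the material.
  2. The method according to claim 1, further comprising:
    receiving indication information, sent by a terminal device, that designates background music from the alternative music;
    synthesizing the background music into the material according to the indication information; and
    sending the material synthesized with the music to the terminal device.
  3. The method according to claim 1, wherein the determining at least one visual semantic tag of the material comprises:
    determining at least one visual semantic tag designated by the user from alternative visual semantic tags as the at least one visual semantic tag of the material; or
    parsing the content of the material to determine the at least one visual semantic tag of the material.
  4. The method according to claim 3, wherein the parsing the content of the material to determine the at least one visual semantic tag of the material comprises:
    when the material is an image set, using a pre-trained tag recognition model to perform visual semantic tag recognition on the material to obtain a visual semantic tag vector of the material, and determining the visual semantic tags whose scores in the visual semantic tag vector satisfy a preset filtering condition as the visual semantic tags of the material;
    wherein the image set contains at least one frame of image; the visual semantic tag vector of the material comprises at least one visual semantic tag of the content recognized from the material and its corresponding score; the tag recognition model is obtained by training on multiple tag recognition samples; and each tag recognition sample comprises a sample image and the visual semantic tag vector of that sample image.
  5. The method according to claim 3, wherein the parsing the content of the material to determine the at least one visual semantic tag of the material comprises:
    when the material is a video, parsing frames from the material to obtain the frame images;
    using a pre-trained tag recognition model to perform visual semantic tag recognition on each frame image to obtain the visual semantic tag vector of each frame; and
    determining the visual semantic tags whose scores in the average of the visual semantic tag vectors of the frames satisfy a preset filtering condition as the visual semantic tags of the material;
    wherein the visual semantic tag vector of a frame comprises at least one visual semantic tag of the content recognized from that frame and its corresponding score; the tag recognition model is obtained by training on multiple tag recognition samples; and each tag recognition sample comprises a sample image and the visual semantic tag vector of that sample image.
  6. The method according to claim 1, wherein the searching for pieces of matching music that match the at least one visual semantic tag comprises:
    obtaining, based on the at least one visual semantic tag and using a pre-trained music search model, the pieces of matching music that match the at least one visual semantic tag;
    wherein the music search model is obtained by performing text classification training on users' music comment information for pieces of music.
  7. The method according to any one of claims 1 to 6, wherein the sorting the pieces of matching music according to user appreciation information, for each piece of matching music, of the user corresponding to the material comprises:
    sorting the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material, the estimated music appreciation information being obtained based on actual music appreciation information of different users for each piece of candidate music; wherein a user's actual music appreciation information for a piece of music is obtained by weighting the parameter values contained in the user's music appreciation behavior data; and the music appreciation behavior data contains any one or any combination of the following parameters: music rating, click-through rate, favoriting, liking, and sharing.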
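  8. The method according to claim 7, further comprising, before the sorting the pieces of matching music according to the estimated music appreciation information, for each piece of matching music, of the user corresponding to the material:
    for the matching music, obtaining the user attribute information of the users who have appreciated the matching music, and selecting the similar users whose user attribute information is similar to that of the user;
    obtaining the actual music appreciation information of the similar users for each piece of matching music; and
    for each piece of matching music, averaging the actual music appreciation information of the similar users to estimate the user's estimated music appreciation information for that piece.
  9. The method according to claim 7, further comprising, before the sorting the pieces of matching music according to the estimated music appreciation information, for each piece of matching music, of the user corresponding to the material:
    obtaining a rating matrix based on the actual music appreciation information of users for the candidate music;
    performing matrix factorization and optimization on the rating matrix to obtain a user matrix and a music feature matrix; and
    determining the product of the transpose of each music feature vector in the music feature matrix and each user vector in the user matrix as the estimated music appreciation information of each user for each piece of music.
  10. The method according to any one of claims 1 to 6, wherein the sorting the pieces of matching music according to user appreciation information, for each piece of matching music, of the user corresponding to the material comprises:
    sorting the pieces of matching music according to the parameter value of one kind of music appreciation behavior data of the user corresponding to the material for the music, or according to a composite value obtained by weighting the parameter values of at least two kinds of such behavior data;
    wherein a user's music appreciation behavior data for a piece of music contains any one or any combination of the following parameters: music rating, click-through rate, favoriting, liking, and sharing.
  11. A music recommendation method, performed by a terminal device, comprising:
    sending a material to which background music is to be added to a server device, to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching a candidate music library for pieces of matching music that match the at least one visual semantic tag; sorting the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material; and filtering the matching music based on the sorting result and according to a preset music filtering condition, and recommending the matching music that passes the filter as alternative music for the material; and
    receiving the alternative music returned by the server device;
    wherein the user's estimated music appreciation information for each piece of matching music is obtained based on actual music appreciation information of different users for each piece of candidate music.
  12. A music recommendation apparatus, comprising:
    an obtaining unit, configured to obtain a material to which background music is to be added;
    a first determining unit, configured to determine at least one visual semantic tag of the material, each visual semantic tag describing at least one item of content of the material;
    a search unit, configured to search a candidate music library for pieces of matching music that match the at least one visual semantic tag;
    a sorting unit, configured to sort the pieces of matching music according to user appreciation information, for each piece of matching music, of the user corresponding to the material; and
    a recommendation unit, configured to filter the matching music based on the sorting result and according to a preset music filtering condition, and recommend the matching music that passes the filter as alternative music for the material.
  13. The apparatus according to claim 12, wherein the first determining unit further comprises:
    a second determining unit, configured to determine at least one visual semantic tag designated by the user from alternative visual semantic tags as the at least one visual semantic tag of the material; or
    a parsing unit, configured to parse the content of the material to determine the at least one visual semantic tag of the material.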
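  14. The apparatus according to claim 12 or 13, wherein the sorting unit is specifically configured to:
    sort the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material, the estimated music appreciation information being obtained based on actual music appreciation information of different users for each piece of candidate music;
    wherein a user's actual music appreciation information for a piece of music is obtained by weighting the parameter values contained in the user's music appreciation behavior data; and the music appreciation behavior data contains any one or any combination of the following parameters: music rating, click-through rate, favoriting, liking, and sharing.
  15. The apparatus according to claim 14, wherein the sorting unit is specifically configured to:
    for the matching music, obtain the user attribute information of the users who have appreciated the matching music, and select the similar users whose user attribute information is similar to that of the user; obtain the actual music appreciation information of the similar users for each piece of matching music; and, for each piece of matching music, average the actual music appreciation information of the similar users to estimate the user's estimated music appreciation information for that piece;
    obtain a rating matrix based on the actual music appreciation information of users for the candidate music; perform matrix factorization and optimization on the rating matrix to obtain a user matrix and a music feature matrix; and determine the product of the transpose of each music feature vector in the music feature matrix and each user vector in the user matrix as the estimated music appreciation information of each user for each piece of music; or
    sort the pieces of matching music according to the parameter value of one kind of music appreciation behavior data of the user corresponding to the material for the music, or according to a composite value obtained by weighting the parameter values of at least two kinds of such behavior data; wherein a user's music appreciation behavior data for a piece of music contains any one or any combination of the following parameters: music rating, click-through rate, favoriting, liking, and sharing.
  16. A music recommendation apparatus, comprising:
    a sending unit, configured to send a material to which background music is to be added to a server device, to trigger the server device to perform the following steps: determining at least one visual semantic tag of the material; searching a candidate music library for pieces of matching music that match the at least one visual semantic tag; sorting the pieces of matching music according to estimated music appreciation information, for each piece of matching music, of the user corresponding to the material; and filtering the matching music based on the sorting result and according to a preset music filtering condition, and recommending the matching music that passes the filter as alternative music for the material; and
    a receiving unit, configured to receive the alternative music returned by the server device;
    wherein the user's estimated music appreciation information for each piece of matching music is obtained based on actual music appreciation information of different users for each piece of candidate music.
  17. A computing device, comprising at least one processing unit and at least one storage unit, wherein the storage unit stores a computer program, and when the program is executed by the processing unit, the processing unit is caused to perform the steps of the method according to any one of claims 1 to 10 or claim 11.
  18. A computer-readable medium, storing a computer program executable by a computing device, wherein when the program runs on the computing device, the computing device is caused to perform the steps of the method according to any one of claims 1 to 10 or claim 11.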
PCT/CN2019/098861 2018-08-14 2019-08-01 音乐推荐的方法、装置、计算设备和介质 WO2020034849A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2020549554A JP7206288B2 (ja) 2018-08-14 2019-08-01 音楽推薦方法、装置、コンピューティング機器及び媒体
EP19849335.5A EP3757995A4 (en) 2018-08-14 2019-08-01 METHOD AND DEVICE FOR RECOMMENDING MUSIC AND COMPUTER DEVICE AND MEDIUM
US17/026,477 US11314806B2 (en) 2018-08-14 2020-09-21 Method for making music recommendations and related computing device, and medium thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810924409.0 2018-08-14
CN201810924409.0A CN109063163B (zh) 2018-08-14 2018-08-14 一种音乐推荐的方法、装置、终端设备和介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/026,477 Continuation US11314806B2 (en) 2018-08-14 2020-09-21 Method for making music recommendations and related computing device, and medium thereof

Publications (1)

Publication Number Publication Date
WO2020034849A1 true WO2020034849A1 (zh) 2020-02-20

Family

ID=64683893

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/098861 WO2020034849A1 (zh) 2018-08-14 2019-08-01 音乐推荐的方法、装置、计算设备和介质

Country Status (5)

Country Link
US (1) US11314806B2 (zh)
EP (1) EP3757995A4 (zh)
JP (1) JP7206288B2 (zh)
CN (1) CN109063163B (zh)
WO (1) WO2020034849A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597320A (zh) * 2020-12-09 2021-04-02 上海掌门科技有限公司 社交信息生成方法、设备及计算机可读介质
CN114390342A (zh) * 2021-12-10 2022-04-22 阿里巴巴(中国)有限公司 一种视频配乐方法、装置、设备及介质
JP7502553B2 (ja) 2020-08-31 2024-06-18 レモン インコーポレイテッド マルチメディア作品の作成方法、装置及びコンピュータ可読記憶媒体

Families Citing this family (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8805854B2 (en) * 2009-06-23 2014-08-12 Gracenote, Inc. Methods and apparatus for determining a mood profile associated with media data
CN109063163B (zh) 2018-08-14 2022-12-02 腾讯科技(深圳)有限公司 一种音乐推荐的方法、装置、终端设备和介质
CN109587554B (zh) * 2018-10-29 2021-08-03 百度在线网络技术(北京)有限公司 视频数据的处理方法、装置及可读存储介质
CN109766493B (zh) * 2018-12-24 2022-08-02 哈尔滨工程大学 一种在神经网络下结合人格特征的跨域推荐方法
CN111401100B (zh) 2018-12-28 2021-02-09 广州市百果园信息技术有限公司 视频质量评估方法、装置、设备及存储介质
CN111435369B (zh) * 2019-01-14 2024-04-09 腾讯科技(深圳)有限公司 音乐推荐方法、装置、终端及存储介质
CN109862393B (zh) * 2019-03-20 2022-06-14 深圳前海微众银行股份有限公司 视频文件的配乐方法、系统、设备及存储介质
CN110297939A (zh) * 2019-06-21 2019-10-01 山东科技大学 一种融合用户行为和文化元数据的音乐个性化系统
CN112182281B (zh) * 2019-07-05 2023-09-19 腾讯科技(深圳)有限公司 一种音频推荐方法、装置及存储介质
CN110598766B (zh) * 2019-08-28 2022-05-10 第四范式(北京)技术有限公司 一种商品推荐模型的训练方法、装置及电子设备
CN110727785A (zh) * 2019-09-11 2020-01-24 北京奇艺世纪科技有限公司 推荐模型的训练、搜索文本的推荐方法、装置及存储介质
JP7188337B2 (ja) * 2019-09-24 2022-12-13 カシオ計算機株式会社 サーバ装置、演奏支援方法、プログラム、および情報提供システム
CN112559777A (zh) * 2019-09-25 2021-03-26 北京达佳互联信息技术有限公司 内容项投放方法、装置、计算机设备及存储介质
CN110704682B (zh) * 2019-09-26 2022-03-18 新华智云科技有限公司 一种基于视频多维特征智能推荐背景音乐的方法及系统
CN110728539A (zh) * 2019-10-09 2020-01-24 重庆特斯联智慧科技股份有限公司 一种基于大数据的顾客差异化管理的方法及装置
CN110677711B (zh) * 2019-10-17 2022-03-01 北京字节跳动网络技术有限公司 视频配乐方法、装置、电子设备及计算机可读介质
US11907963B2 (en) * 2019-10-29 2024-02-20 International Business Machines Corporation On-device privacy-preservation and personalization
CN110839173A (zh) * 2019-11-18 2020-02-25 上海极链网络科技有限公司 一种音乐匹配方法、装置、终端及存储介质
CN110971969B (zh) * 2019-12-09 2021-09-07 北京字节跳动网络技术有限公司 视频配乐方法、装置、电子设备及计算机可读存储介质
CN111031391A (zh) * 2019-12-19 2020-04-17 北京达佳互联信息技术有限公司 视频配乐方法、装置、服务器、终端及存储介质
CN111008287B (zh) * 2019-12-19 2023-08-04 Oppo(重庆)智能科技有限公司 音视频处理方法、装置、服务器及存储介质
CN111259192B (zh) * 2020-01-15 2023-12-01 腾讯科技(深圳)有限公司 音频推荐方法和装置
CN111259191A (zh) * 2020-01-16 2020-06-09 石河子大学 一种中小学音乐教育学习系统与方法
US11461649B2 (en) * 2020-03-19 2022-10-04 Adobe Inc. Searching for music
CN111417030A (zh) * 2020-04-28 2020-07-14 广州酷狗计算机科技有限公司 设置配乐的方法、装置、系统、设备及存储设备
CN111800650B (zh) * 2020-06-05 2022-03-25 腾讯科技(深圳)有限公司 视频配乐方法、装置、电子设备及计算机可读介质
CN111695041B (zh) * 2020-06-17 2023-05-23 北京字节跳动网络技术有限公司 用于推荐信息的方法和装置
EP4198772A4 (en) * 2020-08-31 2023-08-16 Huawei Technologies Co., Ltd. METHOD AND DEVICE FOR MAKING A MUSIC RECOMMENDATION
CN112214636A (zh) * 2020-09-21 2021-01-12 华为技术有限公司 音频文件的推荐方法、装置、电子设备以及可读存储介质
US11693897B2 (en) 2020-10-20 2023-07-04 Spotify Ab Using a hierarchical machine learning algorithm for providing personalized media content
US11544315B2 (en) * 2020-10-20 2023-01-03 Spotify Ab Systems and methods for using hierarchical ordered weighted averaging for providing personalized media content
CN113434763B (zh) * 2021-06-28 2022-10-14 平安科技(深圳)有限公司 搜索结果的推荐理由生成方法、装置、设备及存储介质
US11876841B2 (en) 2021-07-21 2024-01-16 Honda Motor Co., Ltd. Disparate player media sharing
CN113569088B (zh) * 2021-09-27 2021-12-21 腾讯科技(深圳)有限公司 一种音乐推荐方法、装置以及可读存储介质
CN114117142A (zh) * 2021-12-02 2022-03-01 南京邮电大学 一种基于注意力机制与超图卷积的标签感知推荐方法
CN114302225A (zh) * 2021-12-23 2022-04-08 阿里巴巴(中国)有限公司 视频配乐方法、数据处理方法、设备及存储介质
CN114637867A (zh) * 2022-05-18 2022-06-17 合肥的卢深视科技有限公司 视频特效配置方法、装置、电子设备和存储介质
CN115795023B (zh) * 2022-11-22 2024-01-05 百度时代网络技术(北京)有限公司 文档推荐方法、装置、设备以及存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320454A1 (en) * 2010-06-29 2011-12-29 International Business Machines Corporation Multi-facet classification scheme for cataloging of information artifacts
CN102637178A (zh) * 2011-02-14 2012-08-15 北京瑞信在线系统技术有限公司 一种音乐推荐方法、装置及系统
US20130077937A1 (en) * 2011-09-26 2013-03-28 Sony Corporation Apparatus and method for producing remote streaming audiovisual montages
CN105975472A (zh) * 2015-12-09 2016-09-28 乐视网信息技术(北京)股份有限公司 一种推荐方法和装置
WO2018081751A1 (en) * 2016-10-28 2018-05-03 Vilynx, Inc. Video tagging system and method
CN108153831A (zh) * 2017-12-13 2018-06-12 北京小米移动软件有限公司 音乐添加方法及装置
WO2018145015A1 (en) * 2017-02-06 2018-08-09 Kodak Alaris Inc. Method for creating audio tracks for accompanying visual imagery
CN109063163A (zh) * 2018-08-14 2018-12-21 腾讯科技(深圳)有限公司 一种音乐推荐的方法、装置、终端设备和介质

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1020843B1 (en) * 1996-09-13 2008-04-16 Hitachi, Ltd. Automatic musical composition method
JP2006099740A (ja) * 2004-09-02 2006-04-13 Olympus Corp 情報提供装置、端末装置、情報提供システム及び情報提供方法
EP1666967B1 (en) * 2004-12-03 2013-05-08 Magix AG System and method of creating an emotional controlled soundtrack
KR101329266B1 (ko) * 2005-11-21 2013-11-14 코닌클리케 필립스 일렉트로닉스 엔.브이. 관련된 오디오 반주를 찾도록 디지털 영상들의 컨텐트특징들과 메타데이터를 사용하는 시스템 및 방법
US9032297B2 (en) * 2006-03-17 2015-05-12 Disney Enterprises, Inc. Web based video editing
US9111146B2 (en) * 2008-02-15 2015-08-18 Tivo Inc. Systems and methods for semantically classifying and normalizing shots in video
JP2009266005A (ja) * 2008-04-25 2009-11-12 Clarion Co Ltd 画像検索方法、画像検索プログラム、楽曲再生装置、および楽曲検索用物品
CN101727943B (zh) 2009-12-03 2012-10-17 无锡中星微电子有限公司 一种图像配乐的方法、图像配乐装置及图像播放装置
WO2012004650A1 (en) * 2010-07-08 2012-01-12 Siun Ni Raghallaigh Systems and methods for dynamic, distributed creation of a musical composition to accompany a visual composition
US9045967B2 (en) 2011-07-26 2015-06-02 Schlumberger Technology Corporation System and method for controlling and monitoring a drilling operation using refined solutions from a panistic inversion
CN103793447B (zh) 2012-10-26 2019-05-14 汤晓鸥 音乐与图像间语义相似度的估计方法和估计系统
JP2014095966A (ja) * 2012-11-08 2014-05-22 Sony Corp 情報処理装置、情報処理方法およびプログラム
CN103605656B (zh) * 2013-09-30 2018-02-02 小米科技有限责任公司 一种推荐音乐的方法、装置及一种移动终端
CN103795897A (zh) 2014-01-21 2014-05-14 深圳市中兴移动通信有限公司 自动生成背景音乐的方法和装置
CN105072354A (zh) 2015-07-17 2015-11-18 Tcl集团股份有限公司 一种利用多张照片合成视频流的方法及系统
TWI587574B (zh) 2015-07-20 2017-06-11 廣達電腦股份有限公司 行動裝置
US10178341B2 (en) * 2016-03-01 2019-01-08 DISH Technologies L.L.C. Network-based event recording
CN105930429A (zh) * 2016-04-19 2016-09-07 乐视控股(北京)有限公司 一种音乐推荐的方法及装置
US9836853B1 (en) * 2016-09-06 2017-12-05 Gopro, Inc. Three-dimensional convolutional neural networks for video highlight detection
KR20180036153A (ko) * 2016-09-30 2018-04-09 주식회사 요쿠스 영상 편집 시스템 및 방법
JP6589838B2 (ja) * 2016-11-30 2019-10-16 カシオ計算機株式会社 動画像編集装置及び動画像編集方法
US11761790B2 (en) 2016-12-09 2023-09-19 Tomtom Global Content B.V. Method and system for image-based positioning and mapping for a road network utilizing object detection
KR101863672B1 (ko) * 2016-12-15 2018-06-01 정우주 멀티미디어 컨텐츠 정보를 기반으로 사용자 맞춤형 멀티미디어 컨텐츠를 제공하는 방법 및 장치
CN107220663B (zh) * 2017-05-17 2020-05-19 大连理工大学 一种基于语义场景分类的图像自动标注方法
CN107707828B (zh) 2017-09-26 2019-07-26 维沃移动通信有限公司 一种视频处理方法及移动终端
CN107959873A (zh) * 2017-11-02 2018-04-24 深圳天珑无线科技有限公司 在视频中植入背景音乐的方法、装置、终端及存储介质
CN108600825B (zh) * 2018-07-12 2019-10-25 北京微播视界科技有限公司 选择背景音乐拍摄视频的方法、装置、终端设备和介质

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320454A1 (en) * 2010-06-29 2011-12-29 International Business Machines Corporation Multi-facet classification scheme for cataloging of information artifacts
CN102637178A (zh) * 2011-02-14 2012-08-15 北京瑞信在线系统技术有限公司 一种音乐推荐方法、装置及系统
US20130077937A1 (en) * 2011-09-26 2013-03-28 Sony Corporation Apparatus and method for producing remote streaming audiovisual montages
CN105975472A (en) * 2015-12-09 2016-09-28 乐视网信息技术(北京)股份有限公司 Recommendation method and apparatus
WO2018081751A1 (en) * 2016-10-28 2018-05-03 Vilynx, Inc. Video tagging system and method
WO2018145015A1 (en) * 2017-02-06 2018-08-09 Kodak Alaris Inc. Method for creating audio tracks for accompanying visual imagery
CN108153831A (en) * 2017-12-13 2018-06-12 北京小米移动软件有限公司 Music adding method and apparatus
CN109063163A (en) * 2018-08-14 2018-12-21 腾讯科技(深圳)有限公司 Music recommendation method, apparatus, terminal device, and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3757995A4

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
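JP7502553B2 (en) 2020-08-31 2024-06-18 レモン インコーポレイテッド Method and apparatus for creating a multimedia work, and computer-readable storage medium
CN112597320A (en) * 2020-12-09 2021-04-02 上海掌门科技有限公司 Social information generation method, device, and computer-readable medium
CN114390342A (en) * 2021-12-10 2022-04-22 阿里巴巴(中国)有限公司 Method, apparatus, device, and medium for adding a soundtrack to a video
CN114390342B (en) * 2021-12-10 2023-08-29 阿里巴巴(中国)有限公司 Method, apparatus, device, and medium for adding a soundtrack to a video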

Also Published As

Publication number Publication date
EP3757995A4 (en) 2021-06-09
US11314806B2 (en) 2022-04-26
EP3757995A1 (en) 2020-12-30
CN109063163B (zh) 2022-12-02
JP2021516398A (ja) 2021-07-01
US20210004402A1 (en) 2021-01-07
JP7206288B2 (ja) 2023-01-17
CN109063163A (zh) 2018-12-21

Similar Documents

Publication Publication Date Title
WO2020034849A1 (en) Music recommendation method, apparatus, computing device, and medium
US11216496B2 (en) Visual interactive search
US20210027160A1 (en) End-to-end deep collaborative filtering
US9678957B2 (en) Systems and methods for classifying electronic information using advanced active learning techniques
CN111815415B (en) Commodity recommendation method, system, and device
US20170200066A1 (en) Semantic Natural Language Vector Space
US11397873B2 (en) Enhanced processing for communication workflows using machine-learning techniques
CN111125422A (en) Image classification method and apparatus, electronic device, and storage medium
US11915298B2 (en) System and method for intelligent context-based personalized beauty product recommendation and matching
WO2014107193A1 (en) Efficiently identifying images, videos, songs or documents most relevant to the user based on attribute feedback
US20230237093A1 (en) Video recommender system by knowledge based multi-modal graph neural networks
CN113806588A (en) Method and apparatus for searching videos
US20230030341A1 (en) Dynamic user interface and machine learning tools for generating digital content and multivariate testing recommendations
US11397614B2 (en) Enhanced processing for communication workflows using machine-learning techniques
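CN116595252A (en) Data processing method and related apparatus
CN118172146A (en) Item data processing method and apparatus, computer device, and storage medium
CN118277651A (en) Interactive content processing method and apparatus, computer device, and storage medium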
Sen et al. Vector ordering and regression learning-based ranking for dynamic summarisation of user videos.
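CN113641900A (en) Information recommendation method and apparatus
CN116975735A (en) Training method, apparatus, device, and storage medium for a relevance estimation model
CN117370682A (en) Object ranking method, apparatus, device, and storage medium
CN116521971A (en) Content recommendation method, apparatus, device, storage medium, and computer program product
CN116010638A (en) Interactive image display method and apparatus, computer device, and storage medium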

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 19849335; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase. Ref document number: 2020549554; Country of ref document: JP; Kind code of ref document: A
ENP Entry into the national phase. Ref document number: 2019849335; Country of ref document: EP; Effective date: 20200924
NENP Non-entry into the national phase. Ref country code: DE