CN112182282A - Music recommendation method and device, computer equipment and readable storage medium - Google Patents

Music recommendation method and device, computer equipment and readable storage medium

Info

Publication number
CN112182282A
Authority
CN
China
Prior art keywords
information
scene
video
music
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010904446.2A
Other languages
Chinese (zh)
Inventor
张琼
俞骏燊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202010904446.2A priority Critical patent/CN112182282A/en
Publication of CN112182282A publication Critical patent/CN112182282A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F16/636 Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Abstract

The application relates to a music recommendation method and device, a computer device, and a readable storage medium, wherein the method comprises the following steps: acquiring user video information, wherein the user video information comprises a human body; performing human body joint point positioning on the video information, and extracting video feature information according to the position information of the joint points in the video information; inputting the video feature information into a scene recognition model to obtain scene information; and acquiring music corresponding to the scene information according to the scene information. By recommending music through recognition of human body actions, the method and the device solve the problem that music for a corresponding scene cannot be recommended when face recognition cannot be performed, and improve user experience.

Description

Music recommendation method and device, computer equipment and readable storage medium
Technical Field
The present application relates to the field of intelligent music recommendation, and in particular, to a music recommendation method, apparatus, computer device, and readable storage medium.
Background
Face recognition is a biometric technique for identifying a person based on facial feature information. The related technologies, also commonly called portrait recognition or facial recognition, use a camera or video camera to collect images or video streams containing faces, automatically detect and track the faces in the images, and then perform recognition on the detected faces.
Research on face recognition systems began in the 1960s, advanced after the 1980s with the development of computer technology and optical imaging technology, and truly entered the early application stage in the late 1990s, with implementations mainly in the United States, Germany, and Japan. The key to the success of a face recognition system is whether it possesses a cutting-edge core algorithm whose recognition results achieve practical recognition rates and recognition speeds. A face recognition system integrates professional technologies such as artificial intelligence, machine recognition, machine learning, model theory, expert systems, and video image processing, combined with the theory and implementation of intermediate value processing; as the latest application of biometric recognition, the implementation of its core technology reflects the transition from weak artificial intelligence to strong artificial intelligence.
With technical progress, face recognition has matured and is widely applied. Many intelligent music recommendation systems have been developed, such as systems that use face recognition to further analyze emotion, age, gender, and so on. All of these systems rely on face recognition and cannot handle scenes in which a face cannot be recognized.
Disclosure of Invention
The embodiments of the application provide a music recommendation method and device, computer equipment, and a readable storage medium, to at least solve the problem in the related art that corresponding music cannot be obtained in scenes where a human face cannot be recognized.
In a first aspect, an embodiment of the present application provides a music recommendation method, including:
acquiring user video information, wherein the user video information comprises a human body;
performing human body joint point positioning on the video information, and extracting video feature information according to the position information of the joint points in the video information;
inputting the video characteristic information into a scene recognition model to obtain scene information;
and acquiring music corresponding to the scene information according to the scene information.
In some embodiments, the inputting the video feature information into a scene recognition model, and the obtaining the scene information includes:
obtaining target human body actions according to the video characteristic information;
and obtaining scene information which corresponds to the target human body action and contains human body movement degree information according to a preset mapping relation between the human body action and the movement degree scene.
In some embodiments, the positioning of the human body joint points on the video information, and obtaining the video feature information includes:
acquiring position information of human body joint points of each frame of image in the video information;
respectively generating joint point coordinate matrixes of corresponding frame images according to the position information of the human body joint points of each frame image;
calculating a coordinate variation matrix according to the joint point coordinate matrixes of the two adjacent frame images;
and obtaining video characteristic information according to the coordinate variation matrix.
In some embodiments, the obtaining video feature information according to the coordinate variation matrix includes:
splitting the user video information into a plurality of sub-videos;
respectively acquiring coordinate variation matrixes of two adjacent frames of images in each sub-video;
obtaining a distance variable matrix of the corresponding sub-video according to the coordinate variable matrix of each sub-video;
and obtaining video characteristic information according to the distance variable quantity matrixes of all the sub-videos.
In some embodiments, the obtaining video feature information according to the distance variation matrix of all sub-videos includes:
and carrying out normalization processing on the distance variation matrix of the sub-video to obtain the video characteristic information.
In some embodiments, the inputting the video feature information into a scene recognition model, and obtaining the scene information includes:
establishing a neural network, training a neural network model by taking video characteristic information as a training set to obtain a scene recognition model, wherein the input of the scene recognition model is the video characteristic information, and the output of the scene recognition model is the scene information.
In some embodiments, the obtaining music corresponding to the scene information according to the scene information includes: the scene information comprises a first scene, a second scene and a third scene;
acquiring first music corresponding to the first scene according to the first scene;
acquiring second music corresponding to the second scene according to the second scene;
and acquiring third music corresponding to the third scene according to the third scene.
In a second aspect, an embodiment of the present application provides a music recommendation apparatus, including:
the video acquisition module is used for acquiring user video information, and the user video information comprises a human body;
the extraction module is used for positioning the human joint points of the video information and extracting the video characteristic information according to the position information of the joint points in the video information;
the scene recognition module is used for inputting the video characteristic information into a scene recognition model to obtain scene information;
and the playing module is used for acquiring the music corresponding to the scene information according to the scene information.
In a third aspect, an embodiment of the present application provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the computer program, implements the music recommendation method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the music recommendation method according to the first aspect.
Compared with the related art, the music recommendation method provided by the embodiments of the application recommends music by recognizing human body actions, solves the problem that music for a corresponding scene cannot be recommended when face recognition cannot be performed, and improves user experience.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is an application environment diagram of a music recommendation method according to an embodiment of the present application;
FIG. 2 is a block diagram of a music recommendation system according to an embodiment of the present application;
FIG. 3 is a flow chart of a music recommendation method according to an embodiment of the present application;
FIG. 4 is a flow chart of a music recommendation method according to another embodiment of the present application;
fig. 5 is a block diagram of a music recommendation apparatus according to an embodiment of the present application;
fig. 6 is a schematic diagram of a hardware structure of a music recommendation device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The music recommendation method provided by the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 obtains the video information of the user, performs human body joint point positioning on the video information to obtain video feature information, and sends the video feature information to the server 104. The server 104 inputs the video feature information into a scene recognition model to obtain scene information, and acquires music corresponding to the scene information. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device, and the server 104 may be implemented by an independent server or by a server cluster formed by a plurality of servers.
As shown in fig. 2, the method of the present application may be applied to a music recommendation system. The music recommendation system comprises a video acquisition module 210, a feature extraction module 220, a scene recognition module 230, a music recommendation module 240, and a music playing module 250, connected in sequence. The video capture module 210, the feature extraction module 220, and the music playing module 250 are located in the terminal 102, while the scene recognition module 230 and the music recommendation module 240 are located in the server 104. It is to be understood that all five modules may instead be located together in a single device, either the terminal 102 or the server 104, or the terminal 102 and the server 104 may cooperate in other arrangements to implement the functions of the music recommendation system.
The video capture module 210 is configured to capture video information of a user via a camera and transmit the video to the feature extraction module.
The feature extraction module 220 is configured to perform human body joint positioning on the video information to obtain video feature information, and send the video feature information to the scene identification module 230.
The scene recognition module 230 is configured to input the video feature information to a scene recognition model, so as to obtain scene information.
The music recommending module 240 is configured to recommend music information corresponding to the scene information according to the scene information, and send the music information to the terminal 102.
The music playing module 250 is configured to play corresponding music according to the music information.
In the music recommendation system of this embodiment, the feature extraction module 220 is located in the terminal 102: the terminal 102 performs human body joint point positioning on the video information to extract the video feature information and uploads only this feature information to the server, which effectively reduces the risk of leaking user privacy.
The embodiment also provides a music recommendation method. Fig. 3 is a flowchart of a music recommendation method according to an embodiment of the present application, and as shown in fig. 3, the flowchart includes the following steps:
step S302, user video information is obtained, and the user video information comprises a human body.
Specifically, when a user needs to play music, video information of the user is collected through the camera.
Step S304, performing human body joint point positioning on the video information to obtain video feature information.
Human body joint point positioning is the process of studying and describing human body postures and predicting human body behaviors; the recognition process identifies human body actions in a given image or video according to changes in the positions of the joint points of the human body.
Specifically, the position information of the human body joint points of each frame of image in the video information is acquired; joint point coordinate matrices of the corresponding frame images are generated from the position information of the human body joint points of each frame image; a coordinate variation matrix is calculated from the joint point coordinate matrices of two adjacent frame images; and the video feature information is obtained according to the coordinate variation matrix. The position information coordinates of the human body joint points include the positions of the neck, chest, head, right shoulder, left shoulder, right hip, left hip, right elbow, left elbow, right knee, left knee, right wrist, left wrist, right ankle, left ankle, and the like. The joint point coordinate matrix is a matrix containing the position coordinates of a plurality of human body joint points. The coordinate variation matrix describes the change in the position coordinates of the human body joint points between the two frames of images. By analyzing the coordinate variation matrix, information such as the specific posture and action intensity of the human body can be obtained.
Furthermore, the position information of the human body joint points of each video frame is extracted using the OpenPose method, and a joint point coordinate matrix P is generated from the position information of the human body joint points, where P = [(x_1, y_1), (x_2, y_2), ..., (x_i, y_i), ..., (x_15, y_15)]. It can be understood that the joint point coordinate matrix contains the position coordinates of all the human body joint points, and the number of coordinates changes with the number of joint points in the acquired position information. From the joint point coordinate matrices P_i and P_(i+1) of two adjacent frame images, the coordinate variation matrix of the position coordinates of the human body joint points in the two adjacent frames is calculated. The plurality of coordinate variation matrices are added to obtain the video feature information.
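For illustration only, the following is a minimal sketch of how the joint point coordinate matrix and the coordinate variation matrix might be computed; it is an assumption-laden sketch rather than the patented implementation, and it assumes a pose estimator such as OpenPose has already produced the 15 keypoints upstream.

```python
import numpy as np

NUM_JOINTS = 15  # neck, chest, head, shoulders, hips, elbows, knees, wrists, ankles

def joint_matrix(keypoints) -> np.ndarray:
    """Build the joint point coordinate matrix P = [(x_1, y_1), ..., (x_15, y_15)]
    for one frame from pose-estimator output (e.g., OpenPose keypoints)."""
    return np.asarray(keypoints, dtype=np.float32).reshape(NUM_JOINTS, 2)

def coordinate_variation(p_i: np.ndarray, p_next: np.ndarray) -> np.ndarray:
    """Coordinate variation matrix between two adjacent frames: per-joint deltas."""
    return p_next - p_i

# Usage with synthetic data standing in for two adjacent frames:
p1 = joint_matrix(np.random.rand(NUM_JOINTS, 2) * 100.0)
p2 = joint_matrix(p1 + np.random.randn(NUM_JOINTS, 2))
delta = coordinate_variation(p1, p2)  # shape (15, 2)
```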
The obtaining of the video feature information according to the coordinate variation matrix includes: splitting the user video information into a plurality of sub-videos; respectively acquiring coordinate variation matrixes of two adjacent frames of images in each sub-video; obtaining a distance variable matrix of the corresponding sub-video according to the coordinate variable matrix of each sub-video; and obtaining video characteristic information according to the distance variable quantity matrixes of all the sub-videos.
Specifically, according to the length of the user video information, the video is divided into n = l × u segments of sub-videos (where l is the video length and u is the precision, i.e., the number of data points per second). For each sub-video, the coordinate variation matrix is obtained from the joint point coordinate matrices P_i and P_(i+1) of two adjacent frame images. A distance variation matrix of the corresponding sub-video is obtained from the coordinate variation matrices of each sub-video, and the video feature information is obtained from the distance variation matrices of all the sub-videos. Further, the distance variation matrices of the sub-videos are normalized to obtain the video feature information.
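A sketch of the sub-video feature pipeline under the same assumptions follows; the Euclidean aggregation of per-joint deltas into a distance variation matrix and the min-max normalization are plausible readings of the description, not details the application fixes.

```python
import numpy as np

def video_features(joints: np.ndarray, length_s: float, u: int) -> np.ndarray:
    """joints: (T, 15, 2) joint coordinate matrices for T frames.
    Splits the sequence into n = l * u sub-videos, builds a distance
    variation row per sub-video, and normalizes the result."""
    n = max(1, int(length_s * u))                # n = l x u sub-videos
    rows = []
    for seg in np.array_split(joints, n, axis=0):
        deltas = np.diff(seg, axis=0)            # coordinate variation matrices
        dists = np.linalg.norm(deltas, axis=2)   # per-joint displacement magnitudes
        rows.append(dists.sum(axis=0))           # distance variation per joint
    feat = np.stack(rows)                        # (n, 15) distance variation matrix
    span = feat.max() - feat.min()               # min-max normalization
    return (feat - feat.min()) / span if span > 0 else np.zeros_like(feat)
```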
Step S306, inputting the video characteristic information into a scene recognition model to obtain scene information.
Specifically, the feature extraction module sends the video feature information to the scene recognition module, and the scene recognition module obtains the scene information from the video feature information: a scene recognition model built with a deep learning algorithm recognizes the degree of motion of the human body from the video feature information and thereby obtains the corresponding scene information.
The inputting the video feature information into a scene recognition model, and the obtaining the scene information includes: obtaining target human body actions according to the video characteristic information; and obtaining scene information which corresponds to the target human body action and contains human body movement degree information according to a preset mapping relation between the human body action and the movement degree scene.
Specifically, a human action-scene mapping table is set in the scene recognition model, recording the mapping relationship between human actions and motion-degree scenes. The scene recognition module identifies the position information of the joint points in the video feature information to obtain the target human body action, and scene information corresponding to the target human body action and containing human body motion degree information is obtained according to the mapping relationship in the mapping table. In one embodiment, the scene recognition model has a motion intensity recognition mode: the target human body action is obtained from the video feature information, the motion intensity of the target action is recognized, and the corresponding scene information is derived. In another embodiment, the scene recognition model has a mapping table recognition mode: the target human body action is obtained from the video feature information, and the corresponding scene information is looked up in the mapping table.
In yet another embodiment, the scene recognition model has both recognition modes: one obtains the target human body action from the video feature information and looks up the corresponding scene information in the mapping table; the other obtains the target human body action from the video feature information, recognizes its motion intensity, and derives the corresponding scene information. A priority may be set between the modes; for example, the mapping table mode is used first, and if no corresponding mapping relationship is found in the mapping table, the scene information is determined from the motion intensity of the target human body action; if a corresponding mapping relationship is found, the scene information is obtained from the mapping table.
Further, the scene information includes a first scene, a second scene, and a third scene, obtained by dividing human motion into three scenes according to motion intensity. The first scene may be a high-intensity motion scene, such as street dance. The second scene may be a medium-intensity motion scene, such as jogging or push-ups. The third scene may be a low-intensity motion scene, such as studying or resting. Each scene may have a variety of classical actions, for example, classical actions of street dance: backflips and head spins; of exercise: running, push-ups, and pull-ups; of resting: lying with the limbs flat, and the like.
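To make the two recognition modes concrete, here is a hypothetical sketch of the mapping-table lookup with a motion-intensity fallback; the action labels follow the classical actions listed above, while the thresholds are invented for illustration.

```python
# Hypothetical human action -> motion-degree scene mapping table.
ACTION_SCENE_MAP = {
    "backflip": "first_scene",    # street dance: high intensity
    "head_spin": "first_scene",
    "jogging": "second_scene",    # medium intensity
    "push_up": "second_scene",
    "pull_up": "second_scene",
    "lying_flat": "third_scene",  # resting: low intensity
}

def recognize_scene(action: str, motion_intensity: float) -> str:
    """Mapping-table mode first; fall back to the motion-intensity mode when
    the action is not in the table (the thresholds are assumptions)."""
    if action in ACTION_SCENE_MAP:
        return ACTION_SCENE_MAP[action]
    if motion_intensity > 0.7:
        return "first_scene"
    return "second_scene" if motion_intensity > 0.3 else "third_scene"
```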
In one embodiment, the user is in a high-intensity motion scene, for example performing a street dance; the music recommendation system is turned on and collects video information of the user through the camera. Human body joint point positioning is performed on the user's video information to obtain the position information of the human body joint points, and a joint point coordinate matrix is obtained from the coordinate positions of the plurality of human body joints. A coordinate variation matrix is obtained by detecting the change in the joint point coordinate matrix. The specific posture and action intensity of the user are recognized from the coordinate variation matrix to generate the video feature information. The current scene of the user is obtained from the video feature information, the music corresponding to the scene is acquired, and the music is played. By applying technologies such as action recognition and deep learning, the music recommendation system intelligently captures the action scene of a user and recommends music that fits the current scene.
Before the video feature information is input into the scene recognition model to obtain the scene information, the method includes: establishing a neural network and training the neural network model with video feature information as the training set to obtain the scene recognition model, wherein the input of the scene recognition model is the video feature information and the output is the scene information.
Specifically, video feature information collected under a plurality of different scenes is used to train the neural network model, yielding the scene recognition model.
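As a sketch of what such training could look like (the network architecture, optimizer, and label encoding below are assumptions; the application only states that a neural network is trained on video feature information and outputs scene information):

```python
import torch
import torch.nn as nn

N_SEGMENTS, N_JOINTS, N_SCENES = 30, 15, 3  # assumed feature and label sizes

# A small fully connected classifier over the (n, 15) distance variation features.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(N_SEGMENTS * N_JOINTS, 64),
    nn.ReLU(),
    nn.Linear(64, N_SCENES),  # logits for the first/second/third scene
)

def train(model, features, labels, epochs=20):
    """features: (batch, N_SEGMENTS, N_JOINTS) float tensor; labels: scene indices."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(features), labels)
        loss.backward()
        opt.step()
    return model
```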
Step S308, acquiring music corresponding to the scene information according to the scene information.
Specifically, the music recommendation module receives the scene information from the scene recognition module, acquires music information according to a music recommendation strategy, and pushes the music information to the terminal, which plays the corresponding music according to the music information.
The acquiring of the music corresponding to the scene information according to the scene information includes: acquiring first music corresponding to the first scene according to the first scene; acquiring second music corresponding to the second scene according to the second scene; and acquiring third music corresponding to the third scene according to the third scene.
Specifically, the basic principle of the music recommendation strategy is as follows: when the user is in a high-intensity motion scene, recommend music with a strong, fast-changing rhythm; when the user is in a medium-intensity motion scene, recommend music with a slow, regular rhythm; when the user is in a low-intensity motion scene, recommend soothing, smooth music.
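A toy sketch of this strategy as a scene-to-playlist lookup follows; the playlist contents are placeholders, not data from the application.

```python
SCENE_PLAYLISTS = {
    "first_scene": ["fast_strong_rhythm_track_1", "fast_strong_rhythm_track_2"],
    "second_scene": ["slow_regular_rhythm_track_1", "slow_regular_rhythm_track_2"],
    "third_scene": ["soothing_track_1", "soothing_track_2"],
}

def recommend_music(scene: str) -> list:
    """Return the playlist matching the recognized scene (empty if unknown)."""
    return SCENE_PLAYLISTS.get(scene, [])
```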
The method further comprises: acquiring music update information, and updating the music corresponding to the scene information.
Specifically, the server can obtain playlists of different music from the network and update the music in the database.
Through the above steps, music is recommended by recognizing human body actions, which solves the problem that music for a corresponding scene cannot be recommended when face recognition cannot be performed, and improves user experience.
The embodiment also provides another music recommendation method. Fig. 4 is a flowchart of a music recommendation method according to another embodiment of the present application, and as shown in fig. 4, the flowchart includes the following steps:
step S402, collecting user video information.
Specifically, the video acquisition module acquires user video information through the camera and transmits the video to the feature extraction module.
And step S404, extracting video characteristic information in the user video information.
Specifically, the feature extraction module performs human body joint point positioning on the video information to obtain the video feature information.
Step S406, sending the video characteristic information to a server.
Specifically, the feature extraction module sends the video feature information to the scene recognition module.
Step S408, identifying the video characteristic information and generating scene information.
Specifically, the scene recognition module inputs the video feature information to a scene recognition model to obtain scene information.
Step S410, generating corresponding music information according to the scene information.
Specifically, the music recommendation module recommends music information corresponding to the scene information according to the scene information.
Step S412, the music information is sent to the terminal.
Specifically, the music recommendation module sends the music information to the terminal 102.
Step S414, the terminal updates the local music playlist according to the music information.
Specifically, the music playing module updates the local playlist according to the music information and plays the corresponding music.
It should be understood that although the steps in the flowcharts of fig. 3-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 3-4 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The present embodiment further provides a music recommendation apparatus, which is used to implement the foregoing embodiments and preferred embodiments; details already described are not repeated here. As used hereinafter, the terms "module," "unit," "subunit," and the like may implement a predetermined function through a combination of software and/or hardware. Although the apparatus described in the embodiments below is preferably implemented in software, an implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a block diagram of a music recommendation apparatus according to an embodiment of the present application; as shown in fig. 5, the apparatus includes: a video acquisition module 510, an extraction module 520, a scene recognition module 530, and a playback module 540.
The video obtaining module 510 is configured to obtain user video information, where the user video information includes a human body.
The extracting module 520 is configured to perform human joint point positioning on the video information, and extract video feature information according to the position information of the joint point in the video information.
The scene recognition module 530 is configured to input the video feature information to a scene recognition model, so as to obtain scene information.
The playing module 540 is configured to obtain music corresponding to the scene information according to the scene information.
The extracting module 520 is configured to obtain a target human body action according to the video feature information; and obtaining scene information which corresponds to the target human body action and contains human body movement degree information according to a preset mapping relation between the human body action and the movement degree scene.
The extraction module 520 is configured to obtain position information of a human body joint point of each frame of image in the video information; respectively generating joint point coordinate matrixes of corresponding frame images according to the position information of the human body joint points of each frame image; calculating a coordinate variation matrix according to the joint point coordinate matrixes of the two adjacent frame images; and obtaining video characteristic information according to the coordinate variation matrix.
The extracting module 520 is configured to split the user video information into a plurality of sub-videos; respectively acquiring coordinate variation matrixes of two adjacent frames of images in each sub-video; obtaining a distance variable matrix of the corresponding sub-video according to the coordinate variable matrix of each sub-video; and obtaining video characteristic information according to the distance variable quantity matrixes of all the sub-videos.
The extracting module 520 is configured to perform normalization processing on the distance variation matrix of the sub-video to obtain the video feature information.
The scene recognition module 530 is configured to establish a neural network, train a neural network model by using video feature information as a training set, and obtain a scene recognition model, where an input of the scene recognition model is the video feature information and an output of the scene recognition model is the scene information.
The playing module 540 is configured to obtain first music corresponding to the first scene according to the first scene; acquiring second music corresponding to the second scene according to the second scene; and acquiring third music corresponding to the third scene according to the third scene.
The playing module 540 is configured to update a playlist according to the obtained music and play the playlist.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
In addition, the music recommendation method described in the embodiment of the present application with reference to fig. 3 may be implemented by a music recommendation device. Fig. 6 is a schematic diagram of a hardware structure of a music recommendation device according to an embodiment of the present application.
The music recommendation device may include a processor 81 and a memory 82 storing computer program instructions.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
Memory 82 may include mass storage for data or instructions. By way of example, and not limitation, memory 82 may include a Hard Disk Drive (HDD), a floppy disk drive, a Solid State Drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 82 may include removable or non-removable (or fixed) media, where appropriate. The memory 82 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 82 is Non-Volatile memory. In particular embodiments, memory 82 includes Read-Only Memory (ROM) and Random Access Memory (RAM). The ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or FLASH memory, or a combination of two or more of these, where appropriate. The RAM may be Static Random-Access Memory (SRAM) or Dynamic Random-Access Memory (DRAM), where the DRAM may be Fast Page Mode DRAM (FPMDRAM), Extended Data Output DRAM (EDODRAM), Synchronous DRAM (SDRAM), and the like.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 realizes any one of the music recommendation methods in the above embodiments by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the music recommendation device may also include a communication interface 83 and a bus 80. As shown in fig. 6, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used to implement communication between the modules, apparatuses, units, and/or devices in the embodiments of the present application. The communication interface 83 may also carry out data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
Bus 80 includes hardware, software, or both, coupling the components of the music recommendation device to each other. Bus 80 includes, but is not limited to, at least one of the following: a Data Bus, an Address Bus, a Control Bus, an Expansion Bus, and a Local Bus. By way of example, and not limitation, Bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated by the application.
The music recommendation device may execute the method in the embodiment of the present application based on the acquired user video information, thereby implementing the method described in conjunction with fig. 3.
In addition, in combination with the methods in the foregoing embodiments, the embodiments of the present application may be implemented by providing a computer-readable storage medium. The computer readable storage medium having stored thereon computer program instructions; the computer program instructions, when executed by a processor, implement any of the methods in the above embodiments.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A music recommendation method, comprising:
acquiring user video information, wherein the user video information comprises a human body;
performing human body joint point positioning on the video information, and extracting video feature information according to the position information of the joint points in the video information;
inputting the video characteristic information into a scene recognition model to obtain scene information;
and acquiring music corresponding to the scene information according to the scene information.
2. The music recommendation method according to claim 1, wherein said inputting said video feature information to a scene recognition model, obtaining scene information comprises:
obtaining target human body actions according to the video characteristic information;
and obtaining scene information which corresponds to the target human body action and contains human body movement degree information according to a preset mapping relation between the human body action and the movement degree scene.
3. The music recommendation method according to claim 1, wherein positioning joint points of a human body on the video information, and extracting video feature information according to position information of joint points in the video information comprises:
acquiring position information of human body joint points of each frame of image in the video information;
respectively generating joint point coordinate matrixes of corresponding frame images according to the position information of the human body joint points of each frame image;
calculating a coordinate variation matrix according to the joint point coordinate matrixes of the two adjacent frame images;
and obtaining video characteristic information according to the coordinate variation matrix.
4. The music recommendation method according to claim 3, wherein said deriving video feature information according to the coordinate variation matrix comprises:
splitting the user video information into a plurality of sub-videos;
respectively acquiring coordinate variation matrixes of two adjacent frames of images in each sub-video;
obtaining a distance variable matrix of the corresponding sub-video according to the coordinate variable matrix of each sub-video;
and obtaining video characteristic information according to the distance variable quantity matrixes of all the sub-videos.
5. The music recommendation method according to claim 4, wherein said deriving video feature information from the distance variation matrix of all sub-videos comprises:
and carrying out normalization processing on the distance variation matrix of the sub-video to obtain the video characteristic information.
6. The music recommendation method according to claim 4, wherein the inputting the video feature information into a scene recognition model, and the obtaining the scene information comprises:
establishing a neural network, training a neural network model by taking video characteristic information as a training set to obtain a scene recognition model, wherein the input of the scene recognition model is the video characteristic information, and the output of the scene recognition model is the scene information.
7. The music recommendation method according to claim 1, wherein said obtaining music corresponding to the scene information according to the scene information comprises: the scene information comprises a first scene, a second scene and a third scene;
acquiring first music corresponding to the first scene according to the first scene;
acquiring second music corresponding to the second scene according to the second scene;
and acquiring third music corresponding to the third scene according to the third scene.
8. A music recommendation device, comprising:
the video acquisition module is used for acquiring user video information, and the user video information comprises a human body;
the extraction module is used for positioning the human joint points of the video information and extracting the video characteristic information according to the position information of the joint points in the video information;
the scene recognition module is used for inputting the video characteristic information into a scene recognition model to obtain scene information;
and the playing module is used for acquiring the music corresponding to the scene information according to the scene information.
9. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the music recommendation method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a music recommendation method according to any one of claims 1 to 7.
CN202010904446.2A 2020-09-01 2020-09-01 Music recommendation method and device, computer equipment and readable storage medium Pending CN112182282A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010904446.2A CN112182282A (en) 2020-09-01 2020-09-01 Music recommendation method and device, computer equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010904446.2A CN112182282A (en) 2020-09-01 2020-09-01 Music recommendation method and device, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112182282A true CN112182282A (en) 2021-01-05

Family

ID=73924085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010904446.2A Pending CN112182282A (en) 2020-09-01 2020-09-01 Music recommendation method and device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN112182282A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107316641A (en) * 2017-06-30 2017-11-03 联想(北京)有限公司 A kind of sound control method and electronic equipment
CN108446583A (en) * 2018-01-26 2018-08-24 西安电子科技大学昆山创新研究院 Human bodys' response method based on Attitude estimation
CN109829107A (en) * 2019-01-23 2019-05-31 华为技术有限公司 A kind of recommended method and electronic equipment based on user movement state
WO2019128552A1 (en) * 2017-12-29 2019-07-04 Oppo广东移动通信有限公司 Information pushing method, apparatus, terminal, and storage medium
CN110222551A (en) * 2018-03-02 2019-09-10 杭州海康威视数字技术股份有限公司 Method, apparatus, electronic equipment and the storage medium of identification maneuver classification
CN111026934A (en) * 2019-12-20 2020-04-17 中科寒武纪科技股份有限公司 Intelligent recommendation method and related equipment

Similar Documents

Publication Publication Date Title
Sun et al. View-invariant gait recognition based on kinect skeleton feature
CN108712661B (en) Live video processing method, device, equipment and storage medium
CN110941990B (en) Method and device for evaluating human body actions based on skeleton key points
WO2021051579A1 (en) Body pose recognition method, system, and apparatus, and storage medium
WO2018228218A1 (en) Identification method, computing device, and storage medium
CN104573652B (en) Determine the method, apparatus and terminal of the identity of face in facial image
CN105005777B (en) Audio and video recommendation method and system based on human face
CA2843343C (en) Systems and methods of detecting body movements using globally generated multi-dimensional gesture data
Wang et al. Gait recognition based on 3D skeleton joints captured by kinect
Wang et al. First-person daily activity recognition with manipulated object proposals and non-linear feature fusion
Sun et al. Exploring multimodal visual features for continuous affect recognition
CN109308437B (en) Motion recognition error correction method, electronic device, and storage medium
CN107316641B (en) Voice control method and electronic equipment
CN110838353A (en) Action matching method and related product
CN113449610A (en) Gesture recognition method and system based on knowledge distillation and attention mechanism
CN113392270A (en) Video processing method, video processing device, computer equipment and storage medium
Limcharoen et al. Gait recognition and re-identification based on regional lstm for 2-second walks
CN112328833A (en) Label processing method and device and computer readable storage medium
CN113822254B (en) Model training method and related device
CN112995757B (en) Video clipping method and device
Yuan et al. Gait classification and identity authentication using CNN
CN111104964B (en) Method, equipment and computer storage medium for matching music with action
CN110314344A (en) Move based reminding method, apparatus and system
Chheda et al. Music recommendation based on affective image content analysis
Suarez et al. AFAR: a real-time vision-based activity monitoring and fall detection framework using 1D convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210105