CN111782858A - Music matching method and device - Google Patents

Music matching method and device

Info

Publication number
CN111782858A
Authority
CN
China
Prior art keywords
music
sample
detected
target
matched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010245027.2A
Other languages
Chinese (zh)
Other versions
CN111782858B (en
Inventor
左鑫孟
赖荣凤
梅涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202010245027.2A
Publication of CN111782858A
Application granted
Publication of CN111782858B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F16/636 Filtering based on additional data, e.g. user or group profiles by using biological or physiological data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention discloses a music matching method and device, and relates to the technical field of computers. One embodiment of the method comprises: determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model; determining the action characteristics of the target object according to the position information of the key points; and screening the target music matched with the sample to be detected from a preset music library according to the action characteristics. This embodiment can provide a personalized music matching scheme without the support of intelligent hardware, wearable devices, and the like, at low cost and with good convenience.

Description

Music matching method and device
Technical Field
The invention relates to the technical field of computers, in particular to a music matching method and device.
Background
At present, matching music to body movements and the like relies mainly on manual work or on various wearable devices, and the corresponding music is matched only after multiple operations and measurements. This approach increases hardware cost and, because it must be used together with motion-sensing equipment, is inconvenient.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for music matching, which can provide an individualized music matching scheme without the support of intelligent hardware, wearable equipment, and the like, and have low cost and good convenience.
According to an aspect of an embodiment of the present invention, there is provided a music matching method, including:
determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model;
determining the action characteristics of the target object according to the position information of the key points;
and screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
Optionally, the action feature comprises: an action frequency characteristic of the first characteristic part; according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps:
determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts; screening the target music matched with the sample to be detected from a preset music library according to the action frequency characteristics of the first characteristic parts comprises:
for each first characteristic part: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the action feature comprises: the action amplitude characteristic and the action frequency characteristic of the first characteristic part;
before the target music matched with the sample to be detected is screened from a preset music library, the method further comprises the following steps: determining the action amplitude characteristics of the target object according to the position information of the plurality of key points;
according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts; screening the target music matched with the sample to be detected from a preset music library according to the action characteristics comprises:
for each first characteristic part: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, before the target music matched with the sample to be detected is screened from a preset music library, the method further includes: determining a target music type matched with the sample to be detected;
screening the target music matched with the sample to be detected from a preset music library, wherein the screening comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
Optionally, determining the target music type matched with the sample to be detected includes:
and determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
Optionally, the target music type is the type of the main instrument of the target music.
Optionally, the sample to be detected is an offline video stream or a periodically acquired real-time video stream.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for music matching, including:
the key point detection module is used for determining the position information of a plurality of key points of the target object in the sample to be detected based on a pre-trained key point detection model;
the characteristic analysis module is used for determining the action characteristics of the target object according to the position information of the key points;
and the music matching module is used for screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
Optionally, the action feature comprises: an action frequency characteristic of the first characteristic part; the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, and the method comprises the following steps:
determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts; the music matching module screening the target music matched with the sample to be detected from a preset music library according to the action characteristics comprises:
for each first characteristic part, determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the action feature comprises: the action amplitude characteristic and the action frequency characteristic of the first characteristic part;
the feature analysis module is further configured to: before the music matching module screens target music matched with the sample to be detected from a preset music library, determine the action amplitude characteristics of the target object according to the position information of the key points;
the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, and the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts; the music matching module screening the target music matched with the sample to be detected from a preset music library according to the action characteristics comprises:
for each first characteristic part: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the feature analysis module is further configured to: determining a target music type matched with the sample to be detected before the music matching module screens the target music matched with the sample to be detected from a preset music library;
the music matching module screens target music matched with the sample to be detected from a preset music library, and the method comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
Optionally, the determining, by the feature analysis module, a target music type matched with the sample to be detected includes:
and determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
Optionally, the target music type is the type of the main instrument of the target music.
Optionally, the sample to be detected is an offline video stream or a periodically acquired real-time video stream.
According to a third aspect of embodiments of the present invention, there is provided an electronic device for music matching, comprising:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method provided by the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method provided by the first aspect of embodiments of the present invention.
One embodiment of the above invention has the following advantages or benefits: because music matching is performed according to the position information of key points, a personalized music matching scheme can be provided without the support of intelligent hardware, wearable devices, and the like, at low cost and with good convenience.
Further effects of the above non-conventional optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of a main flow of a music matching method of an embodiment of the present invention;
FIG. 2 is a schematic representation of human key point detection results in an alternative embodiment of the present invention;
FIG. 3 is a diagram illustrating an application scenario of a method for music matching according to an alternative embodiment of the present invention;
FIG. 4 is a schematic diagram of the main blocks of an apparatus for music matching according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
According to an aspect of an embodiment of the present invention, there is provided a method of music matching.
Fig. 1 is a schematic diagram of a main flow of a music matching method according to an embodiment of the present invention, and as shown in fig. 1, the music matching method includes:
Step S101: determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model;
Step S102: determining the action characteristics of the target object according to the position information of the plurality of key points;
Step S103: screening target music matched with the sample to be detected from a preset music library according to the action characteristics.
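The three steps can be sketched as a small pipeline. This is only an illustrative outline: the function name `match_music`, the callables passed to it, and the dictionary-based music library are assumptions made for the sketch and are not defined in the original text.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]          # (x, y) position of one key point
Frame = Dict[int, Point]             # key point index -> position


def match_music(
    frames: Sequence[object],
    detect_keypoints: Callable[[object], Frame],
    extract_features: Callable[[List[Frame]], List[float]],
    library: Dict[str, List[float]],
    score: Callable[[List[float], List[float]], float],
) -> str:
    """Return the title in `library` with the highest matching degree."""
    # S101: run the (pre-trained) key point detector on every frame.
    keypoints = [detect_keypoints(frame) for frame in frames]
    # S102: derive the action features from the key point positions.
    features = extract_features(keypoints)
    # S103: screen the library for the music with the highest matching degree.
    return max(library, key=lambda title: score(features, library[title]))
```

Passing the detector, feature extractor, and score as parameters keeps the sketch independent of any particular model or matching formula.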
The target object refers to the subject in the sample to be detected. For example, after a user shoots a short video (the sample to be detected) of a pet playing, if music is to be matched to the short video, the target object is the pet in the video. As another example, a high-definition camera in a dance practice room captures a dancer's improvised performance (the sample to be detected), and music suited to the scene is matched in real time according to the dancer's motion changes and played in the practice room; in this case the target object is the dancer.
The action feature is a feature that reflects dynamic information of the target object, for example the amplitude of the motion change or the frequency of the motion change.
The key point detection model is used for detecting key points of the target object. In practice, the model may detect the position information of only specific key points of the target object, or of a larger set of key points. Illustratively, the key point detection model detects 18 key points in total, as shown in FIG. 2: nose 0, neck 1, left shoulder 2, left elbow 3, left wrist 4, right shoulder 5, right elbow 6, right wrist 7, left hip 8, left knee 9, left ankle 10, right hip 11, right knee 12, right ankle 13, left eye 14, right eye 15, left ear 16, right ear 17. Because the regression fitting of the individual key points of a human body usually has to be correlated during training of the key point detection model, detecting the position information of more key points improves both the robustness of the key point positions required by the subsequent steps and the extensibility of the key point detection model.
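For reference, the 18 key point indices listed above can be written as an enumeration. The class name `KeyPoint` and the constant `LIMB_KEYPOINTS` are illustrative names, not part of the original text; the indices themselves follow the list above.

```python
from enum import IntEnum


class KeyPoint(IntEnum):
    """The 18 key point indices listed in the text."""
    NOSE = 0
    NECK = 1
    LEFT_SHOULDER = 2
    LEFT_ELBOW = 3
    LEFT_WRIST = 4
    RIGHT_SHOULDER = 5
    RIGHT_ELBOW = 6
    RIGHT_WRIST = 7
    LEFT_HIP = 8
    LEFT_KNEE = 9
    LEFT_ANKLE = 10
    RIGHT_HIP = 11
    RIGHT_KNEE = 12
    RIGHT_ANKLE = 13
    LEFT_EYE = 14
    RIGHT_EYE = 15
    LEFT_EAR = 16
    RIGHT_EAR = 17


# Key points 3, 6, 9 and 12, which the description uses for the limb curves.
LIMB_KEYPOINTS = (
    KeyPoint.LEFT_ELBOW,
    KeyPoint.RIGHT_ELBOW,
    KeyPoint.LEFT_KNEE,
    KeyPoint.RIGHT_KNEE,
)
```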
According to the embodiment of the invention, the music matching is carried out according to the position information of the key points, and an individualized music matching scheme can be provided without the support of intelligent hardware, wearable equipment and the like, so that the cost is low and the convenience is good.
In some embodiments, the action features include: an action frequency characteristic of the first characteristic portion. According to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
The first characteristic part refers to a part of the target object; which part it denotes can be set according to the actual situation, for example the limbs, waist, or neck of a human body or an animated character. The action frequency characteristic is a feature that reflects the action frequency of the target object, for example a moving speed or the displacement curve of a specific part. Because the action frequency can reflect the emotional state of the target object when it makes the corresponding action, matching music in the music library according to the action frequency characteristic gives a good matching effect.
The matching degree refers to the similarity between the sample to be detected and the music in the music library. The calculation method of the matching degree can be selected according to the actual situation, such as Euclidean distance, cosine similarity, and the like. Optionally, the calculation formula of the matching degree is as follows:
Figure BDA0002433762060000081
where score is the matching degree score, with values in [0, 1]; the higher the score, the higher the matching degree. $p_t$ is the sampling point at time $t$ of the normalized displacement curve of the first characteristic part; $q_t$ is the sampling point at time $t$ of the normalized frequency curve of the main melody of a piece of music in the music library; and $n$ is the number of sampling points. Normalization makes the matching values of the sample to be detected against different pieces of music easy to compare.
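The score formula is given only as an image in this text, so the sketch below does not reproduce it. It assumes one simple form that satisfies the stated properties (both curves normalized, result in [0, 1], higher means a better match): one minus the mean absolute difference between the two normalized curves. The min-max normalization and the function names are likewise assumptions.

```python
from typing import List, Sequence


def normalize(curve: Sequence[float]) -> List[float]:
    """Min-max normalize a curve to [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]


def matching_score(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed score: 1 - mean |p_t - q_t| over the n normalized samples."""
    if len(p) != len(q) or not p:
        raise ValueError("curves must be non-empty and of equal length")
    pn, qn = normalize(p), normalize(q)
    n = len(pn)
    return 1.0 - sum(abs(a - b) for a, b in zip(pn, qn)) / n
```

Euclidean distance or cosine similarity, which the text names as alternatives, could be substituted in `matching_score` without changing the rest of the flow.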
In other embodiments, there are a plurality of first characteristic parts. Screening the target music matched with the sample to be detected from a preset music library according to the action frequency characteristics of the first characteristic parts comprises: for each first characteristic part, determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part; and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Illustratively, the first characteristic parts are the limbs of the human body. Displacement values at different times are calculated from each of key points 3, 6, 9, and 12 and the central point of the human body, giving four displacement curves with different variations. Each displacement curve is matched against the music in the music library, and the music with the highest matching degree among the music matched to the four displacement curves is selected as the target music matched with the sample to be detected.
By using a plurality of first characteristic parts, the best-matching music can be found for the target object's performance in near real time, which improves the matching effect; collecting the position information of more key points makes it possible to offer the user a more personalized music matching scheme.
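A minimal sketch of this four-curve selection follows. Several details are assumptions, since the text does not fix them: the body center is taken as the mean of all detected key points, the displacement value is the Euclidean distance from the key point to that center, and the score is one minus the mean absolute difference of the min-max normalized curves.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Frame = Dict[int, Point]             # key point index -> (x, y)
LIMB_KEYPOINTS = (3, 6, 9, 12)       # indices named in the text


def body_center(frame: Frame) -> Point:
    """Assumed center: mean position of all key points in the frame."""
    xs = [p[0] for p in frame.values()]
    ys = [p[1] for p in frame.values()]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def displacement_curve(frames: Sequence[Frame], index: int) -> List[float]:
    """Distance between one key point and the body center, frame by frame."""
    curve = []
    for frame in frames:
        cx, cy = body_center(frame)
        x, y = frame[index]
        curve.append(math.hypot(x - cx, y - cy))
    return curve


def _normalize(curve: Sequence[float]) -> List[float]:
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def _score(p: Sequence[float], q: Sequence[float]) -> float:
    # Assumed score: 1 - mean absolute difference of the normalized curves.
    pn, qn = _normalize(p), _normalize(q)
    return 1.0 - sum(abs(a - b) for a, b in zip(pn, qn)) / len(pn)


def best_music(frames: Sequence[Frame],
               library: Dict[str, List[float]]) -> Tuple[str, float]:
    """Pick one candidate per limb curve, then the best candidate overall."""
    best = ("", -1.0)
    for index in LIMB_KEYPOINTS:
        curve = displacement_curve(frames, index)
        # Candidate music for this characteristic part.
        title = max(library, key=lambda t: _score(curve, library[t]))
        candidate = (title, _score(curve, library[title]))
        if candidate[1] > best[1]:
            best = candidate
    return best
```

Each library curve is assumed to have the same number of samples as the video has frames; resampling would be needed otherwise.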
Optionally, the action features include: the motion amplitude characteristic and the motion frequency characteristic of the first characteristic part. Before the target music matched with the sample to be detected is screened from a preset music library, the method further comprises the following steps: and determining the action amplitude characteristics of the target object according to the position information of the plurality of key points. According to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
The action amplitude characteristic is a feature that reflects the amplitude of the motion change of the target object, for example the displacement per unit time of a certain part of the target object (such as the limbs, waist, or neck of a human body or an animated character), or the average displacement per unit time of all key points of the target object. The specific content and calculation of the action amplitude characteristic can be set according to the actual situation. Illustratively, the action amplitude characteristic is the average displacement per unit time of all key points of the target object, calculated as follows:
Figure BDA0002433762060000091
In the formula, fea1 is the feature value of the action amplitude characteristic, with values in [0, 1]; the larger the feature value, the wider the musical range of the music that will be matched. The symbol shown in Figure BDA0002433762060000092 denotes the position of the i-th key point at time (t + 1). The symbol shown in Figure BDA0002433762060000093 denotes the absolute value of the displacement of the i-th key point over the interval [t, t + 1]. n is the number of sampling points.
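As an illustration of an amplitude feature of this kind, the sketch below averages the frame-to-frame displacement of every key point. The exact formula is only available as an image, so the scaling is an assumption: each displacement is divided by the image diagonal, which keeps the result in [0, 1]. The function name and the `width` and `height` parameters are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Frame = Dict[int, Point]             # key point index -> (x, y) in pixels


def amplitude_feature(frames: Sequence[Frame],
                      width: float, height: float) -> float:
    """Mean per-step displacement of all key points, scaled by the diagonal."""
    if len(frames) < 2:
        return 0.0
    diagonal = math.hypot(width, height)
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        # Only key points detected in both consecutive frames contribute.
        for index in prev.keys() & curr.keys():
            (x0, y0), (x1, y1) = prev[index], curr[index]
            total += math.hypot(x1 - x0, y1 - y0) / diagonal
            count += 1
    return total / count if count else 0.0
```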
The action amplitude characteristic can reflect the emotional state of the target object when it makes the corresponding action, so matching music in the music library according to the action amplitude characteristic gives a good matching effect. Using several features for music matching makes it possible to find the best-matching music for the target object's performance in near real time, which improves the matching effect; collecting the position information of more key points makes it possible to offer the user a more personalized music matching scheme.
Optionally, there are a plurality of first characteristic parts. Screening the target music matched with the sample to be detected from a preset music library according to the action characteristics comprises: for each first characteristic part, determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part; and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Illustratively, the first characteristic parts are the limbs of the human body, and the action amplitude characteristic is the displacement per unit time of a certain part of the target object (for example, the displacement per unit time of the limbs, waist, or neck of the human body, or the average displacement per unit time of all key points of the human body). Displacement values at different times are calculated from each of key points 3, 6, 9, and 12 and the central point of the human body, giving four displacement curves with different variations. Each displacement curve together with the displacement per unit time forms one group of data; the four groups of data are each matched against the music in the music library, and the music with the highest matching degree among the music matched to the four groups is selected as the target music matched with the sample to be detected.
By using a plurality of first characteristic parts together with the action amplitude characteristic, the best-matching music can be found for the target object's performance in near real time, which improves the matching effect; collecting the position information of more key points makes it possible to offer the user a more personalized music matching scheme.
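The text does not say how a displacement curve and an amplitude value are combined into one matching degree, so the following is only one possible reading. It assumes each piece of music carries a hypothetical normalized `music_range` value in [0, 1] alongside its melody curve, and it mixes curve similarity with amplitude-to-range closeness using an assumed weight of 0.5.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def _normalize(curve: Sequence[float]) -> List[float]:
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def curve_score(p: Sequence[float], q: Sequence[float]) -> float:
    # Assumed score: 1 - mean absolute difference of the normalized curves.
    pn, qn = _normalize(p), _normalize(q)
    return 1.0 - sum(abs(a - b) for a, b in zip(pn, qn)) / len(pn)


def combined_score(curve: Sequence[float], amplitude: float,
                   melody: Sequence[float], music_range: float,
                   weight: float = 0.5) -> float:
    """Assumed mix of curve similarity and amplitude/range closeness."""
    range_score = 1.0 - abs(amplitude - music_range)
    return weight * curve_score(curve, melody) + (1.0 - weight) * range_score


def select_target(
    groups: Sequence[Tuple[Sequence[float], float]],
    library: Dict[str, Tuple[Sequence[float], float]],
) -> Tuple[Optional[str], float]:
    """groups: (curve, amplitude) per part; library: title -> (melody, range)."""
    best_title, best_score = None, -1.0
    for curve, amplitude in groups:
        for title, (melody, music_range) in library.items():
            s = combined_score(curve, amplitude, melody, music_range)
            if s > best_score:
                best_title, best_score = title, s
    return best_title, best_score
```

Taking the overall maximum across all groups gives the same result as first choosing one candidate per group and then choosing the best candidate.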
Optionally, before the target music matched with the sample to be detected is screened from a preset music library, the method further includes: and determining the target music type matched with the sample to be detected. Screening the target music matched with the sample to be detected from a preset music library, wherein the screening comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
Illustratively, the music library contains 10 music types, and the target music type matched with the sample to be detected is the 2nd type. The music set of the 2nd type is screened from the music library, and the target music matched with the sample to be detected is then screened from this music set according to the action characteristics.
In this example, the calculation amount of the subsequent steps can be greatly reduced and the music matching speed can be improved by determining the target music type matched with the sample to be detected and screening the music set with the target music type from the music library to perform the subsequent music matching steps.
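The two-stage screening can be sketched as a filter followed by a best-match search over the smaller set. The `Track` record, its field names, and the pluggable `score` callable are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Track:
    title: str
    music_type: str        # hypothetical label, e.g. the main instrument
    melody: List[float]    # main-melody frequency curve


Score = Callable[[Sequence[float], Sequence[float]], float]


def screen_by_type(library: Sequence[Track], target_type: str) -> List[Track]:
    """Keep only the tracks whose type equals the target music type."""
    return [t for t in library if t.music_type == target_type]


def match_within_type(library: Sequence[Track], target_type: str,
                      curve: Sequence[float], score: Score) -> Track:
    """Match the sample only against tracks of the target music type."""
    candidates = screen_by_type(library, target_type)
    if not candidates:
        raise ValueError(f"no music of type {target_type!r} in the library")
    return max(candidates, key=lambda t: score(curve, t.melody))
```

With 10 evenly populated types, the score is evaluated on roughly one tenth of the library, which is where the reduction in calculation comes from.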
The mode of determining the target music type matched with the sample to be detected can be selectively set according to the actual situation, for example, the target music type is preset. Optionally, determining the target music type matching the sample to be detected includes: and determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
The second characteristic part refers to a part of the target object; which part it denotes can be set according to the actual situation, for example the limbs, waist, or neck of a human body or an animated character. The second characteristic part may be the same part as the first characteristic part or a different one, and when there are a plurality of first characteristic parts or second characteristic parts, the two sets may intersect. In general, the second characteristic part is mainly a part of the target object whose position changes when the target object performs various actions, for example the limbs, waist, or neck of a human body.
The part denoted by the second characteristic part and the way its local change characteristic is calculated can be set according to the actual situation. Illustratively, the second characteristic part is the limbs of the target object, and the local change characteristic is calculated by the following formula:
Figure BDA0002433762060000111
In the formula, fea2 is the characteristic value of the second characteristic part, and its value range is [0, 1].
Figure BDA0002433762060000112
indicates the position of the ith key point at time (t + 1).
Figure BDA0002433762060000113
indicates the absolute value of the displacement of the ith key point over the interval [t, t + 1]. n is the number of sampling points.
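The formula itself is an image that is not reproduced in this text, so its exact form and normalization are unknown. The following is only a minimal sketch of one way such a characteristic value could be computed from the quantities defined above. It assumes that fea2 is the mean absolute displacement of the limb key points over consecutive sampling times, divided by a normalization constant and clipped to [0, 1]. The function name `local_change_feature` and the parameter `max_disp` are hypothetical and do not come from the patent.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position of one key point


def local_change_feature(
    frames: Sequence[Sequence[Point]],
    max_disp: float,
) -> float:
    """Assumed form of fea2: mean per-step key point displacement scaled to [0, 1].

    frames[t][i] is the position of the i-th key point of the second
    characteristic part at sampling time t. max_disp is a hypothetical
    normalization constant (for example the frame diagonal in pixels).
    """
    if len(frames) < 2 or max_disp <= 0:
        return 0.0
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            # Absolute displacement of key point i over [t, t + 1].
            total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
            count += 1
    if count == 0:
        return 0.0
    # Clip so the result stays inside the stated value range [0, 1].
    return min(1.0, (total / count) / max_disp)
```

Under these assumptions, a target object that stays still yields 0, and larger limb movement pushes the value toward 1.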
Determining the target music type matched with the sample to be detected according to the local change characteristics of the second characteristic part widens the application range of the music matching method, so that a music matching scheme can be provided in real time and in a personalized manner according to the action characteristics of the target object.
Optionally, the target music type is the main instrument category of the target music, such as piano, guitar, koto, or violin. Screening by main instrument category can greatly reduce the calculation amount of subsequent steps and improve the music matching speed.
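The two-stage screening described here can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Track` record, its fields, and the `score` callback are hypothetical names, and how the matching degree is computed is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Track:
    title: str
    main_instrument: str  # hypothetical field, e.g. "piano" or "guitar"
    tempo_bpm: float      # hypothetical field used by the example score


def screen_then_match(
    library: List[Track],
    target_type: str,
    score: Callable[[Track], float],
) -> Optional[Track]:
    """Keep only tracks of the target music type, then return the best-scoring one."""
    # Stage 1: screen the music set having the target music type.
    candidates = [t for t in library if t.main_instrument == target_type]
    if not candidates:
        return None
    # Stage 2: run the (more expensive) matching only on the screened set.
    return max(candidates, key=score)
```

Because the score function is evaluated only on the screened set, the cost of the second stage scales with the number of tracks of the target type rather than with the size of the whole library.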
Fig. 3 is a schematic diagram of an application scenario of the music matching method in an alternative embodiment of the present invention. As shown in Fig. 3, an action video stream is collected and human body key point detection is performed on it to obtain the action frequency characteristics, local change characteristics, and action amplitude characteristics; music in the music library is matched according to these three kinds of characteristics to obtain the target music, which is then played.
In use, the embodiment of the invention requires neither wearable equipment or sensors nor the hardware performance of a special camera. The position information of the key points is used to acquire various information about the user, such as action frequency, beat, action amplitude, and facial expression, so background music can be matched in a more diverse and accurate way, which improves the soundtrack experience for performances such as short videos and improvised dances.
Using a plurality of characteristics for music matching avoids the poor matching effect caused by relying on a single characteristic. In addition, different dance movements and music emphasize different parts of the body, for example belly dance and kick dance; even if the beats are similar, different dance types call for different dance music. With the method of the embodiment of the invention, music can be matched according to the degree of activity of each body part.
The sample to be detected in the embodiment of the present invention may be an offline video stream. For example, after the user shoots a short video (the sample to be detected) of a pet playing, if music is to be matched to the short video, the target object is the pet in the short video. The short video itself, or a video clip of preset duration cut from it, can be used as the sample to be detected.
The sample to be detected in the embodiment of the present invention may also be a real-time video stream that is periodically collected. For example, during impromptu entertainment, a real-time video of the impromptu action (e.g., a segment of impromptu performance video captured every 5 seconds) may be used as the sample to be detected. For another example, when matching music for entertainment activities such as early education, motion-sensing dancing, impromptu performances, art evenings, and parties, a segment of video collected at a certain time interval can be used as the sample to be detected; for example, a 5-second segment is collected every 30 seconds.
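The periodic collection in the last example (a 5-second segment every 30 seconds) can be illustrated by computing the time windows that would be cut from a stream. This is a sketch only: the function name `sample_windows` is hypothetical, and placing each clip at the start of its period is an assumption, since the text does not say where in the interval the clip is taken.

```python
from typing import List, Tuple


def sample_windows(
    stream_seconds: float,
    period_seconds: float = 30.0,
    clip_seconds: float = 5.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times of the clips used as samples to be detected."""
    if period_seconds <= 0 or clip_seconds <= 0:
        raise ValueError("period_seconds and clip_seconds must be positive")
    windows = []
    start = 0.0
    # Only emit clips that fit entirely inside the available stream.
    while start + clip_seconds <= stream_seconds:
        windows.append((start, start + clip_seconds))
        start += period_seconds
    return windows
```

Each returned window would then be passed through key point detection and music matching as one sample to be detected.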
With the rise of self-media such as short videos, users typically add background music after shooting a short video to further express the feelings and content that the video is meant to convey. At present, background music is usually added manually, and finding suitable music takes a great deal of time and effort. The embodiment of the invention can automatically match suitable music to the limb actions in a short video, which saves the cost of manual matching and achieves a good matching effect.
In daily impromptu actions, or in entertainment such as early education music, motion-sensing dancing, impromptu performances, art evenings, and parties, people usually want music that corresponds to the beat of their own actions, and hope to find music that adds to the effect of those actions at any time. Through the beat, frequency, and amplitude of their own actions, they can immerse body and mind in the music, and can even control the speed of their actions according to their degree of excitement in different moods, quickly find music that fits, and reach a state of unity between person and sound. At present, matching music to limb actions and the like mainly relies on manual work or on various wearable devices that require multiple operations and measurements, which increases hardware cost; it also has to be used together with a motion-sensing device, which is inconvenient. According to the embodiment of the invention, the video stream is analyzed without any wearable equipment, and background music for the video or for the live scene is matched by combining the key points with information such as the amplitude and rhythm of the action. This reduces the complexity of manually selecting background music, overcomes the defect that traditional recognition technology only recognizes partial information such as the hands or the beat, and improves the matching effect.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for implementing the above method.
Fig. 4 is a schematic diagram of the main modules of an apparatus for music matching according to an embodiment of the present invention. As shown in Fig. 4, the apparatus 400 for music matching includes:
a key point detection module 401, configured to determine position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model;
a feature analysis module 402, configured to determine the action characteristics of the target object according to the position information of the plurality of key points;
and a music matching module 403, configured to screen target music matched with the sample to be detected from a preset music library according to the action characteristics.
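The division into three modules can be sketched as a small class that chains three callables. This is a structural illustration only: the class name, the type aliases, and the callable signatures are assumptions, and the real key point detection model and matching logic are passed in from outside rather than implemented here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Keypoints = List[List[Point]]   # keypoints[t][i]: key point i at frame t
Features = Dict[str, float]     # e.g. {"frequency": ..., "amplitude": ...}


class MusicMatchingApparatus:
    """Chains the three modules of Fig. 4 (interfaces are assumed)."""

    def __init__(
        self,
        detect_keypoints: Callable[[Sequence[object]], Keypoints],
        analyze_features: Callable[[Keypoints], Features],
        match_music: Callable[[Features], str],
    ) -> None:
        self.detect_keypoints = detect_keypoints  # role of module 401
        self.analyze_features = analyze_features  # role of module 402
        self.match_music = match_music            # role of module 403

    def run(self, sample: Sequence[object]) -> str:
        """Process one sample to be detected and return the target music."""
        keypoints = self.detect_keypoints(sample)
        features = self.analyze_features(keypoints)
        return self.match_music(features)
```

Keeping the modules as injected callables mirrors the description, in which each module can be refined independently (for example, adding amplitude characteristics to the feature analysis) without changing the others.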
Optionally, the action characteristics comprise an action frequency characteristic of a first characteristic part, and the music matching module screening target music matched with the sample to be detected from a preset music library according to the action characteristics comprises:
determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts, and the music matching module screening target music matched with the sample to be detected from a preset music library according to the action characteristics comprises:
for each first characteristic part, determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
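The two-level selection for a plurality of first characteristic parts can be sketched as follows. This section does not give the formula for the matching degree, so `matching_degree` below is a hypothetical stand-in that equals 1 when the action frequency and the beat frequency of the music coincide and decreases as they diverge. The dictionary layouts are likewise assumptions.

```python
from typing import Dict, Tuple


def matching_degree(action_freq: float, music_freq: float) -> float:
    """Hypothetical matching degree: 1 when frequencies coincide, lower otherwise."""
    return 1.0 / (1.0 + abs(action_freq - music_freq))


def select_target_music(
    part_freqs: Dict[str, float],
    library: Dict[str, float],
) -> Tuple[str, float]:
    """Pick one candidate per part, then the best candidate overall.

    part_freqs maps each first characteristic part to its action frequency;
    library maps each music title to its beat frequency (same unit).
    """
    if not part_freqs or not library:
        raise ValueError("part_freqs and library must be non-empty")
    candidates = []
    for part, freq in part_freqs.items():
        # Candidate music for this part: highest matching degree in the library.
        title = max(library, key=lambda m: matching_degree(freq, library[m]))
        candidates.append((title, matching_degree(freq, library[title])))
    # Target music: the candidate with the highest matching degree overall.
    return max(candidates, key=lambda c: c[1])
```

The same structure would apply when the matching degree also takes the action amplitude characteristic into account; only the scoring function would change.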
Optionally, the action characteristics comprise: the action amplitude characteristic and the action frequency characteristic of the first characteristic part;
the feature analysis module is further configured to: determine the action amplitude characteristic of the target object according to the position information of the plurality of key points before the music matching module screens target music matched with the sample to be detected from a preset music library;
the music matching module screening target music matched with the sample to be detected from a preset music library according to the action characteristics comprises: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts, and the music matching module screening target music matched with the sample to be detected from a preset music library according to the action characteristics comprises:
for each first feature: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the feature analysis module is further configured to: determine a target music type matched with the sample to be detected before the music matching module screens the target music matched with the sample to be detected from a preset music library;
the music matching module screening target music matched with the sample to be detected from a preset music library comprises: screening a music set having the target music type from the preset music library, and screening target music matched with the sample to be detected from the music set.
Optionally, the feature analysis module determining a target music type matched with the sample to be detected comprises:
determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
Optionally, the target music type is the main instrument category of the target music.
Optionally, the sample to be detected is an offline video stream or a periodically acquired real-time video stream.
According to a third aspect of embodiments of the present invention, there is provided an electronic device for music matching, comprising:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method provided by the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method provided by the first aspect of embodiments of the present invention.
Fig. 5 illustrates an exemplary system architecture 500 of a music matching method or a music matching apparatus to which embodiments of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and otherwise process the received data such as the music matching request, and feed back the processing result (e.g., the target music — just an example) to the terminal device.
It should be noted that the method for matching music provided by the embodiment of the present invention is generally executed by the server 505, and accordingly, the apparatus for matching music is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed in the storage section 608 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising: a key point detection module, configured to determine the position information of a plurality of key points of the target object in the sample to be detected based on a pre-trained key point detection model; a feature analysis module, configured to determine the action characteristics of the target object according to the position information of the key points; and a music matching module, configured to screen the target music matched with the sample to be detected from a preset music library according to the action characteristics. The names of these modules do not, in some cases, limit the modules themselves; for example, the music matching module may also be described as a "module for screening target music matched with the sample to be detected from a preset music library".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model; determining the action characteristics of the target object according to the position information of the key points; and screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
According to the technical scheme of the embodiment of the invention, music is matched according to the position information of the key points, so a personalized music matching scheme can be provided without the support of intelligent hardware, wearable equipment, and the like, at low cost and with good convenience.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (11)

1. A method of music matching, comprising:
determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model;
determining the action characteristics of the target object according to the position information of the key points;
and screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
2. The method of claim 1, wherein the action features comprise: an action frequency characteristic of the first characteristic part; according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps:
determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
3. The method of claim 2, wherein there are a plurality of first characteristic parts; screening target music matched with the sample to be detected from a preset music library according to the action frequency characteristics of the first characteristic part comprises:
for each first feature: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
4. The method of claim 1, wherein the action features comprise: the action amplitude characteristic and the action frequency characteristic of the first characteristic part;
before the target music matched with the sample to be detected is screened from a preset music library, the method further comprises the following steps: determining the action amplitude characteristics of the target object according to the position information of the plurality of key points;
according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
5. The method of claim 4, wherein there are a plurality of first characteristic parts; screening target music matched with the sample to be detected from a preset music library according to the action characteristics comprises:
for each first feature: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
6. The method according to any one of claims 1 to 5, wherein before the step of screening the target music matched with the sample to be detected from the preset music library, the method further comprises the following steps: determining a target music type matched with the sample to be detected;
screening the target music matched with the sample to be detected from a preset music library, wherein the screening comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
7. The method of claim 6, wherein determining the target music type that matches the sample to be tested comprises:
and determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
8. The method of claim 7, wherein the target music type is the main instrument category of the target music.
9. An apparatus for music matching, comprising:
the key point detection module is used for determining the position information of a plurality of key points of the target object in the sample to be detected based on a pre-trained key point detection model;
the characteristic analysis module is used for determining the action characteristics of the target object according to the position information of the key points;
and the music matching module is used for screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
10. An electronic device for music matching, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
CN202010245027.2A 2020-03-31 2020-03-31 Music matching method and device Active CN111782858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010245027.2A CN111782858B (en) 2020-03-31 2020-03-31 Music matching method and device

Publications (2)

Publication Number Publication Date
CN111782858A true CN111782858A (en) 2020-10-16
CN111782858B CN111782858B (en) 2024-04-05

Family

ID=72753118

Country Status (1)

Country Link
CN (1) CN111782858B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486214A (en) * 2021-07-23 2021-10-08 广州酷狗计算机科技有限公司 Music matching method and device, computer equipment and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130023343A1 (en) * 2011-07-20 2013-01-24 Brian Schmidt Studios, Llc Automatic music selection system
CN104754372A (en) * 2014-02-26 2015-07-01 苏州乐聚一堂电子科技有限公司 Beat-synchronized special effect system and beat-synchronized special effect handling method
CN105304101A (en) * 2015-10-29 2016-02-03 长沙相对音乐文化创作服务有限公司 Method for realizing matching music playing by detecting motion frequency of human body
CN109462776A (en) * 2018-11-29 2019-03-12 北京字节跳动网络技术有限公司 A kind of special video effect adding method, device, terminal device and storage medium
CN110602550A (en) * 2019-08-09 2019-12-20 咪咕动漫有限公司 Video processing method, electronic equipment and storage medium
CN110711374A (en) * 2019-10-15 2020-01-21 石家庄铁道大学 Multi-modal dance action evaluation method
CN110798737A (en) * 2019-11-29 2020-02-14 北京达佳互联信息技术有限公司 Video and audio synthesis method, terminal and storage medium
CN110852047A (en) * 2019-11-08 2020-02-28 腾讯科技(深圳)有限公司 Text score method, device and computer storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李海峰;孙佳音;张田;马琳: "Music melody discovery technology based on principles of music cognition", 信号处理 (Signal Processing), no. 10 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant