CN111782858A - Music matching method and device - Google Patents
- Publication number
- CN111782858A CN111782858A CN202010245027.2A CN202010245027A CN111782858A CN 111782858 A CN111782858 A CN 111782858A CN 202010245027 A CN202010245027 A CN 202010245027A CN 111782858 A CN111782858 A CN 111782858A
- Authority
- CN
- China
- Prior art keywords
- music
- sample
- detected
- target
- matched
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
- G06F16/636—Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
Abstract
The invention discloses a music matching method and device, and relates to the technical field of computers. One embodiment of the method comprises: determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model; determining the action characteristics of the target object according to the position information of the key points; and screening target music matched with the sample to be detected from a preset music library according to the action characteristics. This embodiment can provide a personalized music matching scheme without the support of smart hardware, wearable devices, or the like, at low cost and with good convenience.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a music matching method and device.
Background
At present, matching music to limb movements and the like mainly depends on manual work or various wearable devices, with the corresponding music matched through multiple operations and measurements. On the one hand, this matching approach increases hardware cost; on the other hand, because it must be used together with motion-sensing equipment, it lacks convenience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a music matching method and apparatus, which can provide a personalized music matching scheme without the support of smart hardware, wearable devices, and the like, and which have low cost and good convenience.
According to an aspect of an embodiment of the present invention, there is provided a music matching method, including:
determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model;
determining the action characteristics of the target object according to the position information of the key points;
and screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
Optionally, the action feature comprises: an action frequency characteristic of the first characteristic part; according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps:
determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts; according to the action frequency characteristics of the first characteristic parts, screening target music matched with the sample to be detected from a preset music library comprises the following steps:
for each first characteristic part: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristic of that first characteristic part of the sample to be detected, and taking the music with the highest matching degree as the candidate music corresponding to that first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the action feature comprises: the action amplitude characteristic and the action frequency characteristic of the first characteristic part;
before the target music matched with the sample to be detected is screened from a preset music library, the method further comprises the following steps: determining the action amplitude characteristics of the target object according to the position information of the plurality of key points;
according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts; according to the action characteristics, screening target music matched with the sample to be detected from a preset music library comprises the following steps:
for each first characteristic part: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of that first characteristic part, and taking the music with the highest matching degree as the candidate music corresponding to that first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, before the target music matched with the sample to be detected is screened from a preset music library, the method further includes: determining a target music type matched with the sample to be detected;
screening the target music matched with the sample to be detected from a preset music library, wherein the screening comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
Optionally, determining the target music type matched with the sample to be detected includes:
and determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
Optionally, the target music type is the category of the main instrument of the target music.
Optionally, the sample to be detected is an offline video stream or a periodically acquired real-time video stream.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for music matching, including:
the key point detection module is used for determining the position information of a plurality of key points of the target object in the sample to be detected based on a pre-trained key point detection model;
the characteristic analysis module is used for determining the action characteristics of the target object according to the position information of the key points;
and the music matching module is used for screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
Optionally, the action feature comprises: an action frequency characteristic of the first characteristic part; the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, and the method comprises the following steps:
determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts; the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, which comprises the following steps:
for each first characteristic part, determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the action feature comprises: the action amplitude characteristic and the action frequency characteristic of the first characteristic part;
the feature analysis module is further configured to: before the music matching module screens target music matched with the sample to be detected from a preset music library, determine the action amplitude characteristics of the target object according to the position information of the plurality of key points;
the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, and the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, there are a plurality of first characteristic parts; the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, which comprises the following steps:
for each first characteristic part: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of that first characteristic part, and taking the music with the highest matching degree as the candidate music corresponding to that first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the feature analysis module is further configured to: determining a target music type matched with the sample to be detected before the music matching module screens the target music matched with the sample to be detected from a preset music library;
the music matching module screens target music matched with the sample to be detected from a preset music library, and the method comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
Optionally, the determining, by the feature analysis module, a target music type matched with the sample to be detected includes:
and determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
Optionally, the target music type is the category of the main instrument of the target music.
Optionally, the sample to be detected is an offline video stream or a periodically acquired real-time video stream.
According to a third aspect of embodiments of the present invention, there is provided an electronic device for music matching, comprising:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method provided by the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method provided by the first aspect of embodiments of the present invention.
One embodiment of the above invention has the following advantages or benefits: by performing music matching according to the position information of key points, a personalized music matching scheme can be provided without the support of smart hardware, wearable devices, or the like, with low cost and good convenience.
Further effects of the above optional implementations will be described below in connection with specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
fig. 1 is a schematic diagram of a main flow of a music matching method of an embodiment of the present invention;
FIG. 2 is a schematic representation of human key point detection results in an alternative embodiment of the present invention;
FIG. 3 is a diagram illustrating an application scenario of a method for music matching according to an alternative embodiment of the present invention;
FIG. 4 is a schematic diagram of the main blocks of an apparatus for music matching according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
fig. 6 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
According to an aspect of an embodiment of the present invention, there is provided a method of music matching.
Fig. 1 is a schematic diagram of a main flow of a music matching method according to an embodiment of the present invention, and as shown in fig. 1, the music matching method includes:
Step S101: determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model;
Step S102: determining the action characteristics of the target object according to the position information of the plurality of key points;
Step S103: screening target music matched with the sample to be detected from a preset music library according to the action characteristics.
The target object refers to the subject in the sample to be detected. For example, if a user shoots a short video of a pet playing (the sample to be detected) and wants to match music to that video, the target object is the pet in the video. For another example, a high-definition camera in a dance practice room captures a dancer's improvised performance (the sample to be detected), and matching music is selected in real time according to the dancer's motion changes and played in the practice room; in this case, the target object is the dancer.
The action feature is a feature that can reflect dynamic information of the target object, for example, the amplitude of an action, the frequency of an action, and the like.
The key point detection model is used for detecting key points of the target object. In practical applications, only the position information of specific key points of the target object may be detected, or the position information of more key points may be detected. Illustratively, the detection model detects 18 key points in total, as shown in fig. 2: nose 0, neck 1, left shoulder 2, left elbow 3, left wrist 4, right shoulder 5, right elbow 6, right wrist 7, left hip 8, left knee 9, left ankle 10, right hip 11, right knee 12, right ankle 13, left eye 14, right eye 15, left ear 16, right ear 17. Because the regression fitting of the human-body key points is usually correlated when training the key point detection model, detecting the position information of more key points can, on the one hand, improve the robustness of the key point positions required by the subsequent steps and, on the other hand, improve the extensibility of the key point detection model.
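The key point listing above can be sketched in code. A minimal Python sketch, using the index order shown in fig. 2 (this ordering matches the common OpenPose COCO-18 convention, though the patent does not name a specific model, so treat that as an assumption):

```python
# The 18 key points listed above, in the same index order (0-17).
KEYPOINT_NAMES = [
    "nose", "neck",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_hip", "left_knee", "left_ankle",
    "right_hip", "right_knee", "right_ankle",
    "left_eye", "right_eye", "left_ear", "right_ear",
]

def keypoints_to_dict(positions):
    """Map a detector's raw output, a list of 18 (x, y) tuples indexed
    as above, to a name -> position dict for the later feature steps."""
    if len(positions) != len(KEYPOINT_NAMES):
        raise ValueError("expected 18 key points")
    return dict(zip(KEYPOINT_NAMES, positions))
```

The numeric indices (e.g. 3, 6, 9, 12 for elbows and knees) are what the displacement-curve steps below refer to.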
According to the embodiment of the invention, the music matching is carried out according to the position information of the key points, and an individualized music matching scheme can be provided without the support of intelligent hardware, wearable equipment and the like, so that the cost is low and the convenience is good.
In some embodiments, the action features include: an action frequency characteristic of the first characteristic portion. According to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
The first characteristic part refers to a portion of the target object; which portion it refers to may be set according to the actual situation, for example, the limbs, waist, or neck of a human body or an animated character. The action frequency feature is a feature that can reflect the action frequency of the target object, for example, a movement speed or the displacement curve of a specific portion. The action frequency can reflect the emotional state of the target object when making the corresponding action, so matching music in the music library according to the action frequency features yields a good matching effect.
The matching degree refers to the similarity between the sample to be detected and a piece of music in the music library. The calculation method of the matching degree can be selected according to the actual situation, for example, Euclidean distance or cosine similarity. Optionally, the matching degree is calculated as follows:
score = 1 − (1/n) · Σ_t |p_t − q_t|

where score is the matching degree, with value range [0, 1] and a higher score indicating a higher matching degree; p_t is the sampling point at time t of the normalized displacement curve of the first characteristic part; q_t is the sampling point at time t of the normalized frequency curve of the music's main melody in the music library; and n is the number of sampling points. The normalization makes the matching scores of the sample to be detected against different pieces of music comparable.
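A minimal Python sketch of this matching-degree computation. The patent text does not reproduce the formula image, so the score form 1 − mean(|p_t − q_t|) is an assumption consistent with the variable definitions above (normalized curves, score in [0, 1], higher is better):

```python
def normalize(curve):
    """Min-max normalize a curve to [0, 1]; constant curves map to 0."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]

def matching_score(displacement_curve, melody_curve):
    """Matching degree in [0, 1] between the displacement curve of the
    first characteristic part (p_t) and the main-melody frequency curve
    of a piece of music (q_t), both normalized before comparison.
    Assumed form: score = 1 - mean(|p_t - q_t|)."""
    p = normalize(displacement_curve)
    q = normalize(melody_curve)
    n = min(len(p), len(q))  # compare over the common n sampling points
    return 1.0 - sum(abs(p[t] - q[t]) for t in range(n)) / n
```

Because both curves are min-max normalized first, curves that rise and fall in step score highly even if their absolute scales differ.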
In other embodiments, there are a plurality of first characteristic parts. According to the action frequency characteristics of the first characteristic parts, screening target music matched with the sample to be detected from a preset music library comprises: for each first characteristic part, determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristic of that first characteristic part of the sample to be detected, and taking the music with the highest matching degree as the candidate music corresponding to that first characteristic part; and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Illustratively, the first characteristic parts are the limbs of the human body. Displacement values at different times are calculated from key points 3, 6, 9 and 12 relative to the central point of the human body, yielding four displacement curves with different changes. Each displacement curve is matched against the music in the music library, and finally the music with the highest matching degree among the music matched by the four displacement curves is selected as the target music matched with the sample to be detected.
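A Python sketch of building the four displacement curves. The definition of the human body's "central point" is not given in the text, so the midpoint of the two hip key points (indices 8 and 11) is assumed here:

```python
def center_point(frame):
    """Body center, assumed to be the midpoint of the left hip (8)
    and right hip (11) key points; the patent does not define it."""
    (lx, ly), (rx, ry) = frame[8], frame[11]
    return ((lx + rx) / 2.0, (ly + ry) / 2.0)

def limb_displacement_curves(frames, limb_indices=(3, 6, 9, 12)):
    """For key points 3, 6, 9, 12 (left elbow, right elbow, left knee,
    right knee), build one displacement-to-center curve per limb over
    all frames of the sample to be detected."""
    curves = {i: [] for i in limb_indices}
    for frame in frames:          # frame: list of 18 (x, y) tuples
        cx, cy = center_point(frame)
        for i in limb_indices:
            x, y = frame[i]
            curves[i].append(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5)
    return curves
```

Each of the four resulting curves can then be scored against the music library and the overall best match kept.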
By adopting a plurality of first characteristic parts, the best-matching music can be found in near real time for the action performance of the target object, improving the matching effect; by collecting the position information of more key points, a more personalized music matching scheme can conveniently be provided for the user.
Optionally, the action features include: the action amplitude characteristic and the action frequency characteristic of the first characteristic part. Before the target music matched with the sample to be detected is screened from a preset music library, the method further comprises: determining the action amplitude characteristics of the target object according to the position information of the plurality of key points. According to the action characteristics, screening target music matched with the sample to be detected from a preset music library comprises: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
The action amplitude feature is a feature that can reflect the amplitude of the target object's action changes, for example, the displacement per unit time of a certain portion of the target object, such as the limbs, waist, or neck of a human body or an animated character, or the average displacement per unit time of all key points of the target object. The specific content and calculation of the action amplitude feature can be set according to the actual situation. Illustratively, the action amplitude feature is the average displacement per unit time of all key points of the target object, calculated as follows:
in the formula, fea1 is a characteristic value of the action amplitude characteristic, and the value range is [0,1]. The larger the feature value, the wider the musical range of music will be matched.Indicating the position of the ith keypoint at time (t + 1).Indicates that the ith key point is [ t, t +1 ]]The absolute value of the displacement at that moment. n is the number of sampling points.
The action amplitude features can reflect the emotional state of the target object when making the corresponding action, so matching music in the music library according to the action amplitude features yields a good matching effect. Performing music matching with multiple features makes it possible to find the best-matching music in near real time for the action performance of the target object, improving the matching effect; by collecting the position information of more key points, a more personalized music matching scheme can conveniently be provided for the user.
Optionally, there are a plurality of first characteristic parts. According to the action characteristics, screening target music matched with the sample to be detected from a preset music library comprises: for each first characteristic part, determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of that first characteristic part, and taking the music with the highest matching degree as the candidate music corresponding to that first characteristic part; and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Illustratively, the first characteristic parts are the limbs of the human body, and the action amplitude characteristic is the displacement per unit time of a certain portion of the target object (for example, the displacement per unit time of the limbs, waist, or neck of the human body, or the average displacement per unit time of all key points of the human body). Displacement values at different times are calculated from key points 3, 6, 9 and 12 relative to the central point of the human body, yielding four displacement curves with different changes. Each displacement curve, together with the displacement per unit time, forms one group of data; the four groups of data are each matched against the music in the music library, and finally the music with the highest matching degree among the music matched by the four groups is selected as the target music matched with the sample to be detected.
By adopting a plurality of first characteristic parts together with the action amplitude features, the best-matching music can be found in near real time for the action performance of the target object, improving the matching effect; by collecting the position information of more key points, a more personalized music matching scheme can conveniently be provided for the user.
Optionally, before the target music matched with the sample to be detected is screened from a preset music library, the method further includes: and determining the target music type matched with the sample to be detected. Screening the target music matched with the sample to be detected from a preset music library, wherein the screening comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
Illustratively, the music library comprises 10 music types, and the target music type matched with the sample to be detected is the 2nd type. The music set of the 2nd type is screened from the music library, and then, based on that music set, the target music matched with the sample to be detected is screened according to the action features.
In this example, by determining the target music type matched with the sample to be detected and screening the music set of that type from the music library before performing the subsequent music matching steps, the amount of calculation in the subsequent steps can be greatly reduced and the music matching speed improved.
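A Python sketch of this two-stage screening. The music-library schema (a list of dicts with a "type" field) and the score function are illustrative assumptions, not from the patent:

```python
def match_in_type(music_library, target_type, score_fn):
    """First screen the music set of the target music type from the
    library, then screen the best-matching target music inside that
    set. `score_fn` would be the action-feature matching degree from
    the earlier steps; any callable returning a comparable score works."""
    subset = [m for m in music_library if m["type"] == target_type]
    if not subset:
        return None
    return max(subset, key=score_fn)
```

Because the (relatively expensive) matching-degree computation runs only over the screened subset, a 10-type library needs roughly a tenth of the comparisons.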
The manner of determining the target music type matched with the sample to be detected can be set according to the actual situation; for example, the target music type may be preset. Optionally, determining the target music type matched with the sample to be detected includes: determining local change features of a second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change features.
The second characteristic part refers to a part of the target object; which part it specifically denotes can be selectively set according to the actual situation, for example, a limb, the waist, or the neck of a human body or an animated character. The second characteristic part may be at the same position as a first characteristic part or at a different one, and when there are a plurality of first characteristic parts or second characteristic parts, they may intersect. In general, the second characteristic part of the target object is mainly a part whose position changes when various actions are performed, for example, a limb, the waist, or the neck of a human body.
The specific part designated as the second characteristic part, and the way its feature value is calculated, can be selectively set according to the actual situation. Illustratively, the second characteristic part is a limb of the target object, and its feature value is calculated by the following formula:

$$\mathrm{fea2} = \frac{1}{n}\sum_{i=1}^{n}\left|p_i^{t+1}-p_i^{t}\right|$$

In the formula, fea2 is the feature value of the second characteristic part, with value range [0, 1]; $p_i^{t+1}$ denotes the position of the i-th key point at time t+1; $\left|p_i^{t+1}-p_i^{t}\right|$ denotes the absolute value of the displacement of the i-th key point over the interval [t, t+1]; and n is the number of sampling points.
The target music type matched with the sample to be detected is determined according to the local change characteristics of the second characteristic part, the application range of the music matching method can be widened, and therefore the music matching scheme can be provided in real time and in a personalized mode according to the action characteristics of the target object.
Optionally, the target music type is the main instrument category of the target music, such as piano, guitar, koto, or violin. Screening by main instrument category can greatly reduce the amount of computation in subsequent steps and improve the music matching speed.
Fig. 3 is a schematic diagram of an application scenario of the music matching method in an alternative embodiment of the present invention. As shown in fig. 3, a motion video stream is collected and human body key point detection is performed to obtain action frequency characteristics, local change characteristics and action amplitude characteristics; music matching against the music library is performed according to these three characteristics to obtain the target music, which is then played.
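The Fig. 3 flow can be summarized as a short orchestration sketch. Here `detect_keypoints`, `extract_features`, and `score` are placeholders standing in for the patent's key-point model, feature analysis, and matching-degree computation; none of these names come from the source.

```python
# Illustrative end-to-end sketch of the Fig. 3 pipeline: per-frame key-point
# detection, feature extraction, then library-wide matching-degree ranking.
def pick_background_music(frames, detect_keypoints, extract_features, library, score):
    keypoints = [detect_keypoints(f) for f in frames]   # key points per frame
    features = extract_features(keypoints)              # frequency / local change / amplitude
    return max(library, key=lambda music: score(features, music))
```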
The embodiment of the invention requires no wearable devices or sensors in use and does not depend on the hardware performance of a special camera. The position information of the key points is used to obtain various information such as the user's action frequency, beat, action amplitude and facial expression, so background music can be matched more diversely and accurately, improving the soundtrack experience of performances such as short videos and impromptu dances.
Using a plurality of characteristics for music matching avoids the poor matching effect that results from relying on a single characteristic. In addition, different body parts dominate different dance styles and music, such as belly dance and tap dance; even when the beats are similar, different dance types call for different music. With the method of the embodiment of the invention, music can be matched according to the activity level of each body part.
The sample to be detected in the embodiment of the present invention may be an offline video stream. For example, after the user takes a small video (sample to be detected) about pet entertainment, if music is to be matched for the small video, the target object is a pet in the small video. The small video or a video clip with a preset duration cut from the small video can be used as the sample to be detected.
The sample to be detected in the embodiment of the present invention may also be a periodically collected real-time video stream. For example, during impromptu performance, a real-time video of the impromptu actions (e.g., a segment of performance video captured every 5 seconds) may serve as the sample to be detected. For another example, when matching music for early education, somatosensory dancing, impromptu performance, evening galas, parties and other entertainment activities, a video segment collected at a fixed interval can be used as the sample to be detected, for example a 5-second segment collected every 30 seconds.
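The periodic sampling scheme above ("a 5-second segment every 30 seconds") can be sketched as a small helper. The function name and its defaults are illustrative only; the patent does not prescribe an implementation.

```python
# Hypothetical helper for periodic clip sampling: return the (start, end)
# windows of clip_len-second samples taken every period seconds from a
# stream of the given duration (all values in seconds).
def clip_windows(duration, clip_len=5.0, period=30.0):
    windows, start = [], 0.0
    while start + clip_len <= duration:
        windows.append((start, start + clip_len))
        start += period
    return windows
```

Each returned window would then be cut from the stream and used as one sample to be detected.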
With the rise of self-media activities such as short videos, users commonly add background music to a short video after shooting it, to better convey the feeling and content the video is meant to express. At present such background music is usually added manually, and seeking suitable music consumes considerable time and effort. The embodiment of the invention can automatically match suitable music to the body movements in a short video, saving the cost of manual matching while achieving a good matching effect.
In daily impromptu actions, or in entertainment such as early-education music, somatosensory dancing, impromptu performance, evening galas and parties, people often want music that corresponds to their own beat, hoping to find at any moment music that keeps time with their actions. Through the beat, frequency and amplitude of their actions, and even by letting the excitement of different moods control the speed of their actions, a performer can merge body and mind with the music and quickly reach a state of human-music integration. At present, matching music to limb movements mainly relies on manual work or various wearable devices, which require multiple operations and measurements, increase hardware cost, still need to be used together with somatosensory equipment, and lack convenience. The embodiment of the invention analyzes the video stream without any wearable equipment, matches background music for the video or the live scene by combining key points with information such as action amplitude and rhythm, reduces the tedium of manually selecting background music, overcomes the limitation of traditional recognition technologies that identify only partial information such as hands or beats, and improves the matching effect.
According to a second aspect of the embodiments of the present invention, there is provided an apparatus for implementing the above method.
Fig. 4 is a schematic diagram of main blocks of an apparatus for music matching according to an embodiment of the present invention, and as shown in fig. 4, the apparatus 400 for music matching includes:
the key point detection module 401 is configured to determine position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model;
the feature analysis module 402 is configured to determine the action features of the target object according to the position information of the plurality of key points;
and the music matching module 403 is configured to screen target music matched with the sample to be detected from a preset music library according to the motion characteristics.
Optionally, the action feature comprises: an action frequency characteristic of the first characteristic part; the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, and the method comprises the following steps:
determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the number of the first features is plural; the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, and the method comprises the following steps:
for each first characteristic part, determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
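The multi-part candidate scheme above can be sketched as follows. This is an illustrative reading of the described steps, with hypothetical names: for each first characteristic part, the best-matching music becomes that part's candidate, and the candidate with the highest matching degree overall becomes the target music.

```python
# Sketch of per-part candidate selection followed by a global best pick.
def match_multi_part(features_by_part, library, score):
    best, best_score = None, float("-inf")
    for part, feats in features_by_part.items():
        candidate = max(library, key=lambda m: score(feats, m))  # this part's candidate
        s = score(feats, candidate)
        if s > best_score:                                       # keep the overall best
            best, best_score = candidate, s
    return best
```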
Optionally, the action feature comprises: the action amplitude characteristic and the action frequency characteristic of the first characteristic part;
the feature analysis module is further to: before the music matching module screens target music matched with the sample to be detected from a preset music library, determining the action amplitude characteristics of a target object according to the position information of the key points;
the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, and the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the number of the first features is plural; the music matching module screens target music matched with the sample to be detected from a preset music library according to the action characteristics, and the method comprises the following steps:
for each first feature: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
Optionally, the feature analysis module is further configured to: determining a target music type matched with the sample to be detected before the music matching module screens the target music matched with the sample to be detected from a preset music library;
the music matching module screens target music matched with the sample to be detected from a preset music library, and the method comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
Optionally, the determining, by the feature analysis module, a target music type matched with the sample to be detected includes:
and determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
Optionally, the target music type is a kind of a main instrument of the target music.
Optionally, the sample to be detected is an offline video stream or a periodically acquired real-time video stream.
According to a third aspect of embodiments of the present invention, there is provided an electronic device for music matching, comprising:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method provided by the first aspect of the embodiments of the present invention.
According to a fourth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method provided by the first aspect of embodiments of the present invention.
Fig. 5 illustrates an exemplary system architecture 500 of a music matching method or a music matching apparatus to which embodiments of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 serves to provide a medium for communication links between the terminal devices 501, 502, 503 and the server 505. Network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 501, 502, 503 to interact with a server 505 over a network 504 to receive or send messages or the like. The terminal devices 501, 502, 503 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (for example only) providing support for shopping websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and otherwise process the received data such as the music matching request, and feed back the processing result (e.g., the target music — just an example) to the terminal device.
It should be noted that the method for matching music provided by the embodiment of the present invention is generally executed by the server 505, and accordingly, the apparatus for matching music is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks, and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for use with a terminal device implementing an embodiment of the invention is shown. The terminal device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU)601 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the internet. The driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 610 as necessary, so that a computer program read out therefrom is mounted in the storage section 608 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 601.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor comprising: the key point detection module is used for determining the position information of a plurality of key points of the target object in the sample to be detected based on a pre-trained key point detection model; the characteristic analysis module is used for determining the action characteristics of the target object according to the position information of the key points; and the music matching module is used for screening the target music matched with the sample to be detected from a preset music library according to the action characteristics. The names of these modules do not limit the module itself in some cases, for example, the key point detection module may also be described as a "module for screening target music matching the sample to be detected from a preset music library".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments; or may be separate and not incorporated into the device. The computer readable medium carries one or more programs which, when executed by a device, cause the device to comprise: determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model; determining the action characteristics of the target object according to the position information of the key points; and screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
According to the technical scheme of the embodiment of the invention, the personalized music collocation scheme can be provided by matching the music according to the position information of the key point without the support of intelligent hardware, wearable equipment and the like, and the invention has low cost and good convenience.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (11)
1. A method of music matching, comprising:
determining position information of a plurality of key points of a target object in a sample to be detected based on a pre-trained key point detection model;
determining the action characteristics of the target object according to the position information of the key points;
and screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
2. The method of claim 1, wherein the action features comprise: an action frequency characteristic of the first characteristic part; according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps:
determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
3. The method of claim 2, wherein the number of first features is plural; according to the action frequency characteristics of the first characteristic part, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps:
for each first feature: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action frequency characteristics of the first characteristic part of the sample to be detected, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
4. The method of claim 1, wherein the action features comprise: the action amplitude characteristic and the action frequency characteristic of the first characteristic part;
before the target music matched with the sample to be detected is screened from a preset music library, the method further comprises the following steps: determining the action amplitude characteristics of the target object according to the position information of the plurality of key points;
according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part; and taking the music with the highest matching degree as the target music matched with the sample to be detected.
5. The method of claim 4, wherein the number of first features is plural; according to the action characteristics, screening target music matched with the sample to be detected from a preset music library, wherein the method comprises the following steps:
for each first feature: determining the matching degree between the sample to be detected and each piece of music in the music library according to the action amplitude characteristic and the action frequency characteristic of the first characteristic part, and taking the music with the highest matching degree as candidate music corresponding to the first characteristic part;
and taking the candidate music with the highest matching degree as the target music matched with the sample to be detected.
6. The method according to any one of claims 1 to 5, wherein before the step of screening the target music matched with the sample to be detected from the preset music library, the method further comprises the following steps: determining a target music type matched with the sample to be detected;
screening the target music matched with the sample to be detected from a preset music library, wherein the screening comprises the following steps: and screening a music set with the target music type from a preset music library, and screening target music matched with the sample to be detected from the music set.
7. The method of claim 6, wherein determining the target music type that matches the sample to be tested comprises:
and determining the local change characteristics of the second characteristic part of the target object according to the position information of the plurality of key points, and determining the target music type matched with the sample to be detected according to the local change characteristics.
8. The method of claim 7, wherein the target music type is a dominant instrument category of the target music.
9. An apparatus for music matching, comprising:
the key point detection module is used for determining the position information of a plurality of key points of the target object in the sample to be detected based on a pre-trained key point detection model;
the characteristic analysis module is used for determining the action characteristics of the target object according to the position information of the key points;
and the music matching module is used for screening the target music matched with the sample to be detected from a preset music library according to the action characteristics.
10. An electronic device for music matching, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010245027.2A CN111782858B (en) | 2020-03-31 | 2020-03-31 | Music matching method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111782858A true CN111782858A (en) | 2020-10-16 |
CN111782858B CN111782858B (en) | 2024-04-05 |
Family
ID=72753118
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010245027.2A Active CN111782858B (en) | 2020-03-31 | 2020-03-31 | Music matching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111782858B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486214A (en) * | 2021-07-23 | 2021-10-08 | 广州酷狗计算机科技有限公司 | Music matching method and device, computer equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130023343A1 (en) * | 2011-07-20 | 2013-01-24 | Brian Schmidt Studios, Llc | Automatic music selection system |
CN104754372A (en) * | 2014-02-26 | 2015-07-01 | 苏州乐聚一堂电子科技有限公司 | Beat-synchronized special effect system and beat-synchronized special effect handling method |
CN105304101A (en) * | 2015-10-29 | 2016-02-03 | 长沙相对音乐文化创作服务有限公司 | Method for realizing matching music playing by detecting motion frequency of human body |
CN109462776A (en) * | 2018-11-29 | 2019-03-12 | 北京字节跳动网络技术有限公司 | A kind of special video effect adding method, device, terminal device and storage medium |
CN110602550A (en) * | 2019-08-09 | 2019-12-20 | 咪咕动漫有限公司 | Video processing method, electronic equipment and storage medium |
CN110711374A (en) * | 2019-10-15 | 2020-01-21 | 石家庄铁道大学 | Multi-modal dance action evaluation method |
CN110798737A (en) * | 2019-11-29 | 2020-02-14 | 北京达佳互联信息技术有限公司 | Video and audio synthesis method, terminal and storage medium |
CN110852047A (en) * | 2019-11-08 | 2020-02-28 | 腾讯科技(深圳)有限公司 | Text score method, device and computer storage medium |
Non-Patent Citations (1)
Title |
---|
LI Haifeng; SUN Jiayin; ZHANG Tian; MA Lin: "Melody Discovery Technology Based on Principles of Music Cognition", Journal of Signal Processing (信号处理), no. 10 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486214A (en) * | 2021-07-23 | 2021-10-08 | Guangzhou Kugou Computer Technology Co., Ltd. | Music matching method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111782858B (en) | 2024-04-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109543064B (en) | Lyric display processing method and device, electronic equipment and computer storage medium | |
CN109377539B (en) | Method and apparatus for generating animation | |
EP3803846B1 (en) | Autonomous generation of melody | |
US11670188B2 (en) | Method and apparatus for an adaptive and interactive teaching of playing a musical instrument | |
US20150373455A1 (en) | Presenting and creating audiolinks | |
US20150161908A1 (en) | Method and apparatus for providing sensory information related to music | |
US11314475B2 (en) | Customizing content delivery through cognitive analysis | |
US20110169603A1 (en) | Distinguishing between user physical exertion biometric feedback and user emotional interest in a media stream | |
CN112365876B (en) | Method, device and equipment for training speech synthesis model and storage medium | |
CN108885639A | Content collection navigation and autoforwarding | |
US20150379774A1 (en) | System and method for dynamically generating contextual and personalized digital content | |
US20240054911A2 (en) | Crowd-based device configuration selection of a music teaching system | |
CN107316641B (en) | Voice control method and electronic equipment | |
US9436756B2 (en) | Media system for generating playlist of multimedia files | |
US11511200B2 (en) | Game playing method and system based on a multimedia file | |
US11341945B2 (en) | Techniques for learning effective musical features for generative and retrieval-based applications | |
CN111027419B (en) | Method, device, equipment and medium for detecting video irrelevant content | |
CN112235635B (en) | Animation display method, animation display device, electronic equipment and storage medium | |
JP6535497B2 (en) | Music recommendation system, program and music recommendation method | |
CA3189604A1 (en) | Dance segment recognition method, dance segment recognition apparatus, and storage medium | |
CN110209658B (en) | Data cleaning method and device | |
CN112153460A (en) | Video dubbing method and device, electronic equipment and storage medium | |
Wang et al. | Clustering-based emotion recognition micro-service cloud framework for mobile computing | |
CN106530377B (en) | Method and apparatus for manipulating three-dimensional animated characters | |
CN111782858B (en) | Music matching method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |