CN114283850A - Music beat detection method and device and electronic equipment - Google Patents


Info

Publication number
CN114283850A
Authority
CN
China
Legal status
Pending
Application number
CN202111670399.0A
Other languages
Chinese (zh)
Inventor
李治均
Current Assignee
Shenzhen Lianzhou International Technology Co Ltd
Original Assignee
Shenzhen Lianzhou International Technology Co Ltd
Application filed by Shenzhen Lianzhou International Technology Co Ltd
Priority to CN202111670399.0A
Publication of CN114283850A

Landscapes

  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The application provides a music beat detection method, a music beat detection device, and electronic equipment. The method comprises the following steps: acquiring target audio data; establishing a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points having hidden states, the audio points are obtained by splitting each BPM according to the duration of the target audio data, and each hidden state is either a beat-point state or a non-beat-point state; and determining, at least according to the state space, whether the target audio data is a beat point. The method takes into account that musical rhythm follows a repetitive pattern: it establishes a discrete state space comprising the BPMs and the audio points under each BPM and determines from this state space whether the target audio data is a beat point, so that beat-point and non-beat-point states are evaluated together and the accuracy of the result is improved.

Description

Music beat detection method and device and electronic equipment
Technical Field
The present application relates to the field of music information analysis, and in particular to a music beat detection method and apparatus, a computer-readable storage medium, a processor, and an electronic device.
Background
Rhythm and tone are two basic components of music. Automatic beat tracking is an important technique in music information analysis and can be applied to tasks such as music plagiarism detection. Currently, mainstream music beat detection methods only consider energy changes within a small range; for example, patent CN107103917A uses the energy peak across three adjacent frames to determine whether a frame is a beat point. Such methods consider only the energy distribution of beat points: a beat point usually does have higher energy than its neighboring time points, but a point with higher energy is not necessarily a beat point, so the accuracy of beat detection is low.
The above information disclosed in this background section is only for enhancement of understanding of the background of the technology described herein and, therefore, certain information may be included in the background that does not form the prior art that is already known in this country to a person of ordinary skill in the art.
Disclosure of Invention
The main purpose of the present application is to provide a music beat detection method and apparatus, a computer-readable storage medium, a processor, and an electronic device, so as to solve the prior-art problem of low detection accuracy caused by deciding whether a point is a beat point according to energy distribution alone.
According to one aspect of the embodiments of the present invention, a music beat detection method is provided, including: acquiring target audio data; establishing a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points for each BPM, each audio point has a hidden state, the audio points are obtained by splitting each BPM according to the duration of the target audio data, and the hidden state is either a beat-point state or a non-beat-point state; and determining, at least according to the state space, whether the target audio data is a beat point.
Optionally, establishing a state space according to the target audio data includes: obtaining a plurality of BPMs; determining a first quantity fps according to $\mathrm{fps} = T_1 / T_2$, where $T_1 = 1$ second and $T_2$ is the duration of the target audio data; determining the number $M$ of audio points of each BPM according to $M = 60 \cdot \mathrm{fps} / \mathrm{BPM}$, taken as an integer; and splitting each BPM according to the number of audio points to obtain the plurality of audio points.
Optionally, determining, at least according to the state space, whether the target audio data is a beat point includes: determining a plurality of first probabilities according to the state space, wherein a first probability is the probability that two audio points are adjacent in time; determining an amplitude sum, which is the sum of the amplitudes of the target audio data within a preset frequency band; determining a plurality of second probabilities according to the state space and the amplitude sum, wherein a second probability is the probability that an audio point corresponding to the target audio data has the corresponding hidden state; and determining, according to the first probabilities and the second probabilities, whether the target audio data is a beat point and the BPM of the target audio data.
Optionally, determining a plurality of first probabilities according to the state space includes: determining each first probability according to a formula that appears only as an image in the original publication and is not reproduced here, where $\omega_t$ is the BPM of the audio point at time $t$, $\omega_{t-1}$ is the BPM of the audio point at time $t-1$, and a further symbol (also not reproduced) denotes the audio point at time $t$.
Optionally, determining a plurality of second probabilities according to the state space and the amplitude sum includes: determining each second probability according to a formula that appears only as an image in the original publication, where $i$ is the hidden state and takes the value 0 or 1; $y_0$ (i = 0) indicates that the hidden state of the audio point is a non-beat-point state, $y_1$ (i = 1) indicates that it is the beat-point state, and $S(t)$ is the amplitude sum.
Optionally, determining, according to the first probabilities and the second probabilities, whether the target audio data is a beat point and the BPM of the target audio data includes: determining $Y_t$ according to a formula that appears only as an image in the original publication; the hidden state of the audio point corresponding to $Y_t$ is taken as the hidden state of the target audio data, and the BPM corresponding to $Y_t$ is taken as the BPM of the target audio data, where $P(S_0)$ is the preset probability value at $t = 0$ and $x_{1:T}$ are the time-domain values of the target audio data.
Optionally, determining the amplitude sum according to the target audio data includes: determining a time-domain signal of the target audio data; performing a Fourier transform on the time-domain signal to obtain a frequency-domain signal; and determining the amplitude sum according to the frequency-domain signal and the preset frequency band.
Optionally, there are a plurality of pieces of target audio data, among which first, second, and third target audio data are three beat points adjacent in time order. After determining, at least according to the state space, whether the target audio data is a beat point, the method further includes: acquiring a first time interval between the first target audio data and the second target audio data; acquiring a second time interval between the second target audio data and the third target audio data; determining the accuracy with which the third target audio data is a beat point according to a formula that appears only as an image in the original publication, where $\mu_t = 0.9\mu_{t-1} + 0.1X_t$, $\mu_{t-1} = 0.9\mu_{t-2} + 0.1X_{t-1}$, $X_{t-1}$ is the first time interval, and $X_t$ is the second time interval; and, when the accuracy is less than or equal to a predetermined value, determining that the third target audio data is not a beat point.
According to another aspect of the embodiments of the present invention, a music beat detection apparatus is further provided, including: a first acquisition unit, configured to acquire target audio data; an establishing unit, configured to establish a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points for each BPM, each audio point has a hidden state, the audio points are obtained by splitting each BPM according to the duration of the target audio data, and the hidden state is either a beat-point state or a non-beat-point state; and a first determining unit, configured to determine, at least according to the state space, whether the target audio data is a beat point.
According to another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided, including a stored program, wherein the program, when run, performs any one of the above methods.
According to another aspect of the embodiments of the present invention, a processor is further provided, configured to run a program, wherein the program, when run, performs any one of the above methods.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including: one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing any of the methods described herein.
In the embodiments of the invention, target audio data is first acquired; a state space is then established according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points having hidden states, the audio points are obtained by splitting each BPM according to the duration of the target audio data, and each hidden state is either a beat-point state or a non-beat-point state; finally, whether the target audio data is a beat point is determined at least according to the state space. The prior art judges whether target audio data is a beat point only from the energy perspective, which leads to low beat detection accuracy. The present method instead exploits the repetitive regularity of musical rhythm: it establishes a discrete state space comprising the BPMs and the audio points under each BPM and determines from this state space whether the target audio data is a beat point, so that beat-point and non-beat-point states are evaluated jointly and the results are more accurate.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application. In the drawings:
Fig. 1 shows a schematic flowchart of a music beat detection method according to an embodiment of the present application;
Fig. 2 shows a schematic structural diagram of a music beat detection apparatus according to an embodiment of the present application;
Fig. 3 shows a schematic diagram of a state space according to an embodiment of the present application;
Fig. 4 shows a flowchart of a music beat detection method according to an embodiment of the present application.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
It will be understood that when an element such as a layer, film, region, or substrate is referred to as being "on" another element, it can be directly on the other element or intervening elements may also be present. Also, in the specification and claims, when an element is described as being "connected" to another element, the element may be "directly connected" to the other element or "connected" to the other element through a third element.
For convenience of description, some terms or expressions referred to in the embodiments of the present application are explained below:
Beats per minute (BPM): the number of beats that occur within one minute.
As described in the background, determining beat points according to energy distribution alone in the prior art results in low detection accuracy. To solve this problem, exemplary embodiments of the present application provide a music beat detection method, a detection apparatus, a computer-readable storage medium, a processor, and an electronic device.
According to an embodiment of the present application, a music beat detection method is provided.
Fig. 1 is a flowchart of a method of detecting a beat of music according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
step S101, acquiring target audio data;
step S102, establishing a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points for each BPM, each audio point has a hidden state, the audio points are obtained by splitting each BPM according to the duration of the target audio data, and the hidden state is either a beat-point state or a non-beat-point state;
step S103, determining whether the target audio data is the beat point at least according to the state space.
In this method, target audio data is first acquired; a state space comprising a plurality of BPMs and, for each BPM, a plurality of audio points with hidden states (beat-point or non-beat-point) is then established, the audio points being obtained by splitting each BPM according to the duration of the target audio data; finally, whether the target audio data is a beat point is determined at least according to the state space. Unlike prior-art methods that judge beat points only from the energy perspective and therefore have low accuracy, the method exploits the repetitive regularity of musical rhythm, so that beat-point and non-beat-point states are evaluated jointly in one discrete state space and the detection accuracy is improved.
It should be noted that the above method can be applied both to real-time audio data and to a complete song. When it is applied to real-time audio data, the target audio data is the real-time audio data; when it is applied to a complete song, the target audio data is each piece of audio data obtained by splitting the song.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
In an embodiment of the present application, establishing a state space according to the target audio data includes: acquiring a plurality of BPMs; determining a first quantity fps according to $\mathrm{fps} = T_1 / T_2$, where $T_1 = 1$ second and $T_2$ is the duration of the target audio data; determining the number $M$ of audio points of each BPM according to $M = 60 \cdot \mathrm{fps} / \mathrm{BPM}$, taken as an integer; and splitting each BPM according to the number of audio points to obtain the plurality of audio points. Each audio point has a hidden state indicating whether it is a beat point under the corresponding BPM. Consecutive beat points are separated by a time interval that can be calculated from the BPM, so there are multiple non-beat points between them. To keep the time resolution uniform across the different states (i.e., different BPMs), this embodiment calculates the number of audio points (i.e., the number of hidden states) from the BPM with this formula.
Specifically, according to statistics on the BPMs of common songs, the BPM can range from 55 to 215, giving 161 BPM values. When the duration $T_2$ of the target audio data is set so that the first quantity fps is 100 (i.e., $T_2$ = 10 ms), those skilled in the art can still set the duration of the target audio data according to actual requirements. Because different BPMs correspond to different beat intervals, each BPM is given its own number of audio points (i.e., its own resolution), so that state transitions under different BPMs can be tracked better.
For example, when the BPM is 55, the above formula gives 109 audio points in this state (60 × 100 / 55 ≈ 109.1, taken as 109). Assuming audio point 1 is the beat point and the rest are non-beat points, there are 108 non-beat points between consecutive beat points as long as the BPM does not change. The state space is shown in Fig. 3, where each black dot represents an element of the state space.
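The state-space construction can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: it assumes fps = T1/T2 and M = 60·fps/BPM truncated to an integer (truncation reproduces the 109 points quoted for BPM 55, but the source does not state the rounding rule), and the names `frames_per_second`, `points_per_bpm`, `State` and `build_state_space` are hypothetical.

```python
from dataclasses import dataclass


def frames_per_second(t2_seconds: float, t1_seconds: float = 1.0) -> float:
    """First quantity fps = T1 / T2 (T1 = 1 s, T2 = duration of one piece of audio)."""
    return t1_seconds / t2_seconds


def points_per_bpm(bpm: int, fps: float) -> int:
    """Number M of audio points in one beat period (truncation is an assumption)."""
    return int(60 * fps / bpm)


@dataclass(frozen=True)
class State:
    bpm: int       # tempo dimension of the state space
    position: int  # audio point within one beat period, 1..M

    @property
    def is_beat(self) -> bool:
        # Convention from the text: audio point 1 is the beat point.
        return self.position == 1


def build_state_space(bpms, fps):
    """Enumerate every (BPM, audio point) pair of the discrete state space."""
    return [
        State(bpm, pos)
        for bpm in bpms
        for pos in range(1, points_per_bpm(bpm, fps) + 1)
    ]


fps = frames_per_second(0.01)             # 10 ms pieces -> 100 per second
states = build_state_space(range(55, 216), fps)
print(points_per_bpm(55, fps))            # 109, as in the example above
print(sum(s.is_beat for s in states))     # 161: one beat state per BPM value
```

Faster tempos get fewer audio points (27 at 215 BPM under this rounding), which is what keeps the time step of every state equal to 1/fps.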
In another embodiment of the present application, determining, at least according to the state space, whether the target audio data is a beat point includes: determining a plurality of first probabilities according to the state space, wherein a first probability is the probability that two audio points are adjacent in time; determining an amplitude sum, which is the sum of the amplitudes of the target audio data within a preset frequency band; determining a plurality of second probabilities according to the state space and the amplitude sum, wherein a second probability is the probability that an audio point corresponding to the target audio data has the corresponding hidden state; and determining, according to the first probabilities and the second probabilities, whether the target audio data is a beat point and the BPM of the target audio data. In practice, the states between beat points generally have only one possible transition, namely advancing step by step toward the next beat point under the current BPM. This embodiment therefore determines the probability that any two audio points are adjacent in time, determines from the amplitude the probability that each audio point has a given hidden state, and finally determines from both probabilities whether the target audio data is a beat point and what its BPM is. The decision thus draws on the periodicity of the beat and on the pattern of change between beat points, which further improves the accuracy of beat detection.
In another embodiment of the present application, determining a plurality of first probabilities according to the state space includes: determining each first probability according to a formula that appears only as an image in the original publication and is not reproduced here, where $\omega_t$ is the BPM of the audio point at time $t$, $\omega_{t-1}$ is the BPM of the audio point at time $t-1$, and a further symbol (also not reproduced) denotes the audio point at time $t$. In this embodiment, each first probability is obtained from this formula, which allows unreasonable audio points to be excluded later on the basis of the first probabilities, making the detection result more accurate and further improving the accuracy of beat detection.
In a specific embodiment, the first probability may be obtained by training on music data: the beat points of the music data are labeled 1 according to the BPM, the points between beats take values between 0 and 1 according to their temporal progress, and training is performed on these labeled values.
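Since the first-probability formula is not reproduced in this text, the following Python sketch uses an assumed, hand-set transition model rather than the trained or published one: the audio point advances by one position per frame within a beat period, and only on returning to the beat point may the BPM move to an adjacent value, with a total probability `p_change` (a hypothetical parameter). `transition_probs` is likewise a hypothetical name.

```python
def points_per_bpm(bpm: int, fps: float) -> int:
    """Audio points per beat period, M = 60 * fps / BPM truncated (assumption)."""
    return int(60 * fps / bpm)


def transition_probs(bpm, position, bpms, fps=100, p_change=0.02):
    """Hypothetical first probabilities P(next state | current state).

    Returns a dict {(next_bpm, next_position): probability}. Inside a beat
    period the audio point advances deterministically; at the last point of
    the period the next frame is a beat point, possibly at an adjacent BPM.
    """
    m = points_per_bpm(bpm, fps)
    if position < m:
        return {(bpm, position + 1): 1.0}
    ordered = sorted(bpms)
    k = ordered.index(bpm)
    neighbours = [ordered[n] for n in (k - 1, k + 1) if 0 <= n < len(ordered)]
    if not neighbours:                      # single-tempo space: always stay
        return {(bpm, 1): 1.0}
    probs = {(bpm, 1): 1.0 - p_change}      # keep the tempo
    for nb in neighbours:                   # or switch to an adjacent BPM
        probs[(nb, 1)] = p_change / len(neighbours)
    return probs


bpms = range(55, 216)
print(transition_probs(120, 1, bpms))   # {(120, 2): 1.0}
print(transition_probs(120, 50, bpms))  # next beat at 120, 119 or 121 BPM
```

The deterministic advance encodes the observation that states between beats have only one possible transition; allowing tempo changes only at beat points keeps the transition table sparse.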
To further improve the accuracy of beat detection, in another embodiment of the present application, determining a plurality of second probabilities according to the state space and the amplitude sum includes: determining each second probability according to a formula that appears only as an image in the original publication, where $i$ is the hidden state and takes the value 0 or 1; $y_0$ (i = 0) indicates that the hidden state of the audio point is not the beat point, $y_1$ (i = 1) indicates that it is the beat point, and $S(t)$ is the amplitude sum. $y_i$ represents the state actually perceived by a listener, i.e., whether or not a point is a beat point, which differs from the hidden state in the state space: a listener perceives only two states, beat point and non-beat point, whereas the number of hidden states varies with the BPM. For example, when the BPM is 55 there are 109 hidden states, of which hidden state 1 is defined as the beat point and the remaining hidden states as non-beat points.
In a specific embodiment, the second probability may also be obtained by training on music data: each frame of the training audio (at the frame rate fps) is labeled with its state, and training is performed on these labels.
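As a stand-in for a trained second-probability model, the sketch below uses a fixed, hypothetical likelihood: the amplitude sums S(t) of one piece are scaled into [0, 1], the beat likelihood y_1 grows with the scaled value, and the non-beat likelihood y_0 shrinks with it, divided by an assumed constant `lam`. None of these choices come from the source; `normalise` and `observation_probs` are hypothetical names.

```python
def normalise(amplitude_sums):
    """Scale the amplitude sums S(t) of one piece into [0, 1] (assumed preprocessing)."""
    peak = max(amplitude_sums)
    return [s / peak if peak > 0 else 0.0 for s in amplitude_sums]


def observation_probs(s_norm, lam=16.0, eps=1e-12):
    """Hypothetical second probabilities (y_0, y_1) for one scaled amplitude sum.

    y_1: likelihood under the beat hidden state, grows with the amplitude sum.
    y_0: likelihood under a non-beat hidden state, shrinks with it; dividing by
         (lam - 1) spreads the non-beat mass over the many non-beat points.
    eps keeps both values strictly positive so logarithms stay finite.
    """
    s = min(max(s_norm, 0.0), 1.0)
    y1 = max(s, eps)
    y0 = max((1.0 - s) / (lam - 1.0), eps)
    return y0, y1


scaled = normalise([2.0, 8.0, 4.0])   # [0.25, 1.0, 0.5]
print([observation_probs(s) for s in scaled])
```

Because every non-beat audio point shares y_0, a trained model would only need to output these two values per frame, regardless of how many hidden states the current BPM has.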
In yet another embodiment of the present application, determining, according to the first probabilities and the second probabilities, whether the target audio data is a beat point and the BPM of the target audio data includes: determining $Y_t$ according to a formula that appears only as an image in the original publication; the hidden state of the audio point corresponding to $Y_t$ is taken as the hidden state of the target audio data, and the BPM corresponding to $Y_t$ is taken as the BPM of the target audio data, where $P(S_0)$ is the preset probability value at $t = 0$ and $x_{1:T}$ are the time-domain values of the target audio data. Because unreasonable audio points have been excluded when determining the first and second probabilities, these more accurate probabilities yield a more accurate detection result, further improving the accuracy of beat detection.
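The $Y_t$ formula is not reproduced here, but its ingredients (an initial probability $P(S_0)$, transition and observation probabilities, and a decoded state sequence) appear consistent with standard hidden-Markov-model decoding. The following sketch therefore assumes Viterbi decoding over the (BPM, audio point) state space, with a uniform $P(S_0)$, an assumed transition rule (advance within a beat period, tempo changes only at a beat) and an assumed amplitude-based likelihood. `viterbi_beats` and all of its parameters are hypothetical; the input is a list of amplitude sums already scaled into [0, 1].

```python
import math


def viterbi_beats(obs, bpms, fps=100, p_change=0.02, lam=16.0, eps=1e-12):
    """Most likely (BPM, audio point) per frame for scaled amplitude sums `obs`.

    Hypothetical sketch: deterministic advance inside a beat period, tempo
    changes (to an adjacent BPM) only at beats, uniform P(S_0), and a simple
    amplitude-based likelihood. Requires 0 < p_change < 1.
    """
    bpms = sorted(bpms)
    m_of = {b: int(60 * fps / b) for b in bpms}          # audio points per BPM
    states = [(b, p) for b in bpms for p in range(1, m_of[b] + 1)]
    index = {s: i for i, s in enumerate(states)}

    # preds[j] lists (i, log P(states[j] | states[i])) for each allowed predecessor i.
    preds = [[] for _ in states]
    for k, b in enumerate(bpms):
        for p in range(1, m_of[b] + 1):
            i = index[(b, p)]
            if p < m_of[b]:                              # advance within the period
                preds[index[(b, p + 1)]].append((i, 0.0))
                continue
            neighbours = [bpms[n] for n in (k - 1, k + 1) if 0 <= n < len(bpms)]
            stay = 1.0 - p_change if neighbours else 1.0
            preds[index[(b, 1)]].append((i, math.log(stay)))
            for nb in neighbours:                        # tempo may change at a beat
                preds[index[(nb, 1)]].append((i, math.log(p_change / len(neighbours))))

    def log_obs(state, s):
        s = min(max(s, 0.0), 1.0)
        p = s if state[1] == 1 else (1.0 - s) / (lam - 1.0)
        return math.log(max(p, eps))

    n = len(states)
    delta = [math.log(1.0 / n) + log_obs(st, obs[0]) for st in states]
    backptr = []
    for s in obs[1:]:
        new_delta, ptr = [], []
        for j, st in enumerate(states):
            i_best, score = max(
                ((i, delta[i] + lp) for i, lp in preds[j]), key=lambda c: c[1]
            )
            new_delta.append(score + log_obs(st, s))
            ptr.append(i_best)
        delta = new_delta
        backptr.append(ptr)

    # Backtrack from the best final state to recover the whole path.
    j = max(range(n), key=delta.__getitem__)
    path = [j]
    for ptr in reversed(backptr):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return [states[j] for j in path]


obs = [1.0 if t % 50 == 0 else 0.0 for t in range(300)]  # synthetic peaks every 0.5 s
path = viterbi_beats(obs, bpms=[100, 120])
print([t for t, (bpm, pos) in enumerate(path) if pos == 1])  # [0, 50, 100, 150, 200, 250]
```

Working in log probabilities avoids underflow over long pieces. With 161 BPM values at fps = 100 the state space has several thousand states, so a practical version would vectorise the inner loop.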
In another embodiment of the present application, determining the amplitude sum according to the target audio data includes: determining a time-domain signal of the target audio data; performing a Fourier transform on the time-domain signal to obtain a frequency-domain signal; and determining the amplitude sum according to the frequency-domain signal and the preset frequency band.
Specifically, a time-domain signal $x(t)$ of the target audio data is determined, where $t = 1, 2, \ldots$. A Fourier transform is then applied to the time-domain signal to obtain a frequency-domain signal $X(f)$, where $f$ denotes a frequency point of the Fourier transform. With the preset frequency band set to 55 Hz to 4000 Hz, the amplitude sum is calculated as:
$$S(t) = \sum_{f = 55\,\mathrm{Hz}}^{4000\,\mathrm{Hz}} \lvert X(f) \rvert$$
The 55-4000 Hz band is chosen because each note pitch corresponds to a fixed standard frequency and the commonly used five octaves of pitches lie within 55-1760 Hz; the upper limit of 4000 Hz is an empirical value that accounts for the first few orders of harmonics while reducing other noise interference.
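The amplitude-sum computation can be sketched with NumPy's real FFT (`numpy.fft.rfft` and `numpy.fft.rfftfreq`). The sketch assumes the signal is split into non-overlapping pieces of 1/fps seconds and that an FFT bin belongs to the band when its centre frequency lies between 55 and 4000 Hz; `amplitude_sums` is a hypothetical name. Note that a 10 ms piece gives a bin spacing of about 100 Hz at any sample rate, so the 55 Hz lower edge only becomes meaningful with longer analysis windows; the source does not say which window length it uses.

```python
import numpy as np


def amplitude_sums(x, sample_rate, fps=100, f_lo=55.0, f_hi=4000.0):
    """Split x into pieces of 1/fps seconds and return S(t) for each piece.

    S(t) is the sum of |X(f)| over the FFT bins whose centre frequency lies
    in [f_lo, f_hi]; a trailing partial piece is dropped.
    """
    frame_len = int(round(sample_rate / fps))
    n_frames = len(x) // frame_len
    frames = np.asarray(x[: n_frames * frame_len], dtype=float)
    frames = frames.reshape(n_frames, frame_len)
    spectrum = np.abs(np.fft.rfft(frames, axis=1))          # |X(f)| per piece
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)  # bin centres in Hz
    band = (freqs >= f_lo) & (freqs <= f_hi)
    return spectrum[:, band].sum(axis=1)


sr = 8000
n = np.arange(sr // 100)                                    # one 10 ms piece
x = np.concatenate([np.zeros(len(n)), np.sin(2 * np.pi * 1000 * n / sr)])
print(amplitude_sums(x, sr))  # approximately [0, 40]: silence, then a 1 kHz tone
```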
In another embodiment of the present application, there are a plurality of pieces of target audio data, among which first, second, and third target audio data are three beat points adjacent in time order. After determining, at least according to the state space, whether the target audio data is a beat point, the method further includes: acquiring a first time interval between the first target audio data and the second target audio data; acquiring a second time interval between the second target audio data and the third target audio data; determining the accuracy with which the third target audio data is a beat point according to a formula that appears only as an image in the original publication, where $\mu_t = 0.9\mu_{t-1} + 0.1X_t$, $\mu_{t-1} = 0.9\mu_{t-2} + 0.1X_{t-1}$, $X_{t-1}$ is the first time interval, and $X_t$ is the second time interval; and, when the accuracy is less than or equal to a predetermined value, determining that the third target audio data is not a beat point. In this embodiment, a correlation analysis of the first and second time intervals checks whether the detection result is accurate, further improving detection accuracy.
In the above embodiment, $\mu_t$ is the running average (exponentially weighted, per the recursion above) of all time intervals from the first beat point detected in a piece of music up to the third target audio data, $\mu_{t-1}$ is the corresponding average up to the second target audio data, and $\mu_{t-2}$ is defined similarly; a time interval is the interval between two adjacent beat points. When the first target audio data is the first detected beat point, $\mu_{t-1}$ equals the first time interval; otherwise, $\mu_{t-1} = 0.9\mu_{t-2} + 0.1X_{t-1}$.
In a specific embodiment of the present application, the predetermined value is 0.95; in practical applications other values may be used, and those skilled in the art can set the predetermined value according to the actual situation.
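The interval check can be sketched as follows. The recursion for μ and the 0.95 threshold come from the text; the accuracy score itself, $1 - \lvert X_t - \mu_{t-1} \rvert / \mu_{t-1}$, is a hypothetical stand-in for the unreproduced formula, as is the choice to measure each new interval from the last accepted beat. `filter_beats` is a hypothetical name, and beat times are assumed to be strictly increasing and given in seconds.

```python
def filter_beats(beat_times, threshold=0.95, weight=0.9):
    """Drop beats whose interval disagrees with the running interval average mu.

    mu follows mu_t = 0.9 * mu_{t-1} + 0.1 * X_t from the text. The accuracy
    score 1 - |X_t - mu_{t-1}| / mu_{t-1} is a hypothetical stand-in for the
    formula that is not reproduced.
    """
    if len(beat_times) < 3:
        return list(beat_times)
    kept = list(beat_times[:2])
    mu = kept[1] - kept[0]            # first interval seeds the average
    if mu <= 0:
        raise ValueError("beat_times must be strictly increasing")
    for t in beat_times[2:]:
        x = t - kept[-1]              # interval to the last accepted beat
        accuracy = max(0.0, 1.0 - abs(x - mu) / mu)
        if accuracy <= threshold:
            continue                  # judged not to be a beat point
        kept.append(t)
        mu = weight * mu + (1.0 - weight) * x


    return kept


print(filter_beats([0.0, 0.5, 1.0, 1.5, 1.7, 2.0]))  # [0.0, 0.5, 1.0, 1.5, 2.0]
```

The spurious detection at 1.7 s gives an interval of 0.2 s against a running average of 0.5 s, so its score (about 0.4) falls below the threshold and it is discarded.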
An embodiment of the present application further provides a music beat detection apparatus. It should be noted that this apparatus may be used to execute the music beat detection method provided by the embodiments of the present application. The music beat detection apparatus according to an embodiment of the present application is described below.
Fig. 2 is a schematic diagram of a music beat detection apparatus according to an embodiment of the present application. As shown in Fig. 2, the apparatus includes:
a first acquisition unit 10 for acquiring target audio data;
an establishing unit 20, configured to establish a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points for each BPM, each audio point has a hidden state, the audio points are obtained by splitting each BPM according to the duration of the target audio data, and the hidden state is either a beat-point state or a non-beat-point state;
a first determining unit 30, configured to determine whether the target audio data is the beat point at least according to the state space.
The apparatus includes an acquisition unit, an establishing unit, and a determining unit. The acquisition unit is configured to acquire target audio data; the establishing unit is configured to establish a state space according to the target audio data, the state space comprising a plurality of BPMs and a plurality of audio points with hidden states, wherein the audio points are obtained by splitting each BPM according to the duration of the target audio data and each hidden state is either a beat-point state or a non-beat-point state; and the determining unit is configured to determine, at least according to the state space, whether the target audio data is a beat point. Prior-art apparatuses judge whether target audio data is a beat point only from the energy perspective, which leads to low beat detection accuracy. This apparatus instead exploits the repetitive regularity of musical rhythm: it establishes a discrete state space comprising the BPMs and the audio points under each BPM and determines from this state space whether the target audio data is a beat point, so that beat-point and non-beat-point states are evaluated jointly and the results are more accurate.
In an embodiment of the present application, the establishing unit includes an obtaining module, a first determining module, a second determining module, and a processing module. The obtaining module is configured to obtain a plurality of BPMs; the first determining module is configured to determine a first quantity fps according to $\mathrm{fps} = T_1 / T_2$, where $T_1 = 1$ second and $T_2$ is the duration of the target audio data; the second determining module is configured to determine the number $M$ of audio points of each BPM according to $M = 60 \cdot \mathrm{fps} / \mathrm{BPM}$, taken as an integer; and the processing module is configured to split each BPM according to the number of audio points to obtain the plurality of audio points. Each audio point has a hidden state indicating whether it is a beat point under the corresponding BPM. Consecutive beat points are separated by a time interval that can be calculated from the BPM, so there are multiple non-beat points between them; to keep the time resolution uniform across the different states (i.e., different BPMs), this embodiment calculates the number of audio points (i.e., the number of hidden states) from the BPM with this formula.
Specifically, according to BPM statistics of common songs, the BPM can range from 55 to 215, giving 161 BPM values. fps should be set according to actual requirements and is set to 100 in this application. Because different BPMs correspond to different beat intervals, each BPM is given its own number of audio points (i.e., its own time resolution), so that state transitions under different BPMs can be tracked better.
For example, when the BPM is 55, the formula above gives 109 audio points in this state (i.e., 109 hidden states). If hidden state 1 is the beat point and the rest are non-beat points, then, as long as the BPM does not change, there are 108 non-beat points between consecutive beat points. The state space is shown in Fig. 3, where each black dot represents one element of the state space.
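The state-space construction can be sketched in Python. The formula for M is only available as an image, so the sketch assumes M = round(60 · fps / BPM); this assumption reproduces the 109 audio points stated above for BPM 55 at fps = 100, but it is an inference, not the patent's text. The function names are hypothetical.

```python
def points_per_bpm(bpm: int, fps: int = 100) -> int:
    """Audio points (hidden states) spanning one beat period at this BPM.

    ASSUMPTION: M = round(60 * fps / bpm), rounding half up.
    """
    return int(60 * fps / bpm + 0.5)


def build_state_space(bpm_min: int = 55, bpm_max: int = 215, fps: int = 100):
    """Return every (bpm, position) state; position 1 is the beat point."""
    states = []
    for bpm in range(bpm_min, bpm_max + 1):
        m = points_per_bpm(bpm, fps)
        # positions 2..m are the non-beat points between two beats
        states.extend((bpm, pos) for pos in range(1, m + 1))
    return states


if __name__ == "__main__":
    states = build_state_space()
    print(len({bpm for bpm, _ in states}))  # 161 tempo values
    print(points_per_bpm(55))               # 109 audio points at BPM 55
    print(len(states))                      # total size of the state space
```

Giving each BPM its own number of positions keeps one step of the chain equal to one frame (1/fps seconds) for every tempo, which is the uniform time resolution the text describes.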
In yet another embodiment of the present application, the determining unit includes a third, a fourth, a fifth and a sixth determining module. The third determining module determines a plurality of first probabilities according to the state space, each first probability being the probability that two of the audio points are adjacent in time. The fourth determining module determines an amplitude sum, i.e., the sum of the amplitudes of the target audio data within a preset frequency band. The fifth determining module determines a plurality of second probabilities according to the state space and the amplitude sum, each second probability being the probability that the audio point corresponding to the target audio data has the corresponding hidden state. The sixth determining module determines, from the first and second probabilities, whether the target audio data is a beat point and what the BPM of the target audio data is.
In practice, the state between beat points generally has only one possible transition, namely advancing step by step toward the next beat point under the same BPM. In this embodiment, the probability that any two audio points are adjacent in time is determined, the probability that each audio point has a given hidden state is determined from the amplitude, and finally whether the target audio data is a beat point, together with its BPM, is determined from the first and second probabilities. The decision therefore draws on both the periodicity of beats and the pattern of change between beat points, which further improves the accuracy of beat detection.
In yet another embodiment of the present application, the third determining module includes a first determining submodule, which determines each of the first probabilities according to the formula shown in Figure BDA0003449507930000091, where the terms are defined in Figure BDA0003449507930000092, ω_t is the BPM of the audio point at time t, ω_{t-1} is the BPM of the audio point at time t-1, and the symbol shown in Figure BDA0003449507930000093 denotes the audio point at time t. In this embodiment, unreasonable audio points are removed, which makes the detection result more accurate and further improves the accuracy of rhythm detection.
In a specific embodiment, the first probability may be obtained by training on music data: beat positions are labelled 1 according to the BPM, positions between two beat points take values between 0 and 1 according to their temporal progress, and the model is trained on these labels.
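A minimal transition model in the spirit of the description can be sketched as follows. It is not the patent's formula, which is only available as an image. The sketch assumes that inside a beat the position advances by exactly one step at an unchanged BPM, and that only at the end of a beat may the BPM change, with weight exp(-λ·|ω_t/ω_{t-1} - 1|). That exponential tempo-change penalty is borrowed from common bar-pointer beat trackers, and λ = 100 is an arbitrary illustrative value.

```python
import math


def points_per_bpm(bpm, fps=100):
    # same ASSUMED rule as the state-space sketch: M = round(60 * fps / bpm)
    return int(60 * fps / bpm + 0.5)


def transition_probs(state, bpms, fps=100, lam=100.0):
    """Return {next_state: probability} for one (bpm, position) state.

    ASSUMPTIONS (the patent's formula is an image):
    - within a beat, the position advances by one at the same BPM;
    - after the last position the chain returns to position 1 (the beat point)
      of some BPM, weighted by exp(-lam * |new_bpm / old_bpm - 1|).
    Any state not in the returned dict is unreachable (probability 0).
    """
    bpm, pos = state
    if pos < points_per_bpm(bpm, fps):
        return {(bpm, pos + 1): 1.0}
    weights = {(b, 1): math.exp(-lam * abs(b / bpm - 1.0)) for b in bpms}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


if __name__ == "__main__":
    bpms = range(55, 216)
    print(transition_probs((55, 1), bpms))  # {(55, 2): 1.0}
    end_of_beat = transition_probs((55, 109), bpms)
    print(max(end_of_beat, key=end_of_beat.get))  # (55, 1): keeping the tempo is most likely
```

Returning only the reachable successors, with every other transition at probability zero, is one way to read the text's removal of "unreasonable" audio points; it also keeps decoding cheap because most of the transition matrix is never stored.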
To further improve the accuracy of rhythm detection, in another embodiment of the present application the fifth determining module includes a second determining submodule, which determines each of the second probabilities according to the formula shown in Figure BDA0003449507930000101, where i is the hidden state and takes the value 0 or 1: y_0 indicates that the hidden state of the audio point is not the beat point, y_1 indicates that the hidden state of the audio point is the beat point, and S(t) is the amplitude sum. y_i denotes the state a listener actually perceives, i.e., whether there is a beat, which differs from the hidden state in the state space. A listener perceives only two states, beat point and non-beat point, whereas the number of hidden states varies with the BPM: for example, when the BPM is 55 there are 109 hidden states, of which hidden state 1 is defined as the beat point and all the others as non-beat points.
In a specific embodiment, the second probability may also be obtained by training on music data: every frame of the training audio (framed according to fps) is labelled with its state, and the model is trained on these labels.
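Because the second-probability formula is only available as an image and may in practice be learned from labelled data, the sketch below substitutes a hypothetical hand-made observation model: the amplitude sums S(t) of a track are scaled into [0, 1] and read as the probability of y_1 (beat), so hidden state 1 uses that value and every other position uses its complement. Both the normalization and the mapping are assumptions.

```python
def normalize_amplitude_sums(sums):
    """Scale the amplitude sums S(t) of a track into [0, 1] (ASSUMED preprocessing)."""
    peak = max(sums)
    return [s / peak if peak > 0 else 0.0 for s in sums]


def observation_prob(state, activation, eps=1e-6):
    """HYPOTHETICAL P(observation | state).

    Hidden state 1 (the beat point) is mapped to y_1, every other position to
    y_0, and the normalized amplitude sum is read as the probability of y_1.
    eps keeps the value strictly positive so log-domain decoding stays finite.
    """
    _, pos = state
    p = activation if pos == 1 else 1.0 - activation
    return min(max(p, eps), 1.0)


if __name__ == "__main__":
    acts = normalize_amplitude_sums([2.0, 8.0, 4.0])
    print(acts)                                # [0.25, 1.0, 0.5]
    print(observation_prob((55, 1), acts[1]))  # 1.0
    print(observation_prob((55, 2), acts[1]))  # 1e-06
```

Collapsing the many hidden positions onto the two perceived states y_0 and y_1 mirrors the distinction the text draws between hidden states and what a listener perceives.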
In yet another embodiment of the present application, the sixth determining module includes a third determining submodule, which determines Y_t according to the formula shown in Figure BDA0003449507930000102. The hidden state of the audio point corresponding to Y_t is the hidden state of the target audio data, and the BPM corresponding to Y_t is the BPM of the target audio data; P(S_0) is a predetermined probability value for t = 0, and x_{1:T} are the time-domain values of the target audio data. Because unreasonable audio points have already been removed when determining the first and second probabilities, these more accurate probabilities yield a more accurate detection result, further improving the accuracy of rhythm detection.
In another embodiment of the present application, the fourth determining module includes a fourth determining submodule, a processing submodule and a determining submodule. The fourth determining submodule determines a time-domain signal of the target audio data; the processing submodule applies a Fourier transform to the time-domain signal to obtain a frequency-domain signal; and the determining submodule determines the amplitude sum from the frequency-domain signal and the preset frequency band.
Specifically, the time-domain signal x(t) of the target audio data is determined, where t = 1, 2, …. A Fourier transform is then applied to the time-domain signal to obtain the frequency-domain signal X(f), where f indexes the frequency points of the Fourier transform. The preset frequency band is set to 55 Hz to 4000 Hz, and the amplitude sum is calculated according to the formula shown in Figure BDA0003449507930000103.
The 55 Hz to 4000 Hz band was chosen because each note pitch corresponds to a fixed standard frequency and the commonly used five octaves lie within 55 Hz to 1760 Hz; the upper limit is an experimental value that also covers the first few harmonic orders while reducing interference from other noise.
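The amplitude-sum step can be sketched as follows. The formula itself is an image, so the sketch assumes S(t) is the sum of the DFT magnitudes |X(f)| of one frame over the bins that fall between 55 Hz and 4000 Hz, with non-overlapping, unwindowed frames of sample_rate / fps samples; the framing, the lack of a window and the function names are assumptions.

```python
import cmath
import math


def band_amplitude_sum(frame, sample_rate, f_low=55.0, f_high=4000.0):
    """S for one frame: sum of |X(k)| over DFT bins whose frequency is in the band.

    A naive DFT restricted to the in-band bins keeps the sketch dependency-free;
    a real implementation would use an FFT (e.g. numpy.fft.rfft) instead.
    """
    n = len(frame)
    k_low = math.ceil(f_low * n / sample_rate)
    k_high = min(math.floor(f_high * n / sample_rate), n // 2)
    total = 0.0
    for k in range(k_low, k_high + 1):
        x_k = sum(frame[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
        total += abs(x_k)
    return total


def amplitude_sums(signal, sample_rate, fps=100):
    """S(t) per frame; ASSUMES non-overlapping frames of sample_rate // fps samples."""
    hop = sample_rate // fps
    return [
        band_amplitude_sum(signal[start:start + hop], sample_rate)
        for start in range(0, len(signal) - hop + 1, hop)
    ]


if __name__ == "__main__":
    sr = 8000  # 80-sample frames at fps = 100, i.e. 100 Hz bin spacing
    tone = [math.sin(2 * math.pi * 1000 * j / sr) for j in range(sr // 100)]
    print(round(band_amplitude_sum(tone, sr), 6))         # 40.0: a 1 kHz tone is in the band
    print(round(band_amplitude_sum([1.0] * 80, sr), 6))   # 0.0: DC lies below 55 Hz
```

Note that at fps = 100 a frame is only 10 ms long, so the bin spacing is coarse (100 Hz at 8 kHz sampling); longer, overlapping windows would resolve the low end of the band better, at the cost of departing from the one-frame-per-step framing assumed here.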
In another embodiment of the present application, there are a plurality of target audio data, among which first, second and third target audio data are three beat points that are consecutive in time. After it has been determined, based at least on the state space, whether the target audio data is a beat point, the apparatus further uses a second obtaining unit, a third obtaining unit, a second determining unit and a third determining unit. The second obtaining unit obtains a first time interval between the first and the second target audio data; the third obtaining unit obtains a second time interval between the second and the third target audio data; the second determining unit determines the accuracy of the third target audio data being a beat point according to the formula shown in Figure BDA0003449507930000111, where μ_t = 0.9μ_{t-1} + 0.1X_t, μ_{t-1} = 0.9μ_{t-2} + 0.1X_{t-1}, X_{t-1} is the first time interval and X_t is the second time interval; and the third determining unit determines that the third target audio data is not a beat point when the accuracy is less than or equal to a predetermined value. In this embodiment, a correlation analysis of the first and second time intervals checks whether the determined result is accurate, which further improves detection accuracy.
In the above embodiment, μ_t is the running (exponentially weighted) average of all time intervals detected in a piece of music from the first beat point up to the third target audio point, μ_{t-1} is the corresponding average from the first detected beat point up to the second target audio point, and μ_{t-2} is defined similarly; a time interval is the interval between two adjacent beat points. If the first target audio point is the first detected beat point, μ_{t-1} equals the first time interval; otherwise μ_{t-1} = 0.9μ_{t-2} + 0.1X_{t-1}.
In a specific embodiment of the present application, the predetermined value is 0.95; in practice it may take other values, which those skilled in the art can set according to actual conditions.
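The interval check can be sketched as follows. The running average follows the recursion given in the text, initialized to the first interval; the accuracy formula itself is an image, so min(X_t, μ_t) / max(X_t, μ_t) is a hypothetical stand-in that equals 1.0 for a perfectly steady pulse. The threshold 0.95 is the predetermined value stated above.

```python
def beat_interval_check(beat_times, threshold=0.95):
    """Return (time, accuracy, accepted) for each beat from the third one on.

    mu follows the text: mu_{t-1} starts as the first interval and
    mu_t = 0.9 * mu_{t-1} + 0.1 * X_t. The accuracy formula itself is an
    image in the patent; min(X_t, mu_t) / max(X_t, mu_t) is a HYPOTHETICAL
    stand-in that equals 1.0 for a perfectly steady pulse.
    """
    results = []
    mu_prev = None  # mu_{t-1}
    for i in range(2, len(beat_times)):
        x_prev = beat_times[i - 1] - beat_times[i - 2]  # X_{t-1}, first interval
        x_t = beat_times[i] - beat_times[i - 1]         # X_t, second interval
        mu_prev = x_prev if mu_prev is None else 0.9 * mu_prev + 0.1 * x_prev
        mu_t = 0.9 * mu_prev + 0.1 * x_t
        accuracy = min(x_t, mu_t) / max(x_t, mu_t)
        # the text rejects the third beat when accuracy <= the predetermined value
        results.append((beat_times[i], accuracy, accuracy > threshold))
    return results


if __name__ == "__main__":
    for t, acc, ok in beat_interval_check([0.0, 0.5, 1.0, 1.5, 1.7]):
        print(t, round(acc, 3), ok)  # the beat at 1.7 s breaks the 0.5 s pulse and is rejected
```

The 0.9/0.1 weighting lets the average follow gradual tempo drift while a single outlying interval moves it only slightly, so an isolated spurious beat is what the check is designed to catch.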
The detection device for the music beat comprises a processor and a memory, wherein the first acquisition unit, the establishment unit, the first determination unit and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor contains a kernel, which calls the corresponding program unit from the memory. One or more kernels may be provided; by adjusting the kernel parameters, the prior-art problem of low detection accuracy caused by determining beat points from the energy distribution is addressed.
The memory may include non-persistent memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium on which a program is stored; when executed by a processor, the program implements the above-described music beat detection method.
The embodiment of the invention provides a processor, wherein the processor is used for running a program, and the detection method of the music beat is executed when the program runs.
An embodiment of the present invention provides an electronic device, including: one or more processors, memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the above-described methods.
Unlike the prior art, which judges whether target audio data is a beat point only from an energy perspective and therefore detects rhythm with low accuracy, this electronic device exploits the repetitive regularity of musical rhythm: it establishes a discrete state space containing the BPMs and the audio points under each BPM and determines from that state space whether the target audio data is a beat point. Beat-point and non-beat-point states can therefore be evaluated simultaneously, which makes the result more accurate than determining beat points from the energy distribution alone.
The embodiment of the invention provides equipment, which comprises a processor, a memory and a program which is stored on the memory and can run on the processor, wherein when the processor executes the program, at least the following steps are realized:
step S101, acquiring target audio data;
step S102, establishing a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points for each BPM, the audio points have hidden states, each audio point is obtained by splitting the BPMs according to the duration of the target audio data, and each hidden state is either a beat-point state or a non-beat-point state;
step S103, determining whether the target audio data is the beat point at least according to the state space.
The device herein may be a server, a PC, a tablet (PAD), a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program that initializes at least the following method steps:
step S101, acquiring target audio data;
step S102, establishing a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points for each BPM, the audio points have hidden states, each audio point is obtained by splitting the BPMs according to the duration of the target audio data, and each hidden state is either a beat-point state or a non-beat-point state;
step S103, determining whether the target audio data is the beat point at least according to the state space.
In order to make the technical solutions of the present disclosure more clearly understood by those skilled in the art, the technical solutions of the present disclosure will be described in detail below with reference to specific examples and comparative examples.
Examples
A flowchart of the method for detecting a music beat is shown in fig. 4, and the method includes the following steps:
1. Input: determine the time-domain signal x(t) of the target audio data, where t = 1, 2, ….
2. Determine the amplitude sum: apply a Fourier transform to the time-domain signal to obtain the frequency-domain signal X(f), where f indexes the frequency points of the Fourier transform; set the preset frequency band to 55 Hz to 4000 Hz; and calculate the amplitude sum according to the formula shown in Figure BDA0003449507930000131.
3. Determine the first probabilities: first obtain the plurality of BPMs described above; determine the first quantity fps according to the formula shown in Figure BDA0003449507930000132, where T1 = 1 second and T2 is the duration of the target audio data; determine the number M of audio points of each BPM according to the formula shown in Figure BDA0003449507930000133; and split each BPM according to its number of audio points to obtain the plurality of audio points. Then determine the first probabilities according to the formula shown in Figure BDA0003449507930000134, where the terms are defined in Figure BDA0003449507930000135, ω_t is the BPM of the audio point at time t, ω_{t-1} is the BPM of the audio point at time t-1, and the symbol shown in Figure BDA0003449507930000136 denotes the audio point at time t.
4. Determine the second probabilities according to the formula shown in Figure BDA0003449507930000137, where i is the hidden state and takes the value 0 or 1; y_0 indicates that the hidden state of the audio point is not the beat point, y_1 indicates that the hidden state of the audio point is the beat point, and S(t) is the amplitude sum.
5. Determine the beat points and BPM: determine Y_t according to the formula shown in Figure BDA0003449507930000138. The hidden state of the audio point corresponding to Y_t is the hidden state of the target audio data, and the BPM corresponding to Y_t is the BPM of the target audio data; P(S_0) is a predetermined probability value for t = 0, and x_{1:T} are the time-domain values of the target audio data.
6. Determine the accuracy: from the time intervals between successive beat points, determine the accuracy of each beat point according to the formula shown in Figure BDA0003449507930000139, where μ_t = 0.9μ_{t-1} + 0.1X_t, μ_{t-1} = 0.9μ_{t-2} + 0.1X_{t-1}, X_{t-1} is the first time interval and X_t is the second time interval.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the above-described division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including several instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage media include a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing program code.
From the above description, it can be seen that the above-described embodiments of the present application achieve the following technical effects:
1) The music beat detection method of the present application first acquires target audio data; then establishes a state space from the target audio data, the state space comprising a plurality of BPMs and a plurality of audio points having hidden states, where the audio points are obtained by splitting each BPM according to the duration of the target audio data and each hidden state is either a beat-point state or a non-beat-point state; and finally determines, based at least on the state space, whether the target audio data is a beat point. Unlike the prior art, which judges beat points only from an energy perspective and therefore detects rhythm with low accuracy, the method exploits the repetitive regularity of musical rhythm, so that beat-point and non-beat-point states are evaluated simultaneously and the result is more accurate.
2) The music beat detection device of the present application comprises an acquisition unit that acquires target audio data, an establishing unit that establishes the state space described above from the target audio data, and a determining unit that determines, based at least on the state space, whether the target audio data is a beat point. For the reasons given in 1), the device detects beats more accurately than prior-art approaches based on the energy distribution alone.
3) The electronic device of the present application includes one or more processors, a memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the programs including instructions for performing any of the methods described above. For the reasons given in 1), it detects beats more accurately than prior-art approaches based on the energy distribution alone.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A method for detecting a music beat, comprising:
acquiring target audio data;
establishing a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points for each BPM, the audio points have hidden states, each audio point is obtained by splitting the BPMs according to the duration of the target audio data, and each hidden state is either a beat-point state or a non-beat-point state;
and determining whether the target audio data is the beat point at least according to the state space.
2. The method of claim 1, wherein establishing a state space based on the target audio data comprises:
acquiring a plurality of BPMs;
determining a first quantity fps according to the formula shown in Figure FDA0003449507920000011, wherein T1 is 1 second and T2 is the duration of the target audio data;
determining the number M of audio points of each BPM according to the formula shown in Figure FDA0003449507920000012;
and splitting each BPM according to the number of audio points to obtain the plurality of audio points.
3. The method of claim 1, wherein determining whether the target audio data is the beat point according to at least the state space comprises:
determining a plurality of first probabilities according to the state space, wherein each first probability is the probability that two of the audio points are adjacent in time;
determining an amplitude sum, wherein the amplitude sum is the sum of the amplitudes of the target audio data within a preset frequency band;
determining a plurality of second probabilities according to the state space and the amplitude sum, wherein each second probability is the probability that the audio point corresponding to the target audio data has the corresponding hidden state;
determining, according to the first probabilities and the second probabilities, whether the target audio data is the beat point, and determining the BPM of the target audio data.
4. The method of claim 3, wherein determining a plurality of first probabilities from the state space comprises:
determining each of the first probabilities according to the formula shown in Figure FDA0003449507920000013, wherein the terms are defined in Figure FDA0003449507920000021, ω_t is the BPM of the audio point at time t, ω_{t-1} is the BPM of the audio point at time t-1, and the symbol shown in Figure FDA0003449507920000022 denotes the audio point at time t.
5. The method of claim 4, wherein determining a plurality of second probabilities from the state space and the sum of magnitudes comprises:
determining each of the second probabilities according to the formula shown in Figure FDA0003449507920000023, wherein i is the hidden state and is 0 or 1, y_0 indicates that the hidden state of the audio point is a state other than the beat point, y_1 indicates that the hidden state of the audio point is the state of the beat point, and S(t) is the amplitude sum.
6. The method according to claim 5, wherein determining whether the target audio data is the beat point and determining the BPM of the target audio data according to each of the first probabilities and each of the second probabilities comprises:
determining Y_t according to the formula shown in Figure FDA0003449507920000024, wherein the hidden state of the audio point corresponding to Y_t is the hidden state of the target audio data, the BPM corresponding to Y_t is the BPM of the target audio data, P(S_0) is a predetermined probability value for t = 0, and x_{1:T} are the time-domain values of the target audio data.
7. The method of claim 3, wherein determining the amplitude sum comprises:
determining a time domain signal of the target audio data;
carrying out Fourier transform on the time domain signal to obtain a frequency domain signal;
and determining the amplitude sum according to the frequency domain signal and the preset frequency band.
8. The method according to any one of claims 1 to 7, wherein there are a plurality of target audio data, among which first, second and third target audio data are three beat points that are consecutive in time, and after determining, based at least on the state space, whether the target audio data is the beat point, the method further comprises:
obtaining a first time interval between the first target audio data and the second target audio data;
acquiring a second time interval between the second target audio data and the third target audio data;
determining the accuracy of the third target audio data being the beat point according to the formula shown in Figure FDA0003449507920000025, wherein μ_t = 0.9μ_{t-1} + 0.1X_t, μ_{t-1} = 0.9μ_{t-2} + 0.1X_{t-1}, X_{t-1} is the first time interval and X_t is the second time interval;
determining that the third target audio data is not the beat point in a case where the accuracy is less than or equal to a predetermined value.
9. An apparatus for detecting a music beat, comprising:
a first acquisition unit configured to acquire target audio data;
an establishing unit configured to establish a state space according to the target audio data, wherein the state space comprises a plurality of BPMs and a plurality of audio points for each BPM, the audio points have hidden states, each audio point is obtained by splitting the BPMs according to the duration of the target audio data, and each hidden state is either a beat-point state or a non-beat-point state;
a first determining unit, configured to determine whether the target audio data is the beat point at least according to the state space.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program performs the method of any one of claims 1 to 8.
11. A processor, characterized in that the processor is configured to run a program, wherein the program when running performs the method of any of claims 1 to 8.
12. An electronic device, comprising: one or more processors, memory, and one or more programs stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the method of any of claims 1-8.
CN202111670399.0A 2021-12-30 2021-12-30 Music beat detection method and device and electronic equipment Pending CN114283850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111670399.0A CN114283850A (en) 2021-12-30 2021-12-30 Music beat detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111670399.0A CN114283850A (en) 2021-12-30 2021-12-30 Music beat detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114283850A (en) 2022-04-05

Family

ID=80879773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111670399.0A Pending CN114283850A (en) 2021-12-30 2021-12-30 Music beat detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114283850A (en)

Similar Documents

Publication Publication Date Title
RU2743315C1 (en) Method of music classification and a method of detecting music beat parts, a data medium and a computer device
Vincent et al. The signal separation evaluation campaign (2007–2010): Achievements and remaining challenges
US8853516B2 (en) Audio analysis apparatus
CN110265064B (en) Audio frequency crackle detection method, device and storage medium
JP5565374B2 (en) Device for changing the segmentation of audio works
Friberg et al. Using listener-based perceptual features as intermediate representations in music information retrieval
Eerola et al. Shared periodic performer movements coordinate interactions in duo improvisations
CN104620313A (en) Audio signal analysis
Stein et al. Automatic detection of audio effects in guitar and bass recordings
CN109979469B (en) Signal processing method, apparatus and storage medium
CN106250400A (en) A kind of audio data processing method, device and system
US8046384B2 (en) Information processing apparatus, information processing method and information processing program
JP7337169B2 (en) AUDIO CLIP MATCHING METHOD AND APPARATUS, COMPUTER PROGRAM AND ELECTRONIC DEVICE
WO2019017242A1 (en) Musical composition analysis method, musical composition analysis device and program
CN109410972B (en) Method, device and storage medium for generating sound effect parameters
CN111399745A (en) Music playing method, music playing interface generation method and related products
JP2010097084A (en) Mobile terminal, beat position estimation method, and beat position estimation program
CN110070891B (en) Song identification method and device and storage medium
CN114283850A (en) Music beat detection method and device and electronic equipment
KR100974871B1 (en) Feature vector selection method and apparatus, and audio genre classification method and apparatus using the same
CN111477248B (en) Audio noise detection method and device
CN112687247B (en) Audio alignment method and device, electronic equipment and storage medium
US20220005443A1 (en) Musical analysis method and music analysis device
CN113674723A (en) Audio processing method, computer equipment and readable storage medium
US11398212B2 (en) Intelligent accompaniment generating system and method of assisting a user to play an instrument in a system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination