CN107103917A

CN107103917A - Music rhythm detection method and its system

Info

Publication number: CN107103917A
Application number: CN201710159699.XA
Authority: CN
Inventors: 王子亮; 邹应双; 武建聪; 蔡智力; 欧继福; 陈待有
Original assignee: Fujian Star Net eVideo Information Systems Co Ltd
Current assignee: Fujian Star Net eVideo Information Systems Co Ltd
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2017-08-29
Anticipated expiration: 2037-03-17
Also published as: CN107103917B

Abstract

The invention discloses a music rhythm detection method and system. The method includes: acquiring audio data of music; sequentially taking an audio frame from the audio data as the current audio frame, taking the difference between the spectral energy sums of the current audio frame and the previous audio frame as the energy difference of the current audio frame, and storing the energy difference; determining the energy threshold corresponding to the current audio frame; acquiring the energy differences of the current audio frame and of the two or more consecutive audio frames immediately before it, so as to obtain the energy differences of three or more audio frames; and, if the energy differences of the three or more audio frames contain a peak and the peak is greater than the energy threshold corresponding to the current audio frame, marking the audio frame corresponding to the peak as a rhythm point. The invention can detect the positions of rhythm points in music accurately and quickly, and has strong adaptability.

Description

Music rhythm detection method and system

Technical Field

The invention relates to the technical field of audio data processing, in particular to a music rhythm detection method and a system thereof.

Background

At present, most stage lighting is controlled manually through a DMX console, which is very labor-intensive. Sound-activated lighting technology aims to replace this manual work by detecting the rhythm of songs in real time in software and controlling the lights accordingly. However, existing sound-activated lighting technology has low detection accuracy and poor adaptability to song rhythm, which makes intelligent interaction between music and lighting difficult to achieve.

Chinese patent publication No. CN201210477064.1 discloses a music tempo detection method and detection apparatus, in which the method includes: acquiring an audio signal of the detected music; calculating a cross-correlation function between the audio signal of the detected music and the audio signal of a preset music rhythm model; obtaining the number of preset music rhythm models contained in the detected music according to the type of the detected music and the frequency with which the peak positions of the cross-correlation curve occur in the detected music; and comparing that number with preset rhythm information to determine the rhythm of the detected music. This reference obtains the correlation function of the detected audio signal by establishing a music rhythm model in advance, and from it obtains the rhythm of the detected audio. With this method, audio data must be obtained in advance and a music rhythm model must be built, which adds detection steps and complexity and is inconvenient in practical application.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a music rhythm detection method and system that improve detection efficiency and accuracy.

To solve the above technical problem, the invention adopts the following technical solution: a music rhythm detection method, comprising:

acquiring audio data of music;

sequentially acquiring an audio frame from the audio data as the current audio frame, taking the difference between the spectral energy sum of the current audio frame and that of the previous audio frame as the energy difference of the current audio frame, and storing the energy difference;

determining an energy threshold corresponding to the current audio frame;

acquiring the energy differences of the current audio frame and of the two or more consecutive audio frames immediately before it, to obtain the energy differences of three or more audio frames;

and, if the energy differences of the three or more audio frames contain a peak and the peak is greater than the energy threshold corresponding to the current audio frame, marking the audio frame corresponding to the peak as a rhythm point.

The invention also relates to a music rhythm detection system, comprising:

a first acquisition module, configured to acquire audio data of music;

a second acquisition module, configured to sequentially acquire an audio frame from the audio data as the current audio frame, take the difference between the spectral energy sums of the current audio frame and the previous audio frame as the energy difference of the current audio frame, and store the energy difference;

a determining module, configured to determine an energy threshold corresponding to the current audio frame;

a third acquisition module, configured to acquire the energy differences of the current audio frame and of the two or more consecutive audio frames immediately before it, to obtain the energy differences of three or more audio frames;

and a marking module, configured to mark the audio frame corresponding to the peak as a rhythm point if the energy differences of the three or more audio frames contain a peak and the peak is greater than the energy threshold corresponding to the current audio frame.

The beneficial effects of the invention are as follows. Rhythm points in the audio data are detected from the energy differences between audio frames, which can be done in real time with high accuracy. Rhythm points are determined by analyzing and comparing the energy differences of several adjacent audio frames, without building a model, which improves detection efficiency. The energy threshold is adjusted adaptively according to the energy differences of the audio frames already processed, so that it better matches the audio data currently being processed and too few or too many rhythm points are avoided; this further improves detection accuracy, makes the method suitable for rhythm detection in many types of music, and gives it strong adaptability and robustness.

Drawings

FIG. 1 is a flowchart of a music rhythm detection method according to the present invention;

FIG. 2 is a flowchart of the method according to the first embodiment of the present invention;

FIG. 3 is a flowchart of step S2 according to an embodiment of the present invention;

FIG. 4 is a flowchart of step S3 according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a music rhythm detection system according to the present invention;

FIG. 6 is a schematic structural diagram of the system according to the third embodiment of the present invention.

Description of reference numerals:

1. a first acquisition module; 2. a second acquisition module; 3. a determination module; 4. a third obtaining module;

5. a marking module; 6. a continued-execution module; 7. a control module;

21. a first acquisition unit; 22. a first obtaining unit; 23. a fourth calculation unit; 24. a second acquisition unit; 25. a second obtaining unit; 26. a fifth calculation unit; 27. a third obtaining unit;

31. a first setting unit; 32. a first calculation unit; 33. a second setting unit;

321. a second calculation unit; 322. a third calculation unit.

Detailed Description

To explain the technical content, objects, and effects of the present invention in detail, the following description is given with reference to the embodiments and the accompanying drawings.

The key concept of the invention is to determine rhythm points from the energy differences between audio frames while determining the energy threshold corresponding to each audio frame in real time.

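Referring to FIG. 1, a music rhythm detection method includes:

acquiring audio data of music;

sequentially acquiring an audio frame from the audio data as the current audio frame, taking the difference between the spectral energy sums of the current audio frame and the previous audio frame as the energy difference of the current audio frame, and storing the energy difference;

determining an energy threshold corresponding to the current audio frame;

acquiring the energy differences of the current audio frame and of the two or more consecutive audio frames immediately before it, to obtain the energy differences of three or more audio frames;

and, if the energy differences of the three or more audio frames contain a peak and the peak is greater than the energy threshold corresponding to the current audio frame, marking the audio frame corresponding to the peak as a rhythm point.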

From the above description, the beneficial effect of the present invention is that the rhythm of the audio data can be detected in real time, with high accuracy and efficiency.

Further, the "determining an energy threshold corresponding to the current audio frame" specifically includes:

if the sequence number corresponding to the current audio frame acquired in sequence is less than or equal to the preset frame number N, setting the energy threshold corresponding to the current audio frame as a preset first energy threshold;

if the sequence number of the sequentially acquired current audio frame is a natural-number multiple of the preset frame number N, calculating a second energy threshold from the energy differences of the audio frames in a first audio frame group, where the first audio frame group consists of the current audio frame and the N-1 consecutive audio frames immediately before it;

and setting the energy threshold corresponding to the N consecutive audio frames immediately after the current audio frame to the second energy threshold.
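As an illustration only, the threshold schedule above can be sketched in Python as a small streaming helper. The class name `AdaptiveEnergyThreshold` and its interface are hypothetical, and the computation of the second energy threshold from a frame group is passed in as a function (the text specifies it further below as a weighted combination of the average and the median). The sketch assumes frame sequence numbers start at 1 and that `update` is called once per frame, in order.

```python
from typing import Callable, List, Sequence


class AdaptiveEnergyThreshold:
    """Hypothetical block-wise threshold schedule.

    Frames 1..N use a preset first threshold. At every frame whose sequence
    number is a multiple of N, a new threshold is computed from that frame and
    the N-1 frames before it, and is applied to the next N frames.
    """

    def __init__(self, N: int, first_threshold: float,
                 second_threshold_fn: Callable[[Sequence[float]], float]):
        if N < 1:
            raise ValueError("N must be a positive integer")
        self.N = N
        self.current = first_threshold          # threshold applied to upcoming frames
        self.second_threshold_fn = second_threshold_fn
        self.group: List[float] = []            # energy differences of the current N-frame group
        self.n = 0                              # sequence number of the latest frame (1-based)

    def update(self, energy_diff: float) -> float:
        """Register D_n of the next frame and return that frame's threshold."""
        self.n += 1
        self.group.append(energy_diff)
        threshold = self.current                # threshold for frame n itself
        if self.n % self.N == 0:
            # The group now holds frame n and the N-1 frames before it.
            self.current = self.second_threshold_fn(self.group)
            self.group = []
        return threshold


# Toy usage with N = 3 and max() standing in for the second-threshold formula.
sched = AdaptiveEnergyThreshold(N=3, first_threshold=10.0, second_threshold_fn=max)
print([sched.update(d) for d in [1, 2, 3, 4, 5, 6, 7]])  # -> [10.0, 10.0, 10.0, 3, 3, 3, 6]
```

Keeping only the current group in memory means the helper needs storage for at most N energy differences, which suits real-time processing.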

Further, the step of "if the sequence number of the sequentially acquired current audio frame is a natural-number multiple of the preset frame number N, calculating a second energy threshold from the energy differences of the audio frames in the first audio frame group" specifically includes:

if the sequence number of the sequentially acquired current audio frame is a natural-number multiple of the preset frame number N, calculating the average value and the median of the energy differences of the audio frames in the first audio frame group;

and calculating the second energy threshold from the average value and the median.

Further, the "calculating a second energy threshold according to the average value and the median" specifically includes:

calculating the second energy threshold according to the formula $\delta = \alpha \times \mathrm{mean} + \beta \times \mathrm{median} + \gamma$, where $\delta$ is the second energy threshold, mean is the average value, median is the median, $\alpha$ is the weight of the average value, $\beta$ is the weight of the median, and $\gamma$ is a preset constant.
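The weighted formula can be written as a short Python function, shown here as a sketch only. The function name is hypothetical, and the values of alpha, beta, and gamma in the usage line are arbitrary illustrative numbers; the text does not give preferred values. Mixing the average with the median is one plausible way to keep a few unusually large energy differences from dominating the threshold, although the text does not state a motivation.

```python
from statistics import mean, median
from typing import Sequence


def second_energy_threshold(diffs: Sequence[float], alpha: float,
                            beta: float, gamma: float) -> float:
    """delta = alpha * mean + beta * median + gamma over one audio frame group."""
    if len(diffs) == 0:
        raise ValueError("the first audio frame group must not be empty")
    return alpha * mean(diffs) + beta * median(diffs) + gamma


# Illustrative weights only (not taken from the text).
print(second_energy_threshold([1.0, 2.0, 3.0, 10.0], alpha=0.5, beta=0.5, gamma=1.0))  # -> 4.25
```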

As described above, the energy threshold of the audio frames is updated in real time based on the energy differences of a fixed number of preceding audio frames, which gives strong robustness and good adaptability to rhythm detection in different types of songs.

Further, the step of marking the audio frame corresponding to the peak as a rhythm point if the energy differences of the three or more audio frames contain a peak and the peak is greater than the energy threshold corresponding to the current audio frame specifically includes: when the energy differences of three audio frames are acquired, if they satisfy $D_{n-2} < D_{n-1}$ and $D_{n-1} > D_n$ while $D_{n-1} > \delta_n$, marking the audio frame corresponding to $D_{n-1}$ as a rhythm point, where $\delta_n$ is the energy threshold corresponding to the current audio frame, $D_n$ is the energy difference of the current audio frame, $D_{n-1}$ is the energy difference of the audio frame immediately before the current audio frame, and $D_{n-2}$ is the energy difference of the audio frame two frames before the current audio frame.
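A minimal Python sketch of the three-frame test, assuming the three energy differences and the threshold are already available as numbers; the function and parameter names are hypothetical. Note that the frame that gets marked is frame n-1, while the threshold it is compared against is that of the current frame n, as stated above.

```python
def is_rhythm_point(d_n_minus_2: float, d_n_minus_1: float,
                    d_n: float, delta_n: float) -> bool:
    """True if frame n-1 is a local peak of the energy differences and
    its energy difference exceeds delta_n, the threshold of current frame n."""
    is_peak = d_n_minus_2 < d_n_minus_1 and d_n_minus_1 > d_n
    return is_peak and d_n_minus_1 > delta_n


print(is_rhythm_point(1.0, 5.0, 2.0, 3.0))  # -> True
```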

Further, the "sequentially obtaining an audio frame from the audio data as a current audio frame, taking a difference between a sum of spectral energies of the current audio frame and a previous audio frame as an energy difference of the current audio frame, and storing the energy difference" specifically includes:

acquiring the first audio frame of the audio data according to a preset frame length;

performing a Fourier transform on the first audio frame to obtain its spectrum;

calculating the sum of the spectral energy of the first audio frame's spectrum within a preset frequency band;

acquiring the next audio frame of the audio data according to the preset frame length as the current audio frame;

performing a Fourier transform on the current audio frame to obtain its spectrum;

calculating the sum of the spectral energy of the current audio frame's spectrum within the preset frequency band;

and subtracting the spectral energy sum of the previous audio frame from that of the current audio frame to obtain the energy difference of the current audio frame, and storing the energy difference.
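The steps above can be sketched in Python without external libraries, under several assumptions the text leaves open: frames are consecutive and non-overlapping, no window function is applied, the spectral energy of a bin is taken as the squared magnitude of its DFT coefficient, and the preset band is given as a closed interval in hertz. A direct DFT over only the in-band bins keeps the sketch self-contained; a practical implementation would normally use an FFT such as `numpy.fft.rfft` instead. The function names are hypothetical, and the energy difference of the first frame is simply skipped.

```python
import cmath
import math
from typing import List, Sequence


def band_energy_sum(frame: Sequence[float], sample_rate: float,
                    low_hz: float, high_hz: float) -> float:
    """Sum of |X[k]|^2 over one-sided DFT bins whose frequency lies in [low_hz, high_hz]."""
    n = len(frame)
    total = 0.0
    for k in range(n // 2 + 1):                 # bins 0 .. n/2 of the one-sided spectrum
        freq = k * sample_rate / n              # centre frequency of bin k in Hz
        if low_hz <= freq <= high_hz:
            # Direct DFT coefficient X[k]; an FFT would normally replace this.
            x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            total += abs(x_k) ** 2
    return total


def energy_differences(samples: Sequence[float], frame_len: int, sample_rate: float,
                       low_hz: float, high_hz: float) -> List[float]:
    """Return [D_2, D_3, ...] with D_n = S_n - S_(n-1); D_1 is skipped."""
    n_frames = len(samples) // frame_len        # a trailing partial frame is dropped
    sums = [band_energy_sum(samples[i * frame_len:(i + 1) * frame_len],
                            sample_rate, low_hz, high_hz)
            for i in range(n_frames)]
    return [cur - prev for prev, cur in zip(sums, sums[1:])]


# Toy signal: one silent 8-sample frame followed by two constant frames.
print(energy_differences([0.0] * 8 + [1.0] * 8 + [1.0] * 8, 8, 8000, 0, 500))  # -> [64.0, 0.0]
```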

Further, after the step of marking the audio frame corresponding to the peak as a rhythm point, the method further includes:

returning to the step of acquiring the next audio frame of the audio data according to the preset frame length as the current audio frame; and

controlling the linkage of external devices according to the rhythm points, or displaying the audio data according to the rhythm points and the spectral energy of the corresponding audio frames.

As described above, after the rhythm points are detected they are applied to the control of external devices, such as stage lighting, to achieve intelligent interaction between music and lighting; and the audio features corresponding to the rhythm points are displayed so that the user can see the rhythm changes of the audio data intuitively.

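Referring to FIG. 5, the present invention further provides a music rhythm detection system, comprising:

a first acquisition module, configured to acquire audio data of music;

a second acquisition module, configured to sequentially acquire an audio frame from the audio data as the current audio frame, take the difference between the spectral energy sums of the current audio frame and the previous audio frame as the energy difference of the current audio frame, and store the energy difference;

a determining module, configured to determine an energy threshold corresponding to the current audio frame;

a third acquisition module, configured to acquire the energy differences of the current audio frame and of the two or more consecutive audio frames immediately before it, to obtain the energy differences of three or more audio frames;

and a marking module, configured to mark the audio frame corresponding to the peak as a rhythm point if the energy differences of the three or more audio frames contain a peak and the peak is greater than the energy threshold corresponding to the current audio frame.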

Further, the determining module includes:

the first setting unit is used for setting the energy threshold corresponding to the current audio frame as a preset first energy threshold if the sequence number corresponding to the current audio frame acquired in sequence is less than or equal to a preset frame number N;

the first calculation unit is used for calculating a second energy threshold from the energy differences of the audio frames in a first audio frame group if the sequence number of the sequentially acquired current audio frame is a natural-number multiple of the preset frame number N, where the first audio frame group consists of the current audio frame and the N-1 consecutive audio frames immediately before it;

and the second setting unit is used for setting the energy threshold corresponding to the N consecutive audio frames immediately after the current audio frame to the second energy threshold.

Further, the first calculation unit includes:

the second calculation unit is used for calculating the average value and the median of the energy differences of the audio frames in the first audio frame group if the sequence number of the sequentially acquired current audio frame is a natural-number multiple of the preset frame number N;

and the third calculation unit is used for calculating the second energy threshold from the average value and the median.

Further, the third calculation unit is specifically configured to calculate the second energy threshold according to the formula $\delta = \alpha \times \mathrm{mean} + \beta \times \mathrm{median} + \gamma$, where $\delta$ is the second energy threshold, mean is the average value, median is the median, $\alpha$ is the weight of the average value, $\beta$ is the weight of the median, and $\gamma$ is a preset constant.

Further, the marking module is specifically configured to, when the energy differences of three audio frames are acquired and they satisfy $D_{n-2} < D_{n-1}$ and $D_{n-1} > D_n$ while $D_{n-1} > \delta_n$, mark the audio frame corresponding to $D_{n-1}$ as a rhythm point, where $\delta_n$ is the energy threshold corresponding to the current audio frame, $D_n$ is the energy difference of the current audio frame, $D_{n-1}$ is the energy difference of the audio frame immediately before the current audio frame, and $D_{n-2}$ is the energy difference of the audio frame two frames before the current audio frame.

Further, the second obtaining module includes:

the first acquisition unit is used for acquiring the first audio frame of the audio data according to a preset frame length;

the first obtaining unit is used for performing a Fourier transform on the first audio frame to obtain its spectrum;

the fourth calculation unit is used for calculating the sum of the spectral energy of the first audio frame's spectrum within a preset frequency band;

the second acquisition unit is used for acquiring the next audio frame of the audio data according to the preset frame length as the current audio frame;

the second obtaining unit is used for performing a Fourier transform on the current audio frame to obtain its spectrum;

the fifth calculation unit is used for calculating the sum of the spectral energy of the current audio frame's spectrum within the preset frequency band;

and the third obtaining unit is used for subtracting the spectral energy sum of the previous audio frame from that of the current audio frame to obtain the energy difference of the current audio frame, and storing the energy difference.

Further, the system also includes:

a continued-execution module, used for returning to the step of acquiring the next audio frame of the audio data according to the preset frame length as the current audio frame.

Further, the system also includes:

a control module, used for controlling the linkage of external devices according to the rhythm points, or displaying the audio data according to the rhythm points and the spectral energy of the corresponding audio frames.

Example one

Referring to FIG. 2, the first embodiment of the present invention is a music rhythm detection method comprising the following steps:

S1: acquiring audio data of music. Further, the audio data are normalized after they are acquired. The music includes songs and accompaniment.

S2: sequentially acquiring an audio frame from the audio data as the current audio frame, taking the difference between the spectral energy sums of the current audio frame and the previous audio frame as the energy difference $D_n$ of the current audio frame, and storing the energy difference $D_n$;

S3: determining the energy threshold $\delta_n$ corresponding to the current audio frame. In this embodiment, the energy threshold is a preset empirical value, and the same value is used for every audio frame.

S4: acquiring the energy differences of the current audio frame and of the two or more consecutive audio frames immediately before it, to obtain the energy differences of three or more audio frames;

S5: judging whether the energy differences of the three or more audio frames contain a peak that is greater than the energy threshold corresponding to the current audio frame, and if so, executing step S6. For example, when the energy differences of three audio frames are acquired, it is judged whether $D_{n-2} < D_{n-1}$, $D_{n-1} > D_n$, and $D_{n-1} > \delta_n$ all hold, where $D_n$ is the energy difference of the current audio frame, $D_{n-1}$ is the energy difference of the audio frame immediately before the current audio frame, and $D_{n-2}$ is the energy difference of the audio frame two frames before the current audio frame.

S6: marking the audio frame corresponding to the peak as a rhythm point, and then returning to step S2 until all of the audio data has been processed.
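For the fixed-threshold case of this embodiment, steps S4 to S6 can be sketched in Python as a single pass over the stored energy differences. This is a sketch under stated assumptions: the hypothetical function `detect_rhythm_points_fixed` receives a list in which element `i` is the energy difference of frame `i + 1` (so a value for frame 1 must be present, for example its spectral energy sum, which the text allows), and it returns the 1-based sequence numbers of the frames marked as rhythm points. Because the marked frame is the middle one of each three-frame window, it can only be confirmed one frame later.

```python
from typing import List, Sequence


def detect_rhythm_points_fixed(diffs: Sequence[float], threshold: float) -> List[int]:
    """Return 1-based frame numbers marked as rhythm points.

    diffs[i] is the stored energy difference of frame i + 1. For each current
    frame n (index i), frame n - 1 is marked if it is a local peak of the
    energy differences and exceeds the fixed energy threshold.
    """
    points = []
    for i in range(2, len(diffs)):              # i is the index of current frame n
        d_prev2, d_prev1, d_cur = diffs[i - 2], diffs[i - 1], diffs[i]
        if d_prev2 < d_prev1 > d_cur and d_prev1 > threshold:
            points.append(i)                    # frame n - 1 has 1-based number i
    return points


print(detect_rhythm_points_fixed([0.0, 5.0, 1.0, 2.0, 8.0, 3.0], threshold=4.0))  # -> [2, 5]
```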

As shown in fig. 3, step S2 includes the following steps:

S201: acquiring the first audio frame of the audio data according to a preset frame length. Further, the frame length is set according to the sampling rate; for example, at a sampling rate of 44.1 kHz the frame length is 1024 samples.

S202: performing a Fourier transform on the first audio frame to obtain its spectrum;

S203: calculating the sum of the spectral energy of the first audio frame's spectrum within a preset frequency band. The band can be chosen as required, for example the low-frequency, mid-frequency, or high-frequency part, or the full band. The spectral energy sum is denoted $S_n$, where the subscript $n$ is the frame number counted from 1, i.e., the sequence number of the audio frame.

S204: acquiring the next audio frame of the audio data according to the preset frame length as the current audio frame;

S205: performing a Fourier transform on the current audio frame to obtain its spectrum;

S206: calculating the sum of the spectral energy of the current audio frame's spectrum within the preset frequency band;

S207: subtracting the spectral energy sum of the previous audio frame from that of the current audio frame to obtain the energy difference of the current audio frame, and storing the energy difference. Specifically, the energy difference of the current audio frame relative to the previous audio frame is calculated as $D_n = S_n - S_{n-1}$. The energy difference of the first audio frame may be ignored, or the spectral energy sum of the first audio frame may be used directly as its energy difference.
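With the example parameters of S201 (44.1 kHz sampling rate, 1024-sample frames), each frame lasts about 23.2 ms and adjacent DFT bins are 44100 / 1024 ≈ 43.07 Hz apart, so a preset band in hertz maps to a small set of bins. The Python sketch below computes that mapping; the function name and the 0 to 200 Hz band in the usage lines are hypothetical, since the text does not give numeric band edges for the low-, mid-, or high-frequency parts.

```python
from typing import List


def band_bins(low_hz: float, high_hz: float,
              sample_rate: int = 44100, frame_len: int = 1024) -> List[int]:
    """Indices k of one-sided DFT bins (0 .. frame_len // 2) whose
    centre frequency k * sample_rate / frame_len lies in [low_hz, high_hz]."""
    return [k for k in range(frame_len // 2 + 1)
            if low_hz <= k * sample_rate / frame_len <= high_hz]


resolution_hz = 44100 / 1024        # spacing between adjacent bins
frame_ms = 1000 * 1024 / 44100      # duration of one frame in milliseconds
print(round(resolution_hz, 2), round(frame_ms, 1))  # -> 43.07 23.2
print(band_bins(0, 200))                            # -> [0, 1, 2, 3, 4]
```

A narrow low band thus covers only a handful of bins at this frame length, which is worth keeping in mind when choosing the band for bass-driven rhythm.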

Further, after step S6, the process returns to step S204.

Preferably, after step S6, the method further includes: controlling the linkage of external devices according to the rhythm points, or displaying the audio data according to the rhythm points and the spectral energy of the corresponding audio frames.

Controlling the linkage of external devices according to the rhythm points includes controlling lights according to the rhythm points, specifically: a. flashing a light of one color for each rhythm point; b. flashing one type of light (e.g., a spotlight); c. flashing lights of two or more colors in sequence for each rhythm point; d. flashing several types of lights together for each rhythm point; e. flashing lights together with water-spray, air-jet, scream, or applause effects for each rhythm point. In this way, the lighting changes in coordination with the rhythm points.

The audio data may be displayed according to the rhythm points and the spectral energy of the corresponding audio frames in any of the following ways:

displaying the audio data as an electrocardiogram-style trace, in which the change in potential represents the change in the spectral energy sum of the audio frames corresponding to the rhythm points, and the speed of the potential change represents how quickly rhythm points occur;

or displaying the audio data as a bar chart whose bars move left and right, in which the length of a bar represents the spectral energy sum of the audio frame corresponding to a rhythm point, and the speed at which the bars move represents how quickly rhythm points occur;

or displaying the audio data as a bar chart whose bars rise and fall, in which the height of a bar represents the spectral energy sum of the audio frame corresponding to a rhythm point, and the speed at which the bars rise and fall represents how quickly rhythm points occur.

In this embodiment, rhythm points in the audio data are detected from the energy differences between audio frames, which can be done in real time with high accuracy; no model needs to be built, because rhythm points are determined by analyzing and comparing the energy differences of several adjacent audio frames, which improves detection efficiency. Once detected, the rhythm points are applied to the control of external devices, such as stage lighting, to achieve intelligent interaction between music and lighting; and displaying the audio features corresponding to the rhythm points lets the user see the rhythm changes of the audio data intuitively.

Example two

This embodiment is a further development of the first embodiment; the points it shares with the first embodiment are not repeated. The difference is that, in step S3, the energy threshold is not fixed.

As shown in fig. 4, step S3 includes the following steps:

S301: judging whether the sequence number of the sequentially acquired current audio frame is less than or equal to the preset frame number N; if so, executing step S302; if not, executing step S303.

S302: setting the energy threshold corresponding to the current audio frame to the preset first energy threshold;

S303: judging whether the sequence number of the sequentially acquired current audio frame is a natural-number multiple of the preset frame number N, and if so, executing step S304. That is, the sequence number of the current audio frame is obtained and it is judged whether it is exactly divisible by the preset frame number N.

S304: calculating a second energy threshold from the energy differences of the audio frames in a first audio frame group, where the first audio frame group consists of the current audio frame and the N-1 consecutive audio frames immediately before it. Specifically, the energy differences of the current audio frame and the N-1 consecutive audio frames before it are acquired; the average value and the median of these energy differences are calculated; and the second energy threshold is calculated from them according to the formula $\delta = \alpha \times \mathrm{mean} + \beta \times \mathrm{median} + \gamma$, where $\delta$ is the second energy threshold, mean is the average value, median is the median, $\alpha$ is the weight of the average value, $\beta$ is the weight of the median, and $\gamma$ is a preset constant.

S305: setting the energy threshold corresponding to the N consecutive audio frames immediately after the current audio frame to the second energy threshold.

For example, assume the preset frame number N is 150. The energy thresholds of the first 150 frames of the audio data, i.e., frames 1 to 150, are all the preset first energy threshold. When frame 150 is reached, its sequence number is also a natural-number multiple of N, so the energy differences of frame 150 and the 149 consecutive frames immediately before it, i.e., frames 1 to 150, are acquired; their average value and median are calculated; the second energy threshold is obtained by the weighted calculation; and this second energy threshold is used as the energy threshold of the 150 frames immediately after frame 150, i.e., frames 151 to 300. Similarly, when frame 300 is reached, the energy threshold for frames 301 to 450 is calculated from the energy differences of frames 151 to 300, and so on.
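The frame bookkeeping in this example can be checked with a short Python sketch. The hypothetical function `threshold_source` returns `None` for frames that use the preset first energy threshold and otherwise the range of frames whose energy differences produced that frame's second energy threshold; it assumes 1-based frame numbers as in the text.

```python
from typing import Optional, Tuple


def threshold_source(n: int, N: int) -> Optional[Tuple[int, int]]:
    """For 1-based frame number n, return None if the frame uses the preset
    first energy threshold, otherwise the (first, last) frame numbers whose
    energy differences were used to compute its second energy threshold."""
    if n < 1 or N < 1:
        raise ValueError("n and N must be positive integers")
    if n <= N:
        return None
    k = (n - 1) // N              # the threshold was computed at frame k * N
    return ((k - 1) * N + 1, k * N)


for frame in (150, 151, 300, 301, 451):
    print(frame, threshold_source(frame, 150))
```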

The energy ranges of different songs differ, and the energy range can also drift between different parts of the same song, so a single uniform threshold adapts poorly: if the threshold is too high, rhythm points are missed, and if it is too low, spurious rhythm points are detected. This embodiment provides an adaptive threshold method that adjusts the energy threshold according to the energy differences of the audio frames already processed, so that the threshold better matches the audio data currently being processed. This avoids detecting too few or too many rhythm points, further improves detection accuracy, and makes the method applicable to rhythm detection in many types of music with strong adaptability and robustness.

Example three

Referring to FIG. 6, this embodiment is a music rhythm detection system corresponding to the above embodiments, including:

a first obtaining module 1, configured to obtain audio data of music;

a second obtaining module 2, configured to sequentially obtain an audio frame from the audio data as a current audio frame, use a difference between a sum of spectral energies of the current audio frame and a previous audio frame as an energy difference of the current audio frame, and store the energy difference;

a determining module 3, configured to determine an energy threshold corresponding to a current audio frame;

a third obtaining module 4, configured to obtain an energy difference value between a current audio frame and two or more previous consecutive audio frames adjacent to the current audio frame, so as to obtain energy difference values of the three or more audio frames;

and the marking module 5 is configured to mark the audio frame corresponding to the peak as a rhythm point if the peak exists in the energy difference values of the more than three audio frames and the peak is greater than the energy threshold corresponding to the current audio frame.

Further, the determining module 3 includes:

a first setting unit 31, configured to set an energy threshold corresponding to the current audio frame as a preset first energy threshold if a sequence number corresponding to the sequentially acquired current audio frame is less than or equal to a preset frame number N;

the first calculating unit 32 is configured to calculate a second energy threshold according to an energy difference value of each audio frame in a first audio frame group if a sequence number corresponding to a current audio frame obtained in sequence is a natural number multiple of a preset frame number N, where the first audio frame group includes the current audio frame and N-1 consecutive audio frames before the current audio frame and adjacent to the current audio frame;

a second setting unit 33, configured to set an energy threshold corresponding to N consecutive audio frames after the current audio frame and adjacent to the current audio frame as the second energy threshold.

Further, the first calculation unit 32 includes:

a second calculating unit 321, configured to calculate, if the sequence number corresponding to the sequentially obtained current audio frame is a natural number multiple of the preset frame number N, an average value and a median value of energy difference values according to the energy difference value of each audio frame in the first audio frame group;

a third calculating unit 322, configured to calculate the second energy threshold from the average value and the median value.

Further, the third calculating unit 322 is specifically configured to calculate the second energy threshold $T$ according to the formula $T = \alpha \cdot \mathrm{mean} + \beta \cdot \mathrm{median} + \gamma$, where $\mathrm{mean}$ is the average value, $\mathrm{median}$ is the median value, $\alpha$ is the weight of the average value, $\beta$ is the weight of the median value, and $\gamma$ is a preset constant.
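The weighted combination α × mean + β × median + γ computed by unit 322 can be sketched with Python's standard `statistics` module. The specification does not give values for α, β or γ, so the numbers in the example call are arbitrary assumptions chosen only to make the arithmetic easy to follow, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def second_energy_threshold(
    diffs: Sequence[float], alpha: float, beta: float, gamma: float
) -> float:
    """Weighted mean/median threshold over one group of N energy difference values."""
    mean = statistics.mean(diffs)      # sensitive to a few very large differences
    median = statistics.median(diffs)  # robust to those outliers
    return alpha * mean + beta * median + gamma


# Assumed example weights: mean = 4.0, median = 2.5, so T = 2.0 + 1.25 + 1.0.
threshold = second_energy_threshold([1.0, 2.0, 3.0, 10.0], alpha=0.5, beta=0.5, gamma=1.0)
```

Blending the two statistics is one plausible reading of why both are used: a single strong beat inflates the mean, while the median reflects the typical level of the group, so the weights trade sensitivity against robustness.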

Further, the marking module 5 is specifically configured such that, when the energy difference values of three audio frames are obtained, if they satisfy $D_{n-2} < D_{n-1}$ and $D_{n-1} > D_n$, and also $D_{n-1} > T_n$, the audio frame corresponding to $D_{n-1}$ is marked as a rhythm point, where $T_n$ is the energy threshold corresponding to the current audio frame, $D_n$ is the energy difference value of the current audio frame, $D_{n-1}$ is the energy difference value of the audio frame immediately preceding the current audio frame, and $D_{n-2}$ is the energy difference value of the audio frame two frames before the current audio frame.
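The three-frame marking rule can be sketched as follows, as an illustration rather than the patented implementation. It assumes energy differences `diffs` and per-frame thresholds `thresholds` are parallel lists (0-based), so that for each current frame n the middle frame n-1 is marked when D[n-2] < D[n-1], D[n-1] > D[n] and D[n-1] > T[n]; the function name is hypothetical.

```python
from typing import List, Sequence


def detect_rhythm_points(
    diffs: Sequence[float], thresholds: Sequence[float]
) -> List[int]:
    """Return 0-based indices of frames marked as rhythm points."""
    points: List[int] = []
    for n in range(2, len(diffs)):
        d_prev2, d_prev1, d_cur = diffs[n - 2], diffs[n - 1], diffs[n]
        # Local peak at n-1 that also exceeds the threshold of the current frame n.
        if d_prev2 < d_prev1 and d_prev1 > d_cur and d_prev1 > thresholds[n]:
            points.append(n - 1)
    return points


peaks = detect_rhythm_points([0.0, 5.0, 1.0, 2.0, 8.0, 3.0], [4.0] * 6)
```

Because a peak at frame n-1 can only be confirmed once frame n has arrived, the detector reports each rhythm point with one frame of latency; the value 2.0 at index 3 is rejected because its successor is larger.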

Further, the second obtaining module 2 includes:

a first obtaining unit 21, configured to obtain a first audio frame of the audio data according to a preset frame length;

a first transforming unit 22, configured to perform a Fourier transform on the first audio frame to obtain the frequency spectrum of the first audio frame;

a fourth calculating unit 23, configured to calculate a sum of spectral energies of the frequency spectrum of the first audio frame in a preset frequency band;

a second obtaining unit 24, configured to obtain a next audio frame of the audio data according to a preset frame length, as a current audio frame;

a second transforming unit 25, configured to perform a Fourier transform on the current audio frame to obtain the frequency spectrum of the current audio frame;

a fifth calculating unit 26, configured to calculate a sum of spectral energies of the frequency spectrum of the current audio frame in a preset frequency band;

a third obtaining unit 27, configured to subtract the sum of the spectral energy of the previous audio frame from the sum of the spectral energy of the current audio frame to obtain an energy difference value of the current audio frame, and store the energy difference value.
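The per-frame processing of units 21 to 27 can be sketched in pure Python as shown below. This is a sketch under stated assumptions, not the disclosed embodiment: frames are taken back to back without overlap or windowing, the spectrum is computed by a direct DFT only over the bins inside the band (a real system would use an FFT library), and the band limits `low_hz`/`high_hz`, like both function names, are hypothetical since the specification leaves the preset frequency band unspecified.

```python
import cmath
import math
from typing import List, Optional, Sequence


def band_energy(frame: Sequence[float], sample_rate: float,
                low_hz: float, high_hz: float) -> float:
    """Sum of |X[k]|^2 over DFT bins whose frequency lies in [low_hz, high_hz]."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins up to Nyquist
        freq = k * sample_rate / n
        if low_hz <= freq <= high_hz:
            x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            energy += abs(x_k) ** 2
    return energy


def energy_differences(samples: Sequence[float], frame_len: int, sample_rate: float,
                       low_hz: float, high_hz: float) -> List[float]:
    """Energy difference D = E(current frame) - E(previous frame) for each frame after the first."""
    diffs: List[float] = []
    prev_energy: Optional[float] = None
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        energy = band_energy(samples[start:start + frame_len], sample_rate, low_hz, high_hz)
        if prev_energy is not None:
            diffs.append(energy - prev_energy)
        prev_energy = energy
    return diffs


# Silence, then a constant (DC) frame, then silence again.
demo = energy_differences([0.0] * 8 + [1.0] * 8 + [0.0] * 8, 8, 8.0, 0.0, 4.0)
```

The first frame only supplies a reference energy, which matches the order of units 21 to 23 before the loop of units 24 to 27. Differences can be negative when energy falls, and only rising energy can produce the peaks that the marking rule looks for.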

Further, the system further comprises:

a continuous execution module 6, configured to return to and repeat the step of obtaining the next audio frame of the audio data according to the preset frame length as the current audio frame.

Further, the system further comprises:

a control module 7, configured to control linked external equipment according to the rhythm points, or to display the audio data according to the rhythm points and the spectral energy of the corresponding audio frames.

In summary, the music rhythm detection method and system provided by the invention detect rhythm points in audio data from the energy differences between audio frames. Because each decision uses only the current audio frame and a few frames before it, detection can run in real time, with higher accuracy. The energy threshold is adjusted adaptively from the energy difference values of the audio frames already processed, so that it matches the audio data currently being processed; this prevents too few or too many rhythm points from being detected, further improves detection accuracy, and makes the method suitable for rhythm detection in many types of music, with strong adaptability and robustness. Once detected, the rhythm points can be applied to the control of external equipment, such as stage lighting, to achieve intelligent interaction between music and lighting. The audio features corresponding to the rhythm points can also be displayed, so that the user can see the rhythm changes of the audio data directly.

The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or applied directly or indirectly to the related technical fields, are included in the scope of the present invention.

Claims

1. A music rhythm detection method, comprising:

acquiring audio data of music;

determining an energy threshold corresponding to the current audio frame;

2. The method according to claim 1, wherein the "determining the energy threshold corresponding to the current audio frame" specifically comprises:

3. The music rhythm detection method according to claim 2, wherein said "if the sequence number of the sequentially obtained current audio frame is a positive integer multiple of the preset number of frames N, calculating a second energy threshold according to the energy difference value of each audio frame in the first audio frame group" specifically comprises:

4. The music rhythm detection method according to claim 3, wherein said "calculating a second energy threshold according to the average value and the median value" specifically comprises:

5. The music rhythm detection method according to claim 1, wherein the step of "if a peak exists in the energy difference values of the three or more audio frames and the peak is greater than the energy threshold corresponding to the current audio frame, marking the audio frame corresponding to the peak as a rhythm point" specifically comprises: when the energy difference values of three audio frames are obtained, if they satisfy $D_{n-2} < D_{n-1}$ and $D_{n-1} > D_n$, and also $D_{n-1} > T_n$, marking the audio frame corresponding to $D_{n-1}$ as a rhythm point, where $T_n$ is the energy threshold corresponding to the current audio frame, $D_n$ is the energy difference value of the current audio frame, $D_{n-1}$ is the energy difference value of the audio frame immediately preceding the current audio frame, and $D_{n-2}$ is the energy difference value of the audio frame two frames before the current audio frame.

6. The music rhythm detection method according to claim 1, wherein said "sequentially obtaining an audio frame from the audio data as a current audio frame, taking the difference between the spectral energy sums of the current audio frame and the previous audio frame as the energy difference value of the current audio frame, and storing the energy difference value" specifically comprises:

7. The music rhythm detection method according to claim 6, further comprising, after marking the audio frame corresponding to the peak as a rhythm point:

8. The music rhythm detection method according to claim 1, further comprising, after marking the audio frame corresponding to the peak as a rhythm point:

9. A music rhythm detection system, characterized by comprising:

a first acquisition module, configured to acquire audio data of music;

10. The music rhythm detection system according to claim 9, wherein the determining module comprises:

11. The music rhythm detection system according to claim 10, wherein the first calculating unit comprises:

12. The music rhythm detection system according to claim 9, further comprising: