CN104318931B

CN104318931B - Method for acquiring emotional activity of audio file, and method and device for classifying audio file

Info

Publication number: CN104318931B
Application number: CN201410521416.8A
Authority: CN
Inventors: 王徽蓉
Original assignee: Beijing Yinzhibang Culture Technology Co ltd
Current assignee: Shenzhen Taile Culture Technology Co ltd
Priority date: 2014-09-30
Filing date: 2014-09-30
Publication date: 2017-11-21
Anticipated expiration: 2034-09-30
Also published as: CN104318931A

Abstract

The invention provides a method for acquiring the emotional activity of an audio file, and a method and a device for classifying audio files by emotional activity. The acquisition method comprises the following steps: acquiring a spectrogram of the audio file; acquiring, from the spectrogram, the number of peak points of the voice frequency in the audio file; and determining the emotional activity of the audio file according to the number of peak points and the duration of the audio file. Because the number of peak points is obtained from the spectrogram and combined with the duration of the audio file, the emotional activity of the audio file is quantified, which gives a user a basis for selecting songs according to emotional activity.

Description

Method for acquiring the emotional activity of an audio file, and classification method and device

Technical field

The present invention relates to the field of voice processing technology, and more particularly to a method for acquiring the emotional activity of an audio file, a classification method, and corresponding devices.

Background

In the prior art, emotion analysis of an audio file is performed by analyzing the file, extracting its audio features, and classifying the file by means of pattern recognition.

In the pattern-recognition approach, features of the audio file are extracted first, for example intensity features, timbre features, and spectrum-related features. After feature extraction, a classifier model is trained by supervised learning; once the trained model is established, it can predict the class of an unknown audio file. Pattern recognition can therefore sort audio files into several categories, but it cannot quantify the emotion expressed by an audio file.

Summary of the invention

Embodiments of the present invention provide a method for acquiring the emotional activity of an audio file, a classification method, and corresponding devices. By quantifying the emotional activity of an audio file, they give a user a basis for selecting songs according to emotional activity.

To achieve the above purpose, embodiments of the invention adopt the following technical solutions:

A method for acquiring the emotional activity of an audio file, the method including:

Acquiring a spectrogram of the audio file;

Acquiring, from the spectrogram, the number of peak points of the voice frequency in the audio file;

Determining the emotional activity of the audio file from the number of peak points and the duration of the audio file.

A method for classifying audio files, the method including:

Acquiring the emotional activity of the audio file by the method described in the above technical solution;

Classifying the music files in a music library according to the emotional activity.

A device for acquiring the emotional activity of an audio file, the device including:

A spectrogram acquisition module, for acquiring a spectrogram of the audio file;

A peak-number acquisition module, for acquiring, from the spectrogram, the number of peak points of the voice frequency in the audio file;

An emotional activity determining module, for determining the emotional activity of the audio file from the number of peak points and the duration of the audio file.

A device for classifying audio files, the device including:

The device for acquiring the emotional activity of an audio file described in the above technical solution, for acquiring the emotional activity of the audio file by the acquisition method described in the above technical solution;

A classification module, for classifying the music files in a music library according to the emotional activity.

The acquisition method, classification method, and devices provided by the embodiments of the present invention obtain, from a spectrogram, the number of peak points of the voice frequency in an audio file, and determine the emotional activity of the audio file from the number of peak points and the duration of the audio file. The emotional activity of the audio file is thereby quantified, which gives a user a basis for selecting songs according to emotional activity.

Brief description of the drawings

Fig. 1 is a schematic diagram of a spectrogram provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of the method for acquiring the emotional activity of an audio file provided by Embodiment one of the present invention;

Fig. 3 is a schematic flowchart of the method for acquiring the emotional activity of an audio file provided by Embodiment two of the present invention;

Fig. 4 is a schematic diagram, on time and frequency axes, of the spectrogram of the embodiment shown in Fig. 3 before the filtering of steps 320 to 330;

Fig. 5 is a schematic diagram, on time and frequency axes, of the spectrogram of the embodiment shown in Fig. 3 after the filtering of steps 320 to 330;

Fig. 6 is a schematic flowchart of the method for acquiring the emotional activity of an audio file provided by Embodiment three of the present invention;

Fig. 7 is a schematic structural diagram of the device for acquiring the emotional activity of an audio file provided by Embodiment four of the present invention;

Fig. 8 is a schematic structural diagram of the device for acquiring the emotional activity of an audio file provided by Embodiment five of the present invention;

Fig. 9 is a schematic structural diagram of the device for acquiring the emotional activity of an audio file provided by Embodiment six of the present invention.

Detailed description

The method for acquiring the emotional activity of an audio file, the classification method, and the devices provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 is a schematic diagram of a spectrogram provided by an embodiment of the present invention. As shown in Fig. 1, the X axis of the spectrogram represents time (the oblique axis in Fig. 1), the Y axis represents frequency (the axis pointing horizontally to the right in Fig. 1), and the Z axis represents the energy of the voice data. The voice signal is transformed from the time domain to obtain a frequency-domain spectrum, and this spectrum is the spectrogram.

In Fig. 1, some points are darker than their surroundings. A darker point is a point of the voice signal whose amplitude is the highest relative to the points around it, and such a point can serve as a peak point as described in the embodiments of the present invention. A peak point is therefore not determined by its own amplitude value alone; it is a point whose amplitude is larger than that of the surrounding points.
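The relative nature of a peak point can be illustrated with a minimal sketch. It is a one-dimensional simplification (the spectrogram itself has a time axis and a frequency axis), and the function name `local_peaks` and the sample values are hypothetical, chosen only for illustration.

```python
def local_peaks(values):
    """Return indices whose value is strictly greater than both neighbours."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1] and values[i] > values[i + 1]
    ]

# The small bump at index 2 is a peak point. The much larger value at
# index 6 is not, because its right-hand neighbour is larger still.
row = [0.1, 0.2, 0.5, 0.2, 0.1, 3.0, 6.0, 9.0]
peak_indices = local_peaks(row)
```

The point with amplitude 0.5 qualifies while the point with amplitude 6.0 does not, which matches the statement that a peak point is defined relative to its surroundings rather than by its absolute amplitude.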

The embodiments of the invention are described in more detail below.

Embodiment one:

Fig. 2 is a schematic flowchart of the method for acquiring the emotional activity of an audio file provided by Embodiment one of the present invention. As shown in Fig. 2, this embodiment includes the following steps:

Step 210, acquire a spectrogram of the audio file.

Step 220, acquire, from the spectrogram, the number of peak points of the voice frequency in the audio file.

Step 230, determine the emotional activity of the audio file from the number of peak points and the duration of the audio file.

The processing in step 210 can specifically be as follows: the audio file is decoded; the decoded signal is resampled at a predetermined sampling frequency (for example, 44100 Hz); the resampled audio is merged into a single channel (mono); the merged audio is divided into frames (for example, with a frame length of 2048 and a frame interval of 256) and a Hanning window is applied; and a Fourier transform is performed on the processed audio to obtain the spectrogram.
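The framing, windowing, and transform stages of step 210 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the patent's implementation: decoding and resampling are assumed to have been done already, the function names (`fft`, `to_mono`, `spectrogram`) are hypothetical, and in practice a numerical library would normally replace the hand-written FFT.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def to_mono(channels):
    """Merge equally long channels by averaging them sample by sample."""
    return [sum(s) / len(s) for s in zip(*channels)]

def spectrogram(samples, frame_len=2048, hop=256):
    """Return one magnitude spectrum (bins 0..frame_len/2) per frame."""
    # Symmetric Hanning window of frame_len points.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = fft(frame)
        frames.append([abs(c) for c in spectrum[: frame_len // 2 + 1]])
    return frames

# Usage: a synthetic 1000 Hz stereo tone, assumed already sampled at 44100 Hz.
sample_rate = 44100
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(4096)]
spec = spectrogram(to_mono([tone, tone]))
strongest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

With a frame length of 2048 at 44100 Hz, each bin spans about 21.5 Hz, so the 1000 Hz tone falls nearest to bin 46.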

The method for acquiring the emotional activity of an audio file provided by this embodiment obtains, from the spectrogram, the number of peak points of the voice frequency in the audio file, and determines the emotional activity of the audio file from the number of peak points and the duration of the audio file. The emotional activity of the audio file is thereby quantified, which gives a user a basis for selecting songs according to emotional activity.

Embodiment two:

Fig. 3 is a schematic flowchart of the method for acquiring the emotional activity of an audio file provided by Embodiment two of the present invention. As shown in Fig. 3, this embodiment includes the following steps:

Step 310, acquire a spectrogram of the audio file.

Step 320, perform peak-seeking filtering on the spectrogram along the frequency axis with a first filter.

Step 330, perform peak-seeking filtering on the spectrogram along the time axis with a second filter.

Step 340, count the number of peak points of the audio file in the filtered spectrogram.

Step 350, divide the number of peak points by the duration of the audio file to obtain the emotional activity of the audio file.

For the specific processing of step 310 in this embodiment, refer to the description of step 210 in Embodiment one; it is not repeated here.

In step 320, the first filter can be defined by a filter function, and peak-seeking filtering is performed on the spectrogram along the frequency axis with the first filter. The filter function described in the embodiments of the present invention is given by formula (1):

where y_i denotes the i-th value on the frequency axis of the spectrogram, y'_i denotes the i-th value on the frequency axis of the spectrogram after filtering, α is an empirical value that can be determined according to the characteristics of the audio, σ is the coefficient of the Gaussian function, m is the half-width of the first filter, and 2m+1 is the width of the first filter. Furthermore, by adjusting the parameter α, the amplitude value of a peak point that is small in amplitude but comparatively prominent can be increased.

In step 330, the second filter can likewise filter the spectrogram along the time axis using formula (1) above. In that case, in formula (1), y_i denotes the i-th value on the time axis of the spectrogram, y'_i denotes the i-th value on the time axis of the spectrogram after filtering, α is an empirical value that can be determined according to the characteristics of the audio, σ is the coefficient of the Gaussian function, m is the half-width of the second filter, and 2m+1 is the width of the second filter. Here too, adjusting the parameter α can increase the amplitude value of a peak point that is small in amplitude but comparatively prominent. Those skilled in the art will understand that, given the characteristics of the time axis, parameter values different from those used for the frequency axis can be set.

Because a voice signal decays at different rates along the time axis and along the frequency axis, the embodiments of the present invention use the first filter in step 320 and the second filter in step 330 to perform peak-seeking filtering separately on the frequency axis and the time axis. This makes the filtering more targeted and avoids both filtering out real peak points and retaining false peaks as real ones, so that the count of peak points is more accurate.

In addition, to obtain a better filtering effect, step 330 can be performed multiple times, so that the second filter filters the spectrogram along the time axis a second time or more. This makes the peak points sharper and the subsequent count of peak points more accurate.

Optionally, in steps 320 and 330, because the purpose of peak-seeking filtering is to obtain the points in the spectrogram whose values are larger than those of the surrounding points, the specific filter function above does not limit the embodiments of the present invention. Other similar filter functions can also be used for the filtering, as long as they improve the accuracy with which peak points are obtained.
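Since other filter functions are permitted, the two-pass structure of steps 320 and 330 can be sketched with a stand-in filter. The filter below is an assumption made purely for illustration and is not formula (1): it subtracts α times a Gaussian-weighted local mean over a window of width 2m+1 and clips negative results to zero. Only the parameter names α, σ, and m are taken from the text; all function names are hypothetical.

```python
import math

def gaussian_weights(m, sigma):
    """Normalised Gaussian weights for offsets -m..m (window width 2m+1)."""
    raw = [math.exp(-(j * j) / (2.0 * sigma * sigma)) for j in range(-m, m + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def peak_filter_1d(values, alpha, sigma, m):
    """Stand-in filter: subtract alpha times the Gaussian-weighted local mean."""
    weights = gaussian_weights(m, sigma)
    last = len(values) - 1
    out = []
    for i, y in enumerate(values):
        # Clamp indices at the borders so the window always has 2m+1 samples.
        local = sum(
            w * values[min(max(i + j, 0), last)]
            for w, j in zip(weights, range(-m, m + 1))
        )
        out.append(max(0.0, y - alpha * local))
    return out

def filter_frequency_axis(spec, alpha, sigma, m):
    """Step 320: filter each frame (row) along the frequency axis."""
    return [peak_filter_1d(frame, alpha, sigma, m) for frame in spec]

def filter_time_axis(spec, alpha, sigma, m):
    """Step 330: filter each frequency bin (column) along the time axis."""
    columns = [peak_filter_1d(list(col), alpha, sigma, m) for col in zip(*spec)]
    return [list(row) for row in zip(*columns)]

# Usage: rows are frames (time), columns are frequency bins.
spec = [
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 4.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]
step_320 = filter_frequency_axis(spec, alpha=1.0, sigma=1.0, m=1)
step_330 = filter_time_axis(step_320, alpha=1.0, sigma=1.0, m=1)
```

Keeping the two passes as separate functions allows different α, σ, and m values per axis, in line with the remark that the time axis may need parameters different from the frequency axis. Calling `filter_time_axis` repeatedly corresponds to performing step 330 multiple times.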

In step 340, counting the number of peak points of the audio file in the filtered spectrogram can specifically be as follows: a first plurality of peak points is obtained from the filtered spectrogram; the amplitude of each of the first plurality of peak points is compared with a preset threshold; the peak points whose amplitude is less than the preset threshold are filtered out, giving a second plurality of peak points; and the number of the second plurality of peak points is counted, giving the number of peak points of the voice frequency in the audio file. For example, suppose the filtered spectrogram contains multiple peak points (say 100 in fact, although they have not yet been counted at this stage). The amplitude of each of these 100 peak points is compared with the preset threshold, and the peak points whose amplitude is below the threshold are filtered out, leaving multiple peak points (say 50 in fact, again not yet counted). Counting the remaining peak points gives 50 as the number of peak points of the voice frequency in the audio file.
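The counting procedure of step 340 can be sketched as follows. The text does not specify how the first plurality of peak points is located, so the 8-neighbour comparison used here is an assumption, and the function names and sample values are hypothetical.

```python
def find_peaks_2d(spec):
    """Return (frame, bin, amplitude) for points above all of their neighbours."""
    peaks = []
    rows, cols = len(spec), len(spec[0])
    for t in range(rows):
        for f in range(cols):
            value = spec[t][f]
            # Collect the up to 8 neighbours that lie inside the spectrogram.
            neighbours = [
                spec[t + dt][f + df]
                for dt in (-1, 0, 1)
                for df in (-1, 0, 1)
                if (dt, df) != (0, 0)
                and 0 <= t + dt < rows
                and 0 <= f + df < cols
            ]
            if all(value > n for n in neighbours):
                peaks.append((t, f, value))
    return peaks

def count_peaks(spec, threshold):
    """Count peak points whose amplitude is not below the preset threshold."""
    first = find_peaks_2d(spec)                        # first plurality
    second = [p for p in first if p[2] >= threshold]   # second plurality
    return len(second)

# Usage on a small filtered spectrogram (rows: frames, columns: bins).
filtered = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
peak_count = count_peaks(filtered, threshold=0.5)
```

Here two local maxima form the first plurality, and the one with amplitude 0.2 is dropped by the threshold, leaving a count of one.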

Furthermore, because the filtering described above raises the amplitude of peak points that are small in amplitude but large relative to their adjacent points, the embodiments of the present invention can use a single, uniform preset threshold.

Fig. 4 is a schematic diagram, on time and frequency axes, of the spectrogram before the filtering of steps 320 to 330, and Fig. 5 is a schematic diagram, on time and frequency axes, of the spectrogram after the filtering of steps 320 to 330. As shown in Fig. 4 and Fig. 5, the waveform in the spectrogram before filtering has many burrs and the peak points are not distinct, whereas in the filtered spectrogram the burrs are largely eliminated and the peak points are clearer. The filtering of steps 320 to 330 thus largely removes the burrs from the waveform in the spectrogram and makes the peak points more distinct, so that the subsequent count of peak points can be more accurate.
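Step 350 then reduces to a single division, sketched below. The text does not state the unit of the duration; seconds are assumed here, and the function name is hypothetical.

```python
def emotional_activity(peak_count, duration_seconds):
    """Step 350: number of peak points divided by the duration of the file."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return peak_count / duration_seconds

# 50 peak points (as in the example of step 340) over an assumed 100 seconds.
activity = emotional_activity(peak_count=50, duration_seconds=100.0)
```

Dividing by the duration makes the value comparable between audio files of different lengths.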

Embodiment three:

Fig. 6 is a schematic flowchart of the method for acquiring the emotional activity of an audio file provided by Embodiment three of the present invention. As shown in Fig. 6, the method of this embodiment includes the following steps:

Step 610, acquire a spectrogram of the audio file.

Step 620, perform peak-seeking filtering on the spectrogram along the frequency axis with a first filter.

Step 630, perform peak-seeking filtering on the spectrogram along the time axis with a second filter.

Step 640, count the number of peak points of the audio file in the filtered spectrogram.

Step 650, determine the melody complexity of the audio file from the number of peak points and the duration of the audio file.

Step 660, determine the emotional activity of the audio file according to the melody complexity and the rhythm intensity of the audio file.

For the specific processing of step 610 in this embodiment, refer to the description of step 210 in Embodiment one; it is not repeated here.

For the specific processing and beneficial technical effects of steps 620 to 640 in this embodiment, refer to the description of steps 320 to 340 in Embodiment two; they are not repeated here.

In step 650, analysis of song files shows that the higher the melody complexity of a song, the higher its emotional activity. Related experiments show that the melody complexity of a typical song lies between 0.3 and 1.3.

Because melody complexity is closely related to emotional activity, and a pronounced rhythm directly strengthens emotional activity, step 660 of this embodiment can determine the emotional activity of the audio file according to the melody complexity and the rhythm intensity of the audio file.

The embodiments of the present invention can obtain the emotional activity of the audio file by formula (2):

A = C + X * 0.2 (2)

where A denotes the emotional activity, C denotes the melody complexity, and B denotes the rhythm intensity; X = C * C * B, and if X >= 1, X is set to 1. The rhythm intensity in the embodiments of the present invention can be obtained by prior-art methods and is not described in detail here.
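Formula (2) and the cap on X translate directly into code. The sketch below uses a hypothetical function name, and the input values are illustrative only; the text does not state the value range of the rhythm intensity B.

```python
def emotional_activity_with_rhythm(melody_complexity, rhythm_intensity):
    """Formula (2): A = C + X * 0.2, with X = C * C * B capped at 1."""
    x = melody_complexity * melody_complexity * rhythm_intensity
    if x >= 1:
        x = 1
    return melody_complexity + x * 0.2

# C = 0.5, B = 0.8 gives X = 0.2, so A is about 0.54.
moderate = emotional_activity_with_rhythm(0.5, 0.8)
# C = 1.3, B = 1.0 gives X = 1.69, which is capped at 1, so A is about 1.5.
capped = emotional_activity_with_rhythm(1.3, 1.0)
```

Because X is capped at 1, the rhythm term can add at most 0.2 to the melody complexity.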

This embodiment determines the emotional activity of the audio file according to the melody complexity and the rhythm intensity of the audio file. By including rhythm intensity, another factor that influences emotional activity, in the calculation, it makes the emotional activity more accurate and reliable.

The embodiments of the present invention also provide a method for classifying audio files, which includes the following steps:

First, acquire the emotional activity of the audio file by the method of any one of Embodiments one to three;

Second, classify the music files in a music library according to the emotional activity.

Classifying the music files in the music library by emotional activity can meet users' general needs of the music library, and songs can further be classified by scene according to emotional activity to enable personalized recommendation, which has a positive effect on the user's listening experience. In addition, the effectiveness of the embodiments of the present invention can be established by quantitative evaluation based on subjective perception and manual annotation.
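A minimal sketch of such a classification is shown below. The text does not specify class names or boundaries, so the three classes and the boundary values 0.5 and 0.9 are hypothetical, as are the function name and the track names.

```python
def classify_library(activity_by_track, boundaries=(0.5, 0.9)):
    """Group track names into three hypothetical classes by emotional activity."""
    low, high = boundaries
    classes = {"calm": [], "moderate": [], "lively": []}
    for track, activity in sorted(activity_by_track.items()):
        if activity < low:
            classes["calm"].append(track)
        elif activity < high:
            classes["moderate"].append(track)
        else:
            classes["lively"].append(track)
    return classes

# Usage: emotional activity values assumed to have been computed beforehand.
library = {"track_a": 0.35, "track_b": 0.7, "track_c": 1.2}
classes = classify_library(library)
```

Because the emotional activity is a number rather than a label, the boundaries can be moved or the number of classes changed without recomputing anything per track.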

Embodiment four:

Fig. 7 is a schematic structural diagram of the device for acquiring the emotional activity of an audio file provided by Embodiment four of the present invention. As shown in Fig. 7, the device of this embodiment includes:

A spectrogram acquisition module 710, for acquiring a spectrogram of the audio file.

A peak-number acquisition module 720, for acquiring, from the spectrogram, the number of peak points of the voice frequency in the audio file.

An emotional activity determining module 730, for determining the emotional activity of the audio file from the number of peak points and the duration of the audio file.

The spectrogram acquisition module 710 can decode the audio file, resample the decoded signal at a predetermined sampling frequency (for example, 44100 Hz), merge the resampled audio into a single channel (mono), divide the merged audio into frames (for example, with a frame length of 2048 and a frame interval of 256), apply a Hanning window, and perform a Fourier transform on the processed audio to obtain the spectrogram.

In the device provided by this embodiment, the spectrogram acquisition module 710 and the peak-number acquisition module 720 obtain the number of peak points of the voice frequency in the audio file, and the emotional activity determining module 730 determines the emotional activity of the audio file from the number of peak points and the duration of the audio file. The emotional activity of the audio file is thereby quantified, which gives a user a basis for selecting songs according to emotional activity.

Embodiment five:

Fig. 8 is a schematic structural diagram of the device for acquiring the emotional activity of an audio file provided by Embodiment five of the present invention. As shown in Fig. 8, the device of this embodiment includes:

Further, the peak-number acquisition module 720 can include:

A peak-seeking filtering unit 721, for performing peak-seeking filtering on the spectrogram on the frequency axis and on the time axis respectively.

A peak-number counting unit 722, for counting the number of peak points of the audio file in the filtered spectrogram.

For the specific processing of the peak-seeking filtering unit 721, refer to steps 320 to 330 in Embodiment two; it is not repeated here.

Further, the peak-number counting unit 722 can include:

A first obtaining subunit 7221, for obtaining a first plurality of peak points in the filtered spectrogram;

A threshold comparison subunit 7222, for comparing the amplitude of each of the first plurality of peak points with a preset threshold;

A second obtaining subunit 7223, for filtering out those of the first plurality of peak points whose amplitude is less than the preset threshold, to obtain a second plurality of peak points;

A counting subunit 7224, for counting the number of the second plurality of peak points, to obtain the number of peak points of the voice frequency in the audio file.

For the specific processing of the first obtaining subunit 7221, the threshold comparison subunit 7222, the second obtaining subunit 7223, and the counting subunit 7224, refer to the description of step 340 in Embodiment two; it is not repeated here.

Further, the emotional activity determining module 730 can include:

A first emotional activity determining unit 731, for dividing the number of peak points by the duration of the audio file to obtain the emotional activity of the audio file.

For the beneficial technical effects of the above refinements of the peak-number acquisition module 720, the peak-number counting unit 722, and the emotional activity determining module 730, refer to the beneficial technical effects described for steps 320 to 350 in Embodiment two; they are not repeated here.

Embodiment six:

Fig. 9 is a schematic structural diagram of the device for acquiring the emotional activity of an audio file provided by Embodiment six of the present invention. As shown in Fig. 9, the device of this embodiment includes:

An emotional activity determining module 730, for determining the emotional activity of the audio file from the number of peak points and the duration of the audio file.

Further, the peak-number acquisition module 720 can include:

A peak-seeking filtering unit 721, for performing peak-seeking filtering on the spectrogram on the frequency axis and on the time axis respectively.

Further, the peak-number counting unit 722 can include:

A first obtaining subunit 7221, for obtaining a first plurality of peak points in the filtered spectrogram.

A threshold comparison subunit 7222, for comparing the amplitude of each of the first plurality of peak points with a preset threshold.

A second obtaining subunit 7223, for filtering out those of the first plurality of peak points whose amplitude is less than the preset threshold, to obtain a second plurality of peak points.

Further, the emotional activity determining module 730 can include:

A melody complexity acquiring unit 732, for determining the melody complexity of the audio file from the number of peak points and the duration of the audio file.

A second emotional activity determining unit 733, for determining the emotional activity of the audio file according to the melody complexity and the rhythm intensity of the audio file.

For the specific processing and beneficial effects of the melody complexity acquiring unit 732 and the second emotional activity determining unit 733, refer to the description of steps 650 to 660 in Embodiment three; they are not described in detail here.

Embodiment seven:

The first emotional activity determining unit 731 of Embodiment five can be combined with the melody complexity acquiring unit 732 and the second emotional activity determining unit 733 of Embodiment six in the same emotional activity determining module to form Embodiment seven. With Embodiment seven, a user can choose between different ways of obtaining the emotional activity of a song.
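One way to picture the combined module is a single class with a selectable mode, sketched below. The class name, the mode names, and the choice to compute the melody complexity as the number of peak points divided by the duration are assumptions made for illustration; the text only states that the melody complexity is determined from those two quantities.

```python
class EmotionalActivityModule:
    """Hypothetical combined determining module with two selectable modes."""

    def __init__(self, mode="density"):
        if mode not in ("density", "rhythm"):
            raise ValueError("mode must be 'density' or 'rhythm'")
        self.mode = mode

    def determine(self, peak_count, duration_seconds, rhythm_intensity=None):
        # Assumed: melody complexity is the peak count per unit of duration.
        complexity = peak_count / duration_seconds
        if self.mode == "density":
            # Role of unit 731: the quotient itself is the emotional activity.
            return complexity
        if rhythm_intensity is None:
            raise ValueError("rhythm mode needs a rhythm intensity")
        # Role of units 732 and 733: formula (2) with X capped at 1.
        x = min(complexity * complexity * rhythm_intensity, 1.0)
        return complexity + x * 0.2

# Usage: the same measurements evaluated in both modes.
by_density = EmotionalActivityModule("density").determine(50, 100.0)
by_rhythm = EmotionalActivityModule("rhythm").determine(50, 100.0, rhythm_intensity=0.8)
```

Selecting the mode at construction time keeps both calculation paths behind one interface, which is one possible reading of letting the user choose the way the emotional activity is obtained.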

In addition, the embodiments of the present invention also provide a device for classifying audio files, which includes:

The device for acquiring the emotional activity of an audio file of any one of Embodiments four to seven, for acquiring the emotional activity of the audio file by the method of any one of Embodiments one to three;

Classifying the music files in the music library with the classification module can meet users' general needs of the music library, and songs can further be classified by scene according to emotional activity to enable personalized recommendation, which has a positive effect on the user's listening experience. In addition, the effectiveness of the embodiments of the present invention can be established by quantitative evaluation based on subjective perception and manual annotation.

In summary, the method and device for acquiring the emotional activity of an audio file provided by the embodiments of the present invention obtain, from the spectrogram, the number of peak points of the voice frequency in the audio file, and determine the emotional activity of the audio file from the number of peak points and the duration of the audio file. The emotional activity of the audio file is thereby quantified, which gives a user a basis for selecting songs according to emotional activity.

The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be determined by the protection scope of the claims.

Claims

1. A method for acquiring the emotional activity of an audio file, characterized in that the method includes:

Acquiring a spectrogram of the audio file;

2. The method according to claim 1, characterized in that the step of acquiring, from the spectrogram, the number of peak points of the voice frequency in the audio file includes:

Performing peak-seeking filtering on the spectrogram on the frequency axis and on the time axis respectively;

Counting the number of peak points of the audio file in the filtered spectrogram.

3. The method according to claim 2, characterized in that the step of counting the number of peak points of the audio file in the filtered spectrogram includes:

Obtaining a first plurality of peak points in the filtered spectrogram;

Comparing the amplitude of each of the first plurality of peak points with a preset threshold;

Filtering out the peak points whose amplitude is less than the preset threshold, to obtain a second plurality of peak points;

Counting the number of the second plurality of peak points, to obtain the number of peak points of the voice frequency in the audio file.

4. The method according to any one of claims 1 to 3, characterized in that the step of determining the emotional activity of the audio file from the number of peak points and the duration of the audio file includes:

Dividing the number of peak points by the duration of the audio file to obtain the emotional activity of the audio file.

5. The method according to any one of claims 1 to 3, characterized in that the step of determining the emotional activity of the audio file from the number of peak points and the duration of the audio file includes:

Determining the melody complexity of the audio file from the number of peak points and the duration of the audio file;

Determining the emotional activity of the audio file according to the melody complexity and the rhythm intensity of the audio file.

6. A method for classifying audio files, characterized in that the method includes:

Acquiring the emotional activity of the audio file by the method according to any one of claims 1 to 5;

Classifying the music files in a music library according to the emotional activity.

7. A device for acquiring the emotional activity of an audio file, characterized in that the device includes:

An emotional activity determining module, for determining the emotional activity of the audio file from the number of peak points and the duration of the audio file.

8. The device according to claim 7, characterized in that the peak-number acquisition module includes:

A peak-seeking filtering unit, for performing peak-seeking filtering on the spectrogram on the frequency axis and on the time axis respectively;

A peak-number counting unit, for counting the number of peak points of the audio file in the filtered spectrogram.

9. The device according to claim 8, characterized in that the peak-number counting unit includes:

A first obtaining subunit, for obtaining a first plurality of peak points in the filtered spectrogram;

A threshold comparison subunit, for comparing the amplitude of each of the first plurality of peak points with a preset threshold;

A second obtaining subunit, for filtering out those of the first plurality of peak points whose amplitude is less than the preset threshold, to obtain a second plurality of peak points;

A counting subunit, for counting the number of the second plurality of peak points, to obtain the number of peak points of the voice frequency in the audio file.

10. The device according to any one of claims 7 to 9, characterized in that the emotional activity determining module includes:

A first emotional activity determining unit, for dividing the number of peak points by the duration of the audio file to obtain the emotional activity of the audio file.

11. The device according to any one of claims 7 to 9, characterized in that the emotional activity determining module includes:

A melody complexity acquiring unit, for determining the melody complexity of the audio file from the number of peak points and the duration of the audio file;

A second emotional activity determining unit, for determining the emotional activity of the audio file according to the melody complexity and the rhythm intensity of the audio file.

12. A device for classifying audio files, characterized in that the device includes:

The device for acquiring the emotional activity of an audio file according to any one of claims 7 to 11, for acquiring the emotional activity of the audio file by the method according to any one of claims 1 to 5;