Method for acquiring the mood liveness of an audio file, classification method, and devices
Technical field
The present invention relates to the field of speech processing technology, and more particularly to a method for acquiring the mood liveness of an audio file, a classification method, and corresponding devices.
Background Art
In the prior art, mood analysis of an audio file is performed by analyzing the audio file, extracting its audio features, and classifying the audio file by means of pattern recognition.
In pattern recognition, features of the audio file are extracted first, for example intensity features, timbre features, and spectrum-related features. After the features are extracted, a classifier model is trained by supervised learning, and the trained model is then used to predict unknown audio files. Although pattern recognition can sort audio files into several categories, it cannot quantify the mood expressed by an audio file.
Summary of the Invention
Embodiments of the present invention provide a method for acquiring the mood liveness of an audio file, a classification method, and corresponding devices, which quantify the mood liveness of an audio file and thereby give users a basis for selecting songs by mood liveness.
To achieve the above object, embodiments of the present invention adopt the following technical solutions:
A method for acquiring the mood liveness of an audio file, the method including:
Obtaining a spectrogram of the audio file;
Obtaining, from the spectrogram, the number of peak points of the speech frequency in the audio file;
Determining the mood liveness of the audio file from the number of peak points and the duration of the audio file.
A method for classifying audio files, the method including:
Obtaining the mood liveness of the audio file by the method described in the above technical solution;
Classifying the music files in a music library according to the mood liveness.
A device for acquiring the mood liveness of an audio file, the device including:
A spectrogram acquisition module, configured to obtain a spectrogram of the audio file;
A peak point number acquisition module, configured to obtain, from the spectrogram, the number of peak points of the speech frequency in the audio file;
A mood liveness determining module, configured to determine the mood liveness of the audio file from the number of peak points and the duration of the audio file.
A device for classifying audio files, the device including:
The mood liveness acquisition device described in the above technical solution, configured to obtain the mood liveness of the audio file by the mood liveness acquisition method described in the above technical solution;
A classification module, configured to classify the music files in a music library according to the mood liveness.
The mood liveness acquisition method, classification method, and devices provided by the embodiments of the present invention obtain the number of peak points of the speech frequency in an audio file from its spectrogram, and determine the mood liveness of the audio file from the number of peak points and the duration of the audio file. The mood liveness of the audio file is thereby quantified, which gives users a basis for selecting songs by mood liveness.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a spectrogram provided by an embodiment of the present invention;
Fig. 2 is a flowchart of the mood liveness acquisition method provided by Embodiment one of the present invention;
Fig. 3 is a flowchart of the mood liveness acquisition method provided by Embodiment two of the present invention;
Fig. 4 is a schematic diagram, on time and frequency coordinate axes, of the spectrogram of the embodiment shown in Fig. 3 before the filtering of steps 320-330;
Fig. 5 is a schematic diagram, on time and frequency coordinate axes, of the spectrogram of the embodiment shown in Fig. 3 after the filtering of steps 320-330;
Fig. 6 is a flowchart of the mood liveness acquisition method provided by Embodiment three of the present invention;
Fig. 7 is a structural diagram of the mood liveness acquisition device provided by Embodiment four of the present invention;
Fig. 8 is a structural diagram of the mood liveness acquisition device provided by Embodiment five of the present invention;
Fig. 9 is a structural diagram of the mood liveness acquisition device provided by Embodiment six of the present invention.
Detailed Description of the Embodiments
The mood liveness acquisition method, classification method, and devices provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a spectrogram provided by an embodiment of the present invention. As shown in Fig. 1, the X axis of the spectrogram represents time (the oblique axis in Fig. 1), the Y axis represents frequency (the axis pointing horizontally to the right in Fig. 1), and the Z axis represents the magnitude of the speech data. The time-domain speech signal is transformed to obtain a frequency-domain spectrum, and that spectrum is the spectrogram.
Fig. 1 shows points that are darker than their surroundings. A darker point indicates that, in the speech signal, the current point has the highest amplitude relative to the points around it, and such a point can serve as a peak point as described in the embodiments of the present invention. It follows that a peak point in the embodiments is not determined solely by its own amplitude value; rather, it is a point whose amplitude is larger than that of the surrounding points.
Embodiments of the present invention are described in more detail below.
Embodiment one:
Fig. 2 is a flowchart of the mood liveness acquisition method provided by Embodiment one of the present invention. As shown in Fig. 2, this embodiment includes the following steps:
Step 210: obtain a spectrogram of the audio file.
Step 220: obtain, from the spectrogram, the number of peak points of the speech frequency in the audio file.
Step 230: determine the mood liveness of the audio file from the number of peak points and the duration of the audio file.
The processing in step 210 may specifically be as follows: decode the audio file and resample the decoded signal at a predetermined sampling frequency (for example, 44100 Hz); merge the resampled audio into a single channel; divide the merged audio into frames (for example, a frame length of 2048 with a frame interval of 256) and apply a Hanning window; then perform a Fourier transform on the processed audio to obtain the spectrogram.
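The framing, windowing, and transform of step 210 can be sketched as follows. This is a minimal illustration using NumPy, not the embodiment's actual implementation: decoding and resampling to 44100 Hz are assumed to have been done already, the function name `spectrogram` is hypothetical, and the magnitude of the transform is taken as the spectrogram value, which the text does not specify.

```python
import numpy as np

def spectrogram(samples, frame_len=2048, hop=256):
    """Magnitude spectrogram; rows are frames (time), columns are frequency bins.

    Assumes `samples` is already decoded and resampled. A 2-D input of shape
    (n_samples, n_channels) is merged to mono by averaging the channels.
    """
    samples = np.asarray(samples, dtype=float)
    if samples.ndim == 2:
        samples = samples.mean(axis=1)          # merge channels into mono
    if len(samples) < frame_len:
        samples = np.pad(samples, (0, frame_len - len(samples)))
    n_frames = 1 + (len(samples) - frame_len) // hop
    window = np.hanning(frame_len)              # Hanning window per frame
    frames = np.stack([
        samples[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))  # Fourier transform of each frame
```

With a frame length of 2048, each row holds 1025 frequency bins; at 44100 Hz, adjacent bins are about 21.5 Hz apart.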
The mood liveness acquisition method provided by this embodiment obtains the number of peak points of the speech frequency in the audio file from the spectrogram, and determines the mood liveness of the audio file from the number of peak points and the duration of the audio file. The mood liveness of the audio file is thereby quantified, which gives users a basis for selecting songs by mood liveness.
Embodiment two:
Fig. 3 is a flowchart of the mood liveness acquisition method provided by Embodiment two of the present invention. As shown in Fig. 3, this embodiment includes the following steps:
Step 310: obtain a spectrogram of the audio file.
Step 320: perform peak-seeking filtering on the spectrogram along the frequency axis using a first filter.
Step 330: perform peak-seeking filtering on the spectrogram along the time axis using a second filter.
Step 340: count the number of peak points of the audio file in the filtered spectrogram.
Step 350: divide the number of peak points by the duration of the audio file to obtain the mood liveness of the audio file.
For the specific processing of step 310 in this embodiment, refer to the description of step 210 in Embodiment one; it is not repeated here.
In step 320, the first filter may be configured by a filter function, and peak-seeking filtering is applied to the spectrogram along the frequency axis by the first filter. The filter function of this embodiment is given by formula (1):
[Formula (1) is not reproduced in this text.]
where y_i denotes the i-th value on the frequency axis of the spectrogram; y'_i denotes the i-th value on the frequency axis of the filtered spectrogram; α is an empirical value that may be chosen according to the characteristics of the audio; σ is the coefficient of the Gaussian function; m is the half-width of the first filter; and 2m+1 is the width of the first filter. Further, by adjusting the parameter α, the amplitude values of peak points that are small in amplitude but nonetheless prominent can be increased.
In step 330, the second filter may likewise filter the spectrogram along the time axis using formula (1). Correspondingly, in formula (1), y_i denotes the i-th value on the time axis of the spectrogram; y'_i denotes the i-th value on the time axis of the filtered spectrogram; α is an empirical value that may be chosen according to the characteristics of the audio; σ is the coefficient of the Gaussian function; m is now the half-width of the second filter; and 2m+1 is the width of the second filter. Further, by adjusting the parameter α, the amplitude values of peak points that are small in amplitude but nonetheless prominent can be increased. Those skilled in the art will understand that, given the characteristics of the time axis, the parameter values may be set differently from those used for the frequency axis.
Because a speech signal decays at different rates along the time axis and the frequency axis, this embodiment uses the first filter in step 320 and the second filter in step 330 to perform peak-seeking filtering on the frequency axis and the time axis separately. This makes the filtering better targeted and avoids both filtering out real peak points and retaining spurious peaks as real ones, so that the peak point count is more accurate.
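The two-axis filtering of steps 320 and 330 can be sketched as follows. Because formula (1) is not available in this text, the filter below is a stand-in and an assumption: a Gaussian-weighted local-contrast filter that subtracts α times the Gaussian-weighted neighborhood average from each value and clips the result at zero. It reuses the parameters named in the text (α, σ, m) only by analogy, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(m, sigma):
    """Normalized Gaussian weights over a window of width 2m+1."""
    j = np.arange(-m, m + 1)
    g = np.exp(-(j ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def peak_filter_1d(y, alpha, sigma, m):
    """Assumed stand-in for formula (1): y'_i = max(y_i - alpha * local_avg_i, 0)."""
    y = np.asarray(y, dtype=float)
    padded = np.pad(y, m, mode="edge")          # repeat edge values at the borders
    local_avg = np.convolve(padded, gaussian_kernel(m, sigma), mode="valid")
    return np.maximum(y - alpha * local_avg, 0.0)

def peak_filter_2d(spec, freq_params, time_params, time_passes=1):
    """Filter along frequency (axis 1) with one parameter set, then along
    time (axis 0) with another, optionally repeating the time-axis pass."""
    out = np.apply_along_axis(peak_filter_1d, 1, spec, **freq_params)
    for _ in range(time_passes):
        out = np.apply_along_axis(peak_filter_1d, 0, out, **time_params)
    return out
```

Keeping separate parameter dictionaries for the two axes mirrors the point made above that the signal decays differently along time and frequency, so each axis can be tuned on its own.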
In addition, to obtain a better filtering effect, step 330 may be performed multiple times, so that the second filter is applied to the spectrogram along the time axis twice or more. This makes the peak points sharper, so that the subsequent peak point count is more accurate.
Optionally, in steps 320 and 330, because the purpose of peak-seeking filtering is to obtain the points in the spectrogram whose values are large relative to their surroundings, the specific filter function above does not limit the embodiments of the present invention. Other similar filter functions may also be used, provided that they improve the accuracy with which peak points are obtained.
In step 340, counting the number of peak points of the audio file in the filtered spectrogram may specifically be as follows: obtain a first plurality of peak points in the filtered spectrogram; compare the amplitude of each of the first plurality of peak points with a preset threshold; filter out the peak points whose amplitude is less than the preset threshold to obtain a second plurality of peak points; and count the second plurality of peak points to obtain the number of peak points of the speech frequency in the audio file. For example, suppose the filtered spectrogram contains a number of peak points (say 100 in fact, although they have not yet been counted at this stage). The amplitude of each of these 100 peak points is compared with the preset threshold, and the peak points whose amplitude is less than the preset threshold are filtered out, leaving a number of peak points (say 50 in fact, again not yet counted). Counting the remaining peak points gives 50 as the number of peak points of the speech frequency in the audio file.
Further, because the filtering described above raises the amplitude of peak points that are small in absolute amplitude but large relative to their neighboring points, this embodiment can use a single, uniform preset threshold.
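The counting of step 340 and the division of step 350 can be sketched as follows. This is an illustrative sketch under stated assumptions: a peak point is taken to be a value strictly greater than all eight of its neighbors in the time-frequency grid, the duration is assumed to be expressed in seconds, and the function names are hypothetical. None of these details are fixed by the text.

```python
import numpy as np

def count_peaks(spec, threshold):
    """Count local maxima of `spec` whose amplitude is at least `threshold`."""
    spec = np.asarray(spec, dtype=float)
    rows, cols = spec.shape
    # Pad with -inf so that border points can still qualify as peaks.
    padded = np.pad(spec, 1, mode="constant", constant_values=-np.inf)
    is_peak = np.ones(spec.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
            is_peak &= spec > neighbour         # first plurality: local maxima
    # Second plurality: drop peaks whose amplitude is below the threshold.
    return int(np.count_nonzero(is_peak & (spec >= threshold)))

def mood_liveness(peak_count, duration_seconds):
    """Step 350: number of peak points divided by the duration (assumed seconds)."""
    return peak_count / duration_seconds
```

Under the seconds assumption, the 50 peak points of the example above in a 200-second file would give `mood_liveness(50, 200.0)`, that is, 0.25.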
Fig. 4 is a schematic diagram of the spectrogram on time and frequency coordinate axes before the filtering of steps 320-330, and Fig. 5 is a schematic diagram of the spectrogram on time and frequency coordinate axes after the filtering of steps 320-330. As shown in Fig. 4 and Fig. 5, the waveform in the spectrogram before filtering has many burrs and the peak points are indistinct, whereas in the filtered spectrogram the burrs are largely eliminated and the peak points are clearer. The filtering of steps 320-330 therefore largely removes the burrs from the waveform and makes the peak points more distinct, so that the subsequent peak point count is more accurate.
Embodiment three:
Fig. 6 is a flowchart of the mood liveness acquisition method provided by Embodiment three of the present invention. As shown in Fig. 6, the mood liveness acquisition method of this embodiment includes the following steps:
Step 610: obtain a spectrogram of the audio file.
Step 620: perform peak-seeking filtering on the spectrogram along the frequency axis using a first filter.
Step 630: perform peak-seeking filtering on the spectrogram along the time axis using a second filter.
Step 640: count the number of peak points of the audio file in the filtered spectrogram.
Step 650: determine the melody complexity of the audio file from the number of peak points and the duration of the audio file.
Step 660: determine the mood liveness of the audio file from the melody complexity and the rhythm intensity of the audio file.
For the specific processing of step 610 in this embodiment, refer to the description of step 210 in Embodiment one; it is not repeated here.
For the specific processing and advantageous effects of steps 620 to 640 in this embodiment, refer to the description of steps 320 to 340 in Embodiment two; they are not repeated here.
In step 650, analysis of song files shows that the higher the melody complexity of a song, the higher its mood liveness. Related experiments show that the melody complexity of a typical song lies between 0.3 and 1.3.
Because melody complexity is closely related to mood liveness, and a pronounced rhythm directly increases mood liveness, in step 660 this embodiment may determine the mood liveness of the audio file from the melody complexity and the rhythm intensity of the audio file.
In this embodiment, the mood liveness of the audio file may be obtained by formula (2):
A = C + X × 0.2    (2)
where A denotes the mood liveness, C denotes the melody complexity, B denotes the rhythm intensity, and X = C × C × B; if X ≥ 1, X is set to 1. The rhythm intensity in this embodiment may be obtained by existing techniques and is not described in detail here.
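Formula (2) can be written out directly. The sketch below is a plain transcription of the formula with the cap on X; the function name is hypothetical, and the rhythm intensity is assumed to be a non-negative number supplied by some existing technique, since the text does not state its range.

```python
def mood_liveness_with_rhythm(melody_complexity, rhythm_intensity):
    """Formula (2): A = C + 0.2 * X, with X = C * C * B capped at 1."""
    x = melody_complexity * melody_complexity * rhythm_intensity
    x = min(x, 1.0)                      # if X >= 1, X is set to 1
    return melody_complexity + 0.2 * x
```

For instance, with C = 0.5 and B = 0.8, X = 0.2 and A is approximately 0.54; with C = 1.3 and B = 1.0, X = 1.69 is capped at 1 and A is approximately 1.5. The rhythm term can therefore add at most 0.2 to the melody complexity.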
This embodiment determines the mood liveness of the audio file from the melody complexity and the rhythm intensity of the audio file. Taking into account rhythm intensity, another factor that affects mood liveness, makes the resulting mood liveness more accurate and reliable.
An embodiment of the present invention further provides a method for classifying audio files, which includes the following steps:
First, obtain the mood liveness of the audio file by the method of any one of Embodiments one to three;
Second, classify the music files in a music library according to the mood liveness.
Classifying the music files in the music library by mood liveness can meet users' general needs with respect to the library. Songs can also be classified by scene according to mood liveness, which further enables personalized recommendation and so has a positive effect on the user's listening experience. In addition, the validity of the embodiments of the present invention can be established by evaluating them against subjective impressions and quantified manual annotations.
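The classification step can be sketched as a simple bucketing of tracks by mood liveness. The text does not specify class boundaries or class names, so the boundaries `(0.6, 1.0)` and the labels `calm`, `moderate`, and `lively` below are purely hypothetical, as is the function name.

```python
from bisect import bisect_right

def classify_library(liveness_by_track,
                     boundaries=(0.6, 1.0),
                     labels=("calm", "moderate", "lively")):
    """Group track names by mood liveness.

    `boundaries` must be sorted ascending and have one fewer entry than
    `labels`; a value equal to a boundary falls into the higher class.
    """
    classes = {label: [] for label in labels}
    for track, liveness in liveness_by_track.items():
        classes[labels[bisect_right(boundaries, liveness)]].append(track)
    return classes
```

A scene-based classification could follow the same pattern with a different set of boundaries and labels per scene.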
Embodiment four:
Fig. 7 is a structural diagram of the mood liveness acquisition device provided by Embodiment four of the present invention. As shown in Fig. 7, the mood liveness acquisition device of this embodiment includes:
A spectrogram acquisition module 710, configured to obtain a spectrogram of the audio file.
A peak point number acquisition module 720, configured to obtain, from the spectrogram, the number of peak points of the speech frequency in the audio file.
A mood liveness determining module 730, configured to determine the mood liveness of the audio file from the number of peak points and the duration of the audio file.
The spectrogram acquisition module 710 may decode the audio file and resample the decoded signal at a predetermined sampling frequency (for example, 44100 Hz); merge the resampled audio into a single channel; divide the merged audio into frames (for example, a frame length of 2048 with a frame interval of 256) and apply a Hanning window; and then perform a Fourier transform on the processed audio to obtain the spectrogram.
In the mood liveness acquisition device provided by this embodiment, the spectrogram acquisition module 710 and the peak point number acquisition module 720 obtain the number of peak points of the speech frequency in the audio file, and the mood liveness determining module 730 determines the mood liveness of the audio file from the number of peak points and the duration of the audio file. The mood liveness of the audio file is thereby quantified, which gives users a basis for selecting songs by mood liveness.
Embodiment five:
Fig. 8 is a structural diagram of the mood liveness acquisition device provided by Embodiment five of the present invention. As shown in Fig. 8, the mood liveness acquisition device of this embodiment includes:
A spectrogram acquisition module 710, configured to obtain a spectrogram of the audio file.
A peak point number acquisition module 720, configured to obtain, from the spectrogram, the number of peak points of the speech frequency in the audio file.
A mood liveness determining module 730, configured to determine the mood liveness of the audio file from the number of peak points and the duration of the audio file.
Further, the peak point number acquisition module 720 may include:
A peak-seeking filter unit 721, configured to perform peak-seeking filtering on the spectrogram along the frequency axis and along the time axis respectively.
A peak point number counting unit 722, configured to count the number of peak points of the audio file in the filtered spectrogram.
For the specific processing of the peak-seeking filter unit 721, refer to steps 320-330 in Embodiment two; it is not repeated here.
Further, the peak point number counting unit 722 may include:
A first obtaining subunit 7221, configured to obtain a first plurality of peak points in the filtered spectrogram;
A threshold comparison subunit 7222, configured to compare the amplitude of each of the first plurality of peak points with a preset threshold;
A second obtaining subunit 7223, configured to filter out, from the first plurality of peak points, the peak points whose amplitude is less than the preset threshold, to obtain a second plurality of peak points;
A counting subunit 7224, configured to count the second plurality of peak points to obtain the number of peak points of the speech frequency in the audio file.
For the specific processing of the first obtaining subunit 7221, the threshold comparison subunit 7222, the second obtaining subunit 7223, and the counting subunit 7224, refer to the description of step 340 in Embodiment two; it is not repeated here.
Further, the mood liveness determining module 730 may include:
A first mood liveness determining unit 731, configured to divide the number of peak points by the duration of the audio file to obtain the mood liveness of the audio file.
For the advantageous effects of the above refinements to the peak point number acquisition module 720, the peak point number counting unit 722, and the mood liveness determining module 730, refer to the advantageous effects described for steps 320 to 350 in Embodiment two; they are not repeated here.
Embodiment six:
Fig. 9 is a structural diagram of the mood liveness acquisition device provided by Embodiment six of the present invention. As shown in Fig. 9, the mood liveness acquisition device of this embodiment includes:
A spectrogram acquisition module 710, configured to obtain a spectrogram of the audio file.
A peak point number acquisition module 720, configured to obtain, from the spectrogram, the number of peak points of the speech frequency in the audio file.
A mood liveness determining module 730, configured to determine the mood liveness of the audio file from the number of peak points and the duration of the audio file.
Further, the peak point number acquisition module 720 may include:
A peak-seeking filter unit 721, configured to perform peak-seeking filtering on the spectrogram along the frequency axis and along the time axis respectively.
A peak point number counting unit 722, configured to count the number of peak points of the audio file in the filtered spectrogram.
Further, the peak point number counting unit 722 may include:
A first obtaining subunit 7221, configured to obtain a first plurality of peak points in the filtered spectrogram.
A threshold comparison subunit 7222, configured to compare the amplitude of each of the first plurality of peak points with a preset threshold.
A second obtaining subunit 7223, configured to filter out, from the first plurality of peak points, the peak points whose amplitude is less than the preset threshold, to obtain a second plurality of peak points.
A counting subunit 7224, configured to count the second plurality of peak points to obtain the number of peak points of the speech frequency in the audio file.
Further, the mood liveness determining module 730 may include:
A melody complexity acquiring unit 732, configured to determine the melody complexity of the audio file from the number of peak points and the duration of the audio file.
A second mood liveness determining unit 733, configured to determine the mood liveness of the audio file from the melody complexity and the rhythm intensity of the audio file.
For the specific processing and advantageous effects of the melody complexity acquiring unit 732 and the second mood liveness determining unit 733, refer to the description of steps 650 to 660 in Embodiment three; they are not detailed here.
Embodiment seven:
The first mood liveness determining unit 731 of Embodiment five and the melody complexity acquiring unit 732 and second mood liveness determining unit 733 of Embodiment six may be merged into the same mood liveness determining module to form Embodiment seven. Embodiment seven lets the user choose between different ways of obtaining the mood liveness of a song.
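The merged module of Embodiment seven can be sketched as a single class that switches between the two ways of determining mood liveness. The class name, the mode names `ratio` and `rhythm`, and the seconds unit are hypothetical. The sketch also assumes that the melody complexity is the number of peak points divided by the duration; the text says only that it is determined from those two quantities.

```python
class MoodLivenessDeterminer:
    """Combines unit 731 (ratio mode) with units 732 and 733 (rhythm mode)."""

    MODES = ("ratio", "rhythm")

    def __init__(self, mode="ratio"):
        if mode not in self.MODES:
            raise ValueError(f"mode must be one of {self.MODES}")
        self.mode = mode

    def determine(self, peak_count, duration_seconds, rhythm_intensity=None):
        # Assumed melody complexity: peak points per unit of duration.
        complexity = peak_count / duration_seconds
        if self.mode == "ratio":
            return complexity                    # unit 731: peaks / duration
        if rhythm_intensity is None:
            raise ValueError("rhythm mode requires rhythm_intensity")
        x = min(complexity * complexity * rhythm_intensity, 1.0)
        return complexity + 0.2 * x              # unit 733: formula (2)
```

For example, `MoodLivenessDeterminer("rhythm").determine(50, 100.0, 0.8)` gives approximately 0.54, while the same inputs in `ratio` mode give 0.5.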
In addition, an embodiment of the present invention further provides a device for classifying audio files, which includes:
The mood liveness acquisition device of any one of Embodiments four to seven, configured to obtain the mood liveness of the audio file by the method of any one of Embodiments one to three;
A classification module, configured to classify the music files in a music library according to the mood liveness.
Classifying the music files in the music library with the classification module can meet users' general needs with respect to the library. Songs can also be classified by scene according to mood liveness, which further enables personalized recommendation and so has a positive effect on the user's listening experience. In addition, the validity of the embodiments of the present invention can be established by evaluating them against subjective impressions and quantified manual annotations.
In summary, the mood liveness acquisition method and device provided by the embodiments of the present invention obtain the number of peak points of the speech frequency in an audio file from its spectrogram, and determine the mood liveness of the audio file from the number of peak points and the duration of the audio file. The mood liveness of the audio file is thereby quantified, which gives users a basis for selecting songs by mood liveness.
The foregoing is merely a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. The protection scope of the present invention shall therefore be subject to the protection scope of the claims.