CN104464726B - Method and device for determining similar audio - Google Patents
- Publication number: CN104464726B
- Application number: CN201410840295.3A
- Authority: CN (China)
- Prior art keywords: audio, frequency, characteristic value, specific, target
- Legal status: Active (as listed by Google Patents; not a legal conclusion)
- Landscapes: Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention disclose a method and device for determining similar audio. The method includes: determining a specified audio feature value sequence of a target audio; calculating, according to the dynamic time warping (DTW) algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios; taking the resulting N DTW distances as the audio fingerprint of the target audio; calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio; and, if that similarity is greater than a preset threshold, determining that the target audio is similar to the standard audio. Compared with the prior art, the method does not need to generate a large number of feature vectors, so the audio fingerprint matching process requires no large-scale feature storage or retrieval and consumes fewer machine resources. It also mitigates the insufficient local robustness of the prior art and improves overall robustness.
Description
Technical field
The present invention relates to the field of computer multimedia, and in particular to a method and device for determining similar audio.
Background art
With the rapid development of multimedia and the Internet, producing and transmitting audio has become simple and fast, and audio resources have become extremely abundant. Websites that provide audio to users also receive large amounts of audio uploaded by those users. Many of these uploads have similar content, and if an audio website stores all of them, it bears a considerable operating burden. Determining whether audio files are similar, so that the similar ones can be removed, is therefore especially important for audio websites.
In the prior art, whether audio files are similar is usually determined from their audio fingerprints. An audio fingerprint is a set of unique identifiers computed from the audio signal, and similar audio should have similar audio fingerprints. After the audio fingerprint of each audio file has been determined, the fingerprints are compared; if two fingerprints match, that is, their similarity reaches a certain value, the corresponding audio files can be determined to be similar.
In existing audio fingerprint matching methods, the fingerprint is determined mainly by analyzing the audio content and extracting multiple local features from particular audio frames, for example by quantization-encoding the audio sample values or computing spectral difference amplitudes. The set of features extracted from those particular frames is then used as the fingerprint of the whole audio.
Because these methods build the audio fingerprint from multiple local features of particular audio frames, they produce a large number of feature vectors. The whole fingerprint matching process therefore requires large-scale feature storage and retrieval, and the machine-resource cost is high. In addition, the frames sampled when extracting the particular audio frames may be inconsistent, so local robustness is insufficient.
Summary of the invention
To solve the above problems, the embodiments of the invention disclose a method and device for determining similar audio. The technical solution is as follows:
An embodiment of the invention discloses a method for determining similar audio, which may include:
determining a specified audio feature value sequence of a target audio;
calculating, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios, where the specified audio feature value sequences of the N base audios are determined in the same way as that of the target audio;
taking the resulting N DTW distances as the audio fingerprint of the target audio;
calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, where the audio fingerprint of the standard audio is determined in the same way as that of the target audio;
if the similarity between the audio fingerprint of the target audio and that of the standard audio is greater than a preset threshold, determining that the target audio is similar to the standard audio;
where determining the specified audio feature value sequence of the target audio includes:
segmenting the target audio according to a specified segmentation rule to obtain audio segments;
selecting at least two audio segments of the target audio according to a preset audio segment selection rule;
determining the specified audio feature value of each selected audio segment;
arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.
Segmenting the target audio according to the specified segmentation rule to obtain audio segments includes:
segmenting the target audio at a specified time interval to obtain audio segments.
Determining the specified audio feature value of each selected audio segment includes:
determining the audio intensity mean of each selected audio segment, and using the determined audio intensity mean as the specified audio feature value of that segment;
or
determining the short-time zero-crossing rate of each selected audio segment, and using the determined short-time zero-crossing rate as the specified audio feature value of that segment;
or
determining the short-time energy of each selected audio segment, and using the determined short-time energy as the specified audio feature value of that segment.
Arranging the determined specified audio feature values of the audio segments in the preset order to obtain the specified audio feature value sequence of the target audio includes:
arranging the specified audio feature values according to the order in which their corresponding audio segments occur in the audio, to obtain the specified audio feature value sequence of the target audio.
Calculating, according to the preset formula, the similarity between the audio fingerprint of the target audio and that of the standard audio includes:
calculating the similarity between the audio fingerprint of the target audio and that of the standard audio with the following formula:
where A is the similarity;
X_i is the DTW distance between the specified audio feature value sequence of the target audio and that of the i-th base audio;
Y_i is the DTW distance between the specified audio feature value sequence of the standard audio and that of the i-th base audio.
An embodiment of the invention further provides a device for determining similar audio, which may include:
an audio feature value sequence determining module, configured to determine the specified audio feature value sequence of a target audio;
a DTW distance determining module, configured to calculate, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and that of each of N predetermined base audios, where the specified audio feature value sequences of the N base audios are determined in the same way as that of the target audio;
an audio fingerprint determining module, configured to take the resulting N DTW distances as the audio fingerprint of the target audio;
a similarity calculation module, configured to calculate, according to a preset formula, the similarity between the audio fingerprint of the target audio and that of a standard audio, where the audio fingerprint of the standard audio is determined in the same way as that of the target audio;
a similar audio determining module, configured to determine that the target audio is similar to the standard audio if the similarity between their audio fingerprints is greater than a preset threshold;
where the audio feature value sequence determining module includes:
an audio segmentation submodule, configured to segment the target audio according to a specified segmentation rule to obtain audio segments;
an audio segment selection submodule, configured to select at least two audio segments of the target audio according to a preset audio segment selection rule;
an audio feature value determining submodule, configured to determine the specified audio feature value of each selected audio segment;
a sequence determining submodule, configured to arrange the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.
The audio segmentation submodule is specifically configured to:
segment the target audio at a specified time interval to obtain audio segments.
The audio feature value determining submodule is specifically configured to:
determine the audio intensity mean of each selected audio segment, and use the determined audio intensity mean as the specified audio feature value of that segment;
or
determine the short-time zero-crossing rate of each selected audio segment, and use the determined short-time zero-crossing rate as the specified audio feature value of that segment;
or
determine the short-time energy of each selected audio segment, and use the determined short-time energy as the specified audio feature value of that segment.
The sequence determining submodule is specifically configured to:
arrange the specified audio feature values according to the order in which their corresponding audio segments occur in the audio, to obtain the specified audio feature value sequence of the target audio.
The similarity calculation module is specifically configured to:
calculate the similarity between the audio fingerprint of the target audio and that of the standard audio with the following formula:
where A is the similarity;
X_i is the DTW distance between the specified audio feature value sequence of the target audio and that of the i-th base audio;
Y_i is the DTW distance between the specified audio feature value sequence of the standard audio and that of the i-th base audio.
In the technical solution of the invention, after the specified audio feature value sequence of the target audio has been determined, the DTW distances between it and the specified audio feature value sequences of the N predetermined base audios are calculated with the dynamic time warping algorithm and used as the audio fingerprint of the target audio. Whether the target audio and the standard audio are similar is then determined from the similarity between their audio fingerprints. Compared with the prior art, no large number of feature vectors needs to be generated, so the fingerprint matching process requires no large-scale feature storage or retrieval and consumes fewer machine resources. Moreover, even if the audio segments extracted from different audio files are inconsistent, the effect on the technical solution of the invention is small, so the insufficient local robustness of the prior art is mitigated and overall robustness is improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for determining similar audio provided by the invention;
Fig. 2 is a flowchart of the method for determining the specified audio feature value sequence of the target audio provided by the invention;
Fig. 3 is a schematic structural diagram of a device for determining similar audio provided by the invention;
Fig. 4 is a schematic structural diagram of the audio feature value sequence determining module provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Fig. 1 shows a method for determining similar audio provided by the invention. The method may include:
S101: Determine the specified audio feature value sequence of the target audio.
In the invention, the target audio is defined relative to a standard audio. The standard audio is an audio whose audio fingerprint has already been determined and which serves as the reference against which other audio files are evaluated for similarity; these "other audio files" are the target audio.
As shown in Fig. 2, the specified audio feature value sequence of the target audio may be determined as follows:
S201: Segment the target audio according to a specified segmentation rule to obtain audio segments.
To determine audio feature values, the target audio must first be segmented. The segmentation rule can be chosen by a person skilled in the art, and the invention does not limit the rule itself. Preferably, the target audio is segmented at a specified time interval to obtain audio segments. For example, a 10-second audio can be segmented at 2-second intervals to obtain 5 audio segments.
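A minimal sketch of segmentation at a fixed time interval, assuming the audio has already been decoded into a list of mono samples at a known sample rate. The function name `segment_by_interval` is hypothetical, and keeping a shorter trailing remainder as a final segment is an assumption the text does not address.

```python
def segment_by_interval(samples, sample_rate, interval_s):
    """Split mono samples into consecutive segments of interval_s seconds.

    Assumption: a trailing remainder shorter than one interval is kept as a
    final, shorter segment.
    """
    seg_len = int(round(interval_s * sample_rate))  # samples per segment
    if seg_len <= 0:
        raise ValueError("interval must span at least one sample")
    return [samples[start:start + seg_len]
            for start in range(0, len(samples), seg_len)]


# 10 s of (silent) audio at 8 kHz, segmented every 2 s
sr = 8000
audio = [0.0] * (10 * sr)
segments = segment_by_interval(audio, sr, 2.0)
print(len(segments))  # 5
```

This reproduces the 10-second / 2-second example above; dropping or zero-padding the remainder would be equally valid choices.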
S202: Select at least two audio segments of the target audio according to a preset audio segment selection rule.
The preset audio segment selection rule can be determined by a person skilled in the art, and the invention does not limit the rule itself, as long as it selects a certain number of audio segments. For example, the odd-numbered segments in time order may be selected, or the even-numbered ones, or a selection expression may be preset, for example selecting the (2n+2)-th segments. Selecting more audio segments makes the method more stable and accurate, but also increases the workload of the device that performs it; selecting fewer segments lowers stability and accuracy, but also reduces that workload. The number of selected segments can therefore be determined by a person skilled in the art according to the actual situation, and the invention does not specifically limit it.
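A sketch of rule-based segment selection, assuming segments are numbered from 1 in time order. All helper names are hypothetical; `arithmetic_rule(2, 2)` expresses the (2n+2)-th example, and the at-least-two check mirrors the requirement in S202.

```python
def odd_rule(pos):
    """Keep the 1st, 3rd, 5th, ... segments."""
    return pos % 2 == 1


def even_rule(pos):
    """Keep the 2nd, 4th, 6th, ... segments."""
    return pos % 2 == 0


def arithmetic_rule(first, step):
    """Keep positions first, first + step, first + 2*step, ...

    arithmetic_rule(2, 2) gives the (2n+2)-th segments for n = 0, 1, 2, ...
    """
    def keep(pos):
        return pos >= first and (pos - first) % step == 0
    return keep


def select_segments(segments, keep):
    """Return the segments whose 1-based position satisfies keep(position)."""
    chosen = [seg for pos, seg in enumerate(segments, start=1) if keep(pos)]
    if len(chosen) < 2:
        raise ValueError("S202 requires at least two selected segments")
    return chosen


segments = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
print(select_segments(segments, odd_rule))               # ['s1', 's3', 's5', 's7']
print(select_segments(segments, arithmetic_rule(1, 3)))  # ['s1', 's4', 's7']
```

Passing the rule as a function keeps the trade-off described above (more segments, more work) a configuration choice rather than a code change.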
S203: Determine the specified audio feature value of each selected audio segment.
The audio feature value can be any feature value used in the art to describe the audio characteristics of an audio segment, and the invention does not limit it here. For example, the mean audio intensity (also called sound intensity) of each selected audio segment may be determined and used as that segment's specified audio feature value. Here "mean" is meant broadly: it may be a weighted or an unweighted mean. The audio intensity may also be processed mathematically in other ways before being used as the specified audio feature value of each segment; the specific processing method is not limited here and can be chosen by a person skilled in the art as needed.
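One possible reading of the audio intensity mean, sketched under the assumption that the intensity of a single sample is approximated by its absolute amplitude (the text does not define it). The function name `intensity_mean` is hypothetical; passing `weights` gives the weighted variant mentioned above.

```python
def intensity_mean(segment, weights=None):
    """Mean audio intensity of one segment.

    Assumption: the intensity of a sample is approximated by its absolute
    amplitude. weights=None gives a plain mean; otherwise a weighted mean.
    """
    values = [abs(x) for x in segment]
    if not values:
        raise ValueError("empty segment")
    if weights is None:
        return sum(values) / len(values)
    if len(weights) != len(values):
        raise ValueError("need exactly one weight per sample")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * v for w, v in zip(weights, values)) / total


seg = [0.5, -0.5, 1.0, -1.0]
print(intensity_mean(seg))                        # 0.75
print(intensity_mean(seg, weights=[1, 1, 0, 0]))  # 0.5
```

Squaring the amplitude (mean power) or converting the result to decibels are other plausible forms of the "other mathematical processing" the text allows.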
Alternatively, the short-time zero-crossing rate of each selected audio segment may be determined and used as that segment's specified audio feature value.
Alternatively, the short-time energy of each selected audio segment may be determined and used as that segment's specified audio feature value.
The type of audio feature value can be chosen by a person skilled in the art according to the actual situation, and the invention does not limit it here.
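A sketch of the two other feature options, short-time zero-crossing rate and short-time energy. The conventions are assumptions, not taken from the text: the whole segment is treated as one short-time frame, a zero sample counts as non-negative, the zero-crossing count is normalized by the number of adjacent sample pairs, and energy is the unnormalized sum of squares.

```python
def short_time_zero_crossing_rate(segment):
    """Fraction of adjacent sample pairs whose signs differ.

    Assumptions: the whole segment is one short-time frame, a zero sample
    counts as non-negative, and the count is divided by the number of
    adjacent pairs (some texts divide by the sample count instead).
    """
    if len(segment) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(1 for a, b in zip(segment, segment[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(segment) - 1)


def short_time_energy(segment):
    """Sum of squared amplitudes over the segment (one short-time frame)."""
    return sum(x * x for x in segment)


seg = [0.5, 0.25, -0.5, -0.25, 0.5]
print(short_time_zero_crossing_rate(seg))  # 0.5
print(short_time_energy(seg))              # 0.875
```

If segments are long, splitting each into shorter frames and averaging the per-frame values is a common alternative; the text leaves this choice open.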
S204: Arrange the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.
After the specified audio feature values have been obtained, they must be arranged in a certain order to form the specified audio feature value sequence of the target audio. The preset order can be determined by a person skilled in the art and is not limited here. For example, the values can be arranged according to the order in which their corresponding audio segments occur in the audio. Specifically, when the audio intensity mean of each audio segment is used as the feature value and the selected segments are the 1st, 10th, 20th and 30th, the intensity mean of the 1st segment is placed first in the sequence, that of the 10th segment second, that of the 20th segment third, and that of the 30th segment fourth.
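If the feature values are computed out of order (for example in parallel) and keyed by each segment's 1-based position in the audio, restoring time order is a sort on that key. The function name and the feature values below are made up for illustration, using the 1st, 10th, 20th and 30th segments from the example above.

```python
def feature_sequence(features_by_position):
    """Arrange per-segment feature values in the time order of their segments.

    features_by_position maps a segment's 1-based position in the audio to
    its specified audio feature value.
    """
    return [features_by_position[pos] for pos in sorted(features_by_position)]


# Hypothetical intensity means for the 1st, 10th, 20th and 30th segments,
# computed out of order
features = {20: 0.42, 1: 0.10, 30: 0.55, 10: 0.31}
print(feature_sequence(features))  # [0.1, 0.31, 0.42, 0.55]
```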
S102: Calculate, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and that of each of the N predetermined base audios.
S103: Take the resulting N DTW distances as the audio fingerprint of the target audio.
First, note that the specified audio feature value sequences of the N base audios are determined in the same way as that of the target audio. In practice, the value of N can be determined by a person skilled in the art according to the application scenario. The main requirement when choosing base audios is that they have low similarity to one another, to avoid excessive redundancy.
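The text only requires that the base audios be mutually dissimilar and does not say how to choose them. One hypothetical way, sketched below, is greedy max-min (farthest-first) selection over a precomputed matrix of pairwise distances between candidate audios, such as DTW distances between their feature sequences. The function name and the toy matrix are assumptions.

```python
def pick_diverse_bases(dist, n, first=0):
    """Greedy max-min selection of n mutually dissimilar base-audio candidates.

    dist[i][j] is a symmetric distance between candidates i and j (larger
    means less similar). Each step adds the candidate whose distance to its
    nearest already-chosen base is largest.
    """
    m = len(dist)
    if not 1 <= n <= m:
        raise ValueError("n must be between 1 and the number of candidates")
    chosen = [first]
    while len(chosen) < n:
        remaining = [c for c in range(m) if c not in chosen]
        best = max(remaining, key=lambda c: min(dist[c][s] for s in chosen))
        chosen.append(best)
    return chosen


# Toy distances: candidates 0/1 are near-duplicates, as are 2/3
dist = [
    [0, 1, 5, 6],
    [1, 0, 4, 5],
    [5, 4, 0, 2],
    [6, 5, 2, 0],
]
print(pick_diverse_bases(dist, 2))  # [0, 3]
```

Near-duplicate candidates are skipped until no more distant candidate remains, which matches the stated goal of avoiding redundant base audios.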
Because different audio files may have different numbers of selected audio segments, the audio feature value sequences they produce may contain different numbers of elements, so the feature value sequences of two different audio files cannot be compared directly. They become comparable only after each of them is compared, using the dynamic time warping (DTW) algorithm, with the specified audio feature value sequences of the N base audios, so that N DTW distances are obtained for each. Specifically:
the DTW distances between the specified audio feature value sequence of the standard audio and those of the N predetermined base audios can be calculated with the dynamic time warping algorithm;
the DTW distances between the specified audio feature value sequence of the target audio and those of the N predetermined base audios are then calculated in the same way.
The N DTW distances obtained for the target audio serve as the audio fingerprint of the target audio, and the N DTW distances obtained for the standard audio serve as the audio fingerprint of the standard audio. The two audio fingerprints then have the same number of elements and can therefore be compared.
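A minimal sketch of S102 and S103. The text does not specify the DTW variant, so the local cost |a_i - b_j|, the standard step pattern (insertion, deletion, match), and the absence of any window or length normalization are assumptions. The function names and the base sequences are hypothetical toy values.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Local cost |a_i - b_j|; steps (i-1, j), (i, j-1), (i-1, j-1); no window
    constraint or normalization. These choices are assumptions.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    prev = [0.0] + [math.inf] * m  # row i = 0 of the cumulative-cost matrix
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]


def audio_fingerprint(feature_seq, base_seqs):
    """S102-S103: the N DTW distances to the N base audios form the fingerprint."""
    return [dtw_distance(feature_seq, base) for base in base_seqs]


# Toy feature sequences for N = 3 base audios
bases = [[0, 0, 0], [1, 2, 3], [3, 3, 3]]
target = [1, 2, 2, 3]  # 4 selected segments
standard = [1, 2, 3]   # 3 selected segments
print(audio_fingerprint(target, bases))    # [8.0, 0.0, 4.0]
print(audio_fingerprint(standard, bases))  # [6.0, 0.0, 3.0]
```

Although the target and standard sequences have different lengths, both fingerprints have exactly N = 3 elements, which is what makes them comparable in S104. The repeated value 2 in the target is absorbed by the warping path, illustrating why inconsistent segment extraction has limited effect.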
S104: Calculate, according to a preset formula, the similarity between the audio fingerprint of the target audio and that of the standard audio.
As the previous step shows, the audio fingerprint of the standard audio is determined in the same way as that of the target audio.
In S103, the audio fingerprint of the target audio, which contains N DTW distances, has been determined; likewise, the audio fingerprint of the standard audio, which also contains N DTW distances, has been determined in advance.
To calculate the similarity between the two audio fingerprints, first compare each pair of corresponding DTW distances of the target audio and the standard audio, where corresponding DTW distances are those obtained against the same base audio. Then combine these per-base differences by a specified method to obtain the similarity between the audio fingerprint of the target audio and that of the standard audio.
Specifically, the similarity between the audio fingerprint of the target audio and that of the standard audio can be calculated with the following formula:
where A is the similarity;
X_i is the DTW distance between the specified audio feature value sequence of the target audio and that of the i-th base audio;
Y_i is the DTW distance between the specified audio feature value sequence of the standard audio and that of the i-th base audio.
Note that the above similarity formula is a preferred embodiment of the invention; it does not mean that the similarity can only be obtained with that formula. For example, the similarity can also be obtained as follows:
The similarity formula can be determined by a person skilled in the art according to the idea of the invention, and the invention does not specifically limit it here.
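The formula itself is not given in this text, so the two functions below are hypothetical examples of formulas that follow the described idea: compare X_i and Y_i for the same base audio, then aggregate over all N base audios. Neither should be read as the preferred formula referred to above.

```python
def similarity_inverse_mean_gap(x, y):
    """Hypothetical: A = 1 / (1 + (1/N) * sum_i |X_i - Y_i|).

    Equals 1.0 for identical fingerprints and decreases as they diverge.
    """
    if len(x) != len(y) or not x:
        raise ValueError("fingerprints must be non-empty and of equal length N")
    mean_gap = sum(abs(xi - yi) for xi, yi in zip(x, y)) / len(x)
    return 1.0 / (1.0 + mean_gap)


def similarity_relative_gap(x, y):
    """Hypothetical: A = 1 - (1/N) * sum_i |X_i - Y_i| / max(X_i, Y_i).

    DTW distances are non-negative; a pair where both are 0 counts as a
    perfect match for that base audio.
    """
    if len(x) != len(y) or not x:
        raise ValueError("fingerprints must be non-empty and of equal length N")
    gaps = [abs(xi - yi) / max(xi, yi) if max(xi, yi) > 0 else 0.0
            for xi, yi in zip(x, y)]
    return 1.0 - sum(gaps) / len(gaps)


x = [8.0, 0.0, 4.0]  # toy fingerprint of the target audio (X_i)
y = [6.0, 0.0, 3.0]  # toy fingerprint of the standard audio (Y_i)
print(similarity_inverse_mean_gap(x, y))        # 0.5
print(round(similarity_relative_gap(x, y), 4))  # 0.8333
```

Both functions map identical fingerprints to 1.0, but they produce different scales for the same pair, which is why the threshold in S105 has to be chosen together with the formula.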
S105: If the similarity between the audio fingerprint of the target audio and that of the standard audio is greater than a preset threshold, determine that the target audio is similar to the standard audio.
After the similarity between the two audio fingerprints has been calculated in S104, it is checked whether the similarity is greater than the preset threshold. The preset threshold can be determined by a person skilled in the art according to the similarity formula used in S104 together with validated business data. The invention does not specifically limit it here.
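The text leaves the threshold to the formula and to validated business data. A hypothetical way to use such data, sketched below, is to pick the threshold that maximizes accuracy on pairs already labeled similar or not similar. The function names and the labeled scores are made up; the strict comparison follows S105.

```python
def is_similar(similarity, threshold):
    """S105: similar only if the similarity is strictly greater than the threshold."""
    return similarity > threshold


def calibrate_threshold(scores, labels):
    """Hypothetical: choose the threshold that maximizes accuracy on labeled pairs.

    scores are similarities computed with the S104 formula; labels are True
    for pairs confirmed to be similar.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("need one label per score")
    # Candidates: below every score (all similar), then each observed score
    candidates = [min(scores) - 1.0] + sorted(set(scores))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum(is_similar(s, t) == lab for s, lab in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


scores = [0.9, 0.8, 0.4, 0.3]
labels = [True, True, False, False]
t = calibrate_threshold(scores, labels)
print(t)                   # 0.4
print(is_similar(0.5, t))  # True
```

Because the threshold depends on the scale of the chosen formula, it would need to be recalibrated whenever the formula in S104 changes.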
Corresponding to the above method embodiment, the invention further provides a device for determining similar audio which, as shown in Fig. 3, may include:
an audio feature value sequence determining module 101, configured to determine the specified audio feature value sequence of a target audio;
a DTW distance determining module 102, configured to calculate, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and that of each of N predetermined base audios, where the specified audio feature value sequences of the N base audios are determined in the same way as that of the target audio;
an audio fingerprint determining module 103, configured to take the resulting N DTW distances as the audio fingerprint of the target audio;
a similarity calculation module 104, configured to calculate, according to a preset formula, the similarity between the audio fingerprint of the target audio and that of a standard audio, where the audio fingerprint of the standard audio is determined in the same way as that of the target audio;
a similar audio determining module 105, configured to determine that the target audio is similar to the standard audio if the similarity between their audio fingerprints is greater than a preset threshold.
In practice, as shown in Fig. 4, the audio feature value sequence determining module 101 may include:
an audio segmentation submodule 201, configured to segment the target audio according to a specified segmentation rule to obtain audio segments;
an audio segment selection submodule 202, configured to select at least two audio segments of the target audio according to a preset audio segment selection rule;
an audio feature value determining submodule 203, configured to determine the specified audio feature value of each selected audio segment;
a sequence determining submodule 204, configured to arrange the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.
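A hypothetical sketch of how modules 101 to 105 could be wired together, with each module injected as a callable so that any segmentation rule, feature type, DTW variant or similarity formula can be plugged in. The class and field names are assumptions, and the toy stand-ins in the usage example are deliberately simple (a mean gap instead of DTW) just to exercise the wiring.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

FeatureSeq = List[float]


@dataclass
class SimilarAudioDevice:
    """Hypothetical mapping of modules 101-105 (Fig. 3) onto injected callables."""
    feature_sequence: Callable[[Sequence[float]], FeatureSeq]  # module 101 (submodules 201-204)
    distance: Callable[[FeatureSeq, FeatureSeq], float]        # module 102, e.g. a DTW distance
    base_sequences: List[FeatureSeq]                           # feature sequences of the N base audios
    similarity: Callable[[List[float], List[float]], float]    # module 104, the preset formula
    threshold: float                                           # preset threshold used by module 105

    def fingerprint(self, audio: Sequence[float]) -> List[float]:
        """Modules 101-103: feature sequence, then N distances, then fingerprint."""
        seq = self.feature_sequence(audio)
        return [self.distance(seq, base) for base in self.base_sequences]

    def is_similar(self, target_audio: Sequence[float],
                   standard_fingerprint: List[float]) -> bool:
        """Modules 104-105: compare with a standard fingerprint built the same way."""
        a = self.similarity(self.fingerprint(target_audio), standard_fingerprint)
        return a > self.threshold


# Toy stand-ins (not DTW, not the patent's formula)
def toy_features(audio):
    """Mean absolute amplitude of consecutive 2-sample segments."""
    segs = [audio[i:i + 2] for i in range(0, len(audio), 2)]
    return [sum(abs(x) for x in s) / len(s) for s in segs]


def toy_distance(a, b):
    """Gap between sequence means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def toy_similarity(x, y):
    return 1.0 / (1.0 + sum(abs(p - q) for p, q in zip(x, y)))


device = SimilarAudioDevice(feature_sequence=toy_features, distance=toy_distance,
                            base_sequences=[[0.0], [1.0]],
                            similarity=toy_similarity, threshold=0.5)
standard_fp = device.fingerprint([1.0, -1.0, 1.0, -1.0])
print(standard_fp)                                                        # [1.0, 0.0]
print(device.is_similar([1.0, -1.0, 1.0, -1.0, 1.0, -1.0], standard_fp))  # True
print(device.is_similar([0.1, -0.1, 0.1, -0.1], standard_fp))             # False
```

Computing the standard fingerprint through the same `fingerprint` method enforces the requirement that the standard and target fingerprints be determined in the same way.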
In this technical solution, after the specific audio feature value sequence of the target audio is determined, the DTW distances between it and the specific audio feature value sequences of the N predetermined fundamental frequencies are calculated according to the dynamic time warping algorithm and taken as the audio fingerprint of the target audio; whether the target audio is similar to the standard audio is then determined from the similarity between the audio fingerprint of the target audio and that of the standard audio. Compared with the prior art, there is no need to generate a large number of feature vectors, so the whole audio fingerprint matching process requires no large-scale feature storage or retrieval, and the machine resource overhead is lower. Moreover, even if the frames extracted for a particular passage of the audio are inconsistent, the effect on the technical solution of the present invention is small; the solution therefore alleviates the insufficient local robustness of the prior art and improves overall robustness.
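To illustrate the robustness argument, the following sketch (an illustration, not an experiment from the patent) compares a short feature contour with a copy in which one frame is repeated, as can happen when segment boundaries drift. DTW absorbs the repetition, whereas a position-by-position comparison does not. Both function names are hypothetical, and the DTW recurrence assumes an absolute-difference local cost.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with |a[i] - b[j]| as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def pointwise_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of |a[k] - b[k]| over the common prefix (no time alignment)."""
    return sum(abs(x - y) for x, y in zip(a, b))


original = [0.1, 0.5, 0.9, 0.5, 0.1]
# Same contour with the first frame repeated, as if segmentation drifted
shifted = [0.1, 0.1, 0.5, 0.9, 0.5, 0.1]

print(dtw_distance(original, shifted))        # 0.0
print(pointwise_distance(original, shifted))  # clearly non-zero (about 1.6)
```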
In a preferred embodiment of the present invention, the audio segmentation submodule 201 is specifically configured to:
segment the target audio according to a specified time-interval division rule to obtain audio segments.
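A fixed-interval split of this kind might look like the sketch below. It assumes non-overlapping segments of `interval_s` seconds and keeps a shorter final segment; the text does not say whether segments overlap or are padded, and the function name is hypothetical.

```python
from typing import List, Sequence


def segment_by_interval(samples: Sequence[float], sample_rate: int,
                        interval_s: float) -> List[List[float]]:
    """Split samples into consecutive, non-overlapping chunks of interval_s seconds.

    The last chunk is shorter when the length is not a multiple of the interval.
    """
    step = int(round(sample_rate * interval_s))  # samples per segment
    if step <= 0:
        raise ValueError("interval must cover at least one sample")
    return [list(samples[k:k + step]) for k in range(0, len(samples), step)]
```

For example, ten samples at 4 Hz with a 0.5 s interval give five two-sample segments.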
In a preferred embodiment of the present invention, the audio feature value determining submodule 203 is specifically configured to:
determine the mean audio intensity of each selected audio segment and take the determined mean audio intensity as the specific audio feature value of that audio segment;
or
determine the short-time zero-crossing rate of each selected audio segment and take the determined short-time zero-crossing rate as the specific audio feature value of that audio segment;
or
determine the short-time energy of each selected audio segment and take the determined short-time energy as the specific audio feature value of that audio segment.
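The three alternative feature values can be computed per segment roughly as follows. The patent does not define them precisely, so this sketch assumes common textbook forms: mean absolute amplitude for "mean audio intensity", the fraction of adjacent sample pairs that change sign for the zero-crossing rate, and the sum of squared samples for short-time energy. The function names and the `FEATURES` table are hypothetical.

```python
from typing import Sequence


def mean_intensity(seg: Sequence[float]) -> float:
    """Mean absolute amplitude (assumed meaning of 'mean audio intensity')."""
    return sum(abs(x) for x in seg) / len(seg)


def zero_crossing_rate(seg: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose sign differs (0 counts as positive)."""
    if len(seg) < 2:
        return 0.0
    crossings = sum(1 for prev, cur in zip(seg, seg[1:])
                    if (prev >= 0) != (cur >= 0))
    return crossings / (len(seg) - 1)


def short_time_energy(seg: Sequence[float]) -> float:
    """Sum of squared samples within the segment."""
    return sum(x * x for x in seg)


# Exactly one of these is chosen and used for every segment
FEATURES = {
    "intensity": mean_intensity,
    "zcr": zero_crossing_rate,
    "energy": short_time_energy,
}
```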
In a preferred embodiment of the present invention, the sequence determining submodule 204 is specifically configured to:
arrange the determined specific audio feature values of the audio segments according to the chronological order, within the audio, of the audio segments to which they respectively correspond, to obtain the specific audio feature value sequence of the target audio.
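Ordering by position in the audio can be sketched as below, assuming each selected segment is carried together with its start time so that the selection step may return segments in any order. The `feature_sequence` name and the (start time, samples) pair layout are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def feature_sequence(selected: Sequence[Tuple[float, Sequence[float]]],
                     feature: Callable[[Sequence[float]], float]) -> List[float]:
    """Order per-segment feature values by the segments' start times.

    selected holds (start_time_seconds, samples) pairs, possibly out of
    order; the returned values follow the segments' order in the audio.
    """
    ordered = sorted(selected, key=lambda item: item[0])
    return [feature(samples) for _, samples in ordered]
```

For instance, with a mean-value feature, segments starting at 1.0 s, 0.0 s and 0.5 s yield their values in the order 0.0 s, 0.5 s, 1.0 s.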
In a preferred embodiment of the present invention, the similarity calculation module 104 is specifically configured to:
calculate the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following equation:
$$A = \sqrt{\sum_{i=1}^{N} \left(X_i - Y_i\right)^2}$$
where $A$ is the similarity;
$X_i$ is the DTW distance between the specific audio feature value sequence of the target audio and the specific audio feature value sequence of the i-th fundamental frequency;
$Y_i$ is the DTW distance between the specific audio feature value sequence of the standard audio and the specific audio feature value sequence of the i-th fundamental frequency.
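The formula for $A$ is the Euclidean distance between the two N-component fingerprints, which a short sketch makes concrete. Note that $A$ grows as the fingerprints diverge, while the text declares the audio similar when the "similarity" exceeds a preset threshold; the decision helper below therefore reads the threshold test as $A$ not exceeding a threshold. That reading is an interpretation, not the patent's wording, and both function names are hypothetical.

```python
import math
from typing import Sequence


def fingerprint_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """A = sqrt(sum_i (X_i - Y_i)^2) over the N fingerprint components."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length N")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def is_similar(x: Sequence[float], y: Sequence[float], threshold: float) -> bool:
    # A is a distance: smaller values mean closer fingerprints.
    # ASSUMPTION: "similar" is interpreted here as A <= threshold.
    return fingerprint_distance(x, y) <= threshold
```

For example, fingerprints (3, 2) and (0, 6) give $A = \sqrt{9 + 16} = 5$.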
It should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant parts, refer to the description of the method embodiment.
A person of ordinary skill in the art will understand that all or part of the steps in the above method embodiments may be completed by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A method for determining similar audio, characterized by comprising:
determining a specific audio feature value sequence of a target audio;
calculating, according to a dynamic time warping algorithm, the DTW distances between the specific audio feature value sequence of the target audio and the specific audio feature value sequences of N predetermined fundamental frequencies, respectively; wherein the specific audio feature value sequences of the N fundamental frequencies are determined by the same method as the specific audio feature value sequence of the target audio;
taking the N resulting DTW distances as the audio fingerprint of the target audio;
calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined by the same method as the audio fingerprint of the target audio;
if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold, determining that the target audio is similar to the standard audio;
wherein determining the specific audio feature value sequence of the target audio comprises:
segmenting the target audio according to a specified division rule to obtain audio segments;
selecting at least two audio segments of the target audio according to a preset audio segment selection rule;
determining the specific audio feature value of each selected audio segment;
arranging the determined specific audio feature values of the audio segments in a preset order to obtain the specific audio feature value sequence of the target audio.
2. The method according to claim 1, characterized in that segmenting the target audio according to the specified division rule to obtain audio segments comprises:
segmenting the target audio at a specified time interval to obtain audio segments.
3. The method according to claim 1, characterized in that determining the specific audio feature value of each selected audio segment comprises:
determining the mean audio intensity of each selected audio segment and taking the determined mean audio intensity as the specific audio feature value of that audio segment;
or
determining the short-time zero-crossing rate of each selected audio segment and taking the determined short-time zero-crossing rate as the specific audio feature value of that audio segment;
or
determining the short-time energy of each selected audio segment and taking the determined short-time energy as the specific audio feature value of that audio segment.
4. The method according to claim 1, characterized in that arranging the determined specific audio feature values of the audio segments in a preset order to obtain the specific audio feature value sequence of the target audio comprises:
arranging the determined specific audio feature values of the audio segments according to the chronological order, within the audio, of the audio segments to which they respectively correspond, to obtain the specific audio feature value sequence of the target audio.
5. The method according to claim 1, characterized in that calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio comprises:
calculating the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following equation:
$$A = \sqrt{\sum_{i=1}^{N} \left(X_i - Y_i\right)^2}$$
where $A$ is the similarity;
$X_i$ is the DTW distance between the specific audio feature value sequence of the target audio and the specific audio feature value sequence of the i-th fundamental frequency;
$Y_i$ is the DTW distance between the specific audio feature value sequence of the standard audio and the specific audio feature value sequence of the i-th fundamental frequency.
6. A device for determining similar audio, characterized by comprising:
an audio feature value sequence determining module, configured to determine a specific audio feature value sequence of a target audio;
a DTW distance determining module, configured to calculate, according to a dynamic time warping algorithm, the DTW distances between the specific audio feature value sequence of the target audio and the specific audio feature value sequences of N predetermined fundamental frequencies, respectively; wherein the specific audio feature value sequences of the N fundamental frequencies are determined by the same method as the specific audio feature value sequence of the target audio;
an audio fingerprint determining module, configured to take the N resulting DTW distances as the audio fingerprint of the target audio;
a similarity calculation module, configured to calculate, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined by the same method as the audio fingerprint of the target audio;
a similar audio determining module, configured to determine that the target audio is similar to the standard audio if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold;
wherein the audio feature value sequence determining module comprises:
an audio segmentation submodule, configured to segment the target audio according to a specified division rule to obtain audio segments;
an audio segment selection submodule, configured to select at least two audio segments of the target audio according to a preset audio segment selection rule;
an audio feature value determining submodule, configured to determine the specific audio feature value of each selected audio segment;
a sequence determining submodule, configured to arrange the determined specific audio feature values of the audio segments in a preset order to obtain the specific audio feature value sequence of the target audio.
7. The device according to claim 6, characterized in that the audio segmentation submodule is specifically configured to:
segment the target audio according to a specified time-interval division rule to obtain audio segments.
8. The device according to claim 6, characterized in that the audio feature value determining submodule is specifically configured to:
determine the mean audio intensity of each selected audio segment and take the determined mean audio intensity as the specific audio feature value of that audio segment;
or
determine the short-time zero-crossing rate of each selected audio segment and take the determined short-time zero-crossing rate as the specific audio feature value of that audio segment;
or
determine the short-time energy of each selected audio segment and take the determined short-time energy as the specific audio feature value of that audio segment.
9. The device according to claim 6, characterized in that the sequence determining submodule is specifically configured to:
arrange the determined specific audio feature values of the audio segments according to the chronological order, within the audio, of the audio segments to which they respectively correspond, to obtain the specific audio feature value sequence of the target audio.
10. The device according to claim 6, characterized in that the similarity calculation module is specifically configured to:
calculate the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following equation:
$$A = \sqrt{\sum_{i=1}^{N} \left(X_i - Y_i\right)^2}$$
where $A$ is the similarity;
$X_i$ is the DTW distance between the specific audio feature value sequence of the target audio and the specific audio feature value sequence of the i-th fundamental frequency;
$Y_i$ is the DTW distance between the specific audio feature value sequence of the standard audio and the specific audio feature value sequence of the i-th fundamental frequency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410840295.3A CN104464726B (en) | 2014-12-30 | 2014-12-30 | A kind of determination method and device of similar audio |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410840295.3A CN104464726B (en) | 2014-12-30 | 2014-12-30 | A kind of determination method and device of similar audio |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104464726A (en) | 2015-03-25 |
CN104464726B | 2017-10-27 |
Family
ID=52910677
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410840295.3A Active CN104464726B (en) | 2014-12-30 | 2014-12-30 | A kind of determination method and device of similar audio |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104464726B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104900238B (en) * | 2015-05-14 | 2018-08-21 | 电子科技大学 | A kind of audio real-time comparison method based on perception filtering |
CN104900239B (en) * | 2015-05-14 | 2018-08-21 | 电子科技大学 | A kind of audio real-time comparison method based on Walsh-Hadamard transform |
CN107545904B (en) * | 2016-06-23 | 2021-06-18 | 杭州海康威视数字技术股份有限公司 | Audio detection method and device |
CN106529433B (en) * | 2016-10-25 | 2019-07-16 | 天津大学 | Queue march in step degree evaluation method based on voice signal |
CN107610715B (en) * | 2017-10-10 | 2021-03-02 | 昆明理工大学 | Similarity calculation method based on multiple sound characteristics |
CN107731220B (en) | 2017-10-18 | 2019-01-22 | 北京达佳互联信息技术有限公司 | Audio identification methods, device and server |
CN107918663A (en) | 2017-11-22 | 2018-04-17 | 腾讯科技(深圳)有限公司 | audio file search method and device |
CN109192196A (en) * | 2018-08-22 | 2019-01-11 | 昆明理工大学 | A kind of audio frequency characteristics selection method of the SVM classifier of anti-noise |
CN109493853B (en) * | 2018-09-30 | 2022-03-22 | 福建星网视易信息系统有限公司 | Method for determining audio similarity and terminal |
CN110047515B (en) * | 2019-04-04 | 2021-04-20 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio identification method, device, equipment and storage medium |
CN110289013B (en) * | 2019-07-24 | 2023-12-19 | 腾讯科技(深圳)有限公司 | Multi-audio acquisition source detection method and device, storage medium and computer equipment |
CN110910899B (en) * | 2019-11-27 | 2022-04-08 | 杭州联汇科技股份有限公司 | Real-time audio signal consistency comparison detection method |
CN111081276B (en) * | 2019-12-04 | 2023-06-27 | 广州酷狗计算机科技有限公司 | Audio segment matching method, device, equipment and readable storage medium |
CN113450768A (en) * | 2021-06-25 | 2021-09-28 | 平安科技(深圳)有限公司 | Speech synthesis system evaluation method and device, readable storage medium and terminal equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021854A (en) * | 2006-10-11 | 2007-08-22 | 鲍东山 | Audio analysis system based on content |
CN101409073A (en) * | 2008-11-17 | 2009-04-15 | 浙江大学 | Method for identifying Chinese Putonghua orphaned word base on base frequency envelope |
CN102214462A (en) * | 2011-06-08 | 2011-10-12 | 北京爱说吧科技有限公司 | Method and system for estimating pronunciation |
CN103366784A (en) * | 2013-07-16 | 2013-10-23 | 湖南大学 | Multimedia playing method and device with function of voice controlling and humming searching |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20050059766A (en) * | 2003-12-15 | 2005-06-21 | 엘지전자 주식회사 | Voice recognition method using dynamic time warping |
- 2014-12-30 CN CN201410840295.3A patent/CN104464726B/en active Active
Non-Patent Citations (1)
Title |
---|
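"Design and Implementation of an Audio Retrieval System Based on Digital Fingerprints" (基于数字指纹的音频检索系统的设计与实现); 高昕晟; China Dissertations Full-text Database (中国学位论文全文数据库); 2014-09-17; pp. 3, 8-12, 40-41 * |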
Also Published As
Publication number | Publication date |
---|---|
CN104464726A (en) | 2015-03-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104464726B (en) | A kind of determination method and device of similar audio | |
US11657798B2 (en) | Methods and apparatus to segment audio and determine audio segment similarities | |
CN103971689B (en) | A kind of audio identification methods and device | |
JP6732296B2 (en) | Audio information processing method and device | |
CN104282322B (en) | A kind of mobile terminal and its method and apparatus for identifying song climax parts | |
TW201246183A (en) | Extraction and matching of characteristic fingerprints from audio signals | |
CN103853836B (en) | Music retrieval method and system based on music fingerprint characteristic | |
JP2013508767A (en) | Perceptual tempo estimation with scalable complexity | |
CN103489445A (en) | Method and device for recognizing human voices in audio | |
US20240177697A1 (en) | Audio data processing method and apparatus, computer device, and storage medium | |
CN108206027A (en) | A kind of audio quality evaluation method and system | |
KR20140080429A (en) | Apparatus and Method for correcting Audio data | |
TW202109508A (en) | Sound separation method, electronic and computer readable storage medium | |
CN105047202B (en) | A kind of audio-frequency processing method, device and terminal | |
CN104143339A (en) | Music signal processing apparatus and method, and program | |
WO2019017242A1 (en) | Musical composition analysis method, musical composition analysis device and program | |
CN105283916A (en) | Digital-watermark embedding device, digital-watermark embedding method, and digital-watermark embedding program | |
CN104217731A (en) | Quick solo music score recognizing method | |
US9213703B1 (en) | Pitch shift and time stretch resistant audio matching | |
CN104900239B (en) | A kind of audio real-time comparison method based on Walsh-Hadamard transform | |
Pilia et al. | Time scaling detection and estimation in audio recordings | |
KR100766170B1 (en) | Music summarization apparatus and method using multi-level vector quantization | |
Viloria et al. | Segmentation process and spectral characteristics in the determination of musical genres | |
Jeong et al. | Dlr: Toward a deep learned rhythmic representation for music content analysis | |
Fan et al. | Notice of violation of ieee publication principles: A music identification system based on audio fingerprint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |