CN104464726B

CN104464726B - Method and device for determining similar audio

Info

Publication number: CN104464726B
Application number: CN201410840295.3A
Authority: CN
Inventors: 刘祁跃; 李典
Original assignee: Beijing QIYI Century Science and Technology Co Ltd
Current assignee: Beijing QIYI Century Science and Technology Co Ltd
Priority date: 2014-12-30
Filing date: 2014-12-30
Publication date: 2017-10-27
Anticipated expiration: 2034-12-30
Also published as: CN104464726A

Abstract

An embodiment of the invention discloses a method and device for determining similar audio. The method includes: determining a specified audio feature value sequence of a target audio; calculating, according to the dynamic time warping (DTW) algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios; taking the resulting N DTW distances as the audio fingerprint of the target audio; calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio; and, if that similarity is greater than a preset threshold, determining that the target audio is similar to the standard audio. Compared with the prior art, the method does not need to generate a large number of feature vectors, so the whole fingerprint matching process requires no large-scale feature storage and retrieval, and machine resource overhead is smaller. It also mitigates the insufficient local robustness of the prior art and improves overall robustness.

Description

Method and device for determining similar audio

Technical field

The present invention relates to the field of computer multimedia, and in particular to a method and device for determining similar audio.

Background

With the rapid development of multimedia and the Internet, producing and distributing audio has become simple and fast, and audio resources have become extremely abundant. Some websites that provide audio resources to users also accept large numbers of audio uploads from users. Among these audios there are often several with similar content. If an audio website stores all of these similar audios, the operating burden on the website is considerable. Being able to determine whether audios are similar, and to remove the similar ones, is therefore very important for an audio website.

In the prior art, whether audios are similar is generally determined from their audio fingerprints. An audio fingerprint is a set of unique identifiers computed from the audio signal, and similar audios should have similar audio fingerprints. Therefore, once the audio fingerprint of each audio has been determined, the fingerprints are compared. If the fingerprints match, that is, if their similarity reaches a certain value, the corresponding audios can be determined to be similar.

In existing audio fingerprint matching methods, the fingerprint is determined mainly by analyzing the audio content and extracting multiple local features of specific audio frames, for example by quantizing and encoding audio sample values or by computing spectral difference amplitudes. The set of audio features extracted from those specific frames is then used as the fingerprint of the whole audio.

These matching methods all determine the audio fingerprint from multiple local features of specific audio frames, which generates a large number of feature vectors. The whole fingerprint matching process therefore requires large-scale feature storage and retrieval, and machine resource overhead is high. Moreover, frames may be extracted inconsistently when the specific audio frames are selected, so local robustness is not high enough.

Summary of the invention

To solve the above problems, embodiments of the invention disclose a method and device for determining similar audio. The technical solution is as follows:

An embodiment of the invention discloses a method for determining similar audio, which may include:

Determining a specified audio feature value sequence of a target audio;

Calculating, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios, wherein the specified audio feature value sequences of the N base audios are determined in the same way as the specified audio feature value sequence of the target audio;

Taking the resulting N DTW distances as the audio fingerprint of the target audio;

Calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined in the same way as the audio fingerprint of the target audio;

If the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold, determining that the target audio is similar to the standard audio;

Wherein the specified audio feature value sequence of the target audio is determined by:

Segmenting the target audio according to a specified segmentation rule to obtain audio segments;

Selecting at least two audio segments of the target audio according to a preset audio segment selection rule;

Determining the specified audio feature value of each selected audio segment;

Arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.

Wherein segmenting the target audio according to a specified segmentation rule to obtain audio segments includes:

Segmenting the target audio at a specified time interval to obtain audio segments.

Wherein determining the specified audio feature value of each selected audio segment includes:

Determining the mean audio intensity of each selected audio segment, and taking the determined mean audio intensity as the specified audio feature value of each audio segment;

Or

Determining the short-time zero-crossing rate of each selected audio segment, and taking the determined short-time zero-crossing rate as the specified audio feature value of each audio segment;

Or

Determining the short-time energy of each selected audio segment, and taking the determined short-time energy as the specified audio feature value of each audio segment.

Wherein arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio includes:

Arranging the determined specified audio feature values in the order in which their corresponding audio segments occur in the audio, to obtain the specified audio feature value sequence of the target audio.

Wherein calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio includes:

Calculating the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following formula:

where A is the similarity;

X_i is the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of the i-th base audio;

Y_i is the DTW distance between the specified audio feature value sequence of the standard audio and the specified audio feature value sequence of the i-th base audio.

An embodiment of the invention also provides a device for determining similar audio, which may include:

An audio feature value sequence determining module, configured to determine a specified audio feature value sequence of a target audio;

A DTW distance determining module, configured to calculate, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios, wherein the specified audio feature value sequences of the N base audios are determined in the same way as the specified audio feature value sequence of the target audio;

An audio fingerprint determining module, configured to take the resulting N DTW distances as the audio fingerprint of the target audio;

A similarity calculation module, configured to calculate, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined in the same way as the audio fingerprint of the target audio;

A similar audio determining module, configured to determine that the target audio is similar to the standard audio if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold;

Wherein the audio feature value sequence determining module includes:

An audio segmentation submodule, configured to segment the target audio according to a specified segmentation rule to obtain audio segments;

An audio segment selection submodule, configured to select at least two audio segments of the target audio according to a preset audio segment selection rule;

An audio feature value determining submodule, configured to determine the specified audio feature value of each selected audio segment;

A sequence determining submodule, configured to arrange the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.

Wherein the audio segmentation submodule is specifically configured to:

Segment the target audio at a specified time interval to obtain audio segments.

Wherein the audio feature value determining submodule is specifically configured to:

Determine the mean audio intensity of each selected audio segment, and take the determined mean audio intensity as the specified audio feature value of each audio segment;

Or

Wherein, the sequence determination sub-module, specifically for：

Wherein, the similarity calculation module, specifically for：

Wherein, A is similarity；

In the technical solution of the invention, after the specified audio feature value sequence of the target audio is determined, the DTW distances between that sequence and the specified audio feature value sequences of N predetermined base audios are calculated according to the dynamic time warping algorithm and used as the audio fingerprint of the target audio. Whether the target audio and the standard audio are similar is then determined from the similarity between their audio fingerprints. Compared with the prior art, there is no need to generate a large number of feature vectors, so the whole fingerprint matching process requires no large-scale feature storage and retrieval, and machine resource overhead is smaller. Moreover, even if the audio segments are extracted inconsistently, the effect on the technical solution is small, which mitigates the insufficient local robustness of the prior art and improves overall robustness.

Brief description of the drawings

To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below are only some embodiments of the invention. Those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for determining similar audio provided by the invention;

Fig. 2 is a flowchart of the method for determining the specified audio feature value sequence of the target audio provided by the invention;

Fig. 3 is a schematic structural diagram of a device for determining similar audio provided by the invention;

Fig. 4 is a schematic structural diagram of the audio feature value sequence determining module provided by the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.

Fig. 1 shows a method for determining similar audio provided by the invention. The method may include:

S101, determining a specified audio feature value sequence of a target audio.

In the invention, the target audio is defined relative to the standard audio. The standard audio is an audio whose audio fingerprint has already been determined and which serves as the reference against which other audios are evaluated for similarity. Those other audios are the target audios.

The method for determining the specified audio feature value sequence of the target audio is shown in Fig. 2 and may include:

S201, segmenting the target audio according to a specified segmentation rule to obtain audio segments.

To determine audio feature values, the target audio must first be segmented. The segmentation rule can be determined by those skilled in the art, and the invention does not need to limit the rule itself. Preferably, the target audio is segmented at a specified time interval to obtain audio segments. For example, a 10-second audio can be segmented at 2-second intervals to obtain 5 audio segments.
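The 10-second example above can be sketched in Python as follows. This is only an illustrative sketch: the function name split_into_segments, the representation of audio as a list of samples, and the 8000 Hz sample rate are assumptions made here, not details given in the patent.

```python
def split_into_segments(samples, sample_rate, interval_s):
    """Split a sample list into consecutive segments of interval_s seconds.

    The last segment may be shorter if the audio length is not an exact
    multiple of the interval.
    """
    seg_len = int(sample_rate * interval_s)
    if seg_len <= 0:
        raise ValueError("interval is too short for this sample rate")
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


# 10 seconds of silent audio at a hypothetical 8000 Hz sample rate,
# segmented at 2-second intervals.
SAMPLE_RATE = 8000
samples = [0.0] * (10 * SAMPLE_RATE)
segments = split_into_segments(samples, SAMPLE_RATE, 2)
```

With these inputs the audio is divided into 5 segments of 16000 samples each, matching the example in the text.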

S202, selecting at least two audio segments of the target audio according to a preset audio segment selection rule.

The preset audio segment selection rule can be determined by those skilled in the art, and the invention does not limit the rule itself, as long as a certain number of audio segments can be selected with it. For example, the odd-numbered audio segments in time order can be selected, or the even-numbered ones, or a selection expression can be preset, such as selecting the (2N+2)-th audio segments. The more audio segments are selected, the higher the stability and accuracy of the method, but the greater the workload of the device that carries it out. The fewer audio segments are selected, the lower the stability and accuracy, but the smaller the workload. The number of selected audio segments can therefore be determined by those skilled in the art according to the actual situation, and the invention does not specifically limit it.
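A selection rule of the odd/even kind mentioned above might look like the following Python sketch. The function name select_segments and the rule labels "odd" and "even" are hypothetical. Positions are counted from 1 in time order, and the check for at least two segments reflects the requirement stated in S202.

```python
def select_segments(segments, rule="odd"):
    """Select segments by their 1-based position in time order.

    Returns a list of (position, segment) pairs so that the original
    order of the segments can be recovered later.
    """
    numbered = list(enumerate(segments, start=1))
    if rule == "odd":
        chosen = [(n, s) for n, s in numbered if n % 2 == 1]
    elif rule == "even":
        chosen = [(n, s) for n, s in numbered if n % 2 == 0]
    else:
        raise ValueError("unknown selection rule: " + rule)
    if len(chosen) < 2:
        raise ValueError("at least two audio segments must be selected")
    return chosen


# Five placeholder segments, labelled by letter for readability.
odd_choice = select_segments(["a", "b", "c", "d", "e"], "odd")
even_choice = select_segments(["a", "b", "c", "d", "e"], "even")
```

Keeping the position alongside each segment is a design choice made for this sketch; it makes the later ordering step straightforward.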

S203, determining the specified audio feature value of each selected audio segment.

The audio feature value can be any feature value used in the art to describe the audio characteristics of an audio segment, and the invention does not need to limit it. For example, the mean audio intensity (also called sound intensity) of each selected audio segment can be determined and taken as the specified audio feature value of each audio segment. Note that the mean referred to here is a mean in the broad sense: it may be a weighted mean or an unweighted mean. The audio intensity can also be processed mathematically in other ways before being used as the specified audio feature value of each audio segment. The invention does not limit the specific processing method, which can be chosen by those skilled in the art according to actual needs.

Alternatively, the short-time zero-crossing rate of each selected audio segment can be determined and taken as the specified audio feature value of each audio segment.

Alternatively, the short-time energy of each selected audio segment can be determined and taken as the specified audio feature value of each audio segment.

Those skilled in the art can choose the audio feature value according to the actual situation, and the invention does not need to limit it.
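The three candidate feature values can be sketched in Python as below. The patent does not give formulas for them, so the definitions here are common textbook forms adopted as assumptions: mean intensity as the unweighted mean absolute amplitude, zero-crossing rate as the fraction of adjacent sample pairs that change sign, and short-time energy as the sum of squared samples. The function names are hypothetical.

```python
def mean_intensity(segment):
    """Unweighted mean absolute amplitude (one possible 'mean intensity')."""
    return sum(abs(x) for x in segment) / len(segment)


def zero_crossing_rate(segment):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(segment) < 2:
        raise ValueError("need at least two samples")
    crossings = sum(
        1 for a, b in zip(segment, segment[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(segment) - 1)


def short_time_energy(segment):
    """Sum of squared sample values over the segment."""
    return sum(x * x for x in segment)


# A tiny alternating segment: every adjacent pair changes sign.
seg = [1.0, -1.0, 1.0, -1.0]
features = (mean_intensity(seg), zero_crossing_rate(seg), short_time_energy(seg))
```

Any one of these functions can serve as the per-segment feature, provided the same one is used for the target audio, the standard audio, and the base audios.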

S204, arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.

After the specified audio feature values are obtained, they must be arranged in a certain order to obtain the specified audio feature value sequence of the target audio. The preset order can be determined by those skilled in the art, and the invention does not need to limit it. For example, the determined specified audio feature values can be arranged in the order in which their corresponding audio segments occur in the audio. Specifically, when the mean audio intensity of each audio segment is chosen as the audio feature value and the selected audio segments are the 1st, 10th, 20th, and 30th segments, the sequence of mean audio intensities of the target audio can be ordered so that the mean of the 1st segment comes first, the mean of the 10th segment second, the mean of the 20th segment third, and the mean of the 30th segment fourth.
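The ordering in the example above can be sketched in Python as follows. The function name build_feature_sequence, the dictionary layout (segment position mapped to feature value), and the numeric feature values are all made up for illustration.

```python
def build_feature_sequence(features_by_position):
    """Order feature values by the position of their segment in the audio."""
    return [value for _, value in sorted(features_by_position.items())]


# Hypothetical mean-intensity values for segments 1, 10, 20 and 30,
# deliberately supplied out of order.
features_by_position = {20: 0.7, 1: 0.2, 30: 0.1, 10: 0.5}
sequence = build_feature_sequence(features_by_position)
```

The resulting sequence lists the values for segments 1, 10, 20, and 30 in that order, regardless of the order in which they were computed.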

S102, calculating, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios.

S103, taking the resulting N DTW distances as the audio fingerprint of the target audio.

First, note that the specified audio feature value sequences of the N base audios are determined in the same way as the specified audio feature value sequence of the target audio. In practice, the specific value of N can be determined by those skilled in the art according to the application scenario. The main requirement when choosing the base audios is that the similarity between them be low, to avoid excessive redundancy.

Because different audios may have different numbers of selected audio segments, the resulting audio feature value sequences may contain different numbers of elements. The audio feature value sequences of two different audios therefore cannot be compared directly. They can be compared only after each has been run through the dynamic time warping algorithm (Dynamic Time Warping, DTW for short) against the specified audio feature value sequences of the N base audios, yielding N DTW distances for each. Specifically,

The DTW distances between the specified audio feature value sequence of the standard audio and the specified audio feature value sequences of the N predetermined base audios can be calculated according to the dynamic time warping algorithm.

The DTW distances between the specified audio feature value sequence of the target audio and the specified audio feature value sequences of the N predetermined base audios are then calculated in the same way. For the target audio, this yields N DTW distances, which serve as the audio fingerprint of the target audio. For the standard audio, the N DTW distances obtained likewise serve as the audio fingerprint of the standard audio. The audio fingerprint of the target audio and the audio fingerprint of the standard audio now have the same number of elements and can therefore be compared.
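The fingerprint construction described above can be sketched in Python as follows. This is a minimal sketch of the classic dynamic-programming form of DTW, assuming the absolute difference between two feature values as the local cost and no warping window. The patent does not specify these details, and the function names and sample sequences are hypothetical.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance between two sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimum accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again with b[j-1]
                cost[i][j - 1],      # b[j-1] matched again with a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def fingerprint(sequence, base_sequences):
    """The N DTW distances to the N base sequences, in a fixed order."""
    return [dtw_distance(sequence, base) for base in base_sequences]


# Two hypothetical base feature sequences of different lengths (N = 2).
bases = [[1.0, 2.0, 2.0, 3.0], [5.0]]
target_fp = fingerprint([1.0, 2.0, 3.0], bases)
```

Because every audio is measured against the same N base sequences in the same order, the resulting fingerprints always have N elements, even when the underlying feature sequences differ in length.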

S104, calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio.

As the previous step shows, the audio fingerprint of the standard audio is determined in the same way as the audio fingerprint of the target audio.

In S103 the audio fingerprint of the target audio was determined, and it contains N DTW distances. Likewise, the audio fingerprint of the standard audio has been determined in advance and also contains N DTW distances.

To calculate the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio, the differences between the corresponding DTW distances of the target audio and the standard audio are first compared one by one. Corresponding DTW distances are those obtained against the same base audio. The differences between the DTW distances are then processed by a specified method to obtain the similarity between the two audio fingerprints.

Specifically, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio can be calculated using the following formula:

where A is the similarity;

Note that the above similarity formula is one preferred embodiment of the invention and does not mean that the similarity can be obtained only with that formula. For example, the similarity can also be obtained in the following similar way:

The similarity formula can be determined by those skilled in the art in accordance with the idea of the invention, and the invention does not specifically limit it.
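Since the text leaves the choice of formula open, the Python sketch below uses a stand-in formula chosen purely for illustration. It is not the formula of the patent. It assumes non-negative DTW distances and defines the similarity as 1 minus the ratio of the summed absolute differences |X_i - Y_i| to the summed values X_i + Y_i, which yields a value between 0 and 1. The function name is hypothetical.

```python
def fingerprint_similarity(x, y):
    """Stand-in similarity in [0, 1] for two equal-length fingerprints.

    x and y hold the non-negative DTW distances X_i and Y_i measured
    against the same base audios in the same order.
    """
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same number of elements")
    total = sum(xi + yi for xi, yi in zip(x, y))
    if total == 0:
        # Both fingerprints are all zeros, so they are identical.
        return 1.0
    diff = sum(abs(xi - yi) for xi, yi in zip(x, y))
    return 1.0 - diff / total


same = fingerprint_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
half = fingerprint_similarity([1.0, 3.0], [3.0, 1.0])
```

Identical fingerprints give a similarity of 1.0 under this stand-in, and the value decreases as the corresponding DTW distances diverge.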

S105, if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold, determining that the target audio is similar to the standard audio.

After the similarity between the two audio fingerprints has been calculated in S104, it is judged whether the calculated similarity is greater than the preset threshold. The preset threshold can be determined by those skilled in the art from the similarity formula used in S104 together with verified business data. The invention does not specifically limit it.
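The decision in S105 reduces to a strict comparison, as in the short Python sketch below. The default threshold of 0.9 is an arbitrary placeholder chosen for this sketch. In practice the value would depend on the similarity formula and on verified business data, as noted above.

```python
def is_similar(similarity, threshold=0.9):
    """Return True only when the similarity strictly exceeds the threshold."""
    return similarity > threshold


above = is_similar(0.95)
at_threshold = is_similar(0.9)
```

A similarity exactly equal to the threshold does not count as similar here, following the "greater than" wording of S105.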

Corresponding to the above method embodiment, the invention also provides a device for determining similar audio which, as shown in Fig. 3, may include:

An audio feature value sequence determining module 101, configured to determine a specified audio feature value sequence of a target audio.

A DTW distance determining module 102, configured to calculate, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios, wherein the specified audio feature value sequences of the N base audios are determined in the same way as the specified audio feature value sequence of the target audio;

An audio fingerprint determining module 103, configured to take the resulting N DTW distances as the audio fingerprint of the target audio;

A similarity calculation module 104, configured to calculate, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined in the same way as the audio fingerprint of the target audio;

A similar audio determining module 105, configured to determine that the target audio is similar to the standard audio if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold.

In practice, the audio feature value sequence determining module 101, as shown in Fig. 4, may include:

An audio segmentation submodule 201, configured to segment the target audio according to a specified segmentation rule to obtain audio segments;

An audio segment selection submodule 202, configured to select at least two audio segments of the target audio according to a preset audio segment selection rule;

An audio feature value determining submodule 203, configured to determine the specified audio feature value of each selected audio segment;

A sequence determining submodule 204, configured to arrange the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.

In the technical solution of the invention, after the specified audio feature value sequence of the target audio is determined, the DTW distances between that sequence and the specified audio feature value sequences of N predetermined base audios are calculated according to the dynamic time warping algorithm and used as the audio fingerprint of the target audio. Whether the target audio and the standard audio are similar is then determined from the similarity between their audio fingerprints. Compared with the prior art, there is no need to generate a large number of feature vectors, so the whole fingerprint matching process requires no large-scale feature storage and retrieval, and machine resource overhead is smaller. Moreover, even if the audio segments are extracted inconsistently, the effect on the technical solution is small, which mitigates the insufficient local robustness of the prior art and improves overall robustness.

In the preferred embodiment of the present invention, the audio parsing submodule 201, specifically for：

In the preferred embodiment of the present invention, the audio frequency characteristics value determination sub-module 203, specifically for：

Or

In the preferred embodiment of the present invention, the sequence determination sub-module 204, specifically for：

In the preferred embodiment of the present invention, the similarity calculation module 104, specifically for：

Wherein, A is similarity；

It should be noted that herein, such as first and second or the like relational terms are used merely to a reality Body or operation make a distinction with another entity or operation, and not necessarily require or imply these entities or deposited between operating In any this actual relation or order.Moreover, term " comprising ", "comprising" or its any other variant are intended to Nonexcludability is included, so that process, method, article or equipment including a series of key elements not only will including those Element, but also other key elements including being not expressly set out, or also include being this process, method, article or equipment Intrinsic key element.In the absence of more restrictions, the key element limited by sentence "including a ...", it is not excluded that Also there is other identical element in process, method, article or equipment including the key element.

Each embodiment in this specification is described by the way of related, identical similar portion between each embodiment Divide mutually referring to what each embodiment was stressed is the difference with other embodiment.It is real especially for device Apply for example, because it is substantially similar to embodiment of the method, so description is fairly simple, related part is referring to embodiment of the method Part explanation.

One of ordinary skill in the art will appreciate that all or part of the steps in the above method embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.

The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the scope of protection of the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims

1. A method for determining similar audio, characterized by comprising:

calculating, according to the dynamic time warping (DTW) algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios, wherein the specified audio feature value sequences of the N base audios are determined by the same method as the specified audio feature value sequence of the target audio;

calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined by the same method as the audio fingerprint of the target audio;

if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold, determining that the target audio is similar to the standard audio;

arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.

2. The method according to claim 1, characterized in that segmenting the target audio according to the specified segmentation rule to obtain audio segments comprises:

3. The method according to claim 1, characterized in that determining the specified audio feature value of each selected audio segment comprises:

determining the mean audio intensity of each selected audio segment, and using the determined mean audio intensity as the specified audio feature value of each audio segment;

or

determining the short-time zero-crossing rate of each selected audio segment, and using the determined short-time zero-crossing rate as the specified audio feature value of each audio segment;

or

determining the short-time energy of each selected audio segment, and using the determined short-time energy as the specified audio feature value of each audio segment.

4. The method according to claim 1, characterized in that arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio comprises:

arranging the determined specified audio feature values of the audio segments according to the order in which the audio segments corresponding to the specified audio feature values occur in the audio, to obtain the specified audio feature value sequence of the target audio.

5. The method according to claim 1, characterized in that calculating, according to the preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio comprises:

$$A = \sqrt{\sum_{i=1}^{N} (X_i - Y_i)^2}$$

where A is the similarity;

X_i is the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of the i-th base audio;

Y_i is the DTW distance between the specified audio feature value sequence of the standard audio and the specified audio feature value sequence of the i-th base audio.
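As a minimal sketch, the formula of this claim can be evaluated as follows. The function name `fingerprint_similarity` is hypothetical; the inputs are two fingerprints of equal length N, each holding the DTW distances X_i or Y_i to the same N base audios in the same order.

```python
import math
from typing import Sequence


def fingerprint_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Evaluate A = sqrt(sum_i (X_i - Y_i)^2) for two audio fingerprints."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length N")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

The value A computed this way is the Euclidean distance between the two fingerprints, so identical fingerprints give A = 0. The comparison of A against the preset threshold is left to the caller and is not part of this sketch.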

6. A device for determining similar audio, characterized by comprising:

a DTW distance determining module, configured to calculate, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios, wherein the specified audio feature value sequences of the N base audios are determined by the same method as the specified audio feature value sequence of the target audio;

a similarity calculation module, configured to calculate, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined by the same method as the audio fingerprint of the target audio;

a similar audio determining module, configured to determine that the target audio is similar to the standard audio if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold;

an audio segment selection submodule, configured to select at least two audio segments of the target audio according to a preset audio segment selection rule;

a sequence determination submodule, configured to arrange the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.

7. The device according to claim 6, characterized in that the audio segmentation submodule is specifically configured to:

8. The device according to claim 6, characterized in that the audio feature value determination submodule is specifically configured to:

determine the mean audio intensity of each selected audio segment, and use the determined mean audio intensity as the specified audio feature value of each audio segment;

or

9. The device according to claim 6, characterized in that the sequence determination submodule is specifically configured to:

10. The device according to claim 6, characterized in that the similarity calculation module is specifically configured to:

$$A = \sqrt{\sum_{i=1}^{N} (X_i - Y_i)^2}$$

where A is the similarity;