CN104464726B - Method and device for determining similar audio - Google Patents

Publication number
CN104464726B
Authority
CN
China
Prior art keywords
audio
frequency
characteristic value
specific
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410840295.3A
Other languages
Chinese (zh)
Other versions
CN104464726A (en)
Inventor
刘祁跃
李典
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201410840295.3A
Publication of CN104464726A
Application granted
Publication of CN104464726B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a method and device for determining similar audio. The method includes: determining a specified audio feature value sequence of a target audio; calculating, according to the dynamic time warping (DTW) algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios; taking the resulting N DTW distances as the audio fingerprint of the target audio; calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio; and, if that similarity is greater than a preset threshold, determining that the target audio is similar to the standard audio. Compared with the prior art, there is no need to generate a large number of feature vectors, so the fingerprint matching process does not require extensive feature storage and retrieval, and machine resource overhead is lower. The method also mitigates the insufficient local robustness of the prior art and improves overall robustness.

Description

Method and device for determining similar audio
Technical field
The present invention relates to the field of computer multimedia, and in particular to a method and device for determining similar audio.
Background
With the rapid development of multimedia and the internet, generating and transmitting audio has become simple and fast, and audio resources have become extremely abundant. Some websites that provide audio resources to users also receive large amounts of audio uploaded by users. These uploads often include audio with similar content, and storing all of it places considerable operating pressure on the website. Determining whether audios are similar, and removing the similar ones, is therefore very important for an audio website.
In the prior art, whether audios are similar is generally determined from their audio fingerprints. An audio fingerprint is a set of unique identifiers calculated from the audio signal, and similar audios should have similar audio fingerprints. Therefore, once the audio fingerprint of each audio has been determined, the fingerprints are compared; if they match, that is, if their similarity reaches a certain value, the corresponding audios can be determined to be similar.
In existing audio fingerprint matching methods, the fingerprint is determined mainly by analyzing the audio content and extracting multiple local features of particular audio frames, for example by quantization-encoding the audio sample values or taking spectral difference amplitudes. The set of features extracted from those particular audio frames is then used as the fingerprint of the whole audio.
Because these matching methods all determine the audio fingerprint from multiple local features of particular audio frames, they produce a large number of feature vectors. The whole fingerprint matching process therefore requires extensive feature storage and retrieval, and machine resource overhead is high. In addition, frame extraction may be inconsistent when the particular audio frames are extracted, so local robustness is not high enough.
Summary of the invention
To solve the above problems, embodiments of the invention disclose a method and device for determining similar audio. The technical solution is as follows:
An embodiment of the invention discloses a method for determining similar audio, which may include:
determining a specified audio feature value sequence of a target audio;
calculating, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios; wherein the specified audio feature value sequences of the N base audios are determined in the same way as that of the target audio;
taking the resulting N DTW distances as the audio fingerprint of the target audio;
calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined in the same way as that of the target audio;
if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold, determining that the target audio is similar to the standard audio;
wherein determining the specified audio feature value sequence of the target audio includes:
segmenting the target audio according to a specified segmentation rule to obtain audio segments;
selecting at least two audio segments of the target audio according to a preset audio segment selection rule;
determining the specified audio feature value of each selected audio segment;
arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.
Wherein segmenting the target audio according to a specified segmentation rule to obtain audio segments includes:
segmenting the target audio at a specified time interval to obtain audio segments.
Wherein determining the specified audio feature value of each selected audio segment includes:
determining the mean audio intensity of each selected audio segment and using it as the specified audio feature value of that audio segment;
or
determining the short-time zero-crossing rate of each selected audio segment and using it as the specified audio feature value of that audio segment;
or
determining the short-time energy of each selected audio segment and using it as the specified audio feature value of that audio segment.
Wherein arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio includes:
arranging the determined specified audio feature values in the order in which their corresponding audio segments occur in the audio, to obtain the specified audio feature value sequence of the target audio.
Wherein calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio includes:
calculating the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following formula:
where A is the similarity;
X_i is the DTW distance between the specified audio feature value sequence of the target audio and that of the i-th base audio;
Y_i is the DTW distance between the specified audio feature value sequence of the standard audio and that of the i-th base audio.
An embodiment of the invention also provides a device for determining similar audio, which may include:
an audio feature value sequence determining module, for determining a specified audio feature value sequence of a target audio;
a DTW distance determining module, for calculating, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios; wherein the specified audio feature value sequences of the N base audios are determined in the same way as that of the target audio;
an audio fingerprint determining module, for taking the resulting N DTW distances as the audio fingerprint of the target audio;
a similarity calculation module, for calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined in the same way as that of the target audio;
a similar audio determining module, for determining that the target audio is similar to the standard audio if the similarity between their audio fingerprints is greater than a preset threshold;
wherein the audio feature value sequence determining module includes:
an audio segmentation submodule, for segmenting the target audio according to a specified segmentation rule to obtain audio segments;
an audio segment selection submodule, for selecting at least two audio segments of the target audio according to a preset audio segment selection rule;
an audio feature value determining submodule, for determining the specified audio feature value of each selected audio segment;
a sequence determining submodule, for arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.
Wherein the audio segmentation submodule is specifically configured to:
segment the target audio at a specified time interval to obtain audio segments.
Wherein the audio feature value determining submodule is specifically configured to:
determine the mean audio intensity of each selected audio segment and use it as the specified audio feature value of that audio segment;
or
determine the short-time zero-crossing rate of each selected audio segment and use it as the specified audio feature value of that audio segment;
or
determine the short-time energy of each selected audio segment and use it as the specified audio feature value of that audio segment.
Wherein the sequence determining submodule is specifically configured to:
arrange the determined specified audio feature values in the order in which their corresponding audio segments occur in the audio, to obtain the specified audio feature value sequence of the target audio.
Wherein the similarity calculation module is specifically configured to:
calculate the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following formula:
where A is the similarity;
X_i is the DTW distance between the specified audio feature value sequence of the target audio and that of the i-th base audio;
Y_i is the DTW distance between the specified audio feature value sequence of the standard audio and that of the i-th base audio.
In the technical solution of the invention, after the specified audio feature value sequence of the target audio is determined, the DTW distances between that sequence and the specified audio feature value sequences of the N predetermined base audios are calculated according to the dynamic time warping algorithm and taken as the audio fingerprint of the target audio. Whether the target audio and the standard audio are similar is then determined from the similarity between their audio fingerprints. Compared with the prior art, there is no need to generate a large number of feature vectors, so the whole fingerprint matching process does not require extensive feature storage and retrieval, and machine resource overhead is lower. Moreover, even if the audio segments are extracted inconsistently, the effect on the technical solution is small, so the insufficient local robustness of the prior art is mitigated and overall robustness is improved.
Brief description of the drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for determining similar audio provided by the invention;
Fig. 2 is a flowchart of the method for determining the specified audio feature value sequence of the target audio provided by the invention;
Fig. 3 is a structural diagram of a device for determining similar audio provided by the invention;
Fig. 4 is a structural diagram of the audio feature value sequence determining module provided by the invention.
Detailed description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.
Fig. 1 shows a method for determining similar audio provided by the invention. The method may include:
S101, determining the specified audio feature value sequence of the target audio.
In the invention, the target audio is defined relative to the standard audio. The standard audio is an audio whose audio fingerprint has already been determined and which serves as the reference against which other audios are evaluated for similarity. Those other audios are the target audio.
The method for determining the specified audio feature value sequence of the target audio is shown in Fig. 2 and may include:
S201, segmenting the target audio according to a specified segmentation rule to obtain audio segments.
To determine audio feature values, the target audio must first be segmented. The segmentation rule can be determined by those skilled in the art, and the invention does not need to limit the rule itself. Preferably, the target audio is segmented at a specified time interval to obtain audio segments. For example, a 10-second audio can be segmented at 2-second intervals to obtain 5 audio segments.
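The time-interval segmentation described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `split_into_segments`, the representation of the audio as a list of samples, and the toy sample rate are all assumptions made for the example.

```python
def split_into_segments(samples, sample_rate, segment_seconds):
    """Split a list of samples into consecutive segments of a fixed duration.

    Hypothetical helper: the audio is assumed to be a plain list of samples.
    The last segment may be shorter if the length is not an exact multiple.
    """
    step = int(sample_rate * segment_seconds)
    if step <= 0:
        raise ValueError("sample_rate * segment_seconds must be at least 1")
    return [samples[i:i + step] for i in range(0, len(samples), step)]


# A 10-second signal at a toy rate of 4 samples per second, cut every 2 seconds.
signal = list(range(40))
segments = split_into_segments(signal, sample_rate=4, segment_seconds=2)
print(len(segments))  # 5
```

Whether a short trailing segment should be kept, padded, or dropped is a choice the text leaves open; this sketch simply keeps it.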
S202, selecting at least two audio segments of the target audio according to a preset audio segment selection rule.
The preset audio segment selection rule can be determined by those skilled in the art, and the invention does not need to limit the rule itself, as long as a certain number of audio segments can be selected according to it. For example, the audio segments at odd positions in time order can be selected, or those at even positions; a selection expression can also be preset, for example selecting the (2N+2)-th audio segment. The more audio segments are selected, the higher the stability and accuracy of the method, but also the greater the workload of the device implementing it. The fewer audio segments are selected, the lower the stability and accuracy, but also the smaller the workload. The number of audio segments selected can therefore be determined by those skilled in the art according to actual conditions, and the invention does not specifically limit it here.
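As a hedged sketch of the odd-position and even-position rules mentioned above, the helper below selects segments by 1-based position. The name `select_segments` and the `rule` parameter are hypothetical, introduced only for illustration.

```python
def select_segments(segments, rule="odd"):
    """Select segments by their 1-based position in time order.

    "odd" keeps the 1st, 3rd, 5th, ... segments; "even" keeps the 2nd, 4th, ...
    The method requires at least two selected segments, so fewer is an error.
    """
    if rule == "odd":
        chosen = segments[0::2]
    elif rule == "even":
        chosen = segments[1::2]
    else:
        raise ValueError("unknown rule: " + repr(rule))
    if len(chosen) < 2:
        raise ValueError("at least two segments must be selected")
    return chosen


print(select_segments(["s1", "s2", "s3", "s4", "s5"], rule="odd"))
# ['s1', 's3', 's5']
```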
S203, determining the specified audio feature value of each selected audio segment.
The audio feature value can be any feature value used in the art to describe the audio characteristics of an audio segment, and the invention does not need to limit it here. For example, the mean audio intensity (also referred to as sound intensity) of each selected audio segment can be determined and used as the specified audio feature value of that segment. Note that the mean here is meant in the broad sense: it may be a weighted mean or an unweighted mean. The audio intensity can also be processed mathematically in other ways and the result used as the specified audio feature value of each audio segment. The invention does not limit the specific processing method, which can be selected by those skilled in the art according to actual needs.
Alternatively, the short-time zero-crossing rate of each selected audio segment can be determined and used as the specified audio feature value of that segment.
Alternatively, the short-time energy of each selected audio segment can be determined and used as the specified audio feature value of that segment.
Those skilled in the art can choose the audio feature value according to actual conditions, and the invention does not need to limit it here.
S204, arranging the determined specified audio feature values of the audio segments in a preset order to obtain the specified audio feature value sequence of the target audio.
After the specified audio feature values are obtained, they must be arranged in a certain order to obtain the specified audio feature value sequence of the target audio. The preset order can be determined by those skilled in the art, and the invention does not need to limit it here. For example, the determined specified audio feature values can be arranged in the order in which their corresponding audio segments occur in the audio. Specifically, when the mean audio intensity of each audio segment is chosen as the audio feature value and the selected audio segments are the 1st, 10th, 20th, and 30th segments, the sequence of mean audio intensities of the target audio can be ordered so that the mean for the 1st segment comes first, the mean for the 10th segment second, the mean for the 20th segment third, and the mean for the 30th segment fourth.
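The ordering in the example above (segments 1, 10, 20, 30) can be sketched as a sort by segment position. The dictionary representation and the name `build_feature_sequence` are assumptions for illustration, and the feature values are made-up numbers.

```python
def build_feature_sequence(features_by_position):
    """Arrange feature values by the position of their segment in the audio.

    features_by_position maps a segment's 1-based position to its feature value.
    """
    return [features_by_position[pos] for pos in sorted(features_by_position)]


# Made-up mean-intensity values for the 1st, 10th, 20th and 30th segments,
# given in arbitrary order.
features = {20: 0.3, 1: 0.5, 30: 0.1, 10: 0.9}
print(build_feature_sequence(features))  # [0.5, 0.9, 0.3, 0.1]
```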
S102, calculating, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios.
S103, taking the resulting N DTW distances as the audio fingerprint of the target audio.
First, note that the specified audio feature value sequences of the N base audios are determined in the same way as that of the target audio. In practice, the value of N can be determined by those skilled in the art according to the application scenario. The main requirement in selecting the base audios is that the similarity among them be low, to avoid excessive redundancy.
Because different audios may have different numbers of selected audio segments, the resulting audio feature value sequences may contain different numbers of elements. The audio feature value sequences of two different audios therefore cannot be compared directly. They can be compared only after each has been processed against the specified audio feature value sequences of the N base audios with the dynamic time warping (DTW) algorithm, yielding N DTW distances each. Specifically:
The DTW distance between the specified audio feature value sequence of the standard audio and the specified audio feature value sequence of each of the N predetermined base audios can be calculated according to the dynamic time warping algorithm.
The DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of the N predetermined base audios is then calculated in the same way. The N DTW distances obtained for the target audio serve as its audio fingerprint, and the N DTW distances obtained for the standard audio serve as its audio fingerprint. The two audio fingerprints now have the same number of elements and can therefore be compared.
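The fingerprint construction described above can be sketched with the classic dynamic-programming form of DTW. This is an illustrative sketch under stated assumptions: the local cost is the absolute difference between feature values, no window constraint or normalization is applied, and the names `dtw_distance` and `audio_fingerprint` are hypothetical. The source does not specify these details.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two non-empty numeric sequences.

    Local cost is |x - y| (an assumption); each cell adds the cheapest of the
    three allowed predecessor cells. Only two rows are kept in memory.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # cost of aligning two empty prefixes
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def audio_fingerprint(sequence, base_sequences):
    """Fingerprint = the N DTW distances to the N base-audio sequences."""
    return [dtw_distance(sequence, base) for base in base_sequences]


# Sequences of different lengths still yield fingerprints of equal length N.
bases = [[1, 2, 2, 3], [0]]
print(audio_fingerprint([1, 2, 3], bases))     # [0.0, 6.0]
print(audio_fingerprint([1, 2, 3, 3], bases))  # [0.0, 9.0]
```

The point of the design is visible in the usage lines: the two input sequences have 3 and 4 elements, yet both fingerprints have exactly N = 2 elements and can be compared element by element.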
S104, calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio.
As the previous step shows, the audio fingerprint of the standard audio is determined in the same way as that of the target audio.
In S103, the audio fingerprint of the target audio has been determined, and it contains N DTW distances. Likewise, the audio fingerprint of the standard audio has been determined in advance, and it also contains N DTW distances.
To calculate the similarity between the two audio fingerprints, the differences between corresponding DTW distances of the target audio and the standard audio are first compared one by one, where corresponding DTW distances are those obtained against the same base audio. The differences between the DTW distances are then processed according to a specified method to obtain the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio.
Specifically, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio can be calculated using the following formula:
where A is the similarity;
X_i is the DTW distance between the specified audio feature value sequence of the target audio and that of the i-th base audio;
Y_i is the DTW distance between the specified audio feature value sequence of the standard audio and that of the i-th base audio.
Note that the above similarity formula is one preferred embodiment of the invention and does not mean that the similarity can only be obtained with that formula. For example, the similarity can also be obtained in the following similar way:
The similarity formula can be determined by those skilled in the art according to the idea of the invention, and the invention does not specifically limit it here.
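Since the text allows the similarity formula to be chosen freely, one possible choice is sketched below. This is purely an assumption for illustration and is not the formula of the patent, which is not reproduced in this text: the similarity is taken as A = 1 / (1 + mean of |X_i - Y_i|), which equals 1 for identical fingerprints and decreases toward 0 as the DTW distances diverge. The name `fingerprint_similarity` is hypothetical.

```python
def fingerprint_similarity(x, y):
    """Illustrative similarity between two fingerprints of equal length N.

    Assumed formula (NOT the patent's): A = 1 / (1 + mean(|X_i - Y_i|)),
    where X_i and Y_i are DTW distances to the same i-th base audio.
    """
    if len(x) != len(y) or not x:
        raise ValueError("fingerprints must be non-empty and of equal length")
    mean_gap = sum(abs(a - b) for a, b in zip(x, y)) / len(x)
    return 1.0 / (1.0 + mean_gap)


print(fingerprint_similarity([0.0, 6.0], [0.0, 6.0]))  # 1.0
print(fingerprint_similarity([0.0, 0.0], [1.0, 3.0]))
```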
S105, if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold, determining that the target audio is similar to the standard audio.
After the similarity between the two audio fingerprints has been calculated in S104, it is judged whether the calculated similarity is greater than the preset threshold. The preset threshold can be determined by those skilled in the art from the similarity formula used in S104 and from verified business data. The invention does not specifically limit it here.
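The decision in S105 is a strict comparison against the threshold. A minimal sketch follows, assuming similarities have already been computed against several standard audios; the names `similar_standards` and the example identifiers and values are hypothetical.

```python
def similar_standards(similarities, threshold):
    """Return the standard audios whose similarity strictly exceeds the threshold.

    similarities maps a standard-audio identifier to its similarity with the
    target audio. A value exactly equal to the threshold does not count.
    """
    return [name for name, value in similarities.items() if value > threshold]


# Made-up similarity values against three standard audios.
scores = {"standard_a": 0.92, "standard_b": 0.80, "standard_c": 0.35}
print(similar_standards(scores, threshold=0.80))  # ['standard_a']
```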
In the technical solution of the invention, after the specified audio feature value sequence of the target audio is determined, the DTW distances between that sequence and the specified audio feature value sequences of the N predetermined base audios are calculated according to the dynamic time warping algorithm and taken as the audio fingerprint of the target audio. Whether the target audio and the standard audio are similar is then determined from the similarity between their audio fingerprints. Compared with the prior art, there is no need to generate a large number of feature vectors, so the whole fingerprint matching process does not require extensive feature storage and retrieval, and machine resource overhead is lower. Moreover, even if the audio segments are extracted inconsistently, the effect on the technical solution is small, so the insufficient local robustness of the prior art is mitigated and overall robustness is improved.
Corresponding to the above method embodiment, the invention also provides a device for determining similar audio which, as shown in Fig. 3, may include:
an audio feature value sequence determining module 101, for determining a specified audio feature value sequence of a target audio;
a DTW distance determining module 102, for calculating, according to the dynamic time warping algorithm, the DTW distance between the specified audio feature value sequence of the target audio and the specified audio feature value sequence of each of N predetermined base audios; wherein the specified audio feature value sequences of the N base audios are determined in the same way as that of the target audio;
an audio fingerprint determining module 103, for taking the resulting N DTW distances as the audio fingerprint of the target audio;
a similarity calculation module 104, for calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined in the same way as that of the target audio;
a similar audio determining module 105, for determining that the target audio is similar to the standard audio if the similarity between their audio fingerprints is greater than a preset threshold.
In practical applications, the audio feature value sequence determining module 101, as shown in Fig. 4, may include:
an audio segmentation submodule 201, configured to segment the target audio according to a specified segmentation rule to obtain audio segments;
an audio segment selection submodule 202, configured to select at least two audio segments of the target audio according to a preset audio segment selection rule;
an audio feature value determination submodule 203, configured to determine the specific audio feature value of each selected audio segment;
a sequence determination submodule 204, configured to arrange the determined specific audio feature values of the audio segments in a preset order to obtain the specific audio feature value sequence of the target audio.
After determining the specific audio feature value sequence of the target audio, the technical solution of the present invention uses a dynamic time warping algorithm to calculate the DTW distances between that sequence and the specific audio feature value sequences of N predetermined base audios, and takes the resulting N DTW distances as the audio fingerprint of the target audio. Whether the target audio is similar to the standard audio is then determined from the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio. Compared with the prior art, there is no need to generate a large number of feature vectors, so the audio fingerprint matching process as a whole requires no large-scale feature storage and retrieval, and the machine resource overhead is smaller. Moreover, even if the specific audio segments extracted from the audio are inconsistent, the effect on the technical solution of the present invention is small; the insufficient local robustness of the prior art is therefore mitigated and overall robustness is improved.
In a preferred embodiment of the present invention, the audio segmentation submodule 201 is specifically configured to:
segment the target audio according to a specified time-interval segmentation rule to obtain audio segments.
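Time-interval segmentation can be sketched as follows. This is only an illustration under stated assumptions: the audio is a flat list of samples, the interval is given in seconds, and a shorter trailing segment is kept. The function name `split_by_interval` and its parameters are hypothetical; the source does not specify the interval or how a remainder is handled.

```python
def split_by_interval(samples, sample_rate, interval_seconds):
    """Split a sample list into consecutive segments of a fixed duration."""
    seg_len = int(round(sample_rate * interval_seconds))
    if seg_len <= 0:
        raise ValueError("interval must cover at least one sample")
    # The last segment may be shorter than seg_len (an assumption of this sketch).
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
```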
In a preferred embodiment of the present invention, the audio feature value determination submodule 203 is specifically configured to:
determine the audio intensity mean of each selected audio segment, and use the determined audio intensity mean as the specific audio feature value of each audio segment;
or
determine the short-time zero-crossing rate of each selected audio segment, and use the determined short-time zero-crossing rate as the specific audio feature value of each audio segment;
or
determine the short-time energy of each selected audio segment, and use the determined short-time energy as the specific audio feature value of each audio segment.
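The three candidate feature values can be sketched with common textbook definitions. The source does not give formulas, so the definitions below are assumptions: intensity mean as the mean absolute amplitude, zero-crossing rate as the fraction of adjacent sample pairs whose signs differ, and short-time energy as the sum of squared samples. All function names are hypothetical.

```python
def intensity_mean(segment):
    """Mean absolute amplitude of the segment (assumed definition)."""
    return sum(abs(x) for x in segment) / len(segment)


def zero_crossing_rate(segment):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(segment) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(segment, segment[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(segment) - 1)


def short_time_energy(segment):
    """Sum of squared sample values over the segment."""
    return sum(x * x for x in segment)
```

Whichever feature is chosen, the same one would have to be used for the target audio, the base audios, and the standard audio, since the text requires their sequences to be determined in the same way.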
In a preferred embodiment of the present invention, the sequence determination submodule 204 is specifically configured to:
arrange the determined specific audio feature values of the audio segments according to the order in which the corresponding audio segments occur in the audio, to obtain the specific audio feature value sequence of the target audio.
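Ordering by position in the audio amounts to a sort on each segment's start position. The sketch below assumes each selected segment is represented as a hypothetical `(start_time, feature_value)` pair; this representation is not taken from the source.

```python
def build_feature_sequence(selected):
    """Order feature values by the start time of their audio segments.

    `selected` is an iterable of (start_time, feature_value) pairs in any order.
    """
    return [value for _, value in sorted(selected, key=lambda pair: pair[0])]
```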
In a preferred embodiment of the present invention, the similarity calculation module 104 is specifically configured to:
calculate the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following formula:
$$A = \sqrt{\sum_{i=1}^{N} (X_i - Y_i)^2}$$
where $A$ is the similarity;
$X_i$ is the DTW distance between the specific audio feature value sequence of the target audio and the specific audio feature value sequence of the i-th base audio;
$Y_i$ is the DTW distance between the specific audio feature value sequence of the standard audio and the specific audio feature value sequence of the i-th base audio.
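The formula for A, the square root of the summed squared differences between X_i and Y_i as stated in the claims, can be sketched as follows. The function name `fingerprint_similarity` is hypothetical, and both fingerprints are assumed to be equal-length lists of DTW distances.

```python
from math import sqrt


def fingerprint_similarity(x, y):
    """A = sqrt(sum over i of (X_i - Y_i)^2) for two N-element fingerprints."""
    if len(x) != len(y):
        raise ValueError("fingerprints must have the same length N")
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```

As written, A has the form of a Euclidean distance between the two fingerprints, so identical fingerprints yield A = 0. How A is compared with the preset threshold follows the source text and is not part of this sketch.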
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, since the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
A person of ordinary skill in the art will understand that all or part of the steps in the above method embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

1. A method for determining similar audio, characterized by comprising:
determining a specific audio feature value sequence of a target audio;
calculating, according to a dynamic time warping algorithm, the DTW distance between the specific audio feature value sequence of the target audio and the specific audio feature value sequence of each of N predetermined base audios, wherein the specific audio feature value sequences of the N base audios are determined in the same way as the specific audio feature value sequence of the target audio;
determining the N DTW distances obtained as the audio fingerprint of the target audio;
calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined in the same way as the audio fingerprint of the target audio;
if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold, determining that the target audio is similar to the standard audio;
wherein determining the specific audio feature value sequence of the target audio comprises:
segmenting the target audio according to a specified segmentation rule to obtain audio segments;
selecting at least two audio segments of the target audio according to a preset audio segment selection rule;
determining the specific audio feature value of each selected audio segment;
arranging the determined specific audio feature values of the audio segments in a preset order to obtain the specific audio feature value sequence of the target audio.
2. The method according to claim 1, characterized in that segmenting the target audio according to a specified segmentation rule to obtain audio segments comprises:
segmenting the target audio at a specified time interval to obtain audio segments.
3. The method according to claim 1, characterized in that determining the specific audio feature value of each selected audio segment comprises:
determining the audio intensity mean of each selected audio segment, and using the determined audio intensity mean as the specific audio feature value of each audio segment;
or
determining the short-time zero-crossing rate of each selected audio segment, and using the determined short-time zero-crossing rate as the specific audio feature value of each audio segment;
or
determining the short-time energy of each selected audio segment, and using the determined short-time energy as the specific audio feature value of each audio segment.
4. The method according to claim 1, characterized in that arranging the determined specific audio feature values of the audio segments in a preset order to obtain the specific audio feature value sequence of the target audio comprises:
arranging the determined specific audio feature values of the audio segments according to the order in which the corresponding audio segments occur in the audio, to obtain the specific audio feature value sequence of the target audio.
5. The method according to claim 1, characterized in that calculating, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio comprises:
calculating the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following formula:
$$A = \sqrt{\sum_{i=1}^{N} (X_i - Y_i)^2}$$
where $A$ is the similarity;
$X_i$ is the DTW distance between the specific audio feature value sequence of the target audio and the specific audio feature value sequence of the i-th base audio;
$Y_i$ is the DTW distance between the specific audio feature value sequence of the standard audio and the specific audio feature value sequence of the i-th base audio.
6. A device for determining similar audio, characterized by comprising:
an audio feature value sequence determining module, configured to determine a specific audio feature value sequence of a target audio;
a DTW distance determining module, configured to calculate, according to a dynamic time warping algorithm, the DTW distance between the specific audio feature value sequence of the target audio and the specific audio feature value sequence of each of N predetermined base audios, wherein the specific audio feature value sequences of the N base audios are determined in the same way as the specific audio feature value sequence of the target audio;
an audio fingerprint determining module, configured to determine the N DTW distances obtained as the audio fingerprint of the target audio;
a similarity calculation module, configured to calculate, according to a preset formula, the similarity between the audio fingerprint of the target audio and the audio fingerprint of a standard audio, wherein the audio fingerprint of the standard audio is determined in the same way as the audio fingerprint of the target audio;
a similar audio determining module, configured to determine that the target audio is similar to the standard audio if the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio is greater than a preset threshold;
wherein the audio feature value sequence determining module comprises:
an audio segmentation submodule, configured to segment the target audio according to a specified segmentation rule to obtain audio segments;
an audio segment selection submodule, configured to select at least two audio segments of the target audio according to a preset audio segment selection rule;
an audio feature value determination submodule, configured to determine the specific audio feature value of each selected audio segment;
a sequence determination submodule, configured to arrange the determined specific audio feature values of the audio segments in a preset order to obtain the specific audio feature value sequence of the target audio.
7. The device according to claim 6, characterized in that the audio segmentation submodule is specifically configured to:
segment the target audio according to a specified time-interval segmentation rule to obtain audio segments.
8. The device according to claim 6, characterized in that the audio feature value determination submodule is specifically configured to:
determine the audio intensity mean of each selected audio segment, and use the determined audio intensity mean as the specific audio feature value of each audio segment;
or
determine the short-time zero-crossing rate of each selected audio segment, and use the determined short-time zero-crossing rate as the specific audio feature value of each audio segment;
or
determine the short-time energy of each selected audio segment, and use the determined short-time energy as the specific audio feature value of each audio segment.
9. The device according to claim 6, characterized in that the sequence determination submodule is specifically configured to:
arrange the determined specific audio feature values of the audio segments according to the order in which the corresponding audio segments occur in the audio, to obtain the specific audio feature value sequence of the target audio.
10. The device according to claim 6, characterized in that the similarity calculation module is specifically configured to:
calculate the similarity between the audio fingerprint of the target audio and the audio fingerprint of the standard audio using the following formula:
$$A = \sqrt{\sum_{i=1}^{N} (X_i - Y_i)^2}$$
where $A$ is the similarity;
$X_i$ is the DTW distance between the specific audio feature value sequence of the target audio and the specific audio feature value sequence of the i-th base audio;
$Y_i$ is the DTW distance between the specific audio feature value sequence of the standard audio and the specific audio feature value sequence of the i-th base audio.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410840295.3A CN104464726B (en) 2014-12-30 2014-12-30 A kind of determination method and device of similar audio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410840295.3A CN104464726B (en) 2014-12-30 2014-12-30 A kind of determination method and device of similar audio

Publications (2)

Publication Number Publication Date
CN104464726A CN104464726A (en) 2015-03-25
CN104464726B true CN104464726B (en) 2017-10-27

Family

ID=52910677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410840295.3A Active CN104464726B (en) 2014-12-30 2014-12-30 A kind of determination method and device of similar audio

Country Status (1)

Country Link
CN (1) CN104464726B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104900238B (en) * 2015-05-14 2018-08-21 电子科技大学 A kind of audio real-time comparison method based on perception filtering
CN104900239B (en) * 2015-05-14 2018-08-21 电子科技大学 A kind of audio real-time comparison method based on Walsh-Hadamard transform
CN107545904B (en) * 2016-06-23 2021-06-18 杭州海康威视数字技术股份有限公司 Audio detection method and device
CN106529433B (en) * 2016-10-25 2019-07-16 天津大学 Queue march in step degree evaluation method based on voice signal
CN107610715B (en) * 2017-10-10 2021-03-02 昆明理工大学 Similarity calculation method based on multiple sound characteristics
CN107731220B (en) 2017-10-18 2019-01-22 北京达佳互联信息技术有限公司 Audio identification methods, device and server
CN107918663A (en) 2017-11-22 2018-04-17 腾讯科技(深圳)有限公司 audio file search method and device
CN109192196A (en) * 2018-08-22 2019-01-11 昆明理工大学 A kind of audio frequency characteristics selection method of the SVM classifier of anti-noise
CN109493853B (en) * 2018-09-30 2022-03-22 福建星网视易信息系统有限公司 Method for determining audio similarity and terminal
CN110047515B (en) * 2019-04-04 2021-04-20 腾讯音乐娱乐科技(深圳)有限公司 Audio identification method, device, equipment and storage medium
CN110289013B (en) * 2019-07-24 2023-12-19 腾讯科技(深圳)有限公司 Multi-audio acquisition source detection method and device, storage medium and computer equipment
CN110910899B (en) * 2019-11-27 2022-04-08 杭州联汇科技股份有限公司 Real-time audio signal consistency comparison detection method
CN111081276B (en) * 2019-12-04 2023-06-27 广州酷狗计算机科技有限公司 Audio segment matching method, device, equipment and readable storage medium
CN113450768A (en) * 2021-06-25 2021-09-28 平安科技(深圳)有限公司 Speech synthesis system evaluation method and device, readable storage medium and terminal equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021854A (en) * 2006-10-11 2007-08-22 鲍东山 Audio analysis system based on content
CN101409073A (en) * 2008-11-17 2009-04-15 浙江大学 Method for identifying Chinese Putonghua orphaned word base on base frequency envelope
CN102214462A (en) * 2011-06-08 2011-10-12 北京爱说吧科技有限公司 Method and system for estimating pronunciation
CN103366784A (en) * 2013-07-16 2013-10-23 湖南大学 Multimedia playing method and device with function of voice controlling and humming searching

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050059766A (en) * 2003-12-15 2005-06-21 엘지전자 주식회사 Voice recognition method using dynamic time warping

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于数字指纹的音频检索系统的设计与实现";高昕晟;《中国学位论文全文数据库》;20140917;第3、8-12、40-41页 *

Also Published As

Publication number Publication date
CN104464726A (en) 2015-03-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant