CN101980197A

CN101980197A - Multi-layer filtering audio search method and device based on long-time-structure voiceprints

Info

Publication number: CN101980197A
Application number: CN 201010524833
Authority: CN
Inventors: 刘刚; 王镪; 郭军
Original assignee: Beijing University of Posts and Telecommunications
Current assignee: Beijing University of Posts and Telecommunications
Priority date: 2010-10-29
Filing date: 2010-10-29
Publication date: 2011-02-23
Anticipated expiration: 2030-10-29
Also published as: CN101980197B

Abstract

An embodiment of the invention discloses a sample-based audio search method, namely a multi-layer filtering audio search method based on long-time-structure voiceprints, which can retrieve the complete information of an entire audio recording from a recorded audio clip. The invention discloses a novel method for generating voiceprints that carry long-time structure information, and improves search performance with a two-layer filtering method. The method comprises the following steps: extracting the voiceprint features of an input clip; processing them with a first-layer filter; calculating the confidence of the result; determining whether to perform second-layer filtering; and performing the second-layer filtering by expanding the query voiceprints. The invention also discloses a multi-layer filtering audio search device based on long-time-structure voiceprints. Experiments indicate that, for an audio library containing 10,000 songs, an accuracy of up to 99.7% can be reached when the query clip lasts 5 seconds and the signal-to-noise ratio is 0 dB.

Description

A multi-layer filtering audio search method and device based on long-time-structure voiceprints

Technical field

The invention belongs to the field of computer technology applications. It relates to a method and device for querying an audio database, and in particular to a content-based, query-by-example audio search method that retrieves the complete information of an entire audio recording from a recorded segment of the original audio.

Background technology

With the rapid development of modern information technology, particularly multimedia and network technology, a large amount of multimedia information can be obtained from the network. Audio files of all kinds have become some of the objects users search for most often in search engines (for example Baidu and Google). Traditional audio information retrieval is mainly text-based, yet text-based retrieval cannot satisfy users' needs for audio retrieval. That is, if a user hears a familiar piece of audio and wants to look up information about the whole recording from a segment recorded for a few seconds, this is still technically difficult at present.

Audio search services on the Internet today are in essence text search: they return results by matching text and keywords associated with the audio. Searching with a recorded audio segment requires content-based query-by-example audio retrieval, and existing audio retrieval technology still cannot meet this need. In recent years content-based audio retrieval has become a research focus, and researchers from many fields have begun to explore this new technical challenge.

In content-based audio retrieval, querying with a segment recorded for a few seconds, i.e., query by example, is one of the most basic forms. The user supplies an audio segment or records one through a microphone; the segment may contain various kinds of noise, and the system should still return the correct information about it.

Query-by-example audio retrieval can usually be divided into two subproblems: 1) converting the query audio segment into a representative feature sequence that forms a voiceprint (a voiceprint is a feature sequence that can represent a piece of audio and can be used to build an index); 2) searching the library for the candidate segments most similar to that feature sequence. Classical audio search methods fall into two classes: those based on local feature points and those based on global structure information. Methods based on local feature points generally look for typical feature points in the spectrum. For example, the British company Shazam extracts spectral peaks, combines the feature points into pairs, and uses these pairs as the voiceprint of the segment; at search time a hash index is used to make the search fast. Such methods do not need to retain the global information of the spectrum, their features are representative, and their noise robustness is strong; the drawback is that each voiceprint carries little information, so collisions are severe when the voiceprints are used to build an index. Methods based on global structure information retain the global information of the whole spectrum. They carry a large amount of information, but their noise robustness is weak and the information is less representative. For example, the method proposed by Philips Research in the Netherlands divides the spectrum between 300 and 2000 Hz into 33 non-overlapping sub-bands; each sub-band is finally represented by 0 or 1, and these 0/1 sequences form the voiceprint. At search time the voiceprints are likewise used to build a hash table to speed up the search.

These methods achieve fairly good results in small-scale applications, but many problems appear when the audio library is massive, such as severe index collisions and long search times. Because the extracted features carry too little information, collisions are severe when the index is built and searches take a long time. If the amount of voiceprint information is increased by combining feature points into pairs in order to reduce index collisions, voiceprint stability falls and retrieval precision drops. In other words, the collision rate and the stability of a voiceprint are in conflict: a low collision rate inevitably reduces voiceprint stability.

Summary of the invention

In view of this, the purpose of the invention is to provide an audio search method based on long-time-structure voiceprints and multi-layer filtering that effectively resolves the conflict between voiceprint stability and collision rate. For massive audio databases, the invention can effectively improve the accuracy, efficiency, and noise robustness of audio retrieval.

To achieve the above purpose, the invention adopts the following technical solution:

A multi-layer filtering audio search method based on long-time-structure voiceprints, characterized in that:

(1) stable features, for example spectral peak features, are extracted from the audio segment input by the user;

(2) voiceprints carrying long-time structure information are generated from the feature points (a voiceprint, known in English as an audio fingerprint, is a feature sequence that can represent a piece of audio and can be used to build an index);

(3) in the first-layer filter, all voiceprints are used as lookup keys to search the hash index and obtain intermediate candidate segments, the similarity of each intermediate result is computed from the original spectral feature points, and the intermediate results are then sorted by similarity;

(4) a confidence score is computed for the top-ranked candidate of the first-layer filter; if it exceeds a predetermined threshold the result is output, otherwise the method proceeds to step (5);

(5) the set of query voiceprints is expanded and enters the second-layer filter, more intermediate results are looked up in the index table, their similarity is computed, and the results of the first-layer and second-layer filters are then sorted together by similarity;

(6) the information of the audio segment with the highest similarity is returned to the user.

The audio database to be queried is obtained as follows:

(1) stable features, for example spectral peak features, are extracted from the audio in the database;

(2) voiceprints carrying long-time structure information are generated;

(3) a hash index is built from all database voiceprints, in which the key is a voiceprint and the value is the name of the audio file containing the voiceprint together with the position of the voiceprint within that file.
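As a rough illustration of step (3), the following Python sketch builds such a hash index and looks up the candidates for a query. It is not taken from the patent: the names `build_index` and `find_candidates`, the dictionary layout, and the use of an alignment offset (library position minus query position) are assumptions made for the example, and the voiceprints are treated as integers that have already been computed.

```python
from collections import defaultdict

def build_index(library):
    """Map each voiceprint to the (file name, position) pairs where it occurs.

    `library` maps a file name to a list of (voiceprint, position) pairs.
    """
    index = defaultdict(list)
    for name, prints in library.items():
        for value, pos in prints:
            index[value].append((name, pos))
    return index

def find_candidates(index, query_prints):
    """Return the (file name, offset) pairs hit by at least one query voiceprint."""
    candidates = set()
    for value, query_pos in query_prints:
        for name, pos in index.get(value, []):
            # Assumed convention: offset = where the query would start in the file.
            candidates.add((name, pos - query_pos))
    return candidates

# Toy data: integers stand in for real voiceprints.
library = {
    "a.wav": [(100, 0), (200, 1), (300, 2)],
    "b.wav": [(400, 0), (200, 5)],
}
index = build_index(library)
hits = find_candidates(index, [(200, 0), (300, 1)])
```

In this toy run both query voiceprints agree on the same offset in `a.wav`, while `b.wav` is hit only once; the later similarity step is what separates such candidates.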

The invention also discloses an audio retrieval device based on long-time-structure voiceprints and multi-layer filtering, comprising:

Audio database unit 101, i.e., the audio database that constitutes the query library;

Voiceprint construction unit 102, which extracts feature points and builds voiceprints carrying long-time structure information from multiple feature points;

Index construction unit 103, which builds a hash table index from all voiceprints of the audio files in the library; the voiceprint is the key, and the name of the audio file containing the voiceprint and its position in that file are the value;

Input unit 104, whose input is a segment of the original audio recorded in a complex environment;

Filter units 105 and 108, each comprising three steps: looking up intermediate candidate results in the hash index table, computing the similarity of the intermediate results, and sorting the results by similarity. Units 105 and 108 differ in the query voiceprints they receive: the input to unit 105 is the original voiceprints of the query segment, while the input to unit 108 is the fault-tolerant voiceprints produced by query expansion;

Confidence computation unit 106, which computes a confidence score for the output of the first-layer filter to estimate its reliability;

Query expansion unit 107, which expands the query voiceprints using fault-tolerant query expansion;

Retrieval result output unit 109, which outputs the retrieval result.

In the multi-layer filtering audio search method based on long-time-structure voiceprints provided by the invention, the voiceprints with long-time structure information used to build the index carry a large amount of information, so the index collision rate is low. Similarity is computed from the original peak features, which are highly stable, and second-layer filtering is performed through fault-tolerant expansion of the query voiceprints, which improves voiceprint stability and significantly improves query speed and precision. With the method of the invention, for an audio database of 10,000 songs, a top-1 hit rate of 99.7% can be reached when the query segment is 5 seconds long and the signal-to-noise ratio is 0 dB.

Description of drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below.

Fig. 1 is a block diagram of the device of an embodiment of the invention.

Fig. 2 is a diagram of the construction of voiceprints with long-time structure information in this method.

Fig. 3 is a schematic diagram of the index-based filtering algorithm.

Fig. 4 is a flowchart of the multi-layer filtering audio search method based on long-time-structure voiceprints.

Embodiment

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative work fall within the scope of protection of the invention.

As shown in Fig. 1, the device of the embodiment of the invention operates as follows:

For the audio data in the database (unit 101), features are extracted and voiceprints are built from multiple feature points carrying long-time structure information (unit 102); the voiceprints are then used to build the database index (unit 103).

In the retrieval stage, for the input query segment (unit 104), features are extracted and voiceprints with long-time structure information are built (unit 102). The voiceprints pass through the first-layer filter (unit 105), which looks up intermediate candidate results in the hash index table, computes their similarity, and sorts the results by similarity. A confidence score is then computed for the initial result (unit 106) to decide whether to go through fault-tolerant query expansion (unit 107) and enter the second-layer filter (unit 108). The final result is output to the user (unit 109).

The multi-layer filtering audio search method based on long-time-structure voiceprints provided by the embodiment of the invention is described below with reference to Figs. 2 to 4:

In content-based audio retrieval, the audio data is always processed first to extract audio features. These features should be representative, i.e., able to represent the piece of audio uniquely, and should be robust to noise: under environmental noise the features should remain unchanged or change only slightly.

The most common audio data today is the waveform file, generally in WAV format, and audio files in other formats can easily be converted to WAV by software. In this embodiment, therefore, both the audio library and the segments recorded by the user use the WAV waveform format.

Both building the database index and the query process use voiceprints, which are generated in the same way. The voiceprint generation process is described first.

Voiceprint generation comprises two parts: feature extraction and voiceprint construction. The feature extraction algorithm proceeds as follows: the audio data is first divided into overlapping frames, each frame is windowed and transformed from the time domain to the frequency domain, and finally the spectral peak points are extracted from the frames.
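The feature extraction steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the text does not give the window type, frame length, or number of peaks per frame, so the Hann window, the 64-sample frame, the 32-sample hop, the choice of one strongest bin per frame, and the function names `hann` and `peak_bins` are all hypothetical. A naive DFT is used to keep the example free of third-party libraries.

```python
import cmath
import math

def hann(n):
    """Hann window of length n (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def peak_bins(samples, frame_len=64, hop=32):
    """Return the strongest spectral bin of each overlapping, windowed frame."""
    window = hann(frame_len)
    peaks = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + frame_len], window)]
        best_bin, best_mag = 0, -1.0
        # Naive DFT over the positive-frequency bins, skipping DC.
        for k in range(1, frame_len // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * t / frame_len)
                      for t, x in enumerate(frame))
            if abs(acc) > best_mag:
                best_bin, best_mag = k, abs(acc)
        peaks.append(best_bin)
    return peaks

# A pure tone centred on bin 8 of a 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
features = peak_bins(tone)
```

With a hop of half a frame this corresponds to the 1/2 overlap that the description later uses for library audio; a hop of a quarter frame would give the 3/4 overlap used for queries.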

For voiceprint construction, a method called anchor point expansion is used: each voiceprint is built from multiple feature points (Fig. 2), which adds long-time structure information to the voiceprint. The construction formula is:

hash(f_i, f_{i+1}, \ldots, f_{i+r-1}) = f_i + f_{i+1} \cdot n + \cdots + f_{i+r-1} \cdot n^{r-1} \qquad [1]

The above formula builds a voiceprint from r feature points, where f is an audio feature and n is the upper limit of the feature point value range.

An anchor point is the main feature point used to build a voiceprint; in formula 1, f_i is the anchor point. The distance between feature points and the number of voiceprints formed per anchor point can be adjusted to suit different practical situations. Suppose the feature points are uniformly distributed, the maximum frequency value is n, and each voiceprint is formed from r feature points. If every point is an anchor point and each anchor point forms m voiceprints, the voiceprints can take at most m*n^r distinct values. If m = 1, n = 256, and r = 4, each voiceprint carries up to 32 bits of information (256^4 = 2^32), which is large and greatly speeds up search once the index is built. When m is not equal to 1, m hash tables can be built to speed up search and reduce collisions. Because the database considered by the invention is massive, priority is given to the collision behaviour of the voiceprint, and for each anchor point the method adds 3 further points to build a voiceprint. During feature extraction, if the peak of some frequency band lasts a long time, the peak points extracted from several consecutive frames may be identical, so adjacent feature points are strongly correlated. To remove this correlation, 2 feature points are skipped between the points used to build a voiceprint. The formula is:

hash(f_i, f_{i+3}, f_{i+6}, f_{i+9}) = f_i + f_{i+3} \cdot n + f_{i+6} \cdot n^{2} + f_{i+9} \cdot n^{3} \qquad [2]

In the above formula, f is the relative frequency of a feature point and n is the upper limit of the frequency point value range. Voiceprints built this way collide very rarely, but the probability that a voiceprint matches correctly is the product of the probabilities that each of its feature points is correct, so anchor point expansion inevitably makes voiceprints unstable. The invention uses a distinctive search strategy to compensate for this weakness.
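Formulas 1 and 2 amount to packing several base-n digits into one integer. A minimal Python sketch, assuming the feature points are integers in the range 0 to n-1 and that every feature point serves as an anchor (m = 1), might look like this. The function name `build_voiceprints` and the `step` parameter are introduced here for illustration only.

```python
def build_voiceprints(features, n=256, r=4, step=3):
    """Pack r feature points, spaced `step` apart, into one integer per anchor.

    step=1 corresponds to formula 1 and step=3 to formula 2.
    Returns a list of (voiceprint, anchor position) pairs.
    """
    span = (r - 1) * step
    prints = []
    for i in range(len(features) - span):
        value = 0
        for k in range(r):
            value += features[i + k * step] * n ** k
        prints.append((value, i))
    return prints

features = [10, 11, 12, 20, 21, 22, 30, 31, 32, 40, 41, 42]
prints = build_voiceprints(features)
# The first anchor combines the points 10, 20, 30 and 40.
```

With n = 256 and r = 4 every voiceprint fits in an unsigned 32-bit integer, which matches the 32-bit figure quoted in the text.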

Taking both search efficiency and precision into account, the inventors use a search method with optional two-layer filtering. As shown in Fig. 4, the search method consists of two filter layers, each comprising three steps: first look up candidate segments by voiceprint, then compute the exact similarity of each candidate segment, and finally sort by similarity and output the ranked results. Because voiceprints are unstable, the exact similarity computation of the second step is carried out for every candidate segment matched by a voiceprint. The similarity computation uses the original feature points, which are much more stable than voiceprints, so the effect of voiceprint instability is removed. The two filter layers differ in the number of input voiceprints, and therefore in search speed and precision. From the output of the first-layer filter a confidence can be computed; if it is low, the number of voiceprints is increased by voiceprint expansion and the second-layer filter produces a more accurate result. Experimental results show that when the query segment is heavily affected by noise, the second-layer filter greatly improves the retrieval accuracy of the whole system.

The key points of the query filtering algorithm are explained individually below.

The filter algorithm is described first; the two filter layers use the same search algorithm. For the audio files in the library, a hash table is built from all voiceprints: the voiceprint is the key, and the name of the audio file containing the voiceprint and its position in that file are the value. In the retrieval stage (Fig. 3), the voiceprints of the query segment are extracted and looked up in the index to find the matching library voiceprints and their positions. From these, the library fragments corresponding to the query are located, and all of these fragments are candidate segments. Because the voiceprints that make up the index carry a large amount of information and collide rarely, lookup is very fast. Suppose the audio library consists of 10,000 songs averaging 5 minutes each, a single feature point takes at most 256 values (8 bits), and a voiceprint is formed from 4 feature points, so that a voiceprint carries 32 bits. Then each voiceprint corresponds to about 0.01 candidate segments on average, and a 10-second recorded segment, from which about 300 voiceprints are extracted, finds about 3 candidate segments. In practice the feature distribution is more concentrated, so there are tens of times more candidate segments, but the index still eliminates most impossible songs and keeps only a small number of candidates. Once the candidate segments are found, they are ranked: the similarity of each candidate segment is computed from the original features that make up the voiceprints, which yields accurate song information. The formula is:

S_j = 1 - \frac{\sum_{i=0}^{N} \min\left((q_i - d_i)^2,\ C\right)}{N \cdot C} \qquad [3]

Here S_j is the similarity of the j-th fragment, q_i is a feature point of the query segment, d_i is the corresponding feature point of the library fragment, N is the total number of features, and C is a fixed constant that limits the influence of noise; it can be set to an integer smaller than 3. Experiments show that introducing this constant greatly improves the retrieval performance of the system. Because this similarity is computed from the original feature points, which are much more stable than voiceprints, the similarity obtained is more accurate and the ranked output more reliable.
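The clipped similarity of formula 3 can be sketched in a few lines of Python. The sketch assumes that the query and the library fragment have already been aligned and have the same number N of feature points, sums over exactly N terms, and picks C = 2 as one value satisfying "an integer smaller than 3"; the function name `similarity` is illustrative.

```python
def similarity(query, fragment, c=2):
    """Clipped squared-difference similarity in the spirit of formula 3."""
    if not query or len(query) != len(fragment):
        raise ValueError("feature sequences must be non-empty and of equal length")
    # Each point contributes at most c, so one badly corrupted point cannot dominate.
    total = sum(min((q - d) ** 2, c) for q, d in zip(query, fragment))
    return 1 - total / (len(query) * c)

identical = similarity([5, 6, 7, 8], [5, 6, 7, 8])
noisy = similarity([5, 6, 7, 8], [5, 7, 7, 100])
```

In the second call the point that is off by 1 costs 1 and the point that is off by 92 costs only C = 2, which gives 1 - 3/8 = 0.625; without the clipping constant the single outlier would drive the score far below zero.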

The search algorithm rests on one assumption: at least one voiceprint matches exactly. If the assumption holds, the only fragments whose similarity needs to be computed are the library fragments matched by the voiceprints of the query segment. To check the validity of this assumption, the probability that at least one voiceprint is correct can be computed with the following formula:

P = 1 - (1 - q^{r})^{n} \qquad [4]

Here q is the probability that a single feature point is correct, r is the number of feature points forming a voiceprint, and n is the total number of voiceprints extracted. If q = 0.4, r = 4, and the query segment is 10 seconds long, so that n ≈ 300, then P is approximately 0.999. If q is very small, P is also very small, but in that case the exact similarity computation would also have difficulty finding the correct result, so the algorithm is effective. In practice, r can be chosen according to the frame length, the size of the index, the stability of the features, and the speed requirement. When the data is massive, speed takes priority and r is set to 4.
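Formula 4 is easy to evaluate numerically. The short Python sketch below reproduces the example values from the text and adds one lower value of q, chosen here purely for illustration, to show how quickly P falls when the individual feature points become unreliable. The function name is hypothetical.

```python
def p_at_least_one_match(q, r, n):
    """Probability that at least one of n voiceprints of r points is fully correct."""
    return 1 - (1 - q ** r) ** n

p_text_example = p_at_least_one_match(q=0.4, r=4, n=300)   # values from the text
p_low_quality = p_at_least_one_match(q=0.2, r=4, n=300)    # illustrative value
```

For q = 0.4 the result is slightly above 0.999, in line with the figure quoted above, whereas for q = 0.2 it drops to roughly 0.38.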

Before deciding whether to enter the second-layer filter, a confidence is computed for the result of the first-layer filter to estimate how reliable it is. There are many ways to compute confidence; in this method the confidence of the output result is computed as follows:

C = \frac{S_1}{S_2} \qquad [5]

C is the confidence of the output result, S_1 is the similarity of the first candidate, and S_2 is the similarity of the second candidate. If the confidence of the first-layer filter's output is below a threshold, the query passes through the second-layer filter to obtain a more accurate result.
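The decision rule built on formula 5 can be sketched as below. The patent does not state the threshold value or what to do with fewer than two candidates, so the threshold of 1.2 and the edge-case handling (no candidates counts as zero confidence, a single candidate as unbounded confidence) are assumptions of this example, as are the function names.

```python
def confidence(similarities):
    """Ratio of the best to the second-best similarity (formula 5)."""
    ranked = sorted(similarities, reverse=True)
    if not ranked:
        return 0.0                      # assumed: nothing found, no confidence
    if len(ranked) == 1 or ranked[1] <= 0:
        return float("inf")             # assumed: no competing candidate
    return ranked[0] / ranked[1]

def needs_second_layer(similarities, threshold=1.2):
    """True when the first-layer result is not trusted (hypothetical threshold)."""
    return confidence(similarities) < threshold

clear_winner = needs_second_layer([0.9, 0.5, 0.3])
close_call = needs_second_layer([0.9, 0.85])
```

A result whose best score is well ahead of the runner-up is accepted directly, while two nearly equal scores send the query on to the second-layer filter.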

If the query segment is heavily affected by noise, it may be that none of the voiceprints formed from its feature points matches exactly. For this case the invention proposes an enhanced search algorithm: voiceprints are formed from r-1 points, and when the database index is built a second index is built from these (r-1)-point voiceprints for matching in the second-layer filter. The second-layer filter algorithm is the same as the first; only the voiceprint construction and the index differ. If the confidence of the first-layer filter's output is below a threshold, the query passes through the second-layer filter to obtain a more accurate result. Statistics show that the frequency value of an erroneous feature point in the query segment generally fluctuates around the original frequency, and the probability that it differs by 1 is far higher than for any other frequency value. The inventors therefore also propose a fault-tolerant query expansion algorithm: the second-layer filter shares the same index as the first-layer filter and only the voiceprints of the query segment are expanded, the number of voiceprints being increased by expanding the feature points of the query segment. This reduces memory requirements, since only one index needs to be built, while still meeting the requirements for speed and accuracy. If every point is expanded to three values, i.e., the original value and the values 1 above and 1 below it, and a voiceprint is formed from 4 points, each original voiceprint yields 3^4 = 81 combinations, i.e., 80 new voiceprints besides the original. The original voiceprints are not searched again; the similarity results of the first-layer filter are sorted together with the results of the second-layer filter, and the final result is output. In practice, only feature points with low confidence need to be expanded. The feature confidence is computed as follows:

F = \frac{\sum_{i=0}^{N-1} E_i}{N} \cdot \lambda \qquad [6]

E_i is the energy of a feature point, N is the total number of features, and λ is a coefficient; adjusting this coefficient controls how many features are expanded. In practice, because of the confidence threshold on the first-layer filter's output, a query passes through both filter layers only when the audio segment is badly degraded, and in that case the two-layer filter greatly improves the performance of the whole system. With this query expansion algorithm, good performance is reached at a small cost in time.
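The fault-tolerant expansion can be sketched in Python with `itertools.product`. Two points are assumptions of this sketch rather than statements of the patent: values are clipped to the range 0 to n-1, and formula 6 is read as a threshold, so that a feature point counts as low-confidence when its energy is below λ times the mean energy. The function names are hypothetical.

```python
from itertools import product

def low_confidence_mask(energies, lam=1.0):
    """Mark points whose energy is below lam * mean energy (assumed reading of formula 6)."""
    threshold = lam * sum(energies) / len(energies)
    return [e < threshold for e in energies]

def expand_voiceprint(points, n=256, delta=1, mask=None):
    """Return the expanded voiceprints of one anchor, excluding the original."""
    mask = mask or [True] * len(points)
    choices = []
    for p, expand in zip(points, mask):
        if expand:
            choices.append([p + d for d in range(-delta, delta + 1) if 0 <= p + d < n])
        else:
            choices.append([p])
    original = tuple(points)
    return [sum(f * n ** k for k, f in enumerate(variant))
            for variant in product(*choices) if variant != original]

full = expand_voiceprint([10, 20, 30, 40])
mask = low_confidence_mask([5.0, 1.0, 5.0, 1.0])
selective = expand_voiceprint([10, 20, 30, 40], mask=mask)
```

Expanding all four points gives 3^4 - 1 = 80 new voiceprints, the figure quoted in the text, while expanding only the two low-energy points gives 3^2 - 1 = 8, which is how the coefficient λ trades accuracy against lookup cost.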

Statistics show that, for an original clip framed with no overlap between frames, about 1/4 of the peak points deviate when the frame positions are offset by exactly half a frame. This feature extraction error, caused by inconsistent choice of frame boundaries, is called the boundary effect. Because the boundary effect causes feature extraction errors, the overlap between frames should be as large as possible, i.e., the frame shift as small as possible, to reduce its influence. In the method of this patent, to limit the total size of the index while reducing the boundary effect as far as possible, the overlap ratio is 1/2 for audio in the library and 3/4 for query segments. Because the overlap ratios differ, similarity is computed with the following formula:

S_j = 1 - \frac{\sum_{i=0}^{N} \min\left((q_{2i-1} - d_i)^2,\ (q_{2i} - d_i)^2,\ C\right)}{N \cdot C} \qquad [7]

This formula has the same meaning as formula 3, except that each point in the audio library is compared with two points in the query.
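A Python sketch of formula 7 follows. It assumes the query has exactly two feature points for every library point and uses 0-based pairs (2i, 2i+1) in place of the formula's (2i-1, 2i); how the two sequences are aligned at the start is not specified in the text, so this pairing is an assumption, as are the function name and C = 2.

```python
def similarity_two_rates(query, fragment, c=2):
    """Formula 7 style similarity: each library point is compared with two query points."""
    n = len(fragment)
    if n == 0 or len(query) < 2 * n:
        raise ValueError("query must hold two feature points per library point")
    total = 0
    for i, d in enumerate(fragment):
        q_first, q_second = query[2 * i], query[2 * i + 1]
        # Keep the better of the two query points, clipped at c.
        total += min((q_first - d) ** 2, (q_second - d) ** 2, c)
    return 1 - total / (n * c)

perfect = similarity_two_rates([5, 6, 8, 9], [5, 9])
partial = similarity_two_rates([7, 4, 9, 20], [5, 9])
```

Taking the minimum over the two query points means that a library point counts as matched if either of the two query frames covering it produced the same peak, which is what makes the denser query framing useful against the boundary effect.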

Correspondingly, the voiceprint of the query segment is computed as follows:

hash(f_i, f_{i+6}, f_{i+12}, f_{i+18}) = f_i + f_{i+6} \cdot n + f_{i+12} \cdot n^{2} + f_{i+18} \cdot n^{3} \qquad [8]

The retrieval process of the method is illustrated below using the flowchart in Fig. 4. As shown in the figure, the method mainly comprises the offline process of building the database index, on the left, and the online query process, on the right.

The overall flow has two parts: i) building the database index and ii) retrieving the query segment. They are described as follows:

1. Building the database index offline:

For every song in the database (module 201), the spectral peak feature points are first extracted (module 202), voiceprints are built from these peak points (module 203) (formula 1), and the information-rich voiceprints are used to build the database index (module 204).

2. Retrieving the query segment online:

Step 1: for the query segment (module 206), extract the spectral peak feature points (module 207) and generate voiceprints from the peak points (module 208) (formula 8).

Step 2: apply the first-layer filter: look up candidates (module 209) in the database index (module 205), compute the candidate similarities (module 210) (formula 7), and sort by similarity (module 211) to obtain the initial result.

Step 3: compute a confidence score for the initial result (module 212) (formula 5). If it exceeds the predetermined threshold (module 213), output the result; otherwise expand the query voiceprints (module 214) and go to step 4.

Step 4: optionally apply the second-layer filter to search again with the expanded voiceprints: look up candidates (module 215) and compute similarities (module 216) (formula 7), sort the candidate results of both filter layers together by similarity (module 217), and output the more reliable result (module 218).
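The online steps above can be tied together in one compact Python sketch. It is a simplified illustration, not the patented implementation: library and query use the same frame rate and the same voiceprint spacing (so formulas 2 and 3 stand in for formulas 8 and 7), every feature point is expanded by plus or minus 1 in the second layer, and the threshold of 1.5, the constant C = 2, and all function names are assumptions. The two "songs" are synthetic integer sequences.

```python
from collections import defaultdict
from itertools import product

N, R, STEP, C = 256, 4, 3, 2  # n, r and spacing from the text; C is an assumed value

def pack(points):
    return sum(f * N ** k for k, f in enumerate(points))

def anchors(features):
    """Yield (position, feature points) for every anchor point."""
    for i in range(len(features) - (R - 1) * STEP):
        yield i, [features[i + k * STEP] for k in range(R)]

def build_index(library):
    index = defaultdict(list)
    for name, feats in library.items():
        for pos, pts in anchors(feats):
            index[pack(pts)].append((name, pos))
    return index

def score(query, feats, offset):
    if offset < 0 or offset + len(query) > len(feats):
        return 0.0
    seg = feats[offset:offset + len(query)]
    total = sum(min((q - d) ** 2, C) for q, d in zip(query, seg))
    return 1 - total / (len(query) * C)

def run_layer(query, library, index, expand):
    """One filter layer: look up candidates, then score each from raw features."""
    results = {}
    for pos, pts in anchors(query):
        if expand:
            choices = [[p + d for d in (-1, 0, 1) if 0 <= p + d < N] for p in pts]
            keys = [pack(v) for v in product(*choices) if list(v) != pts]
        else:
            keys = [pack(pts)]
        for key in keys:
            for name, lib_pos in index.get(key, []):
                cand = (name, lib_pos - pos)
                if cand not in results:
                    results[cand] = score(query, library[name], cand[1])
    return results

def is_confident(scores, threshold):
    ranked = sorted(scores, reverse=True)
    if not ranked or ranked[0] <= 0:
        return False
    if len(ranked) == 1 or ranked[1] <= 0:
        return True
    return ranked[0] / ranked[1] >= threshold

def search(query, library, index, threshold=1.5):
    results = run_layer(query, library, index, expand=False)
    if not is_confident(results.values(), threshold):
        for cand, s in run_layer(query, library, index, expand=True).items():
            results.setdefault(cand, s)
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic feature sequences standing in for two songs.
song_a = [(7 * i * i + 3 * i) % 256 for i in range(40)]
song_b = [(11 * i * i + 5 * i + 1) % 256 for i in range(40)]
library = {"a.wav": song_a, "b.wav": song_b}
index = build_index(library)
clean = song_a[10:25]
noisy = list(clean)
for i in (3, 4, 5):       # corrupt one point in every voiceprint of the query
    noisy[i] += 1
```

Calling `search(clean, library, index)` is resolved by the first layer alone. For `noisy`, every exact voiceprint is broken, so the first layer finds nothing and only the expanded second-layer lookup recovers `a.wav` at offset 10, which mirrors the case the second filter is designed for.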

To verify the effectiveness of the method, the inventors took music retrieval as an example and built an audio library of 10,000 songs. They tested 400 noise-added segments, 5 seconds and 10 seconds long, cut at random from the library. The test results are shown in the tables below:

Signal-to-noise ratio (dB)    -12     -9       -6       -3        0         3
Accuracy                      46%     82.00%   96.30%   98.20%    99.70%    99.70%
Average search time (s)       0.12    0.13     0.15     0.20      0.25      0.26

Table 1: Test results for 5-second segments

Signal-to-noise ratio (dB)    -12     -9       -6       -3        0         3
Accuracy                      54%     90.00%   97.20%   100.00%   100.00%   100.00%
Average search time (s)       0.19    0.32     0.38     0.45      0.58      0.63

Table 2: Test results for 10-second segments

As the tables show, the method achieves satisfactory query accuracy with average search times of a few hundred milliseconds.

Although the invention has been described above by way of embodiments, it admits many modifications and variations that do not depart from its spirit, and the appended claims are intended to cover these modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A multi-layer filtering audio search method based on long-time-structure voiceprints, characterized in that:

(2) voiceprints with long-time structure information are built;

(3) in the first-layer filter, all voiceprints are used as lookup keys to search the database index and obtain intermediate candidate results, the similarity of the intermediate results is computed from the original features, and the intermediate results are then sorted by similarity;

(5) the query voiceprints are expanded and enter the second-layer filter, more intermediate results are looked up in the index table, their similarity is computed, and the results of the first-layer and second-layer filters are then sorted together by similarity;

(6) according to the ranking, the information of the audio fragment with the highest similarity is returned to the user.

2. The multi-layer filtering audio search method based on long-time-structure voiceprints according to claim 1, characterized in that:

The audio database to be queried is obtained as follows:

(2) voiceprints with long-time structure information are generated;

3. The multi-layer filtering audio search method based on long-time-structure voiceprints according to claim 1 and claim 2, characterized in that:

The voiceprints with long-time structure information are built from multiple feature points; the number of feature points forming a voiceprint and the interval between feature points can be adjusted to suit the actual situation, and the construction formula is:

hash(f_i, f_{i+1}, \ldots, f_{i+r-1}) = f_i + f_{i+1} \cdot n + \cdots + f_{i+r-1} \cdot n^{r-1}

4. Confidence computation for the query result: a confidence is computed for the output of the first-layer filter to estimate its reliability. There are many ways to compute confidence; in this method the confidence of the output result is computed as follows:

C = \frac{S_1}{S_2}

C is the confidence of the output result, S_1 is the similarity of the first candidate, and S_2 is the similarity of the second candidate.

5. The fault-tolerant query voiceprint expansion method: each feature point of the recorded segment is shifted up and down by several positions, so that each voiceprint of the input segment expands into multiple voiceprints that serve as the query input of the second search. For example, with a shift of 1 up and down, each feature point expands to 3 times the original, and the number of voiceprints becomes 3^r times the original, where r is the number of feature points used to build one voiceprint.

6. The filter algorithm comprises three steps: 1) look up intermediate candidate results in the database index table; 2) compute the similarity of the intermediate results; 3) sort the intermediate results by similarity.

7. The optional two-layer filtering algorithm: the confidence of the first-layer filter's output is computed to decide whether to carry out the more complex second filtering, i.e., the second-layer filter is entered through query expansion only when the first-layer result is not trusted.

8. The frame shift used in feature extraction for the recorded segment is half the frame shift used for the audio data in the database.

9. As an alternative algorithm for the second-layer filter, the second-layer filter can use a more accurate index structure, for example one in which each voiceprint is built from fewer feature points, to improve retrieval performance.

10. A multi-layer filtering audio retrieval device based on long-time-structure voiceprints, comprising:

(1) an offline database index construction module:

an audio database unit, i.e., the audio database that constitutes the query library;

a voiceprint construction unit, which extracts the feature points of the audio data and builds voiceprints carrying long-time structure information from multiple feature points;

an index construction unit, which builds a hash table index from all voiceprints of the audio files in the library; the voiceprint is the key, and the name of the audio file containing the voiceprint and its position in that file are the value;

(2) an online query and search module:

an input unit, whose input is a segment of the original audio recorded in a complex environment;

a voiceprint construction unit, which extracts feature points and builds voiceprints carrying long-time structure information from multiple feature points;

a filter unit comprising three steps: looking up intermediate candidate results in the hash index table, computing the similarity of the intermediate results, and sorting the results by similarity;

a confidence computation unit, which computes a confidence score for the output of the first-layer filter to estimate its reliability;

a query expansion unit, which expands the query voiceprints using fault-tolerant query expansion;

a retrieval result output unit, which outputs the retrieval result.