CN103280215A - Method and device for establishing an audio feature library

Info

Publication number
CN103280215A
CN103280215A CN2013102030454A CN201310203045A CN103280215A CN 103280215 A CN103280215 A CN 103280215A CN 2013102030454 A CN2013102030454 A CN 2013102030454A CN 201310203045 A CN201310203045 A CN 201310203045A CN 103280215 A CN103280215 A CN 103280215A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013102030454A
Other languages
Chinese (zh)
Other versions
CN103280215B (en)
Inventor
宋辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201310203045.4A
Publication of CN103280215A
Application granted
Publication of CN103280215B
Legal status: Active

Abstract

The invention discloses a method and device for establishing an audio feature library. The method comprises: estimating the noise characteristics of a recording and playback system; applying noise-addition processing to the audio in an audio library according to the estimated noise characteristics of the recording and playback system; extracting features from the noise-added audio; and building the audio feature library from the extracted features. Compared with the training method of a traditional CMI (Content-based Music Identification) system, the technical scheme provided by the embodiments of the invention adds noise-addition processing to the training phase. The audio feature library obtained through noise-addition processing alleviates the mismatch between training signals and test signals and can effectively improve the accuracy of an audio identification system.

Description

Method and device for establishing an audio feature library
Technical Field
The present invention relates to the field of audio signal processing, and in particular to a method and device for establishing an audio feature library.
Background
With the development of the Internet, the objects users search for online are no longer limited to text; pictures, audio, and video have all become objects supported by search engines. CMI (Content-based Music Identification), for example, is a popular application on the Internet today. In form it resembles traditional text search: when a user hears a piece of music that interests them but does not know its title, they can record a fragment a few seconds long and submit it as a search request to an audio search system, which uses back-end search technology to find information about the music and return it to the user.
To provide this function, a large-scale song library must be used for training in advance to build an audio feature library. Music identification then amounts to searching this feature library for the unknown music fragment supplied by the user and finding the matching fragment.
Ideally, if the user records a music fragment free of any interference, it can be matched correctly as long as its features exist in the feature library. In practice, however, the signal recorded by the user contains noticeable interference, including both system noise introduced by the playback device, the recording device, and so on, and noise from the environment in which the recording is made. The signals used in training, by contrast, are generally clean music files (for example in audio formats such as MP3 or APE). This makes matching harder in real application environments and degrades the performance of the music identification system.
Summary of the Invention
To solve the above technical problem, embodiments of the invention provide a method and device for establishing an audio feature library, so as to improve the recognition of audio fragments in real environments. The technical scheme is as follows:
An embodiment of the invention provides a method for establishing an audio feature library, characterized in that the method comprises:
estimating the noise characteristics of a recording and playback system;
applying noise-addition processing to the audio in an audio library according to the estimated noise characteristics of the recording and playback system;
extracting features from the noise-added audio, and building the audio feature library from the extracted features.
According to an embodiment of the invention, estimating the noise characteristics of the recording and playback system comprises:
randomly selecting sample audio from the audio library, and playing and recording the sample audio in a preset environment;
calculating the transfer function of the recording and playback system from the original sample audio and the recorded audio.
According to an embodiment of the invention, the method further comprises:
before calculating the transfer function of the recording and playback system, calculating the relative time delay between the original sample audio and the recorded audio, and using the result to time-align the original sample audio and the recorded audio.
According to an embodiment of the invention, the method further comprises:
before calculating the transfer function of the recording and playback system, converting original sample audio in a compressed format to an uncompressed format.
According to an embodiment of the invention, applying noise-addition processing to the audio in the audio library according to the estimated noise characteristics of the recording and playback system comprises:
calculating, for each audio item in the audio library, its response after passing through the recording and playback system.
According to an embodiment of the invention, after applying noise-addition processing to the audio in the audio library according to the estimated noise characteristics of the recording and playback system, the method further comprises:
applying noise-addition processing to the audio in the audio library according to pre-estimated playback environment noise characteristics.
According to an embodiment of the invention, the pre-estimated playback environment noise characteristics comprise noise characteristics corresponding to different actual playback environments;
extracting features from the noise-added audio and building the audio feature library from the extracted features comprises: extracting features from the audio that has been noise-added with the different environment noise characteristics, obtaining multiple groups of audio features for the same audio, and building multiple audio feature libraries corresponding to the multiple kinds of environment noise.
According to an embodiment of the invention, the method further comprises:
after receiving an audio search request from a user, searching each of the audio feature libraries corresponding to the different environment noises.
According to an embodiment of the invention, the method further comprises:
after completing at least one search, determining from the search results the audio feature library that corresponds to the user's current environment, and performing subsequent searches directly in the determined feature library.
According to an embodiment of the invention, the method further comprises:
estimating the noise characteristics of the user's current environment, and determining from the estimate the audio feature library that corresponds to the user's current environment;
after receiving an audio search request from the user, searching directly in the determined feature library.
An embodiment of the invention also provides a device for establishing an audio feature library, the device comprising:
a system noise estimation unit, configured to estimate the noise characteristics of a recording and playback system;
a noise-addition processing unit, configured to apply noise-addition processing to the audio in an audio library according to the estimated noise characteristics of the recording and playback system;
a feature library establishing unit, configured to extract features from the noise-added audio and build the audio feature library from the extracted features.
According to an embodiment of the invention, the system noise estimation unit is specifically configured to:
randomly select sample audio from the audio library, and play and record the sample audio in a preset environment;
calculate the transfer function of the recording and playback system from the original sample audio and the recorded audio.
According to an embodiment of the invention, the system noise estimation unit is further configured to:
before calculating the transfer function of the recording and playback system, calculate the relative time delay between the original sample audio and the recorded audio, and use the result to time-align the original sample audio and the recorded audio.
According to an embodiment of the invention, the system noise estimation unit is further configured to:
before calculating the transfer function of the recording and playback system, convert original sample audio in a compressed format to an uncompressed format.
According to an embodiment of the invention, the noise-addition processing unit comprises a multiplicative noise processing subunit, configured to calculate, for each audio item in the audio library, its response after passing through the recording and playback system.
According to an embodiment of the invention, the noise-addition processing unit further comprises an additive noise processing subunit, configured to, after the audio in the audio library has been noise-added according to the estimated noise characteristics of the recording and playback system, apply noise-addition processing to the audio in the audio library according to pre-estimated playback environment noise characteristics.
According to an embodiment of the invention,
the pre-estimated playback environment noise characteristics comprise noise characteristics corresponding to different actual playback environments;
the feature library establishing unit is specifically configured to extract features from the audio that has been noise-added with the different environment noise characteristics, obtain multiple groups of audio features for the same audio, and build multiple audio feature libraries corresponding to the multiple kinds of environment noise.
According to an embodiment of the invention, the device further comprises:
a search request processing unit, configured to search each of the audio feature libraries corresponding to the different environment noises after receiving an audio search request from a user.
According to an embodiment of the invention, the search request processing unit is further configured to:
after completing at least one search, determine from the search results the audio feature library that corresponds to the user's current environment, and perform subsequent searches directly in the determined feature library.
According to an embodiment of the invention, the device further comprises:
a search request processing unit, configured to estimate the noise characteristics of the user's current environment and determine from the estimate the audio feature library that corresponds to the user's current environment; and, after receiving an audio search request from the user, to search directly in the determined feature library.
Compared with the training method of a traditional CMI system, the scheme for establishing an audio feature library provided by the embodiments of the invention introduces noise-addition processing in the training stage. On the one hand, this processing can simulate the transfer function of the "loudspeaker-microphone" transmission system and apply that transfer function to all audio in the training song library, so that the acoustic features of the training signals are closer to those of actual recorded signals, offsetting the effect of channel distortion. On the other hand, the noise-addition processing can also simulate the environment noise of a real playback environment, improving the adaptability of audio identification to different environments. Using an audio feature library obtained after noise-addition processing alleviates the mismatch between training signals and test signals and effectively improves the accuracy of the audio identification system.
Description of the Drawings
To explain the technical schemes of the embodiments of the invention and of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below are only some of the embodiments recorded in the invention; a person of ordinary skill in the art can derive other drawings from them.
Fig. 1 is a flowchart of a method for establishing an audio feature library according to an embodiment of the invention;
Fig. 2 is a schematic structural diagram of a device for establishing an audio feature library according to an embodiment of the invention;
Fig. 3 is a second schematic structural diagram of a device for establishing an audio feature library according to an embodiment of the invention.
Detailed Description
To enable audio content search, a large-scale song library must be used for training to build an audio feature library. Existing CMI feature training, for example, mainly comprises four steps:
1) Audio feature extraction: features such as melody and rhythm are extracted frame by frame from each audio file in the song library;
2) Audio segmentation: abrupt change points of the audio signal are located and used to cut the training data into a number of audio segments, and one feature vector is extracted for each segment;
3) Feature clustering: a specific clustering algorithm clusters the features of the segments, reducing the number of features while extracting the K most representative classes of features of a piece of music, where K is the number of clusters;
4) Index building: an index table, in the form of a hash table, is built from the clustered features of all music signals in the song library.
Once the index table has been built, the CMI training process is complete. During identification, the audio fragment to be identified undergoes the same feature extraction, audio segmentation, and clustering to obtain its feature vectors, which are matched against the feature library index table to find the target audio.
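As a rough illustration of steps 3) and 4) and of the lookup performed during identification, the sketch below quantizes feature vectors into hashable keys and builds an inverted index. The quantization step, the voting scheme, and the names `quantize`, `build_index`, and `identify` are all assumptions made for illustration; the source does not specify the clustering or hashing method.

```python
from collections import Counter, defaultdict


def quantize(vector, step=0.5):
    """Map a feature vector to a hashable key by coarse rounding (assumed scheme)."""
    return tuple(round(v / step) for v in vector)


def build_index(library):
    """Build {key: set of track ids} from {track_id: [feature_vector, ...]}."""
    index = defaultdict(set)
    for track_id, vectors in library.items():
        for vec in vectors:
            index[quantize(vec)].add(track_id)
    return index


def identify(index, query_vectors):
    """Return the track that collects the most key hits, or None if nothing matches."""
    votes = Counter()
    for vec in query_vectors:
        for track_id in index.get(quantize(vec), ()):
            votes[track_id] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy library: each track is represented by a few (hypothetical) segment features.
library = {
    "song_a": [(0.1, 1.2), (2.0, 0.4), (3.1, 3.0)],
    "song_b": [(5.0, 5.1), (0.9, 4.2), (2.0, 0.4)],
}
index = build_index(library)

# A slightly perturbed fragment of song_a falls into the same quantization cells.
match = identify(index, [(0.12, 1.18), (3.05, 3.02)])
```

The sketch also hints at the problem the invention addresses: a query that is distorted enough to leave its quantization cell no longer hits the index, which is why the training features should resemble real recordings.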
Ideally, if the user records a music fragment free of any interference, it can be matched correctly as long as its features exist in the feature library. In practice, however, the signal recorded by the user contains noticeable interference, including both system noise introduced by the playback device, the recording device, and so on, and noise from the environment in which the recording is made. The signals used in training, by contrast, are generally clean music files (for example in audio formats such as MP3 or APE). This makes matching harder in real application environments and degrades the performance of the music identification system.
To address these problems, the invention provides a method for establishing an audio feature library. Compared with the training method of a traditional CMI system, the scheme introduces noise-addition processing in the training stage (that is, the stage in which the feature index is built). On the one hand, this step can simulate the transfer function of the "loudspeaker-microphone" transmission system and apply it to all audio in the training song library, so that the acoustic features of the training signals are closer to those of actual recorded signals, offsetting the effect of channel distortion. On the other hand, this step can further simulate the environment noise of a real playback environment, improving the adaptability of audio identification to different environments.
The training data that has undergone noise-addition preprocessing then goes through the conventional feature extraction, audio segmentation, and feature clustering steps to obtain the clustered features of each audio item, and a hash index is built from all the features, completing the training process. Using an audio feature library obtained after noise-addition processing alleviates the mismatch between training signals and test signals and effectively improves the accuracy of the audio identification system.
To help those skilled in the art better understand the technical scheme of the invention, the technical scheme in the embodiments of the invention is described in detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the invention fall within the scope of protection of the invention.
Fig. 1 is a flowchart of a method for establishing an audio feature library according to the invention. The method may comprise the following steps:
S101: estimate the noise characteristics of the recording and playback system.
For a piece of audio, the noise that may be introduced during playback and recording mainly comes from two sources. The first is noise produced by the recording and playback system itself, including distortion caused by the transmission characteristics of output devices such as loudspeakers and input devices such as microphones; this noise is multiplicative. The second is background noise in the recording environment; this noise is additive. Of the two, the system's own noise generally has the dominant effect on the audio signal, and it is universal (system noise is present whatever the environment). This step therefore first estimates the noise characteristics of the recording and playback system.
In one embodiment of the invention, the noise characteristics of the recording and playback system are estimated as follows:
First, one or more sample audio items are randomly selected from the audio library, played in a preset environment, and recorded. Because the aim here is to estimate the noise introduced by the recording and playback system itself, a quiet environment should be chosen for recording so as to minimize the influence of environment noise. In addition, to reduce the estimation bias caused by too small a sample, multiple sample audio items can be chosen: for example, N songs can be randomly selected from the song library and played through a loudspeaker while a microphone records them, yielding one continuous recording.
Once the recorded audio has been obtained, the transfer function of the recording and playback system can be estimated from the original sample audio and the recorded audio.
Suppose the time-domain signal of the original sample audio is x(t), the time-domain signal of the recorded audio is y(t), and the time-domain transfer function (impulse response) of the recording and playback system is h(t). Then y(t) equals the convolution of x(t) and h(t), that is:
y(t)=x(t)*h(t)
Applying the Fourier transform to x(t) and y(t) gives the frequency-domain expressions X(ω) and Y(ω) of the original sample audio and the recorded audio, which satisfy the following relation in the frequency domain:
Y(ω)=H(ω)X(ω)
Here H(ω) is the frequency-domain transfer function of the recording and playback system; it and h(t) form a Fourier transform pair. Therefore, using
$$H(\omega) = \frac{Y(\omega)}{X(\omega)}$$
the transfer function of the recording and playback system can be estimated, that is, the noise characteristics introduced by the "loudspeaker-microphone" system.
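The frequency-domain division can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the signals are assumed to be already time-aligned and noise-free, the function name `estimate_transfer_function` is hypothetical, and the small constant `eps` is an added safeguard against division by near-zero spectral bins that the source does not mention.

```python
import numpy as np


def estimate_transfer_function(x, y, eps=1e-12):
    """Estimate H(w) = Y(w) / X(w) from an original signal x and its recording y."""
    n = max(len(x), len(y))          # zero-pad both signals to a common length
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    # Y / X written as Y X* / (|X|^2 + eps) so that weak bins do not blow up.
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)


# Synthetic check: pass white noise through a known 3-tap system.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
h_true = np.array([1.0, 0.5, -0.25])
y = np.convolve(x, h_true)           # simulated "recording" of x

H_est = estimate_transfer_function(x, y)
# Back to the time domain: the first taps should reproduce h_true.
h_est = np.fft.irfft(H_est, n=len(y))[: len(h_true)]
```

With real recordings the division is far less well behaved than in this synthetic case, which is one motivation for the delay compensation and the averaged spectral estimate described later in the text.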
In an actual recording, there may be a time delay between the recorded audio and the original sample audio, and this lack of synchronization affects the estimate of the system transfer function. In one embodiment of the invention, therefore, before the transfer function of the recording and playback system is calculated, the relative time delay between the original sample audio and the recorded audio is first calculated, and the result is used to time-align the two.
Suppose the time-domain signal of the original sample audio is s(t) and the time-domain signal of the recorded audio is $\hat{s}(t)$. Because the two may not be synchronized in time, the time difference $t_0$ between the two signals must be estimated and used to compensate s(t), giving the delay-compensated signal $s(t - t_0)$. At this point $s(t - t_0)$ and $\hat{s}(t)$ are fully synchronized in time, and the subsequent transfer function estimation can be carried out.
The time delay between the two signals can be estimated by the correlation method. Because s(t) and $\hat{s}(t)$ come from the same source, they are strongly correlated, so their correlation function can be calculated:
$$R(t_0) = \sum_t s(t + t_0)\,\hat{s}(t)$$
The correlation between the two signals is strongest when they are exactly aligned in time, so the position of the peak of $R(t_0)$ is the time delay to be estimated. In other words, $t_0$ can be estimated by searching for the peak of the correlation function $R(t_0)$.
It will be understood that, besides the time-domain correlation method, a frequency-domain correlation method can also be used to calculate the delay. Because the correlation function and the cross-spectral density form a Fourier transform pair, the cross-spectral density of the signals can be calculated first and then transformed to the time-domain correlation function, whose peak gives the time delay $t_0$. The specific formulas are not described here.
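The time-domain correlation method can be sketched as follows. This is an illustrative sketch only: the function names `estimate_delay` and `align` are hypothetical, and the sign convention (a positive delay means the recording lags the original) is an assumption chosen here, since sign conventions for the lag vary between formulations.

```python
import numpy as np


def estimate_delay(original, recorded):
    """Return d (in samples) such that recorded[t] is approximately original[t - d]."""
    corr = np.correlate(recorded, original, mode="full")
    # Index i of the "full" output corresponds to lag i - (len(original) - 1).
    return int(np.argmax(corr)) - (len(original) - 1)


def align(original, recorded):
    """Trim both signals so that they start at the same instant."""
    d = estimate_delay(original, recorded)
    if d >= 0:
        recorded = recorded[d:]      # the recording started late: drop its lead-in
    else:
        original = original[-d:]     # the recording started early
    n = min(len(original), len(recorded))
    return original[:n], recorded[:n], d


# Synthetic check: delay a random signal by 37 samples.
rng = np.random.default_rng(0)
original = rng.standard_normal(500)
recorded = np.concatenate([np.zeros(37), original])[:500]

orig_aligned, rec_aligned, d = align(original, recorded)
```

For long recordings the direct correlation becomes expensive, which is where the frequency-domain variant mentioned in the text is normally preferred.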
After the time delay $t_0$ has been obtained, the delayed original sample signal $s(t - t_0)$ and the recorded audio signal $\hat{s}(t)$ are denoted x(t) and y(t) respectively; after the Fourier transform, the frequency-domain transfer function H(ω) of the recording and playback system can be obtained.
In addition, depending on actual requirements, some preprocessing of the original sample audio may be needed before the transfer function of the recording and playback system is calculated, for example converting compressed audio such as MP3 or APE to an uncompressed format, resampling the audio, and so on. Because this is not the focus of the invention, it is not described in detail in this embodiment.
In another embodiment of the invention, instead of calculating the frequency-domain transfer function H(ω) directly from X(ω) and Y(ω), H(ω) can be calculated from power spectral density functions, as follows:
Multiplying both sides of Y(ω)=H(ω)X(ω) by $X^*(\omega)$ and taking the long-time average gives
$$\lim_{T \to \infty} \frac{1}{T}\left[Y(\omega)X^*(\omega)\right] = H(\omega)\lim_{T \to \infty} \frac{1}{T}\left[X(\omega)X^*(\omega)\right]$$
By the definition of power spectral density, this can be rewritten as:
$$S_{xy}(\omega) = H(\omega)\,S_x(\omega)$$
where $S_{xy}(\omega)$ is the cross-spectral density function of the original sample audio and the recorded audio, and $S_x(\omega)$ is the auto-power spectral density function of the original sample audio. Thus
$$\hat{H}(\omega) = \frac{S_{xy}(\omega)}{S_x(\omega)}$$
gives an estimate of the frequency-domain transfer function of the recording and playback system.
The advantage of this estimation method is as follows. Besides the noise introduced by the system itself, the recorded audio y(t) always picks up some environment noise n(t), in which case the estimated frequency-domain transfer function of the recording and playback system can be expressed as:
$$\hat{H}(\omega) = \frac{S_{xy}(\omega) + S_{xn}(\omega)}{S_x(\omega)}$$
By the law of large numbers, when there is enough input/output data, or the averaging time is long enough, $S_{xn}(\omega)$ tends to zero, since the environment noise is uncorrelated with the original audio. The $\hat{H}(\omega)$ calculated with this formula can therefore be regarded as an unbiased estimate of H(ω), which means that the estimate $\hat{H}(\omega)$ is essentially consistent with the H(ω) of the actual environment.
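The averaged spectral estimate can be sketched as below. The sketch approximates the long-time average by summing over non-overlapping frames; the frame length, the absence of a window function, and the function name `estimate_transfer_function_psd` are assumptions made for illustration and are not taken from the source.

```python
import numpy as np


def estimate_transfer_function_psd(x, y, frame_len=256):
    """Estimate H(w) = S_xy(w) / S_x(w) by averaging over non-overlapping frames."""
    n_frames = min(len(x), len(y)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    s_xy = np.zeros(frame_len // 2 + 1, dtype=complex)   # accumulates Y X*
    s_xx = np.zeros(frame_len // 2 + 1)                  # accumulates X X*
    for k in range(n_frames):
        seg = slice(k * frame_len, (k + 1) * frame_len)
        X = np.fft.rfft(x[seg])
        Y = np.fft.rfft(y[seg])
        s_xy += Y * np.conj(X)
        s_xx += np.abs(X) ** 2
    # The common 1/n_frames factor cancels in the ratio.
    return s_xy / s_xx


# Synthetic check: a known 3-tap system plus uncorrelated environment noise n(t).
rng = np.random.default_rng(1)
x = rng.standard_normal(256 * 256)
h_true = np.array([1.0, 0.5, -0.25])
y = np.convolve(x, h_true)[: len(x)] + 0.1 * rng.standard_normal(len(x))

H_est = estimate_transfer_function_psd(x, y)
H_true = np.fft.rfft(h_true, 256)
mean_error = np.mean(np.abs(H_est - H_true))
```

Because the noise is uncorrelated with x, its contribution to the accumulated cross-spectrum shrinks relative to the signal term as more frames are averaged, which is the behaviour the text attributes to $S_{xn}(\omega)$.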
S102: apply noise-addition processing to the audio in the audio library according to the estimated noise characteristics of the recording and playback system.
Once the estimate $\hat{H}(\omega)$ of the system's frequency response function has been obtained, noise-addition processing is applied to each audio item in the audio library so that the acoustic characteristics of the processed audio are closer to those of actual recorded audio signals, alleviating the mismatch between the training data and actual recorded audio signals.
Noise addition amounts to calculating the response of each audio item in the audio library after passing through the recording and playback system. Suppose an audio item in the song library is s(t); the audio s'(t) after noise-addition processing is then:
$$s'(t) = s(t) * \hat{h}(t)$$
where $\hat{h}(t)$ can be obtained from $\hat{H}(\omega)$ by the inverse Fourier transform. The noise addition can of course also be calculated directly as a product in the frequency domain; this is not described further in this embodiment.
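Both routes, time-domain convolution and frequency-domain product, can be sketched as follows. The function names are hypothetical, and trimming the output to the length of the input is a choice made here for convenience rather than something the source specifies.

```python
import numpy as np


def add_system_noise(s, h_hat):
    """Time-domain noise addition: s'(t) = s(t) * h_hat(t), trimmed to len(s)."""
    return np.convolve(s, h_hat)[: len(s)]


def add_system_noise_fft(s, h_hat):
    """Same result computed as a product in the frequency domain."""
    n = len(s) + len(h_hat) - 1      # long enough to avoid circular wrap-around
    S = np.fft.rfft(s, n)
    H = np.fft.rfft(h_hat, n)
    return np.fft.irfft(S * H, n)[: len(s)]


rng = np.random.default_rng(2)
s = rng.standard_normal(2000)                 # stand-in for one library track
h_hat = np.array([0.9, 0.3, -0.1, 0.05])      # stand-in for the estimated impulse response

s_time = add_system_noise(s, h_hat)
s_freq = add_system_noise_fft(s, h_hat)
```

For full-length tracks and a long impulse response, the frequency-domain route is usually the cheaper of the two.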
The noise-addition processing above mainly considers the noise introduced by the recording and playback system itself. In a preferred embodiment of the invention, to bring the training data closer to the actual playback environment, pre-estimated playback environment noise characteristics can also be used to apply a second round of noise-addition processing to the audio in the audio library.
Unlike the multiplicative noise introduced by the recording and playback system, environment noise is additive. The second round of noise addition therefore superimposes a small amount of additive noise v(t) on s'(t), giving the audio s''(t) after the second round of noise addition:
$$s''(t) = s'(t) + v(t) = s(t) * \hat{h}(t) + v(t)$$
The noise signal v(t) can be simulated in advance, or some environment noise signals can be collected in advance and used to train a reasonable disturbance signal v(t). A signal obtained this way is closer to the additive noise in a real environment and gives better system identification performance.
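The second round of noise addition can be sketched as below. The source only says that "a small amount" of additive noise is superimposed; scaling the noise to a target signal-to-noise ratio, the parameter `snr_db`, and the function name `add_environment_noise` are assumptions introduced here to make "a small amount" concrete.

```python
import numpy as np


def add_environment_noise(s_prime, v, snr_db):
    """Return s''(t) = s'(t) + g * v(t), with g chosen to reach the target SNR in dB."""
    v = np.resize(v, len(s_prime))            # repeat or truncate the noise to fit
    signal_power = np.mean(s_prime ** 2)
    noise_power = np.mean(v ** 2)
    gain = np.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return s_prime + gain * v


rng = np.random.default_rng(3)
s_prime = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)   # stand-in for s'(t)
v = rng.standard_normal(3000)                                # stand-in for recorded noise

s_double = add_environment_noise(s_prime, v, snr_db=20.0)
achieved_snr = 10 * np.log10(
    np.mean(s_prime ** 2) / np.mean((s_double - s_prime) ** 2)
)
```

In practice v(t) would be a recorded or trained environment noise signal rather than white noise.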
In practice, the noise characteristics introduced by different environments can differ considerably. If the training data is too close to one particular environment, recognition performance may suffer as soon as the user's actual playback environment changes significantly. In one embodiment of the invention, multiple noise signals $v_1(t), v_2(t), v_3(t), \ldots$ can therefore be preset for different real noise environments, for example high-noise, medium-noise, and low-noise environments. Correspondingly, multiple groups of noise-added signals $s_1''(t), s_2''(t), s_3''(t), \ldots$ are obtained from the multiple noise signals. During subsequent identification, the feature libraries for the different noise levels can each be searched, improving the recall rate of the search.
S103: extract features from the noise-added audio, and build the audio feature library from the extracted features.
In this step, audio features are extracted from the audio that has undergone noise-addition processing (either one round or two rounds), and the audio feature library is then built through the audio segmentation, feature clustering, and index building steps introduced above. Because this processing is similar to the prior art, it is not described in detail in this embodiment.
Need to prove if adding the stage of making an uproar, the training that has generated at different noise circumstances obtains how group adds noise cancellation signal, also can generate a plurality of audio frequency characteristics storehouse D1, D2, the D3 of corresponding different noise circumstance features so respectively in this step ...That is to say that for the arbitrary audio frequency among the D0 of original audio storehouse, s (t) is at each audio frequency characteristics storehouse D1, D2, D3 ... in, all there is one group of character pair of s (t), to adapt to different noise circumstances.
At the situation of setting up a plurality of audio frequency characteristics storehouse, the present invention also further provides corresponding audio identification method:
The most basic approach is as follows. When the user submits an audio search request, that is, submits an audio fragment $s_q(t)$ to be recognized by recording it, the system first applies basic processing to $s_q(t)$, such as feature extraction, audio segmentation and clustering. It then searches each of the audio feature libraries D1, D2, D3, ... corresponding to the different environmental noises, and returns the one or more best-matching results to the user. The specific audio feature matching algorithm can be implemented with existing techniques and is not described in detail in the embodiments of the present invention. Because matching is carried out against several simulated noise environments, the influence of different environments on the recording is effectively reduced, and the recall of search results improves without relaxing the matching requirement.
For a typical user, the surrounding environment is relatively stable during a given audio search session. In view of this, in one embodiment of the present invention, suppose the system has completed at least one audio search and each search result matched well (as judged by the system itself or by means such as user rating). If the results of the first n searches were all found in one particular audio feature library Dx (x = 1, 2, 3, ...), then for the (n+1)-th and subsequent searches the system may search only in Dx and ignore the other feature libraries.
For example, with n = 2 set in advance, if the user's first two searches each found a uniquely matching result in audio feature library D1, then from the user's third search onward the system can assume that the user's current environment is closest to the environment corresponding to D1 and search directly in D1, saving the system resources that matching against multiple feature libraries would consume.
This automatic selection of a feature library can remain in effect until the user cancels it manually. Alternatively, an expiry period (for example, 1 hour) can be set in advance, after which the selection is cancelled; or the selection can be cancelled automatically when certain user behavior is detected (for example, closing the search page or exiting the search application).
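The selection logic described above can be sketched as a small state machine. This is a minimal illustration in Python, not the patent's implementation. The class name `LibrarySelector`, the `match_fn` callback (standing in for the unspecified feature matching algorithm and assumed to return a result and a score) and the parameter names are all hypothetical. For brevity the sketch locks onto a library whenever the last n searches chose the same one, and omits the patent's additional condition that each match be judged good.

```python
import time

class LibrarySelector:
    """Search all feature libraries until n consecutive searches agree, then lock."""

    def __init__(self, libraries, match_fn, n=2, ttl_seconds=3600.0):
        self.libraries = libraries  # dict: library name -> feature library
        self.match_fn = match_fn    # hypothetical: (query, library) -> (result, score)
        self.n = n                  # agreeing searches required before locking
        self.ttl = ttl_seconds      # expiry period of the lock (1 hour by default)
        self.history = []
        self.locked = None
        self.locked_at = None

    def reset(self):
        """Cancel the lock, e.g. on manual request or when the search page closes."""
        self.history.clear()
        self.locked = None
        self.locked_at = None

    def search(self, query, now=None):
        now = time.monotonic() if now is None else now
        # Cancel the lock once the expiry period has passed.
        if self.locked is not None and now - self.locked_at > self.ttl:
            self.reset()
        names = [self.locked] if self.locked is not None else list(self.libraries)
        best_name, best_result, best_score = None, None, float("-inf")
        for name in names:
            result, score = self.match_fn(query, self.libraries[name])
            if score > best_score:
                best_name, best_result, best_score = name, result, score
        if self.locked is None:
            self.history.append(best_name)
            last = self.history[-self.n:]
            # Lock onto Dx once the last n searches all matched in the same library.
            if len(last) == self.n and len(set(last)) == 1:
                self.locked, self.locked_at = best_name, now
        return best_name, best_result
```

With n = 2, the third search consults only the locked library, which is where the saving in matching cost comes from.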
The scheme above estimates the noise characteristics of the user's current environment from general patterns of user behavior and the user's actual current behavior. In another embodiment of the present invention, the system can instead estimate the noise characteristics of the user's current environment directly from objective data. Specifically, the user is first asked to record a segment of "blank" audio that contains no content such as music or song, only the background noise of the current environment. By analyzing this noise, the system can find, among D1, D2, D3, ..., the feature library Dx that best matches the user's current environment; subsequent audio recognition can then search only in Dx and ignore the other feature libraries.
Specifically, a feature vector that characterizes the environmental noise (a noise feature vector) can be extracted in advance for each audio feature library. Normalized sub-band noise energy is commonly used as this feature vector. Taking feature library D1 as an example, suppose its preset additive noise signal is $v_1(t)$. Its power spectral density function $S_{v_1}(\omega)$ is computed by Fourier transform and divided into sub-bands in the frequency domain, for example 16 sub-bands. The energy of the noise signal in each sub-band is computed, and all sub-band energies are normalized, giving a 16-dimensional sub-band energy feature vector. This normalized sub-band energy vector describes how the noise signal is distributed in the frequency domain and can effectively distinguish different noise signals.
For the "blank" background noise currently recorded by the user, the normalized sub-band energy vector is extracted in the same way and compared with the noise features of the different audio feature libraries to find the library with the highest matching degree; the comparison can use measures such as Euclidean distance or inner product. Once the audio feature library Dx (x = 1, 2, 3, ...) that best matches the feature vector of the recorded noise is found, subsequent searches can be carried out only in Dx, ignoring the other feature libraries.
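The noise feature vector and the library lookup can be sketched as follows. This is a minimal illustration in Python with NumPy, not the patent's implementation. The function names are hypothetical, the power spectral density is approximated by an unscaled periodogram, and the 16 sub-bands are assumed to be of equal width, which the patent does not specify. Euclidean distance is used as the comparison measure.

```python
import numpy as np

def subband_energy_vector(noise, n_bands=16):
    """Normalized sub-band energy vector of a noise signal."""
    # Unscaled periodogram as a simple power spectral density estimate.
    psd = np.abs(np.fft.rfft(noise)) ** 2
    # Assumption: equal-width sub-bands across the positive frequencies.
    bands = np.array_split(psd, n_bands)
    energy = np.array([band.sum() for band in bands])
    # Normalize so that the vector describes the spectral shape, not the level.
    return energy / energy.sum()

def closest_library(query_noise, library_noise_vectors):
    """Return the name of the library whose noise vector is nearest to the query."""
    q = subband_energy_vector(query_noise)
    return min(
        library_noise_vectors,
        key=lambda name: np.linalg.norm(q - library_noise_vectors[name]),
    )
```

Here `library_noise_vectors` would map each library name (D1, D2, ...) to the vector computed in advance from its preset noise signal.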
Compared with the scheme that estimates the environmental noise characteristics from general patterns of user behavior, the scheme of this embodiment selects the audio feature library from objective noise data of the current environment. Although it requires further cooperation from the user (recording the blank audio), it has higher estimation accuracy in theory.
The automatic selection of a feature library made in this way can likewise be cancelled by automatic or manual triggering, which is not described again here.
Corresponding to the method embodiment above, the present invention also provides a device for building an audio feature library. As shown in Figure 2, the device may comprise:
A system noise estimation unit 110, configured to estimate the noise characteristics of the record-playback system;
A noise-adding processing unit 120, configured to apply noise-adding processing to the audio in the audio library according to the estimated noise characteristics of the record-playback system;
A feature library building unit 130, configured to perform feature extraction on the noise-added audio and build the audio feature library from the extracted features.
The function of each unit and the way the units cooperate are described in detail below:
System noise estimation unit 110:
For a piece of audio, the noise that may be introduced during playback and recording comes mainly from two sources. One is the noise produced by the record-playback system itself, including the distortion caused by the transmission characteristics of output devices such as loudspeakers and input devices such as microphones; this noise is multiplicative. The other is the background noise of the recording environment; this noise is additive. Of the two, the system's own noise generally dominates the effect on the audio signal, and it is universal (the system noise is present whatever the environment). The noise characteristics of the record-playback system are therefore estimated first.
In one embodiment of the present invention, the noise characteristics of the record-playback system are estimated as follows:
First, one or more sample audio items are randomly selected from the audio library, played in a preset environment, and recorded. Because the goal here is to estimate the noise introduced by the record-playback system itself, a quiet environment should be chosen for recording where possible, to minimize the influence of environmental noise. In addition, to reduce the estimation bias caused by an insufficient sample size, multiple sample audio items can be chosen; for example, N songs can be randomly selected from the song library and played through a loudspeaker while a microphone records them, giving one continuous recording.
Once the recorded audio is obtained, the transfer function of the record-playback system can be estimated from the original sample audio and the recorded audio.
The time-domain signal of supposing the original sample audio frequency is x (t), and the time-domain signal of recording audio is y (t), and the time domain transport function of recording-reproducing system is h (t), and y (t) is numerically equal to the convolution of x (t) and h (t) so, that is:
y(t)=x(t)*h(t)
If x (t) and y (t) are carried out Fourier transform respectively, obtain frequency-domain expression X (ω) and the Y (ω) of original sample audio frequency and recording audio, so, at frequency domain following relation is arranged:
Y(ω)=H(ω)X(ω)
Wherein H (ω) is the frequency domain transfer function of recording-reproducing system, and itself and h (t) be Fourier pair each other.Therefore utilize
Figure BDA00003259213200141
Just can estimate the transport function of recording-reproducing system, just " audio amplifier-microphone " system introduce noisiness.
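The frequency-domain division can be sketched as follows. This is a minimal illustration in Python with NumPy, not the patent's implementation. The function name `estimate_transfer_function` and the regularization constant `eps` are assumptions introduced here; `eps` only guards against division by frequency bins where $X(\omega)$ is close to zero, and the two signals are assumed to be already time-aligned.

```python
import numpy as np

def estimate_transfer_function(x, y, eps=1e-12):
    """Estimate H(w) = Y(w) / X(w) from an original signal x and its recording y."""
    # Zero-pad both signals to a common length before transforming.
    n = max(len(x), len(y))
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    # Y / X written as Y * conj(X) / |X|^2, with eps (an assumption) for stability.
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)
```

Applying `np.fft.irfft` to the result, with the same length `n`, gives the corresponding impulse response estimate.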
During actual recording, there may be a time delay between the recorded audio and the original sample audio. This lack of synchronization affects the estimate of the system transfer function. Therefore, in one embodiment of the present invention, before the transfer function of the record-playback system is calculated, the relative delay between the original sample audio and the recorded audio is first calculated, and the result is used to time-align the two.
Suppose the time-domain signal of the original sample audio is $s(t)$ and the time-domain signal of the recorded audio is $\hat{s}(t)$. Because the two may not be synchronized in time, the time difference $t_0$ between the two signals must be estimated and used to compensate $s(t)$, giving the delay-compensated signal $s(t - t_0)$. At this point $s(t - t_0)$ and $\hat{s}(t)$ are fully synchronized in time, and the subsequent transfer function estimation can be carried out.
The delay between the two signals can be estimated by the correlation method. Because $s(t)$ and $\hat{s}(t)$ come from the same source and are strongly correlated, their correlation function can be calculated:
$$R(t_0) = \sum_t s(t + t_0)\,\hat{s}(t)$$
The correlation of the two signals is strongest when they are exactly aligned in time, so the position of the peak of $R(t_0)$ is the delay to be estimated. In other words, $t_0$ can be estimated by searching for the peak of the correlation function $R(t_0)$.
It will be understood that, besides the time-domain correlation method, the delay can also be calculated with a frequency-domain correlation method. Because the correlation function and the cross-spectral density form a Fourier transform pair, the cross-spectral density function of the signals can be calculated first and then transformed back to the time-domain correlation function, on which a peak search gives the delay $t_0$. The specific formulas are not described here.
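The frequency-domain correlation method can be sketched as follows. This is a minimal illustration in Python with NumPy, not the patent's implementation. The function names are hypothetical, and the sign convention is an assumption made here: a positive result means the recording lags the original by that many samples.

```python
import numpy as np

def estimate_delay(original, recorded):
    """Estimate by how many samples `recorded` lags `original`."""
    # Pad to the full linear-correlation length so the circular result does not wrap.
    n = len(original) + len(recorded) - 1
    S = np.fft.rfft(original, n)
    R = np.fft.rfft(recorded, n)
    # Cross-spectrum transformed back to the time domain: the cross-correlation.
    xcorr = np.fft.irfft(R * np.conj(S), n)
    k = int(np.argmax(xcorr))
    # Indices at or beyond len(recorded) represent negative lags.
    return k if k < len(recorded) else k - n

def align(original, recorded):
    """Trim both signals so that they start at the same instant."""
    d = estimate_delay(original, recorded)
    if d >= 0:
        recorded = recorded[d:]
    else:
        original = original[-d:]
    m = min(len(original), len(recorded))
    return original[:m], recorded[:m], d
```

The time-domain method would give the same peak position; `np.correlate(recorded, original, mode="full")` computes the same correlation directly, at higher cost for long signals.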
After the delay $t_0$ is obtained, the delayed original audio sample $s(t - t_0)$ and the recorded audio signal $\hat{s}(t)$ are denoted $x(t)$ and $y(t)$ respectively; after Fourier transform, the frequency-domain transfer function $H(\omega)$ of the record-playback system can be obtained.
In addition, depending on actual requirements, the original sample audio may need some preprocessing before the transfer function of the record-playback system is calculated, for example converting compressed audio formats such as MP3 or APE to an uncompressed format, or resampling the audio. Because these details are not the focus of the present invention, they are not described in detail in this embodiment.
In another embodiment of the present invention, instead of calculating the frequency-domain transfer function $H(\omega)$ of the record-playback system directly from $X(\omega)$ and $Y(\omega)$, $H(\omega)$ can be calculated from power spectral density functions, as follows:
For the formula $Y(\omega) = H(\omega) X(\omega)$, multiply both sides by $X^*(\omega)$ and take the long-time average, giving
$$\lim_{T \to \infty} \frac{1}{T}\left[ Y(\omega) X^*(\omega) \right] = H(\omega) \lim_{T \to \infty} \frac{1}{T}\left[ X(\omega) X^*(\omega) \right]$$
By the definition of power spectral density, this can be rewritten as:
$$S_{xy}(\omega) = H(\omega) S_x(\omega)$$
where $S_{xy}(\omega)$ is the cross-power spectral density function of the original sample audio and the recorded audio, and $S_x(\omega)$ is the auto-power spectral density function of the original sample audio. Thus, using
$$\hat{H}(\omega) = \frac{S_{xy}(\omega)}{S_x(\omega)}$$
the estimate of the frequency-domain transfer function of the record-playback system can be calculated.
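The power spectral density estimate can be sketched as follows. This is a minimal illustration in Python with NumPy, not the patent's implementation. The long-time average is approximated by averaging over non-overlapping, Hann-windowed segments; the function name, the segment length and the choice of window are assumptions introduced here.

```python
import numpy as np

def estimate_transfer_function_psd(x, y, seg_len=256):
    """Estimate H(w) = S_xy(w) / S_x(w) by averaging over signal segments."""
    n_seg = min(len(x), len(y)) // seg_len
    window = np.hanning(seg_len)  # assumption: Hann window to limit leakage
    s_xy = np.zeros(seg_len // 2 + 1, dtype=complex)
    s_x = np.zeros(seg_len // 2 + 1)
    for i in range(n_seg):
        seg = slice(i * seg_len, (i + 1) * seg_len)
        X = np.fft.rfft(window * x[seg])
        Y = np.fft.rfft(window * y[seg])
        s_xy += Y * np.conj(X)   # accumulates the cross-power spectrum Y X*
        s_x += np.abs(X) ** 2    # accumulates the auto-power spectrum X X*
    # The common 1/n_seg averaging factor cancels in the ratio.
    return s_xy / s_x
```

Averaging over more segments suppresses the contribution of noise that is uncorrelated with the sample audio, so the estimate becomes more stable as the recording gets longer.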
The advantage of this estimation method is as follows. Besides the noise introduced by the system itself, the recorded audio $y(t)$ always contains some amount of environmental noise $n(t)$. In this case the estimated frequency-domain transfer function of the record-playback system can be expressed as:
$$\hat{H}(\omega) = \frac{S_{xy}(\omega) + S_{xn}(\omega)}{S_x(\omega)}$$
Because the sample audio and the environmental noise are uncorrelated, by the law of large numbers $S_{xn}(\omega)$ tends to zero when there is enough input/output data or the averaging time is long enough. The $\hat{H}(\omega)$ calculated with the above formula can therefore be regarded as an unbiased estimate of $H(\omega)$, which means that the estimate $\hat{H}(\omega)$ is essentially consistent with the $H(\omega)$ of the actual environment.
Noise-adding processing unit 120:
After the estimate $\hat{H}(\omega)$ of the system's frequency response function is obtained, the noise-adding processing unit 120 applies noise-adding processing to each audio item in the audio library, so that the acoustic characteristics of the processed audio are closer to those of actually recorded audio signals, alleviating the mismatch between training data and actually recorded audio signals.
In one embodiment of the present invention, the noise-adding processing unit 120 may further comprise an additive noise processing subunit and a multiplicative noise processing subunit, of which the multiplicative noise processing subunit is the basic configuration of the device of the present invention. Their functions are as follows:
The noise-adding process of the multiplicative noise processing subunit is in effect the calculation of the response of each audio item in the audio library when passed through the record-playback system. Suppose an audio item in the song library is $s(t)$; then the noise-added audio $s'(t)$ is:
$$s'(t) = s(t) * \hat{h}(t)$$
where $\hat{h}(t)$ can be obtained from $\hat{H}(\omega)$ by inverse Fourier transform. The noise-adding process can of course also be calculated directly in the frequency domain as a product, which the embodiments of the present invention do not describe in detail.
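The equivalence of the time-domain and frequency-domain calculations can be sketched as follows. This is a minimal illustration in Python with NumPy, not the patent's implementation. The function names and the `n_fft` parameter (the transform length with which the estimate was computed) are assumptions introduced here.

```python
import numpy as np

def impulse_response(H_hat, n_fft):
    """Recover h_hat(t) from H_hat(w) by inverse Fourier transform."""
    return np.fft.irfft(H_hat, n_fft)

def add_system_noise_time(s, H_hat, n_fft):
    """s'(t) = s(t) * h_hat(t), computed as a time-domain convolution."""
    return np.convolve(s, impulse_response(H_hat, n_fft))

def add_system_noise_freq(s, H_hat, n_fft):
    """The same response, computed as a product in the frequency domain."""
    h_hat = impulse_response(H_hat, n_fft)
    # Pad to the full linear-convolution length to avoid circular wrap-around.
    n = len(s) + len(h_hat) - 1
    return np.fft.irfft(np.fft.rfft(s, n) * np.fft.rfft(h_hat, n), n)
```

Both functions return the same signal up to floating-point rounding; the frequency-domain form is usually faster for long impulse responses.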
The noise-adding processing of the multiplicative noise processing subunit mainly considers the noise introduced by the record-playback system itself. In a preferred embodiment of the present invention, to bring the training data closer to the actual record-playback environment, an additive noise processing subunit can further be configured in the noise-adding processing unit 120, which uses pre-estimated noise characteristics of the record-playback environment to apply secondary noise-adding processing to the audio in the audio library.
Unlike the multiplicative noise introduced by the record-playback system, environmental noise is additive. The secondary noise-adding processing therefore superimposes a small amount of additive noise $v(t)$ on $s'(t)$, giving the secondary noise-added audio $s''(t)$:
$$s''(t) = s'(t) + v(t) = s(t) * \hat{h}(t) + v(t)$$
Here the noise signal $v(t)$ can be simulated in advance. Alternatively, some environmental noise signals can be collected in advance and used to train a reasonable interference signal $v(t)$; such a signal is closer to the additive noise of a real environment and gives better system identification performance.
In practice, the noise characteristics introduced by different environments may differ considerably. If the training data are too close to one particular environment, recognition performance may suffer once the user's actual record-playback environment changes substantially. In one embodiment of the present invention, multiple noise signals $v_1(t)$, $v_2(t)$, $v_3(t)$, ... can be preset according to different real noise environments, for example high-noise, medium-noise and low-noise environments. Correspondingly, multiple groups of noise-added signals $s_1''(t)$, $s_2''(t)$, $s_3''(t)$, ... are obtained from these noise signals. In the subsequent recognition process, the feature libraries for the different noise levels can each be searched, improving search recall.
Feature library building unit 130:
The feature library building unit 130 extracts audio features from the noise-added audio (after either the primary or the secondary noise-adding processing), and then builds the audio feature library through the steps introduced earlier, such as audio segmentation, feature clustering and index building. Because this processing is similar to the prior art, it is not described in detail in this embodiment.
It should be noted that if multiple groups of noise-added signals were generated for different noise environments in the noise-adding stage, then multiple audio feature libraries D1, D2, D3, ... corresponding to the different noise environments can be generated here. That is, for any audio s(t) in the original audio library D0, each of the audio feature libraries D1, D2, D3, ... contains a corresponding group of features of s(t), so as to adapt to different noise environments.
For the case where multiple audio feature libraries are built, the present invention further provides a corresponding audio recognition scheme. As shown in Figure 3, the device further comprises a search request processing unit 140, configured to process the user's search requests when multiple audio feature libraries exist.
The most basic implementation of the search request processing unit 140 is as follows. When the user submits an audio search request, that is, submits an audio fragment $s_q(t)$ to be recognized by recording it, the system first applies basic processing to $s_q(t)$, such as feature extraction, audio segmentation and clustering. It then searches each of the audio feature libraries D1, D2, D3, ... corresponding to the different environmental noises, and returns the one or more best-matching results to the user. The specific audio feature matching algorithm can be implemented with existing techniques and is not described in detail in the embodiments of the present invention. Because matching is carried out against several simulated noise environments, the influence of different environments on the recording is effectively reduced, and the recall of search results improves without relaxing the matching requirement.
For a typical user, the surrounding environment is relatively stable during a given audio search session. In view of this, in one embodiment of the present invention, suppose the search request processing unit 140 has completed at least one audio search and each search result matched well (as judged by the system itself or by means such as user rating). If the results of the first n searches were all found in one particular audio feature library Dx (x = 1, 2, 3, ...), then for the (n+1)-th and subsequent searches the search request processing unit 140 may search only in Dx and ignore the other feature libraries.
For example, with n = 2 set in advance, if the user's first two searches each found a uniquely matching result in audio feature library D1, then from the user's third search onward the search request processing unit 140 can assume that the user's current environment is closest to the environment corresponding to D1 and search directly in D1, saving the system resources that matching against multiple feature libraries would consume.
This automatic selection of a feature library can remain in effect until the user cancels it manually. Alternatively, an expiry period (for example, 1 hour) can be set in advance, after which the selection is cancelled; or the selection can be cancelled automatically when certain user behavior is detected (for example, closing the search page or exiting the search application).
In the scheme above, the search request processing unit 140 estimates the noise characteristics of the user's current environment from general patterns of user behavior and the user's actual current behavior. In another embodiment of the present invention, the search request processing unit 140 can instead estimate the noise characteristics of the user's current environment directly from objective data. Specifically, the user is first asked to record a segment of "blank" audio that contains no content such as music or song, only the background noise of the current environment. By analyzing this noise, the search request processing unit 140 can find, among D1, D2, D3, ..., the feature library Dx that best matches the user's current environment; subsequent audio recognition can then search only in Dx and ignore the other feature libraries.
Specifically, a feature vector that characterizes the environmental noise (a noise feature vector) can be extracted in advance for each audio feature library. Normalized sub-band noise energy is commonly used as this feature vector. Taking feature library D1 as an example, suppose its preset additive noise signal is $v_1(t)$. Its power spectral density function $S_{v_1}(\omega)$ is computed by Fourier transform and divided into sub-bands in the frequency domain, for example 16 sub-bands. The energy of the noise signal in each sub-band is computed, and all sub-band energies are normalized, giving a 16-dimensional sub-band energy feature vector. This normalized sub-band energy vector describes how the noise signal is distributed in the frequency domain and can effectively distinguish different noise signals.
For the "blank" background noise currently recorded by the user, the normalized sub-band energy vector is extracted in the same way and compared with the noise features of the different audio feature libraries to find the library with the highest matching degree; the comparison can use measures such as Euclidean distance or inner product. Once the audio feature library Dx (x = 1, 2, 3, ...) that best matches the feature vector of the recorded noise is found, subsequent searches can be carried out only in Dx, ignoring the other feature libraries.
Compared with the scheme that estimates the environmental noise characteristics from general patterns of user behavior, the scheme of this embodiment selects the audio feature library from objective noise data of the current environment. Although it requires further cooperation from the user (recording the blank audio), it has higher estimation accuracy in theory.
The automatic selection of a feature library made in this way can likewise be cancelled by automatic or manual triggering, which is not described again here.
For convenience of description, the device above is described as divided by function into various units. When the present invention is implemented, the functions of the units can of course be realized in one or more pieces of software and/or hardware.
From the description of the embodiments above, those skilled in the art will clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware platform. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. This computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and comprises instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The embodiments in this specification are described progressively; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device or system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the method embodiments may be consulted. The device and system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of this embodiment. A person of ordinary skill in the art can understand and implement this without creative effort.
The above are only specific embodiments of the present invention. It should be pointed out that a person of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (20)

1. A method for building an audio feature library, characterized in that the method comprises:
estimating the noise characteristics of a record-playback system;
applying noise-adding processing to the audio in an audio library according to the estimated noise characteristics of the record-playback system;
performing feature extraction on the noise-added audio, and building the audio feature library from the extracted features.
2. The method according to claim 1, characterized in that estimating the noise characteristics of the record-playback system comprises:
randomly selecting sample audio from the audio library, and playing and recording the sample audio in a preset environment;
calculating the transfer function of the record-playback system from the original sample audio and the recorded audio.
3. The method according to claim 2, characterized in that the method further comprises:
before calculating the transfer function of the record-playback system, calculating the relative delay between the original sample audio and the recorded audio, and using the result to time-align the original sample audio and the recorded audio.
4. The method according to claim 2, characterized in that the method further comprises:
before calculating the transfer function of the record-playback system, converting original sample audio in a compressed format to an uncompressed format.
5. The method according to claim 1, characterized in that applying noise-adding processing to the audio in the audio library according to the estimated noise characteristics of the record-playback system comprises:
calculating, for each audio item in the audio library, the response of the audio when passed through the record-playback system.
6. The method according to claim 1, characterized in that, after applying noise-adding processing to the audio in the audio library according to the estimated noise characteristics of the record-playback system, the method further comprises:
applying noise-adding processing to the audio in the audio library according to pre-estimated noise characteristics of the record-playback environment.
7. method according to claim 1 is characterized in that,
The described playback environ-ment noise characteristic of pre-estimating comprises the noise characteristic corresponding to the actual playback environ-ment of difference;
Describedly carry out feature extraction to adding the audio frequency of making an uproar after handling, the feature that utilization is extracted is set up the audio frequency characteristics storehouse, comprise: carry out feature extraction to utilizing the varying environment noise characteristic to add to make an uproar the audio frequency of handling, the corresponding many groups audio frequency characteristics that obtains same audio frequency is further set up many groups audio frequency characteristics storehouse of corresponding multiple neighbourhood noise.
8. method according to claim 7 is characterized in that, this method also comprises:
After receiving the audio search request of user's input, search in the corresponding audio frequency characteristics of varying environment noise storehouse respectively.
9. method according to claim 8 is characterized in that, this method also comprises:
After finishing search at least one times, determine the corresponding audio frequency characteristics of user's current environment storehouse according to Search Results, subsequent searches is directly searched in determined feature database.
10. The method according to claim 8, characterized in that the method further comprises:
Estimating the noise characteristic of the user's current environment, and determining the audio feature library corresponding to the user's current environment according to the estimation result;
After receiving an audio search request input by the user, searching directly in the determined feature library.
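One simple way to picture the selection in claim 10 is to summarize each environment by a single noise level and pick the environment closest to the level measured at the user's side. The claim does not say which noise characteristic is estimated or how it is compared; the RMS level, the per-environment reference levels, and the name `select_environment` are all assumptions of this sketch.

```python
import math


def rms(samples):
    """Root-mean-square level of a sample sequence."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def select_environment(noise_segment, env_noise_levels):
    """Pick the environment whose reference noise level is closest.

    `noise_segment` is assumed to be a noise-only stretch captured from the
    user's device; `env_noise_levels` maps environment name to a reference RMS.
    """
    level = rms(noise_segment)
    return min(env_noise_levels, key=lambda env: abs(env_noise_levels[env] - level))


chosen = select_environment(
    [0.1, -0.1, 0.1, -0.1],
    {"quiet": 0.01, "office": 0.1, "street": 0.5},
)
```

The chosen name would then index the matching feature library, so that an incoming search request is served from that library alone.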
11. An audio feature library establishing device, characterized in that the device comprises:
A system noise estimation unit, configured to estimate the noise characteristic of a recording and playback system;
A noise-adding processing unit, configured to perform noise-adding processing on the audio in an audio library according to the estimated noise characteristic of the recording and playback system;
A feature library establishing unit, configured to perform feature extraction on the audio after noise-adding processing, and to establish an audio feature library from the extracted features.
12. The device according to claim 11, characterized in that the system noise estimation unit is specifically configured to:
Randomly select sample audio from the audio library, play the sample audio in a preset environment, and record it;
Calculate the transfer function of the recording and playback system from the original sample audio and the recorded audio.
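A common way to estimate a transfer function from an input and output pair is to divide their spectra, H(f) = Y(f) / X(f). The sketch below takes that approach with a naive DFT; it is an assumed estimator for illustration, since claim 12 does not state how the transfer function is calculated, and the names `dft` and `estimate_transfer_function` are hypothetical. It also assumes the two signals are already time-aligned.

```python
import cmath


def dft(signal, n):
    """Naive n-point DFT; a signal shorter than n is implicitly zero-padded."""
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def estimate_transfer_function(original, recorded, eps=1e-12):
    """Estimate H[k] = Y[k] / X[k] for each frequency bin.

    Bins where the original has (near) zero energy carry no information
    about the system and are set to 0 in this sketch.
    """
    n = max(len(original), len(recorded))
    x_spec = dft(original, n)
    y_spec = dft(recorded, n)
    return [y / x if abs(x) > eps else 0j for x, y in zip(x_spec, y_spec)]


# Example: an impulse played through a system whose true response is [1.0, 0.5].
h_est = estimate_transfer_function([1.0, 0.0, 0.0, 0.0], [1.0, 0.5, 0.0, 0.0, 0.0])
```

In practice the estimate would typically be averaged over many frames or sample recordings to reduce the influence of measurement noise.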
13. The device according to claim 12, characterized in that the system noise estimation unit is further configured to:
Before calculating the transfer function of the recording and playback system, calculate the relative time delay between the original sample audio and the recorded audio, and use the calculated result to time-align the original sample audio and the recorded audio.
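The relative delay in claim 13 is often found by cross-correlation: the lag at which the two signals correlate most strongly is taken as the delay. The sketch below assumes the recording starts no earlier than the original (a non-negative delay) and searches lags up to a caller-supplied bound; the function names and the `max_lag` parameter are illustrative and do not come from the claims.

```python
def estimate_delay(original, recorded, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            o * recorded[i + lag]
            for i, o in enumerate(original)
            if i + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(original, recorded, max_lag):
    """Trim the recording so that it starts at the estimated delay."""
    lag = estimate_delay(original, recorded, max_lag)
    aligned = recorded[lag:lag + len(original)]
    return original[:len(aligned)], aligned


reference = [0.0, 1.0, 0.0, -1.0, 2.0]
capture = [0.0, 0.0, 0.0] + reference      # recording delayed by 3 samples
ref_aligned, cap_aligned = align(reference, capture, max_lag=5)
```

After alignment the two sequences can be compared sample by sample, which is what a spectral-division estimate of the transfer function relies on.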
14. The device according to claim 12, characterized in that the system noise estimation unit is further configured to:
Before calculating the transfer function of the recording and playback system, convert original sample audio in a compressed format to an uncompressed format.
15. The device according to claim 11, characterized in that the noise-adding processing unit comprises a multiplicative noise processing subunit, configured to calculate, for each audio item in the audio library, its response after passing through the recording and playback system.
16. The device according to claim 11, characterized in that the noise-adding processing unit further comprises an additive noise processing subunit, configured to, after noise-adding processing has been performed on the audio in the audio library according to the estimated noise characteristic of the recording and playback system, perform noise-adding processing on the audio in the audio library according to a pre-estimated playback environment noise characteristic.
17. The device according to claim 11, characterized in that
The pre-estimated playback environment noise characteristic comprises noise characteristics corresponding to different actual playback environments;
The feature library establishing unit is specifically configured to perform feature extraction on the audio that has undergone noise-adding processing with the different environment noise characteristics, correspondingly obtain multiple groups of audio features for the same audio, and further establish multiple audio feature libraries corresponding to the multiple kinds of environmental noise.
18. The device according to claim 17, characterized in that the device further comprises:
A search request processing unit, configured to, after receiving an audio search request input by a user, search in each of the audio feature libraries corresponding to the different environmental noises.
19. The device according to claim 18, characterized in that the search request processing unit is further configured to:
After completing at least one search, determine, according to the search results, the audio feature library corresponding to the user's current environment, and perform subsequent searches directly in the determined feature library.
20. The device according to claim 18, characterized in that the device further comprises:
A search request processing unit, configured to estimate the noise characteristic of the user's current environment and determine the audio feature library corresponding to the user's current environment according to the estimation result; and, after receiving an audio search request input by the user, search directly in the determined feature library.
CN201310203045.4A 2013-05-28 2013-05-28 Audio frequency feature library establishing method and device Active CN103280215B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310203045.4A CN103280215B (en) 2013-05-28 2013-05-28 A kind of audio frequency feature library method for building up and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310203045.4A CN103280215B (en) 2013-05-28 2013-05-28 A kind of audio frequency feature library method for building up and device

Publications (2)

Publication Number Publication Date
CN103280215A true CN103280215A (en) 2013-09-04
CN103280215B CN103280215B (en) 2016-03-23

Family

ID=49062711

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310203045.4A Active CN103280215B (en) 2013-05-28 2013-05-28 Audio frequency feature library establishing method and device

Country Status (1)

Country Link
CN (1) CN103280215B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104952450A (en) * 2015-05-15 2015-09-30 百度在线网络技术(北京)有限公司 Far field identification processing method and device
CN105321526A (en) * 2015-09-23 2016-02-10 联想(北京)有限公司 Audio processing method and electronic device
CN108784932A (en) * 2017-05-02 2018-11-13 中国石油化工股份有限公司 A kind of preventing noise ear cover based on spectrum analysis
CN110875050A (en) * 2020-01-17 2020-03-10 深圳亿智时代科技有限公司 Voice data collection method, device, equipment and medium for real scene
CN111179969A (en) * 2019-12-26 2020-05-19 数海信息技术有限公司 Alarm method, device and system based on audio information and storage medium
WO2021027132A1 (en) * 2019-08-12 2021-02-18 平安科技(深圳)有限公司 Audio processing method and apparatus and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1331467A (en) * 2000-06-28 2002-01-16 松下电器产业株式会社 Method and device for producing acoustics model
EP1189204A2 (en) * 2000-09-18 2002-03-20 Pioneer Corporation HMM-based noisy speech recognition
US20050043945A1 (en) * 2003-08-19 2005-02-24 Microsoft Corporation Method of noise reduction using instantaneous signal-to-noise ratio as the principal quantity for optimal estimation
CN1595497A (en) * 2003-09-12 2005-03-16 古井贞熙 Noise adaptation system and method for speech model, noise adaptation program for speech recognition
CN1801326A (en) * 2004-12-31 2006-07-12 中国科学院自动化研究所 Method for adaptively improving speech recognition rate by means of gain
CN101354887A (en) * 2007-07-25 2009-01-28 通用汽车公司 Ambient noise injection for use in speech recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
何勇军,韩纪庆: "语音识别中环境失配补偿综述", 《智能计算机与应用》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104952450A (en) * 2015-05-15 2015-09-30 百度在线网络技术(北京)有限公司 Far field identification processing method and device
CN104952450B (en) * 2015-05-15 2017-11-17 百度在线网络技术(北京)有限公司 The treating method and apparatus of far field identification
CN105321526A (en) * 2015-09-23 2016-02-10 联想(北京)有限公司 Audio processing method and electronic device
CN105321526B (en) * 2015-09-23 2020-07-24 联想(北京)有限公司 Audio processing method and electronic equipment
CN108784932A (en) * 2017-05-02 2018-11-13 中国石油化工股份有限公司 A kind of preventing noise ear cover based on spectrum analysis
WO2021027132A1 (en) * 2019-08-12 2021-02-18 平安科技(深圳)有限公司 Audio processing method and apparatus and computer storage medium
CN111179969A (en) * 2019-12-26 2020-05-19 数海信息技术有限公司 Alarm method, device and system based on audio information and storage medium
CN110875050A (en) * 2020-01-17 2020-03-10 深圳亿智时代科技有限公司 Voice data collection method, device, equipment and medium for real scene

Also Published As

Publication number Publication date
CN103280215B (en) 2016-03-23

Similar Documents

Publication Publication Date Title
CN103280215B (en) Audio frequency feature library establishing method and device
Haitsma et al. A highly robust audio fingerprinting system with an efficient search strategy
CN101002254B (en) Device and method for robustly classifying audio signals, method for establishing and operating audio signal database
CN103971689B (en) A kind of audio identification methods and device
CN102332262B (en) Method for intelligently identifying songs based on audio features
EP1410380B1 (en) Automatic identification of sound recordings
US9659092B2 (en) Music information searching method and apparatus thereof
US20130226957A1 (en) Methods, Systems, and Media for Identifying Similar Songs Using Two-Dimensional Fourier Transform Magnitudes
US8706276B2 (en) Systems, methods, and media for identifying matching audio
Das et al. Assessing the scope of generalized countermeasures for anti-spoofing
CN105405448A (en) Sound effect processing method and apparatus
CN104598502A (en) Method, device and system for obtaining background music information in played video
CN105612510A (en) System and method for performing automatic audio production using semantic data
CN102436806A (en) Audio frequency copy detection method based on similarity
CN105741835A (en) Audio information processing method and terminal
CN106021398A (en) Information publishing method and apparatus
Kim et al. Robust audio fingerprinting using peak-pair-based hash of non-repeating foreground audio in a real environment
CN108197319A (en) A kind of audio search method and system of the characteristic point based on time-frequency local energy
CN104142831A (en) Application program searching method and device
CN105575400A (en) Method, terminal, server, and system for obtaining song information
CN104900239A (en) Audio real-time comparison method based on Walsh-Hadamard transform
CN105589970A (en) Music searching method and device
Wassi et al. FPGA-based real-time MFCC extraction for automatic audio indexing on FM broadcast data
Távora et al. Detecting replicas within audio evidence using an adaptive audio fingerprinting scheme
WO2016110156A1 (en) Voice search method and apparatus, terminal and computer storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant