CN105224844A

CN105224844A - Verification method, system and device

Info

Publication number: CN105224844A
Application number: CN201410311000.3A
Authority: CN
Inventors: 张大威; 黄亮
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2014-07-01
Filing date: 2014-07-01
Publication date: 2016-01-06
Anticipated expiration: 2034-07-01
Also published as: CN105224844B

Abstract

The invention provides a verification method comprising: receiving audio to be verified; estimating the signal-to-noise ratio (SNR) of the audio to be verified; and judging whether the SNR of the audio to be verified reaches a first threshold. If so, the method detects whether the content of the audio to be verified is consistent with the content of the corresponding original audio; if it is, verification passes, otherwise verification fails. This verification method reduces the influence of the environment and improves speech recognition accuracy, thereby improving verification accuracy and reducing misjudgment. Another verification method, a verification system, and a verification device are also provided.

Description

Verification method, system and device

Technical field

The present invention relates to the field of computer technology, and in particular to a verification method, system and device.

Background

Many application scenarios need to verify whether the user is a real human. A verification code (CAPTCHA) is a public automated program that is currently effective at distinguishing whether the user is a computer or a person. The user must answer, through a terminal, a verification question posed by the website's service side before being regarded as human and allowed to continue using the service. Practice has shown that verification based on verification codes greatly reduces common network security risks such as malicious brute-force password cracking, vote rigging, forum flooding, and page-refresh abuse.

The read-write verification code is the most common type. The service side usually provides a picture containing a character string and requires the user to enter the content of the string. The string may contain at least one of upper- and lower-case letters, digits, Chinese characters, mathematical formulas, and the like, and its length may be random or fixed. The user reads the provided material and enters the corresponding content, which is then sent back to the server for verification.

A variant of the read-write verification code is the dictation verification code: the server provides a piece of audio to the user, and the user listens to the audio, writes down its content, and sends it back to the server for verification. However, both read-write and dictation verification codes require the user to enter text and are therefore vulnerable to attack by code workers. Code workers are people who specialize in entering verification codes by hand; a skilled code worker enters them very efficiently (on the order of hundreds of milliseconds), which seriously affects the availability of the system.

For this reason, the listen-speak verification code has been proposed. In use, a piece of audio with textual content is played to the user, the user is required to repeat the audio into a microphone, and the server verifies the collected audio. However, the audio collection environment is uncontrollable, so the audio received by the server may contain various kinds of noise and interference. How to reduce the influence of the environment on the listen-speak verification code is a problem in urgent need of a solution.

Summary of the invention

In view of the above technical problem, it is necessary to provide a verification method, system and device that can reduce the influence of the environment.

A verification method, the method comprising:

receiving audio to be verified;

performing signal-to-noise ratio (SNR) estimation on the audio to be verified;

judging whether the SNR of the audio to be verified reaches a first threshold, and if so,

detecting whether the content of the audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes, otherwise verification fails.

A verification method, the method comprising:

receiving delivered original audio;

obtaining input audio to be verified;

performing SNR estimation on the audio to be verified;

detecting whether the content of the audio to be verified is consistent with the content of the original audio; if so, verification passes, otherwise verification fails.

A verification system, the system comprising:

a receiving module, configured to receive audio to be verified;

an SNR estimation module, configured to perform SNR estimation on the audio to be verified;

a judging module, configured to judge whether the SNR of the audio to be verified reaches a first threshold;

a verification module, configured to, if the SNR of the audio to be verified reaches the first threshold, detect whether the content of the audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes, otherwise verification fails.

A verification device, the device comprising:

a receiving module, configured to receive delivered original audio;

an acquisition module, configured to obtain input audio to be verified;

a verification module, configured to, if the SNR of the audio to be verified reaches a first threshold, detect whether the content of the audio to be verified is consistent with the content of the original audio; if so, verification passes, otherwise verification fails.

The above verification method, system and device perform SNR estimation on the audio to be verified. When the SNR reaches the first threshold, the audio to be verified contains relatively little noise and its content can be recognized accurately, so the content of the audio to be verified is directly checked for consistency with the content of the corresponding original audio; if they are consistent, verification passes, otherwise verification fails. The method, system and device exclude audio collected in overly noisy environments, which reduces the influence of the environment on verification; performing speech recognition only on audio with relatively little noise improves verification accuracy and reduces misjudgment.

Brief description of the drawings

Fig. 1 is a diagram of the application environment of the verification method in one embodiment;

Fig. 2 is a schematic flowchart of the verification method in one embodiment;

Fig. 3 is a schematic flowchart of the verification method in another embodiment;

Fig. 4 is a schematic structural diagram of the verification system in one embodiment;

Fig. 5 is a schematic structural diagram of the verification system in another embodiment;

Fig. 6 is a schematic flowchart of the verification method in yet another embodiment;

Fig. 7 is a schematic flowchart of the verification method in a further embodiment;

Fig. 8 is a schematic structural diagram of the verification device in one embodiment;

Fig. 9 is a schematic structural diagram of the verification device in another embodiment;

Fig. 10 is a diagram of the hardware environment of a server that runs the verification method in one embodiment;

Fig. 11 is a diagram of the hardware environment of a terminal that runs the verification method in one embodiment.

Detailed description

To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.

The verification method provided by one embodiment of the invention can be applied in the environment shown in Fig. 1. Referring to Fig. 1, terminal 20 interacts through network 30 with audio delivery server 40 and verification server 50. When user 10 uses terminal 20 to request data from the application server of some application, audio delivery server 40 delivers original audio to terminal 20 through network 30; each original audio has a unique number. User 10 listens to the received original audio on terminal 20, and terminal 20 then receives the audio entered by user 10 (the audio to be verified) through an audio input device such as a microphone. Terminal 20 sends the audio to be verified, together with the number of the corresponding original audio, to verification server 50 through network 30 for verification. Verification server 50 receives the audio to be verified and performs SNR estimation on it to estimate the noise it contains; if the noise is low, it recognizes the content of the audio to be verified. Verification server 50 also receives the number of the original audio, retrieves the corresponding original audio by that number, and recognizes the content of the original audio. Verification server 50 then detects whether the content of the audio to be verified is consistent with the content of the corresponding original audio. If so, the audio entered by user 10 through terminal 20 is correct and verification passes; otherwise verification fails. If the noise in the audio to be verified is too high, the environment in which user 10 entered the audio is too noisy; the server may take no action or prompt user 10 to re-enter the audio. Alternatively, it may further judge whether the SNR of the audio to be verified reaches a threshold; if so, the environment in which user 10 entered the audio is a low-noise environment, and the server may perform speech enhancement on the audio to be verified and then detect whether its content is consistent with the content of the original audio. It can be understood that audio delivery server 40 and verification server 50 may be the same server, with a single server responsible for both delivering original audio and verifying uploaded audio.

In one embodiment, as shown in Fig. 2, a verification method is provided. The method is described as applied to a server and comprises:

Step 202: receive the audio to be verified.

In this embodiment, the terminal sends a data request to the server. When certain data is requested, the server needs to verify the user: the server can automatically and randomly deliver a fixed piece of audio to the terminal. Each audio has a unique number, which is sent to the terminal along with it. The user of the terminal listens to the audio; the terminal receives the audio entered by the user (the audio to be verified) through a device such as a microphone and sends it to the server together with the number of the original audio.

Step 204: perform SNR estimation on the audio to be verified.

In this embodiment, the server receives the audio to be verified uploaded by the terminal and performs SNR estimation on it to obtain its signal-to-noise ratio. The SNR is the ratio of the audio signal to the noise in the noisy audio and reflects how much noise the audio contains. By estimating the SNR of the audio to be verified, the server can determine the environment in which the audio was collected, that is, whether the environment of the terminal's user is noisy.

Step 206: judge whether the SNR of the audio to be verified reaches the first threshold; if so, go to step 208, otherwise go to step 210.

The larger the SNR, the less noise the audio contains; the smaller the SNR, the more noise it contains. A first threshold is preset: when the SNR of the audio is greater than or equal to the first threshold, the audio contains little noise; when the SNR is less than the first threshold, the audio contains more noise. The less noise the audio contains, the more accurate speech recognition is.

Step 208: detect whether the content of the audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes, otherwise verification fails.

When the SNR of the audio to be verified is greater than or equal to the first threshold, the audio contains little noise, so speech recognition is performed on it directly to recognize its content, for example "study hard and make progress every day". In this embodiment, when the terminal uploads the audio to be verified to the server, it also uploads the number of the corresponding original audio. The server finds the corresponding original audio by this number, recognizes its content, and then detects whether the two contents are consistent. If they are, the user entered the correct audio and verification passes; otherwise verification fails.

Step 210: take no action, prompt for re-entry, or judge further.

When the SNR of the audio to be verified is less than the first threshold, the audio contains too much noise for accurate speech recognition. In this case the flow may simply end, or the server may return to the terminal a prompt to re-enter the audio, reminding the user to record in a less noisy environment. Alternatively, the SNR may be judged further to classify the noise environment of the audio more finely and handle it accordingly.

In this embodiment, SNR estimation is performed on the audio to be verified. When the SNR reaches the first threshold, the audio contains relatively little noise and its content can be recognized accurately, so the method directly detects whether the content of the audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes, otherwise verification fails. The method excludes audio collected in overly noisy environments, which reduces the influence of the environment on verification; performing speech recognition only on audio with relatively little noise improves verification accuracy and reduces misjudgment. In addition, this embodiment verifies by collecting audio, using audio as a verification code, namely the listen-speak verification code. Compared with other forms of verification code, it applies more widely: for example, it suits users with color blindness or color weakness and blind users, as well as mobile devices whose physical or virtual keyboards are small. Furthermore, various means can reduce the efficiency of code workers, such as lowering the playback rate of the original audio, increasing the length of the information in the audio, or using dialects, so the listen-speak verification code is also more secure.
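The decision flow of steps 206 to 210 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `verify`, the `Outcome` enum, the use of decibels, and the default threshold value are all assumptions, and the recognized texts are assumed to come from some speech recognizer that is outside the sketch. For step 210 the sketch picks the "prompt for re-entry" option.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "verification passed"
    FAILED = "verification failed"
    RETRY = "prompt user to re-enter audio"


def verify(snr_db: float, recognized_text: str, original_text: str,
           first_threshold_db: float = 15.0) -> Outcome:
    """Gate content comparison on the estimated SNR (steps 206-210).

    first_threshold_db is a hypothetical value; the source does not
    state a concrete threshold.
    """
    if snr_db >= first_threshold_db:
        # Step 208: little noise, so compare the recognized content directly.
        if recognized_text == original_text:
            return Outcome.PASSED
        return Outcome.FAILED
    # Step 210: too noisy for reliable recognition; ask for new audio.
    return Outcome.RETRY
```

A real system would probably normalize the recognized text (punctuation, spacing) before comparing; the exact-match comparison here is only the simplest choice.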

In one embodiment, the verification method further comprises: if the SNR of the audio to be verified does not reach the first threshold, further judging whether it reaches a second threshold; if so, performing speech enhancement on the audio to be verified and detecting whether the content of the enhanced audio is consistent with the content of the corresponding original audio, where verification passes if so and fails otherwise; and if the SNR of the audio to be verified does not reach the second threshold, returning a prompt to re-enter the audio.

As shown in Fig. 3, in one embodiment a verification method is provided. The method is described as applied to a server and comprises:

Step 302: receive the audio to be verified.

Step 304: perform SNR estimation on the audio to be verified.

Step 306: judge whether the SNR of the audio to be verified reaches the first threshold; if so, go to step 308, otherwise go to step 310.

Step 308: detect whether the content of the audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes, otherwise verification fails.

As described above, in this embodiment, if the SNR of the audio to be verified is greater than or equal to the first threshold, the user was in a relatively quiet environment when the audio was collected, and the content of the audio can be recognized directly.

Step 310: further judge whether the SNR of the audio to be verified reaches the second threshold; if so, go to step 312, otherwise go to step 316.

In this embodiment, the second threshold is less than the first threshold. If the SNR of the audio to be verified is less than the second threshold, the terminal was in a high-noise environment when the audio was collected; if the SNR is between the second threshold and the first threshold, the terminal was in a low-noise environment; if the SNR is greater than or equal to the first threshold, the terminal was in a quiet environment.

Step 312: perform speech enhancement on the audio to be verified.

For audio collected in a low-noise environment, performing speech recognition directly would give inaccurate results, so the audio must first be enhanced. Speech enhancement means extracting clean speech from noisy audio in order to remove the noise. Recognition of the enhanced audio is more accurate.

Step 314: detect whether the content of the enhanced audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes, otherwise verification fails.

In this embodiment, the enhanced audio to be verified is recognized and its content is compared with the content of the original audio. If they are consistent, the audio entered by the user is correct and verification passes; otherwise verification fails.

Step 316: return a prompt to re-enter the audio.

When the SNR of the audio to be verified is less than the second threshold, the audio was collected in a high-noise environment, and neither speech recognition nor speech enhancement would give a good recognition result. Such audio is therefore not processed; the server returns a message to the terminal, and the terminal displays a prompt to re-enter the audio, for example "The current ambient noise is too high; please re-enter in a quiet environment".

In this embodiment, when the SNR falls within a certain range, the audio to be verified is determined to have been collected in a low-noise environment. Performing speech enhancement on such audio before recognition and verification gives a better recognition result, further improving verification accuracy and reducing misjudgment. With speech enhancement, audio collected in a low-noise environment can also be verified, which widens the range of application of the verification system.
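The two-threshold flow of steps 306 to 316 can be sketched as follows. The function name, the string results, the decibel thresholds and their default values are hypothetical; `recognize` and `enhance` are placeholders for a speech recognizer and a speech enhancement routine that the caller supplies.

```python
from typing import Any, Callable


def verify_with_enhancement(audio: Any, snr_db: float, original_text: str,
                            recognize: Callable[[Any], str],
                            enhance: Callable[[Any], Any],
                            first_threshold_db: float = 15.0,
                            second_threshold_db: float = 5.0) -> str:
    """Return "passed", "failed" or "retry" for one verification attempt."""
    if second_threshold_db >= first_threshold_db:
        raise ValueError("the second threshold must be below the first")
    if snr_db >= first_threshold_db:
        # Quiet environment: recognize the audio directly (step 308).
        text = recognize(audio)
    elif snr_db >= second_threshold_db:
        # Low-noise environment: enhance first, then recognize (312, 314).
        text = recognize(enhance(audio))
    else:
        # High-noise environment: prompt for re-entry (step 316).
        return "retry"
    return "passed" if text == original_text else "failed"
```

Passing the recognizer and enhancer as callables keeps the threshold logic testable on its own, independent of any particular speech library.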

In one embodiment, the step of performing SNR estimation on the audio to be verified (hereinafter, the audio) comprises: Fourier-transforming the audio to be verified to the frequency domain; and performing SNR estimation on the frequency-domain audio using the minima-controlled recursive averaging (MCRA) algorithm.

In this algorithm, the noise power spectrum estimate is updated under the following two hypotheses:

H_{0}^{k} : {\hat{σ}}_{d}^{2} (λ, k) = α {\hat{σ}}_{d}^{2} (λ - 1, k) + (1 - α) {| Y (λ, k) |}^{2}

H_{1}^{k} : {\hat{σ}}_{d}^{2} (λ, k) = {\hat{σ}}_{d}^{2} (λ - 1, k)

where H_{1}^{k} denotes a segment in which speech is present: the noise estimate carries over the result of the previous frame and the power spectrum is not updated; H_{0}^{k} denotes a segment in which speech is absent: the frame consists mostly of noise, so the noise estimate is updated.

Specifically, the SNR estimation process comprises:

1) Calculate the smoothed noisy power spectral density

The smoothed noisy power spectral density is calculated as:

S (λ, k) = α_{s} S (λ - 1, k) + (1 - α_{s}) S_{f} (λ, k)

where

S_{f} (λ, k) = Σ_{i = - L_{w}}^{L_{w}} w (i) {| Y (λ, k - i) |}^{2}

Here λ is the frame index, k is the frequency bin, α_{s} is the smoothing factor, L_{w} is the frame length (the sum spans 2 L_{w} + 1 frequency bins), w (i) is a Hamming window, {| Y (λ, k - i) |}^{2} is the noisy speech power spectral density, and S_{f} (λ, k) is the noisy power spectral density smoothed in the frequency domain.
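The two smoothing formulas of step 1) can be sketched in plain Python as below. Several details are assumptions not stated in the source: the Hamming window is normalized to sum to 1, bins beyond the spectrum edges are handled by repeating the edge bin, and the default value of `alpha_s` is illustrative. The input `power` is assumed to hold |Y(λ, k)|² for one frame.

```python
import math
from typing import List


def hamming(n: int) -> List[float]:
    """Hamming window of length n, normalized to unit sum (assumption)."""
    if n == 1:
        return [1.0]
    w = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]
    total = sum(w)
    return [x / total for x in w]


def smooth_in_frequency(power: List[float], half_width: int) -> List[float]:
    """S_f(k) = sum over i in [-Lw, Lw] of w(i) * |Y(k - i)|^2."""
    w = hamming(2 * half_width + 1)
    n = len(power)
    out = []
    for k in range(n):
        acc = 0.0
        for j, i in enumerate(range(-half_width, half_width + 1)):
            idx = min(max(k - i, 0), n - 1)  # repeat edge bins (assumption)
            acc += w[j] * power[idx]
        out.append(acc)
    return out


def smooth_in_time(prev_s: List[float], s_f: List[float],
                   alpha_s: float = 0.8) -> List[float]:
    """S(l, k) = alpha_s * S(l - 1, k) + (1 - alpha_s) * S_f(l, k)."""
    return [alpha_s * p + (1.0 - alpha_s) * f for p, f in zip(prev_s, s_f)]
```

With a unit-sum window, a flat spectrum passes through the frequency smoothing unchanged, which makes the routine easy to sanity-check.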

2) Find the minimum of the noisy power spectral density

The minimum S_{\min} (λ, k) of the noisy power spectral density is found as follows:

if mod(λ, D) = 0
    S_{\min} (λ, k) = \min \{ S_{tmp} (λ - 1, k), S (λ, k) \}
    S_{tmp} (λ, k) = S (λ, k)
else
    S_{\min} (λ, k) = \min \{ S_{\min} (λ - 1, k), S (λ, k) \}
    S_{tmp} (λ, k) = \min \{ S_{tmp} (λ - 1, k), S (λ, k) \}
end

where D is the search window length, mod(λ, D) is λ modulo D, S_{tmp} is the minimum of the power spectrum within the current search window, and S_{\min} is the minimum of the power spectrum, updated continuously across search windows.

The ratio S_{r} (λ, k) of the power of the current noisy speech frame to the current power spectrum minimum can therefore be written as:

S_{r} (λ, k) = \frac{S (λ, k)}{S_{\min} (λ, k)}

where S_{\min} (λ, k) is the minimum of the noisy speech power spectrum at the current frequency bin k.
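The minimum-tracking rule of step 2) and the ratio S_r can be sketched as below. The function names and the small `floor` that guards against division by zero are assumptions; each list holds one value per frequency bin.

```python
from typing import List, Tuple


def track_minimum(frame_index: int, s: List[float],
                  s_min_prev: List[float], s_tmp_prev: List[float],
                  window_len: int) -> Tuple[List[float], List[float]]:
    """Update S_min and S_tmp for one frame, following the if/else rule."""
    if frame_index % window_len == 0:
        # Window boundary: restart the running minimum from this frame.
        s_min = [min(t, x) for t, x in zip(s_tmp_prev, s)]
        s_tmp = list(s)
    else:
        s_min = [min(m, x) for m, x in zip(s_min_prev, s)]
        s_tmp = [min(t, x) for t, x in zip(s_tmp_prev, s)]
    return s_min, s_tmp


def power_ratio(s: List[float], s_min: List[float],
                floor: float = 1e-12) -> List[float]:
    """S_r(l, k) = S(l, k) / S_min(l, k), guarded against zero minima."""
    return [x / max(m, floor) for x, m in zip(s, s_min)]
```

Resetting S_tmp at each window boundary is what lets the tracked minimum rise again after the noise level increases, instead of staying pinned to an old low value.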

3) Calculate the local speech presence probability P (λ, k)

Comparing S_{r} (λ, k) with a preset threshold δ gives the local speech presence probability P (λ, k) at frequency bin k of the current analysis frame:

if S_{r} (λ, k) > δ
    p = 1 (speech present)
else
    p = 0 (speech absent)
end

where p is P (λ, k).

4) Calculate the smoothing factor

The smoothing factor α_{d} (λ, k) is calculated as:

α_{d} (λ, k) = α + (1 - α) P (λ, k)

where α is a fixed constant that reflects how strongly the previous frame's noise estimate influences the current frame's noise estimate within noise segments.

5) Update the noise power spectral density

The noise power spectral density {\hat{σ}}_{d}^{2} (λ, k) is calculated as:

{\hat{σ}}_{d}^{2} (λ, k) = α_{d} (λ, k) \cdot {\hat{σ}}_{d}^{2} (λ - 1, k) + [1 - α_{d} (λ, k)] {| Y (λ, k) |}^{2}

where {\hat{σ}}_{d}^{2} (λ, k) is the noise power spectral density at frequency bin k of the current analysis frame and {| Y (λ, k) |}^{2} is the noisy speech power spectral density.
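Steps 3) to 5) combine into a single per-bin update, sketched below. The function name and the default values of `delta` and `alpha` are illustrative assumptions; the source does not give concrete values.

```python
from typing import List


def update_noise_psd(noise_prev: List[float], noisy_power: List[float],
                     s_r: List[float], delta: float = 5.0,
                     alpha: float = 0.95) -> List[float]:
    """Return the updated noise PSD for one frame, one value per bin."""
    updated = []
    for sigma_prev, y2, ratio in zip(noise_prev, noisy_power, s_r):
        # Step 3: hard speech-presence decision from the ratio S_r.
        p = 1.0 if ratio > delta else 0.0
        # Step 4: smoothing factor; equals 1 when speech is present.
        alpha_d = alpha + (1.0 - alpha) * p
        # Step 5: recursive average; frozen when alpha_d is 1.
        updated.append(alpha_d * sigma_prev + (1.0 - alpha_d) * y2)
    return updated
```

When speech is detected, α_d becomes 1 and the estimate simply keeps the previous frame's value, which matches hypothesis H_1; otherwise the update reduces to the H_0 formula with weight α.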

6) Calculate the SNR estimate

The a posteriori SNR is calculated as:

γ_{k} = Y_{k} / λ_{d} (k)

where λ_{d} (k) is the noise power spectral density, that is, {\hat{σ}}_{d}^{2} (λ, k), and Y_{k} is the noisy speech power spectral density, that is, {| Y (λ, k) |}^{2}.

When the SNR of the audio to be verified is compared with the first threshold and the second threshold, it is the a posteriori SNR that is compared.
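The per-bin a posteriori SNR can be sketched as below. The source does not say how the per-bin values are reduced to the single number that is compared with the thresholds; averaging over bins and converting to decibels, as `mean_snr_db` does, is purely an assumption made for illustration, as are the function names and the `floor` guard.

```python
import math
from typing import List


def posterior_snr(noisy_power: List[float], noise_psd: List[float],
                  floor: float = 1e-12) -> List[float]:
    """gamma_k = |Y(l, k)|^2 / noise PSD, one value per frequency bin."""
    return [y / max(n, floor) for y, n in zip(noisy_power, noise_psd)]


def mean_snr_db(gammas: List[float]) -> float:
    """Average the per-bin ratios and express the result in dB (assumption)."""
    mean = sum(gammas) / len(gammas)
    return 10.0 * math.log10(max(mean, 1e-12))
```

Other reductions, such as averaging over frames as well as bins or using a median, would fit the text equally well.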

For the subsequent speech enhancement, the a priori SNR must also be calculated in this step. Specifically, the decision-directed method is used to determine the a priori SNR {\hat{ξ}}_{k} (λ):

{\hat{ξ}}_{k} (λ) = α \frac{{\hat{X}}_{k}^{2} (λ - 1)}{λ_{d} (k, λ - 1)} + (1 - α) \max [γ_{k} (λ) - 1,0]

where 0 < α < 1 is a weighting factor and {\hat{X}}_{k} (λ - 1) is the amplitude estimate of the previous frame.
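The decision-directed formula translates directly into code. In this sketch the function name, the default `alpha` and the `floor` guard are assumptions; the three input lists hold, per frequency bin, the previous frame's amplitude estimate, the previous frame's noise power spectral density, and the current a posteriori SNR.

```python
from typing import List


def a_priori_snr(prev_amplitude: List[float], prev_noise_psd: List[float],
                 gamma: List[float], alpha: float = 0.98,
                 floor: float = 1e-12) -> List[float]:
    """Decision-directed a priori SNR estimate, one value per bin."""
    xi = []
    for x_prev, n_prev, g in zip(prev_amplitude, prev_noise_psd, gamma):
        # First term: SNR implied by the previous frame's clean estimate.
        previous = (x_prev * x_prev) / max(n_prev, floor)
        # Second term: instantaneous estimate, clipped at zero.
        instantaneous = max(g - 1.0, 0.0)
        xi.append(alpha * previous + (1.0 - alpha) * instantaneous)
    return xi
```

For example, with `alpha=0.5`, a previous amplitude of 2, a previous noise density of 1 and a current γ of 3, the estimate is 0.5 · 4 + 0.5 · 2 = 3.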

In one embodiment, the step of performing speech enhancement on the audio to be verified comprises:

removing the noise from the audio to be verified using a speech enhancement algorithm based on a statistical model; and inverse-Fourier-transforming the denoised audio to be verified to the time domain to obtain the enhanced audio to be verified.

Specifically, the noise is removed with the statistical-model-based speech enhancement algorithm as follows:

The cost function of the weighted Euclidean distortion measure (WEDM) short-time spectral amplitude estimator is:

d (X_{k}, {\hat{X}}_{k}) = {(X_{k} - {\hat{X}}_{k})}^{2} / X_{k}

where d is the cost function, X_{k} is the speech before denoising, and {\hat{X}}_{k} is the speech after denoising, that is, the final output of speech enhancement.

Minimizing the Bayes risk under this cost function gives the statistical-model-based WEDM short-time spectral amplitude estimator:

{\hat{X}}_{k} = \frac{\sqrt{v_{k}}}{\sqrt{π} γ_{k}} \frac{\exp (v_{k} / 2)}{I_{0} (v_{k} / 2)} Y_{k}

where v_{k} = ξ_{k} γ_{k} / (1 + ξ_{k}) and I_{0} (·) is the zeroth-order modified Bessel function.
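The estimator formula can be evaluated per frequency bin as sketched below. The names are hypothetical, `noisy_amplitude` is assumed to be the noisy spectral magnitude for the bin, and `gamma` must be positive. The Bessel function is computed with its power series, which is adequate for moderate arguments; for very large v_k the direct `exp` would overflow and an exponentially scaled Bessel routine would be needed instead.

```python
import math


def bessel_i0(x: float, terms: int = 60) -> float:
    """Zeroth-order modified Bessel function via its power series."""
    total = 0.0
    term = 1.0  # m = 0 term of sum((x/2)^(2m) / (m!)^2)
    half_sq = (x / 2.0) ** 2
    for m in range(terms):
        total += term
        term *= half_sq / ((m + 1) ** 2)
    return total


def wedm_amplitude(noisy_amplitude: float, xi: float, gamma: float) -> float:
    """Apply the WEDM gain to one bin; xi and gamma are the two SNRs."""
    v = xi * gamma / (1.0 + xi)
    gain = (math.sqrt(v) / (math.sqrt(math.pi) * gamma)
            * math.exp(v / 2.0) / bessel_i0(v / 2.0))
    return gain * noisy_amplitude
```

As a plausibility check, for large v the ratio exp(v/2) / I_0(v/2) approaches sqrt(π v), so the gain tends to v/γ = ξ / (1 + ξ), the familiar Wiener gain.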

As shown in Fig. 4, in one embodiment a verification system is also provided. The system comprises:

Receiving module 402, configured to receive the audio to be verified.

SNR estimation module 404, configured to perform SNR estimation on the audio to be verified.

Judging module 406, configured to judge whether the SNR of the audio to be verified reaches the first threshold.

Verification module 408, configured to, if the SNR of the audio to be verified reaches the first threshold, detect whether the content of the audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes, otherwise verification fails.

Further, in one embodiment, judging module 406 is also configured to, if the SNR of the audio to be verified does not reach the first threshold, further judge whether it reaches the second threshold. In this embodiment, as shown in Fig. 5, the verification system also comprises speech enhancement module 407 and information return module 409, wherein:

Speech enhancement module 407 is configured to perform speech enhancement on the audio to be verified if its SNR reaches the second threshold.

Information return module 409 is configured to return a prompt to re-enter the audio if the SNR of the audio to be verified does not reach the second threshold.

In this embodiment, verification module 408 is also configured to detect whether the content of the enhanced audio to be verified is consistent with the content of the original audio; if so, verification passes, otherwise verification fails.

In one embodiment, SNR estimation module 404 is also configured to Fourier-transform the audio to be verified to the frequency domain and to perform SNR estimation on the frequency-domain audio using the minima-controlled recursive averaging algorithm.

In one embodiment, speech enhancement module 407 is also configured to remove the noise from the audio to be verified using a speech enhancement algorithm based on a statistical model, and to inverse-Fourier-transform the denoised audio to the time domain to obtain the enhanced audio to be verified.

As shown in Fig. 6, in one embodiment another verification method is also provided. The method is described as applied to various terminals and comprises:

Step 602: receive the delivered original audio.

Step 604: obtain the input audio to be verified.

Step 606: perform SNR estimation on the audio to be verified.

Step 608: judge whether the SNR of the audio to be verified reaches the first threshold; if so, go to step 610, otherwise go to step 612.

Step 610: detect whether the content of the audio to be verified is consistent with the content of the original audio; if so, verification passes, otherwise verification fails.

Step 612: take no action, prompt for re-entry, or judge further.

In this embodiment, the terminal sends a data request to the server. When certain data is requested, the server needs to verify the user: the server can automatically and randomly deliver a fixed piece of audio to the terminal. Each audio has a unique number, which is sent to the terminal along with it. The terminal receives the delivered original audio, the user of the terminal listens to it, and the terminal obtains the audio entered by the user (the audio to be verified) through a device such as a microphone.

Further, the terminal can perform SNR estimation and verification on the audio to be verified directly. The SNR estimation and verification process is as described in the embodiment shown in Fig. 2 and is not repeated here.

In another embodiment, as shown in Fig. 7, a verification method is provided. The process comprises:

Step 702, receive the issued original audio.

Step 704, obtain the input audio to be verified.

Step 706, perform SNR estimation on the audio to be verified.

Step 708, judge whether the signal-to-noise ratio (SNR) of the audio to be verified reaches a first threshold; if so, go to step 710; otherwise, go to step 712.

Step 710, detect whether the content of the audio to be verified is consistent with the content of the original audio; if so, verification passes; otherwise, verification fails.

Step 712, further judge whether the SNR of the audio to be verified reaches a second threshold; if so, go to step 714; otherwise, go to step 718.

Step 714, perform speech enhancement on the audio to be verified.

Step 716, detect whether the content of the speech-enhanced audio to be verified is consistent with the content of the original audio; if so, verification passes; otherwise, verification fails.

Step 718, prompt the user to re-enter the audio.

In this embodiment, the second threshold is lower than the first threshold. By comparing the SNR of the audio to be verified with the first threshold and the second threshold, the input audio can be divided into three classes according to the environment in which it was collected: a strong-noise environment (SNR below the second threshold), a low-noise environment (SNR between the two thresholds) and a quiet environment (SNR at or above the first threshold). Speech enhancement can be applied to audio collected in a low-noise environment to remove the noise in it, which improves the accuracy of speech recognition, further improves the accuracy of verification and reduces misjudgment. For the detailed process, refer to the embodiment shown in Fig. 3; it is not repeated here.
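The two-threshold decision of steps 708 to 718 can be sketched as follows. This is a minimal sketch under stated assumptions: the threshold values of 20 dB and 10 dB are illustrative only (the text fixes no values), and `recognize` and `enhance` are hypothetical callables standing in for the speech recognition and speech enhancement stages.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "passed"   # content matches the original audio
    FAILED = "failed"   # content does not match
    RETRY = "retry"     # too noisy; prompt the user to re-enter the audio


def verify(audio, snr_db, original_content, recognize, enhance,
           first_threshold=20.0, second_threshold=10.0):
    """Apply the two-threshold flow; threshold defaults are assumed values."""
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be lower than first threshold")

    if snr_db >= first_threshold:
        # Quiet environment: check the content directly (step 710).
        candidate = audio
    elif snr_db >= second_threshold:
        # Low-noise environment: enhance first, then check (steps 714, 716).
        candidate = enhance(audio)
    else:
        # Strong-noise environment: ask for new input (step 718).
        return Outcome.RETRY

    if recognize(candidate) == original_content:
        return Outcome.PASSED
    return Outcome.FAILED
```

Passing the recognition and enhancement stages in as callables keeps the branching logic independent of whichever algorithms are actually used for them.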

As shown in Fig. 8, in one embodiment, a verification device is also provided. The device comprises:

Receiving module 802, configured to receive the issued original audio.

Acquisition module 804, configured to obtain the input audio to be verified.

SNR estimation module 806, configured to perform SNR estimation on the audio to be verified.

Judging module 808, configured to judge whether the signal-to-noise ratio (SNR) of the audio to be verified reaches a first threshold.

Verification module 810, configured to, if the SNR of the audio to be verified reaches the first threshold, detect whether the content of the audio to be verified is consistent with the content of the original audio; if so, verification passes; otherwise, verification fails.

Further, in one embodiment, judging module 808 is also configured to, if the SNR of the audio to be verified does not reach the first threshold, further judge whether the SNR of the audio to be verified reaches a second threshold. As shown in Fig. 9, the verification device also comprises:

Speech enhancement module 809, configured to perform speech enhancement on the audio to be verified if its SNR reaches the second threshold.

Information prompting module 811, configured to prompt the user to re-enter the audio if the SNR of the audio to be verified does not reach the second threshold.

In this embodiment, verification module 810 is also configured to detect whether the content of the speech-enhanced audio to be verified is consistent with the content of the original audio; if so, verification passes; otherwise, verification fails.

In one embodiment, Fig. 10 shows a schematic structural diagram of a server that can run the verification methods provided by the embodiments shown in Fig. 2 and Fig. 3. Server 1000 may vary considerably depending on configuration or performance, and may comprise one or more central processing units (CPUs) 1002 (for example, one or more processors), memory 1003, and one or more storage media 1004 (for example, one or more mass storage devices) storing application programs 1034 or data 1024. Memory 1003 and storage medium 1004 may be transient or persistent storage. The program stored in storage medium 1004 may comprise one or more modules (such as the aforementioned data receiving module 402, SNR estimation module 404, judging module 406 and verification module 408). Further, central processing unit 1002 may be configured to communicate with storage medium 1004 and to execute, on server 1000, the series of instruction operations stored in storage medium 1004. Server 1000 may also comprise one or more power supplies 1005, one or more wired or wireless network interfaces 1006, one or more input/output interfaces 1007, and/or one or more operating systems 1014, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and the like.

The steps described in the embodiments shown in Fig. 2 or Fig. 3 above may be based on the server structure shown in Fig. 10. A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like.

In one embodiment, Fig. 11 shows a schematic structural diagram of a terminal that can run the verification methods provided by the embodiments shown in Fig. 6 and Fig. 7. Terminal 1100 is only one example of a computer environment suitable for the present invention and should not be regarded as imposing any limitation on the scope of use of the present invention.

Terminal 1100 shown in Fig. 11 is one example of a computer system suitable for the present invention. Other architectures with different subsystem configurations may also be used. For example, widely known devices such as desktop computers, notebooks, personal digital assistants, smart phones, tablet computers, portable media players and set-top boxes may be suitable for some embodiments of the present invention, but the invention is not limited to the devices listed above.

As shown in Fig. 11, terminal 1100 comprises processor 1110, memory 1120 and system bus 1122. Various system components, including memory 1120 and processor 1110, are connected to system bus 1122. Processor 1110 is hardware that executes computer program instructions through the basic arithmetic and logical operations of a computer system. Memory 1120 is a physical device for temporarily or permanently storing computer programs or data (for example, program state information). System bus 1122 may be any of several types of bus structure, including a memory bus or memory controller, a peripheral bus and a local bus. Processor 1110 and memory 1120 can exchange data through system bus 1122. Memory 1120 comprises read-only memory (ROM) or flash memory (neither shown in the figure) and random access memory (RAM); RAM typically refers to the main memory into which the operating system and application programs are loaded.

Terminal 1100 also comprises display interface 1130 (for example, a graphics processing unit), display device 1140 (for example, a liquid crystal display), audio interface 1150 (for example, a sound card) and audio device 1160 (for example, a loudspeaker). Audio interface 1150 and audio device 1160 can be used together to play back the original audio received by terminal 1100.

Terminal 1100 generally comprises a storage device 1170. Storage device 1170 may be selected from a variety of computer-readable media. Computer-readable media are any available media that can be accessed by terminal 1100, including both removable and fixed media. For example, computer-readable media include, but are not limited to, flash memory (micro SD card), CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the required information and can be accessed by terminal 1100.

Terminal 1100 also comprises input device 1180 and input interface 1190 (for example, an I/O controller). A user can enter instructions and information into terminal 1100 through input device 1180, such as a keyboard, a mouse, or a touch panel on display device 1140. Input device 1180 is usually connected to system bus 1122 through input interface 1190, but may also be connected through other interfaces or bus structures, such as a universal serial bus (USB).

Terminal 1100 can be logically connected to one or more network devices in a network environment. A network device may be a personal computer, a server, a router, a smart phone, a tablet computer or another common network node. Terminal 1100 is connected to the network device through local area network (LAN) interface 1200 or mobile communication unit 1210.

As described in detail above, terminal 1100 suitable for the present invention can perform the specified operations of the verification method. Terminal 1100 performs these operations by having processor 1110 run software instructions stored in a computer-readable medium. These software instructions may be read into memory 1120 from storage device 1170, or from another device through LAN interface 1200. The software instructions stored in memory 1120 cause processor 1110 to perform the above verification method. In addition, the present invention can equally be implemented by hardware circuits or by hardware circuits combined with software instructions. Therefore, implementation of the present invention is not limited to any specific combination of hardware circuits and software.

The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the invention, and these all fall within the scope of protection of the invention. Therefore, the scope of protection of this patent shall be determined by the appended claims.

Claims

1. A verification method, the method comprising:

Receiving audio to be verified;

Performing SNR estimation on the audio to be verified;

2. The method according to claim 1, characterized in that the method further comprises:

If the signal-to-noise ratio (SNR) of the audio to be verified does not reach the first threshold, further judging whether the SNR of the audio to be verified reaches a second threshold; if so, performing speech enhancement on the audio to be verified and detecting whether the content of the speech-enhanced audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes; otherwise, verification fails;

If the SNR of the audio to be verified does not reach the second threshold, returning information prompting re-input.

3. The method according to claim 2, characterized in that the step of performing SNR estimation on the audio to be verified comprises:

Transforming the audio to be verified to the frequency domain by Fourier transform;

Performing SNR estimation on the audio to be verified in the frequency domain using a minima-controlled recursive averaging algorithm.

4. The method according to claim 3, characterized in that the step of performing speech enhancement on the audio to be verified comprises:

Removing the noise in the audio to be verified using a speech enhancement algorithm based on a statistical model;

Transforming the noise-removed audio to be verified to the time domain by inverse Fourier transform to obtain the speech-enhanced audio to be verified.

5. A verification method, the method comprising:

Receiving issued original audio;

Obtaining input audio to be verified;

Performing SNR estimation on the audio to be verified;

6. The method according to claim 5, characterized in that the method further comprises:

If the signal-to-noise ratio (SNR) of the audio to be verified does not reach the first threshold, further judging whether the SNR of the audio to be verified reaches a second threshold; if so, performing speech enhancement on the audio to be verified and detecting whether the content of the speech-enhanced audio to be verified is consistent with the content of the original audio; if so, verification passes; otherwise, verification fails;

If the SNR of the audio to be verified does not reach the second threshold, prompting re-input of the audio.

7. A verification system, characterized in that the system comprises:

A receiving module, configured to receive audio to be verified;

8. The system according to claim 7, characterized in that the judging module is further configured to, if the signal-to-noise ratio (SNR) of the audio to be verified does not reach the first threshold, further judge whether the SNR of the audio to be verified reaches a second threshold;

The system further comprises:

A speech enhancement module, configured to perform speech enhancement on the audio to be verified if the SNR of the audio to be verified reaches the second threshold;

An information returning module, configured to return information prompting re-input if the SNR of the audio to be verified does not reach the second threshold;

The verification module is further configured to detect whether the content of the speech-enhanced audio to be verified is consistent with the content of the corresponding original audio; if so, verification passes; otherwise, verification fails.

9. The system according to claim 8, characterized in that the SNR estimation module is configured to transform the audio to be verified to the frequency domain by Fourier transform, and to perform SNR estimation on the audio to be verified in the frequency domain using a minima-controlled recursive averaging algorithm.

10. The system according to claim 9, characterized in that the speech enhancement module is configured to remove the noise in the audio to be verified using a speech enhancement algorithm based on a statistical model, and to transform the noise-removed audio to be verified to the time domain by inverse Fourier transform to obtain the speech-enhanced audio to be verified.

11. A verification device, characterized in that the device comprises:

A receiving module, configured to receive issued original audio;

An acquisition module, configured to obtain input audio to be verified;

12. The device according to claim 11, characterized in that the judging module is further configured to, if the signal-to-noise ratio (SNR) of the audio to be verified does not reach the first threshold, further judge whether the SNR of the audio to be verified reaches a second threshold;

The device further comprises:

An information prompting module, configured to prompt re-input of the audio if the SNR of the audio to be verified does not reach the second threshold;

The verification module is further configured to detect whether the content of the speech-enhanced audio to be verified is consistent with the content of the original audio; if so, verification passes; otherwise, verification fails.