RU98100221A

RU98100221A - SPEAKER VERIFICATION SYSTEM

Info

Publication number: RU98100221A
Application number: RU98100221/09A
Authority: RU
Inventors: Richard J. Mammone; Kevin Farrell; Manish Sharma; Naik Devang; Xiaoyu Zhang; Khaled Assaleh; Han-Sheng Liou
Original assignee: Rutgers University
Priority date: 1995-06-07
Filing date: 1996-06-06
Publication date: 2000-01-10

Claims

1. A method of speaker verification, in which at least one feature is extracted from a first speech fragment spoken by a speaker; said at least one feature is classified using a plurality of classifiers to form a plurality of classification results; said plurality of classification results are combined to form combined classification results; said combined classification results are recognized by determining the similarity between said combined classification results and a second speech fragment spoken by said speaker before the speaker verification; and, based on said recognized combined classification results, a decision is made to accept or reject said speaker.

2. The method according to claim 1, characterized in that a confidence measure is further determined on the basis of said recognized combined classification results.

3. The method according to claim 2, characterized in that, in addition to classifying said at least one feature, words are recognized in said first speech fragment spoken by said speaker by comparing said at least one feature with data corresponding to said speaker and stored before the speaker verification, in order to preliminarily accept or preliminarily reject said speaker; the operation of classifying said at least one feature is carried out if it is decided to preliminarily accept said speaker, and a re-request module is activated if it is decided to preliminarily reject said speaker.

4. The method according to claim 3, characterized in that said first speech fragment contains at least one password for said speaker.

5. The method according to claim 4, characterized in that said data comprise a speaker-dependent template formed from a first speech fragment spoken in advance by said speaker and a speaker-independent template formed from a first speech fragment spoken in advance by at least one second speaker.

6. The method according to claim 1, characterized in that the classification is performed using a neural tree network classifier and a dynamic time warping classifier.

7. The method according to claim 1, characterized in that the classification is performed using a modified neural tree network classifier and a dynamic time warping classifier.

8. The method according to claim 1, characterized in that, in said recognition, a plurality of first speech fragments of said speaker is applied to a pair of said plurality of classifiers, one of these fragments being left out, forming a left-out fragment, in order to train said classifiers; said left-out fragment is applied to said pair of classifiers to test said classifiers independently; a first probability is calculated for the first classifier of said pair and a second probability is calculated for the second classifier of said pair; and a first threshold is determined for the first classifier of said pair on the basis of said first probability and a second threshold is determined for the second classifier of said pair on the basis of said second probability, wherein said similarity of the plurality of classification results is determined by comparing the first classifier of said pair with said first threshold and the second classifier of said pair with said second threshold.

9. The method according to claim 1, characterized in that the extraction is performed by modifying poles in a pole filter of said first and second speech fragments in order to extract said at least one feature.

10. The method according to claim 1, characterized in that said at least one feature of said first speech fragment is further segmented into a plurality of first subwords after said extraction operation.

11. The method according to claim 10, characterized in that said subwords are phonemes.

12. The method according to claim 1, characterized in that said at least one feature is corrected using the affine transformation
y = Ax + b,
where y is said affine transformation of the vector x, A is a matrix corresponding to a linear transformation, and b is a vector corresponding to a translation.

13. A speaker verification system, comprising means for extracting at least one feature from a first speech fragment spoken by a speaker; means for classifying said at least one feature using a plurality of classifiers to generate a plurality of classification results; means for combining said plurality of classification results to generate combined classification results; means for recognizing said combined classification results by determining the similarity between said combined classification results and a second speech fragment spoken by said speaker before the speaker verification; and means for deciding, on the basis of said recognized combined classification results, whether to accept or reject said speaker.

14. The system according to claim 13, characterized in that it further comprises means for recognizing words in said first speech fragment spoken by said speaker by comparing said at least one feature with data relating to said speaker and stored before the speaker verification, in order to determine whether to preliminarily accept or preliminarily reject said speaker, and means for activating said means for classifying said at least one feature if it is decided to preliminarily accept said speaker, or for activating a re-request module if it is decided to preliminarily reject said speaker.

15. The system according to claim 14, characterized in that said data comprise a speaker-dependent template formed from a first speech fragment spoken in advance by said speaker and a speaker-independent template formed from a first speech fragment spoken in advance by at least one second speaker.

16. The system according to claim 15, characterized in that said classifying means comprises a modified neural tree network classifier and a dynamic time warping classifier.

17. The system according to claim 16, characterized in that said extracting means is implemented by constraining poles in an all-pole filter.

18. The system according to claim 17, characterized in that said at least one feature is a cepstral coefficient that is corrected using an affine transformation.

19. The method according to claim 10, characterized in that said poles are modified by determining a spectral component of said at least one feature and constraining a narrow bandwidth to obtain a channel estimate.

20. The method according to claim 19, characterized in that said first speech fragment and said second speech fragment are further deconvolved using said channel estimate to obtain a normalized speech fragment, and spectral features of said normalized speech fragment are computed to obtain feature vectors of the normalized speech fragment, which are used in said classification.

21. The method according to claim 19, characterized in that said channel estimate is further converted into cepstral coefficients to obtain a modified channel estimate in the cepstral domain, and said modified channel estimate is subtracted from the cepstral frames of said first speech fragment and said second speech fragment.

22. The method according to claim 12, characterized in that said at least one feature is cepstral coefficients, which are corrected using the affine transformation.

23. The method according to claim 7, characterized in that at least one feature is further extracted from a second speech fragment spoken by other speakers; a first label is assigned to said at least one feature from the first speech fragment spoken by said speaker; a second label is assigned to said at least one feature from the second speech fragment spoken by the other speakers; and said classifiers are trained on the first and second labels.

24. The method according to claim 10, characterized in that at least one feature is further extracted from a second speech fragment spoken by other speakers; said at least one feature of said second speech fragment is segmented into a plurality of second subwords after said extraction; said first plurality of subwords and said second plurality of subwords are stored in a subword database; first labels for said speaker are determined from the stored first subwords and second labels for the other speakers are determined from the second subwords; and said classifiers are trained on the first and second labels.
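
Illustrative note to claims 6, 7 and 16. One of the recited classifiers is based on dynamic time warping. The claims do not specify a local distance, step pattern or normalization, so the following is only a minimal sketch of the textbook algorithm, assuming Euclidean distance between feature vectors and the three standard step directions. The function name `dtw_distance` is hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Cumulative dynamic-time-warping cost between two sequences of feature vectors.

    Assumptions (not taken from the claims): Euclidean local distance,
    steps (i-1, j), (i, j-1), (i-1, j-1), no path-length normalization.
    """
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = best cumulative cost aligning the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A lower cost means the test fragment aligns more closely with the stored reference; how such a cost would be turned into a classification result is left open here.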
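
Illustrative note to claim 8. The claim recites leaving out one fragment for training, testing on the left-out fragment, and deriving a threshold per classifier from the resulting probability. It does not say how the threshold is computed from the probability, so the sketch below makes a loud assumption: the threshold is the lowest held-out score minus an optional margin. The helpers `train_mean` and `score_closeness` are toy stand-ins for a real classifier and are entirely hypothetical.

```python
def leave_one_out_scores(utterances, train, score):
    """Train on all utterances but one, score the held-out one, and repeat."""
    scores = []
    for i, held_out in enumerate(utterances):
        model = train(utterances[:i] + utterances[i + 1:])
        scores.append(score(model, held_out))
    return scores


def threshold_from_scores(scores, margin=0.0):
    """ASSUMPTION: threshold = lowest held-out score minus a margin."""
    return min(scores) - margin


# Toy stand-ins for a classifier (hypothetical, scalar "utterances").
def train_mean(utterances):
    return sum(utterances) / len(utterances)


def score_closeness(model, utterance):
    # Score in (0, 1]; equals 1.0 when the utterance matches the model exactly.
    return 1.0 / (1.0 + abs(utterance - model))


enrollment = [1.0, 2.0, 3.0]
held_out_scores = leave_one_out_scores(enrollment, train_mean, score_closeness)
threshold = threshold_from_scores(held_out_scores)
```

For the pair of classifiers in the claim, the same routine would be run once per classifier, giving the first and second thresholds separately.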
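
Illustrative note to claims 12, 18 and 22. The affine transformation y = Ax + b can be applied to a feature vector as sketched below. How A and b are estimated is not stated in the claims and is not shown; the function name `affine_transform` and the sample values are hypothetical.

```python
def affine_transform(A, x, b):
    """Return y = A x + b, with A given as a list of rows."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have the same length as x")
    if len(A) != len(b):
        raise ValueError("b must have one entry per row of A")
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


# Example: scale the two components of a feature vector, then translate.
y = affine_transform([[2.0, 0.0], [0.0, 3.0]], [1.0, 1.0], [1.0, -1.0])
```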
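
Illustrative note to claim 21. A channel that is convolved with the speech signal becomes an additive term in the cepstral domain, which is why the claim can remove it by subtraction. The sketch below shows only that subtraction step, assuming the channel estimate has already been converted to a cepstral vector of the same length as each frame. The function name `subtract_channel` is hypothetical, and obtaining the channel estimate itself (claim 19) is not reproduced.

```python
def subtract_channel(cepstral_frames, channel_cepstrum):
    """Subtract a cepstral-domain channel estimate from every cepstral frame."""
    for frame in cepstral_frames:
        if len(frame) != len(channel_cepstrum):
            raise ValueError("frame and channel estimate must have equal length")
    return [
        [c - h for c, h in zip(frame, channel_cepstrum)]
        for frame in cepstral_frames
    ]
```

The same channel vector is subtracted from the frames of both the first and the second speech fragment, so that both are compared on a normalized basis.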