CA2643930A1 - Method and apparatus for building grammars with lexical semantic clustering in a speech recognizer

Info

Publication number: CA2643930A1
Application number: CA002643930A
Authority: CA
Inventors: Kenneth Todd Reed
Original assignee: Call Genie Inc.; Kenneth Todd Reed
Current assignee: Call Genie Inc
Priority date: 2006-04-17
Filing date: 2007-04-17
Publication date: 2007-10-25
Also published as: WO2007118324A1; EP2008268A1; EP2008268A4

Abstract

A method and system for building a grammar module for a speech application. The method includes the step of clustering phrases having a semantic similarity. The grammar module comprises phrases in a machine-readable format and semantic concepts associated with the phrases. According to another aspect, the grammar module includes embedded semantic interpretations associated with the semantic concepts.

Claims

1. A method for creating a grammar module for a speech application, said method comprising the steps of:

collecting phrases associated with one or more voice responses;
transcribing said collected phrases into a machine-readable format;

clustering selected ones of said collected phrases into one or more semantic concepts, and wherein said selected collected phrases in each of said semantic concepts have a related meaning;

building a grammar module based on said collected phrases and said semantic concepts.

2. The method as claimed in claim 1, wherein said step of clustering comprises the step of identifying one or more words in each of said collected phrases and associating said collected phrases with a semantic concept when one or more of said words have a meaning which is similar or the same.

3. The method as claimed in claim 2, wherein said step of identifying one or more words comprises generating a vector for said collected phrase, said vector having an element for each of a plurality of words in said collected phrase, and comparing the vector for said collected phrase to a vector for one of said semantic concepts, and associating said collected phrase with said semantic concept if said vector has a number of elements exceeding a predefined threshold.
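
The vector comparison recited in claim 3 can be illustrated with a minimal sketch. It assumes a plain bag-of-words vector and reads "a number of elements exceeding a predefined threshold" as the count of words shared between the phrase vector and the concept vector; the function names, sample phrases and threshold value are hypothetical and do not come from the patent.

```python
from collections import Counter

def phrase_vector(phrase: str) -> Counter:
    """Build a vector with one element per distinct word (value = word count)."""
    return Counter(phrase.lower().split())

def shared_elements(phrase_vec: Counter, concept_vec: Counter) -> int:
    """Count the vector elements (words) present in both vectors."""
    return len(phrase_vec.keys() & concept_vec.keys())

def matches_concept(phrase: str, concept_vec: Counter, threshold: int) -> bool:
    """Assumption: a phrase joins a concept when the shared-word count exceeds the threshold."""
    return shared_elements(phrase_vector(phrase), concept_vec) > threshold

# Hypothetical semantic concept built from a seed phrase.
concept = phrase_vector("pizza restaurant delivery")

print(matches_concept("pizza delivery near me", concept, threshold=1))  # True
print(matches_concept("flower shop", concept, threshold=1))             # False
```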

4. The method as claimed in claim 3, wherein said step of building a grammar module comprises converting a plurality of grammar elements into a machine-readable format and converting said semantic concepts into a machine-readable format, and storing said machine-readable grammar elements and semantic concepts in a computer file.

5. The method as claimed in claim 3, wherein one or more of said vector elements includes an indicator, said indicator providing information about said associated vector element.

6. The method as claimed in claim 5, wherein said indicator comprises a content indicator providing a probability indicator for the occurrence of a word.

7. The method as claimed in claim 5, wherein said indicator comprises a word sense indicator providing an intended meaning for a word.

8. The method as claimed in claim 3, further including the step of inserting one or more synonymous terms for one or more words in said collected phrases wherein said one or more words have a synonymous term, and said vector including a corresponding element for at least some of said synonymous terms.

9. The method as claimed in claim 3, further including the step of inserting one or more hypernyms into said vector, and said one or more hypernyms having a weighting.
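
The synonym and hypernym insertion of claims 8 and 9 might look like the following sketch. The two lookup tables are a tiny hand-written stand-in for a real lexical database, and the hypernym weight of 0.5 is an arbitrary illustrative value; none of these names or numbers are taken from the patent.

```python
from typing import Dict

# Hypothetical lexicon; a real system would consult a lexical database.
SYNONYMS = {"restaurant": ["eatery"], "taxi": ["cab"]}
HYPERNYMS = {"pizza": ["food"], "sushi": ["food"], "taxi": ["vehicle"]}
HYPERNYM_WEIGHT = 0.5  # assumed weighting: hypernyms count less than literal words

def expanded_vector(phrase: str) -> Dict[str, float]:
    """Build a weighted word vector, adding synonym and hypernym elements."""
    vec: Dict[str, float] = {}
    for word in phrase.lower().split():
        vec[word] = vec.get(word, 0.0) + 1.0
        for syn in SYNONYMS.get(word, []):
            # Synonyms get full weight (an assumption of this sketch).
            vec[syn] = vec.get(syn, 0.0) + 1.0
        for hyper in HYPERNYMS.get(word, []):
            # Hypernyms are more general, so they get a reduced weight.
            vec[hyper] = vec.get(hyper, 0.0) + HYPERNYM_WEIGHT
    return vec

print(expanded_vector("pizza restaurant"))
# {'pizza': 1.0, 'food': 0.5, 'restaurant': 1.0, 'eatery': 1.0}
```

With this expansion, "pizza restaurant" and "sushi eatery" share the elements `food` and `eatery` even though they have no literal word in common.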

10. A system for building a grammar module for a speech application, said system comprising:

means for collecting phrases associated with one or more voice responses;

means for transcribing said collected phrases into a machine-readable format;
means for clustering selected ones of said collected phrases into a plurality of semantic concepts, wherein each of said semantic concepts comprises one or more collected phrases having a similar meaning;

means for creating a grammar module based on said collected phrases and said semantic concepts.

11. The system as claimed in claim 10, wherein said means for clustering includes means for characterizing each of said selected collected phrases as a vector, said vector having one or more elements corresponding to one or more words comprising said collected phrase, and each of said semantic concepts including one or more vectors having an element for each of a plurality of words associated with said semantic concept.

12. The system as claimed in claim 11, further including means for comparing each of said collected phrase vectors to one or more of said semantic concept vectors based on a similarity measure, and means for grouping one or more of said collected phrases when said similarity measure exceeds a predetermined threshold.

13. The system as claimed in claim 12, further including means for inserting one or more synonymous terms for one or more words in said collected phrases wherein said one or more words have a synonymous term, and said vector including a corresponding element for at least some of said synonymous terms.

14. The system as claimed in claim 12, further including means for inserting one or more hypernyms into said vector, and said one or more hypernyms each having an associated weighting.

15. A method for creating a grammar module suitable for use with a speech application, said method comprising the steps of:

collecting phrases associated with one or more voice responses;
transcribing said collected phrases into a machine-readable format;

grouping one or more of said collected phrases into a plurality of groups, wherein each of said groups has an associated semantic concept, said one or more collected phrases being grouped based on a similarity between said collected phrase and the associated semantic concept for said group; and

building a grammar module based on said collected phrases and said semantic concepts.

16. The method as claimed in claim 15, wherein said step of grouping comprises determining a similarity between said collected phrase and the associated semantic concept for said group, and comparing said similarity to a predefined threshold, and adding said collected phrase to the group associated with said semantic concept if said predefined threshold is satisfied.

17. The method as claimed in claim 16, further including the step of utilizing said collected phrase not satisfying said predefined threshold for a new semantic concept.

18. The method as claimed in claim 17, wherein said semantic concepts comprise a plurality of semantically equivalent words or phrases.

19. The method as claimed in claim 16, wherein said similarity is determined according to a similarity function.
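
The grouping loop described in claims 16, 17 and 19 can be sketched as a single-pass clustering. This example assumes cosine similarity as the similarity function and treats the threshold as satisfied when the similarity is greater than or equal to it; the claims fix neither choice, and the names and sample phrases are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (assumed similarity function)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def group_phrases(phrases: List[str], threshold: float) -> List[Dict]:
    """Add each phrase to its most similar group, or start a new semantic concept."""
    groups: List[Dict] = []  # each group: {"vector": Counter, "phrases": [str, ...]}
    for phrase in phrases:
        vec = Counter(phrase.lower().split())
        best, best_sim = None, 0.0
        for group in groups:
            sim = cosine(vec, group["vector"])
            if sim > best_sim:
                best, best_sim = group, sim
        if best is not None and best_sim >= threshold:
            best["phrases"].append(phrase)
            best["vector"].update(vec)  # the concept vector accumulates member words
        else:
            # Threshold not satisfied: the phrase seeds a new semantic concept.
            groups.append({"vector": vec, "phrases": [phrase]})
    return groups

groups = group_phrases(
    ["pizza place", "pizza delivery place", "taxi cab", "a taxi"], threshold=0.4
)
for g in groups:
    print(g["phrases"])
# ['pizza place', 'pizza delivery place']
# ['taxi cab', 'a taxi']
```

Because the pass is greedy, the result depends on phrase order; a production system might re-cluster or merge groups afterwards.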

20. A method for generating a grammar module for a speech application, said method comprising the steps of:

collecting one or more phrases associated with one or more voice responses;
transcribing said collected phrases into a machine-readable format;

clustering selected ones of said collected phrases into one or more semantic concepts, and wherein said selected collected phrases in each of said semantic concepts have a similar meaning;

interpreting at least some of said semantic concepts;

building a grammar module based on said collected phrases, said semantic concepts and said interpreted semantic concepts.

21. The method as claimed in claim 20, wherein said step of building a grammar module comprises creating a machine-readable grammar file.

22. The method as claimed in claim 21, further including converting said interpreted semantic concepts into a machine-readable format and embedding said interpreted semantic concepts in said machine-readable grammar file.
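
One way to picture a machine-readable grammar file with embedded interpretations is the sketch below. The claims do not name a file format; this example assumes W3C SRGS XML with semantic-interpretation tags, and the function name, concept labels and phrases are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SRGS_NS = "http://www.w3.org/2001/06/grammar"
XML_NS = "http://www.w3.org/XML/1998/namespace"

def build_grammar(concepts: Dict[str, List[str]]) -> str:
    """Serialize {interpretation: [phrases]} as an SRGS-style XML grammar string."""
    ET.register_namespace("", SRGS_NS)  # emit SRGS as the default namespace
    grammar = ET.Element(f"{{{SRGS_NS}}}grammar", {
        "version": "1.0",
        "root": "main",
        "tag-format": "semantics/1.0",
        f"{{{XML_NS}}}lang": "en-US",
    })
    rule = ET.SubElement(grammar, f"{{{SRGS_NS}}}rule", {"id": "main", "scope": "public"})
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for interpretation, phrases in concepts.items():
        for phrase in phrases:
            item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
            item.text = phrase
            # Embed the interpretation next to the phrase it belongs to.
            tag = ET.SubElement(item, f"{{{SRGS_NS}}}tag")
            tag.text = f'out = "{interpretation}";'
    return ET.tostring(grammar, encoding="unicode")

xml_text = build_grammar({"PIZZA": ["pizza place", "pizzeria"], "TAXI": ["taxi cab"]})
print(xml_text)
```

Every phrase in a cluster carries the same tag value, so a recognizer that matches any variant would return the shared concept label to the application.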

23. The method as claimed in claim 20, wherein said step of interpreting each of said semantic concepts comprises converting said interpreted semantic concepts into a machine-readable format.