US20100094630A1 - Associating source information with phonetic indices - Google Patents

Associating source information with phonetic indices

Info

Publication number
US20100094630A1
US20100094630A1 (application US12/249,451; granted as US8301447B2)
Authority
US
United States
Prior art keywords
phonemes
phonetic
source
speech
sources
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/249,451
Other versions
US8301447B2
Inventor
John H. Yoakum
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Arlington Technologies LLC
Avaya Management LP
Original Assignee
Nortel Networks Ltd
Application filed by Nortel Networks Ltd
Priority to US12/249,451 (granted as US8301447B2)
Priority to PCT/IB2009/007074 (published as WO2010041131A1)
Publication of US20100094630A1
Publication of US8301447B2
Application granted
Status: Active
Adjusted expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L 15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/61 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/683 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F 16/685 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics

Definitions

  • the present invention relates to phonetic searching, and in particular to associating source information with phonetic indices.
  • Text-based documents lend themselves well to electronic searching because the content is easily characterized, understood, and searched. In short, the words of a document are well defined and easily searched.
  • speech-based media such as speech recordings, dictation, telephone calls, multi-party conference calls, music, and the like have traditionally been more difficult to analyze from a content perspective than text-based documents. Most speech-based media is characterized in general and organized and searched accordingly.
  • the specific speech content is generally not known with any specificity, unless human or automated transcription is employed to provide an associated text-based document. Human transcription has proven time-consuming and expensive.
  • phonemes are the smallest units of human speech, and most languages only have 30 to 40 phonemes. From this relatively small group of phonemes, all speech can be accurately defined.
  • the series of phonemes created by this parsing process is readily searchable and referred to in general as a phonetic index of the speech.
  • To search for the occurrence of a given term in the speech, the term is first transformed into its phonetic equivalent, which is provided in the form of a string of phonemes.
  • the phonetic index is processed to identify whether the string of phonemes occurs within the phonetic index. If the string of phonemes for the search term occurs in the phonetic index, then the term occurs in the speech.
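  • As a rough illustration of this matching step (a sketch of my own, not code from the patent; the pronunciation table and index contents are invented for the example), the search can be pictured as a simple substring scan over the phoneme sequence:

```python
# Minimal sketch of phonetic-index matching. The pronunciation lookup stands in
# for a real grapheme-to-phoneme converter, and the index below is a made-up
# fragment covering "... the Washington office ...".
PRONUNCIATIONS = {
    "washington": ["w", "a", "sh", "ih", "ng", "t", "ah", "n"],
}

def term_to_phonemes(term):
    """Transform a search term into its phonetic equivalent (a phoneme string)."""
    return PRONUNCIATIONS[term.lower()]

def find_phoneme_string(index, query):
    """Return every position in the phonetic index where the query string occurs."""
    return [i for i in range(len(index) - len(query) + 1)
            if index[i:i + len(query)] == query]

phonetic_index = ["dh", "ah", "w", "a", "sh", "ih", "ng", "t", "ah", "n",
                  "ao", "f", "ah", "s"]
print(find_phoneme_string(phonetic_index, term_to_phonemes("Washington")))  # [2]
```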
  • phonetic-based speech processing and searching techniques tend to be less complicated and more accurate than the traditional word-based speech recognition techniques.
  • use of phonemes mitigates the impact of dialects, slang, and other language variations that make identifying a specific word difficult, but have much less impact on each individual phoneme that makes up the same word.
  • One drawback of phonetic-based speech processing is the ability to distinguish between speakers in multi-party speech, such as that found in telephone or conference calls. Although a particular term may be identified, there is no efficient and automated way to identify the speaker who uttered the term. The ability to associate portions of speech with the respective speakers in multi-party speech would add another dimension in the ability to process and analyze multi-party speech. As such, there is a need for an efficient and effective technique to identify and associate the source of speech in multi-party speech with the corresponding phonemes in a phonetic index that is derived from the multi-party speech.
  • the present invention relates to creating a phonetic index of phonemes from an audio segment that includes speech content from multiple sources.
  • the phonemes in the phonetic index are directly or indirectly associated with the corresponding source of the speech from which the phonemes were derived.
  • the phonetic index of speech content from multiple sources may be searched based on phonetic content as well as the corresponding source.
  • the audio segment is processed to identify phonemes for each unit of speech in the audio segment.
  • a phonetic index of the phonemes is generated for the audio segment, wherein each phonetic entry in the phonetic index identifies a phoneme that is associated with a corresponding unit of speech in the audio segment.
  • each of the multiple sources is associated with corresponding phonetic entries in the phonetic index, wherein a source associated with a given phonetic entry corresponds to the source of the unit of speech from which the phoneme for the given phonetic entry was generated.
  • Various techniques may be employed to associate the phonemes with their sources; however, once associated, the phonetic index may be searched based on phonetic content and source criteria.
  • Such searches may entail searching the phonetic index based on phonetic content criteria to identify a source associated with the phonetic content, searching the phonetic index based on the phonetic content criteria as well as source criteria to identify a matching location in the phonetic index or corresponding audio segment, and the like.
  • the source information that is associated with the phonetic index may be useful as a search criterion or a search result.
  • a phonetic index and any source information directly or indirectly associated therewith may be searched as follows.
  • the content criteria query may include keywords, phoneme strings, or any combination thereof alone or in the form of a Boolean function. If one or more keywords are used, each keyword is broken into its phonetic equivalent, which will provide a string of phonemes. Accordingly, the content criteria either is or is converted into phonetic search criteria comprising one or more strings of phonemes, which may be associated with one or more Boolean operators.
  • the phonetic index and associated source information are then searched based on the phonetic content criteria and the source criteria to identify portions of the phonetic index that match the phonetic search criteria and correspond to the source or sources identified in the source criteria.
  • various actions may be taken in response to identifying those portions of the phonetic index that match the phonetic search criteria and correspond to the source or sources identified by the source criteria. Further, such processing and searching may be provided on existing media files or streaming media that has speech content from one or more parties.
  • FIG. 1 is a flow diagram illustrating an exemplary process for generating a phonetic index and associating source information with the phonetic index according to one embodiment of the present invention.
  • FIG. 2 illustrates the parsing of an audio segment into a sequence of phonemes for a phonetic index according to one embodiment of the present invention.
  • FIGS. 3-10 illustrate different techniques for generating a phonetic index and associating source information with the phonetic index according to different embodiments of the present invention.
  • FIG. 11 is a flow diagram illustrating an exemplary technique for searching a phonetic index according to one embodiment of the present invention.
  • FIGS. 12A and 12B illustrate windows in which content criteria and source criteria may be entered for a search query.
  • FIG. 13 illustrates the translation of a keyword to a string of phonemes for the keyword.
  • FIG. 14 illustrates matching a string of phonemes within the sequence of phonemes of a phonetic index.
  • FIG. 15 illustrates a media environment according to one embodiment of the present invention.
  • FIG. 16 illustrates a conferencing environment according to one embodiment of the present invention.
  • FIGS. 17A and 17B illustrate the basic operation of a conference bridge according to one embodiment of the present invention.
  • FIG. 18 illustrates a block diagram of a conference bridge according to one embodiment of the present invention.
  • FIG. 19 illustrates a service node according to one embodiment of the present invention.
  • the present invention relates to creating a phonetic index of phonemes from an audio segment that includes speech content from multiple sources.
  • the phonemes in the phonetic index are directly or indirectly associated with the corresponding source of the speech from which the phonemes were derived.
  • the phonetic index of speech content from multiple sources may be searched based on phonetic content as well as the corresponding source.
  • In FIG. 1, an overview is provided of an exemplary process for generating a phonetic index for an audio segment that includes speech content from multiple sources, and associating sources with the corresponding phonemes, according to one embodiment of the present invention.
  • an audio segment including speech content from multiple sources is accessed for processing (Step 100 ).
  • the audio segment may be provided in any type of media item, such as a media stream or stored media file that includes speech from two or more known sources.
  • the media item may include graphics, images, or video in addition to the audio segment.
  • the audio segment is then parsed to identify phonemes for each unit of speech in the audio segment (Step 102 ).
  • the result of such processing is a sequence of phonemes, which represents the phonetic content of the audio segment.
  • a phonetic index of the phonemes is generated for the audio segment, wherein each phonetic entry in the phonetic index identifies a phoneme that is associated with a corresponding unit of speech in the audio segment (Step 104 ).
  • each of the multiple sources is associated with corresponding phonetic entries in the phonetic index, wherein a source associated with a given phonetic entry corresponds to the source of the unit of speech from which the phoneme for the given phonetic entry was generated (Step 106 ).
  • Various techniques may be employed to associate the phonemes with their sources; however, once associated, the phonetic index may be searched based on phonetic content and source criteria.
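  • As one way to picture Steps 100 through 106 (an illustrative Python sketch; the class and field names are my own choices, not the patent's), each phonetic entry can be represented as a phoneme plus its time offset and its source:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PhoneticEntry:
    phoneme: str       # phoneme identified for one unit of speech (Steps 102/104)
    start_time: float  # offset of the unit of speech within the audio segment
    source: str        # source associated with the entry (Step 106), e.g. "Source 1"

def build_phonetic_index(units: List[Tuple[str, float, str]]) -> List[PhoneticEntry]:
    """Generate a phonetic index in which every entry carries its source.

    `units` stands in for the output of an upstream phonetic recognizer:
    (phoneme, start_time, source) triples, one per unit of speech.
    """
    return [PhoneticEntry(phoneme, start, source) for phoneme, start, source in units]

# Illustrative fragment: the operator (Source 1) followed by the caller (Source 2).
units = [("h", 0.00, "Source 1"), ("aw", 0.08, "Source 1"),
         ("t", 1.20, "Source 2"), ("r", 1.28, "Source 2")]
index = build_phonetic_index(units)
print(index[2].phoneme, index[2].source)  # t Source 2
```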
  • FIG. 2 illustrates the parsing and association process described above.
  • an audio segment 10 includes interactive speech between two sources.
  • In this example, the first source (Source 1) is an operator, and the second source (Source 2) is a caller.
  • The operator (Source 1) utters the question “how may I help you?” and in response, the caller (Source 2) utters the response “please transfer me to the Washington office for a call.”
  • each basic unit of speech is translated into a phoneme (PH).
  • the sequence of phonemes is used to form a corresponding phonetic index 12 and a source of the phonemes is associated with the phonemes of the phonetic index 12 .
  • the source of the phonemes is provided in or with the phonetic index 12 , which may be associated with the audio segment in a media item or in a separate file or stream, which is associated with the media item containing the audio segment.
  • A portion of the phonetic index 12 is illustrated, and the actual phonemes for the speech segment corresponding to “transfer me to the Washington office” are provided. The phonemes surrounding this speech segment are illustrated generically as PH.
  • the string of phonemes for the speech segment “transfer me to the Washington office” is represented as follows:
  • this phonetic example is an over-simplified representation of a typical phonetic index.
  • A phonetic index may also include or represent characteristics of the acoustic channel, which represents the environment in which the speech was uttered and the transducer through which it was recorded, as well as characteristics of the natural language in which human beings express the speech.
  • Acoustic channel characteristics include frequency response, background noise, and reverberation.
  • Natural language characteristics include accent, dialect, and gender traits.
  • the phonetic index 12 and associated source information may be directly or indirectly associated with each other in a variety of ways. Regardless of the source information, the phonetic index 12 may also be maintained in a variety of ways.
  • the phonetic index 12 may be associated with the corresponding audio segment in a media item that includes both the audio segment and the phonetic index 12 .
  • the phonetic index 12 may be maintained as metadata associated with the audio segment, wherein the phonetic index 12 is preferably, but need not be, synchronized (or time-aligned) with the audio segment. When synchronized, a particular phoneme is matched to a particular location in the audio segment where the unit of speech from which the phoneme was derived resides.
  • the phonetic index 12 may be maintained in a separate file or stream, which may or may not be synchronized with the audio segment, depending on the application.
  • the phonetic index may be associated with a time reference or other synchronization reference with respect to the audio segment or media item containing the audio segment.
  • certain applications will not require the maintenance of an association between the phonetic index 12 and the audio segment from which the phonetic index 12 was derived.
  • the source information may be maintained with the phonetic index 12 , in a separate file or stream, or in the media item containing the audio segment.
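  • One concrete, purely hypothetical way to keep the phonetic index 12 and the source information in a separate file while preserving a synchronization reference to the audio segment is a small time-aligned record per phoneme; the file name, field names, and layout below are assumptions made for illustration:

```python
import json

# Hypothetical separate-file layout: each entry carries a time reference so the
# phonetic index and source information can be synchronized with an audio
# segment stored elsewhere.
index_file = {
    "media_item": "call_1234.wav",   # the media item containing the audio segment
    "time_aligned": True,
    "entries": [
        {"t": 2.41, "phoneme": "w",  "source": "Source 2"},
        {"t": 2.49, "phoneme": "a",  "source": "Source 2"},
        {"t": 2.55, "phoneme": "sh", "source": "Source 2"},
    ],
}

with open("call_1234.phonetic.json", "w") as handle:
    json.dump(index_file, handle, indent=2)
```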
  • FIGS. 3 through 10 illustrate various ways to maintain a phonetic index 12 and associated source information.
  • FIG. 3 illustrates an embodiment wherein the phonetic index 12 includes a sequence of phonemes (PH), which correspond to the units of speech in the audio segment, as well as source information for each phoneme (PH).
  • The source information SX, where X represents a particular source, is generally provided for each phoneme. Accordingly, each position of the phonetic index 12 includes a phoneme for a particular unit of speech and the corresponding source for that particular unit of speech.
  • the phonetic index 12 may be provided in a separate media item, such as a file or stream, which may or may not be time-aligned with the corresponding audio segment.
  • FIG. 4 illustrates an embodiment similar to that illustrated in FIG. 3, with the exception that the source information SX is provided in association with the first phoneme that is associated with a new source. Accordingly, once a particular source SX is identified, the phonemes derived after determining the particular source are associated with that particular source until there is a change to another source.
  • the phonetic index 12 may be provided in a separate media item, such as a file or stream, which may or may not be time-aligned with the corresponding audio segment 10 .
  • FIG. 5 illustrates a phonetic index 12 wherein the source information SX is associated with each phoneme, such as that provided in FIG. 3 .
  • the phonetic index 12 is further associated with a time reference or other synchronization information, which allows a particular phoneme and associated source SX to be correlated to a corresponding location in the audio segment 10 .
  • The time reference allows a unit of speech in the audio segment 10 to be associated with the corresponding phoneme and source in the phonetic index 12, wherein the audio segment 10 and the phonetic index 12 may be provided in separate files or streams.
  • FIG. 6 provides a similar embodiment, wherein the source information provided in the phonetic index 12 identifies the source SX as associated with a particular phoneme, without providing a separate source index position for each corresponding phoneme position, such as that illustrated in FIG. 4 .
  • the embodiment in FIG. 6 is further associated with a time reference or other synchronization information as illustrated in FIG. 5 .
  • the phonetic index 12 is stored along with the audio segment 10 and the source information SX in a media item, such as a media file or stream.
  • the audio segment 10 may be the same as that from which the phonemes were derived or a compressed version thereof. Regardless of the compression or formatting, the content of the audio segment 10 will correspond to the phonetic index 12 that is associated therewith.
  • the phonemes of the phonetic index 12 and the corresponding source information may be provided as metadata associated with the audio segment 10 .
  • the source information will be aligned with the corresponding phonemes; however, the phonemes may or may not be synchronized or time-aligned with the audio segment 10 , depending on the particular application.
  • A media item 14, which includes the phonetic index 12, is separate from a file or stream that includes a separate source index 16.
  • the source index 16 may be provided in a file that is separate from a file or stream that contains both the phonetic index 12 and the corresponding audio segment 10 .
  • Synchronization information, such as a time reference, may be provided in association with the source information entries of the source index 16.
  • The time reference or synchronization information will correspond to a time reference or appropriate synchronization reference that is inherent to the media item 14, or provided therein, to facilitate synchronization of the phonetic index 12 and the source index 16.
  • FIG. 10 illustrates an embodiment wherein separate files or streams are used to provide the phonetic index 12 and the source index 16 , respectively.
  • the files for the respective phonetic index 12 and source index 16 may be separate from a file or stream containing the audio segment.
  • the phonetic index 12 and the source index 16 may include a time reference or other synchronization information to allow the phonemes of the phonetic index 12 to be associated with a particular source. This time reference or synchronization information may also relate to the audio segment 10 in a separate file or stream, in certain embodiments.
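  • The difference between tagging every phoneme with its source (FIGS. 3 and 5) and recording the source only where it changes (FIGS. 4 and 6) can be made concrete with a small conversion sketch; the code below is illustrative and not taken from the patent:

```python
def compress_sources(per_phoneme):
    """Turn a per-phoneme source list (FIG. 3 style) into change markers
    (FIG. 4 style): the source is recorded only at the first phoneme after a
    source change and is implied for the phonemes that follow."""
    markers, previous = [], None
    for position, source in enumerate(per_phoneme):
        if source != previous:
            markers.append((position, source))
            previous = source
    return markers

def expand_sources(markers, length):
    """Reverse operation: recover the per-phoneme source list from the markers."""
    result = []
    bounds = markers + [(length, None)]
    for (start, source), (end, _) in zip(bounds, bounds[1:]):
        result.extend([source] * (end - start))
    return result

per_phoneme = ["S1", "S1", "S1", "S2", "S2", "S1"]
markers = compress_sources(per_phoneme)            # [(0, 'S1'), (3, 'S2'), (5, 'S1')]
assert expand_sources(markers, len(per_phoneme)) == per_phoneme
```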
  • the phonetic index 12 of speech content from multiple sources may be searched based on the phonetic content, corresponding source, or a combination thereof. Such searches may entail searching the phonetic index 12 based on phonetic content criteria to identify a source associated with the phonetic content, searching the phonetic index 12 based on the phonetic content criteria as well as source criteria to identify a matching location in the phonetic index or corresponding audio segment 10 , and the like. Accordingly, the source information that is associated with the phonetic index 12 may be useful as a search criterion or a search result. In one embodiment, a phonetic index 12 and any source information directly or indirectly associated therewith may be searched as follows.
  • Initially, a search query providing content criteria and source criteria is received.
  • The content criteria bear on the desired phonetic content, and the source criteria bear on the desired source.
  • From the search query, the content criteria and the source criteria are obtained (Steps 200 and 202).
  • FIGS. 12A and 12B illustrate application windows 18 in which a user may provide the content criteria and source criteria for the search query.
  • a content field 20 is provided for entering the content criteria and a source field 22 is provided for entering the source criteria.
  • the content criteria query may include keywords, phoneme strings, or any combination thereof alone or in the form of a Boolean function.
  • In this example, the search criterion is a single keyword, “Washington.” If one or more keywords are used, each keyword is broken into its phonetic equivalent, which is generally a corresponding string of phonemes and represents the phonetic search criteria (Step 204). If the keyword is “Washington,” the phonetic search criterion is represented by the phonetic string: w a sh ih ng t ah n, as illustrated in FIG. 13. The user may enter the phonetic equivalent of the keyword instead of the corresponding keyword in the search content field 20, as provided in FIG. 12B. If the phonetic equivalent is provided, the step of generating the phonetic search criterion is not necessary. Notably, a single keyword is used in this example for the sake of clarity, but those skilled in the art will appreciate that more complicated content criteria including multiple keywords and associated Boolean operators may be provided.
  • Accordingly, the content criteria either is or is converted to phonetic search criteria comprising one or more strings of phonemes, which may be associated with one or more Boolean operators.
  • the phonetic index and associated source information are then searched based on the phonetic content criteria and the source criteria (Step 206 ) and portions of the phonetic index that match the phonetic search criteria and correspond to the source or sources of source criteria are identified (Step 208 ), as depicted in FIG. 14 .
  • The highlighted section of the phonetic index 12 represents the matching string of phonemes (w a sh ih ng t ah n) for the keyword “Washington.”
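  • A minimal sketch of Steps 204 through 208 follows (illustrative Python; the keyword table, index contents, and function names are assumptions rather than the patent's implementation). It converts the keyword to its phonetic equivalent and keeps only the matches whose phonemes are attributed to the source named in the source criteria:

```python
KEYWORD_PHONEMES = {"washington": ["w", "a", "sh", "ih", "ng", "t", "ah", "n"]}

def search(index, keyword, source_criteria):
    """index: list of (phoneme, source) pairs; return the start positions of
    matches that were uttered by the source given in the source criteria."""
    query = KEYWORD_PHONEMES[keyword.lower()]        # Step 204
    hits = []
    for i in range(len(index) - len(query) + 1):     # Steps 206/208
        window = index[i:i + len(query)]
        if ([ph for ph, _ in window] == query
                and all(src == source_criteria for _, src in window)):
            hits.append(i)
    return hits

# Fragment of a source-tagged phonetic index for "... to the Washington ..."
index = [("t", "Source 2"), ("uw", "Source 2"), ("dh", "Source 2"), ("ah", "Source 2"),
         ("w", "Source 2"), ("a", "Source 2"), ("sh", "Source 2"), ("ih", "Source 2"),
         ("ng", "Source 2"), ("t", "Source 2"), ("ah", "Source 2"), ("n", "Source 2")]
print(search(index, "Washington", "Source 2"))  # [4]
```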
  • various actions may be taken in response to identifying those portions of the phonetic index that match the phonetic search criteria and correspond to the source or sources identified by the source criteria (Step 210 ). Further, such processing and searching may be provided on existing media files or streaming media that has speech content from one or more parties.
  • the actions taken may range from merely indicating that a match was found to identifying the locations in the phonetic index 12 or audio segment 10 wherein the matches were found.
  • The text surrounding the location of a phonetic match may be provided in a textual format, wherein the phonetic index 12 or other source is used to provide all or a portion of a transcript associated with the phonetic match.
  • the portions of the audio segment that correspond to the phonetic match may be played, queued, or otherwise annotated or identified.
  • multi-party telephone conversations may be monitored based on keywords alone, and when certain keywords are uttered, alerts are generated to indicate when the keywords were uttered and the party who uttered them.
  • multi-party telephone conversations may be monitored based on keywords as well as source criteria, such that when certain keywords are uttered by an identified party or parties, alerts are generated to indicate when the keywords were uttered by the identified party or parties.
  • the alerts may identify each time a keyword is uttered and identify the party uttering the keyword at any given time, wherein utterances of the keyword by parties that are not identified in the search criteria will not generate an alert.
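  • An alert rule of this kind might be expressed as in the following sketch (illustrative only; the Hit record and function names are invented here), where an alert is raised only when a keyword hit is attributed to one of the watched parties:

```python
from dataclasses import dataclass

@dataclass
class Hit:
    keyword: str   # keyword whose phoneme string matched the phonetic index
    source: str    # party to whom the matching phonemes are attributed
    time: float    # position of the match within the call

def alerts_for(hits, keyword, watched_sources):
    """Keep only the hits for `keyword` uttered by one of the watched parties."""
    return [h for h in hits if h.keyword == keyword and h.source in watched_sources]

hits = [Hit("washington", "Source 2", 2.4), Hit("washington", "Source 1", 9.7)]
for alert in alerts_for(hits, "washington", {"Source 2"}):
    print(f"ALERT: '{alert.keyword}' uttered by {alert.source} at {alert.time:.1f}s")
```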
  • The term “keyword” is generally used to identify any type of syllable, search term, sound, phrase, or utterance, as well as any series or string thereof that is associated directly or through Boolean logic.
  • The media environment 24 of FIG. 15 may include a media system 26 and a phonetic processing system 28.
  • the media system 26 may process any number of media items, which include speech, from any number of sources, such as communication terminals 30 .
  • the media system 26 may simply operate to allow various communication terminals 30 to communicate with one another in the case of a communication network or the like.
  • the media system 26 may also represent a service node or like communication processing system, wherein composite speech signals, which include the speech from different sources or parties, are made available alone or in association with other media.
  • the media system 26 may provide multi-source audio content, which includes audio segments that have speech from different sources.
  • the media system 26 may be able to identify the source that is associated with the various segments of speech in the audio segment. Accordingly, the source information will identify a source for the various speech segments in the multi-source audio content.
  • the phonetic processing system 28 may provide the functionality described above, and as such, may receive the source information and the multi-source audio content, generate a phonetic index for the multi-source audio content, and associate sources with the phonemes in the phonetic index based on the source information.
  • the source information may be integrated with or provided separately from the multi-source audio content, depending on the application.
  • Also provided are a database 32, a search server 34, and a search terminal 36, which may also represent a communication terminal 30.
  • the search server 34 will control searching of a phonetic index and any integrated or separate source information as described above, in response to search queries provided by the search terminal 36 .
  • the phonetic processing system 28 may be instructed by the search server 34 to search the incoming multi-source audio content in real time or access phonetic indices 12 that are stored in the database 32 .
  • the phonetic processing system 28 may generate the phonetic indices 12 and associated source information, as well as search the phonetic indices 12 , the source information, or a combination thereof, in real time.
  • search results may be reported to the search server 34 and on to the search terminal 36 .
  • the phonetic processing system 28 may generate the phonetic indices 12 and associated source information, and store the phonetic indices 12 and associated source information in the database 32 , alone or in association with the multi-source audio content. If stored in the database 32 , the search server 34 may access the phonetic indices 12 and associated source information for any number of multi-source audio content and provide search results to the search terminal 36 .
  • The present invention is particularly useful in an audio or video conferencing environment.
  • An overview of a conference environment in which the present invention may be practiced is provided in association with FIG. 16 .
  • a number of communication terminals 30 are in communication with a conference system 38 , which has a conference bridge 40 .
  • a communication terminal 30 for each of the participants in a conference call is coupled to the conference system 38 through a voice session.
  • the conference system 38 will facilitate the conference call via the various voice sessions in traditional fashion, and may also support associated video conferencing.
  • the communication terminals are generally referenced with the numeral 30 ; however, the different types of communication terminals are specifically identified when desired with a letter V, D, or C.
  • a voice communication terminal 30 (V) is primarily configured for voice communications, communicates with the conference system 38 through an appropriate voice network 42 , and generally has limited data processing capability.
  • the voice communication terminal 30 (V) may represent a wired, wireless, or cellular telephone or the like while the voice network 42 may be a cellular or public switched telephone network (PSTN).
  • a data communication terminal 30 (D) may represent a computer, personal digital assistant, media player, or like processing device that communicates with the conference system 38 over a data network 44 , such as a local area network, the Internet, or the like.
  • certain users will have a data communication terminal 30 (D) and an associated voice communication terminal 30 (V).
  • a user may have an office or cellular telephone as well as a personal computer.
  • a composite communication terminal 30 (C) supports voice communications as well as sufficient control applications to facilitate interactions with the conference system 38 over the data network 44 , as will be described further below.
  • the composite communication terminal 30 (C) may be a personal computer that is capable of supporting telephony applications or a telephone capable of supporting computing applications, such as a browser application.
  • certain conference participants are either associated with a composite communication terminal 30 (C) or both voice and data communication terminals 30 (V), 30 (D).
  • a session function of the conference system 38 may be used to help facilitate establishment of the voice sessions for the conference call.
  • the session function may represent call server functionality or like session signaling control function that participates in establishing, controlling, and breaking down the bearer paths or bearer channels for the voice sessions with the conference bridge 40 .
  • a control channel may also be established for each or certain participants.
  • the control channel for each participant is provided between an associated communication terminal 30 and the conference system 38 .
  • The control channel may allow a corresponding participant to control various aspects of the conference call, receive information related to the conference call, provide information related to the conference call, and exchange information with other participants.
  • the control channels may be established with a conference control function, which is operatively associated with the conference bridge 40 and the session control function.
  • control channels may be established between the composite communication terminal 30 (C) and the conference control function while the voice session is established between the composite communication terminal 30 (C) and the conference bridge 40 .
  • control channels may be established between the data communication terminal 30 (D) and the conference control function, while the corresponding voice sessions are established between the voice communication terminals 30 (V) and the conference bridge 40 .
  • The control channels may take any form.
  • An exemplary control channel is provided by a web session wherein the conference control function runs a web server application and the composite communication terminal 30(C) runs a compatible browser application.
  • The browser application provides a control interface for the associated participant, and the web server application will control certain operations of the conference system 38 based on participant input and facilitate interactions with and between the participants.
  • the conference bridge 40 may be associated with the search server 34 and the phonetic processing system 28 .
  • keyword or phonetic search queries may be received by the search server 34 from the participants via the control channels, and search results may be provided to the participants via the same control channels.
  • the conference bridge 40 will be able to provide a conference output that represents multi-source audio content and associated source information to the phonetic processing system 28 to facilitate creation of a phonetic index 12 and associated source information for the conference output as well as searching of the phonetic index 12 and the associated source information.
  • the conference bridge 40 is used to facilitate a conference call between two or more conference participants who are in different locations.
  • calls from each of the participants are connected to the conference bridge 40 .
  • the audio levels of the incoming audio signals from the different calls are monitored.
  • One or more of the audio signals having the highest audio level are selected and provided to the participants as an output of the conference bridge.
  • the audio signal with the highest audio level generally corresponds to the participant who is talking at any given time. If multiple participants are talking, audio signals for the participant or participants who are talking the loudest at any given time are selected.
  • the unselected audio signals are generally not provided by the conference bridge to conference participants. As such, the participants are only provided the selected audio signal or signals and will not receive the unselected audio signals of the other participants. To avoid distracting the conference participants who are providing the selected audio signals, the selected audio signals are generally not provided back to the corresponding conference participants. In other words, the active participant in the conference call is not fed back their own audio signal.
  • a conference bridge 40 may function to mix the audio signals from the different sources. As the audio levels of the different audio signals change, different ones of the audio signals are selected throughout the conference call and provided to the conference participants as the output of the conference bridge.
  • An exemplary embodiment of the conference bridge 40 is now described in association with FIGS. 17A and 17B.
  • the conference bridge 40 receives audio signals for users A through F via source ports, SOURCES 1 through 6 , and provides the selected one of the audio signals to users A through F via output ports, OUTPUT 1 through 6 .
  • Each voice session is associated with one source port, SOURCE N, that can receive audio signals for a user from a corresponding communication terminal 30 and one output port, OUTPUT N, that can provide the selected audio signal back to that communication terminal 30 .
  • Audio signals from users A through F are received at source ports SOURCE 1 through 6 , respectively, by the conference bridge 40 .
  • Since the audio signals from User A are selected to be the output of the conference bridge 40, User A's audio signals are provided to users B through F via output ports OUTPUT 2 through 6, respectively.
  • To avoid distracting User A, User A's audio signals are not provided back to User A via output port OUTPUT 1.
  • The audio signals of the other users B through F are dropped.
  • Assume the conference bridge 40 then selects the audio signals from User C to be the output of the conference bridge 40. Audio signals from users A through F are still received at source ports SOURCE 1 through 6, respectively. Since the audio signals from User C are selected to be the output of the conference bridge 40, User C's audio signals are provided to users A, B, D, E, and F via output ports OUTPUT 1, 2, 4, 5, and 6, respectively. To avoid distracting User C, User C's audio signals are not provided back to User C via output port OUTPUT 3. The audio signals of the other users A, B, D, E, and F are dropped.
  • Audio signals are received via source ports, SOURCE 1 -N, and processed by signal normalization circuitry 48 ( 1 -N).
  • the signal normalization circuitry 48 ( 1 -N) may operate on the various audio signals to provide a normalized signal level among the conference participants, such that the relative volume associated with each of the conference participants during the conference call is substantially normalized to a given level.
  • the signal normalization circuitry 48 ( 1 -N) is optional, but normally employed in conference bridges 40 . After normalization, the audio signals from the various circuitry are sent to an audio processing function 50 .
  • a source selection function 52 is used to select the source port, SOURCE 1 -N, which is receiving the audio signals with the highest average level.
  • the source selection function 52 provides a corresponding source selection signal to the audio processing function 50 .
  • the source selection signal identifies the source port, SOURCE 1 -N, which is receiving the audio signals with the highest average level.
  • These audio signals represent the selected audio signals to be output by the conference bridge 40 .
  • The audio processing function 50 will provide the selected audio signals from the selected source port, SOURCE 1-N, to all of the output ports, OUTPUT 1-N, except for the output port that is associated with the selected source port.
  • the audio signals from the unselected source ports SOURCE 1 -N are dropped, and therefore not presented to any of the output ports, OUTPUT 1 -N, in traditional fashion.
  • the source port SOURCE 1 -N providing the audio signals having the greatest average magnitude is selected at any given time.
  • the source selection function 52 will continuously monitor the relative average magnitudes of the audio signals at each of the source ports, SOURCES 1 -N, and select appropriate source ports, SOURCE 1 -N, throughout the conference call. As such, the source selection function 52 will select different ones of the source ports, SOURCE 1 -N, throughout the conference call based on the participation of the participants.
  • the source selection function 52 may work in cooperation with level detection circuitry 54 ( 1 -N) to monitor the levels of audio signals being received from the different source ports, SOURCE 1 -N. After normalization by the signal normalization circuitry 48 ( 1 -N), the audio signals from source ports, SOURCE 1 -N are provided to the corresponding level detection circuitry 54 ( 1 -N). Each level detection circuitry 54 ( 1 -N) will process corresponding audio signals to generate a level measurement signal, which is presented to the source selection function 52 . The level measurement signal corresponds to a relative average magnitude of the audio signals that are received from a given source port, SOURCE 1 -N.
  • the level detection circuitry 54 ( 1 -N) may employ different techniques to generate a corresponding level measurement signal.
  • a power level derived from a running average of given audio signals or an average power level of audio signals over a given period of time is generated and represents the level measurement signal, which is provided by the level detection circuitry 54 to the source selection function 52 .
  • the source selection function 52 will continuously monitor the level measurement signals from the various level detection circuitry 54 ( 1 -N) and select one of the source ports, SOURCE 1 -N, as a selected source port based thereon.
  • The source selection function 52 will then provide a source selection signal to identify the selected source port, SOURCE 1-N, to the audio processing function 50, which will deliver the audio signals received at the selected source port, SOURCE 1-N, to the different output ports, OUTPUT 1-N, which are associated with the unselected source ports, SOURCE 1-N.
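  • The level-detection and source-selection behaviour described above might be approximated as in the sketch below (a simplified, illustrative model; the smoothing constant and class names are assumptions rather than the patent's design):

```python
class LevelDetector:
    """Running-average power estimate for one source port (cf. 54(1-N))."""
    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.level = 0.0

    def update(self, samples):
        power = sum(s * s for s in samples) / max(len(samples), 1)
        self.level = self.smoothing * self.level + (1 - self.smoothing) * power
        return self.level

def select_source(levels):
    """Select the source port whose running-average level is currently highest."""
    return max(levels, key=levels.get)

detectors = {f"SOURCE {n}": LevelDetector() for n in range(1, 4)}
frames = {"SOURCE 1": [0.01, -0.02], "SOURCE 2": [0.40, -0.50], "SOURCE 3": [0.05, 0.04]}
levels = {port: detector.update(frames[port]) for port, detector in detectors.items()}
print(select_source(levels))  # SOURCE 2
```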
  • the source selection function 52 may also provide source selection signals that identify the active source port, SOURCE 1 -N, at any given time to the phonetic processing system 28 .
  • the audio processing function 50 may provide the audio signals from the selected source port, SOURCE 1 -N, to the phonetic processing system 28 .
  • The phonetic processing system 28 may generate a phonetic index 12 of phonemes for the audio signals.
  • As different source ports are selected throughout the conference call, the sources of the audio signals provided to the phonetic processing system 28 change, such that the audio signals provide multi-source audio content.
  • the multi-source audio content effectively includes a series of speech segments from different source ports, SOURCE 1 -N.
  • the phonetic processing system 28 can associate a particular source port, SOURCE 1 -N, with a corresponding speech segment in the multi-source audio content, and in particular, the particular section or phonemes of the phonetic index 12 that corresponds to the speech segment.
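  • One plausible way to realize that association (again, an illustrative sketch rather than the patent's own method) is to timestamp the bridge's source selection signals and attribute each time-aligned phoneme to whichever source port was selected at that instant:

```python
import bisect

def source_at(events, t):
    """events: (time, selected_source) pairs sorted by time; return the source
    port that was selected at time t, or None before the first event."""
    times = [time for time, _ in events]
    i = bisect.bisect_right(times, t) - 1
    return events[i][1] if i >= 0 else None

# Source selection signals emitted by the conference bridge (illustrative).
events = [(0.0, "SOURCE 1"), (3.2, "SOURCE 3")]
# Time-aligned phonemes produced by the phonetic processing system (illustrative).
phonemes = [(0.4, "h"), (0.5, "aw"), (3.5, "p"), (3.6, "l")]

tagged = [(t, ph, source_at(events, t)) for t, ph in phonemes]
print(tagged)  # [(0.4, 'h', 'SOURCE 1'), (0.5, 'aw', 'SOURCE 1'), (3.5, 'p', 'SOURCE 3'), ...]
```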
  • the phonetic index 12 and associated source information may be monitored in real time or may be stored for subsequent processing.
  • search queries may be employed to identify utterances by certain parties or sources, and appropriate action may be taken.
  • the actions may include providing an alert via a control channel or other mechanism to the speaking party or the other parties, based on rules established by the speaking party or other parties.
  • the speaking party may establish rules to alert himself or other parties when the speaking party utters a certain keyword or phrase.
  • a first party may establish criteria wherein they are alerted when one or more selected parties utter a certain keyword or phrase.
  • a person who is not a party to the conference call may monitor the conference call and receive alerts when a keyword is uttered.
  • the alert may include the utterance and the source of the utterance.
  • criteria may be employed wherein alerts are provided to a person who is not participating in the conference call when only selected parties utter certain keywords or phrases. Similar processing may be provided on audio files of a conference call, after the conference call has concluded.
  • search criteria may be employed to search multiple media items based on content, source, or a combination thereof in an effective and efficient manner according to the present invention.
  • The service node 56 of FIG. 19 may be employed to provide or represent all or a portion of one or more of the following: a conference bridge 40, a search server 34, a phonetic processing system 28, a media system 26, or the like.
  • the service node 56 will include a control system 58 having sufficient memory 60 for the requisite software 62 and data 64 to operate as described above.
  • the software 62 may include any number of functions, such as a phonetic processing function 66 and a search function 68 . These functions may provide the functionality of the phonetic processing system 28 and the search server 34 , respectively, alone or in combination.
  • the control system 58 may also be associated with a communication interface 70 , which facilitates communications with the various entities in the media environment 24 , or an appropriate conference environment.

Abstract

The present invention relates to creating a phonetic index of phonemes from an audio segment that includes speech content from multiple sources. The phonemes in the phonetic index are directly or indirectly associated with the corresponding source of the speech from which the phonemes were derived. By associating the phonemes with a corresponding source, the phonetic index of speech content from multiple sources may be searched based on phonetic content as well as the corresponding source.

Description

    FIELD OF THE INVENTION
  • The present invention relates to phonetic searching, and in particular to associating source information with phonetic indices.
  • BACKGROUND OF THE INVENTION
  • A vast portion of modern communications is provided through written text or speech. In many instances, such text and speech are captured in electronic form and stored for future reference. Given the volume of these communications, large libraries of text and audio-based communications are being amassed and efforts are being made to make these libraries more accessible. Although there is significant benefit gained from thoughtful organization, contextual searching is becoming a necessary supplement, if not a replacement, for traditional organizing techniques. Most document management systems for written documents allow keyword searching throughout any number of databases, regardless of how the documents are organized, to allow users to electronically sift through volumes of documents in an effective and efficient manner.
  • Text-based documents lend themselves well to electronic searching because the content is easily characterized, understood, and searched. In short, the words of a document are well defined and easily searched. However, speech-based media, such as speech recordings, dictation, telephone calls, multi-party conference calls, music, and the like have traditionally been more difficult to analyze from a content perspective than text-based documents. Most speech-based media is characterized in general and organized and searched accordingly. The specific speech content is generally not known with any specificity, unless human or automated transcription is employed to provide an associated text-based document. Human transcription has proven time-consuming and expensive.
  • Over the past decade, significant efforts have been made to improve automated speech recognition. Unfortunately, most speech recognition techniques rely on creating large vocabularies of words, which are created based on linguistic modeling for cross-sections of the specific population in which the speech recognition system will be used. In essence, the vocabularies are filled with the many thousands of words that may be uttered during speech. Although such speech recognition has improved, the improvements have been incremental and remain error prone.
  • An evolving speech processing technology that shows significant promise is based on phonetics. In essence, speech is parsed into a series of discrete human sounds called phonemes. Phonemes are the smallest units of human speech, and most languages only have 30 to 40 phonemes. From this relatively small group of phonemes, all speech can be accurately defined. The series of phonemes created by this parsing process is readily searchable and referred to in general as a phonetic index of the speech. To search for the occurrence of a given term in the speech, the term is first transformed into its phonetic equivalent, which is provided in the form of a string of phonemes. The phonetic index is processed to identify whether the string of phonemes occurs within the phonetic index. If the string of phonemes for the search term occurs in the phonetic index, then the term occurs in the speech. If the phonetic index is time aligned with the speech, the location of the string of phonemes in the phonetic index will correspond to the location of the term in the speech. Notably, phonetic-based speech processing and searching techniques tend to be less complicated and more accurate than the traditional word-based speech recognition techniques. Further, the use of phonemes mitigates the impact of dialects, slang, and other language variations that make identifying a specific word difficult, but have much less impact on each individual phoneme that makes up the same word.
  • One drawback of phonetic-based speech processing is the ability to distinguish between speakers in multi-party speech, such as that found in telephone or conference calls. Although a particular term may be identified, there is no efficient and automated way to identify the speaker who uttered the term. The ability to associate portions of speech with the respective speakers in multi-party speech would add another dimension in the ability to process and analyze multi-party speech. As such, there is a need for an efficient and effective technique to identify and associate the source of speech in multi-party speech with the corresponding phonemes in a phonetic index that is derived from the multi-party speech.
  • SUMMARY OF THE INVENTION
  • The present invention relates to creating a phonetic index of phonemes from an audio segment that includes speech content from multiple sources. The phonemes in the phonetic index are directly or indirectly associated with the corresponding source of the speech from which the phonemes were derived. By associating the phonemes with a corresponding source, the phonetic index of speech content from multiple sources may be searched based on phonetic content as well as the corresponding source. In one embodiment, the audio segment is processed to identify phonemes for each unit of speech in the audio segment. A phonetic index of the phonemes is generated for the audio segment, wherein each phonetic entry in the phonetic index identifies a phoneme that is associated with a corresponding unit of speech in the audio segment. Next, each of the multiple sources is associated with corresponding phonetic entries in the phonetic index, wherein a source associated with a given phonetic entry corresponds to the source of the unit of speech from which the phoneme for the given phonetic entry was generated. Various techniques may be employed to associate the phonemes with their sources; however, once associated, the phonetic index may be searched based on phonetic content and source criteria.
  • Such searches may entail searching the phonetic index based on phonetic content criteria to identify a source associated with the phonetic content, searching the phonetic index based on the phonetic content criteria as well as source criteria to identify a matching location in the phonetic index or corresponding audio segment, and the like. Accordingly, the source information that is associated with the phonetic index may be useful as a search criterion or a search result. In one embodiment, a phonetic index and any source information directly or indirectly associated therewith may be searched as follows.
  • Initially, content criteria bearing on the desired phonetic content and source criteria bearing on the desired source are obtained via an appropriate search query. The content criteria query may include keywords, phoneme strings, or any combination thereof alone or in the form of a Boolean function. If one or more keywords are used, each keyword is broken into its phonetic equivalent, which will provide a string of phonemes. Accordingly, the content criteria either is or is converted into phonetic search criteria comprising one or more strings of phonemes, which may be associated with one or more Boolean operators. The phonetic index and associated source information are then searched based on the phonetic content criteria and the source criteria to identify portions of the phonetic index that match the phonetic search criteria and correspond to the source or sources identified in the source criteria. Depending on the application, various actions may be taken in response to identifying those portions of the phonetic index that match the phonetic search criteria and correspond to the source or sources identified by the source criteria. Further, such processing and searching may be provided on existing media files or streaming media that has speech content from one or more parties.
  • Those skilled in the art will appreciate the scope of the present invention and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.
  • BRIEF DESCRIPTION OF THE DRAWING FIGURES
  • The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the invention, and together with the description serve to explain the principles of the invention.
  • FIG. 1 is a flow diagram illustrating an exemplary process for generating a phonetic index and associating source information with the phonetic index according to one embodiment of the present invention.
  • FIG. 2 illustrates the parsing of an audio segment into a sequence of phonemes for a phonetic index according to one embodiment of the present invention.
  • FIGS. 3-10 illustrate different techniques for generating a phonetic index and associating source information with the phonetic index according to different embodiments of the present invention.
  • FIG. 11 is a flow diagram illustrating an exemplary technique for searching a phonetic index according to one embodiment of the present invention.
  • FIGS. 12A and 12B illustrate windows in which content criteria and source criteria may be entered for a search query.
  • FIG. 13 illustrates the translation of a keyword to a string of phonemes for the keyword.
  • FIG. 14 illustrates matching a string of phonemes within the sequence of phonemes of a phonetic index.
  • FIG. 15 illustrates a media environment according to one embodiment of the present invention.
  • FIG. 16 illustrates a conferencing environment according to one embodiment of the present invention.
  • FIGS. 17A and 17B illustrate the basic operation of a conference bridge according to one embodiment of the present invention.
  • FIG. 18 illustrates a block diagram of a conference bridge according to one embodiment of the present invention.
  • FIG. 19 illustrates a service node according to one embodiment of the present invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the invention and illustrate the best mode of practicing the invention. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the invention and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.
  • The present invention relates to creating a phonetic index of phonemes from an audio segment that includes speech content from multiple sources. The phonemes in the phonetic index are directly or indirectly associated with the corresponding source of the speech from which the phonemes were derived. By associating the phonemes with a corresponding source, the phonetic index of speech content from multiple sources may be searched based on phonetic content as well as the corresponding source. With reference to FIG. 1, an overview is provided of an exemplary process for generating a phonetic index for an audio segment that includes speech content from multiple sources, and associating sources with the corresponding phonemes, according to one embodiment of the present invention.
  • Initially, an audio segment including speech content from multiple sources is accessed for processing (Step 100). The audio segment may be provided in any type of media item, such as a media stream or stored media file that includes speech from two or more known sources. The media item may include graphics, images, or video in addition to the audio segment. The audio segment is then parsed to identify phonemes for each unit of speech in the audio segment (Step 102). The result of such processing is a sequence of phonemes, which represents the phonetic content of the audio segment. Based on this sequence of phonemes, a phonetic index of the phonemes is generated for the audio segment, wherein each phonetic entry in the phonetic index identifies a phoneme that is associated with a corresponding unit of speech in the audio segment (Step 104). Next, each of the multiple sources is associated with corresponding phonetic entries in the phonetic index, wherein a source associated with a given phonetic entry corresponds to the source of the unit of speech from which the phoneme for the given phonetic entry was generated (Step 106). Various techniques may be employed to associate the phonemes with their sources; however, once associated, the phonetic index may be searched based on phonetic content and source criteria.
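The overall flow of Steps 100 through 106 can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: `recognize_phonemes` stands in for an external phonetic engine, `source_at` stands in for whatever mechanism reports the active source at a given time, and the entry layout is an assumption made for clarity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

@dataclass
class PhoneticEntry:
    phoneme: str   # e.g. "w", "sh", "ih"
    start: float   # offset into the audio segment, in seconds
    source: str    # identifier of the source that uttered this unit of speech

def build_phonetic_index(
    audio_segment: bytes,
    recognize_phonemes: Callable[[bytes], Iterable[Tuple[str, float]]],
    source_at: Callable[[float], str],
) -> List[PhoneticEntry]:
    """Parse the speech into phonemes (Step 102), build the phonetic index
    (Step 104), and associate each entry with its source (Step 106)."""
    index: List[PhoneticEntry] = []
    for phoneme, start in recognize_phonemes(audio_segment):
        index.append(PhoneticEntry(phoneme, start, source_at(start)))
    return index
```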
  • FIG. 2 illustrates the parsing and association process described above. Assume that an audio segment 10 includes interactive speech between two sources. The first source (Source 1) is an operator, and the second source (Source 2) is a caller. Among other speech, assume the operator (Source 1) utters the question “how may I help you?” and in response, the caller (Source 2) utters the response “please transfer me to the Washington office for a call.” As the audio segment 10 is parsed into corresponding phonemes, each basic unit of speech is translated into a phoneme (PH). The sequence of phonemes is used to form a corresponding phonetic index 12 and a source of the phonemes is associated with the phonemes of the phonetic index 12. As will be described further below, there are numerous ways to associate a source with a corresponding group of phonemes in the phonetic index 12. With respect to FIG. 2, assume the source of the phonemes is provided in or with the phonetic index 12, which may be associated with the audio segment in a media item or in a separate file or stream, which is associated with the media item containing the audio segment.
  • A portion of the phonetic index 12 is illustrated, and the actual phonemes for the speech segment corresponding to "transfer me to the Washington office" are provided. Phonemes surrounding this speech segment are illustrated generically as PH. The string of phonemes for the speech segment "transfer me to the Washington office" is represented as follows:
  • t r ae n s f er m iy t uw dh ah w a sh ih ng t ah n ao f ah s.

    Since the speech segment was uttered by the caller (Source 2), corresponding source information is provided in the phonetic index 12, wherein the phonemes uttered by the caller (Source 2) are associated with the caller (Source 2). Notably, this phonetic example is an over-simplified representation of a typical phonetic index. A phonetic index may include or represent characteristics of the acoustic channel, which represents the environment in which the speech was uttered and the transducer through which it was recorded, as well as of the natural language in which the speech is expressed. Acoustic channel characteristics include frequency response, background noise, and reverberation. Natural language characteristics include accent, dialect, and gender traits. For basic information on one technique for parsing speech into phonemes, please refer to the phonetic processing technology provided by Nexidia Inc., 3565 Piedmont Road NE, Building Two, Suite 400, Atlanta, Ga. 30305 (www.nexidia.com), and its white paper entitled Phonetic Search Technology, 2007, and the references cited therein, wherein the white paper and cited references are each incorporated herein by reference in their entireties.
  • As indicated, the phonetic index 12 and associated source information may be directly or indirectly associated with each other in a variety of ways. Regardless of the source information, the phonetic index 12 may also be maintained in a variety of ways. For example, the phonetic index 12 may be associated with the corresponding audio segment in a media item that includes both the audio segment and the phonetic index 12. In one embodiment, the phonetic index 12 may be maintained as metadata associated with the audio segment, wherein the phonetic index 12 is preferably, but need not be, synchronized (or time-aligned) with the audio segment. When synchronized, a particular phoneme is matched to a particular location in the audio segment where the unit of speech from which the phoneme was derived resides. Alternatively, the phonetic index 12 may be maintained in a separate file or stream, which may or may not be synchronized with the audio segment, depending on the application. When there is a need for synchronization, the phonetic index may be associated with a time reference or other synchronization reference with respect to the audio segment or media item containing the audio segment. Notably, certain applications will not require the maintenance of an association between the phonetic index 12 and the audio segment from which the phonetic index 12 was derived. Similarly, the source information may be maintained with the phonetic index 12, in a separate file or stream, or in the media item containing the audio segment. Notably, certain applications will not require the maintenance of an association between the source information and the audio segment from which the phonetic index 12 was derived. FIGS. 3 through 10 illustrate various ways to maintain a phonetic index 12 and associated source information.
  • FIG. 3 illustrates an embodiment wherein the phonetic index 12 includes a sequence of phonemes (PH), which correspond to the units of speech in the audio segment, as well as source information for each phoneme (PH). The source information SX, where X represents a particular source of the source information SX, is generally provided for each phoneme. Accordingly, each position of the phonetic index 12 includes a phoneme for a particular unit of speech and the corresponding source for that particular unit of speech. The phonetic index 12 may be provided in a separate media item, such as a file or stream, which may or may not be time-aligned with the corresponding audio segment.
  • FIG. 4 illustrates an embodiment similar to that illustrated in FIG. 3, with the exception that the source information SX is provided in association with the first phoneme that is associated with a new source. Accordingly, once a particular source SX is identified, the phonemes derived after determining the particular source are associated with that particular source until there is a change to another source. The phonetic index 12 may be provided in a separate media item, such as a file or stream, which may or may not be time-aligned with the corresponding audio segment 10.
  • FIG. 5 illustrates a phonetic index 12 wherein the source information SX is associated with each phoneme, such as that provided in FIG. 3. In this embodiment, the phonetic index 12 is further associated with a time reference or other synchronization information, which allows a particular phoneme and associated source SX to be correlated to a corresponding location in the audio segment 10. As such, the time reference allows an association between the unit of speech and the audio segment 10 to the corresponding phoneme and source in the phonetic index 12, wherein the audio segment 10 and the phonetic index 12 may be provided in separate files or streams.
  • FIG. 6 provides a similar embodiment, wherein the source information provided in the phonetic index 12 identifies the source SX as associated with a particular phoneme, without providing a separate source index position for each corresponding phoneme position, such as that illustrated in FIG. 4. However, the embodiment in FIG. 6 is further associated with a time reference or other synchronization information as illustrated in FIG. 5.
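As a rough illustration of the layouts of FIGS. 3 through 6, the snippet below shows one hypothetical serialization in which every phonetic entry carries its own source label and time reference, and another in which a source marker appears only where the source changes. The tuple formats and labels are assumptions for illustration, not the actual index format.

```python
# FIG. 3/5 style: every entry carries (phoneme, source, time reference).
per_entry = [("t", "S2", 12.40), ("r", "S2", 12.47), ("ae", "S2", 12.55)]

# FIG. 4/6 style: a ("SRC", source) marker is emitted only when the source
# changes; the phonemes that follow inherit the most recent marker.
with_markers = [("SRC", "S1"), ("hh", 3.10), ("aw", 3.18),
                ("SRC", "S2"), ("t", 12.40), ("r", 12.47)]

def expand_markers(entries):
    """Resolve FIG. 4/6-style change markers into per-entry source labels."""
    current, resolved = None, []
    for entry in entries:
        if entry[0] == "SRC":
            current = entry[1]
        else:
            resolved.append((entry[0], current, entry[1]))
    return resolved
```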
  • With reference to FIGS. 7 and 8, the phonetic index 12 is stored along with the audio segment 10 and the source information SX in a media item, such as a media file or stream. Notably, the audio segment 10 may be the same as that from which the phonemes were derived or a compressed version thereof. Regardless of the compression or formatting, the content of the audio segment 10 will correspond to the phonetic index 12 that is associated therewith. Further, the phonemes of the phonetic index 12 and the corresponding source information may be provided as metadata associated with the audio segment 10. The source information will be aligned with the corresponding phonemes; however, the phonemes may or may not be synchronized or time-aligned with the audio segment 10, depending on the particular application.
  • With reference to FIG. 9, a media item 14, which includes the phonetic index 12, is separate from a file or stream that includes a separate source index 16. Accordingly, the source index 16 may be provided in a file that is separate from a file or stream that contains both the phonetic index 12 and the corresponding audio segment 10. In order to synchronize the source information of the source index 16 with the phonemes in the phonetic index 12, synchronization information, such as a time reference, may be provided in association with the source information entries of the source index 16. The time reference or synchronization information will correspond to a time reference or appropriate synchronization reference that is inherent to the media item 14 or provided therein to facilitate synchronization of the phonetic index 12 and the source index 16.
  • FIG. 10 illustrates an embodiment wherein separate files or streams are used to provide the phonetic index 12 and the source index 16, respectively. In this embodiment, the files for the respective phonetic index 12 and source index 16 may be separate from a file or stream containing the audio segment. Notably, the phonetic index 12 and the source index 16 may include a time reference or other synchronization information to allow the phonemes of the phonetic index 12 to be associated with a particular source. This time reference or synchronization information may also relate to the audio segment 10 in a separate file or stream, in certain embodiments.
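Where the source index 16 is kept in a separate file or stream (FIGS. 9 and 10), the two indices can be correlated through their time references. A minimal sketch, assuming both indices carry times in the same reference frame and using illustrative (time, value) pairs:

```python
import bisect

def source_for_time(source_index, t):
    """source_index: list of (start_time, source) pairs sorted by time.
    Returns the source active at time t, or None before the first entry."""
    times = [start for start, _ in source_index]
    i = bisect.bisect_right(times, t) - 1
    return source_index[i][1] if i >= 0 else None

def join_indices(phonetic_index, source_index):
    """phonetic_index: list of (phoneme, time) pairs.
    Attaches the active source to every phonetic entry."""
    return [(ph, t, source_for_time(source_index, t)) for ph, t in phonetic_index]
```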
  • By associating the phonemes with a corresponding source, the phonetic index 12 of speech content from multiple sources may be searched based on the phonetic content, corresponding source, or a combination thereof. Such searches may entail searching the phonetic index 12 based on phonetic content criteria to identify a source associated with the phonetic content, searching the phonetic index 12 based on the phonetic content criteria as well as source criteria to identify a matching location in the phonetic index or corresponding audio segment 10, and the like. Accordingly, the source information that is associated with the phonetic index 12 may be useful as a search criterion or a search result. In one embodiment, a phonetic index 12 and any source information directly or indirectly associated therewith may be searched as follows.
  • With reference to FIG. 11, initially, a search query providing content criteria and source criteria is received. The content criteria bear on the desired phonetic content and the source criteria bear on the desired source. From the search query, the content criteria and the source criteria are obtained (Steps 200 and 202). FIGS. 12A and 12B illustrate application windows 18 in which a user may provide the content criteria and source criteria for the search query. A content field 20 is provided for entering the content criteria and a source field 22 is provided for entering the source criteria. The content criteria may include keywords, phoneme strings, or any combination thereof, alone or in the form of a Boolean function. In FIG. 12A, the search criterion is a single keyword "Washington." If one or more keywords are used, each keyword is broken into its phonetic equivalent, which is generally a corresponding string of phonemes and represents the phonetic search criteria (Step 204). If the keyword is "Washington," the phonetic search criterion is represented by the phonetic string: w a sh ih ng t ah n, as illustrated in FIG. 13. The user may enter the phonetic equivalent of the keyword instead of the corresponding keyword in the search content field 20, as provided in FIG. 12B. If the phonetic equivalent is provided, the step of generating the phonetic search criterion is not necessary. Notably, a single keyword is used in this example for the sake of clarity, but those skilled in the art will appreciate that more complicated content criteria including multiple keywords and associated Boolean operators may be provided.
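A toy illustration of the keyword-to-phoneme translation of Step 204 and FIG. 13 follows. The lexicon here is a stand-in; a deployed system would rely on a full pronunciation dictionary or a grapheme-to-phoneme model rather than the hard-coded entries shown.

```python
# Illustrative lexicon only; real systems use a full pronunciation dictionary.
LEXICON = {
    "washington": ["w", "a", "sh", "ih", "ng", "t", "ah", "n"],
    "transfer":   ["t", "r", "ae", "n", "s", "f", "er"],
}

def keyword_to_phonemes(keyword: str) -> list:
    """Translate a keyword into its phonetic equivalent (Step 204)."""
    return LEXICON[keyword.lower()]
```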
  • Accordingly, the content criteria either are, or are converted into, phonetic search criteria comprising one or more strings of phonemes, which may be associated with one or more Boolean operators. The phonetic index and associated source information are then searched based on the phonetic content criteria and the source criteria (Step 206), and portions of the phonetic index that match the phonetic search criteria and correspond to the source or sources identified by the source criteria are identified (Step 208), as depicted in FIG. 14. The highlighted section of the phonetic index 12 represents the matching string of phonemes (w a sh ih ng t ah n) for the keyword "Washington." Depending on the application, various actions may be taken in response to identifying those portions of the phonetic index that match the phonetic search criteria and correspond to the source or sources identified by the source criteria (Step 210). Further, such processing and searching may be provided on existing media files or streaming media that has speech content from one or more parties.
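A simplified sketch of Steps 206 and 208, treating the search as exact matching over a list of (phoneme, source) entries. Production phonetic search engines typically score probabilistic matches rather than requiring exact phoneme strings, so this only schematically shows the source-filtered lookup; the entry format is assumed.

```python
def search_index(index, query_phonemes, source_criteria=None):
    """index: list of (phoneme, source) entries in utterance order.
    Returns (position, sources) for each place the query phonemes occur,
    restricted to matches uttered by a source in source_criteria if given."""
    matches, n = [], len(query_phonemes)
    for i in range(len(index) - n + 1):
        window = index[i:i + n]
        if [ph for ph, _ in window] != query_phonemes:
            continue
        sources = {src for _, src in window}
        if source_criteria is None or sources & set(source_criteria):
            matches.append((i, sources))
    return matches

# e.g. find "Washington" uttered by Source 2 (hypothetical source label "S2"):
# search_index(index, keyword_to_phonemes("washington"), source_criteria={"S2"})
```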
  • The actions taken may range from merely indicating that a match was found to identifying the locations in the phonetic index 12 or audio segment 10 where the matches were found. Accordingly, the speech surrounding the location of a phonetic match may be provided in a textual format, wherein the phonetic index 12 or other source is used to provide all or a portion of a transcript associated with the phonetic match. Alternatively, the portions of the audio segment that correspond to the phonetic match may be played, queued, or otherwise annotated or identified. In another example, multi-party telephone conversations may be monitored based on keywords alone, and when certain keywords are uttered, alerts are generated to indicate when the keywords were uttered and the party who uttered them. Alternatively, multi-party telephone conversations may be monitored based on keywords as well as source criteria, such that when certain keywords are uttered by an identified party or parties, alerts are generated to indicate when the keywords were uttered by the identified party or parties. The alerts may identify each time a keyword is uttered and identify the party uttering the keyword at any given time, wherein utterances of the keyword by parties that are not identified in the search criteria will not generate an alert. Those skilled in the art will recognize innumerable applications based on the teachings provided herein. Notably, the term "keyword" is generally used to identify any type of syllable, search term, sound, phrase, or utterance, as well as any series or string thereof that is associated directly or through Boolean logic.
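One way such keyword-plus-source alerting rules might be expressed is sketched below. The rule fields, source labels, and notification targets are illustrative assumptions rather than part of the disclosure.

```python
ALERT_RULES = [
    # Alert a supervisor whenever the caller (hypothetical label "S2") says "washington".
    {"keyword": "washington", "speakers": {"S2"}, "notify": "supervisor"},
    # Alert a moderator whenever any party says "transfer".
    {"keyword": "transfer", "speakers": None, "notify": "moderator"},
]

def alerts_for_match(keyword, match_sources, rules=ALERT_RULES):
    """match_sources: set of source identifiers that uttered the match.
    Returns who should be notified; rules with speakers=None apply to any speaker."""
    return [r["notify"] for r in rules
            if r["keyword"] == keyword
            and (r["speakers"] is None or match_sources & r["speakers"])]
```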
  • With reference to FIG. 15, a media environment 24 in which one embodiment of the present invention may be employed is illustrated. The media environment 24 may include a media system 26 and a phonetic processing system 28. The media system 26 may process any number of media items, which include speech, from any number of sources, such as communication terminals 30. The media system 26 may simply operate to allow various communication terminals 30 to communicate with one another in the case of a communication network or the like. The media system 26 may also represent a service node or like communication processing system, wherein composite speech signals, which include the speech from different sources or parties, are made available alone or in association with other media. As such, the media system 26 may provide multi-source audio content, which includes audio segments that have speech from different sources. In addition, the media system 26 may be able to identify the source that is associated with the various segments of speech in the audio segment. Accordingly, the source information will identify a source for the various speech segments in the multi-source audio content.
  • The phonetic processing system 28 may provide the functionality described above, and as such, may receive the source information and the multi-source audio content, generate a phonetic index for the multi-source audio content, and associate sources with the phonemes in the phonetic index based on the source information. Notably, the source information may be integrated with or provided separately from the multi-source audio content, depending on the application.
  • Also illustrated are a database 32, a search server 34, and a search terminal 36, which may also represent a communication terminal 30. The search server 34 will control searching of a phonetic index and any integrated or separate source information as described above, in response to search queries provided by the search terminal 36. Notably, the phonetic processing system 28 may be instructed by the search server 34 to search the incoming multi-source audio content in real time or access phonetic indices 12 that are stored in the database 32. When processing real-time or streaming information, the phonetic processing system 28 may generate the phonetic indices 12 and associated source information, as well as search the phonetic indices 12, the source information, or a combination thereof, in real time. Any search results may be reported to the search server 34 and on to the search terminal 36. Alternatively, the phonetic processing system 28 may generate the phonetic indices 12 and associated source information, and store the phonetic indices 12 and associated source information in the database 32, alone or in association with the multi-source audio content. If stored in the database 32, the search server 34 may access the phonetic indices 12 and associated source information for any number of multi-source audio content and provide search results to the search terminal 36.
  • The present invention is particularly useful in an audio or video conferencing environment. An overview of a conference environment in which the present invention may be practiced is provided in association with FIG. 16. As illustrated, a number of communication terminals 30 are in communication with a conference system 38, which has a conference bridge 40. A communication terminal 30 for each of the participants in a conference call is coupled to the conference system 38 through a voice session. The conference system 38 will facilitate the conference call via the various voice sessions in traditional fashion, and may also support associated video conferencing.
  • The communication terminals are generally referenced with the numeral 30; however, the different types of communication terminals are specifically identified when desired with a letter V, D, or C. In particular, a voice communication terminal 30(V) is primarily configured for voice communications, communicates with the conference system 38 through an appropriate voice network 42, and generally has limited data processing capability. The voice communication terminal 30(V) may represent a wired, wireless, or cellular telephone or the like while the voice network 42 may be a cellular or public switched telephone network (PSTN).
  • A data communication terminal 30(D) may represent a computer, personal digital assistant, media player, or like processing device that communicates with the conference system 38 over a data network 44, such as a local area network, the Internet, or the like. In certain embodiments, certain users will have a data communication terminal 30(D) and an associated voice communication terminal 30(V). For example, a user may have an office or cellular telephone as well as a personal computer. Alternatively, a composite communication terminal 30(C) supports voice communications as well as sufficient control applications to facilitate interactions with the conference system 38 over the data network 44, as will be described further below. The composite communication terminal 30(C) may be a personal computer that is capable of supporting telephony applications or a telephone capable of supporting computing applications, such as a browser application.
  • In certain embodiments of the present invention, certain conference participants are either associated with a composite communication terminal 30(C) or both voice and data communication terminals 30(V), 30(D). For a conference call, each participant is engaged in a voice session, or call, which is connected to the conference bridge 40 of the conference system 38 via one or more network interfaces 46. Data or video capable terminals are used for application sharing or video presentation. A session function of the conference system 38 may be used to help facilitate establishment of the voice sessions for the conference call. In particular, the session function may represent call server functionality or like session signaling control function that participates in establishing, controlling, and breaking down the bearer paths or bearer channels for the voice sessions with the conference bridge 40.
  • In addition to a voice session, a control channel may also be established for each or certain participants. The control channel for each participant is provided between an associated communication terminal 30 and the conference system 38. The control channel may allow a corresponding participant to control various aspects of the conference call, receive information related to the conference call, provide information related to the conference call, and exchange information with other participants. The control channels may be established with a conference control function, which is operatively associated with the conference bridge 40 and the session control function. For participants using a composite communication terminal 30(C), control channels may be established between the composite communication terminal 30(C) and the conference control function while the voice session is established between the composite communication terminal 30(C) and the conference bridge 40. For participants using voice and data communication terminals 30(V), 30(D), control channels may be established between the data communication terminal 30(D) and the conference control function, while the corresponding voice sessions are established between the voice communication terminals 30(V) and the conference bridge 40.
  • Although the control channels may take any form, an exemplary control channel is provided by a web session wherein the conference control function runs a web server application and the composite communication terminal 30(C) runs a compatible browser application. The browser application provides a control interface for the associated participant, and the web server application will control certain operations of the conference system 38 based on participant input and facilitate interactions with and between the participants.
  • The conference bridge 40, including the session function and the conference control function, may be associated with the search server 34 and the phonetic processing system 28. As such, keyword or phonetic search queries may be received by the search server 34 from the participants via the control channels, and search results may be provided to the participants via the same control channels. The conference bridge 40 will be able to provide a conference output that represents multi-source audio content and associated source information to the phonetic processing system 28 to facilitate creation of a phonetic index 12 and associated source information for the conference output as well as searching of the phonetic index 12 and the associated source information.
  • As noted, the conference bridge 40 is used to facilitate a conference call between two or more conference participants who are in different locations. In operation, calls from each of the participants are connected to the conference bridge 40. The audio levels of the incoming audio signals from the different calls are monitored. One or more of the audio signals having the highest audio level are selected and provided to the participants as an output of the conference bridge. The audio signal with the highest audio level generally corresponds to the participant who is talking at any given time. If multiple participants are talking, audio signals for the participant or participants who are talking the loudest at any given time are selected.
  • The unselected audio signals are generally not provided by the conference bridge to conference participants. As such, the participants are only provided the selected audio signal or signals and will not receive the unselected audio signals of the other participants. To avoid distracting the conference participants who are providing the selected audio signals, the selected audio signals are generally not provided back to the corresponding conference participants. In other words, the active participant in the conference call is not fed back their own audio signal. Those skilled in the art will recognize various ways in which a conference bridge 40 may function to mix the audio signals from the different sources. As the audio levels of the different audio signals change, different ones of the audio signals are selected throughout the conference call and provided to the conference participants as the output of the conference bridge.
  • An exemplary embodiment of the conference bridge 40 is now described in association with FIGS. 17A and 17B. With initial reference to FIG. 17A, assume the conference bridge 40 only selects one of the audio signals to provide as an output at any given time during the conference call. As illustrated, the conference bridge 40 receives audio signals for users A through F via source ports, SOURCES 1 through 6, and provides the selected one of the audio signals to users A through F via output ports, OUTPUT 1 through 6. Each voice session is associated with one source port, SOURCE N, that can receive audio signals for a user from a corresponding communication terminal 30 and one output port, OUTPUT N, that can provide the selected audio signal back to that communication terminal 30. For example, assume users A through F are conference participants and are each being served via different voice session, and thus communication terminal 30. Audio signals from users A through F are received at source ports SOURCE 1 through 6, respectively, by the conference bridge 40. Assuming the audio signals from User A are selected to be the output of the conference bridge 40, User A's audio signals are provided to users B through F via output ports OUTPUT 2 through 6, respectively. To avoid distracting User A, User A's audio signals are not provided back to User A via output port OUTPUT 1. The audio signals of the other users B through F are dropped.
  • With reference to FIG. 17B, assume that User A stops talking and User C begins talking. When User C begins talking, the conference bridge 40 will select the audio signals from User C to be the output of the conference bridge 40. Audio signals from users A through F are still received at source ports SOURCE 1 through 6, respectively. Since the audio signals from User C are selected to be the output of the conference bridge 40, User C's audio signals are provided to users A, B, D, E, and F via output ports OUTPUT 1, 2, 4, 5, and 6, respectively. To avoid distracting User C, User C's audio signals are not provided back to User C via output port OUTPUT 3. The audio signals of the other users A, B, D, E, and F are dropped.
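A compact sketch of the routing shown in FIGS. 17A and 17B, assuming a single selected source at any given time. Port numbering follows the figures; the function name and data shapes are illustrative assumptions.

```python
def route_bridge_output(selected_port, ports):
    """Map each output port to the source port whose audio it carries.
    The selected source is sent to every other participant and is not fed
    back to itself; audio from unselected sources is dropped."""
    return {out: selected_port for out in ports if out != selected_port}

# FIG. 17A: User A (SOURCE 1) is talking -> OUTPUT 2..6 carry SOURCE 1.
print(route_bridge_output(1, range(1, 7)))
# FIG. 17B: User C (SOURCE 3) is talking -> OUTPUT 1, 2, 4, 5, 6 carry SOURCE 3.
print(route_bridge_output(3, range(1, 7)))
```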
  • An exemplary architecture for a conference bridge 40 is provided in FIG. 18. Audio signals are received via source ports, SOURCE 1-N, and processed by signal normalization circuitry 48(1-N). The signal normalization circuitry 48(1-N) may operate on the various audio signals to provide a normalized signal level among the conference participants, such that the relative volume associated with each of the conference participants during the conference call is substantially normalized to a given level. The signal normalization circuitry 48(1-N) is optional, but normally employed in conference bridges 40. After normalization, the audio signals from the various circuitry are sent to an audio processing function 50.
  • A source selection function 52 is used to select the source port, SOURCE 1-N, which is receiving the audio signals with the highest average level. The source selection function 52 provides a corresponding source selection signal to the audio processing function 50. The source selection signal identifies the source port, SOURCE 1-N, which is receiving the audio signals with the highest average level. These audio signals represent the selected audio signals to be output by the conference bridge 40. In response to the source selection signal, the audio processing function 50 will provide the selected audio signals from the selected source port, SOURCE 1-N, from all of the output ports, OUTPUT 1-N, except for the output port that is associated with the selected source port. The audio signals from the unselected source ports, SOURCE 1-N, are dropped, and therefore not presented to any of the output ports, OUTPUT 1-N, in traditional fashion.
  • Preferably, the source port SOURCE 1-N providing the audio signals having the greatest average magnitude is selected at any given time. The source selection function 52 will continuously monitor the relative average magnitudes of the audio signals at each of the source ports, SOURCES 1-N, and select appropriate source ports, SOURCE 1-N, throughout the conference call. As such, the source selection function 52 will select different ones of the source ports, SOURCE 1-N, throughout the conference call based on the participation of the participants.
  • The source selection function 52 may work in cooperation with level detection circuitry 54(1-N) to monitor the levels of audio signals being received from the different source ports, SOURCE 1-N. After normalization by the signal normalization circuitry 48(1-N), the audio signals from source ports, SOURCE 1-N are provided to the corresponding level detection circuitry 54(1-N). Each level detection circuitry 54(1-N) will process corresponding audio signals to generate a level measurement signal, which is presented to the source selection function 52. The level measurement signal corresponds to a relative average magnitude of the audio signals that are received from a given source port, SOURCE 1-N. The level detection circuitry 54(1-N) may employ different techniques to generate a corresponding level measurement signal. In one embodiment, a power level derived from a running average of given audio signals or an average power level of audio signals over a given period of time is generated and represents the level measurement signal, which is provided by the level detection circuitry 54 to the source selection function 52. The source selection function 52 will continuously monitor the level measurement signals from the various level detection circuitry 54(1-N) and select one of the source ports, SOURCE 1-N, as a selected source port based thereon. As noted, the source selection function 52 will then provide a source selection signal to identify the selected source port SOURCE 1-N to the audio processing function 50, which will deliver the audio signals received at the selected source port, SOURCE 1-N, from the different output ports, OUTPUT 1-N, which are associated with the unselected source ports, SOURCE 1-N.
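A minimal sketch of the level detection circuitry 54 and the source selection function 52, assuming a running-average power estimate per port. The smoothing constant, class interface, and data shapes are assumptions made for illustration.

```python
class LevelDetector:
    """Running-average power estimate for one source port (circuitry 54)."""
    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.level = 0.0

    def update(self, samples):
        # Exponentially weighted average of instantaneous power.
        for s in samples:
            self.level = (1 - self.alpha) * self.level + self.alpha * (s * s)
        return self.level

def select_source(levels):
    """Source selection function 52: pick the port with the highest measured
    level; levels maps source port -> level measurement signal."""
    return max(levels, key=levels.get)
```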
  • The source selection function 52 may also provide source selection signals that identify the active source port, SOURCE 1-N, at any given time to the phonetic processing system 28. Further, the audio processing function 50 may provide the audio signals from the selected source port, SOURCE 1-N, to the phonetic processing system 28. The phonetic processing system may generate a phonetic index 12 of phonemes for the audio signals. As the selected source ports, SOURCE 1-N, change throughout the conference call, the sources of the audio signals provided to the phonetic processing system 28 change, wherein the audio signals provide multi-source audio content. The multi-source audio content effectively includes a series of speech segments from different source ports, SOURCE 1-N. Since the source selection signals identify the active source port, SOURCE 1-N, at any given time, the phonetic processing system 28 can associate a particular source port, SOURCE 1-N, with a corresponding speech segment in the multi-source audio content, and in particular, the particular section or phonemes of the phonetic index 12 that corresponds to the speech segment.
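The association between the bridge's source selection signals and the phonemes derived from its output might be sketched as follows, assuming both event streams carry time stamps in the same reference frame; the event formats are assumptions for illustration.

```python
def tag_stream(phoneme_events, selection_events):
    """phoneme_events: iterable of (time, phoneme) derived from the bridge output.
    selection_events: list of (time, source_port) from the source selection
    function, in time order. Yields (phoneme, source_port) pairs so that each
    phonetic entry carries the port that was active when it was uttered."""
    selection_events = list(selection_events)
    idx, current = 0, None
    for t, ph in phoneme_events:
        while idx < len(selection_events) and selection_events[idx][0] <= t:
            current = selection_events[idx][1]
            idx += 1
        yield ph, current
```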
  • The phonetic index 12 and associated source information may be monitored in real time or may be stored for subsequent processing. When processed in real time, search queries may be employed to identify utterances by certain parties or sources, and appropriate action may be taken. The actions may include providing an alert via a control channel or other mechanism to the speaking party or the other parties, based on rules established by the speaking party or other parties. As such, the speaking party may establish rules to alert himself or other parties when the speaking party utters a certain keyword or phrase. Alternatively, a first party may establish criteria wherein they are alerted when one or more selected parties utter a certain keyword or phrase. Further, a person who is not a party to the conference call may monitor the conference call and receive alerts when a keyword is uttered. The alert may include the utterance and the source of the utterance. Alternatively, criteria may be employed wherein alerts are provided to a person who is not participating in the conference call when only selected parties utter certain keywords or phrases. Similar processing may be provided on audio files of a conference call, after the conference call has concluded. With the present invention, multiple conference calls may be analyzed at the same time, in real time or after the conference call has concluded. Accordingly, search criteria may be employed to search multiple media items based on content, source, or a combination thereof in an effective and efficient manner according to the present invention.
  • With reference to FIG. 19, a block representation of a service node 56 is illustrated according to one embodiment of the present invention. The service node 56 may be employed to provide or represent all or a portion of one or more of the following: a conference bridge 40, a search server 34, a phonetic processing system 28, a media system 26, or the like. In general, the service node 56 will include a control system 58 having sufficient memory 60 for the requisite software 62 and data 64 to operate as described above. The software 62 may include any number of functions, such as a phonetic processing function 66 and a search function 68. These functions may provide the functionality of the phonetic processing system 28 and the search server 34, respectively, alone or in combination. The control system 58 may also be associated with a communication interface 70, which facilitates communications with the various entities in the media environment 24, or an appropriate conference environment.
  • Those skilled in the art will recognize improvements and modifications to the preferred embodiments of the present invention. All such improvements and modifications are considered within the scope of the concepts disclosed herein and the claims that follow.

Claims (31)

1. A method comprising:
accessing an audio segment comprising speech content from a plurality of sources, wherein different speech segments of the speech content correspond to different ones of the plurality of sources;
parsing the speech content into a sequence of phonemes wherein each phoneme corresponds to a basic unit of speech of the speech content;
generating a phonetic index from the sequence of phonemes; and
associating a corresponding one of the plurality of sources with phonemes in the sequence of phonemes of the phonetic index such that a source associated with a given phoneme or group of phonemes in the sequence of phonemes of the phonetic index is ascertainable.
2. The method of claim 1 wherein source information identifying the corresponding ones of the plurality of sources with the phonemes in the sequence of phonemes is provided in the phonetic index.
3. The method of claim 2 wherein the phonetic index comprises synchronization information sufficient to align the phonemes in the sequence of phonemes with corresponding basic units of speech in the speech content of the audio segment.
4. The method of claim 3 wherein the synchronization information is a time reference that aligns the phonemes of the sequence of phonemes in time with the corresponding units of speech in the speech content of the audio segment.
5. The method of claim 2 wherein the phonetic index is maintained in a data entity that does not include the audio segment.
6. The method of claim 2 wherein the phonetic index is maintained in a media item that also includes the audio segment or a version of the audio segment.
7. The method of claim 6 wherein the phonemes in the sequence of phonemes and the associated source information is time aligned with the audio segment, such that the phonemes of the sequence of phonemes are aligned in time with the corresponding units of speech in the speech content of the audio segment.
8. The method of claim 2 wherein the source information identifies one of the plurality of sources for each of the phonemes in the sequence of phonemes of the phonetic index.
9. The method of claim 2 wherein different groups of phonemes of the phonetic index correspond to the different speech segments, and the source information identifies one of the plurality of sources for each group of the different groups of phonemes.
10. The method of claim 1 wherein source information identifying the corresponding ones of the plurality of sources with the phonemes in the sequence of phonemes is provided in a separate file or stream than the phonetic index.
11. The method of claim 1 further comprising receiving a media item containing the audio segment and receiving source indicia identifying a source from the plurality of sources for each of the different speech segments of the speech content in the audio segment, wherein the source indicia is used to associate the corresponding one of the plurality of sources with phonemes in the sequence of phonemes of the phonetic index.
12. The method of claim 11 wherein the media item is a media stream.
13. The method of claim 1 wherein the audio segment is a telephony audio signal and the different speech segments correspond to speech from different telephony sources or telephony parties.
14. The method of claim 13 wherein the telephony audio signal is an output of an audio conference bridge and the different speech segments correspond to speech from the different telephony sources or the telephony parties who are connected by the audio conference bridge.
15. The method of claim 14 further comprising:
providing the audio conference bridge to support audio conferencing of the different telephony sources or the telephony parties; and
providing the audio segment as the output of the conference bridge.
16. The method of claim 1 further comprising:
providing at least one phonetic search criterion in association with a search query wherein the phonetic search criterion corresponds to phonetic content;
identifying a matching portion of the phonetic index that meets the at least one phonetic search criterion; and
taking an action in response to identifying the matching portion of the phonetic index.
17. The method of claim 16 further comprising providing source criteria identifying at least one source of the plurality of sources in association with the search query, and wherein identifying the matching portion of the phonetic index comprises identifying the matching portion of the phonetic index that meets the at least one phonetic search criterion and the source criteria in light of the association of the corresponding one of the plurality of sources with the phonemes in the sequence of phonemes, such that the matching portion of the phonetic index corresponds to a source identified by the source criteria.
18. The method of claim 16 wherein the action comprises providing a notification that identifies the matching portion of the phonetic index.
19. The method of claim 1 further comprising:
providing source criteria identifying at least one source of the plurality of sources;
identifying a matching portion of the phonetic index that meets the source criteria in light of the association of the corresponding one of the plurality of sources with the phonemes in the sequence of phonemes; and
taking an action in response to identifying the matching portion of the phonetic index.
20. A method comprising:
accessing a phonetic index of a sequence of phonemes and associated source information wherein:
each phoneme in the phonetic index corresponds to a basic unit of speech content of an audio segment comprising speech content from a plurality of sources, such that different speech segments of the speech content correspond to different ones of the plurality of sources; and
the source information associates a corresponding one of the plurality of sources with the phonemes in the sequence of phonemes of the phonetic index;
determining at least one phonetic search criterion in association with a search query wherein the phonetic search criterion corresponds to phonetic content;
identifying a matching portion of the phonetic index that meets the at least one phonetic search criterion; and
taking an action in response to identifying the matching portion of the phonetic index.
21. The method of claim 20 further comprising determining source criteria identifying at least one source of the plurality of sources in association with the search query, and wherein identifying the matching portion of the phonetic index comprises identifying the matching portion of the phonetic index that meets the at least one phonetic search criterion and the source criteria in light of the source information, such that the matching portion of the phonetic index corresponds to a source identified by the source criteria.
22. The method of claim 20 wherein the action comprises providing a notification that identifies the matching portion of the phonetic index.
23. The method of claim 20 wherein the phonetic search criterion comprises a string of phonemes.
24. The method of claim 20 further comprising receiving a search query comprising at least one keyword, and wherein determining the at least one phonetic search criterion comprises translating the at least one keyword into a string of phonemes that is phonetically equivalent to the at least one keyword to provide the at least one phonetic search criterion.
25. The method of claim 20 wherein the audio segment is a telephony audio signal of a conference call, and the different speech segments correspond to speech from different telephony sources or telephony parties associated with the conference call.
26. The method of claim 20 wherein the audio segment is a telephony audio signal and the different speech segments correspond to speech from different telephony sources or telephony parties.
27. The method of claim 26 wherein the telephony audio signal is an output of an audio conference bridge and the different speech segments correspond to speech from the different telephony sources or the telephony parties who are connected by the audio conference bridge.
28. The method of claim 26 wherein taking the action comprises providing an alert to at least one of the different telephony sources or the telephony parties.
29. The method of claim 26 wherein taking the action comprises providing an alert to a person or entity other than the different telephony sources or the telephony parties.
30. The method of claim 20 wherein taking the action comprises identifying a portion of the audio content that corresponds to the matching portion of the phonetic index.
31. A system comprising:
at least one communication interface; and
a control system associated with the at least one communication interface and adapted to:
access an audio segment comprising speech content from a plurality of sources wherein different speech segments of the speech content correspond to different ones of the plurality of sources;
parse the speech content into a sequence of phonemes wherein each phoneme corresponds to a basic unit of speech of the speech content;
generate a phonetic index from the sequence of phonemes; and
associate a corresponding one of the plurality of sources with phonemes in the sequence of phonemes of the phonetic index such that a source associated with a given phoneme or group of phonemes in the sequence of phonemes of the phonetic index is ascertainable.
US12/249,451 2008-10-10 2008-10-10 Associating source information with phonetic indices Active 2031-08-31 US8301447B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US12/249,451 US8301447B2 (en) 2008-10-10 2008-10-10 Associating source information with phonetic indices
PCT/IB2009/007074 WO2010041131A1 (en) 2008-10-10 2009-10-08 Associating source information with phonetic indices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US12/249,451 US8301447B2 (en) 2008-10-10 2008-10-10 Associating source information with phonetic indices

Publications (2)

Publication Number Publication Date
US20100094630A1 true US20100094630A1 (en) 2010-04-15
US8301447B2 US8301447B2 (en) 2012-10-30

Family

ID=42099701

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/249,451 Active 2031-08-31 US8301447B2 (en) 2008-10-10 2008-10-10 Associating source information with phonetic indices

Country Status (2)

Country Link
US (1) US8301447B2 (en)
WO (1) WO2010041131A1 (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161311A1 (en) * 2008-12-19 2010-06-24 Massuh Lucas A Method, apparatus and system for location assisted translation
US20100223056A1 (en) * 2009-02-27 2010-09-02 Autonomy Corporation Ltd. Various apparatus and methods for a speech recognition system
US20110137638A1 (en) * 2009-12-04 2011-06-09 Gm Global Technology Operations, Inc. Robust speech recognition based on spelling with phonetic letter families
US20110153768A1 (en) * 2009-12-23 2011-06-23 International Business Machines Corporation E-meeting presentation relevance alerts
DE102011118780A1 (en) 2010-12-17 2012-06-21 Avaya Inc. PROCESS AND SYSTEM FOR CREATING A COOPERATION TIME AXIS ILLUSTRATING APPLICATION ARTICLES IN THE CONTEXT
US20130060849A1 (en) * 2011-09-02 2013-03-07 International Business Machines Corporation Injecting content in collaboration sessions
EP2568683A2 (en) 2011-09-08 2013-03-13 Avaya Inc. Methods, apparatuses, and computer-readable media for initiating an application for participants of a conference
US20130226930A1 (en) * 2012-02-29 2013-08-29 Telefonaktiebolaget L M Ericsson (Publ) Apparatus and Methods For Indexing Multimedia Content
US9179002B2 (en) 2011-08-08 2015-11-03 Avaya Inc. System and method for initiating online social interactions based on conference call participation
US9514220B1 (en) * 2012-10-19 2016-12-06 Google Inc. Generating content placement criteria based on a search query
WO2017055879A1 (en) * 2015-10-01 2017-04-06 Chase Information Technology Services Limited System and method for preserving privacy of data in the cloud
US9633015B2 (en) 2012-07-26 2017-04-25 Telefonaktiebolaget Lm Ericsson (Publ) Apparatus and methods for user generated content indexing
US9929869B2 (en) 2011-10-26 2018-03-27 Avaya Inc. Methods, apparatuses, and computer-readable media for providing a collaboration license to an application for participant user device(s) participating in an on-line collaboration
US10019995B1 (en) 2011-03-01 2018-07-10 Alice J. Stiebel Methods and systems for language learning based on a series of pitch patterns
US10141010B1 (en) * 2015-10-01 2018-11-27 Google Llc Automatic censoring of objectionable song lyrics in audio
US20180342235A1 (en) * 2017-05-24 2018-11-29 Verbit Software Ltd. System and method for segmenting audio files for transcription
US10289810B2 (en) 2013-08-29 2019-05-14 Telefonaktiebolaget Lm Ericsson (Publ) Method, content owner device, computer program, and computer program product for distributing content items to authorized users
US10311038B2 (en) 2013-08-29 2019-06-04 Telefonaktiebolaget Lm Ericsson (Publ) Methods, computer program, computer program product and indexing systems for indexing or updating index
US10445367B2 (en) 2013-05-14 2019-10-15 Telefonaktiebolaget Lm Ericsson (Publ) Search engine for textual content and non-textual content
CN111147444A (en) * 2019-11-20 2020-05-12 维沃移动通信有限公司 Interaction method and electronic equipment
CN111383659A (en) * 2018-12-28 2020-07-07 广州市百果园网络科技有限公司 Distributed voice monitoring method, device, system, storage medium and equipment
US11062615B1 (en) 2011-03-01 2021-07-13 Intelligibility Training LLC Methods and systems for remote language learning in a pandemic-aware world

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101483433B1 (en) * 2013-03-28 2015-01-16 (주)이스트소프트 System and Method for Spelling Correction of Misspelled Keyword
US10943580B2 (en) * 2018-05-11 2021-03-09 International Business Machines Corporation Phonological clustering

Patent Citations (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5930755A (en) * 1994-03-11 1999-07-27 Apple Computer, Inc. Utilization of a recorded sound sample as a voice source in a speech synthesizer
US20020080927A1 (en) * 1996-11-14 2002-06-27 Uppaluru Premkumar V. System and method for providing and using universally accessible voice and speech data files
US6163769A (en) * 1997-10-02 2000-12-19 Microsoft Corporation Text-to-speech using clustered context-dependent phoneme-based units
US6073094A (en) * 1998-06-02 2000-06-06 Motorola Voice compression by phoneme recognition and communication of phoneme indexes and voice features
US20040111266A1 (en) * 1998-11-13 2004-06-10 Geert Coorman Speech synthesis using concatenation of speech waveforms
US20030125954A1 (en) * 1999-09-28 2003-07-03 Bradley James Frederick System and method at a conference call bridge server for identifying speakers in a conference call
US7263484B1 (en) * 2000-03-04 2007-08-28 Georgia Tech Research Corporation Phonetic searching
US20020052870A1 (en) * 2000-06-21 2002-05-02 Charlesworth Jason Peter Andrew Indexing method and apparatus
US20020040296A1 (en) * 2000-08-16 2002-04-04 Anne Kienappel Phoneme assigning method
US7668718B2 (en) * 2001-07-17 2010-02-23 Custom Speech Usa, Inc. Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile
US20060149558A1 (en) * 2001-07-17 2006-07-06 Jonathan Kahn Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile
US20040111271A1 (en) * 2001-12-10 2004-06-10 Steve Tischer Method and system for customizing voice translation of text to speech
US20030125945A1 (en) * 2001-12-14 2003-07-03 Sean Doyle Automatically improving a voice recognition system
US20040215449A1 (en) * 2002-06-28 2004-10-28 Philippe Roy Multi-phoneme streamer and knowledge representation speech recognition system and method
US7509258B1 (en) * 2002-06-28 2009-03-24 Conceptual Speech Llc Phonetic, syntactic and conceptual analysis driven speech recognition system and method
US20050159953A1 (en) * 2004-01-15 2005-07-21 Microsoft Corporation Phonetic fragment search in speech data
US20060206324A1 (en) * 2005-02-05 2006-09-14 Aurix Limited Methods and apparatus relating to searching of spoken audio data
US20070203702A1 (en) * 2005-06-16 2007-08-30 Yoshifumi Hirose Speech synthesizer, speech synthesizing method, and program
US20080071542A1 (en) * 2006-09-19 2008-03-20 Ke Yu Methods, systems, and products for indexing content
US20080082341A1 (en) * 2006-09-29 2008-04-03 Blair Christopher D Automated Utterance Search
US20080162125A1 (en) * 2006-12-28 2008-07-03 Motorola, Inc. Method and apparatus for language independent voice indexing and searching
US20080270138A1 (en) * 2007-04-30 2008-10-30 Knight Michael J Audio content search engine
US20090043581A1 (en) * 2007-08-07 2009-02-12 Aurix Limited Methods and apparatus relating to searching of spoken audio data

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9323854B2 (en) * 2008-12-19 2016-04-26 Intel Corporation Method, apparatus and system for location assisted translation
US20100161311A1 (en) * 2008-12-19 2010-06-24 Massuh Lucas A Method, apparatus and system for location assisted translation
US9646603B2 (en) * 2009-02-27 2017-05-09 Longsand Limited Various apparatus and methods for a speech recognition system
US20100223056A1 (en) * 2009-02-27 2010-09-02 Autonomy Corporation Ltd. Various apparatus and methods for a speech recognition system
US20110137638A1 (en) * 2009-12-04 2011-06-09 Gm Global Technology Operations, Inc. Robust speech recognition based on spelling with phonetic letter families
US8195456B2 (en) * 2009-12-04 2012-06-05 GM Global Technology Operations LLC Robust speech recognition based on spelling with phonetic letter families
US20110153768A1 (en) * 2009-12-23 2011-06-23 International Business Machines Corporation E-meeting presentation relevance alerts
DE102011118780A1 (en) 2010-12-17 2012-06-21 Avaya Inc. Method and system for creating a collaboration timeline illustrating application artifacts in context
US10565997B1 (en) 2011-03-01 2020-02-18 Alice J. Stiebel Methods and systems for teaching a hebrew bible trope lesson
US11380334B1 (en) 2011-03-01 2022-07-05 Intelligible English LLC Methods and systems for interactive online language learning in a pandemic-aware world
US11062615B1 (en) 2011-03-01 2021-07-13 Intelligibility Training LLC Methods and systems for remote language learning in a pandemic-aware world
US10019995B1 (en) 2011-03-01 2018-07-10 Alice J. Stiebel Methods and systems for language learning based on a series of pitch patterns
US9179002B2 (en) 2011-08-08 2015-11-03 Avaya Inc. System and method for initiating online social interactions based on conference call participation
US20130060849A1 (en) * 2011-09-02 2013-03-07 International Business Machines Corporation Injecting content in collaboration sessions
US9853824B2 (en) * 2011-09-02 2017-12-26 International Business Machines Corporation Injecting content in collaboration sessions
US9584558B2 (en) 2011-09-08 2017-02-28 Avaya Inc. Methods, apparatuses, and computer-readable media for initiating an application for participants of a conference
EP2568683A2 (en) 2011-09-08 2013-03-13 Avaya Inc. Methods, apparatuses, and computer-readable media for initiating an application for participants of a conference
US9929869B2 (en) 2011-10-26 2018-03-27 Avaya Inc. Methods, apparatuses, and computer-readable media for providing a collaboration license to an application for participant user device(s) participating in an on-line collaboration
US20130226930A1 (en) * 2012-02-29 2013-08-29 Telefonaktiebolaget L M Ericsson (Publ) Apparatus and Methods For Indexing Multimedia Content
US9846696B2 (en) * 2012-02-29 2017-12-19 Telefonaktiebolaget Lm Ericsson (Publ) Apparatus and methods for indexing multimedia content
US9633015B2 (en) 2012-07-26 2017-04-25 Telefonaktiebolaget Lm Ericsson (Publ) Apparatus and methods for user generated content indexing
US9514220B1 (en) * 2012-10-19 2016-12-06 Google Inc. Generating content placement criteria based on a search query
US10445367B2 (en) 2013-05-14 2019-10-15 Telefonaktiebolaget Lm Ericsson (Publ) Search engine for textual content and non-textual content
US10289810B2 (en) 2013-08-29 2019-05-14 Telefonaktiebolaget Lm Ericsson (Publ) Method, content owner device, computer program, and computer program product for distributing content items to authorized users
US10311038B2 (en) 2013-08-29 2019-06-04 Telefonaktiebolaget Lm Ericsson (Publ) Methods, computer program, computer program product and indexing systems for indexing or updating index
US10141010B1 (en) * 2015-10-01 2018-11-27 Google Llc Automatic censoring of objectionable song lyrics in audio
WO2017055879A1 (en) * 2015-10-01 2017-04-06 Chase Information Technology Services Limited System and method for preserving privacy of data in the cloud
US20180342235A1 (en) * 2017-05-24 2018-11-29 Verbit Software Ltd. System and method for segmenting audio files for transcription
US10522135B2 (en) * 2017-05-24 2019-12-31 Verbit Software Ltd. System and method for segmenting audio files for transcription
CN111383659A (en) * 2018-12-28 2020-07-07 广州市百果园网络科技有限公司 Distributed voice monitoring method, device, system, storage medium and equipment
CN111147444A (en) * 2019-11-20 2020-05-12 维沃移动通信有限公司 Interaction method and electronic equipment

Also Published As

Publication number Publication date
WO2010041131A8 (en) 2011-05-12
US8301447B2 (en) 2012-10-30
WO2010041131A1 (en) 2010-04-15

Similar Documents

Publication Publication Date Title
US8301447B2 (en) Associating source information with phonetic indices
US11580991B2 (en) Speaker based anaphora resolution
US11232808B2 (en) Adjusting speed of human speech playback
US8386265B2 (en) Language translation with emotion metadata
US9953636B2 (en) Automatic language model update
US9031839B2 (en) Conference transcription based on conference data
US7788095B2 (en) Method and apparatus for fast search in call-center monitoring
US8694317B2 (en) Methods and apparatus relating to searching of spoken audio data
US8676586B2 (en) Method and apparatus for interaction or discourse analytics
US9245523B2 (en) Method and apparatus for expansion of search queries on large vocabulary continuous speech recognition transcripts
US20110004473A1 (en) Apparatus and method for enhanced speech recognition
JPWO2008114811A1 (en) Information search system, information search method, and information search program
US9311914B2 (en) Method and apparatus for enhanced phonetic indexing and search
JP6517419B1 (en) Dialogue summary generation apparatus, dialogue summary generation method and program
US20210232776A1 (en) Method for recording and outputting conversation between multiple parties using speech recognition technology, and device therefor
JP6513869B1 (en) Dialogue summary generation apparatus, dialogue summary generation method and program
KR102462219B1 (en) Method of Automatically Generating Meeting Minutes Using Speaker Diarization Technology
CN109616116B (en) Communication system and communication method thereof
CN109410945A (en) Can information alert video-meeting method and system
JP5713782B2 (en) Information processing apparatus, information processing method, and program
Hansen et al. Audio stream phrase recognition for a national gallery of the spoken word: "one small step".
US7860715B2 (en) Method, system and program product for training and use of a voice recognition application
Whetten et al. Evaluating Automatic Speech Recognition and Natural Language Understanding in an Incremental Setting
EP1688915A1 (en) Methods and apparatus relating to searching of spoken audio data
Adma presented to the University of Waterloo

Legal Events

Date Code Title Description
AS Assignment

Owner name: NORTEL NETWORKS LIMITED, CANADA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:YOAKUM, JOHN H.;REEL/FRAME:021667/0964

Effective date: 20081010

AS Assignment

Owner name: NORTEL NETWORKS LIMITED, CANADA

Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNOR NAMES PREVIOUSLY RECORDED ON REEL 021667 FRAME 0964. ASSIGNOR(S) HEREBY CONFIRMS THE FIRST-NAMED ASSIGNOR AS YOAKUM, JOHN H.;ASSIGNORS:YOAKUM, JOHN H.;WHYNOT, STEPHEN;SIGNING DATES FROM 20081021 TO 20081105;REEL/FRAME:021804/0894

AS Assignment

Owner name: CITIBANK, N.A., AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023892/0500

Effective date: 20100129

AS Assignment

Owner name: CITICORP USA, INC., AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC.;REEL/FRAME:023905/0001

Effective date: 20100129

AS Assignment

Owner name: AVAYA INC., NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:NORTEL NETWORKS LIMITED;REEL/FRAME:023998/0878

Effective date: 20091218

AS Assignment

Owner name: BANK OF NEW YORK MELLON TRUST, NA, AS NOTES COLLATERAL AGENT, THE, PENNSYLVANIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA INC., A DELAWARE CORPORATION;REEL/FRAME:025863/0535

Effective date: 20110211

STCF Information on status: patent grant

Free format text: PATENTED CASE

AS Assignment

Owner name: THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A., PENNSYLVANIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA, INC.;REEL/FRAME:029608/0256

Effective date: 20121221

AS Assignment

Owner name: BANK OF NEW YORK MELLON TRUST COMPANY, N.A., THE, PENNSYLVANIA

Free format text: SECURITY AGREEMENT;ASSIGNOR:AVAYA, INC.;REEL/FRAME:030083/0639

Effective date: 20130307

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: CITIBANK, N.A., AS ADMINISTRATIVE AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNORS:AVAYA INC.;AVAYA INTEGRATED CABINET SOLUTIONS INC.;OCTEL COMMUNICATIONS CORPORATION;AND OTHERS;REEL/FRAME:041576/0001

Effective date: 20170124

AS Assignment

Owner name: OCTEL COMMUNICATIONS LLC (FORMERLY KNOWN AS OCTEL COMMUNICATIONS CORPORATION), CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 041576/0001;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:044893/0531

Effective date: 20171128

Owner name: AVAYA INTEGRATED CABINET SOLUTIONS INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 041576/0001;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:044893/0531

Effective date: 20171128

Owner name: AVAYA INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 029608/0256;ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A.;REEL/FRAME:044891/0801

Effective date: 20171128

Owner name: AVAYA INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 025863/0535;ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST, NA;REEL/FRAME:044892/0001

Effective date: 20171128

Owner name: AVAYA INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 023892/0500;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:044891/0564

Effective date: 20171128

Owner name: AVAYA INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 041576/0001;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:044893/0531

Effective date: 20171128

Owner name: VPNET TECHNOLOGIES, INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 041576/0001;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:044893/0531

Effective date: 20171128

Owner name: AVAYA INC., CALIFORNIA

Free format text: BANKRUPTCY COURT ORDER RELEASING ALL LIENS INCLUDING THE SECURITY INTEREST RECORDED AT REEL/FRAME 030083/0639;ASSIGNOR:THE BANK OF NEW YORK MELLON TRUST COMPANY, N.A.;REEL/FRAME:045012/0666

Effective date: 20171128

AS Assignment

Owner name: GOLDMAN SACHS BANK USA, AS COLLATERAL AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNORS:AVAYA INC.;AVAYA INTEGRATED CABINET SOLUTIONS LLC;OCTEL COMMUNICATIONS LLC;AND OTHERS;REEL/FRAME:045034/0001

Effective date: 20171215

AS Assignment

Owner name: AVAYA, INC., CALIFORNIA

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITICORP USA, INC.;REEL/FRAME:045045/0564

Effective date: 20171215

Owner name: SIERRA HOLDINGS CORP., NEW JERSEY

Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:CITICORP USA, INC.;REEL/FRAME:045045/0564

Effective date: 20171215

AS Assignment

Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK

Free format text: SECURITY INTEREST;ASSIGNORS:AVAYA INC.;AVAYA INTEGRATED CABINET SOLUTIONS LLC;OCTEL COMMUNICATIONS LLC;AND OTHERS;REEL/FRAME:045124/0026

Effective date: 20171215

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8

AS Assignment

Owner name: WILMINGTON TRUST, NATIONAL ASSOCIATION, MINNESOTA

Free format text: SECURITY INTEREST;ASSIGNORS:AVAYA INC.;AVAYA MANAGEMENT L.P.;INTELLISIST, INC.;AND OTHERS;REEL/FRAME:053955/0436

Effective date: 20200925

AS Assignment

Owner name: WILMINGTON TRUST, NATIONAL ASSOCIATION, AS COLLATERAL AGENT, DELAWARE

Free format text: INTELLECTUAL PROPERTY SECURITY AGREEMENT;ASSIGNORS:AVAYA INC.;INTELLISIST, INC.;AVAYA MANAGEMENT L.P.;AND OTHERS;REEL/FRAME:061087/0386

Effective date: 20220712

AS Assignment

Owner name: AVAYA INTEGRATED CABINET SOLUTIONS LLC, NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS AT REEL 45124/FRAME 0026;ASSIGNOR:CITIBANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:063457/0001

Effective date: 20230403

Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS AT REEL 45124/FRAME 0026;ASSIGNOR:CITIBANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:063457/0001

Effective date: 20230403

Owner name: AVAYA INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS AT REEL 45124/FRAME 0026;ASSIGNOR:CITIBANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:063457/0001

Effective date: 20230403

Owner name: AVAYA HOLDINGS CORP., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS AT REEL 45124/FRAME 0026;ASSIGNOR:CITIBANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:063457/0001

Effective date: 20230403

AS Assignment

Owner name: WILMINGTON SAVINGS FUND SOCIETY, FSB (COLLATERAL AGENT), DELAWARE

Free format text: INTELLECTUAL PROPERTY SECURITY AGREEMENT;ASSIGNORS:AVAYA MANAGEMENT L.P.;AVAYA INC.;INTELLISIST, INC.;AND OTHERS;REEL/FRAME:063742/0001

Effective date: 20230501

AS Assignment

Owner name: CITIBANK, N.A., AS COLLATERAL AGENT, NEW YORK

Free format text: INTELLECTUAL PROPERTY SECURITY AGREEMENT;ASSIGNORS:AVAYA INC.;AVAYA MANAGEMENT L.P.;INTELLISIST, INC.;REEL/FRAME:063542/0662

Effective date: 20230501

AS Assignment

Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: CAAS TECHNOLOGIES, LLC, NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: HYPERQUALITY II, LLC, NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: HYPERQUALITY, INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: ZANG, INC. (FORMER NAME OF AVAYA CLOUD INC.), NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: VPNET TECHNOLOGIES, INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: OCTEL COMMUNICATIONS LLC, NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: AVAYA INTEGRATED CABINET SOLUTIONS LLC, NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: INTELLISIST, INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: AVAYA INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 045034/0001);ASSIGNOR:GOLDMAN SACHS BANK USA., AS COLLATERAL AGENT;REEL/FRAME:063779/0622

Effective date: 20230501

Owner name: AVAYA INTEGRATED CABINET SOLUTIONS LLC, NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 53955/0436);ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT;REEL/FRAME:063705/0023

Effective date: 20230501

Owner name: INTELLISIST, INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 53955/0436);ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT;REEL/FRAME:063705/0023

Effective date: 20230501

Owner name: AVAYA INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 53955/0436);ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT;REEL/FRAME:063705/0023

Effective date: 20230501

Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 53955/0436);ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT;REEL/FRAME:063705/0023

Effective date: 20230501

Owner name: AVAYA INTEGRATED CABINET SOLUTIONS LLC, NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 61087/0386);ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT;REEL/FRAME:063690/0359

Effective date: 20230501

Owner name: INTELLISIST, INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 61087/0386);ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT;REEL/FRAME:063690/0359

Effective date: 20230501

Owner name: AVAYA INC., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 61087/0386);ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT;REEL/FRAME:063690/0359

Effective date: 20230501

Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY

Free format text: RELEASE OF SECURITY INTEREST IN PATENTS (REEL/FRAME 61087/0386);ASSIGNOR:WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT;REEL/FRAME:063690/0359

Effective date: 20230501

AS Assignment

Owner name: AVAYA LLC, DELAWARE

Free format text: (SECURITY INTEREST) GRANTOR'S NAME CHANGE;ASSIGNOR:AVAYA INC.;REEL/FRAME:065019/0231

Effective date: 20230501

AS Assignment

Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY

Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:WILMINGTON SAVINGS FUND SOCIETY, FSB;REEL/FRAME:066894/0227

Effective date: 20240325

Owner name: AVAYA LLC, DELAWARE

Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:WILMINGTON SAVINGS FUND SOCIETY, FSB;REEL/FRAME:066894/0227

Effective date: 20240325

Owner name: AVAYA MANAGEMENT L.P., NEW JERSEY

Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:066894/0117

Effective date: 20240325

Owner name: AVAYA LLC, DELAWARE

Free format text: INTELLECTUAL PROPERTY RELEASE AND REASSIGNMENT;ASSIGNOR:CITIBANK, N.A.;REEL/FRAME:066894/0117

Effective date: 20240325

AS Assignment

Owner name: ARLINGTON TECHNOLOGIES, LLC, TEXAS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:AVAYA LLC;REEL/FRAME:067022/0780

Effective date: 20240329