TWI353585B - Computer-implemented method,apparatus, and compute - Google Patents

Publication number
TWI353585B
Authority
TW
Taiwan
Prior art keywords
grammar
voice
field
fill
sound
Prior art date
Application number
TW094131515A
Other languages
English (en)
Other versions
TW200630957A (en)
Inventor
Soonthorn Ativanichayaphong
Charles W Cross Jr
Gerald M Mccobb
Original Assignee
Nuance Communications Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nuance Communications Inc
Publication of TW200630957A
Application granted
Publication of TWI353585B

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/174 Form filling; Merging
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/19 Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
    • G10L15/193 Formal grammars, e.g. finite state automata, context free grammars or word networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/487 Arrangements for providing information services, e.g. recorded voice services or time announcements
    • H04M3/493 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals
    • H04M3/4938 Interactive information services, e.g. directory enquiries; Arrangements therefor, e.g. interactive voice response [IVR] systems or voice portals comprising a voice browser which renders and interprets, e.g. VoiceXML

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)
  • User Interface Of Digital Computer (AREA)

Description

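IX. Description of the Invention:

【Technical Field of the Invention】

The present invention relates to multimodal browsers and voice servers and, more particularly, to voice-enabled multimodal applications for multimodal browsers and voice servers.

【Prior Art】

Recent developments, many of them founded on the data-description eXtensible Markup Language (XML), have given rise to new Web-based applications that include multimodal interfaces or browsers. A multimodal browser allows a user to access multimodal content, which can be both graphical and audible. Traditionally, users accessed Web content using graphical input from a keyboard or a manually directed screen pointer. Later, users were also able to use speech input. More recently, users have been able to access Web content through multimodal interfaces that permit both graphical and speech input.

One type of multimodal browser is provided by the eXtensible Hypertext Markup Language (XHTML or XML) + Voice eXtensible Markup Language (VXML), denoted more succinctly as the X+V markup language. The X+V markup language extends the traditional graphical browser to include spoken interaction. It integrates XHTML and XML-Events technologies with XML vocabularies developed as part of the World Wide Web Consortium (W3C) Speech Interface Framework. The integration includes voice modules that support speech synthesis, speech dialogs, command and control applications, and speech grammars. Voice handlers can be attached to XHTML elements and respond to specific Document Object Model (DOM) events of a visual browser.

Despite these developments, conventionally implemented multimodal interfaces lack a number of capabilities that users want, such as a user-friendly way to fill form fields based on speech utterances. Forms that require user input have become commonplace. For example, a user typically must complete a form before being granted access privileges to a secure Web site. Entering form information can be tedious, time consuming, and even frustrating. This is especially so for a user who repeatedly accesses content from various Web sites, each of which requires form-based entry of user data before allowing access. Moreover, the user may be accessing Web content with a device that has limited or inconvenient input options. For example, a telephone, mobile phone, personal digital assistant (PDA), or similar device often includes only a limited array of keys, a very small keypad, or nothing other than a voice input mechanism. It is therefore desirable to extend multimodal browsers to provide an efficient way of voice-enabled autofilling of form fields.

【Summary of the Invention】

The present invention provides a computer-implemented method of automatically filling a form field in response to a speech utterance. The method can include the step of generating at least one grammar corresponding to the form field. The grammar can be based on a user profile and can include a semantic interpretation string. The method can further include the step of creating an event. The event can be based on the at least one grammar and can be responsive to the speech utterance. The event can cause the form field to be filled with data corresponding to the user profile.

According to another embodiment, the present invention provides a computer system for automatically filling a form field in response to a speech utterance. The system can include a grammar generating module that generates at least one grammar corresponding to the form field. The grammar can be based on a user profile and can include a semantic interpretation string. The computer system can also include an event module that creates an event based on the at least one grammar and responsive to the speech utterance. The event can cause the form field to be filled with data corresponding to the user profile.

【Embodiments】

FIG. 1 is a schematic diagram illustrating a multimodal communications environment 100 in which a system 200 for automatically filling form fields in response to speech utterances can be used according to the present invention. As illustrated, the multimodal communications environment 100 can include speech processing resources such as an automatic speech recognition (ASR) engine 140 and a text-to-speech (TTS) engine 145, each of which can electronically communicate with the system 200 via a communications network 150. The communications network 150 can include, but is not limited to, a local area network, a wide area network, the public switched telephone network, a wireless or mobile communications network, or the Internet. Illustratively, the system 200 can also electronically communicate with a computer system 155 and a telephone 160 via another or the same communications network 150.

It will be readily apparent from the following description that the illustrated multimodal communications environment 100 is but one type of multimodal communications environment in which the system 200 can be advantageously employed. Alternative multimodal communications environments can, for example, include various subsets of the different components illustratively shown.

Referring additionally to FIG. 2, the system 200 illustratively includes an application program 205 and an application program interface (API) 210 through which the application program is linked to an interpreter 211. Within the interpreter 211, the system 200 also illustratively includes a grammar generating module 215 and an event module 220 that are connected to the application program 205 via the API 210. The grammar generating module 215 and the event module 220 can run in the same address space as the application program 205. The system also includes a speech services interface 221 that connects to the speech browser. More generally, the speech services interface 221 can connect to any of a variety of audio resources (not shown), such as an audio subsystem, and speech processing resources such as an automatic speech recognition (ASR) engine and a text-to-speech (TTS) engine. The system 200 can therefore function as a server that hosts one or more applications such as a voice browser, an interactive voice response system, a voice server, or another type of application. For example, the application 205 can also function as a visual browser that is to be voice or speech enabled.

The system 200 additionally includes a parsing module 217 that parses a document written in VoiceXML and determines whether the document contains a synchronized voice field in a user profile domain. The term synchronized voice field is used herein to denote a form field that is filled by synchronizing speech and graphic inputs. As described below, this synchronization results in the form field being filled with graphic input in response to speech input. The term user profile domain is used herein to denote those form fields that are to be filled with data corresponding to a user profile, the user profile denoting, for example, personal data corresponding to the user. Such personal information can include the user's name, address, and phone number. Other types of data can alternatively be contained in the user profile and be the subject of voice-enabled autofill without changing the invention as described herein.

The user profile herein illustratively includes a key, a label phrase, and a value, as shown by the representative user profile scheme in Table 1.

    Key             Label phrase       Value
    "first name"    "my first name"    "Gerald"
    "last name"     "my last name"     "McCobb"
    "address"       "my address"       "8051 Congress Avenue"

    Table 1

The API 210 provides a VoiceXML field grammar that implements a synchronization element, such as the X+V <sync> element. The synchronization element implemented by the VoiceXML field grammar synchronizes the value property of a graphic input control, such as an XHTML input control, with the synchronized voice field. As noted above, the synchronized voice field herein defines the form field that is to be, or has been, automatically filled by the system 200.

The grammar generating module 215 can comprise a VoiceXML interpreter. As illustrated in FIG. 3, the grammar generating module 215 obtains from the user profile the label phrase and the corresponding value that is to be auto-filled into a particular one of the identified form fields. The grammar generating module 215 generates an additional grammar, denoted an autofill grammar, that is based on the label phrase and has a semantic interpretation (SI) string or tag containing the value corresponding to the label phrase. The grammar generating module performs this operation for each form field identified by the parsing module 217 as a synchronized voice field in the user profile domain. The grammar generating module 215 thus generates, for the synchronized voice field of each such form field, an autofill grammar that is based on the user profile and includes an SI string or tag.

When the form interpretation algorithm (FIA) visits one of the form fields so identified, the grammar generating module 215 enables the autofill grammar along with the VoiceXML field grammar provided by the API 210. When the autofill grammar and the VoiceXML field grammar are enabled, the event module 220 creates an autofill event based on the grammars. The event is configured to respond to a speech utterance.

The autofill event, in response to the speech utterance, causes the SI string or tag to be executed so that the result of the execution is the value corresponding to the label phrase. The autofill event causes the result to be propagated, and the synchronization element implemented by the VoiceXML field grammar fills the form field of the Web page with the result of executing the SI string or tag. Accordingly, the event causes the form field to be filled, in response to the speech utterance, with the value contained in the semantic interpretation string.

FIG. 4 is a schematic diagram of a system 300 for automatically filling a form field in response to a speech utterance according to another embodiment of the present invention. The system includes an application program 305 and an interpreter 312, the application program and the interpreter being connected via an API 310. The system also includes a grammar generating module 315 and an event module 325. As illustrated, the grammar generating module 315 and the event module 325 are part of the interpreter 312.

The application 305 can generate a VoiceXML fragment 330 and pass it to the interpreter 312. The VoiceXML fragment 330 can specify a grammar that can be used to process received speech utterances. Where multiple devices are enabled for multimodal interaction, a configuration file 320 can optionally be included to specify one or more different devices, such as a telephone, mobile phone, home security system, dashboard audio/communication system, computer system, or portable computer system. Within the configuration file 320, each device can be assigned an identifier that uniquely identifies it. In one embodiment, prior to registering the VoiceXML fragment 330 with the interpreter 312, the application 305 can access the configuration file 320 to obtain the identity of the device being used.

The system 300 utilizes a command, control, and content navigation markup language (C3N), in which the application 305 registers with the interpreter 312 a VoiceXML link, such as a VoiceXML <link>, that is based on C3N grammars. Events generated by matching grammars in the link are propagated back to the application 305, as shown. By specifying one or more link elements that are based on C3N grammars and denoted C3N link grammars, speech inputs to the application 305 can be matched. That is, the interpreter 312 can match speech inputs received from the application 305 against the C3N link grammars. When a match is detected, the interpreter 312 can generate one or more events that are passed back to the application 305.

More particularly, as illustrated in FIG. 5, when autofill is required, the grammar generating module 315 generates a C3N link grammar. The C3N link grammar is based upon the user profile. The application 305 then instructs the interpreter 312 to add the C3N link grammar. The C3N link grammar causes the event module 325 to create an autofill event. When executed in response to a speech utterance, the event causes the form field to be filled with graphic input from the user profile.

For example, assuming the same user profile described above, the following VoiceXML-based application uses the key, label phrase, and value elements to generate a grammar according to this embodiment: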
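    <vxml:link eventexpr="application.lastresult$.interpretation.c3n">
      <grammar>
        <![CDATA[
          #JSGF V1.0;
          grammar user_profile;
          public <user_profile> = Browser fill [my]
            ( first name {$.c3n = "command.autofill.firstname";}
            | last name {$.c3n = "command.autofill.lastname";}
            | street address {$.c3n = "command.autofill.address";}
            );
        ]]>
      </grammar>
    </vxml:link>

The grammar is built using the label phrases and the corresponding keys, which are included in a portion of the corresponding SI string or tag. The grammar is illustratively constructed so as to match a phrase such as "fill my street address." In response to the speech utterance, the VoiceXML link causes an event to be propagated. The system 300 responds to the event, which is interpreted as an autofill command, by searching the user profile to obtain the value for the address. The result of the response is that the form field is automatically filled with the value "8051 Congress Avenue."

FIG. 6 provides a flowchart illustrating a method 400 according to yet another embodiment of the present invention. The method 400 begins at step 410, in which a document is parsed to determine whether the X+V document contains a synchronized voice field in a user profile domain. At step 412, a VoiceXML field grammar is generated for each field.

At step 414, an autofill grammar is generated based on a label phrase and corresponding value of the user profile, the value being contained in an SI string or tag. At step 416, the VoiceXML field grammar and the autofill grammar are enabled. In response to a speech utterance, the SI string is executed at step 418 so that the result is the value contained in the SI string or tag. At step 420, a visual field is auto-filled with the result.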
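By way of illustration only, steps 410 through 420 of method 400 can be sketched in Python as follows. The sketch is not part of the specification. The names `ProfileEntry`, `find_sync_fields`, `make_autofill_grammar`, and `autofill` are hypothetical, the JSGF-style rule syntax is an assumed rendering of the autofill grammar, the profile keys are assumed to double as form field names, and a plain string comparison stands in for the speech recognizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileEntry:
    key: str           # profile key, assumed here to double as the form field name
    label_phrase: str  # phrase the user speaks
    value: str         # data to place in the form field


# Data from Table 1; the key spellings are illustrative.
USER_PROFILE = [
    ProfileEntry("firstname", "my first name", "Gerald"),
    ProfileEntry("lastname", "my last name", "McCobb"),
    ProfileEntry("address", "my address", "8051 Congress Avenue"),
]


def find_sync_fields(field_names, profile):
    """Step 410 stand-in: keep the form fields that belong to the user profile domain."""
    keys = {entry.key for entry in profile}
    return [name for name in field_names if name in keys]


def make_autofill_grammar(entry):
    """Step 414: build a JSGF-style rule whose SI tag carries the profile value."""
    return f'public <{entry.key}> = {entry.label_phrase} {{$ = "{entry.value}";}};'


def autofill(form, field, utterance, profile):
    """Steps 416-420: match the utterance against the grammar enabled for `field`
    and, on a match, fill the visual field with the value from the SI tag."""
    spoken = " ".join(utterance.lower().split())
    for entry in profile:
        if entry.key == field and spoken == entry.label_phrase:
            form[field] = entry.value
            return True
    return False


form = {"firstname": "", "address": "", "comments": ""}
sync_fields = find_sync_fields(form, USER_PROFILE)
grammars = {e.key: make_autofill_grammar(e) for e in USER_PROFILE if e.key in sync_fields}
autofill(form, "address", "My address", USER_PROFILE)
print(grammars["address"])
print(form["address"])
```

In this sketch the value placed in the field comes entirely from the user profile, so the filled data need not appear anywhere in the spoken phrase.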
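FIG. 7 provides a flowchart illustrating a method 500 according to still a different embodiment of the present invention. In step 502, a link grammar having fields corresponding to the user profile is created. At step 504, an interpreter adds the link grammar. At step 506, the link responds to a speech utterance and generates an event when a field in the Web page has focus.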
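By way of illustration only, method 500 can be sketched in Python as follows. The sketch is not part of the specification. The function names `make_link_grammar`, `match_link`, and `handle_event` are hypothetical, the event names follow the `command.autofill.` pattern of the link grammar shown earlier, and simple word matching stands in for the interpreter's recognizer.

```python
# Profile keyed by event suffix: (spoken phrase, value). Phrases follow the
# link grammar shown earlier; the key spellings are illustrative.
USER_PROFILE = {
    "firstname": ("first name", "Gerald"),
    "lastname": ("last name", "McCobb"),
    "address": ("street address", "8051 Congress Avenue"),
}

EVENT_PREFIX = "command.autofill."


def make_link_grammar(profile):
    """Step 502: one alternative per profile key; each SI tag names an event."""
    alternatives = "\n  | ".join(
        f'{phrase} {{$.c3n = "{EVENT_PREFIX}{key}";}}'
        for key, (phrase, _value) in profile.items()
    )
    return (
        "#JSGF V1.0;\n"
        "grammar user_profile;\n"
        f"public <user_profile> = Browser fill [my]\n  ( {alternatives}\n  );"
    )


def match_link(utterance, profile):
    """Step 506 stand-in for the recognizer: return the event name, or None."""
    words = utterance.lower().split()
    if words[:2] != ["browser", "fill"]:
        return None
    rest = words[2:]
    if rest[:1] == ["my"]:  # "[my]" is optional in the grammar
        rest = rest[1:]
    for key, (phrase, _value) in profile.items():
        if rest == phrase.split():
            return EVENT_PREFIX + key
    return None


def handle_event(event, focused_field, form, profile):
    """Application side: treat the event as an autofill command and look up the value."""
    if event is None or not event.startswith(EVENT_PREFIX):
        return False
    key = event[len(EVENT_PREFIX):]
    if key not in profile:
        return False
    form[focused_field] = profile[key][1]
    return True


form = {"street": ""}
link_grammar = make_link_grammar(USER_PROFILE)  # step 504 would register this with the interpreter
event = match_link("Browser fill my street address", USER_PROFILE)
handle_event(event, "street", form, USER_PROFILE)
print(event)
print(form["street"])
```

Unlike the autofill grammar of method 400, the link grammar in this sketch carries only an event name in its SI tag, and the value is looked up in the user profile when the event is handled.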
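The present invention can be realized in hardware, software, or a combination of hardware and software. The present invention can be realized in a centralized fashion in one computer system, or in a distributed fashion in which different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software can be a general-purpose computer system with a computer program that, when loaded and executed, controls the computer system such that it carries out the methods described herein.

The present invention can also be embodied in a computer program product that comprises all the features enabling the implementation of the methods described herein and that, when loaded in a computer system, is able to carry out these methods. Computer program in the present context means any expression, in any language, code, or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code, or notation; b) reproduction in a different material form.

This invention can be embodied in other forms without departing from the spirit or essential attributes thereof. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.

【Brief Description of the Drawings】

FIG. 1 is a schematic diagram illustrating a multimodal communications environment in which a system according to one embodiment of the present invention can be used.
FIG. 2 is a schematic diagram of a system according to one embodiment of the present invention.
FIG. 3 is a schematic diagram of operations performed by the system of FIG. 2.
FIG. 4 is a schematic diagram of a system according to another embodiment of the present invention.
FIG. 5 is a schematic diagram of operations performed by the system of FIG. 4.
FIG. 6 is a flowchart illustrating a method according to one embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method according to another embodiment of the present invention.

【Description of Main Reference Numerals】

100  multimodal communications environment
140  automatic speech recognition engine
145  text-to-speech engine
150  communications network
155  computer system
160  telephone
200  voice-enabled autofill system
205  application program
210  API
211  interpreter
215  grammar generating module
217  parsing module
220  event module
221  speech services interface
300  voice-enabled autofill system
305  application program
310  API
312  interpreter
315  grammar generating module
320  configuration file
325  event module
330  VXML fragment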

Claims (17)

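Patent Application No. 094131515. Replacement copy of the Chinese-language claims (May of ROC year 100, i.e. 2011), as amended.

X. Claims:

1. A computer-implemented method of automatically filling a form field in response to a speech utterance in a multimodal communications environment having a Web browser that implements an XHTML+VXML (X+V) markup language, the method comprising:
parsing an X+V document to determine synchronized voice fields in a user profile domain, wherein a synchronized voice field refers to a form field that is filled by synchronizing speech and graphic inputs, and a user profile domain refers to form fields that are to be filled with data corresponding to a user profile;
for each synchronized voice field so determined, dynamically generating at run time at least one grammar corresponding to the form field, the at least one grammar being based on a user profile and comprising a semantic interpretation string; and
creating an autofill event based on the at least one grammar and responsive to the speech utterance, the autofill event causing the form field to be filled with data that corresponds to the user profile and includes at least a portion of data not in the speech utterance.

2. The method of claim 1, wherein the data filled into the form field is retrieved from a record of a table specific to the user profile, the record establishing an association between the data and the speech utterance.

3. The method of claim 1, wherein the data filled into the form field includes information other than information contained in a speech-to-text conversion of the speech utterance.

4. The method of claim 1, wherein generating at least one grammar comprises generating at least one grammar that defines a form field grammar corresponding to the synchronized voice form field.

5. The method of claim 4, wherein generating at least one grammar further comprises generating an autofill grammar based on a label phrase and a value contained in the semantic interpretation string.

6. The method of claim 5, wherein the autofill event causes the form field to be filled, in response to the speech utterance, with the value contained in the semantic interpretation string.

7. The method of claim 1, wherein generating at least one grammar comprises generating at least one of a voice command and control grammar and a content navigation grammar.

8. The method of claim 1, wherein the form field is a form field of a voice markup language document, and wherein the generating step comprises generating at least one grammar that defines a link grammar corresponding to the form field.

9. An apparatus for automatically filling a form field in response to a speech utterance in a multimodal communications environment having a Web browser that implements an XHTML+VXML (X+V) markup language, the apparatus comprising a combination of hardware and software that implements:
a parsing module for parsing an X+V markup language document to determine whether the X+V markup language document contains a synchronized voice field that refers to a form field filled by synchronizing speech and graphic inputs, wherein a user profile domain refers to form fields that are to be filled with data corresponding to a user profile;
a grammar generating module for dynamically generating at run time at least one grammar corresponding to each synchronized voice field so determined, the at least one grammar being based on a user profile and comprising a semantic interpretation string; and
an autofill event module for creating an autofill event based on the at least one grammar and responsive to the speech utterance, the autofill event causing the form field to be filled with data that corresponds to the user profile and includes at least a portion of data not in the speech utterance.

10. The apparatus of claim 9, wherein the grammar generating module comprises a voice markup language interpreter for generating at least one grammar that defines a form field grammar corresponding to the synchronized voice form field.

11. The apparatus of claim 10, wherein the voice markup language interpreter additionally generates an autofill grammar based on a label phrase and a value contained in the semantic interpretation string.

12. The apparatus of claim 11, wherein the autofill event causes the form field to be filled, in response to the speech utterance, with the value contained in the semantic interpretation string.

13. The apparatus of claim 9, wherein the grammar generating module comprises a browser configured to generate a voice markup language link grammar.

14. The apparatus of claim 13, wherein the autofill event is a voice markup language autofill event responsive to the link grammar, and the autofill event causes the browser to fill the form field with data corresponding to the user profile.

15. A computer-readable storage medium for use in a data communications network, the computer-readable storage medium comprising computer instructions for:
parsing an X+V document to determine synchronized voice fields in a user profile domain, wherein a synchronized voice field refers to a form field that is filled by synchronizing speech and graphic inputs, and a user profile domain refers to form fields that are to be filled with data corresponding to a user profile;
for each synchronized voice field so determined, dynamically generating at run time at least one grammar corresponding to the form field, the at least one grammar being based on a user profile and comprising a semantic interpretation string; and
creating an autofill event based on the at least one grammar and responsive to the speech utterance, the autofill event causing the form field to be filled with data that corresponds to the user profile and includes at least a portion of data not in the speech utterance.

16. The computer-readable storage medium of claim 15, wherein the data filled into the form field is retrieved from a record of a table specific to the user profile, the record establishing an association between the data and the speech utterance, and wherein the data filled into the form field includes information other than information contained in a speech-to-text conversion of the speech utterance.

17. The computer-readable storage medium of claim 15, wherein generating at least one grammar comprises a computer instruction for generating at least one grammar that defines a form field grammar corresponding to the synchronized voice form field.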
TW094131515A 2004-09-20 2005-09-13 Computer-implemented method,apparatus, and compute TWI353585B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/945,112 US7739117B2 (en) 2004-09-20 2004-09-20 Method and system for voice-enabled autofill

Publications (2)

Publication Number Publication Date
TW200630957A TW200630957A (en) 2006-09-01
TWI353585B TWI353585B (en) 2011-12-01

Family

ID=36075165

Family Applications (1)

Application Number Title Priority Date Filing Date
TW094131515A TWI353585B (en) 2004-09-20 2005-09-13 Computer-implemented method,apparatus, and compute

Country Status (3)

Country Link
US (2) US7739117B2 (zh)
CN (1) CN1752975B (zh)
TW (1) TWI353585B (zh)

Also Published As

Publication number Publication date
TW200630957A (en) 2006-09-01
CN1752975B (zh) 2011-07-06
US20060074652A1 (en) 2006-04-06
US20060064302A1 (en) 2006-03-23
US7739117B2 (en) 2010-06-15
CN1752975A (zh) 2006-03-29
US7953597B2 (en) 2011-05-31

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees