TWI840587B - Multi-modal user interface - Google Patents

Multi-modal user interface

Info

Publication number
TWI840587B
Authority
TW
Taiwan
Prior art keywords
input
user
data
mode
feedback message
Prior art date
Application number
TW109123487A
Other languages
English (en)
Chinese (zh)
Other versions
TW202109245A (zh)
Inventor
瑞比 喬杜里
金萊軒
文山古
郭寅一
費特梅 薩吉
艾瑞克 維瑟
Original Assignee
美商高通公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 美商高通公司
Publication of TW202109245A
Application granted
Publication of TWI840587B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03 Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/033 Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F3/038 Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883 Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/038 Indexing scheme relating to G06F3/038
    • G06F2203/0381 Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2203/00 Indexing scheme relating to G06F3/00 - G06F3/048
    • G06F2203/038 Indexing scheme relating to G06F3/038
    • G06F2203/0382 Plural input, i.e. interface arrangements in which a plurality of input device of the same type are in communication with a PC
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Landscapes

  • Engineering & Computer Science
  • Theoretical Computer Science
  • General Engineering & Computer Science
  • Human Computer Interaction
  • Physics & Mathematics
  • General Physics & Mathematics
  • Audiology, Speech & Language Pathology
  • Health & Medical Sciences
  • Multimedia
  • Software Systems
  • Computational Linguistics
  • Acoustics & Sound
  • General Health & Medical Sciences
  • User Interface Of Digital Computer
  • Input From Keyboards Or The Like
TW109123487A 2019-07-12 2020-07-10 Multi-modal user interface TWI840587B (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201962873775P 2019-07-12 2019-07-12
US62/873,775 2019-07-12
US16/685,946 2019-11-15
US16/685,946 US11348581B2 (en) 2019-07-12 2019-11-15 Multi-modal user interface

Publications (2)

Publication Number Publication Date
TW202109245A (zh) 2021-03-01
TWI840587B (zh) 2024-05-01

Family

ID=74101815

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109123487A TWI840587B (zh) 2019-07-12 2020-07-10 Multi-modal user interface

Country Status (9)

Country Link
US (1) US11348581B2
EP (1) EP3997553A1
JP (1) JP7522177B2
KR (1) KR20220031610A
CN (1) CN114127665B
BR (1) BR112021026765A2
PH (1) PH12021553219A1
TW (1) TWI840587B
WO (1) WO2021011331A1

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021103191A (ja) * 2018-03-30 2021-07-15 ソニーグループ株式会社 Information processing apparatus and information processing method
US11615801B1 (en) * 2019-09-20 2023-03-28 Apple Inc. System and method of enhancing intelligibility of audio playback
US11521643B2 (en) * 2020-05-08 2022-12-06 Bose Corporation Wearable audio device with user own-voice recording
WO2022016406A1 (zh) * 2020-07-22 2022-01-27 北京小米移动软件有限公司 Information transmission method and apparatus, and communication device
US11996095B2 (en) 2020-08-12 2024-05-28 Kyndryl, Inc. Augmented reality enabled command management
US11878244B2 (en) * 2020-09-10 2024-01-23 Holland Bloorview Kids Rehabilitation Hospital Customizable user input recognition systems
US11830486B2 (en) * 2020-10-13 2023-11-28 Google Llc Detecting near matches to a hotword or phrase
US11461681B2 (en) * 2020-10-14 2022-10-04 Openstream Inc. System and method for multi-modality soft-agent for query population and information mining
US11809480B1 (en) * 2020-12-31 2023-11-07 Meta Platforms, Inc. Generating dynamic knowledge graph of media contents for assistant systems
US12321865B2 (en) * 2021-01-25 2025-06-03 Salesforce, Inc. Event prediction based on multimodal learning
US11651541B2 (en) * 2021-03-01 2023-05-16 Roblox Corporation Integrated input/output (I/O) for a three-dimensional (3D) environment
CN113282172A (zh) * 2021-05-18 2021-08-20 前海七剑科技(深圳)有限公司 Control method and apparatus for gesture recognition
US11783073B2 (en) * 2021-06-21 2023-10-10 Microsoft Technology Licensing, Llc Configuration of default sensitivity labels for network file storage locations
WO2023272629A1 (zh) * 2021-06-30 2023-01-05 华为技术有限公司 Interface control method, apparatus, and system
US12614095B2 (en) * 2021-07-12 2026-04-28 Cypress Semiconductor Corporation System and method for activity classification
WO2023035073A1 (en) * 2021-09-08 2023-03-16 Huawei Technologies Canada Co., Ltd. Methods and devices for communication with multimodal compositions
US11966663B1 (en) * 2021-09-29 2024-04-23 Amazon Technologies, Inc. Speech processing and multi-modal widgets
US20230104856A1 (en) * 2021-10-05 2023-04-06 Rfmicron, Inc. Data logging device
US11971710B2 (en) * 2021-11-12 2024-04-30 Pani Energy Inc Digital model based plant operation and optimization
US12333794B2 (en) * 2021-11-12 2025-06-17 Sony Group Corporation Emotion recognition in multimedia videos using multi-modal fusion-based deep neural network
WO2024029827A1 (ko) * 2022-08-01 2024-02-08 삼성전자 주식회사 Electronic device and computer-readable storage medium for control recommendation
US20240036527A1 (en) * 2022-08-01 2024-02-01 Samsung Electronics Co., Ltd. Electronic device and computer readable storage medium for control recommendation
KR20240079507A (ko) * 2022-11-29 2024-06-05 한국전자통신연구원 Method and apparatus for generating a language model using cross-modal information
EP4524685A1 (en) * 2023-09-12 2025-03-19 Rohde & Schwarz GmbH & Co. KG Measurement application device, and method
US20250178624A1 (en) * 2023-12-01 2025-06-05 Qualcomm Incorporated Speech-based vehicular control
US20260016309A1 (en) * 2024-07-11 2026-01-15 Apple Inc. Providing movement dynamics estimations

Citations (4)

Publication number Priority date Publication date Assignee Title
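CN103729386A (zh) * 2012-10-16 2014-04-16 阿里巴巴集团控股有限公司 Information query system and method
CN104025085A (zh) * 2011-07-28 2014-09-03 纪金有限公司 System and method for providing information about semantic entities included in a content page
CN105843605A (zh) * 2016-03-17 2016-08-10 中国银行股份有限公司 Data mapping method and apparatus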
US20180329677A1 (en) * 2017-05-15 2018-11-15 Apple Inc. Multi-modal interfaces

Family Cites Families (18)

Publication number Priority date Publication date Assignee Title
US8386255B2 (en) * 2009-03-17 2013-02-26 Avaya Inc. Providing descriptions of visually presented information to video teleconference participants who are not video-enabled
US9123341B2 (en) 2009-03-18 2015-09-01 Robert Bosch Gmbh System and method for multi-modal input synchronization and disambiguation
KR101092820B1 (ko) 2009-09-22 2011-12-12 현대자동차주식회사 Multimodal interface system integrating lip reading and speech recognition
US8473289B2 (en) * 2010-08-06 2013-06-25 Google Inc. Disambiguating input based on context
US20130085753A1 (en) * 2011-09-30 2013-04-04 Google Inc. Hybrid Client/Server Speech Recognition In A Mobile Device
US9152376B2 (en) * 2011-12-01 2015-10-06 At&T Intellectual Property I, L.P. System and method for continuous multimodal speech and gesture interaction
US9465833B2 (en) * 2012-07-31 2016-10-11 Veveo, Inc. Disambiguating user intent in conversational interaction system for large corpus information retrieval
WO2014070872A2 (en) 2012-10-30 2014-05-08 Robert Bosch Gmbh System and method for multimodal interaction with reduced distraction in operating vehicles
US9190058B2 (en) * 2013-01-25 2015-11-17 Microsoft Technology Licensing, Llc Using visual cues to disambiguate speech inputs
WO2014182787A2 (en) 2013-05-08 2014-11-13 Jpmorgan Chase Bank, N.A. Systems and methods for high fidelity multi-modal out-of-band biometric authentication
US10402060B2 (en) 2013-06-28 2019-09-03 Orange System and method for gesture disambiguation
US10741182B2 (en) * 2014-02-18 2020-08-11 Lenovo (Singapore) Pte. Ltd. Voice input correction using non-audio based input
US8825585B1 (en) 2014-03-11 2014-09-02 Fmr Llc Interpretation of natural communication
US20160034249A1 (en) * 2014-07-31 2016-02-04 Microsoft Technology Licensing Llc Speechless interaction with a speech recognition device
US10446141B2 (en) * 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
JP2018036902A (ja) * 2016-08-31 2018-03-08 島根県 Device operation system, device operation method, and device operation program
US20180357040A1 (en) * 2017-06-09 2018-12-13 Mitsubishi Electric Automotive America, Inc. In-vehicle infotainment with multi-modal interface
US11430437B2 (en) * 2017-08-01 2022-08-30 Sony Corporation Information processor and information processing method

Also Published As

Publication number Publication date
PH12021553219A1 (en) 2022-11-21
BR112021026765A2 (pt) 2022-02-15
EP3997553A1 (en) 2022-05-18
WO2021011331A1 (en) 2021-01-21
JP7522177B2 (ja) 2024-07-24
JP2022539794A (ja) 2022-09-13
CN114127665B (zh) 2024-10-08
KR20220031610A (ko) 2022-03-11
CN114127665A (zh) 2022-03-01
US20210012770A1 (en) 2021-01-14
US11348581B2 (en) 2022-05-31
TW202109245A (zh) 2021-03-01

Similar Documents

Publication Publication Date Title
TWI840587B (zh) Multi-modal user interface
CN111699528B (zh) Electronic device and method of performing functions of the electronic device
US12327573B2 (en) Identifying input for speech recognition engine
CN111868824B (zh) Device and method for context-aware control
US11656837B2 (en) Electronic device for controlling sound and operation method therefor
US10353495B2 (en) Personalized operation of a mobile device using sensor signatures
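CN102483918B (zh) Voice recognition device
JP2023159461A (ja) Wearable system speech processing
KR102740847B1 (ko) Method for processing user input and electronic device supporting the same
JP2017050010A (ja) Hybrid performance scaling or speech recognition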
WO2015187587A1 (en) Hands free device with directional interface
US11895474B2 (en) Activity detection on devices with multi-modal sensing
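CN104464737B (zh) Voice verification system and voice verification method
TW202135044A (zh) Voice activation based on user recognition
KR20150004080A (ko) Hearing aid and method for controlling the hearing aid
CN112639965A (zh) Speech recognition method and device in an environment including multiple devices
WO2021149441A1 (ja) Information processing apparatus and information processing method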
EP4285611B1 (en) Psychoacoustic enhancement based on audio source directivity
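KR20230084154A (ko) User voice activity detection using a dynamic classifier
KR102168812B1 (ko) Electronic device for controlling sound and operating method thereof
KR20210109722A (ko) Device for generating control information based on a user's utterance state, and control method thereof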