JP7522177B2 - Multi-modal user interface - Google Patents
Multi-modal user interface
- Publication number
- JP7522177B2 (application JP2022500128A)
- Authority
- JP
- Japan
- Prior art keywords
- input
- user
- data
- mode
- feedback message
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/03—Arrangements for converting the position or the displacement of a member into a coded form
- G06F3/033—Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
- G06F3/038—Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
- G06F3/04883—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/451—Execution arrangements for user interfaces
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0381—Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/038—Indexing scheme relating to G06F3/038
- G06F2203/0382—Plural input, i.e. interface arrangements in which a plurality of input device of the same type are in communication with a PC
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/226—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
- G10L2015/228—Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- User Interface Of Digital Computer (AREA)
- Input From Keyboards Or The Like (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962873775P | 2019-07-12 | 2019-07-12 | |
| US62/873,775 | 2019-07-12 | | |
| US16/685,946 US11348581B2 (en) | 2019-07-12 | 2019-11-15 | Multi-modal user interface |
| US16/685,946 | 2019-11-15 | | |
| PCT/US2020/041499 WO2021011331A1 (en) | 2019-07-12 | 2020-07-10 | Multi-modal user interface |
Publications (4)
| Publication Number | Publication Date |
|---|---|
| JP2022539794A (ja) | 2022-09-13 |
| JP2022539794A5 | 2023-06-20 |
| JPWO2021011331A5 | 2023-06-20 |
| JP7522177B2 (ja) | 2024-07-24 |
Family
ID=74101815
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022500128A (Active, JP7522177B2) | 2019-07-12 | 2020-07-10 | Multi-modal user interface |
Country Status (9)
| Country | Link |
|---|---|
| US (1) | US11348581B2 |
| EP (1) | EP3997553A1 |
| JP (1) | JP7522177B2 |
| KR (1) | KR20220031610A |
| CN (1) | CN114127665B |
| BR (1) | BR112021026765A2 |
| PH (1) | PH12021553219A1 |
| TW (1) | TWI840587B |
| WO (1) | WO2021011331A1 |
Families Citing this family (25)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2021103191A (ja) * | 2018-03-30 | 2021-07-15 | ソニーグループ株式会社 | Information processing device and information processing method |
| US11615801B1 (en) * | 2019-09-20 | 2023-03-28 | Apple Inc. | System and method of enhancing intelligibility of audio playback |
| US11521643B2 (en) * | 2020-05-08 | 2022-12-06 | Bose Corporation | Wearable audio device with user own-voice recording |
| EP4187930A4 (en) * | 2020-07-22 | 2024-04-03 | Beijing Xiaomi Mobile Software Co., Ltd. | INFORMATION TRANSMISSION METHOD AND APPARATUS, AND COMMUNICATION DEVICE |
| US11996095B2 (en) * | 2020-08-12 | 2024-05-28 | Kyndryl, Inc. | Augmented reality enabled command management |
| US11878244B2 (en) * | 2020-09-10 | 2024-01-23 | Holland Bloorview Kids Rehabilitation Hospital | Customizable user input recognition systems |
| US11830486B2 (en) * | 2020-10-13 | 2023-11-28 | Google Llc | Detecting near matches to a hotword or phrase |
| US11461681B2 (en) * | 2020-10-14 | 2022-10-04 | Openstream Inc. | System and method for multi-modality soft-agent for query population and information mining |
| US11809480B1 (en) * | 2020-12-31 | 2023-11-07 | Meta Platforms, Inc. | Generating dynamic knowledge graph of media contents for assistant systems |
| US12321865B2 (en) * | 2021-01-25 | 2025-06-03 | Salesforce, Inc. | Event prediction based on multimodal learning |
| US11651541B2 (en) * | 2021-03-01 | 2023-05-16 | Roblox Corporation | Integrated input/output (I/O) for a three-dimensional (3D) environment |
| CN113282172A (zh) * | 2021-05-18 | 2021-08-20 | 前海七剑科技(深圳)有限公司 | Control method and device for gesture recognition |
| US11783073B2 (en) * | 2021-06-21 | 2023-10-10 | Microsoft Technology Licensing, Llc | Configuration of default sensitivity labels for network file storage locations |
| CN116670624A (zh) * | 2021-06-30 | 2023-08-29 | 华为技术有限公司 | Interface control method, device, and system |
| CN118251878A (zh) * | 2021-09-08 | 2024-06-25 | 华为技术加拿大有限公司 | Method and device for communicating using multimodal composition |
| US11966663B1 (en) * | 2021-09-29 | 2024-04-23 | Amazon Technologies, Inc. | Speech processing and multi-modal widgets |
| US20230104856A1 (en) * | 2021-10-05 | 2023-04-06 | Rfmicron, Inc. | Data logging device |
| US12333794B2 (en) * | 2021-11-12 | 2025-06-17 | Sony Group Corporation | Emotion recognition in multimedia videos using multi-modal fusion-based deep neural network |
| US11971710B2 (en) * | 2021-11-12 | 2024-04-30 | Pani Energy Inc | Digital model based plant operation and optimization |
| US20240036527A1 (en) * | 2022-08-01 | 2024-02-01 | Samsung Electronics Co., Ltd. | Electronic device and computer readable storage medium for control recommendation |
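| WO2024029827A1 (ko) * | 2022-08-01 | 2024-02-08 | 삼성전자 주식회사 | Electronic device and computer-readable storage medium for control recommendation |
| KR20240079507A (ko) * | 2022-11-29 | 2024-06-05 | 한국전자통신연구원 | Method and apparatus for generating a language model using cross-modal information |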
| EP4524685A1 (en) * | 2023-09-12 | 2025-03-19 | Rohde & Schwarz GmbH & Co. KG | Measurement application device, and method |
| US20250178624A1 (en) * | 2023-12-01 | 2025-06-05 | Qualcomm Incorporated | Speech-based vehicular control |
| US20260016309A1 (en) * | 2024-07-11 | 2026-01-15 | Apple Inc. | Providing movement dynamics estimations |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014070872A2 (en) | 2012-10-30 | 2014-05-08 | Robert Bosch Gmbh | System and method for multimodal interaction with reduced distraction in operating vehicles |
| JP2018036902A (ja) | 2016-08-31 | 2018-03-08 | 島根県 | Device operation system, device operation method, and device operation program |
| US20180329677A1 (en) | 2017-05-15 | 2018-11-15 | Apple Inc. | Multi-modal interfaces |
| WO2019026617A1 (ja) | 2017-08-01 | 2019-02-07 | ソニー株式会社 | Information processing device and information processing method |
Family Cites Families (18)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8386255B2 (en) * | 2009-03-17 | 2013-02-26 | Avaya Inc. | Providing descriptions of visually presented information to video teleconference participants who are not video-enabled |
| US9123341B2 (en) | 2009-03-18 | 2015-09-01 | Robert Bosch Gmbh | System and method for multi-modal input synchronization and disambiguation |
| KR101092820B1 (ko) | 2009-09-22 | 2011-12-12 | 현대자동차주식회사 | Multimodal interface system integrating lip reading and speech recognition |
| US8473289B2 (en) * | 2010-08-06 | 2013-06-25 | Google Inc. | Disambiguating input based on context |
| US8898583B2 (en) * | 2011-07-28 | 2014-11-25 | Kikin Inc. | Systems and methods for providing information regarding semantic entities included in a page of content |
| US20130085753A1 (en) * | 2011-09-30 | 2013-04-04 | Google Inc. | Hybrid Client/Server Speech Recognition In A Mobile Device |
| US9152376B2 (en) * | 2011-12-01 | 2015-10-06 | At&T Intellectual Property I, L.P. | System and method for continuous multimodal speech and gesture interaction |
| US9465833B2 (en) * | 2012-07-31 | 2016-10-11 | Veveo, Inc. | Disambiguating user intent in conversational interaction system for large corpus information retrieval |
| CN103729386B (zh) * | 2012-10-16 | 2017-08-04 | 阿里巴巴集团控股有限公司 | Information query system and method |
| US9190058B2 (en) * | 2013-01-25 | 2015-11-17 | Microsoft Technology Licensing, Llc | Using visual cues to disambiguate speech inputs |
| EP2995040B1 (en) | 2013-05-08 | 2022-11-16 | JPMorgan Chase Bank, N.A. | Systems and methods for high fidelity multi-modal out-of-band biometric authentication |
| US10402060B2 (en) | 2013-06-28 | 2019-09-03 | Orange | System and method for gesture disambiguation |
| US10741182B2 (en) * | 2014-02-18 | 2020-08-11 | Lenovo (Singapore) Pte. Ltd. | Voice input correction using non-audio based input |
| US8825585B1 (en) | 2014-03-11 | 2014-09-02 | Fmr Llc | Interpretation of natural communication |
| US20160034249A1 (en) * | 2014-07-31 | 2016-02-04 | Microsoft Technology Licensing Llc | Speechless interaction with a speech recognition device |
| US10446141B2 (en) * | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
| CN105843605B (zh) * | 2016-03-17 | 2019-03-08 | 中国银行股份有限公司 | Data mapping method and device |
| US20180357040A1 (en) * | 2017-06-09 | 2018-12-13 | Mitsubishi Electric Automotive America, Inc. | In-vehicle infotainment with multi-modal interface |
-
2019
- 2019-11-15 US US16/685,946 patent/US11348581B2/en active Active
-
2020
- 2020-07-10 EP EP20747296.0A patent/EP3997553A1/en active Pending
- 2020-07-10 CN CN202080049275.8A patent/CN114127665B/zh active Active
- 2020-07-10 JP JP2022500128A patent/JP7522177B2/ja active Active
- 2020-07-10 TW TW109123487A patent/TWI840587B/zh active
- 2020-07-10 PH PH1/2021/553219A patent/PH12021553219A1/en unknown
- 2020-07-10 BR BR112021026765A patent/BR112021026765A2/pt unknown
- 2020-07-10 WO PCT/US2020/041499 patent/WO2021011331A1/en not_active Ceased
- 2020-07-10 KR KR1020227000411A patent/KR20220031610A/ko active Pending
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014070872A2 (en) | 2012-10-30 | 2014-05-08 | Robert Bosch Gmbh | System and method for multimodal interaction with reduced distraction in operating vehicles |
| JP2018036902A (ja) | 2016-08-31 | 2018-03-08 | 島根県 | Device operation system, device operation method, and device operation program |
| US20180329677A1 (en) | 2017-05-15 | 2018-11-15 | Apple Inc. | Multi-modal interfaces |
| WO2019026617A1 (ja) | 2017-08-01 | 2019-02-07 | ソニー株式会社 | Information processing device and information processing method |
| CN110998718A (zh) | 2017-08-01 | 2020-04-10 | 索尼公司 | Information processing device and information processing method |
| US20200152191A1 (en) | 2017-08-01 | 2020-05-14 | Sony Corporation | Information processor and information procesing method |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2021011331A1 (en) | 2021-01-21 |
| US20210012770A1 (en) | 2021-01-14 |
| US11348581B2 (en) | 2022-05-31 |
| EP3997553A1 (en) | 2022-05-18 |
| CN114127665B (zh) | 2024-10-08 |
| PH12021553219A1 (en) | 2022-11-21 |
| TWI840587B (zh) | 2024-05-01 |
| KR20220031610A (ko) | 2022-03-11 |
| TW202109245A (zh) | 2021-03-01 |
| BR112021026765A2 (pt) | 2022-02-15 |
| JP2022539794A (ja) | 2022-09-13 |
| CN114127665A (zh) | 2022-03-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
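| JP7522177B2 (ja) | | Multi-modal user interface |
| CN111868824B (zh) | | Device and method for context-aware control |
| JP7745603B2 (ja) | | Wearable system speech processing |
| KR102740847B1 (ko) | | Method for processing user input and electronic device supporting the same |
| KR20190090281A (ko) | | Electronic device for controlling sound and operating method thereof |
| JP2022522748A (ja) | | Determining input for a speech processing engine |
| US11895474B2 (en) | | Activity detection on devices with multi-modal sensing |
| CN104464737B (zh) | | Voice verification system and voice verification method |
| JPWO2019187834A1 (ja) | | Information processing device, information processing method, and program |
| KR20240017404A (ko) | | Noise suppression using tandem networks |
| CN112639965A (zh) | | Speech recognition method and device in an environment including multiple devices |
| EP4285611B1 (en) | | Psychoacoustic enhancement based on audio source directivity |
| JP7757398B2 (ja) | | User voice activity detection using a dynamic classifier |
| JP2018045192A (ja) | | Voice dialogue device and utterance volume adjustment method |
| KR102168812B1 (ko) | | Electronic device for controlling sound and operating method thereof |
| US12537004B2 (en) | | Voice recognition device having barge-in function and method thereof |
| KR20210109722A (ko) | | Device for generating control information based on a user's utterance state and control method thereof |
| KR102933433B1 (ko) | | Proximity voice classification method, apparatus, and recording medium |
| US20260045260A1 (en) | | Environment based user model creation and user verification |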
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 2023-01-04 | RD02 | Notification of acceptance of power of attorney | Free format text: JAPANESE INTERMEDIATE CODE: A7422 |
| 2023-06-12 | A521 | Request for written amendment filed | Free format text: JAPANESE INTERMEDIATE CODE: A523 |
| 2023-06-12 | A621 | Written request for application examination | Free format text: JAPANESE INTERMEDIATE CODE: A621 |
| 2024-05-27 | A977 | Report on retrieval | Free format text: JAPANESE INTERMEDIATE CODE: A971007 |
| | TRDD | Decision of grant or rejection written | |
| 2024-06-18 | A01 | Written decision to grant a patent or to grant a registration (utility model) | Free format text: JAPANESE INTERMEDIATE CODE: A01 |
| 2024-07-11 | A61 | First payment of annual fees (during grant procedure) | Free format text: JAPANESE INTERMEDIATE CODE: A61 |
| | R150 | Certificate of patent or registration of utility model | Ref document number: 7522177; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150 |