DE112021005482T5 - AR (augmented reality) based selective sound inclusion from the surroundings while executing a voice command - Google Patents


Info

Publication number
DE112021005482T5
Authority
DE
Germany
Prior art keywords
sounds
voice command
unit
selecting
visualization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
DE112021005482.1T
Other languages
German (de)
English (en)
Inventor
Clement Decrop
Tushar Agrawal
Jeremy R. Fox
Sarbajit K. Rakshit
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-24
Filing date
2021-11-10
Publication date
2023-09-14
Application filed by International Business Machines Corp
Publication of DE112021005482T5
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Selective Calling Equipment (AREA)
DE112021005482.1T 2020-11-24 2021-11-10 AR (augmented reality) based selective sound inclusion from the surroundings while executing a voice command Pending DE112021005482T5 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US17/102,687 US11978444B2 (en) 2020-11-24 2020-11-24 AR (augmented reality) based selective sound inclusion from the surrounding while executing any voice command
US17/102,687 2020-11-24
PCT/CN2021/129740 WO2022111282A1 (en) 2020-11-24 2021-11-10 Ar (augmented reality) based selective sound inclusion from the surrounding while executing any voice command

Publications (1)

Publication Number Publication Date
DE112021005482T5 (de) 2023-09-14

Family

ID=81657233

Family Applications (1)

Application Number Title Priority Date Filing Date
DE112021005482.1T Pending DE112021005482T5 (de) 2020-11-24 2021-11-10 AR (augmented reality) based selective sound inclusion from the surroundings while executing a voice command

Country Status (6)

Country Link
US (1) US11978444B2
JP (1) JP7824008B2
CN (1) CN116348950A
DE (1) DE112021005482T5
GB (1) GB2616765B
WO (1) WO2022111282A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115079833B (zh) * 2022-08-24 2023-01-06 北京亮亮视野科技有限公司 Multi-layer interface and information visualization presentation method and system based on somatosensory control

Family Cites Families (37)

Publication number Priority date Publication date Assignee Title
US6270040B1 (en) 2000-04-03 2001-08-07 Kam Industries Model train control system
ATE400871T1 (de) * 2004-01-29 2008-07-15 Harman Becker Automotive Sys Multimodal data entry
US8788589B2 (en) 2007-10-12 2014-07-22 Watchitoo, Inc. System and method for coordinating simultaneous edits of shared digital data
US8769510B2 (en) 2010-04-08 2014-07-01 The Mathworks, Inc. Identification and translation of program code executable by a graphical processing unit (GPU)
US8296151B2 (en) * 2010-06-18 2012-10-23 Microsoft Corporation Compound gesture-speech commands
US8223088B1 (en) 2011-06-09 2012-07-17 Google Inc. Multimode input field for a head-mounted display
US8971854B2 (en) * 2012-06-19 2015-03-03 Honeywell International Inc. System and method of speaker recognition
US9966075B2 (en) * 2012-09-18 2018-05-08 Qualcomm Incorporated Leveraging head mounted displays to enable person-to-person interactions
US10824310B2 (en) * 2012-12-20 2020-11-03 Sri International Augmented reality virtual personal assistant for external representation
US9092600B2 (en) * 2012-11-05 2015-07-28 Microsoft Technology Licensing, Llc User authentication on augmented reality display device
US9747900B2 (en) 2013-05-24 2017-08-29 Google Technology Holdings LLC Method and apparatus for using image data to aid voice recognition
US9582246B2 (en) 2014-03-04 2017-02-28 Microsoft Technology Licensing, Llc Voice-command suggestions based on computer context
US9293141B2 (en) 2014-03-27 2016-03-22 Storz Endoskop Produktions Gmbh Multi-user voice control system for medical devices
US10152987B2 (en) * 2014-06-23 2018-12-11 Google Llc Remote invocation of mobile device actions
FR3026543B1 (fr) 2014-09-29 2017-12-22 Christophe Guedon Method for assisting a hearing-impaired person in following a conversation
CN111427534B (zh) * 2014-12-11 2023-07-25 微软技术许可有限责任公司 Virtual assistant system enabling actionable messaging
US10146355B2 (en) * 2015-03-26 2018-12-04 Lenovo (Singapore) Pte. Ltd. Human interface device input fusion
US20170243582A1 (en) * 2016-02-19 2017-08-24 Microsoft Technology Licensing, Llc Hearing assistance with automated speech transcription
US10031967B2 (en) * 2016-02-29 2018-07-24 Rovi Guides, Inc. Systems and methods for using a trained model for determining whether a query comprising multiple segments relates to an individual query or several queries
JP6918471B2 (ja) 2016-11-24 2021-08-11 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ (Panasonic Intellectual Property Corporation of America) Control method for a dialogue assistance system, dialogue assistance system, and program
US11099716B2 (en) * 2016-12-23 2021-08-24 Realwear, Inc. Context based content navigation for wearable display
US11107469B2 (en) * 2017-01-18 2021-08-31 Sony Corporation Information processing apparatus and information processing method
WO2018140502A1 (en) * 2017-01-27 2018-08-02 Magic Leap, Inc. Antireflection coatings for metasurfaces
US20180261223A1 (en) 2017-03-13 2018-09-13 Amazon Technologies, Inc. Dialog management and item fulfillment using voice assistant system
US20200327890A1 (en) * 2017-11-28 2020-10-15 Sony Corporation Information processing device and information processing method
CN108363556A (zh) 2018-01-30 2018-08-03 百度在线网络技术(北京)有限公司 Method and system for voice-based interaction with an augmented reality environment
US10365885B1 (en) * 2018-02-21 2019-07-30 Sling Media Pvt. Ltd. Systems and methods for composition of audio content from multi-object audio
DE102018208703A1 (de) * 2018-06-01 2019-12-05 Volkswagen Aktiengesellschaft Method for calculating an "augmented reality" overlay for displaying a navigation route on an AR display unit, device for carrying out the method, and motor vehicle and computer program
US10650829B2 (en) 2018-06-06 2020-05-12 International Business Machines Corporation Operating a voice response system in a multiuser environment
CN109272982A (zh) * 2018-09-07 2019-01-25 昆明盛策同辉数字科技有限责任公司 Real-time TTS voice broadcast method, apparatus, storage medium and device combined with augmented reality
US11120791B2 (en) 2018-11-15 2021-09-14 International Business Machines Corporation Collaborative artificial intelligence (AI) voice response system control for authorizing a command associated with a calendar event
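KR20200072026A (ko) * 2018-12-12 2020-06-22 현대자동차주식회사 Apparatus and method for speech recognition processing
KR101990284B1 (ko) * 2018-12-13 2019-06-18 주식회사 버넥트 Augmented reality system based on intelligent cognitive technology using speech recognition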
US10499179B1 (en) * 2019-01-01 2019-12-03 Philip Scott Lyren Displaying emojis for binaural sound
JP2020141235A (ja) 2019-02-27 2020-09-03 パナソニックIpマネジメント株式会社 Device control system, device control method, and program
US11170774B2 (en) * 2019-05-21 2021-11-09 Qualcomm Incorporated Virtual assistant device
CN110413106B (zh) * 2019-06-18 2024-02-09 中国人民解放军军事科学院国防科技创新研究院 Augmented reality input method and system based on voice and gestures

Also Published As

Publication number Publication date
JP7824008B2 (ja) 2026-03-04
GB2616765A (en) 2023-09-20
WO2022111282A1 (en) 2022-06-02
JP2023551169A (ja) 2023-12-07
GB2616765B (en) 2025-03-05
CN116348950A (zh) 2023-06-27
US20220165260A1 (en) 2022-05-26
US11978444B2 (en) 2024-05-07

Similar Documents

Publication Title
DE102022102912A1 (de) Pipelines for efficient training and deployment of machine learning models
DE102021125855B4 (de) Self-learning voice control through artificial intelligence based on user behavior during an interaction
DE112020005323B4 (de) Elastic execution of machine learning workloads using application-based profiling
DE112021004261T5 (de) Dual-modal relationship networks for audio-visual event localization
DE112020005253T5 (de) Anaphora resolution
DE112020000545T5 (de) Deep forest model development and training
DE112021004163T5 (de) Tailoring communication content
DE112018005227T5 (de) Feature extraction using multi-task learning
DE112018005167T5 (de) Updating training data
US20080162310A1 (en) System and Method for Creating and Implementing Community Defined Presentation Structures
DE112021004234T5 (de) Using meta-learning to optimize the automatic selection of machine learning pipelines
DE102017207686A1 (de) Insights into workforce strategy
DE102023108430A1 (de) Generating conversational responses using neural networks
DE112021005633T5 (de) Learning to match unpaired multimodal features for semi-supervised learning
DE112021002572T5 (de) Multi-criteria optimization of applications
DE112022004517T5 (de) Optimizing lip synchronization in a video translated into natural language
DE102021124264A1 (de) Generating synthetic system errors
DE112015005269T5 (de) Extending an information request
DE112018001711T5 (de) Gaze-direction-based lecture notes generator
DE102024136304A1 (de) Prompt suitability analysis for language-model-based AI systems and applications
DE112021001550T5 (de) Automatically creating enhancements to AV content
DE112020004925T5 (de) Updating and implementing a document from an audio process
DE112020005296T5 (de) Searching conversation logs of a virtual dialog agent system for contrasting temporal patterns
DE112022001431T5 (de) Adaptive selection of data modalities for efficient video recognition
DE112022001483B4 (de) Operating command boundaries

Legal Events

Date Code Title Description
R012 Request for examination validly filed
R084 Declaration of willingness to licence
R016 Response to examination communication