CN115461811A - Multimodal beamforming and attention filtering for multiparty interactions - Google Patents
Multimodal beamforming and attention filtering for multiparty interactions
- Publication number
- CN115461811A (application CN202180031587.0A)
- Authority
- CN
- China
- Prior art keywords
- computing device
- user
- users
- implementations
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concepts
ID | Concept | Type | Query match | Sections | Count |
---|---|---|---|---|---|
230000003993 | interaction | Effects | 0.000 | title, claims, description | 117 |
238000001914 | filtration | Methods | 0.000 | title, claims, description | 44 |
238000000034 | method | Methods | 0.000 | claims, abstract, description | 96 |
238000005259 | measurement | Methods | 0.000 | claims, abstract, description | 75 |
238000003384 | imaging method | Methods | 0.000 | claims, abstract, description | 40 |
238000003331 | infrared imaging | Methods | 0.000 | claims, abstract, description | 10 |
230000002085 | persistent effect | Effects | 0.000 | claims, abstract, description | 6 |
238000012545 | processing | Methods | 0.000 | claims, description | 39 |
230000033001 | locomotion | Effects | 0.000 | claims, description | 35 |
230000008569 | process | Effects | 0.000 | claims, description | 30 |
238000004891 | communication | Methods | 0.000 | claims, description | 22 |
238000004458 | analytical method | Methods | 0.000 | claims, description | 20 |
230000008921 | facial expression | Effects | 0.000 | claims, description | 13 |
230000002452 | interceptive effect | Effects | 0.000 | claims, description | 13 |
230000004044 | response | Effects | 0.000 | claims, description | 8 |
230000008859 | change | Effects | 0.000 | claims, description | 7 |
230000001815 | facial effect | Effects | 0.000 | claims, description | 7 |
230000003595 | spectral effect | Effects | 0.000 | claims, description | 3 |
230000000007 | visual effect | Effects | 0.000 | claims | 1 |
238000003860 | storage | Methods | 0.000 | description | 50 |
238000011156 | evaluation | Methods | 0.000 | description | 45 |
230000004927 | fusion | Effects | 0.000 | description | 36 |
238000007726 | management method | Methods | 0.000 | description | 19 |
230000008447 | perception | Effects | 0.000 | description | 18 |
230000008451 | emotion | Effects | 0.000 | description | 16 |
230000009467 | reduction | Effects | 0.000 | description | 14 |
238000001514 | detection method | Methods | 0.000 | description | 13 |
241000282326 | Felis catus | Species | 0.000 | description | 10 |
210000003128 | head | Anatomy | 0.000 | description | 10 |
238000012360 | testing method | Methods | 0.000 | description | 10 |
230000006870 | function | Effects | 0.000 | description | 8 |
230000009466 | transformation | Effects | 0.000 | description | 8 |
230000009471 | action | Effects | 0.000 | description | 7 |
238000001931 | thermography | Methods | 0.000 | description | 6 |
238000005516 | engineering process | Methods | 0.000 | description | 5 |
238000013500 | data storage | Methods | 0.000 | description | 4 |
230000003287 | optical effect | Effects | 0.000 | description | 4 |
230000008901 | benefit | Effects | 0.000 | description | 3 |
238000004590 | computer program | Methods | 0.000 | description | 3 |
230000000694 | effects | Effects | 0.000 | description | 3 |
230000007774 | longterm | Effects | 0.000 | description | 3 |
230000007246 | mechanism | Effects | 0.000 | description | 3 |
230000004048 | modification | Effects | 0.000 | description | 3 |
238000012986 | modification | Methods | 0.000 | description | 3 |
239000007787 | solid | Substances | 0.000 | description | 3 |
241000282412 | Homo | Species | 0.000 | description | 2 |
238000003491 | array | Methods | 0.000 | description | 2 |
230000002708 | enhancing effect | Effects | 0.000 | description | 2 |
230000007935 | neutral effect | Effects | 0.000 | description | 2 |
IAKOZHOLGAGEJT-UHFFFAOYSA-N | 1,1,1-trichloro-2,2-bis(p-methoxyphenyl)-Ethane | Chemical compound (SMILES: C1=CC(OC)=CC=C1C(C(Cl)(Cl)Cl)C1=CC=C(OC)C=C1) | 0.000 | description | 1 |
230000003187 | abdominal effect | Effects | 0.000 | description | 1 |
230000001133 | acceleration | Effects | 0.000 | description | 1 |
230000006978 | adaptation | Effects | 0.000 | description | 1 |
238000004378 | air conditioning | Methods | 0.000 | description | 1 |
230000003542 | behavioural effect | Effects | 0.000 | description | 1 |
230000036772 | blood pressure | Effects | 0.000 | description | 1 |
238000004364 | calculation method | Methods | 0.000 | description | 1 |
230000010267 | cellular communication | Effects | 0.000 | description | 1 |
238000010586 | diagram | Methods | 0.000 | description | 1 |
238000009826 | distribution | Methods | 0.000 | description | 1 |
238000010195 | expression analysis | Methods | 0.000 | description | 1 |
230000014509 | gene expression | Effects | 0.000 | description | 1 |
238000010191 | image analysis | Methods | 0.000 | description | 1 |
230000010365 | information processing | Effects | 0.000 | description | 1 |
238000004519 | manufacturing process | Methods | 0.000 | description | 1 |
238000013507 | mapping | Methods | 0.000 | description | 1 |
230000003278 | mimic effect | Effects | 0.000 | description | 1 |
238000012544 | monitoring process | Methods | 0.000 | description | 1 |
239000004065 | semiconductor | Substances | 0.000 | description | 1 |
210000002784 | stomach | Anatomy | 0.000 | description | 1 |
238000006467 | substitution reaction | Methods | 0.000 | description | 1 |
238000012911 | target assessment | Methods | 0.000 | description | 1 |
230000009012 | visual motion | Effects | 0.000 | description | 1 |
230000001755 | vocal effect | Effects | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
- G06Q30/015—Providing customer assistance, e.g. assisting a customer within a business location or via helpdesk
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L2021/02087—Noise filtering the noise being separate speech, e.g. cocktail party
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Strategic Management (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Resources & Organizations (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Game Theory and Decision Science (AREA)
- Tourism & Hospitality (AREA)
- Entrepreneurship & Innovation (AREA)
- Signal Processing (AREA)
- Ophthalmology & Optometry (AREA)
- Data Mining & Analysis (AREA)
- Operations Research (AREA)
- Computing Systems (AREA)
- Primary Health Care (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- User Interface Of Digital Computer (AREA)
- Manipulator (AREA)
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202062983595P | 2020-02-29 | 2020-02-29 | |
US62/983,595 | 2020-02-29 | | |
US202163154727P | 2021-02-27 | 2021-02-27 | |
US63/154,727 | 2021-02-27 | | |
PCT/US2021/020148 WO2021174162A1 (fr) | 2020-02-29 | 2021-02-28 | Multimodal beamforming and attention filtering for multiparty interactions |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115461811A (zh) | 2022-12-09 |
Family
ID=77491638
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202180031587.0A (Pending), CN115461811A (zh) | Multimodal beamforming and attention filtering for multiparty interactions | 2020-02-29 | 2021-02-28 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20220180887A1 (fr) |
EP (1) | EP4111446A4 (fr) |
CN (1) | CN115461811A (fr) |
WO (1) | WO2021174162A1 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220284915A1 (en) * | 2021-03-04 | 2022-09-08 | Orcam Technologies Ltd. | Separation of signals based on direction of arrival |
WO2024064468A1 (fr) * | 2022-09-20 | 2024-03-28 | Qualcomm Incorporated | Voice user interface assisted by radio frequency sensing |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150314454A1 (en) * | 2013-03-15 | 2015-11-05 | JIBO, Inc. | Apparatus and methods for providing a persistent companion device |
US11250844B2 (en) * | 2017-04-12 | 2022-02-15 | Soundhound, Inc. | Managing agent engagement in a man-machine dialog |
EP3642835A4 (fr) * | 2017-08-03 | 2021-01-06 | Telepathy Labs, Inc. | Omnichannel, intelligent, proactive virtual agent |
EP3752957A4 (fr) * | 2018-02-15 | 2021-11-17 | DMAI, Inc. | System and method for speech understanding via integrated audio and visual based speech recognition |
2021
- 2021-02-28 EP: application EP21759786.3A, publication EP4111446A4 (fr), active, Pending
- 2021-02-28 CN: application CN202180031587.0A, publication CN115461811A (zh), active, Pending
- 2021-02-28 US: application US17/622,703, publication US20220180887A1 (en), active, Pending
- 2021-02-28 WO: application PCT/US2021/020148, publication WO2021174162A1 (fr), status unknown
Also Published As
Publication number | Publication date |
---|---|
EP4111446A4 (fr) | 2024-04-17 |
US20220180887A1 (en) | 2022-06-09 |
WO2021174162A1 (fr) | 2021-09-02 |
EP4111446A1 (fr) | 2023-01-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20220012470A1 (en) | | Multi-user intelligent assistance |
KR102334942B1 (ko) | | Data processing method and apparatus for a care robot |
US11010601B2 (en) | | Intelligent assistant device communicating non-verbal cues |
JP2021057057A (ja) | | Mobile and wearable video capture and feedback platform for therapy of mental disorders |
US11407106B2 (en) | | Electronic device capable of moving and operating method thereof |
US20220241985A1 (en) | | Systems and methods to manage conversation interactions between a user and a robot computing device or conversation agent |
JP2021503625A (ja) | | System and method for dialogue session management |
US20220180887A1 (en) | | Multimodal beamforming and attention filtering for multiparty interactions |
US20240152705A1 (en) | | Systems And Methods For Short- and Long- Term Dialog Management Between A Robot Computing Device/Digital Companion And A User |
US20200143235A1 (en) | | System and method for providing smart objects virtual communication |
US20230274743A1 (en) | | Methods and systems enabling natural language processing, understanding, and generation |
US20220207426A1 (en) | | Method of semi-supervised data collection and machine learning leveraging distributed computing devices |
CN111919250B (zh) | | Intelligent assistant device communicating non-verbal cues |
Saxena et al. | | Virtual Assistant with Facial Expression Recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |