CN116612542A - Audio-video person recognition method and system based on multimodal biometric consistency - Google Patents
- Publication number
- CN116612542A (application CN202310571748.6A)
- Authority
- CN
- China
- Prior art keywords
- face
- gait
- features
- audio
- person
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Concepts
ID | Name | Category | Query match | Sections | Count |
---|---|---|---|---|---|
238000000034 | method | Methods | 0.000 | title, claims, abstract, description | 44 |
230000005021 | gait | Effects | 0.000 | claims, abstract, description | 80 |
238000012216 | screening | Methods | 0.000 | claims, abstract, description | 15 |
238000000926 | separation method | Methods | 0.000 | claims, abstract, description | 8 |
210000000746 | body region | Anatomy | 0.000 | claims, description | 18 |
238000000605 | extraction | Methods | 0.000 | claims, description | 10 |
238000007781 | pre-processing | Methods | 0.000 | claims, description | 10 |
238000001228 | spectrum | Methods | 0.000 | claims, description | 8 |
238000004590 | computer program | Methods | 0.000 | claims, description | 6 |
108010076504 | Protein Sorting Signals | Proteins | 0.000 | claims, description | 4 |
230000005236 | sound signal | Effects | 0.000 | claims, description | 4 |
238000002372 | labelling | Methods | 0.000 | claims, description | 3 |
239000003973 | paint | Substances | 0.000 | claims, description | 3 |
238000005516 | engineering process | Methods | 0.000 | abstract, description | 6 |
238000013077 | scoring method | Methods | 0.000 | abstract, description | 3 |
238000013135 | deep learning | Methods | 0.000 | abstract, description | 2 |
238000004891 | communication | Methods | 0.000 | description | 5 |
239000013598 | vectors | Substances | 0.000 | description | 5 |
238000004422 | calculation algorithm | Methods | 0.000 | description | 4 |
238000012549 | training | Methods | 0.000 | description | 4 |
238000010586 | diagram | Methods | 0.000 | description | 3 |
230000000694 | effects | Effects | 0.000 | description | 3 |
230000000007 | visual effect | Effects | 0.000 | description | 3 |
238000004364 | calculation method | Methods | 0.000 | description | 2 |
238000013527 | convolutional neural network | Methods | 0.000 | description | 2 |
238000001514 | detection method | Methods | 0.000 | description | 2 |
230000001815 | facial effect | Effects | 0.000 | description | 2 |
230000006399 | behavior | Effects | 0.000 | description | 1 |
238000006243 | chemical reaction | Methods | 0.000 | description | 1 |
230000004927 | fusion | Effects | 0.000 | description | 1 |
230000003287 | optical effect | Effects | 0.000 | description | 1 |
230000000750 | progressive effect | Effects | 0.000 | description | 1 |
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/70—Multimodal biometrics, e.g. combining information from different biometric modalities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
- G06V40/25—Recognition of walking or running movements, e.g. gait recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/10—Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- Acoustics & Sound (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Life Sciences & Earth Sciences (AREA)
- Social Psychology (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Psychiatry (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Collating Specific Patterns (AREA)
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310571748.6A CN116612542A (zh) | 2023-05-19 | 2023-05-19 | Audio-video person recognition method and system based on multimodal biometric consistency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310571748.6A CN116612542A (zh) | 2023-05-19 | 2023-05-19 | Audio-video person recognition method and system based on multimodal biometric consistency |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116612542A (zh) | 2023-08-18 |
Family
ID: 87674138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310571748.6A Pending CN116612542A (zh) | Audio-video person recognition method and system based on multimodal biometric consistency | 2023-05-19 | 2023-05-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116612542A (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
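CN117174092A (zh) * | 2023-11-02 | 2023-12-05 | Beijing Language and Culture University (北京语言大学) | Mobile corpus transcription method and device based on voiceprint recognition and multimodal analysis |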
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
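CN117174092A (zh) * | 2023-11-02 | 2023-12-05 | Beijing Language and Culture University (北京语言大学) | Mobile corpus transcription method and device based on voiceprint recognition and multimodal analysis |
CN117174092B (zh) * | 2023-11-02 | 2024-01-26 | Beijing Language and Culture University (北京语言大学) | Mobile corpus transcription method and device based on voiceprint recognition and multimodal analysis |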
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mahmood et al. | | WHITE STAG model: Wise human interaction tracking and estimation (WHITE) using spatio-temporal and angular-geometric (STAG) descriptors |
Goh et al. | | Micro-expression recognition: an updated review of current trends, challenges and solutions |
Perveen et al. | | Spontaneous expression recognition using universal attribute model |
CN109472198B (zh) | | Pose-robust video smile recognition method |
Guanghui et al. | | Multi-modal emotion recognition by fusing correlation features of speech-visual |
Chauhan et al. | | Study & analysis of different face detection techniques |
More et al. | | Hand gesture recognition system using image processing |
Cheng et al. | | Visual speaker authentication with random prompt texts by a dual-task CNN framework |
Paul et al. | | Extraction of facial feature points using cumulative histogram |
Siddiqui et al. | | Human action recognition: a construction of codebook by discriminative features selection approach |
Tsitsoulis et al. | | A methodology for extracting standing human bodies from single images |
CN116612542A (zh) | | Audio-video person recognition method and system based on multimodal biometric consistency |
Galiyawala et al. | | Person retrieval in surveillance using textual query: a review |
Hrkać et al. | | Deep learning architectures for tattoo detection and de-identification |
Sarin et al. | | Cnn-based multimodal touchless biometric recognition system using gait and speech |
Aly et al. | | Arabic sign language recognition using spatio-temporal local binary patterns and support vector machine |
Sujatha et al. | | Lip feature extraction for visual speech recognition using Hidden Markov Model |
CN116883900A (zh) | | Video authenticity identification method and system based on multi-dimensional biometric features |
CN116778530A (zh) | | Cross-appearance pedestrian re-identification detection method based on a generative model |
Mahbub et al. | | One-shot-learning gesture recognition using motion history based gesture silhouettes |
Xu et al. | | A novel mid-level distinctive feature learning for action recognition via diffusion map |
Gupta et al. | | Comparative analysis of movement and tracking techniques for Indian sign language recognition |
Vo et al. | | Automatic hand gesture segmentation for recognition of Vietnamese sign language |
Yazdi | | Depth-based lip localization and identification of open or closed mouth, using kinect 2 |
Mustafa et al. | | An Efficient Lip-reading Method Using K-nearest Neighbor Algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB03 | Change of inventor or designer information | Inventors after: Li Hengda, Zeng Ming, Zheng Yinglin, Lin Yuxin, Song Haodong, Zhang Xiangjun. Inventors before: Zeng Ming, Li Hengda, Zheng Yinglin, Lin Yuxin, Song Haodong, Zhang Xiangjun. |