CN116312552B - A video speaker diarization method and system - Google Patents
A video speaker diarization method and system
- Publication number
- CN116312552B
- Authority
- CN
- China
- Prior art keywords
- speaker
- video
- attribute information
- target
- audio
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Business, Economics & Management (AREA)
- Game Theory and Decision Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Child & Adolescent Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310569405.6A CN116312552B (zh) | 2023-05-19 | 2023-05-19 | A video speaker diarization method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310569405.6A CN116312552B (zh) | 2023-05-19 | 2023-05-19 | A video speaker diarization method and system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116312552A (zh) | 2023-06-23 |
| CN116312552B (zh) | 2023-08-15 |
Family
ID=86836329
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202310569405.6A Active CN116312552B (zh) | A video speaker diarization method and system | 2023-05-19 | 2023-05-19 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116312552B (zh) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
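| CN116823598B (zh) * | 2023-08-29 | 2023-11-17 | 湖北微模式科技发展有限公司 | Traceable operation record method based on image steganography and fuzzy comparison |
| CN117523683B (zh) * | 2024-01-05 | 2024-03-29 | 湖北微模式科技发展有限公司 | A fraudulent video detection method based on biometric recognition |
| CN121126077A (zh) * | 2025-11-12 | 2025-12-12 | 广州趣丸网络科技有限公司 | Cross-modal multi-speaker segmentation method for film and television videos, and related device |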
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
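| KR20200129934A (ko) * | 2019-05-10 | 2020-11-18 | 네이버 주식회사 | Speaker diarization method and apparatus based on audio-visual data |
| CN112906544A (zh) * | 2021-02-07 | 2021-06-04 | 广东电网有限责任公司广州供电局 | A voiceprint- and face-based matching method suitable for multiple targets |
| CN114125365A (zh) * | 2021-11-25 | 2022-03-01 | 京东方科技集团股份有限公司 | Video conference method, apparatus, and readable storage medium |
| CN114282621A (zh) * | 2021-12-29 | 2022-04-05 | 湖北微模式科技发展有限公司 | A multimodal-fusion speaker role differentiation method and system |
| CN114299953A (zh) * | 2021-12-29 | 2022-04-08 | 湖北微模式科技发展有限公司 | A speaker role differentiation method and system combining mouth movement analysis |
| CN115050375A (zh) * | 2021-02-26 | 2022-09-13 | 华为技术有限公司 | Voice operation method and apparatus for a device, and electronic device |
| CN115937726A (zh) * | 2021-05-31 | 2023-04-07 | 华为云计算技术有限公司 | Speaker detection method, apparatus, device, and computer-readable storage medium |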
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW201513095A (zh) * | 2013-09-23 | 2015-04-01 | Hon Hai Prec Ind Co Ltd | Voice processing system, device, and method |
| US11475899B2 (en) * | 2018-01-23 | 2022-10-18 | Cirrus Logic, Inc. | Speaker identification |
| CN112148922A (zh) * | 2019-06-28 | 2020-12-29 | 鸿富锦精密工业(武汉)有限公司 | Meeting recording method and apparatus, data processing device, and readable storage medium |
| KR20220138924A (ko) * | 2021-04-06 | 2022-10-14 | 주식회사 솔루게이트 | Voice authentication system using speech recognition and voiceprint recognition |
Non-Patent Citations (1)
| Title |
|---|
| Voice Recognition and Voice Comparison using Machine Learning Techniques: A Survey; Nishtha H. Tandel; 2020 6th International Conference on Advanced Computing and Communication Systems; pp. 459-461 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116312552A (zh) | 2023-06-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
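| CN116312552B (zh) | | A video speaker diarization method and system |
| JP6463825B2 (ja) | | Multi-speaker speech recognition correction system |
| US10068588B2 (en) | | Real-time emotion recognition from audio signals |
| US9798934B2 (en) | | Method and apparatus for providing combined-summary in imaging apparatus |
| JP6323947B2 (ja) | | Acoustic event recognition device and program |
| CN111128223A (zh) | | A text-information-based auxiliary speaker separation method and related apparatus |
| CN113923521B (zh) | | A video scripting method |
| US11355099B2 (en) | | Word extraction device, related conference extraction system, and word extraction method |
| CN107305541A (zh) | | Method and apparatus for segmenting speech recognition text |
| CN105512348A (zh) | | Method and apparatus for processing video and associated audio, and retrieval method and apparatus |
| JP2019053126A (ja) | | Growth-type dialogue device |
| CN113838460A (zh) | | Video speech recognition method, apparatus, device, and storage medium |
| Ding et al. | | Audio-visual keyword spotting based on multidimensional convolutional neural network |
| US10930283B2 (en) | | Sound recognition device and sound recognition method applied therein |
| CN113345423B (zh) | | Voice endpoint detection method and apparatus, electronic device, and storage medium |
| JP6915637B2 (ja) | | Information processing device, information processing method, and program |
| CN113129895A (zh) | | A voice detection processing system |
| US20220399030A1 (en) | | Systems and Methods for Voice Based Audio and Text Alignment |
| CN111415128A (zh) | | Method, system, apparatus, device, and medium for controlling a meeting |
| CN118355436A (zh) | | Method and device for performing speaker diarization based on language identification |
| CN114944149B (zh) | | Speech recognition method, speech recognition device, and computer-readable storage medium |
| CN113539235B (zh) | | Text analysis and speech synthesis method, apparatus, system, and storage medium |
| CN113539234B (zh) | | Speech synthesis method, apparatus, system, and storage medium |
| US12087307B2 (en) | | Method and apparatus for performing speaker diarization on mixed-bandwidth speech signals |
| Chiţu et al. | | Automatic visual speech recognition |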
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A Method and System for Video Speaker Logging; Effective date of registration: 20230926; Granted publication date: 20230815; Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.; Pledgor: HUBEI MICROPATTERN TECHNOLOGY DEVELOPMENT CO.,LTD.; Registration number: Y2023980058723 |
| | PC01 | Cancellation of the registration of the contract for pledge of patent right | Granted publication date: 20230815; Pledgee: Guanggu Branch of Wuhan Rural Commercial Bank Co.,Ltd.; Pledgor: HUBEI MICROPATTERN TECHNOLOGY DEVELOPMENT CO.,LTD.; Registration number: Y2023980058723 |
| | PE01 | Entry into force of the registration of the contract for pledge of patent right | Denomination of invention: A Video Speaker Log Method and System; Granted publication date: 20230815; Pledgee: Industrial Bank Limited by Share Ltd. Wuhan branch; Pledgor: HUBEI MICROPATTERN TECHNOLOGY DEVELOPMENT CO.,LTD.; Registration number: Y2024980046641 |