CN112165599A - Automatic conference summary generation method for video conference - Google Patents
- Publication number
- CN112165599A (Application CN202011077651.2A)
- Authority
- CN
- China
- Prior art keywords
- audio
- conference
- classes
- clustering
- speaker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/06—Decision making techniques; Pattern matching strategies
- G10L17/14—Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Game Theory and Decision Science (AREA)
- Business, Economics & Management (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses an automatic conference summary generation method for a video conference, which comprises: step one, segmentation judgment, namely segmenting the mixed original audio of the video conference, marking the conversion points between different speakers when adjacent segments are not from the same speaker, and encoding the audio into fragments at those conversion points; step two, clustering, namely clustering all the segmented audio fragments, grouping fragments belonging to the same speaker together, and marking the clustered audio data; step three, identification, namely identifying the clustered audio data; and step four, speech-to-text conversion, namely converting the clustered and identified audio data into text and generating and storing text files. The invention can be effectively applied to a video conference and automatically generates a text file for each speaker plus a summary text file, thereby freeing the hands of the conference recorder, improving the output efficiency of the conference summary and the user experience, and being easy to popularize.
Description
Technical Field
The invention belongs to the technical field of communication, and particularly relates to an automatic conference summary generation method for a video conference.
Background
With the arrival of 5G and the digital era, audio and video applications based on 5G networks are increasingly common. The video conference, as an important business application of the big-video era, is used ever more widely in governments, the military, state-owned enterprises, and large and medium-sized enterprises, and producing a conference summary from the speech content of different speakers in a video conference is a challenging task for conference recorders.
In the prior art, conference minutes for a video conference are mainly produced by recording at the conference audio input and then relying on manual entry. This scheme ignores the particularities of a video conference: recording proceeds whether or not anyone is actually speaking, so invalid speech data is captured and conference resources are wasted. Moreover, in the speech-to-text stage, conversion is not intelligent; the user must manually select the audio file before conversion can take place, so the user experience is poor.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an automatic conference summary generation method for a video conference that has simple steps and is convenient to implement. The method can be effectively applied to a video conference: it automatically classifies speakers using acoustic features and an AI recognition module, and automatically generates a text file for each speaker plus a summary text file from the clustered audio, thereby freeing the hands of the conference recorder, improving the output efficiency of the conference summary and the user experience, and being easy to popularize.
In order to solve the technical problems, the invention adopts the technical scheme that: a method for automatically generating a conference summary for a videoconference, the method comprising the steps of:
step one, segmentation judgment
Segment the mixed original audio of the video conference and extract voiceprint features; compare and identify each pair of adjacent segments to judge whether they come from the same speaker; when they do not, mark the conversion point between the two speakers, and encode the audio into fragments at the marked conversion points;
step two, clustering
Clustering all the segmented audio fragments respectively, clustering the audio fragments belonging to the same speaker together, and marking the clustered audio data;
step three, recognition
Identifying the clustered audio data, and determining the participants corresponding to the marked audio data by combining the information of the participants in the conference;
step four, speech-to-text conversion
Performing speech-to-text conversion on the clustered and identified audio data to generate and store text files.
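The four-step pipeline above can be sketched structurally on toy data. This is an illustration only: `generate_minutes`, the tagged word "frames", and the participant map are hypothetical stand-ins; real input would be mixed audio, and the grouping key would come from voiceprint comparison rather than an explicit channel tag.

```python
from itertools import groupby

def generate_minutes(frames, participants):
    """Structural sketch of the four steps on toy data: each frame is a
    (channel_id, word) pair standing in for real audio."""
    # Step 1 - segmentation: cut the stream wherever the channel changes
    fragments = [(ch, [w for _, w in grp])
                 for ch, grp in groupby(frames, key=lambda f: f[0])]
    # Step 2 - clustering: gather fragments belonging to the same channel
    clusters = {}
    for ch, words in fragments:
        clusters.setdefault(ch, []).extend(words)
    # Step 3 - identification: map each cluster to participant information
    # Step 4 - speech-to-text: join the words into each speaker's transcript
    return {participants[ch]: " ".join(words) for ch, words in clusters.items()}

minutes = generate_minutes(
    [(1, "hello"), (1, "team"), (2, "hi"), (1, "agenda")],
    {1: "Alice", 2: "Bob"})
print(minutes)  # → {'Alice': 'hello team agenda', 'Bob': 'hi'}
```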
In the method for automatically generating the conference summary for the video conference, the segmentation judgment in step one includes a distance-metric-based method and a model-search-based method.
The method for automatically generating the conference summary for the video conference comprises the following specific process of the distance-metric-based method:
Step A1, using a sliding-window mechanism in which the window length is fixed and the window moves forward by a fixed hop;
Step A2, calculating the feature vectors within the window, together with their mean and variance;
Step A3, checking whether the feature vectors in the window obey a Gaussian distribution: when they obey the Gaussian distribution, there is no conversion point; when they do not, a conversion point exists.
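A minimal numeric sketch of the distance-metric approach follows. It is an interpretation: instead of a formal Gaussianity test, it compares the Gaussian fit of two adjacent windows with a symmetric KL divergence, a common equivalent formulation; the window length, hop, and threshold values are illustrative, not from the patent.

```python
import random
import statistics

def kl_gauss(m1, v1, m2, v2):
    # Symmetric KL divergence between the 1-D Gaussians N(m1, v1) and N(m2, v2)
    return 0.5 * ((v1 / v2 + v2 / v1) + (m1 - m2) ** 2 * (1 / v1 + 1 / v2) - 2)

def change_points(features, win=50, hop=10, threshold=2.0):
    """Slide two adjacent fixed-length windows over a 1-D feature stream and
    flag a speaker conversion point where the divergence between the Gaussian
    fits of the left and right windows exceeds the threshold."""
    points, i = [], 0
    while i + 2 * win <= len(features):
        left, right = features[i:i + win], features[i + win:i + 2 * win]
        d = kl_gauss(statistics.mean(left), statistics.pvariance(left),
                     statistics.mean(right), statistics.pvariance(right))
        if d > threshold:
            points.append(i + win)  # boundary between the two windows
        i += hop
    return points

# Toy stream: speaker A (mean 0) followed by speaker B (mean 5)
random.seed(0)
stream = ([random.gauss(0, 1) for _ in range(200)] +
          [random.gauss(5, 1) for _ in range(200)])
print(change_points(stream))  # detections cluster around index 200
```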
The method for automatically generating the conference summary for the video conference comprises the following specific process of the model-based search:
Step B1, training a model on each segmented audio;
Step B2, calculating the Bayesian Information Criterion (BIC) value of the model corresponding to each audio segment;
Step B3, comparing the BIC values of the preceding and following audio segments: when the difference is not greater than the threshold, there is no conversion point; when the difference is greater than the threshold, a conversion point exists.
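Reading the "Bayes value" as the Bayesian Information Criterion (BIC), the standard criterion in model-based speaker change detection, steps B1-B3 can be sketched for 1-D features as follows. The ΔBIC form, the λ penalty weight, and the toy data are assumptions; the patent itself does not spell out the formula.

```python
import math
import random
import statistics

def delta_bic(x, y, lam=1.0):
    """ΔBIC for the hypothesis that segments x and y come from two different
    1-D Gaussians rather than a single one. ΔBIC > 0 suggests a speaker
    conversion point between the two segments."""
    z = x + y
    n, n1, n2 = len(z), len(x), len(y)
    v, v1, v2 = (statistics.pvariance(s) for s in (z, x, y))
    # The two-model hypothesis adds 2 free parameters (mean and variance)
    penalty = lam * 0.5 * 2 * math.log(n)
    return 0.5 * (n * math.log(v) - n1 * math.log(v1) - n2 * math.log(v2)) - penalty

random.seed(1)
same = [random.gauss(0, 1) for _ in range(400)]                       # one speaker
diff = ([random.gauss(0, 1) for _ in range(200)] +
        [random.gauss(4, 1) for _ in range(200)])                      # two speakers
print(delta_bic(same[:200], same[200:]))  # negative: no conversion point
print(delta_bic(diff[:200], diff[200:]))  # positive: conversion point
```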
In the above method for automatically generating a conference summary for a video conference, the clustering method in step two includes the agglomerative hierarchical clustering algorithm (AHC).
The method for automatically generating the conference summary for the video conference comprises the following specific process of the agglomerative hierarchical clustering algorithm AHC:
Step C1, initialization: each sample point is its own class, giving N classes; calculate the distance between every pair of classes and set a distance threshold;
Step C2, compare the minimum inter-class distance with the distance threshold: when the minimum inter-class distance is smaller than the distance threshold, execute step C3; when it is not smaller than the distance threshold, stop the iteration;
Step C3, merge the two classes with the minimum distance into one class, leaving N-1 classes;
Step C4, calculate the distance between every pair of the N-1 classes and return to step C2.
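Steps C1-C4 can be sketched directly in Python. Average linkage and 1-D stand-in "voiceprints" are assumptions here, since the patent does not fix the distance definition or the feature dimensionality.

```python
def ahc(points, threshold):
    """Agglomerative hierarchical clustering sketch: start with one class per
    sample, repeatedly merge the closest pair of classes, and stop once the
    smallest inter-class distance reaches the threshold (steps C1-C4)."""
    clusters = [[p] for p in points]  # C1: each sample point is its own class

    def dist(a, b):
        # Average-linkage distance between two classes (an assumption)
        return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(dist(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d >= threshold:            # C2: stop when min distance >= threshold
            break
        clusters[i] = clusters[i] + clusters[j]  # C3: merge the closest pair
        del clusters[j]               # N-1 classes remain; C4: loop back to C2
    return clusters

# Toy 1-D "voiceprints" drawn from three speakers
pts = [0.0, 0.1, 0.2, 5.0, 5.1, 9.8, 10.0]
print(len(ahc(pts, threshold=2.0)))  # → 3 classes, i.e. 3 speakers
```

The distance threshold directly fixes where the merging stops, and hence the speaker count, which is why the description stresses tuning it.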
In the method for automatically generating the conference summary for the video conference, the specific process of identification in step three comprises: according to the number of speakers determined by clustering and the audio features associated with each audio channel in the conference participant information, comparing and associating the audio feature information of the conference audio channels with the clustered information, thereby identifying the speaker information.
In the method for automatically generating the conference summary for the video conference, the speaker's name, location, and region are obtained from the speaker information.
In the method for automatically generating the conference summary for the video conference, the text files in step four include a single text file converted from each speaker's audio data and a summary text file converted from the identified audio of all speakers.
Compared with the prior art, the invention has the following advantages:
1. The method has simple steps and is convenient to implement.
2. The method segments and slices the speech signal, extracts the voiceprint features of each fragment, clusters them with the agglomerative hierarchical clustering algorithm, judges the number of speakers in the speech from the differences between the voiceprint features, and finally splices the speech to obtain each separated speaker's audio. When comparing the degree of difference between the voiceprint features of the slices, the threshold set by the system is a parameter that is difficult to determine and would normally require repeated verification and optimization over a considerable number of conferences; this scheme combines an AI training and recognition module to continuously optimize and adjust the threshold selection, guaranteeing the real-time performance and accuracy of the segmentation judgment.
3. The invention automatically translates the audio data into text through a speech-to-text conversion module with AI machine-learning capability and generates M+1 text files for the M identified speakers, the last file being the summary text file of all speakers, thereby ensuring the accuracy and integrity of the text content.
4. The invention can be effectively applied to a video conference: it automatically classifies speakers using acoustic features and an AI recognition module, and automatically generates a text file for each speaker plus a summary text file from the clustered audio, thereby freeing the hands of the conference recorder, improving the output efficiency of the conference summary and the user experience, and being easy to popularize.
In conclusion, the method of the invention has simple steps, is convenient to implement, and can be effectively applied to a video conference; by automatically classifying speakers with acoustic features and an AI recognition module and automatically generating each speaker's text file and a summary text file from the clustered audio, it frees the hands of the conference recorder, improves the output efficiency of the conference summary and the user experience, has a remarkable effect, and is easy to popularize.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a flow chart of a distance metric based method of the present invention;
FIG. 3 is a flow chart of a method for model-based searching according to the present invention.
Detailed Description
As shown in fig. 1, the method for automatically generating a conference summary for a video conference according to the present invention includes the following steps:
step one, segmentation judgment
Segment the mixed original audio of the video conference and extract voiceprint features; compare and identify each pair of adjacent segments to judge whether they come from the same speaker; when they do not, mark the conversion point between the two speakers, and encode the audio into fragments at the marked conversion points;
step two, clustering
Clustering all the segmented audio fragments respectively, clustering the audio fragments belonging to the same speaker together, and marking the clustered audio data;
step three, recognition
Identifying the clustered audio data, and determining the participants corresponding to the marked audio data by combining the information of the participants in the conference;
in specific implementation, the number of speakers in the conference can be judged according to the difference of the voiceprint characteristics.
Step four, speech-to-text conversion
Performing speech-to-text conversion on the clustered and identified audio data to generate and store text files.
In the method, the segmentation judgment in step one includes a distance-metric-based method and a model-search-based method.
In specific implementation, segmentation is performed with a combination of the distance metric and the model search, which guarantees the real-time performance and accuracy of the segmentation judgment.
In the method, as shown in fig. 2, the specific process of the distance-metric-based method includes:
Step A1, using a sliding-window mechanism in which the window length is fixed and the window moves forward by a fixed hop;
Step A2, calculating the feature vectors within the window, together with their mean and variance;
Step A3, checking whether the feature vectors in the window obey a Gaussian distribution: when they obey the Gaussian distribution, there is no conversion point; when they do not, a conversion point exists.
In specific implementation, the speech features are assumed to be mutually independent and Gaussian-distributed; because the speech feature parameters of different speakers follow different probability distributions, when two speakers speak within one audio segment the feature vectors no longer obey a single Gaussian distribution.
In the method, as shown in fig. 3, the specific process of the model-based search includes:
Step B1, training a model on each segmented audio;
Step B2, calculating the Bayesian Information Criterion (BIC) value of the model corresponding to each audio segment;
Step B3, comparing the BIC values of the preceding and following audio segments: when the difference is not greater than the threshold, there is no conversion point; when the difference is greater than the threshold, a conversion point exists.
In the method, the clustering method in step two includes the agglomerative hierarchical clustering algorithm (AHC).
In the method, the specific process of the agglomerative hierarchical clustering algorithm AHC comprises:
Step C1, initialization: each sample point is its own class, giving N classes; calculate the distance between every pair of classes and set a distance threshold;
Step C2, compare the minimum inter-class distance with the distance threshold: when the minimum inter-class distance is smaller than the distance threshold, execute step C3; when it is not smaller than the distance threshold, stop the iteration;
Step C3, merge the two classes with the minimum distance into one class, leaving N-1 classes;
Step C4, calculate the distance between every pair of the N-1 classes and return to step C2.
In specific implementation, the value of the distance threshold is important for the clustering algorithm AHC: once the distance threshold is determined, the stopping point of the method is determined, and the stopping point determines the number of classes, which is the number of speakers. The distance threshold therefore needs to be obtained through repeated verification, optimization, and adjustment over a considerable number of conferences; that is, its selection is a process of continuous optimization and adjustment.
In the method, the specific identification process in step three comprises: determining the number of speakers according to the clustering, and, according to the audio features associated with each participant's audio channel in the conference participant information, comparing and associating the audio feature information of the conference audio channels with the clustered information, thereby identifying the speaker information.
In the method, the speaker information includes the speaker's name, location, and region.
In the method, the text files in step four comprise a single text file converted from each speaker's audio data and a summary text file converted from the identified audio of all speakers.
In specific implementation, upon the conference-end flag at the end of the conference, the audio data is automatically translated into text by a speech-to-text conversion module with AI machine-learning capability; M+1 text files are generated for the M identified speakers, the last file being the summary text file of all speakers, which ensures the accuracy and integrity of the text content.
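The M+1 file generation can be sketched as follows. `build_minutes` and the (speaker, text) pair format are hypothetical stand-ins for the output of the speech-to-text module, not names from the patent.

```python
def build_minutes(utterances):
    """Given (speaker, text) pairs produced by speech-to-text conversion,
    build M per-speaker transcripts plus one combined summary transcript,
    for M+1 files in total."""
    per_speaker = {}
    for speaker, text in utterances:
        per_speaker.setdefault(speaker, []).append(text)
    # One file per identified speaker...
    files = {f"{s}.txt": "\n".join(lines) for s, lines in per_speaker.items()}
    # ...plus the summary file covering all speakers in order
    files["summary.txt"] = "\n".join(f"{s}: {t}" for s, t in utterances)
    return files

minutes = build_minutes([("Alice", "Opening remarks."),
                         ("Bob", "Status update."),
                         ("Alice", "Closing.")])
print(sorted(minutes))  # → ['Alice.txt', 'Bob.txt', 'summary.txt']
```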
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.
Claims (9)
1. A method for automatically generating a conference summary for a video conference, the method comprising the steps of:
step one, segmentation judgment
Segment the mixed original audio of the video conference and extract voiceprint features; compare and identify each pair of adjacent segments to judge whether they come from the same speaker; when they do not, mark the conversion point between the two speakers, and encode the audio into fragments at the marked conversion points;
step two, clustering
Clustering all the segmented audio fragments respectively, clustering the audio fragments belonging to the same speaker together, and marking the clustered audio data;
step three, recognition
Identifying the clustered audio data, and determining the participants corresponding to the marked audio data by combining the information of the participants in the conference;
step four, speech-to-text conversion
Performing speech-to-text conversion on the clustered and identified audio data to generate and store text files.
2. The method of claim 1, wherein the segmentation judgment in step one comprises a distance-metric-based method and a model-search-based method.
3. The method of claim 2, wherein the distance-metric-based process comprises:
Step A1, using a sliding-window mechanism in which the window length is fixed and the window moves forward by a fixed hop;
Step A2, calculating the feature vectors within the window, together with their mean and variance;
Step A3, checking whether the feature vectors in the window obey a Gaussian distribution: when they obey the Gaussian distribution, there is no conversion point; when they do not, a conversion point exists.
4. The method for automatically generating a conference summary for a video conference as claimed in claim 2, wherein the specific process of the model-based search comprises:
Step B1, training a model on each segmented audio;
Step B2, calculating the Bayesian Information Criterion (BIC) value of the model corresponding to each audio segment;
Step B3, comparing the BIC values of the preceding and following audio segments: when the difference is not greater than the threshold, there is no conversion point; when the difference is greater than the threshold, a conversion point exists.
5. The method of claim 1, wherein the clustering in step two comprises the agglomerative hierarchical clustering algorithm AHC.
6. The method for automatically generating a conference summary for a video conference according to claim 5, wherein the specific process of the agglomerative hierarchical clustering algorithm AHC comprises:
Step C1, initialization: each sample point is its own class, giving N classes; calculate the distance between every pair of classes and set a distance threshold;
Step C2, compare the minimum inter-class distance with the distance threshold: when the minimum inter-class distance is smaller than the distance threshold, execute step C3; when it is not smaller than the distance threshold, stop the iteration;
Step C3, merge the two classes with the minimum distance into one class, leaving N-1 classes;
Step C4, calculate the distance between every pair of the N-1 classes and return to step C2.
7. The method as claimed in claim 1, wherein the identification in step three comprises: determining the number of speakers according to the clustering, and, according to the audio features associated with each participant's audio channel in the conference participant information, comparing and associating the audio feature information of the conference audio channels with the clustered information, thereby identifying the speaker information.
8. The method of claim 7, wherein the speaker information comprises the speaker's name, location, and region.
9. The method of claim 8, wherein the text files in step four comprise a single text file converted from each speaker's audio data and a summary text file converted from the identified audio of all speakers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011077651.2A CN112165599A (en) | 2020-10-10 | 2020-10-10 | Automatic conference summary generation method for video conference |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011077651.2A CN112165599A (en) | 2020-10-10 | 2020-10-10 | Automatic conference summary generation method for video conference |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112165599A true CN112165599A (en) | 2021-01-01 |
Family
ID=73867950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011077651.2A Pending CN112165599A (en) | 2020-10-10 | 2020-10-10 | Automatic conference summary generation method for video conference |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112165599A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113051426A (en) * | 2021-03-18 | 2021-06-29 | 深圳市声扬科技有限公司 | Audio information classification method and device, electronic equipment and storage medium |
CN113707130A (en) * | 2021-08-16 | 2021-11-26 | 北京搜狗科技发展有限公司 | Voice recognition method and device for voice recognition |
WO2022161264A1 (en) * | 2021-01-26 | 2022-08-04 | 阿里巴巴集团控股有限公司 | Audio signal processing method, conference recording and presentation method, device, system, and medium |
CN115100701A (en) * | 2021-03-08 | 2022-09-23 | 福建福清核电有限公司 | Conference speaker identity identification method based on artificial intelligence technology |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030048946A1 (en) * | 2001-09-07 | 2003-03-13 | Fuji Xerox Co., Ltd. | Systems and methods for the automatic segmentation and clustering of ordered information |
CN103530432A (en) * | 2013-09-24 | 2014-01-22 | 华南理工大学 | Conference recorder with speech extracting function and speech extracting method |
US9584946B1 (en) * | 2016-06-10 | 2017-02-28 | Philip Scott Lyren | Audio diarization system that segments audio input |
CN106971713A (en) * | 2017-01-18 | 2017-07-21 | 清华大学 | Speaker's labeling method and system based on density peaks cluster and variation Bayes |
CN110335612A (en) * | 2019-07-11 | 2019-10-15 | 招商局金融科技有限公司 | Minutes generation method, device and storage medium based on speech recognition |
CN110491392A (en) * | 2019-08-29 | 2019-11-22 | 广州国音智能科技有限公司 | A kind of audio data cleaning method, device and equipment based on speaker's identity |
- 2020-10-10: Application CN202011077651.2A filed; published as CN112165599A; status Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030048946A1 (en) * | 2001-09-07 | 2003-03-13 | Fuji Xerox Co., Ltd. | Systems and methods for the automatic segmentation and clustering of ordered information |
CN103530432A (en) * | 2013-09-24 | 2014-01-22 | 华南理工大学 | Conference recorder with speech extracting function and speech extracting method |
US9584946B1 (en) * | 2016-06-10 | 2017-02-28 | Philip Scott Lyren | Audio diarization system that segments audio input |
CN106971713A (en) * | 2017-01-18 | 2017-07-21 | 清华大学 | Speaker's labeling method and system based on density peaks cluster and variation Bayes |
CN110335612A (en) * | 2019-07-11 | 2019-10-15 | 招商局金融科技有限公司 | Minutes generation method, device and storage medium based on speech recognition |
CN110491392A (en) * | 2019-08-29 | 2019-11-22 | 广州国音智能科技有限公司 | A kind of audio data cleaning method, device and equipment based on speaker's identity |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022161264A1 (en) * | 2021-01-26 | 2022-08-04 | 阿里巴巴集团控股有限公司 | Audio signal processing method, conference recording and presentation method, device, system, and medium |
CN115100701A (en) * | 2021-03-08 | 2022-09-23 | 福建福清核电有限公司 | Conference speaker identity identification method based on artificial intelligence technology |
CN113051426A (en) * | 2021-03-18 | 2021-06-29 | 深圳市声扬科技有限公司 | Audio information classification method and device, electronic equipment and storage medium |
CN113707130A (en) * | 2021-08-16 | 2021-11-26 | 北京搜狗科技发展有限公司 | Voice recognition method and device for voice recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112165599A (en) | Automatic conference summary generation method for video conference | |
Lu et al. | Speaker change detection and tracking in real-time news broadcasting analysis | |
CN103700370A (en) | Broadcast television voice recognition method and system | |
WO2021073116A1 (en) | Method and apparatus for generating legal document, device and storage medium | |
WO2020238209A1 (en) | Audio processing method, system and related device | |
CN101529500A (en) | Content summarizing system, method, and program | |
CN103871424A (en) | Online speaking people cluster analysis method based on bayesian information criterion | |
US20220375492A1 (en) | End-To-End Speech Diarization Via Iterative Speaker Embedding | |
CN112633241B (en) | News story segmentation method based on multi-feature fusion and random forest model | |
CN101867742A (en) | Television system based on sound control | |
Lu et al. | Unsupervised speaker segmentation and tracking in real-time audio content analysis | |
CN101950564A (en) | Remote digital voice acquisition, analysis and identification system | |
CN114022923A (en) | Intelligent collecting and editing system | |
TWI769520B (en) | Multi-language speech recognition and translation method and system | |
CN113936236A (en) | Video entity relationship and interaction identification method based on multi-modal characteristics | |
CN116996337B (en) | Conference data management system and method based on Internet of things and microphone switching technology | |
Imoto et al. | Acoustic scene classification based on generative model of acoustic spatial words for distributed microphone array | |
CN110322883B (en) | Voice-to-text effect evaluation optimization method | |
CN110807370B (en) | Conference speaker identity noninductive confirmation method based on multiple modes | |
CN114547264A (en) | News diagram data identification method based on Mahalanobis distance and comparison learning | |
CN114155845A (en) | Service determination method and device, electronic equipment and storage medium | |
Wu et al. | Universal Background Models for Real-time Speaker Change Detection. | |
CN110400578A (en) | The generation of Hash codes and its matching process, device, electronic equipment and storage medium | |
CN111914777B (en) | Method and system for identifying robot instruction in cross-mode manner | |
CN117316165B (en) | Conference audio analysis processing method and system based on time sequence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210101 |
RJ01 | Rejection of invention patent application after publication |