US10650826B2 - Diarization using acoustic labeling - Google Patents
- Publication number
- US10650826B2 (application US16/594,812)
- Authority
- US
- United States
- Prior art keywords
- speaker
- audio
- segments
- audio file
- acoustic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/005—
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Definitions
- the present disclosure is related to the field of automated transcription. More specifically, the present disclosure is related to diarization using acoustic labeling.
- Speech transcription and speech analytics of audio data may be enhanced by a process of diarization wherein audio data that contains multiple speakers is separated into segments of audio data, each typically attributable to a single speaker. While speaker separation in diarization facilitates later transcription and/or speech analytics, further identification or discrimination between the identified speakers can further facilitate these processes by enabling the association of further context and information in later transcription and speech analytics processes specific to an identified speaker.
- Systems and methods as disclosed herein present solutions to improve diarization using acoustic models to identify and label at least one speaker separated from the audio data.
- Previous attempts to create individualized acoustic voiceprint models are time intensive in that an identified speaker must record training speech into the system or the underlying data must be manually separated to ensure that only speech from the identified speaker is used. Recorded training speech is further limited in that speakers are likely to speak differently when recording than when in the middle of a live interaction with another person.
- An embodiment of a method of diarization of audio files includes receiving speaker metadata associated with each of a plurality of audio files. A set of audio files of the plurality belonging to a specific speaker is identified based upon the received speaker metadata. A subset of the audio files of the identified set of audio files is selected. An acoustic voiceprint for the specific speaker is computed from the selected subset of audio files. The acoustic voiceprint is applied to a new audio file to identify the specific speaker in the diarization of the new audio file.
- An exemplary embodiment of a method of diarization of audio files of a customer service interaction between at least one agent and at least one customer includes receiving agent metadata associated with each of a plurality of audio files.
- A set of audio files of the plurality of audio files associated with a specific agent is identified based upon the received agent metadata.
- A subset of the audio files of the identified set of audio files is selected that maximizes an acoustical difference between audio data of an agent and audio data of at least one other speaker in each of the audio files.
- An acoustic voiceprint is computed from the audio data of the agent in the selected subset. The acoustic voiceprint is applied to a new audio file to identify the agent in diarization of the new audio file.
- An exemplary embodiment of a system for diarization of audio data includes a database of audio files, each audio file of the database being associated with metadata identifying at least one speaker in the audio file.
- A processor is communicatively connected to the database. The processor selects a set of audio files with the same speaker based upon the metadata. The processor filters the selected set to a subset of the audio files that maximize an acoustical difference between audio data of at least two speakers in an audio file. The processor creates an acoustic voiceprint for the speaker identified by the metadata.
- A database includes a plurality of acoustic voiceprints, each of which is associated with a speaker.
- An audio source provides new audio data to the processor with metadata that identifies at least one speaker in the audio data.
- The processor selects an acoustic voiceprint from the plurality of acoustic voiceprints based upon the metadata and applies the selected acoustic voiceprint to the new audio data to identify audio data of the speaker in the new audio data for diarization of the new audio data.
- FIG. 1 is a flow chart that depicts an embodiment of a method of diarization.
- FIG. 2 is a flow chart that depicts an embodiment of creating and using an acoustic voiceprint model.
- FIG. 3 is a system diagram of an exemplary embodiment of a system for diarization of audio files.
- Embodiments of a diarization process disclosed herein include a first optional step of a speech-to-text transcription of an audio file to be diarized. Next, a “blind” diarization of the audio file is performed.
- the audio file is exemplarily a .WAV file.
- the blind diarization receives the audio file and optionally the automatically generated transcript. This diarization is characterized as “blind” as the diarization is performed prior to an identification of the speakers.
- the “blind diarization” may only cluster the audio data into speakers while it may still be undetermined which speaker is the agent and which speaker is the customer.
- the blind diarization is followed by a speaker diarization wherein a voiceprint model that represents the speech and/or information content of an identified speaker in the audio data is compared to the identified speech segments associated with the separated speakers. Through this comparison, one speaker can be selected as the known speaker, while the other speaker is identified as the other speaker.
- The customer service agent will have a voiceprint model as disclosed herein, which is used to identify one of the separated speakers as the agent while the other speaker is the customer.
- The identification of segments in an audio file can facilitate increased accuracy in transcription, diarization, speaker adaptation, and/or speech analytics of the audio file.
- An initial transcription, exemplarily from a fast speech-to-text engine, can be used to more accurately identify speech segments in an audio file, such as an audio stream or recording, resulting in more accurate diarization and/or speech adaptation.
- FIGS. 1 and 2 are flow charts that respectively depict exemplary embodiments of method 100 of diarization and a method 200 of creating and using an acoustic voiceprint model.
- FIG. 3 is a system diagram of an exemplary embodiment of a system 300 for creating and using an acoustic voiceprint model.
- the system 300 is generally a computing system that includes a processing system 306 , storage system 304 , software 302 , communication interface 308 and a user interface 310 .
- the processing system 306 loads and executes software 302 from the storage system 304 , including a software module 330 .
- The software module 330 directs the processing system 306 to operate as described herein in further detail in accordance with the method 100 and alternatively the method 200.
- Although the computing system 300 as depicted in FIG. 3 includes one software module in the present example, it should be understood that one or more modules could provide the same operation.
- Similarly, while the description as provided herein refers to a computing system 300 and a processing system 306, it is to be recognized that implementations of such systems can be performed using one or more processors, which may be communicatively connected, and such implementations are considered to be within the scope of the description.
- the processing system 306 can comprise a microprocessor and other circuitry that retrieves and executes software 302 from storage system 304 .
- Processing system 306 can be implemented within a single processing device but can also be distributed across multiple processing devices or sub-systems that cooperate in executing program instructions. Examples of processing system 306 include general purpose central processing units, application specific processors, and logic devices, as well as any other type of processing device, combinations of processing devices, or variations thereof.
- the storage system 304 can comprise any storage media readable by processing system 306 , and capable of storing software 302 .
- the storage system 304 can include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
- Storage system 304 can be implemented as a single storage device but may also be implemented across multiple storage devices or sub-systems.
- Storage system 304 can further include additional elements, such as a controller capable of communicating with the processing system 306 .
- Examples of storage media include random access memory, read only memory, magnetic discs, optical discs, flash memory, virtual memory, and non-virtual memory, magnetic sets, magnetic tape, magnetic disc storage or other magnetic storage devices, or any other medium which can be used to store the desired information and that may be accessed by an instruction execution system, as well as any combination or variation thereof, or any other type of storage medium.
- the storage media can be a non-transitory storage media.
- At least a portion of the storage media may be transitory. It should be understood that in no case is the storage media a propagated signal.
- User interface 310 can include a mouse, a keyboard, a voice input device, a touch input device for receiving a gesture from a user, a motion input device for detecting non-touch gestures and other motions by a user, and other comparable input devices and associated processing elements capable of receiving user input from a user.
- Output devices such as a video display or graphical display can display an interface further associated with embodiments of the system and method as disclosed herein. Speakers, printers, haptic devices and other types of output devices may also be included in the user interface 310 .
- The computing system 300 receives an audio file 320.
- The audio file 320 may be an audio recording of a conversation, which may exemplarily be between two speakers, although the audio recording may be any of a variety of other audio records, including multiple speakers, a single speaker, or an automated or recorded auditory message.
- the audio file may be streaming audio data received in real time or near-real time by the computing system 300 .
- FIG. 1 is a flow chart that depicts an embodiment of a method of diarization 100 .
- Audio data 102 is exemplarily an audio recording of a conversation exemplarily between two or more speakers.
- the audio file may exemplarily be a .WAV file, but may also be other types of audio or video files, for example, pulse code modulated (PCM) formatted audio, and more specifically, linear pulse code modulated (LPCM) audio files.
- the audio data is exemplarily a mono audio file; however, it is recognized that embodiments of the method disclosed herein may also be used with stereo audio files.
- One feature of the method disclosed herein is that speaker separation and diarization can be achieved in mono audio files where stereo speaker separation techniques are not available.
- the audio data 102 further comprises or is associated to metadata 108 .
- the metadata 108 can exemplarily include an identification number for one or more of the speakers in the audio data 102 .
- the metadata 108 may provide information regarding context or content of the audio data 102 , including a topic, time, date, location etc. In the context of a customer service call center, the metadata 108 provides a customer service agent identification.
- The audio data 102 and the metadata 108 are provided to a speech-to-text (STT) server 104, which may employ any of a variety of methods or techniques for automatic speech recognition (ASR) to create an automated speech-to-text transcription 106 from the audio file.
- the transcription performed by the STT server at 104 can exemplarily be a large-vocabulary continuous speech recognition (LVCSR) and the audio data 102 provided to the STT server 104 can alternatively be a previously recorded audio file or can be streaming audio data obtained from an ongoing communication between two speakers.
- The STT server 104 may use the received metadata 108 to select one or more models or techniques for producing the automated transcription based upon the metadata 108.
- An identification of one of the speakers in the audio data can be used to select a topical linguistic model based upon a context area associated with the speaker.
- The STT server 104 may also output time stamps associated with particular transcription segments, words, or phrases, and may also include a confidence score in the automated transcription.
- The transcription 106 may also identify homogeneous speaker speech segments. Homogeneous speech segments are those segments of the transcription that have a high likelihood of originating from a single speaker. The speech segments may exemplarily be phonemes, words, or sentences.
- both the audio data 102 and the transcription 106 are used for a blind diarization at 110 .
- the blind diarization may be performed without the transcription 106 and may be applied directly to the audio data 102 .
- the features at 104 and 106 as described above may not be used.
- the diarization is characterized as blind as the identities of the speakers (e.g. agent, customer) are not known at this stage and therefore the diarization 110 merely discriminates between a first speaker (speaker 1 ) and a second speaker (speaker 2 ), or more. Additionally, in some embodiments, those segments for which a speaker cannot be reliably determined may be labeled as being of an unknown speaker.
- An embodiment of the blind diarization at 110 receives the mono audio data 102 and the transcription 106 and begins with the assumption that there are two main speakers in the audio file.
- the homogeneous speaker segments from 106 are identified in the audio file. Then, long homogeneous speaker segments can be split into sub-segments if long silent intervals are found within a single segment. The sub-segments are selected to avoid splitting the long speaker segments within a word.
- the transcription information in the information file 106 can provide context to where individual words start and end. After the audio file has been segmented based upon both the audio file 102 and the information file 106 , the identified segments are clustered into speakers (e.g. speaker 1 and speaker 2 ).
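The splitting step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the transcription supplies per-word start and end times, and the function name `split_at_silences` and the 0.5 second silence threshold are hypothetical choices made for the example. Cuts are placed only in gaps between words, so no word is ever split.

```python
def split_at_silences(words, min_silence=0.5):
    """Split one homogeneous speaker segment into sub-segments.

    words: list of (text, start_sec, end_sec) tuples in time order,
           as might be derived from a time-stamped transcription.
    min_silence: assumed threshold (seconds) for a "long" silent interval.
    Returns a list of sub-segments, each a list of word tuples.
    """
    if not words:
        return []
    subsegments = [[words[0]]]
    for prev, cur in zip(words, words[1:]):
        gap = cur[1] - prev[2]  # silence between end of prev and start of cur
        if gap >= min_silence:
            subsegments.append([cur])    # start a new sub-segment at the gap
        else:
            subsegments[-1].append(cur)  # same sub-segment continues
    return subsegments
```

Because the cut points are taken from word boundaries, the resulting sub-segments can be passed to a later clustering stage without any partially spoken words.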
- the blind diarization uses voice activity detection (VAD) to segment the audio data 102 into utterances or short segments of audio data with a likelihood of emanating from a single speaker.
- The VAD segments the audio data into utterances by identifying segments of speech separated by segments of non-speech on a frame-by-frame basis. Context provided by the transcription 106 can improve the distinction between speech and non-speech segments.
- an audio frame may be identified as speech or non-speech based upon a plurality of characteristics or probabilities exemplarily based upon mean energy, band energy, peakiness, or residual energy; however, it will be recognized that alternative characteristics or probabilities may be used in alternative embodiments.
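A frame-by-frame speech/non-speech decision of this kind can be illustrated with a very small sketch. It uses only one of the characteristics named above (mean energy) with a fixed threshold; the frame length, the threshold value, and the function names are assumptions made for the example, and a practical detector would likely combine several characteristics.

```python
import math

def frame_energies(samples, frame_len):
    """Mean energy of each complete, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_utterances(samples, frame_len=160, threshold=0.01):
    """Return (start_frame, end_frame) pairs, end exclusive, for each
    run of consecutive frames whose mean energy exceeds the threshold."""
    flags = [e > threshold for e in frame_energies(samples, frame_len)]
    utterances, start = [], None
    for idx, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = idx                      # utterance begins
        elif not is_speech and start is not None:
            utterances.append((start, idx))  # utterance ends at non-speech
            start = None
    if start is not None:
        utterances.append((start, len(flags)))
    return utterances

# Synthetic mono signal: silence, tone, silence, tone (8 kHz sampling assumed).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(320)]
signal = [0.0] * 320 + tone + [0.0] * 320 + tone
utterances = detect_utterances(signal)
```

Each returned pair marks an utterance bounded by non-speech frames, which is the unit that the later clustering operates on.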
- Embodiments of the blind diarization 110 may further leverage the received metadata 108 to select an acoustic voiceprint model 116 from a plurality of stored acoustic voiceprint models, as will be described in further detail herein.
- Embodiments that use the acoustic voiceprint model in the blind diarization 110 can improve the clustering of the segmented audio data into speakers, for example by helping to cluster segments that are otherwise indeterminate, or “unknown.”
- The blind diarization at 110 results in audio data of separated speakers at 112.
- The homogeneous speaker segments in the audio data are tagged as being associated with a first speaker or a second speaker.
- Indeterminate segments may be tagged as “unknown” and audio data may have more than two speakers tagged.
- A second diarization, a “speaker” diarization, is undertaken to identify which of the first speaker and second speaker is the speaker identified by the metadata 108 and which speaker is the at least one other speaker.
- the metadata 108 identifies a customer service agent participating in the recorded conversation and the other speaker is identified as the customer.
- An acoustic voiceprint model 116, which can be derived in a variety of manners or techniques as described in more detail herein, is compared to the homogeneous speaker audio data segments assigned to the first speaker and then compared to the homogeneous speaker audio data segments assigned to the second speaker to determine which separated speaker audio data segments have a greater likelihood of matching the acoustic voiceprint model 116.
- the homogeneous speaker segments tagged in the audio file as being the speaker that is most likely the agent based upon the comparison of the acoustic voiceprint model 116 are tagged as the speaker identified in the metadata and the other homogeneous speaker segments are tagged as being the other speaker.
- The diarized and labeled audio data from 118 again undergoes an automated transcription, exemplarily performed by an STT server or other form of ASR, which exemplarily may be LVCSR.
- an automated transcription 122 can be output from the transcription at 120 through the application of improved algorithms and selection of further linguistic or acoustic models tailored to either the identified agent or the customer, or another aspect of the customer service interaction as identified through the identification of one or more of the speakers in the audio data.
- This improved labeling of the speaker in the audio data and the resulting transcription 122 can also facilitate analytics of the spoken content of the audio data by providing additional context regarding the speaker, as well as improved transcription of the audio data.
- the acoustic voice prints as described herein may be used in conjunction with one or more linguistic models, exemplarily the linguistic models as disclosed and applied in U.S. Provisional Patent Application No. 61/729,067, which is incorporated herein by reference.
- The speaker diarization may be performed in parallel with both a linguistic model and an acoustic voiceprint model, and the two resulting speaker diarizations are combined or analyzed in combination in order to provide an improved separation of the audio data into known speakers.
- The combination of both an acoustic voiceprint model and a linguistic model can help to identify errors in the blind diarization or the speaker separation phases, exemplarily by highlighting the portions of the audio data within which the two models disagree and providing for more detailed analysis on those areas in which the models are in disagreement in order to arrive at the correct diarization and speaker labeling.
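One simple way to surface the portions on which the two models disagree is to compare their per-segment labels. The sketch below assumes both diarizations label the same ordered list of segments; the function name `find_disagreements` and the label strings are hypothetical and are not taken from the disclosure.

```python
def find_disagreements(acoustic_labels, linguistic_labels):
    """Return indices of segments labeled differently by the two models.

    Both arguments are lists of speaker labels, one per segment,
    in the same segment order.
    """
    if len(acoustic_labels) != len(linguistic_labels):
        raise ValueError("both diarizations must label the same segments")
    return [
        i
        for i, (a, l) in enumerate(zip(acoustic_labels, linguistic_labels))
        if a != l
    ]
```

The returned indices identify the segments that would be candidates for the more detailed analysis mentioned above.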
- The use of an additional linguistic model may provide a backup for an instance wherein an acoustic voiceprint is not available or identified based upon the received metadata. For example, this situation may arise when there is insufficient audio data regarding a speaker to create an acoustic voiceprint as described in further detail herein.
- A combined implementation using a linguistic model and an acoustic model may help to identify an incongruity between the received metadata, which may identify one speaker, and the comparison to that speaker's acoustic voiceprint model, which may reveal that the identified speaker is not in the audio data.
- This may help to detect an instance wherein a customer service agent enters the wrong agent ID number so that corrective action may be taken.
- the use of a combination of acoustic and linguistic models may help in the identification and separation of speakers in audio data that contain more than two speakers, exemplarily, one customer service agent and two customers; two agents and one customer; or an agent, a customer, and an automated recording such as a voicemail message.
- FIG. 2 is a flow chart that depicts an embodiment of the creation and use of an acoustic voiceprint model exemplarily used as the acoustic voiceprint model 116 in FIG. 1 .
- the method 200 is divided into two portions, exemplarily, the creation of the acoustic voiceprint model at 202 and the application or use of the acoustic voiceprint model at 204 to label speakers in an audio file.
- the acoustic voiceprint model is of a customer service agent and associated with an agent identification number specific to the customer service agent.
- a number (N) of files are selected from a repository of files 208 .
- the files selected at 206 all share a common speaker, exemplarily, the customer service agent for which the model is being created.
- Each of the audio files in the repository 208 is stored with or associated to an agent identification number.
- N may be 5 files, 100 files, or 1,000; however, these are merely exemplary numbers.
- The N files selected at 206 may be further filtered in order to only select audio files in which the speakers, and thus the identified speaker, are easy to differentiate, for example due to the frequency of the voices of the different speakers.
- the acoustic voiceprint model as disclosed herein may be started with files that are likely to be accurate in the speaker separation.
- the top 50% of the selected files are used to create the acoustic voiceprint, while in other embodiments, the top 20% or top 10% are used; however, these percentages are in no way intended to be limiting on the thresholds that may be used in embodiments in accordance with the present disclosure.
- a diarization or transcription of the audio file is received and scored and only the highest scoring audio files are used to create the acoustic voiceprint model.
- The score may exemplarily be an automatedly calculated confidence score for the diarization or transcription. Such an automated confidence score may exemplarily, but without limitation, use an auto correction function.
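The score-based filtering described above can be sketched as a simple ranking. This is an illustrative sketch only: the function name `select_top_files`, the tuple layout, and the rule of always keeping at least one file are assumptions, and the confidence scores are presumed to have been computed elsewhere.

```python
def select_top_files(scored_files, fraction=0.5):
    """Keep the highest-scoring fraction of audio files.

    scored_files: list of (file_id, confidence_score) tuples.
    fraction: share of files to keep, e.g. 0.5, 0.2, or 0.1.
    Returns the kept file ids, best first; at least one file is kept.
    """
    ranked = sorted(scored_files, key=lambda item: item[1], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return [file_id for file_id, _ in ranked[:keep]]
```

Calling `select_top_files(files, fraction=0.2)` corresponds to the top 20% threshold mentioned above.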
- Each of the files selected at 206 is processed through a diarization at 210.
- the diarization process may be such as is exemplarily disclosed above with respect to FIG. 1 .
- the diarization at 210 takes each of the selected audio files and separates the file into a plurality of segments of speech separated by non-speech.
- the plurality of speech segments are further divided such that each segment has a high likelihood of containing speech sections from a single speaker. Similar to the blind diarization described above, the diarization at 210 can divide the audio file into segments labeled as a first speaker and a second speaker (or in some embodiments more speakers) at 212 .
- the previously identified speaker segments from the plurality of selected audio files are clustered into segments that are similar to one another.
- the clustering process can be done directly by matching segments based upon similarity to one another or by clustering the speaker segments based upon similarities to a group of segments.
- the clustered speaker segments are classified at 216 .
- Embodiments of the system and method use one or more metrics to determine which clusters of speaker segments belong to the customer service agent and which speaker segment clusters belong to the customers with whom the customer service agent was speaking.
- The metric of cluster size may be used to identify the segment clusters associated with the customer service agent, as larger clusters may belong to the customer service agent because the customer service agent is a party in each of the audio files selected for use in creating a model at 206. It will be recognized, however, that other features related to the agent's script, delivery, or other factors related to the customer service calls themselves may be used as the classifying metric.
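The cluster-size metric can be illustrated with a short sketch. It assumes the clusters have already been formed and, as a simplification, treats only the single largest cluster as the agent's; the function name and label strings are hypothetical.

```python
def classify_clusters(clusters):
    """Label the largest cluster as the agent and all others as customers.

    clusters: dict mapping a cluster id to its list of speaker segments.
    Returns a dict mapping each cluster id to "agent" or "customer".
    """
    sizes = {cid: len(segments) for cid, segments in clusters.items()}
    agent_cluster = max(sizes, key=sizes.get)  # largest cluster by segment count
    return {
        cid: "agent" if cid == agent_cluster else "customer"
        for cid in clusters
    }
```

The heuristic relies on the agent appearing in every selected file while each customer typically appears in only one.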
- An acoustic voiceprint model for the identified speaker, exemplarily a customer service agent, is built using the segments that have been classified as being from the identified speaker.
- a background voiceprint model that is representative of the audio produced from speakers who are not the identified speaker is built from those speech segments identified to not be the identified speaker, and thus may include the other speakers as well as background noise.
- the acoustic voiceprint model includes both an identified speaker voiceprint 222 that is representative of the speech of the identified speaker and a background voiceprint 224 that is representative of the other speaker with whom the identified speaker speaks, and any background noises to the audio data of the identified speaker.
- the creation of the acoustic voiceprint model 202 may be performed in embodiments to create an acoustic voiceprint model for each of a plurality of identified speakers that will be recorded and analyzed in the diarization method of FIG. 1 .
- the identified speakers may be a plurality of customer service agents.
- Each of the created acoustic voiceprint models is stored in a database of acoustic voiceprint models from which specific models are accessed as described above with respect to FIG. 1, exemplarily based upon an identification number in metadata associated with audio data.
- the processes at 202 may be performed at regular intervals using a predefined number of recently obtained audio data, or a stored set of exemplary audio files.
- exemplary audio files may be identified from situations in which the identified speaker is particularly easy to pick out in the audio, perhaps due to differences in the pitch or tone between the identified speaker's voice and the other speaker's voice, or due to a distinctive speech pattern or characteristic or prevalent accent by the other speaker.
- the acoustic voiceprint model is built on an ad hoc basis at the time of diarization of the audio.
- the acoustic model creation process may simply select a predetermined number of the most recent audio recordings that include the identified speaker or may include all audio recordings within a predefined date that include the identified speaker. It will be also noted that once the audio file currently being processed has been diarized, that audio recording may be added to the repository of audio files 208 for training of future models of the speech of the identified speaker.
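The two selection policies described above (a fixed number of the most recent recordings, or all recordings within a date window) can be sketched as one helper. The record layout with `agent_id`, `recorded_on`, and `path` keys, and the function name, are assumptions made for illustration.

```python
from datetime import date, timedelta

def select_training_files(recordings, agent_id, today,
                          max_files=None, max_age_days=None):
    """Pick recordings of one agent for building a voiceprint model.

    recordings: list of dicts with 'agent_id', 'recorded_on' (date), 'path'.
    max_files: if given, keep only this many of the most recent recordings.
    max_age_days: if given, keep only recordings no older than this.
    Returns the selected recordings, most recent first.
    """
    mine = [r for r in recordings if r["agent_id"] == agent_id]
    if max_age_days is not None:
        cutoff = today - timedelta(days=max_age_days)
        mine = [r for r in mine if r["recorded_on"] >= cutoff]
    mine.sort(key=lambda r: r["recorded_on"], reverse=True)
    return mine[:max_files] if max_files is not None else mine
```

A newly diarized recording would simply be appended to `recordings` so that later calls can draw on it.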
- The acoustic voiceprint model as created at 202 is used in performing a speaker diarization, such as represented at 114 in FIG. 1.
- new audio data is received.
- the new audio data received at 226 may be a stream of real-time audio data or may be recorded audio data being processed. Similar to that described above with respect to 110 and 112 in FIG. 1 , the new audio data 226 undergoes diarization at 228 to separate the new audio data 226 into segments that can be confidently tagged as being the speech of a single speaker, exemplarily a first speaker and a second speaker.
- The selected acoustic voiceprint 222, which may include the background voiceprint 224, is compared to the segments identified in the diarization at 228.
- each of the identified segments is separately compared to both the acoustic voiceprint 222 and to the background voiceprint 224 and an aggregation of the similarities of the first speaker segments and the second speaker segments to each of the models is compared in order to determine which of the speakers in the diarized audio file is the identified speaker.
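The comparison and aggregation described above can be sketched with a log-likelihood ratio. The sketch assumes each segment is summarized by a single feature vector and that both the speaker voiceprint and the background voiceprint are single diagonal Gaussians given as `(mean, variance)` pairs; averaging the per-segment ratios is one possible aggregation, chosen here for illustration.

```python
import math

def diag_gauss_loglik(x, mean, var):
    """Log-likelihood of vector x under a diagonal Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def identify_speaker(segments_by_speaker, voiceprint, background):
    """Decide which diarized speaker matches the voiceprint.

    segments_by_speaker: dict mapping a speaker label to a list of
        feature vectors, one per segment.
    voiceprint, background: (mean, var) tuples of diagonal Gaussians.
    Returns (best_label, scores), where each score is the mean
    log-likelihood ratio of that speaker's segments.
    """
    scores = {}
    for label, segments in segments_by_speaker.items():
        ratios = [
            diag_gauss_loglik(s, *voiceprint) - diag_gauss_loglik(s, *background)
            for s in segments
        ]
        scores[label] = sum(ratios) / len(ratios)
    return max(scores, key=scores.get), scores
```

A positive score means the segments resemble the identified speaker's voiceprint more than the background; the label with the highest score is taken as the identified speaker.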
- the acoustic voiceprint model is created from a collection of audio files that are selected to provide a sufficient amount of audio data that can be confidently tagged to belong only to the agent, and these selected audio files are used to create the agent acoustic model. Some considerations that may go into such a selection may be identified files with good speaker separation and sufficient length to provide data to the model and confirm speaker separation.
- the audio files are preprocessed to eliminate non-speech data from the audio file that may affect the background model. Such elimination of non-speech data can be performed by filtering or concatenation.
- the speakers in an audio file can be represented by a feature vector and the feature vectors can be aggregated into clusters. Such aggregation of the feature vectors may help to identify the customer service agent from the background speech as the feature vector associated with the agent will aggregate into clusters more quickly than those feature vectors representing a number of different customers.
- an iterative process may be employed whereby a first acoustic voiceprint model is created using some of the techniques disclosed above, the acoustic voiceprint model is tested or verified, and if the model is not deemed to be broad enough or be based upon enough speaker segments, additional audio files and speaker segments can be selected from the repository and the model is recreated.
- the speaker in an audio file is represented by a feature vector.
- An initial super-segment labeling is performed using agglomerative clustering of feature vectors.
- the feature vectors from the agent will aggregate into clusters more quickly than the feature vectors from the second speaker as the second speaker in each of the audio files is likely to be a different person.
- a first acoustic voiceprint model is built from the feature vectors found in the largest clusters and the background model is built from all of the other feature vectors.
- a diagonal Gaussian can be trained for each large cluster from the super-segments in that cluster.
- GMM: Gaussian Mixture Model
- the Gaussians are then merged where a weighting value of each Gaussian is proportionate to the number of super-segments in the cluster represented by the Gaussian.
- the background model can comprise a single diagonal Gaussian trained on the values of the remaining super-segments.
- the acoustic voiceprint model can be refined by calculating a log-likelihood of each audio file's super-segments with both the acoustic voiceprint and background models, reassigning the super-segments based upon this comparison.
- the acoustic voiceprint and background models can then be rebuilt from the reassigned super-segments in the manner described above, and this process can be repeated until the acoustic voiceprint model can be verified.
- the acoustic voiceprint model can be verified when a high enough quality match is found between enough of the sample agent super-segments and the agent model. Once the acoustic voiceprint model has been verified, then the final acoustic voiceprint model can be built with a single full Gaussian over the last super-segment assignments from the application of the acoustic voiceprint model to the selected audio files.
- the background model can be created from the super-segments not assigned to the identified speaker. It will be recognized that in alternative embodiments, an institution, such as a call center, may use a single background model for all agents with the background model being updated in the manner described above at periodic intervals.
- Embodiments of the method described above can be performed or implemented in a variety of ways.
- the STT server, in addition to performing the LVCSR, can also perform the diarization process.
- Another alternative is to use a centralized server to perform the diarization process.
- a stand-alone STT server performs the diarization process locally without any connection to another server for central storage or processing.
- the STT server performs the diarization but relies upon centrally stored or processed models to perform the initial transcription.
- a central dedicated diarization server may be used, where the output of many STT servers is sent to the centralized diarization server for processing.
- the centralized diarization server may have locally stored models that are built from processing all of the diarization at a single server.
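
The voiceprint construction and comparison steps described above (one diagonal Gaussian per large cluster, merged with weights proportional to cluster size; a single diagonal Gaussian background model; log-likelihood comparison of each speaker's segments against both models) can be illustrated with a minimal sketch. It is not the patented implementation. All function names, the synthetic feature vectors, the variance floor, and the choice of a mean log-likelihood ratio as the aggregated similarity are assumptions made for illustration only.

```python
import math
import random

def fit_diag_gaussian(vectors, floor=1e-3):
    """Per-dimension mean and variance; the variance floor is an assumption."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    var = [sum((v[d] - mean[d]) ** 2 for v in vectors) / n + floor
           for d in range(dim)]
    return mean, var

def diag_log_pdf(x, gaussian):
    """Log-density of one feature vector under a diagonal Gaussian."""
    mean, var = gaussian
    return -0.5 * sum(math.log(2 * math.pi * s) + (xi - m) ** 2 / s
                      for xi, m, s in zip(x, mean, var))

def build_voiceprint(clusters):
    """One diagonal Gaussian per cluster, weighted by the cluster's share of super-segments."""
    total = sum(len(c) for c in clusters)
    return [(len(c) / total, fit_diag_gaussian(c)) for c in clusters]

def gmm_log_pdf(x, gmm):
    """Log-density under the weighted mixture (log-sum-exp for stability)."""
    terms = [math.log(w) + diag_log_pdf(x, g) for w, g in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def speaker_score(segments, voiceprint, background):
    """Assumed aggregation: mean log-likelihood ratio, voiceprint vs. background."""
    ratios = [gmm_log_pdf(x, voiceprint) - diag_log_pdf(x, background)
              for x in segments]
    return sum(ratios) / len(ratios)

def identify_agent(speakers, voiceprint, background):
    """Return the speaker whose segments best match the voiceprint, plus all scores."""
    scores = {name: speaker_score(segs, voiceprint, background)
              for name, segs in speakers.items()}
    return max(scores, key=scores.get), scores

# Synthetic 4-dimensional "feature vectors" standing in for super-segments.
rng = random.Random(0)

def sample(center, spread, count, dim=4):
    return [[rng.gauss(center, spread) for _ in range(dim)] for _ in range(count)]

large_clusters = [sample(2.0, 0.5, 60), sample(2.5, 0.5, 40)]  # assumed agent clusters
remaining = sample(-1.0, 1.5, 80)                              # everything else

voiceprint = build_voiceprint(large_clusters)
background = fit_diag_gaussian(remaining)

new_call = {"speaker_1": sample(-1.0, 1.5, 20), "speaker_2": sample(2.2, 0.5, 20)}
agent, scores = identify_agent(new_call, voiceprint, background)
print(agent, scores)
```

In this sketch the speaker with the higher aggregated score is labeled as the agent. The refinement loop described above would repeat the same scoring on the training super-segments, reassign each one to whichever model gives the higher log-likelihood, and rebuild both models from the new assignments.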
Landscapes
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Telephonic Communication Services (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Circuit For Audible Band Transducer (AREA)
- Computational Linguistics (AREA)
Abstract
Description
Claims (20)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/594,812 US10650826B2 (en) | 2012-11-21 | 2019-10-07 | Diarization using acoustic labeling |
US16/848,385 US11227603B2 (en) | 2012-11-21 | 2020-04-14 | System and method of video capture and search optimization for creating an acoustic voiceprint |
US17/577,238 US11776547B2 (en) | 2012-11-21 | 2022-01-17 | System and method of video capture and search optimization for creating an acoustic voiceprint |
US18/475,599 US20240021206A1 (en) | 2012-11-21 | 2023-09-27 | Diarization using acoustic labeling |
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261729067P | 2012-11-21 | 2012-11-21 | |
US201261729064P | 2012-11-21 | 2012-11-21 | |
US14/084,974 US10134400B2 (en) | 2012-11-21 | 2013-11-20 | Diarization using acoustic labeling |
US16/170,306 US10438592B2 (en) | 2012-11-21 | 2018-10-25 | Diarization using speech segment labeling |
US16/594,812 US10650826B2 (en) | 2012-11-21 | 2019-10-07 | Diarization using acoustic labeling |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/170,306 Continuation US10438592B2 (en) | 2012-11-21 | 2018-10-25 | Diarization using speech segment labeling |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/848,385 Continuation US11227603B2 (en) | 2012-11-21 | 2020-04-14 | System and method of video capture and search optimization for creating an acoustic voiceprint |
Publications (2)
Publication Number | Publication Date |
---|---|
US20200035246A1 (en) | 2020-01-30 |
US10650826B2 (en) | 2020-05-12 |
Family
ID=50728768
Family Applications (20)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/084,976 Active 2035-07-31 US10134401B2 (en) | 2012-11-21 | 2013-11-20 | Diarization using linguistic labeling |
US14/084,974 Active 2035-07-12 US10134400B2 (en) | 2012-11-21 | 2013-11-20 | Diarization using acoustic labeling |
US16/170,306 Active US10438592B2 (en) | 2012-11-21 | 2018-10-25 | Diarization using speech segment labeling |
US16/170,278 Active US10522152B2 (en) | 2012-11-21 | 2018-10-25 | Diarization using linguistic labeling |
US16/170,289 Active US10522153B2 (en) | 2012-11-21 | 2018-10-25 | Diarization using linguistic labeling |
US16/170,297 Active US10446156B2 (en) | 2012-11-21 | 2018-10-25 | Diarization using textual and audio speaker labeling |
US16/567,446 Active US10593332B2 (en) | 2012-11-21 | 2019-09-11 | Diarization using textual and audio speaker labeling |
US16/587,518 Active US10692500B2 (en) | 2012-11-21 | 2019-09-30 | Diarization using linguistic labeling to create and apply a linguistic model |
US16/594,764 Active US10692501B2 (en) | 2012-11-21 | 2019-10-07 | Diarization using acoustic labeling to create an acoustic voiceprint |
US16/594,812 Active US10650826B2 (en) | 2012-11-21 | 2019-10-07 | Diarization using acoustic labeling |
US16/703,274 Active US10950242B2 (en) | 2012-11-21 | 2019-12-04 | System and method of diarization and labeling of audio data |
US16/703,245 Active US10902856B2 (en) | 2012-11-21 | 2019-12-04 | System and method of diarization and labeling of audio data |
US16/703,099 Active 2034-08-18 US11322154B2 (en) | 2012-11-21 | 2019-12-04 | Diarization using linguistic labeling |
US16/703,030 Active 2034-09-04 US11367450B2 (en) | 2012-11-21 | 2019-12-04 | System and method of diarization and labeling of audio data |
US16/703,143 Active 2034-09-30 US11380333B2 (en) | 2012-11-21 | 2019-12-04 | System and method of diarization and labeling of audio data |
US16/703,206 Active US10950241B2 (en) | 2012-11-21 | 2019-12-04 | Diarization using linguistic labeling with segmented and clustered diarized textual transcripts |
US16/702,998 Active US10720164B2 (en) | 2012-11-21 | 2019-12-04 | System and method of diarization and labeling of audio data |
US16/848,385 Active 2033-12-20 US11227603B2 (en) | 2012-11-21 | 2020-04-14 | System and method of video capture and search optimization for creating an acoustic voiceprint |
US17/577,238 Active US11776547B2 (en) | 2012-11-21 | 2022-01-17 | System and method of video capture and search optimization for creating an acoustic voiceprint |
US18/475,599 Pending US20240021206A1 (en) | 2012-11-21 | 2023-09-27 | Diarization using acoustic labeling |
Country Status (1)
Country | Link |
---|---|
US (20) | US10134401B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220139399A1 (en) * | 2012-11-21 | 2022-05-05 | Verint Systems Ltd. | System and method of video capture and search optimization for creating an acoustic voiceprint |
Families Citing this family (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9571652B1 (en) | 2005-04-21 | 2017-02-14 | Verint Americas Inc. | Enhanced diarization systems, media and methods of use |
CN102760434A (en) * | 2012-07-09 | 2012-10-31 | 华为终端有限公司 | Method for updating voiceprint feature model and terminal |
US9401140B1 (en) * | 2012-08-22 | 2016-07-26 | Amazon Technologies, Inc. | Unsupervised acoustic model training |
US10346542B2 (en) | 2012-08-31 | 2019-07-09 | Verint Americas Inc. | Human-to-human conversation analysis |
US9368116B2 (en) | 2012-09-07 | 2016-06-14 | Verint Systems Ltd. | Speaker separation in diarization |
US9503579B2 (en) * | 2013-01-17 | 2016-11-22 | Verint Systems Ltd. | Identification of non-compliant interactions |
US9368109B2 (en) * | 2013-05-31 | 2016-06-14 | Nuance Communications, Inc. | Method and apparatus for automatic speaker-based speech clustering |
US9460722B2 (en) | 2013-07-17 | 2016-10-04 | Verint Systems Ltd. | Blind diarization of recorded calls with arbitrary number of speakers |
US9368106B2 (en) | 2013-07-30 | 2016-06-14 | Verint Systems Ltd. | System and method of automated evaluation of transcription quality |
US9984706B2 (en) | 2013-08-01 | 2018-05-29 | Verint Systems Ltd. | Voice activity detection using a soft decision mechanism |
KR20150093482A (en) * | 2014-02-07 | 2015-08-18 | 한국전자통신연구원 | System for Speaker Diarization based Multilateral Automatic Speech Translation System and its operating Method, and Apparatus supporting the same |
US9792899B2 (en) * | 2014-07-15 | 2017-10-17 | International Business Machines Corporation | Dataset shift compensation in machine learning |
CN105575391B (en) | 2014-10-10 | 2020-04-03 | 阿里巴巴集团控股有限公司 | Voiceprint information management method and device and identity authentication method and system |
US9875742B2 (en) | 2015-01-26 | 2018-01-23 | Verint Systems Ltd. | Word-level blind diarization of recorded calls with arbitrary number of speakers |
US9728191B2 (en) * | 2015-08-27 | 2017-08-08 | Nuance Communications, Inc. | Speaker verification methods and apparatus |
CN105719659A (en) * | 2016-02-03 | 2016-06-29 | 努比亚技术有限公司 | Recording file separation method and device based on voiceprint identification |
US10311855B2 (en) * | 2016-03-29 | 2019-06-04 | Speech Morphing Systems, Inc. | Method and apparatus for designating a soundalike voice to a target voice from a database of voices |
US20240055014A1 (en) * | 2016-07-16 | 2024-02-15 | Ron Zass | Visualizing Auditory Content for Accessibility |
US10403268B2 (en) * | 2016-09-08 | 2019-09-03 | Intel IP Corporation | Method and system of automatic speech recognition using posterior confidence scores |
US10432789B2 (en) * | 2017-02-09 | 2019-10-01 | Verint Systems Ltd. | Classification of transcripts by sentiment |
US10642889B2 (en) | 2017-02-20 | 2020-05-05 | Gong I.O Ltd. | Unsupervised automated topic detection, segmentation and labeling of conversations |
WO2018155026A1 (en) * | 2017-02-27 | 2018-08-30 | ソニー株式会社 | Information processing device, information processing method, and program |
US10460727B2 (en) * | 2017-03-03 | 2019-10-29 | Microsoft Technology Licensing, Llc | Multi-talker speech recognizer |
US10832587B2 (en) * | 2017-03-15 | 2020-11-10 | International Business Machines Corporation | Communication tone training |
CN108630193B (en) * | 2017-03-21 | 2020-10-02 | 北京嘀嘀无限科技发展有限公司 | Voice recognition method and device |
KR102304701B1 (en) * | 2017-03-28 | 2021-09-24 | 삼성전자주식회사 | Method and apparatus for providng response to user's voice input |
US20190051376A1 (en) * | 2017-08-10 | 2019-02-14 | Nuance Communications, Inc. | Automated clinical documentation system and method |
US11316865B2 (en) | 2017-08-10 | 2022-04-26 | Nuance Communications, Inc. | Ambient cooperative intelligence system and method |
US10403288B2 (en) * | 2017-10-17 | 2019-09-03 | Google Llc | Speaker diarization |
US11120802B2 (en) | 2017-11-21 | 2021-09-14 | International Business Machines Corporation | Diarization driven by the ASR based segmentation |
US10468031B2 (en) | 2017-11-21 | 2019-11-05 | International Business Machines Corporation | Diarization driven by meta-information identified in discussion content |
CN107945815B (en) * | 2017-11-27 | 2021-09-07 | 歌尔科技有限公司 | Voice signal noise reduction method and device |
US11848010B1 (en) * | 2018-02-09 | 2023-12-19 | Voicebase, Inc. | Systems and methods for creating dynamic features for correlation engines |
WO2019173333A1 (en) | 2018-03-05 | 2019-09-12 | Nuance Communications, Inc. | Automated clinical documentation system and method |
WO2019173340A1 (en) | 2018-03-05 | 2019-09-12 | Nuance Communications, Inc. | System and method for review of automated clinical documentation |
US11250382B2 (en) | 2018-03-05 | 2022-02-15 | Nuance Communications, Inc. | Automated clinical documentation system and method |
US11276407B2 (en) | 2018-04-17 | 2022-03-15 | Gong.Io Ltd. | Metadata-based diarization of teleconferences |
US11094316B2 (en) * | 2018-05-04 | 2021-08-17 | Qualcomm Incorporated | Audio analytics for natural language processing |
CN108900725B (en) * | 2018-05-29 | 2020-05-29 | 平安科技(深圳)有限公司 | Voiceprint recognition method and device, terminal equipment and storage medium |
US11011162B2 (en) * | 2018-06-01 | 2021-05-18 | Soundhound, Inc. | Custom acoustic models |
US11822888B2 (en) | 2018-10-05 | 2023-11-21 | Verint Americas Inc. | Identifying relational segments |
JP7218547B2 (en) * | 2018-11-16 | 2023-02-07 | 富士フイルムビジネスイノベーション株式会社 | Information processing device and information processing program |
US11031017B2 (en) | 2019-01-08 | 2021-06-08 | Google Llc | Fully supervised speaker diarization |
GB201906367D0 (en) * | 2019-02-28 | 2019-06-19 | Cirrus Logic Int Semiconductor Ltd | Speaker verification |
US11062706B2 (en) * | 2019-04-29 | 2021-07-13 | Microsoft Technology Licensing, Llc | System and method for speaker role determination and scrubbing identifying information |
US11182504B2 (en) * | 2019-04-29 | 2021-11-23 | Microsoft Technology Licensing, Llc | System and method for speaker role determination and scrubbing identifying information |
US11227679B2 (en) | 2019-06-14 | 2022-01-18 | Nuance Communications, Inc. | Ambient clinical intelligence system and method |
US11216480B2 (en) | 2019-06-14 | 2022-01-04 | Nuance Communications, Inc. | System and method for querying data points from graph data structures |
US11043207B2 (en) | 2019-06-14 | 2021-06-22 | Nuance Communications, Inc. | System and method for array data simulation and customized acoustic modeling for ambient ASR |
US11531807B2 (en) | 2019-06-28 | 2022-12-20 | Nuance Communications, Inc. | System and method for customized text macros |
KR102689034B1 (en) * | 2019-07-01 | 2024-07-25 | 구글 엘엘씨 | Adaptive separation model and user interface |
CN110570869B (en) * | 2019-08-09 | 2022-01-14 | 科大讯飞股份有限公司 | Voiceprint recognition method, device, equipment and storage medium |
US11670408B2 (en) | 2019-09-30 | 2023-06-06 | Nuance Communications, Inc. | System and method for review of automated clinical documentation |
US11076043B2 (en) * | 2019-10-04 | 2021-07-27 | Red Box Recorders Limited | Systems and methods of voiceprint generation and use in enforcing compliance policies |
US11238884B2 (en) | 2019-10-04 | 2022-02-01 | Red Box Recorders Limited | Systems and methods for recording quality driven communication management |
US11238869B2 (en) * | 2019-10-04 | 2022-02-01 | Red Box Recorders Limited | System and method for reconstructing metadata from audio outputs |
US11916913B2 (en) | 2019-11-22 | 2024-02-27 | International Business Machines Corporation | Secure audio transcription |
US11664044B2 (en) | 2019-11-25 | 2023-05-30 | Qualcomm Incorporated | Sound event detection learning |
CN111128223B (en) * | 2019-12-30 | 2022-08-05 | 科大讯飞股份有限公司 | Text information-based auxiliary speaker separation method and related device |
CN111243595B (en) * | 2019-12-31 | 2022-12-27 | 京东科技控股股份有限公司 | Information processing method and device |
US11646032B2 (en) * | 2020-02-27 | 2023-05-09 | Medixin Inc. | Systems and methods for audio processing |
US11232798B2 (en) | 2020-05-21 | 2022-01-25 | Bank Of America Corporation | Audio analysis system for automatic language proficiency assessment |
EP3951775A4 (en) * | 2020-06-16 | 2022-08-10 | Minds Lab Inc. | Method for generating speaker-marked text |
CN111805558B (en) * | 2020-08-03 | 2021-10-08 | 深圳作为科技有限公司 | Self-learning type elderly nursing robot system with memory recognition function |
CN111968657B (en) * | 2020-08-17 | 2022-08-16 | 北京字节跳动网络技术有限公司 | Voice processing method and device, electronic equipment and computer readable medium |
US11495216B2 (en) * | 2020-09-09 | 2022-11-08 | International Business Machines Corporation | Speech recognition using data analysis and dilation of interlaced audio input |
US11538464B2 (en) | 2020-09-09 | 2022-12-27 | International Business Machines Corporation | Speech recognition using data analysis and dilation of speech content from separated audio input |
CN112420057B (en) * | 2020-10-26 | 2022-05-03 | 四川长虹电器股份有限公司 | Voiceprint recognition method, device and equipment based on distance coding and storage medium |
US11222103B1 (en) | 2020-10-29 | 2022-01-11 | Nuance Communications, Inc. | Ambient cooperative intelligence system and method |
US11522994B2 (en) | 2020-11-23 | 2022-12-06 | Bank Of America Corporation | Voice analysis platform for voiceprint tracking and anomaly detection |
US11410677B2 (en) | 2020-11-24 | 2022-08-09 | Qualcomm Incorporated | Adaptive sound event classification |
US11521623B2 (en) | 2021-01-11 | 2022-12-06 | Bank Of America Corporation | System and method for single-speaker identification in a multi-speaker environment on a low-frequency audio recording |
CN112712809B (en) * | 2021-03-29 | 2021-06-18 | 北京远鉴信息技术有限公司 | Voice detection method and device, electronic equipment and storage medium |
CN113793592B (en) * | 2021-10-29 | 2024-07-16 | 浙江核新同花顺网络信息股份有限公司 | Method and system for distinguishing speakers |
US12087307B2 (en) | 2021-11-30 | 2024-09-10 | Samsung Electronics Co., Ltd. | Method and apparatus for performing speaker diarization on mixed-bandwidth speech signals |
US12062375B2 (en) * | 2021-12-08 | 2024-08-13 | The Mitre Corporation | Systems and methods for separating and identifying audio in an audio file using machine learning |
US11978457B2 (en) * | 2022-02-15 | 2024-05-07 | Gong.Io Ltd | Method for uniquely identifying participants in a recorded streaming teleconference |
US20230394226A1 (en) * | 2022-06-01 | 2023-12-07 | Gong.Io Ltd | Method for summarization and ranking of text of diarized conversations |
CN118197324B (en) * | 2024-05-16 | 2024-08-06 | 江西广播电视网络传媒有限公司 | Dialogue corpus extraction method, dialogue corpus extraction system, dialogue corpus extraction computer and dialogue corpus storage medium |
Citations (114)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4653097A (en) | 1982-01-29 | 1987-03-24 | Tokyo Shibaura Denki Kabushiki Kaisha | Individual verification apparatus |
US4864566A (en) | 1986-09-26 | 1989-09-05 | Cycomm Corporation | Precise multiplexed transmission and reception of analog and digital data through a narrow-band channel |
US5027407A (en) | 1987-02-23 | 1991-06-25 | Kabushiki Kaisha Toshiba | Pattern recognition apparatus using a plurality of candidates |
US5222147A (en) | 1989-04-13 | 1993-06-22 | Kabushiki Kaisha Toshiba | Speech recognition LSI system including recording/reproduction device |
EP0598469A2 (en) | 1992-10-27 | 1994-05-25 | Daniel P. Dunlevy | Interactive credit card fraud control process |
US5638430A (en) | 1993-10-15 | 1997-06-10 | Linkusa Corporation | Call validation system |
US5805674A (en) | 1995-01-26 | 1998-09-08 | Anderson, Jr.; Victor C. | Security arrangement and method for controlling access to a protected system |
US5907602A (en) | 1995-03-30 | 1999-05-25 | British Telecommunications Public Limited Company | Detecting possible fraudulent communication usage |
US5946654A (en) | 1997-02-21 | 1999-08-31 | Dragon Systems, Inc. | Speaker identification using unsupervised speech models |
US5963908A (en) | 1996-12-23 | 1999-10-05 | Intel Corporation | Secure logon to notebook or desktop computers |
US5999525A (en) | 1996-11-18 | 1999-12-07 | Mci Communications Corporation | Method for video telephony over a hybrid network |
US6044382A (en) | 1995-05-19 | 2000-03-28 | Cyber Fone Technologies, Inc. | Data transaction assembly server |
US6145083A (en) | 1998-04-23 | 2000-11-07 | Siemens Information And Communication Networks, Inc. | Methods and system for providing data and telephony security |
WO2000077772A2 (en) | 1999-06-14 | 2000-12-21 | Cyber Technology (Iom) Liminted | Speech and voice signal preprocessing |
US6266640B1 (en) | 1996-08-06 | 2001-07-24 | Dialogic Corporation | Data network with voice verification means |
US6275806B1 (en) | 1999-08-31 | 2001-08-14 | Andersen Consulting, Llp | System method and article of manufacture for detecting emotion in voice signals by utilizing statistics for voice signal parameters |
US20010026632A1 (en) | 2000-03-24 | 2001-10-04 | Seiichiro Tamai | Apparatus for identity verification, a system for identity verification, a card for identity verification and a method for identity verification, based on identification by biometrics |
US20020022474A1 (en) | 1998-12-23 | 2002-02-21 | Vesa Blom | Detecting and preventing fraudulent use in a telecommunications network |
US20020099649A1 (en) | 2000-04-06 | 2002-07-25 | Lee Walter W. | Identification and management of fraudulent credit/debit card purchases at merchant ecommerce sites |
US6427137B2 (en) | 1999-08-31 | 2002-07-30 | Accenture Llp | System, method and article of manufacture for a voice analysis system that detects nervousness for preventing fraud |
US6480825B1 (en) | 1997-01-31 | 2002-11-12 | T-Netix, Inc. | System and method for detecting a recorded voice |
US20030009333A1 (en) | 1996-11-22 | 2003-01-09 | T-Netix, Inc. | Voice print system and method |
US6510415B1 (en) | 1999-04-15 | 2003-01-21 | Sentry Com Ltd. | Voice authentication method and system utilizing same |
US20030050780A1 (en) * | 2001-05-24 | 2003-03-13 | Luca Rigazio | Speaker and environment adaptation based on linear separation of variability sources |
US20030050816A1 (en) | 2001-08-09 | 2003-03-13 | Givens George R. | Systems and methods for network-based employment decisioning |
US20030097593A1 (en) | 2001-11-19 | 2003-05-22 | Fujitsu Limited | User terminal authentication program |
US6587552B1 (en) | 2001-02-15 | 2003-07-01 | Worldcom, Inc. | Fraud library |
US6597775B2 (en) | 2000-09-29 | 2003-07-22 | Fair Isaac Corporation | Self-learning real-time prioritization of telecommunication fraud control actions |
US20030147516A1 (en) | 2001-09-25 | 2003-08-07 | Justin Lawyer | Self-learning real-time prioritization of telecommunication fraud control actions |
US20030208684A1 (en) | 2000-03-08 | 2003-11-06 | Camacho Luz Maria | Method and apparatus for reducing on-line fraud using personal digital identification |
US20040029087A1 (en) | 2002-08-08 | 2004-02-12 | Rodney White | System and method for training and managing gaming personnel |
US20040111305A1 (en) | 1995-04-21 | 2004-06-10 | Worldcom, Inc. | System and method for detecting and managing fraud |
JP2004193942A (en) | 2002-12-11 | 2004-07-08 | Nippon Hoso Kyokai <Nhk> | Method, apparatus and program for transmitting content and method, apparatus and program for receiving content |
US20040131160A1 (en) | 2003-01-02 | 2004-07-08 | Aris Mardirossian | System and method for monitoring individuals |
US20040143635A1 (en) | 2003-01-15 | 2004-07-22 | Nick Galea | Regulating receipt of electronic mail |
US20040167964A1 (en) | 2003-02-25 | 2004-08-26 | Rounthwaite Robert L. | Adaptive junk message filtering system |
US20040203575A1 (en) | 2003-01-13 | 2004-10-14 | Chin Mary W. | Method of recognizing fraudulent wireless emergency service calls |
US20040240631A1 (en) | 2003-05-30 | 2004-12-02 | Vicki Broman | Speaker recognition in a multi-speaker environment and comparison of several voice prints to many |
US20050010411A1 (en) * | 2003-07-09 | 2005-01-13 | Luca Rigazio | Speech data mining for call center management |
US20050043014A1 (en) | 2002-08-08 | 2005-02-24 | Hodge Stephen L. | Telecommunication call management and monitoring system with voiceprint verification |
US20050076084A1 (en) | 2003-10-03 | 2005-04-07 | Corvigo | Dynamic message filtering |
US20050125339A1 (en) | 2003-12-09 | 2005-06-09 | Tidwell Lisa C. | Systems and methods for assessing the risk of a financial transaction using biometric information |
US20050125226A1 (en) | 2003-10-29 | 2005-06-09 | Paul Magee | Voice recognition system and method |
US20050135595A1 (en) * | 2003-12-18 | 2005-06-23 | Sbc Knowledge Ventures, L.P. | Intelligently routing customer communications |
US20050185779A1 (en) | 2002-07-31 | 2005-08-25 | Toms Alvin D. | System and method for the detection and termination of fraudulent services |
US20060013372A1 (en) | 2004-07-15 | 2006-01-19 | Tekelec | Methods, systems, and computer program products for automatically populating signaling-based access control database |
JP2006038955A (en) | 2004-07-22 | 2006-02-09 | Docomo Engineering Tohoku Inc | Voiceprint recognition system |
WO2006013555A2 (en) | 2004-08-04 | 2006-02-09 | Cellmax Systems Ltd. | Method and system for verifying and enabling user access based on voice parameters |
US7006605B1 (en) | 1996-06-28 | 2006-02-28 | Ochopee Big Cypress Llc | Authenticating a caller before providing the caller with access to one or more secured resources |
US7039951B1 (en) | 2000-06-06 | 2006-05-02 | International Business Machines Corporation | System and method for confidence based incremental access authentication |
US20060106605A1 (en) | 2004-11-12 | 2006-05-18 | Saunders Joseph M | Biometric record management |
US20060149558A1 (en) * | 2001-07-17 | 2006-07-06 | Jonathan Kahn | Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile |
US20060161435A1 (en) | 2004-12-07 | 2006-07-20 | Farsheed Atef | System and method for identity verification and management |
US7106843B1 (en) | 1994-04-19 | 2006-09-12 | T-Netix, Inc. | Computer-based method and apparatus for controlling, monitoring, recording and reporting telephone access |
US20060212925A1 (en) | 2005-03-02 | 2006-09-21 | Markmonitor, Inc. | Implementing trust policies |
US20060212407A1 (en) | 2005-03-17 | 2006-09-21 | Lyon Dennis B | User authentication and secure transaction system |
US20060248019A1 (en) | 2005-04-21 | 2006-11-02 | Anthony Rajakumar | Method and system to detect fraud using voice data |
US20060251226A1 (en) | 1993-10-15 | 2006-11-09 | Hogan Steven J | Call-processing system and method |
US20060282660A1 (en) | 2005-04-29 | 2006-12-14 | Varghese Thomas E | System and method for fraud monitoring, detection, and tiered user authentication |
US20060285665A1 (en) | 2005-05-27 | 2006-12-21 | Nice Systems Ltd. | Method and apparatus for fraud detection |
US20060293891A1 (en) | 2005-06-22 | 2006-12-28 | Jan Pathuel | Biometric control systems and associated methods of use |
US20060289622A1 (en) | 2005-06-24 | 2006-12-28 | American Express Travel Related Services Company, Inc. | Word recognition system and method for customer and employee assessment |
US20070041517A1 (en) | 2005-06-30 | 2007-02-22 | Pika Technologies Inc. | Call transfer detection method using voice identification techniques |
US20070074021A1 (en) | 2005-09-23 | 2007-03-29 | Smithies Christopher P K | System and method for verification of personal identity |
US20070071206A1 (en) | 2005-06-24 | 2007-03-29 | Gainsboro Jay L | Multi-party conversation analyzer & logger |
US7212613B2 (en) | 2003-09-18 | 2007-05-01 | International Business Machines Corporation | System and method for telephonic voice authentication |
US20070100608A1 (en) | 2000-11-21 | 2007-05-03 | The Regents Of The University Of California | Speaker verification system using acoustic data and non-acoustic data |
US20070244702A1 (en) | 2006-04-12 | 2007-10-18 | Jonathan Kahn | Session File Modification with Annotation Using Speech Recognition or Text to Speech |
US20070250318A1 (en) | 2006-04-25 | 2007-10-25 | Nice Systems Ltd. | Automatic speech analysis |
US20070282605A1 (en) | 2005-04-21 | 2007-12-06 | Anthony Rajakumar | Method and System for Screening Using Voice Data and Metadata |
US20070280436A1 (en) | 2006-04-14 | 2007-12-06 | Anthony Rajakumar | Method and System to Seed a Voice Database |
US20070288242A1 (en) * | 2006-06-12 | 2007-12-13 | Lockheed Martin Corporation | Speech recognition and control system, program product, and related methods |
US7403922B1 (en) | 1997-07-28 | 2008-07-22 | Cybersource Corporation | Method and apparatus for evaluating fraud risk in an electronic commerce transaction |
US20080181417A1 (en) * | 2006-01-25 | 2008-07-31 | Nice Systems Ltd. | Method and Apparatus For Segmentation of Audio Interactions |
US20080195387A1 (en) | 2006-10-19 | 2008-08-14 | Nice Systems Ltd. | Method and apparatus for large population speaker identification in telephone interactions |
US20080222734A1 (en) | 2000-11-13 | 2008-09-11 | Redlich Ron M | Security System with Extraction, Reconstruction and Secure Recovery and Storage of Data |
US20090119106A1 (en) | 2005-04-21 | 2009-05-07 | Anthony Rajakumar | Building whitelists comprising voiceprints not associated with fraud and screening calls using a combination of a whitelist and blacklist |
US7539290B2 (en) | 2002-11-08 | 2009-05-26 | Verizon Services Corp. | Facilitation of a conference call |
US20090247131A1 (en) | 2005-10-31 | 2009-10-01 | Champion Laurenn L | Systems and Methods for Restricting The Use of Stolen Devices on a Wireless Network |
US20090254971A1 (en) | 1999-10-27 | 2009-10-08 | Pinpoint, Incorporated | Secure data interchange |
US20090319269A1 (en) | 2008-06-24 | 2009-12-24 | Hagai Aronowitz | Method of Trainable Speaker Diarization |
US7657431B2 (en) | 2005-02-18 | 2010-02-02 | Fujitsu Limited | Voice authentication system |
US7660715B1 (en) * | 2004-01-12 | 2010-02-09 | Avaya Inc. | Transparent monitoring and intervention to improve automatic adaptation of speech models |
US7668769B2 (en) | 2005-10-04 | 2010-02-23 | Basepoint Analytics, LLC | System and method of detecting fraud |
US7693965B2 (en) | 1993-11-18 | 2010-04-06 | Digimarc Corporation | Analyzing audio, including analyzing streaming audio signals |
US20100138282A1 (en) * | 2006-02-22 | 2010-06-03 | Kannan Pallipuram V | Mining interactions to manage customer experience throughout a customer service lifecycle |
US20100228656A1 (en) | 2009-03-09 | 2010-09-09 | Nice Systems Ltd. | Apparatus and method for fraud prevention |
US20100305960A1 (en) | 2005-04-21 | 2010-12-02 | Victrio | Method and system for enrolling a voiceprint in a fraudster database |
US20100305946A1 (en) | 2005-04-21 | 2010-12-02 | Victrio | Speaker verification-based fraud system for combined automated risk score with agent review and associated user interface |
US20100303211A1 (en) | 2005-04-21 | 2010-12-02 | Victrio | Method and system for generating a fraud risk score using telephony channel based audio and non-audio data |
US20100332287A1 (en) * | 2009-06-24 | 2010-12-30 | International Business Machines Corporation | System and method for real-time prediction of customer satisfaction |
US20110004472A1 (en) | 2006-03-31 | 2011-01-06 | Igor Zlokarnik | Speech Recognition Using Channel Verification |
US20110026689A1 (en) | 2009-07-30 | 2011-02-03 | Metz Brent D | Telephone call inbox |
US20110119060A1 (en) | 2009-11-15 | 2011-05-19 | International Business Machines Corporation | Method and system for speaker diarization |
US20110255676A1 (en) | 2000-05-22 | 2011-10-20 | Verizon Business Global Llc | Fraud detection based on call attempt velocity on terminating number |
US20110282778A1 (en) | 2001-05-30 | 2011-11-17 | Wright William A | Method and apparatus for evaluating fraud risk in an electronic commerce transaction |
US20110282661A1 (en) | 2010-05-11 | 2011-11-17 | Nice Systems Ltd. | Method for speaker source classification |
US8112278B2 (en) | 2004-12-13 | 2012-02-07 | Securicom (Nsw) Pty Ltd | Enhancing the response of biometric access systems |
US20120072453A1 (en) | 2005-04-21 | 2012-03-22 | Lisa Guerra | Systems, methods, and media for determining fraud patterns and creating fraud behavioral models |
US20120130771A1 (en) * | 2010-11-18 | 2012-05-24 | Kannan Pallipuram V | Chat Categorization and Agent Performance Modeling |
US20120253805A1 (en) | 2005-04-21 | 2012-10-04 | Anthony Rajakumar | Systems, methods, and media for determining fraud risk from audio signals |
US20120263285A1 (en) | 2005-04-21 | 2012-10-18 | Anthony Rajakumar | Systems, methods, and media for disambiguating call data to determine fraud |
US20120284026A1 (en) | 2011-05-06 | 2012-11-08 | Nexidia Inc. | Speaker verification system |
US20130163737A1 (en) | 2011-12-22 | 2013-06-27 | Cox Communications, Inc. | Systems and Methods of Detecting Communications Fraud |
US20130197912A1 (en) | 2012-01-31 | 2013-08-01 | Fujitsu Limited | Specific call detecting device and specific call detecting method |
US8537978B2 (en) | 2008-10-06 | 2013-09-17 | International Business Machines Corporation | Method and system for using conversational biometrics and speaker identification/verification to filter voice streams |
US20130300939A1 (en) | 2012-05-11 | 2013-11-14 | Cisco Technology, Inc. | System and method for joint speaker and scene recognition in a video/audio processing environment |
US20140067394A1 (en) | 2012-08-28 | 2014-03-06 | King Abdulaziz City For Science And Technology | System and method for decoding speech |
US20140142940A1 (en) | 2012-11-21 | 2014-05-22 | Verint Systems Ltd. | Diarization Using Linguistic Labeling |
US20150055763A1 (en) | 2005-04-21 | 2015-02-26 | Verint Americas Inc. | Systems, methods, and media for determining fraud patterns and creating fraud behavioral models |
US9001976B2 (en) * | 2012-05-03 | 2015-04-07 | Nexidia, Inc. | Speaker adaptation |
US20160364606A1 (en) * | 2014-10-27 | 2016-12-15 | Mattersight Corporation | Predictive and responsive video analytics system and methods |
US20160379032A1 (en) * | 2014-01-14 | 2016-12-29 | Focaltech Electronics, Ltd. | Electric field-type fingerprint identification apparatus and state control method and prosthesis identification method thereof |
US20160379082A1 (en) | 2009-10-28 | 2016-12-29 | Digimarc Corporation | Intuitive computing methods and systems |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7184539B2 (en) * | 2003-04-29 | 2007-02-27 | International Business Machines Corporation | Automated call center transcription services |
WO2007117626A2 (en) * | 2006-04-05 | 2007-10-18 | Yap, Inc. | Hosted voice recognition system for wireless devices |
US8611523B2 (en) * | 2007-09-28 | 2013-12-17 | Mattersight Corporation | Methods and systems for determining segments of a telephonic communication between a customer and a contact center to classify each segment of the communication, assess negotiations, and automate setup time calculation |
US8259910B2 (en) * | 2008-03-14 | 2012-09-04 | Voicecloud | Method and system for transcribing audio messages |
US8417233B2 (en) * | 2011-06-13 | 2013-04-09 | Mercury Mobile, Llc | Automated notation techniques implemented via mobile devices and/or computer networks |
JP6495098B2 (en) | 2015-05-21 | 2019-04-03 | 中央精機株式会社 | Thermoacoustic power generation system |
Date | Country | Application number | Publication number | Status |
---|---|---|---|---|
2013-11-20 | US | US14/084,976 | US10134401B2 | Active |
2013-11-20 | US | US14/084,974 | US10134400B2 | Active |
2018-10-25 | US | US16/170,306 | US10438592B2 | Active |
2018-10-25 | US | US16/170,278 | US10522152B2 | Active |
2018-10-25 | US | US16/170,289 | US10522153B2 | Active |
2018-10-25 | US | US16/170,297 | US10446156B2 | Active |
2019-09-11 | US | US16/567,446 | US10593332B2 | Active |
2019-09-30 | US | US16/587,518 | US10692500B2 | Active |
2019-10-07 | US | US16/594,764 | US10692501B2 | Active |
2019-10-07 | US | US16/594,812 | US10650826B2 | Active |
2019-12-04 | US | US16/703,274 | US10950242B2 | Active |
2019-12-04 | US | US16/703,245 | US10902856B2 | Active |
2019-12-04 | US | US16/703,099 | US11322154B2 | Active |
2019-12-04 | US | US16/703,030 | US11367450B2 | Active |
2019-12-04 | US | US16/703,143 | US11380333B2 | Active |
2019-12-04 | US | US16/703,206 | US10950241B2 | Active |
2019-12-04 | US | US16/702,998 | US10720164B2 | Active |
2020-04-14 | US | US16/848,385 | US11227603B2 | Active |
2022-01-17 | US | US17/577,238 | US11776547B2 | Active |
2023-09-27 | US | US18/475,599 | US20240021206A1 | Pending |
Patent Citations (141)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4653097A (en) | 1982-01-29 | 1987-03-24 | Tokyo Shibaura Denki Kabushiki Kaisha | Individual verification apparatus |
US4864566A (en) | 1986-09-26 | 1989-09-05 | Cycomm Corporation | Precise multiplexed transmission and reception of analog and digital data through a narrow-band channel |
US5027407A (en) | 1987-02-23 | 1991-06-25 | Kabushiki Kaisha Toshiba | Pattern recognition apparatus using a plurality of candidates |
US5222147A (en) | 1989-04-13 | 1993-06-22 | Kabushiki Kaisha Toshiba | Speech recognition LSI system including recording/reproduction device |
EP0598469A2 (en) | 1992-10-27 | 1994-05-25 | Daniel P. Dunlevy | Interactive credit card fraud control process |
US5638430A (en) | 1993-10-15 | 1997-06-10 | Linkusa Corporation | Call validation system |
US20060251226A1 (en) | 1993-10-15 | 2006-11-09 | Hogan Steven J | Call-processing system and method |
US7693965B2 (en) | 1993-11-18 | 2010-04-06 | Digimarc Corporation | Analyzing audio, including analyzing streaming audio signals |
US7106843B1 (en) | 1994-04-19 | 2006-09-12 | T-Netix, Inc. | Computer-based method and apparatus for controlling, monitoring, recording and reporting telephone access |
US5805674A (en) | 1995-01-26 | 1998-09-08 | Anderson, Jr.; Victor C. | Security arrangement and method for controlling access to a protected system |
US5907602A (en) | 1995-03-30 | 1999-05-25 | British Telecommunications Public Limited Company | Detecting possible fraudulent communication usage |
US20040111305A1 (en) | 1995-04-21 | 2004-06-10 | Worldcom, Inc. | System and method for detecting and managing fraud |
US6044382A (en) | 1995-05-19 | 2000-03-28 | Cyber Fone Technologies, Inc. | Data transaction assembly server |
US7006605B1 (en) | 1996-06-28 | 2006-02-28 | Ochopee Big Cypress Llc | Authenticating a caller before providing the caller with access to one or more secured resources |
US20090147939A1 (en) | 1996-06-28 | 2009-06-11 | Morganstein Sanford J | Authenticating An Individual Using An Utterance Representation and Ambiguity Resolution Information |
US6266640B1 (en) | 1996-08-06 | 2001-07-24 | Dialogic Corporation | Data network with voice verification means |
US5999525A (en) | 1996-11-18 | 1999-12-07 | Mci Communications Corporation | Method for video telephony over a hybrid network |
US20030009333A1 (en) | 1996-11-22 | 2003-01-09 | T-Netix, Inc. | Voice print system and method |
US5963908A (en) | 1996-12-23 | 1999-10-05 | Intel Corporation | Secure logon to notebook or desktop computers |
US6480825B1 (en) | 1997-01-31 | 2002-11-12 | T-Netix, Inc. | System and method for detecting a recorded voice |
US5946654A (en) | 1997-02-21 | 1999-08-31 | Dragon Systems, Inc. | Speaker identification using unsupervised speech models |
US7403922B1 (en) | 1997-07-28 | 2008-07-22 | Cybersource Corporation | Method and apparatus for evaluating fraud risk in an electronic commerce transaction |
US6145083A (en) | 1998-04-23 | 2000-11-07 | Siemens Information And Communication Networks, Inc. | Methods and system for providing data and telephony security |
US20020022474A1 (en) | 1998-12-23 | 2002-02-21 | Vesa Blom | Detecting and preventing fraudulent use in a telecommunications network |
US6510415B1 (en) | 1999-04-15 | 2003-01-21 | Sentry Com Ltd. | Voice authentication method and system utilizing same |
WO2000077772A2 (en) | 1999-06-14 | 2000-12-21 | Cyber Technology (Iom) Limited | Speech and voice signal preprocessing |
US6427137B2 (en) | 1999-08-31 | 2002-07-30 | Accenture Llp | System, method and article of manufacture for a voice analysis system that detects nervousness for preventing fraud |
US6275806B1 (en) | 1999-08-31 | 2001-08-14 | Andersen Consulting, Llp | System method and article of manufacture for detecting emotion in voice signals by utilizing statistics for voice signal parameters |
US20090254971A1 (en) | 1999-10-27 | 2009-10-08 | Pinpoint, Incorporated | Secure data interchange |
US20030208684A1 (en) | 2000-03-08 | 2003-11-06 | Camacho Luz Maria | Method and apparatus for reducing on-line fraud using personal digital identification |
US20010026632A1 (en) | 2000-03-24 | 2001-10-04 | Seiichiro Tamai | Apparatus for identity verification, a system for identity verification, a card for identity verification and a method for identity verification, based on identification by biometrics |
US20020099649A1 (en) | 2000-04-06 | 2002-07-25 | Lee Walter W. | Identification and management of fraudulent credit/debit card purchases at merchant ecommerce sites |
US20110255676A1 (en) | 2000-05-22 | 2011-10-20 | Verizon Business Global Llc | Fraud detection based on call attempt velocity on terminating number |
US7039951B1 (en) | 2000-06-06 | 2006-05-02 | International Business Machines Corporation | System and method for confidence based incremental access authentication |
US20070124246A1 (en) | 2000-09-29 | 2007-05-31 | Justin Lawyer | Self-Learning Real-Time Priorization of Fraud Control Actions |
US7158622B2 (en) | 2000-09-29 | 2007-01-02 | Fair Isaac Corporation | Self-learning real-time prioritization of telecommunication fraud control actions |
US6597775B2 (en) | 2000-09-29 | 2003-07-22 | Fair Isaac Corporation | Self-learning real-time prioritization of telecommunication fraud control actions |
US20080222734A1 (en) | 2000-11-13 | 2008-09-11 | Redlich Ron M | Security System with Extraction, Reconstruction and Secure Recovery and Storage of Data |
US20070100608A1 (en) | 2000-11-21 | 2007-05-03 | The Regents Of The University Of California | Speaker verification system using acoustic data and non-acoustic data |
US6587552B1 (en) | 2001-02-15 | 2003-07-01 | Worldcom, Inc. | Fraud library |
US6915259B2 (en) * | 2001-05-24 | 2005-07-05 | Matsushita Electric Industrial Co., Ltd. | Speaker and environment adaptation based on linear separation of variability sources |
US20030050780A1 (en) * | 2001-05-24 | 2003-03-13 | Luca Rigazio | Speaker and environment adaptation based on linear separation of variability sources |
US20110282778A1 (en) | 2001-05-30 | 2011-11-17 | Wright William A | Method and apparatus for evaluating fraud risk in an electronic commerce transaction |
US20060149558A1 (en) * | 2001-07-17 | 2006-07-06 | Jonathan Kahn | Synchronized pattern recognition source data processed by manual or automatic means for creation of shared speaker-dependent speech user profile |
US20030050816A1 (en) | 2001-08-09 | 2003-03-13 | Givens George R. | Systems and methods for network-based employment decisioning |
US20030147516A1 (en) | 2001-09-25 | 2003-08-07 | Justin Lawyer | Self-learning real-time prioritization of telecommunication fraud control actions |
US20030097593A1 (en) | 2001-11-19 | 2003-05-22 | Fujitsu Limited | User terminal authentication program |
US20050185779A1 (en) | 2002-07-31 | 2005-08-25 | Toms Alvin D. | System and method for the detection and termination of fraudulent services |
US20050043014A1 (en) | 2002-08-08 | 2005-02-24 | Hodge Stephen L. | Telecommunication call management and monitoring system with voiceprint verification |
US20040029087A1 (en) | 2002-08-08 | 2004-02-12 | Rodney White | System and method for training and managing gaming personnel |
US20090046841A1 (en) | 2002-08-08 | 2009-02-19 | Hodge Stephen L | Telecommunication call management and monitoring system with voiceprint verification |
US7054811B2 (en) | 2002-11-06 | 2006-05-30 | Cellmax Systems Ltd. | Method and system for verifying and enabling user access based on voice parameters |
US7539290B2 (en) | 2002-11-08 | 2009-05-26 | Verizon Services Corp. | Facilitation of a conference call |
JP2004193942A (en) | 2002-12-11 | 2004-07-08 | Nippon Hoso Kyokai <Nhk> | Method, apparatus and program for transmitting content and method, apparatus and program for receiving content |
US20040131160A1 (en) | 2003-01-02 | 2004-07-08 | Aris Mardirossian | System and method for monitoring individuals |
US20040203575A1 (en) | 2003-01-13 | 2004-10-14 | Chin Mary W. | Method of recognizing fraudulent wireless emergency service calls |
US20040143635A1 (en) | 2003-01-15 | 2004-07-22 | Nick Galea | Regulating receipt of electronic mail |
US20040167964A1 (en) | 2003-02-25 | 2004-08-26 | Rounthwaite Robert L. | Adaptive junk message filtering system |
WO2004079501A2 (en) | 2003-02-25 | 2004-09-16 | Microsoft Corporation | Adaptive junk message filtering system |
US20080010066A1 (en) | 2003-05-30 | 2008-01-10 | American Express Travel Related Services Company, Inc. | Speaker recognition in a multi-speaker environment and comparison of several voice prints to many |
US8036892B2 (en) | 2003-05-30 | 2011-10-11 | American Express Travel Related Services Company, Inc. | Speaker recognition in a multi-speaker environment and comparison of several voice prints to many |
US7299177B2 (en) | 2003-05-30 | 2007-11-20 | American Express Travel Related Services Company, Inc. | Speaker recognition in a multi-speaker environment and comparison of several voice prints to many |
US7778832B2 (en) | 2003-05-30 | 2010-08-17 | American Express Travel Related Services Company, Inc. | Speaker recognition in a multi-speaker environment and comparison of several voice prints to many |
US20040240631A1 (en) | 2003-05-30 | 2004-12-02 | Vicki Broman | Speaker recognition in a multi-speaker environment and comparison of several voice prints to many |
US20050010411A1 (en) * | 2003-07-09 | 2005-01-13 | Luca Rigazio | Speech data mining for call center management |
US7212613B2 (en) | 2003-09-18 | 2007-05-01 | International Business Machines Corporation | System and method for telephonic voice authentication |
US20050076084A1 (en) | 2003-10-03 | 2005-04-07 | Corvigo | Dynamic message filtering |
US20050125226A1 (en) | 2003-10-29 | 2005-06-09 | Paul Magee | Voice recognition system and method |
US20050125339A1 (en) | 2003-12-09 | 2005-06-09 | Tidwell Lisa C. | Systems and methods for assessing the risk of a financial transaction using biometric information |
US20060098803A1 (en) | 2003-12-18 | 2006-05-11 | Sbc Knowledge Ventures, L.P. | Intelligently routing customer communications |
US20050135595A1 (en) * | 2003-12-18 | 2005-06-23 | Sbc Knowledge Ventures, L.P. | Intelligently routing customer communications |
US7660715B1 (en) * | 2004-01-12 | 2010-02-09 | Avaya Inc. | Transparent monitoring and intervention to improve automatic adaptation of speech models |
US20060013372A1 (en) | 2004-07-15 | 2006-01-19 | Tekelec | Methods, systems, and computer program products for automatically populating signaling-based access control database |
JP2006038955A (en) | 2004-07-22 | 2006-02-09 | Docomo Engineering Tohoku Inc | Voiceprint recognition system |
WO2006013555A2 (en) | 2004-08-04 | 2006-02-09 | Cellmax Systems Ltd. | Method and system for verifying and enabling user access based on voice parameters |
US20060106605A1 (en) | 2004-11-12 | 2006-05-18 | Saunders Joseph M | Biometric record management |
US20060161435A1 (en) | 2004-12-07 | 2006-07-20 | Farsheed Atef | System and method for identity verification and management |
US8112278B2 (en) | 2004-12-13 | 2012-02-07 | Securicom (Nsw) Pty Ltd | Enhancing the response of biometric access systems |
US7657431B2 (en) | 2005-02-18 | 2010-02-02 | Fujitsu Limited | Voice authentication system |
US20060212925A1 (en) | 2005-03-02 | 2006-09-21 | Markmonitor, Inc. | Implementing trust policies |
US20060212407A1 (en) | 2005-03-17 | 2006-09-21 | Lyon Dennis B | User authentication and secure transaction system |
US20150055763A1 (en) | 2005-04-21 | 2015-02-26 | Verint Americas Inc. | Systems, methods, and media for determining fraud patterns and creating fraud behavioral models |
US8073691B2 (en) | 2005-04-21 | 2011-12-06 | Victrio, Inc. | Method and system for screening using voice data and metadata |
US20120054202A1 (en) | 2005-04-21 | 2012-03-01 | Victrio, Inc. | Method and System for Screening Using Voice Data and Metadata |
US20120053939A9 (en) | 2005-04-21 | 2012-03-01 | Victrio | Speaker verification-based fraud system for combined automated risk score with agent review and associated user interface |
US20100305946A1 (en) | 2005-04-21 | 2010-12-02 | Victrio | Speaker verification-based fraud system for combined automated risk score with agent review and associated user interface |
US8510215B2 (en) | 2005-04-21 | 2013-08-13 | Victrio, Inc. | Method and system for enrolling a voiceprint in a fraudster database |
US20090119106A1 (en) | 2005-04-21 | 2009-05-07 | Anthony Rajakumar | Building whitelists comprising voiceprints not associated with fraud and screening calls using a combination of a whitelist and blacklist |
US20130253919A1 (en) | 2005-04-21 | 2013-09-26 | Richard Gutierrez | Method and System for Enrolling a Voiceprint in a Fraudster Database |
US20120254243A1 (en) | 2005-04-21 | 2012-10-04 | Torsten Zeppenfeld | Systems, methods, and media for generating hierarchical fused risk scores |
US20120072453A1 (en) | 2005-04-21 | 2012-03-22 | Lisa Guerra | Systems, methods, and media for determining fraud patterns and creating fraud behavioral models |
US20120253805A1 (en) | 2005-04-21 | 2012-10-04 | Anthony Rajakumar | Systems, methods, and media for determining fraud risk from audio signals |
US8311826B2 (en) | 2005-04-21 | 2012-11-13 | Victrio, Inc. | Method and system for screening using voice data and metadata |
US20070282605A1 (en) | 2005-04-21 | 2007-12-06 | Anthony Rajakumar | Method and System for Screening Using Voice Data and Metadata |
US20100305960A1 (en) | 2005-04-21 | 2010-12-02 | Victrio | Method and system for enrolling a voiceprint in a fraudster database |
US20120263285A1 (en) | 2005-04-21 | 2012-10-18 | Anthony Rajakumar | Systems, methods, and media for disambiguating call data to determine fraud |
US20060248019A1 (en) | 2005-04-21 | 2006-11-02 | Anthony Rajakumar | Method and system to detect fraud using voice data |
US20100303211A1 (en) | 2005-04-21 | 2010-12-02 | Victrio | Method and system for generating a fraud risk score using telephony channel based audio and non-audio data |
US20060282660A1 (en) | 2005-04-29 | 2006-12-14 | Varghese Thomas E | System and method for fraud monitoring, detection, and tiered user authentication |
US7908645B2 (en) | 2005-04-29 | 2011-03-15 | Oracle International Corporation | System and method for fraud monitoring, detection, and tiered user authentication |
US20060285665A1 (en) | 2005-05-27 | 2006-12-21 | Nice Systems Ltd. | Method and apparatus for fraud detection |
US7386105B2 (en) | 2005-05-27 | 2008-06-10 | Nice Systems Ltd | Method and apparatus for fraud detection |
US20060293891A1 (en) | 2005-06-22 | 2006-12-28 | Jan Pathuel | Biometric control systems and associated methods of use |
US20070071206A1 (en) | 2005-06-24 | 2007-03-29 | Gainsboro Jay L | Multi-party conversation analyzer & logger |
US20060289622A1 (en) | 2005-06-24 | 2006-12-28 | American Express Travel Related Services Company, Inc. | Word recognition system and method for customer and employee assessment |
WO2007001452A2 (en) | 2005-06-24 | 2007-01-04 | American Express Marketing & Development Corp. | Word recognition system and method for customer and employee assessment |
US7940897B2 (en) | 2005-06-24 | 2011-05-10 | American Express Travel Related Services Company, Inc. | Word recognition system and method for customer and employee assessment |
US20110191106A1 (en) | 2005-06-24 | 2011-08-04 | American Express Travel Related Services Company, Inc. | Word recognition system and method for customer and employee assessment |
US20070041517A1 (en) | 2005-06-30 | 2007-02-22 | Pika Technologies Inc. | Call transfer detection method using voice identification techniques |
US20110320484A1 (en) | 2005-09-23 | 2011-12-29 | Smithies Christopher P K | System and method for verification of personal identity |
US20070074021A1 (en) | 2005-09-23 | 2007-03-29 | Smithies Christopher P K | System and method for verification of personal identity |
US7668769B2 (en) | 2005-10-04 | 2010-02-23 | Basepoint Analytics, LLC | System and method of detecting fraud |
US20090247131A1 (en) | 2005-10-31 | 2009-10-01 | Champion Laurenn L | Systems and Methods for Restricting The Use of Stolen Devices on a Wireless Network |
US20080181417A1 (en) * | 2006-01-25 | 2008-07-31 | Nice Systems Ltd. | Method and Apparatus For Segmentation of Audio Interactions |
US20100138282A1 (en) * | 2006-02-22 | 2010-06-03 | Kannan Pallipuram V | Mining interactions to manage customer experience throughout a customer service lifecycle |
US20110004472A1 (en) | 2006-03-31 | 2011-01-06 | Igor Zlokarnik | Speech Recognition Using Channel Verification |
US20070244702A1 (en) | 2006-04-12 | 2007-10-18 | Jonathan Kahn | Session File Modification with Annotation Using Speech Recognition or Text to Speech |
US20070280436A1 (en) | 2006-04-14 | 2007-12-06 | Anthony Rajakumar | Method and System to Seed a Voice Database |
US20070250318A1 (en) | 2006-04-25 | 2007-10-25 | Nice Systems Ltd. | Automatic speech analysis |
US20070288242A1 (en) * | 2006-06-12 | 2007-12-13 | Lockheed Martin Corporation | Speech recognition and control system, program product, and related methods |
US7822605B2 (en) | 2006-10-19 | 2010-10-26 | Nice Systems Ltd. | Method and apparatus for large population speaker identification in telephone interactions |
US20080195387A1 (en) | 2006-10-19 | 2008-08-14 | Nice Systems Ltd. | Method and apparatus for large population speaker identification in telephone interactions |
US20090319269A1 (en) | 2008-06-24 | 2009-12-24 | Hagai Aronowitz | Method of Trainable Speaker Diarization |
US8537978B2 (en) | 2008-10-06 | 2013-09-17 | International Business Machines Corporation | Method and system for using conversational biometrics and speaker identification/verification to filter voice streams |
US20100228656A1 (en) | 2009-03-09 | 2010-09-09 | Nice Systems Ltd. | Apparatus and method for fraud prevention |
US20100332287A1 (en) * | 2009-06-24 | 2010-12-30 | International Business Machines Corporation | System and method for real-time prediction of customer satisfaction |
US20110026689A1 (en) | 2009-07-30 | 2011-02-03 | Metz Brent D | Telephone call inbox |
US20160379082A1 (en) | 2009-10-28 | 2016-12-29 | Digimarc Corporation | Intuitive computing methods and systems |
US20110119060A1 (en) | 2009-11-15 | 2011-05-19 | International Business Machines Corporation | Method and system for speaker diarization |
US20110282661A1 (en) | 2010-05-11 | 2011-11-17 | Nice Systems Ltd. | Method for speaker source classification |
US20120130771A1 (en) * | 2010-11-18 | 2012-05-24 | Kannan Pallipuram V | Chat Categorization and Agent Performance Modeling |
US20120284026A1 (en) | 2011-05-06 | 2012-11-08 | Nexidia Inc. | Speaker verification system |
US20130163737A1 (en) | 2011-12-22 | 2013-06-27 | Cox Communications, Inc. | Systems and Methods of Detecting Communications Fraud |
US20130197912A1 (en) | 2012-01-31 | 2013-08-01 | Fujitsu Limited | Specific call detecting device and specific call detecting method |
US9001976B2 (en) * | 2012-05-03 | 2015-04-07 | Nexidia, Inc. | Speaker adaptation |
US20130300939A1 (en) | 2012-05-11 | 2013-11-14 | Cisco Technology, Inc. | System and method for joint speaker and scene recognition in a video/audio processing environment |
US20140067394A1 (en) | 2012-08-28 | 2014-03-06 | King Abdulaziz City For Science And Technology | System and method for decoding speech |
US20140142940A1 (en) | 2012-11-21 | 2014-05-22 | Verint Systems Ltd. | Diarization Using Linguistic Labeling |
US10134400B2 (en) * | 2012-11-21 | 2018-11-20 | Verint Systems Ltd. | Diarization using acoustic labeling |
US20160379032A1 (en) * | 2014-01-14 | 2016-12-29 | Focaltech Electronics, Ltd. | Electric field-type fingerprint identification apparatus and state control method and prosthesis identification method thereof |
US20160364606A1 (en) * | 2014-10-27 | 2016-12-15 | Mattersight Corporation | Predictive and responsive video analytics system and methods |
Non-Patent Citations (12)
Title |
---|
Baum, L. E., et al., "A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains," The Annals of Mathematical Statistics, vol. 41, No. 1, 1970, pp. 164-171. |
Cheng, Y., "Mean Shift, Mode Seeking, and Clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, No. 8, 1995, pp. 790-799. |
Cohen, I., "Noise Spectrum Estimation in Adverse Environment: Improved Minima Controlled Recursive Averaging," IEEE Transactions On Speech and Audio Processing, vol. 11, No. 5, 2003, pp. 466-475. |
Cohen, I., et al., "Spectral Enhancement by Tracking Speech Presence Probability in Subbands," Proc. International Workshop on Hands-Free Speech Communication (HSC'01), 2001, pp. 95-98. |
Coifman, R.R., et al., "Diffusion maps," Applied and Computational Harmonic Analysis, vol. 21, 2006, pp. 5-30. |
Hayes, M.H., "Statistical Digital Signal Processing and Modeling," J. Wiley & Sons, Inc., New York, 1996, 200 pages. |
Hermansky, H., "Perceptual linear predictive (PLP) analysis of speech," Journal of the Acoustical Society of America, vol. 87, No. 4, 1990, pp. 1738-1752. |
Lailler, C., et al., "Semi-Supervised and Unsupervised Data Extraction Targeting Speakers: From Speaker Roles to Fame?," Proceedings of the First Workshop on Speech, Language and Audio in Multimedia (SLAM), Marseille, France, 2013, 6 pages. |
Mermelstein, P., "Distance Measures for Speech Recognition-Psychological and Instrumental," Pattern Recognition and Artificial Intelligence, 1976, pp. 374-388. |
Mermelstein, P., "Distance Measures for Speech Recognition—Psychological and Instrumental," Pattern Recognition and Artificial Intelligence, 1976, pp. 374-388. |
Schmalenstroeer, J., et al., "Online Diarization of Streaming Audio-Visual Data for Smart Environments," IEEE Journal of Selected Topics in Signal Processing, vol. 4, No. 5, 2010, 12 pages. |
Viterbi, A.J., "Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm," IEEE Transactions on Information Theory, vol. 13, No. 2, 1967, pp. 260-269. |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220139399A1 (en) * | 2012-11-21 | 2022-05-05 | Verint Systems Ltd. | System and method of video capture and search optimization for creating an acoustic voiceprint |
US11776547B2 (en) * | 2012-11-21 | 2023-10-03 | Verint Systems Inc. | System and method of video capture and search optimization for creating an acoustic voiceprint |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10650826B2 (en) | Diarization using acoustic labeling | |
US11636860B2 (en) | Word-level blind diarization of recorded calls with arbitrary number of speakers | |
US10109280B2 (en) | Blind diarization of recorded calls with arbitrary number of speakers |
Legal Events
Code | Title | Description |
---|---|---|
FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
AS | Assignment | Owner name: VERINT SYSTEMS LTD., ISRAEL; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ZIV, OMER; ACHITUV, RAN; SHAPIRA, IDO; AND OTHERS; SIGNING DATES FROM 20131119 TO 20131120; REEL/FRAME: 051579/0405 |
STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
AS | Assignment | Owner name: VERINT SYSTEMS INC., NEW YORK; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: VERINT SYSTEMS LTD.; REEL/FRAME: 057568/0183; Effective date: 20210201 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 4 |