US20190373111A1 - Automatic call classification using machine learning - Google Patents
- Publication number
- US20190373111A1 (U.S. application Ser. No. 16/366,605)
- Authority
- US
- United States
- Prior art keywords
- phone call
- grams
- outcome
- text
- phone
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/436—Arrangements for screening incoming calls, i.e. evaluating the characteristics of a call before deciding whether to answer it
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/5183—Call or contact centers with computer-telephony arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/53—Centralised arrangements for recording incoming messages, i.e. mailbox systems
- H04M3/533—Voice mail systems
- H04M3/53366—Message disposing or creating aspects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/60—Medium conversion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/55—Aspects of automatic or semi-automatic exchanges related to network data storage and management
- H04M2203/552—Call annotations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/55—Aspects of automatic or semi-automatic exchanges related to network data storage and management
- H04M2203/555—Statistics, e.g. about subscribers but not being call statistics
- H04M2203/556—Statistical analysis and interpretation
Definitions
- the multinomial logistic regression model is trained using examples of text transcripts, in vector representation format, along with their correct outcome classifications.
- the vector representations used for training may, for example, include at least some elements representing the TF-IDF of 1-grams of single words in the text transcripts and at least some elements representing the TF-IDF of 2-grams of adjacent word pairs in the text transcripts.
- the multinomial logistic regression model then converges to a model that allows classification of unseen examples.
- in step 304, an output is received from the multinomial logistic regression model representing a predicted classification of the outcome of the phone call.
- the system may determine a future course of action to take based on the predicted classification of the outcome of the phone call.
- the course of action may include determining whether to schedule a follow-up phone call. If it is determined that a follow-up phone call should be scheduled, then the system may schedule a follow-up phone call using the call management system 110 or other schedule management system.
Abstract
Description
- This application claims the benefit of U.S. Provisional Patent Application No. 62/678,189, filed May 30, 2018, which is hereby incorporated by reference in its entirety.
- The present invention relates to software and hardware for classifying a phone call.
- It is often necessary for operators in call centers to classify the outcome of a phone call. Current methods include displaying a user interface to the operator so that the operator may select an outcome or enter an outcome in a free text field. Recording the outcome of the phone call is important for knowing what further actions to take regarding the recipient of the phone call, for example whether to follow up with the individual or to take the individual off a calling list.
- However, requiring operators to manually enter the outcome of a phone call is cumbersome. It distracts operators from their task and takes precious seconds away from time they could be spending making more calls. Moreover, operators can select the wrong outcome choices when under the time pressure to select outcomes quickly, leading to inaccurate data. Therefore, it would be advantageous to use a computer system to analyze phone calls and automatically determine the outcome.
- Embodiments relate to using a machine learning system to automatically classify the outcome of a phone call. The system may be used, for example, in call centers where human operators would otherwise have to record the outcomes themselves.
- One embodiment relates to a machine learning method for classifying the outcome of a phone call. A text transcript of a phone call is provided and is translated into a vector representation. The vector representation is input to a machine learning model, which outputs a predicted classification of the outcome of the phone call. Optionally, the predicted classification of the outcome may be used to determine future actions of the system.
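The overall flow described above (transcript in, vector representation, model, predicted outcome) can be sketched end to end. This is a minimal illustration, not the patent's implementation; the `vectorize`, `classify_outcome`, and `toy_model` names and the tiny vocabulary are hypothetical stand-ins:

```python
# Minimal sketch of the claimed flow: transcript -> vector -> model -> outcome.
# All names and the rule-based "model" are hypothetical stand-ins for
# illustration, not components defined by the patent.

OUTCOMES = ["voicemail", "call back later", "not interested", "completed",
            "wrong number", "maybe interested", "alternative phone"]

def vectorize(transcript: str) -> list[int]:
    """Toy bag-of-words vector over a tiny fixed vocabulary."""
    vocab = ["voicemail", "interested", "wrong", "number", "later"]
    words = transcript.lower().split()
    return [words.count(term) for term in vocab]

def classify_outcome(transcript: str, model) -> str:
    """Translate the transcript to a vector and ask the model for an outcome."""
    vector = vectorize(transcript)
    return model(vector)

# A trivial rule-based stand-in for the trained machine learning model.
def toy_model(vector: list[int]) -> str:
    return "voicemail" if vector[0] > 0 else "maybe interested"

print(classify_outcome("you have reached my voicemail please leave a message",
                       toy_model))
# -> voicemail
```

In a real system the stand-in model would be replaced by a trained classifier over the full outcome set.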
- FIG. 1 illustrates an exemplary environment in which a call management system with a call outcome classifier may operate.
- FIG. 2 illustrates an exemplary method for classifying the outcome of a call using machine learning.
- FIG. 3 illustrates another exemplary method for classifying the outcome of a call using machine learning.
- In this specification, reference is made in detail to specific embodiments of the invention. Some of the embodiments or their aspects are illustrated in the drawings.
- For clarity in explanation, the invention has been described with reference to specific embodiments; however, it should be understood that the invention is not limited to the described embodiments. On the contrary, the invention covers alternatives, modifications, and equivalents as may be included within its scope as defined by any patent claims. The following embodiments of the invention are set forth without any loss of generality to, and without imposing limitations on, the claimed invention. In the following description, specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In addition, well known features may not have been described in detail to avoid unnecessarily obscuring the invention.
- In addition, it should be understood that steps of the exemplary methods set forth in this exemplary patent can be performed in different orders than the order presented in this specification. Furthermore, some steps of the exemplary methods may be performed in parallel rather than being performed sequentially. Also, the steps of the exemplary methods may be performed in a network environment in which some steps are performed by different computers in the networked environment.
- FIG. 1 illustrates an exemplary environment 100 in which embodiments may operate. A call management system 110 manages calls for human operators 180. The human operators 180 may be in a dedicated call center for a business. The call management system 110 provides functionality such as automatically setting up and connecting calls. When a human operator completes a call, the call management system 110 may automatically initiate the next call.
- Some embodiments operate to classify outbound calls. For outbound calling, the call management system 110 automatically determines a phone number to dial. The call management system 110 may do this by iterating through a list of phone numbers and scheduling phone numbers to dial based on several factors, such as the last time an operator called that number, whether the number is considered to be a good number, the last contact that number had with the business, and so on. The call management system may transmit a request to call a phone number to a dialer 120 that dials the phone number. After the receiving phone rings and is picked up, the call is then connected with the recipient 130. Alternatively, the call goes to a voicemail box if the recipient 130 is not available.
- Some embodiments may instead operate on inbound calls to a call center. In that case, inbound calls 190 are received by call management system 110 and routed by the call management system 110 to an available operator 180 who may answer the call and speak with the inbound caller.
- Inbound and outbound calls may be connected over various networks such as the public switched telephone network (PSTN), voice over IP (VoIP), the Internet, intranets, or other networks.
- Classification may be performed by first translating phone calls into the intermediate representation of text. Automatic speech transcription system 140 may take as input an audio file of a phone call and output a text file transcript 150 representing the words spoken. The text transcript 150 may optionally also include indications of the turns taken (the switches between speaking parties) and the identity of the party speaking for each utterance, whether it be the operator, caller, or callee.
- The text transcript 150 may be entered as input into machine learning model 160 that serves to classify the outcome of the phone call. The outcome of a phone call may also be referred to as a disposition or status. Outcomes may include voicemail, call back later, not interested, completed, wrong number, maybe interested, alternative phone, and other possible outcomes. The voicemail outcome means that the recipient of the call was not available and the call went to voicemail. The call back later outcome means that the recipient was available but that it is necessary to call back later. The not interested outcome means that the recipient was reached but was not interested. The completed outcome means that the recipient was reached and the transaction was completed. The wrong number outcome means that a wrong number was reached. The maybe interested outcome means that the recipient was reached and might be interested. The alternative phone outcome means that an alternative phone number for the recipient was received and entered.
- Machine learning model 160 may employ a variety of machine learning models, such as deep learning, neural networks, multinomial logistic regression, decision trees, random forests, Bayesian networks, support vector machines, nearest neighbor, ensemble methods, and other machine learning models.
- Although one method was described involving the transformation of an audio file into text before classification, alternative embodiments may involve the direct classification of the outcome of the phone call from the audio file by the machine learning model 160 without going through the intermediate stage of a text representation.
- FIG. 2 illustrates an exemplary method 200 that may be performed to classify a phone conversation. In step 201, a text transcript of a phone call is provided. An automatic speech recognition system may be used to generate the text transcript. In step 202, a vector representation of the text transcript is generated. Embodiments may use a variety of vector representations. The vectors may be referred to as feature vectors.
- In one embodiment, a vector is generated using a bag of words model. Each element of the vector represents the frequency of a particular word or n-gram in the text. In a raw frequency based model, each vector element is simply the number of times that each word or n-gram appears in the text. An n-gram is an adjacent grouping of n words or characters. Therefore, 1-grams represent single words and 2-grams represent pairs of adjacent words. Values of n larger than 2 may also be used. N-grams are typically ordered but may optionally be unordered.
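The raw-count bag-of-words representation described above can be sketched as follows. This is a minimal illustration; the whitespace tokenizer and the tiny example vocabulary are assumptions made for the example:

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the ordered n-grams (adjacent groupings of n words) in tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_vector(text, vocabulary):
    """Raw-frequency bag-of-words vector: one count per vocabulary n-gram."""
    tokens = text.lower().split()
    counts = Counter(ngrams(tokens, 1) + ngrams(tokens, 2))
    return [counts[gram] for gram in vocabulary]

# A toy vocabulary mixing 1-grams and 2-grams.
vocab = [("not",), ("interested",), ("not", "interested"), ("call", "back")]
print(count_vector("I am not interested no not interested", vocab))
# -> [2, 2, 2, 0]
```

In practice the vocabulary would be derived from the corpus of transcripts rather than fixed by hand.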
- In more sophisticated methods, the elements may instead be represented by term frequency-inverse document frequency (TF-IDF) of n-grams rather than raw counts. Term frequency-inverse document frequency normalizes frequencies to provide more information than a raw count. The TF-IDF is calculated by multiplying the term frequency by the inverse document frequency. The term frequency of a term in a document is obtained by taking the number of times the term appears in the document divided by the number of words in the document. The inverse document frequency is obtained by dividing the total number of documents in the set (e.g., the number of text transcripts of calls) by the number of documents containing the term and taking the natural log of the resulting value. The TF-IDF can be represented by the equation

TF-IDF(t, d) = (f_{t,d} / Σ_{t′} f_{t′,d}) · ln(N / n_t)

- where f_{t,d} is the frequency of the term t in document d, Σ_{t′} f_{t′,d} is the number of words in document d, N is the number of documents in the set, and n_t is the number of documents containing the term t.
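The TF-IDF definition above can be computed directly from its two factors. A minimal sketch, assuming pre-tokenized documents and a toy three-transcript corpus:

```python
import math

def tf_idf(term, doc, corpus):
    """TF-IDF per the equation above: (f_td / sum f_t'd) * ln(N / n_t)."""
    tf = doc.count(term) / len(doc)               # term frequency in the document
    n_t = sum(1 for d in corpus if term in d)     # documents containing the term
    idf = math.log(len(corpus) / n_t)             # natural log, as in the text
    return tf * idf

# Toy stand-ins for text transcripts of calls, already tokenized.
corpus = [
    "please leave a message after the tone".split(),
    "sorry wrong number".split(),
    "call me back after lunch".split(),
]
print(round(tf_idf("wrong", corpus[1], corpus), 4))
# -> 0.3662  (tf = 1/3, idf = ln(3/1))
```

Note that a term appearing in every document gets an IDF of ln(1) = 0, which is exactly the normalization the text describes: ubiquitous terms carry no discriminative weight.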
- In another embodiment, a vector may be generated to represent the text as a document embedding. An embedding is a vector representation of an entity that tends to place closely related entities near each other in vector space and disparate entities farther from each other. In the case of text transcripts of phone calls, the goal of an embedding would be to group similar text transcripts closely together in vector space. One method of generating an embedding is the skip-gram model. In the skip-gram model, a one-layer neural network is used, and the weights learned by the single layer of the neural network end up being the vector representation for the embedding of the text.
- In one embodiment, a single layer neural network is trained using one-hot vector encodings of words to output probabilities that other words appear in the same context as the given word, where a context is a word window of a specified size. One-hot vector encodings have an element for each possible word, where there is a single 1 in the position of the represented word and a 0 in all other positions. The output layer of the single layer neural network has a node for each potential word in the vocabulary, and the value of each node in the output layer is the probability that the word appears in the context of the input word. After training the single layer neural network, the weights of the single layer neural network for a particular word, when input as a one-hot encoding, may be used as a word embedding for the word. The word embeddings created in this manner tend to cluster similar words together in vector space, while increasing the distance to unlike words.
- The skip-gram model may also be applied to documents in the same manner by training a single layer neural network to output probabilities that other documents are similar to the input document, which is input as a one-hot vector encoding. In this way, document embeddings may be generated, similarly to word embeddings.
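The single-layer skip-gram training described above can be sketched in NumPy. This is a toy version on hand-picked (center, context) pairs; the vocabulary, embedding dimension, learning rate, and iteration count are arbitrary choices for illustration, not values from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["hello", "voicemail", "message", "tone", "goodbye"]
V, dim = len(vocab), 3                   # vocabulary size, embedding dimension
W_in = rng.normal(0.0, 0.1, (V, dim))    # input weights: rows become embeddings
W_out = rng.normal(0.0, 0.1, (dim, V))   # output layer weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# (center, context) index pairs, as would be produced by sliding a context
# window over a corpus; these pairs are chosen by hand for the example.
pairs = [(1, 2), (2, 1), (2, 3), (3, 2)]

lr = 0.5
for _ in range(200):
    for center, context in pairs:
        h = W_in[center]             # hidden layer: embedding of the center word
        p = softmax(h @ W_out)       # predicted context-word distribution
        err = p.copy()
        err[context] -= 1.0          # gradient of cross-entropy w.r.t. the logits
        grad_h = W_out @ err         # backprop to the hidden layer (old weights)
        W_out -= lr * np.outer(h, err)
        W_in[center] -= lr * grad_h

# After training, the rows of W_in serve as the word embeddings.
embedding = W_in[vocab.index("voicemail")]
print(embedding.shape)
# -> (3,)
```

Applied at the document level (one row per transcript instead of per word), the same mechanics yield the document embeddings mentioned in the text.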
- In step 203, the vector representation is input into a machine learning model. The machine learning model is trained in advance using supervised learning on examples of pairs of text transcripts of phone calls and the correct classifications of the call outcomes. Through the training process, the machine learning model learns an internal representation that improves its accuracy in classifying new, unseen examples into the various outcomes.
- In step 204, an output is received from the machine learning model representing a predicted classification of the outcome of the phone call.
- In step 205, optionally, the system may determine a future course of action to take based on the predicted classification of the outcome of the phone call. The course of action may include determining whether to schedule a follow-up phone call. If it is determined that a follow-up phone call should be scheduled, the system may schedule it using the call management system 110 or another schedule management system.
- FIG. 3 illustrates an exemplary method 300 that may be performed to classify a phone conversation. In step 301, a text transcript of a phone call is provided. An automatic speech recognition system may be used to generate the text transcript. In step 302, a vector representation of the text transcript is generated. Embodiments may use a variety of vector representations, including bag of words or document embeddings as described above.
- In one embodiment, the vector representation includes at least some elements representing the TF-IDF of 1-grams (single words) in the text transcript and at least some elements representing the TF-IDF of 2-grams (adjacent word pairs) in the text transcript. Such a vector representation may optionally be created by setting a maximum vector size (e.g., 10,000), determining from the set of text transcripts of phone calls the most common 1-grams and 2-grams up to the maximum vector size (e.g., the 10,000 most common 1-grams and 2-grams), and filling the vector with elements representing the TF-IDF of those most common 1-grams and 2-grams.
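The 1-gram/2-gram TF-IDF vectorization described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the patent does not specify a particular IDF formula, so a common smoothed variant is assumed, and whitespace tokenization and the toy corpus are illustrative.

```python
import math
from collections import Counter

def grams(tokens, n):
    """All contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocab(transcripts, max_size):
    """The most common 1-grams and 2-grams across the corpus, up to max_size."""
    counts = Counter()
    for text in transcripts:
        tokens = text.lower().split()
        counts.update(grams(tokens, 1) + grams(tokens, 2))
    return [g for g, _ in counts.most_common(max_size)]

def tfidf_vector(text, vocab, transcripts):
    """TF-IDF weight for each vocabulary n-gram in a single transcript."""
    tf = Counter(grams(text.lower().split(), 1) + grams(text.lower().split(), 2))
    doc_grams = [set(grams(t.lower().split(), 1) + grams(t.lower().split(), 2))
                 for t in transcripts]
    n_docs = len(transcripts)
    vec = []
    for g in vocab:
        df = sum(1 for dg in doc_grams if g in dg)          # document frequency
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0       # smoothed IDF (assumption)
        vec.append(tf[g] * idf)
    return vec

corpus = ["yes schedule a follow up", "no thank you", "schedule a call"]
vocab = build_vocab(corpus, max_size=10)
v = tfidf_vector("schedule a call", corpus=corpus, vocab=vocab) if False else tfidf_vector("schedule a call", vocab, corpus)
assert len(v) == 10
```

In practice the vector is capped at the maximum size (10,000 in the patent's example) regardless of how many distinct n-grams appear in the corpus.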
- In step 303, the vector representation is input into a machine learning model. The machine learning model may be a multinomial logistic regression model. Optionally, the multinomial logistic regression model may use the Softmax function to categorize outcomes into multiple classes; when the Softmax function is used, the model is known as Softmax regression. The model may also use L2 regularization to help prevent overfitting. L2 regularization, also referred to as ridge regression, adds the squared magnitude of the coefficients as a penalty term to the loss function. Alternatively, L1 regularization may be used in the model. L1 regularization, also referred to as lasso regression, adds the absolute value of the coefficients as a penalty term to the loss function.
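The Softmax function and the two penalty terms described above can be written compactly. This is a minimal sketch, not the patent's implementation; the regularization strength `lam` is a hypothetical hyperparameter.

```python
import math

def softmax(logits):
    """Map raw class scores to probabilities that sum to 1."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def l2_penalty(weights, lam):
    """Ridge term: lam times the sum of squared coefficients."""
    return lam * sum(w * w for row in weights for w in row)

def l1_penalty(weights, lam):
    """Lasso term: lam times the sum of absolute coefficients."""
    return lam * sum(abs(w) for row in weights for w in row)

probs = softmax([2.0, 1.0, 0.1])
assert abs(sum(probs) - 1.0) < 1e-9
```

Either penalty is added to the cross-entropy loss during training; L2 shrinks all coefficients smoothly, while L1 tends to drive many coefficients exactly to zero.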
- The multinomial logistic regression model is trained using examples of text transcripts, in vector representation format as described above, along with their correct outcome classifications. The vector representations used for training may, for example, include at least some elements representing the TF-IDF of 1-grams of single words in the text transcripts and at least some elements representing the TF-IDF of 2-grams of adjacent word pairs in the text transcripts. Through training, the multinomial logistic regression model then converges to a model that allows classification of unseen examples.
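The training procedure above can be sketched as stochastic gradient descent on the cross-entropy loss with an L2 penalty. This is an illustrative toy implementation, not the patent's: the learning rate, epoch count, regularization strength, and two-feature "transcript vectors" are all assumptions for demonstration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def train(X, y, n_classes, lr=0.5, lam=0.001, epochs=300):
    """Fit multinomial logistic regression by stochastic gradient descent."""
    n_feat = len(X[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    for _ in range(epochs):
        for x, label in zip(X, y):
            logits = [sum(w * f for w, f in zip(W[c], x)) for c in range(n_classes)]
            p = softmax(logits)
            for c in range(n_classes):
                err = p[c] - (1.0 if c == label else 0.0)
                for j in range(n_feat):
                    # gradient of cross-entropy plus the L2 penalty term
                    W[c][j] -= lr * (err * x[j] + 2.0 * lam * W[c][j])
    return W

def predict(W, x):
    """Return the class whose logit (and hence Softmax probability) is largest."""
    logits = [sum(w * f for w, f in zip(row, x)) for row in W]
    return max(range(len(W)), key=lambda c: logits[c])

# Toy "transcript vectors": feature 0 marks outcome 0, feature 1 marks outcome 1.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [0, 0, 1, 1]
W = train(X, y, n_classes=2)
assert predict(W, [1.0, 0.0]) == 0
assert predict(W, [0.0, 1.0]) == 1
```

A production system would use the TF-IDF vectors described earlier as `X` and the labeled call outcomes as `y`.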
- In step 304, an output is received from the multinomial logistic regression model representing a predicted classification of the outcome of the phone call.
- In step 305, optionally, the system may determine a future course of action to take based on the predicted classification of the outcome of the phone call. The course of action may include determining whether to schedule a follow-up phone call. If it is determined that a follow-up phone call should be scheduled, the system may schedule it using the call management system 110 or another schedule management system.
- The terminology used herein is for the purpose of describing particular aspects only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
- While the invention has been particularly shown and described with reference to specific embodiments thereof, it should be understood that changes in the form and details of the disclosed embodiments may be made without departing from the scope of the invention. Although various advantages, aspects, and objects of the present invention have been discussed herein with reference to various embodiments, it will be understood that the scope of the invention should not be limited by reference to such advantages, aspects, and objects. Rather, the scope of the invention should be determined with reference to the appended claims.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/366,605 US10498888B1 (en) | 2018-05-30 | 2019-03-27 | Automatic call classification using machine learning |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862678189P | 2018-05-30 | 2018-05-30 | |
US16/366,605 US10498888B1 (en) | 2018-05-30 | 2019-03-27 | Automatic call classification using machine learning |
Publications (2)
Publication Number | Publication Date |
---|---|
US10498888B1 US10498888B1 (en) | 2019-12-03 |
US20190373111A1 true US20190373111A1 (en) | 2019-12-05 |
Family
ID=68693427
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/366,605 Active US10498888B1 (en) | 2018-05-30 | 2019-03-27 | Automatic call classification using machine learning |
Country Status (1)
Country | Link |
---|---|
US (1) | US10498888B1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111275571A (en) * | 2020-01-14 | 2020-06-12 | 河海大学 | Resident load probability prediction deep learning method considering microclimate and user mode |
CN111723949A (en) * | 2020-06-24 | 2020-09-29 | 中国石油大学(华东) | Porosity prediction method based on selective ensemble learning |
US11727935B2 (en) | 2020-12-15 | 2023-08-15 | Optum Technology, Inc. | Natural language processing for optimized extractive summarization |
US11741400B1 (en) * | 2020-12-18 | 2023-08-29 | Beijing Didi Infinity Technology And Development Co., Ltd. | Machine learning-based real-time guest rider identification |
US11741143B1 (en) | 2022-07-28 | 2023-08-29 | Optum, Inc. | Natural language processing techniques for document summarization using local and corpus-wide inferences |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11615332B2 (en) * | 2019-06-25 | 2023-03-28 | Paypal, Inc. | Telephone call assessment using artificial intelligence |
US11373095B2 (en) * | 2019-12-23 | 2022-06-28 | Jens C. Jenkins | Machine learning multiple features of depicted item |
CN111310623B (en) * | 2020-02-03 | 2023-04-07 | 中国地质大学(武汉) | Method for analyzing debris flow sensitivity map based on remote sensing data and machine learning |
US11257486B2 (en) * | 2020-02-28 | 2022-02-22 | Intuit Inc. | Machine learning to propose actions in response to natural language questions |
CN113407185B (en) * | 2021-03-10 | 2023-01-06 | 天津大学 | Compiler optimization option recommendation method based on Bayesian optimization |
US20220309251A1 (en) * | 2021-03-25 | 2022-09-29 | InContact Inc. | Automated outcome classification systems in contact interactions, and methods |
US20220383867A1 (en) * | 2021-05-19 | 2022-12-01 | Capital One Services, Llc | Automated generation of fine-grained call reasons from customer service call transcripts |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004178123A (en) * | 2002-11-26 | 2004-06-24 | Hitachi Ltd | Information processor and program for executing information processor |
US20150120379A1 (en) * | 2013-10-30 | 2015-04-30 | Educational Testing Service | Systems and Methods for Passage Selection for Language Proficiency Testing Using Automated Authentic Listening |
US9900436B2 (en) * | 2015-06-01 | 2018-02-20 | AffectLayer, Inc. | Coordinating voice calls between representatives and customers to influence an outcome of the call |
2019
- 2019-03-27 US US16/366,605 patent/US10498888B1/en active Active
Also Published As
Publication number | Publication date |
---|---|
US10498888B1 (en) | 2019-12-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10498888B1 (en) | Automatic call classification using machine learning | |
US11935540B2 (en) | Switching between speech recognition systems | |
US10672383B1 (en) | Training speech recognition systems using word sequences | |
US10971153B2 (en) | Transcription generation from multiple speech recognition systems | |
US20220122587A1 (en) | Training of speech recognition systems | |
US11568231B2 (en) | Waypoint detection for a contact center analysis system | |
US20210029248A1 (en) | Hierarchical interface for adaptive closed loop communication system | |
US10157609B2 (en) | Local and remote aggregation of feedback data for speech recognition | |
US8170866B2 (en) | System and method for increasing accuracy of searches based on communication network | |
US20230335152A1 (en) | Adaptive closed loop communication system | |
US10387410B2 (en) | Method and system of classification in a natural language user interface | |
US11715459B2 (en) | Alert generator for adaptive closed loop communication system | |
CN111681653A (en) | Call control method, device, computer equipment and storage medium | |
CN110309299B (en) | Communication anti-fraud method, device, computer readable medium and electronic equipment | |
US20230102179A1 (en) | Computer systems and computer-based methods for automated caller intent prediction | |
US20210021709A1 (en) | Configurable dynamic call routing and matching system | |
US20230090049A1 (en) | Computer systems and computer-based methods for automated callback scheduling utilizing call duration prediction | |
US11721324B2 (en) | Providing high quality speech recognition | |
KR20240046508A (en) | Decision and visual display of voice menu for calls | |
US11735207B1 (en) | Systems and methods for determining a next action based on weighted predicted emotions, entities, and intents | |
US11978475B1 (en) | Systems and methods for determining a next action based on a predicted emotion by weighting each portion of the action's reply | |
US11947872B1 (en) | Natural language processing platform for automated event analysis, translation, and transcription verification | |
US20240161123A1 (en) | Auditing user feedback data | |
WO2023027833A1 (en) | Determination and visual display of spoken menus for calls | |
CN115473963A (en) | Call processing method and device, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: MICROENTITY |
|
AS | Assignment |
Owner name: UPCALL INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:RUTE, JASON;DEVYVER, MICHAEL;SIGNING DATES FROM 20190325 TO 20190327;REEL/FRAME:048759/0154 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: MICROENTITY Free format text: ENTITY STATUS SET TO MICRO (ORIGINAL EVENT CODE: MICR); ENTITY STATUS OF PATENT OWNER: MICROENTITY |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO MICRO (ORIGINAL EVENT CODE: MICR); ENTITY STATUS OF PATENT OWNER: MICROENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, MICRO ENTITY (ORIGINAL EVENT CODE: M3551); ENTITY STATUS OF PATENT OWNER: MICROENTITY Year of fee payment: 4 |