WO2014069121A1 - Conversation analysis device and conversation analysis method - Google Patents

Conversation analysis device and conversation analysis method

Info

Publication number
WO2014069121A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
conversation
expression data
expression
apology
Prior art date
Application number
PCT/JP2013/075243
Other languages
English (en)
Japanese (ja)
Inventor
真 寺尾
祥史 大西
真宏 谷
岡部 浩司
Original Assignee
日本電気株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電気株式会社
Priority to JP2014544379A (JP6365304B2)
Publication of WO2014069121A1

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 3/00: Automatic or semi-automatic exchanges
    • H04M 3/42: Systems providing special services or facilities to subscribers
    • H04M 3/50: Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M 3/51: Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/08: Speech classification or search
    • G10L 2015/088: Word spotting
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 2201/00: Electronic components, circuits, software, systems or apparatus used in telephone systems
    • H04M 2201/40: Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 2203/00: Aspects of automatic or semi-automatic exchanges
    • H04M 2203/20: Aspects of automatic or semi-automatic exchanges related to features of supplementary services
    • H04M 2203/2038: Call context notifications

Definitions

  • the present invention relates to a conversation analysis technique.
  • An example of a technology for analyzing conversation is a technology for analyzing call data.
  • data of a call performed in a department called a call center or a contact center is analyzed.
  • A contact center is a department that specializes in responding to customer calls such as inquiries, complaints, and orders regarding products and services.
  • For example, Patent Document 1 proposes a method in which voice recognition is performed on the call between the customer and the operator to determine whether the call contains keywords typically uttered when complaining, and the customer's CS (customer satisfaction) level is judged based on the determination result.
  • With such a method, however, the degree of satisfaction or dissatisfaction of a person who participates in the conversation (hereinafter referred to as a “conversation participant”), that is, of the customer, cannot always be determined appropriately.
  • This is because even expressions (keywords) that can express satisfaction may be uttered regardless of whether the speaker is actually satisfied.
  • For example, the thank-you expression "Thank you" can express satisfaction.
  • However, the same expression does not necessarily indicate satisfaction when used in the following dialogue. Operator: "If this is the case, please reboot the PC first." Customer: "Thank you. I just tried it."
  • In voice recognition, misrecognition such as insertion errors and omission errors may also occur.
  • As a result, an expression that was not actually uttered in the conversation (call) may be recognized, or an expression that was actually uttered in the conversation may fail to be recognized.
  • In that case, the keyword to be extracted is erroneously detected or missed, and consequently the accuracy of estimating customer satisfaction or dissatisfaction based on the keyword decreases.
  • the present invention has been made in view of such circumstances, and provides a technique for accurately estimating the degree of satisfaction or dissatisfaction of a conversation participant.
  • the degree of satisfaction or dissatisfaction of a conversation participant means the degree of satisfaction or dissatisfaction that at least one conversation participant felt in the conversation.
  • Here, the degree of satisfaction includes a simple indication of whether or not satisfaction exists, and the degree of dissatisfaction includes a simple indication of whether or not dissatisfaction exists.
  • the first aspect relates to a conversation analysis device.
  • The conversation analysis device according to the first aspect includes an expression detection unit that detects, from data corresponding to the voice of only the closing section of a conversation between a first conversation participant and a second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data, and an estimation unit that estimates the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
  • the second aspect relates to a conversation analysis method executed by at least one computer.
  • The conversation analysis method according to the second aspect includes detecting, from data corresponding to the voice of only the closing section of a conversation between a first conversation participant and a second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data, and estimating the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
  • Another aspect of the present invention may be a program that causes at least one computer to implement each configuration of the first aspect, or a computer-readable recording medium on which such a program is recorded.
  • This recording medium includes a non-transitory tangible medium.
  • That is, the conversation analysis apparatus includes an expression detection unit that detects, from data corresponding to the voice of only the closing section of the conversation between the first conversation participant and the second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data, and an estimation unit that estimates the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
  • Likewise, the conversation analysis method is executed by at least one computer and includes detecting, from data corresponding to the voice of only the closing section of the conversation between the first conversation participant and the second conversation participant, at least one of thank-you expression data uttered by the first conversation participant and apology expression data uttered by the second conversation participant as specific expression data, and estimating the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
  • In the following embodiments, a conversation means an exchange in which two or more speakers convey their intentions to one another by uttering language.
  • Conversation participants may talk face to face, for example at bank counters or store cash registers, or remotely, for example in telephone conversations and video conferences.
  • The content and form of the target conversation are not limited, but a business or public conversation is more suitable as the target conversation than a private conversation such as one between friends.
  • the above-mentioned thank-you expression data, apology expression data, and specific expression data are a word, a word string that is a sequence of a plurality of words, or a set of words scattered in a certain utterance in a conversation.
  • In the following description, thank-you expression data and thank-you expressions, apology expression data and apology expressions, and specific expression data and specific expressions may be used interchangeably.
  • For example, the thank-you expression data can include a single word such as "Thank you", a word string in which a longer expression such as "Thank you very much" is split into its constituent words, or a word set such as "Truly" and "Thank you" scattered within one utterance.
  • Similarly, the apology expression data may include a single word such as "Sorry", a word string in which an expression such as "I am very sorry" is split into its constituent words, and the like.
  • Conversation participants often express gratitude when they are satisfied with the conversation.
  • Conversely, a conversation participant often offers an apology when he or she senses that the conversation partner is dissatisfied with him or her.
  • However, an apology may also be uttered regardless of any dissatisfaction on the part of the conversation partner.
  • For example, a formulaic apology such as "I'm sorry, please wait for a while" may be uttered; in this case, the conversation participant is apologizing regardless of the conversation partner's emotion.
  • The present inventors have found that conversation participants' feelings about the conversation as a whole, especially satisfaction and dissatisfaction, tend to surface in the process of ending the conversation, and further that thank-you and apology expressions uttered during this ending process are particularly likely to reflect the participants' emotions.
  • Based on this finding, the present embodiment introduces the concept of a closing section, meaning the conversation-ending process, and detects, from the data corresponding to the voice of only this closing section, specific expression data representing at least one of thanks uttered by the first conversation participant and apologies uttered by the second conversation participant.
  • The end time of the closing section is set to the conversation end time.
  • the end of the conversation is represented, for example, by disconnecting the call in the case of a call, and by the dissolution of the conversation participants in the case of a conversation other than a call.
  • When a conversation is terminated by a specific sudden cause, such as a situation a conversation participant cannot avoid, the conversation may have no closing section.
  • In addition, noise caused by voice recognition errors on speech outside the closing section can be excluded from the material used to estimate the first conversation participant's satisfaction or dissatisfaction. Specifically, if a thank-you expression or an apology expression that was never actually uttered is erroneously recognized outside the closing section, that misrecognized thank-you or apology expression is excluded from the estimation material.
  • In this way, the satisfaction or dissatisfaction level of the first conversation participant is estimated only from specific expression data that is likely to represent it. According to the present embodiment, therefore, the estimation is based on high-purity specific expression data from which specific expressions not reflecting the first conversation participant's satisfaction or dissatisfaction, as well as noise data caused by misrecognition in voice recognition, have been excluded, so the participant's satisfaction or dissatisfaction can be estimated with high accuracy.
  • The conversation analysis apparatus and the conversation analysis method described above are not limited to application to a contact center system that handles call data, and can be applied to various systems that handle conversation data. For example, they can also be applied to in-house call management systems other than contact centers, and to personal call terminals such as PCs (Personal Computers), fixed telephones, mobile phones, tablet terminals, and smartphones.
  • Examples of conversation data other than calls include data representing a conversation between a person in charge and a customer at a bank counter or a store cash register.
  • A call handled in each embodiment refers to the speech exchanged from when the call terminals possessed by the respective callers are connected until the call is disconnected.
  • A continuous region in which a single caller is speaking in the call voice is referred to as an utterance or an utterance section.
  • For example, an utterance section is detected as a segment in which the amplitude of the caller's voice waveform stays at or above a predetermined value.
  • a normal call is formed from each speaker's utterance section, silent section, and the like.
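  • As an illustration of the amplitude-based utterance-section detection described above, the following Python sketch marks regions where the waveform amplitude stays above a threshold; the threshold and minimum-duration values are assumptions made for the example, not values specified in this description.

```python
import numpy as np

def detect_utterance_sections(waveform, sample_rate, amp_threshold=0.02, min_duration=0.3):
    """Return (start_sec, end_sec) pairs where |amplitude| >= amp_threshold.

    amp_threshold and min_duration are illustrative values, not taken from this description.
    """
    active = np.abs(waveform) >= amp_threshold   # boolean mask of "loud enough" samples
    sections, start = [], None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i                            # an utterance section opens here
        elif not flag and start is not None:
            if (i - start) / sample_rate >= min_duration:
                sections.append((start / sample_rate, i / sample_rate))
            start = None                         # the section closes
    if start is not None and (len(active) - start) / sample_rate >= min_duration:
        sections.append((start / sample_rate, len(active) / sample_rate))
    return sections
```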
  • FIG. 1 is a conceptual diagram showing a configuration example of a contact center system 1 in the first embodiment.
  • the contact center system 1 in the first embodiment includes an exchange (PBX) 5, a plurality of operator telephones 6, a plurality of operator terminals 7, a file server 9, a call analysis server 10, and the like.
  • the call analysis server 10 includes a configuration corresponding to the conversation analysis device in the above-described embodiment.
  • the customer corresponds to the first conversation participant described above
  • the operator corresponds to the second conversation participant described above.
  • the exchange 5 is communicably connected via a communication network 2 to a call terminal (customer telephone) 3 such as a PC, a fixed telephone, a mobile phone, a tablet terminal, or a smartphone that is used by a customer.
  • the communication network 2 is a public network such as the Internet or a PSTN (Public Switched Telephone Network), a wireless communication network, or the like.
  • the exchange 5 is connected to each operator telephone 6 used by each operator of the contact center. The exchange 5 receives the call from the customer and connects the call to the operator telephone 6 of the operator corresponding to the call.
  • Each operator uses an operator terminal 7.
  • Each operator terminal 7 is a general-purpose computer such as a PC connected to a communication network 8 (LAN (Local Area Network) or the like) in the contact center system 1.
  • each operator terminal 7 records customer voice data and operator voice data in a call between each operator and the customer.
  • the customer voice data and the operator voice data may be generated by being separated from the mixed state by predetermined voice processing. Note that this embodiment does not limit the recording method and the recording subject of such audio data.
  • Each voice data may be generated by a device (not shown) other than the operator terminal 7.
  • the file server 9 is realized by a general server computer.
  • the file server 9 stores the call data of each call between the customer and the operator together with the identification information of each call.
  • Each call data includes a pair of customer voice data and operator voice data, and disconnection time data indicating the time when the call was disconnected.
  • the file server 9 acquires customer voice data and operator voice data from another device (each operator terminal 7 or the like) that records each voice of the customer and the operator. Further, the file server 9 acquires disconnection time data from each operator telephone 6, the exchange 5 and the like.
  • the call analysis server 10 estimates the degree of customer satisfaction or dissatisfaction for each call data stored in the file server 9.
  • the call analysis server 10 includes a CPU (Central Processing Unit) 11, a memory 12, an input / output interface (I / F) 13, a communication device 14 and the like as a hardware configuration.
  • the memory 12 is a RAM (Random Access Memory), a ROM (Read Only Memory), a hard disk, a portable storage medium, or the like.
  • the input / output I / F 13 is connected to a device that accepts an input of a user operation such as a keyboard and a mouse, and a device that provides information to the user such as a display device and a printer.
  • the communication device 14 communicates with the file server 9 and the like via the communication network 8. Note that the hardware configuration of the call analysis server 10 is not limited.
  • FIG. 2 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the first embodiment.
  • the call analysis server 10 in the first embodiment includes a call data acquisition unit 20, a voice recognition unit 21, a closing detection unit 23, a specific expression table 25, an expression detection unit 26, an estimation unit 27, and the like.
  • Each of these processing units is realized, for example, by the CPU 11 executing a program stored in the memory 12. The program may also be installed from a portable recording medium such as a CD (Compact Disc) or a memory card, or from another computer on the network via the input / output I / F 13, and stored in the memory 12.
  • the call data acquisition unit 20 acquires the call data of the call to be analyzed from the file server 9 together with the identification information of the call. As described above, the call data includes disconnection time data.
  • the call data may be acquired by communication between the call analysis server 10 and the file server 9, or may be acquired via a portable recording medium.
  • the voice recognition unit 21 performs voice recognition processing on each voice data of the operator and the customer included in the call data. Thereby, the voice recognition unit 21 acquires each voice text data and each utterance time data corresponding to the operator voice and the customer voice from the call data.
  • the voice text data is character data in which a voice uttered by a customer or an operator is converted into text. Each voice text data is divided for each word (part of speech). Each utterance time data includes utterance time data for each word of each voice text data.
  • the voice recognition unit 21 may detect the utterance sections of the operator and the customer from the voice data of the operator and the customer, respectively, and acquire the start time and the end time of each utterance section. In this case, the speech recognition unit 21 determines an utterance time for each word string corresponding to each utterance section in each speech text data, and uses the utterance time for each word string corresponding to each utterance section as the utterance time data. You may do it.
  • In the voice recognition processing, a voice recognition parameter (hereinafter referred to as a reference voice recognition parameter) adapted for calls in a contact center is used.
  • As this speech recognition parameter, for example, an acoustic model and a language model learned from a plurality of speech samples are used.
  • a known method may be used for the voice recognition process, and the voice recognition process itself and various voice recognition parameters used in the voice recognition process are not limited.
  • the method for detecting the utterance section is not limited.
  • The voice recognition unit 21 may perform the voice recognition processing on only one of the customer's and the operator's voice data, according to the processing contents of the closing detection unit 23 and the expression detection unit 26. For example, when a closing section is detected by searching for a predetermined closing phrase as described later, the closing detection unit 23 requires the operator's voice text data, while the expression detection unit 26 requires the customer's voice text data.
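  • For the examples that follow, the per-word output of the voice recognition unit can be pictured as plain (word, utterance time) pairs; this shape is an assumption made for illustration, not a format defined by this description.

```python
# Hypothetical recognizer output: each entry is (word, utterance_time_in_seconds).
customer_words = [("ありがとう", 612.4), ("ござい", 612.9), ("ます", 613.1)]
operator_words = [("お電話", 605.0), ("ありがとう", 605.4), ("ござい", 605.8), ("ました", 606.0)]

def words_in_range(words, start, end):
    """Keep only the words whose utterance time falls inside [start, end]."""
    return [(w, t) for w, t in words if start <= t <= end]
```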
  • the closing detection unit 23 detects the closing period of the target call based on the disconnection time data included in the call data, the voice text data of the operator or customer acquired by the voice recognition unit 21 and the utterance time data thereof.
  • the closing detection unit 23 generates closing section data including the start time and the end time of the detected closing section.
  • the end time of the closing section is set to the cutting time indicated by the cutting time data.
  • the start time of the closing section is set as follows, for example.
  • For example, the closing detection unit 23 determines, as the start time of the closing section, the start time of the utterance section located a predetermined number of utterance sections before the call disconnection time.
  • Alternatively, the closing detection unit 23 may determine a time point that is a predetermined time before the call disconnection time as the start time of the closing section. With these methods of determining the start time, the closing section can be determined based only on the voice text data of whichever of the operator and the customer is used by the expression detection unit 26.
  • the predetermined number of utterances and the predetermined time for determining the width of the closing section are determined in advance according to a closing sentence described in an operator manual or the like, a result of listening to audio data at a contact center, or the like.
  • As another method, the closing detection unit 23 may determine the utterance time of the first predetermined closing phrase appearing in the operator's voice text data as the start time of the closing section.
  • the closing phrase is a phrase issued by the operator in the process of ending the call, such as a final greeting phrase.
  • In a contact center, the phrases an operator should utter in the process of ending a call are often prescribed in an operator manual or the like.
  • Therefore, the closing detection unit 23 may hold data of a plurality of such predetermined closing phrases in advance, in an adjustable form.
  • Such predetermined closing phrase data may be input by a user based on an input screen or the like, or may be acquired from a portable recording medium, another computer, or the like via the input / output I / F 13.
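  • A minimal sketch of the closing-section determination strategies just described, using the (word, utterance time) pairs from the earlier example; the closing phrase, utterance count, and fallback window below are illustrative assumptions rather than values fixed by this description.

```python
def detect_closing_section(disconnect_time, operator_words=None,
                           closing_phrase="お電話ありがとうございました",
                           utterance_starts=None,
                           last_n_utterances=5, fallback_window=60.0):
    """Return (start_time, end_time) of the closing section in seconds.

    The end time is always the call disconnection time.  The start time is taken,
    in order of preference, from (1) the first predetermined closing phrase in the
    operator's recognized words, (2) the start of the last N utterance sections, or
    (3) a fixed window before disconnection.  All default values are illustrative.
    """
    end_time = disconnect_time
    # Strategy 1: earliest word position at which the closing phrase begins
    if operator_words:
        for i, (_, t) in enumerate(operator_words):
            tail = "".join(w for w, _ in operator_words[i:])
            if tail.startswith(closing_phrase):
                return t, end_time
    # Strategy 2: start of the utterance section N sections before disconnection
    if utterance_starts:
        starts = sorted(s for s in utterance_starts if s <= end_time)
        if starts:
            return starts[-last_n_utterances:][0], end_time
    # Strategy 3: fixed time window counted back from disconnection
    return max(0.0, end_time - fallback_window), end_time
```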
  • the specific expression table 25 holds thanks expression data and apology expression data as specific expression data. Specifically, the specific expression table 25 holds the specific expression data to be detected by the expression detection unit 26 so that it can be distinguished into thank-you expression data and apology expression data. The specific expression table 25 may hold only one of thank-you expression data and apology expression data in accordance with the processing of the expression detection unit 26.
  • the expression detection unit 26 executes any one of the following three types of processing according to the specific expression data to be detected.
  • The first processing type targets only thank-you expression data, the second processing type targets only apology expression data, and the third processing type targets both thank-you expression data and apology expression data.
  • In the first processing type, the expression detection unit 26 extracts, from the customer's voice text data acquired by the voice recognition unit 21, the voice text data whose utterance times fall within the time range indicated by the closing section data generated by the closing detection unit 23.
  • the expression detection unit 26 detects thank-you expression data held in the specific expression table 25 from the voice text data of the customer corresponding to the extracted closing section. Along with this detection, the expression detection unit 26 counts the number of thanks expression data detected.
  • In the second processing type, the expression detection unit 26 extracts, from the operator's voice text data acquired by the voice recognition unit 21, the voice text data whose utterance times fall within the time range indicated by the closing section data generated by the closing detection unit 23.
  • the expression detection unit 26 detects apology expression data held in the specific expression table 25 from the voice text data of the operator corresponding to the extracted closing section. Along with this detection, the expression detection unit 26 counts the number of detected apology expression data.
  • In the third processing type, the expression detection unit 26 extracts, from both the customer's and the operator's voice text data acquired by the voice recognition unit 21, the voice text data whose utterance times fall within the time range indicated by the closing section data generated by the closing detection unit 23.
  • The expression detection unit 26 then detects apology expression data held in the specific expression table 25 from the operator's voice text data corresponding to the extracted closing section, and detects thank-you expression data held in the specific expression table 25 from the customer's voice text data corresponding to the extracted closing section.
  • the expression detection unit 26 separately counts the number of detected thank-you expression data and the number of detected apology expression data.
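  • The detection and counting performed by the expression detection unit 26 can be sketched as below; the table entries are illustrative stand-ins for the contents of the specific expression table 25.

```python
# Illustrative stand-ins for the contents of the specific expression table 25.
THANKS_EXPRESSIONS = ["ありがとう", "ありがとうございます", "助かりました"]
APOLOGY_EXPRESSIONS = ["申し訳ございません", "すみません", "失礼しました"]

def count_expressions(words, closing_start, closing_end, expressions):
    """Count how often any of `expressions` occurs in the closing section.

    words: (word, utterance_time) pairs for one speaker.  A hit is counted whenever
    the text concatenated from a word inside the closing section starts with one of
    the expressions, which approximates matching a word string against the table.
    """
    in_section = [(w, t) for w, t in words if closing_start <= t <= closing_end]
    count = 0
    for i in range(len(in_section)):
        tail = "".join(w for w, _ in in_section[i:])
        if any(tail.startswith(expr) for expr in expressions):
            count += 1
    return count

# e.g. number of thank-you expressions the customer uttered in the closing section:
# n_thanks = count_expressions(customer_words, closing_start, closing_end, THANKS_EXPRESSIONS)
```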
  • the estimation unit 27 estimates at least one of customer satisfaction and dissatisfaction in the target call according to the number of thank-you expression data detected by the expression detection unit 26. For example, when the number of thank-you expression data detected is greater than or equal to a predetermined threshold, the estimation unit 27 estimates that there is satisfaction. Further, when the number of thank-you expression data detected is equal to or greater than a predetermined threshold, it may be estimated that there is no dissatisfaction. Furthermore, the estimation unit 27 may estimate that there is no satisfaction when the number of thank-you expression data detected is smaller than a predetermined threshold.
  • the predetermined threshold for estimating the presence or absence of satisfaction or dissatisfaction is determined in advance based on the result of listening to audio data at the contact center.
  • The table below shows the results of examining the relationship between the number of times the customer expressed gratitude in the closing section of a contact center call and the customer's satisfaction and dissatisfaction. "Neutral" in the table indicates that the customer feels neither satisfaction nor dissatisfaction. From the table, it can be seen that the greater the number of thank-you utterances in the closing section, the higher the probability that the customer feels satisfied and the lower the probability that the customer feels dissatisfied.
  • The threshold value for estimating the presence or absence of satisfaction or dissatisfaction is determined in advance based on such survey results. For example, based on the table, it can be expected that the presence of satisfaction can be estimated with an accuracy of about 80% when the number of thanks is three or more, and that the absence of satisfaction can be estimated with an accuracy of about 88% when the number of thanks is less than one (that is, zero).
  • The estimation unit 27 also estimates at least one of the customer's dissatisfaction and satisfaction in the target call according to the number of detected apology expression data counted by the expression detection unit 26. For example, the estimation unit 27 estimates that there is dissatisfaction when the number of detected apology expression data is greater than or equal to a predetermined threshold. Moreover, the estimation unit 27 may determine a satisfaction level value and a dissatisfaction level value according to the number of detected thank-you expression data, and may similarly determine a dissatisfaction level value or a satisfaction level value corresponding to the number of detected apology expression data.
  • When both thank-you expression data and apology expression data are detected, the estimation unit 27 may estimate at least one of the customer's satisfaction and dissatisfaction levels in the target call according to both detection counts. For example, the estimation unit 27 estimates that there is satisfaction when the number of detected thank-you expression data is larger than the number of detected apology expression data, and that there is dissatisfaction when the opposite holds. The estimation unit 27 may also determine satisfaction and dissatisfaction level values according to the respective detection counts, or determine a satisfaction or dissatisfaction level value based on the difference between the two counts.
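  • A rule-based estimator in the spirit of the estimation unit 27 might look as follows; the threshold value is an assumption made for the example, whereas the description above says such thresholds are fixed in advance from listening surveys.

```python
def estimate_satisfaction(n_thanks, n_apologies=None, thanks_threshold=3):
    """Estimate the customer's state from detection counts in the closing section.

    thanks_threshold is an illustrative assumption.  With only a thank-you count the
    function mimics the first processing type; with both counts it mimics the third
    processing type, which compares the two counts.
    """
    if n_apologies is None:
        if n_thanks >= thanks_threshold:
            return "satisfied"
        if n_thanks == 0:
            return "not satisfied"
        return "undetermined"
    if n_thanks > n_apologies:
        return "satisfied"
    if n_apologies > n_thanks:
        return "dissatisfied"
    return "undetermined"
```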
  • the estimation unit 27 generates output data including information indicating the estimation result, and outputs the determination result to the display unit or another output device via the input / output I / F 13.
  • the present embodiment does not limit the specific form of output of the determination result.
  • FIG. 3 is a flowchart showing an operation example of the call analysis server 10 in the first embodiment.
  • the call analysis server 10 acquires call data (S30).
  • the call analysis server 10 acquires call data to be analyzed from a plurality of call data stored in the file server 9.
  • the call analysis server 10 performs voice recognition processing on the customer voice data included in the call data acquired in (S30) (S31). Thereby, the call analysis server 10 acquires the customer's voice text data and utterance time data.
  • the customer's voice text data is divided for each word (part of speech).
  • the utterance time data includes utterance time data for each word or for each word string corresponding to each utterance section.
  • the call analysis server 10 detects the closing section of the target call based on the disconnection time data included in the call data acquired in (S30) and the utterance time data acquired in (S31) (S32). For example, the call analysis server 10 determines a time point that is a predetermined time back from the call disconnection time indicated by the disconnection time data as the start time of the closing section. As another example, the call analysis server 10 determines the start time of the utterance section for a predetermined number of customers as the start time of the closing section from the call disconnection time. The call analysis server 10 generates closing section data indicating the start time and the end time of the detected closing section.
  • the call analysis server 10 extracts voice text data corresponding to the utterance time within the time range indicated by the closing section data generated in (S32) from the customer voice text data acquired in (S31). From the extracted speech text data, thank-you expression data as specific expression data is detected (S33). With this detection, the call analysis server 10 counts the number of thank-you expression data detected (S34).
  • the call analysis server 10 estimates the customer satisfaction of the target call based on the number of thank-you expression data detected in (S34) (S35). For example, when the number of thank-you expression data detected is greater than a predetermined threshold, the call analysis server 10 estimates that there is satisfaction and no dissatisfaction. When the number of thank-you expression data detected is smaller than the predetermined threshold, the call analysis server 10 estimates that there is no satisfaction. The call analysis server 10 generates output data indicating the presence or absence of the estimated satisfaction or dissatisfaction level, or a level value.
  • When apology expression data is targeted instead, the call analysis server 10 performs voice recognition processing on the operator's voice data included in the call data, thereby acquiring the operator's voice text data and utterance time data.
  • In (S32), the call analysis server 10 detects the closing section of the target call based on the disconnection time data included in the call data acquired in (S30) and the operator's voice text data acquired in (S31). In this case, the call analysis server 10 determines the utterance time of the first predetermined closing phrase in the operator's voice text data as the start time of the closing section.
  • In (S33), the call analysis server 10 extracts, from the operator's voice text data acquired in (S31), the voice text data whose utterance times fall within the time range indicated by the closing section data generated in (S32), and detects apology expression data as specific expression data from the extracted voice text data. In (S34), the call analysis server 10 counts the number of detected apology expression data.
  • the call analysis server 10 estimates the degree of dissatisfaction of the customer of the target call based on the detected number of apology expression data counted in (S34) (S35). The call analysis server 10 estimates that there is dissatisfaction if the detected number of apology expression data is greater than a predetermined threshold value, and otherwise estimates that there is no dissatisfaction.
  • When both thank-you expression data and apology expression data are targeted, the call analysis server 10 performs voice recognition processing on both the customer's and the operator's voice data, thereby acquiring voice text data and utterance time data for the customer and the operator, respectively.
  • The call analysis server 10 then executes (S33) and (S34) for each of the two cases above, so that both the number of detected thank-you expression data and the number of detected apology expression data are counted.
  • In (S35), the call analysis server 10 estimates at least one of the satisfaction level and the dissatisfaction level of the customer of the target call based on the number of detected thank-you expression data and the number of detected apology expression data counted in (S34).
  • As described above, in the first embodiment, at least one of the customer's satisfaction and dissatisfaction in the target call is estimated based on at least one of the number of detected thank-you expression data uttered by the customer and the number of detected apology expression data uttered by the operator, both detected from the data corresponding to the voice of the closing section of the target call. According to this embodiment, since thank-you and apology expressions are detected only from the closing section, the estimation relies on specific expressions that are highly likely to reflect customer satisfaction or dissatisfaction and is not adversely affected by specific expressions misrecognized outside the closing section, so the customer's satisfaction or dissatisfaction can be estimated with high accuracy.
  • Even when voice recognition processing is performed on the voice data of only one of the customer and the operator, the customer's satisfaction or dissatisfaction level can be estimated with high accuracy as described above. In that case, the load of the voice recognition processing can be reduced compared with the form in which voice recognition processing is performed on the voice data of both the customer and the operator.
  • Alternatively, at least one of the customer's satisfaction and dissatisfaction levels in the target call can be estimated based on both the number of detected thank-you expression data uttered by the customer and the number of detected apology expression data uttered by the operator. In this way, both the customer's thank-you expressions and the operator's apology expressions, which correlate strongly with the customer's satisfaction and dissatisfaction, are taken into account, so the accuracy of estimating the customer's satisfaction or dissatisfaction can be further improved.
  • FIG. 4 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the second embodiment.
  • the call analysis server 10 in the second embodiment further includes a voice recognition unit 41 in addition to the configuration of the first embodiment.
  • the voice recognition unit 41 is realized by executing a program stored in the memory 12 by the CPU 11, for example, in the same manner as the other processing units.
  • the voice recognition unit 21 performs voice recognition processing on the voice data of the operator included in the call data, using the reference voice recognition parameter LM-1. Since the voice text data acquired by the voice recognition process is used only by the closing detection unit 23, the voice recognition process may be performed only on the voice data of the operator. Note that the voice recognition unit 21 may perform voice recognition processing on the voice data of both the operator and the customer.
  • the voice recognition unit 21 holds in advance a reference voice recognition parameter LM-1 that has been learned in advance for general calls in the contact center.
  • The voice recognition unit 41 performs voice recognition processing on the voice data in the closing section of the target call using a recognition parameter (hereinafter referred to as a weighted speech recognition parameter) LM-2, which is obtained by weighting the reference voice recognition parameter LM-1 used by the voice recognition unit 21 so that the specific expression data detected by the expression detection unit 26 is recognized more easily than other word data.
  • the voice recognition unit 21 and the voice recognition unit 41 are distinguished from each other, but both may be realized as one processing unit, and the voice recognition parameters to be used may be switched.
  • the weighted speech recognition parameter LM-2 is calculated by a predetermined method based on the reference speech recognition parameter LM-1, for example, and is held in advance by the speech recognition unit 41.
  • The following equation shows an example of calculating the weighted speech recognition parameter LM-2 when an N-gram language model is used as the speech recognition parameter:

$$P_{\mathrm{new}}(w_i \mid w_{i-n+1}^{\,i-1}) = P_{\mathrm{old}}(w_i \mid w_{i-n+1}^{\,i-1}) \cdot \frac{P_{\mathrm{new}}(w_i)}{P_{\mathrm{old}}(w_i)}$$

  • Here, $P_{\mathrm{new}}(w_i \mid w_{i-n+1}^{\,i-1})$ on the left side is the N-gram language model corresponding to the weighted speech recognition parameter LM-2, that is, the appearance probability of the $i$-th word $w_i$ under the condition of the word string $w_{i-n+1}^{\,i-1}$ from the $(i-n+1)$-th to the $(i-1)$-th word. $P_{\mathrm{old}}(w_i \mid w_{i-n+1}^{\,i-1})$ on the right side is the N-gram language model corresponding to the reference speech recognition parameter LM-1, learned in advance for general calls in the contact center. $P_{\mathrm{new}}(w_i)$ is a unigram language model in which the appearance probabilities of thank-you expressions and apology expressions are increased, and $P_{\mathrm{old}}(w_i)$ is the unigram language model of the reference speech recognition parameter LM-1. In other words, the N-gram language model weighted by the factor $P_{\mathrm{new}}(w_i)/P_{\mathrm{old}}(w_i)$, so as to increase the appearance probability of thank-you and apology expressions, is calculated as the weighted speech recognition parameter LM-2.
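  • The per-word weighting in the formula above can be written down directly; the numbers in the usage line are invented for illustration and are not taken from this description (a real implementation would also renormalise the distribution over the vocabulary).

```python
def weighted_ngram_prob(p_old_cond, p_new_unigram, p_old_unigram):
    """P_new(w_i | history) = P_old(w_i | history) * P_new(w_i) / P_old(w_i).

    p_old_cond    : conditional N-gram probability from the reference model LM-1
    p_new_unigram : unigram probability of w_i after boosting thank-you / apology words
    p_old_unigram : original unigram probability of w_i in LM-1
    """
    return p_old_cond * (p_new_unigram / p_old_unigram)

# Invented example: boosting a thank-you word's unigram probability five-fold
# scales its conditional probability by the same factor (before renormalisation).
print(weighted_ngram_prob(0.004, 0.010, 0.002))  # ≈ 0.02
```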
  • the voice recognition unit 41 performs the voice recognition process only on the voice data within the time range indicated by the closing section data generated by the closing detection unit 23. Further, the voice recognition unit 41 may set both voice data of the customer and the operator as a target of voice recognition processing according to the processing content of the expression detection unit 26, or only the voice data of one of the customer and the operator. It may be a target of voice recognition processing.
  • the expression detection unit 26 detects at least one of thanks expression data and apology expression data held in the specific expression table 25 from the voice text data acquired by the voice recognition unit 41.
  • FIG. 5 is a flowchart illustrating an operation example of the call analysis server 10 according to the second embodiment.
  • the same steps as those in FIG. 3 are denoted by the same reference numerals as those in FIG.
  • the call analysis server 10 applies weighted speech recognition parameters to the voice data in the time range indicated by the closing section data generated in (S32) among the voice data included in the call data acquired in (S30). Speech recognition using LM-2 is performed (S51). The call analysis server 10 detects at least one of thank-you expression data and apology expression data as specific expression data from the speech text data acquired in (S51) (S33).
  • the speech recognition process is performed on the speech data in the closing section using the weighted speech recognition parameters weighted so as to easily recognize the thanks and apologies. Then, at least one of thank-you expression data and apology expression data is detected from the speech text data acquired by this speech recognition process, and the satisfaction or dissatisfaction level of the customer of the target call is estimated based on the detection result.
  • When the detection rate of thank-you and apology expressions is raised in this way, estimating that the customer is not satisfied in the case where no thank-you expression is detected exhibits extremely high accuracy (purity). Therefore, according to the second embodiment, very high estimation accuracy can be expected by estimating that there is no satisfaction when the number of detected thank-you expressions is zero.
  • In addition, because the weighted language model makes thank-you expressions easier to recognize, a detection count of zero means there is a high possibility that the customer did not say thank you at all, so it is also possible to estimate that there is dissatisfaction with the call.
  • FIG. 6 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the first modification.
  • the closing detection unit 23 detects a closing section using at least one of voice data and disconnection time data included in the call data acquired by the call data acquisition unit 20.
  • For example, the closing detection unit 23 may set the call disconnection time indicated by the disconnection time data as the end time of the closing section and determine a time point a predetermined time width before the call disconnection time as the start time of the closing section. Alternatively, the closing detection unit 23 may hold the voice signal waveform obtained from the voice data of each closing phrase and collate those waveforms with the waveform of the voice data included in the call data, thereby acquiring the utterance time of a closing phrase.
  • the voice recognition unit 21 may perform voice recognition processing on the voice data in the closing section of the target call.
  • the step (S31) shown in FIG. 3 may be executed after the step (S32) and before the step (S33).
  • FIG. 7 is a diagram conceptually illustrating a processing configuration example of the call analysis server 10 in the second modification.
  • the call analysis server 10 may not have the voice recognition unit 21.
  • the closing detection unit 23 detects a closing section using at least one of voice data and disconnection time data included in the call data acquired by the call data acquisition unit 20. Since the processing content of the closing detection unit 23 in the second modification may be the same as that in the first modification, description thereof is omitted here.
  • In this case, the step (S31) shown in FIG. 5 is omitted. According to the first and second modifications, since voice recognition is applied only to the section detected by the closing detection unit, there is an advantage that the calculation time required for estimating the customer's satisfaction or dissatisfaction level can be reduced.
  • customer satisfaction or dissatisfaction is estimated based on the number of thank-you expression data detected and the number of apology expression data detected.
  • customer satisfaction or dissatisfaction may be estimated from other than the number of detections.
  • For example, a satisfaction point may be assigned in advance to each thank-you expression data, and a dissatisfaction point to each apology expression data.
  • The customer's satisfaction level value and dissatisfaction level value may then be estimated from the total satisfaction points of the detected thank-you expression data and the total dissatisfaction points of the detected apology expression data.
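  • A sketch of this point-based variant, in which each expression carries a pre-assigned point value and the level values are the point totals; the point tables are illustrative assumptions, since the description only states that points are assigned in advance.

```python
# Illustrative point tables (the description only states that points are pre-assigned).
THANKS_POINTS = {"ありがとう": 1, "ありがとうございます": 2, "本当にありがとうございました": 3}
APOLOGY_POINTS = {"すみません": 1, "申し訳ございません": 2}

def score_call(detected_thanks, detected_apologies):
    """Return (satisfaction_value, dissatisfaction_value) as summed points.

    detected_thanks / detected_apologies: lists of expression strings detected
    in the closing section; unknown expressions default to 1 point.
    """
    satisfaction = sum(THANKS_POINTS.get(e, 1) for e in detected_thanks)
    dissatisfaction = sum(APOLOGY_POINTS.get(e, 1) for e in detected_apologies)
    return satisfaction, dissatisfaction
```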
  • the contact center system 1 is exemplified, and an example in which the reference voice recognition parameter is adapted (learned) for general calls in the contact center is shown.
  • the reference speech recognition parameters need only be adapted to the type of call being handled. For example, when a general call by a call terminal is handled, a reference speech recognition parameter adapted for such a general call may be used.
  • the call data includes disconnection time data
  • The disconnection time data is generated by each operator telephone 6, the exchange 5, or the like, but it may also be generated by detecting the disconnection tone from the customer's voice data.
  • the disconnection time data may be generated by the file server 9 or the call analysis server 10.
  • the above-described call analysis server 10 may be realized as a plurality of computers.
  • the call analysis server 10 includes only the expression detection unit 26 and the estimation unit 27, and is configured such that another computer has another processing unit.
  • The closing detection unit 23 may also acquire the closing section data through a user operation on an input device based on an input screen or the like, or may acquire it from a portable recording medium, another computer, or the like via the input / output I / F 13.
  • The above embodiments handle call data, but the conversation analysis device and conversation analysis method described above may also be applied to a device or system that handles conversation data other than calls.
  • a recording device for recording a conversation to be analyzed is installed at a place (conference room, bank window, store cash register, etc.) where the conversation is performed.
  • When the conversation data is recorded with the voices of a plurality of conversation participants mixed, it is separated from the mixed state into voice data for each conversation participant by predetermined voice processing.
  • In the above embodiments, the call disconnection time data is used as the data indicating the end time of the conversation.
  • For conversations other than calls, the event indicating the end of the conversation may be detected automatically or manually, and the detection time point may be treated as conversation end time data.
  • For the automatic detection, the end of the utterances of all conversation participants may be detected, or movement of persons indicating that the conversation participants have dispersed may be detected by a sensor or the like.
  • For the manual detection, an input operation by a conversation participant notifying the end of the conversation may be detected.
  • The closing detection unit 23 may then detect the closing section of the target conversation based on the conversation end time data included in the conversation data, the voice text data of the conversation participants acquired by the voice recognition unit 21, and the corresponding utterance time data.
  • The predetermined number of utterances and the predetermined time that determine the width of the closing section are decided according to the conversation type, such as conversations conducted at bank counters, at store cash registers, or at facility information centers.
  • a predetermined closing phrase is determined according to the conversation type.
  • The conversation analysis device according to appendix 1, further including a speech recognition unit that performs speech recognition processing on the speech data of the closing section using a speech recognition parameter obtained by weighting a reference speech recognition parameter, adapted for speech recognition of a predetermined form of conversation including the target conversation, so that the specific expression data is more easily recognized than other word data, wherein the expression detection unit detects the specific expression data from the speech text data of the closing section of the conversation obtained by the speech recognition processing of the speech recognition unit.
  • The conversation analysis device according to appendix 1 or 2, wherein the expression detection unit counts the number of detected thank-you expression data or the number of detected apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data in a manner distinguishable into thank-you expression data and apology expression data, and the estimation unit estimates at least one of satisfaction and dissatisfaction of the first conversation participant in the conversation based on the number of detected thank-you expression data or the number of detected apology expression data.
  • The conversation analysis device according to appendix 1 or 2, wherein the expression detection unit counts the number of detected thank-you expression data and the number of detected apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data in a manner distinguishable into thank-you expression data and apology expression data, and the estimation unit estimates at least one of satisfaction and dissatisfaction of the first conversation participant in the conversation based on the number of detected thank-you expression data and the number of detected apology expression data.
  • The conversation analysis method according to appendix 5, further including performing speech recognition processing on the speech data of the closing section using a speech recognition parameter obtained by weighting a reference speech recognition parameter, adapted for speech recognition of a predetermined form of conversation including the target conversation, so that the specific expression data is more easily recognized than other word data, wherein the specific expression data is detected from the speech text data of the closing section of the conversation obtained by the speech recognition processing.
  • The conversation analysis method according to appendix 5 or 6, further including counting the number of detected data of at least one of the thank-you expression data and the apology expression data by detecting the specific expression data based on a specific expression table that holds the specific expression data in a manner distinguishable into thank-you expression data and apology expression data, wherein the estimating estimates at least one of satisfaction and dissatisfaction of the first conversation participant in the conversation based on the number of detected thank-you expression data or the number of detected apology expression data.
  • Appendix 9 A program for causing at least one computer to execute the conversation analysis method according to any one of appendices 5 to 8.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a conversation analysis device comprising: an expression detection unit that detects, as specific expression data, thank-you expression data uttered by a first conversation participant and/or apology expression data uttered by a second conversation participant from data corresponding to the voice of only the closing section of a conversation between the first conversation participant and the second conversation participant; and an estimation unit that estimates the degree of satisfaction or dissatisfaction of the first conversation participant in the conversation according to the detection result of the specific expression data.
PCT/JP2013/075243 2012-10-31 2013-09-19 Conversation analysis device and conversation analysis method WO2014069121A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2014544379A JP6365304B2 (ja) 2012-10-31 2013-09-19 Conversation analysis device and conversation analysis method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012-240750 2012-10-31
JP2012240750 2012-10-31

Publications (1)

Publication Number Publication Date
WO2014069121A1 true WO2014069121A1 (fr) 2014-05-08

Family

ID=50627037

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/075243 WO2014069121A1 (fr) 2012-10-31 2013-09-19 Conversation analysis device and conversation analysis method

Country Status (2)

Country Link
JP (1) JP6365304B2 (fr)
WO (1) WO2014069121A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750674A (zh) * 2015-02-17 2015-07-01 北京京东尚科信息技术有限公司 一种人机会话满意度预测方法及系统
JP2019101399A (ja) * 2017-11-30 2019-06-24 日本電信電話株式会社 好感度推定装置、好感度推定方法、プログラム
JP2020126185A (ja) * 2019-02-06 2020-08-20 日本電信電話株式会社 音声認識装置、検索装置、音声認識方法、検索方法およびプログラム
WO2023119992A1 (fr) * 2021-12-24 2023-06-29 ソニーグループ株式会社 Dispositif de traitement d'informations, procédé de traitement d'informations et programme

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010041507A1 (fr) * 2008-10-10 2010-04-15 インターナショナル・ビジネス・マシーンズ・コーポレーション Système et procédé qui extraient une situation spécifique d’une conversation
JP2012047875A (ja) * 2010-08-25 2012-03-08 Nippon Telegr & Teleph Corp <Ntt> 用件区間抽出方法、装置、及びそのプログラム
JP2013156524A (ja) * 2012-01-31 2013-08-15 Fujitsu Ltd 特定通話検出装置、特定通話検出方法及び特定通話検出用コンピュータプログラム

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4972107B2 (ja) * 2009-01-28 2012-07-11 日本電信電話株式会社 通話状態判定装置、通話状態判定方法、プログラム、記録媒体
US20100332287A1 (en) * 2009-06-24 2010-12-30 International Business Machines Corporation System and method for real-time prediction of customer satisfaction
JP5533219B2 (ja) * 2010-05-11 2014-06-25 セイコーエプソン株式会社 接客データ記録装置

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010041507A1 (fr) * 2008-10-10 2010-04-15 インターナショナル・ビジネス・マシーンズ・コーポレーション Système et procédé qui extraient une situation spécifique d’une conversation
JP2012047875A (ja) * 2010-08-25 2012-03-08 Nippon Telegr & Teleph Corp <Ntt> 用件区間抽出方法、装置、及びそのプログラム
JP2013156524A (ja) * 2012-01-31 2013-08-15 Fujitsu Ltd 特定通話検出装置、特定通話検出方法及び特定通話検出用コンピュータプログラム

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750674A (zh) * 2015-02-17 2015-07-01 北京京东尚科信息技术有限公司 一种人机会话满意度预测方法及系统
JP2019101399A (ja) * 2017-11-30 2019-06-24 日本電信電話株式会社 好感度推定装置、好感度推定方法、プログラム
JP2020126185A (ja) * 2019-02-06 2020-08-20 日本電信電話株式会社 音声認識装置、検索装置、音声認識方法、検索方法およびプログラム
JP7177348B2 (ja) 2019-02-06 2022-11-24 日本電信電話株式会社 音声認識装置、音声認識方法およびプログラム
WO2023119992A1 (fr) * 2021-12-24 2023-06-29 ソニーグループ株式会社 Dispositif de traitement d'informations, procédé de traitement d'informations et programme

Also Published As

Publication number Publication date
JPWO2014069121A1 (ja) 2016-09-08
JP6365304B2 (ja) 2018-08-01

Similar Documents

Publication Publication Date Title
JP6341092B2 (ja) 表現分類装置、表現分類方法、不満検出装置及び不満検出方法
EP2717258B1 (fr) Systèmes et procédés de reconnaissance de phrase
WO2014069076A1 (fr) Dispositif d'analyse de conversation et procédé d'analyse de conversation
JP6358093B2 (ja) 分析対象決定装置及び分析対象決定方法
US9293133B2 (en) Improving voice communication over a network
US10592611B2 (en) System for automatic extraction of structure from spoken conversation using lexical and acoustic features
US9269357B2 (en) System and method for extracting a specific situation from a conversation
US8417524B2 (en) Analysis of the temporal evolution of emotions in an audio interaction in a service delivery environment
US9711167B2 (en) System and method for real-time speaker segmentation of audio interactions
JP6213476B2 (ja) 不満会話判定装置及び不満会話判定方法
CN102254556A (zh) 基于听者和说者的讲话风格比较估计听者理解说者的能力
US10199035B2 (en) Multi-channel speech recognition
JP6365304B2 (ja) 会話分析装置及び会話分析方法
CN113744742A (zh) 对话场景下的角色识别方法、装置和系统
JP6327252B2 (ja) 分析対象決定装置及び分析対象決定方法
JP7287006B2 (ja) 話者決定装置、話者決定方法、および話者決定装置の制御プログラム
WO2014069443A1 (fr) Dispositif de détermination d'appel de réclamation et procédé de détermination d'appel de réclamation
WO2014069444A1 (fr) Dispositif de détermination de conversation insatisfaisante et procédé de détermination de conversation insatisfaisante
CN116975242A (zh) 语音播报打断处理方法、装置、设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13851196

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2014544379

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13851196

Country of ref document: EP

Kind code of ref document: A1