WO2009130785A1 - Problem solving time estimation program, processing device, and processing method - Google Patents

Problem solving time estimation program, processing device, and processing method Download PDF

Info

Publication number
WO2009130785A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
voice
time
speaker
question
Prior art date
Application number
PCT/JP2008/058056
Other languages
English (en)
Japanese (ja)
Inventor
Isao Namba
Sachiko Onodera
Original Assignee
Fujitsu Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Limited filed Critical Fujitsu Limited
Priority to JP2010509012A priority Critical patent/JP5099218B2/ja
Priority to PCT/JP2008/058056 priority patent/WO2009130785A1/fr
Publication of WO2009130785A1 publication Critical patent/WO2009130785A1/fr

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Definitions

  • The present invention relates to problem solving time estimation processing in which a computer executes a process for estimating the time required to solve a problem raised by a customer, using voice dialogue data in which the dialogue between the customer and the operator who handles the call is recorded.
  • There is a demand for call center managers to grasp the time required to resolve such a problem (the problem resolution time) and to plan improvements in the efficiency of customer service and in customer satisfaction.
  • Conventionally, response history information records the operator's response processing, such as the response itself, the reception time, investigations, and transfers to a secondary line (a separate department).
  • In addition, a response history management system was established to record callback history information for customer inquiries and to record callback information in association with the response history information.
  • Furthermore, voice dialogue data recording the entire content of each conversation is accumulated so that the conversation between the customer and the operator can be listened to later.
  • The accumulated large volume of spoken dialogue data can be used not only to confirm dialogue content but also as analysis material for various purposes; however, the relevant parts must be identified according to the purpose of use.
  • When the customer's inquiry is treated as a problem and the operator can answer it within the response time of a single incoming call, the recorded start and end times of that response can be used, and the response time can be regarded as the time required for resolution.
  • An object of the present invention is to provide a processing method capable of estimating a problem solving time only from voice dialogue data with a recording time.
  • In a dialogue, a problem appears as the part where the customer utters a question, and its solution appears as the part where the operator who receives that utterance utters an answer. The time from recognition of the problem to completion of the answer, that is, from the start of the dialogue in which the customer asks the question to the end of the dialogue in which the operator answers, can therefore be regarded as the problem solving time.
  • The speaker who leads a conversation tends to utter continuously, and with a certain volume, compared with the other speaker.
  • When the customer makes an inquiry, the customer is assumed to lead and to speak early in the dialogue.
  • When the operator calls back and answers, the operator is assumed to lead, in a dialogue that follows the one containing the customer's question.
  • Accordingly, voice dialogue data that can be regarded as dialogues between the same customer and operator are extracted from a set of customer-operator voice dialogue data, and the type of each dialogue is determined as "question" or "non-question" by identifying its leading speaker. Pairs in which a "non-question" dialogue follows a "question" dialogue are then extracted and judged to be question-and-answer pairs, and the time from the start of the question dialogue to the end of the answer dialogue is estimated as the problem solving time.
  • The disclosed program causes a computer to execute the following: 1) input a set of voice dialogue data with recording times, each composed of a first channel in which the operator's voice is recorded and a second channel in which the customer's voice is recorded; 2) for each voice dialogue datum, identify from the two channels the sections in which one speaker speaks continuously for at least a predetermined length, take the earliest such section as the preceding lead-utterance section, and set the type of the datum to question when that section is in the second channel and to non-question when it is in the first channel; and 3) for each voice dialogue datum, learn the speaker characteristics of each channel's voice data.
  • In more detail, the computer on which the disclosed program is installed and executed first inputs a set of voice dialogue data with recording times, composed of a first channel in which the operator's voice is recorded and a second channel in which the customer's voice is recorded. For each input voice dialogue datum, it identifies, from the voice data of the first and second channels, lead-utterance sections in which one speaker speaks continuously for at least a predetermined length. The earliest lead-utterance section is identified as the preceding lead-utterance section; when it is in the second channel, the type of the voice dialogue datum is set to question, and when it is in the first channel, the type is set to non-question.
  • Next, the speaker characteristics of each channel's voice data are learned by a predetermined machine learning processing method, and, based on the learned characteristics, voice dialogue data whose first-channel and second-channel speaker characteristics both fall within a certain similarity range are collected to form a similar speaker dialogue set.
  • Within each set, the voice dialogue data are arranged by recording time; a voice dialogue datum whose type is question is taken out and associated with a temporally subsequent voice dialogue datum whose type is non-question.
  • Finally, the time from the earliest recording start time to the latest recording end time of the associated voice dialogue data is calculated as the problem solving time.
  • Thus, even in cases where the operator cannot answer the customer's question during the incoming call and instead calls back with an answer after investigation, the problem solving time can be calculated using only voice dialogue data with recording times.
  • The disclosed processing device is a device having processing means that realize the processing the program causes the computer to execute.
  • The disclosed processing method is a method composed of the processing steps the program causes the computer to execute.
  • With this approach, the work of associating callback information with the reception information of the customer's incoming call becomes unnecessary, and the problem solving time can be estimated easily.
  • Furthermore, by combining the disclosed problem solving time estimation process with the same inventors' processing method for estimating customer inquiry tendencies from voice dialogue data (filed separately), the problem solving time for each question content can be analyzed and used as material for various analyses.
  • FIG. 1 is a diagram illustrating a configuration example of the problem solving time estimation processing device 1.
  • The problem solving time estimation processing device 1 is a device that estimates, from voice dialogue data in which operator-customer dialogues are recorded at a call center that receives and answers customer inquiries, the problem solving time required to solve a problem presented by a customer.
  • The problem solving time estimation processing device 1 includes a data input unit 11, a dialogue type estimation unit 13, a similar speaker set calculation unit 15, a dialogue data associating unit 17, and a problem solving time calculation unit 19.
  • The data input unit 11 inputs a set of voice dialogue data 3.
  • From each input voice dialogue datum 3, the dialogue type estimation unit 13 identifies sections in which one speaker speaks continuously, with a louder voice than the other speaker, for at least a predetermined length (hereinafter, lead-utterance sections), and takes the earliest such section as the preceding lead-utterance section. It then determines whether the preceding lead-utterance section is on the customer's side or the operator's side. If the preceding lead speaker is the customer (second channel), the type of the voice dialogue data 3 is set to "question"; if the preceding lead speaker is the operator (first channel), the type is set to "non-question".
  • The similar speaker set calculation unit 15 learns the speaker characteristics of the customer and the operator for each input voice dialogue datum 3 by a predetermined machine learning processing method. Based on the learned characteristics, it collects and classifies the voice dialogue data 3 whose customer and operator speaker characteristics both fall within a certain similarity range, and each resulting class forms a similar speaker dialogue set.
  • For each similar speaker dialogue set, the dialogue data associating unit 17 arranges the voice dialogue data 3 that constitute the set by reception time (recording time) and associates the voice dialogue data that satisfy the following correspondence conditions:
  • Association starts from a voice dialogue datum of type "question".
  • The types "question" and "non-question" are consecutive in time order.
  • Specifically, one voice dialogue datum 3x whose type is "question" is taken out, and it is checked whether there is a temporally subsequent voice dialogue datum 3y whose type is "non-question". If such a datum 3y exists, 3x and 3y are associated with each other; otherwise, no association is made.
  • The problem solving time calculation unit 19 calculates, as the problem solving time 5, the time from the earliest recording start time to the latest recording end time of the associated voice dialogue data 3x and 3y, and outputs the problem solving time 5.
  • FIG. 2 is a diagram showing the processing flow of the problem solving time estimation processing device 1.
  • Step S1 The data input unit 11 of the problem solving time estimation processing device 1 inputs a set of voice dialogue data 3.
  • FIG. 3 is a diagram showing an example of the contents of the utterances of the operator and the customer as the voice dialogue data 3.
  • The voice dialogue data 3 are voice data obtained by recording the dialogue between the operator and the customer, as shown in FIG. 3, using a known recording device.
  • Each voice dialogue datum 3 is composed of two channels: the operator's voice is recorded on the first channel (for example, the L channel) and the customer's voice is recorded independently on the second channel (for example, the R channel).
  • In the header of each voice dialogue datum 3, index information is stored: data identification information (recording 1), operator name (Yamada), recording date (05/10/11), recording start time (15:25:20), and recording end time (15:31:32).
  • The recording start and end times are used as the response start and end times.
  • Step S2 The dialogue type estimation unit 13 estimates the dialogue type of the input voice dialogue data 3.
  • Specifically, from the voice data of each of the L/R channels of the voice dialogue data 3, the dialogue type estimation unit 13 detects sections in which one speaker utters continuously, with a louder voice than the other speaker, for at least a predetermined length, determines the channel containing the preceding lead-utterance section (the earliest lead-utterance section), and sets that channel as the preceding lead speaker (preceding lead-utterance channel).
  • To do so, the dialogue type estimation unit 13 calculates a sound power value for each predetermined unit section of each channel of the voice dialogue data 3 and generates voice power information in which the power values are arranged in time series. The voice power information of the channels is then compared in time series from the beginning, and in each predetermined determination unit section, the channel whose sum (or proportion) of power values over that section is larger is determined to be the lead speaker of the section.
  • The lead speaker of the determination unit section that is earliest in the time series is identified as the preceding lead speaker.
  • The run of determination unit sections that is contiguous with the preceding lead speaker's section and has the same lead speaker is taken as the preceding lead-utterance section.
  • When the preceding lead-utterance section is in the R channel, the type of the voice dialogue data 3 is set to "question"; when it is in the L channel, the type is set to "non-question".
  • Step S3 The similar speaker set calculation unit 15 learns the speaker characteristics of each channel's voice data for each voice dialogue datum 3 by a predetermined machine learning processing method.
  • Step S4 Based on the speaker characteristics obtained by the learning process, the similar speaker set calculation unit 15 computes the similarity between the voice dialogue data 3 for each of the L channel and the R channel, and collects and classifies the voice dialogue data 3 whose speaker characteristics are within a certain similarity range for both channels, forming similar speaker dialogue sets.
  • Step S5 For each similar speaker dialogue set, the dialogue data associating unit 17 arranges the voice dialogue data 3 by recording time (for example, the recording start time) and takes out one voice dialogue datum whose type is "question". It then extracts and associates voice dialogue data satisfying a predetermined condition; for example, if there is a temporally subsequent voice dialogue datum of type "non-question", it is extracted and associated.
  • In this way, the dialogue in which the customer asks a question and the dialogue in which the operator answers that question are associated with each other.
  • Step S6 The problem solving time calculation unit 19 calculates, as the problem solving time, the time from the recording start time of the earlier datum of each associated pair of voice dialogue data to the recording end time of the later datum.
  • For voice dialogue data 3 that are not associated but whose type is "question", the problem solving time calculation unit 19 calculates the time from the recording start time to the recording end time of that datum as the problem solving time.
  • FIG. 5 is a diagram showing a more detailed processing flow of the processing in step S2.
  • Step S20 The voice dialogue data 3 is divided into predetermined unit sections.
  • The unit section is, for example, 1 to 2 seconds long.
  • Step S21 The average of the voice power values in each unit section is obtained and converted into voice power information 4, a time series of power values.
  • The voice power information 4 is a bit string obtained by converting the average power of each channel's voice data in each predetermined unit section into a bit using a predetermined threshold th and arranging the bits in time series: if the voice power of the utterance is at least the threshold th, the bit is set to "1"; otherwise it remains "0".
  • FIG. 6 shows the processing flow of the generation of the voice power information 4 in step S21.
  • Specifically, a Fourier transform is applied to each channel of the voice dialogue data 3 to obtain a sequence of [power, pitch] values (step S111). A unit section m, the minimum time unit of the power sequence, is determined (step S112). Then, as the voice power information 4, the average power value is computed for each unit section m from the beginning of the voice dialogue data 3, and a bit string is output in which "1" is assigned when the average power value is at least the threshold th and "0" when it is below th (step S113). A sketch of this conversion follows.
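The following Python sketch illustrates this conversion for one channel. It is illustrative only: the patent derives power via a Fourier transform, while the sketch approximates the average power of a unit section as the mean squared amplitude, and the function name, sample rate, and threshold value are assumptions.

      import numpy as np

      def voice_power_bits(samples, sample_rate, unit_sec=1.0, th=0.02):
          """Convert one channel (a numpy array of samples) into voice power
          information: a time-series bit string in which '1' marks unit
          sections whose average power is at least the threshold th."""
          unit = int(sample_rate * unit_sec)      # samples per unit section m
          n_units = len(samples) // unit
          bits = []
          for i in range(n_units):
              frame = samples[i * unit:(i + 1) * unit].astype(np.float64)
              power = float(np.mean(frame ** 2))  # average power of the section
              bits.append(1 if power >= th else 0)
          return bits

      # e.g. l_bits = voice_power_bits(stereo[:, 0], 8000)   # operator channel
      #      r_bits = voice_power_bits(stereo[:, 1], 8000)   # customer channel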
  • FIG. 7 is a diagram showing the voice power information 4 of the voice dialogue data (recording 1) 3.
  • Step S22 From the converted voice power information 4, the total response time, the preceding utterance channel, the preceding lead speaker (channel), and the preceding lead-utterance time are acquired as attribute information.
  • The total response time indicates the total time of the actual dialogue in the voice dialogue data 3. As shown in FIG. 8, it is obtained as the difference between the recording start time and end time in the index information of the voice dialogue data.
  • FIG. 9 is a diagram showing an example of the total response time of each of the voice dialogue data (recordings 1, 2, ...) 3.
  • The preceding utterance channel indicates the channel whose speaker uttered first in the dialogue between the customer and the operator.
  • Specifically, the channel whose earliest unit section has a bit value of "1" is taken as the preceding utterance channel.
  • The possible values of the preceding utterance channel are "L", "R", and "LR".
  • Normally, the recipient of a telephone call starts the conversation, that is, speaks first. Therefore, for an ordinary customer-initiated inquiry call, the first utterance is the operator's. Conversely, when the operator calls the customer back, the first utterance is the customer's. Since callback dialogues rarely include customer questions, identifying which channel (the operator's or the customer's) corresponds to the preceding utterance channel makes it possible to identify the operator's callback dialogues. A sketch of this rule follows.
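A minimal sketch of the preceding-utterance-channel rule, assuming the bit strings produced above; the function name is illustrative.

      def preceding_utterance_channel(l_bits, r_bits):
          """Return 'L', 'R', or 'LR' depending on which channel has the
          earliest unit section whose bit is 1."""
          l_first = l_bits.index(1) if 1 in l_bits else None
          r_first = r_bits.index(1) if 1 in r_bits else None
          if l_first is None and r_first is None:
              return None                      # no utterance detected at all
          if r_first is None or (l_first is not None and l_first < r_first):
              return "L"
          if l_first is None or r_first < l_first:
              return "R"
          return "LR"  # both speakers start in the same unit section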
  • FIG. 11 is a diagram showing the preceding utterance channel of each of the voice dialogue data (recordings 1, 2, ...) 3.
  • The preceding lead speaker is the lead speaker (channel) of the determination unit section closest to the head among the lead speakers of the predetermined determination unit sections.
  • Specifically, the dialogue type estimation unit 13 judges the channel with the larger total number (or higher proportion) of unit sections whose power value bit in the voice power information 4 is "1" within a predetermined determination unit section to be the lead speaker of that section. The lead speaker of the determination unit section closest to the head (the first in the time series) is then identified as the preceding lead speaker.
  • The run of determination unit sections in which the preceding lead-utterance channel is judged to be the lead speaker constitutes the preceding lead-utterance time.
  • FIG. 12 is a diagram for explaining the preceding lead speaker and the preceding lead-utterance time.
  • The dialogue type estimation unit 13 performs the determination by shifting a window, which marks the range of unit sections subject to one determination, by a predetermined movement unit.
  • For example, the window size n corresponding to the determination unit time is 15 seconds (unit sections), and the window shift unit k is 3 seconds (unit sections).
  • In each window, the number of unit sections assigned "1" as the power value is counted for each channel, and the channel with the larger count is judged to be the lead speaker.
  • In the example of FIG. 12, the R channel is judged to be the lead speaker in the first to fifth determination processes, the L channel in the sixth, and LR in the seventh. Therefore the R channel, judged to be the lead speaker in the earliest determination unit section, is determined to be the preceding lead speaker (preceding lead-utterance channel).
  • The run of consecutive determination sections with the same lead speaker is taken as the preceding lead-utterance time; the continuous range up to the last such unit section, plus half the window size n at that point, is calculated as the preceding lead-utterance period. A sketch of this window-based determination follows.
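A Python sketch of the windowed lead-speaker determination, under the bit-string representation above. The half-window rule for the preceding lead-utterance time is one reading of the description; the names and defaults are assumptions.

      def lead_speakers(l_bits, r_bits, n=15, k=3):
          """Judge the lead speaker per window position by comparing the
          number of '1' unit sections of each channel (n = window size,
          k = window shift unit)."""
          leads = []
          for start in range(0, max(1, len(l_bits) - n + 1), k):
              a = sum(l_bits[start:start + n])   # L-channel '1' count
              b = sum(r_bits[start:start + n])   # R-channel '1' count
              leads.append("L" if a > b else "R" if b > a else "LR")
          return leads

      def preceding_lead(leads, n=15, k=3):
          """The preceding lead speaker is the lead of the earliest window;
          its utterance time spans the run of consecutive windows with the
          same lead, plus half the window size (in unit sections)."""
          first = leads[0]
          run = 0
          while run < len(leads) and leads[run] == first:
              run += 1
          duration = (run - 1) * k + n // 2      # preceding lead-utterance time
          return first, duration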
  • FIG. 13 and FIG. 14 are process flow diagrams for obtaining the preceding lead speaker and the preceding lead-utterance time.
  • First, the dialogue type estimation unit 13 selects the channel identified as the preceding utterance channel, here the L channel (step S131).
  • A window size n is set (step S132), and a pointer is set at the head of the bit string of the voice power information (step S133).
  • Within the window, the number of unit sections whose L-channel bit is "1" is counted and taken as value A (step S134), and the number of unit sections whose R-channel bit is "1" is counted and taken as value B (step S135).
  • The window is then shifted by the movement unit k (step S1312). If the window has reached the end of the bit string of the voice power information 4 (FIG. 14: step S1313), the process proceeds to step S1314; otherwise it returns to step S134. In step S1314, the lead-speaker value at pointer position "0" is set as the preceding lead speaker.
  • Finally, the range (L) of unit sections over which the lead-speaker value continuously equals the preceding lead speaker's value is obtained (step S1315).
  • FIG. 15 is a diagram showing the calculation result of the preceding lead-utterance time of the voice dialogue data (recording 1) 3.
  • The start second indicates the start position of the window.
  • The window size indicates the window size n.
  • The lead channel is the channel judged to be the lead speaker.
  • The L ratio and the R ratio indicate, for each channel, the number of unit sections assigned "1" within the window.
  • Step S23 The dialogue type estimation unit 13 determines the question utterance part from the preceding lead speaker (channel) and the preceding lead-utterance time. For example, when the preceding lead-utterance channel is the R channel, that is, the channel in which the customer's voice is recorded, the period corresponding to the preceding lead-utterance time is identified as the question utterance section.
  • FIG. 16 is a process flow diagram for determining a question utterance part based on a rule base.
  • The dialogue type estimation unit 13 inputs, for the utterance data to be determined, the tuple [preceding utterer (channel), preceding lead utterer (channel), preceding lead-utterance time, total response time] as shown in FIG. (step S141).
  • Then, the rule-based determination of steps S142 to S147 is performed.
  • It is determined whether the input of step S141 matches rule 1 (step S142); if so, whether it matches rule 2 (step S143); if so, whether it matches rule 3 (step S144); if so, whether it matches rule 4 (step S145); and if so, whether it matches rule 5 (step S146). If it also matches rule 5, it is determined that there is no question utterance part (reject) (step S147). On the other hand, if any of rules 1 to 5 is not matched, it is determined that a question utterance part is included (step S148).
  • For example, among the voice dialogue data of FIG., the voice dialogue data of recording 1 and recording 2 are determined to include a question utterance part (accept), and the others are determined not to include one (reject). A sketch of the cascade follows.
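The rule cascade can be sketched as below. The patent does not state the contents of rules 1 to 5 here, so the predicates are placeholders; only the control flow (matching all five rules means reject, failing any means accept) follows the description.

      def has_question_utterance(attrs, rules):
          """attrs: dict with the tuple fields from step S141.
          rules: [rule1, ..., rule5], each a predicate over attrs."""
          for rule in rules:
              if not rule(attrs):
                  return True          # accept (step S148)
          return False                 # reject (step S147)

      # Placeholder predicates (hypothetical; not given in the patent):
      rules = [
          lambda a: a["preceding_utterer"] == "L",
          lambda a: a["preceding_lead_utterer"] == "L",
          lambda a: a["lead_utterance_time"] < 10,
          lambda a: a["total_response_time"] < 60,
          lambda a: True,
      ]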
  • Step S24 The dialogue type estimation unit 13 determines whether a question utterance part exists in the voice dialogue data 3.
  • Step S25 When the determination in step S24 is "accept", that is, when a question utterance part exists in the voice dialogue data 3 (YES in step S24), the dialogue type of the voice dialogue data 3 is set to "question".
  • Step S26 When the determination in step S24 is "reject", that is, when no question utterance part exists in the voice dialogue data 3 (NO in step S24), the dialogue type of the voice dialogue data 3 is set to "other".
  • As learning data for the speaker feature learning process, the similar speaker set calculation unit 15 extracts from the voice dialogue data, for each channel, voice data recorded with at least a certain power and length. It learns speaker features for each extracted voice datum and calculates the similarity of the speaker features over all voice data; the similarities are computed for all pairs of voice data (a brute-force comparison).
  • FIG. 19 is a diagram showing a process flow of learning data extraction in the process of step S3.
  • Step S300 The similar speaker set calculation unit 15 applies a Fourier transform to each channel of the voice dialogue data 3 to obtain a sequence of [power, pitch] values.
  • Step S301 A unit section m, the minimum time unit of the power sequence, is determined.
  • Step S302 For the voice data of each channel of the voice dialogue data 3, the average power value is obtained for each unit section m from the head, and the parts where the average power value is at least the threshold th2 are output. The voice data corresponding to the output parts are stored in the voice set A.
  • Step S303 The total recording time of each output voice datum is recorded in association with that datum.
  • FIG. 20 is a diagram showing a processing flow of the learning process in the process of step S3. The following processing is performed for each piece of voice data in the voice set A.
  • Step S310 The similar speaker set calculation unit 15 extracts one piece of voice data from the voice set A.
  • Step S311 The speaker features are learned using the extracted voice data, and the learning results (speaker feature data set) are stored in the learning set B.
  • Step S312 The used voice data is removed from the voice set A.
  • Step S313 If there is remaining voice data in the voice set A, the process returns to step S310, and if all voice data is used, the process ends.
  • FIG. 21 is a diagram showing the processing flow of the similarity calculation in step S3. The following processing is performed for each speaker feature data set in the learning set B.
  • Step S320 The similar speaker set calculation unit 15 extracts one speaker feature data set from the learning set B.
  • Step S321 The similarities with respect to all voice data of the voice set A are calculated.
  • FIG. 22 shows the total learning target time and the average similarity of the voice data (A, B, C, ...) constituting the voice set A.
  • The total learning target time is the recording time of the voice data used to learn the speaker characteristics.
  • The average similarity of a voice datum is the average of its similarities with the other voice data.
  • Step S322 The used speaker feature data set is removed from the learning set B.
  • Step S323 If there is a remaining speaker feature data set in the learning set B, the process returns to step S320, and if all speaker feature data sets are used, the process ends.
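As a concrete but assumed instance of steps S310 to S321: the patent only says speaker features are learned by "a predetermined machine learning processing method", so the sketch below uses a per-datum Gaussian mixture model over acoustic feature frames (for example MFCCs), with a distance-like similarity in which 0 means most similar, matching the convention of FIG. 23 up to scaling.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      def learn_speaker_features(frames):
          """Learn speaker features for one voice datum (steps S310-S311).
          frames: (n_frames, n_coeffs) acoustic feature matrix."""
          gmm = GaussianMixture(n_components=8, covariance_type="diag",
                                random_state=0)
          return gmm.fit(frames)

      def similarity(model, own_frames, other_frames):
          """Distance-like similarity of another datum under a learned model
          (step S321): 0 for the datum the model was learned from, larger
          values for less similar speakers."""
          return model.score(own_frames) - model.score(other_frames)

Running each learned model against every voice datum in the voice set A yields the brute-force similarity matrix of FIG. 23.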
  • FIG. 23 is a diagram showing an example of the calculated similarity matrix.
  • A similarity of "0" is the maximum (the minimum distance); the larger the value, the less similar the data.
  • Column A of the similarity matrix shown in FIG. 23 lists the similarities of the voice data with respect to voice datum A; for the voice data A, B, C, D, E, ... it contains "0, 30, 1500, 25000, 230, ...".
  • Based on the similarity of each voice datum, the similar speaker set calculation unit 15 then performs speaker authentication, that is, it computes the sets of voice data that can be regarded as the same speaker (similar speaker sets). To improve the determination accuracy, a similarity correction process is performed first.
  • FIG. 24 is a diagram showing a more detailed processing flow of the processing in step S4.
  • Step S40 The similar speaker set calculation unit 15 determines a threshold th3 for deciding which voice data are regarded as similar speaker candidates.
  • Step S41 The average similarity of every voice datum is obtained.
  • Step S42 Voice data whose average similarity is lower than a certain level relative to the overall average are set to "undetermined" and excluded.
  • For example, when the overall average similarity is "23000000", "undetermined" is set for the voice data whose average similarity is a quarter or less of that value, as shown in FIG.
  • Step S43 "No similarity" is set between two voice data whose corresponding voice dialogue data do not have L-channel (operator) speaker characteristics that can be regarded as the same speaker.
  • Step S44 The similarity of each voice datum is corrected according to its average similarity value, and the correction coefficient and the normalized similarity are calculated as shown in FIG.
  • Step S45 The pairwise similarity between the voice datum being processed and each other voice datum is calculated as the average of the two normalized similarities.
  • For example, the similarity between voice data A and B is the average of the similarity of B under A's model (the A→B similarity) and the similarity of A under B's model (the B→A similarity). A sketch of this correction follows.
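A sketch of steps S44 and S45, assuming (since the patent does not give the formula) that the correction coefficient normalizes each row of the raw matrix by that datum's average similarity before the two directions are averaged.

      import numpy as np

      def corrected_pairwise_similarity(sim):
          """sim[i][j]: raw similarity of datum j under the model learned
          from datum i (0 = identical, larger = less similar).
          Returns the symmetric matrix of averaged normalized similarities."""
          sim = np.asarray(sim, dtype=float)
          row_avg = sim.mean(axis=1, keepdims=True)   # per-datum average
          norm = sim / row_avg                        # normalized similarity
          return (norm + norm.T) / 2.0                # mean of A->B and B->A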
  • FIG. 29 shows an example of the average similarity matrix of each voice dialogue data.
  • For example, based on the average similarity matrix, voice data B, C, and E can be regarded as the same speaker as voice datum A, and the similar speaker set {A, B, C, E} is calculated; likewise, voice datum D can be regarded as the same speaker as voice datum C, and the similar speaker set {C, D} is calculated. A grouping sketch follows.
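A minimal grouping sketch over the averaged matrix, using the threshold th3 from step S40; the exact grouping procedure over FIG. 29 is not spelled out, so this is an assumption.

      def similar_speaker_sets(avg_sim, th3):
          """Collect, for each voice datum i, the set of data whose averaged
          similarity to i is within th3; duplicate sets are emitted once."""
          n = len(avg_sim)
          sets = []
          for i in range(n):
              members = frozenset(j for j in range(n) if avg_sim[i][j] <= th3)
              if len(members) > 1 and members not in sets:
                  sets.append(members)
          return sets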
  • Pairs whose dialogue types form a combination other than the above are not treated as consecutive.
  • FIG. 31 is a more detailed process flow diagram of the process of step S5.
  • Step S50 Based on the dialogue types of the voice data, the dialogue data associating unit 17 keeps only the voice data that satisfy the conditions among the voice data constituting each similar speaker set.
  • For example, for the similar speaker set {A, B, C, E}: since voice datum A is a "question" according to the dialogue types shown in FIG. 32(B), it becomes the starting point. Voice data B and C are also "questions" and do not satisfy the condition, so they are excluded. Voice datum E is "other" and satisfies the condition, so it is kept; thus {A, E} remains in this similar speaker set.
  • Step S51 The voice dialogue data corresponding to the voice data kept in each similar speaker set are output in time order.
  • For example, when the recording times of the voice dialogue data are in the order shown in FIG. 32(C), the three associations shown in FIG. are output: "voice data A→E", "voice data B", and "voice data C→D". A sketch of this association follows.
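A sketch of steps S50 and S51 for one similar speaker set; the tuple layout is an assumption.

      def associate(dialogues):
          """dialogues: list of (id, dialogue_type) sorted by recording time
          within one similar speaker set. From each 'question', attach the
          next unused 'other' dialogue; a question with no following 'other'
          stands alone."""
          out, used = [], set()
          for i, (did, typ) in enumerate(dialogues):
              if typ != "question" or did in used:
                  continue
              group = [did]
              used.add(did)
              for did2, typ2 in dialogues[i + 1:]:
                  if typ2 == "other" and did2 not in used:
                      group.append(did2)
                      used.add(did2)
                      break
              out.append(group)
          return out

      # e.g. associate([("A", "question"), ("B", "question"), ("E", "other")])
      # -> [["A", "E"], ["B"]]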
  • The problem solving time calculation unit 19 obtains the output of the dialogue data associating unit 17 and calculates the problem solving time from the recording start time and recording end time recorded in the header of each voice dialogue datum.
  • For the association "voice data A→E", the time t1 from the recording start time of voice dialogue datum A to the recording end time of voice dialogue datum E is taken as the problem solving time.
  • Similarly, for "voice data C→D", the time t2 from the recording start time of voice dialogue datum C to the recording end time of voice dialogue datum D is taken as the problem solving time, as shown in FIG.
  • For voice dialogue datum B, which has no associated answer dialogue, the time t3 from its recording start time to its recording end time is taken as the problem solving time. A sketch of this calculation follows.
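A sketch of the final calculation; the header values for datum A follow the recording-1 example above, while those for E are made up for illustration.

      from datetime import datetime

      def problem_solving_time(group, index):
          """group: list of associated dialogue ids; index maps id ->
          (recording start, recording end). The problem solving time runs
          from the earliest start to the latest end in the group."""
          starts = [index[d][0] for d in group]
          ends = [index[d][1] for d in group]
          return max(ends) - min(starts)

      index = {
          "A": (datetime(2005, 10, 11, 15, 25, 20),   # recording 1 header
                datetime(2005, 10, 11, 15, 31, 32)),
          "E": (datetime(2005, 10, 13, 10, 0, 0),     # illustrative values
                datetime(2005, 10, 13, 10, 12, 45)),
      }
      print(problem_solving_time(["A", "E"], index))  # t1 for A->E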
  • The problem solving time estimation processing device 1 can be realized as a computer program. The program can be recorded on a suitable computer-readable recording medium, such as portable medium memory, semiconductor memory, or a hard disk, and provided on such a medium, or it can be provided by transmission and reception over various communication networks via a communication interface.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A problem solving time estimation device (1) comprises a data input unit (11) that receives dialogue data (3) in which an operator's speech and a customer's speech are recorded separately, and a dialogue type estimation unit (13) that identifies, from the dialogue data (3), the speaker who speaks first and leads the dialogue, sets the type of the dialogue data to "question" if that first, leading speaker is the customer, and sets the type to "non-question" if that speaker is the operator. A similar speaker set calculation unit (15) learns speaker characteristics from the dialogue data (3) and defines a similar speaker dialogue set by collecting data whose customer and operator speaker characteristics are estimated to be similar. A dialogue data association unit (17) arranges the data in recording-time order within each similar speaker dialogue set and associates dialogue data of type "question" with temporally subsequent dialogue data of type "non-question". A problem solving time calculation unit (19) calculates the time from the earliest recording start time to the latest recording end time of the associated dialogue data and takes it as the problem solving time.
PCT/JP2008/058056 2008-04-25 2008-04-25 Problem solving time estimation program, processing device, and processing method WO2009130785A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2010509012A priority Critical patent/JP5099218B2/ja Problem solving time estimation processing program, processing device, and processing method
PCT/JP2008/058056 WO2009130785A1 (fr) 2008-04-25 2008-04-25 Problem solving time estimation program, processing device, and processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2008/058056 WO2009130785A1 (fr) 2008-04-25 2008-04-25 Problem solving time estimation program, processing device, and processing method

Publications (1)

Publication Number Publication Date
WO2009130785A1 true WO2009130785A1 (fr) 2009-10-29

Family

ID=41216534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2008/058056 WO2009130785A1 (fr) 2008-04-25 2008-04-25 Problem solving time estimation program, processing device, and processing method

Country Status (2)

Country Link
JP (1) JP5099218B2 (fr)
WO (1) WO2009130785A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011692A (zh) * 2019-12-20 2021-06-22 江苏省城市轨道交通研究设计院股份有限公司 Urban rail transit general contracting site safety management system and method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190310888A1 (en) * 2018-04-05 2019-10-10 The Fin Exploration Company Allocating Resources in Response to Estimated Completion Times for Requests

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001331624A (ja) * 2000-05-22 2001-11-30 Hitachi Ltd Method and system for evaluating the technical level of customer service personnel
JP2004096149A (ja) * 2002-08-29 2004-03-25 Casio Comput Co Ltd Call content management device and program
JP2005338610A (ja) * 2004-05-28 2005-12-08 Toshiba Tec Corp Information input device and information storage processing device
JP2007312186A (ja) * 2006-05-19 2007-11-29 Nec Corp Call voice recording/playback device and call voice recording/playback method


Also Published As

Publication number Publication date
JPWO2009130785A1 (ja) 2011-08-11
JP5099218B2 (ja) 2012-12-19

Similar Documents

Publication Publication Date Title
US11227603B2 (en) System and method of video capture and search optimization for creating an acoustic voiceprint
US11636860B2 (en) Word-level blind diarization of recorded calls with arbitrary number of speakers
US10109280B2 (en) Blind diarization of recorded calls with arbitrary number of speakers
US6775651B1 (en) Method of transcribing text from computer voice mail
US10789943B1 (en) Proxy for selective use of human and artificial intelligence in a natural language understanding system
US9672825B2 (en) Speech analytics system and methodology with accurate statistics
  • JP2019053126A (ja) Growth-type dialogue device
  • WO2007086042A2 (fr) Method and apparatus for segmentation of audio interactions
  • JP5099211B2 (ja) Question utterance part extraction processing program, method and device for voice data, and customer inquiry tendency estimation processing program, method and device using question utterance parts of voice data
  • JP7160778B2 (ja) Evaluation system, evaluation method, and computer program
  • CN113744742A (zh) Role recognition method, device and system in dialogue scenarios
  • JP5099218B2 (ja) Problem solving time estimation processing program, processing device, and processing method
US20190272828A1 (en) Speaker estimation method and speaker estimation device
US20220165276A1 (en) Evaluation system and evaluation method
  • JP7162783B2 (ja) Information processing device, estimation method, and estimation program
US20230169981A1 (en) Method and apparatus for performing speaker diarization on mixed-bandwidth speech signals
  • JP2022038498A (ja) Selection program, selection method and selection device
  • JPH0689099A (ja) Continuous speech recognition method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08740858

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2010509012

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08740858

Country of ref document: EP

Kind code of ref document: A1