WO2016009634A1 - Conversation analysis system, conversation analysis method, and storage medium wherein conversation analysis program is recorded - Google Patents


Info

Publication number
WO2016009634A1
Authority
WO
WIPO (PCT)
Prior art keywords
knowledge
feature amount
conversation
feature
utterance
Prior art date
Application number
PCT/JP2015/003523
Other languages
French (fr)
Japanese (ja)
Inventor
祐 北出
祥史 大西
Original Assignee
NEC Corporation (日本電気株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corporation (日本電気株式会社)
Priority to JP2016534111A (JPWO2016009634A1)
Publication of WO2016009634A1


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/08 — Speech classification or search
    • G10L 15/10 — Speech classification or search using distance or distortion measures between unknown speech and reference templates

Definitions

  • the present invention relates to a conversation analysis system, a conversation analysis method, and a conversation analysis program for estimating a speaker's knowledge level from conversation.
  • The knowledge level is the result of classifying, into two or more classes, or of quantifying, how familiar the target speaker is with a predetermined theme or its peripheral information.
  • The predetermined theme is, for example, the subject of the conversation itself.
  • Patent Literature 1 describes an example of a conversation analysis device.
  • The knowledge amount estimation information generation device described in Patent Literature 1 includes an utterance string extraction unit 1, an utterance intention determination unit 2, a feature amount extraction unit 3, an estimation information generation unit 4, a knowledge amount label 4a, a knowledge amount estimation unit 5, and an estimation information storage unit 5a.
  • the knowledge amount estimation information generation apparatus configured as shown in FIG. 7 is mainly divided into a learning unit and an estimation unit, and operates as follows.
  • The learning unit of the knowledge amount estimation information generation device extracts text data composed of an utterance sequence with the utterance string extraction unit 1. Next, in the utterance intention determination unit 2, the learning unit determines, from the extracted text data of the utterance sequence concerning the dialogue between an inquirer and a respondent, each utterance representing one of the utterance intentions "question", "explanation", and "backchannel". After the determination, the learning unit associates each utterance intention with the target utterance.
  • The learning unit calculates, with the feature amount extraction unit 3, the number of distinct words used by the user among the appearing words (hereinafter, the "used vocabulary feature amount").
  • In addition to the used vocabulary feature amount, the learning unit counts the number of appearances of each of the utterance intentions "question", "explanation", and "backchannel" determined by the utterance intention determination unit 2.
  • The learning unit also extracts, from the utterances with the "question" intention, those containing a question word as interrogative question sentences, and counts their appearances.
  • The appearance counts of "question", "explanation", "backchannel", and "interrogative question sentence" are collectively referred to as the intention feature amount.
  • Using as learning data the intention feature amount and the used vocabulary feature amount calculated by the feature amount extraction unit 3, together with the knowledge amount label 4a, which is ground-truth information on the knowledge amount, the learning unit causes the estimation information generation unit 4 to generate the estimation information used for estimating the knowledge amount of input text (speech recognition result 7).
  • The estimation unit applies, to the input speech recognition result 6, the same processing that the learning unit performs in the utterance string extraction unit 1, the utterance intention determination unit 2, and the feature amount extraction unit 3, and obtains the used vocabulary feature amount and the dialogue feature amount.
  • The estimation unit then estimates the knowledge amount in the knowledge amount estimation unit 5, from the calculated used vocabulary feature amount and dialogue feature amount and from the estimation information generated by the learning unit and stored in the estimation information storage unit 5a.
  • The knowledge amount estimation information generation device described in Patent Literature 1 has difficulty estimating the user's knowledge amount when the input text to be evaluated is not written language, that is, when it is not composed of grammatically correct sentences. Sentences that do not conform to correct grammar are, for example, broken sentences such as colloquial expressions and sentences containing recognition errors.
  • a general conversation analysis apparatus calculates a used vocabulary feature amount and an intention feature amount from a speech recognition result to be evaluated, and estimates a knowledge amount.
  • the used vocabulary feature amount is a feature amount related to the appearance word.
  • The intention feature amount is the number of utterances in each class when each utterance is classified into "question", "explanation", "backchannel", and "interrogative question sentence" by language processing such as pattern matching.
  • both the used vocabulary feature amount and the intention feature amount are calculated based on language information.
  • The linguistic information used for calculating the various feature amounts described above is the appearing word, word string, or character string (hereinafter, "symbol") itself; additional information such as the notation, part of speech, and meaning of the symbol; or statistical information based on the symbol, such as the appearance frequency obtained for each symbol.
  • the accuracy of estimation of the amount of knowledge of the user by the conversation analysis device largely depends on the grammatical correctness of the utterance content or the accuracy of the recognition result when the utterance is recognized.
  • When grammatically correct text is input, the conversation analysis device can estimate the knowledge amount of the user.
  • The problem with a general conversation analysis device is that, when a broken sentence different from written language is input, it is difficult to calculate the used vocabulary feature amount and the intention feature amount correctly, and therefore difficult to estimate the knowledge amount.
  • Non-Patent Document 2 uses a feature amount extracted from a conversation state between speakers for estimation of a knowledge level.
  • It is difficult for the method described in Non-Patent Document 2 to estimate the speaker's correct knowledge level, for example, when voice data of casual speech is input or when the speech recognition rate is low.
  • the reason is that the method described in Non-Patent Document 2 does not use knowledge feature amounts respectively obtained from different feature amounts such as language feature amounts and dialogue feature amounts for estimation of the knowledge level.
  • When knowledge feature amounts obtained from such different kinds of feature amounts are not used, the conversation analysis device cannot compensate for an erroneous estimate of a knowledge feature amount based on the language feature amount with the estimate of a knowledge feature amount obtained from a different feature amount (for example, the dialogue feature amount, which is not affected by the language feature amount). For example, when a broken sentence different from written language is input, it is difficult for the conversation analysis apparatus to correctly estimate the speaker's knowledge level.
  • The present invention has been made to solve the above-described problems. That is, the present invention mainly provides a conversation analysis system, a conversation analysis method, and a conversation analysis program that can robustly estimate a speaker's knowledge level even when a broken sentence different from written language is input.
  • A conversation analysis system according to the present invention includes: a dialogue feature amount extraction unit that extracts, from voice data and text data of the voice data, a dialogue feature amount that is a feature amount related to the conversation state between speakers; a language feature amount extraction unit that extracts a language feature amount that is a feature amount related to words included in the text data; a knowledge feature amount estimation unit that estimates knowledge feature amounts from the extracted dialogue feature amount and language feature amount and from a knowledge feature amount estimation model holding identification patterns indicating knowledge features; and a knowledge level estimation unit that estimates the speaker's knowledge level by integrating the estimated knowledge feature amounts.
  • A conversation analysis method according to the present invention extracts, from voice data and text data of the voice data, a dialogue feature amount that is a feature amount related to the conversation state between speakers; extracts a language feature amount that is a feature amount related to words included in the text data; estimates knowledge feature amounts from the extracted dialogue feature amount and language feature amount and from a knowledge feature amount estimation model that holds identification patterns indicating knowledge features; and estimates the speaker's knowledge level by integrating the estimated knowledge feature amounts.
  • A conversation analysis program according to the present invention causes a computer to execute: a dialogue feature amount extraction process of extracting, from voice data and text data of the voice data, a dialogue feature amount that is a feature amount related to the conversation state between speakers; a language feature amount extraction process of extracting a language feature amount that is a feature amount related to words included in the text data; a knowledge feature amount estimation process of estimating knowledge feature amounts from the extracted dialogue feature amount and language feature amount and from a knowledge feature amount estimation model holding identification patterns indicating knowledge features; and a knowledge level estimation process of estimating the speaker's knowledge level by integrating the estimated knowledge feature amounts.
  • the object of the present invention is also achieved by a computer-readable storage medium in which the conversation analysis program is stored.
  • FIG. 1 is a block diagram illustrating a configuration example of a learning system of the conversation analysis apparatus according to the embodiment of the present invention.
  • FIG. 2 is an explanatory diagram showing the concept of knowledge features in the embodiment of the present invention.
  • FIG. 3 is a block diagram illustrating a configuration example of an estimation system of the conversation analysis apparatus according to the embodiment of the present invention.
  • FIG. 4 is a flowchart showing the operation of the conversation analysis apparatus 100.
  • FIG. 5 is an explanatory diagram showing the evaluation results of an evaluation experiment with the conversation analysis device according to the embodiment of the present invention and with another method.
  • FIG. 6 is a block diagram showing an outline of the conversation analysis system in the embodiment of the present invention.
  • FIG. 7 is a block diagram illustrating the configuration of the knowledge amount estimation information generation device described in Patent Literature 1.
  • FIG. 8 is an explanatory diagram illustrating a hardware configuration capable of realizing the conversation analysis system or the conversation analysis apparatus according to the embodiment of the present invention.
  • FIG. 1 is a block diagram illustrating a configuration example of a learning system of the conversation analysis apparatus according to the embodiment of the present invention.
  • The input voice data is dialogue voice data reproducible in stereophonic (hereinafter, "stereo") sound, in which the voices of different speakers are recorded on the left and right channels, respectively.
  • the configuration and operation of the conversation analysis apparatus according to the present embodiment will be described by taking as an example the case of estimating the speaker's knowledge level.
  • the input audio data may be data reproducible by a method other than stereo.
  • The input voice data may be voice data of a dialogue among three or more people. Even when voice data of a conversation among three or more people is input, if the voice data of each speaker is separated using speaker recognition technology or the like, the conversation analysis apparatus according to the present embodiment can estimate each speaker's knowledge level.
  • the learning system of the conversation analysis apparatus 100 shown in FIG. 1 includes an utterance section calculation unit 101 and a feature amount extraction unit 102.
  • The learning system of the conversation analysis apparatus 100 also includes knowledge feature quantity estimation model storage means 103, knowledge level estimation model storage means 105, knowledge feature quantity estimation model creation means 110, and knowledge level estimation model creation means 111.
  • The utterance section calculation means 101 has a function of calculating utterance sections from input voice data and text data related to the voice data, and outputting the calculated utterance sections.
  • the text data related to such voice data may include, for example, text data of an utterance word obtained by voice recognition of the voice data.
  • the utterance section is a section in which utterance detection sections by the same speaker are continuous and grouped.
  • the utterance section is a unit for calculating the language feature value or the dialogue feature value.
  • the utterance detection section is a section where humans speak continuously without breathing.
  • the utterance detection section is automatically calculated by, for example, preprocessing for voice recognition.
  • the utterance detection section is not an automatically detected section, but may be a section with a margin before and after the automatically detected section. Further, the utterance detection section may not be a section where humans are speaking, but may be a section determined simply by a fixed time length.
  • The utterance section calculation means 101 may calculate the utterance sections from the detected utterance detection sections and the speaker information.
  • the utterance interval calculation means 101 may classify the utterance based on the calculated utterance interval.
  • the feature amount extraction unit 102 obtains a language feature amount or a dialogue feature amount for each classified class. The obtained language feature amount or dialogue feature amount is used for estimation of the knowledge feature amount as will be described later.
  • the utterance section calculation means 101 arranges utterances by two speakers in time series using the speech section information and the speaker information included in the input text data. If there is no utterance detection section or speaker information in the input text data, the utterance section calculation means 101 may acquire the utterance detection section or speaker information by analyzing the input voice data.
  • The utterance section calculation means 101 compares the utterance detection sections of one speaker (the main speaker) with those of the other speaker (the interlocutor), and detects utterances whose detection sections are completely contained within an utterance detection section of the main speaker. This corresponds, for example, to a backchannel inserted by the interlocutor while the main speaker is speaking.
  • The utterance section calculation means 101 performs this detection of completely contained utterances for both speakers.
  • The utterance section calculation means 101 then combines consecutive utterance detection sections of the same speaker, among the remaining sections excluding the completely contained ones, into a single section. This combined section is the utterance section.
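The section-construction procedure described above (drop fully contained utterances of the other speaker, then merge consecutive sections of the same speaker) can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation; the tuple layout `(speaker, start, end)` and the function name are assumptions.

```python
def build_utterance_sections(detections):
    """detections: list of (speaker, start, end) utterance detection sections.
    Returns merged utterance sections as described in the text."""
    # Sort all detection sections by start time.
    dets = sorted(detections, key=lambda d: d[1])

    # 1) Drop sections completely contained in a section of the OTHER speaker
    #    (e.g. a backchannel inserted while the main speaker is talking).
    kept = []
    for spk, s, e in dets:
        contained = any(o_spk != spk and o_s <= s and e <= o_e
                        for o_spk, o_s, o_e in dets)
        if not contained:
            kept.append((spk, s, e))

    # 2) Merge consecutive sections of the same speaker into one utterance section.
    sections = []
    for spk, s, e in kept:
        if sections and sections[-1][0] == spk:
            prev_spk, prev_s, prev_e = sections[-1]
            sections[-1] = (spk, prev_s, max(prev_e, e))
        else:
            sections.append((spk, s, e))
    return sections
```

For example, a short interjection by speaker B inside a long section of speaker A is dropped, and A's two adjacent detection sections are joined into one utterance section.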
  • Because the semantic boundaries are clarified in this way, the feature amount extraction unit 102 can calculate more accurate feature amounts.
  • the utterance interval calculation means 101 can also use the utterance detection interval (utterance start time, utterance end time) and speaker information obtained from the input text data as the utterance interval.
  • When the utterance detection sections and speaker information are used directly as the utterance sections, the above processing by the utterance section calculation unit 101 is unnecessary.
  • the utterance section calculation means 101 may classify the utterances according to a predetermined criterion.
  • As the predetermined criterion, for example, a criterion based on the initiative of the utterance can be used.
  • Based on the initiative of the conversation, the utterance section calculation means 101 classifies each utterance in the utterance sections calculated as described above into two types: utterances made with the initiative (hereinafter, "leading utterances") and utterances made without the initiative (hereinafter, "passive utterances").
  • the utterance interval calculation means 101 classifies, for example, an utterance whose utterance interval is shorter than a threshold as a passive utterance.
  • The utterance section calculation means 101 may also classify as a passive utterance an utterance section that contains a word with a small number of phonemes (for example, "yes" or "no"), which is easily misrecognized under the influence of acoustic or recording conditions and whose recognition result therefore has low reliability.
  • the utterance interval calculation means 101 classifies utterances other than the utterance classified as passive utterance as the leading utterance.
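The leading/passive classification just described can be sketched as below. The length threshold and the short-word list are illustrative assumptions, not values given in the text.

```python
PASSIVE_LENGTH_THRESHOLD = 1.0          # seconds; assumed threshold
SHORT_WORDS = {"yes", "no", "uh-huh"}   # low-phoneme-count words (example list)

def classify_utterance(section, words):
    """section: (speaker, start, end); words: recognized words in the section.
    Returns "passive" for short or short-word-only utterances, else "leading"."""
    _, start, end = section
    # Utterance sections shorter than the threshold are passive utterances.
    if end - start < PASSIVE_LENGTH_THRESHOLD:
        return "passive"
    # Sections consisting only of short, easily misrecognized words are passive.
    if words and all(w.lower() in SHORT_WORDS for w in words):
        return "passive"
    # All remaining utterances are leading utterances.
    return "leading"
```

The feature amount extraction unit would then compute language or dialogue feature amounts per class, e.g. only over the "leading" utterances.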
  • the feature quantity extraction unit 102 regards the classification result by the utterance section calculation unit 101 as a classification class, and obtains a language feature quantity or a dialogue feature quantity for each class.
  • the reason for classifying utterances into leading utterances and passive utterances is that speaker characteristics can be made more obvious by classification.
  • For example, the feature amount of the utterance length (such as its mean or variance), which is one of the dialogue feature amounts, varies greatly depending on the proportion of passive utterances. Therefore, if the feature amount related to utterance length is computed only over leading utterances, a feature amount for spontaneously spoken utterances is obtained, and the speaker's characteristics become easier to capture than when utterances are not classified.
  • the feature quantity extraction unit 102 may use the utterance classification result before and after the target utterance as the classification class, in addition to regarding the utterance classification result as the classification class as described above. In addition, the feature amount extraction unit 102 may use a combination of the classification result of the target utterance and the classification result of the utterance before and after the target utterance as the classification class.
  • the feature amount extraction unit 102 includes a language feature amount extraction unit 102a and a dialogue feature amount extraction unit 102b. Text data, voice data, utterance section detection results, utterance classification results, and the like are input to the feature amount extraction unit 102. The feature amount extraction unit 102 outputs a language feature amount and a dialogue feature amount based on these input data.
  • the language feature quantity extraction unit 102a has a function of extracting a language feature quantity calculated from input text data.
  • the language feature amount is a word appearance frequency included in input text data, a statistical value based on the word appearance frequency, or the like.
  • The extracted language feature amount may also be the confidence of the recognition result assigned to each recognized word.
  • the language feature quantity extraction unit 102a may obtain the feature quantity using the class to which the recognized word belongs.
  • The language feature amount extraction unit 102a may also replace appearing words with other symbols, by performing notation normalization to correct notational variants, synonym expansion, or the like, and obtain the feature amount over the replaced symbols rather than over the appearing words themselves.
  • the dialogue feature quantity extraction unit 102b has a function of extracting a dialogue feature quantity that is a feature quantity relating to a dialogue state between speakers, which is mainly calculated from voice data.
  • the dialogue feature amount is a feature amount that can be acquired when two or more people have a conversation.
  • the dialogue feature amount is calculated based on the utterance section.
  • The dialogue feature amount extraction unit 102b can obtain, for example, the speaker's speaking speed, utterance length, and number of backchannels by analyzing the utterance sections interleaved between the interlocutor's utterance sections.
  • The dialogue feature amount extraction unit 102b can also obtain dialogue feature amounts by analyzing the section between the beginning of the data and the interlocutor's utterance section, and the section between the interlocutor's utterance section and the end of the data.
  • the dialogue feature quantity extraction unit 102b can calculate a pause length value to be described later when the utterance section of each speaker is determined. As described above, the dialogue feature value extraction unit 102b can obtain various dialogue feature values based on the utterance section.
  • Examples of dialogue feature amounts include speaking speed, pause length, number of backchannels, and utterance length.
  • Talk speed is the speed at which a speaker speaks in one unit of dialogue.
  • Speaking speed is expressed by the number of mora per unit time.
  • the speech speed is obtained by, for example, dividing the number of mora of the recognized word by the length of the utterance section.
  • A mora is a sound unit that forms one rhythmic beat.
  • the pause length means the length of “between” when a speaker change occurs.
  • the pause length is calculated by obtaining the difference between the utterance end time of the utterance section immediately before the target utterance section and the utterance start time of the target utterance section.
  • the utterance length is the length of one utterance section. That is, the utterance length is the length of time from the utterance start time to the utterance end time of one utterance section.
  • The number of backchannels is the number of times the interlocutor gives a backchannel response (aizuchi).
  • A backchannel has the property of showing that the interlocutor understands the content of the other party's utterance, or of prompting the other party to continue speaking.
  • The dialogue feature amount extraction unit 102b may detect backchannels by pattern matching on the recognition result or based on the utterance length. Further, the dialogue feature amount extraction unit 102b may detect backchannels using the utterance inclusion relation, an example of the utterance classification result described above.
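The dialogue feature amounts defined above (speaking speed in morae per unit time, pause length at a speaker change, utterance length, and the backchannel/aizuchi count) reduce to simple arithmetic over utterance sections. A minimal sketch, assuming mora counts are given and approximating backchannel detection by the short-length criterion from the text:

```python
def speaking_speed(mora_count, start, end):
    # Number of morae divided by the length of the utterance section.
    return mora_count / (end - start)

def pause_length(prev_end, cur_start):
    # The "gap" between the previous utterance section's end time and the
    # target utterance section's start time, at a speaker change.
    return cur_start - prev_end

def utterance_length(start, end):
    # Time from utterance start to utterance end of one utterance section.
    return end - start

def backchannel_count(sections, interlocutor, threshold=1.0):
    # Count the interlocutor's utterance sections regarded as backchannels,
    # here approximated by a short-length criterion (threshold is assumed).
    return sum(1 for spk, s, e in sections
               if spk == interlocutor and (e - s) < threshold)
```

In practice the unit would combine these into a feature vector per utterance class, but the quantities themselves are as above.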
  • the knowledge feature quantity estimation model creation means 110 has a function of generating a knowledge feature quantity estimation model.
  • The knowledge feature quantity estimation model creation means 110 generates the knowledge feature quantity estimation model using, as learning data, the language feature amounts and dialogue feature amounts extracted by the feature amount extraction means 102 from the learning voice data and text data, together with the knowledge feature quantity label 112, which is teacher data representing the knowledge feature quantities for the learning data.
  • the knowledge feature quantity estimation model creation means 110 sends the created knowledge feature quantity estimation model to the knowledge feature quantity estimation model storage means 103.
  • Knowledge feature is an element that determines the speaker's level of knowledge based on language features and dialogue features extracted from the language used by the speaker and the reaction of the speaker.
  • The knowledge feature quantity estimation model is a model generated by learning identification patterns using, as input, learning data consisting of sets of a language feature amount, a dialogue feature amount, and a knowledge feature quantity label 112.
  • An SVM (Support Vector Machine) or the like is used for learning the identification patterns (see Non-Patent Document 1).
  • the knowledge level estimation model creation unit 111 has a function of generating a knowledge level estimation model.
  • the knowledge level estimation model creating means 111 generates a knowledge level estimation model obtained by learning a knowledge level identification pattern, using the knowledge feature quantity label 112 and the knowledge label 113 which is knowledge level teacher data.
  • The knowledge level estimation model creating unit 111 generates a knowledge level estimation model in which the knowledge level identification pattern is learned, using the result output by the knowledge feature amount estimation unit 104 (described later) for the learning data, together with the knowledge label 113. The knowledge level estimation model creating unit 111 then sends the generated knowledge level estimation model to the knowledge level estimation model storage unit 105.
  • the knowledge level estimation model creation unit 111 may use the knowledge feature amount label 112 instead of the output result of the knowledge feature amount estimation unit 104 with respect to the learning data.
  • the knowledge level estimation model storage unit 105 has a function of storing the knowledge level estimation model created by the knowledge level estimation model creation unit 111.
  • the knowledge level estimation model is a model for estimating a knowledge level for input data.
  • The knowledge level estimation model is generated by learning an identification pattern using either the output of the knowledge feature amount estimation unit 104 (described later) for the learning data or the knowledge feature amount label 112, together with the knowledge label 113.
  • FIG. 3 is a block diagram illustrating a configuration example of an estimation system of the conversation analysis apparatus according to the embodiment of the present invention.
  • the estimation system of the conversation analysis apparatus 100 shown in FIG. 3 includes an utterance section calculation unit 101, a feature amount extraction unit 102, a knowledge feature amount estimation model storage unit 103, a knowledge feature amount estimation unit 104, and a knowledge level estimation model storage. Means 105 and knowledge level estimation means 106 are included.
  • the knowledge feature amount estimation means 104 and the knowledge level estimation means 106 that are not included in the learning system but are included only in the estimation system will be described.
  • Knowledge feature quantity estimation means 104 digitizes the knowledge feature quantity into a discrete value such as “0” or “1” or a continuous value ranging from “0” to “1” and outputs the digitized value.
  • the knowledge feature quantity estimation model storage unit 103 stores at least one knowledge feature quantity estimation model for one knowledge feature quantity estimated by the knowledge feature quantity estimation unit 104.
  • When the knowledge feature quantity estimation unit 104 estimates a knowledge feature quantity, the language feature amount and dialogue feature amount obtained from the input data are compared with the knowledge feature quantity estimation model to identify whether the knowledge feature is present. An SVM or the like is used in this identification process, as in the identification pattern learning. When there are multiple knowledge features, the knowledge feature amount estimation unit 104 performs the identification process for each knowledge feature.
  • In addition to determining a knowledge feature quantity as the binary "present" or "absent", the knowledge feature quantity estimation means 104 may determine it as a ternary value such as "present", "absent", or "unknown". The knowledge feature quantity estimation means 104 may also discriminate the knowledge feature quantity at more than three levels. When discriminating at many levels, the knowledge feature quantity estimation means 104 can output the knowledge feature quantity label 112 by composing the above identification process in multiple stages.
  • the knowledge feature quantity estimation means 104 may output a continuous value in addition to outputting the discrete value as described above.
  • the knowledge feature amount estimation unit 104 may use, for example, a score that is output together with the output result of the identification process.
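The per-knowledge-feature identification with discrete and continuous outputs can be sketched as follows. A plain linear scorer stands in for the trained SVM here; the model format `{name: (weights, bias)}` and all values are illustrative assumptions, not the patent's representation.

```python
def estimate_knowledge_features(feature_vec, models):
    """feature_vec: concatenated language + dialogue feature amounts.
    models: {knowledge_feature_name: (weights, bias)} — one model per
    knowledge feature, as stored in the estimation model storage.
    Returns, per knowledge feature, a continuous score (the identifier's
    output score) and a discrete 0/1 "present" decision."""
    results = {}
    for name, (w, b) in models.items():
        # Linear decision function standing in for the SVM's score.
        score = sum(wi * xi for wi, xi in zip(w, feature_vec)) + b
        results[name] = {"score": score, "present": 1 if score > 0 else 0}
    return results
```

Multi-level discrimination, as described above, would chain several such identification stages rather than use a single threshold.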
  • the knowledge feature quantity estimation means 104 may adopt the number of knowledge feature quantities to be estimated, which is obtained by an experiment using development data and has an optimum knowledge level estimation accuracy. Further, the number of estimated knowledge feature amounts may be determined in advance manually.
  • the knowledge feature quantity estimation means 104 may determine the optimum number obtained by text analysis such as clustering of the described contents as the number of knowledge feature quantities to be estimated.
  • Here, a factor that led to the determination of the knowledge level corresponds to a knowledge feature, and the optimum number is the number of such knowledge feature amounts.
  • the knowledge level estimation means 106 has a function of estimating the knowledge level.
  • the knowledge level estimation means 106 estimates the knowledge level by integrating knowledge feature quantities.
  • the knowledge level estimation means 106 estimates the knowledge level using the knowledge feature quantity estimation result output from the knowledge feature quantity estimation means 104 and the knowledge level estimation model stored in the knowledge level estimation model storage means 105.
• the knowledge level estimation means 106 outputs the knowledge level estimation result from the knowledge level estimation model and the knowledge feature quantity estimation results by using an SVM or the like, as in the identification pattern learning process.
  • the knowledge level estimation means 106 may determine the knowledge level as a binary value of “present” or “absent” or a level of three or more values.
  • the knowledge level estimating means 106 can output the knowledge level discriminated at many levels by, for example, configuring the identification processing in multiple stages.
• as the continuous value, the knowledge level estimation means 106 may use, for example, a score output together with the output result of the identification process.
• the knowledge level estimating means 106 may use a majority method, which is a known technique. When the majority method is used, the knowledge level estimation means 106 treats the output result of each knowledge feature quantity as a discrete value, and adopts as the knowledge level output by the conversation analysis apparatus 100 the value output by the largest number of knowledge feature quantities. When the majority method is used, no knowledge level estimation model is necessary.
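As a concrete illustration of the majority method described above, the integration can be sketched as follows (a minimal sketch; the function name and the labels are illustrative assumptions, not part of the embodiment):

```python
from collections import Counter

def majority_knowledge_level(feature_outputs):
    """Integrate the discrete outputs of the knowledge feature quantities
    by majority vote; no knowledge level estimation model is needed."""
    counts = Counter(feature_outputs)
    # most_common(1) returns [(label, count)] for the most frequent label
    return counts.most_common(1)[0][0]

# Example: three of four knowledge feature quantities output "present",
# so the apparatus would output "present" as the knowledge level.
print(majority_knowledge_level(["present", "present", "absent", "present"]))
```

On ties, `Counter.most_common` returns labels in first-encountered order, so a real system would need an explicit tie-breaking rule.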
  • the conversation analysis apparatus 100 of the present embodiment is realized by a CPU (Central Processing Unit) that executes processing according to a program, for example.
  • the conversation analysis apparatus 100 may be realized by hardware.
• the utterance section calculation unit 101, the feature amount extraction unit 102, the knowledge feature amount estimation unit 104, the knowledge level estimation unit 106, the knowledge feature amount estimation model creation unit 110, and the knowledge level estimation model creation unit 111 are realized, for example, by a CPU that executes processing according to program control.
  • the knowledge feature quantity estimation model storage unit 103 and the knowledge level estimation model storage unit 105 are realized by, for example, a RAM (Random Access Memory).
  • FIG. 4 is a flowchart showing the operation of knowledge level estimation processing by the conversation analysis apparatus 100.
  • the utterance interval calculation means 101 calculates an utterance interval as a unit for calculating the language feature amount and the dialogue feature amount based on the information about the utterance described in the text data.
  • the information related to the utterance is, for example, an utterance detection section or speaker information (step S201).
  • the utterance interval calculation unit 101 may classify the utterance based on the calculated utterance interval. Based on the utterance interval or the classification result calculated by the utterance interval calculation unit 101, the feature amount extraction unit 102 calculates a language feature amount and a dialogue feature amount.
  • the language feature quantity extraction unit 102a calculates a language feature quantity such as word appearance frequency and word reliability related to the speech recognition result from the text data (step S202).
• the dialogue feature quantity extraction unit 102b calculates dialogue feature quantities such as speech speed, pause length, utterance length, and the number of back-channel responses, using the input voice data, the text data, and the utterance section information calculated in step S201 (step S203). Note that the process in step S203 may be executed before the process in step S202, or the two processes may be executed in parallel.
• the knowledge feature quantity estimation unit 104 estimates the knowledge feature amount using the language feature quantity calculated in step S202, the dialogue feature quantity calculated in step S203, and the knowledge feature quantity estimation model stored in the knowledge feature quantity estimation model storage unit 103 (step S204).
• the knowledge level estimation means 106 estimates the knowledge level using the knowledge feature quantity estimation result by the knowledge feature quantity estimation means 104 and the knowledge level estimation model stored in the knowledge level estimation model storage means 105 (step S205). After outputting the knowledge level estimation result, the conversation analysis apparatus 100 ends the process.
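The flow of steps S201 through S205 culminates in a two-stage estimation, which can be sketched as follows (a minimal sketch under stated assumptions: the embodiment identifies with SVMs, while simple linear threshold classifiers stand in for them here, and all function names, weights, and feature values are illustrative):

```python
def linear_classify(weights, bias, features):
    """Stand-in for SVM identification: thresholded linear score."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else 0  # 1 = "present", 0 = "absent"

def estimate_knowledge_level(language_feats, dialogue_feats, kf_models, level_model):
    # Step S204: estimate each knowledge feature quantity from the
    # language and dialogue feature quantities; different models may
    # weight the two kinds of features differently.
    combined = language_feats + dialogue_feats
    kf_outputs = [linear_classify(w, b, combined) for (w, b) in kf_models]
    # Step S205: integrate the knowledge feature quantity estimates
    # into a knowledge level with the knowledge level estimation model.
    w, b = level_model
    return linear_classify(w, b, kf_outputs)

# Illustrative models: two knowledge feature quantities and a level model.
kf_models = [([1.0, 0.0, 0.5], -0.6), ([0.0, 1.0, 0.5], -0.6)]
level_model = ([1.0, 1.0], -1.5)  # "high" only if both features present
print(estimate_knowledge_level([0.8], [0.7, 0.2], kf_models, level_model))
```

The sketch shows the structural point of the embodiment: the knowledge level model never sees the raw features, only the intermediate knowledge feature quantity estimates.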
  • the conversation analysis apparatus can robustly estimate the user's knowledge level even when a broken sentence different from the written word is input.
• the reason is that, when estimating the knowledge level, the conversation analyzer uses dialogue feature quantities, such as utterance timing and speech speed, that are not contained in the text data and are therefore unaffected by the accuracy of the speech recognition results or by the collapse of the text.
• the knowledge feature amount estimation unit estimates knowledge feature amounts obtained respectively from different feature amounts, namely the language feature amount and the dialogue feature amount.
• the conversation analyzer can reduce the influence on knowledge level estimation even when, for example, voice data of a casual speaking style is input or when the recognition rate is low.
• the reason (that is, the reason why the influence on knowledge level estimation is reduced) is that the estimation results of other knowledge feature amounts, obtained from dialogue feature amounts that are not affected by the language feature amounts, can complement an erroneous estimation result of a knowledge feature amount based on the language feature amounts.

[Evaluation experiment]
• the input data was stereo audio in which the operator's voice was recorded on one channel and the customer's voice on the other channel, together with the speech recognition result of that audio.
  • language features and dialogue features were extracted by the above method, and the knowledge level was estimated.
• the knowledge label, which is the correct data used for the evaluation, was assigned one of two values, “high knowledge level” or “low knowledge level”, for each call unit based on manual subjective evaluation. 100 files of correct data created in this way were prepared, and an evaluation experiment was performed. The breakdown of the 100 files was 46 files with “high knowledge level” and 54 files with “low knowledge level”.
  • 10-fold cross-validation was carried out when learning the knowledge feature estimation model, learning the knowledge level estimation model, and estimating the knowledge level.
• the data was divided into 10 groups, of which 9 groups were used as learning data and the remaining 1 group was used as evaluation data. A test was then performed on each of the 10 combinations of learning data and evaluation data created by changing which group served as the evaluation data.
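The fold construction described above can be sketched as follows (a minimal sketch; the function name and the splitting strategy are illustrative assumptions, not part of the evaluation procedure itself):

```python
def ten_fold_splits(items, n_folds=10):
    """Divide items into n_folds groups; each group serves once as the
    evaluation data while the remaining groups form the learning data."""
    folds = [items[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        eval_data = folds[i]
        train_data = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train_data, eval_data

files = list(range(100))  # 100 correct-answer files, as in the experiment
splits = list(ten_fold_splits(files))
print(len(splits))                           # 10 train/eval combinations
print(len(splits[0][0]), len(splits[0][1]))  # 90 learning, 10 evaluation
```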
• in defining the knowledge features, the evaluator described the reasons for judging the knowledge level, and the knowledge features were defined based on the written reasons for judgment.
• knowledge level judgment factors, for example “technical terms” and “conversation fluency”, were found, and the found judgment factors were used as knowledge features.
  • the knowledge feature amount label that becomes teacher data was generated by clustering the data based on the judgment factors.
  • the knowledge feature amount of “technical term” that is one of the knowledge features indicates whether or not the technical term is included in the target learning data. That is, in the knowledge feature amount of “technical term”, whether the corresponding speaker uses the technical term is expressed by “0 (no technical term)” or “1 (with technical term)”.
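The binary “technical term” knowledge feature amount described above can be sketched as follows (a minimal sketch; the function name and the term list are assumptions, and a real system would match terms against the speech recognition result for the target speaker):

```python
def technical_term_label(transcript_words, technical_terms):
    """Binary knowledge feature amount label for "technical term":
    1 if the speaker's words contain any technical term, else 0."""
    return 1 if any(w in technical_terms for w in transcript_words) else 0

terms = {"bandwidth", "latency"}  # illustrative technical-term list
print(technical_term_label(["the", "latency", "is", "high"], terms))  # 1
print(technical_term_label(["it", "seems", "slow"], terms))           # 0
```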
  • the generated knowledge features include factors representing linguistic features such as technical terms and factors representing interactive features related to the flow of conversation.
  • one knowledge feature quantity is estimated from only the language feature quantity.
  • Another knowledge feature amount is estimated only from the dialogue feature amount.
  • the remaining two knowledge feature quantities are estimated from combinations of language feature quantities and dialogue feature quantities.
• in those two combinations, the weights given to specific feature quantities among the language feature quantities and the dialogue feature quantities differ.
  • the knowledge level was estimated by integrating the output results of the four knowledge features estimated by the knowledge feature estimation model generated as described above.
• the recall and precision shown in Equation (1) are calculated, treating “high knowledge level” as the positive class, using the following equations:
• Recall = (number correctly estimated as “high knowledge level”) / (number of correct answers labeled “high knowledge level”) (2)
• Precision = (number correctly estimated as “high knowledge level”) / (number estimated as “high knowledge level”) (3)
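Equations (2) and (3) can be computed as follows (a minimal sketch; the function and label names are illustrative assumptions):

```python
def recall_precision(predicted, correct, positive="high"):
    """Recall (Equation (2)) and precision (Equation (3)) with
    "high knowledge level" as the positive class."""
    true_pos = sum(1 for p, c in zip(predicted, correct)
                   if p == positive and c == positive)
    actual_pos = sum(1 for c in correct if c == positive)
    predicted_pos = sum(1 for p in predicted if p == positive)
    return true_pos / actual_pos, true_pos / predicted_pos

predicted = ["high", "high", "low", "high"]
correct   = ["high", "low",  "low", "high"]
recall, precision = recall_precision(predicted, correct)
print(recall, precision)  # 2/2 = 1.0 and 2/3 ≈ 0.667
```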
• the conversation analysis system 10 includes knowledge feature quantity estimation means 13 (for example, knowledge feature quantity estimation means 104) for estimating knowledge feature quantities from the extracted conversation feature quantities and language feature quantities and from a knowledge feature quantity estimation model holding identification patterns indicating the knowledge feature quantities.
  • the conversation analysis system 10 further includes knowledge level estimation means 14 (for example, knowledge level estimation means 106) that estimates the knowledge level of the speaker by integrating the estimated knowledge feature quantities.
• the knowledge feature amount estimation model may hold an identification pattern indicating the knowledge feature amount learned from the language feature amounts and conversation feature amounts calculated from learning speech data and text data of that speech data, and from knowledge feature amount labels serving as teacher data (for example, the knowledge feature amount label 112).
• the conversation analysis system can thus use a knowledge feature quantity estimation model prepared in advance to suit the input language feature quantities and conversation feature quantities.
  • the conversation analysis system can estimate the knowledge level based on the knowledge level estimation model.
• the conversation analysis system can thus use a knowledge level estimation model prepared in advance to suit the input knowledge feature amount labels.
• the conversation analysis system 10 may include utterance section calculation means (for example, utterance section calculation means 101) for obtaining, from speech data and text data of the speech data, utterance sections in which speech detection sections by the same speaker are continuous.
  • the language feature amount extraction unit 12 may extract a language feature amount based on the utterance section, and the conversation feature amount extraction unit 11 may extract the conversation feature amount based on the utterance section.
  • the conversation analysis system can obtain an utterance section from input data, and can extract a language feature amount and a conversation feature amount based on the utterance section.
  • the knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based on the language feature quantity and the conversation feature quantity.
  • the knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based only on the language feature quantity.
  • the physical component can be realized as, for example, an electronic circuit or a computer device described later.
  • the logical component can be realized as a software program executed in an electronic circuit or a computer device, for example.
  • the means for providing the function of each component described above is realized as a component (unit) of an apparatus or a system in which the function is mounted.
• each component shown in the above figures may be realized using hardware that is partially or wholly integrated (an integrated circuit or a storage device in which processing logic is implemented).
  • the storage device 802 is a memory device such as a RAM that can be referred to from the arithmetic device 801, and stores software programs, various data, and the like. Note that the storage device 802 may be a volatile memory device.
  • the nonvolatile storage device 803 is a nonvolatile storage device such as a magnetic disk drive or a semiconductor storage device using flash memory.
  • the nonvolatile storage device 803 can store various software programs and data.
  • the drive device 804 is a device that processes reading and writing of data with respect to a storage medium 805 described later, for example.
  • the storage medium 805 is an arbitrary storage medium capable of recording data, such as an optical disk, a magneto-optical disk, and a semiconductor flash memory.
• the conversation analysis system (or its constituent elements) according to the present invention, described using the above embodiment as an example, may be realized, for example, by supplying to the hardware device illustrated in FIG. 8 a software program capable of realizing the functions described in the above embodiment. More specifically, the present invention may be realized by causing the arithmetic device 801 to execute a software program supplied to such a hardware device. In this case, an operating system running on the hardware device, database management software, network software, or middleware such as a virtual environment platform may execute part of each process.
• each means shown in the above figures (or a system component (unit) capable of realizing the means) can be realized as a software module, which is a unit of function (processing) of a software program executed by the hardware described above. That is, each component in the conversation analysis apparatus 100 (the utterance interval calculation unit 101, the feature amount extraction unit 102, the knowledge feature amount estimation unit 104, the knowledge level estimation unit 106, the knowledge feature amount estimation model creation unit 110, the knowledge level estimation model creation unit 111, and the like) may be realized as a software module in which those functions are implemented.
  • the classification of the software modules shown in the drawings is a configuration for convenience of explanation, and various configurations can be assumed for the implementation.
• these software modules are stored in the nonvolatile storage device 803, and may be read out to the storage device 802 when the arithmetic device 801 executes each process.
• in this case, the present invention can be understood as being constituted by the code constituting the software program, or by a computer-readable storage medium in which the code is recorded.
  • the storage medium is not limited to a medium independent of the hardware device, but includes a storage medium in which a software program transmitted via the Internet or the like is downloaded and stored or temporarily stored.
  • the conversation analysis system described above may be configured by a virtual environment in which the hardware device illustrated in FIG. 8 is virtualized and various software programs (computer programs) executed in the virtual environment.
  • the components of the hardware device illustrated in FIG. 8 are provided as virtual devices in the virtual environment.
  • the present invention can be realized with the same configuration as when the hardware device illustrated in FIG. 8 is configured as a physical device.
• the present invention can be applied, for example, to a conversation analysis device that analyzes a user's tendencies based on the knowledge level, using a voice database containing conversations at various customer contact points, such as a contact center, that is, conversations between a customer and a business person such as a store clerk or an operator. The present invention can also be applied to applications such as a program for realizing the conversation analysis apparatus using a computer. The present invention can further be applied to a conversation analysis apparatus that extracts linguistic features and dialogue features from the words and exchanges of a conversation to estimate characteristics such as a user's interests and preferences rather than the knowledge level.


Abstract

Provided is a conversation analysis system capable of robustly estimating a speaker's knowledge level even with input of a broken sentence different from a written sentence. The conversation analysis system comprises: a conversation feature amount extraction means (11) for extracting, from speech data and text data of the speech data, a conversation feature amount pertaining to a conversation state between speakers; a language feature amount extraction means (12) for extracting a language feature amount pertaining to a word included in the text data; a knowledge feature amount estimation means (13) for estimating a knowledge feature amount from the extracted conversation feature amount and language feature amount and a knowledge feature amount estimation model retaining an identification pattern representing a knowledge feature amount; and a knowledge level estimation means (14) for estimating the speaker's knowledge level by integrating the estimated knowledge feature amounts.

Description

Conversation analysis system, conversation analysis method, and storage medium on which conversation analysis program is recorded
The present invention relates to a conversation analysis system, a conversation analysis method, and a conversation analysis program for estimating a speaker's knowledge level from conversation.
The knowledge level corresponds to the result of classifying into two or more classes, or of quantifying, whether the target speaker is familiar with a predetermined theme or with peripheral information related to the predetermined theme. The predetermined theme is, for example, the subject of the dialogue itself.
Patent Document 1 describes an example of a conversation analysis device. As illustrated in FIG. 7, the knowledge amount estimation information generation device described in Patent Document 1 includes an utterance string extraction unit 1, an utterance intention determination unit 2, a feature amount extraction unit 3, an estimation information generation unit 4, a knowledge amount label 4a, a knowledge amount estimation unit 5, and an estimation information storage unit 5a.
The knowledge amount estimation information generation apparatus configured as shown in FIG. 7 is mainly divided into a learning unit and an estimation unit, and operates as follows.
When the speech recognition result 7 (learning call) for the dialogue between the user and the operator is input, the learning unit of the knowledge amount estimation information generation device extracts, with the utterance string extraction unit 1, text data composed of utterance strings. Next, with the utterance intention determination unit 2, the learning unit determines, from the text data of the utterance strings concerning the dialogue between the inquirer and the respondent extracted by the utterance string extraction unit 1, the utterances representing the utterance intentions of “question”, “explanation”, and “back-channel”. After the determination, the learning unit associates each utterance intention with the target utterance.
Next, with the feature amount extraction unit 3, the learning unit calculates the number of distinct words used by the user among the appearing words (hereinafter referred to as the used vocabulary feature amount). Along with the used vocabulary feature amount, the learning unit calculates the number of appearances of each of the utterance intentions “question”, “explanation”, and “back-channel” determined by the utterance intention determination unit 2.
The learning unit also extracts, from the utterances representing the “question” intention, utterances containing an interrogative as interrogative questions, and calculates their number of appearances. The feature amounts concerning the numbers of appearances of “question”, “explanation”, “back-channel”, and “interrogative question” are collectively referred to as intention feature amounts.
Next, with the estimation information generation unit 4, the learning unit generates estimation information used to estimate the knowledge amount for the input text (speech recognition result 7), using as learning data the intention feature amounts and used vocabulary feature amounts calculated by the feature amount extraction unit 3 and the knowledge amount label 4a, which is correct-answer information concerning the knowledge amount.
Next, for the input speech recognition result 6, the estimation unit performs, in the utterance string extraction unit 1, the utterance intention determination unit 2, and the feature amount extraction unit 3, processing similar to that performed by the learning unit, and obtains the used vocabulary feature amount and the dialogue feature amount. Then, with the knowledge amount estimation unit 5, the estimation unit estimates the knowledge amount from the calculated used vocabulary feature amount and dialogue feature amount and the estimation information generated by the learning unit and stored in the estimation information storage unit 5a.
JP 2013-167765 A
However, when the input text to be evaluated is not written language, that is, when it is not a sentence conforming to correct grammar, it is difficult for the knowledge amount estimation information generation device described in Patent Document 1 to estimate the user's knowledge amount. Sentences that do not conform to correct grammar are, for example, broken sentences such as colloquial expressions, or sentences containing recognition errors.
A general conversation analysis apparatus calculates a used vocabulary feature amount and an intention feature amount from the speech recognition result to be evaluated, and estimates the knowledge amount. The used vocabulary feature amount is a feature amount related to the appearing words. The intention feature amount is the number of utterances in each category when each utterance is classified into “question”, “explanation”, “back-channel”, and “interrogative question” by language processing such as pattern matching.
That is, both the used vocabulary feature amount and the intention feature amount are calculated based on language information. Calculating these various feature amounts presupposes that largely correct sentences are input.
The language information used for calculating these feature amounts is the appearing words, word strings, or character strings (hereinafter referred to as symbols) themselves. In addition, supplementary information such as the notation, part of speech, and meaning of a symbol, or statistical information based on symbols such as the appearance frequency obtained for each symbol, is also used.
Therefore, the accuracy with which the conversation analysis device estimates the user's knowledge amount depends heavily on the grammatical correctness of the utterance content, or on the accuracy of the recognition result when the utterance is recognized.
When the input speech recognition result is the result of the conversation speech to be evaluated being spoken according to correct grammar and being correctly recognized, the conversation analysis device can estimate the user's knowledge amount. However, when the conversation to be evaluated contains colloquial expressions, or when the speech recognition result contains many recognition errors, it is difficult for the conversation analysis device to calculate correct used vocabulary feature amounts and intention feature amounts. Unless correct used vocabulary feature amounts and intention feature amounts can be calculated, it is difficult for the conversation analysis device to estimate the user's knowledge amount correctly.
That is, the problem with a general conversation analysis apparatus is that, when a broken sentence different from written language is input, it is difficult to calculate correct used vocabulary feature amounts and intention feature amounts, and therefore it is difficult to estimate the user's knowledge amount correctly.
To solve the above problem, it is conceivable to use not only the content of the conversation between speakers but also the conversation state between speakers to estimate the speaker's knowledge level. The reason is that feature amounts such as utterance timing and speech speed, which are less affected by the accuracy of the speech recognition result or by the collapse of sentences, can be extracted from the conversation state, and the extracted feature amounts can be used for knowledge level estimation.
The method described in Non-Patent Document 2 uses feature amounts extracted from the conversation state between speakers for knowledge level estimation. However, when the method described in Non-Patent Document 2 is used, it is difficult to estimate the speaker's correct knowledge level, for example, when speech data of a casual speaking style is input or when the speech recognition rate is low. The reason is that the method described in Non-Patent Document 2 does not use knowledge feature amounts obtained respectively from different feature amounts, such as language feature amounts and dialogue feature amounts, for knowledge level estimation.
When the knowledge feature amounts obtained respectively from such different feature amounts are not used, the conversation analysis device cannot, for example, use the estimation result of a knowledge feature amount obtained from a different feature amount (for example, a dialogue feature amount not affected by the language feature amount) to complement an erroneous estimation result of a knowledge feature amount based on the language feature amount. For such a conversation analysis device, it is difficult to estimate the speaker's knowledge level correctly when, for example, a broken sentence different from written language is input.
The present invention has been made to solve the above-described problems. That is, one main object of the present invention is to provide a conversation analysis system, a conversation analysis method, and a conversation analysis program that can robustly estimate a speaker's knowledge level even when a broken sentence different from written language is input.
A conversation analysis system according to one aspect of the present invention includes: conversation feature amount extraction means for extracting, from speech data and text data of the speech data, a conversation feature amount, which is a feature amount related to the conversation state between speakers; language feature amount extraction means for extracting a language feature amount, which is a feature amount related to words contained in the text data; knowledge feature amount estimation means for estimating knowledge feature amounts from the extracted conversation feature amount and language feature amount and a knowledge feature amount estimation model holding identification patterns indicating knowledge feature amounts; and knowledge level estimation means for estimating the speaker's knowledge level by integrating the estimated knowledge feature amounts.
A conversation analysis method according to one aspect of the present invention extracts, from speech data and text data of the speech data, a conversation feature amount, which is a feature amount related to the conversation state between speakers; extracts a language feature amount, which is a feature amount related to words contained in the text data; estimates knowledge feature amounts from the extracted conversation feature amount and language feature amount and a knowledge feature amount estimation model holding identification patterns indicating knowledge feature amounts; and estimates the speaker's knowledge level by integrating the estimated knowledge feature amounts.
 本発明の一態様に係る会話分析プログラムは、コンピュータに、音声データおよび音声データのテキストデータから話者間の会話状態に関する特徴量である会話特徴量を抽出する会話特徴量抽出処理、テキストデータに含まれる単語に関する特徴量である言語特徴量を抽出する言語特徴量抽出処理、抽出された会話特徴量および言語特徴量と、知識特徴量を示す識別パターンを保持する知識特徴量推定モデルとから知識特徴量を推定する知識特徴量推定処理、および推定された知識特徴量を統合して話者の知識レベルを推定する知識レベル推定処理を実行させる。 A conversation analysis program according to an aspect of the present invention is a computer program that extracts speech feature data, which is a feature value related to a conversation state between speakers, from speech data and text data of speech data. Knowledge from language feature extraction processing that extracts language features, which are features related to the contained words, from the extracted conversation features and language features, and knowledge feature estimation models that hold identification patterns indicating knowledge features Knowledge feature amount estimation processing for estimating the feature amount and knowledge level estimation processing for estimating the knowledge level of the speaker by integrating the estimated knowledge feature amount are executed.
The object of the present invention is also achieved by a computer-readable storage medium storing the above conversation analysis program.
According to the present invention, a speaker's knowledge level can be estimated robustly even when broken sentences that differ from written language are input.
FIG. 1 is a block diagram illustrating a configuration example of the learning system of the conversation analysis apparatus according to the embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the concept of knowledge features in the embodiment of the present invention.
FIG. 3 is a block diagram illustrating a configuration example of the estimation system of the conversation analysis apparatus according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the conversation analysis apparatus 100.
FIG. 5 is an explanatory diagram showing the evaluation results of an evaluation experiment using the conversation analysis apparatus according to the embodiment of the present invention and of an evaluation experiment using another method.
FIG. 6 is a block diagram showing an outline of the conversation analysis system in the embodiment of the present invention.
FIG. 7 is a block diagram illustrating the configuration of the knowledge amount estimation information generation device described in Patent Document 1.
FIG. 8 is an explanatory diagram illustrating a hardware configuration capable of realizing the conversation analysis system or the conversation analysis apparatus according to the embodiment of the present invention.
[Description of configuration]
Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram illustrating a configuration example of the learning system of the conversation analysis apparatus according to the embodiment of the present invention.
In the present embodiment, the input speech data is dialogue speech data that can be played back stereophonically (hereinafter, "stereo"), with the voices of different speakers recorded on the left and right channels, respectively. Hereinafter, the configuration and operation of the conversation analysis apparatus according to the present embodiment will be described, taking the case of estimating a speaker's knowledge level as an example.
Note that the input speech data may be data reproducible by a method other than stereo. The input speech data may also be speech data of a dialogue among three or more people. Even when speech data of a dialogue among three or more people is input, the conversation analysis apparatus according to the present embodiment can estimate each speaker's knowledge level, provided that the speech data of each speaker is separated using speaker recognition technology or the like.
The learning system of the conversation analysis apparatus 100 shown in FIG. 1 includes utterance section calculation means 101 and feature amount extraction means 102. The learning system of the conversation analysis apparatus 100 also includes knowledge feature amount estimation model storage means 103, knowledge level estimation model storage means 105, knowledge feature amount estimation model creation means 110, and knowledge level estimation model creation means 111.
The utterance section calculation means 101 has a function of calculating utterance sections from the input speech data and text data related to the speech data, and outputting the calculated utterance sections. The text data related to the speech data may include, for example, text data of uttered words obtained by applying speech recognition to the speech data.
An utterance section is a section in which utterance detection sections by the same speaker are contiguous and grouped together. The utterance section is the unit for calculating language feature amounts and dialogue feature amounts.
An utterance detection section is a section in which a person speaks continuously without pausing for breath. Utterance detection sections are calculated automatically, for example, by preprocessing for speech recognition.
Note that an utterance detection section need not be an automatically detected section as-is; it may be a section with margins added before and after an automatically detected section. An utterance detection section also need not correspond to a section in which a person is actually speaking; it may simply be a section of fixed time length.
When the input text data describes utterance detection sections assigned during speech recognition or information related to the speakers (speaker information), the utterance section calculation means 101 may calculate the utterance sections from the described utterance detection sections and speaker information.
Furthermore, the utterance section calculation means 101 may classify utterances based on the calculated utterance sections. When the utterance section calculation means 101 classifies the utterances, the feature amount extraction means 102 obtains language feature amounts or dialogue feature amounts for each classified class. The obtained language feature amounts or dialogue feature amounts are used for estimating knowledge feature amounts, as described later.
An example of the method by which the utterance section calculation means 101 calculates utterance sections is described below. The utterance section calculation means 101 arranges the utterances of the two speakers in time series, using the speech section information and speaker information included in the input text data. When the input text data contains no utterance detection sections or speaker information, the utterance section calculation means 101 may obtain them by analyzing the input speech data.
Next, the utterance section calculation means 101 compares the utterance detection sections of one speaker (the main speaker) with those of the other speaker (the interlocutor), and detects utterances in which an utterance detection section of the interlocutor is completely contained within an utterance detection section of the main speaker. One example is a back-channel response by the interlocutor inserted while the main speaker is speaking. The utterance section calculation means 101 performs this detection of completely contained utterances on the utterances of both speakers.
Furthermore, among the remaining utterance detection sections excluding the completely contained ones, the utterance section calculation means 101 merges contiguous utterance detection sections of the same speaker into a single section. This merged section becomes the utterance section.
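The containment filtering and merging steps above can be sketched as follows; the function name and the (speaker, start, end) tuple representation are illustrative choices, not part of the specification:

```python
def compute_utterance_sections(detections):
    """detections: list of (speaker, start, end) tuples, sorted by start time.

    Step 1: drop sections completely contained within a section of the
    other speaker (e.g. back-channel responses during the main speaker's
    utterance). Step 2: merge contiguous sections of the same speaker.
    """
    # Step 1: remove sections fully contained in the other speaker's section.
    kept = []
    for i, (spk, s, e) in enumerate(detections):
        contained = any(
            other != spk and os <= s and e <= oe
            for j, (other, os, oe) in enumerate(detections) if j != i
        )
        if not contained:
            kept.append((spk, s, e))

    # Step 2: merge consecutive remaining sections of the same speaker.
    sections = []
    for spk, s, e in kept:
        if sections and sections[-1][0] == spk:
            prev_spk, prev_s, _ = sections[-1]
            sections[-1] = (prev_spk, prev_s, e)
        else:
            sections.append((spk, s, e))
    return sections
```

Applied to two speakers' detection sections, this yields the utterance sections on which the language feature amounts and dialogue feature amounts are then computed.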
Obtaining utterance sections through the above processing reveals semantic breaks that are not inherently explicit in speech. With the semantic breaks made explicit, the feature amount extraction means 102 can calculate more accurate feature amounts.
Note that the utterance section calculation means 101 may also use the utterance detection sections (utterance start time, utterance end time) and speaker information obtained from the input text data directly as utterance sections. When the utterance detection sections and speaker information are used as utterance sections as-is, the above processing by the utterance section calculation means 101 becomes unnecessary.
Furthermore, the utterance section calculation means 101 may classify utterances according to predetermined criteria. One example of an utterance classification method is a method based on which speaker takes control of the conversation.
Based on the initiative in the conversation, the utterance section calculation means 101 classifies each utterance in the utterance sections calculated as described above into two classes: utterances in which the speaker holds the initiative (hereinafter, leading utterances) and utterances in which the speaker does not (hereinafter, passive utterances).
One example of a method for determining the presence or absence of initiative uses the length of the utterance section and the types of words appearing in it. For each utterance in the utterance sections calculated as described above, the utterance section calculation means 101 classifies, for example, utterances whose utterance section is shorter than a threshold as passive utterances.
The utterance section calculation means 101 may also classify as passive utterances those utterance sections containing words with few phonemes (for example, "yes" or "no"), which are easily misrecognized under the influence of acoustic or recording conditions, or words whose recognition results have low confidence.
After classifying the target utterances as passive utterances as described above, the utterance section calculation means 101 classifies the remaining utterances as leading utterances. The feature amount extraction means 102 treats the classification results produced by the utterance section calculation means 101 as classification classes, and obtains language feature amounts or dialogue feature amounts for each class.
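A minimal sketch of this initiative classification follows, assuming an illustrative length threshold, few-phoneme word list, and confidence threshold; none of these concrete values are given in the text:

```python
SHORT_SECTION_THRESHOLD = 1.0   # seconds; illustrative value
FEW_PHONEME_WORDS = {"yes", "no"}  # illustrative few-phoneme words
LOW_CONFIDENCE = 0.5            # illustrative recognition-confidence threshold

def classify_utterance(duration, words, confidences):
    """Return 'passive' or 'leading' for one utterance section.

    duration: length of the utterance section in seconds;
    words: recognized words; confidences: per-word recognition confidence.
    """
    if duration < SHORT_SECTION_THRESHOLD:
        return "passive"          # too short to carry the initiative
    if any(w in FEW_PHONEME_WORDS for w in words):
        return "passive"          # few-phoneme words, easily misrecognized
    if any(c < LOW_CONFIDENCE for c in confidences):
        return "passive"          # low-confidence recognition result
    return "leading"              # everything else is a leading utterance
```

For example, a 0.5-second utterance would be classified as passive, while a longer utterance of well-recognized content words would be classified as leading.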
The reason for classifying utterances into leading and passive utterances is that classification makes speaker characteristics more apparent. For example, for utterance length, which is one of the dialogue feature amounts, statistics such as the mean or variance change greatly depending on the proportion of passive utterances. Therefore, computing utterance-length feature amounts only over the leading utterances yields feature amounts for the utterances the speaker produced on their own initiative, making the speaker's characteristics easier to capture than when the utterances are not classified.
Note that, besides treating an utterance's own classification result as its classification class as described above, the feature amount extraction means 102 may use the classification results of the utterances before and after the target utterance as the classification class. The feature amount extraction means 102 may also use a combination of the classification result of the target utterance and the classification results of the preceding and following utterances as the classification class.
The feature amount extraction means 102 includes language feature amount extraction means 102a and dialogue feature amount extraction means 102b. Text data, speech data, utterance section detection results, utterance classification results, and the like are input to the feature amount extraction means 102. Based on these input data, the feature amount extraction means 102 outputs language feature amounts and dialogue feature amounts.
The language feature amount extraction means 102a has a function of extracting language feature amounts calculated from the input text data. Specifically, language feature amounts include the frequencies of words appearing in the input text data, statistics based on those frequencies, and the like. When the text data is the result of speech recognition, the extracted language feature amounts also include the recognition confidence assigned to each recognized word.
Note that the language feature amount extraction means 102a may obtain feature amounts using the classes to which the recognized words belong. The language feature amount extraction means 102a may also replace appearing words with other symbols, for example by normalizing orthographic variants or expanding synonyms, and obtain the feature amounts from the replacement symbols rather than from the appearing words themselves.
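The word-frequency features with symbol replacement described above might look as follows; the normalization table, example words, and function name are illustrative:

```python
from collections import Counter

# Illustrative orthographic normalization: map variant spellings
# to a single symbol before counting.
NORMALIZE = {"colour": "color"}

def language_features(words):
    """Relative frequencies over normalized words in one utterance section."""
    normalized = [NORMALIZE.get(w, w) for w in words]
    counts = Counter(normalized)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

With the normalization table, "colour" and "color" are counted as one symbol, so surface variation does not fragment the frequency statistics.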
The dialogue feature amount extraction means 102b has a function of extracting dialogue feature amounts, which are feature amounts relating to the state of the dialogue between speakers and are calculated mainly from the speech data. A dialogue feature amount is a feature amount that can be obtained when two or more people converse. Dialogue feature amounts are calculated with the utterance sections described above as the reference.
That is, the dialogue feature amount extraction means 102b can obtain, for example, a speaker's speech rate, utterance length, number of back-channel responses, and so on by analyzing the utterance sections sandwiched between the interlocutor's utterance sections. The dialogue feature amount extraction means 102b can also obtain dialogue feature amounts by analyzing utterance sections sandwiched between the beginning of the data and an utterance section of the interlocutor, and utterance sections sandwiched between an utterance section of the interlocutor and the end of the data.
Once the utterance sections of each speaker have been determined, the dialogue feature amount extraction means 102b can also calculate the pause length values described later. In this way, the dialogue feature amount extraction means 102b can obtain various dialogue feature amounts with the utterance sections as the reference.
The following describes specific examples of dialogue feature amounts: speech rate, pause length, number of back-channel responses, and utterance length.
Speech rate is the speed at which a speaker speaks within one unit of dialogue. Speech rate is expressed, for example, as the number of morae per unit time. For a given utterance section, the speech rate is obtained, for example, by dividing the number of morae in the recognized words by the length of the utterance section. A mora is a syllable-like unit that carries a single beat of rhythm.
In the present embodiment, pause length means the length of the "gap" that occurs when a speaker change takes place. The pause length is calculated as the difference between the utterance end time of the utterance section immediately preceding the target utterance section and the utterance start time of the target utterance section.
The utterance length is the length of one utterance section, that is, the length of time from the utterance start time to the utterance end time of the section.
The number of back-channel responses is the number of times the interlocutor produces such responses. A back-channel response signals that the interlocutor understands what the speaker is saying, or encourages the speaker to continue.
The dialogue feature amount extraction means 102b may identify back-channel responses by pattern matching based on the recognition results, or based on the utterance length. The dialogue feature amount extraction means 102b may also identify back-channel responses using the containment-relation information between utterances, which is one example of the utterance classification results described above.
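The four dialogue feature amounts above can be sketched as follows, assuming each utterance section is represented by its start and end times and, where needed, a recognized mora count; the function names are illustrative, and the back-channel count uses the containment-based identification as one of the possible methods:

```python
def speech_rate(mora_count, start, end):
    """Morae per second over one utterance section."""
    return mora_count / (end - start)

def pause_length(prev_end, start):
    """Gap between the preceding section's end time and this section's start."""
    return start - prev_end

def utterance_length(start, end):
    """Duration of one utterance section."""
    return end - start

def backchannel_count(own_sections, other_sections):
    """Count the other speaker's sections fully contained in one's own
    sections (one possible containment-based identification of
    back-channel responses)."""
    return sum(
        1
        for (os, oe) in other_sections
        if any(s <= os and oe <= e for (s, e) in own_sections)
    )
```

For example, 10 morae over a 2-second section give a speech rate of 5 morae per second, and an interlocutor section lying entirely inside one's own section counts as one back-channel response.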
The knowledge feature amount estimation model creation means 110 has a function of generating a knowledge feature amount estimation model. The knowledge feature amount estimation model creation means 110 generates the knowledge feature amount estimation model using learning data that includes the language feature amounts and dialogue feature amounts extracted by the feature amount extraction means 102 from the learning speech data and text data, together with knowledge feature amount labels 112, which are teacher data representing the knowledge feature amounts for the learning speech data. The knowledge feature amount estimation model creation means 110 sends the created knowledge feature amount estimation model to the knowledge feature amount estimation model storage means 103.
The knowledge feature amount estimation model storage means 103 has a function of storing the knowledge feature amount estimation model created by the knowledge feature amount estimation model creation means 110. FIG. 2 is an explanatory diagram showing the concept of knowledge features in the present embodiment.
A knowledge feature is an element that determines a speaker's knowledge level, based on the language feature amounts and dialogue feature amounts extracted from the words the speaker uses and from the speaker's reactions.
The knowledge feature amount estimation model is a model for estimating knowledge feature amounts from the input data. The knowledge feature amount estimation model is generated using learning data that includes the language feature amounts and dialogue feature amounts extracted by the feature amount extraction means 102 from the learning speech data and text data, together with the knowledge feature amount labels 112 representing the knowledge feature amounts for the learning speech data.
As described above, the knowledge feature amount estimation model is a model generated by learning identification patterns from learning data in which the language feature amounts, dialogue feature amounts, and knowledge feature amount labels 112 form one set. A known technique such as the Support Vector Machine (SVM) (Non-Patent Document 1) is used to learn the identification patterns.
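As one possible concrete form of this identification-pattern learning, the sketch below trains a linear SVM with a stochastic subgradient method using only the standard library; the toy feature vectors, the ±1 label encoding, and the omission of a bias term are illustrative simplifications, not the apparatus's actual training procedure:

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=300):
    """Learn a linear SVM weight vector by stochastic subgradient descent.

    X: feature vectors (e.g. language + dialogue feature amounts);
    y: knowledge feature labels encoded as +1 (present) / -1 (absent);
    lam: L2 regularization strength (illustrative value).
    """
    rng = random.Random(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            if margin < 1:
                # hinge-loss subgradient step plus regularization shrink
                w = [(1 - eta * lam) * wj + eta * y[i] * xj
                     for wj, xj in zip(w, X[i])]
            else:
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    """+1 if the knowledge feature is estimated present, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

In the apparatus, one such model would be trained per knowledge feature, with the knowledge feature amount labels 112 supplying the ±1 targets.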
The knowledge level estimation model creation means 111 has a function of generating a knowledge level estimation model. The knowledge level estimation model creation means 111 generates a knowledge level estimation model that has learned knowledge level identification patterns, using the knowledge feature amount labels 112 and knowledge labels 113, which are teacher data for the knowledge level.
The knowledge level estimation model creation means 111 generates a knowledge level estimation model that has learned knowledge level identification patterns, using the results output for the learning data by the knowledge feature amount estimation means 104 described later, together with the knowledge labels 113. The knowledge level estimation model creation means 111 then sends the generated knowledge level estimation model to the knowledge level estimation model storage means 105. Note that the knowledge level estimation model creation means 111 may use the knowledge feature amount labels 112 instead of the output results of the knowledge feature amount estimation means 104 for the learning data.
The knowledge level estimation model storage means 105 has a function of storing the knowledge level estimation model created by the knowledge level estimation model creation means 111.
The knowledge level estimation model is a model for estimating the knowledge level from the input data. The knowledge level estimation model is generated by learning identification patterns using the results output for the learning data by the knowledge feature amount estimation means 104 described later, or the knowledge feature amount labels 112 for the learning data, together with the knowledge labels 113.
As described above, the knowledge level estimation model is a model generated by learning identification patterns from learning data in which the knowledge feature amount labels 112, or the knowledge feature amount estimation results for the learning data, and the knowledge labels 113 form one set. As with the knowledge feature amount estimation model, SVM or the like is used to learn the identification patterns.
Next, the estimation system of the conversation analysis apparatus 100 is described. FIG. 3 is a block diagram illustrating a configuration example of the estimation system of the conversation analysis apparatus according to the embodiment of the present invention.
The estimation system of the conversation analysis apparatus 100 shown in FIG. 3 includes the utterance section calculation means 101, the feature amount extraction means 102, the knowledge feature amount estimation model storage means 103, knowledge feature amount estimation means 104, the knowledge level estimation model storage means 105, and knowledge level estimation means 106. The knowledge feature amount estimation means 104 and the knowledge level estimation means 106, which are not included in the learning system but only in the estimation system, are described below.
The knowledge feature amount estimation means 104 has a function of estimating knowledge feature amounts. The knowledge feature amount estimation means 104 estimates each knowledge feature amount for the input data, using the language feature amounts and dialogue feature amounts calculated by the feature amount extraction means 102 and the knowledge feature amount estimation models stored in the knowledge feature amount estimation model storage means 103.
The knowledge feature amount estimation means 104 quantifies each knowledge feature amount as a discrete value such as "0" or "1", or as a continuous value in a range such as "0" to "1", and outputs it. The knowledge feature amount estimation model storage means 103 stores at least one knowledge feature amount estimation model for each knowledge feature amount estimated by the knowledge feature amount estimation means 104.
When estimating a knowledge feature amount, the knowledge feature amount estimation means 104 determines whether the knowledge feature is present by matching the language feature amounts and dialogue feature amounts obtained from the input data against the knowledge feature amount estimation model. As in learning the identification patterns, SVM or the like is used in the knowledge feature identification processing. When there are multiple knowledge features, the knowledge feature amount estimation means 104 performs the identification processing for each knowledge feature.
Besides determining a knowledge feature amount as the two values "present" or "absent", the knowledge feature amount estimation means 104 may determine it as three values such as "present", "absent", and "unknown". The knowledge feature amount estimation means 104 may also determine the knowledge feature amount at more than three levels. When determining knowledge feature amounts at many levels, the knowledge feature amount estimation means 104 can output the knowledge feature amount labels 112 by arranging the above identification processing in multiple stages.
Besides outputting discrete values as described above, the knowledge feature amount estimation means 104 may output continuous values. When outputting continuous values, the knowledge feature amount estimation means 104 may use, for example, the score output together with the result of the above identification processing.
As the number of knowledge feature amounts to estimate, the knowledge feature amount estimation means 104 may adopt the number that optimizes the knowledge level estimation accuracy, obtained through experiments using development data. The number of knowledge feature amounts to estimate may also be determined manually in advance.
Furthermore, suppose that when the knowledge labels 113 are created, the creator describes the reason for assigning each knowledge label 113 in addition to assigning it. In this case, the knowledge feature amount estimation means 104 may set the number of knowledge feature amounts to estimate to the optimal number obtained by text analysis, such as clustering, of the described contents. When that optimal number is used as the number of knowledge feature amounts, the factors that led to the knowledge level judgments correspond to the knowledge features.
 The knowledge level estimation means 106 has a function of estimating the knowledge level. The knowledge level estimation means 106 estimates the knowledge level by integrating the knowledge feature quantities.
 The knowledge level estimation means 106 estimates the knowledge level by using the knowledge feature quantity estimation results output by the knowledge feature quantity estimation means 104 and the knowledge level estimation model stored in the knowledge level estimation model storage means 105. As in the learning processing of the identification patterns, the knowledge level estimation means 106 outputs a knowledge level estimation result from the knowledge level estimation model and the knowledge feature quantity estimation results by using an SVM or the like.
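 As an illustrative sketch only, the integration step can be pictured as a learned linear decision function, of the kind an SVM applies, over the vector of knowledge feature quantity estimates. The weights and bias below are invented values standing in for the learned knowledge level estimation model, not parameters taken from the disclosure.

```python
weights = [0.8, 0.5, 0.9, 0.3]   # one stand-in weight per knowledge feature quantity
bias = -1.0                       # stand-in for the learned decision threshold

def estimate_knowledge_level(knowledge_features):
    """Integrate knowledge feature quantity estimates into a level and a score."""
    score = sum(w * x for w, x in zip(weights, knowledge_features)) + bias
    return ("high" if score > 0 else "low"), score

level, score = estimate_knowledge_level([1, 1, 0, 1])  # e.g. four binary estimates
print(level)  # high
```

The continuous `score` corresponds to the score output together with the identification result, which may be used when a continuous knowledge level is required.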
 The knowledge level estimation result output by the knowledge level estimation means 106 is the output result of the conversation analysis apparatus 100. The output result of the knowledge level may be one of two or more classes (a discrete value) or a continuous value.
 When the output result of the knowledge level is a discrete value, the knowledge level estimation means 106 may discriminate the knowledge level as the binary values "present" and "absent", or at three or more levels. When discriminating the knowledge level at many levels, the knowledge level estimation means 106 can output a knowledge level discriminated at many levels by, for example, arranging the above identification processing in multiple stages.
 When the output result of the knowledge level is a continuous value rather than a discrete value, the knowledge level estimation means 106 uses, for example, the score output together with the result of the identification processing.
 The knowledge level estimation means 106 may also use the majority voting method, which is a known technique. When the majority voting method is used, the knowledge level estimation means 106 treats the output result of each knowledge feature quantity as a discrete value and adopts the value that occurs most frequently among those output results as the knowledge level output by the conversation analysis apparatus 100. When the majority voting method is used, no knowledge level estimation model is required.
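 The majority voting method described above can be sketched as follows; the string labels are illustrative discrete outputs of the individual knowledge feature quantities.

```python
from collections import Counter

def majority_vote(feature_outputs):
    """Adopt the most frequent discrete output as the knowledge level."""
    counts = Counter(feature_outputs)
    return counts.most_common(1)[0][0]

# Three of four knowledge feature quantities indicate "high", so "high" wins.
print(majority_vote(["high", "low", "high", "high"]))  # high
```

Because the result depends only on counting the discrete outputs, no knowledge level estimation model is involved in this variant.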
 Note that the conversation analysis apparatus 100 of this exemplary embodiment is realized by, for example, a CPU (Central Processing Unit) that executes processing according to a program. The conversation analysis apparatus 100 may also be realized by hardware.
 The utterance section calculation means 101, the feature quantity extraction means 102, the knowledge feature quantity estimation means 104, the knowledge level estimation means 106, the knowledge feature quantity estimation model creation means 110, and the knowledge level estimation model creation means 111 are realized by, for example, a CPU that executes processing under program control.
 A hardware configuration capable of realizing the conversation analysis apparatus 100 will be described later.
 The knowledge feature quantity estimation model storage means 103 and the knowledge level estimation model storage means 105 are realized by, for example, a RAM (Random Access Memory).
[Description of operation]
 The operation of the conversation analysis apparatus 100 of this exemplary embodiment will be described below with reference to FIG. 4. FIG. 4 is a flowchart showing the operation of the knowledge level estimation processing performed by the conversation analysis apparatus 100.
 Speech data and text data related to the speech data are input to the utterance section calculation means 101. The input text data is, for example, a speech recognition result or a transcription.
 After the input, the utterance section calculation means 101 calculates utterance sections, which are the units for calculating the language feature quantities and the dialogue feature quantities, based on the information about utterances described in the text data. The information about utterances is, for example, utterance detection sections and speaker information (step S201).
 In step S201, the utterance section calculation means 101 may classify the utterances based on the calculated utterance sections. Based on the utterance sections or the classification results calculated by the utterance section calculation means 101, the feature quantity extraction means 102 calculates the language feature quantities and the dialogue feature quantities.
 Next, the language feature quantity extraction means 102a calculates, from the text data, language feature quantities such as word appearance frequencies and word confidence scores of the speech recognition result (step S202).
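 A minimal sketch of the word appearance frequency part of step S202 is shown below. It assumes already-tokenized recognized text; real language feature quantities would also include word confidence scores from the recognizer, which are omitted here.

```python
from collections import Counter

def word_frequencies(tokens):
    """Relative appearance frequency of each word in the recognized text."""
    total = len(tokens)
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}

freqs = word_frequencies(["router", "reset", "the", "router"])
print(freqs["router"])  # 0.5
```

Such per-word frequencies form one component of the language feature quantity vector passed on to the knowledge feature quantity estimation.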
 Next, the dialogue feature quantity extraction means 102b calculates dialogue feature quantities such as the speech rate, pause length, utterance length, or number of back-channel responses by using the input speech data, the text data, and the utterance section information calculated in step S201 (step S203). Note that the processing in step S203 may be executed before the processing in step S202, or the two processes may be executed in parallel.
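 The dialogue feature quantities of step S203 can be sketched from the utterance section information alone. In this illustration each section is a `(start_sec, end_sec, word_count)` tuple; that field layout is an assumption made for the example, not the format used by the apparatus.

```python
def dialogue_features(sections):
    """Compute simple dialogue feature quantities from utterance sections."""
    total_time = sum(end - start for start, end, _ in sections)
    total_words = sum(words for _, _, words in sections)
    # Pause length: gap between the end of one section and the start of the next.
    pauses = [b_start - a_end
              for (_, a_end, _), (b_start, _, _) in zip(sections, sections[1:])]
    return {
        "speech_rate": total_words / total_time,              # words per second
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
        "mean_utterance_len": total_time / len(sections),     # seconds
    }

feats = dialogue_features([(0.0, 2.0, 6), (2.5, 4.5, 4)])
print(feats["speech_rate"])  # 2.5
```

Because these quantities depend only on timing, they remain usable even when the recognized text itself is unreliable, which is the property the embodiment relies on.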
 Next, the knowledge feature quantity estimation means 104 estimates the knowledge feature quantities by using the language feature quantities calculated in step S202, the dialogue feature quantities calculated in step S203, and the knowledge feature quantity estimation model stored in the knowledge feature quantity estimation model storage means 103 (step S204).
 Next, the knowledge level estimation means 106 estimates the knowledge level by using the knowledge feature quantity estimation results produced by the knowledge feature quantity estimation means 104 and the knowledge level estimation model stored in the knowledge level estimation model storage means 105 (step S205). After outputting the knowledge level estimation result, the conversation analysis apparatus 100 ends the processing.
 The conversation analysis apparatus in this exemplary embodiment includes feature quantity extraction means that includes: language feature quantity extraction means for extracting, from dialogue speech data and text data of the dialogue speech data as input data, feature quantities related to the words included in the text data; and dialogue feature quantity extraction means for calculating, from the speech data, feature quantities related to the state of the dialogue between the speakers. The conversation analysis apparatus also includes knowledge feature quantity estimation means for estimating knowledge feature quantities calculated from the language feature quantities and the dialogue feature quantities, and knowledge level estimation means for estimating the knowledge level by using the knowledge feature quantity estimation results of the knowledge feature quantity estimation means. The conversation analysis apparatus may further include knowledge feature quantity estimation model storage means for storing a knowledge feature quantity estimation model that holds identification patterns indicating the knowledge feature quantities used for knowledge level estimation.
 The conversation analysis apparatus in this exemplary embodiment can robustly estimate the user's knowledge level even when colloquial, ill-formed sentences different from written language are input. The reason is that, when estimating the knowledge level, the conversation analysis apparatus uses dialogue feature quantities, such as the timing and speed of utterances, that are not contained in the text data and are not easily affected by the accuracy of the speech recognition result or by ill-formed sentences.
 Furthermore, in the conversation analysis apparatus, the knowledge feature quantity estimation means estimates knowledge feature quantities obtained from two different kinds of feature quantities: language feature quantities and dialogue feature quantities. This enables the conversation analysis apparatus to reduce the influence on the knowledge level estimation even when, for example, casually spoken speech data is input or the recognition rate is low. The reason (that is, the reason why the influence on the knowledge level estimation is reduced) is that the estimation results of other knowledge feature quantities, obtained from dialogue feature quantities that are not affected by the language feature quantities, can compensate for erroneous estimation results of knowledge feature quantities based on the language feature quantities.
[Evaluation experiment]
 An example of an evaluation experiment on the conversation analysis apparatus of this exemplary embodiment will be described below with reference to FIG. 5. The evaluation experiment described below estimates the knowledge level of customers in contact center calls. A contact center call is a telephone conversation with a contact center that accepts inquiries and consultations about products or services. Note that the contents shown in FIG. 5 are numerical results based on experiments actually performed.
 In the evaluation experiment, the input data consisted of stereo speech, in which the operator's voice was recorded on one channel and the customer's voice on the other, and the speech recognition results of the stereo speech. Based on the input data, language feature quantities and dialogue feature quantities were extracted by the method described above, and the knowledge level was estimated.
 The knowledge labels, which are the correct-answer data used for the evaluation, were assigned one of two values, "high knowledge level" or "low knowledge level", for each call based on manual subjective evaluation. One hundred files of correct-answer data created in this way were prepared, and the evaluation experiment was performed. Of the 100 files, 46 were "high knowledge level" files and 54 were "low knowledge level" files.
 Ten-fold cross-validation was performed when learning the knowledge feature quantity estimation model, learning the knowledge level estimation model, and evaluating the knowledge level estimation. In the ten-fold cross-validation, the data was divided into ten groups; nine of the groups were used as learning data and the remaining group as evaluation data. The validation was then performed on the ten combinations of learning data and evaluation data created by changing which group was used as the evaluation data.
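 The ten-fold cross-validation procedure described above can be sketched as follows; the integer file IDs stand in for the 100 correct-answer files.

```python
def ten_fold_splits(items, k=10):
    """Yield the k train/evaluation combinations of k-fold cross-validation."""
    folds = [items[i::k] for i in range(k)]   # split the data into k groups
    for i in range(k):
        eval_data = folds[i]                  # one group is evaluation data
        train_data = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train_data, eval_data           # the other k-1 are learning data

files = list(range(100))  # stand-in for the 100 correct-answer files
splits = list(ten_fold_splits(files))
print(len(splits))                            # 10
print(len(splits[0][0]), len(splits[0][1]))   # 90 10
```

Each of the ten combinations uses a different group as evaluation data, so every file is evaluated exactly once.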
 In this evaluation experiment, the users described their reasons for judging the knowledge level, and the knowledge features were defined based on the described reasons. Specifically, by analyzing the described reasons, four judgment factors for the knowledge level (for example, "technical terms" and "conversational fluency") were found, and the found judgment factors were used as the knowledge features.
 The knowledge feature quantity labels serving as teacher data were generated by clustering the data based on the judgment factors. In the generated knowledge feature quantity labels, for example, the knowledge feature quantity of "technical terms", which is one of the knowledge features, indicates whether technical terms are included in the target learning data. That is, for the "technical terms" knowledge feature quantity, whether the speaker uses technical terms is expressed as "0 (no technical terms)" or "1 (technical terms used)".
 Similarly, the knowledge feature quantity of "conversational fluency", which is another knowledge feature, expresses whether the conversation in the target learning data was fluent as "0 (not fluent)" or "1 (fluent)".
 Then, a knowledge feature quantity estimation model is generated from the language feature quantities and dialogue feature quantities extracted from the learning data and the knowledge feature quantity labels serving as teacher data. Next, as with the learning data, language feature quantities and dialogue feature quantities are obtained for the evaluation data, and the four knowledge feature quantities are estimated by using the generated knowledge feature quantity estimation model.
 The generated knowledge features include factors representing linguistic characteristics, such as technical terms, and factors representing interactive characteristics related to the flow of the conversation. Of the four knowledge feature quantities, one is estimated from the language feature quantities only, and another is estimated from the dialogue feature quantities only.
 The remaining two knowledge feature quantities are estimated from combinations of the language feature quantities and the dialogue feature quantities. In the estimation of these two knowledge feature quantities, the weights given to the individual language feature quantities and dialogue feature quantities differ.
 The knowledge level was then estimated by integrating the output results of the four knowledge feature quantities estimated by the knowledge feature quantity estimation model generated as described above.
 As the evaluation index for the experiment, the F-measure was obtained for each of the "high knowledge level" and "low knowledge level" experimental patterns. The F-measure is calculated by the following equation.
 F-measure = (2 × recall × precision) / (recall + precision) ... Equation (1)
 The recall and precision in Equation (1) are calculated for the "high knowledge level" case by the following equations.
 Recall = (number of calls correctly estimated as "high knowledge level") / (number of correct "high knowledge level" calls) ... Equation (2)
 Precision = (number of calls correctly estimated as "high knowledge level") / (number of calls estimated as "high knowledge level") ... Equation (3)
 Similarly, the recall and precision in Equation (1) are calculated for the "low knowledge level" case by the following equations.
 Recall = (number of calls correctly estimated as "low knowledge level") / (number of correct "low knowledge level" calls) ... Equation (4)
 Precision = (number of calls correctly estimated as "low knowledge level") / (number of calls estimated as "low knowledge level") ... Equation (5)
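 Equations (1) through (3) can be worked through numerically as follows. The counts used here are illustrative values chosen for the example, not the experiment's actual results.

```python
def f_measure(correctly_estimated, n_correct, n_estimated):
    """F-measure per Equations (1)-(3) for one class."""
    recall = correctly_estimated / n_correct        # Equation (2)
    precision = correctly_estimated / n_estimated   # Equation (3)
    return 2 * recall * precision / (recall + precision)  # Equation (1)

# e.g. 40 of the 46 "high knowledge level" files found,
# with 50 files estimated as "high knowledge level" in total
print(round(f_measure(40, 46, 50), 3))  # 0.833
```

The same function applies to the "low knowledge level" class with the counts of Equations (4) and (5).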
 In this evaluation experiment, three methods were compared: the method described in Patent Literature 1 above, a method that estimates the knowledge level directly from the language feature quantities and dialogue feature quantities without using knowledge feature quantities, and the method of this exemplary embodiment. In the method that estimates the knowledge level directly without using knowledge feature quantities, the same language feature quantities and dialogue feature quantities as those used in the method of this exemplary embodiment were used.
 The results of the evaluation experiment are shown in FIG. 5. FIG. 5 is an explanatory diagram showing the evaluation results of the evaluation experiment using the conversation analysis apparatus of this exemplary embodiment together with the evaluation results of evaluation experiments using the other methods. FIG. 5(a) compares the evaluation results of the method described in Patent Literature 1 ("related method" in FIG. 5(a)) with those of the method of this exemplary embodiment ("proposed method" in FIG. 5(a)). FIG. 5(b) compares the evaluation results of the method that estimates the knowledge level directly from the language feature quantities and dialogue feature quantities without using knowledge feature quantities ("without knowledge feature quantities" in FIG. 5(b)) with those of the method of this exemplary embodiment ("proposed method" in FIG. 5(b)).
 FIG. 5(a) shows two groups of evaluation results. The left group in FIG. 5(a) shows the results of the evaluation experiment that estimates the knowledge level from the "high knowledge level" data, and the right group shows the results of the evaluation experiment that estimates the knowledge level from the "low knowledge level" data.
 Each group of evaluation results shown in FIG. 5(a) consists of two kinds of data. The left bar graph in FIG. 5(a) shows the evaluation result of the "related method", and the right bar graph shows that of the "proposed method". FIG. 5(b) is organized in the same manner as FIG. 5(a).
 As shown in FIG. 5(a), when the "proposed method" was used, the estimation accuracy of the knowledge level was higher than that of the "related method" in both the "high knowledge level" and "low knowledge level" experimental patterns. As shown in FIG. 5(b), when the "proposed method" was used, the estimation accuracy of the knowledge level was higher than that of "without knowledge feature quantities" in both experimental patterns. From the above, in knowledge level estimation targeting conversation content, the method using the conversation analysis apparatus of this exemplary embodiment is more effective than the other methods.
 Next, an overview of the present invention will be described. FIG. 6 is a block diagram showing an overview of the conversation analysis system in the exemplary embodiment of the present invention. The conversation analysis system 10 in the exemplary embodiment of the present invention includes conversation feature quantity extraction means 11 (for example, the dialogue feature quantity extraction means 102b) for extracting, from speech data and text data of the speech data, conversation feature quantities (for example, dialogue feature quantities), which are feature quantities related to the state of the conversation between speakers. The conversation analysis system 10 also includes language feature quantity extraction means 12 (for example, the language feature quantity extraction means 102a) for extracting language feature quantities, which are feature quantities related to the words included in the text data. The conversation analysis system 10 further includes knowledge feature quantity estimation means 13 (for example, the knowledge feature quantity estimation means 104) for estimating knowledge feature quantities from the extracted conversation feature quantities and language feature quantities and from a knowledge feature quantity estimation model holding identification patterns indicating the knowledge feature quantities. The conversation analysis system 10 further includes knowledge level estimation means 14 (for example, the knowledge level estimation means 106) for estimating the knowledge level of a speaker by integrating the estimated knowledge feature quantities.
 With such a configuration, the conversation analysis system can robustly estimate the speaker's knowledge level even when colloquial, ill-formed sentences different from written language are input.
 The knowledge feature quantity estimation model may hold identification patterns indicating knowledge feature quantities learned from language feature quantities and conversation feature quantities calculated from learning speech data and text data of the speech data, and from knowledge feature quantity labels serving as teacher data (for example, the knowledge feature quantity labels 112).
 With such a configuration, the conversation analysis system can use a knowledge feature quantity estimation model prepared in advance that is suited to the input of the language feature quantities and conversation feature quantities.
 The knowledge level estimation means 14 may estimate the knowledge level by integrating the estimated knowledge feature quantities by using a knowledge level estimation model holding identification patterns indicating knowledge levels.
 With such a configuration, the conversation analysis system can estimate the knowledge level based on the knowledge level estimation model.
 The knowledge level estimation model may hold identification patterns indicating knowledge levels learned from the knowledge feature quantity labels for the learning speech data and text data of the speech data, and from knowledge labels serving as teacher data (for example, the knowledge labels 113).
 With such a configuration, the conversation analysis system can use a knowledge level estimation model prepared in advance that is suited to the input of the knowledge feature quantity labels.
 The conversation analysis system 10 may also include utterance section calculation means (for example, the utterance section calculation means 101) for obtaining, from the speech data and the text data of the speech data, utterance sections in which utterance detection sections by the same speaker are continuous. The language feature quantity extraction means 12 may extract the language feature quantities based on the utterance sections, and the conversation feature quantity extraction means 11 may extract the conversation feature quantities based on the utterance sections.
 With such a configuration, the conversation analysis system can obtain the utterance sections from the input data and can extract the language feature quantities and the conversation feature quantities based on the utterance sections.
 The utterance section calculation means may output classification results obtained by classifying the utterances based on which speaker holds the initiative in the dialogue; the language feature quantity extraction means 12 may extract the language feature quantities based on the classification results, and the conversation feature quantity extraction means 11 may extract the conversation feature quantities based on the classification results.
 With such a configuration, the conversation analysis system can classify the utterances and can extract the language feature quantities and the conversation feature quantities based on the utterance classification results.
 The conversation analysis system 10 may also include knowledge feature quantity estimation model storage means (for example, the knowledge feature quantity estimation model storage means 103) for storing the knowledge feature quantity estimation model, and knowledge level estimation model storage means (for example, the knowledge level estimation model storage means 105) for storing the knowledge level estimation model.
 The knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based on both the language feature quantities and the conversation feature quantities.
 The knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based on the language feature quantities only.
 The knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based on the dialogue feature quantities only.
[Configuration of hardware and software programs (computer programs)]
 A specific configuration (a hardware configuration and a software program configuration) capable of realizing the exemplary embodiment of the present invention described above will be described below.
 The components constituting the conversation analysis apparatus 100 or the conversation analysis system 10 described above can be realized by any realization means capable of implementing the means that provide the functions of those components. For example, in the conversation analysis apparatus 100 illustrated in FIGS. 1 and 3, each component given a reference numeral from 101 to 111 may be realized as a physical or logical part (a constituent part of the conversation analysis apparatus 100), or a combination thereof, in which the means providing the function of the component is implemented. Similarly, in the conversation analysis system 10 illustrated in FIG. 6, each component given a reference numeral from 11 to 14 may be realized as a physical or logical part (a constituent part of the conversation analysis system 10), or a combination thereof, in which the means providing the function of the component is implemented. In this case, a physical part can be realized as, for example, an electronic circuit or a computer device described later. A logical part can be realized as, for example, a software program executed in an electronic circuit or a computer device. In this case, the means providing the function of each component may be understood to be realized as a constituent unit of an apparatus or a system in which the function is implemented.
 In the following description, the conversation analysis apparatus 100 and the conversation analysis system 10 described above are collectively referred to simply as the "conversation analysis system", and each of their components simply as a "component of the conversation analysis system".
 The conversation analysis system described in the above embodiments may be configured by one or more dedicated hardware devices. In that case, some or all of the components shown in the figures referred to above (FIGS. 1, 3, and 6) may be realized using hardware into which they are partially or wholly integrated (such as an integrated circuit implementing the processing logic, or a storage device).
 For example, when the conversation analysis system is realized by dedicated hardware, its components may be implemented using an integrated circuit capable of providing the respective functions (for example, an SoC (System on a Chip)). In this case, the data held by the components of the conversation analysis system may be stored in, for example, a RAM (Random Access Memory) area or a flash memory area integrated into the SoC, or in a storage device (such as a magnetic disk) connected to the SoC. A well-known communication bus may be adopted as the communication line connecting the components of the conversation analysis system; the connection is not limited to a bus, and the components may instead be connected peer-to-peer. When the conversation analysis system is configured from a plurality of hardware devices, those devices may be communicably connected by any communication means (wired, wireless, or a combination thereof).
 The conversation analysis system or its components may also be configured from general-purpose hardware such as that illustrated in FIG. 8 and various software programs (computer programs) executed by that hardware. In this case, the conversation analysis system may be configured from any number of general-purpose hardware devices and software programs: an individual hardware device may be assigned to each component of the conversation analysis system, or a plurality of components may be realized using a single hardware device.
 The arithmetic device 801 in FIG. 8 is an arithmetic processing device such as a general-purpose CPU (Central Processing Unit) or a microprocessor. The arithmetic device 801 may, for example, read various software programs stored in the nonvolatile storage device 803 described later into the storage device 802 and execute processing according to those programs. For example, the components of the conversation analysis system in the above embodiments may be realized as software programs executed by the arithmetic device 801.
 The storage device 802 is a memory device, such as a RAM, that can be referenced from the arithmetic device 801, and stores software programs, various data, and the like. The storage device 802 may be a volatile memory device.
 The nonvolatile storage device 803 is a nonvolatile storage device, such as a magnetic disk drive or a flash-memory-based semiconductor storage device, and can store various software programs, data, and the like.
 The drive device 804 is, for example, a device that processes reading and writing of data with respect to the storage medium 805 described later.
 The storage medium 805 is any storage medium capable of recording data, such as an optical disk, a magneto-optical disk, or a semiconductor flash memory.
 The conversation analysis system (or its components) according to the present invention, described above with the embodiments as examples, may be realized by supplying a software program capable of realizing the functions described in those embodiments to a hardware device such as the one illustrated in FIG. 8. More specifically, the present invention may be realized by the arithmetic device 801 executing a software program supplied to such a hardware device. In this case, an operating system running on the hardware device, or middleware such as database management software, network software, or a virtual environment platform, may execute part of each process.
 In the embodiments described above, each means shown in the figures (or each system unit capable of realizing that means) can be realized as a software module, that is, a functional (processing) unit of a software program executed by the hardware described above. For example, the components of the conversation analysis apparatus 100 (the utterance section calculation means 101, the feature amount extraction means 102, the knowledge feature amount estimation means 104, the knowledge level estimation means 106, the knowledge feature amount estimation model creation means 110, the knowledge level estimation model creation means 111, and so on) may each be realized as a software module in which the corresponding function is implemented. Note, however, that the division into the software modules shown in the figures is a configuration adopted for convenience of explanation, and various configurations can be assumed in an actual implementation.
 For example, when the units shown in FIGS. 1, 3, and 6 are realized as software modules, these software modules may be stored in the nonvolatile storage device 803 and read into the storage device 802 when the arithmetic device 801 executes the respective processes.
 These software modules may also be configured so that they can pass various data to one another by an appropriate method such as shared memory or inter-process communication. With such a configuration, the software modules are communicably connected to one another.
 Furthermore, the software programs may be recorded on the storage medium 805. In this case, the software programs may be stored in the nonvolatile storage device 803 through the drive device 804 as appropriate, for example at the shipping stage or the operation stage of the conversation analysis system.
 In the above case, the various software programs may be supplied to the hardware by installing them in the device using an appropriate jig at the manufacturing stage before shipment or the maintenance stage after shipment. A procedure that is common today, such as downloading the programs from outside via a communication line such as the Internet, may also be adopted.
 In such a case, the present invention can be regarded as being constituted by the code constituting the software program, or by a computer-readable storage medium on which that code is recorded. The storage medium is not limited to a medium independent of the hardware device, and includes a storage medium on which a software program transmitted via the Internet or the like has been downloaded and stored, or temporarily stored.
 The conversation analysis system described above may also be configured from a virtualized environment obtained by virtualizing the hardware device illustrated in FIG. 8 and various software programs (computer programs) executed in that virtualized environment. In this case, the constituent elements of the hardware device illustrated in FIG. 8 are provided as virtual devices in the virtualized environment. The present invention can be realized in this case as well, with the same configuration as when the hardware device illustrated in FIG. 8 is configured as a physical device.
 The present invention is applicable, for example, to a conversation analysis apparatus that analyzes user tendencies based on knowledge level using a voice database recording conversations at various customer contact points, such as contact centers, between customers and business personnel such as store clerks or operators. The present invention is also applicable to uses such as a program that realizes such a conversation analysis apparatus using a computer. Furthermore, the present invention is applicable to a conversation analysis apparatus that extracts linguistic and interactional features not for knowledge level but for a user's interests, preferences, and the like, from the words of a conversation and the exchanges within it.
 The present invention has been described above using the embodiments described above as exemplary examples. However, the present invention is not limited to those embodiments; various modes that those skilled in the art can understand may be applied within the scope of the present invention.
 This application claims priority based on Japanese Patent Application No. 2014-145873 filed on July 16, 2014, the entire disclosure of which is incorporated herein.
DESCRIPTION OF SYMBOLS
1 Utterance sequence extraction unit
2 Utterance intention determination unit
3 Feature amount extraction unit
4 Estimation information generation unit
4a Knowledge amount label
5 Knowledge amount estimation unit
5a Estimation information storage unit
6 Speech recognition result
7 Speech recognition result
10 Conversation analysis system
11 Conversation feature amount extraction means
12, 102a Language feature amount extraction means
13, 104 Knowledge feature amount estimation means
14, 106 Knowledge level estimation means
100 Conversation analysis apparatus
101 Utterance section calculation means
102 Feature amount extraction means
102b Dialogue feature amount extraction means
103 Knowledge feature amount estimation model storage means
105 Knowledge level estimation model storage means
110 Knowledge feature amount estimation model creation means
111 Knowledge level estimation model creation means
112 Knowledge feature amount label
113 Knowledge label
801 Arithmetic device
802 Storage device
803 Nonvolatile storage device
804 Drive device
805 Storage medium

Claims (9)

  1.  A conversation analysis system comprising:
     conversation feature amount extraction means for extracting, from voice data and text data of the voice data, a conversation feature amount that is a feature amount relating to a conversation state between speakers;
     language feature amount extraction means for extracting a language feature amount that is a feature amount relating to words included in the text data;
     knowledge feature amount estimation means for estimating a knowledge feature amount from the extracted conversation feature amount and language feature amount, and from a knowledge feature amount estimation model that holds an identification pattern indicating the knowledge feature amount; and
     knowledge level estimation means for estimating the knowledge level of the speaker by integrating the estimated knowledge feature amounts.
  2.  The conversation analysis system according to claim 1, wherein the knowledge feature amount estimation model holds an identification pattern indicating knowledge feature amounts learned from language feature amounts and conversation feature amounts calculated from training voice data and text data of the voice data, and from knowledge feature amount labels serving as teacher data.
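To make the learning step described in claim 2 concrete, the following is a minimal sketch (not part of the claimed subject matter): one "identification pattern" per knowledge feature amount label is learned from training feature vectors and teacher labels. The nearest-centroid learner, the feature vector layout, and all function names are illustrative assumptions; the claims do not prescribe any particular model family.

```python
# Toy learning sketch for a knowledge feature amount estimation model.
# Each training example pairs a feature vector (language feature amounts
# concatenated with conversation feature amounts) with a teacher label
# (knowledge feature amount label). The "identification pattern" held by
# the model is, in this toy version, one centroid per label; a real
# system could substitute any classifier.

def train_knowledge_feature_model(features, labels):
    """Return {label: centroid} learned from (features, labels)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def estimate_knowledge_feature(model, vec):
    """Classify vec by the nearest centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(model, key=lambda label: dist(model[label]))
```

Here the learned centroids stand in for the "identification pattern indicating the knowledge feature amount" that the model holds.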
  3.  The conversation analysis system according to claim 1 or 2, wherein the knowledge level estimation means estimates the knowledge level by integrating the knowledge feature amounts using a knowledge level estimation model that holds an identification pattern indicating the knowledge level.
  4.  The conversation analysis system according to claim 3, wherein the knowledge level estimation model holds an identification pattern indicating knowledge levels learned from knowledge feature amount labels for training voice data and text data of the voice data, and from knowledge labels serving as teacher data.
  5.  The conversation analysis system according to any one of claims 1 to 4, further comprising utterance section calculation means for obtaining, from the voice data and the text data of the voice data, utterance sections in which speech detection sections by the same speaker are consecutive, wherein
     the language feature amount extraction means extracts the language feature amount based on the utterance sections, and
     the conversation feature amount extraction means extracts the conversation feature amount based on the utterance sections.
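The utterance section computation of claim 5 (joining consecutive speech detection sections uttered by the same speaker) can be sketched as follows. This is an illustrative sketch only, not the claimed implementation; the `(speaker, start, end)` tuple layout is an assumption made for the example.

```python
def merge_utterance_sections(detections):
    """Merge consecutive speech detection sections by the same speaker
    into single utterance sections.

    detections: list of (speaker, start, end) tuples in time order.
    Returns a list of (speaker, start, end) utterance sections.
    """
    sections = []
    for speaker, start, end in detections:
        if sections and sections[-1][0] == speaker:
            # Same speaker continues: extend the current utterance section.
            prev_speaker, prev_start, _ = sections[-1]
            sections[-1] = (prev_speaker, prev_start, end)
        else:
            # Speaker changed (or first section): open a new section.
            sections.append((speaker, start, end))
    return sections
```

The language feature amounts and conversation feature amounts of claim 5 would then be computed per merged section rather than per raw detection.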
  6.  The conversation analysis system according to claim 5, wherein
     the utterance section calculation means outputs a classification result obtained by classifying utterances based on the initiative of the utterance,
     the language feature amount extraction means extracts the language feature amount based on the classification result, and
     the conversation feature amount extraction means extracts the conversation feature amount based on the classification result.
  7.  The conversation analysis system according to any one of claims 1 to 6, wherein the knowledge feature amount estimation means estimates at least one knowledge feature amount based on the language feature amount and the conversation feature amount.
  8.  A conversation analysis method comprising:
     extracting, from voice data and text data of the voice data, a conversation feature amount that is a feature amount relating to a conversation state between speakers;
     extracting a language feature amount that is a feature amount relating to words included in the text data;
     estimating a knowledge feature amount from the extracted conversation feature amount and language feature amount, and from a knowledge feature amount estimation model that holds an identification pattern indicating the knowledge feature amount; and
     estimating the knowledge level of the speaker by integrating the estimated knowledge feature amounts.
  9.  A storage medium on which is recorded a conversation analysis program for causing a computer to execute:
     a conversation feature amount extraction process of extracting, from voice data and text data of the voice data, a conversation feature amount that is a feature amount relating to a conversation state between speakers;
     a language feature amount extraction process of extracting a language feature amount that is a feature amount relating to words included in the text data;
     a knowledge feature amount estimation process of estimating a knowledge feature amount from the extracted conversation feature amount and language feature amount, and from a knowledge feature amount estimation model that holds an identification pattern indicating the knowledge feature amount; and
     a knowledge level estimation process of estimating the knowledge level of the speaker by integrating the estimated knowledge feature amounts.
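The method of claim 8 can be read as a four-stage pipeline: extract conversation features, extract language features, estimate knowledge features with a model, and integrate them into a knowledge level. The sketch below wires those stages together. Every function body is a placeholder assumption made for illustration; the claims specify only the data flow, not these particular features or the averaging step.

```python
def extract_conversation_features(voice_data, text_data):
    # Placeholder conversation feature amounts: e.g., number of
    # utterances and number of audio segments (assumed features).
    return [float(len(text_data)), float(len(voice_data))]

def extract_language_features(text_data):
    # Placeholder language feature amount: total word count across
    # the transcribed utterances (assumed feature).
    return [float(sum(len(u.split()) for u in text_data))]

def estimate_knowledge_features(conv_feats, lang_feats, model):
    # The model holds the "identification pattern"; here it is
    # simply a callable mapping a feature vector to knowledge
    # feature amounts.
    return model(conv_feats + lang_feats)

def estimate_knowledge_level(knowledge_feats):
    # Integrate the knowledge feature amounts; simple averaging
    # stands in for the claimed integration step.
    return sum(knowledge_feats) / len(knowledge_feats)

def analyze_conversation(voice_data, text_data, model):
    conv = extract_conversation_features(voice_data, text_data)
    lang = extract_language_features(text_data)
    feats = estimate_knowledge_features(conv, lang, model)
    return estimate_knowledge_level(feats)
```

A trained model from the claim 2 learning step would be passed in as `model`; here any callable returning knowledge feature amounts suffices.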
PCT/JP2015/003523 2014-07-16 2015-07-13 Conversation analysis system, conversation analysis method, and storage medium wherein conversation analysis program is recorded WO2016009634A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016534111A JPWO2016009634A1 (en) 2014-07-16 2015-07-13 Conversation analysis system, conversation analysis method, and storage medium on which conversation analysis program is recorded

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014-145873 2014-07-16
JP2014145873 2014-07-16

Publications (1)

Publication Number Publication Date
WO2016009634A1

Family

ID=55078142

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/003523 WO2016009634A1 (en) 2014-07-16 2015-07-13 Conversation analysis system, conversation analysis method, and storage medium wherein conversation analysis program is recorded

Country Status (2)

Country Link
JP (1) JPWO2016009634A1 (en)
WO (1) WO2016009634A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106710588A (en) * 2016-12-20 2017-05-24 科大讯飞股份有限公司 Voice data sentence type identification method and device and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013167765A (en) * 2012-02-15 2013-08-29 Nippon Telegr & Teleph Corp <Ntt> Knowledge amount estimation information generating apparatus, and knowledge amount estimating apparatus, method and program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAZUNORI KOMATANI ET AL.: "User Modeling for Adaptive Guidance Generation in Spoken Dialogue Systems", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS D-2, vol. J87-D-2, no. 10, October 2004 (2004-10-01), pages 1921 - 1928 *

Also Published As

Publication number Publication date
JPWO2016009634A1 (en) 2017-04-27

Similar Documents

Publication Publication Date Title
US10720164B2 (en) System and method of diarization and labeling of audio data
Polzehl et al. Anger recognition in speech using acoustic and linguistic cues
JP6857581B2 (en) Growth interactive device
US9368116B2 (en) Speaker separation in diarization
US20180137109A1 (en) Methodology for automatic multilingual speech recognition
US8494853B1 (en) Methods and systems for providing speech recognition systems based on speech recordings logs
JP6154155B2 (en) Spoken dialogue system using prominence
US20060080098A1 (en) Apparatus and method for speech processing using paralinguistic information in vector form
JP6440967B2 (en) End-of-sentence estimation apparatus, method and program thereof
CN111159364B (en) Dialogue system, dialogue device, dialogue method, and storage medium
US20190244611A1 (en) System and method for automatic filtering of test utterance mismatches in automatic speech recognition systems
Kopparapu Non-linguistic analysis of call center conversations
US11270691B2 (en) Voice interaction system, its processing method, and program therefor
KR20210130024A (en) Dialogue system and method of controlling the same
EP1398758B1 (en) Method and apparatus for generating decision tree questions for speech processing
JP2020064370A (en) Sentence symbol insertion device and method thereof
López-Cózar et al. Enhancement of emotion detection in spoken dialogue systems by combining several information sources
US20110224985A1 (en) Model adaptation device, method thereof, and program thereof
WO2016009634A1 (en) Conversation analysis system, conversation analysis method, and storage medium wherein conversation analysis program is recorded
Casale et al. Analysis of robustness of attributes selection applied to speech emotion recognition
JP6546070B2 (en) Acoustic model learning device, speech recognition device, acoustic model learning method, speech recognition method, and program
JP2015102914A (en) Method for learning incomprehensible sentence determination model, and method, apparatus and program for determining incomprehensible sentence
Ike Inequity in Popular Voice Recognition Systems Regarding African Accents
JP2020064630A (en) Sentence symbol insertion device and method thereof
CN117813599A (en) Method and system for training classifier used in speech recognition auxiliary system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15821378

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016534111

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15821378

Country of ref document: EP

Kind code of ref document: A1