WO2016009634A1 - Conversation analysis system, conversation analysis method, and storage medium wherein conversation analysis program is recorded - Google Patents


Info

Publication number
WO2016009634A1
Authority
WO
WIPO (PCT)
Prior art keywords
knowledge
feature amount
conversation
feature
utterance
Prior art date
Application number
PCT/JP2015/003523
Other languages
French (fr)
Japanese (ja)
Inventor
祐 北出
祥史 大西
Original Assignee
NEC Corporation (日本電気株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corporation (日本電気株式会社)
Priority to JP2016534111A (JPWO2016009634A1)
Publication of WO2016009634A1


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/08 — Speech classification or search
    • G10L 15/10 — Speech classification or search using distance or distortion measures between unknown speech and reference templates

Definitions

  • the present invention relates to a conversation analysis system, a conversation analysis method, and a conversation analysis program for estimating a speaker's knowledge level from conversation.
  • The knowledge level is the result of classifying, into two or more classes, or of quantifying, how familiar the target speaker is with a predetermined theme or its peripheral information.
  • The predetermined theme is, for example, the subject of the conversation itself.
  • Patent Literature 1 describes an example of a conversation analysis device.
  • The knowledge amount estimation information generation device described in Patent Literature 1 includes an utterance string extraction unit 1, an utterance intention determination unit 2, a feature amount extraction unit 3, an estimation information generation unit 4, a knowledge amount label 4a, a knowledge amount estimation unit 5, and an estimation information storage unit 5a.
  • the knowledge amount estimation information generation apparatus configured as shown in FIG. 7 is mainly divided into a learning unit and an estimation unit, and operates as follows.
  • The learning unit of the knowledge amount estimation information generation device extracts text data composed of an utterance sequence with the utterance string extraction unit 1. Next, in the utterance intention determination unit 2, the learning unit determines, from the extracted text data of the utterance sequence concerning the dialogue between an inquirer and a respondent, each utterance representing one of the utterance intentions "question", "explanation", and "backchannel". After the determination, the learning unit associates each utterance intention with the target utterance.
  • The learning unit calculates, with the feature amount extraction unit 3, the number of distinct words used by the user among the appearing words (hereinafter, the "used vocabulary feature amount").
  • In addition to the used vocabulary feature amount, the learning unit counts the number of appearances of each of the utterance intentions "question", "explanation", and "backchannel" determined by the utterance intention determination unit 2.
  • The learning unit also extracts, from the utterances with the "question" intention, those containing a question word as interrogative question sentences, and counts their appearances.
  • The appearance counts of "question", "explanation", "backchannel", and "interrogative question sentence" are collectively referred to as the intention feature amount.
  • Using as learning data the intention feature amount and the used vocabulary feature amount calculated by the feature amount extraction unit 3, together with the knowledge amount label 4a, which is ground-truth information on the knowledge amount, the learning unit causes the estimation information generation unit 4 to generate the estimation information used for estimating the knowledge amount of input text (speech recognition result 7).
  • The estimation unit applies, to the input speech recognition result 6, the same processing that the learning unit performs in the utterance string extraction unit 1, the utterance intention determination unit 2, and the feature amount extraction unit 3, and obtains the used vocabulary feature amount and the dialogue feature amount.
  • The estimation unit then estimates the knowledge amount in the knowledge amount estimation unit 5, from the calculated used vocabulary feature amount and dialogue feature amount and from the estimation information generated by the learning unit and stored in the estimation information storage unit 5a.
  • The knowledge amount estimation information generation device described in Patent Literature 1 has difficulty estimating the user's knowledge amount when the input text to be evaluated is not written language, that is, when it is not composed of grammatically correct sentences. Sentences that do not conform to correct grammar are, for example, broken sentences such as colloquial expressions and sentences containing recognition errors.
  • a general conversation analysis apparatus calculates a used vocabulary feature amount and an intention feature amount from a speech recognition result to be evaluated, and estimates a knowledge amount.
  • the used vocabulary feature amount is a feature amount related to the appearance word.
  • The intention feature amount is the number of utterances in each class when each utterance is classified into "question", "explanation", "backchannel", and "interrogative question sentence" by language processing such as pattern matching.
  • both the used vocabulary feature amount and the intention feature amount are calculated based on language information.
  • The linguistic information used for calculating the various feature amounts described above is the appearing word, word string, or character string (hereinafter, "symbol") itself; additional information such as the notation, part of speech, and meaning of the symbol; or statistical information based on the symbol, such as the appearance frequency obtained for each symbol.
  • the accuracy of estimation of the amount of knowledge of the user by the conversation analysis device largely depends on the grammatical correctness of the utterance content or the accuracy of the recognition result when the utterance is recognized.
  • When grammatically correct text is input, the conversation analysis device can estimate the knowledge amount of the user.
  • The problem with a general conversation analysis device is that, when a broken sentence different from written language is input, it is difficult to calculate the used vocabulary feature amount and the intention feature amount correctly, and therefore difficult to estimate the knowledge amount.
  • Non-Patent Document 2 uses a feature amount extracted from a conversation state between speakers for estimation of a knowledge level.
  • It is difficult for the method described in Non-Patent Document 2 to estimate the speaker's correct knowledge level, for example, when voice data of casual speech is input or when the speech recognition rate is low.
  • the reason is that the method described in Non-Patent Document 2 does not use knowledge feature amounts respectively obtained from different feature amounts such as language feature amounts and dialogue feature amounts for estimation of the knowledge level.
  • When knowledge feature amounts obtained from such different kinds of feature amounts are not used, the conversation analysis device cannot compensate for an erroneous estimate of a knowledge feature amount based on the language feature amount with the estimate of a knowledge feature amount obtained from a different feature amount (for example, the dialogue feature amount, which is not affected by the language feature amount). For example, when a broken sentence different from written language is input, it is difficult for the conversation analysis apparatus to correctly estimate the speaker's knowledge level.
  • The present invention has been made to solve the above-described problems. That is, the present invention mainly provides a conversation analysis system, a conversation analysis method, and a conversation analysis program that can robustly estimate a speaker's knowledge level even when a broken sentence different from written language is input.
  • A conversation analysis system according to the present invention includes: a dialogue feature amount extraction unit that extracts, from voice data and text data of the voice data, a dialogue feature amount that is a feature amount related to the conversation state between speakers; a language feature amount extraction unit that extracts a language feature amount that is a feature amount related to words included in the text data; a knowledge feature amount estimation unit that estimates knowledge feature amounts from the extracted dialogue feature amount and language feature amount and from a knowledge feature amount estimation model holding identification patterns indicating knowledge features; and a knowledge level estimation unit that estimates the speaker's knowledge level by integrating the estimated knowledge feature amounts.
  • A conversation analysis method according to the present invention extracts, from voice data and text data of the voice data, a dialogue feature amount that is a feature amount related to the conversation state between speakers; extracts a language feature amount that is a feature amount related to words included in the text data; estimates knowledge feature amounts from the extracted dialogue feature amount and language feature amount and from a knowledge feature amount estimation model that holds identification patterns indicating knowledge features; and estimates the speaker's knowledge level by integrating the estimated knowledge feature amounts.
  • A conversation analysis program according to the present invention causes a computer to execute: a dialogue feature amount extraction process of extracting, from voice data and text data of the voice data, a dialogue feature amount that is a feature amount related to the conversation state between speakers; a language feature amount extraction process of extracting a language feature amount that is a feature amount related to words included in the text data; a knowledge feature amount estimation process of estimating knowledge feature amounts from the extracted dialogue feature amount and language feature amount and from a knowledge feature amount estimation model holding identification patterns indicating knowledge features; and a knowledge level estimation process of estimating the speaker's knowledge level by integrating the estimated knowledge feature amounts.
  • the object of the present invention is also achieved by a computer-readable storage medium in which the conversation analysis program is stored.
  • FIG. 1 is a block diagram illustrating a configuration example of a learning system of the conversation analysis apparatus according to the embodiment of the present invention.
  • FIG. 2 is an explanatory diagram showing the concept of knowledge features in the embodiment of the present invention.
  • FIG. 3 is a block diagram illustrating a configuration example of an estimation system of the conversation analysis apparatus according to the embodiment of the present invention.
  • FIG. 4 is a flowchart showing the operation of the conversation analysis apparatus 100.
  • FIG. 5 is an explanatory diagram showing the evaluation results of an evaluation experiment with the conversation analysis device according to the embodiment of the present invention and with another method.
  • FIG. 6 is a block diagram showing an outline of the conversation analysis system in the embodiment of the present invention.
  • FIG. 7 is a block diagram illustrating the configuration of the knowledge amount estimation information generation device described in Patent Literature 1.
  • FIG. 8 is an explanatory diagram illustrating a hardware configuration capable of realizing the conversation analysis system or the conversation analysis apparatus according to the embodiment of the present invention.
  • FIG. 1 is a block diagram illustrating a configuration example of a learning system of the conversation analysis apparatus according to the embodiment of the present invention.
  • The input voice data is dialogue voice data reproducible in stereophonic (hereinafter, "stereo") sound, in which the voices of different speakers are recorded on the left and right channels, respectively.
  • the configuration and operation of the conversation analysis apparatus according to the present embodiment will be described by taking as an example the case of estimating the speaker's knowledge level.
  • the input audio data may be data reproducible by a method other than stereo.
  • The input voice data may be voice data of a dialogue among three or more people. Even when voice data of a conversation among three or more people is input, if the voice data of each speaker is separated using speaker recognition technology or the like, the conversation analysis apparatus according to the present embodiment can estimate each speaker's knowledge level.
  • the learning system of the conversation analysis apparatus 100 shown in FIG. 1 includes an utterance section calculation unit 101 and a feature amount extraction unit 102.
  • The learning system of the conversation analysis apparatus 100 also includes knowledge feature quantity estimation model storage means 103, knowledge level estimation model storage means 105, knowledge feature quantity estimation model creation means 110, and knowledge level estimation model creation means 111.
  • The utterance section calculation means 101 has a function of calculating utterance sections from input voice data and text data related to the voice data, and outputting the calculated utterance sections.
  • the text data related to such voice data may include, for example, text data of an utterance word obtained by voice recognition of the voice data.
  • the utterance section is a section in which utterance detection sections by the same speaker are continuous and grouped.
  • the utterance section is a unit for calculating the language feature value or the dialogue feature value.
  • the utterance detection section is a section where humans speak continuously without breathing.
  • the utterance detection section is automatically calculated by, for example, preprocessing for voice recognition.
  • the utterance detection section is not an automatically detected section, but may be a section with a margin before and after the automatically detected section. Further, the utterance detection section may not be a section where humans are speaking, but may be a section determined simply by a fixed time length.
  • The utterance section calculation means 101 may calculate the utterance sections from the detected utterance detection sections and the speaker information.
  • the utterance interval calculation means 101 may classify the utterance based on the calculated utterance interval.
  • the feature amount extraction unit 102 obtains a language feature amount or a dialogue feature amount for each classified class. The obtained language feature amount or dialogue feature amount is used for estimation of the knowledge feature amount as will be described later.
  • the utterance section calculation means 101 arranges utterances by two speakers in time series using the speech section information and the speaker information included in the input text data. If there is no utterance detection section or speaker information in the input text data, the utterance section calculation means 101 may acquire the utterance detection section or speaker information by analyzing the input voice data.
  • The utterance section calculation means 101 compares the utterance detection sections of one speaker (the main speaker) with those of the other speaker (the interlocutor), and detects utterances whose detection sections are completely contained within an utterance detection section of the main speaker. This corresponds, for example, to a backchannel inserted by the interlocutor while the main speaker is speaking.
  • The utterance section calculation means 101 performs this detection of completely contained utterances for both speakers.
  • The utterance section calculation means 101 then combines consecutive utterance detection sections of the same speaker, among the remaining sections excluding the completely contained ones, into a single section. This combined section is the utterance section.
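The section-construction procedure described above (drop fully contained utterances of the other speaker, then merge consecutive sections of the same speaker) can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation; the tuple layout `(speaker, start, end)` and the function name are assumptions.

```python
def build_utterance_sections(detections):
    """detections: list of (speaker, start, end) utterance detection sections.
    Returns merged utterance sections as described in the text."""
    # Sort all detection sections by start time.
    dets = sorted(detections, key=lambda d: d[1])

    # 1) Drop sections completely contained in a section of the OTHER speaker
    #    (e.g. a backchannel inserted while the main speaker is talking).
    kept = []
    for spk, s, e in dets:
        contained = any(o_spk != spk and o_s <= s and e <= o_e
                        for o_spk, o_s, o_e in dets)
        if not contained:
            kept.append((spk, s, e))

    # 2) Merge consecutive sections of the same speaker into one utterance section.
    sections = []
    for spk, s, e in kept:
        if sections and sections[-1][0] == spk:
            prev_spk, prev_s, prev_e = sections[-1]
            sections[-1] = (spk, prev_s, max(prev_e, e))
        else:
            sections.append((spk, s, e))
    return sections
```

For example, a short interjection by speaker B inside a long section of speaker A is dropped, and A's two adjacent detection sections are joined into one utterance section.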
  • Because the semantic boundaries are clarified in this way, the feature amount extraction unit 102 can calculate more accurate feature amounts.
  • the utterance interval calculation means 101 can also use the utterance detection interval (utterance start time, utterance end time) and speaker information obtained from the input text data as the utterance interval.
  • When the utterance detection sections and speaker information are used directly as the utterance sections, the above processing by the utterance section calculation unit 101 is unnecessary.
  • the utterance section calculation means 101 may classify the utterances according to a predetermined criterion.
  • As the predetermined criterion, for example, a criterion based on the initiative of the utterance can be used.
  • Based on the initiative of the conversation, the utterance section calculation means 101 classifies each utterance in the utterance sections calculated as described above into two types: utterances made with the initiative (hereinafter, "leading utterances") and utterances made without the initiative (hereinafter, "passive utterances").
  • the utterance interval calculation means 101 classifies, for example, an utterance whose utterance interval is shorter than a threshold as a passive utterance.
  • The utterance section calculation means 101 may also classify as a passive utterance an utterance section that contains a word with a small number of phonemes (for example, "yes" or "no"), which is easily misrecognized under the influence of acoustic or recording conditions and whose recognition result therefore has low reliability.
  • the utterance interval calculation means 101 classifies utterances other than the utterance classified as passive utterance as the leading utterance.
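The leading/passive classification just described can be sketched as below. The length threshold and the short-word list are illustrative assumptions, not values given in the text.

```python
PASSIVE_LENGTH_THRESHOLD = 1.0          # seconds; assumed threshold
SHORT_WORDS = {"yes", "no", "uh-huh"}   # low-phoneme-count words (example list)

def classify_utterance(section, words):
    """section: (speaker, start, end); words: recognized words in the section.
    Returns "passive" for short or short-word-only utterances, else "leading"."""
    _, start, end = section
    # Utterance sections shorter than the threshold are passive utterances.
    if end - start < PASSIVE_LENGTH_THRESHOLD:
        return "passive"
    # Sections consisting only of short, easily misrecognized words are passive.
    if words and all(w.lower() in SHORT_WORDS for w in words):
        return "passive"
    # All remaining utterances are leading utterances.
    return "leading"
```

The feature amount extraction unit would then compute language or dialogue feature amounts per class, e.g. only over the "leading" utterances.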
  • the feature quantity extraction unit 102 regards the classification result by the utterance section calculation unit 101 as a classification class, and obtains a language feature quantity or a dialogue feature quantity for each class.
  • the reason for classifying utterances into leading utterances and passive utterances is that speaker characteristics can be made more obvious by classification.
  • For example, the feature amount of the utterance length (such as its mean or variance), which is one of the dialogue feature amounts, varies greatly depending on the proportion of passive utterances. Therefore, if the feature amount related to utterance length is computed only over leading utterances, a feature amount for spontaneously spoken utterances is obtained, and the speaker's characteristics become easier to capture than when utterances are not classified.
  • the feature quantity extraction unit 102 may use the utterance classification result before and after the target utterance as the classification class, in addition to regarding the utterance classification result as the classification class as described above. In addition, the feature amount extraction unit 102 may use a combination of the classification result of the target utterance and the classification result of the utterance before and after the target utterance as the classification class.
  • the feature amount extraction unit 102 includes a language feature amount extraction unit 102a and a dialogue feature amount extraction unit 102b. Text data, voice data, utterance section detection results, utterance classification results, and the like are input to the feature amount extraction unit 102. The feature amount extraction unit 102 outputs a language feature amount and a dialogue feature amount based on these input data.
  • the language feature quantity extraction unit 102a has a function of extracting a language feature quantity calculated from input text data.
  • the language feature amount is a word appearance frequency included in input text data, a statistical value based on the word appearance frequency, or the like.
  • The extracted language feature amount may also be the confidence of the recognition result assigned to each recognized word.
  • the language feature quantity extraction unit 102a may obtain the feature quantity using the class to which the recognized word belongs.
  • The language feature amount extraction unit 102a may also replace appearing words with other symbols, by performing notation normalization to correct notational variants, synonym expansion, or the like, and obtain the feature amount over the replaced symbols rather than over the appearing words themselves.
  • the dialogue feature quantity extraction unit 102b has a function of extracting a dialogue feature quantity that is a feature quantity relating to a dialogue state between speakers, which is mainly calculated from voice data.
  • the dialogue feature amount is a feature amount that can be acquired when two or more people have a conversation.
  • the dialogue feature amount is calculated based on the utterance section.
  • The dialogue feature amount extraction unit 102b can obtain, for example, the speaker's speaking speed, utterance length, and number of backchannels by analyzing the utterance sections interleaved between the interlocutor's utterance sections.
  • The dialogue feature amount extraction unit 102b can also obtain dialogue feature amounts by analyzing the section between the beginning of the data and the interlocutor's utterance section, and the section between the interlocutor's utterance section and the end of the data.
  • the dialogue feature quantity extraction unit 102b can calculate a pause length value to be described later when the utterance section of each speaker is determined. As described above, the dialogue feature value extraction unit 102b can obtain various dialogue feature values based on the utterance section.
  • Examples of dialogue feature amounts include speaking speed, pause length, number of backchannels, and utterance length.
  • Talk speed is the speed at which a speaker speaks in one unit of dialogue.
  • Speaking speed is expressed by the number of mora per unit time.
  • the speech speed is obtained by, for example, dividing the number of mora of the recognized word by the length of the utterance section.
  • A mora is a sound unit that forms one rhythmic beat.
  • the pause length means the length of “between” when a speaker change occurs.
  • the pause length is calculated by obtaining the difference between the utterance end time of the utterance section immediately before the target utterance section and the utterance start time of the target utterance section.
  • the utterance length is the length of one utterance section. That is, the utterance length is the length of time from the utterance start time to the utterance end time of one utterance section.
  • The number of backchannels is the number of times the interlocutor gives a backchannel response (aizuchi).
  • A backchannel has the property of showing that the interlocutor understands the content of the other party's utterance, or of prompting the other party to continue speaking.
  • The dialogue feature amount extraction unit 102b may detect backchannels by pattern matching on the recognition result or based on the utterance length. Further, the dialogue feature amount extraction unit 102b may detect backchannels using the utterance inclusion relation, an example of the utterance classification result described above.
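The dialogue feature amounts defined above (speaking speed in morae per unit time, pause length at a speaker change, utterance length, and the backchannel/aizuchi count) reduce to simple arithmetic over utterance sections. A minimal sketch, assuming mora counts are given and approximating backchannel detection by the short-length criterion from the text:

```python
def speaking_speed(mora_count, start, end):
    # Number of morae divided by the length of the utterance section.
    return mora_count / (end - start)

def pause_length(prev_end, cur_start):
    # The "gap" between the previous utterance section's end time and the
    # target utterance section's start time, at a speaker change.
    return cur_start - prev_end

def utterance_length(start, end):
    # Time from utterance start to utterance end of one utterance section.
    return end - start

def backchannel_count(sections, interlocutor, threshold=1.0):
    # Count the interlocutor's utterance sections regarded as backchannels,
    # here approximated by a short-length criterion (threshold is assumed).
    return sum(1 for spk, s, e in sections
               if spk == interlocutor and (e - s) < threshold)
```

In practice the unit would combine these into a feature vector per utterance class, but the quantities themselves are as above.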
  • the knowledge feature quantity estimation model creation means 110 has a function of generating a knowledge feature quantity estimation model.
  • The knowledge feature quantity estimation model creation means 110 generates the knowledge feature quantity estimation model using, as learning data, the language feature amounts and dialogue feature amounts extracted by the feature amount extraction means 102 from the learning voice data and text data, together with the knowledge feature quantity label 112, which is teacher data representing the knowledge feature quantities for the learning data.
  • the knowledge feature quantity estimation model creation means 110 sends the created knowledge feature quantity estimation model to the knowledge feature quantity estimation model storage means 103.
  • Knowledge feature is an element that determines the speaker's level of knowledge based on language features and dialogue features extracted from the language used by the speaker and the reaction of the speaker.
  • The knowledge feature quantity estimation model is a model generated by learning identification patterns using, as input, learning data consisting of sets of a language feature amount, a dialogue feature amount, and a knowledge feature quantity label 112.
  • An SVM (Support Vector Machine) or the like is used for learning the identification patterns (see Non-Patent Document 1).
  • the knowledge level estimation model creation unit 111 has a function of generating a knowledge level estimation model.
  • the knowledge level estimation model creating means 111 generates a knowledge level estimation model obtained by learning a knowledge level identification pattern, using the knowledge feature quantity label 112 and the knowledge label 113 which is knowledge level teacher data.
  • The knowledge level estimation model creating unit 111 generates a knowledge level estimation model in which the knowledge level identification pattern is learned, using the result output by the knowledge feature amount estimation unit 104 (described later) for the learning data, together with the knowledge label 113. The knowledge level estimation model creating unit 111 then sends the generated knowledge level estimation model to the knowledge level estimation model storage unit 105.
  • the knowledge level estimation model creation unit 111 may use the knowledge feature amount label 112 instead of the output result of the knowledge feature amount estimation unit 104 with respect to the learning data.
  • the knowledge level estimation model storage unit 105 has a function of storing the knowledge level estimation model created by the knowledge level estimation model creation unit 111.
  • the knowledge level estimation model is a model for estimating a knowledge level for input data.
  • The knowledge level estimation model is generated by learning an identification pattern using either the output of the knowledge feature amount estimation unit 104 (described later) for the learning data or the knowledge feature amount label 112, together with the knowledge label 113.
  • FIG. 3 is a block diagram illustrating a configuration example of an estimation system of the conversation analysis apparatus according to the embodiment of the present invention.
  • the estimation system of the conversation analysis apparatus 100 shown in FIG. 3 includes an utterance section calculation unit 101, a feature amount extraction unit 102, a knowledge feature amount estimation model storage unit 103, a knowledge feature amount estimation unit 104, and a knowledge level estimation model storage. Means 105 and knowledge level estimation means 106 are included.
  • the knowledge feature amount estimation means 104 and the knowledge level estimation means 106 that are not included in the learning system but are included only in the estimation system will be described.
  • Knowledge feature quantity estimation means 104 digitizes the knowledge feature quantity into a discrete value such as “0” or “1” or a continuous value ranging from “0” to “1” and outputs the digitized value.
  • the knowledge feature quantity estimation model storage unit 103 stores at least one knowledge feature quantity estimation model for one knowledge feature quantity estimated by the knowledge feature quantity estimation unit 104.
  • When the knowledge feature quantity estimation unit 104 estimates a knowledge feature quantity, the language feature amount and dialogue feature amount obtained from the input data are compared with the knowledge feature quantity estimation model to identify whether the knowledge feature is present. An SVM or the like is used in this identification process, as in the identification pattern learning. When there are multiple knowledge features, the knowledge feature amount estimation unit 104 performs the identification process for each knowledge feature.
  • In addition to determining a knowledge feature quantity as the binary "present" or "absent", the knowledge feature quantity estimation means 104 may determine it as a ternary value such as "present", "absent", or "unknown". The knowledge feature quantity estimation means 104 may also discriminate the knowledge feature quantity at more than three levels. When discriminating at many levels, the knowledge feature quantity estimation means 104 can output the knowledge feature quantity label 112 by composing the above identification process in multiple stages.
  • the knowledge feature quantity estimation means 104 may output a continuous value in addition to outputting the discrete value as described above.
  • the knowledge feature amount estimation unit 104 may use, for example, a score that is output together with the output result of the identification process.
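The per-knowledge-feature identification with discrete and continuous outputs can be sketched as follows. A plain linear scorer stands in for the trained SVM here; the model format `{name: (weights, bias)}` and all values are illustrative assumptions, not the patent's representation.

```python
def estimate_knowledge_features(feature_vec, models):
    """feature_vec: concatenated language + dialogue feature amounts.
    models: {knowledge_feature_name: (weights, bias)} — one model per
    knowledge feature, as stored in the estimation model storage.
    Returns, per knowledge feature, a continuous score (the identifier's
    output score) and a discrete 0/1 "present" decision."""
    results = {}
    for name, (w, b) in models.items():
        # Linear decision function standing in for the SVM's score.
        score = sum(wi * xi for wi, xi in zip(w, feature_vec)) + b
        results[name] = {"score": score, "present": 1 if score > 0 else 0}
    return results
```

Multi-level discrimination, as described above, would chain several such identification stages rather than use a single threshold.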
  • the knowledge feature quantity estimation means 104 may adopt the number of knowledge feature quantities to be estimated, which is obtained by an experiment using development data and has an optimum knowledge level estimation accuracy. Further, the number of estimated knowledge feature amounts may be determined in advance manually.
  • the knowledge feature quantity estimation means 104 may determine the optimum number obtained by text analysis such as clustering of the described contents as the number of knowledge feature quantities to be estimated.
  • Here, a factor that led to the determination of the knowledge level corresponds to a knowledge feature, and the optimum number is the number of such knowledge feature amounts.
  • the knowledge level estimation means 106 has a function of estimating the knowledge level.
  • the knowledge level estimation means 106 estimates the knowledge level by integrating knowledge feature quantities.
  • the knowledge level estimation means 106 estimates the knowledge level using the knowledge feature quantity estimation result output from the knowledge feature quantity estimation means 104 and the knowledge level estimation model stored in the knowledge level estimation model storage means 105.
• the knowledge level estimation means 106 outputs the knowledge level estimation result from the knowledge level estimation model and the knowledge feature quantity estimation results by using an SVM or the like, as in the identification pattern learning process.
  • the knowledge level estimation means 106 may determine the knowledge level as a binary value of “present” or “absent” or a level of three or more values.
  • the knowledge level estimating means 106 can output the knowledge level discriminated at many levels by, for example, configuring the identification processing in multiple stages.
• as the continuous value, the knowledge level estimation means 106 may use, for example, a score output together with the output result of the identification process.
• the knowledge level estimating means 106 may use a majority method, which is a known technique. When the majority method is used, the knowledge level estimation means 106 treats the output result of each knowledge feature quantity as a discrete value, and adopts as the knowledge level output by the conversation analysis apparatus 100 the value output by the largest number of knowledge feature quantities. When the majority method is used, no knowledge level estimation model is necessary.
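As a concrete illustration of the majority method described above, the integration can be sketched as follows (a minimal sketch; the function name and the labels are illustrative assumptions, not part of the embodiment):

```python
from collections import Counter

def majority_knowledge_level(feature_outputs):
    """Integrate the discrete outputs of the knowledge feature quantities
    by majority vote; no knowledge level estimation model is needed."""
    counts = Counter(feature_outputs)
    # most_common(1) returns [(label, count)] for the most frequent label
    return counts.most_common(1)[0][0]

# Example: three of four knowledge feature quantities output "present",
# so the apparatus would output "present" as the knowledge level.
print(majority_knowledge_level(["present", "present", "absent", "present"]))
```

On ties, `Counter.most_common` returns labels in first-encountered order, so a real system would need an explicit tie-breaking rule.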
  • the conversation analysis apparatus 100 of the present embodiment is realized by a CPU (Central Processing Unit) that executes processing according to a program, for example.
  • the conversation analysis apparatus 100 may be realized by hardware.
• the utterance section calculation unit 101, the feature amount extraction unit 102, the knowledge feature amount estimation unit 104, the knowledge level estimation unit 106, the knowledge feature amount estimation model creation unit 110, and the knowledge level estimation model creation unit 111 are realized, for example, by a CPU that executes processing according to program control.
  • the knowledge feature quantity estimation model storage unit 103 and the knowledge level estimation model storage unit 105 are realized by, for example, a RAM (Random Access Memory).
  • FIG. 4 is a flowchart showing the operation of knowledge level estimation processing by the conversation analysis apparatus 100.
  • the utterance interval calculation means 101 calculates an utterance interval as a unit for calculating the language feature amount and the dialogue feature amount based on the information about the utterance described in the text data.
  • the information related to the utterance is, for example, an utterance detection section or speaker information (step S201).
  • the utterance interval calculation unit 101 may classify the utterance based on the calculated utterance interval. Based on the utterance interval or the classification result calculated by the utterance interval calculation unit 101, the feature amount extraction unit 102 calculates a language feature amount and a dialogue feature amount.
  • the language feature quantity extraction unit 102a calculates a language feature quantity such as word appearance frequency and word reliability related to the speech recognition result from the text data (step S202).
• the dialogue feature quantity extraction unit 102b calculates dialogue feature quantities such as speech speed, pause length, utterance length, and the number of back-channel responses, using the input voice data, the text data, and the utterance section information calculated in step S201 (step S203). Note that the process in step S203 may be executed before the process in step S202, or the two processes may be executed in parallel.
• the knowledge feature quantity estimation unit 104 estimates the knowledge feature amount using the language feature quantity calculated in step S202, the dialogue feature quantity calculated in step S203, and the knowledge feature quantity estimation model stored in the knowledge feature quantity estimation model storage unit 103 (step S204).
• the knowledge level estimation means 106 estimates the knowledge level using the knowledge feature quantity estimation result by the knowledge feature quantity estimation means 104 and the knowledge level estimation model stored in the knowledge level estimation model storage means 105 (step S205). After outputting the knowledge level estimation result, the conversation analysis apparatus 100 ends the process.
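The flow of steps S201 through S205 culminates in a two-stage estimation, which can be sketched as follows (a minimal sketch under stated assumptions: the embodiment identifies with SVMs, while simple linear threshold classifiers stand in for them here, and all function names, weights, and feature values are illustrative):

```python
def linear_classify(weights, bias, features):
    """Stand-in for SVM identification: thresholded linear score."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else 0  # 1 = "present", 0 = "absent"

def estimate_knowledge_level(language_feats, dialogue_feats, kf_models, level_model):
    # Step S204: estimate each knowledge feature quantity from the
    # language and dialogue feature quantities; different models may
    # weight the two kinds of features differently.
    combined = language_feats + dialogue_feats
    kf_outputs = [linear_classify(w, b, combined) for (w, b) in kf_models]
    # Step S205: integrate the knowledge feature quantity estimates
    # into a knowledge level with the knowledge level estimation model.
    w, b = level_model
    return linear_classify(w, b, kf_outputs)

# Illustrative models: two knowledge feature quantities and a level model.
kf_models = [([1.0, 0.0, 0.5], -0.6), ([0.0, 1.0, 0.5], -0.6)]
level_model = ([1.0, 1.0], -1.5)  # "high" only if both features present
print(estimate_knowledge_level([0.8], [0.7, 0.2], kf_models, level_model))
```

The sketch shows the structural point of the embodiment: the knowledge level model never sees the raw features, only the intermediate knowledge feature quantity estimates.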
  • the conversation analysis apparatus can robustly estimate the user's knowledge level even when a broken sentence different from the written word is input.
• the reason is that, when estimating the knowledge level, the conversation analyzer uses dialogue feature quantities, such as utterance timing and speech speed, that are not contained in the text data and are therefore unaffected by the accuracy of the speech recognition results or by the collapse of the text.
• the knowledge feature amount estimation unit estimates knowledge feature amounts obtained respectively from different feature amounts, namely the language feature amount and the dialogue feature amount.
• the conversation analyzer can reduce the influence on knowledge level estimation even when, for example, voice data of a casual speaking style is input or when the recognition rate is low.
• the reason (that is, the reason why the influence on knowledge level estimation is reduced) is that the estimation results of other knowledge feature amounts, obtained from dialogue feature amounts that are not affected by the language feature amounts, can complement an erroneous estimation result of a knowledge feature amount based on the language feature amounts.

[Evaluation experiment]
• the input data was stereo audio in which the operator's voice was recorded on one channel and the customer's voice on the other channel, together with the speech recognition result of that audio.
  • language features and dialogue features were extracted by the above method, and the knowledge level was estimated.
• the knowledge label, which is the correct data used for the evaluation, was assigned one of two values, “high knowledge level” or “low knowledge level”, for each call unit based on manual subjective evaluation. 100 files of correct data created in this way were prepared, and an evaluation experiment was performed. The breakdown of the 100 files was 46 files with “high knowledge level” and 54 files with “low knowledge level”.
  • 10-fold cross-validation was carried out when learning the knowledge feature estimation model, learning the knowledge level estimation model, and estimating the knowledge level.
• the data was divided into 10 groups, of which 9 groups were used as learning data and the remaining 1 group was used as evaluation data. A test was then performed on each of the 10 combinations of learning data and evaluation data created by changing which group served as the evaluation data.
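The fold construction described above can be sketched as follows (a minimal sketch; the function name and the splitting strategy are illustrative assumptions, not part of the evaluation procedure itself):

```python
def ten_fold_splits(items, n_folds=10):
    """Divide items into n_folds groups; each group serves once as the
    evaluation data while the remaining groups form the learning data."""
    folds = [items[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        eval_data = folds[i]
        train_data = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train_data, eval_data

files = list(range(100))  # 100 correct-answer files, as in the experiment
splits = list(ten_fold_splits(files))
print(len(splits))                           # 10 train/eval combinations
print(len(splits[0][0]), len(splits[0][1]))  # 90 learning, 10 evaluation
```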
• in defining the knowledge features, the evaluator described the reasons for judging the knowledge level, and the knowledge features were defined based on the written reasons for judgment.
• knowledge level judgment factors, for example “technical terms” and “conversation fluency”, were found, and the found judgment factors were used as knowledge features.
  • the knowledge feature amount label that becomes teacher data was generated by clustering the data based on the judgment factors.
  • the knowledge feature amount of “technical term” that is one of the knowledge features indicates whether or not the technical term is included in the target learning data. That is, in the knowledge feature amount of “technical term”, whether the corresponding speaker uses the technical term is expressed by “0 (no technical term)” or “1 (with technical term)”.
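The binary “technical term” knowledge feature amount described above can be sketched as follows (a minimal sketch; the function name and the term list are assumptions, and a real system would match terms against the speech recognition result for the target speaker):

```python
def technical_term_label(transcript_words, technical_terms):
    """Binary knowledge feature amount label for "technical term":
    1 if the speaker's words contain any technical term, else 0."""
    return 1 if any(w in technical_terms for w in transcript_words) else 0

terms = {"bandwidth", "latency"}  # illustrative technical-term list
print(technical_term_label(["the", "latency", "is", "high"], terms))  # 1
print(technical_term_label(["it", "seems", "slow"], terms))           # 0
```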
  • the generated knowledge features include factors representing linguistic features such as technical terms and factors representing interactive features related to the flow of conversation.
  • one knowledge feature quantity is estimated from only the language feature quantity.
  • Another knowledge feature amount is estimated only from the dialogue feature amount.
  • the remaining two knowledge feature quantities are estimated from combinations of language feature quantities and dialogue feature quantities.
• in those two combinations, the weights given to specific feature quantities among the language feature quantities and the dialogue feature quantities differ.
  • the knowledge level was estimated by integrating the output results of the four knowledge features estimated by the knowledge feature estimation model generated as described above.
• the recall and precision shown in Equation (1) are calculated, treating “high knowledge level” as the positive class, using the following equations:
• Recall = (number correctly estimated as “high knowledge level”) / (number of correct answers labeled “high knowledge level”) (2)
• Precision = (number correctly estimated as “high knowledge level”) / (number estimated as “high knowledge level”) (3)
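Equations (2) and (3) can be computed as follows (a minimal sketch; the function and label names are illustrative assumptions):

```python
def recall_precision(predicted, correct, positive="high"):
    """Recall (Equation (2)) and precision (Equation (3)) with
    "high knowledge level" as the positive class."""
    true_pos = sum(1 for p, c in zip(predicted, correct)
                   if p == positive and c == positive)
    actual_pos = sum(1 for c in correct if c == positive)
    predicted_pos = sum(1 for p in predicted if p == positive)
    return true_pos / actual_pos, true_pos / predicted_pos

predicted = ["high", "high", "low", "high"]
correct   = ["high", "low",  "low", "high"]
recall, precision = recall_precision(predicted, correct)
print(recall, precision)  # 2/2 = 1.0 and 2/3 ≈ 0.667
```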
• the conversation analysis system 10 includes knowledge feature quantity estimation means 13 (for example, knowledge feature quantity estimation means 104) for estimating knowledge feature quantities from the extracted conversation feature quantities and language feature quantities and from a knowledge feature quantity estimation model holding identification patterns indicating the knowledge feature quantities.
  • the conversation analysis system 10 further includes knowledge level estimation means 14 (for example, knowledge level estimation means 106) that estimates the knowledge level of the speaker by integrating the estimated knowledge feature quantities.
• the knowledge feature amount estimation model may hold an identification pattern indicating the knowledge feature amount learned from the language feature amounts and conversation feature amounts calculated from learning speech data and text data of that speech data, and from knowledge feature amount labels serving as teacher data (for example, the knowledge feature amount label 112).
• the conversation analysis system can thus use a knowledge feature quantity estimation model prepared in advance to suit the input language feature quantities and conversation feature quantities.
  • the conversation analysis system can estimate the knowledge level based on the knowledge level estimation model.
• the conversation analysis system can thus use a knowledge level estimation model prepared in advance to suit the input knowledge feature amount labels.
• the conversation analysis system 10 may include utterance section calculation means (for example, utterance section calculation means 101) for obtaining, from speech data and text data of the speech data, utterance sections in which speech detection sections by the same speaker are continuous.
  • the language feature amount extraction unit 12 may extract a language feature amount based on the utterance section, and the conversation feature amount extraction unit 11 may extract the conversation feature amount based on the utterance section.
  • the conversation analysis system can obtain an utterance section from input data, and can extract a language feature amount and a conversation feature amount based on the utterance section.
  • the knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based on the language feature quantity and the conversation feature quantity.
  • the knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based only on the language feature quantity.
  • the physical component can be realized as, for example, an electronic circuit or a computer device described later.
  • the logical component can be realized as a software program executed in an electronic circuit or a computer device, for example.
  • the means for providing the function of each component described above is realized as a component (unit) of an apparatus or a system in which the function is mounted.
• each component shown in the above figures may be realized using hardware that is partially or wholly integrated (an integrated circuit or a storage device in which processing logic is implemented).
  • the storage device 802 is a memory device such as a RAM that can be referred to from the arithmetic device 801, and stores software programs, various data, and the like. Note that the storage device 802 may be a volatile memory device.
  • the nonvolatile storage device 803 is a nonvolatile storage device such as a magnetic disk drive or a semiconductor storage device using flash memory.
  • the nonvolatile storage device 803 can store various software programs and data.
  • the drive device 804 is a device that processes reading and writing of data with respect to a storage medium 805 described later, for example.
  • the storage medium 805 is an arbitrary storage medium capable of recording data, such as an optical disk, a magneto-optical disk, and a semiconductor flash memory.
• the conversation analysis system (or its constituent elements) according to the present invention, described using the above embodiment as an example, may be realized, for example, by supplying to the hardware device illustrated in FIG. 8 a software program capable of realizing the functions described in the above embodiment. More specifically, the present invention may be realized by causing the arithmetic device 801 to execute a software program supplied to such a hardware device. In this case, an operating system running on the hardware device, database management software, network software, or middleware such as a virtual environment platform may execute part of each process.
• each means shown in the above figures (or a system component (unit) capable of realizing the means) can be realized as a software module, which is a unit of function (processing) of a software program executed by the hardware described above. That is, each component in the conversation analysis apparatus 100 (the utterance interval calculation unit 101, the feature amount extraction unit 102, the knowledge feature amount estimation unit 104, the knowledge level estimation unit 106, the knowledge feature amount estimation model creation unit 110, the knowledge level estimation model creation unit 111, and the like) may be realized as a software module in which those functions are implemented.
  • the classification of the software modules shown in the drawings is a configuration for convenience of explanation, and various configurations can be assumed for the implementation.
• these software modules are stored in the nonvolatile storage device 803, and may be read out to the storage device 802 when the arithmetic device 801 executes each process.
• in this case, the present invention can be understood as being constituted by the code constituting the software program, or by a computer-readable storage medium in which the code is recorded.
  • the storage medium is not limited to a medium independent of the hardware device, but includes a storage medium in which a software program transmitted via the Internet or the like is downloaded and stored or temporarily stored.
  • the conversation analysis system described above may be configured by a virtual environment in which the hardware device illustrated in FIG. 8 is virtualized and various software programs (computer programs) executed in the virtual environment.
  • the components of the hardware device illustrated in FIG. 8 are provided as virtual devices in the virtual environment.
  • the present invention can be realized with the same configuration as when the hardware device illustrated in FIG. 8 is configured as a physical device.
• the present invention can be applied, for example, to a conversation analysis device that analyzes a user's tendencies based on the knowledge level, using a voice database containing conversations at various customer contact points, such as a contact center, that is, conversations between a customer and a business person such as a store clerk or an operator. The present invention can also be applied to applications such as a program for realizing the conversation analysis apparatus using a computer. The present invention can further be applied to a conversation analysis apparatus that extracts linguistic features and dialogue features from the words and exchanges of a conversation to estimate characteristics such as a user's interests and preferences rather than the knowledge level.


Abstract

Provided is a conversation analysis system capable of robustly estimating a speaker's knowledge level even with input of a broken sentence different from a written sentence. The conversation analysis system comprises: a conversation feature amount extraction means (11) for extracting, from speech data and text data of the speech data, a conversation feature amount pertaining to a conversation state between speakers; a language feature amount extraction means (12) for extracting a language feature amount pertaining to a word included in the text data; a knowledge feature amount estimation means (13) for estimating a knowledge feature amount from the extracted conversation feature amount and language feature amount and a knowledge feature amount estimation model retaining an identification pattern representing a knowledge feature amount; and a knowledge level estimation means (14) for estimating the speaker's knowledge level by integrating the estimated knowledge feature amounts.

Description

Conversation analysis system, conversation analysis method, and storage medium on which conversation analysis program is recorded
The present invention relates to a conversation analysis system, a conversation analysis method, and a conversation analysis program for estimating a speaker's knowledge level from conversation.
The knowledge level corresponds to the result of classifying into two or more classes, or of quantifying, whether the target speaker is familiar with a predetermined theme or with peripheral information related to the predetermined theme. The predetermined theme is, for example, the subject of the dialogue itself.
Patent Document 1 describes an example of a conversation analysis device. As illustrated in FIG. 7, the knowledge amount estimation information generation device described in Patent Document 1 includes an utterance string extraction unit 1, an utterance intention determination unit 2, a feature amount extraction unit 3, an estimation information generation unit 4, a knowledge amount label 4a, a knowledge amount estimation unit 5, and an estimation information storage unit 5a.
The knowledge amount estimation information generation apparatus configured as shown in FIG. 7 is mainly divided into a learning unit and an estimation unit, and operates as follows.
When the speech recognition result 7 (learning call) for the dialogue between the user and the operator is input, the learning unit of the knowledge amount estimation information generation device extracts, with the utterance string extraction unit 1, text data composed of utterance strings. Next, with the utterance intention determination unit 2, the learning unit determines, from the text data of the utterance strings concerning the dialogue between the inquirer and the respondent extracted by the utterance string extraction unit 1, the utterances representing the utterance intentions of “question”, “explanation”, and “back-channel”. After the determination, the learning unit associates each utterance intention with the target utterance.
Next, with the feature amount extraction unit 3, the learning unit calculates the number of distinct words used by the user among the appearing words (hereinafter referred to as the used vocabulary feature amount). Along with the used vocabulary feature amount, the learning unit calculates the number of appearances of each of the utterance intentions “question”, “explanation”, and “back-channel” determined by the utterance intention determination unit 2.
The learning unit also extracts, from the utterances representing the “question” intention, utterances containing an interrogative as interrogative questions, and calculates their number of appearances. The feature amounts concerning the numbers of appearances of “question”, “explanation”, “back-channel”, and “interrogative question” are collectively referred to as intention feature amounts.
Next, with the estimation information generation unit 4, the learning unit generates estimation information used to estimate the knowledge amount for the input text (speech recognition result 7), using as learning data the intention feature amounts and used vocabulary feature amounts calculated by the feature amount extraction unit 3 and the knowledge amount label 4a, which is correct-answer information concerning the knowledge amount.
Next, for the input speech recognition result 6, the estimation unit performs, in the utterance string extraction unit 1, the utterance intention determination unit 2, and the feature amount extraction unit 3, processing similar to that performed by the learning unit, and obtains the used vocabulary feature amount and the dialogue feature amount. Then, with the knowledge amount estimation unit 5, the estimation unit estimates the knowledge amount from the calculated used vocabulary feature amount and dialogue feature amount and the estimation information generated by the learning unit and stored in the estimation information storage unit 5a.
JP 2013-167765 A
However, when the input text to be evaluated is not written language, that is, when it is not a sentence conforming to correct grammar, it is difficult for the knowledge amount estimation information generation device described in Patent Document 1 to estimate the user's knowledge amount. Sentences that do not conform to correct grammar are, for example, broken sentences such as colloquial expressions, or sentences containing recognition errors.
A general conversation analysis apparatus calculates a used vocabulary feature amount and an intention feature amount from the speech recognition result to be evaluated, and estimates the knowledge amount. The used vocabulary feature amount is a feature amount related to the appearing words. The intention feature amount is the number of utterances in each category when each utterance is classified into “question”, “explanation”, “back-channel”, and “interrogative question” by language processing such as pattern matching.
That is, both the used vocabulary feature amount and the intention feature amount are calculated based on language information. Calculating these various feature amounts presupposes that largely correct sentences are input.
The language information used for calculating these feature amounts is the appearing words, word strings, or character strings (hereinafter referred to as symbols) themselves. In addition, supplementary information such as the notation, part of speech, and meaning of a symbol, or statistical information based on symbols such as the appearance frequency obtained for each symbol, is also used.
Therefore, the accuracy with which the conversation analysis device estimates the user's knowledge amount depends heavily on the grammatical correctness of the utterance content, or on the accuracy of the recognition result when the utterance is recognized.
When the input speech recognition result is the result of the conversation speech to be evaluated being spoken according to correct grammar and being correctly recognized, the conversation analysis device can estimate the user's knowledge amount. However, when the conversation to be evaluated contains colloquial expressions, or when the speech recognition result contains many recognition errors, it is difficult for the conversation analysis device to calculate correct used vocabulary feature amounts and intention feature amounts. Unless correct used vocabulary feature amounts and intention feature amounts can be calculated, it is difficult for the conversation analysis device to estimate the user's knowledge amount correctly.
That is, the problem with a general conversation analysis apparatus is that, when a broken sentence different from written language is input, it is difficult to calculate correct used vocabulary feature amounts and intention feature amounts, and therefore it is difficult to estimate the user's knowledge amount correctly.
To solve the above problem, it is conceivable to use not only the content of the conversation between speakers but also the conversation state between speakers to estimate the speaker's knowledge level. The reason is that feature amounts such as utterance timing and speech speed, which are less affected by the accuracy of the speech recognition result or by the collapse of sentences, can be extracted from the conversation state, and the extracted feature amounts can be used for knowledge level estimation.
The method described in Non-Patent Document 2 uses feature amounts extracted from the conversation state between speakers for knowledge level estimation. However, when the method described in Non-Patent Document 2 is used, it is difficult to estimate the speaker's correct knowledge level, for example, when speech data of a casual speaking style is input or when the speech recognition rate is low. The reason is that the method described in Non-Patent Document 2 does not use knowledge feature amounts obtained respectively from different feature amounts, such as language feature amounts and dialogue feature amounts, for knowledge level estimation.
When the knowledge feature amounts obtained respectively from such different feature amounts are not used, the conversation analysis device cannot, for example, use the estimation result of a knowledge feature amount obtained from a different feature amount (for example, a dialogue feature amount not affected by the language feature amount) to complement an erroneous estimation result of a knowledge feature amount based on the language feature amount. For such a conversation analysis device, it is difficult to estimate the speaker's knowledge level correctly when, for example, a broken sentence different from written language is input.
The present invention has been made to solve the above-described problems. That is, one main object of the present invention is to provide a conversation analysis system, a conversation analysis method, and a conversation analysis program that can robustly estimate a speaker's knowledge level even when a broken sentence different from written language is input.
A conversation analysis system according to one aspect of the present invention includes: conversation feature amount extraction means for extracting, from speech data and text data of the speech data, a conversation feature amount, which is a feature amount related to the conversation state between speakers; language feature amount extraction means for extracting a language feature amount, which is a feature amount related to words contained in the text data; knowledge feature amount estimation means for estimating knowledge feature amounts from the extracted conversation feature amount and language feature amount and a knowledge feature amount estimation model holding identification patterns indicating knowledge feature amounts; and knowledge level estimation means for estimating the speaker's knowledge level by integrating the estimated knowledge feature amounts.
A conversation analysis method according to one aspect of the present invention extracts, from speech data and text data of the speech data, a conversation feature amount, which is a feature amount related to the conversation state between speakers; extracts a language feature amount, which is a feature amount related to words contained in the text data; estimates knowledge feature amounts from the extracted conversation feature amount and language feature amount and a knowledge feature amount estimation model holding identification patterns indicating knowledge feature amounts; and estimates the speaker's knowledge level by integrating the estimated knowledge feature amounts.
 本発明の一態様に係る会話分析プログラムは、コンピュータに、音声データおよび音声データのテキストデータから話者間の会話状態に関する特徴量である会話特徴量を抽出する会話特徴量抽出処理、テキストデータに含まれる単語に関する特徴量である言語特徴量を抽出する言語特徴量抽出処理、抽出された会話特徴量および言語特徴量と、知識特徴量を示す識別パターンを保持する知識特徴量推定モデルとから知識特徴量を推定する知識特徴量推定処理、および推定された知識特徴量を統合して話者の知識レベルを推定する知識レベル推定処理を実行させる。 A conversation analysis program according to an aspect of the present invention is a computer program that extracts speech feature data, which is a feature value related to a conversation state between speakers, from speech data and text data of speech data. Knowledge from language feature extraction processing that extracts language features, which are features related to the contained words, from the extracted conversation features and language features, and knowledge feature estimation models that hold identification patterns indicating knowledge features Knowledge feature amount estimation processing for estimating the feature amount and knowledge level estimation processing for estimating the knowledge level of the speaker by integrating the estimated knowledge feature amount are executed.
The object of the present invention is also achieved by a computer-readable storage medium storing the above conversation analysis program.
According to the present invention, a speaker's knowledge level can be estimated robustly even when broken sentences that differ from written language are input.
FIG. 1 is a block diagram illustrating a configuration example of the learning system of the conversation analysis apparatus according to the embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the concept of knowledge features in the embodiment of the present invention.
FIG. 3 is a block diagram illustrating a configuration example of the estimation system of the conversation analysis apparatus according to the embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the conversation analysis apparatus 100.
FIG. 5 is an explanatory diagram showing the evaluation results of an evaluation experiment using the conversation analysis apparatus according to the embodiment of the present invention and of an evaluation experiment using another method.
FIG. 6 is a block diagram showing an outline of the conversation analysis system in the embodiment of the present invention.
FIG. 7 is a block diagram illustrating the configuration of the knowledge amount estimation information generation device described in Patent Document 1.
FIG. 8 is an explanatory diagram illustrating a hardware configuration capable of realizing the conversation analysis system or the conversation analysis apparatus according to the embodiment of the present invention.
[Description of configuration]
Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram illustrating a configuration example of the learning system of the conversation analysis apparatus according to the embodiment of the present invention.
In the present embodiment, the input speech data is dialogue speech data that can be played back stereophonically (hereinafter, "stereo"), with the voices of different speakers recorded on the left and right channels, respectively. Hereinafter, the configuration and operation of the conversation analysis apparatus according to the present embodiment will be described, taking the case of estimating a speaker's knowledge level as an example.
Note that the input speech data may be data reproducible by a method other than stereo. The input speech data may also be speech data of a dialogue among three or more people. Even when speech data of a dialogue among three or more people is input, the conversation analysis apparatus according to the present embodiment can estimate each speaker's knowledge level, provided that the speech data of each speaker is separated using speaker recognition technology or the like.
The learning system of the conversation analysis apparatus 100 shown in FIG. 1 includes utterance section calculation means 101 and feature amount extraction means 102. The learning system of the conversation analysis apparatus 100 also includes knowledge feature amount estimation model storage means 103, knowledge level estimation model storage means 105, knowledge feature amount estimation model creation means 110, and knowledge level estimation model creation means 111.
The utterance section calculation means 101 has a function of calculating utterance sections from the input speech data and text data related to the speech data, and outputting the calculated utterance sections. The text data related to the speech data may include, for example, text data of uttered words obtained by applying speech recognition to the speech data.
An utterance section is a section in which utterance detection sections by the same speaker are contiguous and grouped together. The utterance section is the unit for calculating language feature amounts and dialogue feature amounts.
An utterance detection section is a section in which a person speaks continuously without pausing for breath. Utterance detection sections are calculated automatically, for example, by preprocessing for speech recognition.
Note that an utterance detection section need not be an automatically detected section as-is; it may be a section with margins added before and after an automatically detected section. An utterance detection section also need not correspond to a section in which a person is actually speaking; it may simply be a section of fixed time length.
When the input text data describes utterance detection sections assigned during speech recognition or information related to the speakers (speaker information), the utterance section calculation means 101 may calculate the utterance sections from the described utterance detection sections and speaker information.
Furthermore, the utterance section calculation means 101 may classify utterances based on the calculated utterance sections. When the utterance section calculation means 101 classifies the utterances, the feature amount extraction means 102 obtains language feature amounts or dialogue feature amounts for each classified class. The obtained language feature amounts or dialogue feature amounts are used for estimating knowledge feature amounts, as described later.
An example of the method by which the utterance section calculation means 101 calculates utterance sections is described below. The utterance section calculation means 101 arranges the utterances of the two speakers in time series, using the speech section information and speaker information included in the input text data. When the input text data contains no utterance detection sections or speaker information, the utterance section calculation means 101 may obtain them by analyzing the input speech data.
Next, the utterance section calculation means 101 compares the utterance detection sections of one speaker (the main speaker) with those of the other speaker (the interlocutor), and detects utterances in which an utterance detection section of the interlocutor is completely contained within an utterance detection section of the main speaker. One example is a back-channel response by the interlocutor inserted while the main speaker is speaking. The utterance section calculation means 101 performs this detection of completely contained utterances on the utterances of both speakers.
Furthermore, among the remaining utterance detection sections excluding the completely contained ones, the utterance section calculation means 101 merges contiguous utterance detection sections of the same speaker into a single section. This merged section becomes the utterance section.
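The containment filtering and merging steps above can be sketched as follows; the function name and the (speaker, start, end) tuple representation are illustrative choices, not part of the specification:

```python
def compute_utterance_sections(detections):
    """detections: list of (speaker, start, end) tuples, sorted by start time.

    Step 1: drop sections completely contained within a section of the
    other speaker (e.g. back-channel responses during the main speaker's
    utterance). Step 2: merge contiguous sections of the same speaker.
    """
    # Step 1: remove sections fully contained in the other speaker's section.
    kept = []
    for i, (spk, s, e) in enumerate(detections):
        contained = any(
            other != spk and os <= s and e <= oe
            for j, (other, os, oe) in enumerate(detections) if j != i
        )
        if not contained:
            kept.append((spk, s, e))

    # Step 2: merge consecutive remaining sections of the same speaker.
    sections = []
    for spk, s, e in kept:
        if sections and sections[-1][0] == spk:
            prev_spk, prev_s, _ = sections[-1]
            sections[-1] = (prev_spk, prev_s, e)
        else:
            sections.append((spk, s, e))
    return sections
```

Applied to two speakers' detection sections, this yields the utterance sections on which the language feature amounts and dialogue feature amounts are then computed.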
Obtaining utterance sections through the above processing reveals semantic breaks that are not inherently explicit in speech. With the semantic breaks made explicit, the feature amount extraction means 102 can calculate more accurate feature amounts.
Note that the utterance section calculation means 101 may also use the utterance detection sections (utterance start time, utterance end time) and speaker information obtained from the input text data directly as utterance sections. When the utterance detection sections and speaker information are used as utterance sections as-is, the above processing by the utterance section calculation means 101 becomes unnecessary.
Furthermore, the utterance section calculation means 101 may classify utterances according to predetermined criteria. One example of an utterance classification method is a method based on which speaker takes control of the conversation.
Based on the initiative in the conversation, the utterance section calculation means 101 classifies each utterance in the utterance sections calculated as described above into two classes: utterances in which the speaker holds the initiative (hereinafter, leading utterances) and utterances in which the speaker does not (hereinafter, passive utterances).
One example of a method for determining the presence or absence of initiative uses the length of the utterance section and the types of words appearing in it. For each utterance in the utterance sections calculated as described above, the utterance section calculation means 101 classifies, for example, utterances whose utterance section is shorter than a threshold as passive utterances.
The utterance section calculation means 101 may also classify as passive utterances those utterance sections containing words with few phonemes (for example, "yes" or "no"), which are easily misrecognized under the influence of acoustic or recording conditions, or words whose recognition results have low confidence.
After classifying the target utterances as passive utterances as described above, the utterance section calculation means 101 classifies the remaining utterances as leading utterances. The feature amount extraction means 102 treats the classification results produced by the utterance section calculation means 101 as classification classes, and obtains language feature amounts or dialogue feature amounts for each class.
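A minimal sketch of this initiative classification follows, assuming an illustrative length threshold, few-phoneme word list, and confidence threshold; none of these concrete values are given in the text:

```python
SHORT_SECTION_THRESHOLD = 1.0   # seconds; illustrative value
FEW_PHONEME_WORDS = {"yes", "no"}  # illustrative few-phoneme words
LOW_CONFIDENCE = 0.5            # illustrative recognition-confidence threshold

def classify_utterance(duration, words, confidences):
    """Return 'passive' or 'leading' for one utterance section.

    duration: length of the utterance section in seconds;
    words: recognized words; confidences: per-word recognition confidence.
    """
    if duration < SHORT_SECTION_THRESHOLD:
        return "passive"          # too short to carry the initiative
    if any(w in FEW_PHONEME_WORDS for w in words):
        return "passive"          # few-phoneme words, easily misrecognized
    if any(c < LOW_CONFIDENCE for c in confidences):
        return "passive"          # low-confidence recognition result
    return "leading"              # everything else is a leading utterance
```

For example, a 0.5-second utterance would be classified as passive, while a longer utterance of well-recognized content words would be classified as leading.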
The reason for classifying utterances into leading and passive utterances is that classification makes speaker characteristics more apparent. For example, for utterance length, which is one of the dialogue feature amounts, statistics such as the mean or variance change greatly depending on the proportion of passive utterances. Therefore, computing utterance-length feature amounts only over the leading utterances yields feature amounts for the utterances the speaker produced on their own initiative, making the speaker's characteristics easier to capture than when the utterances are not classified.
Note that, besides treating an utterance's own classification result as its classification class as described above, the feature amount extraction means 102 may use the classification results of the utterances before and after the target utterance as the classification class. The feature amount extraction means 102 may also use a combination of the classification result of the target utterance and the classification results of the preceding and following utterances as the classification class.
The feature amount extraction means 102 includes language feature amount extraction means 102a and dialogue feature amount extraction means 102b. Text data, speech data, utterance section detection results, utterance classification results, and the like are input to the feature amount extraction means 102. Based on these input data, the feature amount extraction means 102 outputs language feature amounts and dialogue feature amounts.
The language feature amount extraction means 102a has a function of extracting language feature amounts calculated from the input text data. Specifically, language feature amounts include the frequencies of words appearing in the input text data, statistics based on those frequencies, and the like. When the text data is the result of speech recognition, the extracted language feature amounts also include the recognition confidence assigned to each recognized word.
Note that the language feature amount extraction means 102a may obtain feature amounts using the classes to which the recognized words belong. The language feature amount extraction means 102a may also replace appearing words with other symbols, for example by normalizing orthographic variants or expanding synonyms, and obtain the feature amounts from the replacement symbols rather than from the appearing words themselves.
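The word-frequency features with symbol replacement described above might look as follows; the normalization table, example words, and function name are illustrative:

```python
from collections import Counter

# Illustrative orthographic normalization: map variant spellings
# to a single symbol before counting.
NORMALIZE = {"colour": "color"}

def language_features(words):
    """Relative frequencies over normalized words in one utterance section."""
    normalized = [NORMALIZE.get(w, w) for w in words]
    counts = Counter(normalized)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}
```

With the normalization table, "colour" and "color" are counted as one symbol, so surface variation does not fragment the frequency statistics.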
The dialogue feature amount extraction means 102b has a function of extracting dialogue feature amounts, which are feature amounts relating to the state of the dialogue between speakers and are calculated mainly from the speech data. A dialogue feature amount is a feature amount that can be obtained when two or more people converse. Dialogue feature amounts are calculated with the utterance sections described above as the reference.
That is, the dialogue feature amount extraction means 102b can obtain, for example, a speaker's speech rate, utterance length, number of back-channel responses, and so on by analyzing the utterance sections sandwiched between the interlocutor's utterance sections. The dialogue feature amount extraction means 102b can also obtain dialogue feature amounts by analyzing utterance sections sandwiched between the beginning of the data and an utterance section of the interlocutor, and utterance sections sandwiched between an utterance section of the interlocutor and the end of the data.
Once the utterance sections of each speaker have been determined, the dialogue feature amount extraction means 102b can also calculate the pause length values described later. In this way, the dialogue feature amount extraction means 102b can obtain various dialogue feature amounts with the utterance sections as the reference.
The following describes specific examples of dialogue feature amounts: speech rate, pause length, number of back-channel responses, and utterance length.
Speech rate is the speed at which a speaker speaks within one unit of dialogue. Speech rate is expressed, for example, as the number of morae per unit time. For a given utterance section, the speech rate is obtained, for example, by dividing the number of morae in the recognized words by the length of the utterance section. A mora is a syllable-like unit that carries a single beat of rhythm.
In the present embodiment, pause length means the length of the "gap" that occurs when a speaker change takes place. The pause length is calculated as the difference between the utterance end time of the utterance section immediately preceding the target utterance section and the utterance start time of the target utterance section.
The utterance length is the length of one utterance section, that is, the length of time from the utterance start time to the utterance end time of the section.
The number of back-channel responses is the number of times the interlocutor produces such responses. A back-channel response signals that the interlocutor understands what the speaker is saying, or encourages the speaker to continue.
The dialogue feature amount extraction means 102b may identify back-channel responses by pattern matching based on the recognition results, or based on the utterance length. The dialogue feature amount extraction means 102b may also identify back-channel responses using the containment-relation information between utterances, which is one example of the utterance classification results described above.
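The four dialogue feature amounts above can be sketched as follows, assuming each utterance section is represented by its start and end times and, where needed, a recognized mora count; the function names are illustrative, and the back-channel count uses the containment-based identification as one of the possible methods:

```python
def speech_rate(mora_count, start, end):
    """Morae per second over one utterance section."""
    return mora_count / (end - start)

def pause_length(prev_end, start):
    """Gap between the preceding section's end time and this section's start."""
    return start - prev_end

def utterance_length(start, end):
    """Duration of one utterance section."""
    return end - start

def backchannel_count(own_sections, other_sections):
    """Count the other speaker's sections fully contained in one's own
    sections (one possible containment-based identification of
    back-channel responses)."""
    return sum(
        1
        for (os, oe) in other_sections
        if any(s <= os and oe <= e for (s, e) in own_sections)
    )
```

For example, 10 morae over a 2-second section give a speech rate of 5 morae per second, and an interlocutor section lying entirely inside one's own section counts as one back-channel response.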
The knowledge feature amount estimation model creation means 110 has a function of generating a knowledge feature amount estimation model. The knowledge feature amount estimation model creation means 110 generates the knowledge feature amount estimation model using learning data that includes the language feature amounts and dialogue feature amounts extracted by the feature amount extraction means 102 from the learning speech data and text data, together with knowledge feature amount labels 112, which are teacher data representing the knowledge feature amounts for the learning speech data. The knowledge feature amount estimation model creation means 110 sends the created knowledge feature amount estimation model to the knowledge feature amount estimation model storage means 103.
The knowledge feature amount estimation model storage means 103 has a function of storing the knowledge feature amount estimation model created by the knowledge feature amount estimation model creation means 110. FIG. 2 is an explanatory diagram showing the concept of knowledge features in the present embodiment.
A knowledge feature is an element that determines a speaker's knowledge level, based on the language feature amounts and dialogue feature amounts extracted from the words the speaker uses and from the speaker's reactions.
The knowledge feature amount estimation model is a model for estimating knowledge feature amounts from the input data. The knowledge feature amount estimation model is generated using learning data that includes the language feature amounts and dialogue feature amounts extracted by the feature amount extraction means 102 from the learning speech data and text data, together with the knowledge feature amount labels 112 representing the knowledge feature amounts for the learning speech data.
As described above, the knowledge feature amount estimation model is a model generated by learning identification patterns from learning data in which the language feature amounts, dialogue feature amounts, and knowledge feature amount labels 112 form one set. A known technique such as the Support Vector Machine (SVM) (Non-Patent Document 1) is used to learn the identification patterns.
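As one possible concrete form of this identification-pattern learning, the sketch below trains a linear SVM with a stochastic subgradient method using only the standard library; the toy feature vectors, the ±1 label encoding, and the omission of a bias term are illustrative simplifications, not the apparatus's actual training procedure:

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=300):
    """Learn a linear SVM weight vector by stochastic subgradient descent.

    X: feature vectors (e.g. language + dialogue feature amounts);
    y: knowledge feature labels encoded as +1 (present) / -1 (absent);
    lam: L2 regularization strength (illustrative value).
    """
    rng = random.Random(0)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            if margin < 1:
                # hinge-loss subgradient step plus regularization shrink
                w = [(1 - eta * lam) * wj + eta * y[i] * xj
                     for wj, xj in zip(w, X[i])]
            else:
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    """+1 if the knowledge feature is estimated present, else -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

In the apparatus, one such model would be trained per knowledge feature, with the knowledge feature amount labels 112 supplying the ±1 targets.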
The knowledge level estimation model creation means 111 has a function of generating a knowledge level estimation model. The knowledge level estimation model creation means 111 generates a knowledge level estimation model that has learned knowledge level identification patterns, using the knowledge feature amount labels 112 and knowledge labels 113, which are teacher data for the knowledge level.
The knowledge level estimation model creation means 111 generates a knowledge level estimation model that has learned knowledge level identification patterns, using the results output for the learning data by the knowledge feature amount estimation means 104 described later, together with the knowledge labels 113. The knowledge level estimation model creation means 111 then sends the generated knowledge level estimation model to the knowledge level estimation model storage means 105. Note that the knowledge level estimation model creation means 111 may use the knowledge feature amount labels 112 instead of the output results of the knowledge feature amount estimation means 104 for the learning data.
The knowledge level estimation model storage means 105 has a function of storing the knowledge level estimation model created by the knowledge level estimation model creation means 111.
The knowledge level estimation model is a model for estimating the knowledge level from the input data. The knowledge level estimation model is generated by learning identification patterns using the results output for the learning data by the knowledge feature amount estimation means 104 described later, or the knowledge feature amount labels 112 for the learning data, together with the knowledge labels 113.
As described above, the knowledge level estimation model is a model generated by learning identification patterns from learning data in which the knowledge feature amount labels 112, or the knowledge feature amount estimation results for the learning data, and the knowledge labels 113 form one set. As with the knowledge feature amount estimation model, SVM or the like is used to learn the identification patterns.
Next, the estimation system of the conversation analysis apparatus 100 is described. FIG. 3 is a block diagram illustrating a configuration example of the estimation system of the conversation analysis apparatus according to the embodiment of the present invention.
The estimation system of the conversation analysis apparatus 100 shown in FIG. 3 includes the utterance section calculation means 101, the feature amount extraction means 102, the knowledge feature amount estimation model storage means 103, knowledge feature amount estimation means 104, the knowledge level estimation model storage means 105, and knowledge level estimation means 106. The knowledge feature amount estimation means 104 and the knowledge level estimation means 106, which are not included in the learning system but only in the estimation system, are described below.
The knowledge feature amount estimation means 104 has a function of estimating knowledge feature amounts. The knowledge feature amount estimation means 104 estimates each knowledge feature amount for the input data, using the language feature amounts and dialogue feature amounts calculated by the feature amount extraction means 102 and the knowledge feature amount estimation models stored in the knowledge feature amount estimation model storage means 103.
The knowledge feature amount estimation means 104 quantifies each knowledge feature amount as a discrete value such as "0" or "1", or as a continuous value in a range such as "0" to "1", and outputs it. The knowledge feature amount estimation model storage means 103 stores at least one knowledge feature amount estimation model for each knowledge feature amount estimated by the knowledge feature amount estimation means 104.
When estimating a knowledge feature amount, the knowledge feature amount estimation means 104 determines whether the knowledge feature is present by matching the language feature amounts and dialogue feature amounts obtained from the input data against the knowledge feature amount estimation model. As in learning the identification patterns, SVM or the like is used in the knowledge feature identification processing. When there are multiple knowledge features, the knowledge feature amount estimation means 104 performs the identification processing for each knowledge feature.
Besides determining a knowledge feature amount as the two values "present" or "absent", the knowledge feature amount estimation means 104 may determine it as three values such as "present", "absent", and "unknown". The knowledge feature amount estimation means 104 may also determine the knowledge feature amount at more than three levels. When determining knowledge feature amounts at many levels, the knowledge feature amount estimation means 104 can output the knowledge feature amount labels 112 by arranging the above identification processing in multiple stages.
Besides outputting discrete values as described above, the knowledge feature amount estimation means 104 may output continuous values. When outputting continuous values, the knowledge feature amount estimation means 104 may use, for example, the score output together with the result of the above identification processing.
As the number of knowledge feature amounts to estimate, the knowledge feature amount estimation means 104 may adopt the number that optimizes the knowledge level estimation accuracy, obtained through experiments using development data. The number of knowledge feature amounts to estimate may also be determined manually in advance.
Furthermore, suppose that when the knowledge labels 113 are created, the creator describes the reason for assigning each knowledge label 113 in addition to assigning it. In this case, the knowledge feature amount estimation means 104 may set the number of knowledge feature amounts to estimate to the optimal number obtained by text analysis, such as clustering, of the described contents. When that optimal number is used as the number of knowledge feature amounts, the factors that led to the knowledge level judgments correspond to the knowledge features.
 The knowledge level estimation means 106 has a function of estimating the knowledge level. The knowledge level estimation means 106 estimates the knowledge level by integrating the knowledge feature quantities.
 The knowledge level estimation means 106 estimates the knowledge level by using the knowledge feature quantity estimation results output by the knowledge feature quantity estimation means 104 and the knowledge level estimation model stored in the knowledge level estimation model storage means 105. As in the learning processing of the identification patterns, the knowledge level estimation means 106 outputs a knowledge level estimation result from the knowledge level estimation model and the knowledge feature quantity estimation results by using an SVM or the like.
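 As an illustrative sketch only, the integration step can be pictured as a learned linear decision function, of the kind an SVM applies, over the vector of knowledge feature quantity estimates. The weights and bias below are invented values standing in for the learned knowledge level estimation model, not parameters taken from the disclosure.

```python
weights = [0.8, 0.5, 0.9, 0.3]   # one stand-in weight per knowledge feature quantity
bias = -1.0                       # stand-in for the learned decision threshold

def estimate_knowledge_level(knowledge_features):
    """Integrate knowledge feature quantity estimates into a level and a score."""
    score = sum(w * x for w, x in zip(weights, knowledge_features)) + bias
    return ("high" if score > 0 else "low"), score

level, score = estimate_knowledge_level([1, 1, 0, 1])  # e.g. four binary estimates
print(level)  # high
```

The continuous `score` corresponds to the score output together with the identification result, which may be used when a continuous knowledge level is required.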
 The knowledge level estimation result output by the knowledge level estimation means 106 is the output result of the conversation analysis apparatus 100. The output result of the knowledge level may be one of two or more classes (a discrete value) or a continuous value.
 When the output result of the knowledge level is a discrete value, the knowledge level estimation means 106 may discriminate the knowledge level as the binary values "present" and "absent", or at three or more levels. When discriminating the knowledge level at many levels, the knowledge level estimation means 106 can output a knowledge level discriminated at many levels by, for example, arranging the above identification processing in multiple stages.
 When the output result of the knowledge level is a continuous value rather than a discrete value, the knowledge level estimation means 106 uses, for example, the score output together with the result of the identification processing.
 The knowledge level estimation means 106 may also use the majority voting method, which is a known technique. When the majority voting method is used, the knowledge level estimation means 106 treats the output result of each knowledge feature quantity as a discrete value and adopts the value that occurs most frequently among those output results as the knowledge level output by the conversation analysis apparatus 100. When the majority voting method is used, no knowledge level estimation model is required.
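 The majority voting method described above can be sketched as follows; the string labels are illustrative discrete outputs of the individual knowledge feature quantities.

```python
from collections import Counter

def majority_vote(feature_outputs):
    """Adopt the most frequent discrete output as the knowledge level."""
    counts = Counter(feature_outputs)
    return counts.most_common(1)[0][0]

# Three of four knowledge feature quantities indicate "high", so "high" wins.
print(majority_vote(["high", "low", "high", "high"]))  # high
```

Because the result depends only on counting the discrete outputs, no knowledge level estimation model is involved in this variant.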
 Note that the conversation analysis apparatus 100 of this exemplary embodiment is realized by, for example, a CPU (Central Processing Unit) that executes processing according to a program. The conversation analysis apparatus 100 may also be realized by hardware.
 The utterance section calculation means 101, the feature quantity extraction means 102, the knowledge feature quantity estimation means 104, the knowledge level estimation means 106, the knowledge feature quantity estimation model creation means 110, and the knowledge level estimation model creation means 111 are realized by, for example, a CPU that executes processing under program control.
 A hardware configuration capable of realizing the conversation analysis apparatus 100 will be described later.
 The knowledge feature quantity estimation model storage means 103 and the knowledge level estimation model storage means 105 are realized by, for example, a RAM (Random Access Memory).
[Description of operation]
 The operation of the conversation analysis apparatus 100 of this exemplary embodiment will be described below with reference to FIG. 4. FIG. 4 is a flowchart showing the operation of the knowledge level estimation processing performed by the conversation analysis apparatus 100.
 Speech data and text data related to the speech data are input to the utterance section calculation means 101. The input text data is, for example, a speech recognition result or a transcription.
 After the input, the utterance section calculation means 101 calculates utterance sections, which are the units for calculating the language feature quantities and the dialogue feature quantities, based on the information about utterances described in the text data. The information about utterances is, for example, utterance detection sections and speaker information (step S201).
 In step S201, the utterance section calculation means 101 may classify the utterances based on the calculated utterance sections. Based on the utterance sections or the classification results calculated by the utterance section calculation means 101, the feature quantity extraction means 102 calculates the language feature quantities and the dialogue feature quantities.
 Next, the language feature quantity extraction means 102a calculates, from the text data, language feature quantities such as word appearance frequencies and word confidence scores of the speech recognition result (step S202).
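 A minimal sketch of the word appearance frequency part of step S202 is shown below. It assumes already-tokenized recognized text; real language feature quantities would also include word confidence scores from the recognizer, which are omitted here.

```python
from collections import Counter

def word_frequencies(tokens):
    """Relative appearance frequency of each word in the recognized text."""
    total = len(tokens)
    counts = Counter(tokens)
    return {word: count / total for word, count in counts.items()}

freqs = word_frequencies(["router", "reset", "the", "router"])
print(freqs["router"])  # 0.5
```

Such per-word frequencies form one component of the language feature quantity vector passed on to the knowledge feature quantity estimation.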
 Next, the dialogue feature quantity extraction means 102b calculates dialogue feature quantities such as the speech rate, pause length, utterance length, or number of back-channel responses by using the input speech data, the text data, and the utterance section information calculated in step S201 (step S203). Note that the processing in step S203 may be executed before the processing in step S202, or the two processes may be executed in parallel.
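 The dialogue feature quantities of step S203 can be sketched from the utterance section information alone. In this illustration each section is a `(start_sec, end_sec, word_count)` tuple; that field layout is an assumption made for the example, not the format used by the apparatus.

```python
def dialogue_features(sections):
    """Compute simple dialogue feature quantities from utterance sections."""
    total_time = sum(end - start for start, end, _ in sections)
    total_words = sum(words for _, _, words in sections)
    # Pause length: gap between the end of one section and the start of the next.
    pauses = [b_start - a_end
              for (_, a_end, _), (b_start, _, _) in zip(sections, sections[1:])]
    return {
        "speech_rate": total_words / total_time,              # words per second
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
        "mean_utterance_len": total_time / len(sections),     # seconds
    }

feats = dialogue_features([(0.0, 2.0, 6), (2.5, 4.5, 4)])
print(feats["speech_rate"])  # 2.5
```

Because these quantities depend only on timing, they remain usable even when the recognized text itself is unreliable, which is the property the embodiment relies on.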
 Next, the knowledge feature quantity estimation means 104 estimates the knowledge feature quantities by using the language feature quantities calculated in step S202, the dialogue feature quantities calculated in step S203, and the knowledge feature quantity estimation model stored in the knowledge feature quantity estimation model storage means 103 (step S204).
 Next, the knowledge level estimation means 106 estimates the knowledge level by using the knowledge feature quantity estimation results produced by the knowledge feature quantity estimation means 104 and the knowledge level estimation model stored in the knowledge level estimation model storage means 105 (step S205). After outputting the knowledge level estimation result, the conversation analysis apparatus 100 ends the processing.
 The conversation analysis apparatus in this exemplary embodiment includes feature quantity extraction means that includes: language feature quantity extraction means for extracting, from dialogue speech data and text data of the dialogue speech data as input data, feature quantities related to the words included in the text data; and dialogue feature quantity extraction means for calculating, from the speech data, feature quantities related to the state of the dialogue between the speakers. The conversation analysis apparatus also includes knowledge feature quantity estimation means for estimating knowledge feature quantities calculated from the language feature quantities and the dialogue feature quantities, and knowledge level estimation means for estimating the knowledge level by using the knowledge feature quantity estimation results of the knowledge feature quantity estimation means. The conversation analysis apparatus may further include knowledge feature quantity estimation model storage means for storing a knowledge feature quantity estimation model that holds identification patterns indicating the knowledge feature quantities used for knowledge level estimation.
 The conversation analysis apparatus in this exemplary embodiment can robustly estimate the user's knowledge level even when colloquial, ill-formed sentences different from written language are input. The reason is that, when estimating the knowledge level, the conversation analysis apparatus uses dialogue feature quantities, such as the timing and speed of utterances, that are not contained in the text data and are not easily affected by the accuracy of the speech recognition result or by ill-formed sentences.
 Furthermore, in the conversation analysis apparatus, the knowledge feature quantity estimation means estimates knowledge feature quantities obtained from two different kinds of feature quantities: language feature quantities and dialogue feature quantities. This enables the conversation analysis apparatus to reduce the influence on the knowledge level estimation even when, for example, casually spoken speech data is input or the recognition rate is low. The reason (that is, the reason why the influence on the knowledge level estimation is reduced) is that the estimation results of other knowledge feature quantities, obtained from dialogue feature quantities that are not affected by the language feature quantities, can compensate for erroneous estimation results of knowledge feature quantities based on the language feature quantities.
[Evaluation experiment]
 An example of an evaluation experiment on the conversation analysis apparatus of this exemplary embodiment will be described below with reference to FIG. 5. The evaluation experiment described below estimates the knowledge level of customers in contact center calls. A contact center call is a telephone conversation with a contact center that accepts inquiries and consultations about products or services. Note that the contents shown in FIG. 5 are numerical results based on experiments actually performed.
 In the evaluation experiment, the input data consisted of stereo speech, in which the operator's voice was recorded on one channel and the customer's voice on the other, and the speech recognition results of the stereo speech. Based on the input data, language feature quantities and dialogue feature quantities were extracted by the method described above, and the knowledge level was estimated.
 The knowledge labels, which are the correct-answer data used for the evaluation, were assigned one of two values, "high knowledge level" or "low knowledge level", for each call based on manual subjective evaluation. One hundred files of correct-answer data created in this way were prepared, and the evaluation experiment was performed. Of the 100 files, 46 were "high knowledge level" files and 54 were "low knowledge level" files.
 Ten-fold cross-validation was performed when learning the knowledge feature quantity estimation model, learning the knowledge level estimation model, and evaluating the knowledge level estimation. In the ten-fold cross-validation, the data was divided into ten groups; nine of the groups were used as learning data and the remaining group as evaluation data. The validation was then performed on the ten combinations of learning data and evaluation data created by changing which group was used as the evaluation data.
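 The ten-fold cross-validation procedure described above can be sketched as follows; the integer file IDs stand in for the 100 correct-answer files.

```python
def ten_fold_splits(items, k=10):
    """Yield the k train/evaluation combinations of k-fold cross-validation."""
    folds = [items[i::k] for i in range(k)]   # split the data into k groups
    for i in range(k):
        eval_data = folds[i]                  # one group is evaluation data
        train_data = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train_data, eval_data           # the other k-1 are learning data

files = list(range(100))  # stand-in for the 100 correct-answer files
splits = list(ten_fold_splits(files))
print(len(splits))                            # 10
print(len(splits[0][0]), len(splits[0][1]))   # 90 10
```

Each of the ten combinations uses a different group as evaluation data, so every file is evaluated exactly once.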
 In this evaluation experiment, the users described their reasons for judging the knowledge level, and the knowledge features were defined based on the described reasons. Specifically, by analyzing the described reasons, four judgment factors for the knowledge level (for example, "technical terms" and "conversational fluency") were found, and the found judgment factors were used as the knowledge features.
 The knowledge feature quantity labels serving as teacher data were generated by clustering the data based on the judgment factors. In the generated knowledge feature quantity labels, for example, the knowledge feature quantity of "technical terms", which is one of the knowledge features, indicates whether technical terms are included in the target learning data. That is, for the "technical terms" knowledge feature quantity, whether the speaker uses technical terms is expressed as "0 (no technical terms)" or "1 (technical terms used)".
 Similarly, the knowledge feature quantity of "conversational fluency", which is another knowledge feature, expresses whether the conversation in the target learning data was fluent as "0 (not fluent)" or "1 (fluent)".
 Then, a knowledge feature quantity estimation model is generated from the language feature quantities and dialogue feature quantities extracted from the learning data and the knowledge feature quantity labels serving as teacher data. Next, as with the learning data, language feature quantities and dialogue feature quantities are obtained for the evaluation data, and the four knowledge feature quantities are estimated by using the generated knowledge feature quantity estimation model.
 The generated knowledge features include factors representing linguistic characteristics, such as technical terms, and factors representing interactive characteristics related to the flow of the conversation. Of the four knowledge feature quantities, one is estimated from the language feature quantities only, and another is estimated from the dialogue feature quantities only.
 The remaining two knowledge feature quantities are estimated from combinations of the language feature quantities and the dialogue feature quantities. In the estimation of these two knowledge feature quantities, the weights given to the individual language feature quantities and dialogue feature quantities differ.
 The knowledge level was then estimated by integrating the output results of the four knowledge feature quantities estimated by the knowledge feature quantity estimation model generated as described above.
 As the evaluation index for the experiment, the F-measure was obtained for each of the "high knowledge level" and "low knowledge level" experimental patterns. The F-measure is calculated by the following equation.
 F-measure = (2 × recall × precision) / (recall + precision) ... Equation (1)
 The recall and precision in Equation (1) are calculated for the "high knowledge level" case by the following equations.
 Recall = (number of calls correctly estimated as "high knowledge level") / (number of correct "high knowledge level" calls) ... Equation (2)
 Precision = (number of calls correctly estimated as "high knowledge level") / (number of calls estimated as "high knowledge level") ... Equation (3)
 Similarly, the recall and precision in Equation (1) are calculated for the "low knowledge level" case by the following equations.
 Recall = (number of calls correctly estimated as "low knowledge level") / (number of correct "low knowledge level" calls) ... Equation (4)
 Precision = (number of calls correctly estimated as "low knowledge level") / (number of calls estimated as "low knowledge level") ... Equation (5)
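 Equations (1) through (3) can be worked through numerically as follows. The counts used here are illustrative values chosen for the example, not the experiment's actual results.

```python
def f_measure(correctly_estimated, n_correct, n_estimated):
    """F-measure per Equations (1)-(3) for one class."""
    recall = correctly_estimated / n_correct        # Equation (2)
    precision = correctly_estimated / n_estimated   # Equation (3)
    return 2 * recall * precision / (recall + precision)  # Equation (1)

# e.g. 40 of the 46 "high knowledge level" files found,
# with 50 files estimated as "high knowledge level" in total
print(round(f_measure(40, 46, 50), 3))  # 0.833
```

The same function applies to the "low knowledge level" class with the counts of Equations (4) and (5).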
 In this evaluation experiment, three methods were compared: the method described in Patent Literature 1 above, a method that estimates the knowledge level directly from the language feature quantities and dialogue feature quantities without using knowledge feature quantities, and the method of this exemplary embodiment. In the method that estimates the knowledge level directly without using knowledge feature quantities, the same language feature quantities and dialogue feature quantities as those used in the method of this exemplary embodiment were used.
 The results of the evaluation experiment are shown in FIG. 5. FIG. 5 is an explanatory diagram showing the evaluation results of the evaluation experiment using the conversation analysis apparatus of this exemplary embodiment together with the evaluation results of evaluation experiments using the other methods. FIG. 5(a) compares the evaluation results of the method described in Patent Literature 1 ("related method" in FIG. 5(a)) with those of the method of this exemplary embodiment ("proposed method" in FIG. 5(a)). FIG. 5(b) compares the evaluation results of the method that estimates the knowledge level directly from the language feature quantities and dialogue feature quantities without using knowledge feature quantities ("without knowledge feature quantities" in FIG. 5(b)) with those of the method of this exemplary embodiment ("proposed method" in FIG. 5(b)).
 FIG. 5(a) shows two groups of evaluation results. The left group in FIG. 5(a) shows the results of the evaluation experiment that estimates the knowledge level from the "high knowledge level" data, and the right group shows the results of the evaluation experiment that estimates the knowledge level from the "low knowledge level" data.
 Each group of evaluation results shown in FIG. 5(a) consists of two kinds of data. The left bar graph in FIG. 5(a) shows the evaluation result of the "related method", and the right bar graph shows that of the "proposed method". FIG. 5(b) is organized in the same manner as FIG. 5(a).
 As shown in FIG. 5(a), when the "proposed method" was used, the estimation accuracy of the knowledge level was higher than that of the "related method" in both the "high knowledge level" and "low knowledge level" experimental patterns. As shown in FIG. 5(b), when the "proposed method" was used, the estimation accuracy of the knowledge level was higher than that of "without knowledge feature quantities" in both experimental patterns. From the above, in knowledge level estimation targeting conversation content, the method using the conversation analysis apparatus of this exemplary embodiment is more effective than the other methods.
 Next, an overview of the present invention will be described. FIG. 6 is a block diagram showing an overview of the conversation analysis system in the exemplary embodiment of the present invention. The conversation analysis system 10 in the exemplary embodiment of the present invention includes conversation feature quantity extraction means 11 (for example, the dialogue feature quantity extraction means 102b) for extracting, from speech data and text data of the speech data, conversation feature quantities (for example, dialogue feature quantities), which are feature quantities related to the state of the conversation between speakers. The conversation analysis system 10 also includes language feature quantity extraction means 12 (for example, the language feature quantity extraction means 102a) for extracting language feature quantities, which are feature quantities related to the words included in the text data. The conversation analysis system 10 further includes knowledge feature quantity estimation means 13 (for example, the knowledge feature quantity estimation means 104) for estimating knowledge feature quantities from the extracted conversation feature quantities and language feature quantities and from a knowledge feature quantity estimation model holding identification patterns indicating the knowledge feature quantities. The conversation analysis system 10 further includes knowledge level estimation means 14 (for example, the knowledge level estimation means 106) for estimating the knowledge level of a speaker by integrating the estimated knowledge feature quantities.
 With such a configuration, the conversation analysis system can robustly estimate the speaker's knowledge level even when colloquial, ill-formed sentences different from written language are input.
 The knowledge feature quantity estimation model may hold identification patterns indicating knowledge feature quantities learned from language feature quantities and conversation feature quantities calculated from learning speech data and text data of the speech data, and from knowledge feature quantity labels serving as teacher data (for example, the knowledge feature quantity labels 112).
 With such a configuration, the conversation analysis system can use a knowledge feature quantity estimation model prepared in advance that is suited to the input of the language feature quantities and conversation feature quantities.
 The knowledge level estimation means 14 may estimate the knowledge level by integrating the estimated knowledge feature quantities by using a knowledge level estimation model holding identification patterns indicating knowledge levels.
 With such a configuration, the conversation analysis system can estimate the knowledge level based on the knowledge level estimation model.
 The knowledge level estimation model may hold identification patterns indicating knowledge levels learned from the knowledge feature quantity labels for the learning speech data and text data of the speech data, and from knowledge labels serving as teacher data (for example, the knowledge labels 113).
 With such a configuration, the conversation analysis system can use a knowledge level estimation model prepared in advance that is suited to the input of the knowledge feature quantity labels.
 The conversation analysis system 10 may also include utterance section calculation means (for example, the utterance section calculation means 101) for obtaining, from the speech data and the text data of the speech data, utterance sections in which utterance detection sections by the same speaker are continuous. The language feature quantity extraction means 12 may extract the language feature quantities based on the utterance sections, and the conversation feature quantity extraction means 11 may extract the conversation feature quantities based on the utterance sections.
 With such a configuration, the conversation analysis system can obtain the utterance sections from the input data and can extract the language feature quantities and the conversation feature quantities based on the utterance sections.
 The utterance section calculation means may output classification results obtained by classifying the utterances based on which speaker holds the initiative in the dialogue; the language feature quantity extraction means 12 may extract the language feature quantities based on the classification results, and the conversation feature quantity extraction means 11 may extract the conversation feature quantities based on the classification results.
 With such a configuration, the conversation analysis system can classify the utterances and can extract the language feature quantities and the conversation feature quantities based on the utterance classification results.
 The conversation analysis system 10 may also include knowledge feature quantity estimation model storage means (for example, the knowledge feature quantity estimation model storage means 103) for storing the knowledge feature quantity estimation model, and knowledge level estimation model storage means (for example, the knowledge level estimation model storage means 105) for storing the knowledge level estimation model.
 The knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based on both the language feature quantities and the conversation feature quantities.
 The knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based on the language feature quantities only.
 The knowledge feature quantity estimation means 13 may estimate at least one knowledge feature quantity based on the dialogue feature quantities only.
[Configuration of hardware and software programs (computer programs)]
 A specific configuration (a hardware configuration and a software program configuration) capable of realizing the exemplary embodiment of the present invention described above will be described below.
 The components constituting the conversation analysis apparatus 100 or the conversation analysis system 10 described above can be realized by any realization means capable of implementing the means that provide the functions of those components. For example, in the conversation analysis apparatus 100 illustrated in FIGS. 1 and 3, each component given a reference numeral from 101 to 111 may be realized as a physical or logical part (a constituent part of the conversation analysis apparatus 100), or a combination thereof, in which the means providing the function of the component is implemented. Similarly, in the conversation analysis system 10 illustrated in FIG. 6, each component given a reference numeral from 11 to 14 may be realized as a physical or logical part (a constituent part of the conversation analysis system 10), or a combination thereof, in which the means providing the function of the component is implemented. In this case, a physical part can be realized as, for example, an electronic circuit or a computer device described later. A logical part can be realized as, for example, a software program executed in an electronic circuit or a computer device. In this case, the means providing the function of each component may be understood to be realized as a constituent unit of an apparatus or a system in which the function is implemented.
 In the following description, the conversation analysis apparatus 100 and the conversation analysis system 10 described above are collectively referred to simply as the "conversation analysis system", and each of their components simply as a "component of the conversation analysis system".
 The conversation analysis system described in the above embodiments may be configured by one or more dedicated hardware devices. In that case, some or all of the components shown in the figures referred to above (FIGS. 1, 3, and 6) may be realized using hardware into which they are partially or wholly integrated (such as an integrated circuit implementing the processing logic, or a storage device).
 For example, when the conversation analysis system is realized by dedicated hardware, its components may be implemented using an integrated circuit capable of providing the respective functions (for example, an SoC (System on a Chip)). In this case, the data held by the components of the conversation analysis system may be stored in, for example, a RAM (Random Access Memory) area or a flash memory area integrated into the SoC, or in a storage device (such as a magnetic disk) connected to the SoC. A well-known communication bus may be adopted as the communication line connecting the components of the conversation analysis system; the connection is not limited to a bus, and the components may instead be connected peer-to-peer. When the conversation analysis system is configured from a plurality of hardware devices, those devices may be communicably connected by any communication means (wired, wireless, or a combination thereof).
 The conversation analysis system or its components may also be configured from general-purpose hardware such as that illustrated in FIG. 8 and various software programs (computer programs) executed by that hardware. In this case, the conversation analysis system may be configured from any number of general-purpose hardware devices and software programs: an individual hardware device may be assigned to each component of the conversation analysis system, or a plurality of components may be realized using a single hardware device.
 The arithmetic device 801 in FIG. 8 is an arithmetic processing device such as a general-purpose CPU (Central Processing Unit) or a microprocessor. The arithmetic device 801 may, for example, read various software programs stored in the nonvolatile storage device 803 described later into the storage device 802 and execute processing according to those programs. For example, the components of the conversation analysis system in the above embodiments may be realized as software programs executed by the arithmetic device 801.
 The storage device 802 is a memory device, such as a RAM, that can be referenced from the arithmetic device 801, and stores software programs, various data, and the like. The storage device 802 may be a volatile memory device.
 The nonvolatile storage device 803 is a nonvolatile storage device, such as a magnetic disk drive or a flash-memory-based semiconductor storage device, and can store various software programs, data, and the like.
 The drive device 804 is, for example, a device that processes reading and writing of data with respect to the storage medium 805 described later.
 The storage medium 805 is any storage medium capable of recording data, such as an optical disk, a magneto-optical disk, or a semiconductor flash memory.
 The conversation analysis system (or its components) according to the present invention, described above with the embodiments as examples, may be realized by supplying a software program capable of realizing the functions described in those embodiments to a hardware device such as the one illustrated in FIG. 8. More specifically, the present invention may be realized by the arithmetic device 801 executing a software program supplied to such a hardware device. In this case, an operating system running on the hardware device, or middleware such as database management software, network software, or a virtual environment platform, may execute part of each process.
 In the embodiments described above, each means shown in the figures (or each system unit capable of realizing that means) can be realized as a software module, that is, a functional (processing) unit of a software program executed by the hardware described above. For example, the components of the conversation analysis apparatus 100 (the utterance section calculation means 101, the feature amount extraction means 102, the knowledge feature amount estimation means 104, the knowledge level estimation means 106, the knowledge feature amount estimation model creation means 110, the knowledge level estimation model creation means 111, and so on) may each be realized as a software module in which the corresponding function is implemented. Note, however, that the division into the software modules shown in the figures is a configuration adopted for convenience of explanation, and various configurations can be assumed in an actual implementation.
 For example, when the units shown in FIGS. 1, 3, and 6 are realized as software modules, these software modules may be stored in the nonvolatile storage device 803 and read into the storage device 802 when the arithmetic device 801 executes the respective processes.
 These software modules may also be configured so that they can pass various data to one another by an appropriate method such as shared memory or inter-process communication. With such a configuration, the software modules are communicably connected to one another.
 Furthermore, the software programs may be recorded on the storage medium 805. In this case, the software programs may be stored in the nonvolatile storage device 803 through the drive device 804 as appropriate, for example at the shipping stage or the operation stage of the conversation analysis system.
 In the above case, the various software programs may be supplied to the hardware by installing them in the device using an appropriate jig at the manufacturing stage before shipment or the maintenance stage after shipment. A procedure that is common today, such as downloading the programs from outside via a communication line such as the Internet, may also be adopted.
 In such a case, the present invention can be regarded as being constituted by the code constituting the software program, or by a computer-readable storage medium on which that code is recorded. The storage medium is not limited to a medium independent of the hardware device, and includes a storage medium on which a software program transmitted via the Internet or the like has been downloaded and stored, or temporarily stored.
 The conversation analysis system described above may also be configured from a virtualized environment obtained by virtualizing the hardware device illustrated in FIG. 8 and various software programs (computer programs) executed in that virtualized environment. In this case, the constituent elements of the hardware device illustrated in FIG. 8 are provided as virtual devices in the virtualized environment. The present invention can be realized in this case as well, with the same configuration as when the hardware device illustrated in FIG. 8 is configured as a physical device.
 The present invention is applicable, for example, to a conversation analysis apparatus that analyzes user tendencies based on knowledge level using a voice database recording conversations at various customer contact points, such as contact centers, between customers and business personnel such as store clerks or operators. The present invention is also applicable to uses such as a program that realizes such a conversation analysis apparatus using a computer. Furthermore, the present invention is applicable to a conversation analysis apparatus that extracts linguistic and interactional features not for knowledge level but for a user's interests, preferences, and the like, from the words of a conversation and the exchanges within it.
 The present invention has been described above using the embodiments described above as exemplary examples. However, the present invention is not limited to those embodiments; various modes that those skilled in the art can understand may be applied within the scope of the present invention.
 This application claims priority based on Japanese Patent Application No. 2014-145873 filed on July 16, 2014, the entire disclosure of which is incorporated herein.
DESCRIPTION OF SYMBOLS
1 Utterance sequence extraction unit
2 Utterance intention determination unit
3 Feature amount extraction unit
4 Estimation information generation unit
4a Knowledge amount label
5 Knowledge amount estimation unit
5a Estimation information storage unit
6 Speech recognition result
7 Speech recognition result
10 Conversation analysis system
11 Conversation feature amount extraction means
12, 102a Language feature amount extraction means
13, 104 Knowledge feature amount estimation means
14, 106 Knowledge level estimation means
100 Conversation analysis apparatus
101 Utterance section calculation means
102 Feature amount extraction means
102b Dialogue feature amount extraction means
103 Knowledge feature amount estimation model storage means
105 Knowledge level estimation model storage means
110 Knowledge feature amount estimation model creation means
111 Knowledge level estimation model creation means
112 Knowledge feature amount label
113 Knowledge label
801 Arithmetic device
802 Storage device
803 Nonvolatile storage device
804 Drive device
805 Storage medium

Claims (9)

  1.  A conversation analysis system comprising:
     conversation feature amount extraction means for extracting, from voice data and text data of the voice data, a conversation feature amount that is a feature amount relating to a conversation state between speakers;
     language feature amount extraction means for extracting a language feature amount that is a feature amount relating to words included in the text data;
     knowledge feature amount estimation means for estimating a knowledge feature amount from the extracted conversation feature amount and language feature amount, and from a knowledge feature amount estimation model that holds an identification pattern indicating the knowledge feature amount; and
     knowledge level estimation means for estimating the knowledge level of the speaker by integrating the estimated knowledge feature amounts.
  2.  The conversation analysis system according to claim 1, wherein the knowledge feature amount estimation model holds an identification pattern indicating knowledge feature amounts learned from language feature amounts and conversation feature amounts calculated from training voice data and text data of the voice data, and from knowledge feature amount labels serving as teacher data.
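To make the learning step described in claim 2 concrete, the following is a minimal sketch (not part of the claimed subject matter): one "identification pattern" per knowledge feature amount label is learned from training feature vectors and teacher labels. The nearest-centroid learner, the feature vector layout, and all function names are illustrative assumptions; the claims do not prescribe any particular model family.

```python
# Toy learning sketch for a knowledge feature amount estimation model.
# Each training example pairs a feature vector (language feature amounts
# concatenated with conversation feature amounts) with a teacher label
# (knowledge feature amount label). The "identification pattern" held by
# the model is, in this toy version, one centroid per label; a real
# system could substitute any classifier.

def train_knowledge_feature_model(features, labels):
    """Return {label: centroid} learned from (features, labels)."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def estimate_knowledge_feature(model, vec):
    """Classify vec by the nearest centroid (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(model, key=lambda label: dist(model[label]))
```

Here the learned centroids stand in for the "identification pattern indicating the knowledge feature amount" that the model holds.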
  3.  The conversation analysis system according to claim 1 or 2, wherein the knowledge level estimation means estimates the knowledge level by integrating the knowledge feature amounts using a knowledge level estimation model that holds an identification pattern indicating the knowledge level.
  4.  The conversation analysis system according to claim 3, wherein the knowledge level estimation model holds an identification pattern indicating knowledge levels learned from knowledge feature amount labels for training voice data and text data of the voice data, and from knowledge labels serving as teacher data.
  5.  The conversation analysis system according to any one of claims 1 to 4, further comprising utterance section calculation means for obtaining, from the voice data and the text data of the voice data, utterance sections in which speech detection sections by the same speaker are consecutive, wherein
     the language feature amount extraction means extracts the language feature amount based on the utterance sections, and
     the conversation feature amount extraction means extracts the conversation feature amount based on the utterance sections.
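The utterance section computation of claim 5 (joining consecutive speech detection sections uttered by the same speaker) can be sketched as follows. This is an illustrative sketch only, not the claimed implementation; the `(speaker, start, end)` tuple layout is an assumption made for the example.

```python
def merge_utterance_sections(detections):
    """Merge consecutive speech detection sections by the same speaker
    into single utterance sections.

    detections: list of (speaker, start, end) tuples in time order.
    Returns a list of (speaker, start, end) utterance sections.
    """
    sections = []
    for speaker, start, end in detections:
        if sections and sections[-1][0] == speaker:
            # Same speaker continues: extend the current utterance section.
            prev_speaker, prev_start, _ = sections[-1]
            sections[-1] = (prev_speaker, prev_start, end)
        else:
            # Speaker changed (or first section): open a new section.
            sections.append((speaker, start, end))
    return sections
```

The language feature amounts and conversation feature amounts of claim 5 would then be computed per merged section rather than per raw detection.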
  6.  The conversation analysis system according to claim 5, wherein
     the utterance section calculation means outputs a classification result obtained by classifying utterances based on the initiative of the utterance,
     the language feature amount extraction means extracts the language feature amount based on the classification result, and
     the conversation feature amount extraction means extracts the conversation feature amount based on the classification result.
  7.  The conversation analysis system according to any one of claims 1 to 6, wherein the knowledge feature amount estimation means estimates at least one knowledge feature amount based on the language feature amount and the conversation feature amount.
  8.  A conversation analysis method comprising:
     extracting, from voice data and text data of the voice data, a conversation feature amount that is a feature amount relating to a conversation state between speakers;
     extracting a language feature amount that is a feature amount relating to words included in the text data;
     estimating a knowledge feature amount from the extracted conversation feature amount and language feature amount, and from a knowledge feature amount estimation model that holds an identification pattern indicating the knowledge feature amount; and
     estimating the knowledge level of the speaker by integrating the estimated knowledge feature amounts.
  9.  A storage medium on which is recorded a conversation analysis program for causing a computer to execute:
     a conversation feature amount extraction process of extracting, from voice data and text data of the voice data, a conversation feature amount that is a feature amount relating to a conversation state between speakers;
     a language feature amount extraction process of extracting a language feature amount that is a feature amount relating to words included in the text data;
     a knowledge feature amount estimation process of estimating a knowledge feature amount from the extracted conversation feature amount and language feature amount, and from a knowledge feature amount estimation model that holds an identification pattern indicating the knowledge feature amount; and
     a knowledge level estimation process of estimating the knowledge level of the speaker by integrating the estimated knowledge feature amounts.
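The method of claim 8 can be read as a four-stage pipeline: extract conversation features, extract language features, estimate knowledge features with a model, and integrate them into a knowledge level. The sketch below wires those stages together. Every function body is a placeholder assumption made for illustration; the claims specify only the data flow, not these particular features or the averaging step.

```python
def extract_conversation_features(voice_data, text_data):
    # Placeholder conversation feature amounts: e.g., number of
    # utterances and number of audio segments (assumed features).
    return [float(len(text_data)), float(len(voice_data))]

def extract_language_features(text_data):
    # Placeholder language feature amount: total word count across
    # the transcribed utterances (assumed feature).
    return [float(sum(len(u.split()) for u in text_data))]

def estimate_knowledge_features(conv_feats, lang_feats, model):
    # The model holds the "identification pattern"; here it is
    # simply a callable mapping a feature vector to knowledge
    # feature amounts.
    return model(conv_feats + lang_feats)

def estimate_knowledge_level(knowledge_feats):
    # Integrate the knowledge feature amounts; simple averaging
    # stands in for the claimed integration step.
    return sum(knowledge_feats) / len(knowledge_feats)

def analyze_conversation(voice_data, text_data, model):
    conv = extract_conversation_features(voice_data, text_data)
    lang = extract_language_features(text_data)
    feats = estimate_knowledge_features(conv, lang, model)
    return estimate_knowledge_level(feats)
```

A trained model from the claim 2 learning step would be passed in as `model`; here any callable returning knowledge feature amounts suffices.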
PCT/JP2015/003523 2014-07-16 2015-07-13 Conversation analysis system, conversation analysis method, and storage medium wherein conversation analysis program is recorded WO2016009634A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2016534111A JPWO2016009634A1 (en) 2014-07-16 2015-07-13 Conversation analysis system, conversation analysis method, and storage medium on which conversation analysis program is recorded

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014-145873 2014-07-16
JP2014145873 2014-07-16

Publications (1)

Publication Number Publication Date
WO2016009634A1

Family

ID=55078142

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/003523 WO2016009634A1 (en) 2014-07-16 2015-07-13 Conversation analysis system, conversation analysis method, and storage medium wherein conversation analysis program is recorded

Country Status (2)

Country Link
JP (1) JPWO2016009634A1 (en)
WO (1) WO2016009634A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106710588A (en) * 2016-12-20 2017-05-24 科大讯飞股份有限公司 Voice data sentence type identification method and device and system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013167765A (en) * 2012-02-15 2013-08-29 Nippon Telegr & Teleph Corp <Ntt> Knowledge amount estimation information generating apparatus, and knowledge amount estimating apparatus, method and program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KAZUNORI KOMATANI ET AL.: "User Modeling for Adaptive Guidance Generation in Spoken Dialogue Systems", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS D-2, vol. J87-D-2, no. 10, October 2004 (2004-10-01), pages 1921 - 1928 *

Also Published As

Publication number Publication date
JPWO2016009634A1 (en) 2017-04-27

Similar Documents

Publication Publication Date Title
US10720164B2 (en) System and method of diarization and labeling of audio data
Polzehl et al. Anger recognition in speech using acoustic and linguistic cues
JP6857581B2 (en) Growth interactive device
US9368116B2 (en) Speaker separation in diarization
US20180137109A1 (en) Methodology for automatic multilingual speech recognition
US8494853B1 (en) Methods and systems for providing speech recognition systems based on speech recordings logs
JP6154155B2 (en) Spoken dialogue system using prominence
US20060080098A1 (en) Apparatus and method for speech processing using paralinguistic information in vector form
JP6440967B2 (en) End-of-sentence estimation apparatus, method and program thereof
CN111159364B (en) Dialogue system, dialogue device, dialogue method, and storage medium
US20190244611A1 (en) System and method for automatic filtering of test utterance mismatches in automatic speech recognition systems
Kopparapu Non-linguistic analysis of call center conversations
US11270691B2 (en) Voice interaction system, its processing method, and program therefor
KR20210130024A (en) Dialogue system and method of controlling the same
EP1398758B1 (en) Method and apparatus for generating decision tree questions for speech processing
JP2020064370A (en) Sentence symbol insertion device and method thereof
López-Cózar et al. Enhancement of emotion detection in spoken dialogue systems by combining several information sources
US20110224985A1 (en) Model adaptation device, method thereof, and program thereof
WO2016009634A1 (en) Conversation analysis system, conversation analysis method, and storage medium wherein conversation analysis program is recorded
Casale et al. Analysis of robustness of attributes selection applied to speech emotion recognition
JP6546070B2 (en) Acoustic model learning device, speech recognition device, acoustic model learning method, speech recognition method, and program
JP2015102914A (en) Method for learning incomprehensible sentence determination model, and method, apparatus and program for determining incomprehensible sentence
Ike Inequity in Popular Voice Recognition Systems Regarding African Accents
JP2020064630A (en) Sentence symbol insertion device and method thereof
CN117813599A (en) Method and system for training classifier used in speech recognition auxiliary system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15821378

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2016534111

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15821378

Country of ref document: EP

Kind code of ref document: A1