WO2021002136A1 - Utterance Analysis Device, Utterance Analysis Method, and Program - Google Patents

Info

Publication number
WO2021002136A1
Authority
WO
WIPO (PCT)
Prior art keywords
utterance
data
category
likelihood
period
Prior art date
Legal status
Ceased
Application number
PCT/JP2020/021809
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
夏樹 佐伯
Current Assignee
Panasonic Intellectual Property Management Co Ltd
Original Assignee
Panasonic Intellectual Property Management Co Ltd
Priority date
Filing date
Publication date
Application filed by Panasonic Intellectual Property Management Co Ltd
Priority to CN202080048836.2A (CN114072786A)
Priority to JP2021529929A (JP7531164B2)
Publication of WO2021002136A1
Priority to US17/559,033 (US12300226B2)
Anticipated expiration
Ceased (current legal status)

Classifications

    • G06F16/353 Clustering; Classification into predefined classes
    • G06F16/3329 Natural language query formulation
    • G06F16/3343 Query execution using phonetics
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/35 Clustering; Classification
    • G06F40/295 Named entity recognition
    • G06F40/35 Discourse or dialogue representation
    • G06F40/44 Statistical methods, e.g. probability models
    • G10L15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems

Definitions

  • This disclosure relates to an utterance analysis device, an utterance analysis method, and a program that visualize changes in topics in a speaker's utterance.
  • Patent Document 1 describes a system in which the speech of an operator, such as a call-center operator, who refers to a talk script is converted into text by speech recognition processing, and information on how often the talk script is used is output.
  • The technique of Patent Document 1 addresses the problem that the quality of response records varies with the operator's skill, and can automatically create response records that are uniform and concise.
  • The present disclosure provides an utterance analysis device, an utterance analysis method, and a program capable of visualizing the transition of topics in a speaker's utterances.
  • The utterance analysis device of the present disclosure visualizes changes in a speaker's utterances during a first period. It includes an acquisition unit that acquires the speaker's utterance data in chronological order; a calculation unit that analyzes changes in the utterances acquired by the acquisition unit using a plurality of first likelihoods, each specifying the possibility that the utterance data falls into a given category; and a display processing unit that displays visualization data representing the changes in utterance obtained by the calculation unit.
  • The calculation unit integrates the first likelihoods of a plurality of utterance data within a second period shorter than the first period to obtain a second likelihood for each category, and the visualization data displayed by the display processing unit represents the change in utterance as the change in the second likelihood of each category over a plurality of different second periods.
  • According to the utterance analysis device, the utterance analysis method, and the program of the present disclosure, the transition of topics in a speaker's utterances can be visualized.
  • The utterance analysis device visualizes how the topics in a speaker's utterances change during a certain period. Specifically, it identifies how the topic of the utterances in that period changes and visualizes the result. When a person speaks, the topic generally changes over time; the utterance analysis device of the present disclosure acquires the speaker's utterances, identifies their topics, and visualizes the change in topic.
  • The utterance analysis device 1 has a microphone as an input device. It acquires the utterance data of the speaker 20 speaking to the customer 21, visualizes changes in the topic transition, and shows the result on a display or similar output device. As a result, even a user 22 who was not present when the speaker 20 spoke can evaluate the speaker 20's utterances by looking at the visualized information.
  • In the following example, the speaker 20 is assumed to be an employee of a housing manufacturer that sells custom-built homes (referred to below as "XYZ Home Company" or, where convenient, "XYZ Home").
  • The utterances of the speaker 20 include explanations to the customer 21 about the company's custom-built houses and other explanations needed for buying and selling a custom-built house.
  • The configuration shown in FIG. 1 is only an example. As described later, the utterance analysis device 1 does not have to be installed where the speaker 20 and the customer 21 are having their conversation, and the user 22 may access the utterance analysis device 1 from outside via a network.
  • In this description, an "utterance" means both the act of speaking by the speaker 20 and the voice produced by speaking. "Utterance data" is the voice data produced when the speaker 20 speaks; it may instead be text data obtained by converting that voice data into text by speech recognition, or data containing both the voice data and the text data.
  • A "topic" is the content of an utterance of the speaker 20, and a "topic category" (or simply "category") is a classification that identifies the topic of the speaker 20. A specific example is given later; in short, the utterance analysis device 1 determines which of a plurality of preset topic categories the topic of the speaker 20's utterance belongs to.
  • The "likelihood" is the value of a likelihood function, a number expressing how plausible a hypothesis is. Here it is used as a number indicating how likely it is that the target utterance belongs to each topic category.
  • The person who speaks is referred to as the "speaker 20", and the person who talks with the speaker 20 is referred to as the "customer 21".
  • The person who uses the data in which the utterance analysis device 1 visualizes the topic transition of the speaker 20's utterances is referred to as the "user 22".
  • The user 22 may be the speaker 20 or someone else.
  • For example, the speaker 20 may act as the user 22 to review their own past utterances.
  • The supervisor of the speaker 20 may be the user 22.
  • A colleague or subordinate of the speaker 20 may be the user 22 in order to learn from how the speaker 20 talks.
  • The utterance analysis device 1 is an information processing device that includes, for example, a control unit 11, a storage unit 12, an input unit 13, an output unit 14, and a communication unit 15, connected to one another by a bus 16.
  • The control unit 11 is a controller that controls the entire utterance analysis device 1.
  • The control unit 11 operates as the acquisition unit 111, the calculation unit 112, the generation unit 113, and the display processing unit 114 by reading and executing the program P stored in the storage unit 12.
  • The control unit 11 is not limited to one that realizes predetermined functions through the cooperation of hardware and software; it may be a hardware circuit designed specifically for those functions. That is, the control unit 11 can be implemented by various processors such as a CPU, MPU, GPU, FPGA, DSP, or ASIC.
  • The storage unit 12 is a storage medium that stores various kinds of information.
  • The storage unit 12 is implemented by, for example, a RAM, a ROM, a flash memory, an SSD (solid state drive), a hard disk, another storage device, or an appropriate combination of these.
  • The storage unit 12 stores identification information, various information acquired in order to assign the identification information, and the like.
  • The storage unit 12 stores the utterance data 121, the change data 122, and the program P.
  • The input unit 13 is an input means, such as operation buttons, a keyboard, a mouse, a touch panel, or a microphone, used for operations and data input.
  • The output unit 14 is an output means, such as a display or a loudspeaker, used to output processing results and data.
  • The utterance analysis device 1 acquires utterance data with the microphone serving as the input unit 13, the control unit 11 generates visualization data from the acquired utterance data, and the resulting visualization data is output to the display or other device serving as the output unit 14.
  • The communication unit 15 is an interface circuit (module) that enables data communication with external devices (not shown).
  • The utterance analysis device 1 may be implemented by a single computer or by a combination of computers connected via a network. For example, all or part of the data stored in the storage unit 12 may be kept on an external storage medium connected via the network 40, and the utterance analysis device 1 may use the data stored there. Specifically, the utterance data 121 and the change data 122 may be stored on an external storage medium.
  • The acquisition unit 111 acquires the utterance data of the speaker 20 via the microphone serving as the input unit 13. It numbers the acquired utterance data in the chronological order of acquisition and stores it in the storage unit 12 as the utterance data 121.
  • Because the utterance analysis device 1 visualizes the utterances of the speaker 20, it is sufficient to acquire at least the utterance data of the speaker 20; acquisition and visualization of the utterance data of the customer 21 are not described here.
  • The calculation unit 112 obtains a likelihood, a value specifying the possibility that the topic of each utterance data 121 falls into a given category, and stores the likelihood of each category in association with the utterance data 121. Hereafter, this per-category likelihood is called the "category likelihood" where needed.
  • The utterance data 121 can include, together with or instead of the voice data, text data obtained by converting the voice data of each utterance data 121 into text by speech recognition processing.
  • The speech recognition processing may be executed in the utterance analysis device 1 or in an external device.
  • Each entry of the utterance data 121 associates a "number", the identification information assigned to each utterance data 121 in chronological order; the "text data" generated from the voice data of that period; and the "category likelihood" of each category obtained for the utterance data of that period.
  • The utterance data 121 includes the category likelihoods obtained for the categories "XYZ Home", "Floor plan", "Finance", and "Other".
  • "Floor plan" is the category for topics related to the floor plan of a house.
  • "XYZ Home" is the category for topics related to XYZ Home.
  • "Finance" is the category for topics related to finance.
  • "Other" is the category for topics that fall into none of "XYZ Home", "Floor plan", or "Finance".
  • FIG. 4 is an example graph of the category likelihoods that the calculation unit 112 obtains for the preset topic categories for each of the plurality of utterance data 121.
  • The calculation unit 112 can obtain each likelihood using a pre-trained classification model, in which the classification classes correspond to the topic categories described here.
  • In FIG. 4, the horizontal axis is the "number" assigned to the utterance data 121 and indicates the time series of the utterance data 121.
  • The vertical axis is the "category likelihood" obtained by the calculation unit 112.
  • By identifying the category over a range wider than a single utterance data 121, the calculation unit 112 makes changes in topic easier to grasp.
  • The calculation unit 112 identifies the topic category using the obtained category likelihoods.
  • To specify the topic category at a predetermined time t, the calculation unit 112 uses the likelihoods of the plurality of utterance data 121 in a second period T2, which immediately precedes the time t and is shorter than the first period T1 (the period over which the utterance analysis device 1 visualizes changes in the utterances of the speaker 20).
  • From these likelihoods it obtains a second likelihood specifying the possibility that the utterances of the second period T2 fall into each category, stores it in the storage unit 12 as the change data 122, and specifies the topic category of the second period T2.
  • Where necessary, the "first period" is hereafter called the "utterance period", the "second period" the "time window", and the "second likelihood" the "integrated likelihood".
  • The time window can be set either as a number of utterance data 121 or as an elapsed time.
  • In the example of FIG. 4, the utterance period T1 corresponds to all 277 utterance data 121, and the time window T2 corresponds to 50 utterance data 121.
  • For example, the category likelihoods of the utterance data 121 numbered "40" to "89", which fall within one time window T2, are used.
  • For each category, the calculation unit 112 takes the time window T2 as the target range and integrates the category likelihoods of the utterance data 121 in that range to obtain the integrated likelihood of the category. The calculation unit 112 can then specify the category with the largest integrated likelihood as the topic category at a given utterance number.
  • The calculation unit 112 can obtain the integrated likelihood using a "freshness weight value w1" set according to how recent a topic is.
  • As shown in FIG. 5A, the freshness weight value w1 gives the category likelihood of utterance data 121 that is newer relative to the predetermined time t ("w11" in FIG. 5A) a larger weight than the category likelihood of older utterance data 121 ("w12" in FIG. 5A). The reason is that, within a period, newer utterances are likely to belong to that period's topic category or to reflect a topic in transition, whereas older utterances are less likely to belong to it.
  • Using the freshness weight value w1 when the calculation unit 112 specifies the topic of the target period therefore improves the accuracy of topic identification.
  • For example, in the window of utterance data 121 numbered 40 to 89, the category likelihoods from the 80th to 89th utterance data 121 are weighted more heavily than those from the 40th to 49th utterance data 121.
  • Similarly, when the window is defined by time, the category likelihoods from the utterance data 121 between 1 minute before the predetermined time t and the time t are weighted more heavily than those from the utterance data 121 between 5 minutes and 4 minutes before the time t.
  • The calculation unit 112 can also obtain the integrated likelihood using a "frequency weight value w2" set according to how often a topic appears.
  • As shown in FIG. 5B, the frequency weight value w2 is based on how often each category is the category with the highest category likelihood among the utterance data 121 in the target range: the category likelihood of a frequently appearing category ("w21" in FIG. 5B) is weighted more heavily than that of an infrequently appearing category ("w22" in FIG. 5B).
  • Using the frequency weight value w2 to specify the topic of the target period improves the accuracy of topic identification.
  • For example, when the time window T2 contains 50 utterance data 121, a topic that appears 20 times receives a larger weight than a topic that appears only twice, so its integrated likelihood is higher.
  • The calculation unit 112 calculates the appearance frequency of each category within the time window T2 in this way to obtain weights such as "w21" and "w22" in FIG. 5B.
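  • The calculation unit 112 can obtain the integrated likelihood Lc2 of each category by the following equation (1):
  • $$L_{c2}(i) = \left( \sum_{j=1}^{q} L_c(j) \times w_1 \right) \times w_2 \qquad (1)$$
  • Here c is the selected category, i is the number of the utterance data 121 that identifies the target range, Lc(j) is the category likelihood of the j-th utterance data 121 in the target range, q is the number of utterance data 121 in the target range, w1 is the freshness weight value, and w2 is the frequency weight value.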
  • The calculation unit 112 can normalize the integrated likelihood Lc2 obtained for each category and add the normalized values to the change data 122 in the storage unit 12. The probability that each category is the topic at a given utterance number can then be expressed as shown in FIG.
  • For normalization, the calculation unit 112 can, for example, obtain probabilities with the softmax function. Plotting the resulting probability Pc(i) of each category at each utterance number yields a graph of the topic transition, as shown in FIG. This visualizes the topic transition as a smooth change, similar to how topics shift in an actual conversation.
  • When the period t1 from the start of the utterance period T1 to the predetermined time t is shorter than the time window T2, for example a period corresponding to the first 40 utterance data 121, the calculation unit 112 sets the range containing the utterance data 121 from the start of the utterance period T1 to the time t as the target range, and calculates the integrated likelihood in the same way from the category likelihoods of the utterance data 121 in that range.
  • Because this integrated likelihood is obtained only from the likelihoods between the start of the utterance period T1 and the time t, weighting may be applied so that the integrated value becomes smaller.
  • The generation unit 113 uses the identification results of the calculation unit 112 to generate visualization data that visualizes changes in the topics of the utterance data 121.
  • The generation unit 113 can generate visualization data that visualizes the change in topic across a plurality of time windows T2.
  • The generation unit 113 may generate visualization data including a graph that displays the integrated likelihood of each category in time series.
  • For example, visualization data for displaying the display screen W1 shown in FIG. 8 is generated.
  • The example display screen W1 in FIG. 8 includes a display unit B11 showing a graph of the change in integrated likelihood, and a display unit B12 showing the time-series change obtained from the graph in the display unit B11.
  • From the display screen W1 in FIG. 8, the user 22 can see that the talk of the speaker 20 moved through "Floor plan", "Finance", "Other", "XYZ Home", and "Finance", in that order.
  • The acquisition unit 111 acquires utterance data via the microphone serving as the input unit 13, assigns numbers in chronological order, and stores the data together with the numbers in the storage unit 12 as the utterance data 121 (S1).
  • The calculation unit 112 calculates the category likelihood of each category for each utterance data 121 stored in step S1, and stores it in the storage unit 12 in association with the utterance data 121 (S2).
  • Using the category likelihoods calculated in step S2, the calculation unit 112 executes an analysis process that analyzes the topic categories (S3).
  • In the analysis process, the calculation unit 112 first selects the category to be processed (S11). For example, "XYZ Home", "Floor plan", "Finance", and "Other" are selected in turn, and the subsequent processing is repeated for each category.
  • To set the target range over which the integrated likelihood of the category selected in step S11 is calculated, the calculation unit 112 initializes i to 0 (S12).
  • Here i specifies the number assigned to the utterance data 121; initializing i means the target ranges for the selected category are set in order starting from the 0th utterance data 121. Below, the number of utterance data 121 contained in the target range is denoted q.
  • The calculation unit 112 sets Lc(-1) to 0 (S13).
  • Lc(i) is the likelihood obtained from the i-th utterance data 121 for the category selected in step S11. Because there is no utterance data 121 numbered "-1", Lc(-1) does not exist either; however, since this value may be referenced in step S17, it is set to "0" here.
  • The calculation unit 112 sets the target range for calculating the integrated likelihood according to the value of i (S14). Within the target range, it assigns new numbers j, starting from "1", to the utterance data 121 in order. In the example of FIG. 4, where the time window T2 contains 50 utterance data 121, j takes the values 1 to 50 and the number q of data in the target range is "50".
  • For example, when the integrated likelihood is calculated with the 0th utterance data 121 as the predetermined time t, the target range contains only the 0th utterance data 121: j is "1" for the utterance data 121 with i = "0", and q is "1".
  • When the predetermined time t is the 89th utterance data 121, the calculation unit 112 targets the 40th to 89th utterance data 121 and assigns j so that number "40" becomes j = "1" and number "89" becomes j = "50".
  • When the predetermined time t is the 39th utterance data 121, the calculation unit 112 sets the 0th to 39th utterance data 121 as the target range. Here too, number "0" becomes j = "1", and q is "40".
  • To calculate the integrated likelihood of the target range set in step S14, the calculation unit 112 initializes j to 1 and the temporary integrated likelihood Sc to 0 (S15).
  • j specifies an utterance data 121 within the target range.
  • The integrated likelihood Lc2(i) can be obtained by summing the likelihoods Lc(j) of the utterance data 121 in the target range.
  • The temporary integrated likelihood Sc is an intermediate value used while computing the integrated likelihood Lc2(i) of the target range.
  • From the category likelihoods Lc of the utterance data 121 in the target range set in step S14, the calculation unit 112 determines whether the category chosen by maximum likelihood estimation is "Other" (S16). Specifically, it determines whether the category with the highest category likelihood in the target range is "Other".
  • If it is "Other" (YES in S16), the calculation unit 112 adopts, for the selected category, the integrated likelihood Lc2(i-1) of the target range associated with utterance number "i-1" as the integrated likelihood Lc2(i) of the target range associated with number "i" (S17). If i is "0", the value "0" set for Lc(-1) in step S13 is used.
  • If it is not "Other" (NO in S16), the calculation unit 112 adds to the temporary integrated likelihood Sc the category likelihood Lc(j) of the j-th utterance data 121 weighted by the freshness weight value w1, that is, (Lc(j) × w1), and uses the result as the new Sc (S18).
  • The freshness weight value w1 may be calculated as j/q.
  • The calculation unit 112 increments j (S19) and then determines whether j ≤ q (S20).
  • If j ≤ q (YES in S20), the calculation unit 112 returns to step S18 and repeats steps S18 to S20.
  • Otherwise (NO in S20), the calculation unit 112 obtains the maximum-likelihood topic category frequency Nc of the selected category in the target range (S21).
  • The maximum-likelihood topic category frequency Nc is the number of utterance data 121 in the target range for which the category selected in step S11 has the highest likelihood. For example, when processing "Floor plan", if 20 utterance data 121 in the target range have their highest category likelihood Lc(j) for "Floor plan", Nc is "20".
  • The calculation unit 112 sets the temporary integrated likelihood Sc weighted by the frequency weight value w2, that is, (Sc × w2), as the integrated likelihood Lc2(i) of the target range (S22).
  • The frequency weight value w2 may be calculated as Nc/q.
  • After obtaining the integrated likelihood Lc2(i), the calculation unit 112 obtains the probability Pc(i) of the selected category for the target range by normalization (S23).
  • the calculation unit 112 increments the value of i (S24). As a result, the value of i is set to a value for specifying the next target range.
  • the calculation unit 112 determines whether or not it is the end timing (S25).
  • the end timing is a case where processing is performed for the entire range. For example, in the example of the category likelihood shown in FIG. 4, the utterance of the last number "276" in the time series is made for a series of utterance data 121. This is the case when the processing is completed up to the data 121.
  • the calculation unit 112 When it is not the end timing (NO in S25), since the processing has not been completed for all the utterance data 121 in the utterance period T1, the calculation unit 112 returns to the processing in step S14 and repeats the processing in steps S14 to S25.
  • step S11 If the processing is not completed for all categories (NO in S26), the calculation unit 112 returns to step S11, selects another category, and repeats the processing of steps S11 to S25 until all categories are completed. .. For example, when the category of "XYZ Home” is finished, “Room layout” is selected, then “Finance” is selected, and finally “Other” is selected to repeat the same process.
  • Using the integrated likelihoods Lc2 (i) of all the target ranges set in step S14, the calculation unit 112 specifies the topic category by maximum likelihood estimation (S27). Once the categories have been specified, the integrated likelihood Lc2 (i) and the probability Pc (i) of each category have been obtained, so the calculation unit 112 ends the analysis process (step S3 in FIG. 9).
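The loop of steps S11 to S27 can be sketched end to end as follows. This is a hypothetical Python illustration, not the patented implementation: it assumes each target range i is the window of up to q utterance data ending at utterance i, that Sc is a plain sum of Lc (j) over that window (the document may apply further weighting), and that normalization into Pc (i) is taken over categories once every category has been processed.

```python
from typing import Dict, List, Tuple

Likelihoods = List[Dict[str, float]]  # Lc(j) for each utterance j, in time order


def analyze(likelihoods: Likelihoods, categories: List[str],
            q: int) -> Tuple[List[Dict[str, float]], List[str]]:
    """Return Pc(i) for every target range i and the topic category of each range."""
    n = len(likelihoods)
    lc2: List[Dict[str, float]] = [{} for _ in range(n)]
    for c in categories:                             # S11: select a category
        for i in range(n):                           # S14: range ending at utterance i
            window = likelihoods[max(0, i - q + 1): i + 1]
            sc = sum(lc[c] for lc in window)         # assumed unweighted Sc
            nc = sum(1 for lc in window if max(lc, key=lc.get) == c)
            lc2[i][c] = sc * nc / len(window)        # S22: Sc * w2, w2 = Nc / range size
    probs: List[Dict[str, float]] = []
    topics: List[str] = []
    for scores in lc2:
        total = sum(scores.values())                 # S23: normalize per range
        probs.append({c: (v / total if total else 0.0) for c, v in scores.items()})
        topics.append(max(scores, key=scores.get))   # S27: maximum likelihood
    return probs, topics


data = [
    {"floor plan": 0.8, "finance": 0.2},
    {"floor plan": 0.7, "finance": 0.3},
    {"floor plan": 0.1, "finance": 0.9},
    {"floor plan": 0.2, "finance": 0.8},
]
probs, topics = analyze(data, ["floor plan", "finance"], q=2)
print(topics)  # ['floor plan', 'floor plan', 'finance', 'finance']
```

The list `topics` is what step S4 would turn into visualization data; `probs` holds the per-range category probabilities.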
  • The generation unit 113 generates visualization data for each category from the results of step S3 (S4).
  • The display processing unit 114 outputs the visualization data generated in step S4 to the output unit 14, such as a display (S5).
  • In this way, the utterances of the speaker 20 can be visualized. This makes it easy to evaluate the utterances of the speaker 20, and also makes it easy for other speakers to refer to the utterances of the speaker 20.
  • The example above describes the acquisition unit 111 acquiring utterance data at the moment the speaker 20 speaks, but the present invention is not limited to this.
  • For example, the acquisition unit 111 may acquire, at a later time, utterance data recorded by an external recording device such as an IC recorder while the speaker 20 was speaking, and use that data.
  • The acquisition unit 111 may also acquire and use utterances entered as text, as in a chat.
  • Likewise, the example above describes the calculation unit 112 of the utterance analysis device 1 calculating the "category likelihood", but the present invention is not limited to this. Specifically, the utterance analysis device 1 may acquire and use category likelihoods calculated by an external arithmetic unit.
  • The utterance analysis device 1 may include a reception unit that accepts a period designated by the user 22.
  • In that case, the period can be received via the input unit 13 or the communication unit 15 acting as the reception unit, and the calculation unit 112 can calculate the integrated likelihood using the period designated by the user 22 as the time window T2.
  • This allows the speaker 20, or a user 22 who is a third party analyzing the utterances of the speaker 20, to set the time window T2 freely to suit the utterances being analyzed.
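If the time window T2 is a duration rather than a count of utterances, the utterance data falling inside it have to be looked up by time. The sketch below assumes, hypothetically, that each utterance carries a start time in seconds and that the window is the half-open interval ending at the current time; the helper name and the time unit are illustrative only.

```python
import bisect
from typing import List


def window_indices(timestamps: List[float], t: float, t2: float) -> range:
    """Indices of utterances whose timestamp lies in (t - t2, t].

    `timestamps` must be sorted in ascending order; `t2` stands for the
    user-designated time window T2 (seconds are an assumption).
    """
    lo = bisect.bisect_right(timestamps, t - t2)
    hi = bisect.bisect_right(timestamps, t)
    return range(lo, hi)


starts = [0.0, 5.0, 10.0, 15.0, 20.0]  # hypothetical utterance start times
user_t2 = 10.0                          # value accepted by the reception unit
print(list(window_indices(starts, 15.0, user_t2)))  # [2, 3]
```

Binary search keeps each lookup logarithmic in the number of utterances, which matters when the window slides over a long utterance period T1.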
  • In the example above, a target category is first selected in step S11, target ranges are then set in turn in step S14 for the selected category, and the integrated likelihood of each target range is obtained.
  • However, it suffices that the integrated likelihoods of all categories are ultimately obtained for every range.
  • A method may therefore be used in which a target range is first set in step S14, categories are then selected in turn within this target range in step S111, and the integrated likelihood of each category is obtained.
  • In this variant, the calculation unit 112 determines whether steps S15 to S23 have been completed for all categories (S127).
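Swapping the two loops does not change the result, because each integrated likelihood depends only on its own category and target range. The hypothetical sketch below makes that concrete: `score` is an assumed stand-in for steps S15 to S22 (unweighted Sc times Nc over the range size), and both loop orders fill the same table.

```python
from typing import Dict, List, Tuple

Likelihoods = List[Dict[str, float]]
Table = Dict[Tuple[int, str], float]


def score(likelihoods: Likelihoods, i: int, c: str, q: int) -> float:
    """Integrated likelihood of category c for the range ending at utterance i.

    Hypothetical stand-in for steps S15 to S22: unweighted Sc times w2 = Nc / range size.
    """
    window = likelihoods[max(0, i - q + 1): i + 1]
    sc = sum(lc[c] for lc in window)
    nc = sum(1 for lc in window if max(lc, key=lc.get) == c)
    return sc * nc / len(window)


def category_first(likelihoods: Likelihoods, categories: List[str], q: int) -> Table:
    # Original order: category outer (S11), target range inner (S14).
    return {(i, c): score(likelihoods, i, c, q)
            for c in categories for i in range(len(likelihoods))}


def range_first(likelihoods: Likelihoods, categories: List[str], q: int) -> Table:
    # Variant order: target range outer (S14), category inner (S111).
    return {(i, c): score(likelihoods, i, c, q)
            for i in range(len(likelihoods)) for c in categories}


data = [{"finance": 0.6, "other": 0.4}, {"finance": 0.3, "other": 0.7}]
cats = ["finance", "other"]
print(category_first(data, cats, 2) == range_first(data, cats, 2))  # True
```

The range-first order has the practical advantage that all categories of a range are available together, so normalization and topic selection for that range can happen immediately.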
  • Using the utterance data 121 of the plurality of speakers 20 stored in the storage unit 12, the generation unit 113 may generate comparison data that compares visualization data generated from the utterance data 121 of a first speaker 20 with visualization data generated from the utterance data 121 of a second speaker other than the first speaker 20.
  • FIG. 12 is an example of a display screen W2 that includes a display unit B21 showing the visualization data of the utterance data 121 of the first speaker 20 and a display unit B22 showing the visualization data of the utterance data 121 of the second speaker. As shown in FIG. 12, the display screen W2 presents the visualization data of the two speakers so that they can be compared, so the user 22 need not review two people's lengthy utterance data by listening to the audio or reading a transcript.
  • Instead, the user 22 can compare each speaker's topics at a glance with little effort. For example, by displaying the visualization data of two people side by side, the user 22 can easily compare which explanation method, specifically which flow of conversation, is effective.
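The document leaves the form of the comparison data open. As one hypothetical example, the sketch below summarizes each speaker's per-range topic sequence as the share of ranges spent on each category and pairs the two speakers' shares; this metric is an illustrative assumption, not the method described above.

```python
from collections import Counter
from typing import Dict, List, Tuple


def topic_share(topics: List[str]) -> Dict[str, float]:
    """Fraction of target ranges assigned to each topic category."""
    counts = Counter(topics)
    return {c: k / len(topics) for c, k in counts.items()}


def compare_speakers(first: List[str], second: List[str]) -> Dict[str, Tuple[float, float]]:
    """Pair up both speakers' topic shares, category by category."""
    a, b = topic_share(first), topic_share(second)
    return {c: (a.get(c, 0.0), b.get(c, 0.0)) for c in sorted(set(a) | set(b))}


print(compare_speakers(["floor plan", "floor plan", "finance", "finance"],
                       ["floor plan", "finance", "finance", "finance"]))
# {'finance': (0.5, 0.75), 'floor plan': (0.5, 0.25)}
```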
  • The generation unit 113 may also generate visualization data for a display screen W3 that includes a display unit B23 showing the analysis result.
  • The generation unit 113 may generate visualization data that includes text, such as predetermined phrases, from the utterance data.
  • FIG. 14 shows a display screen W5 that includes a display unit B41 showing a graph of the change in the integrated likelihood obtained from the utterance data 121 of the speaker, a display unit B42 showing phrases extracted from topics in the "floor plan" category, and a display unit B43 showing phrases extracted from topics in the "finance" category.
  • The phrase text shown on the display units B42 and B43 is taken from utterance data, either the voice data or its text transcription, whose probability for the category is higher than that of the other utterance data.
  • By referring to the visualization data of another speaker 20 and using the phrases used by that speaker 20 as a model, the user 22 can consider the wording to use in the future. For example, by imitating the wording of another speaker 20, the user 22 becomes able to explain to the customer 21, in an easy-to-understand manner, matters they previously could not explain well.
  • The generation unit 113 may arrange the text transcriptions of the utterance data 121 in descending order of the likelihood obtained from the utterance data 121 and generate visualization data that includes a predetermined number (for example, 10) of the highest-likelihood entries.
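Selecting the predetermined number of highest-likelihood transcriptions for one category is a plain top-N selection. A minimal sketch, assuming the transcribed texts and their likelihoods for the chosen category are available as two parallel lists (the helper name and sample data are hypothetical):

```python
import heapq
from typing import List, Tuple


def top_utterances(texts: List[str], likelihoods: List[float],
                   n: int = 10) -> List[Tuple[str, float]]:
    """Return up to n (text, likelihood) pairs, highest likelihood first."""
    return heapq.nlargest(n, zip(texts, likelihoods), key=lambda pair: pair[1])


texts = ["We offer a fixed-rate loan.", "The kitchen faces south.", "Monthly payments are low."]
finance = [0.9, 0.1, 0.7]  # hypothetical likelihoods for the "finance" category
print(top_utterances(texts, finance, n=2))
# [('We offer a fixed-rate loan.', 0.9), ('Monthly payments are low.', 0.7)]
```

`heapq.nlargest` avoids sorting the whole list when only the first 10 or so entries are displayed; switching the displayed category just means passing a different likelihood list.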
  • FIG. 15 shows a display unit B51 that displays a graph of the change in the integrated likelihood obtained from the utterance data 121 of the speaker 20, and a display unit B52 that displays the text of the utterance data 121 in descending order of likelihood for a given category.
  • When the user switches the category selected for display, the text shown on the display unit B52 and the likelihoods shown on the display unit B53 change accordingly. The user can therefore check the utterance data 121 with high likelihood for any desired category.
  • The generation unit 113 may generate visualization data in which words registered in advance as keywords are emphasized by displaying them in a font, character size, color, or the like that differs from the other characters. This helps the user 22 imitate the wording of another speaker 20 and explain matters to the customer in an easy-to-understand manner.
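If the visualization data is rendered as HTML (an assumption; the document only says the font, size, or color differs), keyword emphasis can be done by wrapping each registered keyword in a tag. The hypothetical sketch below uses `<mark>` as a stand-in for whatever styling the display applies, and escapes the text first so utterance content cannot inject markup.

```python
import html
import re
from typing import Iterable


def emphasize(text: str, keywords: Iterable[str]) -> str:
    """HTML-escape `text` and wrap each keyword occurrence in <mark> tags.

    Longer keywords are tried first so "floor plan" wins over "floor".
    """
    escaped = html.escape(text)
    words = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not words:
        return escaped
    pattern = re.compile("|".join(re.escape(html.escape(w)) for w in words))
    return pattern.sub(lambda m: f"<mark>{m.group(0)}</mark>", escaped)


print(emphasize("This floor plan suits a 35-year loan.", ["floor plan", "loan", "floor"]))
# This <mark>floor plan</mark> suits a 35-year <mark>loan</mark>.
```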
  • The utterance analysis device of the present disclosure is an utterance analysis device that visualizes changes in a speaker's utterances over a first period. It includes an acquisition unit that acquires the speaker's utterance data in chronological order, a calculation unit that analyzes changes in the utterances, and a display processing unit that displays visualization data visualizing the changes in utterance obtained by the calculation unit. The calculation unit integrates the first likelihoods of a plurality of utterance data in a second period shorter than the first period to obtain a second likelihood for each category, and the visualization data displayed by the display processing unit represents the change in utterance as the change in the second likelihood of each category over a plurality of different second periods.
  • The calculation unit of (1) may specify the category at a predetermined time based on the second likelihood of each category obtained by integrating the first likelihoods of the plurality of utterance data acquired in the second period immediately before that time, and may specify the category at each of a plurality of predetermined times obtained consecutively in the time series. The visualization data displayed by the display processing unit may then represent the change between the categories at those times in the time series as a change in topic.
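Once a category has been specified for each consecutive predetermined time, the topic changes are simply the points where consecutive categories differ. A minimal, hypothetical helper that extracts those transitions from such a sequence:

```python
from typing import List, Tuple


def topic_changes(topics: List[str]) -> List[Tuple[int, str, str]]:
    """List (index, previous topic, new topic) wherever consecutive ranges differ."""
    return [(i, topics[i - 1], topics[i])
            for i in range(1, len(topics)) if topics[i] != topics[i - 1]]


seq = ["XYZ Home", "XYZ Home", "floor plan", "floor plan", "finance"]
print(topic_changes(seq))
# [(2, 'XYZ Home', 'floor plan'), (4, 'floor plan', 'finance')]
```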
  • The calculation unit of (1) or (2) may specify, as the topic category of the second period, the category with the largest second likelihood among the second likelihoods obtained for each category by integrating the first likelihoods obtained from the utterance data included in the second period.
  • The calculation unit of (3) may obtain the second likelihood using a first weight value that is set larger as the frequency of appearance in the second period increases.
  • The calculation unit of (3) or (4) may obtain the second likelihood using a second weight value that is set larger the closer the utterance data is to the predetermined time.
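The shape of the second weight value is not specified. A linear ramp is one simple choice, assumed here: within a window of q utterances, the k-th oldest (counting from 1) is weighted by k / q, so the utterance closest to the predetermined time counts fully and older ones count progressively less. The helper below is a hypothetical sketch for a single category.

```python
from typing import List


def recency_weighted_sum(window: List[float]) -> float:
    """Sum of Lc(j) over a window, weighting the k-th oldest utterance by k / q.

    The oldest utterance gets 1/q and the newest (closest to the
    predetermined time) gets 1; the linear ramp is an assumption.
    """
    q = len(window)
    return sum((k + 1) / q * lc for k, lc in enumerate(window))


print(recency_weighted_sum([4.0, 0.0, 0.0, 0.0]))  # 1.0 (old evidence counts less)
print(recency_weighted_sum([0.0, 0.0, 0.0, 4.0]))  # 4.0 (recent evidence counts fully)
```

An exponential decay would serve the same purpose; the linear form keeps the weights easy to interpret.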
  • The calculation unit may calculate the second likelihood using the utterance data for the period from the start of the first period up to the predetermined time.
  • This allows utterance data covering a relatively long period to be used, so the change can be represented appropriately.
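Using all utterance data from the start of the first period amounts to a running (cumulative) sum of first likelihoods per category. The hypothetical sketch below assumes an unweighted sum and picks the top category at each time; it only illustrates the cumulative-window idea, not the document's exact formula.

```python
from typing import Dict, List


def cumulative_topics(likelihoods: List[Dict[str, float]]) -> List[str]:
    """Topic at each time, using all utterances from the start of the first period."""
    totals: Dict[str, float] = {}
    topics: List[str] = []
    for lc in likelihoods:                      # utterances in time order
        for c, v in lc.items():
            totals[c] = totals.get(c, 0.0) + v  # running sum from the start
        topics.append(max(totals, key=totals.get))
    return topics


print(cumulative_topics([{"XYZ Home": 1.0, "finance": 0.0},
                         {"XYZ Home": 0.0, "finance": 0.5},
                         {"XYZ Home": 0.0, "finance": 0.9}]))
# ['XYZ Home', 'XYZ Home', 'finance']
```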
  • The utterance analysis device of any of (1) to (6) may include a reception unit that accepts a period specified by the user, and the calculation unit may obtain the second likelihood using the period accepted by the reception unit as the second period.
  • This lets the user set the second period, so information best suited to the user can be provided.
  • In any of (1) to (7), the visualization data displayed by the display processing unit may include a graph showing the second likelihood of each category in time series.
  • This displays topic transitions in an easy-to-understand form, so the user can grasp them easily.
  • In any of (1) to (7), the visualization data displayed by the display processing unit may include the text of the utterances contained in the utterance data.
  • In any of (1) to (7), the visualization data displayed by the display processing unit may be comparison data that compares visualization data generated from the utterance data of a first speaker with visualization data generated from the utterance data of a second speaker.
  • In any of (1) to (10), the calculation unit may calculate the first likelihood of each category for each piece of utterance data.
  • Because the first likelihood is then calculated within the utterance analysis device itself, processing can be performed regardless of network load.
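The document does not say how the first likelihood is computed locally. Purely as a stand-in, the toy sketch below scores each category by keyword counts and converts the scores into likelihoods with a softmax; the vocabulary is hypothetical, and a real device would more likely use a trained classifier.

```python
import math
from typing import Dict, List


def first_likelihoods(text: str, keywords: Dict[str, List[str]]) -> Dict[str, float]:
    """Toy per-utterance category likelihoods: softmax over keyword counts."""
    lowered = text.lower()
    scores = {c: sum(lowered.count(k) for k in words) for c, words in keywords.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}  # shift for numerical stability
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


vocab = {"finance": ["loan", "interest"], "floor plan": ["room", "kitchen"]}
lc = first_likelihoods("The loan interest is fixed.", vocab)
print(max(lc, key=lc.get))  # finance
```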
  • The utterance analysis method of the present disclosure is a method for visualizing changes in a speaker's utterances over a first period. It includes a step in which the acquisition unit acquires the speaker's utterance data in chronological order and a step in which the calculation unit integrates the first likelihoods of a plurality of utterance data in a second period shorter than the first period to obtain a second likelihood for each category. The visualization data obtained in this way and displayed by the display processing unit represents the change in utterance as the change in the second likelihood of each category over a plurality of different second periods.
  • The program of the present disclosure causes a computer to execute the method of (12).
  • The utterance analysis device, utterance analysis method, and program described in the claims of the present disclosure are realized through the cooperation of hardware resources, such as a processor and memory, with a program.
  • The utterance analysis device, utterance analysis method, and program of the present disclosure are useful when utterances are made over a certain period, for example by a salesperson who sells through conversation, a lecturer giving a lecture, or a call-center agent answering questions, both for evaluating those utterances and for letting others refer to their topics.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)
PCT/JP2020/021809 2019-07-04 2020-06-02 Utterance analysis device, utterance analysis method, and program Ceased WO2021002136A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202080048836.2A CN114072786A (zh) 2019-07-04 2020-06-02 Utterance analysis device, utterance analysis method, and program
JP2021529929A JP7531164B2 (ja) 2019-07-04 2020-06-02 Utterance analysis device, utterance analysis method, and program
US17/559,033 US12300226B2 (en) 2019-07-04 2021-12-22 Utterance analysis device, utterance analysis method, and computer program

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2019-125454 2019-07-04
JP2019125454 2019-07-04
JP2019-134559 2019-07-22
JP2019134559 2019-07-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/559,033 Continuation US12300226B2 (en) 2019-07-04 2021-12-22 Utterance analysis device, utterance analysis method, and computer program

Publications (1)

Publication Number Publication Date
WO2021002136A1 true WO2021002136A1 (ja) 2021-01-07

Family

ID=74100168

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2020/021809 Ceased WO2021002136A1 (ja) 2019-07-04 2020-06-02 Utterance analysis device, utterance analysis method, and program
PCT/JP2020/021811 Ceased WO2021002137A1 (ja) 2019-07-04 2020-06-02 Utterance analysis device, utterance analysis method, and program

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/021811 Ceased WO2021002137A1 (ja) 2019-07-04 2020-06-02 Utterance analysis device, utterance analysis method, and program

Country Status (4)

Country Link
US (2) US12094464B2
JP (2) JP7407190B2
CN (2) CN114026557A
WO (2) WO2021002136A1

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220100959A1 (en) * 2020-09-30 2022-03-31 Honda Motor Co., Ltd. Conversation support device, conversation support system, conversation support method, and storage medium
WO2022162957A1 (ja) * 2021-02-01 2022-08-04 オムロン株式会社 Information processing device, control system, and report output method

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4027247A4 (en) * 2019-09-02 2023-05-10 Imatrix Holdings Corp. TEXT ANALYSIS SYSTEM AND EVALUATION SYSTEM OF THE CHARACTERISTICS FOR MESSAGE EXCHANGE WITH THIS SYSTEM
US11893990B2 (en) * 2021-09-27 2024-02-06 Sap Se Audio file annotation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
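WO2009084554A1 (ja) * 2007-12-27 2009-07-09 Nec Corporation Text segmentation device, text segmentation method, and program
JP2011123706A (ja) * 2009-12-11 2011-06-23 Advanced Media Inc Sentence classification device and sentence classification method
JP2017016566A (ja) * 2015-07-06 2017-01-19 ソニー株式会社 Information processing device, information processing method, and program
JP2018049478A (ja) * 2016-09-21 2018-03-29 日本電信電話株式会社 Text analysis method, text analysis device, and program
WO2018110029A1 (ja) * 2016-12-13 2018-06-21 株式会社東芝 Information processing device, information processing method, and information processing program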

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS5329610U 1976-08-18 1978-03-14
JPS5468474U 1977-10-24 1979-05-15
US20080300872A1 (en) * 2007-05-31 2008-12-04 Microsoft Corporation Scalable summaries of audio or visual content
JP2011221873A (ja) * 2010-04-12 2011-11-04 Nippon Telegr & Teleph Corp <Ntt> Data classification method, device, and program
JP5468474B2 (ja) 2010-06-21 2014-04-09 株式会社野村総合研究所 Talk script usage status calculation system and talk script usage status calculation program
JP5329610B2 (ja) 2011-07-22 2013-10-30 みずほ情報総研株式会社 Explanation support system, explanation support method, and explanation support program
JP5774459B2 (ja) * 2011-12-08 2015-09-09 株式会社野村総合研究所 Discourse summary template creation system and discourse summary template creation program
US8612211B1 (en) * 2012-09-10 2013-12-17 Google Inc. Speech recognition and summarization
WO2016027364A1 (ja) * 2014-08-22 2016-02-25 株式会社日立製作所 Topic cluster selection device and search method
US10057707B2 (en) * 2015-02-03 2018-08-21 Dolby Laboratories Licensing Corporation Optimized virtual scene layout for spatial meeting playback
JP6664072B2 (ja) * 2015-12-02 2020-03-13 パナソニックIpマネジメント株式会社 Search support method, search support device, and program
EP3809283A1 (en) * 2016-05-13 2021-04-21 Equals 3 LLC Searching structured and unstructured data sets
JP2018194980A (ja) * 2017-05-15 2018-12-06 富士通株式会社 Determination program, determination method, and determination device
JP6614589B2 (ja) 2018-05-09 2019-12-04 株式会社野村総合研究所 Compliance check system and compliance check program

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220100959A1 (en) * 2020-09-30 2022-03-31 Honda Motor Co., Ltd. Conversation support device, conversation support system, conversation support method, and storage medium
WO2022162957A1 (ja) * 2021-02-01 2022-08-04 オムロン株式会社 Information processing device, control system, and report output method
JP7524784B2 (ja) 2021-02-01 2024-07-30 オムロン株式会社 Information processing device, control system, and report output method

Also Published As

Publication number Publication date
JP7407190B2 (ja) 2023-12-28
US20220108697A1 (en) 2022-04-07
JPWO2021002137A1 2021-01-07
JP7531164B2 (ja) 2024-08-09
WO2021002137A1 (ja) 2021-01-07
CN114072786A (zh) 2022-02-18
CN114026557A (zh) 2022-02-08
US12300226B2 (en) 2025-05-13
US20220114348A1 (en) 2022-04-14
US12094464B2 (en) 2024-09-17
JPWO2021002136A1 2021-01-07

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20835247

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021529929

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 20835247

Country of ref document: EP

Kind code of ref document: A1