WO2019142231A1 - Voice analysis device, voice analysis method, voice analysis program, and voice analysis system - Google Patents
- Publication number
- WO2019142231A1 (PCT/JP2018/000942)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- section
- unit
- amount
- participants
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
- G10L21/10—Transforming into visible information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Definitions
- the present invention relates to a voice analysis device for analyzing voice, a voice analysis method, a voice analysis program and a voice analysis system.
- The Harkness method is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1).
- In the Harkness method, the transitions between participants' utterances are recorded as lines. In this way, it is possible to analyze each participant's contribution to the discussion and their relationships with the other participants.
- The Harkness method can also be effectively applied to active learning, in which students take the initiative in their own learning.
- However, since the Harkness method shows only the overall tendency of utterances over the entire period from the start to the end of a discussion, it cannot show how each participant's amount of speech changes along the time series. It is therefore difficult to perform an analysis based on temporal changes in each participant's amount of speech.
- The present invention has been made in view of these points, and aims to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system that can output information for performing an analysis based on temporal changes in the participants' amounts of speech in a discussion.
- A voice analysis device includes: an acquisition unit that acquires voice uttered by a plurality of participants; an analysis unit that identifies the amount of speech of each of the plurality of participants in the voice; a section setting unit that sets sections in the voice based on input from a user; and an output unit that outputs a graph in which temporal changes in the amounts of speech of the plurality of participants are stacked, together with information indicating the sections on the graph.
- The output unit may output, as the information indicating the sections, a position on the graph corresponding to the time at which one section switches to the next.
- The section setting unit may set the sections based on at least one of: an operation on a communication terminal that communicates with the voice analysis device, an operation on a sound collection device that acquires the voice, and a predetermined sound included in the voice.
- The output unit may output the graph with the temporal changes in the amounts of speech stacked in ascending order of the degree of variation in the amount of speech calculated for each of the plurality of participants.
- The output unit may output, for each section, the graph with the temporal changes in the amounts of speech stacked in ascending order of the degree of variation in the amount of speech in that section calculated for each of the plurality of participants.
- The output unit may output a plurality of graphs for the same sections set in a plurality of the voices.
- information indicating an event that has occurred within the time of the voice may be output on the graph.
- The analysis unit may specify, as the amount of speech, the value obtained by dividing the length of time during which a participant speaks within a predetermined time window by the length of the time window.
- In a voice analysis method, a processor executes the steps of: acquiring voice uttered by a plurality of participants; identifying the amount of speech per unit time of each of the plurality of participants in the voice; setting sections in the voice based on input from a user; and outputting a graph in which temporal changes in the amounts of speech of the plurality of participants are stacked, together with information indicating the sections on the graph.
- A voice analysis program causes a computer to execute the steps of: acquiring voice uttered by a plurality of participants; and identifying the amount of speech of each of the plurality of participants in the voice.
- A voice analysis system includes a voice analysis device and a communication terminal capable of communicating with the voice analysis device. The communication terminal has a display unit that displays information. The voice analysis device includes: an acquisition unit that acquires voice uttered by a plurality of participants; an analysis unit that identifies the amount of speech of each of the plurality of participants in the voice; a section setting unit that sets sections in the voice based on input from a user; and an output unit that causes the display unit to display a graph in which temporal changes in the amounts of speech of the plurality of participants are stacked, together with information indicating the sections on the graph.
- FIG. 1 is a schematic view of a speech analysis system S according to the present embodiment.
- the voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20.
- the number of sound collectors 10 and communication terminals 20 included in the speech analysis system S is not limited.
- the voice analysis system S may include devices such as other servers and terminals.
- the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least a part of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without the network N.
- the sound collector 10 includes a microphone array including a plurality of sound collectors (microphones) arranged in different orientations.
- the microphone array includes eight microphones equally spaced on the same circumference in the horizontal plane with respect to the ground.
- the sound collection device 10 transmits the voice acquired using the microphone array to the voice analysis device 100 as data.
- the communication terminal 20 is a communication device capable of performing wired or wireless communication.
- the communication terminal 20 is, for example, a portable terminal such as a smart phone terminal or a computer terminal such as a personal computer.
- the communication terminal 20 receives the setting of analysis conditions from the analyst and displays the analysis result by the voice analysis device 100.
- the voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 by a voice analysis method described later. Further, the voice analysis device 100 transmits the result of the voice analysis to the communication terminal 20.
- FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. Arrows in FIG. 2 indicate the main data flow, and there may be data flows not shown in FIG. In FIG. 2, each block is not a hardware (apparatus) unit configuration but a function unit configuration. As such, the blocks shown in FIG. 2 may be implemented in a single device or may be implemented separately in multiple devices. Transfer of data between the blocks may be performed via any means such as a data bus, a network, a portable storage medium, and the like.
- the communication terminal 20 has a display unit 21 for displaying various information, and an operation unit 22 for receiving an operation by an analyst.
- the display unit 21 includes a display device such as a liquid crystal display or an organic light emitting diode (OLED) display.
- the operation unit 22 includes operation members such as a button, a switch, and a dial.
- the display unit 21 and the operation unit 22 may be integrally configured by using a touch screen capable of detecting the position of contact by the analyst as the display unit 21.
- the voice analysis device 100 includes a control unit 110, a communication unit 120, and a storage unit 130.
- the control unit 110 includes a setting unit 111, a sound acquisition unit 112, a sound source localization unit 113, an analysis unit 114, a section setting unit 115, and an output unit 116.
- the storage unit 130 includes a setting information storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
- the communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N.
- the communication unit 120 includes a processor for performing communication, a connector, an electric circuit, and the like.
- the communication unit 120 performs predetermined processing on a communication signal received from the outside to acquire data, and inputs the acquired data to the control unit 110. Further, the communication unit 120 performs predetermined processing on the data input from the control unit 110 to generate a communication signal, and transmits the generated communication signal to the outside.
- the storage unit 130 is a storage medium including a read only memory (ROM), a random access memory (RAM), a hard disk drive, and the like.
- the storage unit 130 stores in advance a program to be executed by the control unit 110.
- the storage unit 130 may be provided outside the voice analysis device 100, and in this case, data may be exchanged with the control unit 110 via the communication unit 120.
- the setting information storage unit 131 stores setting information indicating analysis conditions set by the analyst in the communication terminal 20.
- the voice storage unit 132 stores the voice acquired by the sound collection device 10.
- the analysis result storage unit 133 stores an analysis result indicating the result of analyzing the voice.
- the setting information storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may be storage areas on the storage unit 130, or a database configured on the storage unit 130.
- The control unit 110 is a processor such as a central processing unit (CPU), and by executing the program stored in the storage unit 130 it functions as the setting unit 111, the sound acquisition unit 112, the sound source localization unit 113, the analysis unit 114, the section setting unit 115, and the output unit 116.
- the functions of the setting unit 111, the sound acquisition unit 112, the sound source localization unit 113, the analysis unit 114, the section setting unit 115, and the output unit 116 will be described later with reference to FIGS. 3 to 9.
- At least a part of the functions of the control unit 110 may be performed by an electrical circuit.
- at least a part of the functions of the control unit 110 may be executed by a program executed via a network.
- the speech analysis system S is not limited to the specific configuration shown in FIG.
- the voice analysis device 100 is not limited to one device, and may be configured by connecting two or more physically separated devices in a wired or wireless manner.
- FIG. 3 is a schematic view of the speech analysis method performed by the speech analysis system S according to the present embodiment.
- the analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20.
- The analysis conditions are information indicating the number of participants in the discussion to be analyzed and the direction in which each participant (that is, each of the plurality of participants) is located with respect to the sound collection device 10.
- the communication terminal 20 receives the setting of analysis conditions from the analyst, and transmits the setting as the setting information to the voice analysis device 100 (a).
- the setting unit 111 of the voice analysis device 100 acquires setting information from the communication terminal 20 and causes the setting information storage unit 131 to store the setting information.
- FIG. 4 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A.
- the communication terminal 20 displays the setting screen A on the display unit 21 and receives the setting of the analysis condition by the analyst.
- the setting screen A includes a position setting area A1, a start button A2, and an end button A3.
- The position setting area A1 is an area for setting the direction in which each participant U is actually located with respect to the sound collection device 10 in the discussion to be analyzed.
- the position setting area A1 represents a circle centered on the position of the sound collector 10 as shown in FIG. 4, and further represents an angle based on the sound collector 10 along the circle.
- the analyst sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20.
- Identification information (here, U1 to U4) for identifying each participant U is allocated and displayed. In the example of FIG. 4, four participants U1 to U4 are set.
- The portion corresponding to each participant U in the position setting area A1 is displayed in a different color for each participant. Thereby, the analyst can easily recognize the direction set for each participant U.
- the start button A2 and the end button A3 are virtual buttons displayed on the display unit 21, respectively.
- the communication terminal 20 transmits a signal of a start instruction to the voice analysis device 100 when the analyst presses the start button A2.
- The communication terminal 20 transmits an end instruction signal to the voice analysis device 100 when the analyst presses the end button A3.
- The period from the analyst's start instruction to the end instruction corresponds to one discussion.
- When the voice acquisition unit 112 of the voice analysis device 100 receives the start instruction signal from the communication terminal 20, it transmits a signal instructing the acquisition of voice to the sound collection device 10 (b). When the sound collection device 10 receives the signal instructing the acquisition of voice from the voice analysis device 100, it starts collecting voice. Similarly, when the voice acquisition unit 112 of the voice analysis device 100 receives the end instruction signal from the communication terminal 20, it transmits a signal instructing the end of voice acquisition to the sound collection device 10. When the sound collection device 10 receives the signal instructing the end of voice acquisition from the voice analysis device 100, it ends the acquisition of voice.
- the sound collection device 10 acquires voices in each of a plurality of sound collection units, and internally records the sound as the sound of each channel corresponding to each sound collection unit. Then, the sound collection device 10 transmits the acquired voices of the plurality of channels to the voice analysis device 100 (c). The sound collector 10 may transmit the acquired voice sequentially or may transmit a predetermined amount or a predetermined time of sound. Further, the sound collection device 10 may collectively transmit the sound from the start to the end of the acquisition.
- the voice acquisition unit 112 of the voice analysis device 100 receives voice from the sound collection device 10 and stores the voice in the voice storage unit 132.
- the voice analysis device 100 analyzes voice at predetermined timing using the voice acquired from the sound collection device 10.
- The voice analysis device 100 may analyze the voice when the analyst gives an analysis instruction on the communication terminal 20 by a predetermined operation. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from the voices stored in the voice storage unit 132.
- Alternatively, the voice analysis device 100 may analyze the voice when the voice acquisition ends. In this case, the voice from the start to the end of the acquisition corresponds to the discussion to be analyzed. The voice analysis device 100 may also analyze voice sequentially (that is, in real time) while it is being acquired. In this case, the voice of a predetermined past period (for example, the most recent 30 seconds) counted back from the current time corresponds to the discussion to be analyzed.
- When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the plurality of channels of voice acquired by the voice acquisition unit 112 (d). Sound source localization is the process of estimating, for each time (for example, every 10 to 100 milliseconds), the direction of the sound source included in the voice acquired by the voice acquisition unit 112. The sound source localization unit 113 associates the direction of the sound source estimated for each time with the direction of a participant indicated by the setting information stored in the setting information storage unit 131.
- As long as the sound source localization unit 113 can identify the direction of a sound source based on the voice acquired from the sound collection device 10, any known sound source localization method, such as the Multiple Signal Classification (MUSIC) method or beamforming, may be used.
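The patent leaves the localization method open. As a rough illustration of the underlying idea only (not the MUSIC method or the patent's own implementation), the following sketch estimates a direction of arrival for a single microphone pair from the time difference of arrival; the function name, parameters, and sign convention are assumptions.

```python
import numpy as np

def estimate_direction(sig_left, sig_right, fs, mic_distance, c=343.0):
    """Estimate a direction of arrival (degrees from broadside) for a
    two-microphone pair via the time difference of arrival (TDOA).

    Illustrative sketch only; a positive angle here simply means the
    left channel lags the right channel.
    """
    # Cross-correlate the two channels to find the lag of maximum similarity.
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = np.argmax(corr) - (len(sig_right) - 1)  # samples; >0: left lags right
    tau = lag / fs                                # delay in seconds
    # Clamp to the physically possible range before taking arcsin.
    sin_theta = np.clip(c * tau / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_theta))
```

A full circular eight-microphone array, as described for the sound collection device 10, would combine several such pairwise estimates or scan candidate directions with a beamformer.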
- the analysis unit 114 analyzes the voice based on the voice acquired by the voice acquisition unit 112 and the direction of the sound source estimated by the sound source localization unit 113 (e).
- the analysis unit 114 may analyze the entire completed discussion as an analysis target, or may analyze a part of the discussion in the case of real-time processing.
- For the discussion to be analyzed, the analysis unit 114 first determines, for each unit time (for example, every 10 to 100 milliseconds), which participant is speaking.
- The analysis unit 114 identifies each continuous period from the start to the end of one participant's speech as an utterance period, and stores it in the analysis result storage unit 133. When a plurality of participants speak at the same time, the analysis unit 114 identifies an utterance period for each participant.
- Next, the analysis unit 114 calculates the amount of speech of each participant at each time and stores it in the analysis result storage unit 133. Specifically, the analysis unit 114 calculates, as the amount of speech per unit time (the activity level), the length of time during which a participant speaks within a certain time window (for example, 5 seconds) divided by the length of the time window. The analysis unit 114 repeats this calculation for each participant while shifting the time window by a predetermined step (for example, one second) from the start time of the discussion to its end time (or to the current time in the case of real-time processing).
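The sliding-window calculation described above can be sketched as follows; the boolean per-frame representation, frame rate, and default window and step lengths (taken from the example values in the text) are illustrative assumptions.

```python
import numpy as np

def speech_activity(speaking, fs, window_s=5.0, step_s=1.0):
    """Sliding-window amount of speech ("activity level") for one participant.

    `speaking` is a boolean array with one entry per audio frame (True while
    the participant is speaking); `fs` is the frame rate. For each window
    position, returns speaking time divided by the window length.
    """
    win = int(window_s * fs)
    step = int(step_s * fs)
    activity = []
    for start in range(0, len(speaking) - win + 1, step):
        frame = speaking[start:start + win]
        activity.append(frame.sum() / win)  # fraction of the window spent speaking
    return np.array(activity)
```

Each participant yields one such activity curve, which is what the graph B1 described later stacks over time.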
- The section setting unit 115 sets one or more sections in the voice corresponding to the discussion to be analyzed, based on input from the user (a participant or the analyst).
- Sections may be set, for example, for each subject of the discussion, such as "Japanese language", "Science", or "Society", or for each stage of the discussion, such as "Discussion", "Idea", or "Summary".
- The section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the voice for which they are set.
- The section information includes the name of the section and its time (that is, the start time and end time of the section within the voice).
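As a minimal sketch, the section information described above might be represented as follows; the class and field names are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass

@dataclass
class Section:
    """One section of a discussion: a name plus start/end times
    (seconds from the start of the voice). Hypothetical container
    mirroring the section information described above."""
    name: str
    start_s: float
    end_s: float

    def duration(self) -> float:
        return self.end_s - self.start_s
```

A list of such records per voice would be what the analysis result storage unit 133 holds as "section information".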
- The section setting unit 115 sets sections based on at least one of (1) an operation on the communication terminal 20, (2) an operation on the sound collection device 10, and (3) a predetermined sound acquired by the sound collection device 10.
- When setting a section based on an operation on the communication terminal 20, the participant or the analyst operates the operation unit 22 (for example, a touch screen, mouse, or keyboard) of the communication terminal 20 to input the section information, including its name (a character string) and time. The participant or the analyst may input the section information after the discussion has ended or while it is still in progress. The section setting unit 115 then receives the section information specified on the communication terminal 20 via the communication unit 120 and stores it in the analysis result storage unit 133.
- When setting a section based on an operation on the sound collection device 10, the participant or the analyst sets the section by operating an operation unit, such as a switch or touch screen, provided on the sound collection device 10 when the section switches.
- Each operation of the operation unit of the sound collection device 10 is associated in advance with a predetermined section switch (for example, switching from a "Discussion" section to an "Idea" section).
- the section setting unit 115 receives information indicating an operation from the operation unit of the sound collection device 10 via the communication unit 120, and specifies switching of a predetermined section at the timing of the operation. Then, the section setting unit 115 stores the specified section information in the analysis result storage unit 133.
- When setting a section based on a predetermined sound, the participant or the analyst uses a device capable of generating sound (for example, a portable terminal or a music playback device) to emit a predetermined switching sound indicating the switch at the moment the section changes.
- The switching sound may be a sound wave audible to humans or an ultrasonic wave inaudible to humans.
- the switching sound indicates the switching of the section by, for example, a predefined frequency or an on / off pattern.
- the switching sound may be emitted only at the switching timing of the section, or may be emitted continuously in the section.
- When the switching sound is emitted continuously in a section, the section setting unit 115 detects the switching sound included in the sound acquired by the sound collection device 10, and, at the timing when the switching sound changes, identifies a switch from the section corresponding to the previous switching sound to the section corresponding to the new switching sound. The section setting unit 115 then stores the identified section information in the analysis result storage unit 133.
- When the switching sound is emitted only at the switching timing, the section setting unit 115 detects the switching sound included in the sound acquired by the sound collection device 10 and identifies a predetermined section switch at the timing when the switching sound is emitted. The section setting unit 115 then stores the identified section information in the analysis result storage unit 133.
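One plausible way to detect such a switching sound, here taken to be a fixed-frequency tone, is to compare the spectral energy near the tone frequency with the total energy of an audio frame. The tone frequency, bandwidth, and threshold below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def detect_switch_tone(frame, fs, tone_hz, threshold=0.5):
    """Return True when a frame of audio is dominated by a predefined
    switching tone at `tone_hz`.

    Minimal sketch: compares windowed spectral magnitude within +/- 50 Hz
    of the tone frequency against the total spectral magnitude.
    """
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    band = np.abs(freqs - tone_hz) < 50.0  # narrow band around the tone
    total = spectrum.sum()
    return total > 0 and spectrum[band].sum() / total > threshold
```

Running this detector frame by frame over the recorded voice would yield the switching timings that the section setting unit 115 turns into section information; an ultrasonic switching sound would work the same way with a higher `tone_hz`.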
- the output unit 116 performs control to display the analysis result by the analysis unit 114 on the display unit 21 by transmitting the display information to the communication terminal 20 (f).
- the output unit 116 is not limited to the display on the display unit 21 and may output the analysis result by other methods such as printing by a printer, data recording to a storage device, and the like. The method of outputting the analysis result by the output unit 116 will be described below with reference to FIGS. 5 to 9.
- the output unit 116 of the voice analysis device 100 reads out, from the analysis result storage unit 133, the analysis result by the analysis unit 114 and the section information by the section setting unit 115 for the display target discussion.
- the output unit 116 may display a discussion immediately after the analysis by the analysis unit 114 is completed, or may display a discussion specified by the analyst.
- FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B.
- The speech amount screen B is a screen that displays information indicating the temporal change in the amount of speech for each section, and includes a graph B1 of the amount of speech, section names B2, and section switching lines B3.
- Based on the analysis result and the section information read from the analysis result storage unit 133, the output unit 116 generates display information for displaying the temporal change in the amount of speech of each participant for each section.
- the graph B1 is a graph showing the time change of the amount of speech of each participant U.
- The output unit 116 plots the amount of speech (activity level) on the vertical axis and time on the horizontal axis, and displays the amount of speech of each participant U at each time indicated by the analysis result on the display unit 21 as a line graph. At this time, the output unit 116 stacks the amounts of speech of the participants U at each point in time, that is, it displays their cumulative sums, in order, on the vertical axis.
- For example, the value displayed for participant U4 is the total of the amounts of speech of participants U3 and U4, the value displayed for participant U2 is the total of the amounts of speech of participants U2, U3, and U4, and the value displayed for participant U1 is the total of the amounts of speech of participants U1, U2, U3, and U4.
- the output unit 116 may randomly determine the order of accumulating (summing) the utterance amounts of the participants U, or may determine the order according to a predetermined rule.
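The stacking described above amounts to drawing cumulative sums in a chosen order, which can be sketched as follows; the function and variable names are illustrative assumptions.

```python
import numpy as np

def stack_speech_amounts(amounts, order):
    """Compute the stacked (cumulative) curves for a graph like B1.

    `amounts` maps participant id -> per-time-step amount of speech;
    `order` lists participants from the bottom of the stack upward.
    Returns the cumulative curve drawn for each participant, so the
    topmost curve equals the total amount of speech of the whole group.
    """
    stacked = {}
    running = np.zeros(len(amounts[order[0]]))
    for pid in order:
        running = running + np.asarray(amounts[pid], dtype=float)
        stacked[pid] = running.copy()
    return stacked
```

Because the top curve is the group total, the same graph shows both each participant's contribution and the overall activity of the group at once.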
- The output unit 116 can display the amount of speech of the group as a whole in addition to the amount of speech of each participant U.
- The analyst can thereby grasp the temporal change in each participant U's contribution and, at the same time, the temporal change in the activity of the participants U as a group.
- The output unit 116 displays the area or line of the graph B1 for each participant U in a display mode, such as a color or pattern, that differs for each participant.
- In FIG. 5, the graph B1 is displayed in a different pattern for each participant U, and a legend associating each participant U with a pattern is displayed near the graph B1. The analyst can thereby easily determine which participant U each part of the graph B1 corresponds to.
- the section name B2 is a character string representing the section name.
- the section switching line B3 is a line indicating the switching timing of the two sections.
- the output unit 116 displays, for each section indicated by the section information, the section name in the vicinity of the graph B1 of the time range corresponding to the section. Further, the output unit 116 specifies the switching timing of the two sections based on the time of the section indicated by the section information. Then, the output unit 116 causes the switching line B3 to be displayed at the time (horizontal axis) position of the graph B1 corresponding to the specified switching timing. Thereby, the output unit 116 can display which section the graph B1 of the amount of speech of each participant U corresponds to each time.
- In this manner, the output unit 116 superimposes the information indicating the sections set in the discussion on the temporal changes in the amounts of speech of the participants U. The analyst can therefore grasp the temporal change in each participant U's amount of speech for each section.
- By determining the order in which the amounts of speech of the participants U are stacked in the graph B1 based on each participant U's amount of speech, the output unit 116 can display the temporal changes in the amounts of speech of the participants U in a legible manner.
- The output unit 116 may switch between the speech amount screen B of FIG. 5 and the speech amount screen B of FIG. 6 according to the analyst's operation, or may display a predetermined one of them.
- the output unit 116 calculates the degree of variation (for example, the variance or standard deviation) of each participant U's amount of speech in each section based on the analysis result and the section information read from the analysis result storage unit 133. Then, the output unit 116 generates the graph B1 by stacking the participants U's amounts of speech in ascending order of the degree of variation in each section. The output unit 116 may instead determine the stacking order from the degree of variation over all sections rather than for each section.
- with this ordering, the change in the amount of speech of a participant U placed lower in the stack has less impact on the apparent amount of speech of the participants U placed above. Further, since the tendency of each participant U's amount of speech differs between sections, changing the stacking order for each section makes the time change of the amount of speech easier to read.
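As an illustration of this ordering rule, the sketch below stacks per-participant speech amount series in ascending order of variance, so that the steadiest speaker sits at the bottom of the graph. This is a minimal sketch, not the patented implementation; the dict-of-lists data layout and the choice of `pvariance` as the variation measure are assumptions.

```python
import statistics

def stacking_order(speech_amounts):
    """Return participant IDs ordered by increasing variance of speech amount.

    speech_amounts: dict mapping a participant ID to a list of per-time
    speech amounts within one section (hypothetical layout).
    """
    return sorted(speech_amounts, key=lambda p: statistics.pvariance(speech_amounts[p]))

def stacked_series(speech_amounts):
    """Cumulative (stacked) series, lowest-variance participant at the bottom."""
    order = stacking_order(speech_amounts)
    running = [0.0] * len(next(iter(speech_amounts.values())))
    stacked = {}
    for p in order:
        # Each participant's curve is drawn on top of the running total.
        running = [r + v for r, v in zip(running, speech_amounts[p])]
        stacked[p] = list(running)
    return order, stacked

amounts = {
    "A": [0.2, 0.8, 0.1, 0.9],  # highly variable speaker, drawn on top
    "B": [0.5, 0.5, 0.4, 0.5],  # steady speaker, drawn at the bottom
}
order, stacked = stacked_series(amounts)
```

Because participant B varies least, B's band is drawn first and A's fluctuations do not disturb it.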
- the output unit 116 may display, on the graph B1, a predetermined event that occurred during the discussion (that is, within the time of the voice acquired by the voice acquisition unit 112). This allows the analyst to analyze the influence of the event's occurrence on each participant U's amount of speech.
- an event is, for example, (1) an approach by a discussion assistant (a teacher, facilitator, etc.) to the group, or (2) a specific remark (word) by the assistant.
- these events are examples, and the output unit 116 may display the occurrence of any other event that the voice analysis device 100 can recognize.
- to detect event (1), the output unit 116 uses a signal transmitted and received between the sound collection device 10 and the assistant.
- the assistant carries a transmitter that emits a predetermined signal, for example by radio waves of a wireless communication standard such as Bluetooth (registered trademark) or by ultrasonic waves, and the sound collection device 10 includes a receiver that receives the signal.
- the output unit 116 determines that the assistant has approached when the receiver of the sound collection device 10 becomes able to receive the signal from the assistant's transmitter, or when the received signal strength becomes equal to or higher than a predetermined threshold. Conversely, the output unit 116 determines that the assistant has left when the receiver of the sound collection device 10 can no longer receive the signal from the assistant's transmitter, or when the received signal strength falls below the threshold.
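The signal-strength logic just described can be sketched as follows. This is a hypothetical illustration: the RSSI threshold value, the sample layout, and the event labels are invented for the example, and `None` stands in for "signal not received".

```python
def proximity_events(rssi_samples, threshold=-60):
    """Derive 'approached'/'left' events from received signal strength.

    rssi_samples: list of (time, rssi_or_None) pairs; None means the
    signal was not received at all. The -60 dBm threshold is illustrative.
    """
    events = []
    near = False
    for t, rssi in rssi_samples:
        received = rssi is not None and rssi >= threshold
        if received and not near:
            events.append((t, "assistant approached"))
            near = True
        elif not received and near:
            events.append((t, "assistant left"))
            near = False
    return events

# Signal appears at t=2 with sufficient strength, then disappears at t=4.
samples = [(0, None), (1, -75), (2, -50), (3, -52), (4, None)]
events = proximity_events(samples)
```

Tracking the `near` state prevents emitting an "approached" event for every strong sample while the assistant is already nearby.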
- the output unit 116 may instead use the assistant's voiceprint (that is, the frequency spectrum of the assistant's voice) to detect the assistant's approach to the group.
- the output unit 116 registers the assistant's voiceprint in advance and detects it in the voice acquired by the sound collection device 10 during the discussion. The output unit 116 then determines that the assistant has approached when the voiceprint is detected, and that the assistant has left when the voiceprint can no longer be detected.
- to detect event (2), the output unit 116 performs speech recognition on the assistant's speech.
- the assistant carries a sound collector (for example, a pin microphone), and the output unit 116 receives the assistant's voice acquired by that sound collector. Because the assistant's sound collector is separate from the sound collection device 10, the participants U's voices and the assistant's voice can be clearly distinguished.
- the output unit 116 converts the voice acquired from the sound collection device held by the assistant into a character string.
- the output unit 116 can use a known speech recognition method to convert the speech into a character string. The output unit 116 then detects specific words in the converted character string (for example, words related to the progress of the discussion, such as "first", "summary", and "last", or evaluative words such as "good" and "bad").
- the words to be detected are set in the voice analysis device 100 in advance. When one of these specific words is detected, the output unit 116 determines that the specific word was uttered.
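The word-detection step might look like the following sketch, which scans the recognized character string for a pre-registered word list. The keyword list mirrors the examples in the text; case-insensitive substring matching is an assumption, since the patent does not specify the matching rule.

```python
# Example words from the text; in the device these would be pre-registered.
KEYWORDS = ("first", "summary", "last", "good", "bad")

def detect_keywords(transcript, keywords=KEYWORDS):
    """Return the pre-registered words found in a recognized character string."""
    text = transcript.lower()
    return [w for w in keywords if w in text]

hits = detect_keywords("OK, let's do a summary of the good points.")
```

Each hit would then be recorded as an event, together with the time at which the recognized utterance occurred.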
- the output unit 116 may perform speech recognition only before and after timings at which the change in each participant U's amount of speech is large. In this case, the output unit 116 calculates the degree of change in the amount of speech over time (for example, the amount or rate of change per unit time) based on the analysis result read from the analysis result storage unit 133. The degree of change may be calculated for each participant U, or as the sum over all participants U.
- the output unit 116 performs speech recognition on the voice acquired by the assistant's sound collector in a predetermined time range (for example, from 5 seconds before to 5 seconds after the timing) that includes each timing at which the degree of change is equal to or higher than a predetermined threshold.
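The degree-of-change screening could be sketched like this, computing the change in total speech amount per unit time and returning the time ranges around timings that meet a threshold. The per-unit-time list layout, the index-based windows, and the margin of 5 units are assumptions made for the example.

```python
def high_change_windows(totals, threshold, margin=5):
    """Find time ranges around large changes in total speech amount.

    totals: total speech amount per unit time (all participants summed).
    Returns (start, end) index ranges covering `margin` units before and
    after each timing whose absolute change per unit time is >= threshold.
    """
    windows = []
    for t in range(1, len(totals)):
        if abs(totals[t] - totals[t - 1]) >= threshold:
            windows.append((max(0, t - margin), min(len(totals) - 1, t + margin)))
    return windows

# A jump in total speech amount occurs at index 3.
totals = [1.0, 1.1, 1.0, 2.5, 2.4, 2.5]
windows = high_change_windows(totals, threshold=1.0, margin=2)
```

Speech recognition would then be run only on the audio falling inside the returned windows, instead of on the whole recording.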
- FIG. 7 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B.
- on the speech amount screen B of FIG. 7, the event information B4 is displayed on the graph B1; otherwise, the screen is the same as the speech amount screen B of FIG. 5.
- the output unit 116 may switch between the speech amount screen B of FIG. 5 and the speech amount screen B of FIG. 7 according to the analyst's operation, or may display a predetermined one of them.
- the event information B4 is information indicating the content and timing of the event.
- the event information B4 indicates the content of an event by, for example, a character string indicating that the assistant approached or left, or a character string indicating the assistant's remark detected by speech recognition. The event information B4 indicates the timing of the event by an arrow pointing to the time at which the event occurred on the graph B1.
- the output unit 116 displays information indicating the content and timing of an event that has occurred in the discussion, superimposed on the time change of the utterance amount of each participant U. Therefore, the analyst can analyze how the event that occurred during the discussion influenced the time change of the volume of each participant U's utterance.
- for example, when the amount of speech increases after the teacher approaches the group, the analyst can evaluate that the teacher stimulated the discussion.
- likewise, when the amount of speech increases after the teacher utters a specific word, the analyst can evaluate that the word is effective for stimulating the discussion.
- the output unit 116 can extract and display a plurality of speech amount graphs for the same section.
- FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying the section extraction screen C.
- the output unit 116 displays the section extraction screen C for the specified section when, for example, the analyst specifies the name B2 of any section on the speech amount screen B in FIGS. 5 to 7.
- the section extraction screen C is a screen for displaying a result of extracting a graph of the amount of speech of the same section, and includes a graph C1 of the amount of speech, a name C2 of the section, and a name C3 of the group.
- when displaying the section extraction screen C, the output unit 116 extracts, from the analysis result storage unit 133, the analysis results and section information of a plurality of groups for the designated section.
- the groups to be displayed may be different groups discussed at the same time, or the same or different groups discussed in the past. Then, the output unit 116 generates display information for displaying the time change of the utterance amount of each participant for a plurality of groups in the designated section based on the extracted analysis result and the section information.
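Extracting the analysis results of several groups for one designated section might be sketched as below. The record layout for the analysis result storage unit 133 is hypothetical; only the filter-by-section-name idea comes from the text.

```python
def extract_section(results, section_name):
    """Collect per-group analysis results for one named section.

    results: list of dicts like {"group": ..., "section": ...,
    "amounts": {participant: [per-time speech amount]}} -- a
    hypothetical layout for the analysis result storage unit 133.
    """
    return {r["group"]: r["amounts"] for r in results if r["section"] == section_name}

store = [
    {"group": "Group 1", "section": "introduction", "amounts": {"A": [0.2, 0.4]}},
    {"group": "Group 2", "section": "introduction", "amounts": {"C": [0.5, 0.1]}},
    {"group": "Group 1", "section": "conclusion", "amounts": {"A": [0.9, 0.3]}},
]
by_group = extract_section(store, "introduction")
```

The returned per-group series would then each be rendered as a graph C1 on the section extraction screen C.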
- the graph C1 of the amount of speech is a graph showing the time change of the amount of speech of each participant U in the designated section for each of two or more groups.
- the display mode of the graph C1 is the same as that of the graph B1.
- the section name C2 is a character string indicating the name of the designated section.
- the group name C3 is a name for identifying a group to be displayed, and may be set by the analyst or may be automatically determined by the voice analysis device 100.
- in FIG. 8, the output unit 116 displays the graphs C1 of two groups, but it may display the graphs C1 of three or more groups. The output unit 116 may also display the names of one or more participants U belonging to each group instead of, or in addition to, the group name C3.
- the output unit 116 displays a plurality of graphs indicating temporal change in the amount of speech of each participant in different groups in the same section.
- This allows the analyst to compare and analyze temporal changes in the volume of speech of different groups for the same section (e.g., the same subject, or the same stage in the discussion). For example, an analyst can grasp the tendency of the volume of utterance for each group by comparing different groups discussed at the same time. Also, for example, the analyst can grasp the change in the tendency of the utterance amount of the same group by comparing a plurality of past discussions of the same section for the same group.
- the output unit 116 is not limited to the stacked graph as illustrated in FIG. 5, and may display a heat map indicating time change of the amount of speech of each participant U.
- FIG. 9 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen D.
- the speech amount screen D includes a heat map D1 of the speech amount, a section name D2, and a section switching line D3.
- the section name D2 and the section switching line D3 are the same as the section name B2 and the section switching line B3 in FIG.
- the speech amount heat map D1 represents the amount of speech over time by color.
- in FIG. 9, color differences are represented by the density of dots: for example, the higher the dot density, the darker the color, and the lower the dot density, the lighter the color.
- the output unit 116 takes time along a predetermined direction (for example, the horizontal direction in FIG. 9) and causes the display unit 21 to display, for each participant U, an area colored according to the amount of speech at each time.
- with the heat map display as well, the analyst can grasp the time change of each participant U's amount of speech for each section.
- the output unit 116 may switch between the graph of FIG. 5 and the heat map of FIG. 9 according to the analyst's operation, or may display a predetermined one of them.
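One row of such a heat map can be derived by normalizing each participant's per-time speech amounts into color intensities, as in this sketch. The normalize-by-global-peak rule is an assumption; the text only says that a larger speech amount maps to a darker color.

```python
def heatmap_rows(speech_amounts):
    """Normalize per-time speech amounts to 0..1 color intensities.

    speech_amounts: dict mapping a participant to a list of per-time
    speech amounts. One output row per participant; a higher intensity
    would be rendered as a darker cell along the horizontal time axis.
    """
    peak = max(v for row in speech_amounts.values() for v in row) or 1.0
    return {p: [v / peak for v in row] for p, row in speech_amounts.items()}

rows = heatmap_rows({"A": [0.0, 2.0, 1.0], "B": [1.0, 1.0, 4.0]})
```

A renderer would map each intensity to a gray level or color ramp, drawing one horizontal band per participant U.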
- FIG. 10 is a sequence diagram of the speech analysis method performed by the speech analysis system S according to the present embodiment.
- the communication terminal 20 receives the setting of analysis conditions from the analyst, and transmits the setting as setting information to the voice analysis device 100 (S11).
- the setting unit 111 of the voice analysis device 100 acquires setting information from the communication terminal 20 and causes the setting information storage unit 131 to store the setting information.
- the voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing voice acquisition to the sound collection device 10 (S12).
- when the sound collection device 10 receives the signal instructing voice acquisition from the voice analysis device 100, it starts recording voice using the plurality of sound collection units and transmits the recorded multi-channel voice to the voice analysis device 100.
- the voice acquisition unit 112 of the voice analysis device 100 receives voice from the sound collection device 10 and stores the voice in the voice storage unit 132.
- the voice analysis device 100 starts voice analysis at one of the following timings: when the analyst gives an instruction, when voice acquisition ends, or during voice acquisition (that is, as real-time processing).
- the sound source localization unit 113 performs sound source localization based on the speech acquired by the speech acquisition unit 112 (S14).
- the analysis unit 114 determines, based on the voice acquired by the voice acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113, which participant spoke at each time, and specifies the speech period and the amount of speech of each participant (S15).
- the analysis unit 114 causes the analysis result storage unit 133 to store the utterance period and the utterance amount for each participant.
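Claim 8 defines the amount of speech as the length of time a participant spoke within a predetermined time window, divided by the window length. A minimal sketch of that computation for one participant's speech periods, with the (start, end) segment layout and 10-second window assumed for the example:

```python
def speech_amount_per_window(segments, total_time, window=10.0):
    """Per-window speech amount for one participant.

    segments: list of (start, end) speaking periods in seconds.
    Each window's amount is the speaking time falling inside the
    window divided by the window length, as in claim 8.
    """
    n = int(total_time // window)
    amounts = [0.0] * n
    for start, end in segments:
        for i in range(n):
            w0, w1 = i * window, (i + 1) * window
            # Overlap of the speaking segment with window [w0, w1).
            overlap = max(0.0, min(end, w1) - max(start, w0))
            amounts[i] += overlap / window
    return amounts

# Spoke for 5 s in the first 10-second window and 2 s in the second.
amounts = speech_amount_per_window([(2.0, 7.0), (12.0, 14.0)], total_time=20.0)
```

These per-window values are what the output unit would stack into the graph B1 or color into the heat map D1.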
- the section setting unit 115 sets one or more sections for the voice corresponding to the argument to be analyzed (S16). At this time, the section setting unit 115 sets a section based on at least one of the operation in the communication terminal 20, the operation in the sound collection device 10, and the predetermined sound acquired by the sound collection device 10.
- the section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the voice for which they were set.
- the output unit 116 performs control to display the analysis result on the display unit 21 of the communication terminal 20 (S17). Specifically, based on the analysis result by the analysis unit 114 and the section information by the section setting unit 115, the output unit 116 generates display information for displaying the above-described speech amount screen B, section extraction screen C, or speech amount screen D, and transmits it to the communication terminal 20.
- the communication terminal 20 causes the display unit 21 to display the analysis result in accordance with the display information received from the voice analysis device 100 (S18).
- the voice analysis device 100 displays the time change of the amount of speech of each participant for each section. Thereby, the analyst can grasp the time change of the amount of speech of each participant for each section.
- the voice analysis device 100 automatically analyzes the discussions of the plurality of participants based on the voice acquired using the sound collection device 10 having the plurality of sound collection units. Therefore, it is not necessary to have the recorder monitor the discussion as in the Harkness method described in Non-Patent Document 1, and it is not necessary to arrange the recorder for each group, so the cost is low.
- the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the agents that execute each step (process) of the speech analysis method shown in FIG. 10. That is, each processor reads a program for executing the speech analysis method shown in FIG. 10 from its storage unit and executes it, thereby controlling the respective parts of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 to carry out the speech analysis method shown in FIG. 10.
- some of the steps included in the speech analysis method shown in FIG. 10 may be omitted, the order of the steps may be changed, and a plurality of steps may be performed in parallel.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Quality & Reliability (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Description
[Overview of speech analysis system S]
FIG. 1 is a schematic view of the speech analysis system S according to the present embodiment. The speech analysis system S includes the voice analysis device 100, the sound collection device 10, and the communication terminal 20. The numbers of sound collection devices 10 and communication terminals 20 included in the speech analysis system S are not limited, and the speech analysis system S may include other devices such as servers and terminals.
[Configuration of speech analysis system S]
FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. The arrows in FIG. 2 indicate the main data flows; there may be data flows not shown in FIG. 2. In FIG. 2, each block shows a functional unit, not a hardware (device) unit. The blocks shown in FIG. 2 may therefore be implemented in a single device or implemented separately in a plurality of devices. Data may be exchanged between the blocks via any means, such as a data bus, a network, or a portable storage medium.
[Description of voice analysis method]
FIG. 3 is a schematic view of the speech analysis method performed by the speech analysis system S according to the present embodiment. First, the analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20. The analysis conditions are, for example, information indicating the number of participants in the discussion to be analyzed and the direction in which each of the participants is located with respect to the sound collection device 10. The communication terminal 20 receives the setting of the analysis conditions from the analyst and transmits it to the voice analysis device 100 as setting information (a). The setting unit 111 of the voice analysis device 100 acquires the setting information from the communication terminal 20 and stores it in the setting information storage unit 131.
[Explanation of the display method of the amount of utterance for each section]
When displaying an analysis result, the output unit 116 of the voice analysis device 100 reads from the analysis result storage unit 133 the analysis result produced by the analysis unit 114 and the section information produced by the section setting unit 115 for the discussion to be displayed. The output unit 116 may take as the display target the discussion whose analysis by the analysis unit 114 has just been completed, or a discussion designated by the analyst.
[Description of event display method]
The output unit 116 may display, on the graph B1, a predetermined event that occurred during the discussion (that is, within the time of the voice acquired by the voice acquisition unit 112). This allows the analyst to analyze the influence of the event's occurrence on each participant U's amount of speech. An event is, for example, (1) an approach by a discussion assistant (a teacher, facilitator, etc.) to the group, or (2) a specific remark (word) by the assistant. These events are examples, and the output unit 116 may display the occurrence of any other event that the voice analysis device 100 can recognize.
[Explanation of the display method of the amount of speech of the same section]
The output unit 116 can extract and display a plurality of speech amount graphs for the same section. FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying the section extraction screen C. For example, when the analyst designates the name B2 of one of the sections on the speech amount screen B of FIGS. 5 to 7, the output unit 116 displays the section extraction screen C for the designated section. The section extraction screen C displays the result of extracting the speech amount graphs of the same section, and includes a speech amount graph C1, a section name C2, and a group name C3.
[Explanation of how to display the heat map of the speech amount]
The output unit 116 is not limited to a stacked graph as in FIG. 5, and may display a heat map indicating the time change of each participant U's amount of speech. FIG. 9 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen D. The speech amount screen D includes a speech amount heat map D1, a section name D2, and a section switching line D3. The section name D2 and the section switching line D3 are the same as the section name B2 and the section switching line B3 in FIG. 5.
[Sequence of voice analysis method]
FIG. 10 is a sequence diagram of the speech analysis method performed by the speech analysis system S according to the present embodiment. First, the communication terminal 20 receives the setting of analysis conditions from the analyst and transmits it to the voice analysis device 100 as setting information (S11). The setting unit 111 of the voice analysis device 100 acquires the setting information from the communication terminal 20 and stores it in the setting information storage unit 131.
[Effect of this embodiment]
Because the Harkness method shows the tendency of speech over the entire period from the start to the end of a discussion, it cannot show the change in each participant's amount of speech along the time series of the discussion. Analysis based on the time change of each participant's amount of speech was therefore difficult. In contrast, the voice analysis device 100 according to the present embodiment displays the time change of each participant's amount of speech for each section, so the analyst can grasp the time change of each participant's amount of speech for each section.
100 voice analysis device
110 control unit
112 voice acquisition unit
114 analysis unit
115 section setting unit
116 output unit
10 sound collection device
20 communication terminal
21 display unit
S speech analysis system
Claims (11)
- A voice analysis device comprising: an acquisition unit that acquires voices uttered by a plurality of participants; an analysis unit that specifies, in the voices, the amount of speech of each of the plurality of participants at each time; a section setting unit that sets sections in the voices based on input from a user; and an output unit that outputs a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, and information indicating the sections in the graph.
- The voice analysis device according to claim 1, wherein the output unit outputs, as the information indicating the sections, the position on the graph corresponding to the time at which one of two sections switches to the other.
- The voice analysis device according to claim 1 or 2, wherein the section setting unit sets the sections based on at least one of an operation on a communication terminal that communicates with the voice analysis device, an operation on a sound collection device that acquires the voices, and a predetermined sound contained in the voices.
- The voice analysis device according to any one of claims 1 to 3, wherein the output unit outputs the graph with the temporal changes in the amounts of speech stacked on one another in ascending order of the degree of variation of the amount of speech calculated for each of the plurality of participants.
- The voice analysis device according to claim 4, wherein the output unit outputs the graph with the temporal changes in the amounts of speech stacked on one another for each section, in ascending order of the degree of variation of the amount of speech in that section calculated for each of the plurality of participants.
- The voice analysis device according to any one of claims 1 to 5, wherein the output unit outputs a plurality of the graphs for the same section set in a plurality of the voices.
- The voice analysis device according to any one of claims 1 to 6, wherein the output unit outputs, on the graph, information indicating an event that occurred within the time of the voices, in addition to the graph and the information indicating the sections.
- The voice analysis device according to any one of claims 1 to 7, wherein the analysis unit specifies, as the amount of speech, a value obtained by dividing the length of time a participant spoke within a predetermined time window by the length of the time window.
- A voice analysis method in which a processor executes the steps of: acquiring voices uttered by a plurality of participants; specifying, in the voices, the amount of speech of each of the plurality of participants at each time; setting sections in the voices based on input from a user; and outputting a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, and information indicating the sections in the graph.
- A voice analysis program causing a computer to execute the steps of: acquiring voices uttered by a plurality of participants; specifying, in the voices, the amount of speech of each of the plurality of participants at each time; setting sections in the voices based on input from a user; and outputting a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, and information indicating the sections in the graph.
- A voice analysis system comprising a voice analysis device and a communication terminal capable of communicating with the voice analysis device, wherein the communication terminal has a display unit that displays information, and the voice analysis device comprises: an acquisition unit that acquires voices uttered by a plurality of participants; an analysis unit that specifies, in the voices, the amount of speech of each of the plurality of participants at each time; a section setting unit that sets sections in the voices based on input from a user; and an output unit that causes the display unit to display a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, and information indicating the sections in the graph.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2018/000942 WO2019142231A1 (en) | 2018-01-16 | 2018-01-16 | Voice analysis device, voice analysis method, voice analysis program, and voice analysis system |
JP2018502279A JP6589040B1 (en) | 2018-01-16 | 2018-01-16 | Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2018/000942 WO2019142231A1 (en) | 2018-01-16 | 2018-01-16 | Voice analysis device, voice analysis method, voice analysis program, and voice analysis system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019142231A1 true WO2019142231A1 (en) | 2019-07-25 |
Family
ID=67300990
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/000942 WO2019142231A1 (en) | 2018-01-16 | 2018-01-16 | Voice analysis device, voice analysis method, voice analysis program, and voice analysis system |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6589040B1 (en) |
WO (1) | WO2019142231A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021245759A1 (en) * | 2020-06-01 | 2021-12-09 | ハイラブル株式会社 | Voice conference device, voice conference system, and voice conference method |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008139654A (en) * | 2006-12-04 | 2008-06-19 | Nec Corp | Method of estimating interaction, separation, and method, system and program for estimating interaction |
JP2015028625A (en) * | 2013-06-28 | 2015-02-12 | キヤノンマーケティングジャパン株式会社 | Information processing apparatus, control method of information processing apparatus, and program |
JP2016206355A (en) * | 2015-04-20 | 2016-12-08 | 本田技研工業株式会社 | Conversation analysis device, conversation analysis method, and program |
JP2017033443A (en) * | 2015-08-05 | 2017-02-09 | 日本電気株式会社 | Data processing device, data processing method, and program |
JP2017161731A (en) * | 2016-03-09 | 2017-09-14 | 本田技研工業株式会社 | Conversation analyzer, conversation analysis method and program |
-
2018
- 2018-01-16 JP JP2018502279A patent/JP6589040B1/en active Active
- 2018-01-16 WO PCT/JP2018/000942 patent/WO2019142231A1/en active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008139654A (en) * | 2006-12-04 | 2008-06-19 | Nec Corp | Method of estimating interaction, separation, and method, system and program for estimating interaction |
JP2015028625A (en) * | 2013-06-28 | 2015-02-12 | キヤノンマーケティングジャパン株式会社 | Information processing apparatus, control method of information processing apparatus, and program |
JP2016206355A (en) * | 2015-04-20 | 2016-12-08 | 本田技研工業株式会社 | Conversation analysis device, conversation analysis method, and program |
JP2017033443A (en) * | 2015-08-05 | 2017-02-09 | 日本電気株式会社 | Data processing device, data processing method, and program |
JP2017161731A (en) * | 2016-03-09 | 2017-09-14 | 本田技研工業株式会社 | Conversation analyzer, conversation analysis method and program |
Non-Patent Citations (1)
Title |
---|
HUMAN INTERFACE 2015, 1 September 2015 (2015-09-01), pages 939 - 943 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021245759A1 (en) * | 2020-06-01 | 2021-12-09 | ハイラブル株式会社 | Voice conference device, voice conference system, and voice conference method |
Also Published As
Publication number | Publication date |
---|---|
JP6589040B1 (en) | 2019-10-09 |
JPWO2019142231A1 (en) | 2020-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101213589B (en) | Object sound analysis device, object sound analysis method | |
WO2007139040A1 (en) | Speech situation data creating device, speech situation visualizing device, speech situation data editing device, speech data reproducing device, and speech communication system | |
CN110782962A (en) | Hearing language rehabilitation device, method, electronic equipment and storage medium | |
US20240153483A1 (en) | Systems and methods for generating synthesized speech responses to voice inputs | |
US20230317095A1 (en) | Systems and methods for pre-filtering audio content based on prominence of frequency content | |
Ramsay et al. | The intrinsic memorability of everyday sounds | |
WO2019142231A1 (en) | Voice analysis device, voice analysis method, voice analysis program, and voice analysis system | |
JP6589042B1 (en) | Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system | |
JP6646134B2 (en) | Voice analysis device, voice analysis method, voice analysis program, and voice analysis system | |
CN109377806B (en) | Test question distribution method based on learning level and learning client | |
KR102077642B1 (en) | Sight-singing evaluation system and Sight-singing evaluation method using the same | |
KR101243766B1 (en) | System and method for deciding user’s personality using voice signal | |
JP2020173415A (en) | Teaching material presentation system and teaching material presentation method | |
JP7427274B2 (en) | Speech analysis device, speech analysis method, speech analysis program and speech analysis system | |
JP6975755B2 (en) | Voice analyzer, voice analysis method, voice analysis program and voice analysis system | |
JP7414319B2 (en) | Speech analysis device, speech analysis method, speech analysis program and speech analysis system | |
JP6589041B1 (en) | Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system | |
CN110727883A (en) | Method and system for analyzing personalized growth map of child | |
JP6975756B2 (en) | Voice analyzer, voice analysis method, voice analysis program and voice analysis system | |
Altaf et al. | Perceptually motivated temporal modeling of footsteps in a cross-environmental detection task | |
KR20230064870A (en) | Psychoanalysis server for people with low vision through online music activity and psychological analysis method using the same | |
KR20200018859A (en) | Web service system for speech feedback | |
Becker et al. | Comparing automatic forensic voice comparison systems under forensic conditions | |
JP2022017527A (en) | Speech analysis device, speech analysis method, voice analysis program, and speech analysis system | |
CN112887490A (en) | Telephone robot pressure test system based on collection scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
ENP | Entry into the national phase |
Ref document number: 2018502279 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18900614 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.10.2020) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18900614 Country of ref document: EP Kind code of ref document: A1 |