WO2019142231A1 - Voice analysis device, voice analysis method, voice analysis program, and voice analysis system - Google Patents

Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Info

Publication number
WO2019142231A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
section
unit
amount
participants
Prior art date
Application number
PCT/JP2018/000942
Other languages
French (fr)
Japanese (ja)
Inventor
武志 水本
哲也 菅原
Original Assignee
ハイラブル株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ハイラブル株式会社
Priority to PCT/JP2018/000942
Priority to JP2018502279A (JP6589040B1)
Publication of WO2019142231A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10: Transforming into visible information
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The present invention relates to a voice analysis device for analyzing voice, a voice analysis method, a voice analysis program, and a voice analysis system.
  • The Harkness method is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1).
  • In the Harkness method, the transitions of each participant's utterances are recorded as lines. This makes it possible to analyze each participant's contribution to the discussion and their relationships with others.
  • The Harkness method can also be applied effectively to active learning, in which students take the initiative in their own learning.
  • However, because the Harkness method shows the overall tendency of utterances across the whole period from the start to the end of a discussion, it cannot show how each participant's amount of speech changes over time. There is therefore the problem that analyses based on temporal changes in each participant's amount of speech are difficult.
  • The present invention has been made in view of these points, and its object is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system capable of outputting information for performing analyses based on temporal changes in participants' amounts of speech in a discussion.
  • A voice analysis device according to a first aspect of the present invention includes: an acquisition unit that acquires voices uttered by a plurality of participants; an analysis unit that identifies the amount of speech per unit time of each of the plurality of participants in the voices; a section setting unit that sets sections in the voices based on input from a user; and an output unit that outputs a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, together with information indicating the sections in the graph.
  • The output unit may output, as the information indicating the sections, the position on the graph corresponding to the time at which one of two sections switches to the other.
  • The section setting unit may set a section based on at least one of an operation on a communication terminal that communicates with the voice analysis device, an operation on a sound collection device that acquires the voices, and a predetermined sound included in the voices.
  • The output unit may output the graph with the temporal changes in the amounts of speech stacked in ascending order of the degree of variation in the amount of speech calculated for each of the plurality of participants.
  • The output unit may output the graph with the temporal changes in the amounts of speech stacked, for each section, in ascending order of the degree of variation in the amount of speech in that section calculated for each of the plurality of participants.
  • The output unit may output a plurality of graphs for the same section set in a plurality of the voices.
  • In addition to the graph and the information indicating the sections, information indicating an event that occurred within the time span of the voices may be output on the graph.
  • The analysis unit may identify, as the amount of speech, the value obtained by dividing the length of time during which a participant spoke within a predetermined time window by the length of the time window.
  • In a voice analysis method according to a second aspect of the present invention, a processor executes the steps of: acquiring voices uttered by a plurality of participants; identifying the amount of speech per unit time of each of the plurality of participants in the voices; setting sections in the voices based on input from a user; and outputting a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, together with information indicating the sections in the graph.
  • A voice analysis program according to a third aspect of the present invention causes a computer to execute the steps of: acquiring voices uttered by a plurality of participants; identifying the amount of speech per unit time of each of the plurality of participants in the voices; setting sections in the voices based on input from a user; and outputting a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, together with information indicating the sections in the graph.
  • A voice analysis system according to a fourth aspect of the present invention includes a voice analysis device and a communication terminal capable of communicating with the voice analysis device. The communication terminal has a display unit that displays information. The voice analysis device includes: an acquisition unit that acquires voices uttered by a plurality of participants; an analysis unit that identifies the amount of speech per unit time of each of the plurality of participants in the voices; a section setting unit that sets sections in the voices based on input from a user; and an output unit that causes the display unit to display a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, together with information indicating the sections in the graph.
  • FIG. 1 is a schematic view of the voice analysis system S according to the present embodiment.
  • The voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20.
  • The numbers of sound collection devices 10 and communication terminals 20 included in the voice analysis system S are not limited.
  • The voice analysis system S may also include other devices such as servers and terminals.
  • The voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least some of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without the network N.
  • The sound collection device 10 includes a microphone array containing a plurality of sound collection units (microphones) arranged to face in different directions.
  • For example, the microphone array includes eight microphones arranged at equal intervals on the same circumference in a plane horizontal to the ground.
  • The sound collection device 10 transmits the voice acquired using the microphone array to the voice analysis device 100 as data.
  • The communication terminal 20 is a communication device capable of wired or wireless communication.
  • The communication terminal 20 is, for example, a portable terminal such as a smartphone, or a computer terminal such as a personal computer.
  • The communication terminal 20 receives the settings of analysis conditions from the analyst and displays the analysis results produced by the voice analysis device 100.
  • The voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 using the voice analysis method described later. The voice analysis device 100 also transmits the results of the voice analysis to the communication terminal 20.
  • FIG. 2 is a block diagram of the voice analysis system S according to the present embodiment. The arrows in FIG. 2 indicate the main data flows, and there may be data flows not shown in FIG. 2. Each block in FIG. 2 shows a configuration in units of functions, not in units of hardware (devices). As such, the blocks shown in FIG. 2 may be implemented in a single device or may be implemented separately in a plurality of devices. Data may be exchanged between the blocks via any means, such as a data bus, a network, or a portable storage medium.
  • The communication terminal 20 has a display unit 21 for displaying various types of information, and an operation unit 22 for receiving operations by the analyst.
  • The display unit 21 includes a display device such as a liquid crystal display or an organic light emitting diode (OLED) display.
  • The operation unit 22 includes operation members such as buttons, switches, and dials.
  • The display unit 21 and the operation unit 22 may be configured integrally by using, as the display unit 21, a touch screen capable of detecting the position touched by the analyst.
  • The voice analysis device 100 includes a control unit 110, a communication unit 120, and a storage unit 130.
  • The control unit 110 includes a setting unit 111, a voice acquisition unit 112, a sound source localization unit 113, an analysis unit 114, a section setting unit 115, and an output unit 116.
  • The storage unit 130 includes a setting information storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
  • The communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N.
  • The communication unit 120 includes a processor, connectors, electric circuits, and the like for performing communication.
  • The communication unit 120 performs predetermined processing on communication signals received from the outside to acquire data, and inputs the acquired data to the control unit 110. The communication unit 120 also performs predetermined processing on data input from the control unit 110 to generate communication signals, and transmits the generated signals to the outside.
  • The storage unit 130 is a storage medium including a read only memory (ROM), a random access memory (RAM), a hard disk drive, and the like.
  • The storage unit 130 stores in advance the program to be executed by the control unit 110.
  • The storage unit 130 may be provided outside the voice analysis device 100; in that case, it exchanges data with the control unit 110 via the communication unit 120.
  • The setting information storage unit 131 stores setting information indicating the analysis conditions set by the analyst on the communication terminal 20.
  • The voice storage unit 132 stores the voice acquired by the sound collection device 10.
  • The analysis result storage unit 133 stores analysis results indicating the results of analyzing the voice.
  • The setting information storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may be storage areas on the storage unit 130, or databases configured on the storage unit 130.
  • The control unit 110 is a processor such as a central processing unit (CPU), and functions as the setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the analysis unit 114, the section setting unit 115, and the output unit 116 by executing the program stored in the storage unit 130.
  • The functions of the setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the analysis unit 114, the section setting unit 115, and the output unit 116 will be described later with reference to FIGS. 3 to 9.
  • At least some of the functions of the control unit 110 may be implemented by an electric circuit.
  • At least some of the functions of the control unit 110 may be implemented by a program executed via a network.
  • The voice analysis system S is not limited to the specific configuration shown in FIG. 2.
  • The voice analysis device 100 is not limited to a single device, and may be configured by connecting two or more physically separated devices by wire or wirelessly.
  • FIG. 3 is a schematic view of the voice analysis method performed by the voice analysis system S according to the present embodiment.
  • The analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20.
  • The analysis conditions are information indicating the number of participants in the discussion to be analyzed and the direction in which each participant (that is, each of the plurality of participants) is located with respect to the sound collection device 10.
  • The communication terminal 20 receives the settings of the analysis conditions from the analyst and transmits them as setting information to the voice analysis device 100 (a).
  • The setting unit 111 of the voice analysis device 100 acquires the setting information from the communication terminal 20 and causes the setting information storage unit 131 to store it.
  • FIG. 4 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A.
  • The communication terminal 20 displays the setting screen A on the display unit 21 and receives the settings of the analysis conditions from the analyst.
  • The setting screen A includes a position setting area A1, a start button A2, and an end button A3.
  • The position setting area A1 is an area for setting the direction in which each participant U is actually located with respect to the sound collection device 10 in the discussion to be analyzed.
  • The position setting area A1 represents a circle centered on the position of the sound collection device 10, as shown in FIG. 4, and further represents angles relative to the sound collection device 10 along the circle.
  • The analyst sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20.
  • Identification information (here, U1 to U4) for identifying each participant U is assigned to and displayed at each set position. In the example of FIG. 4, four participants U1 to U4 are set.
  • The portion corresponding to each participant U in the position setting area A1 is displayed in a different color for each participant. This allows the analyst to easily recognize the direction set for each participant U.
  • The start button A2 and the end button A3 are virtual buttons displayed on the display unit 21.
  • The communication terminal 20 transmits a start instruction signal to the voice analysis device 100 when the analyst presses the start button A2.
  • The communication terminal 20 transmits an end instruction signal to the voice analysis device 100 when the analyst presses the end button A3.
  • The period from the analyst's start instruction to the end instruction corresponds to one discussion.
  • When the voice acquisition unit 112 of the voice analysis device 100 receives the start instruction signal from the communication terminal 20, it transmits a signal instructing the acquisition of voice to the sound collection device 10 (b). When the sound collection device 10 receives the signal instructing the acquisition of voice from the voice analysis device 100, it starts collecting voice. When the voice acquisition unit 112 of the voice analysis device 100 receives the end instruction signal from the communication terminal 20, it transmits a signal instructing the end of voice acquisition to the sound collection device 10. When the sound collection device 10 receives the signal instructing the end of voice acquisition from the voice analysis device 100, it ends the acquisition of voice.
  • The sound collection device 10 acquires voice with each of the plurality of sound collection units and internally records it as the voice of the channel corresponding to each sound collection unit. The sound collection device 10 then transmits the acquired voices of the plurality of channels to the voice analysis device 100 (c). The sound collection device 10 may transmit the acquired voice sequentially, or may transmit a predetermined amount or a predetermined duration of voice at a time. The sound collection device 10 may also transmit the voice from the start to the end of acquisition all at once.
  • The voice acquisition unit 112 of the voice analysis device 100 receives the voice from the sound collection device 10 and stores it in the voice storage unit 132.
  • The voice analysis device 100 analyzes the voice acquired from the sound collection device 10 at a predetermined timing.
  • The voice analysis device 100 may analyze the voice when the analyst gives an analysis instruction on the communication terminal 20 by a predetermined operation. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from among the voices stored in the voice storage unit 132.
  • The voice analysis device 100 may instead analyze the voice when the acquisition of the voice ends. In this case, the voice from the start to the end of acquisition corresponds to the discussion to be analyzed. The voice analysis device 100 may also analyze the voice sequentially during its acquisition (that is, in real-time processing). In this case, the voice over a predetermined past time (for example, 30 seconds) going back from the current time corresponds to the discussion to be analyzed.
  • When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the plurality of channels of voice acquired by the voice acquisition unit 112 (d). Sound source localization is processing that estimates the direction of each sound source included in the voice acquired by the voice acquisition unit 112 for each unit time (for example, every 10 milliseconds to 100 milliseconds). The sound source localization unit 113 associates the direction of the sound source estimated for each unit time with the direction of a participant indicated by the setting information stored in the setting information storage unit 131.
  • As long as the sound source localization unit 113 can identify the direction of a sound source based on the voice acquired from the sound collection device 10, any known sound source localization method, such as the Multiple Signal Classification (MUSIC) method or a beamforming method, can be used.
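The patent leaves the localization algorithm open, naming MUSIC and beamforming only as examples. As a rough illustration of the beamforming family (not the MUSIC method, and with all function names and parameter values being assumptions), a delay-and-sum scan over the eight-microphone circular array described above could look like this:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def mic_positions(n_mics=8, radius=0.05):
    """Microphones equally spaced on a circle in the horizontal plane (m)."""
    angles = 2 * np.pi * np.arange(n_mics) / n_mics
    return np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)

def estimate_direction(frame, sample_rate, n_candidates=72):
    """Return the azimuth (degrees) maximizing the delay-and-sum power.

    frame: (n_mics, n_samples) array holding one analysis frame
    (e.g. 10-100 ms, matching the localization interval in the text).
    """
    mics = mic_positions(frame.shape[0])
    spectra = np.fft.rfft(frame, axis=1)
    freqs = np.fft.rfftfreq(frame.shape[1], d=1.0 / sample_rate)
    candidates = np.linspace(0.0, 2 * np.pi, n_candidates, endpoint=False)
    powers = []
    for theta in candidates:
        direction = np.array([np.cos(theta), np.sin(theta)])
        delays = mics @ direction / SPEED_OF_SOUND  # per-microphone delay (s)
        # Phase-align every channel toward the candidate direction and sum
        # (the delay sign convention depends on the chosen geometry).
        aligned = spectra * np.exp(2j * np.pi * freqs * delays[:, None])
        powers.append(float(np.sum(np.abs(aligned.sum(axis=0)) ** 2)))
    return float(np.degrees(candidates[int(np.argmax(powers))]))
```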
  • The analysis unit 114 analyzes the voice based on the voice acquired by the voice acquisition unit 112 and the directions of the sound sources estimated by the sound source localization unit 113 (e).
  • The analysis unit 114 may analyze an entire completed discussion, or may analyze part of a discussion in the case of real-time processing.
  • The analysis unit 114 first determines, for each unit time (for example, every 10 milliseconds to 100 milliseconds) in the discussion to be analyzed, which participant is speaking.
  • The analysis unit 114 identifies a continuous period from the start to the end of one participant's speech as a speech period, and causes the analysis result storage unit 133 to store it. When a plurality of participants speak at the same time, the analysis unit 114 identifies a speech period for each participant.
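As a minimal sketch of how per-frame speaker decisions could be collapsed into speech periods (the data representation is an assumption; the patent does not specify one):

```python
def speech_periods(frame_speakers, frame_sec=0.1):
    """Collapse per-frame speaker sets into (speaker, start_sec, end_sec).

    frame_speakers: list of sets, one per analysis frame, each holding the
    participants judged to be speaking in that frame; overlapping speech
    simply appears as a participant present in several consecutive sets.
    """
    periods, open_starts = [], {}
    for i, active in enumerate(list(frame_speakers) + [set()]):  # sentinel
        t = i * frame_sec
        for speaker in list(open_starts):
            if speaker not in active:  # this speaker's utterance ended
                periods.append((speaker, open_starts.pop(speaker), t))
        for speaker in active:
            open_starts.setdefault(speaker, t)  # utterance started
    return periods
```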
  • Next, the analysis unit 114 calculates the amount of speech of each participant per unit time and causes the analysis result storage unit 133 to store it. Specifically, the analysis unit 114 calculates, as the amount of speech per unit time (the activity level), the value obtained by dividing the length of time during which a participant spoke within a certain time window (for example, 5 seconds) by the length of the time window. The analysis unit 114 then repeats this calculation for each participant while shifting the time window by a predetermined time (for example, 1 second) from the start time of the discussion to the end time (or to the current time in the case of real-time processing).
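A minimal sketch of the windowed activity calculation described above, using the patent's example values of a 5-second window shifted in 1-second steps (the frame representation is carried over from the previous sketch):

```python
def speech_amounts(frame_speakers, participants, frame_sec=0.1,
                   window_sec=5.0, step_sec=1.0):
    """Amount of speech (activity) per participant at each window position:
    speaking time inside the window divided by the window length."""
    per_window = int(window_sec / frame_sec)
    per_step = int(step_sec / frame_sec)
    amounts = {p: [] for p in participants}
    for start in range(0, len(frame_speakers) - per_window + 1, per_step):
        window = frame_speakers[start:start + per_window]
        for p in participants:
            speaking = sum(1 for active in window if p in active)
            amounts[p].append(speaking / per_window)  # value in 0.0 .. 1.0
    return amounts
```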
  • The section setting unit 115 sets one or more sections for the voice corresponding to the discussion to be analyzed, based on input from a user (a participant or the analyst).
  • The sections may be set, for example, for each subject of discussion, such as "Japanese language", "Science", or "Social studies", or for each stage of a discussion, such as "Discussion", "Idea generation", or "Summary".
  • The section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the voice for which they are set.
  • The section information includes the name of each section and its times (that is, the start time and end time of the section within the voice).
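As an illustration only, the section information described here could be represented by a structure along these lines (field names are assumptions):

```python
from dataclasses import dataclass

@dataclass
class Section:
    name: str         # e.g. "discussion", "idea generation", "summary"
    start_sec: float  # start time of the section within the voice
    end_sec: float    # end time of the section within the voice
```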
  • The section setting unit 115 sets a section based on at least one of (1) an operation on the communication terminal 20, (2) an operation on the sound collection device 10, and (3) a predetermined sound acquired by the sound collection device 10.
  • When a section is set based on an operation on the communication terminal 20, the participant or the analyst inputs the character strings and times included in the section information by operating the operation unit 22 (for example, a touch screen, a mouse, or a keyboard) of the communication terminal 20. The participant or the analyst may input the section information after the discussion ends, or during the discussion. The section setting unit 115 then receives the section information specified on the communication terminal 20 via the communication unit 120 and stores it in the analysis result storage unit 133.
  • When a section is set based on an operation on the sound collection device 10, the participant or the analyst sets the section by operating an operation unit, such as a switch or a touch screen, provided on the sound collection device 10 when the sections switch.
  • Each operation of the operation unit of the sound collection device 10 is associated in advance with the switching of a predetermined section (for example, switching from a "discussion" section to an "idea generation" section).
  • The section setting unit 115 receives information indicating the operation from the operation unit of the sound collection device 10 via the communication unit 120, and identifies the switching of the predetermined section at the timing of the operation. The section setting unit 115 then stores the identified section information in the analysis result storage unit 133.
  • When a section is set based on a predetermined sound, the participant or the analyst uses a device capable of emitting sound (for example, a portable terminal or a music player) to emit a predetermined switching sound indicating the switching of sections.
  • The switching sound may be a sound wave audible to humans, or an ultrasonic wave inaudible to humans.
  • The switching sound indicates the switching of sections by, for example, a predefined frequency or on/off pattern.
  • The switching sound may be emitted only at the timing at which the sections switch, or may be emitted continuously during a section.
  • When the switching sound is emitted continuously during a section, the section setting unit 115 detects the switching sound included in the voice acquired by the sound collection device 10, and identifies, at the timing at which the switching sound changes, a switch from the section corresponding to the switching sound before the change to the section corresponding to the switching sound after the change. The section setting unit 115 then stores the identified section information in the analysis result storage unit 133.
  • When the switching sound is emitted only at the switching timing, the section setting unit 115 detects the switching sound included in the voice acquired by the sound collection device 10, and identifies the switching of the predetermined section at the timing at which the switching sound is emitted. The section setting unit 115 then stores the identified section information in the analysis result storage unit 133.
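As one hedged illustration of detecting a switching sound by a predefined frequency (the patent also allows on/off patterns and audible tones; the frequency, tolerance, and power ratio below are invented tuning values):

```python
import numpy as np

def detect_switch_tone(samples, sample_rate, tone_hz=19000.0,
                       tol_hz=100.0, power_ratio=10.0):
    """True if a narrow tone near tone_hz dominates this audio frame.

    19 kHz stands in for a near-ultrasonic switching sound; tone_hz,
    tol_hz, and power_ratio are illustrative tuning values only.
    """
    windowed = samples * np.hanning(len(samples))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    band = np.abs(freqs - tone_hz) <= tol_hz
    if not band.any():  # tone lies above Nyquist for this sample rate
        return False
    return bool(spectrum[band].max() > power_ratio * spectrum.mean())
```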
  • The output unit 116 controls the display unit 21 to display the analysis results produced by the analysis unit 114 by transmitting display information to the communication terminal 20 (f).
  • The output unit 116 is not limited to displaying on the display unit 21, and may output the analysis results by other methods, such as printing with a printer or recording data to a storage device. The methods by which the output unit 116 outputs the analysis results are described below with reference to FIGS. 5 to 9.
  • The output unit 116 of the voice analysis device 100 reads, from the analysis result storage unit 133, the analysis results produced by the analysis unit 114 and the section information produced by the section setting unit 115 for the discussion to be displayed.
  • The output unit 116 may display a discussion immediately after the analysis unit 114 finishes analyzing it, or may display a discussion specified by the analyst.
  • FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B.
  • The speech amount screen B is a screen for displaying information indicating the temporal change in the amount of speech for each section, and includes a graph B1 of the amounts of speech, section names B2, and section switching lines B3.
  • The output unit 116 generates display information for displaying the temporal change in each participant's amount of speech for each section, based on the analysis results and the section information read from the analysis result storage unit 133.
  • The graph B1 shows the temporal change in each participant U's amount of speech.
  • The output unit 116 plots the amount of speech (activity) on the vertical axis and time on the horizontal axis, and displays the amount of speech of each participant U at each time indicated by the analysis results on the display unit 21 as a graph. In doing so, the output unit 116 stacks the participants U's amounts of speech at each point in time; that is, it plots the cumulative sums of the participants U's amounts of speech in order on the vertical axis.
  • For example, the value plotted for participant U4 is the sum of the amounts of speech of participants U3 and U4, the value plotted for participant U2 is the sum of the amounts of speech of participants U2, U3, and U4, and the value plotted for participant U1 is the sum of the amounts of speech of participants U1, U2, U3, and U4.
  • The output unit 116 may determine the order in which the participants U's amounts of speech are stacked (summed) randomly, or according to a predetermined rule.
  • In this way, the output unit 116 can display the amount of speech of the group as a whole in addition to the amount of speech of each participant U.
  • The analyst can thus grasp the temporal change in each participant U's contribution and, at the same time, the temporal change in how active the participant U's group as a whole is.
  • The output unit 116 displays the area or line representing each participant U in the graph B1 in a display mode, such as a color or pattern, that differs for each participant.
  • In FIG. 5, the graph B1 is displayed with a different pattern for each participant U, and a legend associating each participant U with a pattern is displayed near the graph B1. This allows the analyst to easily determine which participant U each part of the graph B1 corresponds to.
  • The section name B2 is a character string representing the name of a section.
  • The section switching line B3 is a line indicating the timing at which two sections switch.
  • The output unit 116 displays, for each section indicated by the section information, the section name near the part of the graph B1 in the time range corresponding to that section. The output unit 116 also identifies the switching timing of two sections based on the section times indicated by the section information, and displays the switching line B3 at the position on the time (horizontal) axis of the graph B1 corresponding to the identified switching timing. In this way, the output unit 116 can show which section the graph B1 of each participant U's amount of speech corresponds to at each time.
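A sketch of how such a stacked graph with section names and switching lines could be rendered with matplotlib; the actual screen B in FIG. 5 is of course richer than this:

```python
import matplotlib.pyplot as plt

def plot_speech_amounts(times, amounts, sections):
    """amounts: {participant: [activity per time point]} in stacking order
    (bottom first); sections: [(name, start_sec, end_sec), ...]."""
    fig, ax = plt.subplots()
    ax.stackplot(times, list(amounts.values()), labels=list(amounts.keys()))
    for name, start, end in sections:
        if start > 0:
            ax.axvline(start, linestyle="--")  # section switching line (B3)
        ax.text((start + end) / 2, 1.01, name, ha="center", va="bottom",
                transform=ax.get_xaxis_transform())  # section name (B2)
    ax.set_xlabel("time [s]")
    ax.set_ylabel("amount of speech (activity)")
    ax.legend(loc="upper right")
    return fig
```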
  • As described above, the output unit 116 displays information indicating the sections set in the discussion superimposed on the temporal change in each participant U's amount of speech. The analyst can therefore grasp the temporal change in each participant U's amount of speech for each section.
  • The output unit 116 can also make the temporal change in each participant U's amount of speech easier to read by determining the order in which the participants U's amounts of speech are stacked in the graph B1 based on each participant U's amount of speech.
  • The output unit 116 may switch between the speech amount screen B of FIG. 5 and the speech amount screen B of FIG. 6 according to the analyst's operation, or may display a predetermined one of them.
  • In the speech amount screen B of FIG. 6, the output unit 116 calculates the degree of variation (for example, the variance or standard deviation) in each participant U's amount of speech in each section, based on the analysis results and the section information read from the analysis result storage unit 133. The output unit 116 then generates the graph B1 by stacking the participants U's amounts of speech in ascending order of the degree of variation in each section. The output unit 116 may instead determine the stacking order based on the degree of variation over all sections rather than for each section.
  • By stacking participants with smaller variation lower, the influence that changes in the amounts of speech of the participants U placed below have on the apparent amounts of speech of the participants U placed above can be reduced. Furthermore, since the tendency of each participant U's amount of speech changes from section to section, changing the stacking order for each section makes the temporal change in the amounts of speech easier to read.
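A sketch of the variation-based stacking order, using variance as the degree of variation (standard deviation would order participants identically):

```python
import numpy as np

def stacking_order(amounts):
    """Participants sorted by ascending variance of their activity, so the
    steadiest speakers end up at the bottom of the stacked graph."""
    return sorted(amounts, key=lambda p: np.var(amounts[p]))

def stacking_order_per_section(amounts, sections, times):
    """Per-section variant: a separate stacking order for each section."""
    orders = {}
    for name, start, end in sections:
        idx = [i for i, t in enumerate(times) if start <= t < end]
        orders[name] = sorted(
            amounts, key=lambda p: np.var([amounts[p][i] for i in idx]))
    return orders
```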
  • The output unit 116 may display, on the graph B1, a predetermined event that occurred during the discussion (that is, within the time span of the voice acquired by the voice acquisition unit 112). This allows the analyst to analyze the influence of the occurrence of the event on each participant U's amount of speech.
  • The event is, for example, (1) the approach of an assistant of the discussion (a teacher, a facilitator, etc.) to the group, or (2) a specific remark (word) made by the assistant.
  • The events shown here are examples, and the output unit 116 may display the occurrence of other events that the voice analysis device 100 can recognize.
  • To detect the approach of an assistant to the group, the output unit 116 uses a signal transmitted and received between the sound collection device 10 and the assistant.
  • For example, the assistant carries a transmitter that emits a predetermined signal by radio waves of wireless communication such as Bluetooth (registered trademark), or by ultrasonic waves, and the sound collection device 10 includes a receiver that receives the signal.
  • The output unit 116 determines that the assistant has approached when the receiver of the sound collection device 10 becomes able to receive the signal from the assistant's transmitter, or when the strength at which the signal is received becomes equal to or higher than a predetermined threshold.
  • The output unit 116 determines that the assistant has left when the receiver of the sound collection device 10 becomes unable to receive the signal from the assistant's transmitter, or when the strength at which the signal is received falls below the predetermined threshold.
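The approach/leave decision described here amounts to a threshold test with state on the received signal strength; a minimal sketch (the threshold value and event labels are assumptions):

```python
def assistant_events(rssi_per_sec, threshold=-60.0):
    """Turn a received-signal-strength series (one reading per second,
    None when nothing was received) into approach/leave events.
    The -60 dBm threshold is an illustrative value, not from the patent."""
    events, near = [], False
    for t, rssi in enumerate(rssi_per_sec):
        receivable = rssi is not None and rssi >= threshold
        if receivable and not near:
            events.append((t, "assistant approached"))
        elif not receivable and near:
            events.append((t, "assistant left"))
        near = receivable
    return events
```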
  • Alternatively, the output unit 116 may use the assistant's voiceprint (that is, the frequency spectrum of the assistant's voice) to detect the approach of the assistant to the group.
  • In this case, the output unit 116 registers the assistant's voiceprint in advance and detects it in the voice acquired by the sound collection device 10 during the discussion. The output unit 116 determines that the assistant has approached when it detects the assistant's voiceprint, and that the assistant has left when it can no longer detect the voiceprint.
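As a deliberately crude sketch of voiceprint matching by frequency spectrum (real speaker identification uses far more robust features; this only makes the registered-profile-versus-live-audio flow concrete, and all names and thresholds are assumptions):

```python
import numpy as np

def spectrum_profile(samples, frame=2048):
    """Average log-magnitude spectrum used as a crude 'voiceprint'."""
    frames = [samples[i:i + frame]
              for i in range(0, len(samples) - frame + 1, frame)]
    mean_spec = np.mean([np.abs(np.fft.rfft(f)) for f in frames], axis=0)
    return np.log1p(mean_spec)

def matches_voiceprint(registered, samples, threshold=0.9):
    """Cosine similarity between the registered profile and live audio."""
    current = spectrum_profile(samples)
    sim = float(np.dot(registered, current) /
                (np.linalg.norm(registered) * np.linalg.norm(current)))
    return sim >= threshold
```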
  • To detect a specific remark by the assistant, the output unit 116 performs speech recognition on the assistant's voice.
  • For example, the assistant carries a sound collection device (for example, a pin microphone), and the output unit 116 receives the assistant's voice acquired by that device.
  • By using a sound collection device carried by the assistant separately from the sound collection device 10, the participants U's voices and the assistant's voice can be clearly distinguished.
  • The output unit 116 converts the voice acquired from the sound collection device carried by the assistant into a character string.
  • A known speech recognition method can be used to convert the voice into a character string. The output unit 116 then detects specific words in the converted character string (for example, words related to the progress of the discussion, such as "first", "summary", and "last", or words such as "good" and "bad").
  • The words to be detected are set in the voice analysis device 100 in advance. When a specific word is detected, the output unit 116 determines that the specific word was uttered.
  • The output unit 116 may perform speech recognition only before and after timings at which the change in the participants U's amounts of speech is large. In this case, the output unit 116 calculates the degree of change in the amount of speech per unit time (for example, the amount or rate of change per unit time) based on the analysis results read from the analysis result storage unit 133. The degree of change in the amount of speech may be calculated for each participant U, or as the sum over all participants U.
  • The output unit 116 then performs speech recognition on the voice acquired by the sound collection device carried by the assistant within a predetermined time range including each timing at which the degree of change is equal to or higher than a predetermined threshold (for example, from 5 seconds before to 5 seconds after that timing).
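A sketch of this selective-recognition logic: compute the change in the summed activity per unit time and keep only the time ranges around large changes (the 5-second margin is the patent's example; the threshold value is an assumption):

```python
def recognition_ranges(total_amounts, step_sec=1.0, threshold=0.3,
                       margin_sec=5.0):
    """(start_sec, end_sec) ranges around each sharp change in the activity
    summed over all participants; only these ranges are then passed to
    speech recognition."""
    ranges = []
    for i in range(1, len(total_amounts)):
        change = abs(total_amounts[i] - total_amounts[i - 1]) / step_sec
        if change >= threshold:
            t = i * step_sec
            ranges.append((max(0.0, t - margin_sec), t + margin_sec))
    return ranges
```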
  • FIG. 7 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B.
  • In FIG. 7, event information B4 is displayed on the graph B1; otherwise the screen is the same as the speech amount screen B of FIG. 5.
  • The output unit 116 may switch between the speech amount screen B of FIG. 5 and the speech amount screen B of FIG. 7 according to the analyst's operation, or may display a predetermined one of them.
  • The event information B4 is information indicating the content and timing of an event.
  • The event information B4 indicates the content of the event by, for example, a character string indicating that the assistant approached or left, or a character string indicating the assistant's remark detected by speech recognition. The event information B4 also indicates the timing of the event by an arrow pointing to the time at which the event occurred on the graph B1.
  • In this way, the output unit 116 displays information indicating the content and timing of an event that occurred during the discussion, superimposed on the temporal change in each participant U's amount of speech. The analyst can therefore analyze how the event that occurred during the discussion influenced the temporal change in each participant U's amount of speech.
  • For example, when the amount of speech increases as the teacher approaches the group, the analyst can conclude that the teacher activated the discussion.
  • Likewise, when the amount of speech increases after a specific word is uttered by the teacher, the analyst can conclude that the word is effective for activating the discussion.
  • The output unit 116 can also extract and display graphs of the amounts of speech of a plurality of discussions for the same section.
  • FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying the section extraction screen C.
  • The output unit 116 displays the section extraction screen C for a specified section when, for example, the analyst specifies the name B2 of a section on the speech amount screen B of FIGS. 5 to 7.
  • The section extraction screen C is a screen that displays the result of extracting the graphs of the amounts of speech for the same section, and includes graphs C1 of the amounts of speech, a section name C2, and group names C3.
  • When displaying the section extraction screen C, the output unit 116 extracts the analysis results and section information of a plurality of groups for the specified section from the analysis result storage unit 133.
  • The groups to be displayed may be different groups that held discussions at the same time, or the same or different groups that held discussions in the past. The output unit 116 then generates, based on the extracted analysis results and section information, display information for displaying the temporal change in each participant's amount of speech for the plurality of groups in the specified section.
  • Each graph C1 shows the temporal change in each participant U's amount of speech in the specified section for one of two or more groups.
  • The display mode of the graphs C1 is the same as that of the graph B1.
  • The section name C2 is a character string indicating the name of the specified section.
  • The group name C3 is a name for identifying a displayed group, and may be set by the analyst or determined automatically by the voice analysis device 100.
  • In FIG. 8, the output unit 116 displays the graphs C1 of two groups, but graphs C1 of three or more groups may be displayed. The output unit 116 may also display the names of one or more participants U belonging to a group instead of, or in addition to, the group name C3.
  • In this way, the output unit 116 displays a plurality of graphs showing the temporal change in each participant's amount of speech for different groups in the same section.
  • This allows the analyst to compare and analyze the temporal changes in the amounts of speech of different groups for the same section (for example, the same subject, or the same stage of a discussion). For example, the analyst can grasp each group's tendency in the amount of speech by comparing different groups that held discussions at the same time. The analyst can also grasp changes in the tendency of the same group's amount of speech by comparing a plurality of past discussions of the same section for that group.
  • The output unit 116 is not limited to a stacked graph as illustrated in FIG. 5, and may display a heat map indicating the temporal change in each participant U's amount of speech.
  • FIG. 9 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen D.
  • The speech amount screen D includes a heat map D1 of the amounts of speech, section names D2, and section switching lines D3.
  • The section names D2 and the section switching lines D3 are the same as the section names B2 and the section switching lines B3 in FIG. 5.
  • The heat map D1 represents the amount of speech along the time axis by color.
  • FIG. 9 represents the color differences by the density of dots; for example, the higher the density of the dots, the darker the color, and the lower the density, the lighter the color.
  • The output unit 116 takes time along a predetermined direction (for example, the horizontal direction in FIG. 9) and causes the display unit 21 to display, for each participant U, areas colored according to the amount of speech per unit time.
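A sketch of the heat-map view with matplotlib, one row per participant and color following the activity per unit time, as in FIG. 9:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_heatmap(times, amounts):
    """amounts: {participant: [activity per time point]}; one row per
    participant, time on the horizontal axis, color showing the activity."""
    names = list(amounts)
    grid = np.array([amounts[p] for p in names])
    fig, ax = plt.subplots()
    image = ax.imshow(grid, aspect="auto", origin="lower",
                      extent=(times[0], times[-1], 0, len(names)))
    ax.set_yticks(np.arange(len(names)) + 0.5)
    ax.set_yticklabels(names)
    ax.set_xlabel("time [s]")
    fig.colorbar(image, label="amount of speech (activity)")
    return fig
```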
  • By displaying a heat map instead of a graph in this way, the analyst can likewise grasp the temporal change in each participant U's amount of speech for each section.
  • The output unit 116 may switch between the graph of FIG. 5 and the heat map of FIG. 9 according to the analyst's operation, or may display a predetermined one of them.
  • FIG. 10 is a sequence diagram of the voice analysis method performed by the voice analysis system S according to the present embodiment.
  • First, the communication terminal 20 receives the settings of the analysis conditions from the analyst and transmits them as setting information to the voice analysis device 100 (S11).
  • The setting unit 111 of the voice analysis device 100 acquires the setting information from the communication terminal 20 and causes the setting information storage unit 131 to store it.
  • The voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing the acquisition of voice to the sound collection device 10 (S12).
  • When the sound collection device 10 receives the signal instructing the acquisition of voice from the voice analysis device 100, it starts recording voice using the plurality of sound collection units and transmits the recorded voices of the plurality of channels to the voice analysis device 100.
  • The voice acquisition unit 112 of the voice analysis device 100 receives the voice from the sound collection device 10 and stores it in the voice storage unit 132.
  • The voice analysis device 100 starts the voice analysis at one of the following timings: when the analyst gives an instruction, when the acquisition of voice ends, or during the acquisition of voice (that is, in real-time processing).
  • The sound source localization unit 113 performs sound source localization based on the voice acquired by the voice acquisition unit 112 (S14).
  • The analysis unit 114 determines, based on the voice acquired by the voice acquisition unit 112 and the directions of the sound sources estimated by the sound source localization unit 113, which participant spoke at each time, and identifies the speech periods and the amounts of speech (S15).
  • The analysis unit 114 causes the analysis result storage unit 133 to store the speech period and the amount of speech of each participant.
  • The section setting unit 115 sets one or more sections for the voice corresponding to the discussion to be analyzed (S16). The section setting unit 115 sets a section based on at least one of an operation on the communication terminal 20, an operation on the sound collection device 10, and a predetermined sound acquired by the sound collection device 10.
  • The section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the voice for which they are set.
  • The output unit 116 controls the display unit 21 of the communication terminal 20 to display the analysis results (S17). Specifically, the output unit 116 generates display information for the speech amount screen B, the section extraction screen C, or the speech amount screen D described above, based on the analysis results produced by the analysis unit 114 and the section information produced by the section setting unit 115, and transmits it to the communication terminal 20.
  • The communication terminal 20 causes the display unit 21 to display the analysis results according to the display information received from the voice analysis device 100 (S18).
  • As described above, the voice analysis device 100 according to the present embodiment displays the temporal change in each participant's amount of speech for each section. The analyst can thereby grasp the temporal change in each participant's amount of speech for each section.
  • Furthermore, the voice analysis device 100 automatically analyzes the discussions of a plurality of participants based on the voice acquired using the sound collection device 10, which has a plurality of sound collection units. It is therefore unnecessary to have a recorder observe the discussion, as in the Harkness method described in Non-Patent Document 1, and unnecessary to assign a recorder to each group, which keeps costs low.
  • The processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the subjects of the steps (processes) included in the voice analysis method shown in FIG. 10. That is, the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 read a program for executing the voice analysis method shown in FIG. 10 from the storage unit and execute it, controlling the respective units of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 to perform the voice analysis method shown in FIG. 10.
  • Some of the steps included in the voice analysis method shown in FIG. 10 may be omitted, the order of the steps may be changed, and a plurality of steps may be performed in parallel.


Abstract

The purpose of the present invention is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system that can output information for analysis based on temporal changes in the amounts of speech of participants in a discussion. A voice analysis device 100 according to one embodiment of the present invention includes: a voice acquisition unit 112 that acquires voices uttered by a plurality of participants; an analysis unit 114 that identifies the amount of speech per unit time of each of the plurality of participants in the voices; a section setting unit 115 that sets sections in the voices based on input from a user; and an output unit 116 that outputs a graph in which the temporal changes in the participants' amounts of speech are stacked on one another, together with information indicating the sections in the graph.

Description

Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
According to the present invention, it is possible to output the change in each participant's amount of speech along the time series of a discussion.
FIG. 1 is a schematic diagram of the voice analysis system according to the present embodiment. FIG. 2 is a block diagram of the voice analysis system according to the present embodiment. FIG. 3 is a schematic diagram of the voice analysis method performed by the voice analysis system according to the present embodiment. FIG. 4 is a front view of the display unit of the communication terminal displaying the setting screen. FIG. 5 is a front view of the display unit of the communication terminal displaying the speech amount screen. FIG. 6 is a front view of the display unit of the communication terminal displaying the speech amount screen. FIG. 7 is a front view of the display unit of the communication terminal displaying the speech amount screen. FIG. 8 is a front view of the display unit of the communication terminal displaying the section extraction screen. FIG. 9 is a front view of the display unit of the communication terminal displaying the speech amount screen. FIG. 10 is a sequence diagram of the voice analysis method performed by the voice analysis system according to the present embodiment.
[Overview of the voice analysis system S]
 FIG. 1 is a schematic view of the voice analysis system S according to the present embodiment. The voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20. The numbers of sound collection devices 10 and communication terminals 20 included in the voice analysis system S are not limited. The voice analysis system S may also include other devices such as servers and terminals.
 The voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least some of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected to one another without going through the network N.
 The sound collection device 10 includes a microphone array made up of a plurality of sound collection units (microphones) arranged to face in different directions. For example, the microphone array includes eight microphones arranged at equal intervals on the same circumference in a plane horizontal to the ground. The sound collection device 10 transmits the voice acquired with the microphone array to the voice analysis device 100 as data.
 The communication terminal 20 is a communication device capable of wired or wireless communication, for example a portable terminal such as a smartphone or a computer terminal such as a personal computer. The communication terminal 20 accepts the setting of analysis conditions from an analyst and displays the analysis results produced by the voice analysis device 100.
 The voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 using the voice analysis method described later, and transmits the results of the analysis to the communication terminal 20.
[Configuration of the voice analysis system S]
 FIG. 2 is a block diagram of the voice analysis system S according to the present embodiment. The arrows in FIG. 2 indicate the main data flows; there may be data flows not shown in FIG. 2. Each block in FIG. 2 represents a functional unit rather than a hardware (device) unit. The blocks shown in FIG. 2 may therefore be implemented within a single device, or implemented separately across a plurality of devices. Data may be exchanged between the blocks via any means, such as a data bus, a network, or a portable storage medium.
 The communication terminal 20 has a display unit 21 for displaying various kinds of information and an operation unit 22 for accepting operations by the analyst. The display unit 21 includes a display device such as a liquid crystal display or an organic light emitting diode (OLED) display. The operation unit 22 includes operation members such as buttons, switches, and dials. The display unit 21 and the operation unit 22 may be integrated by using, as the display unit 21, a touch screen capable of detecting the position touched by the analyst.
 The voice analysis device 100 has a control unit 110, a communication unit 120, and a storage unit 130. The control unit 110 has a setting unit 111, a voice acquisition unit 112, a sound source localization unit 113, an analysis unit 114, a section setting unit 115, and an output unit 116. The storage unit 130 has a setting information storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
 The communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N. The communication unit 120 includes a processor, connectors, electric circuits, and the like for performing communication. The communication unit 120 performs predetermined processing on communication signals received from the outside to acquire data, and inputs the acquired data to the control unit 110. The communication unit 120 also performs predetermined processing on data input from the control unit 110 to generate communication signals, and transmits the generated communication signals to the outside.
 The storage unit 130 is a storage medium including a read-only memory (ROM), a random access memory (RAM), a hard disk drive, and the like. The storage unit 130 stores in advance the programs to be executed by the control unit 110. The storage unit 130 may be provided outside the voice analysis device 100, in which case it may exchange data with the control unit 110 via the communication unit 120.
 The setting information storage unit 131 stores setting information indicating the analysis conditions set by the analyst on the communication terminal 20. The voice storage unit 132 stores the voice acquired by the sound collection device 10. The analysis result storage unit 133 stores analysis results indicating the results of analyzing the voice. The setting information storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may each be a storage area on the storage unit 130, or a database configured on the storage unit 130.
 The control unit 110 is a processor such as a central processing unit (CPU), and functions as the setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the analysis unit 114, the section setting unit 115, and the output unit 116 by executing the programs stored in the storage unit 130. The functions of these units are described later with reference to FIGS. 3 to 9. At least some of the functions of the control unit 110 may be implemented by electric circuits, or by programs executed via a network.
 The voice analysis system S according to the present embodiment is not limited to the specific configuration shown in FIG. 2. For example, the voice analysis device 100 is not limited to a single device, and may be configured by connecting two or more physically separate devices by wire or wirelessly.
[Description of the voice analysis method]
 FIG. 3 is a schematic view of the voice analysis method performed by the voice analysis system S according to the present embodiment. First, the analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20. The analysis conditions are, for example, information indicating the number of participants in the discussion to be analyzed and the direction in which each participant (that is, each of the plurality of participants) is located relative to the sound collection device 10. The communication terminal 20 accepts the setting of the analysis conditions from the analyst and transmits it to the voice analysis device 100 as setting information (a). The setting unit 111 of the voice analysis device 100 acquires the setting information from the communication terminal 20 and stores it in the setting information storage unit 131.
 FIG. 4 is a front view of the display unit 21 of the communication terminal 20 displaying a setting screen A. The communication terminal 20 displays the setting screen A on the display unit 21 and accepts the setting of the analysis conditions by the analyst. The setting screen A includes a position setting area A1, a start button A2, and an end button A3. The position setting area A1 is an area for setting the direction in which each participant U is actually located relative to the sound collection device 10 in the discussion to be analyzed. For example, as shown in FIG. 4, the position setting area A1 represents a circle centered on the position of the sound collection device 10, with angles relative to the sound collection device 10 marked along the circle.
 The analyst sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20. Identification information identifying each participant U (here, U1 to U4) is assigned and displayed near the position set for that participant U. In the example of FIG. 4, four participants U1 to U4 are set. The portion of the position setting area A1 corresponding to each participant U is displayed in a different color for each participant, so that the analyst can easily recognize the direction in which each participant U is set.
 The start button A2 and the end button A3 are virtual buttons displayed on the display unit 21. When the analyst presses the start button A2, the communication terminal 20 transmits a start instruction signal to the voice analysis device 100. When the analyst presses the end button A3, the communication terminal 20 transmits an end instruction signal to the voice analysis device 100. In the present embodiment, the period from the analyst's start instruction to the end instruction is treated as one discussion.
 When the voice acquisition unit 112 of the voice analysis device 100 receives the start instruction signal from the communication terminal 20, it transmits a signal instructing the acquisition of voice to the sound collection device 10 (b). When the sound collection device 10 receives the signal instructing the acquisition of voice from the voice analysis device 100, it starts acquiring voice. Similarly, when the voice acquisition unit 112 of the voice analysis device 100 receives the end instruction signal from the communication terminal 20, it transmits a signal instructing the end of voice acquisition to the sound collection device 10, and the sound collection device 10 ends the acquisition of voice upon receiving that signal.
 The sound collection device 10 acquires voice at each of the plurality of sound collection units and records it internally as the voice of the channel corresponding to each sound collection unit. The sound collection device 10 then transmits the acquired voices of the plurality of channels to the voice analysis device 100 (c). The sound collection device 10 may transmit the acquired voice sequentially, may transmit a predetermined amount or a predetermined duration of voice at a time, or may transmit the voice from the start to the end of the acquisition all at once. The voice acquisition unit 112 of the voice analysis device 100 receives the voice from the sound collection device 10 and stores it in the voice storage unit 132.
 The voice analysis device 100 analyzes the voice acquired from the sound collection device 10 at a predetermined timing. The voice analysis device 100 may analyze the voice when the analyst issues an analysis instruction through a predetermined operation on the communication terminal 20. In that case, the analyst selects the voice corresponding to the discussion to be analyzed from the voices stored in the voice storage unit 132.
 The voice analysis device 100 may also analyze the voice when the acquisition of voice ends, in which case the voice from the start to the end of the acquisition corresponds to the discussion to be analyzed. Alternatively, the voice analysis device 100 may analyze the voice sequentially during its acquisition (that is, by real-time processing), in which case the voice of a predetermined past period (for example, the last 30 seconds) counting back from the current time corresponds to the discussion to be analyzed.
 When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the voices of the plurality of channels acquired by the voice acquisition unit 112 (d). Sound source localization is a process of estimating the direction of the sound source contained in the voice acquired by the voice acquisition unit 112 for each time slice (for example, every 10 to 100 milliseconds). The sound source localization unit 113 associates the direction of the sound source estimated for each time slice with the direction of a participant indicated by the setting information stored in the setting information storage unit 131.
 The sound source localization unit 113 can use a known sound source localization method, such as the multiple signal classification (MUSIC) method or a beamforming method, as long as the method can identify the direction of the sound source from the voice acquired from the sound collection device 10.
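 By way of illustration, the association of an estimated source direction with a registered participant direction can be sketched as a nearest-angle match. This is a minimal Python sketch, not the claimed implementation; the participant angles and the 30-degree tolerance are assumed values.

```python
# Minimal sketch: map an estimated source direction to the nearest
# registered participant direction. Angles and tolerance are assumptions.
PARTICIPANT_DIRECTIONS = {"U1": 0.0, "U2": 90.0, "U3": 180.0, "U4": 270.0}

def angular_distance(a, b):
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)  # distance around the circle

def nearest_participant(estimated_angle, tolerance=30.0):
    """Return the participant whose set direction is closest to the
    estimated direction, or None if nobody is within the tolerance."""
    name = min(PARTICIPANT_DIRECTIONS,
               key=lambda u: angular_distance(estimated_angle, PARTICIPANT_DIRECTIONS[u]))
    if angular_distance(estimated_angle, PARTICIPANT_DIRECTIONS[name]) <= tolerance:
        return name
    return None

print(nearest_participant(78.0))   # -> 'U2'
print(nearest_participant(135.0))  # -> None (midway between U2 and U3)
```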
 Next, the analysis unit 114 analyzes the voice based on the voice acquired by the voice acquisition unit 112 and the direction of the sound source estimated by the sound source localization unit 113 (e). The analysis unit 114 may analyze a completed discussion in its entirety, or, in the case of real-time processing, may analyze part of an ongoing discussion.
 Specifically, the analysis unit 114 first determines, for each time slice (for example, every 10 to 100 milliseconds) of the discussion under analysis, which participant spoke (uttered), based on the voice acquired by the voice acquisition unit 112 and the direction of the sound source estimated by the sound source localization unit 113. The analysis unit 114 identifies a continuous period from the moment one participant starts speaking until that participant stops as a speech period, and stores it in the analysis result storage unit 133. When a plurality of participants speak at the same time, the analysis unit 114 identifies a speech period for each participant.
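 As a rough sketch of how per-slice decisions become speech periods, the following groups consecutive time slices attributed to the same participant into (start, end) intervals; overlapping speech simply yields overlapping periods. The 100-millisecond slice length and the input format are assumptions for illustration.

```python
def speech_periods(slice_speakers, slice_sec=0.1):
    """slice_speakers: per-slice sets of active speakers, e.g.
    [{'U1'}, {'U1'}, set(), {'U1', 'U2'}, ...].
    Returns {participant: [(start_sec, end_sec), ...]} with one entry per
    continuous run of slices in which that participant was speaking."""
    periods, open_start = {}, {}
    for i, speakers in enumerate(slice_speakers + [set()]):  # sentinel closes runs
        t = i * slice_sec
        for u in speakers:
            open_start.setdefault(u, t)        # run starts at first active slice
        for u in list(open_start):
            if u not in speakers:              # run ended on the previous slice
                periods.setdefault(u, []).append((open_start.pop(u), t))
    return periods

print(speech_periods([{'U1'}, {'U1'}, set(), {'U1', 'U2'}, {'U2'}]))
# -> {'U1': [(0.0, 0.2), (0.3, 0.4)], 'U2': [(0.3, 0.5)]}
```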
 The analysis unit 114 also calculates the amount of speech of each participant for each time and stores it in the analysis result storage unit 133. Specifically, the analysis unit 114 calculates, as the amount of speech for each time (also called the activity level), the length of time a participant spoke within a certain time window (for example, 5 seconds) divided by the length of the time window. The analysis unit 114 then repeats this calculation for each participant while sliding the time window by a predetermined step (for example, 1 second) from the start time of the discussion to its end time (or to the present, in the case of real-time processing).
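 A minimal sketch of this windowed calculation, assuming a boolean speaking flag per 100-millisecond slice and the example values of a 5-second window slid in 1-second steps:

```python
def activity_series(speaking, window_sec=5, step_sec=1, slice_sec=0.1):
    """speaking: list of booleans, one per time slice, True while the
    participant is speaking. Returns one activity value per window position:
    speaking time within the window divided by the window length."""
    per_window = int(window_sec / slice_sec)   # slices per window
    per_step = int(step_sec / slice_sec)       # slices per step
    values = []
    for start in range(0, len(speaking) - per_window + 1, per_step):
        window = speaking[start:start + per_window]
        values.append(sum(window) / per_window)  # fraction of window spent talking
    return values

# 10 s of data at 100 ms slices: talking for the first 3 s only.
speaking = [True] * 30 + [False] * 70
print(activity_series(speaking)[:4])  # -> [0.6, 0.4, 0.2, 0.0]
```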
 The section setting unit 115 sets one or more sections for the voice corresponding to the discussion under analysis, based on input from a user (a participant or the analyst). Sections may be set, for example, per subject under discussion, such as "Japanese", "science", or "social studies", or per stage of the discussion, such as "discussion", "idea generation", or "summary". The section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the voice being set.
 The section information includes the name of the section and the times of the section (that is, the start time and end time of the section within the voice). The section setting unit 115 sets sections based on at least one of (1) an operation on the communication terminal 20, (2) an operation on the sound collection device 10, and (3) a predetermined sound acquired by the sound collection device 10.
 When setting a section based on an operation on the communication terminal 20, the participant or the analyst inputs the character string and the times included in the section information by operating the operation unit 22 of the communication terminal 20 (for example, a touch screen, mouse, or keyboard). The participant or the analyst may input the section information after the discussion ends, or during the discussion. The section setting unit 115 then receives the section information specified on the communication terminal 20 via the communication unit 120 and stores it in the analysis result storage unit 133.
 When setting a section based on an operation on the sound collection device 10, the participant or the analyst operates an operation unit such as a switch or touch screen provided on the sound collection device 10 at the moment the section changes. Each operation of the operation unit of the sound collection device 10 is associated in advance with a predetermined section switch (for example, the switch from the "discussion" section to the "idea generation" section). The section setting unit 115 receives information indicating the operation from the operation unit of the sound collection device 10 via the communication unit 120, identifies the predetermined section switch at the timing of the operation, and stores the identified section information in the analysis result storage unit 133.
 When setting a section based on a predetermined sound acquired by the sound collection device 10, the participant or the analyst uses a device capable of producing sound (for example, a portable terminal or a music player) to emit a predetermined switching sound indicating the change of section. The switching sound may be an audible sound wave or an ultrasonic wave inaudible to humans. The switching sound indicates the change of section by, for example, a predefined frequency or an on/off pattern. The switching sound may be emitted only at the moment the section changes, or continuously throughout a section.
 A different sound can be used as the switching sound for each section. In this case, the section setting unit 115 detects the switching sound contained in the voice acquired by the sound collection device 10, identifies, at the timing when the switching sound changes, the switch from the section corresponding to the sound before the change to the section corresponding to the sound after the change, and stores the identified section information in the analysis result storage unit 133.
 Alternatively, a sound indicating a predetermined section switch (for example, the switch from the "discussion" section to the "idea generation" section) can be used as the switching sound. In this case, the section setting unit 115 detects the switching sound contained in the voice acquired by the sound collection device 10, identifies the predetermined section switch at the timing when the switching sound is emitted, and stores the identified section information in the analysis result storage unit 133.
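 One plausible way to detect a fixed-frequency switching sound is a Goertzel filter, which measures the signal power at a single frequency cheaply enough to run continuously. This is a sketch under assumptions: the 18 kHz tone, the 48 kHz sampling rate, and the detection threshold are illustrative values, not values from this description.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Relative power of one frequency in a block of samples (Goertzel algorithm)."""
    k = round(len(samples) * target_hz / sample_rate)
    w = 2.0 * math.pi * k / len(samples)
    coeff = 2.0 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def tone_present(samples, sample_rate=48000, target_hz=18000, threshold=1e4):
    # Hypothetical threshold; in practice it would be calibrated against
    # the background level of the room.
    return goertzel_power(samples, sample_rate, target_hz) > threshold

# 10 ms block containing a full-scale 18 kHz tone.
block = [math.sin(2 * math.pi * 18000 * n / 48000) for n in range(480)]
print(tone_present(block))  # -> True
```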
 The output unit 116 performs control to display the analysis result produced by the analysis unit 114 on the display unit 21 by transmitting display information to the communication terminal 20 (f). The output unit 116 is not limited to display on the display unit 21, and may output the analysis result by other methods, such as printing on a printer or recording data in a storage device. The methods by which the output unit 116 outputs the analysis result are described below with reference to FIGS. 5 to 9.
[Displaying the amount of speech per section]
 When displaying an analysis result, the output unit 116 of the voice analysis device 100 reads, from the analysis result storage unit 133, the analysis result produced by the analysis unit 114 and the section information produced by the section setting unit 115 for the discussion to be displayed. The output unit 116 may display the discussion immediately after its analysis by the analysis unit 114 is completed, or may display a discussion designated by the analyst.
 FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying a speech amount screen B. The speech amount screen B is a screen displaying information indicating the temporal change in the amount of speech for each section, and includes a speech amount graph B1, section names B2, and section switching lines B3.
 When displaying the speech amount screen B, the output unit 116 generates display information for displaying the temporal change in each participant's amount of speech for each section, based on the analysis result and the section information read from the analysis result storage unit 133.
 The graph B1 shows the temporal change in the amount of speech of each participant U. The output unit 116 causes the display unit 21 to display the amount of speech for each time indicated by the analysis result for each participant U as a line graph, with the amount of speech (activity level) on the vertical axis and time on the horizontal axis. In doing so, the output unit 116 stacks the amounts of speech of the participants U on one another at each point in time; that is, it plots the successive running totals of the participants' amounts of speech on the vertical axis.
 In the example of FIG. 5, the value plotted for participant U4 is the sum of the amounts of speech of participants U3 and U4, the value plotted for participant U2 is the sum of the amounts of speech of participants U2, U3, and U4, and the value plotted for participant U1 is the sum of the amounts of speech of participants U1, U2, U3, and U4. The output unit 116 may determine the order in which the participants' amounts of speech are stacked (summed) at random, or according to a predetermined rule.
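 The plotting itself amounts to drawing running totals of the per-time activity series. A minimal matplotlib sketch with made-up activity values (stackplot draws each series on top of the previous ones, which matches the stacking described above):

```python
import matplotlib.pyplot as plt

times = list(range(6))  # window positions, 1 s apart
activity = {            # made-up per-participant activity series
    "U3": [0.2, 0.3, 0.1, 0.0, 0.2, 0.4],
    "U4": [0.1, 0.0, 0.2, 0.3, 0.1, 0.0],
    "U2": [0.4, 0.2, 0.3, 0.1, 0.0, 0.2],
    "U1": [0.0, 0.1, 0.2, 0.4, 0.3, 0.1],
}

# The insertion order of the dict is the stacking order, bottom first.
plt.stackplot(times, activity.values(), labels=activity.keys())
plt.xlabel("time (s)")
plt.ylabel("activity (stacked)")
plt.legend(loc="upper right")
plt.show()
```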
 By stacking in this way, the output unit 116 can display the amount of speech of the discussion group as a whole in addition to the amount of speech of each participant U. The analyst can grasp the temporal change in each participant U's contribution and, at the same time, the temporal change in how animated the group of participants U is as a whole.
 The output unit 116 displays the area or line of the graph B1 for each participant U in a display mode, such as a color or pattern, that differs for each participant. In the example of FIG. 5, the graph B1 is displayed with a different pattern for each participant U, and a legend associating the participants U with the patterns is displayed near the graph B1. This allows the analyst to easily determine which participant U each part of the graph B1 corresponds to.
 The section name B2 is a character string representing the name of a section. The section switching line B3 is a line indicating the timing of the switch between two sections. For each section indicated by the section information, the output unit 116 displays the section's name near the portion of the graph B1 covering the time range corresponding to that section. The output unit 116 also identifies the timing of the switch between two sections based on the section times indicated by the section information, and displays a switching line B3 at the position on the time (horizontal) axis of the graph B1 corresponding to the identified switching timing. In this way, the output unit 116 can display which section the graph B1 of each participant U's amount of speech corresponds to at each time.
 The output unit 116 thus displays information indicating the sections set in the discussion superimposed on the temporal change in each participant U's amount of speech, so the analyst can grasp the temporal change in each participant U's amount of speech for each section.
 Because the graph B1 displays the participants' amounts of speech stacked (summed), when the amount of speech of a participant U placed lower in the stack changes, the amounts of speech of the participants U placed above appear to change as well. As a result, the temporal change in each participant U's amount of speech can be hard to read at a glance. The output unit 116 can therefore make the temporal change in each participant U's amount of speech easier to read by determining the order in which the amounts of speech are stacked in the graph B1 based on each participant U's amount of speech.
 FIG. 6 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B. In the speech amount screen B of FIG. 6, the order in which the amounts of speech are stacked is changed for each section; it is otherwise the same as the speech amount screen B of FIG. 5. The output unit 116 may switch between the speech amount screens B of FIG. 5 and FIG. 6 according to the analyst's operation, or may display at least one predetermined screen of the two.
 When changing the stacking order, the output unit 116 calculates the degree of variation (for example, the variance or standard deviation) of each participant U's amount of speech in each section, based on the analysis result and the section information read from the analysis result storage unit 133. The output unit 116 then generates the graph B1 by stacking the participants U's amounts of speech in ascending order of their degree of variation in each section. The output unit 116 may determine the stacking order based on the degree of variation over all sections instead of per section.
 Stacking from the bottom of the graph B1 in ascending order of variation in the amount of speech in this way reduces the influence that changes in the amount of speech of the participants U placed lower in the stack have on the apparent amounts of speech of the participants U placed above. Moreover, since the tendency of each participant U's amount of speech changes from section to section, changing the stacking order for each section makes the temporal change in the amounts of speech easier to read.
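 The ordering rule can be sketched as follows: for each section, compute the standard deviation of each participant's activity over that section and stack in ascending order. The data layout (per-participant activity lists plus a window-index range per section) is an assumption for illustration.

```python
from statistics import pstdev

def stacking_order(activity, section_range):
    """activity: {participant: [activity per window position]};
    section_range: (first_index, last_index) of the section's windows.
    Returns participants in ascending order of standard deviation, so the
    steadiest speaker ends up at the bottom of the stack."""
    lo, hi = section_range
    return sorted(activity, key=lambda u: pstdev(activity[u][lo:hi]))

activity = {
    "U1": [0.0, 0.1, 0.6, 0.7],
    "U2": [0.3, 0.3, 0.3, 0.3],
}
print(stacking_order(activity, (0, 4)))  # -> ['U2', 'U1'] (U2 varies least)
```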
[Displaying events]
 The output unit 116 may display, on the graph B1, predetermined events that occurred during the discussion (that is, within the time span of the voice acquired by the voice acquisition unit 112). This allows the analyst to analyze the influence that the occurrence of an event had on each participant U's amount of speech. An event is, for example, (1) a discussion assistant (a teacher, facilitator, or the like) approaching the group, or (2) a specific remark (word) by the assistant. These events are examples, and the output unit 116 may display the occurrence of any other event that the voice analysis device 100 can recognize.
 To detect the assistant's approach to the group, the output unit 116 uses signals exchanged between the sound collection device 10 and the assistant. In this case, the assistant carries a transmitter that emits a predetermined signal, for example by radio waves of wireless communication such as Bluetooth (registered trademark) or by ultrasonic waves, and the sound collection device 10 includes a receiver that receives the signal. The output unit 116 determines that the assistant has approached when the receiver of the sound collection device 10 becomes able to receive the signal from the assistant's transmitter, or when the received signal strength becomes equal to or greater than a predetermined threshold. Conversely, the output unit 116 determines that the assistant has left when the receiver of the sound collection device 10 can no longer receive the signal from the assistant's transmitter, or when the received signal strength falls below the predetermined threshold.
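 The decision logic is a simple threshold test on the received signal strength. Below is a sketch over a hypothetical stream of RSSI readings; the threshold and the hysteresis margin are assumptions (some hysteresis is usually wanted so a signal hovering near the threshold does not toggle the state on every reading).

```python
def track_presence(rssi_readings, threshold=-60.0, hysteresis=5.0):
    """Yield ('approached' | 'left', index) events from RSSI values in dBm.
    None stands for 'no signal received'."""
    present = False
    for i, rssi in enumerate(rssi_readings):
        if not present and rssi is not None and rssi >= threshold:
            present = True
            yield ("approached", i)
        elif present and (rssi is None or rssi < threshold - hysteresis):
            present = False
            yield ("left", i)

readings = [None, -75.0, -58.0, -62.0, -70.0, None]
print(list(track_presence(readings)))  # -> [('approached', 2), ('left', 4)]
```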
 Alternatively, to detect the assistant's approach to the group, the output unit 116 may use the assistant's voiceprint (that is, the frequency spectrum of the assistant's voice). In this case, the output unit 116 registers the assistant's voiceprint in advance and detects the assistant's voiceprint in the voice acquired by the sound collection device 10 during the discussion. The output unit 116 determines that the assistant has approached when it detects the assistant's voiceprint, and determines that the assistant has left when it can no longer detect it.
 To detect specific words of the assistant, the output unit 116 performs speech recognition on the assistant's voice. In this case, the assistant carries a sound collection device (for example, a pin microphone), and the output unit 116 receives the assistant's voice acquired by the sound collection device carried by the assistant. Using a sound collection device carried by the assistant, separate from the sound collection device 10, makes it possible to clearly distinguish the voices of the participants U from the voice of the assistant.
 The output unit 116 converts the voice acquired from the sound collection device carried by the assistant into a character string, using a known speech recognition method. The output unit 116 then detects specific words in the converted character string (for example, words related to the progress of the discussion, such as "first", "summary", and "last", or words such as "good" and "bad"). The words to be detected are set in the voice analysis device 100 in advance. When the output unit 116 detects a specific word, it determines that the specific word was uttered.
 The output unit 116 may perform speech recognition only around the timings at which the change in each participant U's amount of speech is large. In this case, the output unit 116 calculates the degree of change in the amount of speech over time (for example, the amount or rate of change per unit time) based on the analysis result read from the analysis result storage unit 133. The degree of change in the amount of speech may be calculated for each participant U, or as the total over all participants U.
 The output unit 116 then performs speech recognition on the voice acquired by the sound collection device carried by the assistant within a predetermined time range including each timing at which the degree of change is equal to or greater than a predetermined threshold (for example, from 5 seconds before to 5 seconds after that timing). Speech recognition is generally computationally expensive, so performing it only around the timings at which the amount of speech changes substantially makes it possible to analyze the words that caused the change while reducing the processing load.
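 Selecting the ranges to recognize can be sketched as follows: compute the step-to-step change in the group's total activity and emit a window of ±5 seconds around every step where the change reaches a threshold, merging overlaps. The 5-second margin follows the example in the text; the 0.3 threshold and the input format are assumptions.

```python
def recognition_windows(total_activity, step_sec=1.0, threshold=0.3, margin_sec=5.0):
    """total_activity: group activity per window position, step_sec apart.
    Returns merged (start_sec, end_sec) ranges around large changes; speech
    recognition would then be restricted to these ranges."""
    windows = []
    for i in range(1, len(total_activity)):
        if abs(total_activity[i] - total_activity[i - 1]) >= threshold:
            t = i * step_sec
            lo, hi = max(0.0, t - margin_sec), t + margin_sec
            if windows and lo <= windows[-1][1]:  # merge overlapping ranges
                windows[-1] = (windows[-1][0], hi)
            else:
                windows.append((lo, hi))
    return windows

series = [0.1, 0.1, 0.6, 0.6, 0.6, 0.1, 0.1]
print(recognition_windows(series))  # -> [(0.0, 10.0)]
```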
 The output unit 116 then generates display information in which information indicating the events detected by the above methods is associated with times in the voice. FIG. 7 is a front view of the display unit 21 of the communication terminal 20 displaying the speech amount screen B. In the speech amount screen B of FIG. 7, event information B4 is displayed on the graph B1; it is otherwise the same as the speech amount screen B of FIG. 5. The output unit 116 may switch between the speech amount screens B of FIG. 5 and FIG. 7 according to the analyst's operation, or may display at least one predetermined screen of the two.
 The event information B4 is information indicating the content and timing of an event. The event information B4 indicates the content of an event by, for example, a character string indicating that the assistant approached or left, or a character string representing an utterance of the assistant detected by speech recognition. The event information B4 indicates the timing of an event by an arrow pointing to the timing at which the event occurred on the graph B1.
 The output unit 116 thus displays information indicating the content and timing of the events that occurred in the discussion superimposed on the temporal change in each participant U's amount of speech. The analyst can therefore analyze how events that occurred during the discussion affected the temporal change in each participant U's amount of speech. For example, if the amount of speech increased when a teacher approached the group, the analyst can judge that the teacher succeeded in stimulating the discussion; if the amount of speech increased when the teacher uttered a specific word, the analyst can judge that the word is effective for stimulating the discussion.
[Displaying the amounts of speech of the same section]
 The output unit 116 can extract and display graphs of the amounts of speech of a plurality of groups over the same section. FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying a section extraction screen C. The output unit 116 displays the section extraction screen C for a designated section when, for example, the analyst designates the name B2 of one of the sections on the speech amount screen B of FIGS. 5 to 7. The section extraction screen C is a screen displaying the result of extracting the speech amount graphs of the same section, and includes speech amount graphs C1, a section name C2, and group names C3.
 When displaying the section extraction screen C, the output unit 116 extracts the analysis results and the section information of a plurality of groups for the designated section from the analysis result storage unit 133. The groups to be displayed may be different groups that held discussions at the same time, or the same or different groups that held discussions in the past. Based on the extracted analysis results and section information, the output unit 116 generates display information for displaying the temporal change in each participant's amount of speech for the plurality of groups in the designated section.
 The speech amount graph C1 is a graph showing, for each of two or more groups, the temporal change in each participant U's amount of speech in the designated section. The display mode of the graph C1 is the same as that of the graph B1. The section name C2 is a character string indicating the name of the designated section.
 The group name C3 is a name for identifying a group to be displayed, and may be set by the analyst or determined automatically by the voice analysis device 100. In the example of FIG. 8, the output unit 116 displays the graphs C1 of two groups, but it may display the graphs C1 of three or more groups. The output unit 116 may also display the names of one or more participants U belonging to a group instead of, or in addition to, the group name C3.
 The output unit 116 thus displays, for the same section, a plurality of graphs showing the temporal change in each participant's amount of speech in different groups. This allows the analyst to compare and analyze the temporal changes in the amounts of speech of different groups for the same section (for example, the same subject, or the same stage of a discussion). For example, by comparing different groups that held discussions at the same time, the analyst can grasp each group's tendency in amount of speech; by comparing a plurality of past discussions of the same section by the same group, the analyst can grasp how that group's tendency in amount of speech has changed.
[Displaying a heat map of the amount of speech]
 The output unit 116 is not limited to stacked graphs such as that of FIG. 5, and may display a heat map showing the temporal change in each participant U's amount of speech. FIG. 9 is a front view of the display unit 21 of the communication terminal 20 displaying a speech amount screen D. The speech amount screen D includes a speech amount heat map D1, section names D2, and section switching lines D3. The section names D2 and the section switching lines D3 are the same as the section names B2 and the section switching lines B3 in FIG. 5.
 The speech amount heat map D1 displays the amount of speech along time by color. FIG. 9 represents the differences in color by the density of dots; for example, the higher the dot density, the darker the color, and the lower the dot density, the lighter the color. The output unit 116 takes time in a predetermined direction (for example, the horizontal direction in FIG. 9) and causes the display unit 21 to display, for each participant U, areas colored according to the amount of speech at each time.
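 Such a heat map can be sketched with matplotlib's imshow, one row per participant and one column per window position (the activity values are again made-up sample data):

```python
import matplotlib.pyplot as plt

participants = ["U1", "U2", "U3", "U4"]
activity = [  # rows: participants; columns: window positions (made-up data)
    [0.0, 0.1, 0.2, 0.4, 0.3, 0.1],
    [0.4, 0.2, 0.3, 0.1, 0.0, 0.2],
    [0.2, 0.3, 0.1, 0.0, 0.2, 0.4],
    [0.1, 0.0, 0.2, 0.3, 0.1, 0.0],
]

fig, ax = plt.subplots()
# Darker cells mean more speech, mirroring the dot-density rendering of FIG. 9.
im = ax.imshow(activity, aspect="auto", cmap="Greys", vmin=0.0, vmax=1.0)
ax.set_yticks(range(len(participants)))
ax.set_yticklabels(participants)
ax.set_xlabel("time (s)")
fig.colorbar(im, label="activity")
plt.show()
```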
 In this way, displaying a heat map instead of a graph also allows the analyst to grasp the temporal change in each participant U's amount of speech for each section. The output unit 116 may switch between the graph of FIG. 5 and the heat map of FIG. 9 according to the analyst's operation, or may display at least one predetermined form of the two.
[Sequence of the voice analysis method]
 FIG. 10 is a sequence diagram of the voice analysis method performed by the voice analysis system S according to the present embodiment. First, the communication terminal 20 accepts the setting of the analysis conditions from the analyst and transmits it to the voice analysis device 100 as setting information (S11). The setting unit 111 of the voice analysis device 100 acquires the setting information from the communication terminal 20 and stores it in the setting information storage unit 131.
 Next, the voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing the acquisition of voice to the sound collection device 10 (S12). When the sound collection device 10 receives the signal instructing the acquisition of voice from the voice analysis device 100, it starts recording voice using the plurality of sound collection units and transmits the recorded voices of the plurality of channels to the voice analysis device 100 (S13). The voice acquisition unit 112 of the voice analysis device 100 receives the voice from the sound collection device 10 and stores it in the voice storage unit 132.
 The voice analysis device 100 starts analyzing the voice at one of the following timings: when instructed by the analyst, when the acquisition of voice ends, or during the acquisition of voice (that is, real-time processing). When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the voice acquired by the voice acquisition unit 112 (S14).
 Next, the analysis unit 114 identifies the speech periods and the amount of speech for each participant by determining which participant spoke at each time, based on the voice acquired by the voice acquisition unit 112 and the direction of the sound source estimated by the sound source localization unit 113 (S15). The analysis unit 114 stores the speech periods and amounts of speech for each participant in the analysis result storage unit 133.
 The section setting unit 115 sets one or more sections for the voice corresponding to the discussion under analysis (S16). At this point, the section setting unit 115 sets the sections based on at least one of an operation on the communication terminal 20, an operation on the sound collection device 10, and a predetermined sound acquired by the sound collection device 10. The section setting unit 115 stores section information indicating the sections in the analysis result storage unit 133 in association with the voice being set.
 The output unit 116 performs control to display the analysis result on the display unit 21 of the communication terminal 20 (S17). Specifically, based on the analysis result produced by the analysis unit 114 and the section information produced by the section setting unit 115, the output unit 116 generates display information for displaying the speech amount screen B, the section extraction screen C, or the speech amount screen D described above, and transmits it to the communication terminal 20.
 The communication terminal 20 causes the display unit 21 to display the analysis result in accordance with the display information received from the voice analysis device 100 (S18).
[Effects of the present embodiment]
 Because the Harkness method shows the tendency of utterances over the entire period from the start to the end of a discussion, it cannot show the change in each participant's amount of speech along the timeline of the discussion, which makes analysis based on the temporal change in each participant's amount of speech difficult. In contrast, the voice analysis device 100 according to the present embodiment displays the temporal change in each participant's amount of speech for each section, allowing the analyst to grasp the temporal change in each participant's amount of speech for each section.
 Furthermore, the voice analysis device 100 automatically analyzes the discussion of a plurality of participants based on the voice acquired using the sound collection device 10, which has a plurality of sound collection units. Unlike the Harkness method described in Non-Patent Document 1, no recorder needs to monitor the discussion, and no recorder needs to be assigned to each group, which keeps costs low.
 Although the present invention has been described above using an embodiment, the technical scope of the present invention is not limited to the scope described in the above embodiment, and various modifications and changes are possible within the scope of its gist. For example, specific embodiments of the distribution and integration of devices are not limited to the above embodiment, and all or part of them may be functionally or physically distributed or integrated in arbitrary units. New embodiments arising from any combination of a plurality of embodiments are also included in the embodiments of the present invention, and the effects of a new embodiment arising from such a combination include the effects of the original embodiments.
 The processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the agents of the steps (processes) included in the voice analysis method shown in FIG. 10. That is, the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 read a program for executing the voice analysis method shown in FIG. 10 from their storage units and execute the program to control the units of the voice analysis device 100, the sound collection device 10, and the communication terminal 20, thereby performing the voice analysis method shown in FIG. 10. Some of the steps included in the voice analysis method shown in FIG. 10 may be omitted, the order of the steps may be changed, and a plurality of steps may be performed in parallel.
S voice analysis system
100 voice analysis device
110 control unit
112 voice acquisition unit
114 analysis unit
115 section setting unit
116 output unit
10 sound collection device
20 communication terminal
21 display unit

Claims (11)

  1.  A voice analysis device comprising:
      an acquisition unit that acquires voice uttered by a plurality of participants;
      an analysis unit that specifies, in the voice, the amount of speech of each of the plurality of participants for each time;
      a section setting unit that sets a section in the voice based on an input from a user; and
      an output unit that outputs a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, and information indicating the section in the graph.
  2.  The voice analysis device according to claim 1, wherein the output unit outputs, as the information indicating the section, a position on the graph corresponding to the time at which one of two sections switched to the other.
  3.  The voice analysis device according to claim 1 or 2, wherein the section setting unit sets the section based on at least one of an operation on a communication terminal that communicates with the voice analysis device, an operation on a sound collection device that acquires the voice, and a predetermined sound included in the voice.
  4.  The voice analysis device according to any one of claims 1 to 3, wherein the output unit outputs the graph in which the temporal changes in the amounts of speech are stacked on one another in ascending order of the degree of variation in the amount of speech calculated for each of the plurality of participants.
  5.  The voice analysis device according to claim 4, wherein the output unit outputs the graph in which, for each section, the temporal changes in the amounts of speech are stacked on one another in ascending order of the degree of variation in the amount of speech in that section calculated for each of the plurality of participants.
  6.  The voice analysis device according to any one of claims 1 to 5, wherein the output unit outputs a plurality of the graphs for the same section set in a plurality of the voices.
  7.  The voice analysis device according to any one of claims 1 to 6, wherein, in addition to the graph and the information indicating the section, the output unit outputs, on the graph, information indicating an event that occurred within the time of the voice.
  8.  The voice analysis device according to any one of claims 1 to 7, wherein the analysis unit specifies, as the amount of speech, a value obtained by dividing the length of time a participant spoke within a predetermined time window by the length of the time window.
  9.  A voice analysis method in which a processor executes the steps of:
      acquiring voice uttered by a plurality of participants;
      specifying, in the voice, the amount of speech of each of the plurality of participants for each time;
      setting a section in the voice based on an input from a user; and
      outputting a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, and information indicating the section in the graph.
  10.  A voice analysis program that causes a computer to execute the steps of:
      acquiring voice uttered by a plurality of participants;
      specifying, in the voice, the amount of speech of each of the plurality of participants for each time;
      setting a section in the voice based on an input from a user; and
      outputting a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, and information indicating the section in the graph.
  11.  A voice analysis system comprising a voice analysis device and a communication terminal capable of communicating with the voice analysis device, wherein
      the communication terminal has a display unit that displays information, and
      the voice analysis device has:
      an acquisition unit that acquires voice uttered by a plurality of participants;
      an analysis unit that specifies, in the voice, the amount of speech of each of the plurality of participants for each time;
      a section setting unit that sets a section in the voice based on an input from a user; and
      an output unit that causes the display unit to display a graph in which the temporal changes in the amounts of speech of the plurality of participants are stacked on one another, and information indicating the section in the graph.
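To make the stacking order of claims 4 and 5 concrete, the following is a minimal sketch, with hypothetical speech-amount series, that sorts participants by the population variance of their series in ascending order, so the steadiest speakers sit at the bottom of the stack. The variance is one plausible measure of the "degree of variation"; the claims do not fix a specific statistic:

```python
# Sketch of the stacking order of claims 4-5: participants whose speech
# amount varies least are stacked first (hypothetical series).
from statistics import pvariance

series = {
    "A": [0.4, 0.5, 0.3, 0.4],  # fairly steady speaker
    "B": [0.0, 0.8, 0.1, 0.7],  # bursty speaker
    "C": [0.2, 0.2, 0.3, 0.2],  # steadiest speaker
}

order = sorted(series, key=lambda p: pvariance(series[p]))
print(order)  # ['C', 'A', 'B'] -- ascending degree of variation
```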
PCT/JP2018/000942 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system WO2019142231A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2018/000942 WO2019142231A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
JP2018502279A JP6589040B1 (en) 2018-01-16 2018-01-16 Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/000942 WO2019142231A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Publications (1)

Publication Number Publication Date
WO2019142231A1 true WO2019142231A1 (en) 2019-07-25

Family

ID=67300990

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/000942 WO2019142231A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Country Status (2)

Country Link
JP (1) JP6589040B1 (en)
WO (1) WO2019142231A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008139654A (en) * 2006-12-04 2008-06-19 Nec Corp Method of estimating interaction, separation, and method, system and program for estimating interaction
JP2015028625A (en) * 2013-06-28 2015-02-12 キヤノンマーケティングジャパン株式会社 Information processing apparatus, control method of information processing apparatus, and program
JP2016206355A (en) * 2015-04-20 2016-12-08 本田技研工業株式会社 Conversation analysis device, conversation analysis method, and program
JP2017033443A (en) * 2015-08-05 2017-02-09 日本電気株式会社 Data processing device, data processing method, and program
JP2017161731A (en) * 2016-03-09 2017-09-14 本田技研工業株式会社 Conversation analyzer, conversation analysis method and program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUMAN INTERFACE 2015, 1 September 2015 (2015-09-01), pages 939 - 943 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021245759A1 (en) * 2020-06-01 2021-12-09 ハイラブル株式会社 Voice conference device, voice conference system, and voice conference method

Also Published As

Publication number Publication date
JP6589040B1 (en) 2019-10-09
JPWO2019142231A1 (en) 2020-01-23

Similar Documents

Publication Publication Date Title
CN101213589B (en) Object sound analysis device, object sound analysis method
WO2007139040A1 (en) Speech situation data creating device, speech situation visualizing device, speech situation data editing device, speech data reproducing device, and speech communication system
CN110782962A (en) Hearing language rehabilitation device, method, electronic equipment and storage medium
US20240153483A1 (en) Systems and methods for generating synthesized speech responses to voice inputs
US20230317095A1 (en) Systems and methods for pre-filtering audio content based on prominence of frequency content
Ramsay et al. The intrinsic memorability of everyday sounds
WO2019142231A1 (en) Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
JP6589042B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
JP6646134B2 (en) Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
CN109377806B (en) Test question distribution method based on learning level and learning client
KR102077642B1 (en) Sight-singing evaluation system and Sight-singing evaluation method using the same
KR101243766B1 (en) System and method for deciding user’s personality using voice signal
JP2020173415A (en) Teaching material presentation system and teaching material presentation method
JP7427274B2 (en) Speech analysis device, speech analysis method, speech analysis program and speech analysis system
JP6975755B2 (en) Voice analyzer, voice analysis method, voice analysis program and voice analysis system
JP7414319B2 (en) Speech analysis device, speech analysis method, speech analysis program and speech analysis system
JP6589041B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
CN110727883A (en) Method and system for analyzing personalized growth map of child
JP6975756B2 (en) Voice analyzer, voice analysis method, voice analysis program and voice analysis system
Altaf et al. Perceptually motivated temporal modeling of footsteps in a cross-environmental detection task
KR20230064870A (en) Psychoanalysis server for people with low vision through online music activity and psychological analysis method using the same
KR20200018859A (en) Web service system for speech feedback
Becker et al. Comparing automatic forensic voice comparison systems under forensic conditions
JP2022017527A (en) Speech analysis device, speech analysis method, voice analysis program, and speech analysis system
CN112887490A (en) Telephone robot pressure test system based on collection scene

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018502279

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18900614

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.10.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18900614

Country of ref document: EP

Kind code of ref document: A1