WO2019142230A1 - Voice analysis device, voice analysis method, voice analysis program, and voice analysis system - Google Patents

Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Info

Publication number
WO2019142230A1
Authority
WO
WIPO (PCT)
Prior art keywords
participant
voice
unit
transition
participants
Prior art date
Application number
PCT/JP2018/000941
Other languages
French (fr)
Japanese (ja)
Inventor
武志 水本
哲也 菅原
Original Assignee
ハイラブル株式会社
Priority date
Filing date
Publication date
Application filed by ハイラブル株式会社
Priority to JP2018502278A (JP6646134B2)
Priority to PCT/JP2018/000941
Publication of WO2019142230A1

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00-G10L 21/00, specially adapted for particular use, for comparison or discrimination

Definitions

  • the present invention relates to a voice analysis device for analyzing voice, a voice analysis method, a voice analysis program and a voice analysis system.
  • The Harkness method is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1).
  • In the Harkness method, the transitions of each participant's utterances are recorded with lines. In this way, it is possible to analyze each participant's contribution to the discussion and their relationships with others.
  • The Harkness method can also be applied effectively to active learning, in which students take the initiative in their own learning.
  • In the Harkness method, however, the burden on the recorder is large because the recorder must constantly record the discussion. Also, in order to analyze a plurality of groups, a recorder must be arranged for each group. Therefore, there is a problem that implementing the Harkness method is costly.
  • The present invention has been made in view of these points, and it is an object of the present invention to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system that can analyze discussions at low cost.
  • A voice analysis device according to a first aspect of the present invention comprises: an acquisition unit that acquires voices uttered by a plurality of participants; an analysis unit that detects, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and an output unit that causes a display unit to display information indicating the timing at which the transition occurred.
  • The output unit may display the information indicating the timing on the display unit as a line connecting a position corresponding to the first participant and a position corresponding to the second participant.
  • The output unit may display, as the information indicating the timing, the time change of the transition by generating the line on the display unit at the time when the transition occurs and erasing the line after a predetermined time has elapsed from that time.
  • the output unit may change the display mode of the line according to a combination of the first participant and the second participant.
  • the output unit may change the display mode of the line according to the number of times the transition occurs.
  • The analysis unit may identify, based on the voices, a period in which each of the plurality of participants is speaking, and may detect the transition when the period in which the first participant is speaking switches to the period in which the second participant is speaking.
  • the output unit may cause the display unit to display an amount of speech of each of the plurality of participants in addition to the time change of the transition.
  • In a speech analysis method according to a second aspect of the present invention, a processor executes: a step of acquiring voices uttered by a plurality of participants; a step of detecting, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and a step of displaying, on a display unit, information indicating the timing at which the transition occurred.
  • A voice analysis program according to a third aspect of the present invention causes a computer to execute: a step of acquiring voices uttered by a plurality of participants; a step of detecting, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and a step of displaying, on a display unit, information indicating the timing at which the transition occurred.
  • A voice analysis system according to a fourth aspect of the present invention includes a voice analysis device and a communication terminal capable of communicating with the voice analysis device. The communication terminal has a display unit for displaying information. The voice analysis device has an acquisition unit that acquires voices uttered by a plurality of participants, an analysis unit that detects, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants, and an output unit that causes the display unit to display information indicating the timing at which the transition occurred.
  • FIG. 1 is a schematic view of a speech analysis system S according to the present embodiment.
  • the voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20.
  • the number of sound collectors 10 and communication terminals 20 included in the speech analysis system S is not limited.
  • the voice analysis system S may include devices such as other servers and terminals.
  • the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least a part of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without the network N.
  • the sound collector 10 includes a microphone array including a plurality of sound collectors (microphones) arranged in different orientations.
  • the microphone array includes eight microphones equally spaced on the same circumference in the horizontal plane with respect to the ground.
  • the sound collection device 10 transmits the voice acquired using the microphone array to the voice analysis device 100 as data.
  • the communication terminal 20 is a communication device capable of performing wired or wireless communication.
  • the communication terminal 20 is, for example, a portable terminal such as a smart phone terminal or a computer terminal such as a personal computer.
  • the communication terminal 20 receives the setting of analysis conditions from the analyst and displays the analysis result by the voice analysis device 100.
  • the voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 by a voice analysis method described later. Further, the voice analysis device 100 transmits the result of the voice analysis to the communication terminal 20.
  • FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. Arrows in FIG. 2 indicate the main data flow, and there may be data flows not shown in FIG. In FIG. 2, each block is not a hardware (apparatus) unit configuration but a function unit configuration. As such, the blocks shown in FIG. 2 may be implemented in a single device or may be implemented separately in multiple devices. Transfer of data between the blocks may be performed via any means such as a data bus, a network, a portable storage medium, and the like.
  • the communication terminal 20 has a display unit 21 for displaying various information, and an operation unit 22 for receiving an operation by an analyst.
  • the display unit 21 includes a display device such as a liquid crystal display or an organic light emitting diode (OLED) display.
  • the operation unit 22 includes operation members such as a button, a switch, and a dial.
  • the display unit 21 and the operation unit 22 may be integrally configured by using a touch screen capable of detecting the position of contact by the analyst as the display unit 21.
  • the voice analysis device 100 includes a control unit 110, a communication unit 120, and a storage unit 130.
  • the control unit 110 includes a setting unit 111, a sound acquisition unit 112, a sound source localization unit 113, an analysis unit 114, and an output unit 115.
  • the storage unit 130 includes a setting information storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
  • the communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N.
  • the communication unit 120 includes a processor for performing communication, a connector, an electric circuit, and the like.
  • the communication unit 120 performs predetermined processing on a communication signal received from the outside to acquire data, and inputs the acquired data to the control unit 110. Further, the communication unit 120 performs predetermined processing on the data input from the control unit 110 to generate a communication signal, and transmits the generated communication signal to the outside.
  • the storage unit 130 is a storage medium including a read only memory (ROM), a random access memory (RAM), a hard disk drive, and the like.
  • the storage unit 130 stores in advance a program to be executed by the control unit 110.
  • the storage unit 130 may be provided outside the voice analysis device 100, and in this case, data may be exchanged with the control unit 110 via the communication unit 120.
  • the setting information storage unit 131 stores setting information indicating analysis conditions set by the analyst in the communication terminal 20.
  • the voice storage unit 132 stores the voice acquired by the sound collection device 10.
  • the analysis result storage unit 133 stores an analysis result indicating the result of analyzing the voice.
  • the setting information storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may be storage areas on the storage unit 130, or a database configured on the storage unit 130.
  • The control unit 110 is, for example, a processor such as a central processing unit (CPU), and functions as the setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the analysis unit 114, and the output unit 115 by executing the program stored in the storage unit 130.
  • The functions of the setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the analysis unit 114, and the output unit 115 will be described later with reference to FIGS. 3 to 8. At least a part of the functions of the control unit 110 may be performed by an electric circuit. In addition, at least a part of the functions of the control unit 110 may be executed by a program running over a network.
  • the speech analysis system S is not limited to the specific configuration shown in FIG.
  • the voice analysis device 100 is not limited to one device, and may be configured by connecting two or more physically separated devices in a wired or wireless manner.
  • FIG. 3 is a schematic view of the speech analysis method performed by the speech analysis system S according to the present embodiment.
  • the analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20.
  • The analysis condition is information indicating the number of participants in the discussion to be analyzed and the direction in which each participant (that is, each of the plurality of participants) is located with reference to the sound collection device 10.
  • the communication terminal 20 receives the setting of analysis conditions from the analyst, and transmits the setting as the setting information to the voice analysis device 100 (a).
  • the setting unit 111 of the voice analysis device 100 acquires setting information from the communication terminal 20 and causes the setting information storage unit 131 to store the setting information.
  • FIG. 4 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A.
  • the communication terminal 20 displays the setting screen A on the display unit 21 and receives the setting of the analysis condition by the analyst.
  • the setting screen A includes a position setting area A1, a start button A2, and an end button A3.
  • The position setting area A1 is an area for setting the direction in which each participant U is actually located with reference to the sound collection device 10 in the discussion to be analyzed.
  • the position setting area A1 represents a circle centered on the position of the sound collector 10 as shown in FIG. 4, and further represents an angle based on the sound collector 10 along the circle.
  • the analyst sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20.
  • In the position setting area A1, identification information (here, U1 to U4) for identifying each participant U is assigned and displayed at the position of each participant U. In the example of FIG. 4, four participants U1 to U4 are set. The portion corresponding to each participant U in the position setting area A1 is displayed in a different color for each participant. Thereby, the analyst can easily recognize the direction in which each participant U is set.
  • the start button A2 and the end button A3 are virtual buttons displayed on the display unit 21, respectively.
  • the communication terminal 20 transmits a signal of a start instruction to the voice analysis device 100 when the analyst presses the start button A2.
  • The communication terminal 20 transmits a signal of an end instruction to the voice analysis device 100 when the analyst presses the end button A3.
  • The period from the analyst's start instruction to the end instruction corresponds to one discussion.
  • When the voice acquisition unit 112 of the voice analysis device 100 receives the signal of the start instruction from the communication terminal 20, the voice acquisition unit 112 transmits a signal instructing acquisition of voice to the sound collection device 10 (b). When the sound collection device 10 receives the signal instructing acquisition of voice from the voice analysis device 100, it starts collecting voice. Further, when the voice acquisition unit 112 of the voice analysis device 100 receives the signal of the end instruction from the communication terminal 20, the voice acquisition unit 112 transmits a signal instructing the end of voice acquisition to the sound collection device 10. When the sound collection device 10 receives the signal instructing the end of voice acquisition from the voice analysis device 100, it ends the acquisition of voice.
  • The sound collection device 10 acquires voice with each of the plurality of sound collection units, and internally records it as the voice of the channel corresponding to each sound collection unit. Then, the sound collection device 10 transmits the acquired voices of the plurality of channels to the voice analysis device 100 (c). The sound collection device 10 may transmit the acquired voice sequentially, may transmit a predetermined amount or a predetermined time of voice at a time, or may transmit the voice from the start to the end of the acquisition collectively.
  • the voice acquisition unit 112 of the voice analysis device 100 receives voice from the sound collection device 10 and stores the voice in the voice storage unit 132.
  • the voice analysis device 100 analyzes voice at predetermined timing using the voice acquired from the sound collection device 10.
  • For example, the voice analysis device 100 may analyze the voice when the analyst gives an analysis instruction at the communication terminal 20 by a predetermined operation. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from the voices stored in the voice storage unit 132.
  • Alternatively, the voice analysis device 100 may analyze the voice when the voice acquisition ends. In this case, the voice from the start to the end of the acquisition corresponds to the discussion to be analyzed. In addition, the voice analysis device 100 may analyze voice sequentially during its acquisition (that is, in real time). In this case, the voice of a predetermined time in the past (for example, the last 30 seconds) going back from the current time corresponds to the discussion to be analyzed.
  • When analyzing voice, the sound source localization unit 113 first performs sound source localization based on the voices of the plurality of channels acquired by the voice acquisition unit 112 (d). Sound source localization is processing for estimating the direction of the sound source included in the voice acquired by the voice acquisition unit 112 for each time (for example, every 10 to 100 milliseconds). The sound source localization unit 113 associates the direction of the sound source estimated for each time with the direction of each participant indicated by the setting information stored in the setting information storage unit 131.
  • As long as the sound source localization unit 113 can identify the direction of the sound source based on the voice acquired from the sound collection device 10, any known sound source localization method, such as the Multiple Signal Classification (MUSIC) method or beamforming, can be used.
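As a concrete illustration of this association step, the following Python sketch (not part of the patent; all names and the tolerance value are hypothetical) maps each per-frame direction estimate produced by sound source localization to the participant whose configured direction is nearest:

```python
# Hypothetical sketch of the association in (d): map each per-frame
# direction-of-arrival estimate to the participant whose configured
# direction (from the setting information) is closest.
from typing import Optional

def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def assign_direction_to_participant(
    estimated_deg: float,
    participant_directions: dict,
    tolerance_deg: float = 30.0,
) -> Optional[str]:
    """Return the ID of the participant whose set direction is nearest to
    the per-frame estimate, or None if none is within the tolerance."""
    best_id, best_dist = None, tolerance_deg
    for pid, direction in participant_directions.items():
        d = angular_distance(estimated_deg, direction)
        if d <= best_dist:
            best_id, best_dist = pid, d
    return best_id

# Example: four participants set at 45/135/225/315 degrees on screen A.
directions = {"U1": 45.0, "U2": 135.0, "U3": 225.0, "U4": 315.0}
print(assign_direction_to_participant(50.0, directions))   # U1
print(assign_direction_to_participant(180.0, directions))  # None (outside tolerance)
```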
  • the analysis unit 114 analyzes the voice based on the voice acquired by the voice acquisition unit 112 and the direction of the sound source estimated by the sound source localization unit 113 (e).
  • the analysis unit 114 may analyze the entire completed discussion as an analysis target, or may analyze a part of the discussion in the case of real-time processing.
  • In the discussion to be analyzed, the analysis unit 114 first determines which participant is speaking at each time of the analysis (for example, every 10 to 100 milliseconds).
  • The analysis unit 114 specifies a continuous period from the start to the end of one participant's speech as a speech period, and causes the analysis result storage unit 133 to store it. When a plurality of participants speak at the same time, the analysis unit 114 specifies a speech period for each participant.
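The patent does not specify an algorithm for deriving speech periods; a minimal sketch might merge consecutive analysis frames attributed to the same participant (for simplicity this assumes one speaker per frame, whereas the analysis unit 114 tracks each participant independently to handle simultaneous speech):

```python
# Hypothetical sketch: merge consecutive frames attributed to the same
# participant into speech periods (participant, start time, end time).
def extract_speech_periods(frames, frame_sec=0.1):
    """frames: list of participant IDs (or None for silence), one per frame.
    Returns a list of (participant, start_sec, end_sec) tuples."""
    periods = []
    current, start = None, 0.0
    for i, speaker in enumerate(frames):
        t = i * frame_sec
        if speaker != current:
            if current is not None:
                periods.append((current, start, t))
            current, start = speaker, t
    if current is not None:
        periods.append((current, start, len(frames) * frame_sec))
    return periods

frames = ["U1", "U1", None, "U2", "U2", "U2", "U1"]
print(extract_speech_periods(frames))
# [('U1', 0.0, 0.2), ('U2', 0.3, 0.6), ('U1', 0.6, 0.7)]
```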
  • Furthermore, the analysis unit 114 calculates the amount of speech of each participant for each time, and causes the analysis result storage unit 133 to store it. Specifically, the analysis unit 114 calculates, as the amount of speech per time, the length of time during which the participant speaks within a certain time window (for example, 5 seconds) divided by the length of the time window. Then, the analysis unit 114 repeats this calculation for each participant while shifting the time window by a predetermined time (for example, one second) from the start time of the discussion to the end time (or to the current time in the case of real-time processing).
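A minimal sketch of this sliding-window calculation (hypothetical names; the window and step values follow the examples in the text):

```python
# Hypothetical sketch of the per-time speech amount: the fraction of a
# sliding time window (e.g. 5 s) occupied by one participant's speech,
# recomputed while shifting the window by a step (e.g. 1 s).
def speech_amount_series(periods, participant, total_sec,
                         window_sec=5.0, step_sec=1.0):
    """periods: list of (participant, start_sec, end_sec).
    Returns a list of (window_start_sec, amount in [0, 1])."""
    series = []
    t = 0.0
    while t + window_sec <= total_sec:
        spoken = 0.0
        for pid, s, e in periods:
            if pid == participant:
                # overlap of [s, e] with the window [t, t + window_sec]
                spoken += max(0.0, min(e, t + window_sec) - max(s, t))
        series.append((t, spoken / window_sec))
        t += step_sec
    return series

periods = [("U1", 0.0, 3.0), ("U2", 3.0, 8.0), ("U1", 8.0, 10.0)]
print(speech_amount_series(periods, "U1", total_sec=10.0))
```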
  • the analysis unit 114 detects the transition of the speaker.
  • A transition occurs when, after one participant (a first participant) finishes speaking, another participant (a second participant) speaks. The same participant may also make the next utterance after finishing speaking; as shown in FIG. 5, such a case is counted as a transition from the participant to himself or herself.
  • The analysis unit 114 may also detect, as one transition, a case in which the speech period switches two or more times. For example, a case in which, after one participant (the first participant) finishes speaking, another participant (the second participant) speaks and then yet another participant (a third participant) speaks may be detected as one transition.
  • The analysis unit 114 tallies the occurrence time of each transition detected in the discussion to be analyzed, the transition source participant, and the transition destination participant, associates them with one another, and stores them in the analysis result storage unit 133.
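The following sketch illustrates one plausible implementation of transition detection and tallying over the speech periods computed above (the patent does not prescribe this exact code):

```python
# Hypothetical sketch: each switch of the speech period from one
# participant to the next is recorded as a (source, destination, time)
# event, then tallied into counts like the matrix B.
from collections import defaultdict

def detect_transitions(periods):
    """periods: chronological list of (participant, start_sec, end_sec).
    Returns a list of (source, destination, occurrence_sec) events."""
    events = []
    for (src, _, _), (dst, dst_start, _) in zip(periods, periods[1:]):
        events.append((src, dst, dst_start))  # self-transitions included
    return events

def tally_matrix(events):
    counts = defaultdict(int)
    for src, dst, _ in events:
        counts[(src, dst)] += 1
    return counts

periods = [("U1", 0, 3), ("U4", 3, 8), ("U3", 8, 9), ("U1", 9, 12)]
events = detect_transitions(periods)
print(tally_matrix(events))
# {('U1', 'U4'): 1, ('U4', 'U3'): 1, ('U3', 'U1'): 1}
```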
  • FIG. 5 is a schematic view of the matrix B indicating the speaker transitions tallied by the analysis unit 114. Although the matrix B is represented in FIG. 5 as a table of character strings for visibility, it may be represented in other forms recognizable by a computer, such as binary data.
  • the matrix B represents the number of transitions from the transition source participant to the transition destination participant in the analysis target discussion.
  • the number of transitions from the participant U1 to the same participant U1 is two, and the number of transitions from the participant U1 to another participant U4 is eight.
  • The diagonal components of the matrix B correspond to transitions in which the speaker did not change, and the non-diagonal components correspond to transitions in which the speaker changed. Therefore, the analysis unit 114 can determine the atmosphere of the group by comparing the diagonal and non-diagonal components of the matrix B.
  • the output unit 115 performs control to display the analysis result by the analysis unit 114 on the display unit 21 by transmitting the display information to the communication terminal 20 (f).
  • the display control method of the analysis result by the output unit 115 will be described below with reference to FIGS. 6 to 8.
  • the output unit 115 of the voice analysis device 100 reads the analysis result by the analysis unit 114 for the display target discussion from the analysis result storage unit 133.
  • the output unit 115 may display a discussion immediately after the analysis by the analysis unit 114 is completed, or may display a discussion specified by the analyst.
  • FIG. 6 is a front view of the display unit 21 of the communication terminal 20 displaying the speaker transition screen C.
  • the speaker transition screen C includes a circle C1 indicating the arrangement of the participants U, a line C2 indicating the transition of the speakers, and a bar C3 indicating the amount of speech of each participant U.
  • the output unit 115 displays the time change of the speaker transition as information indicating the transition timing of the speaker based on the analysis result read from the analysis result storage unit 133.
  • Specifically, the output unit 115 generates display information for displaying, for a predetermined period (for example, 5 seconds) from the occurrence time of each transition, a line connecting the position of the transition source participant and the position of the transition destination participant.
  • the circle C1 is a circular area that schematically represents the arrangement of each participant U.
  • the output unit 115 displays the identification information (that is, U1 to U4) of the participant U near the position on the circle C1 corresponding to the position of each participant U set in FIG.
  • a line C2 is a line connecting the position of the transition source participant U on the circle C1 and the position of the transition destination participant U on the circle C1 when the transition of the speaker occurs.
  • the line C2 is displayed in a predetermined color and a predetermined thickness.
  • the line C2 may be a straight line segment, a bent line, or a broken line like a dotted line.
  • the output unit 115 causes the display unit 21 to display a line C2 connecting the position of the transition source participant U and the position of the transition destination participant U for a predetermined period (five seconds in this case) from the transition generation time. Then, the output unit 115 causes the display unit 21 to erase the line C2 after a predetermined period from the occurrence time of the transition.
  • the output unit 115 repeats generation and deletion of a line representing the transition of the speaker from the start time to the end time of the display target discussion. Thus, the output unit 115 can cause the display unit 21 to display the time change of the transition of the speaker.
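This generate-and-erase behavior can be illustrated by a small sketch that, given the tallied transition events, returns the lines visible at a given display time (hypothetical; the 5-second hold period follows the example above):

```python
# Hypothetical sketch of the display logic for screen C: a transition's
# line C2 is visible from its occurrence time until a hold period (e.g.
# 5 s) elapses, so the lines drawn at time t are the recent events.
def visible_lines(events, t, hold_sec=5.0):
    """events: list of (source, destination, occurrence_sec).
    Returns the (source, destination) pairs whose line is shown at time t."""
    return [(src, dst) for src, dst, at in events
            if at <= t < at + hold_sec]

events = [("U1", "U4", 3.0), ("U4", "U3", 8.0), ("U3", "U1", 9.0)]
print(visible_lines(events, 9.5))  # [('U4', 'U3'), ('U3', 'U1')]
```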
  • the output unit 115 may automatically advance the time during display (that is, may display as a moving image), or may advance the time during display according to the operation by the user.
  • By displaying the time change of the speaker transitions as the information indicating the timing of the transitions, the output unit 115 can show how the transition tendency changes along the time series of the discussion. As a result, the analyst can efficiently grasp the role of each participant U and the relationships between the participants U along the time series of the discussion.
  • The output unit 115 may display a plurality of lines C2 with the positions of their end points shifted from one another by a predetermined amount. As a result, the output unit 115 can prevent the plurality of lines C2 from coinciding with each other even when a plurality of transitions occur at the same time between the same participants U.
  • Also, the output unit 115 may change the display mode, such as the thickness or color, of the line C2 based on the number of transitions that have occurred. For example, the output unit 115 causes the display unit 21 to display a thicker line C2 as the number of transitions increases, or displays the line C2 in a different color according to the number of transitions. Thereby, the output unit 115 can display the fact that a plurality of transitions occurred at the same time between the same participants U in a manner easily understood by the analyst.
  • Also, the output unit 115 may change the display mode, such as the thickness or color, of the line C2 based on the cumulative number of transitions in the same combination of participants U from the start time of the discussion to the time being displayed.
  • For example, the output unit 115 causes the display unit 21 to display the line C2 thicker as the cumulative number of transitions increases, or displays the line C2 in a different color according to the cumulative number. Thereby, the output unit 115 can show the analyst, for each combination of participants U, whether the cumulative number of transitions is high or low.
  • the output unit 115 may change the display mode such as the thickness and color of the line C2 depending on the combination of the participants U.
  • the output unit 115 causes the display unit 21 to display the line C2 with a different thickness or color depending on the combination of the participants U.
  • the output unit 115 can display the combination of the participants U corresponding to the line C2 in a manner easy for the analyst to understand.
  • a bar C3 is a bar-like area that represents the amount of speech of each participant U.
  • The output unit 115 acquires the amount of speech of each participant U at the time being displayed, as indicated by the analysis result read out from the analysis result storage unit 133. Then, the output unit 115 displays, at the position on the circle C1 corresponding to the position of each participant U, a bar C3 having a length or size corresponding to the acquired amount of speech. For example, the output unit 115 causes the display unit 21 to display the bar C3 such that it extends farther from the circumference of the circle C1 toward the center as the amount of speech of the participant U increases. As a result, the output unit 115 can display, in addition to the time change of the transitions, the amount of speech of each participant at the displayed time in a manner easily understood by the analyst.
  • The output unit 115 is not limited to the amount of speech for each time, and may display a bar C3 having a length or size corresponding to the cumulative amount of speech from the start time of the discussion to the time being displayed.
  • the output unit 115 may change the display mode such as the color or pattern of the bar C3 depending on the participant U.
  • The output unit 115 is not limited to the time change of transitions from one participant U to another participant U, and may display the time change of the combinations of participants U in which transitions occurred. In this case, the output unit 115 causes the circle C1 to display identification information indicating combinations of participants U (for example, "U1-U2", "U1-U3", and so on). For example, when the combination in which a transition occurred changes from "U1-U2" to "U1-U3", the output unit 115 causes the display unit 21 to display a line C2 connecting the position of "U1-U2" and the position of "U1-U3".
  • the output unit 115 causes the display unit 21 to erase the line C2 a predetermined time after the line C2 is displayed.
  • the output unit 115 can represent how the combination of the participants U in which the transition has occurred changes along the time series of the discussion.
  • FIG. 7 is a front view of the display unit 21 of the communication terminal 20 displaying the speech order screen D.
  • the speech order screen D includes a region D1 indicating the amount of speech of the participant U and an arrow D2 indicating the number of transitions between the speakers.
  • When displaying the speech order screen D, the output unit 115 acquires the amount of speech of each participant U for each time in the discussion to be displayed, as indicated by the analysis result read out from the analysis result storage unit 133. Then, the output unit 115 calculates the total amount of speech of each participant U by summing the amounts of speech for each time from the start time to the end time of the displayed discussion. Further, the output unit 115 acquires, from the analysis result read out from the analysis result storage unit 133, the number of transitions (that is, the matrix B illustrated in FIG. 5) that occurred in the displayed discussion for each combination of participants U.
  • a region D1 is a graphic representing the total amount of speech of each participant U.
  • the output unit 115 causes the display unit 21 to display an area D1 having a size corresponding to the total amount of speech.
  • the output unit 115 causes the display unit 21 to display a circle with a larger radius as the total amount of speech of each participant U is larger as the area D1.
  • the area D1 is not limited to a circle, but may be another figure such as a polygon.
  • An arrow D2 is a graphic representing the direction of transition and the number of transitions from one participant U to another participant U.
  • The output unit 115 causes the display unit 21 to display, from the area D1 corresponding to the transition source participant U toward the area D1 corresponding to the transition destination participant U, an arrow D2 with a thickness corresponding to the number of transitions.
  • the arrow D2 may be a straight arrow, a curved arrow, or a broken arrow like a dotted line.
  • the output unit 115 causes the display unit 21 to display the arrow D2 thicker as the number of transitions from the transition source participant U to the transition destination participant U increases.
  • the output unit 115 may not display the arrow D2 for the combination of the participants U whose number of transitions is equal to or less than a predetermined threshold.
  • Further, the output unit 115 may adjust the arrangement of the plurality of areas D1 based on the number of transitions between the participants U. In this case, the output unit 115 arranges two areas D1 corresponding to participants U with many transitions between them close together, and arranges two areas D1 corresponding to participants U with few transitions between them far apart. Alternatively, the output unit 115 may arrange the plurality of areas D1 based on the physical positions of the participants U. In this case, the output unit 115 arranges the plurality of areas D1 so as to match the positions of the participants U set in FIG. 4.
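A minimal sketch of the data behind the speech order screen D, under the assumption that total speech amounts set the circle sizes and that arrows at or below a threshold are suppressed as described above (names are hypothetical):

```python
# Hypothetical sketch for screen D: circle sizes proportional to each
# participant's total speech time, and arrows kept only for participant
# pairs whose transition count exceeds a threshold.
from collections import defaultdict

def speech_order_model(periods, counts, min_transitions=2):
    """periods: (participant, start_sec, end_sec); counts: {(src, dst): n}.
    Returns total speech per participant and the arrows to draw."""
    totals = defaultdict(float)
    for pid, s, e in periods:
        totals[pid] += e - s
    arrows = [(src, dst, n) for (src, dst), n in counts.items()
              if n > min_transitions]
    return dict(totals), arrows

periods = [("U1", 0, 30), ("U4", 30, 45), ("U1", 45, 70)]
counts = {("U1", "U4"): 8, ("U4", "U1"): 1}
print(speech_order_model(periods, counts))
# ({'U1': 55.0, 'U4': 15.0}, [('U1', 'U4', 8)])
```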
  • In this manner, the output unit 115 simultaneously indicates the amount of speech of each participant U and the number of transitions between participants. As a result, the analyst can grasp at a glance the flow of utterances among the participants U and which participants U talked more or less.
  • FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying the analysis report screen E.
  • The analysis report screen E includes a main utterance order E1, a group atmosphere E2, and a participant classification E3.
  • When displaying the analysis report screen E, the output unit 115 acquires the amount of speech of each participant U for each time in the discussion to be displayed, as indicated by the analysis result read out from the analysis result storage unit 133. Then, the output unit 115 calculates the total amount of speech of each participant U by summing the amounts of speech for each time from the start time to the end time of the displayed discussion. Further, the output unit 115 acquires, from the analysis result read out from the analysis result storage unit 133, the number of transitions (that is, the matrix B illustrated in FIG. 5) that occurred in the displayed discussion for each combination of participants U.
  • The main utterance order E1 is information indicating speaker transitions that occurred frequently in the discussion.
  • Specifically, the output unit 115 tallies the number of occurrences of each series of transitions that starts from one participant U, passes through one or more other participants U, and then returns to the first participant U.
  • For example, such a series of transitions includes transitioning from participant U1 to participant U4, then from participant U4 to participant U3, and then from participant U3 back to the first participant U1.
  • The output unit 115 determines the combination of participants U indicated by the series of transitions with the largest number of occurrences as the main utterance order E1, and causes the analysis report screen E to display it.
  • The output unit 115 may determine two or more main utterance orders E1 in descending order of the number of occurrences. This allows the analyst to identify the participants U who were at the center of the discussion.
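The patent does not give an algorithm for finding these returning series; one plausible sketch counts fixed-length speaker cycles in the chronological sequence of speech periods (all names and the cycle length are assumptions):

```python
# Hypothetical sketch for the main utterance order E1: count consecutive
# speaker sequences of a fixed length that return to the first participant,
# and report the most frequent one.
from collections import Counter

def main_utterance_order(speakers, cycle_len=3):
    """speakers: chronological list of speaker IDs, one per speech period.
    Counts windows like (U1, U4, U3) that are followed by a return to U1."""
    cycles = Counter()
    for i in range(len(speakers) - cycle_len):
        window = tuple(speakers[i:i + cycle_len])
        returns_to_start = speakers[i + cycle_len] == window[0]
        if returns_to_start and len(set(window)) == cycle_len:
            cycles[window] += 1
    return cycles.most_common(1)

speakers = ["U1", "U4", "U3", "U1", "U4", "U3", "U1", "U2"]
print(main_utterance_order(speakers))  # [(('U1', 'U4', 'U3'), 2)]
```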
  • The group atmosphere E2 is information indicating whether the number of speaker changes in the discussion was large or small.
  • Specifically, in the matrix B illustrated in FIG. 5, the output unit 115 calculates the average number of transitions of the diagonal components (that is, between the same participant U) and the average number of transitions of the non-diagonal components (that is, between different participants U). Then, the output unit 115 causes the analysis report screen E to display the ratio of the average value of the diagonal components to the average value of the non-diagonal components as the group atmosphere E2. In the example of FIG. 8, the output unit 115 displays an arrow at the position corresponding to this ratio on a scale extending in the left-right direction. Further, the output unit 115 may display values indicating the average of the diagonal components and the average of the non-diagonal components. This enables the analyst to grasp the atmosphere of the whole group that held the discussion.
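A minimal sketch of this diagonal versus non-diagonal comparison over the tallied transition counts (hypothetical names; pairs with no recorded transitions are treated as zeros):

```python
# Hypothetical sketch for the group atmosphere E2: compare the average of
# the diagonal entries of the transition matrix (same speaker continuing)
# with the average of the off-diagonal entries (speaker changes).
def group_atmosphere(counts, participants):
    """counts: {(src, dst): n} transition matrix entries (missing = 0).
    Returns the diagonal average, off-diagonal average, and their ratio."""
    diag, off = [], []
    for src in participants:
        for dst in participants:
            n = counts.get((src, dst), 0)
            (diag if src == dst else off).append(n)
    diag_avg = sum(diag) / len(diag)
    off_avg = sum(off) / len(off)
    ratio = off_avg / diag_avg if diag_avg else None
    return diag_avg, off_avg, ratio

counts = {("U1", "U1"): 2, ("U1", "U4"): 8, ("U4", "U3"): 5}
print(group_atmosphere(counts, ["U1", "U3", "U4"]))
# approximately (0.67, 2.17, 3.25): speaker changes dominate continuations
```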
  • The participant classification E3 is information for classifying each participant U based on the amount of speech and the transitions of each participant U in the discussion.
  • Specifically, the output unit 115 classifies each participant U along two axes: an axis indicating the participant U's amount of speech, and an axis indicating whether the participant U was at the center of the discussion.
  • On the axis indicating the amount of speech, the output unit 115 places participants U whose amount of speech is equal to or greater than a predetermined threshold on the positive side of the origin (rightward in FIG. 8), and places participants U whose amount of speech is less than the threshold on the negative side of the origin (leftward in FIG. 8).
  • On the axis indicating whether a participant U was at the center of the discussion, the output unit 115 places participants U included in the main utterance order E1 on the positive side of the origin (upward in FIG. 8), and places participants U not included in the main utterance order E1 on the negative side of the origin (downward in FIG. 8).
  • Then, the output unit 115 displays predetermined labels for the four areas (quadrants) divided by the two axes.
  • the labels of the respective areas are preset in the voice analysis device 100.
  • In the example of FIG. 8, the output unit 115 displays "leader type" for the upper right area (participants U who speak a large amount and are at the center of the discussion), "participant type" for the upper left area (participants U who speak a small amount but are at the center of the discussion), "one-more-step type" for the lower right area (participants U who speak a large amount but are not at the center of the discussion), and "non-participatory type" for the lower left area (participants U who speak a small amount and are not at the center of the discussion).
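A minimal sketch of this two-axis classification (hypothetical names and threshold; the quadrant labels follow the embodiment described above):

```python
# Hypothetical sketch for the participant classification E3: place each
# participant in a quadrant by speech amount (threshold) and by whether
# they appear in the main utterance order (center of the discussion).
def classify_participants(totals, central_ids, amount_threshold):
    labels = {
        (True, True): "leader type",
        (False, True): "participant type",
        # lower right: talkative but not central (label rendered
        # approximately from the embodiment's wording)
        (True, False): "one-more-step type",
        (False, False): "non-participatory type",
    }
    return {pid: labels[(amount >= amount_threshold, pid in central_ids)]
            for pid, amount in totals.items()}

totals = {"U1": 55.0, "U2": 5.0, "U3": 20.0, "U4": 15.0}
print(classify_participants(totals, central_ids={"U1", "U3", "U4"},
                            amount_threshold=18.0))
# {'U1': 'leader type', 'U2': 'non-participatory type',
#  'U3': 'leader type', 'U4': 'participant type'}
```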
  • the analyst can grasp the state of each participant U in the entire discussion.
  • The output unit 115 may determine the compatibility of the participants U based on the speaker transitions, and may display it on the analysis report screen E.
  • the output unit 115 sums up the numbers of transitions for all combinations of two participants U.
  • The output unit 115 determines that a combination of participants U whose number of transitions is equal to or greater than a predetermined threshold has good compatibility, and that a combination of participants U whose number of transitions is less than the predetermined threshold has bad compatibility.
  • The output unit 115 causes the analysis report screen E to display the compatibility determined for each combination of participants U. Thereby, the analyst can grasp, for each combination of participants U, whether there were many or few transitions.
  • The output unit 115 switches among the speaker transition screen C, the speech order screen D, and the analysis report screen E in response to an operation by the analyst, and causes the display unit 21 to display the selected screen.
  • the output unit 115 may cause the display unit 21 to display only a part of the speaker transition screen C, the speech order screen D, and the analysis report screen E.
  • the output unit 115 is not limited to the display on the display unit, and may output the analysis result by other methods such as printing by a printer, data recording to a storage device, and the like.
  • FIG. 9 is a sequence diagram of the speech analysis method performed by the speech analysis system S according to the present embodiment.
  • the communication terminal 20 receives the setting of analysis conditions from the analyst, and transmits the setting as setting information to the voice analysis device 100 (S11).
  • the setting unit 111 of the voice analysis device 100 acquires setting information from the communication terminal 20 and causes the setting information storage unit 131 to store the setting information.
  • the voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing voice acquisition to the sound collection device 10 (S12).
  • When the sound collection device 10 receives the signal instructing acquisition of voice from the voice analysis device 100, it starts recording voice using the plurality of sound collection units and transmits the recorded voices of the plurality of channels to the voice analysis device 100.
  • the voice acquisition unit 112 of the voice analysis device 100 receives voice from the sound collection device 10 and stores the voice in the voice storage unit 132.
  • The voice analysis device 100 starts voice analysis at one of the following timings: when the analyst gives an instruction, when the voice acquisition ends, or during the voice acquisition (that is, as real-time processing).
  • the sound source localization unit 113 performs sound source localization based on the speech acquired by the speech acquisition unit 112 (S14).
  • Next, the analysis unit 114 determines, based on the voice acquired by the voice acquisition unit 112 and the direction of the sound source estimated by the sound source localization unit 113, which participant spoke at each time, and specifies the speech period and the amount of speech of each participant (S15).
  • the analysis unit 114 causes the analysis result storage unit 133 to store the utterance period and the utterance amount for each participant.
  • the analysis unit 114 detects the transition of the speaker (S16).
  • the analysis unit 114 counts the time of occurrence of transition, the participant at the transition source, and the participant at the transition destination, associates them with one another, and stores them in the analysis result storage unit 133.
  • the output unit 115 performs control to display the analysis result on the display unit 21 of the communication terminal 20 (S17). Specifically, the output unit 115 transmits, to the communication terminal 20, display information for displaying the speaker transition screen C, the speech order screen D, and the analysis report screen E described above.
  • the communication terminal 20 causes the display unit 21 to display the analysis result in accordance with the display information received from the voice analysis device 100 (S18).
  • As described above, the voice analysis device 100 according to the present embodiment automatically analyzes the discussion of a plurality of participants based on the voice acquired using the sound collection device 10 having a plurality of sound collection units. Therefore, it is not necessary for a recorder to constantly monitor the discussion as in the Harkness method described in Non-Patent Document 1, nor to arrange a recorder for each group, so the cost is low.
  • Also, the method described in Non-Patent Document 1 represents the transitions of utterances only over the entire period from the start to the end of the discussion. Therefore, the analyst could not grasp changes in the transition tendency along the time series of the discussion.
  • In contrast, the voice analysis device 100 displays the time change of the transitions as information indicating the timing of the transitions of utterances between the participants in the discussion. Thereby, the analyst can grasp the role of each participant U and the relationships between the participants U along the time series of the discussion.
  • the voice analysis device 100 simultaneously displays the amount of speech of the participant U and the number of transitions between the participants based on the acquired voice. As a result, the analyst can grasp at a glance the flow of utterances among the participants U and which participant U talks more or less.
  • the voice analysis device 100 displays the order of main utterances in the discussion, the atmosphere of the group, and the classification of the participants based on the acquired voice. This enables the analyst to understand the participants who were at the center of the discussion, the atmosphere of the whole group who discussed, and the state of each participant in the whole discussion.
  • The processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the subjects of the steps (processes) included in the voice analysis method shown in FIG. 9. That is, the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 read a program for executing the voice analysis method shown in FIG. 9 from the storage unit, and execute the program to control the respective parts of the voice analysis device 100, the sound collection device 10, and the communication terminal 20, thereby performing the voice analysis method shown in FIG. 9.
  • the steps included in the speech analysis method shown in FIG. 9 may be partially omitted, the order between the steps may be changed, and a plurality of steps may be performed in parallel.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The purpose of the present invention is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system, which enable analysis of a discussion at low cost. A voice analysis device 100 according to one embodiment of the present invention includes: a voice acquisition unit 112 that acquires voices uttered by a plurality of participants; an analysis unit 114 that detects, in the voices, transition from a speech spoken by a first participant among the plurality of participants to a speech spoken by a second participant among the plurality of participants; and an output unit 115 that displays, on a display unit, information indicating the occurrence timing of the transition.

Description

Speech analysis device, speech analysis method, speech analysis program, and speech analysis system
The present invention relates to a voice analysis device for analyzing voice, a voice analysis method, a voice analysis program, and a voice analysis system.
The Harkness method is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1). In the Harkness method, the transitions of each participant's utterances are recorded with lines. In this way, it is possible to analyze each participant's contribution to the discussion and their relationships with others. The Harkness method can also be applied effectively to active learning, in which students take the initiative in their own learning.
However, in the Harkness method, the burden on the recorder is large because the recorder must constantly record the discussion. Also, in order to analyze a plurality of groups, a recorder must be arranged for each group. Therefore, there is a problem that implementing the Harkness method is costly.
The present invention has been made in view of these points, and it is an object of the present invention to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system that can analyze discussions at low cost.
A voice analysis device according to a first aspect of the present invention comprises: an acquisition unit that acquires voices uttered by a plurality of participants; an analysis unit that detects, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and an output unit that causes a display unit to display information indicating the timing at which the transition occurred.
The output unit may display the information indicating the timing on the display unit as a line connecting a position corresponding to the first participant and a position corresponding to the second participant.
The output unit may display, as the information indicating the timing, the time change of the transition by generating the line on the display unit at the time when the transition occurs and erasing the line after a predetermined time has elapsed from that time.
The output unit may change the display mode of the line according to the combination of the first participant and the second participant.
The output unit may change the display mode of the line according to the number of times the transition has occurred.
The analysis unit may identify, based on the voices, a period in which each of the plurality of participants is speaking, and may detect the transition when the period in which the first participant is speaking switches to the period in which the second participant is speaking.
The output unit may cause the display unit to display the amount of speech of each of the plurality of participants in addition to the time change of the transition.
In a speech analysis method according to a second aspect of the present invention, a processor executes: a step of acquiring voices uttered by a plurality of participants; a step of detecting, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and a step of displaying, on a display unit, information indicating the timing at which the transition occurred.
A voice analysis program according to a third aspect of the present invention causes a computer to execute: a step of acquiring voices uttered by a plurality of participants; a step of detecting, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and a step of displaying, on a display unit, information indicating the timing at which the transition occurred.
A voice analysis system according to a fourth aspect of the present invention includes a voice analysis device and a communication terminal capable of communicating with the voice analysis device. The communication terminal has a display unit for displaying information. The voice analysis device has an acquisition unit that acquires voices uttered by a plurality of participants, an analysis unit that detects, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants, and an output unit that causes the display unit to display information indicating the timing at which the transition occurred.
According to the present invention, it is possible to analyze a discussion at low cost.
FIG. 1 is a schematic view of the speech analysis system according to the present embodiment.
FIG. 2 is a block diagram of the speech analysis system according to the present embodiment.
FIG. 3 is a schematic view of the speech analysis method performed by the speech analysis system according to the present embodiment.
FIG. 4 is a front view of the display unit of the communication terminal displaying the setting screen.
FIG. 5 is a schematic view of the matrix indicating the speaker transitions tallied by the analysis unit.
FIG. 6 is a front view of the display unit of the communication terminal displaying the speaker transition screen.
FIG. 7 is a front view of the display unit of the communication terminal displaying the speech order screen.
FIG. 8 is a front view of the display unit of the communication terminal displaying the analysis report screen.
FIG. 9 is a sequence diagram of the speech analysis method performed by the speech analysis system according to the present embodiment.
[Overview of the speech analysis system S]
FIG. 1 is a schematic view of the speech analysis system S according to the present embodiment. The speech analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20. The numbers of sound collection devices 10 and communication terminals 20 included in the speech analysis system S are not limited. The speech analysis system S may include other devices such as servers and terminals.
The voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least some of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without the network N.
 The sound collection device 10 includes a microphone array containing a plurality of sound collection units (microphones) oriented in different directions. For example, the microphone array includes eight microphones arranged at equal intervals on the same circumference in a plane horizontal to the ground. The sound collection device 10 transmits the voice acquired with the microphone array to the voice analysis device 100 as data.
 The communication terminal 20 is a communication device capable of wired or wireless communication, for example a mobile terminal such as a smartphone or a computer terminal such as a personal computer. The communication terminal 20 receives analysis-condition settings from the analyst and displays the analysis results produced by the voice analysis device 100.
 The voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 using the voice analysis method described below, and transmits the results of the voice analysis to the communication terminal 20.
[Configuration of speech analysis system S]
FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. The arrows in FIG. 2 indicate the main data flows; there may be data flows not shown in FIG. 2. In FIG. 2, each block represents a functional unit rather than a hardware (device) unit. Accordingly, the blocks shown in FIG. 2 may be implemented within a single device or may be implemented separately across multiple devices. Data may be exchanged between the blocks via any means, such as a data bus, a network, or a portable storage medium.
 The communication terminal 20 has a display unit 21 for displaying various information and an operation unit 22 for receiving operations by the analyst. The display unit 21 includes a display device such as a liquid crystal display or an organic electroluminescence (OLED: Organic Light Emitting Diode) display. The operation unit 22 includes operation members such as buttons, switches, and dials. The display unit 21 and the operation unit 22 may be integrated by using, as the display unit 21, a touch screen capable of detecting the position touched by the analyst.
 The voice analysis device 100 has a control unit 110, a communication unit 120, and a storage unit 130. The control unit 110 has a setting unit 111, a voice acquisition unit 112, a sound source localization unit 113, an analysis unit 114, and an output unit 115. The storage unit 130 has a setting information storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
 The communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N. The communication unit 120 includes a processor, connectors, electric circuits, and the like for performing communication. The communication unit 120 performs predetermined processing on communication signals received from outside to extract data and inputs the extracted data to the control unit 110. The communication unit 120 also performs predetermined processing on data input from the control unit 110 to generate communication signals, and transmits the generated communication signals to the outside.
 The storage unit 130 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, and the like. The storage unit 130 stores in advance the programs to be executed by the control unit 110. The storage unit 130 may be provided outside the voice analysis device 100, in which case it may exchange data with the control unit 110 via the communication unit 120.
 The setting information storage unit 131 stores setting information indicating the analysis conditions set by the analyst on the communication terminal 20. The voice storage unit 132 stores the voice acquired by the sound collection device 10. The analysis result storage unit 133 stores analysis results indicating the results of analyzing the voice. The setting information storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may each be a storage area on the storage unit 130 or a database configured on the storage unit 130.
 The control unit 110 is a processor such as a CPU (Central Processing Unit), and functions as the setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the analysis unit 114, and the output unit 115 by executing the programs stored in the storage unit 130. The functions of these units are described later with reference to FIGS. 3 to 8. At least some of the functions of the control unit 110 may be implemented by electric circuits, and at least some of them may be performed by programs executed via a network.
 The speech analysis system S according to the present embodiment is not limited to the specific configuration shown in FIG. 2. For example, the voice analysis device 100 is not limited to a single device, and may be configured by connecting two or more physically separate devices by wire or wirelessly.
[Description of voice analysis method]
FIG. 3 is a schematic diagram of the voice analysis method performed by the speech analysis system S according to the present embodiment. First, the analyst sets the analysis conditions by operating the operation unit 22 of the communication terminal 20. The analysis conditions are, for example, information indicating the number of participants in the discussion to be analyzed and the direction in which each participant (that is, each of the plurality of participants) is located relative to the sound collection device 10. The communication terminal 20 receives the analysis-condition settings from the analyst and transmits them to the voice analysis device 100 as setting information (a). The setting unit 111 of the voice analysis device 100 acquires the setting information from the communication terminal 20 and stores it in the setting information storage unit 131.
 FIG. 4 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A. The communication terminal 20 displays the setting screen A on the display unit 21 and receives the analysis-condition settings from the analyst. The setting screen A includes a position setting area A1, a start button A2, and an end button A3. The position setting area A1 is an area for setting the direction in which each participant U is actually located relative to the sound collection device 10 in the discussion to be analyzed. For example, the position setting area A1 represents a circle centered on the position of the sound collection device 10, as shown in FIG. 4, with angles relative to the sound collection device 10 marked along the circle.
 The analyst sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20. Near the position set for each participant U, identification information identifying that participant (here, U1 to U4) is assigned and displayed. In the example of FIG. 4, four participants U1 to U4 are set. The portion of the position setting area A1 corresponding to each participant U is displayed in a different color for each participant, which allows the analyst to easily recognize the direction set for each participant U.
 The start button A2 and the end button A3 are virtual buttons displayed on the display unit 21. When the analyst presses the start button A2, the communication terminal 20 transmits a start instruction signal to the voice analysis device 100. When the analyst presses the end button A3, the communication terminal 20 transmits an end instruction signal to the voice analysis device 100. In the present embodiment, the span from the analyst's start instruction to the end instruction is treated as one discussion.
 When the voice acquisition unit 112 of the voice analysis device 100 receives the start instruction signal from the communication terminal 20, it transmits a signal instructing the sound collection device 10 to acquire voice (b), and the sound collection device 10 starts acquiring voice upon receiving that signal. Likewise, when the voice acquisition unit 112 of the voice analysis device 100 receives the end instruction signal from the communication terminal 20, it transmits a signal instructing the sound collection device 10 to end the voice acquisition, and the sound collection device 10 ends the acquisition upon receiving that signal.
 The sound collection device 10 acquires voice with each of the plurality of sound collection units and records it internally as the voice of the channel corresponding to each sound collection unit. The sound collection device 10 then transmits the acquired multi-channel voice to the voice analysis device 100 (c). The sound collection device 10 may transmit the acquired voice sequentially, may transmit a predetermined amount or duration of voice at a time, or may transmit the voice from the start to the end of acquisition all at once. The voice acquisition unit 112 of the voice analysis device 100 receives the voice from the sound collection device 10 and stores it in the voice storage unit 132.
 The voice analysis device 100 analyzes the voice acquired from the sound collection device 10 at a predetermined timing. The voice analysis device 100 may analyze the voice when the analyst issues an analysis instruction through a predetermined operation on the communication terminal 20. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from among the voices stored in the voice storage unit 132.
 Alternatively, the voice analysis device 100 may analyze the voice when voice acquisition ends; in this case, the voice from the start to the end of acquisition corresponds to the discussion to be analyzed. The voice analysis device 100 may also analyze the voice sequentially during acquisition (that is, in real-time processing); in this case, the voice of a predetermined past period (for example, 30 seconds) counted back from the current time corresponds to the discussion to be analyzed.
 When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the multi-channel voice acquired by the voice acquisition unit 112 (d). Sound source localization is a process of estimating the direction of the sound source contained in the acquired voice for each time slice (for example, every 10 to 100 milliseconds). The sound source localization unit 113 associates the sound source direction estimated for each time slice with the participant directions indicated by the setting information stored in the setting information storage unit 131.
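 By way of a non-limiting illustration, the following Python sketch shows one way such an association could be made, assuming the setting information stores each participant's direction as an azimuth angle. The participant table, the nearest-direction rule, and the 30-degree tolerance are assumptions made for the example and are not details specified by the embodiment.

```python
# Hypothetical participant configuration: identifier -> azimuth in degrees,
# as set on the setting screen A (the values here are illustrative only).
PARTICIPANT_DIRECTIONS = {"U1": 45.0, "U2": 135.0, "U3": 225.0, "U4": 315.0}

def associate_direction(estimated_azimuth_deg, max_error_deg=30.0):
    """Map an estimated sound-source azimuth to the nearest configured
    participant, or None if no participant direction is close enough."""
    best_id, best_err = None, max_error_deg
    for pid, az in PARTICIPANT_DIRECTIONS.items():
        # Circular angular distance, in the range [0, 180]
        err = abs((estimated_azimuth_deg - az + 180.0) % 360.0 - 180.0)
        if err < best_err:
            best_id, best_err = pid, err
    return best_id

print(associate_direction(50.0))   # -> 'U1'
print(associate_direction(180.0))  # -> None (too far from both U2 and U3)
```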
 The sound source localization unit 113 can use any known sound source localization method, such as the MUSIC (Multiple Signal Classification) method or a beamforming method, as long as the direction of the sound source can be identified from the voice acquired from the sound collection device 10.
 Next, the analysis unit 114 analyzes the voice based on the voice acquired by the voice acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113 (e). The analysis unit 114 may analyze a completed discussion in its entirety, or may analyze part of an ongoing discussion in the case of real-time processing.
 Specifically, the analysis unit 114 first determines, based on the voice acquired by the voice acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113, which participant spoke (uttered) in each time slice (for example, every 10 to 100 milliseconds) of the discussion being analyzed. The analysis unit 114 identifies a continuous period from when one participant starts speaking until that participant stops as an utterance period, and stores it in the analysis result storage unit 133. When multiple participants speak at the same time, the analysis unit 114 identifies an utterance period for each participant.
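 A minimal sketch of this segmentation step follows, assuming one speaker label (or None for silence) per time slice; the function name and the frame length are illustrative only, and handling simultaneous speakers would require one such pass per participant.

```python
def utterance_periods(frame_speakers, frame_ms=100):
    """Collapse per-frame speaker labels (None = silence) into
    (speaker, start_ms, end_ms) utterance periods.
    A sketch; the device's actual segmentation may differ."""
    periods, current, start = [], None, 0
    for i, spk in enumerate(frame_speakers + [None]):  # sentinel flushes the last run
        if spk != current:
            if current is not None:
                periods.append((current, start * frame_ms, i * frame_ms))
            current, start = spk, i
    return periods

frames = ["U1", "U1", None, "U2", "U2", "U2", "U1"]
print(utterance_periods(frames))
# [('U1', 0, 200), ('U2', 300, 600), ('U1', 600, 700)]
```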
 The analysis unit 114 also calculates each participant's amount of speech per unit time and stores it in the analysis result storage unit 133. Specifically, for a given time window (for example, 5 seconds), the analysis unit 114 calculates, as the amount of speech for that time, the length of time during which the participant spoke divided by the length of the time window. The analysis unit 114 then repeats this calculation for each participant while sliding the time window by a predetermined step (for example, 1 second) from the start time of the discussion to its end time (or to the current time in the case of real-time processing).
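 The sliding-window calculation described above can be sketched as follows; the 5-second window and 1-second step match the examples in the text, and the (speaker, start, end) period tuples follow the sketch above.

```python
def speech_amount(periods, speaker, t_start_ms, t_end_ms,
                  window_ms=5000, step_ms=1000):
    """Sliding-window speech ratio for one speaker: time spoken within
    each window divided by the window length."""
    amounts = []
    t = t_start_ms
    while t + window_ms <= t_end_ms:
        # Sum the overlap between this window and the speaker's periods
        spoken = sum(min(e, t + window_ms) - max(s, t)
                     for spk, s, e in periods
                     if spk == speaker and s < t + window_ms and e > t)
        amounts.append((t, spoken / window_ms))
        t += step_ms
    return amounts

periods = [("U1", 0, 3000), ("U2", 3000, 6000)]
print(speech_amount(periods, "U1", 0, 8000))
# [(0, 0.6), (1000, 0.4), (2000, 0.2), (3000, 0.0)]
```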
 When one utterance period is followed by another, the analysis unit 114 detects a speaker transition. Speaker transitions include the case where one participant (a first participant) finishes speaking and a different participant (a second participant) speaks next, and the case where a participant finishes speaking and the same participant speaks next. Two or more successive switches of utterance period may also be detected as a single transition; for example, a first participant finishing, a second participant speaking, and then a third participant speaking may be detected as one transition. The analysis unit 114 tallies the occurrence time of each transition detected in the discussion being analyzed, the transition-source participant, and the transition-destination participant, associates them with one another, and stores them in the analysis result storage unit 133.
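 A sketch of the transition detection, under the simplifying assumption that utterance periods do not overlap; each detected transition carries its occurrence time, source participant, and destination participant, as described above.

```python
def detect_transitions(periods):
    """Detect speaker transitions from consecutive utterance periods.
    Each transition is (occurrence_ms, source_speaker, destination_speaker);
    a self-transition (same speaker twice in a row) is also recorded,
    mirroring the two cases described in the text."""
    ordered = sorted(periods, key=lambda p: p[1])  # sort by start time
    transitions = []
    for (src, _, _), (dst, dst_start, _) in zip(ordered, ordered[1:]):
        transitions.append((dst_start, src, dst))
    return transitions

periods = [("U1", 0, 3000), ("U4", 3200, 5000), ("U4", 5500, 7000)]
print(detect_transitions(periods))
# [(3200, 'U1', 'U4'), (5500, 'U4', 'U4')]
```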
 FIG. 5 is a schematic diagram of the matrix B in which the analysis unit 114 tallies the speaker transitions. In FIG. 5, the matrix B is shown as a table of character strings for readability, but it may be represented in any other computer-readable format, such as binary data.
 The matrix B represents the number of transitions from each transition-source participant to each transition-destination participant in the discussion being analyzed. In the example of FIG. 5, the number of transitions from participant U1 back to the same participant U1 is two, and the number of transitions from participant U1 to another participant U4 is eight. The diagonal elements of the matrix B indicate that the speaker did not change, while the off-diagonal elements indicate that the speaker changed. The analysis unit 114 can therefore assess the atmosphere of the group by comparing the diagonal and off-diagonal elements of the matrix B.
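 Tallying the detected transitions into the matrix B of FIG. 5 could look like the following sketch; the dictionary representation is an illustrative choice, not a requirement of the embodiment.

```python
from collections import defaultdict

def transition_matrix(transitions, participants):
    """Tally transitions into a source -> destination count matrix,
    a dictionary-based sketch of the matrix B in FIG. 5."""
    counts = defaultdict(int)
    for _, src, dst in transitions:
        counts[(src, dst)] += 1
    # Materialize all cells, including zeros, for every participant pair
    return {(s, d): counts[(s, d)] for s in participants for d in participants}

B = transition_matrix([(3200, "U1", "U4"), (5500, "U4", "U4"),
                       (7000, "U4", "U1")], ["U1", "U4"])
print(B[("U1", "U4")], B[("U4", "U4")])  # -> 1 1
```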
[Description of display method]
The output unit 115 transmits display information to the communication terminal 20, thereby controlling the display unit 21 to display the analysis results produced by the analysis unit 114 (f). The methods by which the output unit 115 controls the display of the analysis results are described below with reference to FIGS. 6 to 8.
 When displaying analysis results, the output unit 115 of the voice analysis device 100 reads the analysis results produced by the analysis unit 114 for the discussion to be displayed from the analysis result storage unit 133. The output unit 115 may display the discussion whose analysis by the analysis unit 114 has just been completed, or may display a discussion specified by the analyst.
 First, the speaker transition screen C, which displays information indicating the timing of speaker transitions, is described. FIG. 6 is a front view of the display unit 21 of the communication terminal 20 displaying the speaker transition screen C. The speaker transition screen C includes a circle C1 indicating the arrangement of the participants U, lines C2 indicating speaker transitions, and bars C3 indicating the amount of speech of each participant U.
 When displaying the speaker transition screen C, the output unit 115 generates, based on the analysis results read from the analysis result storage unit 133, display information for showing the time evolution of speaker transitions as the information indicating the timing of the transitions. Specifically, when a transition of speech from one participant to another occurs, the output unit 115 generates display information for displaying, for a predetermined period (for example, 5 seconds) from the occurrence time of the transition, a line connecting the position of the transition-source participant and the position of the transition-destination participant.
 The circle C1 is a circular area schematically representing the arrangement of the participants U. The output unit 115 displays the identification information of each participant U (that is, U1 to U4) near the position on the circle C1 corresponding to the position set for that participant in FIG. 4.
 A line C2 connects the position of the transition-source participant U on the circle C1 and the position of the transition-destination participant U on the circle C1 when a speaker transition occurs. The line C2 is displayed with a predetermined color and a predetermined thickness. The line C2 may be a straight segment, a curved line, or a broken line such as a dotted line.
 The output unit 115 causes the display unit 21 to display the line C2 connecting the position of the transition-source participant U and the position of the transition-destination participant U for a predetermined period (here, 5 seconds) from the occurrence time of the transition, and then causes the display unit 21 to erase the line C2 after that period has elapsed. The output unit 115 repeats the generation and erasure of lines representing speaker transitions from the start time to the end time of the discussion being displayed, and can thereby display the time evolution of speaker transitions on the display unit 21. The output unit 115 may advance the displayed time automatically (that is, display it as a moving image) or advance it according to the user's operations.
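 The timed display of the lines C2 reduces to a visibility test per transition, as in this sketch; the 5-second display period follows the example above, and the (time, source, destination) tuples follow the earlier transition-detection sketch.

```python
def active_lines(transitions, display_time_ms, show_ms=5000):
    """Return the (source, destination) pairs whose lines C2 should be
    visible at the given display time: each line is shown for show_ms
    after its transition occurs, then erased."""
    return [(src, dst) for t, src, dst in transitions
            if t <= display_time_ms < t + show_ms]

transitions = [(3200, "U1", "U4"), (5500, "U4", "U4")]
print(active_lines(transitions, 6000))   # [('U1', 'U4'), ('U4', 'U4')]
print(active_lines(transitions, 12000))  # []
```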
 By displaying the time evolution of speaker transitions as the information indicating their timing in this way, the output unit 115 can show how the tendency of the transitions changes along the time series of the discussion. This allows the analyst to efficiently grasp the role of each participant U and the relationships among the participants U along the time series of the discussion.
 When displaying multiple lines C2 for the same combination of participants U, the output unit 115 may display them on the display unit 21 with the positions of their endpoints shifted by a predetermined amount. This prevents the multiple lines C2 from coinciding even when multiple transitions occur between the same participants U at close times.
 When multiple transitions occur for the same combination of participants U within a short time (for example, within 5 seconds), the output unit 115 may also change the display mode of the line C2, such as its thickness or color, based on the number of transitions that occurred. For example, the output unit 115 causes the display unit 21 to display the line C2 thicker as the number of transitions increases, or to display the line C2 in a different color according to the number of transitions. The output unit 115 can thus show the analyst, in an easily understandable way, that multiple transitions occurred between the same participants U at close times.
 The output unit 115 may also change the display mode of the line C2, such as its thickness or color, based on the cumulative number of transitions for the same combination of participants U from the start time of the discussion up to the displayed time. For example, the output unit 115 causes the display unit 21 to display the line C2 thicker as the cumulative number of transitions increases, or to display the line C2 in a different color according to that cumulative number. This allows the output unit 115 to show the analyst clearly whether the cumulative number of transitions is large or small for each combination of participants U.
 The output unit 115 may also change the display mode of the line C2, such as its thickness or color, depending on the combination of participants U. For example, the output unit 115 causes the display unit 21 to display the line C2 with a different thickness or color for each combination of participants U. This lets the analyst readily see which combination of participants U a given line C2 corresponds to.
 The bars C3 are bar-shaped areas representing the amount of speech of each participant U. The output unit 115 obtains, from the analysis results read from the analysis result storage unit 133, the per-time amount of speech of each participant U at the displayed time. The output unit 115 then displays, at the position on the circle C1 corresponding to each participant U, a bar C3 whose length or size corresponds to the obtained amount of speech. For example, the output unit 115 causes the display unit 21 to display the bar C3 so that it extends further from the circumference of the circle C1 toward its center as the participant U's amount of speech increases. In this way, the output unit 115 can show the analyst, in addition to the time evolution of speech transitions, each participant's amount of speech at the displayed time in an easily understandable way.
 The output unit 115 is not limited to the per-time amount of speech, and may display a bar C3 whose length or size corresponds to the cumulative amount of speech from the start time of the discussion to the displayed time. The output unit 115 may also vary the display mode of the bar C3, such as its color or pattern, for each participant U.
 The output unit 115 is also not limited to displaying the time evolution of transitions from one participant U to another, and may display the time evolution of the combinations of participants U in which transitions occurred. In this case, the output unit 115 displays identification information indicating the combinations of participants U (for example, "U1-U2", "U1-U3") on the circle C1.
 Then, for example, when a transition between participants U1 and U3 occurs within a predetermined time after a transition between participants U1 and U2, the output unit 115 causes the display unit 21 to display a line C2 connecting the position of "U1-U2" and the position of "U1-U3". The output unit 115 causes the display unit 21 to erase the line C2 a predetermined time after it is displayed. This allows the output unit 115 to show how the combinations of participants U in which transitions occur change along the time series of the discussion.
 Next, the speech order screen D, which displays the order of utterances in the discussion, is described. FIG. 7 is a front view of the display unit 21 of the communication terminal 20 displaying the speech order screen D. The speech order screen D includes areas D1 indicating the amount of speech of each participant U and arrows D2 indicating the number of transitions between speakers.
 When displaying the speech order screen D, the output unit 115 obtains, from the analysis results read from the analysis result storage unit 133, the per-time amount of speech of each participant U in the discussion to be displayed. The output unit 115 then calculates each participant U's total amount of speech by summing the per-time amounts of speech from the start time to the end time of the discussion. The output unit 115 also obtains, from the analysis results read from the analysis result storage unit 133, the number of transitions that occurred in the discussion for each combination of participants U (that is, the matrix B shown in FIG. 5).
 The area D1 is a figure representing the total amount of speech of each participant U. The output unit 115 displays on the display unit 21 an area D1 whose size corresponds to the total amount of speech. For example, the output unit 115 displays, as the area D1, a circle whose radius is larger the greater the participant U's total amount of speech. The area D1 is not limited to a circle and may be another figure such as a polygon.
 The arrow D2 is a figure representing the direction and the number of transitions from one participant U to another. The output unit 115 displays on the display unit an arrow D2 whose thickness corresponds to the number of transitions, pointing from the area D1 corresponding to the transition-source participant U toward the area D1 corresponding to the transition-destination participant U. The arrow D2 may be a straight arrow, a curved arrow, or a broken arrow such as a dotted one.
 For example, the output unit 115 causes the display unit 21 to display the arrow D2 thicker as the number of transitions from the transition-source participant U to the transition-destination participant U increases. The output unit 115 need not display the arrow D2 for combinations of participants U whose number of transitions is at or below a predetermined threshold.
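 One possible sizing rule for the speech order screen D is sketched below. The linear scaling constants and the threshold are assumptions for illustration only, since the embodiment requires merely that circle size grow with speech amount and arrow thickness with transition count.

```python
def speech_order_layout(total_speech, transition_counts,
                        base_radius=20.0, base_width=1.0, min_count=1):
    """Compute circle radii (areas D1) and arrow widths (arrows D2) for the
    speech order screen from total speech amounts and the matrix B counts."""
    radii = {u: base_radius * (1.0 + amount)      # radius grows with speech amount
             for u, amount in total_speech.items()}
    arrows = {pair: base_width * count            # width grows with transition count
              for pair, count in transition_counts.items()
              if count > min_count}               # omit rarely used arrows
    return radii, arrows

radii, arrows = speech_order_layout(
    {"U1": 0.5, "U4": 0.25},
    {("U1", "U4"): 8, ("U4", "U1"): 3, ("U2", "U1"): 1})
print(radii)   # {'U1': 30.0, 'U4': 25.0}
print(arrows)  # {('U1', 'U4'): 8.0, ('U4', 'U1'): 3.0}
```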
 The output unit 115 may adjust the arrangement of the areas D1 based on the number of transitions between participants U. In this case, the output unit 115 places the two areas D1 corresponding to participants U with many transitions between them close together, and places the two areas D1 corresponding to participants U with few transitions far apart. Alternatively, the output unit 115 may arrange the areas D1 based on the physical positions of the participants U; in this case, the output unit 115 arranges the areas D1 to match the positions of the participants U set in FIG. 4.
 In this way, the output unit 115 simultaneously represents each participant U's amount of speech and the number of transitions between participants. This allows the analyst to grasp at a glance which participants U spoke more or less and how speech flowed between the participants U.
 Next, the analysis report screen E, which displays an overview of the entire discussion, is described. FIG. 8 is a front view of the display unit 21 of the communication terminal 20 displaying the analysis report screen E. The analysis report screen E includes the order of main utterances E1, the group atmosphere E2, and the participant classification E3.
 When displaying the analysis report screen E, the output unit 115 obtains, from the analysis results read from the analysis result storage unit 133, the per-time amount of speech of each participant U in the discussion to be displayed. The output unit 115 then calculates each participant U's total amount of speech by summing the per-time amounts of speech from the start time to the end time of the discussion. The output unit 115 also obtains, from the analysis results read from the analysis result storage unit 133, the number of transitions that occurred in the discussion for each combination of participants U (that is, the matrix B shown in FIG. 5).
 The order of main utterances E1 is information indicating the speaker transitions that occurred most frequently in the discussion. The output unit 115 totals the number of transitions for each cyclic sequence of transitions that starts with one participant U, passes through one or more other participants U, and returns to the first participant U. For example, one such sequence consists of a transition from participant U1 to participant U4, then from participant U4 to participant U3, and then from participant U3 back to the first participant U1. The output unit 115 determines the combination of participants U indicated by the sequence with the largest total number of transitions as the order of main utterances E1 and displays it on the analysis report screen E. The output unit 115 may determine two or more orders of main utterances E1 in descending order of the number of transitions. This allows the analyst to identify the participants U who were at the center of the discussion.
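 A sketch of this tally, assuming a fixed cycle of three distinct speakers for simplicity; the embodiment allows cycles through one or more other participants, so a full implementation would repeat this over several cycle lengths.

```python
from collections import Counter

def main_utterance_orders(speaker_sequence, cycle_len=3, top_n=1):
    """Count cyclic runs of speakers that return to the first speaker
    (e.g. U1 -> U4 -> U3 -> U1) and return the most frequent ones."""
    counts = Counter()
    for i in range(len(speaker_sequence) - cycle_len):
        window = speaker_sequence[i:i + cycle_len + 1]
        # A cycle: ends where it starts, with distinct intermediate speakers
        if window[0] == window[-1] and len(set(window[:-1])) == cycle_len:
            counts[tuple(window)] += 1
    return counts.most_common(top_n)

seq = ["U1", "U4", "U3", "U1", "U4", "U3", "U1", "U2"]
print(main_utterance_orders(seq))
# [(('U1', 'U4', 'U3', 'U1'), 2)]
```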
 The group atmosphere E2 is information indicating whether speaker changes were frequent or infrequent in the discussion. Specifically, the output unit 115 calculates, for the matrix B shown in FIG. 5, the average of the numbers of transitions in the diagonal elements (that is, between the same participant U) and the average of the numbers of transitions in the off-diagonal elements (that is, between different participants U). The output unit 115 then displays the ratio of the diagonal average to the off-diagonal average on the analysis report screen E as the group atmosphere E2. In the example of FIG. 8, the output unit 115 displays an arrow at the position corresponding to this ratio on a horizontally extending scale. The output unit 115 may also display the values of the diagonal and off-diagonal averages. This allows the analyst to grasp the atmosphere of the entire group that held the discussion.
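 The atmosphere indicator reduces to a ratio of means over the matrix B, as in this sketch (reusing the dictionary form of B from the earlier sketch).

```python
def group_atmosphere(matrix_b, participants):
    """Ratio of the mean diagonal count (speaker kept the floor) to the
    mean off-diagonal count (speaker changed) of the transition matrix B.
    A value below 1 suggests speakers alternated often."""
    diag = [matrix_b[(u, u)] for u in participants]
    off = [matrix_b[(s, d)] for s in participants
           for d in participants if s != d]
    return (sum(diag) / len(diag)) / (sum(off) / len(off))

B = {("U1", "U1"): 2, ("U1", "U4"): 8, ("U4", "U1"): 6, ("U4", "U4"): 4}
print(group_atmosphere(B, ["U1", "U4"]))  # 3.0 / 7.0, roughly 0.43
```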
 The participant classification E3 is information that classifies each participant U based on their amount of speech and transitions in the discussion. The output unit 115 classifies each participant U with respect to two axes: an axis indicating the participant U's amount of speech and an axis indicating whether the participant U was at the center of the discussion.
 Specifically, on the axis indicating the participant U's amount of speech, the output unit 115 places participants U whose amount of speech is at or above a predetermined threshold above the origin (to the right in FIG. 8), and places participants U whose amount of speech is below the threshold below the origin (to the left in FIG. 8). On the axis indicating whether the participant U was at the center of the discussion, the output unit 115 places participants U included in the order of main utterances E1 above the origin (upward in FIG. 8), and places participants U not included in the order of main utterances E1 below the origin (downward in FIG. 8).
 The output unit 115 displays a predetermined label for each of the four regions (quadrants) delimited by the two axes. The labels of the regions are preset in the voice analysis device 100. In the example of FIG. 8, the output unit 115 displays "leader type" for the upper-right region (participants U who spoke a lot and were central to the discussion), "strategist type" for the upper-left region (participants U who spoke little but were central to the discussion), "solo-sumo type" for the lower-right region (participants U who spoke a lot but were not central to the discussion), and "non-participating type" for the lower-left region (participants U who spoke little and were not central to the discussion). Classifying each participant U in this way allows the analyst to grasp how each participant U behaved in the discussion as a whole.
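 A sketch of the two-axis classification; the speech-amount threshold value is an illustrative assumption, and the labels follow the example of FIG. 8.

```python
def classify_participant(total_speech, in_main_order, threshold=0.25):
    """Quadrant classification of a participant from total speech amount
    and membership in the main utterance order E1."""
    talkative = total_speech >= threshold  # speech-amount axis
    central = in_main_order                # discussion-centrality axis
    if talkative and central:
        return "leader type"
    if central:
        return "strategist type"
    if talkative:
        return "solo-sumo type"
    return "non-participating type"

print(classify_participant(0.4, True))   # leader type
print(classify_participant(0.1, False))  # non-participating type
```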
 Furthermore, the output unit 115 may determine the compatibility between participants U based on the speaker transitions and display it on the analysis report screen E. The output unit 115 totals the number of transitions for every combination of two participants U. The output unit 115 judges combinations of participants U whose number of transitions is at or above a predetermined threshold to have good compatibility, and combinations whose number of transitions is below the threshold to have poor compatibility. The output unit 115 then displays the compatibility determined for each combination of participants U on the analysis report screen E. This allows the analyst to see which combinations of participants U had many or few transitions.
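 The compatibility judgment can be sketched as a threshold test over pairwise transition totals; summing both directions and the threshold value are assumptions made for the example.

```python
from itertools import combinations

def compatibility(matrix_b, participants, threshold=5):
    """Judge pairwise compatibility from the total transitions in both
    directions between two participants."""
    result = {}
    for a, b in combinations(participants, 2):
        total = matrix_b.get((a, b), 0) + matrix_b.get((b, a), 0)
        result[(a, b)] = "good" if total >= threshold else "poor"
    return result

B = {("U1", "U4"): 8, ("U4", "U1"): 6, ("U1", "U2"): 1}
print(compatibility(B, ["U1", "U2", "U4"]))
# {('U1', 'U2'): 'poor', ('U1', 'U4'): 'good', ('U2', 'U4'): 'poor'}
```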
 The output unit 115 switches among the speaker transition screen C, the speech order screen D, and the analysis report screen E on the display unit 21 in response to operations by the analyst. The output unit 115 may display only some of the speaker transition screen C, the speech order screen D, and the analysis report screen E on the display unit 21. The output unit 115 is not limited to display on the display unit, and may output the analysis results by other methods, such as printing on a printer or recording the data in a storage device.
[Sequence of voice analysis method]
FIG. 9 is a sequence diagram of the voice analysis method performed by the speech analysis system S according to the present embodiment. First, the communication terminal 20 receives the analysis-condition settings from the analyst and transmits them to the voice analysis device 100 as setting information (S11). The setting unit 111 of the voice analysis device 100 acquires the setting information from the communication terminal 20 and stores it in the setting information storage unit 131.
 Next, the voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing the sound collection device 10 to acquire voice (S12). Upon receiving this signal from the voice analysis device 100, the sound collection device 10 starts recording voice with its plurality of sound collection units and transmits the recorded multi-channel voice to the voice analysis device 100 (S13). The voice acquisition unit 112 of the voice analysis device 100 receives the voice from the sound collection device 10 and stores it in the voice storage unit 132.
 The voice analysis device 100 starts analyzing the voice at one of the following timings: when instructed by the analyst, when voice acquisition ends, or while the voice is still being acquired (that is, in real-time processing). When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the voice acquired by the voice acquisition unit 112 (S14).
 Next, the analysis unit 114 determines which participant spoke at each time based on the voice acquired by the voice acquisition unit 112 and the sound source directions estimated by the sound source localization unit 113, thereby identifying the utterance periods and amounts of speech for each participant (S15). The analysis unit 114 stores the utterance periods and amounts of speech for each participant in the analysis result storage unit 133.
 When one utterance period is followed by another, the analysis unit 114 detects a speaker transition (S16). The analysis unit 114 tallies the occurrence time of each transition, the transition-source participant, and the transition-destination participant, associates them with one another, and stores them in the analysis result storage unit 133.
 The output unit 115 controls the display unit 21 of the communication terminal 20 to display the analysis results (S17). Specifically, the output unit 115 transmits to the communication terminal 20 the display information for displaying the speaker transition screen C, the speech order screen D, and the analysis report screen E described above.
 The communication terminal 20 causes the display unit 21 to display the analysis results in accordance with the display information received from the voice analysis device 100 (S18).
[Effects of the present embodiment]
The voice analysis device 100 according to the present embodiment automatically analyzes a discussion among multiple participants based on voice acquired with the sound collection device 10, which has a plurality of sound collection units. Unlike the Harkness method described in Non-Patent Document 1, no recorder needs to monitor the discussion and no recorder needs to be assigned to each group, so the cost is low.
 Moreover, the Harkness method described in Non-Patent Document 1 represents the transitions of utterances over the entire period from the start to the end of the discussion, so the analyst could not grasp how the tendency of the transitions changed along the time series of the discussion. In contrast, the voice analysis device 100 according to the present embodiment displays the time evolution of the transitions as information indicating the timing of utterance transitions between participants in the discussion. This allows the analyst to grasp the role of each participant U and the relationships among the participants U along the time series of the discussion.
 The voice analysis device 100 also simultaneously displays, based on the acquired voice, each participant U's amount of speech and the number of transitions between participants. This allows the analyst to grasp at a glance which participants U spoke more or less and how speech flowed between the participants U.
 The voice analysis device 100 also displays, based on the acquired voice, the order of main utterances in the discussion, the atmosphere of the group, and the classification of the participants. This allows the analyst to grasp which participants were at the center of the discussion, the atmosphere of the entire group that held the discussion, and how each participant behaved in the discussion as a whole.
 Although the present invention has been described above using an embodiment, the technical scope of the present invention is not limited to the scope described in the above embodiment, and various modifications and changes are possible within the scope of its gist. For example, specific forms of distributing and integrating the devices are not limited to the above embodiment, and all or part of them may be functionally or physically distributed or integrated in arbitrary units. New embodiments arising from arbitrary combinations of multiple embodiments are also included in the embodiments of the present invention, and the effects of such a new embodiment combine the effects of the original embodiments.
 The processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the agents of the steps (processes) included in the voice analysis method shown in FIG. 9. That is, these processors read the program for executing the voice analysis method shown in FIG. 9 from their storage units and execute it, controlling the respective parts of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 to carry out the voice analysis method shown in FIG. 9. Some of the steps included in the voice analysis method shown in FIG. 9 may be omitted, the order of the steps may be changed, and multiple steps may be performed in parallel.
[Reference Signs List]
S  speech analysis system
100  voice analysis device
110  control unit
112  voice acquisition unit
114  analysis unit
115  output unit
10  sound collection device
20  communication terminal
21  display unit

Claims (10)

  1.  A voice analysis device comprising:
     an acquisition unit that acquires voices uttered by a plurality of participants;
     an analysis unit that detects, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and
     an output unit that causes a display unit to display information indicating a timing at which the transition occurred.
  2.  The voice analysis device according to claim 1, wherein the output unit displays the information indicating the timing as a line on the display unit connecting a position corresponding to the first participant and a position corresponding to the second participant.
  3.  The voice analysis device according to claim 2, wherein the output unit displays, as the information indicating the timing, the time evolution of the transition by generating the line on the display unit at the time the transition occurred and erasing the line after a predetermined time has elapsed from the time the transition occurred.
  4.  The voice analysis device according to claim 3, wherein the output unit changes a display mode of the line according to the combination of the first participant and the second participant.
  5.  The voice analysis device according to claim 3 or 4, wherein the output unit changes a display mode of the line according to the number of times the transition has occurred.
  6.  The voice analysis device according to any one of claims 1 to 5, wherein the analysis unit identifies, based on the voices, a period during which each of the plurality of participants is speaking, and detects the transition when the period during which the first participant is speaking switches to the period during which the second participant is speaking.
  7.  The voice analysis device according to any one of claims 1 to 6, wherein the output unit causes the display unit to display, in addition to the temporal change of the transition, the amount of speech of each of the plurality of participants.
  8.  A voice analysis method in which a processor executes the steps of:
     acquiring voices uttered by a plurality of participants;
     detecting, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and
     causing a display unit to display information indicating a timing at which the transition occurred.
  9.  A voice analysis program for causing a computer to execute the steps of:
     acquiring voices uttered by a plurality of participants;
     detecting, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and
     causing a display unit to display information indicating a timing at which the transition occurred.
  10.  A voice analysis system comprising: a voice analysis device; and a communication terminal capable of communicating with the voice analysis device, wherein
     the communication terminal has a display unit that displays information, and
     the voice analysis device includes:
      an acquisition unit that acquires voices uttered by a plurality of participants;
      an analysis unit that detects, in the voices, a transition from an utterance of a first participant among the plurality of participants to an utterance of a second participant among the plurality of participants; and
      an output unit that causes the display unit to display information indicating a timing at which the transition occurred.
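
To make the timing display of claims 2 to 6 concrete, the Python sketch below models each detected transition as a line connecting the two participants' on-screen positions, generated at the transition time and erased after a predetermined period. This is a minimal sketch under stated assumptions, not the claimed implementation: the seating positions, the 5-second display period, and every identifier (Line, lines_to_draw, POSITIONS) are invented for illustration, and the per-pair counter merely shows one way a renderer could vary the line's display mode with the participant combination and the transition count (claims 4 and 5).

```python
# A minimal sketch of the line display of claims 2-6: each transition creates
# a line between two participants' positions at the time it occurs, and the
# line is erased once an assumed predetermined period has elapsed.
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

DISPLAY_PERIOD = 5.0  # assumed "predetermined time" in seconds

@dataclass
class Line:
    src: str           # first participant (speech handed over from)
    dst: str           # second participant (speech handed over to)
    created_at: float  # time the transition occurred
    count: int         # running count of this src->dst transition

def lines_to_draw(transitions: List[Tuple[str, str, float]],
                  now: float) -> List[Line]:
    """Return the lines visible at time `now`: one per transition whose
    display period has not yet elapsed. The per-pair count lets a renderer
    vary thickness or color by combination and frequency."""
    counts: Counter = Counter()
    visible: List[Line] = []
    for src, dst, t in sorted(transitions, key=lambda x: x[2]):
        counts[(src, dst)] += 1
        if t <= now < t + DISPLAY_PERIOD:
            visible.append(Line(src, dst, t, counts[(src, dst)]))
    return visible

# Assumed fixed on-screen positions for four participants; claim 2 draws the
# line between the positions of the first and second participants.
POSITIONS: Dict[str, Tuple[int, int]] = {
    "A": (0, 0), "B": (100, 0), "C": (100, 100), "D": (0, 100),
}

if __name__ == "__main__":
    transitions = [("A", "B", 3.5), ("B", "A", 7.4), ("A", "B", 9.0)]
    for line in lines_to_draw(transitions, now=9.5):
        print(POSITIONS[line.src], "->", POSITIONS[line.dst],
              f"pair={line.src}->{line.dst}", f"count={line.count}")
```

At now = 9.5, only the transitions at t = 7.4 and t = 9.0 are still within their display period, so the line from the earlier t = 3.5 handover has already been erased; the repeated A-to-B handover carries count = 2, which a renderer could express as a thicker or differently colored line.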
PCT/JP2018/000941 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system WO2019142230A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2018502278A JP6646134B2 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
PCT/JP2018/000941 WO2019142230A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/000941 WO2019142230A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Publications (1)

Publication Number Publication Date
WO2019142230A1 true WO2019142230A1 (en) 2019-07-25

Family

ID=67301369

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/000941 WO2019142230A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Country Status (2)

Country Link
JP (1) JP6646134B2 (en)
WO (1) WO2019142230A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004350134A (en) * 2003-05-23 2004-12-09 Nippon Telegr & Teleph Corp <Ntt> Meeting outline grasp support method in multi-point electronic conference system, server for multi-point electronic conference system, meeting outline grasp support program, and recording medium with the program recorded thereon
JP2013058221A (en) * 2012-10-18 2013-03-28 Hitachi Ltd Conference analysis system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023021972A (en) * 2019-10-28 2023-02-14 ハイラブル株式会社 Speech analysis device, speech analysis method, speech analysis program, and speech analysis system
JP7427274B2 (en) 2019-10-28 2024-02-05 ハイラブル株式会社 Speech analysis device, speech analysis method, speech analysis program and speech analysis system
JP7530070B2 (en) 2020-06-01 2024-08-07 ハイラブル株式会社 Audio conference device, audio conference system, and audio conference method
WO2023210052A1 (en) * 2022-04-27 2023-11-02 ハイラブル株式会社 Voice analysis device, voice analysis method, and voice analysis program
WO2023209898A1 (en) * 2022-04-27 2023-11-02 ハイラブル株式会社 Voice analysis device, voice analysis method, and voice analysis program

Also Published As

Publication number Publication date
JP6646134B2 (en) 2020-02-14
JPWO2019142230A1 (en) 2020-02-06

Similar Documents

Publication Publication Date Title
US10056094B2 (en) Method and apparatus for speech behavior visualization and gamification
WO2019142230A1 (en) Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
JP7453714B2 (en) Argument analysis device and method
CN111901627B (en) Video processing method and device, storage medium and electronic equipment
CN110600033A (en) Learning condition evaluation method and device, storage medium and electronic equipment
CN110473533A (en) Speech dialogue system, speech dialog method and program
WO2024099359A1 (en) Voice detection method and apparatus, electronic device and storage medium
JP6589042B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
JP7427274B2 (en) Speech analysis device, speech analysis method, speech analysis program and speech analysis system
JP6589040B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
WO2022079777A1 (en) Analysis device, analysis system, analysis method, and non-transitory computer-readable medium having program stored thereon
JP6733452B2 (en) Speech analysis program, speech analysis device, and speech analysis method
JP7452299B2 (en) Conversation support system, conversation support method and program
US20210012791A1 (en) Image representation of a conversation to self-supervised learning
JP6975755B2 (en) Voice analyzer, voice analysis method, voice analysis program and voice analysis system
JP6589041B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
JP7414319B2 (en) Speech analysis device, speech analysis method, speech analysis program and speech analysis system
JP6975756B2 (en) Voice analyzer, voice analysis method, voice analysis program and voice analysis system
JP7149019B2 (en) Speech analysis device, speech analysis method, speech analysis program and speech analysis system
WO2022079767A1 (en) Analysis device, system, method, and non-transitory computer-readable medium storing program
JP7449577B2 (en) Information processing device, information processing method, and program
WO2022079773A1 (en) Analysis device, system, method, and non-transitory computer-readable medium having program stored therein
JP2022144417A (en) Hearing support device, hearing support method and hearing support program
CN115440231A (en) Speaker recognition method, device, storage medium, client and server

Legal Events

ENP   Entry into the national phase. Ref document number: 2018502278; Country of ref document: JP; Kind code of ref document: A.
121   Ep: the epo has been informed by wipo that ep was designated in this application. Ref document number: 18900613; Country of ref document: EP; Kind code of ref document: A1.
NENP  Non-entry into the national phase. Ref country code: DE.
32PN  Ep: public notification in the ep bulletin as address of the addressee cannot be established. Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.10.2020).
122   Ep: pct application non-entry in european phase. Ref document number: 18900613; Country of ref document: EP; Kind code of ref document: A1.