WO2023079602A1 - Voice analysis device and voice analysis method - Google Patents

Voice analysis device and voice analysis method Download PDF

Info

Publication number
WO2023079602A1
Authority
WO
WIPO (PCT)
Prior art keywords
group
section
discussion
information
participants
Prior art date
Application number
PCT/JP2021/040443
Other languages
French (fr)
Japanese (ja)
Inventor
浩平 柳楽
武志 水本
Original Assignee
ハイラブル株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ハイラブル株式会社
Priority to PCT/JP2021/040443
Publication of WO2023079602A1

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present invention relates to a speech analysis device and a speech analysis method for analyzing speech uttered in a discussion.
  • Patent Literature 1 (Japanese Patent Application Laid-Open No. 2006-208482) discloses a system that acquires, using a microphone, voices uttered by participants during a conference, identifies the speaking participant on the basis of voiceprint data extracted from the voices, and displays the utterance status of each of a plurality of participants on a display.
  • discussions between two groups, such as a discussion between learners and a teacher or a discussion between two teams, are sometimes analyzed. Because the system of Patent Literature 1 displays each participant's utterances time by time, it is difficult for an analyst to grasp the utterance tendencies of two groups into which the participants are divided, and it was therefore difficult to analyze a discussion between the two groups.
  • the present invention was made in view of these points, and aims to make it easier to analyze the tendency of utterances in discussions between two groups.
  • the speech analysis device according to a first aspect of the present invention includes: an acquisition unit that acquires time-series information indicating, for each time, the utterance status of each of a first group and a second group in the voices uttered in a discussion by participants belonging to the first group and participants belonging to the second group; a generation unit that generates, on the basis of the time-series information, section information that associates each of a plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly uttered in that section, over all or part of the discussion; and an output unit that outputs the section information.
  • the section tendency may indicate which of the first group and the second group mainly uttered, or that the utterances of the first group and the second group are competing.
  • the generation unit may determine each of the plurality of sections so that it is equal to or longer than a predetermined time, and may determine the section tendency of a section by comparing the utterance statuses of the first group and the second group in that section.
  • the time-series information may be information indicating which of the first group and the second group has a larger amount of speech for each predetermined time frame during the period from the start point to the end point of the discussion.
  • the speech analysis device may further include a classification unit that classifies a plurality of participants into the first group and the second group.
  • the classification unit may change the participants belonging to each of the first group and the second group during a plurality of periods in the discussion.
  • the classification unit may generate a first parent group including the first group and the second group into which some of the plurality of participants are classified, and a second parent group including the first group and the second group into which some of the plurality of participants not belonging to the first parent group are classified, and the output unit may output the section information of the first parent group and the section information of the second parent group at the same time.
  • the classification unit may generate a first parent group including the first group and the second group into which some of the plurality of participants are classified, a second parent group including the first group and the second group into which some of the plurality of participants not belonging to the first parent group are classified, and a third parent group including a first group and a second group into which the participants belonging to the first parent group and the second parent group are classified, and the output unit may output the section information of at least one of the first parent group and the second parent group together with the section information of the third parent group.
  • the classification unit may change the participants belonging to the first group to which the specific participant belongs and the participants belonging to the second group based on the position of the specific participant.
  • the output unit may output the words included in the utterance of each of the plurality of sections, which are extracted by performing speech recognition processing on the speech, in association with the section.
  • the output unit may output the characteristics of the entire discussion based on the section trends of the plurality of sections forming the discussion.
  • the speech analysis device may further include a selection unit that selects reference section information to be compared with the section information, and the output unit may output a result of comparing the section information with the reference section information.
  • the output unit may output information corresponding to a difference between the section information and the reference section information as the comparison result during the discussion.
  • the output unit may output the section tendency of each of the plurality of sections indicated by the section information in association with the section tendency of each of the plurality of sections indicated by the reference section information.
  • a speech analysis method according to a second aspect of the present invention is executed by a processor and includes: a step of acquiring time-series information indicating, for each time, the utterance status of each of a first group and a second group in the voices uttered in a discussion by participants belonging to the first group and participants belonging to the second group; a step of generating, on the basis of the time-series information, section information that associates each of a plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly uttered in that section, over all or part of the discussion; and a step of outputting the section information.
  • FIG. 1 is a schematic diagram of a speech analysis system S according to an embodiment.
  • FIG. 2 is a block diagram of the speech analysis system S according to the embodiment.
  • FIG. 3 is a schematic diagram for explaining a method by which a selection unit selects reference section information.
  • FIG. 4 is a schematic diagram for explaining a method by which an acquisition unit acquires time-series information.
  • FIG. 5 is a schematic diagram for explaining a method by which a generation unit determines a section tendency.
  • FIG. 6 is a schematic diagram for explaining a method by which an output unit outputs section information in real time.
  • FIG. 7 is a schematic diagram for explaining a method by which the output unit outputs section information after the fact.
  • FIG. 8 is a diagram showing a flowchart of an exemplary speech analysis method executed by the speech analysis device according to the embodiment.
  • FIG. 9 is a diagram showing a flowchart of section information generation processing in the exemplary speech analysis method executed by the speech analysis device according to the embodiment.
  • FIG. 10 is a schematic diagram for explaining a method by which the output unit outputs section information in a modification.
  • FIG. 11 is a schematic diagram for explaining a method by which the generation unit generates section information in a modification.
  • FIG. 1 is a schematic diagram of a speech analysis system S according to this embodiment.
  • a speech analysis system S includes a speech analysis device 1, a sound collector 2, and an information terminal 3.
  • the number of sound collectors 2 and information terminals 3 included in the speech analysis system S is not limited.
  • the speech analysis system S may include other devices such as servers and terminals.
  • the speech analysis device 1 is a computer that analyzes the speech uttered in discussions in which multiple participants participate and provides the analysis results to the analyst.
  • the analyst may be some of the participants, or may be a different person from the participants.
  • the voice analysis device 1 analyzes the voice acquired by the sound collection device 2 and outputs the analysis result to the sound collection device 2 or the information terminal 3 .
  • the voice analysis device 1 is connected to the sound collector 2 and the information terminal 3 by wire or wirelessly via a network such as a local area network or the Internet.
  • the voice analysis device 1 analyzes the voices of discussions conducted by a plurality of participants divided into at least two groups. Discussions to be analyzed are, for example, classes, group discussions, debates, meetings, and the like. A plurality of participants are classified into either a first group or a second group. Participants belonging to the first group are instructors such as teachers and tutors, for example. Participants belonging to the second group are learners such as pupils and students, for example. Also, a plurality of learners may be classified into the first group and the second group. Multiple participants may be classified according to other criteria.
  • a plurality of participants may be classified into a plurality of parent groups, and a plurality of participants may be classified into the first group or the second group in each of the plurality of parent groups.
  • each parent group includes a first group and a second group.
  • one parent group corresponds to one table surrounded by a plurality of participants, and the sound collector 2 is arranged at the table.
  • a plurality of participants surrounding one table are classified into a first group and a second group.
  • the speech analysis device 1 may analyze the speech of discussions (for example, web conferences) held over a network.
  • a sound collector 2 is arranged in each space where a plurality of participants are present during the discussion, and the sound collector 2 is associated with one of the plurality of participants.
  • the sound collecting device 2 is a device that acquires the voice uttered in the discussion.
  • the sound collector 2 includes, for example, a microphone array including sound collectors such as a plurality of microphones arranged in different directions.
  • a microphone array includes, for example, a plurality of (e.g., eight) microphones arranged at equal intervals on the same circumference in a plane horizontal to the ground.
  • by using the microphone array, the speech analysis device 1 can identify which participant is the speaker (sound source) on the basis of the voices uttered by the plurality of participants surrounding the sound collector 2.
  • the sound collector 2 transmits the sound acquired using the microphone array to the sound analysis device 1 as sound data.
  • the sound collecting device 2 may include an audio output unit such as a speaker.
  • the information terminal 3 is a computer that outputs information, such as a smartphone, tablet terminal, or personal computer.
  • the information terminal 3 is used, for example, by at least some of the participants. Also, the information terminal 3 may be used by an analyst different from the plurality of participants.
  • the information terminal 3 has, for example, a display such as a liquid crystal display.
  • the information terminal 3 causes the display unit to display the information received from the speech analysis device 1 .
  • the information terminal 3 may function as the sound collector 2 by having a sound collector such as a microphone.
  • the information terminal 3 used by each of the plurality of participants transmits the sound acquired using the sound collecting unit to the sound analysis device 1 as sound data.
  • the speech analysis device 1 classifies a plurality of participants into a first group or a second group. For example, the speech analysis device 1 receives, on the information terminal 3, a setting as to which of the first group and the second group each of the plurality of participants belongs, or automatically classifies the plurality of participants into the first group or the second group based on their attributes.
  • the voice analysis device 1 acquires voices uttered by multiple participants in the discussion from the sound collection device 2 .
  • the speech analysis apparatus 1 acquires time-series information indicating the utterance status of each of the first group and the second group by specifying the utterance period of each of the plurality of participants in the acquired voice.
  • the time-series information is, for example, information indicating which of the first group and the second group has a larger amount of speech for each predetermined time frame.
  • based on the acquired time-series information, the speech analysis device 1 generates section information that associates each of the plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly uttered in that section.
  • the section tendency may indicate which of the first group and the second group mainly uttered, or that the utterances of the first group and the second group are competing.
  • the speech analysis device 1 outputs the generated segment information to at least one of the sound collector 2 and the information terminal 3 .
  • in this way, the speech analysis system S determines, based on the voices of the discussion, a section tendency indicating which of the two groups mainly uttered in each section of the discussion, associates each section with its section tendency, and notifies the analyst.
  • the speech analysis system S makes it easy for the analyst to grasp the utterance tendencies of the two groups, and facilitates analysis of the utterance tendencies in the discussion between the two groups.
  • FIG. 2 is a block diagram of the speech analysis system S according to this embodiment.
  • in FIG. 2, arrows indicate the main data flows, and there may be data flows other than those shown.
  • each block in FIG. 2 shows the configuration in units of functions, not in units of hardware (devices).
  • the blocks shown in FIG. 2 may be implemented within a single device, or may be implemented separately within multiple devices. Data exchange between blocks may be performed via any means such as a data bus, network, or portable storage medium.
  • the speech analysis device 1 has a storage unit 11 and a control unit 12.
  • the speech analysis device 1 may be configured by connecting two or more physically separated devices by wire or wirelessly. Also, the speech analysis device 1 may be configured by a cloud that is a collection of computer resources.
  • the storage unit 11 is a storage medium including ROM (Read Only Memory), RAM (Random Access Memory), hard disk drive, and the like.
  • the storage unit 11 stores programs executed by the control unit 12 in advance.
  • the storage unit 11 may be provided outside the speech analysis device 1, in which case data may be exchanged with the control unit 12 via a network.
  • the control unit 12 has a selection unit 121 , a classification unit 122 , an acquisition unit 123 , a generation unit 124 and an output unit 125 .
  • the control unit 12 is, for example, a processor such as a CPU (Central Processing Unit), and functions as the selection unit 121, the classification unit 122, the acquisition unit 123, the generation unit 124, and the output unit 125 by executing a program stored in the storage unit 11. At least part of the functions of the control unit 12 may be implemented by an electric circuit. Moreover, at least part of the functions of the control unit 12 may be realized by executing a program received via a network.
  • the selection unit 121 selects reference section information to be compared with section information.
  • the section information is information that associates each of a plurality of sections constituting a discussion with the section tendency of that section.
  • the interval tendency indicates which of the utterances of the first group and the second group is dominant, or that the utterances of the first group and the second group are competing with each other.
  • the section information is generated by the generation unit 124, which will be described later, based on the speech of the discussion to be analyzed.
  • the reference section information is section information that has been generated in advance and stored in the storage unit 11.
  • the selection unit 121 selects the reference section information based on the content specified by the analyst on the information terminal 3, for example.
  • An analyst is either one of a plurality of participants participating in a discussion to be analyzed, or a person different from the plurality of participants.
  • FIGS. 3(a) and 3(b) are schematic diagrams for explaining how the selection unit 121 selects reference section information.
  • the storage unit 11 stores in advance discussion information indicating at least one of the attributes of a past discussion (subject, type, time, topic, format, etc.), the date and time of the discussion, and the attributes of the participants of the discussion (teacher in charge, grade, etc.), in association with section information generated by the generation unit 124 by a method described later.
  • the selection unit 121 receives designation of search conditions for past discussions by the analyst.
  • the search condition is, for example, at least one of the attributes of the discussion, the date and time of the discussion, and the attributes of the participants.
  • the selection unit 121 extracts discussion information that matches the designated search condition from the storage unit 11, and displays it on the information terminal 3 together with the section information 31 associated with each of the extracted one or more pieces of discussion information.
  • the selection unit 121 receives, on the information terminal 3, designation of any piece of section information 31 by the analyst, and selects the designated section information 31 as the reference section information. Thereby, the speech analysis device 1 can compare the discussion to be analyzed with a past discussion that matches the search condition specified by the analyst.
  • the storage unit 11 pre-stores a template of section information.
  • a section information template is, for example, information indicating an order of a section in which the utterances of the first group are dominant, a section in which the utterances of the second group are dominant, and a section in which the utterances of the first group and the second group are competing.
  • the selection unit 121 causes the information terminal 3 to display the plurality of templates 32 stored in the storage unit 11 .
  • FIG. 3(b) shows an example in which a plurality of participants are classified into a T (teacher) group as the first group and an S (student) group as the second group, but the participants may be classified into two groups by other criteria.
  • the selection unit 121 accepts designation of one of the templates 32 by the analyst, and selects the order of the sections indicated by the designated template 32 as the reference section information.
  • the selection unit 121 may also receive, from the analyst, an input of an order of a section in which the utterances of the first group are dominant, a section in which the utterances of the second group are dominant, and a section in which the utterances of the first group and the second group are competing, and select the input order of sections as the reference section information. Thereby, the speech analysis device 1 can compare the discussion to be analyzed with the order of sections specified by the analyst.
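  • as a rough illustration only (not part of this publication), a template and a generated sequence of section tendencies could both be represented as ordered lists of labels and matched with a subsequence check; the labels "T", "S", and "=" and the matching rule below are assumptions:

```python
# Hypothetical sketch: a section-information template as an ordered list of
# labels ("T" = first group dominant, "S" = second group dominant,
# "=" = competing), checked against a generated section sequence.
from typing import List

def matches_template(sections: List[str], template: List[str]) -> bool:
    """True if the template's labels appear in the sections in order."""
    it = iter(sections)
    return all(label in it for label in template)  # subsequence test

generated = ["T", "=", "S", "T", "S"]  # section tendencies of a discussion
template = ["T", "S", "S"]             # e.g. lecture, then two exercise blocks
print(matches_template(generated, template))  # True
```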
  • the classification unit 122 classifies a plurality of participants participating in the analysis target discussion into the first group or the second group. For example, in the information terminal 3, the classification unit 122 may receive a setting by the analyst as to which of the first group and the second group each of the plurality of participants belongs to. In this case, the classification unit 122 causes the storage unit 11 to store information indicating to which of the first group and the second group each of the plurality of participants belongs, according to the content set in the information terminal 3 .
  • the classification unit 122 may automatically classify a plurality of participants into the first group or the second group, for example, based on the attributes of each of the participants.
  • the storage unit 11 stores in advance information indicating attributes of each of the plurality of participants. Attributes used for classification are, for example, the roles of participants (instructor, learner, etc.).
  • the classifying unit 122 classifies the participant into the first group if the participant's attribute satisfies a predetermined condition, and classifies the participant into the second group otherwise.
  • the classification unit 122 causes the storage unit 11 to store information indicating to which of the first group and the second group each of the plurality of participants belongs according to the classification result.
  • the acquisition unit 123 acquires, from the sound collection device 2, voices uttered by a plurality of participants in the discussion.
  • the acquisition unit 123 acquires a part of the speech of the discussion at predetermined time intervals during the discussion, or acquires the speech of the entire discussion after the discussion ends.
  • the acquisition unit 123 identifies the utterance period of each of the multiple participants based on the sound acquired from the sound collector 2 .
  • the acquisition unit 123 performs known sound source localization on multi-channel sounds received from the sound collector 2, for example.
  • the sound source localization is a process of estimating the direction of the sound source included in the sound acquired by the acquisition unit 123 for each time (for example, every 10 ms to 100 ms).
  • the acquisition unit 123 identifies the speaker by associating the direction of the sound source estimated for each time with the direction of each of the plurality of participants preset in the information terminal 3.
  • the acquisition unit 123 can use any sound source localization method, such as the MUSIC (Multiple Signal Classification) method or the beamforming method, as long as the direction of the sound source can be specified based on the acquired sound.
  • the acquisition unit 123 determines which participant was speaking at each predetermined time (for example, every 10 milliseconds to 100 milliseconds) in the discussion. The acquisition unit 123 identifies a continuous period from when one participant starts speaking to when that participant stops as an utterance period. When multiple participants speak at the same time, the utterance periods of the multiple participants may at least partially overlap.
  • in the case of a discussion held over a network, the acquisition unit 123 estimates, as the sound source, the participant associated with the sound collector 2 from which the acquired sound was transmitted, and identifies the utterance period of each of the plurality of participants based on the acquired sound and the estimated sound source.
  • the acquisition unit 123 is not limited to the specific method shown here, and may identify the utterance period of each of the multiple participants by other methods.
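  • purely as an illustration of the kind of processing involved (not the publication's implementation), per-frame speaker decisions can be collapsed into utterance periods as sketched below, under the simplifying assumption of one speaker decision per 100 ms frame; all names are hypothetical:

```python
# Illustrative sketch: derive utterance periods from per-frame speaker
# decisions (one speaker ID or None per 100 ms frame, e.g. from sound source
# localization). The publication notes that utterance periods may overlap
# when participants speak simultaneously; this sketch simplifies to one
# speaker per frame.
from typing import List, Optional, Tuple

FRAME_SEC = 0.1  # 100 ms decision interval

def utterance_periods(frame_speakers: List[Optional[str]]) -> List[Tuple[str, float, float]]:
    """Collapse consecutive frames with the same speaker into (speaker, start, end)."""
    periods: List[Tuple[str, float, float]] = []
    for i, speaker in enumerate(frame_speakers):
        t = i * FRAME_SEC
        if speaker is None:
            continue  # silence in this frame
        if periods and periods[-1][0] == speaker and abs(periods[-1][2] - t) < 1e-9:
            periods[-1] = (speaker, periods[-1][1], t + FRAME_SEC)  # extend period
        else:
            periods.append((speaker, t, t + FRAME_SEC))  # new period starts
    return periods

print(utterance_periods(["A", "A", None, "B", "B", "B"]))
# [('A', 0.0, 0.2), ('B', 0.3, 0.6)]
```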
  • the acquisition unit 123 acquires time-series information indicating the utterance status of each of the first group and the second group based on the utterance period of each of the specified participants.
  • FIG. 4 is a schematic diagram for explaining how the acquisition unit 123 acquires time-series information.
  • the acquisition unit 123 calculates the speech volume of each of the multiple participants based on the identified speech period. For example, for each predetermined time frame (5 seconds, 10 seconds, 30 seconds, etc.), the acquisition unit 123 calculates, as the amount of speech, a value corresponding to the length of the speech period of the participant within that time frame. Instead of or in addition to the length of the speech period, the acquisition unit 123 may calculate a value corresponding to the number of times of speech or the volume of speech as the amount of speech. The acquisition unit 123 calculates the amount of speech for each time frame in the period from the start point (start time) to the end point (end time) of the discussion for each of the plurality of participants.
  • the acquisition unit 123 classifies the speech volume of each of the multiple participants for each time frame into the first group or the second group according to the classification result of the multiple participants by the classifying unit 122 .
  • in the example of FIG. 4, a plurality of participants are classified into a T (teacher) group as the first group and an S (student) group as the second group.
  • the acquisition unit 123 calculates a statistical value (average value, median value, etc.) of the amount of speech of the participants belonging to each of the first group and the second group for each time frame.
  • the acquisition unit 123 determines, for each time frame, which of the first group and the second group has a larger statistical value of the amount of speech.
  • the acquisition unit 123 acquires, as the time-series information, information indicating which of the first group and the second group has the larger amount of speech in each predetermined time frame in the period from the start point to the end point of the discussion, according to the determination result. When a plurality of consecutive time frames in which the same group has the larger amount of speech continue, the acquisition unit 123 may integrate those time frames in the time-series information.
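  • a minimal sketch of this step, assuming per-group totals of speaking time per frame (the text above mentions statistical values such as averages or medians; a sum is used here for brevity) and hypothetical names:

```python
# Illustrative sketch, not the publication's implementation: compute, for
# each time frame, which group has the larger total speech amount.
from typing import Dict, List, Tuple

FRAME = 10.0  # time frame length in seconds (e.g. 5 s, 10 s, 30 s)

def dominant_groups(periods: List[Tuple[str, float, float]],
                    group_of: Dict[str, str],
                    duration: float) -> List[str]:
    n_frames = int(duration // FRAME)
    result = []
    for f in range(n_frames):
        f_start, f_end = f * FRAME, (f + 1) * FRAME
        amount = {"T": 0.0, "S": 0.0}
        for speaker, start, end in periods:
            # speaking time of this utterance period that falls in the frame
            overlap = max(0.0, min(end, f_end) - max(start, f_start))
            amount[group_of[speaker]] += overlap
        if amount["T"] > amount["S"]:
            result.append("T")
        elif amount["S"] > amount["T"]:
            result.append("S")
        else:
            result.append("=")  # equal speech amounts
    return result

groups = {"teacher": "T", "alice": "S", "bob": "S"}
periods = [("teacher", 0.0, 8.0), ("alice", 8.0, 14.0), ("bob", 14.0, 20.0)]
print(dominant_groups(periods, groups, 20.0))  # ['T', 'S']
```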
  • the generation unit 124 determines a plurality of sections that constitute the discussion to be analyzed, and determines, for each section, a section tendency indicating which of the first group and the second group mainly uttered. The generation unit 124 may determine sections and section tendencies for part of the discussion while the discussion is in progress, and may determine sections and section tendencies for the entire discussion after the discussion ends.
  • FIG. 5 is a schematic diagram for explaining how the generation unit 124 determines the section tendency. Based on the time-series information acquired by the acquisition unit 123, the generation unit 124 generates a transition graph showing the transition of which of the first group and the second group has the larger amount of speech.
  • FIG. 5 shows an example in which the speech analysis system S according to the present embodiment is applied to generate an ST analysis graph described in Non-Patent Document 1.
  • the horizontal axis represents time for the first group (T group), and the vertical axis represents time for the second group (S group).
  • the generation unit 124 takes the origin as the starting point (the start of the discussion), draws a line to the right along the horizontal axis for each period in which the first group has the larger amount of speech in the time-series information, and draws a line upward along the vertical axis for each period in which the second group has the larger amount of speech. The generation unit 124 repeats this from the start to the end of the time-series information, that is, from the start point to the end point of the discussion, thereby generating a transition graph showing the transition of which of the first group and the second group has the larger amount of speech.
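  • the transition graph can be pictured as a polyline built from the time-series labels; the following sketch is an illustration under assumed inputs, not the publication's code:

```python
# Illustrative sketch: build the ST-analysis-style transition graph as a
# polyline. Each "T" frame moves one step right (x axis: first-group time),
# each other frame moves one step up (y axis: second-group time; ties are
# folded into the upward step here for simplicity).
from typing import List, Tuple

def transition_path(timeseries: List[str]) -> List[Tuple[int, int]]:
    x, y = 0, 0
    path = [(x, y)]  # origin = start of the discussion
    for label in timeseries:
        if label == "T":
            x += 1  # first group dominated this frame
        else:
            y += 1  # second group dominated this frame
        path.append((x, y))
    return path

print(transition_path(["T", "T", "S", "T", "S", "S"]))
# [(0, 0), (1, 0), (2, 0), (2, 1), (3, 1), (3, 2), (3, 3)]
```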
  • the generation unit 124 uses the generated transition graph to divide the discussion to be analyzed into a plurality of sections, and determines, for each section, a section tendency indicating which of the first group and the second group mainly uttered. First, the generation unit 124 sets the origin (the starting point of the transition graph) as the starting point of a section. The generation unit 124 then extracts, in chronological order, one predetermined period (for example, 5 seconds) from the transition graph as a unit of attention.
  • the generation unit 124 determines whether the elapsed time from the start point of the section to the end point of the unit of attention is equal to or longer than a predetermined time.
  • the predetermined time is, for example, a value set in advance as the minimum duration of a section, such as 5 minutes or 10 minutes. The predetermined time may also be determined according to the reference section information selected by the selection unit 121. If the elapsed time from the start point of the section to the end point of the unit of attention is less than the predetermined time, the generation unit 124 extracts the next predetermined period as the unit of attention and repeats the determination of whether the elapsed time from the start point of the section to the end point of the unit of attention is equal to or longer than the predetermined time.
  • when the elapsed time is equal to or longer than the predetermined time, the generation unit 124 determines the section tendency in the section by comparing the utterance statuses of the first group and the second group. For example, to compare the utterance statuses, the generation unit 124 calculates the slope (broken line in FIG. 5) between the coordinates of the start point of the section and the coordinates of the end point of the unit of attention on the transition graph.
  • the generation unit 124 determines the section tendency based on the calculated slope. For example, when the slope is equal to or less than a first reference value, the generation unit 124 determines that the utterances of the first group are dominant. When the slope is greater than the first reference value and equal to or less than a second reference value, the generation unit 124 determines that the utterances of the first group and the second group are competing. When the slope is greater than the second reference value, the generation unit 124 determines that the utterances of the second group are dominant.
  • the first reference value and the second reference value are stored in advance in the storage unit 11 or set in the information terminal 3 .
  • the generation unit 124 sets the determination result as the section tendency of the section.
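  • a compact sketch of this slope-based determination; the reference values below are hypothetical stand-ins for the stored or user-set values described next:

```python
# Illustrative sketch: classify a section's tendency from the slope between
# the section start and the end of the current unit of attention on the
# transition graph. REF1 and REF2 are hypothetical reference values.
from typing import Tuple

REF1 = 0.5  # slope <= REF1        -> first group (T) dominant
REF2 = 2.0  # REF1 < slope <= REF2 -> competing
            # slope > REF2         -> second group (S) dominant

def section_tendency(start: Tuple[int, int], end: Tuple[int, int]) -> str:
    dx, dy = end[0] - start[0], end[1] - start[1]
    if dx == 0:
        return "S"  # purely vertical segment: only the second group spoke
    slope = dy / dx
    if slope <= REF1:
        return "T"
    if slope <= REF2:
        return "="
    return "S"

print(section_tendency((0, 0), (3, 1)))  # 'T' (slope 0.33)
print(section_tendency((0, 0), (2, 3)))  # '=' (slope 1.5)
```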
  • when the section tendency of the preceding section and the section tendency of the current section are the same, the generation unit 124 combines the unit of attention with the preceding section, extracts the next predetermined period as the unit of attention, and repeats the determination of whether the elapsed time from the start point of the section to the end point of the unit of attention is equal to or longer than the predetermined time.
  • when the section tendencies differ, the generation unit 124 finalizes the preceding section and its section tendency.
  • the generation unit 124 then sets the start point of the unit of attention as the start point of a new section, and repeats the determination of sections and section tendencies by the above processing until the end point of the transition graph.
  • the generation unit 124 is not limited to the specific method shown here, and may use other methods to determine, based on the time-series information, a plurality of sections that constitute the discussion and, for each section, a section tendency indicating which of the first group and the second group mainly uttered or that the utterances of the first group and the second group are competing.
  • the generation unit 124 generates section information that associates each of the plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly uttered in that section, over all or part of the discussion, and stores the section information in the storage unit 11.
  • the section tendency in part of the discussion may indicate that the utterances of the first group and the second group are competing (that is, neither group's utterances are dominant).
  • as described above, the generation unit 124 divides the discussion to be analyzed into a plurality of sections and determines, for each section, a section tendency indicating which of the first group and the second group mainly uttered. Because the time-series information expresses the relative speech amounts of the two groups in detail for each time, it is difficult for the analyst to read utterance tendencies directly from it. In contrast, because the generation unit 124 collects each period in which the same utterance tendency continues into one section, the analyst can easily grasp the transition of utterance tendencies across the whole discussion, which makes it easier to analyze the discussion between the two groups.
  • the output unit 125 outputs the section information generated by the generation unit 124 during the discussion and/or after the discussion ends.
  • hereinafter, the output of the section information by the output unit 125 during the discussion is called real-time output, and the output of the section information by the output unit 125 after the discussion ends is called post-output.
  • FIGS. 6(a) and 6(b) are schematic diagrams for explaining how the output unit 125 outputs section information in real time.
  • in the real-time output, the output unit 125 performs control for displaying information corresponding to the section information generated by the generation unit 124 on the display unit of the information terminal 3 as shown in FIG. 6, or for outputting it as sound from the sound output unit of the sound collector 2.
  • the output unit 125 transmits information corresponding to the time-series information acquired by the acquisition unit 123 and the section information generated by the generation unit 124 to the information terminal 3 .
  • the output unit 125 causes the information terminal 3 to display a transition graph 33 that corresponds to the time-series information acquired by the acquisition unit 123 and shows the transition of which of the first group and the second group has the larger amount of speech.
  • the output unit 125 may display the section tendency on the transition graph 33 by generating the transition graph 33 using a line of a color corresponding to the section tendency for each section.
  • the output unit 125 causes the information terminal 3 to display a bar graph 34 corresponding to the section information generated by the generation unit 124 and showing the length of the section and the section tendency of the section.
  • the output unit 125 also causes the information terminal 3 to display the transition graph 33 and the bar graph 34 corresponding to the reference section information selected by the selection unit 121. Thereby, the speech analysis device 1 makes it easy to compare the section information of the discussion to be analyzed with the reference section information specified by the analyst.
  • the output unit 125 transmits, for example, information corresponding to the difference between the section information generated by the generation unit 124 and the reference section information selected by the selection unit 121 to the information terminal 3 or the sound collector 2 as the comparison result.
  • the output unit 125 outputs, as the difference between the section information and the reference section information, whether or not the difference in the length, number, order, etc. of sections satisfies a predetermined condition.
  • for example, the output unit 125 causes the information terminal 3 to display a message 35 indicating that the section of the first group (T group) in the section information is longer than the section of the first group in the reference section information.
  • for example, the output unit 125 causes the sound collector 2 to output a voice indicating that the section of the second group (S group) in the section information is longer than the section of the second group in the reference section information.
  • thereby, the speech analysis device 1 sequentially notifies the analyst of differences between the section information of the discussion to be analyzed and the reference section information specified by the analyst, so that the analyst can easily reflect those differences in the ongoing discussion.
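  • as an illustration of this real-time comparison (hypothetical data model and threshold; not the publication's implementation):

```python
# Illustrative sketch: compare generated section information against the
# reference section information and emit a message when a section runs
# longer than its reference counterpart by more than a threshold.
from typing import List, Tuple

Section = Tuple[str, float]  # (tendency label, length in minutes)

def realtime_differences(sections: List[Section],
                         reference: List[Section],
                         threshold: float = 2.0) -> List[str]:
    messages = []
    for i, (label, length) in enumerate(sections):
        if i >= len(reference):
            break  # no reference counterpart yet
        ref_label, ref_length = reference[i]
        if label == ref_label and length - ref_length > threshold:
            messages.append(
                f"Section {i + 1} ({label}) is {length - ref_length:.0f} min "
                f"longer than the reference section.")
    return messages

sections = [("T", 15.0), ("S", 5.0)]
reference = [("T", 10.0), ("S", 10.0)]
print(realtime_differences(sections, reference))
# ['Section 1 (T) is 5 min longer than the reference section.']
```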
  • FIG. 7 is a schematic diagram for explaining how the output unit 125 outputs the section information after the fact.
  • in the post-output, the output unit 125 performs control for displaying information corresponding to the section information generated by the generation unit 124 on the display unit of the information terminal 3 as shown in FIG. 7.
  • the output unit 125 transmits information corresponding to the time-series information acquired by the acquisition unit 123 and the section information generated by the generation unit 124 to the information terminal 3 .
  • the output unit 125 causes the information terminal 3 to display a transition graph 36 that corresponds to the time-series information acquired by the acquisition unit 123 and shows the transition of which of the first group and the second group has the larger amount of speech.
  • the output unit 125 may display the section tendency on the transition graph 36 by generating the transition graph 36 using a line of a color corresponding to the section tendency for each section.
  • the output unit 125 causes the information terminal 3 to display a bar graph 37 corresponding to the section information generated by the generation unit 124 and showing the length of the section and the section tendency of the section.
  • the output unit 125 also causes the information terminal 3 to display the transition graph 36 and the bar graph 37 corresponding to the reference section information selected by the selection unit 121. Thereby, the speech analysis device 1 makes it easy to compare the section information of the discussion to be analyzed with the reference section information specified by the analyst.
  • the output unit 125 transmits to the information terminal 3, for example, information that associates the section tendency of each of the plurality of sections indicated by the section information generated by the generation unit 124 with the section tendency of each of the plurality of sections indicated by the reference section information selected by the selection unit 121.
  • the output unit 125 displays, on the information terminal 3, the transitions 38 of the section tendencies of the plurality of sections for each of the section information generated by the generation unit 124 and the reference section information selected by the selection unit 121.
  • for example, the output unit 125 detects the correspondence between the sections indicated by the section information generated by the generation unit 124 and the sections indicated by the reference section information selected by the selection unit 121, and displays increases, decreases, differences in order, and the like of sections as the transitions 38 of the section tendencies. This makes it easy to understand how the transition of section tendencies in the discussion to be analyzed differs from that of the reference.
  • the output unit 125 may output the words uttered in each of the plurality of sections determined by the generation unit 124 in association with that section.
  • the output unit 125 extracts the words included in the utterance in each of the sections by performing known speech recognition processing on the speech in each of the sections, for example.
  • the output unit 125 causes the information terminal 3 to display, for example, each of the plurality of sections in association with some or all of the words extracted for the section.
  • the speech analysis device 1 can make it easier for the analyst to understand the content of each of the plurality of sections.
  • the output unit 125 may output the characteristics of the entire discussion based on the section information generated by the generation unit 124. In this case, the output unit 125 determines the characteristics of the entire discussion based on the section tendencies of the plurality of sections constituting the discussion. For example, the output unit 125 determines the characteristics of the entire discussion based on the proportions, among the plurality of sections constituting the discussion, of sections in which the utterances of the first group are dominant, sections in which the utterances of the second group are dominant, and sections in which the utterances of the first group and the second group are competing.
  • for example, the output unit 125 determines that the discussion is lecture-based when the proportion of sections in which the T group (instructor group) mainly uttered is equal to or greater than a predetermined value, and determines that the discussion is exercise-based when the proportion of sections in which the S group (learner group) mainly uttered is equal to or greater than a predetermined value.
  • the output unit 125 causes the information terminal 3 to display a message (report) representing the determined characteristics of the entire discussion.
  • the speech analysis device 1 makes it easier for the analyst to grasp the trend of the entire discussion determined based on the segment trends of the plurality of segments.
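  • a sketch of such a characterization, assuming time-weighted proportions and a hypothetical 0.6 threshold standing in for "a predetermined value":

```python
# Illustrative sketch: characterize the whole discussion from the proportion
# of time spent in each section tendency. The threshold and labels are
# hypothetical; the publication only says "a predetermined value".
from typing import List, Tuple

def characterize(sections: List[Tuple[str, float]], threshold: float = 0.6) -> str:
    total = sum(length for _, length in sections)
    share = {"T": 0.0, "S": 0.0, "=": 0.0}
    for label, length in sections:
        share[label] += length / total
    if share["T"] >= threshold:
        return "lecture-based discussion"
    if share["S"] >= threshold:
        return "exercise-based discussion"
    return "mixed discussion"

print(characterize([("T", 30.0), ("S", 10.0), ("=", 5.0)]))
# lecture-based discussion (T share is 30/45, about 0.67)
```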
  • FIG. 8 is a diagram showing a flowchart of an exemplary speech analysis method executed by the speech analysis device 1 according to this embodiment.
  • the selection unit 121 selects reference section information to be compared with section information (S11).
  • the classification unit 122 classifies a plurality of participants participating in the discussion to be analyzed into the first group or the second group (S12). For example, the classification unit 122 receives, on the information terminal 3, a setting as to which of the first group and the second group each of the plurality of participants belongs, or automatically classifies the plurality of participants into the first group or the second group based on their attributes.
  • the acquisition unit 123 acquires voices uttered by a plurality of participants in the discussion from the sound collector 2 .
  • the acquisition unit 123 identifies the utterance period of each of the multiple participants based on the sound acquired from the sound collector 2 (S13).
  • the acquisition unit 123 acquires time-series information indicating the utterance status of each of the first group and the second group for each time based on the utterance period of each of the identified participants (S14).
  • based on the time-series information acquired by the acquisition unit 123, the generation unit 124 performs section information generation processing that generates section information associating each of the plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly uttered in that section (S2). The section information generation processing in step S2 will be described using FIG. 9.
  • the output unit 125 outputs the section information generated by the generation unit 124 to at least one of the sound collector 2 and the information terminal 3 (S15).
  • FIG. 9 is a diagram showing a flowchart of section information generation processing in an exemplary speech analysis method executed by the speech analysis device 1 according to this embodiment.
  • the generation unit 124 generates a transition graph showing transitions indicating which of the first group and the second group has a larger amount of speech based on the time-series information acquired by the acquisition unit 123 (S21).
  • the generating unit 124 sets the origin (the starting point of the transition graph) as the starting point of the section (S22).
  • the generation unit 124 extracts, in chronological order, one predetermined period (for example, 5 seconds) from the transition graph as a unit of attention (S23).
  • the generation unit 124 determines whether the elapsed time from the start point of the section to the end point of the unit of attention is equal to or longer than a predetermined time (S24). If it is not (NO in S25), the generation unit 124 returns to step S23 and repeats the processing for the next unit of attention.
  • if it is (YES in S25), the generation unit 124 calculates the slope between the coordinates of the start point of the section and the coordinates of the end point of the unit of attention on the transition graph (S26).
  • the generation unit 124 determines the section tendency based on the calculated slope (S27). For example, when the slope is equal to or less than the first reference value, the generation unit 124 determines that the utterances of the first group are dominant. When the slope is greater than the first reference value and equal to or less than the second reference value, the generation unit 124 determines that the utterances of the first group and the second group are competing. When the slope is greater than the second reference value, the generation unit 124 determines that the utterances of the second group are dominant. The generation unit 124 sets the determination result as the section tendency of the section.
  • when the section tendency of the preceding section and that of the current determination are the same (YES in S28), the generation unit 124 combines the unit of attention with the preceding section (S29). The generation unit 124 returns to step S23 and repeats the processing for the next unit of attention.
  • when the section tendencies differ (NO in S28), the generation unit 124 finalizes the preceding section and its section tendency (S30). When the time-series information has not ended (NO in S31), the generation unit 124 sets the start point of the unit of attention as the start point of a new section (S32), returns to step S23, and repeats the processing for the next unit of attention.
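  • the following consolidates steps S21 to S32 into one runnable sketch, under the same assumptions as the earlier snippets ("T"/"S" time-series labels, hypothetical slope reference values, and a two-frame stand-in for the minimum section length):

```python
# Consolidated illustrative sketch of the section information generation loop.
from typing import List, Tuple

REF1, REF2 = 0.5, 2.0  # hypothetical slope reference values
MIN_FRAMES = 2         # stand-in for the minimum section length

def tendency(start: Tuple[int, int], end: Tuple[int, int]) -> str:
    dx, dy = end[0] - start[0], end[1] - start[1]
    slope = dy / dx if dx else float("inf")
    return "T" if slope <= REF1 else "=" if slope <= REF2 else "S"

def generate_sections(timeseries: List[str]) -> List[Tuple[str, int, int]]:
    # S21: build the transition graph as cumulative (T-frames, S-frames) points
    path = [(0, 0)]
    for label in timeseries:
        x, y = path[-1]
        path.append((x + 1, y) if label == "T" else (x, y + 1))

    sections: List[Tuple[str, int, int]] = []  # (tendency, start idx, end idx)
    sec_start = 0   # S22: the first section starts at the origin
    current = None  # tendency of the section under construction
    for i in range(1, len(path)):          # S23: advance one unit of attention
        if i - sec_start < MIN_FRAMES:     # S24/S25: minimum length not reached
            continue
        t = tendency(path[sec_start], path[i])   # S26/S27: slope-based tendency
        if current is None or current == t:      # S28/S29: same tendency, absorb
            current = t
        else:                                    # S30/S32: finalize, start anew
            sections.append((current, sec_start, i - 1))
            sec_start, current = i - 1, None
    if current is not None:                      # S31: time-series ended
        sections.append((current, sec_start, len(path) - 1))
    return sections

print(generate_sections(["T", "T", "T", "T", "S", "S", "S", "S"]))
# [('T', 0, 6), ('S', 6, 8)]  -> indices into the transition-graph path
```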
  • the generation unit 124 generates section information that associates each of the plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly uttered in that section, over all or part of the discussion, and stores the section information in the storage unit 11.
  • the section tendency in part of the discussion may indicate that the utterances of the first group and the second group are competing.
  • as described above, the speech analysis device 1 determines, based on the voices of the discussion, a section tendency indicating which of the two groups mainly uttered in each section of the discussion, associates each section with its section tendency, and reports them to the analyst.
  • the speech analysis system S makes it easy for the analyst to grasp the utterance tendencies of the two groups, and facilitates analysis of the utterance tendencies in the discussion between the two groups.
  • the grouping of multiple participants may change during the discussion.
  • in this modification, the speech analysis device 1 changes the participants belonging to the first group and the second group between a plurality of periods in the discussion, and generates section information based on the changed grouping.
  • the classification unit 122 changes the participants who belong to the first group and the second group between a plurality of periods in the discussion. For example, the classification unit 122 receives in advance, on the information terminal 3, settings of the discussion structure (explanation period, exercise period, etc.), and changes the participants belonging to the first group and the second group at the timing when the structure changes. For example, the classification unit 122 classifies the teacher into the first group and the students into the second group during the explanation period, while classifying some students into the first group and the other students into the second group during the exercise period.
  • the classification unit 122 may detect the arrangement of each of the plurality of participants by, for example, performing known image recognition processing on a captured image acquired by a camera or the like, and change the participants belonging to the first group and the second group at the timing when the arrangement changes. For example, the classification unit 122 classifies participants sitting in specific seats into the first group and participants sitting in the other seats into the second group.
  • the classification unit 122 may also change the parent groups, each of which includes a first group and a second group, according to grouping that differs for each period. That is, for each of a plurality of periods in the discussion, the classification unit 122 generates a first parent group including a first group and a second group into which some of the plurality of participants are classified, and a second parent group including a first group and a second group into which some of the plurality of participants not belonging to the first parent group are classified.
  • the classification unit 122 may generate three or more parent groups.
  • the classification unit 122 may generate a parent group including all participants (for example, the entire classroom) during at least one period of the discussion. That is, the classification unit 122 may generate a first parent group including a first group and a second group into which some of the participants are classified, a second parent group including a first group and a second group into which some of the participants not belonging to the first parent group are classified, and a third parent group including a first group and a second group into which the participants belonging to the first parent group and the second parent group are classified.
  • the generation unit 124 generates section information for each of a plurality of parent groups.
  • the output unit 125 outputs section information for each of the multiple parent groups.
  • FIG. 10 is a schematic diagram for explaining how the output unit 125 outputs section information in this modification.
  • the output unit 125 performs control for displaying information corresponding to the section information of each of the plurality of parent groups generated by the generation unit 124 on the display unit of the information terminal 3 as shown in FIG. 10.
  • the output unit 125 outputs, for example, the section information of the first parent group and the section information of the second parent group at the same time.
  • in the example of FIG. 10, the output unit 125 displays the section information of three parent groups, including the first parent group and the second parent group, side by side for the period from 10 minutes to 50 minutes.
  • the speech analysis device 1 allows the analyst to view the section information of multiple parent groups (multiple tables, etc.).
  • the output unit 125 outputs, for example, section information of at least one of the first parent group and the second parent group, and section information of the third parent group.
  • in the example of FIG. 10, the output unit 125 displays the section information of the three parent groups including the first parent group and the second parent group side by side for the period from 10 minutes to 50 minutes, and the section information of the third parent group corresponding to all of the participants for the period from 50 minutes to 60 minutes.
  • thereby, the speech analysis device 1 makes it possible to hierarchically and easily analyze a parent group corresponding to all of the participants (a whole classroom, etc.) and a plurality of parent groups corresponding to the groups (a plurality of tables, etc.) into which the participants are divided.
  • a specific participant, such as an instructor, may move between a plurality of tables and join different groups.
  • in this modification, the speech analysis device 1 changes the grouping based on the position of the specific participant, and generates section information based on the changed grouping.
  • FIG. 11 is a schematic diagram for explaining how the generation unit 124 generates section information in this modified example.
  • the classification unit 122 estimates the position of the instructor, who is the specific participant, during the discussion. For example, the classification unit 122 may estimate which sound collector 2 the instructor is close to by comparing the voices acquired by the acquisition unit 123 from the plurality of sound collectors 2 with a pre-registered feature amount (voiceprint, etc.) of the instructor's voice.
  • alternatively, the classification unit 122 may estimate which sound collector 2 the instructor is close to based on the strength of short-range wireless communication performed between a communication device (smartphone, etc.) carried by the instructor and each of the plurality of sound collectors 2.
  • the classification unit 122 is not limited to the specific methods shown here, and may use other methods to estimate the position of the instructor during the discussion.
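  • as one concrete (assumed) reading of the signal-strength approach, the nearest sound collector can be taken to be the one with the strongest received signal:

```python
# Illustrative sketch, not the publication's code: pick the sound collector
# nearest the instructor from the received signal strength (RSSI, in dBm) of
# short-range wireless communication; higher (less negative) means closer.
from typing import Dict

def nearest_collector(rssi_by_collector: Dict[str, float]) -> str:
    return max(rssi_by_collector, key=rssi_by_collector.get)

rssi = {"table1": -72.0, "table2": -48.0, "table3": -80.0}
print(nearest_collector(rssi))  # 'table2'
```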
  • the example of FIG. 11 represents that the instructor moved to table 1, table 2, and table 3 in order.
  • based on the estimated position, the classification unit 122 changes the participants belonging to the first group, to which the instructor belongs, and the participants belonging to the second group between a plurality of periods in the discussion.
  • the classification unit 122 generates a first group including the instructor and a second group including the students at table 1 during the period when the instructor is positioned at table 1. Further, the classification unit 122 generates a first group including the instructor and a second group including the students at table 2 during the period when the instructor is positioned at table 2. Further, the classification unit 122 generates a first group including the instructor and a second group including the students at table 3 during the period when the instructor is positioned at table 3.
  • the generation unit 124 generates section information for each of the plurality of periods whose grouping has been changed, and combines the generated section information.
  • in this way, when a specific participant such as an instructor moves during the discussion, the speech analysis device 1 generates section information according to the grouping changed in accordance with the position of the specific participant, making it possible to easily analyze utterance tendencies centered on that participant.
  • the processor of the speech analysis device 1 is the subject of each step (process) included in the speech analysis method shown in FIGS. 8 and 9. That is, the processor of the speech analysis device 1 reads a program for executing the speech analysis method shown in FIGS. 8 and 9 from the storage unit 11 and executes the program, thereby executing the speech analysis method shown in FIGS. 8 and 9. Some steps included in the speech analysis method shown in FIGS. 8 and 9 may be omitted, the order of steps may be changed, and a plurality of steps may be performed in parallel.
  • Reference signs: S: speech analysis system; 1: speech analysis device; 11: storage unit; 12: control unit; 121: selection unit; 122: classification unit; 123: acquisition unit; 124: generation unit; 125: output unit; 2: sound collector; 3: information terminal


Abstract

A voice analysis device 1 according to an embodiment of the present invention has: an acquisition unit 123 that acquires time-series information representing, for each time, the respective utterance situations of a first group and a second group in voices uttered in a discussion by participants belonging to the first group and participants belonging to the second group; a generation unit 124 that, on the basis of the time-series information, generates section information in which each of a plurality of sections constituting the discussion is associated, in all or a portion of the discussion, with a section tendency representing which of the first group and the second group is the main speaker in that section; and an output unit 125 that outputs the section information.

Description

Speech analysis device and speech analysis method
The present invention relates to a speech analysis device and a speech analysis method for analyzing speech uttered in a discussion.
Patent Literature 1 discloses a system that uses a microphone to acquire voices uttered by participants during a conference, identifies the participant who is speaking on the basis of voiceprint data extracted from the voices, and displays the utterance status of each of a plurality of participants on a display.
Patent Literature 1: Japanese Patent Application Laid-Open No. 2006-208482
Discussions between two groups, such as a discussion between learners and a teacher or a discussion between two teams, may be analyzed. The system disclosed in Patent Literature 1 displays, for each time, that each of a plurality of participants has spoken, so it is difficult for an analyst to grasp the utterance tendencies of two groups into which the participants are divided, and there is thus a problem that it is difficult to analyze a discussion between two groups.
The present invention was made in view of these points, and aims to make it easier to analyze the tendency of utterances in discussions between two groups.
A speech analysis device according to a first aspect of the present invention includes: an acquisition unit that acquires time-series information indicating, for each time, the respective utterance situations of a first group and a second group in voices uttered in a discussion by participants belonging to the first group and participants belonging to the second group; a generation unit that, on the basis of the time-series information, generates section information in which each of a plurality of sections constituting the discussion is associated, in all or part of the discussion, with a section tendency indicating which of the first group and the second group is the main speaker in that section; and an output unit that outputs the section information.
The section tendency may indicate which of the first group and the second group mainly speaks, or may indicate that the utterances of the first group and the second group are antagonistic.
The generation unit may determine each of the plurality of sections so as to be equal to or longer than a predetermined time, and may determine the section tendency of a section by comparing the utterance situations of the first group and the second group in that section.
The time-series information may be information indicating which of the first group and the second group has a larger amount of speech for each predetermined time frame during the period from the start point to the end point of the discussion.
The speech analysis device may further include a classification unit that classifies a plurality of participants into the first group and the second group.
The classification unit may change the participants belonging to each of the first group and the second group between a plurality of periods in the discussion.
The classification unit may generate a first parent group including the first group and the second group into which some of the plurality of participants are classified, and a second parent group including the first group and the second group into which some of the plurality of participants not belonging to the first parent group are classified, and the output unit may output the section information of the first parent group and the section information of the second parent group at the same time.
The classification unit may generate a first parent group including the first group and the second group into which some of the plurality of participants are classified, and a second parent group including the first group and the second group into which some of the plurality of participants not belonging to the first parent group are classified, and may further generate a third parent group including the first group and the second group into which the participants belonging to the first parent group and the second parent group are classified, and the output unit may output the section information of at least one of the first parent group and the second parent group and the section information of the third parent group.
The classification unit may change the participants belonging to the first group, to which a specific participant belongs, and the participants belonging to the second group, based on the position of the specific participant.
The output unit may output words that are included in the utterances of each of the plurality of sections and extracted by performing speech recognition processing on the speech, in association with the corresponding section.
The output unit may output a characteristic of the entire discussion based on the section tendencies of the plurality of sections constituting the discussion.
The speech analysis device may further include a selection unit that selects reference section information to be compared with the section information, and the output unit may output a result of a comparison between the section information and the reference section information.
The output unit may output, during the discussion, information corresponding to a difference between the section information and the reference section information as the comparison result.
After the discussion, the output unit may output the section tendency of each of the plurality of sections indicated by the section information in association with the section tendency of each of the plurality of sections indicated by the reference section information.
A speech analysis method according to a second aspect of the present invention is executed by a processor and includes the steps of: acquiring time-series information indicating, for each time, the respective utterance situations of a first group and a second group in voices uttered in a discussion by participants belonging to the first group and participants belonging to the second group; generating, on the basis of the time-series information, section information in which each of a plurality of sections constituting the discussion is associated, in all or part of the discussion, with a section tendency indicating which of the first group and the second group is the main speaker in that section; and outputting the section information.
According to the present invention, it is possible to easily analyze the tendency of utterances in discussions between two groups.
FIG. 1 is a schematic diagram of a speech analysis system S according to an embodiment.
FIG. 2 is a block diagram of the speech analysis system S according to the embodiment.
FIG. 3 is a schematic diagram for explaining a method by which a selection unit selects reference section information.
FIG. 4 is a schematic diagram for explaining a method by which an acquisition unit acquires time-series information.
FIG. 5 is a schematic diagram for explaining a method by which a generation unit determines a section tendency.
FIG. 6 is a schematic diagram for explaining a method by which an output unit outputs section information in real time.
FIG. 7 is a schematic diagram for explaining a method by which the output unit performs post-hoc output of section information.
FIG. 8 is a flowchart of an exemplary speech analysis method executed by the speech analysis device according to the embodiment.
FIG. 9 is a flowchart of section information generation processing in the exemplary speech analysis method executed by the speech analysis device according to the embodiment.
FIG. 10 is a schematic diagram for explaining a method by which the output unit outputs section information in a modified example.
FIG. 11 is a schematic diagram for explaining a method by which the generation unit generates section information in a modified example.
[Overview of speech analysis system S]
FIG. 1 is a schematic diagram of the speech analysis system S according to this embodiment. The speech analysis system S includes a speech analysis device 1, a sound collector 2, and an information terminal 3. The numbers of sound collectors 2 and information terminals 3 included in the speech analysis system S are not limited. The speech analysis system S may include other devices such as servers and terminals.
The speech analysis device 1 is a computer that analyzes speech uttered in a discussion in which a plurality of participants participate and provides the analysis results to an analyst. The analyst may be one of the participants or a person different from the participants. The speech analysis device 1 analyzes the speech acquired by the sound collector 2 and outputs the analysis result to the sound collector 2 or the information terminal 3. The speech analysis device 1 is connected to the sound collector 2 and the information terminal 3 by wire or wirelessly via a network such as a local area network or the Internet.
The speech analysis device 1 analyzes the speech of a discussion conducted by a plurality of participants divided into at least two groups. Discussions to be analyzed are, for example, classes, group discussions, debates, meetings, and the like. The plurality of participants are classified into either a first group or a second group. Participants belonging to the first group are, for example, instructors such as teachers and tutors. Participants belonging to the second group are, for example, learners such as pupils and students. A plurality of learners may also be classified into the first group and the second group. The plurality of participants may be classified according to other criteria.
The plurality of participants as a whole may be classified into a plurality of parent groups, and within each of the parent groups the participants may further be classified into the first group or the second group. In this case, each parent group includes a first group and a second group. For example, one parent group corresponds to one table surrounded by a plurality of participants, and a sound collector 2 is arranged at that table. The plurality of participants surrounding one table are classified into a first group and a second group.
The speech analysis device 1 may also analyze the speech of a discussion held over a network (for example, a web conference). In this case, a sound collector 2 is arranged in each space where participants are present during the discussion, and each sound collector 2 is associated with one of the plurality of participants.
The sound collector 2 is a device that acquires the speech uttered in the discussion. The sound collector 2 includes, for example, a microphone array including a plurality of sound collecting units, such as microphones, arranged in different directions. The microphone array includes, for example, a plurality of (e.g., eight) microphones arranged at equal intervals on the same circumference in a plane horizontal to the ground. By using such a microphone array, the speech analysis device 1 can identify which participant is the speaker (sound source) based on the speech uttered by the plurality of participants surrounding the sound collector 2. The sound collector 2 transmits the speech acquired using the microphone array to the speech analysis device 1 as speech data. The sound collector 2 may also include a sound output unit such as a speaker.
The information terminal 3 is a computer that outputs information, such as a smartphone, a tablet terminal, or a personal computer. The information terminal 3 is used, for example, by at least some of the participants. The information terminal 3 may also be used by an analyst different from the participants. The information terminal 3 has a display unit such as a liquid crystal display, and displays the information received from the speech analysis device 1 on the display unit.
The information terminal 3 may also function as the sound collector 2 by having a sound collecting unit such as a microphone. In this case, the information terminal 3 used by each of the participants transmits the speech acquired using the sound collecting unit to the speech analysis device 1 as speech data.
An overview of the process by which the speech analysis system S according to this embodiment analyzes speech is described below. The speech analysis device 1 classifies the plurality of participants into the first group or the second group. For example, the speech analysis device 1 receives, at the information terminal 3, a setting of whether each of the participants belongs to the first group or the second group, or automatically classifies the participants into the first group or the second group based on the attributes of the participants.
The speech analysis device 1 acquires, from the sound collector 2, the speech uttered by the participants in the discussion. By identifying the utterance period of each of the participants in the acquired speech, the speech analysis device 1 acquires time-series information indicating the utterance situations of the first group and the second group for each time. The time-series information is, for example, information indicating which of the first group and the second group has a larger amount of speech for each predetermined time frame.
Based on the acquired time-series information, the speech analysis device 1 generates section information that associates each of a plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group is the main speaker in that section. In addition to indicating which of the first group and the second group is the main speaker, the section information may indicate that the utterances of the first group and the second group are antagonistic. The speech analysis device 1 outputs the generated section information to at least one of the sound collector 2 and the information terminal 3.
In this way, the speech analysis system S determines, based on the speech of the discussion, a section tendency indicating which of the two groups is the main speaker for each section of the discussion, and notifies the analyst of the sections in association with their section tendencies. This makes it easy for the analyst to grasp the utterance tendencies of the two groups and facilitates analysis of the utterance tendencies in the discussion between the two groups.
[Structure of speech analysis system S]
FIG. 2 is a block diagram of the speech analysis system S according to this embodiment. In FIG. 2, the arrows indicate the main data flows, and there may be data flows other than those shown in FIG. 2. In FIG. 2, each block shows a configuration in units of functions, not in units of hardware (devices). As such, the blocks shown in FIG. 2 may be implemented within a single device or may be implemented separately in a plurality of devices. Data may be exchanged between the blocks via any means, such as a data bus, a network, or a portable storage medium.
The speech analysis device 1 has a storage unit 11 and a control unit 12. The speech analysis device 1 may be configured by connecting two or more physically separate devices by wire or wirelessly. The speech analysis device 1 may also be configured as a cloud, which is a collection of computer resources.
The storage unit 11 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, and the like. The storage unit 11 stores in advance the programs executed by the control unit 12. The storage unit 11 may be provided outside the speech analysis device 1, in which case it may exchange data with the control unit 12 via a network.
The control unit 12 has a selection unit 121, a classification unit 122, an acquisition unit 123, a generation unit 124, and an output unit 125. The control unit 12 is a processor such as a CPU (Central Processing Unit), and functions as the selection unit 121, the classification unit 122, the acquisition unit 123, the generation unit 124, and the output unit 125 by executing the programs stored in the storage unit 11. At least some of the functions of the control unit 12 may be executed by an electric circuit. At least some of the functions of the control unit 12 may also be realized by the control unit 12 executing a program via a network.
The processing executed by the speech analysis device 1 is described in detail below. The selection unit 121 selects reference section information to be compared with the section information. The section information is information that associates each of the plurality of sections constituting the discussion with the section tendency of that section. The section tendency indicates which of the first group and the second group is the main speaker, or that the utterances of the first group and the second group are antagonistic. The section information is generated by the generation unit 124, described later, based on the speech of the discussion to be analyzed. The reference section information is section information generated in advance and stored in advance in the storage unit 11.
The selection unit 121 selects the reference section information based on, for example, content specified by the analyst at the information terminal 3. The analyst is one of the participants in the discussion to be analyzed or a person different from the participants.
FIGS. 3(a) and 3(b) are schematic diagrams for explaining how the selection unit 121 selects the reference section information. In the example of FIG. 3(a), the storage unit 11 stores in advance discussion information indicating at least one of the attributes of a past discussion (subject, type, time, topic, format, etc.), the date and time at which the discussion was held, and the attributes of the participants in the discussion (teacher in charge, grade, etc.), in association with section information generated by the generation unit 124 by the method described later. The selection unit 121 receives, for example at the information terminal 3, a specification of search conditions for past discussions by the analyst. The search conditions are, for example, at least one of the attributes of a discussion, the date and time at which a discussion was held, and the attributes of participants. The selection unit 121 extracts discussion information matching the specified search conditions from the storage unit 11 and causes the information terminal 3 to display it together with the section information 31 associated with each of the one or more extracted pieces of discussion information.
The selection unit 121 then receives, at the information terminal 3, a specification of one of the pieces of section information 31 by the analyst and selects the specified section information 31 as the reference section information. This allows the speech analysis device 1 to compare the discussion to be analyzed with a discussion matching the search conditions specified by the analyst.
In the example of FIG. 3(b), the storage unit 11 stores templates of section information in advance. A template of section information is, for example, information indicating an order of sections in which the utterances of the first group are dominant, sections in which the utterances of the second group are dominant, and sections in which the utterances of the first group and the second group are antagonistic. The selection unit 121 causes the information terminal 3 to display the plurality of templates 32 stored in the storage unit 11. FIG. 3(b) shows an example in which the participants are classified into a T (teacher) group as the first group and an S (student) group as the second group, but the participants may be classified into two groups according to other criteria.
The selection unit 121 then receives, at the information terminal 3, a specification of one of the templates 32 by the analyst and selects the order of sections indicated by the specified template 32 as the reference section information. The selection unit 121 may also receive, at the information terminal 3, an input of an order of sections in which the utterances of the first group are dominant, sections in which the utterances of the second group are dominant, and sections in which the utterances of the first group and the second group are antagonistic, and select the input order of sections as the reference section information. This allows the speech analysis device 1 to compare the discussion to be analyzed with the order of sections specified by the analyst.
The classification unit 122 classifies the plurality of participants in the discussion to be analyzed into the first group or the second group. The classification unit 122 may, for example, receive at the information terminal 3 a setting by the analyst of whether each of the participants belongs to the first group or the second group. In this case, the classification unit 122 causes the storage unit 11 to store information indicating whether each of the participants belongs to the first group or the second group, according to the content set at the information terminal 3.
The classification unit 122 may also automatically classify the participants into the first group or the second group based on, for example, the attributes of each participant. In this case, the storage unit 11 stores in advance information indicating the attributes of each of the participants. An attribute used for the classification is, for example, a participant's role (instructor, learner, etc.). For example, the classification unit 122 classifies a participant into the first group if the participant's attributes satisfy a predetermined condition, and into the second group otherwise. The classification unit 122 causes the storage unit 11 to store information indicating whether each of the participants belongs to the first group or the second group according to the classification result.
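As a concrete illustration, a minimal Python sketch of this attribute-based classification follows; the attribute layout and the "role" condition are assumptions for illustration, not taken from the patent.

```python
# Minimal sketch of the automatic classification: the predetermined
# condition is assumed here to be role == "instructor".
def classify_participants(attributes):
    """attributes: {participant_id: {"role": "instructor" or "learner"}}.
    Returns (first_group, second_group) as lists of participant IDs."""
    first_group, second_group = [], []
    for participant, attrs in attributes.items():
        if attrs.get("role") == "instructor":
            first_group.append(participant)
        else:
            second_group.append(participant)
    return first_group, second_group
```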
The acquisition unit 123 acquires, from the sound collector 2, the speech uttered by the participants in the discussion. The acquisition unit 123 acquires part of the speech of the discussion at predetermined time intervals during the discussion, or acquires the speech of the entire discussion after the discussion ends.
The acquisition unit 123 identifies the utterance period of each of the participants based on the speech acquired from the sound collector 2. In the case of a discussion held around a sound collector 2 having a microphone array, the acquisition unit 123 performs, for example, known sound source localization on the multi-channel speech received from the sound collector 2. Sound source localization is a process of estimating the direction of the sound source included in the acquired speech for each time (for example, every 10 to 100 milliseconds). The acquisition unit 123 associates the direction of the sound source estimated for each time with the directions of the participants preset at the information terminal 3.
The acquisition unit 123 can use other sound source localization methods, such as the MUSIC (Multiple Signal Classification) method or the beamforming method, as long as the direction of the sound source can be identified based on the acquired speech.
Next, based on the acquired speech and the estimated direction of the sound source, the acquisition unit 123 determines which participant spoke in the discussion for each predetermined time (for example, every 10 to 100 milliseconds). The acquisition unit 123 identifies a continuous period from when one participant starts speaking to when the participant finishes as an utterance period. When a plurality of participants speak at the same time, their utterance periods may at least partially overlap.
In the case of a discussion held over a network, the acquisition unit 123, for example, estimates the participant associated with the sound collector 2 that transmitted the acquired speech as the sound source, and identifies the utterance period of each of the participants based on the acquired speech and the estimated sound source.
The acquisition unit 123 is not limited to the specific methods shown here and may identify the utterance period of each of the participants by other methods.
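To make the preceding steps concrete, here is a hedged Python sketch of one way per-frame speaker decisions could be merged into utterance periods; the data layout and the frame length are assumptions for illustration.

```python
# Sketch of deriving utterance periods: per-frame speaker decisions (one
# set of speakers per 10-100 ms frame, as produced by the sound source
# localization above) are merged into continuous periods.
def utterance_periods(frame_speakers, frame_sec=0.05):
    """frame_speakers: list of sets of participants speaking in each frame.
    Returns {participant: [(start_sec, end_sec), ...]}; periods of
    different participants may overlap, as noted above."""
    periods, open_start = {}, {}
    for i, speakers in enumerate(frame_speakers + [set()]):  # sentinel frame
        t = i * frame_sec
        for p in speakers - set(open_start):
            open_start[p] = t                      # an utterance begins
        for p in set(open_start) - speakers:
            periods.setdefault(p, []).append((open_start.pop(p), t))
    return periods
```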
The acquisition unit 123 acquires time-series information indicating the utterance situations of the first group and the second group for each time, based on the identified utterance periods of the participants. FIG. 4 is a schematic diagram for explaining how the acquisition unit 123 acquires the time-series information.
The acquisition unit 123 calculates the amount of speech of each of the participants based on the identified utterance periods. For example, for each predetermined time frame (5 seconds, 10 seconds, 30 seconds, etc.), the acquisition unit 123 calculates, as the amount of speech, a value corresponding to the length of the participant's utterance periods within that time frame. Instead of or in addition to the length of the utterance periods, the acquisition unit 123 may calculate a value corresponding to the number of utterances or the utterance volume as the amount of speech. The acquisition unit 123 calculates the amount of speech for each time frame for each of the participants in the period from the start point (start time) to the end point (end time) of the discussion.
The acquisition unit 123 classifies the amount of speech of each of the participants for each time frame into the first group or the second group according to the result of the classification of the participants by the classification unit 122. In the example of FIG. 4, the participants are classified into a T (teacher) group as the first group and an S (student) group as the second group. The acquisition unit 123 calculates, for each time frame, a statistical value (average value, median value, etc.) of the amounts of speech of the participants belonging to each of the first group and the second group.
The acquisition unit 123 determines, for each time frame, which of the first group and the second group has the larger statistical value of the amount of speech. According to the determination results, the acquisition unit 123 acquires, as the time-series information, information indicating which of the first group and the second group has the larger amount of speech for each predetermined time frame in the period from the start point to the end point of the discussion. When the time-series information contains a run of consecutive time frames in which the same group has the larger amount of speech, the acquisition unit 123 may merge those time frames.
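A minimal sketch of this per-frame determination follows, assuming the mean as the statistic and hypothetical data structures; neither is prescribed by the patent.

```python
# Sketch of deriving the time-series information: per time frame, the mean
# amount of speech of each group is compared and the dominant group is
# recorded. Both groups are assumed to be non-empty.
from statistics import mean

def dominant_group_per_frame(speech_amounts, group_of):
    """speech_amounts: {participant: [amount in frame 0, frame 1, ...]}.
    group_of: {participant: "T" or "S"} per the classification unit.
    Returns e.g. ["T", "T", "S", ...], one label per time frame."""
    n_frames = len(next(iter(speech_amounts.values())))
    series = []
    for f in range(n_frames):
        t_vals = [a[f] for p, a in speech_amounts.items() if group_of[p] == "T"]
        s_vals = [a[f] for p, a in speech_amounts.items() if group_of[p] == "S"]
        series.append("T" if mean(t_vals) >= mean(s_vals) else "S")
    return series
```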
Based on the time-series information acquired by the acquisition unit 123, the generation unit 124 determines a plurality of sections constituting the discussion to be analyzed, and determines, for each section, a section tendency indicating which of the first group and the second group is the main speaker. The generation unit 124 may determine the sections and section tendencies of part of the discussion during the discussion, or may determine the sections and section tendencies of the entire discussion after the discussion ends.
FIG. 5 is a schematic diagram for explaining how the generation unit 124 determines the section tendencies. Based on the time-series information acquired by the acquisition unit 123, the generation unit 124 generates a transition graph showing the transition of which of the first group and the second group has the larger amount of speech. FIG. 5 shows an example in which the speech analysis system S according to this embodiment is applied to the generation of an S-T analysis graph described in Non-Patent Literature 1.
In the transition graph illustrated in FIG. 5, the horizontal axis represents the time of the first group (T group), and the vertical axis represents the time of the second group (S group). Taking the origin as the start point (the start point of the discussion), the acquisition unit 123 draws a line to the right along the horizontal axis for a period in which the amount of speech of the first group is larger in the time-series information, and draws a line upward along the vertical axis for a period in which the amount of speech of the second group is larger. By repeating this from the start to the end of the time-series information, that is, from the start point to the end point of the discussion, the acquisition unit 123 generates a transition graph showing the transition of which of the first group and the second group has the larger amount of speech.
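Under the same assumptions as the previous sketch, the transition graph path could be accumulated as follows.

```python
# Sketch of building the S-T transition graph path from the series above:
# each T-dominant frame moves right, each S-dominant frame moves up.
def transition_path(series, frame_sec=10):
    x, y = 0.0, 0.0            # elapsed T-group time, elapsed S-group time
    path = [(x, y)]            # starts at the origin (start of discussion)
    for label in series:
        if label == "T":
            x += frame_sec
        else:
            y += frame_sec
        path.append((x, y))
    return path
```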
Using the generated transition graph, the generation unit 124 divides the discussion to be analyzed into a plurality of sections and determines, for each section, a section tendency indicating which of the first group and the second group is the main speaker. First, the generation unit 124 sets the origin (the start point of the transition graph) as the start point of a section. The generation unit 124 extracts, in chronological order, one predetermined period (for example, 5 seconds) of the transition graph as a unit of attention.
The generation unit 124 determines whether the elapsed time from the start point of the section to the end point of the unit of attention is equal to or longer than a predetermined time. The predetermined time is a value preset as the minimum duration of a section, for example, 5 minutes or 10 minutes. The predetermined time may also be determined according to the reference section information selected by the selection unit 121. If the elapsed time from the start point of the section to the end point of the unit of attention is not equal to or longer than the predetermined time, the generation unit 124 extracts the next predetermined period as the unit of attention and repeats the determination of whether the elapsed time from the start point of the section to the end point of the unit of attention is equal to or longer than the predetermined time.
If the elapsed time from the start point of the section to the end point of the unit of attention is equal to or longer than the predetermined time, the generation unit 124 determines the section tendency of the section by comparing the utterance situations of the first group and the second group. To compare the utterance situations, the generation unit 124 calculates, for example, the slope between the coordinates of the start point of the section and the coordinates of the end point of the unit of attention on the transition graph (the broken line in FIG. 5).
The generation unit 124 determines the section tendency based on the calculated slope. For example, the generation unit 124 determines that the utterances of the first group are dominant when the slope is equal to or less than a first reference value, determines that the utterances of the first group and the second group are antagonistic when the slope is greater than the first reference value and equal to or less than a second reference value, and determines that the utterances of the second group are dominant when the slope is greater than the second reference value. The first reference value and the second reference value are stored in advance in the storage unit 11 or set at the information terminal 3. The generation unit 124 sets the determination result as the section tendency of the section.
If the section tendency of the immediately preceding section and the section tendency of the current section are the same, the generation unit 124 joins the unit of attention to the immediately preceding section, extracts the next predetermined period as the unit of attention, and repeats the determination of whether the elapsed time from the start point of the section to the end point of the unit of attention is equal to or longer than the predetermined time.
If the section tendency of the immediately preceding section and the section tendency of the current section are different, the generation unit 124 finalizes the immediately preceding section and its section tendency. The generation unit 124 then sets the start point of the unit of attention as the start point of a new section and repeats the determination of sections and section tendencies by the above processing until the end point of the transition graph.
The generation unit 124 is not limited to the specific method shown here, and may use other methods to determine, based on the time-series information, the plurality of sections constituting the discussion and, for each section, a section tendency indicating which of the first group and the second group is the main speaker or that the utterances of the first group and the second group are antagonistic.
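As one possible concrete form of the processing described above, the following simplified sketch determines sections and their tendencies from the transition path; the reference values k1 and k2, the minimum section length, and the merge rule are assumed example choices, not the patent's exact procedure.

```python
# Simplified sketch of the section/trend determination.
import math

def classify_slope(slope, k1=0.5, k2=2.0):
    if slope <= k1:
        return "T"       # first-group utterances dominant
    if slope <= k2:
        return "even"    # the two groups' utterances are antagonistic
    return "S"           # second-group utterances dominant

def split_into_sections(path, frame_sec=10, min_len=300):
    """path: output of transition_path(). Returns a list of
    (start_idx, end_idx, trend) tuples over the path indices."""
    sections, start, prev_trend = [], 0, None
    for end in range(1, len(path)):
        if (end - start) * frame_sec < min_len:
            continue                       # section not yet long enough
        dt = path[end][0] - path[start][0]
        ds = path[end][1] - path[start][1]
        trend = classify_slope(ds / dt if dt > 0 else math.inf)
        if prev_trend is None or trend == prev_trend:
            prev_trend = trend             # extend the current section
        else:                              # trend changed: close the section
            sections.append((start, end - 1, prev_trend))
            start, prev_trend = end - 1, None
    if prev_trend is not None:
        sections.append((start, len(path) - 1, prev_trend))
    return sections
```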
The generation unit 124 generates section information that associates, in all or part of the discussion, each of the plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group is the main speaker in that section, and causes the storage unit 11 to store the section information. In the section information, the section tendency in part of the discussion may indicate that the utterances of the first group and the second group are antagonistic (that is, that neither the utterances of the first group nor those of the second group are dominant).
In this way, the generation unit 124 divides the discussion to be analyzed into a plurality of sections and determines, for each section, a section tendency indicating which of the first group and the second group is the main speaker. Because the time-series information represents the relative amounts of speech of the two groups in fine detail for each time, it is difficult for the analyst to analyze the utterance tendencies of the two groups by looking at the time-series information as it is. In contrast, by treating each period of the discussion in which the same utterance tendency continues as one section, the generation unit 124 makes it easy for the analyst to grasp the transition of utterance tendencies over the entire discussion and facilitates analysis of the discussion between the two groups.
The output unit 125 outputs the section information generated by the generation unit 124 during the discussion, after the discussion ends, or both. Hereinafter, outputting the section information during the discussion is referred to as real-time output, and outputting the section information after the discussion ends is referred to as post-hoc output.
FIGS. 6(a) and 6(b) are schematic diagrams for explaining how the output unit 125 outputs the section information in real time. The output unit 125 performs control for displaying information corresponding to the section information generated by the generation unit 124 on the display unit of the information terminal 3 as shown in FIG. 6(a), or for outputting it from the sound output unit of the sound collector 2 as shown in FIG. 6(b).
For example, the output unit 125 transmits information corresponding to the time-series information acquired by the acquisition unit 123 and the section information generated by the generation unit 124 to the information terminal 3. In the example of FIG. 6(a), the output unit 125 causes the information terminal 3 to display a transition graph 33 corresponding to the time-series information acquired by the acquisition unit 123 and showing the transition of which of the first group and the second group has the larger amount of speech. The output unit 125 may display the section tendencies on the transition graph 33 by drawing each section with a line whose color corresponds to its section tendency.
The output unit 125 also causes the information terminal 3 to display a bar graph 34 corresponding to the section information generated by the generation unit 124 and showing the length of each section and its section tendency. This makes it easy for the analyst to grasp the utterance tendencies of the two groups during the discussion and facilitates analysis of the utterance tendencies in the discussion between the two groups.
The output unit 125 also causes the information terminal 3 to display a transition graph 33 and a bar graph 34 corresponding to the reference section information selected by the selection unit 121. This allows the speech analysis device 1 to make it easy to compare the section information of the discussion to be analyzed with the reference section information specified by the analyst.
The output unit 125 also transmits, for example, information corresponding to the difference between the section information generated by the generation unit 124 and the reference section information selected by the selection unit 121 to the information terminal 3 or the sound collector 2 as a comparison result. As the difference between the section information and the reference section information, the output unit 125 outputs whether differences in the lengths, number, order, and the like of the sections satisfy predetermined conditions.
In the example of FIG. 6(a), the output unit 125 causes the information terminal 3 to display a message 35 indicating that the length of a first-group (T group) section in the section information is longer than the length of the corresponding first-group section in the reference section information. In the example of FIG. 6(b), the output unit 125 causes the sound collector 2 to output a voice indicating that the length of a second-group (S group) section in the section information is longer than the length of the corresponding second-group section in the reference section information. In this way, the speech analysis device 1 sequentially notifies the analyst of the differences between the section information of the discussion to be analyzed and the reference section information specified by the analyst, so that they can easily be reflected in the ongoing discussion.
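A small sketch of such a real-time difference check follows; the tolerance and the message text are assumptions for illustration.

```python
# Sketch of the real-time difference check: the length of the ongoing
# section is compared with the corresponding reference section and a
# notice is produced when it runs long.
def realtime_notice(current_len_sec, reference_len_sec, trend, tolerance=60):
    if current_len_sec > reference_len_sec + tolerance:
        return f"The current {trend} section is longer than in the model discussion."
    return None  # no notification needed yet
```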
FIG. 7 is a schematic diagram for explaining how the output unit 125 performs post-hoc output of the section information. The output unit 125 performs control for displaying information corresponding to the section information generated by the generation unit 124 on the display unit of the information terminal 3 as shown in FIG. 7.
For example, the output unit 125 transmits information corresponding to the time-series information acquired by the acquisition unit 123 and the section information generated by the generation unit 124 to the information terminal 3. In the example of FIG. 7, the output unit 125 causes the information terminal 3 to display a transition graph 36 corresponding to the time-series information acquired by the acquisition unit 123 and showing the transition of which of the first group and the second group has the larger amount of speech. The output unit 125 may display the section tendencies on the transition graph 36 by drawing each section with a line whose color corresponds to its section tendency.
The output unit 125 also causes the information terminal 3 to display a bar graph 37 corresponding to the section information generated by the generation unit 124 and showing the length of each section and its section tendency. This makes it easy for the analyst to grasp the utterance tendencies of the two groups over the entire discussion and facilitates analysis of the utterance tendencies in the discussion between the two groups.
The output unit 125 also causes the information terminal 3 to display a transition graph 36 and a bar graph 37 corresponding to the reference section information selected by the selection unit 121. This allows the speech analysis device 1 to make it easy to compare the section information of the discussion to be analyzed with the reference section information specified by the analyst.
The output unit 125 also transmits to the information terminal 3, for example, information that associates the section tendency of each of the plurality of sections indicated by the section information generated by the generation unit 124 with the section tendency of each of the plurality of sections indicated by the reference section information selected by the selection unit 121.
In the example of FIG. 7, the output unit 125 causes the information terminal 3 to display transitions 38 of the section tendencies of the plurality of sections for each of the section information generated by the generation unit 124 and the reference section information selected by the selection unit 121. The output unit 125 detects, for example by dynamic programming, the correspondence between the sections indicated by the section information generated by the generation unit 124 and the sections indicated by the reference section information selected by the selection unit 121, and displays increases and decreases in sections, differences in their order, and the like as the transitions 38 of the section tendencies. This allows the speech analysis device 1 to make it easy to understand the differences between the transition of section tendencies in the section information of the discussion to be analyzed and the transition of section tendencies in the reference section information specified by the analyst.
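The patent does not specify the dynamic programming formulation; as an illustrative stand-in, the standard-library sequence matcher below (itself a dynamic-programming-style algorithm) surfaces matching, extra, missing, and differing sections between the two trend sequences.

```python
# Hedged sketch of aligning the analyzed trend sequence with the reference.
import difflib

def align_trends(analyzed, reference):
    """analyzed/reference: trend sequences such as ["T", "even", "S"]."""
    matcher = difflib.SequenceMatcher(a=reference, b=analyzed, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            print(f"matching sections: reference[{i1}:{i2}] <-> analyzed[{j1}:{j2}]")
        elif op == "insert":
            print(f"extra sections in the analyzed discussion: {analyzed[j1:j2]}")
        elif op == "delete":
            print(f"sections missing vs the reference: {reference[i1:i2]}")
        else:  # "replace"
            print(f"differing trends: {reference[i1:i2]} -> {analyzed[j1:j2]}")
```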
The output unit 125 may also output the words uttered in each of the plurality of sections determined by the generation unit 124, in association with that section. In this case, the output unit 125 extracts the words included in the utterances of each section by, for example, applying known speech recognition processing to the speech of that section. The output unit 125 then causes the information terminal 3 to display, for example, each of the plurality of sections in association with some or all of the words extracted for that section. This makes it easier for the analyst to grasp the content of each of the plurality of sections.
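As a rough sketch of this word extraction, assuming a speech-to-text callable `transcribe(audio)` is available (the patent only refers to "known speech recognition processing", so this interface is hypothetical), the most frequent words of each section could be picked as follows:

```python
from collections import Counter

def keywords_per_section(sections, transcribe, top_n=5):
    """Return the most frequent words of each (start, end, audio) section.

    transcribe: hypothetical speech-to-text callable returning a string.
    """
    result = []
    for start, end, audio in sections:
        words = transcribe(audio).split()
        common = [w for w, _ in Counter(words).most_common(top_n)]
        result.append({"start": start, "end": end, "words": common})
    return result
```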
The output unit 125 may also output a characteristic of the entire discussion based on the section information generated by the generation unit 124. In this case, the output unit 125 determines the characteristic of the entire discussion based on the section tendencies of the plurality of sections indicated by that section information. For example, the output unit 125 determines the characteristic of the entire discussion based on the proportions, among the plurality of sections constituting the discussion, of sections in which the first group's utterances are dominant, sections in which the second group's utterances are dominant, and sections in which the utterances of the first group and the second group are antagonistic.
For example, the output unit 125 determines that the discussion is lecture-centered when the proportion of sections in which the T group (the instructor group) mainly speaks is at or above a predetermined value, and determines that the discussion is exercise-centered when the proportion of sections in which the S group (the learner group) mainly speaks is at or above a predetermined value. The output unit 125 causes the information terminal 3 to display a message (report) describing the determined characteristic of the entire discussion. This makes it easier for the analyst to grasp the overall tendency of the discussion as judged from the section tendencies of the plurality of sections.
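A minimal sketch of this judgment, assuming sections arrive as (duration, tendency) pairs and taking 60% as the predetermined value (the labels and the threshold are assumptions; the patent leaves both unspecified):

```python
def characterize_discussion(sections, threshold=0.6):
    """Classify a discussion from the durations and tendencies of its sections.

    sections: list of (duration_seconds, tendency); tendency is "T"
    (instructor group dominant), "S" (learner group dominant), or "EVEN".
    """
    total = sum(d for d, _ in sections)
    if total == 0:
        return "mixed"
    share = lambda label: sum(d for d, t in sections if t == label) / total
    if share("T") >= threshold:
        return "lecture-centered"
    if share("S") >= threshold:
        return "exercise-centered"
    return "mixed"

# S sections cover 2/3 of the time, so this prints "exercise-centered".
print(characterize_discussion([(300, "T"), (1200, "S"), (300, "EVEN")]))
```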
[Flowchart of the speech analysis method]
FIG. 8 is a flowchart of an exemplary speech analysis method executed by the speech analysis device 1 according to the present embodiment. The selection unit 121 selects the reference section information to be compared with the section information (S11). The classification unit 122 classifies the plurality of participants in the discussion under analysis into the first group or the second group (S12). For example, the classification unit 122 receives, on the information terminal 3, a setting of whether each participant belongs to the first group or the second group, or automatically classifies the participants into the first group or the second group based on their attributes.
The subsequent processing is performed either sequentially during the discussion or after the discussion ends. The acquisition unit 123 acquires, from the sound collector 2, the speech uttered by the plurality of participants in the discussion. Based on the speech acquired from the sound collector 2, the acquisition unit 123 identifies the utterance period of each of the plurality of participants (S13). Based on the identified utterance periods, the acquisition unit 123 acquires time-series information indicating the utterance status of each of the first group and the second group for each time (S14).
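One way to picture steps S13 and S14: given per-participant utterance intervals and a group assignment, the per-frame speech amounts of the two groups can be accumulated as below (the one-second frame and the data layout are assumptions; the patent does not fix them):

```python
def build_time_series(utterances, group_of, duration, frame=1.0):
    """Accumulate per-frame speech amounts for the two groups.

    utterances: list of (participant, start_sec, end_sec);
    group_of: dict mapping participant -> 1 or 2.
    Returns one (group1_seconds, group2_seconds) pair per frame.
    """
    n_frames = int(duration / frame)
    series = [[0.0, 0.0] for _ in range(n_frames)]
    for participant, start, end in utterances:
        g = group_of[participant] - 1
        first, last = int(start / frame), min(int(end / frame), n_frames - 1)
        for k in range(first, last + 1):
            f_start, f_end = k * frame, (k + 1) * frame
            # Add the overlap of this utterance with frame k.
            series[k][g] += max(0.0, min(end, f_end) - max(start, f_start))
    return [tuple(s) for s in series]
```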
Based on the time-series information acquired by the acquisition unit 123, the generation unit 124 performs section information generation processing to generate section information that associates each of the plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly speaks in that section (S2). The section information generation processing of step S2 will be described later with reference to FIG. 9.
The output unit 125 outputs the section information generated by the generation unit 124 to at least one of the sound collector 2 and the information terminal 3 (S15).
FIG. 9 is a flowchart of the section information generation processing in the exemplary speech analysis method executed by the speech analysis device 1 according to the present embodiment. Based on the time-series information acquired by the acquisition unit 123, the generation unit 124 generates a transition graph showing transitions of whether the first group or the second group has the larger amount of speech (S21). The generation unit 124 sets the origin (the starting point of the transition graph) as the starting point of a section (S22). The generation unit 124 extracts from the transition graph, in chronological order, one predetermined period (for example, five seconds) as a unit of interest (S23).
The generation unit 124 determines whether the elapsed time from the starting point of the section to the end point of the unit of interest is at or above a predetermined time (S24). If it is not (NO in S25), the generation unit 124 returns to step S23 and repeats the processing for the next unit of interest.
If the elapsed time from the starting point of the section to the end point of the unit of interest is at or above the predetermined time (YES in S25), the generation unit 124 calculates the slope between the coordinates of the starting point of the section and the coordinates of the end point of the unit of interest on the transition graph (S26).
The generation unit 124 determines the section tendency based on the calculated slope (S27). For example, the generation unit 124 determines that the first group's utterances are dominant when the slope is at or below a first reference value, that the utterances of the first group and the second group are antagonistic when the slope is greater than the first reference value and at or below a second reference value, and that the second group's utterances are dominant when the slope is greater than the second reference value. The generation unit 124 adopts this determination result as the section tendency of the section.
If the section tendency of the immediately preceding section and the section tendency of the current section are the same (YES in S28), the generation unit 124 merges the unit of interest into the immediately preceding section (S29), then returns to step S23 and repeats the processing for the next unit of interest.
If the section tendency of the immediately preceding section and the section tendency of the current section differ (NO in S28), the generation unit 124 finalizes the immediately preceding section and its section tendency (S30). If the time-series information has not ended (NO in S31), the generation unit 124 sets the starting point of the unit of interest as the starting point of a new section (S32), returns to step S23, and repeats the processing for the next unit of interest.
When the time-series information has ended (YES in S31), the generation unit 124 generates section information that associates, over all or part of the discussion, each of the plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly speaks in that section, and stores it in the storage unit 11. In the section information, the section tendency in part of the discussion may indicate that the utterances of the first group and the second group are antagonistic.
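Pulling steps S21 through S32 together, the following sketch implements one plausible version of the loop. It assumes the transition graph is a cumulative curve that rises with the second group's share of speech in each second, a 5-second unit of interest, a 60-second minimum section length, and reference values of 0.4 and 0.6; all of these concrete numbers, and the exact bookkeeping at the section boundaries, are assumptions rather than values fixed by the patent.

```python
def generate_sections(series, unit=5, min_len=60, ref1=0.4, ref2=0.6):
    """Segment a discussion into sections labeled by section tendency.

    series: per-second (group1_amount, group2_amount) pairs.
    Returns a list of (start_sec, end_sec, tendency) triples.
    """
    # S21: cumulative transition curve that rises with group 2's speech share.
    y = [0.0]
    for g1, g2 in series:
        total = g1 + g2
        y.append(y[-1] + (g2 / total if total else 0.5))

    def tendency(a, b):                        # S26-S27: slope -> label
        slope = (y[b] - y[a]) / (b - a)
        if slope <= ref1:
            return "G1"                        # first group dominant
        if slope <= ref2:
            return "EVEN"                      # utterances antagonistic
        return "G2"                            # second group dominant

    sections, sec_start = [], 0                # S22: section starts at origin
    prev_label, prev_end, t = None, None, 0
    while t + unit <= len(series):
        t += unit                              # S23: next unit of interest
        if t - sec_start < min_len:            # S24/S25: section still too short
            continue
        label = tendency(sec_start, t)
        if prev_label is None or label == prev_label:
            prev_label, prev_end = label, t    # S29: merge the unit
        else:
            sections.append((sec_start, prev_end, prev_label))  # S30: finalize
            sec_start = t - unit               # S32: new section at unit start
            prev_label, prev_end = None, None
    if prev_label is not None:                 # S31: time series ended
        sections.append((sec_start, prev_end, prev_label))
    elif sec_start < len(series):
        sections.append((sec_start, len(series), tendency(sec_start, len(series))))
    return sections
```

A design note on this sketch: because the slope is always measured from the section's starting point, a long section reacts slowly to a change in speaker balance; the minimum section length trades responsiveness for stability, which matches the flowchart's requirement that every section last at least the predetermined time.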
[Effects of the present embodiment]
According to the speech analysis system S of the present embodiment, the speech analysis device 1 determines, based on the speech of the discussion, a section tendency indicating which of the two groups mainly speaks in each section of the discussion, and notifies the analyst of the sections and their section tendencies in association with each other. In this way, the speech analysis system S makes it easy for the analyst to grasp the utterance tendencies of the two groups, which facilitates analysis of the utterance tendencies in the discussion between the two groups.
[First modification]
The grouping of the plurality of participants may be changed in the middle of a discussion. In this modification, the speech analysis device 1 changes the participants belonging to the first group and the second group between a plurality of periods in the discussion, and generates section information based on the changed grouping.
The classification unit 122 changes the participants belonging to the first group and the second group between a plurality of periods in the discussion. For example, the classification unit 122 may receive in advance, on the information terminal 3, a setting of the structure of the discussion (a commentary period, an exercise period, and so on) and change the participants belonging to the first group and the second group at the timing when that structure changes. For example, the classification unit 122 places the teacher in the first group and the students in the second group during the commentary period, while during the exercise period it places some students in the first group and other students in the second group.
The classification unit 122 may also detect the positions of the plurality of participants by, for example, applying known image recognition processing to captured images acquired by a camera or the like, and change the participants belonging to the first group and the second group at the timing when those positions change. For example, the classification unit 122 places participants sitting in specific seats in the first group and participants sitting in other seats in the second group.
The classification unit 122 may also change the parent groups, each containing a first group and a second group, according to the grouping that differs from period to period. That is, for each of the plurality of periods in the discussion, the classification unit 122 generates a first parent group containing a first group and a second group into which some of the plurality of participants are classified, and a second parent group containing a first group and a second group into which some of the participants not belonging to the first parent group are classified. The classification unit 122 may generate three or more parent groups.
The classification unit 122 may also generate, for at least one period in the discussion, a parent group containing all of the plurality of participants (for example, the entire classroom). That is, the classification unit 122 may generate a first parent group containing a first group and a second group into which some of the plurality of participants are classified, and a second parent group containing a first group and a second group into which some of the participants not belonging to the first parent group are classified, and may further generate a third parent group containing a first group and a second group into which the participants belonging to the first parent group and the second parent group are classified.
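One way to represent such period-dependent, hierarchical groupings is a per-period mapping like the one below; the structure, identifiers, and times are assumptions for illustration, not a format defined in the patent.

```python
# Hypothetical grouping schedule for a 60-minute class (times in minutes).
grouping_schedule = [
    {   # 0-10 min: the whole classroom forms one parent group.
        "period": (0, 10),
        "parents": {
            "classroom": {"group1": ["teacher"], "group2": ["s1", "s2", "s3", "s4"]},
        },
    },
    {   # 10-50 min: two tables, each its own parent group.
        "period": (10, 50),
        "parents": {
            "table1": {"group1": ["s1"], "group2": ["s2"]},
            "table2": {"group1": ["s3"], "group2": ["s4"]},
        },
    },
    {   # 50-60 min: back to the whole classroom.
        "period": (50, 60),
        "parents": {
            "classroom": {"group1": ["teacher"], "group2": ["s1", "s2", "s3", "s4"]},
        },
    },
]
```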
The generation unit 124 generates section information for each of the plurality of parent groups, and the output unit 125 outputs the section information of each of the plurality of parent groups. FIG. 10 is a schematic diagram for explaining how the output unit 125 outputs the section information in this modification. The output unit 125 performs control to display the information corresponding to the section information of each of the plurality of parent groups generated by the generation unit 124 on the display unit of the information terminal 3, as shown in FIG. 10.
For example, the output unit 125 outputs the section information of the first parent group and the section information of the second parent group at the same time. In the example of FIG. 10, the output unit 125 displays the section information of three parent groups, including the first parent group and the second parent group, side by side for the period from 10 to 50 minutes. This gives the analyst an overview of the section information of the plurality of parent groups (a plurality of tables and the like).
The output unit 125 also outputs, for example, the section information of at least one of the first parent group and the second parent group together with the section information of the third parent group. In the example of FIG. 10, the output unit 125 displays the section information of the three parent groups, including the first parent group and the second parent group, side by side for the period from 10 to 50 minutes, and further displays the section information of the third parent group, which corresponds to all of the participants, for the periods from 0 to 10 minutes and from 50 to 60 minutes. In this way, when the grouping of the plurality of participants is changed in the middle of the discussion, the speech analysis device 1 can provide the analyst with section information that follows the grouping of each period. The speech analysis device 1 also makes it easy to analyze, hierarchically, a parent group corresponding to all of the participants (the entire classroom and the like) and a plurality of parent groups corresponding to subdivisions of the participants (a plurality of tables and the like).
[Second modification]
In the middle of a discussion, a specific participant such as an instructor may move between a plurality of tables and join different groups. In this modification, the speech analysis device 1 changes the grouping based on the position of the specific participant and generates section information based on the changed grouping.
FIG. 11 is a schematic diagram for explaining how the generation unit 124 generates the section information in this modification. The classification unit 122 estimates the position, during the discussion, of the instructor, who is a specific participant. The classification unit 122 may estimate which sound collector 2 the instructor is near by, for example, comparing the speech that the acquisition unit 123 acquired from the plurality of sound collectors 2 with pre-registered features of the instructor's voice (a voiceprint or the like). The classification unit 122 may also estimate which sound collector 2 the instructor is near based on the strength of short-range wireless communication between a communication device (a smartphone or the like) carried by the instructor and the plurality of sound collectors 2.
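As a rough sketch of the signal-strength variant, assuming each sound collector reports one RSSI reading for the instructor's device (this measurement interface is an assumption; the patent does not specify one):

```python
def nearest_collector(rssi_by_collector):
    """Pick the sound collector with the strongest signal (closest to 0 dBm).

    rssi_by_collector: dict mapping collector id -> RSSI in dBm (negative).
    """
    return max(rssi_by_collector, key=rssi_by_collector.get)

# The instructor's phone is heard loudest by table 2's collector.
print(nearest_collector({"table1": -72, "table2": -48, "table3": -80}))
```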
The classification unit 122 is not limited to the specific methods described here, and may estimate the position of the instructor during the discussion using other methods. The example of FIG. 11 shows the instructor moving to table 1, table 2, and table 3 in that order.
Based on the estimated position of the instructor, the classification unit 122 changes, between a plurality of periods in the discussion, the participants belonging to the first group, to which the instructor belongs, and the participants belonging to the second group. In the example of FIG. 11, while the instructor is at table 1, the classification unit 122 generates a first group containing the instructor and a second group containing the students at table 1; while the instructor is at table 2, it generates a first group containing the instructor and a second group containing the students at table 2; and while the instructor is at table 3, it generates a first group containing the instructor and a second group containing the students at table 3.
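Continuing the sketch, the estimated position timeline can be turned into per-period groupings as below; the table and student identifiers are hypothetical.

```python
def groups_from_positions(position_timeline, students_at_table, instructor="teacher"):
    """Turn the instructor's (start, end, table) timeline into per-period groups.

    students_at_table: dict mapping table id -> list of student ids.
    """
    return [
        {"period": (start, end),
         "group1": [instructor],               # the group the instructor joins
         "group2": students_at_table[table]}   # the students at that table
        for start, end, table in position_timeline
    ]

print(groups_from_positions(
    [(0, 20, "table1"), (20, 40, "table2"), (40, 60, "table3")],
    {"table1": ["s1", "s2"], "table2": ["s3", "s4"], "table3": ["s5", "s6"]},
))
```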
The generation unit 124 generates section information for each of the plurality of periods whose grouping was changed, and combines the generated section information. In this way, when a specific participant such as an instructor moves during the discussion, the speech analysis device 1 generates section information that follows the grouping changed according to the position of that specific participant, making it easy to analyze the utterance tendencies centered on that participant.
Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in the above embodiments, and various modifications and changes are possible within the scope of its gist. For example, all or part of the device can be functionally or physically distributed or integrated in arbitrary units. New embodiments arising from arbitrary combinations of a plurality of embodiments are also included in the embodiments of the present invention. The effect of a new embodiment arising from such a combination has the effects of the original embodiments together.
The processor of the speech analysis device 1 is the agent of each step (process) included in the speech analysis methods shown in FIGS. 8 and 9. That is, the processor of the speech analysis device 1 reads a program for executing the speech analysis methods shown in FIGS. 8 and 9 from the storage unit 11 and executes the program to control each unit of the speech analysis device 1, thereby executing the speech analysis methods shown in FIGS. 8 and 9. Some of the steps included in the speech analysis methods shown in FIGS. 8 and 9 may be omitted, the order of the steps may be changed, and a plurality of steps may be performed in parallel.
S Speech analysis system
1 Speech analysis device
11 Storage unit
12 Control unit
121 Selection unit
122 Classification unit
123 Acquisition unit
124 Generation unit
125 Output unit
2 Sound collector
3 Information terminal

Claims (15)

1. A speech analysis device comprising:
an acquisition unit that acquires time-series information indicating, for each time, the utterance status of each of a first group and a second group in speech uttered in a discussion by participants belonging to the first group and participants belonging to the second group;
a generation unit that generates, for all or part of the discussion, section information that associates each of a plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly speaks in that section; and
an output unit that outputs the section information.

2. The speech analysis device according to claim 1, wherein the section tendency indicates which of the first group and the second group mainly speaks, or that the utterances of the first group and the second group are antagonistic.

3. The speech analysis device according to claim 1 or 2, wherein the generation unit determines each of the plurality of sections so as to last at least a predetermined time, and determines the section tendency of each section by comparing the utterance statuses of the first group and the second group in that section.

4. The speech analysis device according to any one of claims 1 to 3, wherein the time-series information is information indicating, for each predetermined time frame in the period from the start to the end of the discussion, which of the first group and the second group has the larger amount of speech.

5. The speech analysis device according to any one of claims 1 to 4, further comprising a classification unit that classifies a plurality of participants into the first group and the second group.

6. The speech analysis device according to claim 5, wherein the classification unit changes the participants belonging to the first group and the second group between a plurality of periods in the discussion.

7. The speech analysis device according to claim 5 or 6, wherein the classification unit generates a first parent group containing the first group and the second group into which some of the plurality of participants are classified, and a second parent group containing the first group and the second group into which some of the plurality of participants not belonging to the first parent group are classified, and
the output unit simultaneously outputs the section information of the first parent group and the section information of the second parent group.

8. The speech analysis device according to claim 5 or 6, wherein the classification unit generates a first parent group containing the first group and the second group into which some of the plurality of participants are classified, and a second parent group containing the first group and the second group into which some of the plurality of participants not belonging to the first parent group are classified, and further generates a third parent group containing the first group and the second group into which the participants belonging to the first parent group and the second parent group are classified, and
the output unit outputs the section information of at least one of the first parent group and the second parent group, and the section information of the third parent group.

9. The speech analysis device according to any one of claims 5 to 8, wherein the classification unit changes, based on the position of a specific participant, the participants belonging to the first group, to which the specific participant belongs, and the participants belonging to the second group.

10. The speech analysis device according to any one of claims 1 to 9, wherein the output unit outputs the words included in the utterances of each of the plurality of sections, extracted by performing speech recognition processing on the speech, in association with that section.

11. The speech analysis device according to any one of claims 1 to 10, wherein the output unit outputs a characteristic of the entire discussion based on the section tendencies of the plurality of sections constituting the discussion.

12. The speech analysis device according to any one of claims 1 to 11, further comprising a selection unit that selects reference section information to be compared with the section information,
wherein the output unit outputs a result of comparing the section information with the reference section information.

13. The speech analysis device according to claim 12, wherein the output unit outputs, during the discussion, information corresponding to the difference between the section information and the reference section information as the comparison result.

14. The speech analysis device according to claim 12 or 13, wherein the output unit outputs, after the discussion, the section tendency of each of the plurality of sections indicated by the section information in association with the section tendency of each of the plurality of sections indicated by the reference section information.

15. A speech analysis method executed by a processor, comprising:
a step of acquiring time-series information indicating, for each time, the utterance status of each of a first group and a second group in speech uttered in a discussion by participants belonging to the first group and participants belonging to the second group;
a step of generating, for all or part of the discussion, section information that associates each of a plurality of sections constituting the discussion with a section tendency indicating which of the first group and the second group mainly speaks in that section; and
a step of outputting the section information.
PCT/JP2021/040443 2021-11-02 2021-11-02 Voice analysis device and voice analysis method WO2023079602A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/040443 WO2023079602A1 (en) 2021-11-02 2021-11-02 Voice analysis device and voice analysis method


Publications (1)

Publication Number Publication Date
WO2023079602A1 true WO2023079602A1 (en) 2023-05-11

Family

ID=86240796

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/040443 WO2023079602A1 (en) 2021-11-02 2021-11-02 Voice analysis device and voice analysis method

Country Status (1)

Country Link
WO (1) WO2023079602A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015207806A (en) * 2014-04-17 2015-11-19 コニカミノルタ株式会社 Remote conference support system and remote conference support program
JP2018073096A (en) * 2016-10-28 2018-05-10 シャープ株式会社 Information display device


Similar Documents

Publication Publication Date Title
CN106847263B (en) Speech level evaluation method, device and system
CN110991381A (en) Real-time classroom student state analysis and indication reminding system and method based on behavior and voice intelligent recognition
CN110910691B (en) Personalized course generation method and system
JP2018205638A (en) Concentration ratio evaluation mechanism
Ahrens et al. Listening and conversational quality of spatial audio conferencing
Dias et al. Visual influences on interactive speech alignment
Raveh et al. Three's a Crowd? Effects of a Second Human on Vocal Accommodation with a Voice Assistant.
US10580434B2 (en) Information presentation apparatus, information presentation method, and non-transitory computer readable medium
CN110930781A (en) Recording and broadcasting system
JP2020148931A (en) Discussion analysis device and discussion analysis method
Niebuhr et al. Virtual reality as a digital learning tool in entrepreneurship: How virtual environments help entrepreneurs give more charismatic investor pitches
CN111479124A (en) Real-time playing method and device
US11132913B1 (en) Computer-implemented systems and methods for acquiring and assessing physical-world data indicative of avatar interactions
WO2023079602A1 (en) Voice analysis device and voice analysis method
US20230093298A1 (en) Voice conference apparatus, voice conference system and voice conference method
JP6589042B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
Niebuhr et al. PASCAL and DPA: A pilot study on using prosodic competence scores to predict communicative skills for team working and public speaking
DE602004004824T2 (en) Automatic treatment of conversation groups
JP2020173415A (en) Teaching material presentation system and teaching material presentation method
CN111698452A (en) Online group state feedback method, system and device
JP6589040B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
Urbain et al. AVLaughterCycle: An audiovisual laughing machine
Liu et al. Design of Voice Style Detection of Lecture Archives
Van Helvert et al. Observing, coaching and reflecting: A multi-modal natural language-based dialogue system in a learning context
Hahm Convergence research on the speaker's voice perceived by listener, and suggestions for future research application

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21963200

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023557870

Country of ref document: JP