WO2019142232A1 - Voice analysis device, voice analysis method, voice analysis program, and voice analysis system - Google Patents

Voice analysis device, voice analysis method, voice analysis program, and voice analysis system Download PDF

Info

Publication number
WO2019142232A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
unit
participant
participants
analysis
Prior art date
Application number
PCT/JP2018/000943
Other languages
French (fr)
Japanese (ja)
Inventor
武志 水本
哲也 菅原
Original Assignee
ハイラブル株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ハイラブル株式会社
Priority to PCT/JP2018/000943 (WO2019142232A1)
Priority to JP2018502280A (JP6589041B1)
Publication of WO2019142232A1

Links

Images

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use, for comparison or discrimination

Definitions

  • the present invention relates to a voice analysis device for analyzing voice, a voice analysis method, a voice analysis program, and a voice analysis system.
  • the Harkness method (also referred to as the Harkness Method) is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1).
  • in the Harkness method, the transitions of each participant's utterances are recorded as lines. This makes it possible to analyze each participant's contribution to the discussion and their relationships with the other participants.
  • the Harkness method can also be applied effectively to active learning, in which students take the initiative in their own learning.
  • the present invention has been made in view of these points, and its object is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system capable of reducing the time and effort required to set the positions of participants when analyzing the voices of a discussion.
  • the voice analysis device according to the first aspect includes: a setting unit that acquires information on a plurality of participants from a sound collection device and sets the position of each of the plurality of participants based on the acquired information on the participants; an acquisition unit that acquires voice from the sound collection device; and an analysis unit that analyzes the voice uttered by each of the plurality of participants based on the positions set by the setting unit.
  • the setting unit may set the position of each of the plurality of participants by acquiring voice from the sound collection device as the information on the participants and specifying the direction from which the acquired voice was emitted.
  • the setting unit may set the position of each of the plurality of participants by acquiring, as the information on the participants, an image captured by an imaging unit provided on the sound collection device and recognizing the faces of the plurality of participants included in the acquired image.
  • the setting unit may acquire, as the information on the participants, information from a card read by a reading unit provided on the sound collection device, and set the position of each of the plurality of participants according to the direction in which the card was presented to the reading unit.
  • the setting unit may set the position of each of the plurality of participants based on information input at a communication terminal, in addition to the information on the participants.
  • the voice analysis device may further include a tracking unit that updates the positions set by the setting unit while the voice is being analyzed by the analysis unit.
  • the tracking unit may update the positions set by the setting unit when the direction from which the voice being analyzed by the analysis unit was emitted does not correspond to any of the positions set by the setting unit.
  • the tracking unit may update a position set by the setting unit to the direction from which the voice being analyzed by the analysis unit was emitted.
  • in the voice analysis method of the second aspect, a processor executes the steps of: acquiring information on a plurality of participants from the sound collection device and setting the position of each of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
  • the voice analysis program of the third aspect causes a computer to execute the steps of: acquiring information on a plurality of participants from a sound collection device and setting the positions of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
  • a voice analysis system of the fourth aspect includes a voice analysis device and a sound collection device capable of communicating with the voice analysis device. The sound collection device is configured to acquire voice and to acquire information on a plurality of participants. The voice analysis device acquires the information on the participants from the sound collection device and sets the positions of the plurality of participants based on the acquired information on the participants.
  • FIG. 1 is a schematic view of a speech analysis system S according to the present embodiment.
  • the voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20.
  • the number of sound collectors 10 and communication terminals 20 included in the speech analysis system S is not limited.
  • the voice analysis system S may include devices such as other servers and terminals.
  • the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least a part of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without the network N.
  • the sound collector 10 includes a microphone array including a plurality of sound collectors (microphones) arranged in different orientations.
  • the microphone array includes eight microphones equally spaced on the same circumference in the horizontal plane with respect to the ground.
  • the sound collection device 10 transmits the voice acquired using the microphone array to the voice analysis device 100 as data.
  • the communication terminal 20 is a communication device capable of performing wired or wireless communication.
  • the communication terminal 20 is, for example, a portable terminal such as a smart phone terminal or a computer terminal such as a personal computer.
  • the communication terminal 20 receives the setting of analysis conditions from the analyst and displays the analysis result by the voice analysis device 100.
  • the voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 by a voice analysis method described later. Further, the voice analysis device 100 transmits the result of the voice analysis to the communication terminal 20.
  • FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. Arrows in FIG. 2 indicate the main data flow, and there may be data flows not shown in FIG. In FIG. 2, each block is not a hardware (apparatus) unit configuration but a function unit configuration. As such, the blocks shown in FIG. 2 may be implemented in a single device or may be implemented separately in multiple devices. Transfer of data between the blocks may be performed via any means such as a data bus, a network, a portable storage medium, and the like.
  • the sound collection device 10 includes an imaging unit 11 for imaging a participant in the discussion, and a reading unit 12 for reading information such as a card presented by the participant in the discussion.
  • the imaging unit 11 is an imaging device capable of imaging a predetermined imaging range including the face of each participant (that is, each of a plurality of participants).
  • the imaging unit 11 includes imaging elements of the number and arrangement capable of imaging the faces of all the participants surrounding the sound collection device 10. For example, the imaging unit 11 includes two imaging elements arranged in different orientations of 180 degrees in a horizontal plane with respect to the ground.
  • alternatively, the imaging unit 11 may capture the faces of all the participants surrounding the sound collection device 10 by rotating in a horizontal plane with respect to the ground.
  • the imaging unit 11 may perform imaging at a timing (for example, every 10 seconds) set in advance in the sound collection device 10, or may perform imaging in accordance with an imaging instruction received from the voice analysis device 100.
  • the imaging unit 11 transmits an image indicating the imaged content to the voice analysis device 100.
  • the reading unit 12 has a reading device (card reader) that reads information recorded on an IC (Integrated Circuit) card or a magnetic card (hereinafter collectively referred to as a card) presented by a participant, using a contact or non-contact method.
  • An IC chip incorporated in a smartphone or the like may be used as an IC card.
  • the reading unit 12 is configured to be able to specify the orientation of the participant who presented the card.
  • the reading unit 12 includes twelve reading devices arranged in different directions every 30 degrees in a horizontal plane with respect to the ground.
  • the reading unit 12 may be provided with a button for specifying the orientation of the participant in addition to the reading device.
  • when a card is presented by a participant, the reading unit 12 reads the information on the card with one of its reading devices and identifies the orientation of the participant based on which reading device read the card. Then, the reading unit 12 associates the read information with the orientation of the participant and transmits them to the voice analysis device 100.
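A minimal sketch of this orientation lookup follows. The reader count and 30-degree spacing come from the example above, while the card payload and the dictionary of participant directions are illustrative assumptions rather than details fixed by the patent.

```python
# Minimal sketch: map the index of the reading device that read a card to a
# participant direction, assuming 12 readers spaced every 30 degrees.
# The card ID and the participant registry are illustrative assumptions.

READER_COUNT = 12
DEGREES_PER_READER = 360 // READER_COUNT  # 30 degrees


def reader_index_to_azimuth(reader_index: int) -> int:
    """Return the azimuth (0-359 degrees) of the reader that read the card."""
    if not 0 <= reader_index < READER_COUNT:
        raise ValueError("unknown reader index")
    return reader_index * DEGREES_PER_READER


def register_participant(positions: dict, card_id: str, reader_index: int) -> None:
    """Store the participant's direction, keyed by the ID read from the card."""
    positions[card_id] = reader_index_to_azimuth(reader_index)


positions: dict[str, int] = {}
register_participant(positions, card_id="U1", reader_index=3)
print(positions)  # {'U1': 90} -> participant U1 sits at 90 degrees
```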
  • the communication terminal 20 has a display unit 21 for displaying various information, and an operation unit 22 for receiving an operation by an analyst.
  • the display unit 21 includes a display device such as a liquid crystal display or an organic light emitting diode (OLED) display.
  • the operation unit 22 includes operation members such as a button, a switch, and a dial.
  • the display unit 21 and the operation unit 22 may be integrally configured by using a touch screen capable of detecting the position of contact by the analyst as the display unit 21.
  • the voice analysis device 100 includes a control unit 110, a communication unit 120, and a storage unit 130.
  • the control unit 110 includes a position setting unit 111, an audio acquisition unit 112, a sound source localization unit 113, a tracking unit 114, an analysis unit 115, and an output unit 116.
  • the storage unit 130 includes a position storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
  • the communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N.
  • the communication unit 120 includes a processor for performing communication, a connector, an electric circuit, and the like.
  • the communication unit 120 performs predetermined processing on a communication signal received from the outside to acquire data, and inputs the acquired data to the control unit 110. Further, the communication unit 120 performs predetermined processing on the data input from the control unit 110 to generate a communication signal, and transmits the generated communication signal to the outside.
  • the storage unit 130 is a storage medium including a read only memory (ROM), a random access memory (RAM), a hard disk drive, and the like.
  • the storage unit 130 stores in advance a program to be executed by the control unit 110.
  • the storage unit 130 may be provided outside the voice analysis device 100, and in this case, data may be exchanged with the control unit 110 via the communication unit 120.
  • the position storage unit 131 stores information indicating the positions of participants in the discussion.
  • the voice storage unit 132 stores the voice acquired by the sound collection device 10.
  • the analysis result storage unit 133 stores an analysis result indicating the result of analyzing the voice.
  • the position storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may be storage areas on the storage unit 130, or may be databases configured on the storage unit 130.
  • the control unit 110 is, for example, a processor such as a central processing unit (CPU), and executes the program stored in the storage unit 130 to function as the position setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the tracking unit 114, the analysis unit 115, and the output unit 116.
  • the functions of the position setting unit 111, the sound acquisition unit 112, the sound source localization unit 113, the tracking unit 114, the analysis unit 115, and the output unit 116 will be described later with reference to FIGS.
  • At least a part of the functions of the control unit 110 may be performed by an electrical circuit.
  • at least a part of the functions of the control unit 110 may be executed by a program executed via a network.
  • the speech analysis system S is not limited to the specific configuration shown in FIG.
  • the voice analysis device 100 is not limited to one device, and may be configured by connecting two or more physically separated devices in a wired or wireless manner.
  • FIG. 3 is a schematic view of the speech analysis method performed by the speech analysis system S according to the present embodiment.
  • the position setting unit 111 of the voice analysis device 100 sets the position of each participant in the discussion to be analyzed by the position setting processing described later (a).
  • the position setting unit 111 sets the position of each participant by storing the position of each participant specified in the position setting processing described later in the position storage unit 131.
  • the audio acquisition unit 112 of the audio analysis device 100 transmits a signal instructing acquisition of audio to the sound collection device 10 when starting acquisition of audio (b).
  • when the sound collection device 10 receives the signal instructing acquisition of voice from the voice analysis device 100, it starts acquiring voice.
  • when the voice acquisition unit 112 of the voice analysis device 100 ends the acquisition of voice, it transmits a signal instructing the end of voice acquisition to the sound collection device 10.
  • when the sound collection device 10 receives the signal instructing the end of voice acquisition from the voice analysis device 100, it ends the acquisition of voice.
  • the sound collection device 10 acquires voices in each of a plurality of sound collection units, and internally records the sound as the sound of each channel corresponding to each sound collection unit. Then, the sound collection device 10 transmits the acquired voices of the plurality of channels to the voice analysis device 100 (c). The sound collector 10 may transmit the acquired voice sequentially or may transmit a predetermined amount or a predetermined time of sound. Further, the sound collection device 10 may collectively transmit the sound from the start to the end of the acquisition.
  • the voice acquisition unit 112 of the voice analysis device 100 receives voice from the sound collection device 10 and stores the voice in the voice storage unit 132.
  • the voice analysis device 100 analyzes voice at predetermined timing using the voice acquired from the sound collection device 10.
  • the voice analysis device 100 may analyze the voice when the analyst gives an analysis instruction at the communication terminal 20 by a predetermined operation. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from the voices stored in the voice storage unit 132.
  • the voice analysis device 100 may also analyze the voice when the acquisition of voice ends. In this case, the voice from the start to the end of the acquisition corresponds to the discussion to be analyzed. In addition, the voice analysis device 100 may analyze the voice sequentially (that is, by real-time processing) while the voice is being acquired. In this case, the voice going back a predetermined time (for example, 30 seconds) from the current time corresponds to the discussion to be analyzed.
  • when analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the plural channels of voice acquired by the voice acquisition unit 112 (d). Sound source localization is a process of estimating, for each time (for example, every 10 to 100 milliseconds), the direction of the sound source included in the voice acquired by the voice acquisition unit 112. The sound source localization unit 113 associates the direction of the sound source estimated for each time with the positions of the participants stored in the position storage unit 131.
  • to identify the direction of the sound source from the voice acquired from the sound collection device 10, the sound source localization unit 113 can use a known sound source localization method such as the Multiple Signal Classification (MUSIC) method or the beamforming method.
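The patent relies on known methods such as MUSIC or beamforming for the localization itself; the sketch below only illustrates the subsequent association step, matching an estimated per-frame azimuth to the nearest stored participant direction. The 20-degree tolerance and the dictionary layout are illustrative assumptions, not values taken from the patent.

```python
# Minimal sketch: associate a per-frame sound-source azimuth (from sound source
# localization) with the nearest registered participant position.
# The 20-degree tolerance and the dict layout are illustrative assumptions.

def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def associate_direction(azimuth: float, positions: dict[str, float],
                        tolerance: float = 20.0) -> str | None:
    """Return the participant whose stored direction is closest to the
    estimated azimuth, or None if nobody is within the tolerance."""
    best_id, best_diff = None, tolerance
    for participant_id, direction in positions.items():
        diff = angular_difference(azimuth, direction)
        if diff <= best_diff:
            best_id, best_diff = participant_id, diff
    return best_id


positions = {"U1": 0.0, "U2": 90.0, "U3": 180.0, "U4": 270.0}
print(associate_direction(95.0, positions))  # 'U2'
```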
  • the analysis unit 115 analyzes the voice based on the voice acquired by the voice acquisition unit 112, the direction of the sound source estimated by the sound source localization unit 113, and the position of the participant stored in the position storage unit 131 (e).
  • the analysis unit 115 may analyze the entire completed discussion as an analysis target, or may analyze a part of the discussion in the case of real-time processing.
  • the analysis unit 115 first determines, based on the voice acquired by the voice acquisition unit 112, the direction of the sound source estimated by the sound source localization unit 113, and the positions of the participants stored in the position storage unit 131, which participant was speaking in the discussion at each time (for example, every 10 to 100 milliseconds).
  • the analysis unit 115 specifies, as a speech period, a continuous period from when one participant starts speaking until that participant stops, and stores it in the analysis result storage unit 133. When a plurality of participants speak at the same time, the analysis unit 115 specifies a speech period for each participant.
  • the analysis unit 115 also calculates the amount of speech of each participant for each time and stores it in the analysis result storage unit 133. Specifically, for a certain time window (for example, 5 seconds), the analysis unit 115 calculates the amount of speech per unit time as the length of time during which the participant was speaking divided by the length of the time window. The analysis unit 115 then repeats this calculation for each participant while shifting the time window by a predetermined time (for example, one second) from the start time of the discussion to its end time (or to the current time in the case of real-time processing).
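To make the sliding-window calculation concrete, here is a minimal sketch; representing speech periods as (speaker, start, end) tuples is an assumption for illustration, while the 5-second window and 1-second step follow the example values above.

```python
# Minimal sketch: compute each participant's amount of speech per time window.
# Speech periods are assumed to be (speaker_id, start_sec, end_sec) tuples
# produced by the preceding per-frame speaker determination.

def overlap(period: tuple[float, float], window: tuple[float, float]) -> float:
    """Length in seconds of the intersection of a speech period and a window."""
    start = max(period[0], window[0])
    end = min(period[1], window[1])
    return max(0.0, end - start)


def speech_amounts(periods: list[tuple[str, float, float]],
                   total_sec: float,
                   window_sec: float = 5.0,
                   step_sec: float = 1.0) -> list[dict[str, float]]:
    """For each window position, return {participant: fraction of the window spent speaking}."""
    results = []
    t = 0.0
    while t + window_sec <= total_sec:
        window = (t, t + window_sec)
        amounts: dict[str, float] = {}
        for speaker, start, end in periods:
            amounts[speaker] = amounts.get(speaker, 0.0) + overlap((start, end), window)
        results.append({s: v / window_sec for s, v in amounts.items()})
        t += step_sec
    return results


periods = [("U1", 0.0, 3.0), ("U2", 2.0, 6.0)]
print(speech_amounts(periods, total_sec=10.0)[0])  # {'U1': 0.6, 'U2': 0.6}
```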
  • the tracking unit 114 acquires the latest position of each participant at predetermined time intervals within the voice being analyzed by the sound source localization unit 113 and the analysis unit 115, by the follow-up processing described later, and updates the positions of the participants stored in the position storage unit 131. Thereby, even if a participant moves from the previously set position while the voice is being acquired, the sound source localization unit 113 and the analysis unit 115 can follow the participant and analyze the voice.
  • the output unit 116 performs control to display the analysis result by the analysis unit 115 on the display unit 21 by transmitting the display information to the communication terminal 20 (f).
  • the output unit 116 is not limited to the display on the display unit 21 and may output the analysis result by other methods such as printing by a printer, data recording to a storage device, and the like.
  • FIG. 4 is a diagram showing a flowchart of the entire speech analysis method performed by the speech analysis apparatus 100 according to the present embodiment.
  • the position setting unit 111 specifies the position of each participant in the discussion to be analyzed by the position setting processing described later, and stores the positions in the position storage unit 131 (S1).
  • the voice acquisition unit 112 obtains a voice from the sound collection device 10 and stores the voice in the voice storage unit 132 (S2).
  • the voice analysis device 100 analyzes the voice acquired by the voice acquisition unit 112 in step S2 for each predetermined time range (time window) from the start time to the end time.
  • the sound source localization unit 113 executes sound source localization in the time range of the voice to be analyzed, and associates the estimated direction of the sound source with the position of each participant stored in the position storage unit 131 (S3).
  • the tracking unit 114 acquires the latest position of each participant in the time range of the voice to be analyzed, by the follow-up processing described later, and updates the positions of the participants stored in the position storage unit 131 (S4).
  • the analysis unit 115 analyzes the voice based on the voice acquired by the voice acquisition unit 112 in step S2, the direction of the sound source estimated by the sound source localization unit 113 in step S3, and the positions of the participants stored in the position storage unit 131 (S5).
  • the analysis unit 115 stores the analysis result in the analysis result storage unit 133.
  • if the analysis has not yet reached the end time of the voice (NO in S6), the voice analysis device 100 repeats steps S3 to S5 for the next time range of the voice to be analyzed. When the analysis is completed up to the end time of the voice acquired by the voice acquisition unit 112 in step S2 (YES in S6), the output unit 116 outputs the analysis result of step S5 by a predetermined method (S7).
  • FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A.
  • the position setting processing includes manual setting processing, in which the analyst sets the position of each participant by operating the communication terminal 20, and automatic setting processing, in which each participant inputs information for specifying his or her position at the sound collection device 10.
  • the communication terminal 20 displays the setting screen A on the display unit 21 and receives the setting of the analysis condition by the analyst.
  • the setting screen A includes a position setting area A1, a start button A2, an end button A3, and an automatic setting button A4.
  • the position setting area A1 is an area for setting the direction in which each participant U is actually located in the discussion to be analyzed, with the sound collection device 10 as the reference.
  • the position setting area A1 represents a circle centered on the position of the sound collector 10, and further represents an angle based on the sound collector 10 along the circle.
  • the analyst who desires the manual setting process sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20.
  • identification information (here, U1 to U4) for identifying each participant U is assigned to each participant set in the position setting area A1 and is displayed.
  • in the example of FIG. 5, four participants U1 to U4 are set.
  • the portion corresponding to each participant U in the positioning area A1 is displayed in a different color for each participant. Thereby, the analyst can easily recognize the direction in which each participant U is set.
  • the start button A2, the end button A3 and the automatic setting button A4 are virtual buttons displayed on the display unit 21 respectively.
  • the communication terminal 20 transmits a signal of a start instruction to the voice analysis device 100 when the analyst presses the start button A2.
  • the communication terminal 20 transmits a signal of a termination instruction to the voice analysis device 100 when the analyst presses the termination button A3.
  • the period from the start instruction to the end instruction by the analyst is treated as one discussion.
  • An analyst who desires the automatic setting process causes the voice analysis device 100 to start the automatic setting process by pressing the automatic setting button A4.
  • the communication terminal 20 transmits an automatic setting instruction signal to the voice analysis device 100.
  • FIGS. 6A to 6C are schematic views of the automatic setting process performed by the voice analysis device 100 according to the present embodiment.
  • the voice analysis device 100 sets the position of the participant U by at least one of the processes shown in FIGS. 6 (a) to 6 (c).
  • FIG. 6A shows a process of setting the position of the participant U based on the voice uttered by the participant U.
  • the position setting unit 111 of the voice analysis device 100 causes the sound collection unit of the sound collection device 10 to acquire the voice emitted by each participant U.
  • the position setting unit 111 acquires the sound acquired by the sound collection device 10.
  • the position setting unit 111 specifies the direction of each participant U based on the direction in which the acquired voice is emitted.
  • the position setting unit 111 uses the result of sound source localization by the above-described sound source localization unit 113 in order to specify the direction of the participant from the voice. Then, the position setting unit 111 causes the position storage unit 131 to store the positions of the identified participants U.
  • the position setting unit 111 may also identify each participant U as an individual by comparing the acquired voice of each participant U with voices of individuals stored in advance in the voice analysis device 100. For example, the position setting unit 111 identifies the individual by comparing voiceprints (that is, the frequency spectra) of the voices of the participants U. As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
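As a rough illustration of such a voiceprint comparison, the sketch below treats the average magnitude spectrum as a simple voiceprint and compares it with cosine similarity; the feature choice and the 0.8 threshold are illustrative assumptions (real speaker identification typically uses richer features), not details given in the patent.

```python
# Minimal sketch: compare a captured voice against enrolled voices by treating
# the average magnitude spectrum as a simple "voiceprint" and using cosine
# similarity. The feature and threshold are illustrative, not from the patent.
import numpy as np


def voiceprint(samples: np.ndarray, frame: int = 1024) -> np.ndarray:
    """Average magnitude spectrum over fixed-length frames."""
    n_frames = len(samples) // frame
    frames = samples[: n_frames * frame].reshape(n_frames, frame)
    return np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def identify(sample: np.ndarray, enrolled: dict[str, np.ndarray],
             threshold: float = 0.8) -> str | None:
    """Return the enrolled participant whose voiceprint is most similar, if any."""
    vp = voiceprint(sample)
    best_id, best_sim = None, threshold
    for participant_id, ref in enrolled.items():
        sim = cosine_similarity(vp, voiceprint(ref))
        if sim >= best_sim:
            best_id, best_sim = participant_id, sim
    return best_id
```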
  • FIG. 6B shows a process of setting the position of the participant U based on the image of the face of the participant U.
  • the position setting unit 111 of the voice analysis device 100 causes the imaging unit 11 provided in the sound collection device 10 to capture an area including the faces of all the participants U surrounding the sound collection device 10.
  • the position setting unit 111 acquires an image captured by the imaging unit 11.
  • the position setting unit 111 recognizes the face of each participant U in the acquired image.
  • the position setting unit 111 can use a known face recognition technology to recognize a human face from an image. Then, the position setting unit 111 identifies the position of each participant U based on the sound collection device 10 based on the position of the face of each participant U recognized from the image, and stores the position in the position storage unit 131.
  • the relationship between the position in the image (for example, the coordinates of the pixels in the image) and the position based on the sound collection device 10 (for example, the angle with respect to the sound collection device 10) is set in advance in the voice analysis device 100.
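A minimal sketch of such a pixel-to-angle mapping follows, under the assumption that the images from the two 180-degree sensors are stitched into one 360-degree panorama with a linear pixel-to-azimuth relationship; the patent only states that the relationship is preset, so the image width and the linear mapping are illustrative assumptions.

```python
# Minimal sketch: convert the horizontal pixel coordinate of a detected face
# into an azimuth relative to the sound collection device, assuming a stitched
# 360-degree panoramic image with a linear pixel-to-angle relationship.

IMAGE_WIDTH = 3840          # assumed panoramic width in pixels
DEGREES_PER_PIXEL = 360.0 / IMAGE_WIDTH


def face_x_to_azimuth(face_center_x: float) -> float:
    """Map a face's horizontal pixel position to an azimuth in [0, 360)."""
    return (face_center_x * DEGREES_PER_PIXEL) % 360.0


def set_positions_from_faces(faces: dict[str, float]) -> dict[str, float]:
    """faces maps participant IDs to face-center x coordinates in the image."""
    return {pid: face_x_to_azimuth(x) for pid, x in faces.items()}


print(set_positions_from_faces({"U1": 0.0, "U2": 960.0, "U3": 1920.0}))
# {'U1': 0.0, 'U2': 90.0, 'U3': 180.0}
```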
  • the position setting unit 111 may specify the individual of the participant U by comparing the face of each participant U recognized from the image with the face of the individual stored in the voice analysis device 100 in advance. As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
  • FIG. 6C shows a process of setting the position of the participant U based on the information of the card C presented by the participant U.
  • the position setting unit 111 of the voice analysis device 100 causes the reading unit 12 provided in the sound collection device 10 to read the information of the card C presented by each participant U.
  • the position setting unit 111 acquires the information on the card C read by the reading unit 12 and the direction of the participant U who presented the card C.
  • the position setting unit 111 identifies the position of each participant U with respect to the sound collection device 10 based on the acquired information of the card C and the direction of the participant U, and causes the position storage unit 131 to store the position.
  • the position setting unit 111 may specify the individual of the participant U by acquiring personal information stored in advance in the voice analysis device 100 using the acquired information of the card C. As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
  • the position setting unit 111 may execute the automatic setting process and the manual setting process in combination.
  • for example, the position setting unit 111 displays the positions of the participants U set by the automatic setting processing of FIGS. 6A to 6C in the position setting area A1 of FIG. 5, and then accepts manual setting (correction) by the analyst.
  • the position of each participant U set by the automatic setting process can be corrected by the manual setting process, and the position of each participant U can be set more reliably.
  • since the voice analysis device 100 can automatically set the position of each participant U based on the information on the participant U acquired at the sound collection device 10, the analyst does not need to set the positions for all the groups on the communication terminal 20, and the trouble of setting the position of each participant U can be reduced.
  • the information on the participant U acquired at the sound collection device 10 (that is, the information for specifying the position of the participant U) is not limited to voice, an image, or a card; the voice analysis device 100 may use any other information from which the orientation of the participant U can be identified.
  • FIG. 7 is a diagram showing a flowchart of position setting processing performed by the voice analysis device 100 according to the present embodiment.
  • the position setting unit 111 determines whether the automatic setting processing has been instructed by the analyst on the setting screen A of FIG. 5. When the automatic setting processing is not instructed (that is, in the case of manual setting) (NO in S11), the position setting unit 111 specifies the position of each participant according to the contents entered on the setting screen A displayed on the communication terminal 20, and sets it in the position storage unit 131 (S12).
  • when the automatic setting processing is instructed (YES in S11), the position setting unit 111 acquires the information on the participants (that is, the information for specifying the positions of the participants) at the sound collection device 10 (S13).
  • the position setting unit 111 uses at least one of the voice of the participant, the image of the face of the participant, and the information of the card presented by the participant as the information on the participant.
  • the position setting unit 111 specifies the position of each participant U with respect to the sound collection device 10 based on the acquired information on the participants (S14). Then, the position setting unit 111 sets the positions of the participants by storing the positions of the identified participants in the position storage unit 131 (S15).
  • FIG. 8 is a schematic view of the follow-up process performed by the voice analysis device 100 according to the present embodiment.
  • the follow-up process is a process of updating the positions of the participants U stored in the position storage unit 131 partway through the voice being analyzed by the sound source localization unit 113 and the analysis unit 115.
  • the upper part of FIG. 8 shows the position of each participant U before the update
  • the lower part of FIG. 8 shows the position of each participant U after the update.
  • the upper view of FIG. 8 shows a state in which the participant U1 has moved from the position P1 set in the position storage unit 131 to another position P2. In this state, the voice emitted by the participant U1 reaches the sound collection device 10 from the position P2, which differs from the set position P1 of the participant U1. Therefore, the analysis unit 115 cannot detect the utterances of the participant U1 from the voice.
  • the tracking unit 114 updates the position of the participant U1 from the position P1 to the position P2 in the position storage unit 131, as illustrated in the lower part of FIG. 8.
  • the analysis unit 115 can correctly detect the utterance of the participant U1.
  • the tracking unit 114 acquires the direction of the sound source estimated by the sound source localization unit 113 every predetermined time (for example, one minute). If the estimated direction of the sound source does not correspond to any of the positions of the participants U stored in the position storage unit 131, the tracking unit 114 determines that one of the participants U has moved in the direction of the sound source. Then, the tracking unit 114 identifies the participant U who has moved, and updates the position stored in the position storage unit 131 for that participant U to the position corresponding to the direction of the sound source.
  • for example, the tracking unit 114 determines that the participant U set at the position closest to the direction of the sound source estimated from the voice acquired by the sound collection device 10 has moved to the position corresponding to the direction of the sound source. In this case, the tracking unit 114 may select the moved participant U only from among the participants U set at positions within a predetermined range (for example, within -30 degrees to +30 degrees) of the direction of the sound source. By limiting the range of movement in this way, the tracking unit 114 can prevent the position of a participant U from being updated to a wrong position.
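The update step could look like the following minimal sketch, which moves the nearest participant within an assumed 30-degree search range to the unmatched source direction; the range follows the example above, and the data layout is an illustrative assumption.

```python
# Minimal sketch: when an estimated source direction matches no stored position,
# move the nearest participant within an assumed +/-30 degree range to that
# direction. angular_difference() is repeated here to keep the sketch self-contained.

def angular_difference(a: float, b: float) -> float:
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def update_position(positions: dict[str, float], source_azimuth: float,
                    search_range: float = 30.0) -> str | None:
    """Update the stored direction of the participant assumed to have moved.

    Returns that participant's ID, or None if nobody is within the range."""
    candidates = {pid: angular_difference(direction, source_azimuth)
                  for pid, direction in positions.items()}
    moved = min(candidates, key=candidates.get, default=None)
    if moved is None or candidates[moved] > search_range:
        return None
    positions[moved] = source_azimuth  # follow the participant's new direction
    return moved


positions = {"U1": 0.0, "U2": 90.0}
print(update_position(positions, 110.0), positions)  # U2 {'U1': 0.0, 'U2': 110.0}
```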
  • alternatively, the tracking unit 114 may compare the voiceprint of the sound source with the voiceprint of each participant U, and determine that the participant U whose voiceprint is similar to that of the sound source has moved to the position corresponding to the direction of the sound source. In this case, the tracking unit 114 may acquire the voiceprint of each participant U from that participant's voice at the start of the analysis, or may use the voiceprint of each participant U stored in advance in the storage unit 130. The tracking unit 114 calculates the degree of similarity between the voiceprint of the sound source and the voiceprint of each participant U.
  • the tracking unit 114 selects the participant U whose voiceprint similarity is the highest in the group, or selects the participant U whose voiceprint similarity is equal to or higher than a predetermined threshold.
  • the tracking accuracy can be improved by specifying the moved participant U using the voiceprint.
  • similarly, the tracking unit 114 may acquire the face located in the direction of the sound source from the image captured by the imaging unit 11 of the sound collection device 10, and determine that the participant U whose face is similar to the acquired face has moved to the position corresponding to the direction of the sound source.
  • in this case, the tracking unit 114 may acquire the face of each participant U from the image captured by the imaging unit 11 at the start of the analysis, or may use the face of each participant U stored in advance in the storage unit 130.
  • the tracking unit 114 calculates the similarity of the face between the face located in the direction of the sound source and the face of each participant U.
  • the tracking unit 114 selects the participant U whose face similarity is the highest in the group, or selects the participant U whose face similarity is equal to or higher than a predetermined threshold. By specifying the moved participant U using the face, it is possible to improve the tracking accuracy.
  • furthermore, the tracking unit 114 may weight the voiceprint similarity or the face similarity of each participant U based on the difference between the direction of the sound source and the position (direction) stored in the position storage unit 131.
  • the closer the position set for a participant U is to the direction of the sound source, the higher the probability that that participant U has moved to the position of the sound source; the farther the set position is from the direction of the sound source, the lower that probability.
  • accordingly, the tracking unit 114 weights the voiceprint or face similarity higher as the difference between the position set for the participant U and the direction of the sound source becomes smaller, and weights it lower as the difference becomes larger. This can further improve the tracking accuracy.
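A minimal sketch of such distance-dependent weighting follows; the specific weighting function (a linear falloff reaching zero at 90 degrees) is an illustrative assumption, since the patent does not specify one.

```python
# Minimal sketch: weight a voiceprint or face similarity by how close the
# participant's stored direction is to the sound-source direction.
# The linear falloff to zero at 90 degrees is an illustrative choice.

def angular_difference(a: float, b: float) -> float:
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def weighted_similarity(similarity: float, stored_direction: float,
                        source_direction: float, falloff_deg: float = 90.0) -> float:
    """Scale the similarity down as the angular difference grows."""
    diff = angular_difference(stored_direction, source_direction)
    weight = max(0.0, 1.0 - diff / falloff_deg)
    return similarity * weight


def pick_moved_participant(similarities: dict[str, float],
                           positions: dict[str, float],
                           source_direction: float) -> str:
    """Choose the participant with the highest weighted similarity."""
    scores = {pid: weighted_similarity(similarities[pid], positions[pid], source_direction)
              for pid in similarities}
    return max(scores, key=scores.get)


similarities = {"U1": 0.9, "U2": 0.8}
positions = {"U1": 200.0, "U2": 120.0}
print(pick_moved_participant(similarities, positions, 110.0))  # 'U2'
```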
  • FIG. 9 is a diagram showing a flowchart of the follow-up process performed by the voice analysis device 100 according to the present embodiment.
  • the tracking unit 114 acquires the direction of the sound source estimated by the sound source localization unit 113. If the direction of the sound source corresponds to one of the positions of the participants U stored in the position storage unit 131 (YES in S41), the tracking unit 114 ends the process without updating the positions.
  • if the direction of the sound source does not correspond to any stored position (NO in S41), the tracking unit 114 acquires the information on the participants (that is, the information for specifying the positions of the participants) at the sound collection device 10 (S42).
  • the tracking unit 114 uses at least one of the voice of the participant and the image of the face of the participant as the information on the participant.
  • the tracking unit 114 specifies which participant has moved based on the acquired information on the participants (S43). Then, the tracking unit 114 updates the position stored in the position storage unit 131 for the participant U specified as having moved to the position corresponding to the direction of the sound source (S44).
  • as described above, the voice analysis device 100 according to the present embodiment acquires, at the sound collection device 10 arranged in each group, information on each participant such as the voice emitted by the participant, an image of the participant's face, or information on a card presented by the participant, and automatically sets the position of each participant based on the acquired information. Therefore, when analyzing the voice of a discussion, the trouble of setting the position of each participant for each group can be reduced.
  • in addition, the voice analysis device 100 updates the position of each participant based on the information on each participant while the voice is being analyzed. Therefore, even if a participant moves while the voice is being acquired, the participant can be followed and the voice analyzed.
  • in the present embodiment, the voice analysis device 100 is used to analyze the voice of a discussion, but it can also be applied to other uses.
  • the voice analysis device 100 can also analyze the voice emitted by a passenger sitting in a car.
  • the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the subjects of each step (process) included in the voice analysis method shown in FIGS. 4, 7, and 9. That is, the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 read a program for executing the voice analysis method shown in FIGS. 4, 7, and 9 from the storage unit, execute the program, and control the respective units of the voice analysis device 100, the sound collection device 10, and the communication terminal 20, thereby executing the voice analysis method shown in FIGS. 4, 7, and 9.
  • the steps included in the speech analysis method shown in FIGS. 4, 7 and 9 may be partially omitted, the order between the steps may be changed, and a plurality of steps may be performed in parallel.
  • Reference signs: S voice analysis system; 100 voice analysis device; 110 control unit; 111 position setting unit; 112 voice acquisition unit; 114 tracking unit; 115 analysis unit; 10 sound collection device; 20 communication terminal

Abstract

The objective of the present invention is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system, which, when voices in a discussion are analyzed, enable reduction in time and effort for setting the positions of participants. A voice analysis device 100 according to one embodiment of the present invention includes: a position setting unit 111 that acquires information about a plurality of participants from a sound collecting device, and sets the positions of the plurality of participants on the basis of the acquired information about the participants; a voice acquisition unit 112 that acquires voices from the sound collecting device; and an analysis unit 115 that analyzes voices uttered by the respective participants on the basis of the positions set by the position setting unit 111.

Description

Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
 The present invention relates to a voice analysis device for analyzing voice, a voice analysis method, a voice analysis program, and a voice analysis system.
 The Harkness method (also referred to as the Harkness Method) is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1). In the Harkness method, the transitions of each participant's utterances are recorded as lines. This makes it possible to analyze each participant's contribution to the discussion and their relationships with the other participants. The Harkness method can also be applied effectively to active learning, in which students take the initiative in their own learning.
 When analysis similar to the Harkness method is performed using a computer, the voice acquired by a sound collection device such as a microphone is analyzed for each participant by setting the position of each participant relative to the position of the sound collection device. This creates the problem that it takes a great deal of time and effort to set the position of each participant for each group.
 The present invention has been made in view of these points, and its object is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system capable of reducing the time and effort required to set the positions of participants when analyzing the voices of a discussion.
 The voice analysis device according to the first aspect of the present invention includes: a setting unit that acquires information on a plurality of participants from a sound collection device and sets the position of each of the plurality of participants based on the acquired information on the participants; an acquisition unit that acquires voice from the sound collection device; and an analysis unit that analyzes the voice uttered by each of the plurality of participants based on the positions set by the setting unit.
 The setting unit may set the position of each of the plurality of participants by acquiring voice from the sound collection device as the information on the participants and specifying the direction from which the acquired voice was emitted.
 The setting unit may set the position of each of the plurality of participants by acquiring, as the information on the participants, an image captured by an imaging unit provided on the sound collection device and recognizing the faces of the plurality of participants included in the acquired image.
 The setting unit may acquire, as the information on the participants, information from a card read by a reading unit provided on the sound collection device, and set the position of each of the plurality of participants according to the direction in which the card was presented to the reading unit.
 The setting unit may set the position of each of the plurality of participants based on information input at a communication terminal, in addition to the information on the participants.
 The voice analysis device may further include a tracking unit that updates the positions set by the setting unit while the voice is being analyzed by the analysis unit.
 The tracking unit may update the positions set by the setting unit when the direction from which the voice being analyzed by the analysis unit was emitted does not correspond to any of the positions set by the setting unit.
 The tracking unit may update a position set by the setting unit to the direction from which the voice being analyzed by the analysis unit was emitted.
 In the voice analysis method according to the second aspect of the present invention, a processor executes the steps of: acquiring information on a plurality of participants from a sound collection device and setting the position of each of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
 The voice analysis program according to the third aspect of the present invention causes a computer to execute the steps of: acquiring information on a plurality of participants from a sound collection device and setting the positions of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
 The voice analysis system according to the fourth aspect of the present invention includes a voice analysis device and a sound collection device capable of communicating with the voice analysis device. The sound collection device is configured to acquire voice and to acquire information on a plurality of participants. The voice analysis device includes: a setting unit that acquires the information on the participants from the sound collection device and sets the positions of the plurality of participants based on the acquired information on the participants; an acquisition unit that acquires the voice from the sound collection device; and an analysis unit that analyzes the voice emitted by each of the plurality of participants based on the positions set by the setting unit.
 According to the present invention, it is possible to reduce the trouble of setting the positions of the participants when analyzing the voice of a discussion.
FIG. 1 is a schematic view of the voice analysis system according to the present embodiment.
FIG. 2 is a block diagram of the voice analysis system according to the present embodiment.
FIG. 3 is a schematic view of the voice analysis method performed by the voice analysis system according to the present embodiment.
FIG. 4 is a flowchart of the entire voice analysis method performed by the voice analysis device according to the present embodiment.
FIG. 5 is a front view of the display unit of the communication terminal displaying the setting screen.
FIG. 6 is a schematic view of the automatic setting processing performed by the voice analysis device according to the present embodiment.
FIG. 7 is a flowchart of the position setting processing performed by the voice analysis device according to the present embodiment.
FIG. 8 is a schematic view of the follow-up process performed by the voice analysis device according to the present embodiment.
FIG. 9 is a flowchart of the follow-up process performed by the voice analysis device according to the present embodiment.
[Overview of the voice analysis system S]
 FIG. 1 is a schematic view of the voice analysis system S according to the present embodiment. The voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20. The number of sound collection devices 10 and communication terminals 20 included in the voice analysis system S is not limited. The voice analysis system S may also include other devices such as servers and terminals.
 The voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least some of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without going through the network N.
 The sound collection device 10 includes a microphone array including a plurality of sound collection units (microphones) arranged in different orientations. For example, the microphone array includes eight microphones arranged at equal intervals on the same circumference in a horizontal plane with respect to the ground. The sound collection device 10 transmits the voice acquired using the microphone array to the voice analysis device 100 as data.
 The communication terminal 20 is a communication device capable of wired or wireless communication. The communication terminal 20 is, for example, a portable terminal such as a smartphone or a computer terminal such as a personal computer. The communication terminal 20 receives the setting of analysis conditions from the analyst and displays the analysis results produced by the voice analysis device 100.
 The voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 using the voice analysis method described later. The voice analysis device 100 also transmits the results of the voice analysis to the communication terminal 20.
[Configuration of the voice analysis system S]
 FIG. 2 is a block diagram of the voice analysis system S according to the present embodiment. The arrows in FIG. 2 indicate the main data flows, and there may be data flows not shown in FIG. 2. In FIG. 2, each block represents a functional unit, not a hardware (device) unit. As such, the blocks shown in FIG. 2 may be implemented within a single device or may be implemented separately across multiple devices. Data may be exchanged between the blocks via any means, such as a data bus, a network, or a portable storage medium.
 The sound collection device 10 includes an imaging unit 11 for imaging the participants in the discussion and a reading unit 12 for reading information such as a card presented by a participant in the discussion. The imaging unit 11 is an imaging device capable of capturing a predetermined imaging range including the face of each participant (that is, each of the plurality of participants). The imaging unit 11 includes imaging elements in a number and arrangement capable of imaging the faces of all the participants surrounding the sound collection device 10. For example, the imaging unit 11 includes two imaging elements arranged at orientations 180 degrees apart in a horizontal plane with respect to the ground. Alternatively, the imaging unit 11 may image the faces of all the participants surrounding the sound collection device 10 by rotating in a horizontal plane with respect to the ground.
 The imaging unit 11 may capture images at a timing preset in the sound collection device 10 (for example, every 10 seconds), or may capture images in accordance with an imaging instruction received from the voice analysis device 100. The imaging unit 11 transmits an image showing the captured content to the voice analysis device 100.
 The reading unit 12 includes a reading device (card reader) that reads, by a contact or non-contact method, information recorded on an IC (Integrated Circuit) card or a magnetic card (hereinafter collectively referred to as a card) presented by a participant. An IC chip built into a smartphone or the like may also be used as the IC card. The reading unit 12 is configured to be able to identify the orientation of the participant who presented the card. For example, the reading unit 12 includes twelve reading devices arranged at orientations 30 degrees apart in a horizontal plane with respect to the ground. The reading unit 12 may also include, in addition to the reading devices, a button for specifying the orientation of the participant.
 When a card is presented by a participant, the reading unit 12 reads the information on the card with a reading device and identifies the orientation of the participant based on which reading device read the card. The reading unit 12 then associates the read information with the orientation of the participant and transmits them to the voice analysis device 100.
 The communication terminal 20 includes a display unit 21 for displaying various information and an operation unit 22 for receiving operations by the analyst. The display unit 21 includes a display device such as a liquid crystal display or an organic electroluminescence (OLED: Organic Light Emitting Diode) display. The operation unit 22 includes operation members such as buttons, switches, and dials. The display unit 21 and the operation unit 22 may be integrated by using, as the display unit 21, a touch screen capable of detecting the position touched by the analyst.
 The voice analysis device 100 includes a control unit 110, a communication unit 120, and a storage unit 130. The control unit 110 includes a position setting unit 111, a voice acquisition unit 112, a sound source localization unit 113, a tracking unit 114, an analysis unit 115, and an output unit 116. The storage unit 130 includes a position storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
 The communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N. The communication unit 120 includes a processor, connectors, electric circuits, and the like for performing communication. The communication unit 120 performs predetermined processing on communication signals received from the outside to acquire data and inputs the acquired data to the control unit 110. The communication unit 120 also performs predetermined processing on data input from the control unit 110 to generate communication signals and transmits the generated communication signals to the outside.
 The storage unit 130 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, and the like. The storage unit 130 stores in advance the programs executed by the control unit 110. The storage unit 130 may be provided outside the voice analysis device 100, in which case it may exchange data with the control unit 110 via the communication unit 120.
 The position storage unit 131 stores information indicating the positions of the participants in the discussion. The voice storage unit 132 stores the voice acquired by the sound collection device 10. The analysis result storage unit 133 stores analysis results indicating the results of analyzing the voice. The position storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may each be a storage area on the storage unit 130 or a database configured on the storage unit 130.
 The control unit 110 is a processor such as a CPU (Central Processing Unit), for example, and functions as the position setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the tracking unit 114, the analysis unit 115, and the output unit 116 by executing the programs stored in the storage unit 130. The functions of these units will be described later with reference to FIGS. 3 to 9. At least some of the functions of the control unit 110 may be performed by an electric circuit. At least some of the functions of the control unit 110 may also be performed by a program executed via a network.
 The voice analysis system S according to the present embodiment is not limited to the specific configuration shown in FIG. 2. For example, the voice analysis device 100 is not limited to a single device and may be configured by connecting two or more physically separate devices in a wired or wireless manner.
[Description of the voice analysis method]
 FIG. 3 is a schematic view of the voice analysis method performed by the voice analysis system S according to the present embodiment. First, the position setting unit 111 of the voice analysis device 100 sets the position of each participant in the discussion to be analyzed by the position setting processing described later (a). The position setting unit 111 sets the position of each participant by storing, in the position storage unit 131, the position of each participant identified in the position setting processing described later.
 When starting voice acquisition, the voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing acquisition of voice to the sound collection device 10 (b). Upon receiving the signal instructing acquisition of voice from the voice analysis device 100, the sound collection device 10 starts acquiring voice. When ending voice acquisition, the voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing the end of voice acquisition to the sound collection device 10. Upon receiving the signal instructing the end of voice acquisition from the voice analysis device 100, the sound collection device 10 ends the acquisition of voice.
 The sound collection device 10 acquires voice at each of the plurality of sound collection units and internally records it as the voice of the channel corresponding to each sound collection unit. The sound collection device 10 then transmits the acquired voice of the plurality of channels to the voice analysis device 100 (c). The sound collection device 10 may transmit the acquired voice sequentially, or may transmit a predetermined amount or a predetermined duration of voice at a time. The sound collection device 10 may also transmit the voice from the start to the end of acquisition all at once. The voice acquisition unit 112 of the voice analysis device 100 receives the voice from the sound collection device 10 and stores it in the voice storage unit 132.
 The voice analysis device 100 analyzes the voice acquired from the sound collection device 10 at a predetermined timing. The voice analysis device 100 may analyze the voice when the analyst issues an analysis instruction by a predetermined operation on the communication terminal 20. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from among the voices stored in the voice storage unit 132.
 The voice analysis device 100 may also analyze the voice when the acquisition of the voice ends. In this case, the voice from the start to the end of acquisition corresponds to the discussion to be analyzed. The voice analysis device 100 may also analyze the voice sequentially (that is, by real-time processing) during the acquisition of the voice. In this case, the voice for a predetermined past period (for example, 30 seconds) counted back from the current time corresponds to the discussion to be analyzed.
 When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the voice of the plurality of channels acquired by the voice acquisition unit 112 (d). Sound source localization is processing for estimating the direction of the sound source included in the voice acquired by the voice acquisition unit 112 for each time slice (for example, every 10 to 100 milliseconds). The sound source localization unit 113 associates the direction of the sound source estimated for each time slice with the positions of the participants stored in the position storage unit 131.
 As long as the direction of the sound source can be identified based on the voice acquired from the sound collection device 10, the sound source localization unit 113 can use a known sound source localization method such as the MUSIC (Multiple Signal Classification) method or a beamforming method.
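 For illustration only, the following is a minimal sketch of direction estimation for a circular microphone array using delay-and-sum beamforming (steered response power), which is one of the beamforming approaches mentioned above; it is not the MUSIC method itself, and the array radius, sampling rate, and angular grid are assumptions made for the example.

```python
import numpy as np

def estimate_source_direction(frames, mic_angles_deg, radius_m=0.05,
                              fs=16000, c=343.0, grid_deg=range(0, 360, 5)):
    """Estimate the direction (degrees) of the dominant source in one time slice
    by delay-and-sum beamforming over a circular microphone array.

    frames: ndarray of shape (num_mics, num_samples), one short frame per channel.
    mic_angles_deg: angular position of each microphone on the circle, in degrees.
    """
    num_samples = frames.shape[1]
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    mic_rad = np.deg2rad(np.asarray(mic_angles_deg, dtype=float))

    best_angle, best_power = None, -np.inf
    for theta in grid_deg:
        # Arrival-time offset of each microphone relative to the array centre for a
        # far-field source in direction theta (microphones facing theta hear it first).
        tau = -(radius_m / c) * np.cos(mic_rad - np.deg2rad(theta))
        # Compensate the offsets in the frequency domain and sum the aligned channels;
        # the candidate direction with the most coherent (highest-power) sum wins.
        aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
        power = np.sum(np.abs(aligned.sum(axis=0)) ** 2)
        if power > best_power:
            best_angle, best_power = theta, power
    return best_angle
```

 Running this once per 10 to 100 millisecond frame yields the per-time-slice source direction that the sound source localization unit 113 is described as producing.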
 Next, the analysis unit 115 analyzes the voice based on the voice acquired by the voice acquisition unit 112, the direction of the sound source estimated by the sound source localization unit 113, and the positions of the participants stored in the position storage unit 131 (e). The analysis unit 115 may analyze the entire completed discussion, or may analyze part of the discussion in the case of real-time processing.
 Specifically, the analysis unit 115 first determines, for each time slice (for example, every 10 to 100 milliseconds) of the discussion to be analyzed, which participant spoke, based on the voice acquired by the voice acquisition unit 112, the direction of the sound source estimated by the sound source localization unit 113, and the positions of the participants stored in the position storage unit 131. The analysis unit 115 identifies, as a speech period, a continuous period from when one participant starts speaking until that participant stops, and stores it in the analysis result storage unit 133. When a plurality of participants speak at the same time, the analysis unit 115 identifies a speech period for each participant.
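 A rough sketch of this step is shown below, assuming the per-slice source directions and the registered participant directions are available as plain Python values; the data representation and the tolerance value are assumptions made for the example.

```python
from itertools import groupby

def extract_speech_periods(slice_directions, participant_positions,
                           slice_ms=100, tolerance_deg=30):
    """Assign each localized time slice to the nearest registered participant and
    merge consecutive slices by the same participant into speech periods.

    slice_directions: list of estimated source directions in degrees, or None for
        slices in which no source was detected, one entry per time slice.
    participant_positions: dict mapping participant id -> registered direction in degrees.
    Returns a list of (participant_id, start_ms, end_ms) tuples.
    """
    def nearest_participant(direction):
        if direction is None:
            return None
        best_id, best_diff = None, tolerance_deg
        for pid, pos in participant_positions.items():
            diff = abs((direction - pos + 180) % 360 - 180)  # smallest angular difference
            if diff <= best_diff:
                best_id, best_diff = pid, diff
        return best_id

    labels = [nearest_participant(d) for d in slice_directions]
    periods, t = [], 0
    for pid, group in groupby(labels):
        length = len(list(group))
        if pid is not None:
            periods.append((pid, t * slice_ms, (t + length) * slice_ms))
        t += length
    return periods
```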
 The analysis unit 115 also calculates the amount of speech of each participant for each time and stores it in the analysis result storage unit 133. Specifically, the analysis unit 115 calculates, as the amount of speech for each time, the value obtained by dividing the length of time during which a participant spoke within a certain time window (for example, 5 seconds) by the length of the time window. The analysis unit 115 then repeats the calculation of the amount of speech per time for each participant while shifting the time window by a predetermined time (for example, 1 second) from the start time of the discussion to the end time (or to the current time in the case of real-time processing).
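 The sliding-window calculation described here can be expressed, for example, as follows; the function name and the use of the speech periods produced above are illustrative assumptions.

```python
def speech_amount_series(periods, participant_id, discussion_end_ms,
                         window_ms=5000, step_ms=1000):
    """Compute the per-window amount of speech for one participant: the fraction of
    each sliding time window occupied by that participant's speech periods.

    periods: list of (participant_id, start_ms, end_ms) tuples.
    Returns a list of (window_start_ms, amount) pairs, with amount in [0, 1].
    """
    own = [(s, e) for pid, s, e in periods if pid == participant_id]
    series = []
    start = 0
    while start + window_ms <= discussion_end_ms:
        end = start + window_ms
        # Total overlap between this window and the participant's speech periods.
        spoken = sum(max(0, min(e, end) - max(s, start)) for s, e in own)
        series.append((start, spoken / window_ms))
        start += step_ms
    return series
```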
 By the tracking processing described later, the tracking unit 114 acquires the latest positions of the participants at predetermined time intervals in the voice being analyzed by the sound source localization unit 113 and the analysis unit 115, and updates the positions of the participants stored in the position storage unit 131. As a result, even if a participant moves away from the position that was already set while the voice is being acquired, the sound source localization unit 113 and the analysis unit 115 can follow the movement and analyze the voice.
 The output unit 116 performs control to display the analysis result produced by the analysis unit 115 on the display unit 21 by transmitting display information to the communication terminal 20 (f). The output unit 116 is not limited to display on the display unit 21, and may output the analysis result by other methods such as printing with a printer or recording the data in a storage device.
 FIG. 4 is a flowchart of the entire voice analysis method performed by the voice analysis device 100 according to the present embodiment. First, the position setting unit 111 identifies the position of each participant in the discussion to be analyzed by the position setting processing described later and stores the positions in the position storage unit 131 (S1). Next, the voice acquisition unit 112 acquires voice from the sound collection device 10 and stores it in the voice storage unit 132 (S2).
 The voice analysis device 100 analyzes the voice acquired by the voice acquisition unit 112 in step S2 for each predetermined time range (time window) from the start time to the end time. The sound source localization unit 113 performs sound source localization on the time range of the voice to be analyzed and associates the estimated direction of the sound source with the positions of the participants stored in the position storage unit 131 (S3).
 The tracking unit 114 acquires the latest positions of the participants in the time range of the voice to be analyzed by the tracking processing described later and updates the positions of the participants stored in the position storage unit 131 (S4).
 The analysis unit 115 analyzes the voice based on the voice acquired by the voice acquisition unit 112 in step S2, the direction of the sound source estimated by the sound source localization unit 113 in step S3, and the positions of the participants stored in the position storage unit 131 (S5). The analysis unit 115 stores the analysis result in the analysis result storage unit 133.
 If the analysis has not yet reached the end time of the voice acquired by the voice acquisition unit 112 in step S2 (NO in S6), the voice analysis device 100 repeats steps S3 to S5 for the next time range in the voice to be analyzed. If the analysis has reached the end time of the voice acquired by the voice acquisition unit 112 in step S2 (YES in S6), the output unit 116 outputs the analysis result of step S5 by a predetermined method (S7).
[Description of the position setting processing]
 First, the position setting processing shown in step S1 of FIG. 4 will be described. FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A. The position setting processing includes manual setting processing, in which the analyst sets the position of each participant by operating the communication terminal 20, and automatic setting processing, in which each participant inputs information for identifying his or her own position into the sound collection device 10.
 The communication terminal 20 displays the setting screen A on the display unit 21 and receives the setting of analysis conditions by the analyst. The setting screen A includes a position setting area A1, a start button A2, an end button A3, and an automatic setting button A4. The position setting area A1 is an area for setting the direction in which each participant U is actually located with respect to the sound collection device 10 in the discussion to be analyzed. For example, as shown in FIG. 5, the position setting area A1 represents a circle centered on the position of the sound collection device 10 and, along the circle, the angle relative to the sound collection device 10.
 An analyst who desires the manual setting processing sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20. Identification information identifying each participant U (here, U1 to U4) is assigned and displayed near the position set for each participant U. In the example of FIG. 5, four participants U1 to U4 are set. The portion of the position setting area A1 corresponding to each participant U is displayed in a different color for each participant. This allows the analyst to easily recognize the direction set for each participant U.
 The start button A2, the end button A3, and the automatic setting button A4 are virtual buttons displayed on the display unit 21. When the analyst presses the start button A2, the communication terminal 20 transmits a start instruction signal to the voice analysis device 100. When the analyst presses the end button A3, the communication terminal 20 transmits an end instruction signal to the voice analysis device 100. In the present embodiment, the interval from the analyst's start instruction to the end instruction is treated as one discussion.
 An analyst who desires the automatic setting processing causes the voice analysis device 100 to start the automatic setting processing by pressing the automatic setting button A4. When the automatic setting button A4 is pressed, the communication terminal 20 transmits an automatic setting instruction signal to the voice analysis device 100.
[Description of the automatic setting processing]
 FIGS. 6(a) to 6(c) are schematic views of the automatic setting processing performed by the voice analysis device 100 according to the present embodiment. When the automatic setting processing is instructed, the voice analysis device 100 sets the positions of the participants U by at least one of the processes shown in FIGS. 6(a) to 6(c).
 FIG. 6(a) shows processing for setting the position of a participant U based on the voice uttered by the participant U. In this case, the position setting unit 111 of the voice analysis device 100 causes the sound collection unit of the sound collection device 10 to acquire the voice uttered by each participant U. The position setting unit 111 acquires the voice acquired by the sound collection device 10.
 The position setting unit 111 identifies the direction of each participant U based on the direction from which the acquired voice was uttered. To identify the direction of a participant from the voice, the position setting unit 111 uses the result of sound source localization by the sound source localization unit 113 described above. The position setting unit 111 then stores the identified position of each participant U in the position storage unit 131.
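 As one way this could work, each participant might speak briefly in turn during setup, and the directions estimated for each utterance can then be reduced to a single registered position per participant. The sketch below simply takes the circular mean of each utterance's direction estimates; how the estimates are grouped per participant is an assumption for the example.

```python
import numpy as np

def register_positions(utterance_directions):
    """Turn per-utterance lists of estimated directions into registered positions.

    utterance_directions: list of lists; each inner list holds the direction estimates
        (degrees) obtained while one participant spoke during setup.
    Returns a dict mapping participant index -> registered direction in degrees.
    """
    positions = {}
    for idx, directions in enumerate(utterance_directions):
        # Use the circular mean so that estimates straddling 0/360 degrees average correctly.
        radians = np.deg2rad(directions)
        mean = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
        positions[idx] = float(np.rad2deg(mean) % 360.0)
    return positions
```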
 The position setting unit 111 may identify the individual participant U by comparing the acquired voice of each participant U with individual voices stored in advance in the voice analysis device 100. For example, the position setting unit 111 identifies the individual by comparing the voiceprints (that is, the frequency spectra) of the participants' voices. This makes it possible to display the personal information of a participant U together with the analysis result, or to display a plurality of analysis results for the same participant U.
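 A very rough sketch of such a voiceprint comparison is given below, using an averaged magnitude spectrum and cosine similarity; this is only an illustration of the idea of comparing frequency spectra, not a production speaker-identification method, and the frame sizes are assumptions.

```python
import numpy as np

def voiceprint(samples, frame=1024, hop=512):
    """Crude voiceprint: the average magnitude spectrum over overlapping frames."""
    windows = [samples[i:i + frame] for i in range(0, len(samples) - frame, hop)]
    spectra = np.abs(np.fft.rfft(np.asarray(windows) * np.hanning(frame), axis=1))
    return spectra.mean(axis=0)

def identify_speaker(samples, enrolled_voiceprints):
    """Return the enrolled participant whose stored voiceprint is most similar,
    by cosine similarity, to the voiceprint of the given audio samples.

    enrolled_voiceprints: dict mapping participant id -> stored voiceprint vector.
    """
    v = voiceprint(samples)

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    return max(enrolled_voiceprints, key=lambda pid: cosine(v, enrolled_voiceprints[pid]))
```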
 FIG. 6(b) shows processing for setting the position of a participant U based on an image of the face of the participant U. In this case, the position setting unit 111 of the voice analysis device 100 causes the imaging unit 11 provided in the sound collection device 10 to capture an area including the faces of all the participants U surrounding the sound collection device 10. The position setting unit 111 acquires the image captured by the imaging unit 11.
 The position setting unit 111 recognizes the face of each participant U in the acquired image. The position setting unit 111 can use a known face recognition technique to recognize a human face in an image. The position setting unit 111 then identifies the position of each participant U with respect to the sound collection device 10 based on the position of the face of each participant U recognized in the image, and stores the positions in the position storage unit 131. The relationship between a position in the image (for example, the coordinates of a pixel in the image) and a position relative to the sound collection device 10 (for example, an angle with respect to the sound collection device 10) is set in the voice analysis device 100 in advance.
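 The mapping between image position and direction mentioned here could be as simple as a linear mapping from the horizontal pixel coordinate of a detected face to an angle; the field-of-view value and function name below are assumptions made for illustration.

```python
def face_direction_deg(face_center_x, image_width, camera_heading_deg,
                       horizontal_fov_deg=180.0):
    """Map the horizontal pixel coordinate of a detected face to a direction
    relative to the sound collection device.

    face_center_x: x coordinate (pixels) of the detected face centre.
    image_width: width of the captured image in pixels.
    camera_heading_deg: direction the imaging element faces, relative to the device.
    horizontal_fov_deg: horizontal field of view covered by this imaging element.
    """
    # Offset of the face from the image centre, as a fraction of half the image width.
    offset = (face_center_x - image_width / 2.0) / (image_width / 2.0)
    direction = camera_heading_deg + offset * (horizontal_fov_deg / 2.0)
    return direction % 360.0
```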
 The position setting unit 111 may identify the individual participant U by comparing the face of each participant U recognized in the image with individual faces stored in advance in the voice analysis device 100. This makes it possible to display the personal information of a participant U together with the analysis result, or to display a plurality of analysis results for the same participant U.
 FIG. 6(c) shows processing for setting the position of a participant U based on the information on a card C presented by the participant U. In this case, the position setting unit 111 of the voice analysis device 100 causes the reading unit 12 provided in the sound collection device 10 to read the information on the card C presented by each participant U. The position setting unit 111 acquires the information on the card C read by the reading unit 12 and the orientation of the participant U who presented the card C. Based on the acquired information on the card C and the orientation of the participant U, the position setting unit 111 identifies the position of each participant U with respect to the sound collection device 10 and stores the positions in the position storage unit 131.
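 Since each reading device occupies a known angular slot (30 degrees apart in the example above), deriving a participant direction from the reader that detected the card can be as simple as the following; the slot width and the optional lookup of personal information are illustrative assumptions.

```python
def card_position(reader_index, num_readers=12, card_id=None, directory=None):
    """Derive a participant's direction (degrees) from the index of the reading
    device that read the card, and optionally look up personal information.

    reader_index: 0-based index of the reader that detected the card.
    directory: optional dict mapping card id -> participant name.
    """
    slot_deg = 360.0 / num_readers                     # 30 degrees for 12 readers
    direction = (reader_index * slot_deg) % 360.0
    name = directory.get(card_id) if directory and card_id is not None else None
    return direction, name
```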
 The position setting unit 111 may identify the individual participant U by acquiring personal information stored in advance in the voice analysis device 100 using the acquired information on the card C. This makes it possible to display the personal information of a participant U together with the analysis result, or to display a plurality of analysis results for the same participant U.
 The position setting unit 111 may execute the automatic setting processing and the manual setting processing in combination. In this case, for example, the position setting unit 111 displays the positions of the participants U set by the automatic setting processing of FIGS. 6(a) to 6(c) in the position setting area A1 of FIG. 5, and further receives manual settings by the analyst. In this way, the positions of the participants U set by the automatic setting processing can be corrected by the manual setting processing, and the positions of the participants U can be set more reliably.
 Since the voice analysis device 100 can thus automatically set the position of each participant U based on the information about the participants U acquired on the sound collection device 10, the trouble of the analyst setting the position of each participant U of every group on the communication terminal 20 can be reduced. The information about the participants U that can be acquired on the sound collection device 10 (that is, the information for identifying the positions of the participants U) is not limited to voice, image, or card information, and the voice analysis device 100 may use any other information capable of identifying the orientation of a participant U.
 FIG. 7 is a flowchart of the position setting processing performed by the voice analysis device 100 according to the present embodiment. First, the position setting unit 111 determines whether the automatic setting processing has been instructed by the analyst on the setting screen A of FIG. 5. If the automatic setting processing has not been instructed (that is, in the case of manual setting) (NO in S11), the position setting unit 111 identifies the position of each participant according to the content input on the setting screen A displayed on the communication terminal 20 and sets it in the position storage unit 131 (S12).
 If the automatic setting has been instructed (YES in S11), the position setting unit 111 acquires information about the participants (that is, information for identifying the positions of the participants) on the sound collection device 10 (S13). As the information about the participants, the position setting unit 111 uses at least one of the voice uttered by the participants, the images of the participants' faces, and the information on the cards presented by the participants, as described above.
 The position setting unit 111 identifies the position of each participant U with respect to the sound collection device 10 based on the acquired information about the participants (S14). The position setting unit 111 then sets the positions of the participants by storing the identified position of each participant in the position storage unit 131 (S15).
[Description of the tracking processing]
 Next, the tracking processing shown in step S4 of FIG. 4 will be described. FIG. 8 is a schematic view of the tracking processing performed by the voice analysis device 100 according to the present embodiment. The tracking processing updates the positions of the participants U stored in the position storage unit 131 partway through the voice being analyzed by the sound source localization unit 113 and the analysis unit 115.
 The upper part of FIG. 8 shows the position of each participant U before the update, and the lower part of FIG. 8 shows the position of each participant U after the update. The upper part of FIG. 8 shows a state in which the participant U1 has moved from the position P1 set in the position storage unit 131 to another position P2. In this state, the voice uttered by the participant U1 enters the sound collection device 10 from the position P2, which differs from the set position P1 of the participant U1. As a result, the analysis unit 115 cannot detect the utterance of the participant U1 from the voice.
 Therefore, as shown in the lower part of FIG. 8, the tracking unit 114 updates the position of the participant U1 from the position P1 to the position P2 in the position storage unit 131. This allows the analysis unit 115 to correctly detect the utterance of the participant U1.
 To update the positions of the participants U, the tracking unit 114 acquires the direction of the sound source estimated by the sound source localization unit 113 at predetermined time intervals (for example, every minute). If the estimated direction of the sound source does not correspond to any of the positions of the participants U stored in the position storage unit 131, the tracking unit 114 determines that one of the participants U has moved toward the direction of the sound source. The tracking unit 114 then identifies the participant U who has moved and updates the position stored in the position storage unit 131 for that participant U to the position corresponding to the direction of the sound source.
 For example, the tracking unit 114 determines that the participant U whose set position is closest to the direction of the sound source estimated from the voice acquired by the sound collection device 10 has moved to the position corresponding to that direction. In this case, the tracking unit 114 may select the moved participant U from among the participants U whose positions are set within a predetermined range of the direction of the sound source (for example, within a range of -30 to +30 degrees). By limiting the range of movement, the tracking unit 114 can prevent, for example, the position of a participant U from being moved to an incorrect position.
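 A minimal sketch of this nearest-participant update, including the ±30 degree range limit, might look as follows; the data representation is an assumption for the example.

```python
def update_position_nearest(positions, source_direction_deg, max_shift_deg=30.0):
    """Move the registered position of the nearest participant (within the allowed
    angular range) to the estimated source direction.

    positions: dict mapping participant id -> registered direction in degrees (mutated).
    Returns the id of the moved participant, or None if nobody is close enough.
    """
    def angular_diff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)

    candidates = {pid: angular_diff(pos, source_direction_deg)
                  for pid, pos in positions.items()
                  if angular_diff(pos, source_direction_deg) <= max_shift_deg}
    if not candidates:
        return None                                   # limit the range of movement
    moved = min(candidates, key=candidates.get)       # nearest registered participant
    positions[moved] = source_direction_deg % 360.0
    return moved
```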
 Alternatively, the tracking unit 114 may compare the voiceprint of the sound source with the voiceprint of each participant U and determine that the participant U whose voiceprint is similar to that of the sound source has moved to the position corresponding to the direction of the sound source. In this case, the tracking unit 114 may acquire the voiceprint of each participant U from the voice of each participant U at the start of the analysis, or may acquire the voiceprint of each participant U stored in advance in the storage unit 130. The tracking unit 114 calculates the voiceprint similarity between the voiceprint of the sound source and the voiceprint of each participant U. The tracking unit 114 selects the participant U whose voiceprint similarity is the highest in the group, or selects a participant U whose voiceprint similarity is equal to or greater than a predetermined threshold. Identifying the moved participant U using voiceprints can improve the tracking accuracy.
 Alternatively, the tracking unit 114 may obtain, from the image captured by the imaging unit 11 of the sound collection device 10, the face located in the direction of the sound source and determine that the participant U whose face is similar to the obtained face has moved to the position corresponding to the direction of the sound source. In this case, the tracking unit 114 may acquire the face of each participant U from the image captured by the imaging unit 11 at the start of the analysis, or may acquire the face of each participant U stored in advance in the storage unit 130. The tracking unit 114 calculates the face similarity between the face located in the direction of the sound source and the face of each participant U. The tracking unit 114 selects the participant U whose face similarity is the highest in the group, or selects a participant U whose face similarity is equal to or greater than a predetermined threshold. Identifying the moved participant U using faces can improve the tracking accuracy.
 When tracking using voiceprints or faces, the tracking unit 114 may weight the voiceprint or face similarity of each participant U based on the difference between the direction of the sound source and the position (orientation) stored in the position storage unit 131. The closer the position set for a participant U is to the direction of the sound source, the higher the probability that the participant U has moved to the position of the sound source; the farther the position set for a participant U is from the direction of the sound source, the lower that probability. Therefore, the tracking unit 114 weights the voiceprint or face similarity more heavily as the difference between the position set for a participant U and the direction of the sound source becomes smaller, and less heavily as that difference becomes larger. This can further improve the tracking accuracy.
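 One possible form of this weighting is to scale each participant's voiceprint (or face) similarity by a factor that decays as the angular difference between the registered position and the source direction grows; the exponential decay and its constant below are assumptions chosen for illustration.

```python
import math

def weighted_best_match(similarities, positions, source_direction_deg, decay_deg=60.0):
    """Combine similarity scores with angular proximity and return the participant
    judged most likely to have moved to the source direction.

    similarities: dict mapping participant id -> voiceprint or face similarity in [0, 1].
    positions: dict mapping participant id -> registered direction in degrees.
    """
    def angular_diff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)

    scores = {}
    for pid, sim in similarities.items():
        diff = angular_diff(positions[pid], source_direction_deg)
        weight = math.exp(-diff / decay_deg)   # smaller angular difference -> larger weight
        scores[pid] = sim * weight
    return max(scores, key=scores.get)
```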
 FIG. 9 is a flowchart of the tracking processing performed by the voice analysis device 100 according to the present embodiment. First, the tracking unit 114 acquires the direction of the sound source estimated by the sound source localization unit 113. If the direction of the sound source corresponds to one of the positions of the participants U stored in the position storage unit 131 (YES in S41), the tracking unit 114 ends the processing without updating the positions.
 If the direction of the sound source does not correspond to any of the positions of the participants U stored in the position storage unit 131 (NO in S41), the tracking unit 114 acquires information about the participants (that is, information for identifying the positions of the participants) on the sound collection device 10 (S42). As the information about the participants, the tracking unit 114 uses at least one of the voice uttered by the participants and the images of the participants' faces, as described above.
 The tracking unit 114 identifies which participant has moved based on the acquired information about the participants (S43). The tracking unit 114 then updates the position stored in the position storage unit 131 for the participant U identified as having moved to the position corresponding to the direction of the sound source (S44).
[Effects of the present embodiment]
 The voice analysis device 100 according to the present embodiment acquires information about each participant, such as the voice uttered by the participant, an image of the participant's face, and the information on a card presented by the participant, at the sound collection device 10 arranged for each group, and automatically sets the position of each participant based on the acquired information. This reduces the trouble of setting the position of each participant for each group when analyzing the voice of a discussion.
 In addition, the voice analysis device 100 updates the position of each participant based on the information about each participant during the analysis of the voice. Therefore, even if a participant moves while the voice is being acquired, the analysis can follow the movement.
 Although the present invention has been described above using an embodiment, the technical scope of the present invention is not limited to the scope described in the above embodiment, and various modifications and changes are possible within the scope of the gist of the invention. For example, specific embodiments of the distribution and integration of devices are not limited to the above embodiment, and all or part of them may be functionally or physically distributed or integrated in arbitrary units. New embodiments produced by any combination of a plurality of embodiments are also included in the embodiments of the present invention. The effects of a new embodiment produced by such a combination include the effects of the original embodiments.
 In the above description, the voice analysis device 100 is used to analyze voice in a discussion, but it can also be applied to other uses. For example, the voice analysis device 100 can also analyze the voice uttered by passengers seated in an automobile.
 The processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the entities that perform the steps (processes) included in the voice analysis methods shown in FIGS. 4, 7, and 9. That is, the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 read from the storage unit the programs for executing the voice analysis methods shown in FIGS. 4, 7, and 9, execute those programs, and control the units of the voice analysis device 100, the sound collection device 10, and the communication terminal 20, thereby executing the voice analysis methods shown in FIGS. 4, 7, and 9. Some of the steps included in the voice analysis methods shown in FIGS. 4, 7, and 9 may be omitted, the order of the steps may be changed, and a plurality of steps may be performed in parallel.
S  voice analysis system
100  voice analysis device
110  control unit
111  position setting unit
112  voice acquisition unit
114  tracking unit
115  analysis unit
10  sound collection device
20  communication terminal

Claims (11)

  1.  A voice analysis device comprising:
     a setting unit configured to acquire information about a plurality of participants from a sound collection device and to set a position of each of the plurality of participants based on the acquired information about the participants;
     an acquisition unit configured to acquire voice from the sound collection device; and
     an analysis unit configured to analyze the voice uttered by each of the plurality of participants based on the positions set by the setting unit.
  2.  The voice analysis device according to claim 1, wherein the setting unit acquires voice from the sound collection device as the information about the participants and sets the position of each of the plurality of participants by identifying the direction from which the acquired voice was uttered.
  3.  The voice analysis device according to claim 1 or 2, wherein the setting unit acquires, as the information about the participants, an image captured by an imaging unit provided on the sound collection device and sets the position of each of the plurality of participants by recognizing the faces of the plurality of participants included in the acquired image.
  4.  The voice analysis device according to any one of claims 1 to 3, wherein the setting unit acquires, as the information about the participants, information on a card read by a reading unit provided on the sound collection device and sets the position of each of the plurality of participants according to the orientation in which the card was presented to the reading unit.
  5.  The voice analysis device according to any one of claims 1 to 4, wherein the setting unit sets the position of each of the plurality of participants based on information input at a communication terminal in addition to the information about the participants.
  6.  The voice analysis device according to any one of claims 1 to 5, further comprising a tracking unit configured to update the positions set by the setting unit partway through the voice being analyzed by the analysis unit.
  7.  The voice analysis device according to claim 6, wherein the tracking unit updates the position set by the setting unit when the direction from which the voice being analyzed by the analysis unit was uttered does not correspond to the position set by the setting unit.
  8.  The voice analysis device according to claim 6 or 7, wherein the tracking unit updates the position set by the setting unit to the direction from which the voice being analyzed by the analysis unit was uttered.
  9.  A voice analysis method in which a processor executes the steps of:
     acquiring information about a plurality of participants from a sound collection device and setting a position of each of the plurality of participants based on the acquired information about the participants;
     acquiring voice from the sound collection device; and
     analyzing the voice uttered by each of the plurality of participants based on the positions set in the setting step.
  10.  A voice analysis program for causing a computer to execute the steps of:
     acquiring information about a plurality of participants from a sound collection device and setting a position of each of the plurality of participants based on the acquired information about the participants;
     acquiring voice from the sound collection device; and
     analyzing the voice uttered by each of the plurality of participants based on the positions set in the setting step.
  11.  A voice analysis system comprising a voice analysis device and a sound collection device capable of communicating with the voice analysis device, wherein
     the sound collection device is configured to acquire voice and to acquire information about a plurality of participants, and
     the voice analysis device includes:
     a setting unit configured to acquire the information about the participants from the sound collection device and to set a position of each of the plurality of participants based on the acquired information about the participants;
     an acquisition unit configured to acquire the voice from the sound collection device; and
     an analysis unit configured to analyze the voice uttered by each of the plurality of participants based on the positions set by the setting unit.
PCT/JP2018/000943 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system WO2019142232A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2018/000943 WO2019142232A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
JP2018502280A JP6589041B1 (en) 2018-01-16 2018-01-16 Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/000943 WO2019142232A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Publications (1)

Publication Number Publication Date
WO2019142232A1 true WO2019142232A1 (en) 2019-07-25

Family

ID=67301394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/000943 WO2019142232A1 (en) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Country Status (2)

Country Link
JP (1) JP6589041B1 (en)
WO (1) WO2019142232A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05161190A (en) * 1991-12-09 1993-06-25 Toda Constr Co Ltd Sound response microphone
JP2000356674A (en) * 1999-06-11 2000-12-26 Japan Science & Technology Corp Sound source identification device and its identification method
JP2005274707A (en) * 2004-03-23 2005-10-06 Sony Corp Information processing apparatus and method, program, and recording medium
JP2006189626A (en) * 2005-01-06 2006-07-20 Fuji Photo Film Co Ltd Recording device and voice recording program
JP2017129873A (en) * 2017-03-06 2017-07-27 本田技研工業株式会社 Conversation assist device, method for controlling conversation assist device, and program for conversation assist device
JP2017173768A (en) * 2016-03-25 2017-09-28 グローリー株式会社 Minutes creation system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160026317A (en) * 2014-08-29 2016-03-09 삼성전자주식회사 Method and apparatus for voice recording

Also Published As

Publication number Publication date
JP6589041B1 (en) 2019-10-09
JPWO2019142232A1 (en) 2020-01-23

Similar Documents

Publication Publication Date Title
CN110443110B (en) Face recognition method, device, terminal and storage medium based on multipath camera shooting
CN108039988B (en) Equipment control processing method and device
JP2019082990A (en) Identity authentication method, terminal equipment, and computer readable storage medium
EP3791390A1 (en) Voice identification enrollment
CN105354543A (en) Video processing method and apparatus
US10922570B1 (en) Entering of human face information into database
CN111883168B (en) Voice processing method and device
KR102263154B1 (en) Smart mirror system and realization method for training facial sensibility expression
JP2020148931A (en) Discussion analysis device and discussion analysis method
CN110505504A (en) Video program processing method, device, computer equipment and storage medium
CN114556469A (en) Data processing method and device, electronic equipment and storage medium
CN110941992B (en) Smile expression detection method and device, computer equipment and storage medium
CN113143193A (en) Intelligent vision testing method, device and system
JP6646134B2 (en) Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
WO2019142232A1 (en) Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
JP6975755B2 (en) Voice analyzer, voice analysis method, voice analysis program and voice analysis system
JP6589042B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
CN113409822B (en) Object state determining method and device, storage medium and electronic device
JP6589040B1 (en) Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system
JP2015177490A (en) Image/sound processing system, information processing apparatus, image/sound processing method, and image/sound processing program
JP7427274B2 (en) Speech analysis device, speech analysis method, speech analysis program and speech analysis system
CN111310602A (en) System and method for analyzing attention of exhibit based on emotion recognition
JP7414319B2 (en) Speech analysis device, speech analysis method, speech analysis program and speech analysis system
JP7261462B2 (en) Speech analysis device, speech analysis system and speech analysis method
CN112035639B (en) Intelligent automatic question answering robot system

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018502280

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18900852

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.10.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18900852

Country of ref document: EP

Kind code of ref document: A1