WO2019142232A1 - Voice analysis device, voice analysis method, voice analysis program, and voice analysis system - Google Patents
- Publication number
- WO2019142232A1 (PCT/JP2018/000943)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- unit
- participant
- participants
- analysis
- Prior art date
Links
Images
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Definitions
- the present invention relates to a voice analysis device for analyzing voice, a voice analysis method, a voice analysis program and a voice analysis system.
- the Harkness method is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1).
- in the Harkness method, the transitions of each participant's utterances are recorded as lines. In this way, it is possible to analyze each participant's contribution to the discussion and their relationships with others.
- the Harkness method can also be effectively applied to active learning, in which students take the initiative in their own learning.
- the present invention has been made in view of these points, and it is an object of the present invention to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system capable of reducing the time and effort required to set the positions of participants when analyzing the voices of a discussion.
- the voice analysis device includes a setting unit configured to acquire information on a plurality of participants from a sound collection device and to set a position of each of the plurality of participants based on the acquired information on the participants, an acquisition unit configured to acquire voice from the sound collection device, and an analysis unit configured to analyze the voice uttered by each of the plurality of participants based on the positions set by the setting unit.
- the setting unit may set the position of each of the plurality of participants by acquiring voice from the sound collection device as the information on the participants and specifying the direction from which the acquired voice is emitted.
- the setting unit may set the position of each of the plurality of participants by acquiring an image captured by an imaging unit provided on the sound collection device as the information on the participants and recognizing the faces of the plurality of participants included in the acquired image.
- the setting unit may set the position of each of the plurality of participants by acquiring information of a card read by a reading unit provided on the sound collection device as the information on the participants, according to the direction in which each card is presented to the reading unit.
- the setting unit may set the position of each of the plurality of participants based on the information input in the communication terminal in addition to the information on the participants.
- the voice analysis device may further include a tracking unit that updates the position set by the setting unit while the voice is being analyzed by the analysis unit.
- the tracking unit may update the position set by the setting unit when the direction from which the voice analyzed by the analysis unit is emitted does not correspond to the position set by the setting unit.
- the tracking unit may update the position set by the setting unit to the direction from which the voice analyzed by the analysis unit is emitted.
- in the voice analysis method, a processor executes the steps of: acquiring information on a plurality of participants from a sound collection device; setting the position of each of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
- the voice analysis program causes a computer to execute the steps of: acquiring information on a plurality of participants from a sound collection device; setting the positions of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
- a voice analysis system includes a voice analysis device and a sound collection device capable of communicating with the voice analysis device, wherein the sound collection device acquires voice and information on a plurality of participants, and the voice analysis device acquires the information on the participants from the sound collection device and sets the positions of the plurality of participants based on the acquired information on the participants.
- FIG. 1 is a schematic view of a speech analysis system S according to the present embodiment.
- the voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20.
- the number of sound collection devices 10 and communication terminals 20 included in the voice analysis system S is not limited.
- the voice analysis system S may include devices such as other servers and terminals.
- the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least a part of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without the network N.
- the sound collection device 10 includes a microphone array having a plurality of sound collection units (microphones) arranged in different orientations.
- the microphone array includes eight microphones equally spaced on the same circumference in the horizontal plane with respect to the ground.
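As an illustration only (the patent does not disclose code), the geometry of such an equally spaced circular array can be sketched as follows; the radius value is an assumed placeholder, since the patent does not specify one.

```python
import math

def mic_positions(n=8, radius=0.05):
    """Angles and horizontal-plane coordinates of n microphones equally
    spaced on a circle of `radius` meters, as in the described array.
    Returns a list of (angle_degrees, x, y) tuples."""
    positions = []
    for i in range(n):
        angle = 360.0 * i / n  # equal spacing around the circumference
        positions.append((angle,
                          radius * math.cos(math.radians(angle)),
                          radius * math.sin(math.radians(angle))))
    return positions
```

With the defaults this yields eight microphones at 45-degree intervals, matching the arrangement described above.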
- the sound collection device 10 transmits the voice acquired using the microphone array to the voice analysis device 100 as data.
- the communication terminal 20 is a communication device capable of performing wired or wireless communication.
- the communication terminal 20 is, for example, a portable terminal such as a smart phone terminal or a computer terminal such as a personal computer.
- the communication terminal 20 receives the setting of analysis conditions from the analyst and displays the analysis result by the voice analysis device 100.
- the voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 by a voice analysis method described later. Further, the voice analysis device 100 transmits the result of the voice analysis to the communication terminal 20.
- FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. Arrows in FIG. 2 indicate the main data flow, and there may be data flows not shown in FIG. In FIG. 2, each block is not a hardware (apparatus) unit configuration but a function unit configuration. As such, the blocks shown in FIG. 2 may be implemented in a single device or may be implemented separately in multiple devices. Transfer of data between the blocks may be performed via any means such as a data bus, a network, a portable storage medium, and the like.
- the sound collection device 10 includes an imaging unit 11 for imaging a participant in the discussion, and a reading unit 12 for reading information such as a card presented by the participant in the discussion.
- the imaging unit 11 is an imaging device capable of imaging a predetermined imaging range including the face of each participant (that is, each of a plurality of participants).
- the imaging unit 11 includes imaging elements of the number and arrangement capable of imaging the faces of all the participants surrounding the sound collection device 10. For example, the imaging unit 11 includes two imaging elements arranged in different orientations of 180 degrees in a horizontal plane with respect to the ground.
- the imaging part 11 may image the face of all the participants who surround the sound collection apparatus 10 by rotating in the horizontal surface with respect to the ground.
- the imaging unit 11 may perform imaging at a timing (for example, every 10 seconds) set in advance in the sound collection device 10, or may perform imaging in accordance with an imaging instruction received from the voice analysis device 100.
- the imaging unit 11 transmits an image indicating the imaged content to the voice analysis device 100.
- the reading unit 12 has a reader (card reader) that reads, by a contact or non-contact method, information recorded on an IC (Integrated Circuit) card or a magnetic card (hereinafter collectively referred to as a card) presented by a participant.
- An IC chip incorporated in a smartphone or the like may be used as an IC card.
- the reading unit 12 is configured to be able to specify the orientation of the participant who presented the card.
- the reading unit 12 includes twelve reading devices arranged in different directions every 30 degrees in a horizontal plane with respect to the ground.
- the reading unit 12 may be provided with a button for specifying the orientation of the participant in addition to the reading device.
- when a card is presented by a participant, the reading unit 12 reads the information on the card with a reading device and identifies the orientation of the participant based on which reading device read the card. Then, the reading unit 12 associates the read information with the direction of the participant and transmits them to the voice analysis device 100.
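The mapping from reading device to participant direction described above can be sketched as follows. This is a minimal illustration assuming reader index 0 faces 0 degrees and indices increase around the device at equal 30-degree steps; the patent does not specify the indexing convention.

```python
def reader_angle(reader_index, n_readers=12):
    """Angle (degrees) of the participant who presented a card, derived
    from which of the n equally spaced reading devices read it."""
    return (360 // n_readers) * reader_index
```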
- the communication terminal 20 has a display unit 21 for displaying various information, and an operation unit 22 for receiving an operation by an analyst.
- the display unit 21 includes a display device such as a liquid crystal display or an organic light emitting diode (OLED) display.
- the operation unit 22 includes operation members such as a button, a switch, and a dial.
- the display unit 21 and the operation unit 22 may be integrally configured by using a touch screen capable of detecting the position of contact by the analyst as the display unit 21.
- the voice analysis device 100 includes a control unit 110, a communication unit 120, and a storage unit 130.
- the control unit 110 includes a position setting unit 111, an audio acquisition unit 112, a sound source localization unit 113, a tracking unit 114, an analysis unit 115, and an output unit 116.
- the storage unit 130 includes a position storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
- the communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N.
- the communication unit 120 includes a processor for performing communication, a connector, an electric circuit, and the like.
- the communication unit 120 performs predetermined processing on a communication signal received from the outside to acquire data, and inputs the acquired data to the control unit 110. Further, the communication unit 120 performs predetermined processing on the data input from the control unit 110 to generate a communication signal, and transmits the generated communication signal to the outside.
- the storage unit 130 is a storage medium including a read only memory (ROM), a random access memory (RAM), a hard disk drive, and the like.
- the storage unit 130 stores in advance a program to be executed by the control unit 110.
- the storage unit 130 may be provided outside the voice analysis device 100, and in this case, data may be exchanged with the control unit 110 via the communication unit 120.
- the position storage unit 131 stores information indicating the positions of participants in the discussion.
- the voice storage unit 132 stores the voice acquired by the sound collection device 10.
- the analysis result storage unit 133 stores an analysis result indicating the result of analyzing the voice.
- the position storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may be storage areas on the storage unit 130, or may be databases configured on the storage unit 130.
- the control unit 110 is, for example, a processor such as a central processing unit (CPU), and executes the program stored in the storage unit 130 to obtain the position setting unit 111, the sound acquisition unit 112, the sound source localization unit 113, and the tracking unit 114. Functions as an analysis unit 115 and an output unit 116.
- the functions of the position setting unit 111, the sound acquisition unit 112, the sound source localization unit 113, the tracking unit 114, the analysis unit 115, and the output unit 116 will be described later with reference to FIGS.
- At least a part of the functions of the control unit 110 may be performed by an electrical circuit.
- at least a part of the functions of the control unit 110 may be executed by a program executed via a network.
- the speech analysis system S is not limited to the specific configuration shown in FIG.
- the voice analysis device 100 is not limited to one device, and may be configured by connecting two or more physically separated devices in a wired or wireless manner.
- FIG. 3 is a schematic view of the speech analysis method performed by the speech analysis system S according to the present embodiment.
- the position setting unit 111 of the voice analysis device 100 sets the position of each participant in the discussion to be analyzed by the position setting processing described later (a).
- the position setting unit 111 sets the position of each participant by storing the position of each participant specified in the position setting processing described later in the position storage unit 131.
- the voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing acquisition of voice to the sound collection device 10 when starting acquisition of voice (b).
- when the sound collection device 10 receives the signal instructing acquisition of voice from the voice analysis device 100, it starts collecting voice.
- when the voice acquisition unit 112 of the voice analysis device 100 ends the voice acquisition, it transmits a signal instructing the end of the voice acquisition to the sound collection device 10.
- when the sound collection device 10 receives the signal instructing the end of the acquisition of voice from the voice analysis device 100, it ends the acquisition of voice.
- the sound collection device 10 acquires voice with each of the plurality of sound collection units and internally records it as the voice of the channel corresponding to each sound collection unit. Then, the sound collection device 10 transmits the acquired voices of the plurality of channels to the voice analysis device 100 (c). The sound collection device 10 may transmit the acquired voice sequentially, or may transmit a predetermined amount or a predetermined time of voice at a time. Further, the sound collection device 10 may collectively transmit the voice from the start to the end of the acquisition.
- the voice acquisition unit 112 of the voice analysis device 100 receives voice from the sound collection device 10 and stores the voice in the voice storage unit 132.
- the voice analysis device 100 analyzes voice at predetermined timing using the voice acquired from the sound collection device 10.
- the voice analysis device 100 may analyze the voice when the analyst gives an analysis instruction at the communication terminal 20 by a predetermined operation. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from the voices stored in the voice storage unit 132.
- the voice analysis device 100 may analyze the voice when the voice acquisition ends. In this case, the voice from the start to the end of the acquisition corresponds to the discussion to be analyzed. In addition, the voice analysis device 100 may analyze voice sequentially (that is, by real-time processing) during acquisition of voice. In this case, the voice for a predetermined time in the past (for example, 30 seconds), going back from the current time, corresponds to the discussion to be analyzed.
- when analyzing voice, the sound source localization unit 113 first performs sound source localization based on the plurality of channels of voice acquired by the voice acquisition unit 112 (d). Sound source localization is processing for estimating, for each time (for example, every 10 to 100 milliseconds), the direction of the sound source included in the voice acquired by the voice acquisition unit 112. The sound source localization unit 113 associates the direction of the sound source estimated for each time with the positions of the participants stored in the position storage unit 131.
- as long as the sound source localization unit 113 can identify the direction of the sound source from the voice acquired from the sound collection device 10, a known sound source localization method such as the Multiple Signal Classification (MUSIC) method or a beamforming method can be used.
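Associating an estimated sound-source direction with the stored participant positions, as described above, can be sketched as follows. This is a minimal illustration; the 30-degree tolerance is an assumed value and `match_direction` is a hypothetical helper name, neither of which appears in the patent.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def match_direction(source_angle, participant_angles, tolerance=30.0):
    """Return the participant whose stored angle is closest to the
    estimated source direction, or None if no stored position lies
    within `tolerance` degrees.  participant_angles: dict name -> angle."""
    best = min(participant_angles,
               key=lambda p: angular_distance(source_angle, participant_angles[p]))
    if angular_distance(source_angle, participant_angles[best]) <= tolerance:
        return best
    return None
```

The modulo arithmetic handles wrap-around, so a source at 355 degrees still matches a participant registered at 0 degrees.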
- the analysis unit 115 analyzes the voice based on the voice acquired by the voice acquisition unit 112, the direction of the sound source estimated by the sound source localization unit 113, and the positions of the participants stored in the position storage unit 131 (e).
- the analysis unit 115 may analyze the entire completed discussion as an analysis target, or may analyze a part of the discussion in the case of real-time processing.
- the analysis unit 115 first determines, based on the voice acquired by the voice acquisition unit 112, the direction of the sound source estimated by the sound source localization unit 113, and the positions of the participants stored in the position storage unit 131, which participant spoke at each time interval (for example, every 10 to 100 milliseconds) in the discussion.
- the analysis unit 115 specifies, as a speech period, a continuous period from when one participant starts speaking to when that participant stops, and stores it in the analysis result storage unit 133. When a plurality of participants speak at the same time, the analysis unit 115 specifies a speech period for each participant.
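Extracting speech periods from a per-frame sequence of speaker decisions, as described above, can be sketched as follows. This is an illustrative simplification that handles one speaker per frame; the patent also covers simultaneous speakers, which would require one such sequence per participant.

```python
def speech_periods(frames, frame_sec=0.01):
    """frames: list of per-frame speaker labels (None = silence).
    Returns a list of (speaker, start_time, end_time) tuples, one per
    contiguous run of frames attributed to the same speaker."""
    periods = []
    current = None  # speaker of the run in progress
    start = None    # frame index where that run began
    for i, label in enumerate(frames):
        if label != current:
            if current is not None:  # close the previous speech run
                periods.append((current, start * frame_sec, i * frame_sec))
            current = label
            start = i
    if current is not None:          # close a run that reaches the end
        periods.append((current, start * frame_sec, len(frames) * frame_sec))
    return periods
```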
- the analysis unit 115 calculates the amount of speech of each participant for each time, and stores it in the analysis result storage unit 133. Specifically, within a certain time window (for example, 5 seconds), the analysis unit 115 calculates the amount of speech per time as the value obtained by dividing the length of time during which the participant speaks by the length of the time window. Then, the analysis unit 115 repeats this calculation for each participant while shifting the time window by a predetermined time (for example, one second) from the start time of the discussion to its end time (the current time in the case of real-time processing).
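The sliding-window calculation of speech amount described above can be sketched as follows, assuming a per-second 0/1 speaking indicator for one participant; the window and step defaults follow the 5-second and 1-second examples in the text.

```python
def speech_amount(speaking, window=5, step=1):
    """speaking: list of per-second 0/1 values (1 = participant speaking).
    Returns, for each window position shifted by `step` seconds, the
    fraction of the window during which the participant spoke."""
    amounts = []
    for start in range(0, len(speaking) - window + 1, step):
        amounts.append(sum(speaking[start:start + window]) / window)
    return amounts
```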
- the tracking unit 114 acquires the latest positions of the participants at predetermined time intervals in the voice to be analyzed by the sound source localization unit 113 and the analysis unit 115, using the tracking processing described later, and updates the positions of the participants stored in the position storage unit 131. Thereby, even if a participant moves from the previously set position during acquisition of the voice, the sound source localization unit 113 and the analysis unit 115 can follow the movement and analyze the voice.
- the output unit 116 performs control to display the analysis result by the analysis unit 115 on the display unit 21 by transmitting the display information to the communication terminal 20 (f).
- the output unit 116 is not limited to the display on the display unit 21 and may output the analysis result by other methods such as printing by a printer, data recording to a storage device, and the like.
- FIG. 4 is a diagram showing a flowchart of the entire speech analysis method performed by the speech analysis apparatus 100 according to the present embodiment.
- the position setting unit 111 specifies the position of each participant in the discussion to be analyzed by the position setting processing described later, and stores the position in the position storage unit 131 (S1).
- the voice acquisition unit 112 obtains a voice from the sound collection device 10 and stores the voice in the voice storage unit 132 (S2).
- the voice analysis device 100 analyzes the voice acquired by the voice acquisition unit 112 in step S2 for each predetermined time range (time window) from the start time to the end time.
- the sound source localization unit 113 executes sound source localization in the time range of the voice to be analyzed, and associates the estimated direction of the sound source with the position of each participant stored in the position storage unit 131 (S3).
- the tracking unit 114 acquires the latest positions of the participants in the time range of the voice to be analyzed by the tracking processing described later, and updates the positions of the participants stored in the position storage unit 131 (S4).
- the analysis unit 115 analyzes the voice based on the voice acquired by the voice acquisition unit 112 in step S2, the direction of the sound source estimated in step S3 by the sound source localization unit 113, and the position of the participant stored in the position storage unit 131. To do (S5).
- the analysis unit 115 stores the analysis result in the analysis result storage unit 133.
- if the analysis has not yet reached the end time (NO in S6), the voice analysis device 100 repeats steps S3 to S5 for the next time range in the voice to be analyzed. When the analysis is completed up to the end time of the voice acquired by the voice acquisition unit 112 in step S2 (YES in S6), the output unit 116 outputs the analysis result of step S5 according to a predetermined method (S7).
- FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A.
- the position setting processing includes manual setting processing, in which the analyst sets the position of each participant by operating the communication terminal 20, and automatic setting processing, in which each participant inputs information for specifying his or her position to the sound collection device 10.
- the communication terminal 20 displays the setting screen A on the display unit 21 and receives the setting of the analysis condition by the analyst.
- the setting screen A includes a position setting area A1, a start button A2, an end button A3, and an automatic setting button A4.
- the position setting area A1 is an area for setting the direction in which each participant U is actually positioned, with the sound collection device 10 as a reference, in the discussion to be analyzed.
- the position setting area A1 represents a circle centered on the position of the sound collector 10, and further represents an angle based on the sound collector 10 along the circle.
- the analyst who desires the manual setting process sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20.
- identification information (here, U1 to U4) for identifying each participant U is allocated and displayed.
- in the example of FIG. 5, four participants U1 to U4 are set.
- the portion corresponding to each participant U in the position setting area A1 is displayed in a different color for each participant. Thereby, the analyst can easily recognize the direction in which each participant U is set.
- the start button A2, the end button A3 and the automatic setting button A4 are virtual buttons displayed on the display unit 21 respectively.
- the communication terminal 20 transmits a signal of a start instruction to the voice analysis device 100 when the analyst presses the start button A2.
- the communication terminal 20 transmits a signal of a termination instruction to the voice analysis device 100 when the analyst presses the termination button A3.
- the period from the analyst's start instruction to the end instruction constitutes one discussion.
- An analyst who desires the automatic setting process causes the voice analysis device 100 to start the automatic setting process by pressing the automatic setting button A4.
- the communication terminal 20 transmits an automatic setting instruction signal to the voice analysis device 100.
- FIGS. 6A to 6C are schematic views of the automatic setting process performed by the voice analysis device 100 according to the present embodiment.
- the voice analysis device 100 sets the position of the participant U by at least one of the processes shown in FIGS. 6A to 6C.
- FIG. 6A shows a process of setting the position of the participant U based on the voice uttered by the participant U.
- the position setting unit 111 of the voice analysis device 100 causes the sound collection unit of the sound collection device 10 to acquire the voice emitted by each participant U.
- the position setting unit 111 acquires the sound acquired by the sound collection device 10.
- the position setting unit 111 specifies the direction of each participant U based on the direction in which the acquired voice is emitted.
- the position setting unit 111 uses the result of sound source localization by the above-described sound source localization unit 113 in order to specify the direction of the participant from the voice. Then, the position setting unit 111 causes the position storage unit 131 to store the positions of the identified participants U.
- the position setting unit 111 may specify the individual of the participant U by comparing the acquired voice of each participant U with the voice of the individual stored in advance in the voice analysis device 100. For example, the position setting unit 111 identifies the individual by comparing the voiceprints of the voices of the participants U (that is, the frequency spectrum of the voice). As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
- FIG. 6B shows a process of setting the position of the participant U based on the image of the face of the participant U.
- the position setting unit 111 of the voice analysis device 100 causes the imaging unit 11 provided in the sound collection device 10 to capture an area including the faces of all the participants U surrounding the sound collection device 10.
- the position setting unit 111 acquires an image captured by the imaging unit 11.
- the position setting unit 111 recognizes the face of each participant U in the acquired image.
- the position setting unit 111 can use a known face recognition technology to recognize a human face from an image. Then, the position setting unit 111 identifies the position of each participant U based on the sound collection device 10 based on the position of the face of each participant U recognized from the image, and stores the position in the position storage unit 131.
- the relationship between a position in the image (for example, the coordinates of a pixel in the image) and a position relative to the sound collection device 10 (for example, an angle with respect to the sound collection device 10) is set in advance in the voice analysis device 100.
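The pre-set relationship between image coordinates and device-relative angles can be sketched as follows. This is a minimal linear mapping assuming each imaging element covers a 180-degree field of view, matching the two-element example above; `pixel_to_angle` and its parameters are illustrative, and a real mapping would depend on the lens.

```python
def pixel_to_angle(x, image_width, fov_deg=180.0, offset_deg=0.0):
    """Map a horizontal pixel coordinate to an angle around the sound
    collection device, assuming the imaging element covers `fov_deg`
    degrees linearly and faces `offset_deg` (e.g. 0 or 180 for the two
    elements described above)."""
    return (offset_deg + x / image_width * fov_deg) % 360
```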
- the position setting unit 111 may specify the individual of the participant U by comparing the face of each participant U recognized from the image with the face of the individual stored in the voice analysis device 100 in advance. As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
- FIG. 6C shows a process of setting the position of the participant U based on the information of the card C presented by the participant U.
- the position setting unit 111 of the voice analysis device 100 causes the reading unit 12 provided in the sound collection device 10 to read the information of the card C presented by each participant U.
- the position setting unit 111 acquires the information on the card C read by the reading unit 12 and the direction of the participant U who presented the card C.
- the position setting unit 111 identifies the position of each participant U with respect to the sound collection device 10 based on the acquired information of the card C and the direction of the participant U, and causes the position storage unit 131 to store the position.
- the position setting unit 111 may specify the individual of the participant U by acquiring personal information stored in advance in the voice analysis device 100 using the acquired information of the card C. As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
- the position setting unit 111 may execute the automatic setting process and the manual setting process in combination.
- the position setting unit 111 displays the positions of the participants U set by the automatic setting processing of FIGS. 6A to 6C in the position setting area A1 of FIG. 5, and accepts manual corrections by the analyst.
- the position of each participant U set by the automatic setting process can be corrected by the manual setting process, and the position of each participant U can be set more reliably.
- since the voice analysis device 100 can automatically set the position of each participant U based on the information on the participant U acquired by the sound collection device 10, it is possible to save the analyst the trouble of setting the position of each participant U for every group on the communication terminal 20.
- the information on the participant U that can be acquired by the sound collection device 10 (that is, information for specifying the position of the participant U) is not limited to voice, an image, or a card; other information from which the orientation of the participant U can be identified may be used.
- FIG. 7 is a diagram showing a flowchart of position setting processing performed by the voice analysis device 100 according to the present embodiment.
- the position setting unit 111 determines whether the automatic setting processing was instructed by the analyst on the setting screen A of FIG. 5. When the automatic setting processing is not instructed (that is, in the case of manual setting) (NO in S11), the position setting unit 111 specifies the position of each participant according to the contents input on the setting screen A displayed on the communication terminal 20, and sets it in the position storage unit 131 (S12).
- when the automatic setting process is instructed (YES in S11), the position setting unit 111 acquires information on the participants (that is, information for specifying the positions of the participants) at the sound collection device 10 (S13).
- the position setting unit 111 uses at least one of the voice of the participant, the image of the face of the participant, and the information of the card presented by the participant as the information on the participant.
- the position setting unit 111 specifies the position of each participant U with respect to the sound collection device 10 based on the acquired information on the participants (S14). Then, the position setting unit 111 sets the positions of the participants by storing the positions of the identified participants in the position storage unit 131 (S15).
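As a rough sketch of steps S13 to S15 above, the automatic setting process can be thought of as storing, for each participant, the estimated direction of arrival of that participant's enrollment input. This is an illustrative sketch only: the names `PositionStore` and `set_positions`, and the use of plain degrees, are assumptions, not taken from the patent.

```python
# Minimal sketch of the automatic position setting (S13-S15), assuming the
# sound collection device reports an estimated direction of arrival (in
# degrees, 0-359) for each participant's enrollment utterance, face, or card.

class PositionStore:
    """Stands in for the position storage unit 131: participant -> direction."""
    def __init__(self):
        self._positions = {}

    def set(self, participant_id, direction_deg):
        # Normalize to 0-359 so downstream comparisons are consistent.
        self._positions[participant_id] = direction_deg % 360

    def get(self, participant_id):
        return self._positions[participant_id]

def set_positions(enrollment_directions, store):
    """enrollment_directions: {participant_id: estimated direction in degrees}."""
    for participant_id, direction in enrollment_directions.items():
        store.set(participant_id, direction)
    return store

store = set_positions({"U1": 0, "U2": 90, "U3": 180, "U4": 270}, PositionStore())
print(store.get("U2"))  # 90
```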
- FIG. 8 is a schematic view of the follow-up process performed by the voice analysis device 100 according to the present embodiment.
- the follow-up process updates the position of each participant U stored in the position storage unit 131 while the sound source localization unit 113 and the analysis unit 115 are analyzing the voice.
- the upper part of FIG. 8 shows the position of each participant U before the update
- the lower part of FIG. 8 shows the position of each participant U after the update.
- the upper view of FIG. 8 shows a state in which the participant U1 has moved from the position P1 set in the position storage unit 131 to another position P2. In this state, the voice emitted by the participant U1 reaches the sound collection device 10 from the position P2, which differs from the set position P1 of the participant U1. Therefore, the analysis unit 115 cannot detect the utterance of the participant U1 from the voice.
- the tracking unit 114 updates the position of the participant U1 from the position P1 to the position P2 in the position storage unit 131, as illustrated in the lower part of FIG. 8.
- the analysis unit 115 can correctly detect the utterance of the participant U1.
- the tracking unit 114 acquires the direction of the sound source estimated by the sound source localization unit 113 at predetermined intervals (for example, every minute). If the estimated direction of the sound source does not correspond to any of the participants' positions stored in the position storage unit 131, the tracking unit 114 judges that some participant U has moved in the direction of the sound source. The tracking unit 114 then identifies the participant U who moved, and updates that participant's position stored in the position storage unit 131 to the position corresponding to the direction of the sound source.
- for example, the tracking unit 114 determines that the participant U set at the position closest to the direction of the sound source estimated from the sound acquired by the sound collection device 10 has moved to the position corresponding to that direction. In this case, the tracking unit 114 may select the moved participant U only from among the participants U set at positions within a predetermined range of the sound-source direction (for example, within -30 degrees to +30 degrees). By limiting the range of movement in this way, the tracking unit 114 can prevent, for example, a participant U's position from being moved to the wrong place.
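The nearest-participant selection with an angular window can be sketched as follows. The function names and the data layout are illustrative assumptions; only the +/-30 degree window comes from the text above.

```python
# Hypothetical sketch of how the tracking unit 114 might pick the moved
# participant: the participant whose stored direction is closest to the
# estimated sound-source direction, restricted to a +/-30 degree window.

def angular_diff(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def nearest_participant(source_dir, stored_positions, window_deg=30):
    """stored_positions: {participant_id: direction in degrees}.
    Returns the closest participant within the window, or None."""
    candidates = [
        (angular_diff(source_dir, pos), pid)
        for pid, pos in stored_positions.items()
        if angular_diff(source_dir, pos) <= window_deg
    ]
    return min(candidates)[1] if candidates else None

positions = {"U1": 0, "U2": 90, "U3": 180, "U4": 270}
print(nearest_participant(110, positions))  # U2 (20 degrees away)
print(nearest_participant(135, positions))  # None (no one within 30 degrees)
```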
- alternatively, the tracking unit 114 may compare the voiceprint of the sound source with the voiceprint of each participant U, and determine that the participant U whose voiceprint is similar to that of the sound source has moved to the position corresponding to the direction of the sound source. In this case, the tracking unit 114 may acquire each participant U's voiceprint from his or her voice at the start of analysis, or may use voiceprints stored in advance in the storage unit 130. The tracking unit 114 calculates the degree of similarity between the voiceprint of the sound source and the voiceprint of each participant U.
- the tracking unit 114 then selects the participant U whose voiceprint similarity is the highest in the group, or selects a participant U whose voiceprint similarity is equal to or higher than a predetermined threshold.
- the tracking accuracy can be improved by specifying the moved participant U using the voiceprint.
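The voiceprint comparison above might be sketched as follows, assuming each voiceprint is represented as a fixed-length embedding vector compared by cosine similarity. The embedding representation and the 0.7 threshold are illustrative assumptions; the patent does not specify how voiceprints are encoded or compared.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match_voiceprint(source_vp, participant_vps, threshold=0.7):
    """Return the participant whose voiceprint is most similar to the
    sound source's, if that similarity clears the threshold."""
    best_id = max(
        participant_vps,
        key=lambda pid: cosine_similarity(source_vp, participant_vps[pid]),
    )
    best_sim = cosine_similarity(source_vp, participant_vps[best_id])
    return best_id if best_sim >= threshold else None

vps = {"U1": [1.0, 0.0, 0.0], "U2": [0.0, 1.0, 0.0]}
print(match_voiceprint([0.9, 0.1, 0.0], vps))  # U1
```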
- alternatively, the tracking unit 114 may acquire the face located in the direction of the sound source from the image captured by the imaging unit 11 of the sound collection device 10, and determine that the participant U whose face is similar to the acquired face has moved to the position corresponding to the direction of the sound source.
- in this case, the tracking unit 114 may acquire each participant U's face from the image captured by the imaging unit 11 at the start of analysis, or may use face information stored in advance in the storage unit 130.
- the tracking unit 114 calculates the similarity of the face between the face located in the direction of the sound source and the face of each participant U.
- the tracking unit 114 then selects the participant U whose face similarity is the highest in the group, or selects a participant U whose face similarity is equal to or higher than a predetermined threshold. By identifying the moved participant U using the face, the tracking accuracy can be improved.
- furthermore, the tracking unit 114 may weight the voiceprint or face similarity of each participant U based on the difference between the direction of the sound source and the position (direction) stored in the position storage unit 131. The closer a participant U's set position is to the direction of the sound source, the higher the probability that that participant moved to the sound source's position; conversely, the farther apart they are, the lower that probability.
- accordingly, the tracking unit 114 weights the voiceprint or face similarity higher as the difference between the participant U's set position and the direction of the sound source decreases, and lower as that difference increases. This further improves the tracking accuracy.
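The weighting described above can be sketched as scaling the raw voiceprint (or face) similarity down as the angular difference between a participant's stored position and the sound-source direction grows. The linear falloff below is one possible weighting chosen for illustration; the patent does not fix a formula.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def weighted_similarity(raw_similarity, participant_dir, source_dir):
    """Scale similarity by a weight that falls linearly from 1.0
    (same direction) to 0.0 (opposite direction, 180 degrees apart)."""
    weight = 1.0 - angular_diff(participant_dir, source_dir) / 180.0
    return raw_similarity * weight

# A nearby participant with a slightly lower raw similarity can outrank a
# distant one once position is taken into account.
print(weighted_similarity(0.80, participant_dir=100, source_dir=110))  # ~0.756
print(weighted_similarity(0.85, participant_dir=300, source_dir=110))  # ~0.047
```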
- FIG. 9 is a diagram showing a flowchart of the follow-up process performed by the voice analysis device 100 according to the present embodiment.
- the tracking unit 114 acquires the direction of the sound source estimated by the sound source localization unit 113. If the direction of the sound source corresponds to one of the participants' positions stored in the position storage unit 131 (YES in S41), the tracking unit 114 ends the process without updating any position.
- otherwise (NO in S41), the tracking unit 114 acquires information on the participants (that is, information for specifying the positions of the participants) at the sound collection device 10 (S42).
- the tracking unit 114 uses at least one of the voice of the participant and the image of the face of the participant as the information on the participant.
- the tracking unit 114 identifies which participant has moved based on the acquired information on the participants (S43). Then, for the participant U identified as having moved, the tracking unit 114 updates the position stored in the position storage unit 131 to the position corresponding to the direction of the sound source (S44).
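One iteration of the follow-up process (S41 to S44) can be sketched as follows. The `identify` callback stands in for the voiceprint- or face-based identification of S43, and the 10-degree matching tolerance is an assumption for illustration.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def follow_up_step(source_dir, positions, identify, tolerance_deg=10):
    """positions: {participant_id: direction}. identify(source_dir) names the
    moved participant (e.g. via voiceprint or face). Returns updated positions."""
    # S41: if the source direction already matches a stored position, do nothing.
    if any(angular_diff(source_dir, p) <= tolerance_deg for p in positions.values()):
        return positions
    # S42-S43: acquire participant information and identify who moved.
    moved = identify(source_dir)
    if moved is not None:
        # S44: update the stored position to the sound-source direction.
        positions[moved] = source_dir
    return positions

positions = {"U1": 0, "U2": 90}
positions = follow_up_step(150, positions, identify=lambda d: "U1")
print(positions)  # {'U1': 150, 'U2': 90}
```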
- as described above, the voice analysis device 100 acquires, at the sound collection device 10 arranged in each group, information on each participant, such as the voice the participant emits, an image of the participant's face, or the information of a card the participant presents, and automatically sets each participant's position based on the acquired information. Therefore, when analyzing the voice of a discussion, the effort of setting each participant's position for every group can be reduced.
- in addition, the voice analysis device 100 updates each participant's position based on the information on the participants during voice analysis. Therefore, even if a participant moves while the voice is being acquired, the device can follow the participant and continue the analysis.
- in the present embodiment, the voice analysis device 100 is used to analyze the voice of a discussion, but it can be applied to other uses as well.
- for example, the voice analysis device 100 can also analyze the voices of passengers sitting in a car.
- the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the agents of the steps (processes) included in the voice analysis method shown in FIGS. 4, 7, and 9. That is, each processor reads a program for executing the voice analysis method shown in FIGS. 4, 7, and 9 from the storage unit and executes it, and by controlling the respective units of the voice analysis device 100, the sound collection device 10, and the communication terminal 20, carries out the voice analysis method shown in FIGS. 4, 7, and 9.
- some of the steps included in the voice analysis method shown in FIGS. 4, 7, and 9 may be omitted, the order of the steps may be changed, and multiple steps may be performed in parallel.
- S: speech analysis system
- 100: speech analysis device
- 110: control unit
- 111: position setting unit
- 112: speech acquisition unit
- 114: tracking unit
- 115: analysis unit
- 10: sound collection device
- 20: communication terminal
Description
[Overview of speech analysis system S]
FIG. 1 is a schematic view of the speech analysis system S according to the present embodiment. The speech analysis system S includes the voice analysis device 100, the sound collection device 10, and the communication terminal 20. The numbers of sound collection devices 10 and communication terminals 20 included in the speech analysis system S are not limited. The speech analysis system S may also include other devices such as servers and terminals.
[Configuration of speech analysis system S]
FIG. 2 is a block diagram of the speech analysis system S according to the present embodiment. Arrows in FIG. 2 indicate the main data flows, and there may be data flows not shown in FIG. 2. Each block in FIG. 2 represents a functional unit rather than a hardware (device) unit. The blocks shown in FIG. 2 may therefore be implemented in a single device or divided among multiple devices. Data may be exchanged between the blocks via any means, such as a data bus, a network, or a portable storage medium.
[Description of voice analysis method]
FIG. 3 is a schematic view of the voice analysis method performed by the speech analysis system S according to the present embodiment. First, the position setting unit 111 of the voice analysis device 100 sets the position of each participant in the discussion to be analyzed, by the position setting process described later (a). The position setting unit 111 sets each participant's position by storing the position identified in the position setting process in the position storage unit 131.
[Description of position setting process]
First, the position setting process shown in step S1 of FIG. 4 will be described. FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A. The position setting process includes a manual setting process, in which the analyst sets each participant's position by operating the communication terminal 20, and an automatic setting process, in which each participant inputs information for specifying his or her own position into the sound collection device 10.
[Description of automatic setting process]
FIGS. 6A to 6C are schematic views of the automatic setting process performed by the voice analysis device 100 according to the present embodiment. When the automatic setting process is instructed, the voice analysis device 100 sets the position of each participant U by at least one of the processes shown in FIGS. 6A to 6C.
[Description of follow-up process]
Next, the follow-up process shown in step S4 of FIG. 4 will be described. FIG. 8 is a schematic view of the follow-up process performed by the voice analysis device 100 according to the present embodiment. The follow-up process updates the position of each participant U stored in the position storage unit 131 while the sound source localization unit 113 and the analysis unit 115 are analyzing the voice.
[Effect of this embodiment]
The voice analysis device 100 according to the present embodiment acquires, at the sound collection device 10 arranged in each group, information on each participant, such as the voice the participant emits, an image of the participant's face, or the information of a card the participant presents, and automatically sets each participant's position based on the acquired information. This reduces the effort of setting each participant's position for every group when analyzing the voice of a discussion.
Claims (11)
- A voice analysis device comprising: a setting unit that acquires information on a plurality of participants from a sound collection device and sets the position of each of the plurality of participants based on the acquired information on the participants; an acquisition unit that acquires voice from the sound collection device; and an analysis unit that analyzes the voice emitted by each of the plurality of participants based on the positions set by the setting unit.
- The voice analysis device according to claim 1, wherein the setting unit sets the position of each of the plurality of participants by acquiring voice from the sound collection device as the information on the participants and specifying the direction from which the acquired voice was emitted.
- The voice analysis device according to claim 1 or 2, wherein the setting unit acquires, as the information on the participants, an image captured by an imaging unit provided on the sound collection device, and sets the position of each of the plurality of participants by recognizing the faces of the plurality of participants included in the acquired image.
- The voice analysis device according to any one of claims 1 to 3, wherein the setting unit acquires, as the information on the participants, the information of a card read by a reading unit provided on the sound collection device, and sets the position of each of the plurality of participants according to the direction in which the card was presented to the reading unit.
- The voice analysis device according to any one of claims 1 to 4, wherein the setting unit sets the position of each of the plurality of participants based on information input on a communication terminal in addition to the information on the participants.
- The voice analysis device according to any one of claims 1 to 5, further comprising a tracking unit that updates the positions set by the setting unit while the analysis unit is analyzing the voice.
- The voice analysis device according to claim 6, wherein the tracking unit updates the position set by the setting unit when the direction from which the voice being analyzed by the analysis unit was emitted does not correspond to that position.
- The voice analysis device according to claim 6 or 7, wherein the tracking unit updates the position set by the setting unit to the direction from which the voice being analyzed by the analysis unit was emitted.
- A voice analysis method in which a processor executes the steps of: acquiring information on a plurality of participants from a sound collection device and setting the position of each of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
- A voice analysis program that causes a computer to execute the steps of: acquiring information on a plurality of participants from a sound collection device and setting the position of each of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
- A voice analysis system comprising a voice analysis device and a sound collection device capable of communicating with the voice analysis device, wherein the sound collection device is configured to acquire voice and to acquire information on a plurality of participants, and the voice analysis device comprises: a setting unit that acquires the information on the participants from the sound collection device and sets the position of each of the plurality of participants based on the acquired information on the participants; an acquisition unit that acquires the voice from the sound collection device; and an analysis unit that analyzes the voice emitted by each of the plurality of participants based on the positions set by the setting unit.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2018/000943 WO2019142232A1 (en) | 2018-01-16 | 2018-01-16 | Voice analysis device, voice analysis method, voice analysis program, and voice analysis system |
JP2018502280A JP6589041B1 (en) | 2018-01-16 | 2018-01-16 | Speech analysis apparatus, speech analysis method, speech analysis program, and speech analysis system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019142232A1 true WO2019142232A1 (en) | 2019-07-25 |
Family
ID=67301394
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6589041B1 (en) |
WO (1) | WO2019142232A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05161190A (en) * | 1991-12-09 | 1993-06-25 | Toda Constr Co Ltd | Sound response microphone |
JP2000356674A (en) * | 1999-06-11 | 2000-12-26 | Japan Science & Technology Corp | Sound source identification device and its identification method |
JP2005274707A (en) * | 2004-03-23 | 2005-10-06 | Sony Corp | Information processing apparatus and method, program, and recording medium |
JP2006189626A (en) * | 2005-01-06 | 2006-07-20 | Fuji Photo Film Co Ltd | Recording device and voice recording program |
JP2017129873A (en) * | 2017-03-06 | 2017-07-27 | 本田技研工業株式会社 | Conversation assist device, method for controlling conversation assist device, and program for conversation assist device |
JP2017173768A (en) * | 2016-03-25 | 2017-09-28 | グローリー株式会社 | Minutes creation system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20160026317A (en) * | 2014-08-29 | 2016-03-09 | 삼성전자주식회사 | Method and apparatus for voice recording |
- 2018-01-16: JP JP2018502280A filed; granted as patent JP6589041B1 (active)
- 2018-01-16: WO PCT/JP2018/000943 filed as WO2019142232A1 (application filing)
Also Published As
Publication number | Publication date |
---|---|
JP6589041B1 (en) | 2019-10-09 |
JPWO2019142232A1 (en) | 2020-01-23 |
Legal Events

- ENP (Entry into the national phase): Ref document number 2018502280; Country of ref document: JP; Kind code of ref document: A
- 121 (EP: the EPO has been informed by WIPO that EP was designated in this application): Ref document number 18900852; Country of ref document: EP; Kind code of ref document: A1
- NENP (Non-entry into the national phase): Ref country code: DE
- 32PN (EP: public notification in the EP bulletin as address of the addressee cannot be established): Noting of loss of rights pursuant to Rule 112(1) EPC (EPO Form 1205A dated 02.10.2020)
- 122 (EP: PCT application non-entry in European phase): Ref document number 18900852; Country of ref document: EP; Kind code of ref document: A1