WO2019142232A1 - Voice analysis device, voice analysis method, voice analysis program, and voice analysis system - Google Patents

Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Info

Publication number
WO2019142232A1
WO2019142232A1 (international application PCT/JP2018/000943)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
unit
participant
participants
analysis
Prior art date
Application number
PCT/JP2018/000943
Other languages
English (en)
Japanese (ja)
Inventor
武志 水本
哲也 菅原
Original Assignee
ハイラブル株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ハイラブル株式会社 filed Critical ハイラブル株式会社
Priority to JP2018502280A priority Critical patent/JP6589041B1/ja
Priority to PCT/JP2018/000943 priority patent/WO2019142232A1/fr
Publication of WO2019142232A1 publication Critical patent/WO2019142232A1/fr

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The present invention relates to a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system for analyzing voice.
  • The Harkness method is known as a method for analyzing discussions in group learning and meetings (see, for example, Non-Patent Document 1).
  • In the Harkness method, the transitions between participants' utterances are recorded as lines. This makes it possible to analyze each participant's contribution to the discussion and their relationships with others.
  • The Harkness method can also be effectively applied to active learning, in which students take the initiative in their own learning.
  • The present invention has been made in view of these points, and an object of the present invention is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system capable of reducing the time and effort required to set the positions of participants when analyzing the voices of a discussion.
  • The voice analysis device includes a setting unit that acquires information on a plurality of participants from a sound collection device and sets the position of each of the plurality of participants based on the acquired information on the participants, an acquisition unit that acquires voice from the sound collection device, and an analysis unit that analyzes the voice uttered by each of the plurality of participants based on the positions set by the setting unit.
  • The setting unit may set the position of each of the plurality of participants by acquiring voice from the sound collection device as the information on the participants and specifying the direction from which the acquired voice is emitted.
  • The setting unit may acquire an image captured by an imaging unit provided on the sound collection device as the information on the participants, recognize the faces of the plurality of participants included in the acquired image, and thereby set the position of each of the plurality of participants.
  • The setting unit may acquire, as the information on the participants, information from a card read by a reading unit provided on the sound collection device, and set the position of each of the plurality of participants according to the direction in which the card was presented to the reading unit.
  • The setting unit may set the position of each of the plurality of participants based on information input on a communication terminal in addition to the information on the participants.
  • The voice analysis device may further include a tracking unit that updates the positions set by the setting unit while the voice is being analyzed by the analysis unit.
  • The tracking unit may update the position set by the setting unit when the direction from which the voice analyzed by the analysis unit is emitted does not correspond to the position set by the setting unit.
  • The tracking unit may update the position set by the setting unit to the direction from which the voice analyzed by the analysis unit is emitted.
  • In the voice analysis method, a processor executes the steps of: acquiring information on a plurality of participants from a sound collection device and setting the position of each of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
  • The voice analysis program causes a computer to execute the steps of: acquiring information on a plurality of participants from a sound collection device and setting the positions of the plurality of participants based on the acquired information on the participants; acquiring voice from the sound collection device; and analyzing the voice emitted by each of the plurality of participants based on the positions set in the setting step.
  • The voice analysis system includes a voice analysis device and a sound collection device capable of communicating with the voice analysis device. The sound collection device acquires voice and information on a plurality of participants, and the voice analysis device acquires the information on the participants from the sound collection device, sets the positions of the plurality of participants based on the acquired information on the participants, and analyzes the voice emitted by each of the plurality of participants based on the set positions.
  • FIG. 1 is a schematic view of the voice analysis system S according to the present embodiment.
  • The voice analysis system S includes a voice analysis device 100, a sound collection device 10, and a communication terminal 20.
  • The number of sound collection devices 10 and communication terminals 20 included in the voice analysis system S is not limited.
  • The voice analysis system S may also include other devices such as servers and terminals.
  • The voice analysis device 100, the sound collection device 10, and the communication terminal 20 are connected via a network N such as a local area network or the Internet. At least some of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 may be directly connected without the network N.
  • The sound collection device 10 includes a microphone array having a plurality of sound collection units (microphones) arranged in different orientations.
  • The microphone array includes eight microphones arranged at equal intervals on the same circumference in a plane horizontal to the ground.
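  • Purely as an illustration (not part of the published description), the minimal Python sketch below lays out such a circular array; the array radius is an assumed value, since the publication does not give one.

```python
import math

def circular_array_layout(n_mics=8, radius_m=0.05):
    """Nominal layout of a circular microphone array: n_mics microphones
    equally spaced on a circle of the given radius (the radius is an assumed
    value; the publication does not specify one)."""
    layout = []
    for m in range(n_mics):
        angle_deg = 360.0 * m / n_mics          # 0, 45, 90, ... for 8 mics
        theta = math.radians(angle_deg)
        layout.append((angle_deg,
                       radius_m * math.cos(theta),
                       radius_m * math.sin(theta)))
    return layout

for angle, x, y in circular_array_layout():
    print(f"mic at {angle:5.1f} deg -> x={x:+.3f} m, y={y:+.3f} m")
```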
  • The sound collection device 10 transmits the voice acquired using the microphone array to the voice analysis device 100 as data.
  • The communication terminal 20 is a communication device capable of wired or wireless communication.
  • The communication terminal 20 is, for example, a portable terminal such as a smartphone or a computer terminal such as a personal computer.
  • The communication terminal 20 receives the setting of analysis conditions from the analyst and displays the analysis results produced by the voice analysis device 100.
  • The voice analysis device 100 is a computer that analyzes the voice acquired by the sound collection device 10 using a voice analysis method described later. The voice analysis device 100 also transmits the result of the voice analysis to the communication terminal 20.
  • FIG. 2 is a block diagram of the voice analysis system S according to the present embodiment. Arrows in FIG. 2 indicate the main data flows, and there may be data flows not shown in FIG. 2. In FIG. 2, each block represents a functional unit, not a hardware (device) unit. As such, the blocks shown in FIG. 2 may be implemented in a single device or may be implemented separately in multiple devices. Data may be transferred between the blocks via any means, such as a data bus, a network, or a portable storage medium.
  • The sound collection device 10 includes an imaging unit 11 for imaging the participants in the discussion, and a reading unit 12 for reading information from a card or the like presented by a participant in the discussion.
  • The imaging unit 11 is an imaging device capable of imaging a predetermined imaging range including the face of each participant (that is, each of the plurality of participants).
  • The imaging unit 11 includes imaging elements in a number and arrangement capable of imaging the faces of all the participants surrounding the sound collection device 10. For example, the imaging unit 11 includes two imaging elements oriented 180 degrees apart in a plane horizontal to the ground.
  • Alternatively, the imaging unit 11 may image the faces of all the participants surrounding the sound collection device 10 by rotating in a plane horizontal to the ground.
  • The imaging unit 11 may perform imaging at a timing set in advance in the sound collection device 10 (for example, every 10 seconds), or may perform imaging in accordance with an imaging instruction received from the voice analysis device 100.
  • The imaging unit 11 transmits the captured image to the voice analysis device 100.
  • The reading unit 12 has a reading device (card reader) for reading, by a contact or non-contact method, information recorded on an IC (Integrated Circuit) card or a magnetic card (hereinafter collectively referred to as a card) presented by a participant.
  • An IC chip incorporated in a smartphone or the like may also be used as an IC card.
  • The reading unit 12 is configured to be able to specify the orientation of the participant who presented the card.
  • For example, the reading unit 12 includes twelve reading devices arranged in different directions at intervals of 30 degrees in a plane horizontal to the ground.
  • The reading unit 12 may also be provided with buttons for specifying the orientation of the participant in addition to the reading devices.
  • When a card is presented by a participant, the reading unit 12 reads the information on the card with the reading device and identifies the orientation of the participant based on which reading device read the card. The reading unit 12 then associates the read information with the direction of the participant and transmits them to the voice analysis device 100.
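  • Purely as an illustration (not part of the published description), the sketch below shows how a reader index could be mapped to a participant direction and paired with the card information; the event type and field names are hypothetical.

```python
from dataclasses import dataclass

N_READERS = 12                    # readers placed every 30 degrees, as in the text

@dataclass
class CardReadEvent:
    card_id: str                  # information read from the card
    reader_index: int             # which of the 12 readers detected the card

def reader_index_to_direction(reader_index: int) -> float:
    """Map a reader index (0-11) to the direction, in degrees, of the participant
    who presented the card, relative to the sound collection device."""
    return (360.0 / N_READERS) * reader_index

def to_position_record(event: CardReadEvent) -> dict:
    """Associate the read card information with the participant's direction before
    it would be sent on; the record layout here is an illustrative assumption."""
    return {"card_id": event.card_id,
            "direction_deg": reader_index_to_direction(event.reader_index)}

print(to_position_record(CardReadEvent(card_id="U1-card", reader_index=3)))  # 90.0 deg
```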
  • The communication terminal 20 has a display unit 21 for displaying various information and an operation unit 22 for receiving operations by the analyst.
  • The display unit 21 includes a display device such as a liquid crystal display or an organic light-emitting diode (OLED) display.
  • The operation unit 22 includes operation members such as buttons, switches, and dials.
  • The display unit 21 and the operation unit 22 may be configured integrally by using, as the display unit 21, a touch screen capable of detecting the position touched by the analyst.
  • The voice analysis device 100 includes a control unit 110, a communication unit 120, and a storage unit 130.
  • The control unit 110 includes a position setting unit 111, a voice acquisition unit 112, a sound source localization unit 113, a tracking unit 114, an analysis unit 115, and an output unit 116.
  • The storage unit 130 includes a position storage unit 131, a voice storage unit 132, and an analysis result storage unit 133.
  • The communication unit 120 is a communication interface for communicating with the sound collection device 10 and the communication terminal 20 via the network N.
  • The communication unit 120 includes a processor for performing communication, a connector, an electric circuit, and the like.
  • The communication unit 120 performs predetermined processing on a communication signal received from the outside to acquire data, and inputs the acquired data to the control unit 110. The communication unit 120 also performs predetermined processing on data input from the control unit 110 to generate a communication signal, and transmits the generated communication signal to the outside.
  • The storage unit 130 is a storage medium including a read-only memory (ROM), a random access memory (RAM), a hard disk drive, and the like.
  • The storage unit 130 stores in advance the program to be executed by the control unit 110.
  • The storage unit 130 may be provided outside the voice analysis device 100, in which case it exchanges data with the control unit 110 via the communication unit 120.
  • The position storage unit 131 stores information indicating the positions of the participants in the discussion.
  • The voice storage unit 132 stores the voice acquired by the sound collection device 10.
  • The analysis result storage unit 133 stores analysis results indicating the results of analyzing the voice.
  • The position storage unit 131, the voice storage unit 132, and the analysis result storage unit 133 may be storage areas on the storage unit 130, or may be databases configured on the storage unit 130.
  • The control unit 110 is, for example, a processor such as a central processing unit (CPU), and functions as the position setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the tracking unit 114, the analysis unit 115, and the output unit 116 by executing the program stored in the storage unit 130.
  • The functions of the position setting unit 111, the voice acquisition unit 112, the sound source localization unit 113, the tracking unit 114, the analysis unit 115, and the output unit 116 will be described later.
  • At least some of the functions of the control unit 110 may be performed by an electric circuit.
  • At least some of the functions of the control unit 110 may also be executed by a program executed via a network.
  • The voice analysis system S is not limited to the specific configuration shown in FIG. 2.
  • The voice analysis device 100 is not limited to a single device, and may be configured by connecting two or more physically separated devices by wire or wirelessly.
  • FIG. 3 is a schematic view of the voice analysis method performed by the voice analysis system S according to the present embodiment.
  • The position setting unit 111 of the voice analysis device 100 sets the position of each participant in the discussion to be analyzed by the position setting processing described later (a).
  • The position setting unit 111 sets the position of each participant by storing, in the position storage unit 131, the position of each participant specified in the position setting processing described later.
  • The voice acquisition unit 112 of the voice analysis device 100 transmits a signal instructing acquisition of voice to the sound collection device 10 when starting acquisition of voice (b).
  • When the sound collection device 10 receives the signal instructing acquisition of voice from the voice analysis device 100, it starts acquiring voice.
  • When the voice acquisition unit 112 of the voice analysis device 100 ends the acquisition of voice, it transmits a signal instructing the end of voice acquisition to the sound collection device 10.
  • When the sound collection device 10 receives the signal instructing the end of voice acquisition from the voice analysis device 100, it ends the acquisition of voice.
  • The sound collection device 10 acquires voice with each of the plurality of sound collection units and internally records it as the voice of the channel corresponding to each sound collection unit. The sound collection device 10 then transmits the acquired voices of the plurality of channels to the voice analysis device 100 (c). The sound collection device 10 may transmit the acquired voice sequentially, or may transmit a predetermined amount or a predetermined length of voice at a time. The sound collection device 10 may also transmit the voice from the start to the end of the acquisition all at once.
  • The voice acquisition unit 112 of the voice analysis device 100 receives the voice from the sound collection device 10 and stores it in the voice storage unit 132.
  • The voice analysis device 100 analyzes the voice at a predetermined timing using the voice acquired from the sound collection device 10.
  • The voice analysis device 100 may analyze the voice when the analyst gives an analysis instruction on the communication terminal 20 by a predetermined operation. In this case, the analyst selects the voice corresponding to the discussion to be analyzed from the voices stored in the voice storage unit 132.
  • The voice analysis device 100 may instead analyze the voice when the acquisition of voice ends. In this case, the voice from the start to the end of the acquisition corresponds to the discussion to be analyzed. The voice analysis device 100 may also analyze the voice sequentially during acquisition (that is, in real time). In this case, the voice for a predetermined time in the past (for example, 30 seconds) counted back from the current time corresponds to the discussion to be analyzed.
  • When analyzing the voice, the sound source localization unit 113 first performs sound source localization based on the plurality of channels of voice acquired by the voice acquisition unit 112 (d). Sound source localization is processing for estimating the direction of the sound source included in the voice acquired by the voice acquisition unit 112 at each time (for example, every 10 to 100 milliseconds). The sound source localization unit 113 associates the direction of the sound source estimated at each time with the positions of the participants stored in the position storage unit 131.
  • As long as the sound source localization unit 113 can identify the direction of the sound source based on the voice acquired from the sound collection device 10, a known sound source localization method such as the Multiple Signal Classification (MUSIC) method or a beamforming method can be used.
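  • As a purely illustrative sketch (the publication leaves the choice of localization method open), the Python code below scans candidate directions with a simple frequency-domain delay-and-sum beamformer and then associates the strongest direction with the nearest registered participant; the array geometry, sampling rate, and scan resolution are assumptions.

```python
import numpy as np

def estimate_doa_delay_and_sum(frame, mic_angles_deg, fs, radius_m=0.05,
                               c=343.0, n_candidates=72):
    """Simplified beamforming-based direction-of-arrival estimate for one short
    frame of multi-channel audio. frame has shape (n_mics, n_samples), one row
    per microphone channel of the circular array."""
    n_mics, n_samples = frame.shape
    spectra = np.fft.rfft(frame, axis=1)                       # (n_mics, n_bins)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)             # (n_bins,)
    angles = np.deg2rad(np.asarray(mic_angles_deg, dtype=float))
    mic_xy = radius_m * np.stack([np.cos(angles), np.sin(angles)], axis=1)

    candidates = np.deg2rad(np.arange(0.0, 360.0, 360.0 / n_candidates))
    powers = np.empty(len(candidates))
    for i, theta in enumerate(candidates):
        direction = np.array([np.cos(theta), np.sin(theta)])   # unit vector toward source
        tau = -(mic_xy @ direction) / c                        # far-field arrival delay per mic (s)
        # Compensate each channel's delay in the frequency domain and sum (delay-and-sum).
        steering = np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
        beam = (spectra * steering).sum(axis=0)
        powers[i] = np.sum(np.abs(beam) ** 2)
    return float(np.degrees(candidates[np.argmax(powers)]))

def nearest_participant(doa_deg, participant_positions_deg):
    """Associate an estimated direction with the closest registered participant
    position (positions given in degrees), as the localization step is described
    as doing with the stored positions."""
    diffs = {name: min(abs(doa_deg - p), 360 - abs(doa_deg - p))
             for name, p in participant_positions_deg.items()}
    return min(diffs, key=diffs.get)
```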
  • The analysis unit 115 analyzes the voice based on the voice acquired by the voice acquisition unit 112, the direction of the sound source estimated by the sound source localization unit 113, and the positions of the participants stored in the position storage unit 131 (e).
  • The analysis unit 115 may analyze the entire completed discussion, or may analyze part of the discussion in the case of real-time processing.
  • Specifically, the analysis unit 115 first determines, based on the voice acquired by the voice acquisition unit 112, the direction of the sound source estimated by the sound source localization unit 113, and the positions of the participants stored in the position storage unit 131, which participant spoke at each time interval (for example, every 10 to 100 milliseconds) in the discussion.
  • The analysis unit 115 specifies, as a speech period, a continuous period from when one participant starts speaking to when that participant stops, and causes the analysis result storage unit 133 to store it. When a plurality of participants speak at the same time, the analysis unit 115 specifies a speech period for each participant.
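  • The following sketch (an illustration only, not text from the publication) shows one way such per-frame speaker decisions could be grouped into speech periods; the 100-millisecond frame length is an assumption within the range given above.

```python
def speech_periods(frame_speakers, frame_sec=0.1):
    """Group per-frame speaker decisions into continuous speech periods.
    frame_speakers holds, for each analysis frame, the set of participants judged
    to be speaking in that frame (several participants may overlap).
    Returns {participant: [(start_sec, end_sec), ...]}."""
    periods = {}
    open_starts = {}                                  # participant -> start time of open period
    for i, speakers in enumerate(frame_speakers):
        t = i * frame_sec
        for p in speakers:
            open_starts.setdefault(p, t)              # period opens when p begins speaking
        for p in list(open_starts):
            if p not in speakers:                     # period closes when p stops speaking
                periods.setdefault(p, []).append((open_starts.pop(p), t))
    end = len(frame_speakers) * frame_sec
    for p, start in open_starts.items():              # close periods still open at the end
        periods.setdefault(p, []).append((start, end))
    return periods

frames = [{"U1"}, {"U1"}, {"U1", "U2"}, {"U2"}, set(), {"U1"}]
print(speech_periods(frames))
# {'U1': [(0.0, 0.3), (0.5, 0.6)], 'U2': [(0.2, 0.4)]}
```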
  • The analysis unit 115 also calculates the amount of speech of each participant at each time, and causes the analysis result storage unit 133 to store it. Specifically, the analysis unit 115 calculates, as the amount of speech at a given time, the value obtained by dividing the length of time the participant spoke within a certain time window (for example, 5 seconds) by the length of the time window. The analysis unit 115 repeats this calculation for each participant while shifting the time window by a predetermined time (for example, one second) from the start time of the discussion to the end time (or to the current time in the case of real-time processing).
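  • As an illustration (not part of the published description), the sketch below computes this sliding-window amount of speech for one participant; the window and step follow the examples in the text, while the frame length is an assumption.

```python
def speech_amount_over_time(speaking_flags, frame_sec=0.1,
                            window_sec=5.0, step_sec=1.0):
    """For each window position, divide the time the participant spoke within the
    window by the window length, then slide the window by step_sec.
    speaking_flags holds one boolean per analysis frame."""
    window = int(round(window_sec / frame_sec))
    step = int(round(step_sec / frame_sec))
    amounts = []
    for start in range(0, max(len(speaking_flags) - window, 0) + 1, step):
        chunk = speaking_flags[start:start + window]
        spoken_sec = sum(chunk) * frame_sec
        amounts.append((start * frame_sec, spoken_sec / window_sec))
    return amounts               # list of (window start time, speech amount in 0..1)

flags = [True] * 30 + [False] * 70     # spoke for the first 3 s of a 10 s discussion
print(speech_amount_over_time(flags))
# [(0.0, 0.6), (1.0, 0.4), (2.0, 0.2), (3.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
```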
  • The tracking unit 114 acquires the latest position of each participant at predetermined time intervals in the voice analyzed by the sound source localization unit 113 and the analysis unit 115, by the tracking processing described later, and updates the positions of the participants stored in the position storage unit 131. Thereby, even if a participant moves from the position that was set while the voice is being acquired, the sound source localization unit 113 and the analysis unit 115 can follow the participant and analyze the voice.
  • The output unit 116 performs control to display the analysis result produced by the analysis unit 115 on the display unit 21 by transmitting display information to the communication terminal 20 (f).
  • The output unit 116 is not limited to display on the display unit 21, and may output the analysis result by other methods such as printing with a printer or recording the data in a storage device.
  • FIG. 4 is a flowchart of the entire voice analysis method performed by the voice analysis device 100 according to the present embodiment.
  • The position setting unit 111 specifies the position of each participant in the discussion to be analyzed by the position setting processing described later, and stores the positions in the position storage unit 131 (S1).
  • The voice acquisition unit 112 acquires the voice from the sound collection device 10 and stores it in the voice storage unit 132 (S2).
  • The voice analysis device 100 analyzes the voice acquired by the voice acquisition unit 112 in step S2 for each predetermined time range (time window) from the start time to the end time.
  • The sound source localization unit 113 executes sound source localization on the time range of the voice to be analyzed, and associates the estimated direction of the sound source with the position of each participant stored in the position storage unit 131 (S3).
  • The tracking unit 114 acquires the latest position of each participant in the time range of the voice to be analyzed by the tracking processing described later, and updates the positions of the participants stored in the position storage unit 131 (S4).
  • The analysis unit 115 analyzes the voice based on the voice acquired by the voice acquisition unit 112 in step S2, the direction of the sound source estimated by the sound source localization unit 113 in step S3, and the positions of the participants stored in the position storage unit 131 (S5).
  • The analysis unit 115 stores the analysis result in the analysis result storage unit 133.
  • If the analysis has not yet been completed up to the end time of the voice acquired in step S2 (NO in S6), the voice analysis device 100 repeats steps S3 to S5 for the next time range of the voice to be analyzed. If the analysis has been completed up to the end time of the voice acquired by the voice acquisition unit 112 in step S2 (YES in S6), the output unit 116 outputs the analysis result of step S5 by a predetermined method (S7).
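  • Purely as a structural illustration (not part of the published description), the skeleton below mirrors the flow of FIG. 4; every object and method name is a placeholder standing in for the corresponding unit, not an interface defined in the publication.

```python
def run_voice_analysis(sound_collector, position_store, localizer,
                       tracker, analyzer, output, window_sec=5.0):
    """Skeleton of the S1-S7 flow; the arguments are placeholder objects."""
    positions = position_store.set_initial_positions()        # S1: position setting processing
    voice = sound_collector.acquire_voice()                   # S2: multi-channel recording
    results = []
    t = 0.0
    while t < voice.duration:                                 # analyze one time range at a time
        frame = voice.slice(t, t + window_sec)
        directions = localizer.localize(frame, positions)     # S3: sound source localization
        positions = tracker.update(directions, positions)     # S4: follow a moved participant
        results.append(analyzer.analyze(frame, directions, positions))  # S5: who spoke, how much
        t += window_sec                                        # S6: repeat until the end time
    output.report(results)                                     # S7: output the analysis result
```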
  • FIG. 5 is a front view of the display unit 21 of the communication terminal 20 displaying the setting screen A.
  • The position setting processing includes manual setting processing, in which the analyst sets the position of each participant by operating the communication terminal 20, and automatic setting processing, in which each participant inputs information for specifying the participant's position into the sound collection device 10.
  • The communication terminal 20 displays the setting screen A on the display unit 21 and receives the setting of analysis conditions by the analyst.
  • The setting screen A includes a position setting area A1, a start button A2, an end button A3, and an automatic setting button A4.
  • The position setting area A1 is an area for setting the direction in which each participant U is actually positioned, with the sound collection device 10 as the reference, in the discussion to be analyzed.
  • The position setting area A1 represents a circle centered on the position of the sound collection device 10, and further represents angles along the circle with the sound collection device 10 as the reference.
  • An analyst who desires the manual setting processing sets the position of each participant U in the position setting area A1 by operating the operation unit 22 of the communication terminal 20.
  • Identification information (here, U1 to U4) for identifying each participant U is assigned and displayed in the position setting area A1.
  • In the example of FIG. 5, four participants U1 to U4 are set.
  • The portion of the position setting area A1 corresponding to each participant U is displayed in a different color for each participant. Thereby, the analyst can easily recognize the direction set for each participant U.
  • The start button A2, the end button A3, and the automatic setting button A4 are virtual buttons displayed on the display unit 21.
  • The communication terminal 20 transmits a start instruction signal to the voice analysis device 100 when the analyst presses the start button A2.
  • The communication terminal 20 transmits an end instruction signal to the voice analysis device 100 when the analyst presses the end button A3.
  • The period from the analyst's start instruction to the end instruction corresponds to one discussion.
  • An analyst who desires the automatic setting processing causes the voice analysis device 100 to start the automatic setting processing by pressing the automatic setting button A4.
  • In this case, the communication terminal 20 transmits an automatic setting instruction signal to the voice analysis device 100.
  • FIGS. 6A to 6C are schematic views of the automatic setting process performed by the voice analysis device 100 according to the present embodiment.
  • The voice analysis device 100 sets the position of each participant U by at least one of the processes shown in FIGS. 6A to 6C.
  • FIG. 6A shows a process of setting the position of a participant U based on the voice uttered by the participant U.
  • The position setting unit 111 of the voice analysis device 100 causes the sound collection units of the sound collection device 10 to acquire the voice emitted by each participant U.
  • The position setting unit 111 acquires the voice acquired by the sound collection device 10.
  • The position setting unit 111 specifies the direction of each participant U based on the direction from which the acquired voice was emitted.
  • To specify the direction of a participant from the voice, the position setting unit 111 uses the result of sound source localization by the sound source localization unit 113 described above. The position setting unit 111 then causes the position storage unit 131 to store the specified position of each participant U.
  • The position setting unit 111 may also identify the individual participant U by comparing the acquired voice of each participant U with individual voices stored in advance in the voice analysis device 100. For example, the position setting unit 111 identifies the individual by comparing the voiceprints (that is, the frequency spectra) of the participants' voices. As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
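  • As an illustration only (the publication says voiceprints, that is, frequency spectra, are compared but does not specify how), the sketch below represents a voiceprint as an average magnitude spectrum and matches it against enrolled spectra with cosine similarity; both choices are assumptions.

```python
import numpy as np

def average_spectrum(signal, fs, frame_len=1024):
    """Crude 'voiceprint': average magnitude spectrum of a mono signal."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    if not frames:
        frames = [np.pad(signal, (0, frame_len - len(signal)))]
    mags = np.abs(np.fft.rfft(np.asarray(frames), axis=1))
    return mags.mean(axis=0)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def identify_speaker(utterance, fs, enrolled):
    """Return the enrolled participant whose stored voiceprint is most similar to
    the utterance's voiceprint; `enrolled` maps participant id -> reference spectrum."""
    probe = average_spectrum(utterance, fs)
    return max(enrolled, key=lambda pid: cosine_similarity(probe, enrolled[pid]))

# Tiny synthetic check with two "speakers" at different fundamental frequencies.
fs = 16000
t = np.arange(fs) / fs
enrolled = {"U1": average_spectrum(np.sin(2 * np.pi * 120 * t), fs),
            "U2": average_spectrum(np.sin(2 * np.pi * 240 * t), fs)}
print(identify_speaker(np.sin(2 * np.pi * 118 * t), fs, enrolled))   # -> "U1"
```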
  • FIG. 6B shows a process of setting the position of the participant U based on the image of the face of the participant U.
  • The position setting unit 111 of the voice analysis device 100 causes the imaging unit 11 provided in the sound collection device 10 to capture an area including the faces of all the participants U surrounding the sound collection device 10.
  • The position setting unit 111 acquires the image captured by the imaging unit 11.
  • The position setting unit 111 recognizes the face of each participant U in the acquired image.
  • The position setting unit 111 can use a known face recognition technology to recognize a human face in an image. The position setting unit 111 then specifies the position of each participant U with respect to the sound collection device 10 based on the position of the face of each participant U recognized in the image, and stores the position in the position storage unit 131.
  • The relationship between a position in the image (for example, pixel coordinates in the image) and a position with respect to the sound collection device 10 (for example, an angle with respect to the sound collection device 10) is set in the voice analysis device 100 in advance.
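  • The sketch below is only an illustration of such a preset pixel-to-angle relationship; the linear mapping and the 180-degree field of view per imaging element are assumptions, since the publication does not state the actual mapping.

```python
def pixel_to_direction(face_center_x, image_width,
                       camera_center_deg, horizontal_fov_deg=180.0):
    """Map the horizontal pixel coordinate of a recognized face to a direction
    (in degrees) relative to the sound collection device, assuming a linear
    pixel-to-angle relationship over the camera's field of view."""
    offset = (face_center_x / image_width - 0.5) * horizontal_fov_deg
    return (camera_center_deg + offset) % 360.0

# Face found slightly left of centre in the image of the camera facing 0 degrees.
print(pixel_to_direction(face_center_x=500, image_width=1280, camera_center_deg=0.0))
```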
  • The position setting unit 111 may also identify the individual participant U by comparing the face of each participant U recognized in the image with individual faces stored in advance in the voice analysis device 100. As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
  • FIG. 6C shows a process of setting the position of the participant U based on the information of the card C presented by the participant U.
  • The position setting unit 111 of the voice analysis device 100 causes the reading unit 12 provided in the sound collection device 10 to read the information on the card C presented by each participant U.
  • The position setting unit 111 acquires the information on the card C read by the reading unit 12 and the direction of the participant U who presented the card C.
  • The position setting unit 111 specifies the position of each participant U with respect to the sound collection device 10 based on the acquired information on the card C and the direction of the participant U, and causes the position storage unit 131 to store the position.
  • The position setting unit 111 may also identify the individual participant U by using the acquired information on the card C to retrieve personal information stored in advance in the voice analysis device 100. As a result, personal information of the participant U can be displayed together with the analysis result, and a plurality of analysis results of the same participant U can be displayed.
  • The position setting unit 111 may execute the automatic setting processing and the manual setting processing in combination.
  • For example, the position setting unit 111 displays the positions of the participants U set by the automatic setting processing of FIGS. 6A to 6C in the position setting area A1 of FIG. 5, and accepts manual setting by the analyst.
  • Thereby, the positions of the participants U set by the automatic setting processing can be corrected by the manual setting processing, and the position of each participant U can be set more reliably.
  • Since the voice analysis device 100 can automatically set the position of each participant U based on the information on the participants U acquired on the sound collection device 10, it is possible to reduce the trouble of the analyst setting the positions of the participants U of all the groups on the communication terminal 20.
  • The information on the participants U that can be acquired on the sound collection device 10 (that is, information for specifying the positions of the participants U) is not limited to the voice, image, and card information described above; other information that can identify the orientation of a participant U may also be used.
  • FIG. 7 is a diagram showing a flowchart of position setting processing performed by the voice analysis device 100 according to the present embodiment.
  • First, the position setting unit 111 determines whether the analyst has instructed automatic setting processing on the setting screen A of FIG. 5 (S11). When automatic setting processing has not been instructed (that is, in the case of manual setting) (NO in S11), the position setting unit 111 specifies the position of each participant according to the contents input on the setting screen A displayed on the communication terminal 20, and sets the positions in the position storage unit 131 (S12).
  • When automatic setting processing has been instructed (YES in S11), the position setting unit 111 acquires information on the participants (that is, information for specifying the positions of the participants) on the sound collection device 10 (S13).
  • The position setting unit 111 uses, as the information on the participants, at least one of the voice of a participant, an image of a participant's face, and the information on a card presented by a participant.
  • The position setting unit 111 specifies the position of each participant U with respect to the sound collection device 10 based on the acquired information on the participants (S14). The position setting unit 111 then sets the positions of the participants by storing the specified positions in the position storage unit 131 (S15).
  • FIG. 8 is a schematic view of the follow-up process performed by the voice analysis device 100 according to the present embodiment.
  • The tracking processing is processing for updating the positions of the participants U stored in the position storage unit 131 partway through the voice being analyzed by the sound source localization unit 113 and the analysis unit 115.
  • The upper part of FIG. 8 shows the position of each participant U before the update, and the lower part of FIG. 8 shows the position of each participant U after the update.
  • The upper part of FIG. 8 shows a state in which the participant U1 has moved from the position P1 set in the position storage unit 131 to another position P2. In this state, the voice emitted by the participant U1 reaches the sound collection device 10 from the position P2, which differs from the set position P1 of the participant U1. Therefore, the analysis unit 115 cannot detect the utterances of the participant U1 from the voice.
  • In this case, the tracking unit 114 updates the position of the participant U1 in the position storage unit 131 from the position P1 to the position P2, as illustrated in the lower part of FIG. 8.
  • Thereby, the analysis unit 115 can correctly detect the utterances of the participant U1.
  • The tracking unit 114 acquires the direction of the sound source estimated by the sound source localization unit 113 at predetermined time intervals (for example, every minute). If the estimated direction of the sound source does not correspond to any of the positions of the participants U stored in the position storage unit 131, the tracking unit 114 determines that one of the participants U has moved in the direction of the sound source. The tracking unit 114 then specifies which participant U has moved, and updates the position stored in the position storage unit 131 for that participant U to the position corresponding to the direction of the sound source.
  • For example, the tracking unit 114 specifies that the participant U set at the position closest to the direction of the sound source estimated from the voice acquired by the sound collection device 10 has moved to the position corresponding to the direction of the sound source. In this case, the tracking unit 114 may select the moved participant U from among the participants U set at positions within a predetermined range of the direction of the sound source (for example, within -30 degrees to +30 degrees). By limiting the range in this way, the tracking unit 114 can prevent, for example, the position of a participant U from being updated to the wrong position.
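  • Purely as an illustration of this selection rule (not code from the publication), the sketch below picks the nearest stored position within the example range of plus or minus 30 degrees and updates it.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def select_moved_participant(source_direction_deg, positions_deg, max_diff_deg=30.0):
    """Pick the participant whose stored position is closest to the new sound source
    direction, but only within +/- max_diff_deg of it; return None if no stored
    position is close enough."""
    candidates = {p: angular_difference(source_direction_deg, d)
                  for p, d in positions_deg.items()
                  if angular_difference(source_direction_deg, d) <= max_diff_deg}
    return min(candidates, key=candidates.get) if candidates else None

positions = {"U1": 0.0, "U2": 90.0, "U3": 180.0, "U4": 270.0}
moved = select_moved_participant(65.0, positions)      # closest within range -> "U2"
if moved is not None:
    positions[moved] = 65.0                            # update the stored position
print(positions)
```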
  • Alternatively, the tracking unit 114 may compare the voiceprint of the sound source with the voiceprint of each participant U, and specify that the participant U whose voiceprint is similar to that of the sound source has moved to the position corresponding to the direction of the sound source. In this case, the tracking unit 114 may acquire the voiceprint of each participant U from the voice of each participant U at the start of the analysis, or may acquire the voiceprint of each participant U stored in the storage unit 130 in advance. The tracking unit 114 calculates the degree of similarity between the voiceprint of the sound source and the voiceprint of each participant U.
  • The tracking unit 114 then selects the participant U whose voiceprint similarity is the highest in the group, or selects a participant U whose voiceprint similarity is equal to or higher than a predetermined threshold.
  • Specifying the moved participant U using the voiceprint can improve the tracking accuracy.
  • Alternatively, the tracking unit 114 may acquire the face located in the direction of the sound source from the image captured by the imaging unit 11 of the sound collection device 10, and specify that the participant U whose face is similar to the acquired face has moved to the position corresponding to the direction of the sound source.
  • In this case, the tracking unit 114 may acquire the face of each participant U from the image captured by the imaging unit 11 at the start of the analysis, or may acquire the face of each participant U stored in the storage unit 130 in advance.
  • The tracking unit 114 calculates the degree of similarity between the face located in the direction of the sound source and the face of each participant U.
  • The tracking unit 114 then selects the participant U whose face similarity is the highest in the group, or selects a participant U whose face similarity is equal to or higher than a predetermined threshold. Specifying the moved participant U using the face can improve the tracking accuracy.
  • The tracking unit 114 may also weight the voiceprint similarity or face similarity of each participant U based on the difference between the direction of the sound source and the position (direction) stored for that participant U in the position storage unit 131.
  • It can be said that the closer the position set for a participant U is to the direction of the sound source, the higher the probability that the participant U has moved to the position of the sound source, and the farther the position set for a participant U is from the direction of the sound source, the lower the probability that the participant U has moved to the position of the sound source.
  • Therefore, the tracking unit 114 weights the voiceprint or face similarity higher as the difference between the position set for the participant U and the direction of the sound source becomes smaller, and weights it lower as the difference becomes larger. This can further improve the tracking accuracy.
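  • As an illustration only (the publication states the qualitative rule but no formula), the sketch below applies a linear angular falloff as the weighting; the falloff width is an assumed parameter.

```python
def angular_difference(a_deg, b_deg):
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def weighted_similarity(similarity, participant_direction_deg, source_direction_deg,
                        falloff_deg=90.0):
    """Weight a voiceprint or face similarity so that it counts for more when the
    participant's stored position is close to the new sound source direction and
    for less when it is far away (linear falloff is an assumption)."""
    diff = angular_difference(participant_direction_deg, source_direction_deg)
    weight = max(0.0, 1.0 - diff / falloff_deg)
    return similarity * weight

# A slightly worse voiceprint match that sits close to the sound source can
# outrank a better match whose stored position is on the far side of the table.
print(weighted_similarity(0.80, 100.0, 120.0))   # 0.80 * (1 - 20/90) ~= 0.62
print(weighted_similarity(0.90, 300.0, 120.0))   # 180 deg away -> weight 0 -> 0.0
```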
  • FIG. 9 is a diagram showing a flowchart of the follow-up process performed by the voice analysis device 100 according to the present embodiment.
  • The tracking unit 114 acquires the direction of the sound source estimated by the sound source localization unit 113. If the direction of the sound source corresponds to one of the positions of the participants U stored in the position storage unit 131 (YES in S41), the tracking unit 114 ends the processing without updating any position.
  • If the direction of the sound source does not correspond to any of the positions of the participants U stored in the position storage unit 131 (NO in S41), the tracking unit 114 acquires information on the participants (that is, information for specifying the positions of the participants) on the sound collection device 10 (S42).
  • The tracking unit 114 uses, as the information on the participants, at least one of the voice of a participant and an image of a participant's face.
  • The tracking unit 114 specifies which participant U has moved based on the acquired information on the participants (S43). The tracking unit 114 then updates the position stored in the position storage unit 131 for the participant U specified as having moved to the position corresponding to the direction of the sound source (S44).
  • As described above, the voice analysis device 100 acquires, on the sound collection device 10 arranged in each group, information on each participant such as the voice emitted by the participant, an image of the participant's face, or the information on a card presented by the participant, and automatically sets the position of each participant based on the acquired information. Therefore, when analyzing the voice of a discussion, it is possible to reduce the trouble of setting the position of each participant for each group.
  • In addition, the voice analysis device 100 updates the position of each participant based on the information on each participant while the voice is being analyzed. Therefore, even if a participant moves while the voice is being acquired, the participant can be followed and the voice analyzed.
  • The voice analysis device 100 is used here for analyzing the voice of a discussion, but it can also be applied to other uses.
  • For example, the voice analysis device 100 can also analyze the voice emitted by passengers seated in a car.
  • The processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 are the agents of each step (process) included in the voice analysis method shown in FIGS. 4, 7, and 9. That is, the processors of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 read a program for executing the voice analysis method shown in FIGS. 4, 7, and 9 from the storage unit and execute the program, thereby controlling the respective units of the voice analysis device 100, the sound collection device 10, and the communication terminal 20 to carry out the voice analysis method shown in FIGS. 4, 7, and 9.
  • The steps included in the voice analysis method shown in FIGS. 4, 7, and 9 may be partially omitted, the order of the steps may be changed, and a plurality of steps may be performed in parallel.
  • Reference signs: S voice analysis system; 100 voice analysis device; 110 control unit; 111 position setting unit; 112 voice acquisition unit; 114 tracking unit; 115 analysis unit; 10 sound collection device; 20 communication terminal

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)
  • Image Analysis (AREA)

Abstract

The objective of the present invention is to provide a voice analysis device, a voice analysis method, a voice analysis program, and a voice analysis system which, when the voices in a discussion are analyzed, make it possible to reduce the time and effort required to set the positions of the participants. According to one embodiment of the present invention, a voice analysis device (100) comprises: a position setting unit (111) that acquires information on a plurality of participants from a sound collection device and sets the positions of the plurality of participants based on the acquired information on the participants; a voice acquisition unit (112) that acquires voice from the sound collection device; and an analysis unit (115) that analyzes the voice emitted by each of the participants based on the positions set by the position setting unit (111).
PCT/JP2018/000943 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system WO2019142232A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2018502280A JP6589041B1 (ja) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system
PCT/JP2018/000943 WO2019142232A1 (fr) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/000943 WO2019142232A1 (fr) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Publications (1)

Publication Number Publication Date
WO2019142232A1 true WO2019142232A1 (fr) 2019-07-25

Family

ID=67301394

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/000943 WO2019142232A1 (fr) 2018-01-16 2018-01-16 Voice analysis device, voice analysis method, voice analysis program, and voice analysis system

Country Status (2)

Country Link
JP (1) JP6589041B1 (fr)
WO (1) WO2019142232A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05161190A (ja) * 1991-12-09 1993-06-25 Toda Constr Co Ltd Voice-responsive microphone
JP2000356674A (ja) * 1999-06-11 2000-12-26 Japan Science & Technology Corp Sound source identification device and identification method therefor
JP2005274707A (ja) * 2004-03-23 2005-10-06 Sony Corp Information processing device and method, program, and recording medium
JP2006189626A (ja) * 2005-01-06 2006-07-20 Fuji Photo Film Co Ltd Recording device and voice recording program
JP2017129873A (ja) * 2017-03-06 2017-07-27 本田技研工業株式会社 Conversation support device, control method for conversation support device, and program for conversation support device
JP2017173768A (ja) * 2016-03-25 2017-09-28 グローリー株式会社 Minutes creation system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20160026317A (ko) * 2014-08-29 2016-03-09 삼성전자주식회사 Voice recording method and apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05161190A (ja) * 1991-12-09 1993-06-25 Toda Constr Co Ltd Voice-responsive microphone
JP2000356674A (ja) * 1999-06-11 2000-12-26 Japan Science & Technology Corp Sound source identification device and identification method therefor
JP2005274707A (ja) * 2004-03-23 2005-10-06 Sony Corp Information processing device and method, program, and recording medium
JP2006189626A (ja) * 2005-01-06 2006-07-20 Fuji Photo Film Co Ltd Recording device and voice recording program
JP2017173768A (ja) * 2016-03-25 2017-09-28 グローリー株式会社 Minutes creation system
JP2017129873A (ja) * 2017-03-06 2017-07-27 本田技研工業株式会社 Conversation support device, control method for conversation support device, and program for conversation support device

Also Published As

Publication number Publication date
JPWO2019142232A1 (ja) 2020-01-23
JP6589041B1 (ja) 2019-10-09

Similar Documents

Publication Publication Date Title
EP3791390B1 (fr) Enrôlement pour la reconnaissance du locuteur
CN110443110B (zh) 基于多路摄像的人脸识别方法、装置、终端及存储介质
JP2019082990A (ja) 身元認証方法、端末装置、およびコンピュータ可読記憶媒体{identity authentication method, terminal equipment and computer readable storage medium}
CN109254669B (zh) 一种表情图片输入方法、装置、电子设备及系统
CN105354543A (zh) 视频处理方法及装置
CN111883168B (zh) 一种语音处理方法及装置
CN112148922A (zh) 会议记录方法、装置、数据处理设备及可读存储介质
US10922570B1 (en) Entering of human face information into database
CN110941992B (zh) 微笑表情检测方法、装置、计算机设备及存储介质
KR102263154B1 (ko) 스마트 미러 기반 얼굴 감성 표현 시스템 및 그 운용방법
JP2020148931A (ja) 議論分析装置及び議論分析方法
CN110505504A (zh) 视频节目处理方法、装置、计算机设备及存储介质
CN114556469A (zh) 数据处理方法、装置、电子设备和存储介质
CN113143193A (zh) 智能视力测试方法、装置及系统
JP6646134B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
WO2019142232A1 (fr) Dispositif d'analyse de voix, procédé d'analyse de voix, programme d'analyse de voix et système d'analyse de voix
JP6975755B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP6589042B1 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
CN113409822B (zh) 对象状态的确定方法、装置、存储介质及电子装置
JP6589040B1 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP2015177490A (ja) 映像音声処理システム、情報処理装置、映像音声処理方法、及び映像音声処理プログラム
JP7427274B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
CN111310602A (zh) 一种基于情绪识别的展品关注度分析系统及分析方法
JP7414319B2 (ja) 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム
JP7261462B2 (ja) 音声分析装置、音声分析システム及び音声分析方法

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2018502280

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18900852

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 02.10.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18900852

Country of ref document: EP

Kind code of ref document: A1