WO2021245759A1 - Voice conference device, voice conference system, and voice conference method - Google Patents
Voice conference device, voice conference system, and voice conference method
- Publication number
- WO2021245759A1 (PCT/JP2020/021646)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- voice
- conference
- conferences
- unit
- voice conference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/14—Digital output to display device ; Cooperation and interconnection of the display device with other functional units
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/72—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for transmitting results of analysis
Definitions
- the present invention relates to a voice conference device, a voice conference system, and a voice conference method for executing a voice conference via a network.
- Patent Document 1 discloses a system that identifies a speaker who is speaking in a voice conference and visually clearly indicates the specified speaker.
- the present invention has been made in view of these points, and an object of the present invention is to enable an administrator to easily know the status of a plurality of audio conferences.
- the voice conference device of the first aspect of the present invention is a voice conference device that provides a plurality of voice conferences via a network. It has a voice conference unit that transmits and receives the voices uttered in the voice conference to and from a plurality of user terminals used by a plurality of users participating in each of the plurality of voice conferences, a voice analysis unit that analyzes the voices uttered in each of the plurality of voice conferences, and a display control unit that displays, in association with each of the plurality of voice conferences, the analysis result of the voices uttered in the voice conference by the voice analysis unit on an administrator terminal used by an administrator who manages the plurality of voice conferences.
- the display control unit may display the analysis result on the administrator terminal in association with each of the plurality of voice conferences while the plurality of voice conferences are continuing.
- the voice conference unit may accept intervention by the administrator terminal to any one of the plurality of voice conferences using at least one of voice, characters, or images.
- the voice conference device may further have a proposal unit that proposes an intervention in any of the plurality of voice conferences to the administrator terminal based on the analysis result.
- the display control unit may make the display mode of the voice conference for which the proposal unit proposes intervention different from the display mode of the other voice conferences among the plurality of voice conferences.
- the voice conference unit may automatically intervene in any of the plurality of voice conferences using at least one of voice, characters, or images based on the analysis result.
- the display control unit may display the analysis result on each of the plurality of user terminals used by the plurality of users participating in the voice conference while the voice conference continues.
- while the voice conference is continuing, the display control unit may display, between the plurality of captured images obtained by imaging the plurality of users, a symbol indicating the degree of exchange of remarks between the plurality of users in the voice conference.
- the voice conference unit may receive input of a predetermined action from each of the plurality of user terminals used by the plurality of users participating in the voice conference, and the display control unit may display information indicating the actions input in each of the plurality of voice conferences on the administrator terminal.
- the voice analysis unit may analyze the voice emitted in each of the plurality of voice conferences by comparing it with a voice pattern acquired in the past or a voice pattern of a model.
- the voice analysis unit may analyze the voice emitted in each of the plurality of voice conferences by comparing it with the voice pattern different for each purpose of the voice conference.
- the voice analysis unit may analyze the voice emitted in each of the plurality of voice conferences by comparing it with the voice pattern different for each period of the voice conference.
- the voice analysis unit may divide the plurality of users into a plurality of groups based on the analysis result, and the voice conference unit may start a voice conference for the plurality of users included in each of the plurality of groups.
- the voice analysis unit may divide the plurality of users into a plurality of groups based on the attributes of each of the plurality of users.
- the voice analysis unit may divide the plurality of users into a plurality of groups so that the amounts of speech or the speech tendencies of the users belonging to one group are close to each other.
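- one simple way to form groups whose members have similar amounts of speech is to sort the users by amount of speech and cut the sorted list into consecutive chunks. The sketch below illustrates that idea only; the function name `group_by_speech_amount`, the fixed `group_size`, and the sort-and-chunk strategy are assumptions made for illustration and are not taken from the publication.

```python
from typing import Dict, List


def group_by_speech_amount(amounts: Dict[str, float], group_size: int) -> List[List[str]]:
    """Split users into groups of at most `group_size` with similar speech amounts.

    `amounts` maps a user identifier to that user's amount of speech.
    Sorting first keeps users with close values in the same group.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    # Users ordered from least to most talkative.
    ordered = sorted(amounts, key=lambda user: amounts[user])
    # Consecutive slices of the ordered list form the groups.
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


example = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.15, "e": 0.85, "f": 0.45}
groups = group_by_speech_amount(example, group_size=2)
```

- with the example values, the quietest users share one group and the most talkative users share another; grouping by attributes or by speech tendency would need a different key.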
- the audio conferencing system of the second aspect of the present invention includes an audio conferencing device that provides a plurality of audio conferencing over a network, and an administrator terminal used by an administrator who manages the plurality of audio conferencing.
- the voice conference device has a voice conference unit that transmits and receives the voices uttered in the voice conference to and from a plurality of user terminals used by a plurality of users participating in each of the plurality of voice conferences, a voice analysis unit that analyzes the voices uttered in each of the plurality of voice conferences, and a display control unit that displays the analysis result of the voices uttered in the voice conference by the voice analysis unit on the administrator terminal in association with each of the plurality of voice conferences. The administrator terminal has a display unit for displaying the analysis result.
- the voice conference method is executed by a computer and has a step of transmitting and receiving the voices uttered in the voice conference to and from a plurality of user terminals used by a plurality of users participating in each of a plurality of voice conferences held via a network, a step of analyzing the voices uttered in each of the plurality of voice conferences, and a step of displaying, in association with each of the plurality of voice conferences, the analysis result of the voices uttered in the voice conference on an administrator terminal used by an administrator who manages the plurality of voice conferences.
- FIG. 1 is a schematic diagram of the audio conferencing system S according to the present embodiment.
- the voice conference system S includes a voice conference device 1, a plurality of user terminals 2, and an administrator terminal 3.
- the number of user terminals 2 and administrator terminals 3 included in the voice conference system S is not limited.
- the voice conference system S may include other devices such as servers and terminals.
- the voice conference device 1 is a computer that provides a plurality of voice conferences via a network.
- the voice conference is to exchange voices between a plurality of user terminals 2.
- the voice conference may be to exchange at least one of an image (the image may be a still image or a moving image) or a character in addition to the sound between the plurality of user terminals 2.
- the voice conference device 1 analyzes the voice emitted in the voice conference and displays the analysis result on the user terminal 2 and the administrator terminal 3.
- the voice conference device 1 is composed of, for example, a single computer or a cloud that is a collection of computer resources.
- the voice conference device 1 is connected to the user terminal 2 and the administrator terminal 3 by wire or wirelessly via a network such as a local area network or the Internet.
- the user terminal 2 is an information terminal used by users (students, employees, etc.) who participate in the voice conference provided by the voice conference system S.
- the user terminal 2 is, for example, a personal computer, a smartphone, a tablet terminal, or the like. Further, the user terminal 2 may be a wearable terminal worn by the user to input / output voice and images.
- the user terminal 2 receives the input of the user's voice and transmits it to the voice conference device 1, and also receives and outputs the voice input by the other user terminal 2 from the voice conference device 1.
- an example in which one user uses one user terminal 2 will be described, but a plurality of users may use one user terminal 2.
- the administrator terminal 3 is an information terminal used by an administrator (teacher, boss, etc. who is in a position to manage the user) who manages one or more voice conferences provided by the voice conference system S.
- the administrator terminal 3 is, for example, a personal computer, a smartphone, a tablet terminal, or the like. Further, the administrator terminal 3 may be a wearable terminal worn by the administrator to input / output voice and images.
- the administrator terminal 3 displays the analysis result of the voice by the voice conference device 1 and accepts the intervention of the manager into the voice conference.
- the voice conference device 1 analyzes the voice uttered in each of the plurality of voice conferences and displays the analysis result on the administrator terminal 3 in association with each of the plurality of voice conferences.
- the voice conference system S can give the administrator a bird's-eye view of the status of the plurality of voice conferences, and the administrator can easily know the status of the plurality of voice conferences.
- FIG. 2 is a block diagram of the audio conferencing system S according to the present embodiment.
- the arrows indicate the main data flows, and there may be data flows not shown in FIG. 2.
- each block shows not a hardware (device) unit configuration but a functional unit configuration. Therefore, the block shown in FIG. 2 may be mounted in a single device, or may be mounted separately in a plurality of devices. Data can be exchanged between blocks via any means such as a data bus, a network, and a portable storage medium.
- the voice conference device 1 has a storage unit 11 and a control unit 12.
- the control unit 12 includes a voice conference unit 121, a voice analysis unit 122, a proposal unit 123, and a display control unit 124.
- the storage unit 11 is a storage medium including a ROM (Read Only Memory), a RAM (Random Access Memory), a hard disk drive, and the like.
- the storage unit 11 stores in advance the program executed by the control unit 12.
- the storage unit 11 may be provided outside the voice conferencing device 1, and in that case, data may be exchanged with the control unit 12 via a network.
- the control unit 12 is, for example, a processor such as a CPU (Central Processing Unit), and functions as the voice conference unit 121, the voice analysis unit 122, the proposal unit 123, and the display control unit 124 by executing a program stored in the storage unit 11. At least some of the functions of the control unit 12 may be performed by an electric circuit. Further, at least some of the functions of the control unit 12 may be performed by a program executed via the network.
- the voice conference unit 121 executes a plurality of voice conferences by transmitting and receiving voices between the plurality of user terminals 2.
- the voice conference unit 121 transmits and receives the voice uttered in the voice conference (that is, the voice input to the voice input/output unit 22) to and from the plurality of user terminals 2 used by the plurality of users participating in the voice conference.
- the voice conference unit 121 also transmits and receives captured images of the users participating in the voice conference (that is, the images captured by the imaging unit 23) to and from the plurality of user terminals 2.
- the voice conference unit 121 further transmits and receives the input contents of the users participating in the voice conference (input characters, actions, and the like) to and from the plurality of user terminals 2.
- the voice conference unit 121 can share the voice, the captured image and the input contents among the plurality of user terminals 2 and execute the voice conference.
- the voice conference unit 121 is not limited to the specific method shown here, and a known method can be used to execute the voice conference.
- for the voice conference for which the proposal unit 123 proposes intervention, the voice conference unit 121 may accept intervention using characters or voice from the administrator terminal 3, or may intervene automatically using characters or voice.
- intervention in the voice conference means, for example, outputting at least one of the characters, voices, or images input by the administrator on the administrator terminal 3 to the user terminal 2 of each user participating in the voice conference. In the case of automatic intervention, at least one of the characters, voices, or images generated by the voice conference unit 121 is output to the user terminal 2 of each user participating in the voice conference.
- the voice conference unit 121 causes the administrator terminal 3 to participate in a voice conference when, for example, the intervention button 314 corresponding to that voice conference is selected on the operation unit 34 of the administrator terminal 3.
- the voice conference unit 121 may transmit and receive at least one of voice and characters only to a part of the user terminals 2 selected by the administrator terminal 3 among the plurality of user terminals 2.
- in addition to or instead of at least one of voice or characters, the voice conference unit 121 may transmit and receive an image (a still image or a moving image) designated by the user terminal 2 or the administrator terminal 3.
- the voice analysis unit 122 analyzes the voice emitted in each of the plurality of voice conferences.
- the voice analysis unit 122 calculates, for example, the amount of speech of each of the plurality of users in association with each of the plurality of voice conferences, and also calculates the degree of interaction (transitions of the speaker) between the plurality of users.
- based on the voice acquired by the voice conference unit 121, the voice analysis unit 122 determines which user is speaking at each time (for example, every 10 to 100 milliseconds) in the voice conference. When the voice of one user is input to one user terminal 2, the voice analysis unit 122 determines which user spoke from the user terminal 2 from which the voice was acquired. When the voices of a plurality of users are input to one user terminal 2, the voice analysis unit 122 executes a known speaker separation process on the voice acquired from that user terminal 2 to determine which user spoke.
- the voice analysis unit 122 identifies a continuous period from the start to the end of one user's speech as a speech period and stores it in the storage unit 11. The voice analysis unit 122 also calculates the amount of speech of each user at each time and stores it in the storage unit 11. Specifically, for a given time window (for example, 5 seconds), the voice analysis unit 122 calculates, as the amount of speech at that time, the length of time during which the user spoke within the window divided by the length of the window.
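- the per-window amount of speech described above can be sketched as follows. This is a minimal illustration, assuming that a user's speech periods are available as non-overlapping `(start, end)` pairs in seconds; the name `speech_amount` and the data layout are hypothetical and not part of the publication.

```python
from typing import List, Tuple

# One speech period of a single user: (start_sec, end_sec).
Interval = Tuple[float, float]


def speech_amount(periods: List[Interval], window_start: float, window_len: float = 5.0) -> float:
    """Return the fraction of the time window during which the user was speaking.

    The window covers [window_start, window_start + window_len).
    The speech periods are assumed not to overlap each other.
    """
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    window_end = window_start + window_len
    spoken = 0.0
    for start, end in periods:
        # Length of the part of this speech period that lies inside the window.
        overlap = min(end, window_end) - max(start, window_start)
        if overlap > 0:
            spoken += overlap
    return spoken / window_len


# The user spoke from 1 s to 3 s and from 4 s to 7 s.
amount = speech_amount([(1.0, 3.0), (4.0, 7.0)], window_start=0.0)
```

- in this example the user speaks for 3 of the 5 seconds in the window, so the amount of speech is 0.6.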
- the voice analysis unit 122 repeats this calculation for each user while shifting the time window by a predetermined time (for example, 1 second) from the start time of the voice conference to the current time (or to the end time, when the analysis is performed after the voice conference ends). The voice analysis unit 122 then detects a transition of the speaker when one speech period is followed by a speech period of another user. For each transition detected in the voice conference being analyzed, the voice analysis unit 122 stores the time of occurrence, the transition-source user, and the transition-destination user in the storage unit 11 in association with each other.
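- speaker-transition detection of this kind might look like the sketch below. It assumes that speech periods have already been identified per user, records the transition time as the start of the following speech period, and ignores consecutive periods by the same user; those choices, and the names `SpeechPeriod`, `Transition`, `detect_transitions`, and `count_by_pair`, are illustrative assumptions.

```python
from collections import Counter
from typing import List, NamedTuple


class SpeechPeriod(NamedTuple):
    user: str
    start: float  # seconds from the start of the voice conference
    end: float


class Transition(NamedTuple):
    time: float        # when the next speaker started
    source: str        # user who spoke before the transition
    destination: str   # user who spoke after the transition


def detect_transitions(periods: List[SpeechPeriod]) -> List[Transition]:
    """Detect a transition whenever one speech period is followed by another user's."""
    ordered = sorted(periods, key=lambda p: p.start)
    transitions = []
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.user != cur.user:
            transitions.append(Transition(cur.start, prev.user, cur.user))
    return transitions


def count_by_pair(transitions: List[Transition]) -> Counter:
    """Aggregate the number of transitions for each (source, destination) pair."""
    return Counter((t.source, t.destination) for t in transitions)


periods = [
    SpeechPeriod("A", 0.0, 2.0),
    SpeechPeriod("B", 2.5, 4.0),
    SpeechPeriod("A", 4.2, 5.0),
    SpeechPeriod("A", 5.5, 6.0),
    SpeechPeriod("C", 6.1, 7.0),
]
transitions = detect_transitions(periods)
pair_counts = count_by_pair(transitions)
```

- the per-pair counts are the kind of aggregate that a display step could later turn into a measure of interaction between two users.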
- the voice analysis unit 122 may analyze the voice uttered in each of the plurality of voice conferences by comparing it with a voice pattern acquired in the past or with a model voice pattern.
- the storage unit 11 stores in advance a voice pattern acquired in the past or a voice pattern of a model.
- the model voice pattern is time-series data of the amount of speech of a model person, created by, for example, acquiring in advance the voice pattern of a voice conference conducted by the model person.
- the voice pattern acquired in the past or the voice pattern of the model is defined for each purpose of the voice conference (use of voice conference, type of voice conference user) such as education, sales, interview, etc.
- the voice analysis unit 122 accepts in advance the selection of the purpose of the voice conference from the user terminal 2 or the administrator terminal 3, and compares the amount of speech of each user calculated from the voice of the voice conference with a voice pattern that differs for each purpose.
- voice patterns may be defined for each period of the voice conference, for example.
- the voice analysis unit 122 receives in advance the designation of the scheduled duration of the voice conference (for example, 1 hour) from the user terminal 2 or the administrator terminal 3, and compares the amount of speech of each user calculated from the voice of the voice conference with a voice pattern that differs for each period within the scheduled duration (first half, middle, second half, and so on).
- for example, when the voice conference is for sales purposes (typically a sales meeting with a customer), the model voice pattern to be compared is such that the salesperson's amount of speech may be large from the first half to the middle of the scheduled time, in order to explain the company's products or services, while from the middle to the second half it is desirable that the customer's amount of speech be large, in order to hear the customer's reaction.
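- a period-dependent model pattern for a sales meeting could be represented as in the sketch below. The numeric values, the split of the scheduled time into equal thirds, and the names `SALES_PATTERN`, `period_of`, and `expected_amount` are all hypothetical; the publication only states the qualitative tendency (the salesperson speaks more early, the customer speaks more late).

```python
from typing import Dict

# Hypothetical model pattern: expected amount of speech per role and period.
# The numbers are illustrative only.
SALES_PATTERN: Dict[str, Dict[str, float]] = {
    "first": {"sales": 0.7, "customer": 0.2},
    "middle": {"sales": 0.6, "customer": 0.3},
    "second": {"sales": 0.3, "customer": 0.6},
}


def period_of(elapsed_sec: float, scheduled_sec: float) -> str:
    """Map elapsed time to a period, assuming three equal periods."""
    if scheduled_sec <= 0:
        raise ValueError("scheduled_sec must be positive")
    ratio = elapsed_sec / scheduled_sec
    if ratio < 1 / 3:
        return "first"
    if ratio < 2 / 3:
        return "middle"
    return "second"


def expected_amount(role: str, elapsed_sec: float, scheduled_sec: float = 3600.0) -> float:
    """Look up the model amount of speech for a role at the given elapsed time."""
    return SALES_PATTERN[period_of(elapsed_sec, scheduled_sec)][role]


early_sales = expected_amount("sales", 600.0)
late_customer = expected_amount("customer", 3000.0)
```

- a pattern defined per purpose could be handled the same way by keeping one such table for each purpose (education, sales, interview, and so on).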
- the voice analysis unit 122 calculates, as the comparison result, the degree of difference between the user's amount of speech and the time-series data of the amount of speech indicated by the voice pattern (for example, the cumulative value of the difference between the amount of speech in the voice pattern and the amount of speech by the user), and stores it in the storage unit 11.
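- the cumulative difference can be sketched as a sum over aligned time steps. The sketch assumes that the pattern and the observed series are sampled at the same times and that the absolute difference is accumulated; the text does not say whether the difference is signed or absolute, so the use of `abs` and the name `pattern_difference` are assumptions.

```python
from typing import Sequence


def pattern_difference(pattern: Sequence[float], observed: Sequence[float]) -> float:
    """Accumulate the absolute difference between a model pattern and a user's series.

    Both sequences hold the amount of speech at the same sequence of times.
    A larger result means the user deviates more from the pattern.
    """
    if len(pattern) != len(observed):
        raise ValueError("pattern and observed must have the same length")
    return sum(abs(p - o) for p, o in zip(pattern, observed))


difference = pattern_difference([0.8, 0.6, 0.2], [0.5, 0.6, 0.4])
```

- here the differences per step are 0.3, 0.0, and 0.2, giving a cumulative value of about 0.5.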
- the proposal unit 123 proposes to the administrator terminal 3 intervention in any of a plurality of voice conferences based on the voice analysis result by the voice analysis unit 122.
- Intervention in a voice conference means that the administrator participates in the voice conference using at least one of characters or voice in the manager terminal 3.
- the proposal unit 123 identifies a voice conference as an intervention target and notifies the administrator terminal 3 of the identified voice conference via the display control unit 124.
- for example, when a voice conference includes a user whose amount of speech is equal to or less than a predetermined value, or when the total amount of speech of the plurality of users participating in a voice conference is equal to or less than a predetermined value, the proposal unit 123 proposes that voice conference as an intervention target.
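- the two example conditions can be expressed as a small rule, sketched below. The function name `conferences_to_propose`, the dictionary layout, and the threshold values in the example are assumptions made for illustration.

```python
from typing import Dict, List


def conferences_to_propose(
    speech_by_conference: Dict[str, Dict[str, float]],
    per_user_min: float,
    total_min: float,
) -> List[str]:
    """Return the conferences that meet either example condition for intervention.

    `speech_by_conference` maps a conference identifier to a mapping from
    user identifier to that user's amount of speech.
    """
    targets = []
    for conference_id, amounts in speech_by_conference.items():
        # Condition 1: some user's amount of speech is at or below the threshold.
        has_quiet_user = any(a <= per_user_min for a in amounts.values())
        # Condition 2: the total amount of speech is at or below the threshold.
        total_is_low = sum(amounts.values()) <= total_min
        if has_quiet_user or total_is_low:
            targets.append(conference_id)
    return targets


rooms = {
    "room1": {"a": 0.4, "b": 0.3},
    "room2": {"a": 0.5, "b": 0.02},
    "room3": {"a": 0.1, "b": 0.1},
}
proposed = conferences_to_propose(rooms, per_user_min=0.05, total_min=0.25)
```

- in this example `room2` is proposed because user `b` hardly speaks, and `room3` because the room as a whole is quiet.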
- the display control unit 124 causes the user terminal 2 and the administrator terminal 3 to display the status of the voice conference and the analysis result based on the analysis by the voice analysis unit 122.
- the detailed display contents by the display control unit 124 will be described later with reference to FIGS. 3 to 6.
- the user terminal 2 has a display unit 21, an audio input / output unit 22, an image pickup unit 23, and an operation unit 24.
- the administrator terminal 3 has a display unit 31, an audio input / output unit 32, an image pickup unit 33, and an operation unit 34.
- the display units 21 and 31 include a liquid crystal display or the like capable of displaying information.
- the voice input / output units 22 and 32 include a microphone and the like for inputting voice, a speaker and the like for outputting voice, and the like.
- the voice input/output unit 22 may include a plurality of microphones, a microphone array, or the like, depending on the speaker separation process performed by the voice analysis unit 122.
- the image pickup units 23 and 33 include a camera or the like that outputs a captured image obtained by imaging a user or an administrator.
- the operation units 24 and 34 include buttons, switches, touch panels and the like that can accept human operations.
- the voice conference device 1, the user terminal 2, and the administrator terminal 3 according to the present embodiment are not limited to the specific configuration shown in FIG. 2.
- the voice conference device 1, the user terminal 2, and the administrator terminal 3 are not limited to one device each, and may be configured by connecting two or more physically separated devices by wire or wirelessly.
- the following describes a case where rooms (virtual rooms) are provided in a voice conference held for a predetermined purpose (an upper-layer voice conference) and a lower-layer voice conference is held in each of the rooms.
- FIG. 3 is a schematic diagram of a user terminal 2 displaying an exemplary audio conference list screen.
- the user terminal 2 displays a voice conference list screen on the display unit 21 in response to control by the voice conference device 1.
- the audio conference list screen represents the audio conference information 211 and the room entry button 212 in association with each of the plurality of audio conferences (rooms).
- the voice conference information 211 represents the number of users participating in the voice conference. Further, the audio conference information 211 may represent the name of a user participating in the audio conference, the purpose of the audio conference, and the like.
- the voice conference unit 121 causes the user of the user terminal 2 to join the voice conference (room) corresponding to the selected room entry button 212 and starts the voice conference. Further, the voice conference unit 121 may have the user of the user terminal 2 automatically join a voice conference (room) assigned in advance.
- the display control unit 124 causes the user terminal 2 to display a voice conference screen including information on the voice conference in which the user is participating.
- FIG. 4 is a schematic diagram of a user terminal 2 displaying an exemplary audio conference screen. The user terminal 2 displays the voice conference screen on the display unit 21 in response to the control by the voice conference device 1.
- the voice conference screen contains a user image 213, character information 214, an input field 215, an action field 216, and an analysis result 217 regarding a voice conference (room) in which the user of the user terminal 2 participates.
- the user image 213 is a captured image of the user, captured by the imaging unit 23 of each of the plurality of user terminals 2. If the captured image cannot be acquired from the user terminal 2, or if the user does not wish to share the captured image, a predetermined image or characters (the user's name, etc.) may be displayed at the position of the user image 213.
- the character information 214 represents a message input by the operation unit 24 of each of the plurality of user terminals 2. Further, the character information 214 may represent a message generated by the voice conferencing device 1 (for example, an automatic intervention message described later).
- the input field 215 is an area for the user to input a comment (for example, an impression or a comment) during the voice conference.
- the comment input in the input field 215 is stored in the storage unit 11 of the voice conferencing device 1 in association with the input time.
- the action field 216 is an area for the user to input an action during the voice conference.
- the action field 216 includes a plurality of buttons (icons) corresponding to a plurality of actions such as like, applause, and laughter.
- the action input in the action field 216 is stored in the storage unit 11 of the voice conferencing device 1 in association with the input time.
- the analysis result 217 represents the analysis result of the voice in the voice conference by the voice conference device 1 while the voice conference continues.
- while the voice conference continues, the display control unit 124 displays, as the analysis result 217, an arrow symbol indicating, for each combination of two users, the number of transitions that occurred from the start time to the current time in the voice conference (that is, the degree of interaction between the users), based on the analysis result by the voice analysis unit 122.
- the arrow symbol of the analysis result 217 is displayed thicker as the number of transitions is larger, and thinner as the number of transitions is smaller. Thereby, the voice conference system S can visually notify the user of the degree of communication between the users in the voice conference while the voice conference continues.
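- a mapping from the number of transitions to an arrow thickness could be as simple as the linear scaling sketched below. The function name `arrow_width`, the pixel range, and the linear relationship are assumptions; the publication only requires that more transitions give a thicker arrow.

```python
def arrow_width(count: int, max_count: int, min_px: float = 1.0, max_px: float = 8.0) -> float:
    """Scale a transition count linearly to an arrow thickness in pixels.

    `max_count` is the largest count among the user pairs being drawn.
    Counts above `max_count` are clamped to the maximum thickness.
    """
    if max_count <= 0:
        # No transitions anywhere: draw every arrow at the minimum thickness.
        return min_px
    ratio = min(max(count, 0), max_count) / max_count
    return min_px + (max_px - min_px) * ratio


thin = arrow_width(0, 10)
medium = arrow_width(5, 10)
thick = arrow_width(10, 10)
```

- with the default range, a pair with no transitions gets a 1-pixel arrow and the most active pair gets an 8-pixel arrow.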
- the display control unit 124 changes the display mode of the user image 213 according to the amount of speech of the user based on the analysis result by the voice analysis unit 122 while the voice conference continues.
- the display control unit 124 totals the amount of speech from the start time to the current time in the voice conference for each user.
- the display control unit 124 then makes the color, size, shape, border, or the like of the user image 213 of a user with a small total amount of speech differ from the user images 213 of the other users. As a result, the voice conference system S can encourage users who have made few remarks to speak.
- the display control unit 124 may switch whether or not to display the analysis result 217 on each of the plurality of user terminals 2.
- the display control unit 124 switches whether or not to display the analysis result 217 according to, for example, the purpose of the audio conference.
- the display control unit 124 may also acquire information indicating each user's proficiency level or personality stored in advance in the storage unit 11, and display the analysis result 217 only on the user terminal 2 of a user whose proficiency level or personality satisfies a predetermined condition (for example, being an advanced user or having a personality that is good at discussion). Further, the display control unit 124 may display the analysis result 217 only on the user terminal 2 of a user designated by the administrator.
- the display control unit 124 may also acquire the line-of-sight direction of each user by a known line-of-sight detection method, and display the analysis result 217 only on the user terminal 2 of a user whose line-of-sight direction satisfies a predetermined condition (for example, the analysis result 217 is not gazed at for a predetermined time or longer).
- the display control unit 124 causes the administrator terminal 3 to display a voice conference list screen including information on a plurality of voice conferences managed by the administrator.
- the following describes a case where rooms (virtual rooms) are provided in a voice conference held for a predetermined purpose (an upper-layer voice conference) and a lower-layer voice conference is held in each of the rooms.
- when the administrator manages a plurality of upper-layer voice conferences, the voice conference list screen displayed on the administrator terminal 3 may include information on the plurality of upper-layer voice conferences, even if a plurality of lower-layer voice conferences are held within them. In this case, the information on an upper-layer voice conference may cover all the voice conferences held within it, without distinguishing the lower layers. Alternatively, information on a specific lower-layer voice conference within an upper-layer voice conference may be displayed selectively, so that information on a specific lower-layer voice conference in one upper-layer voice conference can be displayed together with information on lower-layer voice conferences in other upper-layer voice conferences.
- the administrator terminal 3 displays the voice conference list screen on the display unit 31 in response to the control by the voice conference device 1.
- the voice conference list screen shows the voice conference information 311 and the analysis result 312 in association with each of the plurality of lower-layer voice conferences (rooms) included in the upper-layer voice conference managed by the administrator.
- the audio conference information 311 represents the name of the user participating in the audio conference.
- the audio conference information 311 may represent the number of users participating in the audio conference, the purpose of the audio conference, and the like.
- the analysis result 312 represents the result of the analysis of the voice conference by the voice analysis unit 122.
- the audio conference list screen includes the proposal information 313, the intervention button 314, the comparison information 315, and the regrouping button 316, which will be described later.
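The relationship between upper-layer audio conferences, their rooms, and the rows of the list screen can be illustrated with a small data model. This is only a sketch under stated assumptions: the class names `Room` and `UpperConference`, the function `list_screen_rows`, and the `selected` mapping are hypothetical and do not appear in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Room:
    """A lower-layer audio conference (virtual room). Hypothetical model."""
    name: str
    participants: List[str]
    # Analysis result per user, e.g. speech amount in seconds (assumed unit).
    analysis: Dict[str, float] = field(default_factory=dict)


@dataclass
class UpperConference:
    """An upper-layer audio conference that contains rooms."""
    title: str
    rooms: List[Room] = field(default_factory=list)


def list_screen_rows(
    conferences: List[UpperConference],
    selected: Optional[Dict[str, Set[str]]] = None,
) -> List[Tuple[str, str, List[str], Dict[str, float]]]:
    """Build one row per room for the administrator's list screen.

    `selected` maps an upper-layer title to the room names to show.
    Upper-layer conferences missing from the mapping show all rooms.
    """
    rows = []
    for conf in conferences:
        wanted = None if selected is None else selected.get(conf.title)
        for room in conf.rooms:
            if wanted is None or room.name in wanted:
                rows.append((conf.title, room.name, room.participants, room.analysis))
    return rows
```

Calling `list_screen_rows(conferences)` lists every room of every upper-layer conference, while passing `selected` narrows individual upper-layer conferences to specific rooms.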
- The display control unit 124 causes the administrator terminal 3 to display the analysis result 312, obtained by the voice analysis unit 122 from the voice emitted in each voice conference, in association with each of the plurality of voice conferences (rooms). It is desirable that the display control unit 124 does so while the plurality of voice conferences are continuing. As a result, the audio conference system S can notify the administrator of the status of the plurality of audio conferences currently being held, and can make it easier for the administrator to determine whether or not to intervene in them.
- For example, based on the analysis result by the voice analysis unit 122, the display control unit 124 represents, as the analysis result 312, an arrow symbol indicating, for each combination of two users participating in one voice conference, the number of speaker transitions that occurred from the start time of the voice conference to the current time (that is, the degree of interaction between the users).
- the arrow symbol of the analysis result 312 is displayed thicker as the number of transitions is larger, and thinner as the number of transitions is smaller.
- the voice conference system S can visually notify the administrator of the degree of communication between users in the voice conference while the voice conference continues.
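A minimal sketch of how transition counts could be turned into arrow thickness follows. It assumes the analysis yields an ordered list of speaker names, counts a transition whenever the speaker changes, and scales the line width linearly. The function names, the pixel range, and the linear scaling are illustrative assumptions, not details from the description.

```python
from collections import Counter


def count_transitions(speaker_sequence):
    """Count speaker changes for each unordered pair of users.

    speaker_sequence is the ordered list of who spoke (assumed input format).
    """
    counts = Counter()
    for prev, cur in zip(speaker_sequence, speaker_sequence[1:]):
        if prev != cur:  # the same speaker continuing is not a transition
            counts[frozenset((prev, cur))] += 1
    return counts


def arrow_width(n, max_n, min_px=1.0, max_px=8.0):
    """Map a transition count to a line width: more transitions, thicker arrow."""
    if max_n == 0:
        return min_px
    return min_px + (max_px - min_px) * n / max_n


sequence = ["A", "B", "A", "C", "C", "B"]
counts = count_transitions(sequence)
busiest = max(counts.values())
widths = {tuple(sorted(pair)): arrow_width(n, busiest) for pair, n in counts.items()}
```

Here the pair A and B exchanges the floor twice and gets the thickest arrow, while the other pairs get thinner ones.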
- For example, based on the analysis result by the voice analysis unit 122, the display control unit 124 represents, as the analysis result 312, a line graph showing the time change of the speech amount of each of the plurality of users participating in one voice conference (room).
- the display control unit 124 displays the amount of speech for each user as a line graph, with the amount of speech on the vertical axis and the time on the horizontal axis.
- The display control unit 124 plots on the vertical axis the values obtained by stacking the users' speech amounts at each time point, that is, the running sum of the users' speech amounts taken in order.
- The voice conference system S can thus visually notify the administrator of the speech amount of the entire voice conference, in addition to the speech amount of each participating user, while the voice conference continues.
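The stacked line graph can be sketched as a running sum over users at each time point. The input format (a mapping from user name to equally long lists of per-interval speech amounts) and the function name `stacked_series` are assumptions made for illustration.

```python
def stacked_series(speech_by_user):
    """Return, for each user, the line to draw on a stacked graph.

    Each user's line is the sum of that user's speech amounts and those of
    all users listed before them, so the last line is the conference total.
    """
    stacked = {}
    running = None
    for user, series in speech_by_user.items():
        if running is None:
            running = list(series)
        else:
            running = [r + s for r, s in zip(running, series)]
        stacked[user] = list(running)
    return stacked


lines = stacked_series({"A": [1, 2, 0], "B": [0, 1, 3]})
```

With this layout the gap between two adjacent lines shows one user's speech amount, and the topmost line shows the speech amount of the whole conference.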
- the display control unit 124 represents, for example, a bar graph showing the total value of the speech volume of each of the plurality of users participating in one voice conference as the analysis result 312 based on the analysis result by the voice analysis unit 122.
- the voice conference system S can visually notify the administrator of the total value of the speech volume of each user while the voice conference continues.
- When the operation unit 34 of the administrator terminal 3 performs an operation to select an arrow symbol, line graph, or bar graph included in the analysis result 312, the display control unit 124 enlarges the selected arrow symbol, line graph, or bar graph and displays it in detail.
- The display control unit 124 may display, as the analysis result 312, not only the arrow symbol, the line graph, and the bar graph but also other analysis results obtained by analyzing the voice, in association with each of the plurality of voice conferences (rooms).
- The display control unit 124 causes the administrator terminal 3 to display information indicating the voice conference (room) in which the proposal unit 123 is proposing intervention. As shown in FIG. 5A, the display control unit 124 causes the proposal information 313, which indicates the audio conference in which the proposal unit 123 is proposing intervention, to be displayed on the audio conference list screen displayed by the administrator terminal 3. The display control unit 124 displays, for example, the proposal information 313 including characters that can identify the voice conference (room) in which the proposal unit 123 is proposing intervention, and characters that explain the reason why the intervention is necessary.
- Among the plurality of voice conferences included in the voice conference list screen displayed by the administrator terminal 3, the display control unit 124 makes the display mode of the voice conference information 311 corresponding to the voice conference in which the proposal unit 123 proposes intervention different from the display mode of the voice conference information 311 corresponding to the other voice conferences.
- The display control unit 124 changes the frame line of the voice conference information 311 as the display mode, but it may instead change the color, size, shape, or the like of the voice conference information 311.
- the display control unit 124 may display the comparison information 315 indicating the comparison result calculated by the voice analysis unit 122.
- the display control unit 124 displays, for example, the proposal information 313 including characters that can identify the voice conference compared with the voice pattern and characters that explain the comparison result.
- the display control unit 124 may display the comparison result of the voice with respect to the voice pattern not only on the administrator terminal 3 but also on the user terminal 2.
- the voice conference system S can present the voice analysis result based on the voice pattern acquired in advance to the manager, and the manager or the user can easily interpret the analysis result.
- In addition to the analysis result by the voice analysis unit 122, the display control unit 124 may display actions and comments input by the user while the voice conference is continuing.
- The action is one of a plurality of actions such as likes, applause, and laughter input by the user on the user terminal 2 during the voice conference, and is stored in the storage unit 11 in association with the input time.
- the comment is an impression, an annotation, or the like input by the user on the user terminal 2 during the voice conference, and is stored in the storage unit 11 in association with the input time.
- FIG. 6 is a schematic diagram for explaining a method of displaying actions and comments on the analysis result.
- The display control unit 124 displays characters indicating the content of the action and the comment at the position, on the line graph showing the time change of the speech amount, that corresponds to the time when the action and the comment were input. Further, the display control unit 124 may display actions and comments not only on the administrator terminal 3 but also on the user terminal 2.
- the voice conference system S presents the analysis result together with the user's action and comment input during the voice conference, and the administrator or the user can easily interpret the analysis result.
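Placing an action or comment at the matching position on the time axis amounts to finding the graph sample that covers its input time. The sketch below assumes the graph is sampled at known, sorted times in seconds and attaches each event to the latest sample at or before it. The function name and the data shapes are hypothetical.

```python
from bisect import bisect_right


def place_annotations(sample_times, events):
    """Attach each (time, label) event to a sample index of the line graph.

    sample_times must be sorted. An event earlier than the first sample is
    attached to index 0.
    """
    placed = []
    for t, label in sorted(events):
        i = bisect_right(sample_times, t) - 1
        placed.append((max(i, 0), label))
    return placed


samples = [0, 10, 20, 30]
events = [(12, "applause"), (30, "comment: good point"), (5, "like")]
annotations = place_annotations(samples, events)
```

A renderer could then draw each label next to the graph point at the returned index.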
- the display control unit 124 may display information indicating an action input in one user terminal 2 on another user terminal 2 during the continuation of the voice conference. As a result, the user can notify other users of opinions such as consent without interrupting the ongoing conversation.
- the voice analysis unit 122 may divide a plurality of users into a plurality of groups for executing a plurality of voice conferences based on the analysis results of the plurality of voice conferences. That is, the voice analysis unit 122 proposes a plurality of preferable groups for executing the voice conference from the next time onward based on the analysis result.
- FIG. 7 is a schematic diagram for explaining a method of dividing a plurality of users into a plurality of groups based on the analysis result. It is assumed that the voice analysis unit 122 stores, in the storage unit 11, the analysis results of analyzing the voices of the two voice conferences corresponding to the groups G1 and G2 into which the plurality of users are divided. The voice analysis unit 122 divides the plurality of users into new groups G1' and G2' based on the analysis results stored in the storage unit 11.
- The description here concerns regrouping between rooms (virtual rooms), that is, between lower-layer audio conferences, in which groups G1 and G2 become new groups G1' and G2'.
- the regrouping may include the content of dividing the audio conference of a certain room (virtual room) into the audio conferences of a lower layer.
- the voice analysis unit 122 divides a plurality of users into a plurality of groups so that the amount of speech or the tendency of speech of the plurality of users belonging to one group becomes close, for example, based on the analysis result.
- the voice analysis unit 122 clusters the speech volumes of the plurality of users by using a known method, and groups each cluster.
- The voice analysis unit 122 may place in the same group users who share a speech tendency, for example users who often interrupt in the middle of another person's remark, or users who speak mostly in the first half, the middle, or the second half of a voice conference. Conversely, the voice analysis unit 122 may divide the plurality of users into a plurality of groups so that the speech amounts or speech tendencies of the users belonging to one group are far apart.
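The two grouping policies can be sketched with a simple rank-based split. This is a stand-in for the "known method" of clustering mentioned above, not the method itself: users are sorted by speech amount and either cut into contiguous blocks (similar users together) or dealt out in turn (dissimilar users together). The function names and the sample data are hypothetical.

```python
def group_similar(speech_amounts, n_groups):
    """Put users with close speech amounts into the same group."""
    ranked = sorted(speech_amounts, key=speech_amounts.get)
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


def group_dissimilar(speech_amounts, n_groups):
    """Spread users with close speech amounts across different groups."""
    ranked = sorted(speech_amounts, key=speech_amounts.get)
    return [ranked[g::n_groups] for g in range(n_groups)]


# Speech amounts per user from earlier conferences (assumed unit: seconds).
amounts = {"a": 5, "b": 50, "c": 7, "d": 45, "e": 48, "f": 6}
similar = group_similar(amounts, 2)
mixed = group_dissimilar(amounts, 2)
```

In this example `similar` separates the quiet users from the talkative ones, while `mixed` places both kinds in every group. A real system would likely cluster on several features at once, such as interruption frequency or the timing of remarks.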
- the voice analysis unit 122 may divide a plurality of users into a plurality of groups based on the attributes of each of the plurality of users in addition to the analysis result.
- The attributes of the user are, for example, the user's school grades and the result of a personality diagnosis performed on the user in advance.
- the voice analysis unit 122 divides the plurality of users into a plurality of groups so that the attributes of the plurality of users belonging to one group are close to each other or far from each other.
- The voice conference unit 121 automatically starts a voice conference in which the plurality of users included in each of the plurality of groups generated by the voice analysis unit 122 participate. Further, the display control unit 124 may notify each user of the voice conference to attend by displaying information indicating the voice conference corresponding to the group that includes the user on the voice conference list screen displayed by the user terminal 2, shown in FIG. 3.
- By proposing groups for holding voice conferences among similar users or among dissimilar users, the voice conference system S enables user learning and evaluation to be carried out efficiently.
- the voice conference unit 121 may automatically intervene in the voice conference (room) in which the proposal unit 123 proposes intervention by using at least one of voice and text.
- the voice conference unit 121 determines the intervention content based on the analysis result by the voice analysis unit 122. For example, when there is a user in the voice conference whose speech amount is equal to or less than a predetermined value, the voice conference unit 121 determines as the intervention content to prompt the user to speak. Further, the voice conference unit 121 determines, for example, that when the total amount of speech of a plurality of users participating in the voice conference is equal to or less than a predetermined value, the intervention content is to encourage all the users to speak.
- The voice conference unit 121 generates an automatic voice (machine voice) indicating the intervention content and outputs it to the plurality of user terminals 2 participating in the voice conference, or generates characters indicating the intervention content and displays them on the plurality of user terminals 2 participating in the voice conference.
- the voice conference system S can automatically intervene in the voice conference that requires intervention based on the analysis result of the voice, and can facilitate the voice conference while reducing the burden on the administrator.
- the voice conference unit 121 may automatically intervene only in a part of the user terminals 2 selected based on a predetermined condition among the plurality of user terminals 2. In this case, the voice conference unit 121 selects, for example, the user terminal 2 of the user whose speech amount is equal to or less than a predetermined value as an intervention target.
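The threshold rules for choosing the intervention content and its targets can be sketched as follows. The function name, the message wording, and the threshold parameters are hypothetical. The description only states that a predetermined value is compared with an individual user's speech amount or with the total.

```python
def decide_interventions(speech_amounts, user_threshold, total_threshold):
    """Return a list of (target_users, message) pairs.

    If the whole conference is quiet, address everyone. Otherwise address
    only the users whose speech amount is at or below the user threshold.
    """
    if sum(speech_amounts.values()) <= total_threshold:
        return [(sorted(speech_amounts), "Everyone, please share your thoughts.")]
    return [
        ([user], f"{user}, what do you think?")
        for user in sorted(speech_amounts)
        if speech_amounts[user] <= user_threshold
    ]


quiet_room = decide_interventions({"a": 1, "b": 2}, user_threshold=5, total_threshold=10)
one_quiet_user = decide_interventions({"a": 1, "b": 30}, user_threshold=5, total_threshold=10)
```

Each returned message could then be synthesized as machine voice or shown as text on the user terminals 2 of the listed targets only.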
- In addition to or in place of at least one of voice and characters, the voice conference unit 121 may transmit and receive an image (which may be a still image or a moving image) specified by the user terminal 2 and an image specified by the administrator terminal 3.
- FIG. 8 is a diagram showing a flowchart of a voice conferencing method executed by the voice conferencing device 1.
- the display control unit 124 causes the user terminal 2 to display a voice conference list screen including information on a plurality of voice conferences in which the user can participate.
- the voice conference unit 121 accepts the participation of any of the users in the voice conference according to the operation in the operation unit 24 of the user terminal 2 (S11).
- the voice conference unit 121 starts the voice conference by starting the exchange of voice with the plurality of user terminals 2 of the plurality of users participating in the voice conference (S12).
- the voice analysis unit 122 analyzes the voices emitted in each of the plurality of voice conferences (S13).
- The voice analysis unit 122 calculates, for example, the amount of speech of each of the plurality of users in association with each of the plurality of voice conferences, and also calculates the degree of interaction (transition of the speaker) between the plurality of users.
- the display control unit 124 displays the analysis result by the voice analysis unit 122 on the user terminal 2 (S14).
- The display control unit 124 causes the user terminal 2 to display, for example, an arrow symbol indicating the degree of interaction between a plurality of users, based on the analysis result by the voice analysis unit 122, while the voice conference is continuing. Further, the display control unit 124 changes the display mode of the user image according to the amount of speech of the user, for example, based on the analysis result by the voice analysis unit 122 while the voice conference continues.
- the display control unit 124 causes the administrator terminal 3 to display the analysis result 312 of the voice emitted in the voice conference by the voice analysis unit 122 in association with each of the plurality of voice conferences (S15).
- The display control unit 124 causes the administrator terminal 3 to display, for example, an arrow symbol indicating the degree of interaction between a plurality of users, based on the analysis result by the voice analysis unit 122, while the voice conference is continuing.
- While the voice conference continues, the display control unit 124 causes the administrator terminal 3 to display, for example, a line graph showing the time change of the speech amount of each of the plurality of users participating in one voice conference, based on the analysis result by the voice analysis unit 122.
- While the voice conference is continuing, the display control unit 124 causes the administrator terminal 3 to display, for example, a bar graph showing the total value of the speech amount of each of the plurality of users participating in one voice conference, based on the analysis result by the voice analysis unit 122.
- The voice conference device 1 analyzes the voice emitted in each of the plurality of voice conferences and causes the administrator terminal 3 to display the voice analysis results in association with each of the plurality of voice conferences.
- the voice conference system S can give the administrator a bird's-eye view of the status of the plurality of voice conferences, and enables the administrator to easily know the status of the plurality of voice conferences.
- The processor of the voice conferencing device 1 is the main body that executes each step (process) included in the voice conferencing method shown in FIG. 8. That is, the processor of the voice conference device 1 reads a program for executing the voice conference method shown in FIG. 8 from the storage unit 11 and executes the program to control each unit of the voice conference device 1, thereby performing the voice conferencing method shown in FIG. 8.
- the steps included in the audio conferencing method shown in FIG. 8 may be partially omitted, the order between the steps may be changed, or a plurality of steps may be performed in parallel.
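Steps S11 to S15 of FIG. 8 can be traced with a small, purely illustrative sketch. It assumes join requests arrive as (user, room) pairs and that the voice exchange is summarized as (room, user, seconds) records. The function name and these data shapes are not taken from the description, and the analysis is reduced to summing speech time per user.

```python
def run_conference_cycle(join_requests, utterances):
    """Walk through S11 to S15 once and return the two views to display."""
    # S11: accept each user's participation in the chosen voice conference.
    rooms = {}
    for user, room in join_requests:
        rooms.setdefault(room, []).append(user)

    # S12: the voice exchange itself is represented by `utterances`.
    # S13: analyze the voice of each conference (here: speech amount per user).
    analysis = {room: {u: 0.0 for u in users} for room, users in rooms.items()}
    for room, user, seconds in utterances:
        if user in analysis.get(room, {}):
            analysis[room][user] += seconds

    # S14: each user terminal shows the result of the user's own conference.
    user_view = {u: analysis[room] for room, users in rooms.items() for u in users}
    # S15: the administrator terminal shows the results of all conferences.
    admin_view = analysis
    return user_view, admin_view


user_view, admin_view = run_conference_cycle(
    [("u1", "Room 1"), ("u2", "Room 1"), ("u3", "Room 2")],
    [("Room 1", "u1", 12.0), ("Room 1", "u2", 3.5), ("Room 2", "u3", 8.0), ("Room 1", "u1", 4.0)],
)
```

In a running system S13 to S15 would repeat while the conferences continue, which is consistent with the note above that steps may be reordered or run in parallel.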
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
- User Interface Of Digital Computer (AREA)
- Telephonic Communication Services (AREA)
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/021646 WO2021245759A1 (ja) | 2020-06-01 | 2020-06-01 | 音声会議装置、音声会議システム及び音声会議方法 |
| JP2022529155A JP7530070B2 (ja) | 2020-06-01 | 2020-06-01 | 音声会議装置、音声会議システム及び音声会議方法 |
| US18/071,636 US12260876B2 (en) | 2020-06-01 | 2022-11-30 | Voice conference apparatus, voice conference system and voice conference method |
| JP2024113452A JP7766887B2 (ja) | 2020-06-01 | 2024-07-16 | 音声会議装置、音声会議システム及び音声会議方法 |
| JP2025177353A JP2026012820A (ja) | 2020-06-01 | 2025-10-21 | 音声会議装置、音声会議システム及び音声会議方法 |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/021646 WO2021245759A1 (ja) | 2020-06-01 | 2020-06-01 | 音声会議装置、音声会議システム及び音声会議方法 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/071,636 Continuation US12260876B2 (en) | 2020-06-01 | 2022-11-30 | Voice conference apparatus, voice conference system and voice conference method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021245759A1 (ja) | 2021-12-09 |
Family
ID=78830186
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2020/021646 Ceased WO2021245759A1 (ja) | 2020-06-01 | 2020-06-01 | 音声会議装置、音声会議システム及び音声会議方法 |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12260876B2 |
| JP (3) | JP7530070B2 |
| WO (1) | WO2021245759A1 |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2023210052A1 (ja) * | 2022-04-27 | 2023-11-02 | ハイラブル株式会社 | 音声分析装置、音声分析方法及び音声分析プログラム |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2008078555A1 (ja) * | 2006-12-22 | 2008-07-03 | Nec Corporation | 会議制御方法、システム及びプログラム |
| JP2018124456A (ja) * | 2017-02-01 | 2018-08-09 | 株式会社リコー | 情報端末、情報処理装置、情報処理システム、情報処理方法、及びプログラム |
| JP2018170009A (ja) * | 2015-11-10 | 2018-11-01 | 株式会社リコー | 電子会議システム |
| WO2019139101A1 (ja) * | 2018-01-12 | 2019-07-18 | ソニー株式会社 | 情報処理装置、情報処理方法およびプログラム |
| WO2019142231A1 (ja) * | 2018-01-16 | 2019-07-25 | ハイラブル株式会社 | 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2003299051A (ja) | 2002-03-29 | 2003-10-17 | Matsushita Electric Ind Co Ltd | 情報出力装置および情報出力方法 |
| US7319745B1 (en) * | 2003-04-23 | 2008-01-15 | Cisco Technology, Inc. | Voice conference historical monitor |
| US20070106724A1 (en) * | 2005-11-04 | 2007-05-10 | Gorti Sreenivasa R | Enhanced IP conferencing service |
| JP2010055307A (ja) * | 2008-08-27 | 2010-03-11 | Hitachi Ltd | 会議支援システム及び会議支援方法 |
| US9324323B1 (en) * | 2012-01-13 | 2016-04-26 | Google Inc. | Speech recognition using topic-specific language models |
| US10177926B2 (en) | 2012-01-30 | 2019-01-08 | International Business Machines Corporation | Visualizing conversations across conference calls |
| US20140372941A1 (en) * | 2013-06-17 | 2014-12-18 | Avaya Inc. | Discrete second window for additional information for users accessing an audio or multimedia conference |
| US9992330B1 (en) * | 2017-05-26 | 2018-06-05 | Global Tel*Link Corporation | Conference functionality between inmates and multiple approved parties in controlled environment |
| JP7046546B2 (ja) * | 2017-09-28 | 2022-04-04 | 株式会社野村総合研究所 | 会議支援システムおよび会議支援プログラム |
| WO2019142230A1 (ja) | 2018-01-16 | 2019-07-25 | ハイラブル株式会社 | 音声分析装置、音声分析方法、音声分析プログラム及び音声分析システム |
| US11882161B2 (en) * | 2021-06-30 | 2024-01-23 | Rovi Guides, Inc. | Breakout of participants in a conference call |
-
2020
- 2020-06-01 JP JP2022529155A patent/JP7530070B2/ja active Active
- 2020-06-01 WO PCT/JP2020/021646 patent/WO2021245759A1/ja not_active Ceased
-
2022
- 2022-11-30 US US18/071,636 patent/US12260876B2/en active Active
-
2024
- 2024-07-16 JP JP2024113452A patent/JP7766887B2/ja active Active
-
2025
- 2025-10-21 JP JP2025177353A patent/JP2026012820A/ja active Pending
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2023210052A1 (ja) * | 2022-04-27 | 2023-11-02 | ハイラブル株式会社 | 音声分析装置、音声分析方法及び音声分析プログラム |
| WO2023209898A1 (ja) * | 2022-04-27 | 2023-11-02 | ハイラブル株式会社 | 音声分析装置、音声分析方法及び音声分析プログラム |
Also Published As
| Publication number | Publication date |
|---|---|
| JP7766887B2 (ja) | 2025-11-11 |
| JP7530070B2 (ja) | 2024-08-07 |
| JP2026012820A (ja) | 2026-01-27 |
| US20230093298A1 (en) | 2023-03-23 |
| JP2024147690A (ja) | 2024-10-16 |
| US12260876B2 (en) | 2025-03-25 |
| JPWO2021245759A1 | 2021-12-09 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20938722; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022529155; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 22/03/2023) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 20938722; Country of ref document: EP; Kind code of ref document: A1 |