WO2022259531A1

WO2022259531A1 - Device, method, and program for online conference

Info

Publication number: WO2022259531A1
Application number: PCT/JP2021/022335
Authority: WO
Inventors: 勉籔内; 仁志瀬下; 照久井上
Original assignee: 日本電信電話株式会社
Priority date: 2021-06-11
Filing date: 2021-06-11
Publication date: 2022-12-15

Abstract

This device for an online conference has a voice recognition unit (2013), a voice analysis unit (2014), an issue extraction unit (2015), and a display control unit (2016). The voice recognition unit converts a first voice collected from a terminal in an online conference into first text, and converts a second voice into a second text. The voice analysis unit disassembles the first text and the second text into word units. When a first co-occurrence matrix for a first word extracted from the first text and a second co-occurrence matrix for a second word extracted from the second text are similar, the issue extraction unit extracts the first word and the second word as a current issue in the online meeting. The display control unit synthesizes the extracted first word and second word into a screen for the online conference, and displays the screen for the online conference in which the first word and the second word are synthesized on the terminal.

Description

Apparatus, method and program for online conference

Embodiments relate to devices, methods and programs for online conferences.

In recent years, online conferences have become increasingly common owing to improvements in communication technology. In an online conference, each participating user uses a terminal such as a personal computer (PC) to access a conference URL (Uniform Resource Locator) provided by a conference server. The conference server controls transmission and reception of data so that the user's voice and/or image collected by each terminal can be shared among the terminals. In this way, an online conference is realized among a plurality of users.

Japanese Patent No. 5955817

Embodiments provide an apparatus, method, and program for an online conference that can conduct an online conference more effectively.

The device for an online conference of the embodiment has a speech recognition unit, a speech analysis unit, an issue extraction unit, and a display control unit. The speech recognition unit converts a first speech collected from a terminal in an online conference into a first text, and converts a second speech into a second text. The speech analysis unit decomposes each of the first text and the second text into word units. When a first co-occurrence matrix for a first word extracted from the first text is similar to a second co-occurrence matrix for a second word extracted from the second text, the issue extraction unit extracts the first word and the second word as the current issue in the online conference. The display control unit synthesizes the extracted first word and second word into a screen for the online conference, and transmits the screen for the online conference in which the first word and the second word are synthesized to the terminal.

According to the embodiment, there is provided an apparatus, method, and program for an online conference that can hold an online conference more effectively.

FIG. 1 is a diagram showing an example of the configuration of an online conference system according to an embodiment.
FIG. 2 is a diagram illustrating an example of a hardware configuration of a terminal.
FIG. 3 is a diagram illustrating an example of a hardware configuration of a conference server.
FIG. 4 is a functional block diagram of the conference server.
FIG. 5 is a flowchart showing an example of analysis processing.
FIG. 6A is a diagram for explaining creation of a co-occurrence matrix.
FIG. 6B is a diagram showing an example of co-occurrence matrix data.
FIG. 7 is a flowchart illustrating an example of conference processing.
FIG. 8 is a flowchart illustrating an example of issue extraction processing.
FIG. 9 is a diagram showing a display example of points of contention on a terminal.

Hereinafter, embodiments will be described with reference to the drawings. FIG. 1 is a diagram showing an example of the configuration of an online conference system 1 according to an embodiment. As shown in FIG. 1, the online conference system 1 includes terminals 10-1, 10-2, . . . , 10-l and a conference server 20, for example. The terminals 10-1, 10-2, . . . , 10-l and the conference server 20 are connected to a network NW such as the Internet.

Terminals 10-1, 10-2, . . . , 10-l are l (l is a natural number) terminals operated by the respective users participating in the conference. The terminals 10-1, 10-2, . . . , 10-l display the screen of the online conference on a web browser. The terminals 10-1, 10-2, . . . , 10-l also collect the voices of the corresponding users and transmit the collected voices to the conference server 20. The terminals 10-1, 10-2, . . . , 10-l also capture images of the corresponding users and transmit the captured images to the conference server 20.

The conference server 20, as an example of an online conference device, is a server computer for controlling online conferences. The conference server 20 does not have to be a single computer, and may be composed of multiple computers. The conference server 20 controls various processes of the online conference. For example, the conference server 20 sets a URL for an online conference. The conference server 20 also updates the online conference screen on the web browser according to the access from the terminal 10. Also, the conference server 20 transmits the voice transmitted from the terminal 10 to the other terminals 10.

FIG. 2 is a diagram showing an example of the hardware configuration of the terminal 10. As shown in FIG. 2, the terminal 10 includes a processor 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage 104, an input device 105, a communication module 106, a display 107, a camera 108, a microphone 109, and a speaker 110. Here, the terminals 10-1, 10-2, . . . , 10-l need not all have the same configuration.

The processor 101 is a processing circuit capable of executing various programs and controls the overall operation of the terminal 10. The processor 101 may be a processor such as a CPU (Central Processing Unit), MPU (Micro Processing Unit), or GPU (Graphics Processing Unit). Also, the processor 101 may be an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or the like. Furthermore, the processor 101 may be composed of a single CPU or the like, or may be composed of a plurality of CPUs or the like.

The ROM 102 is a non-volatile semiconductor memory and holds programs for controlling the terminal 10, control data, and the like.

The RAM 103 is a volatile semiconductor memory and is used as a work area for the processor 101.

The storage 104 is a non-volatile storage device such as a hard disk drive (HDD) or solid state drive (SSD), and holds system software of the terminal 10 and various application software. In the embodiment, the storage 104 holds application software for participating in online conferences, such as a web browser. The storage 104 is not limited to a storage built into the terminal 10, and may be a storage externally attached to the terminal 10.

The input device 105 is an interface device for the user of the terminal 10 to operate the terminal 10. The input device 105 can include, for example, a touch panel, keyboard, mouse, various operation buttons, various operation switches, and the like.

The communication module 106 is a module that includes circuits used for connecting the terminal 10 to the network NW. The communication module 106 may be, for example, a communication module conforming to a wired LAN (Local Area Network) standard. Also, the communication module 106 may be a communication module conforming to the wireless LAN standard, for example. In this case, the communication module 106 performs processing for connecting to the network NW via the access point.

The display 107 is a display device such as a liquid crystal display or an organic EL (Electro Luminescence) display. The display 107 displays various screens such as an online conference screen. The display 107 is not limited to being configured integrally with the terminal 10, and may be a display externally attached to the terminal 10.

The camera 108 captures an image within its angle of view and generates an image of the user within the angle of view. The camera 108 has a lens and an imaging element. The lens forms an image of the light within the angle of view on the imaging element. The imaging element converts the imaged light into an image signal, which is an electrical signal. The camera 108 is not limited to being configured integrally with the terminal 10, and may be a camera externally attached to the terminal 10.

The microphone 109 converts sounds collected from around the terminal 10 into electrical signals. The microphone 109 collects the voice of the user of the terminal 10 during an online conference, for example. The microphone 109 is not limited to being built into the terminal 10 and may be a microphone externally attached to the terminal 10.

The speaker 110 reproduces sound based on an audio signal. The speaker 110 reproduces the voice from another terminal 10 during an online conference, for example. The speaker 110 is not limited to being built into the terminal 10 and may be a speaker externally attached to the terminal 10.

FIG. 3 is a diagram showing an example of the hardware configuration of the conference server 20. As shown in FIG. 3, the conference server 20 has a processor 201, a ROM 202, a RAM 203, a storage 204, an input device 205, and a communication module 206. Here, the conference server 20 may further have a display or the like.

The processor 201 is a processing circuit capable of executing various programs and controls the overall operation of the conference server 20. The processor 201 may be a processor such as a CPU, MPU, GPU. Also, the processor 201 may be an ASIC, FPGA, or the like. Furthermore, the processor 201 may be composed of a single CPU or the like, or may be composed of a plurality of CPUs or the like.

The ROM 202 is a non-volatile semiconductor memory and holds programs for controlling the conference server 20, control data, and the like.

The RAM 203 is a volatile semiconductor memory and is used as a work area for the processor 201.

The storage 204 is a non-volatile storage device such as an HDD or SSD, and holds the system software of the conference server 20 and the like. In the embodiment, the storage 204 holds a program 2041 for controlling an online conference. The storage 204 also holds a corpus 2042 that is used for natural language analysis for extracting issues during online conferences. The corpus 2042 in the embodiment is a database in which various documents, such as minutes of online or offline meetings held in the past and technical documents and references related to the topic of each meeting, are structured and recorded so that the processor 201 can refer to them. The number of words included in the corpus 2042 for one topic is, for example, several million to several hundred million. Documents for generating the corpus 2042 are input by the administrator of the conference server 20, for example. The minutes of an online conference can also be generated automatically, for example by recording the remarks of the users of the respective terminals 10. In addition, technical documents and the like may be collected by the processor 201 via the network NW. Thus, documents for generating the corpus 2042 may be input in any manner. The storage 204 also holds co-occurrence matrix data 2043. The co-occurrence matrix data 2043 is data representing, as a matrix, the co-occurrence relationship between words included in the corpus 2042. Here, the storage 204 does not necessarily have to hold the corpus 2042 and the co-occurrence matrix data 2043. The corpus 2042 and the co-occurrence matrix data 2043 may be stored in a storage separate from the conference server 20. In this case, the conference server 20 acquires the corpus 2042 and the co-occurrence matrix data 2043 from this separate storage as needed.

The input device 205 is an interface device for the administrator of the conference server 20 to operate the conference server 20. The input device 205 can include, for example, a touch panel, keyboard, mouse, various operation buttons, various operation switches, and the like.

The communication module 206 is a module that includes circuits used by the conference server 20 to connect to the network NW. The communication module 206 may be, for example, a communication module conforming to the wired LAN standard. Also, the communication module 206 may be a communication module conforming to the wireless LAN standard, for example.

FIG. 4 is a functional block diagram of the conference server 20. As shown in FIG. 4, the conference server 20 has a corpus analysis unit 2011 and a co-occurrence matrix creation unit 2012. The conference server 20 also has a speech recognition unit 2013, a speech analysis unit 2014, an issue extraction unit 2015, and a display control unit 2016. By executing the program 2041, the processor 201 of the conference server 20 operates as the corpus analysis unit 2011, the co-occurrence matrix creation unit 2012, the speech recognition unit 2013, the speech analysis unit 2014, the issue extraction unit 2015, and the display control unit 2016. The corpus analysis unit 2011, the co-occurrence matrix creation unit 2012, the speech recognition unit 2013, the speech analysis unit 2014, the issue extraction unit 2015, and the display control unit 2016 may instead be realized by hardware different from the processor 201.

The corpus analysis unit 2011 analyzes the corpus 2042. For example, the corpus analysis unit 2011 identifies the part of speech of each word included in the corpus 2042. Then, the corpus analysis unit 2011 excludes words of specific parts of speech. Words of specific parts of speech are words that do not carry an important meaning in the semantic interpretation of a sentence, such as particles and auxiliary verbs.

The co-occurrence matrix creation unit 2012 creates a co-occurrence matrix for each word extracted by the corpus analysis unit 2011. A co-occurrence matrix is a matrix representing whether or not other words appear together before and after a certain word, that is, whether or not a certain word and another word co-occur. For example, the elements of the co-occurrence matrix represent the frequency with which a word co-occurs with another word. Then, the co-occurrence matrix creation unit 2012 stores the created co-occurrence matrix as the co-occurrence matrix data 2043 in the storage 204. Details of the operation of creating the co-occurrence matrix will be described later.

When speech data is transmitted from any of the terminals 10-1, 10-2, . . . , 10-l, the speech recognition unit 2013 identifies which terminal transmitted it, and thereby recognizes which user spoke. Further, the speech recognition unit 2013 converts the transmitted speech data into text data by speech recognition using frequency analysis or the like.

The speech analysis unit 2014 analyzes the speech converted into text by the speech recognition unit 2013. The speech analysis unit 2014 morphologically analyzes text, for example. The morphological analysis is performed using, for example, a morphological analysis engine for Japanese such as MeCab. Of course, if the recognized speech is not Japanese, a morphological analysis engine for other languages may be used.

The issue extraction unit 2015 extracts the current issue in the online conference based on the analysis result of the speech analysis unit 2014 and the co-occurrence matrix data 2043. The details of the operation of extracting issues will be described later.

The display control unit 2016 controls the display 107 of the terminal 10 to display the online conference screen. For example, the display control unit 2016 transmits screen data of the online conference to the terminal 10 for display on the display 107. In addition, the display control unit 2016 updates the online conference screen displayed on the display 107 of the terminal 10 according to the user's image transmitted from the terminal 10. Furthermore, the display control unit 2016 updates the online conference screen displayed on the display 107 of the terminal 10 according to the text representing the issue extracted by the issue extraction unit 2015.

Next, the operation of the online conference system 1 according to the embodiment will be described. FIG. 5 is a flowchart showing analysis processing by the conference server 20. The processing of FIG. 5 is performed by the processor 201 at a timing such as after the end of an online conference or after an arbitrary period of time, such as several months, has elapsed. It is assumed that a sufficiently large corpus 2042 is held in the storage 204 when the processing of FIG. 5 is performed.

In step S1, the processor 201 acquires the corpus 2042 from the storage 204.

In step S2, the processor 201 decomposes each sentence included in the corpus 2042 into word units by morphological analysis using a morphological analysis engine such as MeCab, and identifies the part of speech of each word. For example, suppose the corpus 2042 contains the sentence "The router forwards the packet to the network." The processor 201 decomposes this sentence into, for example, "router (noun)", "ga (particle)", "packet (noun)", "wo (particle)", "forwarding (noun)", "shite (verb)", "network (noun)", "he (particle)", "transfer (noun)", "suru (verb)", and ". (period)".

In step S3, the processor 201 excludes unnecessary words in terms of semantic interpretation from the words obtained in step S2. For example, the processor 201 leaves nouns and excludes other words. Depending on the sentence, processor 201 may also retain verbs, adjectives, and the like.
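The filtering of steps S2 and S3 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the tagged tokens are written out by hand in place of the output of a morphological analysis engine, and the names `TAGGED` and `keep_content_words` are hypothetical.

```python
# Hypothetical pre-tagged tokens standing in for the output of a
# morphological analysis engine (step S2). Tags follow the example above.
TAGGED = [
    ("router", "noun"), ("ga", "particle"), ("packet", "noun"),
    ("wo", "particle"), ("forwarding", "noun"), ("shite", "verb"),
    ("network", "noun"), ("he", "particle"), ("transfer", "noun"),
    ("suru", "verb"), (".", "period"),
]


def keep_content_words(tagged, keep=frozenset({"noun"})):
    """Step S3 sketch: keep only words whose part of speech is in `keep`."""
    return [word for word, pos in tagged if pos in keep]


content_words = keep_content_words(TAGGED)
print(content_words)
```

Passing `keep=frozenset({"noun", "verb"})` would correspond to the case in which verbs are also retained.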

In step S4, the processor 201 creates a co-occurrence matrix for each of the remaining words. The window size for creating a co-occurrence matrix is 2, for example. The window size is information indicating how many adjacent words on each side of the target word are treated as co-occurring with it. For example, assume that "router", "packet", "forwarding", "network", and "transfer" remain after the processing of step S3. In this case, a co-occurrence matrix for "forwarding" is created as shown in FIG. 6A, for example. That is, since "router", "packet", "network", and "transfer" all appear within two words before or after "forwarding", 1 is added to each of the corresponding matrix elements. Based on the same concept, the processor 201 determines the presence or absence of co-occurrence for each word within the range of the window size over the entire corpus 2042, and creates the co-occurrence matrices. At this time, when words already determined to co-occur are found to co-occur again in another sentence, the processor 201 adds 1 to the corresponding element of the matrix. For example, when co-occurrence of "forwarding" and "packet" is detected again, the processor 201 adds 1 to the "packet" element of the co-occurrence matrix of "forwarding". At the same time, the processor 201 adds 1 to the "forwarding" element of the co-occurrence matrix of "packet".
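The counting in step S4 can be sketched as below, assuming a window size of 2 and a nested dictionary as the storage format. The function name `update_cooccurrence` and the dictionary layout are illustrative assumptions; the source does not specify a data structure.

```python
from collections import defaultdict


def update_cooccurrence(counts, words, window=2):
    """Add 1 to counts[target][other] for every `other` that appears within
    `window` words before or after `target` in one sentence."""
    for i, target in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            # Skip the target position itself and identical words,
            # so that the diagonal (a word with itself) stays at zero.
            if j != i and words[j] != target:
                counts[target][words[j]] += 1
    return counts


counts = defaultdict(lambda: defaultdict(int))
update_cooccurrence(
    counts, ["router", "packet", "forwarding", "network", "transfer"]
)
print(dict(counts["forwarding"]))
```

Calling `update_cooccurrence` again with another sentence accumulates into the same `counts`, which mirrors adding 1 each time a pair co-occurs again. Because the window is symmetric, the counts for a pair are the same in both directions.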

In step S5, the processor 201 normalizes the co-occurrence matrix created for each word. Specifically, the processor 201 divides the value of each element of the co-occurrence matrix for each word by the appearance frequency of that word. In this way, co-occurrence matrix data 2043 as shown in FIG. 6B is created. Here, in FIG. 6B, Tn (n=1, 2, . . .) denotes a word. Each row in FIG. 6B is the co-occurrence matrix for the word Tn. Cij (i=1, 2, . . . , n, j=1, 2, . . . , n) in the elements of the co-occurrence matrix indicates the frequency with which the word Ti and the word Tj co-occur. Fi (i=1, 2, . . . , n) in the elements of the co-occurrence matrix indicates the appearance frequency of the word Ti in the entire corpus 2042. Here, Cij=Cji owing to the properties of co-occurrence. Also, in FIG. 6B, Cii, that is, the co-occurrence frequency of a word with itself, is assumed to be zero.
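The normalization of step S5 can be sketched as follows, under the reading that each element of the row for word Ti becomes Cij / Fi. The sample counts and frequencies are made-up values for illustration, and the name `normalize` is hypothetical.

```python
def normalize(counts, freq):
    """Divide every element C_ij of the row for word T_i by F_i,
    the appearance frequency of T_i in the corpus."""
    return {
        word: {other: c / freq[word] for other, c in row.items()}
        for word, row in counts.items()
    }


# Made-up raw co-occurrence counts (C_ij) and appearance frequencies (F_i).
raw_counts = {
    "forwarding": {"packet": 4, "network": 2},
    "packet": {"forwarding": 4},
}
frequencies = {"forwarding": 8, "packet": 16}

normalized = normalize(raw_counts, frequencies)
print(normalized)
```

Note that although the raw counts are symmetric, the normalized values generally are not, because each row is divided by a different Fi.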

In step S6, the processor 201 stores the co-occurrence matrix data 2043 in the storage 204. After that, the processor 201 terminates the processing of FIG. 5.

Here, once the co-occurrence matrix data 2043 has been created, subsequent analysis processing need only apply the processing of FIG. 5 to the documents newly added to the corpus 2042.

FIG. 7 is a flowchart showing an example of conference processing. The processing in FIG. 7 is processing after the online conference is started. It is assumed that after the online conference is started, the terminal 10 transmits to the conference server 20 the user's images sequentially captured by the camera 108 and the user's voices sequentially collected by the microphone 109. Note that while the camera 108 is powered off, the terminal 10 does not transmit the user's image to the conference server 20. Similarly, the terminal 10 does not transmit the user's voice to the conference server 20 while the microphone 109 is powered off.

In step S11, the processor 201 determines whether or not the user's image has been transmitted from any terminal 10. In step S11, when the user's image has been transmitted from the terminal 10, the process proceeds to step S12. In step S11, if the user's image has not been transmitted from the terminal 10, the process proceeds to step S13.

In step S12, the processor 201 updates the online conference screen according to the user's image transmitted from the terminal 10. Then, the processor 201 transmits the updated screen data of the online conference to each terminal 10. In response to this, each terminal 10 updates the display on the display 107, for example, the web browser, with the received screen data. After that, the process moves to step S13.

In step S13, the processor 201 determines whether or not any terminal 10 has transmitted voice. In step S13, when voice is transmitted from the terminal 10, the process proceeds to step S14. In step S13, if no voice has been transmitted from the terminal 10, the process proceeds to step S18.

In step S14, the processor 201 identifies from which terminal 10 the voice has been transmitted, for example, by the ID of the terminal 10 transmitted together with the voice. Thereby, the processor 201 identifies the user who made the statement.

In step S15, the processor 201 copies the transmitted voice data. The processor 201 then transmits the transmitted voice data to the other terminals 10. In response to this, each of those terminals 10 reproduces the voice of the user who spoke from the speaker 110 based on the received voice data. Also, the processor 201 holds the copy of the voice data in the RAM 203, for example. After that, the process moves to step S16.

In step S16, the processor 201 determines whether or not another user has spoken within a certain period of time, such as the past five minutes. The time of step S16 may be set appropriately. In step S16, when there is a statement from another user, the process proceeds to step S17. In step S16, when there is no other user's speech, the process proceeds to step S18.
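The determination of step S16 can be sketched as a lookup over a log of utterances. The log format (a list of timestamp and user pairs, in seconds) and the function name `other_user_spoke_recently` are assumptions made for illustration; the source only states that the period, such as five minutes, may be set appropriately.

```python
def other_user_spoke_recently(log, speaker, now, window_sec=300.0):
    """Return True if any user other than `speaker` has an utterance
    timestamped within the last `window_sec` seconds before `now`."""
    return any(
        user != speaker and 0 <= now - timestamp <= window_sec
        for timestamp, user in log
    )


# Hypothetical utterance log: (time in seconds, user identifier).
utterance_log = [(0.0, "user1"), (200.0, "user2")]

print(other_user_spoke_recently(utterance_log, "user2", now=250.0))
print(other_user_spoke_recently(utterance_log, "user2", now=400.0))
```

In this sketch, a `True` result corresponds to proceeding to step S17 and a `False` result to proceeding to step S18.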

In step S17, the processor 201 performs issue extraction processing. After the issue extraction process, the process proceeds to step S18. The issue extraction process is a process of extracting the current issue in the online conference from the voices of multiple users. Details of the issue extraction process will be described later.

In step S18, the processor 201 determines whether or not to end the online conference. For example, the processor 201 determines to end the online conference when all the terminals 10 request disconnection. In step S18, when the online conference is not to be ended, the process returns to step S11. In step S18, when the online conference is to be ended, the processor 201 ends the processing of FIG. 7.

FIG. 8 is a flowchart showing an example of issue extraction processing. In the following description, it is assumed that, during the past five minutes, a first utterance by a first user, "The router forwards the packet to the network.", and a second utterance by a second user, "The router forwards the packet to the network by UPnP.", have been collected.

In step S21, the processor 201 performs speech recognition on the voice data of each of the multiple users held in the RAM 203, for example, and converts each piece of voice data into text data.

In step S22, the processor 201 decomposes each text into word units by morphological analysis. For example, the text of the first utterance is decomposed into "router", "is", "packet", "to", "forwarding", "to", "network", "to", "forward", and ".". Also, the text of the second utterance is decomposed into "router", "is", "packet", "to", "UPnP", "to", "by", "network", "to", "transfer", "do", and ".".

In step S23, the processor 201 excludes words that are unnecessary for semantic interpretation from the words obtained in step S22. For example, the processor 201 leaves nouns and excludes other words. Depending on the sentence, the processor 201 may also retain verbs, adjectives, and the like.

At step S24, the processor 201 counts the appearance frequency of each word obtained at step S23 during the online conference.

In step S25, the processor 201 determines, for each user's utterances, whether there is a word whose appearance frequency is equal to or greater than a first threshold, for example, 5 times. In step S25, when there is a word whose appearance frequency is equal to or greater than the first threshold for each user's utterances, the process proceeds to step S26. In step S25, when there is no word whose appearance frequency is equal to or greater than the first threshold for each user's utterances, the processor 201 terminates the processing of FIG. 8. In this case, it is determined that there are no points of contention at this time.
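Steps S24 and S25 can be sketched as a per-user word count followed by a threshold filter. The input format (a dictionary from user to the list of content words spoken so far) and the name `frequent_words` are assumptions for illustration; the word counts are made-up values.

```python
from collections import Counter


def frequent_words(words_by_user, first_threshold=5):
    """For each user, count word occurrences (step S24) and keep the words
    whose count is at least `first_threshold` (step S25)."""
    result = {}
    for user, words in words_by_user.items():
        counts = Counter(words)
        result[user] = {w: n for w, n in counts.items() if n >= first_threshold}
    return result


# Made-up content words collected during the conference.
spoken = {
    "user1": ["forwarding"] * 10 + ["router"] * 2,
    "user2": ["UPnP"] * 5 + ["packet"] * 4,
}

frequent = frequent_words(spoken)
print(frequent)
```

If any user's dictionary in the result is empty, this would correspond to the case in which no point of contention is found at this time.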

In step S26, the processor 201 extracts, for each user's utterances, the words whose appearance frequency is equal to or greater than the first threshold. For the following explanation, assume that "forwarding" is extracted from the first utterance and "UPnP" is extracted from the second utterance. Then, the processor 201 determines whether or not, among the words extracted for each user, there is a set of words whose difference in appearance frequency is equal to or less than a second threshold, for example, 5 times. For example, if the appearance frequency of "forwarding" is 10 times and the appearance frequency of "UPnP" is 5 times, the difference between the two is 5 times. In step S26, when there is a set of words whose difference in appearance frequency is equal to or less than the second threshold, the process proceeds to step S27. In step S26, when there is no set of words whose difference in appearance frequency is equal to or less than the second threshold, the processor 201 terminates the processing of FIG. 8. In this case, it is determined that there are no points of contention at this time. Here, the determination in step S26 may be made based on whether or not the ratio of appearance frequencies is close to 1, instead of whether or not the difference in appearance frequencies is equal to or less than the second threshold.
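The pairing in step S26 can be sketched as below for two users. The function name `candidate_pairs` and the dictionary inputs (word to appearance frequency, already filtered by the first threshold) are illustrative assumptions.

```python
from itertools import product


def candidate_pairs(freq_a, freq_b, second_threshold=5):
    """Return (word_a, word_b) pairs, one word from each user, whose
    appearance frequencies differ by at most `second_threshold`."""
    return [
        (word_a, word_b)
        for (word_a, n_a), (word_b, n_b) in product(freq_a.items(), freq_b.items())
        if abs(n_a - n_b) <= second_threshold
    ]


pairs = candidate_pairs({"forwarding": 10}, {"UPnP": 5})
print(pairs)
```

With the frequencies from the example above (10 and 5), the difference is exactly 5, so the pair is retained. The ratio-based variant mentioned in the text would replace the `abs(n_a - n_b)` test with a check that `n_a / n_b` is close to 1.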

In step S27, the processor 201 extracts the sets of words whose difference in appearance frequency is equal to or less than the second threshold. Note that if there are multiple sets of words that satisfy the condition, the processor 201 may extract all of those sets. The processor 201 then extracts from the co-occurrence matrix data 2043 the co-occurrence matrix for each word in the extracted sets of words. The processor 201 then determines whether there is a set of words with similar co-occurrence matrices. Whether the co-occurrence matrices are similar can be determined from the cosine similarity of the co-occurrence matrices. For example, let a be the vector representing the co-occurrence matrix of the first word in an extracted set of words, b be the vector representing the co-occurrence matrix of the second word, and θ be the angle between vector a and vector b. Then the cosine similarity can be calculated by the following formula.

$$\cos\theta = \frac{a \cdot b}{\|a\| \, \|b\|}$$

In step S27, vector a and vector b are determined to be similar, that is, the co-occurrence matrices are determined to be similar, when θ is equal to or less than a third threshold, e.g., 30 degrees. In step S27, when there is a set of words with similar co-occurrence matrices, the process proceeds to step S28. In step S27, when there is no set of words with similar co-occurrence matrices, the processor 201 terminates the processing of FIG. 8. In this case, it is determined that there are no points of contention at this time.
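The similarity test of step S27 can be sketched as follows, using the standard definition of cosine similarity (the dot product of two vectors divided by the product of their norms) and the 30-degree example threshold. The function names are hypothetical, and treating a zero vector as dissimilar is an assumption not stated in the source.

```python
import math


def angle_degrees(a, b):
    """Angle between vectors a and b, in degrees, via cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 90.0  # assumption: a zero vector is treated as dissimilar
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


def is_similar(a, b, third_threshold_deg=30.0):
    """True when the angle between a and b is at most the third threshold."""
    return angle_degrees(a, b) <= third_threshold_deg


print(is_similar([1.0, 0.5], [1.0, 0.4]))
print(is_similar([1.0, 0.0], [0.0, 1.0]))
```

Here each vector would be the row of the co-occurrence matrix data for one word, with elements taken in the same word order for both vectors.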

In step S28, the processor 201 extracts the word pairs with similar co-occurrence matrices. For example, for the first utterance "The router forwards the packet to the network." and the second utterance "The router forwards the packet to the network by UPnP.", the word "forwarding" and the word "UPnP" are used in a common way, and the co-occurrence matrix for the word "forwarding" and the co-occurrence matrix for the word "UPnP" are similar. Words with similar co-occurrence matrices often have some sort of relationship, such as a synonym relationship or an antonym relationship. For example, the word "forwarding" in the field of communications means forwarding received data as is to a designated device. On the other hand, the word "UPnP (Universal Plug and Play)" refers to one of the methods for mutual automatic recognition between network devices. That is, the word "UPnP" and the word "forwarding" are related in that "UPnP" is one of the methods of "forwarding". If such mutually related words appear frequently in the utterances of multiple users, it can be inferred that those words are the current topic of discussion in the online conference. Therefore, in step S28, the processor 201 synthesizes the plurality of words presumed to be the points of contention into the screen for the online conference. The processor 201 then transmits to each terminal 10 the data of the online conference screen into which the plurality of words presumed to be the points of contention have been synthesized. In response to this, each terminal 10 updates the display on the display 107, for example, a web browser, with the received screen data. After that, the processor 201 terminates the processing of FIG. 8.

FIG. 9 is a diagram showing a display example of an online conference screen into which points of discussion have been synthesized. For example, on the screen of the online conference, images of users captured by the camera 108 of each terminal 10 are displayed. FIG. 9 shows an example in which images of three users, namely user 1, user 2, and user 3, are displayed. The image of user 1 is displayed, for example, in the upper left display area 107a within the screen of the online conference. Similarly, the image of user 2 is displayed, for example, in the upper right display area 107b within the screen of the online conference. Similarly, the image of user 3 is displayed, for example, in the lower left display area 107c within the screen of the online conference. Meanwhile, the current issue may be displayed in an empty area within the screen of the online conference. In the example of FIG. 9, the current issue is displayed, for example, in the lower right display area 107d within the screen of the online conference. The current issue display 107e includes, for example, a title and the words presumed to be the current issue. The title is, for example, a title such as "current issue". The title may be determined as appropriate. For example, the title may be a word or the like that indicates the contents of the corpus 2042 that includes the words presumed to be the current issue, such as "communication". Also, the words presumed to be the current issue are "forwarding" and "UPnP" in the example of FIG. 9.

Here, although only the extracted words are listed in FIG. 9, "VS" may also be displayed together with them when, for example, the extracted words are opposing concepts. There is no particular limitation on how the issue is displayed.
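
One possible way to assemble the text of the current issue display 107e is sketched below in Python. The function name format_issue_display, the opposing flag, and the word pair "wired"/"wireless" are hypothetical; how opposing concepts are detected is not specified here, so the sketch simply assumes the caller supplies that information.

```python
def format_issue_display(words, title="current issue", opposing=False):
    """Return the text lines of the issue display: the title, then the words.

    `opposing` is assumed to be decided elsewhere; when it is True the
    words are joined with "VS" instead of being listed.
    """
    separator = " VS " if opposing else ", "
    return [title, separator.join(words)]


# Plain listing, as in the example of the words "forwarding" and "UPnP".
listed = format_issue_display(["forwarding", "UPnP"])

# Hypothetical opposing pair, with a title taken from the corpus contents.
versus = format_issue_display(["wired", "wireless"], title="communication", opposing=True)
```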

Also, the display position of the display 107e of the current issue is not particularly limited. For example, the display position of the display 107e of the current issue may be a fixed position such as the display area in the upper left corner within the screen of the online conference.

Also, in the process of FIG. 8, the current issue can be updated sequentially during the online conference. Accordingly, the display in FIG. 9 can also be updated sequentially. In this case, past issues may remain displayed in addition to the current issue, and it is desirable to display each past issue together with the time at which it was displayed.
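
The sequential update with time-stamped past issues could be kept in a small data structure such as the following Python sketch. The class IssueHistory, its fields, and the second word pair are hypothetical; the sketch assumes that an issue is moved to the past list only when the extracted words actually change.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class IssueHistory:
    """Holds the current issue and the time-stamped past issues."""
    current: tuple = ()
    current_since: Optional[datetime] = None
    past: list = field(default_factory=list)

    def update(self, words, now):
        words = tuple(words)
        if words == self.current:
            return  # issue unchanged; nothing to record
        if self.current:
            # Keep the previous issue with the time it was first displayed.
            self.past.append((self.current_since, self.current))
        self.current, self.current_since = words, now


history = IssueHistory()
history.update(["forwarding", "UPnP"], datetime(2021, 6, 11, 10, 0))
history.update(["forwarding", "UPnP"], datetime(2021, 6, 11, 10, 5))    # no change
history.update(["latency", "bandwidth"], datetime(2021, 6, 11, 10, 10))  # hypothetical words
```

After these calls, history.current holds the newest issue and history.past holds the earlier issue paired with its display time, which is the information needed to render past issues alongside the current one.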

As described above, according to the embodiment, speech sequentially collected from the terminals during an online conference is analyzed, and a set of words having similar co-occurrence matrices in the analyzed speech is extracted as the current issue. A display representing the current issue is then synthesized into the online conference screen displayed on each user's terminal. Because the words at issue in the conference are thus sequentially displayed on the users' terminals, the discussion can be kept from drifting away from the agenda, and the conference can be held effectively. In addition, because the words at issue are updated on an ongoing basis, each user can follow the flow of the conference.

Here, in the embodiment, the process of extracting the issue is performed when a plurality of users make statements within a certain period of time. However, the process of extracting the issue may also be performed when the same user makes multiple statements within a certain period of time.
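
The two triggering variants described above can be expressed as a single check, as in the following Python sketch. The function should_extract, the 60-second default window, and the representation of statements as (timestamp, user) pairs are assumptions made for illustration only.

```python
def should_extract(utterances, now, window_seconds=60.0, require_distinct_users=True):
    """Decide whether issue extraction should run.

    `utterances` is a list of (timestamp_seconds, user_id) pairs.
    With require_distinct_users=True, at least two different users must
    have spoken within the window; with False, two statements by the
    same user are sufficient.
    """
    recent = [(t, u) for t, u in utterances if 0 <= now - t <= window_seconds]
    if len(recent) < 2:
        return False
    if require_distinct_users:
        return len({u for _, u in recent}) >= 2
    return True


same_user = [(0.0, "user1"), (30.0, "user1")]
two_users = [(0.0, "user1"), (30.0, "user2")]

multi_user_only = should_extract(same_user, now=40.0)
single_user_ok = should_extract(same_user, now=40.0, require_distinct_users=False)
both_spoke = should_extract(two_users, now=40.0)
too_late = should_extract(two_users, now=100.0)
```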

In addition, in the embodiment, a set of words that appear frequently in each user's utterances, differ little from each other in appearance frequency, and have similar co-occurrence matrices is extracted as an issue. Not all of these conditions necessarily need to be met. For example, the condition on the difference in appearance frequency between words may be omitted.
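
The combination of conditions, including the option of omitting the frequency-difference condition, might be sketched in Python as follows. The function extract_issue_pairs, its parameter names, the default similarity threshold of 0.8, and all counts and scores in the usage lines are hypothetical; the similarity function is passed in, so any measure over co-occurrence matrices could be used.

```python
from itertools import product


def extract_issue_pairs(freq_first, freq_second, similarity,
                        min_freq, max_freq_diff=None, min_similarity=0.8):
    """Return (first_word, second_word) pairs meeting the enabled conditions.

    freq_first / freq_second: word -> appearance count in each user's text.
    similarity: callable (word_a, word_b) -> float.
    max_freq_diff=None disables the frequency-difference condition.
    """
    frequent_first = [w for w, n in freq_first.items() if n >= min_freq]
    frequent_second = [w for w, n in freq_second.items() if n >= min_freq]
    pairs = []
    for a, b in product(frequent_first, frequent_second):
        if max_freq_diff is not None and abs(freq_first[a] - freq_second[b]) > max_freq_diff:
            continue  # appearance frequencies differ too much
        if similarity(a, b) >= min_similarity:
            pairs.append((a, b))
    return pairs


# Hypothetical counts and similarity scores, for illustration only.
freq_user1 = {"forwarding": 5, "router": 1}
freq_user2 = {"UPnP": 4, "cable": 9}
scores = {("forwarding", "UPnP"): 0.9, ("forwarding", "cable"): 0.85}


def lookup_similarity(a, b):
    return scores.get((a, b), 0.0)


strict = extract_issue_pairs(freq_user1, freq_user2, lookup_similarity,
                             min_freq=3, max_freq_diff=2)
relaxed = extract_issue_pairs(freq_user1, freq_user2, lookup_similarity, min_freq=3)
```

With all three conditions enabled, only the pair with close appearance frequencies remains; dropping the frequency-difference condition also admits the second pair.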

Also, in the embodiment, the conference server 20 analyzes the corpus 2042 and creates the co-occurrence matrix data 2043. On the other hand, the analysis of the corpus 2042 and the creation of the co-occurrence matrix data 2043 may be performed by a server or the like other than the conference server 20.

It should be noted that the present invention is not limited to the above-described embodiments, and can be variously modified in the implementation stage without departing from the gist of the present invention. Further, each embodiment may be implemented in combination as appropriate, in which case the combined effect can be obtained. Furthermore, various inventions are included in the above embodiments, and various inventions can be extracted by combinations selected from a plurality of disclosed constituent elements. For example, even if some constituent elements are deleted from all the constituent elements shown in the embodiments, if the problem can be solved and effects can be obtained, the configuration with the constituent elements deleted can be extracted as an invention.

DESCRIPTION OF SYMBOLS
1... Online conference system
10, 10-1, 10-2, ..., 10-l... Terminal
20... Conference server
101... Processor
102... ROM
103... RAM
104... Storage
105... Input device
106... Communication module
107... Display
108... Camera
109... Microphone
110... Speaker
201... Processor
202... ROM
203... RAM
204... Storage
205... Input device
206... Communication module
2011... Corpus analysis unit
2012... Co-occurrence matrix creation unit
2013... Speech recognition unit
2014... Speech analysis unit
2015... Issue extraction unit
2016... Display control unit
2041... Program
2042... Corpus
2043... Co-occurrence matrix data
NW... Network

Claims

1. A device for an online conference with a plurality of terminals connected online, the device comprising:
a speech recognition unit that converts a first speech collected from the terminal in the online conference into a first text and converts a second speech into a second text;
a speech analysis unit that decomposes each of the first text and the second text into word units;
an issue extraction unit that, when a first co-occurrence matrix for a first word extracted from the first text and a second co-occurrence matrix for a second word extracted from the second text are similar, extracts the first word and the second word as a current issue in the online conference; and
a display control unit that synthesizes the extracted first word and second word into a screen for the online conference and transmits, to the terminal, the screen for the online conference into which the first word and the second word have been synthesized.

2. The device for an online conference according to claim 1, wherein
the issue extraction unit extracts, as the first word, a word whose appearance frequency is equal to or higher than a first threshold among the words decomposed from the first text, and extracts, as the second word, a word whose appearance frequency is equal to or higher than the first threshold among the words decomposed from the second text.

3. The device for an online conference according to claim 2, wherein
the issue extraction unit extracts, as the first word and the second word, a pair of words whose difference in appearance frequency is equal to or less than a second threshold, from among pairs each consisting of a word whose appearance frequency is equal to or higher than the first threshold among the words decomposed from the first text and a word whose appearance frequency is equal to or higher than the first threshold among the words decomposed from the second text.

4. The device for an online conference according to any one of claims 1 to 3, further comprising:
a corpus analysis unit that decomposes a corpus into word units; and
a co-occurrence matrix creation unit that creates a co-occurrence matrix for each word based on the words decomposed from the corpus.

5. The device for an online conference according to any one of claims 1 to 4, wherein
the first speech is speech uttered by a first user, and the second speech is speech uttered by a second user different from the first user.

6. A method for an online conference performed by a device for an online conference with a plurality of terminals connected online, the method comprising:
converting, by the device, a first speech collected from the terminal in the online conference into a first text and a second speech into a second text;
decomposing, by the device, each of the first text and the second text into word units;
extracting, by the device, when a first co-occurrence matrix for a first word extracted from the first text and a second co-occurrence matrix for a second word extracted from the second text are similar, the first word and the second word as a current issue in the online conference; and
synthesizing, by the device, the extracted first word and second word into a screen for the online conference, and transmitting, to the terminal, the screen for the online conference into which the first word and the second word have been synthesized.

7. A program for an online conference for causing a processor to function as the speech recognition unit, the speech analysis unit, the issue extraction unit, and the display control unit of the device for an online conference according to any one of claims 1 to 5.