WO2018069580A1 - Interactive collaboration tool - Google Patents


Info

Publication number: WO2018069580A1
Authority: WO (WIPO PCT)
Prior art keywords: visualisation, attendees, meeting, key term, key
Application number: PCT/FI2017/050719
Other languages: French (fr)
Inventors: Niina HALONEN, Kirsti LONKA, Olli SARVI
Original Assignee: University of Helsinki
Priority date: 2016-10-13
Filing date: 2017-10-13
Publication date: 2018-04-19
Application filed by University of Helsinki

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management
            • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
            • G06Q10/10 Office automation; Time management
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/08 Speech classification or search
              • G10L2015/088 Word spotting
            • G10L15/26 Speech to text systems
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L12/00 Data switching networks
            • H04L12/02 Details
              • H04L12/16 Arrangements for providing special services to substations
                • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
                  • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast, for computer conferences, e.g. chat rooms
                    • H04L12/1831 Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status
          • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
            • H04L51/06 Message adaptation to terminal or network requirements
              • H04L51/066 Format adaptation, e.g. format conversion or compression


Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and computer program for receiving (210) recorded speech of plural attendees of a meeting; converting (220) recorded speech to text during the meeting; enabling editing (230) of the text by the attendees during the meeting; identifying (240) key terms from the edited text; forming (250) a dynamic key term visualisation from the identified key terms; and enabling modifying (260) of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation.

Description

INTERACTIVE COLLABORATION TOOL
TECHNICAL FIELD
The present invention generally relates to an interactive collaboration tool.
BACKGROUND ART
This section illustrates useful background information without admission that any technique described herein is representative of the state of the art. Collaboration of people typically involves bidirectional exchange of information in which thoughts and ideas are jointly developed by participants to facilitate assessment of the objectives and/or the solutions or approach to reach the objectives. This is often implemented by holding a meeting in which attendants discuss items based on an agenda of the meeting, and someone takes and afterwards circulates the minutes of the meeting. Action points may be written in the minutes to designate who should take care of the various work items that were decided to be done.
Tools exist to facilitate the forming of the minutes, and various techniques further exist to facilitate quick access to information in the minutes. For example, US 9,035,996 B1 discloses that recordings made with participants' computing devices can be processed into a transcript in which each participant is identified. The processed text from the transcript can be displayed in a word cloud using a variety of formats and techniques, including alphabetization, font size differences for emphasis and color differences for emphasis, or other formats to provide distinction to the text within the word cloud. The word cloud can be displayed as a summary of the transcript and used to pique the interest of a user seeking to join a multi-device video communication session, and a user can search recorded content by selecting a text element within the word cloud. Existing tools such as that of US 9,035,996 B1 may facilitate visualization of a meeting and particularly access to the content of its recording, but do not particularly help the collaboration while a meeting is in progress or after the meeting is over. Typically, meetings involve the presence of plural people and initiate thought processes that do not stop when the meeting ends. Sometimes, some of the participants may informally discuss further after the meeting on other occasions when in touch, but such discussions would not be conveyed to the other attendants. Moreover, the automatic formation of the word cloud can only attempt to summarize the meeting from the minutes using computerized techniques such as frequency analysis. Such summaries may better identify new business jargon terms than point to actually important topics, because people tend to paraphrase others rather than echo the same words, so as not to waste time and to indicate having understood previous speakers.
Meetings as such may be useful for dissemination of information, but often some further action should be taken and some co-ordination of work is needed. Traditionally, the agenda of a next meeting includes verification of the progress of the action points recorded in the minutes of a previous meeting, and the attendants may also use the minutes as a reminder of any tasks assigned to them. However, the tools that may help to visualize and summarize the discussions during the meetings do not offer particular support for post-meeting collaboration.
It is an object of the present invention to avoid or mitigate the aforementioned problems or at least to provide new technical alternatives to the state of the art.
SUMMARY
According to a first aspect of the invention there is provided a method comprising: receiving recorded speech of plural attendees of a meeting;
converting recorded speech to text during the meeting;
enabling editing of the text by the attendees during the meeting;
identifying key terms from the edited text;
forming a dynamic key term visualisation from the identified key terms;
enabling modifying of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation. The key term visualization may comprise or be any of: a graphic significance presentation of a sub-set of the key terms; a numeric chart of a sub-set of the key terms; and a word cloud. The sub-set of key terms may comprise key terms ranked by descending score. The word cloud may be formed with words and/or phrases.
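By way of illustration only, the ranking of a scored sub-set for such a visualization could be sketched as follows; the application gives no code, so the KeyTerm shape and the score-to-font-size mapping are assumptions.

```typescript
// Hypothetical shape of a scored key term; not defined in the application.
interface KeyTerm {
  text: string;   // keyword or key phrase
  score: number;  // estimated relevance to the topic
}

// Select the top-N key terms ranked by descending score and map each score
// linearly to a font size, as a word cloud renderer might require.
function rankForWordCloud(terms: KeyTerm[], n: number): Array<KeyTerm & { fontSize: number }> {
  const top = [...terms].sort((a, b) => b.score - a.score).slice(0, n);
  const max = top[0]?.score ?? 1;
  const min = top[top.length - 1]?.score ?? 0;
  const span = max - min || 1; // avoid division by zero when all scores are equal
  return top.map(t => ({ ...t, fontSize: 12 + 28 * ((t.score - min) / span) }));
}
```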
The forming of the dynamic key term visualisation may be performed repeatedly during the meeting. The forming of the dynamic key term visualisation may be performed repeatedly during pauses in speech of the participants. The attendees may be enabled to modify the key terms in real-time during the meeting. The meeting may be an on-line meeting in which at least one attendee participates over a data connection. Alternatively, the meeting may be a face-to-face meeting in which all participants physically meet each other. The key terms may comprise keywords. The key terms may comprise key phrases.
By forming a dynamic key term visualisation based on the recorded speech and enabling modifying of the dynamic key term visualisation by the attendees, the attendees may be provided with a visualization that illustrates main items discussed in the meeting. The illustration of the main items may facilitate understanding and developing the topic of the meeting. Moreover, the dynamic development of the key term visualisation may facilitate collaboration by the attendees by enabling on-line and/or offline development of the illustration. The method may comprise enabling a user to combine key terms to a combined expression. The combined expression may be usable for defining action points and/or summarizing the meeting.
The method may comprise associating the combined expression with a given person or group of persons. The associating may comprise forming a calendar entry using the combined expression for the given person or group of persons. The method may comprise comparing the dynamic key term visualisation with earlier formed key term visualisations. The comparison may be used to identify earlier work relating to the topic of the meeting. The comparison may be used to detect developments made since earlier work relating to the topic. The comparison may be used to identify the contribution of the attendants to the propagation of the meeting and / or to the development since the earlier work relating to the topic.
The modifying of the key terms may comprise adding or removing key terms. Alternatively or additionally, the modifying of the key terms may comprise editing the key terms.
The method may comprise detecting the attendee whose speech is being received. The method may comprise presenting the identified key terms together with an indication of the related attendee or attendees. The method may comprise detecting from received speech such attendees whose speech has been received for less than a set minimum proportion, and prompting comments from such attendees.
The detecting of the attendee whose speech is being received may be performed by identifying an individual channel used by the attendee or by use of voice recognition.
The method may comprise receiving a topic of the meeting and maintaining the text associated with the topic.
The method may further comprise storing the dynamic key term visualisation in a repository accessible by the attendees. The method may comprise reporting the key term visualisation to one or more persons associated with the attendees.
The speech may be converted to text using a browser operated web service. The service may be implemented using mobile devices and dedicated application(s).
The method may be performed in a network based service. The network based service may be implemented using a NodeJS backend. The network based service may be run by a dedicated server and/or a cloud computing system.
The method may comprise providing a browser based user interface for the attendees. The browser based user interface may be implemented using WebRTC.
According to a second aspect of the invention there is provided a computer program comprising computer executable program code which when executed by at least one processor causes an apparatus to perform the method of the first aspect.
According to a third aspect of the invention there is provided an apparatus comprising a memory configured to store the computer program of the second aspect and a processor configured to control operation of the apparatus according to the computer program.
According to a fourth aspect of the invention there is provided a computer program product comprising a non-transitory computer readable medium having the computer program of the second aspect stored thereon. Any foregoing memory medium may comprise a digital data storage such as a data disc or diskette, optical storage, magnetic storage, or opto-magnetic storage. The memory medium may be formed into a device without other substantial functions than storing memory, or it may be formed as part of a device with other functions, including but not limited to a memory of a computer, a chip set, and a sub-assembly of an electronic device.
Different non-binding aspects and embodiments of the present invention have been illustrated in the foregoing. The embodiments in the foregoing are used merely to explain selected aspects or steps that may be utilized in implementations of the present invention. Some embodiments may be presented only with reference to certain aspects of the invention. It should be appreciated that corresponding embodiments may apply to other aspects as well.
BRIEF DESCRIPTION OF DRAWINGS
Some embodiments of the invention will be described with reference to the accompanying drawings, in which:
Fig. 1 shows a schematic picture of a system according to an embodiment of the invention;
Fig. 2 shows a flow chart illustrating a method of an embodiment of the invention;
Fig. 3 shows a block diagram of an apparatus suited for implementing an embodiment of the invention;
Fig. 4 illustrates a dashboard view of an embodiment; and
Fig. 5 shows a modification view allowing participants to edit and add text during a session.
DETAILED DESCRIPTION
In the following description, like reference signs denote like elements or steps.
Fig. 1 shows a schematic picture of a system 100 according to an embodiment of the invention. The system has plural users 110, a web browser user interface 120 and an application interface 130 for use with a browser or dedicated applications or apps, such as tablet computer or mobile phone apps. The web browser user interface 120 and the application interface 130 are connected to a speech recognition web service 150 (using WebRTC, for example) and to a NodeJS Backend 140, which are connected to an information extraction and scoring engine 160 that extracts and scores key terms (e.g. keywords and/or key phrases) based on natural language processing with scalable information matching.
The architecture of Fig. 1 can be implemented using discrete elements such as separate servers or cloud services that provide the different functionalities. On the other hand, some or all of the services used by the users 110 can be provided in common by one or more entities.
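As a minimal sketch of how such a backend could receive transcript fragments, assuming an Express-style HTTP API; the route name, payload shape and stub engine are illustrative assumptions, not taken from the application.

```typescript
// Sketch of a NodeJS backend in the spirit of element 140, assuming Express.
import express from "express";

const app = express();
app.use(express.json());

// The speech recognition web service 150 (or the browser UI 120) could post
// transcript fragments here; the backend forwards them to a stub standing in
// for the information extraction and scoring engine 160.
app.post("/sessions/:id/transcript", (req, res) => {
  const { text, attendee } = req.body as { text: string; attendee?: string };
  extractAndScore(req.params.id, text, attendee);
  res.sendStatus(202); // accepted for asynchronous processing
});

function extractAndScore(sessionId: string, text: string, attendee?: string): void {
  // Placeholder: the described system runs NLP-based key term extraction and
  // scoring with scalable information matching at this point.
  console.log(`session ${sessionId}: ${attendee ?? "unknown"} said "${text}"`);
}

app.listen(3000);
```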
Fig. 2 shows a flow chart illustrating a method of an embodiment of the invention. The method comprises:
210. Receiving recorded speech of plural attendees of a meeting;
220. Converting recorded speech to text during the meeting;
230. Enabling editing of the text by the attendees during the meeting;
240. Identifying key terms from the edited text;
250. Forming a dynamic key term visualisation from the identified key terms; and
260. Enabling modifying of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation.
The forming of the dynamic key term visualisation is preferably performed repeatedly during the meeting, for example during pauses in speech of the participants, at given intervals, or when a given amount of speech has been converted to text.
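A minimal sketch of such update triggering follows; the pause length, update interval and text-amount thresholds are illustrative assumptions, since none are specified in the application.

```typescript
// Re-forms the visualisation (steps 240-260) on a pause in speech, at a fixed
// interval, or after a given amount of new converted text.
class VisualisationUpdater {
  private pendingChars = 0;
  private lastUpdate = Date.now();
  private pauseTimer?: ReturnType<typeof setTimeout>;

  constructor(
    private update: () => void,   // re-identify key terms and redraw
    private pauseMs = 2000,       // silence treated as a pause (assumed)
    private intervalMs = 30000,   // maximum time between updates (assumed)
    private minChars = 400        // minimum new text between updates (assumed)
  ) {}

  onText(chunk: string): void {
    this.pendingChars += chunk.length;
    if (this.pauseTimer) clearTimeout(this.pauseTimer);
    // Update during a natural pause in speech...
    this.pauseTimer = setTimeout(() => this.maybeUpdate(true), this.pauseMs);
    // ...or when the interval has elapsed or enough new text has accumulated.
    this.maybeUpdate(false);
  }

  private maybeUpdate(pause: boolean): void {
    const intervalDue = Date.now() - this.lastUpdate >= this.intervalMs;
    if (pause || intervalDue || this.pendingChars >= this.minChars) {
      this.update();
      this.pendingChars = 0;
      this.lastUpdate = Date.now();
    }
  }
}
```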
By forming a dynamic key term visualisation based on the recorded speech and enabling modifying of the dynamic key term visualisation by the attendees, the attendees can be provided with a visualization that illustrates main items discussed in the meeting. Such an illustration of the main items facilitates understanding and developing the topic of the meeting. Moreover, the dynamic development of the key term visualisation facilitates collaboration by the attendees by enabling on-line and/or offline development of the illustration.
In an embodiment, a user is enabled to combine key terms to a combined expression. The combined expression can be usable, for example, for defining action points and/or summarizing the meeting.
The method comprises in an embodiment associating the combined expression with a given person or group of persons. The associating comprises, for example, forming a calendar entry using the combined expression for the given person or group of persons. The method comprises in an embodiment comparing the dynamic key term visualisation with earlier formed key term visualisations for the comparison to be used, for example, to any of: identifying earlier work relating to the topic of the meeting; detecting developments made since earlier work relating to the topic; identifying the contribution of the attendants to the propagation of the meeting and / or development since the earlier work relating to the topic.
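For instance, forming a calendar entry from a combined expression could be sketched as below; the iCalendar fields used are standard, but the function and its parameters are assumptions for illustration only.

```typescript
// Builds a minimal iCalendar (RFC 5545 style) event from a combined
// expression, assigned to a given person or group of persons.
function toCalendarEntry(expression: string, attendees: string[], start: Date, minutes = 30): string {
  // Format dates as iCalendar UTC timestamps, e.g. 20171013T100000Z.
  const fmt = (d: Date) => d.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, "");
  const end = new Date(start.getTime() + minutes * 60000);
  return [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "BEGIN:VEVENT",
    `SUMMARY:${expression}`,                      // the combined expression
    ...attendees.map(a => `ATTENDEE:mailto:${a}`), // the assigned person(s)
    `DTSTART:${fmt(start)}`,
    `DTEND:${fmt(end)}`,
    "END:VEVENT",
    "END:VCALENDAR",
  ].join("\r\n");
}

// Example: toCalendarEntry("review key term visualisation", ["alice@example.org"], new Date())
```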
The modifying of the key terms comprises, for example, adding or removing key terms. Alternatively or additionally, the modifying of the key terms may comprise editing the key terms.
In an embodiment, the method comprises detecting the attendee whose speech is being received, and/or presenting the identified key terms together with an indication of the related attendee or attendees. The method comprises in an embodiment detecting from received speech such attendees whose speech has been received for less than a set minimum proportion of time, and prompting comments from such attendees.
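Detecting such attendees could be sketched as follows, assuming per-attendee speech durations are tracked; the 10% minimum share is an illustrative assumption.

```typescript
// Returns the attendees whose share of the total received speech falls below
// a set minimum proportion; these attendees are then prompted for comments.
function quietAttendees(speechMs: Map<string, number>, minShare = 0.1): string[] {
  const total = [...speechMs.values()].reduce((a, b) => a + b, 0);
  if (total === 0) return [...speechMs.keys()]; // nobody has spoken yet
  return [...speechMs.entries()]
    .filter(([, ms]) => ms / total < minShare)
    .map(([attendee]) => attendee);
}
```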
The detecting of the attendee whose speech is being received can be performed, for example, by identifying an individual channel used by the attendee or by use of voice recognition.
The method comprises in an embodiment receiving a topic of the meeting and maintaining the text associated with the topic.
The method preferably comprises storing the dynamic key term visualisation in a repository accessible by the attendees. The method may comprise reporting the key term visualisation to one or more persons associated with the attendees.
In an embodiment, the speech is converted to text using a browser operated web service. The method is implemented in an embodiment with support for mobile devices and applications configured to interface with the user 110.
An example use case is next described. First, the user 110 logs into a service of an embodiment and creates a title for a topic. If the topic is new, the user is prompted to create a new session. The user may be prompted, for example, to give a title name and a date and time for the session, and to add desired participants for this specific session. The user is shown a dashboard view of all the sessions the user has created or participated in, wherein one topic may comprise plural sessions. Fig. 4 illustrates a dashboard view of an embodiment.
During a session, all the spoken audio of the participants is recorded. The participants are preferably able to edit and add text while the recording is paused, see Fig. 5. After the session ends, the results of the session are processed and a canvas is shown (e.g. to the participants and optionally other authorized people) to enable further editing of the session outcomes. This editing can be performed by allowing the users to position key terms on the canvas as they like. Content may also be added to or removed from the canvas, e.g. by typing new content into a text box after clicking on a desired position on the canvas, or by dragging out or clicking an existing entry. Alternatively or additionally to canvas based modification, a view and further processing of the session results can be arranged by enabling users to work with a word list shown with associated scores, in which key terms can be selected for editing their text and/or score (e.g. estimated relevance to the topic of the session).
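A sketch of the word-list editing described above; the KeyTerm shape is the same assumed structure as in the earlier sketch.

```typescript
// Hypothetical shape of a scored key term, as assumed earlier.
interface KeyTerm {
  text: string;
  score: number; // estimated relevance to the topic of the session
}

// Returns a new list with the selected term's text and/or score replaced,
// leaving all other entries untouched.
function editKeyTerm(list: KeyTerm[], index: number, changes: Partial<KeyTerm>): KeyTerm[] {
  return list.map((t, i) => (i === index ? { ...t, ...changes } : t));
}

// Example: correct a mis-recognized term and raise its relevance score.
// const updated = editKeyTerm(terms, 3, { text: "key term", score: 0.9 });
```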
The text converted from speech can be split into key terms based on various probabilistic models. For example, the frequency of various words or phrases can be compared to a reference corpus to determine how much their use frequency differs, being either greater or smaller than in the reference corpus. The reference corpus can be selected from the same or an associated topic to reduce the significance of likely trivial items. In an embodiment, a further key term visualisation is presented based on search results from outside the present session. For example, a user may search in other recorded sessions regarding the same or another topic, or in another source such as the Internet, and the search results can be presented as another key term visualisation for comparison.
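A sketch of such a frequency comparison follows, using add-one smoothing and a log-ratio as the score; the application names no particular measure, so these choices are assumptions.

```typescript
// Scores candidate terms by how strongly their frequency in the session text
// differs from their frequency in a reference corpus.
function scoreTerms(
  sessionCounts: Map<string, number>, // term counts in the session text
  corpusCounts: Map<string, number>,  // term counts in the reference corpus
  sessionTotal: number,               // total term count in the session
  corpusTotal: number                 // total term count in the corpus
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const [term, count] of sessionCounts) {
    const pSession = count / sessionTotal;
    // Add-one smoothing so terms absent from the corpus still get a score.
    const pCorpus = ((corpusCounts.get(term) ?? 0) + 1) / (corpusTotal + 1);
    // A log-ratio far from zero marks a term whose use frequency differs
    // strongly (greater or smaller) from the reference corpus.
    scores.set(term, Math.log(pSession / pCorpus));
  }
  return scores;
}
```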
The method is preferably performed automatically so that the key term visualisation of a session is built and updated as an ongoing process while the session progresses, so that the participants can obtain an instantaneous and dynamically developing graphical presentation of the progress of the content of the session. The method may help to remove the need for manual summarizing of topics and recording of future action points, and reduce the need to communicate between the attendees or other persons who should be aware of the progress or results of the session. The method may further enable combining massive data sets interactively with the spoken contribution of the attendees, thus providing an entirely new search and interaction tool. For example, during the session, thousands of comparisons and decisions may be performed per second while the key term visualisation is updated. Such fast computation may be particularly useful to update the visualization of the participants during natural pauses in speech and thus avoid forcing people to interrupt their normal interaction.
Fig. 3 shows a block diagram of an apparatus 300 suited for implementing an embodiment of the invention. The apparatus 300 can be used, depending on implementation, as a user terminal and/or computer server for implementing at least some parts of the method of Fig. 2. Notice that it is not necessary to run any part of the method of Fig. 2 as a network based service but instead, in some embodiments the functionalities are implemented locally.
The apparatus 300 comprises a communication interface or input/output 310 for communicating with other entities using, for example, a local area network (LAN) port or mobile communication networks (e.g. UMTS, CDMA-2000, GSM), a processor 320, a user interface 330 and a memory 340. The memory 340 comprises a work memory 342 and non-volatile memory 344 comprising computer program code 346 to be executed by the processor 320 in place and/or within the work memory 342. The non-volatile memory 344 can additionally be used for storing other long-lasting data such as user settings and database data, for example key term visualisation data.
The processor 320 is, for example, formed of one or more of: a master control unit (MCU); a microprocessor; a digital signal processor (DSP); an application specific integrated circuit (ASIC); a field programmable gate array; a microcontroller. The processor 320 is capable of, for example, controlling the operation of the apparatus 300 using the computer program code 346.
Various embodiments have been presented. It should be appreciated that in this document, words comprise, include and contain are each used as open-ended expressions with no intended exclusivity.
For example, in an embodiment video image is received in addition to the recorded speech, so that video images or still images of the video can also be stored and presented when displaying any derivative information based on the recorded speech. For example, the words of the word cloud or other visualization may be associated with a respective portion of speech. On accessing a word of the visualisation, a respective portion of received speech can be replayed. Alternatively or additionally, video images or still images of the video can be presented on accessing the word of the visualization. In an embodiment, the key terms of the visualization are associated with respective portions of recorded speech or video and replayed on accessing the key terms, e.g. through the visualization.
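Associating key terms with portions of the recording for replay could be sketched as follows; the segment shape and the player callback are assumptions.

```typescript
// A time range within the recorded speech or video.
interface SpeechSegment { startMs: number; endMs: number }

// Index mapping each key term to the recorded portions in which it occurred.
const segmentsByTerm = new Map<string, SpeechSegment[]>();

function indexTerm(term: string, segment: SpeechSegment): void {
  const list = segmentsByTerm.get(term) ?? [];
  list.push(segment);
  segmentsByTerm.set(term, list);
}

// On accessing a key term through the visualisation, replay each recorded
// portion in which the term occurred via the supplied player callback.
function onTermAccessed(term: string, play: (s: SpeechSegment) => void): void {
  for (const segment of segmentsByTerm.get(term) ?? []) play(segment);
}
```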
In one example, the key terms of the visualization are used as search terms on accessing of the key terms. For example, on clicking a key term of the visualization, a supplementary information search is automatically performed from the Internet or an inter-organisation data repository. In another example, the supplementary search is performed in advance so that contemporary material is used, and the search results are then presented on accessing of the respective key term. In an example, key terms or phrases can be stored to an idea bank for subsequent use, and stored key terms or phrases can be automatically searched and retrieved from the idea bank through associated key terms. For example, on accessing a visualized key term, the user may be provided with related information. The related information may comprise any of a key term visualization, a recording of speech, a recording of video, and a written note. The automatic searching and retrieving of the stored key terms or phrases can, thanks to the speed of computers, be performed even during normal breathing pauses or a change of turn of speaker in a normal meeting, in a manner that would be impossible for people to implement manually.
In an example, the productivity of different people and groups of people is automatically measured by computing the amount or relevance of the produced key term visualisations to subsequent work within the organization. By automatically computing the subsequent use of the work of earlier people and teams, thousands of different word cloud combinations can be compared and adaptively scored, unlike with any existing manual methods.
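One way such relevance could be computed, as an assumption for illustration since the application names no measure, is the Jaccard overlap between the key term sets of an earlier visualisation and of subsequent work.

```typescript
// Scores how relevant an earlier key term visualisation is to subsequent
// work, as the Jaccard overlap of the two key term sets (0 = disjoint,
// 1 = identical). An illustrative measure, not taken from the application.
function relevance(earlierTerms: Set<string>, laterTerms: Set<string>): number {
  const intersection = [...earlierTerms].filter(t => laterTerms.has(t)).length;
  const union = new Set([...earlierTerms, ...laterTerms]).size;
  return union === 0 ? 0 : intersection / union;
}
```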
The foregoing description has provided by way of non-limiting examples of particular implementations and embodiments of the invention a full and informative description of the best mode presently contemplated by the inventors for carrying out the invention. It is however clear to a person skilled in the art that the invention is not restricted to details of the embodiments presented in the foregoing, but that it can be implemented in other embodiments using equivalent means or in different combinations of embodiments without deviating from the characteristics of the invention.
Furthermore, some of the features of the afore-disclosed embodiments of this invention may be used to advantage without the corresponding use of other features. As such, the foregoing description shall be considered as merely illustrative of the principles of the present invention, and not in limitation thereof. Hence, the scope of the invention is only restricted by the appended patent claims.

Claims

1. A method comprising:
receiving (210) recorded speech of plural attendees of a meeting;
converting (220) recorded speech to text during the meeting;
enabling editing (230) of the text by the attendees during the meeting;
identifying (240) key terms from the edited text;
forming (250) a dynamic key term visualisation to the attendees from the identified key terms;
enabling modifying (260) of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation.
2. The method of claim 1, characterized in that the key term visualization comprises a word cloud.
3. The method of claim 1 or 2, characterized in that the forming of the dynamic key term visualisation is performed repeatedly during the meeting.
4. The method of any one of preceding claims, characterized in that the forming of the dynamic key term visualisation is performed repeatedly during pauses in speech of the participants.
5. The method of any one of preceding claims, characterized in that the method further comprises enabling a user to combine key terms to a combined expression.
6. The method of claim 5, characterized in that the method further comprises associating the combined expression with a given person or group of persons.
7. The method of any one of preceding claims, characterized in that the method further comprises comparing the dynamic key term visualisation with earlier formed key term visualisations.
8. The method of any one of preceding claims, characterized in that the method further comprises storing the dynamic key term visualisation in a repository accessible by the attendees.
9. The method of any one of preceding claims, characterized in that speech is converted to text using a browser operated web service or mobile devices with dedicated application support.
10. The method of any one of preceding claims, characterized in that the method is performed in a network based service using a NodeJS backend and a browser based user interface for the attendees that is implemented using WebRTC.
11. The method of any one of preceding claims, characterized in that the key terms of the visualization are associated to respective passages of the recorded speech.
12. The method of any one of preceding claims, characterized in that the key terms of the visualization are associated to respective video images or still images.
13. The method of any one of preceding claims, characterized in that the key terms of the visualization are associated to respective data available in the Internet or an inter-organisation data repository.
14. The method of any one of preceding claims, characterized in that productivity of different people and groups of people is automatically measured by computing the amount of or relevance of the produced key term visualisations to subsequent work within the organization.
15. A computer program comprising computer executable program code which when executed by at least one processor causes an apparatus to perform the method of any one of the preceding claims.
[Fig. 1 (sheet 1/5): schematic picture of the system 100]
[Fig. 2 (sheet 2/5): flow chart of the method with steps 210 (receiving recorded speech of plural attendees of a meeting), 220 (converting recorded speech to text during the meeting), 230 (enabling editing of the text by the attendees during the meeting), 240 (identifying key terms from the edited text), 250 (forming a dynamic key term visualisation from the identified key terms) and 260 (enabling modifying of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation)]
[Figs. 4a and 4b (sheets 3/5 and 4/5): dashboard views of an embodiment listing sessions with their names, authors, dates, descriptions and participants]
[Fig. 5 (sheet 5/5): session view of an embodiment with topic name, recording timer, live transcript, participant list and a comment field; a note indicates that users should be able to edit and add text while the recording is paused]
PCT/FI2017/050719 2016-10-13 2017-10-13 Interactive collaboration tool WO2018069580A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20165781 2016-10-13

Publications (1)

Publication Number Publication Date
WO2018069580A1 true WO2018069580A1 (en) 2018-04-19

Family

ID=60201607

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2017/050719 WO2018069580A1 (en) 2016-10-13 2017-10-13 Interactive collaboration tool

Country Status (1)

Country Link
WO (1) WO2018069580A1 (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161619A1 (en) * 2008-12-18 2010-06-24 Lamere Paul B Method and Apparatus for Generating Recommendations From Descriptive Information
EP2288105A1 (en) * 2009-08-17 2011-02-23 Avaya Inc. Word cloud audio navigation
CA2692314A1 (en) * 2010-02-08 2011-08-08 Yellowpages.Com Llc Systems and methods to provide search based on social graphs and affinity groups
US20120179465A1 (en) * 2011-01-10 2012-07-12 International Business Machines Corporation Real time generation of audio content summaries
WO2012175556A2 (en) * 2011-06-20 2012-12-27 Koemei Sa Method for preparing a transcript of a conversation
US9035996B1 (en) 2012-04-16 2015-05-19 Google Inc. Multi-device video communication session
US20160171090A1 (en) * 2014-12-11 2016-06-16 University Of Connecticut Systems and Methods for Collaborative Project Analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Learning Python with Raspberry Pi", 29 January 2014, ISBN: 978-1-118-71705-9, article BRADBURY & EVERARD: "Learning Python with Raspberry Pi", pages: 180 - 180, XP055430713 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113113017A (en) * 2021-04-08 2021-07-13 百度在线网络技术(北京)有限公司 Audio processing method and device
CN113113017B (en) * 2021-04-08 2024-04-09 百度在线网络技术(北京)有限公司 Audio processing method and device
CN113129895A (en) * 2021-04-20 2021-07-16 上海仙剑文化传媒股份有限公司 Voice detection processing system

Similar Documents

Publication Publication Date Title
US10860985B2 (en) Post-meeting processing using artificial intelligence
US11307735B2 (en) Creating agendas for electronic meetings using artificial intelligence
EP3467822B1 (en) Speech-to-text conversion for interactive whiteboard appliances in multi-language electronic meetings
US20210297275A1 (en) Organizing and aggregating meetings into threaded representations
EP3467821B1 (en) Selection of transcription and translation services and generation of combined results
US11062271B2 (en) Interactive whiteboard appliances with learning capabilities
US11030585B2 (en) Person detection, person identification and meeting start for interactive whiteboard appliances
EP3309731A1 (en) Managing electronic meetings using artificial intelligence and meeting rules templates
US9712569B2 (en) Method and apparatus for timeline-synchronized note taking during a web conference
US20220294836A1 (en) Systems for information sharing and methods of use, discussion and collaboration system and methods of use
US20180101760A1 (en) Selecting Meeting Participants for Electronic Meetings Using Artificial Intelligence
US8271509B2 (en) Search and chat integration system
JP5003125B2 (en) Minutes creation device and program
US10860797B2 (en) Generating summaries and insights from meeting recordings
US20120321062A1 (en) Telephonic Conference Access System
US20140278377A1 (en) Automatic note taking within a virtual meeting
CN107636651A (en) Subject index is generated using natural language processing
US20130144603A1 (en) Enhanced voice conferencing with history
US20090006982A1 (en) Collaborative generation of meeting minutes and agenda confirmation
US20100161604A1 (en) Apparatus and method for multimedia content based manipulation
US20150066935A1 (en) Crowdsourcing and consolidating user notes taken in a virtual meeting
CN107211062A (en) Audio playback scheduling in virtual acoustic room
JP2008310618A (en) Web conference support program, recording medium with the same recorded thereon, and device and method for web conference support
US20230274730A1 (en) Systems and methods for real time suggestion bot
Chi et al. Intelligent assistance for conversational storytelling using story patterns

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (ref document number: 17792108; country of ref document: EP; kind code of ref document: A1)
NENP: Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry in European phase (ref document number: 17792108; country of ref document: EP; kind code of ref document: A1)