WO2018069580A1 - Interactive collaboration tool - Google Patents


Info

Publication number: WO2018069580A1
Authority: WO (WIPO PCT)
Prior art keywords: visualisation, attendees, meeting, key term, key
Application number: PCT/FI2017/050719
Other languages: French (fr)
Inventors: Niina HALONEN, Kirsti LONKA, Olli SARVI
Original Assignee: University of Helsinki
Priority date: 2016-10-13
Filing date: 2017-10-13
Publication date: 2018-04-19
Application filed by University of Helsinki

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management
            • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
            • G06Q10/10 Office automation; Time management
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/08 Speech classification or search
              • G10L2015/088 Word spotting
            • G10L15/26 Speech to text systems
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L12/00 Data switching networks
            • H04L12/02 Details
              • H04L12/16 Arrangements for providing special services to substations
                • H04L12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
                  • H04L12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast, for computer conferences, e.g. chat rooms
                    • H04L12/1831 Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status
          • H04L51/00 User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
            • H04L51/06 Message adaptation to terminal or network requirements
              • H04L51/066 Format adaptation, e.g. format conversion or compression


Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Multimedia (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method and computer program for receiving (210) recorded speech of plural attendees of a meeting; converting (220) recorded speech to text during the meeting; enabling editing (230) of the text by the attendees during the meeting; identifying (240) key terms from the edited text; forming (250) a dynamic key term visualisation from the identified key terms; and enabling modifying (260) of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation.

Description

INTERACTIVE COLLABORATION TOOL
TECHNICAL FIELD
The present invention generally relates to an interactive collaboration tool.
BACKGROUND ART
This section illustrates useful background information without admission that any technique described herein is representative of the state of the art. Collaboration of people typically involves bidirectional exchange of information in which thoughts and ideas are jointly developed by participants to facilitate assessment of the objectives and/or the solutions or approach to reach the objectives. This is often implemented by holding a meeting in which attendants discuss items based on an agenda of the meeting, and someone takes and afterwards circulates the minutes of the meeting. Action points may be written in the minutes to designate who should take care of the various work items that were decided to be done.
Tools exist to facilitate the forming of the minutes, and various techniques further exist to facilitate quick access to information in the minutes. For example, US 9,035,996 B1 discloses that recordings made with participants' computing devices can be processed into a transcript in which each participant is identified. The processed text from the transcript can be displayed in a word cloud using a variety of formats and techniques, including alphabetization, font size differences for emphasis and color differences for emphasis, or other formats to provide distinction to the text within the word cloud. The word cloud can be displayed as a summary of the transcript and used to pique the interest of a user seeking to join a multi-device video communication session, and a user can search recorded content by selecting a text element within the word cloud. Existing tools such as that of US 9,035,996 B1 may facilitate visualization of a meeting and particularly access to the content of its recording, but do not particularly help the collaboration while a meeting is in progress or after the meeting is over. Typically, meetings involve the presence of plural people and initiate thought processes that do not stop when the meeting ends. Sometimes, some of the participants may informally discuss further after the meeting on other occasions when in touch, but such discussions would not be conveyed to the other attendants. Moreover, the automatic formation of the word cloud can only attempt to summarize the meeting from the minutes using computerized techniques such as frequency analysis. Such summaries may better identify new business jargon terms than point to actually important topics, because people tend to paraphrase others rather than echo the same words, so as not to waste time and to indicate having understood previous speakers.
Meetings as such may be useful for dissemination of information, but often some further action should be taken and some co-ordination of work is needed. Traditionally, the agenda of a next meeting includes verification of the progress of the action points recorded in the minutes of a previous meeting, and the attendants may also use the minutes as a reminder of any tasks assigned to them. However, the tools that may help to visualize and summarize the discussions during the meetings do not offer particular support for post-meeting collaboration.
It is an object of the present invention to avoid or mitigate the aforementioned problems or at least to provide new technical alternatives to the state of the art.
SUMMARY
According to a first aspect of the invention there is provided a method comprising: receiving recorded speech of plural attendees of a meeting;
converting recorded speech to text during the meeting;
enabling editing of the text by the attendees during the meeting;
identifying key terms from the edited text;
forming a dynamic key term visualisation from the identified key terms;
enabling modifying of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation. The key term visualization may comprise or be any of: a graphic significance presentation of a sub-set of the key terms; a numeric chart of a sub-set of the key terms; and a word cloud. The sub-set of key terms may comprise key terms ranked by descending score. The word cloud may be formed with words and/or phrases.
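By way of illustration only, the ranking of a scored sub-set for such a visualization could be sketched as follows; the application gives no code, so the KeyTerm shape and the score-to-font-size mapping are assumptions.

```typescript
// Hypothetical shape of a scored key term; not defined in the application.
interface KeyTerm {
  text: string;   // keyword or key phrase
  score: number;  // estimated relevance to the topic
}

// Select the top-N key terms ranked by descending score and map each score
// linearly to a font size, as a word cloud renderer might require.
function rankForWordCloud(terms: KeyTerm[], n: number): Array<KeyTerm & { fontSize: number }> {
  const top = [...terms].sort((a, b) => b.score - a.score).slice(0, n);
  const max = top[0]?.score ?? 1;
  const min = top[top.length - 1]?.score ?? 0;
  const span = max - min || 1; // avoid division by zero when all scores are equal
  return top.map(t => ({ ...t, fontSize: 12 + 28 * ((t.score - min) / span) }));
}
```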
The forming of the dynamic key term visualisation may be performed repeatedly during the meeting. The forming of the dynamic key term visualisation may be performed repeatedly during pauses in speech of the participants. The attendees may be enabled to modify the key terms in real-time during the meeting. The meeting may be an on-line meeting in which at least one attendee participates over a data connection. Alternatively, the meeting may be a face-to-face meeting in which all participants physically meet each other. The key terms may comprise keywords. The key terms may comprise key phrases.
By forming a dynamic key term visualisation based on the recorded speech and enabling modifying of the dynamic key term visualisation by the attendees, the attendees may be provided with a visualization that illustrates main items discussed in the meeting. The illustration of the main items may facilitate understanding and developing the topic of the meeting. Moreover, the dynamic development of the key term visualisation may facilitate collaboration by the attendees by enabling on-line and/or offline development of the illustration. The method may comprise enabling a user to combine key terms to a combined expression. The combined expression may be usable for defining action points and/or summarizing the meeting.
The method may comprise associating the combined expression with a given person or group of persons. The associating may comprise forming a calendar entry using the combined expression for the given person or group of persons. The method may comprise comparing the dynamic key term visualisation with earlier formed key term visualisations. The comparison may be used to identify earlier work relating to the topic of the meeting. The comparison may be used to detect developments made since earlier work relating to the topic. The comparison may be used to identify the contribution of the attendants to the propagation of the meeting and / or to the development since the earlier work relating to the topic.
The modifying of the key terms may comprise adding or removing key terms. Alternatively or additionally, the modifying of the key terms may comprise editing the key terms.
The method may comprise detecting the attendee whose speech is being received. The method may comprise presenting the identified key terms together with an indication of the related attendee or attendees. The method may comprise detecting from received speech such attendees whose speech has been received for less than a set minimum proportion, and prompting comments from such attendees.
The detecting of the attendee whose speech is being received may be performed by identifying an individual channel used by the attendee or by use of voice recognition.
The method may comprise receiving a topic of the meeting and maintaining the text associated with the topic.
The method may further comprise storing the dynamic key term visualisation in a repository accessible by the attendees. The method may comprise reporting the key term visualisation to one or more persons associated with the attendees.
The speech may be converted to text using a browser operated web service. The service may be implemented using mobile devices and dedicated application(s).
The method may be performed in a network based service. The network based service may be implemented using a NodeJS backend. The network based service may be run by a dedicated server and/or a cloud computing system.
The method may comprise providing a browser based user interface for the attendees. The browser based user interface may be implemented using WebRTC.
According to a second aspect of the invention there is provided a computer program comprising computer executable program code which when executed by at least one processor causes an apparatus to perform the method of the first aspect.
According to a third aspect of the invention there is provided an apparatus comprising a memory configured to store the computer program of the second aspect and a processor configured to control operation of the apparatus according to the computer program.
According to a fourth aspect of the invention there is provided a computer program product comprising a non-transitory computer readable medium having the computer program of the second aspect stored thereon. Any foregoing memory medium may comprise a digital data storage such as a data disc or diskette, optical storage, magnetic storage, or opto-magnetic storage. The memory medium may be formed into a device without other substantial functions than storing memory, or it may be formed as part of a device with other functions, including but not limited to a memory of a computer, a chip set, and a sub-assembly of an electronic device.
Different non-binding aspects and embodiments of the present invention have been illustrated in the foregoing. The embodiments in the foregoing are used merely to explain selected aspects or steps that may be utilized in implementations of the present invention. Some embodiments may be presented only with reference to certain aspects of the invention. It should be appreciated that corresponding embodiments may apply to other aspects as well.
BRIEF DESCRIPTION OF DRAWINGS
Some embodiments of the invention will be described with reference to the accompanying drawings, in which:
Fig. 1 shows a schematic picture of a system according to an embodiment of the invention;
Fig. 2 shows a flow chart illustrating a method of an embodiment of the invention;
Fig. 3 shows a block diagram of an apparatus suited for implementing an embodiment of the invention;
Fig. 4 illustrates a dashboard view of an embodiment; and
Fig. 5 shows a modification view allowing participants to edit and add text during a session.
DETAILED DESCRIPTION
In the following description, like reference signs denote like elements or steps.
Fig. 1 shows a schematic picture of a system 100 according to an embodiment of the invention. The system has plural users 110, a web browser user interface 120 and an application interface 130 for use with a browser or dedicated applications or apps, such as tablet computer or mobile phone apps. The web browser user interface 120 and the application interface 130 are connected to a speech recognition web service 150 (using WebRTC, for example) and to a NodeJS Backend 140, which are connected to an information extraction and scoring engine 160 that extracts and scores key terms (e.g. keywords and/or key phrases) based on natural language processing with scalable information matching.
The architecture of Fig. 1 can be implemented using discrete elements such as separate servers or cloud services that provide the different functionalities. On the other hand, some or all of the services used by the users 110 can be provided in common by one or more entities.
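As a minimal sketch of how such a backend could receive transcript fragments, assuming an Express-style HTTP API; the route name, payload shape and stub engine are illustrative assumptions, not taken from the application.

```typescript
// Sketch of a NodeJS backend in the spirit of element 140, assuming Express.
import express from "express";

const app = express();
app.use(express.json());

// The speech recognition web service 150 (or the browser UI 120) could post
// transcript fragments here; the backend forwards them to a stub standing in
// for the information extraction and scoring engine 160.
app.post("/sessions/:id/transcript", (req, res) => {
  const { text, attendee } = req.body as { text: string; attendee?: string };
  extractAndScore(req.params.id, text, attendee);
  res.sendStatus(202); // accepted for asynchronous processing
});

function extractAndScore(sessionId: string, text: string, attendee?: string): void {
  // Placeholder: the described system runs NLP-based key term extraction and
  // scoring with scalable information matching at this point.
  console.log(`session ${sessionId}: ${attendee ?? "unknown"} said "${text}"`);
}

app.listen(3000);
```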
Fig. 2 shows a flow chart illustrating a method of an embodiment of the invention. The method comprises:
210. Receiving recorded speech of plural attendees of a meeting;
220. Converting recorded speech to text during the meeting;
230. Enabling editing of the text by the attendees during the meeting;
240. Identifying key terms from the edited text;
250. Forming a dynamic key term visualisation from the identified key terms; and
260. Enabling modifying of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation.
The forming of the dynamic key term visualisation is preferably performed repeatedly during the meeting, for example during pauses in speech of the participants, at given intervals, or when a given amount of speech has been converted to text.
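A minimal sketch of such update triggering follows; the pause length, update interval and text-amount thresholds are illustrative assumptions, since none are specified in the application.

```typescript
// Re-forms the visualisation (steps 240-260) on a pause in speech, at a fixed
// interval, or after a given amount of new converted text.
class VisualisationUpdater {
  private pendingChars = 0;
  private lastUpdate = Date.now();
  private pauseTimer?: ReturnType<typeof setTimeout>;

  constructor(
    private update: () => void,   // re-identify key terms and redraw
    private pauseMs = 2000,       // silence treated as a pause (assumed)
    private intervalMs = 30000,   // maximum time between updates (assumed)
    private minChars = 400        // minimum new text between updates (assumed)
  ) {}

  onText(chunk: string): void {
    this.pendingChars += chunk.length;
    if (this.pauseTimer) clearTimeout(this.pauseTimer);
    // Update during a natural pause in speech...
    this.pauseTimer = setTimeout(() => this.maybeUpdate(true), this.pauseMs);
    // ...or when the interval has elapsed or enough new text has accumulated.
    this.maybeUpdate(false);
  }

  private maybeUpdate(pause: boolean): void {
    const intervalDue = Date.now() - this.lastUpdate >= this.intervalMs;
    if (pause || intervalDue || this.pendingChars >= this.minChars) {
      this.update();
      this.pendingChars = 0;
      this.lastUpdate = Date.now();
    }
  }
}
```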
By forming a dynamic key term visualisation based on the recorded speech and enabling modifying of the dynamic key term visualisation by the attendees, the attendees can be provided with a visualization that illustrates main items discussed in the meeting. Such an illustration of the main items facilitates understanding and developing the topic of the meeting. Moreover, the dynamic development of the key term visualisation facilitates collaboration by the attendees by enabling on-line and/or offline development of the illustration.
In an embodiment, a user is enabled to combine key terms to a combined expression. The combined expression can be usable, for example, for defining action points and/or summarizing the meeting.
The method comprises in an embodiment associating the combined expression with a given person or group of persons. The associating comprises, for example, forming a calendar entry using the combined expression for the given person or group of persons. The method comprises in an embodiment comparing the dynamic key term visualisation with earlier formed key term visualisations for the comparison to be used, for example, to any of: identifying earlier work relating to the topic of the meeting; detecting developments made since earlier work relating to the topic; identifying the contribution of the attendants to the propagation of the meeting and / or development since the earlier work relating to the topic.
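For instance, forming a calendar entry from a combined expression could be sketched as below; the iCalendar fields used are standard, but the function and its parameters are assumptions for illustration only.

```typescript
// Builds a minimal iCalendar (RFC 5545 style) event from a combined
// expression, assigned to a given person or group of persons.
function toCalendarEntry(expression: string, attendees: string[], start: Date, minutes = 30): string {
  // Format dates as iCalendar UTC timestamps, e.g. 20171013T100000Z.
  const fmt = (d: Date) => d.toISOString().replace(/[-:]/g, "").replace(/\.\d{3}/, "");
  const end = new Date(start.getTime() + minutes * 60000);
  return [
    "BEGIN:VCALENDAR",
    "VERSION:2.0",
    "BEGIN:VEVENT",
    `SUMMARY:${expression}`,                      // the combined expression
    ...attendees.map(a => `ATTENDEE:mailto:${a}`), // the assigned person(s)
    `DTSTART:${fmt(start)}`,
    `DTEND:${fmt(end)}`,
    "END:VEVENT",
    "END:VCALENDAR",
  ].join("\r\n");
}

// Example: toCalendarEntry("review key term visualisation", ["alice@example.org"], new Date())
```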
The modifying of the key terms comprises, for example, adding or removing key terms. Alternatively or additionally, the modifying of the key terms may comprise editing the key terms.
In an embodiment, the method comprises detecting the attendee whose speech is being received, and/or presenting the identified key terms together with an indication of the related attendee or attendees. The method comprises in an embodiment detecting from received speech such attendees whose speech has been received for less than a set minimum proportion of time, and prompting comments from such attendees.
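Detecting such attendees could be sketched as follows, assuming per-attendee speech durations are tracked; the 10% minimum share is an illustrative assumption.

```typescript
// Returns the attendees whose share of the total received speech falls below
// a set minimum proportion; these attendees are then prompted for comments.
function quietAttendees(speechMs: Map<string, number>, minShare = 0.1): string[] {
  const total = [...speechMs.values()].reduce((a, b) => a + b, 0);
  if (total === 0) return [...speechMs.keys()]; // nobody has spoken yet
  return [...speechMs.entries()]
    .filter(([, ms]) => ms / total < minShare)
    .map(([attendee]) => attendee);
}
```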
The detecting of the attendee whose speech is being received can be performed, for example, by identifying an individual channel used by the attendee or by use of voice recognition.
The method comprises in an embodiment receiving a topic of the meeting and maintaining the text associated with the topic.
The method preferably comprises storing the dynamic key term visualisation in a repository accessible by the attendees. The method may comprise reporting the key term visualisation to one or more persons associated with the attendees.
In an embodiment, the speech is converted to text using a browser operated web service. The method is implemented in an embodiment with support for mobile devices and applications configured to interface with the user 110.
An example use case is next described. First, the user 110 logs into a service of an embodiment and creates a title for a topic. If the topic is new, the user is prompted to create a new session. The user may be prompted, for example, to give a title name and a date and time for the session, and to add desired participants for this specific session. The user is shown a dashboard view of all the sessions the user has created or participated in, wherein one topic may comprise plural sessions. Fig. 4 illustrates a dashboard view of an embodiment.
During a session, all the spoken audio of the participants is recorded. The participants are preferably able to edit and add text while the recording is paused, see Fig. 5. After the session ends, the results of the session are processed and a canvas is shown (e.g. to the participants and optionally other authorized people) to enable further editing of the session outcomes. This editing can be performed by allowing the users to position key terms on the canvas as they like. Content may also be added to or removed from the canvas, e.g. by typing new content into a text box after clicking on a desired position on the canvas, or by dragging out or clicking an existing entry. Alternatively or additionally to canvas based modification, a view and further processing of the session results can be arranged by enabling users to work with a word list shown with associated scores, in which key terms can be selected for editing their text and/or score (e.g. estimated relevance to the topic of the session).
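A sketch of the word-list editing described above; the KeyTerm shape is the same assumed structure as in the earlier sketch.

```typescript
// Hypothetical shape of a scored key term, as assumed earlier.
interface KeyTerm {
  text: string;
  score: number; // estimated relevance to the topic of the session
}

// Returns a new list with the selected term's text and/or score replaced,
// leaving all other entries untouched.
function editKeyTerm(list: KeyTerm[], index: number, changes: Partial<KeyTerm>): KeyTerm[] {
  return list.map((t, i) => (i === index ? { ...t, ...changes } : t));
}

// Example: correct a mis-recognized term and raise its relevance score.
// const updated = editKeyTerm(terms, 3, { text: "key term", score: 0.9 });
```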
The text converted from speech can be split into key terms based on various probabilistic models. For example, the frequency of various words or phrases can be compared to a reference corpus to determine how much their use frequency differs, being either greater or smaller than in the reference corpus. The reference corpus can be selected from the same or an associated topic to reduce the significance of likely trivial items. In an embodiment, a further key term visualisation is presented based on search results from outside the present session. For example, a user may search in other recorded sessions regarding the same or another topic, or in another source such as the Internet, and the search results can be presented as another key term visualisation for comparison.
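A sketch of such a frequency comparison follows, using add-one smoothing and a log-ratio as the score; the application names no particular measure, so these choices are assumptions.

```typescript
// Scores candidate terms by how strongly their frequency in the session text
// differs from their frequency in a reference corpus.
function scoreTerms(
  sessionCounts: Map<string, number>, // term counts in the session text
  corpusCounts: Map<string, number>,  // term counts in the reference corpus
  sessionTotal: number,               // total term count in the session
  corpusTotal: number                 // total term count in the corpus
): Map<string, number> {
  const scores = new Map<string, number>();
  for (const [term, count] of sessionCounts) {
    const pSession = count / sessionTotal;
    // Add-one smoothing so terms absent from the corpus still get a score.
    const pCorpus = ((corpusCounts.get(term) ?? 0) + 1) / (corpusTotal + 1);
    // A log-ratio far from zero marks a term whose use frequency differs
    // strongly (greater or smaller) from the reference corpus.
    scores.set(term, Math.log(pSession / pCorpus));
  }
  return scores;
}
```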
The method is preferably performed automatically so that the key term visualisation of a session is built and updated as an ongoing process while the session progresses, so that the participants can obtain an instantaneous and dynamically developing graphical presentation of the progress of the content of the session. The method may help to remove the need for manual summarizing of topics and recording of future action points, and reduce the need to communicate between the attendees or other persons who should be aware of the progress or results of the session. The method may further enable combining massive data sets interactively with the spoken contribution of the attendees, thus providing an entirely new search and interaction tool. For example, during the session, thousands of comparisons and decisions may be performed per second while the key term visualisation is updated. Such fast computation may be particularly useful to update the visualization of the participants during natural pauses in speech and thus avoid forcing people to interrupt their normal interaction.
Fig. 3 shows a block diagram of an apparatus 300 suited for implementing an embodiment of the invention. The apparatus 300 can be used, depending on implementation, as a user terminal and/or computer server for implementing at least some parts of the method of Fig. 2. Notice that it is not necessary to run any part of the method of Fig. 2 as a network based service but instead, in some embodiments the functionalities are implemented locally.
The apparatus 300 comprises a communication interface or input/output 310 for communicating with other entities using, for example, a local area network (LAN) port or mobile communication networks (e.g. UMTS, CDMA-2000, GSM), a processor 320, a user interface 330 and a memory 340. The memory 340 comprises a work memory 342 and non-volatile memory 344 comprising computer program code 346 to be executed by the processor 320 in place and/or within the work memory 342. The non-volatile memory 344 can additionally be used for storing other long-lasting data such as user settings and database data, for example key term visualisation data.
The processor 320 is, for example, formed of one or more of: a master control unit (MCU); a microprocessor; a digital signal processor (DSP); an application specific integrated circuit (ASIC); a field programmable gate array; a microcontroller. The processor 320 is capable of, for example, controlling the operation of the apparatus 300 using the computer program code 346.
Various embodiments have been presented. It should be appreciated that in this document, words comprise, include and contain are each used as open-ended expressions with no intended exclusivity.
For example, in an embodiment video image is received in addition to the recorded speech, so that video images or still images of the video can also be stored and presented when displaying any derivative information based on the recorded speech. For example, the words of the word cloud or other visualization may be associated with a respective portion of speech. On accessing a word of the visualisation, a respective portion of received speech can be replayed. Alternatively or additionally, video images or still images of the video can be presented on accessing the word of the visualization. In an embodiment, the key terms of the visualization are associated with respective portions of recorded speech or video and replayed on accessing the key terms, e.g. through the visualization.
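Associating key terms with portions of the recording for replay could be sketched as follows; the segment shape and the player callback are assumptions.

```typescript
// A time range within the recorded speech or video.
interface SpeechSegment { startMs: number; endMs: number }

// Index mapping each key term to the recorded portions in which it occurred.
const segmentsByTerm = new Map<string, SpeechSegment[]>();

function indexTerm(term: string, segment: SpeechSegment): void {
  const list = segmentsByTerm.get(term) ?? [];
  list.push(segment);
  segmentsByTerm.set(term, list);
}

// On accessing a key term through the visualisation, replay each recorded
// portion in which the term occurred via the supplied player callback.
function onTermAccessed(term: string, play: (s: SpeechSegment) => void): void {
  for (const segment of segmentsByTerm.get(term) ?? []) play(segment);
}
```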
In one example, the key terms of the visualization are used as search terms on accessing of the key terms. For example, on clicking a key term of the visualization, a supplementary information search is automatically performed from the Internet or an inter-organisation data repository. In another example, the supplementary search is performed in advance so that contemporary material is used, and the search results are then presented on accessing of the respective key term. In an example, key terms or phrases can be stored to an idea bank for subsequent use, and stored key terms or phrases can be automatically searched and retrieved from the idea bank through associated key terms. For example, on accessing a visualized key term, the user may be provided with related information. The related information may comprise any of a key term visualization, a recording of speech, a recording of video, and a written note. The automatic searching and retrieving of the stored key terms or phrases can, thanks to the speed of computers, be performed even during normal breathing pauses or a change of turn of speaker in a normal meeting, in a manner that would be impossible for people to implement manually.
In an example, the productivity of different people and groups of people is automatically measured by computing the amount or relevance of the produced key term visualisations to subsequent work within the organization. By automatically computing the subsequent use of the work of earlier people and teams, thousands of different word cloud combinations can be compared and adaptively scored, unlike with any existing manual methods.
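One way such relevance could be computed, as an assumption for illustration since the application names no measure, is the Jaccard overlap between the key term sets of an earlier visualisation and of subsequent work.

```typescript
// Scores how relevant an earlier key term visualisation is to subsequent
// work, as the Jaccard overlap of the two key term sets (0 = disjoint,
// 1 = identical). An illustrative measure, not taken from the application.
function relevance(earlierTerms: Set<string>, laterTerms: Set<string>): number {
  const intersection = [...earlierTerms].filter(t => laterTerms.has(t)).length;
  const union = new Set([...earlierTerms, ...laterTerms]).size;
  return union === 0 ? 0 : intersection / union;
}
```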
The foregoing description has provided by way of non-limiting examples of particular implementations and embodiments of the invention a full and informative description of the best mode presently contemplated by the inventors for carrying out the invention. It is however clear to a person skilled in the art that the invention is not restricted to details of the embodiments presented in the foregoing, but that it can be implemented in other embodiments using equivalent means or in different combinations of embodiments without deviating from the characteristics of the invention.
Furthermore, some of the features of the afore-disclosed embodiments of this invention may be used to advantage without the corresponding use of other features. As such, the foregoing description shall be considered as merely illustrative of the principles of the present invention, and not in limitation thereof. Hence, the scope of the invention is only restricted by the appended patent claims.

Claims

1. A method comprising:
receiving (210) recorded speech of plural attendees of a meeting;
converting (220) recorded speech to text during the meeting;
enabling editing (230) of the text by the attendees during the meeting;
identifying (240) key terms from the edited text;
forming (250) a dynamic key term visualisation to the attendees from the identified key terms;
enabling modifying (260) of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation.
2. The method of claim 1, characterized in that the key term visualization comprises a word cloud.
3. The method of claim 1 or 2, characterized in that the forming of the dynamic key term visualisation is performed repeatedly during the meeting.
4. The method of any one of preceding claims, characterized in that the forming of the dynamic key term visualisation is performed repeatedly during pauses in speech of the participants.
5. The method of any one of preceding claims, characterized in that the method further comprises enabling a user to combine key terms to a combined expression.
6. The method of claim 5, characterized in that the method further comprises associating the combined expression with a given person or group of persons.
7. The method of any one of preceding claims, characterized in that the method further comprises comparing the dynamic key term visualisation with earlier formed key term visualisations.
8. The method of any one of preceding claims, characterized in that the method further comprises storing the dynamic key term visualisation in a repository accessible by the attendees.
9. The method of any one of preceding claims, characterized in that speech is converted to text using a browser operated web service or mobile devices with dedicated application support.
10. The method of any one of preceding claims, characterized in that the method is performed in a network based service using a NodeJS backend and a browser based user interface for the attendees that is implemented using WebRTC.
11. The method of any one of preceding claims, characterized in that the key terms of the visualization are associated to respective passages of the recorded speech.
12. The method of any one of preceding claims, characterized in that the key terms of the visualization are associated to respective video images or still images.
13. The method of any one of preceding claims, characterized in that the key terms of the visualization are associated to respective data available in the Internet or an inter-organisation data repository.
14. The method of any one of preceding claims, characterized in that productivity of different people and groups of people is automatically measured by computing the amount of or relevance of the produced key term visualisations to subsequent work within the organization.
15. A computer program comprising computer executable program code which when executed by at least one processor causes an apparatus to perform the method of any one of the preceding claims.
[Fig. 1 (sheet 1/5): schematic picture of the system 100]
[Fig. 2 (sheet 2/5): flow chart of the method with steps 210 (receiving recorded speech of plural attendees of a meeting), 220 (converting recorded speech to text during the meeting), 230 (enabling editing of the text by the attendees during the meeting), 240 (identifying key terms from the edited text), 250 (forming a dynamic key term visualisation from the identified key terms) and 260 (enabling modifying of the key terms by the attendees after the forming of the key term visualisation and correspondingly updating the dynamic key term visualisation)]
[Figs. 4a and 4b (sheets 3/5 and 4/5): dashboard views of an embodiment listing sessions with their names, authors, dates, descriptions and participants]
[Fig. 5 (sheet 5/5): session view of an embodiment with topic name, recording timer, live transcript, participant list and a comment field; a note indicates that users should be able to edit and add text while the recording is paused]
PCT/FI2017/050719 2016-10-13 2017-10-13 Interactive collaboration tool WO2018069580A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20165781 2016-10-13

Publications (1)

Publication Number Publication Date
WO2018069580A1 true WO2018069580A1 (en) 2018-04-19

Family

ID=60201607

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2017/050719 WO2018069580A1 (en) 2016-10-13 2017-10-13 Interactive collaboration tool

Country Status (1)

Country Link
WO (1) WO2018069580A1 (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100161619A1 (en) * 2008-12-18 2010-06-24 Lamere Paul B Method and Apparatus for Generating Recommendations From Descriptive Information
EP2288105A1 (en) * 2009-08-17 2011-02-23 Avaya Inc. Word cloud audio navigation
CA2692314A1 (en) * 2010-02-08 2011-08-08 Yellowpages.Com Llc Systems and methods to provide search based on social graphs and affinity groups
US20120179465A1 (en) * 2011-01-10 2012-07-12 International Business Machines Corporation Real time generation of audio content summaries
WO2012175556A2 (en) * 2011-06-20 2012-12-27 Koemei Sa Method for preparing a transcript of a conversation
US9035996B1 (en) 2012-04-16 2015-05-19 Google Inc. Multi-device video communication session
US20160171090A1 (en) * 2014-12-11 2016-06-16 University Of Connecticut Systems and Methods for Collaborative Project Analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Learning Python with Raspberry Pi", 29 January 2014, ISBN: 978-1-118-71705-9, article BRADBURY & EVERARD: "Learning Python with Raspberry Pi", pages: 180 - 180, XP055430713 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113113017A (en) * 2021-04-08 2021-07-13 百度在线网络技术(北京)有限公司 Audio processing method and device
CN113113017B (en) * 2021-04-08 2024-04-09 百度在线网络技术(北京)有限公司 Audio processing method and device
CN113129895A (en) * 2021-04-20 2021-07-16 上海仙剑文化传媒股份有限公司 Voice detection processing system

Similar Documents

Publication Publication Date Title
US10860985B2 (en) Post-meeting processing using artificial intelligence
US11307735B2 (en) Creating agendas for electronic meetings using artificial intelligence
EP3467822B1 (en) Speech-to-text conversion for interactive whiteboard appliances in multi-language electronic meetings
US20210297275A1 (en) Organizing and aggregating meetings into threaded representations
EP3467821B1 (en) Selection of transcription and translation services and generation of combined results
US11062271B2 (en) Interactive whiteboard appliances with learning capabilities
US11030585B2 (en) Person detection, person identification and meeting start for interactive whiteboard appliances
EP3309731A1 (en) Managing electronic meetings using artificial intelligence and meeting rules templates
US9712569B2 (en) Method and apparatus for timeline-synchronized note taking during a web conference
US20220294836A1 (en) Systems for information sharing and methods of use, discussion and collaboration system and methods of use
US20180101760A1 (en) Selecting Meeting Participants for Electronic Meetings Using Artificial Intelligence
US8271509B2 (en) Search and chat integration system
JP5003125B2 (en) Minutes creation device and program
US10860797B2 (en) Generating summaries and insights from meeting recordings
US20120321062A1 (en) Telephonic Conference Access System
US20140278377A1 (en) Automatic note taking within a virtual meeting
CN107636651A (en) Subject index is generated using natural language processing
US20130144603A1 (en) Enhanced voice conferencing with history
US20090006982A1 (en) Collaborative generation of meeting minutes and agenda confirmation
US20100161604A1 (en) Apparatus and method for multimedia content based manipulation
US20150066935A1 (en) Crowdsourcing and consolidating user notes taken in a virtual meeting
CN107211062A (en) Audio playback scheduling in virtual acoustic room
JP2008310618A (en) Web conference support program, recording medium with the same recorded thereon, and device and method for web conference support
US20230274730A1 (en) Systems and methods for real time suggestion bot
Chi et al. Intelligent assistance for conversational storytelling using story patterns

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (ref document number: 17792108; country of ref document: EP; kind code of ref document: A1)
NENP: Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry in European phase (ref document number: 17792108; country of ref document: EP; kind code of ref document: A1)