WO2023192200A1 - Systems and methods for attending and analyzing virtual meetings - Google Patents

Systems and methods for attending and analyzing virtual meetings

Info

Publication number
WO2023192200A1
Authority
WO
WIPO (PCT)
Prior art keywords
meetings
robot
meeting
calls
virtual
Prior art date
Application number
PCT/US2023/016448
Other languages
French (fr)
Inventor
Greg TAPPER
Andrew Hickey
Andrew Stewart
Galiya Krupa NARENDRABHAI
Vijay Alias VANSH
Original Assignee
Pattern Ai Llc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pattern Ai Llc
Publication of WO2023192200A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/10Office automation; Time management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/64Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/02Details
    • H04L12/16Arrangements for providing special services to substations
    • H04L12/18Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1831Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/02User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems
    • H04N7/155Conference systems involving storage of or access to video conference sessions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • Virtual calls and meetings can allow various parties to communicate seamlessly with each other from remote locations.
  • the physical distance between certain parties may make in-person meetings challenging or impractical.
  • Virtual calls or meetings may help to electronically bridge this physical distance and connect two parties to facilitate real time communications.
  • the present disclosure addresses at least the abovementioned shortcomings of conventional systems and methods for attending virtual meetings.
  • the present disclosure provides systems and methods for attending and analyzing virtual meetings using a virtual entity.
  • the virtual entity can be configurable to attend virtual meetings or calls without the need for a physical user to be present.
  • the presently disclosed systems and methods may enhance a user’s ability to delegate or instruct a virtual entity to attend calls or meetings on the user’s behalf so that the user can spend his or her day more efficiently while still being privy to the content of the calls or meetings and/or any meeting analytics that can be derived from the content of the calls or meetings.
  • the present disclosure provides a robot for autonomously attending one or more virtual calls or meetings as a distinct entity.
  • the robot may comprise an independent software agent.
  • the robot may be configured to (a) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user, (b) detect one or more keywords communicated during the one or more virtual calls or meetings, and/or (c) generate meeting analytics based at least in part on the one or more detected keywords.
  • the robot is capable of being directed, scheduled, or instructed to attend the one or more virtual calls or meetings without a concurrent presence of an administrator or an owner of the robot.
  • the robot is configurable to attend a plurality of calls or meetings simultaneously and generate discrete meeting analytics for each of the plurality of calls or meetings.
  • the robot is configurable to attend a plurality of calls or meetings and to generate aggregated meeting analytics for the plurality of calls or meetings.
  • the instruction by the user comprises an instruction for the robot to attend the one or more virtual calls or meetings at a scheduled time. In some embodiments, the instruction by the user comprises an instruction for the robot to execute one or more tasks at a predetermined or user-specified time.
  • the one or more keywords are set or determined by the user in advance of the one or more virtual calls or meetings.
  • the robot is configured to listen for, perceive, or sense the one or more keywords.
  • the robot is configured to use the one or more keywords to generate a more accurate transcription of the one or more calls or meetings.
  • the meeting analytics comprise a meeting summary based on one or more domain specific datasets.
  • the meeting analytics comprise a report indicating (i) when the one or more keywords were detected during the one or more virtual calls or meetings and (ii) an identity of an individual or entity that communicated the one or more keywords.
  • the meeting analytics indicate when each attendee of the one or more virtual calls or meetings spoke.
  • the meeting analytics are filterable to show when one or more select attendees of the one or more virtual calls or meetings spoke.
  • the meeting analytics comprise one or more key moments corresponding to a positive or negative emotion, intention, or sentiment of one or more attendees of the one or more virtual calls or meetings.
  • the meeting analytics comprise one or more key moments associated with a time when one or more attendees of the one or more virtual calls or meetings spoke.
  • the one or more key moments correspond to an emotion, intention, or sentiment of the one or more attendees.
  • the one or more key moments are determined based on a quantifiable measure corresponding to a magnitude of the emotion or intention of the one or more attendees.
  • the meeting analytics provide additional contextual information associated with the one or more key moments.
  • the additional contextual information comprises a transcription of a related portion of a conversation corresponding to the one or more key moments.
  • the meeting analytics comprise information on attendee engagement during the one or more virtual calls or meetings.
  • the robot is configured to transcribe and/or analyze one or more audio communications made during the one or more virtual calls or meetings using natural language processing, artificial intelligence, digital signal processing, audio waveform analysis, and/or machine learning.
  • the robot is configured to identify speech patterns, conversational patterns, meeting patterns, and/or a progression of discussion topics, sentiment, or engagement for the one or more virtual calls or meetings on a time series basis.
  • the robot is configured to identify and analyze meeting patterns based on the one or more virtual calls or meetings.
  • the meeting patterns comprise information on a number of meetings attended during a time period, a characteristic of the meetings attended, and/or a sentiment associated with the meetings.
  • the robot is configured to identify and track attendees of the one or more virtual calls or meetings. In some embodiments, the robot is configured to track one or more verbal or physical actions by the attendees. In some embodiments, the robot is further configured to identify one or more characteristics of the attendees. In some cases, the one or more characteristics comprise demographics, job title, age, gender, geography, company name, or company size.
  • the robot is configured to (i) identify voice prints for one or more attendees of the one or more virtual calls or meetings and (ii) determine an intention, an emotion, or a sentiment of the one or more attendees based on the voice prints.
  • the robot is configured to detect one or more questions from an attendee and/or one or more emotions or behavioral intentions of the attendee based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings. In some embodiments, the robot is configured to determine one or more next steps or action items, one or more actions to be taken in the present or the future, one or more questions asked, one or more questions answered, and/or one or more tasks to be completed, based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings.
  • the robot is configured to dynamically filter attendees from the one or more virtual calls or meetings. In some embodiments, the robot is configured to detect one or more keyword patterns on a time-series basis. In some embodiments, the robot is configured to generate one or more meeting recordings or transcripts and selectively distribute the one or more meeting recordings or transcripts to one or more other entities. In some embodiments, the robot is configured to generate one or more comments or insights based on the content of the one or more virtual calls or meetings. In some embodiments, the robot is configured to generate an extractive meeting summary based on the content of the one or more virtual calls or meetings.
  • the robot is configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for a topic, keyword, or phrase of interest. In some embodiments, the robot is configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for an emotion, intention, keyword, topic, or sentiment of interest. In some embodiments, the robot is configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts based on a meeting characteristic.
  • the meeting characteristic may comprise an attendee name, attendee age, attendee gender, attendee location or geography, entity name, entity location or geography, entity size, or any combination thereof.
  • the robot is configured to automatically schedule the one or more virtual calls or meetings on behalf of the user.
  • Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
  • the computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
  • FIG. 1 schematically illustrates a robot configured to perform one or more tasks or actions based on an input from a user, in accordance with some embodiments.
  • FIG. 2 schematically illustrates a virtual platform comprising a user interface for a user to select a recording of a virtual call or meeting, in accordance with some embodiments.
  • FIG. 3 schematically illustrates a user interface configured to display meeting analytics that can be generated for a virtual call or meeting, in accordance with some embodiments.
  • FIG. 4 schematically illustrates an exemplary user interface configured to display information on discussions during the meetings or comments made during the meeting, in accordance with some embodiments.
  • FIG. 5 and FIG. 6 schematically illustrate various key moments associated with a virtual call or meeting, in accordance with some embodiments.
  • FIG. 7 schematically illustrates engagement information for the virtual call or meeting, in accordance with some embodiments.
  • FIG. 8 schematically illustrates a portion of a user interface that may be configured to display a transcript of a call or meeting, in accordance with some embodiments.
  • FIG. 9 and FIG. 10 schematically illustrate a user interface for scheduling a software agent or robot to attend one or more meetings, in accordance with some embodiments.
  • FIG. 11 schematically illustrates an exemplary user interface for managing one or more keywords to be detected or analyzed by the robot or software agent, in accordance with some embodiments.
  • FIG. 12 schematically illustrates a keyword tab of a meeting analytics user interface, in accordance with some embodiments.
  • FIG. 13 and FIG. 14 schematically illustrate a sentiment score that can be assigned to or associated with a virtual call or meeting and/or one or more communications made during the virtual call or meeting, in accordance with some embodiments.
  • FIG. 15 schematically illustrates a computer system that is programmed or otherwise configured to implement methods provided herein.
  • FIG. 16 shows an example of a graphical user interface (GUI) for managing meeting records.
  • FIG. 17 and FIG. 18 show an example of a meeting interface.
  • FIG. 19 shows examples of rules provided within the user interface for a user to set up rules for the robot to attend a scheduled meeting.
  • FIG. 20 shows an example of integrating the platform with a messaging application.
  • real time generally refers to an event (e.g., an operation, a process, a method, a technique, a computation, a calculation, an analysis, a visualization, an optimization, etc.) that is performed using recently obtained (e.g., collected or received) data.
  • a real time event may be performed almost immediately or within a short enough time span, such as within at least 0.0001 millisecond (ms), 0.0005 ms, 0.001 ms, 0.005 ms, 0.01 ms, 0.05 ms, 0.1 ms, 0.5 ms, 1 ms, 5 ms, 0.01 seconds, 0.05 seconds, 0.1 seconds, 0.5 seconds, 1 second, or more.
  • a real time event may be performed almost immediately or within a short enough time span, such as within at most 1 second, 0.5 seconds, 0.1 seconds, 0.05 seconds, 0.01 seconds, 5 ms, 1 ms, 0.5 ms, 0.1 ms, 0.05 ms, 0.01 ms, 0.005 ms, 0.001 ms, 0.0005 ms, 0.0001 ms, or less.
  • the present disclosure provides a system comprising a software agent (e.g., an independent software agent) or a robot.
  • the software agent or robot may be implemented using a computing device.
  • the software agent or robot may comprise a computer program that can perform one or more actions or tasks for a user or another program.
  • the one or more actions or tasks can be executed by the software agent or robot based on one or more inputs or commands provided by a user or an administrator of the robot or software agent.
  • the one or more actions or tasks can be executed by the software agent or robot without user instruction, interaction or supervision.
  • the software agent or robot may be configured to autonomously attend one or more virtual calls or meetings as a distinct entity.
  • the software agent or robot may be configured to (a) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user, (b) detect one or more keywords communicated during the one or more virtual calls or meetings, and/or (c) generate meeting analytics based at least in part on the one or more detected keywords.
  • the software agent or robot may be configured to generate a meeting summary and/or an extractive summary, whereby certain specific transcribed comments can be identified based on characteristics of interest (e.g., keywords, key moments, emotions, or sentiments) and extracted in the aggregate to provide a summarization of the meeting.
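  • As an illustration of how such an extractive summary could be assembled, a minimal sketch is shown below; the TranscriptSegment structure, keyword list, sentiment values, and function name are hypothetical assumptions, not elements of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:
    speaker: str
    start_s: float
    text: str
    sentiment: float  # assumed pre-computed score in [-1.0, 1.0]

def extractive_summary(segments, keywords, sentiment_threshold=0.6):
    """Collect transcribed comments that match characteristics of interest
    (keywords or strong sentiment) and join them into an extractive summary."""
    selected = []
    for seg in segments:
        has_keyword = any(kw.lower() in seg.text.lower() for kw in keywords)
        is_key_moment = abs(seg.sentiment) >= sentiment_threshold
        if has_keyword or is_key_moment:
            selected.append(f"[{seg.start_s:>6.1f}s] {seg.speaker}: {seg.text}")
    return "\n".join(selected)

# Example usage with assumed data
segments = [
    TranscriptSegment("Alice", 12.0, "Our budget for Q3 is still open.", 0.1),
    TranscriptSegment("Bob", 45.5, "I'm really unhappy with the timeline.", -0.8),
]
print(extractive_summary(segments, keywords=["budget"]))
```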
  • the software agent or robot may be configured to transcribe and/or analyze one or more audio communications made during the one or more virtual calls or meetings using natural language processing, artificial intelligence, signal processing (e.g., digital signal processing), and/or machine learning.
  • the software agent or robot may be configured to identify speech patterns, conversational patterns, meeting patterns, and/or a progression of discussion topics for the one or more virtual calls or meetings on a time series basis.
  • the software agent or robot may be configured to identify and analyze meeting patterns for the one or more virtual calls or meetings.
  • the meeting patterns may comprise information on a number of meetings attended during a time period, a characteristic of the meetings attended, and/or a sentiment associated with the meetings.
  • the characteristic of the meetings attended may comprise information on the amount of time one or more attendees spoke during a meeting, the number of attendees, and/or one or more characteristics or speech patterns of the attendees.
  • the software agent or robot may be configured to record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user.
  • the instruction by the user may comprise an instruction for the robot to attend the one or more virtual calls or meetings at a scheduled time.
  • the robot may be capable of being directed, scheduled, or instructed to attend the one or more virtual calls or meetings without a concurrent presence of an administrator or an owner of the robot.
  • the robot may be configurable to attend a plurality of calls or meetings simultaneously, and to generate discrete meeting analytics for each of the plurality of calls or meetings. In some cases, the discrete meeting analytics for each of the plurality of calls or meetings can be generated simultaneously in parallel.
  • the meeting analytics can be displayed for a single meeting or in the aggregate for a plurality of meetings.
  • the meeting analytics for multiple meetings can be aggregated into a set of metrics, and the set of metrics can be displayed in a single viewing area of a graphical user interface.
  • the respective meeting analytics for multiple meetings can be independently displayed across multiple viewing areas of a graphical user interface.
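  • A minimal sketch of how discrete per-meeting analytics might be rolled up into one aggregated set of metrics is given below; the dictionary keys and helper name are assumed for illustration only.

```python
from collections import Counter

def aggregate_meeting_analytics(per_meeting):
    """Roll discrete per-meeting analytics up into a single set of metrics.
    `per_meeting` is a list of dicts with assumed keys:
    'keyword_counts' (Counter), 'avg_sentiment' (float), 'duration_min' (float)."""
    agg = {
        "meetings": len(per_meeting),
        "keyword_counts": Counter(),
        "total_duration_min": 0.0,
    }
    for m in per_meeting:
        agg["keyword_counts"] += m["keyword_counts"]
        agg["total_duration_min"] += m["duration_min"]
    agg["avg_sentiment"] = (
        sum(m["avg_sentiment"] for m in per_meeting) / len(per_meeting)
        if per_meeting else 0.0
    )
    return agg
```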
  • the robot may be configured to execute one or more tasks at a predetermined or user-specified time in response to one or more instructions provided by the user (e.g., the administrator or owner of the robot).
  • the one or more tasks may involve finding one or more data sources or data sets of interest, analyzing the one or more data sets of interest, and/or generating one or more inferences or providing one or more deliverables based on the analysis of the one or more data sets of interest.
  • the data sources may comprise local resources (e.g., documents or other electronic files) or online resources such as a webpage or other virtual platform that contains data or information of interest to the user.
  • the one or more tasks may comprise autonomously executing a travel booking or autonomously reading a financial report such as a 10-K or 10-Q earnings report.
  • the one or more tasks may involve attending a virtual call or meeting on behalf of the user.
  • the one or more tasks may be pre-scheduled by the user for execution at a certain time or on a certain date, or may be assigned by the user to the robot for immediate execution (or execution within a predetermined time frame).
  • the software agent or robot may be configured to automatically schedule the one or more virtual calls or meetings on behalf of the user. Such automatic scheduling may be performed based on contextual information that is provided to or accessible by the software agent or robot.
  • the contextual information may be derived from written or verbal communications involving one or more participants of the virtual call or meeting.
  • the written or verbal communications may comprise email exchanges or phone calls between a third party and a user or an administrator of the robot.
  • the contextual information may include mutual calendar availability provided to the robot by the one or more meeting attendees. The mutual calendar availability may correspond to a selection of time slots during which two or more meeting attendees are available.
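  • The sketch below illustrates one way such mutual availability could be computed by intersecting each attendee's free time slots; the interval representation (minutes since midnight) and function name are assumptions for illustration.

```python
def mutual_availability(attendee_slots):
    """Intersect each attendee's free time slots (lists of (start, end) tuples,
    e.g. minutes since midnight) to find windows when everyone is available."""
    def intersect(a, b):
        out = []
        for s1, e1 in a:
            for s2, e2 in b:
                start, end = max(s1, s2), min(e1, e2)
                if start < end:
                    out.append((start, end))
        return out

    slots = attendee_slots[0]
    for other in attendee_slots[1:]:
        slots = intersect(slots, other)
    return slots

# Two attendees with hypothetical free windows (minutes since midnight)
print(mutual_availability([[(540, 660), (780, 900)], [(600, 720), (840, 960)]]))
# -> [(600, 660), (840, 900)]
```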
  • the software agent or robot may be configured to schedule the one or more virtual calls or meetings based on a command or an instruction provided by the user or the administrator of the robot.
  • the command or instruction may comprise information on the timing of the meeting, the location of the meeting, the content of the meeting, and/or the participants of the meeting.
  • the robot or software agent may be configured to designate one or more participants as required or optional.
  • the robot or software agent may be configured to distribute a meeting invite for one or more virtual calls or meetings. Such distribution may be performed automatically by the robot based on contextual information, or in response to a direct command or instruction from the user or the administrator of the robot.
  • the software agent or robot may be configured to detect one or more keywords communicated during the one or more virtual calls or meetings.
  • the term “keyword” may also be referred to as a “topic”; the two terms are used interchangeably throughout the specification.
  • a topic may include one or more words or one or more phrases.
  • the robot may be configured to detect one or more keyword or topic patterns on a time-series basis.
  • the one or more keywords can be set or determined by the user in advance of the one or more virtual calls or meetings.
  • FIG. 11 shows an example of a graphical user interface (GUI) for a user to define, manage and assign one or more keywords/topics to one or more meetings ahead of the scheduled meeting time.
  • the robot may be configured to listen for, perceive, or sense the one or more keywords. In some cases, the robot may be further configured to use the one or more keywords to generate or inform an accurate transcription of the one or more calls or meetings.
  • the software agent or robot may be configured to detect one or more keywords communicated across multiple virtual calls or meetings. In some cases, the software agent or robot may be configured to identify the specific virtual calls or meetings during which the one or more keywords were communicated or detected. This may allow a user to select or view a particular call or meeting of interest to better understand the context and conversations surrounding the keywords communicated or detected.
  • the one or more keywords may be entered and set in advance by a user.
  • the one or more keywords can be organized into one or more groups (e.g., by topic or relevancy or level of importance / interest).
  • the robot or software agent may be configured to automatically look for the keywords or any keyword groups that came up in each virtual call or meeting.
  • the keywords or keyword groups may be customized, modified, or otherwise adjusted on a meeting-to-meeting basis.
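  • A minimal sketch of keyword/topic spotting over a timestamped transcript is shown below; the keyword groups, transcript segment fields, and function name are hypothetical and not taken from the disclosure.

```python
import re
from collections import defaultdict

# Hypothetical keyword groups set up ahead of the meeting
KEYWORD_GROUPS = {
    "pricing": ["budget", "discount", "contract"],
    "timeline": ["deadline", "milestone", "launch"],
}

def detect_keywords(segments, keyword_groups=KEYWORD_GROUPS):
    """Scan timestamped transcript segments (dicts with 'start_s', 'speaker',
    'text') and record when each keyword group was mentioned and by whom."""
    hits = defaultdict(list)
    for seg in segments:
        for group, words in keyword_groups.items():
            for word in words:
                if re.search(rf"\b{re.escape(word)}\b", seg["text"], re.IGNORECASE):
                    hits[group].append((seg["start_s"], seg["speaker"], word))
    return dict(hits)

# Example usage with assumed segments
print(detect_keywords([
    {"start_s": 30.0, "speaker": "Alice", "text": "The launch deadline slipped."},
    {"start_s": 95.0, "speaker": "Bob", "text": "We can offer a discount."},
]))
```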
  • the user interface may permit a user to generate new groups of keywords, to add additional keywords to a group, or to modify or edit the keywords within one or more groups.
  • Meeting Analytics
  • the software agent or robot may be configured to generate meeting analytics based at least in part on the one or more detected keywords.
  • the one or more detected keywords may comprise at least a subset of the keywords that have been previously entered or set by the user or the administrator of the software agent or robot.
  • the meeting analytics may comprise a meeting summary based on one or more domain specific datasets.
  • the one or more domain specific datasets may be filterable by a user or an administrator of the software agent or robot.
  • the one or more domain specific datasets may correspond to different domains (e.g., divisions, departments, or centers having different technical or business functions).
  • the different domains may correspond to, for example, sales, marketing, engineering, legal, business, and the like.
  • the one or more domain specific datasets may comprise data or information relating to or associated with various conversational patterns or features of interest.
  • the various conversational patterns or features of interest may comprise, for example, key moments, attendee engagement, attendee sentiment, or any other pattern or feature of interest as described elsewhere herein.
  • the one or more domain specific datasets may comprise data or information relating to or associated with a topic of interest, an attendee or speaker of interest, or a time period of interest.
  • the meeting analytics may comprise a report indicating (i) when the one or more keywords were detected during the one or more virtual calls or meetings and (ii) an identity of an individual or entity that communicated the one or more keywords.
  • the report may be displayed to a user via a user interface.
  • the meeting analytics may indicate when each attendee of the one or more virtual calls or meetings spoke.
  • the meeting analytics may be displayed or organized relative to a chronological timeline of the one or more virtual calls or meetings.
  • the meeting analytics may be filterable to show when one or more select attendees of the one or more virtual calls or meetings spoke.
  • the robot may be configured to dynamically filter the meeting analytics for actions, emotions, sentiments, or intentions of one or more select attendees of the one or more virtual calls or meetings.
  • the robot may be configured to dynamically filter the meeting analytics for key words, questions, or statements communicated by one or more select attendees of the one or more virtual calls or meetings.
  • the robot may be configured to dynamically filter the meeting analytics by attendee name or entity (e.g., the company that an attendee is representing, employed by, or otherwise affiliated with).
  • the meeting analytics may comprise one or more key moments for the one or more virtual calls or meetings.
  • the one or more key moments may correspond to a positive or negative emotion, intention, or sentiment of one or more attendees of the one or more virtual calls or meetings.
  • the emotion, intention, or sentiment may indicate, for example, whether a participant agrees or disagrees with a certain topic or decision, whether the participant seems cooperative or receptive to a certain idea, whether the participant’s actions indicate agreement, trust, honesty, frustration, anger, ambivalence, or any other type of emotion, intention, or sentiment that can be inferred based on an action, a speech, or an appearance of the participant.
  • the meeting analytics may comprise one or more key moments associated with a time when one or more attendees of the one or more virtual calls or meetings spoke.
  • the one or more key moments may correspond to an emotion, intention, or sentiment of the one or more attendees as described above.
  • the one or more key moments may be determined based on a quantifiable measure corresponding to a magnitude of the emotion or intention of the one or more attendees.
  • the meeting analytics may provide additional contextual information associated with the one or more key moments.
  • the additional contextual information may comprise a transcription of a related portion of a conversation corresponding to the one or more key moments.
  • the key moments may be mapped to or linked to or otherwise associated with a portion of a conversation as documented in the transcription.
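  • The sketch below illustrates one possible way to flag key moments from a quantified sentiment magnitude and attach the surrounding transcript as context; the threshold value, segment fields, and function name are assumptions.

```python
def find_key_moments(segments, magnitude_threshold=0.7):
    """Flag transcript segments whose emotion/sentiment magnitude exceeds a
    threshold and keep neighboring remarks as contextual information.
    Each segment is an assumed dict: {'start_s', 'speaker', 'text', 'sentiment'}."""
    moments = []
    for i, seg in enumerate(segments):
        if abs(seg["sentiment"]) >= magnitude_threshold:
            context = segments[max(0, i - 1): i + 2]  # previous, current, next
            moments.append({
                "time_s": seg["start_s"],
                "speaker": seg["speaker"],
                "polarity": "positive" if seg["sentiment"] > 0 else "negative",
                "magnitude": abs(seg["sentiment"]),
                "context": [c["text"] for c in context],
            })
    return moments
```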
  • the meeting analytics may comprise information on attendee engagement during the one or more virtual calls or meetings.
  • the engagement information may be based on a level of alertness or attention of the participants or attendees.
  • the level of alertness or attention may be derived from speech patterns (e.g., a time to respond to a question) or based on a visual analysis of a video of the virtual call or meeting (e.g., if a participant or attendee is within a field of view of a camera or video camera, or if the participant or attendee’s gaze is focused on their screen or elsewhere).
  • the engagement may be based on total time spoken by an attendee, a rate of words spoken, a total number of words spoken, or audio signal information such as pitch, tone, volume, and/or energy level.
  • the level of engagement may be visualized for different portions or time periods of a virtual call or meeting.
  • the level of engagement for each attendee or participant may change during the course of the virtual call or meeting, and the changes in the level of engagement may be monitored and tracked as the virtual call or meeting progresses.
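  • A rough, illustrative way to compute a per-window engagement score from talk time, speaking rate, and response latency is sketched below; the signal names and weights are assumptions rather than the disclosed method.

```python
def engagement_score(window, window_s=60.0):
    """Combine simple signals (talk time share, speaking rate, response latency)
    into a rough engagement score for one attendee over one time window."""
    talk_share = min(window["spoken_s"] / window_s, 1.0)                     # 0..1
    rate = min(window["words"] / max(window["spoken_s"], 1e-6) / 3.0, 1.0)   # cap at ~3 words/s
    latency = max(0.0, 1.0 - window["response_delay_s"] / 10.0)              # slower reply => lower
    return round(0.5 * talk_share + 0.3 * rate + 0.2 * latency, 3)

# Hypothetical one-minute window for a single attendee
print(engagement_score({"spoken_s": 22.0, "words": 55, "response_delay_s": 2.5}))
```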
  • the robot may be configured to identify and track attendees of the one or more virtual calls or meetings. In some cases, the robot may be configured to track one or more verbal or physical actions by the attendees. In some cases, the robot may be configured to track when participants enter or leave the one or more virtual calls or meetings.
  • the robot may be configured to identify one or more characteristics of the attendees.
  • the one or more characteristics may comprise, for example, demographics, job title, age, gender, geography, company name, and/or company size.
  • the one or more characteristics may be inferred or determined based on various actions by the attendees or an appearance of the attendees.
  • the robot may be configured to (i) identify voice prints for one or more attendees of the one or more virtual calls or meetings and (ii) determine an intention, an emotion, or a sentiment of the one or more attendees based on the voice prints.
  • the voice print may be used to accurately identify an attendee in a meeting.
  • the voice print may be used to distinguish between attendees in order to improve transcription accuracy and facilitate diarisation or automatic labeling of speakers.
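  • The sketch below shows one way a speech segment could be matched to enrolled voice prints using cosine similarity over speaker embeddings; the embedding source, threshold, and function name are assumptions for illustration.

```python
import numpy as np

def identify_speaker(segment_embedding, enrolled_voice_prints, threshold=0.75):
    """Match a speech segment's embedding against enrolled voice prints
    (a dict of name -> vector) using cosine similarity; return the best
    match, or None if no enrolled attendee is similar enough."""
    best_name, best_sim = None, -1.0
    segment_embedding = np.asarray(segment_embedding, dtype=float)
    for name, print_vec in enrolled_voice_prints.items():
        print_vec = np.asarray(print_vec, dtype=float)
        sim = float(np.dot(segment_embedding, print_vec) /
                    (np.linalg.norm(segment_embedding) * np.linalg.norm(print_vec)))
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= threshold else None
```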
  • the robot may be configured to detect and monitor speech patterns of interest.
  • the speech patterns of interest may be detected using, for example, natural language processing.
  • the speech patterns of interest may be detected based on an identification of certain keywords that are mentioned or communicated during the one or more virtual calls or meetings.
  • the speech patterns of interest may be detected based on audio signals that are detected during a virtual call or meeting, or other audio communications made during the virtual call or meeting.
  • Such audio communications may include communications made by an attendee, audio signals generated by a computer program, or audio signals generated by content (e.g., audio content or visual content) that is being shared by an attendee of the virtual call or meeting.
  • the robot may be configured to detect (i) one or more questions from an attendee and/or (ii) one or more emotions or behavioral intentions of the attendee, based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings. In some cases, the robot may be configured to determine (i) one or more next steps or action items, (ii) one or more actions to be taken in the present or the future, and/or (iii) one or more tasks to be completed, based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings.
  • the software agent or robot may be configured to generate one or more meeting recordings or transcripts and selectively distribute the one or more meeting recordings or transcripts to one or more other entities.
  • the one or more meeting recordings or transcripts may be automatically generated as a meeting or call progresses and the robot detects audio signals or other communications by an attendee or a participant.
  • the robot may be configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for a topic, keyword, or phrase of interest. In some cases, the robot may be configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for an emotion, intention, keyword, topic, or sentiment of interest.
  • the robot may be configured to generate one or more comments or insights based on the content of the one or more virtual calls or meetings.
  • the comments or insights may comprise, for example, a summary of a discussion topic, an identification of an action item or a matter to be further researched or analyzed, or any other information that can be derivable based on meeting discussions or the information exchanged during the virtual call or meeting.
  • Such comments or insights may be provided directly to a user or an administrator of the robot or software agent.
  • the comments or insights may be mapped or linked to certain select portions of a transcript documenting the conversations or discussions conducted during the virtual call or meeting. In such cases, the user or the administrator of the robot or software agent can review the comments or insights and also reference the exact conversations or discussions that were used to derive the comments or insights.
  • FIG. 1 illustrates an example of a robot 101 that may be configured to generate one or more outputs 103 based on one or more inputs 102 provided to the robot 101.
  • the robot 101 may comprise a software agent or any other type of executable computer program that can leverage or utilize the processing power or capabilities of a computing device (e.g., a computer, a processor, a logic circuit, etc.) to perform one or more actions or tasks.
  • the robot may perform the one or more actions or tasks by generating one or more outputs 103.
  • the one or more outputs 103 may be based on the one or more inputs 102 provided to the robot 101.
  • the one or more outputs 103 may comprise, for example, one or more meeting analytics as described elsewhere herein. In some cases, the one or more outputs 103 may comprise information or data requested by a user. In some embodiments, the one or more inputs 102 may comprise, for example, an instruction or a command to attend a virtual call or a meeting. In some cases, the instruction or command may be associated with a user’s input to schedule the robot 101 to attend one or more virtual calls or meetings. In some cases, the one or more inputs 102 may comprise an instruction or a command to execute a task.
  • the task may be to find and/or analyze one or more data sources, and to generate one or more inferences or to provide one or more deliverables based on the analysis of the one or more data sources.
  • the data sources may comprise local resources (e.g., documents or other electronic files) or online resources such as a webpage or other virtual platform that contains data or information of interest to the user.
  • the one or more outputs 103 of the robot 101 may be provided to a user.
  • the user may be an operator, an owner, or an administrator of the robot 101.
  • the one or more outputs 103 may be provided to a computing unit 104 that is accessible by the user.
  • the computing unit 104 may comprise a display or a virtual interface for 2D or 3D rendering of the one or more outputs 103.
  • FIG. 2 illustrates an example of a virtual platform comprising a user interface 201.
  • the user interface 201 may allow a user to select a recording of a virtual call or meeting.
  • the virtual call or meeting 202 may be recorded using a software agent (e.g., a robot).
  • the robot may be configured to automatically record, transcribe, and analyze one or more virtual calls or meetings 202.
  • the user interface 201 may be configured to present a user with a list of virtual calls or meetings 202 that have been recorded by the robot.
  • the user interface 201 may also provide, for each meeting instance, the name of the meeting, the meeting date, the meeting topic, the meeting duration, the attendees of the meeting and/or their corporate affiliations, and information on whether the recordings have been shared with certain individuals.
  • the user interface 201 may permit filtering of the recordings of the virtual calls or meetings 202.
  • FIG. 16 shows another example of a graphical user interface (GUI) for managing meeting records.
  • the GUI may display a summary of recorded meetings.
  • the summary may include information indicating the type of the meeting data (e.g., audio, video, transcript, comment/chat collected during or related to a digital meeting, attachment transmitted during a meeting, etc.).
  • an entry of the summary may display an icon 1601 indicating that the meeting data is a video recording, along with the number of chats/comments 1603 and/or the number of attachments 1605.
  • the system of record may provide a summary of the characteristics of one or more meetings.
  • the platform may permit users to search for meeting characteristics based on keywords, attendee names, title, gender, age, company name, company size, date of meeting, and/or analyzed features such as emotions, sentiments, matching voice prints, or other biometric indicators.
  • FIG. 3 illustrates a user interface 301 configured to display meeting analytics 302 that can be generated by the robot for a virtual call or meeting.
  • the user interface 301 may display video 303 recorded for the virtual call or meeting.
  • the video 303 may be of one or more participants or attendees of the virtual call or meeting.
  • the meeting analytics 302 may comprise information on each participant or attendee.
  • the participants or attendees of a virtual call or meeting may be identified automatically.
  • the participants or attendees may be identified based on speech patterns detected during the virtual call or meeting.
  • the participants or attendees may be identified using speaker diarisation, which involves partitioning an input audio or video stream into segments according to speaker identity. Speaker diarisation may be used to answer the question of "who spoke when?"
  • speaker diarisation may utilize a combination of speaker segmentation (i.e., finding speaker change points in an audio or video stream) and speaker clustering (i.e., grouping together speech segments on the basis of speaker characteristics).
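  • A minimal diarisation sketch in this spirit is shown below, clustering per-segment speaker embeddings so that each chronological segment receives an anonymous speaker label; the embeddings and cluster count are assumed inputs, and the clustering choice is illustrative rather than the disclosed algorithm.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def diarise(segment_embeddings, num_speakers):
    """Group per-segment speaker embeddings (one row per speech segment, in
    chronological order) so each segment receives an anonymous speaker label,
    answering "who spoke when?" without knowing speaker identities in advance."""
    labels = AgglomerativeClustering(n_clusters=num_speakers).fit_predict(
        np.asarray(segment_embeddings)
    )
    return [f"Speaker {label + 1}" for label in labels]

# Toy 2-D embeddings for six segments produced by two speakers
print(diarise([[0.10, 0.90], [0.12, 0.88], [0.90, 0.10],
               [0.88, 0.12], [0.11, 0.91], [0.89, 0.09]], num_speakers=2))
```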
  • the user interface 301 may identify and label each participant or attendee as well as the times at which each participant or attendee spoke.
  • the meeting analytics 302 may group the attendees by their corporate affiliations. This may provide a user with additional information on when different attendees associated with a same company or group spoke relative to each other.
  • the meeting analytics 302 may comprise information on conversational patterns, such as when each participant or attendee spoke.
  • the meeting analytics 302 may also indicate when the host spoke.
  • the meeting analytics 302 may provide additional information on conversational patterns, such as perceived sentiment or emotion of the speakers during the virtual call or meeting.
  • the user interface 301 may permit a user to add or remove individual participants or attendees from the meeting analytics 302. Doing so may update the conversational patterns to only show when certain select subsets of participants or attendees spoke.
  • FIG. 4 shows an exemplary user interface 401 configured to display information on discussions during the meetings or comments 402 made during the meeting.
  • the user interface 401 may provide information on when each individual participant or attendee spoke and the substance of the discussions or any comments 402 made during the meeting by the robot or the individual participants or attendees of the meeting.
  • the user interface 401 may provide a discussion tab that is configured to display the communications made at a particular time instance during the virtual call or meeting.
  • the communications made at the particular time instance may comprise visual communications, verbal communications (e.g., speech), and/or text-based communications made by a participant or attendee.
  • a particular time instance of interest may be selected by a user viewing the meeting analytics.
  • the user interface 401 may permit a user to select the particular time instance by choosing to examine a portion of a meeting timeline associated with the virtual call or meeting.
  • the meeting timeline may comprise a timeline that corresponds to the duration of the virtual call or meeting.
  • the user may select the particular time instance by pressing or clicking on a specific portion or section of the meeting timeline, or by scrubbing along the meeting timeline with a cursor, a play head, or a play bar to the desired time instance of interest.
  • the chats/comments collected during the meeting may include chat with the robot.
  • FIG. 17 and FIG. 18 show an example of a meeting interface.
  • the meeting interface 1700 can be provided by any existing meeting software.
  • the robot may be capable of attending or being seamlessly integrated into a variety of platforms, including existing meeting platforms and/or meeting software and calendar or scheduling software. Methods and systems provided herein may allow a robot to collect meeting data without the need to modify the interface or appearance of existing meeting software.
  • the virtual agent or robot may attend a meeting like any other user.
  • the robot may attend and be admitted 1701 to a current meeting.
  • Any number of robots 1801, 1803 can attend a meeting.
  • the robot may attend a meeting with or without the attendance of its owner or administrator 1807.
  • Any participant or attendee of the meeting may interact with the robot, such as by conducting in-meeting chat with the robot.
  • the meeting chat 1805 function may be a feature provided by the meeting software.
  • the chat may be recorded as part of the meeting data.
  • FIG. 5 and FIG. 6 show the key moments 500 associated with a virtual call or meeting.
  • the key moments 500 may indicate the emotion, sentiment, or intention of a speaker.
  • the key moments 500 may be arranged along a meeting timeline with one or more bars indicating a positive or negative emotion, sentiment, or intention.
  • the bars may be color coded to differentiate between positive and negative emotions, sentiments, or intentions.
  • the magnitude of the bars may vary depending on how positive or negative the detected emotion, sentiment, or intention is.
  • the user interface may allow a user to interact with each key moment 500 (e.g., by pressing or clicking on the key moment of interest) to see the specific communications made during the key moments 500.
  • the user interface may provide a timeline for each individual participant or attendee, and may display one or more signals indicating a particular type of speech pattern detected.
  • the speech pattern may comprise, for example, communications relating to next steps, pain points, comments, or questions.
  • the signals may comprise icons that are arranged along the timeline for each individual participant or attendee, and may indicate a time at which the participant or attendee made a communication that relates to next steps, pain points, comments, or questions.
  • the meeting analytics shown in the user interface may be filterable to provide information on when each type of speech pattern was detected, regardless of the speaker. In other cases, the meeting analytics may be filterable to provide information on when a particular user made a communication containing a speech pattern of interest.
  • the user interface may comprise a moments tab 600 that allows a user to view and/or analyze various key moments 500.
  • the key moments 500 may be filterable by the type of key moment or the type of speech pattern detected.
  • the type of key moment or the type of speech pattern detected may correspond to, for example, next steps, pain points, comments, or questions.
  • the key moments 500 may be associated with a particular participant who initiated or contributed to the key moments, a time stamp indicating when the key moments occurred, and information on or a transcription detailing the next steps, pain points, comments, or questions associated with the key moments that have been detected or identified.
  • the moments tab 600 may also provide information on the individual who will be assigned to perform the next steps associated with a key moment 500.
  • the moments tab 600 may provide information on the reactions of different participants or individuals to a particular key moment 500.
  • FIG. 7 shows the engagement information 700 for the virtual call or meeting.
  • the engagement information may comprise information on the level, amount, or frequency of exchange between two or more participants.
  • the engagement information 700 may be based on the frequency at which different speakers switch between active speech and passive listening.
  • the engagement information 700 may be displayed as one or more metrics along a meeting timeline. The metrics may increase when engagement is higher, and may decrease when engagement is lower. In some cases, the engagement information 700 may be based on a level of alertness or attention of the participants or attendees.
  • the level of alertness or attention may be derived from speech patterns (e.g., a time to respond to a question) or based on a visual analysis of a video of the virtual call or meeting (e.g., if a participant or attendee is within a field of view of a camera or video camera, or if the participant or attendee’s gaze is focused on their screen or elsewhere).
  • FIG. 8 shows a portion of an exemplary user interface 801 that may be configured to display a transcript 802 of the call.
  • the transcript 802 may comprise a chronological listing of all communications made by each participant or attendee of the virtual call or meeting.
  • the transcript 802 may be automatically produced by the robot using machine learning, artificial intelligence, and/or natural language processing.
  • the transcript 802 may be searchable by the user. In some cases, the user may scroll through the transcript 802 to view the conversations or discussions that took place at a select time period or time instance of interest. In some non-limiting embodiments, various portions of the transcript 802 may be annotated or marked to indicate certain key moments of interest.
  • FIG. 9 and FIG. 10 show an exemplary user interface 901 for scheduling the software agent or robot to attend one or more meetings.
  • the one or more meetings may already have a set time and date.
  • the one or more meetings may be tentatively scheduled for a first time and date, and may be rescheduled for a second time or date.
  • the robot may be configured to attend the virtual call or meeting at the second time or date.
  • the robot may be configured to coordinate or perform the actual scheduling of the virtual call or meeting, with the understanding or knowledge that the call or meeting should take place within a certain time frame and should be focused on certain key topics. Once the meeting or call is scheduled, the robot may then attend the meeting or call at the scheduled time and date.
  • a robot may be scheduled to attend a future meeting via one or more channels.
  • a robot may be connected to a selected future meeting by adding a meeting link 1001 (e.g., adding a Zoom link), by sending a calendar invite 1003 with a meeting link, and/or by syncing with a calendar application (e.g., automatically identifying calendar events with meetings) 1005.
  • the user interface 901 may display calendar events from a user’s calendar.
  • the user’s calendar may be linked to the platform.
  • the software may automatically identify one or more meetings based on calendar data.
  • the user may affirmatively select one or more identified meetings 903 for the robot or software agent to attend.
  • a robot may not attend a meeting 905 that is not selected by a user.
  • the user interface 901 may display a list of upcoming meetings associated with a user’s calendar or schedule.
  • the user interface 901 may permit a user to select one or more meetings 903 for the robot or software agent to attend (e.g., by using a toggle element).
  • the robot or software agent may then attend and automatically record the meetings, and then deliver one or more meeting insights to the user’s account.
  • the user interface 901 may allow a user to invite the robot to a meeting that is currently ongoing. In such cases, the user may provide a link to the meeting for the robot to attend. In some cases, the user may designate a name for the meeting for future reference. In some cases, the robot may be invited to a meeting via a calendar invite. In some cases, the robot or software agent may sync with a user’s calendar or schedule and attend one or more meetings associated with the user’s calendar or schedule. In some cases, the robot or software agent may attend every meeting on the user’s calendar or schedule, or a subset or particular selection of meetings in the user’s calendar or schedule.
  • the robot or software agent may be configured to join a particular subset or selection of meetings based on one or more rules set by the owner or the administrator of the robot or software agent. For example, a user may set up rules such that the robot only attends meetings when the user is a host, when the user is a participant, when the user is not participating, when a given person is an attendee of a meeting, when the meeting is of a particular type, and the like.
  • calendar metadata may be processed to extract meeting information.
  • the calendar metadata may comprise a plurality of metadata attributes for a meeting such as meeting duration (e.g., time in minutes), meeting start/end time, meeting location, meeting type, event type, attendees, host, or others.
  • the calendar metadata may be used to determine a rule for the robot to attend a meeting.
  • FIG. 19 shows examples of rules 1900 provided within the user interface for a user to set up rules for a robot to attend a meeting.
  • the rules may include attending any calendar event with a meeting link, attending only calendar events where the user is the host, attending only internal meetings, attending only external meetings, etc.
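  • The sketch below illustrates how such rules might be evaluated against calendar metadata to decide whether the robot joins an event; the rule names, event fields, and internal-domain heuristic are assumptions for illustration.

```python
def should_attend(event, rules, user_email, internal_domain="example.com"):
    """Decide whether the robot joins a calendar event based on user-defined
    rules and calendar metadata (an assumed dict with 'meeting_link', 'host',
    and 'attendees'). Rule names here are illustrative, not the disclosed set."""
    if rules.get("any_event_with_link") and event.get("meeting_link"):
        return True
    if rules.get("only_when_user_is_host") and event.get("host") != user_email:
        return False
    is_internal = all(a.endswith("@" + internal_domain) for a in event.get("attendees", []))
    if rules.get("only_internal_meetings") and not is_internal:
        return False
    if rules.get("only_external_meetings") and is_internal:
        return False
    return bool(event.get("meeting_link"))  # by default, join anything with a link

# Example usage with an assumed calendar event
event = {"meeting_link": "https://example.com/meet/123",
         "host": "alice@example.com",
         "attendees": ["alice@example.com", "bob@partner.org"]}
print(should_attend(event, {"only_when_user_is_host": True}, "alice@example.com"))
```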
  • the user interface 901 may not or need not be initially configured to display the upcoming meetings for a user or an administrator until the user or administrator links his or her calendar to the robot for syncing.
  • the scheduler tab of the user interface may be updated to show a listing of all upcoming meetings that can be attended by the robot or the software agent on the user’s behalf.
  • the user interface 901 may comprise a panel for a user to quickly access the scheduler function or to switch between a scheduler tab and one or more other tabs (e.g., a meetings tab).
  • the user interface 901 may allow a user to direct or instruct the robot to automatically share the meeting transcript and/or any meeting analytics with specific third parties (e.g., parties who did not attend the virtual call or meeting).
  • the provided system can be integrated with any other messaging applications or electronic mail (email) software.
  • a message including a summary of the meeting information may be delivered to a user via a messaging account or email address that is connected to the platform.
  • the message may include a brief summary of the meeting, such as the speakers, each speaker's percentage of participation in the meeting, and a headline and summary of the meeting.
  • the message may be sent to the user within the provided application.
  • FIG. 11 shows an exemplary user interface 1101 for managing one or more keywords 1102 to be detected or analyzed by the robot or software agent. Keywords 1102 may be entered and set in advance by a user and/or organized into one or more groups. For example, a user may assign one or more keywords or keyword groups to a scheduled meeting.
  • the robot or software agent may be configured to automatically look for the keywords 1102 or any keyword groups that come up in each virtual call or meeting.
  • the robot or software agent may focus on meeting content related to the topic such as by collecting additional data (e.g., visual in addition to audio) related to the relevant content during the meeting (e.g., to save bandwidth or memory).
  • the meeting data to be collected or recorded may not change based on the assigned keyword/topic.
  • the keywords 1102 or keyword groups 1103 may be customized, modified, or otherwise adjusted on a meeting-by-meeting basis.
  • the user interface may permit a user to generate new groups of keywords, add additional keywords to a group, or modify or edit the keywords within one or more groups.
  • the keywords 1102 may be color coded based on user preference, by subject matter or topic, or based on a level of importance.
  • a user may create or add a new keyword group 1104.
  • a keyword group 1103 may include one or more keywords 1106.
  • a user may add or create a new keyword, such as by clicking on a graphical element 1105.
  • a new keyword or keyword group may be manually inputted by a user. For example, by clicking on the button 1105, a user may be prompted to type a word, and/or a topic (e.g., word or phrase).
  • a candidate or recommended keyword or keyword group may be displayed on the GUI.
  • a recommended keyword may be displayed upon a user clicking on the button 1105 and a user may choose to modify, accept or reject the keyword.
  • the recommended keyword/topic may be automatically generated by the system, such as by utilizing a model trained with a machine learning algorithm.
  • the model may be trained using training data.
  • the training data may be created from data such as past topics; meeting data related to the user, an organization, a field, or an industry; a public knowledge base; and the like.
  • FIG. 12 shows a keyword tab 1201 of the meeting analytics user interface.
  • the keyword tab 1201 may display a list of keywords 1202 and one or more markers indicating when the keywords were communicated during the call or meeting.
  • the markers may be arranged along a timeline corresponding to the call or meeting.
  • the markers may be color coded to visually indicate additional information (e.g., relevance, sentiment, importance, etc.) about the communication containing the keywords.
  • the user interface may allow a user to add additional keywords for analysis, and the keyword tab 1201 may be updated to display additional markers for the newly added keywords.
  • the meeting analytics may be updated as new keywords are added.
  • the user interface may provide an interactive element 1203 that allows a user to modify the keyword settings or keyword groupings used. For example, upon clicking on the keyword setting element 1203, a user may be permitted to assign one or more topics or keywords (groups) to a selected upcoming meeting, or a selected recorded meeting for post-meeting analytics.
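One way to realize the keyword groups and timeline markers described in the preceding items is to scan a timestamped transcript for the configured keywords and record when, and by whom, each one was spoken. A minimal sketch with hypothetical data structures (group names, timestamps, and utterances are made up for illustration):

```python
from collections import defaultdict

# Hypothetical keyword groups configured in advance by a user.
keyword_groups = {
    "pricing": ["discount", "quote", "contract"],
    "competition": ["competitor", "alternative"],
}

# Hypothetical timestamped transcript: (seconds from meeting start, speaker, utterance).
transcript = [
    (65, "Alice", "Can we revisit the discount on the annual contract?"),
    (190, "Bob", "A competitor offered a similar alternative last week."),
]

def find_keyword_markers(transcript, keyword_groups):
    """Return, for each keyword, the timeline positions where it was spoken."""
    markers = defaultdict(list)
    for t, speaker, text in transcript:
        lowered = text.lower()
        for group, keywords in keyword_groups.items():
            for kw in keywords:
                if kw in lowered:
                    markers[kw].append({"time_s": t, "speaker": speaker, "group": group})
    return dict(markers)

print(find_keyword_markers(transcript, keyword_groups))
```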
  • FIG. 13 and FIG. 14 show a sentiment score 1301 that can be assigned to or associated with a virtual call or meeting and/or one or more communications made during the virtual call or meeting.
  • the sentiment score 1301 may comprise a metric that can be displayed along a discussion timeline associated with one or more participants or attendees of the virtual call or meeting.
  • the speech patterns for the participants or attendees may each have a respective sentiment score associated therewith.
  • the sentiment scores 1301 may be color coded to indicate speech patterns that are positive, negative, or neutral.
  • the meeting analytics user interface may comprise a section that displays key moment information 1401.
  • the key moment information 1401 may include information on the overall sentiment 1402 of the participants or the attendees of the virtual call or meeting.
  • the overall sentiment 1402 may be displayed as a color coded metric along a meeting timeline.
  • the overall sentiment 1402 may be determined based on an aggregation or a combination of the sentiment data for each individual participant or attendee.
  • the user interface may be configured to provide overall metrics for a virtual call or meeting.
  • the user interface may be configured to provide an overall sentiment score.
  • the overall sentiment score may indicate a relative distribution or ratio of positive, negative, and neutral sentiments for a call or meeting.
  • the user interface may also be configured to display or provide an engagement score, metrics on host talk time, and a listing of the most common keywords detected for a virtual call or meeting.
  • Suitable data processing techniques such as voice recognition, facial recognition, natural language processing, sentiment analysis and the like may be employed to process the meeting data (e.g., audio, visual, textual, etc.).
  • sentiment analysis, which utilizes a trained model to identify and extract opinions, may be applied to audio and/or chat messages.
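An overall sentiment score of the kind described above could, for instance, be derived by aggregating per-utterance sentiment labels into a relative distribution of positive, negative, and neutral sentiment, per meeting and per speaker. A minimal sketch with hypothetical labels (a production system would obtain the labels from a trained sentiment model):

```python
from collections import Counter

# Hypothetical per-utterance sentiment labels, ordered along the meeting timeline:
# (seconds from meeting start, speaker, label).
utterance_sentiments = [
    (30, "Alice", "positive"),
    (95, "Bob", "neutral"),
    (160, "Bob", "negative"),
    (240, "Alice", "positive"),
]

def overall_sentiment(utterance_sentiments):
    """Aggregate per-utterance sentiment into a relative distribution for the meeting."""
    counts = Counter(label for _, _, label in utterance_sentiments)
    total = sum(counts.values())
    return {label: counts[label] / total for label in ("positive", "negative", "neutral")}

def per_speaker_sentiment(utterance_sentiments):
    """Aggregate sentiment separately for each participant or attendee."""
    by_speaker = {}
    for _, speaker, label in utterance_sentiments:
        by_speaker.setdefault(speaker, Counter())[label] += 1
    return {speaker: dict(counts) for speaker, counts in by_speaker.items()}

print(overall_sentiment(utterance_sentiments))    # e.g. {'positive': 0.5, 'negative': 0.25, 'neutral': 0.25}
print(per_speaker_sentiment(utterance_sentiments))
```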
  • the model for generating summary of a discussion topic, identifying an action item, insights, and/or sentiment analysis may be based on a Transformer trained on large text corpora.
  • the Transformer may be BERT or a Generative Pre-trained Transformer (GPT) using masked language modeling as the training task.
  • the Transformer model (e.g., GPT) may take transformer model embeddings and generate outputs from them.
  • the training process may be performed with a large number of parameters, attention layers, and large batch sizes.
  • a transformer may consist of several encoder blocks and/or decoder blocks.
  • Each encoder block contains a self-attention layer and a feed-forward layer.
  • each decoder block contains an encoder-decoder attention layer in addition to the self-attention and feed-forward layers.
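For reference, a single encoder block of the kind described above (self-attention followed by a feed-forward layer, with residual connections and layer normalization) can be sketched in PyTorch as follows. This is a generic illustration of the block structure, not the specific architecture used by the disclosed system:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One transformer encoder block: self-attention and feed-forward sublayers."""

    def __init__(self, d_model: int = 256, n_heads: int = 4, d_ff: int = 1024):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Self-attention sublayer with residual connection.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Position-wise feed-forward sublayer with residual connection.
        x = self.norm2(x + self.ff(x))
        return x

# Example: a batch of 2 sequences, 10 tokens each, embedding size 256.
tokens = torch.randn(2, 10, 256)
print(EncoderBlock()(tokens).shape)  # torch.Size([2, 10, 256])
```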
  • the model may be a multimodal model that can take multimodal input data such as raw RGB frames of videos, audio waveforms, text transcripts of the speech audio and the like.
  • the model may be created by fine-tuning a pre-trained transformer model based on the multimodal personal and private meeting data.
  • the term “multimodal,” as utilized herein, may generally refer to multisource modal information and different forms of data, such as visual, audio, or text data, etc.
  • the fine-tuning 1020 of the Transformer model on the multimodal personal and private data may comprise updating the weights and biases of the pre-trained model 1013 using the private and personal data.
  • the model framework may include any suitable structures or architecture.
  • the model framework may include a multi-stream structure using an extra transformer layer to encode inter relationship of multi-modal information.
  • the fine-tuning process may comprise linearly projecting each modality into a feature vector and feeding it into a Transformer encoder.
  • a semantically hierarchical common space may be employed to account for the granularity of different modalities.
  • a Noise Contrastive Estimation may be used as the loss objective to train the model.
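The fine-tuning step described above — linearly projecting each modality into a common feature space and training with a noise contrastive estimation objective — could be sketched roughly as follows. The dimensions and the InfoNCE-style loss shown here are illustrative assumptions, not the exact objective used by the disclosed system:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityProjector(nn.Module):
    """Linearly project text, audio, and video features into a shared embedding space."""

    def __init__(self, text_dim=768, audio_dim=128, video_dim=512, shared_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)
        self.audio_proj = nn.Linear(audio_dim, shared_dim)
        self.video_proj = nn.Linear(video_dim, shared_dim)

    def forward(self, text, audio, video):
        return (F.normalize(self.text_proj(text), dim=-1),
                F.normalize(self.audio_proj(audio), dim=-1),
                F.normalize(self.video_proj(video), dim=-1))

def info_nce(anchor, positive, temperature=0.07):
    """Contrastive (NCE-style) loss: matching pairs within a batch are positives,
    all other pairings in the batch are treated as negatives."""
    logits = anchor @ positive.t() / temperature   # (batch, batch) similarity matrix
    targets = torch.arange(anchor.size(0))         # i-th anchor matches i-th positive
    return F.cross_entropy(logits, targets)

# Example: align text and video embeddings for a batch of 8 meeting segments.
proj = ModalityProjector()
text_emb, audio_emb, video_emb = proj(torch.randn(8, 768), torch.randn(8, 128), torch.randn(8, 512))
loss = info_nce(text_emb, video_emb)
print(loss.item())
```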
  • the model framework may include a text transformer (e.g., BERT or GPT) to embed discrete text features, a visual transformer that takes in the continuous video features, and a third cross-modal transformer to embed mutual information between the two modalities.
  • training the model may comprise adjustments to a pre-trained transformer (e.g., unidirectional Transformer) structure to adapt to multi-modal processing and temporal modeling.
  • the model framework may share the weights of the self-attention layer across different modalities, but keep the tokenization and linear projection layers independent for each modality.
  • different attention masks may be used to accommodate downstream tasks that require different modalities.
  • the adjustments may be based on the objective of the task.
  • the lower layers of the Transformer may be changed as little as possible to enable the model to learn basic semantic information from the input text, while the top layers may be allowed to be more strongly influenced by style factors (e.g., tone of voice and facial expressions).
  • FIG. 15 shows a computer system 1501 that is programmed or otherwise configured to implement a method for analyzing virtual calls or meetings.
  • the computer system 1501 may be configured to, for example, (i) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user; (ii) detect one or more keywords communicated during the one or more virtual calls or meetings; and (iii) generate meeting analytics based at least in part on the one or more detected keywords.
  • the computer system 1501 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
  • the electronic device can be a mobile electronic device.
  • the computer system 1501 may be configured to implement the presently disclosed methods with the aid of the robots and/or software agents described elsewhere herein.
  • the computer system 1501 may include a graphics processing unit (GPU) or a central processing unit (CPU, also "processor” and “computer processor” herein) 1505, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
  • the computer system 1501 also includes memory or memory location 1510 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1515 (e.g., hard disk), communication interface 1520 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1525, such as cache, other memory, data storage and/or electronic display adapters.
  • the memory 1510, storage unit 1515, interface 1520 and peripheral devices 1525 are in communication with the CPU 1505 through a communication bus (solid lines), such as a motherboard.
  • the storage unit 1515 can be a data storage unit (or data repository) for storing data.
  • the computer system 1501 can be operatively coupled to a computer network ("network") 1530 with the aid of the communication interface 1520.
  • the network 1530 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
  • the network 1530 in some cases is a telecommunication and/or data network.
  • the network 1530 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
  • the network 1530, in some cases with the aid of the computer system 1501, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1501 to behave as a client or a server.
  • the CPU 1505 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
  • the instructions may be stored in a memory location, such as the memory 1510.
  • the instructions can be directed to the CPU 1505, which can subsequently program or otherwise configure the CPU 1505 to implement methods of the present disclosure. Examples of operations performed by the CPU 1505 can include fetch, decode, execute, and writeback.
  • the CPU 1505 can be part of a circuit, such as an integrated circuit.
  • One or more other components of the system 1501 can be included in the circuit.
  • the circuit is an application specific integrated circuit (ASIC).
  • the storage unit 1515 can store files, such as drivers, libraries and saved programs.
  • the storage unit 1515 can store user data, e.g., user preferences and user programs.
  • the computer system 1501 in some cases can include one or more additional data storage units that are located external to the computer system 1501 (e.g., on a remote server that is in communication with the computer system 1501 through an intranet or the Internet).
  • the computer system 1501 can communicate with one or more remote computer systems through the network 1530.
  • the computer system 1501 can communicate with a remote computer system of a user (e.g., an owner or an administrator of the robot or the software agent).
  • remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
  • the user can access the computer system 1501 via the network 1530.
  • Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1501, such as, for example, on the memory 1510 or electronic storage unit 1515.
  • the machine executable or machine readable code can be provided in the form of software.
  • the code can be executed by the processor 1505.
  • the code can be retrieved from the storage unit 1515 and stored on the memory 1510 for ready access by the processor 1505.
  • the electronic storage unit 1515 can be precluded, and machine-executable instructions are stored on memory 1510.
  • the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
  • the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
  • aspects of the systems and methods provided herein can be embodied in programming.
  • Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
  • Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
  • Storage type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
  • another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
  • a machine readable medium, such as computer-executable code, may take many forms, including but not limited to a tangible storage medium, a carrier wave medium, or a physical transmission medium.
  • Non-volatile storage media including, for example, optical or magnetic disks, or any storage devices in any computer(s) or the like, may be used to implement the databases, etc. shown in the drawings.
  • Volatile storage media include dynamic memory, such as main memory of such a computer platform.
  • Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
  • Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
  • Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
  • Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
  • the computer system 1501 can include or be in communication with an electronic display 1535 that comprises a user interface (UI) 1540 for providing, for example, a portal for an end user to control or monitor a robot or a software agent as described elsewhere herein, and/or to view meeting analytics generated by the robot or software agent.
  • the portal may be provided through an application programming interface (API).
  • a user or entity can also interact with various elements in the portal via the UI. Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface.
  • the computer system 1501 can include or provide a voice user interface (VUI) that allows a user or entity to interact with the system through voice or speech commands.
  • the computer system 1501 can include or provide computer vision capabilities or functionalities that enable images and video inputs to be processed or analyzed by the robot (e.g., to generate meeting analytics as described elsewhere herein).
  • Such computer vision capabilities or functionalities can be implemented with the aid of one or more imaging sensors operatively coupled to the system.
  • An algorithm can be implemented by way of software upon execution by the central processing unit 1505.
  • the algorithm may be configured to (a) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user; (b) detect one or more keywords communicated during the one or more virtual calls or meetings; and (c) generate meeting analytics based at least in part on the one or more detected keywords.
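Taken together, steps (a) through (c) above describe a pipeline that could be orchestrated along the following lines. The function names and placeholder components below are hypothetical and only illustrate the order of operations, not the system's actual implementation:

```python
from typing import Callable, Dict, List

def analyze_meeting(audio_path: str,
                    transcribe: Callable[[str], List[dict]],
                    detect_sentiment: Callable[[str], str],
                    keywords: List[str]) -> Dict:
    """Transcribe a recorded meeting, detect keywords, and build basic analytics.

    `transcribe` is assumed to return timestamped segments such as
    {"start": 12.5, "speaker": "Alice", "text": "..."}.
    """
    segments = transcribe(audio_path)                     # step (a): transcription of the recording

    keyword_hits = []                                     # step (b): keyword detection
    for seg in segments:
        for kw in keywords:
            if kw.lower() in seg["text"].lower():
                keyword_hits.append({"keyword": kw, "time": seg["start"], "speaker": seg["speaker"]})

    talk_time = {}                                        # step (c): simple meeting analytics
    for seg in segments:
        talk_time[seg["speaker"]] = talk_time.get(seg["speaker"], 0) + len(seg["text"].split())
    sentiments = [detect_sentiment(seg["text"]) for seg in segments]

    return {"keyword_hits": keyword_hits, "words_per_speaker": talk_time, "sentiments": sentiments}

# Example usage with trivial stand-in components.
fake_transcribe = lambda path: [{"start": 0.0, "speaker": "Alice", "text": "Let us review the budget."}]
fake_sentiment = lambda text: "neutral"
print(analyze_meeting("meeting.wav", fake_transcribe, fake_sentiment, ["budget"]))
```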

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides systems and methods for attending and analyzing virtual calls or meetings. In one aspect, the present disclosure provides a system comprising a robot for autonomously attending one or more virtual calls or meetings as a distinct entity. The robot may be configured to: (a) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user; (b) detect one or more keywords communicated during the one or more virtual calls or meetings; and (c) generate meeting analytics based at least in part on the one or more detected keywords, unique audio signals, and/or visual cues from physical behavior.

Description

SYSTEMS AND METHODS FOR ATTENDING AND ANALYZING VIRTUAL MEETINGS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims the priority and benefit of U.S. Provisional Application No. 63/324,937, filed on March 29, 2022, the entirety of which is incorporated herein by reference.
BACKGROUND
[0002] Virtual calls and meetings can allow various parties to communicate seamlessly with each other from remote locations. In some instances, the physical distance between certain parties may make in person meetings challenging or impractical. Virtual calls or meetings may help to electronically bridge this physical distance and connect two parties to facilitate real time communications.
SUMMARY
[0003] Recognized herein are various limitations with virtual meeting software currently available. Currently, virtual meeting software requires the presence or attendance of a human being. If a user has multiple meetings with overlapping or conflicting schedules, the user may be unable to attend all of his or her meetings unless he or she reschedules some of the meetings for another time or day. Although virtual meeting software can assist with scheduling, currently available technology does not address the need for platforms or systems that allow a virtual entity to attend a meeting on behalf of a user.
[0004] The present disclosure addresses at least the abovementioned shortcomings of conventional systems and methods for attending virtual meetings. In various aspects and embodiments, the present disclosure provides systems and methods for attending and analyzing virtual meetings using a virtual entity. The virtual entity can be configurable to attend virtual meetings or calls without the need for a physical user to be present. The presently disclosed systems and methods may enhance a user’s ability to delegate or instruct a virtual entity to attend calls or meetings on the user’s behalf so that the user can spend his or her day more efficiently while still being privy to the content of the calls or meetings and/or any meeting analytics that can be derived from the content of the calls or meetings.
[0005] In one aspect, the present disclosure provides a robot for autonomously attending one or more virtual calls or meetings as a distinct entity. In some embodiments, the robot may comprise an independent software agent.
[0006] In some embodiments, the robot may be configured to (a) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user, (b) detect one or more keywords communicated during the one or more virtual calls or meetings, and/or (c) generate meeting analytics based at least in part on the one or more detected keywords.
[0007] In some embodiments, the robot is capable of being directed, scheduled, or instructed to attend the one or more virtual calls or meetings without a concurrent presence of an administrator or an owner of the robot. In some embodiments, the robot is configurable to attend a plurality of calls or meetings simultaneously and generate discrete meeting analytics for each of the plurality of calls or meetings. In some embodiments, the robot is configurable to attend a plurality of calls or meetings and to generate aggregated meeting analytics for the plurality of calls or meetings.
[0008] In some embodiments, the instruction by the user comprises an instruction for the robot to attend the one or more virtual calls or meetings at a scheduled time. In some embodiments, the instruction by the user comprises an instruction for the robot to execute one or more tasks at a predetermined or user-specified time.
[0009] In some embodiments, the one or more keywords are set or determined by the user in advance of the one or more virtual calls or meetings. In some embodiments, the robot is configured to listen for, perceive, or sense the one or more keywords. In some embodiments, the robot is configured to use the one or more keywords to generate a more accurate transcription of the one or more calls or meetings.
[0010] In some embodiments, the meeting analytics comprise a meeting summary based on one or more domain specific datasets. In some embodiments, the meeting analytics comprise a report indicating (i) when the one or more keywords were detected during the one or more virtual calls or meetings and (ii) an identity of an individual or entity that communicated the one or more keywords. In some embodiments, the meeting analytics indicate when each attendee of the one or more virtual calls or meetings spoke. In some embodiments, the meeting analytics are filterable to show when one or more select attendees of the one or more virtual calls or meetings spoke. In some embodiments, the meeting analytics comprise one or more key moments corresponding to a positive or negative emotion, intention, or sentiment of one or more attendees of the one or more virtual calls or meetings. In some embodiments, the meeting analytics comprise one or more key moments associated with a time when one or more attendees of the one or more virtual calls or meetings spoke. In some cases, the one or more key moments correspond to an emotion, intention, or sentiment of the one or more attendees. In some cases, the one or more key moments are determined based on a quantifiable measure corresponding to a magnitude of the emotion or intention of the one or more attendees. In some embodiments, the meeting analytics provide additional contextual information associated with the one or more key moments. In some cases, the additional contextual information comprises a transcription of a related portion of a conversation corresponding to the one or more key moments. In some embodiments, the meeting analytics comprise information on attendee engagement during the one or more virtual calls or meetings.
[0011] In some embodiments, the robot is configured to transcribe and/or analyze one or more audio communications made during the one or more virtual calls or meetings using natural language processing, artificial intelligence, digital signal processing, audio waveform analysis, and/or machine learning. In some embodiments, the robot is configured to identify speech patterns, conversational patterns, meeting patterns, and/or a progression of discussion topics, sentiment, or engagement for the one or more virtual calls or meetings on a time series basis. In some embodiments, the robot is configured to identify and analyze meeting patterns based on the one or more virtual calls or meetings. In some cases, the meeting patterns comprise information on a number of meetings attended during a time period, a characteristic of the meetings attended, and/or a sentiment associated with the meetings. In some embodiments, the robot is configured to identify and track attendees of the one or more virtual calls or meetings. In some embodiments, the robot is configured to track one or more verbal or physical actions by the attendees. In some embodiments, the robot is further configured to identify one or more characteristics of the attendees. In some cases, the one or more characteristics comprise demographics, job title, age, gender, geography, company name, or company size.
[0012] In some embodiments, the robot is configured to (i) identify voice prints for one or more attendees of the one or more virtual calls or meetings and (ii) determine an intention, an emotion, or a sentiment of the one or more attendees based on the voice prints.
[0013] In some embodiments, the robot is configured to detect one or more questions from an attendee and/or one or more emotions or behavioral intentions of the attendee based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings.
[0014] In some embodiments, the robot is configured to determine one or more next steps or action items, one or more actions to be taken in the present or the future, one or more questions asked, one or more questions answered, and/or one or more tasks to be completed, based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings.
[0015] In some embodiments, the robot is configured to dynamically filter attendees from the one or more virtual calls or meetings. In some embodiments, the robot is configured to detect one or more keyword patterns on a time-series basis. In some embodiments, the robot is configured to generate one or more meeting recordings or transcripts and selectively distribute the one or more meeting recordings or transcripts to one or more other entities. In some embodiments, the robot is configured to generate one or more comments or insights based on the content of the one or more virtual calls or meetings. In some embodiments, the robot is configured to generate an extractive meeting summary based on the content of the one or more virtual calls or meetings. In some embodiments, the robot is configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for a topic, keyword, or phrase of interest. In some embodiments, the robot is configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for an emotion, intention, keyword, topic, or sentiment of interest. In some embodiments, the robot is configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts based on a meeting characteristic. The meeting characteristic may comprise an attendee name, attendee age, attendee gender, attendee location or geography, entity name, entity location or geography, entity size, or any combination thereof. In some embodiments, the robot is configured to automatically schedule the one or more virtual calls or meetings on behalf of the user.
[0016] Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
[0017] Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
[0018] Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
INCORPORATION BY REFERENCE
[0019] All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
BRIEF DESCRIPTION OF THE DRAWINGS
[0020] The novel features of the invention are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present invention will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the invention are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:
[0021] FIG. 1 schematically illustrates a robot configured to perform one or more tasks or actions based on an input from a user, in accordance with some embodiments.
[0022] FIG. 2 schematically illustrates a virtual platform comprising a user interface for a user to select a recording of a virtual call or meeting, in accordance with some embodiments.
[0023] FIG. 3 schematically illustrates a user interface configured to display meeting analytics that can be generated for a virtual call or meeting, in accordance with some embodiments.
[0024] FIG. 4 schematically illustrates an exemplary user interface configured to display information on discussions during the meetings or comments made during the meeting, in accordance with some embodiments.
[0025] FIG. 5 and FIG. 6 schematically illustrate various key moments associated with a virtual call or meeting, in accordance with some embodiments.
[0026] FIG. 7 schematically illustrates engagement information for the virtual call or meeting, in accordance with some embodiments.
[0027] FIG. 8 schematically illustrates a portion of a user interface that may be configured to display a transcript of a call or meeting, in accordance with some embodiments.
[0028] FIG. 9 and FIG. 10 schematically illustrate a user interface for scheduling a software agent or robot to attend one or more meetings, in accordance with some embodiments.
[0029] FIG. 11 schematically illustrates an exemplary user interface for managing one or more keywords to be detected or analyzed by the robot or software agent, in accordance with some embodiments.
[0030] FIG. 12 schematically illustrates a keyword tab of a meeting analytics user interface, in accordance with some embodiments.
[0031] FIG. 13 and FIG. 14 schematically illustrate a sentiment score that can be assigned to or associated with a virtual call or meeting and/or one or more communications made during the virtual call or meeting, in accordance with some embodiments.
[0032] FIG. 15 schematically illustrates a computer system that is programmed or otherwise configured to implement methods provided herein.
[0033] FIG. 16 shows an example of a graphical user interface (GUI) for managing meeting records.
[0034] FIG. 17 and FIG. 18 show an example of a meeting interface.
[0035] FIG. 19 shows examples of rules provided within the user interface for a user to set up rules for the robot to attend a scheduled meeting.
[0036] FIG. 20 shows an example of integrating the platform with a messaging application.
DETAILED DESCRIPTION
[0037] While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
[0038] Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
[0039] Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.
[0040] The term “real time” or “real-time,” as used interchangeably herein, generally refers to an event (e.g., an operation, a process, a method, a technique, a computation, a calculation, an analysis, a visualization, an optimization, etc.) that is performed using recently obtained (e.g., collected or received) data. In some cases, a real time event may be performed almost immediately or within a short enough time span, such as within at least 0.0001 millisecond (ms), 0.0005 ms, 0.001 ms, 0.005 ms, 0.01 ms, 0.05 ms, 0.1 ms, 0.5 ms, 1 ms, 5 ms, 0.01 seconds, 0.05 seconds, 0.1 seconds, 0.5 seconds, 1 second, or more. In some cases, a real time event may be performed almost immediately or within a short enough time span, such as within at most 1 second, 0.5 seconds, 0.1 seconds, 0.05 seconds, 0.01 seconds, 5 ms, 1 ms, 0.5 ms, 0.1 ms, 0.05 ms, 0.01 ms, 0.005 ms, 0.001 ms, 0.0005 ms, 0.0001 ms, or less.
[0041] Robot / Software Agent
[0042] In one aspect, the present disclosure provides a system comprising a software agent (e.g., an independent software agent) or a robot. The software agent or robot may be implemented using a computing device. In some cases, the software agent or robot may comprise a computer program that can perform one or more actions or tasks for a user or another program. In some cases, the one or more actions or tasks can be executed by the software agent or robot based on one or more inputs or commands provided by a user or an administrator of the robot or software agent. In other cases, the one or more actions or tasks can be executed by the software agent or robot without user instruction, interaction or supervision.
[0043] In some embodiments, the software agent or robot may be configured to autonomously attend one or more virtual calls or meetings as a distinct entity. In some embodiments, the software agent or robot may be configured to (a) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user, (b) detect one or more keywords communicated during the one or more virtual calls or meetings, and/or (c) generate meeting analytics based at least in part on the one or more detected keywords. In some non-limiting embodiments, the software agent or robot may be configured to generate a meeting summary and/or an extractive summary, whereby certain specific transcribed comments can be identified based on characteristics of interest (e.g., keywords, key moments, emotions, or sentiments) and extracted in the aggregate to provide a summarization of the meeting.
[0044] In some embodiments, the software agent or robot may be configured to transcribe and/or analyze one or more audio communications made during the one or more virtual calls or meetings using natural language processing, artificial intelligence, signal processing (e.g., digital signal processing), and/or machine learning. In some embodiments, the software agent or robot may be configured to identify speech patterns, conversational patterns, meeting patterns, and/or a progression of discussion topics for the one or more virtual calls or meetings on a time series basis.
[0045] In some embodiments, the software agent or robot may be configured to identify and analyze meeting patterns for the one or more virtual calls or meetings. The meeting patterns may comprise information on a number of meetings attended during a time period, a characteristic of the meetings attended, and/or a sentiment associated with the meetings. In some embodiments, the characteristic of the meetings attended may comprise information on the amount of time one or more attendees spoke during a meeting, the number of attendees, and/or one or more characteristics or speech patterns of the attendees.
[0046] User Instruction / Delegation
[0047] In some embodiments, the software agent or robot may be configured to record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user. In some cases, the instruction by the user may comprise an instruction for the robot to attend the one or more virtual calls or meetings at a scheduled time. The robot may be capable of being directed, scheduled, or instructed to attend the one or more virtual calls or meetings without a concurrent presence of an administrator or an owner of the robot. In any of the embodiments described herein, the robot may be configurable to attend a plurality of calls or meetings simultaneously, and to generate discrete meeting analytics for each of the plurality of calls or meetings. In some cases, the discrete meeting analytics for each of the plurality of calls or meetings can be generated simultaneously in parallel. In any of the embodiments described herein, the meeting analytics can be displayed for a single meeting or in the aggregate for a plurality of meetings. In some cases, the meeting analytics for multiple meetings can be aggregated into a set of metrics, and the set of metrics can be displayed in a single viewing area of a graphical user interface. In other cases, the respective meeting analytics for multiple meetings can be independently displayed across multiple viewing areas of a graphical user interface.
[0048] In some cases, the robot may be configured to execute one or more tasks at a predetermined or user-specified time in response to one or more instructions provided by the user (e.g., the administrator or owner of the robot). In some cases, the one or more tasks may involve finding one or more data sources or data sets of interest, analyzing the one or more data sets of interest, and/or generating one or more inferences or providing one or more deliverables based on the analysis of the one or more data sets of interest. In some embodiments, the data sources may comprise local resources (e.g., documents or other electronic files) or online resources such as a webpage or other virtual platform that contains data or information of interest to the user. In some non-limiting embodiments, the one or more tasks may comprise autonomously executing a travel booking or autonomously reading a financial report such as a 10K or 10Q earnings report. In some other embodiments, the one or more tasks may involve attending a virtual call or meeting on behalf of the user. In any of the embodiments described herein, the one or more tasks may be pre-scheduled by the user for execution at a certain time or on a certain date, or may be assigned by the user to the robot for immediate execution (or execution within a predetermined time frame).
[0049] Scheduling
[0050] In some embodiments, the software agent or robot may be configured to automatically schedule the one or more virtual calls or meetings on behalf of the user. Such automatic scheduling may be performed based on contextual information that is provided to or accessible by the software agent or robot. In some cases, the contextual information may be derived from written or verbal communications involving one or more participants of the virtual call or meeting. In some cases, the written or verbal communications may comprise email exchanges or phone calls between a third party and a user or an administrator of the robot. In some cases, the contextual information may include mutual calendar availability provided to the robot by the one or more meeting attendees. The mutual calendar availability may correspond to a selection of time slots during which two or more meeting attendees are available.
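Determining mutual calendar availability as described above amounts to intersecting the free intervals of all attendees and keeping intervals long enough for the requested meeting. A minimal sketch, assuming hypothetical busy-time inputs per attendee:

```python
from datetime import datetime, timedelta

def free_slots(busy, day_start, day_end):
    """Return free intervals within [day_start, day_end] given a list of busy intervals."""
    slots, cursor = [], day_start
    for start, end in sorted(busy):
        if start > cursor:
            slots.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < day_end:
        slots.append((cursor, day_end))
    return slots

def mutual_availability(calendars, day_start, day_end, duration):
    """Intersect all attendees' free slots and keep those long enough for the meeting."""
    common = free_slots(calendars[0], day_start, day_end)
    for busy in calendars[1:]:
        other = free_slots(busy, day_start, day_end)
        merged = []
        for a0, a1 in common:
            for b0, b1 in other:
                lo, hi = max(a0, b0), min(a1, b1)
                if lo < hi:
                    merged.append((lo, hi))
        common = merged
    return [(lo, hi) for lo, hi in common if hi - lo >= duration]

# Example: Alice is busy 9-10, Bob is busy 9:30-11; find 30-minute mutual slots.
day = datetime(2023, 3, 29)
alice_busy = [(day.replace(hour=9), day.replace(hour=10))]
bob_busy = [(day.replace(hour=9, minute=30), day.replace(hour=11))]
print(mutual_availability([alice_busy, bob_busy],
                          day.replace(hour=9), day.replace(hour=17),
                          timedelta(minutes=30)))
```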
[0051] In other embodiments, the software agent or robot may be configured to schedule the one or more virtual calls or meetings based on a command or an instruction provided by the user or the administrator of the robot. The command or instruction may comprise information on the timing of the meeting, the location of the meeting, the content of the meeting, and/or the participants of the meeting. In some cases, the robot or software agent may be configured to designate one or more participants as required or optional. In some cases, the robot or software agent may be configured to distribute a meeting invite for one or more virtual calls or meetings. Such distribution may be performed automatically by the robot based on contextual information, or in response to a direct command or instruction from the user or the administrator of the robot.
[0052] Keywords
[0053] In some embodiments, the software agent or robot may be configured to detect one or more keywords communicated during the one or more virtual calls or meetings. The terms “keyword” and “topic” are utilized interchangeably throughout the specification. A topic may include one or more words or one or more phrases.
[0054] In some embodiments, the robot may be configured to detect one or more keyword or topic patterns on a time-series basis. In some cases, the one or more keywords can be set or determined by the user in advance of the one or more virtual calls or meetings. FIG. 11 shows an example of a graphical user interface (GUI) for a user to define, manage, and assign one or more keywords/topics to one or more meetings ahead of the scheduled meeting time. In some cases, the robot may be configured to listen for, perceive, or sense the one or more keywords. In some cases, the robot may be further configured to use the one or more keywords to generate or inform an accurate transcription of the one or more calls or meetings. In some embodiments, the software agent or robot may be configured to detect one or more keywords communicated across multiple virtual calls or meetings. In some cases, the software agent or robot may be configured to identify the specific virtual calls or meetings during which the one or more keywords were communicated or detected. This may allow a user to select or view a particular call or meeting of interest to better understand the context and conversations surrounding the keywords communicated or detected.
[0055] In some cases, the one or more keywords may be entered and set in advance by a user. In some cases, the one or more keywords can be organized into one or more groups (e.g., by topic, relevancy, or level of importance / interest). In some embodiments, the robot or software agent may be configured to automatically look for the keywords or any keyword groups that came up in each virtual call or meeting. In some cases, the keywords or keyword groups may be customized, modified, or otherwise adjusted on a meeting-to-meeting basis. The user interface may permit a user to generate new groups of keywords, to add additional keywords to a group, or to modify or edit the keywords within one or more groups.
[0056] Meeting Analytics
[0057] In any of the embodiments described herein, the software agent or robot may be configured to generate meeting analytics based at least in part on the one or more detected keywords. The one or more detected keywords may comprise at least a subset of the keywords that have been previously entered or set by the user or the administrator of the software agent or robot.
[0058] In some embodiments, the meeting analytics may comprise a meeting summary based on one or more domain specific datasets. The one or more domain specific datasets may be filterable by a user or an administrator of the software agent or robot.
[0059] In some cases, the one or more domain specific datasets may correspond to different domains (e.g., divisions, departments, or centers having different technical or business functions). The different domains may correspond to, for example, sales, marketing, engineering, legal, business, and the like.
[0060] In some cases, the one or more domain specific datasets may comprise data or information relating to or associated with various conversational patterns or features of interest. The various conversational patterns or features of interest may comprise, for example, key moments, attendee engagement, attendee sentiment, or any other pattern or feature of interest as described elsewhere herein. In some cases, the one or more domain specific datasets may comprise data or information relating to or associated with a topic of interest, an attendee or speaker of interest, or a time period of interest.
[0061] In some embodiments, the meeting analytics may comprise a report indicating (i) when the one or more keywords were detected during the one or more virtual calls or meetings and (ii) an identity of an individual or entity that communicated the one or more keywords. The report may be displayed to a user via a user interface. In some embodiments, the meeting analytics may indicate when each attendee of the one or more virtual calls or meetings spoke. In some cases, the meeting analytics may be displayed or organized relative to a chronological timeline of the one or more virtual calls or meetings.
[0062] In some embodiments, the meeting analytics may be filterable to show when one or more select attendees of the one or more virtual calls or meetings spoke. In some cases, the robot may be configured to dynamically filter the meeting analytics for actions, emotions, sentiments, or intentions of one or more select attendees of the one or more virtual calls or meetings. In some cases, the robot may be configured to dynamically filter the meeting analytics for key words, questions, or statements communicated by one or more select attendees of the one or more virtual calls or meetings. In some cases, the robot may be configured to dynamically filter the meeting analytics by attendee name or entity (e.g., the company that an attendee is representing, employed by, or otherwise affiliated with).
[0063] Key moments
[0064] In some embodiments, the meeting analytics may comprise one or more key moments for the one or more virtual calls or meetings. In some cases, the one or more key moments may correspond to a positive or negative emotion, intention, or sentiment of one or more attendees of the one or more virtual calls or meetings. The emotion, intention, or sentiment may indicate, for example, whether a participant agrees or disagrees with a certain topic or decision, whether the participant seems cooperative or receptive to a certain idea, whether the participant’s actions indicate agreement, trust, honesty, frustration, anger, ambivalence, or any other type of emotion, intention, or sentiment that can be inferred based on an action, a speech, or an appearance of the participant.
[0065] In some cases, the meeting analytics may comprise one or more key moments associated with a time when one or more attendees of the one or more virtual calls or meetings spoke. In some cases, the one or more key moments may correspond to an emotion, intention, or sentiment of the one or more attendees as described above. In some embodiments, the one or more key moments may be determined based on a quantifiable measure corresponding to a magnitude of the emotion or intention of the one or more attendees.
[0066] In some embodiments, the meeting analytics may provide additional contextual information associated with the one or more key moments. In some cases, the additional contextual information may comprise a transcription of a related portion of a conversation corresponding to the one or more key moments. In some cases, the key moments may be mapped to or linked to or otherwise associated with a portion of a conversation as documented in the transcription.
[0067] Engagement
[0068] In some embodiments, the meeting analytics may comprise information on attendee engagement during the one or more virtual calls or meetings. In some cases, the engagement information may be based on a level of alertness or attention of the participants or attendees. The level of alertness or attention may be derived from speech patterns (e.g., a time to respond to a question) or based on a visual analysis of a video of the virtual call or meeting (e.g., if a participant or attendee is within a field of view of a camera or video camera, or if the participant or attendee’s gaze is focused on their screen or elsewhere). In some embodiments, the engagement may be based on total time spoken by an attendee, a rate of words spoken, a total number of words spoken, or audio signal information such as pitch, tone, volume, and/or energy level.
[0069] In some cases, the level of engagement may be visualized for different portions or time periods of a virtual call or meeting. The level of engagement for each attendee or participant may change during the course of the virtual call or meeting, and the changes in the level of engagement may be monitored and tracked as the virtual call or meeting progresses.
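An engagement metric of the kind described in the two preceding paragraphs could, for example, combine talk time and speaking rate per attendee within each window of the meeting. The features and weights below are illustrative assumptions, not the system's actual engagement model:

```python
def engagement_scores(segments, window_s=300):
    """Compute a rough engagement score per attendee for each time window of the meeting.

    `segments` are timestamped utterances: (start_s, end_s, speaker, text).
    Score combines the fraction of the window spent speaking with the word rate.
    """
    scores = {}
    for start, end, speaker, text in segments:
        window = int(start // window_s)
        duration = max(end - start, 1e-6)
        words_per_sec = len(text.split()) / duration
        talk_fraction = duration / window_s
        key = (speaker, window)
        # Illustrative weighting: 80% talk time, 20% speaking rate (capped at 1).
        scores[key] = scores.get(key, 0.0) + 0.8 * talk_fraction + 0.2 * min(words_per_sec / 3.0, 1.0)
    return scores

segments = [
    (10, 70, "Alice", "Here is the agenda for today and the goals we want to hit."),
    (80, 95, "Bob", "Sounds good."),
]
for (speaker, window), score in engagement_scores(segments).items():
    print(f"{speaker}, window {window}: {score:.2f}")
```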
[0070] Attendee Analysis
[0071] In some embodiments, the robot may be configured to identify and track attendees of the one or more virtual calls or meetings. In some cases, the robot may be configured to track one or more verbal or physical actions by the attendees. In some cases, the robot may be configured to track when participants enter or leave the one or more virtual calls or meetings.
[0072] In some cases, the robot may be configured to identify one or more characteristics of the attendees. The one or more characteristics may comprise, for example, demographics, job title, age, gender, geography, company name, and/or company size. The one or more characteristics may be inferred or determined based on various actions by the attendees or an appearance of the attendees. In some embodiments, the robot may be configured to (i) identify voice prints for one or more attendees of the one or more virtual calls or meetings and (ii) determine an intention, an emotion, or a sentiment of the one or more attendees based on the voice prints. In some embodiments, the voice print may be used to accurately identify an attendee in a meeting. In other embodiments, the voice print may be used to distinguish between attendees in order to improve transcription accuracy and facilitate diarisation or automatic labeling of speakers.
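Voice-print based identification as described above typically compares a speaker embedding extracted from new audio against enrolled embeddings, for example by cosine similarity. A minimal sketch with made-up embedding vectors; a real system would obtain the embeddings from a speaker-encoder model, which is not shown here:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify_speaker(segment_embedding, enrolled_voiceprints, threshold=0.75):
    """Return the enrolled attendee whose voice print best matches the segment,
    or None if no match is confident enough (an unknown speaker)."""
    best_name, best_score = None, threshold
    for name, voiceprint in enrolled_voiceprints.items():
        score = cosine_similarity(segment_embedding, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy example: 4-dimensional "embeddings" standing in for real speaker-encoder output.
enrolled = {"Alice": np.array([0.9, 0.1, 0.0, 0.2]), "Bob": np.array([0.1, 0.8, 0.3, 0.0])}
segment = np.array([0.85, 0.15, 0.05, 0.25])
print(identify_speaker(segment, enrolled))  # "Alice"
```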
[0073] Speech Patterns of Interest
[0074] In some embodiments, the robot may be configured to detect and monitor speech patterns of interest. The speech patterns of interest may be detected using, for example, natural language processing. In some cases, the speech patterns of interest may be detected based on an identification of certain keywords that are mentioned or communicated during the one or more virtual calls or meetings. In some cases, the speech patterns of interest may be detected based on audio signals that are detected during a virtual call or meeting, or other audio communications made during the virtual call or meeting. Such audio communications may include communications made by an attendee, audio signals generated by a computer program, or audio signals generated by content (e.g., audio content or visual content) that is being shared by an attendee of the virtual call or meeting.
[0075] In some cases, the robot may be configured to detect (i) one or more questions from an attendee and/or (ii) one or more emotions or behavioral intentions of the attendee, based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings. In some cases, the robot may be configured to determine (i) one or more next steps or action items, (ii) one or more actions to be taken in the present or the future, and/or (iii) one or more tasks to be completed, based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings.
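Question and action-item detection of the kind described above could be approximated with simple language patterns before any learned model is applied. The patterns below are purely illustrative:

```python
import re

QUESTION_PATTERN = re.compile(
    r"\?\s*$|^\s*(who|what|when|where|why|how|can|could|should|will)\b", re.I)
ACTION_PATTERN = re.compile(
    r"\b(will|going to|need to|let's|follow up|by (monday|tuesday|friday|next week))\b", re.I)

def extract_questions_and_actions(transcript):
    """Flag utterances that look like questions or action items.

    `transcript` is a list of (speaker, utterance) pairs.
    """
    questions, actions = [], []
    for speaker, text in transcript:
        if QUESTION_PATTERN.search(text):
            questions.append((speaker, text))
        if ACTION_PATTERN.search(text):
            actions.append((speaker, text))
    return questions, actions

transcript = [
    ("Alice", "Can you share the updated forecast?"),
    ("Bob", "I will send the deck by Friday."),
]
print(extract_questions_and_actions(transcript))
```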
[0076] Transcripts
[0077] In any of the embodiments described herein, the software agent or robot may be configured to generate one or more meeting recordings or transcripts and selectively distribute the one or more meeting recordings or transcripts to one or more other entities. The one or more meeting recordings or transcripts may be automatically generated as a meeting or call progresses and the robot detects audio signals or other communications by an attendee or a participant.
[0078] In some embodiments, the robot may be configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for a topic, keyword, or phrase of interest. In some cases, the robot may be configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for an emotion, intention, keyword, topic, or sentiment of interest.
[0079] Robot Insights
[0080] In some embodiments, the robot may be configured to generate one or more comments or insights based on the content of the one or more virtual calls or meetings. The comments or insights may comprise, for example, a summary of a discussion topic, an identification of an action item or a matter to be further researched or analyzed, or any other information that can be derived from the meeting discussions or the information exchanged during the virtual call or meeting. Such comments or insights may be provided directly to a user or an administrator of the robot or software agent. In some cases, the comments or insights may be mapped or linked to certain select portions of a transcript documenting the conversations or discussions conducted during the virtual call or meeting. In such cases, the user or the administrator of the robot or software agent can review the comments or insights and also reference the exact conversations or discussions that were used to derive the comments or insights.
[0081] FIG. 1 illustrates an example of a robot 101 that may be configured to generate one or more outputs 103 based on one or more inputs 102 provided to the robot 101. In some embodiments, the robot 101 may comprise a software agent or any other type of executable computer program that can leverage or utilize the processing power or capabilities of a computing device (e.g., a computer, a processor, a logic circuit, etc.) to perform one or more actions or tasks. In some cases, the robot may perform the one or more actions or tasks by generating one or more outputs 103. In some cases, the one or more outputs 103 may be based on the one or more inputs 102 provided to the robot 101.
[0082] In some embodiments, the one or more outputs 103 may comprise, for example, one or more meeting analytics as described elsewhere herein. In some cases, the one or more outputs 103 may comprise information or data requested by a user. In some embodiments, the one or more inputs 102 may comprise, for example, an instruction or a command to attend a virtual call or a meeting. In some cases, the instruction or command may be associated with a user’s input to schedule the robot 101 to attend one or more virtual calls or meetings. In some cases, the one or more inputs 102 may comprise an instruction or a command to execute a task. The task may be to find and/or analyze one or more data sources, and to generate one or more inferences or to provide one or more deliverables based on the analysis of the one or more data sources. The data sources may comprise local resources (e.g., documents or other electronic files) or online resources such as a webpage or other virtual platform that contains data or information of interest to the user.
[0083] In some cases, the one or more outputs 103 of the robot 101 may be provided to a user. The user may be an operator, an owner, or an administrator of the robot 101. In some cases, the one or more outputs 103 may be provided to a computing unit 104 that is accessible by the user. In some cases, the computing unit 104 may comprise a display or a virtual interface for 2D or 3D rendering of the one or more outputs 103.
[0084] Examples of Graphical User Interface (GUI)
[0085] FIG. 2 illustrates an example of a virtual platform comprising a user interface 201. The user interface 201 may allow a user to select a recording of a virtual call or meeting
202. The virtual call or meeting 202 may be recorded using a software agent (e.g., a robot). The robot may be configured to automatically record, transcribe, and analyze one or more virtual calls or meetings 202. The user interface 201 may be configured to present a user with a list of virtual calls or meetings 202 that have been recorded by the robot. The user interface 201 may also provide, for each meeting instance, the name of the meeting, the meeting date, the meeting topic, the meeting duration, the attendees of the meeting and/or their corporate affiliations, and information on whether the recordings have been shared with certain individuals. The user interface 201 may permit filtering of the recordings of the virtual calls or meetings 202. Such filtering may be based on meeting topic, meeting date, attendees, or other user-definable factors / characteristics / properties of the virtual calls or meetings 202. The virtual platform may provide an interface and system of record for users to store, search, and/or retrieve digital records of meetings based on transcribed content for one or more meetings. FIG. 16 shows another example of a graphical user interface (GUI) for managing meeting records. As illustrated in the example, the GUI may display a summary of recorded meetings. The summary may include information indicating the type of the meeting data (e.g., audio, video, transcript, comment/chat collected during or related to a digital meeting, attachment transmitted during a meeting, etc.). For example, an entry of the summary may display an icon 1601 indicating that the meeting data is a video recording, the number of chats/comments 1603, and/or the number of attachments 1605.
[0086] The system of record may provide a summary of the characteristics of one or more meetings. The platform may permit users to search for meeting characteristics based on keywords, attendee names, title, gender, age, company name, company size, date of meeting, and/or analyzed features such as emotions, sentiments, matching voice prints, or other biometric indicators.
[0087] FIG. 3 illustrates a user interface 301 configured to display meeting analytics 302 that can be generated by the robot for a virtual call or meeting. In some cases, the user interface 301 may display video 303 recorded for the virtual call or meeting. The video 303 may be of one or more participants or attendees of the virtual call or meeting.
[0088] In some embodiments, the meeting analytics 302 may comprise information on each participant or attendee. The participants or attendees of a virtual call or meeting may be identified automatically. In some cases, the participants or attendees may be identified based on speech patterns detected during the virtual call or meeting. In some cases, the participants or attendees may be identified using speaker diarisation, which involves partitioning an input audio or video stream into segments according to speaker identity. Speaker diarisation may be used to answer the question of "who spoke when?" In some cases, speaker diarisation may utilize a combination of speaker segmentation (i.e., finding speaker change points in an audio or video stream) and speaker clustering (i.e., grouping together speech segments on the basis of speaker characteristics). In some cases, the user interface 301 may identify and label each participant or attendee as well as the times at which each participant or attendee spoke.
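As a non-limiting sketch, diarisation can be approximated by embedding fixed-length audio windows with a speaker-embedding model (assumed here) and clustering the embeddings, for example with k-means, so that consecutive windows with the same cluster label form speaker segments.

```python
# Illustrative diarisation sketch: embed each fixed-length audio window with a
# (hypothetical) speaker-embedding model, then cluster the embeddings so that
# each cluster corresponds to one speaker.
import numpy as np
from sklearn.cluster import KMeans

def diarize(window_embeddings: np.ndarray, num_speakers: int, window_s: float = 1.5):
    """window_embeddings: array of shape (num_windows, embedding_dim),
    one embedding per fixed-length audio window."""
    labels = KMeans(n_clusters=num_speakers, n_init=10).fit_predict(window_embeddings)
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close a segment whenever the speaker label changes or the stream ends.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append({"speaker": int(labels[start]),
                             "start": start * window_s,
                             "end": i * window_s})
            start = i
    return segments  # e.g. [{'speaker': 0, 'start': 0.0, 'end': 4.5}, ...]
```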
[0089] In some cases, the meeting analytics 302 may group the attendees by their corporate affiliations. This may provide a user with additional information on when different attendees associated with the same company or group spoke relative to each other.
[0090] In some cases, the meeting analytics 302 may comprise information on conversational patterns, such as when each participant or attendee spoke. The meeting analytics 302 may also indicate when the host spoke. In some cases, the meeting analytics 302 may provide additional information on conversational patterns, such as the perceived sentiment or emotion of the speakers during the virtual call or meeting.
[0091] In some cases, the user interface 301 may permit a user to add or remove individual participants or attendees from the meeting analytics 302. Doing so may update the conversational patterns to only show when certain select subsets of participants or attendees spoke.
[0092] FIG. 4 shows an exemplary user interface 401 configured to display information on discussions during the meetings or comments 402 made during the meeting. In some cases, the user interface 401 may provide information on when each individual participant or attendee spoke and the substance of the discussions or any comments 402 made during the meeting by the robot or the individual participants or attendees of the meeting.
[0093] In some embodiments, the user interface 401 may provide a discussion tab that is configured to display the communications made at a particular time instance during the virtual call or meeting. The communications made at the particular time instance may comprise visual communications, verbal communications (e.g., speech), and/or text-based communications made by a participant or attendee. In some cases, a particular time instance of interest may be selected by a user viewing the meeting analytics. In some cases, the user interface 401 may permit a user to select the particular time instance by choosing to examine a portion of a meeting timeline associated with the virtual call or meeting. The meeting timeline may comprise a timeline that corresponds to the duration of the virtual call or meeting. In some cases, the user may select the particular time instance by pressing or clicking on a specific portion or section of the meeting timeline, or by scrubbing along the meeting timeline with a cursor, a play head, or a play bar to the desired time instance of interest.
[0094] The chats/comments collected during the meeting may include chat with the robot. FIG. 17 and FIG. 18 show an example of a meeting interface. The meeting interface 1700 can be provided by any existing meeting software. The robot may be capable of attending or being seamlessly integrated into a variety of platforms, including existing meeting platforms and/or meeting software, and calendar or scheduling software. Methods and systems provided herein may allow a robot to collect meeting data without the need to modify the interface or appearance of existing meeting software. As illustrated in FIG. 17, the virtual agent or robot may attend a meeting like any other user. For example, the robot may attend and be admitted 1701 to a current meeting. Any number of robots 1801, 1803 can attend a meeting. The robot may attend a meeting with or without the attendance of its owner or administrator 1807. Any participant or attendee of the meeting may interact with the robot, such as by conducting an in-meeting chat with the robot. The meeting chat 1805 function may be a feature provided by the meeting software. The chat may be recorded as part of the meeting data.
[0095] FIG. 5 and FIG. 6 show the key moments 500 associated with a virtual call or meeting. The key moments 500 may indicate the emotion, sentiment, or intention of a speaker. The key moments 500 may be arranged along a meeting timeline with one or more bars indicating a positive or negative emotion, sentiment, or intention. The bars may be color coded to differentiate between positive and negative emotions, sentiments, or intentions. The magnitude of the bars may vary depending on how positive or negative the detected emotion, sentiment, or intention is. In some cases, the user interface may allow a user to interact with each key moment 500 (e.g., by pressing or clicking on the key moment of interest) to see the specific communications made during the key moments 500.
[0096] In some cases, the user interface may provide a timeline for each individual participant or attendee, and may display one or more signals indicating a particular type of speech pattern detected. The speech pattern may comprise, for example, communications relating to next steps, pain points, comments, or questions. The signals may comprise icons that are arranged along the timeline for each individual participant or attendee, and may indicate a time at which the participant or attendee made a communication that relates to next steps, pain points, comments, or questions.
[0097] In some cases, the meeting analytics shown in the user interface may be filterable to provide information on when each type of speech pattern was detected, regardless of the speaker. In other cases, the meeting analytics may be filterable to provide information on when a particular user made a communication containing a speech pattern of interest.
[0098] Referring to FIG. 6, in some cases, the user interface may comprise a moments tab 600 that allows a user to view and/or analyze various key moments 500. The key moments 500 may be filterable by the type of key moment or the type of speech pattern detected. The type of key moment or the type of speech pattern detected may correspond to, for example, next steps, pain points, comments, or questions. The key moments 500 may be associated with a particular participant who initiated or contributed to the key moments, a time stamp indicating when the key moments occurred, and information on or a transcription detailing the next steps, pain points, comments, or questions associated with the key moments that have been detected or identified. In some cases, the moments tab 600 may also provide information on the individual who will be assigned to perform the next steps associated with a key moment 500. In some cases, the moments tab 600 may provide information on the reactions of different participants or individuals to a particular key moment 500.
[0099] FIG. 7 shows the engagement information 700 for the virtual call or meeting. The engagement information may comprise information on the level, amount, or frequency of exchange between two or more participants. The engagement information 700 may be based on the frequency at which different speakers switch between active speech and passive listening. In some cases, the engagement information 700 may be displayed as one or more metrics along a meeting timeline. The metrics may increase when engagement is higher, and may decrease when engagement is lower. In some cases, the engagement information 700 may be based on a level of alertness or attention of the participants or attendees. The level of alertness or attention may be derived from speech patterns (e.g., a time to respond to a question) or based on a visual analysis of a video of the virtual call or meeting (e.g., if a participant or attendee is within a field of view of a camera or video camera, or if the participant or attendee’s gaze is focused on their screen or elsewhere).
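One simple, assumed formulation of such an engagement metric counts speaker turn switches within a sliding window over the meeting timeline, as sketched below; more frequent turn-taking is scored as higher engagement.

```python
# Sketch of an engagement metric based on how often the active speaker changes
# within each window of the meeting timeline.
def engagement_timeline(turns, window_s=60.0, meeting_length_s=1800.0):
    """turns: list of (start_time_s, speaker) tuples in chronological order."""
    points = []
    t = 0.0
    while t < meeting_length_s:
        # Speakers of all turns that start inside the current window.
        in_window = [spk for start, spk in turns if t <= start < t + window_s]
        # A "switch" is any adjacent pair of turns with different speakers.
        switches = sum(1 for a, b in zip(in_window, in_window[1:]) if a != b)
        points.append({"t": t, "engagement": switches / window_s})
        t += window_s
    return points
```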
[00100] FIG. 8 shows a portion of an exemplary user interface 801 that may be configured to display a transcript 802 of the call. The transcript 802 may comprise a chronological listing of all communications made by each participant or attendee of the virtual call or meeting. The transcript 802 may be automatically produced by the robot using machine learning, artificial intelligence, and/or natural language processing. The transcript 802 may be searchable by the user. In some cases, the user may scroll through the transcript 802 to view the conversations or discussions that took place at a select time period or time instance of interest. In some non-limiting embodiments, various portions of the transcript 802 may be annotated or marked to indicate certain key moments of interest.
[00101] FIG. 9 and FIG. 10 show an exemplary user interface 901 for scheduling the software agent or robot to attend one or more meetings. The one or more meetings may already have a set time and date. In some cases, the one or more meetings may be tentatively scheduled for a first time and date, and may be rescheduled for a second time or date. In such cases, the robot may be configured to attend the virtual call or meeting at the second time or date. In some cases, the robot may be configured to coordinate or perform the actual scheduling of the virtual call or meeting, with the understanding or knowledge that the call or meeting should take place within a certain time frame and should be focused on certain key topics. Once the meeting or call is scheduled, the robot may then attend the meeting or call at the scheduled time and date.
[00102] A robot may be scheduled to attend a future meeting via one or more channels. For example, a robot may be connected to a selected future meeting by adding a meeting link 1001 (e.g., adding a Zoom link), by sending a calendar invite 1003 with a meeting link, and/or by syncing with a calendar application (e.g., automatically identifying calendar events with meetings) 1005.
[00103] In some cases, the user interface 901 may display calendar events from a user’s calendar. The user’s calendar may be linked to the platform. The software may automatically identify one or more meetings based on calendar data. The user may affirmatively select one or more identified meetings 903 for the robot or software agent to attend. A robot may not attend a meeting 905 that is not selected by a user.
[00104] In some cases, the user interface 901 may display a list of upcoming meetings associated with a user’s calendar or schedule. The user interface 901 may permit a user to select one or more meetings 903 for the robot or software agent to attend (e.g., by using a toggle element). The robot or software agent may then attend and automatically record the meetings, and then deliver one or more meeting insights to the user’s account.
[00105] In some cases, the user interface 901 may allow a user to invite the robot to a meeting that is currently ongoing. In such cases, the user may provide a link to the meeting for the robot to attend. In some cases, the user may designate a name for the meeting for future reference. In some cases, the robot may be invited to a meeting via a calendar invite. In some cases, the robot or software agent may sync with a user’s calendar or schedule and attend one or more meetings associated with the user’s calendar or schedule. In some cases, the robot or software agent may attend every meeting on the user’s calendar or schedule, or a subset or particular selection of meetings in the user’s calendar or schedule.
[00106] The robot or software agent may be configured to join a particular subset or selection of meetings based on one or more rules set by the owner or the administrator of the robot or software agent. For example, a user may set up rules for the robot to only attend meetings when the user is a host, when the user is a participant, when the user is not participating, when a given person is an attendee of a meeting, when the meeting is of a particular type, and the like. In some cases, calendar metadata may be processed to extract meeting information. The calendar metadata may comprise a plurality of metadata attributes for a meeting such as meeting duration (e.g., time in minutes), meeting start/end time, meeting location, meeting type, event type, attendees, host, or others. The calendar metadata may be used to determine whether the robot should attend a meeting under a given rule. FIG. 19 shows examples of rules 1900 provided within the user interface for a user to set up rules for a robot to attend a meeting. For instance, the rules may include attending any calendar event with a meeting link, only calendar events where the user is a host, only internal meetings, only external meetings, etc.
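By way of non-limiting illustration, such rules could be evaluated against calendar metadata as sketched below; the event fields and rule names are assumptions and would depend on the calendar provider.

```python
# Sketch of evaluating user-defined attendance rules against calendar metadata.
def should_attend(event: dict, rules: dict, user_email: str) -> bool:
    internal_domain = user_email.split("@")[1]
    # An event is "external" if any attendee is outside the user's domain.
    external = any(not a.endswith("@" + internal_domain) for a in event.get("attendees", []))
    if rules.get("require_meeting_link") and not event.get("meeting_link"):
        return False
    if rules.get("only_when_host") and event.get("host") != user_email:
        return False
    if rules.get("only_internal") and external:
        return False
    if rules.get("only_external") and not external:
        return False
    return True

event = {"host": "alice@acme.com",
         "attendees": ["alice@acme.com", "bob@other.io"],
         "meeting_link": "https://example.com/meet/123"}
print(should_attend(event, {"require_meeting_link": True, "only_when_host": True},
                    "alice@acme.com"))  # True
```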
[00107] As shown in FIG. 10, the user interface 901 may not or need not be initially configured to display the upcoming meetings for a user or an administrator until the user or administrator links his or her calendar to the robot for syncing. Once the user’s calendar is synced with the platform, the scheduler tab of the user interface may be updated to show a listing of all upcoming meetings that can be attended by the robot or the software agent on the user’s behalf. In some non-limiting embodiments, the user interface 901 may comprise a panel for a user to quickly access the scheduler function or to switch between a scheduler tab and one or more other tabs (e.g., a meetings tab). In some embodiments, the user interface 901 may allow a user to direct or instruct the robot to automatically share the meeting transcript and/or any meeting analytics with specific third parties (e.g., parties who did not attend the virtual call or meeting).
[00108] In addition to conveniently connecting or integrating with any existing meeting software and scheduling applications, the provided system can be integrated with other messaging applications or electronic mail (email) software. For example, as soon as a meeting has ended, a message including a summary of the meeting information may be delivered to a user via a messaging account or email address that is connected to the platform. As shown in FIG. 20, the message may include a brief summary of the meeting, such as the speakers, the percentage of participation in the meeting, and a headline and summary of the meeting. Alternatively or additionally, the message may be sent to the user within the provided application.
[00109] FIG. 11 shows an exemplary user interface 1101 for managing one or more keywords 1102 to be detected or analyzed by the robot or software agent. Keywords 1102 may be entered and set in advance by a user and/or organized into one or more groups. For example, a user may assign one or more keywords or keyword groups to a scheduled meeting.
[00110] Once one or more keywords are assigned to a meeting, the robot or software agent may be configured to automatically look for the keywords 1102 or any keyword groups that come up in each virtual call or meeting. In some cases, when the keyword/topic is assigned in advance of a meeting, the robot or software agent may focus on meeting content related to the topic, such as by collecting additional data (e.g., visual in addition to audio) related to the relevant content during the meeting (e.g., to save bandwidth or memory). Alternatively, the meeting data to be collected or recorded may not change based on the assigned keyword/topic.
[00111] In some cases, the keywords 1102 or keyword groups 1103 may be customized, modified, or otherwise adjusted on a meeting-by-meeting basis. The user interface may permit a user to generate new groups of keywords, add additional keywords to a group, or modify or edit the keywords within one or more groups. In some cases, the keywords 1102 may be color coded based on user preference, by subject matter or topic, or based on a level of importance. As illustrated in the graphical user interface (GUI) 1101, a user may create or add a new keyword group 1104. A keyword group 1103 may include one or more keywords 1106. A user may add or create a new keyword, such as by clicking on a graphical element 1105.
[00112] In some cases, a new keyword or keyword group may be manually inputted by a user. For example, by clicking on the button 1105, a user may be prompted to type a word and/or a topic (e.g., a word or phrase). Alternatively or additionally, a candidate or recommended keyword or keyword group may be displayed on the GUI. For example, a recommended keyword may be displayed upon a user clicking on the button 1105, and the user may choose to modify, accept, or reject the keyword. The recommended keyword/topic may be automatically generated by the system, such as by utilizing a model trained with a machine learning algorithm. In some cases, the model may be trained using training data. The training data may be created based on data such as past topics, meeting data related to the user, an organization, a field, or an industry, a public knowledge base, and the like.
[00113] FIG. 12 shows a keyword tab 1201 of the meeting analytics user interface. The keyword tab 1201 may display a list of keywords 1202 and one or more markers indicating when the keywords were communicated during the call or meeting. The markers may be arranged along a timeline corresponding to the call or meeting. In some cases, the markers may be color coded to visually indicate additional information (e.g., relevance, sentiment, importance, etc.) about the communication containing the key words. In some cases, the user interface may allow a user to add additional keywords for analysis, and the keyword tab 1201 may be updated to display additional markers for the newly added keywords. In some cases, the meeting analytics may be updated as new keywords are added.
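A minimal sketch of how such keyword markers could be derived from a timestamped transcript is shown below; the transcript structure is an assumption.

```python
# Sketch of building timeline markers for assigned keywords from a timestamped
# transcript, similar to the markers shown in the keyword tab.
def keyword_markers(transcript, keywords):
    """transcript: list of {"t": seconds, "speaker": str, "text": str};
    returns {keyword: [times the keyword was spoken]}."""
    markers = {k: [] for k in keywords}
    for utterance in transcript:
        text = utterance["text"].lower()
        for k in keywords:
            if k.lower() in text:
                markers[k].append(utterance["t"])
    return markers

transcript = [{"t": 95.0, "speaker": "Bob", "text": "Our budget for the rollout is tight."}]
print(keyword_markers(transcript, ["budget", "timeline"]))
# {'budget': [95.0], 'timeline': []}
```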
[00114] In some embodiments, the user interface may provide an interactive element 1203 that allows a user to modify the keyword settings or keyword groupings used. For example, upon clicking on the keyword setting element 1203, a user may be permitted to assign one or more topics or keywords (groups) to a selected upcoming meeting, or a selected recorded meeting for post-meeting analytics.
[00115] FIG. 13 and FIG. 14 show a sentiment score 1301 that can be assigned to or associated with a virtual call or meeting and/or one or more communications made during the virtual call or meeting. The sentiment score 1301 may comprise a metric that can be displayed along a discussion timeline associated with one or more participants or attendees of the virtual call or meeting. The speech patterns for the participants or attendees may each have a respective sentiment score associated therewith. In some cases, the sentiment scores 1301 may be color coded to indicate speech patterns that are positive, negative, or neutral.
[00116] As shown in FIG. 14, in some cases the meeting analytics user interface may comprise a section that displays key moment information 1401. In some cases, the key moment information 1401 may include information on the overall sentiment 1402 of the participants or the attendees of the virtual call or meeting. The overall sentiment 1402 may be displayed as a color coded metric along a meeting timeline. In some cases, the overall sentiment 1402 may be determined based on an aggregation or a combination of the sentiment data for each individual participant or attendee.
[00117] In some embodiments, the user interface may be configured to provide overall metrics for a virtual call or meeting. In some embodiments, the user interface may be configured to provide an overall sentiment score. The overall sentiment score may indicate a relative distribution or ratio of positive, negative, and neutral sentiments for a call or meeting. In some cases, the user interface may also be configured to display or provide an engagement score, metrics on host talk time, and a listing of the most common keywords detected for a virtual call or meeting.
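As a non-limiting sketch, per-utterance sentiment labels and talk times could be aggregated into the overall metrics described above as follows; the field names are illustrative.

```python
# Sketch of aggregating per-utterance sentiment labels into the overall
# sentiment distribution and a host talk-time metric.
from collections import Counter

def overall_metrics(utterances, host):
    """utterances: list of {"speaker": str, "duration_s": float, "sentiment": str}
    where sentiment is 'positive', 'negative', or 'neutral'."""
    counts = Counter(u["sentiment"] for u in utterances)
    total = sum(counts.values()) or 1
    total_time = sum(u["duration_s"] for u in utterances) or 1.0
    host_time = sum(u["duration_s"] for u in utterances if u["speaker"] == host)
    return {
        "sentiment_ratio": {k: counts.get(k, 0) / total
                            for k in ("positive", "negative", "neutral")},
        "host_talk_time_pct": 100.0 * host_time / total_time,
    }
```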
[00118] Suitable data processing techniques such as voice recognition, facial recognition, natural language processing, sentiment analysis, and the like may be employed to process the meeting data (e.g., audio, visual, textual, etc.). For example, sentiment analysis may be applied to audio and/or chat messages using a trained model to identify and extract opinions. For example, the model for generating a summary of a discussion topic, identifying an action item, generating insights, and/or performing sentiment analysis may be based on a Transformer trained on large text corpora. For instance, the Transformer may be BERT, trained with a masked language modeling objective, or a Generative Pre-trained Transformer (GPT), trained with an autoregressive language modeling objective. The Transformer model (e.g., GPT) may take token embeddings as input and generate outputs from them. The training process may involve a large number of parameters, multiple attention layers, and large batch sizes. As an example, a Transformer may consist of several encoder blocks and/or decoder blocks. Each encoder block contains a self-attention layer and a feed forward layer, while each decoder block contains an encoder-decoder attention layer in addition to the self-attention and feed forward layers.
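For illustration only, the sketch below shows a single Transformer encoder block (self-attention plus a feed-forward layer, each with a residual connection and layer normalization) using PyTorch; a production system would more likely fine-tune a pre-trained model such as BERT or GPT than train such blocks from scratch, and the dimensions are illustrative.

```python
# Minimal Transformer encoder block: self-attention followed by a feed-forward
# layer, each wrapped with a residual connection and layer normalization.
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, d_ff=1024, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, key_padding_mask=None):
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + attn_out)    # residual + norm around self-attention
        x = self.norm2(x + self.ff(x))  # residual + norm around feed-forward
        return x

tokens = torch.randn(2, 50, 256)        # (batch, sequence, embedding)
print(EncoderBlock()(tokens).shape)     # torch.Size([2, 50, 256])
```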
[00119] As mentioned above, machine learning techniques may be employed to generate a meeting summary, identify key moments, extract topics, determine a sentiment score based on the transcription, and perform various other functions as described herein. In some cases, the model may be a multimodal model that can take multimodal input data such as raw RGB frames of videos, audio waveforms, text transcripts of the speech audio, and the like. In some cases, the model may be created by fine-tuning a pre-trained transformer model on the multimodal personal and private meeting data. The term “multimodal” as utilized herein may generally refer to multisource modal information and different forms of data, such as visual, audio, or text data. Because different forms of data are represented differently even when their internal semantics are similar, the fine-tuning of the Transformer model on the multimodal personal and private data may comprise updating the weights and biases of the pre-trained model using the private and personal data. The model framework may include any suitable structure or architecture. In some cases, the model framework may include a multi-stream structure using an extra transformer layer to encode the interrelationship of multi-modal information. For example, the fine-tuning process may comprise linearly projecting each modality into a feature vector and feeding it into a Transformer encoder. A semantically hierarchical common space may be employed to account for the granularity of different modalities. As an example of a learning algorithm, Noise Contrastive Estimation (NCE) may be used as the loss objective to train the model. For example, when the private and personal data includes text features and video features, the model framework may include a text transformer (e.g., BERT or GPT) to embed discrete text features, a visual transformer that takes in the continuous video features, and a third cross-modal transformer to embed mutual information between the two modalities.
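A non-limiting sketch of the cross-modal alignment idea, assuming PyTorch and illustrative feature dimensions: each modality is linearly projected into a shared space and trained with an InfoNCE-style Noise Contrastive Estimation objective in which aligned text/video pairs are positives and the other pairs in the batch serve as negatives.

```python
# Sketch of projecting two modalities into a common space and training with a
# Noise Contrastive Estimation (InfoNCE-style) objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalProjector(nn.Module):
    def __init__(self, text_dim=768, video_dim=1024, shared_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, shared_dim)    # project text features
        self.video_proj = nn.Linear(video_dim, shared_dim)  # project video features

    def forward(self, text_feats, video_feats):
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        return t, v

def nce_loss(t, v, temperature=0.07):
    """Aligned (text, video) pairs are positives; other pairs in the batch are negatives."""
    logits = t @ v.T / temperature
    targets = torch.arange(t.size(0), device=t.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2

text = torch.randn(8, 768)
video = torch.randn(8, 1024)
t, v = CrossModalProjector()(text, video)
print(nce_loss(t, v))
```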
[00120] In some cases, training the model may comprise adjustments to a pre-trained transformer (e.g., unidirectional Transformer) structure to adapt to multi-modal processing and temporal modeling. For example, the model framework may share the weights of the self-attention layer across different modalities, but keep the tokenization and linear projection layers independent for each modality. Alternatively, different attention masks may be used to accommodate downstream tasks that require different modalities. In some cases, the adjustments may be based on the objective of the task. For instance, because lower layers of the transformer have less impact on the fine-tuning objective for a given task while higher layers are more vital to the task, the lower layers of the Transformer may be changed as little as possible to enable the model to learn basic semantic information from the input text, while the top layers may allow a larger effect influenced by style factors (e.g., tone of voice and facial expressions).
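A minimal sketch of this layer-wise strategy, assuming the pre-trained model exposes an ordered list of encoder blocks (the `model.encoder.layers` attribute below is a hypothetical module path that would depend on the model being fine-tuned):

```python
# Sketch of layer-wise fine-tuning: freeze the lower encoder layers of a
# pre-trained Transformer so they change as little as possible, while the top
# layers remain trainable.
def freeze_lower_layers(model, num_trainable_top_layers=2):
    layers = list(model.encoder.layers)          # assumed: ordered encoder blocks
    cutoff = len(layers) - num_trainable_top_layers
    for i, layer in enumerate(layers):
        for param in layer.parameters():
            param.requires_grad = i >= cutoff    # only the top layers stay trainable
    return model
```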
[00121] Computer Systems
[00122] In another aspect, the present disclosure provides computer systems that are programmed or otherwise configured to implement methods of the disclosure, e.g., any of the subject methods for attending and analyzing virtual calls or meetings. FIG. 15 shows a computer system 1501 that is programmed or otherwise configured to implement a method for analyzing virtual calls or meetings. The computer system 1501 may be configured to, for example, (i) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user; (ii) detect one or more keywords communicated during the one or more virtual calls or meetings; and (iii) generate meeting analytics based at least in part on the one or more detected keywords. The computer system 1501 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device. The computer system 1501 may be configured to implement the presently disclosed methods with the aid of the robots and/or software agents described elsewhere herein.
[00123] The computer system 1501 may include a graphics processing unit (GPU) or a central processing unit (CPU, also "processor" and "computer processor" herein) 1505, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 1501 also includes memory or memory location 1510 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 1515 (e.g., hard disk), communication interface 1520 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 1525, such as cache, other memory, data storage and/or electronic display adapters. The memory 1510, storage unit 1515, interface 1520 and peripheral devices 1525 are in communication with the CPU 1505 through a communication bus (solid lines), such as a motherboard. The storage unit 1515 can be a data storage unit (or data repository) for storing data. The computer system 1501 can be operatively coupled to a computer network ("network") 1530 with the aid of the communication interface 1520. The network 1530 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 1530 in some cases is a telecommunication and/or data network. The network 1530 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 1530, in some cases with the aid of the computer system 1501, can implement a peer-to-peer network, which may enable devices coupled to the computer system 1501 to behave as a client or a server.
[00124] The CPU 1505 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 1510. The instructions can be directed to the CPU 1505, which can subsequently program or otherwise configure the CPU 1505 to implement methods of the present disclosure. Examples of operations performed by the CPU 1505 can include fetch, decode, execute, and writeback.
[00125] The CPU 1505 can be part of a circuit, such as an integrated circuit. One or more other components of the system 1501 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC).
[00126] The storage unit 1515 can store files, such as drivers, libraries and saved programs. The storage unit 1515 can store user data, e.g., user preferences and user programs. The computer system 1501 in some cases can include one or more additional data storage units that are located external to the computer system 1501 (e.g., on a remote server that is in communication with the computer system 1501 through an intranet or the Internet).
[00127] The computer system 1501 can communicate with one or more remote computer systems through the network 1530. For instance, the computer system 1501 can communicate with a remote computer system of a user (e.g., an owner or an administrator of the robot or the software agent). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 1501 via the network 1530.
[00128] Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 1501, such as, for example, on the memory 1510 or electronic storage unit 1515. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 1505. In some cases, the code can be retrieved from the storage unit 1515 and stored on the memory 1510 for ready access by the processor 1505. In some situations, the electronic storage unit 1515 can be precluded, and machine-executable instructions are stored on memory 1510.
[00129] The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
[00130] Aspects of the systems and methods provided herein, such as the computer system 1501, can be embodied in programming. Various aspects of the technology may be thought of as "products" or "articles of manufacture" typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. "Storage" type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible "storage" media, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution.
[00131] Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media including, for example, optical or magnetic disks, or any storage devices in any computer(s) or the like, may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
[00132] The computer system 1501 can include or be in communication with an electronic display 1535 that comprises a user interface (UI) 1540 for providing, for example, a portal for an end user to control or monitor a robot or a software agent as described elsewhere herein, and/or to view meeting analytics generated by the robot or software agent. The portal may be provided through an application programming interface (API). A user or entity can also interact with various elements in the portal via the UI. Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface. In some non-limiting embodiments, the computer system 1501 can include or provide a voice user interface (VUI) that allows a user or entity to interact with the system through voice or speech commands. In some embodiments, the computer system 1501 can include or provide computer vision capabilities or functionalities that enable images and video inputs to be processed or analyzed by the robot (e.g., to generate meeting analytics as described elsewhere herein). Such computer vision capabilities or functionalities can be implemented with the aid of one or more imaging sensors operatively coupled to the system.
[00133] Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the central processing unit 1505. For example, the algorithm may be configured to (a) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user; (b) detect one or more keywords communicated during the one or more virtual calls or meetings; and (c) generate meeting analytics based at least in part on the one or more detected keywords.
[00134] While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

Claims

WHAT IS CLAIMED IS:
1. A system, comprising: a robot for autonomously attending one or more virtual calls or meetings as a distinct entity, wherein the robot is configured to:
(a) record, transcribe, and analyze a content of the one or more virtual calls or meetings in response to an instruction or delegation by a user;
(b) detect one or more keywords communicated during the one or more virtual calls or meetings; and
(c) generate meeting analytics based at least in part on the one or more detected keywords.
2. The system of claim 1, wherein the meeting analytics comprise a meeting summary based on one or more domain specific datasets.
3. The system of claim 1, wherein the meeting analytics comprise a report indicating (i) when the one or more keywords were detected during the one or more virtual calls or meetings and (ii) an identity of an individual or entity that communicated the one or more keywords.
4. The system of claim 1, wherein the meeting analytics indicate when each attendee of the one or more virtual calls or meetings spoke.
5. The system of claim 4, wherein the meeting analytics are filterable to show when one or more select attendees of the one or more virtual calls or meetings spoke.
6. The system of claim 1, wherein the meeting analytics comprise one or more key moments corresponding to a positive or negative emotion, intention, or sentiment of one or more attendees of the one or more virtual calls or meetings.
7. The system of claim 4, wherein the meeting analytics further comprise one or more key moments associated with a time when one or more attendees of the one or more virtual calls or meetings spoke, wherein the one or more key moments correspond to an emotion, intention, or sentiment of the one or more attendees.
8. The system of claims 6 or 7, wherein the one or more key moments are determined based on a quantifiable measure corresponding to a magnitude of the emotion or intention of the one or more attendees.
9. The system of claim 6, wherein the meeting analytics provide additional contextual information associated with the one or more key moments, wherein the additional contextual information comprises a transcription of a related portion of a conversation corresponding to the one or more key moments.
10. The system of claim 1, wherein the meeting analytics comprise information on attendee engagement during the one or more virtual calls or meetings.
11. The system of claim 1, wherein the robot is configured to transcribe and/or analyze one or more audio communications made during the one or more virtual calls or meetings using natural language processing, artificial intelligence, digital signal processing, audio waveform analysis, and/or machine learning.
12. The system of claim 1, wherein the robot is configured to identify speech patterns, conversational patterns, meeting patterns, and/or a progression of discussion topics, sentiment, or engagement for the one or more virtual calls or meetings on a time series basis.
13. The system of claim 1, wherein the robot is configured to identify and analyze meeting patterns based on the one or more virtual calls or meetings, wherein the meeting patterns comprise information on a number of meetings attended during a time period, a characteristic of the meetings attended, and/or a sentiment associated with the meetings.
14. The system of claim 1, wherein the robot is further configured to identify and track attendees of the one or more virtual calls or meetings.
15. The system of claim 14, wherein the robot is configured to track one or more verbal or physical actions by the attendees.
16. The system of claim 14, wherein the robot is further configured to identify one or more characteristics of the attendees, wherein the one or more characteristics comprise demographics, job title, age, gender, geography, company name, or company size.
17. The system of claim 1, wherein the robot is further configured to (i) identify voice prints for one or more attendees of the one or more virtual calls or meetings and (ii) determine an intention, an emotion, or a sentiment of the one or more attendees based on the voice prints.
18. The system of claim 1, wherein the robot is further configured to detect one or more questions from an attendee and/or one or more emotions or behavioral intentions of the attendee based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings.
19. The system of claim 1, wherein the robot is further configured to determine one or more next steps or actions items, one or more actions to be taken in the present or the future, one or more questions asked, one or more questions answered, and/or one or more tasks to be completed, based at least in part on one or more audio signals or language patterns detected during the one or more virtual calls or meetings.
20. The system of claim 1, wherein the robot is further configured to dynamically filter attendees from the one or more virtual calls or meetings.
21. The system of claim 1, wherein the robot is further configured to detect one or more keyword patterns on a time-series basis.
22. The system of claim 1, wherein the robot is further configured to generate one or more meeting recordings or transcripts and selectively distribute the one or more meeting recordings or transcripts to one or more other entities.
23. The system of claim 1, wherein the robot is further configured to generate (i) one or more comments or insights or (ii) an extractive meeting summary based on the content of the one or more virtual calls or meetings.
24. The system of claim 1, wherein the robot is further configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for a topic, keyword, or phrase of interest.
25. The system of claim 1, wherein the robot is further configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts for an emotion, intention, keyword, topic, or sentiment of interest.
26. The system of claim 1, wherein the robot is further configured to (i) generate one or more transcripts based on the content of the one or more virtual calls or meetings and (ii) search the one or more transcripts based on a meeting characteristic.
27. The system of claim 26, wherein the meeting characteristic comprises an attendee name, attendee age, attendee gender, attendee location or geography, entity name, entity location or geography, entity size, or any combination thereof.
28. The system of claim 1, wherein the robot is further configured to automatically schedule the one or more virtual calls or meetings on behalf of the user.
29. The system of claim 1, wherein the instruction by the user comprises an instruction for the robot to attend the one or more virtual calls or meetings at a scheduled time.
30. The system of claim 1, wherein the robot is capable of being directed, scheduled, or instructed to attend the one or more virtual calls or meetings without a concurrent presence of an administrator or an owner of the robot.
31. The system of claim 1, wherein the robot is configurable to attend a plurality of calls or meetings simultaneously and generate (i) discrete meeting analytics for each of the plurality of calls or meetings and/or (ii) aggregated meeting analytics for the plurality of calls or meetings.
32. The system of claim 1, wherein the instruction by the user comprises an instruction for the robot to execute one or more tasks at a predetermined or user-specified time.
33. The system of claim 1, wherein the one or more keywords are set or determined by the user in advance of the one or more virtual calls or meetings.
34. The system of claim 33, wherein the robot is configured to listen for, perceive, or sense the one or more keywords.
35. The system of claim 33, wherein the robot is configured to use the one or more keywords to generate a more accurate transcription of the one or more calls or meetings.
36. The system of claim 1, wherein the robot comprises an independent software agent.
PCT/US2023/016448 2022-03-29 2023-03-27 Systems and methods for attending and analyzing virtual meetings WO2023192200A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263324937P 2022-03-29 2022-03-29
US63/324,937 2022-03-29

Publications (1)

Publication Number Publication Date
WO2023192200A1 true WO2023192200A1 (en) 2023-10-05

Family

ID=88203155

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/016448 WO2023192200A1 (en) 2022-03-29 2023-03-27 Systems and methods for attending and analyzing virtual meetings

Country Status (1)

Country Link
WO (1) WO2023192200A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120224021A1 (en) * 2011-03-02 2012-09-06 Lee Begeja System and method for notification of events of interest during a video conference
US20180232705A1 (en) * 2017-02-15 2018-08-16 Microsoft Technology Licensing, Llc Meeting timeline management tool
US20200279567A1 (en) * 2019-01-29 2020-09-03 Audiocodes Ltd. Device, System, and Method for Multimodal Recording, Processing, and Moderation of Meetings



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23781645

Country of ref document: EP

Kind code of ref document: A1