US12328347B2 - Synthetic moderator - Google Patents
- Publication number
- US12328347B2 (application US17/402,228)
- Authority
- US
- United States
- Prior art keywords
- call
- participant
- command
- intent
- client device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/1066—Session management
- H04L65/1083—In-session procedures
- H04L65/1093—In-session procedures by adding participants; by removing participants
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/16—Arrangements for providing special services to substations
- H04L12/18—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
- H04L12/1813—Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
- H04L12/1822—Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/02—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/1066—Session management
- H04L65/1083—In-session procedures
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/40—Support for services or applications
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- H04N7/155—Conference systems involving storage of or access to video conference sessions
Definitions
- Video conferencing, also referred to as conference calling, allows multiple participants to interact with each other using video, audio, messages, etc.
- Manually moderating a conference call by a participant of the call may detract from the purpose of the call.
- A challenge is to provide systems that automatically moderate the call for the participants.
- one or more embodiments relate to a method implementing a synthetic moderator.
- Utterance text is obtained from a conference call that includes a call context.
- An intent is identified, from the utterance text, for a command.
- Contextual data is identified, from the call context, for the command.
- the command is executed using the contextual data.
- a result of executing the command is presented.
- one or more embodiments relate to a system that includes an application executing on at least one processor.
- Utterance text is obtained from a conference call that includes a call context.
- An intent is identified, from the utterance text, for a command.
- Contextual data is identified, from the call context, for the command.
- the command is executed using the contextual data.
- a result of executing the command is presented.
- one or more embodiments relate to a method implementing a synthetic moderator.
- a client device connects to a conference call that includes a call context.
- utterance text is obtained from the conference call.
- An intent is identified, from the utterance text, for a command.
- Contextual data is identified, from the call context, for the command.
- the command is executed using the contextual data.
- a result of executing the command is presented.
- FIG. 1 shows a diagram of systems in accordance with disclosed embodiments.
- FIG. 2 shows a flowchart in accordance with disclosed embodiments.
- FIG. 3A, FIG. 3B, FIG. 3C, FIG. 3D, FIG. 4A, FIG. 4B, FIG. 4C, FIG. 4D, FIG. 5A, FIG. 5B, FIG. 5C, FIG. 6A, FIG. 6B, FIG. 6C, FIG. 6D, FIG. 6E, FIG. 6F, FIG. 6G, FIG. 6H, FIG. 6I, FIG. 7A, FIG. 7B, FIG. 7C, and FIG. 7D show examples in accordance with disclosed embodiments.
- FIG. 8A and FIG. 8B show computing systems in accordance with disclosed embodiments.
- Ordinal numbers (e.g., first, second, third, etc.) may be used as an adjective for an element (i.e., any noun in the application).
- the use of ordinal numbers is not to imply or create any particular ordering of the elements nor to limit any element to being only a single element unless expressly disclosed, such as by the use of the terms “before”, “after”, “single”, and other such terminology. Rather, the use of ordinal numbers is to distinguish between the elements.
- a first element is distinct from a second element, and the first element may encompass more than one element and succeed (or precede) the second element in an ordering of elements.
- a synthetic moderator may be invoked during a call with one or more participants present in the call. Invocation may involve one or more participants pronouncing a phrase (e.g., common activation phrase, user-defined phrase, keyword, any phrase with microphone muted in software, etc.) and interacting with an activation element in the user interface (e.g., activation button, activation slider, toggle, checkbox, interactive image and/or animation, etc.).
- the moderator manager may begin to receive audio and video streams from one or more participants present in the call.
- the moderator manager may begin to receive a textual representation of an audio stream (e.g., a transcription). For example, the first participant may pronounce an activation phrase.
- Transcriptions of the audio streams of the first, second, and third participants may be sent to the moderator manager.
- the second participant may press a button on the user interface to invoke the synthetic moderator. Audio and video streams of the first, second, and third participants may be sent to the moderator manager.
- one or more participants may receive an indication that the synthetic moderator was engaged.
- the moderator manager may provide a visual indication to call participants (e.g., displaying a synthetic moderator icon or animation on the screen, creating a participant placeholder for the synthetic moderator, etc.), an audible indication (e.g., playback of recorded audio, playback of a recorded utterance, synthesized greeting speech, etc.), and/or a textual indication (e.g., a persistent chat message with an activation timestamp, a call transcript entry, etc.).
- the first participant may invoke the synthetic moderator by pronouncing an activation phrase (e.g., “hey moderator”)
- the first, second, and third participants may see an animation on the screen of their devices
- the first, second and third participants may hear the synthesized greeting speech spoken by the synthetic moderator
- the animation indication may persist on the screen while the synthetic moderator is engaged
- the visual indication may be removed when the synthetic moderator is disengaged.
- the synthetic moderator speaks by converting text strings to speech that is played into an audio stream added to the call for the synthetic moderator.
- the first participant may have a muted microphone and invoke the synthetic moderator by speaking.
- the first participant may hear a synthesized audio message while the second and third participants do not receive an indication that the synthetic moderator was engaged by the first participant.
- the first participant may interact with the synthetic moderator while second and third participants are not aware of the interaction.
- if the synthetic moderator unmutes the microphone of the first participant, the second and third participants may not be aware that the microphone was unmuted by the synthetic moderator rather than by the first participant directly.
- the moderator manager may process the video, audio and text received from one or more participants to extract a requested or expected action (e.g., intent, direct command, request, question, statement, invocation, etc.).
- extraction of intent may happen by matching against pre-determined intent templates (e.g., grammar, regular expressions, etc.).
- extraction of intent may use fuzzy matching (e.g., machine learning, neural networks, k-nearest neighbors, clustering, etc.).
- Zero or more intents may be extracted from a chunk (e.g., a sequence of one or more frames of a video, a set of multiple audio samples, an utterance represented by text, etc.) of video, audio, and text received by the moderator manager.
- the first participant may request a call to be scheduled by speaking the request aloud.
- the moderator manager may receive a transcription (e.g., textual representation of a spoken phrase and/or sentence) of first participant's utterance.
- the moderator manager may use regular expressions to match against a template comprising words, phrases, patterns of characters, etc.
- the moderator manager may find zero or more intent matches in the transcript.
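The template-matching flow described above can be sketched as follows. The template names and patterns here are hypothetical illustrations; the patent does not publish its grammar:

```python
import re

# Hypothetical intent templates: each maps an intent name to a regular
# expression. A real system would load these from a grammar or config store.
INTENT_TEMPLATES = {
    "schedule_call": re.compile(r"\bschedule\b.*\bcall\b", re.IGNORECASE),
    "add_participant": re.compile(r"\badd\b\s+(?P<name>\w+)", re.IGNORECASE),
    "unmute": re.compile(r"\bunmute\b", re.IGNORECASE),
}

def extract_intents(utterance_text):
    """Return zero or more (intent_name, captured_fields) pairs for a transcript chunk."""
    matches = []
    for name, pattern in INTENT_TEMPLATES.items():
        m = pattern.search(utterance_text)
        if m:
            matches.append((name, m.groupdict()))
    return matches
```

A chunk that matches no template yields an empty list, matching the "zero or more intent matches" behavior above.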
- the moderator manager may execute one or more extracted intents and may attempt to gather information and context to execute one or more extracted intents.
- the moderator manager may gather information and context by making requests to services responsible for such information and context (e.g., database, directory, application programming interface (API), registry, file system, participant device data, etc.).
- the moderator manager may request information from one or more participants (e.g., using visual aid, using chat, using synthesized voice, using icons and pictograms, etc.) and gather their submissions (e.g., vocal replies, textual replies, messages, selections, user interface actions, etc.).
- the first participant may produce an add participant intent that is identified by the moderator manager.
- the intent may require a name of a participant to add.
- the moderator manager may show the first participant a text prompt that may be used by the first participant to enter the name of a participant to add.
- the moderator manager may use entered text to lookup users and contacts of the first participant for a matching name (e.g., exact match, approximate match, Levenshtein distance, last name only, etc.).
- the moderator manager may execute an add participant intent using the matching user and first participant's contact.
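The approximate name lookup mentioned above (e.g., Levenshtein distance) might look like the sketch below; the contact list and the distance threshold are assumptions for illustration:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def match_contact(entered_name, contacts, max_distance=2):
    """Find the closest contact name within an edit-distance threshold."""
    entered = entered_name.lower()
    best, best_dist = None, max_distance + 1
    for contact in contacts:
        dist = levenshtein(entered, contact.lower())
        if dist < best_dist:
            best, best_dist = contact, dist
    return best  # None when no contact is close enough
```

An exact match has distance zero and therefore always wins over approximate matches.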
- the moderator manager may gather participant submissions to information and context gathering from multiple participants. For example, first participant may initiate call rescheduling intent. The moderator manager may attempt to gather information (e.g., desired call date, desired call time). The moderator manager may send an audible request to each of the call participants for a desired call date and time. The first participant may respond with desired call time with their voice. The second participant may respond with desired day of week with voice. The third participant may respond with desired call date in a chat message. The moderator manager may use gathered information to produce a single (e.g., non-conflicting) call date and time. The moderator manager may repeat information and context gathering if conflicts between context and information are found. The moderator manager may execute the intent once no conflicts are present.
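Merging per-participant submissions into a single, non-conflicting date and time could be sketched as below; the field names ("time", "weekday", "date") are hypothetical:

```python
def merge_submissions(submissions):
    """Merge per-participant answers into one value per field, detecting conflicts.

    `submissions` is a list of dicts such as {"time": "3pm"} or
    {"date": "2024-06-07"}. Returns (merged, conflicts); a non-empty
    `conflicts` list signals that information gathering must be repeated.
    """
    merged, conflicts = {}, []
    for answer in submissions:
        for field, value in answer.items():
            if field in merged and merged[field] != value:
                if field not in conflicts:
                    conflicts.append(field)  # disagreement: re-gather this field
            else:
                merged[field] = value
    return merged, conflicts
```

The intent executes only once `conflicts` comes back empty, mirroring the repeat-until-no-conflicts loop described above.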
- the moderator manager may produce a response to one or more participants before, during, or after the intent is executed (e.g., after call was scheduled, after participant was added, while information is being gathered from participants, before email is sent to participants, etc.).
- the response may be presented with text displayed on participant devices (e.g., dialog message, overlay text, system notification, popup prompt, etc.).
- the response may be presented with synthesized and recorded speech (e.g., text-to-speech, generated speech, recorded utterance, etc.).
- the first participant may initiate an intent to schedule a call.
- the moderator manager may gather context and information, execute the intent, and initiate playback of pre-recorded synthesized utterance that may be perceived by one or more connected call participants that the call was scheduled.
- the response may be presented with visual indication (e.g., animation of new calendar entry appearing is played).
- the moderator manager may record the intent, gather information and context, store a response to persistent storage (e.g., database, file storage, cloud storage, distributed storage, etc.) and use at least some of the recorded information when extracting intents, gathering information and context, executing intents, and producing responses in the future.
- the moderator manager may use at least some recorded information to improve accuracy of extracting intents (e.g., recognize commonly used intents, adjust fuzzy matching to include previously unused words and phrases, associate abbreviations, and synonyms to reduce number of information and context gathering stages, etc.).
- the moderator manager may use at least some recorded information in context and information gathering stages.
- the moderator manager may reuse previously gathered context information for executing intents during the same and different calls.
- the moderator manager may use at least some recorded information when producing responses (e.g., use different phrases to mean same thing to avoid repetition, use similar language when producing responses to consecutive intents and information gathering stages, produce personalized responses based on context and information about call participants, etc.).
- the moderator manager may create delayed intents.
- the moderator manager may store the intent, the information and context for executing the intent, execution moment (e.g., call phase, after keyword and trigger word is spoken and appears in a message, specific date and time, etc.), as well as other relevant information, in storage (e.g., temporary storage, operational storage, persistent storage, database, etc.).
- the moderator manager may execute one or more delayed intents at execution moments during and after the call.
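A minimal sketch of the delayed-intent storage and execution described above, assuming execution moments are named events such as the call ending; the class and field names are illustrative, not the patent's:

```python
class DelayedIntentStore:
    """Stores intents with their context and execution moment; fires those due."""

    def __init__(self):
        self._pending = []

    def defer(self, intent, context, moment):
        # `moment` is a named trigger such as "call_ended" or a keyword event.
        self._pending.append({"intent": intent, "context": context, "moment": moment})

    def fire(self, moment):
        """Execute (here: return) and remove every delayed intent due at this moment."""
        due = [p for p in self._pending if p["moment"] == moment]
        self._pending = [p for p in self._pending if p["moment"] != moment]
        return [(p["intent"], p["context"]) for p in due]
```

A wall-clock execution moment (specific date and time) would work the same way with timestamps instead of event names.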
- the moderator manager may produce a response (e.g., audio response, recorded utterance playback, text response) and other type of completion notification (e.g., instant message, push notification, email message, text message, on-screen indication, user interface indication, etc.) that may contain intent execution results (e.g., date of next call, bullet point summary, call reports, etc.).
- the first participant may initiate an intent for spoken call points to be summarized and distributed to one or more call participants before and after the call ends.
- the moderator manager may execute an intent right away (e.g., begin recording call audio and video streams for each participant, begin transcribing call audio from one or more participants, begin recording chat messages sent by call participants, etc.).
- the moderator manager may create a delayed intent to produce and distribute call points summary for an execution moment before and after the end of the call.
- the moderator manager may execute the delayed intent at an execution moment (e.g., when the call ends, when last participant disconnects, when one or more participants indicate the call has ended by interacting with user interface elements, etc.).
- the moderator manager may produce a response to one or more participants as result of executing the delayed intent.
- the moderator manager may produce the execution report that may be sent (e.g., on-screen report, email notification, push notification, etc.) to one or more participants that initiated the original intent that led to creation of delayed intent.
- a first participant may produce a call transfer intent that is identified by the moderator manager.
- the intent may require a name of the device (e.g., device type, user-defined device name, device location name, relative device location description, etc.) to which the call may be transferred.
- the moderator manager may execute a call transfer intent using the device name.
- the state and context of the moderator manager may be transferred to the device.
- the moderator manager uses strings of structured text (e.g., JavaScript object notation (JSON) text).
- the message below may be received by the moderator manager:
- the value “private” is a Boolean value (e.g., true or false) that may indicate whether input from other participants may be processed by the moderator manager. As an example, if private is set to true, the second and third participants may not reply to information requests made by the moderator manager.
- the above message may be sent to provide an indication to other participants that the moderator manager is engaged.
- the above message may indicate that input from other participants may be processed by the moderator manager.
- the moderator manager may extract intents, collect information and contexts, and prepare to execute the commands related to intents.
- An optional utterance parameter may be used by the moderator manager to pass a custom textual message with an activation indication message to a participant.
- the optional utterance parameter may be used as text-to-speech system input by one or more participants to produce an audible activation indication.
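As an illustration only (the patent does not reproduce the exact schema here), an activation indication message with the "private" and optional "utterance" parameters might be built like this; the "type" and "participant" field names are assumptions:

```python
import json

def make_activation_message(participant_id, private, utterance=None):
    """Build a JSON activation indication message.

    "private" and "utterance" are the parameters described in the text;
    "type" and "participant" are hypothetical field names.
    """
    message = {
        "type": "moderator_engaged",   # hypothetical message type
        "participant": participant_id,
        "private": private,            # true: other participants' input ignored
    }
    if utterance is not None:
        message["utterance"] = utterance  # optional text-to-speech input
    return json.dumps(message)
```

The "utterance" key is simply omitted when no custom activation text is supplied.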
- the following message may be sent from the moderator manager to every participant when information and/or context gathering occurs.
- the message above contains an utterance parameter that may be used by the moderator manager to pass a custom textual message with information and context requests to one or more participants.
- the custom textual message may be used as text-to-speech system input by one or more participants to produce an audible information and context request.
- the message may also contain an optional “gather_id” parameter, which may be used by the moderator manager to associate participant responses with information gathering requests associated with intents.
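The association of participant replies with outstanding information requests via "gather_id" could be sketched as follows; the message shape beyond the "utterance" and "gather_id" parameters is an assumption:

```python
import json
import uuid

class GatherTracker:
    """Associates participant replies with information requests via gather_id."""

    def __init__(self):
        self._open = {}

    def request(self, utterance):
        """Build a gathering request message and remember its gather_id."""
        gather_id = str(uuid.uuid4())
        self._open[gather_id] = {"utterance": utterance, "replies": []}
        # "type" is a hypothetical field; "utterance" and "gather_id" are
        # the parameters described in the text.
        return json.dumps({"type": "gather", "utterance": utterance,
                           "gather_id": gather_id})

    def reply(self, gather_id, participant_id, text):
        """Record a participant submission against its originating request."""
        self._open[gather_id]["replies"].append((participant_id, text))

    def replies(self, gather_id):
        return self._open[gather_id]["replies"]
```

Because each request carries its own identifier, replies arriving out of order or from different participants still land on the correct intent.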
- the following message may be sent from the moderator manager to an external application programming interface (API) (for example to get participant schedules) when information or context gathering occurs.
- the message may contain API-specific parameters. For example, it may contain a “participant_emails” parameter with a list of call participant emails. The message may optionally contain a “date_range” parameter (e.g., when the moderator manager has gathered information that the call to schedule is to happen during a specific week, such as “next week”) whose values may be of type Unix timestamp.
- the response message above may contain a list with zero or more entries, for each requested participant email. Each participant entry may contain start and end times in Unix timestamp format of future scheduled events. If the “date_range” parameter is specified in the preceding request message, then the response message may contain the events in that time range.
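A sketch of the schedule-lookup exchange described above, assuming "date_range" is a start/end pair of Unix timestamps and the response maps each requested email to a list of events with "start" and "end" fields (the exact structure is not fixed by the text):

```python
def build_schedule_request(participant_emails, date_range=None):
    """Build the schedule-lookup request. `date_range` is an optional
    (start, end) pair of Unix timestamps."""
    request = {"participant_emails": list(participant_emails)}
    if date_range is not None:
        request["date_range"] = {"start": date_range[0], "end": date_range[1]}
    return request

def events_in_range(response, start, end):
    """Filter a response (email -> list of {"start", "end"} events in Unix
    timestamps) down to the events overlapping [start, end]."""
    return {
        email: [e for e in events if e["start"] < end and e["end"] > start]
        for email, events in response.items()
    }
```

The filtering mirrors the behavior described above: when "date_range" is present, only events in that time range are returned.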
- FIG. 1 shows a diagram of embodiments that are in accordance with the disclosure.
- FIG. 1 shows a diagram of the system ( 100 ) that implements synthetic moderation.
- the embodiments of FIG. 1 may be combined and may include or be included within the features and embodiments described in the other figures of the application.
- the features and elements of FIG. 1 are, individually and as a combination, improvements to synthetic moderation technology and computing systems.
- the various elements, systems, and components shown in FIG. 1 may be omitted, repeated, combined, and/or altered as shown from FIG. 1 . Accordingly, the scope of the present disclosure should not be considered limited to the specific arrangements shown in FIG. 1 .
- the system ( 100 ) synthetically moderates conference calls.
- the system ( 100 ) includes the client devices A ( 102 ) through N ( 115 ), the server ( 128 ), and the repository ( 138 ).
- a conference call (also referred to as a call) includes multiple video and audio streams shared in real time between the client devices A ( 102 ) and N ( 115 ) using the conference applications A ( 105 ) and N ( 118 ).
- a conference call allows the participants of the call (the users of the client devices A ( 102 ) through N ( 115 )) to communicate vocally and visibly.
- the conference service ( 132 ) may facilitate a call between the client devices A ( 102 ) and N ( 115 ).
- the conference call may utilize multiple standards and protocols, including audio standards (G.711, G.722, G.728, G.729, Opus, etc.), video standards (H.261, H.262, MPEG-2, H.263, H.264, MPEG4, AVC, HEVC, VP8, VP9, etc.), data standards (e.g., T.120), network protocol standards (TCP/IP, HTTP, HTTPS, UDP, etc.), etc.
- the client devices A ( 102 ) and N ( 115 ) are computing systems (further described in FIG. 8 A ).
- the client devices A ( 102 ) and N ( 115 ) may be desktop computers, mobile devices, laptop computers, tablet computers, on-board computers, etc.
- the client device A ( 102 ) may be used by a user (also referred to as a participant of a call) to initiate or join a conference call with the client device N ( 115 ).
- the client devices A ( 102 ) and N ( 115 ) respectively include several hardware and software components (e.g., processors, memory, programs, etc.).
- the client devices A ( 102 ) and N ( 115 ) respectively include the conference applications A ( 105 ) and N ( 118 ), the moderator managers A ( 108 ) and N ( 120 ), the synthetic moderators A ( 110 ) and N ( 122 ).
- the client devices A ( 102 ) and N ( 115 ) may operate in a similar fashion.
- the conference application A ( 105 ) sends and receives audio, video, and data for a conference call.
- the conference application A ( 105 ) may receive audio and video input streams from cameras and microphones of the client device A ( 102 ) and share the audio and video input streams with the client device N ( 115 ).
- the data may include chat messages sent between participants.
- the moderator manager A ( 108 ) moderates calls placed using the conference application A ( 105 ).
- the moderator manager A ( 108 ) receives user inputs, identifies intents, and produces results.
- the moderator manager A ( 108 ) receives utterance text, which is a transcription of speech from a user.
- the utterance text is analyzed to identify intents and to process commands from the intents.
- the synthetic moderator A ( 110 ) provides outputs to a call that are generated by the moderator manager A ( 108 ). For example, the synthetic moderator A ( 110 ) may initiate a new audio stream on a call to play a greeting message after the moderator manager A ( 108 ) detects an activation phrase in utterance text from a user of the client device A ( 102 ).
- the server ( 128 ) is a computing system (further described in FIG. 8 A ).
- the server ( 128 ) may include multiple physical and virtual computing systems that form part of a cloud computing environment. In one embodiment, execution of the programs and applications of the server ( 128 ) is distributed to multiple physical and virtual computing systems in the cloud computing environment.
- the server ( 128 ) includes several hardware and software components (processors, memory, programs, etc.), including the server application ( 130 ), the conference service ( 132 ), the moderator manager service ( 135 ), and the synthetic moderator service ( 137 ).
- the server application ( 130 ) provides centralized access to data and streams used by the system.
- the server application ( 130 ) hosts a website accessible to the client devices A ( 102 ) and N ( 115 ) that provides functionality for conference calls, calendaring, scheduling, contact tracking, etc.
- the conference service ( 132 ) may be used in a client server model to host a conference call between the client devices A ( 102 ) and N ( 115 ).
- the conference service ( 132 ) may receive and share audio, video, and data streams between the client devices A ( 102 ) and N ( 115 ).
- the moderator manager service ( 135 ) may operate in a client server model as a moderator manager for calls between the client devices A ( 102 ) and N ( 115 ).
- the moderator manager service ( 135 ) may detect user inputs, identify intents, and provide results.
- the repository ( 138 ) is a computing system that may include multiple computing devices in accordance with the computing system ( 800 ) and the nodes ( 822 ) and ( 824 ) described below in FIGS. 8 A and 8 B .
- the repository ( 138 ) may be hosted by a cloud services provider that also hosts the server ( 128 ).
- the cloud services provider may provide hosting, virtualization, and data storage services, as well as other cloud services, and may operate and control the data, programs, and applications that store and retrieve data from the repository ( 138 ).
- the data in the repository ( 138 ) includes the records ( 140 ).
- the records ( 140 ) are electronic files stored in the repository ( 138 ).
- the records ( 140 ) include data for calendars, contacts, schedules, etc.
- the records ( 140 ) are used by the system to schedule and set up conference calls between the client devices A ( 102 ) through N ( 115 ).
- FIG. 2 shows a flowchart of synthetic moderation. Embodiments of FIG. 2 may be combined and may include or be included within the features and embodiments described in the other figures of the application. The features of FIG. 2 are, individually and as an ordered combination, improvements to synthetic moderation technology and computing systems. While the various steps in the flowcharts are presented and described sequentially, one of ordinary skill will appreciate that at least some of the steps may be executed in different orders, may be combined or omitted, and at least some of the steps may be executed in parallel. Furthermore, the steps may be performed actively or passively. For example, some steps may be performed using polling or be interrupt driven. By way of an example, determination steps may not have a processor process an instruction unless an interrupt is received to signify that a condition exists. As another example, determinations may be performed by performing a test, such as checking a data value to test whether the value is consistent with the tested condition.
- the process ( 200 ) synthetically moderates a call.
- the process ( 200 ) may execute on a client device or on a server.
- utterance text is obtained from a conference call that includes a call context.
- the call context may include a call identifier, a set of participant identifiers corresponding to a set of participants of the call, a mute status of a participant of the set of participants, etc.
- the call identifier is a value that uniquely identifies one call from the other calls that may be placed using the system.
- the participant identifiers are values that uniquely identify the participants of a call.
- the mute status is a value that identifies whether the audio stream of a participant is muted (no sound being transmitted) or unmuted (sound transmission is enabled).
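The call context described in this step might be modeled as a small record; the class and field names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class CallContext:
    """Call context as described above: a call identifier, participant
    identifiers, and per-participant mute status (True means no sound
    is being transmitted for that participant)."""
    call_id: str
    participant_ids: list
    mute_status: dict = field(default_factory=dict)

    def is_muted(self, participant_id):
        # Participants default to unmuted when no status has been recorded.
        return self.mute_status.get(participant_id, False)
```

Commands such as unmute can then read and update this record instead of re-querying the conference service.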
- the utterance text is obtained by receiving an audio stream with speech and transcribing the speech from the audio stream to text. Transcription programs generating the utterance text may continuously transcribe speech from the audio streams of the call.
- a moderator manager receives the utterance text from a transcription program. In one embodiment, a moderator manager receives zero or more alternative transcriptions for the same audio stream. In one embodiment, a moderator manager includes the transcription program and processes the audio stream to generate utterance text. In one embodiment, the audio stream is a second audio stream from a second client device.
- the utterance text may be received as a text input from a participant.
- a participant may input a text string to a client device that is received by the system and processed as the utterance text in addition to or in lieu of generating the utterance text from the transcription of speech of the participants of a call.
- intents are identified from utterance text for commands.
- One or more intents may be identified from utterance text by a moderator manager.
- a command is a set of instructions executed by a moderator manager in response to intents identified from the inputs (speech, text, etc.) of participants of a call.
- a command may include function calls to an application programming interface (API).
- an intent may be identified that maps to the command for unmuting an audio stream.
- the moderator manager may call the unmute function of an API for the conference call in response to the intent identified from the speech of a participant.
- an intent may be one of a set of intents.
- Each intent may include a name, a set of trigger strings, a set of commands, etc.
- the set of trigger strings are strings that, when recognized by the moderator manager, trigger execution of the one or more commands associated with an intent.
- the intent is identified using a chatbot.
- the chatbot is a program that conducts a conversation with a person in lieu of providing direct contact with a live human agent.
- the utterance text is sent as an input to the chatbot.
- the chatbot processes the utterance text and returns the intent.
- the intent may be received as an output from the chatbot as a text string.
- the output from the chatbot is mapped to a command, which is further processed by the moderator manager.
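- A minimal sketch of identifying an intent from utterance text and mapping it to a command, using a simple trigger-string match in place of a full chatbot; the intent names, trigger strings, and command identifiers are invented for illustration.

```python
# Each intent has a name, a set of trigger strings, and an associated
# command, as described above. All entries here are assumptions.
INTENTS = {
    "unmute": {"triggers": ["unmute", "turn my mic on"],
               "command": "unmute_stream"},
    "schedule": {"triggers": ["schedule a call", "set up a call"],
                 "command": "schedule_call"},
}

def identify_intent(utterance_text: str):
    """Return (intent name, command) for the first intent whose trigger
    string appears in the utterance text; stands in for the chatbot that
    returns the intent as a text string."""
    text = utterance_text.lower()
    for name, intent in INTENTS.items():
        if any(trigger in text for trigger in intent["triggers"]):
            return name, intent["command"]
    return None, None

print(identify_intent("Could you unmute me, please?"))  # ('unmute', 'unmute_stream')
```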
- contextual data for the command is identified from the call context.
- the contextual data may include a call identifier, a set of participant identifiers, a mute status of a participant, etc.
- non-contextual data for the command may be transcribed and identified.
- Subsequent speech is transcribed from an audio stream to form subsequent text.
- the non-contextual data is identified for the command from the subsequent text.
- the subsequent speech may be from the participant that triggered the moderator manager.
- the subsequent speech is received from a second audio stream of the conference call corresponding to a second participant.
- the non-contextual data for the command may be a subsequent participant identifier, a date value, a time value, etc.
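- The split between contextual data (drawn from the call context) and non-contextual data (drawn from subsequent transcribed speech) might be sketched like this; the dictionary shapes and the time-parsing regular expression are assumptions made for the example.

```python
import re

def gather_contextual(call_context: dict) -> dict:
    """Contextual data comes from the existing call context:
    the call identifier and the participant identifiers."""
    return {"call_id": call_context["call_id"],
            "participant_ids": call_context["participant_ids"]}

def gather_non_contextual(subsequent_text: str) -> dict:
    """Non-contextual data is identified from subsequent transcribed
    speech; here, a naive extraction of a time value such as '10:30'."""
    match = re.search(r"\b(\d{1,2}:\d{2})\b", subsequent_text)
    return {"time": match.group(1) if match else None}

ctx = {"call_id": "call-42", "participant_ids": ["alice", "bob"]}
print(gather_contextual(ctx))
print(gather_non_contextual("let's meet at 10:30 next Tuesday"))
```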
- commands are executed using the contextual data.
- the system may be configured to execute multiple different commands.
- executing the command adds a subsequent participant to the conference call using a subsequent participant identifier from non-contextual data.
- executing the command schedules an ensuing conference call using the contextual data and non-contextual data.
- executing the command unmutes an audio stream of the conference call.
- executing the command transfers a call from a first client device of a first participant to a second client device of the first participant.
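- Command execution can be pictured as a dispatch table that routes an identified command to a function taking the contextual and non-contextual data; the command functions below are hypothetical stand-ins for the conference service's API calls.

```python
# Hypothetical command implementations; a real moderator manager would
# call the conferencing service's API here instead of returning strings.
def add_participant(ctx: dict, data: dict) -> str:
    return f"added {data['participant_id']} to {ctx['call_id']}"

def unmute_stream(ctx: dict, data: dict) -> str:
    return f"unmuted {data['participant_id']} on {ctx['call_id']}"

COMMANDS = {"add_participant": add_participant,
            "unmute_stream": unmute_stream}

def execute(command: str, ctx: dict, data: dict) -> str:
    """Dispatch an identified command with contextual and
    non-contextual data."""
    return COMMANDS[command](ctx, data)

print(execute("unmute_stream",
              {"call_id": "call-42"},
              {"participant_id": "bob"}))  # unmuted bob on call-42
```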
- a result of executing the command is presented.
- the result may indicate that an ensuing call has been scheduled, may show a new person being connected to the current call, may show a change in mute status, may show an agenda or summary, etc.
- the result is presented by a synthetic moderator creating a new audio stream to play audio messages to one or more of the participants.
- the result may be overlaid onto video stream of the participant that triggered the moderator manager.
- the result may be displayed in a new video stream.
- FIGS. 3 A through 3 D, 4 A through 4 D, 5 A through 5 C, 6 A through 6 I , and FIGS. 7 A through 7 D show examples of sequences and interfaces of synthetic moderators.
- FIGS. 3 A through 3 D show an example of scheduling an ensuing call with synthetic moderation.
- FIGS. 4 A through 4 D show an example of adding a participant with synthetic moderation.
- FIGS. 5 A through 5 C show an example of unmuting a device using synthetic moderation.
- FIGS. 6 A through 6 I show an example of processing multiple intents for agenda and summary functionality using synthetic moderation.
- FIGS. 7 A through 7 D may be combined and may include or be included within the features and embodiments described in the other figures of the application.
- the features and elements of FIGS. 3 A through 3 D, 4 A through 4 D, 5 A through 5 C, 6 A through 6 I , and FIGS. 7 A through 7 D are, individually and as a combination, improvements to synthetic moderation technology and computing systems.
- FIGS. 7 A through 7 D show an example of transferring a call between devices using synthetic moderation.
- the various features, elements, widgets, components, and interfaces shown in FIGS. 3 A through 3 D, 4 A through 4 D, 5 A through 5 C, 6 A through 6 I , and FIGS. 7 A through 7 D may be omitted, repeated, combined, and/or altered as shown. Accordingly, the scope of the present disclosure should not be considered limited to the specific arrangements shown in FIGS. 3 A through 3 D, 4 A through 4 D, 5 A through 5 C, 6 A through 6 I , and FIGS. 7 A through 7 D .
- the sequence ( 300 ) schedules an ensuing call.
- the sequence ( 300 ) is performed by the server ( 302 ), the client device A ( 304 ), the client device B ( 306 ), the client device C ( 308 ), and the client device D ( 310 ).
- a first participant (using the client device ( 304 )) may be taking part in a scheduled video conference call with three other participants (using the client devices B ( 306 ) through D ( 310 )).
- the first participant uses the client device ( 304 ), which displays the user interface of FIG. 3 B .
- the third participant of the client device C ( 308 ) may initiate discussion about when the next call may be held.
- the first participant of the client device A ( 304 ) may pronounce a keyword phrase triggering the activation ( 322 ) that may engage a synthetic moderator operating on the client device A ( 304 ).
- the moderator manager may receive the notification message about the synthetic moderator being engaged, and the moderator manager may activate transcription of audio from each of the call participants.
- the moderator manager executes on the client device A ( 304 ). In one embodiment, the moderator manager may execute on the server ( 302 ).
- the moderator manager may produce the notification ( 324 ) (shown on the user interface ( 350 ) of FIG. 3 C ).
- the notification ( 324 ) is presented as an audible response with a greeting phrase that each of the participants may be able to hear on the client devices A ( 304 ) through D ( 310 ).
- the first participant may speak the phrase containing the intent ( 326 ) to reschedule the present conference call.
- the moderator manager may use the audio transcription from the first participant to extract the intent and begin gathering information and context for executing the intent of rescheduling a call.
- the moderator manager may gather the list of call participants to invite from the list of participants in the current call.
- the moderator manager generates the request ( 328 ).
- the request ( 328 ) may be presented as an audible request for information that the first, second, third, and fourth participants will hear on the client devices A ( 304 ) through D ( 310 ).
- the request may include an utterance asking for date and/or time of the call to be scheduled.
- the second participant may speak out the utterance “same time as this call”.
- the moderator manager may process the received transcription of that spoken phrase, identify the intent ( 330 ) to identify a time for the ensuing call, and proceed to extract the time from the current call details and use the same time for setting up the ensuing call.
- the third participant may speak out “next Tuesday”.
- the moderator manager may process the received transcription of that spoken phrase and identify the intent ( 332 ) to identify a date for the ensuing call.
- the moderator manager resolves the day of week to a full date and uses the full date for scheduling the ensuing call.
- the moderator manager may execute the command ( 334 ) for the intent ( 326 ) once information from contextual data (e.g., the current participants) and non-contextual data (e.g., the date and time of the ensuing call) have been gathered.
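- Resolving the spoken phrase "next Tuesday" to a full date, as the moderator manager does before scheduling the ensuing call, might look like the following sketch; the helper name and the example date are assumptions.

```python
import datetime

def resolve_next_weekday(today: datetime.date,
                         weekday: int) -> datetime.date:
    """Resolve a spoken day of week to a full date.
    weekday: Monday=0 .. Sunday=6; returns the next such day
    strictly after today (a full week ahead if today matches)."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + datetime.timedelta(days=days_ahead)

today = datetime.date(2021, 8, 9)            # a Monday, for illustration
next_tuesday = resolve_next_weekday(today, 1)
print(next_tuesday)                          # 2021-08-10
```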
- the moderator manager may make a request to an application programming interface (API) endpoint of the video conference service executing on the server ( 302 ).
- the server ( 302 ) generates the response ( 336 ).
- the response ( 336 ) is transmitted to the client device A ( 304 ).
- the moderator manager on the client device ( 304 ) produces a result that may identify the resolved scheduled call date and time.
- the result may be displayed on the client devices A ( 304 ) through D ( 310 ) (shown in FIG. 3 D ). Additionally, the result may be contained in an affirmation email and text notification sent to one or more participants.
- the moderator manager may have each of the current call participants hear the response on their devices.
- the moderator manager may continue to process the audio transcripts from each of the participants for the next four seconds, before disengaging the synthetic moderator if no more intents are extracted or if an intent to disengage the synthetic moderator is extracted from one or more transcripts.
- the user interface ( 350 ) is displayed on the client device A ( 304 ) (of FIG. 3 A ).
- the user interface ( 350 ) shows the four participants of the call.
- the user interface ( 350 ) is updated to show the icon ( 352 ).
- the icon ( 352 ) is a visual representation of the notification ( 324 ) of FIG. 3 A .
- the user interface ( 350 ) is updated to show the text ( 354 ).
- the text ( 354 ) is a visual representation of the result ( 338 ) of FIG. 3 A .
- the sequence ( 400 ) adds a participant to a call.
- the sequence ( 400 ) is performed by the server ( 402 ), the client device A ( 404 ), the client device B ( 406 ), and the client device C ( 408 ).
- a first participant (using the client device A ( 404 )) may be taking part in a video call (shown in FIG. 4 B ) with one other participant (using the client device B ( 406 )).
- the second participant may propose a third participant be added to the call.
- the first participant may pronounce a phrase triggering the activation ( 422 ) that engages the synthetic moderator.
- the synthetic moderator generates the notification ( 424 ) that includes a visual animation (shown in FIG. 4 C ) that may be displayed on screens of the client device A ( 404 ) and the client device B ( 406 ).
- the synthetic moderator may produce a synthesized greeting utterance that may be heard by both the first and second participants.
- the moderator manager on the client device A ( 404 ) may initiate transcription of audio from both first and second participants.
- the second participant may speak out the intent ( 426 ) to add a person to the call by naming the new participant.
- the moderator manager may receive the transcription containing the spoken words of the second participant.
- the moderator manager may process the transcription and extract the intent ( 426 ).
- the moderator manager may process the same transcript and extract information used to execute the intent, namely the name of the participant to add.
- the moderator manager may gather context used to execute the intent, namely the unique identifier of the call to which the participant is to be added.
- the moderator manager executes the command ( 428 ) based on the intent ( 426 ) to add a new participant to the call.
- the participant may be added by invoking an API of the group video call service.
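- Invoking the group video call service's API to add a named participant might be sketched as building a request like the one below; the endpoint path and payload shape are assumptions for illustration, not a documented API.

```python
import json

def build_add_participant_request(call_id: str,
                                  participant_name: str) -> dict:
    """Build a request to add a participant to an ongoing call; the
    unique call identifier comes from the call context, the name from
    the transcript."""
    return {
        "method": "POST",
        "path": f"/calls/{call_id}/participants",  # hypothetical endpoint
        "body": json.dumps({"name": participant_name}),
    }

req = build_add_participant_request("call-42", "Carol")
print(req["path"])   # /calls/call-42/participants
```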
- the moderator manager receives the response ( 430 ) from the server ( 402 ).
- the response ( 430 ) may indicate the success of adding the new participant.
- the moderator manager generates the result ( 432 ) with an audible indication that the participant has been added to the call (shown in FIG. 4 D ).
- the moderator manager may remain engaged for three seconds after producing an audible response, after which, if none of the participants has engaged with the synthetic moderator by producing a new recognized intent, the synthetic moderator may disengage.
- the user interface ( 450 ) is displayed on the client device A ( 404 ) (of FIG. 4 A ).
- the user interface ( 450 ) shows the two participants of the call.
- the user interface ( 450 ) is updated to show the icon ( 452 ).
- the icon ( 452 ) is a visual representation of the notification ( 424 ) of FIG. 4 A .
- the user interface ( 450 ) is updated to show the video stream ( 454 ).
- the video stream ( 454 ) displays the third participant using the client device C ( 408 ) of FIG. 4 A .
- the sequence ( 500 ) changes the mute status of the client device A ( 504 ).
- the sequence ( 500 ) is performed by the client device A ( 504 ), the client device B ( 506 ), the client device C ( 508 ), and the client device D ( 510 ).
- a first participant (of the client device A ( 504 )) may be taking part in a video call with three other participants (of the client devices B ( 506 ), C ( 508 ), and D ( 510 )).
- the first participant may have muted ( 522 ) the microphone of the client device A ( 504 ) using software mute functionality (illustrated in FIG. 5 B ).
- the call may be configured to have the moderator manager monitor microphone audio levels.
- the second participant may ask the first participant a question.
- the first participant may start replying to the question without unmuting the microphone of the client device A ( 504 ).
- the moderator manager of the client device A may detect the first participant speaking while the device microphone is muted at the detection ( 524 ). The moderator manager may also detect that none of the other participants are speaking. The moderator manager may engage the synthetic moderator to produce an audible activation indication for the first participant. The second and third participants may not receive any visual or any other explicit indication that the synthetic moderator was engaged for the first participant.
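- The detection step can be approximated as a check over per-participant microphone levels; the threshold value and the normalized level representation are assumptions for the sketch.

```python
SPEECH_THRESHOLD = 0.2  # assumed normalized microphone level for speech

def talking_while_muted(levels: dict, muted: dict,
                        participant: str) -> bool:
    """True when the given participant's device is muted, the
    participant is speaking, and no other participant is speaking."""
    speaking = levels[participant] > SPEECH_THRESHOLD
    others_silent = all(levels[p] <= SPEECH_THRESHOLD
                        for p in levels if p != participant)
    return muted[participant] and speaking and others_silent

levels = {"alice": 0.7, "bob": 0.05, "carol": 0.0}
muted = {"alice": True, "bob": False, "carol": False}
print(talking_while_muted(levels, muted, "alice"))  # True
```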
- the moderator manager generates the notification ( 526 ) to produce an audible greeting phrase “would you like to unmute the microphone?” that may only be audible to the first participant.
- the moderator manager may activate processing of audio spoken by the first participant.
- the first participant signals the intent ( 528 ) pronounced as an affirmative reply.
- the moderator manager may interpret the affirmative reply based on the context as an intent request to unmute the microphone of the first participant.
- the moderator manager may produce an API call to video conferencing service with a request to unmute ( 530 ) the microphone of the first participant.
- the moderator manager may immediately disengage the synthetic moderator without producing any response to the first participant.
- Second and third participants may notice that the microphone state indication for the first participant changed from muted to unmuted (shown in FIG. 5 C ). The first participant may continue answering the question immediately.
- the user interface ( 550 ) is displayed on the client device A ( 504 ) (of FIG. 5 A ).
- the user interface ( 550 ) shows the four participants of the call.
- the mute icon ( 552 ) indicates that the audio stream of the first participant of the first client device A ( 504 ) (of FIG. 5 A ) is muted.
- the user interface ( 550 ) is updated to show the unmute icon ( 554 ).
- the unmute icon ( 554 ) indicates that the audio stream for the first participant is not muted and may be heard by the other participants.
- the sequence ( 600 ) performs agenda and summary functions.
- the sequence ( 600 ) is performed by the server ( 602 ), the client device A ( 604 ), the client device B ( 606 ), the client device C ( 608 ), the client device D ( 610 ), and the client device E ( 612 ).
- a first participant (using the client device A ( 604 )) may participate in a video conference call with four other participants (using the client devices B ( 606 ), C ( 608 ), D ( 610 ), and E ( 612 )).
- the first, second, third, and fourth participants may join the call at the scheduled time (shown in FIG. 6 B ).
- the fifth participant (using the client device E ( 612 )) may join the call ten minutes late.
- the first participant may engage the activation ( 622 ) of the synthetic moderator immediately after joining the call.
- the first participant may use a trigger phrase that the first participant previously selected.
- the moderator manager may send the notification ( 624 ) to each of the present participants (shown in FIG. 6 C ).
- the moderator manager may activate transcription of audio streams from each of the participants that may have already joined the call.
- the moderator manager may activate transcription of audio streams of newly joined and rejoined (e.g., reconnecting after network interruption, reconnecting after device change, reconnecting after device restart, etc.) participants immediately after the participants join the call.
- the first participant may speak out the intents ( 626 ) to enable one or more synthetic moderator group call functionalities, for example summary functionality and meeting agenda functionality.
- the moderator manager may process the transcript of the first participant audio stream and extract the intents ( 626 ) to activate two synthetic moderator functionalities.
- the moderator manager may generate the response ( 628 ) to produce an audible response for each of the participants present in the call.
- the response ( 628 ) may contain affirmation that functionality was enabled.
- the moderator manager may execute the intents. For summary functionality, the moderator manager may begin to record and persistently store the transcript from each of the participants. For meeting agenda functionality, the moderator manager may make use of API functionality to load a list of topics that is associated to this scheduled conference call from video conferencing service.
- the moderator manager may disengage the synthetic moderator (see FIG. 6 D ) and create a delayed intent to support the summary functionality and/or the meeting agenda functionality.
- the summary functionality may have an intent to be executed at the execution moment when a participant joins the scheduled call late, such as the fifth participant.
- the moderator manager may execute the join call summary ( 630 ) intent and privately display the results to the fifth participant without displaying the results to the first through fourth participants (shown in FIG. 6 E ).
- the results may include spoken points and/or quotes from other call participants that spoke before the fifth participant joined the call.
- the agenda functionality may have an intent to be executed at one or more execution moments during the call (e.g., after one quarter of the scheduled call time passes, during an interruption and/or a moment of silence, ten minutes before the call is scheduled to end, etc.).
- the moderator manager may execute the delayed intent associated with the agenda functionality at the halfway point of the meeting.
- the moderator manager may produce an audible signal to indicate that synthetic moderator is engaged.
- the moderator manager may create a delayed intent with execution moment set to the next audio silence and/or one or more participants may pronounce a keyword and/or a phrase that moderator manager may recognize as request to engage the agenda functionality.
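- A delayed intent, as used by the summary and agenda functionality, can be modeled as a stored action paired with an execution-moment predicate that the moderator manager evaluates as call events arrive; the class shape and event fields are illustrative assumptions.

```python
class DelayedIntent:
    """A stored intent executed only when its execution moment
    (a predicate over a call event) is satisfied."""
    def __init__(self, name, moment, action):
        self.name = name
        self.moment = moment    # predicate: event -> bool
        self.action = action    # action to run at the execution moment

    def maybe_run(self, event):
        if self.moment(event):
            return self.action(event)
        return None             # execution moment not reached

# Delayed intent for the join-call-summary functionality: fires when a
# participant joins the scheduled call late.
join_summary = DelayedIntent(
    "join_call_summary",
    moment=lambda e: e["type"] == "participant_joined"
                     and e["minutes_late"] > 0,
    action=lambda e: f"private summary shown to {e['participant']}",
)

print(join_summary.maybe_run(
    {"type": "participant_joined",
     "participant": "eve", "minutes_late": 10}))
```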
- the moderator manager may produce the results of executing delayed agenda functionality intent.
- Each connected call participant may be able to see and/or hear the current meeting agenda ( 632 ) on the client devices A ( 604 ) through E ( 612 ) (shown in FIG. 6 F ).
- the third participant may request the agenda functionality to perform the updated intent ( 634 ) by speaking the agenda points that were covered since the last time the agenda functionality intent was executed.
- the moderator manager may process the transcript of the third participant's speech, extract the intent to mark some of the agenda points as completed, and execute the intent.
- the moderator manager may execute an update to the agenda ( 636 ).
- the result of executing the intent (e.g., marking some agenda points as completed, crossing out agenda points, changing the color of a completed agenda point) may be presented to the participants.
- the moderator manager may continue transcribing and recording the transcription until the end of the call (shown in FIG. 6 H ).
- the moderator manager may then execute the summary functionality ( 638 ) as a delayed intent that is executed when the call ends (e.g., each of the participants have left).
- the execution result of that intent (e.g., a brief transcript of the call) may be presented to one or more participants.
- the user interface ( 650 ) is displayed on the client device A ( 604 ) (of FIG. 6 A ).
- the user interface ( 650 ) shows four participants present on the call.
- the user interface ( 650 ) is updated to show the icon ( 652 ).
- the icon ( 652 ) is a visual representation of the notification ( 624 ) of FIG. 6 A .
- the user interface ( 650 ) is updated to remove the icon ( 652 ) (of FIG. 6 C ).
- the call may proceed with multiple intents for the summary and agenda functionality.
- the user interface ( 650 ) is updated to show the video stream ( 654 ) for the fifth participant. Additionally, the call join summary ( 656 ) is displayed. In one embodiment, the call join summary ( 656 ) may be shown on the client device E ( 612 ) without being displayed on the other client devices A ( 604 ) through D ( 610 ).
- the user interface ( 650 ) is updated to display the agenda window ( 658 ).
- the agenda window ( 658 ) shows a list of pre-determined agenda items. One of the agenda items is marked as completed as a result of executing the update agenda intent.
- the user interface ( 650 ) is updated to display an update to the agenda window ( 658 ).
- the agenda window ( 658 ) is updated to show an additional point has been discussed.
- the user interface ( 650 ) is updated to remove the agenda window ( 658 ) (of FIG. 6 G ). The participants may continue the call.
- the user interface ( 650 ) is updated to display the summary ( 660 ).
- the summary ( 660 ) may be received by the client device A ( 604 ) (of FIG. 6 A ) after the call is ended and displayed after the video conferencing application is closed.
- the sequence ( 700 ) performs a call transfer between devices using a synthetic moderator.
- the sequence ( 700 ) is performed by the server ( 702 ), the client device A ( 704 ), the client device B ( 706 ), the client device C ( 708 ), and the client device D ( 710 ).
- a first participant (using the client device A ( 704 )) may be taking part in a scheduled video conference call ( 722 ) with two other participants (using the client devices B ( 706 ) and C ( 708 )) (shown in FIG. 7 B ).
- the client device A ( 704 ) may be a laptop computer that is used by the first participant to participate in the call.
- the first participant may decide to continue the call on the client device D ( 710 ), which may be a smartphone device.
- the activation ( 724 ) is triggered by the first participant pronouncing a keyword phrase that may engage a synthetic moderator on the client device A ( 704 ). In some situations, the first participant may mute their microphone from other call participants before pronouncing a keyword phrase.
- the moderator manager may receive the notification message about the synthetic moderator being engaged, and the moderator manager may activate transcription of audio from one or more call participants.
- the moderator manager may produce the notification ( 726 ) as an audible and visual response with a greeting phrase (shown in FIG. 7 C ).
- One or more of the other call participants may be able to hear the response and the notification ( 726 ).
- the first participant may speak the phrase containing the intent ( 728 ) to transfer the call to another device (e.g., “transfer the call to my phone”, “continue this call on my Android phone”, “switch call to handheld”, etc.), i.e., the client device D ( 710 ).
- the moderator manager may use the audio transcription from the first participant to extract the intent and context required for executing the intent.
- the moderator manager may perform the command ( 730 ) by communicating with the video conferencing service hosted by the server ( 702 ) and transmit the device name extracted from the transcript (e.g., “phone”, “Android phone”, “handheld”, etc.).
- the server ( 702 ) may perform the call transfer ( 732 ) with the video conferencing service to initiate a connection with the identified device (the client device D ( 710 )).
- the client device D ( 710 ) may join the video conference call before the current device (the client device A ( 704 )) leaves the call.
- the state and context transfer ( 734 ) may be performed for the moderator manager executing on the client device A ( 704 ) (or the moderator manager service executing on the server 702 ) to communicate with the moderator manager on the client device D ( 710 ).
- the moderator manager of the client device A ( 704 ) may transmit the state and context that has been gathered to the moderator manager of the client device D ( 710 ).
- the client device A ( 704 ) may disconnect from the call (shown in FIG. 7 D ).
- the first participant may continue participating in the call using the client device D ( 710 ).
- the message below may be sent between moderator manager on the client device A ( 704 ) and moderator manager on the client device D ( 710 ).
- the state and context are transferred between moderator managers of the client devices A ( 704 ) and D ( 710 ) with the key and value strings for “state” and “context”.
- the key and value strings for “state” identify the state of the current intent (the call transfer intent) and any pending intents on the client device A ( 704 ) (e.g., call scheduling intents, add participant intents, etc.).
- the key and value strings for “context” identify the context for the current and any pending intents on the client device A ( 704 ) (e.g., participant identifiers, dates, times, etc.).
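- The transfer message itself is not reproduced here, but based on the description of the "state" and "context" keys it might be encoded as JSON along the following lines; every field value shown is an assumption made for illustration.

```python
import json

# Illustrative state-and-context transfer message; only the top-level
# "state" and "context" keys are described in the text above, so all
# nested fields are assumptions.
message = {
    "state": {
        "current_intent": "transfer_call",
        "pending_intents": ["schedule_call"],
    },
    "context": {
        "call_id": "call-42",
        "participant_ids": ["alice", "bob", "carol"],
        "date": "2021-08-17",
        "time": "10:30",
    },
}

encoded = json.dumps(message)                # sent between devices
print(sorted(json.loads(encoded).keys()))    # ['context', 'state']
```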
- the user interface ( 750 ) is displayed on the client device A ( 704 ) (of FIG. 7 A ).
- the user interface ( 750 ) shows three participants present on the call.
- the user interface ( 750 ) is updated to show the icon ( 752 ).
- the icon ( 752 ) is a visual representation of the notification ( 726 ) of FIG. 7 A .
- the call has been transferred from the client device A ( 704 ) (operating the user interface ( 750 )) to the client device D ( 710 ).
- the user interface ( 750 ) is updated to remove the video streams of the other participants.
- the client device D ( 710 ) is updated to show the video streams of the other participants of the call.
- Embodiments of the invention may be implemented on a computing system. Any combination of a mobile, a desktop, a server, a router, a switch, an embedded device, or other types of hardware may be used.
- the computing system ( 800 ) may include one or more computer processor(s) ( 802 ), non-persistent storage ( 804 ) (e.g., volatile memory, such as a random access memory (RAM), cache memory), persistent storage ( 806 ) (e.g., a hard disk, an optical drive such as a compact disk (CD) drive or a digital versatile disk (DVD) drive, a flash memory, etc.), a communication interface ( 812 ) (e.g., Bluetooth interface, infrared interface, network interface, optical interface, etc.), and numerous other elements and functionalities.
- the computer processor(s) ( 802 ) may be an integrated circuit for processing instructions.
- the computer processor(s) ( 802 ) may be one or more cores or micro-cores of a processor.
- the computing system ( 800 ) may also include one or more input device(s) ( 810 ), such as a touchscreen, a keyboard, a mouse, a microphone, a touchpad, an electronic pen, or any other type of input device.
- the communication interface ( 812 ) may include an integrated circuit for connecting the computing system ( 800 ) to a network (not shown) (e.g., a local area network (LAN), a wide area network (WAN) such as the Internet, a mobile network, or any other type of network) and/or to another device, such as another computing device.
- the computing system ( 800 ) may include one or more output device(s) ( 808 ), such as a screen (e.g., a liquid crystal display (LCD), a plasma display, a touchscreen, a cathode ray tube (CRT) monitor, a projector, or other display device), a printer, an external storage, or any other output device.
- One or more of the output device(s) ( 808 ) may be the same or different from the input device(s) ( 810 ).
- the input and output device(s) ( 810 and 808 ) may be locally or remotely connected to the computer processor(s) ( 802 ), non-persistent storage ( 804 ), and persistent storage ( 806 ).
- Software instructions in the form of computer readable program code to perform embodiments of the invention may be stored, in whole or in part, temporarily or permanently, on a non-transitory computer readable medium such as a CD, a DVD, a storage device, a diskette, a tape, flash memory, physical memory, or any other computer readable storage medium.
- the software instructions may correspond to computer readable program code that, when executed by a processor(s), is configured to perform one or more embodiments of the invention.
- the computing system ( 800 ) in FIG. 8 A may be connected to or be a part of a network.
- the network ( 820 ) may include multiple nodes (e.g., node X ( 822 ), node Y ( 824 )).
- Each node may correspond to a computing system, such as the computing system ( 800 ) shown in FIG. 8 A , or a group of nodes combined may correspond to the computing system ( 800 ) shown in FIG. 8 A .
- embodiments of the invention may be implemented on a node of a distributed system that is connected to other nodes.
- embodiments of the invention may be implemented on a distributed computing system having multiple nodes, where each portion of the invention may be located on a different node within the distributed computing system.
- one or more elements of the aforementioned computing system ( 800 ) may be located at a remote location and connected to the other elements over a network.
- the node may correspond to a blade in a server chassis that is connected to other nodes via a backplane.
- the node may correspond to a server in a data center.
- the node may correspond to a computer processor or micro-core of a computer processor with shared memory and/or resources.
- the nodes (e.g., node X ( 822 ), node Y ( 824 )) in the network ( 820 ) may be configured to provide services for a client device ( 826 ).
- the nodes may be part of a cloud computing system.
- the nodes may include functionality to receive requests from the client device ( 826 ) and transmit responses to the client device ( 826 ).
- the client device ( 826 ) may be a computing system, such as the computing system ( 800 ) shown in FIG. 8 A . Further, the client device ( 826 ) may include and/or perform all or a portion of one or more embodiments of the invention.
- the computing system ( 800 ) or group of computing systems described in FIGS. 8 A and 8 B may include functionality to perform a variety of operations disclosed herein.
- the computing system(s) may perform communication between processes on the same or different system.
- a variety of mechanisms, employing some form of active or passive communication, may facilitate the exchange of data between processes on the same device. Examples representative of these inter-process communications include, but are not limited to, the implementation of a file, a signal, a socket, a message queue, a pipeline, a semaphore, shared memory, message passing, and a memory-mapped file. Further details pertaining to a couple of these non-limiting examples are provided below.
- sockets may serve as interfaces or communication channel end-points enabling bidirectional data transfer between processes on the same device.
- the server process (e.g., a process that provides data) may create a first socket object.
- the server process binds the first socket object, thereby associating the first socket object with a unique name and/or address.
- the server process then waits and listens for incoming connection requests from one or more client processes (e.g., processes that seek data).
- the client process (e.g., a process that seeks data) may similarly create a second socket object.
- the client process then proceeds to generate a connection request that includes at least the second socket object and the unique name and/or address associated with the first socket object.
- the client process then transmits the connection request to the server process.
- the server process may accept the connection request, establishing a communication channel with the client process, or the server process, busy handling other operations, may queue the connection request in a buffer until the server process is ready.
- An established connection informs the client process that communications may commence.
- the client process may generate a data request specifying the data that the client process wishes to obtain.
- the data request is subsequently transmitted to the server process.
- the server process analyzes the request and gathers the requested data.
- the server process then generates a reply including at least the requested data and transmits the reply to the client process.
- the data may be transferred, most commonly, as datagrams or as a stream of characters (e.g., bytes).
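The bind/listen/connect/request/reply exchange described in the steps above can be sketched in Python, with threads standing in for the server and client processes (the loopback address and message contents are illustrative):

```python
import socket
import threading

# Server side: create a first socket object, bind it to a unique
# address, and listen for incoming connection requests.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))          # port 0 lets the OS pick a free port
srv.listen(1)
addr = srv.getsockname()

def serve_one():
    conn, _ = srv.accept()              # accept the connection request
    request = conn.recv(1024)           # read the client's data request
    conn.sendall(b"reply: " + request)  # gather and transmit the reply
    conn.close()

t = threading.Thread(target=serve_one)
t.start()

# Client side: create a second socket object, transmit a connection
# request to the server's address, then send a data request.
cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(addr)
cli.sendall(b"GET item")
reply = cli.recv(1024)
cli.close()
t.join()
srv.close()

print(reply.decode())
```

The same pattern applies to processes on different machines; only the address changes.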
- Shared memory refers to the allocation of virtual memory space to provide a mechanism by which data may be communicated and/or accessed by multiple processes.
- an initializing process first creates a shareable segment in persistent or non-persistent storage. Post creation, the initializing process then mounts the shareable segment, subsequently mapping the shareable segment into the address space associated with the initializing process. Following the mounting, the initializing process proceeds to identify and grant access permission to one or more authorized processes that may also write and read data to and from the shareable segment. Changes made to the data in the shareable segment by one process may immediately affect other processes, which are also linked to the shareable segment. Further, when one of the authorized processes accesses the shareable segment, the shareable segment maps to the address space of that authorized process. Often, only one authorized process, other than the initializing process, may mount the shareable segment at any given time.
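A minimal sketch of the shareable-segment mechanism, using Python's `multiprocessing.shared_memory`; for brevity both handles are held in one process, standing in for the initializing and authorized processes (segment size and contents are illustrative):

```python
from multiprocessing import shared_memory

# Initializing process: create a shareable segment and map it into
# this process's address space.
seg = shared_memory.SharedMemory(create=True, size=64)
seg.buf[:5] = b"hello"              # write data into the segment

# Authorized process: attach to the same segment by name; the segment
# is mapped into that process's address space as well.
other = shared_memory.SharedMemory(name=seg.name)
data = bytes(other.buf[:5])         # the write is immediately visible

# Changes made by one process immediately affect the other.
other.buf[:5] = b"HELLO"
echoed = bytes(seg.buf[:5])

other.close()
seg.close()
seg.unlink()                        # remove the segment
```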
- the computing system performing one or more embodiments of the invention may include functionality to receive data from a user.
- a user may submit data via a graphical user interface (GUI) on the user device.
- Data may be submitted via the graphical user interface by a user selecting one or more graphical user interface widgets or inserting text and other data into graphical user interface widgets using a touchpad, a keyboard, a mouse, or any other input device.
- information regarding the particular item may be obtained from persistent or non-persistent storage by the computer processor.
- the contents of the obtained data regarding the particular item may be displayed on the user device in response to the user's selection.
- a request to obtain data regarding the particular item may be sent to a server operatively connected to the user device through a network.
- the user may select a uniform resource locator (URL) link within a web client of the user device, thereby initiating a Hypertext Transfer Protocol (HTTP) or other protocol request being sent to the network host associated with the URL.
- the server may extract the data regarding the particular selected item and send the data to the device that initiated the request.
- the contents of the received data regarding the particular item may be displayed on the user device in response to the user's selection.
- the data received from the server after selecting the URL link may provide a web page in Hyper Text Markup Language (HTML) that may be rendered by the web client and displayed on the user device.
- the computing system may extract one or more data items from the obtained data.
- the extraction may be performed as follows by the computing system ( 800 ) in FIG. 8 A .
- an organizing pattern (e.g., grammar, schema, layout) is determined, which may be based on one or more of the following: position (e.g., bit or column position, Nth token in a data stream, etc.), attribute (where the attribute is associated with one or more values), or a hierarchical/tree structure (consisting of layers of nodes at different levels of detail, such as in nested packet headers or nested document sections).
- the raw, unprocessed stream of data symbols is parsed, in the context of the organizing pattern, into a stream (or layered structure) of tokens (where each token may have an associated token “type”).
- extraction criteria are used to extract one or more data items from the token stream or structure, where the extraction criteria are processed according to the organizing pattern to extract one or more tokens (or nodes from a layered structure).
- the token(s) at the position(s) identified by the extraction criteria are extracted.
- the token(s) and/or node(s) associated with the attribute(s) satisfying the extraction criteria are extracted.
- the token(s) associated with the node(s) matching the extraction criteria are extracted.
- the extraction criteria may be as simple as an identifier string or may be a query presented to a structured data repository (where the data repository may be organized according to a database schema or data format, such as XML).
- the extracted data may be used for further processing by the computing system.
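The pattern/token/criteria steps above can be illustrated with a deliberately simple organizing pattern: a positional schema for a comma-delimited record (the schema and field names are assumed for illustration):

```python
# Organizing pattern: a schema mapping field names to column positions
# in a comma-delimited record.
schema = {"id": 0, "name": 1, "price": 2}

raw = "42,widget,19.99"             # raw, unprocessed stream of symbols

# Parse the raw stream, in the context of the organizing pattern,
# into a stream of tokens.
tokens = raw.split(",")

# Extraction criteria processed against the pattern: extract the token
# at the position identified for each requested field.
def extract(fields):
    return {f: tokens[schema[f]] for f in fields}

items = extract(["name", "price"])
print(items)  # {'name': 'widget', 'price': '19.99'}
```

A query against an XML or database-backed repository follows the same shape, with the schema supplying the positions or node paths.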
- the computing system ( 800 ) of FIG. 8 A while performing one or more embodiments of the invention, may perform data comparison.
- the comparison of two data values, A and B, may be performed by submitting A, B, and an opcode specifying an operation related to the comparison into an arithmetic logic unit (ALU) (i.e., circuitry that performs arithmetic and/or bitwise logical operations on the two data values).
- the ALU outputs the numerical result of the operation and/or one or more status flags related to the numerical result.
- the status flags may indicate whether the numerical result is a positive number, a negative number, zero, etc.
- the comparison may be executed. For example, in order to determine if A>B, B may be subtracted from A (i.e., A ⁇ B), and the status flags may be read to determine if the result is positive (i.e., if A>B, then A ⁇ B>0).
- A and B may be vectors, in which case comparing A with B requires comparing the first element of vector A with the first element of vector B, the second element of vector A with the second element of vector B, and so on.
- if A and B are strings, the binary values of the strings may be compared.
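A sketch of the flag-based comparison described above, emulating in Python the status flags an ALU would report after computing A - B (the flag names are illustrative):

```python
# Subtract B from A and report status flags the way an ALU would.
def compare(a, b):
    result = a - b
    return {"zero": result == 0, "negative": result < 0}

# A > B holds when A - B is positive, i.e. neither zero nor negative.
flags = compare(7, 3)
a_greater = not flags["zero"] and not flags["negative"]
print(a_greater)  # True

# Vectors are compared element by element.
A, B = [1, 5, 9], [1, 4, 9]
elementwise = [compare(x, y) for x, y in zip(A, B)]
```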
- the computing system ( 800 ) in FIG. 8 A may implement and/or be connected to a data repository.
- a data repository is a database.
- a database is a collection of information configured for ease of data retrieval, modification, re-organization, and deletion.
- a Database Management System (DBMS) is a software application that provides an interface for users to define, create, query, update, or administer databases.
- the user or software application may submit a statement or query to the DBMS, and the DBMS interprets the statement.
- the statement may be a select statement to request information, an update statement, a create statement, a delete statement, etc.
- the statement may include parameters that specify data or a data container (database, table, record, column, view, etc.), identifier(s), conditions (comparison operators), functions (e.g., join, full join, count, average, etc.), sorts (e.g., ascending, descending), or others.
- the DBMS may execute the statement. For example, the DBMS may access a memory buffer, or reference or index a file, for reading, writing, or deletion, or any combination thereof, in responding to the statement.
- the DBMS may load the data from persistent or non-persistent storage and perform computations to respond to the query.
- the DBMS may return the result(s) to the user or software application.
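The submit/interpret/execute/return cycle above can be sketched with SQLite standing in for the DBMS (the table and data are illustrative):

```python
import sqlite3

# An in-memory database with an illustrative table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE calls (participant TEXT, start INTEGER)")
db.executemany("INSERT INTO calls VALUES (?, ?)",
               [("alice", 1627100000), ("bob", 1627200000)])

# A select statement with a condition (comparison operator) and a sort:
# the DBMS interprets it, accesses storage, and returns the result.
rows = db.execute(
    "SELECT participant FROM calls WHERE start > ? ORDER BY start ASC",
    (1627150000,),
).fetchall()
db.close()

print(rows)  # [('bob',)]
```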
- the computing system ( 800 ) of FIG. 8 A may include functionality to present raw and/or processed data, such as results of comparisons and other processing.
- presenting data may be accomplished through various presenting methods.
- data may be presented through a user interface provided by a computing device.
- the user interface may include a GUI that displays information on a display device, such as a computer monitor or a touchscreen on a handheld computer device.
- the GUI may include various GUI widgets that organize what data is shown as well as how data is presented to a user.
- the GUI may present data directly to the user, e.g., data presented as actual data values through text, or rendered by the computing device into a visual representation of the data, such as through visualizing a data model.
- a GUI may first obtain a notification from a software application requesting that a particular data object be presented within the GUI.
- the GUI may determine a data object type associated with the particular data object, e.g., by obtaining data from a data attribute within the data object that identifies the data object type.
- the GUI may determine any rules designated for displaying that data object type, e.g., rules specified by a software framework for a data object class or according to any local parameters defined by the GUI for presenting that data object type.
- the GUI may obtain data values from the particular data object and render a visual representation of the data values within a display device according to the designated rules for that data object type.
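The notification/type/rules/render sequence above can be sketched as a type-keyed rule table (the object types and rendering rules are assumed for illustration):

```python
# Rules designated per data object type.
render_rules = {
    "temperature": lambda v: f"{v} degrees C",
    "timestamp":   lambda v: f"t={v}",
}

def present(data_object):
    # Determine the data object type from an attribute of the object...
    obj_type = data_object["type"]
    # ...look up the rule designated for that type...
    rule = render_rules[obj_type]
    # ...and render the data values according to that rule.
    return rule(data_object["value"])

out = present({"type": "temperature", "value": 21})
print(out)  # 21 degrees C
```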
- Data may also be presented through various audio methods.
- data may be rendered into an audio format and presented as sound through one or more speakers operably connected to a computing device.
- data may also be presented through haptic methods, which may include vibrations or other physical signals generated by the computing system.
- data may be presented to a user using a vibration generated by a handheld computer device with a predefined duration and intensity of the vibration to communicate the data.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- General Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
{
    "activated": true,
    "private": true/false
}
The value "private" is a Boolean value (e.g., true or false) that may indicate whether input from other participants may be processed by the moderator manager. As an example, if "private" is set to true, the second and third participants may not reply to information requests made by the moderator manager.
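As a hypothetical sketch, a moderator manager might apply the "private" flag when deciding whose input to process (the function and participant names below are assumptions, not part of the patent):

```python
import json

# The activation message described above.
message = json.loads('{"activated": true, "private": true}')

def may_process(participant, activating_participant, msg):
    if not msg.get("activated"):
        return False
    # When private is true, only the activating participant's input
    # (e.g., replies to information requests) is processed.
    if msg.get("private"):
        return participant == activating_participant
    return True

print(may_process("second", "first", message))  # False
print(may_process("first", "first", message))   # True
```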
{
    "activated": true,
    "utterance": "Moderator speaking" (Optional)
}
The above message may be sent to provide an indication to other participants that the moderator manager is engaged. In one embodiment, the above message may indicate that input from other participants may be processed by the moderator manager. The moderator manager may extract intents, collect information and contexts, and prepare to execute the commands related to intents.
{
    "utterance": "At what time should we have the call?",
    "gather_id": "unique identifier" (Optional)
}
The message above contains an utterance parameter that may be used by the moderator manager to pass a custom textual message, with information and context requests, to one or more participants. In one embodiment, the custom textual message may be used as text-to-speech input by one or more participants to produce an audible information and context request. The message may also contain an optional "gather_id" parameter, which the moderator manager may use to associate participant responses with the information-gathering requests associated with intents.
{
    "participant_emails": ["email1@domain1.com", "email2@domain2.com"],
    "date_range": [1627000000, 1628000000] (Optional)
}
The message may contain API-specific parameters. For example, it may contain a "participant_emails" parameter containing a list of call participant emails. The message may optionally contain a "date_range" parameter (e.g., when the moderator manager has gathered information that the call to be scheduled is to happen during a specific week, such as "next week"), whose values may be Unix timestamps.
[
    {
        "participant_email": "email2@domain2.com",
        "scheduled_events": [
            {
                "start": 1627100000,
                "end": 1627200000
            }
        ]
    }
]
The response message above may contain a list with zero or more entries, one for each requested participant email. Each participant entry may contain start and end times, in Unix timestamp format, of future scheduled events. If the "date_range" parameter is specified in the preceding request message, then the response message may contain the events within that time range.
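As a hypothetical illustration, a moderator manager receiving the response above might check a proposed call slot against each participant's scheduled events (the helper below is an assumption, not part of the patent):

```python
# The response message described above.
response = [
    {"participant_email": "email2@domain2.com",
     "scheduled_events": [{"start": 1627100000, "end": 1627200000}]},
]

def slot_is_free(start, end, entries):
    for entry in entries:
        for ev in entry["scheduled_events"]:
            # Two intervals overlap unless one ends before the other begins.
            if start < ev["end"] and ev["start"] < end:
                return False
    return True

print(slot_is_free(1627150000, 1627160000, response))  # False (overlaps)
print(slot_is_free(1627250000, 1627260000, response))  # True
```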
{
    "state": {
        "key": "value" (zero or more, where "value" can be any serializable data type)
    },
    "context": {
        "key": "value" (zero or more, where "value" can be any serializable data type)
    }
}
Claims (19)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/402,228 US12328347B2 (en) | 2021-08-13 | 2021-08-13 | Synthetic moderator |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US17/402,228 US12328347B2 (en) | 2021-08-13 | 2021-08-13 | Synthetic moderator |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20230059158A1 US20230059158A1 (en) | 2023-02-23 |
| US12328347B2 true US12328347B2 (en) | 2025-06-10 |
Family
ID=85228416
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/402,228 Active US12328347B2 (en) | 2021-08-13 | 2021-08-13 | Synthetic moderator |
Country Status (1)
| Country | Link |
|---|---|
| US (1) | US12328347B2 (en) |
Citations (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030235279A1 (en) * | 2002-03-27 | 2003-12-25 | Morgan Richomme | Dynamic web conference monitoring through a streaming mechanism |
| US20050033807A1 (en) | 2003-06-23 | 2005-02-10 | Lowrance John D. | Method and apparatus for facilitating computer-supported collaborative work sessions |
| US20110072362A1 (en) * | 2009-09-22 | 2011-03-24 | International Business Machines Corporation | Meeting Agenda Management |
| US20130212287A1 (en) * | 2010-12-03 | 2013-08-15 | Siemens Enterprise Communications, Inc. | Method and Apparatus for Controlling Sessions From One or More Devices |
| US20160242095A1 (en) * | 2013-03-14 | 2016-08-18 | Sorenson Communications, Inc. | Methods, devices, and systems for remotely controlling a communication device |
| US20160266864A1 (en) * | 2015-03-10 | 2016-09-15 | Zoho Corporation Private Limited | Methods and apparatus for enhancing electronic presentations |
| US9704128B2 (en) | 2000-09-12 | 2017-07-11 | Sri International | Method and apparatus for iterative computer-mediated collaborative synthesis and analysis |
| US20180131904A1 (en) * | 2013-06-26 | 2018-05-10 | Touchcast LLC | Intelligent virtual assistant system and method |
| US20190189117A1 (en) * | 2017-12-15 | 2019-06-20 | Blue Jeans Network, Inc. | System and methods for in-meeting group assistance using a virtual assistant |
| US20190378076A1 (en) * | 2018-06-06 | 2019-12-12 | International Business Machines Corporation | Meeting Management |
| US20200092519A1 (en) * | 2019-07-25 | 2020-03-19 | Lg Electronics Inc. | Video conference system using artificial intelligence |
| US20200110572A1 (en) * | 2018-10-08 | 2020-04-09 | Nuance Communications, Inc. | System and method for managing a mute button setting for a conference call |
| US10742817B1 (en) * | 2018-09-05 | 2020-08-11 | West Corporation | Conference call notification and setup configuration |
| US20210058264A1 (en) * | 2019-08-23 | 2021-02-25 | Mitel Networks (International) Limited | Advising meeting participants of their contributions based on a graphical representation |
| US11095579B1 (en) * | 2020-05-01 | 2021-08-17 | Yseop Sa | Chatbot with progressive summary generation |
| US20220122603A1 (en) * | 2019-11-01 | 2022-04-21 | Samsung Electronics Co., Ltd. | Method and apparatus for supporting voice agent in which plurality of users participate |
- 2021
- 2021-08-13 US US17/402,228 patent/US12328347B2/en active Active
Patent Citations (16)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9704128B2 (en) | 2000-09-12 | 2017-07-11 | Sri International | Method and apparatus for iterative computer-mediated collaborative synthesis and analysis |
| US20030235279A1 (en) * | 2002-03-27 | 2003-12-25 | Morgan Richomme | Dynamic web conference monitoring through a streaming mechanism |
| US20050033807A1 (en) | 2003-06-23 | 2005-02-10 | Lowrance John D. | Method and apparatus for facilitating computer-supported collaborative work sessions |
| US20110072362A1 (en) * | 2009-09-22 | 2011-03-24 | International Business Machines Corporation | Meeting Agenda Management |
| US20130212287A1 (en) * | 2010-12-03 | 2013-08-15 | Siemens Enterprise Communications, Inc. | Method and Apparatus for Controlling Sessions From One or More Devices |
| US20160242095A1 (en) * | 2013-03-14 | 2016-08-18 | Sorenson Communications, Inc. | Methods, devices, and systems for remotely controlling a communication device |
| US20180131904A1 (en) * | 2013-06-26 | 2018-05-10 | Touchcast LLC | Intelligent virtual assistant system and method |
| US20160266864A1 (en) * | 2015-03-10 | 2016-09-15 | Zoho Corporation Private Limited | Methods and apparatus for enhancing electronic presentations |
| US20190189117A1 (en) * | 2017-12-15 | 2019-06-20 | Blue Jeans Network, Inc. | System and methods for in-meeting group assistance using a virtual assistant |
| US20190378076A1 (en) * | 2018-06-06 | 2019-12-12 | International Business Machines Corporation | Meeting Management |
| US10742817B1 (en) * | 2018-09-05 | 2020-08-11 | West Corporation | Conference call notification and setup configuration |
| US20200110572A1 (en) * | 2018-10-08 | 2020-04-09 | Nuance Communications, Inc. | System and method for managing a mute button setting for a conference call |
| US20200092519A1 (en) * | 2019-07-25 | 2020-03-19 | Lg Electronics Inc. | Video conference system using artificial intelligence |
| US20210058264A1 (en) * | 2019-08-23 | 2021-02-25 | Mitel Networks (International) Limited | Advising meeting participants of their contributions based on a graphical representation |
| US20220122603A1 (en) * | 2019-11-01 | 2022-04-21 | Samsung Electronics Co., Ltd. | Method and apparatus for supporting voice agent in which plurality of users participate |
| US11095579B1 (en) * | 2020-05-01 | 2021-08-17 | Yseop Sa | Chatbot with progressive summary generation |
Also Published As
| Publication number | Publication date |
|---|---|
| US20230059158A1 (en) | 2023-02-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11990124B2 (en) | Language model prediction of API call invocations and verbal responses | |
| US8768705B2 (en) | Automated and enhanced note taking for online collaborative computing sessions | |
| US20210157834A1 (en) | Diagnostics capabilities for customer contact services | |
| US20210158234A1 (en) | Customer contact service with real-time agent assistance | |
| US20210158805A1 (en) | Systems and methods to analyze customer contacts | |
| US20210158813A1 (en) | Enrichment of customer contact data | |
| CN116368785B (en) | Intelligent query buffering mechanism | |
| US20210158235A1 (en) | Customer contact service with real-time supervisor assistance | |
| US11671467B2 (en) | Automated session participation on behalf of absent participants | |
| CN111033492A (en) | Provide command bundle suggestions for automation assistants | |
| CN109545205B (en) | Context-based virtual assistant implementation | |
| US20160006776A1 (en) | Systems and methods for enhanced conference session interaction | |
| KR20210008521A (en) | Dynamic and/or context-specific hot words to invoke automated assistants | |
| US20180211223A1 (en) | Data Processing System with Machine Learning Engine to Provide Automated Collaboration Assistance Functions | |
| EP4393144B1 (en) | Determination and visual display of spoken menus for calls | |
| US12079629B2 (en) | Score prediction using hierarchical attention | |
| JP7297797B2 (en) | Method and apparatus for managing holds | |
| US20230169272A1 (en) | Communication framework for automated content generation and adaptive delivery | |
| US10297255B2 (en) | Data processing system with machine learning engine to provide automated collaboration assistance functions | |
| JP2020518905A (en) | Initializing an automated conversation with an agent via selectable graphic elements | |
| US10972297B2 (en) | Data processing system with machine learning engine to provide automated collaboration assistance functions | |
| US8994774B2 (en) | Providing information to user during video conference | |
| KR101618084B1 (en) | Method and apparatus for managing minutes | |
| US20250260770A1 (en) | Enhanced control of presenter queue notifications | |
| US11086592B1 (en) | Distribution of audio recording for social networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| AS | Assignment |
Owner name: MASS LUMINOSITY, INC., TEXAS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MUNOZ, ANGEL;ATROSHENKO, TEODOR;REEL/FRAME:065356/0529 Effective date: 20210813 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: FINAL REJECTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE AFTER FINAL ACTION FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |