CN112204512A - Method, apparatus and computer readable medium for desktop sharing over web socket connections in networked collaborative workspaces


Info

Publication number: CN112204512A
Application number: CN201980036854.6A
Authority: CN (China)
Prior art keywords: computing device, local, desktop, local computing, stream object
Other languages: Chinese (zh)
Inventors: Marco Valerio Masi, Cristiano Fumagalli
Current assignee: Limag Ltd
Original assignee: Limag Ltd
Priority claimed from: US 15/995,878 (US11412012B2)
Application filed by: Limag Ltd
Legal status: Pending

Classifications

    • G06F 9/543 — User-generated data transfer, e.g. clipboards, dynamic data exchange [DDE], object linking and embedding [OLE]
    • H04L 67/02 — Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H04L 67/10 — Protocols in which an application is distributed across nodes in the network

Abstract

A method, system and computer-readable medium for desktop sharing over a web socket connection in a networked collaborative workspace, comprising: transmitting, over a web socket connection, a representation of a collaborative workspace hosted on a server and accessible to a plurality of participants on a plurality of computing devices; receiving a request to share at least a portion of a local desktop of the local computing device within the collaborative workspace and a selection of a region within the representation of the collaborative workspace; generating a stream object configured to output a video stream of at least a portion of the local desktop of the local computing device; and sending one or more commands to the server over the web socket connection, the one or more commands including the stream object and information corresponding to the selected region and being configured to cause the server to insert the stream object into the collaborative workspace at the selected region.

Description

Method, apparatus and computer readable medium for desktop sharing over web socket connections in networked collaborative workspaces
Background
Operating systems, and applications executing within the operating system, often utilize external hardware devices to allow a user to provide input to a program and to provide output to the user. Common examples of external hardware devices include a keyboard, a computer mouse, a microphone, and an external speaker. These external hardware devices interface with the operating system through the use of drivers, which are specialized software programs configured to interface between the hardware commands used by a particular hardware device and the operating system.
Applications are sometimes designed to interface with certain hardware devices. For example, a speech-to-text word processing application may be designed to interface with an audio headset that includes a microphone. In this case, the application must be specially configured to receive voice commands, perform voice recognition, convert recognized words into textual content, and output the textual content into a document. This functionality is typically embodied in the application's Application Program Interface (API), which is a defined set of communication methods between various software components. In the example of a speech recognition application, the API may include an interface between the application and the software in the driver that is responsible for interfacing with the hardware device (the microphone) itself.
One problem with existing software that utilizes dedicated hardware devices is that the application or operating system software itself must be customized and specifically designed to use the hardware device. This customization means that the hardware device cannot be used beyond the scope defined for it by the application and cannot be used in environments outside the particular application for which it was designed. For example, a user of a speech-to-text word processing application cannot use voice commands to manipulate other applications or other components within the operating system unless those other applications or the operating system are specifically designed to receive voice commands through a microphone.
FIG. 1 shows an example of an existing architecture of a system for user input with coupled hardware devices. Operating system 100A of FIG. 1 includes executing applications 101A and 102A, each of which has its own API, 101B and 102B, respectively. The operating system 100A also has its own API 100B, as well as dedicated drivers 100C, 101C, and 102C configured to interface with hardware devices 100D, 101D, and 102D.
As shown in FIG. 1, application API 101B is configured to interface with driver 101C, and driver 101C interfaces with hardware device 101D. Similarly, the application API 102B is configured to interface with a driver 102C, and the driver 102C interfaces with the hardware device 102D. At the operating system level, operating system API 100B is configured to interface with driver 100C, and driver 100C interfaces with hardware device 100D.
The architecture of the system shown in FIG. 1 limits the ability of a user to utilize hardware devices outside of the context of certain applications or operating systems. For example, the user cannot provide input to the application 102A using the hardware device 101D, and cannot provide input to the application 101A or the operating system 100A using the hardware device 102D.
Accordingly, there is a need for an improved hardware-software interface that allows hardware devices to be utilized in multiple software environments.
Drawings
FIG. 1 shows an example of an existing architecture of a system for user input with coupled hardware devices.
FIG. 2 illustrates the architecture of a system utilizing a generic hardware-software interface in accordance with exemplary embodiments.
FIG. 3 shows a flowchart for implementing a generic hardware-software interface in accordance with an example embodiment.
FIG. 4 shows a flowchart for determining user input based at least in part on information captured by one or more hardware devices communicatively coupled to a system when the information captured by the one or more hardware devices includes one or more images, according to an example embodiment.
Fig. 5A illustrates an example of object recognition according to an exemplary embodiment.
Fig. 5B illustrates an example of determining input location coordinates according to an exemplary embodiment.
FIG. 6 illustrates a flow diagram for determining user input based at least in part on information captured by one or more hardware devices communicatively coupled to the system when the captured information is voice information, according to an example embodiment.
Fig. 7 illustrates a tool interface that may be part of a transparent layer according to an example embodiment.
FIG. 8 shows an example of a stylus that may be part of a system according to an example embodiment.
FIG. 9 illustrates a flowchart for identifying context corresponding to user input, according to an example embodiment.
FIG. 10 illustrates an example of using input coordinates to determine context according to an exemplary embodiment.
FIG. 11 illustrates a flowchart for converting user input into transparent layer commands, according to an example embodiment.
Fig. 12A illustrates an example of receiving input coordinates when switching a selection mode according to an exemplary embodiment.
Fig. 12B illustrates an example of receiving input coordinates when switching the pointing mode according to an exemplary embodiment.
Fig. 12C illustrates an example of receiving input coordinates when the drawing mode is switched according to an exemplary embodiment.
FIG. 13 illustrates an example of a transparent layer command determined based on one or more words identified in input speech data according to an example embodiment.
FIG. 14 illustrates another example of a transparent layer command determined based on one or more words identified in the input speech data according to an example embodiment.
FIG. 15 illustrates a flow diagram for executing one or more transparent layer commands on a transparent layer in accordance with an exemplary embodiment.
FIG. 16 shows an example interface for adding a new command corresponding to a user input, according to an example embodiment.
FIG. 17 illustrates various components and options of a drawing interface and drawing schema according to an exemplary embodiment.
FIG. 18 illustrates a calibration and setup interface for a camera hardware device that recognizes objects and allows a user to provide input using touch and gestures, according to an example embodiment.
FIG. 19 illustrates a general settings interface that allows a user to customize various aspects of the interface, switch input modes, and make other changes according to an exemplary embodiment.
FIG. 20 illustrates a flowchart for desktop sharing through a web socket connection in a networked collaborative workspace, according to an exemplary embodiment.
FIG. 21A illustrates the network architecture used to host and transmit the collaborative workspace in accordance with an exemplary embodiment.
FIG. 21B illustrates a process for propagating edits to collaborative workspaces within a network in accordance with an exemplary embodiment.
FIG. 22 illustrates multiple representations of collaborative workspaces in accordance with an exemplary embodiment.
FIG. 23A illustrates an example of a user interface (desktop) of a local computing device prior to receiving a request and a region selection, according to an example embodiment.
FIG. 23B illustrates an example of a user interface (desktop) of the local computing device after receiving the request and before selecting the region according to an example embodiment.
Fig. 24A to 24C illustrate an example of a source selection process according to an exemplary embodiment.
FIG. 25 illustrates a flow diagram for generating a stream object configured to output a video stream of at least a portion of a local desktop of a local computing device, according to an example embodiment.
FIG. 26 illustrates a process of sending commands and propagating stream objects from a local computing device in accordance with an illustrative embodiment.
FIG. 27 shows an example of an interface of a local computing device after a server embeds a stream object into a collaborative workspace, according to an example embodiment.
FIG. 28 illustrates a flowchart for controlling a desktop or a portion of a desktop via an embedded stream object from a local computing device, according to an example embodiment.
FIGS. 29A-29C illustrate an example of controlling a desktop or a portion of a desktop via an embedded stream object from a local computing device in accordance with an illustrative embodiment.
FIG. 30 shows a flowchart for controlling a desktop or a portion of a desktop via an embedded stream object from a remote computing device, according to an example embodiment.
FIGS. 31A-31C illustrate an example of controlling a desktop or a portion of a desktop via an embedded stream object from a remote computing device according to an example embodiment.
FIG. 32 illustrates an exemplary computing environment configured to execute the disclosed methods.
Detailed Description
Although the methods, devices, and computer-readable media are described herein by way of example and embodiments, those skilled in the art will recognize that the methods, devices, and computer-readable media for implementing a general hardware-software interface are not limited to the embodiments or figures described. It should be understood that the drawings and description are not intended to be limited to the particular forms disclosed. On the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the appended claims. Any headings used herein are for organizational purposes only and are not meant to limit the scope of the description or the claims. As used herein, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words "include", "including" and "comprises" mean including, but not limited to.
Applicants have invented a method, apparatus, and computer-readable medium that addresses the problems associated with previous hardware-software interfaces for hardware devices. In particular, applicants have developed a generic hardware-software interface that allows users to utilize communicatively coupled hardware devices in a variety of software environments. Through the use of a dedicated virtual driver and a corresponding transparent layer, described in more detail below, the disclosed implementations eliminate the need for applications or operating systems to be customized to interface with a particular hardware device.
FIG. 2 illustrates the architecture of a system utilizing a generic hardware-software interface in accordance with an exemplary embodiment. As shown in FIG. 2, operating system 200A includes a transparent layer 203 that communicates with a virtual driver 204. As will be explained in more detail below, the transparent layer 203 is an API configured to interface between the virtual driver and the operating system and/or applications executing on the operating system. In this example, transparent layer 203 connects virtual driver 204 to API 201B of application 201A, API 202B of application 202A, and operating system API 200B of operating system 200A.
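As a rough illustration of this arrangement, the following TypeScript sketch models the transparent layer as a dispatcher sitting between a virtual driver and the registered APIs of the operating system and its applications. The names and interfaces here (TransparentLayer, VirtualDriver, NativeApi) are illustrative assumptions only, not the patent's implementation.

```typescript
// Illustrative sketch only: models the relationships described above.
// All names (NativeApi, TransparentLayer, VirtualDriver) are assumptions.

interface NativeApi {
  // Executes a command in its own context (an application or the OS).
  execute(command: string): void;
}

class TransparentLayer {
  private contexts = new Map<string, NativeApi>();

  // Applications and the operating system register their APIs here,
  // mirroring how transparent layer 203 connects APIs 200B, 201B and 202B.
  register(contextName: string, api: NativeApi): void {
    this.contexts.set(contextName, api);
  }

  // Called once user input has been converted into a transparent layer
  // command for an identified context.
  dispatch(contextName: string, nativeCommand: string): void {
    const api = this.contexts.get(contextName);
    if (!api) throw new Error(`Unknown context: ${contextName}`);
    api.execute(nativeCommand);
  }
}

class VirtualDriver {
  constructor(private layer: TransparentLayer) {}

  // Raw captured information (images, sound, coordinates) arrives here
  // and is forwarded as a (context, command) pair.
  onCapturedInput(contextName: string, nativeCommand: string): void {
    this.layer.dispatch(contextName, nativeCommand);
  }
}
```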
The transparent layer 203 may be part of a software process running on an operating system and may have its own User Interface (UI) elements, including a transparent UI superimposed on the underlying user interface and/or visual UI elements with which the user can interact.
Virtual driver 204 is configured to emulate drivers 205A and 205B, which interface with hardware devices 206A and 206B, respectively. The virtual driver may receive user input indicating which driver it should emulate, for example in the form of a voice command, a selection made on a user interface, and/or a gesture made by the user in front of a coupled web camera. For example, each connected hardware device may operate in a "listening" mode, and each emulated driver in virtual driver 204 may be configured to detect an initialization signal that serves as a signal to switch the virtual driver to a particular emulation mode. For example, a user stating "start voice command" may activate the driver corresponding to a microphone to receive a new voice command. Similarly, a user giving a certain gesture may activate the driver corresponding to a web camera to receive gesture input or touch input.
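A minimal sketch of this mode switching, assuming simple string-valued initialization signals; the signal phrases and mode names below are illustrative assumptions:

```typescript
// Sketch under assumptions: mode names and signal strings are illustrative.
type EmulationMode = "idle" | "voice" | "gesture" | "touch";

class EmulationController {
  private mode: EmulationMode = "idle";

  // Every connected device stays in "listening" mode; a detected
  // initialization signal switches the virtual driver to an emulation mode.
  onInitializationSignal(signal: string): EmulationMode {
    if (signal === "start voice command") this.mode = "voice";
    else if (signal === "activation gesture") this.mode = "gesture";
    else if (signal === "stylus detected") this.mode = "touch";
    return this.mode;
  }

  currentMode(): EmulationMode {
    return this.mode;
  }
}

// Example: a recognized spoken phrase activates the microphone driver.
const controller = new EmulationController();
controller.onInitializationSignal("start voice command"); // -> "voice"
```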
The virtual driver may also be configured to interface with a native driver (e.g., native driver 205C), which itself communicates with the hardware device 206C. In one example, the hardware device 206C may be a standard input device, such as a keyboard or mouse, that is natively supported by the operating system.
The system shown in FIG. 2 allows for a generic hardware-software interface to be implemented, where a user may use any coupled hardware device in various environments, such as a specific application or operating system, without the need to customize the application or operating system to interface with the hardware device.
For example, the hardware device 206A may capture information that is then received by the virtual driver 204 emulating driver 205A. The virtual driver 204 may determine user input based on the captured information. For example, if the information is a series of images of the user moving his hand, the virtual driver may determine that the user has performed a gesture.
Based on the identified context (e.g., a particular application or operating system), the user input may be converted into a transparent layer command and sent to the transparent layer 203 for execution. The transparent layer commands may include native commands in the identified context. For example, if the identified context is application 201A, the native commands will be in a format compatible with application API 201B of application 201A. Execution of the transparent layer command may then be configured to cause execution of one or more native commands in the identified context. This is accomplished through the transparent layer 203 interfacing with each API of the applications executing on the operating system 200A as well as the operating system APIs 200B. For example, if the native command is an operating system command, such as a command to launch a new program, the transparent layer 203 may provide the native command to the operating system API 200B for execution.
As shown in fig. 2, there is bi-directional communication between all of the components shown. This means, for example, that executing a transparent layer command in the transparent layer 203 may result in sending information to the virtual driver 204 and to one of the connected hardware devices. For example, after a voice command is recognized as an input, converted to a transparent layer command including a native command, and executed by the transparent layer (resulting in execution of the native command in the recognized context), a signal may be sent from the transparent layer to the speaker (via the virtual driver) to send an acoustic output "command received".
Of course, the architecture shown in FIG. 2 is for illustrative purposes only, and it should be understood that the number of applications executed, the number and type of hardware devices connected, the number of drivers, and the emulation drivers may vary.
FIG. 3 shows a flowchart for implementing a generic hardware-software interface in accordance with an example embodiment.
At step 301, a user input is determined based at least in part on information captured by one or more hardware devices communicatively coupled to the system. As used herein, a system may refer to one or more computing devices that perform the steps of the method, an apparatus comprising one or more processors and one or more memories that perform the steps of the method, or any other computing system.
The user input may be determined by a virtual driver executing on the system. As previously described, the virtual driver may operate in an emulation mode that emulates other hardware drivers, thereby receiving captured information from the hardware device, or may alternatively receive captured information from one or more other hardware drivers configured to interface with a particular hardware device.
Various hardware devices may be used, such as cameras, video cameras, microphones, headsets with two-way communication, mice, touch pads, track pads, controllers, game pads, joysticks, touch screens, motion capture devices including accelerometers and/or tilt sensors, remote controls, touch pens or any combination of these devices. Of course, this list of hardware devices is provided as an example only, and any hardware device that can be used to detect voice, image, video, or touch information may be utilized.
The communicative coupling between the hardware devices and the system may take a variety of forms. For example, the hardware device may communicate with the system via a wireless network, a bluetooth protocol, radio frequency, infrared signals, and/or via a physical connection (e.g., a Universal Serial Bus (USB) connection). The communication may also include wireless and wired communication. For example, a hardware device may include two components, one of which wirelessly (e.g., via bluetooth) sends signals to a second component, which itself is connected to the system via a wired connection (e.g., USB). Various communication techniques may be utilized in accordance with the systems described herein, and these examples are not intended to be limiting.
The information captured by the one or more hardware devices may be any type of information, such as image information including one or more images, video frames, sound information, and/or touch information. The captured information may be in any suitable format, such as a .wav or .mp3 file for sound information, a .jpeg file for images, digital coordinates for touch information, and so forth.
The techniques described herein may allow any display device to effectively function as a "touch" screen device in any context, even if the display device does not include any hardware to detect touch signals or touch-based gestures. This will be described in more detail below and may be achieved by analyzing images captured by a camera or camcorder.
FIG. 4 illustrates a flow diagram for determining user input based at least in part on information captured by one or more hardware devices communicatively coupled to the system when the information captured by the one or more hardware devices includes one or more images.
In step 401, one or more images are received. These images may be captured by a hardware device, such as a camera or camcorder, and may be received by a virtual driver, as previously described.
In step 402, objects in one or more images are identified. The object may be, for example, a hand, a finger, or another body part of the user. The object may also be a dedicated device, such as a stylus or pen, or a dedicated hardware device, such as a motion-tracking stylus/remote control, communicatively coupled to the system and containing an accelerometer and/or tilt sensor. Object recognition may be performed by the virtual driver and may be based on earlier training, for example through a calibration routine run using the object.
Fig. 5A illustrates an example of object recognition according to an exemplary embodiment. As shown in fig. 5A, image 501 includes a user's hand that has been identified as object 502. The recognition algorithm may of course be configured to recognize different objects, such as fingers.
Returning to FIG. 4, in step 403, one or more directions and one or more locations of the identified objects are determined. This can be achieved in a number of ways. If the object is not a hardware device but a body part, such as a hand or finger, the object may be mapped into a three-dimensional coordinate system using the known position of the camera as a reference point to determine the three-dimensional coordinates of the object and various angles with respect to the X, Y and Z axes. If the object is a hardware device and includes motion tracking hardware, such as an accelerometer and/or tilt sensor, the image information may be used in conjunction with information indicated by the accelerometer and/or tilt sensor to determine the position and orientation of the object.
At step 404, a user input is determined based at least in part on the one or more orientations and the one or more locations of the identified object. This may include determining location coordinates on a transparent User Interface (UI) of the transparent layer based at least in part on the one or more orientations and the one or more locations. The transparent UI is part of a transparent layer and is superimposed on the underlying UI corresponding to the operating system and/or any applications executing on the operating system.
FIG. 5B shows an example of this step when the object is a user's finger. As shown in fig. 5B, the display device 503 includes an underlying UI 506 and a transparent UI 507 superimposed on the underlying UI 506. The transparent UI 507 is shown with dot shading for clarity, but it should be understood that in practice the transparent UI is a transparent layer that is not visible to the user. Additionally, the transparent UI 507 is shown to be slightly smaller than the underlying UI 506, but it should be understood that in practice the transparent UI will cover the same screen area as the underlying UI.
As shown in FIG. 5B, the position and orientation information of the object (the user's finger) is used to project a line onto the plane of the display device 503 and determine an intersection point 505. Image information captured by the camera 504 and the known location of the display device 503 underneath the camera may be used to assist in the projection. As shown in fig. 5B, the user input is determined as input coordinates at the intersection point 505.
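One plausible way to compute such an intersection point is an ordinary ray-plane intersection, assuming the display lies in the plane z = 0 of a camera-calibrated coordinate system. The sketch below is a geometric illustration under that assumption, not the patent's algorithm; all names and numbers are placeholders.

```typescript
// Geometric sketch only: assumes the display lies in the plane z = 0 of a
// camera-calibrated coordinate system.
interface Vec3 { x: number; y: number; z: number; }

// Intersect the ray (origin + t * direction) with the display plane z = 0.
function intersectDisplayPlane(origin: Vec3, direction: Vec3): { x: number; y: number } | null {
  if (Math.abs(direction.z) < 1e-9) return null; // ray parallel to the screen
  const t = -origin.z / direction.z;
  if (t < 0) return null; // object pointing away from the display
  return { x: origin.x + t * direction.x, y: origin.y + t * direction.y };
}

// Map plane coordinates (e.g., in millimetres) to pixel input coordinates.
function toPixels(p: { x: number; y: number }, mmPerPixel: number): { px: number; py: number } {
  return { px: Math.round(p.x / mmPerPixel), py: Math.round(p.y / mmPerPixel) };
}

// Example: a fingertip 400 mm in front of the screen, angled toward it.
const hit = intersectDisplayPlane({ x: 120, y: 80, z: 400 }, { x: 0.1, y: -0.05, z: -1 });
if (hit) console.log(toPixels(hit, 0.25)); // an intersection point such as 505
```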
As will be discussed further below, the actual transparent layer commands generated based on the input may be based on user settings and/or the identified context. For example, the command may be a touch command indicating that an object at the coordinates of point 505 should be selected and/or opened. The command may also be a point command indicating the coordinates at which a pointer (e.g., a mouse pointer) should be moved to point 505. Additionally, the command may be an edit command that modifies a graphical output (e.g., an annotation interface or a drawing element) at the location.
Although FIG. 5B shows the recognized object 502 at a certain distance from the display device 503, a touch input may be detected regardless of the distance. For example, if the user were to physically touch the display device 503, the techniques described above would still determine the input coordinates. In this case, the projected line between the object 502 and the intersection point would simply be shorter.
Of course, touch input is not the only type of user input that can be determined from a captured image. The step of determining a user input based at least in part on the one or more directions and the one or more locations of the recognized object may comprise determining a gesture input. In particular, the location and orientation of the recognized object on the plurality of images may be analyzed to determine a corresponding gesture, such as a swipe gesture, a zoom-out gesture, and/or any known or customized gesture. The user may calibrate the virtual driver to recognize custom gestures that are mapped to specific environments and commands within those environments. For example, a user may create a custom gesture that maps to an operating system context and results in the execution of native operating system commands that launch a particular application.
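As a hedged illustration of this analysis, a swipe could be detected from the tracked positions of the recognized object across several frames; the thresholds and gesture names below are arbitrary assumptions, not values from the patent.

```typescript
// Illustrative classifier: decides whether a sequence of tracked object
// positions (one per captured frame) forms a horizontal swipe gesture.
interface Point2D { x: number; y: number; }

function detectSwipe(track: Point2D[], minDistance = 200, maxDrift = 60): "swipe-left" | "swipe-right" | null {
  if (track.length < 2) return null;
  const dx = track[track.length - 1].x - track[0].x;
  const dy = track[track.length - 1].y - track[0].y;
  if (Math.abs(dy) > maxDrift) return null;    // too much vertical movement
  if (Math.abs(dx) < minDistance) return null; // not far enough to count
  return dx > 0 ? "swipe-right" : "swipe-left";
}

// Example: positions of a recognized hand across five frames.
console.log(detectSwipe([
  { x: 100, y: 300 }, { x: 180, y: 305 }, { x: 260, y: 310 },
  { x: 340, y: 308 }, { x: 420, y: 302 },
])); // -> "swipe-right"
```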
As previously described, the information captured by the one or more hardware devices in step 301 of fig. 3 may also include voice information captured by a microphone. FIG. 6 illustrates a flow diagram for determining user input based at least in part on information captured by one or more hardware devices communicatively coupled to the system when the captured information is voice information. As described below, speech recognition is performed on the voice information to identify one or more words corresponding to the user input.
At step 601, sound data is received. As described above, sound data may be captured by a hardware device, such as a microphone, and received by a virtual driver. At step 602, the received voice data may be compared to a voice dictionary. The voice dictionary may include voice signatures of one or more recognized words, such as command words or command modifiers. At step 603, one or more words in the sound data are identified as user input based on the comparison. The identified word or words may then be converted to a transparent layer command and passed to the transparent layer.
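A minimal sketch of the dictionary lookup in steps 602-603, assuming an upstream recognizer has already produced a transcript from the sound data; the phrases and command words are placeholders.

```typescript
// Sketch under assumptions: an external recognizer has already turned the
// captured sound data into a transcript; this only shows the dictionary
// lookup that maps recognized words to user input.
const voiceDictionary = new Map<string, string>([
  ["open email", "LAUNCH_EMAIL"],
  ["whiteboard", "OPEN_WHITEBOARD"],
  ["blank page", "OPEN_WHITEBOARD"],
]);

function matchVoiceInput(transcript: string): string | null {
  const normalized = transcript.trim().toLowerCase();
  for (const [phrase, commandWord] of voiceDictionary) {
    if (normalized.includes(phrase)) return commandWord; // recognized word(s)
  }
  return null; // no command word or modifier found
}

console.log(matchVoiceInput("please open email")); // -> "LAUNCH_EMAIL"
```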
As previously described, the driver emulated by the virtual driver, the expected type of user input, and the command generated based on the user input may all be determined based at least in part on one or more settings or previous user inputs.
FIG. 7 shows a tool interface 701, which may also be part of the transparent layer. Unlike the transparent UI, the tool interface 701 is visible to the user and can be used to select among different options that change the emulation mode of the virtual driver, alter the native commands generated based on user input, or perform additional functions.
Button 701A allows the user to select the type of drawing tool used to graphically modify the user interface when the user enters coordinates (e.g., coordinates adjusted according to the user's hand or stylus/remote control touching the screen). The various drawing implements may include different brushes, colors, pens, highlighters, etc. These tools may result in graphical changes of different styles, thicknesses, colors, etc.
Button 701B allows the user to switch between selection, pointing or drawing modes when input coordinates are received as user input. In the selection mode, the input coordinates may be treated as a "touch" and result in the object being selected or opened at the input coordinates. In the pointing mode, the coordinates may be treated as a pointer (e.g., mouse pointer) location, effectively allowing the user to simulate a mouse. In the drawing mode, the coordinates may be processed as a location that changes the graphical output of the user interface to display the appearance of a drawing or writing on the user interface. The nature of the change may depend on the drawing tool selected, as discussed with reference to button 701A. Button 701B may also alert the virtual driver to the expected image input and/or motion input (if a motion tracking device is used) and emulate the appropriate driver accordingly.
Button 701C alerts the virtual driver that a voice command is desired. This may cause the virtual driver to emulate a driver corresponding to the coupled microphone to receive and parse the voice input, as described with respect to fig. 6.
Button 701D opens a launcher application, which may be part of a transparent layer, and may be used to launch an application within an operating system or to launch a particular command within an application. The launcher may also be used to customize options in the transparent layer, such as to customize voice commands, to customize gestures, to customize native commands for applications associated with user input and/or to calibrate hardware devices and user input (e.g., voice calibration, motion capture device calibration, and/or object recognition calibration).
Button 701E may be used to capture a screenshot of the user interface and export the screenshot as an image. This may be used in conjunction with the drawing mode of button 701B and the drawing tool of 701A. After the user has tagged a particular user interface, the tagged version may be exported as an image.
Button 701F also allows for image editing and may be used to change the color of an image or aspects of an image created by a user on a user interface. Similar to the drawing mode of the button 701B, the button changes the nature of the image change at the input coordinates.
The button 701G cancels drawing on the user interface. Selecting this button may remove all graphical indicia on the user interface and reset the underlying UI to the state it was in before the user created the drawing.
Button 701H may be used to launch a whiteboard application that allows a user to create drawings or writing on a virtual whiteboard using the drawing mode.
Button 701I may be used to add text annotations to objects, such as those shown in the operating system UI or application UI. The text annotations may be interpreted from a speech signal or typed in by the user using a keyboard.
The button 701J may be used to turn the tool interface 701 on or off. When closed, the tool interface may be minimized or completely removed from the underlying user interface.
As previously mentioned, a stylus or remote hardware device may be used with the present system, in conjunction with other hardware devices (e.g., a camera or camcorder). FIG. 8 shows an example of a stylus 801 that may be used with the system. The stylus 801 may communicate with a hardware receiver 802, for example via Bluetooth. The hardware receiver may be connected to the computer system, for example via USB 802B, and signals from the stylus, passed to the computer system via the hardware receiver, may be used to control and interact with a menu 803, which is similar to the tool interface shown in FIG. 7.
As shown in FIG. 8, the stylus 801 may include physical buttons 801A. These physical buttons 801A can be used to turn on the stylus, navigate the menu 803, and make selections. Additionally, the stylus 801 may include a unique tip 801B that is captured in images by a camera and recognized by the virtual driver. This may allow the stylus 801 to be used for drawing and editing while in drawing mode. Stylus 801 may also include motion tracking hardware, such as an accelerometer and/or tilt sensor, to aid in position detection when the stylus is used to provide input coordinates or gestures. Additionally, the hardware receiver 802 may include a calibration button 802A that, when pressed, may launch a calibration utility in the user interface. This allows the stylus to be calibrated.
Returning to FIG. 3, at step 302, a context corresponding to the user input is identified. The identified context includes one of an operating system or an application executing on the operating system.
FIG. 9 illustrates a flowchart for identifying context corresponding to user input, according to an example embodiment. As shown in FIG. 9, operating system data 901, application data 902, and user input data 903 may all be used to determine context 904.
Operating system data 901 can include, for example, information about active windows in an operating system. For example, if the active window is a calculator window, the context may be determined to be a calculator application. Similarly, if the active window is a Microsoft Word window, the context may be determined to be a Microsoft Word application. On the other hand, if the active window is a folder, the active context may be determined to be the operating system. The operating system data may also include additional information such as which applications are currently executing, the last application launched, and any other operating system information that may be used to determine context.
The application data 902 may include, for example, information about one or more applications being executed and/or information mapping particular applications to certain types of user input. For example, a first application may be mapped to a speech input such that whenever a speech command is received, the context is automatically determined to be the first application. In another example, a particular gesture may be associated with the second application such that when the gesture is received as input, the second application is started or closed, or some action within the second application is performed.
The user input 903 may also be used to determine context in a variety of ways. As described above, certain types of user input may be mapped to certain applications. In the above example, the speech input is associated with a context of the first application. In addition, the attributes of the user input may also be used to determine context. The gesture or action may be mapped to an application or operating system. Specific words in the voice command may also be mapped to an application or operating system. The input coordinates may also be used to determine context. For example, a window in the user interface at the input coordinate location may be determined, and an application corresponding to the window may be determined as the context.
FIG. 10 illustrates an example of using input coordinates to determine context. As shown in FIG. 10, the display device 1001 is displaying a user interface 1002. Also shown are a camera 1004 and a transparent layer 1003 superimposed on the underlying user interface 1002. The user points to location 1005 in user interface 1002 with stylus 1000. Since location 1005 lies within the application window corresponding to Application 1, Application 1 may be determined to be the context of the user input, as opposed to Application 2, Application 3, or the operating system.
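The following sketch combines the three sources of FIG. 9 in one plausible priority order (explicit input-type mappings, then the window under the input coordinates, then the active window, then the operating system); the ordering and field names are assumptions, not taken from the patent.

```typescript
// Illustrative only: combines the three inputs of FIG. 9 to pick a context.
interface OsData { activeWindowApp: string | null; }          // e.g. "calculator"
interface AppData { inputTypeMap: Record<string, string>; }   // e.g. { voice: "email client" }
interface UserInput { type: "voice" | "gesture" | "coordinates"; windowAppAtCoordinates?: string; }

function identifyContext(os: OsData, apps: AppData, input: UserInput): string {
  // 1. An explicit mapping of this input type to an application wins.
  const mapped = apps.inputTypeMap[input.type];
  if (mapped) return mapped;
  // 2. For coordinate input, use the window under the input coordinates.
  if (input.type === "coordinates" && input.windowAppAtCoordinates) {
    return input.windowAppAtCoordinates;
  }
  // 3. Otherwise fall back to the active window, then to the operating system.
  return os.activeWindowApp ?? "operating system";
}

console.log(identifyContext(
  { activeWindowApp: null },
  { inputTypeMap: {} },
  { type: "coordinates", windowAppAtCoordinates: "application 1" },
)); // -> "application 1", as in FIG. 10
```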
Returning to FIG. 3, at step 303, the user input is converted into one or more transparent layer commands based at least in part on the identified context. As previously described, the transparent layer includes an Application Program Interface (API) configured to interface between the virtual driver and the operating system and/or an application executing on the operating system.
FIG. 11 shows a flow diagram for converting user input into transparent layer commands. As shown in step 1104 of fig. 11, a transparent layer command may be determined based at least in part on the identified context 1102 and the user input 1103. The transparent layer commands may include one or more native commands configured to execute in one or more corresponding contexts. The transparent layer commands may also include response outputs to be sent to the virtual driver and hardware device.
The identified context 1102 may be used to determine which transparent layer command should be mapped to the user input. For example, if the identified context is "operating system," the swipe gesture input may be mapped to a transparent layer command that causes the user interface to scroll through currently open windows within the operating system (by minimizing one open window and maximizing the next open window). Alternatively, if the identified context is a "web browser application," the same swipe gesture input may be mapped to a transparent layer command that causes the web page to scroll.
The user input 1103 also determines the transparent layer command, because user inputs are specifically mapped to certain native commands within one or more contexts, and these native commands are part of the transparent layer command. For example, the voice command "open email" may be mapped to a specific native operating system command to launch the email application Outlook. When a voice input is received that includes the recognized words "open email," this results in the determination of a transparent layer command that includes a native command to launch Outlook.
As shown in fig. 11, the transparent layer command may also be determined based on one or more user settings 1101 and an API library 1104. The API library 1104 may be used to look up native commands corresponding to the identified context and the particular user input. In the example of a swipe gesture and a web browser application context, an API library corresponding to the web browser application may be queried for an appropriate API call to cause scrolling of the web page. Alternatively, the API library 1104 may be omitted and the native commands may be mapped to specific user inputs and identified contexts.
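A minimal sketch of such a lookup, with the API library reduced to a nested table keyed by context and user input; the native calls shown are placeholder strings rather than real API functions.

```typescript
// Sketch: a lookup table standing in for API library 1104. The "calls" are
// illustrative placeholders, not actual operating system or browser APIs.
type NativeCommand = { context: string; call: string };

const apiLibrary: Record<string, Record<string, NativeCommand>> = {
  "web browser application": {
    "swipe-gesture": { context: "web browser application", call: "scrollPage(+1)" },
  },
  "operating system": {
    "swipe-gesture": { context: "operating system", call: "cycleOpenWindows()" },
  },
};

function lookupNativeCommand(context: string, userInput: string): NativeCommand | null {
  return apiLibrary[context]?.[userInput] ?? null;
}

console.log(lookupNativeCommand("web browser application", "swipe-gesture"));
// -> { context: "web browser application", call: "scrollPage(+1)" }
```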
Where the user input is determined to be input coordinates, the transparent layer command is determined based at least in part on the input location coordinates and the identified context. In this case, the transparent layer commands can include at least one native command in the identified context that is configured to perform an action at a corresponding location coordinate in the underlying UI.
When there is more than one possible action mapped to a particular context and user input, the settings 1101 may be used to determine a corresponding transparent layer command. For example, button 701B of fig. 7 allows the user to select between pointing or drawing modes when input coordinates are received as user input. This setting can be used to determine transparent layer commands and, by extension, which native command to execute and which action to execute. In this case, possible native commands may include: a selection command configured to select an object associated with a respective location coordinate in the underlying UI; a pointer command configured to move a pointer to a corresponding location coordinate in the underlying UI; and an image command configured to change display output at the corresponding position coordinates in the underlying UI.
Fig. 12A shows an example of receiving input coordinates when switching the selection mode. As shown in fig. 12A, a user points at a stylus 1200 at an operating system UI 1202 (with a superimposed transparent UI 1203) on a display device 1201. Similar to the previous example, camera 1204 may be used to determine position and orientation information and input coordinates for stylus 1200. Since the selection mode is switched and the stylus 1200 is pointing to a folder 1205 within the operating system UI 1202, the determined transparent layer commands may include native operating system commands to select an object associated with the input coordinates (in this example, the folder 1205). In another example, if the window is located at the input coordinates, this will result in the selection of the entire window.
Fig. 12B shows an example of receiving input coordinates when switching the pointing mode. In this case, the determined transparent layer commands may include native operating system commands to move the mouse pointer 1206 to the location of the input coordinates.
Fig. 12C shows an example of receiving input coordinates when the drawing mode is switched and the user has swept the stylus 1200 over multiple input coordinates. In this case, the determined transparent layer commands may include native operating system commands to change the display output at the location of each input coordinate to produce a user draw line 1207 on the user interface 1202. The modified graphical output generated in the drawing mode may be stored as part of the transparency layer 1203, for example as metadata associated with the path of the input coordinates. The user may then select an option to export the changed display output as an image.
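As these three examples suggest, the mode toggled through button 701B can be thought of as selecting among three command templates for the same input coordinates. The following sketch is illustrative only; the command strings are placeholders.

```typescript
// Illustrative: the toggled mode (button 701B) picks which native command
// is wrapped into the transparent layer command for a coordinate input.
type InputMode = "selection" | "pointing" | "drawing";

function commandForCoordinates(mode: InputMode, x: number, y: number): string {
  switch (mode) {
    case "selection": return `select_object_at(${x}, ${y})`; // touch-like selection
    case "pointing":  return `move_pointer_to(${x}, ${y})`;  // mouse emulation
    case "drawing":   return `draw_at(${x}, ${y})`;          // change display output
  }
}

console.log(commandForCoordinates("pointing", 640, 360)); // -> "move_pointer_to(640, 360)"
```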
Where the user input is identified as a gesture, converting the user input to one or more transparent layer commands based at least in part on the identified context may include determining transparent layer commands based at least in part on the identified gesture and the identified context. The transparent layer commands may include at least one native command in the identified context that is configured to perform an action associated with the identified gesture in the identified context. Such examples are discussed above with respect to a swipe gesture and a web browser application context that result in a native command configured to perform a scrolling action in a web browser.
Where the user input is identified as one or more words (e.g., by using speech recognition), converting the user input into one or more transparent layer commands based at least in part on the identified context may include determining a transparent layer command based at least in part on the identified one or more words and the identified context. The transparent layer command may include at least one native command in the identified context configured to perform an action associated with the identified one or more words in the identified context.
FIG. 13 shows an example of a transparent layer command 1300 determined based on one or more words identified in the input speech data. The recognized words 1301 include one of the phrases "whiteboard" or "blank page." The transparent layer command 1300 also includes a description of the command 1302 and response instructions 1303, which are output instructions that the transparent layer sends to the virtual driver and hardware output device when executing the transparent layer command. Further, the transparent layer command 1300 includes the actual native command 1304 for invoking the whiteboard function.
FIG. 14 shows another example of a transparent layer command 1400 determined based on one or more words identified in the input speech data, according to an example embodiment. In this example, the one or more words are "open email." As shown in FIG. 14, the transparent layer command 1400 includes the native command "outlook.exe," which is an instruction to run a specific executable that launches the Outlook application. The transparent layer command 1400 also includes a voice response "email open," which will be output in response to receiving the voice command.
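Taken together, FIGS. 13 and 14 suggest a command structure along the following lines; the field names are assumptions chosen to mirror the described parts (recognized words, description, response instruction, native command), not the patent's actual data format.

```typescript
// Sketch of one possible command structure; field names are assumptions.
interface TransparentLayerCommand {
  recognizedWords: string[];    // e.g. ["open email"]
  description: string;          // human-readable description of the command
  responseInstruction?: string; // acoustic feedback routed back to the user
  nativeCommand: string;        // what is actually executed in the identified context
  context: string;              // e.g. "operating system" or an application name
}

const openEmail: TransparentLayerCommand = {
  recognizedWords: ["open email"],
  description: "Launch the default email application",
  responseInstruction: "email open",
  nativeCommand: "outlook.exe",
  context: "operating system",
};
```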
Returning to FIG. 3, at step 304, one or more transparent layer commands are executed on the transparent layer. Execution of the one or more transparent layer commands is configured to execute the one or more native commands in the identified context.
FIG. 15 illustrates a flow diagram for executing one or more transparent layer commands on the transparent layer in accordance with an exemplary embodiment. At step 1501, at least one native command within the transparent layer command is identified. For example, the native command may be designated as a native command within the structure of the transparent layer command, allowing it to be identified.
At step 1502, at least one native command is executed in the identified context. This step may include passing the at least one native command to the identified context via the API identified for the context and executing the native command within the identified context. For example, if the identified context is an operating system, native commands may be passed to the operating system for execution via an operating system API. Additionally, if the identified context is an application, the native command may be passed to the application for execution via an application API.
Optionally, at step 1503, a response may be sent to the hardware device. As previously described, the response may be routed from the transparent layer to the virtual driver and to the hardware device.
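A sketch of this execution path, reusing the TransparentLayerCommand shape from the previous sketch; the API registry and response callback are illustrative assumptions.

```typescript
// Illustrative execution path for steps 1501-1503: find the native command,
// run it in the identified context, and optionally route a response back.
function executeTransparentLayerCommand(
  cmd: TransparentLayerCommand,                       // structure sketched above
  apis: Map<string, (nativeCommand: string) => void>, // context name -> API entry point
  respond?: (message: string) => void                 // transparent layer -> virtual driver -> device
): void {
  const api = apis.get(cmd.context);
  if (!api) throw new Error(`No API registered for context "${cmd.context}"`);
  api(cmd.nativeCommand);                             // step 1502: execute natively
  if (cmd.responseInstruction && respond) {
    respond(cmd.responseInstruction);                 // step 1503: optional response
  }
}
```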
Figures 16-19 illustrate additional features of the system disclosed herein. FIG. 16 shows an example interface for adding a new command corresponding to a user input, according to an example embodiment. The dashboard in interface 1600 includes icons for applications 1601 that have already been added and can be launched using predetermined user inputs and hardware devices (e.g., voice commands). The dashboard may also display other commands that are specific to an application and mapped to certain user inputs. Selecting add button 1602 opens an add command menu 1603. This menu allows the user to configure the following options:
    • Item type: a fixed item to be added to the bottom bar menu, or a normal item to be added to the drag menu;
    • Icon: select an image icon;
    • Background: select a background icon color;
    • Color: select an icon color;
    • Name: set the new item name;
    • Voice command: set the voice activation command used to open the new application;
    • Feedback response: set the application voice response feedback;
    • Command: select the application type or custom command type to launch (e.g., a launch application command, a perform action within an application command, a close application command, etc.);
    • Process to start: the name of the process or application to start, if a new process or application is being launched; and
    • Parameters: any parameters to be passed to the new process or application.
FIG. 17 illustrates various components and options of a drawing interface 1700 and drawing schema according to an exemplary embodiment. Fig. 18 shows a calibration and settings interface 1800 for a camera hardware device that recognizes objects and allows a user to provide input using touches and gestures. FIG. 19 illustrates a generic settings interface 1900 that allows a user to customize various aspects of the interface, switch input modes, and make other changes. As shown in interface 1900, the user may also access a settings page to calibrate and adjust settings of a hardware stylus (referred to as a "magic pen").
The system disclosed herein may be implemented on a plurality of networked computing devices and used in conducting a networked collaboration session. For example, the whiteboard functionality previously described may be a shared whiteboard between multiple users on multiple computing devices.
However, one problem with existing whiteboards or other shared collaboration spaces is that there is no simple way to interact with remote computing devices or share desktop screens without interrupting or disrupting collaboration sessions. For example, if a participant in a collaborative workspace wishes to share a display with other participants, all participants are required to minimize or close the collaboration session, execute a screen sharing application, and join a screen sharing conference. During a shared collaboration session, this often interrupts the workflow and shared brainstorming sessions that the collaboration space is intended to facilitate.
In addition to the previously described methods and systems for implementing a generic hardware-software interface, applicants have invented methods, apparatus, and computer-readable media that allow desktop sharing over a web socket connection in a networked collaborative workspace.
FIG. 20 illustrates a flowchart for desktop sharing through a web socket connection in a networked collaborative workspace, according to an exemplary embodiment. All of the steps shown in fig. 20 may be performed on a local computing device, such as a client device connected to a server, and do not require multiple computing devices. The disclosed process may also be implemented by a plurality of devices connected to a server.
At step 2001, a representation of a collaborative workspace hosted on a server is transmitted on a user interface of the local computing device. The collaborative workspace may be accessed over web socket connections by a plurality of participants on a plurality of computing devices, including a local participant at the local computing device and one or more remote participants at remote computing devices. As used herein, remote computing devices and remote participants refer to computing devices and participants other than the local participant and the local computing device. Remote computing devices are separated from the local device by a network, such as a Wide Area Network (WAN).
FIG. 21A illustrates the network architecture used to host and transmit the collaborative workspace in accordance with an exemplary embodiment. As shown in FIG. 21A, the server 2100 is connected to computing devices 2101A-2101F. The server 2100 and computing devices 2101A-2101F may be connected via a network connection (e.g., a web socket connection) that allows bidirectional communication between the computing devices 2101A-2101F (clients) and the server 2100. As shown in FIG. 21A, the computing devices may be any type of computing device, such as a laptop computer, desktop computer, smartphone, or other mobile device.
The collaborative workspace may be, for example, a digital whiteboard configured to propagate any edits from any of the multiple participants to the other participants over the web socket connections. FIG. 21B illustrates a process for propagating edits to the collaborative workspace within the network in accordance with an exemplary embodiment. As shown in FIG. 21B, if a user at computing device 2101B makes an edit or change to the collaborative workspace, that edit or change 2102B is sent to the server 2100, where it is used to update the hosted version of the workspace. The edit or change is then propagated by the server 2100 as updates 2102A, 2102C, 2102D, 2102E, and 2102F to the other connected computing devices 2101A, 2101C, 2101D, 2101E, and 2101F.
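A minimal relay sketch of this fan-out, written against the Node "ws" package as an assumed transport (the patent only requires a bidirectional web socket connection); the message shapes and the in-memory workspace are placeholders.

```typescript
// Minimal relay sketch: each edit from one client updates the hosted
// workspace and is fanned out to the other connected clients.
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });
const hostedWorkspace: unknown[] = []; // server-side version of the workspace

wss.on("connection", (ws: WebSocket) => {
  // New participant: send the current representation of the workspace.
  ws.send(JSON.stringify({ type: "workspace", payload: hostedWorkspace }));

  ws.on("message", (data) => {
    const edit = JSON.parse(data.toString());
    hostedWorkspace.push(edit); // update the hosted version (2102B)
    for (const client of wss.clients) {
      if (client !== ws && client.readyState === WebSocket.OPEN) {
        client.send(JSON.stringify({ type: "update", payload: edit })); // propagate
      }
    }
  });
});
```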
Each representation of the collaborative workspace may be a version of the collaborative workspace customized for the local participants. For example, as described above, each representation of a collaborative workspace may include one or more remote participant objects corresponding to one or more remote computing devices connected to a server.
FIG. 22 illustrates multiple representations of collaborative workspaces in accordance with an exemplary embodiment. As shown in fig. 22, server 2200 hosts collaborative workspace 2201. The version of the collaborative workspace hosted on the server is propagated to the connected devices, as previously described. FIG. 22 also shows a representation of the collaborative workspace for three connected users, user 1, user 2 and user 3. Each representation may optionally be customized for a local participant (for a local computing device at each location).
Returning to FIG. 20, at step 2002, a request to share at least a portion of a native desktop of a local computing device within a collaborative workspace and a selection of a region within a representation of the collaborative workspace is received by the local computing device.
FIGS. 23A-23B illustrate an example of the step of receiving a request to share at least a portion of a local desktop of the local computing device within the collaborative workspace and a selection of a region within the representation of the collaborative workspace, according to exemplary embodiments.
FIG. 23A illustrates an example of a user interface (desktop) of the local computing device prior to receiving the request and the region selection. As shown in FIG. 23A, the user interface 2301 includes a collaboration application 2302 that locally displays a representation of the collaborative workspace 2303 hosted on the server, as well as a separate presentation application 2308 (e.g., PowerPoint™) and a separate document editing application (e.g., Word™). All user applications executing on the local computing device are shown as tabs in the operating system ("OS") taskbar 2306, which displays OS-related menus in addition to an OS menu button.
The collaboration application 2302 may include the representation of the collaborative workspace 2303, which contains all edits and contributions of the local participant and any other participants, as well as a toolbar 2304. Toolbar 2304 may include various editing tools, settings, commands, and options for interacting with or configuring the representation of the collaborative workspace. For example, toolbar 2304 may include editing tools for drawing within the representation of the collaborative workspace 2303, with edits being propagated over the web socket connection to the server and the other connected computing devices.
Toolbar 2304 also includes a screen share button 2305 that, when selected, causes the local computing device to receive a request to share at least a portion of the local desktop of the local computing device within the collaborative workspace. Thus, the user may initiate screen sharing within the collaborative workspace by selecting screen sharing button 2305.
FIG. 23B illustrates an example of the user interface (desktop) of the local computing device after receiving the request and before selecting the region. As shown in FIG. 23B, selection of the screen share button 2305 may cause a region window 2309 to appear within the representation of the collaborative workspace 2303. Window 2309 determines the resulting output area for screen sharing of the local desktop (or a portion of the local desktop), and may be moved and/or customized by the user in terms of size, shape, orientation, location, and so on. Once the user has selected a position/size/shape for window 2309, the user may complete the selection with some input (e.g., clicking a pointing device, selecting button 2305 again, or some other input). The selected region within the collaborative workspace, including the relevant parameters (size, shape, orientation, etc.), may then be received by the local computing device. Alternatively, the region may be set to certain default values, including a default size, position, and orientation, and may be further configured by the user if the user wishes to deviate from those defaults.
Of course, the process illustrated in FIGS. 23A-23B is merely one example of receiving a request to share at least a portion of a local desktop of the local computing device within the collaborative workspace and a selection of a region within the representation of the collaborative workspace. This step can be implemented in a number of ways. For example, screen share button 2305 may be dragged into the collaborative workspace 2303 instead of being selected. The screen sharing request may also be initiated by the user using some input command, such as a keyboard command or a selection within a menu or submenu, which may be recognized by the collaboration application as a request to share the screen. The request to initiate screen sharing within the collaborative workspace may also be initiated after a separate screen sharing session has been initiated. For example, the user may drag a taskbar tab, icon, or screen sharing window to a location within the collaborative workspace, causing the local computing device to receive the request and the selection of the region within the collaborative workspace.
The step of receiving a request to share at least a portion of a local desktop of a local computing device and a selection of a region within a representation of a collaborative workspace may include a sub-step that allows a user to select a source for screen sharing, such as whether to share their entire desktop, one or more windows within their desktop, or output associated with one or more applications running on their local computing device. These substeps may include transmitting a source selection interface within the user interface, the source selection interface configured to receive a selection of at least a portion of the native desktop, and receiving a selection of at least a portion of the native desktop within the source selection interface.
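By way of illustration only, the following TypeScript sketch shows one possible shape for the data such a source selection interface could return; the type name, field names, and the example identifier are assumptions used for illustration and are not part of the disclosure.

```typescript
// Hypothetical result of the source selection interface: the user picks the
// entire desktop, a single window, or all interfaces of one application.
type ShareSource =
  | { kind: 'desktop' }                              // share the entire local desktop
  | { kind: 'window'; windowId: string }             // share one window
  | { kind: 'application'; applicationId: string };  // share an application's interfaces

// Example: a selection of a document editing application (identifier is illustrative).
const selectedSource: ShareSource = { kind: 'application', applicationId: 'doc-editor' };
```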
Fig. 24A to 24C illustrate an example of a source selection process according to an exemplary embodiment. Fig. 24A shows a user interface (desktop) 2406 of the local computing device before the user selects any screen sharing commands or buttons. Numerals 2401-2408 represent the same components as numerals 2301-2308 in fig. 23A discussed above.
FIG. 24B shows the user interface 2406 after the user has selected the screen share button 2405. As shown in FIG. 24B, a source selection interface 2409 may be sent within the collaborative workspace 2403 or within the collaborative application 2404 that allows users to select whether they want to share their entire desktop or a portion of their desktop, and which portion of their desktop they want to share. The source selection interface may list all currently active applications running on the local computing device, as well as any windows (e.g., windows corresponding to the OS or windows created by the applications), and allow the user to select between sharing the entire local desktop, sharing one or more windows within the local desktop, or sharing one or more interfaces corresponding to one or more applications executing on the local computing device. For example, if a user selects an application to share, all interfaces (e.g., windows, prompts, displays, etc.) associated with the application may be shared. If the user selects a single window to share, only that window will be shared. Additionally, if a user chooses to share their entire desktop, the contents of the entire desktop can be shared with other participants.
FIG. 24C shows interface 2401 after the user has selected "document editing application" within selection interface 2409. This selection will specify the document editing application as the source of the screen sharing stream, meaning that other participants in the collaborative workspace will be able to view the interface corresponding to the document editing application executing on the local computing device. The selection may be stored in memory and/or passed to an application or program that generates a stream object that captures the relevant portion of the desktop, as will be discussed further below.
The source selection steps described above with respect to fig. 24A-24C may be performed as part of, before, or after the region selection discussed with respect to fig. 23A-23B. For example, after the user selects the region for the screen sharing window, the system may display the source selection interface. Alternatively, the source selection interface may be displayed before the region selection. The source selection process may also be performed at a later step in the overall process, such as when generating a stream object.
The source selection process may also be omitted (by default, the entire desktop is shared) and/or may be performed in other ways. For example, rather than displaying a source selection interface, a prompt may be displayed that instructs the user to select all active windows they want to share, or to enter a command to share the entire desktop. Many variations are possible and these embodiments are not limiting.
The inputs described with respect to step 2002 and FIGS. 23A-23B and 24A-24C may be received by any type of pointing device, such as a mouse, touch screen or stylus. The previously described techniques involving virtual drivers and/or transparent layers may be used to detect input. For example, the input may be a pointing gesture of the user. Additionally, the above actions, such as drag and drop actions, selections, deselections, or other input or input sequences, may also be input using the previously described techniques involving virtual drivers and/or transparent layers.
Returning to FIG. 20, at step 2003, a stream object configured to output a video stream of at least a portion of a local desktop of a local computing device is generated. The stream object may be a media stream, such as a video stream, configured to capture a stream of at least a portion of the local desktop.
As previously described, a representation of a collaborative workspace hosted on a server may be transmitted on a local computing device by a local collaboration application executing on the local computing device. The collaboration application may be, for example, a web application, and communicates and interfaces with a screen capture program on the local computing device. The screen capture program is a program configured to generate a stream of at least a portion of the desktop. The collaboration application may interface with the screen capture program via an Application Program Interface (API). In addition, the collaboration application may interface with the screen capture program via a transparent layer that itself interfaces with multiple applications running on the local computing device. The screen capture program functionality for generating the media stream may also be integrated into the collaboration application so that the collaboration application may simply call the relevant routine or process to instantiate the stream object.
FIG. 25 illustrates a flow diagram for generating a stream object configured to output a video stream of at least a portion of a local desktop of a local computing device, according to an example embodiment.
At step 2501, the local collaboration application sends a request for a source identifier to a screen capture program executing on the local computing device via an Application Program Interface (API) between the local collaboration application and the screen capture program. As previously described, the API may be the transparent layer itself. The request may include additional attributes, such as a selected source for the screen sharing stream (such as a particular application or window). Alternatively, the source selection process may be performed after the request is submitted, or may be omitted in favor of a default source (e.g., the entire desktop). The source identifier is a handle or address of the media stream to be created and allows an application to access the output of the media stream and the resulting screen share.
At step 2502, the screen capture program initiates a stream of at least a portion of the local desktop of the local computing device, the stream having a corresponding source identifier. When a source parameter is provided to the screen capture program, the screen capture program may initiate the stream using only the identified component (e.g., a particular application or window). Otherwise, the screen capture program may initiate streaming of the entire local desktop by default, or display source selection options to the user as previously described. The initiated stream is a screen capture sequence that periodically captures a snapshot of at least a portion of the desktop (e.g., 30 times per second). The stream may be accessed using the source identifier, which, as described above, is a handle that allows a program to access the stream.
At step 2503, the screen capture program sends the source identifier to the local collaboration application. At step 2504, the local collaboration application generates a stream object based at least in part on the source identifier. In addition to the source identifier, the local collaboration application may optionally utilize earlier provided information, such as the user-specified region, to create the stream object. The stream object is a media stream and a corresponding output interface having a defined format. The defined format may optionally be based on user input, such as the selected region. The stream object is a media stream object that is compatible with the video streams from participants' cameras and is configured to be embedded within the collaborative workspace.
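The exchange of steps 2501-2504 could be sketched in TypeScript as follows, assuming a screen capture program that exposes a simple API (possibly via the transparent layer) to the local collaboration application; the ScreenCaptureAPI interface, the StreamObject shape, and all function names are illustrative assumptions rather than the disclosed implementation.

```typescript
// Minimal sketch of steps 2501-2504 under the assumptions stated above.
interface Region {
  x: number; y: number; width: number; height: number; shape?: string;
}

interface StreamObject {
  sourceId: string;          // handle/address of the underlying media stream
  mediaStream: MediaStream;  // stream of at least a portion of the local desktop
  region: Region;            // user-selected output region in the workspace
}

// Assumed interface exposed by the screen capture program to the collaboration app.
interface ScreenCaptureAPI {
  requestSourceId(source?: string): Promise<string>;  // steps 2501-2503
  getStream(sourceId: string): Promise<MediaStream>;  // resolve the handle into a stream
}

async function generateStreamObject(
  capture: ScreenCaptureAPI,
  region: Region,
  source?: string
): Promise<StreamObject> {
  const sourceId = await capture.requestSourceId(source); // request and receive the identifier
  const mediaStream = await capture.getStream(sourceId);  // access the screen capture stream
  return { sourceId, mediaStream, region };               // step 2504: assemble the stream object
}
```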
The screen capture program may be a standalone program configured to generate a stream of the local desktop or of a portion of the local desktop, or may be a component integrated into the local collaboration application and configured to generate such a stream. For example, the screen capture program may be a web browser or browser engine component that serves as the basis or endpoint for Web Real-Time Communication (WebRTC) streaming. The following section provides an exemplary implementation of the step of generating a stream object when the screen capture program is the Chrome browser.
The getUserMedia() ("gUM") function can access the screen capture functionality in the Google Chrome browser. The gUM function may be called once to retrieve the user audio/video stream, and may be called a second time to obtain the screen stream.
In Chrome, permission to use the screen capture functionality may be enabled by utilizing a Chrome extension in a web application (e.g., one possible implementation of the collaboration application). The extension uses the function chrome.desktopCapture.chooseDesktopMedia() to retrieve a source identifier (sourceId). The sourceId may then be used as a parameter in the gUM function to retrieve the corresponding stream.
The extension for screen sharing may include a content script that runs in the context of the collaboration application and a background script that runs in a separate extension context. The content script may communicate with the collaboration application by sending a message to the window or via a Document Object Model (DOM) operation, while the background script cannot. The background script can access all Chrome extension APIs, but the content script cannot. The content script and the background script may communicate with each other through the chrome.runtime messaging functions (e.g., chrome.runtime.sendMessage). Given this architecture, the process of generating a stream object configured to output a video stream of at least a portion of the local desktop of the local computing device may be performed as follows:
(1) the collaboration application sends a request for a screen sharing source identifier to the content script;
(2) the content script passes the request to the background script;
(3) the background script calls the function chrome.desktopCapture.chooseDesktopMedia() to retrieve the source identifier and passes it back to the content script; and
(4) the content script returns the source identifier to the collaboration application, which ultimately calls the getUserMedia function with the source identifier as one of the constraints/parameters.
For the gUM function in Chrome, the constraints for the video stream may include { chromeMediaSource: 'desktop', maxWidth: 1920, maxHeight: 1080, maxFrameRate: 10, minAspectRatio: 1.77, chromeMediaSourceId: sourceId } or { maxWidth: 1920, maxHeight: 1080, maxFrameRate: 10, minAspectRatio: 1.77, chromeMediaSourceId: sourceId }.
The screen share gUM call returns a MediaStream, which can be shared as a WebRTC MediaStream over a peer connection.
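A minimal TypeScript sketch of the four-step exchange and the resulting gUM call is shown below; chrome.desktopCapture.chooseDesktopMedia, the chrome.runtime messaging functions, and navigator.mediaDevices.getUserMedia are standard Chrome APIs, while the message names, file labels, and exact wiring are assumptions for illustration only.

```typescript
// background.ts -- runs in the extension context (message names are illustrative).
chrome.runtime.onMessage.addListener((msg, sender, sendResponse) => {
  if (msg.type === 'REQUEST_SCREEN_SOURCE_ID') {
    chrome.desktopCapture.chooseDesktopMedia(
      ['screen', 'window'],
      sender.tab!,
      (sourceId) => sendResponse({ sourceId })
    );
    return true; // keep the message channel open for the asynchronous response
  }
});

// content-script.ts -- runs in the context of the collaboration application and
// bridges it to the background script.
window.addEventListener('message', (event) => {
  if (event.data?.type === 'REQUEST_SCREEN_SOURCE_ID') {
    chrome.runtime.sendMessage({ type: 'REQUEST_SCREEN_SOURCE_ID' }, (resp) => {
      window.postMessage({ type: 'SCREEN_SOURCE_ID', sourceId: resp.sourceId }, '*');
    });
  }
});

// Collaboration application (page context): call gUM with the returned sourceId
// using the constraints listed above.
async function getScreenStream(sourceId: string): Promise<MediaStream> {
  return navigator.mediaDevices.getUserMedia({
    audio: false,
    video: {
      mandatory: {
        chromeMediaSource: 'desktop',
        chromeMediaSourceId: sourceId,
        maxWidth: 1920,
        maxHeight: 1080,
        maxFrameRate: 10,
        minAspectRatio: 1.77,
      },
    } as any, // Chrome-specific desktop-capture constraints are not in the standard typings
  });
}
```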
Of course, the above implementation using the Chrome browser as the screen capture program is provided as an example only, and the step of generating the stream object may be performed using other programs or browsers that support screen capture functionality (e.g., the Firefox browser) or a separate and independent screen capture program.
Returning to FIG. 20, at step 2004, the local computing device sends one or more commands to the server over the web socket connection. The one or more commands may include a stream object and information corresponding to the selected region, and are configured to cause the server to insert the stream object into the collaborative workspace based at least in part on the selected region.
For example, if the user previously selected a circular area in the lower right corner of the collaborative workspace as the selected region for screen sharing, the server may insert the stream object into the collaborative workspace such that, when the media stream is embedded in the collaborative workspace, the media stream is displayed in a circular format in the lower right corner of the collaborative workspace. The size and orientation of the circle may be based on the same properties of the selected region. Of course, like any other object in the collaborative workspace, the stream object may be adjusted or moved by a participant after it is embedded in the collaborative workspace, by the participant interacting with their representation of the collaborative workspace.
The format of the stream object within the collaborative workspace may be determined based on the previously selected region, including attributes of the selected region such as shape, size, and location. These attributes may be sent, along with the stream object, in the one or more commands to the server. The server may then determine an insertion point and format for embedding the stream object into the collaborative workspace based on these attributes.
Alternatively, the stream object may be a media stream object having predetermined spatial attributes based on a user's previous selection of a region. In this case, when the stream object is generated at the local computing device, the display properties of the stream object may be integrated into the stream object. The stream object (with embedded spatial attributes) may then be sent to a server, which embeds the stream object in a suitable format in a collaborative workspace at a suitable location based on the embedded spatial attributes.
In addition to including the stream object itself, the one or more commands may optionally include an address of the stream object or other identifier that the server may use to retrieve the stream object or instantiate its own instance of the stream object.
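By way of illustration, the one or more commands of step 2004 could be serialized as a JSON message over the web socket connection, as in the following TypeScript sketch; the command type, field names, and the use of a stream object identifier (rather than the stream object itself) are assumptions, not the disclosed wire format.

```typescript
// Illustrative command format for step 2004 (names are assumptions).
interface InsertStreamCommand {
  type: 'INSERT_STREAM_OBJECT';
  workspaceId: string;
  streamObjectId: string;   // identifier/address the server may use to retrieve the stream object
  region: { x: number; y: number; width: number; height: number; shape: string };
}

function sendInsertStreamCommand(socket: WebSocket, command: InsertStreamCommand): void {
  // The command is sent to the server over the web socket connection; the server
  // inserts the stream object into the collaborative workspace at the selected region.
  socket.send(JSON.stringify(command));
}

// Example usage (values are illustrative):
// sendInsertStreamCommand(socket, {
//   type: 'INSERT_STREAM_OBJECT',
//   workspaceId: 'workspace-1',
//   streamObjectId: 'source-42',
//   region: { x: 100, y: 200, width: 640, height: 360, shape: 'rectangle' },
// });
```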
The server inserts the stream object into the collaborative workspace such that the representation of the stream object is propagated through the web socket connection to the plurality of computing devices. Thus, each connected computing device will have a representation of the stream object in their respective representations of the collaborative workspaces.
The inserted stream object is configured to receive a video stream of at least a portion of the local desktop of the local computing device and to send the video stream of at least a portion of the local desktop of the local computing device to the plurality of computing devices over the web socket connection.
As previously described, the process includes forwarding the stream information from the local computing device that instantiates the stream object (and is identified by a stream identifier as the source of the media stream) to the server, and then, within the representations of the collaborative workspace, to each of the plurality of computing devices connected to the server. Thus, the stream object itself may be embedded within the collaborative workspace on the server, and the resulting stream may be propagated to the connected clients.
FIG. 26 illustrates a process of sending commands and propagating stream objects from a local computing device in accordance with an illustrative embodiment. As shown in fig. 26, the local computing device 2601 sends a command (including a stream object or a reference/pointer to a stream object) to the server 2600. The server 2600 then inserts the stream object into the collaborative workspace, causing the collaborative workspace with the embedded stream object to be propagated to all connected devices, including the local computing device 2601 and the remote computing devices 2602 and 2603.
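A simplified, server-side TypeScript sketch of this insertion and propagation is shown below; the Workspace and Client shapes, the WORKSPACE_UPDATE message, and the broadcast loop are illustrative assumptions about one way such propagation could be implemented.

```typescript
// Server-side sketch (illustrative only): embed the stream object in the hosted
// workspace and propagate the update to every connected device over web sockets.
interface Client { send(data: string): void; }        // e.g., a per-device web socket connection
interface Workspace { id: string; objects: unknown[]; }

function insertStreamObject(
  workspace: Workspace,
  streamObject: unknown,          // the stream object (or a reference to it)
  clients: Client[]
): void {
  workspace.objects.push(streamObject);   // insert at the selected region
  const update = JSON.stringify({ type: 'WORKSPACE_UPDATE', workspace });
  for (const client of clients) {
    client.send(update);                  // propagate to all connected devices
  }
}
```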
FIG. 27 shows an example of an interface of a local computing device after a server embeds a stream object into a collaborative workspace, according to an example embodiment. Numerals 2701-2708 correspond to the same components described with respect to numerals 2301-2308 in fig. 23A. Fig. 27 also shows an embedded stream object 2709 that displays the media stream of the user's desktop. In this case, it is assumed that the selected source is the entire desktop. Each remote participant connected to the server will have the same stream object embedded within the representation of their collaborative workspace. As shown in FIG. 27, the resulting embedded stream provides a "picture-in-picture" effect that allows the local and remote participants to view the content of the shared screen within the environment of the collaborative workspace. Thus, participants may share related programs and information without interrupting the collaboration session.
In addition to the above-described techniques, applicants have invented new techniques for allowing local and remote participants to control the desktop or portions of the desktop displayed within an embedded stream object. This novel technique utilizes a transparent layer and allows a user (both locally and remotely) to efficiently navigate through a desktop or a portion of a desktop displayed within an embedded stream object.
FIG. 28 illustrates a flowchart for controlling a desktop or a portion of a desktop via an embedded stream object from a local computing device, according to an example embodiment.
At step 2801, the inserted stream object is transmitted within a representation of a collaborative workspace on a user interface of the local computing device. The inserted stream object is associated with a network address of the source of the video stream. The association may be provided by the server in the form of tags or metadata associated with the stream objects. Further, the association may be part of the stream object and may be based on, for example, the source identifier discussed above. For example, when creating a stream object, the device that created the stream object may include a tag that indicates the IP address of the device.
At step 2802, a transparent layer executing on the local computing device detects a user input associated with the inserted stream object, the user input corresponding to a location within the local desktop. As previously described, the transparent layer includes an Application Program Interface (API) configured to interface with one or more of an operating system or one or more applications configured to execute on the operating system. The transparent layer may detect user input associated with the inserted stream object based on the location of the input (determined by the coordinates) and the location of the stream object. For example, if there is an overlap between a mouse click and some portion of a stream object, the input may be detected as a user input associated with the inserted stream object.
The user input may also be mapped to a particular location within the local desktop based on the location of the input within the inserted stream object. A mapping may be stored indicating the areas or coordinates within the inserted stream object that are associated with different portions of the local desktop, and the location of the input may be mapped to the corresponding portion of the local desktop. For example, a sub-region of the inserted stream object may be associated with a particular application occupying a corresponding region in the local desktop, or may be associated with corresponding coordinates in the local desktop.
The mapping process may utilize a scaling mechanism or process that detects the relative position of the input within the inserted stream object and maps the relative position to an absolute position within the desktop (or a portion of the desktop) being streamed by the stream object.
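One possible form of this scaling mechanism is sketched below in TypeScript; the rectangle representation and function name are assumptions used only to illustrate the relative-to-absolute mapping.

```typescript
// Sketch of the scaling/mapping step: converts a click inside the embedded
// stream object to an absolute position within the streamed desktop.
interface Rect { x: number; y: number; width: number; height: number; }

function mapToDesktop(
  input: { x: number; y: number },   // input coordinates in workspace space
  streamObjectRect: Rect,            // where the inserted stream object is rendered
  desktopRect: Rect                  // the streamed desktop (or portion of it)
): { x: number; y: number } {
  // relative position of the input within the inserted stream object
  const relX = (input.x - streamObjectRect.x) / streamObjectRect.width;
  const relY = (input.y - streamObjectRect.y) / streamObjectRect.height;
  // absolute position within the desktop being streamed
  return {
    x: desktopRect.x + relX * desktopRect.width,
    y: desktopRect.y + relY * desktopRect.height,
  };
}
```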
Additionally, as previously described, the input may come from a pointing device, such as a mouse, or via other input means, such as an input mechanism that relies on a virtual driver and a transparent layer.
At step 2804, the transparent layer executing on the local computing device determines that the network address associated with the inserted stream object corresponds to the network address of the local computing device. This may be determined, for example, by comparing the IP address of the device providing the input with the IP address associated with the stream object to determine if a match exists.
At step 2805, based on a determination that the network address associated with the inserted stream object corresponds to a network address of a computing device providing the input, the transparent layer sends one or more second commands to one or more of the operating system or one or more applications configured to execute on the operating system, the one or more second commands configured to perform the user input at a location within the local desktop.
As previously described, the transparent layer may interface with the OS or applications running on the OS. Thus, any input within the inserted stream object may be mapped to a corresponding location within the local desktop, and commands may be sent (depending on the relevant context, as previously discussed) to the appropriate application or to the OS to perform the input at the corresponding location within the local desktop.
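The dispatch decision of steps 2804-2805 (together with the corresponding remote branch described below with respect to FIG. 30) could be sketched as follows; performLocalInput, the message fields, and the comparison of addresses by string equality are illustrative assumptions.

```typescript
// Sketch of the transparent layer's dispatch decision; performLocalInput is a
// hypothetical placeholder for the OS-level injection performed by the layer.
declare function performLocalInput(loc: { x: number; y: number }): Promise<void>;

async function handleStreamObjectInput(
  streamSourceAddress: string,               // network address tagged on the stream object
  thisDeviceAddress: string,                 // address of the device where the input occurred
  desktopLocation: { x: number; y: number }, // mapped location within the streamed desktop
  socket: WebSocket                          // web socket connection to the server
): Promise<void> {
  if (streamSourceAddress === thisDeviceAddress) {
    // Local case: forward the input to the OS or the appropriate application.
    await performLocalInput(desktopLocation);
  } else {
    // Remote case: route the input to the source device's transparent layer.
    socket.send(JSON.stringify({
      type: 'REMOTE_INPUT',
      target: streamSourceAddress,
      location: desktopLocation,
    }));
  }
}
```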
FIGS. 29A-29C illustrate an example of controlling a desktop or a portion of a desktop via an embedded stream object from the local computing device, in accordance with an exemplary embodiment.
As shown in fig. 29A, a local user interface (desktop) 2901 includes a collaboration application 2902 that displays a representation of the collaborative workspace. The representation includes an inserted/embedded stream object 2903, which is streaming the local desktop. Local user interface 2901 also includes a taskbar 2906 including an OS menu button 2905. As shown, the mouse pointer is over a button 2904 within the inserted stream object 2903 that corresponds to the OS menu button 2905 within the local desktop.
Fig. 29B shows the result of the user clicking at the position of the button 2904 within the stream object 2903. As a result of this input, detected by the transparent layer, the location of the input within the stream object 2903 is mapped to the corresponding location within the desktop 2901. Since the corresponding location is the OS menu button 2905, this input causes the transparent layer to send a command to the OS to activate the OS menu button 2905. This change in desktop 2901 is itself captured by the stream object, which shows button 2904 within the inserted stream object being activated.
Fig. 29C shows the interface 2901 and the inserted stream object 2903 after the input is sent to the local desktop. As shown in fig. 29C, the OS menu is opened and includes a list of selectable indicators 2907. This change is thus captured by the inserted stream object 2903, which itself displays a corresponding opening of the button 2904, including a list of selectable indicators 2908.
As indicated above, the transparent layer may be effectively used to control the local desktop through the embedded stream object. This effectively provides users participating in the collaboration session with a remote control interface that allows the users to stay within the collaboration session and simultaneously navigate their desktop, or the applications within the desktop that they are sharing with other participants.
The present system may also be used to allow remote participants to control the desktop or a portion of the desktop that is being shared. This functionality has great utility because it allows remote participants to access other desktops and applications that are shared via streaming objects inserted within the collaborative workspace.
FIG. 30 shows a flowchart for controlling a desktop or a portion of a desktop via an embedded stream object from a remote computing device, according to an example embodiment.
At step 3001, a stream object inserted within the representation of the collaborative workspace is transmitted on a user interface of the remote computing device. The inserted stream object is associated with a network address of the source of the video stream. The association may be provided by the server in the form of tags or metadata associated with the stream objects. Further, the association may be part of the stream object and may be based on, for example, the source identifier discussed above. For example, when creating a stream object, the device that created the stream object may include a tag that indicates the IP address of the device.
At step 3002, a transparent layer executing on the remote computing device detects a user input associated with the inserted stream object, the user input corresponding to a location within the local desktop. As previously described, the transparent layer comprises an Application Program Interface (API) configured to interface with one or more of the operating system or one or more applications configured to execute on the operating system. The transparent layer may detect user input associated with the inserted stream object based on the location of the input (determined by the coordinates) and the location of the stream object. For example, if there is an overlap between a mouse click and some portion of a stream object, the input may be detected as a user input associated with the inserted stream object.
The user input may also be mapped to a particular location within the local desktop based on the location of the input within the inserted stream object. Again, a mapping may be stored that indicates the areas or coordinates within the inserted stream object that are associated with different portions of the local desktop, and the location of the input may be mapped to the corresponding portion of the local desktop. For example, a sub-region of the inserted stream object may be associated with a particular application occupying a corresponding region in the local desktop, or may be associated with corresponding coordinates in the local desktop.
The mapping process may utilize a scaling mechanism or process that detects the relative position of the input within the inserted stream object and maps the relative position to an absolute position within the desktop (or a portion of the desktop) being streamed by the stream object.
Additionally, as previously described, the input may come from a pointing device, such as a mouse, or via other input means, such as an input mechanism that relies on a virtual driver and a transparent layer.
At step 3004, the transparent layer executing on the remote computing device determines that the network address associated with the inserted stream object does not correspond to the network address of the remote computing device. This may be determined, for example, by comparing the IP address of the device providing the input (the remote computing device) to the IP address associated with the stream object to determine if there is a match.
At step 3005, based on a determination that the network address associated with the inserted stream object does not correspond to the network address of the computing device providing the input, the transparent layer sends one or more second commands to the local computing device over the web socket connection, the one or more second commands configured to cause the local transparent layer executing on the local computing device to perform the user input at a location within the local desktop.
The one or more second commands may be routed from the remote computing device to the local computing device through the server and the web socket connection. In particular, the one or more second commands may be sent to the server with the destination address set to the IP address of the local computing device, and then routed by the server to the local computing device.
The one or more second commands may be configured to cause the local transparent layer at the local computing device to send one or more local commands to one or more of the local operating system or one or more local applications configured to execute on the local operating system, the one or more local commands configured to perform the user input at the location within the local desktop.
As previously described, the transparent layer may interface with the OS or applications running on the OS. Thus, any input within the inserted stream object may be mapped to a corresponding location within the local desktop, and commands may be sent from the local transparent layer (depending on the relevant context as previously described) to the appropriate application or OS on the local computing device to perform the input at the corresponding location within the local desktop.
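On the receiving side, the local transparent layer's handling of such a routed command might look like the following sketch; the REMOTE_INPUT message shape and the performLocalInput helper are assumptions for illustration.

```typescript
// Sketch of the receiving side: the local transparent layer executes a routed
// remote input. performLocalInput is a hypothetical OS-level helper.
declare function performLocalInput(loc: { x: number; y: number }): void;

interface RemoteInputCommand {
  type: 'REMOTE_INPUT';
  target: string;                      // IP address of the local computing device
  location: { x: number; y: number };  // position within the local desktop
}

function onWebSocketMessage(event: MessageEvent): void {
  const msg = JSON.parse(event.data as string);
  if (msg.type === 'REMOTE_INPUT') {
    const cmd = msg as RemoteInputCommand;
    // Send a local command to the OS or to the application occupying this
    // location within the local desktop.
    performLocalInput(cmd.location);
  }
}
```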
FIGS. 31A-31C illustrate an example of controlling a desktop or a portion of a desktop via an embedded stream object from a remote computing device, according to an exemplary embodiment.
As shown in fig. 31A, remote user interface (desktop) 3101 includes collaboration application 3102 that displays a representation of the collaborative workspace. The representation includes an inserted/embedded stream object 3103 that is streaming the local desktop (as used herein, "local" refers to the device that instantiates the stream object and shares its desktop or a portion of its desktop). The remote user interface 3101 also includes a taskbar and a window corresponding to a web browser application running on the remote desktop. As shown, the mouse pointer is over button 3104 within the inserted stream object 3103, button 3104 corresponding to the OS menu button within the local desktop being streamed.
Fig. 31B shows the result of the user clicking at the position of button 3104 within stream object 3103. As a result of this input, detected by the remote transparent layer, the location of the input within stream object 3103 is mapped to the corresponding location within the local desktop being streamed. Since the corresponding location is the OS menu button of the local desktop, the input causes the remote transparent layer to send a command to the local transparent layer on the local computing device, which itself sends a command to the local OS to activate the OS menu button of the local desktop. This change in the local desktop is captured by stream object 3103, which shows button 3104 within the inserted stream object being activated. Note that remote desktop 3101 is not affected by this input (other than the update to stream object 3103) because the inserted stream object is not streaming the remote desktop, but a different desktop associated with the local computing device.
Fig. 31C shows the interface 3101 and the inserted stream object 3103 after the input is sent to the local desktop. At the time shown in FIG. 31C, the OS menu in the streamed local desktop is opened and includes a list of selectable indicators. This change is thus captured by the inserted stream object 3103, which itself displays the corresponding opening of button 3104, including a list of selectable indicators.
As indicated above, the transparent layer may be used to control a remote desktop through the embedded stream object. This effectively provides users participating in the collaboration session with a remote control interface that allows a user to stay within the collaboration session and simultaneously navigate the desktops or applications of other participants within the collaborative workspace. For example, if two participants are giving a presentation to a set of other participants, a first presenting participant may share a presentation application on their desktop and present a first set of slides via the stream object shared in the collaborative workspace. The first presenting participant may then "yield" control of the presentation application to a second presenting participant, who may remotely control the presentation application on the first presenting participant's desktop.
Optionally, the remote control functionality may include permissions, authentication, or some other access control mechanism that allows each participant to configure whether and which participants may remotely control their shared desktop through the stream object. For example, each user may store preferences indicating whether they allow their local desktop or a portion of their local desktop to be controlled by other participants. These preferences may be stored at each computing device (and may be accessed and used by the transparent layer to allow or prevent remote control input), or may be stored at and used by the server to allow or prevent remote control input between computing devices. Regardless of where these access control mechanisms are stored, they may be used to determine whether a remote participant may provide input to another participant's desktop via an inserted stream object.
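A minimal sketch of such an access control check, assuming per-participant preferences stored as a simple structure, is shown below; the field names and allow-list semantics are illustrative only.

```typescript
// Sketch of a per-participant access-control check for remote control of a
// shared desktop (field names are assumptions).
interface SharingPreferences {
  allowRemoteControl: boolean;
  allowedParticipants?: string[];   // optional allow-list of participant identifiers
}

function mayControl(prefs: SharingPreferences, participantId: string): boolean {
  if (!prefs.allowRemoteControl) return false;
  // If no allow-list is configured, any participant may control; otherwise the
  // participant must appear in the list.
  return !prefs.allowedParticipants
    || prefs.allowedParticipants.includes(participantId);
}
```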
One or more of the above-described techniques may be implemented in or involving one or more computer systems. Fig. 32 illustrates an example of a special purpose computing environment 3200. The computing environment 3200 is not intended to suggest any limitation as to the scope of use or functionality of the described embodiments.
Referring to fig. 32, the computing environment 3200 includes at least one processing unit 3210 and memory 3220. The processing unit 3210 executes computer-executable instructions and may be a real or virtual processor. In a multi-processing system, multiple processing units execute computer-executable instructions to increase processing power. The memory 3220 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory, etc.), or some combination of the two. The memory 3220 may store software 3280 that implements the techniques.
The computing environment may have additional features. For example, the computing environment 3200 includes storage 3240, one or more input devices 3250, one or more output devices 3260, and one or more communication connections 3290. An interconnection mechanism 3270, such as a bus, controller, or network, interconnects the components of the computing environment 3200. Typically, operating system software or firmware (not shown) provides an operating environment for other software executing in the computing environment 3200, and coordinates activities of the components of the computing environment 3200.
The storage 3240 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment 3200. The storage 3240 may store instructions for the software 3280.
Input device 3250 may be a touch input device such as a keyboard, mouse, pen, trackball, touch screen, or game controller, a voice input device, a scanning device, a digital camera, a remote control, or another device that provides input to computing environment 3200. Output device 3260 may be a display, television, monitor, printer, speaker, or another device that provides output from computing environment 3200.
Communication connection 3290 enables communication over a communication medium with another computing entity. The communication medium transmits information such as computer-executable instructions, audio or video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
Implementations may be described in the context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, within computing environment 3200, computer-readable media may comprise memory 3220, storage 3240, communication media, and combinations of any of the above.
Of course, for ease of identification, fig. 32 illustrates computing environment 3200, display device 3260, and input device 3250 as separate devices. The computing environment 3200, the display device 3260, and the input device 3250 may be separate devices (e.g., a personal computer connected to a monitor and mouse by wires), may be integrated in a single device (e.g., a mobile device with a touch display, such as a smartphone or tablet), or any combination of devices (e.g., a computing device operatively coupled to a touch screen display device).
Having described and illustrated the principles of the invention with reference to described embodiments, it will be recognized that the described embodiments can be modified in arrangement and detail without departing from such principles. Elements of the described embodiments shown in software may be implemented in hardware and vice versa.
In view of the many possible embodiments to which the principles of our invention may be applied, we claim as our invention all such embodiments as may come within the scope and spirit of the following claims and equivalents thereto.

Claims (20)

1. A method for desktop sharing in a networked collaborative workspace through a web socket connection, the method comprising:
sending, on a user interface of a local computing device, a representation of a collaborative workspace hosted on a server and accessible by a plurality of participants on a plurality of computing devices over a web socket connection;
the local computing device receiving a request to share at least a portion of a local desktop of the local computing device within the collaborative workspace and a selection of an area within the representation of the collaborative workspace;
generating, by the local computing device, a stream object configured to output a video stream of at least a portion of the local desktop of the local computing device; and
sending, by the local computing device, one or more commands to the server over the web socket connection, the one or more commands including the stream object and information corresponding to the selected region and configured to cause the server to insert the stream object into the collaborative workspace at the selected region.
2. The method of claim 1, wherein receiving the request to share the at least a portion of the native desktop of the native computing device and the selection of the area within the representation of the collaborative workspace comprises:
sending a source selection interface within the user interface, the source selection interface configured to receive a selection of the at least a portion of the native desktop; and
receiving a selection of the at least a portion of the native desktop.
3. The method of claim 1, wherein the at least one portion includes one of a window within the local desktop, an interface corresponding to an application executing on the local computing device, or the local desktop.
4. The method of claim 1, wherein the representation of the collaborative workspace hosted on the server is transmitted on the local computing device by a local collaboration application executing on the local computing device, and wherein generating the stream object configured to output a video stream of the at least a portion of the local desktop of the local computing device comprises:
the local collaboration application sending a request for a source identifier to a screen capture program executing on the local computing device via an Application Program Interface (API) between the local collaboration application and the screen capture program;
initiating, by the screen capture program, a stream of the at least a portion of the local desktop of the local computing device, the stream having a corresponding source identifier;
the screen capture program sending the source identifier to the local collaboration application;
generating, by the local collaboration application, the stream object based at least in part on the source identifier.
5. The method of claim 1, wherein the inserted stream object is configured to receive the video stream of the at least a portion of the local desktop of the local computing device and to send the video stream of the at least a portion of the local desktop of the local computing device to the plurality of computing devices over the web socket connection.
6. The method of claim 1, further comprising:
transmitting, on a user interface of the local computing device, an inserted stream object within the representation of the collaborative workspace, the inserted stream object being associated with a network address of a source of the video stream; and
detecting, by a transparent layer executing on the local computing device, a user input associated with the inserted stream object, the user input corresponding to a location within the local desktop, wherein the transparent layer comprises an Application Program Interface (API) configured to interface with one or more of an operating system or one or more applications configured to execute on the operating system;
determining, by the transparent layer executing on the local computing device, that the network address associated with the inserted stream object corresponds to a network address of the local computing device; and
sending, by the transparent layer executing on the local computing device, one or more second commands to one or more of the operating system or the one or more applications configured to execute on the operating system, the one or more second commands configured to perform the user input at the location within the local desktop.
7. The method of claim 1, further comprising:
transmitting, on a remote user interface of a remote computing device of the plurality of computing devices, the inserted stream object within a remote representation of the collaborative workspace, the inserted stream object associated with a network address of a source of the video stream; and
detecting, by a remote transparent layer executing on the remote computing device, a remote user input associated with the inserted stream object, the remote user input corresponding to a location within the local desktop, wherein the transparent layer comprises an Application Program Interface (API) configured to interface with one or more of an operating system or the one or more applications configured to execute on the operating system;
determining, by the remote transparent layer executing on the remote computing device, that the network address associated with the inserted stream object does not correspond to a network address of the remote computing device; and
sending, by the remote transparent layer executing on the remote computing device, one or more second commands to the local computing device over the web socket connection, the one or more second commands configured to cause a local transparent layer executing on the local computing device to perform the user input at the location within the local desktop.
8. A local computing device for desktop sharing over a web socket connection in a networked collaborative workspace, the local computing device comprising:
one or more processors; and
one or more memories operatively coupled to at least one of the one or more processors and having instructions stored thereon that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to:
sending, on a user interface of the local computing device, a representation of a collaborative workspace hosted on a server and accessible by a plurality of participants on a plurality of computing devices through a web socket connection;
receiving a request to share at least a portion of a local desktop of the local computing device within the collaborative workspace and a selection of an area within the representation of the collaborative workspace;
generating a stream object configured to output a video stream of the at least a portion of the local desktop of the local computing device; and
sending one or more commands to a server over the web socket connection, the one or more commands including the stream object and information corresponding to the selected region and configured to cause the server to insert the stream object into the collaborative workspace at the selected region.
9. The local computing device of claim 8, wherein the instructions that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to receive the request to share the at least a portion of the local desktop of the local computing device and the selection of the area within the representation of the collaborative workspace further cause at least one of the one or more processors to:
sending a source selection interface within the user interface, the source selection interface configured to receive a selection of the at least a portion of the native desktop; and
receiving a selection of the at least a portion of the native desktop.
10. The local computing device of claim 8, wherein the at least one portion comprises one of a window within the local desktop, an interface corresponding to an application executing on the local computing device, or the local desktop.
11. The local computing device of claim 8, wherein the representation of the collaborative workspace hosted on the server is sent on the local computing device by a local collaboration application executing on the local computing device, and wherein the instructions, when executed by at least one of the one or more processors, cause the at least one of the one or more processors to generate a stream object configured to output a video stream of the at least one portion of the local desktop of the local computing device, the instructions further causing at least one of the one or more processors to perform:
the local collaboration application sending a request for a source identifier to a screen capture program executing on the local computing device via an Application Program Interface (API) between the local collaboration application and the screen capture program;
initiating, by the screen capture program, a stream of the at least a portion of the local desktop of the local computing device, the stream having the corresponding source identifier;
the screen capture program sending the source identifier to the local collaboration application;
generating, by the local collaboration application, the stream object based at least in part on the source identifier.
12. The local computing device of claim 8, wherein the inserted stream object is configured to receive the video stream of the at least a portion of the local desktop of the local computing device and to send the video stream of the at least a portion of the local desktop of the local computing device to the plurality of computing devices over the web socket connection.
13. The local computing device of claim 8, wherein at least one of the one or more memories has further instructions stored thereon that, when executed by at least one of the one or more processors, cause at least one of the one or more processors to:
transmitting, on a user interface of the local computing device, an inserted stream object within the representation of the collaborative workspace, the inserted stream object being associated with a network address of a source of the video stream; and
detecting, by a transparent layer executing on the local computing device, a user input associated with the inserted stream object, the user input corresponding to a location within the local desktop, wherein the transparent layer comprises an Application Program Interface (API) configured to interface with one or more of an operating system or one or more applications configured to execute on the operating system;
determining, by the transparent layer executing on the local computing device, that the network address associated with the inserted stream object corresponds to a network address of the local computing device; and
sending, by the transparent layer executing on the local computing device, one or more second commands to one or more of the operating system or the one or more applications configured to execute on the operating system, the one or more second commands configured to perform the user input at the location within the local desktop.
14. At least one non-transitory computer-readable medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a local computing device, cause the local computing device to:
sending, on a user interface of the local computing device, a representation of a collaborative workspace hosted on a server and accessible by a plurality of participants on a plurality of computing devices through a web socket connection;
receiving a request to share at least a portion of a local desktop of the local computing device within the collaborative workspace and a selection of an area within the representation of the collaborative workspace;
generating a stream object configured to output a video stream of the at least a portion of the local desktop of the local computing device; and
sending one or more commands to a server over the web socket connection, the one or more commands including the stream object and information corresponding to the selected region and configured to cause the server to insert the stream object into the collaborative workspace at the selected region.
15. The at least one non-transitory computer-readable medium of claim 14, wherein the instructions that, when executed by the local computing device, cause the local computing device to receive the request to share the at least a portion of the local desktop of the local computing device and the selection of the area within the representation of the collaborative workspace further cause the local computing device to:
sending a source selection interface within the user interface, the source selection interface configured to receive a selection of the at least a portion of the native desktop; and
receiving a selection of the at least a portion of the native desktop.
16. The at least one non-transitory computer-readable medium of claim 14, wherein the at least one portion comprises one of a window within the local desktop, an interface corresponding to an application executing on the local computing device, or the local desktop.
17. The at least one non-transitory computer-readable medium of claim 14, wherein the representation of the collaborative workspace hosted on the server is sent on the local computing device by a local collaboration application executing on the local computing device, and wherein the instructions, when executed by the local computing device, cause the local computing device to generate a stream object configured to output a video stream of the at least a portion of the local desktop of the local computing device, the instructions further causing the local computing device to:
the local collaboration application sending a request for a source identifier to a screen capture program executing on the local computing device via an Application Program Interface (API) between the local collaboration application and the screen capture program;
initiating, by the screen capture program, a stream of the at least a portion of the local desktop of the local computing device, the stream having the corresponding source identifier;
the screen capture program sends the source identification to the local collaboration application;
generating, by the local collaboration application, the stream object based at least in part on the source identifier.
18. The at least one non-transitory computer-readable medium of claim 14, wherein the inserted stream object is configured to receive the video stream of the at least a portion of the local desktop of the local computing device and to send the video stream of the at least a portion of the local desktop of the local computing device to the plurality of computing devices over the web socket connection.
19. The at least one non-transitory computer-readable medium of claim 14, further storing computer-readable instructions that, when executed by the local computing device, cause the local computing device to:
transmitting, on a user interface of the local computing device, an inserted stream object within a representation of the collaborative workspace, the inserted stream object being associated with a network address of a source of the video stream; and
detecting, by a transparent layer executing on the local computing device, a user input associated with the inserted stream object, the user input corresponding to a location within the local desktop, wherein the transparent layer comprises an Application Program Interface (API) configured to interface with one or more of an operating system or one or more applications configured to execute on the operating system;
determining, by the transparent layer executing on the local computing device, that the network address associated with the inserted stream object corresponds to a network address of the local computing device; and
sending, by the transparent layer executing on the local computing device, one or more second commands to one or more of the operating system or the one or more applications configured to execute on the operating system, the one or more second commands configured to perform the user input at the location within the local desktop.
20. The at least one non-transitory computer-readable medium of claim 14, further storing computer-readable instructions that, when executed by a remote computing device of the plurality of computing devices, cause the remote computing device to:
transmitting, on a remote user interface of the remote computing device, the inserted stream object within a remote representation of the collaborative workspace, the inserted stream object associated with a network address of a source of the video stream; and
detecting, by a remote transparent layer executing on the remote computing device, a remote user input associated with the inserted stream object, the remote user input corresponding to a location within the local desktop, wherein the transparent layer comprises the Application Program Interface (API) configured to interface with one or more of an operating system or the one or more applications configured to execute on the operating system;
determining, by the remote transparent layer executing on the remote computing device, that the network address associated with the inserted stream object does not correspond to a network address of the remote computing device; and
sending, by the remote transparent layer executing on the remote computing device, one or more second commands to the local computing device over the web socket connection, the one or more second commands configured to cause a local transparent layer executing on the local computing device to perform the user input at the location within the local desktop.
CN201980036854.6A 2018-06-01 2019-05-30 Method, apparatus and computer readable medium for desktop sharing over web socket connections in networked collaborative workspaces Pending CN112204512A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US15/995,878 2018-06-01
US15/995,878 US11412012B2 (en) 2017-08-24 2018-06-01 Method, apparatus, and computer-readable medium for desktop sharing over a web socket connection in a networked collaboration workspace
PCT/EP2019/064131 WO2019229208A1 (en) 2018-06-01 2019-05-30 Method, apparatus, and computer-readable medium for desktop sharing over a web socket connection in a networked collaboration workspace

Publications (1)

Publication Number Publication Date
CN112204512A (en) 2021-01-08

Family

ID=66821200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980036854.6A Pending CN112204512A (en) 2018-06-01 2019-05-30 Method, apparatus and computer readable medium for desktop sharing over web socket connections in networked collaborative workspaces

Country Status (6)

Country Link
EP (1) EP3803558A1 (en)
JP (1) JP2021525910A (en)
KR (1) KR20210018353A (en)
CN (1) CN112204512A (en)
BR (1) BR112020024441A2 (en)
WO (1) WO2019229208A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230127644A1 (en) * 2021-10-25 2023-04-27 Wei Li Methods, devices, and media for managing a virtual workspace

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120010995A1 (en) * 2008-10-23 2012-01-12 Savnor Technologies Web content capturing, packaging, distribution
US20170300286A1 (en) * 2009-11-24 2017-10-19 Clearslide, Inc. Method and system for browser-based screen sharing
US20120133727A1 (en) * 2010-11-26 2012-05-31 Centre De Recherche Informatique De Montreal Inc. Screen sharing and video conferencing system and method
WO2013003381A1 (en) * 2011-06-30 2013-01-03 Citrix Systems, Inc. Systems and methods for transparent layer 2 redirection to any service
US20130144629A1 (en) * 2011-12-01 2013-06-06 At&T Intellectual Property I, L.P. System and method for continuous multimodal speech and gesture interaction
US20140101571A1 (en) * 2012-10-04 2014-04-10 Lucid Dream Software, Inc. Shared collaborative environment
CN105745599A (en) * 2013-11-13 2016-07-06 微软技术许可有限责任公司 Enhanced collaboration services
US20170024100A1 (en) * 2015-07-24 2017-01-26 Coscreen, Inc. Frictionless Interface for Virtual Collaboration, Communication and Cloud Computing

Also Published As

Publication number Publication date
BR112020024441A2 (en) 2021-03-23
EP3803558A1 (en) 2021-04-14
KR20210018353A (en) 2021-02-17
JP2021525910A (en) 2021-09-27
WO2019229208A1 (en) 2019-12-05

Similar Documents

Publication Publication Date Title
US20220382505A1 (en) Method, apparatus, and computer-readable medium for desktop sharing over a web socket connection in a networked collaboration workspace
US11483376B2 (en) Method, apparatus, and computer-readable medium for transmission of files over a web socket connection in a networked collaboration workspace
US11416205B2 (en) Systems and methods for initiating and interacting with a companion-display mode for an electronic device with a touch-sensitive display
US9965039B2 (en) Device and method for displaying user interface of virtual input device based on motion recognition
JP5442727B2 (en) Display of teaching videos on the user interface display
US20190065012A1 (en) Method, apparatus, and computer-readable medium for propagating enriched note data objects over a web socket connection in a networked collaboration workspace
JP2013229028A (en) User interface virtualization for remote devices
CN111433735A (en) Method, apparatus and computer readable medium for implementing a generic hardware-software interface
JP2021517302A (en) Methods, devices, and computer-readable media for sending files over websocket connections in a networked collaborative workspace.
CN112204512A (en) Method, apparatus and computer readable medium for desktop sharing over web socket connections in networked collaborative workspaces
US11334220B2 (en) Method, apparatus, and computer-readable medium for propagating cropped images over a web socket connection in a networked collaboration workspace
CN112805685A (en) Method, apparatus, and computer-readable medium for propagating rich note data objects over web socket connections in a web collaborative workspace
CN112424738A (en) Method, apparatus and computer readable medium for propagating cropped images over web socket connections in a web collaborative workspace
KR102480568B1 (en) A device and method for displaying a user interface(ui) of virtual input device based on motion rocognition
JP2019125024A (en) Electronic device, information processing method, program, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination