US20160077703A1 - Switching Between Views Using Natural Gestures - Google Patents
- Publication number
- US20160077703A1 (application Ser. No. 14/946,415)
- Authority
- US
- United States
- Prior art keywords
- data
- view
- screen
- video
- mobile device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
- G06F3/0488—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2203/00—Indexing scheme relating to G06F3/00 - G06F3/048
- G06F2203/048—Indexing scheme relating to G06F3/048
- G06F2203/04808—Several contacts: gestures triggering a specific function, e.g. scrolling, zooming, right-click, when the user establishes several contacts with the surface simultaneously; e.g. using several fingers or a combination of fingers and pen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/41—Structure of client; Structure of client peripherals
- H04N21/414—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
- H04N21/41407—Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
Definitions
- the specification relates to a system and method for switching between video views and data views.
- the specification relates to a system and method for switching between video views and data views using natural gestures in mobile videoconferencing.
- the disclosure includes a system and method for switching between video views and data views on a mobile device.
- the system includes a controller, a view presentation module, a screen detection module and a view switching module.
- the controller receives data indicating a participant joins a multi-user communication session.
- the view presentation module presents a video stream of the multi-user communication session on a mobile device associated with the participant.
- the screen detection module determines an occurrence of a detection trigger event.
- the controller receives a video frame image from the video stream responsive to the occurrence of the detection trigger event.
- the screen detection module detects a first data screen in the video frame image.
- the controller receives data describing a first natural gesture performed on the mobile device.
- the view switching module switches a view on the mobile device from video view to data view responsive to the first natural gesture.
- the view presentation module presents a first data stream associated with the first data screen on the mobile device.
- the disclosure also includes a computer-implemented method with the following steps.
- the method receives data indicating that a first participant, a second participant and a third participant joined a multi-user communication session.
- the method presents a video stream of a multi-user communication session to a mobile device associated with the third participant.
- the method receives a video frame image from the video stream that includes a first device associated with the first participant and a second device associated with the second participant.
- the method detects a first data screen from the first device and a second data screen from the second device in the video frame image.
- the method receives data describing a selection of the first data screen performed on the mobile device.
- the method switches a view on the mobile device from video view to a first data view that corresponds to the first data screen responsive to the selection.
- the method presents the first data stream on the mobile device.
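The claimed method above can be illustrated as a short sketch. The class and method names below are hypothetical, chosen only to mirror the sequence of steps (detect data screens in a frame, receive a selection, switch from video view to the corresponding data view); they do not appear in the patent.

```python
# Hypothetical sketch of the claimed view-switching flow; all names
# are illustrative, not taken from the patent.

class ViewSwitcher:
    """Tracks which view (video or data) a participant's device shows."""

    def __init__(self):
        self.view = "video"          # the session starts in video view
        self.detected_screens = []   # data screens found in a video frame

    def detect_screens(self, frame_screens):
        # Stand-in for screen detection: record the data screens
        # (e.g. projector, whiteboard) found in the latest frame image.
        self.detected_screens = list(frame_screens)

    def select_screen(self, screen_id):
        # A selection performed on the mobile device switches the view
        # from video view to the data view for the selected screen.
        if screen_id in self.detected_screens:
            self.view = f"data:{screen_id}"
        return self.view


switcher = ViewSwitcher()
switcher.detect_screens(["projector", "whiteboard"])
print(switcher.select_screen("projector"))   # → data:projector
```

A selection of an undetected screen leaves the current view unchanged, mirroring the claim that switching is responsive to a selection of a detected data screen.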
- FIG. 1A is a high-level block diagram illustrating one embodiment of a system for switching between video views and data views.
- FIG. 1B is a high-level block diagram illustrating another embodiment of a system for switching between video views and data views.
- FIG. 2 is a block diagram illustrating one embodiment of a participation application.
- FIG. 3A is a graphic representation illustrating one embodiment of a process for performing data screen detection.
- FIG. 3B is a graphic representation illustrating one embodiment for switching between video views and data views on a mobile device using natural gestures.
- FIG. 4A is a graphic representation of one embodiment of a graphic user interface illustrating a video view mode on a mobile device.
- FIG. 4B is a graphic representation of one embodiment of a graphic user interface illustrating a data view mode on a mobile device.
- FIG. 4C is a graphic representation of one embodiment of a graphic user interface illustrating an embedded data view mode on a mobile device.
- FIG. 5 is a flow diagram illustrating one embodiment of a method for switching between video views and data views using natural gestures in a multi-user communication session.
- FIGS. 6A-6C are flow diagrams illustrating another embodiment of a method for switching between video views and data views using natural gestures in a multi-user communication session.
- FIG. 7 is a flow diagram illustrating one embodiment of a method for switching between a video view and one of two different data views using a selection in a multi-user communication session.
- the system described in the disclosure is particularly advantageous in numerous respects.
- the system allows a participant to use natural gestures such as pinch gestures to switch between video views and data views, and is capable of providing consistent and seamless user experience in multi-user communication sessions including mobile videoconferencing sessions.
- the system is capable of automatically detecting data screens in video frame images.
- a participant sees a data screen in a video view before switching to a data view showing the data screen in full resolution, which allows the participant to understand the relationship between a data stream of the data screen and the video content in the video view and therefore avoids confusion when more than one data stream is present.
- the system can eliminate a remote viewer's confusion between the projector screen and the whiteboard screen when the remote viewer frequently switches between the video view and the projector screen data view or between the video view and the whiteboard screen data view.
- the system supports embedded data streams and is capable of providing embedded data streams to users.
- the system can present a data stream of a current meeting in a data view mode, where the data stream of the current meeting is a video clip describing a meeting.
- the video clip is embedded with slides and whiteboard stroke information presented in the previous meeting.
- the system can switch from the data view mode to the embedded data view mode to present the embedded slides and whiteboard stroke information to participants of the current meeting in full resolution.
- the system may have other numerous advantages.
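The three view modes described above (video view, data view, embedded data view) behave like a small state machine. A minimal sketch, assuming pinch gestures as the transition triggers; the gesture-to-transition mapping is an assumption for illustration, not a limitation stated by the patent:

```python
# Illustrative state machine for the three view modes; the transition
# table is hypothetical.

TRANSITIONS = {
    ("video", "pinch_open"): "data",          # zoom into a data screen
    ("data", "pinch_close"): "video",         # zoom back out to video
    ("data", "pinch_open"): "embedded_data",  # zoom into embedded content
    ("embedded_data", "pinch_close"): "data",
}

def next_view(current, gesture):
    """Return the view mode after applying a gesture, or stay put."""
    return TRANSITIONS.get((current, gesture), current)

view = "video"
for gesture in ["pinch_open", "pinch_open", "pinch_close"]:
    view = next_view(view, gesture)
print(view)  # → data
```

Unmapped gesture/state pairs leave the view unchanged, so a pinch-close in video view is simply ignored.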
- the invention also relates to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer.
- a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, flash memories including USB keys with non-volatile memory or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- Some embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements.
- a preferred embodiment is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- some embodiments can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- a data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
- I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
- FIG. 1A illustrates a block diagram of a system 100 for switching between video views and data views according to one embodiment.
- the illustrated system 100 includes a hosting device 101 accessible by a host 135 , a registration server 130 , a camera 103 , display devices 107 a . . . 107 n and mobile devices 115 a . . . 115 n accessible by participants 125 a . . . 125 n .
- a letter after a reference number e.g., “ 115 a ,” represents a reference to the element having that particular reference number.
- a reference number in the text without a following letter, e.g., “ 115 ,” represents a general reference to instances of the element bearing that reference number.
- these entities of the system 100 are communicatively coupled via a network 105 .
- the network 105 can be a conventional type, wired or wireless, and may have numerous different configurations including a star configuration, token ring configuration or other configurations. Furthermore, the network 105 may include a local area network (LAN), a wide area network (WAN) (e.g., the Internet), and/or other interconnected data paths across which multiple devices may communicate. In some embodiments, the network 105 may be a peer-to-peer network. The network 105 may also be coupled to or includes portions of a telecommunications network for sending data in a variety of different communication protocols.
- the network 105 includes Bluetooth communication networks or a cellular communications network for sending and receiving data including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, WAP, email, etc.
- although FIG. 1A illustrates one network 105 coupled to the mobile devices 115 , the hosting device 101 and the registration server 130 , in practice one or more networks 105 can be connected to these entities.
- a hosting environment 137 can be an environment to host a multi-user communication session.
- An example multi-user communication session includes a videoconferencing meeting.
- a hosting environment 137 is a room where all the devices within the dashed box in FIG. 1A are visible to users.
- the hosting environment 137 could be a conference room environment including one or more display devices 107 and one or more cameras 103 present in the conference room.
- Example display devices 107 include, but are not limited to, a projector, an electronic whiteboard, a liquid-crystal display and any other conventional display devices.
- the camera 103 is an advanced videoconferencing camera.
- Example cameras 103 include, but are not limited to, a high-definition (HD) video camera that captures high-resolution videos, a pan-tilt-zoom (PTZ) camera that can be mechanically controlled or a group of cameras that provide multi-view or panoramic views in the hosting environment 137 .
- the hosting environment 137 can include one or more display devices 107 and one or more cameras 103 .
- the hosting device 101 , the display devices 107 a . . . 107 n and the camera 103 are located within the hosting environment 137 .
- the hosting device 101 is communicatively coupled to the display device 107 a via signal line 116 , the display device 107 n via signal line 118 and the camera 103 via signal line 114 .
- the display device 107 a is optionally coupled to the registration server 130 via signal line 102 ;
- the display device 107 n is optionally coupled to the registration server 130 via signal line 104 ;
- the camera 103 is optionally coupled to the registration server 130 via signal line 112 .
- the hosting device 101 can be a computing device that includes a processor and a memory, and is coupled to the network 105 via signal line 131 .
- the hosting device 101 is a hardware server.
- the hosting device 101 is a laptop computer or a desktop computer.
- the hosting device 101 is accessed by a host 135 , for example, a user that manages a meeting.
- the hosting device 101 includes a hosting application 109 and a storage device for storing the presentations generated by the hosting application 109 .
- the hosting application 109 includes software for hosting a multi-user communication session.
- the hosting application 109 hosts a video conferencing meeting that the host 135 manages and one or more participants 125 join using one or more mobile devices 115 .
- the hosting application 109 generates slides for giving a presentation.
- the hosting application 109 displays data to be shared with other participants 125 on one or more data screens of one or more display devices 107 in the hosting environment 137 .
- Data to be shared with participants 125 includes, but is not limited to, a text-based document, web page content, presentation slides, video clips, stroke-based handwritten comments and/or other user annotations, etc.
- the one or more data screens in the hosting environment 137 are visible to the camera 103 .
- the presentation slides to be shared with other remote participants 125 are projected on the wall, where the projection of the presentation slides (or, at least a predetermined portion of the projection) is within a field of view of the camera 103 .
- the camera 103 is capable of capturing the projection of the presentation slides in one or more video frame images of a video stream.
- the hosting application 109 can control movement of the camera 103 so that the electronic whiteboard is visible to the camera 103 .
- the camera 103 is capable of capturing the comments shown in the electronic whiteboard in one or more video frame images of a video stream.
- the camera 103 captures a video stream including video frame images depicting the hosting environment 137 , where the video frame images contain data screens of the display devices 107 and/or the data screen of the hosting device 101 .
- the camera 103 sends the video stream to the hosting device 101 , causing the hosting device 101 to forward the video stream to one or more of the registration server 130 and the mobile device 115 .
- the camera 103 sends the video stream directly to the registration server 130 and/or the mobile device 115 via the network 105 .
- the camera 103 sends a latest video frame image captured by the camera 103 to the registration server 130 responsive to an occurrence of a detection trigger event. The detection trigger event is described below in more detail with reference to FIG. 2 .
- the hosting application 109 or the display device 107 captures a high quality version of a data stream displayed on a data screen of the display device 107 .
- This high quality version of the data stream displayed on the data screen is referred to as a data stream associated with the data screen, which includes a series of data screen images (e.g., screenshot images) depicting content displayed on the data screen of the display device 107 over time.
- a screenshot image of a data screen depicts content displayed on the data screen at a particular moment of time. At different moments of time, different screenshot images of the data screen are captured, which form a data stream associated with the data screen.
- a screenshot image of the data screen may be also referred to as a data frame of the data stream.
- the hosting application 109 captures a series of screenshot images describing a slide presentation in high resolution directly from a presentation computing device.
- an electronic whiteboard captures original stroke information displayed on the whiteboard screen, and sends screenshot images depicting the original stroke information to the hosting application 109 .
- the hosting application 109 sends the data stream associated with the data screen to one or more of the mobile device 115 and the registration server 130 .
- the display device 107 directly sends the data stream including one or more data screen images to one or more of the mobile device 115 and the registration server 130 .
- the display device 107 periodically sends an up-to-date data screen image to the registration server 130 .
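The data-stream structure described above (a time-ordered series of data frames, each a screenshot image of one data screen) can be sketched directly. The class names below follow the patent's terminology but the implementation is an illustrative assumption:

```python
# Sketch of a data stream as a time-ordered series of data frames
# (screenshot images of one data screen); structure is illustrative.

from dataclasses import dataclass, field

@dataclass
class DataFrame:
    timestamp: float   # capture time of the screenshot
    image: bytes       # encoded screenshot of the data screen

@dataclass
class DataStream:
    screen_id: str
    frames: list = field(default_factory=list)

    def add_frame(self, frame):
        self.frames.append(frame)

    def latest(self):
        # The most recent screenshot is what a periodic up-to-date
        # data screen image update would send to the server.
        if not self.frames:
            return None
        return max(self.frames, key=lambda f: f.timestamp)

stream = DataStream("whiteboard")
stream.add_frame(DataFrame(1.0, b"stroke-v1"))
stream.add_frame(DataFrame(2.0, b"stroke-v2"))
print(stream.latest().image.decode())  # → stroke-v2
```

At different moments in time, new frames are appended, and a consumer (e.g. the registration server) only ever needs the latest one for matching.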
- the participation application 123 a can be operable on the registration server 130 .
- the registration server 130 includes a processor and a memory, and is coupled to the network 105 via signal line 106 .
- the registration server 130 includes a database for storing registered images.
- the registration server 130 registers the display device 107 and receives a video feed for a meeting from the camera 103 .
- the video feed includes one or more video frame images.
- the registration server 130 runs image matching algorithms to find a correspondence between a latest video frame and a latest screenshot image of a data screen associated with the display device 107 or the hosting device 101 . If a match is found, the matching area is highlighted in the video frame image and displayed on the mobile device 115 .
- the registration server 130 is described below in more detail with reference to FIGS. 2 and 3A .
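The matching step above can be illustrated with a toy brute-force search: slide the latest data-screen screenshot over the video frame and score each placement. This is only a sketch; a production system would more likely use feature-based matching (e.g. keypoints plus a homography estimate) rather than exhaustive comparison, and the patent does not specify the algorithm.

```python
# Toy sketch of matching a data-screen screenshot against a video
# frame image; real systems would use feature matching instead.

def match_screen(frame, screenshot):
    """Return (row, col, score) of the best placement of screenshot
    within frame; both are 2-D lists of grayscale values.
    Lower score (sum of absolute differences) is a better match."""
    fh, fw = len(frame), len(frame[0])
    sh, sw = len(screenshot), len(screenshot[0])
    best = (0, 0, float("inf"))
    for r in range(fh - sh + 1):
        for c in range(fw - sw + 1):
            score = sum(
                abs(frame[r + i][c + j] - screenshot[i][j])
                for i in range(sh) for j in range(sw)
            )
            if score < best[2]:
                best = (r, c, score)
    return best

frame = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
    [0, 0, 0, 0],
]
screenshot = [[9, 8], [7, 9]]
print(match_screen(frame, screenshot))  # → (1, 1, 0): exact match
```

A match location like this is what would be highlighted in the video frame image displayed on the mobile device.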
- the participation application 123 b may be stored on a mobile device 115 a , which is connected to the network 105 via signal line 108 .
- the mobile device 115 a , 115 n is a computing device with limited display space that includes a memory and a processor, for example a laptop computer, a tablet computer, a mobile telephone, a smartphone, a personal digital assistant (PDA), a mobile email device or other electronic device capable of accessing a network 105 .
- the mobile device 115 includes a touch screen for displaying data and receiving natural gestures from a participant 125 . Examples of natural gestures include, but are not limited to, tap, double tap, long press, scroll, pan, flick, two finger tap, pinch open, pinch close, etc.
- the participant 125 a interacts with the mobile device 115 a .
- the mobile device 115 n is communicatively coupled to the network 105 via signal line 110 .
- the participant 125 n interacts with the mobile device 115 n .
- the participant 125 can be a remote user participating in a multi-user communication session such as a videoconferencing session hosted by the hosting device 101 .
- the mobile devices 115 a , 115 n in FIG. 1A are used by way of example. While FIG. 1A illustrates two mobile devices 115 a and 115 n , the disclosure applies to a system architecture having one or more mobile devices 115 .
- the participation application 123 is distributed such that it may be stored in part on the mobile device 115 a , 115 n and in part on the registration server 130 .
- the participation application 123 b on the mobile device 115 acts as a thin-client application that displays the video stream or the data stream while the registration server 130 performs the screen detection steps.
- the participation application 123 b on the mobile device 115 a instructs the display to present the video stream or the data stream, for example, by rendering images in a browser.
- the participation application 123 b receives user input (e.g., natural gestures) from the participant 125 and interprets the user input. For example, assume the participation application 123 b currently displays the video stream.
- the participation application 123 b receives user input from the participant 125 a to magnify the screen until the magnification exceeds a threshold, and the participation application 123 b determines that the view should be switched to the data stream of the hosting device 101 .
- the participation application 123 b sends instructions indicating to switch from the video stream to the data stream to the participation application 123 a on the registration server 130 .
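The threshold logic described above can be sketched as a small handler. The threshold value and function names are assumptions for illustration; the patent only states that exceeding a magnification threshold triggers the switch:

```python
# Illustrative pinch-gesture handler: when a participant magnifies a
# detected screen region past a threshold, request that screen's data
# view. The threshold value is a hypothetical choice.

ZOOM_SWITCH_THRESHOLD = 2.0   # assumed magnification factor

def handle_pinch(zoom_level, current_view, target_screen):
    """Decide whether a pinch-open zoom should trigger a view switch."""
    if current_view == "video" and zoom_level >= ZOOM_SWITCH_THRESHOLD:
        # Past the threshold: instruct the server to send the
        # full-resolution data stream for the magnified screen.
        return ("switch_to_data", target_screen)
    return ("stay", current_view)

print(handle_pinch(2.5, "video", "hosting_device"))
# → ('switch_to_data', 'hosting_device')
```

Below the threshold the gesture behaves as an ordinary zoom within the video view, which is why a single pinch gesture can serve both purposes.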
- the participation application 123 can be code and routines for participating in a multi-user communication session.
- the participation application 123 can be implemented using hardware including a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
- the participation application 123 can be implemented using a combination of hardware and software.
- the participation application 123 may be stored in a combination of the devices and servers, or in one of the devices or servers.
- FIG. 1B is another embodiment of a system for switching between video views and data views.
- mobile devices 115 can comprise the camera 103 and the participation application 123 b .
- the mobile devices 115 are coupled to the display devices 107 a , 107 n via signal lines 136 , 138 , respectively.
- the participant 125 can activate the camera 103 on the mobile device 115 and point it at the display devices 107 a , 107 n to capture their content.
- the mobile device 115 can transmit the images directly to the registration server 130 via signal line 154 .
- the images can serve as a query from the mobile device 115 .
- the participation application 123 uses the captured images to detect the screen from the video view and switches to the data view in response to receiving gestures from the participant 125 .
- FIG. 2 is a block diagram of a computing device 200 that includes a participation application 123 , a processor 235 , a memory 237 , an input/output device 241 , a communication unit 239 and a storage device 243 according to some examples.
- the components of the computing device 200 are communicatively coupled by a bus 220 .
- the input/output device 241 is communicatively coupled to the bus 220 via signal line 242 .
- the computing device 200 can be one of a mobile device 115 and a registration server 130 .
- the registration server 130 can include a participation application 123 with some of the components described below and the mobile device 115 can include some of the other components described below.
- the processor 235 includes an arithmetic logic unit, a microprocessor, a general purpose controller or some other processor array to perform computations and provide electronic display signals to a display device.
- the processor 235 is coupled to the bus 220 for communication with the other components via signal line 222 .
- Processor 235 processes data signals and may include various computing architectures including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets.
- although FIG. 2 includes a single processor 235 , multiple processors 235 may be included. Other processors, operating systems, sensors, displays and physical configurations are possible.
- the memory 237 stores instructions and/or data that can be executed by the processor 235 .
- the memory 237 is coupled to the bus 220 for communication with the other components via signal line 224 .
- the instructions and/or data may include code for performing the techniques described herein.
- the memory 237 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory or some other memory device.
- the memory 237 also includes a non-volatile memory or similar permanent storage device and media including a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device for storing information on a more permanent basis.
- the communication unit 239 transmits and receives data to and from at least one of the hosting device 101 , the mobile device 115 and the registration server 130 depending upon where the participation application 123 is stored.
- the communication unit 239 is coupled to the bus 220 via signal line 226 .
- the communication unit 239 includes a port for direct physical connection to the network 105 or to another communication channel.
- the communication unit 239 includes a USB, SD, CAT-5 or similar port for wired communication with the mobile device 115 .
- the communication unit 239 includes a wireless transceiver for exchanging data with the mobile device 115 or other communication channels using one or more wireless communication methods, including IEEE 802.11, IEEE 802.16, BLUETOOTH® or another suitable wireless communication method.
- the communication unit 239 includes a cellular communications transceiver for sending and receiving data over a cellular communications network including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, WAP, e-mail or another suitable type of electronic communication.
- the communication unit 239 includes a wired port and a wireless transceiver.
- the communication unit 239 also provides other conventional connections to the network 105 for distribution of files and/or media objects using standard network protocols including TCP/IP, HTTP, HTTPS and SMTP, etc.
- the storage device 243 can be a non-transitory memory that stores data for providing the functionality described herein.
- the storage device 243 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory or some other memory device.
- the storage device 243 also includes a non-volatile memory or similar permanent storage device and media including a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device for storing information on a more permanent basis.
- the storage device 243 is communicatively coupled to the bus 220 via signal line 228 .
- the storage device 243 stores one or more of a video stream including one or more video frame images, a data stream including one or more data screen images and one or more detection trigger events, etc.
- the storage device 243 may store other data for providing the functionality described herein.
- the storage device 243 could store copies of video conferencing materials, such as presentations, documents, audio clips, video clips, etc.
- the participation application 123 includes a controller 202 , a view presentation module 204 , a screen detection module 206 , a view switching module 208 , a user interface module 210 and an optional camera adjustment module 212 .
- the components of the participation application 123 are communicatively coupled via the bus 220 .
- the components can be stored in part on the mobile device 115 and in part on the registration server 130 .
- the participation application 123 stored on the registration server 130 could include the screen detection module 206 and the participation application 123 stored on the mobile device could include the remaining components.
- the controller 202 can be software including routines for handling communications between the participation application 123 and other components of the computing device 200 .
- the controller 202 can be a set of instructions executable by the processor 235 to provide the functionality described below for handling communications between the participation application 123 and other components of the computing device 200 .
- the controller 202 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235 . In either embodiment, the controller 202 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 230 .
- the controller 202 sends and receives data, via the communication unit 239 , to and from one or more of the mobile device 115 , the hosting device 101 and the registration server 130 .
- the controller 202 receives, via the communication unit 239 , user input from a participant 125 operating on a mobile device 115 and sends the user input to the view switching module 208 .
- the controller 202 receives graphical data for providing a user interface to a participant 125 from the user interface module 210 and sends the graphical data to a mobile device 115 , causing the mobile device 115 to present the user interface to the participant 125 .
- the controller 202 receives data from other components of the participation application 123 and stores the data in the storage device 243 .
- the controller 202 receives data describing one or more detection trigger events from the screen detection module 206 and stores the data in the storage device 243 .
- the controller 202 retrieves data from the storage device 243 and sends the data to other components of the participation application 123 .
- the controller 202 retrieves a data stream from the storage device 243 and sends the data stream to the view presentation module 204 for presenting the data stream to a participant 125 .
- the view presentation module 204 can be software including routines for presenting a video view or a data view on a mobile device 115 .
- the view presentation module 204 can be a set of instructions executable by the processor 235 to provide the functionality described below for presenting a data view or a video view on a mobile device 115 .
- the view presentation module 204 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235 .
- the view presentation module 204 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 232 .
- a video view mode presents video data associated with a multi-user communication session to a participant 125 .
- the video view mode presents the video stream of the other participants in the multi-user communication session to the participant 125 in full screen on the mobile device 115 .
- the video view mode presents the video stream on the mobile device 115 in full resolution.
- the view presentation module 204 receives data indicating that a participant 125 joins a multi-user communication session from a mobile device 115 associated with the participant 125 .
- the mobile device 115 is in the video view mode.
- the view presentation module 204 receives a video stream including one or more video frame images from the camera 103 directly or via the hosting device 101 , and presents the video stream to the participant 125 on a display of the mobile device 115 .
- one or more data screens that are in the same hosting environment 137 as the camera 103 are captured in the one or more video frame images of the video stream, and the one or more video frame images include sub-images depicting the one or more data screens.
- the one or more video frame images capture at least a portion of a data screen of a hosting device 101 , a portion of a screen projection on the wall and/or a portion of a data screen of an electronic whiteboard.
- the one or more video frame images capture the full data screen of the hosting device 101 , the full projection screen on the wall and/or the full data screen of the electronic whiteboard.
- a data view mode presents a data stream associated with the multi-user communication session to the participant 125 .
- the data view mode presents a data stream with the slides being presented during the multi-user communication session to the participant 125 in full screen on the mobile device 115 .
- the data view mode presents the data stream on the mobile device 115 in full resolution.
- the view presentation module 204 receives, from the view switching module 208 , an identifier (ID) of a detected data screen and a view switching signal indicating that a view on the mobile device 115 should be switched from the video view to the data view.
- the view presentation module 204 receives a data stream associated with the detected data screen directly from the display device 107 associated with the data screen.
- the view presentation module 204 receives the data stream via the hosting device 101 . Responsive to receiving the view switching signal, the view presentation module 204 stops presenting the video stream on the mobile device 115 and starts to present the data stream associated with the data screen on the mobile device 115 .
- an embedded data stream is included in the data stream.
- the view presentation module 204 receives, from the view switching module 208 , a view switching signal instructing the view presentation module 204 to switch the view on the mobile device 115 from the data view to an embedded data view.
- the embedded data view mode presents the embedded data stream to the participant 125 in full resolution or in full screen on the mobile device 115 .
- the view presentation module 204 stops presenting the data stream and starts to present the embedded data stream on the mobile device 115 .
- An embedded data stream can be a videoconferencing meeting, a presentation, a video clip, a text document, presentation slides, or other types of data embedded in the data stream.
- If the view presentation module 204 receives, from the view switching module 208, a view switching signal instructing it to switch the view from the embedded data view back to the data view, the view presentation module 204 stops presenting the embedded data stream and starts to present the data stream on the mobile device 115 again. In one embodiment, the view presentation module 204 receives, from the view switching module 208, a view switching signal instructing it to switch the view on the mobile device 115 from the data view to the video view. Responsive to the view switching signal, the view presentation module 204 stops presenting the data stream and starts to present the video stream on the mobile device 115 .
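The view-mode transitions described above (video view, data view, and embedded data view) can be summarized as a small state machine. The sketch below is illustrative only; the class and constant names are assumptions, since the patent describes modules rather than code.

```python
# Minimal sketch of the view presentation state machine described above.
# Names are hypothetical; the patent does not prescribe an implementation.

VIDEO, DATA, EMBEDDED = "video", "data", "embedded"

# Legal transitions: video <-> data, data <-> embedded.
_ALLOWED = {
    (VIDEO, DATA), (DATA, VIDEO),
    (DATA, EMBEDDED), (EMBEDDED, DATA),
}

class ViewPresenter:
    def __init__(self):
        self.mode = VIDEO  # a participant initially joins in the video view mode

    def switch(self, target):
        """Apply a view switching signal; ignore illegal transitions."""
        if (self.mode, target) in _ALLOWED:
            self.mode = target
        return self.mode
```

For example, a pinch-open gesture on a detected data screen would drive `switch(DATA)`, and a further gesture on an embedded data screen would drive `switch(EMBEDDED)`; the embedded data view can only return to the data view, matching the transitions the patent describes.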
- the screen detection module 206 can be software including routines for performing data screen detection in a video frame image.
- the screen detection module 206 can be a set of instructions executable by the processor 235 to provide the functionality described below for performing data screen detection in a video frame image.
- the screen detection module 206 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235 . In either embodiment, the screen detection module 206 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 234 .
- the screen detection module 206 registers one or more display devices 107 with the registration server 130 .
- the screen detection module 206 can record registration information for each display device 107 (e.g., a device identifier, a user associated with the display device 107 , etc.) and store the registration information in the storage 243 .
- Each display device 107 sends an updated image of its data screen to the registration server 130 periodically.
- each display device 107 sends its up-to-date screenshot image to the registration server 130 periodically.
- the display device 107 sends the updated screenshot images of its data screen to the registration server 130 via the hosting device 101 .
- the screen detection module 206 detects an occurrence of a trigger event.
- the event could be a detection trigger event that triggers a detection of one or more data screens in a video frame image.
- a detection trigger event causes the screen detection module 206 to detect whether the video frame image includes a data screen.
- Example detection trigger events include, but are not limited to, motion of the camera 103 (e.g., panning, zooming or tilting of the camera 103 , movement of the camera 103 , etc.) and/or motion of an object in the video frame image (e.g., appearance of a projection on the wall in the video frame image, movement of a whiteboard, etc.).
- the trigger event could be based on a timer.
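The two kinds of detection trigger events described above, motion-based and timer-based, might be approximated as follows. This is a minimal sketch; the mean-absolute-difference metric and the threshold values are assumptions, not part of the patent.

```python
import numpy as np

def motion_trigger(prev_frame, curr_frame, threshold=10.0):
    """Fire a detection trigger event when frame-to-frame motion
    (mean absolute pixel difference) exceeds an assumed threshold."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(diff.mean()) > threshold

def timer_trigger(last_detection_time, now, interval=5.0):
    """Fire a detection trigger event every `interval` seconds."""
    return (now - last_detection_time) >= interval
```

Either trigger firing would cause the screen detection module 206 to fetch the latest video frame image and re-run data screen detection.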
- Responsive to the occurrence of the detection trigger event, the screen detection module 206 receives a latest video frame image of the video stream from the camera 103 directly or via the hosting device 101 . In some examples, the screen detection module 206 receives the latest video frame image of the video stream from the mobile device 115 or a video server that provides the video stream. The screen detection module 206 performs data screen detection in the latest video frame image responsive to the occurrence of the detection trigger event. For example, the screen detection module 206 determines whether a data screen appears in the latest video frame image by matching a latest screenshot image of the data screen with the latest video frame image.
- the screen detection module 206 determines whether a sub-image that matches the latest screenshot image of the data screen appears in the latest video frame image. For example, the screen detection module 206 determines whether the latest video frame image includes a sub-image that depicts the data screen (e.g., the screen detection module 206 determines whether the data screen is captured by the latest video frame image). In a further example, the screen detection module 206 runs an image matching algorithm to find the correspondence between the latest video frame image and the latest screenshot image of the data screen. If the screen detection module 206 finds a match between the latest video frame image and the latest screenshot image of the data screen, the screen detection module 206 highlights the matching area in the video frame image on the mobile device 115 . For example, the screen detection module 206 highlights the detected data screen in the video frame image on the mobile device 115 .
- the screen detection module 206 runs the image matching algorithm in real time.
- An example image matching algorithm includes a scale-invariant feature transform (SIFT) algorithm, optionally combined with k-nearest neighbors (KNN) feature matching and random sample consensus (RANSAC) for robust correspondence estimation.
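As a self-contained illustration of finding the correspondence between a screenshot image and a video frame image, the sketch below substitutes simple normalized cross-correlation template matching for the SIFT/KNN/RANSAC pipeline named above. It only handles an unscaled, axis-aligned data screen; the function name and the correlation threshold are assumptions.

```python
import numpy as np

def find_screen(frame, screenshot, min_corr=0.9):
    """Locate `screenshot` inside `frame` by exhaustive normalized
    cross-correlation. Returns (row, col) of the best match, or None.

    NOTE: a simplified stand-in for the feature-based matching the
    patent names; it cannot handle scaling or perspective distortion.
    """
    fh, fw = frame.shape
    th, tw = screenshot.shape
    t = screenshot - screenshot.mean()
    t_norm = np.sqrt((t * t).sum())
    best, best_pos = min_corr, None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            w = frame[r:r + th, c:c + tw]
            w = w - w.mean()
            denom = np.sqrt((w * w).sum()) * t_norm
            if denom == 0:
                continue  # flat window: no correlation defined
            corr = (w * t).sum() / denom
            if corr > best:
                best, best_pos = corr, (r, c)
    return best_pos
```

A returned position would correspond to the matching area that the screen detection module 206 highlights in the video frame image on the mobile device 115.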
- If the screen detection module 206 detects one or more data screens in the video frame image, the screen detection module 206 generates a matching result including one or more matches between the one or more data screens and the video frame image.
- the screen detection module 206 notifies the mobile device 115 of the one or more matches, and establishes a direct connection between the mobile device 115 and each display device 107 that has one matched data screen.
- the screen detection module 206 highlights one or more matching areas in the video frame image, where each matching area corresponds to a position of one data screen captured in the video frame image.
- the screen detection module 206 displays the highlighted matching areas on the mobile device 115 .
- the camera 103 is statically deployed and captures one or more data screens in the hosting environment 137 , and positions of the one or more data screens remain unchanged in the video frame images.
- the screen detection module 206 can determine existence of the one or more data screens based on the static setting in the hosting environment 137 , and can pre-calibrate positions of the one or more data screens in the video frame images.
- the screen detection module 206 highlights the one or more data screens in the video frame images at the pre-calibrated positions in the video frame images.
- the screen detection module 206 sends one or more screen IDs identifying the one or more detected data screens and data describing one or more matching areas in the video frame image to the view switching module 208 .
- the screen detection module 206 sends pre-calibrated positions of one or more data screens to the view switching module 208 .
- the screen detection module 206 stores the one or more screen IDs, data describing the one or more matching areas and/or the pre-calibrated positions in the storage 243 .
- the view switching module 208 can be software including routines for switching a view on a mobile device 115 between a video view and a data view.
- the view switching module 208 can be a set of instructions executable by the processor 235 to provide the functionality described below for switching a view on a mobile device 115 between a video view and a data view.
- the view switching module 208 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235 . In either embodiment, the view switching module 208 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 236 .
- the view switching module 208 receives, from the screen detection module 206 , data describing one or more screen IDs identifying one or more detected data screens and one or more matching areas associated with the one or more detected data screens in the video frame image.
- the mobile device 115 presents the video stream to the participant 125 , with the one or more detected data screens highlighted in the matching areas of the video frame images. If the participant 125 performs a natural gesture (e.g., a pinch open or double tap gesture, etc.) within a highlighted matching area of a data screen on a touch screen of the mobile device 115 , the view switching module 208 interprets the participant's natural gesture as a command to switch from the video view to the data view.
- the view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204 , causing the view presentation module 204 to present the data view to the participant 125 .
- the view switching module 208 interprets the natural gesture as a command to switch from the video view to the data view if the portion of the data screen detected in the video frame image is greater than a predetermined threshold (e.g., a majority portion of the data screen appearing in the video frame image).
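Hit-testing a gesture against the highlighted matching areas, and checking what portion of a data screen is visible in the frame, might look like the following sketch. The rectangle representation and the function names are assumptions.

```python
# Rectangles are (x, y, width, height) in video-frame coordinates;
# these representations are illustrative, not from the patent.

def hit_screen(gesture_xy, matching_areas):
    """Return the screen ID whose highlighted matching area contains
    the gesture point, or None."""
    gx, gy = gesture_xy
    for screen_id, (x, y, w, h) in matching_areas.items():
        if x <= gx <= x + w and y <= gy <= y + h:
            return screen_id
    return None

def visible_fraction(area, frame_w, frame_h):
    """Fraction of a data screen's rectangle that lies inside the
    video frame; a switch is honored only above a threshold."""
    x, y, w, h = area
    vis_w = max(0, min(x + w, frame_w) - max(x, 0))
    vis_h = max(0, min(y + h, frame_h) - max(y, 0))
    return (vis_w * vis_h) / float(w * h)
```

A pinch-open gesture whose coordinates fall inside a matching area, on a screen whose visible fraction exceeds the predetermined threshold, would be interpreted as the video-view-to-data-view command.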
- the participant 125 can use a natural gesture to zoom into a data screen detected in the video frame image, so that the video view presenting the video frame image scales up accordingly on the touch screen of the mobile device 115 . If the size of the scaled-up data screen in the video frame image reaches a predetermined threshold, the view switching module 208 automatically switches the view on the mobile device 115 from the video view to the data view, causing the view presentation module 204 to present the data stream associated with the detected data screen on the mobile device 115 . The mobile device 115 switches from the video view mode to the data view mode accordingly.
- the participant 125 can further perform natural gestures to operate on the data stream such as zooming into the data stream, copying the data stream, dragging the data stream, etc.
- While the mobile device 115 presents the data stream to the participant 125 , the participant 125 may perform a natural gesture (e.g., a pinch close gesture or tapping on an exit icon, etc.) on the touch screen. The view switching module 208 interprets the participant's natural gesture as a command to switch from the data view back to the video view.
- the view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204 , causing the view presentation module 204 to present the video view to the participant 125 .
- the screen detection module 206 detects the one or more data screens visible to the camera 103 in the video frame images, and highlights the one or more data screens in the video frame images. For example, the participant 125 can use a natural gesture to zoom out the data stream, so that the data view presenting the data stream scales down accordingly on the touch screen of the mobile device 115 . If the size of the scaled down data stream reaches a predetermined threshold, the view switching module 208 automatically switches the view on the mobile device 115 from the data view to the video view, causing the view presentation module 204 to present the video stream on the mobile device 115 .
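The threshold-based automatic switching described above, where scaling up past one threshold enters the data view and scaling down past another returns to the video view, can be sketched as follows. The threshold values are illustrative assumptions; the patent only states that predetermined thresholds exist.

```python
# Illustrative thresholds; the patent does not specify values.
UP_THRESHOLD = 0.6    # data screen occupies >= 60% of the touch screen
DOWN_THRESHOLD = 0.4  # data view shrunk to <= 40% of the touch screen

def next_view(current_view, screen_fraction):
    """Return the view mode after a zoom gesture, given the fraction
    of the touch screen the (scaled) data screen now occupies."""
    if current_view == "video" and screen_fraction >= UP_THRESHOLD:
        return "data"
    if current_view == "data" and screen_fraction <= DOWN_THRESHOLD:
        return "video"
    return current_view
```

The view switching module 208 would evaluate this rule continuously as the participant's pinch gesture scales the view, so the switch feels like a natural continuation of the zoom.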
- the participant 125 can use a natural gesture on the embedded data stream.
- the view switching module 208 interprets the natural gesture as a command to switch from the data view to the embedded data view.
- the view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204 , causing the view presentation module 204 to present the embedded data stream to the participant 125 in full resolution.
- the participant 125 may perform another natural gesture to exit from the embedded data view and return to the data view.
- the participant 125 can issue a tap open command on an icon representing the embedded video during the data view mode, causing the view presentation module 204 to present the embedded video in full screen on the mobile device 115 . After viewing the embedded video, the participant 125 can issue a pinch close command to exit from the embedded data view and return to the data view.
- the user interface module 210 can be software including routines for generating graphical data for providing a user interface.
- the user interface module 210 can be a set of instructions executable by the processor 235 to provide the functionality described below for generating graphical data for providing a user interface.
- the user interface module 210 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235 . In either embodiment, the user interface module 210 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 238 .
- the user interface module 210 receives instructions from the view presentation module 204 to generate graphical data for providing a user interface to a user such as a host 135 or a participant 125 .
- the user interface module 210 sends the graphical data to the hosting device 101 or the mobile device 115 , causing the hosting device 101 or the mobile device 115 to present the user interface to the user.
- the user interface module 210 generates graphical data for providing a user interface that depicts a video stream or a data stream.
- the user interface module 210 sends the graphical data to the mobile device 115 , causing the mobile device 115 to present the video stream or the data stream to the participant 125 via the user interface.
- the user interface module 210 may generate graphical data for providing other user interfaces to users.
- the optional camera adjustment module 212 can be software including routines for adjusting a camera 103 .
- the camera adjustment module 212 can be a set of instructions executable by the processor 235 to provide the functionality described below for adjusting a camera 103 .
- the camera adjustment module 212 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235 . In either embodiment, the camera adjustment module 212 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 240 .
- a participant 125 can use natural gestures to navigate the camera 103 .
- the participant 125 can perform a natural gesture to change the view angle of the camera 103 via a user interface shown on the mobile device 115 .
- the camera adjustment module 212 receives data describing the participant's natural gesture and interprets the participant's natural gesture as a command to adjust the camera 103 such as panning, tilting, zooming in or zooming out the camera 103 .
- the camera adjustment module 212 adjusts the camera 103 according to the participant's natural gesture.
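Mapping natural gestures to camera adjustments such as panning, tilting, and zooming might be sketched as follows; the gesture names and the command vocabulary are assumptions, since the patent does not define a control protocol for the camera 103.

```python
# Hypothetical gesture-to-PTZ mapping for the camera adjustment
# module; command names are illustrative only.

def gesture_to_ptz(gesture, dx=0, dy=0, scale=1.0):
    """Translate a natural gesture into a (command, amount) pair."""
    if gesture == "drag":
        # horizontal drags pan the camera, vertical drags tilt it
        if abs(dx) >= abs(dy):
            return ("pan", dx)
        return ("tilt", dy)
    if gesture == "pinch":
        return ("zoom_in", scale) if scale > 1.0 else ("zoom_out", scale)
    return ("none", 0)
```

The resulting command would be forwarded to the camera 103, for example via the hosting device 101, so that the participant can keep the data screens within the camera's field of view.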
- the participant 125 may keep one or more data screens of one or more display devices 107 within the field of view of the camera 103 , so that the camera 103 captures the one or more data screens in the video frame images.
- An example use of the system described herein includes a videoconferencing scenario, where a first party (e.g., a host 135 ) is in a conference room equipped with a camera 103 and one or more data screens, and a second party (e.g., a participant 125 ) is a remote mobile user participating in the videoconference using a mobile device 115 such as a smart phone or a tablet.
- the participation application 123 receives a video stream from the camera 103 and presents the video stream to the participant 125 on a touch screen of the mobile device 115 .
- the participation application 123 detects one or more data screens captured by video frame images.
- the participant 125 can issue a natural gesture such as a pinch open gesture on a detected data screen highlighted in the video frame images, causing the mobile device 115 to switch from video view to data view. Afterwards, the participation application 123 presents a data stream associated with the detected data screen to the participant 125 in full resolution. The participant 125 may issue another natural gesture such as a pinch close gesture to switch from the data view back to the video view.
- Another example use of the system described herein includes a retrieval application for retrieving information relevant to an image.
- a user can capture an image of an advertisement (e.g., an advertisement for a vehicle brand), and instruct the retrieval application to retrieve information relevant to the advertisement.
- the image of the advertisement may include a banner and/or a data screen image showing a commercial video.
- the retrieval application can instruct the screen detection module 206 to detect the data screen in the image of the advertisement and to identify a product that matches content shown in the data screen image.
- the retrieval application may retrieve information relevant to the identified product from one or more databases and provide the relevant information to a user.
- Other example uses of the system described herein are possible.
- FIG. 3A is a graphic representation 300 illustrating one embodiment of a process for performing data screen detection.
- the camera 103 establishes a video stream connection 302 with the mobile device 115 .
- the camera 103 sends a video stream to the mobile device 115 via the video stream connection 302 , causing the mobile device 115 to present the video stream to the participant 125 in a video view mode.
- the display device 107 registers with the registration server 130 and sends updated screenshot images 304 of a data screen associated with the display device 107 to the registration server 130 periodically.
- the display device 107 is an electronic whiteboard.
- the registration server 130 detects a detection trigger event.
- the registration server 130 detects motion of the camera 103 such as panning or tilting.
- the registration server 130 receives a latest video frame image 306 from the camera 103 responsive to the detection trigger event.
- the registration server 130 receives the latest video frame image 306 from the mobile device 115 .
- the registration server 130 uses an image-matching method to detect active data screens dynamically. For example, the registration server 130 uses an image matching algorithm to find the correspondence between the latest video frame image 306 and the latest screenshot image received from either the hosting device 101 or the display device 107 . If a matching result 308 between the latest video frame image 306 and the latest screenshot image of the data screen is found, the registration server 130 notifies the mobile device 115 of the matching result 308 and highlights the corresponding data screen in the video frame images. For example, the registration server 130 uses a box 310 to highlight a data screen of an electronic whiteboard in the video frame image. The display device 107 associated with the data screen establishes a data stream connection 312 with the mobile device 115 . The display device 107 may send a data stream to the mobile device 115 via the data stream connection 312 .
- FIG. 3B is a graphic representation 319 illustrating one embodiment for switching between video views and data views on a mobile device 115 using natural gestures.
- the participation application 123 interprets natural gestures from a participant 125 to achieve the seamless user experience.
- the participation application 123 captures and transmits a first data stream and a second data stream along with the video stream captured from the camera 103 to the mobile device 115 .
- the first data stream includes the high quality screenshot images from the hosting device 101 (e.g., a laptop), and the second data stream includes the strokes from the display device 107 (e.g., an electronic whiteboard). Both data screens (the data screen of the laptop and the data screen of the electronic whiteboard) are visible to the camera 103 .
- the screenshot images from the hosting device 101 include an embedded image depicting the data screen of the display device 107 .
- the video view is shown on the mobile device 115 to present the video frame image 320 to the participant 125 .
- the video frame image 320 is shown in full resolution or on full screen of the mobile device 115 .
- both data screens (the data screen 324 of the laptop and the data screen 322 of the electronic whiteboard) are visible in the video frame image 320 displayed on the participant's mobile device 115 .
- the participation application 123 intelligently detects and notifies the participant 125 of the existence of data screens in the video frame images. For example, the participation application 123 highlights the data screens 322 and 324 in the video frame image 320 .
- In phase ( 1 ), if the participant 125 tries to get more detail from the laptop data screen 324 , he or she can perform a natural gesture 330 on the laptop data screen 324 shown in the video frame image 320 to zoom into the laptop data screen 324 .
- An example natural gesture 330 can be a pinch or double tap gesture. Responsive to the natural gesture 330 , the video view on the mobile device 115 scales up. If the size of the recognized laptop data screen 324 reaches a pre-set threshold, the mobile device 115 automatically switches from the video view to the data view. For example, the view on the mobile device 115 switches from presenting the video frame image 320 in full resolution to presenting a high quality screenshot image 326 of the laptop data screen 324 in full resolution. The participation application 123 interprets any further pinch or dragging gestures performed on the screenshot image 326 as operating on the screenshot image 326 of the laptop data screen 324 .
- In phase ( 2 ), when the participant 125 performs a natural gesture 332 such as a pinch gesture on the screenshot image 326 to zoom out of the data view and the zoom-out scale ratio reaches a pre-set threshold, the mobile device 115 switches back to the video view from the data view.
- the participation application 123 presents the video frame image 320 on the mobile device 115 in full resolution, and detects and marks the visible data screens 322 and 324 in the video frame image 320 .
- the participant 125 performs a natural gesture 334 such as a dragging gesture on the highlighted data screen 322 in the video frame image 320 , enlarging the video view beyond a threshold amount and causing the mobile device 115 to switch from showing the video frame image 320 in full resolution to the data view showing a screenshot image 328 of the electronic whiteboard in full resolution.
- the participant 125 performs a natural gesture 336 such as a pinch gesture on the screenshot image 328 to zoom out the data view, causing the mobile device 115 to decrease the data view until a threshold point triggers the mobile device 115 to switch back to the video view from the data view.
- the mobile device 115 presents the video frame image 320 to the participant 125 .
- FIG. 4A is a graphic representation 400 of one embodiment of a graphic user interface illustrating a video view on a mobile device 115 .
- the example user interface shows a video frame image 402 depicting a conference room.
- the video frame image 402 depicts a host 135 and a data screen 404 of the hosting device 101 projected on a wall of the conference room.
- the data screen 404 includes an embedded data screen 406 . If the participant 125 performs a natural gesture on the data screen 404 captured in the video frame image 402 , the mobile device 115 switches from the video view to the data view shown in FIG. 4B .
- FIG. 4B is a graphic representation 420 of one embodiment of a graphic user interface illustrating a data view on a mobile device 115 .
- a data stream including screenshot images of the data screen 404 is presented on the mobile device 115 .
- the data stream is a multi-user communication session including an embedded data stream.
- the data stream is a video clip of another conference with embedded slides.
- the embedded data screen 406 presenting the embedded slides is shown in the screenshot image of the data screen 404 .
- When the participant 125 switches to the data view shown in FIG. 4B from the video view shown in FIG. 4A , the data stream including the video clip starts to play.
- the participant 125 may exit from the data view shown in FIG. 4B and return to the video view shown in FIG. 4A by performing a natural gesture (e.g., a pinch to close gesture) on the screenshot image of the data screen 404 .
- the participant 125 can keep zooming into the data view if the video clip includes embedded presentation slides or whiteboard strokes information. For example, if the participant 125 performs a natural gesture on the embedded data screen 406 , the mobile device 115 can switch from the data view to an embedded data view shown in FIG. 4C to present slides embedded in the video clip.
- FIG. 4C is a graphic representation 440 of one embodiment of a graphic user interface illustrating an embedded data view on a mobile device 115 .
- the slides shown in the embedded data screen 406 are presented to the participant 125 .
- the participant 125 may exit from the embedded data view and return to the data view shown in FIG. 4B by performing a natural gesture (e.g., a pinch to close gesture) on the screenshot image of the embedded data screen 406 .
- FIG. 5 is a flow diagram illustrating one embodiment of a method 500 for switching between video views and data views using natural gestures in a multi-user communication session.
- the controller 202 receives 502 data indicating that a participant 125 joined a multi-user communication session from a mobile device 115 associated with the participant 125 .
- the view presentation module 204 presents 504 a video stream of the multi-user communication session to the mobile device 115 .
- the view presentation module 204 instructs the user interface engine 210 to generate graphical data for displaying the video stream.
- the screen detection module 206 determines an occurrence of a detection trigger event.
- the controller 202 receives 506 a video frame image from the video stream responsive to the occurrence of the detection trigger event.
- the controller 202 receives a latest video frame image of the video stream from the camera 103 .
- the screen detection module 206 detects 508 a first data screen in the video frame image. For example, the screen detection module 206 determines that the video frame image captures the first data screen.
- the controller 202 receives 510 data describing a first natural gesture performed on the mobile device 115 .
- the controller 202 receives data describing a pinch to open gesture performed on the first data screen in the video frame image.
- the view switching module 208 switches 512 a view on the mobile device 115 from video view to data view responsive to the first natural gesture.
- the view presentation module 204 presents 514 a first data stream associated with the first data screen on the mobile device 115 .
- the first data stream includes one or more high-definition screenshot images of the first data screen generated by a display device 107 associated with the first data screen.
- FIGS. 6A-6C are flow diagrams illustrating another embodiment of a method 600 for switching between video views and data views using natural gestures in a multi-user communication session.
- the controller 202 receives 602 data indicating that a participant 125 joined a multi-user communication session from a mobile device 115 associated with the participant 125 .
- the view presentation module 204 presents 604 a video stream of the multi-user communication session on the mobile device 115 .
- the screen detection module 206 registers 606 a display device 107 with the registration server 130 .
- the display device 107 includes a data screen for presenting a data stream of the multi-user communication session in the hosting environment 137 .
- the controller 202 receives 608 images of the data screen from the display device 107 periodically. For example, the controller 202 receives screenshot images of the data screen from the display device 107 periodically.
- the screen detection module 206 detects 610 an occurrence of a detection trigger event.
- the controller 202 receives 612 a latest video frame image from the camera 103 responsive to the occurrence of the detection trigger event.
- the screen detection module 206 performs 614 data screen detection in the latest video frame image using the latest image of the data screen received from the display device 107 .
- the screen detection module 206 determines 616 whether a sub-image that matches the latest image of the data screen is found in the latest video frame image. If the sub-image is found in the latest video frame image, the method 600 moves to step 618 . Otherwise, the method 600 ends.
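The sub-image search in step 616 can be implemented with standard template matching: slide the latest data screen image over the latest video frame image and look for a position where the pixel difference is within a tolerance. Below is a simplified grayscale sketch; in practice the screen detection module 206 would use a robust matcher that tolerates perspective distortion and lighting changes, and the function name and tolerance here are illustrative:

```python
def find_sub_image(frame, template, tol=0):
    """Naive template matching: return (row, col) of the best match of
    `template` inside `frame` (2-D lists of grayscale values), or None
    if no position has a sum of squared differences within `tol`."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best = None
    best_ssd = None
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            ssd = sum(
                (frame[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best = ssd, (r, c)
    return best if best_ssd is not None and best_ssd <= tol else None


# A 2x2 "screenshot" located at row 1, col 2 of a 4x5 video frame.
frame = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 6, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
print(find_sub_image(frame, template))  # -> (1, 2)
```

A None result corresponds to the "Otherwise, the method 600 ends" branch; a coordinate result is the matching area that can be highlighted and reported to the mobile device 115.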
- the screen detection module 206 generates a matching result indicating the match between the latest image of the data screen and the latest video frame image, and notifies the mobile device 115 of the matching result.
- the screen detection module 206 provides data between the mobile device 115 and the display device 107 associated with the data screen. For example, the screen detection module 206 establishes a direct connection between the devices. In one embodiment, the display device 107 can transmit a data stream associated with the data screen to the mobile device 115 via the direct connection.
- the controller 202 receives 622 data describing a first natural gesture performed by the participant 125 on the sub-image depicting the data screen in the video frame image.
- the controller 202 receives 624 a data stream associated with the data screen from the display device 107 .
- the view switching module 208 switches 626 a view on the mobile device 115 from video view to data view responsive to the first natural gesture exceeding a threshold. For example, the user makes an expanding gesture starting in the center of the screen and moving over half the width of the screen.
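The threshold test in step 626 can be expressed as a predicate over the pinch gesture's start and end touch points: switch only when the fingers have spread by more than half the screen width. A sketch under that assumption — the function name and the one-half factor come from the example above, not from any particular gesture API:

```python
import math


def should_switch_view(start_points, end_points, screen_width):
    """Return True if a pinch-open gesture expanded far enough to
    trigger a view switch: the distance between the two fingers grew
    by more than half the screen width."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    expansion = dist(*end_points) - dist(*start_points)
    return expansion > screen_width / 2


# Fingers start 40 px apart at screen center and end 260 px apart on a
# 400 px wide screen: the 220 px expansion exceeds the 200 px threshold.
print(should_switch_view(((180, 300), (220, 300)),
                         ((70, 300), (330, 300)), 400))  # -> True
```

The same predicate, applied to a pinch-to-close gesture, can gate the switch back from data view to video view in step 632.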
- the view presentation module 204 presents 628 the data stream associated with the data screen on the mobile device 115 .
- the controller 202 receives 630 data describing a second natural gesture performed by the participant 125 on the data stream.
- the view switching module 208 switches 632 the view on the mobile device 115 from data view back to video view responsive to the second natural gesture exceeding a threshold.
- the view presentation module 204 presents 634 the video stream on the mobile device 115 .
- FIG. 7 is a flow diagram illustrating one embodiment of a method 700 for switching between a video view and one of two different data views using a selection in a multi-user communication session.
- the controller 202 receives 702 data indicating that a first participant, a second participant and a third participant 125 joined a multi-user communication session.
- the view presentation module 204 presents 704 a video stream of the multi-user communication session to a mobile device 115 associated with the third participant 125 .
- the view presentation module 204 instructs the user interface engine 210 to generate graphical data for displaying the video stream.
- the screen detection module 206 determines an occurrence of a detection trigger event.
- the controller 202 receives 706 a video frame image from the video stream that includes a first device associated with the first participant and a second device associated with the second participant. For example, the controller 202 receives a latest video frame image of the video stream from the camera 103 .
- the screen detection module 206 detects 708 a first data screen from the first device and a second data screen from the second device in the video frame image.
- the controller 202 receives 710 data describing a selection of the first data screen performed on the mobile device 115 .
- the controller 202 receives data describing a finger pressing in the center of the image of the first device to indicate that the third participant wants to view the first data view.
- the view switching module 208 switches 712 a view on the mobile device 115 from video view to a first data view that corresponds to the first data screen responsive to the selection.
- the view presentation module 204 presents 714 the first data stream on the mobile device 115 .
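The selection in steps 710 through 712 amounts to hit-testing the touch point against the bounding boxes of the data screens detected in the video frame image. A minimal sketch, assuming axis-aligned bounding boxes; the rectangle values and names are illustrative:

```python
def select_data_screen(touch, screens):
    """Return the name of the detected data screen whose bounding box
    (x, y, width, height) contains the touch point, or None."""
    tx, ty = touch
    for name, (x, y, w, h) in screens.items():
        if x <= tx < x + w and y <= ty < y + h:
            return name
    return None


# Two data screens detected in the video frame image.
detected = {
    "first_screen": (20, 40, 200, 150),    # first participant's device
    "second_screen": (260, 40, 200, 150),  # second participant's device
}
# The third participant presses the center of the first device's image.
print(select_data_screen((120, 115), detected))  # -> first_screen
```

The returned identifier determines which data view the view switching module 208 switches to, which is how the system avoids ambiguity when more than one data stream is present.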
- modules, routines, features, attributes, methodologies and other aspects of the specification can be implemented as software, hardware, firmware or any combination of the three.
- Where a component, an example of which is a module, of the specification is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of ordinary skill in the art of computer programming.
- the specification is in no way limited to embodiments in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope of the specification, which is set forth in the following claims.
Abstract
The disclosure includes a system and method for switching between video views and data views. The system includes a controller, a view presentation module, a screen detection module and a view switching module. The controller receives data indicating a participant joined a multi-user communication session. The view presentation module presents a video stream on a mobile device associated with the participant. The screen detection module determines an occurrence of a detection trigger event. The controller receives a video frame image responsive to the occurrence of the detection trigger event. The screen detection module detects a data screen in the video frame image. The view switching module switches a view on the mobile device from video view to data view responsive to a natural gesture performed by the participant. The view presentation module presents a data stream associated with the data screen on the mobile device.
Description
- This application is a continuation of and claims priority to U.S. application Ser. No. 14/019,915, filed Sep. 6, 2013, entitled “Switching Between Views Using Natural Gestures,” which claims priority under 35 USC §119(e) to U.S. Application No. 61/825,482, filed May 20, 2013, entitled “Method of Switching between Views in Mobile Videoconferencing Using Gestures,” each of which is incorporated by reference in its entirety.
- 1. Field of the Invention
- The specification relates to a system and method for switching between video views and data views. In particular, the specification relates to a system and method for switching between video views and data views using natural gestures in mobile videoconferencing.
- 2. Description of the Background Art
- Existing videoconferencing systems collect and transmit data streams along with video and audio streams in a videoconference session. This is because in most business meetings, the users expect to not only see each other, but also to exchange data information, such as documents, presentation slides, handwritten comments, etc. These data streams are usually directly captured from computer screens, separately encoded with special coding tools, and displayed side-by-side with the video streams on a remote site.
- The explosion of mobile devices drives more and more videoconferencing service providers to develop mobile applications such as smart phone and tablet applications. These mobile applications make it much easier for the users to access the videoconferencing service from anywhere using mobile devices.
- However, it becomes a problem to display both a video view and a data view on the mobile device simultaneously. Due to the limited screen size of the mobile device, it is not possible to display both the video view and the data view at full resolution side-by-side. Currently, the commonly used method is to use a user interface that displays one view at the full screen scale while showing only the thumbnail for the other view. The user interface combines and displays the video view and the data view together. Such a user interface fails to provide a unified experience by separating the user interface into multiple view modes, and may cause confusion when there is more than one data stream.
- The disclosure includes a system and method for switching between video views and data views on a mobile device. In one embodiment, the system includes a controller, a view presentation module, a screen detection module and a view switching module. The controller receives data indicating a participant joins a multi-user communication session. The view presentation module presents a video stream of the multi-user communication session on a mobile device associated with the participant. The screen detection module determines an occurrence of a detection trigger event. The controller receives a video frame image from the video stream responsive to the occurrence of the detection trigger event. The screen detection module detects a first data screen in the video frame image. The controller receives data describing a first natural gesture performed on the mobile device. The view switching module switches a view on the mobile device from video view to data view responsive to the first natural gesture. The view presentation module presents a first data stream associated with the first data screen on the mobile device.
- In another embodiment, a computer-implemented method with the following steps is performed. The method receives data indicating that a first participant, a second participant and a third participant joined a multi-user communication session. The method presents a video stream of a multi-user communication session to a mobile device associated with the third participant. The method receives a video frame image from the first video stream that includes a first device associated with the first participant and a second device associated with the second participant. The method detects a first data screen from the first device and a second data screen from the second device in the video frame image. The method receives data describing a selection of the first data screen performed on the mobile device. The method switches a view on the mobile device from video view to a first data view that corresponds to the first data screen responsive to the selection. The method presents the first data stream on the mobile device.
- Other aspects include corresponding methods, systems, apparatuses, and computer program products for these and other innovative aspects.
- The invention is illustrated by way of example, and not by way of limitation in the figures of the accompanying drawings in which like reference numerals are used to refer to similar elements.
- FIG. 1A is a high-level block diagram illustrating one embodiment of a system for switching between video views and data views.
- FIG. 1B is a high-level block diagram illustrating another embodiment of a system for switching between video views and data views.
- FIG. 2 is a block diagram illustrating one embodiment of a participation application.
- FIG. 3A is a graphic representation illustrating one embodiment of a process for performing data screen detection.
- FIG. 3B is a graphic representation illustrating one embodiment for switching between video views and data views on a mobile device using natural gestures.
- FIG. 4A is a graphic representation of one embodiment of a graphic user interface illustrating a video view mode on a mobile device.
- FIG. 4B is a graphic representation of one embodiment of a graphic user interface illustrating a data view mode on a mobile device.
- FIG. 4C is a graphic representation of one embodiment of a graphic user interface illustrating an embedded data view mode on a mobile device.
- FIG. 5 is a flow diagram illustrating one embodiment of a method for switching between video views and data views using natural gestures in a multi-user communication session.
- FIGS. 6A-6C are flow diagrams illustrating another embodiment of a method for switching between video views and data views using natural gestures in a multi-user communication session.
- FIG. 7 is a flow diagram illustrating one embodiment of a method for switching between a video view and one of two different data views using a selection in a multi-user communication session.
- The system described in the disclosure is particularly advantageous in numerous respects. First, the system allows a participant to use natural gestures such as pinch gestures to switch between video views and data views, and is capable of providing a consistent and seamless user experience in multi-user communication sessions including mobile videoconferencing sessions.
- Second, the system is capable of automatically detecting data screens in video frame images. A participant sees a data screen in a video view before switching to a data view showing the data screen in full resolution, which allows the participant to understand the relationship between a data stream of the data screen and the video content in the video view and therefore avoids confusion when more than one data stream is present. For example, if a speaker in a conference room moves frequently between two or more data screens such as a projector screen and a whiteboard screen, the system can eliminate a remote viewer's confusion between the projector screen and the whiteboard screen when the remote viewer frequently switches between the video view and the projection screen data view or between the video view and the whiteboard screen data view.
- Third, the system supports embedded data streams and is capable of providing embedded data streams to users. For example, the system can present a data stream of a current meeting in a data view mode, where the data stream of the current meeting is a video clip describing a previous meeting. The video clip is embedded with slides and whiteboard stroke information presented in the previous meeting. The system can switch from the data view mode to the embedded data view mode to present the embedded slides and whiteboard stroke information to participants of the current meeting in full resolution. The system may have other numerous advantages.
- A system and method for switching between video views and data views is described below. In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the embodiments can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention. For example, the invention is described in one embodiment below with reference to mobile devices such as a smart phone and particular software and hardware. However, the description applies to any type of computing device that can receive data and commands, and any peripheral devices providing services.
- Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
- Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like.
- It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
- The invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, flash memories including USB keys with non-volatile memory, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- Some embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. A preferred embodiment is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- Furthermore, some embodiments can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this invention, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
- A data processing system suitable for storing and/or executing program code will include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
- Finally, the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the specification is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the various embodiments as described herein.
-
FIG. 1A illustrates a block diagram of asystem 100 for switching between video views and data views according to one embodiment. The illustratedsystem 100 includes a hostingdevice 101 accessible by ahost 135, aregistration server 130, acamera 103,display devices 107 a . . . 107 n andmobile devices 115 a . . . 115 n accessible byparticipants 125 a . . . 125 n. InFIG. 1A and the remaining figures, a letter after a reference number, e.g., “115 a,” represents a reference to the element having that particular reference number. A reference number in the text without a following letter, e.g., “115,” represents a general reference to instances of the element bearing that reference number. In the illustrated embodiment, these entities of thesystem 100 are communicatively coupled via anetwork 105. - The
network 105 can be a conventional type, wired or wireless, and may have numerous different configurations including a star configuration, token ring configuration or other configurations. Furthermore, thenetwork 105 may include a local area network (LAN), a wide area network (WAN) (e.g., the Internet), and/or other interconnected data paths across which multiple devices may communicate. In some embodiments, thenetwork 105 may be a peer-to-peer network. Thenetwork 105 may also be coupled to or includes portions of a telecommunications network for sending data in a variety of different communication protocols. In some embodiments , thenetwork 105 includes Bluetooth communication networks or a cellular communications network for sending and receiving data including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, WAP, email, etc. AlthoughFIG. 1A illustrates onenetwork 105 coupled to themobile devices 115, the hostingdevice 101 and theregistration server 130, in practice one ormore networks 105 can be connected to these entities. - A hosting
environment 137 can be an environment to host a multi-user communication session. An example multi-user communication session includes a videoconferencing meeting. In some examples, a hostingenvironment 137 is a room where all the devices within the dashed box inFIG. 1A are visible to users. For example, the hostingenvironment 137 could be a conference room environment including one ormore display devices 107 and one ormore cameras 103 present in the conference room.Example display devices 107 include, but are not limited to, a projector, an electronic whiteboard, a liquid-crystal display and any other conventional display devices. In one embodiment, thecamera 103 is an advanced videoconferencing camera.Example cameras 103 include, but are not limited to, a high-definition (HD) video camera that captures high-resolution videos, a pan-tilt-zoom (PTZ) camera that can be mechanically controlled or a group of cameras that provide multi-view or panoramic views in the hostingenvironment 137. Although twodisplay devices 107 and onecamera 103 are illustrated inFIG. 1A , the hostingenvironment 137 can include one ormore display devices 107 and one ormore cameras 103. - The hosting
device 101, thedisplay devices 107 a . . . 107 n and thecamera 103 are located within the hostingenvironment 137. The hostingdevice 101 is communicatively coupled to thedisplay device 107 a viasignal line 116, thedisplay device 107 n viasignal line 118 and thecamera 103 viasignal line 114. Thedisplay device 107 a is optionally coupled to theregistration server 130 viasignal line 102; thedisplay device 107 n is optionally coupled to theregistration server 130 viasignal line 104; and thecamera 103 is optionally coupled to theregistration server 130 viasignal line 112. - The hosting
device 101 can be a computing device that includes a processor and a memory, and is coupled to thenetwork 105 viasignal line 131. For example, the hostingdevice 101 is a hardware server. In another example, the hostingdevice 101 is a laptop computer or a desktop computer. The hostingdevice 101 is accessed by ahost 135, for example, a user that manages a meeting. The hostingdevice 101 includes a hostingapplication 109 and a storage device for storing the presentations generated by the hostingapplication 109. - The hosting
application 109 includes software for hosting a multi-user communication session. For example, the hostingapplication 109 hosts a video conferencing meeting that thehost 135 manages and one ormore participants 125 join using one or moremobile devices 115. In another example, the hostingapplication 109 generates slides for giving a presentation. - In one embodiment, the hosting
application 109 displays data to be shared withother participants 125 on one or more data screens of one ormore display devices 107 in the hostingenvironment 137. Data to be shared withparticipants 125 includes, but is not limited to, a text-based document, web page content, presentation slides, video clips, stroke-based handwritten comments and/or other user annotations, etc. The one or more data screens in the hostingenvironment 137 are visible to thecamera 103. For example, the presentation slides to be shared with otherremote participants 125 are projected on the wall, where the projection of the presentation slides (or, at least a predetermined portion of the projection) is within a field of view of thecamera 103. In this case, thecamera 103 is capable of capturing the projection of the presentation slides in one or more video frame images of a video stream. - In another example, if a user in a conference room writes comments on an electronic whiteboard, the hosting
application 109 can control movement of thecamera 103 so that the electronic whiteboard is visible to thecamera 103. In this case, thecamera 103 is capable of capturing the comments shown in the electronic whiteboard in one or more video frame images of a video stream. Thecamera 103 captures a video stream including video frame images depicting the hostingenvironment 137, where the video frame images contain data screens of thedisplay devices 107 and/or the data screen of the hostingdevice 101. - In one embodiment, the
camera 103 sends the video stream to the hostingdevice 101, causing the hostingdevice 101 to forward the video stream to one or more of theregistration server 130 and themobile device 115. In another embodiment, thecamera 103 sends the video stream directly to theregistration server 130 and/or themobile device 115 via thenetwork 105. In yet another embodiment, thecamera 103 sends a latest video frame image captured by thecamera 103 to theregistration server 130 responsive to an occurrence of a detection trigger event. The detection trigger event is described below in more detail with reference toFIG. 2 . - In one embodiment, the hosting
application 109 or thedisplay device 107 captures a high quality version of a data stream displayed on a data screen of thedisplay device 107. This high quality version of the data stream displayed on the data screen is referred to as a data stream associated with the data screen, which includes a series of data screen images (e.g., screenshot images) depicting content displayed on the data screen of thedisplay device 107 over time. A screenshot image of a data screen depicts content displayed on the data screen at a particular moment of time. At different moments of time, different screenshot images of the data screen are captured, which form a data stream associated with the data screen. In some examples, a screenshot image of the data screen may be also referred to as a data frame of the data stream. - In some examples, the hosting
application 109 captures a series of screenshot images describing a slide presentation in high resolution directly from a presentation computing device. In some additional examples, an electronic whiteboard captures original stroke information displayed on the whiteboard screen, and sends screenshot images depicting the original stroke information to the hostingapplication 109. - In one embodiment, the hosting
application 109 sends the data stream associated with the data screen to one or more of themobile device 115 and theregistration server 130. In another embodiment, thedisplay device 107 directly sends the data stream including one or more data screen images to one or more of themobile device 115 and theregistration server 130. For example, thedisplay device 107 periodically sends an up-to-date data screen image to theregistration server 130. - In one embodiment, the
participation application 123a can be operable on the registration server 130. The registration server 130 includes a processor and a memory, and is coupled to the network 105 via signal line 106. The registration server 130 includes a database for storing registered images. The registration server 130 registers the display device 107 and receives a video feed for a meeting from the camera 103. The video feed includes one or more video frame images. The registration server 130 runs image matching algorithms to find a correspondence between a latest video frame and a latest screenshot image of a data screen associated with the display device 107 or the hosting device 101. If a match is found, the matching area is highlighted in the video frame image and displayed on the mobile device 115. The registration server 130 is described below in more detail with reference to FIGS. 2 and 3A. - In another embodiment, the
participation application 123b may be stored on a mobile device 115a, which is connected to the network 105 via signal line 108. The mobile device 115 includes a touch screen for displaying data and receiving natural gestures from a participant 125. Examples of natural gestures include, but are not limited to, tap, double tap, long press, scroll, pan, flick, two finger tap, pinch open, pinch close, etc. - In the illustrated embodiment, the
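The natural gestures listed here can be distinguished from raw touch events by simple timing and movement heuristics. The following sketch is illustrative only: the `Touch` structure, the threshold constants, and the classification rules are assumptions for the example, not part of this disclosure.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative thresholds (assumed values, not from the disclosure).
DOUBLE_TAP_GAP_S = 0.3   # max gap after a prior tap for a double tap
LONG_PRESS_S = 0.5       # min press duration for a long press
MOVE_TOLERANCE_PX = 10   # max finger travel for a press to count as a tap

@dataclass
class Touch:
    down_time: float     # seconds when the finger went down
    up_time: float       # seconds when the finger lifted
    travel_px: float     # total finger travel while pressed
    fingers: int = 1

def classify(touch: Touch, prev_tap_up: Optional[float] = None) -> str:
    """Classify one completed touch into a natural-gesture name."""
    duration = touch.up_time - touch.down_time
    if touch.fingers == 2 and touch.travel_px <= MOVE_TOLERANCE_PX:
        return "two finger tap"
    if touch.travel_px > MOVE_TOLERANCE_PX:
        # Fast short swipes read as flicks; slower drags as scrolls.
        return "flick" if duration < 0.2 else "scroll"
    if duration >= LONG_PRESS_S:
        return "long press"
    if prev_tap_up is not None and touch.down_time - prev_tap_up <= DOUBLE_TAP_GAP_S:
        return "double tap"
    return "tap"
```

A pinch would be classified analogously from two simultaneous touches by comparing the change in finger separation (growing separation for pinch open, shrinking for pinch close).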
participant 125a interacts with the mobile device 115a. The mobile device 115n is communicatively coupled to the network 105 via signal line 110. The participant 125n interacts with the mobile device 115n. The participant 125 can be a remote user participating in a multi-user communication session such as a videoconferencing session hosted by the hosting device 101. The mobile devices 115a, 115n in FIG. 1A are used by way of example. While FIG. 1A illustrates two mobile devices 115a and 115n, the present disclosure applies to a system architecture having one or more mobile devices 115. - In one embodiment, the
participation application 123 is distributed such that it may be stored in part on the mobile device 115 and in part on the registration server 130. For example, the participation application 123b on the mobile device 115 acts as a thin-client application that displays the video stream or the data stream while the registration server 130 performs the screen detection steps. The participation application 123b on the mobile device 115a instructs the display to present the video stream or the data stream, for example, by rendering images in a browser. The participation application 123b receives user input (e.g., natural gestures) from the participant 125 and interprets the user input. For example, assume the participation application 123b currently displays the video stream. The participation application 123b receives user input from the participant 125a that magnifies the screen beyond a threshold, and the participation application 123b determines that the view should be switched to the data stream of the hosting device 101. The participation application 123b sends instructions indicating the switch from the video stream to the data stream to the participation application 123a on the registration server 130. - The
participation application 123 can be code and routines for participating in a multi-user communication session. In one embodiment, the participation application 123 can be implemented using hardware including a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). In another embodiment, the participation application 123 can be implemented using a combination of hardware and software. In various embodiments, the participation application 123 may be stored in a combination of the devices and servers, or in one of the devices or servers. -
FIG. 1B is another embodiment of a system for switching between video views and data views. In this embodiment, there is no hosting device 101. Instead, mobile devices 115 can comprise the camera 103 and the participation application 123b. The mobile devices 115 are coupled to the display devices 107 via signal lines. - The
participant 125 can activate the camera 103 on the mobile device 115 and point it at the display devices 107. The mobile device 115 can transmit the images directly to the registration server 130 via signal line 154. For example, the images can serve as a query from the mobile device 115. The participation application 123 uses the captured images to detect the screen from the video view and switches to the data view in response to receiving gestures from the participant 125. - Referring now to
FIG. 2, an example of the participation application 123 is shown in more detail. FIG. 2 is a block diagram of a computing device 200 that includes a participation application 123, a processor 235, a memory 237, an input/output device 241, a communication unit 239 and a storage device 243 according to some examples. The components of the computing device 200 are communicatively coupled by a bus 220. The input/output device 241 is communicatively coupled to the bus 220 via signal line 242. In some embodiments, the computing device 200 can be one of a mobile device 115 and a registration server 130. For example, in one embodiment, the registration server 130 can include a participation application 123 with some of the components described below and the mobile device 115 can include some of the other components described below. - The
processor 235 includes an arithmetic logic unit, a microprocessor, a general purpose controller or some other processor array to perform computations and provide electronic display signals to a display device. The processor 235 is coupled to the bus 220 for communication with the other components via signal line 222. The processor 235 processes data signals and may include various computing architectures including a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. Although FIG. 2 includes a single processor 235, multiple processors 235 may be included. Other processors, operating systems, sensors, displays and physical configurations are possible. - The
memory 237 stores instructions and/or data that can be executed by the processor 235. The memory 237 is coupled to the bus 220 for communication with the other components via signal line 224. The instructions and/or data may include code for performing the techniques described herein. The memory 237 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory or some other memory device. In some embodiments, the memory 237 also includes a non-volatile memory or similar permanent storage device and media including a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device for storing information on a more permanent basis. - The
communication unit 239 transmits and receives data to and from at least one of the hosting device 101, the mobile device 115 and the registration server 130 depending upon where the participation application 123 is stored. The communication unit 239 is coupled to the bus 220 via signal line 226. In some embodiments, the communication unit 239 includes a port for direct physical connection to the network 105 or to another communication channel. For example, the communication unit 239 includes a USB, SD, CAT-5 or similar port for wired communication with the mobile device 115. In some embodiments, the communication unit 239 includes a wireless transceiver for exchanging data with the mobile device 115 or other communication channels using one or more wireless communication methods, including IEEE 802.11, IEEE 802.16, BLUETOOTH® or another suitable wireless communication method. - In some embodiments, the
communication unit 239 includes a cellular communications transceiver for sending and receiving data over a cellular communications network including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, WAP, e-mail or another suitable type of electronic communication. In some embodiments, the communication unit 239 includes a wired port and a wireless transceiver. The communication unit 239 also provides other conventional connections to the network 105 for distribution of files and/or media objects using standard network protocols including TCP/IP, HTTP, HTTPS and SMTP, etc. - The
storage device 243 can be a non-transitory memory that stores data for providing the functionality described herein. The storage device 243 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory or some other memory device. In some embodiments, the storage device 243 also includes a non-volatile memory or similar permanent storage device and media including a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device for storing information on a more permanent basis. - In the illustrated embodiment, the
storage device 243 is communicatively coupled to the bus 220 via signal line 228. In one embodiment, the storage device 243 stores one or more of a video stream including one or more video frame images, a data stream including one or more data screen images and one or more detection trigger events, etc. The storage device 243 may store other data for providing the functionality described herein. For example, the storage device 243 could store copies of video conferencing materials, such as presentations, documents, audio clips, video clips, etc. - In the illustrated embodiment shown in
FIG. 2, the participation application 123 includes a controller 202, a view presentation module 204, a screen detection module 206, a view switching module 208, a user interface module 210 and an optional camera adjustment module 212. The components of the participation application 123 are communicatively coupled via the bus 220. Persons of ordinary skill in the art will recognize that the components can be stored in part on the mobile device 115 and in part on the registration server 130. For example, the participation application 123 stored on the registration server 130 could include the screen detection module 206 and the participation application 123 stored on the mobile device could include the remaining components. - The
controller 202 can be software including routines for handling communications between the participation application 123 and other components of the computing device 200. In one embodiment, the controller 202 can be a set of instructions executable by the processor 235 to provide the functionality described below for handling communications between the participation application 123 and other components of the computing device 200. In another embodiment, the controller 202 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the controller 202 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 230. - In one embodiment, the
controller 202 sends and receives data, via the communication unit 239, to and from one or more of the mobile device 115, the hosting device 101 and the registration server 130. For example, the controller 202 receives, via the communication unit 239, user input from a participant 125 operating a mobile device 115 and sends the user input to the view switching module 208. In another example, the controller 202 receives graphical data for providing a user interface to a participant 125 from the user interface module 210 and sends the graphical data to a mobile device 115, causing the mobile device 115 to present the user interface to the participant 125. - In one embodiment, the
controller 202 receives data from other components of the participation application 123 and stores the data in the storage device 243. For example, the controller 202 receives data describing one or more detection trigger events from the screen detection module 206 and stores the data in the storage device 243. In another embodiment, the controller 202 retrieves data from the storage device 243 and sends the data to other components of the participation application 123. For example, the controller 202 retrieves a data stream from the storage device 243 and sends the data stream to the view presentation module 204 for presenting the data stream to a participant 125. - The
view presentation module 204 can be software including routines for presenting a video view or a data view on a mobile device 115. In one embodiment, the view presentation module 204 can be a set of instructions executable by the processor 235 to provide the functionality described below for presenting a data view or a video view on a mobile device 115. In another embodiment, the view presentation module 204 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the view presentation module 204 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 232. - A video view mode presents video data associated with a multi-user communication session to a
participant 125. For example, the video view mode presents the video stream of the other participants in the multi-user communication session to theparticipant 125 in full screen on themobile device 115. In another example, the video view mode presents the video stream on themobile device 115 in full resolution. - In one embodiment, the
view presentation module 204 receives data indicating that a participant 125 joins a multi-user communication session from a mobile device 115 associated with the participant 125. Upon joining, the mobile device 115 is in the video view mode. The view presentation module 204 receives a video stream including one or more video frame images from the camera 103 directly or via the hosting device 101, and presents the video stream to the participant 125 on a display of the mobile device 115. - In some examples, one or more data screens that are in the
same hosting environment 137 as the camera 103 are captured in the one or more video frame images of the video stream, and the one or more video frame images include sub-images depicting the one or more data screens. For example, the one or more video frame images capture at least a portion of a data screen of a hosting device 101, a portion of a screen projection on the wall and/or a portion of a data screen of an electronic whiteboard. In another example, the one or more video frame images capture the full data screen of the hosting device 101, the full projection screen on the wall and/or the full data screen of the electronic whiteboard. - A data view mode presents a data stream associated with the multi-user communication session to the
participant 125. For example, the data view mode presents a data stream with the slides being presented during the multi-user communication session to theparticipant 125 in full screen on themobile device 115. In another example, the data view mode presents the data stream on themobile device 115 in full resolution. - In one embodiment, the
view presentation module 204 receives, from the view switching module 208, an identifier (ID) of a detected data screen and a view switching signal indicating that a view on the mobile device 115 should be switched from the video view to the data view. In some embodiments, the view presentation module 204 receives a data stream associated with the detected data screen directly from the display device 107 associated with the data screen. In some other embodiments, the view presentation module 204 receives the data stream via the hosting device 101. Responsive to receiving the view switching signal, the view presentation module 204 stops presenting the video stream on the mobile device 115 and starts to present the data stream associated with the data screen on the mobile device 115. Persons of ordinary skill in the art will recognize that the sections describing presenting the video stream or data stream are meant to represent the view presentation module 204 instructing the user interface module 210 to generate graphical data that is sent to the mobile device 115 via the communication unit 239 for display. - In some examples, an embedded data stream is included in the data stream. The
view presentation module 204 receives, from the view switching module 208, a view switching signal instructing the view presentation module 204 to switch the view on the mobile device 115 from the data view to an embedded data view. The embedded data view mode presents the embedded data stream to the participant 125 in full resolution or in full screen on the mobile device 115. Responsive to receiving the view switching signal, the view presentation module 204 stops presenting the data stream and starts to present the embedded data stream on the mobile device 115. An embedded data stream can be a videoconferencing meeting, a presentation, a video clip, a text document, presentation slides, or other types of data embedded in the data stream. - If the
view presentation module 204 receives, from the view switching module 208, a view switching signal instructing it to switch the view from the embedded data view back to the data view, the view presentation module 204 stops presenting the embedded data stream and starts to present the data stream on the mobile device 115 again. In one embodiment, the view presentation module 204 receives, from the view switching module 208, a view switching signal instructing it to switch the view on the mobile device 115 from the data view to the video view. Responsive to the view switching signal, the view presentation module 204 stops presenting the data stream and starts to present the video stream on the mobile device 115. - The
screen detection module 206 can be software including routines for performing data screen detection in a video frame image. In one embodiment, the screen detection module 206 can be a set of instructions executable by the processor 235 to provide the functionality described below for performing data screen detection in a video frame image. In another embodiment, the screen detection module 206 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the screen detection module 206 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 234. - In one embodiment, the
screen detection module 206 registers one or more display devices 107 with the registration server 130. For example, the screen detection module 206 can record registration information for the display device 107, such as a device identifier and a user associated with the display device 107, and store the registration information in the storage 243. Each display device 107 periodically sends an updated image of its data screen to the registration server 130. For example, each display device 107 periodically sends its up-to-date screenshot image to the registration server 130. In some examples, the display device 107 sends the updated screenshot images of its data screen to the registration server 130 via the hosting device 101. - In one embodiment, the
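The registration bookkeeping described here can be pictured as a small key-value store on the server: one entry per registered screen, holding its metadata and only the most recent screenshot. The class and method names below are assumptions for a minimal sketch, not the disclosure's API.

```python
import time
from typing import Dict, Optional

class ScreenRegistry:
    """Minimal sketch of per-screen registration and screenshot storage."""

    def __init__(self) -> None:
        self._screens: Dict[str, dict] = {}

    def register(self, screen_id: str, owner: str) -> None:
        # Record registration information (device identifier, associated user).
        self._screens[screen_id] = {"owner": owner, "screenshot": None, "updated": None}

    def update_screenshot(self, screen_id: str, image_bytes: bytes) -> None:
        # A display device periodically pushes its up-to-date screenshot;
        # only the latest one is kept for matching against video frames.
        entry = self._screens[screen_id]
        entry["screenshot"] = image_bytes
        entry["updated"] = time.time()

    def latest_screenshot(self, screen_id: str) -> Optional[bytes]:
        return self._screens[screen_id]["screenshot"]
```

Keeping only the latest screenshot per screen ID matches the periodic-update scheme: older frames have no value once a newer one arrives.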
screen detection module 206 detects an occurrence of a trigger event. For example, the event could be a detection trigger event that triggers detection of one or more data screens in a video frame image; that is, a detection trigger event causes the screen detection module 206 to detect whether the video frame image includes a data screen. Example detection trigger events include, but are not limited to, motion of the camera 103 (e.g., panning, zooming or tilting of the camera 103, movement of the camera 103, etc.) and/or motion of an object in the video frame image (e.g., appearance of a projection on the wall in the video frame image, movement of a whiteboard, etc.). In another example, the trigger event could be based on a timer. - Responsive to the occurrence of the detection trigger event, the
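The two trigger sources named here, scene motion and a timer, can be combined into one predicate. The sketch below is illustrative: it approximates "motion" as a mean absolute difference between consecutive grayscale frames (represented as flat lists of pixel values), and the threshold and interval values are assumptions.

```python
# Assumed, illustrative thresholds.
MOTION_THRESHOLD = 10.0   # mean absolute pixel difference that counts as motion
TIMER_INTERVAL_S = 5.0    # fallback: re-run detection at least this often

def should_detect(prev_frame, frame, last_run_s, now_s):
    """Return True when screen detection should run on the latest frame.

    prev_frame, frame: equal-length flat lists of grayscale pixel values.
    last_run_s, now_s: timestamps (seconds) of the last detection and now.
    """
    diff = sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)
    return diff > MOTION_THRESHOLD or (now_s - last_run_s) >= TIMER_INTERVAL_S
```

A real implementation would derive the motion signal from camera PTZ telemetry or a frame-difference pass over downsampled video, but the gating logic is the same.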
screen detection module 206 receives a latest video frame image of the video stream from the camera 103 directly or via the hosting device 101. In some examples, the screen detection module 206 receives the latest video frame image of the video stream from the mobile device 115 or a video server that provides the video stream. The screen detection module 206 performs data screen detection in the latest video frame image responsive to the occurrence of the detection trigger event. For example, the screen detection module 206 determines whether a data screen appears in the latest video frame image by matching a latest screenshot image of the data screen with the latest video frame image. - In some examples, for each data screen registered with the
registration server 130, the screen detection module 206 determines whether a sub-image that matches the latest screenshot image of the data screen appears in the latest video frame image. For example, the screen detection module 206 determines whether the latest video frame image includes a sub-image that depicts the data screen (e.g., the screen detection module 206 determines whether the data screen is captured by the latest video frame image). In a further example, the screen detection module 206 runs an image matching algorithm to find the correspondence between the latest video frame image and the latest screenshot image of the data screen. If the screen detection module 206 finds a match between the latest video frame image and the latest screenshot image of the data screen, the screen detection module 206 highlights the matching area in the video frame image on the mobile device 115. For example, the screen detection module 206 highlights the detected data screen in the video frame image on the mobile device 115. - In one embodiment, the
screen detection module 206 runs the image matching algorithm in real time. An example image matching algorithm is the scale-invariant feature transform (SIFT) algorithm. The SIFT algorithm extracts feature points from both the latest video frame image and the latest screenshot image of the data screen; the feature points from the two images are matched using k-nearest neighbors (KNN), and the random sample consensus (RANSAC) algorithm is used to find the consensus set and determine the homography matrix. Additional information about how to use SIFT, KNN and RANSAC for image matching can be found in Hess, R., An Open-Source SIFT Library, Proceedings of the International Conference on Multimedia, October 2010, pp. 1493-96. Persons of ordinary skill in the art will recognize that other image matching algorithms can be used. - If the
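The RANSAC step of this pipeline, hypothesizing a motion model from a random sample of putative correspondences and keeping the model with the largest consensus set, can be shown in miniature. The sketch below is a deliberate simplification: it fits a translation-only model (one correspondence per hypothesis) rather than the full homography a real implementation (e.g., OpenCV's SIFT matching plus `findHomography`) would estimate, and all names are illustrative.

```python
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """Estimate a translation from noisy correspondences via RANSAC.

    matches: list of ((x, y), (u, v)) putative correspondences, where
    (x, y) is a screenshot feature point and (u, v) its KNN match in the
    video frame. Returns ((dx, dy), inliers) for the best consensus set.
    """
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        (x, y), (u, v) = rng.choice(matches)   # minimal sample: one pair
        dx, dy = u - x, v - y                  # hypothesized translation
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) <= tol
                   and abs(m[1][1] - m[0][1] - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers
```

With a homography model the minimal sample is four correspondences and the inlier test is reprojection error under the candidate matrix, but the sample-score-keep loop is identical; the final consensus set defines the matching area to highlight.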
screen detection module 206 detects one or more data screens existing in the video frame image, the screen detection module 206 generates a matching result including one or more matches between the one or more data screens and the video frame image. The screen detection module 206 notifies the mobile device 115 of the one or more matches, and establishes a direct connection between the mobile device 115 and each display device 107 that has one matched data screen. The screen detection module 206 highlights one or more matching areas in the video frame image, where each matching area corresponds to a position of one data screen captured in the video frame image. The screen detection module 206 displays the highlighted matching areas on the mobile device 115. - In another embodiment, the
camera 103 is statically deployed and captures one or more data screens in the hosting environment 137, and positions of the one or more data screens remain unchanged in the video frame images. The screen detection module 206 can determine existence of the one or more data screens based on the static setting in the hosting environment 137, and can pre-calibrate positions of the one or more data screens in the video frame images. The screen detection module 206 highlights the one or more data screens at the pre-calibrated positions in the video frame images. - The
screen detection module 206 sends one or more screen IDs identifying the one or more detected data screens and data describing one or more matching areas in the video frame image to the view switching module 208. In another embodiment, the screen detection module 206 sends pre-calibrated positions of one or more data screens to the view switching module 208. In yet another embodiment, the screen detection module 206 stores the one or more screen IDs, data describing the one or more matching areas and/or the pre-calibrated positions in the storage 243. - The
view switching module 208 can be software including routines for switching a view on a mobile device 115 between a video view and a data view. In one embodiment, the view switching module 208 can be a set of instructions executable by the processor 235 to provide the functionality described below for switching a view on a mobile device 115 between a video view and a data view. In another embodiment, the view switching module 208 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the view switching module 208 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 236. - In one embodiment, the
view switching module 208 receives, from the screen detection module 206, data describing one or more screen IDs identifying one or more detected data screens and one or more matching areas associated with the one or more detected data screens in the video frame image. In the video view mode, the mobile device 115 presents the video stream to the participant 125, with the one or more detected data screens highlighted in the matching areas of the video frame images. If the participant 125 performs a natural gesture (e.g., a pinch open or double tap gesture, etc.) within a highlighted matching area of a data screen on a touch screen of the mobile device 115, the view switching module 208 interprets the participant's natural gesture as a command to switch from the video view to the data view. The view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204, causing the view presentation module 204 to present the data view to the participant 125. In one embodiment, the view switching module 208 interprets the natural gesture as a command to switch from the video view to the data view if the portion of the data screen detected in the video frame image is greater than a predetermined threshold (e.g., a majority portion of the data screen appearing in the video frame image). - For example, the
participant 125 can use a natural gesture to zoom into a data screen detected in the video frame image, so that the video view presenting the video frame image scales up accordingly on the touch screen of the mobile device 115. If the size of the scaled-up data screen in the video frame image reaches a predetermined threshold, the view switching module 208 automatically switches the view on the mobile device 115 from the video view to the data view, causing the view presentation module 204 to present the data stream associated with the detected data screen on the mobile device 115. The mobile device 115 switches from the video view mode to the data view mode accordingly. The participant 125 can further perform natural gestures to operate on the data stream such as zooming into the data stream, copying the data stream, dragging the data stream, etc. - In the data view mode, the
mobile device 115 presents the data stream to the participant 125. If the participant 125 performs a natural gesture (e.g., a pinch close gesture or tapping on an exit icon, etc.) on the data stream displayed on a touch screen of the mobile device 115, the view switching module 208 interprets the participant's natural gesture as a command to switch from the data view back to the video view. The view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204, causing the view presentation module 204 to present the video view to the participant 125. Again, the screen detection module 206 detects the one or more data screens visible to the camera 103 in the video frame images, and highlights the one or more data screens in the video frame images. For example, the participant 125 can use a natural gesture to zoom out of the data stream, so that the data view presenting the data stream scales down accordingly on the touch screen of the mobile device 115. If the size of the scaled-down data stream reaches a predetermined threshold, the view switching module 208 automatically switches the view on the mobile device 115 from the data view to the video view, causing the view presentation module 204 to present the video stream on the mobile device 115. - In the data view mode, if the presented data stream includes an embedded data stream, the
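The threshold-based automatic switching in both directions can be sketched as a small state machine: pinch gestures rescale the current view, and the mode flips when the rendered size of the relevant content crosses a threshold. The class name, threshold values, and the starting screen fraction below are illustrative assumptions.

```python
ZOOM_IN_THRESHOLD = 0.6   # detected screen fills ≥60% of display → data view
ZOOM_OUT_THRESHOLD = 0.4  # data stream shrunk to ≤40% of display → video view

class ViewSwitcher:
    """Sketch of pinch-driven switching between video view and data view."""

    def __init__(self) -> None:
        self.mode = "video"
        self.scale = 1.0            # current magnification of the view
        self.screen_fraction = 0.2  # assumed fraction of the display the
                                    # detected data screen covers at scale 1.0

    def pinch(self, factor: float) -> str:
        """Apply a pinch (factor > 1 opens, < 1 closes); return the mode."""
        self.scale *= factor
        if self.mode == "video" and self.screen_fraction * self.scale >= ZOOM_IN_THRESHOLD:
            self.mode, self.scale = "data", 1.0   # zoomed far enough into the screen
        elif self.mode == "data" and self.scale <= ZOOM_OUT_THRESHOLD:
            self.mode, self.scale = "video", 1.0  # shrunk the data stream far enough
        return self.mode
```

Resetting the scale on each switch models the new view starting at full resolution, so the next pinch is measured against the freshly presented stream.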
participant 125 can use a natural gesture on the embedded data stream. The view switching module 208 interprets the natural gesture as a command to switch from the data view to the embedded data view. The view switching module 208 generates a view switching signal describing the command and sends the view switching signal to the view presentation module 204, causing the view presentation module 204 to present the embedded data stream to the participant 125 in full resolution. The participant 125 may perform another natural gesture to exit from the embedded data view and return to the data view. For example, if the data stream includes an embedded video, the participant 125 can issue a tap open command on an icon representing the embedded video during the data view mode, causing the view presentation module 204 to present the embedded video in full screen on the mobile device 115. After viewing the embedded video, the participant 125 can issue a pinch close command to exit from the embedded data view and return to the data view. - The
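Taken together, the three view modes and their gesture-driven transitions (video ↔ data ↔ embedded data) form a small transition table. The gesture names follow the examples in the text; the table itself and the function name are assumptions for this sketch.

```python
# (current mode, gesture) → next mode. Unlisted pairs leave the mode unchanged.
TRANSITIONS = {
    ("video", "pinch open"): "data",      # open a detected, highlighted data screen
    ("data", "pinch close"): "video",     # exit back to the video view
    ("data", "tap open"): "embedded",     # open e.g. an embedded video icon
    ("embedded", "pinch close"): "data",  # return from the embedded data view
}

def next_view(current: str, gesture: str) -> str:
    """Return the view mode after applying one gesture."""
    return TRANSITIONS.get((current, gesture), current)
```

Encoding the transitions as data keeps the interpretation rule in one place, so adding a gesture (say, double tap as an alternative open command) is a one-line table change.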
user interface module 210 can be software including routines for generating graphical data for providing a user interface. In one embodiment, the user interface module 210 can be a set of instructions executable by the processor 235 to provide the functionality described below for generating graphical data for providing a user interface. In another embodiment, the user interface module 210 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the user interface module 210 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 238. - In one embodiment, the
user interface module 210 receives instructions from the view presentation module 204 to generate graphical data for providing a user interface to a user such as a host 135 or a participant 125. The user interface module 210 sends the graphical data to the hosting device 101 or the mobile device 115, causing the hosting device 101 or the mobile device 115 to present the user interface to the user. For example, the user interface module 210 generates graphical data for providing a user interface that depicts a video stream or a data stream. The user interface module 210 sends the graphical data to the mobile device 115, causing the mobile device 115 to present the video stream or the data stream to the participant 125 via the user interface. In other embodiments, the user interface module 210 may generate graphical data for providing other user interfaces to users. - The optional camera adjustment module 212 can be software including routines for adjusting a
camera 103. In one embodiment, the camera adjustment module 212 can be a set of instructions executable by the processor 235 to provide the functionality described below for adjusting a camera 103. In another embodiment, the camera adjustment module 212 can be stored in the memory 237 of the computing device 200 and can be accessible and executable by the processor 235. In either embodiment, the camera adjustment module 212 can be adapted for cooperation and communication with the processor 235 and other components of the computing device 200 via signal line 240. - In one embodiment, a
participant 125 can use natural gestures to navigate the camera 103. For example, the participant 125 can perform a natural gesture to change the view angle of the camera 103 via a user interface shown on the mobile device 115. The camera adjustment module 212 receives data describing the participant's natural gesture and interprets it as a command to adjust the camera 103, such as panning, tilting, zooming in, or zooming out the camera 103. The camera adjustment module 212 adjusts the camera 103 according to the participant's natural gesture. By adjusting the camera 103, the participant 125 may keep one or more data screens of one or more display devices 107 within the field of view of the camera 103, so that the camera 103 captures the one or more data screens in the video frame images. - An example use of the system described herein includes a videoconferencing scenario, where a first party (e.g., a host 135) is in a conference room equipped with a
camera 103 and one or more data screens, and a second party (e.g., a participant 125) is a remote mobile user participating in the videoconference using a mobile device 115 such as a smart phone or a tablet. After the participant 125 joins the videoconference, the participation application 123 receives a video stream from the camera 103 and presents the video stream to the participant 125 on a touch screen of the mobile device 115. The participation application 123 detects one or more data screens captured in the video frame images. The participant 125 can issue a natural gesture such as a pinch open gesture on a detected data screen highlighted in the video frame images, causing the mobile device 115 to switch from video view to data view. Afterwards, the participation application 123 presents a data stream associated with the detected data screen to the participant 125 in full resolution. The participant 125 may issue another natural gesture such as a pinch close gesture to switch from the data view back to the video view. - Another example use of the system described herein includes a retrieval application for retrieving information relevant to an image. For example, a user can capture an image of an advertisement (e.g., an advertisement for a vehicle brand), and instruct the retrieval application to retrieve information relevant to the advertisement. The image of the advertisement may include a banner and/or a data screen image showing a commercial video. The retrieval application can instruct the
screen detection module 206 to detect the data screen in the image of the advertisement and to identify a product that matches content shown in the data screen image. The retrieval application may retrieve information relevant to the identified product from one or more databases and provide the relevant information to a user. Other example uses of the system described herein are possible. -
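As one illustration of the camera adjustment module 212 described above, the sketch below maps gesture events to pan, tilt, and zoom commands. The gesture dictionary layout and the (command, direction) vocabulary are assumptions for illustration, not the patent's implementation.

```python
# Hypothetical mapping from natural gestures to camera commands for the
# camera adjustment module 212. The gesture dict layout and the
# (command, direction) vocabulary are illustrative assumptions.

def interpret_camera_gesture(gesture):
    """Translate a gesture event into a pan/tilt/zoom camera command."""
    kind = gesture.get("kind")
    if kind == "drag":
        dx, dy = gesture["dx"], gesture["dy"]
        # A mostly-horizontal drag pans the camera; a mostly-vertical one tilts it.
        if abs(dx) >= abs(dy):
            return ("pan", "left" if dx < 0 else "right")
        return ("tilt", "up" if dy < 0 else "down")
    if kind == "pinch":
        # Pinch open (scale > 1) zooms in; pinch close zooms out.
        return ("zoom", "in" if gesture["scale"] > 1.0 else "out")
    return ("none", None)

print(interpret_camera_gesture({"kind": "drag", "dx": -30, "dy": 5}))  # ('pan', 'left')
print(interpret_camera_gesture({"kind": "pinch", "scale": 0.6}))       # ('zoom', 'out')
```

A real module would then translate such commands into protocol messages for the camera 103 while keeping the data screens of the display devices 107 in the field of view.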
FIG. 3A is a graphic representation 300 illustrating one embodiment of a process for performing data screen detection. After a participant 125 joins a multi-user communication session using the mobile device 115, the camera 103 establishes a video stream connection 302 with the mobile device 115. The camera 103 sends a video stream to the mobile device 115 via the video stream connection 302, causing the mobile device 115 to present the video stream to the participant 125 in a video view mode. The display device 107 registers with the registration server 130 and sends updated screenshot images 304 of a data screen associated with the display device 107 to the registration server 130 periodically. In the illustrated example, the display device 107 is an electronic whiteboard. In one embodiment, the registration server 130 detects a detection trigger event. For example, the registration server 130 detects motion of the camera 103 such as panning or tilting. The registration server 130 receives a latest video frame image 306 from the camera 103 responsive to the detection trigger event. In some examples, the registration server 130 receives the latest video frame image 306 from the mobile device 115. - The
registration server 130 uses an image-matching method to detect active data screens dynamically. For example, the registration server 130 uses an image matching algorithm to find the correspondence between the latest video frame image 306 and the latest screenshot image received from either the hosting device 101 or the display device 107. If a matching result 308 between the latest video frame image 306 and the latest screenshot image of the data screen is found, the registration server 130 notifies the mobile device 115 of the matching result 308 and highlights the corresponding data screen in the video frame images. For example, the registration server 130 uses a box 310 to highlight a data screen of an electronic whiteboard in the video frame image. The display device 107 associated with the data screen establishes a data stream connection 312 with the mobile device 115. The display device 107 may send a data stream to the mobile device 115 via the data stream connection 312. -
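The image-matching step can be illustrated with a deliberately simple sketch: an exhaustive fixed-scale search for the window of the latest video frame that best matches the latest screenshot image, scored by mean absolute difference. A deployed system would need scale- and perspective-invariant matching; everything here (grayscale list-of-lists images, the score threshold) is an assumption for illustration.

```python
# Deliberately simple image-matching sketch: exhaustively search the
# latest video frame for the window that best matches the latest data
# screen screenshot, scored by mean absolute difference. Images are
# grayscale lists of lists; the score threshold is an invented value.

def locate_screen(frame, screenshot, max_diff=1.0):
    """Return the (row, col) of the best-matching window, or None if none is close."""
    fh, fw = len(frame), len(frame[0])
    sh, sw = len(screenshot), len(screenshot[0])
    best_score, best_pos = float("inf"), None
    for r in range(fh - sh + 1):
        for c in range(fw - sw + 1):
            score = sum(abs(frame[r + i][c + j] - screenshot[i][j])
                        for i in range(sh) for j in range(sw)) / (sh * sw)
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos if best_score <= max_diff else None

# Embed a 3x3 "data screen" at row 2, column 3 of a blank 8x8 frame.
frame = [[0] * 8 for _ in range(8)]
shot = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for i in range(3):
    for j in range(3):
        frame[2 + i][3 + j] = shot[i][j]

print(locate_screen(frame, shot))  # (2, 3)
```

A successful match at a given position is what would let the registration server 130 place the highlight box 310 around the detected data screen.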
FIG. 3B is a graphic representation 319 illustrating one embodiment for switching between video views and data views on a mobile device 115 using natural gestures. The participation application 123 interprets natural gestures from a participant 125 to achieve a seamless user experience. In this embodiment, the participation application 123 captures and transmits a first data stream and a second data stream along with the video stream captured from the camera 103 to the mobile device 115. The first data stream includes the high quality screenshot images from the hosting device 101 (e.g., a laptop), and the second data stream includes the strokes from the display device 107 (e.g., an electronic whiteboard). Both data screens (the data screen of the laptop and the data screen of the electronic whiteboard) are visible to the camera 103. In one embodiment, the screenshot images from the hosting device 101 include an embedded image depicting the data screen of the display device 107. - At the beginning, the video view is shown on the
mobile device 115 to present the video frame image 320 to the participant 125. For example, the video frame image 320 is shown in full resolution or on the full screen of the mobile device 115. As shown in FIG. 3B, both data screens (the data screen 324 of the laptop and the data screen 322 of the electronic whiteboard) are visible in the video frame image 320 displayed on the participant's mobile device 115. The participation application 123 intelligently detects and notifies the participant 125 of the existence of data screens in the video frame images. For example, the participation application 123 highlights the data screens 322 and 324 in the video frame image 320. - At phase (1), if the
participant 125 tries to get more detail from the laptop data screen 324, he or she can perform a natural gesture 330 on the laptop data screen 324 shown in the video frame image 320 to zoom into the laptop data screen 324. An example natural gesture 330 can be a pinch or double tap gesture. Responsive to the natural gesture 330, the video view on the mobile device 115 scales up. If the size of the recognized laptop data screen 324 reaches a pre-set threshold, the mobile device 115 automatically switches from the video view to the data view. For example, the view on the mobile device 115 switches from presenting the video frame image 320 in full resolution to presenting a high quality screenshot image 326 of the laptop data screen 324 in full resolution. The participation application 123 interprets any further pinch or dragging gestures performed on the screenshot image 326 as operating on the screenshot image 326 of the laptop data screen 324. - At phase (2), when the
participant 125 performs a natural gesture 332 such as a pinch gesture on the screenshot image 326 to zoom out of the data view and the zoom-out scale ratio reaches a pre-set threshold, the mobile device 115 switches back to the video view from the data view. Again, the participation application 123 presents the video frame image 320 on the mobile device 115 in full resolution, and detects and marks the visible data screens 322 and 324 in the video frame image 320. - At phase (3), the
participant 125 performs a natural gesture 334 such as a dragging gesture on the highlighted data screen 322 in the video frame image 320, causing the mobile device 115 to enlarge the video view by more than a threshold amount, which in turn causes the mobile device 115 to switch from showing the video frame image 320 in full resolution to the data view showing a screenshot image 328 of the electronic whiteboard in full resolution. At phase (4), the participant 125 performs a natural gesture 336 such as a pinch gesture on the screenshot image 328 to zoom out of the data view, causing the mobile device 115 to shrink the data view until a threshold point triggers the mobile device 115 to switch back to the video view from the data view. Again, the mobile device 115 presents the video frame image 320 to the participant 125. -
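The threshold behavior of phases (1)-(4) can be summarized as a small state machine: scaling past an upper threshold while a data screen is targeted switches to the data view, and scaling below a lower threshold switches back. The threshold values and return strings below are invented for illustration; the patent only states that pre-set thresholds trigger each switch.

```python
# State-machine sketch of the pinch-driven switching in phases (1)-(4).
# Threshold values and returned descriptions are illustrative assumptions.

class ViewSwitcher:
    ZOOM_IN_THRESHOLD = 2.0   # cumulative scale ratio: video view -> data view
    ZOOM_OUT_THRESHOLD = 0.5  # cumulative scale ratio: data view -> video view

    def __init__(self):
        self.view = "video"

    def on_pinch(self, scale_ratio, target_screen=None):
        """Apply a pinch's cumulative scale ratio and switch views at the thresholds."""
        if self.view == "video" and target_screen and scale_ratio >= self.ZOOM_IN_THRESHOLD:
            self.view = "data"
            return f"present {target_screen} in full resolution"
        if self.view == "data" and scale_ratio <= self.ZOOM_OUT_THRESHOLD:
            self.view = "video"
            return "present video frame image in full resolution"
        return "scale current view"

switcher = ViewSwitcher()
print(switcher.on_pinch(2.5, target_screen="laptop data screen 324"))
print(switcher.on_pinch(0.4))  # back to the video view
```

Below the thresholds, pinch gestures simply scale the current view, matching the behavior described for phases (1) and (2).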
FIG. 4A is a graphic representation 400 of one embodiment of a graphic user interface illustrating a video view on a mobile device 115. The example user interface shows a video frame image 402 depicting a conference room. The video frame image 402 depicts a host 135 and a data screen 404 of the hosting device 101 projected on a wall of the conference room. The data screen 404 includes an embedded data screen 406. If the participant 125 performs a natural gesture on the data screen 404 captured in the video frame image 402, the mobile device 115 switches from the video view to the data view shown in FIG. 4B. -
FIG. 4B is a graphic representation 420 of one embodiment of a graphic user interface illustrating a data view on a mobile device 115. In this example, a data stream including screenshot images of the data screen 404 is presented on the mobile device 115. The data stream is part of the multi-user communication session and includes an embedded data stream. For example, the data stream is a video clip of another conference with embedded slides. The embedded data screen 406 presenting the embedded slides is shown in the screenshot image of the data screen 404. - When the
participant 125 switches to the data view shown in FIG. 4B from the video view shown in FIG. 4A, the data stream including the video clip starts to play. In one embodiment, the participant 125 may exit from the data view shown in FIG. 4B and return to the video view shown in FIG. 4A by performing a natural gesture (e.g., a pinch to close gesture) on the screenshot image of the data screen 404. In one embodiment, the participant 125 can keep zooming into the data view if the video clip includes embedded presentation slides or whiteboard strokes information. For example, if the participant 125 performs a natural gesture on the embedded data screen 406, the mobile device 115 can switch from the data view to an embedded data view shown in FIG. 4C to present slides embedded in the video clip. -
FIG. 4C is a graphic representation 440 of one embodiment of a graphic user interface illustrating an embedded data view on a mobile device 115. In this example, the slides shown in the embedded data screen 406 are presented to the participant 125. The participant 125 may exit from the embedded data view and return to the data view shown in FIG. 4B by performing a natural gesture (e.g., a pinch to close gesture) on the screenshot image of the embedded data screen 406. -
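The nesting of FIGS. 4A-4C (video view, data view, embedded data view) behaves like a stack: an open gesture pushes a deeper view and a close gesture pops back out. A minimal model, with view names chosen only for illustration:

```python
# Stack model of the nested views in FIGS. 4A-4C: open gestures push a
# deeper view, close gestures pop back out. View names are illustrative.

class ViewStack:
    def __init__(self):
        self.stack = ["video view"]  # the video view is always the base

    def open_view(self, view_name):
        """Push a deeper view (e.g. on a pinch-open or tap-open gesture)."""
        self.stack.append(view_name)
        return self.current()

    def close_view(self):
        """Pop back out (e.g. on a pinch-close gesture); never pop the base view."""
        if len(self.stack) > 1:
            self.stack.pop()
        return self.current()

    def current(self):
        return self.stack[-1]

views = ViewStack()
views.open_view("data view (data screen 404)")
print(views.open_view("embedded data view (data screen 406)"))
print(views.close_view())  # returns to the data view
```

Because the base of the stack is the video view, repeated close gestures always bottom out there, matching the behavior described for FIGS. 4A-4C.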
FIG. 5 is a flow diagram illustrating one embodiment of a method 500 for switching between video views and data views using natural gestures in a multi-user communication session. In one embodiment, the controller 202 receives 502 data indicating that a participant 125 joined a multi-user communication session from a mobile device 115 associated with the participant 125. The view presentation module 204 presents 504 a video stream of the multi-user communication session to the mobile device 115. For example, the view presentation module 204 instructs the user interface module 210 to generate graphical data for displaying the video stream. In one embodiment, the screen detection module 206 determines an occurrence of a detection trigger event. The controller 202 receives 506 a video frame image from the video stream responsive to the occurrence of the detection trigger event. For example, the controller 202 receives a latest video frame image of the video stream from the camera 103. The screen detection module 206 detects 508 a first data screen in the video frame image. For example, the screen detection module 206 determines that the video frame image captures the first data screen. - The
controller 202 receives 510 data describing a first natural gesture performed on the mobile device 115. For example, the controller 202 receives data describing a pinch to open gesture performed on the first data screen in the video frame image. The view switching module 208 switches 512 a view on the mobile device 115 from video view to data view responsive to the first natural gesture. The view presentation module 204 presents 514 a first data stream associated with the first data screen on the mobile device 115. In one embodiment, the first data stream includes one or more high-definition screenshot images of the first data screen generated by a display device 107 associated with the first data screen. -
FIGS. 6A-6C are flow diagrams illustrating another embodiment of a method 600 for switching between video views and data views using natural gestures in a multi-user communication session. Referring to FIG. 6A, the controller 202 receives 602 data indicating that a participant 125 joined a multi-user communication session from a mobile device 115 associated with the participant 125. The view presentation module 204 presents 604 a video stream of the multi-user communication session on the mobile device 115. The screen detection module 206 registers 606 a display device 107 with the registration server 130. The display device 107 includes a data screen for presenting a data stream of the multi-user communication session in the hosting environment 137. The controller 202 receives 608 images of the data screen from the display device 107 periodically. For example, the controller 202 receives screenshot images of the data screen from the display device 107 periodically. - The
screen detection module 206 detects 610 an occurrence of a detection trigger event. The controller 202 receives 612 a latest video frame image from the camera 103 responsive to the occurrence of the detection trigger event. The screen detection module 206 performs 614 data screen detection in the latest video frame image using the latest image of the data screen received from the display device 107. - Referring to
FIG. 6B , thescreen detection module 206 determines 616 whether a sub-image that matches the latest image of the data screen is found in the latest video frame image. If the sub-image is found in the latest video frame image, themethod 600 moves to step 618. Otherwise, themethod 600 ends. Turning to step 618, thescreen detection module 206 generates a matching result indicating the match between the latest image of the data screen and the latest video frame image, and notifies themobile device 115 of the matching result. Thescreen detection module 206 provides data between themobile device 115 and thedisplay device 107 associated with the data screen. For example, thescreen detection module 206 establishes a direct connection between the devices. In one embodiment, thedisplay device 107 can transmit a data stream associated with the data screen to themobile device 115 via the direct connection. - The
controller 202 receives 622 data describing a first natural gesture performed by the participant 125 on the sub-image depicting the data screen in the video frame image. The controller 202 receives 624 a data stream associated with the data screen from the display device 107. The view switching module 208 switches 626 a view on the mobile device 115 from video view to data view responsive to the first natural gesture exceeding a threshold. For example, the user makes an expanding gesture starting in the center of the screen and moving over half the width of the screen. The view presentation module 204 presents 628 the data stream associated with the data screen on the mobile device 115. - Referring to
FIG. 6C , thecontroller 202 receives 630 data describing a second natural gesture performed by theparticipant 125 on the data stream. Theview switching module 208switches 632 the view on themobile device 115 from data view back to video view responsive to the second natural gesture exceeding a threshold. Theview presentation module 204 presents 634 the video stream on themobile device 115. -
FIG. 7 is a flow diagram illustrating one embodiment of a method 700 for switching between a video view and one of two different data views using a selection in a multi-user communication session. In one embodiment, the controller 202 receives 702 data indicating that a first participant, a second participant, and a third participant 125 joined a multi-user communication session. The view presentation module 204 presents 704 a video stream of the multi-user communication session to a mobile device 115 associated with the third participant 125. For example, the view presentation module 204 instructs the user interface module 210 to generate graphical data for displaying the video stream. In one embodiment, the screen detection module 206 determines an occurrence of a detection trigger event. The controller 202 receives 706 a video frame image from the video stream that includes a first device associated with the first participant and a second device associated with the second participant. For example, the controller 202 receives a latest video frame image of the video stream from the camera 103. The screen detection module 206 detects 708 a first data screen from the first device and a second data screen from the second device in the video frame image. - The
controller 202 receives 710 data describing a selection of the first data screen performed on the mobile device 115. For example, the controller 202 receives data describing a finger pressing in the center of the image of the first device to indicate that the third participant wants to view the first data view. The view switching module 208 switches 712 a view on the mobile device 115 from video view to a first data view that corresponds to the first data screen responsive to the selection. The view presentation module 204 presents 714 the first data stream on the mobile device 115. - The foregoing description of the embodiments has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the specification to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the embodiments be limited not by this detailed description, but rather by the claims of this application. As will be understood by those familiar with the art, the examples may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the description or its features may have different names, divisions and/or formats. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, routines, features, attributes, methodologies and other aspects of the specification can be implemented as software, hardware, firmware or any combination of the three.
Also, wherever a component, an example of which is a module, of the specification is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of ordinary skill in the art of computer programming. Additionally, the specification is in no way limited to embodiments in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope of the specification, which is set forth in the following claims.
Claims (1)
1. A computer-implemented method comprising:
receiving, with one or more processors, data indicating a participant joined a multi-user communication session;
presenting, with the one or more processors, a video stream of the multi-user communication session on a mobile device associated with the participant;
receiving, with the one or more processors, a video frame image from the video stream responsive to an occurrence of a detection trigger event;
detecting, with the one or more processors, a first data screen in the video frame image;
receiving, with the one or more processors, data describing a first natural gesture performed on the mobile device;
switching, with the one or more processors, a view on the mobile device from video view to data view responsive to the first natural gesture exceeding a threshold; and
presenting, with the one or more processors, a first data stream associated with the first data screen on the mobile device.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/946,415 US20160077703A1 (en) | 2013-05-20 | 2015-11-19 | Switching Between Views Using Natural Gestures |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361825482P | 2013-05-20 | 2013-05-20 | |
US14/019,915 US9197853B2 (en) | 2013-05-20 | 2013-09-06 | Switching between views using natural gestures |
US14/946,415 US20160077703A1 (en) | 2013-05-20 | 2015-11-19 | Switching Between Views Using Natural Gestures |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/019,915 Continuation US9197853B2 (en) | 2013-05-20 | 2013-09-06 | Switching between views using natural gestures |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160077703A1 true US20160077703A1 (en) | 2016-03-17 |
Family
ID=51895457
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/019,915 Expired - Fee Related US9197853B2 (en) | 2013-05-20 | 2013-09-06 | Switching between views using natural gestures |
US14/946,415 Abandoned US20160077703A1 (en) | 2013-05-20 | 2015-11-19 | Switching Between Views Using Natural Gestures |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/019,915 Expired - Fee Related US9197853B2 (en) | 2013-05-20 | 2013-09-06 | Switching between views using natural gestures |
Country Status (1)
Country | Link |
---|---|
US (2) | US9197853B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
NO20172029A1 (en) * | 2017-12-22 | 2018-10-08 | Pexip AS | Visual control of a video conference |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103096141B (en) * | 2011-11-08 | 2019-06-11 | 华为技术有限公司 | A kind of method, apparatus and system obtaining visual angle |
JP6379805B2 (en) * | 2013-09-17 | 2018-08-29 | 株式会社リコー | Information processing program, information processing apparatus, and information processing system |
KR20150043755A (en) * | 2013-10-15 | 2015-04-23 | 삼성전자주식회사 | Electronic device, method and computer readable recording medium for displaing of the electronic device |
TWI526080B (en) * | 2013-12-26 | 2016-03-11 | 廣達電腦股份有限公司 | Video conferencing system |
US9430046B2 (en) * | 2014-01-16 | 2016-08-30 | Denso International America, Inc. | Gesture based image capturing system for vehicle |
JP6617417B2 (en) * | 2015-03-05 | 2019-12-11 | セイコーエプソン株式会社 | Display device and display device control method |
CN104883583B (en) * | 2015-06-05 | 2017-11-21 | 广东欧珀移动通信有限公司 | A kind of method and device for obtaining Online Video sectional drawing |
US10083238B2 (en) * | 2015-09-28 | 2018-09-25 | Oath Inc. | Multi-touch gesture search |
US10469909B1 (en) * | 2016-07-14 | 2019-11-05 | Gopro, Inc. | Systems and methods for providing access to still images derived from a video |
US10264302B2 (en) * | 2016-09-30 | 2019-04-16 | Ricoh Company, Ltd. | Communication management apparatus, method and computer-readable storage medium for generating image data identification information |
US10509964B2 (en) | 2017-01-11 | 2019-12-17 | Microsoft Technology Licensing, Llc | Toggle view functions for teleconferencing sessions |
US10389974B2 (en) * | 2017-01-16 | 2019-08-20 | Microsoft Technology Licensing, Llc | Switch view functions for teleconference sessions |
US10827212B2 (en) * | 2017-08-02 | 2020-11-03 | Delta Electronics, Inc. | Image transmission equipment and image transmission method |
US20190087060A1 (en) * | 2017-09-19 | 2019-03-21 | Sling Media Inc. | Dynamic adjustment of media thumbnail image size based on touchscreen pressure |
US10891014B2 (en) * | 2018-03-21 | 2021-01-12 | Microsoft Technology Licensing, Llc | Remote view manipulation in communication session |
US11678031B2 (en) | 2019-04-19 | 2023-06-13 | Microsoft Technology Licensing, Llc | Authoring comments including typed hyperlinks that reference video content |
US11785194B2 (en) | 2019-04-19 | 2023-10-10 | Microsoft Technology Licensing, Llc | Contextually-aware control of a user interface displaying a video and related user text |
CN112312057A (en) * | 2020-02-24 | 2021-02-02 | 北京字节跳动网络技术有限公司 | Multimedia conference data processing method and device and electronic equipment |
US11621979B1 (en) | 2020-12-31 | 2023-04-04 | Benjamin Slotznick | Method and apparatus for repositioning meeting participants within a virtual space view in an online meeting user interface based on gestures made by the meeting participants |
US11330021B1 (en) | 2020-12-31 | 2022-05-10 | Benjamin Slotznick | System and method of mirroring a display of multiple video feeds in videoconferencing systems |
US11546385B1 (en) | 2020-12-31 | 2023-01-03 | Benjamin Slotznick | Method and apparatus for self-selection by participant to display a mirrored or unmirrored video feed of the participant in a videoconferencing platform |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080165388A1 (en) * | 2007-01-04 | 2008-07-10 | Bertrand Serlet | Automatic Content Creation and Processing |
US20130083151A1 (en) * | 2011-09-30 | 2013-04-04 | Lg Electronics Inc. | Electronic device and method for controlling electronic device |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5999208A (en) | 1998-07-15 | 1999-12-07 | Lucent Technologies Inc. | System for implementing multiple simultaneous meetings in a virtual reality mixed media meeting room |
US6816626B1 (en) | 2001-04-27 | 2004-11-09 | Cisco Technology, Inc. | Bandwidth conserving near-end picture-in-picture videotelephony |
US7266568B1 (en) | 2003-04-11 | 2007-09-04 | Ricoh Company, Ltd. | Techniques for storing multimedia information with source documents |
US7590941B2 (en) | 2003-10-09 | 2009-09-15 | Hewlett-Packard Development Company, L.P. | Communication and collaboration system using rich media environments |
US8223186B2 (en) | 2006-05-31 | 2012-07-17 | Hewlett-Packard Development Company, L.P. | User interface for a video teleconference |
KR101403839B1 (en) * | 2007-08-16 | 2014-06-03 | 엘지전자 주식회사 | Mobile communication terminal with touchscreen and display control method thereof |
KR101417002B1 (en) * | 2007-08-29 | 2014-07-08 | 엘지전자 주식회사 | A mobile communication terminal having multilateral image communication function and multilateral method for converting image communication mode thereof |
US20100077431A1 (en) | 2008-09-25 | 2010-03-25 | Microsoft Corporation | User Interface having Zoom Functionality |
US8330793B2 (en) | 2009-10-09 | 2012-12-11 | Hewlett-Packard Development Company, L.P. | Video conference |
US8451994B2 (en) * | 2010-04-07 | 2013-05-28 | Apple Inc. | Switching cameras during a video conference of a multi-camera mobile device |
US20130147906A1 (en) * | 2011-12-07 | 2013-06-13 | Reginald Weiser | Systems and methods for offloading video processing of a video conference |
US9699271B2 (en) * | 2013-01-29 | 2017-07-04 | Blackberry Limited | Method and apparatus for suspending screen sharing during confidential data entry |
- 2013-09-06: US 14/019,915 filed; patent US 9,197,853 B2; not active (Expired - Fee Related)
- 2015-11-19: US 14/946,415 filed; publication US 2016/0077703 A1; not active (Abandoned)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080165388A1 (en) * | 2007-01-04 | 2008-07-10 | Bertrand Serlet | Automatic Content Creation and Processing |
US20130083151A1 (en) * | 2011-09-30 | 2013-04-04 | Lg Electronics Inc. | Electronic device and method for controlling electronic device |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
NO20172029A1 (en) * | 2017-12-22 | 2018-10-08 | Pexip AS | Visual control of a video conference |
NO343032B1 (en) * | 2017-12-22 | 2018-10-08 | Pexip AS | Visual control of a video conference |
US10645330B2 (en) | 2017-12-22 | 2020-05-05 | Pexip AS | Visual control of a video conference |
Also Published As
Publication number | Publication date |
---|---|
US9197853B2 (en) | 2015-11-24 |
US20140340465A1 (en) | 2014-11-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9197853B2 (en) | Switching between views using natural gestures | |
US10742932B2 (en) | Communication terminal, communication system, moving-image outputting method, and recording medium storing program | |
US9531999B2 (en) | Real-time smart display detection system | |
CN110636353B (en) | Display device | |
US9258520B2 (en) | Video communication terminal and method of displaying images | |
JP6171263B2 (en) | Remote conference system and remote conference terminal | |
EP3657824A2 (en) | System and method for multi-user control and media streaming to a shared display | |
US10241660B2 (en) | Display control apparatus, method for controlling the same, and storage medium | |
EP3002905A1 (en) | Unified communication-based video conference call method, device and system | |
JP2005051778A (en) | Integrated system for providing shared interactive environment, computer data signal, program, system, method for exchanging information in shared interactive environment, and method for annotating live video image | |
US20180307754A1 (en) | Presenter/viewer role swapping during zui performance with video background | |
US20030009524A1 (en) | System and method for point to point integration of personal computers with videoconferencing systems | |
CN102810048A (en) | Display apparatus and method | |
US11190653B2 (en) | Techniques for capturing an image within the context of a document | |
CN108260011B (en) | Method and system for realizing painting on display equipment | |
US10893206B1 (en) | User experience with digital zoom in video from a camera | |
US20160041737A1 (en) | Systems, methods and computer program products for enlarging an image | |
WO2023125313A1 (en) | Video recording method and apparatus and electronic device | |
JP6497002B2 (en) | System and method for switching screen display between multiple views using gestures | |
CN106210665B (en) | Remote host control method and system based on video acquisition | |
US9519709B2 (en) | Determination of an ordered set of separate videos | |
US10645330B2 (en) | Visual control of a video conference | |
US11893541B2 (en) | Meeting and collaborative canvas with image pointer | |
CN117234324A (en) | Image acquisition method, device, equipment and medium of information input page | |
KR20150021349A (en) | Network Camera Dashboard Apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: RICOH COMPANY, LTD., JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHI, SHU;BARRUS, JOHN;REEL/FRAME:037106/0238 Effective date: 20131003 |
|
STCB | Information on status: application discontinuation |
Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |