US20220292801A1 - Formatting Views of Whiteboards in Conjunction with Presenters - Google Patents
- Publication number
- US20220292801A1 (U.S. application Ser. No. 17/654,585)
- Authority
- US
- United States
- Prior art keywords
- whiteboard
- talker
- writing
- presentation device
- interactive group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
- H04N7/147—Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/70—Circuitry for compensating brightness variation in the scene
- H04N23/741—Circuitry for compensating brightness variation in the scene by increasing the dynamic range of the image compared to the dynamic range of the electronic image sensors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/45—Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/80—Camera processing pipelines; Components thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- H04N7/152—Multipoint control units therefor
Definitions
- This disclosure relates generally to videoconferencing and relates particularly to detection of whiteboards and individuals in one or more captured audio-visual streams.
- Currently, whiteboards are treated primarily as content sources, so that the whiteboard is provided as a content stream.
- A presenter is seen in a video stream, even if he moves around.
- In some cases, a camera is dedicated to a whiteboard, but then a user must switch the video source being provided to the far end between the whiteboard and the presenter. If the presenter is standing near or in front of the whiteboard, any framing with the whiteboard can become confusing.
- For example, the whiteboard is provided in the content stream and displayed on a content monitor, but the whiteboard is also present in the presenter video stream and on the main monitor.
- FIG. 1 is an illustration of a conference room including multiple cameras and a whiteboard according to examples of the present disclosure.
- FIG. 2A is an illustration of a presenter separated from a whiteboard and the resulting framing according to examples of the present disclosure.
- FIG. 2B is an illustration of a presenter standing in front of an empty whiteboard and the resulting framing according to examples of the present disclosure.
- FIG. 2C is an illustration of a presenter standing in front of a whiteboard full of writing and the resulting framing according to examples of the present disclosure.
- FIG. 2D is an illustration of a presenter standing in front of a whiteboard only partially filled with writing and the resulting framing according to examples of the present disclosure.
- FIG. 3 is a high-level flowchart of framing operations according to examples of the present disclosure.
- FIG. 4A is a flowchart of framing operations involving a presenter and a whiteboard according to examples of the present disclosure.
- FIG. 4B is the flowchart of FIG. 4A with added delay periods.
- FIG. 4C is the flowchart of FIG. 4A when the whiteboard is never provided as content.
- FIG. 5 is a high-level block diagram of a videoconferencing system according to examples of the present disclosure.
- FIG. 6 is a more detailed block diagram of the videoconferencing system of FIG. 5 according to examples of the present disclosure.
- FIG. 7 is a block diagram of a system on a chip for use in the videoconferencing systems of FIGS. 5 and 6 .
- A near end videoconferencing endpoint determines if there is a whiteboard and if a presenter is near the whiteboard. If there is no whiteboard in view or the presenter is not near the whiteboard, any content from a camera focused on the whiteboard is continued and any presenter framing is done normally. If the presenter is in front of the whiteboard, any whiteboard content is ended, and appropriate portions of the whiteboard are included in the main video stream framed with the presenter. If the whiteboard is empty, framing is done without reference to the whiteboard. If the whiteboard is full or has writing away from the presenter, the entire whiteboard and the presenter are framed together.
- If the whiteboard only has writing near the presenter, only the relevant portion of the whiteboard is framed with the presenter.
- By including the whiteboard in the framing with the presenter and turning off any whiteboard content stream when the presenter is near the whiteboard, the far end viewer does not see the whiteboard in two different streams.
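The decision flow summarized above can be sketched as a small policy function. This is a minimal sketch: the flag names, return values, and string labels are illustrative assumptions, not terms from the patent.

```python
def choose_framing(has_whiteboard, presenter_near, board_empty,
                   board_full, writing_adjacent_only):
    """Sketch of the framing policy: returns (provide_content, framing).

    provide_content says whether a dedicated whiteboard content stream
    should run; framing names what the main video stream should show.
    """
    if not has_whiteboard or not presenter_near:
        # Whiteboard content stream continues; presenter framed normally.
        return True, "presenter"
    if board_empty:
        # Nothing on the board: frame the presenter alone.
        return False, "presenter"
    if board_full or not writing_adjacent_only:
        # Board full, or writing away from the presenter: frame it all.
        return False, "presenter+whiteboard"
    # Writing only near the presenter: frame just that portion.
    return False, "presenter+written_portion"
```

The four branches correspond to the situations of FIGS. 2A through 2D, respectively.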
- Conference room C is an exemplary near end location.
- Conference room C includes a conference table 10 and a series of chairs 12A-12F.
- A whiteboard 16 is located on one wall of the conference room C.
- A videoconferencing endpoint 498, which includes a camera 502 to view individuals seated in the various chairs 12A-12F and the whiteboard 16 and a microphone array 504 to determine speaker direction, is provided at one end of the conference room C.
- A second camera and microphone array combination 510 is provided on one side of the conference room C and has a clearer view of the whiteboard 16.
- A third camera and microphone array 512 is provided on the side of the conference room C holding the whiteboard 16.
- A content camera 511 is mounted opposite the whiteboard 16 to capture the whiteboard 16 to provide as a content stream.
- A monitor or television 506 is provided to display the far end conference site or sites and generally to provide the loudspeaker output.
- FIG. 2A illustrates a presenter P separated from the whiteboard 16 .
- The presenter P is framed normally and the content camera 511 provides the view of the whiteboard 16 as content in the videoconference. Because there is no overlap, there is no confusion for a viewer at the far end.
- In FIG. 2B, the presenter P has moved to be in front of the whiteboard 16.
- The framed view F2 of the presenter P now overlaps the whiteboard 16. Therefore, the content camera 511 is no longer providing content, to avoid potential viewer confusion.
- The framing view F2 of the presenter P is the same size as the framing view F1, as the size and location are based only on the presenter P, since there is nothing on the whiteboard 16 to display.
- In FIG. 2C, the whiteboard 16 has been filled with two columns 200 and 202 of writing.
- The presenter P has not moved from FIG. 2B.
- The framing view F3 includes the presenter P and the entire whiteboard 16, as the whiteboard 16 is substantially filled with writing. No content is being provided from the content camera 511, as the presenter P is in front of the whiteboard 16 and the contents of the whiteboard 16 are provided in the framing view F3.
- FIG. 2D is like FIG. 2C, except that the whiteboard 16 only contains writing in the left column 200.
- The right portion of the whiteboard 16 is empty, so that portion need not be shown in framing view F4, which is based on the presenter P and the left column 200.
- In FIG. 2D, no content is being provided from the content camera 511.
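A framing view such as F4 can be thought of as the smallest rectangle covering both the presenter and the written portion of the board. A minimal sketch, assuming boxes are (left, top, right, bottom) tuples in pixels; the specific coordinates below are hypothetical:

```python
def union_box(a, b):
    """Smallest axis-aligned rectangle covering boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

# Hypothetical boxes: presenter standing to the right of a written left column.
presenter = (120, 40, 180, 220)
left_column = (20, 30, 100, 200)
framing_view = union_box(presenter, left_column)  # (20, 30, 180, 220)
```

A real implementation would typically pad this rectangle and clamp it to the camera's field of view before driving the crop.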
- In FIG. 3, a high-level flowchart 300 of camera framing by a near end videoconference endpoint is illustrated.
- Video streams are received from any cameras and audio streams are received from any microphone arrays.
- In step 304, regions of interest are located. Regions of interest are objects or areas that are of interest in performing framing decisions. Regions of interest include conference participants but also objects in the conference room, such as the whiteboard 16 or any object to which the participants' views may be directed.
- Neural networks are trained for face and body finding and for detecting the presence of various objects that can be regions of interest.
- The objects include a whiteboard, including the amount and location of writing on the whiteboard.
- The output of the neural network can include not only a bounding box for the whiteboard but also outputs related to the amount and location of writing.
- In some examples, the amount and location of the writing are determined in a second neural network to simplify the training of the neural network performing the main region of interest detection.
- The bounding box information for the whiteboard is provided as an input to the specialized neural network to minimize the requirements of the specialized neural network.
- In other examples, the face and body finding are performed in one neural network and other regions of interest, such as the whiteboard, are detected in a different neural network, allowing simplification of each neural network and reuse of existing face and body finding neural networks.
- The detection of any writing on the whiteboard can be performed by the region of interest detection neural network or in an additional neural network as described.
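One simple way to turn the neural network outputs described above into an "amount of writing" figure is an area ratio between detected writing boxes and the whiteboard bounding box. This sketch is an assumption about how such outputs might be combined, not the patent's method; boxes are (left, top, right, bottom) tuples, and overlap between writing boxes is ignored:

```python
def writing_coverage(whiteboard_box, writing_boxes):
    """Fraction of the whiteboard area covered by detected writing."""
    def area(box):
        return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

    board = area(whiteboard_box)
    if board == 0:
        return 0.0
    # Overlap between writing boxes is ignored in this sketch, so the
    # result is clamped to 1.0 as a crude upper bound.
    return min(1.0, sum(area(b) for b in writing_boxes) / board)

# A board half-covered by one column of writing (hypothetical numbers).
coverage = writing_coverage((0, 0, 200, 100), [(0, 0, 100, 100)])  # 0.5
```

A threshold on this coverage value could then distinguish "empty", "partially filled", and "substantially full" boards.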
- In step 306, the audio streams from the microphone arrays are used for sound source localization (SSL), with the SSL results then used in combination with the video streams to find talkers.
- In step 308, the parties are framed as desired. Framing is usually based on the locations and numbers of talkers or participants to be framed. Examples according to the present disclosure add the location of a whiteboard into the framing considerations. Details of the framing according to examples of the present disclosure are provided in FIGS. 4A, 4B and 4C.
- FIG. 4A provides details of step 308 for examples according to the present disclosure when a whiteboard may be involved.
- A determination is made whether the ROIs, from step 304, include a whiteboard. If so, in step 404 a determination is made whether the presenter, the talker as determined based on SSL and video observations, is near the whiteboard. Near effectively means that the presenter is in a position where the framed view of the presenter would include the whiteboard. If not, or if there is no whiteboard ROI, in step 406 any content from a whiteboard, as from the content camera 511, is provided as content in the videoconference. Operation proceeds to step 408, where normal framing operations are performed, as illustrated in FIG. 2A. These normal framing operations can be rule-of-thirds framing, centered framing, and the like.
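The "near" test described above, that the framed view of the presenter would include the whiteboard, reduces to a rectangle-overlap check. A minimal sketch, assuming (left, top, right, bottom) pixel boxes; the function name is illustrative:

```python
def is_near_whiteboard(framed_view, whiteboard_box):
    """True if the presenter's framed view overlaps the whiteboard box."""
    l1, t1, r1, b1 = framed_view
    l2, t2, r2, b2 = whiteboard_box
    # Axis-aligned rectangles overlap unless one lies entirely beside,
    # above, or below the other.
    return not (r1 <= l2 or r2 <= l1 or b1 <= t2 or b2 <= t1)
```

When this returns False, the flow above continues to steps 406 and 408; when it returns True, operation proceeds to step 410.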
- In step 410, transmission of the whiteboard as content is discontinued.
- In step 412, it is determined if the whiteboard is empty. If so, the whiteboard need not be considered in framing determinations and operation proceeds to step 408, for framing as illustrated in FIG. 2B.
- In step 414, it is determined if the whiteboard is substantially full of writing or if the writing is only in portions not adjacent to the presenter. If the whiteboard is full or the writing is not adjacent to the presenter, in step 416 framing is based on the presenter and the entire whiteboard, as in FIG. 2C. If the whiteboard is only partially filled and the portion is adjacent to the presenter, in step 418 framing is based on the presenter and just the portion of the whiteboard containing the writing, as in FIG. 2D.
- In step 450, it is determined if the talker has been away from the whiteboard for a desired period, such as five seconds. If so, the whiteboard is provided as content in step 406. If the talker has not been away for the desired period, operation proceeds to step 410, with the whiteboard remaining discontinued as content.
- In step 452, it is determined if the talker has been near the whiteboard for a desired period, such as five seconds. If so, operation proceeds to step 410 and the provision of the whiteboard as content is discontinued. If the desired period has not elapsed, operation proceeds to step 406, where the whiteboard continues to be provided as content.
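The delay periods of FIG. 4B act as a debounce on the content switch, so brief moves toward or away from the board do not toggle the stream. A sketch of that hysteresis; the class and method names are illustrative assumptions, and timestamps are plain seconds:

```python
class ContentDebouncer:
    """Switch the whiteboard content stream only after the talker has been
    near (or away from) the whiteboard for a hold period, e.g. five seconds.
    """

    def __init__(self, hold_seconds=5.0, content_on=True):
        self.hold = hold_seconds
        self.content_on = content_on
        self._since = None  # time a pending state change started

    def update(self, talker_near, now):
        """Feed the current talker position; returns whether content runs."""
        # Content should be off while the talker is near, on while away.
        desired_off = talker_near
        currently_off = not self.content_on
        if desired_off == currently_off:
            self._since = None  # state already matches; reset the timer
            return self.content_on
        if self._since is None:
            self._since = now   # start timing the pending change
        elif now - self._since >= self.hold:
            self.content_on = not desired_off
            self._since = None
        return self.content_on
```

For example, with the default five-second hold, a talker stepping in front of the board only suppresses the content stream once five seconds have elapsed, matching steps 450 and 452.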
- Operation is similar even if the whiteboard is never provided as content, such as when there is no camera aimed at the whiteboard to operate as a content camera. This operation is shown in FIG. 4C.
- FIG. 4C is FIG. 4A with steps 406 and 410 removed as the whiteboard is never to be provided as content.
- A similar modification can be made to include the time delays of FIG. 4B.
- While whiteboards have been discussed above, it is understood that other objects are similar to whiteboards, so the term whiteboard as used herein is not limited to just dry erase whiteboards per se but includes similar items, such as smart or interactive whiteboards, flip charts, extra-large sticky notes, bulletin boards with paper on them, boards (including Kanban boards and scrum boards), clusters of sticky notes, a wall with a projected image from an interactive projector, etc., all of which are broadly considered interactive group presentation devices.
- The term writing is likewise used broadly, so that other information besides the illustrated textual information, such as graphical information, pre-printed materials, etc., placed on or displayed by the whiteboard is classified as writing, all of which is broadly considered information.
- A content camera 511 has been described as capturing the whiteboard to be provided as content. If the whiteboard is a smart or interactive whiteboard, the whiteboard itself may provide the content image. If the whiteboard is an image projected by an interactive projector, the projector may provide the content image. The transmission of the content image in either case would be controlled as described in FIGS. 4A and 4B, just in cooperation with the smart whiteboard or interactive projector instead of the content camera 511.
- The camera with the best view of the presenter P and whiteboard 16 is used for the framing operations, and the framed view is then transmitted to the far end.
- That camera would be camera 510, absent a participant standing in front of the camera 510.
- While the whiteboard 16 has been shown mounted on a wall, the whiteboard may also be freestanding or a portion of another object.
- By including the whiteboard in presenter or talker framing decisions when the presenter is near or in front of the whiteboard, the experience of viewers at the far end is improved, as confusion with provision of the whiteboard as content is reduced, particularly if the provision of whiteboard content is coordinated with the presenter framing decisions so that the whiteboard is not presented in both the normal video stream and the content stream at the same time.
- FIG. 5 illustrates an exemplary videoconferencing endpoint 498 as used at a near end or a far end according to the present disclosure.
- A codec 500, the processing unit of the videoconferencing endpoint 498, performs the necessary processing.
- A camera 502 and a microphone array 504 are included in the codec 500 to form an integrated unit, such as a bar.
- An external microphone 508 is connected to the codec 500 to be used on a conference room table.
- Cameras 510 and 512, which include integrated microphone arrays, are connected to the codec 500 to provide alternate or additional views or video streams.
- A content camera 511 is connected to the codec 500 to provide a content stream for use in the videoconference.
- A television or monitor 506, including a speaker, is also connected to the codec 500 to provide video and audio output. Additional monitors can be used if desired to provide greater flexibility in displaying conference participants and conference content.
- The codec 500 is connected to a corporate or other local area network (LAN) 514.
- The corporate LAN 514 is connected to a firewall 516 and then to the Internet 518 in a common configuration to allow communication with a remote endpoint 634 at a far end.
- A system on chip (SoC) 600 is the primary component of the codec 500.
- The SoC 600 is similar to those used for cellular telephones and handheld equipment, such as a Tegra X1 or a Qualcomm 835.
- The SoC 600 may be included as the main component on a system on module (SOM), such as an nVidia™ Jetson TX1 or Intrinsyc™ Open-Q™ 835 System on Module.
- The SoC 600 contains the CPUs 601, DSP(s) 602, a GPU 606, onboard RAM 608, a video encode and decode module 614, an HDMI output module 616, a camera inputs module 618, a DRAM interface 610, a flash memory interface and an I/O module 622.
- The I/O module 622 provides audio inputs and outputs, such as I2S signals; USB interfaces; an SDIO interface; PCIe interfaces; an SPI interface; an I2C interface; and various general purpose I/O pins (GPIO).
- Cameras 510, 512 and content camera 511 are connected to the camera inputs module 618.
- The monitor and speaker 506 is connected to the HDMI output module 616.
- External DRAM 612 and a Wi-Fi/Bluetooth module 620 are connected to the SoC 600 to provide the needed bulk operating memory (RAM associated with each CPU and DSP is not shown) and additional I/O capabilities commonly used today.
- An audio codec 624 is connected to the SoC 600 to provide local analog line level capabilities.
- An analog microphone 508 is connected to the audio codec 624 .
- NICs 626 , 628 are connected to the PCIe interfaces of the SoC 600 .
- NIC 626 is for connection to the corporate LAN 514 and then to IP microphones 632 , the Internet 518 and remote or far end endpoints 634 , while the other NIC 628 is used for local connection of IP-connected devices, such as IP microphones 630 .
- Flash memory 604 is connected to the SoC 600 to hold the programs that are executed by the CPUs 601 and DSPs 602 to provide the endpoint functionality of the codec 500 , including the whiteboard and presenter framing discussed above.
- Illustrated modules include a video codec 650 , camera control 652 , face, body and ROI finding 653 , neural network models 655 , framing 654 , other video processing 656 , audio codec 658 , audio processing 660 , sound source localization 661 , network operations 666 , user interface 668 and operating system and various other modules 670 .
- The RAM 608 and DRAM 612 are used for storing any of the modules in the flash memory 604 when the module is executing, for storing video images of video streams and audio samples of audio streams, and can be used for scratchpad operation of the SoC 600.
- The neural network models 655 and face, body and ROI finding 653 are used with the framing 654 to perform the whiteboard and presenter detection and framing as described above for FIGS. 3 and 4A-4C and illustrated in FIGS. 2A-2D.
- FIG. 7 is a block diagram of an exemplary system on a chip (SoC) 700 as can be used as the SoC 600 in the codec 500 .
- A series of more powerful microprocessors 702, such as ARM® A72 or A53 cores, form the CPUs 601 or primary general purpose processing block of the SoC 700.
- A more powerful digital signal processor (DSP) 704 and less powerful DSPs 705 provide specialized processing capabilities in the SoC 700.
- A simpler processor 706, such as ARM R5F cores, provides general control capability in the SoC 700.
- The more powerful microprocessors 702, more powerful DSP 704, less powerful DSPs 705 and simpler processor 706 each include various data and instruction caches, such as L1I, L1D, and L2D, to improve speed of operations.
- A high speed interconnect 708 connects the microprocessors 702, more powerful DSP 704, less powerful DSPs 705 and simpler processor 706 to various other components in the SoC 700.
- A shared memory controller 710, which includes onboard memory or SRAM 608, is connected to the high speed interconnect 708 to act as the onboard SRAM for the SoC 700.
- A DDR (double data rate) memory controller system 714 is connected to the high speed interconnect 708 and acts as an external interface to external DRAM memory.
- A video acceleration module 716 and a radar processing accelerator (PAC) module 718 are similarly connected to the high speed interconnect 708.
- A neural network acceleration module 717 is provided for hardware acceleration of neural network operations.
- A vision processing accelerator (VPACC) module serves as the video encoder/decoder 614 and is connected to the high speed interconnect 708, as is a depth and motion PAC (DMPAC) module 722.
- A graphics acceleration module 724 is connected to the high speed interconnect 708.
- A display subsystem serves as the HDMI output 616 and is connected to the high speed interconnect 708 to allow operation with and connection to various video monitors.
- A system services block 732, which includes items such as DMA controllers, memory management units, general purpose I/Os, mailboxes, and the like, is provided for normal SoC 700 operation.
- A serial connectivity module 734 is connected to the high speed interconnect 708 and includes modules as normal in an SoC.
- A connectivity module 736 provides interconnects for external communication interfaces, such as a PCIe block 738, a USB block 740 and an Ethernet switch 742.
- A capture/MIPI module serves as the camera interface 618 and includes a four-lane CSI-2 compliant transmit block 746 and a four-lane CSI-2 receive module and hub.
- An MCU island 760 is provided as a secondary subsystem and handles operation of the integrated SoC 700 when the other components are powered down to save energy.
- An MCU ARM processor 762, such as one or more ARM R5F cores, operates as a master and is coupled to the high speed interconnect 708 through an isolation interface 761.
- An MCU general purpose I/O (GPIO) block 764 operates as a slave.
- MCU RAM 766 is provided to act as local memory for the MCU ARM processor 762 .
- A CAN bus block 768, an additional external communication interface, is connected to allow operation with a conventional CAN bus environment in a vehicle.
- An Ethernet MAC (media access control) block 770 is provided for further connectivity.
- External memory, generally non-volatile memory (NVM) such as the flash memory 604, is connected to the SoC 700.
- the MCU ARM processor 762 operates as a safety processor, monitoring operations of the SoC 700 to ensure proper operation of the SoC 700 .
- a system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions.
- One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions.
- One general aspect includes a method of presenting a talker and a whiteboard to a far end of a videoconference. The method also includes receiving at least one video stream containing both the talker and the whiteboard. The method also includes determining the presence of the talker near the whiteboard. The method also includes when the talker is near the whiteboard, framing the talker and the whiteboard together for provision to the far end.
- Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
- Implementations may include one or more of the following features.
- The method may include determining the presence of writing on the whiteboard, where framing the talker and the whiteboard together is performed only when there is writing on the whiteboard. Determining the presence of writing on the whiteboard may include determining that the writing only partially fills the whiteboard and that the writing is adjacent to the talker, in which case framing the talker and the whiteboard together frames the talker and only the portion of the whiteboard adjacent to the talker containing the writing.
- Determining the presence of writing on the whiteboard may include determining that the writing fills the whiteboard, in which case framing the talker and the whiteboard together frames the talker and the entire whiteboard.
- With the near end environment further containing a camera for providing a view of the whiteboard as content in the videoconference, the method may include discontinuing provision of the whiteboard as content when the talker and the whiteboard are framed together.
- The method may include continuing provision of the whiteboard as content when the talker is not near the whiteboard.
Abstract
A videoconferencing endpoint determines if there is a whiteboard and if a presenter is near the whiteboard. If there is no whiteboard in view or the presenter is not near the whiteboard, any content from a camera focused on the whiteboard is continued and any presenter framing is done normally. If the presenter is in front of the whiteboard, any whiteboard content is ended, and appropriate portions of the whiteboard are included in the main video stream framed with the presenter. If the whiteboard is empty, framing is done without reference to the whiteboard. If the whiteboard is full or has writing away from the presenter, the entire whiteboard and the presenter are framed together. If the whiteboard only has writing near the presenter, only the relevant portion of the whiteboard is framed with the presenter.
Description
- This application claims priority to U.S. Provisional Application Ser. No. 63/161,133, filed Mar. 15, 2021, the contents of which are incorporated herein in their entirety by reference.
- For illustration, there are shown in the drawings certain examples described in the present disclosure. In the drawings, like numerals indicate like elements throughout. The full scope of the inventions disclosed herein is not limited to the precise arrangements, dimensions, and instruments shown. In the drawings:
-
FIG. 1 is an illustration of a conference room including multiple cameras and a whiteboard according to examples of the present disclosure. -
FIG. 2A is an illustration of a presenter separated from a whiteboard and the resulting framing according to examples of the present disclosure. -
FIG. 2B is an illustration of a presenter standing in front of an empty whiteboard and the resulting framing according to examples of the present disclosure. -
FIG. 2C is an illustration of a presenter standing in front of a whiteboard full of writing and the resulting framing according to examples of the present disclosure. -
FIG. 2D is an illustration of a presenter standing in front of a whiteboard only partially filled with writing and the resulting framing according to examples of the present disclosure. -
FIG. 3 is a high-level flowchart of framing operations according to examples of the present disclosure. -
FIG. 4A is a flowchart of framing operations involving a presenter and a whiteboard according to examples of the present disclosure. -
FIG. 4B is the flowchart of FIG. 4A with added delay periods. -
FIG. 4C is the flowchart of FIG. 4A when the whiteboard is never provided as content. -
FIG. 5 is a high-level block diagram of a videoconferencing system according to examples of the present disclosure. -
FIG. 6 is a more detailed block diagram of the videoconferencing system of FIG. 5 according to examples of the present disclosure. -
FIG. 7 is a block diagram of a system on a chip for use in the videoconferencing systems of FIGS. 5 and 6. - Far end viewer comprehension is improved in examples according to the present disclosure. A near end videoconferencing endpoint determines if there is a whiteboard and if a presenter is near the whiteboard. If there is no whiteboard in view or the presenter is not near the whiteboard, any content from a camera focused on the whiteboard is continued and any presenter framing is done normally. If the presenter is in front of the whiteboard, any whiteboard content is ended, and appropriate portions of the whiteboard are included in the main video stream framed with the presenter. If the whiteboard is empty, framing is done without reference to the whiteboard. If the whiteboard is full or has writing away from the presenter, the entire whiteboard and the presenter are framed together. If the whiteboard only has writing near the presenter, only the relevant portion of the whiteboard is framed with the presenter. By including the whiteboard in the framing with the presenter and turning off any whiteboard content stream when the presenter is near the whiteboard, the far end viewer does not see the whiteboard in two different streams.
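The decision logic summarized above can be sketched compactly. The following Python sketch is illustrative only; the function and parameter names are assumptions for explanation, not the disclosed implementation:

```python
# Minimal sketch of the whiteboard-aware framing decision described above.
# Inputs are boolean results of the detection steps; all names are assumed.

def choose_view(whiteboard_in_view, presenter_near_whiteboard,
                whiteboard_has_writing, writing_near_presenter_only):
    """Return (send_whiteboard_as_content, framing_mode)."""
    if not whiteboard_in_view or not presenter_near_whiteboard:
        # Whiteboard content stream continues; presenter framed normally.
        return True, "presenter_only"
    # Presenter is in front of the whiteboard: stop the content stream.
    if not whiteboard_has_writing:
        return False, "presenter_only"
    if writing_near_presenter_only:
        return False, "presenter_plus_writing_portion"
    return False, "presenter_plus_full_whiteboard"
```

Each branch corresponds to one of the cases above: normal framing with the whiteboard as content, framing the presenter alone in front of an empty whiteboard, framing only the written portion, or framing the entire whiteboard with the presenter.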
- Referring now to
FIG. 1, a conference room C configured for use in videoconferencing is illustrated. The conference room C is an exemplary near end location. Conference room C includes a conference table 10 and a series of chairs 12A-12F. A whiteboard 16 is located on one wall of the conference room C. A videoconferencing endpoint 498, which includes a camera 502 to view individuals seated in the various chairs 12A-F and the whiteboard 16 and a microphone array 504 to determine speaker direction, is provided at one end of the conference room C. A second camera and microphone array combination 510 is provided on one side of the conference room C and has a clearer view of the whiteboard 16. A third camera and microphone array 512 is provided on a side of the conference room C holding the whiteboard 16. A content camera 511 is mounted opposite the whiteboard 16 to capture the whiteboard 16 to provide as a content stream. A monitor or television 506 is provided to display the far end conference site or sites and generally to provide the loudspeaker output. -
FIG. 2A illustrates a presenter P separated from the whiteboard 16. As there is no overlap between a framed view F1 of the presenter P and the whiteboard 16, the presenter P is framed normally and the content camera 511 provides the view of the whiteboard 16 as content in the videoconference. Because there is no overlap, there is no confusion by a viewer at the far end. - In
FIG. 2B, the presenter P has moved to be in front of the whiteboard 16. The framed view F2 of the presenter P now overlaps the whiteboard 16. Therefore, the content camera 511 is no longer providing content, to avoid potential viewer confusion. As the whiteboard 16 is empty, the framing view F2 of the presenter P is the same size as the framing view F1; the size and location are based only on the presenter P, as there is nothing on the whiteboard 16 to display. - In
FIG. 2C, the whiteboard 16 has been filled with two columns of writing, unlike the empty whiteboard of FIG. 2B. As the presenter P is in front of a whiteboard 16 containing writing, the framing view F3 includes the presenter P and the entire whiteboard 16, as the whiteboard 16 is substantially filled with writing. No content is being provided from the content camera 511 as the presenter P is in front of the whiteboard 16 and the contents of the whiteboard 16 are provided in the framing view F3. -
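The determination that a whiteboard is "substantially filled with writing" can be illustrated with a simple classical image-processing stand-in for the neural-network approach described later: threshold "ink" pixels inside the whiteboard crop and measure per-column occupancy. All names and the 0.5 fullness threshold below are assumed values for illustration only:

```python
# Sketch: estimate how full a whiteboard is from a binary "ink" mask.
# ink is a 2D list of 0/1 flags covering the whiteboard bounding box.

def column_occupancy(ink, columns=2):
    """Fraction of inked pixels in each vertical column band."""
    height, width = len(ink), len(ink[0])
    band = width // columns
    fractions = []
    for c in range(columns):
        cells = [ink[y][x] for y in range(height)
                 for x in range(c * band, (c + 1) * band)]
        fractions.append(sum(cells) / len(cells))
    return fractions

def substantially_full(ink, threshold=0.5):
    """True when every column band is at least `threshold` inked."""
    return all(f >= threshold for f in column_occupancy(ink))
```

A whiteboard with writing in only one column band would fail the fullness test, which corresponds to the partial-framing case of FIG. 2D.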
FIG. 2D is like FIG. 2C, except that the whiteboard 16 only contains writing in the left column 200. The right portion of the whiteboard 16 is empty. As the right portion of the whiteboard 16 is empty, that portion need not be shown in framing view F4, which is based on the presenter P and the left column 200. As with FIG. 2C, in FIG. 2D no content is being provided from the content camera 511. - By including the presence of the
whiteboard 16 and any writing on the whiteboard 16 into the decisions for framing the presenter P, and appropriately controlling the transmission of the whiteboard as content, viewer confusion is reduced. - Referring now to
FIG. 3, a high-level flowchart 300 of camera framing by a near end videoconference endpoint is illustrated. In step 302, video streams are received from any cameras and audio streams are received from any microphone arrays. In step 304, regions of interest are located. Regions of interest are objects or areas that are of interest in performing framing decisions. Regions of interest include conference participants but also objects in the conference room, such as the whiteboard 16 or any object to which the participants' views may be directed. In some examples according to the present disclosure, neural networks are trained for face and body finding and for detecting the presence of various objects that can be regions of interest. In the present examples, the objects include a whiteboard, including the amount and location of writing on the whiteboard. For a whiteboard, the output of the neural network can include not only a bounding box for the whiteboard but also outputs related to the amount and location of writing. In some examples, the amount and location of the writing is determined in a second neural network to simplify the training of the neural network performing the main region of interest detection. The bounding box information for the whiteboard is provided as an input to the specialized neural network to minimize the requirements of the specialized neural network. In some examples, the face and body finding are performed in one neural network and other regions of interest, such as the whiteboard, are detected in a different neural network, allowing simplification of each neural network and reuse of existing face and body finding neural networks. In those examples, the detection of any writing on the whiteboard can be performed by the region of interest detection neural network or in an additional neural network as described. - In
step 306, the audio streams from the microphone arrays are used for sound source localization (SSL), with the SSL results then used in combination with the video streams to find talkers. In the case of a presenter in front of a whiteboard, there is generally only a single talker to be framed. - After the talkers are found in
step 306, in step 308 the parties are framed as desired. Framing is usually based on the locations and numbers of talkers or participants to be framed. Examples according to the present disclosure add the location of a whiteboard into the framing considerations. Details of the framing according to examples of the present disclosure are provided in FIGS. 4A, 4B and 4C. -
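The sound source localization used in finding talkers can be illustrated with a toy single-pair example: a source direction follows from the time difference of arrival between two microphones, estimated by cross-correlation. Real endpoints use larger arrays and more robust estimators (for example GCC-PHAT); the brute-force correlation search below is an illustrative assumption, not the disclosed implementation:

```python
# Toy sketch of time-difference-of-arrival estimation between two mics.

def tdoa_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) of sig_b relative to sig_a that
    maximizes the cross-correlation, searched over [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        # Correlate sig_a against sig_b shifted by `lag`.
        score = sum(sig_a[i] * sig_b[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Given the lag, the microphone spacing, and the speed of sound, the bearing to the talker can be recovered and combined with the video streams as described above.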
FIG. 4A provides details of step 308 for examples according to the present disclosure when a whiteboard may be involved. In step 402, a determination is made whether the ROIs, from step 304, include a whiteboard. If so, in step 404 a determination is made whether the presenter, the talker as determined based on SSL and video observations, is near the whiteboard. "Near" effectively means the presenter is in a position where the framed view of the presenter would include the whiteboard. If not, or if there is no whiteboard ROI, in step 406 any content from a whiteboard, as from the content camera 511, is provided as content in the videoconference. Operation proceeds to step 408, where normal framing operations are performed, as illustrated in FIG. 2A. These normal framing operations can be rule of thirds framing, centered framing, and the like. - If the presenter is near the whiteboard in
step 404, in step 410 transmission of the whiteboard as content is discontinued. In step 412, it is determined if the whiteboard is empty. If so, the whiteboard need not be considered in framing determinations and operation proceeds to step 408, for framing as illustrated in FIG. 2B. - If the whiteboard is not empty, in
step 414 it is determined if the whiteboard is substantially full of writing or the writing is only in portions not adjacent to the presenter. If the whiteboard is full or the writing is not adjacent to the presenter, in step 416 framing is based on the presenter and the entire whiteboard, as in FIG. 2C. If the whiteboard is only partially filled and the portion is adjacent to the presenter, in step 418 framing is based on the presenter and just the portion of the whiteboard containing the writing, as in FIG. 2D. - In a simplified example, there is no evaluation of the amount or location of any writing on the whiteboard and the presenter is simply framed with the entire whiteboard when the presenter is near the whiteboard, so that the framing is as shown in
FIG. 2C even if there is no writing or the writing is adjacent to the presenter. - If the presenter is pacing, so that the whiteboard comes into and out of a framing view of the presenter, a situation might arise where the whiteboard content stream is rapidly and repeatedly turned on and off. This would be distracting to the viewer at the far end, so in some examples time delays are included after the determination of
step 404 as shown in FIG. 4B. If the talker is not near the whiteboard in step 404, in step 450 it is determined if the talker has been away from the whiteboard for a desired period, such as five seconds. If so, operation proceeds to step 406 and the whiteboard is provided as content. If the talker has not been away for the desired period, operation proceeds to step 410, with the whiteboard remaining discontinued as content. - If the talker is near the whiteboard in
step 404, in step 452 it is determined if the talker has been near the whiteboard for a desired period, such as five seconds. If so, operation proceeds to step 410 and the provision of the whiteboard as content is discontinued. If the desired period has not elapsed, operation proceeds to step 406, where the whiteboard continues to be provided as content. - Operation is similar even if the whiteboard is never provided as content, such as when there is no camera aimed at the whiteboard to operate as a content camera. This operation is shown in
FIG. 4C. FIG. 4C is FIG. 4A with the steps providing the whiteboard as content removed, and it can similarly be modified as in FIG. 4B to include the time delays of FIG. 4B. - While whiteboards have been discussed above, it is understood that other objects are similar to whiteboards, so the term whiteboard as used herein is not limited to just dry erase whiteboards per se but includes similar items, such as smart or interactive whiteboards, flip charts, extra-large sticky notes, bulletin boards with paper on them, boards (including Kanban boards and scrum boards), clusters of sticky notes, a wall with a projected image from an interactive projector, etc., all of which are broadly considered as interactive group presentation devices.
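The dwell-time behavior of FIG. 4B, where the content stream only toggles after the talker has been near (or away from) the whiteboard continuously for a hold period, can be sketched as a small state machine. The five-second value follows the example above; the class and method names are illustrative assumptions:

```python
# Sketch of the FIG. 4B debounce: the whiteboard content stream switches
# only after the near/away state has persisted for `hold_seconds`.

class ContentSwitch:
    def __init__(self, hold_seconds=5.0):
        self.hold = hold_seconds
        self.providing_content = True   # whiteboard starts as content
        self._near = False              # talker's current near/away state
        self._state_since = 0.0         # time the current state began

    def update(self, near_whiteboard, now):
        """Call once per detection pass; returns whether content is sent."""
        if near_whiteboard != self._near:
            self._near = near_whiteboard
            self._state_since = now     # state changed: restart dwell timer
        elapsed = now - self._state_since
        if self._near and elapsed >= self.hold:
            self.providing_content = False   # talker settled at whiteboard
        elif not self._near and elapsed >= self.hold:
            self.providing_content = True    # talker settled away from it
        return self.providing_content
```

A pacing presenter who crosses the whiteboard boundary for less than the hold period never toggles the stream, avoiding the rapid on/off switching described above.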
- While writing on the whiteboard has been discussed above, it is understood that writing is used broadly, so that other information besides the illustrated textual information, such as graphical information, pre-printed materials, etc., placed on or displayed by the whiteboard is classified as writing, all of which is broadly considered as information.
- In the examples of this disclosure, a
content camera 511 has been described as capturing the whiteboard to be provided as content. If the whiteboard is a smart or interactive whiteboard, the whiteboard itself may be providing the content image. If the whiteboard is an image projected by an interactive projector, the projector may be providing the content image. The transmission of the content image in either case would be controlled as described in FIGS. 4A and 4B, just in cooperation with the smart whiteboard or interactive projector instead of the content camera 511. - While the use of neural networks has been described to determine the presence of a whiteboard and the amount of writing on a whiteboard, it is understood that more conventional computer vision techniques can also be used.
- In examples according to the present disclosure, the camera with the best view of the presenter P and
whiteboard 16 is used for the framing operations and then transmitted to the far end. For example, in FIG. 1 that camera would be camera 510, absent a participant standing in front of the camera 510. - While this disclosure has focused on the use of a whiteboard in a conference room, it is understood that the whiteboard and presenter may be in many different settings, including a classroom, an auditorium, a lecture hall, a theater and so on.
- Additionally, while the
whiteboard 16 has been shown mounted on a wall, the whiteboard may also be freestanding or a portion of another object. - By including the whiteboard into presenter or talker framing decisions when the presenter is near or in front of the whiteboard, the experience of viewers at the far end is improved as confusion with provision of the whiteboard as content is reduced, particularly if the provision of whiteboard content is coordinated with the presenter framing decisions so that the whiteboard is not presented in both normal video stream and the content stream at the same time.
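Framing the presenter together with only the written portion of the whiteboard, as in FIG. 2D, amounts to taking a rectangle covering both regions. A minimal sketch follows; boxes are (left, top, right, bottom) in pixels, and the margin and frame bounds are assumed values rather than disclosed parameters:

```python
# Sketch: compute a framing rectangle covering the presenter and, when
# present, the written portion of the whiteboard, padded and clamped.

def union(a, b):
    """Smallest box containing both boxes a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]),
            max(a[2], b[2]), max(a[3], b[3]))

def framing_box(presenter, writing=None, margin=40,
                frame=(0, 0, 1920, 1080)):
    """Frame the presenter, widened to include any writing region,
    padded by a margin and clamped to the camera frame."""
    box = union(presenter, writing) if writing else presenter
    left, top, right, bottom = box
    return (max(frame[0], left - margin), max(frame[1], top - margin),
            min(frame[2], right + margin), min(frame[3], bottom + margin))
```

With no writing box the result is the normal presenter-only framing of FIG. 2B; passing the full whiteboard box instead of the written portion yields the FIG. 2C framing.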
-
FIG. 5 illustrates an exemplary videoconferencing endpoint 498 as used at a near end or a far end according to the present disclosure. A codec 500, the processing unit of the videoconferencing endpoint 498, performs the necessary processing. In the illustrated example, a camera 502 and a microphone array 504 are included in the codec 500 to form an integrated unit, such as a bar. An external microphone 508 is connected to the codec 500 to be used on a conference room table. Additional cameras are connected to the codec 500 to provide alternate or additional views or video streams. A content camera 511 is connected to the codec 500 to provide a content stream for use in the videoconference. A television or monitor 506, including a speaker, is also connected to the codec 500 to provide video and audio output. Additional monitors can be used if desired to provide greater flexibility in displaying conference participants and conference content. - The
codec 500 is connected to a corporate or other local area network (LAN) 514. The corporate LAN 514 is connected to a firewall 516 and then the Internet 518 in a common configuration to allow communication with a remote endpoint 634 at a far end. - Details of the
codec 500 are shown in FIG. 6. In the illustrated example, a system on chip (SoC) 600 is the primary component of the codec 500. The SoC 600 is similar to those used for cellular telephones and handheld equipment, such as a Tegra X1 or Qualcomm 835. The SoC 600 may be included as the main component on a system on module (SOM), such as an nVidia™ Jetson TX1 or Intrinsyc™ Open-Q™ 835 System on Module. The SoC 600 contains the CPUs 601, DSP(s) 602, a GPU 606, onboard RAM 608, a video encode and decode module 614, an HDMI output module 616, a camera inputs module 618, a DRAM interface 610, a flash memory interface and an I/O module 622. The I/O module 622 provides audio inputs and outputs, such as I2S signals; USB interfaces; an SDIO interface; PCIe interfaces; an SPI interface; an I2C interface and various general purpose I/O pins (GPIO). -
The cameras and the content camera 511 are connected to the camera inputs module 618. The monitor and speaker 506 is connected to the HDMI output module 616. External DRAM 612 and a Wi-Fi/Bluetooth module 620 are connected to the SoC 600 to provide the needed bulk operating memory (RAM associated with each CPU and DSP is not shown) and additional I/O capabilities commonly used today. An audio codec 624 is connected to the SoC 600 to provide local analog line level capabilities. An analog microphone 508 is connected to the audio codec 624. - Preferably two network interface chips (NICs) 626, 628, such as Intel I210, are connected to the PCIe interfaces of the
SoC 600. In the illustrated embodiment, NIC 626 is for connection to the corporate LAN 514 and then to IP microphones 632, the Internet 518 and remote or far end endpoints 634, while the other NIC 628 is used for local connection of IP-connected devices, such as IP microphones 630. -
Flash memory 604 is connected to the SoC 600 to hold the programs that are executed by the CPUs 601 and DSPs 602 to provide the endpoint functionality of the codec 500, including the whiteboard and presenter framing discussed above. Illustrated modules include a video codec 650, camera control 652, face, body and ROI finding 653, neural network models 655, framing 654, other video processing 656, audio codec 658, audio processing 660, sound source localization 661, network operations 666, user interface 668 and operating system and various other modules 670. The RAM 608 and DRAM 612 are used for storing any of the modules in the flash memory 604 when the module is executing, storing video images of video streams and audio samples of audio streams, and can be used for scratchpad operation of the SoC 600. The neural network models 655 and face, body and ROI finding 653 are used with the framing 654 to perform the whiteboard and presenter detection and framing as described above for FIGS. 3 and 4 and illustrated in FIGS. 2A-2D. -
FIG. 7 is a block diagram of an exemplary system on a chip (SoC) 700 as can be used as the SoC 600 in the codec 500. A series of more powerful microprocessors 702, such as ARM® A72 or A53 cores, form the CPUs 601 or primary general purpose processing block of the SoC 700, while a more powerful digital signal processor (DSP) 704 and multiple less powerful DSPs 705, together the DSPs 602, provide specialized computing capabilities. A simpler processor 706, such as ARM R5F cores, provides general control capability in the SoC 700. The more powerful microprocessors 702, more powerful DSP 704, less powerful DSPs 705 and simpler processor 706 each include various data and instruction caches, such as L1I, L1D, and L2D, to improve speed of operations. A high speed interconnect 708 connects the microprocessors 702, more powerful DSP 704, less powerful DSPs 705 and simpler processor 706 to various other components in the SoC 700. For example, a shared memory controller 710, which includes onboard memory or SRAM 608, is connected to the high speed interconnect 708 to act as the onboard SRAM for the SoC 700. A DDR (double data rate) memory controller system 714 is connected to the high speed interconnect 708 and acts as an external interface to external DRAM memory. A video acceleration module 716 and a radar processing accelerator (PAC) module 718 are similarly connected to the high speed interconnect 708. A neural network acceleration module 717 is provided for hardware acceleration of neural network operations. A vision processing accelerator (VPAC) module is the video encoder/decoder 614 and is connected to the high speed interconnect 708, as is a depth and motion PAC (DMPAC) module 722. - A graphics acceleration module 724 is connected to the
high speed interconnect 708. A display subsystem as the HDMI output 616 is connected to the high speed interconnect 708 to allow operation with and connection to various video monitors. A system services block 732, which includes items such as DMA controllers, memory management units, general purpose I/Os, mailboxes, and the like, is provided for normal SoC 700 operation. A serial connectivity module 734 is connected to the high speed interconnect 708 and includes modules as normal in an SoC. A connectivity module 736 provides interconnects for external communication interfaces, such as a PCIe block 738, a USB block 740 and an Ethernet switch 742. A capture/MIPI module is the camera interface 618 and includes a four lane CSI-2 compliant transmit block 746 and a four lane CSI-2 receive module and hub. - An
MCU island 760 is provided as a secondary subsystem and handles operation of the integrated SoC 700 when the other components are powered down to save energy. An MCU ARM processor 762, such as one or more ARM R5F cores, operates as a master and is coupled to the high speed interconnect 708 through an isolation interface 761. An MCU general purpose I/O (GPIO) block 764 operates as a slave. MCU RAM 766 is provided to act as local memory for the MCU ARM processor 762. A CAN bus block 768, an additional external communication interface, is connected to allow operation with a conventional CAN bus environment in a vehicle. An Ethernet MAC (media access control) block 770 is provided for further connectivity. External memory, generally non-volatile memory (NVM) such as flash memory 604, is connected to the MCU ARM processor 762 via an external memory interface 769 to store instructions loaded into the various other memories for execution by the various appropriate processors. The MCU ARM processor 762 operates as a safety processor, monitoring operations of the SoC 700 to ensure proper operation of the SoC 700. - It is understood that this is one example of an SoC provided for explanation and many other SoC examples are possible, with varying numbers of processors, DSPs, accelerators and the like.
- A system of one or more computers can be configured to perform particular operations or actions by virtue of having software, firmware, hardware, or a combination of them installed on the system that in operation causes or cause the system to perform the actions. One or more computer programs can be configured to perform particular operations or actions by virtue of including instructions that, when executed by data processing apparatus, cause the apparatus to perform the actions. One general aspect includes a method of presenting a talker and a whiteboard to a far end of a videoconference. The method also includes receiving at least one video stream containing both the talker and the whiteboard. The method also includes determining the presence of the talker near the whiteboard. The method also includes when the talker is near the whiteboard, framing the talker and the whiteboard together for provision to the far end. Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.
- Implementations may include one or more of the following features. The method may include determining the presence of writing on the whiteboard, and where framing the talker and the whiteboard together is performed only when there is writing on the whiteboard. Determining the presence of writing on the whiteboard includes determining that the writing only partially fills the whiteboard and the writing is adjacent to the talker, and where framing the talker and the whiteboard together frames the talker and only the portion of the whiteboard adjacent to the talker containing the writing when the writing only partially fills the whiteboard and the writing is adjacent to the talker. Determining the presence of writing on the whiteboard includes determining that the writing fills the whiteboard, and where framing the talker and the whiteboard together frames the talker and the entire whiteboard when the determining the presence of writing on the whiteboard determines that the writing fills the whiteboard. Where the near end environment further contains a camera for providing a view of the whiteboard as content in the videoconference, the method may include discontinuing provision of the whiteboard as content when the talker and the whiteboard are framed together. The method may include continuing provision of the whiteboard as content when the talker is not near the whiteboard. Determining the presence of the talker near the whiteboard includes detecting regions of interest in the at least one video stream; and determining if a region of interest is a whiteboard. Implementations of the described techniques may include hardware, a method or process, or computer software on a computer-accessible medium.
- The above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.”
Claims (20)
1. A method of presenting a talker and a whiteboard to a far end of a videoconference, the near end environment containing the talker, the whiteboard and at least one video camera for providing a video stream to the far end, the at least one video camera having both the talker and the whiteboard within its field of view, the method comprising:
receiving at least one video stream containing both the talker and the whiteboard;
determining the presence of the talker near the whiteboard; and
when the talker is near the whiteboard, framing the talker and the whiteboard together for provision to the far end.
2. The method of claim 1, further comprising:
determining the presence of writing on the whiteboard, and
wherein framing the talker and the whiteboard together is performed only when there is writing on the whiteboard.
3. The method of claim 2, wherein determining the presence of writing on the whiteboard includes determining that the writing only partially fills the whiteboard and the writing is adjacent to the talker, and
wherein framing the talker and the whiteboard together frames the talker and only the portion of the whiteboard adjacent to the talker containing the writing when the writing only partially fills the whiteboard and the writing is adjacent to the talker.
4. The method of claim 2, wherein determining the presence of writing on the whiteboard includes determining that the writing fills the whiteboard, and
wherein framing the talker and the whiteboard together frames the talker and the entire whiteboard when the determining the presence of writing on the whiteboard determines that the writing fills the whiteboard.
5. The method of claim 1, the near end environment further containing a camera for providing a view of the whiteboard as content in the videoconference, the method further comprising:
discontinuing provision of the whiteboard as content when the talker and the whiteboard are framed together.
6. The method of claim 5, further comprising:
continuing provision of the whiteboard as content when the talker is not near the whiteboard.
7. The method of claim 1, wherein determining the presence of the talker near the whiteboard includes:
detecting regions of interest in the at least one video stream; and
determining if a region of interest is a whiteboard.
8. A videoconference endpoint for use in a near end environment containing a talker, an interactive group presentation device and at least one video camera for providing a video stream to a far end videoconference endpoint, the at least one video camera having both the talker and the interactive group presentation device within its field of view, comprising:
a processor;
a network interface coupled to the processor for connection to a far end videoconference endpoint;
a camera interface coupled to the processor for receiving at least one video stream having both the talker and the interactive group presentation device;
a video output interface coupled to the processor for providing a video stream to a display for presentation; and
memory coupled to the processor for storing instructions executed by the processor to perform the operations of:
receiving at least one video stream containing both the talker and the interactive group presentation device;
determining the presence of the talker near the interactive group presentation device; and
when the talker is near the interactive group presentation device, framing the talker and the interactive group presentation device together for provision to the far end.
9. The videoconference endpoint of claim 8, the memory further storing instructions executed by the processor to perform the operations of:
determining the presence of information on the interactive group presentation device, and
wherein framing the talker and the interactive group presentation device together is performed only when there is information on the interactive group presentation device.
10. The videoconference endpoint of claim 9, wherein determining the presence of information on the interactive group presentation device includes determining that the information only partially fills the interactive group presentation device and the information is adjacent to the talker, and
wherein framing the talker and the interactive group presentation device together frames the talker and only the portion of the interactive group presentation device adjacent to the talker containing the information when the information only partially fills the interactive group presentation device and the information is adjacent to the talker.
11. The videoconference endpoint of claim 9, wherein determining the presence of information on the interactive group presentation device includes determining that the information fills the interactive group presentation device, and
wherein framing the talker and the interactive group presentation device together frames the talker and the entire interactive group presentation device when the determining the presence of information on the interactive group presentation device determines that the information fills the interactive group presentation device.
12. The videoconference endpoint of claim 8, the near end environment further containing a camera for providing a view of the interactive group presentation device as content in the videoconference, the memory further storing instructions executed by the processor to perform the operations of:
discontinuing provision of the interactive group presentation device as content when the talker and the interactive group presentation device are framed together.
13. The videoconference endpoint of claim 12, the memory further storing instructions executed by the processor to perform the operations of:
continuing provision of the interactive group presentation device as content when the talker is not near the interactive group presentation device.
14. The videoconference endpoint of claim 8, wherein determining the presence of the talker near the interactive group presentation device includes:
detecting regions of interest in the at least one video stream; and
determining if a region of interest is an interactive group presentation device.
15. A non-transitory processor readable memory containing instructions that when executed cause a processor or processors to perform the following method of framing a talker, the near end environment containing a talker, a whiteboard and at least one video camera for providing a video stream to a far end, the at least one video camera having both the talker and the whiteboard within its field of view, the method comprising:
receiving at least one video stream containing both the talker and the whiteboard;
determining the presence of the talker near the whiteboard; and
when the talker is near the whiteboard, framing the talker and the whiteboard together for provision to the far end.
16. The non-transitory processor readable memory of claim 15, the method further comprising:
determining the presence of writing on the whiteboard, and
wherein framing the talker and the whiteboard together is performed only when there is writing on the whiteboard.
17. The non-transitory processor readable memory of claim 16, wherein determining the presence of writing on the whiteboard includes determining that the writing only partially fills the whiteboard and the writing is adjacent to the talker, and
wherein framing the talker and the whiteboard together frames the talker and only the portion of the whiteboard adjacent to the talker containing the writing when the writing only partially fills the whiteboard and the writing is adjacent to the talker.
18. The non-transitory processor readable memory of claim 16, wherein determining the presence of writing on the whiteboard includes determining that the writing fills the whiteboard, and
wherein framing the talker and the whiteboard together frames the talker and the entire whiteboard when the determining the presence of writing on the whiteboard determines that the writing fills the whiteboard.
19. The non-transitory processor readable memory of claim 15, the near end environment further containing a camera for providing a view of the whiteboard as content in the videoconference, the method further comprising:
discontinuing provision of the whiteboard as content when the talker and the whiteboard are framed together.
20. The non-transitory processor readable memory of claim 19, the method further comprising:
continuing provision of the whiteboard as content when the talker is not near the whiteboard.
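The framing decision recited in claims 15 through 18 can be sketched as a short selection procedure over detected regions of interest: frame the talker alone when the talker is away from the whiteboard, frame the talker together with the entire whiteboard when writing fills it, and frame the talker with only the written portion when the writing partially fills the board adjacent to the talker. The sketch below is a minimal illustration of that logic, not the patent's implementation; the `Box` type, the pixel proximity threshold, and the 80% "fills the whiteboard" cutoff are all hypothetical assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Box:
    """A detected region of interest in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    def union(self, other: "Box") -> "Box":
        # Smallest box containing both regions, used as the output crop.
        return Box(min(self.left, other.left), min(self.top, other.top),
                   max(self.right, other.right), max(self.bottom, other.bottom))

    def area(self) -> int:
        return (self.right - self.left) * (self.bottom - self.top)


def frame_view(talker: Box, whiteboard: Box, writing: Optional[Box],
               near_threshold: int = 200, fill_ratio: float = 0.8) -> Box:
    """Choose the crop sent to the far end (illustrative thresholds only).

    - Talker not near the whiteboard, or no writing: frame the talker alone.
    - Writing effectively fills the board: frame talker plus the whole board.
    - Writing partially fills the board: frame talker plus only the writing.
    """
    # Horizontal gap in pixels between the talker and the whiteboard.
    gap = max(whiteboard.left - talker.right, talker.left - whiteboard.right, 0)
    if gap > near_threshold or writing is None:
        return talker  # no joint framing without proximity and writing
    if writing.area() >= fill_ratio * whiteboard.area():
        return talker.union(whiteboard)  # writing fills the board
    return talker.union(writing)  # frame only the written portion
```

A follow-up step, per claims 19 and 20, would stop sending the separate whiteboard content stream whenever `frame_view` returns a joint crop, and resume it once the talker moves away.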
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/654,585 US20220292801A1 (en) | 2021-03-15 | 2022-03-12 | Formatting Views of Whiteboards in Conjunction with Presenters |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163161133P | 2021-03-15 | 2021-03-15 | |
US17/654,585 US20220292801A1 (en) | 2021-03-15 | 2022-03-12 | Formatting Views of Whiteboards in Conjunction with Presenters |
Publications (1)
Publication Number | Publication Date |
---|---|
US20220292801A1 true US20220292801A1 (en) | 2022-09-15 |
Family
ID=83195054
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/654,585 Pending US20220292801A1 (en) | 2021-03-15 | 2022-03-12 | Formatting Views of Whiteboards in Conjunction with Presenters |
US17/659,895 Active US11696038B2 (en) | 2021-03-15 | 2022-04-20 | Multiple camera color balancing |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/659,895 Active US11696038B2 (en) | 2021-03-15 | 2022-04-20 | Multiple camera color balancing |
Country Status (1)
Country | Link |
---|---|
US (2) | US20220292801A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019229891A1 (en) * | 2018-05-30 | 2019-12-05 | 株式会社ニコンビジョン | Optical detection device and method, and distance measurement device and method |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2485887A1 (en) * | 2004-10-25 | 2006-04-25 | Athentech Technologies Inc. | Adjustment of multiple data channels using relative strength histograms |
EP1966648A4 (en) * | 2005-12-30 | 2011-06-15 | Nokia Corp | Method and device for controlling auto focusing of a video camera by tracking a region-of-interest |
JP4864835B2 (en) * | 2007-08-21 | 2012-02-01 | Kddi株式会社 | Color correction apparatus, method and program |
JP5379647B2 (en) * | 2009-10-29 | 2013-12-25 | オリンパス株式会社 | Imaging apparatus and image generation method |
JP2017139678A (en) * | 2016-02-05 | 2017-08-10 | Necプラットフォームズ株式会社 | Image data converter, image data conversion method, image data conversion program, pos terminal, and server |
JP6925816B2 (en) * | 2017-02-09 | 2021-08-25 | 株式会社小松製作所 | Position measurement system, work machine, and position measurement method |
2022
- 2022-03-12 US US17/654,585 patent/US20220292801A1/en active Pending
- 2022-04-20 US US17/659,895 patent/US11696038B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US11696038B2 (en) | 2023-07-04 |
US20220294969A1 (en) | 2022-09-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10917612B2 (en) | Multiple simultaneous framing alternatives using speaker tracking | |
US9462227B2 (en) | Automatic video layouts for multi-stream multi-site presence conferencing system | |
US9466222B2 (en) | System and method for hybrid course instruction | |
US20210409646A1 (en) | Apparatus for video communication | |
US9232185B2 (en) | Audio conferencing system for all-in-one displays | |
US20130162752A1 (en) | Audio and Video Teleconferencing Using Voiceprints and Face Prints | |
US7694027B2 (en) | System and method for peripheral communication with an information handling system | |
EP2348671A1 (en) | Conference terminal, conference server, conference system and method for data processing | |
CN110333837B (en) | Conference system, communication method and device | |
US20230283888A1 (en) | Processing method and electronic device | |
CN108924469B (en) | Display picture switching transmission system, intelligent interactive panel and method | |
US20220292801A1 (en) | Formatting Views of Whiteboards in Conjunction with Presenters | |
CN111163280A (en) | Asymmetric video conference system and method thereof | |
KR102168948B1 (en) | Mobile video control system and and operation method thereof | |
US11937057B2 (en) | Face detection guided sound source localization pan angle post processing for smart camera talker tracking and framing | |
US20230135996A1 (en) | Automatically determining the proper framing and spacing for a moving presenter | |
KR20080087267A (en) | Transmission system of interactive video and audio | |
TW202244914A (en) | Data sharing method and data sharing system | |
Rui et al. | PING: A Group-to-individual distributed meeting system | |
Zhang | Multimodal collaboration and human-computer interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: PLANTRONICS, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SCHAEFER, STEPHEN PAUL;BRYAN, DAVID A;CHILDRESS, ROMMEL GABRIEL, JR;REEL/FRAME:059248/0024. Effective date: 20220311 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| AS | Assignment | Owner name: HEWLETT-PACKARD DEVELOPMENT COMPANY, L.P., TEXAS. Free format text: NUNC PRO TUNC ASSIGNMENT;ASSIGNOR:PLANTRONICS, INC.;REEL/FRAME:065549/0065. Effective date: 20231009 |